Drobo 5D periodically restarting

I have a Thunderbolt Drobo 5D, and over the past few months it has started to reboot randomly during periods of heavy data transfer (notably Time Machine backups).

The Drobo has 5 x 8TB 7200 rpm drives and a 512GB SSD cache module.

I have tried the following:

  • different operating systems (macOS 10.15 on Intel, macOS 12 on Apple M1)
  • different cables
  • switched from Thunderbolt to USB 3
  • replaced the cache SSD
  • removed each drive in turn over a period of a few weeks
  • a replacement power supply
  • re-flashed the latest firmware
  • with / without drive spin-down enabled

The reboots appear to happen at times of high data transfer, and the error log only reports an ‘internal error’ and nothing more. After a reboot everything is fine, but this is obviously not ideal. I wondered whether this could be a power issue (5 x 7200 rpm SATA drives), but if I remove one of the drives, the reboot still happens.

Any ideas, or do I just need to find a second-hand 5D / 5D3 and move to that?

Thanks for the summary of your actions to date - it looks like you have sifted through most of the likely root causes. One new ‘fix’ that came up on another post was adding a power surge protector, which helped keep another Drobo in check during high data transfers - perhaps this is worth a small investment to explore?

Lastly, how often are your Time Machine backups occurring? Are they scheduled or random?

Thanks for the reply.

The Drobo is downstream of an APC backup device, so I would have thought that surge protection is already in place? What sort of surge protector do you mean - AC or DC?

Time Machine backups are over the network and are at fairly random times. If I try and do some heavy work on a TM backup (such as copying a complete snapshot), I can pretty much guarantee that this will stall the drive and it will reboot. This is the same when connected to two different systems (Intel and M1).

What’s not clear is whether high data transfer uses considerably more energy and if for some reason the Drobo is tripping. Could the internal backup battery be at fault?

To be clear, the 5D is connected to a Mac mini running macOS 10.15 over Thunderbolt. Both are on an APC Pro backup system. There is a single HFS+ (Mac OS Extended (Journaled)) volume, shared over the network via several shares, including one for Time Machine backups.

I have also tried connecting it directly to an M1-based Mac and doing some large file copies. After a fairly short period the Drobo reboots, and macOS has a wobble because the drive has gone away.

Pesky thing has just rebooted again.

macOS reported that the drive had been disconnected, but the Drobo lights were still all green. Then they all went yellow and I heard the drives spin down. It then rebooted, showing the progressive blue dots until it came back online.
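Because macOS notices the disconnect before the Drobo's lights change, a host-side log of exactly when the volume disappears and reappears could help line the reboots up against Time Machine runs and the Drobo's own log. Below is a minimal sketch, assuming the volume is mounted at a hypothetical `/Volumes/Drobo`; it simply polls `os.path.ismount` and prints a timestamp on every change of state. It is not a Drobo tool, just a generic polling loop.

```python
import os
import time
from datetime import datetime


def watch_mount(mount_point, interval_s=5.0, max_polls=None, is_mounted=os.path.ismount):
    """Poll a mount point and log each transition between mounted and missing.

    max_polls=None polls forever; is_mounted can be swapped out for testing.
    """
    events = []
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        mounted = is_mounted(mount_point)
        if mounted != previous:
            # Only record changes of state, not every poll.
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {mount_point} {'mounted' if mounted else 'MISSING'}")
            events.append((stamp, mounted))
            previous = mounted
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return events
```

For example, `watch_mount("/Volumes/Drobo", interval_s=5)` left running in a Terminal window would print one line each time the volume drops out and comes back, which can then be compared with the event times in the Drobo Dashboard log.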

It appears you have a reasonable setup, and I noted the APC Pro, which should take care of any power surges. It almost mirrors my setup, except that I don’t use my Drobo 5D for any Time Machine backups. If this is the only application causing your wobbles, would you consider moving it to another file server?

The Time Machine backups have worked flawlessly for about 7 years, which is why this concerns me. It is just very strange behaviour for the Drobo to shut down. The log says “an internal fault”, but there is no more detail than that to help work out where the problem lies.

Here is the latest report from the dashboard:

[Screenshot: Drobo Dashboard report, 2022-04-25 at 14.41.01]

I have powered down the Drobo, removed the drives and given the system a good clean. I then used IPA (isopropyl alcohol) / switch cleaner on all the SATA HDD connectors and the mSATA card, and powered it all back up.

As a test, I created a 1TB ‘sparse image’ on the Drobo and mounted it in macOS. I am now using Blackmagic Disk Speed Test to continuously read / write a 10GB file to this image. Read / write speeds are fairly dismal, but remember this is an encrypted sparse image on a Drobo with SATA HDDs, so that is to be expected!
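If you want a repeatable load that also records *when* the volume fails, a simple write/read loop can stand in for Disk Speed Test. This is a minimal sketch, assuming a scratch directory on the Drobo (the path and file name below are hypothetical); it writes incompressible data, fsyncs it, reads it back, and timestamps the first I/O error, which is how a reboot would most likely show up on the host side. Only point it at a test volume, since it deliberately hammers the drives.

```python
import os
import time

MB = 1024 * 1024


def stress_cycle(path, size_mb=64, chunk_mb=4):
    """Write one test file, fsync it, read it back; return (write MB/s, read MB/s)."""
    chunk = os.urandom(chunk_mb * MB)  # incompressible test data
    n_chunks = max(1, size_mb // chunk_mb)
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push data to the device, not just the OS cache
    t1 = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        while block := f.read(chunk_mb * MB):
            read_bytes += len(block)
    t2 = time.perf_counter()
    # The read-back may be served partly from the OS page cache,
    # so the read figure is likely optimistic.
    write_rate = n_chunks * chunk_mb / max(t1 - t0, 1e-9)
    read_rate = read_bytes / MB / max(t2 - t1, 1e-9)
    return write_rate, read_rate


def run_stress(directory, cycles=10, size_mb=64):
    """Repeat write/read cycles; stop and timestamp the first I/O error."""
    path = os.path.join(directory, "drobo_stress_test.bin")  # hypothetical name
    results = []
    try:
        for i in range(cycles):
            w, r = stress_cycle(path, size_mb)
            results.append((w, r))
            print(f"cycle {i}: write {w:.1f} MB/s, read {r:.1f} MB/s")
    except OSError as exc:
        # If the Drobo reboots mid-transfer, the volume vanishing would likely surface here.
        print(f"{time.strftime('%H:%M:%S')} I/O error after {len(results)} cycles: {exc}")
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # file never created, or the volume has gone away
    return results
```

For example, `run_stress("/Volumes/DroboTest", cycles=100, size_mb=10 * 1024)` would approximate the continuous 10GB read / write test, with the timestamp of any failure printed for comparison against the Drobo log.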

@TwinTiger - since cleaning the Drobo, I have had it powered on without the outer metal case. I have run hours of Disk Speed Test and done a lot of copying, and the Drobo has NOT rebooted. This made me wonder whether something might be wrong with the fan / cooling and the device could have been overheating.

If the Drobo does overheat, will it simply reboot and show ‘internal error’ or will it say some form of overheating message?

I have a replacement Noctua NF-F12 iPPC-2000 120 x 120 x 25 fan coming which I will fit and re-assemble and see if the device is more stable.

Sam,
Would you submit a support case?

Drobo.com > Go to Support

If the Dashboard sees the Drobo, please include a diagnostic file.
Thank you.

Thanks @DroboFan123 - I have a support ticket #220425-116147

I am wondering whether the fact that I have 5 x 7200 rpm Seagate 8 TB IronWolf NAS drives could be causing the Drobo to overheat. I have tried replacing the fan and powering it directly from 12 V; this does reduce the temperature, but the Drobo still reboots periodically.

Has anyone had similar issues when using 5 x 7200 rpm drives? I could switch to single-disk redundancy and then remove one of the drives to see if that makes things more stable, but that carries a huge risk!

Also, the diagnostic file reports all drives (including the mSATA card) as having a temperature of 0C.

In reviewing the diagnostic file, I see a couple of things:

  • The drive temperatures are of interest: we can see them in the diagnostic file, but they are not high enough to shut down the Drobo. As you know, when the Drobo gets too hot, all of the lights go off except the power light, which turns red. This is to protect the drives.
  • Two of the drives in the pack are working a lot harder than they should, i.e. around 90% busy. Generally a drive is less than 5% busy (see "Disk busy time" below):
    Physical Disk Info for Drive Number :0
    Model ST8000VN004-2M21SATA
    Rev SC60
    Serial WSD4XE9C
    Disk Size (in LBAs) = 15628053168 in GB 8001
    Native sector size = 512
    Disk Life Remaining = 100%
    DiskQueue current depth = 4
    DiskQueue max depth = 12
    Disk busy time = 90%
    Disk Last IO Activity Tick = 59967
    Last Drive Temperature = 0C
    Last Cache Flush time = 999 (~1Mins:39Secs ago)

    Physical Disk Info for Drive Number :3
    Model ST8000VN0022-2ELSATA
    Rev SC61
    Serial ZA1FNLEY
    Disk Size (in LBAs) = 15628053168 in GB 8001
    Native sector size = 512
    Disk Life Remaining = 100%
    DiskQueue current depth = 1
    DiskQueue max depth = 11
    Disk busy time = 91%
    Disk Last IO Activity Tick = 59968
    Last Drive Temperature = 0C
    Last Cache Flush time = 999 (~1Mins:40Secs ago)

  • The unit has experienced random reboots since December 2021; these are not related to the drives getting warm.
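To spot the same pattern in future diagnostic files without reading every block by eye, the "Physical Disk Info" sections can be scanned for the two values discussed above. This is a minimal sketch that assumes the text format shown in the excerpts (one `key = value` per line); the 5% threshold comes from the rule of thumb above, and a reported temperature of 0C is flagged as "probably not reported" rather than as a real reading. The function names are hypothetical, not part of any Drobo tool.

```python
import re

DRIVE_HEADER = re.compile(r"Physical Disk Info for Drive Number\s*:\s*(\d+)")
BUSY = re.compile(r"Disk busy time\s*=\s*(\d+)%")
TEMP = re.compile(r"Last Drive Temperature\s*=\s*(-?\d+)C")
SERIAL = re.compile(r"Serial\s+(\S+)")

SAMPLE = """\
Physical Disk Info for Drive Number :0
Serial WSD4XE9C
Disk busy time = 90%
Last Drive Temperature = 0C
Physical Disk Info for Drive Number :3
Serial ZA1FNLEY
Disk busy time = 91%
Last Drive Temperature = 0C
"""


def parse_disks(text):
    """Return {slot: {"serial", "busy", "temp"}} from diagnostic-file text."""
    disks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = DRIVE_HEADER.search(line)
        if header:
            current = int(header.group(1))
            disks[current] = {}
            continue
        if current is None:
            continue  # ignore anything before the first drive block
        if m := SERIAL.match(line):
            disks[current]["serial"] = m.group(1)
        if m := BUSY.search(line):
            disks[current]["busy"] = int(m.group(1))
        if m := TEMP.search(line):
            disks[current]["temp"] = int(m.group(1))
    return disks


def flag_disks(disks, busy_threshold=5):
    """List drives that are unusually busy or report a 0C temperature."""
    warnings = []
    for slot, info in sorted(disks.items()):
        busy = info.get("busy")
        if busy is not None and busy > busy_threshold:
            warnings.append(f"drive {slot}: busy {busy}% (> {busy_threshold}%)")
        if info.get("temp") == 0:
            warnings.append(f"drive {slot}: temperature 0C (probably not reported)")
    return warnings


for warning in flag_disks(parse_disks(SAMPLE)):
    print(warning)
```

Run against the two excerpts above, this flags drives 0 and 3 for both the high busy time and the 0C temperature.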

My recommendation would be as follows:

  1. For a number of reasons, you should always have a backup of the data on your Drobo.
  2. For the random reboots, I would recommend you back up all of the data on the Drobo and then start fresh (reformat). However, this is unlikely to resolve the issue with the drives getting warm.
  3. The drives getting warm is a tough one: we have seen this before with IronWolf drives, and there are also a number of articles about it online. Possibly Seagate has a solution? On the Drobo, always make sure the faceplate is on, as this keeps the fan circulating air inside the Drobo.

Hope this helps.

If the unit is not stable enough to copy the data off, try putting the unit into Read Only mode.

Please follow this procedure to put the Drobo into Read Only mode:

  1. Put your Drobo into Standby/Shutdown via the Drobo Dashboard (if it’s not already) and remove the power. With the Drobo completely powered off, eject your drives at least 1 to 2 inches.

     How do I safely shut down my Drobo device? https://myproducts.drobo.com/article/AA-01686

  2. Power up the empty Drobo.

  3. Once the Drobo connects to the Drobo Dashboard, double-click the Drobo icon and press the keystroke CTRL-ALT-SHIFT-R on Windows, or CTRL-OPTION-SHIFT-R on a Mac.

  4. You should get two prompts, one right away and one a few minutes later. Please click to reboot the Drobo at that time (if you don’t get the second prompt within 5 minutes, put the unit into Standby using Drobo Dashboard and then cycle the power).

  5. Once the Drobo connects to the Drobo Dashboard, please check that the Drobo is in read-only (RO) mode by looking at the name of the Drobo in the Status or Capacity screens and verifying that there is an (RO) after the name. If it is not in read-only mode, please start over at step 4.

  6. Once you have verified that the Drobo is in Read Only (RO) mode, put the Drobo into Standby. Once it is in Standby/Shutdown, disconnect the power. It is critical that step 5 above is confirmed: data loss can occur if the Drobo is not in read-only mode.

  7. Once the drives are securely seated in the Drobo, power on the Drobo. The Drobo should connect to the Dashboard. Once it is connected, please generate a diagnostic file and upload it to the case (procedure below).

How to Generate a Diagnostic File

After the Drobo boots, are you able to access your data? If so, please begin copying the data to another location.

If the Drobo does not connect to Dashboard with the drives installed, please:

  • Properly shut down the Drobo.

  • With the Drobo powered off, eject the drives.

  • Power up the empty Drobo, generate a diagnostic file and upload it to the case.

  • Properly shut down the Drobo and wait for next steps.

@DroboFan123 - thank you for your reply.

Would it be worth replacing the drives in slots 0 and 3 (waiting for a full re-sync as each drive is added) and then seeing if this fixes the high ‘disk busy time’?

I would hold off and see whether the fresh start (reformat) resolves it, but I would also contact Seagate.

@DroboFan123 - fresh start is a proper pain in the backside, as I’d need to copy the data onto multiple drives to achieve that!

Out of interest, can you see anything from the diagnostic files that gives a reason for the ‘internal error’?

Yes, the random reboots I mentioned. I looked up the error in our database, and in past cases a copy-off and reformat was suggested.

And did the reformat resolve the issue? I have replacement HDDs arriving tomorrow, so is there any point in trying to replace one of the drives to see if (once the sync has been done) this resolves the issue for that bay?