Drobo

Problems with DroboPro

My trusty DroboPro started having problems recently. First, it kept rebooting within 5 minutes. After trying many different things suggested online, I found that removing the first drive seemed to stop that (there were no warnings on that drive).

Firmware: v1.2.2
Dashboard: v2.8.5

Now access to the drobo is extremely slow, and it frequently locks up the Mac. For example, I just tried deleting one file (i.e. moving it to the Trash), and it took 7 minutes. It also behaves erratically in other ways, such as Dashboard showing a different % used after each restart. Basically, it's very unreliable. I have it connected via FireWire, but I've also tried USB with the same results.

At this point I simply want to get the data off, but copies frequently fail or lock up, or the drobo stops responding.

One of the difficulties is that there are no other data or errors to go on. Nothing is being logged in the Console. Activity Monitor doesn't show any significant activity; the Finder UI just freezes.

Is there any way to get logs/info on what’s actually going on and where the errors are occurring? Or a way to decrypt a diagnostic log v2?

Thanks

hi angusc, can i check: when you mentioned no warnings on that drive, was that based on the main drobo bay lights (and the dashboard bay lights)?

if you can, check the extra information dropdown in dashboard (which sometimes shows when you click on an actual drive bay). can you see any extra info there (usually next to the model and firmware info for the drives), such as a label saying Warning or Healed?

is the first drive still removed?

can you remember how many blue led lights were lit before you started having the reboot issues (i.e. what you expected it to be), both before and after removing the drive? and when you get a moment, could you share your used and free values and percentages?

any other info you have about the drive bays could be useful too, for example, numbering the 1st slot as slot 1:
Slot 1 = x TB (currently inside / outside) & make
etc
and we can try and see what it could be

I’ve never seen the extra info menu, just tried clicking, control-clicking, etc on the drives in the status window, nothing happened.

Below is the progression of my work on the drobo covering about 6 weeks. It's perhaps not the clearest, as I've tried many different things and the config/setup has changed with those attempts.

When this all started (probably 6 weeks ago now), the unit was ~70% full. It was set to DDR and had 3x 1.5 TB drives and 3x 2 TB drives. I'm pretty sure all the drives are WD Greens, and they were in the following setup:

Slot 1: 1.5 TB
Slot 2: 1.5 TB
Slot 3: 2 TB
Slot 4: 1.5 TB
Slot 5: Empty
Slot 6: 2 TB
Slot 7: 2 TB
Slot 8: Empty

It was the 1.5 TB drive in the first slot that I pulled, which seemed to stop the constant rebooting.

I still have the first drive pulled out. It was the only thing that seemed to stop the unit from continually rebooting. There were no warning lights on that drive before I pulled it. I didn't believe I could be lucky enough for the first drive to be the problem, so I wondered whether the power supply was failing and having one less drive to spin was what actually helped. But when I reinserted it and pulled a different drive, it still rebooted.

It was also asking for a new disk, saying it couldn't provide redundancy, so I started deleting as much data as possible. The warnings never went away, even after I got data usage down under 50%.

When the unit restarts, there are various warnings about capacity, such as rebuilding, 100% full, etc. These are usually based on an incorrect figure for the amount of data stored. I watched it closely just now after a failed overnight copy, and the warnings appear when Drobo Dashboard has connected but the volume isn't mounted yet. Once it mounts in the OS, Dashboard changes (from 3+ TB used to 1.14 TB currently).

I gave up on it for about a month, until a few days ago when I turned it back on. Now it reported one of the 2 TB drives as dead, so I pulled that one as well. I tried reinserting it, but it was still reported dead. So the current status is:

Slot 1: Empty, flashing red asking for new drive (1.5 TB pulled)
Slot 2: 1.5 TB
Slot 3: 2 TB
Slot 4: 1.5 TB
Slot 5: Empty
Slot 6: Empty (2 TB pulled)
Slot 7: 2 TB
Slot 8: Empty

Even with the 4 remaining drives (2x 1.5 TB and 2x 2 TB), it was giving errors saying "Cannot protect your data against hard drive failures until you provide at least 2 hard drives for single disk redundancy or 3 blah blah blah" and asking for a new disk. This obviously didn't make any sense, as it had 50% free space (I had deleted some data in my last attempts to fix it).

I've been (slowly, because of the problems) deleting data, and it's down to 29% used (1.14 TB). I also turned off DDR, restarted the drobo, and then re-enabled DDR just to see what would happen. Drobo Desktop is now saying "…converting to DDR. This will protect your data from 2 drive failures". The capacity circle graph is flashing green/yellow, but it's been "converting" for 12 hours now.

As I've managed to copy some data, I've noticed that it acts normally at times, copying quickly, and then hits a patch where it slows to a crawl. Sometimes even opening a folder takes a minute or two (or 10). Earlier today, moving a folder of pictures with about 12k files (not all in one folder) to the Trash took about 7 minutes. I gave up on emptying the Trash after the "Preparing to empty" count had only reached 3,000 after 40 minutes. I moved on to some other files, which deleted more quickly, but still not at a normal speed.

It makes me think one of the disks is having problems, but I can't tell which one, and the data should be replicated across more than one disk anyway. My only other thought is that the slow parts are the drobo reconstructing data from one of the two missing drives using parity (assuming the redundancy is similar to RAID5), but even if that were the case, it shouldn't be this slow. It feels more like a flaky disk that works just well enough not to error out completely, but without any other info, it's really hard to tell. Of course, now that two drives are seemingly bad, and possibly a third, in a short period of time, I wonder if the problem is with the drobo itself.
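For anyone unfamiliar with the RAID5 comparison above, here is a minimal sketch of generic single-parity (XOR) reconstruction. It is an assumption for illustration only: Drobo's BeyondRAID on-disk layout isn't public, and DDR protects against two failures, which needs more than one parity block. What the sketch does show is that every rebuilt block requires a read from every surviving drive, which is one reason a rebuild competing with normal I/O can slow things down.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    # zip(*blocks) yields one tuple of ints per byte position across all blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    # Single parity, RAID5-style: parity = d0 ^ d1 ^ ... ^ dn
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    # The missing block is the XOR of every surviving data block plus the parity,
    # so rebuilding it means reading the same stripe from all remaining drives.
    return xor_blocks(surviving + [parity])


# Hypothetical 2-byte stripes on three data drives (illustrative values only).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = make_parity(data)
print(rebuild_missing([data[0], data[2]], parity) == data[1])
```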

If I can get the rest of the data off, I'll start pulling the remaining drives one at a time to see if the drobo speeds up, while testing them in a standalone external enclosure. I'm not sure I'll ever trust this drobo again, but we'll see what happens if I wipe everything.

I also looked through the diag files it output. I couldn't find any obvious disk errors, but there's the encrypted file, which is unreadable.

Overall, I've been really happy with the drobo and have recommended it to others over the years, even though I've been aware of the discussions/reviews. The flakiness and lack of info when things do go wrong is frustrating. Drobos are supposed to take the pain/tech out of reliable storage; that same philosophy should apply both when things are working and when they aren't. I think that's been the root cause of most of the complaints I've read about over the years.

Update: This morning, after another overnight copy failed, I switched back to SDR and rebooted the drobo. Copies (of different data, so it might just be in a better location) are working better.

Since the Drobo reboot (and upgrading Dashboard to v3.0.0), it's been saying data protection is in progress. It's been going for 5 hours now. I just checked the status window, and it's changed from an indeterminate (candy stripe) progress bar to an empty bar saying protection will take 90 hours.

Five hours just to figure that out; something is clearly still wrong.

After a few more freezes I finally got all the data copied off.

I deleted all the data off the Drobo in the Finder, and it still reported 100+ hours of data protection in Drobo Desktop. With the data off, I was finally able to reset it, and it rebooted cleanly and showed the disks waiting for setup. All looked good.

Since the new firmware came out the other day, I decided to install it. After the reboot the Drobo is back to continually rebooting, even with all drives installed.

So much for this unit (and that upgrade).

hi angusc, thanks for writing out all of that info.

just before i forget: the area in dashboard that can show more info may only be in newer versions of dashboard. here is a screenshot showing an example (taken from the dashboard help site for a 5D, but it should hopefully help convey the area i was trying to describe):

http://dashboardhelp.drobo.com/guide/250/en/5D_Checking_Drive_Information_for_the_Drobo_5D.htm
(the dropdown there says Drive Information, though i think usually says System Information unless selected)

as for deleting tons of small files, that can take a long time in general. for example, in the past when i was really low on space, i used a utility to clear out old temporary cache files, but it hung, and only on inspection did i see it was due to thousands of document object model (dom) files that were taking ages to parse. in your case, though, the live rebuild taking place at the time was probably competing with your delete tasks, and they were probably slowing each other down a lot too.

now that you have gone through the considerable time and effort of copying off data, deleting, resetting/wiping and updating to the new firmware, and the unit is rebooting again, it might be a flaky disk as you mentioned…

were you able to run all the drives in a standalone caddy to test them for errors? i know it will probably need a bit more time and effort, but a tool that can do a full surface scan might find the culprit, in case the drobopro is actually ok and a flaky drive is causing it issues.

maybe one way forward (if feasible) could be to get hold of 2 separate drives, such as new/blank or unneeded drives that you know or believe work ok, and see if the drobo still reboots with those… if it does not reboot in a loop and actually starts working fine again, then one of the current drives is likely the cause, and from there it's a process of elimination: testing each original drive separately in a usb caddy or similar, with some tools.

a good tool that i've used before is the Western Digital Data Lifeguard wdutil. usually i look at the smart info for the drive i need to test via my icydock usb caddy, then run the quick test, then the full test, and then compare the smart info again.
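the before/after smart comparison described above could be sketched like this. it's only an assumption of how you might track it: the raw values are typed in by hand from whatever tool you use (the function doesn't parse any tool's output), and the watchlist is a common choice of standard smart attributes tied to failing media (5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable), not an official pass/fail rule.

```python
# Assumed watchlist of standard SMART attribute IDs often linked to media problems.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def compare_smart(before: dict[int, int], after: dict[int, int]) -> dict[str, tuple[int, int]]:
    """Return watched attributes whose raw value grew between two snapshots.

    Both arguments map SMART attribute ID -> raw value; IDs missing from a
    snapshot are treated as 0.
    """
    grown = {}
    for attr_id, name in WATCHED.items():
        old = before.get(attr_id, 0)
        new = after.get(attr_id, 0)
        if new > old:
            grown[name] = (old, new)
    return grown


# Hypothetical readings for one drive, before and after a full surface scan.
before_scan = {5: 0, 197: 0, 198: 0}
after_scan = {5: 8, 197: 0, 198: 2}
for name, (old, new) in compare_smart(before_scan, after_scan).items():
    print(f"{name}: {old} -> {new}")
```

any drive whose counts climb during the full test would be a good first suspect for the process of elimination.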