Drobo

drobo unseated drive scare and problem

yesterday i was moving some things around, including my drobo. i took care to shut it down through the drobo app, unplugged the cables, and moved it to its new home on a nice shelf. plugged it back in and watched the lights dance. a few minutes later it informed me i had a drive failure in the bottommost drive and to replace the failed drive.

well, the drive is only a few weeks old and i doubted it had failed, so i shut down again, ejected, and reseated the drive. powered up again and the drobo thanked me for a new drive and proceeded to rebuild itself (taking 4 hours, as the missing drive was 2tb)!

this prompts two questions:

why did the drobo not recognize that the missing drive was returned and just continue on?

what would happen if 2 drives came unseated? when i reseated them both, would my data be entirely lost because of a false “2 drive failure”?

the second question very much concerns me. while i have no immediate plans to move my drobo again, not after this most recent experience, this false alert and automatic rebuild could actually result in data loss when there was no real failure.

i have a linux server equipped with 6 data drives running just a few feet away from my drobo. something like this could never happen there. if the same situation had occurred, the raid array would have been fine after i reseated the drive.

Are you sure that the drive failure was the Drobo’s fault? I had a similar situation with two brand new hard drives in my - also brand new - Drobo FS.
Drobo moved, drive in bay #2 failed. Reinserted, worked fine. The same drive failed to start during a Drobo startup some days later. I put the drive into bay #4. It worked fine again, then failed after two more days.
I sent the drive back. Other drive (same model, same purchase date) failed a week later.
My dealer told me that another customer had had two hard drive failures with the same model a few days earlier: Samsung Spinpoint F2 ECOGREEN HD154UI 1.5 TB

But: I agree with you that the Drobo should detect a reinserted drive a little more intelligently and should recover a little faster.

[quote=“scottjl, post:1, topic:1496”]
why did the drobo not recognize that the missing drive was returned and just continue on?[/quote]
Could be a number of things…
The drive might have been slow to spin up, so it timed out.
The drive might actually be going bad. Remember that Drobo tracks drive error trending, so it tries to predict failure and warn you in more of a proactive manner, versus being reactive.

Most drive diagnostic utilities, however, are reactive and won’t tell you that a drive is going bad; they’ll say the drive is good until it has actually gone bad.
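For what it’s worth, if you want to do your own crude trending on a drive you can attach directly to a computer (Drobo doesn’t expose the drives’ SMART data to the host, as far as I know), here’s a minimal sketch - it assumes smartmontools is installed, and /dev/sda is just a placeholder:

```python
# quick-and-dirty trending: watch the raw SMART counters that usually creep
# up before a drive really dies, instead of waiting for a pass/fail verdict.
# assumes smartmontools is installed and the drive is attached directly to
# the host (not sitting inside the Drobo); /dev/sda is just a placeholder.
import subprocess

WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def watch_attributes(device="/dev/sda"):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if any(name in line for name in WATCH):
            fields = line.split()
            # attribute name is the second column, raw value is the last
            print(fields[1], "raw =", fields[-1])

watch_attributes()
```

If those raw values keep climbing from week to week, the drive is on its way out even if the overall health check still says PASSED.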

[quote=“scottjl, post:1, topic:1496”]
what would happen if 2 drives came unseated? when i reseated them both would my data be entirely lost because of a false “2 drive failure”?[/quote]
Depends…

If you have Single Disk Redundancy (SDR) then when the second drive is “missing” Drobo will tell you “Too many drives have been removed” and the drive bay lights will blink red.

If you have Dual Disk Redundancy (DDR), then if the drives are missing for more than a short amount of time, the remaining drives will go into relayout and Drobo will tell you your data is not protected (because it isn’t).

Did you move the Drobo with the drives in it? DRI does not recommend that, as movement of the drives (since not all drives are built to exactly the same size/height) can put stress on the backplane connection.

well, all 5 drives in my FS have been running just fine for about 6 weeks now. i physically relocated the drobo from a desk to a shelf, moving it only several feet, while it was powered off. i re-seated the drive and it was recognized as a fresh new drive, and the drobo took off rebuilding on its own. i wouldn’t say the drive failed. i may not have been the most gentle moving it over, but i certainly didn’t drop it either.

actually, when i initially populated my drobo, 2 of the drives were not immediately recognized. i think the connectors in back might be a little tight or something. re-seating both got them set properly and they worked. this was bays 2 and 3 (from the top). my failure yesterday was the bottommost bay.

what concerns me most is the auto-recovery that kicked in after reseating the drive. as i said, with my linux server this would not have happened; any rebuild would have to be initiated by me. re-seating the drive and powering up again, as i did with the drobo, would have recovered from the error without any loss or rebuild, because linux-raid would have recognized the re-attached drive as part of my md set and all would be well.
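for comparison, here’s roughly what i mean on the linux side - a minimal sketch, not my actual setup: the array and partition names are hypothetical placeholders, and it only touches the array if it actually came up degraded.

```python
# minimal sketch (not from my real server): check an md array after a
# reseat and re-add the member only if it actually dropped out. the device
# names are hypothetical; substitute your own array and partition.
import subprocess

ARRAY = "/dev/md0"    # hypothetical md array
MEMBER = "/dev/sdf1"  # hypothetical partition on the reseated drive

def array_detail(array=ARRAY):
    """Return the output of `mdadm --detail` for the array."""
    result = subprocess.run(["mdadm", "--detail", array],
                            capture_output=True, text=True, check=True)
    return result.stdout

detail = array_detail()
print(detail)

if "degraded" in detail.lower():
    # --re-add puts a recently removed member back into the array; with a
    # write-intent bitmap this can avoid a full rebuild entirely.
    subprocess.run(["mdadm", "--manage", ARRAY, "--re-add", MEMBER], check=True)

# watch resync/recovery progress (if any) the usual way
print(open("/proc/mdstat").read())
```

with a write-intent bitmap on the md set, a re-add like this usually only resyncs the blocks that changed while the drive was missing, so it finishes in minutes rather than hours.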

i understand drobo tries to automate most of the administrative functions of managing a raid set, to make it very easy on the user. on the other hand, this is dangerous behavior that could result in data loss in a situation where there should be no loss.

the worst part is that you can’t even protect yourself from it, unless you watch the bays when you restore power after a move and immediately shut down if one drive does not light up. you might not realize what is going on until the unit is on-line and the drobo app issues its warnings. in my experience, it was already too late by then.

and if 3 drives came loose, would it be irrecoverable? better be very careful. as i said, i have no plans on moving my drobo physically again until this issue is addressed.[hr]
the drive was not bad. i know it was unseated because the bay had no light on, so it must not have been making physical contact. so the drobo assumed it had failed entirely.

where does the documentation state not to move the unit with drives in it? yes, i moved it with the drives in it, a distance of all of 5’. as i said, i didn’t treat it like an egg, but i certainly didn’t abuse the unit either. it went from a hard (desk) surface to a hard (shelf) surface. it was not powered up at the time.

as for time, this all occurred within a 15 minute window at most. i powered it down. moved it. powered it up. when it came up i immediately got alerts. looked over at the drobo. saw the bottom bay was not lit up at all. powered it down again. reseated the drive, and powered it back up.

[quote=“bhiga, post:3, topic:1496”]DRI does not recommend that as movement of the drives (since not all drives are built exactly the same size/height) can put stress on the backplane connection.
[/quote]

Just a quick question about this… is the Drobo then smart enough to maintain data if you pull all the drives and put them back in in a different order? Or does order need to be maintained?

When fully inserted and the eject arm is latched, the drives should be fairly secure.
That said, a swift kick (not enough to break anything, but enough to dent plaster) would definitely eject at least one drive. Trust me… I did that once. Not that I was angry with Drobo - it was just at my feet and my leg slipped as I was pushing my chair back.

You might want to open a support ticket and send DRI your logs so they can check for anything that looks like an intermittent connectivity-related issue.

I understand your concern about data loss, but Drobo in and of itself won’t cause any loss of data.

If too many drives are missing, it will simply unmount the array and tell you that too many drives have been removed. There’s no data loss. Once the missing drive is reattached, the array is available once again. It might go into relayout, but there’s still no data loss.

Even if you were to eject all drives except for one, this is not an unrecoverable condition as long as the drives are still good and are put back in.

In fact, if you replace the drive within a few seconds (I think it’s actually 15-30 seconds), it won’t trigger relayout at all and it’d be like the drive never disappeared to begin with.

For moving/transporting Drobo:
http://support.datarobotics.com/app/answers/detail/a_id/173

Technically, when you eject a drive one of two things can happen:

  1. If there is enough remaining storage to protect the amount of used data, Drobo will go into relayout (rebuilding fault-tolerance) and downsize the amount of available protected storage.
    For example, if you have 4 x 2 TB drives (8 TB actual, 5.5 TiB SDR protected) and are only using 1 TiB of space, and you then eject one of the 2 TB drives, Drobo will reconfigure itself into a 3 x 2 TB (6 TB actual, 3.6 TiB protected) protected configuration. Until the reconfiguration (relayout) is complete, data is unprotected, unless you have DDR, in which case data is still protected from one additional drive loss.

  2. If there is not enough remaining storage to protect the amount of used data, Drobo will go into unprotected mode and tell you to add a drive.
    For example, in the same case as above with 4 x 2 TB drives (8 TB actual, 5.5 TiB SDR protected), if you were using 4 TiB of space and you then eject one of the 2 TB drives, Drobo will recognize that it cannot protect all 4 TiB of data in a 3 x 2 TB (6 TB actual, 3.6 TiB protected) configuration, so it goes into unprotected mode. You can still access your data, but the array is no longer fault-tolerant; in traditional RAID terms, this is running in a degraded state.
    If you have DDR enabled, then your data is still protected from one additional drive loss, but from a technical point of view it’s still degraded, as you do not have the full level of configured fault-tolerance. (The arithmetic behind these capacity figures is sketched below.)
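If you’re curious where those capacity figures come from, here’s a minimal sketch of the arithmetic - an approximation of the examples above, not necessarily Drobo’s exact internal accounting:

```python
# rough protected-capacity estimate: raw capacity minus the largest drive
# (SDR) or the two largest drives (DDR), converted from decimal TB to TiB.
# an approximation of the examples above, not Drobo's exact accounting.

TB = 10**12   # drive vendors sell decimal terabytes
TiB = 2**40   # Dashboard/OS report binary tebibytes

def protected_tib(drive_sizes_tb, redundancy=1):
    """Estimate protected capacity in TiB for a list of drive sizes in TB."""
    sizes = sorted(drive_sizes_tb, reverse=True)
    reserved = sum(sizes[:redundancy])   # largest drive(s) held back for redundancy
    return (sum(sizes) - reserved) * TB / TiB

print(round(protected_tib([2, 2, 2, 2]), 1))                 # ~5.5 TiB, 4 x 2 TB, SDR
print(round(protected_tib([2, 2, 2]), 1))                    # ~3.6 TiB, 3 x 2 TB, SDR
print(round(protected_tib([2, 2, 2, 2], redundancy=2), 1))   # ~3.6 TiB, 4 x 2 TB, DDR
```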

i have wondered that as well; nothing in the documentation that i have found mentions it.

@Naenyn:
Order does not need to be maintained. For work we ejected the drives from our Drobo, packed it all up, got to the destination, stuck the drives in willy-nilly and all was good. In fact, someone asked us about the ordering, so for fun we quickly swapped the positions of two drives and showed all was still happy.

why couldn’t they have mentioned that procedure in the docs, rather than forcing me to go searching through the knowledge base? and those docs really are more about cross-country transportation than moving it 5’ across a room.

i don’t believe the drobo was in re-layout mode, as it took 4 hours and said it was rebuilding, just as when i swapped out a 1tb drive for a 2tb drive 2 weeks ago. my array is currently about 20% utilized, so i have plenty of free space. it did say my data was fine and i was protected and all that.

the arms are no guarantee that the drive is seated correctly; i found this out when i initially populated the device. 2 drives had no power and did not come up. i powered off, ejected, reseated, and the drives were online. personally i think the sockets are a little flaky or just sensitive, but since i don’t plan on moving the device around much, i don’t expect this to be a major concern once it is up and running.

i’m really not willing to test it again, but from my observation, removal and reinsertion of one drive threw the raid array into failure and recovery mode, and that concerns me. why didn’t it recognize that the same drive was reinstalled (in the same bay) and just carry on happily?

Thanks for the response! Very interesting. Can the same be said then if a Drobo unit fails and needs to be replaced…? Can you pop 5 drives out of one Drobo FS and into another and have all your data magically intact?

Yes.

Rebuilding = relayout, though relayout can involve a bit more than rebuilding.

If it makes you feel any better, I moved my Drobo 10 feet across the room with the drives in it. This was before the unfortunate “kicking” incident that caused a 36-hour relayout. :slight_smile:

Being that you had problems with drive connections from the get-go, I would open a support ticket and attach your logs, just for peace of mind.

Removing any drive puts Drobo into “hey, a drive is gone” mode, and it’ll wait a little bit; if the drive doesn’t return, it’ll go into relayout, aka “data protection in progress”, so it can re-establish redundancy and up/down-size the array as necessary.

It’s supposed to recognize that the drive has returned, but if relayout has already started, then it gets treated as a remove-then-add just like a drive upgrade.
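To make that sequencing concrete, here’s a toy sketch of the behavior as I understand it from this thread - not actual Drobo firmware, and the grace period is an assumption loosely based on the “15-30 seconds” I mentioned above:

```python
# toy model of the sequence described above - not Drobo firmware. it just
# shows why a reinsertion after relayout has begun looks like a
# remove-then-add, the same as a drive upgrade.
import time

GRACE_SECONDS = 30  # assumed window before relayout kicks in

class Bay:
    def __init__(self):
        self.removed_at = None
        self.relayout_started = False

    def drive_removed(self):
        self.removed_at = time.monotonic()

    def tick(self):
        """Poll periodically; start relayout once the grace window expires."""
        if self.removed_at is not None and not self.relayout_started:
            if time.monotonic() - self.removed_at > GRACE_SECONDS:
                self.relayout_started = True
                print("relayout started: data protection in progress")

    def drive_inserted(self):
        if self.relayout_started:
            print("too late - treated as a new drive, just like an upgrade")
        else:
            print("drive came back inside the window - nothing to do")
        self.removed_at = None
```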

ok. so when is it supposed to recognize the drive has returned? what is the time frame? i had a very small window here, 15 minutes at most. it couldn’t handle that? that’s scary.

are the drives hot swappable? as a general rule i don’t test hot swap functionality if i can avoid it, and a power down was easy for me. but would i have been better off just ejecting and reinserting the drive live at the time? would the drobo have caught on?

as i said, i hope to avoid this situation entirely in the future, but i’d also like to know how to handle it better if it happens.

thanks to all who have responded!

It’s supposed to recognize it immediately. I don’t know what the “window” is between eject and replacement for it not to trigger relayout, but I’m pretty sure it’s less than a few minutes.

Yes, the drives are hot-swappable. Some folks like to power down the unit and boot it with/without the drives though, as that reduces the risk of accidentally ejecting another drive - or the wrong one.

Yes as in, you can take the drives out of one Drobo FS and put them into another Drobo FS and have all your data intact without any issues?

Does this auto-replicate the shares and settings onto the new Drobo FS? Is that information stored in the array itself or in flash memory internal to the Drobo FS? I think a deeper understanding of how the Drobo FS behaves in these disk-failure/unit-failure scenarios would let those of us with unique data on our Drobos feel a bit better.

Yes, you can remove the entire disk pack (all the disks) from one Drobo FS and put them in another Drobo FS and everything will be there.

Things vary if you’re migrating disk packs between different models of Drobo:
http://www.datarobotics.com/migration/index.php