Drobo

drobo failed relayout and crashes

Hi there,

I'm calling for help from Germany, hoping to get some information that neither support in the US nor in Germany has been able to deliver so far.
I was running my second-generation Drobo on four 1TB Seagate Barracuda hard drives safely for months.
At approx. 70 percent capacity I swapped the fourth 1TB drive for a WD 2TB Caviar Green. The relayout started. According to the status bar it was supposed to take almost 40 hours. Two-thirds into the process Drobo dismounted itself and kept shutting down every 5 minutes.
I went through some tests with the support line, with no result. Has the data from the remaining three original HDs already been spread across the entire volume, or is everything still on the three HDs? Could I just put the old drive back in while the power is off and thus return Drobo to the state it was in before the swap started?

Added Oct. 30th:
Support read the diagnostic file: one of the original HDs, in slot two, failed during the relayout and caused the system crash. Recommendation by support: clone the failed drive. I wonder whether I would cause any more harm by doing this myself if the drive is not working correctly!?

Still no idea how to rescue the data. I get the feeling support doesn’t know the answer to the problem. Their suggestion is to clone the drive that presumably failed during relayout and reinsert the clone. I hope the controller will still be able to rearrange all the data into one functioning unit!

Any update?

I took the failed drive to a renowned data rescue company here in Berlin to have it cloned, in order to revive Drobo and hopefully finish the relayout. I didn’t want to play around with it myself. They called me today to say that the drive (Seagate Barracuda) is physically damaged. At this stage they would only be able to clone 7 of the 8 sectors. Or they could repair the heads in the HD, which would cost about 1000 Euro. I don’t know what to do, as I’m not sure whether the relayout would then finish fine or even more damage would occur to the system…

To everyone out there changing drives or upgrading to more capacity: make sure you have ALL DATA SAVED in a different location before Drobo goes into relayout.

If a hard drive fails during relayout, as happened to me, it’s very likely you will lose all data!!!

What I do not understand is why you are not going back to your original idea: simply put the original 1TB drive you hoped to upgrade to 2TB back in the Drobo, wait for Drobo to recognize three valid drives and a failed one, and then put in your WD 2TB as a replacement for the failed Seagate drive.
Of course, that assumes you were purely “read-only” during the whole rebuild phase (otherwise data would have moved).
Does DRI support have anything to say on that subject?

I tried that with the telephone support already. Drobo just isn’t able to recognize the drives as healthy and keeps booting up and shutting down in cycles of about two minutes, so it never starts a relayout.

It mounts and unmounts repeatedly, which gives just enough time to write a diagnostic file before it shuts down again. Very strange behaviour!

Also, even tier three support at Drobo can’t give any sophisticated advice on what to do and admits that with every additional try more damage could be done to the system.

[quote=“fraumanu, post:6, topic:646”]Also, even tier three support at Drobo can’t give any sophisticated advice on what to do and admits that with every additional try more damage could be done to the system.
[/quote]
If I were in your situation, I would pull the whole drive pack and do whatever was necessary to get a different Drobo chassis to try the drive pack in.

While Drobo protects against drive failure, it doesn’t protect against Drobo failure…

Hey fraumanu,

I re-read your thread from the beginning… It sounds like the relayout process didn’t succeed?
Seems like another drive failed during the relayout process.
If that’s the case, I’m sorry to say, but your data is gone.

While relayout is in process, your data is not protected (unless you have DroboPro in Dual-Drive Redundancy mode). This is not a Drobo-specific problem; it is inherent to any RAID-5 type solution.

So to amend my last statement… While Drobo protects against a single drive failure, it doesn’t protect against another drive failure during relayout. DroboPro in Dual-Drive Redundancy (DDR) mode is equivalent to RAID-6 and can protect against two simultaneous drive failures, but not if you have more than two drive failures.
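
For anyone who wants to see why single-parity protection stops at one missing drive, here is a minimal sketch. It assumes plain RAID-5 style XOR parity on one stripe of a hypothetical 4-drive array; Drobo's actual on-disk format is not described in this thread, so treat the block contents and the helper names as illustrative only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe: three data blocks plus one parity block (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, missing):
    """Rebuild the block at index `missing` from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)

# One drive lost: the missing block comes back exactly.
one_lost_ok = rebuild(stripe, 1) == stripe[1]

# Two drives lost (indexes 1 and 2): the survivors only yield the XOR of the
# two missing blocks, which is not enough to recover either one on its own.
combined = xor_blocks([stripe[0], stripe[3]])
two_lost_ambiguous = combined == xor_blocks([stripe[1], stripe[2]])

print(one_lost_ok, two_lost_ambiguous)
```

With one block missing the XOR of the survivors is the lost block. With two missing, the survivors pin down only the combination of the two, which is the situation a second failure during relayout puts you in.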

My question getting past the data loss (I’ve been there) would be why you had two drives fail back-to-back. It could just be coincidence (stuff happens), but it may also be a sign of a greater issue. Perhaps power-hungry drives, overheating, failed fan, or a crap batch of drives.

Are your drives all from the same or close serial and/or batch numbers?

RAID primer from AC&NC (hope this is OK to post - it’s a good tutorial) if you need it:
http://www.acnc.com/04_00.html

Brandon

Well, I always understood that you are at a slightly higher risk of a second drive failing during a relayout/rebuild, since obviously it’s putting a higher than normal strain on the remaining drives.

And short of a power surge or power supply going bad, I have never seen a system physically damage a drive other than long-term wear (3+ years of heavy use) with heat being a factor.
I myself have physically damaged drives, mostly due to falls… :\

[quote=“bhiga, post:8, topic:646”]…
Seems like another drive failed during the relayout process.
If that’s the case, I’m sorry to say, but your data is gone.

While relayout is in process, your data is not protected (unless you have DroboPro in Dual-Drive Redundancy mode). [/quote]
My understanding of fraumanu’s problem is that the new bigger drive failed, one of the 3 older drives failed during relayout, but the original replaced drive was OK when previously in Drobo.
Thus my reading of the situation was that, assuming he did not write anything new on the Drobo during relayout, the original data files would still be present on the 2 OK drives in Drobo, plus the removed OK drive.
If the Drobo software were clever enough, putting the 3 original OK drives back in the Drobo should thus allow a rebuild of the 4th, defective one, right?

Why do you think this is not the case and the data is lost? Does Drobo relayout continuously destroy the original directory as soon as it starts?
It is really a pity that DRI does not communicate which errors are recoverable during relayout and which are not; for instance, it is not clear at all to me whether there is any reliability benefit in taking the Drobo off-line during relayout.

Actually, I am close to thinking DRI made a design error: even the base Drobo should have 5 disk slots, with 1 always left free during normal operation. This slot would be used only for inserting a new disk during relayout. The disk being replaced would be removed ONLY after relayout is reported as successful, thus avoiding 1 week or more of non-protected data (during the most stressful activity period, at that).
For sure, you get the same benefit from DroboPro, but the price point is totally different and largely inadequate for individual users. The 4-disk Drobo was priced right where it should be, which made it a success, except that relayout failures are frightening and seem to be far from exceptional, considering the number of forum entries on the subject, although some had happier endings.

On the other hand, DRI could probably achieve the exact same benefit with different, more optimized relayout software, offering a “safer relayout” option that forces a lengthy relayout to proceed off-line, thus freezing the data on all original disks and making it possible to return to the original situation (by putting the removed disk back in).
That obviously would assume a checkpoint taken at the beginning of the relayout and no data files moved during relayout on the 3 untouched disks.
Alternatively, keeping Drobo on-line and forsaking data protection only for data newly written or modified during the relayout period (while keeping it for the data present at the beginning of the relayout and not modified during it) would be acceptable to me.

I truly believe DRI needs to make a major effort on its relayout software, since the conjunction of extremely lengthy relayout (2 weeks for 4 x 2TB upgrades !!) and non-protected data and non-recoverable errors during that lengthy period is frightening enough to destroy the confidence most of us had in Drobo data safety : if I have to go back to mirroring, the economic benefit of the Drobo (1/4 the cost of storage mirroring) becomes zilch.

As far as I understand, during relayout, the array is not fault-tolerant, because parity and data are being remapped and redistributed.

If he had swapped the failed drive back with the original drive before relayout started, then what you propose might have worked.

But once relayout starts, the data is being shuffled around to reallocate it across the available space. The array is in a re-initialize state, so parity is getting rewritten and redistributed - this means the original drive’s parity mapping no longer matches the new array’s mapping.

If I’m right, relayout is more than just “replace dead drive” in a typical RAID sense - it’s also online expansion. It’s the online expansion part that makes the process more lengthy and also keeps you from being able to just pop the old drive back in once relayout has started.
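
To illustrate the mapping mismatch, here is a small sketch. It assumes simple XOR parity and a made-up stripe whose blocks get rewritten during relayout; it is not Drobo's real layout, and the block contents and the `stripe_consistent` helper are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def stripe_consistent(blocks):
    """A single-parity stripe is consistent when all its blocks XOR to zero."""
    return xor_blocks(blocks) == bytes(len(blocks[0]))

# Before relayout: drives 0-2 hold data, drive 3 (the one later pulled) holds parity.
old = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
old[3] = xor_blocks([old[0], old[1], old[2]])
pulled_block = old[3]

# During relayout (assumed behaviour): the same stripe on drives 0-2 is
# rewritten for the new layout, and fresh parity goes to the new drive.
new = {0: b"AAAA", 1: b"CCCC", 2: b"DDDD"}
new[3] = xor_blocks([new[0], new[1], new[2]])

before = stripe_consistent(list(old.values()))
after_relayout = stripe_consistent(list(new.values()))
# Swapping the old drive back in pairs stale parity with rewritten data.
after_swap_back = stripe_consistent([new[0], new[1], new[2], pulled_block])

print(before, after_relayout, after_swap_back)
```

The old drive is still internally fine, but its blocks describe a stripe that no longer exists on the other three drives, so it cannot be used to rebuild anything that was already rewritten.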

If relayout had a mode where it essentially says “treat the replacement drive just like the old one”, then what you describe would work all the time. But your volume would never be able to grow in size, because even if you replaced a 500 GB drive with a 1 TB drive, it’d treat the 1 TB drive as a 500 GB drive and that extra 500 GB would never get accessed or used.
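
A rough sketch of that trade-off, using the common rule of thumb that usable space with single-drive redundancy is about the total of all drives minus the largest one. The rule is an approximation and the drive pack below is hypothetical, so the numbers are only for illustration.

```python
def usable_gb(drives_gb):
    """Rule-of-thumb usable space: total capacity minus the largest drive."""
    return sum(drives_gb) - max(drives_gb)

# Hypothetical pack: one 500 GB drive and three 1 TB drives.
pack = [500, 1000, 1000, 1000]
before = usable_gb(pack)

# Normal relayout: the 500 GB drive is replaced by a 1 TB drive and fully used.
replacement = 1000
expanded = usable_gb([replacement] + pack[1:])

# "Treat it like the old one" mode: the new drive is clamped to the old size.
clamped = usable_gb([min(replacement, pack[0])] + pack[1:])

print(before, expanded, clamped)
```

Under these assumptions the clamped mode ends up with exactly the capacity you started with, which is the price of being able to pop the old drive back in.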

“Safest relayout” in my mind is - unmount and disconnect Drobo from the host system. That’ll ensure there’s no extra data changes coming in.

Brandon

[quote=“bhiga, post:12, topic:646”]
As far as I understand, during relayout, the array is not fault-tolerant, because parity and data are being remapped and redistributed.[/quote]
You are probably right since it fits the symptoms. Too bad.
Nevertheless, I continue to think that DRI should make a major effort to offer, through enhanced software, safer checkpoints during relayout, since this overly long period of vulnerability destroys for me most of the benefits of a supposedly “fail-safe” Drobo.

This problem plagues any RAID-5 type system. Making it faster would require a faster processor (if that’s the bottleneck) or more/faster drives. My old 2 TB array took anywhere between 12 and 24 hours to reinitialize after replacing a failed drive, and since it was a RAID card and not a standalone box, the host system had to be powered on.

The only thing I can think of would be to have the notion of a “hot spare” on the Drobo, but on a 4-port unit, that’d reduce your usable storage to a max of 2 x largest drive size.

It’d be a reasonable option for a DroboPro though, which already has dual-drive redundancy, and that is better than single-drive redundancy + hot spare anyway.
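
The arithmetic behind that, as a quick sketch. It assumes the usual rule of thumb (usable space is roughly the total minus the largest drive) and a hypothetical hot-spare mode that keeps the largest drive idle; no such mode exists on the 4-bay Drobo as far as this thread goes.

```python
def usable_gb(drives_gb):
    """Rule-of-thumb usable space: total capacity minus the largest drive."""
    return sum(drives_gb) - max(drives_gb)

def usable_with_hot_spare(drives_gb):
    """Hypothetical mode: hold the largest drive back as an idle spare."""
    active = sorted(drives_gb)[:-1]  # everything except the largest drive
    return usable_gb(active)

four_bay = [2000, 2000, 2000, 2000]  # four 2 TB drives, sizes in GB
normal = usable_gb(four_bay)
with_spare = usable_with_hot_spare(four_bay)

print(normal, with_spare, with_spare == 2 * max(four_bay))
```

With four equal drives, one goes to the spare and one drive's worth goes to redundancy, leaving two drives of usable space.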