Drobo

Dual Disk Redundancy

I’ve had my DroboPro for a few months now and have not had the need to replace a drive yet. Recently I noticed that one of my drives seemed to be vibrating more than the others, so I removed it. Drobo Dashboard instantly came up telling me not to remove another drive, and that my data was not protected. The DroboPro was set up from the get-go with dual disk redundancy turned on. I have 3x1TB, 4x500GB, and 1x250GB. I removed a 500GB and the message came up. I figured maybe it was too big a drive to remove, so I put it back in, waited for all the lights to go green, and then removed the 250GB; again the Dashboard stated that my data was not protected. I thought that when one drive failed/was removed, my data should still be protected against another drive failure. Is this normal, or is there some setting I missed? The “server” I’m using is a Mac mini with 10.5.8.

I have seen the same. It seems that the failure of one of the two redundancy drives (so to speak) gives the same message as the failure of the only redundancy drive - namely, many red lights blinking.

That’s a great question - hopefully Drobo will reply with some official answer… I sweated out the two hours mine was rebuilding…

When DroboPro is in relayout it is in a compromised state, as data is moving back and forth between drives. During this state we do not want you to remove/add drives or disconnect power from Drobo. We put this message in the Dashboard as an extra precaution to warn you against doing so. If dual disk redundancy is enabled, DroboPro is in relayout, and a drive should fail - the answer is yes, your data will be protected.

Just to make sure: when I first turned on dual disk redundancy (when I first “upgraded” my Drobo v2 drives to the DroboPro), there was a thunderstorm during the lengthy relayout and the power went out for 6 hours. The DroboPro was on a UPS, but it eventually ran out and I was forced to turn off the DroboPro during the relayout. My question is, could I have harmed the DroboPro, or more importantly, my data, that way? Thank you.

I think some additional info in the Dashboard would be helpful here, but I’m also puzzled by the functionality.

I have 7 bays occupied, with dual redundancy on.
5x1TB, 2x320GB.

I’m in the process of replacing the two 320s with 1.5TBs - not because I need the room yet, but because I have another need for the 320s.

In the existing configuration, I had 1.29TB used, 1.7TB free.

Pulling the first 320 generated the same results the original poster described. I held off inserting the 1.5 because I was curious.

The relayout ETA started at 11 hours and settled down to about 2 hours.
My issues:
[list]
[*]The interface, IMHO, should grant the end user the comfort of confirming that dual redundancy is in place and that, should a drive fail during this process, data is still protected. Currently it doesn’t seem to be conveying the truth with “Drobo cannot currently protect your data against hard drive failures”. That’s guaranteed to generate a number of thoughts in the mind of the end user, none of them benefiting post-sales satisfaction for Data Robotics.
[*]I would suggest that the color of the pop-ups should be yellow rather than red. This should not be indicating a dire risk state.
[*]It has always perplexed me that the relayout process takes so long when removing the smallest drive from a more-than-half-empty device. Searching here didn’t shed any light on what’s actually happening during a relayout - has it been posted?
[*]What exactly is the performance penalty on read/write operations that we live with for data protection, if that protection doesn’t appear to kick in until a drive fails? Why is the data not already fully protected, as with a more traditional RAID (which also takes a performance hit when doing parity and striping operations)? If Drobo isn’t doing all of the protection as it goes along, why the performance hit, instead of uncompromised performance because protection overhead is deferred until a drive-failure event? (I’m sure I must be missing an important concept, or there’s an elephant in the room.)
[/list]

After half an hour of the ETA sitting at 2 hours, I inserted the new 1.5TB in place of the removed 320GB. The ETA went up to 5 hours, then settled down to 3. I’m not sure how I’ve increased the workload; it still has, at most, 320GB of data to relay out.

As another interface artifact, after inserting the new drive, the 8th (empty) slot flashed red for almost a minute and instructed me to add a drive - which created yet another “what the…” moment.

Just some feedback: for a device meant to simplify the expansion/protection experience and give some warm fuzzies that data remains protected, the expansion process leaves me with many more moments of discomfort and hesitation than when I’ve done the same process on an Infrant ReadyNAS.

That stems from one part functionality but two parts UI/Feedback design.
</2 cents>

While the message of “Your data is not protected” is a bit disconcerting, I can understand the strong discouragement against removing another drive, because at that point the data really is unprotected, and it’s going to take a LONG time to achieve any level of redundancy because both redundancy datasets would need to be recalculated.

Removing/upgrading a single drive with DDR only requires recalculation of one set of data. Time and security-wise, it’s best to pull only one drive at a time.

Unfortunately, lots of users would just say “Well, I have dual redundancy, so I’m going to pull two drives at once!”
Then they pull two drives, sh…tuff happens during the extended relayout and it all ends in lots of tears and screaming.

Perhaps a message of “Do not remove another drive” without the “Your data is not protected” may work?

As for relayout… Here’s a good primer, though BeyondRAID is a little more complicated, as it supports mixed drive sizes and so also needs to do more data-balancing.

The “extended” times you’re seeing come from the fact that you replaced a smaller drive with a larger one. The relayout also needs to redistribute the data in order to give you more usable space. You’re not only rebuilding redundancy, you’re also building redundancy for the added space - and the second layer of redundancy for DDR needs to be built for that too.

I’m pretty sure you’d see a much faster relayout time if you were simply replacing a dead drive with another drive of the same size.
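If it helps, here’s a back-of-the-envelope sketch of the usual rule of thumb for usable space (total raw capacity minus the largest drive for single redundancy, or the two largest for dual). This is my own rough model, not BeyondRAID’s actual accounting, and Dashboard reports binary TiB after formatting overhead, so the figures won’t match it exactly:

[code]
# Rough rule of thumb for BeyondRAID-style usable space: total raw
# capacity minus the largest drive (single redundancy) or the two
# largest drives (dual redundancy). A sketch, not the real algorithm.
def usable_tb(drives_tb, dual=False):
    reserved = sorted(drives_tb, reverse=True)[:2 if dual else 1]
    return sum(drives_tb) - sum(reserved)

before = [1.0] * 5 + [0.32] * 2     # 5x1TB + 2x320GB
after  = [1.0] * 5 + [1.5, 0.32]    # one 320GB swapped for a 1.5TB

print(usable_tb(before, dual=True))  # ~3.64 TB raw usable
print(usable_tb(after,  dual=True))  # ~4.32 TB raw usable
[/code]

That jump of roughly 0.7TB of new usable space is territory the relayout has to lay both redundancy layers over, on top of restoring what the pulled 320 held.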

Brandon

I’ve commented before (about a year ago maybe, when I had just got my Pro and was experimenting) that the message needs to be clearer, since our data IS still redundantly protected… something like:

Your data is now protected by only single redundancy; please do not remove any more drives.

At least some kind of acknowledgement that we had dual disk redundancy, and that even though a single disk has failed, we still have an additional level of redundancy.

as for:

What exactly is the performance penalty on read/write operations that we live with for data protection, if that protection doesn’t appear to kick in until a drive fails? Why is the data not already fully protected, as with a more traditional RAID (which also takes a performance hit when doing parity and striping operations)? If Drobo isn’t doing all of the protection as it goes along, why the performance hit, instead of uncompromised performance because protection overhead is deferred until a drive-failure event? (I’m sure I must be missing an important concept, or there’s an elephant in the room.)

I think you have kind of missed the point. Drobo is doing the data protection as it goes along; it calculates and stores all the parity as you write to it. Your data is already fully protected, as in a “traditional” RAID. It has to be - you can’t protect the data after it’s failed; you can’t protect what you don’t have.

I’m not sure I fully understand what you think is happening with “protection kicks in after the drive fails”. Once a drive fails, Drobo begins the process of rebuilding/replacing the missing data (assuming it has space to do so, obviously).
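To make the “parity is calculated as you write” point concrete, the single-parity case is just XOR arithmetic. This is a generic RAID-style illustration, not Drobo’s actual on-disk format:

[code]
# Single parity is the XOR of the data blocks, computed at write time.
# Lose any one block and XOR-ing everything that's left reproduces it.
d1, d2, d3 = 0b1010, 0b0110, 0b1100
parity = d1 ^ d2 ^ d3          # stored alongside the data, up front

# say d2's drive dies: rebuild it from the survivors plus parity
rebuilt = d1 ^ d3 ^ parity
assert rebuilt == d2
[/code]

The protection already exists on disk before anything fails; what a rebuild/relayout does is recompute the missing pieces onto the remaining (or replacement) drives.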

I was wondering about that, too. Protection is created in advance; it just takes some time to recover the protected state. But maybe he meant the following:

In principle it would be possible to provide an extra layer of redundancy by using empty space on the RAID’s drives (assuming empty space is available) and storing data and parity information two or more times on different drives. In this scenario, when one drive fails, all data and parity would be available on some other drive without relayout, just by updating some pointers.

This way, not only parity but also relayout information would be calculated and stored in advance, for example during drive idle time. When a drive fails, only a small amount of recently written data would need to go through relayout; most data would be available by just pointing to the spare copy of the data or parity.

When a drive is 75% full this would not work - unless, of course, you can predict which particular drive is going to fail next (not completely unpredictable!) and duplicate its data as a first priority. But with 50% free space, is there a reason this would not work?
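Purely as a toy sketch of that idea - nothing to do with how Drobo actually works internally, and every drive name and helper here is made up:

[code]
# Toy model of the proposal: while free space allows, keep a spare
# copy of each block on a drive that holds neither the block's data
# nor its parity. Losing one drive then becomes a pointer update
# instead of a recompute for most blocks.
DRIVES = {"A", "B", "C", "D"}
blocks = []   # each entry: {"data": drive, "parity": drive, "spare": drive?}

def write(data_drive, parity_drive, free_space=True):
    entry = {"data": data_drive, "parity": parity_drive}
    if free_space:  # the scheme only works while spare room exists
        entry["spare"] = min(DRIVES - {data_drive, parity_drive})
    blocks.append(entry)

def fail(drive):
    for entry in blocks:
        if entry.get("spare") == drive:
            del entry["spare"]               # only the extra copy is lost
        for role in ("data", "parity"):
            if entry.get(role) == drive:
                if "spare" in entry:
                    entry[role] = entry.pop("spare")  # pointer update
                else:
                    entry[role] = None       # needs a real rebuild/relayout

write("A", "B")
write("C", "D")
fail("A")
print(blocks)   # first block's data now points at its pre-built spare
[/code]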

Wouldn’t that just end up being roughly equivalent to RAID 1 with several disks? Or are we trotting off into the realm of RAID 7 (I just made that up - I have no idea if it exists and I’m too lazy to even Google it) with triple disk redundancy? Which is basically what you are saying, I think: conceptually, store even more data, and when a drive fails, we are already back to dual disk redundancy?

I would expect that on a nearly empty Drobo it probably does store data along the lines you are suggesting, with multiple mirrors, but who knows.

But he was saying “as with a more traditional RAID”, and to my knowledge there aren’t any systems which work as you propose.

And - following a drive failure - in your system there would not need to be any rebuilding as such, but there would still need to be a relayout to put data onto the replacement disk; it would just mean that your data was never at risk during this period.

Yeah, I think that’s what would happen too…

Unless I’m off-base: in DDR, once you lose the first drive, the redundancy layer repopulates the replacement drive with its data, which consists of both the stored (user) data and the SDR redundancy data.

So each additional level of drive redundancy has to protect all of its predecessors’ data. For the sake of argument, if your redundancy overhead is 10% and you have 2 TiB of data, single redundancy requires 200 GiB, but dual redundancy requires 220 GiB (10% of 2.2 TiB), and so on and so forth.
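In code, that for-the-sake-of-argument arithmetic looks like this (the 10% figure is just the example number above, not a real BeyondRAID ratio):

[code]
data_tib = 2.0
overhead = 0.10                          # assumed per-layer redundancy cost

single = overhead * data_tib             # 0.20 TiB (200 GiB)
dual   = overhead * (data_tib + single)  # 0.22 TiB (220 GiB):
                                         # protects the data AND layer one
[/code]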

Unless you have a huge number of disks in the array, at some point it might just be better to have hot spares.

That wasn’t my understanding; I thought it was another 10% (for example), but with the parity calculated in a different way.

Thanks for the memory jogs on expansion and relayout. Clarity restored.

Well sure. Removing dual drives removes the dual redundancy. Data isn’t lost, just the protection until the process is complete.

Pulling 2 drives with DDR enabled should produce the same warning as on a Drobo (1st/2nd gen), or a Pro/S/Elite when DDR is not enabled.

Pulling 1 drive shouldn’t. No data is at risk; it remains protected against a drive failure. The dialog pop-ups should be, at most, amber for caution.