Anyone have experience on how quickly the Drobo 5D can recover from a failed/replaced drive?
Had a 5D with
D0 = 5TB
D1 = 2TB
D2 = 2TB
D3 = 5TB
D4 = 2TB
mSATA = 128G
Disk 0 reported a failure and went to a RED light, then the unit went into replication. After the lights on D1-D4 were green, I removed the 5TB drive and turned off the unit and computer.
When the new drive arrived, I turned on the unit and drives 1-4 went from blinking orange to green. I powered on the computer, and the dashboard indicated it was 'replicating' data, even though the unit was already green before power down.
When lights turned green (minutes later), I inserted new 6TB drive into slot 0.
12 hours later the unit is still replicating, showing another 12546 hours remaining. There is a total of 5.57TB of data on the array. This morning, concerned the array was confused, I powered down the unit via the dashboard and re-powered it. It is now showing 28 hours to replicate.
I cleared unneeded data off the drive, and it now has only 3.26TB of data, but it is still showing 24 hours (this is about 4 hours after the 28-hour estimate above)…
I opened a ticket on this as the first 1000+ hour estimate is concerning.
The estimates are notoriously inaccurate… they are based on the current rate of recovery, and if you are using your Drobo, recovery can virtually stop (as priority is given to serving other data requests).
24-48 hours would be typical with no usage.
It could be up to 96 hours if you're using it while it is rebuilding.
That's quite reasonable if you think about what it has to do… it needs to READ all the data off your other drives (which would be multiple TB)… then it has to perform calculations on that data to regenerate what went missing… then it has to write the reconstructed data back to the new drive (or to free space on your remaining drives)… either way, that's a lot of reading, a lot of writing, and a lot of calculating.
Earlier Drobos managed about 1TB/day as a VERY rough rule of thumb; newer ones are faster.
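To see why a rate-based ETA can swing from thousands of hours down to a day or two, here is a back-of-the-envelope sketch. The throughput figures are assumptions for illustration, not measured Drobo numbers:

```python
def rebuild_eta_hours(data_tb: float, rate_mb_s: float) -> float:
    """Estimate rebuild time for data_tb terabytes of protected data
    at an effective end-to-end rate of rate_mb_s MB/s."""
    total_mb = data_tb * 1_000_000  # 1 TB ~= 1,000,000 MB (decimal)
    return total_mb / rate_mb_s / 3600  # seconds -> hours

# The ~1TB/day rule of thumb corresponds to roughly 12 MB/s effective:
print(round(rebuild_eta_hours(1.0, 12), 1))   # ≈ 23.1 hours per TB
print(round(rebuild_eta_hours(5.57, 12), 1))  # ≈ 128.9 hours (~5.4 days)
# But if host I/O throttles the rebuild to, say, 0.1 MB/s at the moment
# the estimate is sampled, the projected ETA explodes:
print(round(rebuild_eta_hours(5.57, 0.1)))    # ≈ 15472 hours
```

The last line shows how a momentarily throttled rebuild rate can produce a five-digit hour estimate like the one reported above, even when the realistic total is a few days.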
Thank you, I GREATLY appreciate the insight. IMHO, I do not think it is 'reasonable', because before I inserted the new 6TB disk the Drobo was green, so it did not need to figure out what was missing, as nothing was; it was stable with the 3x2TB drives and the 1x5TB drive. At the point I inserted the drive, this was just inserting a new drive to expand the array, unless I am missing something. It would hash the 4.53TB, or parity, or whatever, and distribute some portion onto the new drive. I am VERY surprised at the time it is taking.
No matter, it is what it is, and though I would like to RMA the TWO 5TB disks ASAP, this looks like it will be a week-long event. Did not read that in the brochure.
Another interesting item: as I mentioned, I reduced the data on the drive from 4.58TB down to 3.5TB. The dashboard showed this reduction, and I was hoping it would save time, maybe 20% or so?
I ejected the drive, thinking that it might speed things up even more.
The dashboard usage immediately jumped back up to 4.58TB ???
Well, I go to Disk Utility and remount the Drobo, and the dashboard immediately drops to 3.5TB; eject, and it is back to 4.57TB…
I have 2x6TB disks sitting on my desk to put into the array: one to replace the sister to the 5TB drive that failed, and one to replace a 2TB that is old. Seems that I will not be able to do this for DAYS…
I HOPE that the other 5TB drive does not fail before the replication completes.
I think there are probably several things happening at once here.
The most important one is that if you deleted a lot of data, yes, that would reduce the rebuild time. Drobo is filesystem-aware: if you delete a huge amount of data, especially a lot of small files, Drobo basically flags those blocks as "unused", but it then has an internal garbage-collection routine. So it's perfectly possible to delete a lot of data from the filesystem and still have it take quite some time (especially while degraded/rebuilding) for the blue lights/dashboard to reflect this, i.e. for those blocks to actually become available for use again. It's possible that the switching/rebooting/etc. is causing some of the garbage-collection routine to restart, hence the toggling of used space up and down.
Maybe they have updated it so that it uses the filesystem's used space as a guide now; perhaps having the drive ejected means the dashboard can no longer access the filesystem in the same way, so it defaulted back to the actual used space on the drive (which was still higher, as the garbage collection hadn't run).
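The lazy-reclamation behavior described above can be sketched as a toy model. This is purely conceptual and NOT Drobo's actual implementation; the class and numbers are made up to illustrate why the dashboard's used space can lag the filesystem's:

```python
# Toy model of lazy block reclamation (conceptual only, not Drobo code):
class ThinPool:
    def __init__(self, used_blocks: int):
        self.used = used_blocks      # what the dashboard reports
        self.pending_free = 0        # freed by the filesystem, not yet reclaimed

    def fs_delete(self, blocks: int):
        """Filesystem deletes files; the pool only flags blocks as unused."""
        self.pending_free += blocks

    def garbage_collect(self):
        """Background GC actually returns flagged blocks to the pool."""
        self.used -= self.pending_free
        self.pending_free = 0

pool = ThinPool(used_blocks=4580)   # ~4.58TB, in GB-sized "blocks"
pool.fs_delete(1080)                # filesystem now shows ~3.5TB used
print(pool.used)                    # 4580 -- dashboard still shows old usage
pool.garbage_collect()
print(pool.used)                    # 3500 -- drops only after GC runs
```

If a reboot or rebuild interrupts and restarts the GC step, the pool's reported usage would stay at the higher figure, matching the toggling described in the thread.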
The Drobo is now stable, and I learned a lot. The key is that the unit kept rebooting (I think there is still an issue there), and I would get frustrated and 'shutdown' via the dashboard. Both of these activities reset the replication time clock.
According to Tech Support, when the 5TB drive failed, the ‘free space’ dropped so low that the replication was taking a long time.
They are looking (I hope) into why it self-rebooted at least three times over the 48 hours it took to become stable.
Also, by luck or because I 'ejected' the disk, the reboots stopped and replication completed.
I am now copying everything off of the drive array, as I want to replace a 5TB and a 2TB drive with two 6TB drives, and I do not want to be at risk of losing the data (which is a shame, as not being at risk is why I bought the Drobo).
Also, if I pull a 5TB and push in a 6TB, and I need to wait 36 hours PER TB, why would I not copy everything off and just erase the array and start fresh? That option is frustrating to me, as a 16TB array that needs 36 hours for each TB would take 24 days to restore stability, which is TOOOOO long since the array was only serving data at 24KB/sec while it was replicating.
Not the experience I was expecting from a $500 drive.
I hope Tech Support finds what is wrong and my current experience is not normal, but people are telling me to expect 24-36 hours per TB of replication… WOW…