Drobo

So apparently all my data is now "at risk"

Chris, if I understand his concern correctly, the DroboPro is effectively in an endless reboot. It might take, say, 60 hours to do the rebuild. Whatever drive (drives?) is causing the problem is failing more frequently. His rebuild will likely never complete. That is one problem, as I see it.

On top of that, DRI has told him he has two suspect drives, one worse than the other. However, at any given moment in time he does not know which drive is being rebuilt, especially since he has dual redundancy. For all he can know, given what Drobo tells him, two drives could be playing tag team.

Now, if he replaces a drive based on what DRI told him yesterday, or even a few hours ago, he might be pulling the wrong drive. For that matter there could be a third drive playing tag team. Hold this thought a second.

I have been very critical of, among other things, the lack of direct real time reporting from Drobos. You have continually responded that the solution is to submit logs and get tech support. Now, totally aside from the issue of what DRI’s tech support policy really is, this is a classic case of why “submitting logs” is not a proper solution for a box that provides inadequate user diagnostics. It is, at best, a frequently inadequate workaround.

To put it another way, I don’t see any clear practical way for him to determine, absolutely and positively (his life’s data depending on it), exactly which drive to replace, at any point in time, unless the Drobo itself (and not DRI support) is telling him what drive has failed, even if only intermittently as is the case here.

The only possible way for DRI to (definitively) help would be if he pulls a log, submits it to DRI, they immediately diagnose the log and tell him which drive is rebuilding. From the time he pulls the log, until the time he pulls the drive (per DRI’s instructions) he must watch the Drobo continuously, alert to a reboot. And no potty breaks! If the machine reboots in the interim then the cycle starts over with a fresh log submission.

It should be clear that the above is not very practical. And even if it were practical, it is also possible that two drives are rebuilding at once (having dual redundancy) and that would further complicate things.

Having been a software developer for much of the last 35 years, and also responsible for customer support for 20 years, I find the flaws in the current “system” crystal clear. They are just difficult to articulate without such a fine case study as we have here.

From his last post, he appears to have purchased 4 new drives in order to make an independent backup before he proceeds any further (if not for a permanent solution, I guess). If my understanding of the nature of the problem is anywhere near correct, I think he has made a wise and prudent decision.

I don’t know how well I explained rkuo’s specific concerns but I’ve done the best I can to explain what mine would be if I were in his shoes and I would be surprised if there isn’t much in common between our respective concerns.

I know from support that drive 5 is pretty bad and drive 7 is starting to develop a few bad blocks. They have instructed me to try to catch the Drobo between rebuilds so I can update the firmware and replace drive 5 with a 3TB drive.

I’m not going to elaborate here on what’s already been posted in this thread, but there are at least 5 or 6 things wrong with how the Drobo is handling this situation. About the only good thing I can say so far is that the DroboPro hasn’t annihilated my data (yet).

What should have happened is either the Drobo warns me that bad blocks are developing on drive 5 or marks it as bad, informs me that I now have only 1 drive worth of protection left, and instructs me on exactly which drive to replace. What happened was that the Drobo never warned me of a bad drive, I was forced to e-mail support to figure out what was dying, I was never informed correctly how much protection I have left and am instead getting uninformative warnings stating all my data is at risk, and the Drobo is now stuck in a constant rebuild loop against a drive that the Drobo has already told me is bad (which is perhaps the worst part of this situation). During a rebuild, I’m apparently not supposed to replace any drives so I’m incapable of resolving the problem directly like I should be able to.
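As a rough illustration of the kind of reporting described above, here is a minimal sketch in Python. Everything in it is hypothetical: the `DriveStatus` type, the `FAIL_THRESHOLD` value, the `protection_report` function and the bad-block counts are made up for illustration and say nothing about how the Drobo firmware or Dashboard actually works.

```python
from dataclasses import dataclass

# Hypothetical threshold: bad-block count at which a drive is treated as failed.
FAIL_THRESHOLD = 100


@dataclass
class DriveStatus:
    bay: int          # physical bay number
    bad_blocks: int   # bad blocks observed so far on this drive


def protection_report(drives, redundancy=2, fail_threshold=FAIL_THRESHOLD):
    """Return (remaining_protection, messages) for a list of DriveStatus."""
    failed = [d.bay for d in drives if d.bad_blocks >= fail_threshold]
    warning = [d.bay for d in drives if 0 < d.bad_blocks < fail_threshold]
    remaining = max(redundancy - len(failed), 0)

    messages = [f"Bay {bay}: drive failed, replace it" for bay in failed]
    messages += [f"Bay {bay}: bad blocks developing" for bay in warning]
    messages.append(f"Protection remaining: {remaining} drive(s)")
    return remaining, messages


# Example loosely modeled on this thread: 8 bays, bay 5 bad, bay 7 degrading.
# The bad-block counts are invented for illustration.
drives = [DriveStatus(bay=b, bad_blocks=0) for b in range(1, 9)]
drives[4].bad_blocks = 250   # bay 5
drives[6].bad_blocks = 3     # bay 7

remaining, messages = protection_report(drives)
for line in messages:
    print(line)
```

The point of the sketch is only that the three pieces of information (which drive failed, which drive is degrading, how much protection is left) can be stated separately instead of as one generic "at risk" warning.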

My plan is to offload all the data and then just force through the firmware update and drive replacements even while the rebuild is taking place and hope the dual redundancy is working.

Not acceptable.

One of the peculiarities of the Drobo is that some people complain that drives that are perfectly fine (according to drive maker utilities and standard drive test utils) are marked bad (no rebuild attempts, just a single red “replace me” light). Others, such as yourself, report repetitive rebuild loops because apparently the disk is so bad the rebuilds get corrupted or cannot complete, yet the Drobo refuses to mark the drive bad, set a red light and wait for a replacement. It seems to suffer from both extremes of the spectrum.

Did DRI have any recommendations to “break” the rebuild loop cycle? In your case that is the major issue; the others are inconveniences (you know I agree with you on those, but they are surmountable). To put it another way, even if you had all the reporting you wanted it wouldn’t help you catch an interval between relayouts. As I interpret your posts, the Drobo actually never “told you” a drive is bad. It knows because it is doing a rebuild but it never informed you (DRI tech support did).

Just out of curiosity, do you know for a fact that the rebuilds completed before restarting? Your post above suggests that but not enough time seems to have transpired for 4 complete rebuilds unless they took far less than the 50-60 hours we would expect for a full rebuild. Since you were not there when they completed how do you know they actually did? Email alerts?

I know because the progress bar reset to the beginning and a new estimate kicked in. Also because the last time this occurred there appeared to be a window of a few minutes where the drobo e-mailed me that it had found a bad drive (and then subsequently started a rebuild with it!!!).

The original rebuild took about 3 days. Subsequent rebuilds have been on the order of 26 hours … I assume because a lot of the initial housekeeping completed. I’ve been keeping the array idle to help it go as fast as possible, but at this point I’ve given up and am simply offloading all the data as fast as I can.

It is possible that none of the relayouts completed. The 26 hour relayouts may have just gotten 1/3 of the way through before the relayout was restarted.

I am basing this on the fact that you are not offering any solid evidence that any of the relayouts completed. The progress bar would reset in the event of a restart of a partially completed relayout. The Drobo might kick out a “bad disk” error in the middle of a relayout. Neither is conclusive by any means. I think you would need to either see green lights or get an email confirming completion of the relayout, which may not even be issued (I don’t know, never having had a relayout when email was enabled).

Just something to think about.
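For what it’s worth, the rough arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope calculation that assumes rebuild progress is linear in time, which may well not be true of a Drobo; the function name `fraction_completed` is made up for illustration. The hour figures are the ones mentioned in this thread (26 hours for the later rebuilds, 50 to 60 hours expected, about 3 days for the first one).

```python
def fraction_completed(elapsed_hours, full_rebuild_hours):
    """Estimate the fraction of a rebuild done, assuming linear progress."""
    if full_rebuild_hours <= 0:
        raise ValueError("full_rebuild_hours must be positive")
    return min(elapsed_hours / full_rebuild_hours, 1.0)


# 26 hours is the observed length of the later rebuilds; 50, 60 and 72 hours
# are the full-rebuild figures mentioned in the thread (72 = "about 3 days").
estimates = {full: fraction_completed(26, full) for full in (50, 60, 72)}
for full, frac in estimates.items():
    print(f"full rebuild {full} h -> {frac:.0%} complete at 26 h")
```

Under that assumption a 26 hour run covers somewhere between roughly a third and roughly half of a full rebuild, depending on which figure you take as the full duration. That is consistent with the possibility raised above, but it proves nothing either way.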

If this is true, then at the very least they all reset very close to the end of the relayout process, based on the progress bar and time estimates. I have no way of verifying for sure tho, having not been present for the final pieces. I certainly did not get any e-mail notifications saying the relayout process is complete.

6th rebuild now. The Drobo is not rebooting itself, it’s just rebuilding constantly. I know this because I’m currently copying data off the Drobo in an effort to get all my data off the accursed thing and the copies have not been interrupted.

I did manage to see the Drobo start its rebuild this time … about a 5 minute window between watching it and coming back and seeing the damn thing kick off again before I could do anything interesting.

I have about 4TB left on the Drobo so in a day or two I’ll have all the data off and be able to just force remove a drive/update the firmware, etc … and see what happens.

Got all my data off. Rather than reset the unit completely like tech support suggested, I decided to update the firmware while the unit was rebuilding, and then reset it.

The firmware update worked and my remaining data on the drobo is intact.

However, the unit is behaving weirdly because it is asking me to replace a drive to add more capacity, even though the array is only 40% full due to all the data I offloaded. I then proceeded to live remove the bad drives (5 and 7). My data continued to be accessible (indicating that the dual redundancy actually was working), but the Drobo did not initiate a relayout of any kind, which it should have as there was sufficient space on the remaining 6 drives to get back to that protection level.

I then inserted a 3TB drive into bay 5. It is recognized, but again, no relayout, and continues to claim my data is at risk.

I think something is weird with the current configuration, so I’m just going to reset the whole damn thing now.

It can take a bit of time for the “garbage collection” to happen on Drobo.

For example, I deleted an entire folder containing about 2.5 TB of files. The capacity gauge didn’t reflect the change until a few hours later.

I’m copying my data back. Now for the fun part. Here’s my last e-mail to tech support.

=========================
05/10/2011 06:30 PM
Hi … I completely reset the Drobo and am copying files back now. So far it’s behaving OK, but to do so required me to completely reset the Drobo, as the array appears to have been in a bad state.

So thanks for the tech support … or, well, sort of, since I certainly am capable of copying all my data off and resetting the Drobo without contacting tech support. But you did the best you could.

Now I’m going to ask the hard question.

I spent about 1500 dollars on the DroboPro to protect my data with the explicit promise that your device would handle drive failures and allow me to swap drives in and out flexibly. Now a drive fails and it doesn’t do any of that, but instead goes into total rebuild loop from which there is no escape, doesn’t allow me to decommission a drive, doesn’t report the drive failure, and doesn’t inform me correctly about any drive protection I have remaining.

In short, you sold me insurance for my data … only it didn’t work when the time came to deliver. I had to spend over 500 dollars on new drives to offload all my data. I’m lucky I didn’t lose everything.

I want to know how Drobo plans to respond to this situation. I want a real response and I don’t want generic apologies. You sold me a lemon of a product that is defective by design.

Escalate this to a VP if you have to. I’m waiting to see if you, Drobo, as a company, can step up and do the right thing, whatever that may be.

I think the problem comes down to the Dashboard not knowing the difference between dual disk failure and single disk failure. It always sees the issue as a dual disk failure and will report it to you as such.

But like many people have said, all you have to do is contact technical support and upload a diag file, they will tell you which one is at fault. Takes only a couple of seconds.

"But like many people have said, all you have to do is contact technical support and upload a diag file, they will tell you which one is at fault. Takes only a couple of seconds. "

I really don’t think this is good enough. Why on earth can’t they just sort the desktop software out so that it tells you? The answer is that they obviously could, but choose not to. I guess that’s for business reasons (selling drobocare / deliberately making it difficult to support without their help).

But then again, there isn’t even a way to set the disk spin-down timer on my Drobo, which is the most basic of settings. So perhaps they are just no good at writing software.

[quote=“rkuo, post:30, topic:2571”]
I’m copying my data back. Now for the fun part. Here’s my last e-mail to tech support.[/quote]

And…? Any response from DRI?

I got escalated to level 3 tech support, who said they would try to make it right. I told them they could do any or all of three things:

  1. Take a point by point list of the problems I encountered to product management and get an explanation and commitment to fixing the major defects and feature omissions in a product supposedly designed to protect data.
  2. Upgrade me when available to a product that actually does those things.
  3. Reimburse me for the material cost of having to purchase extra storage to backup all my data.

Haven’t gotten any response yet … as you might expect.

I get emails all the time from DRI. Encouraging me to purchase a new unit!

One month with no response from Drobo. Maybe it’s time for a guest article on Techcrunch or the like?

Finally a response.

As you would expect:

  1. No acknowledgement that the product is defective or needs to be fixed.
  2. No timeline for any fixes.
  3. No reimbursement for the fact that the product fails at the only thing you buy it to do (protect against drive failures).
  4. Tentative offer of a refund, but will need to check with mgmt.

Any quiet 8 or more bay alternatives available?

Press them and you’ll get a refund. I’ve been there.
As for alternatives: get a NAS. QNAP or Synology.

I think QNAP tends to be slightly better built (metal trays on their lower-end models versus plastic), and you can downgrade the firmware if you don’t like the latest.

I’ve never owned a QNAP, though. I have had 3 Synologys, and I’ve always found their support to be first class; I’ve never had less than 100MB/sec on any of my 3 units.

Yes, QNAP seems to be better built, more metal and less plastic. I have 4 of them and one Synology. The downgrade option on the QNAP is very helpful if a beta has a problem, and they release firmware at shorter intervals. At this stage they are ahead of Synology! Synology has the SHR volume option to mix different-size disks.
Performance-wise they are the same, and the basic stuff is also the same.