Drobo

DroboPro Critical Error, and FW 1.1.3 problems

I reported a couple of problems to Tech Support yesterday, one being an unexpected Critical error, and the other the failure of the DroboPro firmware 1.1.3 install, plus two feature requests. Jeff came back quickly with a very helpful reply, which I think is worth posting:

Answering your questions and feature requests below inline:

“1. At 1:28AM, DroboPro reported a Critical error, and started a rebuild. By 7:00AM there was about 2 hours to go – an unusually short amount of time. What caused it, which disk drive was involved, and if only one sector had to be remapped, why did it take 8 hours?”

The drive in slot 2/8 ( WD-WCASJ1573704 WDCWD10EACS-00ZJB0 ) had an issue at Sep 16 01:28:40 and did not respond, as if the drive had been removed. Drobo started a relayout, and then the drive came back at Sep 16 01:29:04. When it came back it was added as a new drive (the reason it took the longer relayout time).

You could leave the drive in, but if I was [sic] you, a drive not responding for 25 seconds is a pretty big issue and I would put it through an RMA process.

Also while you are at it, the drive in slot 6/8 is being watched closely by the Drobo and is on its way out also. ( WD-WCASJ0357545 WDCWD10EACS-00ZJB0 )

“2. I have not been able to manually install FW 1.1.3. Why? See my DroboSpace thread on the subject (Suite B).”

Read your post. Please try:

If you use an Apple Mac computer, the boot drive’s boot volume setting might be set to “case sensitive.” This setting prevents an update from seeing the correct case of the filenames in the firmware file directory.
Work around this problem by creating an additional directory called “updates” (with a lower-case ‘u’) in /Applications/Drobo Dashboard/
For example: /Applications/Drobo Dashboard/updates
Then try your update process again, and the process should complete.

Please check you have permissions to the Drobo (updates) folders

If the above is not the issue, please try a different computer to do the update. (For updates it does not matter if you use a PC or a Mac, as long as Drobo Dashboard is installed.)

“3. Feature request. Please update Drobo Dashboard to provide specific information about ANY failure, or stop using log file encryption so we can determine this for ourselves.”

As for the diagnostics, they contain proprietary information so making them readable is not an option. However I do think that having a Techie Panel (or having more detailed results show up when an option is turned on) would be nice. I’ll send this up to Product Development.

“4. Feature request. Please update Drobo Dashboard so that it knows the difference between a failure involving only one drive, when dual disk redundancy is turned on, vs. the more dangerous case where there are no spare drives available.”

I agree! This is in the works, no ETA when it will be finished / included in a future Drobo Dashboard.


Thanks, Jeff!!!

The two drives in question (WD 1TB Green) have never caused a problem (or at least Drobo has silently fixed them), and are about 18 months old. When I called back and talked to Bryce, he suggested that I download the Lifeguard diagnostic program for the WD drives and run it to see what it says. Since I don’t have a spare slot available on my Mac, I will have to use an external USB adapter, and hope that works. What will happen if the diagnostics say that everything is fine is TBD at this point.

I did a hot swap with the drive in question, guessing (correctly) that the numbering starts on the left. I would have shut down the DroboPro, but after I uploaded about 1.5GB using CrashPlan, only to decide that I didn’t like their proprietary, encrypted format, I started another synchronization job using ChronoSync, deleted the old files, and emptied the trash to make room. I’m just not sure what I did – I think I must have somehow invoked the Secure Delete function. Anyway, that process has been chugging away for the past three days, and I am now down to 63 items remaining, from an original 200+. I assume these are folders, with lots of photos.

I can’t seem to cancel that operation, and I’m afraid to try the Force Quit option. So I did a hot swap of the first drive in question with a new 2 TB drive. At first, Drobo Dashboard estimated 65 hours, then 127, and now it’s down to 22/25/28/25/30/17/24 hours remaining. When it finishes, I will remove the second bad drive, and replace it with a second 2 TB drive, then recycle those drives in something that is less critical, after running the WD diagnostics.

But this indicates the need for a vastly improved diagnostic readout capability. Not only did I have to go to Tech Support to find out what happened to the first drive, but if Jeff hadn’t been so conscientious, I would have never known that another drive was becoming problematic. Murphy’s law didn’t strike in this case, but it certainly could have, with a double drive failure. And after I did the hot swap, for a while Drive 1 lit up red, and everything else green, before returning to the alternating green/yellow sequence. Be still, my heart!

I’m not entirely sure what SMART statistics are available, or what they would say, but certainly any suspicions that the Drobo firmware has need to be reported to the users – unless Tech Support would like to camp out in my office!

Since the 1.1.3 firmware seems to apply to VM ESX primarily, I’m not in a rush to install it, so I will wait until everything settles down before fixing the other problem. However, the problem would appear to be twofold.

First, the directory is indeed called “Updates”, and not “updates,” and I didn’t create it; the Drobo installer did. And I have no idea how to see or change whether the boot volume is set to be case sensitive, or what other problems changing it might cause (or solve).
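For anyone else unsure whether a volume is case sensitive, here is a minimal sketch of one way to probe it without touching any settings. It is not anything Drobo or Apple supplies; the function name `is_case_sensitive` is made up for illustration. It creates a throwaway file in a folder you can write to, then asks whether the same name with the letter case swapped also resolves to a file.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if the filesystem holding `directory` treats 'A' and 'a' as different names."""
    # Create a uniquely named probe file; it is deleted when the block exits.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        swapped = os.path.join(folder, name.swapcase())
        # On a case-insensitive volume the swapped-case name finds the same file.
        return not os.path.exists(swapped)


if __name__ == "__main__":
    # Assumption: the home folder lives on the volume you want to check.
    home = os.path.expanduser("~")
    print("case sensitive:", is_case_sensitive(home))
```

If this prints `False`, then “Updates” and “updates” name the same folder on that volume, and the case-sensitivity workaround would presumably make no difference.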

Secondly, the permissions are set to give my Admin account Read/Write access, but not my limited user account, even though I installed the Drobo Dashboard in my limited user account (so that I wouldn’t have to log in as Admin just to get the iSCSI initiator to kick off). And I can’t run the Dashboard as Admin, because it wasn’t installed there – Catch-22!
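A quick way to confirm what the current account can actually do with those folders is sketched below. The helper name `describe_access` is hypothetical, and whether the Dashboard really needs write access to this folder during a firmware update is my assumption, not something Tech Support stated. Run it while logged in as the limited user.

```python
import os


def describe_access(path):
    """Report whether the current user can read, write, and enter `path`."""
    return {
        "exists": os.path.exists(path),
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "enter": os.access(path, os.X_OK),
    }


if __name__ == "__main__":
    # Folder location taken from the Tech Support reply; both spellings are checked.
    for folder in ("/Applications/Drobo Dashboard/Updates",
                   "/Applications/Drobo Dashboard/updates"):
        print(folder, describe_access(folder))
```

A result with `"write": False` for the limited account would match the permissions described above.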

This is yet another example of horrendously bad coding and QA practices, by programmers and testers who always assume that everyone runs in Administrator mode, just because they do. WAKE UP, DRI! At least some Mac users actually care about security, and not allowing botnets to run on their computers!

It is inexcusable to merely say, “Firmware update failed” without providing some clue or reason why. Not only does this greatly irritate the customer, but it causes increased Tech Support costs, and hurts their bottom line. The VP of Engineering ought to ream some new orifices in their programming team, and if he doesn’t understand the issue, then the CEO needs to look for a new VP.

Although the Drobo firmware occasionally suffers from less than Outstanding performance, in general it is pretty solid and reliable. The functionality of the Drobo Dashboard, however, is well below par, and ought to be re-architected and rewritten from scratch.

IMHO.

Suite B - This increase in troubleshooting/monitoring communicative-ness is something I have been asking for from them literally for years. Including internal temperature, individual drive temperature, fan speed, SMART errors (like increasing number of ATA faults or block relocations) and stuff like that. It is very VERY encouraging to have a corporate warm body responding to you that these items are actually on the queue to be dealt with.

I don’t know if you have a ReadyNAS, but mine (I have 5 different ReadyNASes) all give me all of this info, and I have used it 7 times in the past 3 years to replace drives on the way out before they failed.

This is one of the reasons I still recommend ReadyNASes over Drobos and DroboPros when people ask for my advice.

It’s like covering that damn activity light with the lid. Don’t do it. If there’s info available in there - ANY INFO - we want it!

… and I also must say that the case sensitivity item is REALLY good info for the Mac. I have my DroboPro packed right now because I’m moving. (So far it is not reliable enough to be a part of my “Critical” items that stay running until the final day) But when I am in the new place and it is once again running I will test this on my Mac. I still find that the DroboPro works better on a Mac than on Windows, so that would remove my last Mac-related problem - the inability to upgrade.

Thanks for the effort and the update - Very useful!

@SuiteB - some of your comments are dead on, others have a weak foundation.

Complaint about running with root privileges – this is valid. Shame on DRI. This has been an issue for years.

Comments about DroboPro failing drives vs WD’s test tool – this is a gray area. How does it benefit you to know N blocks seem to be bad? Do you know how many are on the drive? Can you really tell, now that Apple has moved from base 2 to base 10 for reporting capacity? Do you know enough about drive failures to know if N is good, or bad? What if it is N-2? See, there is no real way to make a meaningful determination.
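For reference, the base 2 vs. base 10 arithmetic is easy to work through. The sketch below assumes a nominal “1 TB” drive of exactly 10^12 bytes and 512-byte sectors; both figures are illustrative assumptions, not values read from the drives in this thread, and the function names are made up.

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes each)."""
    return n_bytes / 2**30


def nominal_sector_count(n_bytes, sector_bytes=512):
    """Number of whole sectors in `n_bytes`, assuming a fixed sector size."""
    return n_bytes // sector_bytes


if __name__ == "__main__":
    one_tb = 10**12  # "1 TB" as counted in base 10
    print(f"{bytes_to_gib(one_tb):.1f} GiB in base 2")   # 931.3 GiB in base 2
    print(f"{nominal_sector_count(one_tb):,} sectors")    # 1,953,125,000 sectors
```

So the same drive reads as 1000 GB in base 10 and about 931 GB in base 2, and holds on the order of two billion sectors under these assumptions.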

Comparing a drive that DroboPro failed with the WD tool is pointless. Can you tell us that they measure “failures” in the same way? The Drobo is watching drives as they perform hour-by-hour, day-by-day, week-by-week, etc. WD’s tool only runs for a short time. Look at the incentives: Drobo will fail a drive “early” to protect your data, while WD (or any other drive manufacturer) will fail a drive “late” in order to deflect a return. Which one of these would you prefer to trust your data to? Why?

Switcher, with all due respect, you are jumping the gun.

I didn’t say anything one way or the other about Drobo’s diagnostics vs. WD diagnostics, except to speculate about a possible finger-pointing game on WD’s part. I haven’t concluded anything at all yet as to the efficacy of Drobo’s tests vs WD. In fact, just today I started running the WD “exhaustive” test, which won’t be through until sometime after midnight, so I’ll see the results in the morning. And at least while I was in the office (where I was running the test, because it only runs on Windows), the drive only seemed to be spinning, and not seeking, so if there was an extended seek or other problem that Drobo caught, the WD diagnostics might or might not catch it. TBD.

In the meantime, I gave sufficient credence to Tech Support’s summary of the problem to go out and buy two new 2 TB drives, for $400+. I’ve swapped out one, and will swap out the other tonight.

So where have I been unfair, or premature in my judgement – gray area or not?

All I’m really asking is that the Drobo Dashboard report EVERYTHING it sees, whether it is a solid failure, or just a potential one lurking in the wings. Sooner or later, I will decide for myself whether they/it are merely crying wolf, or whether there is a serious problem. But at the moment, I might as well have a stick in one good eye, for all of the useful input I get.

I would certainly like to see greatly improved diagnostics incorporated within Drobo Dashboard. But it also might make sense to turn the light on a drive to yellow to indicate “caution” if excessive errors are occurring, rather than waiting for a complete failure.

Oh, no, not another “what are you going to do with that info” response. Don’t tell me we’re off in THOSE weeds again. Switcher, as soon as you begin a sentence with “How does it benefit you to know…” we might as well stop right there. It ALWAYS benefits us to know - anything - whatever it is. All info is useful. It doesn’t take rocket science to put it in some context and make a sensible decision based on it.

Let me give you a nice practical way to do this. And I’m sorry to keep harping on the ReadyNAS comparison in this forum, but this is a glaring gap where it’s needed. If there is a problem developing with a drive in my ReadyNAS, like the one Suite B was warned about by Drobo Support, my ReadyNAS will send me this exact alert:

Reallocated sector count has increased in the last day. Disk 5: Previous count: 102 Current count: 105 Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.

Isn’t that a nice, plain, easy to understand alert? And it is informative. And it doesn’t make it so I have to know about how many sectors are on my disk. It underscores the one simple idea - an error happened. We corrected it. But if this keeps happening, you should replace the disk. And most importantly, it does all of this before the disk fails. I’m informed, and the decision is mine to make, in an informed way.
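The logic behind an alert like that is not complicated. As a rough sketch only (this is not ReadyNAS or Drobo code, and the function name is made up), comparing yesterday’s reallocated sector count with today’s is enough to produce the same kind of message. How the counts are read from the drive is left out; the sketch assumes something else has already collected them.

```python
def reallocated_sector_alert(disk, previous, current):
    """Return a warning string if the reallocated sector count grew, else None."""
    if current <= previous:
        # No growth since the last check, so nothing to report.
        return None
    return (
        "Reallocated sector count has increased in the last day. "
        f"Disk {disk}: Previous count: {previous} Current count: {current}"
    )


if __name__ == "__main__":
    # Counts taken from the alert quoted above.
    print(reallocated_sector_alert(5, 102, 105))
    # A stable count produces no alert.
    print(reallocated_sector_alert(5, 105, 105))
```

The point is that the alert reports a trend, not an absolute number, so the reader does not need to know how many sectors the drive has.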

THAT is what I want my DroboPro to tell me. THAT is what I expect from a PRO device. THAT is apparently what the Drobo already knows internally, as evidenced by the warning given to Suite B by Support. If there is available information in there, I WANT IT. Don’t hold back. I will decide whether it makes sense or is useful. That’s what PRO is all about.

Amen, Corndog!

You will probably not be surprised to learn that after I ran the WD LifeGuard diagnostics for about 15 hours, the only output I received was that the drive “Passed.” So is the drive showing real problems, or is the Drobo firmware a hypochondriac? Who knows, but I’m inclined to believe that something really happened. Whether it will happen again, God only knows.

As far as I can tell, the WD test was only reading (and perhaps writing) every sector, with no apparent attempt to do seeks or any other kind of heavy-duty functional test that might have caused the device to go offline during a heavy load. It didn’t even inform me whether the tests were destructive or not. So brickbats and boos to Western Digital, as well. If I try to RMA the drive, I fully expect a finger-pointing contest.

(This reminds me of the time when I was working for IBM in Huntsville, supporting NASA in the mid-1960s, and we were having a problem with some of the 1311 disk drives (they were about 14" across, 8" high, and held about 130 MB!). We called the Field Engineers, who came in and assured us that according to their tests, everything was perfect. I told them, “Great! Now if you can just get your test programs to run the Launch Vehicle Digital Computer simulation program, everything will be hunky-dory!”) You can test all you want, but real-life performance is all that counts, eventually. And for that you need DATA, in order to do a reasonable assessment.