Drobo

Drobo Errors or True Disk Errors ?

I recently suffered multiple dropouts when playing back videos stored on my Drobo V2 (2 x 2TB + 2 x 1TB, 1.3.5 FW, 1.6.1 Dashboard). During those dropouts, which lasted several seconds, the image on the Mac screen was frozen and the 4 Drobo LEDs on the right were blinking green<->red.
No message appeared on Drobo dashboard during those blinking periods.
No write activity was proceeding on the Drobo at the same time.

At the time of the dropouts, the Drobo log shows multiple entries like this one:

Furthermore, the cumulative “block error” count in the log is 58 (see the log excerpt below), concentrated solely on the 2 WD 2TB disks.

What I find extremely curious is that all those “block errors” are almost equally split between the 2 new WD20EADS 2TB drives I added 3 months ago, and with which I suffered multiple problems (including 2 power outages :frowning:) during their lengthy relayout.
If I have had 58 true hard disk errors on those WD 2TB disks, I must have been exceptionally unlucky and received a batch of faulty drives.
On the other hand, if the Drobo “beyond RAID” sub-structures somehow got corrupted during my problematic relayouts, that may explain why I got zero errors on the 2 drives which were never replaced and plenty of errors on the ones that went through relayout.

Does anybody know the exact nature of those “block errors”?
Is there any way to check (outside of the Drobo?) for true disk errors?
Should I return the 2 disks under warranty ? (assuming I can prove they are defective outside a Drobo…).

Any help appreciated…

PS: and yes, I went through the multiple posts where Drobo users were complaining that the Drobo was finding faulty drives which tested OK outside of the Drobo.
My problem is kind of the opposite, since the Drobo does not complain about any of my drives; it just takes a walk for several seconds every 10 minutes or so when reading from them.

[quote]

Error counts (historical included)

disk:

WD-WCAVY0213497 WDCWD20EADS-00R6B0 01.00A01
=> slot : 3
=> block : 28
=> block duplicate : 0
=> block rate (sec/b) : 85762 (list length = 5)
=> FULL timeout : 0
=> RECOVERED timeout : 0
=> DISK START ERROR timeout : 0
=> SHORT timeout : 0
=> SHORT timeout rate : 0/s
=> SHORT timeout total : 0
WD-WCAVY0256462 WDCWD20EADS-00R6B0 01.00A01
=> slot : 0
=> block : 30
=> block duplicate : 0
=> block rate (sec/b) : 350 (list length = 1)
=> FULL timeout : 0
=> RECOVERED timeout : 0
=> DISK START ERROR timeout : 0
=> SHORT timeout : 0
=> SHORT timeout rate : 0/s
=> SHORT timeout total : 0[/quote]
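For anyone who wants to tally the “block” counters per drive from a log excerpt like the one quoted above, here is a minimal Python sketch. It assumes the layout shown in the quote (a header line starting with the drive serial, followed by `=> name : value` lines); the function name `block_errors_per_drive` is made up for illustration and is not part of any Drobo tool.

```python
import re

# Shortened copy of the quoted log excerpt (only the lines needed here).
LOG_EXCERPT = """\
WD-WCAVY0213497 WDCWD20EADS-00R6B0 01.00A01
=> slot : 3
=> block : 28
=> block duplicate : 0
WD-WCAVY0256462 WDCWD20EADS-00R6B0 01.00A01
=> slot : 0
=> block : 30
=> block duplicate : 0
"""


def block_errors_per_drive(log_text):
    """Return {serial: block error count} for a log excerpt in the format above."""
    counts = {}
    serial = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("=>"):
            # Counter line of the form "=> name : value".
            match = re.match(r"=>\s*(.+?)\s*:\s*(\d+)", line)
            if match and serial is not None and match.group(1) == "block":
                counts[serial] = int(match.group(2))
        elif line:
            # Any other non-empty line is treated as a drive header;
            # its first token is assumed to be the serial number.
            serial = line.split()[0]
    return counts


counts = block_errors_per_drive(LOG_EXCERPT)
print(counts, "total:", sum(counts.values()))
```

On the two drives quoted above, this gives 28 and 30, which add up to the 58 mentioned in the first post.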

No suggestion from anybody ? Jennifer ??

What make/model drives?

As for finding “true” errors, it’s a bit fuzzy. Most of the manufacturer-supplied diagnostics only provide Pass/Fail information.
Pass has two states:
a. Drive has NO errors
b. Drive has corrected errors (spare sectors were used)

Fail has one state:
Drive has errors that cannot be corrected

The key would be to find a way to determine whether Pass is condition a or b.
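One possible way to tell condition a from condition b is to look at the drive's SMART counters with the drive connected directly to a computer. The sketch below is an assumption on my part, not something the vendor diagnostics do: it parses the attribute table that `smartctl -A` (from smartmontools) prints and looks at attributes 5, 197 and 198. The sample text and its values are made up for illustration, and the helper names `sector_counters` and `classify` are hypothetical.

```python
# Made-up sample in the column layout printed by `smartctl -A`.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# SMART attribute IDs related to remapped or unreadable sectors.
WATCHED = {5: "reallocated", 197: "pending", 198: "uncorrectable"}


def sector_counters(smart_text):
    """Extract the raw values of the watched attributes from the table."""
    result = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED and fields[9].isdigit():
                result[WATCHED[attr_id]] = int(fields[9])
    return result


def classify(counters):
    """Map the counters onto the Pass a / Pass b / Fail states described above."""
    if counters.get("pending", 0) or counters.get("uncorrectable", 0):
        return "fail: sectors that could not be corrected"
    if counters.get("reallocated", 0):
        return "pass (b): corrected errors, spare sectors were used"
    return "pass (a): no remapped sectors reported"


print(classify(sector_counters(SAMPLE_SMARTCTL_OUTPUT)))
```

This is only a rough mapping: SMART counters and a manufacturer's Pass/Fail verdict are not guaranteed to agree, so treat the result as a hint rather than proof.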

If TLER is disabled, it adds yet another layer of complexity as the drive will pause while correcting a recoverable error and in that pause time, Drobo may think the drive is simply encountering an error condition and will try to do its own remapping.

If I looked at the diags and saw those kinds of errors on those drives, I would suggest replacing those drives.
Those are on their way out. They haven’t had enough errors in a certain time frame to be marked as bad by the Drobo yet, but they are heading that way.

Thanks Jennifer, at least it is clear.
Sad to find that on 2 brand new (3 months) 2TB WD Caviar Green; the only explanation I can find is that they were part of a faulty batch, or that the Drobo does not like those (but this same exact drive+FW was certified OK for Drobo, right ?).
2 more questions :

  1. what tool should I use to justify their replacement under warranty with the seller ?
    Drobo log may look a bit cryptic to them…
  2. is there any integrity test you can perform on a stand-alone new drive before inserting it in a Drobo to prevent that? I did run the Drive Genius Integrity Test for more than 24h on each drive, but in hindsight it obviously was not very discriminating…

I worried about that and used the TLER Utility (WDTLER.exe) to set the TLER ON, but obviously it did not help…

  1. Whatever tools they provide on their website.
  2. Not that I am aware of.

http://support.datarobotics.com/app/answers/detail/a_id/137/kw/drive%20failed/r_id/100004

http://support.wdc.com/product/download.asp?groupid=608&lang=en

Personally I trust the DOS utility more than the Windows one because there is far less “stuff going on in the background” in DOS, but there have been rare occasions where the Windows one worked and the DOS one didn’t. YMMV.

How do you check which drives have been qualified?

Looks like WD don’t recommend their “end-user” drives in NAS & RAID.
http://www.drobospace.com/forums/showthread.php?tid=1120

Did you replace your drives? How are they behaving?

[quote=“skywalka, post:8, topic:780”]Looks like WD don’t recommend their “end-user” drives in NAS & RAID.
http://www.drobospace.com/forums/showthread.php?tid=1120

Did you replace your drives? How are they behaving?
[/quote]
I did replace my drives (new ones with firmware 32S2B0 instead of 00R6B0), and have not had a problem since.

I believe the WD recommendation about not using those drives in RAIDs is because their TLER option is turned off by default. I had unsatisfactory exchanges with Data Robotics support on that subject, and they basically argued that WD’s warning against those disks in RAID was not pertinent because “Drobo is not a RAID”.
To me that seems highly silly, since the TLER argument is about the timeouts involved in error detection by intermediate disk controllers, like RAIDs… or Drobos, and I thus do not see why it would not apply there too.
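The timeout argument can be put into a toy model. This is only a sketch of the reasoning, not how a Drobo or any RAID controller is actually implemented: the function name and all the numbers (a 30 s internal recovery, an 8 s controller timeout, a 7 s TLER limit) are assumptions chosen for illustration.

```python
def controller_sees_timeout(recovery_seconds, controller_timeout_seconds,
                            tler_limit_seconds=None):
    """Return True if the drive stays silent longer than the controller waits.

    With TLER enabled, the drive gives up its own recovery after
    tler_limit_seconds and reports the error, so it is never silent
    for longer than that limit.
    """
    if tler_limit_seconds is not None:
        silent_for = min(recovery_seconds, tler_limit_seconds)
    else:
        silent_for = recovery_seconds
    return silent_for > controller_timeout_seconds


# Hypothetical numbers: the drive needs 30 s to recover a bad sector,
# and the controller gives up after 8 s.
print(controller_sees_timeout(30, 8))                        # TLER off
print(controller_sees_timeout(30, 8, tler_limit_seconds=7))  # TLER on
```

In this model the TLER-off drive looks unresponsive to the controller, while the TLER-on drive reports its error in time and leaves the recovery to whatever redundancy sits above it. Nothing in that logic depends on whether the box above the drive calls itself a RAID.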

Then do Data Robotics say it’s OK to use the RAID models in their non-RAID Drobo?

You can’t change the setting on the new drives so is TLER irrelevant, preferable or discouraged?

[quote=“skywalka, post:10, topic:780”]Then do Data Robotics say it’s OK to use the RAID models in their non-RAID Drobo?

You can’t change the setting on the new drives so is TLER irrelevant, preferable or discouraged?[/quote]

Sure, you can use the “Enterprise” version in Drobo; only caveat is that they are about 50% more expensive and harder to find.
There is a WD utility which is supposed to allow setting of TLER. I assume it still works on the new models with the latest firmware.