Drive dying?

I’m concerned that a drive in my 2nd Gen Drobo is dying, but I can’t tell for sure.

The basic gist is that while streaming music from the Drobo a spontaneous Data Protection rebuild started. For the first 10-15 minutes I couldn’t use Finder or terminal commands on the Drobo.

I popped out for a bit, and when I came back and went to do something on the Drobo I had a brief heart attack as:

  1. The first disk disappeared from the dashboard bay (empty + no light)
  2. All the remaining Drobo bay lights went solid green (for 10-15 seconds)
  3. The first bay reappeared but with a solid red light
  4. Then all lights started the yellow / green blinking again
  5. Data Protection restarted. (Time Remaining has gone down from 340 hours to 137 hours in about 10 minutes)

Do you think the first drive is on the way out? If it is, I’d really rather replace it now with a new (tested) drive than wait for the Data Protection rebuild to finish, which is now estimated at 161 hours.

(The time has gone up a little as I am just checking how much hasn’t been backed up since the last time.)

2nd Gen Drobo, Firewire connection to Mac running OS X 10.9.2
With 2x2TB and 2x3TB discs, 83% full.

Sounds like your first drive “dropped off” the disk pack for a while.
This can be caused by a loose connection, a drive that is going bad, or an issue with the backplane itself.

Gently but firmly push all your drives in. Just because it latches doesn’t mean the drive is well seated. Vibration can cause the drive to work loose over time.

You can try sending a diagnostic log to Drobo Support to see if they can tell you if there’s a problem with any of the drives, but they may tell you the 2nd Gen units are out of warranty. Depends.

If you can shut down Drobo for a while, shut it down, pull that drive out, connect it to a computer and run the manufacturer’s diagnostic.

If it happens again, I would replace the drive just for my own peace of mind.

As for your symptoms…

  1. This is when the drive in the first bay “dropped off” or timed out.
  2. At this point Drobo is in a degraded state - all data is available, but there is no data protection (fault tolerance). Drobo will try to reduce the size of protected storage and re-establish data protection.
  3. Drobo successfully “downsized” your protected storage, but now you are approaching capacity. Red light means “add a drive here very soon”
  4. Now Drobo recognized that there was a drive in the first bay and started to re-establish data protection
  5. Data Protection in progress

I think because the drive was gone long enough that the “downsize” happened, Drobo couldn’t just pretend the removal never happened.

Cheers for the reply Brandon. The Drobo decided yesterday that the disk was faulty and flagged it with a flashing red light.

I’ve now begun a test using WD’s software to see what it thinks. So far the SMART Status and Short test have come back OK. I’ve started the extended test and I’m waiting for its result. Sadly the disk is 3 months out of warranty. (Likewise my 2nd Gen Drobo: I let the warranty lapse a while back and had meant to renew before it expired. My hope is to buy a new Drobo later this year.)

Yeah, unfortunately SMART status usually has too small a “buffer” for the “working but needs replacement soon” timing.

I had a drive that had (in retrospect) been going bad for months, but SMART kept saying OK until it said BAD. I think I had only 2 or 3 successful boots after that.

My symptoms in a computer were complete system hang for a few minutes, then things would resume like nothing happened. After the drive died I realized the hang was probably from the system waiting on some I/O request while the drive was busy retrying whatever it was doing.

SMART said OK as long as the drive eventually responded within some threshold, but things kept getting worse until the drive stopped responding entirely, very shortly after SMART said it was BAD.
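If you want to catch that kind of stall before SMART flips to BAD, one rough approach is to time your own reads from a large file on the suspect drive and note any chunk that takes far longer than the rest. This is only a sketch: the function name, the 1 MB chunk size and the 2 second threshold are my own picks, not anything from Drobo or WD, and the OS cache can hide slow reads if the file was read recently.

```python
import time

def time_reads(path, chunk_size=1024 * 1024, slow_threshold=2.0):
    """Read a file chunk by chunk; return (offset, seconds) for each slow read.

    chunk_size and slow_threshold are arbitrary illustrative defaults.
    """
    slow = []
    offset = 0
    # buffering=0 avoids Python's own read-ahead buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.monotonic()
            data = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # end of file
            if elapsed >= slow_threshold:
                slow.append((offset, elapsed))
            offset += len(data)
    return slow
```

A healthy drive should return an empty list for almost any file. Repeated multi-second entries at the same offsets would match the "busy retrying" behaviour described above, though a busy system or a rebuild in progress can cause slow reads too.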

WDDiag used to report a code if any bad sectors had been reallocated, but nowadays it seems WDDiag simply reports pass/fail with a measure similar to SMART.
PASS = no errors, or all errors were recoverable
FAIL = unrecoverable errors occurred

Not so useful for pre-emptive replacement. Luckily WD also no longer seems to require the WDDiag results in order to pursue replacement.
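For a pre-emptive check that goes beyond pass/fail, you can look at the raw SMART counters yourself. With the drive pulled from the Drobo and attached directly to a computer, `smartctl -A` from smartmontools prints an attribute table. Below is a minimal sketch that parses that table and picks out a few counters commonly watched for failing disks. The sample text and its values are made up for illustration, the function names are mine, and which attributes a given drive reports varies by model.

```python
# Attributes often watched for early trouble; this list is a judgment call.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is RAW_VALUE.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs

def warning_signs(attrs):
    """Return the watched attributes whose raw count is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}

# Invented sample output, not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

print(warning_signs(parse_smart_attributes(SAMPLE)))
```

A non-zero count here does not prove the drive is about to die, but a reallocated or pending sector count that keeps climbing between checks is the kind of early hint the overall SMART status tends to hide.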