my 1st gen1 hard drive failure

I recently had a drive fail in my gen1 (first time a drive failed actually)…
but so far, after a few rebuilds, i have a confirmed drive failure but still have all my data :slight_smile:

as i was away last week i need to catch up on some posts, but i will post back afterwards with more details, as i took notes on the steps as things progressed (and it’s still in progress)

Hi, i finally got a chance to type up my running log notes about the hard drive which failed :slight_smile:

High-level status of gen1 drobo slots, in SDR mode:
1.5TB
1.5TB (flashing red currently, and i need to order a new replacement)
1TB
1TB

Data = backed up
Data = still accessible as normal.
A drive needs to be replaced.

Running log notes (including some thoughts i had along the way, in case they’re useful):

day zero:
all was ok for a long time, though accessing, copying or verifying data was very slow for a few hours. (maybe it was windows, or maybe i had reached a critical stage of being 90% full, but i wasn’t sure, as i had been at around 90% for quite a while)

day 1:
all ok until computer hang/crash
(all ok meaning the computer and drobo were working fine, with the usual 3 of 4 drives green and 1 showing red, as i’m at 90% full and haven’t been able to upgrade a drive yet)

= shut down the computer and rebooted, used it as normal for a while, and then ran a chkdsk /x on the 1st (larger) volume. “ah-haaa, i can save time” i thought, as i left it running overnight, since chkdsk usually takes hours on that volume. :slight_smile:
(sometimes the 1st pass of chkdsk needs a 2nd pass, so running overnight can save a good few hours in case it’s needed again)

i had also shut down all other programs (especially any residing on the drobo), including ddservice and dashboard, before running the chkdsk.

day 2: chkdsk thankfully showed the all clear message ending in “and found no problems” :slight_smile:
= so i ran chkdsk /x on the 2nd (smaller) volume…
… and that’s when nothing seemed to happen, as chkdsk wasn’t updating any lines in the cmd window, and i realised that the drobo (and/or the computer) had probably gone into spindown / power-saving mode, and that ideally i should have waited for the computer and drobo to wake up before trying anything else…

(it was before my morning coffee you see) :slight_smile:
and that’s when i noticed the flashing lights on my gen1 (all 4 were blinking green and red). it’s possible that they were flashing already, though i think they only started flashing after i tried the 2nd chkdsk…

Chkdsk was still running on the 2nd volume, and while i tried to cancel it via a command, i seem to remember reading that you can’t stop a chkdsk until it finishes, so i let it slowly continue its progress, as below, while the drobo was still flashing all its lights, presumably doing its rebuild/verify work at the same time.

chkdsk reported some lines about deleting index entries, but then it completed, as did the drobo rebuild. (all lights were solid green again, with the 3rd drive bay a solid red as usual)

(insert slightly worried expression :smiley: as part of me thought that the rebuild would present wrong or missing data to chkdsk, which would in turn delete or "fix" the wrong things, while the other part of me thought that chkdsk should still see the correct info, because only 1 drive seemed to have failed, given that a rebuild seemed to be in progress).

so after chkdsk completed, i decided to run chkdsk on that 2nd volume again (in read-only mode this time, for info purposes), where it mentioned a missing or duplicate object id etc, and the drobo lights were normal.

after chkdsk finished and said that errors can only be corrected when chkdsk is not in read-only mode, i decided to run it on the 2nd volume again, in /x mode.

and that’s when all the lights flashed green/red again, as in another rebuild.
then chkdsk finished (including messages about re-inserting a missing index entry etc)
and then the drobo completed its rebuild too
but this time, all lights were green, except the 2nd slot which was now flashing/blinking red, indicating that the 2nd slot drive had failed.

(maybe this drive was the culprit over the last couple of days, and had caused the initial slowdown, the initial rebuild and the subsequent ones, and possibly even the original crash?)

rather than run another chkdsk, i launched ddservice and dashboard again to see what info they would show… and they showed exactly that… the 2nd slot drive (wd15eads) had failed, and data was at risk until it was replaced, with 0% free space remaining.

i saved a diagnostics report for future ref.

(the faulty 1.5tb drive in the 2nd slot was purchased in 2010, so it is 4-5 years old, though i didn’t start using these drives until later on, when i did the wd10eads upgrade project thread)

Incidentally, the 1st slot drive is also a wd15eads (from the same batch of 7 identical drives), but so far it seems to be holding its own, though it might be worth bearing in mind for a swap. the original wd10eads 1TB drives seem to be still going strong, despite being older and used for much longer.

I tested a few random files of different types (jpg thumbnail and full view, audio, text) and ran a few random parity/hash checks that i had within some folders, and all seemed ok, but this was just a tiny sample.
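A hash spot-check like the one above can be scripted so it covers whole folders rather than a handful of files. The post doesn't say which checksum format was used, so this is only a minimal Python sketch, assuming md5sum-style files (one `<hex digest>  <filename>` line per file, names relative to the checksum file's folder). `md5_of` and `verify_checksum_file` are hypothetical names, and the script only reads files, never writes to the volume.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum_file(md5_path):
    """Check every entry in an md5sum-style file.

    Assumed line format: '<hexdigest>  <name>' or '<hexdigest> *<name>',
    with names relative to the checksum file's folder.
    Returns {name: 'ok' | 'mismatch' | 'missing'}.
    """
    base = os.path.dirname(os.path.abspath(md5_path))
    results = {}
    with open(md5_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line or line.startswith(("#", ";")):
                continue  # skip blank lines and comment lines
            digest, _, name = line.partition(" ")
            name = name.lstrip(" *")  # drop the separator space / binary-mode '*'
            full = os.path.join(base, name)
            if not os.path.isfile(full):
                results[name] = "missing"
            elif md5_of(full) == digest.lower():
                results[name] = "ok"
            else:
                results[name] = "mismatch"
    return results
```

For example, `verify_checksum_file(r"X:\photos\checksums.md5")` (the path is just a placeholder) would list each file as ok, mismatch or missing, which is a bit more systematic than opening random jpgs.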

Next action steps considered were the following:

  • to hot swap the faulty 1.5tb wd15eads in the 2nd slot of the gen1 with a previously used, working 1tb wd10eads (which i only took out to make room for the larger wd15eads), which would give me some better protection (in theory???)
    (but i would need to find it, as it might be in the drobo-s)

  • to run through my syncback routine (while important data is backed up, it had been a few weeks since my last full mirror of gen1 data onto gen2), but first to run it in compare mode to see if any files appeared to be missing on the gen1 as a result of the drive failure / windows crash.
    (but ideally i should verify the integrity of the NTFS filesystem beforehand, via chkdsk)

  • to run a chkdsk to make sure the existing 2 volumes of my gen1 were still ok.

Seeing as pretty much any action step would likely require accessing the drives further, i thought i might as well carry out these steps in reverse order.

eg:
a) to run chkdsk /x until both volumes show no errors
b) and then to run a “compare only” of gen1 volumes against gen2 volumes
(which should be almost in sync, except for the last few weeks’ data etc)

c) However, if there were any “new” files/folders on the gen1 that were not yet backed up onto the gen2, i would copy and paste and verify these manually.
(i would also not allow any “synching” to take place, because the dashboard shows the drobo as having 0% space available, and i’m still not 100% sure what that means or what it limits me to)
= it turns out this wasn’t necessary, as i’m using a “mirror-right” approach which only syncs the destination side based on the source, and it’s just a basic semi-automated script which doesn’t modify any archive flags etc.
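The “compare only” step (b) is essentially a read-only diff of two folder trees. SyncBack has its own compare mode, so this is not how that tool works internally; it is just a minimal Python sketch, assuming file size plus modification time is a good enough signal (no content hashing) and a 2-second mtime tolerance (an assumed value, to allow for coarse timestamps on some filesystems). `snapshot` and `compare_only` are hypothetical names. Nothing is copied or deleted, in keeping with the no-syncing rule above.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to (size, whole-second mtime)."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            files[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return files


def compare_only(source, dest, mtime_tolerance=2):
    """Read-only comparison of a source tree (e.g. a gen1 volume) against
    its mirror (e.g. the matching gen2 volume). Returns three sorted lists:
      not_backed_up - on the source only (new since the last mirror)
      gone_from_src - on the mirror only (possibly lost from the source)
      differs       - on both, but size or mtime differ
    """
    src, dst = snapshot(source), snapshot(dest)
    not_backed_up = sorted(p for p in src if p not in dst)
    gone_from_src = sorted(p for p in dst if p not in src)
    differs = sorted(
        p for p in src
        if p in dst
        and (src[p][0] != dst[p][0]
             or abs(src[p][1] - dst[p][1]) > mtime_tolerance)
    )
    return not_backed_up, gone_from_src, differs
```

The `gone_from_src` list is the one that matters after a drive failure: anything there exists in the backup but not on the drobo, so it is either something deleted on purpose since the last mirror or something the failure took out. `not_backed_up` is the list of new files to copy and verify by hand, as in step c.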

a) = ok
volume 2 showed no errors in chkdsk

  • volume 1 (the larger one) seemed to be churning away, but sat on 0% for a long time
    i also noticed that all 10 blue LED lights were lit up… (the 10th looks brightest)
    but thankfully this chkdsk also completed and with no errors found.

b1+c1) compare seems to be ok for the 2nd smaller volume = ok
and back up of any new data etc = done.

b2+c2)
compare for the 1st larger volume = ok
backup of any new data etc = done

I also had a couple of windows explorer crashes, so i did some more chkdsks before finally powering down the gen2 (to be an offline backup). of the chkdsks i ran on the gen1 which had the problems, here are the chkdsk log entries of special note:

#########

Deleting an index entry from index $O of file 25.
+
Missing object id index entry or duplicate object id detected for file record segment 43256.
+
Errors found. CHKDSK cannot continue in read-only mode.
+
Inserting an index entry into index $O of file 25.
+
ran again and got the no problems found message.

#########

just a quick update on this:

i managed to find a replacement drive which is a WD15EADS model.
it took a while to find, and since i was not able to find another green low-power model, i ended up getting one with the black label.

i opted for an EADS replacement (rather than other models that have 4k sectors etc), since all my other drives are wd eads, to avoid having to update firmware.

i used the wd utils (data lifeguard) to run all checks on the new drive (which took about 17 hours), and all checks passed:
smart is ok, quick test and full test.

with power still on, the faulty drive (red blinking slot) was taken out,
and the new drive was put in
and rebuild is in progress.

with hindsight, it would have been nice to try some other drive tools on the faulty drive, and to see what ttl timeout settings that model has, in case things could be improved by updating the settings, but i wanted to get the drobo into a stable state, as it was already several days since the drive failed, let alone all the chkdsks and the time to locate a drive and have it shipped from another country lol. :slight_smile:

if anyone knows more about those ttl drive timeout settings, please feel free to post; i have more details saved about the make, model and firmware of the drive if needed too.

the initial eta for the current rebuild was about 72 hours,
current eta to completion via dashboard = 54h

just to update this;
the rebuild did complete fine, and the original eta was pretty much accurate.

(can post back full details later on)

btw i also noticed that the small print on the drive which had failed mentioned the word “recertified”… maybe it had failed for someone else in some way, was shipped back, and was then resealed and sent out again.

found some info about recertified drives here, and 1 post down on the linked page:
http://www.drobospace.com/forums/showthread.php?tid=2576&pid=159926#pid159926

Paul,

Have you been posting on this thread since August, and no one has replied to it yet?

I am not trying to stir the pot; I am actually a 1st Gen owner, and I came to this forum to try to get some help, but it seems like there is no help to be given for an original owner. I am scared. I upgraded ONE disk on my Drobo 1st Gen and I basically get ONLY 4 red lights now. I don’t know where to turn. But if no one is out there answering, as in your situation, I think I am done with Drobo. Let me know if you have any success. I am sorry I have no solution, just problems.

hi mister mouse, i was just about to refer to this post of mine from another user’s thread, and saw your reply to me.

thank you for the enquiry - my issue was actually more of a running log of the problem i had, and the solution i carried out to rectify (and i updated the little thread post icon with a thumbs-up picture to show that it was resolved), so that others could see what happened in case they get something similar :slight_smile:

i saw your latest post here about the issue, so i will add some thoughts there for you instead, if that’s ok?
http://www.drobospace.com/forums/showthread.php?tid=145602

Thank you Paul