Hi, I finally got a chance to type up my running log notes about the hard drive that failed.
High-level status of the gen1 Drobo slots, in SDR mode:
1.5TB drive (currently flashing red, and I need to order a replacement)
Data = backed up
Data = still accessible as normal.
A drive needs to be replaced.
Running log notes (including some thoughts I had along the way, in case they’re useful):
All was OK for a long time, though accessing, copying or verifying data was very slow for a few hours. (Maybe it was Windows, or maybe I had reached a critical stage at 90% full, but I wasn’t sure, as I’ve been at around 90% for quite a while.)
All OK until a computer hang/crash.
(All OK meaning the computer and Drobo were working fine, with the usual 3 of 4 drives green and 1 showing red, as I’m at 90% full and haven’t been able to upgrade a drive yet.)
= Shut down the computer, rebooted, used it as normal for a while, and then ran chkdsk /x on the 1st (larger) volume. “Ah-haaa, I can save time,” I thought, as I left chkdsk running overnight, since it usually takes hours on that volume.
(Sometimes the 1st pass of chkdsk needs a 2nd pass, so running it overnight can save a good few hours in case it’s needed again.)
I had also shut down all other programs (especially any residing on the Drobo), including ddservice and Dashboard, before running chkdsk.
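For anyone who wants to script the “run it again if the 1st pass found something” idea (and step a) further down, running chkdsk /x until the volumes come back clean), here’s a minimal Python sketch. It assumes Windows, an admin command prompt, and the exit codes listed in Microsoft’s chkdsk documentation; the function names, the 3-pass cap and any drive letter you pass in are my own made-up examples, not anything from Drobo or chkdsk itself.

```python
import subprocess
from typing import Callable, List

# Exit codes as documented in Microsoft's chkdsk reference:
#   0 = no errors found
#   1 = errors found and fixed
#   2 = disk cleanup performed (or skipped because /f was not specified)
#   3 = disk could not be checked, or errors could not be / were not fixed
NO_ERRORS = 0
COULD_NOT_CHECK = 3


def run_chkdsk(volume: str) -> int:
    """Run 'chkdsk <volume> /x' (dismounts the volume first; needs an admin prompt)."""
    return subprocess.run(["chkdsk", volume, "/x"]).returncode


def chkdsk_until_clean(volume: str, max_passes: int = 3,
                       runner: Callable[[str], int] = run_chkdsk) -> List[int]:
    """Repeat chkdsk passes until one reports no errors; return each pass's exit code."""
    codes: List[int] = []
    for _ in range(max_passes):
        code = runner(volume)
        codes.append(code)
        # Stop when the volume is clean, or when another pass would not help.
        if code in (NO_ERRORS, COULD_NOT_CHECK):
            break
    return codes
```

So chkdsk_until_clean("E:") (E: being a hypothetical Drobo volume letter) would return something like [1, 0] if the 1st pass fixed errors and the 2nd came back clean. The runner parameter is only there so the loop can be dry-run with fake exit codes instead of touching a real volume, and the pass cap stops it looping forever on a volume that keeps reporting fixes.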
Day 2: chkdsk thankfully showed the all-clear message ending in “and found no problems”.
= So I ran chkdsk /x on the 2nd (smaller) volume…
… and that’s when nothing seemed to happen, as chkdsk wasn’t updating any lines in the cmd window. I realised that the Drobo and/or the computer had probably gone into spin-down/power-saving mode, and that ideally I should have waited for the computer and Drobo to wake up before trying anything else…
(It was before my morning coffee, you see.)
And that’s when I noticed the flashing lights on my gen1 (all 4 were blinking green and red). It’s possible they were already flashing, though I think they only started after I launched the 2nd chkdsk…
Chkdsk was still running on the 2nd volume. I tried to cancel it, but I seemed to remember reading that you can’t stop a chkdsk until it finishes, so I let it slowly work through its progress stages while the Drobo kept flashing all its lights, presumably doing its rebuild/verify work at the same time.
Chkdsk reported deleting some index entries, but it then completed, as did the Drobo rebuild (all lights were solid green again, with the 3rd drive bay solid red as usual).
(Insert slightly worried expression here: part of me thought the rebuild might present wrong or missing data to chkdsk, which might in turn delete or “fix” the wrong things, while the other part of me thought chkdsk should still see the correct data, because only 1 drive seemed to have failed, judging by the rebuild that appeared to be in progress.)
So after chkdsk completed, I decided to run chkdsk on that 2nd volume again (in read-only mode this time, just for information). It reported a missing or duplicate object ID, and the Drobo lights stayed normal.
After chkdsk finished, saying that errors can only be corrected when it is not in read-only mode, I decided to run it on the 2nd volume again, in /x mode.
And that’s when all the lights flashed green/red again, as in another rebuild.
Then chkdsk finished (including messages about re-inserting a missing index entry, etc.),
and then the Drobo completed its rebuild too.
But this time all the lights were green except the 2nd slot, which was now blinking red, indicating that the 2nd slot drive had failed.
(Maybe this drive has been the culprit over the last couple of days, causing the initial slowdown, the initial rebuild and the subsequent ones, and possibly even the original crash?)
Rather than run another chkdsk, I launched ddservice and Dashboard again to see what info they would show… and they showed exactly that: the 2nd slot drive (WD15EADS) has failed, data is at risk until it’s replaced, and there is 0% free space remaining.
I took a diagnostics file for future reference.
(The faulty 1.5TB drive from the 2nd slot was bought in 2010, so it’s 4-5 years old, though I didn’t start using these drives until later on, when I did the WD10EADS upgrade project thread.)
Incidentally, the 1st slot drive is also a WD15EADS (from the same batch of 7 identical drives), but so far it seems to be holding its own, though it might be worth bearing in mind for a swap. The original WD10EADS 1TB drives seem to be still going strong, despite being older and having been used for much longer.
I tested a few random files of different types (jpg thumbnails and full views, audio, text) plus a few random hash checks using the parity/hash files I had in some folders, and all seemed OK, but this was only a tiny sample.
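If your hash files happen to be md5sum-style manifests (one “<hash>  <filename>” per line), a random spot check like the one above can be scripted. This is only a sketch under that assumption: the checksums.md5 file name and the sample size are hypothetical, and .par2 or .sfv files would need their own tools.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, List, Tuple


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files don't need to fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def spot_check(manifest: Path, sample_size: int = 5, seed=None) -> Dict[str, bool]:
    """Re-hash a random sample of the files listed in an md5sum-style manifest.

    Each manifest line is assumed to look like '<hex digest>  <file name>'
    (a leading '*' on the name, md5sum's binary-mode marker, is ignored).
    Returns {file name: True if the file exists and its hash still matches}.
    """
    entries: List[Tuple[str, str]] = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        entries.append((digest.lower(), name.lstrip("*")))

    rng = random.Random(seed)
    results: Dict[str, bool] = {}
    for digest, name in rng.sample(entries, min(sample_size, len(entries))):
        target = manifest.parent / name
        results[name] = target.is_file() and file_md5(target) == digest
    return results
```

Any False in the result means that file is either missing or no longer matches its recorded hash. It’s still just sampling, like the manual check, so a clean result doesn’t prove the whole volume is fine.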
The next action steps I considered were:
- to hot-swap the faulty 1.5TB WD15EADS in the 2nd slot of the gen1 with a previously working 1TB WD10EADS (which I only took out to make room for the larger WD15EADS), which would give me better protection (in theory???)
(but I would need to find it, as it might be in the Drobo S)
- to run through my SyncBack routine (while important data is backed up, it had been a few weeks since my last full mirror of the gen1 data onto the gen2), but first to run it in compare mode to see whether any files appear to be missing on the gen1 as a result of the drive failure / Windows crash
(but ideally I should verify the integrity of the NTFS filesystem beforehand, via chkdsk)
- to run chkdsk to make sure the existing 2 volumes on my gen1 were still OK.
Seeing as pretty much any of these steps would involve accessing the drives further, I thought I might as well carry them out in reverse order:
a) run chkdsk /x until both volumes show no errors
b) then run a “compare only” of the gen1 volumes against the gen2 volumes
(which should be almost in sync, apart from the last few weeks’ basic data etc.)
c) however, if there were any “new” files/folders on the gen1 that were not yet backed up onto the gen2, copy, paste and verify these manually.
(I would also not allow any “syncing” to take place, because Dashboard shows the Drobo as having 0% space available, and I’m still not 100% sure what that means or what it limits me to.)
= It turns out this wasn’t necessary, as I’m using a “mirror-right” approach, which only syncs the destination side based on the source; it’s just a basic semi-automated script that doesn’t modify any archive flags etc.
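For the “compare only” step b) and the manual copy list for step c), here’s a rough Python sketch of how a mirror-right comparison can work, using the standard filecmp module. It’s not how SyncBack does it internally, just an illustration under my own assumptions: it only reads both sides and reports what a one-way mirror would copy, overwrite or delete on the destination, so the source is never written to.

```python
import filecmp
from pathlib import Path
from typing import Dict, List


def mirror_right_plan(source: str, dest: str) -> Dict[str, List[str]]:
    """Compare-only pass: report what a one-way (source -> dest) mirror would change.

    Nothing is copied or deleted here, and the source side is only read, never written.
    """
    plan: Dict[str, List[str]] = {"copy_new": [], "overwrite": [], "delete_from_dest": []}

    def walk(cmp: filecmp.dircmp, rel: Path) -> None:
        # left_only = only on source (new, not yet backed up); right_only = only on dest.
        plan["copy_new"] += [str(rel / n) for n in cmp.left_only]
        # diff_files uses a shallow check first: same type/size/mtime counts as identical.
        plan["overwrite"] += [str(rel / n) for n in cmp.diff_files]
        plan["delete_from_dest"] += [str(rel / n) for n in cmp.right_only]
        for name, sub in cmp.subdirs.items():
            walk(sub, rel / name)

    walk(filecmp.dircmp(source, dest), Path("."))
    return plan
```

You’d point it at the gen1 and gen2 volume roots (e.g. hypothetical drive letters E: and G:). Anything under copy_new is the “new on gen1, not yet on gen2” set to copy and verify by hand. Because of the shallow check, two files with the same size and timestamp are assumed identical without reading them, which is fast on a big volume but won’t catch silent corruption; that’s what hash checks are for.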
a) = OK
- volume 2 showed no errors in chkdsk
- volume 1 (the larger one) seemed to churn away for a long time at 0%
also noticed that all 10 blue LEDs were lit up… (the 10th looks brightest)
but thankfully this chkdsk also completed with no errors found.
b1+c1) compare for the 2nd (smaller) volume = OK
and backup of any new data etc. = done
compare for the 1st (larger) volume = OK
backup of any new data etc. = done
I also had a couple of Windows Explorer crashes, so I ran some more chkdsks before finally powering down the gen2 (to serve as an offline backup). Of the chkdsks I ran on the gen1, the unit that had the problems, these are the log entries of special note:
Deleting an index entry from index $O of file 25.
Missing object id index entry or duplicate object id detected for file record segment 43256.
Errors found. CHKDSK cannot continue in read-only mode.
Inserting an index entry into index $O of file 25.
Ran it again and got the “no problems found” message.