Constant activity after drive failure and relayout

I took some time over the holidays to do some backing up and transferred larger-than-usual amounts of data to my Drobo.

One evening I copied a little over 270 GB of files and let it run. OS X estimated an embarrassingly long 3 hours, which turned out to be 9-10 hours in real life. I have noticed that transfer speeds have been slower than usual recently; sometimes the progress bar stops altogether for minutes at a time, then resumes slowly in fits and starts.
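For scale, here is what those figures mean as average throughput. This is plain arithmetic on the numbers above, assuming decimal units (1 GB = 1000 MB) and a steady rate, which the stalls described clearly were not.

```python
def average_rate_mb_s(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


print(f"OS X estimate (3 h): {average_rate_mb_s(270, 3):.1f} MB/s")  # 25.0 MB/s
print(f"actual (9-10 h): {average_rate_mb_s(270, 10):.1f}-"
      f"{average_rate_mb_s(270, 9):.1f} MB/s")                      # 7.5-8.3 MB/s
```

In other words, the copy ran at roughly a third of the speed the estimate assumed.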

The 270 GB transfer seems to have completed correctly. I didn’t check every file in detail; I merely compared the size of the original folder with the one on the Drobo.

A few minutes later, drive #2 blinked red and the remaining 3 started to blink green and orange. I’ve seen videos of Drobos failing, though, and they usually only blink red. Is this normal behavior?

Dashboard showed 3 messages; I don’t remember them word for word:

- One drive is dead
- Replace a drive so I can protect your data
- I need at least 2 drives to protect your data or 3 for dual redundancy (even though my 4-bay Drobo doesn’t support that)

The remaining 3 lights kept blinking, which I’m guessing was the rebuild or relayout process. It completed 10 hours later, leaving 3 solid green lights and the lone blinking red one.

I replaced drive #2 with a new WD20EARX and it went to 4 solid green lights. Sigh of relief, but now the issue at hand is that I’m hearing constant disk activity. Not a clicking noise, but the typical drive chugging and shuffling, as if you were writing data to it. Dashboard indicates that everything is a-okay, with no rebuild or relayout in progress. I have no applications accessing the Drobo. It just keeps on making “drive noises”, if you know what I mean, along with more frequent fan spin-ups.

Is that the internal housekeeping I’ve been reading about? The activity light isn’t on or flashing (but that is normal, since I’m not actually accessing the drive). Should the Drobo keep up this activity even 3 days after I installed the new drive?

The noise goes away as soon as I put my computer to sleep, which puts the Drobo into standby with an orange light.

Could the 10-hour transfer have put more stress on one of my drives than it could bear? I still have another 500 GB to copy to it, but am hesitant to start the process.

Dashboard 2.1.2 and Drobo firmware 1.4.1
3 x WD20EARS and 1 x WD20EARX
45% of capacity used

You described the rebuild green/orange process perfectly.
This extra activity sounds exactly like the data redistribution “housekeeping” tasks that run when things are quiet. Drobo moves data around to use all drives more evenly. That gives faster performance (because it can use more disks at once) and probably faster rebuild times (because if one drive fails, it will hold only 1/4 of your data rather than up to 1/3).
I don’t know how long this process will take, though 3 days seems reasonable. Certainly it will take longer if there’s other activity going on, but really the optimizing task isn’t necessary for Drobo to protect your data. That’s why it’s a low priority.
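A rough illustration of the 1/4 versus 1/3 point: under an idealised, perfectly even spread (an assumption; this ignores how Drobo actually places its redundancy copies), each of n drives holds 1/n of the stored blocks, so a drive that fails in a 4-drive pack takes a smaller share of the data with it than one in a 3-drive pack.

```python
from fractions import Fraction


def share_per_drive(num_drives: int) -> Fraction:
    """Fraction of stored blocks on each drive if blocks are spread perfectly evenly.

    Idealised assumption: ignores how redundancy copies are actually placed.
    """
    return Fraction(1, num_drives)


for n in (3, 4):
    print(f"{n} drives: each holds {share_per_drive(n)} of the data")
```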

tl;dr Yes, this is normal. Your Drobo will sort everything out in time. Nice work surviving a drive failure. :slight_smile:

Surprisingly quick reply! :slight_smile:

Ah yes, that’s what I meant when I said housekeeping; I forgot to expand on it. Sorry for my long-winded post.

So the situation is normal, then? The Drobo is only redistributing the data onto the new drive? If so, that’s very comforting. But I’m still a little troubled by the slow transfer speeds and by the drive failing right after the transfer completed. Transfers used to be bearably slow, but not 0-10 KB/s slow.

I read the support notes below that mention shutting down your computer and letting the Drobo finish the relayout process on its own. But I guess this doesn’t apply to data redistribution, as the Drobo goes straight into standby when I shut down my computer.

Would ejecting/unmounting the Drobo from OS X help the process?

About surviving the failure: if you backtrack the 3 days, this happened on January 1st. All computer stores were closed and I had no spare drives on hand, so you can imagine the stress.

Yes, what happened was:

- Drive failed.
- Since you had enough free space, it immediately rebuilt back to a “safe” array on just the remaining 3 disks (and simply ignored the broken one). Once you were back to 3 solid greens, it had finished the rebuild.
- When you added the new drive, it was immediately accepted and became part of the pack.
- Now it’s optimising the layout in the background, which sometimes takes a couple of days. The best way to think of it is like running a “defrag”: it’s making sure the data is laid out in the best way possible. It does this constantly and silently; the only way to tell it is happening is drive noise (and possibly a slight performance hit).

Unplugging it from your computer shouldn’t make a difference (but everything does happen a little faster with Drobo if you aren’t using it at the time).

Thanks for the further explanation Docchris.

I transferred some more data to test the waters and it sped through several 10-20 GB transfers in a matter of minutes (under 10 minutes), so it seems like the new drive is helping a bit.

Could the abnormally slow transfers be an early indicator of a bad drive? I still haven’t tested the red flagged drive somewhere else to see if it’s indeed dead or merely snubbed by the Drobo.

On an unrelated note, but concerning your signature and one of your recent posts on another board: I am considering adding a Synology 4-bay, a home or consumer-grade one, to my line-up as a backup to the Drobo. The features seem very compelling, but I am unsure about the ease of use; I will look into it in detail later.

Hi Johnny,
It’s probably been a bit slow because it needs time to settle after the new data, the rebuild and the optimising. If in doubt, wait a bit longer, then take a diagnostics file and send it to support, who can check whether there are any specific problems with a drive.

Most of the time, though, it just needs a bit of time for things to settle.

Imagine making tea or coffee with lots of powdered bits (and no bag),
stirring it up in a clear glass,
and shining a laser or light through it
= you will get some light.

but wait a bit for things to settle and shine again
= you get a lot more light

As the device gets more and more full (usually above 90%),
there’s already a bit piled up, so even if nothing else needs to settle, some of the light is already obscured, but it’s still quite bright :slight_smile:

(I don’t know if that analogy is as good as rdo’s car one and others, but hey :slight_smile:)

@Paul The slow transfer speeds happened before the drive failure and relayout :wink:

That’s why I was wondering:

> Could the abnormally slow transfers be an early indicator of a bad drive?

I’m now back to normal transfer speeds, or so it seems. I’ll watch it over the week and report back if it gets better or worse.

About the drive noises, they’re still present 5 days after the relayout but only appear intermittently and less frequently now. Again, I’ll report back later on once it’s settled.

Yes, slow transfer speeds are often an indicator of a failing drive, as the Drobo will retry reading data multiple times, or the disk takes a long time to respond to requests.
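A toy model of why retries hurt so much: if some fraction of reads has to be retried and each retry costs a fixed extra delay, that delay quickly dominates the time per megabyte. Every parameter value below (base speed, read requests per MB, retry rate, retry delay) is hypothetical, chosen only to show the shape of the effect; none of it is measured from a Drobo.

```python
def effective_throughput_mb_s(base_mb_s: float, reads_per_mb: int,
                              retry_fraction: float, retry_delay_s: float) -> float:
    """Toy model: throughput when a fraction of reads each incur one extra delay.

    base_mb_s      -- throughput with no retries (hypothetical)
    reads_per_mb   -- read requests needed per MB transferred (hypothetical)
    retry_fraction -- share of read requests that need a retry (hypothetical)
    retry_delay_s  -- extra seconds each retried read costs (hypothetical)
    """
    seconds_per_mb = 1 / base_mb_s + reads_per_mb * retry_fraction * retry_delay_s
    return 1 / seconds_per_mb


for rate in (0.0, 0.001, 0.01, 0.1):
    mb_s = effective_throughput_mb_s(15, 16, rate, 0.5)
    print(f"{rate:>6.1%} of reads retried -> {mb_s:5.2f} MB/s")
```

With these made-up numbers, retrying even a small percentage of reads is enough to cut a 15 MB/s stream to a fraction of that, which is the pattern the posts in this thread describe.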

In August 2010 I submitted a tech support issue for a severe performance problem with my Drobo V2. I submitted a log and was told to replace a certain drive (WD 1.5TB GP Drive). I was also told that the drive might work fine for other uses. I was not given any more detail than that.

I replaced the drive and that resolved my V2 performance problem (throughput went back to about 15-20MB/s, versus as poorly as 0.1MB/s, though it was very variable and inconsistent).

I still use the drive as either a usually offline JBOD backup or it may be spinning 24/7 on my server as a stand alone volume - I forget what I did with it and have two of those 1.5TB drives :-). The disk showed no SMART errors, and it performs just as well as the other 1.5TB GP drive in real world use and HDTune tests.

(Edit: It is definitely spinning 24/7 on my server as an internal JBOD)

What I learned is to submit a diagnostic log with a support case whenever there are unexpected performance problems. That was the ONLY possible way to resolve the problem I had, short of replacing all the drives, one by one, to try to isolate the problem drive. And there was no way to establish, with any of the standard tests, that the problem was even due to a bad drive.

In my experience, most modern drives tend to fail quickly, but it is entirely possible for a drive to go through a phase of recoverable errors before failing. If these errors are recovered by the drive microcode, then the drive might not report any errors. But retries nearly always cause noticeable performance problems - just what you got.

For peace of mind I keep a spare drive (equal in size to the largest drive in the Drobo). I need to know it is working, so there’s no point keeping it in the cupboard; instead it’s in a caddy on my desk. I use it for temporary space, and for anything else that I can afford to lose at the drop of a hat.

So, when one of my drives fails, I can have the replacement fitted within a minute (though I wait for the recovery before I start swapping drives; doing it too quickly can confuse the Drobo)

Hi Swifty,

Looking at the SMART data for the 1.5TB drive (WD15EADS) that I discussed above, it has ZERO reallocated sectors, nor any counts on any other “error” values.

The only unusual status is the Load Cycle Count, at 92,929, with 18,346 power on hours. That works out to a load operation every 12 minutes or so. This is the only drive I’ve run across with an inordinate Load Cycle Count but most of my drives need to be pulled from wherever they reside in order to get SMART data. (Whoever designed that standard and all the related interfaces was not SMART, he was supremely stupid, but I’m thread drifting :-)).
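The “every 12 minutes or so” figure can be checked from the two SMART values quoted. This sketch assumes the load cycles were spread evenly across all power-on hours, which real usage certainly was not, so it only gives an average.

```python
def minutes_per_load_cycle(load_cycles: int, power_on_hours: int) -> float:
    """Average minutes between head load/unload cycles over the drive's powered-on life."""
    return power_on_hours * 60 / load_cycles


def load_cycles_per_day(load_cycles: int, power_on_hours: int) -> float:
    """Average load cycles per 24 powered-on hours."""
    return load_cycles / power_on_hours * 24


lcc, poh = 92_929, 18_346  # SMART values quoted in the post
print(f"one load cycle every {minutes_per_load_cycle(lcc, poh):.1f} minutes")
print(f"about {load_cycles_per_day(lcc, poh):.0f} load cycles per powered-on day")
```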

Checking my records, the drive was installed in the Drobo for no more than 298 days, representing a maximum possible power on hours of about 7150. So the drive has spun a minimum of 11,000 hours, as an internal SATA drive in my server, since then.
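The bound above follows from treating the whole Drobo stint as 24/7 operation, which is the largest possible in-Drobo share and therefore leaves the smallest possible remainder:

```python
HOURS_PER_DAY = 24


def min_hours_after_retirement(total_power_on_hours: int, days_in_drobo: int) -> int:
    """Lower bound on power-on hours accumulated outside the Drobo.

    Assumes the drive ran 24/7 while it was in the Drobo, which gives the
    largest possible in-Drobo share and therefore the smallest remainder.
    """
    max_hours_in_drobo = days_in_drobo * HOURS_PER_DAY
    return total_power_on_hours - max_hours_in_drobo


print(298 * HOURS_PER_DAY)                      # 7152 (max in-Drobo hours)
print(min_hours_after_retirement(18_346, 298))  # 11194 (min hours since retirement)
```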

My Drobo V2 historically had a transfer rate of 10-15MB/s, on average, more or less. With that drive installed the performance varied considerably and somewhat randomly, between “normal” for short periods of time, to under 1MB/s, and occasionally far lower than 1MB/s.

I lived with it a long time before initiating my support case. I thought the Drobo was just really sucking :-). Anyone that owns a V2 and has spent much time monitoring throughput performance knows that even under the best of circumstances it can vary considerably. As soon as I replaced the drive the performance went back to the reasonably consistent 10-15MB/s performance I was accustomed to with the original set of 4x WD10EADS drives.

That drive also tests very well in terms of performance, using HDTune, comparable to my other drives. And the benchmark graph is as smooth as other drives tested in the same dock, suggesting that if the drive is experiencing a lot of recoverable read errors it is not obvious in the HDTune tests. I would expect either a lot of intermittent sharp dips or generally slow benchmarks if it was experiencing enough read retries to significantly slow the Drobo the way it did.

Anyway, you seem to be implying that the Drobo’s performance hit is due to recoverable read errors, or something like that. I’m just pointing out that I see no evidence of that in this one anecdotal case of a similarly “impaired” drive in a Drobo.

The whole thing is a total mystery to me. I have no clue why the Drobo did not like that drive, nor can I find anything wrong with it. And most of the 18,000+ power on hours have occurred after it was retired from the Drobo. However, I see no reason to assume anything is “wrong” with that drive, except that the Drobo did not like it.

Hi Neil, there is a video on the GRC website about SpinRite, which explains how it forces the drive to verify itself.
Maybe if you can run that, or something else that looks deeper at the hard drive surface, it might show something interesting when repeated later.

E.g., I think HDTune might only be skimming the surface in its benchmark tests, rather than testing the same area multiple times for a given measurement.

I use a program from GRC called TIP (Trouble In Paradise), which was (and still is) brilliant with Iomega Zip disks.
It does multiple reads, writes and verifications on each sector of a Zip disk, and pulls information such as Soft/Firm/Hard error counts directly from the Iomega drive.

(There’s more detailed info on the site.) Apparently most Soft errors come and go normally, Firm errors indicate more problems over time, and Hard errors are worse than Firm ones.

Maybe HDTune is like reading a scratched CD where the scratch runs from edge to edge: only a tiny bit of the data read is scratched, whereas if the scratch followed the same spiral path as the laser, a much larger share of the benchmarked data would be affected. If you see what I mean :)

Edit: the only bad thing about TIP is that Steve built in a date check that made TIP stop working years ago,
but he promised to simply release a tweaked version that would always run.

BUT he never kept his promise… Steve Gibson, I hope you read this one day: you seem cool and a nice person, but you have yet to release a TIP without a date restriction!! :smiley:

Hi Paul,

I watched Gibson’s video. He did not convince me that his $89 app is doing much more than WD’s diagnostic tool when it is used to write zeroes to a drive. Yes, Steve does it non-destructively, and if I thought I really needed to do that regularly it might be worth it. But it seems like a solution in search of a problem (for me). My experience is that drives work well until they no longer work well, and I have never had a problem with data corruption from drives that only “marginally” work.

I actually owned a very early version, probably V1, back in the days of low level formats. But I never updated the software because it seemed to me that the technology changed, to the point where most of it was a “black box” that SpinRite cannot deal with. And if you take a critical look at his site, he still spends most of his printer’s ink talking about old IDE technology and precious little on the real world of modern drives.

I find it interesting that he acknowledges the importance of SMART data, and his current V6.0’s inability to read SMART data in many cases, and he suggests he has a solution. He also acknowledges the “increasing uptake” of SATA drives in the market. Say what? Increasing uptake or… when is the last time you bought an IDE drive? It’s been at least 5 years for me now, and his promised V6.1 that fixes the problem is still vaporware? Sounds like he is too busy cashing checks from his web site cash register and not spending enough time keeping up with technology :-). If he is 5 years behind dealing with that fairly straightforward SATA problem, what else is modern Spinrite lacking?

If I am wrong on that, let me know, but you need to point to something specific, not just vague claims in a not too well put together “stream of thought” video.

As I understand it, HDtune’s surface analysis simply runs through a disk, from one end to the other, a sector at a time, requesting reads. If the disk firmware reports an unreadable sector, it is flagged on the sector chart.
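As a rough illustration of the end-to-end read pass described above (not HDTune’s actual implementation), this sketch reads a file or device front to back in fixed-size blocks, records blocks the OS reports as unreadable, and records blocks that were unusually slow to read, the kind of dip a benchmark graph would show. The function name, block size and slow threshold are arbitrary assumptions. Pointing it at a raw disk device needs administrator rights and platform-specific device paths, and the OS cache can serve repeat reads without touching the disk, so treat it as a sketch only.

```python
import os
import time


def surface_scan(path: str, block_size: int = 1024 * 1024,
                 slow_threshold_s: float = 0.5) -> tuple[list[int], list[int]]:
    """Sequentially read `path` and report problem blocks by byte offset.

    Returns (unreadable, slow): offsets where the read raised OSError, and
    offsets where the read succeeded but took longer than slow_threshold_s.
    """
    unreadable: list[int] = []
    slow: list[int] = []
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            f.seek(offset)
            start = time.monotonic()
            try:
                f.read(block_size)
            except OSError:
                unreadable.append(offset)  # the OS/drive reported a read error
            else:
                if time.monotonic() - start > slow_threshold_s:
                    slow.append(offset)  # readable, but suspiciously slow
            offset += block_size  # move past this block either way
    return unreadable, slow
```

Note that, like the description above, a single pass only flags sectors the drive gives up on; errors the drive firmware recovers internally show up, if at all, only as slow blocks.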

All apps do the same thing when they read files, including Windows file operations. The problem is that there is no law that says they have to be in your face about read errors. Windows should report them in the event log, but by the time I get around to viewing my disk error filter, the system has usually been rebooted, so the drive mappings are no longer reliable. Sorry, sometimes I just hate Windows. Anyway, HDTune just does the whole drive all at once and puts the results in your face.

Now, in my case, my Drobo was slowed to as little as 10% of normal speed, and on a very regular basis. If I run HDTune and it runs at normal speeds, and I copy large amounts of data (many gigabytes) back and forth from a drive and all that runs in the normal expected time, then there is nothing for Spinrite to tell me. The drive is simply not having a problem with recoverable read errors, nor is it reporting read errors and/or relocating sectors.

The Drobo was likely having a problem with something else, but it will remain a mystery because either the diagnostic log did not pinpoint the problem or the support rep did not care to relay that info to me. (Just exploring all possibilities; I have no reason to believe she intentionally withheld anything important.)

Somewhere in my PC Museum, in storage, I have a couple of Zip drives and a small pile of disks. I did not realize anyone actually used them any more :-). I can take my entire pile of Zip disks and put them on one $10 USB flash drive :-). And I guess I more or less did that many years ago. I suspect if Steve has not fixed his now 5+ year old SMART problem it is going to be a L.O.N.G. time before he gets around to fixing your Zip Drive utility :-).
P.S. I just took delivery of 3x 3TB Green WD30EZRX drives. One will go in my Sans Digital JBOD box, and the other two (plus the Drobo S) will back up the 1st one. I guess this means I have to do the dirty deed and reformat the Drobo to something larger than 2TB volumes :-(.

@Swifty: the new 3TB drives will replace a set of 2TB drives. The 2TB drives will in turn replace a set of 1TB drives, and the 3 x 1TB drives will be set aside for my “digital negatives” offline backup. Sorry, nothing left for you :-). The 3TB drives were a silver bullet that dealt with 7 almost-full hard drives in one grand exercise of musical chairs.

Anyway, one of the reasons I bought these 3TB monsters now is that I wanted to see what hardware and software incompatibilities I might run into before I actually hit a real disk crunch. They do not work in my BlacX USB 2.0/eSata docks when connected via USB, but that is either a motherboard USB controller problem or perhaps something in the docks- not sure and haven’t researched it.

The other problem I ran into is that HDTune 2.55, the last freeware version released, in certain places reports only a 2TB volume, partition or drive, as the case may be, where it should be reporting 3TB. So I assume it is no longer usable for >2TB drives (?).
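One likely explanation for a tool topping out at about 2TB (the post does not confirm this for HDTune 2.55) is 32-bit sector addressing: with the traditional 512-byte sector, a 32-bit field can count at most 2^32 sectors, the same ceiling behind the MBR partition-table limit. The arithmetic:

```python
SECTOR_BYTES = 512   # traditional logical sector size
MAX_SECTORS = 2**32  # how many sectors a 32-bit field can count

limit_bytes = SECTOR_BYTES * MAX_SECTORS


def exceeds_32bit_sectors(drive_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the drive has more sectors than a 32-bit count can address."""
    return drive_bytes // sector_bytes > MAX_SECTORS


print(limit_bytes)                        # 2199023255552
print(f"{limit_bytes / 10**12:.2f} TB")   # 2.20 TB, decimal units as drives are sold
print(f"{limit_bytes / 2**40:.0f} TiB")   # 2 TiB
print(exceeds_32bit_sectors(2 * 10**12))  # False: a "2TB" drive fits
print(exceeds_32bit_sectors(3 * 10**12))  # True: a "3TB" drive does not
```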

Does anyone know any freeware solutions to replace HDTune, or do I have to suck it up and buy the paid version just to get it working with these newer >2TB drives?

And no, I already decided against Spinrite :slight_smile: :slight_smile:

I guess I’ll have to carry on using RunAsDate, or changing the computer’s date :slight_smile:
No big issue; it’s just the principle, since he said he would do it. :slight_smile:

High LCC (on a WD Green drive at least) is a symptom of the Idle3 “issue”
I don’t have the exact numbers but one of my old drives had a high LCC because I had forgotten to disable (or extend) the Idle3 timer. Another drive that I did disable the timer on had a significantly lower LCC and comparable or longer power-on time.

While not officially supported (I think), I have successfully used WDIdle3 v1.05 to disable the Idle3 timer on WD20EADS, WD20EARS, WD20EZRX and WD30EZRX drives.

Note that you will most likely need to use on-board SATA (in AHCI mode) or a “WD-known” SATA add-in card.

I know the Maxtor SATA150PCICARD (Promise PDC20375) PCI SATA+PATA card works. The WDIdle utility may be more forgiving than I give it credit for - don’t know as I don’t have any other SATA add-in cards. :slight_smile:

I label them after I disable the timer so I remember.

FYI, SpinRite seems to want nothing to do with said Maxtor/Promise SATA controller. :frowning:

Just as a side note, Synology’s firmware now does this automatically: when you boot a Synology with a WD GP drive in it, it disables the timer!

It’s a very smart idea.

Smart. No pun intended. :slight_smile:

See my post tonight here.

That’s useful to know, bhiga. Having had good success so far with WD drives (thanks to Docchris) :slight_smile:, I’m likely to stick with WD drives in future.
(Unless, of course, something suddenly changes, as with the reported old vs. new Verbatim media, where something reportedly changed in the manufacturing process that left some newly branded media highly defective unless it was made at a particular plant, or something like that.) :slight_smile:

If Synology can modify a drive to use the best values for the idle timer, TLER and so on, it makes sense. Actually, it’s common sense, and it would be a good feature for a Drobo to be able to do that too.
(It all depends on whether a drive manufacturer has made a deal with a specific hardware vendor, as the information might not be shared with multiple vendors. But logically it seems like common sense for any hardware device to be able to tweak drive firmware settings to a degree, to make the drives the best fit for purpose in that device, kind of like an initialisation process.)

By the way… it turns out that Steve Gibson actually “is” working on a new Iomega tool after all :slight_smile:
It’s called the ZipMon project. Good ol’ Steve :slight_smile:

ZipMon — the long-awaited replacement of TIP (Trouble In Paradise)
More info on this page, just FYI: