5TB or 8TB Drive Warning or Failure

I keep getting either a warning or a complete failure on 8TB Seagate drives. I also saw the warning on 5TB drives.

I pull the drive out and replace it, but when I connect it to my PC and run Seagate diagnostics, there is never an issue with any of the drives that Drobo reports as either a warning or a complete failure.

Any ideas on how to troubleshoot what Drobo is detecting that causes it to show the drive as either failing or completely failed?

These are Seagate Archive drives, aren’t they?

They are NOT compatible with Drobo and should not be used.

The bit below is my comment, copied from this thread: http://www.drobospace.com/forums/showthread.php?tid=145496

When a “desktop” hard disk can’t read some data, it will keep trying and trying, sometimes for up to 30 seconds. That is good, because you want to get back your only copy of the data that is on the disk! While it is doing this, though, the computer may seem to hang, because it is waiting for the disk to report whether the read succeeded, and the disk isn’t replying because it is still retrying. This causes a problem for some RAID arrays (including Drobo): they sometimes take the fact that the drive has stopped responding to mean it has failed, so they kick it out of the array.

“NAS” hard disks (like Seagate NAS drives or the WD Red line) will also retry if they can’t read some data, but not for very long: usually 7 seconds (I think on some drives it may be as short as 3!). WD calls this TLER (Time-Limited Error Recovery). The drive won’t spend too long trying to read unreadable data, because it is in a RAID array, where unreadable data can just be recreated. So it gives up quickly and tells the controller “I couldn’t read that”, and the controller can then use the parity to recreate what was lost, and everything continues in a lovely fashion.

So that is why you should really stick to NAS-type drives in RAID arrays. (Yes, Drobo and other RAID arrays are usually smart enough to deal with this gracefully now. Drobos can read the model number and allow drives longer to respond. I know the Synology units actually go so far as to change the setting in the firmware of the drives, disabling the long recovery when they are first inserted!)
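To make the timing argument concrete, here is a minimal Python sketch of the decision a RAID controller faces. The 30-second and 7-second recovery times come from the description above; the 10-second controller timeout and the names `Drive` and `controller_verdict` are assumptions made up for illustration, not Drobo's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    # Longest time the drive keeps retrying an unreadable sector
    # before it reports a read error to the controller.
    max_recovery_s: float


def controller_verdict(drive: Drive, controller_timeout_s: float) -> str:
    """What a hypothetical RAID controller does when `drive` hits a bad sector."""
    if drive.max_recovery_s <= controller_timeout_s:
        # The drive reports the error in time, so the array can
        # recreate the missing data from parity.
        return "rebuild sector from parity"
    # The drive is still retrying when the controller stops waiting.
    return "drop drive from array"


desktop = Drive("desktop", max_recovery_s=30.0)  # figure from the post
nas = Drive("NAS (TLER)", max_recovery_s=7.0)    # figure from the post
ASSUMED_TIMEOUT_S = 10.0                         # illustrative value only

for d in (desktop, nas):
    print(d.name, "->", controller_verdict(d, ASSUMED_TIMEOUT_S))
```

Raising the timeout for a known desktop model (for example `controller_verdict(desktop, 60.0)`) is the "allow drives longer to respond" approach; shortening `max_recovery_s` on the drive itself is the approach attributed to Synology above.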

This brings us on to the unique characteristics of the Archive drive. The reason it’s called “archive” is that it’s not meant to be written to a lot, but it can be read from a lot. This is because data is stored on these drives using what is called SMR (Shingled Magnetic Recording). Rather than laying down the data tracks next to each other, the drive makes them overlap, which means you can get more data on the disk. The drawback is that if you want to change data on a track which is half “underneath” another track, you first need to read the outer track and store it somewhere safe, then write your new data to the inner track, then re-lay the data back on the outer track. So as you can see, there is potential for the write performance to REALLY suck!

It is beautifully explained here, including the rewriting when you change data, with pretty pictures:

http://www.seagate.com/gb/en/tech-insights/breaking-areal-density-barriers-with-seagate-smr-
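The read, rewrite, re-lay cycle for a shingled track can be sketched in a few lines of Python. This is a toy model, not how the drive firmware works: it assumes a "band" is a simple list of tracks in which each track partly overlaps the one before it, so changing one track means re-laying every track stacked after it. The function name `update_track` is made up for this example.

```python
def update_track(band: list[str], index: int, new_data: str) -> int:
    """Update one track in a toy shingled band.

    Returns the number of physical track writes the update needed.
    """
    if not 0 <= index < len(band):
        raise IndexError("track index outside the band")

    # Step 1: read and set aside every track layered on top of the target.
    saved = band[index + 1:]

    # Step 2: write the new data to the target track.
    band[index] = new_data
    writes = 1

    # Step 3: writing the target damaged the track overlapping it,
    # so re-lay each saved track in order.
    for offset, data in enumerate(saved, start=1):
        band[index + offset] = data
        writes += 1

    return writes


band = ["a", "b", "c", "d"]
print(update_track(band, 1, "B"))  # changing an early track costs several writes
print(update_track(band, 3, "D"))  # changing the last track costs a single write
```

In this model the cost of a single change grows with how many tracks sit on top of it, which is why random rewrites hurt so much more than appending.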

To avoid this, the Seagate drives have a little trick up their sleeves. Part of the drive, 20GB, is reserved as a NON-shingled cache. So when you write to the disk, it writes as fast as a normal disk, because it is writing to the non-shingled cache area. Then once you are not writing, the drive starts shuffling all the data around internally. That is why you might see people on forums complaining that the drive seems to be going crazy when they aren’t even doing anything to it.

The problem arises if you try to write more than 20GB to the drive at once…

Now, I own 8 of these disks, and I’ve been moving a LOT of data from a large array of very fast drives onto these Archive drives. I’m talking a total move of over 100TB.

And I can tell whether it’s writing to the Archive drives or to my WD Reds by watching the transfer rates. The Reds are pretty steady at over 100MB/sec. The Archive drives actually go up to 140MB/sec for quite a while, then they just STOP: 0MB/sec, sometimes for quite a long time (like a minute). Then they just carry on going again. What actually happened is that the drive had filled its write cache and was moving data into the shingled area, ready to accept the next lot of writes.

However, if Drobo is doing its internal optimising/re-layout, or you are writing to it, and a drive simply stops responding for a minute or two, Drobo will think it has died and will kick it / mark it as failed.
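The burst-then-stall pattern can be modelled roughly in Python. The 20GB cache, the 140MB/sec burst rate and the roughly one-minute stall are the figures quoted above; treating every stall as a fixed 60 seconds, and the names `simulate_long_write` and `host_gives_up`, are simplifying assumptions for illustration only.

```python
def simulate_long_write(total_gb: float,
                        cache_gb: float = 20.0,
                        burst_mb_s: float = 140.0,
                        stall_s: float = 60.0) -> tuple[float, int]:
    """Return (average MB/sec, number of stalls) for one long sequential write."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")

    remaining_mb = total_gb * 1000.0
    cache_mb = cache_gb * 1000.0
    elapsed_s = 0.0
    stalls = 0

    while remaining_mb > 0:
        # The non-shingled cache absorbs writes at full speed.
        chunk_mb = min(remaining_mb, cache_mb)
        elapsed_s += chunk_mb / burst_mb_s
        remaining_mb -= chunk_mb
        if remaining_mb > 0:
            # Cache is full and more data is waiting: the drive accepts
            # nothing while it moves data into the shingled area.
            elapsed_s += stall_s
            stalls += 1

    return total_gb * 1000.0 / elapsed_s, stalls


def host_gives_up(stall_s: float, host_timeout_s: float) -> bool:
    """True if a stall outlasts the host's (assumed) patience."""
    return stall_s > host_timeout_s


print(simulate_long_write(10.0))   # fits in the cache, so no stalls
print(simulate_long_write(100.0))  # repeated stalls drag the average down
print(host_gives_up(60.0, host_timeout_s=30.0))
```

Any write larger than the cache produces at least one stall in this model, and a rebuild onto a fresh drive is one very long write, so it would hit a stall after every cache-sized chunk.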

So the two main points are:

  1. Archive drives aren’t great for RAID because they are erratic. And if one fails and you replace it, the new drive is going to be hit with a long continuous write which will run into terabytes as the data is rebuilt onto it, and they don’t really like that.

but also:

  2. There is no reason why they could not be made to work in Drobo. It just needs the next Drobo firmware to be programmed to recognise that “this is an Archive drive, it does the following strange behaviours: etc…” and then it will be fine. But at the moment Drobos do not know about Archive drives, so they freak out when they see the Archive drives being odd and kick them from the array, thinking they are failing or have failed.

Also… a regular diagnostic test (like Seagate’s) unfortunately does not test everything about the drive. The Drobo has much tighter limits as to what it considers a pass/fail. Drobo can get over-cautious when it comes to protecting data. Your drive would most likely work just fine as a stand-alone drive.

Maybe the SATA spec could also benefit from a “please wait” command, so that any drive that is still trying to read or do something, but has not hung, simply issues a “please wait” (or “please hold”) to the host device at regular intervals. The host would then be able to recognise that and act appropriately, possibly by passing a “please wait” on to its own host, such as the operating system in this case.
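As a rough illustration of the idea, here is a Python sketch of a host that extends its deadline each time a drive sends a “please wait”. Everything here is hypothetical: the message names `"wait"` and `"done"`, the function `host_verdict` and the 10-second timeout are invented for the example and do not correspond to any real SATA command.

```python
def host_verdict(events: list[tuple[float, str]], timeout_s: float) -> str:
    """Decide whether a host treats a drive as failed.

    `events` is a time-ordered list of (seconds since the command was
    issued, message) pairs, where message is the hypothetical "wait"
    keep-alive or "done" for command completion.
    """
    deadline = timeout_s
    for t, message in events:
        if t > deadline:
            # Nothing arrived before the deadline: give up on the drive.
            return "failed"
        if message == "done":
            return "ok"
        if message == "wait":
            # The drive says it is busy but alive: push the deadline out.
            deadline = t + timeout_s
    # The drive went quiet without ever completing the command.
    return "failed"


# A drive that stays silent for 60 seconds, then completes.
silent = [(60.0, "done")]

# The same 60-second operation, with a "please wait" every 5 seconds.
polite = [(float(t), "wait") for t in range(5, 60, 5)] + [(60.0, "done")]

print(host_verdict(silent, timeout_s=10.0))
print(host_verdict(polite, timeout_s=10.0))
```

With the same 60-second operation, the silent drive is marked as failed while the one that keeps sending “please wait” is not, which is the behaviour the suggestion is after.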
