Drobo-FS file system recovery

I have a friend who asked me to assist him with data recovery on his Drobo-FS. As a Data Storage Architect who usually works on large-scale enterprise storage arrays, I felt I could help him out. So he brought over his Drobo-FS with a set of cloned drives and a USB Drobo that he had tried to do the recovery on with Drobo support.

Problem #1: After plugging in the Drobo (USB) unit I quickly discovered that it was stuck in the ever-popular Drobo reboot of death. Based on what I had been told, the reboots followed a firmware update. From all the posts I have read, this is a well-known issue with Drobo. Without any good way of recovering the USB unit, I turned my attention to the Drobo-FS. I moved the cloned drives into the NAS unit (keeping them in order) and began the long task of rebuilding the data.

Problem #2: One of the drives was showing up in a failed state. Now, I’ve seen soft errors trigger a drive failure before, so I powered down the NAS unit and took the drive out to run SeaTools on it. I discovered that the drive needed a critical and mandatory firmware update, so I downloaded the firmware and applied it to the drive. I also decided to check the rest of the drives and found that 3 of the 5 drives needed an update. Once I had updated the drives and reinserted them into the NAS enclosure, I noticed that the failed drive was now showing up in a different slot, even though I had kept the drives in the same slot order.

Problem #3: Several of the posts that I came across describe a procedure to drain the unit of all power by removing the case, extracting the 3V battery from the main board, waiting three hours to “drain the capacitors” and then reassembling. Well, let me tell you, that might work for some models but not for an FS. Since I had the USB Drobo at my disposal and the units are out of warranty, I’m not afraid to do a little part swapping. I replaced the drive backplane in the NAS unit with the one I pulled out of the failed USB unit and reinserted the drives. I powered up the unit and, lo and behold, no failed drives. The NAS enclosure then spent the next 96 hours trying to recover the data. The recovery process did complete; however, when I tried to get into the file system, it wouldn’t mount. Go figure!!

Problem #4: The Drobo-FS runs a Linux kernel but has no tools installed to get to the command line out of the box. No internal web interface, no SSH, and no way to load SSH or anything else onto a unit with a failed file system, because you can’t get to the \drobo-apps folder. This is a major flaw in the design of the Drobo products. Even worse, the NAS enclosure has no serial port that I can plug a cable into to get to the OS! So Drobo has a great marketing strategy going. Fail one Drobo, and have to buy another to recover the bad one. Go Drobo!! Based on all the posts I read, this is the only way you will stay in business. I can think of several other enterprise-level manufacturers that have tried the same approach and failed, or should I say gone out of business.

Problem #5: So I’m now sitting with a Drobo-FS that I can’t mount even after performing the Drobo “repair” several times, I can’t load any tools to recover the file system, and I have no other option than to reinitialize the array? REALLY??? If you are going to be a storage company, you have to provide the end user with the tools to recover their precious data. Even if that data is nothing more than the user’s ability to hoard every one and zero they have ever saved.

The Drobo line of products is a great concept; however, the product has several flaws. Start with stable hardware. Spend a few extra bucks and put in components that have tighter tolerances, higher MTBF ratings, and are not made in a third world country. Add some cache to the system with battery backup so that pending writes are not lost. If I had to guess, this is where the majority of the file system corruption is occurring. Load a few tools on the device as part of the factory build. Give the end user the ability to FUBAR their own array. At least then you could charge for support calls and it would be legit. Build some checks into firmware updates to verify that drive firmware is up to date and, if it is not, download it and install it on the affected drives. I have seen many data loss issues caused by drive firmware. But I have also had the tools available to me to recover the data from that type of failure.
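To make the drive firmware check a little more concrete, here is a rough sketch of what it could look like. This is not anything Drobo ships. It assumes the identity text for each drive comes from something like `smartctl -i`, which prints "Device Model:" and "Firmware Version:" lines for ATA drives. The advisory table, the model name and the firmware revisions are all made-up placeholders.

```python
# Hypothetical advisory table: drive model -> firmware revisions that need an update.
# These entries are placeholders for illustration, not real vendor advisories.
NEEDS_UPDATE = {"EXAMPLE-MODEL-1TB": {"FW01", "FW02"}}


def parse_identity(info_text):
    """Pull the model and firmware revision out of `smartctl -i` style text."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only keep lines of the form "Key: value"
            fields[key.strip()] = value.strip()
    return {
        "model": fields.get("Device Model"),
        "firmware": fields.get("Firmware Version"),
    }


def needs_firmware_update(identity, table=NEEDS_UPDATE):
    """True if this model/firmware pair is listed in the advisory table."""
    return identity["firmware"] in table.get(identity["model"], set())


# Sample text in the same shape as the identity section of `smartctl -i`.
SAMPLE = """Device Model:     EXAMPLE-MODEL-1TB
Firmware Version: FW01
"""

if __name__ == "__main__":
    drive = parse_identity(SAMPLE)
    if needs_firmware_update(drive):
        print("Drive", drive["model"], "firmware", drive["firmware"], "needs an update")
```

A check like this could run before an array firmware update and refuse to proceed, or at least warn, while any drive is still on a flagged revision.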

I challenge Data Robotics to implement some changes to truly provide a class of storage arrays that fits the prosumer market. If I’m going to pay $1000 to $1500 for a storage device with drives, it had better be worth the money. No, this is not an enterprise storage device, but data is important and data loss can cost a business their business. I wouldn’t want to be responsible for that.