5N won't finish rebuild

Hi,

Here’s the timeline of events.

  • Array with 5 drives; one of them, a 3TB, fails and the 5N goes “critical” (red), as it was 90% full.
  • Shut down the Drobo until I can buy a new drive.
  • Buy a new 4TB Seagate, insert it in place of the old 3TB “failed” drive.
  • Turn the 5N on. It goes into the “rebuilding” process. It tells me it should take approx. 24 hours.
  • I leave it overnight, and the next day I find the 5N in an endless cycle of reboots. It starts the rebuild process, but reboots itself shortly (2 mins) into it.

Steps I’ve taken:

  • The “plug 1 drive at a time” routine. Once I plugged all drives back in, got the same result.
  • Removed the 4TB drive, and booted the 5N back up. It goes “critical” again, but I can at least mount the volume and read the files. I then insert the 4TB drive back, with the 5N on. It goes into the “rebuild” process, and keeps at it for a few hours, until it eventually reboots. I did this a few times, and it seems to reboot every time at the same spot, with “14 hours left”.

The 5N is out of warranty, and Drobo went as far as to tell me to junk this unit and buy a new one.

Is that really what I need to do?

Thanks,
Alex.

hi just working backwards up the page…
for warranty, have you looked at drobocare? the 5n still seems relatively new and maybe a drobocare to get a replacement unit under it will be much cheaper than to buy a new one?

Usually when a drive fails, the faulty drive can be ejected (while drobo is on), and then a replacement inserted (while drobo is on).

(from a user perspective, i would also expect the firmware to be able to recognise whether the unit was powered off, and faulty drive removed, and new drive put in, even when then powered on but i need to check if that is 100% the case)

it sounds like you are having the reboot loop issue that ive seen quite a bit recently. i think the high level answer from a mod was that bad blocks on some of the existing disk pack drives were causing the reboots, even though those drives had not fully failed. the suggestion was to nondestructively block clone each hard drive (most probably professionally if unsure), and then do a COLD diskpack swap with the power off, using the new error-free clones, which might let the rebuild complete.

lets see what others think too in case i am mistaken, but id definitely suggest looking into drobocare to see what options and costs you have.

btw if you have a moment, could you also please list which drives /sizes you had in which slots?
and whether single (SDR) or dual (DDR) disk redundancy was being used?

Sure, let’s go:

Slot 1: Seagate 4TB
Slot 2: Hitachi 4TB
Slot 3: Hitachi 2TB
Slot 4: WD 3TB
Slot 5: Seagate 4TB (was Seagate 3TB = failed)

Using single redundancy. Ah, interesting thing that I noticed now: the WD drive in slot 4 says “Healed”, while the others say “Good”… “Healed”? WTF?
I did try hot swapping the new drive in, after successfully booting the Drobo 5N into “critical” condition (without the drive in the lower bay). Does the same thing, starts the rebuild process, stays there for a few hours, and when there’s “approx. 14 hours remaining”, it’ll reboot by itself.
Bad blocks can’t (or rather, shouldn’t) throw the whole thing into a loop. This is something that should be fixed via software, not via replacement hardware…
Interesting… I was logged in, via SSH, while the machine rebooted…
These are the last messages I saw on /var/log/messages before rebooting:

Jul 7 16:37:58 shockwave user.info kernel: [15464.091438] dri_dnas_abort: called to abort cmnd: cdb=ea b1fe33a0, b05953e8
Jul 7 16:38:08 shockwave user.info kernel: [15474.091420] dri_dnas_abort: called to abort cmnd: cdb= 0 b1fe33a0, b0595410
Jul 7 16:38:08 shockwave user.info kernel: [15474.098418] dri_dnas_device_reset: called to reset device, cmnd: b1fe33a0
Jul 7 16:38:08 shockwave user.info kernel: [15474.105249] sd 0:0:0:0: Device offlined - not ready after error recovery

Does this help?
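
For anyone wanting to sift a longer capture of /var/log/messages, here is a minimal Python sketch that pulls out the "Device offlined" events and the SCSI address they refer to. The function name and regexes are illustrative assumptions, not anything Drobo ships. The `sd 0:0:0:0` part is the usual Linux host:channel:target:lun address; how that maps to a physical bay on the 5N is not something this thread establishes.

```python
import re

# Matches the kernel part of a syslog line: "kernel: [uptime] message".
KERNEL_LINE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$")
# Matches the SCSI disk message seen just before the reboot.
OFFLINED = re.compile(r"^sd (?P<hctl>\d+:\d+:\d+:\d+): Device offlined")


def offlined_devices(lines):
    """Return (uptime_seconds, 'host:channel:target:lun') for each offlined disk."""
    events = []
    for line in lines:
        kernel = KERNEL_LINE.search(line)
        if kernel is None:
            continue  # not a kernel message
        hit = OFFLINED.match(kernel.group("msg"))
        if hit:
            events.append((float(kernel.group("uptime")), hit.group("hctl")))
    return events


# Two of the lines quoted above, used as sample input.
sample = [
    "Jul 7 16:38:08 shockwave user.info kernel: [15474.098418] "
    "dri_dnas_device_reset: called to reset device, cmnd: b1fe33a0",
    "Jul 7 16:38:08 shockwave user.info kernel: [15474.105249] "
    "sd 0:0:0:0: Device offlined - not ready after error recovery",
]
print(offlined_devices(sample))
```

If the same address shows up before every reboot, that would at least suggest one particular disk is the one dropping out, rather than the rebuild failing at random.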

  1. With the replacement drive in slot 5 (bottom-most), the rebuild proceeds but eventually ends in a reboot loop.

  2. Without the replacement drive in slot 5 (bottom-most), the rebuild starts, and you are able to mount and access the shares and data. The question is: is there any reboot loop in this case?

Let’s assume for a while that in (2) the rebuild did complete and all 4 drive bays are green. Because total used space is high at 90%, the last bay will light up red to ask you to insert a drive.

If so, the 4 drives are most likely in “good” condition and the rebuild is able to complete. The rebuild needs to read the parity (read/write) and redistribute data across the remaining 4 drives.
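
A rough way to check whether scenario (2) can even fit on four drives is sketched below in Python. It assumes the common rule of thumb that, with single-disk redundancy, usable space is roughly the sum of all drives minus the largest one; that rule, the function name, and reading "90% full" as 90% of the original usable space are all assumptions, and real overhead is ignored.

```python
def usable_tb(drive_sizes_tb):
    """Rule-of-thumb usable capacity (TB) with single-disk redundancy.

    ASSUMPTION: usable space is about sum(drives) - largest drive.
    Filesystem and BeyondRAID overhead are not modelled.
    """
    if len(drive_sizes_tb) < 2:
        return 0  # no redundancy possible with fewer than two drives
    return sum(drive_sizes_tb) - max(drive_sizes_tb)


original_pack = [4, 4, 2, 3, 3]   # slots 1-5 before the failure
four_drives = [4, 4, 2, 3]        # failed 3TB drive removed
with_new_drive = [4, 4, 2, 3, 4]  # new 4TB drive in slot 5

# "90% full" from the first post, taken as 90% of the original usable space.
used_tb = 0.9 * usable_tb(original_pack)

for name, pack in [("original", original_pack),
                   ("four drives", four_drives),
                   ("with new 4TB", with_new_drive)]:
    print(name, usable_tb(pack), "TB usable; data fits:", used_tb <= usable_tb(pack))
```

Under those assumptions the four-drive pack comes out smaller than the amount of data already stored, so a red light in (2) may simply mean the data cannot be re-protected without a fifth drive, not that the rebuild finished cleanly.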

When the new Seagate 4TB is inserted in slot 5, the rebuild starts to distribute data and redo the parity with the newly inserted drive in the pack. So basically, data will be written to the new 4TB drive. But we now know that a reboot was triggered halfway through.

My take: maybe, just maybe, this new 4TB drive has some sort of “fault”. Yes, it is a new drive, I know, but I have come across a newly purchased drive with tons of bad sectors before. It is new to the user who just bought it, but you do not know how long it has been sitting on a shelf or in storage, how it was transported, or whether the storekeeper dropped it and put it back on the shelf. :)

If there is another used old 3TB drive lying around, maybe you can insert that into slot 5 and try it.

I guess there is a reason why rebuilding with the new Seagate 4TB drive in slot 5 ends in a reboot.

Yes, it has crossed my mind that the new 4TB drive might have problems :)
But you gotta admit that no matter how faulty the drive is, the Drobo shouldn’t just up and reboot itself when finding an error in the rebuild process.
I don’t have any extra drives, so I’ll have to go back to the store and replace the 4TB unit…

Thanks,
Alex

A couple of things regarding Drobo Rebooting.

If the Drobo reboots with no drives installed, the issue is with the Drobo.
If the Drobo only reboots with drives installed, one of your drives is bad.

If one of your drives failed during the data protection or rebuild process, this will cause the Drobo to reboot or go into a “too many hard drives removed” situation. In this case the bad drive needs to be block-level cloned so the rebuild can continue.

Also, we do not recommend hot-adding drives, as this could cause your drives to be reformatted and the data erased.
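
To illustrate what "block-level cloned" means, here is a minimal Python sketch that copies a source to a destination block by block and zero-fills any block that cannot be read, so every offset on the copy lines up with the original. The function name and block size are assumptions for illustration. It is not the procedure Drobo support uses, and for a real failing disk a dedicated rescue tool or a professional service is the safer route.

```python
import os


def clone_blocks(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block; return offsets of blocks that were
    unreadable or short and were therefore padded with zeros."""
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total bytes to copy
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = b""  # read error: treat the whole block as unreadable
            if len(data) < want:
                bad_offsets.append(offset)
            # Zero-pad so the destination keeps the same layout as the source.
            dst.write(data.ljust(want, b"\0"))
            offset += want
    return bad_offsets
```

Called as `clone_blocks(source, target)` with two paths of your own choosing, it returns the list of offsets it had to pad; an empty list means every block was read cleanly. The target must be at least as large as the source and must not hold anything you want to keep.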

Thanks for your reply!

I don’t think the Drobo should reboot when it encounters a drive problem. That’s just not clean, and reeks of bad programming/software. The drive didn’t fail during the rebuild process, at least that’s not what Drobo tells me. They’re all flagged green.
I had one drive fail (cleanly, Drobo told me so, as it should). I replaced it with a brand new one (cold, not hot), and the reboots started.
Could it be I have ANOTHER bad drive (that Drobo is not reporting), that is not the brand new one, but one of the other 4 existing drives?
With that said, how do I find out which drive is bad, given my situation?
Should I try the rebuild process with 4 drives, alternating them? I’m down one 3TB drive, and since the rebuild process hasn’t finished, I’m afraid of removing yet another drive, and damaging/losing data in the process…

thanks,
alex

A couple of points.

People are trying to help you but you’re repeatedly complaining that the Drobo shouldn’t behave the way it is doing. You may well be right but that’s a separate argument for another day, as was the shoddy treatment when you called the help line.

You should replace a failed drive with the power on. As Paul suggests, it may be that you can replace it with the power off but it is not the accepted route. With the power on, the Drobo detects the removal of the failed drive and the insertion of the new one. I’m not sure what you mean by the ‘“plug 1 drive at a time” routine’ but, as DroboMod points out, you shouldn’t be inserting drives from your disk pack into an empty Drobo one at a time. It is safe, however, to power down, remove all the drives and power up again to test whether the problem is related to the Drobo itself, as DroboMod implies. So you might like to do that first in order to break the problem into two parts. If the Drobo powers up and doesn’t enter the reboot cycle you have eliminated it and confirmed that the problem lies with your set of drives.

johnm,

Your advice goes against what the DroboMod just wrote. You say I should replace drives with the power on, also known as “hot swapping”. The DroboMod just wrote that’s a no-no. So what is it?

I’ve tried both ways, with the same results.

Right now, I have the Drobo working with 4 drives, in “Red” condition, as there’s simply not enough space. I can’t insert the old drive that was in there, as it has failed. If I insert the new one, either hot or cold, it’ll start the rebuild process, but once there’s 14 hours left in the process (always at the same spot, apparently), it’ll go into a reboot cycle.

I’ll eventually replace this new drive I just bought, and that might well stop the Drobo from rebooting. But I still think the Drobo is to blame. I bought it so it could tell me when a drive fails. I didn’t buy it so it could go into random reboot mode whenever it thinks it should. This is something that Drobo could address via software, and if they’re not doing it, it’s, again, their fault.

I did submit a ticket, with diagnostics and everything, but I was just referred to this board instead. I really don’t think this will be fixed…

hi, if we take a step back for a moment, can i check something please:

a) if it is the rebuild having the problem (and rebooting), can dashboard currently still see the drives and drobo?
b) can your computer currently see the drobo drive letter (or multiple volume letters as per your config)?
c) can you currently still access your data?

ultimately if the answer to c) is yes (for a good amount of time) or yes, in general, then i would like to suggest the following:
to stop any plans of repairing or fixing the current problem for now, and to start copying the data onto another device and verifying it, just to make a backup while the data is still reachable. then we can continue with further steps later to see if the current reboot problem can be resolved. (eg to play safe and back up the data while we can, in case something goes wrong or there is a human error?) :)

just a thought…

Paul,

a) Yes, Dashboard sees the unit. And will, from time to time, “lose” the Drobo, until it reboots again, at which point, Dashboard will see it again (and notify me, via MacOS’ notification system), and start the whole rebuild process again.
b) It will mount the volume, yes.
c) Yes. Only for a short while, of course, as there’s rebooting going on.

I have a backup of all the data. I know better than to trust a Drobo with all my music. But I’d still like to sort this out.

No it doesn’t. I agree with what DroboMod wrote and I said so. You’re confusing two different processes. If you swap a drive it should be done with the power on. If you remove all the drives, in order to test power up the empty Drobo, and replace them again afterwards, it should be done with the power off.

hi how is it going so far, has it stabilised for you asiufi