This is an automatically generated mail message from mdadm
running on obiwan
A Fail event had been detected on md device /dev/md0.
Faithfully yours, etc.
oh yay! failed drives! luckily obiwan is still the "sandbox system" for now - it was supposed to be turned into my main externally-facing server once i was done with openwrt/dmz setup/etc. so much for good intentions - i'll never get this shit done.
so, i at least had the forethought to mirror the drives: two 60GB ATA100 drives, good ol' hda and hdb. on each drive, i created two partitions - the first is a boot partition (/boot on hda, /boot2 on hdb) and the second is half of md0, a raid1 device. on top of md0 i built some logical volumes with LVM2; i usually name them something like /dev/linux/root, /dev/centos/usr or /dev/obiwan/home. as for the boot partition on the second drive, i thought i was doing the right thing by running:
rsync -av --delete /boot /boot2
...to keep the kernel/initrd on the second drive in sync whenever a yum update pulled in a new kernel. but that's only half of it: the second drive never got a bootloader of its own. in today's case it was hda that failed, which brings us to the crux of the problem: where's your bootloader now, eh? basically, nowhere. i'm screwed. so, i broke out the knoppix dvd and got to installing a bootloader on the second drive so i could bring the system up. how could i have prevented this from happening?
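side note on how you even find out it was hda: the mdadm mail only names the array (/dev/md0), not the disk. /proc/mdstat has the rest: a healthy two-disk raid1 shows [UU] on its status line, a degraded one shows an underscore where the missing member should be, and a kicked-out member gets tagged (F). here's a minimal bash sketch for checking that. the helper names degraded_arrays and failed_members are made up, and it assumes the classic mdstat layout where the line after "md0 : ..." ends in the [UU]/[U_] brackets.

```shell
#!/usr/bin/env bash
# hypothetical helpers for reading /proc/mdstat (or a saved copy of it)

# degraded_arrays: print each md device whose member-status brackets
# contain an underscore, e.g. [_U] or [U_], meaning a member is missing.
degraded_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { dev = $1; next }      # remember the array name
        dev != "" && $NF ~ /^\[[U_]+]$/ {     # its status line, e.g. [_U]
            if ($NF ~ /_/) print "/dev/" dev
            dev = ""
        }
    ' "$mdstat"
}

# failed_members: print member devices tagged (F), e.g. "hda2[0](F)" -> hda2
failed_members() {
    local mdstat="${1:-/proc/mdstat}"
    grep -o '[[:alnum:]]*\[[0-9]*](F)' "$mdstat" | sed 's/\[.*//'
}
```

run both with no argument to read the live /proc/mdstat. on a box in obiwan's state you'd expect degraded_arrays to print /dev/md0 and failed_members to name the hda partition, assuming the dead member hasn't been removed from the array yet.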
well, i think i have it worked out:
- edit /boot/grub/device.map. make sure there's an entry for the second device there. in my case, it would be:
(hd1) /dev/hdb
- since grub-install wants its files in /boot/grub relative to the grub root (the partition grub boots from, which is very different from the system root "/"), i gave it a little symlink hack so that /boot/boot points back at /boot itself:
cd /boot; ln -s . boot
- clean-up! get rid of all those old kernels that were installed with yum update:
rpm -e kernel-old-version-blah
- re-sync everything:
rsync -av --delete /boot /boot2
- now install grub on the second drive:
grub-install --root-directory=/boot2 /dev/hdb
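the steps above could be rolled into a script you re-run after every kernel update, so the mirror boot partition never drifts. this is a minimal sketch, not a tested tool: the function names are hypothetical, it assumes /boot2 is mounted, the second disk is /dev/hdb and device.map is at /boot/grub/device.map (all overridable), and DRY_RUN=1 only prints what it would do. the one-time symlink hack and the rpm -e cleanup are left out on purpose, since which kernels to remove is a judgment call.

```shell
#!/usr/bin/env bash
# sync_boot_mirror: hypothetical wrapper around the steps above.
# ASSUMPTIONS: mirror boot partition mounted at /boot2, second disk is
# /dev/hdb, grub legacy device.map at /boot/grub/device.map. override any
# of them via the variables below; DRY_RUN=1 prints instead of executing.

DEVICE_MAP="${DEVICE_MAP:-/boot/grub/device.map}"
SECOND_DISK="${SECOND_DISK:-/dev/hdb}"
MIRROR_BOOT="${MIRROR_BOOT:-/boot2}"
DRY_RUN="${DRY_RUN:-0}"

# run a command, or just print it when DRY_RUN=1
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make sure device.map maps the second disk; "(hd1)" matches the post's setup
ensure_device_map_entry() {
    if [ -f "$DEVICE_MAP" ] &&
        awk -v d="$SECOND_DISK" '$2 == d { found = 1 } END { exit !found }' "$DEVICE_MAP"; then
        return 0    # already mapped, nothing to do
    fi
    if [ "$DRY_RUN" = 1 ]; then
        echo "append '(hd1) $SECOND_DISK' to $DEVICE_MAP"
    else
        printf '(hd1)\t%s\n' "$SECOND_DISK" >> "$DEVICE_MAP"
    fi
}

sync_boot_mirror() {
    ensure_device_map_entry
    # same rsync as in the post: copy kernels, initrds and grub files over
    run rsync -av --delete /boot "$MIRROR_BOOT"
    # install grub on the second disk, with its images under $MIRROR_BOOT/boot/grub
    run grub-install --root-directory="$MIRROR_BOOT" "$SECOND_DISK"
}
```

source it, preview with DRY_RUN=1 sync_boot_mirror, then run sync_boot_mirror as root. one thing worth knowing about that rsync line: with no trailing slash on /boot, rsync copies the directory itself, so everything lands in /boot2/boot/... rather than directly in /boot2. that happens to line up with grub-install --root-directory=/boot2, which puts its images under /boot2/boot/grub.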
i think that should do it. i'm going to see if there's a way i can test this - maybe i'll pull some of the really old 2GB drives out of the closet and put them in the test system to simulate a failure.
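you don't necessarily need to pull drives to rehearse the raid side of it: mdadm can mark a member faulty and remove it in software, which degrades the array much like a dead disk would. a minimal sketch, meant for the test box rather than obiwan; the helper names are hypothetical, and the member name /dev/hda2 assumes md0 is built from the second partitions. caveat: this only exercises md. it won't tell you whether the BIOS can actually boot from hdb, so for the bootloader test you still have to power off and unplug hda.

```shell
#!/usr/bin/env bash
# hypothetical helpers for rehearsing a disk failure on a TEST box.
# ASSUMPTION: md0 is built from the second partitions (/dev/hda2, /dev/hdb2).
# DRY_RUN=1 prints the mdadm commands instead of running them.

DRY_RUN="${DRY_RUN:-0}"

# run a command, or just print it when DRY_RUN=1
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mark a member faulty, pull it out, then show the (now degraded) array
simulate_member_failure() {
    local array="$1" member="$2"
    run mdadm "$array" --fail "$member"
    run mdadm "$array" --remove "$member"
    run mdadm --detail "$array"
}

# put the member back; raid1 then resyncs it from the surviving half
restore_member() {
    local array="$1" member="$2"
    run mdadm "$array" --add "$member"
}
```

e.g. simulate_member_failure /dev/md0 /dev/hda2, poke around, then restore_member /dev/md0 /dev/hda2 and watch /proc/mdstat for the resync. if mdadm --monitor is running, the --fail step should produce the same kind of Fail mail as the one above; mdadm --monitor --scan --oneshot --test also mails a TestMessage per array if you just want to check that notifications get through.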
Update: so much for that... i just got:
This is an automatically generated mail message from mdadm
running on obiwan
A DegradedArray event had been detected on md device /dev/md0.
Faithfully yours, etc.
ding-dong, the system's dead. if i'm gonna be using knoppix so much, maybe i should re-download the latest dvd. sigh