RAID1 drives give you redundancy: if one disk fails, the system can be brought back up from the mirrored drive.
Let us look at a disk failure on one of our Linux machines.
The following command shows the current RAID status:
server1:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3
4594496 blocks [2/1] [_U]
md1 : active raid1 sdb2
497920 blocks [2/1] [_U]
md0 : active raid1 sdb1
144448 blocks [2/1] [_U]
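The output shows that the first drive (sda) has gone bad: each array lists only its sdb partition, and [_U] means the first of the two mirror members is missing.
You can investigate further with the mdadm command (the two forms below are equivalent; -D is the short form of --detail):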
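Spotting the underscore by eye works for three arrays, but the check can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name degraded_arrays is made up for illustration, and it reads mdstat-formatted text on standard input so it can be tried on saved output first.

```shell
#!/bin/sh
# Print the name of every md array whose status field contains "_",
# i.e. an array with at least one missing member.
# degraded_arrays is a hypothetical helper name, not an mdadm feature.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                   # remember the array being described
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }  # status such as [_U] or [U_]
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

With the output shown above, this would print md2, md1 and md0, one per line.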
# mdadm --detail /dev/md0
# mdadm -D /dev/md0
The output confirms which drive has gone bad.
If your server is unstable, you might consider removing the bad drive and temporarily booting from the second drive. For this, make sure GRUB is installed on the second drive as well, so that the machine boots from it without trouble. It is best practice to install GRUB on both drives right after configuring RAID1. If that was not done, it is not too late: you can still do it before rebooting the machine for the disk removal. Failing that, GRUB can also be installed easily from rescue mode.
To install GRUB on a running server:
With GRUB v1.x, go to the grub prompt.
Look for existing GRUB setups using the find command:
grub> find /grub/stage1
If any exist, find lists the devices that contain the file;
otherwise you will have to continue with the GRUB setup as follows,
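A minimal sketch of such a setup, under stated assumptions: /boot is the first partition on each drive (matching md0 built from sda1 and sdb1 above), and GRUB sees the two drives as hd0 and hd1. The function name grub_legacy_commands is hypothetical; it only prints the root and setup commands so they can be reviewed before being typed at the grub prompt or piped into the GRUB shell.

```shell
#!/bin/sh
# Print GRUB legacy (v1.x) commands that install GRUB to the MBR of the
# first two BIOS drives. Assumption: /boot is partition 0 on each drive;
# adjust (hdN,0) if your layout differs.
grub_legacy_commands() {
    for n in 0 1; do
        printf 'root (hd%s,0)\nsetup (hd%s)\n' "$n" "$n"
    done
    printf 'quit\n'
}

# Review the output first, then run it as root, for example:
#   grub_legacy_commands | grub --batch
```

If only one drive is currently visible to GRUB, the hd1 pair will fail and only the hd0 pair applies.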
The above lines set up GRUB on the MBR of both drives. Depending on which drives are currently available on the machine and on the status of your RAID, you can follow the above instructions to recover GRUB while troubleshooting RAID1 setups.
If you're on GRUB v2.x, the command grub-install /dev/sdX should do all the work. (X in /dev/sdX is the drive letter; for example, to install GRUB on the first drive, sda, replace X with a.)
Once GRUB is installed on the drive, you can remove the bad drive from the RAID array using mdadm commands.
In our case (from the initial mdstat output), we mark the bad drive as failed and remove it from the RAID array as follows:
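To cover both mirror members in one go, the same command can be wrapped in a loop. This is only a sketch: install_grub_on and the DRY_RUN switch are names invented here, and some distributions ship the command as grub2-install instead of grub-install.

```shell
#!/bin/sh
# Install GRUB 2 on every drive passed as an argument.
# DRY_RUN is a hypothetical switch of this sketch: while it is 1 (the
# default), the commands are only printed, not executed.
install_grub_on() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "grub-install $disk"
        else
            grub-install "$disk" || return 1
        fi
    done
}

# Example, as root, once the printed commands look right:
#   DRY_RUN=0 install_grub_on /dev/sda /dev/sdb
```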
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1
Repeat these commands for the other arrays too (here md1 with /dev/sda2 and md2 with /dev/sda3).
Now you are good to go ahead, shut down the system, and remove the drive. If you have a replacement drive, it is better to add it before rebooting and then follow the instructions for rebuilding the RAID arrays.
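The repetition can be scripted. The sketch below assumes the array-to-partition pairing seen in the mdstat output (md0 with sda1, md1 with sda2, md2 with sda3); drop_members and DRY_RUN are hypothetical names, and by default the function only prints the mdadm commands it would run.

```shell
#!/bin/sh
# For each "array:partition" argument, mark the partition as failed and
# then remove it from the array. With DRY_RUN=1 (the default in this
# sketch) the commands are printed instead of executed.
drop_members() {
    for pair in "$@"; do
        md=${pair%%:*}     # text before the colon, e.g. /dev/md0
        part=${pair#*:}    # text after the colon, e.g. /dev/sda1
        for action in --fail --remove; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "mdadm --manage $md $action $part"
            else
                mdadm --manage "$md" "$action" "$part" || return 1
            fi
        done
    done
}

# Example for the three arrays in this article:
#   drop_members /dev/md0:/dev/sda1 /dev/md1:/dev/sda2 /dev/md2:/dev/sda3
```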
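For orientation, a rebuild usually amounts to copying the partition layout from the surviving drive and adding the new partitions back to each array. The sketch below only prints such a plan and rests on several assumptions not stated in this article: the drives use an MBR partition table, partitions 1 to 3 map to md0 to md2 as in the mdstat output, and the replacement drive shows up under the name you pass in. rebuild_plan is a hypothetical helper name.

```shell
#!/bin/sh
# Print the commands for a typical RAID1 rebuild after a drive swap.
#   $1 = surviving drive (e.g. /dev/sdb), $2 = replacement drive (e.g. /dev/sda)
# Nothing is executed; check every line against your own layout first.
rebuild_plan() {
    good=$1
    new=$2
    # Copy the partition table from the good drive to the new one.
    echo "sfdisk -d $good | sfdisk $new"
    # Add each new partition to its array: partition N goes to md(N-1).
    for n in 1 2 3; do
        echo "mdadm --manage /dev/md$((n - 1)) --add $new$n"
    done
}

# Example: rebuild_plan /dev/sdb /dev/sda
```

Once the partitions are added, cat /proc/mdstat shows the resynchronisation progress.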