Talk:Raid:Manual Rebuild

From SME Server
Revision as of 22:04, 6 February 2013 by Stephdl (talk | contribs)

Please see my remarks at User_talk:Davidbray -- Cactus (talk | contribs) 16:48, 19 March 2010 (UTC)

Thanks Cactus - I've made some changes here so look forward to your feedback

HowTo: Write the GRUB boot sector

Trex (talk) 00:26, 5 February 2013 (MST) Should add a note, as per comment 24 in this Bug, that grub will not install on an unpartitioned disk

Stephdl (talk) 12:24, 6 February 2013 (MST) ok i work on the howTo...work in progress, don't disturb :p



HowTo: Remove a disk from the RAID1 Array from the command Line

Look at the mdstat

First we must determine which drive has failed.


[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>

(S) = spare, (F) = failed, [0] = number of the device within the array


Note:
The partition sdb2 has failed, as shown by the flag sdb2[2](F). The disk sdb needs to be resynchronized with the existing array md2.
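On a machine with many arrays it can help to pick out the failed members automatically. The following is a minimal sketch, assuming the mdstat layout shown above; the function name list_failed_members is made up for this example and is not part of mdadm.

```shell
# list_failed_members is a hypothetical helper name, not an mdadm command.
# It prints "array member" for every member flagged (F).
# $1: mdstat-format file to read; defaults to the live /proc/mdstat
list_failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                member = $i
                sub(/\[.*/, "", member)   # drop the "[2](F)" suffix
                print $1, member
            }
    }' "${1:-/proc/mdstat}"
}
```

Called without an argument it reads /proc/mdstat; for the output above it would report sdb2 as the failed member of md2.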


Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
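When a disk has partitions in several arrays, the fail and remove commands can be generated from mdstat instead of typed by hand. This is only a sketch under the assumption that members are named like sdb1, sdb2; print_detach_commands is a hypothetical name, and it deliberately prints the commands rather than running them.

```shell
# print_detach_commands is a hypothetical helper; it only PRINTS commands.
# $1: disk name without /dev/ (for example sdb)
# $2: mdstat-format file to read; defaults to /proc/mdstat
print_detach_commands() {
    awk -v disk="$1" '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++) {
            member = $i
            sub(/\[.*/, "", member)            # "sdb2[2](F)" -> "sdb2"
            if (member ~ ("^" disk "[0-9]+$")) {
                print "mdadm --manage /dev/" $1 " --fail /dev/" member
                print "mdadm --manage /dev/" $1 " --remove /dev/" member
            }
        }
    }' "${2:-/proc/mdstat}"
}
```

Running print_detach_commands sdb lets you review the list first; the printed lines can then be pasted at the root prompt one by one.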

Do your Disk Maintenance here

At this point the disk is idle.

[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
md2 : active raid1 sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>


Note:
You will have to determine whether your disk can be returned to the array. A RAID can get out of sync after a power failure, but also because of a physical fault in the hard disk. If this happens repeatedly, the hard drive needs to be tested. For this we will use smartctl.


To display all the SMART information available for the disk:

[root@ ~]# smartctl -a /dev/sdb

At least two types of test are available: short (about 1 minute) and long (from about 10 minutes to several hours, depending on the size of the disk). Both run in the background on the drive itself.

[root@ ~]# smartctl -t short /dev/sdb   # short test
[root@ ~]# smartctl -t long /dev/sdb    # long test

Once a test has finished, view its results and statistics:

[root@ ~]# smartctl -l selftest /dev/sdb
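The self-test log can get long, so a small filter that shows only the entries that did not finish cleanly may be handy. This is a sketch, assuming the usual ATA log layout in which each entry starts with "# 1", "# 2" and so on, and a good result reads "Completed without error"; the function name failed_selftests is invented here.

```shell
# failed_selftests is a hypothetical helper name.
# It reads "smartctl -l selftest" output on stdin and prints every log
# entry whose status is anything other than "Completed without error"
# (this includes aborted tests and tests still in progress).
failed_selftests() {
    grep '^# *[0-9]' | grep -v 'Completed without error'
}
```

Usage would be smartctl -l selftest /dev/sdb | failed_selftests; no output means every logged test completed without error.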


Note:
If smartctl reveals a physical failure and the disk has to be replaced, install a new disk of the same capacity (or larger) and enter the following commands to recreate the partitions by copying the layout from the healthy disk sda. Check the device names carefully, because the second command overwrites the partition table of sdb.


[root@ ~]# sfdisk -d /dev/sda > sfdisk_sda.output
[root@ ~]# sfdisk /dev/sdb < sfdisk_sda.output
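Before copying the partition table it may be worth confirming that the new disk really is at least as large as the healthy one. A minimal sketch, assuming sizes are compared in 512-byte sectors as reported by blockdev --getsz; the helper name fits_on is made up for this example.

```shell
# fits_on is a hypothetical helper name.
# It succeeds when the target disk has at least as many sectors as the source.
# $1: sector count of the source disk
# $2: sector count of the target disk
fits_on() {
    [ "$2" -ge "$1" ]
}

# Possible use on the live system (left commented out on purpose):
# src=$(blockdev --getsz /dev/sda)
# dst=$(blockdev --getsz /dev/sdb)
# fits_on "$src" "$dst" || echo "sdb is smaller than sda, do not copy the layout"
```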

If you want to reintegrate the same disk without replacing it, go to the next step.

Add the partitions back

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1
[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2

Another Look at the mdstat

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2] sda2[0]
      52323584 blocks [2/1] [U_]
      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec

unused devices: <none>
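To follow the rebuild without reading the whole file each time, the progress line can be extracted per array. This is a sketch only, assuming the recovery line has the layout shown above; rebuild_progress is a hypothetical name.

```shell
# rebuild_progress is a hypothetical helper name.
# It prints "array percent finish=..." for every array that is rebuilding.
# $1: mdstat-format file to read; defaults to the live /proc/mdstat
rebuild_progress() {
    awk '/^md[0-9]+ :/ { array = $1 }   # remember which array we are in
         /(recovery|resync) =/ { print array, $4, $6 }' "${1:-/proc/mdstat}"
}
```

For the output above it would print md2 1.9% finish=14.7min. Alternatively, watch -n 5 cat /proc/mdstat refreshes the full view every five seconds.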