Talk:Raid:Manual Rebuild

From SME Server
Revision as of 21:46, 6 February 2013 by Stephdl (talk | contribs)

Please see my remarks at User_talk:Davidbray -- Cactus (talk | contribs) 16:48, 19 March 2010 (UTC)

Thanks Cactus - I've made some changes here so look forward to your feedback

HowTo: Write the GRUB boot sector

Trex (talk) 00:26, 5 February 2013 (MST) Should add a note, as per comment 24 in this Bug, that grub will not install on an unpartitioned disk

Stephdl (talk) 12:24, 6 February 2013 (MST) OK, I'm working on the HowTo... work in progress, don't disturb :p



HowTo: Remove a disk from the RAID1 array from the command line

Look at the mdstat

First we must determine which drive has failed.


[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>

(S) = spare, (F) = failed, [0] = number of the disk in the array


  Note:
As we can see, the partition sdb2 has failed: it carries the flag sdb2[2](F). We need to resynchronize the disk sdb with the existing array md2.
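
On a box with several arrays it can be handy to pick the (F) flags out of the mdstat automatically. The following is only a sketch: failed_members is a made-up helper name, not part of mdadm, and it assumes the mdstat layout shown above.

```shell
#!/bin/sh
# List every array member flagged (F), as "array device", one per line.
# failed_members is a hypothetical helper name used for illustration.
# Usage: failed_members [mdstat-file]   (defaults to /proc/mdstat)
failed_members() {
    awk '
        $2 == ":" {                       # array line, e.g. "md2 : active raid1 ..."
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)/) {
                    dev = $i
                    sub(/\[.*/, "", dev)  # "sdb2[2](F)" -> "sdb2"
                    print $1, dev
                }
        }
    ' "${1:-/proc/mdstat}"
}

# Example (on the live system):
#   failed_members
```

Run against the mdstat above, this should report md2 and sdb2 only.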


Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
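
If the disk has partitions in many arrays, the fail/remove pairs can be generated from the mdstat instead of typed by hand. This is a sketch under assumptions: print_remove_cmds is a hypothetical helper, it only prints the commands (it does not run them), and it assumes simple disk names such as sda or sdb whose partitions are named by appending a number.

```shell
#!/bin/sh
# Print (do not run) the mdadm commands that fail and remove every
# partition of one disk from the arrays that contain it.
# print_remove_cmds is a hypothetical helper; review its output before use.
# Usage: print_remove_cmds sdb [mdstat-file]
print_remove_cmds() {
    awk -v disk="$1" '
        $2 == ":" {                               # array line
            for (i = 3; i <= NF; i++) {
                dev = $i
                sub(/\[.*/, "", dev)              # "sdb2[2](F)" -> "sdb2"
                # keep members whose name starts with the disk name
                if (index(dev, disk) == 1 && dev != disk) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, dev
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, dev
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}

# Example (on the live system, as root):
#   print_remove_cmds sdb
```

Printing rather than executing is deliberate: check the list first, then paste the commands or pipe them to sh.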

Do your Disk Maintenance here

At this point the disk is idle.

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
md2 : active raid1 sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>


  Note:
You'll have to determine whether your disk can be reinstated in the array. A RAID can get out of sync after a power failure, but also because of intermittent faults in the physical disk itself. If this happens repeatedly, the hard drive needs to be tested. For this we will use smartctl.


To see all the details SMART provides about the disk:

smartctl -a /dev/sdb

At least two types of tests are possible, short (~ 1 min) and long (~ 10 min to 90 min).

smartctl -t short /dev/sdb #short test
smartctl -t long  /dev/sdb #long test

To access the results and statistics of these tests:

smartctl -l selftest /dev/sdb
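
Besides reading the report by eye, the exit status of smartctl can be used in a script. The sketch below assumes the bit meanings documented in the smartctl man page (bits 0 to 2: smartctl itself could not do its job, bit 3: disk failing, bits 4 to 6: attribute or error-log warnings, bit 7: errors in the self-test log). smart_verdict is a made-up helper name.

```shell
#!/bin/sh
# Turn a smartctl exit status into a one-word verdict.
# smart_verdict is a hypothetical helper; the bit values are taken from
# the exit status section of the smartctl man page.
smart_verdict() {
    if   [ $(( $1 & 7 ))   -ne 0 ]; then echo "smartctl-error"   # bits 0-2
    elif [ $(( $1 & 8 ))   -ne 0 ]; then echo "failing"          # bit 3
    elif [ $(( $1 & 128 )) -ne 0 ]; then echo "selftest-errors"  # bit 7
    elif [ $(( $1 & 112 )) -ne 0 ]; then echo "warnings"         # bits 4-6
    else echo "ok"
    fi
}

# Example (as root):
#   smartctl -a /dev/sdb > /dev/null
#   smart_verdict $?
```

Anything other than "ok" is a reason to read the full smartctl -a output before putting the disk back into the array.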


  Note:
If you need to replace the disk because smartctl found a physical failure, install a new disk of the same capacity (or more) and enter the following commands to recreate the partitions by copying them from the healthy disk sda.


sfdisk -d /dev/sda > sfdisk_sda.output
sfdisk /dev/sdb < sfdisk_sda.output
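
Before copying the partition table it is worth checking the "same capacity (or more)" condition. A minimal sketch, assuming blockdev from util-linux is available to read the sizes in bytes; big_enough is a made-up helper name.

```shell
#!/bin/sh
# Succeed when the target disk is at least as large as the source disk.
# big_enough is a hypothetical helper; both arguments are sizes in bytes.
# Usage: big_enough SOURCE_BYTES TARGET_BYTES
big_enough() {
    [ "$2" -ge "$1" ]
}

# Example (as root):
#   src=$(blockdev --getsize64 /dev/sda)
#   dst=$(blockdev --getsize64 /dev/sdb)
#   big_enough "$src" "$dst" || echo "sdb is smaller than sda, do not copy the table"
```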

If you want to reintegrate the same disk without replacing it, go to the next step.

Add the partitions back

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1
[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2

Another Look at the mdstat

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2] sda2[0]
      52323584 blocks [2/1] [U_]
      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec

unused devices: <none>
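
To follow the rebuild without re-running cat by hand, something like the following could be used. It is a sketch that assumes the recovery line format shown above; rebuild_in_progress and rebuild_progress are made-up helper names.

```shell
#!/bin/sh
# Both helpers are hypothetical and take an optional mdstat file
# (default /proc/mdstat).

# Succeed while any array still shows a recovery or resync line.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Print the completion percentage of each running recovery or resync.
rebuild_progress() {
    awk '/recovery|resync/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) print $i
    }' "${1:-/proc/mdstat}"
}

# Example (on the live system):
#   while rebuild_in_progress; do rebuild_progress; sleep 30; done
#   echo "rebuild finished"
```

Once the loop ends, a final look at the mdstat should show [2/2] [UU] for every array.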