Raid:Manual Rebuild

From SME Server
Jump to navigation Jump to search
PythonIcon.png Skill level: Advanced
The instructions on this page may require deviations from standard procedures. A good understanding of linux and Koozali SME Server is recommended.


Warning.png Warning:
Get it right or you will lose data. Take a backup! Let the raid sync, this can take quite a while.


SME Servers Raid Options are largely automated, if you built your system with a single hard disk simply logon as admin and select Disk Redundancy to add a new drive to your RAID1 array. The same procedure is used if you have a disk failure in a RAID array and you have replaced that failed disk.

But with the best laid plans things don't always go according to plan, these are the processes required to do it manually.

See also: Hard Disk Partitioning and Raid#Resynchronising_a_Failed_RAID

HowTo: Manage/Check a RAID1 Array from the command Line

What is the Status of the Array

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      488279488 blocks [2/1] [U_]
      [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

HowTo: Reinstate a disk from the RAID1 Array with the command Line

Look at the mdstat

First we must determine which drive is in default.


[root@ ~]#cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>

(S)= Spare (F)= Fail [0]= number of the disk


  Note:
As we can see the partition sdb2 is in default, we can see the flag: sdb2 [2] (F). We need to resynchronize the disk sdb to the existing array md2.


Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

Do your Disk Maintenance here

At this point the disk is idle.

[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
md2 : active raid1 sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>


  Note:
You'll have to determine if your disk can be reinstated at the array. In fact sometimes a raid can get out of sync after a power failure but also some times for physical outages of the hard disk. It is necessary to test the hard disk if this occurs repeatedly. For this we will use smartctl.


For all the details available by SMART on the disk

[root@ ~]# smartctl -a /dev/sdb

At least two types of tests are possible, short (~ 1 min) and long (~ 10 min to 90 min).

[root@ ~]# smartctl -t short /dev/sdb #short test
[root@ ~]# smartctl -t long  /dev/sdb #long test

to access the results / statistics for these tests:

[root@ ~]# smartctl -l selftest /dev/sdb

You can refer to this page for more information how activate or understand the Analysis and Reporting Technology (SMART) Monitor_Disk_Health


  Note:
if you need to change the disk due to physical failure found by the smartctl command, install a new disk of the same capacity (or more) and enter the following commands to recreate new partitions by copying them from healthy disk sda.


[root@ ~]# sfdisk -d /dev/sda > sfdisk_sda.output
[root@ ~]# sfdisk /dev/sdb < sfdisk_sda.output

GPT Disks

Larger disks will be GPT Disks, sfdisk will not work - you will need to use gdisk and partx (parted)

[root@ ~]# yum install gdisk

The copy the partition table from a good disk to the new disk, the first line will copy the partition table from disk sda to sdd, the second will randomize the GUID

[root@ ~]# sgdisk /dev/sda -R /dev/sdd
[root@ ~]# sgdisk -G /dev/sdd

To view the partitions use partx

[root@ ~]# partx -l /dev/sdd


If you want to reinstate the same disk without replacing it, go to the next step.

Add the partitions back

mdadm: hot added /dev/sdb1

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1

mdadm: hot added /dev/sdb2

[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2

Another Look at the mdstat

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2] sda2[0]
      52323584 blocks [2/1] [U_]
      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec

unused devices: <none>


  Note:
with a new disk it may be worthwhile to reinstall grub to avoid problems on startup error. The grub is the program that allows you to launch the operating systems. Please enter the following commands.


HowTo: Write the GRUB boot sector

  Warning:
as the dd command is named "data destroyer" you need to be extremely prudent and sure of the name of source partition and/or destination. At first you should skip the dd command, Step 1 below, and attempt to install grub without it, see Step 2 below. If grub can be installed without using dd, then Step 1 can be discarded.


  • 1.dd
[root@ ~]# dd if=/dev/sda1 of=/dev/sdb1
  • 2.grub
[root@ ~]# grub

    GNU GRUB  version 0.95  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub> device (hd0) /dev/sdb

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd1)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub> quit