{{Level|Advanced}}
{{Warning box|Get this wrong and you will lose data. '''Take a backup!''' Let the RAID sync; this can take quite a while.}}
SME Server's RAID options are largely automated. If you built your system with a single hard disk, simply log on as admin and select ''Disk Redundancy'' to add a new drive to your RAID1 array. The same procedure is used when a disk in a RAID array fails and you have replaced the failed disk.
But even the best-laid plans don't always work out; the processes below let you do the same work manually.
See also: [[Hard Disk Partitioning]] and [[Raid#Resynchronising_a_Failed_RAID]]
==HowTo: Manage/Check a RAID1 Array from the Command Line==
===What is the Status of the Array===
 [root@ ~]# '''cat /proc/mdstat'''
 Personalities : [raid1]
 md2 : active raid1 sdb2[2] sda2[0]
       488279488 blocks [2/1] [U_]
       [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
 md1 : active raid1 sdb1[1] sda1[0]
       104320 blocks [2/2] [UU]
 
 unused devices: <none>
==HowTo: Reinstate a Disk in a RAID1 Array from the Command Line==
===Look at the mdstat===
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
First we must determine which drive has failed.
 [root@ ~]# '''cat /proc/mdstat'''
 Personalities : [raid1]
 md1 : active raid1 sdb1[1] sda1[0]
       104320 blocks [2/2] [UU]
 
 md2 : active raid1 sdb2[2](F) sda2[0]
       52323584 blocks [2/1] [U_]
 
 unused devices: <none>
 (S) = Spare
 (F) = Failed
 [0] = number of the disk
{{note box|As we can see, partition sdb2 has failed: it carries the flag sdb2[2](F). We need to resynchronise disk sdb into the existing array md2.}}
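If you need more detail than /proc/mdstat gives, mdadm itself can report on the array and its members. A minimal check, using the array and partition names from the example above:

 [root@ ~]# '''mdadm --detail /dev/md2'''
 [root@ ~]# '''mdadm --examine /dev/sdb2'''

''--detail'' reports the state of the whole array and lists each member device with its status; ''--examine'' reads the RAID superblock stored on the partition itself.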
===Fail and remove the disk, '''sdb''' in this case===
 [root@ ~]# '''mdadm --manage /dev/md2 --fail /dev/sdb2'''
 mdadm: set /dev/sdb2 faulty in /dev/md2
 
 [root@ ~]# '''mdadm --manage /dev/md2 --remove /dev/sdb2'''
 mdadm: hot removed /dev/sdb2
 
 [root@ ~]# '''mdadm --manage /dev/md1 --fail /dev/sdb1'''
 mdadm: set /dev/sdb1 faulty in /dev/md1
 
 [root@ ~]# '''mdadm --manage /dev/md1 --remove /dev/sdb1'''
 mdadm: hot removed /dev/sdb1
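As a side note, mdadm accepts several operations against one array in a single invocation, so each fail/remove pair above can be collapsed into one command; a sketch with the same device names:

 [root@ ~]# '''mdadm --manage /dev/md2 --fail /dev/sdb2 --remove /dev/sdb2'''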
===Do your Disk Maintenance here===
At this point the disk is idle.
 [root@ ~]# '''cat /proc/mdstat'''
 Personalities : [raid1]
 md1 : active raid1 sda1[0]
       104320 blocks [2/1] [U_]
 
 md2 : active raid1 sda2[0]
       52323584 blocks [2/1] [U_]
 
 unused devices: <none>
{{note box|You'll have to determine whether your disk can be reinstated in the array. A RAID can fall out of sync after a power failure, but also because of physical faults on the hard disk. If this happens repeatedly, the disk itself must be tested. For this we will use '''smartctl'''.}}
For all the details SMART holds on the disk:
 [root@ ~]# '''smartctl -a /dev/sdb'''
At least two types of test are available: short (~1 min) and long (~10 to 90 min).
 [root@ ~]# '''smartctl -t short /dev/sdb'''   #short test
 [root@ ~]# '''smartctl -t long /dev/sdb'''    #long test
To access the results/statistics of these tests:
 [root@ ~]# '''smartctl -l selftest /dev/sdb'''
Refer to [[Monitor_Disk_Health]] for more information on how to activate and interpret Self-Monitoring, Analysis and Reporting Technology (SMART).
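For a quick pass/fail verdict without wading through the full report, smartctl can also print the drive's overall health assessment:

 [root@ ~]# '''smartctl -H /dev/sdb'''

A healthy drive reports PASSED; any other result is a strong hint the disk should be replaced.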
{{Note box|If you need to change the disk due to a physical failure found by smartctl, install a new disk of the same capacity (or more) and enter the following commands to recreate the partitions by copying them from the healthy disk sda.}}<!-- Do NOT try to use sfdisk on disks larger than 2 TiB, use gdisk or similar, see below. -->
 [root@ ~]# '''sfdisk -d /dev/sda > sfdisk_sda.output'''
 [root@ ~]# '''sfdisk /dev/sdb < sfdisk_sda.output'''
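To confirm the partition tables now match, list both disks and compare the output:

 [root@ ~]# '''sfdisk -l /dev/sda'''
 [root@ ~]# '''sfdisk -l /dev/sdb'''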
====GPT Disks====
Larger disks (over 2 TiB) will be GPT disks; sfdisk will not work on them, so you will need to use gdisk and partx instead.
 [root@ ~]# '''yum install gdisk'''
Then copy the partition table from a good disk to the new disk. The first line copies the partition table from disk sda to sdd; the second randomizes the GUIDs so the two disks do not clash.
 [root@ ~]# '''sgdisk /dev/sda -R /dev/sdd'''
 [root@ ~]# '''sgdisk -G /dev/sdd'''
To view the partitions, use partx:
 [root@ ~]# '''partx -l /dev/sdd'''
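Alternatively, sgdisk can print the table it has just written, which keeps everything in one tool:

 [root@ ~]# '''sgdisk -p /dev/sdd'''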
If you want to reinstate the same disk without replacing it, go straight to the next step.
===Add the partitions back===
 [root@ ~]# '''mdadm --manage /dev/md1 --add /dev/sdb1'''
 mdadm: hot added /dev/sdb1
 
 [root@ ~]# '''mdadm --manage /dev/md2 --add /dev/sdb2'''
 mdadm: hot added /dev/sdb2
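If the kernel refuses to accept a partition that previously belonged to an array, the stale RAID superblock may need clearing first. This wipes the RAID metadata on that partition (nothing else), so be certain you are pointing at the disk being reinstated; a sketch for the example devices:

 [root@ ~]# '''mdadm --zero-superblock /dev/sdb1'''
 [root@ ~]# '''mdadm --zero-superblock /dev/sdb2'''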
===Another Look at the mdstat===
 [root@ ~]# '''cat /proc/mdstat'''
 Personalities : [raid1]
 md1 : active raid1 sdb1[1] sda1[0]
       104320 blocks [2/2] [UU]
 
 md2 : active raid1 sdb2[2] sda2[0]
       52323584 blocks [2/1] [U_]
       [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec
 
 unused devices: <none>
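Rather than re-running cat by hand, you can let watch (part of the standard procps tools) refresh the view every few seconds until the array shows [UU]:

 [root@ ~]# '''watch -n 5 cat /proc/mdstat'''

Press Ctrl-C to exit.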
{{note box|With a new disk it may be worthwhile to reinstall GRUB to avoid errors at startup. GRUB is the boot loader that launches the operating system. Enter the following commands.}}
==HowTo: Write the GRUB boot sector==
{{Warning box|As the dd command is nicknamed the "data destroyer", you need to be extremely careful and sure of the names of the source and destination partitions. At first you should skip the dd command (Step 1 below) and attempt to install GRUB without it (Step 2 below). If GRUB can be installed without dd, Step 1 can be discarded.}}
*1. dd
 [root@ ~]# '''dd if=/dev/sda1 of=/dev/sdb1'''
*2. grub
 [root@ ~]# '''grub'''
 
     GNU GRUB  version 0.95  (640K lower / 3072K upper memory)
 
  [ Minimal BASH-like line editing is supported.  For the first word, TAB
    lists possible command completions.  Anywhere else TAB lists the possible
    completions of a device/filename.]
 
 grub> '''device (hd0) /dev/sdb'''
 
 grub> '''root (hd0,0)'''
  Filesystem type is ext2fs, partition type 0xfd
 
 grub> '''setup (hd0)'''
  Checking if "/boot/grub/stage1" exists... no
  Checking if "/grub/stage1" exists... yes
  Checking if "/grub/stage2" exists... yes
  Checking if "/grub/e2fs_stage1_5" exists... yes
  Running "embed /grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded.
 succeeded
  Running "install /grub/stage1 (hd0) (hd1)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
 Done.
 
 grub> '''quit'''
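If your GRUB package ships the grub-install wrapper, the interactive session above can often be replaced by a single command; treat this as an alternative sketch to verify on your own system rather than the documented procedure, since the wrapper's device mapping does not always handle the second disk of a RAID1 correctly:

 [root@ ~]# '''/sbin/grub-install /dev/sdb'''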
<noinclude>
[[Category:Howto]]
[[Category:Administration:Storage]]
</noinclude>