Difference between revisions of "Raid:Manual Rebuild"

From SME Server

Latest revision as of 23:33, 14 April 2021

Skill level: Advanced
The instructions on this page may require deviations from standard procedures. A good understanding of Linux and Koozali SME Server is recommended.


Warning:
Get it right or you will lose data. Take a backup! Let the RAID sync; this can take quite a while.


SME Server's RAID options are largely automated. If you built your system with a single hard disk, simply log on as admin and select Disk Redundancy to add a new drive to your RAID1 array. The same procedure applies if a disk in a RAID array has failed and you have replaced it.

Even the best laid plans do not always work out, so these are the steps required to do it manually.

See also: Hard Disk Partitioning and Raid#Resynchronising_a_Failed_RAID

HowTo: Manage/Check a RAID1 Array from the Command Line

What is the Status of the Array

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      488279488 blocks [2/1] [U_]
      [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
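
The `[UU]` / `[U_]` field is the quickest thing to test from a script. The following is a minimal sketch, assuming a POSIX shell with awk; the function name `degraded_arrays` is made up for illustration and is not part of SME Server.

```shell
# Hypothetical helper: print the name of every md array whose status
# field (for example [U_]) contains "_", meaning a member is missing
# or still rebuilding. Reads /proc/mdstat-style text on standard input.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }        # remember the current array name
    match($0, /\[[U_]+\]/) {           # status field such as [UU] or [U_]
      if (index(substr($0, RSTART, RLENGTH), "_")) print name
    }
  '
}
```

Typical use would be `degraded_arrays < /proc/mdstat`; an empty result means every array lists all its members as up.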

HowTo: Reinstate a Disk in the RAID1 Array from the Command Line

Look at the mdstat

First we must determine which drive has failed.


[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>

(S) = spare, (F) = failed, [0] = number of the device within the array


  Note:
As we can see, partition sdb2 has failed: it carries the (F) flag, shown as sdb2[2](F). We need to resynchronise disk sdb into the existing array md2.
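
Spotting the (F) flag can also be scripted. This is a sketch only, assuming a POSIX shell with awk; `failed_members` is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: list "array partition" pairs for every member
# flagged (F) in /proc/mdstat-style text read from standard input.
failed_members() {
  awk '
    /^md[0-9]+ :/ {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /\(F\)$/) {           # member such as sdb2[2](F)
          dev = $i
          sub(/\[.*/, "", dev)         # strip "[2](F)" to leave "sdb2"
          print $1, dev
        }
      }
    }
  '
}
```

Run it as `failed_members < /proc/mdstat`. No output means no member is currently flagged as failed.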


Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2

[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2

[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1

[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
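
Because a typo in a device name here can cost you the healthy disk, it may help to generate the commands first and read them before running anything. The sketch below only prints the commands; it assumes the layout used on this page (partition 1 of the disk belongs to md1, partition 2 to md2), and `remove_cmds` is a hypothetical name.

```shell
# Hypothetical dry-run helper: print (do not run) the fail/remove
# commands for one disk. Assumes partition N of the disk is a member
# of /dev/mdN for N = 1 and 2, as in the examples on this page.
remove_cmds() {
  disk=$1                              # e.g. sdb (no /dev/ prefix)
  for n in 1 2; do
    echo "mdadm --manage /dev/md$n --fail /dev/$disk$n"
    echo "mdadm --manage /dev/md$n --remove /dev/$disk$n"
  done
}
```

`remove_cmds sdb` prints four lines to review. If your partitions map to arrays differently, this sketch does not apply as written.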

Do your Disk Maintenance here

At this point the disk is idle.

[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
md2 : active raid1 sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>


  Note:
You will have to determine whether your disk can be reinstated in the array. A RAID can fall out of sync after a power failure, but also because of a physical fault in the hard disk. If this happens repeatedly, test the hard disk. For this we will use smartctl.


To see all the details SMART reports for the disk:

[root@ ~]# smartctl -a /dev/sdb

At least two types of test are possible: short (~1 min) and long (~10 min to 90 min).

[root@ ~]# smartctl -t short /dev/sdb #short test
[root@ ~]# smartctl -t long  /dev/sdb #long test

To access the results and statistics for these tests:

[root@ ~]# smartctl -l selftest /dev/sdb

For more information on how to activate and interpret Self-Monitoring, Analysis and Reporting Technology (SMART), see Monitor_Disk_Health.
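
For a quick pass/fail answer, `smartctl -H` prints an overall health verdict. The sketch below checks for the wording smartctl uses on ATA/SATA disks; SCSI/SAS disks report health with different text, so treat this as an assumption about your hardware. The function name `smart_passed` is hypothetical.

```shell
# Hypothetical helper: succeed if the text on standard input contains
# the ATA overall-health verdict "PASSED" as printed by "smartctl -H".
# Assumption: an ATA/SATA disk; other disk types word this differently.
smart_passed() {
  grep -q 'self-assessment test result: PASSED'
}
```

For example: `smartctl -H /dev/sdb | smart_passed || echo "sdb did not report PASSED"`. A PASSED verdict does not prove the disk is sound, so still review the self-test log.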


  Note:
If you need to replace the disk because smartctl found a physical failure, install a new disk of the same capacity (or more) and enter the following commands to recreate the partitions by copying the layout from the healthy disk sda.


[root@ ~]# sfdisk -d /dev/sda > sfdisk_sda.output
[root@ ~]# sfdisk /dev/sdb < sfdisk_sda.output
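
The "same capacity (or more)" requirement can be checked before copying the partition table. A minimal sketch, assuming `blockdev --getsize64` (from util-linux) is available to read the sizes in bytes; `disk_fits` is a hypothetical name.

```shell
# Hypothetical helper: succeed when the replacement disk is at least
# as large as the healthy one. Both arguments are sizes in bytes.
disk_fits() {
  healthy_bytes=$1
  new_bytes=$2
  [ "$new_bytes" -ge "$healthy_bytes" ]
}
```

For example: `disk_fits "$(blockdev --getsize64 /dev/sda)" "$(blockdev --getsize64 /dev/sdb)" || echo "new disk is too small"`.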

GPT Disks

Larger disks (over 2 TiB) will be GPT disks. Do not use sfdisk on them, it will not work. Use sgdisk (from the gdisk package) and partx instead.

[root@ ~]# yum install gdisk

Then copy the partition table from a good disk to the new disk. The first line copies the partition table from disk sda to sdd (the new disk in this example); the second randomises the GUIDs on sdd so the two disks do not share identifiers.

[root@ ~]# sgdisk /dev/sda -R /dev/sdd
[root@ ~]# sgdisk -G /dev/sdd

To view the partitions use partx

[root@ ~]# partx -l /dev/sdd
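
Choosing between the two methods can be reduced to a size check. The sketch below assumes the usual MBR limit of 2 TiB (2^32 sectors of 512 bytes); a smaller disk may still carry a GPT table, which this check does not detect. `table_tool` is a hypothetical name.

```shell
# Hypothetical helper: suggest which tool to use for a disk of the
# given size in bytes. Assumption: 512-byte sectors, so the MBR limit
# is 2 TiB = 2199023255552 bytes. Does not inspect the actual table.
table_tool() {
  if [ "$1" -gt 2199023255552 ]; then
    echo sgdisk
  else
    echo sfdisk
  fi
}
```

For example: `table_tool "$(blockdev --getsize64 /dev/sda)"`.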


If you want to reinstate the same disk without replacing it, go to the next step.

Add the partitions back

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1

[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2
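
Before adding a partition, a script can confirm it is not already listed in the array. This is a sketch assuming a POSIX shell with awk; `is_member` is a hypothetical name.

```shell
# Hypothetical helper: succeed if the partition ($2, e.g. sdb1) is
# listed as a member of the array ($1, e.g. md1) in the
# /proc/mdstat-style text read from standard input.
is_member() {
  awk -v md="$1" -v part="$2" '
    $1 == md && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        dev = $i
        sub(/\[.*/, "", dev)           # "sdb1[1]" becomes "sdb1"
        if (dev == part) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: `is_member md1 sdb1 < /proc/mdstat || mdadm --manage /dev/md1 --add /dev/sdb1`.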

Another Look at the mdstat

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2] sda2[0]
      52323584 blocks [2/1] [U_]
      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec

unused devices: <none>
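
The progress line can be pulled out on its own when you only want to know how far the rebuild has got. A minimal sketch, assuming a POSIX shell with awk; it only looks at lines containing "recovery =", and `recovery_progress` is a hypothetical name.

```shell
# Hypothetical helper: for each array being rebuilt, print
# "array percent time-to-finish" from /proc/mdstat-style text on
# standard input. Only lines containing "recovery =" are considered.
recovery_progress() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    /recovery =/ {
      pct = ""; eta = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /%$/) pct = $i
        if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
      }
      print name, pct, eta
    }
  '
}
```

Run it as `recovery_progress < /proc/mdstat`. No output means no recovery line is present, either because the rebuild has finished or because it has not started.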


  Note:
With a new disk it may be worthwhile to reinstall GRUB to avoid errors at startup. GRUB is the boot loader, the program that launches the operating system. Enter the following commands.


HowTo: Write the GRUB boot sector

  Warning:
The dd command is nicknamed "data destroyer" for good reason: be extremely careful, and be certain of the names of the source and destination partitions. First skip the dd command (Step 1 below) and try to install GRUB without it (Step 2 below). If GRUB installs without dd, Step 1 is not needed.


  • 1. dd
[root@ ~]# dd if=/dev/sda1 of=/dev/sdb1
  • 2. grub
[root@ ~]# grub

    GNU GRUB  version 0.95  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub> device (hd0) /dev/sdb

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd1)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub> quit
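
The same GRUB session can be prepared as text and piped in rather than typed interactively. The sketch below only builds the command text; it assumes this legacy GRUB (0.9x) accepts commands on standard input when started with `--batch`, and `grub_script` is a hypothetical name.

```shell
# Hypothetical helper: print the GRUB shell commands used above for
# the disk given as $1 (e.g. /dev/sdb). Nothing is written to disk
# until the output is piped into grub.
grub_script() {
  printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}
```

Review the output of `grub_script /dev/sdb` first, then, if it matches the session above, run `grub_script /dev/sdb | grub --batch`.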