Raid: Manual Rebuild

Skill level: Advanced

The instructions on this page may require deviations from standard procedures. A good understanding of Linux and Koozali SME Server is recommended.

Warning: Get it right or you will lose data. Take a backup first, and let the RAID sync before proceeding.
SME Server's RAID options are largely automated: if you built your system with a single hard disk, or have had a hard disk failure, simply log on as admin and select Disk Redundancy to add a new drive to your RAID1 array.
But even the best-laid plans don't always work out, so the processes below show how to do it manually.
HowTo: Manage/Check a RAID1 Array from the command Line
What is the Status of the Array
[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
488279488 blocks [2/1] [U_]
[=>...................] recovery = 6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
HowTo: Reinstate a disk from the RAID1 Array with the command Line
Look at the mdstat
First we must determine which drive has failed.
[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[2](F) sda2[0]
52323584 blocks [2/1] [U_]
unused devices: <none>
(S) = spare device
(F) = failed device
[0] = the device's number within the array

Note: As we can see from the (F) flag on sdb2[2], the partition sdb2 has failed. We need to resynchronize the disk sdb into the existing array md2.
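Spotting the (F) flag by eye works, but it can also be extracted automatically. A minimal sketch, again against a hard-coded sample of the output above (an assumption for illustration; on a live system pipe in the real /proc/mdstat). Note that grep -o is a GNU grep extension:

```shell
#!/bin/sh
# Sample degraded /proc/mdstat content (hard-coded for illustration).
MDSTAT='md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]'

# Members flagged (F) have been marked faulty by the kernel.
# grep -o (GNU grep) prints only the matching token, e.g. "sdb2[2](F)";
# sed then strips everything from the first "[" to leave the device name.
failed=$(echo "$MDSTAT" | grep -o '[a-z0-9]*\[[0-9]*\](F)' | sed 's/\[.*$//')
echo "Faulty member(s): ${failed:-none}"
```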
Fail and remove the disk, sdb in this case
[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
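The four commands above repeat one fail-then-remove pattern per partition. A sketch that generates the same sequence from an array-to-partition list; the list itself is an assumption taken from this example, and the commands are only echoed (a dry run), never executed:

```shell
#!/bin/sh
# Array-to-member pairs for the failed disk (taken from the example above;
# adjust to match your own layout).
PAIRS='md1:sdb1
md2:sdb2'

# Dry run: print the mdadm commands rather than executing them.
# Pipe the output to sh only once you are sure it is correct.
cmds=$(echo "$PAIRS" | while IFS=: read -r md part; do
    echo "mdadm --manage /dev/$md --fail /dev/$part"
    echo "mdadm --manage /dev/$md --remove /dev/$part"
done)
echo "$cmds"
```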
Do your Disk Maintenance here
At this point the disk is idle.
[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
52323584 blocks [2/1] [U_]
unused devices: <none>

Note: You will have to determine whether the disk can be reinstated into the array. A RAID can fall out of sync after a power failure, but also because of physical faults on the hard disk; if this happens repeatedly, the disk itself must be tested. For this we will use smartctl.
To see all the details SMART holds about the disk:
[root@ ~]# smartctl -a /dev/sdb
At least two types of test are possible: short (about 1 minute) and long (10 to 90 minutes).
[root@ ~]# smartctl -t short /dev/sdb #short test
[root@ ~]# smartctl -t long /dev/sdb #long test
To access the results and statistics of these tests:
[root@ ~]# smartctl -l selftest /dev/sdb
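The self-test log can also be checked for a passing verdict from a script. A sketch against a hard-coded sample log (an assumption for illustration; on a live system substitute the real smartctl -l selftest output):

```shell
#!/bin/sh
# Sample 'smartctl -l selftest' result table (hard-coded for illustration;
# on a live system pipe in: smartctl -l selftest /dev/sdb).
LOG='Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1243         -'

# "Completed without error" is the passing status string in the log.
if echo "$LOG" | grep -q 'Completed without error'; then
    verdict="passed"
else
    verdict="FAILED"
fi
echo "Self-test $verdict"
```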
Note: If you need to change the disk because of a physical failure reported by smartctl, install a new disk of the same capacity (or larger) and enter the following commands to recreate the partitions by copying them from the healthy disk sda.
[root@ ~]# sfdisk -d /dev/sda > sfdisk_sda.output
[root@ ~]# sfdisk /dev/sdb < sfdisk_sda.output
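After copying the partition table it is worth confirming that both disks now really match. A sketch that compares two sfdisk -d dumps with the device names stripped; the dumps below are hard-coded samples (an assumption), where you would use the real sfdisk -d output for sda and sdb:

```shell
#!/bin/sh
# Sample 'sfdisk -d' partition lines for both disks (hard-coded for
# illustration; only the geometry matters, so /dev/sdX is stripped
# before comparing).
DUMP_A='/dev/sda1 : start=       63, size=   208782, Id=fd, bootable
/dev/sda2 : start=   208845, size=104647680, Id=fd'
DUMP_B='/dev/sdb1 : start=       63, size=   208782, Id=fd, bootable
/dev/sdb2 : start=   208845, size=104647680, Id=fd'

# Drop the device names, then compare what remains line by line.
a=$(echo "$DUMP_A" | sed 's|^/dev/sd[a-z]||')
b=$(echo "$DUMP_B" | sed 's|^/dev/sd[a-z]||')
if [ "$a" = "$b" ]; then
    echo "partition tables match"
else
    echo "partition tables DIFFER - do not re-add this disk yet"
fi
```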
If you want to reinstate the same disk without replacing it, go to the next step.
Add the partitions back
[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1
[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2
Another Look at the mdstat
[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[2] sda2[0]
52323584 blocks [2/1] [U_]
[>....................] recovery = 1.9% (1041600/52323584) finish=14.7min speed=57866K/sec
unused devices: <none>
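The recovery line can be parsed if you want to report rebuild progress from a script. A minimal sketch against that same line, hard-coded here for illustration (on a live system: grep recovery /proc/mdstat):

```shell
#!/bin/sh
# Sample recovery line from /proc/mdstat (hard-coded for illustration).
LINE='      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec'

# Pull out the percentage complete and the estimated time to finish.
pct=$(echo "$LINE"    | sed -n 's/.*recovery = *\([0-9.]*%\).*/\1/p')
finish=$(echo "$LINE" | sed -n 's/.*finish=\([^ ]*\).*/\1/p')
echo "rebuild at $pct, about $finish remaining"
```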
Note: With a new disk it may be worthwhile to reinstall GRUB to avoid boot problems; GRUB is the boot loader that starts the operating system. Please enter the following commands.
HowTo: Write the GRUB boot sector
[root@ ~]# dd if=/dev/sda1 of=/dev/sdb1
[root@ ~]# grub
GNU GRUB version 0.95 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
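The interactive grub session above can also be run unattended. A minimal sketch that only builds and displays the command file (a dry run, so it is safe anywhere); feeding it to grub --batch on the live server is left as the final, manual step:

```shell
#!/bin/sh
# Build the same legacy-GRUB command sequence used interactively above.
# Dry run: the file is only written and shown here; on the live system
# you would feed it to grub itself.
cmdfile=/tmp/grub-reinstall.cmds
cat > "$cmdfile" <<'EOF'
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF

cat "$cmdfile"
# On the real server:  grub --batch < /tmp/grub-reinstall.cmds
```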