Raid:Manual Rebuild

From SME Server
Revision as of 17:09, 10 May 2010 by Timn

Skill level: Advanced
The instructions on this page may require deviations from standard procedures. A good understanding of Linux and Koozali SME Server is recommended.



Warning:
Get it right or you will lose data. Take a backup, and let the RAID finish syncing.


SME Server's RAID options are largely automated. If you built your system with a single hard disk, or have had a hard disk failure, simply log on as admin and select Disk Redundancy to add a new drive to your RAID1 array.

But even the best-laid plans don't always go according to plan. These are the steps required to do it manually.

HowTo: Manage/Check a RAID1 Array from the command line

What is the Status of the Array?

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      488279488 blocks [2/1] [U_]
      [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
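
When there are several arrays it can help to pick out only the degraded ones. The following is a minimal sketch, not part of SME Server: the function name `degraded_arrays` is made up for this example, and it assumes the mdstat layout shown above, where a missing member appears as `_` in the status brackets.

```shell
# List md arrays whose status shows a missing member, e.g. [U_].
# Reads mdstat-format text on stdin so it can be tried on saved output.
degraded_arrays() {
    awk '
        # Remember the array name from its header line, e.g. "md2 : active ..."
        /^md[0-9]+ :/ { name = $1 }
        # A status such as [U_] or [_U] means one member is missing
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Possible use on a live system:
#   degraded_arrays < /proc/mdstat
```

An array that is still rebuilding is also reported, because it shows `[U_]` until the recovery finishes.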

Are the Disks Partitioned Correctly?

Here the two disks are partitioned identically:

[root@ ~]# fdisk -lu /dev/sda; fdisk -lu /dev/sdb

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      208844      104391   fd  Linux raid autodetect
/dev/sda2          208845  1953520064   976655610   fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63      208844      104391   fd  Linux raid autodetect
/dev/sdb2          208845  1953520064   976655610   fd  Linux raid autodetect
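
Comparing two listings by eye is error-prone on large numbers. As a rough sketch, the helper below (the name `partition_layout` is hypothetical) reduces `fdisk -lu` output to the partition number, start, end and Id, so the two disks can be compared as plain text. It assumes the column layout shown above.

```shell
# Reduce `fdisk -lu` output (stdin) to "partition-number start end id" lines,
# so two disks can be compared without the device names getting in the way.
partition_layout() {
    awk '
        /^\/dev\// {
            sub(/^\/dev\/[a-z]+/, "", $1)          # keep only the partition number
            if ($2 == "*") print $1, $3, $4, $6    # bootable: extra "*" column
            else           print $1, $2, $3, $5
        }
    '
}

# Possible use (run as root):
#   a=$(fdisk -lu /dev/sda | partition_layout)
#   b=$(fdisk -lu /dev/sdb | partition_layout)
#   [ "$a" = "$b" ] && echo "layouts match" || echo "layouts differ"
```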

Example: Incorrectly Partitioned 2nd Disk

In this example the first partition starts at sector 1 instead of sector 63, too close to the start of the disk. That leaves no room for GRUB to embed its stage 1.5 loader, so the disk will not boot.

[root@ ~]# fdisk -l /dev/sdb; fdisk -lu /dev/sdb

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104384+  fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2              13      121601   976655647   fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      208769      104384+  fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2          208770  1953520063   976655647   fd  Linux raid autodetect

Message log showing GRUB errors

add_drive_to_raid: Waiting for boot partition to sync before installing grub...
add_drive_to_raid: Probing devices to guess BIOS drives. This may take a long time.
add_drive_to_raid:
add_drive_to_raid:
add_drive_to_raid:     GNU GRUB  version 0.95  (640K lower / 3072K upper memory)
add_drive_to_raid:
add_drive_to_raid:  [ Minimal BASH-like line editing is supported.  For the first word, TAB
add_drive_to_raid:    lists possible command completions.  Anywhere else TAB lists the possible
add_drive_to_raid:    completions of a device/filename.]
add_drive_to_raid: grub> device (hd0) /dev/sdb
add_drive_to_raid: grub> root (hd0,0)
add_drive_to_raid:  Filesystem type is ext2fs, partition type 0xfd
add_drive_to_raid: grub> setup (hd0)
add_drive_to_raid:  Checking if "/boot/grub/stage1" exists... no
add_drive_to_raid:  Checking if "/grub/stage1" exists... yes
add_drive_to_raid:  Checking if "/grub/stage2" exists... yes
add_drive_to_raid:  Checking if "/grub/e2fs_stage1_5" exists... yes
add_drive_to_raid:  Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
add_drive_to_raid:  Running "embed /grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
add_drive_to_raid:  Running "install /grub/stage1 (hd0) /grub/stage2 p /grub/grub.conf "... succeeded
add_drive_to_raid: Done.
add_drive_to_raid: grub> quit
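
The condition behind these embed failures can be checked before GRUB is written. This is only a sketch: the function names are invented here, and the default threshold of 63 is simply the start sector seen on the working disks on this page, not a documented GRUB requirement.

```shell
# Print the lowest partition start sector found in `fdisk -lu` output (stdin).
first_start_sector() {
    awk '
        /^\/dev\// {
            start = ($2 == "*") ? $3 : $2          # skip the boot-flag column
            if (min == "" || start + 0 < min + 0) min = start
        }
        END { if (min != "") print min }
    '
}

# Succeeds when the first partition starts at or after the given sector.
# $1 = minimum start sector (assumed default: 63, taken from the good disks)
has_embed_gap() {
    start=$(first_start_sector)
    [ -n "$start" ] && [ "$start" -ge "${1:-63}" ]
}

# Possible use (run as root):
#   fdisk -lu /dev/sdb | has_embed_gap || echo "first partition starts too early"
```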

HowTo: Remove a disk from the RAID1 Array from the command line

Look at the mdstat

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[0]
      488279488 blocks [2/2] [UU]

md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
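
To avoid typing the wrong partition, the same commands can be generated from the mdstat listing and reviewed before running them. The sketch below only prints commands and runs nothing. The function name `print_remove_commands` is hypothetical, and it assumes array header lines of the form `mdN : active raid1 ...` as shown above.

```shell
# Print (not run) the mdadm commands to fail and remove every partition of
# one disk, based on mdstat-format text on stdin.
# $1 = disk name without /dev/, e.g. sdb
print_remove_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 5; i <= NF; i++) {            # members start at field 5
                part = $i
                sub(/\[[0-9]+\].*$/, "", part)     # sdb2[1] becomes sdb2
                if (part ~ ("^" disk "[0-9]+$")) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, part
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, part
                }
            }
        }
    '
}

# Possible use: review the output first, then run the lines by hand.
#   print_remove_commands sdb < /proc/mdstat
```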

Do your Disk Maintenance here

At this point the disk is idle. Repartition it, or do whatever else is needed, before adding it back to the array.

Add the partitions back

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1
[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2

Another Look at the mdstat

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      488279488 blocks [2/1] [U_]
      [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
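
The recovery can take a long time, so rather than re-running `cat` by hand the wait can be scripted. This is a small sketch with made-up function names. It assumes the progress line looks like the one above, with `recovery =` (or `resync =`) followed by a percentage.

```shell
# Succeeds while mdstat-format text on stdin shows a rebuild in progress.
sync_in_progress() {
    grep -Eq '(recovery|resync) *='
}

# Print the progress percentage of each running rebuild, one per line.
sync_progress() {
    awk '/(recovery|resync) *=/ {
        for (i = 1; i <= NF; i++) if ($i ~ /%$/) print $i
    }'
}

# Possible use: poll once a minute until the arrays are back in sync.
#   while sync_in_progress < /proc/mdstat; do
#       sync_progress < /proc/mdstat
#       sleep 60
#   done
```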

HowTo: Partition / Re-Partition a disk

Delete Existing Partitions

[root@ ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 121601.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104384+  fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2              13      121601   976655647   fd  Linux raid autodetect

Command (m for help): d
Partition number (1-4): 1

Command (m for help): d
Selected partition 2

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Create new partitions

Note: change each partition's system ID to fd (Linux raid autodetect).

[root@ ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 121601.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-121601, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-121601, default 121601): 13

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-121601, default 14):
Using default value 14
Last cylinder or +size or +sizeM or +sizeK (14-121601, default 121601):
Using default value 121601

Command (m for help): a
Partition number (1-4): 1

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14      121601   976655610   fd  Linux raid autodetect
 
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
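
Before adding the disk back it is worth confirming that the new table has what the steps above set: type fd on every partition and the boot flag on partition 1. The following sketch (the function name `check_raid_partitions` is invented for this example) checks exactly that from an `fdisk -l` listing, assuming the column layout shown above.

```shell
# Succeeds when the fdisk listing on stdin shows at least one partition,
# every partition has Id fd, and partition 1 carries the boot flag.
check_raid_partitions() {
    awk '
        /^\/dev\// {
            n++
            boot = ($2 == "*")
            id = boot ? $6 : $5                    # Id column shifts with the "*"
            if (id != "fd") bad = 1
            if ($1 ~ /[a-z]1$/ && !boot) bad = 1   # partition 1 must be bootable
        }
        END { if (n == 0 || bad) exit 1 }
    '
}

# Possible use (run as root):
#   fdisk -l /dev/sdb | check_raid_partitions && echo "partition table looks right"
```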

HowTo: Write the GRUB boot sector

[root@ ~]# grub

    GNU GRUB  version 0.95  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub> device (hd0) /dev/sdb

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub> quit
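
The same GRUB commands can be kept in a small helper so they are typed identically for each disk. This is a sketch only: `grub_setup_script` is a hypothetical name, and feeding its output to `grub --batch` assumes the installed GRUB (legacy) accepts that option, which should be checked on your system first.

```shell
# Print the GRUB (legacy) commands used above for a given disk.
# $1 = device, e.g. /dev/sdb
grub_setup_script() {
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}

# Possible non-interactive use (assumption: this GRUB build accepts --batch):
#   grub_setup_script /dev/sdb | grub --batch
```

Whichever way the commands are entered, check the output for the "16 sectors are embedded" line rather than the "failed (this is not fatal)" message.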

My Experience

My experience is loosely alluded to above. I upgraded the disks in a Dell server from the 500 GB drives that came with it to two new Seagate 1 TB ST1000340NS drives, which are a Server Edition disk.

The disks were installed separately and allowed to come up and sync in the array. The first indication that something wasn't working was that the machine would not boot when the 2nd disk was installed. I put my original first disk back and looked through the system log, noting the GRUB failures. "This is not fatal" was the message, but it did stop the machine from booting from the disk. Perhaps that's just not living, therefore not fatal; whatever, it's not terribly useful. It did this on both disks.

What had happened is that the first partition was written too close to the start of the drive, so the boot record didn't have enough room for its GRUB staging, if that's the right term.

To correct this:

  1. I removed the disk from the array
    1. by failing it
    2. then removing it
  2. then repartitioned it
  3. added it back to the array
  4. and finally re-wrote GRUB
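
Put together, the steps above come out roughly as the sequence below. This is a dry-run sketch that only prints the commands. The function name `rebuild_plan` is made up, and it assumes the layout used on this page, with partition 1 in md1 and partition 2 in md2.

```shell
# Print (not run) the full command sequence for one disk.
# Assumes the layout used on this page: partition 1 in md1, partition 2 in md2.
# $1 = disk name without /dev/, e.g. sdb
rebuild_plan() {
    d=$1
    # 1. Fail and remove each partition from its array
    for n in 2 1; do
        echo "mdadm --manage /dev/md$n --fail /dev/$d$n"
        echo "mdadm --manage /dev/md$n --remove /dev/$d$n"
    done
    # 2. Repartition (interactive; set type fd and the boot flag on partition 1)
    echo "fdisk /dev/$d"
    # 3. Add the partitions back
    for n in 1 2; do
        echo "mdadm --manage /dev/md$n --add /dev/$d$n"
    done
    # 4. Let the boot partition sync, then write GRUB
    echo "cat /proc/mdstat"
    echo "grub"
}

# Possible use: print the plan and work through it by hand.
#   rebuild_plan sdb
```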

David Bray

17 March, 2010