Repairing RAID of the swap

From SME Server
Revision as of 20:56, 10 February 2016 by Arnaud (talk | contribs)

Manually repairing the RAID of the swap

Author: Arnaud

Requirements:

  Warning:
This howto applies to SME 9.1 with RAID1 and no LVM, and only to the RAID device that holds the swap.

Some adaptations may be necessary for other versions of SME or for other RAID and LVM configurations!


Because SME can run without swap, the job can be done directly on the running SME, without a LiveCD or rescue mode.

Adapt the partition names (hdX, sdX, etc.) to your case.

The starting point: the swap device cannot sync

This can occur when a new disk has been added to the SME and this disk is "a little bit" smaller than the disk that is already running. The RAID sync (e.g. started from the console) works for "/" and for "/boot" but not for the swap, because some space is missing on the added disk.

Look at the current state of the RAID:

   # cat /proc/mdstat
   Personalities : [raid1] 
   md0 : active raid1 vda1[0] vdb1[2]
         255936 blocks super 1.0 [2/2] [UU]
         
   md2 : active raid1 vda3[0]
         2095104 blocks super 1.1 [2/1] [U_]
         
   md1 : active raid1 vda2[0] vdb3[2]
         18600960 blocks super 1.1 [2/2] [UU]
         bitmap: 0/1 pages [0KB], 65536KB chunk
   
   unused devices: <none> 
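
A degraded array such as md2 above can also be spotted with a small script. The following is only a minimal sketch and not part of the original procedure: the function name degraded_arrays is hypothetical, and it assumes the usual /proc/mdstat layout shown above, where the status line of each array ends with a field such as [UU] or [U_].

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status field
# contains "_" (a missing member), e.g. [U_].
# Reads the file given as $1, or /proc/mdstat when no argument is given.
degraded_arrays() {
    awk '
        # "md2 : active raid1 ..." starts the description of an array
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1; next }
        # the status line ends with [UU], [U_], ...; "_" marks a missing member
        name != "" && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Usage on a running system:
#   degraded_arrays
```

With the output shown above, this should print md2 only.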

As the console also indicates, md2 runs with only one disk (vda3). The partition vdb2 is missing from the RAID. The reason is that the second disk, which was added to the machine afterwards, is too small:

   # mdadm --manage /dev/md2 --add /dev/vdb2
   mdadm: /dev/vdb2 not large enough to join array
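
To see how much space is missing, the sizes of the two partitions can be compared. This is a rough sketch under assumptions: the function name shortfall_bytes is hypothetical, and a plain size comparison is only a first approximation, since mdadm also has to account for its own metadata when it decides whether a device fits.

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes the candidate partition is short
# of the partition that is already in the array (0 if it is large enough).
#   $1 = size of the candidate partition in bytes
#   $2 = size of the existing member partition in bytes
shortfall_bytes() {
    if [ "$1" -ge "$2" ]; then
        echo 0
    else
        echo $(( $2 - $1 ))
    fi
}

# Usage on a running system (device names as in this howto, run as root):
#   shortfall_bytes "$(blockdev --getsize64 /dev/vdb2)" \
#                   "$(blockdev --getsize64 /dev/vda3)"
```

A result other than 0 means the added partition is smaller than the one already running in the array.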

Recreating the RAID array:

  • stop the swap:
   # swapoff -a 
   
  • get the details of /dev/md2 and note the UUID and the name of the RAID. In my case:
   # mdadm --detail /dev/md2
   /dev/md2:
           Version : 1.1
     Creation Time : Mon Feb  1 21:42:29 2016
        Raid Level : raid1
        Array Size : 2095104 (2046.34 MiB 2145.39 MB)
     Used Dev Size : 2095104 (2046.34 MiB 2145.39 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent
   
       Update Time : Mon Feb  1 21:42:30 2016
             State : clean, degraded 
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0
   
              Name : localhost.localdomain:2
              UUID : 3ee2fded:12de8ad4:736bc4ee:b74e8f89
            Events : 4
   
       Number   Major   Minor   RaidDevice State
          0     252        3        0      active sync   /dev/vda3
          2       0        0        2      removed 


  • stop the RAID device:
   # mdadm --stop /dev/md2
   mdadm: stopped /dev/md2
     * check that md2 doesn't exist any more:
   # mdadm --remove /dev/md2
   mdadm: error opening /dev/md2: No such file or directory
   
  • remove the superblock of vda3 (which was previously in the RAID):
    # mdadm --zero-superblock /dev/vda3 
  • recreate the RAID device md2 with both disks, reusing the UUID and the name of the old RAID:
   # mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/vda3 /dev/vdb2 --uuid=3ee2fded:12de8ad4:736bc4ee:b74e8f89 --name=localhost.localdomain:2
   mdadm: Note: this array has metadata at the start and
       may not be suitable as a boot device.  If you plan to
       store '/boot' on this device please ensure that
       your boot-loader understands md/v1.x metadata, or use
       --metadata=0.90
   Continue creating array? y
   mdadm: Defaulting to version 1.2 metadata
   mdadm: array /dev/md2 started.
  • The RAID automatically starts to resync. Note that the new array reports metadata version 1.2 and a slightly smaller size than the old one:
   # mdadm --detail /dev/md2
   /dev/md2:
           Version : 1.2
     Creation Time : Fri Feb  5 16:17:21 2016
        Raid Level : raid1
        Array Size : 2093120 (2044.41 MiB 2143.35 MB)
     Used Dev Size : 2093120 (2044.41 MiB 2143.35 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent
   
       Update Time : Fri Feb  5 16:18:03 2016
             State : clean, resyncing 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0
   
     Resync Status : 19% complete
   
              Name : localhost.localdomain:2
              UUID : 3ee2fded:12de8ad4:736bc4ee:b74e8f89
            Events : 3
   
       Number   Major   Minor   RaidDevice State
          0     252        3        0      active sync   /dev/vda3
          1     252       18        1      active sync   /dev/vdb2
    
  • Check that the UUID and the name are correct and wait for the sync to finish.
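
The final check can be done by eye, or with a small script that compares the output of mdadm --detail saved before stopping the array with the output taken after recreating it. The following is only a sketch: the function names detail_field and same_field and the two file names in the usage comment are hypothetical, and the parsing assumes the "Key : value" layout shown in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Key : value" field
# (e.g. UUID, Name, "Resync Status") from `mdadm --detail` output on stdin.
detail_field() {
    awk -F' : ' -v key="$1" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }        # trim the key
        k == key { v = $2; gsub(/[ \t]+$/, "", v); print v; exit }
    '
}

# Hypothetical helper: succeed only if field $1 is present and identical
# in the two saved outputs $2 (before) and $3 (after).
same_field() {
    a=$(detail_field "$1" < "$2")
    b=$(detail_field "$1" < "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on a running system (file names are examples only):
#   mdadm --detail /dev/md2 > /root/md2.before    # before "mdadm --stop"
#   mdadm --detail /dev/md2 > /root/md2.after     # after "mdadm --create"
#   same_field UUID /root/md2.before /root/md2.after && echo "UUID ok"
#   same_field Name /root/md2.before /root/md2.after && echo "Name ok"
#   detail_field "Resync Status" < /root/md2.after
```

To wait for the end of the sync instead of polling, mdadm --wait /dev/md2 should return once the resync has finished.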