Repairing RAID of the swap

From SME Server
Revision as of 21:10, 10 February 2016 by Arnaud (talk | contribs)

Manually repairing the RAID of the swap

Author: Arnaud

Requirements:

  Warning:
This howto applies to SME 9.1 with RAID1 and no LVM, and only to the RAID device that holds the swap.

Some adaptation may be necessary for other versions of SME or for other RAID and LVM settings!


Because the SME can run without swap, the job can be done directly on the running SME, without any LiveCD or rescue mode.

Adapt the names of the partitions (hdX, sdX, etc.) to your case.

The starting point: the swap device cannot be synced

This can occur when a new disk is added to the SME and this disk is "a little bit" smaller than the disk that is already running. The RAID sync (e.g. started from the console) works for "/" and for "/boot", but not for the swap, because the added disk lacks some space.

Look at the current state of the RAID:

   # cat /proc/mdstat
   Personalities : [raid1] 
   md0 : active raid1 vda1[0] vdb1[2]
         255936 blocks super 1.0 [2/2] [UU]
         
   md2 : active raid1 vda3[0]
         2095104 blocks super 1.1 [2/1] [U_]
         
   md1 : active raid1 vda2[0] vdb3[2]
         18600960 blocks super 1.1 [2/2] [UU]
         bitmap: 0/1 pages [0KB], 65536KB chunk
   
   unused devices: <none> 
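
The same check can be scripted. The following is a minimal sketch, not part of the original procedure: `degraded_arrays` is a hypothetical helper name, and it assumes the usual /proc/mdstat layout shown above, where a missing member appears as `_` in the status brackets (e.g. `[U_]`).

```shell
# Hypothetical helper: print the name of every md array that has a missing member.
# Reads /proc/mdstat by default; a saved copy can be passed as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid1 vda3[0]"
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        # A status such as [U_] or [_U] contains an underscore for each missing member
        /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Example (on the server):
#   degraded_arrays
```

With the state shown above, this should list only md2.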

As the console also shows, md2 runs with only one disk (vda3); the partition vdb2 is missing from the array. The second disk was added to the machine afterwards, and its partition is too small to be added:

    # mdadm --manage /dev/md2 --add /dev/vdb2
   mdadm: /dev/vdb2 not large enough to join array
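
To see how much space is missing, the sizes of the two partitions can be compared. This is only a rough sketch: `size_gap` is a hypothetical helper, the sizes are assumed to come from `blockdev --getsize64` (util-linux), and mdadm's own check also accounts for the metadata, so the figure is indicative rather than exact.

```shell
# Hypothetical helper: compare two sizes in bytes.
# Usage: size_gap BYTES_OF_EXISTING_MEMBER BYTES_OF_CANDIDATE
size_gap() {
    if [ "$2" -lt "$1" ]; then
        echo "candidate is $(( $1 - $2 )) bytes too small"
    else
        echo "candidate is large enough"
    fi
}

# On the server (as root), the sizes could be read like this:
#   size_gap "$(blockdev --getsize64 /dev/vda3)" "$(blockdev --getsize64 /dev/vdb2)"
```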

The RAID array:

  • stop the swap:
   # swapoff -a 
   
  • get the details of /dev/md2 and note the UUID and the name of the RAID. In my case:
   # mdadm --detail /dev/md2
   /dev/md2:
           Version : 1.1
     Creation Time : Mon Feb  1 21:42:29 2016
        Raid Level : raid1
        Array Size : 2095104 (2046.34 MiB 2145.39 MB)
     Used Dev Size : 2095104 (2046.34 MiB 2145.39 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent
   
       Update Time : Mon Feb  1 21:42:30 2016
             State : clean, degraded 
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0
   
              Name : localhost.localdomain:2
              UUID : 3ee2fded:12de8ad4:736bc4ee:b74e8f89
            Events : 4
   
       Number   Major   Minor   RaidDevice State
          0     252        3        0      active sync   /dev/vda3
          2       0        0        2      removed 


  • stop the RAID device:
    # mdadm --stop /dev/md2
   mdadm: stopped /dev/md2
     * check that md2 doesn't exist any more:
    # mdadm --remove /dev/md2
   mdadm: error opening /dev/md2: No such file or directory 
   
  • remove the superblock from vda3 (which was previously in the RAID):
    # mdadm --zero-superblock /dev/vda3 
  • recreate the RAID device md2 with both partitions, reusing the UUID and the name of the old RAID:
   # mdadm  --create /dev/md2 --level=1 --raid-devices=2 /dev/vda3 /dev/vdb2 --uuid=3ee2fded:12de8ad4:736bc4ee:b74e8f89 --name=localhost.localdomain:2
   mdadm: Note: this array has metadata at the start and
       may not be suitable as a boot device.  If you plan to
       store '/boot' on this device please ensure that
       your boot-loader understands md/v1.x metadata, or use
       --metadata=0.90
   Continue creating array? y
   mdadm: Defaulting to version 1.2 metadata
   mdadm: array /dev/md2 started.
  • The RAID automatically starts to resync:
   # mdadm --detail /dev/md2
   /dev/md2:
           Version : 1.2
     Creation Time : Fri Feb  5 16:17:21 2016
        Raid Level : raid1
        Array Size : 2093120 (2044.41 MiB 2143.35 MB)
     Used Dev Size : 2093120 (2044.41 MiB 2143.35 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent
   
       Update Time : Fri Feb  5 16:18:03 2016
             State : clean, resyncing 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0
   
     Resync Status : 19% complete
   
              Name : localhost.localdomain:2
              UUID : 3ee2fded:12de8ad4:736bc4ee:b74e8f89
            Events : 3
   
       Number   Major   Minor   RaidDevice State
          0     252        3        0      active sync   /dev/vda3
          1     252       18        1      active sync   /dev/vdb2
    
  • Check that the UUID and the name are correct and wait for the sync to finish.
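
Copying the UUID and the name by hand is error-prone, so the re-create command can be assembled from a saved copy of the `mdadm --detail` output. This is a sketch under assumptions: `array_field` and `print_recreate_cmd` are hypothetical helpers, the path /root/md2.detail is only an example, and the output must be saved before the array is stopped. The helpers only read text and print; they change nothing on disk.

```shell
# Hypothetical helper: print the value of one "Label : value" line
# from `mdadm --detail` output given on stdin (e.g. UUID or Name).
array_field() {
    awk -F ' : ' -v key="$1" '
        {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim the right-aligned label
        }
        label == key && !seen {
            value = $2
            gsub(/[ \t]+$/, "", value)           # drop trailing blanks
            print value
            seen = 1                             # keep only the first match
        }
    '
}

# Hypothetical helper: print (do not execute) the re-create command for review.
# Usage: print_recreate_cmd DETAIL_FILE ARRAY PARTITION...
print_recreate_cmd() {
    detail_file=$1
    array=$2
    shift 2
    uuid=$(array_field UUID < "$detail_file")
    name=$(array_field Name < "$detail_file")
    echo "mdadm --create $array --level=1 --raid-devices=$# $* --uuid=$uuid --name=$name"
}

# Example (as root, before stopping the array):
#   mdadm --detail /dev/md2 > /root/md2.detail
#   print_recreate_cmd /root/md2.detail /dev/md2 /dev/vda3 /dev/vdb2
```

Once the array has been re-created, `mdadm --wait /dev/md2` can be used to block until the resync has finished instead of polling `mdadm --detail`.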

The swap:

Open /etc/fstab and note the UUID set for the swap file system:

   nano /etc/fstab
   
   UUID=6844de9b-2c3c-433b-a7b5-c39258dbb85a swap                    swap    defaults        0 0 
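
The UUID can also be read without opening an editor. A minimal sketch, assuming the swap entry uses the `UUID=` form shown above; `swap_uuid` is a hypothetical helper name.

```shell
# Hypothetical helper: print the UUID of every swap entry in fstab.
# Reads /etc/fstab by default; another file can be passed as the first argument.
swap_uuid() {
    awk '
        # Field 1 is the device spec, field 3 the file system type
        $1 ~ /^UUID=/ && $3 == "swap" {
            sub(/^UUID=/, "", $1)
            print $1
        }
    ' "${1:-/etc/fstab}"
}

# Example (on the server):
#   swap_uuid
```

The printed value is the one to pass to the `-U` option of mkswap.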
  • Create a swap file system on the RAID device with the UUID found in /etc/fstab:
   # mkswap /dev/md2 -U 6844de9b-2c3c-433b-a7b5-c39258dbb85a
   mkswap: /dev/md2: warning: don't erase bootbits sectors
           on whole disk. Use -f to force.
   Setting up swapspace version 1, size = 2093116 KiB
   no label, UUID=6844de9b-2c3c-433b-a7b5-c39258dbb85a 
  • Reconfigure the server (in fact, a reboot alone should be sufficient):
   signal-event post-upgrade; signal-event reboot 
  • Check the RAID in the console or with:
    # cat /proc/mdstat 
  • Check that the swap is active:
   # top
   
   top - 21:27:13 up 2 min,  1 user,  load average: 2.46, 1.02, 0.38
   Tasks: 213 total,   1 running, 212 sleeping,   0 stopped,   0 zombie
   Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
   Mem:   2029604k total,   957820k used,  1071784k free,    20912k buffers
   Swap:  2093116k total,        0k used,  2093116k free,   413852k cached 
   etc.........
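
As a scriptable alternative to reading the output of top, the kernel's list of active swap areas in /proc/swaps can be checked. This is a sketch under assumptions: `swap_is_active` is a hypothetical helper, and it assumes the swap appears in /proc/swaps under the device name /dev/md2.

```shell
# Hypothetical helper: succeed if the given device is listed as active swap.
# Usage: swap_is_active DEVICE [SWAPS_FILE]   (SWAPS_FILE defaults to /proc/swaps)
swap_is_active() {
    awk -v dev="$1" '
        NR > 1 && $1 == dev { found = 1 }   # skip the header line
        END { exit !found }                 # exit status 0 only when found
    ' "${2:-/proc/swaps}"
}

# Example (on the server):
#   swap_is_active /dev/md2 && echo "swap on md2 is active"
```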


  Tip:
Reusing the "old" UUIDs keeps things simple, because nothing has to be changed in the configuration of the SME.


Link to a topic of the forum relating to this.

Enjoy!


  Note:
Where was the difficulty??....