An upgrade safety net with the logical volume manager

This is the story of a website, made of files and a database, which goes into production and then, after a while, needs its schema corrected and the related files upgraded safely. As usual, it would have been better to get the schema right from the beginning, but since errors always happen at some point, technologies such as LVM snapshots exist so that the website can get its invasive (read: scary) upgrade with a good safety net in case of a mess.

The idea is to take a snapshot of the disk partition: it is like making a backup of a partition and copying it back later whenever required, with the added benefits that:

  • the backup takes almost no time, regardless of the size of the disk,
  • no blocks are actually copied when the snapshot is taken, so it is very light on I/O and disk space,
  • most importantly, even if the backup happens during a massive modification of the filesystem structure, the filesystem data structures are coherent.

It is not magic though: the snapshot is itself a partition, and it fills up as data is written to the original partition to which it is attached. Make sure the snapshot partition is big enough to hold the updates made to the original partition, or remove the snapshot before it gets full.
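Keeping an eye on how full a snapshot is can be scripted. The following is only a sketch: the function name and the 80 percent default threshold are assumptions made for illustration, and the input is expected to be lines of "name percentage" such as lvs can report.

```shell
#!/bin/sh
# Sketch: print the names of snapshots whose space is nearly exhausted.
# Input: lines of "<lv_name> <percent_used>", for example produced (as root) by
#   lvs --noheadings -o lv_name,snap_percent datadisks
# snapshots_nearly_full is a hypothetical helper name and the default
# threshold of 80 percent is an arbitrary assumption.
snapshots_nearly_full () {
    threshold=${1:-80}
    awk -v limit="$threshold" '
        # Lines with a single field are volumes without a usage figure
        # (plain volumes print an empty column): they are skipped.
        NF >= 2 && $2 + 0 >= limit { print $1 }
    '
}
```

When a name is printed, it is time to either grow the snapshot (lvextend works on snapshots too) or remove it.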

Note that the commands below need to be run from the administrator (root) account.

A trick for fake physical partitions

Before showing how to set up a logical partition with LVM, let's first present a handy trick to create fake physical partitions: it is not related to LVM per se, but it makes it easy to experiment with RAID or LVM without mangling a real disk.

With losetup's loopback devices, it is possible to turn a normal file into a disk device, available in /dev. Here we create, with dd, a 40 megabyte file called loop1.raw in the current directory, and make it available as /dev/loop1

~# dd if=/dev/zero of=loop1.raw bs=1M count=40
~# losetup /dev/loop1 loop1.raw

~# grep loop /proc/partitions
 7        1     ... loop1

No partition called loop1 existed before; it appeared after the losetup command. We will set it up for use with LVM in the next section.
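As a variation, the backing file does not have to be filled with zeros up front. The sketch below creates a sparse file of the same size and checks it; make_disk_image and image_size_bytes are hypothetical helper names, and the stat option shown is the GNU coreutils one.

```shell
#!/bin/sh
# Sketch: create the backing file for a loop device and check its size.
# make_disk_image and image_size_bytes are hypothetical helper names.
make_disk_image () {
    image=$1
    size_mib=$2
    # count=0 with seek=N writes no block but extends the file to N MiB:
    # the file is sparse, it reports the full size yet occupies almost
    # no disk space until blocks are actually written.
    dd if=/dev/zero of="$image" bs=1M count=0 seek="$size_mib" 2> /dev/null
}

image_size_bytes () {
    # stat -c %s is the GNU coreutils spelling; other systems differ.
    stat -c %s "$1"
}

# Attaching the file still requires root, for example:
#   losetup --find --show loop1.raw   # prints the loop device it picked
```

Calling make_disk_image loop1.raw 40 gives a file that reports 40 MiB, the same size as the one created with dd above.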

Setting up a logical volume

To use LVM features such as hot resize, hot backup or physical disk aggregation, there are three objects you need to be familiar with. For each of them, there are corresponding shell commands for creating, listing and removing it.

physical volume: a physical partition of one of the physical hard drives which has been allocated to LVM. Managed with pvcreate, pvs and pvremove. The s of pvs is for show.

volume group: an aggregation of physical volumes. When I first read documentation about LVM, I thought volume groups were meant to aggregate the logical partitions and data, but it is the other way round: they aggregate the real partitions. Managed with vgcreate, vgs and vgremove.

logical volume: the volume you will use in the end with mkfs and mount. This is the device which looks like a traditional partition but with additional features like snapshots or on-the-fly resizing. Managed with lvcreate, lvs and lvremove.

Use pvcreate from the lvm2 package to initialise the partition for use with lvm

~# pvcreate /dev/loop1
  Physical volume "/dev/loop1" successfully created

A volume group is composed of one or more physical volumes, and a physical volume can belong to only one volume group

~# vgcreate datadisks /dev/loop1
  Volume group "datadisks" successfully created

So far, the list of available partitions has not changed and the /dev/datadisks/website device does not exist. A logical volume can now be created: it has a name and a size, and lives inside a volume group

~# lvcreate -n website -L 12M datadisks
  Logical volume "website" created

~# lvs
LV      VG        Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  website datadisks -wi-a...M

~# grep dm /proc/partitions && ls /dev/datadisks/website
 252 ... dm-0
/dev/datadisks/website

Among the partitions, a new dm entry is shown (dm stands for device mapper). The device is available in /dev, in a directory named after the volume group.

As usual, the partition must be formatted and mounted to be integrated into the filesystem

~# mkfs.ext4 /dev/datadisks/website > /dev/null
~# mkdir -p ./mnt/website && mount /dev/datadisks/website ./mnt/website

Design of an upgrade plan

Let's put together a dummy three-tier website that we will have to upgrade, corrupt, roll back, etc.

~# touch ./mnt/website/database
~# touch ./mnt/website/index.html
~# add_new_user () {
       echo "name:$1,age:$2" >> ./mnt/website/database ; }

With the right amount of marketing and public relations, the website is put in production and made available to the public. Every day, torrents of new users line up to subscribe

~# add_new_user alice 29
~# add_new_user bob 18
~# cat ./mnt/website/database
name:alice,age:29
name:bob,age:18

Sparky the architect has realised that the database schema must be upgraded to include an id for each user. It should end up looking like this:

id:001,name:alice,age:29
id:002,name:bob,age:18

Also, the website in production is not web2.0 enough, so a designer has done a great job beautifying a new prototype, which is added to the upgrade procedure. So the upgrade procedure is

~# upgrade_schema_and_website () {

      # Web changes
      touch ./mnt/website/{social-caramels.js,ponies.js,eye-candy.css}

      # API upgrade: now there is an id
      add_new_user () {
         echo "id:$RANDOM,name:$1,age:$2" >> ./mnt/website/database ; }

      # For the "db schema", you don't want to know ...
      nl -n rz -w 3 ./mnt/website/database \
         | sed 's/\t/,/; s/^/if:/' > ./mnt/website/database.new
      mv ./mnt/website/database{.new,}
      }

Rollback of a failed upgrade

The system administrator puts together a transaction API and convinces the operator to use it on the day of the upgrade. Before making any change, the operator must run the transaction command. If all is well after a few days of testing, the remove_snapshot command is used; otherwise the operator can use the abort function.

The transaction functions are built on top of the LVM snapshot

~# transaction () {
      lvcreate -s -n backup -L 24M  /dev/datadisks/website ; }

~# abort () {
      mkdir ./mnt/backup
      mount /dev/datadisks/backup ./mnt/backup

      # tar cf - -C ./mnt/backup . | tar  x -C ./mnt/website
      rsync --del -a ./mnt/backup/ ./mnt/website/ ;

      add_new_user () {
           echo "name:$1,age:$2" >> ./mnt/website/database ; }
      }

~# remove_snapshot () {
      umount /dev/datadisks/backup
      lvremove -f /dev/datadisks/backup ; }
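Whether a 24M snapshot fits next to the 12M volume in a 40M loop file can be checked with some arithmetic. This is a rough sketch under two assumptions that are not stated in the article: an extent size of 4 MiB (the vgcreate default) and about 1 MiB of the physical volume reserved for metadata. The real figures can be read with vgs -o vg_extent_size,vg_free. All helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: how many extents a volume of a given size needs, and how many
# a physical volume provides. All sizes are in MiB.
EXTENT_MIB=4      # assumed: default extent size of vgcreate
METADATA_MIB=1    # assumed: space reserved at the start of the PV

extents_needed () {
    # Round up: lvcreate allocates whole extents.
    echo $(( ($1 + EXTENT_MIB - 1) / EXTENT_MIB ))
}

extents_available () {
    # Round down: a partial extent at the end of the PV is unusable.
    echo $(( ($1 - METADATA_MIB) / EXTENT_MIB ))
}

fits_in_pv () {
    # usage: fits_in_pv <pv size> <lv size>...
    pv=$1; shift
    total=0
    for size in "$@"; do
        total=$(( total + $(extents_needed "$size") ))
    done
    [ "$total" -le "$(extents_available "$pv")" ]
}
```

Under these assumptions, fits_in_pv 40 12 24 succeeds with nothing to spare: 3 + 6 extents are needed and 9 are available.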

The upgrade procedure requires the database to go read-only: no new users can be created. Then comes the night of the upgrade

~# transaction
Logical volume "backup" created
~# upgrade_schema_and_website

At dawn, the db looks like

~# cat ./mnt/website/database
if:001,name:alice,age:29
if:002,name:bob,age:18

Ouuuch man! It is corrupted: there is no 'id' column, it says 'if' everywhere instead, and we have no clue why. We need to go back to the lab and figure out what happened. What do we do with this mess in the meantime? We need to roll back so that the production site can continue. Easy, here is the command

~# abort

The abort function relies on the snapshot taken with lvcreate --snapshot, and really is the core of this article. Now, to check that the rollback went fine

~# cat ./mnt/website/database
name:alice,age:29
name:bob,age:18

~# ls ./mnt/website/ponies.js 2>&1 || true
ls: cannot access ./mnt/website/ponies.js: No such file or directory

OK, the situation is the same as before the upgrade. The service can be restored.
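Back in the lab, the record numbering can be rehearsed on a scratch copy and validated before it replaces anything. The sketch below is one way to do it, not part of the original procedure: number_records and check_schema are hypothetical helper names, and the expected line format is inferred from the examples in this article.

```shell
#!/bin/sh
# Sketch: number the records of a "name:...,age:..." file and refuse the
# result unless every line matches the expected schema.
# number_records and check_schema are hypothetical helper names.
number_records () {
    # nl -n rz -w 3 prefixes each line with a zero-padded number and a tab;
    # sed turns that tab into a comma and prepends the column name.
    # The \t in the sed pattern relies on GNU sed.
    nl -n rz -w 3 "$1" | sed 's/\t/,/; s/^/id:/'
}

check_schema () {
    # Succeeds only if no line deviates from id:<digits>,name:...,age:<digits>
    ! grep -qvE '^id:[0-9]+,name:[^,]+,age:[0-9]+$' "$1"
}
```

In an upgrade function, the mv of database.new over database would then run only if check_schema ./mnt/website/database.new succeeds.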

Note

It is actually not easy to get the rsync or tar options right for restoring the data from the backup. Version 2.02.57 of LVM, together with the device mapper changes hopefully integrated into Linux 2.6.33, should make this more convenient by building the rollback into the lvconvert command of the LVM tool set: no need for everyone to write (and debug) a custom abort function like we did. The new lvconvert command could be available in distributions in the second half of 2010.
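On a system where snapshot merging is available, an abort built on lvconvert --merge could look like the sketch below. It has not been verified against the setup of this article; the run helper and the DRY_RUN variable are assumptions added so that the sequence of commands can be inspected without touching any volume.

```shell
#!/bin/sh
# Sketch of an abort built on snapshot merging instead of rsync.
# Assumes an LVM and a kernel recent enough to support "lvconvert --merge".
# run and DRY_RUN are hypothetical: with DRY_RUN=1 the commands are
# printed instead of executed.
run () {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

abort_with_merge () {
    # The origin must not be in use, otherwise the merge is postponed
    # until the origin is next activated.
    run umount ./mnt/website || return 1
    # Fold the snapshot back into its origin; the snapshot volume
    # disappears once the merge completes.
    run lvconvert --merge /dev/datadisks/backup || return 1
    run mount /dev/datadisks/website ./mnt/website
}
```

Since the merge consumes the snapshot, there is no backup volume left to mount or remove afterwards. The redefinition of add_new_user done by the abort function above would still be needed.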

Fixing and re-applying the upgrade

Three weeks later, many more users have been created

~# add_new_user robwilco 35
~# add_new_user DuncanMacLeod 539

~# cat ./mnt/website/database
name:alice,age:29
name:bob,age:18
name:robwilco,age:35
name:DuncanMacLeod,age:539

R&D has come up with a complete re-design of the upgrade procedure: a snapshot and some correct database mangling commands. Only the schema upgrade was modified

~# upgrade_schema_and_website () {

    # Same as before ...
    touch ./mnt/website/{social-caramels.js,ponies.js,eye-candy.css}

    # Same as before ...
    add_new_user () {
      echo "id:$RANDOM,name:$1,age:$2" >> ./mnt/website/database ; }

    # Correction added here: 'if' replaced by 'id' (ids are also widened to 5 digits)
    nl -n rz -w 5 ./mnt/website/database \
       | sed 's/\t/,/; s/^/id:/' > ./mnt/website/database.new
    mv ./mnt/website/database.new ./mnt/website/database
    }

~# upgrade_schema_and_website

~# cat ./mnt/website/database
id:00001,name:alice,age:29
id:00002,name:bob,age:18
id:00003,name:robwilco,age:35
id:00004,name:DuncanMacLeod,age:539

At dawn, the database is correct; the snapshot safety net was thankfully not needed. The upgrade can be confirmed by removing the snapshot

~# remove_snapshot
Logical volume "backup" successfully removed

Obviously, removing the snapshot does not impact the original partition

~# cat ./mnt/website/database
id:00001,name:alice,age:29
id:00002,name:bob,age:18
id:00003,name:robwilco,age:35
id:00004,name:DuncanMacLeod,age:539

We are done with this howto. To clean up after this exercise

~# umount ./mnt/website
~# lvremove -f /dev/datadisks/backup 2> /dev/null || true
~# lvremove -f /dev/datadisks/website
Logical volume "website" successfully removed

~# vgremove datadisks
Volume group "datadisks" successfully removed

~# pvremove /dev/loop1
Labels on physical volume "/dev/loop1" successfully wiped

~# losetup -d /dev/loop1
~# rm -r ./mnt/backup ./mnt/website loop1.raw

9 Feb 2010, this article was verified with the wordish module