Archive for March, 2023

LVM, backups and snapshots

Thursday, March 9th, 2023

When replacing (and upgrading) disks before they fail, I prefer to back up the original data first. Although the original disks are in a RAID1 (mirror) array, where available, and I should be able to swap an old disk for a newer (and larger) disk without impact, I still prefer to make a backup. This covers the case where both disks have failures in different locations: the RAID1 array can still read all the data by using whichever disk holds a good copy of each block, but removing one disk to upgrade it would leave only the other disk, and would ultimately cause data-loss.

I have used LVM under Linux for decades. Segregating files onto separate logical-volumes (partitions) means that smaller filesystems can be repaired more quickly, while limiting the amount of data affected (or the services down on a server), and the ability to extend logical-volumes has been a lifesaver. However, I haven't made much use of LVM snapshots. An LVM snapshot is an instant, static copy (like a photo) of a logical-volume: the original LV can continue to be used and changed while one mounts the snapshot and backs it up. This is of most benefit when backing up volumes containing database files, where consistency is necessary. Without such a snapshot (or stopping the database), individual DB files are likely to change while they are being backed up, meaning a restored database will be either corrupt or inconsistent.

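In this particular case I want to make a copy of an LV (logical-volume) containing rsnapshot backups of multiple servers. rsnapshot runs every ~2 hours, and I know this copy of 1.6TB of backups will take a couple of days (about 2d4h22m), so rsnapshot will write to the volume many times while it is being copied. A snapshot gives the copy a static view of the data for its whole duration.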
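Because rsnapshot keeps writing to the origin LV during a multi-day copy, the changed blocks accumulate in the snapshot's copy-on-write area, and if that area fills the snapshot becomes unusable. Below is a minimal sketch of a check that could be run from cron while the copy is in progress. The function names (`pct_at_least`, `snap_pct`, `check_snapshot`) and the 80% threshold are my own assumptions; it relies on the `data_percent` field that `lvs -o` reports for snapshots (the Data% column).

```shell
#!/bin/sh
# Sketch: warn when an LVM snapshot's copy-on-write area is filling up.
# Function names and the default 80% threshold are assumptions.

# Succeed (exit 0) if percentage $1 is at or above threshold $2.
# awk is used because lvs reports decimals such as "10.74".
pct_at_least() {
    awk -v p="$1" -v t="$2" 'BEGIN { exit !(p + 0 >= t + 0) }'
}

# Print the snapshot's Data% (COW usage) for VG/LV $1, padding removed.
snap_pct() {
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}

# Returns 0 if below the threshold, 1 if at/above it, 2 if lvs gave nothing.
check_snapshot() {
    lv=$1
    threshold=${2:-80}
    pct=$(snap_pct "$lv")
    [ -n "$pct" ] || return 2
    if pct_at_least "$pct" "$threshold"; then
        echo "WARNING: snapshot $lv is ${pct}% full" >&2
        return 1
    fi
    echo "snapshot $lv is ${pct}% full"
}

# Example (as root, e.g. hourly from cron during the copy):
#   check_snapshot stor01_vg/bkp_rsnapshot 80
```

If the snapshot is getting close to full, it can be grown with `lvextend`; lvm.conf also has a `snapshot_autoextend_threshold` setting for doing this automatically when dmeventd monitoring is enabled.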

Rsnapshot makes extensive use of hard-links to reduce disk utilisation, so a copy needs to preserve them. I would normally use rsync -H but, in this case, there are a large number of files/directories (19.5 million inodes), and rsync uses a lot of RAM to track hard-links across that many files. This server has only a small amount of RAM (2GB), and rsync crashed after 1-2 days with an out-of-memory error. The alternative is cp -a, which also preserves hard-links but doesn't have the same RAM requirements.
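Before trusting cp -a with 1.6TB, it is easy to confirm that it really keeps hard-links (with GNU cp, -a includes --preserve=links) using a throwaway test in a temporary directory. This is a minimal sketch assuming GNU coreutils `stat`; `cp_a_keeps_hardlinks` is just an illustrative name.

```shell
#!/bin/sh
# Sketch: check that GNU "cp -a" keeps hard-links inside the copied tree.
# Everything happens in a throwaway temporary directory.

cp_a_keeps_hardlinks() {
    tmp=$(mktemp -d) || return 2
    mkdir "$tmp/src"
    echo "data" > "$tmp/src/a"
    ln "$tmp/src/a" "$tmp/src/b"      # b is a second name for a's inode

    cp -a "$tmp/src" "$tmp/dst"       # dst does not exist yet, so it becomes the copy

    # GNU stat: %i = inode number, %h = hard-link count
    ino_a=$(stat -c %i "$tmp/dst/a")
    ino_b=$(stat -c %i "$tmp/dst/b")
    links=$(stat -c %h "$tmp/dst/a")
    rm -rf "$tmp"

    # Preserved links mean both names share one inode with a link count of 2
    [ "$ino_a" = "$ino_b" ] && [ "$links" -eq 2 ]
}

if cp_a_keeps_hardlinks; then
    echo "hard-links preserved"
else
    echo "hard-links NOT preserved"
fi
```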


root@stor01:~# lvcreate -L100G -s -n bkp_rsnapshot /dev/stor01_vg/rsnapshot_lv
Using default stripesize 64.00 KiB.
Logical volume "bkp_rsnapshot" created.
root@stor01:~# mount -r /dev/stor01_vg/bkp_rsnapshot /mnt/bkp

root@stor01:~# lvs
  LV            VG        Attr       LSize   Pool Origin       Data%  Meta%  Move Log Cpy%Sync Convert
  bkp_rsnapshot stor01_vg swi-aos--- 100.00g      rsnapshot_lv 10.74
  rsnapshot_lv  stor01_vg owi-aos---  <1.60t
root@stor01:~# date ; time cp -a /mnt/bkp/ /mnt/dest/BACKUPS/rsnapshot/ ; date
Sun 5 Mar 21:19:43 GMT 2023

real 3142m7.637s
user 20m10.877s
sys 381m17.791s
Wed 8 Mar 01:41:51 GMT 2023
root@stor01:~# umount /mnt/bkp
root@stor01:~# lvremove /dev/stor01_vg/bkp_rsnapshot
Do you really want to remove and DISCARD active logical volume stor01_vg/bkp_rsnapshot? [y/n]: y
Logical volume "bkp_rsnapshot" successfully removed
root@stor01:~#
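The session above could be wrapped in a small script so that the snapshot is always removed, even if the copy fails part-way. This is a sketch rather than what I actually ran: the `run` and `snapshot_copy` helpers, the DRY_RUN switch and the use of `lvremove -f` (to skip the confirmation prompt) are my own additions, and the defaults simply mirror the names used above.

```shell
#!/bin/sh
# Sketch: snapshot an LV, copy it from a read-only mount, then always drop
# the snapshot. Defaults mirror the session above; override via environment.
: "${VG:=stor01_vg}"
: "${ORIGIN:=rsnapshot_lv}"
: "${SNAP:=bkp_rsnapshot}"
: "${COW_SIZE:=100G}"   # room for blocks that change on the origin during the copy
: "${MNT:=/mnt/bkp}"
: "${DEST:=/mnt/dest/BACKUPS/rsnapshot/}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_copy() {
    run lvcreate -L "$COW_SIZE" -s -n "$SNAP" "/dev/$VG/$ORIGIN" || return 1
    status=0
    if run mount -r "/dev/$VG/$SNAP" "$MNT"; then
        run cp -a "$MNT/" "$DEST" || status=$?
        run umount "$MNT" || status=$?
    else
        status=1
    fi
    # Always remove the snapshot, even after a failed copy: a forgotten
    # snapshot keeps filling its COW area and slows writes to the origin.
    # -f skips the "Do you really want to remove" prompt.
    run lvremove -f "$VG/$SNAP" || status=$?
    return "$status"
}

# Example (as root):        snapshot_copy
# Print commands only:      DRY_RUN=1 snapshot_copy
```

The dry-run mode is handy for checking the VG/LV names and paths before letting the script loose on a real volume.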

It is always useful to know the source of an example; this one comes from the LVM HOWTO on The Linux Documentation Project website.

Now that this is done, I can be comfortable that I have backups (albeit a couple of days old) to get me out of a hole if the disk replacement causes a volume failure.