LVM

Logical Volume Management.

Device Mapper

Device-mapper is infrastructure in the Linux kernel that provides a generic way to create virtual layers of block devices.

The device-mapper has been a component of the Linux kernel since version 2.6 and supports logical volume management. It is required by LVM2 and EVMS. If you need device-mapper you should install dmsetup and libdevmapper.

The encryption target dm-crypt may be used to create and use encrypted disks.
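
A quick way to see which device-mapper devices exist on a system (LVM volumes, dm-crypt mappings, etc.) is dmsetup; a minimal example:

dmsetup ls --tree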

LVM

Logical Volume Management utilizes the kernel's device-mapper feature to provide a system of partitions independent of underlying disk layout. With LVM you abstract your storage and have "virtual partitions", making extending/shrinking easier (subject to potential filesystem limitations).

Virtual partitions can be added and removed without worrying about whether you have enough contiguous space on a particular disk, about re-partitioning a disk that is in use (and wondering whether the kernel is using the old or the new partition table), or about having to move other partitions out of the way.


Basic building blocks of LVM:

Physical volume (PV)

  • Unix block device node, usable for storage by LVM. Examples are:
    • a hard disk,
    • an MBR or GPT partition,
    • a loopback file,
    • a device mapper device (e.g. dm-crypt).

The PV hosts an LVM header.

Volume group (VG)

  • A group of PVs that serves as a container for LVs. PEs are allocated from a VG for an LV.

Logical volume (LV)

  • "Virtual/logical partition" that resides in a VG and is composed of PEs. LVs are Unix block devices analogous to physical partitions, e.g. they can be directly formatted with a file system.

Physical extent (PE)

  • The smallest contiguous extent (default 4 MiB) in the PV that can be assigned to an LV. Think of PEs as parts of PVs that can be allocated to any LV.

Examples:

Physical disks

  Disk1 (/dev/sda):
     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |Partition1 50 GiB (Physical volume) |Partition2 80 GiB (Physical volume)     |
    |/dev/sda1                           |/dev/sda2                               |
    |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |

  Disk2 (/dev/sdb):
     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |Partition1 120 GiB (Physical volume)                 |
    |/dev/sdb1                                            |
    |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|

LVM logical volumes

  Volume Group1 (/dev/MyVolGroup/ = /dev/sda1 + /dev/sda2 + /dev/sdb1):
     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |Logical volume1 15 GiB  |Logical volume2 35 GiB      |Logical volume3 200 GiB              |
    |/dev/MyVolGroup/rootvol |/dev/MyVolGroup/homevol     |/dev/MyVolGroup/mediavol             |
    |_ _ _ _ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
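
A layout like the one above could be created with commands along these lines (a sketch using the example names; see Volume Operations below for the individual commands):

pvcreate /dev/sda1 /dev/sda2 /dev/sdb1
vgcreate MyVolGroup /dev/sda1 /dev/sda2 /dev/sdb1
lvcreate -L 15G -n rootvol MyVolGroup
lvcreate -L 35G -n homevol MyVolGroup
lvcreate -L 200G -n mediavol MyVolGroup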

Advantages

LVM gives you more flexibility than just using normal hard drive partitions:

  • Use any number of disks as one big disk.
  • Have logical volumes stretched over several disks.
  • Create small logical volumes and resize them "dynamically" as they get filled up.
  • Resize logical volumes regardless of their order on disk. Resizing does not depend on the position of the LV within the VG, and there is no need to ensure surrounding available space.
  • Resize/create/delete logical and physical volumes online. File systems on them still need to be resized, but some (such as ext4) support online resizing.
  • Online/live migration of an LV that is in use by services to different disks, without having to restart those services.
  • Snapshots allow you to backup a frozen copy of the file system, while keeping service downtime to a minimum.
  • Support for various device-mapper targets, including transparent filesystem encryption and caching of frequently used data. This allows creating a system with (one or more) physical disks (encrypted with LUKS) and LVM on top to allow for easy resizing and management of separate volumes (e.g. for /, /home, /backup, etc.) without the hassle of entering a key multiple times on boot.

Disadvantages

  • Additional steps in setting up the system make it more complicated, and LVM requires (multiple) daemons to run constantly.
  • If dual-booting, note that Windows does not support LVM; you will be unable to access any LVM partitions from Windows.
  • If your physical volumes are not on RAID-1, RAID-5 or RAID-6, losing one disk can mean losing one or more logical volumes if you span (or extend) your logical volumes across multiple non-redundant disks.

Volume Operations

pvcreate /dev/sda
    create a PV on the entire disk /dev/sda
pvs
    display PVs
pvresize <dev>
    extend a PV after you have enlarged its partition via fdisk or gparted; pvresize automatically detects and extends the PV to its maximum size. When shrinking, run pvresize first and adjust the partition size afterwards with a partitioning tool; when growing, enlarge the partition first and then run pvresize.
pvdisplay
    display information and extents of PVs, the associated volume group and any free space across PVs
pvmove <dev1> <dev2>
    move PEs from one PV to another. This may take a long time on large volumes and fsck should be run afterwards; a crash during pvmove could prove fatal.
vgcreate <vg> <dev>...
    create a volume group from one or more devices, e.g. vgcreate vg0 /dev/sdb1 /dev/sdb2
vgs
    display volume groups
vgrename <old> <new>
    rename a volume group
vgextend <vg> <dev>
    extend VG <vg> onto <dev>
pvmove <dev>
    empty <dev> by moving its PEs onto other PVs in the same VG; afterwards remove the PV from the VG with vgreduce <vg> <dev>
vgreduce <vg> <dev>
    remove PV <dev> from the VG
vgreduce --all <vg>
    remove all empty PVs from the VG
vgreduce --removemissing --force <vg>
    remove missing/broken disks from the VG
pvremove <dev>
    remove the LVM label from <dev> so the partition can be used for something else
lvcreate -L <size> <vg> -n <name>
    create a logical volume of the given size in the volume group, named <vg>/<name>, e.g. lvcreate -L 8G redhat -n swap
lvrename <vg> <lvold> <lvnew>
    rename a logical volume within its VG, e.g. lvrename redhat swap swap-s
lvresize -L 15G --resizefs <vg>/<lv>
    resize logical volume <lv> in <vg> to 15 GiB and resize the file system all at once
lvextend -l +100%FREE --resizefs <vg>/<lv>
    extend the logical volume over all remaining free extents in <vg> and grow the file system
resize2fs /dev/<vg>/<lv>
    resize an ext file system to match the size of its LV (when the LV was resized without --resizefs)
lvs
    list all logical volumes
lsblk
    list block devices and their mount points
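
For example, to retire a disk from a volume group, the entries above combine as follows (a sketch; vg0 and /dev/sdc1 are placeholder names):

pvmove /dev/sdc1          # move all PEs off the old PV onto other PVs in vg0
vgreduce vg0 /dev/sdc1    # drop the now-empty PV from the volume group
pvremove /dev/sdc1        # wipe the LVM label so the partition can be reused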

snapshots

LVM allows you to take a snapshot of your system far more efficiently than a traditional backup, by using a COW (copy-on-write) policy.

A snapshot is created almost instantly because, at first, it simply shares all of its blocks with the origin volume rather than holding a copy of the data. Whenever a block on the origin (or on the snapshot) is modified, LVM first copies the original block into the snapshot's allocated space, so the snapshot keeps presenting the data exactly as it was when it was taken while your active system sees the new data. Thus, you can snapshot a system with 35 GiB of data using just 2 GiB of free space, so long as less than 2 GiB worth of blocks change (on both the original and the snapshot). In order to be able to create snapshots you need unallocated space in your volume group; a snapshot, like any other volume, takes up space in the volume group. So, if you plan to use snapshots for backing up your root partition, do not allocate 100% of your volume group to the root logical volume.

You create snapshot logical volumes just like normal ones.

lvcreate --size 100M --snapshot --name snap01 /dev/vg0/lv

With that volume, up to 100 MiB of data may be modified before the snapshot volume fills up.

Reverting the modified 'lv' logical volume to the state when the 'snap01' snapshot was taken can be done with

 lvconvert --merge /dev/vg0/snap01

If the origin logical volume is active, merging will occur on the next activation (e.g. the next reboot; merging can even be done from a LiveCD). Note: the snapshot will no longer exist after merging.

Multiple snapshots can also be taken, and each one can be merged with the origin logical volume at will.

The snapshot can be mounted and backed up with dd or tar. A backup made with dd will be the size of the snapshot volume itself (i.e. the size of the origin), whereas tar only archives the files residing on it. To restore, just create a snapshot, mount it, write or extract the backup to it, and then merge it with the origin.
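
For example, mounting and archiving the snap01 volume created above (a sketch; the mount point and archive path are arbitrary):

mkdir /mnt/snap01
mount /dev/vg0/snap01 /mnt/snap01
tar -czpf /backup/snap01.tar.gz -C /mnt/snap01 .
umount /mnt/snap01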

Snapshots are primarily used to provide a frozen copy of a file system to make backups; a backup taking two hours provides a more consistent image of the file system than directly backing up the partition.

See Create root filesystem snapshots with LVM for automating the creation of clean root file system snapshots during system startup for backup and rollback.

See also https://tutonics.com/2012/12/lvm-guide-part-2-snapshots.html on how to use a snapshot to secure your system state before making changes.

LV cache

create cache

The fast method is to create a PV (if necessary) on the fast disk (replace X with your drive letter) and add it to the existing volume group:

vgextend dataVG /dev/sdX

Create a cache pool with automatic metadata on sdX, and convert the existing logical volume (dataLV) to a cached volume, all in one step:

lvcreate --type cache --cachemode writethrough -L 20G -n dataLV_cachepool dataVG/dataLV /dev/sdX

If you want your cache to be bigger, you can change the -L parameter to a different size. Note: --cachemode has two possible options:

  • writethrough ensures that any data written will be stored both in the cache pool LV and on the origin LV. The loss of a device associated with the cache pool LV in this case would not mean the loss of any data;
  • writeback ensures better performance, but at the cost of a higher risk of data loss in case the drive used for cache fails.

If a specific --cachemode is not indicated, the system will assume writethrough as default.
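
To confirm the cache is attached, the volumes in the group, including the hidden cache-pool sub-volumes, can be listed; a minimal example using the names above:

lvs -a dataVG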

Remove cache

If you ever need to undo the one step creation operation above:

lvconvert --uncache dataVG/dataLV

This commits any pending writes still in the cache back to the origin LV, then deletes the cache. Other options are available and described in lvmcache(7).

RAID

From lvmraid(7):

  • lvm(8) RAID is a way to create a Logical Volume (LV) that uses multiple physical devices to improve performance or tolerate device failures. In LVM, the physical devices are Physical Volumes (PVs) in a single Volume Group (VG).

LVM RAID supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6 and RAID 10. See Wikipedia:Standard RAID levels for details on each level.
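
As a sketch, a two-way RAID 1 mirror could be created like this (myVG and myRAID1LV are placeholder names; -m 1 requests one additional mirror image, following lvmraid(7)):

lvcreate --type raid1 -m 1 -L 10G -n myRAID1LV myVG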

fsck count

You can reset the fsck count via tune2fs if your LVM file systems are ext.

  • locate the mapped drives you want to alter the fsck count for via:
fsck -N
    • on padme.server we have:
[/usr/sbin/fsck.ext4 (1) -- /] fsck.ext4 /dev/mapper/os-root 
[/usr/sbin/fsck.vfat (1) -- /boot/efi] fsck.vfat /dev/sdb1 
[/usr/sbin/fsck.ext4 (1) -- /hdd] fsck.ext4 /dev/mapper/hdd-hdstore 
[/usr/sbin/fsck.ext4 (1) -- /ssd] fsck.ext4 /dev/mapper/ssd-ssdstore 
  • now list the volume's file system parameters, e.g.
tune2fs -l /dev/mapper/os-root
tune2fs 1.46.2 (28-Feb-2021)
Filesystem volume name:   root
Last mounted on:          /
Filesystem UUID:          a745eed1-e988-440b-869c-ea16525eee41
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1892352
Block count:              7567360
Reserved block count:     378368
Free blocks:              2461154
Free inodes:              1449221
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Jan 20 19:00:53 2021
Last mount time:          Fri Feb 18 12:03:25 2022
Last write time:          Fri Feb 18 12:03:25 2022
Mount count:              338
Maximum mount count:      -1
Last checked:             Sun Jan 24 16:51:26 2021
Check interval:           0 (<none>)
Lifetime writes:          255 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       1706072
Default directory hash:   half_md4
Directory Hash Seed:      790edd61-9039-4383-af73-85b601163e31
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x5dde0897
  • set the maximum mount count (after which an fsck is forced) via:
tune2fs -c 5 /dev/mapper/os-root
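
To also reset the current mount count back to zero (an optional extra step, assuming an ext2/3/4 file system):

tune2fs -C 0 /dev/mapper/os-root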

Configuration Files

Configuration:

  • /etc/lvm/lvm.conf - the central config read by the tools
  • /etc/lvm/lvm_<hosttag>.conf - host tag (cluster) configuration
  • /etc/lvm/profile - where configuration profiles are stored
  • /etc/lvm/cache/.cache - device name filter cache file
  • /etc/lvm/backup - backups of volume group metadata
  • /etc/lvm/archive - archives of volume group metadata
  • /var/lock/lvm - lock files used on a single host to prevent parallel tool runs from corrupting the metadata

You can display the current LVM configuration with lvmconfig:

root@c3po:/etc/lvm# lvmconfig
config {
	checks=1
	abort_on_errors=0
	profile_dir="/etc/lvm/profile"
}
dmeventd {
	mirror_library="libdevmapper-event-lvm2mirror.so"
	snapshot_library="libdevmapper-event-lvm2snapshot.so"
	thin_library="libdevmapper-event-lvm2thin.so"
}
activation {
	checks=0
	udev_sync=1
	udev_rules=1
	verify_udev_operations=0
	retry_deactivation=1
	missing_stripe_filler="error"
	use_linear_target=1
	reserved_stack=64
	reserved_memory=8192
	process_priority=-18
	raid_region_size=512
	readahead="auto"
	raid_fault_policy="warn"
	mirror_image_fault_policy="remove"
	mirror_log_fault_policy="allocate"
	snapshot_autoextend_threshold=75
	snapshot_autoextend_percent=20
	thin_pool_autoextend_threshold=100
	thin_pool_autoextend_percent=20
	use_mlockall=0
	monitoring=1
	polling_interval=15
	activation_mode="degraded"
}
global {
	umask=63
	test=0
	units="h"
	si_unit_consistency=1
	suffix=1
	activation=1
	proc="/proc"
	etc="/etc"
	locking_type=1
	wait_for_locks=1
	fallback_to_clustered_locking=1
	fallback_to_local_locking=1
	locking_dir="/run/lock/lvm"
	prioritise_write_locks=1
	abort_on_internal_errors=0
	detect_internal_vg_cache_corruption=0
	metadata_read_only=0
	mirror_segtype_default="raid1"
	raid10_segtype_default="raid10"
	sparse_segtype_default="thin"
	use_lvmetad=1
	use_lvmlockd=0
	system_id_source="none"
	use_lvmpolld=1
	notify_dbus=1
}
shell {
	history_size=100
}
backup {
	backup=1
	backup_dir="/etc/lvm/backup"
	archive=1
	archive_dir="/etc/lvm/archive"
	retain_min=10
	retain_days=30
}
log {
	verbose=0
	silent=0
	syslog=1
	overwrite=0
	level=0
	indent=1
	command_names=0
	prefix="  "
	activation=0
	debug_classes=["memory","devices","activation","allocation","lvmetad","metadata","cache","locking","lvmpolld","dbus"]
}
allocation {
	maximise_cling=1
	use_blkid_wiping=1
	wipe_signatures_when_zeroing_new_lvs=1
	mirror_logs_require_separate_pvs=0
	cache_pool_metadata_require_separate_pvs=0
	thin_pool_metadata_require_separate_pvs=0
}
devices {
	dir="/dev"
	scan="/dev"
	obtain_device_list_from_udev=1
	external_device_info_source="none"
	filter=["a|^/dev/sd.*|","r|^/dev/cdrom|"]
	cache_dir="/run/lvm"
	cache_file_prefix=""
	write_cache_state=1
	sysfs_scan=1
	multipath_component_detection=1
	md_component_detection=1
	fw_raid_component_detection=0
	md_chunk_alignment=1
	data_alignment_detection=1
	data_alignment=0
	data_alignment_offset_detection=1
	ignore_suspended_devices=0
	ignore_lvm_mirrors=1
	disable_after_error_count=0
	require_restorefile_with_uuid=1
	pv_min_size=2048
	issue_discards=0
	allow_changes_with_duplicate_pvs=0
}
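
A single section or setting can also be queried, for example (lvmconfig accepts config node paths; names follow the output above):

lvmconfig activation
lvmconfig global/units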

Resizing FS

Once an LV has been resized, the file system residing on it must also be resized (unless lvresize/lvextend was run with --resizefs). For ext file systems:

e2fsck -f /dev/volume_group/volume
resize2fs /dev/volume_group/volume

See Resize fs for other leads.

Backup and Restore

This example is for a system running LVM2 with two partitions: the first, /dev/sda1, is /boot and the second, /dev/sda2, is an LVM2 member holding the PV for /.

on live system

Note: If you booted from a live image you may need to do the following:

  • install lvm2
apt install lvm2
  • activate the LVM volume groups
vgchange -a y

This technique highlighted that some free extents must be available in the volume group to make a snapshot. Since the os volume group holding os-root was 100% allocated, I shrank the hdd PV, made a new 20 GiB partition and extended the os volume group onto it, which made room for the snapshot. (It would be worth reading up on extents and extending to see whether this can be fixed offline instead of with vgextend on the live system.)
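
The extension was along these lines (a sketch; /dev/sdXN is a placeholder for the new 20 GiB partition):

pvcreate /dev/sdXN
vgextend os /dev/sdXN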

  • create a snapshot (note you only need a snapshot size big enough to hold any changes that occur while you take the backup)
lvcreate -L 100M -s -n lv-snapshot /dev/vg/lv
  • mount the snapshot
mount /dev/vg/lv-snapshot /mount-point
  • archive with tar
tar -czvpf /backup/snapshot.tar.gz /mount-point
  • save the boot partition (it may be a UEFI or a legacy boot partition and cannot reside on LVM)
dd if=/dev/sdXN of=boot-backup.img
  • note the UUID of the root PV:
pvscan -u > uuid-root.txt
  • preserve details of the volume layout
lvdisplay > lvdisplay.txt

restore onto existing system

If you need to restore an LVM snapshot backup onto your OS root (/) volume, you must boot a recovery or live image which includes LVM2.

  • if the live image does not include LVM2, obtain it via the package manager
apt update
apt install lvm2
  • you might also want a few extra tools, e.g.
apt install lz4 pv
  • now scan your prior system volumes:
vgscan
  • make what you need active (I am taking the lazy way and activating the lot):
vgchange -ay
lsblk
NAME              MAJ:MIN RM   SIZE  RO TYPE MOUNTPOINT
sda                 8:0    0   931.5G 0 disk 
├─sda1              8:1    0    15.4G 0 part [swap]
├─sda2              8:2    0   916.1G 0 part /boot
  └─hdd-hddstore  254:2    0   916.1G 0 lvm  /hdd
sdb                 8:16   0   111.8G 0 disk 
├─sdb1              8:17   0     467M 0 part 
├─sdb2              8:18   0    28.9G 0 part   
| └─os-root       254:1    0    28.9G 0 lvm  /
├─sdb3              8:18   0    82.5G 0 part   
  └─sdd-ssdstore  254:0    0    26.8G 0 lvm  /ssd
  • now mount your root partition
mkdir /mnt/root
mount /dev/os/root /mnt/root
  • examine
ls /mnt/root
  • (optional) clear out the old contents first
rm -fr /mnt/root/*
  • now overwrite the root contents with the backup archive
cd /mnt/root
tar -xvpzf /backup/snapshot.tar.gz

on new system

To restore the / partition from your backup you need a Linux live CD that supports LVM, such as Knoppix or a Debian live image (or you can attach your live boot environment to the network and obtain the lvm2 package):

  • prerequisites (dd and tar are normally already present)
apt update
apt install lvm2
apt install lz4 pv
  • repartition your main disk as before, then restore the boot partition
dd if=boot-backup.img of=/dev/sda1 bs=2M
  • examine the former UUID of the root PV
more uuid-root.txt
  • create a PV for the second partition, reusing the old UUID
pvcreate --uuid <uuid> /dev/sda2
  • consult the old layout and recreate the volume group
more /backup/lvdisplay.txt
vgcreate <Volume-Group-name> /dev/sda2
  • create the LV for / (here the LV is named root and the VG is os)
lvcreate -l 100%FREE -n root os
  • mount the partition
mount /dev/Volume-Group/Logical-Volume /mount-point
  • restore the former contents
cd /mount-point
tar -xvzf /backup/snapshot.tar.gz
  • umount
umount /mount-point
  • test
reboot

Note: LVM stores backups of its metadata in /etc/lvm/backup. Each volume group has its own file, which lists the UUID of each PV.
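
Such a metadata backup can be restored with vgcfgrestore, for instance (a sketch; <vg> is the volume group name):

vgcfgrestore -f /etc/lvm/backup/<vg> <vg>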

You may also take tar snapshots of any other volumes that you wish to copy to another machine. This will be faster than performing disk and partition imaging.

Tricks

where are things

If you want to know which file system (and thus which LV) a file or directory resides on, then type

df <fullpath>
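
For example (a sketch using the volume names from the earlier example):

df -h /home
# prints the file system backing /home, e.g. /dev/mapper/MyVolGroup-homevol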

move an LV from one VG to another

dd image copy

umount /somedir/
lvdisplay /dev/vgsource/lv0 --units b
lvcreate -L 12345b -n lv0 vgtarget
dd if=/dev/vgsource/lv0 of=/dev/vgtarget/lv0 bs=1024K conv=noerror,sync status=progress
mount /dev/vgtarget/lv0 /somedir/

if everything is good, remove the source

lvremove vgsource/lv0

via partclone

If you need to copy a logical volume from one VG to another, an interesting variant uses partclone. Making a snapshot and then copying it with dd is a good method, but it can be slow if your file systems are not full; this solution is very fast because it copies only the used blocks.

First create a snapshot of the source LV

lvcreate --snapshot --size 1G /dev/sourcevg/lv --name lv-backup

the --size here is how much data can be written to the origin before the snapshot is invalidated

Create the destination LV in the destination VG

lvcreate --size <new_lv_size> /dev/destvg --name newlv

new_lv_size must be at least the size of the source LV

Copy the file-system from source lv backup to destination LV

partclone.<fs_type> --dev-to-dev --source /dev/sourcevg/lv-backup --output /dev/destvg/newlv

fs_type can be ext4, fat32, btrfs, xfs, ... any FS supported by partclone

Delete the snapshot

lvremove /dev/sourcevg/lv-backup

abort pvmove

If a pvmove is interrupted, e.g. by a crash, you can:

pvmove          # with no arguments, the interrupted move will resume
pvmove --abort  # the pvmove (and its temporary mirror) will be wound back

no lvmetad

pvs may warn that lvmetad is not being used because duplicate PVs were found:

 pvs
 WARNING: Not using lvmetad because duplicate PVs were found.
 WARNING: Use multipath or vgimportclone to resolve duplicate PVs?
 WARNING: After duplicates are resolved, run "pvscan --cache" to enable lvmetad.
 PV         VG      Fmt  Attr PSize   PFree  
 /dev/sda3  c3po-vg lvm2 a--    1.82t 648.75g
 /dev/sdb1  hdd     lvm2 a--  919.17g 919.17g
 /dev/sdb2  swap    lvm2 a--   32.00g      0 
 /dev/sdb3  hdd     lvm2 a--  911.84g 390.84g

more PV stuff

removable drives

Suspend/resume with LVM and removable media

In order for LVM to work properly with removable media, like an external USB drive, the volume group of the external drive needs to be deactivated before suspend. If this is not done, you may get buffer I/O errors on the dm device after resume. For this reason, it is not recommended to mix external and internal drives in the same volume group.

To automatically deactivate the volume groups with external USB drives, tag each volume group with the sleep_umount tag in this way:

vgchange --addtag sleep_umount vg_external
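
To deactivate such a volume group by hand before suspending (a minimal sketch using the example VG name above; any file systems mounted from it must be unmounted first):

umount /path/to/external/mount
vgchange -an vg_external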

Configuration changes

  • configuration is stored in
/etc/lvm/lvm.conf
  • configuration may be examined and operated on with
lvmconfig
  • after substantive changes, including changes to fstab, rebuild the initramfs; on Debian:
cp -p /boot/initrd.img-$(uname -r) /boot/initrd.img-$(uname -r).$(date +%Y%m%dT%H%M%S).bak
update-initramfs -u
  • on CentOS:
cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +%Y%m%dT%H%M%S).bak
dracut -f

If you receive a warning from update-initramfs that RESUME refers to a device different from your swap partition's blkid, then edit the file

/etc/initramfs-tools/conf.d/resume

and remove the RESUME override.

Problems
