Steinar H. Gunderson

Mon, 19 Aug 2013 - Whole-disk dm-cache

dm-cache is an interesting new technology available from kernel 3.10 onwards; basically, it is a way to use SSDs as a cache layer in front of rotating media, supposedly getting the capacity of the latter and the speed of the former, similar to how the page cache already tries to exploit the good properties of both RAM and disks. (Historically, this is nothing new; ZFS, for instance, has had this ability for years in the form of L2ARC, a second-level cache built on the patented ARC cache algorithm.)

dm-cache is not the only technology that does this; it competes with, for instance, bcache (also merged in 3.10). However, bcache expects you to format the data volume, which was a no-go in my case: what I wanted was for dm-cache to sit below my main RAID-6 LVM (which has tons of volumes) without having to erase anything.

This is all a bit raw. Bear with me.

First of all, after a new enough kernel has been installed (you probably want 3.11-rc-something, actually), we need some basic scripts to hook into initramfs-tools and so on. I used dmcache-tools and simply converted it to a Debian package with alien. It comes with a tool called dmcache-format-blockdev that tries to set up your block device as LVM, split into a blocks volume and a metadata volume (seemingly they are separate in case you want e.g. RAID-1 for your metadata only), but the metadata volume it created was too small for my use. I ended up with 512MB for metadata and the rest for blocks.
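If you would rather create the split by hand, a minimal sketch with the standard LVM tools could look like the following. It assumes the SSD RAID is /dev/md2 and that the volume group is called "cache", to match the /dev/cache/metadata and /dev/cache/blocks paths used in the cachetab. The helper names (run, make_cache_vg) are hypothetical. This is not what dmcache-format-blockdev actually runs. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical manual equivalent of the SSD layout: a volume group "cache"
# on the SSD RAID, with a 512MB "metadata" LV and the rest as "blocks".
# NOT the commands dmcache-format-blockdev uses; an illustrative sketch.
# Prints the commands by default; set DRY_RUN=0 to execute them
# (which destroys any data on the SSD device!).

DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_cache_vg() {
    ssd=$1                                    # e.g. /dev/md2 (assumed)
    run pvcreate "$ssd"
    run vgcreate cache "$ssd"
    run lvcreate -L 512M -n metadata cache    # metadata volume
    run lvcreate -l 100%FREE -n blocks cache  # everything else for blocks
}
```

Calling `make_cache_vg /dev/md2` just lists the four commands, so you can inspect them before rerunning with `DRY_RUN=0`.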

The next part is getting startup right. To begin with, we want an /etc/cachetab so that dmcache-load-cachetab knows how to set up the cache:

cache:/dev/cache/metadata:/dev/cache/blocks:/dev/md1:1024:1 writeback default 4 random_threshold 8 sequential_threshold 512
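To see what such a line amounts to, here is a sketch that turns it into a plain dmsetup command. The field layout is inferred from the example line, not from dmcache-load-cachetab's source. The fields are read as name, metadata device, cache (blocks) device, origin device and block size, followed by the rest of a dm-cache table line: one feature argument (writeback), the policy (default), and its four policy arguments. The block size is in 512-byte sectors, so 1024 means 512 KiB per cache block. The function names are hypothetical. The origin size is passed in so the function has no side effects.

```shell
#!/bin/sh
# Hypothetical sketch: translate one /etc/cachetab line into a dmsetup
# command. Field layout (name:metadata:blocks:origin:blocksize:rest) is
# an assumption inferred from the example; the real parser may differ.

cachetab_to_dmsetup() {
    line=$1
    origin_sectors=$2   # origin size in 512-byte sectors
    # With IFS=":", read puts everything after the fifth colon into $rest.
    IFS=: read -r name meta blocks origin bsize rest <<EOF
$line
EOF
    # dm-cache table: <start> <length> cache <metadata dev> <cache dev>
    #   <origin dev> <block size> <#features> <features> <policy>
    #   <#policy args> <policy args>
    printf 'dmsetup create %s --table "0 %s cache %s %s %s %s %s"\n' \
        "$name" "$origin_sectors" "$meta" "$blocks" "$origin" "$bsize" "$rest"
}

# Cache block size in KiB, given the size in 512-byte sectors.
block_size_kib() {
    echo $(( $1 * 512 / 1024 ))
}
```

On a real system, the origin size would come from `blockdev --getsz /dev/md1`, e.g. `cachetab_to_dmsetup "$(head -n 1 /etc/cachetab)" "$(blockdev --getsz /dev/md1)"`.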
This gives you a new /dev/mapper/cache that's basically identical to /dev/md1, except faster due to the extra cache. Then you'll have to tell LVM that it should never try to use /dev/md1 as a physical volume on its own (that would be very bad if the cache had dirty blocks!), so /etc/lvm/lvm.conf needs to contain something like:
    filter = [ "a/md2/", "r/md/", "a/.*/" ]
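LVM tries the filter patterns in order, and the first regex that matches a device path decides whether it is accepted ("a") or rejected ("r"). The sketch below is a simplified emulation of that ordering. Real LVM uses its own regex engine and has extra rules for devices with several names. That devices matching no pattern are accepted is an assumption here. With the list above, the SSD RAID md2 is accepted, every other md device is rejected, and the new /dev/mapper/cache passes through the catch-all. It also shows why a lone r/md1/ is not enough: the alias /dev/md/1 does not contain "md1" and so falls through to a/.*/. The function name is hypothetical.

```shell
#!/bin/sh
# Simplified emulation of lvm.conf filter matching (first match wins).
# Patterns must look like a/regex/ or r/regex/. Illustrative only.

lvm_filter() {
    dev=$1
    shift
    for pat in "$@"; do
        action=${pat%%/*}   # "a" or "r"
        re=${pat#?/}        # strip the leading "a/" or "r/"
        re=${re%/}          # strip the trailing "/"
        if printf '%s\n' "$dev" | grep -Eq -- "$re"; then
            echo "$action"
            return
        fi
    done
    echo a                  # assumed: no matching pattern means accept
}
```

For example, `lvm_filter /dev/md/1 'a/md2/' 'r/md/' 'a/.*/'` prints `r`, while `lvm_filter /dev/md/1 'r/md1/' 'a/.*/'` prints `a`.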
Note that my SSD RAID is on md2, so I need to make an exception for it. LVM aficionados will probably know of something more efficient here (r/md1/ didn't work for me, since there's also /dev/md/1 and possibly other names for the same device).

Then we need to get everything set up correctly during boot. This is handled by /sbin/dmcache-load-cachetab. Unfortunately, LVM is not started by udev but rather late in the boot process, so /dev/cache/metadata and /dev/cache/blocks are not available when dmcache-load-cachetab runs! I hacked around that, just before the “Devices not ready?” comment, by simply adding the LVM activation line used elsewhere in the initramfs:
/sbin/lvm vgchange -aly --ignorelockingfailure
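Activation can still race with device node creation. One hedged refinement is to poll for the cache volumes after the vgchange line, with a timeout, so a missing device fails quickly instead of hanging the boot. The helper below is hypothetical and not part of dmcache-tools. The device paths in the usage example are the ones from /etc/cachetab.

```shell
#!/bin/sh
# Hypothetical helper (not part of dmcache-tools): wait until all given
# device nodes exist, polling once per second. Returns 0 when they all
# exist, 1 if the timeout (in seconds) runs out first.

wait_for_devices() {
    timeout=$1
    shift
    while :; do
        missing=0
        for dev in "$@"; do
            [ -e "$dev" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
}
```

Used right after the LVM line, e.g. `wait_for_devices 10 /dev/cache/metadata /dev/cache/blocks || echo "dm-cache devices missing"`.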
Finally, we need to make sure the hook is installed in the first place. The hook script has a line to check whether dm-cache is needed for the root volume, but it's far too simplistic, so I simply changed /usr/share/initramfs-tools/hooks/dmcache so that should_install() always returns true:
should_install() {
        # sesse hack
        echo yes
}

After that, all you need to do is clear the first few kilobytes of the metadata volume using dd, update the initramfs, and voilà! Cache.
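As a sketch, those last two steps could look like this. It assumes the metadata volume is /dev/cache/metadata as in /etc/cachetab, and that zeroing the first 4 KiB (the metadata superblock) is enough for dm-cache to treat the metadata as fresh. Only do this before first use: wiping the metadata of a writeback cache that holds dirty blocks would lose data. The helper names are hypothetical. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical sketch of the final setup steps. Zeroes the start of the
# metadata volume (assumed 4 KiB covers the superblock) and rebuilds the
# initramfs. Only safe on a brand-new cache with no dirty blocks!
# Prints the commands by default; set DRY_RUN=0 to execute them.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

init_cache() {
    meta=${1:-/dev/cache/metadata}
    run dd if=/dev/zero of="$meta" bs=4096 count=1
    run update-initramfs -u
}
```

After a reboot, `dmsetup status cache` should show the new cache device and its counters.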

It would seem the code in the kernel is still a bit young; it has memory allocation issues and doesn't cache all that aggressively yet, but most of my writes are already going to the cache, as is an increasing share of reads, so I think this is going to be quite OK in a few revisions.
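Claims like "most writes go to the cache" can be checked against the read/write hit and miss counters that `dmsetup status` reports for the cache device. The position of those counters in the status line may differ between kernel versions. This hypothetical helper therefore takes the two numbers as arguments instead of parsing the line, and just computes an integer hit percentage.

```shell
#!/bin/sh
# Hypothetical helper: hit percentage (rounded down) from a hit count and
# a miss count, e.g. the write hits/misses from `dmsetup status cache`.

hit_percent() {
    hits=$1
    misses=$2
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0                          # no traffic yet
    else
        echo $(( 100 * hits / total ))
    fi
}
```

For instance, `hit_percent 750 250` prints 75.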

The integration with Debian could use some work, though =)

[23:24] | Whole-disk dm-cache
