Preventing Linux system freezes caused by dm-crypt

❝dm-crypt does not always handle I/O gracefully when slow storage devices are involved.❞

TL;DR: dm-crypt is designed to use (global) workqueues to process block I/O operations. The design has aged: block devices and the layers stacked on top of them have evolved past the need for these queues, so some of their effects are now mostly undesirable. The workqueues can be disabled with a configuration option, which avoids freezes caused by device-mapper operations on slow media.

This issue seems to have been recognized and investigated, yet it is still not widely known. That is unfortunate, because a fix is available and fairly simple.

The issue: Linux system freezes during data transfers

If you use encrypted partitions on Linux, you most likely rely on dm-crypt. If most of your storage hardware uses encrypted partitions, then you use dm-crypt both for your operating system and for significantly slower media, e.g. slow or external hard disks and SD cards. When you perform large data transfers onto a slow medium, you may notice that transfer speeds vary wildly and that the system freezes intermittently. Tinkerers may also notice that it is hard to tell how well different block queue schedulers perform, i.e. to notice (“feel”) any difference in behavior between them. That is because dm-crypt by default uses workqueues to manage its data processing, and even individual I/O operations are queued multiple times.

The Cloudflare blog article “Speeding up Linux disk encryption” explains all of this in far more detail.

The queues hold the pending operations for the dm-crypt devices. Typical behavior looks like this: writing to a slow medium such as an SD card happens in large bursts of 50+ MB/s during which no actual I/O reaches the device, followed by a drop to 3 or 4 MB/s while the data is actually being written. If the system drive performs write operations at the same time, those get mixed in. As a result, everything comes to a standstill when the slow SD card suddenly has all that I/O to perform, and everything else has to wait. In addition, block queue schedulers such as bfq have a reputation for responsiveness, but dm-crypt interferes with their behavior.

Options for opening encrypted devices

These problems are easily solved by disabling the workqueues, at least on the system drive. To check whether workqueues are in use:

> sudo dmsetup table
mydevice_crypt: 0 12345678 crypt aes-xts-plain64 :64:logon:cryptsetup:00000000-0000-0000-0000-000000000000-00 0 8:3 32768 1 allow_discards

The command above lists all device-mapper devices. For each crypt device, the active optional parameters appear at the end of the line, preceded by their count. If no_read_workqueue and no_write_workqueue are absent, the workqueues are in use.
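With several encrypted devices, a small filter over the dmsetup table output saves reading each line by eye. This is a minimal sketch, assuming a POSIX shell and awk; the function name check_workqueues is hypothetical. It only looks at lines whose target type (the fourth field) is crypt and treats the workqueues as enabled unless the corresponding no_*_workqueue parameter appears.

```shell
# Hypothetical helper: reads `dmsetup table` output on stdin and reports,
# for every crypt target, whether the read/write workqueues are in use.
check_workqueues() {
    awk '
        $4 == "crypt" {                          # field 4 is the target type
            name = $1
            sub(/:$/, "", name)                  # drop the trailing colon
            rd = "on"; wr = "on"                 # workqueues are the default
            for (i = 5; i <= NF; i++) {          # scan the remaining parameters
                if ($i == "no_read_workqueue")  rd = "off"
                if ($i == "no_write_workqueue") wr = "off"
            }
            printf "%s read_workqueue=%s write_workqueue=%s\n", name, rd, wr
        }
    '
}

# Usage on a live system (root privileges needed):
#   sudo dmsetup table | check_workqueues
```

Non-crypt targets (e.g. linear LVM volumes) produce no output, so the report lists only the devices this article is about.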

To disable the workqueues via /etc/crypttab, use the options no-read-workqueue and no-write-workqueue (the underlying dm-crypt flags exist from kernel 5.9 onward). When opening LUKS partitions manually with cryptsetup, pass the options --perf-no_read_workqueue and --perf-no_write_workqueue. For LUKS2 partitions, these flags can also be persisted in the header as part of an open or refresh operation:

> sudo cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue --persistent mydevice_crypt
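If several entries in /etc/crypttab should get these options, a filter can add them. This is a minimal sketch, assuming the usual crypttab layout of name, device, key file and options columns; the function name crypttab_add_nowq is hypothetical. It writes to stdout rather than editing the file in place, and every non-comment line it touches is rebuilt with single spaces between columns, so review the result before using it.

```shell
# Hypothetical helper: reads crypttab lines on stdin and writes them to
# stdout with no-read-workqueue and no-write-workqueue added to the options
# column (field 4). Options that are already present are not duplicated.
crypttab_add_nowq() {
    awk '
        /^[ \t]*(#|$)/ { print; next }          # keep comments and blank lines
        {
            if (NF < 3) $3 = "none"             # options need a key-file column
            opts = (NF >= 4) ? $4 : ""
            n = split("no-read-workqueue no-write-workqueue", want, " ")
            for (i = 1; i <= n; i++)            # append each missing option
                if (index("," opts ",", "," want[i] ",") == 0)
                    opts = (opts == "") ? want[i] : (opts "," want[i])
            $4 = opts                           # rebuilds the line with single spaces
            print
        }
    '
}

# Usage (review the output before installing it as /etc/crypttab):
#   crypttab_add_nowq < /etc/crypttab > /tmp/crypttab.new
```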

After disabling the workqueues, or opening the device with workqueues disabled, the listing should look like this:

> sudo dmsetup table
mydevice_crypt: 0 12345678 crypt aes-xts-plain64 :64:logon:cryptsetup:00000000-0000-0000-0000-000000000000-00 0 8:3 32768 3 allow_discards no_read_workqueue no_write_workqueue

If you persisted the flags in a LUKS2 header with the --persistent option, you can find them in the output of cryptsetup luksDump (here /dev/sda3 is the LUKS partition on the block device). Check the Flags line in the dump:

> sudo cryptsetup luksDump /dev/sda3
LUKS header information
[..]
Label:         	(no label)
Subsystem:     	(no subsystem)
Flags:       	no-read-workqueue no-write-workqueue 
[..]
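To check this in a script instead of by eye, the Flags line can be tested directly. This is a minimal sketch, assuming the luksDump layout shown above (a line beginning with Flags: followed by the flag names); the function name luks_has_persistent_nowq is hypothetical. It exits with status 0 only if both flags are present.

```shell
# Hypothetical helper: reads `cryptsetup luksDump` output on stdin and
# succeeds (exit status 0) only if both workqueue flags are persisted.
luks_has_persistent_nowq() {
    awk '
        $1 == "Flags:" {                        # the header flags line
            for (i = 2; i <= NF; i++) {
                if ($i == "no-read-workqueue")  rd = 1
                if ($i == "no-write-workqueue") wr = 1
            }
        }
        END { exit !(rd && wr) }                # 0 only when both flags were found
    '
}

# Usage (root privileges needed):
#   sudo cryptsetup luksDump /dev/sda3 | luks_has_persistent_nowq && echo "flags persisted"
```

A header without persisted flags shows “(no flags)” on that line, which makes the function return 1.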

I did not fully investigate the disadvantages of disabling the workqueues, because the Cloudflare article already covers them fairly thoroughly, and I vastly prefer not having unnecessary freezes. Without the workqueues, dm-crypt processes I/O operations synchronously. Block device I/O works fine when dm-crypt is not involved: the kernel has advanced significantly, and there are block queue schedulers to choose from to balance how responsive versus how performant block devices should be. I prefer these simple, straightforward mechanisms, which take effect once dm-crypt’s workqueues are disabled.