Prevent Linux system freezes because of dm-crypt
Sun, Jun 25, 2023
❝dm-crypt does not always handle I/O gracefully when slow storage devices are involved.❞
TL;DR
dm-crypt is designed to use (global) workqueues for processing block I/O operations. The design is somewhat dated, and some of its effects are now mostly undesirable, because block devices and the filesystems layered on them have evolved past the need for it. Workqueues can be disabled with a configuration option, which avoids freezes caused by device-mapper operations on slow media.
This seems to be an issue that, although recognized and investigated, remains largely unknown. This is unfortunate, because the fix is available and fairly simple.
The issue: Linux system freezes during data transfers
If you use encrypted partitions in Linux, you most likely rely on dm-crypt. If most of your storage hardware uses encrypted partitions, then you are using dm-crypt for your operating system as well as for significantly slower media, e.g. slow or external hard disks and SD cards. If you now perform a large data transfer onto a slow medium, you might notice that transfer speeds vary wildly and that there are intermittent freezes. The tinkerers among us might also notice that it is hard to tell how well different block queue schedulers perform, i.e. to notice ("feel") the difference in behavior. That is because dm-crypt by default uses workqueues to manage its data processing, and even individual I/O operations are queued multiple times.
The Cloudflare blog article "Speeding up Linux disk encryption" explains everything in far more detail.
The queues hold the operations for all of those dm-crypt devices. Some behavior that you will see: a write to a slow medium such as an SD card first runs in large bursts of 50+ MB/s while no actual I/O is performed, then drops to 3 or 4 MB/s once the data is actually written. If you use the system drive in the meantime, its write operations get mixed in. Everything then comes to a standstill when the slow SD card suddenly gets all the I/O to perform, and everything else has to wait. In addition, block queue schedulers like bfq have a reputation for being responsive, but dm-crypt interferes with their behavior.
Mounting options
These problems are easily solved by disabling workqueues, at least on the system drive. To check whether workqueues are in use:
> sudo dmsetup table
mydevice_crypt: 0 12345678 crypt aes-xts-plain64 :64:logon:cryptsetup:00000000-0000-0000-0000-000000000000-00 0 8:3 32768 1 allow_discards
The command above lists all device-mapper devices. For each crypt device, the active options appear at the end of the line. If no_read_workqueue and no_write_workqueue are absent, then workqueues are in use.
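With several mappings, scanning the output by eye gets tedious. The following is a minimal sketch of the same check, assuming the line layout shown in the listing above (mapping name first, target type in the fourth field). The function name report_workqueues is made up for this example.

```shell
#!/bin/sh
# Read `dmsetup table` output on stdin and report, for every crypt target,
# whether the read and write workqueues are still in use.
# report_workqueues is a hypothetical helper name, not part of dmsetup.
report_workqueues() {
    awk '
        $4 == "crypt" {
            name = $1
            sub(/:$/, "", name)            # strip the trailing colon from the mapping name
            read_wq = "on"; write_wq = "on"
            for (i = 5; i <= NF; i++) {    # scan the remaining fields for the two flags
                if ($i == "no_read_workqueue")  read_wq = "off"
                if ($i == "no_write_workqueue") write_wq = "off"
            }
            printf "%s read_workqueue=%s write_workqueue=%s\n", name, read_wq, write_wq
        }
    '
}

# Usage (needs root): sudo dmsetup table | report_workqueues
```

Any device reported with "on" still routes that direction of I/O through a workqueue.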
To disable workqueues in /etc/crypttab, one can use the options no-read-workqueue and no-write-workqueue (from kernel 5.9 onward). To disable workqueues for LUKS partitions opened by hand, one needs to open them with the cryptsetup options --perf-no_read_workqueue and --perf-no_write_workqueue. In case of LUKS2 partitions, these flags can be persisted as part of an open or refresh operation:
> sudo cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue --persistent mydevice_crypt
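As a sketch, a crypttab entry with both options might look like the fragment below. The mapping name and UUID are the placeholders used in the listings in this post, "none" assumes a passphrase prompt instead of a key file, and "discard" is only there to mirror the allow_discards flag from the listing. Whether the options are honored depends on the installed systemd or cryptsetup version, so check crypttab(5) on your system.

```text
# <name>        <device>                                    <key file>  <options>
mydevice_crypt  UUID=00000000-0000-0000-0000-000000000000  none        luks,discard,no-read-workqueue,no-write-workqueue
```

The entry takes effect the next time the device is opened, e.g. after a reboot.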
After disabling workqueues, or opening the device with workqueues disabled, the listing should look like this:
> sudo dmsetup table
mydevice_crypt: 0 12345678 crypt aes-xts-plain64 :64:logon:cryptsetup:00000000-0000-0000-0000-000000000000-00 0 8:3 32768 3 allow_discards no_read_workqueue no_write_workqueue
If you disabled the workqueues in LUKS2 with the --persistent option, then you can find the flags with sudo cryptsetup luksDump /dev/sda3 (where sda3 is the LUKS partition on the block device). Check the Flags line in the information dump.
LUKS header information
[..]
Label: (no label)
Subsystem: (no subsystem)
Flags: no-read-workqueue no-write-workqueue
[..]
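The same check can be scripted. This is a rough sketch that assumes the dump contains a single line starting with "Flags:" as in the excerpt above. The function name has_persistent_no_workqueue is invented for this example.

```shell
#!/bin/sh
# Read `cryptsetup luksDump` output on stdin. Exit status 0 only if the
# "Flags:" line lists both no-read-workqueue and no-write-workqueue.
# has_persistent_no_workqueue is a hypothetical helper name.
has_persistent_no_workqueue() {
    awk '
        /^Flags:/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "no-read-workqueue")  r = 1
                if ($i == "no-write-workqueue") w = 1
            }
        }
        END { exit !(r && w) }   # awk exit code 0 means both flags were found
    '
}

# Usage (needs root, /dev/sda3 as in the example above):
#   sudo cryptsetup luksDump /dev/sda3 | has_persistent_no_workqueue && echo "persisted"
```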
I did not fully investigate the disadvantages of disabling the workqueues, because the Cloudflare article is already fairly thorough. I vastly prefer not having unnecessary freezes. Without the workqueues, the I/O operations are performed synchronously. Block device I/O works fine when dm-crypt is not involved, and the kernel has advanced significantly since the workqueues were introduced. There are block queue schedulers to choose from to influence how responsive vs. performant block devices should be. I prefer the simple, straightforward mechanisms that remain when dm-crypt's workqueues are disabled.
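To see which block queue scheduler a device currently uses, the kernel exposes a sysfs file per block device in which the active scheduler is shown in square brackets. The helper below is a small sketch for extracting that name. The function name active_scheduler and the device name sda are assumptions for illustration.

```shell
#!/bin/sh
# Print the active scheduler from the content of a sysfs "scheduler" file,
# where the kernel marks the active entry with square brackets.
# active_scheduler is a hypothetical helper name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage (sda is an example device):
#   active_scheduler < /sys/block/sda/queue/scheduler
# Switching the scheduler until the next reboot, if bfq is available:
#   echo bfq | sudo tee /sys/block/sda/queue/scheduler
```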