path: root/drivers/md/dm-cache-target.c
* dm cache: support shrinking the origin device (Ming-Hung Tsai, 2025-03-06, 1 file, -3/+69)

    This patch introduces formal support for shrinking the cache origin by
    reducing the cache target length via table reloads. Cache blocks mapped
    beyond the new target length must be clean and are invalidated during
    preresume. If any dirty blocks exist in the area being removed, the
    preresume operation fails without setting the NEEDS_CHECK flag in the
    superblock, and the resume ioctl returns EFBIG. The cache device remains
    suspended until a table reload with a target length that fits the
    existing mappings is performed.

    Without this patch, reducing the cache target length could result in
    I/O errors (RHBZ: 2134334), out-of-bounds memory access to the discard
    bitset, and security concerns regarding data leakage.

    Verification steps:

    1. Create cache metadata with some cached blocks mapped to the tail of
       the origin device. Here we use cache_restore v1.0 to build metadata
       with one clean block mapped to the last origin block.

        cat <<EOF >> cmeta.xml
        <superblock uuid="" block_size="128" nr_cache_blocks="512" \
            policy="smq" hint_width="4">
          <mappings>
            <mapping cache_block="0" origin_block="4095" dirty="false"/>
          </mappings>
        </superblock>
        EOF
        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
        dmsetup remove cmeta

    2. Bring up the cache whilst shrinking the cache origin by one block:

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524160 linear /dev/sdc 262144"
        dmsetup create cache --table "0 524160 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

    3. Check the number of cached data blocks via dmsetup status; it is
       expected to be zero.

        dmsetup status cache | cut -d ' ' -f 7

    In addition to the script above, this patch can be verified using the
    "cache/resize" tests in dmtest-python:

        ./dmtest run --rx cache/resize/shrink_origin --result-set default

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
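    The preresume rule above is small enough to sketch in C. This is a
    hypothetical rendering rather than the upstream diff: for_each_mapping()
    and invalidate_mapping() are invented helper names, while is_dirty() and
    the dm_oblock_t/dm_cblock_t wrappers follow existing dm-cache-target.c
    conventions.

        /* Hypothetical sketch of the shrink check run during preresume. */
        static int truncate_mappings(struct cache *cache, dm_oblock_t new_origin_blocks)
        {
                dm_cblock_t cblock;
                dm_oblock_t oblock;

                for_each_mapping(cache, cblock, oblock) {       /* invented iterator */
                        if (from_oblock(oblock) < from_oblock(new_origin_blocks))
                                continue;       /* mapping still fits */

                        if (is_dirty(cache, cblock))
                                return -EFBIG;  /* dirty block past the new end:
                                                   stay suspended, metadata untouched */

                        invalidate_mapping(cache, cblock);      /* invented helper */
                }

                return 0;
        }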
* dm cache: prevent BUG_ON by blocking retries on failed device resumes (Ming-Hung Tsai, 2025-03-06, 1 file, -0/+24)

    A cache device failing to resume due to mapping errors should not be
    retried, as the failure leaves a partially initialized policy object.
    Repeating the resume operation risks triggering BUG_ON when reloading
    cache mappings into the incomplete policy object.

    Steps to reproduce:

    1. Create cache metadata consisting of 512 or more cache blocks, with
       some mappings stored in the first array block of the mapping array.
       Here we use cache_restore v1.0 to build the metadata.

        cat <<EOF >> cmeta.xml
        <superblock uuid="" block_size="128" nr_cache_blocks="512" \
            policy="smq" hint_width="4">
          <mappings>
            <mapping cache_block="0" origin_block="0" dirty="false"/>
          </mappings>
        </superblock>
        EOF
        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
        dmsetup remove cmeta

    2. Wipe the second array block of the mapping array to simulate data
       degradation:

        mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
            2>/dev/null | hexdump -e '1/8 "%u\n"')
        ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
            2>/dev/null | hexdump -e '1/8 "%u\n"')
        dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock

    3. Try bringing up the cache device. The resume is expected to fail
       due to the broken array block:

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
        dmsetup create cache --notable
        dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
        dmsetup resume cache

    4. Try resuming the cache again. An unexpected BUG_ON is triggered
       while loading cache mappings:

        dmsetup resume cache

       Kernel logs:

        (snip)
        ------------[ cut here ]------------
        kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
        Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
        CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
        RIP: 0010:smq_load_mapping+0x3e5/0x570

    Fix by disallowing resume operations for devices that failed the
    initial attempt.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
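    A minimal sketch of the guard, assuming an invented flag
    (loaded_mappings_failed) that records the failed first attempt; the
    upstream patch may track this state differently:

        static int cache_preresume_sketch(struct dm_target *ti)
        {
                struct cache *cache = ti->private;

                /* Never retry: the policy object is only partially populated. */
                if (cache->loaded_mappings_failed)      /* invented flag */
                        return -EINVAL;

                if (load_cache_mappings(cache)) {       /* invented helper */
                        cache->loaded_mappings_failed = true;
                        return -EINVAL;
                }

                return 0;
        }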
* Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux (Linus Torvalds, 2024-11-19, 1 file, -2/+2)

    Pull block updates from Jens Axboe:

     - NVMe updates via Keith:
         - Use uring_cmd helper (Pavel)
         - Host Memory Buffer allocation enhancements (Christoph)
         - Target persistent reservation support (Guixin)
         - Persistent reservation tracing (Guixin)
         - NVMe 2.1 specification support (Keith)
         - Rotational Meta Support (Matias, Wang, Keith)
         - Volatile cache detection enhancement (Guixin)

     - MD updates via Song:
         - Maintainers update
         - raid5 sync IO fix
         - Enhance handling of faulty and blocked devices
         - raid5-ppl atomic improvement
         - md-bitmap fix

     - Support for manually defining embedded partition tables

     - Zone append fixes and cleanups

     - Stop sending the queued requests in the plug list to the driver
       ->queue_rqs() handler in reverse order

     - Zoned write plug cleanups

     - Clean up disk stats tracking and add support for disk stats for
       passthrough IO

     - Add preparatory support for file system atomic writes

     - Add lockdep support for queue freezing; this already found a bunch
       of issues, and some fixes for that are in here. More will be
       coming.

     - Fix race between queue stopping/quiescing and IO queueing

     - ublk recovery improvements

     - Fix ublk mmap for 64k pages

     - Various fixes and cleanups

    * tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits)
      MAINTAINERS: Update git tree for mdraid subsystem
      block: make struct rq_list available for !CONFIG_BLOCK
      block/genhd: use seq_put_decimal_ull for diskstats decimal values
      block: don't reorder requests in blk_mq_add_to_batch
      block: don't reorder requests in blk_add_rq_to_plug
      block: add a rq_list type
      block: remove rq_list_move
      virtio_blk: reverse request order in virtio_queue_rqs
      nvme-pci: reverse request order in nvme_queue_rqs
      btrfs: validate queue limits
      block: export blk_validate_limits
      nvmet: add tracing of reservation commands
      nvme: parse reservation commands's action and rtype to string
      nvmet: report ns's vwc not present
      md/raid5: Increase r5conf.cache_name size
      block: remove the ioprio field from struct request
      block: remove the write_hint field from struct request
      nvme: check ns's volatile write cache not present
      nvme: add rotational support
      nvme: use command set independent id ns if available
      ...
| * block: add a bdev_limits helper (Christoph Hellwig, 2024-10-29, 1 file, -2/+2)

    Add a helper to get the queue_limits from the bdev without having to
    poke into the request_queue.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: John Garry <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
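    The helper is essentially a one-line accessor over bdev_get_queue();
    sketched from the description above (treat the exact definition as a
    reconstruction):

        static inline struct queue_limits *bdev_limits(struct block_device *bdev)
        {
                return &bdev_get_queue(bdev)->limits;
        }

    Callers can then write bdev_limits(bdev)->max_hw_discard_sectors
    instead of reaching through bdev_get_queue(bdev)->limits themselves.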
* | dm-cache: fix warnings about duplicate slab caches (Mikulas Patocka, 2024-11-11, 1 file, -5/+20)

    Commit 4c39529663b9 added a warning about duplicate cache names if
    CONFIG_DEBUG_VM is selected, and these warnings are triggered by the
    dm-cache code.

    The dm-cache code allocates a slab cache for each device. This commit
    changes it to allocate just one slab cache in the module init function.

    Signed-off-by: Mikulas Patocka <[email protected]>
    Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")
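    A sketch of the module-level allocation, assuming the per-device cache
    in question is the migration cache (names are illustrative, and the
    real init function has more error handling):

        /* One module-wide slab cache instead of one per cache device. */
        static struct kmem_cache *migration_cache;

        static int __init dm_cache_init_sketch(void)
        {
                migration_cache = KMEM_CACHE(dm_cache_migration, 0);
                if (!migration_cache)
                        return -ENOMEM;

                return dm_register_target(&cache_target);
        }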
* | dm cache: fix potential out-of-bounds access on the first resume (Ming-Hung Tsai, 2024-11-04, 1 file, -21/+16)

    Out-of-bounds access occurs if the fast device is expanded unexpectedly
    before the first-time resume of the cache table. This happens because
    expanding the fast device requires reloading the cache table for
    cache_create to allocate new in-core data structures that fit the new
    size, and the check in cache_preresume is not performed during the
    first resume, leading to the issue.

    Steps to reproduce:

    1. Prepare the component devices:

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
        dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct

    2. Load a cache table of 512 cache blocks, and deliberately expand the
       fast device before resuming the cache, making the in-core data
       structures inadequate:

        dmsetup create cache --notable
        dmsetup reload cache --table "0 524288 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
        dmsetup reload cdata --table "0 131072 linear /dev/sdc 8192"
        dmsetup resume cdata
        dmsetup resume cache

    3. Suspend the cache to write out the in-core dirty bitset and hint
       array, leading to out-of-bounds access to the dirty bitset at
       offset 0x40:

        dmsetup suspend cache

    KASAN reports:

        BUG: KASAN: vmalloc-out-of-bounds in is_dirty_callback+0x2b/0x80
        Read of size 8 at addr ffffc90000085040 by task dmsetup/90
        (...snip...)
        The buggy address belongs to the virtual mapping at
         [ffffc90000085000, ffffc90000087000) created by:
         cache_ctr+0x176a/0x35f0
        (...snip...)
        Memory state around the buggy address:
         ffffc90000084f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
         ffffc90000084f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
        >ffffc90000085000: 00 00 00 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8
                                                   ^
         ffffc90000085080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
         ffffc90000085100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8

    Fix by checking the size change on the first resume.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support")
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Acked-by: Joe Thornber <[email protected]>
* | dm cache: optimize dirty bit checking with find_next_bit when resizing (Ming-Hung Tsai, 2024-11-04, 1 file, -8/+8)

    When shrinking the fast device, dm-cache iteratively searches for a
    dirty bit among the cache blocks to be dropped, which is less
    efficient. Use find_next_bit instead, as it is twice as fast as the
    iterative approach with test_bit.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support")
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Acked-by: Joe Thornber <[email protected]>
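    An illustrative form of the faster scan (not the exact upstream
    function): find_next_bit() returns the first set bit at or after
    'begin', or a value >= 'end' when the range is clean.

        /* True if any cache block in [begin, end) is still dirty. */
        static bool range_has_dirty(const unsigned long *dirty_bitset,
                                    unsigned long begin, unsigned long end)
        {
                return find_next_bit(dirty_bitset, end, begin) < end;
        }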
* | dm cache: fix out-of-bounds access to the dirty bitset when resizing (Ming-Hung Tsai, 2024-11-04, 1 file, -1/+1)

    dm-cache checks the dirty bits of the cache blocks to be dropped when
    shrinking the fast device, but an index bug in the bitset iteration
    causes out-of-bounds access.

    Steps to reproduce:

    1. Create a cache device of 1024 cache blocks (128 bytes dirty bitset):

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 131072 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
        dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct
        dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

    2. Shrink the fast device to 512 cache blocks, triggering out-of-bounds
       access to the dirty bitset (offset 0x80):

        dmsetup suspend cache
        dmsetup reload cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup resume cdata
        dmsetup resume cache

    KASAN reports:

        BUG: KASAN: vmalloc-out-of-bounds in cache_preresume+0x269/0x7b0
        Read of size 8 at addr ffffc900000f3080 by task dmsetup/131
        (...snip...)
        The buggy address belongs to the virtual mapping at
         [ffffc900000f3000, ffffc900000f5000) created by:
         cache_ctr+0x176a/0x35f0
        (...snip...)
        Memory state around the buggy address:
         ffffc900000f2f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
         ffffc900000f3000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffffc900000f3080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                           ^
         ffffc900000f3100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
         ffffc900000f3180: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8

    Fix by making the index post-incremented.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support")
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Acked-by: Joe Thornber <[email protected]>
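    The bug class in miniature, using plain indices (the real code wraps
    them in dm_cblock_t): advancing the index before testing it skips the
    first block and reads one bit past the end of the range.

        /* Buggy: pre-increment before the test. On the final pass b equals
         * 'end' when test_bit() runs (out of bounds), and the initial bit
         * is never tested at all. */
        static bool any_dirty_buggy(const unsigned long *bitset,
                                    unsigned long b, unsigned long end)
        {
                while (b != end) {
                        b++;
                        if (test_bit(b, bitset))
                                return true;
                }
                return false;
        }

        /* Fixed: test the current bit, then post-increment. */
        static bool any_dirty_fixed(const unsigned long *bitset,
                                    unsigned long b, unsigned long end)
        {
                while (b != end) {
                        if (test_bit(b, bitset))
                                return true;
                        b++;
                }
                return false;
        }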
* | dm cache: fix flushing uninitialized delayed_work on cache_ctr error (Ming-Hung Tsai, 2024-11-04, 1 file, -9/+15)

    An unexpected WARN_ON from flush_work() may occur when cache creation
    fails, caused by destroying the uninitialized delayed_work waker in
    the error path of cache_create(). For example, the warning appears on
    a superblock checksum error.

    Steps to reproduce:

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
        dd if=/dev/urandom of=/dev/mapper/cmeta bs=4k count=1 oflag=direct
        dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

    Kernel logs:

        (snip)
        WARNING: CPU: 0 PID: 84 at kernel/workqueue.c:4178 __flush_work+0x5d4/0x890

    Fix by pulling the cancel_delayed_work_sync() out of the constructor's
    error path. This patch doesn't affect the use-after-free fix for
    concurrent dm_resume and dm_destroy (commit 6a459d8edbdb ("dm cache:
    Fix UAF in destroy()")) as cache_dtr is not changed.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Fixes: 6a459d8edbdb ("dm cache: Fix UAF in destroy()")
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Acked-by: Joe Thornber <[email protected]>
* | dm cache: correct the number of origin blocks to match the target length (Ming-Hung Tsai, 2024-11-04, 1 file, -4/+4)

    When creating a cache device, the actual size of the cache origin
    might be greater than the specified cache target length. In such a
    case, the number of origin blocks should match the cache target
    length, not the full size of the origin device, since access beyond
    the cache target is not possible. This issue occurs when reducing the
    origin device size using lvm, as lvreduce preloads the new cache table
    before resuming the cache origin, which can result in incorrect sizes
    for the discard bitset and smq hotspot blocks.

    Steps to reproduce:

    1. Create a cache device consisting of 4096 origin blocks:

        dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
        dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
        dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
        dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct
        dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

    2. Reduce the cache origin to 2048 oblocks, in lvreduce's approach:

        dmsetup reload corig --table "0 262144 linear /dev/sdc 262144"
        dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \
            /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
        dmsetup suspend cache
        dmsetup suspend corig
        dmsetup suspend cdata
        dmsetup suspend cmeta
        dmsetup resume corig
        dmsetup resume cdata
        dmsetup resume cmeta
        dmsetup resume cache

    3. Shut down the cache, and check the number of discard blocks in the
       superblock. The value is expected to be 2048, but is actually 4096:

        dmsetup remove cache corig cdata cmeta
        dd if=/dev/sdc bs=1c count=8 skip=224 2>/dev/null | hexdump -e '1/8 "%u\n"'

    Fix by correcting the origin_blocks initialization in cache_create and
    removing the now unused origin_sectors from struct cache_args
    accordingly.

    Signed-off-by: Ming-Hung Tsai <[email protected]>
    Fixes: c6b4fcbad044 ("dm: add cache target")
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Acked-by: Joe Thornber <[email protected]>
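    The arithmetic in the repro: 524288 sectors at 128 sectors per block is
    524288 / 128 = 4096 origin blocks, and the reduced 262144-sector table
    should give 262144 / 128 = 2048. A sketch of the corrected sizing in
    dm-cache-style names (the exact upstream assignment differs in detail):

        /* Size the origin by the target length, not the origin device
         * length; blocks past the target end can never be accessed. */
        static dm_oblock_t origin_blocks_for_target(struct dm_target *ti,
                                                    u32 block_size_sectors)
        {
                sector_t blocks = ti->len;      /* target length in sectors */

                sector_div(blocks, block_size_sectors); /* sectors -> blocks */
                return to_oblock(blocks);
        }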
* dm: fix spelling errors (Shen Lichuan, 2024-09-26, 1 file, -1/+1)

    Fix some confusing spelling errors recently identified in code
    comments:

        dm-cache-target.c:1371: exclussive ==> exclusive
        dm-raid.c:2522: repective ==> respective

    Signed-off-by: Shen Lichuan <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
* dm-cache: remove pointless error check (Dipendra Khadka, 2024-09-26, 1 file, -4/+0)

    Smatch reported the following:

        drivers/md/dm-cache-target.c:3204 parse_cblock_range()
        warn: sscanf doesn't return error codes
        drivers/md/dm-cache-target.c:3217 parse_cblock_range()
        warn: sscanf doesn't return error codes

    sscanf() doesn't return negative values at all, so the error checks
    can be dropped.

    Signed-off-by: Dipendra Khadka <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
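    The kernel's sscanf() (lib/vsprintf.c) returns only the number of
    matched conversions, so the meaningful check is against the expected
    count. A sketch in the style of parse_cblock_range(), with the format
    string treated as an assumption:

        static bool parse_range_sketch(const char *str,
                                       unsigned long long *b,
                                       unsigned long long *e)
        {
                char dummy;

                /* Two conversions expected for "b-e"; the trailing %c
                 * must not match, which rejects trailing garbage. */
                return sscanf(str, "%llu-%llu%c", b, e, &dummy) == 2;
        }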
* dm: stop using blk_limits_io_{min,opt} (Christoph Hellwig, 2024-07-10, 1 file, -2/+2)

    Remove use of blk_limits_io_{min,opt} and assign the values directly
    to the queue_limits structure. For io_opt this is a completely
    mechanical change; for io_min it removes flooring the limit to the
    physical and logical block size in the particular caller. But as
    blk_validate_limits will do the same later when actually applying the
    limits, there is still no change in overall behavior.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
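    For dm-cache's io_hints hook this amounts to assignments of the
    following shape (a sketch of the pattern, not the verified diff):

        static void cache_io_hints_sketch(struct dm_target *ti,
                                          struct queue_limits *limits)
        {
                struct cache *cache = ti->private;
                unsigned int block_bytes = cache->sectors_per_block << SECTOR_SHIFT;

                /* Direct assignment; blk_validate_limits() applies any
                 * flooring later, when the limits are committed. */
                limits->io_min = block_bytes;
                limits->io_opt = block_bytes;
        }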
* block: remove the discard_alignment flag (Christoph Hellwig, 2024-06-20, 1 file, -1/+0)

    queue_limits.discard_alignment is never read except in the places
    where it is stacked into another limit.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Martin K. Petersen <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm (Linus Torvalds, 2024-05-21, 1 file, -3/+2)

    Pull device mapper fixes from Mike Snitzer:

     - Fix DM discard regressions due to DM core switching over to using
       queue_limits_set() without DM core and targets first being updated
       to set (and stack) discard limits in terms of max_hw_discard_sectors
       and not max_discard_sectors

     - Fix stable@ DM integrity discard support to set the device's
       discard_granularity limit to the device's logical block size

    * tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
      dm: always manage discard support in terms of max_hw_discard_sectors
      dm-integrity: set discard_granularity to logical block size
| * dm: always manage discard support in terms of max_hw_discard_sectors (Mike Snitzer, 2024-05-20, 1 file, -3/+2)

    Commit 4f563a64732d ("block: add a max_user_discard_sectors queue
    limit") changed block core to set max_discard_sectors to:

        min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors)

    Since commit 1c0e720228ad ("dm: use queue_limits_set") it was reported
    that dm-thinp was failing in a few fstests (generic/347 and
    generic/405), with the first WARN_ON_ONCE in dm_cell_key_has_valid_range()
    being reported, e.g.:

        WARNING: CPU: 1 PID: 30 at drivers/md/dm-bio-prison-v1.c:128
                 dm_cell_key_has_valid_range+0x3d/0x50

    blk_set_stacking_limits() sets max_user_discard_sectors to UINT_MAX,
    so given how block core now sets max_discard_sectors (detailed above)
    it follows that blk_stack_limits() stacks up the underlying device's
    max_hw_discard_sectors, and max_discard_sectors is set to match it. If
    max_hw_discard_sectors exceeds dm's BIO_PRISON_MAX_RANGE, then
    dm_cell_key_has_valid_range() will trigger the warning with:

        WARN_ON_ONCE(key->block_end - key->block_begin > BIO_PRISON_MAX_RANGE)

    Aside from this warning, the discard will fail. Fix this and other DM
    issues by governing discard support in terms of max_hw_discard_sectors
    instead of max_discard_sectors.

    Reported-by: Theodore Ts'o <[email protected]>
    Fixes: 1c0e720228ad ("dm: use queue_limits_set")
    Signed-off-by: Mike Snitzer <[email protected]>
* | dm: use bio_list_merge_init (Christoph Hellwig, 2024-04-01, 1 file, -8/+4)

    Use bio_list_merge_init instead of open coding bio_list_merge and
    bio_list_init.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Mike Snitzer <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
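    The helper folds a common two-step pattern into one call:
    bio_list_merge_init(to, from) splices 'from' onto 'to' and then
    reinitializes 'from'. A sketch against dm-cache's deferred list:

        static void take_deferred_bios(struct cache *cache, struct bio_list *bios)
        {
                bio_list_init(bios);

                spin_lock_irq(&cache->lock);
                /* Previously open coded as bio_list_merge() followed by
                 * bio_list_init() on cache->deferred_bios. */
                bio_list_merge_init(bios, &cache->deferred_bios);
                spin_unlock_irq(&cache->lock);
        }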
* block: replace fmode_t with a block-specific type for block open flags (Christoph Hellwig, 2023-06-12, 1 file, -6/+6)

    The only overlap between the block open flags mapped into the fmode_t
    and other uses of fmode_t is FMODE_READ and FMODE_WRITE. Define a new
    blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
    ->ioctl, and stop abusing fmode_t.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Acked-by: Jack Wang <[email protected]> [rnbd]
    Reviewed-by: Hannes Reinecke <[email protected]>
    Reviewed-by: Christian Brauner <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* dm: push error reporting down to dm_register_target() (Yangtao Li, 2023-04-11, 1 file, -1/+0)

    Simplifies each DM target's init method by making dm_register_target()
    responsible for its error reporting (on behalf of targets).

    Signed-off-by: Yangtao Li <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* dm cache: add cond_resched() to various workqueue loops (Mike Snitzer, 2023-02-17, 1 file, -0/+4)

    Otherwise on resource constrained systems these workqueues may be too
    greedy.

    Signed-off-by: Mike Snitzer <[email protected]>
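    The amended pattern, sketched on a deferred-bio loop; submit_one() is
    a hypothetical stand-in for the per-item work, and take_deferred_bios()
    is the sketch shown under the "dm: use bio_list_merge_init" entry
    above:

        static void process_deferred_bios_sketch(struct cache *cache)
        {
                struct bio *bio;
                struct bio_list bios;

                take_deferred_bios(cache, &bios);

                while ((bio = bio_list_pop(&bios))) {
                        submit_one(cache, bio); /* hypothetical work */
                        cond_resched();         /* the added yield */
                }
        }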
* dm: declare variables static when sensible (Heinz Mauelshagen, 2023-02-14, 1 file, -1/+1)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

* dm: fix suspect indent whitespace (Heinz Mauelshagen, 2023-02-14, 1 file, -1/+1)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

* dm: add missing empty lines (Heinz Mauelshagen, 2023-02-14, 1 file, -1/+11)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

* dm: avoid spaces before function arguments or in favour of tabs (Heinz Mauelshagen, 2023-02-14, 1 file, -6/+6)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

* dm: correct block comments format. (Heinz Mauelshagen, 2023-02-14, 1 file, -21/+37)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

* dm: change "unsigned" to "unsigned int" (Heinz Mauelshagen, 2023-02-14, 1 file, -25/+25)

    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* dm: add missing SPDX-License-Indentifiers (Heinz Mauelshagen, 2023-02-14, 1 file, -0/+1)

    'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
    deprecated its use.

    Suggested-by: John Wiele <[email protected]>
    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* dm cache: set needs_check flag after aborting metadata (Mike Snitzer, 2022-12-01, 1 file, -5/+5)

    Otherwise the commit that will be aborted will be associated with the
    metadata objects that will be torn down. The needs_check flag must be
    written to metadata with a reset block manager.

    Found through code inspection (and compared against dm-thin.c).

    Cc: [email protected]
    Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag")
    Signed-off-by: Mike Snitzer <[email protected]>
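    The ordering in question, sketched with the metadata API this target
    uses (simplified; error handling in the real function is more
    involved):

        static void abort_transaction_sketch(struct cache *cache)
        {
                /* Abort first: this resets the block manager... */
                if (dm_cache_metadata_abort(cache->cmd)) {
                        DMERR("aborting transaction failed");
                        return;
                }

                /* ...so needs_check is written through fresh metadata
                 * objects, not ones about to be torn down. */
                if (dm_cache_metadata_set_needs_check(cache->cmd))
                        DMERR("failed to set 'needs_check' flag");
        }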
* dm cache: Fix UAF in destroy() (Luo Meng, 2022-11-30, 1 file, -0/+1)

    Dm_cache also has the same UAF problem when dm_resume() and
    dm_destroy() are concurrent. Therefore, cancel the timer again in
    destroy().

    Cc: [email protected]
    Fixes: c6b4fcbad044e ("dm: add cache target")
    Signed-off-by: Luo Meng <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
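    The shape of the fix (a sketch, not the verbatim diff); the waker is
    dm-cache's periodic delayed_work:

        static void destroy_sketch(struct cache *cache)
        {
                /* A concurrent dm_resume() may have re-armed the waker
                 * after postsuspend cancelled it; cancel again before
                 * freeing anything it touches. */
                cancel_delayed_work_sync(&cache->waker);

                /* ... existing teardown of policy, metadata, mempools ... */
        }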
* dm cache: fix typo in 2 comment blocks (Steven Lung, 2022-07-07, 1 file, -1/+1)

    Replace neccessarily with necessarily.

    Signed-off-by: Steven Lung <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* block: remove QUEUE_FLAG_DISCARD (Christoph Hellwig, 2022-04-18, 1 file, -8/+1)

    Just use a non-zero max_discard_sectors as an indicator for discard
    support, similar to what is done for write zeroes.

    The only place that needs special attention is the RAID5 driver, which
    must clear discard support for security reasons by default, even if
    the default stacking rules would allow for it.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Martin K. Petersen <[email protected]>
    Acked-by: Christoph Böhmwalder <[email protected]> [drbd]
    Acked-by: Jan Höppner <[email protected]> [s390]
    Acked-by: Coly Li <[email protected]> [bcache]
    Acked-by: David Sterba <[email protected]> [btrfs]
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
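    Discard support is now probed through the limit itself; a sketch using
    the bdev_max_discard_sectors() helper from the same series (treat the
    exact call site as an assumption):

        /* Non-zero max_discard_sectors now means "supports discard". */
        static bool origin_supports_discards(struct block_device *origin_bdev)
        {
                return !!bdev_max_discard_sectors(origin_bdev);
        }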
* dm cache: use dm_submit_bio_remap (Mike Snitzer, 2022-03-10, 1 file, -3/+4)

    Signed-off-by: Mike Snitzer <[email protected]>
* dm: stop using bdevname (Christoph Hellwig, 2022-03-02, 1 file, -6/+4)

    Just use the %pg format specifier instead.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
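    The conversion pattern: %pg prints the block device name directly,
    removing the bdevname() call and its on-stack buffer. The message text
    below is illustrative:

        static void report_origin_error(struct block_device *bdev)
        {
                /* Before:
                 *      char buf[BDEVNAME_SIZE];
                 *      DMERR("%s: error on origin device", bdevname(bdev, buf));
                 */
                DMERR("%pg: error on origin device", bdev);
        }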
* block: pass a block_device to bio_clone_fast (Christoph Hellwig, 2022-02-04, 1 file, -2/+2)

    Pass a block_device to bio_clone_fast and __bio_clone_fast and give
    the functions more suitable names.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Mike Snitzer <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* dm-cache: remove __remap_to_origin_clear_discard (Christoph Hellwig, 2022-02-04, 1 file, -16/+8)

    Fold __remap_to_origin_clear_discard into the two callers to prepare
    for bio cloning refactoring.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Mike Snitzer <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them (Christoph Hellwig, 2021-10-18, 1 file, -1/+1)

    Use the proper helpers to read the block device size.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Acked-by: Mike Snitzer <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
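    In dm-cache this touches the get_dev_size() helper, whose open-coded
    form read i_size_read(bdev->bd_inode) >> SECTOR_SHIFT. The converted
    helper, as a sketch of the described change:

        static sector_t get_dev_size(struct dm_dev *dev)
        {
                return bdev_nr_sectors(dev->bdev);
        }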
* dm: update target status functions to support IMA measurement (Tushar Sugandhi, 2021-08-10, 1 file, -0/+24)

    For device mapper targets to take advantage of IMA's measurement
    capabilities, the status functions for the individual targets need to
    be updated to handle the status_type_t case for the value
    STATUSTYPE_IMA.

    Update status functions for the following target types to log their
    respective attributes to be measured using IMA:

        01. cache
        02. crypt
        03. integrity
        04. linear
        05. mirror
        06. multipath
        07. raid
        08. snapshot
        09. striped
        10. verity

    For the rest of the targets, handle the STATUSTYPE_IMA case by setting
    the measurement buffer to NULL.

    For IMA to measure the data on a given system, the IMA policy on the
    system needs to be updated to include the following line, and the
    system needs to be restarted for the measurements to take effect:

        /etc/ima/ima-policy
        measure func=CRITICAL_DATA label=device-mapper template=ima-buf

    The measurements will be reflected in the IMA logs, which are located
    at:

        /sys/kernel/security/integrity/ima/ascii_runtime_measurements
        /sys/kernel/security/integrity/ima/binary_runtime_measurements

    These IMA logs can later be consumed by various attestation clients
    running on the system and sent to external services for attesting the
    system.

    The DM target data measured by the IMA subsystem can alternatively be
    queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
    DM_TABLE_STATUS_CMD.

    Signed-off-by: Tushar Sugandhi <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
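    For a target with nothing to measure, the new case is a one-liner that
    empties the result buffer; a sketch of the shape (the surrounding
    status function is illustrative):

        static void example_status(struct dm_target *ti, status_type_t type,
                                   unsigned int status_flags, char *result,
                                   unsigned int maxlen)
        {
                switch (type) {
                case STATUSTYPE_IMA:
                        *result = '\0'; /* no measurement data to report */
                        break;
                default:
                        /* STATUSTYPE_INFO / STATUSTYPE_TABLE as before */
                        break;
                }
        }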
* dm io tracker: factor out IO tracker (Mike Snitzer, 2021-06-25, 1 file, -76/+6)

    Allow other code to use dm_io_tracker.

    Signed-off-by: Mike Snitzer <[email protected]>
* dm cache: remove needless request_queue NULL pointer checks (Xu Wang, 2021-03-26, 1 file, -1/+1)

    Since commit ff9ea323816d ("block, bdi: an active gendisk always has a
    request_queue associated with it") the request_queue pointer returned
    from bdev_get_queue() shall never be NULL.

    Signed-off-by: Xu Wang <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* dm cache: simplify the return expression of load_mapping() (Zheng Yongjun, 2020-12-22, 1 file, -6/+1)

    Simplify the return expression.

    Signed-off-by: Zheng Yongjun <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
* Revert "dm cache: fix arm link errors with inline"Nick Desaulniers2020-12-011-4/+0
| | | | | | | | | | | | | This reverts commit 43aeaa29573924df76f44eda2bbd94ca36e407b5. Since commit 0bddd227f3dc ("Documentation: update for gcc 4.9 requirement") the minimum supported version of GCC is gcc-4.9. It's now safe to remove this code. Link: https://github.com/ClangBuiltLinux/linux/issues/427 Signed-off-by: Nick Desaulniers <[email protected]> Acked-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
* dm: use dm_table_get_device_name() where appropriate in targets (Mike Snitzer, 2020-09-29, 1 file, -1/+1)

    dm_table_get_device_name() avoids calling dm_table_get_md() followed
    by dm_device_name() -- saves an intermediate dm_table_get_md() call.

    Signed-off-by: Mike Snitzer <[email protected]>
* writeback: remove bdi->congested_fn (Christoph Hellwig, 2020-07-08, 1 file, -19/+0)

    Except for pktdvd, the only places setting congested bits are file
    systems that allocate their own backing_dev_info structures. And
    pktdvd is a deprecated driver that isn't useful in stacked setups
    either. So remove the dead congested_fn stacking infrastructure.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Acked-by: Song Liu <[email protected]>
    Acked-by: David Sterba <[email protected]>
    [axboe: fixup unused variables in bcache/request.c]
    Signed-off-by: Jens Axboe <[email protected]>
* block: rename generic_make_request to submit_bio_noacct (Christoph Hellwig, 2020-07-01, 1 file, -3/+3)

    generic_make_request has always been very confusingly misnamed, so
    rename it to submit_bio_noacct to make it clear that it is submit_bio
    minus accounting and a few checks.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
* dm: bump version of core and various targets (Mike Snitzer, 2020-03-03, 1 file, -1/+1)

    Changes made during the 5.6 cycle warrant bumping the version number
    for DM core and the targets modified by this commit. It should be
    noted that dm-thin, dm-crypt and dm-raid already had their target
    version bumped during the 5.6 merge window.

    Signed-off-by: Mike Snitzer <[email protected]>
* dm cache: fix a crash due to incorrect work item cancelling (Mikulas Patocka, 2020-02-27, 1 file, -2/+2)

    The crash can be reproduced by running the lvm2 testsuite test
    lvconvert-thin-external-cache.sh for several minutes, e.g.:

        while :; do make check T=shell/lvconvert-thin-external-cache.sh; done

    The crash happens in this call chain: do_waker -> policy_tick ->
    smq_tick -> end_hotspot_period -> clear_bitset -> memset -> __memset
    -- which accesses an invalid pointer in the vmalloc area.

    The work entry on the workqueue is executed even after the bitmap was
    freed. The problem is that cancel_delayed_work doesn't wait for the
    running work item to finish, so the work item can continue running and
    re-submitting itself even after cache_postsuspend. In order to make
    sure that the work item won't be running, we must use
    cancel_delayed_work_sync.

    Also, change flush_workqueue to drain_workqueue, so that if some work
    item submits itself or another work item, we are properly waiting for
    both of them.

    Fixes: c6b4fcbad044 ("dm: add cache target")
    Cc: [email protected] # v3.9
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
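    The two substitutions, sketched against the suspend path (cache->waker
    and cache->wq being the fields described above):

        static void cache_postsuspend_sketch(struct cache *cache)
        {
                /* cancel_delayed_work() can return while do_waker() is
                 * still running and about to requeue itself; the _sync
                 * variant waits for it to finish. */
                cancel_delayed_work_sync(&cache->waker);

                /* drain_workqueue() also waits out work items that
                 * resubmit themselves or queue further work, which
                 * flush_workqueue() does not. */
                drain_workqueue(cache->wq);
        }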
* dm cache: replace spin_lock_irqsave with spin_lock_irq (Mikulas Patocka, 2019-11-05, 1 file, -49/+28)

    If we are in a place where it is known that interrupts are enabled,
    the functions spin_lock_irq/spin_unlock_irq should be used instead of
    spin_lock_irqsave/spin_unlock_irqrestore.

    spin_lock_irq and spin_unlock_irq are faster because they don't need
    to push and pop the flags register.

    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
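    The shape of the conversion wherever the call site is known to run
    with interrupts enabled (the queued-bio operation is a stand-in):

        static void defer_bio_sketch(struct cache *cache, struct bio *bio)
        {
                /* Before:
                 *      unsigned long flags;
                 *      spin_lock_irqsave(&cache->lock, flags);
                 *      ...
                 *      spin_unlock_irqrestore(&cache->lock, flags);
                 */
                spin_lock_irq(&cache->lock);
                bio_list_add(&cache->deferred_bios, bio);
                spin_unlock_irq(&cache->lock);
        }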
* dm cache: fix bugs when a GFP_NOWAIT allocation fails (Mikulas Patocka, 2019-10-17, 1 file, -26/+2)

    A GFP_NOWAIT allocation can fail at any time: it doesn't wait for
    memory to become available, and it fails if the mempool is exhausted
    and there is not enough memory.

    If we go down this path:

        map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)

    we can see that map_bio() doesn't check the return value of mg_start(),
    and the bio is leaked.

    If we go down this path:

        map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
        dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
        mg_lock_writes -> mg_complete

    the bio is ended with an error, which is unacceptable because it could
    cause filesystem corruption if the machine ran out of memory
    temporarily.

    Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
    wait until memory becomes available. mempool_alloc with GFP_NOIO can't
    fail, so remove the code paths that deal with allocation failure.

    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
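    After the change the allocator sleeps instead of failing; a sketch
    close to the shape of alloc_migration() (details simplified):

        static struct dm_cache_migration *alloc_migration(struct cache *cache)
        {
                struct dm_cache_migration *mg;

                /* GFP_NOIO waits for memory: mempool_alloc() cannot
                 * return NULL here, so callers need no failure paths. */
                mg = mempool_alloc(&cache->migration_pool, GFP_NOIO);
                memset(mg, 0, sizeof(*mg));
                mg->cache = cache;

                return mg;
        }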
* dm cache: add support for discard passdown to the origin device (Mike Snitzer, 2019-03-05, 1 file, -26/+100)

    DM cache now defaults to passing discards down to the origin device.
    The user may disable this using the "no_discard_passdown" feature when
    creating the cache device.

    If the cache's underlying origin device doesn't support discards then
    passdown is disabled (with a warning). Similarly, if the underlying
    origin device's max_discard_sectors is less than a cache block,
    discard passdown will be disabled (this is required because sizing of
    the cache-internal discard bitset depends on it).

    Signed-off-by: Mike Snitzer <[email protected]>
* dm: eliminate 'split_discard_bios' flag from DM target interface (Mike Snitzer, 2019-02-21, 1 file, -1/+0)

    There is no need to have DM core split discards on behalf of a DM
    target now that blk_queue_split() handles splitting discards based on
    the queue_limits. A DM target just needs to set max_discard_sectors,
    discard_granularity, etc, in queue_limits.

    Signed-off-by: Mike Snitzer <[email protected]>