aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
Commit message (Collapse)AuthorAgeFilesLines
...
| * | fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READMax Kellermann2025-05-211-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag was added by commit 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers") but its only user was removed by commit 86b374d061ee ("netfs: Remove fs/netfs/io.c"). Signed-off-by: Max Kellermann <[email protected]> Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/[email protected] cc: Paulo Alcantara <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
* | | Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of ↵Linus Torvalds2025-06-011-10/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...
| * | | relay: remove unused relay_late_setup_filesDr. David Alan Gilbert2025-05-121-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last use of relay_late_setup_files() was removed in 2018 by commit 2b47733045aa ("drm/i915/guc: Merge log relay file and channel creation") Remove it and the helper it used. relay_late_setup_files() was used for eventually registering 'buffer only' channels. With it gone, delete the docs that explain how to do that. Which suggests it should be possible to lose the 'has_base_filename' flags. (Are there any other uses??) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dr. David Alan Gilbert <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Cc: Al Viro <[email protected]> Cc: Andriy Shevchenko <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
* | | | Merge tag 'pull-automount' of ↵Linus Torvalds2025-05-302-3/+7
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull automount updates from Al Viro: "Automount wart removal A bunch of odd boilerplate gone from instances - the reason for those was the need to protect the yet-to-be-attched mount from mark_mounts_for_expiry() deciding to take it out. But that's easy to detect and take care of in mark_mounts_for_expiry() itself; no need to have every instance simulate mount being busy by grabbing an extra reference to it, with finish_automount() undoing that once it attaches that mount. Should've done it that way from the very beginning... This is a flagday change, thankfully there are very few instances. vfs_submount() is gone - its sole remaining user (trace_automount) had been switched to saner primitives" * tag 'pull-automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill vfs_submount() saner calling conventions for ->d_automount()
| * | | | saner calling conventions for ->d_automount()Al Viro2025-05-052-3/+8
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the calling conventions for ->d_automount() instances have an odd wart - returned new mount to be attached is expected to have refcount 2. That kludge is intended to make sure that mark_mounts_for_expiry() called before we get around to attaching that new mount to the tree won't decide to take it out. finish_automount() drops the extra reference after it's done with attaching mount to the tree - or drops the reference twice in case of error. ->d_automount() instances have rather counterintuitive boilerplate in them. There's a much simpler approach: have mark_mounts_for_expiry() skip the mounts that are yet to be mounted. And to hell with grabbing/dropping those extra references. Makes for simpler correctness analysis, at that... Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> Acked-by: David Howells <[email protected]> Tested-by: David Howells <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Al Viro <[email protected]>
* | | | Merge tag 'f2fs-for-6.16-rc1' of ↵Linus Torvalds2025-05-301-25/+27
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, Matthew converted most of page operations to using folio. Beyond the work, we've applied some performance tunings such as GC and linear lookup, in addition to enhancing fault injection and sanity checks. Enhancements: - large number of folio conversions - add a control to turn on/off the linear lookup for performance - tune GC logics for zoned block device - improve fault injection and sanity checks Bug fixes: - handle error cases of memory donation - fix to correct check conditions in f2fs_cross_rename - fix to skip f2fs_balance_fs() if checkpoint is disabled - don't over-report free space or inodes in statvfs - prevent the current section from being selected as a victim during GC - fix to calculate first_zoned_segno correctly - fix to avoid inconsistence between SIT and SSA for zoned block device As usual, there are several debugging patches and clean-ups as well" * tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (195 commits) f2fs: fix to correct check conditions in f2fs_cross_rename f2fs: use d_inode(dentry) cleanup dentry->d_inode f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabled f2fs: clean up to check bi_status w/ BLK_STS_OK f2fs: introduce is_{meta,node}_folio f2fs: add ckpt_valid_blocks to the section entry f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode. f2fs: introduce FAULT_VMALLOC f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctx f2fs: add f2fs_bug_on() in f2fs_quota_read() f2fs: add f2fs_bug_on() to detect potential bug f2fs: remove unused sbi argument from checksum functions f2fs: fix 32-bits hexademical number in fault injection doc f2fs: don't over-report free space or inodes in statvfs f2fs: return bool from __write_node_folio f2fs: simplify return value handling in f2fs_fsync_node_pages f2fs: always unlock the page in f2fs_write_single_data_page f2fs: remove wbc->for_reclaim handling f2fs: return bool from __f2fs_write_meta_folio f2fs: fix to return correct error number in f2fs_sync_node_pages() ...
| * | | | f2fs: introduce FAULT_VMALLOCChao Yu2025-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new fault type FAULT_VMALLOC to simulate no memory error in f2fs_vmalloc(). Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
| * | | | f2fs: fix 32-bits hexademical number in fault injection docChao Yu2025-05-131-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | FAULT_KMALLOC 0x000000001 There is one redundant '0' in 32-bits hexademical number of fault type, remove it. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
| * | | | f2fs: support FAULT_TIMEOUTChao Yu2025-05-061-0/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support to inject a timeout fault into function, currently it only support to inject timeout to commit_atomic_write flow to reproduce inconsistent bug, like the bug fixed by commit f098aeba04c9 ("f2fs: fix to avoid atomicity corruption of atomic file"). By default, the new type fault will inject 1000ms timeout, and the timeout process can be interrupted by SIGKILL. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
* | | | Merge tag 'driver-core-6.16-rc1' of ↵Linus Torvalds2025-05-291-13/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Greg KH: "Here are the driver core / kernfs changes for 6.16-rc1. Not a huge number of changes this development cycle, here's the summary of what is included in here: - kernfs locking tweaks, pushing some global locks down into a per-fs image lock - rust driver core and pci device bindings added for new features. - sysfs const work for bin_attributes. The final churn of switching away from and removing the transitional struct members, "read_new", "write_new" and "bin_attrs_new" will come after the merge window to avoid unnecesary merge conflicts. - auxbus device creation helpers added - fauxbus fix for creating sysfs files after the probe completed properly - other tiny updates for driver core things. All of these have been in linux-next for over a week with no reported issues" * tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: kernfs: Relax constraint in draining guard Documentation: embargoed-hardware-issues.rst: Remove myself drivers: hv: fix up const issue with vmbus_chan_bin_attrs firmware_loader: use SHA-256 library API instead of crypto_shash API docs: debugfs: do not recommend debugfs_remove_recursive PM: wakeup: Do not expose 4 device wakeup source APIs kernfs: switch global kernfs_rename_lock to per-fs lock kernfs: switch global kernfs_idr_lock to per-fs lock driver core: auxiliary bus: Fix IS_ERR() vs NULL mixup in __devm_auxiliary_device_create() sysfs: constify attribute_group::bin_attrs sysfs: constify bin_attribute argument of bin_attribute::read/write() software node: Correct a OOB check in software_node_get_reference_args() devres: simplify devm_kstrdup() using devm_kmemdup() platform: replace magic number with macro PLATFORM_DEVID_NONE component: do not try to unbind unbound components driver core: auxiliary bus: add device creation helpers driver core: faux: Add sysfs groups after probing
| * | | | docs: debugfs: do not recommend debugfs_remove_recursiveTimur Tabi2025-04-301-13/+6
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the debugfs documentation to indicate that debugfs_remove() should be used to clean up debugfs entries. In commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems"), function debugfs_remove_recursive() was made into an alias for debugfs_remove(): #define debugfs_remove_recursive debugfs_remove Therefore, drivers should just use debugfs_remove() going forward. Signed-off-by: Timur Tabi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
* | | | Merge tag 'ext4_for_linus-6.16-rc1' of ↵Linus Torvalds2025-05-282-0/+226
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New ext4 features and performance improvements: - Fast commit performance improvements - Multi-fsblock atomic write support for bigalloc file systems - Large folio support for regular files This last can result in really stupendous performance for the right workloads. For example, see [1] where the Kernel Test Robot reported over 37% improvement on a large sequential I/O workload. There are also the usual bug fixes and cleanups. Of note are cleanups of the extent status tree to fix potential races that could result in the extent status tree getting corrupted under heavy simultaneous allocation and deallocation to a single file" Link: https://lore.kernel.org/all/[email protected]/ [1] * tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits) ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead ext4: Simplify flags in ext4_map_query_blocks() ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER ext4: Simplify last in leaf check in ext4_map_query_blocks ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE ext4: only dirty folios when data journaling regular files ext4: Add atomic block write documentation ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc ext4: Add multi-fsblock atomic write support with bigalloc ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS ext4: Make ext4_meta_trans_blocks() non-static for later use ext4: Check if inode uses extents in ext4_inode_can_atomic_write() ext4: Document an edge case for overwrites jbd2: remove journal_t argument from jbd2_superblock_csum() jbd2: remove journal_t argument from jbd2_chksum() ext4: remove sb argument from ext4_superblock_csum() ext4: remove sbi argument from ext4_chksum() ext4: enable large folio for regular file ext4: make online defragmentation support large folios ext4: make the writeback path support large folios ...
| * | | | ext4: Add atomic block write documentationRitesh Harjani (IBM)2025-05-202-0/+226
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an initial documentation around atomic writes support in ext4. Acked-by: Darrick J. Wong <[email protected]> Signed-off-by: Ritesh Harjani (IBM) <[email protected]> Link: https://patch.msgid.link/d3893b9f5ad70317abae72046e81e4c180af91bf.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <[email protected]>
* | | | Merge tag 'docs-6.16' of git://git.lwn.net/linuxLinus Torvalds2025-05-271-13/+13
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...
| * | | | Docs: relay: editing cleanupsRandy Dunlap2025-05-191-13/+13
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup some punctuation, capital letter, and a missing word in relay.rst. Signed-off-by: Randy Dunlap <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Cc: Andrew Morton <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Tom Zanussi <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]> Message-ID: <[email protected]>
* | | | Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds2025-05-272-0/+1524
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
| * | | | x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrlJames Morse2025-05-162-0/+1524
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Tested-by: Fenghua Yu <[email protected]> Tested-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/[email protected]
* | | | Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds2025-05-261-37/+148
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull fscrypt update from Eric Biggers: "Add support for 'hardware-wrapped inline encryption keys' to fscrypt. When enabled on supported platforms, this feature protects file contents keys from certain attacks, such as cold boot attacks. This feature uses the block layer support for wrapped keys which was merged in 6.15. Wrapped key support has existed out-of-tree in Android for a long time, and it's finally ready for upstream now that there is a platform on which it works end-to-end with upstream. Specifically, it works on the Qualcomm SM8650 HDK, using the Qualcomm ICE (Inline Crypto Engine) and HWKM (Hardware Key Manager). The corresponding driver support is included in the SCSI tree for 6.16. Validation for this feature includes two new tests that were already merged into xfstests (generic/368 and generic/369)" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: add support for hardware-wrapped keys
| * | | | fscrypt: add support for hardware-wrapped keysEric Biggers2025-04-091-37/+148
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for hardware-wrapped keys to fscrypt. Such keys are protected from certain attacks, such as cold boot attacks. For more information, see the "Hardware-wrapped keys" section of Documentation/block/inline-encryption.rst. To support hardware-wrapped keys in fscrypt, we allow the fscrypt master keys to be hardware-wrapped. File contents encryption is done by passing the wrapped key to the inline encryption hardware via blk-crypto. Other fscrypt operations such as filenames encryption continue to be done by the kernel, using the "software secret" which the hardware derives. For more information, see the documentation which this patch adds to Documentation/filesystems/fscrypt.rst. Note that this feature doesn't require any filesystem-specific changes. However it does depend on inline encryption support, and thus currently it is only applicable to ext4 and f2fs. The version of this feature introduced by this patch is mostly equivalent to the version that has existed downstream in the Android Common Kernels since 2020. However, a couple fixes are included. First, the flags field in struct fscrypt_add_key_arg is now placed in the proper location. Second, key identifiers for HW-wrapped keys are now derived using a distinct HKDF context byte; this fixes a bug where a raw key could have the same identifier as a HW-wrapped key. Note that as a result of these fixes, the version of this feature introduced by this patch is not UAPI or on-disk format compatible with the version in the Android Common Kernels, though the divergence is limited to just those specific fixes. This version should be used going forwards. This patch has been heavily rewritten from the original version by Gaurav Kashyap <[email protected]> and Barani Muthukumaran <[email protected]>. Tested-by: Bartosz Golaszewski <[email protected]> # sm8650 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
* | | | Merge tag 'erofs-for-6.16-rc1' of ↵Linus Torvalds2025-05-261-0/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, Intel QAT hardware accelerators are supported to improve DEFLATE decompression performance. I've tested it with the enwik9 dataset of 1 MiB pclusters on our Intel Sapphire Rapids bare-metal server and a PL0 ESSD, and the sequential read performance even surpasses LZ4 software decompression on this setup. In addition, a `fsoffset` mount option is introduced for file-backed mounts to specify the filesystem offset in order to adapt customized container formats. And other improvements and minor cleanups. Summary: - Add a `fsoffset` mount option to specify the filesystem offset - Support Intel QAT accelerators to boost up the DEFLATE algorithm - Initialize per-CPU workers and CPU hotplug hooks lazily to avoid unnecessary overhead when EROFS is not mounted - Fix file handle encoding for 64-bit NIDs - Minor cleanups" * tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support DEFLATE decompression by using Intel QAT erofs: clean up erofs_{init,exit}_sysfs() erofs: add 'fsoffset' mount option to specify filesystem offset erofs: lazily initialize per-CPU workers and CPU hotplug hooks erofs: refine readahead tracepoint erofs: avoid using multiple devices with different type erofs: fix file handle encoding for 64-bit NIDs
| * | | | erofs: add 'fsoffset' mount option to specify filesystem offsetSheng Yong2025-05-221-0/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When attempting to use an archive file, such as APEX on android, as a file-backed mount source, it fails because EROFS image within the archive file does not start at offset 0. As a result, a loop or a dm device is still needed to attach the image file at an appropriate offset first. Similarly, if an EROFS image within a block device does not start at offset 0, it cannot be mounted directly either. To address this issue, this patch adds a new mount option `fsoffset=x' to accept a start offset for the primary device. The offset should be aligned to the block size. EROFS will add this offset before performing read requests. Signed-off-by: Sheng Yong <[email protected]> Signed-off-by: Wang Shuai <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Gao Xiang: minor update on documentation and the error message. ] Reviewed-by: Hongbo Li <[email protected]> Signed-off-by: Gao Xiang <[email protected]>
* | | | Merge tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefsLinus Torvalds2025-05-263-0/+103
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bcachefs updates from Kent Overstreet: - Poisoned extents can now be moved: this lets us handle bitrotted data without deleting it. For now, reading from poisoned extents only returns -EIO: in the future we'll have an API for specifying "read this data even if there were bitflips". - Incompatible features may now be enabled at runtime, via "opts/version_upgrade" in sysfs. Toggle it to incompatible, and then toggle it back - option changes via the sysfs interface are persistent. - Various changes to support deployable disk images: - RO mounts now use less memory - Images may be stripped of alloc info, particularly useful for slimming them down if they will primarily be mounted RO. Alloc info will be automatically regenerated on first RW mount, and this is quite fast - Filesystem images generated with 'bcachefs image' will be automatically resized the first time they're mounted on a larger device The images 'bcachefs image' generates with compression enabled have been comparable in size to those generated by squashfs and erofs - but you get a full RW capable filesystem - Major error message improvements for btree node reads, data reads, and elsewhere. We now build up a single error message that lists all the errors encountered, actions taken to repair, and success/failure of the IO. This extends to other error paths that may kick off other actions, e.g. scheduling recovery passes: actions we took because of an error are included in that error message, with grouping/indentation so we can see what caused what. - New option, 'rebalance_on_ac_only'. Does exactly what the name suggests, quite handy with background compression. - Repair/self healing: - We can now kick off recovery passes and run them in the background if we detect errors. Currently, this is just used by code that walks backpointers. We now also check for missing backpointers at runtime and run check_extents_to_backpointers if required. The messy 6.14 upgrade left missing backpointers for some users, and this will correct that automatically instead of requiring a manual fsck - some users noticed this as copygc spinning and not making progress. In the future, as more recovery passes come online, we'll be able to repair and recover from nearly anything - except for unreadable btree nodes, and that's why you're using replication, of course - without shutting down the filesystem. - There's a new recovery pass, for checking the rebalance_work btree, which tracks extents that rebalance will process later. - Hardening: - Close the last known hole in btree iterator/btree locking assertions: path->should_be_locked paths must stay locked until the end of the transaction. This shook out a few bugs, including a performance issue that was causing unnecessary path_upgrade transaction restarts. - Performance: - Faster snapshot deletion: this is an incompatible feature, as it requires new sentinal values, for safety. Snapshot deletion no longer has to do a full metadata scan, it now just scans the inodes btree: if an extent/dirent/xattr is present for a given snapshot ID, we already require that an inode be present with that same snapshot ID. If/when users hit scalability limits again (ridiculously huge filesystems with lots of inodes, and many sparse snapshots), let me know - the next step will be to add an index from snapshot ID -> inode number, which won't be too hard. - Faster device removal: the "scan for pointers to this device" no longer does a full metadata scan, instead it walks backpointers. Like fast snapshot deletion this is another incompat feature: it also requires a new sentinal value, because we don't want to reuse these device IDs until after a fsck. - We're now coalescing redundant accounting updates prior to transaction commit, taking some pressure off the journal. Shortly we'll also be doing multiple extent updates in a transaction in the main write path, which combined with the previous should drastically cut down on the amount of metadata updates we have to journal. - Stack usage improvements: All allocator state has been moved off the stack - Debug improvements: - enumerated refcounts: The debug code previously used for filesystem write refs is now a small library, and used for other heavily used refcounts. Different users of a refcount are enumerated, making it much easier to debug refcount issues. - Async object debugging: There's a new kconfig option that makes various async objects (different types of bios, data updates, write ops, etc.) visible in debugfs, and it should be fast enough to leave on in production. - Various sets of assertions no longer require CONFIG_BCACHEFS_DEBUG, instead they're controlled by module parameters and static keys, meaning users won't need to compile custom kernels as often to help debug issues. - bch2_trans_kmalloc() calls can be tracked (there's a new kconfig option). With it on you can check the btree_transaction_stats in debugfs to see the bch2_trans_kmalloc() calls a transaction did when it used the most memory. * tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefs: (218 commits) bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGE bcachefs: Fix btree_iter_next_node() for new locking asserts bcachefs: Ensure we don't use a blacklisted journal seq bcachefs: Small check_fix_ptr fixes bcachefs: Fix opts.recovery_pass_last bcachefs: Fix allocate -> self healing path bcachefs: Fix endianness in casefold check/repair bcachefs: Path must be locked if trans->locked && should_be_locked bcachefs: Simplify bch2_path_put() bcachefs: Plumb btree_trans for more locking asserts bcachefs: Clear trans->locked before unlock bcachefs: Clear should_be_locked before unlock in key_cache_drop() bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked bcachefs: Give out new path if upgrade fails bcachefs: Fix btree_path_get_locks when not doing trans restart bcachefs: btree_node_locked_type_nowrite() bcachefs: Kill bch2_path_put_nokeep() bcachefs: bch2_journal_write_checksum() bcachefs: Reduce stack usage in data_update_index_update() bcachefs: bch2_trans_log_str() ...
| * | | | docs: bcachefs: add casefolding referenceKent Overstreet2025-05-221-0/+18
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <[email protected]>
| * | | | docs: bcachefs: idle work scheduling design docKent Overstreet2025-05-222-0/+85
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | People have been asking to see the plan for this, so - bcachefs has various background tasks that need to be scheduled to balance efficiency, predictability of performance, etc. The design and philosophy hasn't changed too much since bcache, which was primarily designed for server usage, with sustained load in mind. These days we're seeing more desktop usage - where we really want to let the system idle effictively, to reduce total power usage - while also still balancing previous concerns, we still want to let work accumulate to a degree. This lays out all the requirements and starts to sketch out the algorithm I have in mind. Signed-off-by: Kent Overstreet <[email protected]>
* | | | Merge tag 'vfs-6.16-rc1.iomap' of ↵Linus Torvalds2025-05-261-2/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: - More fallout and preparatory work associated with the folio batch prototype posted a while back. Mainly this just cleans up some of the helpers and pushes some pos/len trimming further down in the write begin path. - Add missing flag descriptions to the iomap documentation * tag 'vfs-6.16-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: rework iomap_write_begin() to return folio offset and length iomap: push non-large folio check into get folio path iomap: helper to trim pos/bytes to within folio iomap: drop pos param from __iomap_[get|put]_folio() iomap: drop unnecessary pos param from iomap_write_[begin|end] iomap: resample iter->pos after iomap_write_begin() calls iomap: trace: Add missing flags to [IOMAP_|IOMAP_F_]FLAGS_STRINGS Documentation: iomap: Add missing flags description
| * | | | Documentation: iomap: Add missing flags descriptionRitesh Harjani (IBM)2025-04-151-2/+14
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's document the use of these flags in iomap design doc where other flags are defined too - - IOMAP_F_BOUNDARY was added by XFS to prevent merging of I/O and I/O completions across RTG boundaries. - IOMAP_F_ATOMIC_BIO was added for supporting atomic I/O operations for filesystems to inform the iomap that it needs HW-offload based mechanism for torn-write protection. While we are at it, let's also fix the description of IOMAP_F_PRIVATE flag after a recent: commit 923936efeb74b3 ("iomap: Fix conflicting values of iomap flags") Signed-off-by: "Ritesh Harjani (IBM)" <[email protected]> Link: https://lore.kernel.org/8d8534a704c4f162f347a84830710db32a927b2e.1744432270.git.ritesh.list@gmail.com Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
* | | | Merge tag 'vfs-6.16-rc1.misc' of ↵Linus Torvalds2025-05-262-290/+736
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Use folios for symlinks in the page cache FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page() - Try and make a few filesystem operations killable on the VFS inode->i_mutex level - Add sysctl vfs_cache_pressure_denom for bulk file operations Some workloads need to preserve more dentries than we currently allow through out sysctl interface A HDFS servers with 12 HDDs per server, on a HDFS datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation - Avoid some jumps in inode_permission() using likely()/unlikely() - Avid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() - Add fastpath predicts for stat() and fdput() - Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode - Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock - Add proper tests for anonymous inode behavior - Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead() Cleanups: - Port pidfs to the new anon_inode_{g,s}etattr() helpers - Try to remove the uselib() system call - Add unlikely branch hint return path for poll - Add unlikely branch hint on return path for core_sys_select - Don't allow signals to interrupt getdents copying for fuse - Provide a size hint to dir_context for during readdir() - Use writeback_iter directly in mpage_writepages - Update compression and mtime descriptions in initramfs documentation - Update main netfs API document - Remove useless plus one in super_cache_scan() - Remove unnecessary NULL-check guards during setns() - Add separate separate {get,put}_cgroup_ns no-op cases Fixes: - Fix typo in root= kernel parameter description - Use KERN_INFO for infof()|info_plog()|infofc() - Correct comments of fs_validate_description() - Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() - Delete macro fsparam_u32hex() - Remove unused and problematic validate_constant_table() - Fix potential unsigned integer underflow in fs_name() - Make file-nr output the total allocated file handles" * tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits) fs: Pass a folio to page_put_link() nfs: Use a folio in nfs_get_link() fs: Convert __page_get_link() to use a folio fs/read_write: make default_llseek() killable fs/open: make do_truncate() killable fs/open: make chmod_common() and chown_common() killable include/linux/fs.h: add inode_lock_killable() readdir: supply dir_context.count as readdir buffer size hint vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations fuse: don't allow signals to interrupt getdents copying Documentation: fix typo in root= kernel parameter description include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards fs: use writeback_iter directly in mpage_writepages fs: remove useless plus one in super_cache_scan() fs: add S_ANON_INODE fs: remove uselib() system call device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs/fs_parse: Remove unused and problematic validate_constant_table() fs: touch up predicts in inode_permission() ...
| * | | | fs/fs_parse: Remove unused and problematic validate_constant_table()Zijun Hu2025-04-211-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove validate_constant_table() since: - It has no caller. - It has below 3 bugs for good constant table array array[] which must end with a empty entry, and take below invocation for explaination: validate_constant_table(array, ARRAY_SIZE(array), ...) - Always return wrong value due to the last empty entry. - Imprecise error message for missorted case. - Potential NULL pointer dereference since the last pr_err() may use @tbl[i].name NULL pointer to print the last empty entry's name. Signed-off-by: Zijun Hu <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
| * | | | fs/fs_parse: Delete macro fsparam_u32hex()Zijun Hu2025-04-211-1/+0
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete macro fsparam_u32hex() since: - it has no caller. - it uses as type @fs_param_is_u32_hex which is never defined, so will cause compile error when caller uses it. Signed-off-by: Zijun Hu <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
| * | | netfs: Update main API documentDavid Howells2025-04-111-274/+736
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bring the netfs documentation up to date. Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/[email protected] Reviewed-by: "Paulo Alcantara (Red Hat)" <[email protected]> cc: Jeff Layton <[email protected]> cc: Viacheslav Dubeyko <[email protected]> cc: Alex Markuze <[email protected]> cc: Timothy Day <[email protected]> cc: Jonathan Corbet <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
* | | Merge tag 'vfs-6.16-rc1.writepage' of ↵Linus Torvalds2025-05-263-83/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull final writepage conversion from Christian Brauner: "This converts vboxfs from ->writepage() to ->writepages(). This was the last user of the ->writepage() method. So remove ->writepage() completely and all references to it" * tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Remove aops->writepage mm: Remove swap_writepage() and shmem_writepage() ttm: Call shmem_writeout() from ttm_backup_backup_page() i915: Use writeback_iter() shmem: Add shmem_writeout() writeback: Remove writeback_use_writepage() migrate: Remove call to ->writepage vboxsf: Convert to writepages 9p: Add a migrate_folio method
| * | | fs: Remove aops->writepageMatthew Wilcox (Oracle)2025-04-073-83/+12
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | All callers and implementations are now removed, so remove the operation and update the documentation to match. Signed-off-by: "Matthew Wilcox (Oracle)" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
* | | Merge tag 'vfs-6.16-rc1.async.dir' of ↵Linus Torvalds2025-05-261-0/+40
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one*() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
| * | Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFSNeilBrown2025-04-081-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The former does some validation of the name where the latter doesn't. Outside of the VFS that validation is likely valuable, and having only one exported function for this task is certainly a good idea. So make d_hash_and_lookup() local to VFS files and change all other callers to try_lookup_noperm(). Note that the arguments are swapped. Signed-off-by: NeilBrown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
| * | VFS: rename lookup_one_len family to lookup_noperm and remove permission checkNeilBrown2025-04-081-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lookup_one_len family of functions is (now) only used internally by a filesystem on itself either - in a context where permission checking is irrelevant such as by a virtual filesystem populating itself, or xfs accessing its ORPHANAGE or dquota accessing the quota file; or - in a context where a permission check (MAY_EXEC on the parent) has just been performed such as a network filesystem finding in "silly-rename" file in the same directory. This is also the context after the _parentat() functions where currently lookup_one_qstr_excl() is used. So the permission check is pointless. The name "one_len" is unhelpful in understanding the purpose of these functions and should be changed. Most of the callers pass the len as "strlen()" so using a qstr and QSTR() can simplify the code. This patch renames these functions (include lookup_positive_unlocked() which is part of the family despite the name) to have a name based on "lookup_noperm". They are changed to receive a 'struct qstr' instead of separate name and len. In a few cases the use of QSTR() results in a new call to strlen(). try_lookup_noperm() takes a pointer to a qstr instead of the whole qstr. This is consistent with d_hash_and_lookup() (which is nearly identical) and useful for lookup_noperm_unlocked(). The new lookup_noperm_common() doesn't take a qstr yet. That will be tidied up in a subsequent patch. Signed-off-by: NeilBrown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
| * | VFS: improve interface for lookup_one functionsNeilBrown2025-04-071-0/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The family of functions: lookup_one() lookup_one_unlocked() lookup_one_positive_unlocked() appear designed to be used by external clients of the filesystem rather than by filesystems acting on themselves as the lookup_one_len family are used. They are used by: btrfs/ioctl - which is a user-space interface rather than an internal activity exportfs - i.e. from nfsd or the open_by_handle_at interface overlayfs - at access the underlying filesystems smb/server - for file service They should be used by nfsd (more than just the exportfs path) and cachefs but aren't. It would help if the documentation didn't claim they should "not be called by generic code". Also the path component name is passed as "name" and "len" which are (confusingly?) separate by the "base". In some cases the len in simply "strlen" and so passing a qstr using QSTR() would make the calling clearer. Other callers do pass separate name and len which are stored in a struct. Sometimes these are already stored in a qstr, other times it easily could be. So this patch changes these three functions to receive a 'struct qstr *', and improves the documentation. QSTR_LEN() is added to make it easy to pass a QSTR containing a known len. [[email protected]: take a struct qstr pointer] Signed-off-by: NeilBrown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
* | Merge tag 'ext4_for_linus-6.15-rc2' of ↵Linus Torvalds2025-04-131-6/+14
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A few more miscellaneous ext4 bug fixes and cleanups including some syzbot failures and fixing a stale file handing refeencing an inode previously used as a regular file, but which has been deleted and reused as an ea_inode would result in ext4 erroneously considering this a case of fs corruption" * tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix off-by-one error in do_split ext4: make block validity check resistent to sb bh corruption ext4: avoid -Wflex-array-member-not-at-end warning Documentation: ext4: Add fields to ext4_super_block documentation ext4: don't treat fhandle lookup of ea_inode as FS corruption
| * Documentation: ext4: Add fields to ext4_super_block documentationTom Vierjahn2025-04-131-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation and implementation of the ext4 super block have slightly diverged: Padding has been removed in order to make room for new fields that are still missing in the documentation. Add the new fields s_encryption_level, s_first_error_errorcode, s_last_error_errorcode to the documentation of the ext4 super block. Fixes: f542fbe8d5e8 ("ext4 crypto: reserve codepoints used by the ext4 encryption feature") Fixes: 878520ac45f9 ("ext4: save the error code which triggered an ext4_error() in the superblock") Signed-off-by: Tom Vierjahn <[email protected]> Reviewed-by: Ojaswin Mujoo <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
* | Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linuxLinus Torvalds2025-04-031-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull 9p updates from Dominique Martinet: - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup * tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux: docs: fs/9p: Add missing "not" in cache documentation 9p: Use hashtable.h for hash_errmap Documentation/fs/9p: fix broken link 9p/trans_fd: mark concurrent read and writes to p9_conn->err 9p/net: return error on bogus (longer than requested) replies 9p/net: fix improper handling of bogus negative read/write replies fs/9p: fix NULL pointer dereference on mkdir net/9p/fd: support ipv6 for trans=tcp
| * | docs: fs/9p: Add missing "not" in cache documentationTingmao Wang2025-04-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | A quick fix for what I assume is a typo. Signed-off-by: Tingmao Wang <[email protected]> Reviewed-by: Christian Schoenebeck <[email protected]> Message-ID: <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
| * | Documentation/fs/9p: fix broken linkTuomas Ahola2025-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In b529c06f9dc7 (Update the documentation referencing Plan 9 from User Space., 2020-04-26), another instance of the link was left unfixed. Fix that as well. Signed-off-by: Tuomas Ahola <[email protected]> Message-ID: <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
* | | Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of ↵Linus Torvalds2025-04-011-0/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizing the refcounting in the ucount code. - The series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The series "resource: Split and use DEFINE_RES*() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details. * tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) mailmap: consolidate email addresses of Alexander Sverdlin fs/procfs: fix the comment above proc_pid_wchan() relay: use kasprintf() instead of fixed buffer formatting resource: replace open coded variant of DEFINE_RES() resource: replace open coded variants of DEFINE_RES_*_NAMED() resource: replace open coded variant of DEFINE_RES_NAMED_DESC() resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() samples: add hung_task detector mutex blocking sample hung_task: show the blocker task if the task is hung on mutex kexec_core: accept unaccepted kexec segments' destination addresses watchdog/perf: optimize bytes copied and remove manual NUL-termination lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() lib/interval_tree: skip the check before go to the right subtree lib/interval_tree: add test case for span iteration lib/interval_tree: add test case for interval_tree_iter_xxx() helpers lib/rbtree: add random seed lib/rbtree: split tests lib/rbtree: enable userland test suite for rbtree related data structure checkpatch: describe --min-conf-desc-length scripts/gdb/symbols: determine KASLR offset on s390 ...
| * | | docs,procfs: document /proc/PID/* access permission checksAndrii Nakryiko2025-03-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a paragraph explaining what sort of capabilities a process would need to read procfs data for some other process. Also mention that reading data for its own process doesn't require any extra permissions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andrii Nakryiko <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Liam Howlett <[email protected]> Cc: "Mike Rapoport (IBM)" <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
* | | | Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds2025-04-012-6/+38
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...
| * | | | MM documentation: add "Unaccepted" meminfo entryNico Pache2025-03-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") added a entry to meminfo but did not document it in the proc.rst file. This counter tracks the amount of "Unaccepted" guest memory for some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP. Add the missing entry in the documentation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nico Pache <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: xu xin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | | meminfo: add a per node counter for balloon driversNico Pache2025-03-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "track memory used by balloon drivers", v2. This series introduces a way to track memory used by balloon drivers. Add a NR_BALLOON_PAGES counter to track how many pages are reclaimed by the balloon drivers. First add the accounting, then updates the balloon drivers (virtio, Hyper-V, VMware, Pseries-cmm, and Xen) to maintain this counter. The virtio, Vmware, and pseries-cmm balloon drivers utilize the balloon_compaction interface to allocate and free balloon pages. Other balloon drivers will have to maintain this counter manually. This makes the information visible in memory reporting interfaces like /proc/meminfo, show_mem, and OOM reporting. This provides admins visibility into their VM balloon sizes without requiring different virtualization tooling. Furthermore, this information is helpful when debugging an OOM inside a VM. This patch (of 4): Add NR_BALLOON_PAGES counter to track memory used by balloon drivers and expose it through /proc/meminfo and other memory reporting interfaces. [[email protected]: document Balloon Meminfo entry] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nico Pache <[email protected]> Cc: Alexander Atanasov <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dexuan Cui <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Juegren Gross <[email protected]> Cc: Kanchana P Sridhar <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Oleksandr Tyshchenko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Wei Liu <[email protected]> Cc: Michael Kelley <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | | mm: stop maintaining the per-page mapcount of large folios ↵David Hildenbrand2025-03-181-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (CONFIG_NO_PAGE_MAPCOUNT) Everything is in place to stop using the per-page mapcounts in large folios: the mapcount of tail pages will always be logically 0 (-1 value), just like it currently is for hugetlb folios already, and the page mapcount of the head page is either 0 (-1 value) or contains a page type (e.g., hugetlb). Maintaining _nr_pages_mapped without per-page mapcounts is impossible, so that one also has to go with CONFIG_NO_PAGE_MAPCOUNT. There are two remaining implications: (1) Per-node, per-cgroup and per-lruvec stats of "NR_ANON_MAPPED" ("mapped anonymous memory") and "NR_FILE_MAPPED" ("mapped file memory"): As soon as any page of the folio is mapped -- folio_mapped() -- we now account the complete folio as mapped. Once the last page is unmapped -- !folio_mapped() -- we account the complete folio as unmapped. This implies that ... * "AnonPages" and "Mapped" in /proc/meminfo and /sys/devices/system/node/*/meminfo * cgroup v2: "anon" and "file_mapped" in "memory.stat" and "memory.numa_stat" * cgroup v1: "rss" and "mapped_file" in "memory.stat" and "memory.numa_stat ... can now appear higher than before. But note that these folios do consume that memory, simply not all pages are actually currently mapped. It's worth nothing that other accounting in the kernel (esp. cgroup charging on allocation) is not affected by this change. [why oh why is "anon" called "rss" in cgroup v1] (2) Detecting partial mappings Detecting whether anon THPs are partially mapped gets a bit more unreliable. As long as a single MM maps such a large folio ("exclusively mapped"), we can reliably detect it. Especially before fork() / after a short-lived child process quit, we will detect partial mappings reliably, which is the common case. In essence, if the average per-page mapcount in an anon THP is < 1, we know for sure that we have a partial mapping. However, as soon as multiple MMs are involved, we might miss detecting partial mappings: this might be relevant with long-lived child processes. If we have a fully-mapped anon folio before fork(), once our child processes and our parent all unmap (zap/COW) the same pages (but not the complete folio), we might not detect the partial mapping. However, once the child processes quit we would detect the partial mapping. How relevant this case is in practice remains to be seen. Swapout/migration will likely mitigate this. In the future, RMAP walkers could check for that for that case (e.g., when collecting access bits during reclaim) and simply flag them for deferred-splitting. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andy Lutomirks^H^Hski <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michal Koutn <[email protected]> Cc: Muchun Song <[email protected]> Cc: tejun heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | | fs/proc/task_mmu: remove per-page mapcount dependency for smaps/smaps_rollup ↵David Hildenbrand2025-03-181-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (CONFIG_NO_PAGE_MAPCOUNT) Let's implement an alternative when per-page mapcounts in large folios are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. When computing the output for smaps / smaps_rollups, in particular when calculating the USS (Unique Set Size) and the PSS (Proportional Set Size), we still rely on per-page mapcounts. To determine private vs. shared, we'll use folio_likely_mapped_shared(), similar to how we handle PM_MMAP_EXCLUSIVE. Similarly, we might now under-estimate the USS and count pages towards "shared" that are actually "private" ("exclusively mapped"). When calculating the PSS, we'll now also use the average per-page mapcount for large folios: this can result in both, an over-estimation and an under-estimation of the PSS. The difference is not expected to matter much in practice, but we'll have to learn as we go. We can now provide folio_precise_page_mapcount() only with CONFIG_PAGE_MAPCOUNT, and remove one of the last users of per-page mapcounts when CONFIG_NO_PAGE_MAPCOUNT is enabled. Document the new behavior. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andy Lutomirks^H^Hski <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michal Koutn <[email protected]> Cc: Muchun Song <[email protected]> Cc: tejun heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | | fs/proc/task_mmu: remove per-page mapcount dependency for "mapmax" ↵David Hildenbrand2025-03-181-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (CONFIG_NO_PAGE_MAPCOUNT) Let's implement an alternative when per-page mapcounts in large folios are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. For calculating "mapmax", we now use the average per-page mapcount in a large folio instead of the per-page mapcount. For hugetlb folios and folios that are not partially mapped into MMs, there is no change. Likely, this change will not matter much in practice, and an alternative might be to simple remove this stat with CONFIG_NO_PAGE_MAPCOUNT. However, there might be value to it, so let's keep it like that and document the behavior. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andy Lutomirks^H^Hski <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lance Yang <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcow (Oracle) <[email protected]> Cc: Michal Koutn <[email protected]> Cc: Muchun Song <[email protected]> Cc: tejun heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | | dcssblk: mark DAX broken, remove FS_DAX_LIMITED supportDan Williams2025-03-181-1/+0
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dcssblk driver has long needed special case supoprt to enable limited dax operation, so called CONFIG_FS_DAX_LIMITED. This mode works around the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of dax-mapped pages to support GUP. Now, pending cleanups to fsdax that fix its reference counting [1] depend on the ability of all dax drivers to supply ZONE_DEVICE pages. To allow that work to move forward, dax support needs to be paused for dcssblk until ZONE_DEVICE support arrives. That work has been known for a few years [2], and the removal of "pte_devmap" requirements [3] makes the conversion easier. For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL (dcssblk was the only user). Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1] Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2] Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3] Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Gerald Schaefer <[email protected]> Tested-by: Alexander Gordeev <[email protected]> Acked-by: David Hildenbrand <[email protected]> Tested-by: Alison Schofield <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Jan Kara <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Asahi Lina <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chunyan Zhang <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: linmiaohe <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Michael "Camp Drill Sergeant" Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ted Ts'o <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>