kernel - saturneric's kernel source tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	NFSv4/flexfiles: Fix layout merge mirror check.	Jonathan Curley	2025-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Typo in ff_lseg_match_mirrors makes the diff ineffective. This results in merge happening all the time. Merge happening all the time is problematic because it marks lsegs invalid. Marking lsegs invalid causes all outstanding IO to get restarted with EAGAIN and connections to get closed. Closing connections constantly triggers race conditions in the RDMA implementation... Fixes: 660d1eb22301c ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match") Signed-off-by: Jonathan Curley <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
*	NFS: Fix the marking of the folio as up to date	Trond Myklebust	2025-09-06	1	-47/+5
\| \| \| \| \| \| \| \|	Since all callers of nfs_page_group_covers_page() have already ensured that there is only one group member, all that is required is to check that the entire folio contains dirty data. Signed-off-by: Trond Myklebust <[email protected]>
*	NFS: nfs_invalidate_folio() must observe the offset and size arguments	Trond Myklebust	2025-09-06	2	-3/+5
\| \| \| \| \| \| \| \|	If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation. Fixes: d47992f86b30 ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4.2: Serialise O_DIRECT i/o and copy range	Trond Myklebust	2025-09-06	1	-0/+1
\| \| \| \| \| \| \| \|	Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4.2: Serialise O_DIRECT i/o and clone range	Trond Myklebust	2025-09-06	1	-0/+2
\| \| \| \| \| \| \| \|	Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4.2: Serialise O_DIRECT i/o and fallocate()	Trond Myklebust	2025-09-06	1	-0/+1
\| \| \| \| \| \| \| \|	Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates. Fixes: 99f237832243 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust <[email protected]>
*	NFS: Serialise O_DIRECT i/o and truncate()	Trond Myklebust	2025-09-06	3	-12/+15
\| \| \| \| \| \| \| \| \|	Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4.2: Protect copy offload and clone against 'eof page pollution'	Trond Myklebust	2025-09-06	1	-6/+13
\| \| \| \| \| \| \| \| \|	The NFSv4.2 copy offload and clone functions can also end up extending the size of the destination file, so they too need to call nfs_truncate_last_folio(). Reported-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
*	NFS: Protect against 'eof page pollution'	Trond Myklebust	2025-09-06	5	-5/+54
\| \| \| \| \| \| \| \| \| \| \|	This commit fixes the failing xfstest 'generic/363'. When the user mmaps() an area that extends beyond the end of file, and proceeds to write data into the folio that straddles that eof, we're required to discard that folio data if the user calls some function that extends the file length. Signed-off-by: Trond Myklebust <[email protected]>
*	flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read	Tigran Mkrtchyan	2025-09-06	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent commit f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") has changed the error return type of ff_layout_choose_ds_for_read() from NULL to an error pointer. However, not all code paths have been updated to match the change. Thus, some non-NULL checks will accept error pointers as a valid return value. Reported-by: Dan Carpenter <[email protected]> Suggested-by: Dan Carpenter <[email protected]> Fixes: f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") Signed-off-by: Tigran Mkrtchyan <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
*	nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()	Mike Snitzer	2025-09-06	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously nfs_local_probe() was made to disable and then attempt to re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was called and LOCALIO already enabled. Vague memory for _why_ this was the case is that this was useful if/when a local NFS server were to be restarted with a local NFS client connected to it. But as it happens this causes an absurd amount of LOCALIO flapping which has a side-effect of too much IO being needlessly sent to NFSD (using RPC over the loopback network interface). This is the definition of "serious performance loss" (that negates the point of having LOCALIO). So remove this mis-optimization for re-enabling LOCALIO if/when an NFS server is restarted (which is an extremely rare thing to do). Will revisit testing that scenario again but in the meantime this patch restores the full benefit of LOCALIO. Signed-off-by: Mike Snitzer <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
*	nfs/localio: restore creds before releasing pageio data	Scott Mayhew	2025-09-05	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Otherwise if the nfsd filecache code releases the nfsd_file immediately, it can trigger the BUG_ON(cred == current->cred) in __put_cred() when it puts the nfsd_file->nf_file->f-cred. Fixes: b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Scott Mayhew <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server	Trond Myklebust	2025-08-29	1	-0/+2
\| \| \| \| \| \| \| \|	nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset on entry to the function. Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported	Trond Myklebust	2025-08-29	1	-1/+2
\| \| \| \| \| \| \| \|	_nfs4_server_capabilities() should clear capabilities that are not supported by the server. Fixes: d2a00cceb93a ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set	Trond Myklebust	2025-08-29	1	-2/+3
\| \| \| \| \| \| \| \|	_nfs4_server_capabilities() is expected to clear any flags that are not supported by the server. Fixes: 8a59bb93b7e3 ("NFSv4 store server support for fs_location attribute") Signed-off-by: Trond Myklebust <[email protected]>
*	NFSv4: Don't clear capabilities that won't be reset	Trond Myklebust	2025-08-29	1	-1/+0
\| \| \| \| \| \| \| \| \|	Don't clear the capabilities that are not going to get reset by the call to _nfs4_server_capabilities(). Reported-by: Scott Haiden <[email protected]> Fixes: b01f21cacde9 ("NFS: Fix the setting of capabilities when automounting a new filesystem") Signed-off-by: Trond Myklebust <[email protected]>
*	NFS: Fix a race when updating an existing write	Trond Myklebust	2025-08-19	2	-23/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After nfs_lock_and_join_requests() tests for whether the request is still attached to the mapping, nothing prevents a call to nfs_inode_remove_request() from succeeding until we actually lock the page group. The reason is that whoever called nfs_inode_remove_request() doesn't necessarily have a lock on the page group head. So in order to avoid races, let's take the page group lock earlier in nfs_lock_and_join_requests(), and hold it across the removal of the request in nfs_inode_remove_request(). Reported-by: Jeff Layton <[email protected]> Tested-by: Joe Quanaim <[email protected]> Tested-by: Andrew Steffen <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Fixes: bd37d6fce184 ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
*	Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds	2025-08-09	27	-389/+767
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - don't inherit NFS filesystem capabilities when crossing from one filesystem to another Bugfixes: - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers - NFS improve bounds checking in nfs_fh_to_dentry() - NFS Fix allocation errors when writing to a NFS file backed loopback device - NFSv4: More listxattr fixes - SUNRPC: fix client handling of TLS alerts - pNFS block/scsi layout fix for an uninitialised pointer dereference - pNFS block/scsi layout fixes for the extent encoding, stripe mapping, and disk offset overflows - pNFS layoutcommit work around for RPC size limitations - pNFS/flexfiles avoid looping when handling fatal errors after layoutget - localio: fix various race conditions Features and cleanups: - Add NFSv4 support for retrieving the btime - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC - NFS: Support using a kernel keyring to store TLS certificates - NFSv4: Speed up delegation lookup using a hash table - Assorted cleanups to remove unused variables and struct fields - Assorted new tracepoints to improve debugging" * tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() NFS/localio: nfs_close_local_fh() fix check for file closed NFSv4: Remove duplicate lookups, capability probes and fsinfo calls NFS: Fix the setting of capabilities when automounting a new filesystem sunrpc: fix client side handling of tls alerts nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock() NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY NFSv4.2: another fix for listxattr NFS: Fix filehandle bounds checking in nfs_fh_to_dentry() SUNRPC: Silence warnings about parameters not being described NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final() NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate() NFS: use a hash table for delegation lookup NFS: track active delegations per-server NFS: move the delegation_watermark module parameter NFS: cleanup nfs_inode_reclaim_delegation NFS: cleanup error handling in nfs4_server_common_setup pNFS/flexfiles: don't attempt pnfs on fatal DS errors NFS: drop __exit from nfs_exit_keyring ...
\| *	NFSv4: Remove duplicate lookups, capability probes and fsinfo calls	Trond Myklebust	2025-08-04	3	-58/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When crossing into a new filesystem, the NFSv4 client will look up the new directory, and then call nfs4_server_capabilities() as well as nfs4_do_fsinfo() at least twice. This patch removes the duplicate calls, and reduces the initial lookup to retrieve just a minimal set of attributes. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Fix the setting of capabilities when automounting a new filesystem	Trond Myklebust	2025-08-04	4	-23/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Capabilities cannot be inherited when we cross into a new filesystem. They need to be reset to the minimal defaults, and then probed for again. Fixes: 54ceac451598 ("NFS: Share NFS superblocks per-protocol per-server per-FSID") Cc: [email protected] Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()	Li RongQing	2025-08-03	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The usage of read_seqbegin_or_lock() in nfs_copy_boot_verifier() is wrong. "seq" is always even and thus "or_lock" has no effect. nfs_copy_boot_verifier() just copies 8 bytes and is supposed to be very rare operation, so we do not need the adaptive locking in this case. Signed-off-by: Li RongQing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY	Benjamin Coddington	2025-07-28	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the NFS client is doing writeback from a workqueue context, avoid using __GFP_NORETRY for allocations if the task has set PF_MEMALLOC_NOIO or PF_MEMALLOC_NOFS. The combination of these flags makes memory allocation failures much more likely. We've seen those allocation failures show up when the loopback driver is doing writeback from a workqueue to a file on NFS, where memory allocation failure results in errors or corruption within the loopback device's filesystem. Suggested-by: Trond Myklebust <[email protected]> Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck in mempool_alloc()") Signed-off-by: Benjamin Coddington <[email protected]> Reviewed-by: Laurence Oberman <[email protected]> Tested-by: Laurence Oberman <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/f83ac1155a4bc670f2663959a7a068571e06afd9.1752111622.git.bcodding@redhat.com Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFSv4.2: another fix for listxattr	Olga Kornievskaia	2025-07-28	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when the server supports NFS4.1 security labels then security.selinux label in included twice. Instead, only add it when the server doesn't possess security label support. Fixes: 243fea134633 ("NFSv4.2: fix listxattr to return selinux security label") Signed-off-by: Olga Kornievskaia <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()	Trond Myklebust	2025-07-22	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function needs to check the minimal filehandle length before it can access the embedded filehandle. Reported-by: zhangjian <[email protected]> Fixes: 20fa19027286 ("nfs: add export operations") Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()	Trond Myklebust	2025-07-22	1	-18/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Use the wake_up_var_locked() and wait_var_event_spinlock() helpers. Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()	Trond Myklebust	2025-07-22	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use store_release_wake_up() to add the appropriate memory barrier before calling wake_up_var(&dentry->d_fsdata). Reported-by: Lukáš Hejtmánek<[email protected]> Suggested-by: Santosh Pradhan <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Fixes: 99bc9f2eb3f7 ("NFS: add barriers when testing for NFS_FSDATA_BLOCKED") Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: use a hash table for delegation lookup	Christoph Hellwig	2025-07-22	4	-2/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nfs_delegation_find_inode currently has to walk the entire list of delegations per inode, which can become pretty large, and can become even larger when increasing the delegation watermark. Add a hash table to speed up the delegation lookup, sized as a fraction of the delegation watermark. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: track active delegations per-server	Christoph Hellwig	2025-07-22	2	-16/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The active delegation watermark was added to avoid overloading servers. Track the active delegation per-server instead of globally so that clients talking to multiple servers aren't limited by the global limit. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: move the delegation_watermark module parameter	Christoph Hellwig	2025-07-22	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep the module_param_named next to the variable declaration instead of somewhere unrelated, following the best practice in the rest of the kernel. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: cleanup nfs_inode_reclaim_delegation	Christoph Hellwig	2025-07-22	1	-24/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reduce a level of indentation for most of the code in this function. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: cleanup error handling in nfs4_server_common_setup	Christoph Hellwig	2025-07-22	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return error directly instead of using a goto label for it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS/flexfiles: don't attempt pnfs on fatal DS errors	Tigran Mkrtchyan	2025-07-22	2	-13/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an applications get killed (SIGTERM/SIGINT) while pNFS client performs a connection to DS, client ends in an infinite loop of connect-disconnect. This source of the issue, it that flexfilelayoutdev#nfs4_ff_layout_prepare_ds gets an error on nfs4_pnfs_ds_connect with status ERESTARTSYS, which is set by rpc_signal_task, but the error is treated as transient, thus retried. The issue is reproducible with Ctrl+C the following script(there should be ~1000 files in a directory, client should must not have any connections to DSes): ``` echo 3 > /proc/sys/vm/drop_caches for i in * do head -1 $i done ``` The change aims to propagate the nfs4_ff_layout_prepare_ds error state to the caller that can decide whatever this is a retryable error or not. Signed-off-by: Tigran Mkrtchyan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: 260f32adb88d ("pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect") Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: drop __exit from nfs_exit_keyring	Christoph Hellwig	2025-07-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise built-in NFS can lead to sectіon mismatches. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: 87268f7a4f1f ("nfs: create a kernel keyring") Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: pass struct nfs_client_initdata to nfs4_set_client	Christoph Hellwig	2025-07-22	1	-83/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Passed the partially filled out structure to nfs4_set_client instead of 11 arguments that then get stashed into the structure. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Fix disk addr range check in block/scsi layout	Sergey Bashirov	2025-07-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the end of the isect translation, disc_addr represents the physical disk offset. Thus, end calculated from disk_addr is also a physical disk offset. Therefore, range checking should be done using map->disk_offset, not map->start. Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Fix stripe mapping in block/scsi layout	Sergey Bashirov	2025-07-14	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because of integer division, we need to carefully calculate the disk offset. Consider the example below for a stripe of 6 volumes, a chunk size of 4096, and an offset of 70000. chunk = div_u64(offset, dev->chunk_size) = 70000 / 4096 = 17 offset = chunk * dev->chunk_size = 17 * 4096 = 69632 disk_offset_wrong = div_u64(offset, dev->nr_children) = 69632 / 6 = 11605 disk_chunk = div_u64(chunk, dev->nr_children) = 17 / 6 = 2 disk_offset = disk_chunk * dev->chunk_size = 2 * 4096 = 8192 Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Handle RPC size limit for layoutcommits	Sergey Bashirov	2025-07-14	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are too many block extents for a layoutcommit, they may not all fit into the maximum-sized RPC. This patch allows the generic pnfs code to properly handle -ENOSPC returned by the block/scsi layout driver and trigger additional layoutcommits if necessary. Co-developed-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Add prepare commit trace to block/scsi layout	Sergey Bashirov	2025-07-14	3	-3/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace dprintk with trace event in ext_tree_prepare_commit() function. Co-developed-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Fix extent encoding in block/scsi layout	Sergey Bashirov	2025-07-14	1	-6/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ext_tree_encode_commit() function may be called multiple times for the same file, layout, and last written byte if the provided buffer is not large enough to encode all extents in it. The first problem is that the last written byte field must be zeroed only on a successful call, otherwise we will lose its actual value and get an integer overflow on the next encoding attempt. The second problem is that we can't count and encode in one pass. The extent state changes during encoding, so if we return -ENOSPC but have already encoded some extents into a small buffer, they will not be re-encoded into a new larger buffer on the next try. As a result, the client never commits these extents to the server. Co-developed-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	pNFS: Fix uninited ptr deref in block/scsi layout	Sergey Bashirov	2025-07-14	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The error occurs on the third attempt to encode extents. When function ext_tree_prepare_commit() reallocates a larger buffer to retry encoding extents, the "layoutupdate_pages" page array is initialized only after the retry loop. But ext_tree_free_commitdata() is called on every iteration and tries to put pages in the array, thus dereferencing uninitialized pointers. An additional problem is that there is no limit on the maximum possible buffer_size. When there are too many extents, the client may create a layoutcommit that is larger than the maximum possible RPC size accepted by the server. During testing, we observed two typical scenarios. First, one memory page for extents is enough when we work with small files, append data to the end of the file, or preallocate extents before writing. But when we fill a new large file without preallocating, the number of extents can be huge, and counting the number of written extents in ext_tree_encode_commit() does not help much. Since this number increases even more between unlocking and locking of ext_tree, the reallocated buffer may not be large enough again and again. Co-developed-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Konstantin Evtushenko <[email protected]> Signed-off-by: Sergey Bashirov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Remove unused function nfs_umount	Dr. David Alan Gilbert	2025-07-14	2	-69/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nfs_umount() has been unused since 2013's commit 4580a92d44e2 ("NFS: Use server-recommended security flavor by default (NFSv3)") Remove it. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs: create a kernel keyring	Christoph Hellwig	2025-07-14	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create a kernel .nfs keyring similar to the nvme .nvme one. Unlike for a userspace-created keyrind, tlshd is a possesor of the keys with this and thus the keys don't need user read permissions. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: support the kernel keyring for TLS	Christoph Hellwig	2025-07-14	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow tlshd to use a per-mount key from the kernel keyring similar to NVMe over TCP. Note that tlshd expects keys and certificates stored in the kernel keyring to be in DER format, not the PEM format used for file based keys and certificates, so they need to be converted before they are added to the keyring, which is a bit unexpected. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: Allow folio migration for the case of mode == MIGRATE_SYNC	Trond Myklebust	2025-07-14	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the mode is MIGRATE_SYNC, we are allowed to call nfs_wb_folio() under the folio lock. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs: new tracepoint in match_stateid operation	Jeff Layton	2025-07-14	2	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new tracepoints in the NFSv4 match_stateid minorversion op that show the info in both stateids. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs: new tracepoint in nfs_delegation_need_return	Jeff Layton	2025-07-14	2	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a tracepoint in the function that decides whether to return a delegation to the server. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs: add a tracepoint to nfs_inode_detach_delegation_locked	Jeff Layton	2025-07-14	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have tracepoints for setting a delegation and reclaiming them. Add a tracepoint for when the delegation is being detached from the inode. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	nfs: add cache_validity to the nfs_inode_event tracepoints	Jeff Layton	2025-07-14	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Managing the cache_validity flags is the deep voodoo of NFS cache coherency. Let's have a little extra visibility into that value via the nfs_inode_event tracepoints. Reviewed-by: Benjamin Coddington <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: remove unused time_delta field from struct nfs_server	Anthony Iliopoulos	2025-07-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The last code that was using this was removed via commit ca0daa277aca ("NFS: Cache aggressively when file is open for writing") which was merged in v4.8-rc1, so it can be removed completely. Signed-off-by: Anthony Iliopoulos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
\| *	NFS: remove unused wpages field from struct nfs_server	Anthony Iliopoulos	2025-07-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The wpages field is not serving any purpose since commit c63c7b051395 ("NFS: Fix a race when doing NFS write coalescing") which was merged in v2.6.22-rc1. Remove it completely. Signed-off-by: Anthony Iliopoulos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>