diff options
| author | Linus Torvalds <[email protected]> | 2025-01-20 17:29:11 +0000 |
|---|---|---|
| committer | Linus Torvalds <[email protected]> | 2025-01-20 17:29:11 +0000 |
| commit | ca56a74a31e26d81a481304ed2f631e65883372b (patch) | |
| tree | e17f181dbd6be574ef0731cd6729422ed2cd0c09 /fs/netfs/internal.h | |
| parent | x86: use cmov for user address masking (diff) | |
| parent | Merge patch series "netfs: Read performance improvements and "single-blob" su... (diff) | |
| download | kernel-ca56a74a31e26d81a481304ed2f631e65883372b.tar.gz kernel-ca56a74a31e26d81a481304ed2f631e65883372b.zip | |
Merge tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.
- Read performance improvements
The read performance improvements are intended to speed up some
loss of performance detected in cifs and to a lesser extend in afs.
The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.
Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.
The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).
(2) For readahead and AIO, this work item be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling
The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.
- Single-blob object support
Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.
Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.
- Related afs changes
This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:
- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.
- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.
- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.
- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.
- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).
- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.
- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.
- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).
- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.
This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.
- The was_async argument is dropped from netfs_read_subreq_terminated()
Instead we wake the read collection work item by either queuing it
or waking up the app thread.
- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.
- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).
- There are also some minor fixes AFS included, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_mkdir() translate EEXIST to
ENOTEMPY (which is not available on all systems the servers
support).
- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).
This is a debugging patch, but should be minimal overhead"
* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
Diffstat (limited to 'fs/netfs/internal.h')
| -rw-r--r-- | fs/netfs/internal.h | 41 |
1 files changed, 26 insertions, 15 deletions
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index c562aec3b483..eb76f98c894b 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -23,6 +23,7 @@ /* * buffered_read.c */ +void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, bool was_async); int netfs_prefetch_for_write(struct file *file, struct folio *folio, size_t offset, size_t len); @@ -58,11 +59,8 @@ static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {} /* * misc.c */ -struct folio_queue *netfs_buffer_make_space(struct netfs_io_request *rreq); -int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio, - bool needs_put); -struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq); -void netfs_clear_buffer(struct netfs_io_request *rreq); +struct folio_queue *netfs_buffer_make_space(struct netfs_io_request *rreq, + enum netfs_folioq_trace trace); void netfs_reset_iter(struct netfs_io_subrequest *subreq); /* @@ -84,20 +82,27 @@ static inline void netfs_see_request(struct netfs_io_request *rreq, trace_netfs_rreq_ref(rreq->debug_id, refcount_read(&rreq->ref), what); } +static inline void netfs_see_subrequest(struct netfs_io_subrequest *subreq, + enum netfs_sreq_ref_trace what) +{ + trace_netfs_sreq_ref(subreq->rreq->debug_id, subreq->debug_index, + refcount_read(&subreq->ref), what); +} + /* * read_collect.c */ -void netfs_read_termination_worker(struct work_struct *work); -void netfs_rreq_terminated(struct netfs_io_request *rreq, bool was_async); +void netfs_read_collection_worker(struct work_struct *work); +void netfs_wake_read_collector(struct netfs_io_request *rreq); +void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, bool was_async); +ssize_t netfs_wait_for_read(struct netfs_io_request *rreq); +void netfs_wait_for_pause(struct netfs_io_request *rreq); /* * read_pgpriv2.c */ -void netfs_pgpriv2_mark_copy_to_cache(struct netfs_io_subrequest *subreq, - struct netfs_io_request *rreq, - struct folio_queue *folioq, - int slot); -void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq); +void netfs_pgpriv2_copy_to_cache(struct netfs_io_request *rreq, struct folio *folio); +void netfs_pgpriv2_end_copy_to_cache(struct netfs_io_request *rreq); bool netfs_pgpriv2_unlock_copied_folios(struct netfs_io_request *wreq); /* @@ -113,6 +118,7 @@ void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq); extern atomic_t netfs_n_rh_dio_read; extern atomic_t netfs_n_rh_readahead; extern atomic_t netfs_n_rh_read_folio; +extern atomic_t netfs_n_rh_read_single; extern atomic_t netfs_n_rh_rreq; extern atomic_t netfs_n_rh_sreq; extern atomic_t netfs_n_rh_download; @@ -181,9 +187,9 @@ void netfs_reissue_write(struct netfs_io_stream *stream, struct iov_iter *source); void netfs_issue_write(struct netfs_io_request *wreq, struct netfs_io_stream *stream); -int netfs_advance_write(struct netfs_io_request *wreq, - struct netfs_io_stream *stream, - loff_t start, size_t len, bool to_eof); +size_t netfs_advance_write(struct netfs_io_request *wreq, + struct netfs_io_stream *stream, + loff_t start, size_t len, bool to_eof); struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len); int netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc, struct folio *folio, size_t copied, bool to_page_end, @@ -193,6 +199,11 @@ int netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_contr int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len); /* + * write_retry.c + */ +void netfs_retry_writes(struct netfs_io_request *wreq); + +/* * Miscellaneous functions. */ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx) |
