path: root/fs/bcachefs/btree_trans_commit.c
Commit message | Author | Age | Files | Lines
* bcachefs: Kill unused tracepoints | Kent Overstreet | 2025-06-16 | 1 | -2/+3
  Dead code cleanup.
  Link: https://lore.kernel.org/linux-bcachefs/[email protected]/
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Delay calculation of trans->journal_u64s | Alan Huang | 2025-06-16 | 1 | -4/+7
  When a commit error requires splitting a btree leaf, fsck might change the value of trans->journal_entries.u64s. When the commit is retried, the value of trans->journal_u64s is then incorrect, which leads to trans->journal_res.u64s underflowing, and an out-of-bounds write occurs:

  [ 464.496970][T11969] Call trace:
  [ 464.496973][T11969] show_stack+0x3c/0x88 (C)
  [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178
  [ 464.497014][T11969] dump_stack+0x20/0x30
  [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350
  [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60
  [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390
  [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0
  [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130
  [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8
  [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20
  [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00
  [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8
  [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0
  [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0
  [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600
  [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730
  [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30
  [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30
  [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420
  [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98
  [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478
  [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28
  [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248
  [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708
  [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0

  Signed-off-by: Alan Huang <[email protected]>
  Signed-off-by: Kent Overstreet <[email protected]>
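  The failure mode in the commit above is an unsigned underflow: a size computed before a retry ends up smaller than what is later consumed. The following is a minimal user-space sketch of that pattern only. `struct res`, `res_consume_unchecked()` and `res_consume_checked()` are hypothetical names, the field width is an assumption, and none of this is the bcachefs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a reservation: u64s still available. */
struct res {
	uint32_t u64s;
};

/* Unchecked: wraps around to a huge value when want > r->u64s. */
static uint32_t res_consume_unchecked(struct res *r, uint32_t want)
{
	r->u64s -= want;
	return r->u64s;
}

/* Checked: refuse to consume more than was reserved. */
static bool res_consume_checked(struct res *r, uint32_t want)
{
	if (want > r->u64s)
		return false;
	r->u64s -= want;
	return true;
}
```

  With a wrapped-around remaining size, any later bounds check against it passes, which is how an underflow can turn into an out-of-bounds write.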
* bcachefs: Add missing EBUG_ON | Alan Huang | 2025-06-16 | 1 | -0/+2
  Just like the EBUG_ON in bch2_journal_add_entry().
  Signed-off-by: Alan Huang <[email protected]>
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch_err_throw() | Kent Overstreet | 2025-06-02 | 1 | -7/+8
  Add a tracepoint for any time we return an error and unwind.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_check_fix_ptrs() can now repair btree roots | Kent Overstreet | 2025-05-30 | 1 | -4/+17
  This is straightforward enough: check_fix_ptrs() currently only runs before we go RW, so updating the btree root pointer in c->btree_roots suffices - it'll be written out in the first journal write we do.
  For that, do_bch2_trans_commit_to_journal_replay() now handles JSET_ENTRY_btree_root entries.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Split out accounting in transaction commit | Kent Overstreet | 2025-05-22 | 1 | -20/+31
  There can be a lot of redundancy in accounting updates within a single btree transaction. Split out accounting updates so that they can be deduped in the next commit.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: btree_trans_subbuf | Kent Overstreet | 2025-05-22 | 1 | -15/+16
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Debug params are now static_keys | Kent Overstreet | 2025-05-22 | 1 | -2/+2
  We'd like users to be able to debug without building custom kernels, so this will help us get rid of CONFIG_BCACHEFS_DEBUG, at least for most things.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch_fs.writes -> enumerated_refs | Kent Overstreet | 2025-05-22 | 1 | -2/+3
  Drop the single-purpose write ref code in bcachefs.h, and convert to enumerated refs.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_name | Roxana Nicolescu | 2025-05-22 | 1 | -1/+3
  strncpy() is now deprecated. The destination buffer is not required to be NUL-terminated, but we also want to zero out the rest of the buffer, as is already done in other places.
  Link: https://github.com/KSPP/linux/issues/90
  Signed-off-by: Roxana Nicolescu <[email protected]>
  Signed-off-by: Kent Overstreet <[email protected]>
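  The copy-then-pad behaviour that commit relies on can be sketched in user space as below. `copy_and_pad()` and `set_name()` are hypothetical helpers written for illustration; they are modeled on the semantics of the kernel's memcpy_and_pad() and are not the kernel implementation or the bcachefs call site.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper modeled on memcpy_and_pad():
 * copy at most dest_len bytes, then fill whatever is left with `pad`.
 * The destination is not guaranteed to be NUL-terminated.
 */
static void copy_and_pad(void *dest, size_t dest_len,
			 const void *src, size_t count, int pad)
{
	if (dest_len > count) {
		memcpy(dest, src, count);
		memset((char *)dest + count, pad, dest_len - count);
	} else {
		memcpy(dest, src, dest_len);
	}
}

/* Copy a C string into a fixed-size field, zero-filling the tail. */
static void set_name(char *field, size_t field_len, const char *name)
{
	copy_and_pad(field, field_len, name, strlen(name), 0);
}
```

  Unlike a bare memcpy(), this never leaves stale bytes after a short name, and unlike strncpy() the intent (fixed-size, padded, possibly unterminated) is explicit in the call.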
* bcachefs: kmsan asserts | Kent Overstreet | 2025-03-24 | 1 | -0/+1
  Catching these early makes them a lot easier to track down.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Kill JOURNAL_ERRORS() | Kent Overstreet | 2025-03-24 | 1 | -14/+16
  Convert these to standard error codes, which means we can pass them outside the journal code, they're easier to pass to tracepoints, etc.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: rework bch2_trans_commit_run_triggers() | Kent Overstreet | 2025-03-15 | 1 | -55/+32
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS | Kent Overstreet | 2025-02-12 | 1 | -0/+4
  Incorrectly handled transaction restarts can be a source of heisenbugs; add a mode where we randomly inject them to shake them out.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: "Journal stuck" timeout now takes into account device latency | Kent Overstreet | 2025-01-21 | 1 | -1/+1
  If a block device (e.g. your typical consumer SSD) is taking multiple seconds for IOs (typically flushes), we don't want to emit the "journal stuck" message prematurely.
  Also, make sure to drop the btree_trans srcu lock if we're blocking for more than a second.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_trans_unlock_write() | Kent Overstreet | 2025-01-10 | 1 | -3/+3
  New helper for dropping all write locks. It is distinct from the helper the transaction commit path uses, which is faster and only touches updates.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_btree_node_write_trans() | Kent Overstreet | 2025-01-10 | 1 | -1/+1
  Avoid screwing up path->lock_seq.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Don't run overwrite triggers before insert | Kent Overstreet | 2024-12-21 | 1 | -44/+37
  This breaks when the trigger is inserting updates for the same btree, as the inode trigger now does.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Plumb bkey_validate_context to journal_entry_validate | Kent Overstreet | 2024-12-21 | 1 | -25/+17
  This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: struct bkey_validate_context | Kent Overstreet | 2024-12-21 | 1 | -1/+6
  Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_trans_verify_not_unlocked_or_in_restart() | Kent Overstreet | 2024-12-21 | 1 | -6/+3
  Fold two asserts into one.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Kill BCH_TRANS_COMMIT_lazy_rw | Kent Overstreet | 2024-12-21 | 1 | -26/+5
  We unconditionally go read-write, if we're going to do so, before journal replay: lazy_rw is obsolete.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Add assert for use of journal replay keys for updates | Kent Overstreet | 2024-12-21 | 1 | -0/+2
  The journal replay keys mechanism can only be used for updates in early recovery, when still single threaded. Add some asserts to make sure we never accidentally use it elsewhere.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: kill btree_trans_restart_nounlock() | Kent Overstreet | 2024-12-21 | 1 | -1/+1
  Redundant, the normal btree_trans_restart() doesn't unlock.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Pull disk accounting hooks out of trans_commit.c | Kent Overstreet | 2024-12-21 | 1 | -29/+6
  Also, fix a minor bug in the revert path, where we weren't checking the journal entry type correctly.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Fix trans_commit disk accounting revert | Kent Overstreet | 2024-10-03 | 1 | -1/+2
  We are only applying JSET_ENTRY_TYPE_write_buffer_keys; the revert path was missed.
  Fixes: a3581ca35d2b ("bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply")
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply | Kent Overstreet | 2024-09-28 | 1 | -14/+18
  This was added to avoid double-counting accounting keys in journal replay. But applied incorrectly (easily done since it applies to the transaction commit, not a particular update), it leads to skipping in-mem accounting for real accounting updates, and failure to give them a version number - which leads to journal replay becoming very confused the next time around.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: rename version -> bversion | Kent Overstreet | 2024-09-28 | 1 | -4/+4
  Give bversions a more distinct name, to aid in grepping.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Move transaction commit path validation to as late as possible | Kent Overstreet | 2024-09-28 | 1 | -34/+34
  In order to check for accounting keys with version=0, we need to run validation after they've been assigned version numbers.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch_accounting_mode | Kent Overstreet | 2024-09-28 | 1 | -2/+2
  Minor refactoring - replace multiple bool arguments with an enum; prep work for fixing a bug in accounting read.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Remove unused parameter | Alan Huang | 2024-09-09 | 1 | -1/+1
  iter here is unused, remove it.
  Signed-off-by: Alan Huang <[email protected]>
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Kill __bch2_accounting_mem_mod() | Kent Overstreet | 2024-08-14 | 1 | -2/+2
  The next patch will be adding a disk accounting counter type which is not kept in the in-memory eytzinger tree.
  As prep, fold __bch2_accounting_mem_mod() into bch2_accounting_mem_mod_locked() so that we can check for that counter type and bail out without calling bpos_to_disk_accounting_pos() twice.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() | Kent Overstreet | 2024-08-14 | 1 | -58/+14
  bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock.

  This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change.

  Also, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyway (to drop the printbuf argument).
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Add a time_stat for blocked on key cache flush | Kent Overstreet | 2024-08-14 | 1 | -0/+4
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Add hysteresis to waiting on btree key cache flush | Kent Overstreet | 2024-08-14 | 1 | -1/+1
  This helps ensure key cache reclaim isn't contending with threads waiting for the key cache to be helped, and fixes a severe performance bug.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_btree_key_cache_drop() now evicts | Kent Overstreet | 2024-07-14 | 1 | -6/+5
  As part of improving btree key cache coherency, the bkey_cached.valid flag is going away.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Kill gc_pos_btree_node() | Kent Overstreet | 2024-07-14 | 1 | -1/+1
  gc_pos is now based on keys, not nodes, for invariantness w.r.t. splits and merges.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: btree_types bitmask cleanups | Kent Overstreet | 2024-07-14 | 1 | -28/+22
  Make things more consistent and ensure that we're using u64 bitfields - key types and btree ids are already around 32 bits.
  Signed-off-by: Kent Overstreet <[email protected]>
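  The reason for insisting on u64 masks once an id space approaches 32 entries can be sketched as follows: a 32-bit shift is undefined for ids of 32 or more, while a 64-bit shift is well defined up to 63. The enum and helpers below (`example_id`, `id_bit()`, `id_in_mask()`) are hypothetical and only illustrate the idea; the real id lists and mask helpers live in the bcachefs headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ids, chosen so that one of them exceeds 31. */
enum example_id {
	EXAMPLE_ID_first	= 0,
	EXAMPLE_ID_high		= 40,	/* would not fit in a 32-bit mask */
};

/* 64-bit shift: well defined for ids 0..63. */
static inline uint64_t id_bit(unsigned id)
{
	return UINT64_C(1) << id;
}

/* Test whether `id` is present in a 64-bit mask. */
static inline bool id_in_mask(uint64_t mask, unsigned id)
{
	return (mask & id_bit(id)) != 0;
}
```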
* bcachefs: Delete old assertion for online fsck | Kent Overstreet | 2024-07-14 | 1 | -8/+1
  The order in which btree_gc walks keys has changed, so we no longer have the sort of issues with online fsck this assertion was warning about.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Convert gc to new accounting | Kent Overstreet | 2024-07-14 | 1 | -2/+2
  Rewrite fsck/gc for the new accounting scheme.
  This adds a second set of in-memory accounting counters for gc to use; like with other parts of gc we run all triggers in TRIGGER_GC mode, then compare what we calculated to existing in-memory accounting at the end.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Disk space accounting rewrite | Kent Overstreet | 2024-07-14 | 1 | -15/+50
  Main part of the disk accounting rewrite.

  This is a wholesale rewrite of the existing disk space accounting, which relies on percpu counters that are sharded by journal buffer, and rolled up and added to each journal write.

  With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters.

  Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required.

  An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Accumulate accounting keys in journal replay | Kent Overstreet | 2024-07-14 | 1 | -5/+21
  Until accounting keys hit the btree, they are deltas, not new versions of the existing key; this means we have to teach journal replay to accumulate them.

  Additionally, the journal doesn't track precisely which entries have been flushed to the btree; it only tracks a range of entries that may possibly still need to be flushed. That means we need to compare accounting keys against the version in the btree and only flush updates that are newer.

  There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then.
  Signed-off-by: Kent Overstreet <[email protected]>
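  The two rules in that message (journal accounting keys are deltas to be summed, and only deltas newer than the btree's version count) can be sketched with a simplified model. `struct acc` and `replay_accumulate()` are hypothetical and reduce an accounting key to one counter and one version; the actual key layout and replay code are more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified accounting entry: one counter plus a version. */
struct acc {
	uint64_t version;
	int64_t  value;
};

/*
 * Fold journal deltas into the value already in the btree, skipping any
 * delta whose version shows it was already flushed to the btree.
 */
static struct acc replay_accumulate(struct acc btree,
				    const struct acc *deltas, size_t nr)
{
	struct acc out = btree;

	for (size_t i = 0; i < nr; i++) {
		if (deltas[i].version <= btree.version)
			continue;	/* already reflected in the btree */
		out.value += deltas[i].value;
		if (deltas[i].version > out.version)
			out.version = deltas[i].version;
	}
	return out;
}
```

  The comparison is always against the version the btree held when replay started, which is why the message requires that nothing else (such as the write buffer) advances that version before replay has finished.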
* bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg() | Uros Bizjak | 2024-07-14 | 1 | -4/+4
  Use try_cmpxchg() family of functions instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg).
  Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop.
  No functional change intended.
  Signed-off-by: Uros Bizjak <[email protected]>
  Cc: Kent Overstreet <[email protected]>
  Cc: Brian Foster <[email protected]>
  Signed-off-by: Kent Overstreet <[email protected]>
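  The difference between the two loop shapes can be shown with a user-space analog built on C11 atomics, where atomic_compare_exchange_weak() already behaves like try_cmpxchg(): it returns a boolean and refreshes the expected value on failure. `cmpxchg_val()`, `set_flag_old()` and `set_flag_try()` are hypothetical names for this sketch; this is not the kernel API or the bcachefs code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Value-returning compare-and-swap, shaped like a classic cmpxchg(). */
static uint32_t cmpxchg_val(_Atomic uint32_t *ptr, uint32_t old, uint32_t new)
{
	/* On failure `old` is overwritten with the value actually seen. */
	atomic_compare_exchange_strong(ptr, &old, new);
	return old;
}

/* Old style: compare the result by hand and re-read on every retry. */
static void set_flag_old(_Atomic uint32_t *ptr, uint32_t flag)
{
	uint32_t old, new;

	do {
		old = atomic_load(ptr);
		new = old | flag;
	} while (cmpxchg_val(ptr, old, new) != old);
}

/* try_cmpxchg style: one initial load, no explicit compare. */
static void set_flag_try(_Atomic uint32_t *ptr, uint32_t flag)
{
	uint32_t old = atomic_load(ptr);

	while (!atomic_compare_exchange_weak(ptr, &old, old | flag))
		;	/* a failed exchange already refreshed `old` */
}
```

  Both functions end with the same value in `*ptr`; the second simply drops the extra load and compare from the retry path.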
* bcachefs: s/bkey_invalid_flags/bch_validate_flags | Kent Overstreet | 2024-05-09 | 1 | -5/+5
  We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific.
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: x-macroize journal flags enums | Kent Overstreet | 2024-05-08 | 1 | -1/+1
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_trans_verify_not_unlocked() | Kent Overstreet | 2024-05-08 | 1 | -0/+7
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: bch2_trans_commit_flags_to_text() | Kent Overstreet | 2024-05-08 | 1 | -0/+21
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: iter/update/trigger/str_hash flag cleanup | Kent Overstreet | 2024-05-08 | 1 | -12/+12
  Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later.
  These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling.
  Signed-off-by: Kent Overstreet <[email protected]>
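  The x-macro technique that message refers to can be sketched as below: one list generates the bit numbers, the flag values, and the name table a to_text() helper needs, so they cannot drift apart. All names here (`EXAMPLE_FLAGS`, `example_flags_to_text()`, the three flags) are hypothetical; the real bcachefs flag lists and printbuf-based helpers differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag list; each x(name) entry becomes one flag. */
#define EXAMPLE_FLAGS()		\
	x(intent)		\
	x(cached)		\
	x(norun)

/* Bit numbers, generated from the list. */
enum example_flag_bits {
#define x(n)	EXAMPLE_FLAG_BIT_##n,
	EXAMPLE_FLAGS()
#undef x
	EXAMPLE_FLAG_NR
};

/* Flag values, one distinct bit each. */
enum example_flags {
#define x(n)	EXAMPLE_FLAG_##n = 1U << EXAMPLE_FLAG_BIT_##n,
	EXAMPLE_FLAGS()
#undef x
};

/* Name table, in the same order as the bits. */
static const char * const example_flag_strs[] = {
#define x(n)	#n,
	EXAMPLE_FLAGS()
#undef x
	NULL
};

/* Write the names of the set flags into buf (len >= 1), comma separated. */
static void example_flags_to_text(char *buf, size_t len, unsigned flags)
{
	buf[0] = '\0';
	for (unsigned i = 0; i < EXAMPLE_FLAG_NR; i++) {
		if (!(flags & (1U << i)))
			continue;
		if (buf[0])
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, example_flag_strs[i], len - strlen(buf) - 1);
	}
}
```

  Adding a flag then means adding a single `x(...)` line; the enum values and the printable names follow automatically.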
* bcachefs: prt_printf() now respects \r\n\t | Kent Overstreet | 2024-05-08 | 1 | -4/+2
  Signed-off-by: Kent Overstreet <[email protected]>
* bcachefs: Fix deadlock in journal replay | Kent Overstreet | 2024-04-14 | 1 | -3/+4
  btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly.
  Signed-off-by: Kent Overstreet <[email protected]>