aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/mmc/core/queue.c
diff options
context:
space:
mode:
authorDave Hansen <[email protected]>2025-05-08 22:41:32 +0000
committerLinus Torvalds <[email protected]>2025-05-09 15:00:31 +0000
commitfea4e317f9e7e1f449ce90dedc27a2d2a95bee5a (patch)
tree9b5284c25eb5122b00a6e8d822d39c16536c6309 /drivers/mmc/core/queue.c
parentMerge tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs (diff)
downloadkernel-fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a.tar.gz
kernel-fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a.zip
x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI. Long Version: === History === There were a few things leading up to this. First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again. === Problem === The too-aggressive code in should_flush_tlb() strikes in this window: // Turn on IPIs for this CPU/mm combination, but only // if should_flush_tlb() agrees: cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); load_new_mm_cr3(need_flush); // ^ After 'need_flush' is set to false, IPIs *MUST* // be sent to this CPU and not be ignored. this_cpu_write(cpu_tlbstate.loaded_mm, next); // ^ Not until this point does should_flush_tlb() // become true! should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops. === Solution === Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI. This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact. Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user. Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written. Signed-off-by: Dave Hansen <[email protected]> Acked-by: Rik van Riel <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected]/ [1] Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2] Reported-by: Stephen Dolan <[email protected]> Cc: [email protected] Acked-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Diffstat (limited to 'drivers/mmc/core/queue.c')
0 files changed, 0 insertions, 0 deletions