aboutsummaryrefslogtreecommitdiffstats
path: root/mm/debug_vm_pgtable.c
diff options
context:
space:
mode:
authorDavid Hildenbrand <[email protected]>2023-01-13 17:10:01 +0000
committerAndrew Morton <[email protected]>2023-02-03 06:33:05 +0000
commit2321ba3e3733f513e46e29b9c70512ecddbf1085 (patch)
tree2896bac7550dd008e674c8ffc733c97307ee1f38 /mm/debug_vm_pgtable.c
parentmm/khugepaged: convert release_pte_pages() to use folios (diff)
downloadkernel-2321ba3e3733f513e46e29b9c70512ecddbf1085.tar.gz
kernel-2321ba3e3733f513e46e29b9c70512ecddbf1085.zip
mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". This is the follow-up on [1]: [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all remaining architectures that support swap PTEs. This makes sure that exclusive anonymous pages will stay exclusive, even after they were swapped out -- for example, making GUP R/W FOLL_GET of anonymous pages reliable. Details can be found in [1]. This primarily fixes remaining known O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we can lose DMA reads to a page (modifying the user page by writing to it). To verify, there are two test cases (requiring swap space, obviously): (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries triggering a race condition. (2) My vmsplice() test case [3] that tries to detect if the exclusive marker was lost during swapout, not relying on a race condition. For example, on 32bit x86 (with and without PAE), my test case fails without these patches: $ ./test_swp_exclusive FAIL: page was replaced during COW But succeeds with these patches: $ ./test_swp_exclusive PASS: page was not replaced during COW Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even the ones where swap support might be in a questionable state? This is the first step towards removing "readable_exclusive" migration entries, and instead using pte_swp_exclusive() also with (readable) migration entries instead (as suggested by Peter). The only missing piece for that is supporting pmd_swp_exclusive() on relevant architectures with THP migration support. As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,, we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch. I tried cross-compiling all relevant setups and tested on x86 and sparc64 so far. CCing arch maintainers only on this cover letter and on the respective patch(es). [1] https://lkml.kernel.org/r/[email protected] [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c This patch (of 26): We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures. Let's extend our sanity checks, especially testing that our PTE bit does not affect: * is_swap_pte() -> pte_present() and pte_none() * the swap entry + type * pte_swp_soft_dirty() Especially, the pfn_pte() is dodgy when the swap PTE layout differs heavily from ordinary PTEs. Let's properly construct a swap PTE from swap type+offset. [[email protected]: fix build] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Brian Cain <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David S. Miller <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: H. Peter Anvin (Intel) <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Johannes Berg <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Xu <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xuerui Wang <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Diffstat (limited to 'mm/debug_vm_pgtable.c')
-rw-r--r--mm/debug_vm_pgtable.c25
1 files changed, 24 insertions, 1 deletions
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index bb3328f46126..ff8d6f6af896 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -811,13 +811,36 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) {
static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
{
#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
- pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
+ unsigned long max_swap_offset;
+ swp_entry_t entry, entry2;
+ pte_t pte;
pr_debug("Validating PTE swap exclusive\n");
+
+ /* See generic_max_swapfile_size(): probe the maximum offset */
+ max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL))));
+
+ /* Create a swp entry with all possible bits set */
+ entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+
+ pte = swp_entry_to_pte(entry);
+ WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_mkexclusive(pte);
WARN_ON(!pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ WARN_ON(pte_swp_soft_dirty(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_clear_exclusive(pte);
WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
#endif /* __HAVE_ARCH_PTE_SWP_EXCLUSIVE */
}