mempolicy: optimize queue_folios_pte_range by PTE batching - kernel

author	Dev Jain <[email protected]>	2025-04-16 05:30:48 +0000
committer	Andrew Morton <[email protected]>	2025-05-12 00:48:33 +0000
commit	4a34c584d8cd13d2b721d21cf629f77c60bfb4a4 (patch)
tree	cdb64a916e3e5a9c9f849494524f0ff29b4139fb /lib/test_vmalloc.c
parent	mm: move mmap/vma locking logic into specific files (diff)

mempolicy: optimize queue_folios_pte_range by PTE batching

After the check for queue_folio_required(), the code in the for loop only cares about the folio, i.e., the individual PTEs are redundant. Therefore, optimize this loop by skipping over a batch of PTEs that map the same folio.

With a test program that migrates the pages of the calling process, which includes a mapped VMA of size 4GB with pte-mapped large folios of order-9, and that migrates once back and forth between node-0 and node-1, the average execution time drops from 7.5 to 4 seconds, an approximately 47% reduction in execution time.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Dev Jain <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
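
The batching idea described in the commit message can be illustrated with a small user-space sketch. This is not the kernel code from the patch: struct sim_pte, sim_batch_len(), sim_walk_per_pte() and sim_walk_batched() are hypothetical names invented for illustration, and the "same folio" test is simplified to comparing an integer ID, whereas the real page-table walk has more conditions to check. The sketch only shows how the number of per-folio visits drops when the loop advances by a whole batch instead of by one entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: records only which folio it maps. */
struct sim_pte {
	int folio_id;	/* negative means "not present" (an assumption of this sketch) */
};

/* Count consecutive entries, starting at 'start', that map the same folio. */
static size_t sim_batch_len(const struct sim_pte *ptes, size_t start, size_t n)
{
	size_t len = 1;

	assert(start < n);
	while (start + len < n &&
	       ptes[start + len].folio_id == ptes[start].folio_id)
		len++;
	return len;
}

/* Baseline: do the per-folio work once for every present entry. */
static size_t sim_walk_per_pte(const struct sim_pte *ptes, size_t n)
{
	size_t visits = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].folio_id < 0)
			continue;	/* skip non-present entries */
		visits++;		/* per-folio work would happen here */
	}
	return visits;
}

/* Batched: do the per-folio work once, then skip the rest of the batch. */
static size_t sim_walk_batched(const struct sim_pte *ptes, size_t n)
{
	size_t visits = 0;
	size_t step;

	for (size_t i = 0; i < n; i += step) {
		step = 1;
		if (ptes[i].folio_id < 0)
			continue;	/* non-present: advance by one entry */
		visits++;		/* per-folio work would happen here */
		step = sim_batch_len(ptes, i, n);
	}
	return visits;
}
```

For an array in which four entries map one folio, one entry is not present, two entries map a second folio and one entry maps a third, sim_walk_per_pte() counts seven visits while sim_walk_batched() counts three. With order-9 folios, as in the test described above, a batch can cover up to 512 entries, which is where the saving comes from.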

Diffstat (limited to 'lib/test_vmalloc.c')

0 files changed, 0 insertions, 0 deletions

