| author | Dev Jain <[email protected]> | 2025-04-16 05:30:48 +0000 |
|---|---|---|
| committer | Andrew Morton <[email protected]> | 2025-05-12 00:48:33 +0000 |
| commit | 4a34c584d8cd13d2b721d21cf629f77c60bfb4a4 (patch) | |
| tree | cdb64a916e3e5a9c9f849494524f0ff29b4139fb /lib/test_vmalloc.c | |
| parent | mm: move mmap/vma locking logic into specific files (diff) | |
mempolicy: optimize queue_folios_pte_range by PTE batching
After the check for queue_folio_required(), the for loop only cares about
the folio, i.e., the remaining PTEs mapping that folio are redundant.
Therefore, optimize this loop by skipping over a PTE batch mapping the
same folio.
With a test program migrating pages of the calling process, which includes
a mapped VMA of size 4GB with pte-mapped large folios of order-9, and
migrating once back and forth between node-0 and node-1, the average
execution time drops from 7.5 to 4 seconds, an approximately 47% reduction.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Dev Jain <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
