path: root/mm/mm_init.c
...
| * mm/mm_init.c: print mem_init info after defer_init is done (Wei Yang, 2024-07-04, 1 file, -1/+3)
    Current call flow looks like this:

        start_kernel
          mm_core_init
            mem_init
            mem_init_print_info
          rest_init
            kernel_init
              kernel_init_freeable
                page_alloc_init_late
                  deferred_init_memmap

    With CONFIG_DEFERRED_STRUCT_PAGE_INIT, at the time mem_init_print_info()
    is called, pages are not yet fully initialized and freed to the buddy
    allocator. This has one issue:

      * nr_free_pages() only covers part of the free pages in the system,
        which is not what we expect.

    Let's print the mem info after defer_init is done. This would also help
    changing the totalram_pages accounting, since we plan to move that
    accounting into __free_pages_core().

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Wei Yang <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
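    A minimal sketch of the reordering described above; the exact call site
    and the surrounding code are assumptions for illustration, not taken
    from the patch:

        /* Print the summary from page_alloc_init_late(), after deferred
         * struct-page init has finished, instead of from mem_init(). */
        void __init page_alloc_init_late(void)
        {
        #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
                /* ... spawn and wait for deferred_init_memmap() workers ... */
        #endif
                /* now nr_free_pages() reflects every page freed to the buddy */
                mem_init_print_info();
        }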
| * mm/mm_init: use node's number of cpus in deferred_page_init_max_threads (Eric Chanudet, 2024-07-04, 1 file, -3/+2)
    x86_64 already uses the node's number of CPUs as the maximum thread
    count. Make that the default for all architectures setting
    DEFERRED_STRUCT_PAGE_INIT. This returns to the behavior prior to making
    the function arch-specific with commit ecd096506922 ("mm: make deferred
    init's max threads arch-specific").

    Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
    shows faster deferred_init_memmap completions:

        |         | x13s        | SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
        |         | Metal, 32GB | VM, 36GB     | VM, 58GB        | Metal, 128GB |
        |         | 8cpus       | 8cpus        | 8cpus           | 32cpus       |
        |---------|-------------|--------------|-----------------|--------------|
        | threads | ms (%)      | ms (%)       | ms (%)          | ms (%)       |
        |---------|-------------|--------------|-----------------|--------------|
        | 1       | 108 (0%)    | 72 (0%)      | 224 (0%)        | 324 (0%)     |
        | cpus    | 24 (-77%)   | 36 (-50%)    | 40 (-82%)       | 56 (-82%)    |

    Michael Ellerman reported:

    : On a machine here (1TB, 40 cores, 4KB pages) the existing code gives:
    :
    : [ 0.500124] node 2 deferred pages initialised in 210ms
    : [ 0.515790] node 3 deferred pages initialised in 230ms
    : [ 0.516061] node 0 deferred pages initialised in 230ms
    : [ 0.516522] node 7 deferred pages initialised in 230ms
    : [ 0.516672] node 4 deferred pages initialised in 230ms
    : [ 0.516798] node 6 deferred pages initialised in 230ms
    : [ 0.517051] node 5 deferred pages initialised in 230ms
    : [ 0.523887] node 1 deferred pages initialised in 240ms
    :
    : vs with the patch:
    :
    : [ 0.379613] node 0 deferred pages initialised in 90ms
    : [ 0.380388] node 1 deferred pages initialised in 90ms
    : [ 0.380540] node 4 deferred pages initialised in 100ms
    : [ 0.390239] node 6 deferred pages initialised in 100ms
    : [ 0.390249] node 2 deferred pages initialised in 100ms
    : [ 0.390786] node 3 deferred pages initialised in 110ms
    : [ 0.396721] node 5 deferred pages initialised in 110ms
    : [ 0.397095] node 7 deferred pages initialised in 110ms
    :
    : Which is a nice speedup.

    [[email protected]: v3]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Eric Chanudet <[email protected]>
    Tested-by: Michael Ellerman <[email protected]> (powerpc)
    Reviewed-by: Baoquan He <[email protected]>
    Acked-by: Alexander Gordeev <[email protected]>
    Acked-by: Mike Rapoport (IBM) <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Borislav Petkov (AMD) <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Nicholas Piggin <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
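    A sketch of the generic default this change installs; the signature is
    assumed from the description, not copied from the patch:

        /* Use the number of CPUs local to the node, with a floor of one. */
        static unsigned int __init
        deferred_page_init_max_threads(const struct cpumask *node_cpumask)
        {
                return max_t(unsigned int, cpumask_weight(node_cpumask), 1);
        }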
* | Merge tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock (Linus Torvalds, 2024-07-18, 1 file, -38/+31)

    Pull memblock updates from Mike Rapoport:

     - 'reserve_mem' command line parameter to allow creation of named
       memory reservations at boot time. The driving use-case is to improve
       the ability of pstore to retain ramoops data across reboots.

     - cleanups and small improvements in memblock and mm_init

     - new test cases in the memblock test suite

    * tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
      memblock tests: fix implicit declaration of function 'numa_valid_node'
      memblock: Move late alloc warning down to phys alloc
      pstore/ramoops: Add ramoops.mem_name= command line option
      mm/memblock: Add "reserve_mem" to reserved named memory at boot up
      mm/mm_init.c: don't initialize page->lru again
      mm/mm_init.c: not always search next deferred_init_pfn from very beginning
      mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition
      mm/mm_init.c: get the highest zone directly
      mm/mm_init.c: move nr_initialised reset down a bit
      mm/memblock: fix a typo in description of for_each_mem_region()
      mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn
      mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
      mm/memblock: return true directly on finding overlap region
      memblock tests: add memblock_overlaps_region_checks
      mm/memblock: fix comment for memblock_isolate_range()
      memblock tests: add memblock_reserve_many_may_conflict_check()
      memblock tests: add memblock_reserve_all_locations_check()
      mm/memblock: remove empty dummy entry
| * mm/mm_init.c: don't initialize page->lru again (Wei Yang, 2024-06-10, 1 file, -3/+0)
    Current page initialization call flow looks like this, with some
    simplification:

        setup_arch()
          paging_init()
            free_area_init()
              memmap_init()
                memmap_init_zone_range()
                  memmap_init_range()
                    defer_init()
                    __init_single_page()

        mm_core_init()
          mem_init()
            memblock_free_all()
              free_low_memory_core_early()
                memmap_init_reserved_pages()
                  reserve_bootmem_region()
                    init_reserved_page()
                      __init_single_page()

    There are two cases, depending on CONFIG_DEFERRED_STRUCT_PAGE_INIT:

      * With CONFIG_DEFERRED_STRUCT_PAGE_INIT, pages after first_init_pfn
        are skipped in defer_init(). init_reserved_page() is then defined
        to call __init_single_page() for them.

      * Without CONFIG_DEFERRED_STRUCT_PAGE_INIT, all pages are initialized
        by memmap_init_range().

    In both cases, after init_reserved_page(), we expect
    __init_single_page() to have done its work on the page, which already
    initializes page->lru properly. We don't need to do it again.

    Signed-off-by: Wei Yang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
| * mm/mm_init.c: not always search next deferred_init_pfn from very beginning (Wei Yang, 2024-06-06, 1 file, -9/+14)
    In deferred_init_memmap(), we call deferred_init_mem_pfn_range_in_zone()
    to get the next deferred_init_pfn, but we always search for it from the
    very beginning. Since we save the index in i, we can leverage it to
    start the next search from i.

    [rppt: refine the comment]
    Signed-off-by: Wei Yang <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
| * mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition (Wei Yang, 2024-06-06, 1 file, -11/+4)
    If deferred_init_mem_pfn_range_in_zone() returns true, we know it found
    some range in (spfn, epfn), so we can use its return value directly as
    the loop condition.

    Signed-off-by: Wei Yang <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
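    Roughly, the loop then becomes the following; helper signatures and the
    loop body are assumptions based on the surrounding mm_init.c code, not
    a quote of the patch:

        u64 i = 0;
        unsigned long spfn, epfn, nr_pages = 0;

        /* The lookup's return value is the loop condition: iterate for as
         * long as another deferred range exists in (spfn, epfn). */
        while (deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
                                                   first_init_pfn)) {
                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
                cond_resched();
        }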
| * mm/mm_init.c: get the highest zone directly (Wei Yang, 2024-06-06, 1 file, -8/+4)
    We have recorded nr_zones in pgdat, just get it directly.

    Signed-off-by: Wei Yang <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
| * mm/mm_init.c: move nr_initialised reset down a bit (Wei Yang, 2024-06-05, 1 file, -6/+8)
    We don't need to count nr_initialised in two cases:

      * for low zones that are always populated
      * after first_deferred_pfn is detected

    Let's move the nr_initialised reset down a bit to reduce some
    comparisons of prev_end_pfn and end_pfn.

    Signed-off-by: Wei Yang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
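    A condensed sketch of defer_init() with the reset moved below the two
    early-exit checks; the exact structure is reconstructed from the
    description and the known shape of this function, not quoted:

        static inline bool defer_init(int nid, unsigned long pfn,
                                      unsigned long end_pfn)
        {
                static unsigned long prev_end_pfn, nr_initialised;

                /* Always populate low zones for address-constrained allocations. */
                if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
                        return false;

                if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
                        return true;

                /* Only now, when counting actually matters, reset the counter. */
                if (prev_end_pfn != end_pfn) {
                        prev_end_pfn = end_pfn;
                        nr_initialised = 0;
                }

                nr_initialised++;
                if ((nr_initialised > PAGES_PER_SECTION) &&
                    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
                        NODE_DATA(nid)->first_deferred_pfn = pfn;
                        return true;
                }
                return false;
        }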
| * mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn (Wei Yang, 2024-06-05, 1 file, -1/+1)
    Just like what is done in the "if (mirrored_kernelcore)" branch, we
    should use memblock_region_memory_base_pfn() to get the startpfn.

    Signed-off-by: Wei Yang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
* | Revert "mm: init_mlocked_on_free_v3"David Hildenbrand2024-06-151-36/+7
    There was insufficient review and no agreement that this is the right
    approach. There are serious flaws with the implementation that make
    processes using mlock() not even work with simple fork() [1], and we
    get reliable crashes when rebooting.

    Further, simply because we might be unmapping a single PTE of a large
    mlocked folio, we shouldn't zero out the whole folio.

    ... especially because the code can also *corrupt* unrelated memory,
    because

        kernel_init_pages(page, folio_nr_pages(folio));

    could end up writing outside of the actual folio if we work with a
    tail page.

    Let's revert it. Once there is agreement that this is the right
    approach, the issues are fixed, and there has been reasonable review
    and proper testing, we can consider it again.

    [1] https://lkml.kernel.org/r/[email protected]

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3")
    Signed-off-by: David Hildenbrand <[email protected]>
    Reported-by: David Wang <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Reported-by: Lance Yang <[email protected]>
    Closes: https://lkml.kernel.org/r/[email protected]
    Acked-by: Lance Yang <[email protected]>
    Cc: York Jasper Niebuhr <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Kees Cook <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds, 2024-05-19, 1 file, -126/+90)

    Pull mm updates from Andrew Morton:
    "The usual shower of singleton fixes and minor series all over MM,
     documented (hopefully adequately) in the respective changelogs.
     Notable series include:

      - Lucas Stach has provided some page-mapping
        cleanup/consolidation/maintainability work in the series
        "mm/treewide: Remove pXd_huge() API".

      - In the series "Allow migrate on protnone reference with
        MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
        MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
        one test.

      - In their series "Memory allocation profiling" Kent Overstreet and
        Suren Baghdasaryan have contributed a means of determining (via
        /proc/allocinfo) whereabouts in the kernel memory is being
        allocated: number of calls and amount of memory.

      - Matthew Wilcox has provided the series "Various significant MM
        patches" which does a number of rather unrelated things, but in
        largely similar code sites.

      - In his series "mm: page_alloc: freelist migratetype hygiene"
        Johannes Weiner has fixed the page allocator's handling of
        migratetype requests, with resulting improvements in compaction
        efficiency.

      - In the series "make the hugetlb migration strategy consistent"
        Baolin Wang has fixed a hugetlb migration issue, which should
        improve hugetlb allocation reliability.

      - Liu Shixin has hit an I/O meltdown caused by readahead in a
        memory-tight memcg. Addressed in the series "Fix I/O high when
        memory almost met memcg limit".

      - In the series "mm/filemap: optimize folio adding and splitting"
        Kairui Song has optimized pagecache insertion, yielding ~10%
        performance improvement in one test.

      - Baoquan He has cleaned up and consolidated the early zone
        initialization code in the series "mm/mm_init.c: refactor
        free_area_init_core()".

      - Baoquan has also redone some MM initialization code in the series
        "mm/init: minor clean up and improvement".

      - MM helper cleanups from Christoph Hellwig in his series "remove
        follow_pfn".

      - More cleanups from Matthew Wilcox in the series "Various
        page->flags cleanups".

      - Vlastimil Babka has contributed maintainability improvements in
        the series "memcg_kmem hooks refactoring".

      - More folio conversions and cleanups in Matthew Wilcox's series:
        "Convert huge_zero_page to huge_zero_folio"
        "khugepaged folio conversions"
        "Remove page_idle and page_young wrappers"
        "Use folio APIs in procfs"
        "Clean up __folio_put()"
        "Some cleanups for memory-failure"
        "Remove page_mapping()"
        "More folio compat code removal"

      - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
        hugetlb functions to work on folios".

      - Code consolidation and cleanup work related to GUP's handling of
        hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

      - Rick Edgecombe has developed some fixes to stack guard gaps in the
        series "Cover a guard gap corner case".

      - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
        series "mm/ksm: fix ksm exec support for prctl".

      - Baolin Wang has implemented NUMA balancing for multi-size THPs.
        This is a simple first-cut implementation for now. The series is
        "support multi-size THP numa balancing".

      - Cleanups to vma handling helper functions from Matthew Wilcox in
        the series "Unify vma_address and vma_pgoff_address".

      - Some selftests maintenance work from Dev Jain in the series
        "selftests/mm: mremap_test: Optimizations and style fixes".

      - Improvements to the swapping of multi-size THPs from Ryan Roberts
        in the series "Swap-out mTHP without splitting".

      - Kefeng Wang has significantly optimized the handling of arm64's
        permission page faults in the series
        "arch/mm/fault: accelerate pagefault when badaccess"
        "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

      - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
        it GUP-fast".

      - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
        path to use struct vm_fault".

      - selftests build fixes from John Hubbard in the series "Fix
        selftests/mm build without requiring "make headers"".

      - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
        series "Improved Memory Tier Creation for CPUless NUMA Nodes".
        Fixes the initialization code so that migration between different
        memory types works as intended.

      - David Hildenbrand has improved follow_pte() and fixed an errant
        driver in the series "mm: follow_pte() improvements and acrn
        follow_pte() fixes".

      - David also did some cleanup work on large folio mapcounts in his
        series "mm: mapcount for large folios + page_mapcount() cleanups".

      - Folio conversions in KSM in Alex Shi's series "transfer page to
        folio in KSM".

      - Barry Song has added some sysfs stats for monitoring multi-size
        THP's in the series "mm: add per-order mTHP alloc and swpout
        counters".

      - Some zswap cleanups from Yosry Ahmed in the series "zswap
        same-filled and limit checking cleanups".

      - Matthew Wilcox has been looking at buffer_head code and found the
        documentation to be lacking. The series is "Improve buffer head
        documentation".

      - Multi-size THPs get more work, this time from Lance Yang. His
        series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
        optimizes the freeing of these things.

      - Kemeng Shi has added more userspace-visible writeback
        instrumentation in the series "Improve visibility of writeback".

      - Kemeng Shi then sent some maintenance work on top in the series
        "Fix and cleanups to page-writeback".

      - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
        the series "Improve anon_vma scalability for anon VMAs". Intel's
        test bot reported an improbable 3x improvement in one test.

      - SeongJae Park adds some DAMON feature work in the series
        "mm/damon: add a DAMOS filter type for page granularity access recheck"
        "selftests/damon: add DAMOS quota goal test"

      - Also some maintenance work in the series
        "mm/damon/paddr: simplify page level access re-check for pageout"
        "mm/damon: misc fixes and improvements"

      - David Hildenbrand has disabled some known-to-fail selftests in the
        series "selftests: mm: cow: flag vmsplice() hugetlb tests as
        XFAIL".

      - memcg metadata storage optimizations from Shakeel Butt in "memcg:
        reduce memory consumption by memcg stats".

      - DAX fixes and maintenance work from Vishal Verma in the series
        "dax/bus.c: Fixups for dax-bus locking""

    * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
      memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
      selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
      mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
      mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
      selftests: cgroup: add tests to verify the zswap writeback path
      mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
      mm/damon/core: fix return value from damos_wmark_metric_value
      mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
      selftests: cgroup: remove redundant enabling of memory controller
      Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
      Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
      Docs/mm/damon/design: use a list for supported filters
      Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
      Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
      selftests/damon: classify tests for functionalities and regressions
      selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
      selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
      selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
      mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
      selftests/damon: add a test for DAMOS quota goal
      ...
| * mm: init_mlocked_on_free_v3 (York Jasper Niebuhr, 2024-04-26, 1 file, -7/+36)
    Implements the "init_mlocked_on_free" boot option. When this boot
    option is enabled, any mlock'ed pages are zeroed on free. If the pages
    are munlock'ed beforehand, no initialization takes place.

    This boot option is meant to combat the performance hit of
    "init_on_free" as reported in commit 6471384af2a6 ("mm: security:
    introduce init_on_alloc=1 and init_on_free=1 boot options"). With
    "init_mlocked_on_free=1" only relevant data is freed while everything
    else is left untouched by the kernel. Correspondingly, this patch
    introduces no performance hit for unmapping non-mlock'ed memory. The
    unmapping overhead for purely mlocked memory was measured to be
    approximately 13%. Realistically, most systems mlock only a fraction
    of the total memory so the real-world system overhead should be close
    to zero.

    Optimally, userspace programs clear any key material or other
    confidential memory before exit and munlock the according memory
    regions. If a program crashes, userspace key managers fail to do this
    job. Accordingly, no munlock operations are performed so the data is
    caught and zeroed by the kernel. Should the program not crash, all
    memory will ideally be munlocked so no overhead is caused.

    CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON can be set to enable
    "init_mlocked_on_free" by default.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: York Jasper Niebuhr <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: York Jasper Niebuhr <[email protected]>
    Cc: Kees Cook <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
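    The described semantics, sketched; helper names are assumptions except
    for kernel_init_pages(), which the later revert above quotes (and
    notes can write past the folio when invoked on a tail page):

        /* In the unmap path: if the folio is still mlocked (i.e. it was
         * never munlock'ed), zero its pages before they are freed. */
        if (want_init_mlocked_on_free() && folio_test_mlocked(folio))
                kernel_init_pages(&folio->page, folio_nr_pages(folio));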
| * mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() (Baoquan He, 2024-04-26, 1 file, -5/+1)
    The noinline attribute has been taken off in commit 9420f89db2dd ("mm:
    move most of core MM initialization to mm/mm_init.c"). So remove the
    unneeded code comment above deferred_grow_zone(). And also remove the
    unneeded bracket in deferred_init_pages().

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: Mel Gorman <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm: make __absent_pages_in_range() as static (Baoquan He, 2024-04-26, 1 file, -1/+1)
    It's only called in mm/mm_init.c now.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: Mel Gorman <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/init: remove the unnecessary special treatment for memory-less node (Baoquan He, 2024-04-26, 1 file, -16/+11)
    Because a memory-less node's ->node_present_pages and its zones'
    ->present_pages are all 0, the check done before calling
    node_set_state() to set N_MEMORY, N_HIGH_MEMORY and N_NORMAL_MEMORY for
    a node is enough to skip memory-less nodes. The 'continue;' statement
    inside the for_each_node() loop of free_area_init() is gilding the
    lily.

    Here, remove the special handling to make memory-less nodes share the
    same code flow as normal nodes. Also rephrase the code comments above
    the 'continue' statement and move them above the line
    'if (pgdat->node_present_pages)'.

    [[email protected]: redo code comments, per Mike]
    Link: https://lkml.kernel.org/r/ZhYJAVQRYJSTKZng@MiWiFi-R3L-srv
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Cc: Mel Gorman <[email protected]>
    Cc: "Mike Rapoport (IBM)" <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
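    After the change, the loop in free_area_init() treats every node the
    same; a condensed sketch (details assumed, not quoted from the patch):

        for_each_node(nid) {
                pg_data_t *pgdat = NODE_DATA(nid);

                free_area_init_node(nid);

                /* A memory-less node has node_present_pages == 0, so this
                 * check alone is enough to skip the N_MEMORY states. */
                if (pgdat->node_present_pages)
                        node_set_state(nid, N_MEMORY);
                check_for_memory(pgdat);
        }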
| * mm/mm_init.c: remove arch_reserved_kernel_pages() (Baoquan He, 2024-04-26, 1 file, -12/+0)
    Since the current calculation in calc_nr_kernel_pages() already takes
    kernel-reserved memory into account, there is no need for
    arch_reserved_kernel_pages() any more.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: remove unneeded calc_memmap_size() (Baoquan He, 2024-04-26, 1 file, -20/+0)
    Nobody calls calc_memmap_size() now.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: remove meaningless calculation of zone->managed_pages in free_area_init_core() (Baoquan He, 2024-04-26, 1 file, -41/+5)

    Currently, in free_area_init_core(), when initializing a zone's fields,
    a rough value is set for zone->managed_pages. That value is calculated
    as (zone->present_pages - memmap_pages). In the meantime, the value is
    added to nr_all_pages and nr_kernel_pages, which represent all free
    pages of the system (only low memory, or including HIGHMEM memory,
    respectively). Both of them are going to be used in
    alloc_large_system_hash().

    However, the rough calculation and setting of zone->managed_pages is
    meaningless because

      a) memmap pages are allocated in units of node in sparse_init() or
         alloc_node_mem_map(pgdat); the simple
         (zone->present_pages - memmap_pages) is too rough to make sense
         for a zone;

      b) the set zone->managed_pages will be zeroed out and reset with the
         actual value in mem_init() via memblock_free_all(). Before the
         resetting, no buddy allocation request is issued.

    Here, remove the meaningless and complicated calculation of
    (zone->present_pages - memmap_pages) and directly set
    zone->managed_pages to zone->present_pages for now. It will be
    adjusted in mem_init().

    Also remove the assignment of nr_all_pages and nr_kernel_pages in
    free_area_init_core(). Instead, call the newly added
    calc_nr_kernel_pages() to count up all free but not reserved memory in
    memblock and assign the results to nr_all_pages and nr_kernel_pages.
    The counting excludes memmap_pages and other kernel-used data, which
    is more accurate than the old way, simpler, and can also cover the
    arch_reserved_kernel_pages() case required by ppc.

    Also clean up the outdated code comment above free_area_init_core();
    free_area_init_core() is easy to understand now, no need to add words
    to explain.

    [[email protected]: initialize zone->managed_pages as zone->present_pages for now]
    Link: https://lkml.kernel.org/r/ZgU0bsJ2FEjykvju@MiWiFi-R3L-srv
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
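    The net effect on free_area_init_core(), sketched (the exact helper
    calls and their order are assumptions based on the description):

        int nid = pgdat->node_id;
        enum zone_type j;

        for (j = 0; j < MAX_NR_ZONES; j++) {
                struct zone *zone = pgdat->node_zones + j;
                unsigned long size = zone->spanned_pages;

                /* Seed managed_pages with present_pages for now; it is
                 * recomputed in mem_init() via memblock_free_all(). */
                zone_init_internals(zone, j, nid, zone->present_pages);
                if (!size)
                        continue;

                /* no memmap_pages estimate, no nr_all_pages/nr_kernel_pages
                 * accounting here any more */
                setup_usemap(zone);
                init_currently_empty_zone(zone, zone->zone_start_pfn, size);
        }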
| * mm/mm_init.c: add new function calc_nr_all_pages() (Baoquan He, 2024-04-26, 1 file, -0/+24)
    This is a preparation for calculating nr_kernel_pages and nr_all_pages,
    both of which will be used later in alloc_large_system_hash().

    nr_all_pages counts up all free but not reserved memory in the memblock
    allocator, including HIGHMEM memory. nr_kernel_pages counts up all free
    but not reserved low memory in the memblock allocator, excluding
    HIGHMEM memory.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
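    A sketch of such a counting pass over memblock; the HIGHMEM clamp via
    max_low_pfn is an assumption for illustration:

        static unsigned long nr_all_pages, nr_kernel_pages;

        static void __init calc_nr_kernel_pages(void)
        {
                unsigned long start_pfn, end_pfn;
                phys_addr_t start_addr, end_addr;
                u64 u;

                /* Walk every free (i.e. not reserved) memblock range. */
                for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE,
                                        &start_addr, &end_addr, NULL) {
                        start_pfn = PFN_UP(start_addr);
                        end_pfn = PFN_DOWN(end_addr);

                        if (start_pfn >= end_pfn)
                                continue;
                        nr_all_pages += end_pfn - start_pfn;
        #ifdef CONFIG_HIGHMEM
                        /* nr_kernel_pages excludes HIGHMEM */
                        start_pfn = min(start_pfn, max_low_pfn);
                        end_pfn = min(end_pfn, max_low_pfn);
        #endif
                        nr_kernel_pages += end_pfn - start_pfn;
                }
        }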
| * mm/mm_init.c: remove the useless dma_reserve (Baoquan He, 2024-04-26, 1 file, -23/+0)
    Now nobody calls set_dma_reserve() to set a value for dma_reserve, so
    remove set_dma_reserve(), the global variable dma_reserve, and the
    code using it.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Baoquan He <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * codetag: debug: mark codetags for reserved pages as empty (Suren Baghdasaryan, 2024-04-26, 1 file, -1/+11)
    To avoid debug warnings while freeing reserved pages which were not
    allocated with the usual allocators, mark their codetags as empty
    before freeing.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Suren Baghdasaryan <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Tested-by: Kees Cook <[email protected]>
    Cc: Alexander Viro <[email protected]>
    Cc: Alex Gaynor <[email protected]>
    Cc: Alice Ryhl <[email protected]>
    Cc: Andreas Hindborg <[email protected]>
    Cc: Benno Lossin <[email protected]>
    Cc: "Björn Roy Baron" <[email protected]>
    Cc: Boqun Feng <[email protected]>
    Cc: Christoph Lameter <[email protected]>
    Cc: Dennis Zhou <[email protected]>
    Cc: Gary Guo <[email protected]>
    Cc: Kent Overstreet <[email protected]>
    Cc: Miguel Ojeda <[email protected]>
    Cc: Pasha Tatashin <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Tejun Heo <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: Wedson Almeida Filho <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
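    Roughly what "mark as empty before freeing" looks like for a reserved
    page; the tag-reference API usage is an assumption based on the
    allocation-tagging series this builds on:

        /* Reserved pages were never allocated through the instrumented
         * allocators, so their tag reference is legitimately unset. */
        if (mem_alloc_profiling_enabled()) {
                union codetag_ref *ref = get_page_tag_ref(page);

                if (ref) {
                        set_codetag_empty(ref);
                        put_page_tag_ref(ref);
                }
        }
        __free_pages_core(page, order);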
| * lib: introduce support for page allocation tagging (Suren Baghdasaryan, 2024-04-26, 1 file, -0/+1)
    Introduce helper functions to easily instrument page allocators by
    storing a pointer to the allocation tag associated with the code that
    allocated the page in a page_ext field.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Suren Baghdasaryan <[email protected]>
    Co-developed-by: Kent Overstreet <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Tested-by: Kees Cook <[email protected]>
    Cc: Alexander Viro <[email protected]>
    Cc: Alex Gaynor <[email protected]>
    Cc: Alice Ryhl <[email protected]>
    Cc: Andreas Hindborg <[email protected]>
    Cc: Benno Lossin <[email protected]>
    Cc: "Björn Roy Baron" <[email protected]>
    Cc: Boqun Feng <[email protected]>
    Cc: Christoph Lameter <[email protected]>
    Cc: Dennis Zhou <[email protected]>
    Cc: Gary Guo <[email protected]>
    Cc: Miguel Ojeda <[email protected]>
    Cc: Pasha Tatashin <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Tejun Heo <[email protected]>
    Cc: Wedson Almeida Filho <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* | mm/execmem, arch: convert simple overrides of module_alloc to execmem (Mike Rapoport (IBM), 2024-05-14, 1 file, -0/+2)
    Several architectures override module_alloc() only to define the
    address range for code allocations differently from the VMALLOC
    address space.

    Provide a generic implementation in execmem that uses the parameters
    for address space ranges, required alignment and page protections
    provided by the architectures.

    The architectures must fill an execmem_info structure and implement
    execmem_arch_setup() that returns a pointer to that structure. This
    way the execmem initialization won't be called from every
    architecture, but rather from a central place, namely a core_initcall()
    in execmem.

    execmem provides an execmem_alloc() API that wraps
    __vmalloc_node_range() with the parameters defined by the
    architectures. If an architecture does not implement
    execmem_arch_setup(), execmem_alloc() will fall back to module_alloc().

    Signed-off-by: Mike Rapoport (IBM) <[email protected]>
    Acked-by: Song Liu <[email protected]>
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Luis Chamberlain <[email protected]>
* padata: dispatch works on different nodes (Gang Li, 2024-03-06, 1 file, -0/+1)

    Date: Thu, 22 Feb 2024 22:04:17 +0800

    When a group of tasks that access different nodes are scheduled on the
    same node, they may encounter bandwidth bottlenecks and access latency.

    Thus, a numa_aware flag is introduced here, allowing tasks to be
    distributed across different nodes to fully utilize the advantage of
    multi-node systems.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Gang Li <[email protected]>
    Tested-by: David Rientjes <[email protected]>
    Reviewed-by: Muchun Song <[email protected]>
    Reviewed-by: Tim Chen <[email protected]>
    Cc: Alexey Dobriyan <[email protected]>
    Cc: Daniel Jordan <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Jane Chu <[email protected]>
    Cc: Mike Kravetz <[email protected]>
    Cc: Paul E. McKenney <[email protected]>
    Cc: Randy Dunlap <[email protected]>
    Cc: Steffen Klassert <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
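    On the mm_init.c side this amounts to one field in deferred_init_memmap()'s
    job description; sketched below, with the surrounding field values
    assumed for illustration:

        struct padata_mt_job job = {
                .thread_fn   = deferred_init_memmap_chunk,
                .fn_arg      = zone,
                .start       = spfn,
                .size        = epfn_align - spfn,
                .align       = PAGES_PER_SECTION,
                .min_chunk   = PAGES_PER_SECTION,
                .max_threads = max_threads,
                .numa_aware  = false,   /* deferred init is already per-node */
        };

        padata_do_multithreaded(&job);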
* efi: disable mirror feature during crashkernel (Ma Wupeng, 2024-01-12, 1 file, -0/+6)
    If the system has no mirrored memory, or uses crashkernel.high while
    kernelcore=mirror is enabled on the command line, then the crash
    kernel will have limited mirrored memory available, which usually
    leads to OOM.

    To solve this problem, disable the mirror feature during crashkernel.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ma Wupeng <[email protected]>
    Acked-by: Mike Rapoport (IBM) <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
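    A sketch of the check this adds; the message text and placement inside
    the kernelcore=mirror handling are assumptions:

        /* A kdump kernel usually has little or no mirrored memory, so
         * ignore the option there. */
        if (is_kdump_kernel()) {
                pr_warn("The system is under kdump, ignore kernelcore=mirror.\n");
                goto out;
        }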
* mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER (Kirill A. Shutemov, 2024-01-08, 1 file, -11/+11)
    commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
    changed the definition of MAX_ORDER to be inclusive. This has caused
    issues with code that was not yet upstream and depended on the
    previous definition.

    To draw attention to the altered meaning of the define, rename
    MAX_ORDER to MAX_PAGE_ORDER.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kirill A. Shutemov <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm: remove unnecessary ia64 code and comment (Kefeng Wang, 2023-12-29, 1 file, -29/+19)
    IA64 has gone with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
    architecture"); remove the now-unnecessary ia64-specific mm code and
    comments too.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kefeng Wang <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: append newline to the unavailable ranges log-message (Serge Semin, 2023-12-11, 1 file, -1/+1)
    Based on the init_unavailable_range() method and its callee semantics,
    no multi-line info messages are intended to be printed to the console.
    Thus, append the '\n' symbol to the respective info string.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Serge Semin <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: extend init unavailable range doc info (Serge Semin, 2023-12-11, 1 file, -0/+1)
    Besides the already described reasons, memory holes backed by struct
    pages might persist because memory-mapped I/O spaces sit behind those
    ranges under a flatmem kernel config. Add such a note to the
    init_unavailable_range() method kdoc in order to point out one more
    reason for having the function executed for such regions.

    [[email protected]: update per Mike]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Serge Semin <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO (Usama Arif, 2023-10-04, 1 file, -1/+1)
    The new boot flow when it comes to initialization of gigantic pages is
    as follows:

      - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
        the region after the first struct page is marked as noinit.

      - This results in only the first struct page being initialized in
        reserve_bootmem_region. As the tail struct pages are not
        initialized at this point, there can be a significant saving in
        boot time if HVO succeeds later on.

      - Later on in the boot, the head page is prepped and the first
        HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct
        pages are initialized.

      - HVO is attempted. If it is not successful, then the rest of the
        tail struct pages are initialized. If it is successful, no more
        tail struct pages need to be initialized, saving significant boot
        time.

    The WARN_ON for increased ref count in gather_bootmem_prealloc was
    changed to a VM_BUG_ON. This is OK as there should be no speculative
    references this early in the boot process. The VM_BUG_ONs are there
    just in case such code is introduced.

    [[email protected]: make it nicer for 80 cols]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Usama Arif <[email protected]>
    Reviewed-by: Muchun Song <[email protected]>
    Reviewed-by: Mike Kravetz <[email protected]>
    Cc: Fam Zheng <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Cc: Punit Agrawal <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
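    The key boot-time step, sketched from the description; the exact
    expression (m being the bootmem huge page record, h the hstate) is an
    assumption:

        /* In __alloc_bootmem_huge_page(): only the first struct page of
         * the gigantic page's reservation gets initialized at boot; mark
         * the rest noinit so reserve_bootmem_region() skips it. */
        memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
                                      huge_page_size(h) - PAGE_SIZE);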
* mm/mm_init.c: remove redundant pr_info when node is memoryless (Yajun Deng, 2023-10-04, 1 file, -2/+0)
    There is a similar pr_info in free_area_init_node(), so remove the
    redundant pr_info.

    before:
    [ 0.006314] Initializing node 0 as memoryless
    [ 0.006445] Initmem setup node 0 as memoryless
    [ 0.006450] Initmem setup node 1 [mem 0x0000000000001000-0x000000003fffffff]
    [ 0.006453] Initmem setup node 2 [mem 0x0000000040000000-0x000000007ffd7fff]
    [ 0.006454] Initializing node 3 as memoryless
    [ 0.006584] Initmem setup node 3 as memoryless
    [ 0.006585] Initmem setup node 4 [mem 0x0000000100000000-0x00000001bfffffff]
    [ 0.006586] Initmem setup node 5 [mem 0x00000001c0000000-0x00000001ffffffff]
    [ 0.006587] Initmem setup node 6 [mem 0x0000000200000000-0x000000023fffffff]

    after:
    [ 0.004147] Initmem setup node 0 as memoryless
    [ 0.004148] Initmem setup node 1 [mem 0x0000000000001000-0x000000003fffffff]
    [ 0.004150] Initmem setup node 2 [mem 0x0000000040000000-0x000000007ffd7fff]
    [ 0.004154] Initmem setup node 3 as memoryless
    [ 0.004155] Initmem setup node 4 [mem 0x0000000100000000-0x00000001bfffffff]
    [ 0.004156] Initmem setup node 5 [mem 0x00000001c0000000-0x00000001ffffffff]
    [ 0.004157] Initmem setup node 6 [mem 0x0000000200000000-0x000000023fffffff]

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Yajun Deng <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE (Miaohe Lin, 2023-08-21, 1 file, -3/+3)
    It's more readable to use the helper macros BITS_PER_LONG and
    BITS_PER_BYTE. No functional change intended.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Miaohe Lin <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm: disable kernelcore=mirror when no mirror memory (Ma Wupeng, 2023-08-21, 1 file, -0/+5)
    For a system with kernelcore=mirror enabled while no mirrored memory
    is reported by EFI, kernel OOM can occur during startup, since all
    memory besides zone DMA is placed in the movable zone and this
    prevents the kernel from using it.

    Zone DMA/DMA32 initialization is independent of mirrored memory and
    their max pfn is set in zone_sizes_init(). Since the kernel can fall
    back to zone DMA/DMA32 if there is no memory in zone Normal, these
    zones are treated as mirrored memory no matter what their memory
    attributes are.

    To solve this problem, disable kernelcore=mirror when no real mirrored
    memory exists.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ma Wupeng <[email protected]>
    Suggested-by: Kefeng Wang <[email protected]>
    Suggested-by: Mike Rapoport <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Reviewed-by: Kefeng Wang <[email protected]>
    Cc: Levi Yun <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
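    A sketch of the corresponding check; memblock_has_mirror() is the
    memblock-side helper this relies on, and the placement and message
    text are assumptions:

        /* In the kernelcore=mirror handling during zone setup. */
        if (mirrored_kernelcore && !memblock_has_mirror()) {
                pr_warn("The system has no mirror memory, ignore kernelcore=mirror.\n");
                goto out;
        }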
* mm: no need to export mm_kobj (Greg Kroah-Hartman, 2023-08-21, 1 file, -1/+0)
    There are no modules using mm_kobj, so do not export it.

    Link: https://lkml.kernel.org/r/2023080436-algebra-cabana-417d@gregkh
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Reviewed-by: Miaohe Lin <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/vmemmap: improve vmemmap_can_optimize and allow architectures to override (Aneesh Kumar K.V, 2023-08-18, 1 file, -1/+1)
    dax vmemmap optimization requires a minimum of 2 PAGE_SIZE areas
    within vmemmap such that the tail page mapping can point to the second
    PAGE_SIZE area. Enforce that in the vmemmap_can_optimize() function.

    Architectures like powerpc also want to enable vmemmap optimization
    conditionally (only with radix MMU translation). Hence allow an
    architecture override.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Aneesh Kumar K.V <[email protected]>
    Reviewed-by: Christophe Leroy <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Joao Martins <[email protected]>
    Cc: Michael Ellerman <[email protected]>
    Cc: Mike Kravetz <[email protected]>
    Cc: Muchun Song <[email protected]>
    Cc: Nicholas Piggin <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Cc: Will Deacon <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm: kfence: allocate kfence_metadata at runtime (Peng Zhang, 2023-08-18, 1 file, -1/+1)
    kfence_metadata is currently a static array. For the purpose of
    allocating a scalable __kfence_pool, we first change it to runtime
    allocation of metadata. Since the size of an object of kfence_metadata
    is 1160 bytes, we can save at least 72 pages (with the default 256
    objects) without enabling kfence.

    [[email protected]: restore newline, per Marco]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Peng Zhang <[email protected]>
    Reviewed-by: Marco Elver <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Muchun Song <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: drop node_start_pfn from adjust_zone_range_for_zone_movable() (Haifeng Xu, 2023-08-18, 1 file, -4/+2)
    node_start_pfn is not used in adjust_zone_range_for_zone_movable(), so
    it is pointless to waste a function argument. Drop the parameter.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Reviewed-by: Anshuman Khandual <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: mark check_for_memory() as __init (Haifeng Xu, 2023-08-18, 1 file, -1/+1)
    The only caller of check_for_memory() is free_area_init(), which is
    annotated with __init, so it should be safe to also mark the former as
    __init.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Reviewed-by: Anshuman Khandual <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: remove obsolete macro HASH_SMALL (Miaohe Lin, 2023-08-18, 1 file, -9/+1)
    HASH_SMALL only works when the parameter numentries is 0. But the sole
    caller, futex_init(), never calls alloc_large_system_hash() with
    numentries set to 0. So HASH_SMALL is obsolete; remove it.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Miaohe Lin <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: André Almeida <[email protected]>
    Cc: Darren Hart <[email protected]>
    Cc: Davidlohr Bueso <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Miaohe Lin <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* mm/mm_init.c: update obsolete comment in get_pfn_range_for_nid() (Miaohe Lin, 2023-08-18, 1 file, -2/+1)
    Since commit 633c0666b5a5 ("Memoryless nodes: drop one memoryless node
    boot warning"), the warning for a node with no available memory has
    been removed. Update the corresponding comment.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Miaohe Lin <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
* Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds, 2023-06-28, 1 file, -52/+102)

    Pull mm updates from Andrew Morton:

     - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

     - Yosry has also eliminated cgroup's atomic rstat flushing

     - Nhat Pham adds the new cachestat() syscall. It provides userspace
       with the ability to query pagecache status - a similar concept to
       mincore() but more powerful and with improved usability

     - Mel Gorman provides more optimizations for compaction, reducing the
       prevalence of page rescanning

     - Lorenzo Stoakes has done some maintenance work on the
       get_user_pages() interface

     - Liam Howlett continues with cleanups and maintenance work to the
       maple tree code. Peng Zhang also does some work on maple tree

     - Johannes Weiner has done some cleanup work on the compaction code

     - David Hildenbrand has contributed additional selftests for
       get_user_pages()

     - Thomas Gleixner has contributed some maintenance and optimization
       work for the vmalloc code

     - Baolin Wang has provided some compaction cleanups,

     - SeongJae Park continues maintenance work on the DAMON code

     - Huang Ying has done some maintenance on the swap code's usage of
       device refcounting

     - Christoph Hellwig has some cleanups for the filemap/directio code

     - Ryan Roberts provides two patch series which yield some
       rationalization of the kernel's access to pte entries - use the
       provided APIs rather than open-coding accesses

     - Lorenzo Stoakes has some fixes to the interaction between pagecache
       and directio access to file mappings

     - John Hubbard has a series of fixes to the MM selftesting code

     - ZhangPeng continues the folio conversion campaign

     - Hugh Dickins has been working on the pagetable handling code, mainly
       with a view to reducing the load on the mmap_lock

     - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
       from 128 to 8

     - Domenico Cerasuolo has improved the zswap reclaim mechanism by
       reorganizing the LRU management

     - Matthew Wilcox provides some fixups to make gfs2 work better with
       the buffer_head code

     - Vishal Moola also has done some folio conversion work

     - Matthew Wilcox has removed the remnants of the pagevec code - their
       functionality is migrated over to struct folio_batch

    * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
      mm/hugetlb: remove hugetlb_set_page_subpool()
      mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
      hugetlb: revert use of page_cache_next_miss()
      Revert "page cache: fix page_cache_next/prev_miss off by one"
      mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
      mm: memcg: rename and document global_reclaim()
      mm: kill [add|del]_page_to_lru_list()
      mm: compaction: convert to use a folio in isolate_migratepages_block()
      mm: zswap: fix double invalidate with exclusive loads
      mm: remove unnecessary pagevec includes
      mm: remove references to pagevec
      mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
      mm: remove struct pagevec
      net: convert sunrpc from pagevec to folio_batch
      i915: convert i915_gpu_error to use a folio_batch
      pagevec: rename fbatch_count()
      mm: remove check_move_unevictable_pages()
      drm: convert drm_gem_put_pages() to use a folio_batch
      i915: convert shmem_sg_free_table() to use a folio_batch
      scatterlist: add sg_set_folio()
      ...
| * mm: pass nid to reserve_bootmem_region() (Yajun Deng, 2023-06-23, 1 file, -13/+17)
    early_pfn_to_nid() is called frequently in init_reserved_page(); it
    returns the node id of the PFN. These PFNs are probably from the same
    memory region, so they have the same node id. It's not necessary to
    call early_pfn_to_nid() for each PFN.

    Pass nid to reserve_bootmem_region() and drop the call to
    early_pfn_to_nid() in init_reserved_page(). Also, set nid on all
    reserved pages before doing this, as some reserved memory regions may
    not have the nid set.

    The function that benefits most is memmap_init_reserved_pages(), if
    CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.

    The following data was tested on an x86 machine with 190GB of RAM.

        before: memmap_init_reserved_pages()  67ms
        after:  memmap_init_reserved_pages()  20ms

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Yajun Deng <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
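    Sketched, the reserved-pages walk now resolves the nid once per region
    rather than once per PFN; condensed, with the fallback handling
    assumed:

        static void __init memmap_init_reserved_pages(void)
        {
                struct memblock_region *region;
                phys_addr_t start, end;
                int nid;

                for_each_reserved_mem_region(region) {
                        nid = memblock_get_region_node(region);
                        /* Regions without a nid set fall back to a single
                         * lookup, not one early_pfn_to_nid() per PFN. */
                        if (nid == NUMA_NO_NODE)
                                nid = early_pfn_to_nid(PFN_DOWN(region->base));

                        start = region->base;
                        end = start + region->size;
                        reserve_bootmem_region(start, end, nid);
                }
        }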
| * mm/mm_init.c: remove reset_node_present_pages() (Haifeng Xu, 2023-06-19, 1 file, -2/+13)
    reset_node_present_pages() only gets called in hotadd_init_pgdat().
    Move the action that clears present pages into
    free_area_init_core_hotplug(), so the helper can be removed.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Suggested-by: David Hildenbrand <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: drop 'nid' parameter from check_for_memory() (Haifeng Xu, 2023-06-19, 1 file, -4/+4)
    The node_id in pgdat has already been set in free_area_init_node(), so
    use it internally instead of passing a redundant parameter.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Acked-by: Michal Hocko <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: move set_pageblock_order() to free_area_init() (Haifeng Xu, 2023-06-09, 1 file, -1/+2)
    pageblock_order only needs to be set once; there is no need to
    initialize it in every zone/node.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: remove free_area_init_memoryless_node() (Haifeng Xu, 2023-06-09, 1 file, -6/+1)
    free_area_init_memoryless_node() is just a wrapper of
    free_area_init_node(); remove it to clean up.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Acked-by: Michal Hocko <[email protected]>
    Cc: Mike Rapoport <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: do not calculate zone_start_pfn/zone_end_pfn in zone_absent_pages_in_node() (Haifeng Xu, 2023-06-09, 1 file, -18/+12)

    In calculate_node_totalpages(), zone_start_pfn/zone_end_pfn are already
    calculated in zone_spanned_pages_in_node(), so use them as parameters
    instead of node_start_pfn/node_end_pfn, and the duplicated calculation
    can be dropped.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Suggested-by: Mike Rapoport <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Haifeng Xu <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm/mm_init.c: introduce reset_memoryless_node_totalpages() (Haifeng Xu, 2023-06-09, 1 file, -9/+22)
    Currently, no matter whether a node actually has memory or not,
    calculate_node_totalpages() is used to account the number of pages in
    the zone/node. However, for a node without memory, these unnecessary
    calculations can be skipped. All the zone/node page counts can be set
    to 0 directly.

    So introduce reset_memoryless_node_totalpages() to perform this
    action. Furthermore, calculate_node_totalpages() only gets called for
    nodes with memory.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Haifeng Xu <[email protected]>
    Suggested-by: Mike Rapoport <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
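    A sketch of the new helper; the exact field list being zeroed is an
    assumption based on the description:

        static void __init
        reset_memoryless_node_totalpages(struct pglist_data *pgdat)
        {
                struct zone *z;

                for (z = pgdat->node_zones;
                     z < pgdat->node_zones + MAX_NR_ZONES; z++) {
                        z->zone_start_pfn = 0;
                        z->spanned_pages = 0;
                        z->present_pages = 0;
                }

                pgdat->node_spanned_pages = 0;
                pgdat->node_present_pages = 0;
                pr_debug("On node %d totalpages: 0\n", pgdat->node_id);
        }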
| * mm: page_alloc: move sysctls into it own fils (Kefeng Wang, 2023-06-09, 1 file, -0/+2)
    This moves all page-alloc-related sysctls to their own file, as part
    of the kernel/sysctl.c spring cleaning; also move some function
    declarations from mm.h into internal.h.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kefeng Wang <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: "Huang, Ying" <[email protected]>
    Cc: Iurii Zaikin <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Len Brown <[email protected]>
    Cc: Luis Chamberlain <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Cc: Pavel Machek <[email protected]>
    Cc: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
| * mm: page_alloc: move set_zone_contiguous() into mm_init.c (Kefeng Wang, 2023-06-09, 1 file, -0/+22)
    set_zone_contiguous() is only used in mm init/hotplug, and
    clear_zone_contiguous() is only used in hotplug; move them from
    page_alloc.c to the more appropriate file.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Kefeng Wang <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: "Huang, Ying" <[email protected]>
    Cc: Iurii Zaikin <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Len Brown <[email protected]>
    Cc: Luis Chamberlain <[email protected]>
    Cc: Mike Rapoport (IBM) <[email protected]>
    Cc: Oscar Salvador <[email protected]>
    Cc: Pavel Machek <[email protected]>
    Cc: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>