kernel - saturneric's kernel source tree

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	net: page_pool: report when page pool was destroyed	Jakub Kicinski	2023-11-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Report when page pool was destroyed. Together with the inflight / memory use reporting this can serve as a replacement for the warning about leaked page pools we currently print to dmesg. Example output for a fake leaked page pool using some hacks in netdevsim (one "live" pool, and one "leaked" on the same dev): $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-get [{'id': 2, 'ifindex': 3}, {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}] Tested-by: Dragos Tatulea <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
*	net: page_pool: report amount of memory held by page pools	Jakub Kicinski	2023-11-28	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	Advanced deployments need the ability to check memory use of various system components. It makes it possible to make informed decisions about memory allocation and to find regressions and leaks. Report memory use of page pools. Report both number of references and bytes held. Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
*	net: page_pool: id the page pools	Jakub Kicinski	2023-11-28	1	-5/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	To give ourselves the flexibility of creating netlink commands and ability to refer to page pool instances in uAPIs create IDs for page pools. Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
*	net: page_pool: factor out uninit	Jakub Kicinski	2023-11-28	1	-8/+13
\| \| \| \| \| \| \| \| \| \| \| \|	We'll soon (next change in the series) need a fuller unwind path in page_pool_create() so create the inverse of page_pool_init(). Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
*	net: page_pool: avoid touching slow on the fastpath	Jakub Kicinski	2023-11-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To fully benefit from previous commit add one byte of state in the first cache line recording if we need to look at the slow part. The packing isn't all that impressive right now, we create a 7B hole. I'm expecting Olek's rework will reshuffle this, anyway. Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: split the page_pool_params into fast and slow	Jakub Kicinski	2023-11-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct page_pool is rather performance critical and we use 16B of the first cache line to store 2 pointers used only by test code. Future patches will add more informational (non-fast path) attributes. It's convenient for the user of the API to not have to worry which fields are fast and which are slow path. Use struct groups to split the params into the two categories internally. Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: add missing free_percpu when page_pool_init fail	Jian Shen	2023-11-02	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ptr_ring_init() returns failure in page_pool_init(), free_percpu() is not called to free pool->recycle_stats, which may cause memory leak. Fixes: ad6fa1e1ab1b ("page_pool: Add recycle stats") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Reviewed-by: Yunsheng Lin <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Somnath Kotur <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
*	page_pool: introduce page_pool_alloc() API	Yunsheng Lin	2023-10-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently page pool supports the below use cases: use case 1: allocate page without page splitting using page_pool_alloc_pages() API if the driver knows that the memory it need is always bigger than half of the page allocated from page pool. use case 2: allocate page frag with page splitting using page_pool_alloc_frag() API if the driver knows that the memory it need is always smaller than or equal to the half of the page allocated from page pool. There is emerging use case [1] & [2] that is a mix of the above two case: the driver doesn't know the size of memory it need beforehand, so the driver may use something like below to allocate memory with least memory utilization and performance penalty: if (size << 1 > max_size) page = page_pool_alloc_pages(); else page = page_pool_alloc_frag(); To avoid the driver doing something like above, add the page_pool_alloc() API to support the above use case, and update the true size of memory that is acctually allocated by updating '*size' back to the driver in order to avoid exacerbating truesize underestimate problem. Rename page_pool_free() which is used in the destroy process to __page_pool_destroy() to avoid confusion with the newly added API. 1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/ 2. https://lore.kernel.org/all/[email protected]/ Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: remove PP_FLAG_PAGE_FRAG	Yunsheng Lin	2023-10-24	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count handling is unified and page_pool_alloc_frag() is supported in 32-bit arch with 64-bit DMA, so remove it. Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: unify frag_count handling in page_pool_is_last_frag()	Yunsheng Lin	2023-10-24	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently when page_pool_create() is called with PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only allowed to be called under the below constraints: 1. page_pool_fragment_page() need to be called to setup page->pp_frag_count immediately. 2. page_pool_defrag_page() often need to be called to drain the page->pp_frag_count when there is no more user will be holding on to that page. Those constraints exist in order to support a page to be split into multi fragments. And those constraints have some overhead because of the cache line dirtying/bouncing and atomic update. Those constraints are unavoidable for case when we need a page to be split into more than one fragment, but there is also case that we want to avoid the above constraints and their overhead when a page can't be split as it can only hold a fragment as requested by user, depending on different use cases: use case 1: allocate page without page splitting. use case 2: allocate page with page splitting. use case 3: allocate page with or without page splitting depending on the fragment size. Currently page pool only provide page_pool_alloc_pages() and page_pool_alloc_frag() API to enable the 1 & 2 separately, so we can not use a combination of 1 & 2 to enable 3, it is not possible yet because of the per page_pool flag PP_FLAG_PAGE_FRAG. So in order to allow allocating unsplit page without the overhead of split page while still allow allocating split page we need to remove the per page_pool flag in page_pool_is_last_frag(), as best as I can think of, it seems there are two methods as below: 1. Add per page flag/bit to indicate a page is split or not, which means we might need to update that flag/bit everytime the page is recycled, dirtying the cache line of 'struct page' for use case 1. 2. Unify the page->pp_frag_count handling for both split and unsplit page by assuming all pages in the page pool is split into a big fragment initially. As page pool already supports use case 1 without dirtying the cache line of 'struct page' whenever a page is recyclable, we need to support the above use case 3 with minimal overhead, especially not adding any noticeable overhead for use case 1, and we are already doing an optimization by not updating pp_frag_count in page_pool_defrag_page() for the last fragment user, this patch chooses to unify the pp_frag_count handling to support the above use case 3. There is no noticeable performance degradation and some justification for unifying the frag_count handling with this patch applied using a micro-benchmark testing in [1]. 1. https://lore.kernel.org/all/[email protected]/ Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: fragment API support for 32-bit arch with 64-bit DMA	Yunsheng Lin	2023-10-17	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently page_pool_alloc_frag() is not supported in 32-bit arch with 64-bit DMA because of the overlap issue between pp_frag_count and dma_addr_upper in 'struct page' for those arches, which seems to be quite common, see [1], which means driver may need to handle it when using fragment API. It is assumed that the combination of the above arch with an address space >16TB does not exist, as all those arches have 64b equivalent, it seems logical to use the 64b version for a system with a large address space. It is also assumed that dma address is page aligned when we are dma mapping a page aligned buffer, see [2]. That means we're storing 12 bits of 0 at the lower end for a dma address, we can reuse those bits for the above arches to support 32b+12b, which is 16TB of memory. If we make a wrong assumption, a warning is emitted so that user can report to us. 1. https://lore.kernel.org/all/[email protected]/ 2. https://lore.kernel.org/all/[email protected]/ Tested-by: Alexander Lobakin <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Guillaume Tucker <[email protected]> CC: Matthew Wilcox <[email protected]> CC: Linux-MM <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: add a lockdep check for recycling in hardirq	Jakub Kicinski	2023-08-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Page pool use in hardirq is prohibited, add debug checks to catch misuses. IIRC we previously discussed using DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns that people will have DEBUG_NET enabled in perf testing. I don't think anyone enables lockdep in perf testing, so use lockdep to avoid pushback and arguing :) Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>	Alexander Lobakin	2023-08-07	1	-39/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, touching <net/page_pool/types.h> triggers a rebuild of more than half of the kernel. That's because it's included in <linux/skbuff.h>. And each new include to page_pool/types.h adds more [useless] data for the toolchain to process per each source file from that pile. In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling"), Matteo included it to be able to call a couple of functions defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as long as page pool owns the page") one of the calls was removed, so only one was left. It's the call to page_pool_return_skb_page() in napi_frag_unref(). The function is external and doesn't have any dependencies. Having very niche page_pool_types.h included only for that looks like an overkill. As %PP_SIGNATURE is not local to page_pool.c (was only in the early submissions), nothing holds this function there. Teleport page_pool_return_skb_page() to skbuff.c, just next to the main consumer, skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't work with skbs at all and the former name tells nothing. The #if guards here are only to not compile and have it in the vmlinux when not needed -- both call sites are already guarded. Now, touching page_pool_types.h only triggers rebuilding of the drivers using it and a couple of core networking files. Suggested-by: Jakub Kicinski <[email protected]> # make skbuff.h less heavy Suggested-by: Alexander Duyck <[email protected]> # move to skbuff.c Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: split types and declarations from page_pool.h	Yunsheng Lin	2023-08-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split types and pure function declarations from page_pool.h and add them in page_page/types.h, so that C sources can include page_pool.h and headers should generally only include page_pool/types.h as suggested by jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place. Signed-off-by: Yunsheng Lin <[email protected]> Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Jakub: change microsoft/mana, fix kdoc paths in Documentation] Signed-off-by: Jakub Kicinski <[email protected]>
*	docs: net: page_pool: use kdoc to avoid duplicating the information	Jakub Kicinski	2023-08-03	1	-1/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All struct members of the driver-facing APIs are documented twice, in the code and under Documentation. This is a bit tedious. I also get the feeling that a lot of developers will read the header when coding, rather than the doc. Bring the two a little closer together by using kdoc for structs and functions. Using kdoc also gives us links (mentioning a function or struct in the text gets replaced by a link to its doc). Reviewed-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: merge page_pool_release_page() with page_pool_return_page()	Jakub Kicinski	2023-07-22	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that page_pool_release_page() is not exported we can merge it with page_pool_return_page(). I believe that the "Do not replace this with page_pool_return_page()" comment was there in case page_pool_return_page() was not inlined, to avoid two function calls. Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Yunsheng Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: hide page_pool_release_page()	Jakub Kicinski	2023-07-22	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	There seems to be no user calling page_pool_release_page() for legit reasons, all the users simply haven't been converted to skb-based recycling, yet. Previous changes converted them. Update the docs, and unexport the function. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: fix inconsistency for page_pool_ring_[un]lock()	Yunsheng Lin	2023-05-24	1	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	page_pool_ring_[un]lock() use in_softirq() to decide which spin lock variant to use, and when they are called in the context with in_softirq() being false, spin_lock_bh() is called in page_pool_ring_lock() while spin_unlock() is called in page_pool_ring_unlock(), because spin_lock_bh() has disabled the softirq in page_pool_ring_lock(), which causes inconsistency for spin lock pair calling. This patch fixes it by returning in_softirq state from page_pool_producer_lock(), and use it to decide which spin lock variant to use in page_pool_producer_unlock(). As pool->ring has both producer and consumer lock, so rename it to page_pool_producer_[un]lock() to reflect the actual usage. Also move them to page_pool.c as they are only used there, and remove the 'inline' as the compiler may have better idea to do inlining or not. Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring") Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: unlink from napi during destroy	Jakub Kicinski	2023-04-21	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jesper points out that we must prevent recycling into cache after page_pool_destroy() is called, because page_pool_destroy() is not synchronized with recycling (some pages may still be outstanding when destroy() gets called). I assumed this will not happen because NAPI can't be scheduled if its page pool is being destroyed. But I missed the fact that NAPI may get reused. For instance when user changes ring configuration driver may allocate a new page pool, stop NAPI, swap, start NAPI, and then destroy the old pool. The NAPI is running so old page pool will think it can recycle to the cache, but the consumer at that point is the destroy() path, not NAPI. To avoid extra synchronization let the drivers do "unlinking" during the "swap" stage while NAPI is indeed disabled. Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI") Reported-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings	Jakub Kicinski	2023-04-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit c519fe9a4f0d ("bnxt: add dma mapping attributes") added DMA_ATTR_WEAK_ORDERING to DMA attrs on bnxt. It has since spread to a few more drivers (possibly as a copy'n'paste). DMA_ATTR_WEAK_ORDERING only seems to matter on Sparc and PowerPC/cell, the rarity of these platforms is likely why we never bothered adding the attribute in the page pool, even though it should be safe to add. To make the page pool migration in drivers which set this flag less of a risk (of regressing the precious sparc database workloads or whatever needed this) let's add DMA_ATTR_WEAK_ORDERING on all page pool DMA mappings. We could make this a driver opt-in but frankly I don't think it's worth complicating the API. I can't think of a reason why device accesses to packet memory would have to be ordered. Acked-by: Ilias Apalodimas <[email protected]> Acked-by: Somnath Kotur <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: allow caching from safely localized NAPI	Jakub Kicinski	2023-04-15	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent patches to mlx5 mentioned a regression when moving from driver local page pool to only using the generic page pool code. Page pool has two recycling paths (1) direct one, which runs in safe NAPI context (basically consumer context, so producing can be lockless); and (2) via a ptr_ring, which takes a spin lock because the freeing can happen from any CPU; producer and consumer may run concurrently. Since the page pool code was added, Eric introduced a revised version of deferred skb freeing. TCP skbs are now usually returned to the CPU which allocated them, and freed in softirq context. This places the freeing (producing of pages back to the pool) enticingly close to the allocation (consumer). If we can prove that we're freeing in the same softirq context in which the consumer NAPI will run - lockless use of the cache is perfectly fine, no need for the lock. Let drivers link the page pool to a NAPI instance. If the NAPI instance is scheduled on the same CPU on which we're freeing - place the pages in the direct cache. With that and patched bnxt (XDP enabled to engage the page pool, sigh, bnxt really needs page pool work :() I see a 2.6% perf boost with a TCP stream test (app on a different physical core than softirq). The CPU use of relevant functions decreases as expected: page_pool_refill_alloc_cache 1.17% -> 0% _raw_spin_lock 2.41% -> 0.98% Only consider lockless path to be safe when NAPI is scheduled - in practice this should cover majority if not all of steady state workloads. It's usually the NAPI kicking in that causes the skb flush. The main case we'll miss out on is when application runs on the same CPU as NAPI. In that case we don't use the deferred skb free path. Reviewed-by: Tariq Toukan <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Tested-by: Dragos Tatulea <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: use in_softirq() instead	Qingfang DENG	2023-02-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	We use BH context only for synchronization, so we don't care if it's actually serving softirq or not. As a side node, in case of threaded NAPI, in_serving_softirq() will return false because it's in process context with BH off, making page_pool_recycle_in_cache() unreachable. Signed-off-by: Qingfang DENG <[email protected]> Tested-by: Felix Fietkau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	Merge tag 'mm-stable-2022-08-03' of ↵	Linus Torvalds	2022-08-05	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
\| *	mm/swap: convert __put_page() to __folio_put()	Matthew Wilcox (Oracle)	2022-07-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Saves 11 bytes of text by removing a check of PageTail. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
* \|	net: page_pool: optimize page pool page allocation in NUMA scenario	Jie Wang	2022-07-08	1	-1/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently NIC packet receiving performance based on page pool deteriorates occasionally. To analysis the causes of this problem page allocation stats are collected. Here are the stats when NIC rx performance deteriorates: bandwidth(Gbits/s) 16.8 6.91 rx_pp_alloc_fast 13794308 21141869 rx_pp_alloc_slow 108625 166481 rx_pp_alloc_slow_h 0 0 rx_pp_alloc_empty 8192 8192 rx_pp_alloc_refill 0 0 rx_pp_alloc_waive 100433 158289 rx_pp_recycle_cached 0 0 rx_pp_recycle_cache_full 0 0 rx_pp_recycle_ring 362400 420281 rx_pp_recycle_ring_full 6064893 9709724 rx_pp_recycle_released_ref 0 0 The rx_pp_alloc_waive count indicates that a large number of pages' numa node are inconsistent with the NIC device numa node. Therefore these pages can't be reused by the page pool. As a result, many new pages would be allocated by __page_pool_alloc_pages_slow which is time consuming. This causes the NIC rx performance fluctuations. The main reason of huge numa mismatch pages in page pool is that page pool uses alloc_pages_bulk_array to allocate original pages. This function is not suitable for page allocation in NUMA scenario. So this patch uses alloc_pages_bulk_array_node which has a NUMA id input parameter to ensure the NUMA consistent between NIC device and allocated pages. Repeated NIC rx performance tests are performed 40 times. NIC rx bandwidth is higher and more stable compared to the datas above. Here are three test stats, the rx_pp_alloc_waive count is zero and rx_pp_alloc_slow which indicates pages allocated from slow patch is relatively low. bandwidth(Gbits/s) 93 93.9 93.8 rx_pp_alloc_fast 60066264 61266386 60938254 rx_pp_alloc_slow 16512 16517 16539 rx_pp_alloc_slow_ho 0 0 0 rx_pp_alloc_empty 16512 16517 16539 rx_pp_alloc_refill 473841 481910 481585 rx_pp_alloc_waive 0 0 0 rx_pp_recycle_cached 0 0 0 rx_pp_recycle_cache_full 0 0 0 rx_pp_recycle_ring 29754145 30358243 30194023 rx_pp_recycle_ring_full 0 0 0 rx_pp_recycle_released_ref 0 0 0 Signed-off-by: Jie Wang <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: add page allocation stats for two fast page allocate path	Jie Wang	2022-05-13	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Currently If use page pool allocation stats to analysis a RX performance degradation problem. These stats only count for pages allocate from page_pool_alloc_pages. But nic drivers such as hns3 use page_pool_dev_alloc_frag to allocate pages, so page stats in this API should also be counted. Signed-off-by: Jie Wang <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	net: page_pool: introduce ethtool stats	Lorenzo Bianconi	2022-04-15	1	-1/+62
\| \| \| \| \| \| \| \| \| \|	Introduce page_pool APIs to report stats through ethtool and reduce duplicated code in each driver. Signed-off-by: Lorenzo Bianconi <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: Add recycle stats to page_pool_put_page_bulk	Lorenzo Bianconi	2022-04-12	1	-2/+13
\| \| \| \| \| \| \| \| \| \|	Add missing recycle stats to page_pool_put_page_bulk routine. Reviewed-by: Joe Damato <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <[email protected]>
*	page_pool: Add function to batch and return stats	Joe Damato	2022-03-03	1	-0/+25
\| \| \| \| \| \| \| \| \| \|	Adds a function page_pool_get_stats which can be used by drivers to obtain stats for a specified page_pool. Signed-off-by: Joe Damato <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: Add recycle stats	Joe Damato	2022-03-03	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add per-cpu stats tracking page pool recycling events: - cached: recycling placed page in the page pool cache - cache_full: page pool cache was full - ring: page placed into the ptr ring - ring_full: page released from page pool because the ptr ring was full - released_refcnt: page released (and not recycled) because refcnt > 1 Signed-off-by: Joe Damato <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: Add allocation stats	Joe Damato	2022-03-03	1	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add per-pool statistics counters for the allocation path of a page pool. These stats are incremented in softirq context, so no locking or per-cpu variables are needed. This code is disabled by default and a kernel config option is provided for users who wish to enable them. The statistics added are: - fast: successful fast path allocations - slow: slow path order-0 allocations - slow_high_order: slow path high order allocations - empty: ptr ring is empty, so a slow path allocation was forced. - refill: an allocation which triggered a refill of the cache - waive: pages obtained from the ptr ring that cannot be added to the cache due to a NUMA mismatch. Signed-off-by: Joe Damato <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: Refactor page_pool to enable fragmenting after allocation	Alexander Duyck	2022-02-03	1	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change is meant to permit a driver to perform "fragmenting" of the page from within the driver instead of the current model which requires pre-partitioning the page. The main motivation behind this is to support use cases where the page will be split up by the driver after DMA instead of before. With this change it becomes possible to start using page pool to replace some of the existing use cases where multiple references were being used for a single page, but the number needed was unknown as the size could be dynamic. For example, with this code it would be possible to do something like the following to handle allocation: page = page_pool_alloc_pages(); if (!page) return NULL; page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX); rx_buf->page = page; rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX; Then we would process a received buffer by handling it with: rx_buf->pagecnt_bias--; Once the page has been fully consumed we could then flush the remaining instances with: if (page_pool_defrag_page(page, rx_buf->pagecnt_bias)) continue; page_pool_put_defragged_page(pool, page -1, !!budget); The general idea is that we want to have the ability to allocate a page with excess fragment count and then trim off the unneeded fragments. Signed-off-by: Alexander Duyck <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: remove spinlock in page_pool_refill_alloc_cache()	Yunsheng Lin	2022-01-10	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	As page_pool_refill_alloc_cache() is only called by __page_pool_get_cached(), which assumes non-concurrent access as suggested by the comment in __page_pool_get_cached(), and ptr_ring allows concurrent access between consumer and producer, so remove the spinlock in page_pool_refill_alloc_cache(). Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: Store the XDP mem id	Toke Høiland-Jørgensen	2022-01-06	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	Store the XDP mem ID inside the page_pool struct so it can be retrieved later for use in bpf_prog_run(). Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
*	page_pool: Add callback to init pages when they are allocated	Toke Høiland-Jørgensen	2022-01-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new callback function to page_pool that, if set, will be called every time a new page is allocated. This will be used from bpf_test_run() to initialise the page data with the data provided by userspace when running XDP programs with redirect turned on. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
*	page_pool: Revert "page_pool: disable dma mapping support..."	Yunsheng Lin	2021-11-18	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit d00e60ee54b12de945b8493cf18c1ada9e422514. As reported by Guillaume in [1]: Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT in 32-bit systems, which breaks the bootup proceess when a ethernet driver is using page pool with PP_FLAG_DMA_MAP flag. As we were hoping we had no active consumers for such system when we removed the dma mapping support, and LPAE seems like a common feature for 32 bits system, so revert it. 1. https://www.spinics.net/lists/netdev/msg779890.html Fixes: d00e60ee54b1 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA") Signed-off-by: Yunsheng Lin <[email protected]> Reported-by: "kernelci.org bot" <[email protected]> Tested-by: "kernelci.org bot" <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA	Yunsheng Lin	2021-10-15	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the 32-bit arch with 64-bit DMA seems to rare those days, and page pool might carry a lot of code and complexity for systems that possibly. So disable dma mapping support for such systems, if drivers really want to work on such systems, they have to implement their own DMA-mapping fallback tracking outside page_pool. Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	page_pool: use relaxed atomic for release side accounting	Yunsheng Lin	2021-08-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	There is no need to synchronize the account updating, so use the relaxed atomic to avoid some memory barrier in the data path. Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	2021-08-13	1	-1/+9
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <[email protected]>
\| *	page_pool: mask the page->signature before the checking	Yunsheng Lin	2021-08-09	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As mentioned in commit c07aea3ef4d4 ("mm: add a signature in struct page"): "The page->signature field is aliased to page->lru.next and page->compound_head." And as the comment in page_is_pfmemalloc(): "lru.next has bit 1 set if the page is allocated from the pfmemalloc reserves. Callers may simply overwrite it if they do not need to preserve that information." The page->signature is OR’ed with PP_SIGNATURE when a page is allocated in page pool, see __page_pool_alloc_pages_slow(), and page->signature is checked directly with PP_SIGNATURE in page_pool_return_skb_page(), which might cause resoure leaking problem for a page from page pool if bit 1 of lru.next is set for a pfmemalloc page. What happens here is that the original pp->signature is OR'ed with PP_SIGNATURE after the allocation in order to preserve any existing bits(such as the bit 1, used to indicate a pfmemalloc page), so when those bits are present, those page is not considered to be from page pool and the DMA mapping of those pages will be left stale. As bit 0 is for page->compound_head, So mask both bit 0/1 before the checking in page_pool_return_skb_page(). And we will return those pfmemalloc pages back to the page allocator after cleaning up the DMA mapping. Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* \|	page_pool: add frag page recycling support in page pool	Yunsheng Lin	2021-08-09	1	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently page pool only support page recycling when there is only one user of the page, and the split page reusing implemented in the most driver can not use the page pool as bing-pong way of reusing requires the multi user support in page pool. Those reusing or recycling has below limitations: 1. page from page pool can only be used be one user in order for the page recycling to happen. 2. Bing-pong way of reusing in most driver does not support multi desc using different part of the same page in order to save memory. So add multi-users support and frag page recycling in page pool to overcome the above limitation. Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
* \|	page_pool: add interface to manipulate frag count in page pool	Yunsheng Lin	2021-08-09	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For 32 bit systems with 64 bit dma, dma_addr[1] is used to store the upper 32 bit dma addr, those system should be rare those days. For normal system, the dma_addr[1] in 'struct page' is not used, so we can reuse dma_addr[1] for storing frag count, which means how many frags this page might be splited to. In order to simplify the page frag support in the page pool, the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate the 32 bit systems with 64 bit dma, and the page frag support in page pool is disabled for such system. The newly added page_pool_set_frag_count() is called to reserve the maximum frag count before any page frag is passed to the user. The page_pool_atomic_sub_frag_count_return() is called when user is done with the page frag. Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
* \|	page_pool: keep pp info as long as page pool owns the page	Yunsheng Lin	2021-08-09	1	-4/+17
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, page->pp is cleared and set everytime the page is recycled, which is unnecessary. So only set the page->pp when the page is added to the page pool and only clear it when the page is released from the page pool. This is also a preparation to support allocating frag page in page pool. Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
*	page_pool: Allow drivers to hint on SKB recycling	Ilias Apalodimas	2021-06-07	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Up to now several high speed NICs have custom mechanisms of recycling the allocated memory they use for their payloads. Our page_pool API already has recycling capabilities that are always used when we are running in 'XDP mode'. So let's tweak the API and the kernel network stack slightly and allow the recycling to happen even during the standard operation. The API doesn't take into account 'split page' policies used by those drivers currently, but can be extended once we have users for that. The idea is to be able to intercept the packet on skb_release_data(). If it's a buffer coming from our page_pool API recycle it back to the pool for further usage or just release the packet entirely. To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and a field in struct page (page->pp) to store the page_pool pointer. Storing the information in page->pp allows us to recycle both SKBs and their fragments. We could have skipped the skb bit entirely, since identical information can bederived from struct page. However, in an effort to affect the free path as less as possible, reading a single bit in the skb which is already in cache, is better that trying to derive identical information for the page stored data. The driver or page_pool has to take care of the sync operations on it's own during the buffer recycling since the buffer is, after opting-in to the recycling, never unmapped. Since the gain on the drivers depends on the architecture, we are not enabling recycling by default if the page_pool API is used on a driver. In order to enable recycling the driver must call skb_mark_for_recycle() to store the information we need for recycling in page->pp and enabling the recycling bit, or page_pool_store_mem_info() for a fragment. Co-developed-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Co-developed-by: Matteo Croce <[email protected]> Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	mm: add a signature in struct page	Matteo Croce	2021-06-07	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is needed by the page_pool to avoid recycling a page not allocated via page_pool. The page->signature field is aliased to page->lru.next and page->compound_head, but it can't be set by mistake because the signature value is a bad pointer, and can't trigger a false positive in PageTail() because the last bit is 0. Co-developed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: David S. Miller <[email protected]>
*	mm: fix struct page layout on 32-bit systems	Matthew Wilcox (Oracle)	2021-05-15	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause. Link: https://lkml.kernel.org/r/[email protected] Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Matteo Croce <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
*	net: page_pool: use alloc_pages_bulk in refill code path	Jesper Dangaard Brouer	2021-04-30	1	-26/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are cases where the page_pool need to refill with pages from the page allocator. Some workloads cause the page_pool to release pages instead of recycling these pages. For these workload it can improve performance to bulk alloc pages from the page-allocator to refill the alloc cache. For XDP-redirect workload with 100G mlx5 driver (that use page_pool) redirecting xdp_frame packets into a veth, that does XDP_PASS to create an SKB from the xdp_frame, which then cannot return the page to the page_pool. Performance results under GitHub xdp-project[1]: [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org Mel: The patch "net: page_pool: convert to use alloc_pages_bulk_array variant" was squashed with this patch. From the test page, the array variant was superior with one of the test results as follows. Kernel XDP stats CPU pps Delta Baseline XDP-RX CPU total 3,771,046 n/a List XDP-RX CPU total 3,940,242 +4.49% Array XDP-RX CPU total 4,249,224 +12.68% Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: David Miller <[email protected]> Cc: Ilias Apalodimas <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
*	net: page_pool: refactor dma_map into own function page_pool_dma_map	Jesper Dangaard Brouer	2021-04-30	1	-19/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation for next patch, move the dma mapping into its own function, as this will make it easier to follow the changes. [ilias.apalodimas: make page_pool_dma_map return boolean] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: David Miller <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
*	net: page_pool: simplify page recycling condition tests	Alexander Lobakin	2021-02-05	1	-10/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	pool_page_reusable() is a leftover from pre-NUMA-aware times. For now, this function is just a redundant wrapper over page_is_pfmemalloc(), so inline it into its sole call site. Signed-off-by: Alexander Lobakin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
*	net: page_pool: Add bulk support for ptr_ring	Lorenzo Bianconi	2020-11-14	1	-10/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce the capability to batch page_pool ptr_ring refill since it is usually run inside the driver NAPI tx completion loop. Suggested-by: Jesper Dangaard Brouer <[email protected]> Co-developed-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org