aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc
Commit message (Collapse)AuthorAgeFilesLines
* Revert "eth: remove the DLink/Sundance (ST201) driver"Jakub Kicinski2025-09-021-0/+1
| | | | | | | | | | | | | | | | | | | This reverts commit 8401a108a63302a5a198c7075d857895ca624851. I got a report from an (anonymous) Sundance user: Ethernet controller: Sundance Technology Inc / IC Plus Corp IC Plus IP100A Integrated 10/100 Ethernet MAC + PHY (rev 31) Revert the driver back in. Make following changes: - update Denis's email address in MAINTAINERS - adjust to timer API renames: - del_timer_sync() -> timer_delete_sync() - from_timer() -> timer_container_of() Fixes: 8401a108a633 ("eth: remove the DLink/Sundance (ST201) driver") Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
* powerpc/boot/install.sh: Fix shellcheck warningsMadhavan Srinivasan2025-08-201-7/+7
| | | | | | | | | | | Fix shellcheck warning such as "Double quote to prevent globbing and word splitting." and Use $(...) notation instead of legacy backticks `...`. Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc/prom_init: Fix shellcheck warningsMadhavan Srinivasan2025-08-201-8/+8
| | | | | | | | | | | Fix "Double quote to prevent globbing and word splitting." warning from shellcheck Tested-by: Venkat Rao Bagalkote <[email protected]> Reviewed-by: Stephen Rothwell <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc/kvm: Fix ifdef to remove build warningMadhavan Srinivasan2025-08-201-4/+4
| | | | | | | | | | | | | | | | | | When compiling for pseries or powernv defconfig with "make C=1", these warning were reported bu sparse tool in powerpc/kernel/kvm.c arch/powerpc/kernel/kvm.c:635:9: warning: switch with no cases arch/powerpc/kernel/kvm.c:646:9: warning: switch with no cases Currently #ifdef were added after the switch case which are specific for BOOKE and PPC_BOOK3S_32. These are not enabled in pseries/powernv defconfig. Fix it by moving the #ifdef before switch(){} Fixes: cbe487fac7fc0 ("KVM: PPC: Add mtsrin PV code") Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc: unify two CONFIG_POWERPC64_CPU entries in the same choice blockMasahiro Yamada2025-08-201-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | There are two CONFIG_POWERPC64_CPU entries in the "CPU selection" choice block. I guess the intent is to display a different prompt depending on CPU_LITTLE_ENDIAN: "Generic (POWER5 and PowerPC 970 and above)" for big endian, and "Generic (POWER8 and above)" for little endian. I stumbled on this tricky use case, and worked around it on Kconfig with commit 4d46b5b623e0 ("kconfig: fix infinite loop in sym_calc_choice()"). However, I doubt that supporting multiple entries with the same symbol in a choice block is worth the complexity - this is the only such case in the kernel tree. This commit merges the two entries. Once this cleanup is accepted in the powerpc subsystem, I will proceed to refactor the Kconfig parser. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc: use always-y instead of extra-y in MakefilesMasahiro Yamada2025-08-202-4/+6
| | | | | | | | | | | | | | | | | | The extra-y syntax is planned for deprecation because it is similar to always-y. When building the boot wrapper, always-y and extra-y are equivalent. Use always-y instead. In arch/powerpc/kernel/Makefile, I added ifdef KBUILD_BUILTIN to keep the current behavior: prom_init_check is skipped when building only modular objects. Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc/64: Drop unnecessary 'rc' variableXichao Zhao2025-08-201-4/+1
| | | | | | | | | | | Simplify the code to enhance readability and maintain a consistent coding style. Signed-off-by: Xichao Zhao <[email protected]> Acked-by: Gautam Menghani <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* powerpc: Use dev_fwnode()Jiri Slaby (SUSE)2025-08-202-5/+3
| | | | | | | | | | | | | | | irq_domain_create_simple() takes fwnode as the first argument. It can be extracted from the struct device using dev_fwnode() helper instead of using of_node with of_fwnode_handle(). So use the dev_fwnode() helper. Signed-off-by: Jiri Slaby (SUSE) <[email protected]> Acked-by: Christophe Leroy <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* KVM: PPC: Fix misleading interrupts comment in kvmppc_prepare_to_enter()Andrew Donnellan2025-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | Until commit 6c85f52b10fd ("kvm/ppc: IRQ disabling cleanup"), kvmppc_prepare_to_enter() was called with interrupts already disabled by the caller, which was documented in the comment above the function. Post-cleanup, the function is now called with interrupts enabled, and disables interrupts itself. Fix the comment to reflect the current behaviour. Fixes: 6c85f52b10fd ("kvm/ppc: IRQ disabling cleanup") Signed-off-by: Andrew Donnellan <[email protected]> Reviewed-by: Amit Machhiwal <[email protected]> Reviewed-by: Gautam Menghani <[email protected]> Reviewed-by: Shrikanth Hegde <[email protected]> [Fixed the double colon in Reviewed-by line] Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
* treewide: rename GPIO set callbacks back to their original namesBartosz Golaszewski2025-08-075-6/+6
| | | | | | | | | | | The conversion of all GPIO drivers to using the .set_rv() and .set_multiple_rv() callbacks from struct gpio_chip (which - unlike their predecessors - return an integer and allow the controller drivers to indicate failures to users) is now complete and the legacy ones have been removed. Rename the new callbacks back to their original names in one sweeping change. Signed-off-by: Bartosz Golaszewski <[email protected]>
* Merge tag 'powerpc-6.17-2' of ↵Linus Torvalds2025-08-046-20/+125
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fixes for several issues in the powernv PCI hotplug path - Fix htmldoc generation for htm.rst in toctree - Add jit support for load_acquire and store_release in ppc64 bpf jit Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar * tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc64/bpf: Add jit support for load_acquire and store_release docs: powerpc: add htm.rst to toctree PCI: pnv_php: Enable third attention indicator state PCI: pnv_php: Fix surprise plug detection and recovery powerpc/eeh: Make EEH driver device hotplug safe powerpc/eeh: Export eeh_unfreeze_pe() PCI: pnv_php: Work around switches with broken presence detection PCI: pnv_php: Clean up allocated IRQs on unplug
| * powerpc64/bpf: Add jit support for load_acquire and store_releasePuranjay Mohan2025-07-282-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add JIT support for the load_acquire and store_release instructions. The implementation is similar to the kernel where: load_acquire => plain load -> lwsync store_release => lwsync -> plain store To test the correctness of the implementation, following selftests were run: [fedora@linux-kernel bpf]$ sudo ./test_progs -a \ verifier_load_acquire,verifier_store_release,atomics #11/1 atomics/add:OK #11/2 atomics/sub:OK #11/3 atomics/and:OK #11/4 atomics/or:OK #11/5 atomics/xor:OK #11/6 atomics/cmpxchg:OK #11/7 atomics/xchg:OK #11 atomics:OK #519/1 verifier_load_acquire/load-acquire, 8-bit:OK #519/2 verifier_load_acquire/load-acquire, 8-bit @unpriv:OK #519/3 verifier_load_acquire/load-acquire, 16-bit:OK #519/4 verifier_load_acquire/load-acquire, 16-bit @unpriv:OK #519/5 verifier_load_acquire/load-acquire, 32-bit:OK #519/6 verifier_load_acquire/load-acquire, 32-bit @unpriv:OK #519/7 verifier_load_acquire/load-acquire, 64-bit:OK #519/8 verifier_load_acquire/load-acquire, 64-bit @unpriv:OK #519/9 verifier_load_acquire/load-acquire with uninitialized src_reg:OK #519/10 verifier_load_acquire/load-acquire with uninitialized src_reg @unpriv:OK #519/11 verifier_load_acquire/load-acquire with non-pointer src_reg:OK #519/12 verifier_load_acquire/load-acquire with non-pointer src_reg @unpriv:OK #519/13 verifier_load_acquire/misaligned load-acquire:OK #519/14 verifier_load_acquire/misaligned load-acquire @unpriv:OK #519/15 verifier_load_acquire/load-acquire from ctx pointer:OK #519/16 verifier_load_acquire/load-acquire from ctx pointer @unpriv:OK #519/17 verifier_load_acquire/load-acquire with invalid register R15:OK #519/18 verifier_load_acquire/load-acquire with invalid register R15 @unpriv:OK #519/19 verifier_load_acquire/load-acquire from pkt pointer:OK #519/20 verifier_load_acquire/load-acquire from flow_keys pointer:OK #519/21 verifier_load_acquire/load-acquire from sock pointer:OK #519 verifier_load_acquire:OK #556/1 verifier_store_release/store-release, 8-bit:OK #556/2 verifier_store_release/store-release, 8-bit @unpriv:OK #556/3 verifier_store_release/store-release, 16-bit:OK #556/4 verifier_store_release/store-release, 16-bit @unpriv:OK #556/5 verifier_store_release/store-release, 32-bit:OK #556/6 verifier_store_release/store-release, 32-bit @unpriv:OK #556/7 verifier_store_release/store-release, 64-bit:OK #556/8 verifier_store_release/store-release, 64-bit @unpriv:OK #556/9 verifier_store_release/store-release with uninitialized src_reg:OK #556/10 verifier_store_release/store-release with uninitialized src_reg @unpriv:OK #556/11 verifier_store_release/store-release with uninitialized dst_reg:OK #556/12 verifier_store_release/store-release with uninitialized dst_reg @unpriv:OK #556/13 verifier_store_release/store-release with non-pointer dst_reg:OK #556/14 verifier_store_release/store-release with non-pointer dst_reg @unpriv:OK #556/15 verifier_store_release/misaligned store-release:OK #556/16 verifier_store_release/misaligned store-release @unpriv:OK #556/17 verifier_store_release/store-release to ctx pointer:OK #556/18 verifier_store_release/store-release to ctx pointer @unpriv:OK #556/19 verifier_store_release/store-release, leak pointer to stack:OK #556/20 verifier_store_release/store-release, leak pointer to stack @unpriv:OK #556/21 verifier_store_release/store-release, leak pointer to map:OK #556/22 verifier_store_release/store-release, leak pointer to map @unpriv:OK #556/23 verifier_store_release/store-release with invalid register R15:OK #556/24 verifier_store_release/store-release with invalid register R15 @unpriv:OK #556/25 verifier_store_release/store-release to pkt pointer:OK #556/26 verifier_store_release/store-release to flow_keys pointer:OK #556/27 verifier_store_release/store-release to sock pointer:OK #556 verifier_store_release:OK Summary: 3/55 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Puranjay Mohan <[email protected]> Tested-by: Saket Kumar Bhaskar <[email protected]> Reviewed-by: Hari Bathini <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * PCI: pnv_php: Fix surprise plug detection and recoveryTimothy Pearson2025-07-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing PowerNV hotplug code did not handle surprise plug events correctly, leading to a complete failure of the hotplug system after device removal and a required reboot to detect new devices. This comes down to two issues: 1) When a device is surprise removed, often the bridge upstream port will cause a PE freeze on the PHB. If this freeze is not cleared, the MSI interrupts from the bridge hotplug notification logic will not be received by the kernel, stalling all plug events on all slots associated with the PE. 2) When a device is removed from a slot, regardless of surprise or programmatic removal, the associated PHB/PE ls left frozen. If this freeze is not cleared via a fundamental reset, skiboot is unable to clear the freeze and cannot retrain / rescan the slot. This also requires a reboot to clear the freeze and redetect the device in the slot. Issue the appropriate unfreeze and rescan commands on hotplug events, and don't oops on hotplug if pci_bus_to_OF_node() returns NULL. Signed-off-by: Timothy Pearson <[email protected]> [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/171044224.1359864.1752615546988.JavaMail.zimbra@raptorengineeringinc.com
| * powerpc/eeh: Make EEH driver device hotplug safeTimothy Pearson2025-07-262-20/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiple race conditions existed between the PCIe hotplug driver and the EEH driver, leading to a variety of kernel oopses of the same general nature: <pcie device unplug> <eeh driver trigger> <hotplug removal trigger> <pcie tree reconfiguration> <eeh recovery next step> <oops in EEH driver bus iteration loop> A second class of oops is also seen when the underlying bus disappears during device recovery. Refactor the EEH module to be PCI rescan and remove safe. Also clean up a few minor formatting / readability issues. Signed-off-by: Timothy Pearson <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/1334208367.1359861.1752615503144.JavaMail.zimbra@raptorengineeringinc.com
| * powerpc/eeh: Export eeh_unfreeze_pe()Timothy Pearson2025-07-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The PowerNV hotplug driver needs to be able to clear any frozen PE(s) on the PHB after suprise removal of a downstream device. Export the eeh_unfreeze_pe() symbol to allow implementation of this functionality in the php_nv module. Signed-off-by: Timothy Pearson <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/1778535414.1359858.1752615454618.JavaMail.zimbra@raptorengineeringinc.com
* | Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of ↵Linus Torvalds2025-08-033-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer||notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...
| * | Add a new optional ",cma" suffix to the crashkernel= command line optionJiri Bohac2025-07-203-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kdump: crashkernel reservation from CMA", v5. This series implements a way to reserve additional crash kernel memory using CMA. Currently, all the memory for the crash kernel is not usable by the 1st (production) kernel. It is also unmapped so that it can't be corrupted by the fault that will eventually trigger the crash. This makes sense for the memory actually used by the kexec-loaded crash kernel image and initrd and the data prepared during the load (vmcoreinfo, ...). However, the reserved space needs to be much larger than that to provide enough run-time memory for the crash kernel and the kdump userspace. Estimating the amount of memory to reserve is difficult. Being too careful makes kdump likely to end in OOM, being too generous takes even more memory from the production system. Also, the reservation only allows reserving a single contiguous block (or two with the "low" suffix). I've seen systems where this fails because the physical memory is fragmented. By reserving additional crashkernel memory from CMA, the main crashkernel reservation can be just large enough to fit the kernel and initrd image, minimizing the memory taken away from the production system. Most of the run-time memory for the crash kernel will be memory previously available to userspace in the production system. As this memory is no longer wasted, the reservation can be done with a generous margin, making kdump more reliable. Kernel memory that we need to preserve for dumping is normally not allocated from CMA, unless it is explicitly allocated as movable. Currently this is only the case for memory ballooning and zswap. Such movable memory will be missing from the vmcore. User data is typically not dumped by makedumpfile. When dumping of user data is intended this new CMA reservation cannot be used. There are five patches in this series: The first adds a new ",cma" suffix to the recenly introduced generic crashkernel parsing code. parse_crashkernel() takes one more argument to store the cma reservation size. The second patch implements reserve_crashkernel_cma() which performs the reservation. If the requested size is not available in a single range, multiple smaller ranges will be reserved. The third patch updates Documentation/, explicitly mentioning the potential DMA corruption of the CMA-reserved memory. The fourth patch adds a short delay before booting the kdump kernel, allowing pending DMA transfers to finish. The fifth patch enables the functionality for x86 as a proof of concept. There are just three things every arch needs to do: - call reserve_crashkernel_cma() - include the CMA-reserved ranges in the physical memory map - exclude the CMA-reserved ranges from the memory available through /proc/vmcore by excluding them from the vmcoreinfo PT_LOAD ranges. Adding other architectures is easy and I can do that as soon as this series is merged. With this series applied, specifying crashkernel=100M craskhernel=1G,cma on the command line will make a standard crashkernel reservation of 100M, where kexec will load the kernel and initrd. An additional 1G will be reserved from CMA, still usable by the production system. The crash kernel will have 1.1G memory available. The 100M can be reliably predicted based on the size of the kernel and initrd. The new cma suffix is completely optional. When no crashkernel=size,cma is specified, everything works as before. This patch (of 5): Add a new cma_size parameter to parse_crashkernel(). When not NULL, call __parse_crashkernel to parse the CMA reservation size from "crashkernel=size,cma" and store it in cma_size. Set cma_size to NULL in all calls to parse_crashkernel(). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Bohac <[email protected]> Cc: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: Donald Dutile <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Tao Liu <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
* | | Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds2025-07-3117-98/+23
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
| * | | mm/balloon_compaction: convert balloon_page_delete() to balloon_page_finalize()David Hildenbrand2025-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's move the removal of the page from the balloon list into the single caller, to remove the dependency on the PG_isolated flag and clarify locking requirements. Note that for now, balloon_page_delete() was used on two paths: (1) Removing a page from the balloon for deflation through balloon_page_list_dequeue() (2) Removing an isolated page from the balloon for migration in the per-driver migration handlers. Isolated pages were already removed from the balloon list during isolation. So instead of relying on the flag, we can just distinguish both cases directly and handle it accordingly in the caller. We'll shuffle the operations a bit such that they logically make more sense (e.g., remove from the list before clearing flags). In balloon migration functions we can now move the balloon_page_finalize() out of the balloon lock and perform the finalization just before dropping the balloon reference. Document that the page lock is currently required when modifying the movability aspects of a page; hopefully we can soon decouple this from the page lock. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Al Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Byungchul Park <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Eugenio Pé rez <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Gregory Price <[email protected]> Cc: Harry Yoo <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jerrin Shaji George <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Joshua Hahn <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mathew Brost <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Xu <[email protected]> Cc: Qi Zheng <[email protected]> Cc: Rakie Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xuan Zhuo <[email protected]> Cc: xu xin <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | mm: remove devmap related functions and page table bitsAlistair Popple2025-07-105-76/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <[email protected]> Acked-by: Will Deacon <[email protected]> # arm64 Acked-by: David Hildenbrand <[email protected]> Suggested-by: Chunyan Zhang <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Deepak Gupta <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Inki Dae <[email protected]> Cc: John Groves <[email protected]> Cc: John Hubbard <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | powerpc: remove checks for devmap pages and PMDs/PUDsAlistair Popple2025-07-106-14/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PFN_DEV no longer exists. This means no devmap PMDs or PUDs will be created, so checking for them is redundant. Instead mappings of pages that would have previously returned true for pXd_devmap() will return true for pXd_trans_huge() Link: https://lkml.kernel.org/r/31f63cc8dd518f9e2ec300681fe302eb4adf49b4.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chunyan Zhang <[email protected]> Cc: Dan Williams <[email protected]> Cc: Deepak Gupta <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Inki Dae <[email protected]> Cc: John Groves <[email protected]> Cc: John Hubbard <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | mm: update architecture and driver code to use vm_flags_tLorenzo Stoakes2025-07-103-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Pedro Falcato <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Zi Yan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | mm: change vm_get_page_prot() to accept vm_flags_t argumentLorenzo Stoakes2025-07-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "use vm_flags_t consistently". The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is exactly equivalent to unsigned long, but it should not be assumed to be. Much code that references vma->vm_flags already correctly uses vm_flags_t, but a fairly large chunk of code simply uses unsigned long and assumes that the two are equivalent. This series corrects that and has us use vm_flags_t consistently. This series is motivated by the desire to, in a future series, adjust vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit in order to deal with the VMA flag exhaustion issue and avoid all the various problems that arise from it (being unable to use certain features in 32-bit, being unable to add new flags except for 64-bit, etc.) This is therefore a critical first step towards that goal. At any rate, using the correct type is of value regardless. We additionally take the opportunity to refer to VMA flags as vm_flags where possible to make clear what we're referring to. Overall, this series does not introduce any functional change. This patch (of 3): We abstract the type of the VMA flags to vm_flags_t, however in may places it is simply assumed this is unsigned long, which is simply incorrect. At the moment this is simply an incongruity, however in future we plan to change this type and therefore this change is a critical requirement for doing so. Overall, this patch does not introduce any functional change. [[email protected]: add missing vm_get_page_prot() instance, remove include] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Pedro Falcato <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Zi Yan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
| * | | drivers/base/node: rename __register_one_node() to register_one_node()Donet Tom2025-07-101-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The register_one_node() function was a simple wrapper around __register_one_node(). To simplify the code, register_one_node() has been removed, and __register_one_node() has been renamed to register_one_node(). Link: https://lkml.kernel.org/r/8262cd0f44eeb048a1fcd3ac8382760d7f7dea60.1748452242.git.donettom@linux.ibm.com Signed-off-by: Donet Tom <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Mike Rapoport (Microsoft) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
* | | Merge tag 'ftrace-v6.17' of ↵Linus Torvalds2025-07-301-1/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Keep track of when fgraph_ops are registered or not Keep accounting of when fgraph_ops are registered as if a fgraph_ops is registered twice it can mess up the accounting and it will not work as expected later. Trigger a warning if something registers it twice as to catch bugs before they are found by things just not working as expected. - Make DYNAMIC_FTRACE always enabled for architectures that support it As static ftrace (where all functions are always traced) is very expensive and only exists to help architectures support ftrace, do not make it an option. As soon as an architecture supports DYNAMIC_FTRACE make it use it. This simplifies the code. - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. - Make pid_ptr string size match the comment In print_graph_proc() the pid_ptr string is of size 11, but the comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12. * tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it fgraph: Keep track of when fgraph_ops are registered or not fgraph: Make pid_str size match the comment
| * | | tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORDSteven Rostedt2025-07-231-1/+0
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ftrace is tightly coupled with architecture specific code because it requires the use of trampolines written in assembly. This means that when a new feature or optimization is made, it must be done for all architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are added to denote which architecture has the new enhancement so that other architectures can still function until they too have been updated. The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where applicable. Link: https://lore.kernel.org/all/[email protected]/ Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
* | | Merge tag 'bpf-next-6.17' of ↵Linus Torvalds2025-07-301-20/+59
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ...
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3Alexei Starovoitov2025-06-2614-15/+41
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge BPF, perf and other fixes after downstream PRs. It restores BPF CI to green after critical fix commit bc4394e5e79c ("perf: Fix the throttle error of some clock events") No conflicts. Signed-off-by: Alexei Starovoitov <[email protected]>
| * | | powerpc/bpf: Fix warning for unused ori31_emittedLuis Gerhorst2025-06-191-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this, the compiler (clang21) might emit a warning under W=1 because the variable ori31_emitted is set but never used if CONFIG_PPC_BOOK3S_64=n. Without this patch: $ make -j $(nproc) W=1 ARCH=powerpc SHELL=/bin/bash arch/powerpc/net [...] CC arch/powerpc/net/bpf_jit_comp.o CC arch/powerpc/net/bpf_jit_comp64.o ../arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body': ../arch/powerpc/net/bpf_jit_comp64.c:417:28: warning: variable 'ori31_emitted' set but not used [-Wunused-but-set-variable] 417 | bool sync_emitted, ori31_emitted; | ^~~~~~~~~~~~~ AR arch/powerpc/net/built-in.a With this patch: [...] CC arch/powerpc/net/bpf_jit_comp.o CC arch/powerpc/net/bpf_jit_comp64.o AR arch/powerpc/net/built-in.a Fixes: dff883d9e93a ("bpf, arm64, powerpc: Change nospec to include v1 barrier") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Luis Gerhorst <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
| * | | bpf, arm64, powerpc: Change nospec to include v1 barrierLuis Gerhorst2025-06-101-16/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the semantics of BPF_NOSPEC (previously a v4-only barrier) to always emit a speculation barrier that works against both Spectre v1 AND v4. If mitigation is not needed on an architecture, the backend should set bpf_jit_bypass_spec_v4/v1(). As of now, this commit only has the user-visible implication that unpriv BPF's performance on PowerPC is reduced. This is the case because we have to emit additional v1 barrier instructions for BPF_NOSPEC now. This commit is required for a future commit to allow us to rely on BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature that nospec acts as a v1 barrier is unused. Commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") noted that mitigation instructions for v1 and v4 might be different on some archs. While this would potentially offer improved performance on PowerPC, it was dismissed after the following considerations: * Only having one barrier simplifies the verifier and allows us to easily rely on v4-induced barriers for reducing the complexity of v1-induced speculative path verification. * For the architectures that implemented BPF_NOSPEC, only PowerPC has distinct instructions for v1 and v4. Even there, some insns may be shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and 'sync'). If this is still found to impact performance in an unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4 insns from being emitted for PowerPC with this setup if bypass_spec_v1/v4 is set. Vulnerability-status for BPF_NOSPEC-based Spectre mitigations (v4 as of this commit, v1 in the future) is therefore: * x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This patch implements BPF_NOSPEC for these architectures. The previous v4-only version was supported since commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") and commit b7540d625094 ("powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC"). * LoongArch: Not Vulnerable - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") is the only other past commit related to BPF_NOSPEC and indicates that the insn is not required there. * MIPS: Vulnerable (if unprivileged BPF is enabled) - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") indicates that it is not vulnerable, but this contradicts the kernel and Debian documentation. Therefore, I assume that there exist vulnerable MIPS CPUs (but maybe not from Loongson?). In the future, BPF_NOSPEC could be implemented for MIPS based on the GCC speculation_barrier [1]. For now, we rely on unprivileged BPF being disabled by default. * Other: Unknown - To the best of my knowledge there is no definitive information available that indicates that any other arch is vulnerable. They are therefore left untouched (BPF_NOSPEC is not implemented, but bypass_spec_v1/v4 is also not set). I did the following testing to ensure the insn encoding is correct: * ARM64: * 'dsb nsh; isb' was successfully tested with the BPF CI in [2] * 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is executed for example with './test_progs -t verifier_array_access') * PowerPC: The following configs were tested locally with ppc64le QEMU v8.2 '-machine pseries -cpu POWER9': * STF_BARRIER_EIEIO + CONFIG_PPC_BOOK32_64 * STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK32_64 * STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK32_64 * CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO * CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on) * CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on) * CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on) Most of those cobinations should not occur in practice, but I was not able to get an PPC e6500 rootfs (for testing PPC_E500 without forcing it on). In any case, this should ensure that there are no unexpected conflicts between the insns when combined like this. Individual v1/v4 barriers were already emitted elsewhere. Hari's ack is for the PowerPC changes only. [1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf ("MIPS: Add speculation_barrier support") [2] https://github.com/kernel-patches/bpf/pull/8576 Signed-off-by: Luis Gerhorst <[email protected]> Acked-by: Hari Bathini <[email protected]> Cc: Henriette Herzog <[email protected]> Cc: Maximilian Ott <[email protected]> Cc: Milan Stephan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
| * | | bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()Luis Gerhorst2025-06-101-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to skip analysis/patching for the respective vulnerability. For v4, this will reduce the number of barriers the verifier inserts. For v1, it allows more programs to be accepted. The primary motivation for this is to not regress unpriv BPF's performance on ARM64 in a future commit where BPF_NOSPEC is also used against Spectre v1. This has the user-visible change that v1-induced rejections on non-vulnerable PowerPC CPUs are avoided. For now, this does not change the semantics of BPF_NOSPEC. It is still a v4-only barrier and must not be implemented if bypass_spec_v4 is always true for the arch. Changing it to a v1 AND v4-barrier is done in a future commit. As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1 AND NOSPEC_V4 instructions and allow backends to skip their lowering as suggested by commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was found to be preferable for the following reason: * bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the same analysis (not taking into account whether the current CPU is vulnerable), needlessly restricts users of CPUs that are not vulnerable. The only use case for this would be portability-testing, but this can later be added easily when needed by allowing users to force bypass_spec_v1/v4 to false. * Portability is still acceptable: Directly disabling the analysis instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow programs on non-vulnerable CPUs to be accepted while the program will be rejected on vulnerable CPUs. With the fallback to speculation barriers for Spectre v1 implemented in a future commit, this will only affect programs that do variable stack-accesses or are very complex. For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based on the check that was previously located in the BPF_NOSPEC case. For LoongArch, it would likely be safe to set both bpf_jit_bypass_spec_v1() and _v4() according to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"). This is omitted here as I am unable to do any testing for LoongArch. Hari's ack concerns the PowerPC part only. Signed-off-by: Luis Gerhorst <[email protected]> Acked-by: Hari Bathini <[email protected]> Cc: Henriette Herzog <[email protected]> Cc: Maximilian Ott <[email protected]> Cc: Milan Stephan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
* | | | Merge tag 'net-next-6.17' of ↵Linus Torvalds2025-07-302-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
| * | | | ibmveth: Add multi buffers rx replenishment hcall supportMingming Cao2025-07-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables batched RX buffer replenishment in ibmveth by using the new firmware-supported h_add_logical_lan_buffers() hcall to submit up to 8 RX buffers in a single call, instead of repeatedly calling the single-buffer h_add_logical_lan_buffer() hcall. During the probe, with the patch, the driver queries ILLAN attributes to detect IBMVETH_ILLAN_RX_MULTI_BUFF_SUPPORT bit. If the attribute is present, rx_buffers_per_hcall is set to 8, enabling batched replenishment. Otherwise, it defaults to 1, preserving the original upstream behavior with no change in code flow for unsupported systems. The core rx replenish logic remains the same. But when batching is enabled, the driver aggregates up to 8 fully prepared descriptors into a single h_add_logical_lan_buffers() hypercall. If any allocation or DMA mapping fails while preparing a batch, only the successfully prepared buffers are submitted, and the remaining are deferred for the next replenish cycle. If at runtime the firmware stops accepting the batched hcall—e,g, after a Live Partition Migration (LPM) to a host that does not support h_add_logical_lan_buffers(), the hypercall returns H_FUNCTION. In that case, the driver transparently disables batching, resets rx_buffers_per_hcall to 1, and falls back to the single-buffer hcall in next future replenishments to take care of these and future buffers. Test were done on systems with firmware that both supports and does not support the new h_add_logical_lan_buffers hcall. On supported firmware, this reduces hypercall overhead significantly over multiple buffers. SAR measurements showed about a 15% improvement in packet processing rate under moderate RX load, with heavier traffic seeing gains more than 30% Signed-off-by: Mingming Cao <[email protected]> Reviewed-by: Brian King <[email protected]> Reviewed-by: Haren Myneni <[email protected]> Reviewed-by: Dave Marquardt <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
| * | | | netfilter: conntrack: remove DCCP protocol supportPablo Neira Ayuso2025-07-031-1/+0
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DCCP socket family has now been removed from this tree, see: 8bb3212be4b4 ("Merge branch 'net-retire-dccp-socket'") Remove connection tracking and NAT support for this protocol, this should not pose a problem because no DCCP traffic is expected to be seen on the wire. As for the code for matching on dccp header for iptables and nftables, mark it as deprecated and keep it in place. Ruleset restoration is an atomic operation. Without dccp matching support, an astray match on dccp could break this operation leaving your computer with no policy in place, so let's follow a more conservative approach for matches. Add CONFIG_NFT_EXTHDR_DCCP which is set to 'n' by default to deprecate dccp extension support. Similarly, label CONFIG_NETFILTER_XT_MATCH_DCCP as deprecated too and also set it to 'n' by default. Code to match on DCCP protocol from ebtables also remains in place, this is just a few checks on IPPROTO_DCCP from _check() path which is exercised when ruleset is loaded. There is another use of IPPROTO_DCCP from the _check() path in the iptables multiport match. Another check for IPPROTO_DCCP from the packet in the reject target is also removed. So let's schedule removal of the dccp matching for a second stage, this should not interfer with the dccp retirement since this is only matching on the dccp header. Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
* | | | Merge tag 'powerpc-6.17-1' of ↵Linus Torvalds2025-07-3023-217/+206
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - CONFIG_HZ changes to move the base_slice from 10ms to 1ms - Patchset to move some of the mutex handling to lock guard - Expose secvars relevant to the key management mode - Misc cleanups and fixes Thanks to Ankit Chauhan, Christophe Leroy, Donet Tom, Gautam Menghani, Haren Myneni, Johan Korsnes, Madadi Vineeth Reddy, Paul Mackerras, Shrikanth Hegde, Srish Srinivasan, Thomas Fourier, Thomas Huth, Thomas Weißschuh, Souradeep, Amit Machhiwal, R Nageswara Sastry, Venkat Rao Bagalkote, Andrew Donnellan, Greg Kroah-Hartman, Mimi Zohar, Mukesh Kumar Chaurasiya, Nayna Jain, Ritesh Harjani (IBM), Sourabh Jain, Srikar Dronamraju, Stefan Berger, Tyrel Datwyler, and Kowshik Jois. * tag 'powerpc-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (23 commits) arch/powerpc: Remove .interp section in vmlinux powerpc: Drop GPL boilerplate text with obsolete FSF address powerpc: Don't use %pK through printk arch: powerpc: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX misc: ocxl: Replace scnprintf() with sysfs_emit() in sysfs show functions integrity/platform_certs: Allow loading of keys in the static key management mode powerpc/secvar: Expose secvars relevant to the key management mode powerpc/pseries: Correct secvar format representation for static key management (powerpc/512) Fix possible `dma_unmap_single()` on uninitialized pointer powerpc: floppy: Add missing checks after DMA map book3s64/radix : Optimize vmemmap start alignment book3s64/radix : Handle error conditions properly in radix_vmemmap_populate powerpc/pseries/dlpar: Search DRC index from ibm,drc-indexes for IO add KVM: PPC: Book3S HV: Add H_VIRT mapping for tracing exits powerpc: sysdev: use lock guard for mutex powerpc: powernv: ocxl: use lock guard for mutex powerpc: book3s: vas: use lock guard for mutex powerpc: fadump: use lock guard for mutex powerpc: rtas: use lock guard for mutex powerpc: eeh: use lock guard for mutex ...
| * | | arch/powerpc: Remove .interp section in vmlinuxChristophe Leroy2025-07-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building with CONFIG_RELOCATABLE, there is a .interp section which contains the name of the expected ELF interpreter: Contents of section .interp: c0000000021c1bac 2f757372 2f6c6962 2f6c642e 736f2e31 /usr/lib/ld.so.1 c0000000021c1bbc 00 . That information is useless and even likely wrong. Remove it. Link: https://github.com/linuxppc/issues/issues/434 Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Segher Boessenkool <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/eeaf8fd6628a75d19872ab31cf7e7179e2baef5e.1751366959.git.christophe.leroy@csgroup.eu
| * | | powerpc: Drop GPL boilerplate text with obsolete FSF addressThomas Huth2025-07-224-52/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FSF does not reside in the Franklin street anymore, so we should not request the people to write to this address. Fortunately, these header files already contain a proper SPDX license identifier, so it should be fine to simply drop all of this license boilerplate code here. Suggested-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc: Don't use %pK through printkThomas Weißschuh2025-07-222-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/20250718-restricted-pointers-powerpc-v2-1-fd7bddd809f3@linutronix.de
| * | | arch: powerpc: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEXJohan Korsnes2025-07-111-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This option was removed from the Kconfig in commit 8c710f75256b ("net/sched: Retire tcindex classifier") but it was not removed from the defconfigs. Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier") Signed-off-by: Johan Korsnes <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc/secvar: Expose secvars relevant to the key management modeSrish Srinivasan2025-07-091-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PLPKS enabled PowerVM LPAR sysfs exposes all of the secure boot secvars irrespective of the key management mode. The PowerVM LPAR supports static and dynamic key management for secure boot. The key management option can be updated in the management console. The secvars PK, trustedcadb, and moduledb can be consumed both in the static and dynamic key management modes for the loading of signed third-party kernel modules. However, other secvars i.e. KEK, grubdb, grubdbx, sbat, db and dbx, which are used to verify the grub and kernel images, are consumed only in the dynamic key management mode. Expose only PK, trustedcadb, and moduledb in the static key management mode. Co-developed-by: Souradeep <[email protected]> Signed-off-by: Souradeep <[email protected]> Signed-off-by: Srish Srinivasan <[email protected]> Tested-by: R Nageswara Sastry <[email protected]> Reviewed-by: Mimi Zohar <[email protected]> Reviewed-by: Stefan Berger <[email protected]> Reviewed-by: Nayna Jain <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc/pseries: Correct secvar format representation for static key managementSrish Srinivasan2025-07-091-29/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a PLPKS enabled PowerVM LPAR, the secvar format property for static key management is misrepresented as "ibm,plpks-sb-unknown", creating reason for confusion. Static key management mode uses fixed, built-in keys. Dynamic key management mode allows keys to be updated in production to handle security updates without firmware rebuilds. Define a function named plpks_get_sb_keymgmt_mode() to retrieve the key management mode based on the existence of the SB_VERSION property in the firmware. Set the secvar format property to either "ibm,plpks-sb-v<version>" or "ibm,plpks-sb-v0" based on the key management mode, and return the length of the secvar format property. Co-developed-by: Souradeep <[email protected]> Signed-off-by: Souradeep <[email protected]> Signed-off-by: Srish Srinivasan <[email protected]> Tested-by: R Nageswara Sastry <[email protected]> Reviewed-by: Mimi Zohar <[email protected]> Reviewed-by: Stefan Berger <[email protected]> Reviewed-by: Nayna Jain <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | (powerpc/512) Fix possible `dma_unmap_single()` on uninitialized pointerThomas Fourier2025-07-091-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device configuration fails (if `dma_dev->device_config()`), `sg_dma_address(&sg)` is not initialized and the jump to `err_dma_prep` leads to calling `dma_unmap_single()` on `sg_dma_address(&sg)`. Signed-off-by: Thomas Fourier <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc: floppy: Add missing checks after DMA mapThomas Fourier2025-06-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DMA map functions can fail and should be tested for errors. Signed-off-by: Thomas Fourier <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | book3s64/radix : Optimize vmemmap start alignmentDonet Tom2025-06-231-11/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we always align the vmemmap start to PAGE_SIZE, there is a chance that we may end up allocating page-sized vmemmap backing pages in RAM in the altmap not present case, because a PAGE_SIZE aligned address is not PMD_SIZE-aligned. In this patch, we are aligning the vmemmap start address to PMD_SIZE if altmap is not present. This ensures that a PMD_SIZE page is always allocated for the vmemmap mapping if altmap is not present. If altmap is present, Make sure we align the start vmemmap addr to PAGE_SIZE so that we calculate the correct start_pfn in altmap boundary check to decide whether we should use altmap or RAM based backing memory allocation. Also the address need to be aligned for set_pte operation. If the start addr is already PMD_SIZE aligned and with in the altmap boundary then we will try to use a pmd size altmap mapping else we go for page size mapping. So if altmap is present, we try to use the maximum number of altmap pages; otherwise, we allocate a PMD_SIZE RAM page. Signed-off-by: Donet Tom <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/895c4afd912c85d344a2065e348fac90529ed48f.1750593372.git.donettom@linux.ibm.com
| * | | book3s64/radix : Handle error conditions properly in radix_vmemmap_populateDonet Tom2025-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error conditions are not handled properly if altmap is not present and PMD_SIZE vmemmap_alloc_block_buf fails. In this patch, if vmemmap_alloc_block_buf fails in the non-altmap case, we will fall back to the base mapping. Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Signed-off-by: Donet Tom <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/7f95fe91c827a2fb76367a58dbea724e811fb152.1750593372.git.donettom@linux.ibm.com
| * | | powerpc/pseries/dlpar: Search DRC index from ibm,drc-indexes for IO addHaren Myneni2025-06-231-2/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IO hotplug add event is handled in the user space with drmgr tool. After the device is enabled, the user space uses /sys/kernel/dlpar interface with “dt add index <drc_index>” to update the device tree. The kernel interface (dlpar_hp_dt_add()) finds the parent node for the specified ‘drc_index’ from ibm,drc-info property. The recent FW provides this property from 2017 onwards. But KVM guest code in some releases is still using the older SLOF firmware which has ibm,drc-indexes property instead of ibm,drc-info. If the ibm,drc-info is not available, this patch adds changes to search ‘drc_index’ from the indexes array in ibm,drc-indexes property to support old FW. Fixes: 02b98ff44a57 ("powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO add") Reported-by: Kowshik Jois <[email protected]> Signed-off-by: Haren Myneni <[email protected]> Tested-by: Amit Machhiwal <[email protected]> Reviewed-by: Tyrel Datwyler <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | KVM: PPC: Book3S HV: Add H_VIRT mapping for tracing exitsGautam Menghani2025-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macro kvm_trace_symbol_exit is used for providing the mappings for the trap vectors and their names. Add mapping for H_VIRT so that trap reason is displayed as string instead of a vector number when using the kvm_guest_exit tracepoint. Signed-off-by: Gautam Menghani <[email protected]> Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc: sysdev: use lock guard for mutexShrikanth Hegde2025-06-231-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | use guard(mutex) for scope based resource management of mutex This would make the code simpler and easier to maintain. More details on lock guards can be found at https://lore.kernel.org/all/[email protected]/T/#u Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Shrikanth Hegde <[email protected]> Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc: powernv: ocxl: use lock guard for mutexShrikanth Hegde2025-06-231-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | use guard(mutex) for scope based resource management of mutex. This would make the code simpler and easier to maintain. More details on lock guards can be found at https://lore.kernel.org/all/[email protected]/T/#u Reviewed-by: Srikar Dronamraju <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Shrikanth Hegde <[email protected]> Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]
| * | | powerpc: book3s: vas: use lock guard for mutexShrikanth Hegde2025-06-231-19/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | use lock guards for scope based resource management of mutex. This would make the code simpler and easier to maintain. More details on lock guards can be found at https://lore.kernel.org/all/[email protected]/T/#u This shows the use of both guard and scoped_guard Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Shrikanth Hegde <[email protected]> Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]