path: root/include/kvm
Commit message (Author, Date, Files, Lines -/+)
* KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks (Oliver Upton, 2025-09-10, 1 file, -0/+3)

  syzkaller has caught us red-handed once more, this time nesting regular
  spinlocks behind raw spinlocks:

    =============================
    [ BUG: Invalid wait context ]
    6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted
    -----------------------------
    syz.0.29/3743 is trying to lock:
    a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
    other info that might help us debug this:
    context-{5:5}
    3 locks held by syz.0.29/3743:
     #0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499
     #1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500
     #2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150
    stack backtrace:
    CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT
    Hardware name: linux,dummy-virt (DT)
    Call trace:
     show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
     __dump_stack+0x30/0x40 lib/dump_stack.c:94
     dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
     dump_stack+0x1c/0x28 lib/dump_stack.c:129
     print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline]
     check_wait_context kernel/locking/lockdep.c:4905 [inline]
     __lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190
     lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871
     __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
     _raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
     vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
     vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158
     __kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455
     kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505
     kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244
     kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline]
     kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344
     kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367
     __fput+0x4ac/0x980 fs/file_table.c:465
     ____fput+0x20/0x58 fs/file_table.c:493
     task_work_run+0x1bc/0x254 kernel/task_work.c:227
     resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
     do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151
     exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
     exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
     el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768
     el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
     el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600

  This is of course no good, but fixing it is at odds with how LPI
  refcounts are managed. Solve the locking mess by deferring the release
  of unreferenced LPIs until after the ap_list_lock is released. Mark
  these to-be-released LPIs specially to avoid racing with vgic_put_irq()
  and causing a double-free. Since references can only be taken on LPIs
  with a nonzero refcount, extending the lifetime of freed LPIs is still
  safe.

  Reviewed-by: Marc Zyngier <[email protected]>
  Reported-by: [email protected]
  Closes: https://lore.kernel.org/kvmarm/[email protected]/
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
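  A minimal sketch of the defer-then-release pattern described above; the
  struct layout and the release_lpi() helper are simplified stand-ins,
  not the actual vgic code:

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct lpi {
        struct list_head entry;
        bool pending_release;           /* marks a to-be-released LPI */
    };

    static void release_lpi(struct lpi *lpi)
    {
        kfree(lpi);     /* stand-in: would erase from the xarray */
    }

    static void flush_pending_lpis(raw_spinlock_t *ap_list_lock,
                                   struct list_head *ap_list)
    {
        struct lpi *lpi, *tmp;
        unsigned long flags;
        LIST_HEAD(to_release);

        raw_spin_lock_irqsave(ap_list_lock, flags);
        list_for_each_entry_safe(lpi, tmp, ap_list, entry) {
            /* Only mark and unlink here: no regular spinlocks may
             * be taken while the raw ap_list_lock is held. */
            lpi->pending_release = true;
            list_move(&lpi->entry, &to_release);
        }
        raw_spin_unlock_irqrestore(ap_list_lock, flags);

        /* Now it is legal to take regular spinlocks (e.g. xa_lock). */
        list_for_each_entry_safe(lpi, tmp, &to_release, entry) {
            list_del(&lpi->entry);
            release_lpi(lpi);
        }
    }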
* KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs (Oliver Upton, 2025-09-10, 1 file, -2/+2)

  KVM's use of krefs to manage LPIs isn't adding much, especially
  considering vgic_irq_release() is a noop due to the lack of sufficient
  context. Switch to using a regular refcount in anticipation of adding a
  meaningful release concept for LPIs.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
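  Roughly, the switch trades the kref machinery for a bare refcount_t;
  the lpi_get()/lpi_put() names below are illustrative, not the KVM
  helpers:

    #include <linux/refcount.h>

    struct lpi {
        refcount_t refcount;    /* was: struct kref */
    };

    static void lpi_get(struct lpi *lpi)
    {
        /* callers already guarantee a nonzero refcount here */
        refcount_inc(&lpi->refcount);
    }

    static bool lpi_put(struct lpi *lpi)
    {
        /* was kref_put(&..., vgic_irq_release) with a noop release
         * callback; now the caller decides what 'release' means
         * when this returns true */
        return refcount_dec_and_test(&lpi->refcount);
    }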
* KVM: arm64: vgic: Drop stale comment on IRQ active state (Oliver Upton, 2025-09-10, 1 file, -1/+1)

  While LPIs lack an active state, KVM unconditionally folds the active
  state from the LR into the vgic_irq struct, meaning this field cannot
  be 'creatively' reused for something else. Drop the misleading comment
  to reflect this.

  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD (Paolo Bonzini, 2025-07-29, 1 file, -1/+8)

  KVM/arm64 changes for 6.17, round #1

  - Host driver for GICv5, the next generation interrupt controller for
    arm64, including support for interrupt routing, MSIs, interrupt
    translation and wired interrupts.

  - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
    GICv5 hardware, leveraging the legacy VGIC interface.

  - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
    userspace to disable support for SGIs w/o an active state on hardware
    that previously advertised it unconditionally.

  - Map supporting endpoints with cacheable memory attributes on systems
    with FEAT_S2FWB and DIC, where KVM no longer needs to perform cache
    maintenance on the address range.

  - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
    hypervisor to inject external aborts into an L2 VM and take traps of
    masked external aborts to the hypervisor.

  - Convert more system register sanitization to the config-driven
    implementation.

  - Fixes to the visibility of EL2 registers, namely making VGICv3 system
    registers accessible through the VGIC device instead of the ONE_REG
    vCPU ioctls.

  - Various cleanups and minor fixes.
| * Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next (Oliver Upton, 2025-07-28, 1 file, -0/+3)

  * kvm-arm64/vgic-v4-ctl:
  : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
  :
  : Allow userspace to decide if support for SGIs without an active state
  : is advertised to the guest, allowing VMs from GICv3-only hardware to
  : be migrated to GICv4.1 capable machines.

  Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
  KVM: arm64: selftests: Add test for nASSGIcap attribute
  KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
  KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
  KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
  KVM: arm64: Disambiguate support for vSGIs v. vLPIs

  Signed-off-by: Oliver Upton <[email protected]>
| | * KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap (Raghavendra Rao Ananta, 2025-07-26, 1 file, -0/+3)

  KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally
  implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a
  VM supports the feature. Only allow changes prior to VGIC
  initialization, as at that point vPEs need to be allocated for the VM.

  For convenience, bundle support for vLPIs and vSGIs behind this
  feature, allowing userspace to control vPE allocation for VMs in
  environments that may be constrained on vPE IDs.

  Signed-off-by: Raghavendra Rao Ananta <[email protected]>
  Reviewed-by: Eric Auger <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * | KVM: arm64: gic-v5: Support GICv3 compat (Sascha Bischoff, 2025-07-08, 1 file, -1/+5)

  Add support for GICv3 compat mode (FEAT_GCIE_LEGACY), which allows a
  GICv5 host to run GICv3-based VMs. This change enables the
  VHE/nVHE/hVHE/protected modes, but does not support nested
  virtualization.

  A lazy-disable approach is taken for compat mode; it is enabled on the
  vgic_v3_load path but not disabled on the vgic_v3_put path. A non-GICv3
  VM, i.e., one based on GICv5, is responsible for disabling compat mode
  on the corresponding vgic_v5_load path. Currently, GICv5 is not
  supported, so compat mode is never disabled once it has been enabled;
  the vgic_v5_load path is intentionally omitted from the code.

  Co-authored-by: Timothy Hayes <[email protected]>
  Signed-off-by: Timothy Hayes <[email protected]>
  Signed-off-by: Sascha Bischoff <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
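  A sketch of the lazy-disable scheme described above, under the stated
  assumption that gicv3_compat_enable()/gicv3_compat_disable() are
  hypothetical stand-ins for the FEAT_GCIE_LEGACY toggles:

    struct kvm_vcpu;                 /* opaque for this sketch */
    void gicv3_compat_enable(void);  /* hypothetical helpers standing */
    void gicv3_compat_disable(void); /* in for the compat toggles */

    static void vgic_v3_load_sketch(struct kvm_vcpu *vcpu)
    {
        gicv3_compat_enable();
        /* lazily left enabled: no matching disable on vgic_v3_put() */
    }

    static void vgic_v5_load_sketch(struct kvm_vcpu *vcpu)
    {
        /* a future GICv5 VM would turn compat mode back off here */
        gicv3_compat_disable();
    }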
* / KVM: Don't WARN if updating IRQ bypass route fails (Sean Christopherson, 2025-06-23, 1 file, -1/+1)

  Don't bother WARNing if updating an IRTE route fails now that vendor
  code provides much more precise WARNs. The generic WARN doesn't provide
  enough information to actually debug the problem, and has obviously
  done nothing to surface the myriad bugs in KVM x86's implementation.

  Drop all of the associated return code plumbing that existed just so
  that common KVM could WARN.

  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Sean Christopherson <[email protected]>
* KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() (Oliver Upton, 2025-05-30, 1 file, -2/+1)

  The virtual mapping and "GSI" routing of a particular vLPI is subject
  to change in response to the guest / userspace. This can be pretty
  annoying to deal with when KVM needs to track the physical state that's
  managed for vLPI direct injection.

  Make vgic_v4_unset_forwarding() resilient by using the host IRQ to
  resolve the vgic IRQ. Since this uses the LPI xarray directly, finding
  the ITS by doorbell address + grabbing its its_lock is no longer
  necessary. Note that matching the right ITS / ITE is already handled in
  vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS
  emulation, the virtual mapping should remain stable for the lifetime of
  the vLPI mapping.

  Tested-by: Sweet Tea Dorminy <[email protected]>
  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
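  Conceptually, the lookup becomes a direct xarray read keyed by the host
  IRQ; the host_irq_xa mapping below is an assumption for illustration,
  not the actual KVM data structure:

    #include <linux/xarray.h>

    struct vgic_irq;    /* opaque for this sketch */

    /* assumed mapping of host IRQ -> forwarded vgic_irq; the real code
     * resolves through the LPI xarray rather than ITS structures */
    static struct vgic_irq *vlpi_by_host_irq(struct xarray *host_irq_xa,
                                             unsigned int host_irq)
    {
        return xa_load(host_irq_xa, host_irq);
    }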
* Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/next (Oliver Upton, 2025-03-19, 1 file, -2/+3)

  * kvm-arm64/pmu-fixes:
  : vPMU fixes for 6.15 courtesy of Akihiko Odaki
  :
  : Various fixes to KVM's vPMU implementation, notably ensuring
  : userspace-directed changes to the PMCs are reflected in the backing
  : perf events.

  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}

  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: PMU: Reload when resetting (Akihiko Odaki, 2025-03-17, 1 file, -2/+0)

  Replace kvm_pmu_vcpu_reset() with the generic PMU reloading mechanism
  to ensure consistency with system registers and to reduce code size.

  Signed-off-by: Akihiko Odaki <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs (Akihiko Odaki, 2025-03-17, 1 file, -0/+3)

  Reload the perf event when setting the vPMU counter (vPMC) registers
  (PMCCNTR_EL0 and PMEVCNTR<n>_EL0). This is a change corresponding to
  commit 9228b26194d1 ("KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to
  return the current value") but for SET_ONE_REG.

  Values of vPMC registers are saved in sysreg files on certain
  occasions. These saved values don't represent the current values of the
  vPMC registers if the perf events for the vPMCs count events after the
  save. The current values of those registers are the sum of the sysreg
  file value and the current perf event counter value. But, when
  userspace writes those registers (using KVM_SET_ONE_REG), KVM only
  updates the sysreg file value and leaves the current perf event counter
  value as is.

  It is also important to keep the correct state even if userspace writes
  them after the first run, specifically when debugging Windows on QEMU
  with GDB; QEMU tries to write back all visible registers when resuming
  the VM execution with GDB, corrupting the PMU state. Windows always
  uses the PMU so this can cause adverse effects on that particular OS.

  Fix this by releasing the current perf event and triggering recreation
  of one with KVM_REQ_RELOAD_PMU.

  Fixes: 051ff581ce70 ("arm64: KVM: Add access handler for event counter register")
  Signed-off-by: Akihiko Odaki <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
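  A worked sketch of the value composition described above; struct vpmc
  and its fields are illustrative stand-ins:

    #include <linux/types.h>

    struct vpmc {
        u64 sysreg;        /* value saved in the sysreg file */
        u64 event_count;   /* what the perf event has counted since */
    };

    static u64 vpmc_read(const struct vpmc *c)
    {
        /* the live value is the saved value plus the perf delta */
        return c->sysreg + c->event_count;
    }

    static void vpmc_write(struct vpmc *c, u64 val)
    {
        /* the fix: a userspace write must also retire the stale perf
         * delta (KVM does this by recreating the event via
         * KVM_REQ_RELOAD_PMU), modelled here as zeroing it */
        c->sysreg = val;
        c->event_count = 0;
    }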
* | Merge branch 'kvm-arm64/pmuv3-asahi' into kvmarm/next (Oliver Upton, 2025-03-19, 1 file, -9/+3)

  * kvm-arm64/pmuv3-asahi:
  : Support PMUv3 for KVM guests on Apple silicon
  :
  : Take advantage of some IMPLEMENTATION DEFINED traps available on
  : Apple parts to trap-and-emulate the PMUv3 registers on behalf of a
  : KVM guest. Constrain the vPMU to a cycle counter and single event
  : counter, as the Apple PMU has events that cannot be counted on every
  : counter.
  :
  : There is a small new interface between the ARM PMU driver and KVM,
  : where the PMU driver owns the PMUv3 -> hardware event mappings.

  arm64: Enable IMP DEF PMUv3 traps on Apple M*
  KVM: arm64: Provide 1 event counter on IMPDEF hardware
  drivers/perf: apple_m1: Provide helper for mapping PMUv3 events
  KVM: arm64: Remap PMUv3 events onto hardware
  KVM: arm64: Advertise PMUv3 if IMPDEF traps are present
  KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps
  KVM: arm64: Move PMUVer filtering into KVM code
  KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock
  KVM: arm64: Drop kvm_arm_pmu_available static key
  KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3
  KVM: arm64: Always support SW_INCR PMU event
  KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps
  drivers/perf: apple_m1: Support host/guest event filtering
  drivers/perf: apple_m1: Refactor event select/filter configuration

  Signed-off-by: Oliver Upton <[email protected]>
| * | KVM: arm64: Drop kvm_arm_pmu_available static key (Oliver Upton, 2025-03-11, 1 file, -8/+2)

  With the PMUv3 cpucap, kvm_arm_pmu_available is no longer used in the
  hot path of guest entry/exit. On top of that, guest support for PMUv3
  may not correlate with host support for the feature, e.g. on IMPDEF
  hardware.

  Throw out the static key and just inspect the list of PMUs to determine
  if PMUv3 is supported for KVM guests.

  Tested-by: Janne Grunau <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * | KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3 (Oliver Upton, 2025-03-11, 1 file, -1/+1)

  KVM is about to learn some new tricks to virtualize PMUv3 on IMPDEF
  hardware. As part of that, we now need to differentiate host support
  from guest support for PMUv3.

  Add a cpucap to determine if an architectural PMUv3 is present to guard
  host usage of PMUv3 controls.

  Tested-by: Janne Grunau <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* | KVM: arm64: nv: Add Maintenance Interrupt emulation (Marc Zyngier, 2025-03-03, 1 file, -0/+4)

  Emulating the vGIC means emulating the dreaded Maintenance Interrupt.
  This is a two-pronged problem:

  - while running L2, getting an MI translates into an MI injected in the
    L1 based on the state of the HW.

  - while running L1, we must accurately reflect the state of the MI
    line, based on the in-memory state.

  The MI INTID is added to the distributor, as expected on any
  virtualisation-capable implementation, and further patches will allow
  its configuration.

  Signed-off-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* | KVM: arm64: nv: Nested GICv3 emulation (Marc Zyngier, 2025-03-03, 1 file, -0/+2)

  When entering a nested VM, we set up the hypervisor control interface
  based on what the guest hypervisor has set. In particular, we check
  each list register written by the guest hypervisor to see whether its
  HW bit is set. If so, we translate the hw irq number from the guest's
  point of view to the real hardware irq number, if there is a mapping.

  Co-developed-by: Jintack Lim <[email protected]>
  Signed-off-by: Jintack Lim <[email protected]>
  [Christoffer: Redesigned execution flow around vcpu load/put]
  Co-developed-by: Christoffer Dall <[email protected]>
  Signed-off-by: Christoffer Dall <[email protected]>
  [maz: Rewritten to support GICv3 instead of GICv2, NV2 support]
  Signed-off-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* | KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses (Marc Zyngier, 2025-03-03, 1 file, -0/+4)

  Wire up the handling of all GICv3 EL2 registers, and provide emulation
  for all the non-memory-backed registers (ICC_SRE_EL2, ICH_VTR_EL2,
  ICH_MISR_EL2, ICH_ELRSR_EL2, and ICH_EISR_EL2).

  Signed-off-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* Merge branch kvm-arm64/pkvm-memshare-declutter into kvmarm-master/next (Marc Zyngier, 2025-01-17, 1 file, -4/+2)

  * kvm-arm64/pkvm-memshare-declutter:
  : .
  : pKVM memory transition simplifications, courtesy of Quentin Perret.
  :
  : From the cover letter:
  : "Since its early days, pKVM has formalized memory 'transitions'
  : (shares and donations) using 'struct pkvm_mem_transition' and a bunch
  : of helpers to manipulate it. The intention was for all transitions to
  : use this machinery to ensure we're checking things consistently.
  : However, as development progressed, it became clear that the rigidity
  : of this model made it really difficult to use in some use-cases which
  : ended-up side-stepping it entirely. That is the case for the
  : hyp_{un}pin_shared_mem() and host_{un}share_guest() paths upstream
  : which use lower level helpers directly, as well as for several other
  : pKVM features that should land upstream in the future (ex: when a
  : guest relinquishes a page during ballooning, when annotating a page
  : that is being DMA'd to, ...). On top of this, the pkvm_mem_transition
  : machinery requires a lot of boilerplate which makes the code hard to
  : read, but also adds layers of indirection that no compiler seems to
  : see through, hence leading to suboptimal generated code.
  :
  : Given all the above, this series removes the pkvm_mem_transition
  : machinery from mem_protect.c, and converts all its users to use
  : __*_{check,set}_page_state_range() low-level helpers directly."
  : .

  KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
  KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
  KVM: arm64: Drop pkvm_mem_transition for FF-A
  KVM: arm64: Only apply PMCR_EL0.P to the guest range of counters
  KVM: arm64: nv: Reload PMU events upon MDCR_EL2.HPME change
  KVM: arm64: Use KVM_REQ_RELOAD_PMU to handle PMCR_EL0.E change
  KVM: arm64: Add unified helper for reprogramming counters by mask
  KVM: arm64: Always check the state from hyp_ack_unshare()
  KVM: arm64: Fix set_id_regs selftest for ASIDBITS becoming unwritable

  Signed-off-by: Marc Zyngier <[email protected]>
| * KVM: arm64: Add unified helper for reprogramming counters by mask (Oliver Upton, 2024-12-18, 1 file, -4/+2)

  Having separate helpers for enabling/disabling counters provides the
  wrong abstraction, as the state of each counter needs to be evaluated
  independently and, in some cases, use a different global enable bit.

  Collapse the enable/disable accessors into a single, common helper that
  reconfigures every counter set in @mask, leaving the complexity of
  determining if an event is actually enabled in
  kvm_pmu_counter_is_enabled().

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
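  A sketch of the mask-driven shape of such a helper;
  pmu_reprogram_counter() is a hypothetical per-counter hook, not the KVM
  function:

    #include <linux/bits.h>
    #include <linux/types.h>

    struct kvm_pmu;    /* opaque for this sketch */
    void pmu_reprogram_counter(struct kvm_pmu *pmu, unsigned int idx);

    static void pmu_reprogram_counters(struct kvm_pmu *pmu, u64 mask)
    {
        unsigned int i;

        /* one path for enable and disable alike: each counter set in
         * @mask is re-evaluated against its own global enable bit */
        for (i = 0; i < 64; i++) {
            if (mask & BIT_ULL(i))
                pmu_reprogram_counter(pmu, i);
        }
    }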
* | KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity (Marc Zyngier, 2025-01-02, 1 file, -0/+7)

  It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really work,
  especially with HCR_EL2.E2H=1. A non-zero offset results in a screaming
  virtual timer interrupt, to the tune of a few 100k interrupts per
  second on a 4 vcpu VM. This is also evidenced by this CPU's inability
  to correctly run any of the timer selftests.

  The only case that doesn't break is when this register is set to 0,
  which in turn breaks VM migration. When HCR_EL2.E2H=0, the timer seems
  to behave normally, and does not result in an interrupt storm.

  As a workaround, use the fact that this CPU implements FEAT_ECV, and
  trap all accesses to the virtual timer and counter, keeping CNTVOFF_EL2
  set to zero, and emulate accesses to CVAL/TVAL/CTL and the counter
  itself, fixing up the timer to account for the missing offset.

  And if you think this is disgusting, you'd probably be right.

  Acked-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
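  The offset fixup itself is simple arithmetic; a sketch with
  illustrative function names:

    #include <linux/types.h>

    static u64 emulated_cntvct(u64 cntpct, u64 cntvoff)
    {
        /* CNTVCT = physical count minus the (software-held) offset */
        return cntpct - cntvoff;
    }

    static u32 emulated_cntv_tval(u64 cval, u64 cntpct, u64 cntvoff)
    {
        /* TVAL reads as the low 32 bits of (CVAL - virtual count) */
        return (u32)(cval - emulated_cntvct(cntpct, cntvoff));
    }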
* | KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use (Marc Zyngier, 2025-01-02, 1 file, -0/+15)

  Although FEAT_ECV allows us to correctly emulate the timers, it also
  reduces performance pretty badly. Mitigate this by emulating the
  CTL/CVAL register reads in the inner run loop, without returning to the
  general kernel.

  Acked-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* | KVM: arm64: nv: Sync nested timer state with FEAT_NV2 (Marc Zyngier, 2025-01-02, 1 file, -0/+1)

  Emulating the timers with FEAT_NV2 is a bit odd, as the timers can be
  reconfigured behind our back without the hypervisor even noticing. In
  the VHE case, that's an actual regression in the architecture...

  Co-developed-by: Christoffer Dall <[email protected]>
  Signed-off-by: Christoffer Dall <[email protected]>
  Acked-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition (Marc Zyngier, 2024-11-21, 1 file, -1/+0)

  VGIC_MAX_PRIVATE is a pretty useless definition, and is better replaced
  with VGIC_NR_PRIVATE_IRQS.

  Signed-off-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* Merge branch kvm-arm64/nv-pmu into kvmarm/next (Oliver Upton, 2024-11-11, 1 file, -2/+16)

  * kvm-arm64/nv-pmu:
  : Support for vEL2 PMU controls
  :
  : Align the vEL2 PMU support with the current state of non-nested KVM,
  : including:
  :
  : - Trap routing, with the annoying complication of EL2 traps that
  :   apply in Host EL0
  :
  : - PMU emulation, using the correct configuration bits depending on
  :   whether a counter falls in the hypervisor or guest range of PMCs
  :
  : - Perf event swizzling across nested boundaries, as the event
  :   filtering needs to be remapped to cope with vEL2

  KVM: arm64: nv: Reprogram PMU events affected by nested transition
  KVM: arm64: nv: Apply EL2 event filtering when in hyp context
  KVM: arm64: nv: Honor MDCR_EL2.HLP
  KVM: arm64: nv: Honor MDCR_EL2.HPME
  KVM: arm64: Add helpers to determine if PMC counts at a given EL
  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
  KVM: arm64: Rename kvm_pmu_valid_counter_mask()
  KVM: arm64: nv: Advertise support for FEAT_HPMN0
  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
  KVM: arm64: nv: Reinject traps that take effect in Host EL0
  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
  arm64: sysreg: Migrate MDCR_EL2 definition to table
  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields

  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: nv: Reprogram PMU events affected by nested transition (Oliver Upton, 2024-10-31, 1 file, -0/+3)

  Start reprogramming PMU events at nested boundaries now that everything
  is in place to handle the EL2 event filter. Only repaint events where
  the filter differs between EL1 and EL2 as a slight optimization.

  PMU now 'works' for nested VMs, albeit slow.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN (Oliver Upton, 2024-10-31, 1 file, -0/+5)

  The value of MDCR_EL2.HPMN controls the number of event counters made
  visible to EL0 and EL1. This means it is possible for the guest
  hypervisor to allow direct access to event counters to the L2.

  Rework KVM's PMU register emulation to take the effects of HPMN into
  account when handling a trap. For bitmask-style registers, writes only
  affect accessible registers.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
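  A sketch of clamping a bitmask-style write to the counters visible
  below HPMN; the helper name and the cycle-counter handling are
  illustrative:

    #include <linux/bits.h>
    #include <linux/types.h>

    static u64 visible_pmc_mask(unsigned int hpmn, bool cycle_ctr)
    {
        /* event counters 0 .. HPMN-1 are visible below EL2 */
        u64 mask = hpmn ? GENMASK_ULL(hpmn - 1, 0) : 0;

        if (cycle_ctr)
            mask |= BIT_ULL(31);    /* PMCCNTR_EL0 bit */
        return mask;
    }

    /* e.g. a trapped write to PMCNTENSET_EL0 only sets accessible bits:
     *     pmu->pmcnten |= val & visible_pmc_mask(hpmn, true);
     */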
| * KVM: arm64: Rename kvm_pmu_valid_counter_mask() (Oliver Upton, 2024-10-31, 1 file, -2/+2)

  Nested PMU support requires dynamically changing the visible range of
  PMU counters based on the exception level and value of MDCR_EL2.HPMN.
  At the same time, the PMU emulation code needs to know the absolute
  number of implemented counters, regardless of context.

  Rename the existing helper to make it obvious that it returns the
  number of implemented counters and not anything else.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN (Oliver Upton, 2024-10-31, 1 file, -0/+6)

  MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first
  range is accessible from all ELs, and the second range is accessible
  only to EL2/3. Supposing the guest hypervisor allows direct access to
  the PMU counters from the L2, KVM needs to locally handle those
  accesses.

  Add a new complex trap configuration for HPMN that checks if the
  counter index is accessible to the current context. As written, the
  architecture suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though
  intuition (and the pseudocode) suggest that the trap applies to
  PMEVTYPER<n>_EL0 as well.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* | Merge branch kvm-arm64/psci-1.3 into kvmarm/next (Oliver Upton, 2024-11-11, 1 file, -1/+3)

  * kvm-arm64/psci-1.3:
  : PSCI v1.3 support, courtesy of David Woodhouse
  :
  : Bump KVM's PSCI implementation up to v1.3, with the added bonus of
  : implementing the SYSTEM_OFF2 call. Like other system-scoped PSCI
  : calls, this gets relayed to userspace for further processing with a
  : new KVM_SYSTEM_EVENT_SHUTDOWN flag.
  :
  : As an added bonus, implement client-side support for hibernation with
  : the SYSTEM_OFF2 call.

  arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
  KVM: selftests: Add test for PSCI SYSTEM_OFF2
  KVM: arm64: Add support for PSCI v1.2 and v1.3
  KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
  firmware/psci: Add definitions for PSCI v1.3 specification

  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: Add support for PSCI v1.2 and v1.3 (David Woodhouse, 2024-10-24, 1 file, -1/+3)

  As with PSCI v1.1 in commit 512865d83fd9 ("KVM: arm64: Bump guest PSCI
  version to 1.1"), expose v1.3 to the guest by default. The SYSTEM_OFF2
  call which is exposed by doing so is compatible for userspace because
  it's just a new flag in the event that KVM raises, in precisely the
  same way that SYSTEM_RESET2 was compatible when v1.1 was enabled by
  default.

  Signed-off-by: David Woodhouse <[email protected]>
  Reviewed-by: Miguel Luis <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* | KVM: arm64: nv: Handle CNTHCTL_EL2 specially (Marc Zyngier, 2024-10-31, 1 file, -0/+3)

  Accessing CNTHCTL_EL2 is fraught with danger if running with
  HCR_EL2.E2H=1: half of the bits are held in CNTKCTL_EL1, and thus can
  be changed behind our back, while the rest lives in the CNTHCTL_EL2
  shadow copy that is memory-based. Yes, this is a lot of fun!

  Make sure that we merge the two on read access, while we can write to
  CNTKCTL_EL1 in a more straightforward manner.

  Signed-off-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
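  A sketch of the read-side merge; which bits sit in hardware
  (CNTKCTL_EL1) versus the memory-backed shadow is an assumption here,
  purely for illustration:

    #include <linux/bits.h>
    #include <linux/types.h>

    /* assumed split between hardware-held and shadow-held bits */
    #define CNTHCTL_HW_BITS    GENMASK_ULL(17, 0)

    static u64 read_cnthctl_el2(u64 cntkctl_el1, u64 shadow)
    {
        return (shadow & ~CNTHCTL_HW_BITS) |
               (cntkctl_el1 & CNTHCTL_HW_BITS);
    }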
* KVM: arm64: Refine PMU defines for number of counters (Rob Herring (Arm), 2024-08-16, 1 file, -1/+2)

  There are 2 defines for the number of PMU counters:
  ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same
  currently, but Armv9.4/8.9 increases the number of possible counters
  from 32 to 33. With this change, the maximum number of counters will
  differ for KVM's PMU emulation, which is PMUv3.4. Give KVM PMU
  emulation its own define to decouple it from the rest of the kernel's
  number of PMU counters.

  The VHE PMU code needs to match the PMU driver, so switch it to use
  ARMPMU_MAX_HWEVENTS instead.

  Acked-by: Mark Rutland <[email protected]>
  Reviewed-by: Marc Zyngier <[email protected]>
  Signed-off-by: Rob Herring (Arm) <[email protected]>
  Tested-by: James Clark <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Will Deacon <[email protected]>
* arm64: perf/kvm: Use a common PMU cycle counter define (Rob Herring (Arm), 2024-08-16, 1 file, -1/+0)

  The PMUv3 and KVM code each have a define for the PMU cycle counter
  index. Move KVM's define to a shared location and use it for the PMUv3
  driver.

  Reviewed-by: Marc Zyngier <[email protected]>
  Acked-by: Mark Rutland <[email protected]>
  Signed-off-by: Rob Herring (Arm) <[email protected]>
  Tested-by: James Clark <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Will Deacon <[email protected]>
* perf: arm_pmuv3: Prepare for more than 32 counters (Rob Herring (Arm), 2024-08-16, 1 file, -2/+2)

  Various PMUv3 registers which are a mask of counters are 64-bit
  registers, but the accessor functions take a u32. This has been fine,
  as the upper 32 bits have been RES0 given the maximum of 32 counters
  prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is added. Update
  the accessor functions to use a u64 instead.

  Acked-by: Mark Rutland <[email protected]>
  Signed-off-by: Rob Herring (Arm) <[email protected]>
  Tested-by: James Clark <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Will Deacon <[email protected]>
* Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next (Marc Zyngier, 2024-05-08, 1 file, -1/+1)

  * kvm-arm64/misc-6.10:
  : .
  : Misc fixes and updates targeting 6.10
  :
  : - Improve boot-time diagnostics when the sysreg tables are not
  :   correctly sorted
  :
  : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy
  :
  : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 writeable mask
  :
  : - Allocate PPIs and SGIs outside of the vcpu structure, allowing for
  :   smaller EL2 mapping and some flexibility in implementing more or
  :   less than 32 private IRQs.
  :
  : - Use bitmap_gather() instead of its open-coded equivalent
  :
  : - Make protected mode use hVHE if available
  :
  : - Purge stale mpidr_data if a vcpu is created after the MPIDR map has
  :   been created
  : .

  KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
  KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
  KVM: arm64: Fix hvhe/nvhe early alias parsing
  KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()
  KVM: arm64: vgic: Allocate private interrupts on demand
  KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX
  KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist
  KVM: arm64: Improve out-of-order sysreg table diagnostics

  Signed-off-by: Marc Zyngier <[email protected]>
| * KVM: arm64: vgic: Allocate private interrupts on demand (Marc Zyngier, 2024-05-03, 1 file, -1/+1)

  Private interrupts are currently part of the CPU interface structure
  that is part of each and every vcpu we create. Currently, we have 32 of
  them per vcpu, resulting in a per-vcpu array that is just shy of 4kB.
  On its own, that's no big deal, but it gets in the way of other things:

  - each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This
    requires memory that is physically contiguous. However, the EL2 code
    has no purpose looking at the interrupt structures and could do
    without them being mapped.

  - supporting features such as EPPIs, which extend the number of private
    interrupts past the 32 limit, would make the array even larger, even
    for VMs that do not use the EPPI feature.

  Address these issues by moving the private interrupt array outside of
  the vcpu, and replace it with a simple pointer. We take this
  opportunity to make it obvious what gets initialised when, as that path
  was remarkably opaque, and tighten the locking.

  Reviewed-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
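  A sketch of the layout change, with simplified stand-in types and an
  assumed on-demand allocation helper:

    #include <linux/errno.h>
    #include <linux/slab.h>

    #define VGIC_NR_PRIVATE_IRQS    32

    struct vgic_irq { unsigned int intid; /* ... */ };

    struct vgic_cpu {
        /* was: struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS]; */
        struct vgic_irq *private_irqs;
    };

    static int vgic_allocate_private_irqs(struct vgic_cpu *vgic_cpu)
    {
        if (vgic_cpu->private_irqs)
            return 0;       /* already allocated */

        vgic_cpu->private_irqs = kcalloc(VGIC_NR_PRIVATE_IRQS,
                                         sizeof(*vgic_cpu->private_irqs),
                                         GFP_KERNEL);
        return vgic_cpu->private_irqs ? 0 : -ENOMEM;
    }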
* | Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next (Marc Zyngier, 2024-05-03, 1 file, -1/+0)

  * kvm-arm64/pkvm-6.10: (25 commits)
  : .
  : At last, a bunch of pKVM patches, courtesy of Fuad Tabba.
  : From the cover letter:
  :
  : "This series is a bit of a bombay-mix of patches we've been
  : carrying. There's no one overarching theme, but they do improve
  : the code by fixing existing bugs in pKVM, refactoring code to
  : make it more readable and easier to re-use for pKVM, or adding
  : functionality to the existing pKVM code upstream."
  : .

  KVM: arm64: Force injection of a data abort on NISV MMIO exit
  KVM: arm64: Restrict supported capabilities for protected VMs
  KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap()
  KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst
  KVM: arm64: Rename firmware pseudo-register documentation file
  KVM: arm64: Reformat/beautify PTP hypercall documentation
  KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit
  KVM: arm64: Introduce and use predicates that check for protected VMs
  KVM: arm64: Add is_pkvm_initialized() helper
  KVM: arm64: Simplify vgic-v3 hypercalls
  KVM: arm64: Move setting the page as dirty out of the critical section
  KVM: arm64: Change kvm_handle_mmio_return() return polarity
  KVM: arm64: Fix comment for __pkvm_vcpu_init_traps()
  KVM: arm64: Prevent kmemleak from accessing .hyp.data
  KVM: arm64: Do not map the host fpsimd state to hyp in pKVM
  KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE
  KVM: arm64: Support TLB invalidation in guest context
  KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE
  KVM: arm64: Check for PTE validity when checking for executable/cacheable
  KVM: arm64: Avoid BUG-ing from the host abort path
  ...

  Signed-off-by: Marc Zyngier <[email protected]>
| * | KVM: arm64: Simplify vgic-v3 hypercalls (Marc Zyngier, 2024-05-01, 1 file, -1/+0)

  Consolidate the GICv3 VMCR accessor hypercalls into the APR
  save/restore hypercalls so that all of the EL2 GICv3 state is covered
  by a single pair of hypercalls.

  Signed-off-by: Fuad Tabba <[email protected]>
  Acked-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* | KVM: arm64: vgic-its: Get rid of the lpi_list_lock (Oliver Upton, 2024-04-25, 1 file, -3/+0)

  The last genuine use case for the lpi_list_lock was the global LPI
  translation cache, which has been removed in favor of a per-ITS xarray.
  Remove a layer from the locking puzzle by getting rid of it.

  vgic_add_lpi() still has a critical section that needs to protect
  against the insertion of other LPIs; change it to take the LPI xarray's
  xa_lock to retain this property.

  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* | KVM: arm64: vgic-its: Rip out the global translation cache (Oliver Upton, 2024-04-25, 1 file, -3/+0)

  The MSI injection fast path has been transitioned away from the global
  translation cache. Rip it out.

  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* | KVM: arm64: vgic-its: Maintain a translation cache per ITS (Oliver Upton, 2024-04-25, 1 file, -0/+6)

  Within the context of a single ITS, it is possible to use an xarray to
  cache the device ID & event ID translation to a particular irq
  descriptor. Take advantage of this to build a translation cache capable
  of fitting all valid translations for a given ITS.

  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
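  One plausible shape for such a cache packs (device ID, event ID) into a
  single xarray index; the key layout below is an assumption, and it
  presumes a 64-bit kernel:

    #include <linux/gfp.h>
    #include <linux/types.h>
    #include <linux/xarray.h>

    struct vgic_irq;    /* opaque for this sketch */

    static unsigned long its_cache_key(u32 devid, u32 eventid)
    {
        /* pack both IDs into one index; assumes 64-bit unsigned long */
        return ((unsigned long)devid << 32) | eventid;
    }

    static int its_cache_store(struct xarray *cache, u32 devid,
                               u32 eventid, struct vgic_irq *irq)
    {
        return xa_err(xa_store(cache, its_cache_key(devid, eventid),
                               irq, GFP_KERNEL));
    }

    static struct vgic_irq *its_cache_lookup(struct xarray *cache,
                                             u32 devid, u32 eventid)
    {
        return xa_load(cache, its_cache_key(devid, eventid));
    }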
* | KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list() (Oliver Upton, 2024-04-25, 1 file, -1/+0)

  The last user has been transitioned to walking the LPI xarray directly.
  Cut the wart off, and get rid of the now unneeded lpi_count while doing
  so.

  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
* | KVM: arm64: vgic-debug: Use an xarray mark for debug iterator (Oliver Upton, 2024-04-25, 1 file, -0/+2)

  The vgic debug iterator is the final user of vgic_copy_lpi_list(), but
  is a bit more complicated to transition to something else. Use a mark
  in the LPI xarray to record the indices 'known' to the debug iterator.
  Protect the LPIs from being freed by associating an additional
  reference with the xarray mark.

  Rework iter_next() to let the xarray walk 'drive' the iteration after
  visiting all of the SGIs, PPIs, and SPIs.

  Signed-off-by: Oliver Upton <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Marc Zyngier <[email protected]>
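  A sketch of the mark-based walk, assuming XA_MARK_1 is the mark
  reserved for the debug iterator:

    #include <linux/xarray.h>

    struct vgic_irq;    /* opaque for this sketch */

    static void debug_mark_lpi(struct xarray *lpis, unsigned long intid)
    {
        /* record the index; a reference is also taken (not shown) */
        xa_set_mark(lpis, intid, XA_MARK_1);
    }

    static void debug_walk_lpis(struct xarray *lpis)
    {
        struct vgic_irq *irq;
        unsigned long intid;

        /* the xarray walk 'drives' iteration over the marked LPIs */
        xa_for_each_marked(lpis, intid, irq, XA_MARK_1) {
            /* visit irq ... */
        }
    }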
* | KVM: arm64: Fix host-programmed guest events in nVHE (Oliver Upton, 2024-03-26, 1 file, -1/+1)

  Programming PMU events in the host that count during guest execution is
  a feature supported by perf, e.g.

    perf stat -e cpu_cycles:G ./lkvm run

  While this works for VHE, the guest/host event bitmaps are not carried
  through to the hypervisor in the nVHE configuration. Make
  kvm_pmu_update_vcpu_events() conditional on whether or not _hardware_
  supports PMUv3 rather than whether the vCPU has a vPMU enabled.

  Cc: [email protected]
  Fixes: 84d751a019a9 ("KVM: arm64: Pass pmu events to hyp via vcpu")
  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
* Merge branch kvm-arm64/lpi-xarray into kvmarm/next (Oliver Upton, 2024-03-07, 1 file, -4/+5)

  * kvm-arm64/lpi-xarray:
  : xarray-based representation of vgic LPIs
  :
  : KVM's linked-list of LPI state has proven to be a bottleneck in LPI
  : injection paths, due to lock serialization when acquiring / releasing
  : a reference on an IRQ.
  :
  : Start the tedious process of reworking KVM's LPI injection by
  : replacing the LPI linked-list with an xarray, leveraging this to
  : allow RCU readers to walk it outside of the spinlock.

  KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq()
  KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref
  KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi()
  KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner
  KVM: arm64: vgic: Use atomics to count LPIs
  KVM: arm64: vgic: Get rid of the LPI linked-list
  KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list()
  KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs
  KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi()
  KVM: arm64: vgic: Store LPIs in an xarray

  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner (Oliver Upton, 2024-02-23, 1 file, -0/+1)

  Free the vgic_irq structs in an RCU-safe manner to allow reads of the
  LPI configuration data to happen in parallel with the release of LPIs.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
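  A minimal sketch of the RCU-deferred free, with a stand-in struct:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct vgic_irq {
        struct rcu_head rcu;
        /* ... LPI configuration ... */
    };

    static void vgic_free_irq(struct vgic_irq *irq)
    {
        /* was kfree(irq); deferring the free past any in-flight RCU
         * readers lets lockless readers keep dereferencing the struct
         * until a grace period has elapsed */
        kfree_rcu(irq, rcu);
    }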
| * KVM: arm64: vgic: Use atomics to count LPIs (Oliver Upton, 2024-02-23, 1 file, -2/+2)

  Switch to using atomics for LPI accounting, allowing vgic_irq
  references to be dropped in parallel.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: vgic: Get rid of the LPI linked-list (Oliver Upton, 2024-02-23, 1 file, -2/+0)

  All readers of LPI configuration have been transitioned to use the LPI
  xarray. Get rid of the linked-list altogether.

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
| * KVM: arm64: vgic: Store LPIs in an xarray (Oliver Upton, 2024-02-23, 1 file, -0/+2)

  Using a linked-list for LPIs is less than ideal as it of course
  requires iterative searches to find a particular entry. An xarray is a
  better data structure for this use case, as it provides faster searches
  and can still handle a potentially sparse range of INTID allocations.

  Start by storing LPIs in an xarray, punting usage of the xarray to a
  subsequent change. The observant among you will notice that we added
  yet another lock to the chain of locking order rules; document the
  ordering of the xa_lock. Don't worry, we'll get rid of the
  lpi_list_lock one day...

  Reviewed-by: Marc Zyngier <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Oliver Upton <[email protected]>
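  A sketch of the xarray-backed storage, indexed by INTID; the names are
  illustrative stand-ins for the KVM structures:

    #include <linux/gfp.h>
    #include <linux/types.h>
    #include <linux/xarray.h>

    struct vgic_irq { u32 intid; /* ... */ };

    static DEFINE_XARRAY(lpi_xa);   /* sparse INTID space maps well */

    static int lpi_add(struct vgic_irq *irq)
    {
        return xa_err(xa_store(&lpi_xa, irq->intid, irq, GFP_KERNEL));
    }

    static struct vgic_irq *lpi_find(u32 intid)
    {
        return xa_load(&lpi_xa, intid);
    }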