aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'amd-drm-fixes-6.17-2025-08-28' of ↵Dave Airlie2025-08-284-9/+18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.17-2025-08-28: amdgpu: - UserQ fixes - Revert CSA fix - SR-IOV fix Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
| * drm/amdgpu/userq: fix error handling of invalid doorbellAlex Deucher2025-08-271-0/+1
| | | | | | | | | | | | | | | | | | | | If the doorbell is invalid, be sure to set the r to an error state so the function returns an error. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7e2a5b0a9a165a7c51274aa01b18be29491b4345) Cc: [email protected]
| * drm/amdgpu: update firmware version checks for user queue supportJesse.Zhang2025-08-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The minimum firmware versions required for user queue functionality have been increased to address an issue where the queue privilege state was lost during queue connect operations. The problem occurred because the privilege state was being restored to its initial value at the beginning of the function, overwriting the state that was properly set during the queue connect case. This commit updates the minimum version requirements: - ME firmware from 2390 to 2420 - PFP firmware from 2530 to 2580 - MEC firmware from 2600 to 2650 - MES firmware remains at 120 These updated firmware versions contain the necessary fixes to properly maintain queue privilege state throughout connect operations. Fixes: 61ca97e9590c ("drm/amdgpu: Add fw minimum version check for usermode queue") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5f976c9939f0d5916d2b8ef3156a6d1799781df1) Cc: [email protected]
| * Revert "drm/amdgpu: fix incorrect vm flags to map bo"Alex Deucher2025-08-271-2/+2
| | | | | | | | | | | | | | | | | | | | This reverts commit b08425fa77ad2f305fe57a33dceb456be03b653f. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit be33e8a239aac204d7e9e673c4220ef244eb1ba3)
| * drm/amdgpu/gfx12: set MQD as appriopriate for queue typesAlex Deucher2025-08-271-2/+6
| | | | | | | | | | | | | | | | | | | | Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7b9110f2897957efd9715b52fc01986509729db3) Cc: [email protected]
| * drm/amdgpu/gfx11: set MQD as appriopriate for queue typesAlex Deucher2025-08-271-2/+6
| | | | | | | | | | | | | | | | | | | | Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 063d6683208722b1875f888a45084e3d112701ac) Cc: [email protected]
* | drm/amdgpu: Pin buffers while vmap'ing exported dma-buf objectsThomas Zimmermann2025-08-221-2/+32
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed. For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEN can now import dma-buf objects without creating the S/G table, so the pin is missing. Leads to page-fault errors, such as the one shown below. [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000 [...] [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] [...] [ 102.243250] Call Trace: [ 102.245695] <TASK> [ 102.2477V95] ? validate_chain+0x24e/0x5e0 [ 102.251805] ? __lock_acquire+0x568/0xae0 [ 102.255807] udl_render_hline+0x165/0x341 [udl] [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl] [ 102.265379] ? local_clock_noinstr+0xb/0x100 [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0 [ 102.274246] ? mark_held_locks+0x40/0x70 [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl] [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl] [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130 [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50 [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0 [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200 [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0 [ 102.333622] commit_tail+0x204/0x330 [...] [ 102.529946] ---[ end trace 0000000000000000 ]--- [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] In this stack strace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping. Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping it's buffer's pages into kernel address space. Do the opposite in vunmap. Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code. A discussion of various approaches to solving the problem is available at [1]. v3: - try (GTT | VRAM); drop CPU domain (Christian) v2: - only use mapable domains (Christian) - try pinning to domains in preferred order Signed-off-by: Thomas Zimmermann <[email protected]> Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table") Reported-by: Thomas Zimmermann <[email protected]> Closes: https://lore.kernel.org/dri-devel/[email protected]/ Cc: Shixiong Ou <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: David Airlie <[email protected]> Cc: Simona Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: "Christian König" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/dri-devel/[email protected]/ # [1] Reviewed-by: Christian König <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard2025-08-205-11/+33
|\ | | | | | | | | | | Update drm-misc-fixes to -rc2. Signed-off-by: Maxime Ripard <[email protected]>
| * drm/amdgpu: fix task hang from failed job submission during process killLiu01 Tong2025-08-122-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout. Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.") Signed-off-by: Liu01 Tong <[email protected]> Signed-off-by: Lin.Cao <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
| * drm/amdgpu: fix incorrect vm flags to map boJack Xiao2025-08-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f)
| * drm/amdgpu: fix vram reservation issueYiPeng Chai2025-08-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1)
| * drm/amdgpu: Add PSP fw version check for fw reserve GFX commandFrank Min2025-08-121-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: a3b7f9c306e1 ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 065e23170a1e09bc9104b761183e59562a029619)
* | Revert "drm/amdgpu: Use dma_buf from GEM object instance"Thomas Zimmermann2025-08-133-3/+4
|/ | | | | | | | | | | | | | | | | | | | | | This reverts commit 515986100d176663d0a03219a3056e4252f729e6. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Simona Vetter <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2025-08-089-74/+189
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...
| * drm/amdgpu: add missing vram lost check for LEGACY RESETAlex Deucher2025-08-061-0/+1
| | | | | | | | | | | | | | | | | | | | Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit aae94897b6661a2a4b1de2d328090fc388b3e0af) Cc: [email protected]
| * drm/amdgpu/discovery: fix fw based ip discoveryAlex Deucher2025-08-062-36/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: [email protected]
| * amdgpu/amdgpu_discovery: increase timeout limit for IFWI initXaver Hugl2025-08-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2) Cc: [email protected]
| * drm/amdgpu: Update SDMA firmware version check for user queue supportJesse.Zhang2025-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 92e2449241516c95aab95eea91faecd0fa2b7ed5) Cc: [email protected]
| * drm/amdgpu: Add NULL check for asic_funcsLijo Lazar2025-08-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 582bf7c5158dce16f7dc5b8345b7876bd8031224) Cc: [email protected]
| * drm/amdgpu: update mmhub 3.3 client id mappingsAlex Deucher2025-08-041-1/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit e932f4779a2d329841bb9ca70bb80a4bb2d707b6) Cc: [email protected]
| * drm/amdgpu: update mmhub 3.0.1 client id mappingsAlex Deucher2025-08-041-25/+32
| | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: [email protected]
| * drm/amdgpu: Retain job->vm in amdgpu_job_prepare_jobYuanShang2025-08-041-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ed76936c6b10b547c6df4ca75412331e9ef6d339) Cc: [email protected]
| * drm/amdgpu: Update external revid for GC v9.5.0Lijo Lazar2025-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325) Cc: [email protected]
| * drm/amdgpu: Update supported modes for GC v9.5.0Lijo Lazar2025-08-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in NPS2 mode. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Mangesh Gadre <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148) Cc: [email protected]
* | Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2025-08-019-35/+43
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Just a bunch of amdgpu and xe fixes. amdgpu: - DSC divide by 0 fix - clang fix - DC debugfs fix - Userq fixes - Avoid extra evict-restore with KFD - Backlight fix - Documentation fix - RAS fix - Add new kicker handling - DSC fix for DCN 3.1.4 - PSR fix - Atomic fix - DC reset fixes - DCN 3.0.1 fix - MMHUB client mapping fix xe: - Fix BMG probe on unsupported mailbox command - Fix OA static checker warning about null gt - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter - Fix missing unwind goto in GuC/HuC - Don't register I2C devices if VF - Clear whole GuC g2h_fence during initialization - Avoid call kfree for drmm_kzalloc - Fix pci_dev reference leak on configfs - SRIOV: Disable CSC support on VF * tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe/vf: Disable CSC support on VF drm/amdgpu: update mmhub 4.1.0 client id mappings drm/amd/display: Allow DCN301 to clear update flags drm/amd/display: Pass up errors for reset GPU that fails to init HW drm/amd/display: Only finalize atomic_obj if it was initialized drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported drm/amd/display: Disable dsc_power_gate for dcn314 by default drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c' drm/amd/display: fix initial backlight brightness calculation drm/amdgpu: Avoid extra evict-restore process. drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() drm/amd/display: Fix divide by zero when calculating min ODM factor drm/xe/configfs: Fix pci_dev reference leak drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() drm/xe/guc: Clear whole g2h_fence during initialization drm/xe/vf: Don't register I2C devices if VF ...
| * drm/amdgpu: update mmhub 4.1.0 client id mappingsAlex Deucher2025-07-281-21/+13
| | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <[email protected]> Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14Frank Min2025-07-284-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Add kicker firmwares loading for gfx12/smu14/psp14 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_12_0_1_rlc_kicker.bin - gc_12_0_1_imu_kicker.bin - psp_14_0_3_sos_kicker.bin - psp_14_0_3_ta_kicker.bin - smu_14_0_3_kicker.bin Signed-off-by: Frank Min <[email protected]> Reviewed-by: Gui Chengming <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr accessYang Wang2025-07-281-2/+4
| | | | | | | | | | | | | | | | | | | | Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution. Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amdgpu: Avoid extra evict-restore process.Gang Ba2025-07-281-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <[email protected]> Signed-off-by: Gang Ba <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_propAlex Deucher2025-07-282-0/+2
| | | | | | | | | | | | | | | | | | | | Used to to set the MQD appropriately for each queue type. Kernel queues have additional privileges. Acked-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.16.x
| * drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()Nathan Chancellor2025-07-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a recent change in clang to expose uninitialized warnings from const variables and pointers [1], there is a warning in imu_v12_0_program_rlc_ram() because data is passed uninitialized to program_imu_rlc_ram(): drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized] 374 | program_imu_rlc_ram(adev, data, (const u32)size); | ^~~~ As this warning happens early in clang's frontend, it does not realize that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is never actually called, and even if it were, data would not be dereferenced because size is 0. Just initialize data to NULL to silence the warning, as the commit that added program_imu_rlc_ram() mentioned it would eventually be used over the old method, at which point data can be properly initialized and used. Cc: [email protected] Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* | Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2025-07-31120-1869/+3193
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
| * Merge tag 'amd-drm-next-6.17-2025-07-17' of ↵Dave Airlie2025-07-2152-374/+910
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-17: amdgpu: - Partition fixes - Reset fixes - RAS fixes - i2c fix - MPC updates - DSC cleanup - EDID fixes - Display idle D3 update - IPS updates - DMUB updates - Retimer fix - Replay fixes - Fix DC memory leak - Initial support for smartmux - DCN 4.0.1 degamma LUT fix - Per queue reset cleanups - Track ring state associated with a fence - SR-IOV fixes - SMU fixes - Per queue reset improvements for GC 9+ compute - Per queue reset improvements for GC 10+ gfx - Per queue reset improvements for SDMA 5+ - Per queue reset improvements for JPEG 2+ - Per queue reset improvements for VCN 2+ - GC 8 fix - ISP updates amdkfd: - Enable KFD on LoongArch radeon: - Drop console lock during suspend/resume UAPI: - Add userq slot info to INFO IOCTL Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html) From: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dave Airlie <[email protected]>
| | * drm/amdgpu: move reset support type checks into the callerAlex Deucher2025-07-1725-79/+37
| | | | | | | | | | | | | | | | | | | | | | | | Rather than checking in the callbacks, check if the reset type is supported in the caller. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/sdma7: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/sdma6: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/sdma5.2: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/sdma5: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx12: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-31/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx11: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-31/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx10: re-emit unprocessed state on ring resetAlex Deucher2025-07-171-31/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq resetAlex Deucher2025-07-171-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx9: re-emit unprocessed state on kcq resetAlex Deucher2025-07-171-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Add WARN_ON to the resource clear functionArunpravin Paneer Selvam2025-07-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the dirty bit when the memory resource is not cleared during BO release. v2(Christian): - Drop the cleared flag set to false. - Improve the amdgpu_vram_mgr_set_clear_state() function. v3: - Add back the resource clear flag set function call after being cleared during eviction (Christian). - Modified the patch subject name. Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Replace HQD terminology with slots namingJesse Zhang2025-07-161-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Add user queue instance count in HW IP infoJesse Zhang2025-07-161-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amd/amdgpu: Add helper functions for isp buffersPratap Nirujogi2025-07-163-10/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accessing amdgpu internal data structures "struct amdgpu_device" and "struct amdgpu_bo" in ISP V4L2 driver to alloc/free GART buffers is not recommended. Add new amdgpu_isp helper functions that takes opaque params from ISP V4L2 driver and calls the amdgpu internal functions amdgpu_bo_create_isp_user() and amdgpu_bo_create_kernel() to alloc/free GART buffers. Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Pratap Nirujogi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amd/amdgpu: Initialize swnode for ISP MFD devicePratap Nirujogi2025-07-163-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create amd_isp_capture MFD device with swnode initialized to isp specific software_node part of fwnode graph in amd_isp4 x86/platform driver. The isp driver use this swnode handle to retrieve the critical properties (data-lanes, mipi phyid, link-frequencies etc.) required for camera to work on AMD ISP4 based targets. Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Pratap Nirujogi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen2025-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/jpeg: clean up reset type handlingAlex Deucher2025-07-168-26/+29
| | | | | | | | | | | | | | | | | | | | | | | | Make the handling consistent with other IPs and across JPEG versions. Reviewed-by: Sathishkumar S <[email protected]> Signed-off-by: Alex Deucher <[email protected]>