path: root/drivers/gpu/drm/amd/amdgpu
Commit message | Author | Age | Files | Lines
* drm/amdgpu: suspend KFD and KGD user queues for S0ix | Alex Deucher | 2025-09-18 | 1 | -14/+10
| | | | | | | | | | | We need to make sure the user queues are preempted so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <[email protected]> Tested-by: David Perry <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit f8b367e6fa1716cab7cc232b9e3dff29187fc99d) Cc: [email protected]
* drm/amdkfd: add proper handling for S0ix | Alex Deucher | 2025-09-18 | 2 | -4/+24
| | | | | | | | | | | When in S0i3, the GFX state is retained, so all we need to do is stop the runlist so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <[email protected]> Tested-by: David Perry <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1) Cc: [email protected]
* drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs | Srinivasan Shanmugam | 2025-09-15 | 1 | -0/+15
| | | | | | | | | | | | | | | | | | | | | | Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to ensure data isolation among GPU tasks. The cleaner shader is tasked with clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid data leakage and guarantees the accuracy of computational results. This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Wasee Alam <[email protected]> Cc: Mario Sopena-Novales <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 0a71ceb27f88a944c2de2808b67b2f46ac75076b)
* drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time | David Rosca | 2025-09-09 | 2 | -8/+16
| | | | | | | | | | | | There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5) Cc: [email protected]
* drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages | David Rosca | 2025-09-09 | 1 | -29/+19
| | | | | | | | | | | | | There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc) Cc: [email protected]
* drm/amd/amdgpu: Declare isp firmware binary file | Pratap Nirujogi | 2025-09-09 | 1 | -0/+2
| | | | | | | | | | | Declare isp firmware file isp_4_1_1.bin required by isp4.1.1 device. Suggested-by: Alexey Zagorodnikov <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Pratap Nirujogi <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit d97b74a833eba1f4f69f67198fd98ef036c0e5f9) Cc: [email protected]
* drm/amdgpu: fix a memory leak in fence cleanup when unloading | Alex Deucher | 2025-09-09 | 1 | -2/+0
| Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called. After that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op, as the sched entities were never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <[email protected]> Cc: Vitaly Prosyak <[email protected]> Cc: Christian König <[email protected]> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5) Cc: [email protected]
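The failure mode described above can be modelled in a few lines of plain C. This is a simplified sketch, not the amdgpu code: `struct ring`, `struct device`, `early_fini()` and `late_fini()` are hypothetical names, and the model only assumes that the later cleanup stage skips NULL ring slots.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RINGS 4

/* Hypothetical ring owning one scheduler entity that must be released. */
struct ring {
	bool entity_live;
};

struct device {
	struct ring *rings[MAX_RINGS];
};

/* Earlier teardown stage; optionally clears the ring pointers. */
static void early_fini(struct device *dev, bool clear_pointers)
{
	for (size_t i = 0; i < MAX_RINGS; i++) {
		if (clear_pointers)
			dev->rings[i] = NULL;
	}
}

/* Later teardown stage; returns how many entities it released. */
static int late_fini(struct device *dev)
{
	int released = 0;

	for (size_t i = 0; i < MAX_RINGS; i++) {
		struct ring *ring = dev->rings[i];

		if (!ring)	/* empty slot: nothing is released */
			continue;
		if (ring->entity_live) {
			ring->entity_live = false;
			released++;
		}
	}
	return released;
}
```

When the earlier stage clears the pointers, `late_fini()` releases nothing and the entities leak; leaving the pointers in place lets the later stage do its work.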
* amd/amdkfd: correct mem limit calculation for small APUs | Yifan Zhang | 2025-09-09 | 1 | -12/+32
| Current mem limit check leaks some GTT memory (reserved_for_pt + reserved_for_ras + adev->vram_pin_size) for small APUs.
|
| Since carveout VRAM is tunable on APUs, there are three cases regarding the carveout VRAM size relative to GTT:
|
| 1. 0 < carveout < gtt: apu_prefer_gtt = true, is_app_apu = false
| 2. carveout > gtt / 2: apu_prefer_gtt = false, is_app_apu = false
| 3. 0 = carveout: apu_prefer_gtt = true, is_app_apu = true
|
| It doesn't make sense to check the limitation below in case 1 (default case, small carveout) because the values in the expression are a mix of carveout and gtt quantities:
|
|   adev->kfd.vram_used[xcp_id] + vram_needed > vram_size - reserved_for_pt - reserved_for_ras - atomic64_read(&adev->vram_pin_size)
|
| gtt: kfd.vram_used, vram_needed, vram_size
| carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size
|
| In case 1, vram allocation will go to the gtt domain, so skip the vram check, since the ttm_mem_limit check already covers this allocation.
|
| Signed-off-by: Yifan Zhang <[email protected]>
| Reviewed-by: Mario Limonciello <[email protected]>
| Signed-off-by: Alex Deucher <[email protected]>
| (cherry picked from commit fa7c99f04f6dd299388e9282812b14e95558ac8e)
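A minimal sketch of the decision this message describes, written as standalone C rather than kernel code. `struct apu_mem` and `vram_limit_exceeded()` are hypothetical names; the field names follow the message, and the sketch assumes the three reservations together never exceed `vram_size`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the values named in the commit message. */
struct apu_mem {
	bool apu_prefer_gtt;
	bool is_app_apu;
	uint64_t vram_used;
	uint64_t vram_size;
	uint64_t reserved_for_pt;
	uint64_t reserved_for_ras;
	uint64_t vram_pin_size;
};

/* Returns true when a request for vram_needed bytes should be refused. */
static bool vram_limit_exceeded(const struct apu_mem *m, uint64_t vram_needed)
{
	/*
	 * Case 1 (small carveout): the allocation lands in GTT, so the
	 * VRAM comparison would mix GTT and carveout quantities. Skip it.
	 */
	if (m->apu_prefer_gtt && !m->is_app_apu)
		return false;

	/* Assumes the reservations never exceed vram_size (no underflow). */
	return m->vram_used + vram_needed >
	       m->vram_size - m->reserved_for_pt - m->reserved_for_ras -
	       m->vram_pin_size;
}
```

In case 1 the function defers entirely to whatever GTT-side limit the caller enforces; the other cases keep the original comparison.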
* drm/amdgpu: Wait for bootloader after PSPv11 reset | Lijo Lazar | 2025-09-08 | 1 | -15/+4
| Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead of checking for C2PMSG_33 status, add the callback wait_for_bootloader. Waiting for the bootloader to return to steady state is already part of the generic mode-1 reset flow. Increase the retry count for the bootloader wait and also fix the mask to prevent a false pass. Fixes: 8345a71fc54b ("drm/amdgpu: Add more checks to PSP mailbox") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4531 Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 32f73741d6ee41fd5db8791c1163931e313d0fdc)
* drm/amd/amdgpu: Fix missing error return on kzalloc failure | Colin Ian King | 2025-09-03 | 1 | -1/+1
| Currently the kzalloc failure check just reports the failure and sets the variable ret to -ENOMEM, which is not checked later for this specific error. Fix this by just returning -ENOMEM rather than setting ret. Fixes: 4fb930715468 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 1ee9d1a0962c13ba5ab7e47d33a80e3b8dc4b52e)
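The difference between recording an allocation error and returning it can be shown with a small userspace analogue. This is only a sketch: `alloc_zeroed()` stands in for `kzalloc()`, and `struct cmd_buf`, `do_work()`, `submit_buggy()` and `submit_fixed()` are hypothetical names, not functions from the driver.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical command buffer; not the real amdgpu type. */
struct cmd_buf {
	int payload[16];
};

/* Test hook that forces the allocation to fail. */
static int force_alloc_failure;

/* Userspace analogue of kzalloc(): zeroed memory or NULL. */
static void *alloc_zeroed(size_t size)
{
	if (force_alloc_failure)
		return NULL;
	return calloc(1, size);
}

/* Later step that reports its own status. */
static int do_work(struct cmd_buf *cmd)
{
	if (cmd)
		cmd->payload[0] = 1;
	return 0;
}

/* Buggy shape: the error is stored in ret, then overwritten. */
static int submit_buggy(void)
{
	int ret = 0;
	struct cmd_buf *cmd = alloc_zeroed(sizeof(*cmd));

	if (!cmd)
		ret = -ENOMEM;	/* recorded, but nothing acts on it */

	ret = do_work(cmd);	/* overwrites the earlier error */
	free(cmd);
	return ret;
}

/* Fixed shape: bail out as soon as the allocation fails. */
static int submit_fixed(void)
{
	int ret;
	struct cmd_buf *cmd = alloc_zeroed(sizeof(*cmd));

	if (!cmd)
		return -ENOMEM;

	ret = do_work(cmd);
	free(cmd);
	return ret;
}
```

With the allocation forced to fail, `submit_buggy()` reports success while `submit_fixed()` propagates `-ENOMEM` to the caller.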
* drm/amdgpu: drop hw access in non-DC audio fini | Alex Deucher | 2025-08-29 | 4 | -20/+0
| | | | | | | | | | | We already disable the audio pins in hw_fini so there is no need to do it again in sw_fini. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481 Cc: oushixiong <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5eeb16ca727f11278b2917fd4311a7d7efb0bbd6) Cc: [email protected]
* drm/amdgpu/mes11: make MES_MISC_OP_CHANGE_CONFIG failure non-fatal | Alex Deucher | 2025-08-29 | 1 | -2/+3
| | | | | | | | | | | | If the firmware is too old, just warn and return success. Fixes: 27b791514789 ("drm/amdgpu/mes: keep enforce isolation up to date") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4414 Cc: [email protected] Reviewed-by: Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9f28af76fab0948b59673f69c10aeec47de11c60) Cc: [email protected]
* drm/amdgpu/sdma: bump firmware version checks for user queue support | Jesse.Zhang | 2025-08-29 | 1 | -3/+3
| Using the previous firmware could lead to problems with PROTECTED_FENCE_SIGNAL commands, specifically causing register conflicts between MCU_DBG0 and MCU_DBG1. The updated firmware versions ensure proper alignment and unification of the SDMA_SUBOP_PROTECTED_FENCE_SIGNAL value with SDMA 7.x, resolving these hardware coordination issues. Fixes: e8cca30d8b34 ("drm/amdgpu/sdma6: add ucode version checks for userq support") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit aab8b689aded255425db3d80c0030d1ba02fe2ef) Cc: [email protected]
* Merge tag 'amd-drm-fixes-6.17-2025-08-28' of ↵ | Dave Airlie | 2025-08-28 | 4 | -9/+18
|\
| | https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
| |
| | amd-drm-fixes-6.17-2025-08-28:
| |
| | amdgpu:
| | - UserQ fixes
| | - Revert CSA fix
| | - SR-IOV fix
| |
| | Signed-off-by: Dave Airlie <[email protected]>
| | From: Alex Deucher <[email protected]>
| | Link: https://lore.kernel.org/r/[email protected]
| * drm/amdgpu/userq: fix error handling of invalid doorbell | Alex Deucher | 2025-08-27 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | If the doorbell is invalid, be sure to set the r to an error state so the function returns an error. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7e2a5b0a9a165a7c51274aa01b18be29491b4345) Cc: [email protected]
| * drm/amdgpu: update firmware version checks for user queue support | Jesse.Zhang | 2025-08-27 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The minimum firmware versions required for user queue functionality have been increased to address an issue where the queue privilege state was lost during queue connect operations. The problem occurred because the privilege state was being restored to its initial value at the beginning of the function, overwriting the state that was properly set during the queue connect case. This commit updates the minimum version requirements: - ME firmware from 2390 to 2420 - PFP firmware from 2530 to 2580 - MEC firmware from 2600 to 2650 - MES firmware remains at 120 These updated firmware versions contain the necessary fixes to properly maintain queue privilege state throughout connect operations. Fixes: 61ca97e9590c ("drm/amdgpu: Add fw minimum version check for usermode queue") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5f976c9939f0d5916d2b8ef3156a6d1799781df1) Cc: [email protected]
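A minimum-version gate of the kind this message describes might look like the sketch below. The thresholds are the ones listed in the message; `struct fw_versions`, `userq_min_fw` and `userq_fw_supported()` are hypothetical names, and treating every version as a plain integer compared with `>=` is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the firmware versions named in the message. */
struct fw_versions {
	uint32_t me;
	uint32_t pfp;
	uint32_t mec;
	uint32_t mes;
};

/* Minimum versions after this change, as listed in the commit message. */
static const struct fw_versions userq_min_fw = {
	.me  = 2420,
	.pfp = 2580,
	.mec = 2650,
	.mes = 120,
};

/* User queues are enabled only if every component meets its minimum. */
static bool userq_fw_supported(const struct fw_versions *fw)
{
	return fw->me  >= userq_min_fw.me  &&
	       fw->pfp >= userq_min_fw.pfp &&
	       fw->mec >= userq_min_fw.mec &&
	       fw->mes >= userq_min_fw.mes;
}
```

Firmware at the previous minimums (ME 2390, PFP 2530, MEC 2600) fails this gate, which is the intended effect of raising the thresholds.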
| * Revert "drm/amdgpu: fix incorrect vm flags to map bo" | Alex Deucher | 2025-08-27 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | | | This reverts commit b08425fa77ad2f305fe57a33dceb456be03b653f. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit be33e8a239aac204d7e9e673c4220ef244eb1ba3)
| * drm/amdgpu/gfx12: set MQD as appriopriate for queue types | Alex Deucher | 2025-08-27 | 1 | -2/+6
| | | | | | | | | | | | | | | | | | | | Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7b9110f2897957efd9715b52fc01986509729db3) Cc: [email protected]
| * drm/amdgpu/gfx11: set MQD as appriopriate for queue types | Alex Deucher | 2025-08-27 | 1 | -2/+6
| | | | | | | | | | | | | | | | | | | | Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 063d6683208722b1875f888a45084e3d112701ac) Cc: [email protected]
* | drm/amdgpu: Pin buffers while vmap'ing exported dma-buf objects | Thomas Zimmermann | 2025-08-22 | 1 | -2/+32
|/
| Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed.
|
| For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEM can now import dma-buf objects without creating the S/G table, so the pin is missing. This leads to page-fault errors, such as the one shown below.
|
| [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000
| [...]
| [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]
| [...]
| [ 102.243250] Call Trace:
| [ 102.245695] <TASK>
| [ 102.247795] ? validate_chain+0x24e/0x5e0
| [ 102.251805] ? __lock_acquire+0x568/0xae0
| [ 102.255807] udl_render_hline+0x165/0x341 [udl]
| [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl]
| [ 102.265379] ? local_clock_noinstr+0xb/0x100
| [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0
| [ 102.274246] ? mark_held_locks+0x40/0x70
| [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl]
| [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl]
| [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
| [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130
| [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50
| [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0
| [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200
| [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030
| [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030
| [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0
| [ 102.333622] commit_tail+0x204/0x330
| [...]
| [ 102.529946] ---[ end trace 0000000000000000 ]---
| [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]
|
| In this stack trace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping.
|
| Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping its buffer's pages into kernel address space. Do the opposite in vunmap.
|
| Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code.
|
| A discussion of various approaches to solving the problem is available at [1].
|
| v3:
| - try (GTT | VRAM); drop CPU domain (Christian)
|
| v2:
| - only use mappable domains (Christian)
| - try pinning to domains in preferred order
|
| Signed-off-by: Thomas Zimmermann <[email protected]>
| Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table")
| Reported-by: Thomas Zimmermann <[email protected]>
| Closes: https://lore.kernel.org/dri-devel/[email protected]/
| Cc: Shixiong Ou <[email protected]>
| Cc: Thomas Zimmermann <[email protected]>
| Cc: Maarten Lankhorst <[email protected]>
| Cc: Maxime Ripard <[email protected]>
| Cc: David Airlie <[email protected]>
| Cc: Simona Vetter <[email protected]>
| Cc: Sumit Semwal <[email protected]>
| Cc: "Christian König" <[email protected]>
| Cc: [email protected]
| Cc: [email protected]
| Cc: [email protected]
| Link: https://lore.kernel.org/dri-devel/[email protected]/ # [1]
| Reviewed-by: Christian König <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]
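The pin-then-map ordering described in this message can be illustrated with a toy model. It is a sketch under stated assumptions, not the amdgpu implementation: `struct buffer`, `map_pages()`, `export_vmap()`, `export_vunmap()` and `buffer_can_move()` are hypothetical, and pinning is reduced to a counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer object; pinning is modelled as a simple counter. */
struct buffer {
	int pin_count;
	bool mapped;
	bool fail_map;	/* test hook: make the mapping step fail */
};

/* A buffer may be relocated only while nobody holds a pin on it. */
static bool buffer_can_move(const struct buffer *buf)
{
	return buf->pin_count == 0;
}

/* Stand-in for mapping the pages into kernel address space. */
static int map_pages(struct buffer *buf)
{
	if (buf->fail_map)
		return -ENOMEM;
	buf->mapped = true;
	return 0;
}

/* Exporter-side vmap: pin first so the mapping cannot go stale. */
static int export_vmap(struct buffer *buf)
{
	int ret;

	buf->pin_count++;
	ret = map_pages(buf);
	if (ret)
		buf->pin_count--;	/* undo the pin if mapping failed */
	return ret;
}

/* Exporter-side vunmap: the reverse order, unmap and then unpin. */
static void export_vunmap(struct buffer *buf)
{
	buf->mapped = false;
	buf->pin_count--;
}
```

Between `export_vmap()` and `export_vunmap()` the buffer cannot move, which is the guarantee the importer relies on; the failure path drops the pin so an unmapped buffer is never left pinned.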
* Merge drm/drm-fixes into drm-misc-fixes | Maxime Ripard | 2025-08-20 | 5 | -11/+33
|\ | | | | | | | | | | Update drm-misc-fixes to -rc2. Signed-off-by: Maxime Ripard <[email protected]>
| * drm/amdgpu: fix task hang from failed job submission during process kill | Liu01 Tong | 2025-08-12 | 2 | -4/+14
| | During process kill, drm_sched_entity_flush() will kill the vm entities. Subsequent job submissions from this process will fail, and the resources of these jobs are not released, nor are the fences signalled, causing tasks to hang and time out. Fix by checking the entity status in amdgpu_vm_ready() and avoiding job submission to a stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.") Signed-off-by: Liu01 Tong <[email protected]> Signed-off-by: Lin.Cao <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
| * drm/amdgpu: fix incorrect vm flags to map bo | Jack Xiao | 2025-08-12 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | | | | | It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f)
| * drm/amdgpu: fix vram reservation issue | YiPeng Chai | 2025-08-12 | 1 | -2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1)
| * drm/amdgpu: Add PSP fw version check for fw reserve GFX command | Frank Min | 2025-08-12 | 1 | -3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: a3b7f9c306e1 ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 065e23170a1e09bc9104b761183e59562a029619)
* | Revert "drm/amdgpu: Use dma_buf from GEM object instance" | Thomas Zimmermann | 2025-08-13 | 3 | -3/+4
|/ | | | | | | | | | | | | | | | | | | | | | This reverts commit 515986100d176663d0a03219a3056e4252f729e6. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Simona Vetter <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel | Linus Torvalds | 2025-08-08 | 9 | -74/+189
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...
| * drm/amdgpu: add missing vram lost check for LEGACY RESET | Alex Deucher | 2025-08-06 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit aae94897b6661a2a4b1de2d328090fc388b3e0af) Cc: [email protected]
| * drm/amdgpu/discovery: fix fw based ip discovery | Alex Deucher | 2025-08-06 | 2 | -36/+41
| | We only need the fw based discovery table for sysfs. No need to parse it. Additionally, parsing some of the board specific tables may result in incorrect data on some boards. Just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: [email protected]
| * amdgpu/amdgpu_discovery: increase timeout limit for IFWI init | Xaver Hugl | 2025-08-06 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | | | | | With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2) Cc: [email protected]
| * drm/amdgpu: Update SDMA firmware version check for user queue support | Jesse.Zhang | 2025-08-04 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 92e2449241516c95aab95eea91faecd0fa2b7ed5) Cc: [email protected]
| * drm/amdgpu: Add NULL check for asic_funcs | Lijo Lazar | 2025-08-04 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 582bf7c5158dce16f7dc5b8345b7876bd8031224) Cc: [email protected]
| * drm/amdgpu: update mmhub 3.3 client id mappings | Alex Deucher | 2025-08-04 | 1 | -1/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit e932f4779a2d329841bb9ca70bb80a4bb2d707b6) Cc: [email protected]
| * drm/amdgpu: update mmhub 3.0.1 client id mappings | Alex Deucher | 2025-08-04 | 1 | -25/+32
| | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: [email protected]
| * drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job | YuanShang | 2025-08-04 | 1 | -7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ed76936c6b10b547c6df4ca75412331e9ef6d339) Cc: [email protected]
| * drm/amdgpu: Update external revid for GC v9.5.0 | Lijo Lazar | 2025-08-04 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325) Cc: [email protected]
| * drm/amdgpu: Update supported modes for GC v9.5.0 | Lijo Lazar | 2025-08-04 | 1 | -1/+4
| | | | | | | | | | | | | | | | | | | | | | | | For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in NPS2 mode. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Mangesh Gadre <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148) Cc: [email protected]
* | Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel | Linus Torvalds | 2025-08-01 | 9 | -35/+43
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Just a bunch of amdgpu and xe fixes. amdgpu: - DSC divide by 0 fix - clang fix - DC debugfs fix - Userq fixes - Avoid extra evict-restore with KFD - Backlight fix - Documentation fix - RAS fix - Add new kicker handling - DSC fix for DCN 3.1.4 - PSR fix - Atomic fix - DC reset fixes - DCN 3.0.1 fix - MMHUB client mapping fix xe: - Fix BMG probe on unsupported mailbox command - Fix OA static checker warning about null gt - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter - Fix missing unwind goto in GuC/HuC - Don't register I2C devices if VF - Clear whole GuC g2h_fence during initialization - Avoid call kfree for drmm_kzalloc - Fix pci_dev reference leak on configfs - SRIOV: Disable CSC support on VF * tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe/vf: Disable CSC support on VF drm/amdgpu: update mmhub 4.1.0 client id mappings drm/amd/display: Allow DCN301 to clear update flags drm/amd/display: Pass up errors for reset GPU that fails to init HW drm/amd/display: Only finalize atomic_obj if it was initialized drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported drm/amd/display: Disable dsc_power_gate for dcn314 by default drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c' drm/amd/display: fix initial backlight brightness calculation drm/amdgpu: Avoid extra evict-restore process. 
drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() drm/amd/display: Fix divide by zero when calculating min ODM factor drm/xe/configfs: Fix pci_dev reference leak drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() drm/xe/guc: Clear whole g2h_fence during initialization drm/xe/vf: Don't register I2C devices if VF ...
| * drm/amdgpu: update mmhub 4.1.0 client id mappings | Alex Deucher | 2025-07-28 | 1 | -21/+13
| | | | | | | | | | | | | | | | | | | | Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <[email protected]> Reviewed-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 | Frank Min | 2025-07-28 | 4 | -7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Add kicker firmwares loading for gfx12/smu14/psp14 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_12_0_1_rlc_kicker.bin - gc_12_0_1_imu_kicker.bin - psp_14_0_3_sos_kicker.bin - psp_14_0_3_ta_kicker.bin - smu_14_0_3_kicker.bin Signed-off-by: Frank Min <[email protected]> Reviewed-by: Gui Chengming <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access | Yang Wang | 2025-07-28 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | | | Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution. Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| * drm/amdgpu: Avoid extra evict-restore process. | Gang Ba | 2025-07-28 | 1 | -4/+2
| | If the vm belongs to another process (an fclose after fork), the wait may enable signaling on the KFD eviction fence and cause the parent process's queues to be evicted.
| |
| | [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
| | [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0
| | [677852.634820] dma_fence_wait_timeout+0x3a/0x140
| | [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
| | [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
| | [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu]
| | [677852.635208] filp_flush+0x38/0x90
| | [677852.635213] filp_close+0x14/0x30
| | [677852.635216] do_close_on_exec+0xdd/0x130
| | [677852.635221] begin_new_exec+0x1da/0x490
| | [677852.635225] load_elf_binary+0x307/0xea0
| | [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5
| | [677852.635235] ? ima_bprm_check+0xa2/0xd0
| | [677852.635240] search_binary_handler+0xda/0x260
| | [677852.635245] exec_binprm+0x58/0x1a0
| | [677852.635249] bprm_execve.part.0+0x16f/0x210
| | [677852.635254] bprm_execve+0x45/0x80
| | [677852.635257] do_execveat_common.isra.0+0x190/0x200
| |
| | Suggested-by: Christian König <[email protected]>
| | Signed-off-by: Gang Ba <[email protected]>
| | Reviewed-by: Christian König <[email protected]>
| | Signed-off-by: Alex Deucher <[email protected]>
| | Cc: [email protected]
| * drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop  Alex Deucher  2025-07-28  2  -0/+2
| |
| | Used to set the MQD appropriately for each queue type. Kernel queues
| | have additional privileges.
| |
| | Acked-by: Christian König <[email protected]>
| | Reviewed-by: Lijo Lazar <[email protected]>
| | Signed-off-by: Alex Deucher <[email protected]>
| | Cc: [email protected] # 6.16.x
| |
| * drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()  Nathan Chancellor  2025-07-28  1  -1/+1
| |
| | After a recent change in clang to expose uninitialized warnings from
| | const variables and pointers [1], there is a warning in
| | imu_v12_0_program_rlc_ram() because data is passed uninitialized to
| | program_imu_rlc_ram():
| |
| |   drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized]
| |     374 |         program_imu_rlc_ram(adev, data, (const u32)size);
| |         |                                   ^~~~
| |
| | As this warning happens early in clang's frontend, it does not realize
| | that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is
| | never actually called, and even if it were, data would not be
| | dereferenced because size is 0.
| |
| | Just initialize data to NULL to silence the warning, as the commit that
| | added program_imu_rlc_ram() mentioned it would eventually be used over
| | the old method, at which point data can be properly initialized and
| | used.
| |
| | Cc: [email protected]
| | Closes: https://github.com/ClangBuiltLinux/linux/issues/2107
| | Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram")
| | Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1]
| | Signed-off-by: Nathan Chancellor <[email protected]>
| | Signed-off-by: Alex Deucher <[email protected]>
| |
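The pattern described in the commit above can be reproduced in a few lines of standalone C. This is a simplified sketch, not the driver source: `demo_program_ram()` and `demo_program_rlc_ram()` are hypothetical stand-ins, and the control flow is reduced to the two facts the message relies on (r is set to -EINVAL, and size is 0).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

static int demo_program_calls;	/* counts calls to the helper below */

/* Hypothetical helper: consumes `size` bytes starting at `data`. */
static void demo_program_ram(const uint32_t *data, uint32_t size)
{
	uint32_t sum = 0;

	demo_program_calls++;
	for (uint32_t i = 0; i < size / sizeof(*data); i++)
		sum += data[i];	/* never executes when size == 0 */
	(void)sum;
}

static int demo_program_rlc_ram(void)
{
	/*
	 * Without "= NULL" the pointer would be passed uninitialized below,
	 * which is the kind of use the commit message says clang now
	 * reports, even though the call is unreachable at run time.
	 */
	const uint32_t *data = NULL;
	uint32_t size = 0;
	int r = -EINVAL;	/* assumption: the new method is not wired up yet */

	if (!r)
		demo_program_ram(data, size);

	return r;
}
```

Initializing the pointer costs nothing at run time here, because the helper is never reached and would not dereference the pointer with a size of 0 anyway.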
* | Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel  Linus Torvalds  2025-07-31  120  -1869/+3193
|\|
| | Pull drm updates from Dave Airlie:
| |  "Highlights:
| |    - Intel xe enable Panther Lake, started adding WildCat Lake
| |    - amdgpu has a bunch of reset improvements along with the usual IP
| |      updates
| |    - msm got VM_BIND support which is important for vulkan sparse memory
| |    - more drm_panic users
| |    - gpusvm common code to handle a bunch of core SVM work outside
| |      drivers.
| |
| |   Detail summary:
| |
| |   Changes outside drm subdirectory:
| |    - 'shrink_shmem_memory()' for better shmem/hibernate interaction
| |    - Rust support infrastructure:
| |       - make ETIMEDOUT available
| |       - add size constants up to SZ_2G
| |       - add DMA coherent allocation bindings
| |    - mtd driver for Intel GPU non-volatile storage
| |    - i2c designware quirk for Intel xe
| |
| |   core:
| |    - atomic helpers: tune enable/disable sequences
| |    - add task info to wedge API
| |    - refactor EDID quirks
| |    - connector: move HDR sink to drm_display_info
| |    - fourcc: half-float and 32-bit float formats
| |    - mode_config: pass format info to simplify
| |
| |   dma-buf:
| |    - heaps: Give CMA heap a stable name
| |
| |   ci:
| |    - add device tree validation and kunit
| |
| |   displayport:
| |    - change AUX DPCD access probe address
| |    - add quirk for DPCD probe
| |    - add panel replay definitions
| |    - backlight control helpers
| |
| |   fbdev:
| |    - make CONFIG_FIRMWARE_EDID available on all arches
| |
| |   fence:
| |    - fix UAF issues
| |
| |   format-helper:
| |    - improve tests
| |
| |   gpusvm:
| |    - introduce devmem only flag for allocation
| |    - add timeslicing support to GPU SVM
| |
| |   ttm:
| |    - improve eviction
| |
| |   sched:
| |    - tracing improvements
| |    - kunit improvements
| |    - memory leak fixes
| |    - reset handling improvements
| |
| |   color mgmt:
| |    - add hardware gamma LUT handling helpers
| |
| |   bridge:
| |    - add destroy hook
| |    - switch to reference counted drm_bridge allocations
| |    - tc358767: convert to devm_drm_bridge_alloc
| |    - improve CEC handling
| |
| |   panel:
| |    - switch to reference counted drm_panel allocations
| |    - fwnode panel lookup
| |    - Huiling hl055fhv028c support
| |    - Raspberry Pi 7" 720x1280 support
| |    - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
| |    - simple: AUO P238HAN01
| |    - st7701: Winstar wf40eswaa6mnn0
| |    - visionox: rm69299-shift
| |    - Renesas R61307, Renesas R69328 support
| |    - DJN HX83112B
| |
| |   hdmi:
| |    - add CEC handling
| |    - YUV420 output support
| |
| |   xe:
| |    - WildCat Lake support
| |    - Enable Panther Lake by default
| |    - mark BMG as SRIOV capable
| |    - update firmware recommendations
| |    - Expose media OA units
| |    - aux-bus support for non-volatile memory
| |    - MTD intel-dg driver for non-volatile memory
| |    - Expose fan control and voltage regulator in sysfs
| |    - restructure migration for multi-device
| |    - Restore GuC submit UAF fix
| |    - make GEM shrinker drm managed
| |    - SRIOV VF Post-migration recovery of GGTT nodes
| |    - W/A additions/reworks
| |    - Prefetch support for svm ranges
| |    - Don't allocate managed BO for each policy change
| |    - HWMON fixes for BMG
| |    - Create LRC BO without VM
| |    - PCI ID updates
| |    - make SLPC debugfs files optional
| |    - rework eviction rejection of bound external BOs
| |    - consolidate PAT programming logic for pre/post Xe2
| |    - init changes for flicker-free boot
| |    - Enable GuC Dynamic Inhibit Context switch
| |
| |   i915:
| |    - drm_panic support for i915/xe
| |    - initial flip queue off by default for LNL/PNL
| |    - Wildcat Lake Display support
| |    - Support for DSC fractional link bpp
| |    - Support for simultaneous Panel Replay and Adaptive sync
| |    - Support for PTL+ double buffer LUT
| |    - initial PIPEDMC event handling
| |    - drm_panel_follower support
| |    - DPLL interface renames
| |    - allocate struct intel_display dynamically
| |    - flip queue preparation
| |    - abstract DRAM detection better
| |    - avoid GuC scheduling stalls
| |    - remove DG1 force probe requirement
| |    - fix MEI interrupt handler on RT kernels
| |    - use backlight control helpers for eDP
| |    - more shared display code refactoring
| |
| |   amdgpu:
| |    - add userq slot to INFO ioctl
| |    - SR-IOV hibernation support
| |    - Suspend improvements
| |    - Backlight improvements
| |    - Use scaling for non-native eDP modes
| |    - cleaner shader updates for GC 9.x
| |    - Remove fence slab
| |    - SDMA fw checks for userq support
| |    - RAS updates
| |    - DMCUB updates
| |    - DP tunneling fixes
| |    - Display idle D3 support
| |    - Per queue reset improvements
| |    - initial smartmux support
| |
| |   amdkfd:
| |    - enable KFD on loongarch
| |    - mtype fix for ext coherent system memory
| |
| |   radeon:
| |    - CS validation additional GL extensions
| |    - drop console lock during suspend/resume
| |    - bump driver version
| |
| |   msm:
| |    - VM BIND support
| |    - CI: infrastructure updates
| |    - UBWC single source of truth
| |    - decouple GPU and KMS support
| |    - DP: rework I/O accessors
| |    - DPU: SM8750 support
| |    - DSI: SM8750 support
| |    - GPU: X1-45 support and speedbin support for X1-85
| |    - MDSS: SM8750 support
| |
| |   nova:
| |    - register! macro improvements
| |    - DMA object abstraction
| |    - VBIOS parser + fwsec lookup
| |    - sysmem flush page support
| |    - falcon: generic falcon boot code and HAL
| |    - FWSEC-FRTS: fb setup and load/execute
| |
| |   ivpu:
| |    - Add Wildcat Lake support
| |    - Add turbo flag
| |
| |   ast:
| |    - improve hardware generations implementation
| |
| |   imx:
| |    - IMX8qxq Display Controller support
| |
| |   lima:
| |    - Rockchip RK3528 GPU support
| |
| |   nouveau:
| |    - fence handling cleanup
| |
| |   panfrost:
| |    - MT8370 support
| |    - bo labeling
| |    - 64-bit register access
| |
| |   qaic:
| |    - add RAS support
| |
| |   rockchip:
| |    - convert inno_hdmi to a bridge
| |
| |   rz-du:
| |    - add RZ/V2H(P) support
| |    - MIPI-DSI DCS support
| |
| |   sitronix:
| |    - ST7567 support
| |
| |   sun4i:
| |    - add H616 support
| |
| |   tidss:
| |    - add TI AM62L support
| |    - AM65x OLDI bridge support
| |
| |   bochs:
| |    - drm panic support
| |
| |   vkms:
| |    - YUV and R* format support
| |    - use faux device
| |
| |   vmwgfx:
| |    - fence improvements
| |
| |   hyperv:
| |    - move out of simple
| |    - add drm_panic support"
| |
| | * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
| |   drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
| |   drm/tidss: encoder: convert to devm_drm_bridge_alloc()
| |   drm/amdgpu: move reset support type checks into the caller
| |   drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
| |   drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
| |   drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
| |   drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
| |   drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
| |   drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
| |   drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
| |   drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
| |   drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
| |   drm/amdgpu: Add WARN_ON to the resource clear function
| |   drm/amd/pm: Use cached metrics data on SMUv13.0.6
| |   drm/amd/pm: Use cached data for min/max clocks
| |   gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
| |   drm/amdgpu: Replace HQD terminology with slots naming
| |   drm/amdgpu: Add user queue instance count in HW IP info
| |   drm/amd/amdgpu: Add helper functions for isp buffers
| |   drm/amd/amdgpu: Initialize swnode for ISP MFD device
| |   ...
| |
| * Merge tag 'amd-drm-next-6.17-2025-07-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next  Dave Airlie  2025-07-21  52  -374/+910
| |\
| | | amd-drm-next-6.17-2025-07-17:
| | |
| | | amdgpu:
| | | - Partition fixes
| | | - Reset fixes
| | | - RAS fixes
| | | - i2c fix
| | | - MPC updates
| | | - DSC cleanup
| | | - EDID fixes
| | | - Display idle D3 update
| | | - IPS updates
| | | - DMUB updates
| | | - Retimer fix
| | | - Replay fixes
| | | - Fix DC memory leak
| | | - Initial support for smartmux
| | | - DCN 4.0.1 degamma LUT fix
| | | - Per queue reset cleanups
| | | - Track ring state associated with a fence
| | | - SR-IOV fixes
| | | - SMU fixes
| | | - Per queue reset improvements for GC 9+ compute
| | | - Per queue reset improvements for GC 10+ gfx
| | | - Per queue reset improvements for SDMA 5+
| | | - Per queue reset improvements for JPEG 2+
| | | - Per queue reset improvements for VCN 2+
| | | - GC 8 fix
| | | - ISP updates
| | |
| | | amdkfd:
| | | - Enable KFD on LoongArch
| | |
| | | radeon:
| | | - Drop console lock during suspend/resume
| | |
| | | UAPI:
| | | - Add userq slot info to INFO IOCTL
| | |   Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html)
| | |
| | | From: Alex Deucher <[email protected]>
| | | Link: https://lore.kernel.org/r/[email protected]
| | | Signed-off-by: Dave Airlie <[email protected]>
| | |
| | * drm/amdgpu: move reset support type checks into the caller  Alex Deucher  2025-07-17  25  -79/+37
| | |
| | | Rather than checking in the callbacks, check if the reset type is
| | | supported in the caller.
| | |
| | | Reviewed-by: Lijo Lazar <[email protected]>
| | | Signed-off-by: Alex Deucher <[email protected]>
| | |
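The refactoring described in the commit above follows a common pattern: one capability check in the shared entry point instead of a copy in every callback. The sketch below is a generic illustration under that assumption; the `demo_*` names, the bitmask layout, and the choice of -EOPNOTSUPP are hypothetical and are not taken from the amdgpu sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical reset capability bits. */
enum demo_reset_type {
	DEMO_RESET_FULL      = 1u << 0,
	DEMO_RESET_PER_QUEUE = 1u << 1,
};

struct demo_ring;

struct demo_ring_funcs {
	int (*reset)(struct demo_ring *ring);	/* per-IP reset callback */
};

struct demo_ring {
	const struct demo_ring_funcs *funcs;
	uint32_t supported_reset;	/* bitmask of demo_reset_type */
	int resets_done;		/* bookkeeping for the example only */
};

/* The callback no longer checks support; it only performs the reset. */
static int demo_queue_reset(struct demo_ring *ring)
{
	ring->resets_done++;
	return 0;
}

static const struct demo_ring_funcs demo_funcs = {
	.reset = demo_queue_reset,
};

/* Single caller-side check shared by every callback implementation. */
static int demo_ring_reset(struct demo_ring *ring)
{
	if (!(ring->supported_reset & DEMO_RESET_PER_QUEUE))
		return -EOPNOTSUPP;
	if (!ring->funcs || !ring->funcs->reset)
		return -EOPNOTSUPP;

	return ring->funcs->reset(ring);
}
```

With the check in `demo_ring_reset()`, a ring that only advertises `DEMO_RESET_FULL` is rejected before its callback runs, so each callback can drop its own copy of the test.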
| | * drm/amdgpu/sdma7: re-emit unprocessed state on ring reset  Alex Deucher  2025-07-17  1  -4/+3
| | |
| | | Re-emit the unprocessed state after resetting the queue.
| | |
| | | Reviewed-by: Jesse Zhang <[email protected]>
| | | Reviewed-by: Christian König <[email protected]>
| | | Signed-off-by: Alex Deucher <[email protected]>
| | |
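The commit message above does not say how the re-emit is implemented, so the following is only a toy model of the general idea: save the commands the hardware had not consumed yet, reset the ring, then emit the saved commands again. Everything here is an assumption for illustration, including the `demo_*` names, the linear (non-wrapping) ring, and in particular the choice to drop the commands of the job that caused the reset.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SLOTS 8

/* Hypothetical command record: which job emitted it, plus a payload. */
struct demo_cmd {
	int job_id;
	int payload;
};

/* Simplified linear ring; a real ring buffer would wrap around. */
struct demo_ring {
	struct demo_cmd slots[DEMO_RING_SLOTS];
	size_t rptr;	/* index of the first unprocessed command */
	size_t wptr;	/* one past the last emitted command */
};

static void demo_emit(struct demo_ring *ring, int job_id, int payload)
{
	assert(ring->wptr < DEMO_RING_SLOTS);
	ring->slots[ring->wptr].job_id = job_id;
	ring->slots[ring->wptr].payload = payload;
	ring->wptr++;
}

/*
 * Reset the ring and re-emit every unprocessed command except those of
 * the guilty job (an assumption of this sketch). Returns the number of
 * commands that were re-emitted.
 */
static size_t demo_reset_and_reemit(struct demo_ring *ring, int guilty_job)
{
	struct demo_cmd saved[DEMO_RING_SLOTS];
	size_t nsaved = 0;

	/* Save the unprocessed range [rptr, wptr) before wiping the ring. */
	for (size_t i = ring->rptr; i < ring->wptr; i++)
		if (ring->slots[i].job_id != guilty_job)
			saved[nsaved++] = ring->slots[i];

	/* The "reset": both pointers go back to the start. */
	ring->rptr = 0;
	ring->wptr = 0;

	for (size_t i = 0; i < nsaved; i++)
		demo_emit(ring, saved[i].job_id, saved[i].payload);

	return nsaved;
}
```

In this model, work queued behind the hung job survives the reset instead of being lost along with it.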
| | * drm/amdgpu/sdma6: re-emit unprocessed state on ring reset  Alex Deucher  2025-07-17  1  -4/+3
| | |
| | | Re-emit the unprocessed state after resetting the queue.
| | |
| | | Reviewed-by: Jesse Zhang <[email protected]>
| | | Reviewed-by: Christian König <[email protected]>
| | | Signed-off-by: Alex Deucher <[email protected]>
| | |
| | * drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset  Alex Deucher  2025-07-17  1  -2/+6
| | |
| | | Re-emit the unprocessed state after resetting the queue.
| | |
| | | Reviewed-by: Jesse Zhang <[email protected]>
| | | Reviewed-by: Christian König <[email protected]>
| | | Signed-off-by: Alex Deucher <[email protected]>
| | |