aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu
Commit message (Collapse)AuthorAgeFilesLines
* drm/amdgpu: track bo memory stats at runtimeYunxiang Li2024-12-199-139/+232
| | | | | | | | | | | | | | | | | | Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we track the BOs as they change states. This way when the fdinfo is queried we only need to take the status lock and copy out the usage stats with minimal impact to the runtime performance. With this new approach however, we would no longer be able to track active buffers. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
* drm/amdgpu: remove unused function parameterYunxiang Li2024-12-196-12/+7
| | | | | | | | | | amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all callers have a reference to adev handy, so remove it for cleanliness. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
* drm: make drm-active- stats optionalYunxiang Li2024-12-191-0/+1
| | | | | | | | | | | | | | | When memory stats is generated fresh everytime by going though all the BOs, their active information is quite easy to get. But if the stats are tracked with BO's state this becomes harder since the job scheduling part doesn't really deal with individual buffers. Make drm-active- optional to enable amdgpu to switch to the second method. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
* Merge remote-tracking branch 'drm/drm-next' into drm-misc-nextMaarten Lankhorst2024-12-098-40/+91
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The v6.13-rc2 release included a bunch of breaking changes, specifically the MODULE_IMPORT_NS commit. Backmerge in order to fix them before the next pull-request. Include the fix from Stephen Roswell. Caused by commit 25c3fd1183c0 ("drm/virtio: Add a helper to map and note the dma addrs and lengths") Interacting with commit cdd30ebb1b9f ("module: Convert symbol namespace to string literal") Reported-by: Stephen Rothwell <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maarten Lankhorst <[email protected]>
| * Merge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2024-12-067-39/+90
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more drm fixes from Simona Vetter: "Due to mailing list unreliability we missed the amdgpu pull, hence part two with that now included: - amdgu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume fixes - amdkfd fixes" * tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu: rework resume handling for display (v2) drm/amd/pm: fix and simplify workload handling Revert "drm/amd/pm: correct the workload setting" drm/amdgpu: fix sriov reinit late orders drm/amdgpu: Fix ISP hw init issue drm/amd/display: Add hblank borrowing support drm/amd/display: Limit VTotal range to max hw cap minus fp drm/amd/display: Correct prefetch calculation drm/amd/display: Add option to retrieve detile buffer size drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm drm/amdkfd: hard-code cacheline for gc943,gc944 drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12 drm/amd/display: Fix programming backlight on OLED panels drm/amd: Sanity check the ACPI EDID drm/amdgpu/hdp7.0: do a posting read when flushing HDP drm/amdgpu/hdp6.0: do a posting read when flushing HDP drm/amdgpu/hdp5.2: do a posting read when flushing HDP drm/amdgpu/hdp5.0: do a posting read when flushing HDP drm/amdgpu/hdp4.0: do a posting read when flushing HDP drm/amdgpu/jpeg1.0: fix idle work handler
| | * drm/amdgpu: rework resume handling for display (v2)Alex Deucher2024-12-031-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1. Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata for DCN. v2: fix fence irq resume ordering Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.11.x
| | * drm/amdgpu: fix sriov reinit late ordersYiqing Yao2024-12-021-23/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use found block to call correct init/resume function on the block. Set status.hw for resume and init. Print re-init result again. Change to use dev_info. Use amdgpu_device_ip_get_ip_block to get target block instead of loop. Fixes: 502d76308d45 ("drm/amdgpu: validate resume before function call") Signed-off-by: Yiqing Yao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Fix ISP hw init issuePratap Nirujogi2024-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ISP hw_init is not called with the recent changes related to hw init levels. AMDGPU_INIT_LEVEL_DEFAULT is ignoring the ISP IP block as AMDGPU_IP_BLK_MASK_ALL is derived using incorrect max number of IP blocks. Update AMDGPU_IP_BLK_MASK_ALL to use AMDGPU_MAX_IP_NUM instead of (AMDGPU_MAX_IP_NUM - 1) to fix the issue. Reviewed-by: Mario Limonciello <[email protected]> Fixes: 14f2fe34f5c6 ("drm/amdgpu: Add init levels") Signed-off-by: Pratap Nirujogi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/hdp7.0: do a posting read when flushing HDPAlex Deucher2024-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Need to read back to make sure the write goes through. Cc: David Belanger <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/hdp6.0: do a posting read when flushing HDPAlex Deucher2024-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Need to read back to make sure the write goes through. Cc: David Belanger <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/hdp5.2: do a posting read when flushing HDPAlex Deucher2024-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Need to read back to make sure the write goes through. Cc: David Belanger <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/hdp5.0: do a posting read when flushing HDPAlex Deucher2024-12-021-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Need to read back to make sure the write goes through. Cc: David Belanger <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/hdp4.0: do a posting read when flushing HDPAlex Deucher2024-12-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Need to read back to make sure the write goes through. Cc: David Belanger <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/jpeg1.0: fix idle work handlerAlex Deucher2024-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On VCN 1.0, VCN and JPEG use the same worker thread so cancel the vcn worker rather than jpeg. On VCN 2.0 and newer there are separate workers for each. Fixes: 93df74873703 ("drm/amdgpu/jpeg: cancel the jpeg worker") Tested-by: George Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| * | module: Convert symbol namespace to string literalPeter Zijlstra2024-12-021-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* | drm: remove driver date from struct drm_driver and all driversJani Nikula2024-12-052-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We stopped using the driver initialized date in commit 7fb8af6798e8 ("drm: deprecate driver date") and (eventually) started returning "0" for drm_version ioctl instead. Finish the job, and remove the unused date member from struct drm_driver, its initialization from drivers, along with the common DRIVER_DATE macros. v2: Also update drivers/accel (kernel test robot) Reviewed-by: Javier Martinez Canillas <[email protected]> Acked-by: Alex Deucher <[email protected]> Acked-by: Simon Ser <[email protected]> Acked-by: Jeffrey Hugo <[email protected]> Acked-by: Lucas De Marchi <[email protected]> Acked-by: Dmitry Baryshkov <[email protected]> # msm Reviewed-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/1f2bf2543aed270a06f6c707fd6ed1b78bf16712.1733322525.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <[email protected]>
* | Merge drm/drm-next into drm-misc-nextMaxime Ripard2024-12-0292-286/+1997
|\| | | | | | | | | | | Kickstart 6.14 cycle. Signed-off-by: Maxime Ripard <[email protected]>
| * Merge tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2024-11-2931-44/+233
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Merge window fixes, mostly amdgpu and xe, with a few other minor ones, all looks fairly normal, i915: - hdcp: Fix when the first read and write are retried xe: - Wake up waiters after wait condition set to true - Mark the preempt fence workqueue as reclaim - Update xe2 graphics name string - Fix a couple of guc submit races - Fix pat index usage in migrate - Ensure non-cached migrate pagetable bo mappings - Take a PM ref in the delayed snapshot capture worker amdgpu: - SMU 13.0.6 fixes - XGMI fixes - SMU 13.0.7 fixes - Misc code cleanups - Plane refcount fixes - DCN 4.0.1 fixes - DC power fixes - DTO fixes - NBIO 7.11 fixes - SMU 14.0.x fixes - Reset fixes - Enable DC on LoongArch - Sysfs hotplug warning fix - Misc small fixes - VCN 4.0.3 fix - Slab usage fix - Jpeg delayed work fix amdkfd: - wptr handling fixes radeon: - Use ttm_bo_move_null() - Constify struct pci_device_id - Fix spurious hotplug - HPD fix rockchip - fix 32-bit build" * tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernel: (48 commits) drm/xe: Take PM ref in delayed snapshot capture worker drm/xe/migrate: use XE_BO_FLAG_PAGETABLE drm/xe/migrate: fix pat index usage drm/xe/guc_submit: fix race around suspend_pending drm/xe/guc_submit: fix race around pending_disable drm/xe: Update xe2_graphics name string drm/rockchip: avoid 64-bit division Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable" drm/amdgpu/jpeg: cancel the jpeg worker drm/amdgpu: fix usage slab after free drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3 drm/amdgpu: Fix sysfs warning when hotplugging drm/amdgpu: Add sysfs interface for vcn reset mask drm/amdgpu/gmc7: fix wait_for_idle callers drm/amd/pm: Remove arcturus min power limit drm/amd/pm: skip setting the power source on smu v14.0.2/3 drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3 drm/amdkfd: Use the correct wptr size drm/xe: Mark preempt fence workqueue as reclaim drm/xe/ufence: Wake up waiters after setting ufence->signalled ...
| | * drm/amdgpu/jpeg: cancel the jpeg workerAlex Deucher2024-11-217-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | Looks like these got missed when jpeg was split from vcn. Cancel the jpeg workers rather than vcn workers. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: fix usage slab after freeVitaly Prosyak2024-11-212-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147 [ +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1 [ +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Call Trace: [ +0.000008] <TASK> [ +0.000009] dump_stack_lvl+0x76/0xa0 [ +0.000017] print_report+0xce/0x5f0 [ +0.000017] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] ? srso_return_thunk+0x5/0x5f [ +0.000015] ? kasan_complete_mode_report_info+0x72/0x200 [ +0.000016] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] kasan_report+0xbe/0x110 [ +0.000015] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000023] __asan_report_load8_noabort+0x14/0x30 [ +0.000014] drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000016] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? enable_work+0x124/0x220 [ +0.000015] ? __pfx_enable_work+0x10/0x10 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? free_large_kmalloc+0x85/0xf0 [ +0.000016] drm_sched_entity_destroy+0x18/0x30 [gpu_sched] [ +0.000020] amdgpu_vce_sw_fini+0x55/0x170 [amdgpu] [ +0.000735] ? __kasan_check_read+0x11/0x20 [ +0.000016] vce_v4_0_sw_fini+0x80/0x110 [amdgpu] [ +0.000726] amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu] [ +0.000679] ? mutex_unlock+0x80/0xe0 [ +0.000017] ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu] [ +0.000662] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? mutex_unlock+0x80/0xe0 [ +0.000016] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000663] drm_minor_release+0xc9/0x140 [drm] [ +0.000081] drm_release+0x1fd/0x390 [drm] [ +0.000082] __fput+0x36c/0xad0 [ +0.000018] __fput_sync+0x3c/0x50 [ +0.000014] __x64_sys_close+0x7d/0xe0 [ +0.000014] x64_sys_call+0x1bc6/0x2680 [ +0.000014] do_syscall_64+0x70/0x130 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit_to_user_mode+0x60/0x190 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit+0x43/0x50 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? exc_page_fault+0x7c/0x110 [ +0.000015] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] RIP: 0033:0x7ffff7b14f67 [ +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003 [ +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000 [ +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000020] </TASK> [ +0.000016] Allocated by task 383 on cpu 7 at 26.880319s: [ +0.000014] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_alloc_info+0x38/0x60 [ +0.000007] __kasan_kmalloc+0xc1/0xd0 [ +0.000007] kmalloc_trace_noprof+0x180/0x380 [ +0.000007] drm_sched_init+0x411/0xec0 [gpu_sched] [ +0.000012] amdgpu_device_init+0x695f/0xa610 [amdgpu] [ +0.000658] amdgpu_driver_load_kms+0x1a/0x120 [amdgpu] [ +0.000662] amdgpu_pci_probe+0x361/0xf30 [amdgpu] [ +0.000651] local_pci_probe+0xe7/0x1b0 [ +0.000009] pci_device_probe+0x248/0x890 [ +0.000008] really_probe+0x1fd/0x950 [ +0.000008] __driver_probe_device+0x307/0x410 [ +0.000007] driver_probe_device+0x4e/0x150 [ +0.000007] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000007] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x2ac/0x5f0 [ +0.000006] driver_register+0x13d/0x490 [ +0.000008] __pci_register_driver+0x1ee/0x2b0 [ +0.000007] llc_sap_close+0xb0/0x160 [llc] [ +0.000009] do_one_initcall+0x9c/0x3e0 [ +0.000008] do_init_module+0x241/0x760 [ +0.000008] load_module+0x51ac/0x6c30 [ +0.000006] __do_sys_init_module+0x234/0x270 [ +0.000007] __x64_sys_init_module+0x73/0xc0 [ +0.000006] x64_sys_call+0xe3/0x2680 [ +0.000006] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000015] Freed by task 2147 on cpu 6 at 160.507651s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000007] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_free_info+0x3b/0x60 [ +0.000007] poison_slab_object+0x115/0x1c0 [ +0.000007] __kasan_slab_free+0x34/0x60 [ +0.000007] kfree+0xfa/0x2f0 [ +0.000007] drm_sched_fini+0x19d/0x410 [gpu_sched] [ +0.000012] amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu] [ +0.000662] amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu] [ +0.000653] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000655] drm_minor_release+0xc9/0x140 [drm] [ +0.000071] drm_release+0x1fd/0x390 [drm] [ +0.000071] __fput+0x36c/0xad0 [ +0.000008] __fput_sync+0x3c/0x50 [ +0.000007] __x64_sys_close+0x7d/0xe0 [ +0.000007] x64_sys_call+0x1bc6/0x2680 [ +0.000007] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] The buggy address belongs to the object at ffff8881b8605f80 which belongs to the cache kmalloc-64 of size 64 [ +0.000020] The buggy address is located 8 bytes inside of freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0) [ +0.000028] The buggy address belongs to the physical page: [ +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605 [ +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: 0xffffefff(slab) [ +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001 [ +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000 [ +0.000006] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000013] ^ [ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb [ +0.000013] ================================================================== The issue reproduced on VG20 during the IGT pci_unplug test. The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill. In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by each entity within the run queue, leading to invalid memory access. To resolve this, the order of cleanup calls is updated: Before: amdgpu_fence_driver_sw_fini amdgpu_device_ip_fini After: amdgpu_device_ip_fini amdgpu_fence_driver_sw_fini This updated order ensures that all entities in the IPs are cleaned up first, followed by proper cleanup of the schedulers. Additional Investigation: During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo buffer must be freed only as the final step in the cleanup process to prevent any premature access during earlier cleanup stages. v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
| | * drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3Xiang Liu2024-11-211-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not necessarily corrupted. When there is RAS fatal error, device memory access is blocked. Hence vcpu bo cannot be saved to system memory as in a regular suspend sequence before going for reset. In other full device reset cases, that gets saved and restored during resume. v2: Remove redundant code like vcn_v4_0 did v2: Refine commit message v3: Drop the volatile v3: Refine commit message Signed-off-by: Xiang Liu <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Stanley.Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Fix sysfs warning when hotplugging[email protected]2024-11-217-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the similar warning when hotplugging: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp llc overlay nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd amdgpu kvm_amd kvm ipmi_ssif amdxcp rapl drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm pcspkr drm_display_helper acpi_cpufreq drm_kms_helper video wmi k10temp i2c_piix4 acpi_ipmi ipmi_si drm zram ip_tables loop squashfs dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sp5100_tco ixgbe rfkill ccp dca sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse [ 155.685224] systemd-journald[1354]: Compressed data object 957 -> 524 using ZSTD [ 155.685687] CPU: 3 PID: 6960 Comm: amd_pci_unplug Not tainted 6.10.0-1148853.1.zuul.164395107d6642bdb451071313e9378d #1 [ 155.704149] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 155.712383] RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.717805] Code: a0 00 48 89 ef e8 37 96 c7 ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 f7 96 a0 00 0f 0b eb ab 48 c7 c7 48 ba 7e 8f e8 f7 66 bf ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 [ 155.736766] RSP: 0018:ffffb1685d7a3e20 EFLAGS: 00010296 [ 155.742108] RAX: 0000000000000038 RBX: ffff929e94c80000 RCX: 0000000000000000 [ 155.749363] RDX: ffff928e1efaf200 RSI: ffff928e1efa18c0 RDI: ffff928e1efa18c0 [ 155.756612] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000003 [ 155.763855] R10: ffffb1685d7a3cd8 R11: ffffffff8fb3e1c8 R12: ffffffffc1ef5341 [ 155.771104] R13: ffff929e94cc5530 R14: 0000000000000000 R15: 0000000000000000 [ 155.778357] FS: 00007fd9dd8d9c40(0000) GS:ffff928e1ef80000(0000) knlGS:0000000000000000 [ 155.786594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155.792450] CR2: 0000561245ceee38 CR3: 0000000113018000 CR4: 00000000003506f0 [ 155.799702] Call Trace: [ 155.802254] <TASK> [ 155.804460] ? __warn+0x80/0x120 [ 155.807798] ? kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.812617] ? report_bug+0x164/0x190 [ 155.816393] ? handle_bug+0x3c/0x80 [ 155.819994] ? exc_invalid_op+0x17/0x70 [ 155.823939] ? asm_exc_invalid_op+0x1a/0x20 [ 155.828235] ? kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.833058] amdgpu_gfx_sysfs_fini+0x59/0xd0 [amdgpu] [ 155.838637] gfx_v9_0_sw_fini+0x123/0x1c0 [amdgpu] [ 155.843887] amdgpu_device_fini_sw+0xbc/0x3e0 [amdgpu] [ 155.849432] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ 155.855235] drm_dev_put.part.0+0x3c/0x60 [drm] [ 155.859914] drm_release+0x8b/0xc0 [drm] [ 155.863978] __fput+0xf1/0x2c0 [ 155.867141] __x64_sys_close+0x3c/0x80 [ 155.870998] do_syscall_64+0x64/0x170 V2: Add details in comments (Tim) Signed-off-by: Jesse Zhang <[email protected]> Reported-by: Andy Dong <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Add sysfs interface for vcn reset mask[email protected]2024-11-215-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs interface for vcn: vcn_reset_mask The interface is read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) v4: s/sdma/vcn/ in the reset mask setup Acked-by: Christian König <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu/gmc7: fix wait_for_idle callersAlex Deucher2024-11-211-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wait_for_idle signature was changed, but the callers were not. Reviewed-by: Sunil Khatri <[email protected]> Reported-by: Michel Dänzer <[email protected]> Fixes: 82ae6619a450 ("drm/amdgpu: update the handle ptr in wait_for_idle") Signed-off-by: Alex Deucher <[email protected]> Cc: Sunil Khatri <[email protected]>
| | * drm/amdgpu: Use reset recovery state checksLijo Lazar2024-11-202-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some in_reset checks are infact checking whether the state is reinitialization after reset. Replace with reset_in_recovery calls to identify that it's really checking for recovery stage after reset. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amdgpu: Add init level for post reset reinitLijo Lazar2024-11-207-3/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to identify whether the IP/feature is initialized for the first time or whether it's reinitialized after a reset. Add RESET_RECOVERY init level to identify post reset reinitialization phase. This only provides a device level identification, IP/features may choose to track their state independently also. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amd: Fix initialization mistake for NBIO 7.11 devicesMario Limonciello2024-11-201-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a strapping issue on NBIO 7.11.x that can lead to spurious PME events while in the D0 state. Cc: [email protected] Reviewed-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0Asad Kamal2024-11-202-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for SMU_v_13_0_6 v2: Get link status register value for each soc from separate function (Lijo) Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| * | Merge tag 'driver-core-6.13-rc1' of ↵Linus Torvalds2024-11-291-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core changes for 6.13-rc1. Nothing major for this merge cycle, except for the two simple merge conflicts are here just to make life interesting. Included in here are: - sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out - fw_devlink fixes based on many reports and debugging sessions - list_for_each_reverse() removal, no one was using it! - last-minute seq_printf() format string bug found and fixed in many drivers all at once. - minor bugfixes and changes full details in the shortlog" * tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Fix a potential abuse of seq_printf() format string in drivers cpu: Remove spurious NULL in attribute_group definition s390/con3215: Remove spurious NULL in attribute_group definition perf: arm-ni: Remove spurious NULL in attribute_group definition driver core: Constify bin_attribute definitions sysfs: attribute_group: allow registration of const bin_attribute firmware_loader: Fix possible resource leak in fw_log_firmware_info() drivers: core: fw_devlink: Fix excess parameter description in docstring driver core: class: Correct WARN() message in APIs class_(for_each|find)_device() cacheinfo: Use of_property_present() for non-boolean properties cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap() drivers: core: fw_devlink: Make the error message a bit more useful phy: tegra: xusb: Set fwnode for xusb port devices drm: display: Set fwnode for aux bus devices driver core: fw_devlink: Stop trying to optimize cycle detection logic driver core: Constify attribute arguments of binary attributes sysfs: bin_attribute: add const read/write callback variants sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR() sysfs: treewide: constify attribute callback of bin_attribute::llseek() sysfs: treewide: constify attribute callback of bin_attribute::mmap() ...
| | * | sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh2024-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <[email protected]> Acked-by: Martin K. Petersen <[email protected]> Acked-by: Jason Gunthorpe <[email protected]> Acked-by: Ira Weiny <[email protected]> Acked-by: Krzysztof Wilczyński <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
| * | | Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds2024-11-21159-4137/+6545
| |\ \ \ | | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm updates from Dave Airlie: "There's a lot of rework, the panic helper support is being added to more drivers, v3d gets support for HW superpages, scheduler documentation, drm client and video aperture reworks, some new MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel has some Pantherlake enablement and xe is getting some SRIOV bits, but just lots of stuff everywhere. core: - split DSC helpers from DP helpers - clang build fixes for drm/mm test - drop simple pipeline support for gem vram - document submission error signaling - move drm_rect to drm core module from kms helper - add default client setup to most drivers - move to video aperture helpers instead of drm ones tests: - new framebuffer tests ttm: - remove swapped and pinned BOs from TTM lru panic: - fix uninit spinlock - add ABGR2101010 support bridge: - add TI TDP158 support - use standard PM OPS dma-fence: - use read_trylock instead of read_lock to help lockdep scheduler: - add errno to sched start to report different errors - add locking to drm_sched_entity_modify_sched - improve documentation xe: - add drm_line_printer - lots of refactoring - Enable Xe2 + PES disaggregation - add new ARL PCI ID - SRIOV development work - fix exec unnecessary implicit fence - define and parse OA sync props - forcewake refactoring i915: - Enable BMG/LNL ultra joiner - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+ - use DSB for plane/color mgmt - Arrow lake PCI IDs - lots of i915/xe display refactoring - enable PXP GuC autoteardown - Pantherlake (PTL) Xe3 LPD display enablement - Allow fastset HDR infoframe changes - write DP source OUI for non-eDP sinks - share PCI IDs between i915 and xe amdgpu: - SDMA queue reset support - SMU 13.0.6, JPEG 4.0.3 updates - Initial runtime repartitioning support - rework IP structs for multiple IP instances - Fetch EDID from _DDC if available - SMU13 zero rpm user control - lots of fixes/cleanups amdkfd: - Increase event FIFO size - add topology cap flag for per queue reset msm: - DPU: - SA8775P support - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable large framebuffer support - Drop MSM8998 and SDM845 - DP: - SA8775P support - GPU: - a7xx preemption support - Adreno A663 support ast: - warn about unsupported TX chips ivpu: - add coredump - add pantherlake support rockchip: - 4K@60Hz display enablement - generate pll programming tables panthor: - add timestamp query API - add realtime group priority - add fdinfo support etnaviv: - improve handling of DMA address limits - improve GPU hangcheck exynos: - Decon Exynos7870 support mediatek: - add OF graph support omap: - locking fixes bochs: - convert to gem/shmem from simpledrm v3d: - support big/super pages - add gemfs vc4: - BCM2712 support refactoring - add YUV444 format support udmabuf: - folio related fixes nouveau: - add panic support on nv50+" * tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits) drm/xe/guc: Fix dereference before NULL check drm/amd: Fix initialization mistake for NBIO 7.7.0 Revert "drm/amd/display: parse umc_info or vram_info based on ASIC" drm/amd/display: Fix failure to read vram info due to static BP_RESULT drm/amdgpu: enable GTT fallback handling for dGPUs only drm/amd/amdgpu: limit single process inside MES drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X drm/amdgpu/mes12: correct kiq unmap latency drm/amdgpu: Support vcn and jpeg error info parsing drm/amd : Update MES API header file for v11 & v12 drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling drm/amdkfd: change kfd process kref count at creation drm/amdgpu: Cleanup shift coding style drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data drm/amdgpu: Implement virt req_ras_err_count drm/amdgpu: VF Query RAS Caps from Host if supported drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support drm/amd/display: 3.2.309 drm/amd/display: Adjust VSDB parser for replay feature ...
| | * | drm/amd: Fix initialization mistake for NBIO 7.7.0Vijendar Mukunda2024-11-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME events while in the D0 state. Co-developed-by: Mario Limonciello <[email protected]> Signed-off-by: Vijendar Mukunda <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: enable GTT fallback handling for dGPUs onlyChristian König2024-11-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That is just a waste of time on APUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704 Fixes: 216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amd/amdgpu: limit single process inside MESShaoyun Liu2024-11-125-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is for MES to limit only one process for the user queues Signed-off-by: Shaoyun Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu/mes12: correct kiq unmap latencyJack Xiao2024-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Cleanup shift coding styleAdvait Dhamorikar2024-11-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improves the coding style by updating bit-shift operations in the amdgpu_jpeg.c driver file. It ensures consistency and avoids potential issues by explicitly using 1U and 1ULL for unsigned and unsigned long long shifts in all relevant instances. Signed-off-by: Advait Dhamorikar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amd/amdgpu: Increase MES log buffer to dump mes scratch datashaoyunl2024-11-113-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MES internal scratch data is useful for mes debug, it can only located in VRAM, change the allocation type and increase size for mes 11 Signed-off-by: shaoyunl <[email protected]> Acked-by: Feifei Xu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Implement virt req_ras_err_countVictor Skvortsov2024-11-117-7/+229
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable RAS late init if VF RAS Telemetry is supported. When enabled, the VF can use this interface to query total RAS error counts from the host. The VF FB access may abruptly end due to a fatal error, therefore the VF must cache and sanitize the input. The Host allows 15 Telemetry messages every 60 seconds, afterwhich the host will ignore any more in-coming telemetry messages. The VF will rate limit its msg calling to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it will continue to report the last good cached data. v2: Flip generate report & update statistics order for VF Signed-off-by: Victor Skvortsov <[email protected]> Acked-by: Tao Zhou <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: VF Query RAS Caps from Host if supportedVictor Skvortsov2024-11-113-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If VF RAS Capability support is enabled, guest is able to retrieve the real RAS support from the host. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Add msg handlers for SRIOV RAS TelemetryVictor Skvortsov2024-11-112-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add message handlers for RAS telemetry. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry SupportVictor Skvortsov2024-11-112-19/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data. Add Host RAS Telemetry enable capabilities bitfields. Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated in the RAS Telemetry region. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUsSrinivasan Shanmugam2024-11-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()Christian König2024-11-081-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <[email protected]> Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Inform if PCIe based P2P links are not availableRamesh Errabolu2024-11-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raise an info message in kernel log if PCIe root complex determines that a AMD GPU device D<i> cannot have P2P communication with another AMD GPU device D<j> Signed-off-by: Ramesh Errabolu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Fix video caps for H264 and HEVC encode maximum sizeDavid Rosca2024-11-085-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Add sysfs interface for jpeg reset mask[email protected]2024-11-086-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs interface for jpeg: jpeg_reset_mask The interface is read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Add sysfs interface for vpe reset mask[email protected]2024-11-082-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs interface for vpe: vpe_reset_mask The interface is read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Add sysfs interface for sdma reset mask[email protected]2024-11-086-0/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs interface for sdma: sdma_reset_mask The interface is read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Normalize reg offsets on VCN v4.0.3Sathishkumar S2024-11-081-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remote access to external AIDs isn't possible with VCN RRMT disabled and it is disabled on SoCs with GC 9.4.4, so use only local offsets. Signed-off-by: Sathishkumar S <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
| | * | drm/amdgpu: Avoid kcq disable during resetLijo Lazar2024-11-081-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reset sequence indicates that hardware already ran into a bad state. Avoid sending unmap queue request to reset KCQ. This will also cover RAS error scenarios which need a reset to recover, hence remove the check. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>