aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
diff options
context:
space:
mode:
authorBalasubramani Vivekanandan <[email protected]>2025-08-01 05:23:56 +0000
committerRodrigo Vivi <[email protected]>2025-08-04 15:58:56 +0000
commit465f1dba74c010995190bff267ae5a75afcdcfea (patch)
tree65503af931a8bec770e42f4aa3c66793cddb674d /drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
parentdrm/xe/pf: Enable SR-IOV PF mode by default (diff)
downloadkernel-465f1dba74c010995190bff267ae5a75afcdcfea.tar.gz
kernel-465f1dba74c010995190bff267ae5a75afcdcfea.zip
drm/xe/devcoredump: Defer devcoredump initialization during probe
Doing devcoredump initializing before GT though look harmless, it leads to problem during driver unbind. Because of this order, GT/Engine release functions will be called before xe devcoredump release function (xe_driver_devcoredump_fini) leading to the following kernel crash[1] because the devcoredump functions might still use GT/Engine datastructures after those are freed. The following crash is observed while running the IGT xe_wedged@wedged-at-any-timeout. The test forces a wedged state by submitting a workload which hangs. Then does a unbind/rebind of the driver to recover from the wedged state. The hanged workload leads to a devcoredump. The following crash is noticed when the devcoredump capture races with the driver unbind. During driver unbind, the release function hw_engine_fini() will be called which assigns NULL to hwe->gt. But the same data structure is accessed during the coredump capture in the function xe_engine_snapshot_print by reading snapshot->hwe->gt. With this patch, we make sure the devcoredump is stopped before deinitializing the core driver functions. [1]: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe] RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe] Call Trace: <TASK> ? drm_printf+0x64/0x90 __xe_devcoredump_read+0x23f/0x2d0 [xe] ? __pfx___drm_printfn_coredump+0x10/0x10 ? __pfx___drm_puts_coredump+0x10/0x10 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe] process_one_work+0x22e/0x6f0 worker_thread+0x1e8/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x11f/0x250 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x47/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 v2: Detailed commit description (Rodrigo) v3: FIXME added (Rodrigo, Stuart) Fixes: 4209d635a823 ("drm/xe: Remove devcoredump during driver release") Reviewed-by: Stuart Summers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Balasubramani Vivekanandan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit 1fdc4c381ff765479d76ccf3134717c430c871b8) Signed-off-by: Rodrigo Vivi <[email protected]>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c')
0 files changed, 0 insertions, 0 deletions