| author | Jesse.Zhang <[email protected]> | 2025-05-09 09:18:16 +0000 |
|---|---|---|
| committer | Alex Deucher <[email protected]> | 2025-05-13 13:32:25 +0000 |
| commit | 648a0dc0d78c369233b16878e4f351efe7fd8df6 (patch) | |
| tree | 0e4625bf01e204a222773f0f9002aa6144c7abbf /drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | |
| parent | drm/amdgpu: Fix the kernel panic caused by RAS records exceed threshold | |
drm/amdgpu: Fix user queue deadlock by reordering mutex locking
This resolves a deadlock between user queue management and GPU reset
paths by enforcing consistent lock ordering.
The deadlock occurred when:
1. Process exit path (amdgpu_userq_mgr_fini) would:
   - Take uqm->userq_mutex
   - Then try to take adev->userq_mutex for list operations
2. GPU reset path (amdgpu_userq_pre_reset) would:
   - Take adev->userq_mutex first (for list traversal)
   - Then take uqm->userq_mutex
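The inversion described above can be illustrated with a small user-space sketch. It is not the driver code: dev_lock and uqm_lock are hypothetical pthread stand-ins for adev->userq_mutex and uqm->userq_mutex, and the interleaving is replayed in a single thread with pthread_mutex_trylock() so that the example reports the stall instead of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for adev->userq_mutex and uqm->userq_mutex. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t uqm_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Replays the problematic interleaving step by step.
 * Returns 1 if both paths would block on their second lock.
 */
int demo_abba_stuck(void)
{
    int fini_blocked, reset_blocked;

    /* Old fini path, step 1: take the per-manager lock. */
    pthread_mutex_lock(&uqm_lock);
    /* Reset path, step 1: take the device lock. */
    pthread_mutex_lock(&dev_lock);

    /* Old fini path, step 2: wants the device lock, which is held. */
    fini_blocked = pthread_mutex_trylock(&dev_lock) == EBUSY;
    /* Reset path, step 2: wants the per-manager lock, which is held. */
    reset_blocked = pthread_mutex_trylock(&uqm_lock) == EBUSY;

    /* Clean up so the sketch can be called again. */
    pthread_mutex_unlock(&dev_lock);
    pthread_mutex_unlock(&uqm_lock);

    return fini_blocked && reset_blocked;
}
```

With blocking lock calls in two real threads, neither second acquisition could ever succeed, which is the circular wait this patch removes.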
The solution establishes a strict top-down locking order:
1. Always take adev->userq_mutex before any uqm->userq_mutex
2. Maintain this order consistently across all code paths
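As a rough sketch of the ordering rule, assuming user-space pthread mutexes in place of the kernel mutexes: the demo_* names and struct fields below are hypothetical and only mirror the roles of adev, uqm and the two paths named in this message. Both paths take the device-level lock before any per-manager lock, so they can run concurrently without a circular wait.

```c
#include <assert.h>
#include <pthread.h>

#define NR_MGRS 4

/* Hypothetical user-space stand-ins; not the real amdgpu structures. */
struct demo_dev {
    pthread_mutex_t userq_mutex;  /* role of adev->userq_mutex */
    int nr_mgrs;                  /* managers still on the device list */
};

struct demo_uqm {
    pthread_mutex_t userq_mutex;  /* role of uqm->userq_mutex */
    struct demo_dev *adev;
    int on_list;                  /* 1 while linked on the device list */
    int suspended;                /* set by the reset path */
};

static struct demo_dev dev;
static struct demo_uqm mgrs[NR_MGRS];

/* Process-exit path: device lock first, then the manager lock. */
static void *demo_mgr_fini(void *arg)
{
    struct demo_uqm *uqm = arg;

    pthread_mutex_lock(&uqm->adev->userq_mutex);
    pthread_mutex_lock(&uqm->userq_mutex);
    uqm->on_list = 0;
    uqm->adev->nr_mgrs--;
    pthread_mutex_unlock(&uqm->userq_mutex);
    pthread_mutex_unlock(&uqm->adev->userq_mutex);
    return NULL;
}

/* Reset path: same order, device lock held across the list walk. */
static void *demo_pre_reset(void *arg)
{
    struct demo_dev *adev = arg;

    pthread_mutex_lock(&adev->userq_mutex);
    assert(adev->nr_mgrs >= 0);
    for (int i = 0; i < NR_MGRS; i++) {
        if (!mgrs[i].on_list)
            continue;
        pthread_mutex_lock(&mgrs[i].userq_mutex);
        mgrs[i].suspended = 1;
        pthread_mutex_unlock(&mgrs[i].userq_mutex);
    }
    pthread_mutex_unlock(&adev->userq_mutex);
    return NULL;
}

/* Runs both paths concurrently; returns the managers left on the list. */
int demo_run(void)
{
    pthread_t fini[NR_MGRS], reset;

    pthread_mutex_init(&dev.userq_mutex, NULL);
    dev.nr_mgrs = NR_MGRS;
    for (int i = 0; i < NR_MGRS; i++) {
        pthread_mutex_init(&mgrs[i].userq_mutex, NULL);
        mgrs[i].adev = &dev;
        mgrs[i].on_list = 1;
        mgrs[i].suspended = 0;
    }

    pthread_create(&reset, NULL, demo_pre_reset, &dev);
    for (int i = 0; i < NR_MGRS; i++)
        pthread_create(&fini[i], NULL, demo_mgr_fini, &mgrs[i]);

    pthread_join(reset, NULL);
    for (int i = 0; i < NR_MGRS; i++)
        pthread_join(fini[i], NULL);

    return dev.nr_mgrs;
}
```

Calling demo_run() should complete and return 0 regardless of how the threads interleave, because no thread ever holds a manager lock while waiting for the device lock.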
Changes made:
- Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
- Kept existing proper order in amdgpu_userq_pre_reset()
- Simplified the fini flow by removing redundant operations
This prevents circular dependencies while maintaining thread safety
during both normal operation and GPU reset scenarios.
Fixes: 4ce60dbada96 ("drm/amdgpu: store userq_managers in a list in adev")
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Arvind Yadav <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c')
0 files changed, 0 insertions, 0 deletions
