aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
Commit message (Collapse)AuthorAgeFilesLines
* drm/amdgpu/userq: add suspend and resume helpersAlex Deucher2025-04-211-0/+39
| | | | | | | | Add helpers to unmap and map user queues on suspend and resume. Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/userq: properly clean up userq fence driver on failureAlex Deucher2025-04-211-0/+4
| | | | | | | | | | If userq creation fails, we need to properly unwind and free the user queue fence driver. v2: free idr as well (Sunil) Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/userq: move some code aroundAlex Deucher2025-04-211-26/+0
| | | | | | | | Move some userq fence handling code into amdgpu_userq_fence.c. This matches the other code in that file. Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/userq: rework front end call sequenceAlex Deucher2025-04-211-2/+15
| | | | | | | | | | Split out the queue map from the mqd create call and split out the queue unmap from the mqd destroy call. This splits the queue setup and teardown with the actual enablement in the firmware. Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/userq: rename suspend/resume callbacksAlex Deucher2025-04-211-5/+5
| | | | | | | | | Rename to map and umap to better align with what is happening at the firmware level and remove the extra level of indirection in the MES userq code. Reviewed-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: store userq_managers in a list in adevAlex Deucher2025-04-081-1/+15
| | | | | | | | | | So we can iterate across them when we need to manage all user queues. v2: add uq_mgr to adev list in amdgpu_userq_mgr_init Reviewed-by: Arunpravin Paneer Selvam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix display freezing issue when resizing appsArvind Yadav2025-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | The display is freezing because the amdgpu_userq_wait_ioctl() is waiting for a non-user queue fence(specifically, the PT update fence). RootCause: The resume_work is initiated by both amdgpu_userq_suspend and amdgpu_userqueue_ensure_ev_fence at same time. The amdgpu_userq_suspend signals a dma-fence and subsequently triggers the resume_work, which is intended to replace the existing fence by creating new dma-fence. However, following this, the amdgpu_userqueue_ensure_ev_fence schedules another resume_work that generates a new dma-fence, thereby replacing the one created by amdgpu_userq_suspend. Consequently, the original fence will never be signaled. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: Shashank Sharma <[email protected]> Cc: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: validate user queue parametersAlex Deucher2025-04-081-0/+14
| | | | | | | | Make sure these are set properly to ensure compatibility if we ever update the IOCTL interface. Reviewed-by: Prike Liang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: return an error in the userq IOCTL when DRM_AMDGPU_NAVI3X_USERQ=nAlex Deucher2025-04-081-1/+1
| | | | | | | | I'd swear this was already fixed, but I guess the patch never landed. Add it now. Reviewed-by: Prike Liang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/userq: fix hardcoded uq functionsAlex Deucher2025-04-081-6/+6
| | | | | | | | Use the IP type to look up the userq functions rather than hardcoding it. Reviewed-by: Saleemkhan Jamadar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix display freeze lockup errorArvind Yadav2025-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A deadlock situation has arised between the userq signal ioctl and the eviction fence. In this scenario, the function amdgpu_userq_signal_ioctl() has acquired a reservation lock on the read/write buffer object (BO) through drm_exec. Subsequently, it calls amdgpu_userqueue_ensure_ev_fence(), which is in a waiting for the userq resume work. Meanwhile, the userq suspend worker has initiated the userq resume work(amdgpu_userqueue_resume_worker). This userq resume work attempts to validate the vm->done BO, leading to amdgpu_userqueue_validate_bos also attempting to reservation lock the same write BO that is already locked by amdgpu_userq_signal_ioctl. As a result, the resume work becomes stalled, causing amdgpu_userqueue_ensure_ev_fence to remain in a waiting state. Call Trace: [ 242.836469] INFO: task gnome-shel:cs0:1288 blocked for more than 120 seconds. [ 242.836486] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4 [ 242.836491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 242.836494] task:gnome-shel:cs0 state:D stack:0 pid:1288 tgid:1282 ppid:1180 flags:0x00000002 [ 242.836503] Call Trace: [ 242.836508] <TASK> [ 242.836517] __schedule+0x3e0/0xb10 [ 242.836530] ? srso_return_thunk+0x5/0x5f [ 242.836541] schedule+0x31/0x120 [ 242.836546] schedule_timeout+0x150/0x160 [ 242.836551] ? srso_return_thunk+0x5/0x5f [ 242.836555] ? sysvec_call_function+0x69/0xd0 [ 242.836562] ? srso_return_thunk+0x5/0x5f [ 242.836567] ? preempt_count_add+0x7f/0xd0 [ 242.836577] __wait_for_common+0x91/0x180 [ 242.836582] ? __pfx_schedule_timeout+0x10/0x10 [ 242.836590] wait_for_completion+0x28/0x30 [ 242.836595] __flush_work+0x16c/0x290 [ 242.836602] ? __pfx_wq_barrier_func+0x10/0x10 [ 242.836611] flush_delayed_work+0x3a/0x60 [ 242.836621] amdgpu_userqueue_ensure_ev_fence+0x2d/0xb0 [amdgpu] [ 242.836966] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu] [ 242.837171] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu] [ 242.837365] drm_ioctl_kernel+0xae/0x100 [drm] [ 242.837398] drm_ioctl+0x2a1/0x500 [drm] [ 242.837420] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu] [ 242.837622] ? srso_return_thunk+0x5/0x5f [ 242.837627] ? srso_return_thunk+0x5/0x5f [ 242.837630] ? _raw_spin_unlock_irqrestore+0x2b/0x50 [ 242.837635] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu] [ 242.837811] __x64_sys_ioctl+0x99/0xd0 [ 242.837820] x64_sys_call+0x1209/0x20d0 [ 242.837825] do_syscall_64+0x51/0x120 [ 242.837830] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 242.837835] RIP: 0033:0x7f2f33f1a94f [ 242.837838] RSP: 002b:00007f2f24ffea30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 242.837842] RAX: ffffffffffffffda RBX: 00007f2f24ffebd0 RCX: 00007f2f33f1a94f [ 242.837845] RDX: 00007f2f24ffebd0 RSI: 00000000c0306457 RDI: 000000000000000d [ 242.837847] RBP: 00007f2f24ffeab0 R08: 0000000000000000 R09: 0000000000000000 [ 242.837849] R10: 00007f2f24ffecd0 R11: 0000000000000246 R12: 00007f2f25000640 [ 242.837851] R13: 00000000c0306457 R14: 000000000000000d R15: 00007fff3b39c1e0 [ 242.837858] </TASK> [ 242.837865] INFO: task Xwayland:cs0:1517 blocked for more than 120 seconds. [ 242.837869] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4 [ 242.837872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 242.837874] task:Xwayland:cs0 state:D stack:0 pid:1517 tgid:1338 ppid:1282 flags:0x00004002 [ 242.837878] Call Trace: [ 242.837880] <TASK> [ 242.837883] __schedule+0x3e0/0xb10 [ 242.837890] schedule+0x31/0x120 [ 242.837894] schedule_preempt_disabled+0x1c/0x30 [ 242.837897] __mutex_lock.constprop.0+0x386/0x6e0 [ 242.837902] ? srso_return_thunk+0x5/0x5f [ 242.837905] ? __timer_delete_sync+0x81/0xe0 [ 242.837911] __mutex_lock_slowpath+0x13/0x20 [ 242.837915] mutex_lock+0x3b/0x50 [ 242.837919] amdgpu_userqueue_ensure_ev_fence+0x35/0xb0 [amdgpu] [ 242.838138] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu] [ 242.838340] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu] [ 242.838531] drm_ioctl_kernel+0xae/0x100 [drm] [ 242.838559] drm_ioctl+0x2a1/0x500 [drm] [ 242.838580] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu] [ 242.838778] ? srso_return_thunk+0x5/0x5f [ 242.838783] ? srso_return_thunk+0x5/0x5f [ 242.838786] ? _raw_spin_unlock_irqrestore+0x2b/0x50 [ 242.838791] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu] [ 242.838967] __x64_sys_ioctl+0x99/0xd0 [ 242.838972] x64_sys_call+0x1209/0x20d0 [ 242.838975] do_syscall_64+0x51/0x120 [ 242.838979] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 242.838982] RIP: 0033:0x7f9118b1a94f [ 242.838985] RSP: 002b:00007f910cdff760 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 242.838989] RAX: ffffffffffffffda RBX: 00007f910cdff910 RCX: 00007f9118b1a94f [ 242.838991] RDX: 00007f910cdff910 RSI: 00000000c0306457 RDI: 000000000000000c [ 242.838993] RBP: 00007f910cdff7e0 R08: 0000000000000000 R09: 0000000000000001 [ 242.838995] R10: 00007f910cdff9d4 R11: 0000000000000246 R12: 00007f910ce00640 [ 242.838997] R13: 00000000c0306457 R14: 000000000000000c R15: 00007fff9dd11d10 [ 242.839004] </TASK> v2: Addressed review comemnts from Christian. v3/v4: Addressed review comemnts from Christian. - Move drm_exec drm_exec loop after userq fence create. - cleanup the newly created userq fence in case of error. v5 - Addressed review comemnts from Christian. - Create a new amdgpu_userq_fence_alloc() function for allocation. - Calling dma_fence_put for cleanup procedure. - make amdgpu_userq_fence_create() function static. - drm_exec_init is called after mutex_unlock. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: Shashank Sharma <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add db size and offset range for VCN and VPESaleemkhan Jamadar2025-04-081-1/+23
| | | | | | | | | | | | VCN and VPE have different offset range, update the doorbell offset range repsectively. Doorbell size for VCN and VPE is 32bit. v1 : add gfx switch case and fix checkpatch warnings (Shashank) Signed-off-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: map doorbell for the requested userqSaleemkhan Jamadar2025-04-081-11/+16
| | | | | | | | | | | | | | | Introduce db_info structure to the populate the doorbell information that is required to be mapped. Made changes to the doorbell mapping func more generic, by taking parameters that vary based on IPs and/or usecase into db_info structure. v2 - Fix space alignment and checkpatch warnings(Shashank) Signed-off-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix the use-after-free issue in wait IOCTLArunpravin Paneer Selvam2025-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xarray pointer which has the userqueue xarray structure reference should be cleared when the userqueue gets destroyed. Otherwise, we may access the freed xa memory and see the below warnings. warning 1: BUG: KASAN: slab-use-after-free in _raw_spin_lock+0x7a/0xe0 [ +0.000044] Call Trace: [ +0.000017] <TASK> [ +0.000016] dump_stack_lvl+0x6c/0x90 [ +0.000025] print_report+0xc4/0x5e0 [ +0.000025] ? srso_return_thunk+0x5/0x5f [ +0.000024] ? kasan_complete_mode_report_info+0x60/0x1d0 [ +0.000030] ? _raw_spin_lock+0x7a/0xe0 [ +0.000023] kasan_report+0xdf/0x120 [ +0.000023] ? _raw_spin_lock+0x7a/0xe0 [ +0.000025] kasan_check_range+0xf7/0x1b0 [ +0.000025] __kasan_check_write+0x14/0x20 [ +0.000024] _raw_spin_lock+0x7a/0xe0 [ +0.000023] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000024] ? amdgpu_userq_wait_ioctl+0xac0/0x1f30 [amdgpu] [ +0.000442] amdgpu_userq_wait_ioctl+0x18fc/0x1f30 [amdgpu] [ +0.000428] ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu] [ +0.000424] ? __pfx_idr_alloc_u32+0x10/0x10 [ +0.000027] ? srso_return_thunk+0x5/0x5f [ +0.000024] ? __kasan_check_write+0x14/0x20 [ +0.000025] ? srso_return_thunk+0x5/0x5f [ +0.000024] ? idr_alloc+0x72/0xc0 [ +0.000023] ? srso_return_thunk+0x5/0x5f [ +0.000023] ? fput+0x1c/0x2f0 [ +0.000025] drm_ioctl_kernel+0x178/0x2f0 [drm] [ +0.000065] ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu] [ +0.000425] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.000064] ? srso_return_thunk+0x5/0x5f [ +0.000023] ? __kasan_check_write+0x14/0x20 [ +0.000025] drm_ioctl+0x513/0xd20 [drm] [ +0.000058] ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu] [ +0.000428] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.000061] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000027] ? __count_memcg_events+0x11f/0x3a0 [ +0.000027] ? srso_return_thunk+0x5/0x5f [ +0.001040] ? srso_return_thunk+0x5/0x5f [ +0.000969] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.000966] amdgpu_drm_ioctl+0xcd/0x1d0 [amdgpu] [ +0.001352] __x64_sys_ioctl+0x135/0x1b0 [ +0.000966] x64_sys_call+0x1205/0x20d0 [ +0.000968] do_syscall_64+0x4d/0x120 [ +0.000960] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000962] RIP: 0033:0x7f42af11a94f warning 2: WARNING: at lib/xarray.c:1849 __xa_alloc+0x13a/0x150 [ 366.491409] RIP: 0010:__xa_alloc+0x13a/0x150 [ 366.491434] Call Trace: [ 366.491437] <TASK> [ 366.491440] ? show_regs+0x6d/0x80 [ 366.491445] ? __warn+0x91/0x140 [ 366.491450] ? __xa_alloc+0x13a/0x150 [ 366.491453] ? report_bug+0x1c9/0x1e0 [ 366.491459] ? handle_bug+0x63/0xa0 [ 366.491463] ? exc_invalid_op+0x1d/0x80 [ 366.491467] ? asm_exc_invalid_op+0x1f/0x30 [ 366.491476] ? __xa_alloc+0x13a/0x150 [ 366.491484] amdgpu_userq_wait_ioctl+0xe0e/0xfe0 [amdgpu] [ 366.491743] ? idr_alloc_u32+0x97/0xd0 [ 366.491749] ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu] [ 366.491912] drm_ioctl_kernel+0xae/0x100 [drm] [ 366.491942] drm_ioctl+0x2a1/0x500 [drm] [ 366.491961] ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu] [ 366.492127] ? srso_return_thunk+0x5/0x5f [ 366.492132] ? srso_return_thunk+0x5/0x5f [ 366.492135] ? _raw_spin_unlock_irqrestore+0x2b/0x50 [ 366.492139] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu] [ 366.492288] __x64_sys_ioctl+0x99/0xd0 [ 366.492295] x64_sys_call+0x1209/0x20d0 [ 366.492299] do_syscall_64+0x51/0x120 [ 366.492303] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 366.492418] RIP: 0033:0x7f86f3b1a94f Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: simplify eviction fence suspend/resumeShashank Sharma2025-04-081-29/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The basic idea in this redesign is to add an eviction fence only in UQ resume path. When userqueue is not present, keep ev_fence as NULL Main changes are: - do not create the eviction fence during evf_mgr_init, keeping evf_mgr->ev_fence=NULL until UQ get active. - do not replace the ev_fence in evf_resume path, but replace it only in uq_resume path, so remove all the unnecessary code from ev_fence_resume. - add a new helper function (amdgpu_userqueue_ensure_ev_fence) which will do the following: - flush any pending uq_resume work, so that it could create an eviction_fence - if there is no pending uq_resume_work, add a uq_resume work and wait for it to execute so that we always have a valid ev_fence - call this helper function from two places, to ensure we have a valid ev_fence: - when a new uq is created - when a new uq completion fence is created v2: Worked on review comments by Christian. v3: Addressed few more review comments by Christian. v4: Move mutex lock outside of the amdgpu_userqueue_suspend() function (Christian). v5: squash in build fix (Alex) Cc: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: handle eviction fence raceShashank Sharma2025-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | The eviction process can get into a race condition between the eviction fence suspend work (which replaces the old fence with new) and kms_close (which destroys the fence and doesn't expect a new one). This patch: - adds a flag to indicate that fd is closing, so fence replacement is not required (evf_mgr->fd_closing) - adds a flush_work() during the ev_fence_destroy routine V2: Addressed review comments from Christian: - Do not use mutex to sync - Use flush_work and wait for suspend_work to be done V3: Fixed state machine for queue->active, which adds into race between suspend/resume and queue ops Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: resume gfx userqueuesShashank Sharma2025-04-081-5/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for userqueue resume. What it typically does is this: - adds a new delayed work for resuming all the queues. - schedules this delayed work from the suspend work. - validates the BOs and replaces the eviction fence before resuming all the queues running under this instance of userq manager. V2: Addressed Christian's review comments: - declare local variables like ret at the bottom. - lock all the object first, then start attaching the new fence. - dont replace old eviction fence, just attach new eviction fence. - no error logs for drm_exec_lock failures - no need to reserve bos after drm_exec_locked - schedule the resume worker immediately (not after 100 ms) - check for NULL BO (Arvind) V5: Rebased wrt changes in suspend patch - moved amdgpu_userqueue_validate_vm_bo in this patch - initialized ret in resume_all V6: Rebase V7: Addressed review comments from Christian - Do not use list_for_each_safe() with vm->invalidated, its not correct way V8: Fixed the race condition between suspend/close/fence Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Acked-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: suspend gfx userqueuesShashank Sharma2025-04-081-0/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds suspend support for gfx userqueues. It typically does the following: - adds an enable_signaling function for the eviction fence, so that it can trigger the userqueue suspend, - adds a delayed work to handle suspending of the eviction_fence - adds a suspend function to handle suspending of userqueues which suspends all the queues under this userq manager and signals the eviction fence, - adds a function to replace the old eviction fence with a new one and attach it to each of the objects, - adds reference of userq manager in the eviction fence container so that it can be used in the suspend function. V2: Addressed Christian's review comments: - schedule suspend work immediately V4: Addressed Christian's review comments: - wait for pending uq fences before starting suspend, added queue->last_fence for the same - accommodate ev_fence_mgr into existing code - some bug fixes and NULL checks V5: Addressed Christian's review comments (gitlab) - Wait for eviction fence to get signaled in destroy, don't signal it - Wait for eviction fence to get signaled in replace fence, don't signal it V6: Addressed Christian's review comments - Do not destroy the old eviction fence until we have it replaced - Change the sequence of fence replacement sub-tasks - reusing the ev_fence delayed work for userqueue suspend as well (Shashank). V7: Addressed Christian's review comments - give evf_mgr as argument (instead of fpriv) to replace_fence() - save ptr to evf_mgr in ev_fence (instead of uq_mgr) - modify suspend_all_queues logic to reflect error properly - remove the garbage drm_exec_lock section in wait_for_signal - grab the userqueue mutex before starting the wait for fence - remove the unrelated gobj check from signal_ioctl V8: Added race condition fixes Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Acked-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix userqueue UAPI commentsShashank Sharma2025-04-081-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes some of the pending UAPI review comments from the libDRM/UAPI review process. - It updates some outdated comments in the userqueue UAPI header highlighted during the libdrm UAPI review. - It removes the GDS BO support which was found unused. - It also removes the unused flags parameter from the UAPI. - It also adds a padding variables in userqueue in/out structures. (Pierre-Eric and Marek) - clarify comments on top of drm_amdgpu_userq_in - clarify comment for queue_id (in) - clarify comment for mqd - clarify comment for compute MQD size - clarify comment for queue_id (out) - remove GDB object from BO object list - remove the unused flags parameter Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Few optimization and fixes for userq fence driverArunpravin Paneer Selvam2025-04-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | Few optimization and fixes for userq fence driver. v1:(Christian): - Remove unnecessary comments. - In drm_exec_init call give num_bo_handles as last parameter it would making allocation of the array more efficient - Handle return value of __xa_store() and improve the error handling of amdgpu_userq_fence_driver_alloc(). v2:(Christian): - Revert userq_xa xarray init to XA_FLAGS_LOCK_IRQ. - move the xa_unlock before the error check of the call xa_err(__xa_store()) and moved this change to a separate patch as this is adding a missing error handling. - Removed the unnecessary comments. Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Implement userqueue signal/wait IOCTLArunpravin Paneer Selvam2025-04-081-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces new IOCTL for userqueue secure semaphore. The signal IOCTL called from userspace application creates a drm syncobj and array of bo GEM handles and passed in as parameter to the driver to install the fence into it. The wait IOCTL gets an array of drm syncobjs, finds the fences attached to the drm syncobjs and obtain the array of memory_address/fence_value combintion which are returned to userspace. v2: (Christian) - Install fence into GEM BO object. - Lock all BO's using the dma resv subsystem - Reorder the sequence in signal IOCTL function. - Get write pointer from the shadow wptr - use userq_fence to fetch the va/value in wait IOCTL. v3: (Christian) - Use drm_exec helper for the proper BO drm reserve and avoid BO lock/unlock issues. - fence/fence driver reference count logic for signal/wait IOCTLs. v4: (Christian) - Fixed the drm_exec calling sequence - use dma_resv_for_each_fence_unlock if BO's are not locked - Modified the fence_info array storing logic. v5: (Christian) - Keep fence_drv until wait queue execution. - Add dma_fence_wait for other fences. - Lock BO's using drm_exec as the number of fences in them could change. - Install signaled fences as well into BO/Syncobj. - Move Syncobj fence installation code after the drm_exec_prepare_array. - Directly add dma_resv_usage_rw(args->bo_flags.... - remove unnecessary dma_fence_put. v6: (Christian) - Add xarray stuff to store the fence_drv - Implement a function to iterate over the xarray and drop the fence_drv references. - Add drm_exec_until_all_locked() wrapper - Add a check that if we haven't exceeded the user allocated num_fences before adding dma_fence to the fences array. v7: (Christian) - Use memdup_user() for kmalloc_array + copy_from_user - Move the fence_drv references from the xarray into the newly created fence and drop the fence_drv references when we signal this fence. - Move this locking of BOs before the "if (!wait_info->num_fences)", this way you need this code block only once. - Merge the error handling code and the cleanup + return 0 code. - Initializing the xa should probably be done in the userq code. - Remove the userq back pointer stored in fence_drv. - Pass xarray as parameter in amdgpu_userq_walk_and_drop_fence_drv() v8: (Christian) - Move fence_drv references must come before adding the fence to the list. - Use xa_lock_irqsave_nested for nested spinlock operations. - userq_mgr should be per fpriv and not one per device. - Restructure the interrupt process code for the early exit of the loop. - The reference acquired in the syncobj fence replace code needs to be kept around. - Modify the dma_fence acquire placement in wait IOCTL. - Move USERQ_BO_WRITE flag to UAPI header file. - drop the fence drv reference after telling the hw to stop accessing it. - Add multi sync object support to userq signal IOCTL. V9: (Christian) - Store all the fence_drv ref to other drivers and not ourself. - Remove the userq fence xa implementation and replace with kvmalloc_array. v10: (Christian) - Add a comment for the userq_xa xarray - drop the if check of userq_fence->fence_drv_array - use the i variable to initialize userq_fence->fence_drv_array_count - drop the fence reference before you free the array in the error handling, otherwise it could be that some references leaked. Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Implement a new userqueue fence driverArunpravin Paneer Selvam2025-04-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Developed a userqueue fence driver for the userqueue process shared BO synchronization. Create a dma fence having write pointer as the seqno and allocate a seq64 memory for each user queue process and feed this memory address into the firmware/hardware, thus the firmware writes the read pointer into the given address when the process completes it execution. Compare wptr and rptr, if rptr >= wptr, signal the fences for the waiting process to consume the buffers. v2: Worked on review comments from Christian for the following modifications - Add wptr as sequence number into the fence - Add a reference count for the fence driver - Add dma_fence_put below the list_del as it might frees the userq fence. - Trim unnecessary code in interrupt handler. - Check dma fence signaled state in dma fence creation function for a potential problem of hardware completing the job processing beforehand. - Add necessary locks. - Create a list and process all the unsignaled fences. - clean up fences in destroy function. - implement .signaled callback function v3: Worked on review comments from Christian - Modify naming convention for reference counted objects - Fix fence driver reference drop issue - Drop amdgpu_userq_fence_driver_process() function return value v4: Worked on review comments from Christian - Moved fence driver allocation into amdgpu_userq_fence_driver_alloc() - Added detail doc mentioning the differences b/w two spinlocks declared. v5: Worked on review comments from Christian - Check before upcast and remove local variable - Add error handling in fence_drv alloc function. - Move rptr read fn outside of the loop and remove WARN_ON in destroy function. v6: - clear the seq64 memory in user fence driver(Christian) - fix for the wptr va bo mapping(Christian) - move the fence_drv xa entry erase code from the interrupt handler into user fence destroy function Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Reviewed-by: Christian König <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add kernel config for gfx-userqueueShashank Sharma2025-04-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | This patch: - adds a kernel config option "CONFIG_DRM_AMDGPU_NAVI3X_USERQ" - moves the usequeue initialization code for all IPs under this flag - cover the core userqueue functions under this config - adds stub function for userqueue ioctl. so that the userqueue works only when the config is enabled. V9: Introduce this patch V10: Call it CONFIG_DRM_AMDGPU_NAVI3X_USERQ instead of CONFIG_DRM_AMDGPU_USERQ_GFX (Christian) V11: Add GFX in the config help description message. V12: Add depends on BROKEN for this config, remove this when the rest of the code is available. Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: enable compute/gfx usermode queueShashank Sharma2025-04-081-1/+3
| | | | | | | | | | | | | | | | | | This patch does the necessary changes required to enable compute workload support using the existing usermode queues infrastructure. V9: Patch introduced V10: Add custom IP specific mqd strcuture for compute (Alex) V11: Rename drm_amdgpu_userq_mqd_compute_gfx_v11 to drm_amdgpu_userq_mqd_compute_gfx11 (Marek) Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: enable SDMA usermode queuesArvind Yadav2025-04-081-1/+1
| | | | | | | | | | | | | | | | | | This patch does necessary modifications to enable the SDMA usermode queues using the existing userqueue infrastructure. V9: introduced this patch in the series V10: use header file instead of extern (Alex) V11: rename drm_amdgpu_userq_mqd_sdma_gfx_v11 to drm_amdgpu_userq_mqd_sdma_gfx11 (Marek) Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: enable GFX-V11 userqueue supportShashank Sharma2025-04-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | This patch enables GFX-v11 IP support in the usermode queue base code. It typically: - adds a GFX_v11 specific MQD structure - sets IP functions to create and destroy MQDs - sets MQD objects coming from userspace V10: introduced this spearate patch for GFX V11 enabling (Alex). V11: Addressed review comments: - update the comments in GFX mqd structure informing user about using the INFO IOCTL for object sizes (Alex) - rename struct drm_amdgpu_userq_mqd_gfx_v11 to drm_amdgpu_userq_mqd_gfx11 (Marek) Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: cleanup leftover queuesShashank Sharma2025-04-081-7/+20
| | | | | | | | | | | | | | | | | | | | This patch adds code to cleanup any leftover userqueues which a user might have missed to destroy due to a crash or any other programming error. V7: Added Alex's R-B V8: Rebase V9: Rebase V10: Rebase V11: Rebase Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Suggested-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: generate doorbell index for userqueueShashank Sharma2025-04-081-0/+59
| | | | | | | | | | | | | | | | | | | | | | | | The userspace sends us the doorbell object and the relative doobell index in the object to be used for the usermode queue, but the FW expects the absolute doorbell index on the PCI BAR in the MQD. This patch adds a function to convert this relative doorbell index to absolute doorbell index. V5: Fix the db object reference leak (Christian) V6: Pin the doorbell bo in userqueue_create() function, and unpin it in userqueue destoy (Christian) V7: Added missing kfree for queue in error cases Added Alex's R-B V8: Rebase V9: Changed the function names from gfx_v11* to mes_v11* V10: Rebase V11: Rebase Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add helpers to create userqueue objectShashank Sharma2025-04-081-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces amdgpu_userqueue_object and its helper functions to creates and destroy this object. The helper functions creates/destroys a base amdgpu_bo, kmap/unmap it and save the respective GPU and CPU addresses in the encapsulating userqueue object. These helpers will be used to create/destroy userqueue MQD, WPTR and FW areas. V7: - Forked out this new patch from V11-gfx-userqueue patch to prevent that patch from growing very big. - Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep for eviction fences (Christian) V9: - Rebase V10: - Added Alex's R-B Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add new IOCTL for usermode queueShashank Sharma2025-04-081-0/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds: - A new IOCTL function to create and destroy - A new structure to keep all the user queue data in one place. - A function to generate unique index for the queue. V1: Worked on review comments from RFC patch series: - Alex: Keep a list of queues, instead of single queue per process. - Christian: Use the queue manager instead of global ptrs, Don't keep the queue structure in amdgpu_ctx V2: Worked on review comments: - Christian: - Formatting of text - There is no need for queuing of userqueues, with idr in place - Alex: - Remove use_doorbell, its unnecessary - Reuse amdgpu_mqd_props for saving mqd fields - Code formatting and re-arrangement V3: - Integration with doorbell manager V4: - Accommodate MQD union related changes in UAPI (Alex) - Do not set the queue size twice (Bas) V5: - Remove wrapper functions for queue indexing (Christian) - Do not save the queue id/idr in queue itself (Christian) - Move the idr allocation in the IP independent generic space (Christian) V6: - Check the validity of input IP type (Christian) V7: - Move uq_func from uq_mgr to adev (Alex) - Add missing free(queue) for error cases (Yifan) V9: - Rebase V10: Addressed review comments from Christian, and added R-B: - Do not initialize the local variable - Convert DRM_ERROR to DEBUG. V11: - check the input flags to be zero (Alex) Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Christian Koenig <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add usermode queue base codeShashank Sharma2025-04-081-0/+40
This patch adds IP independent skeleton code for amdgpu usermode queue. It contains: - A new files with init functions of usermode queues. - A queue context manager in driver private data. V1: Worked on design review comments from RFC patch series: (https://patchwork.freedesktop.org/series/112214/) - Alex: Keep a list of queues, instead of single queue per process. - Christian: Use the queue manager instead of global ptrs, Don't keep the queue structure in amdgpu_ctx V2: - Reformatted code, split the big patch into two V3: - Integration with doorbell manager V4: - Align the structure member names to the largest member's column (Luben) - Added SPDX license (Luben) V5: - Do not add amdgpu.h in amdgpu_userqueue.h (Christian). - Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian). V6: Rebase V9: Rebase V10: Rebase + Alex's R-B Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>