path: root/drivers/gpu/drm/scheduler/sched_entity.c
Commit message · Author · Age · Files · Lines
...
* | drm/scheduler: fix fence ref counting · Christian König · 2022-10-05 · 1 file, -1/+5

    We leaked dependency fences when processes were being killed. In addition
    to that, grab a reference to the last scheduled fence.

    Signed-off-by: Christian König <[email protected]>
    Reviewed-by: Andrey Grodzovsky <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* | drm/sched: Add FIFO sched policy to run queue · Andrey Grodzovsky · 2022-09-30 · 1 file, -0/+20

    When many entities compete for the same run queue on the same scheduler,
    we observe unusually long wait times and some jobs get starved. This has
    been observed on GPUVis. The issue is due to the Round Robin policy used
    by schedulers to pick the next entity's job queue for execution. Under
    stress of many entities and long job queues within an entity, some jobs
    can be stuck for a very long time in their entity's queue before being
    popped and executed, while for other entities with smaller job queues a
    job might execute earlier even though it arrived later than the job in
    the long queue.

    Fix: add a FIFO selection policy for entities in the run queue; choose
    the next entity on the run queue such that if a job on one entity
    arrived earlier than a job on another entity, the first job starts
    executing earlier, regardless of the length of the entity's job queue.

    v2: Switch to an rb tree structure for entities, keyed by the timestamp
        of the oldest job waiting in the entity's job queue. Improves next
        entity extraction to O(1). Entity timestamp update is O(log N),
        where N is the number of entities in the run queue. Drop the
        default option in the module control parameter.
    v3: Various cosmetic fixes and minor refactoring of the fifo update
        function. (Luben)
    v4: Switch drm_sched_rq_select_entity_fifo to in-order search (Luben)
    v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)
    v6: Add missing drm_sched_rq_remove_fifo_locked
    v7: Fix ts sampling bug and more cosmetic stuff (Luben)
    v8: Fix module parameter string (Luben)

    Cc: Luben Tuikov <[email protected]>
    Cc: Christian König <[email protected]>
    Cc: Direct Rendering Infrastructure - Development <[email protected]>
    Cc: AMD Graphics <[email protected]>
    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Tested-by: Yunxiang Li (Teddy) <[email protected]>
    Signed-off-by: Luben Tuikov <[email protected]>
    Reviewed-by: Luben Tuikov <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
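
[Editor's note] The sketch below illustrates the general idea of the v2 approach: keep run-queue entities in a cached rbtree ordered by the submit timestamp of their oldest queued job, so the next entity can be taken from the leftmost node in O(1) and repositioned in O(log N). It is a simplified illustration only; the struct layout and helper names (example_entity, oldest_ts, example_rq_update_fifo, example_rq_select_fifo) are assumptions for this example, not the real drm_sched_rq code.

    /* Minimal sketch of FIFO entity selection with a cached rbtree. */
    #include <linux/rbtree.h>
    #include <linux/ktime.h>

    struct example_entity {
        struct rb_node fifo_node;   /* node in the run queue's rbtree */
        ktime_t oldest_ts;          /* submit time of the oldest queued job */
    };

    struct example_rq {
        struct rb_root_cached fifo_root;
    };

    static bool example_entity_less(struct rb_node *a, const struct rb_node *b)
    {
        struct example_entity *ea = rb_entry(a, struct example_entity, fifo_node);
        struct example_entity *eb = rb_entry(b, struct example_entity, fifo_node);

        return ktime_before(ea->oldest_ts, eb->oldest_ts);
    }

    /* O(log N): reposition an entity after the timestamp of its oldest
     * queued job changed. Assumes the entity is already in the tree.
     */
    static void example_rq_update_fifo(struct example_rq *rq,
                                       struct example_entity *e, ktime_t ts)
    {
        rb_erase_cached(&e->fifo_node, &rq->fifo_root);
        e->oldest_ts = ts;
        rb_add_cached(&e->fifo_node, &rq->fifo_root, example_entity_less);
    }

    /* O(1): the leftmost node is the entity with the oldest waiting job. */
    static struct example_entity *example_rq_select_fifo(struct example_rq *rq)
    {
        struct rb_node *n = rb_first_cached(&rq->fifo_root);

        return n ? rb_entry(n, struct example_entity, fifo_node) : NULL;
    }
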
* | drm/scheduler: Don't kill jobs in interrupt context · Dmitry Osipenko · 2022-05-17 · 1 file, -3/+3

    Interrupt context can't sleep. Drivers like Panfrost and MSM take a
    mutex when a job is released, and thus that code can sleep. This results
    in "BUG: scheduling while atomic" if the locks are contended while the
    job is freed. There is no good reason for releasing the scheduler's jobs
    in IRQ context, hence use normal context to fix the trouble.

    Cc: [email protected]
    Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes")
    Signed-off-by: Dmitry Osipenko <[email protected]>
    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/sched: Avoid lockdep spalt on killing a processes · Andrey Grodzovsky · 2021-11-01 · 1 file, -3/+12

    Problem: signaling one sched fence from within another's sched fence
    signal callback generates a lockdep splat because both have the same
    lockdep class on their fence->lock.

    Fix: fix the stack below by deferring the signaling and killing of jobs
    left over when an entity is killed to irq work.

    [11176.741181] dump_stack+0x10/0x12
    [11176.741186] __lock_acquire.cold+0x208/0x2df
    [11176.741197] lock_acquire+0xc6/0x2d0
    [11176.741204] ? dma_fence_signal+0x28/0x80
    [11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
    [11176.741219] ? dma_fence_signal+0x28/0x80
    [11176.741225] dma_fence_signal+0x28/0x80
    [11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
    [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
    [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
    [11176.741254] dma_fence_signal+0x3b/0x80
    [11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
    [11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
    [11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
    [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
    [11176.741290] dma_fence_signal+0x3b/0x80
    [11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
    [11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
    [11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
    [11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu]
    [11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu]
    [11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
    [11176.742402] handle_irq_event_percpu+0x33/0x80
    [11176.742408] handle_irq_event+0x39/0x60
    [11176.742414] handle_edge_irq+0x93/0x1d0
    [11176.742419] __common_interrupt+0x50/0xe0
    [11176.742426] common_interrupt+0x80/0x90

    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Suggested-by: Daniel Vetter <[email protected]>
    Suggested-by: Christian König <[email protected]>
    Tested-by: Christian König <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Link: https://www.spinics.net/lists/dri-devel/msg321250.html
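
[Editor's note] The underlying pattern of the fix is to defer the second fence signaling out of the first fence's signaling callback into irq_work context, so the two fence locks are never nested under the same lockdep class. A minimal sketch of that pattern follows; the names (example_job, example_kill_jobs_cb) are illustrative, not the actual gpu_sched symbols, and real code would initialize the irq_work once at job creation rather than in the callback.

    #include <linux/irq_work.h>
    #include <linux/dma-fence.h>

    struct example_job {
        struct dma_fence_cb finish_cb;  /* callback on the dependency fence */
        struct irq_work work;           /* deferred signaling context */
        struct dma_fence *finished;     /* fence we still have to signal */
    };

    /* Runs from irq_work, i.e. outside the signaling fence's lock. */
    static void example_kill_jobs_irq_work(struct irq_work *wrk)
    {
        struct example_job *job = container_of(wrk, struct example_job, work);

        dma_fence_signal(job->finished);
        dma_fence_put(job->finished);
    }

    /* dma_fence callback: do not signal another fence here, just defer. */
    static void example_kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb)
    {
        struct example_job *job = container_of(cb, struct example_job, finish_cb);

        init_irq_work(&job->work, example_kill_jobs_irq_work);
        irq_work_queue(&job->work);
    }
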
* drm/sched: improve docs around drm_sched_entity · Daniel Vetter · 2021-08-30 · 1 file, -60/+25

    I found a few too many things that are tricky and not documented, so I
    started typing. I found a few more things that looked broken while
    typing; see the various FIXMEs in drm_sched_entity.

    Also some of the usual cleanups:
    - actually include the sched_entity.c declarations, which was lost in
      the move here: 620e762f9a98 ("drm/scheduler: move entity handling
      into separate file")
    - Ditch the kerneldoc for internal functions, keep the comments where
      they're describing more than what the function name already implies.
    - Switch drm_sched_entity to inline docs.

    Acked-by: Melissa Wen <[email protected]>
    Reviewed-by: Boris Brezillon <[email protected]> (v1)
    Signed-off-by: Daniel Vetter <[email protected]>
    Cc: Lucas Stach <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Boris Brezillon <[email protected]>
    Cc: Steven Price <[email protected]>
    Cc: Emma Anholt <[email protected]>
    Cc: Lee Jones <[email protected]>
    Cc: Andrey Grodzovsky <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/sched: drop entity parameter from drm_sched_push_job · Daniel Vetter · 2021-08-30 · 1 file, -4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally a job was only bound to the queue when we pushed this, but now that's done in drm_sched_job_init, making that parameter entirely redundant. Remove it. The same applies to the context parameter in lima_sched_context_queue_task, simplify that too. v2: Rebase on top of msm adopting drm/sched Reviewed-by: Christian König <[email protected]> Acked-by: Emma Anholt <[email protected]> Acked-by: Melissa Wen <[email protected]> Reviewed-by: Steven Price <[email protected]> (v1) Reviewed-by: Boris Brezillon <[email protected]> (v1) Signed-off-by: Daniel Vetter <[email protected]> Cc: Lucas Stach <[email protected]> Cc: Russell King <[email protected]> Cc: Christian Gmeiner <[email protected]> Cc: Qiang Yu <[email protected]> Cc: Rob Herring <[email protected]> Cc: Tomeu Vizoso <[email protected]> Cc: Steven Price <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Cc: Emma Anholt <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: "Christian König" <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Nirmoy Das <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Chen Li <[email protected]> Cc: Lee Jones <[email protected]> Cc: Deepak R Varma <[email protected]> Cc: Kevin Wang <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: "Marek Olšák" <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Dennis Li <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Rob Clark <[email protected]> Cc: Sean Paul <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/sched: Add dependency tracking · Daniel Vetter · 2021-08-30 · 1 file, -3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of just a callback we can just glue in the gem helpers that panfrost, v3d and lima currently use. There's really not that many ways to skin this cat. v2/3: Rebased. v4: Repaint this shed. The functions are now called _add_dependency() and _add_implicit_dependency() Reviewed-by: Christian König <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> (v3) Reviewed-by: Steven Price <[email protected]> (v1) Acked-by: Melissa Wen <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: "Christian König" <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Lee Jones <[email protected]> Cc: Nirmoy Das <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
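
[Editor's note] In current kernels these helpers are exposed as drm_sched_job_add_dependency() and drm_sched_job_add_implicit_dependencies(); exact signatures may differ between kernel versions. Below is a hedged sketch of how a driver's submit path might record dependencies before arming and pushing a job; example_add_deps() and its parameters are hypothetical.

    #include <drm/gpu_scheduler.h>
    #include <drm/drm_gem.h>
    #include <linux/dma-fence.h>

    /* Sketch: record an explicit input fence and a GEM object's implicit
     * (reservation object) fences as scheduler dependencies.
     */
    static int example_add_deps(struct drm_sched_job *job,
                                struct dma_fence *in_fence,
                                struct drm_gem_object *obj, bool write)
    {
        int ret;

        if (in_fence) {
            ret = drm_sched_job_add_dependency(job, in_fence);
            if (ret)
                return ret;
        }

        /* Pull the object's implicit fences in as dependencies too. */
        return drm_sched_job_add_implicit_dependencies(job, obj, write);
    }
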
* drm/sched: Barriers are needed for entity->last_scheduled · Daniel Vetter · 2021-08-30 · 1 file, -2/+25

    It might be good enough on x86 with just READ_ONCE, but the write side
    should then at least be WRITE_ONCE because x86 has total store order.
    It's definitely not enough on arm.

    Fix this properly, which means
    - explain the need for the barrier in both places
    - point at the other side in each comment

    Also pull out the !sched_list case as the first check, so that the code
    flow is clearer. While at it sprinkle some comments around because it
    was very non-obvious to me what's actually going on here and why.

    Note that we really need full barriers here; at first I thought
    store-release and load-acquire on ->last_scheduled would be enough, but
    we actually require ordering between that and the queue state.

    v2: Put smp_rmb() in the right place and fix up comment (Andrey)

    Reviewed-by: Christian König <[email protected]>
    Acked-by: Melissa Wen <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Steven Price <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Andrey Grodzovsky <[email protected]>
    Cc: Lee Jones <[email protected]>
    Cc: Boris Brezillon <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
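
[Editor's note] The commit's point about needing full barriers (rather than release/acquire on a single field) is the classic case of ordering two independent pieces of state. The generic sketch below shows paired smp_mb() between two fields; it is not the actual drm_sched protocol, and the struct and field names are assumptions for illustration only.

    #include <linux/compiler.h>
    #include <asm/barrier.h>

    struct dma_fence;

    /* Illustrative only: ordering is required between two independent
     * fields (queue state and last_scheduled), so a full barrier pair is
     * used rather than release/acquire on one field.
     */
    struct example_state {
        struct dma_fence *last_scheduled;
        bool queue_nonempty;
    };

    static void example_writer(struct example_state *s, struct dma_fence *f)
    {
        WRITE_ONCE(s->last_scheduled, f);
        smp_mb();   /* pairs with smp_mb() in example_reader() */
        WRITE_ONCE(s->queue_nonempty, true);
    }

    static struct dma_fence *example_reader(struct example_state *s)
    {
        if (READ_ONCE(s->queue_nonempty))
            return NULL;
        smp_mb();   /* pairs with smp_mb() in example_writer() */
        return READ_ONCE(s->last_scheduled);
    }
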
* drm/sched: Split drm_sched_job_init · Daniel Vetter · 2021-08-30 · 1 file, -3/+3

    This is a very confusingly named function: not only does it init an
    object, it arms it and provides a point of no return for pushing a job
    into the scheduler. It would be nice if that was a bit clearer in the
    interface.

    But the real reason is that I want to push the dependency tracking
    helpers into the scheduler code, and that means drm_sched_job_init must
    be called a lot earlier, without arming the job.

    v2:
    - don't change .gitignore (Steven)
    - don't forget v3d (Emma)

    v3: Emma noticed that I leak the memory allocated in drm_sched_job_init
    if we bail out before the point of no return in subsequent driver
    patches. To be able to fix this, change drm_sched_job_cleanup() so it
    can handle being called both before and after drm_sched_job_arm(). Also
    improve the kerneldoc for this.

    v4:
    - Fix the drm_sched_job_cleanup logic, I inverted the booleans, as
      usual (Melissa)
    - Christian pointed out that drm_sched_entity_select_rq() also needs to
      be moved into drm_sched_job_arm, which made me realize that the
      job->id definitely needs to be moved too. Shuffle things to fit
      between job_init and job_arm.

    v5: Reshuffle the split between init/arm once more, amdgpu abuses
    drm_sched.ready to signal gpu reset failures. Also document this
    somewhat. (Christian)

    v6: Rebase on top of the msm drm/sched support. Note that the
    drm_sched_job_init() call is completely misplaced, and hence also the
    split-out drm_sched_entity_push_job(). I've put in a FIXME which the
    next patch will address.

    v7: Drop the FIXME in msm, after discussions with Rob I agree it
    shouldn't be a problem where it is now.
Acked-by: Christian König <[email protected]> Acked-by: Melissa Wen <[email protected]> Cc: Melissa Wen <[email protected]> Acked-by: Emma Anholt <[email protected]> Acked-by: Steven Price <[email protected]> (v2) Reviewed-by: Boris Brezillon <[email protected]> (v5) Signed-off-by: Daniel Vetter <[email protected]> Cc: Lucas Stach <[email protected]> Cc: Russell King <[email protected]> Cc: Christian Gmeiner <[email protected]> Cc: Qiang Yu <[email protected]> Cc: Rob Herring <[email protected]> Cc: Tomeu Vizoso <[email protected]> Cc: Steven Price <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: "Christian König" <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Kees Cook <[email protected]> Cc: Adam Borowski <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Paul Menzel <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Nirmoy Das <[email protected]> Cc: Deepak R Varma <[email protected]> Cc: Lee Jones <[email protected]> Cc: Kevin Wang <[email protected]> Cc: Chen Li <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: "Marek Olšák" <[email protected]> Cc: Dennis Li <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Sonny Jiang <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Tian Tao <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Emma Anholt <[email protected]> Cc: Rob Clark <[email protected]> Cc: Sean Paul <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
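
[Editor's note] After this split, a driver submit path roughly follows a three-step pattern: init the job early (so dependency tracking can run and cleanup is still possible), arm it only at the point of no return, then push it. A hedged sketch is shown below; error paths are trimmed, it uses the single-argument drm_sched_entity_push_job() from the later patch in this series, and the drm_sched_job_init() signature has since grown an extra parameter in newer kernels.

    #include <drm/gpu_scheduler.h>

    static int example_submit(struct drm_sched_entity *entity,
                              struct drm_sched_job *job, void *owner)
    {
        int ret;

        /* Step 1: init only. The job is not visible to the scheduler yet,
         * so we may still bail out and call drm_sched_job_cleanup().
         */
        ret = drm_sched_job_init(job, entity, owner);
        if (ret)
            return ret;

        /* ... gather dependencies, copy in user data, take locks ... */

        /* Step 2: arm. Point of no return: the finished fence is now set
         * up and the job is bound to its run queue.
         */
        drm_sched_job_arm(job);

        /* Step 3: hand the job to the scheduler. */
        drm_sched_entity_push_job(job);
        return 0;
    }
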
* drm/sched: Avoid data corruptions · Andrey Grodzovsky · 2021-05-20 · 1 file, -0/+5
| | | | | | | | | Wait for all dependencies of a job to complete before killing it to avoid data corruptions. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/scheduler: Fix hang when sched_entity released · Andrey Grodzovsky · 2021-05-20 · 1 file, -1/+2

    Problem: if the scheduler is already stopped by the time a sched_entity
    is released and the entity's job_queue is not empty, I encountered a
    hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
    never becomes true.

    Fix: in drm_sched_fini detach all sched_entities from the scheduler's
    run queues. This will satisfy drm_sched_entity_is_idle. Also wake up all
    those processes stuck in sched_entity flushing, as the scheduler main
    thread which would wake them up is stopped by now.

    v2: Reverse the order of drm_sched_rq_remove_entity and marking
    s_entity as stopped to prevent reinsertion back into the rq due to a
    race.

    v3: Drop drm_sched_rq_remove_entity, only modify entity->stopped and
    check for it in drm_sched_entity_is_idle.

    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/scheduler/sched_entity: Fix some function name disparity · Lee Jones · 2021-04-22 · 1 file, -3/+3

    Fixes the following W=1 kernel build warning(s):

    drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
    drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
    drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

    Cc: David Airlie <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Sumit Semwal <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Christian König <[email protected]>
* drm/sched: select new rq even if there is only one v3 · Christian König · 2021-03-08 · 1 file, -2/+4
| | | | | | | | | | | This is necessary when changing priorities of an entity. v2: test the sched_list instead of num_sched. v3: set the sched_list to NULL when there is only one entry Signed-off-by: Christian König <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* drm/scheduler: provide scheduler score externally · Christian König · 2021-02-05 · 1 file, -1/+1
| | | | | | | | | | Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <[email protected]> Reviewed-and-Tested-by: Leo Liu <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* gpu: drm: scheduler: sched_entity: Demote non-conformant kernel-doc headers · Lee Jones · 2020-11-13 · 1 file, -2/+2

    Fixes the following W=1 kernel build warning(s):

    drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep'
    drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep'
    drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup'
    drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup'

    Cc: David Airlie <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Sumit Semwal <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: Nirmoy Das <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path · Boris Brezillon · 2020-10-05 · 1 file, -0/+3
| | | | | | | | | | | | | | | If we don't initialize the entity to idle and the entity is never scheduled before being destroyed we end up with an infinite wait in the destroy path. v2: - Add Steven's R-b Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/393486/
* drm/scheduler: improve job distribution with multiple queues · Nirmoy Das · 2020-06-26 · 1 file, -1/+1

    This patch uses the score to select a new drm scheduler for better load
    balancing between multiple drm schedulers, instead of num_jobs.

    Below are test results after running amdgpu_test ~10 times.

    Before this patch:

    sched_name     num of times it got scheduled
    ==========     =============================
    sdma0          1463
    sdma1          198
    comp_1.0.1     280

    After this patch:

    sched_name     num of times it got scheduled
    ==========     =============================
    sdma0          925
    sdma1          928
    comp_1.0.1     177
    comp_1.1.1     44
    comp_1.2.1     43
    comp_1.3.1     44

    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/373000/
    Signed-off-by: Christian König <[email protected]>
* drm/sched: implement and export drm_sched_pick_best · Nirmoy Das · 2020-03-16 · 1 file, -33/+3
| | | | | | | | | | | Remove drm_sched_entity_get_free_sched() and use the logic of picking the least loaded drm scheduler from a drm scheduler list to implement drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so that it can be utilized by other drm drivers. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
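
[Editor's note] The idea behind drm_sched_pick_best() is a linear scan of the supplied scheduler list for the ready scheduler with the lowest load score. The sketch below is schematic rather than the real sched_main.c code; in particular the score field has changed type across kernel versions, as the comment notes.

    #include <drm/gpu_scheduler.h>
    #include <linux/limits.h>

    /* Schematic: pick the least loaded, ready scheduler from a list. */
    static struct drm_gpu_scheduler *
    example_pick_best(struct drm_gpu_scheduler **sched_list,
                      unsigned int num_sched_list)
    {
        struct drm_gpu_scheduler *best = NULL;
        unsigned int best_score = UINT_MAX;
        unsigned int i;

        for (i = 0; i < num_sched_list; i++) {
            struct drm_gpu_scheduler *sched = sched_list[i];
            unsigned int score;

            if (!sched->ready)      /* skip schedulers whose HW went bad */
                continue;

            /* In current kernels score is an atomic_t pointer; when this
             * patch landed it was an embedded atomic_t (&sched->score).
             */
            score = atomic_read(sched->score);
            if (score < best_score) {
                best_score = score;
                best = sched;
            }
        }

        return best;    /* may be NULL: job init will then fail */
    }
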
* Revert "drm/scheduler: improve job distribution with multiple queues"changzhu2020-03-161-5/+5
| | | | | | | | | | | It needs to revert this patch to avoid amdgpu_test compute hang problem on picasso. This reverts commit 56822db194232c089601728d68ed078dccb97f8b. Signed-off-by: changzhu <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: implement a function to modify sched list · Nirmoy Das · 2020-03-09 · 1 file, -0/+18

    Implement drm_sched_entity_modify_sched(), which replaces an entity's
    existing sched_list with a different one. This is helpful when userspace
    changes the priority of a ctx/entity: the driver can then switch to the
    HW scheduler list corresponding to that priority.

    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
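
[Editor's note] A hedged usage sketch of the new helper follows: a driver keeps one scheduler list per priority level and repoints the entity when userspace changes the context priority. The per-priority tables (example_ctx, sched_lists, num_scheds) are hypothetical driver state, not something the helper provides.

    #include <drm/gpu_scheduler.h>

    /* Hypothetical driver-side table: one scheduler list per priority. */
    struct example_ctx {
        struct drm_sched_entity entity;
        struct drm_gpu_scheduler **sched_lists[DRM_SCHED_PRIORITY_COUNT];
        unsigned int num_scheds[DRM_SCHED_PRIORITY_COUNT];
    };

    static void example_set_priority(struct example_ctx *ctx,
                                     enum drm_sched_priority prio)
    {
        /* Swap the entity over to the scheduler list matching the new
         * priority; a run queue is picked from this list on the next
         * job submission.
         */
        drm_sched_entity_modify_sched(&ctx->entity,
                                      ctx->sched_lists[prio],
                                      ctx->num_scheds[prio]);
    }
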
* drm/amdgpu: fix doc by clarifying sched_list definition · Nirmoy Das · 2020-01-27 · 1 file, -1/+1
| | | | | | | | | expand sched_list definition for better understanding. Also fix a typo atleast -> at least Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: fix documentation by replacing rq_list with sched_list · Nirmoy Das · 2020-01-16 · 1 file, -1/+1
| | | | | | | | | This also replaces old artifacts with a correct one in drm_sched_entity_init() declaration Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: improve job distribution with multiple queues · Nirmoy Das · 2020-01-16 · 1 file, -5/+5

    This patch uses score-based logic to select a new rq for better load
    balancing between multiple rq/scheds, instead of num_jobs.

    Below are test results after running amdgpu_test from mesa drm.

    Before this patch:

    sched_name     num of times it got scheduled
    ==========     =============================
    sdma0          314
    sdma1          32
    comp_1.0.0     56
    comp_1.0.1     0
    comp_1.1.0     0
    comp_1.1.1     0
    comp_1.2.0     0
    comp_1.2.1     0
    comp_1.3.0     0
    comp_1.3.1     0

    After this patch:

    sched_name     num of times it got scheduled
    ==========     =============================
    sdma0          216
    sdma1          185
    comp_1.0.0     39
    comp_1.0.1     9
    comp_1.1.0     12
    comp_1.1.1     0
    comp_1.2.0     12
    comp_1.2.1     0
    comp_1.3.0     12
    comp_1.3.1     0

    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: do not keep a copy of sched list · Nirmoy Das · 2019-12-18 · 1 file, -15/+4

    An entity should not keep and maintain its own copy of the sched list.

    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: rework entity creation · Nirmoy Das · 2019-12-18 · 1 file, -45/+29

    An entity currently keeps a copy of the run_queue list and modifies it
    in drm_sched_entity_set_priority(). Entities shouldn't modify the
    run_queue list. Use a drm_gpu_scheduler list instead of a drm_sched_rq
    list in the drm_sched_entity struct. This way we can select a run queue
    for a drm scheduler based on the entity/ctx's priority.

    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
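
[Editor's note] After this rework, drivers hand drm_sched_entity_init() a plain array of drm_gpu_scheduler pointers plus a priority instead of a list of run queues. A sketch of the resulting call is below, under the assumption that the five-argument signature introduced by this series is still in place; example_entity_setup() is a hypothetical wrapper.

    #include <drm/gpu_scheduler.h>

    static int example_entity_setup(struct drm_sched_entity *entity,
                                    struct drm_gpu_scheduler **scheds,
                                    unsigned int num_scheds)
    {
        /* The entity stores the scheduler list itself (no private copy of
         * run queues) and derives the run queue from the chosen scheduler
         * plus the requested priority.
         */
        return drm_sched_entity_init(entity, DRM_SCHED_PRIORITY_NORMAL,
                                     scheds, num_scheds, NULL);
    }
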
* drm/sched: Use completion to wait for sched->thread idle v2. · Andrey Grodzovsky · 2019-11-07 · 1 file, -4/+8
| | | | | | | | | | | | | Removes thread park/unpark hack from drm_sched_entity_fini and by this fixes reactivation of scheduler thread while the thread is supposed to be stopped. v2: Per sched entity completion. Signed-off-by: Andrey Grodzovsky <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: use job count instead of peek · Christian König · 2019-08-15 · 1 file, -2/+2

    The spsc_queue_peek function accesses queue->head, which belongs to the
    consumer thread and shouldn't be accessed by the producer.

    This fixes a rare race condition when destroying entities.

    Signed-off-by: Christian König <[email protected]>
    Acked-by: Andrey Grodzovsky <[email protected]>
    Reviewed-by: [email protected]
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: drop use of drmP.h · Sam Ravnborg · 2019-07-15 · 1 file, -0/+3
| | | | | | | | | | | | | | | | | | | | | Drop use of the deprecated drmP.h header file. Fix fallout. Signed-off-by: Sam Ravnborg <[email protected]> Reviewed-by: Christian König <[email protected]> Acked-by: Emil Velikov <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrey Grodzovsky <[email protected]> Cc: Huang Rui <[email protected]> Cc: Eric Anholt <[email protected]> Cc: Bas Nieuwenhuizen <[email protected]> Cc: Sharat Masetty <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nayan Deshmukh <[email protected]> Cc: Sam Ravnborg <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
* Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next · Dave Airlie · 2019-02-22 · 1 file, -13/+26
|\
    Fixes for 5.1:

    amdgpu:
    - Fix missing fw declaration after dropping old CI DPM code
    - Fix debugfs access to registers beyond the MMIO bar size
    - Fix context priority handling
    - Add missing license on some new files
    - Various cleanups and bug fixes

    radeon:
    - Fix missing break in CS parser for evergreen
    - Various cleanups and bug fixes

    sched:
    - Fix entities with 0 run queues

    Signed-off-by: Dave Airlie <[email protected]>
    From: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
| * drm/sched: Fix entities with 0 rqs. · Bas Nieuwenhuizen · 2019-02-15 · 1 file, -13/+26

    Some blocks in amdgpu can have 0 rqs.

    Job creation already fails with -ENOENT when entity->rq is NULL, so jobs
    cannot be pushed. Without an rq there is no scheduler to pop jobs, and
    rq selection already does the right thing with a list of length 0.

    So the operations we need to fix are:
    - Creation: do not set rq to rq_list[0] if the list can have length 0.
    - Do not flush any jobs when there is no rq.
    - On entity destruction handle the rq = NULL case.
    - On set_priority, do not try to change the rq if it is NULL.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* | drm/sched: Always trace the dependencies we wait on, to fix a race. · Eric Anholt · 2019-02-08 · 1 file, -5/+2
|/
    The entity->dependency can go away completely once we've called
    drm_sched_entity_add_dependency_cb() (if the cb is called before we get
    around to tracing). The tracepoint is more useful if we trace every
    dependency instead of just the ones that get callbacks installed,
    anyway, so just do that.

    Fixes an easy-to-produce OOPS when tracing the scheduler on V3D with
    "perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled.

    Signed-off-by: Eric Anholt <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Cc: [email protected]
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: Add drm_sched_job_cleanup · Sharat Masetty · 2018-11-05 · 1 file, -1/+0

    This patch adds a new API to clean up the scheduler job resources. This
    is primarily needed in cases where the job was created but was not
    queued to the scheduler queue. Additionally with this change, the layer
    which creates the scheduler job also gets to free up the job's
    resources, and this entails moving the dma_fence_put(finished_fence) to
    the drivers' ops free handler routines.

    Signed-off-by: Sharat Masetty <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Acked-by: Andrey Grodzovsky <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/sched: Add boolean to mark if sched is ready to work v5 · Andrey Grodzovsky · 2018-11-05 · 1 file, -1/+8

    Problem: a particular scheduler may become unusable (underlying HW)
    after some event (e.g. GPU reset). If it's later chosen by the
    get-free-sched policy, a command will fail to be submitted.

    Fix: add a driver-specific callback to report the sched status, so an rq
    with a bad sched can be avoided in favor of a working one, or none, in
    which case job init will fail.

    v2: Switch from driver callback to flag in scheduler.
    v3: rebase
    v4: Remove ready parameter from drm_sched_init, set unconditionally to
        true once init is done.
    v5: fix missed change in v3d in v4 (Alex)

    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: Simplify spsc_queue_count check in drm_sched_entity_select_rq · Nathan Chancellor · 2018-10-09 · 1 file, -2/+1

    Clang generates a warning when it sees a logical not followed by a
    conditional operator like ==, >, or <:

    drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses]
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
    drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses after the '!' to evaluate the comparison first
    drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses around left hand side expression to silence this warning
    1 warning generated.

    It assumes the author might have made a mistake in their logic:

    if (!a == b) -> if (!(a == b))

    Sometimes that is the case; other times, it's just a super convoluted
    way of saying 'if (a)' when b = 0:

    if (!1 == 0) -> if (0 == 0) -> if (true)

    Alternatively:

    if (!1 == 0) -> if (!!1) -> if (1)

    Simplify this comparison so that Clang doesn't complain.

    Fixes: 35e160e781a0 ("drm/scheduler: change entities rq even earlier")
    Signed-off-by: Nathan Chancellor <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: avoid redundant shifting of the entity v2 · Nayan Deshmukh · 2018-08-27 · 1 file, -0/+3
| | | | | | | | | do not remove entity from the rq if the current rq is from the least loaded scheduler. Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: Add stopped flag to drm_sched_entity · Andrey Grodzovsky · 2018-08-27 · 1 file, -1/+11

    The flag will prevent another thread from the same process reinserting
    the entity queue into the scheduler's rq after it was already removed
    from there by another thread during drm_sched_entity_flush.

    Signed-off-by: Andrey Grodzovsky <[email protected]>
    Reviewed-by: Chunming Zhou <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: cleanup entity coding style · Christian König · 2018-08-27 · 1 file, -57/+110
| | | | | | | | Cleanup coding style in sched_entity.c Signed-off-by: Christian König <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/scheduler: move entity handling into separate file · Christian König · 2018-08-27 · 1 file, -0/+459

    This is complex enough on its own. Move it into a separate C file.

    Signed-off-by: Christian König <[email protected]>
    Reviewed-by: Huang Rui <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>