We leaked dependency fences when processes were being killed.
In addition to that, grab a reference to the last scheduled fence.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
When many entities compete for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.
The issue is due to the Round Robin policy used by schedulers
to pick the next entity's job queue for execution. Under the stress
of many entities with long job queues, some jobs can be stuck for a
very long time in their entity's queue before being popped and
executed, while for other entities with smaller job queues a job
might execute earlier even though it arrived later than the job in
the long queue.
Fix:
Add a FIFO selection policy for entities in the run queue: choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
starts executing earlier, regardless of the length of the entity's
job queue.
v2:
Switch to an rb-tree structure for entities, keyed by the TS of the
oldest job waiting in the job queue of an entity. This improves next
entity extraction to O(1); an entity TS update is
O(log N), where N is the number of entities in the run queue.
Drop the default option in the module control parameter.
v3: Various cosmetic fixes and minor refactoring of the fifo update function. (Luben)
v4: Switch drm_sched_rq_select_entity_fifo to in-order search (Luben)
v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)
v6: Add missing drm_sched_rq_remove_fifo_locked
v7: Fix TS sampling bug and more cosmetic stuff (Luben)
v8: Fix module parameter string (Luben)
Cc: Luben Tuikov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Direct Rendering Infrastructure - Development <[email protected]>
Cc: AMD Graphics <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Tested-by: Yunxiang Li (Teddy) <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
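A minimal sketch of the rb-tree bookkeeping described above; struct and
function names here are illustrative stand-ins, not the exact upstream
symbols, and the node is assumed to have been RB_CLEAR_NODE()'d at init:

#include <linux/rbtree.h>
#include <linux/ktime.h>

struct rq_sketch {
	struct rb_root_cached rb_root;	/* entities ordered by job TS */
};

struct entity_sketch {
	struct rb_node rb_node;
	ktime_t oldest_job_ts;		/* TS of oldest job still queued */
};

static bool entity_less(struct rb_node *a, const struct rb_node *b)
{
	struct entity_sketch *ent_a = rb_entry(a, struct entity_sketch, rb_node);
	struct entity_sketch *ent_b = rb_entry(b, struct entity_sketch, rb_node);

	return ktime_before(ent_a->oldest_job_ts, ent_b->oldest_job_ts);
}

/* O(log N): re-key the entity by the TS of its oldest waiting job. */
static void rq_update_fifo_sketch(struct rq_sketch *rq,
				  struct entity_sketch *entity, ktime_t ts)
{
	if (!RB_EMPTY_NODE(&entity->rb_node))
		rb_erase_cached(&entity->rb_node, &rq->rb_root);

	entity->oldest_job_ts = ts;
	rb_add_cached(&entity->rb_node, &rq->rb_root, entity_less);
}

/* O(1): the cached leftmost node is the entity with the oldest job. */
static struct entity_sketch *rq_select_fifo_sketch(struct rq_sketch *rq)
{
	struct rb_node *rb = rb_first_cached(&rq->rb_root);

	return rb ? rb_entry(rb, struct entity_sketch, rb_node) : NULL;
}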
Interrupt context can't sleep. Drivers like panfrost and msm take a
mutex when a job is released, so that code can sleep. This results
in "BUG: scheduling while atomic" if the locks are contended while a
job is freed. There is no good reason for releasing the scheduler's
jobs in IRQ context, hence use normal context to fix the trouble.
Cc: [email protected]
Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
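A minimal sketch of the deferral pattern with illustrative names: the
fence callback fires in IRQ context and only queues a work item, and
the free path that may sleep runs from the workqueue.

#include <linux/workqueue.h>
#include <linux/dma-fence.h>
#include <linux/slab.h>

struct kill_sketch {
	struct dma_fence_cb cb;
	struct work_struct work;
};

/* Process context: taking mutexes (as panfrost/msm free paths do) is fine. */
static void kill_work_sketch(struct work_struct *wrk)
{
	struct kill_sketch *s = container_of(wrk, struct kill_sketch, work);

	/* ... run the driver's free/release path here ... */
	kfree(s);
}

/* IRQ context when the fence signals: never sleep here, just punt. */
static void kill_cb_sketch(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct kill_sketch *s = container_of(cb, struct kill_sketch, cb);

	INIT_WORK(&s->work, kill_work_sketch);
	schedule_work(&s->work);
}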
Problem:
Signaling one sched fence from within another sched fence's signal
callback generates a lockdep splat because both have the same lockdep
class on their fence->lock.
Fix:
Fix the below stack by deferring the signaling and killing of the jobs
left over when an entity is killed to irq work.
[11176.741181] dump_stack+0x10/0x12
[11176.741186] __lock_acquire.cold+0x208/0x2df
[11176.741197] lock_acquire+0xc6/0x2d0
[11176.741204] ? dma_fence_signal+0x28/0x80
[11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
[11176.741219] ? dma_fence_signal+0x28/0x80
[11176.741225] dma_fence_signal+0x28/0x80
[11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
[11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741254] dma_fence_signal+0x3b/0x80
[11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
[11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
[11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741290] dma_fence_signal+0x3b/0x80
[11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
[11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
[11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
[11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu]
[11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu]
[11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
[11176.742402] handle_irq_event_percpu+0x33/0x80
[11176.742408] handle_irq_event+0x39/0x60
[11176.742414] handle_edge_irq+0x93/0x1d0
[11176.742419] __common_interrupt+0x50/0xe0
[11176.742426] common_interrupt+0x80/0x90
Signed-off-by: Andrey Grodzovsky <[email protected]>
Suggested-by: Daniel Vetter <[email protected]>
Suggested-by: Christian König <[email protected]>
Tested-by: Christian König <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://www.spinics.net/lists/dri-devel/msg321250.html
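A minimal sketch of the irq_work deferral with illustrative names:
instead of signaling the dependent fence directly inside the parent
fence's callback (same lockdep class, nested fence->lock), queue an
irq_work and signal from there.

#include <linux/irq_work.h>
#include <linux/dma-fence.h>

struct defer_sketch {
	struct dma_fence_cb cb;
	struct dma_fence *to_signal;	/* holds a reference */
	struct irq_work work;
};

static void signal_work_sketch(struct irq_work *wrk)
{
	struct defer_sketch *s = container_of(wrk, struct defer_sketch, work);

	/* Fresh context: no fence->lock of the same class is held here. */
	dma_fence_signal(s->to_signal);
	dma_fence_put(s->to_signal);
}

static void parent_cb_sketch(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct defer_sketch *s = container_of(cb, struct defer_sketch, cb);

	/* Runs under f->lock; don't take to_signal->lock here. */
	init_irq_work(&s->work, signal_work_sketch);
	irq_work_queue(&s->work);
}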
I found a few too many things that are tricky and not documented, so I
started typing.
I found a few more things that looked broken while typing; see the
various FIXMEs in drm_sched_entity.
Also some of the usual cleanups:
- Actually include the sched_entity.c declarations; that was lost in the
move in 620e762f9a98 ("drm/scheduler: move entity handling into
separate file").
- Ditch the kerneldoc for internal functions, keep the comments where
they're describing more than what the function name already implies.
- Switch drm_sched_entity to inline docs.
Acked-by: Melissa Wen <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]> (v1)
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Originally a job was only bound to the queue when we pushed it, but
now that's done in drm_sched_job_init, making that parameter entirely
redundant.
Remove it.
The same applies to the context parameter in
lima_sched_context_queue_task; simplify that too.
v2:
Rebase on top of msm adopting drm/sched
Reviewed-by: Christian König <[email protected]>
Acked-by: Emma Anholt <[email protected]>
Acked-by: Melissa Wen <[email protected]>
Reviewed-by: Steven Price <[email protected]> (v1)
Reviewed-by: Boris Brezillon <[email protected]> (v1)
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Chen Li <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Deepak R Varma <[email protected]>
Cc: Kevin Wang <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: "Marek Olšák" <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Dennis Li <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Rob Clark <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Instead of just a callback we can glue in the GEM helpers that
panfrost, v3d and lima currently use. There's really not that many
ways to skin this cat.
v2/3: Rebased.
v4: Repaint this shed. The functions are now called _add_dependency()
and _add_implicit_dependency()
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]> (v3)
Reviewed-by: Steven Price <[email protected]> (v1)
Acked-by: Melissa Wen <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
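A minimal sketch of a driver submit path using the helpers, assuming
the signatures as merged (the implicit variant landed as
drm_sched_job_add_implicit_dependencies(), and the explicit helper
takes over the passed fence reference):

#include <drm/gpu_scheduler.h>
#include <drm/drm_gem.h>

static int add_deps_sketch(struct drm_sched_job *job,
			   struct dma_fence *wait_fence,
			   struct drm_gem_object *bo, bool writes_bo)
{
	int ret;

	/* Explicit dependency: job won't run before wait_fence signals. */
	ret = drm_sched_job_add_dependency(job, wait_fence);
	if (ret)
		return ret;

	/* Implicit sync: collect the relevant fences from the BO. */
	return drm_sched_job_add_implicit_dependencies(job, bo, writes_bo);
}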
It might be good enough on x86 with just READ_ONCE, but the write side
should then at least be WRITE_ONCE, because x86 has total store order.
It's definitely not enough on arm.
Fix this properly, which means:
- explain the need for the barrier in both places
- point at the other side in each comment
Also pull out the !sched_list case as the first check, so that the
code flow is clearer.
While at it, sprinkle some comments around, because it was very
non-obvious to me what's actually going on here and why.
Note that we really need full barriers here: at first I thought
store-release and load-acquire on ->last_scheduled would be enough,
but we actually require ordering between that and the queue state.
v2: Put smp_rmb() in the right place and fix up the comment (Andrey)
Reviewed-by: Christian König <[email protected]>
Acked-by: Melissa Wen <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
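A minimal sketch of the barrier pairing the comments describe, with
illustrative field names; each barrier's comment points at its
counterpart on the other side, as the patch does.

#include <linux/compiler.h>
#include <asm/barrier.h>

struct entity_sketch {
	void *last_scheduled;
	bool queue_empty;
};

static void publish_sketch(struct entity_sketch *e, void *fence)
{
	WRITE_ONCE(e->last_scheduled, fence);
	/* Pairs with smp_rmb() in observe_sketch(). */
	smp_wmb();
	WRITE_ONCE(e->queue_empty, true);
}

static void *observe_sketch(struct entity_sketch *e)
{
	if (!READ_ONCE(e->queue_empty))
		return NULL;
	/* Pairs with smp_wmb() in publish_sketch(). */
	smp_rmb();
	return READ_ONCE(e->last_scheduled);
}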
This is a very confusingly named function, because not only does it
init an object, it arms it and provides a point of no return for
pushing a job into the scheduler. It would be nice if that's a bit
clearer in the interface.
But the real reason is that I want to push the dependency tracking
helpers into the scheduler code, and that means drm_sched_job_init
must be called a lot earlier, without arming the job.
v2:
- don't change .gitignore (Steven)
- don't forget v3d (Emma)
v3: Emma noticed that I leak the memory allocated in
drm_sched_job_init if we bail out before the point of no return in
subsequent driver patches. To be able to fix this, change
drm_sched_job_cleanup() so it can handle being called both before and
after drm_sched_job_arm().
Also improve the kerneldoc for this.
v4:
- Fix the drm_sched_job_cleanup logic, I inverted the booleans, as
usual (Melissa)
- Christian pointed out that drm_sched_entity_select_rq() also needs
to be moved into drm_sched_job_arm, which made me realize that the
job->id definitely needs to be moved too.
Shuffle things to fit between job_init and job_arm.
v5:
Reshuffle the split between init/arm once more; amdgpu abuses
drm_sched.ready to signal gpu reset failures. Also document this
somewhat. (Christian)
v6:
Rebase on top of the msm drm/sched support. Note that the
drm_sched_job_init() call is completely misplaced, and hence also the
split-out drm_sched_entity_push_job(). I've put in a FIXME which the next
patch will address.
v7: Drop the FIXME in msm; after discussions with Rob I agree it shouldn't
be a problem where it is now.
Acked-by: Christian König <[email protected]>
Acked-by: Melissa Wen <[email protected]>
Cc: Melissa Wen <[email protected]>
Acked-by: Emma Anholt <[email protected]>
Acked-by: Steven Price <[email protected]> (v2)
Reviewed-by: Boris Brezillon <[email protected]> (v5)
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Qiang Yu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Adam Borowski <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Paul Menzel <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: Deepak R Varma <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Kevin Wang <[email protected]>
Cc: Chen Li <[email protected]>
Cc: Luben Tuikov <[email protected]>
Cc: "Marek Olšák" <[email protected]>
Cc: Dennis Li <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Sonny Jiang <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Tian Tao <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Emma Anholt <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
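A minimal sketch of the resulting flow, assuming the signatures at the
time of this series (drm_sched_job_init() has since grown extra
arguments); driver_setup_sketch() is a hypothetical stand-in for driver
work that may still fail before the point of no return.

#include <drm/gpu_scheduler.h>

static int driver_setup_sketch(struct drm_sched_job *job)
{
	return 0;	/* hypothetical driver work that may fail */
}

static int submit_sketch(struct drm_sched_job *job,
			 struct drm_sched_entity *entity)
{
	int ret;

	ret = drm_sched_job_init(job, entity, NULL);
	if (ret)
		return ret;

	ret = driver_setup_sketch(job);
	if (ret) {
		/* Before drm_sched_job_arm(): cleanup handles this (v3). */
		drm_sched_job_cleanup(job);
		return ret;
	}

	/* rq selection and job->id assignment: the point of no return. */
	drm_sched_job_arm(job);
	/* A later patch in this log drops the entity argument here. */
	drm_sched_entity_push_job(job, entity);
	return 0;
}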
Wait for all dependencies of a job to complete before
killing it to avoid data corruption.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Problem: If the scheduler is already stopped by the time a sched_entity
is released and the entity's job_queue is not empty, I encountered
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes true.
Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wake up all those processes stuck in sched_entity flushing,
as the scheduler main thread which would wake them up is stopped by now.
v2:
Reverse the order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinsertion back into the rq due
to a race.
v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
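A minimal sketch of the v3 idea, with locking and the full run-queue
walk elided; field names follow struct drm_sched_entity and struct
drm_sched_rq.

#include <drm/gpu_scheduler.h>

/* Called from drm_sched_fini(): the scheduler thread is stopping, so
 * flushers waiting for entities to drain must be released by hand.
 */
static void stop_entities_sketch(struct drm_gpu_scheduler *sched,
				 struct drm_sched_rq *rq)
{
	struct drm_sched_entity *entity;

	list_for_each_entry(entity, &rq->entities, list)
		entity->stopped = true;	/* drm_sched_entity_is_idle() is now true */

	wake_up_all(&sched->job_scheduled);
}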
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
This is necessary when changing priorities of an entity.
v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Sonny Jiang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Allow multiple schedulers to share the load balancing score.
This is useful when one engine has different hw rings.
Signed-off-by: Christian König <[email protected]>
Reviewed-and-Tested-by: Leo Liu <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep'
drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep'
drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup'
drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup'
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
If we don't initialize the entity to idle and the entity is never
scheduled before being destroyed, we end up with an infinite wait in the
destroy path.
v2:
- Add Steven's R-b
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/393486/
This patch uses the score to select a new drm scheduler for better
load balance between multiple drm schedulers, instead of num_jobs.
Below are test results after running amdgpu_test ~10 times.

Before this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0        1463
sdma1         198
comp_1.0.1    280

After this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0         925
sdma1         928
comp_1.0.1    177
comp_1.1.1     44
comp_1.2.1     43
comp_1.3.1     44
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/373000/
Signed-off-by: Christian König <[email protected]>
Remove drm_sched_entity_get_free_sched() and use the logic of picking
the least loaded drm scheduler from a drm scheduler list to implement
drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so
that it can be utilized by other drm drivers.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
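A minimal sketch of the least-loaded pick behind drm_sched_pick_best(),
assuming the per-scheduler atomic score member as it existed at the
time (a later patch in this log turns it into a shared pointer):

#include <drm/gpu_scheduler.h>
#include <linux/limits.h>

static struct drm_gpu_scheduler *
pick_best_sketch(struct drm_gpu_scheduler **sched_list,
		 unsigned int num_sched_list)
{
	struct drm_gpu_scheduler *picked = NULL;
	int min_score = INT_MAX;
	unsigned int i;

	for (i = 0; i < num_sched_list; i++) {
		int score = atomic_read(&sched_list[i]->score);

		if (score < min_score) {	/* fewer queued jobs wins */
			min_score = score;
			picked = sched_list[i];
		}
	}

	return picked;
}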
Revert this patch to avoid an amdgpu_test compute hang problem
on picasso.
This reverts commit 56822db194232c089601728d68ed078dccb97f8b.
Signed-off-by: changzhu <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Implement drm_sched_entity_modify_sched(), which replaces an existing
sched_list with a different one. This is going to be helpful when
userspace changes the priority of a ctx/entity; the driver can then
switch to the corresponding HW scheduler list for that priority.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
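A minimal sketch of the intended use, assuming driver-maintained
per-priority scheduler arrays (names here are illustrative):

#include <drm/gpu_scheduler.h>

static void ctx_set_priority_sketch(struct drm_sched_entity *entity,
				    struct drm_gpu_scheduler **prio_scheds,
				    unsigned int num_prio_scheds)
{
	/* Future jobs from this entity are balanced across the HW
	 * scheduler list that matches the new priority.
	 */
	drm_sched_entity_modify_sched(entity, prio_scheds, num_prio_scheds);
}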
Expand the sched_list definition for better understanding.
Also fix a typo: atleast -> at least.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
This also replaces old artifacts with correct ones in the
drm_sched_entity_init() declaration.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
This patch uses score-based logic to select a new rq for better
load balance between multiple rqs/scheds, instead of num_jobs.
Below are test results after running amdgpu_test from mesa drm.

Before this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0        314
sdma1         32
comp_1.0.0    56
comp_1.0.1     0
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0

After this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0        216
sdma1        185
comp_1.0.0    39
comp_1.0.1     9
comp_1.1.0    12
comp_1.1.1     0
comp_1.2.0    12
comp_1.2.1     0
comp_1.3.0    12
comp_1.3.1     0
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
An entity should not keep and maintain its own copy of the sched list.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
The entity currently keeps a copy of the run_queue list and modifies it in
drm_sched_entity_set_priority(). Entities shouldn't modify the run_queue
list. Use a drm_gpu_scheduler list instead of a drm_sched_rq list
in the drm_sched_entity struct. In this way we can select a run queue based
on the entity/ctx's priority for a drm scheduler.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Removes the thread park/unpark hack from drm_sched_entity_fini and
thereby fixes reactivation of the scheduler thread while the thread
is supposed to be stopped.
v2: Per sched entity completion.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Suggested-by: Christian König <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
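A minimal sketch of the per-entity completion replacing the
park/unpark hack, assuming the entity_idle completion field: the
scheduler thread signals it when it is done picking jobs from the
entity, and fini waits on it instead of parking the thread.

#include <linux/completion.h>
#include <drm/gpu_scheduler.h>

/* Scheduler thread side, once no more jobs will be taken from entity: */
static void sched_done_with_entity_sketch(struct drm_sched_entity *entity)
{
	complete(&entity->entity_idle);
}

/* Entity teardown side, instead of kthread_park()/kthread_unpark(): */
static void entity_fini_sketch(struct drm_sched_entity *entity)
{
	wait_for_completion(&entity->entity_idle);
	/* now safe to tear down the entity */
}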
The spsc_queue_peek function is accessing queue->head, which belongs to
the consumer thread and shouldn't be accessed by the producer.
This fixes a rare race condition when destroying entities.
Signed-off-by: Christian König <[email protected]>
Acked-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
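An illustration in plain C11 (not the kernel's spsc_queue, whose layout
differs): in a single-producer/single-consumer queue the head is
consumer-owned, so an emptiness check on the producer side must go
through the shared counter rather than peeking at head.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node_sketch { struct node_sketch *next; };

struct spsc_sketch {
	struct node_sketch *head;	/* consumer-owned: pop() rewrites it */
	atomic_int count;		/* maintained atomically by both sides */
};

/* Racy: the producer dereferences state the consumer mutates. */
static bool empty_racy(struct spsc_sketch *q)
{
	return q->head == NULL;
}

/* The fix: producer-side checks rely on the atomic counter. */
static bool empty_safe(struct spsc_sketch *q)
{
	return atomic_load(&q->count) == 0;
}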
Drop use of the deprecated drmP.h header file.
Fix fallout.
Signed-off-by: Sam Ravnborg <[email protected]>
Reviewed-by: Christian König <[email protected]>
Acked-by: Emil Velikov <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrey Grodzovsky <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Eric Anholt <[email protected]>
Cc: Bas Nieuwenhuizen <[email protected]>
Cc: Sharat Masetty <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nayan Deshmukh <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
into drm-next
Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes
radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes
sched:
- Fix entities with 0 run queues
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Some blocks in amdgpu can have 0 rqs.
Job creation already fails with -ENOENT when entity->rq is NULL,
so jobs cannot be pushed. Without a rq there is no scheduler to
pop jobs, and rq selection already does the right thing with a
list of length 0.
So the operations we need to fix are:
- Creation, do not set rq to rq_list[0] if the list can have length 0.
- Do not flush any jobs when there is no rq.
- On entity destruction handle the rq = NULL case.
- on set_priority, do not try to change the rq if it is NULL.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
The entity->dependency can go away completely once we've called
drm_sched_entity_add_dependency_cb() (if the cb is called before we
get around to tracing). The tracepoint is more useful if we trace
every dependency instead of just the ones that get callbacks installed,
anyway, so just do that.
Fixes an easy-to-produce OOPS when tracing the scheduler on V3D with
"perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled.
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Christian König <[email protected]>
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
This patch adds a new API to clean up the scheduler job resources. This
is primarily needed in cases where the job was created but was not queued
to the scheduler queue. Additionally, with this change, the layer which
creates the scheduler job also gets to free up the job's resources, and
this entails moving the dma_fence_put(finished_fence) to the drivers'
free handler routines.
Signed-off-by: Sharat Masetty <[email protected]>
Reviewed-by: Christian König <[email protected]>
Acked-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
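A minimal sketch of a driver's free handler after this change, assuming
a driver job struct embedding drm_sched_job; drm_sched_job_cleanup()
releases the scheduler fence reference the core used to put itself.

#include <drm/gpu_scheduler.h>
#include <linux/slab.h>

struct my_job_sketch {
	struct drm_sched_job base;
	/* driver-private payload ... */
};

static void my_free_job_sketch(struct drm_sched_job *sched_job)
{
	struct my_job_sketch *job =
		container_of(sched_job, struct my_job_sketch, base);

	drm_sched_job_cleanup(sched_job);	/* also fine for never-queued jobs */
	kfree(job);
}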
Problem:
A particular scheduler may become unusable (underlying HW) after
some event (e.g. GPU reset). If it's later chosen by
the get-free-sched policy, a command will fail to be
submitted.
Fix:
Add a driver-specific callback to report the sched status so an
rq with a bad sched can be avoided in favor of a working one, or
none, in which case job init will fail.
v2: Switch from driver callback to a flag in the scheduler.
v3: rebase
v4: Remove the ready parameter from drm_sched_init, set it
unconditionally to true once init is done.
v5: fix missed change in v3d in v4 (Alex)
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
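A minimal sketch of how the flag is consumed during scheduler
selection; the walk is illustrative, the real check lives in the
entity's rq selection path.

#include <drm/gpu_scheduler.h>

static struct drm_gpu_scheduler *
first_ready_sketch(struct drm_gpu_scheduler **sched_list, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (sched_list[i]->ready)	/* cleared e.g. after a failed GPU reset */
			return sched_list[i];

	return NULL;	/* caller makes job init fail */
}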
Clang generates a warning when it sees a logical not followed by a
conditional operator like ==, >, or <.
drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is
only applied to the left hand side of this comparison
[-Wlogical-not-parentheses]
if (!spsc_queue_count(&entity->job_queue) == 0 ||
^ ~~
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
after the '!' to evaluate the comparison first
if (!spsc_queue_count(&entity->job_queue) == 0 ||
^
( )
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
around left hand side expression to silence this warning
if (!spsc_queue_count(&entity->job_queue) == 0 ||
^
( )
1 warning generated.
It assumes the author might have made a mistake in their logic:
if (!a == b) -> if (!(a == b))
Sometimes that is the case; other times, it's just a super convoluted
way of saying 'if (a)' when b = 0:
if (!1 == 0) -> if (0 == 0) -> if (true)
Alternatively:
if (!1 == 0) -> if (!!1) -> if (1)
Simplify this comparison so that Clang doesn't complain.
Fixes: 35e160e781a0 ("drm/scheduler: change entities rq even earlier")
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Do not remove the entity from the rq if the current rq is from
the least loaded scheduler.
Signed-off-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
The flag will prevent another thread from the same process from
reinserting the entity queue into the scheduler's rq after it was already
removed from there by another thread during drm_sched_entity_flush.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Clean up the coding style in sched_entity.c.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
This is complex enough on its own. Move it into a separate C file.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>