diff options
| author | Linus Torvalds <[email protected]> | 2025-07-31 02:26:49 +0000 |
|---|---|---|
| committer | Linus Torvalds <[email protected]> | 2025-07-31 02:26:49 +0000 |
| commit | 260f6f4fda93c8485c8037865c941b42b9cba5d2 (patch) | |
| tree | 587a0ea46d3351f63250d19860b01da8217ac774 /drivers/gpu/drm/scheduler/sched_main.c | |
| parent | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (diff) | |
| parent | Merge tag 'drm-misc-next-fixes-2025-07-24' of https://gitlab.freedesktop.org/... (diff) | |
| download | kernel-260f6f4fda93c8485c8037865c941b42b9cba5d2.tar.gz kernel-260f6f4fda93c8485c8037865c941b42b9cba5d2.zip | |
Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
ttm:
- improve eviction
sched:
- tracing improvements
- kunit improvements
- memory leak fixes
- reset handling improvements
color mgmt:
- add hardware gamma LUT handling helpers
bridge:
- add destroy hook
- switch to reference counted drm_bridge allocations
- tc358767: convert to devm_drm_bridge_alloc
- improve CEC handling
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
ivpu:
- Add Wildcat Lake support
- Add turbo flag
ast:
- improve hardware generations implementation
imx:
- IMX8qxq Display Controller support
lima:
- Rockchip RK3528 GPU support
nouveau:
- fence handling cleanup
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
Diffstat (limited to 'drivers/gpu/drm/scheduler/sched_main.c')
| -rw-r--r-- | drivers/gpu/drm/scheduler/sched_main.c | 203 |
1 files changed, 130 insertions, 73 deletions
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 829579c41c6b..e2cda28a1af4 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -66,6 +66,7 @@ * This implies waiting for previously executed jobs. */ +#include <linux/export.h> #include <linux/wait.h> #include <linux/sched.h> #include <linux/completion.h> @@ -83,12 +84,6 @@ #define CREATE_TRACE_POINTS #include "gpu_scheduler_trace.h" -#ifdef CONFIG_LOCKDEP -static struct lockdep_map drm_sched_lockdep_map = { - .name = "drm_sched_lockdep_map" -}; -#endif - int drm_sched_policy = DRM_SCHED_POLICY_FIFO; /** @@ -268,38 +263,14 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched, entity = rq->current_entity; if (entity) { list_for_each_entry_continue(entity, &rq->entities, list) { - if (drm_sched_entity_is_ready(entity)) { - /* If we can't queue yet, preserve the current - * entity in terms of fairness. - */ - if (!drm_sched_can_queue(sched, entity)) { - spin_unlock(&rq->lock); - return ERR_PTR(-ENOSPC); - } - - rq->current_entity = entity; - reinit_completion(&entity->entity_idle); - spin_unlock(&rq->lock); - return entity; - } + if (drm_sched_entity_is_ready(entity)) + goto found; } } list_for_each_entry(entity, &rq->entities, list) { - if (drm_sched_entity_is_ready(entity)) { - /* If we can't queue yet, preserve the current entity in - * terms of fairness. - */ - if (!drm_sched_can_queue(sched, entity)) { - spin_unlock(&rq->lock); - return ERR_PTR(-ENOSPC); - } - - rq->current_entity = entity; - reinit_completion(&entity->entity_idle); - spin_unlock(&rq->lock); - return entity; - } + if (drm_sched_entity_is_ready(entity)) + goto found; if (entity == rq->current_entity) break; @@ -308,6 +279,22 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched, spin_unlock(&rq->lock); return NULL; + +found: + if (!drm_sched_can_queue(sched, entity)) { + /* + * If scheduler cannot take more jobs signal the caller to not + * consider lower priority queues. + */ + entity = ERR_PTR(-ENOSPC); + } else { + rq->current_entity = entity; + reinit_completion(&entity->entity_idle); + } + + spin_unlock(&rq->lock); + + return entity; } /** @@ -379,11 +366,16 @@ static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched) { struct drm_sched_job *job; - spin_lock(&sched->job_list_lock); job = list_first_entry_or_null(&sched->pending_list, struct drm_sched_job, list); if (job && dma_fence_is_signaled(&job->s_fence->finished)) __drm_sched_run_free_queue(sched); +} + +static void drm_sched_run_free_queue_unlocked(struct drm_gpu_scheduler *sched) +{ + spin_lock(&sched->job_list_lock); + drm_sched_run_free_queue(sched); spin_unlock(&sched->job_list_lock); } @@ -391,7 +383,7 @@ static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched) * drm_sched_job_done - complete a job * @s_job: pointer to the job which is done * - * Finish the job's fence and wake up the worker thread. + * Finish the job's fence and resubmit the work items. */ static void drm_sched_job_done(struct drm_sched_job *s_job, int result) { @@ -401,7 +393,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result) atomic_sub(s_job->credits, &sched->credit_count); atomic_dec(sched->score); - trace_drm_sched_process_job(s_fence); + trace_drm_sched_job_done(s_fence); dma_fence_get(&s_fence->finished); drm_sched_fence_finished(s_fence, result); @@ -536,11 +528,37 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job) spin_unlock(&sched->job_list_lock); } +/** + * drm_sched_job_reinsert_on_false_timeout - reinsert the job on a false timeout + * @sched: scheduler instance + * @job: job to be reinserted on the pending list + * + * In the case of a "false timeout" - when a timeout occurs but the GPU isn't + * hung and is making progress, the scheduler must reinsert the job back into + * @sched->pending_list. Otherwise, the job and its resources won't be freed + * through the &struct drm_sched_backend_ops.free_job callback. + * + * This function must be used in "false timeout" cases only. + */ +static void drm_sched_job_reinsert_on_false_timeout(struct drm_gpu_scheduler *sched, + struct drm_sched_job *job) +{ + spin_lock(&sched->job_list_lock); + list_add(&job->list, &sched->pending_list); + + /* After reinserting the job, the scheduler enqueues the free-job work + * again if ready. Otherwise, a signaled job could be added to the + * pending list, but never freed. + */ + drm_sched_run_free_queue(sched); + spin_unlock(&sched->job_list_lock); +} + static void drm_sched_job_timedout(struct work_struct *work) { struct drm_gpu_scheduler *sched; struct drm_sched_job *job; - enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL; + enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_RESET; sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); @@ -551,9 +569,10 @@ static void drm_sched_job_timedout(struct work_struct *work) if (job) { /* - * Remove the bad job so it cannot be freed by concurrent - * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread - * is parked at which point it's safe. + * Remove the bad job so it cannot be freed by a concurrent + * &struct drm_sched_backend_ops.free_job. It will be + * reinserted after the scheduler's work items have been + * cancelled, at which point it's safe. */ list_del_init(&job->list); spin_unlock(&sched->job_list_lock); @@ -568,6 +587,9 @@ static void drm_sched_job_timedout(struct work_struct *work) job->sched->ops->free_job(job); sched->free_guilty = false; } + + if (status == DRM_GPU_SCHED_STAT_NO_HANG) + drm_sched_job_reinsert_on_false_timeout(sched, job); } else { spin_unlock(&sched->job_list_lock); } @@ -590,6 +612,10 @@ static void drm_sched_job_timedout(struct work_struct *work) * This function is typically used for reset recovery (see the docu of * drm_sched_backend_ops.timedout_job() for details). Do not call it for * scheduler teardown, i.e., before calling drm_sched_fini(). + * + * As it's only used for reset recovery, drivers must not call this function + * in their &struct drm_sched_backend_ops.timedout_job callback when they + * skip a reset using &enum drm_gpu_sched_stat.DRM_GPU_SCHED_STAT_NO_HANG. */ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) { @@ -599,10 +625,10 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) /* * Reinsert back the bad job here - now it's safe as - * drm_sched_get_finished_job cannot race against us and release the + * drm_sched_get_finished_job() cannot race against us and release the * bad job at this point - we parked (waited for) any in progress - * (earlier) cleanups and drm_sched_get_finished_job will not be called - * now until the scheduler thread is unparked. + * (earlier) cleanups and drm_sched_get_finished_job() will not be + * called now until the scheduler's work items are submitted again. */ if (bad && bad->sched == sched) /* @@ -615,7 +641,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) * Iterate the job list from later to earlier one and either deactive * their HW callbacks or remove them from pending list if they already * signaled. - * This iteration is thread safe as sched thread is stopped. + * This iteration is thread safe as the scheduler's work items have been + * cancelled. */ list_for_each_entry_safe_reverse(s_job, tmp, &sched->pending_list, list) { @@ -674,15 +701,19 @@ EXPORT_SYMBOL(drm_sched_stop); * drm_sched_backend_ops.timedout_job() for details). Do not call it for * scheduler startup. The scheduler itself is fully operational after * drm_sched_init() succeeded. + * + * As it's only used for reset recovery, drivers must not call this function + * in their &struct drm_sched_backend_ops.timedout_job callback when they + * skip a reset using &enum drm_gpu_sched_stat.DRM_GPU_SCHED_STAT_NO_HANG. */ void drm_sched_start(struct drm_gpu_scheduler *sched, int errno) { struct drm_sched_job *s_job, *tmp; /* - * Locking the list is not required here as the sched thread is parked - * so no new jobs are being inserted or removed. Also concurrent - * GPU recovers can't run in parallel. + * Locking the list is not required here as the scheduler's work items + * are currently not running, so no new jobs are being inserted or + * removed. Also concurrent GPU recovers can't run in parallel. */ list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) { struct dma_fence *fence = s_job->s_fence->parent; @@ -764,6 +795,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs); * @credits: the number of credits this job contributes to the schedulers * credit limit * @owner: job owner for debugging + * @drm_client_id: &struct drm_file.client_id of the owner (used by trace + * events) * * Refer to drm_sched_entity_push_job() documentation * for locking considerations. @@ -784,7 +817,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs); */ int drm_sched_job_init(struct drm_sched_job *job, struct drm_sched_entity *entity, - u32 credits, void *owner) + u32 credits, void *owner, + uint64_t drm_client_id) { if (!entity->rq) { /* This will most likely be followed by missing frames @@ -810,7 +844,7 @@ int drm_sched_job_init(struct drm_sched_job *job, job->entity = entity; job->credits = credits; - job->s_fence = drm_sched_fence_alloc(entity, owner); + job->s_fence = drm_sched_fence_alloc(entity, owner, drm_client_id); if (!job->s_fence) return -ENOMEM; @@ -850,7 +884,6 @@ void drm_sched_job_arm(struct drm_sched_job *job) job->sched = sched; job->s_priority = entity->priority; - job->id = atomic64_inc_return(&sched->job_id_count); drm_sched_fence_init(job->s_fence, job->entity); } @@ -1193,7 +1226,7 @@ static void drm_sched_free_job_work(struct work_struct *w) if (job) sched->ops->free_job(job); - drm_sched_run_free_queue(sched); + drm_sched_run_free_queue_unlocked(sched); drm_sched_run_job_queue(sched); } @@ -1229,7 +1262,7 @@ static void drm_sched_run_job_work(struct work_struct *w) atomic_add(sched_job->credits, &sched->credit_count); drm_sched_job_begin(sched_job); - trace_drm_run_job(sched_job, entity); + trace_drm_sched_job_run(sched_job, entity); /* * The run_job() callback must by definition return a fence whose * refcount has been incremented for the scheduler already. @@ -1256,6 +1289,25 @@ static void drm_sched_run_job_work(struct work_struct *w) drm_sched_run_job_queue(sched); } +static struct workqueue_struct *drm_sched_alloc_wq(const char *name) +{ +#if (IS_ENABLED(CONFIG_LOCKDEP)) + static struct lockdep_map map = { + .name = "drm_sched_lockdep_map" + }; + + /* + * Avoid leaking a lockdep map on each drm sched creation and + * destruction by using a single lockdep map for all drm sched + * allocated submit_wq. + */ + + return alloc_ordered_workqueue_lockdep_map(name, WQ_MEM_RECLAIM, &map); +#else + return alloc_ordered_workqueue(name, WQ_MEM_RECLAIM); +#endif +} + /** * drm_sched_init - Init a gpu scheduler instance * @@ -1296,13 +1348,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_ sched->submit_wq = args->submit_wq; sched->own_submit_wq = false; } else { -#ifdef CONFIG_LOCKDEP - sched->submit_wq = alloc_ordered_workqueue_lockdep_map(args->name, - WQ_MEM_RECLAIM, - &drm_sched_lockdep_map); -#else - sched->submit_wq = alloc_ordered_workqueue(args->name, WQ_MEM_RECLAIM); -#endif + sched->submit_wq = drm_sched_alloc_wq(args->name); if (!sched->submit_wq) return -ENOMEM; @@ -1348,6 +1394,18 @@ Out_check_own: } EXPORT_SYMBOL(drm_sched_init); +static void drm_sched_cancel_remaining_jobs(struct drm_gpu_scheduler *sched) +{ + struct drm_sched_job *job, *tmp; + + /* All other accessors are stopped. No locking necessary. */ + list_for_each_entry_safe_reverse(job, tmp, &sched->pending_list, list) { + sched->ops->cancel_job(job); + list_del(&job->list); + sched->ops->free_job(job); + } +} + /** * drm_sched_fini - Destroy a gpu scheduler * @@ -1355,19 +1413,11 @@ EXPORT_SYMBOL(drm_sched_init); * * Tears down and cleans up the scheduler. * - * This stops submission of new jobs to the hardware through - * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.free_job() - * will not be called for all jobs still in drm_gpu_scheduler.pending_list. - * There is no solution for this currently. Thus, it is up to the driver to make - * sure that: - * - * a) drm_sched_fini() is only called after for all submitted jobs - * drm_sched_backend_ops.free_job() has been called or that - * b) the jobs for which drm_sched_backend_ops.free_job() has not been called - * after drm_sched_fini() ran are freed manually. - * - * FIXME: Take care of the above problem and prevent this function from leaking - * the jobs in drm_gpu_scheduler.pending_list under any circumstances. + * This stops submission of new jobs to the hardware through &struct + * drm_sched_backend_ops.run_job. If &struct drm_sched_backend_ops.cancel_job + * is implemented, all jobs will be canceled through it and afterwards cleaned + * up through &struct drm_sched_backend_ops.free_job. If cancel_job is not + * implemented, memory could leak. */ void drm_sched_fini(struct drm_gpu_scheduler *sched) { @@ -1397,11 +1447,18 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) /* Confirm no work left behind accessing device structures */ cancel_delayed_work_sync(&sched->work_tdr); + /* Avoid memory leaks if supported by the driver. */ + if (sched->ops->cancel_job) + drm_sched_cancel_remaining_jobs(sched); + if (sched->own_submit_wq) destroy_workqueue(sched->submit_wq); sched->ready = false; kfree(sched->sched_rq); sched->sched_rq = NULL; + + if (!list_empty(&sched->pending_list)) + dev_warn(sched->dev, "Tearing down scheduler while jobs are pending!\n"); } EXPORT_SYMBOL(drm_sched_fini); |
