diff options
| author | Linus Torvalds <[email protected]> | 2025-07-31 02:26:49 +0000 |
|---|---|---|
| committer | Linus Torvalds <[email protected]> | 2025-07-31 02:26:49 +0000 |
| commit | 260f6f4fda93c8485c8037865c941b42b9cba5d2 (patch) | |
| tree | 587a0ea46d3351f63250d19860b01da8217ac774 /drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | |
| parent | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (diff) | |
| parent | Merge tag 'drm-misc-next-fixes-2025-07-24' of https://gitlab.freedesktop.org/... (diff) | |
| download | kernel-260f6f4fda93c8485c8037865c941b42b9cba5d2.tar.gz kernel-260f6f4fda93c8485c8037865c941b42b9cba5d2.zip | |
Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
ttm:
- improve eviction
sched:
- tracing improvements
- kunit improvements
- memory leak fixes
- reset handling improvements
color mgmt:
- add hardware gamma LUT handling helpers
bridge:
- add destroy hook
- switch to reference counted drm_bridge allocations
- tc358767: convert to devm_drm_bridge_alloc
- improve CEC handling
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
ivpu:
- Add Wildcat Lake support
- Add turbo flag
ast:
- improve hardware generations implementation
imx:
- IMX8qxq Display Controller support
lima:
- Rockchip RK3528 GPU support
nouveau:
- fence handling cleanup
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c')
| -rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 151 |
1 files changed, 110 insertions, 41 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5fec808d7f54..9e7506965cab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -41,21 +41,6 @@ #include "amdgpu_trace.h" #include "amdgpu_reset.h" -static struct kmem_cache *amdgpu_fence_slab; - -int amdgpu_fence_slab_init(void) -{ - amdgpu_fence_slab = KMEM_CACHE(amdgpu_fence, SLAB_HWCACHE_ALIGN); - if (!amdgpu_fence_slab) - return -ENOMEM; - return 0; -} - -void amdgpu_fence_slab_fini(void) -{ - rcu_barrier(); - kmem_cache_destroy(amdgpu_fence_slab); -} /* * Cast helper */ @@ -114,14 +99,14 @@ static u32 amdgpu_fence_read(struct amdgpu_ring *ring) * * @ring: ring the fence is associated with * @f: resulting fence object - * @job: job the fence is embedded in + * @af: amdgpu fence input * @flags: flags to pass into the subordinate .emit_fence() call * * Emits a fence command on the requested ring (all asics). * Returns 0 on success, -ENOMEM on failure. */ -int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amdgpu_job *job, - unsigned int flags) +int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, + struct amdgpu_fence *af, unsigned int flags) { struct amdgpu_device *adev = ring->adev; struct dma_fence *fence; @@ -130,40 +115,35 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd uint32_t seq; int r; - if (job == NULL) { - /* create a sperate hw fence */ - am_fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC); - if (am_fence == NULL) + if (!af) { + /* create a separate hw fence */ + am_fence = kzalloc(sizeof(*am_fence), GFP_KERNEL); + if (!am_fence) return -ENOMEM; + am_fence->context = 0; } else { - /* take use of job-embedded fence */ - am_fence = &job->hw_fence; + am_fence = af; } fence = &am_fence->base; am_fence->ring = ring; seq = ++ring->fence_drv.sync_seq; - if (job && job->job_run_counter) { - /* reinit seq for resubmitted jobs */ - fence->seqno = seq; - /* TO be inline with external fence creation and other drivers */ + am_fence->seq = seq; + if (af) { + dma_fence_init(fence, &amdgpu_job_fence_ops, + &ring->fence_drv.lock, + adev->fence_context + ring->idx, seq); + /* Against remove in amdgpu_job_{free, free_cb} */ dma_fence_get(fence); } else { - if (job) { - dma_fence_init(fence, &amdgpu_job_fence_ops, - &ring->fence_drv.lock, - adev->fence_context + ring->idx, seq); - /* Against remove in amdgpu_job_{free, free_cb} */ - dma_fence_get(fence); - } else { - dma_fence_init(fence, &amdgpu_fence_ops, - &ring->fence_drv.lock, - adev->fence_context + ring->idx, seq); - } + dma_fence_init(fence, &amdgpu_fence_ops, + &ring->fence_drv.lock, + adev->fence_context + ring->idx, seq); } amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr, seq, flags | AMDGPU_FENCE_FLAG_INT); + amdgpu_fence_save_wptr(fence); pm_runtime_get_noresume(adev_to_drm(adev)->dev); ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask]; if (unlikely(rcu_dereference_protected(*ptr, 1))) { @@ -276,6 +256,7 @@ bool amdgpu_fence_process(struct amdgpu_ring *ring) do { struct dma_fence *fence, **ptr; + struct amdgpu_fence *am_fence; ++last_seq; last_seq &= drv->num_fences_mask; @@ -288,6 +269,12 @@ bool amdgpu_fence_process(struct amdgpu_ring *ring) if (!fence) continue; + /* Save the wptr in the fence driver so we know what the last processed + * wptr was. This is required for re-emitting the ring state for + * queues that are reset but are not guilty and thus have no guilty fence. + */ + am_fence = container_of(fence, struct amdgpu_fence, base); + drv->signalled_wptr = am_fence->wptr; dma_fence_signal(fence); dma_fence_put(fence); pm_runtime_mark_last_busy(adev_to_drm(adev)->dev); @@ -310,7 +297,9 @@ static void amdgpu_fence_fallback(struct timer_list *t) fence_drv.fallback_timer); if (amdgpu_fence_process(ring)) - DRM_WARN("Fence fallback timer expired on ring %s\n", ring->name); + dev_warn(ring->adev->dev, + "Fence fallback timer expired on ring %s\n", + ring->name); } /** @@ -748,6 +737,86 @@ void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring) amdgpu_fence_process(ring); } + +/** + * Kernel queue reset handling + * + * The driver can reset individual queues for most engines, but those queues + * may contain work from multiple contexts. Resetting the queue will reset + * lose all of that state. In order to minimize the collateral damage, the + * driver will save the ring contents which are not associated with the guilty + * context prior to resetting the queue. After resetting the queue the queue + * contents from the other contexts is re-emitted to the rings so that it can + * be processed by the engine. To handle this, we save the queue's write + * pointer (wptr) in the fences associated with each context. If we get a + * queue timeout, we can then use the wptrs from the fences to determine + * which data needs to be saved out of the queue's ring buffer. + */ + +/** + * amdgpu_fence_driver_guilty_force_completion - force signal of specified sequence + * + * @fence: fence of the ring to signal + * + */ +void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *fence) +{ + dma_fence_set_error(&fence->base, -ETIME); + amdgpu_fence_write(fence->ring, fence->seq); + amdgpu_fence_process(fence->ring); +} + +void amdgpu_fence_save_wptr(struct dma_fence *fence) +{ + struct amdgpu_fence *am_fence = container_of(fence, struct amdgpu_fence, base); + + am_fence->wptr = am_fence->ring->wptr; +} + +static void amdgpu_ring_backup_unprocessed_command(struct amdgpu_ring *ring, + u64 start_wptr, u32 end_wptr) +{ + unsigned int first_idx = start_wptr & ring->buf_mask; + unsigned int last_idx = end_wptr & ring->buf_mask; + unsigned int i; + + /* Backup the contents of the ring buffer. */ + for (i = first_idx; i != last_idx; ++i, i &= ring->buf_mask) + ring->ring_backup[ring->ring_backup_entries_to_copy++] = ring->ring[i]; +} + +void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring, + struct amdgpu_fence *guilty_fence) +{ + struct dma_fence *unprocessed; + struct dma_fence __rcu **ptr; + struct amdgpu_fence *fence; + u64 wptr, i, seqno; + + seqno = amdgpu_fence_read(ring); + wptr = ring->fence_drv.signalled_wptr; + ring->ring_backup_entries_to_copy = 0; + + for (i = seqno + 1; i <= ring->fence_drv.sync_seq; ++i) { + ptr = &ring->fence_drv.fences[i & ring->fence_drv.num_fences_mask]; + rcu_read_lock(); + unprocessed = rcu_dereference(*ptr); + + if (unprocessed && !dma_fence_is_signaled(unprocessed)) { + fence = container_of(unprocessed, struct amdgpu_fence, base); + + /* save everything if the ring is not guilty, otherwise + * just save the content from other contexts. + */ + if (!guilty_fence || (fence->context != guilty_fence->context)) + amdgpu_ring_backup_unprocessed_command(ring, wptr, + fence->wptr); + wptr = fence->wptr; + } + rcu_read_unlock(); + } +} + /* * Common fence implementation */ @@ -814,7 +883,7 @@ static void amdgpu_fence_free(struct rcu_head *rcu) struct dma_fence *f = container_of(rcu, struct dma_fence, rcu); /* free fence_slab if it's separated fence*/ - kmem_cache_free(amdgpu_fence_slab, to_amdgpu_fence(f)); + kfree(to_amdgpu_fence(f)); } /** |
