| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
There is a statement that has an extraneous semicolon; remove it.
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/664675/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a VM is marked as an usuable we disallow new submissions from it,
however submissions that where already scheduled on the ring would still
be re-sent.
Since this can lead to further hangs, avoid emitting the actual IBs.
Fixes: 6a4d287a1ae6 ("drm/msm: Mark VM as unusable on GPU hangs")
Signed-off-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/668314/
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The global fault counter is no longer used since commit 12578c075f89
("drm/msm/gpu: Skip retired submits in recover worker"). However, it's
still needed, as we need to handle cases where a GPU fault occurs after
the faulting process has already ended.
Hence, increment the global fault counter when the submitting process
had already ended. This way, the number of faults returned by
MSM_PARAM_FAULTS will stay consistent.
While here, s/unusuable/unusable.
Fixes: 12578c075f89 ("drm/msm/gpu: Skip retired submits in recover worker")
Signed-off-by: Maíra Canal <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/664853/
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When userspace opts in to VM_BIND, the submit no longer holds references
keeping the VMA alive. This makes it difficult to distinguish between
UMD/KMD/app bugs. So add a debug option for logging the most recent VM
updates and capturing these in GPU devcoredumps.
The submitqueue id is also captured, a value of zero means the operation
did not go via a submitqueue (ie. comes from msm_gem_vm_close() tearing
down the remaining mappings when the device file is closed.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661518/
|
| |
|
|
|
|
|
|
|
|
|
| |
In this case, we need to iterate the VMAs looking for ones with
MSM_VMA_DUMP flag.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661504/
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In this case, userspace could request dumping partial GEM obj mappings.
Also drop use of should_dump() helper, which really only makes sense in
the old submit->bos[] table world.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661496/
|
| |
|
|
|
|
|
|
|
|
|
| |
If userspace has opted-in to VM_BIND, then GPU hangs and VM_BIND errors
will mark the VM as unusable.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661499/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a SET_PARAM for userspace to request to manage to the VM itself,
instead of getting a kernel managed VM.
In order to transition to a userspace managed VM, this param must be set
before any mappings are created.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661494/
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the driver code doesn't need to reach in to msm specific fields,
so just use the drm_gpuvm/drm_gpuva types directly. This should
hopefully improve commonality with other drivers and make the code
easier to understand.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661483/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is standing in the way of drm_gpuvm / VM_BIND support. Not to
mention frequently broken and rarely tested. And I think only needed
for a 10yr old not quite upstream SoC (msm8974).
Maybe we can add support back in later, but I'm doubtful.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661467/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-aligning naming to better match drm_gpuvm terminology will make
things less confusing at the end of the drm_gpuvm conversion.
This is just rename churn, no functional change.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661466/
|
| |
|
|
|
|
|
|
|
|
|
| |
This is a more descriptive name.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Antonino Maniscalco <[email protected]>
Reviewed-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/661459/
|
| |\
| |
| |
| |
| |
| |
| | |
Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.
Signed-off-by: Rob Clark <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| |/
|
|
|
|
|
|
|
|
|
|
| |
Now that we use a threaded IRQ, it should be safe to do this in the
fault handler.
We can also remove fault_info from struct msm_gpu and just pass it
directly.
Signed-off-by: Connor Abbott <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654889/
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think is the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/628117/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kthread_create() creates a kthread without running it yet. kthread_run()
creates a kthread and runs it.
On the other hand, kthread_create_worker() creates a kthread worker and
runs it.
This difference in behaviours is confusing. Also there is no way to
create a kthread worker and affine it using kthread_bind_mask() or
kthread_affine_preferred() before starting it.
Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]()
that behaves just like kthread_run(). kthread_create_worker[_on_cpu]()
will now only create a kthread worker without starting it.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With preemption it is not enough to track the current_ctx_seqno globally
as execution might switch between rings.
This is especially problematic when current_ctx_seqno is used to
determine whether a page table switch is necessary as it might lead to
security bugs.
Track current context per ring.
Tested-by: Rob Clark <[email protected]>
Tested-by: Neil Armstrong <[email protected]> # on SM8650-QRD
Tested-by: Neil Armstrong <[email protected]> # on SM8550-QRD
Tested-by: Neil Armstrong <[email protected]> # on SM8450-HDK
Signed-off-by: Antonino Maniscalco <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/618012/
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some cases, such as the one uncovered by Commit 46d4efcccc68
("drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails")
where
msm_gpu_cleanup() : platform_set_drvdata(gpu->pdev, NULL);
is called on gpu->pdev == NULL, as the GPU device has not been fully
initialized yet.
Turns out that there's more than just the aforementioned path that
causes this to happen (e.g. the case when there's speedbin data in the
catalog, but opp-supported-hw is missing in DT).
Assigning msm_gpu->pdev earlier seems like the least painful solution
to this, therefore do so.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/602742/
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
When debugging faults, it is useful to know how the BO is mapped (cached
vs WC, gpu readonly, etc).
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
Acked-by: Konrad Dybcio <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/593854/
|
| |
|
|
|
|
|
|
|
|
| |
Two core driver files include headers from Adreno subdir, which also
brings dependency on the Adreno register headers. Rework those includes
to remove unnecessary dependency.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/585850/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.
Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off. Leaving the GPU in
an unrecovered state.
I'll need to come up with a different scheme for appeasing lockdep.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/573835/
|
| |
|
|
|
|
|
|
|
| |
If we somehow raced with submit retiring, either while waiting for
worker to have a chance to run or acquiring the gpu lock, then the
recover worker should just bail.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/568034/
|
| |
|
|
|
|
|
|
|
|
|
| |
The dpu devcore's are already associated with the dpu device. So we
should associate the gpu devcore's with the gpu device, for easier
classification.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/567738/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid holding gpu lock when calling runpm, to avoid this lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.4.3-debug+ #14 Not tainted
------------------------------------------------------
ring0/373 is trying to acquire lock:
ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98
but task is already holding lock:
ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&gpu->lock){+.+.}-{3:3}:
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
msm_job_run+0x7c/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20
-> #3 (dma_fence_map){++++}-{0:0}:
__dma_fence_might_wait+0x74/0xc0
dma_resv_lockdep+0x1f0/0x2e8
do_one_initcall+0xb4/0x214
kernel_init_freeable+0x338/0x33c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20
-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x7c/0x9c
slab_pre_alloc_hook.constprop.0+0x40/0x250
__kmem_cache_alloc_node+0x60/0x18c
kmalloc_node_trace+0x40/0x84
alloc_worker+0x2c/0x64
init_rescuer+0x34/0xe0
workqueue_init+0x168/0x1fc
kernel_init_freeable+0x15c/0x33c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20
-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire+0x3c/0x48
fs_reclaim_acquire+0x50/0x9c
slab_pre_alloc_hook.constprop.0+0x40/0x250
__kmem_cache_alloc_node+0x60/0x18c
kmalloc_trace+0x44/0x88
clk_rcg2_dfs_determine_rate+0x60/0x214
clk_core_determine_round_nolock+0xb8/0xf0
clk_core_round_rate_nolock+0x84/0x118
clk_core_round_rate_nolock+0xd8/0x118
clk_round_rate+0x6c/0xd0
geni_se_clk_tbl_get+0x78/0xc0
geni_se_clk_freq_match+0x44/0xe4
get_spi_clk_cfg+0x50/0xf4
geni_spi_set_clock_and_bw+0x54/0x104
spi_geni_prepare_message+0x130/0x174
__spi_pump_transfer_message+0x200/0x4d8
__spi_sync+0x13c/0x23c
spi_sync_locked+0x18/0x24
do_cros_ec_pkt_xfer_spi+0x124/0x3f0
cros_ec_xfer_high_pri_work+0x28/0x3c
kthread_worker_fn+0x14c/0x27c
kthread+0xf0/0x100
ret_from_fork+0x10/0x20
-> #0 (prepare_lock){+.+.}-{3:3}:
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
clk_prepare_lock+0x70/0x98
clk_prepare+0x24/0x50
clk_bulk_prepare+0x50/0x9c
a6xx_gmu_resume+0x94/0x800 [msm]
a6xx_gmu_pm_resume+0x38/0x158 [msm]
adreno_runtime_resume+0x2c/0x38 [msm]
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x4c/0x134
rpm_callback+0x78/0x7c
rpm_resume+0x3a4/0x46c
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
msm_gpu_submit+0x4c/0x12c [msm]
msm_job_run+0x88/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of:
prepare_lock --> dma_fence_map --> &gpu->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&gpu->lock);
lock(dma_fence_map);
lock(&gpu->lock);
lock(prepare_lock);
*** DEADLOCK ***
2 locks held by ring0/373:
#0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched]
#1: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]
stack backtrace:
CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14
Hardware name: Google Villager (rev1+) with LTE (DT)
Call trace:
dump_backtrace+0xb4/0xf0
show_stack+0x20/0x30
dump_stack_lvl+0x60/0x84
dump_stack+0x18/0x24
print_circular_bug+0x1cc/0x234
check_noncircular+0x78/0xac
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
clk_prepare_lock+0x70/0x98
clk_prepare+0x24/0x50
clk_bulk_prepare+0x50/0x9c
a6xx_gmu_resume+0x94/0x800 [msm]
a6xx_gmu_pm_resume+0x38/0x158 [msm]
adreno_runtime_resume+0x2c/0x38 [msm]
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x4c/0x134
rpm_callback+0x78/0x7c
rpm_resume+0x3a4/0x46c
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
msm_gpu_submit+0x4c/0x12c [msm]
msm_job_run+0x88/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/552298/
|
| |
|
|
|
|
|
|
| |
Basically everywhere wants the base ptr type. So store that instead of
msm_gem_object.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/551021/
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
There is no need to call the DRM_DEV_ERROR() function directly to print
a custom message when handling an error from platform_get_irq() function
as it is going to display an appropriate error message
in case of a failure.
Signed-off-by: Ruan Jinjie <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/549499/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| | |
msm-next-lumag-base
Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that we have a common helper, use it.
v2: Rebase on drm-misc-next
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Acked-by: Dave Airlie <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
| |/
|
|
|
|
|
|
|
| |
This is something that can block for arbitrary amounts of time as
userspace consumes from the FIFO. So we don't really want this to
be in the fence signaling path.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/532617/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some older GPUs (namely a2xx with no opp tables at all and a320 with
downstream-remnants gpu pwrlevels) used not to have OPP tables. They
both however had just one frequency defined, making it extremely easy
to construct such an OPP table from within the driver if need be.
Do so and switch all clk_set_rate calls on core_clk to their OPP
counterparts.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/523784/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the unused 'reset' interface which was supposed to help to ensure
that cx gdsc has collapsed during gpu recovery. This is was not enabled
so far due to missing gpucc driver support. Similar functionality using
genpd framework will be implemented in the upcoming patch.
This effectively reverts commit 1f6cca404918
("drm/msm/a6xx: Ensure CX collapse during gpu recovery").
Signed-off-by: Akhil P Oommen <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/516470/
Link: https://lore.kernel.org/r/20230102161757.v5.4.I96e0bf9eaf96dd866111c1eec8a4c9b70fd7cbcb@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
https://gitlab.freedesktop.org/drm/msm into drm-fixes
msm-fixes for v6.3-rc5
Two GPU fixes which were meant to be part of the previous pull request,
but I'd forgotten to fetch from gitlab after the MR was merged so that
git tag was applied to the wrong commit.
- kexec shutdown fix
- fix potential double free
Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGskguoVsz2wqAK2k+f32LwcVY5JC6+e2RwLqZswz3RY2Q@mail.gmail.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If userspace was calling the MSM_SET_PARAM ioctl on multiple threads to
set the COMM or CMDLINE param, it could trigger a race causing the
previous value to be kfree'd multiple times. Fix this by serializing on
the gpu lock.
Signed-off-by: Rob Clark <[email protected]>
Fixes: d4726d770068 ("drm/msm: Add a way to override processes comm/cmdline")
Patchwork: https://patchwork.freedesktop.org/patch/517778/
Link: https://lore.kernel.org/r/[email protected]
|
| |\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
https://gitlab.freedesktop.org/drm/msm into drm-next
msm-next for v6.2 (the gpu/gem bits)
- Remove exclusive-fence hack that caused over-synchronization
- Fix speed-bin detection vs. probe-defer
- Enable clamp_to_idle on 7c3
- Improved hangcheck detection
Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvT1h_S4d=YRgphgR8i7aMaxQaNW8mru7QaoUo9uiUk2A@mail.gmail.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the hangcheck timer expires, check if the fw's position in the
cmdstream has advanced (changed) since last timer expiration, and
allow it up to three additional "extensions" to it's alotted time.
The intention is to continue to catch "shader stuck in a loop" type
hangs quickly, but allow more time for things that are actually
making forward progress.
Because we need to sample the CP state twice to detect if there has
not been progress, this also cuts the the timer's duration in half.
v2: Fix typo (REG_A6XX_CP_CSQ_IB2_STAT), add comment
v3: Only halve hangcheck timer duration for generations which
support progress detection (hdanton); removed unused a5xx
progress (without knowing how to adjust for data buffered
in ROQ it is too likely to report a false negative)
v4: Comment updates to better describe the total hangcheck
duration when progress detection is applied
Reviewed-by: Chia-I Wu <[email protected]>
Tested-by: Chia-I Wu <[email protected]> # dEQP-GLES2.functional.flush_finish.wait
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/511584/
Link: https://lore.kernel.org/r/[email protected]
|
| |/
|
|
|
|
|
|
|
|
|
| |
In adreno_unbind, we should clean up gpu device's drvdata to avoid
accessing a stale pointer during system suspend. Also, check for NULL
ptr in both system suspend/resume callbacks.
Signed-off-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/505075/
Link: https://lore.kernel.org/r/20220928124830.2.I5ee0ac073ccdeb81961e5ec0cce5f741a7207a71@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because there could be transient votes from other drivers/tz/hyp which
may keep the cx gdsc enabled, we should poll until cx gdsc collapses.
We can use the reset framework to poll for cx gdsc collapse from gpucc
clk driver.
This feature requires support from the platform's gpucc driver.
Signed-off-by: Akhil P Oommen <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/498397/
Link: https://lore.kernel.org/r/20220819015030.v5.5.I176567525af2b9439a7e485d0ca130528666a55c@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some hardware logic under CX domain. For a successful
recovery, we should ensure cx headswitch collapses to ensure all the
stale states are cleard out. This is especially true to for a6xx family
where we can GMU co-processor.
Currently, cx doesn't collapse due to a devlink between gpu and its
smmu. So the *struct gpu device* needs to be runtime suspended to ensure
that the iommu driver removes its vote on cx gdsc.
Signed-off-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/498398/
Link: https://lore.kernel.org/r/20220819015030.v5.4.I4ac27a0b34ea796ce0f938bb509e257516bc6f57@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In the scenario where there is one a single submit which is hung, gpu is
power collapsed when it is retired. Because of this, by the time we call
reover(), gpu state would be already clear. Fix this by correctly
managing the pm runtime votes.
Signed-off-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/498391/
Link: https://lore.kernel.org/r/20220819015030.v5.3.Ib07ecec3d5c17cb0e1efa6fcddaaa019ec2fb556@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
Instead of separate refcount for each submit, take single rpm refcount
on behalf of all the submits. This makes it easier to drop the rpm
refcount during recovery in an upcoming patch.
Signed-off-by: Akhil P Oommen <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/498392/
Link: https://lore.kernel.org/r/20220819015030.v5.2.Ifee853f6d8217a0fdacc459092bbc9e81a8a7ac7@changeid
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This converts over to use the shared GEM LRU/shrinker helpers. Note
that it means we are no longer tracking purgeable or willneed buffers
that are active separately. But the most recently pinned buffers should
be at the tail of the various LRUs, and the shrinker is already prepared
to encounter objects which are still active.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/496131/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
| |
Handle the demotion to MSM_BO_WC at the userspace ABI level, and fix
the remaining internal MSM_BO_UNCACHED user.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/489339/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
| |
When trying to understand an iova fault devcore, once you figure out
which buffer we accessed beyond the end of, it is useful to see the
buffer's debug label.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/491910/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
| |
It is useful to know what buffers userspace thinks are associated with
the submit, even if we don't care to capture their content. This brings
things more inline with $debugfs/rd cmdstream dumping.
Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/491908/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for msm.
Example output:
# cat /proc/`pgrep glmark2`/fdinfo/6
pos: 0
flags: 02400002
mnt_id: 21
ino: 162
drm-driver: msm
drm-client-id: 7
drm-engine-gpu: 1734371319 ns
drm-cycles-gpu: 1153645024
drm-maxfreq-gpu: 800000000 Hz
See also: https://patchwork.freedesktop.org/patch/468505/
v2: Add dev-maxfreq-$engine and update drm-usage-stats.rst
v3: spelling and compiler warning
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Acked-by: Tvrtko Ursulin <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/488906/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to the last commit, this could result in setting the GPU
written fence value back to an older value, if we had missed
updating completed_fence prior to suspend. This was mostly
harmless as the GPU would eventually overwrite it again with
the correct value. But we should just not do this. Instead
just leave a sanity check that the fence looks plausible (in
case the GPU scribbled on memory).
Reported-by: Steev Klimaszewski <[email protected]>
Fixes: 95d1deb02a9c ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Steev Klimaszewski <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/490138/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed while looking at some traces, that we could miss calls to
msm_update_fence(), as the irq could have raced with retire_submits()
which could have already popped the last submit on a ring out of the
queue of in-flight submits. But walking the list of submits in the
irq handler isn't really needed, as dma_fence_is_signaled() will dtrt.
So lets just drop it entirely.
v2: use spin_lock_irqsave/restore as we are no longer protected by the
spin_lock_irqsave/restore() in update_fences()
Reported-by: Steev Klimaszewski <[email protected]>
Fixes: 95d1deb02a9c ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Steev Klimaszewski <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/490136/
Link: https://lore.kernel.org/r/[email protected]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In msm_devfreq_suspend() we cancel idle_work synchronously so that it
doesn't run after we power of the hw or in the resume path. But this
means that we want to ensure that idle_work is not scheduled *after* we
no longer hold a runpm ref. So switch the ordering of pm_runtime_put()
vs msm_devfreq_idle().
v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Akhil P Oommen <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
Check if 'aspace' is set before using it as it will stay null without
IOMMU, such as on msm8974.
Fixes: bc2112583a0b ("drm/msm/gpu: Track global faults per address-space")
Signed-off-by: Luca Weiss <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
|