From d8100caf40a35904d27ce446fb2088b54277997a Mon Sep 17 00:00:00 2001 From: Erico Nunes Date: Fri, 5 Apr 2024 17:29:50 +0200 Subject: drm/lima: include pp bcast irq in timeout handler check In commit 53cb55b20208 ("drm/lima: handle spurious timeouts due to high irq latency") a check was added to detect an unexpectedly high interrupt latency timeout. With further investigation it was noted that on Mali-450 the pp bcast irq may also be a trigger of race conditions against the timeout handler, so add it to this check too. Signed-off-by: Erico Nunes Signed-off-by: Qiang Yu Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-3-nunes.erico@gmail.com --- drivers/gpu/drm/lima/lima_sched.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/gpu/drm/lima/lima_sched.c') diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index 00b19adfc888..66841503a618 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -422,6 +422,8 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job */ for (i = 0; i < pipe->num_processor; i++) synchronize_irq(pipe->processor[i]->irq); + if (pipe->bcast_processor) + synchronize_irq(pipe->bcast_processor->irq); if (dma_fence_is_signaled(task->fence)) { DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip)); -- cgit From a421cc7a6a001b70415aa4f66024fa6178885a14 Mon Sep 17 00:00:00 2001 From: Erico Nunes Date: Fri, 5 Apr 2024 17:29:51 +0200 Subject: drm/lima: mask irqs in timeout path before hard reset There is a race condition in which a rendering job might take just long enough to trigger the drm sched job timeout handler but also still complete before the hard reset is done by the timeout handler. This runs into race conditions not expected by the timeout handler. In some very specific cases it currently may result in a refcount imbalance on lima_pm_idle, with a stack dump such as: [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669628] Call trace: [10136.669634] lima_devfreq_record_idle+0xa0/0xb0 [10136.669646] lima_sched_pipe_task_done+0x5c/0xb0 [10136.669656] lima_gp_irq_handler+0xa8/0x120 [10136.669666] __handle_irq_event_percpu+0x48/0x160 [10136.669679] handle_irq_event+0x4c/0xc0 We can prevent that race condition entirely by masking the irqs at the beginning of the timeout handler, at which point we give up on waiting for that job entirely. The irqs will be enabled again at the next hard reset which is already done as a recovery by the timeout handler. Signed-off-by: Erico Nunes Reviewed-by: Qiang Yu Signed-off-by: Qiang Yu Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-4-nunes.erico@gmail.com --- drivers/gpu/drm/lima/lima_sched.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'drivers/gpu/drm/lima/lima_sched.c') diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index 66841503a618..bbf3f8feab94 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -430,6 +430,13 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job return DRM_GPU_SCHED_STAT_NOMINAL; } + /* + * The task might still finish while this timeout handler runs. + * To prevent a race condition on its completion, mask all irqs + * on the running core until the next hard reset completes. + */ + pipe->task_mask_irq(pipe); + if (!pipe->error) DRM_ERROR("%s job timeout\n", lima_ip_name(ip)); -- cgit