drm/amdgpu: Avoid VF for RAS recovery source check

A VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell whether the real source is a RAS error. Therefore the VF schedules a KFD-based reset, which doesn't set the RAS source. Skip checking the RAS source for any VF-scheduled recovery.

Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: Vojislav Tomasevic <[email protected]>
Reviewed-by: Yiqing Yao <[email protected]>
Tested-by: Yiqing Yao <[email protected]>
Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
Signed-off-by: Alex Deucher <[email protected]>
author: Lijo Lazar <[email protected]> 2024-12-09 03:44:53 +0000
committer: Alex Deucher <[email protected]> 2024-12-11 22:30:59 +0000
commit: fccb446f82b9155c05758d1fa30af4a06494e0ec (patch)
tree: 6c5f95e1fa748135a58d153abc208d4f91ed7a6b /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
parent: drm/amdgpu/sdma7: implement queue reset callback for sdma7 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 144295da9e4c..e22fc7a8101f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5866,6 +5866,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * detected at the same time, let RAS recovery take care of it.
 	 */
 	if (amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) &&
+	    !amdgpu_sriov_vf(adev) &&
 	    reset_context->src != AMDGPU_RESET_SRC_RAS) {
 		dev_dbg(adev->dev,
 			"Gpu recovery from source: %d yielding to RAS error recovery handling",