aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
Commit message (Collapse)AuthorAgeFilesLines
* drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr accessYang Wang2025-07-281-2/+4
| | | | | | | | | | Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution. Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu2025-06-301-2/+4
| | | | | | | | | The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Disable ACA on VFsVictor Skvortsov2025-04-081-2/+2
| | | | | | | | | | | | VFs query RAS error counts directly from host with AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY. When ACA is enabled, an unusable aca_sysfs is created rather than amdgpu_ras_sysfs_create() Likewise, VFs depend on host support to query CPERs, rather than ACA component. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix computation for remain size of CPER ringXiang Liu2025-03-181-7/+8
| | | | | | | | | The mistake of computation for remain size of CPER ring will cause unbreakable while cycle when CPER ring overflow. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Use unique CPER record id across devicesXiang Liu2025-03-071-5/+13
| | | | | | | | | | | Encode socket id to CPER record id to be unique across devices. v2: add pointer check for adev->smuio.funcs->get_socket_id v2: set 0 if adev->smuio.funcs->get_socket_id is NULL Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Free CPER entry after committing to ringXiang Liu2025-03-051-0/+3
| | | | | | | | Free CPER entry when it's committed to CPER ring to avoid memory leak. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammarColin Ian King2025-02-271-1/+1
| | | | | | | | | There is a spelling mistake and a grammatical error in a dev_err message. Fix it. Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Disable fru_id field in CPER sectionXiang Liu2025-02-271-4/+1
| | | | | | | | | The fru_id field is disabled cause of mis-matching defination between CPER spec and driver. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Set CPER enabled flag after ring initiailizedXiang Liu2025-02-251-1/+9
| | | | | | | | | Setting cper.enabled to be true only after cper ring is successfully created. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Check aca enabled inside cper init/fini funcXiang Liu2025-02-191-0/+6
| | | | | | | | | Move code about checking aca enabled to the cper init/fini function to make code clean. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Generate bad page threshold cper recordsXiang Liu2025-02-171-1/+23
| | | | | | | | | | | | Generate CPER record when bad page threshold exceed and commit to CPER ring. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Commit CPER entryXiang Liu2025-02-171-2/+4
| | | | | | | | Commit the CPER entry to the ring buffer. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add mutex lock for cper ringTao Zhou2025-02-171-0/+4
| | | | | | | | Avoid the confliction between read and write of ring buffer. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add data write function for CPER ringTao Zhou2025-02-171-1/+94
| | | | | | | | | Old CPER data will be overwritten if ring buffer is full, and read pointer always points to CPER header. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add RAS CPER ring bufferTao Zhou2025-02-171-4/+35
| | | | | | | | | | | | | | | And initialize it, this is a pure software ring to store RAS CPER data. v2: change ring size to 0x100000 v2: update the initialization of count_dw of cper ring, it's dword variable v3: skip VM inv eng for cper v3: init/fini when aca enabled Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Get timestamp from system timeXiang Liu2025-02-171-1/+18
| | | | | | | | Get system local time and encode it to timestamp for CPER. Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Introduce funcs for generating cper recordHawking Zhang2025-02-171-0/+108
| | | | | | | | | | | | | | Introduce new functions that are used to generate cper ue or ce records. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Xiang Liu <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Introduce funcs for populating CPERHawking Zhang2025-02-171-0/+281
Introduce utility functions designed to assist in populating CPER records. v2: call cper_init/fini in device_ip_init/fini. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>