aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd
Commit message (Collapse)AuthorAgeFilesLines
* drm/amd/pm: always allow ih interrupt from fwKenneth Feng2025-03-051-11/+1
| | | | | | | | | | | always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit a3199eba46c54324193607d9114a1e321292d7a1) Cc: [email protected] # 6.12.x
* drm/amdkfd: Fix NULL Pointer Dereference in KFD queueAndrew Martin2025-03-051-2/+2
| | | | | | | | | | | | | Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence when calling kfd_queue_acquire_buffers. Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Signed-off-by: Andrew Martin <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Andrew Martin <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 049e5bf3c8406f87c3d8e1958e0a16804fa1d530) Cc: [email protected]
* drm/amd/display: Fix null check for pipe_ctx->plane_state in ↵Ma Ke2025-03-051-1/+2
| | | | | | | | | | | | | | | | | resource_build_scaling_params Null pointer dereference issue could occur when pipe_ctx->plane_state is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not null before accessing. This prevents a null pointer dereference. Found by code review. Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state") Reviewed-by: Alex Hung <[email protected]> Signed-off-by: Ma Ke <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 63e6a77ccf239337baa9b1e7787cde9fa0462092) Cc: [email protected]
* drm/amdgpu: init return value in amdgpu_ttm_clear_bufferPierre-Eric Pelloux-Prayer2025-02-251-1/+1
| | | | | | | | | | | | | | | Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 7c62aacc3b452f73a1284198c81551035fac6d71) Cc: [email protected]
* drm/amd/display: Fix HPD after gpu resetRoman Li2025-02-251-0/+14
| | | | | | | | | | | | | | | | | | | | | [Why] DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts. So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs, HPD gets disabled. [How] Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Roman Li <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit f3dde2ff7fcaacd77884502e8f572f2328e9c745) Cc: [email protected]
* drm/amd/display: add a quirk to enable eDP0 on DP1Yilin Chen2025-02-251-7/+62
| | | | | | | | | | | | | | | | | | | | | | [why] some board designs have eDP0 connected to DP1, need a way to enable support_edp0_on_dp1 flag, otherwise edp related features cannot work [how] do a dmi check during dm initialization to identify systems that require support_edp0_on_dp1. Optimize quirk table with callback functions to set quirk entries, retrieve_dmi_info can set quirks according to quirk entries Cc: Mario Limonciello <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Yilin Chen <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit f6d17270d18a6a6753fff046330483d43f8405e4) Cc: [email protected]
* drm/amd/display: Disable PSR-SU on eDP panelsTom Chung2025-02-251-1/+2
| | | | | | | | | | | | | | | | | | | | [Why] PSR-SU may cause some glitching randomly on several panels. [How] Temporarily disable the PSR-SU and fallback to PSR1 for all eDP panels. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388 Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Sun peng Li <[email protected]> Signed-off-by: Tom Chung <[email protected]> Signed-off-by: Roman Li <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 6deeefb820d0efb0b36753622fb982d03b37b3ad) Cc: [email protected]
* drm/amd/display: restore edid reading from a given i2c adapterMelissa Wen2025-02-251-2/+15
| | | | | | | | | | | | | | | | | | | | | When switching to drm_edid, we slightly changed how to get edid by removing the possibility of getting them from dc_link when in aux transaction mode. As MST doesn't initialize the connector with `drm_connector_init_with_ddc()`, restore the original behavior to avoid functional changes. v2: - Fix build warning of unchecked dereference (kernel test bot) CC: Alex Hung <[email protected]> CC: Mario Limonciello <[email protected]> CC: Roman Li <[email protected]> CC: Aurabindo Pillai <[email protected]> Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Reviewed-by: Alex Hung <[email protected]> Signed-off-by: Melissa Wen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 81262b1656feb3813e3d917ab78824df6831e69e)
* drm/amdgpu/mes: keep enforce isolation up to dateAlex Deucher2025-02-255-11/+32
| | | | | | | | | | | | | Re-send the mes message on resume to make sure the mes state is up to date. Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Shaoyun Liu <[email protected]> Cc: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 27b791514789844e80da990c456c2465325e0851)
* drm/amdgpu/gfx: only call mes for enforce isolation if supportedAlex Deucher2025-02-251-2/+4
| | | | | | | | | | | | This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Shaoyun Liu <[email protected]> Cc: Srinivasan Shanmugam <[email protected]> (cherry picked from commit 80513e389765c8f9543b26d8fa4bbdf0e59ff8bc)
* drm/amdgpu: disable BAR resize on Dell G5 SEAlex Deucher2025-02-251-0/+7
| | | | | | | | | | | | | | | | | | There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5235053f443cef4210606e5fb71f99b915a9723d) Cc: [email protected]
* drm/amdkfd: Preserve cp_hqd_pq_control on update_mqdDavid Yat Sin2025-02-254-7/+14
| | | | | | | | | | | | When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve bitfields that do not need to be modified as they contain flags to track queue states that are used by CP FW. Signed-off-by: David Yat Sin <[email protected]> Reviewed-by: Jay Cornwall <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 8150827990b709ab5a40c46c30d21b7f7b9e9440) Cc: [email protected]
* amdgpu/pm/legacy: fix suspend/resume issueschr[]2025-02-253-14/+45
| | | | | | | | | | | | | | | | | | | resume and irq handler happily races in set_power_state() * amdgpu_legacy_dpm_compute_clocks() needs lock * protect irq work handler * fix dpm_enabled usage v2: fix clang build, integrate Lijo's comments (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524 Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reviewed-by: Lijo Lazar <[email protected]> Tested-by: Maciej S. Szmigiero <[email protected]> # on Oland PRO Signed-off-by: chr[] <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ee3dc9e204d271c9c7a8d4d38a0bce4745d33e71) Cc: [email protected]
* drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table()Jiang Liu2025-02-131-1/+2
| | | | | | | | | | | It malicious user provides a small pptable through sysfs and then a bigger pptable, it may cause buffer overflow attack in function smu_sys_set_pp_table(). Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Jiang Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handlerLancelot SIX2025-02-132-1/+6
| | | | | | | | | | | | | | | | It is possible for some waves in a workgroup to finish their save sequence before the group leader has had time to capture the workgroup barrier state. When this happens, having those waves exit do impact the barrier state. As a consequence, the state captured by the group leader is invalid, and is eventually incorrectly restored. This patch proposes to have all waves in a workgroup wait for each other at the end of their save sequence (just before calling s_endpgm_saved). Signed-off-by: Lancelot SIX <[email protected]> Reviewed-by: Jay Cornwall <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.12.x
* drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()Jiang Liu2025-02-131-2/+3
| | | | | | | | | | In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Jiang Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* amdkfd: properly free gang_ctx_bo when failed to init user queueZhu Lingshan2025-02-131-1/+1
| | | | | | | | | | | | | | | | | | The destructor of a gtt bo is declared as void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void **mem_obj); Which takes void** as the second parameter. GCC allows passing void* to the function because void* can be implicitly casted to any other types, so it can pass compiling. However, passing this void* parameter into the function's execution process(which expects void** and dereferencing void**) will result in errors. Signed-off-by: Zhu Lingshan <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Fixes: fb91065851cd ("drm/amdkfd: Refactor queue wptr_bo GART mapping") Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: bump version for RV/PCO compute fixAlex Deucher2025-02-131-1/+2
| | | | | | | | | | Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.12.x
* drm/amdgpu/gfx9: manually control gfxoff for CS on RVAlex Deucher2025-02-131-2/+34
| | | | | | | | | | | | | | | | | | | | | | | | When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <[email protected]> Suggested-by: Błażej Szczygieł <[email protected]> Suggested-by: Sergey Kovalenko <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.12.x
* drm/amdgpu/pm: fix UVD handing in amdgpu_dpm_set_powergating_by_smu()Alex Deucher2025-02-121-1/+1
| | | | | | | | | | | | | | UVD and VCN were split into separate dpm helpers in commit ff69bba05f08 ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") as such, there is no need to include UVD in the is_vcn variable since UVD and VCN are handled by separate dpm helpers now. Fix the check. Fixes: ff69bba05f08 ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3959 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-February/119827.html Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Boyuan Zhang <[email protected]>
* Revert "drm/amd/display: Use HW lock mgr for PSR1"Tom Chung2025-02-041-2/+1
| | | | | | | | | | | This reverts commit a2b5a9956269 ("drm/amd/display: Use HW lock mgr for PSR1") Because it may cause system hang while connect with two edp panel. Acked-by: Wayne Lin <[email protected]> Signed-off-by: Tom Chung <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: Respect user's CONFIG_FRAME_WARN more for dml filesNathan Chancellor2025-02-042-14/+22
| | | | | | | | | | | | | | | | | | | | Currently, there are several files in drm/amd/display that aim to have a higher -Wframe-larger-than value to avoid instances of that warning with a lower value from the user's configuration. However, with the way that it is currently implemented, it does not respect the user's request via CONFIG_FRAME_WARN for a higher stack frame limit, which can cause pain when new instances of the warning appear and break the build due to CONFIG_WERROR. Adjust the logic to switch from a hard coded -Wframe-larger-than value to only using the value as a minimum clamp and deferring to the requested value from CONFIG_FRAME_WARN if it is higher. Suggested-by: Harry Wentland <[email protected]> Reported-by: Greg Kroah-Hartman <[email protected]> Closes: https://lore.kernel.org/2025013003-audience-opposing-7f95@gregkh/ Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: Fix seamless boot sequenceLo-an Chen2025-02-038-6/+15
| | | | | | | | | | | | | | | | | | | | | | [WHY] When the system powers up eDP with external monitors in seamless boot sequence, stutter get enabled before TTU and HUBP registers being programmed, which resulting in underflow. [HOW] Enable TTU in hubp_init. Change the sequence that do not perpare_bandwidth and optimize_bandwidth while having seamless boot streams. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Lo-an Chen <[email protected]> Signed-off-by: Paul Hsieh <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: Fix out-of-bound accessesAlex Hung2025-02-032-5/+5
| | | | | | | | | | | | | | | | | [WHAT & HOW] hpo_stream_to_link_encoder_mapping has size MAX_HPO_DP2_ENCODERS(=4), but location can have size up to 6. As a result, it is necessary to check location against MAX_HPO_DP2_ENCODERS. Similiarly, disp_cfg_stream_location can be used as an array index which should be 0..5, so the ASSERT's conditions should be less without equal. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3904 Reviewed-by: Austin Zheng <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add a BO metadata flag to disable write compression for VulkanMarek Olšák2025-02-034-5/+13
| | | | | | | | | | | | | Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: [email protected] # 6.12.x Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: restore invalid MSA timing check for freesyncMelissa Wen2025-01-281-4/+8
| | | | | | | | | | | | | | | | | | This restores the original behavior that gets min/max freq from EDID and only set DP/eDP connector as freesync capable if "sink device is capable of rendering incoming video stream without MSA timing parameters", i.e., `allow_invalid_MSA_timing_params` is true. The condition was mistakenly removed by 0159f88a99c9 ("drm/amd/display: remove redundant freesync parser for DP"). CC: Mario Limonciello <[email protected]> CC: Alex Hung <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3915 Fixes: 0159f88a99c9 ("drm/amd/display: remove redundant freesync parser for DP") Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Melissa Wen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amdkfd: only flush the validate MES contexPrike Liang2025-01-281-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | The following page fault was observed duringthe KFD process release. In this particular error case, the HIP test (./MemcpyPerformance -h) does not require the queue. As a result, the process_context_addr was not assigned when the KFD process was released, ultimately leading to this page fault during the execution of the function kfd_process_dequeue_from_all_devices(). [345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0) [345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10 [345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33 [345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1 [345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0 [345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0) Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amd/display: Correct register address in dcn35loanchen2025-01-281-1/+1
| | | | | | | | | | | | | [Why] the offset address of mmCLK5_spll_field_8 was incorrect for dcn35 which causes SSC not to be enabled. Reviewed-by: Charlene Liu <[email protected]> Signed-off-by: Lo-An Chen <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amd/pm: Mark MM activity as unsupportedLijo Lazar2025-01-281-1/+0
| | | | | | | | | | Aldebaran doesn't support querying MM activity percentage. Keep the field as 0xFFs to mark it as unsupported. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* drm/amd/amdgpu: change the config of cgcg on gfx12Kenneth Feng2025-01-281-11/+0
| | | | | | | | | change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.12.x
* drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1Jay Cornwall2025-01-281-2/+2
| | | | | | | | | | | The purpose of halt_if_hws_hang is to preserve GPU state for driver debugging when queue preemption fails. Issuing per-queue reset may kill wavefronts which caused the preemption failure. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.12.x
* drm/amd/display: Optimize cursor position updatesAric Cyr2025-01-244-12/+19
| | | | | | | | | | | | | | | | | | | | | | [why] Updating the cursor enablement register can be a slow operation and accumulates when high polling rate cursors cause frequent updates asynchronously to the cursor position. [how] Since the cursor enable bit is cached there is no need to update the enablement register if there is no change to it. This removes the read-modify-write from the cursor position programming path in HUBP and DPP, leaving only the register writes. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Reviewed-by: Sung Lee <[email protected]> Signed-off-by: Aric Cyr <[email protected]> Signed-off-by: Wayne Lin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: Add hubp cache reset when powergatingAric Cyr2025-01-2414-2/+33
| | | | | | | | | | | | | | | | | | | | | [Why] When HUBP is power gated, the SW state can get out of sync with the hardware state causing cursor to not be programmed correctly. [How] Similar to DPP, add a HUBP reset function which is called wherever HUBP is initialized or powergated. This function will clear the cursor position and attribute cache allowing for proper programming when the HUBP is brought back up. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Reviewed-by: Sung Lee <[email protected]> Signed-off-by: Aric Cyr <[email protected]> Signed-off-by: Wayne Lin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/amdgpu: Enable scratch data dump for mes 12Shaoyun Liu2025-01-242-14/+37
| | | | | | | | | | MES internal will check CP_MES_MSCRATCH_LO/HI register to set scratch data location during ucode start, driver side need to start the MES one by one with different setting for each pipe Signed-off-by: Shaoyun Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd: Clarify kdoc for amdgpu.gttsizeMario Limonciello2025-01-241-1/+1
| | | | | | | | | Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculationSrinivasan Shanmugam2025-01-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180) drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev, 6171 enum pci_bus_speed *speed, 6172 enum pcie_link_width *width) 6173 { 6174 struct pci_dev *parent = adev->pdev; 6175 6176 if (!speed || !width) 6177 return; 6178 6179 parent = pci_upstream_bridge(parent); 6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) { ^^^^^^ If parent is NULL 6181 /* use the upstream/downstream switches internal to dGPU */ 6182 *speed = pcie_get_speed_cap(parent); 6183 *width = pcie_get_width_cap(parent); 6184 while ((parent = pci_upstream_bridge(parent))) { 6185 if (parent->vendor == PCI_VENDOR_ID_ATI) { 6186 /* use the upstream/downstream switches internal to dGPU */ 6187 *speed = pcie_get_speed_cap(parent); 6188 *width = pcie_get_width_cap(parent); 6189 } 6190 } 6191 } else { 6192 /* use the device itself */ --> 6193 *speed = pcie_get_speed_cap(parent); ^^^^^^ Then we are toasted here. 6194 *width = pcie_get_width_cap(parent); 6195 } 6196 } Fixes: 757e8b951ce2 ("drm/amdgpu: cache gpu pcie link width") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Lijo Lazar <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changedSrinivasan Shanmugam2025-01-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function amdgpu_dm_crtc_mem_type_changed was dereferencing pointers returned by drm_atomic_get_plane_state without checking for errors. This could lead to undefined behavior if the function returns an error pointer. This commit adds checks using IS_ERR to ensure that new_plane_state and old_plane_state are valid before dereferencing them. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:11486 amdgpu_dm_crtc_mem_type_changed() error: 'new_plane_state' dereferencing possible ERR_PTR() drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c 11475 static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev, 11476 struct drm_atomic_state *state, 11477 struct drm_crtc_state *crtc_state) 11478 { 11479 struct drm_plane *plane; 11480 struct drm_plane_state *new_plane_state, *old_plane_state; 11481 11482 drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) { 11483 new_plane_state = drm_atomic_get_plane_state(state, plane); 11484 old_plane_state = drm_atomic_get_plane_state(state, plane); ^^^^^^^^^^^^^^^^^^^^^^^^^^ These functions can fail. 11485 --> 11486 if (old_plane_state->fb && new_plane_state->fb && 11487 get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb)) 11488 return true; 11489 } 11490 11491 return false; 11492 } Fixes: 4caacd1671b7 ("drm/amd/display: Do not elevate mem_type change to full update") Cc: Leo Li <[email protected]> Cc: Tom Chung <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Roman Li <[email protected]> Cc: Alex Hung <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Hamza Mahfooz <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Roman Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environmentLin.Cao2025-01-241-1/+1
| | | | | | | | | | | | | commit 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") set job->vm as NULL if there is no fence. It will cause emit switch buffer be skippen if job->vm set as NULL. Check job rather than vm could solve this problem. Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: Lin.Cao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/pm: Fix smu v13.0.6 caps initializationLijo Lazar2025-01-241-104/+133
| | | | | | | | | | | | | | Fix the initialization and usage of SMU v13.0.6 capability values. Use caps_set/clear functions to set/clear capability. Also, fix SET_UCLK_MAX capability on APUs, it is supported on APUs. Fixes: e9b86b841baf ("drm/amd/pm: Add capability flags for SMU v13.0.6") Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks[email protected]2025-01-241-5/+12
| | | | | | | | | | | | This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma` to support multiple SMU programs with different firmware version thresholds. V2: return -EOPNOTSUPP for unspported pmfw Suggested-by: Lazar Lijo <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"[email protected]2025-01-243-4/+1
| | | | | | | | | pmfw now unifies PPSMC_MSG_ResetSDMA definitions for different devices. PPSMC_MSG_ResetSDMA2 is not needed. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"[email protected]2025-01-241-21/+9
| | | | | | | | | pmfw unified PPSMC_MSG_ResetSDMA definitions for different devices. PPSMC_MSG_ResetSDMA2 is not needed. Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/pm: Add capability flags for SMU v13.0.6Lijo Lazar2025-01-242-68/+158
| | | | | | | | | | | | | Add capability flags for SMU v13.0.6 variants. Initialize the flags based on firmware support. As there are multiple IP versions maintained, it is more manageable with one time initialization of caps flags based on IP version and firmware feature support. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: fix SUBVP DC_DEBUG_MASK documentationAlex Deucher2025-01-241-1/+1
| | | | | | | | | | This needs to be kerneldoc formatted. Fixes: 5349658fa4a1 ("drm/amd: Add debug option to disable subvp") Reviewed-by: Harry Wentland <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Aurabindo Pillai <[email protected]>
* drm/amd/display: fix CEC DC_DEBUG_MASK documentationAlex Deucher2025-01-241-1/+1
| | | | | | | | | | This needs to be kerneldoc formatted. Fixes: 7594874227e1 ("drm/amd/display: add CEC notifier to amdgpu driver") Reported-by: Stephen Rothwell <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Kun Liu <[email protected]>
* drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTLAlex Deucher2025-01-241-8/+11
| | | | | | | | | | Combine the platform and GPU caps like we do for PCIe Gen. This aligns properly with expectations and documentation for the interface. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: cache gpu pcie link widthAlex Deucher2025-01-242-32/+138
| | | | | | | | | | | Get the PCIe link with of the device itself (or it's integrated upstream bridge) and cache that. v2: fix typo Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amd/display: mark static functions noinline_for_stackTzung-Bi Shih2025-01-242-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling allmodconfig (CONFIG_WERROR=y) with clang-19, see the following errors: .../display/dc/dml2/display_mode_core.c:6268:13: warning: stack frame size (3128) exceeds limit (3072) in 'dml_prefetch_check' [-Wframe-larger-than] .../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:7236:13: warning: stack frame size (3256) exceeds limit (3072) in 'dml_core_mode_support' [-Wframe-larger-than] Mark static functions called by dml_prefetch_check() and dml_core_mode_support() noinline_for_stack to avoid them become huge functions and thus exceed the frame size limit. A way to reproduce: $ git checkout next-20250107 $ mkdir build_dir $ export PATH=/tmp/llvm-19.1.6-x86_64/bin:$PATH $ make LLVM=1 O=build_dir allmodconfig $ make LLVM=1 O=build_dir drivers/gpu/drm/ -j The way how it chose static functions to mark: [0] Unset CONFIG_WERROR in build_dir/.config. To get display_mode_core.o without errors. [1] Get a function list called by dml_prefetch_check(). $ sed -n '6268,6711p' drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c \ | sed -n -r 's/.*\W(\w+)\(.*/\1/p' | sort -u >/tmp/syms [2] Get the non-inline function list. Objdump won't show the symbols if they are inline functions. $ make LLVM=1 O=build_dir drivers/gpu/drm/ -j $ objdump -d build_dir/.../display_mode_core.o | \ ./scripts/checkstack.pl x86_64 0 | \ grep -f /tmp/syms | cut -d' ' -f2- >/tmp/orig [3] Get the full function list. Append "-fno-inline" to `CFLAGS_.../display_mode_core.o` in drivers/gpu/drm/amd/display/dc/dml2/Makefile. $ make LLVM=1 O=build_dir drivers/gpu/drm/ -j $ objdump -d build_dir/.../display_mode_core.o | \ ./scripts/checkstack.pl x86_64 0 | \ grep -f /tmp/syms | cut -d' ' -f2- >/tmp/noinline [4] Get the inline function list. If a symbol only in /tmp/noinline but not in /tmp/orig, it is a good candidate to mark noinline. $ diff /tmp/orig /tmp/noinline Chosen functions and their stack sizes: CalculateBandwidthAvailableForImmediateFlip [display_mode_core.o]:144 CalculateExtraLatency [display_mode_core.o]:176 CalculateTWait [display_mode_core.o]:64 CalculateVActiveBandwithSupport [display_mode_core.o]:112 set_calculate_prefetch_schedule_params [display_mode_core.o]:48 CheckGlobalPrefetchAdmissibility [dml2_core_dcn4_calcs.o]:544 calculate_bandwidth_available [dml2_core_dcn4_calcs.o]:320 calculate_vactive_det_fill_latency [dml2_core_dcn4_calcs.o]:272 CalculateDCFCLKDeepSleep [dml2_core_dcn4_calcs.o]:208 CalculateODMMode [dml2_core_dcn4_calcs.o]:208 CalculateOutputLink [dml2_core_dcn4_calcs.o]:176 Signed-off-by: Tzung-Bi Shih <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handlerJay Cornwall2025-01-242-1311/+1318
| | | | | | | | | | | | | If user shader issues S_SETVSKIP then this state will persist when executing the trap handler, causing vector instructions to be skipped. VSKIP state is already saved/restored through the MODE register. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Lancelot Six <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Refine ip detection log messageLijo Lazar2025-01-241-2/+2
| | | | | | | | | | 'add ip block' causes a confusion if the blocks are disabled later with ip_block_mask. Instead change to 'detected' and also add device context. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Suggested-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>