path: root/drivers/gpu/drm/amd/amdgpu/amdgpu.h
Commit message (Author, Age, Files, Lines)
* drm/amdgpu: rename the files for HMM handling (Christian König, 2022-11-17, 1 file, -1/+0)
|   Clean that up a bit, no functional change.
|
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Alex Deucher <[email protected]>
|   Reviewed-by: Felix Kuehling <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: there is no vbios fb on devices with no display hw (v2) (Alex Deucher, 2022-11-15, 1 file, -0/+1)
|   If we enable virtual display functionality on parts with no display
|   hardware we can end up trying to check for and reserve the vbios FB area
|   on devices where it doesn't exist. Check if display hardware is actually
|   present on the hardware before trying to reserve the memory.
|
|   v2: move the check into common code
|
|   Acked-by: Evan Quan <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: clarify DC checks (Alex Deucher, 2022-11-15, 1 file, -0/+1)
|   There are several places where we don't want to check if a particular
|   asic could support DC, but rather, if DC is enabled. Set a flag if DC is
|   enabled and check for that rather than if a device supports DC or not.
|
|   Acked-by: Christian König <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: rework SR-IOV virtual display handling (Alex Deucher, 2022-11-15, 1 file, -0/+2)
|   virtual display is enabled unconditionally in SR-IOV, but without
|   specifying the virtual_display module, the number of crtcs defaults to 0.
|   Set a single display by default for SR-IOV if the virtual_display
|   parameter is not set. Only enable virtual display by default on SR-IOV
|   on asics which actually have display hardware.
|
|   Acked-by: Christian König <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: extend halt_if_hws_hang to MES (Graham Sider, 2022-11-04, 1 file, -0/+2)
|   Hang on MES timeout if halt_if_hws_hang is set to 1.
|
|   Signed-off-by: Graham Sider <[email protected]>
|   Reviewed-by: Felix Kuehling <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* Revert "drm/amdgpu: add debugfs amdgpu_reset_level" (Victor Zhao, 2022-10-17, 1 file, -4/+0)
|   This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.
|
|   This commit breaks the reset logic for aldebaran, revert it for now.
|   Will move the mask inside the reset handler.
|
|   Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
|   Signed-off-by: Victor Zhao <[email protected]>
|   Reviewed-by: Lijo Lazar <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: extend HWIP_MAX_INSTANCE to 28 (Hawking Zhang, 2022-10-17, 1 file, -1/+1)
|   more ip instances are available
|
|   Acked-by: Christian König <[email protected]>
|   Signed-off-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add gang submit backend v2 (Christian König, 2022-09-20, 1 file, -0/+3)
|   Allows submitting jobs as gang which needs to run on multiple engines at
|   the same time.
|
|   Basic idea is that we have a global gang submit fence representing when
|   the gang leader is finally pushed to run on the hardware last.
|
|   Jobs submitted as gang are never re-submitted in case of a GPU reset
|   since this won't work and will just deadlock the hardware immediately
|   again.
|
|   v2: fix logic inversion, improve documentation, fix rcu
|
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Alex Deucher <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: reduce reset time (Victor Zhao, 2022-08-16, 1 file, -0/+1)
|   In multi container use case, reset time is important, so skip ring tests
|   and cp halt wait during ip suspending for reset as they are going to
|   fail and cost more time on reset
|
|   v2: add a hang flag to indicate the reset comes from a job timeout, skip
|       ring test and cp halt wait in this case
|   v3: move hang flag to adev
|
|   Signed-off-by: Victor Zhao <[email protected]>
|   Acked-by: Andrey Grodzovsky <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add debugfs amdgpu_reset_level (Victor Zhao, 2022-08-16, 1 file, -0/+5)
|   Introduce amdgpu_reset_level debugfs in order to help debug and test
|   specific type of reset. Also helps blocking unwanted type of resets.
|
|   By default, mode2 reset will not be enabled
|
|   v2: make this debugfs in adev and use debugfs_create_u32
|
|   Signed-off-by: Victor Zhao <[email protected]>
|   Acked-by: Andrey Grodzovsky <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Increase tlb flush timeout for sriov (Dusica Milinkovic, 2022-08-16, 1 file, -1/+1)
|   [Why]
|   During multi-vf executing benchmark (Luxmark) observed kiq error
|   timeout. It happens because all of the VFs do the tlb invalidation at
|   the same time. Although each VF has the invalidate register set, from
|   the hardware side the invalidate requests are queued to execute.
|
|   [How]
|   In the case of 12 VFs, increase the timeout to 12*100ms.
|
|   Signed-off-by: Dusica Milinkovic <[email protected]>
|   Acked-by: Shaoyun Liu <[email protected]>
|   Acked-by: Alex Deucher <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix the incomplete product number (Roy Sun, 2022-07-28, 1 file, -1/+1)
|   The comments say that the product number is a 16-digit HEX string so the
|   buffer needs to be at least 17 characters to hold the NUL terminator.
|
|   Expand the buffer size to 20 to avoid the alignment issues.
|
|   The comment: Product number should only be 16 characters. Any more, and
|   something could be wrong. Cap it at 16 to be safe
|
|   Signed-off-by: Roy Sun <[email protected]>
|   Reviewed-by: André Almeida <[email protected]>
|   Reviewed-by: Alex Deucher <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
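The sizing argument in the commit message above can be sketched in plain C. This is only an illustration, not the driver's code: the names `EX_PRODUCT_NUMBER_DIGITS`, `EX_PRODUCT_NUMBER_BUF`, and `ex_copy_product_number` are hypothetical, and only the numbers 16, 17, and 20 come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants for illustration; not the driver's identifiers. */
#define EX_PRODUCT_NUMBER_DIGITS 16  /* 16 hex digits, per the commit message */
#define EX_PRODUCT_NUMBER_BUF    20  /* must be >= 17 so the NUL terminator fits */

/*
 * Copy at most EX_PRODUCT_NUMBER_DIGITS characters from src into dst and
 * always NUL-terminate. dst must hold at least EX_PRODUCT_NUMBER_BUF bytes.
 * Returns the number of characters copied, excluding the terminator.
 */
static size_t ex_copy_product_number(char *dst, const char *src)
{
    size_t n = strlen(src);

    assert(dst != NULL);
    if (n > EX_PRODUCT_NUMBER_DIGITS)
        n = EX_PRODUCT_NUMBER_DIGITS;  /* cap at 16 to be safe */
    memcpy(dst, src, n);
    dst[n] = '\0';                     /* lands in byte 17 when n == 16 */
    return n;
}
```

With a 16-byte buffer the terminator written for a full-length product number would fall one byte past the end, which is the problem the commit describes; any size of 17 or more avoids it.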
* drm/amd/display: Add visualconfirm module parameter (Leo Li, 2022-07-25, 1 file, -0/+1)
|   [Why]
|   Being able to configure visual confirm at boot or in cmdline is helpful
|   when debugging.
|
|   [How]
|   Add a module parameter to configure DC visual confirm, which works the
|   same way as the equivalent debugfs entry.
|
|   Signed-off-by: Leo Li <[email protected]>
|   Reviewed-by: Rodrigo Siqueira <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: drop runpm from amdgpu_device structure (Guchun Chen, 2022-07-18, 1 file, -1/+0)
|   It's redundant, as now switching to rpm_mode to indicate runtime power
|   management mode.
|
|   Suggested-by: Lijo Lazar <[email protected]>
|   Signed-off-by: Guchun Chen <[email protected]>
|   Reviewed-by: Lijo Lazar <[email protected]>
|   Reviewed-by: Evan Quan <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: support reset flag set for gpu reset (Likun Gao, 2022-07-13, 1 file, -3/+2)
|   Move reset_context out of the gpu recover function to make it
|   configurable for different reset purposes.
|   For a reset triggered by calling the gpu_recovery sysfs, force the full
|   reset method.
|   Otherwise, try soft reset by default if the related ASIC supports it; if
|   soft reset fails, use full reset.
|
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover (Andrey Grodzovsky, 2022-06-10, 1 file, -1/+1)
|   We removed the wrapper that was queueing the recover function into the
|   reset domain queue, which was using this name.
|
|   Signed-off-by: Andrey Grodzovsky <[email protected]>
|   Reviewed-by: Christian König <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add work_struct for GPU reset from debugfs (Andrey Grodzovsky, 2022-06-10, 1 file, -0/+2)
|   We need to have a work_struct to cancel this reset if another is already
|   in progress.
|
|   Signed-off-by: Andrey Grodzovsky <[email protected]>
|   Reviewed-by: Christian König <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: enable ASPM support for PCIE 7.4.0/7.6.0 (Evan Quan, 2022-06-08, 1 file, -0/+1)
|   Enable ASPM support for PCIE 7.4.0 and 7.6.0.
|
|   Signed-off-by: Evan Quan <[email protected]>
|   Reviewed-by: Lijo Lazar <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs (Ramesh Errabolu, 2022-06-08, 1 file, -0/+3)
|   Add support for peer-to-peer communication among AMD GPUs over PCIe bus.
|   Support REQUIRES enablement of config HSA_AMD_P2P.
|
|   Signed-off-by: Ramesh Errabolu <[email protected]>
|   Reviewed-by: Felix Kuehling <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: adding device coredump support (Somalapuram Amaranath, 2022-06-06, 1 file, -0/+5)
|   Added device coredump information:
|   - Kernel version
|   - Module
|   - Time
|   - VRAM status
|   - Guilty process name and PID
|   - GPU register dumps
|
|   v1 -> v2: Variable name change
|   v1 -> v2: NULL check
|   v1 -> v2: Code alignment
|   v1 -> v2: Adding dummy amdgpu_devcoredump_free
|   v1 -> v2: memset reset_task_info to zero
|   v2 -> v3: add CONFIG_DEV_COREDUMP for variables
|   v2 -> v3: remove NULL check on amdgpu_devcoredump_read
|
|   Signed-off-by: Somalapuram Amaranath <[email protected]>
|   Reviewed-by: Shashank Sharma <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: save the reset dump register value for devcoredump (Somalapuram Amaranath, 2022-06-06, 1 file, -0/+1)
|   Allocate memory for register value and use the same values for
|   devcoredump.
|
|   v1 -> v2: Change krealloc_array() to kmalloc_array()
|   v2 -> v3: Fix alignment
|
|   Signed-off-by: Somalapuram Amaranath <[email protected]>
|   Reviewed-by: Shashank Sharma <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amd: Fix spelling typo in comments (pengfuyuan, 2022-06-03, 1 file, -1/+1)
|   Fix spelling typo in comments.
|
|   Reported-by: k2ci <[email protected]>
|   Signed-off-by: pengfuyuan <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amd: Don't reset dGPUs if the system is going to s2idle (Mario Limonciello, 2022-05-18, 1 file, -0/+2)
|   An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset
|   for handling aborted suspend can't work with s2idle.
|
|   This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu:
|   always reset the asic in suspend (v2)"). A few other commits have gone
|   on top of the ASIC reset, but this still doesn't work on the A+A
|   configuration in s2idle.
|
|   Avoid doing the reset on dGPUs specifically when using s2idle.
|
|   Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
|   Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
|   Reviewed-by: Alex Deucher <[email protected]>
|   Signed-off-by: Mario Limonciello <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add lsdma block (Likun Gao, 2022-05-10, 1 file, -0/+5)
|   Add Light SDMA (LSDMA) block and related function. LSDMA is a small
|   instance of SDMA mainly for kernel driver use.
|
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/psp: Add vbflash sysfs interface support (Likun Gao, 2022-05-10, 1 file, -0/+1)
|   Add sysfs interface to copy VBIOS.
|
|   v2: squash in fix for proper vmalloc API (Alex)
|
|   Signed-off-by: Andrey Grodzovsky <[email protected]>
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* Revert "drm/amdgpu: disable runpm if we are the primary adapter" (Alex Deucher, 2022-05-05, 1 file, -1/+0)
|   This reverts commit b95dc06af3e683d6b7ddbbae178b2b2a21ee8b2b.
|
|   This workaround is no longer necessary. We have a better workaround in
|   commit f95af4a9236695 ("drm/amdgpu: don't runtime suspend if there are
|   displays attached (v3)").
|
|   Reviewed-by: Javier Martinez Canillas <[email protected]>
|   Acked-by: Daniel Vetter <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add mes_kiq module parameter v2 (Jack Xiao, 2022-05-04, 1 file, -0/+2)
|   mes_kiq parameter is used to enable mes kiq pipe.
|   This module parameter is unnecessary or enabled by default in final
|   version.
|
|   v2: reword commit message.
|
|   Signed-off-by: Jack Xiao <[email protected]>
|   Acked-by: Christian König <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add the per-context meta data v3 (Jack Xiao, 2022-05-04, 1 file, -0/+1)
|   The per-context meta data is a per-context data structure associated
|   with a mes-managed hardware ring, which includes MCBP CSA, ring buffer,
|   etc.
|
|   v2: fix typo
|   v3: a. use structure instead of typedef
|       b. move amdgpu_mes_ctx_get_offs_* to amdgpu_ring.h
|       c. use __aligned to make alignment
|
|   Signed-off-by: Jack Xiao <[email protected]>
|   Acked-by: Christian König <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: define MQD abstract layer for hw ip (Jack Xiao, 2022-05-04, 1 file, -0/+21)
|   Define MQD abstract layer for hw ip, so that mqd configuration can be
|   passed not only from a ring but from more sources, like user queue.
|
|   Signed-off-by: Jack Xiao <[email protected]>
|   Acked-by: Christian König <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add tracking for the enablement of SCPM (Likun Gao, 2022-05-04, 1 file, -0/+3)
|   Add parameter to show whether the SCPM feature is enabled or not, and
|   whether it is valid.
|
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add hdp version 6 functions (Likun Gao, 2022-05-04, 1 file, -1/+2)
|   Unify hdp related function into hdp structure for hdp version 6.
|
|   V2: Remove hdp invalidate function as hdp v6 doesn't have read cache.
|
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add function to decode ip version (Likun Gao, 2022-04-28, 1 file, -0/+3)
|   Add function to decode IP version.
|
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Alex Deucher <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
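The log does not show what the decode helpers look like, so the following is only a sketch of how a packed major.minor.revision version can be encoded and decoded. The bit layout (major from bit 16 up, minor in bits 8 to 15, revision in bits 0 to 7) and all `EX_` names are assumptions made for illustration, not taken from this page.

```c
#include <stdint.h>

/*
 * Hypothetical packing, assumed for illustration only:
 *   bits 16 and up : major
 *   bits  8..15    : minor
 *   bits  0..7     : revision
 */
#define EX_IP_VERSION(mj, mn, rv) \
    (((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Decode helpers: the inverse of EX_IP_VERSION under the layout above. */
#define EX_IP_VERSION_MAJ(ver) ((uint32_t)(ver) >> 16)
#define EX_IP_VERSION_MIN(ver) (((uint32_t)(ver) >> 8) & 0xFF)
#define EX_IP_VERSION_REV(ver) ((uint32_t)(ver) & 0xFF)
```

Under this assumed layout a version such as 13.0.5 packs to 0x000D0005, and packed values compare in version order with plain integer comparison, which is one reason a layout like this is convenient.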
* drm/amdgpu: increase HWIP MAX INSTANCE (Likun Gao, 2022-04-28, 1 file, -1/+1)
|   Extend HWIP MAX INSTANCE to 11.
|
|   Acked-by: Christian König <[email protected]>
|   Signed-off-by: Likun Gao <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: expand cg_flags from u32 to u64 (Evan Quan, 2022-04-08, 1 file, -3/+3)
|   With this, we can support more CG flags.
|
|   Signed-off-by: Evan Quan <[email protected]>
|   Acked-by: Alex Deucher <[email protected]>
|   Reviewed-by: Hawking Zhang <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
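Why widening a flags word from 32 to 64 bits allows more flags can be shown with a small sketch. The flag names and bit positions below are hypothetical and chosen only to illustrate the point; they are not the driver's CG flag definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags; bit positions are illustrative only. */
#define EX_CG_FLAG_LOW   (1ULL << 0)
#define EX_CG_FLAG_HIGH  (1ULL << 32)  /* not representable in 32 bits */

/* Test a flag in a 64-bit flags word. */
static bool ex_cg_enabled(uint64_t cg_flags, uint64_t flag)
{
    return (cg_flags & flag) != 0;
}

/* Storing the flags in 32 bits silently drops every bit above bit 31. */
static uint32_t ex_truncate_to_u32(uint64_t cg_flags)
{
    return (uint32_t)cg_flags;
}
```

Once a 33rd flag is needed, a 32-bit field can no longer hold it, while existing low flags keep their values when the field is widened.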
* drm/amdgpu: header cleanup (Christian König, 2022-03-04, 1 file, -95/+0)
|   No functional change, just move a bunch of definitions from amdgpu.h
|   into separate header files.
|
|   Signed-off-by: Christian König <[email protected]>
|   Acked-by: Andrey Grodzovsky <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/vcn: Add vcn firmware log (Ruijing Dong, 2022-03-04, 1 file, -0/+3)
|   vcn fwlog is for debugging purpose only, by default, it is disabled.
|
|   Signed-off-by: Ruijing Dong <[email protected]>
|   Reviewed-by: Leo Liu <[email protected]>
|   Signed-off-by: Alex Deucher <[email protected]>
* Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (Dave Airlie, 2022-03-01, 1 file, -9/+8)
|\
| |   amd-drm-next-5.18-2022-02-25:
| |
| |   amdgpu:
| |   - Raven2 suspend/resume fix
| |   - SDMA 5.2.6 updates
| |   - VCN 3.1.2 updates
| |   - SMU 13.0.5 updates
| |   - DCN 3.1.5 updates
| |   - Virtual display fixes
| |   - SMU code cleanup
| |   - Harvest fixes
| |   - Expose benchmark tests via debugfs
| |   - Drop no longer relevant gart aperture tests
| |   - More RAS restructuring
| |   - W=1 fixes
| |   - PSR rework
| |   - DP/VGA adapter fixes
| |   - DP MST fixes
| |   - GPUVM eviction fix
| |   - GPU reset debugfs register dumping support
| |   - Misc display fixes
| |   - SR-IOV fix
| |   - Aldebaran mGPU fix
| |   - Add module parameter to disable XGMI for testing
| |
| |   amdkfd:
| |   - IH ring overflow logging fixes
| |   - CRIU fixes
| |   - Misc fixes
| |
| |   Signed-off-by: Dave Airlie <[email protected]>
| |   From: Alex Deucher <[email protected]>
| |   Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
| * drm/amdgpu: Add use_xgmi_p2p module parameter (Alex Sierra, 2022-02-24, 1 file, -0/+1)
| |   This parameter controls xGMI p2p communication, which is enabled by
| |   default. However, it can be disabled by setting it to 0. In case xGMI
| |   p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
| |   This parameter is ignored in GPUs that do not support xGMI p2p
| |   configuration.
| |
| |   Signed-off-by: Alex Sierra <[email protected]>
| |   Acked-by: Luben Tuikov <[email protected]>
| |   Acked-by: Harish Kasiviswanathan <[email protected]>
| |   Reviewed-by: Felix Kuehling <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
| * drm/amdgpu: add debugfs for reset registers list (Somalapuram Amaranath, 2022-02-23, 1 file, -0/+4)
| |   List of registers populated for dump collection during the GPU reset.
| |
| |   Signed-off-by: Somalapuram Amaranath <[email protected]>
| |   Reviewed-by: Christian König <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
| * drm/amdgpu: drop testing module parameter (Alex Deucher, 2022-02-23, 1 file, -7/+0)
| |   This test is not particularly useful now that GTT and GART are
| |   decoupled in the driver.
| |
| |   Reviewed-by: Christian König <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
| * drm/amdgpu: drop benchmark module parameter (Alex Deucher, 2022-02-23, 1 file, -1/+0)
| |   Now that we expose the benchmarks via debugfs, there is no longer a
| |   need for the module parameter.
| |
| |   Reviewed-by: Christian König <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
| * drm/amdgpu: add a benchmark mutex (Alex Deucher, 2022-02-23, 1 file, -0/+2)
| |   Prevent multiple runs in parallel to avoid mixing results.
| |
| |   Reviewed-by: Christian König <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
| * drm/amdgpu: plumb error handling though amdgpu_benchmark() (Alex Deucher, 2022-02-23, 1 file, -1/+1)
| |   So we can tell when this function fails.
| |
| |   v2: squash in error handling fix (Alex)
| |
| |   Reviewed-by: Christian König <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>
* | Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next (Dave Airlie, 2022-02-24, 1 file, -6/+8)
|\ \
| |/
|/|
| |   drm-misc-next for v5.18:
| |
| |   UAPI Changes:
| |
| |   Cross-subsystem Changes:
| |   - Split out panel-lvds and lvds dt bindings.
| |   - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h
| |     and use it in drivers and tomoyo.
| |   - Clarify dma_fence_chain and dma_fence_array should never include
| |     each other.
| |   - Flatten chains in syncobj's.
| |   - Don't double add in fbdev/defio when page is already enlisted.
| |   - Don't sort deferred-I/O pages by default in fbdev.
| |
| |   Core Changes:
| |   - Fix missing pm_runtime_put_sync in bridge.
| |   - Set modifier support to only linear fb modifier if drivers don't
| |     advertise support.
| |   - As a result, we remove allow_fb_modifiers.
| |   - Add missing clear for EDID Deep Color Modes in drm_reset_display_info.
| |   - Assorted documentation updates.
| |   - Warn once in drm_clflush if there is no arch support.
| |   - Add missing select for dp helper in drm_panel_edp.
| |   - Assorted small fixes.
| |   - Improve fb-helper's clipping handling.
| |   - Don't dump shmem mmaps in a core dump.
| |   - Add accounting to ttm resource manager, and use it in amdgpu.
| |   - Allow querying the detected eDP panel through debugfs.
| |   - Add helpers for xrgb8888 to 8 and 1 bits gray.
| |   - Improve drm's buddy allocator.
| |   - Add selftests for the buddy allocator.
| |
| |   Driver Changes:
| |   - Add support for nomodeset to a lot of drm drivers.
| |   - Use drm_module_*_driver in a lot of drm drivers.
| |   - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb,
| |     nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd,
| |     bridge/anx7625, ti-sn65dsi86.
| |   - Add bridge/it6505.
| |   - Create DP and DVI-I connectors in ast.
| |   - Assorted nouveau backlight fixes.
| |   - Rework amdgpu reset handling.
| |   - Add dt bindings for ingenic,jz4780-dw-hdmi.
| |   - Support reading edid through aux channel in ingenic.
| |   - Add a drm driver for Solomon SSD130x OLED displays.
| |   - Add simple support for sharp LQ140M1JW46.
| |   - Add more panels to nt35560.
| |
| |   Signed-off-by: Dave Airlie <[email protected]>
| |   From: Maarten Lankhorst <[email protected]>
| |   Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
| * drm/amdgpu: Move in_gpu_reset into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -5/+2)
| |   We should have a single instance per entire reset domain.
| |
| |   Signed-off-by: Andrey Grodzovsky <[email protected]>
| |   Suggested-by: Lijo Lazar <[email protected]>
| |   Reviewed-by: Christian König <[email protected]>
| |   Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
| * drm/amdgpu: Move reset sem into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -1/+0)
| |   We want a single instance of the reset sem across all reset clients
| |   because in the case of XGMI we should stop cross-device MMIO access,
| |   since any of the devices could be in a reset at that moment.
| |
| |   Signed-off-by: Andrey Grodzovsky <[email protected]>
| |   Reviewed-by: Christian König <[email protected]>
| |   Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
| * drm/amdgpu: Rework reset domain to be refcounted. (Andrey Grodzovsky, 2022-02-09, 1 file, -4/+2)
| |   The reset domain contains the register access semaphore now and so
| |   needs to be present as long as each device in a hive needs it, so it
| |   cannot be bound to the XGMI hive life cycle.
| |   Address this by making the reset domain refcounted and pointed to by
| |   each member of the hive and the hive itself.
| |
| |   v4:
| |   Fix crash on boot with XGMI hive by adding type to reset_domain.
| |   XGMI will only create a new reset_domain if the previous one was of
| |   single device type, meaning it's the first boot. Otherwise it will take
| |   a refcount to the existing reset_domain from the amdgpu device.
| |   Add a wrapper around reset_domain->refcount get/put and a wrapper
| |   around send to reset wq (Lijo)
| |
| |   Signed-off-by: Andrey Grodzovsky <[email protected]>
| |   Acked-by: Christian König <[email protected]>
| |   Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
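The lifetime rule described in the commit message above (the shared object lives until the last holder drops its reference) can be sketched in userspace C. This is not the driver's implementation: the kernel has its own reference-counting helpers, and this sketch swaps in C11 atomics. Every `ex_` name and the two-value type enum are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical domain types, mirroring the single-device vs. hive split. */
enum ex_domain_type { EX_SINGLE_DEVICE, EX_XGMI_HIVE };

struct ex_reset_domain {
    atomic_int refcount;       /* number of holders (devices and/or hive) */
    enum ex_domain_type type;
};

/* Create a domain holding one reference for the caller; NULL on failure. */
static struct ex_reset_domain *ex_domain_create(enum ex_domain_type type)
{
    struct ex_reset_domain *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->refcount, 1);
    d->type = type;
    return d;
}

/* Take an additional reference, e.g. when a device joins a hive. */
static void ex_domain_get(struct ex_reset_domain *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

/*
 * Drop one reference. Frees the domain and returns 1 when this was the
 * last reference; returns 0 while other holders remain.
 */
static int ex_domain_put(struct ex_reset_domain *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        free(d);
        return 1;
    }
    return 0;
}
```

Because every hive member and the hive itself hold their own reference, the domain cannot disappear while any of them still needs the shared semaphore, regardless of the order in which they are torn down.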
| * drm/amdgpu: Serialize non TDR gpu recovery with TDRs (Andrey Grodzovsky, 2022-02-09, 1 file, -0/+2)
| |   Use reset domain wq also for non TDR gpu recovery triggers such as
| |   sysfs and RAS. We must serialize all possible GPU recoveries to
| |   guarantee no concurrency there.
| |   For TDR call the original recovery function directly since it's
| |   already executed from within the wq. For others just use a wrapper to
| |   queue work and wait on it to finish.
| |
| |   v2: Rename to amdgpu_recover_work_struct
| |
| |   Signed-off-by: Andrey Grodzovsky <[email protected]>
| |   Reviewed-by: Christian König <[email protected]>
| |   Link: https://www.spinics.net/lists/amd-gfx/msg74113.html
| * drm/amdgpu: Introduce reset domain (Andrey Grodzovsky, 2022-02-09, 1 file, -0/+5)
| |   Defined a reset_domain struct such that all the entities that go
| |   through reset together will be serialized one against another. Do it
| |   for both single device and XGMI hive cases.
| |
| |   Signed-off-by: Andrey Grodzovsky <[email protected]>
| |   Suggested-by: Daniel Vetter <[email protected]>
| |   Suggested-by: Christian König <[email protected]>
| |   Reviewed-by: Christian König <[email protected]>
| |   Link: https://www.spinics.net/lists/amd-gfx/msg74111.html
* | drm/amd: Refactor `amdgpu_aspm` to be evaluated per device (Mario Limonciello, 2022-02-17, 1 file, -0/+1)
| |   Evaluating `pcie_aspm_enabled` as part of driver probe has the
| |   implication that if one PCIe bridge with an AMD GPU connected doesn't
| |   support ASPM then none of them do. This is an invalid assumption as the
| |   PCIe core will configure ASPM for individual PCIe bridges.
| |
| |   Create a new helper function that can be called by individual dGPUs to
| |   react to the `amdgpu_aspm` module parameter without having negative
| |   results for other dGPUs on the PCIe bus.
| |
| |   Suggested-by: Lijo Lazar <[email protected]>
| |   Reviewed-by: Lijo Lazar <[email protected]>
| |   Signed-off-by: Mario Limonciello <[email protected]>
| |   Reviewed-by: Alex Deucher <[email protected]>
| |   Signed-off-by: Alex Deucher <[email protected]>