path: root/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
Commit log (newest first); each entry lists the subject, author, date, files changed and lines -/+.
* drm/amdgpu: Implement unrecoverable error message handling for VFs (Ellen Pan, 2025-05-07, 1 file, -3/+31)
  This notification may arrive in the VF mailbox while polling for the response
  to another event. This patch covers the following scenarios:
  - If the VF is already in the RMA state, do not attempt to contact the host;
    the host will ignore the VF after sending the notification.
  - If the notification is detected during polling, set the RMA status and
    return an error to the caller.
  - If the notification arrives by interrupt, set the RMA status and queue a
    reset. This reset will fail and the VF will stop runtime services.
  Reviewed-by: Shravan Kumar Gande <[email protected]>
  Signed-off-by: Victor Skvortsov <[email protected]>
  Signed-off-by: Ellen Pan <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Implement Runtime Bad Page query for VFs (Ellen Pan, 2025-05-07, 1 file, -0/+28)
  The host will send a notification when new bad pages are available. Upon
  guest request, the first 256 bad page addresses will be placed into the
  PF2VF region. The guest should pause the PF2VF worker thread while the copy
  is in progress.
  Reviewed-by: Shravan Kumar Gande <[email protected]>
  Signed-off-by: Victor Skvortsov <[email protected]>
  Signed-off-by: Ellen Pan <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add support for CPERs on virtualization (Tony Yi, 2025-03-05, 1 file, -0/+14)
  Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as
  such, they need to query them from the host. To avoid hitting the host event
  guard, CPER queries need to be rate limited. CPER queries share the same RAS
  telemetry buffer as the error count query, so a mutex protecting the shared
  buffer was added as well.
  For readability, amdgpu_detect_virtualization was refactored into multiple
  individual functions.
  Signed-off-by: Tony Yi <[email protected]>
  Reviewed-by: Tao Zhou <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
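  A rough illustration of the scheme above, assuming hypothetical helper and
  constant names (the actual symbols live in mxgpu_nv.c and the RAS code):

      /* Sketch only: a time-based rate limit plus a mutex around a telemetry
       * buffer shared by the CPER query and the error-count query. The
       * interval value and host_copy_cper_records() are assumptions. */
      #define CPER_QUERY_MIN_INTERVAL_MS 100  /* assumed, not from the commit */

      static DEFINE_MUTEX(ras_telemetry_lock); /* protects the shared buffer */
      static unsigned long last_cper_query;    /* jiffies of the last query */

      static int vf_query_cpers(struct amdgpu_device *adev)
      {
              int ret;

              /* Rate-limit requests so the VF does not trip the host event guard. */
              if (time_before(jiffies, last_cper_query +
                              msecs_to_jiffies(CPER_QUERY_MIN_INTERVAL_MS)))
                      return -EBUSY;

              mutex_lock(&ras_telemetry_lock);
              ret = host_copy_cper_records(adev);  /* hypothetical mailbox request */
              last_cper_query = jiffies;
              mutex_unlock(&ras_telemetry_lock);
              return ret;
      }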
* drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry (Victor Skvortsov, 2024-11-11, 1 file, -2/+14)
  Add message handlers for RAS telemetry.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: process RAS fatal error MB notification (Vignesh Chander, 2024-06-27, 1 file, -0/+8)
  For the RAS error scenario, the VF guest driver will check the mailbox and
  set the fed flag to avoid unnecessary HW accesses. Additionally, poll for
  the reset completion message first to avoid accidentally spamming multiple
  reset requests to the host.
  v2: add another mailbox check to handle the case where kfd detects the
      timeout first
  v3: set the host_flr bit and use wait_for_reset
  Signed-off-by: Vignesh Chander <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter (Vignesh Chander, 2024-06-27, 1 file, -8/+15)
  This gives us clearer per-device logging.
  Signed-off-by: Vignesh Chander <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix sriov host flr handler (Yunxiang Li, 2024-06-14, 1 file, -23/+16)
  We send back the ready-to-reset message before we stop anything, which is
  wrong. Move it to the point where we are actually ready for the FLR to
  happen. In the current state, since we take tens of seconds to stop
  everything, it is very likely that the host would give up waiting and reset
  the GPU before we send ready, so behaviour would be the same as before. But
  this gets rid of the hack with reset_domain locking and also lets us tell,
  from the host side, how slow ready-to-reset actually is. The ready-to-reset
  speed can be improved later.
  Signed-off-by: Yunxiang Li <[email protected]>
  Acked-by: Christian König <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add reset_context flag for host FLR (Yunxiang Li, 2024-05-02, 1 file, -0/+1)
  There are other reset sources that pass NULL as the job pointer, such as
  amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check whether
  the FLR comes from the host does not work. Add a flag in reset_context to
  explicitly mark a host-triggered reset, and set this flag when we receive
  the host reset notification.
  Signed-off-by: Yunxiang Li <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix two resets triggered in a row (Yunxiang Li, 2024-05-02, 1 file, -1/+1)
  Sometimes a hung GPU causes multiple reset sources to schedule resets. The
  second source can trigger an unnecessary reset if it is scheduled after we
  call amdgpu_device_stop_pending_resets. Move
  amdgpu_device_stop_pending_resets to after the reset is done. Since at that
  point the GPU is supposedly in a good state, any reset scheduled after it
  would be a legitimate reset. Remove the unnecessary and incorrect checks for
  amdgpu_in_reset that were loosely serving this purpose.
  Signed-off-by: Yunxiang Li <[email protected]>
  Reviewed-by: Lijo Lazar <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: update vf to pf message retry from 2 to 5 (Zhigang Luo, 2024-05-02, 1 file, -1/+1)
  Increase the retry count so the host has enough time to complete the reset.
  Signed-off-by: Zhigang Luo <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
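  In practice a change like this only touches a retry constant; a minimal,
  hypothetical sketch of the surrounding retransmission loop (constant and
  helper names assumed, each attempt carrying its own polling timeout):

      /* Hypothetical sketch of a bounded VF->PF retransmission loop; raising
       * the retry count gives a host that is still finishing a reset more
       * time to answer. */
      #define SEND_MSG_RETRY  5       /* was 2 */

      static int vf_send_request(struct amdgpu_device *adev, enum idh_request req)
      {
              int i, r;

              for (i = 0; i < SEND_MSG_RETRY; i++) {
                      r = vf_trans_msg_and_poll_ack(adev, req); /* assumed helper */
                      if (!r)
                              return 0;       /* host acknowledged the request */
              }
              dev_err(adev->dev, "no ack from host after %d tries for req %d\n",
                      SEND_MSG_RETRY, req);
              return -ETIMEDOUT;
      }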
* drm/amdgpu: remove virt_init_data_exchange from poison consumption handler (Zhigang Luo, 2024-04-19, 1 file, -2/+0)
  The host will initiate an FLR for all poison consumption. The guest should
  wait for the FLR message to re-init data exchange.
  Signed-off-by: Zhigang Luo <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: trigger flr_work if reading pf2vf data failed (Zhigang Luo, 2024-03-20, 1 file, -0/+2)
  If reading pf2vf data fails 30 times in a row, something is wrong and
  flr_work needs to be triggered to recover. Also use dev_err to print the
  error message so we can tell which device has the issue, and add a warning
  message if waiting for IDH_FLR_NOTIFICATION_CMPL times out.
  Signed-off-by: Zhigang Luo <[email protected]>
  Acked-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
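  A minimal sketch of the failure-counter idea described above; the threshold
  macro, read helper and counter are assumptions, while adev->virt.flr_work is
  an existing driver member:

      /* Count consecutive pf2vf read failures and kick the FLR work item
       * once the threshold is reached (names largely assumed). */
      #define PF2VF_READ_FAIL_MAX  30

      static int pf2vf_fail_cnt;  /* assumed counter; per-device in reality */

      static void vf_check_pf2vf_health(struct amdgpu_device *adev)
      {
              if (vf_read_pf2vf_data(adev) == 0) {    /* hypothetical helper */
                      pf2vf_fail_cnt = 0;
                      return;
              }

              if (++pf2vf_fail_cnt >= PF2VF_READ_FAIL_MAX) {
                      dev_err(adev->dev,
                              "pf2vf read failed %d times, triggering flr_work\n",
                              PF2VF_READ_FAIL_MAX);
                      schedule_work(&adev->virt.flr_work);
                      pf2vf_fail_cnt = 0;
              }
      }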
* drm/amdgpu: Skip virt_exchange_init on SDMA poison consumption (Victor Skvortsov, 2024-03-20, 1 file, -1/+2)
  The host will initiate an FLR in the SDMA poison consumption scenario. The
  guest should wait for the FLR message to re-init data exchange.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add RAS_POISON_READY host response message (Victor Skvortsov, 2024-01-25, 1 file, -0/+6)
  In a non-FLR page avoidance scenario, the host driver will provide the bad
  pages in the pf2vf exchange region. Add a new host response message to
  indicate when the pf2vf exchange region has been updated.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Support passing poison consumption ras block to SRIOV (YiPeng Chai, 2024-01-25, 1 file, -5/+18)
  Support passing poison consumption ras blocks to SRIOV.
  Signed-off-by: YiPeng Chai <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Clean up errors in mxgpu_nv.c (Ran Sun, 2023-08-09, 1 file, -4/+2)
  Fix the following errors reported by checkpatch:
  ERROR: else should follow close brace '}'
  ERROR: that open brace { should be on the previous line
  Signed-off-by: Ran Sun <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add RAS poison consumption handler for NV SRIOV (Tao Zhou, 2022-12-15, 1 file, -0/+6)
  Send the handling request to the host.
  Signed-off-by: Tao Zhou <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* Revert "drm/amdgpu: let mode2 reset fallback to default when failure"Victor Zhao2022-10-191-1/+0
| | | | | | | | | | | | This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: let mode2 reset fallback to default when failure (Victor Zhao, 2022-08-16, 1 file, -0/+1)
  - introduce the AMDGPU_SKIP_MODE2_RESET flag
  - let mode2 reset fall back to the default reset method if it fails
  v2: move this part out from the asic-specific part
  Signed-off-by: Victor Zhao <[email protected]>
  Acked-by: Andrey Grodzovsky <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: support reset flag set for gpu reset (Likun Gao, 2022-07-13, 1 file, -2/+10)
  Move reset_context out of the gpu recover function to make it configurable
  for different reset purposes. For resets requested via the gpu_recovery
  sysfs interface, force the full reset method. Otherwise, try soft reset by
  default if the ASIC supports it; if soft reset fails, fall back to full
  reset.
  Signed-off-by: Likun Gao <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover (Andrey Grodzovsky, 2022-06-10, 1 file, -1/+1)
  We removed the wrapper that was queueing the recover function into the reset
  domain queue, which was using this name.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Move in_gpu_reset into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+2)
  We should have a single instance per entire reset domain.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Suggested-by: Lijo Lazar <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
* drm/amdgpu: Move reset sem into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+2)
  We want a single instance of the reset semaphore across all reset clients
  because, in the XGMI case, we should stop cross-device MMIO access since any
  of the devices could be in reset at that moment.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
* drm/amdgpu: Rework reset domain to be refcounted. (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+4)
  The reset domain now contains the register access semaphore and so needs to
  be present as long as each device in a hive needs it; it therefore cannot be
  bound to the XGMI hive life cycle. Address this by making the reset domain
  refcounted and pointed to by each member of the hive and by the hive itself.
  v4: Fix crash on boot with an XGMI hive by adding a type to reset_domain.
      XGMI will only create a new reset_domain if the previous one was of
      single-device type, meaning it is the first boot. Otherwise it will take
      a refcount on the existing reset_domain from the amdgpu device.
      Add a wrapper around reset_domain->refcount get/put and a wrapper around
      sends to the reset wq (Lijo)
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Acked-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
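  The refcounting pattern described above maps naturally onto the kernel's
  kref; the sketch below is a simplified illustration with an assumed layout,
  not the driver's actual amdgpu_reset_domain definition:

      /* Simplified sketch of a refcounted reset domain shared between the
       * devices of a hive and the hive object itself (layout assumed). */
      enum reset_domain_type { RESET_DOMAIN_SINGLE_DEVICE, RESET_DOMAIN_XGMI_HIVE };

      struct reset_domain {
              struct kref refcount;
              enum reset_domain_type type;
              struct workqueue_struct *wq;    /* reset work is queued here */
      };

      static void reset_domain_release(struct kref *ref)
      {
              struct reset_domain *d = container_of(ref, struct reset_domain, refcount);

              destroy_workqueue(d->wq);
              kfree(d);
      }

      static struct reset_domain *reset_domain_create(enum reset_domain_type type)
      {
              struct reset_domain *d = kzalloc(sizeof(*d), GFP_KERNEL);

              if (!d)
                      return NULL;
              kref_init(&d->refcount);        /* creator holds one reference */
              d->type = type;
              d->wq = create_singlethread_workqueue("reset-domain");
              return d;
      }

      /* Each hive member takes a reference instead of creating its own domain,
       * so the domain outlives any single device and the hive alone. */
      static void reset_domain_get(struct reset_domain *d) { kref_get(&d->refcount); }
      static void reset_domain_put(struct reset_domain *d) { kref_put(&d->refcount, reset_domain_release); }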
* drm/amd/virt: For SRIOV send GPU reset directly to TDR queue. (Andrey Grodzovsky, 2022-02-09, 1 file, -3/+6)
  No need to trigger another work queue inside the work queue.
  v3:
  Problem: Extra reset caused by the host-side FLR notification following a
  guest-side triggered reset.
  Fix: Prevent queuing flr_work from the mailbox irq if the guest is already
  executing a reset.
  Suggested-by: Liu Shaoyun <[email protected]>
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Liu Shaoyun <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74114.html
* drm/amdgpu: SRIOV flr_work should use down_write (Victor Skvortsov, 2021-12-14, 1 file, -2/+3)
  A host-initiated VF FLR may fail if someone else is already holding a
  read_lock. Change from down_write_trylock to down_write to guarantee the
  reset goes through.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed by: Shaoyun.liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF gets FLR notification. (Jiange Zhao, 2021-08-16, 1 file, -0/+2)
  When the guest receives the FLR notification from the host, it locks the
  adapter into the reset state. There will be no more job submission or
  hardware access after that. It should then send a response to the host that
  it is prepared for the host reset.
  Signed-off-by: Jiange Zhao <[email protected]>
  Signed-off-by: Peng Ju Zhou <[email protected]>
  Reviewed-by: Emily.Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Extend full access wait time in guest (Victor Zhao, 2021-08-09, 1 file, -4/+12)
  - Extend the wait time and add a retry; currently 6s * 2 attempts
  - Change the timing algorithm
  Signed-off-by: Victor Zhao <[email protected]>
  Signed-off-by: Peng Ju Zhou <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: SRIOV flr_work should take write_lock (Jingwen Chen, 2021-07-13, 1 file, -2/+2)
  [Why]
  If flr_work takes the read_lock, then other threads that take the read_lock
  can access hardware while the host is doing the VF FLR.
  [How]
  flr_work should take the write_lock to avoid this case.
  Signed-off-by: Jingwen Chen <[email protected]>
  Reviewed-by: Monk Liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/sriov: Stop data exchange for whole gpu reset (Jack Zhang, 2021-01-14, 1 file, -0/+1)
  [Why]
  When the host triggers a whole gpu reset, the guest keeps waiting until the
  host finishes the reset. But there is a work queue in the guest exchanging
  data between vf and pf which needs to access the frame buffer. During the
  whole gpu reset the frame buffer is not accessible, and this causes the call
  trace.
  [How]
  After the vf gets the reset notification from the pf, stop the data exchange.
  Signed-off-by: Jingwen Chen <[email protected]>
  Signed-off-by: Jack Zhang <[email protected]>
  Reviewed-by: Monk Liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/SRIOV: Extend VF reset request wait period (Jiange Zhao, 2020-12-15, 1 file, -1/+10)
  In the virtualization case, when one VF sends too many FLR requests, the
  hypervisor stops responding to that VF's requests for a long period of time.
  This is called the event guard. During this cooling period the guest driver
  should wait instead of doing other things; after the period ends it can
  resume the reset process and return to normal.
  Currently, the guest driver waits 12 seconds and returns failure if it gets
  no response from the host.
  Solution: extend this waiting time in the guest driver and poll for the
  response periodically. Polling happens every 6 seconds and lasts for up to
  60 seconds.
  v2: change the max repetition count from a number to a macro.
  Signed-off-by: Jiange Zhao <[email protected]>
  Acked-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
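  The polling scheme described above would look roughly like the hedged sketch
  below; the interval and repetition macros and the poll helper are
  illustrative, not the driver's actual names:

      /* Rough sketch: instead of a single 12 s wait, poll the mailbox every
       * 6 s, up to 10 repetitions (60 s total), to ride out the host's
       * event-guard cooling period. */
      #define VF_RESET_POLL_INTERVAL_MS  6000
      #define VF_RESET_POLL_REP_MAX      10   /* v2: repetition count as a macro */

      static int vf_wait_reset_reply(struct amdgpu_device *adev)
      {
              int rep;

              for (rep = 0; rep < VF_RESET_POLL_REP_MAX; rep++) {
                      if (vf_peek_msg(adev, IDH_FLR_NOTIFICATION_CMPL)) /* assumed helper */
                              return 0;
                      msleep(VF_RESET_POLL_INTERVAL_MS);
              }

              dev_warn(adev->dev, "timed out waiting for host reset completion\n");
              return -ETIMEDOUT;
      }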
* drm/amdgpu: Do gpu recovery when no job is running (Liu ChengZhe, 2020-09-15, 1 file, -1/+1)
  In function flr_work, we should do gpu recovery when no job is running. Fix
  the logic by inverting it.
  v2: modify the description
  Reviewed-by: Christian König <[email protected]>
  Signed-off-by: Liu ChengZhe <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: change reset lock from mutex to rw_semaphore (Dennis Li, 2020-08-24, 1 file, -12/+6)
  Clients don't need the reset lock for synchronization when no GPU recovery
  is in progress.
  v2: change to return the return value of down_read_killable.
  v3: if GPU recovery has begun, the VF ignores the FLR notification.
  Reviewed-by: Monk Liu <[email protected]>
  Acked-by: Christian König <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
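  The intent of the mutex-to-rwsem change is to let normal hardware access
  paths share the lock while recovery holds it exclusively; a minimal sketch
  under that assumption (not the driver's exact code):

      /* Minimal sketch of the reset lock as an rw_semaphore: ordinary
       * hardware access paths take it shared, GPU recovery takes it
       * exclusive. */
      static DECLARE_RWSEM(reset_sem);

      static int hw_access_path(struct amdgpu_device *adev)
      {
              /* v2 of the commit: propagate down_read_killable()'s return value */
              if (down_read_killable(&reset_sem))
                      return -EINTR;

              /* ... touch registers safely; no reset can start meanwhile ... */

              up_read(&reset_sem);
              return 0;
      }

      static void gpu_recovery_path(struct amdgpu_device *adev)
      {
              down_write(&reset_sem);  /* drain all readers, then exclude them */
              /* ... perform the reset ... */
              up_write(&reset_sem);
      }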
* drm/amdgpu: refine codes to avoid reentering GPU recovery (Dennis Li, 2020-08-24, 1 file, -2/+2)
  If other threads hold the reset lock, recovery will fail its try_lock.
  Therefore we introduce the atomics hive->in_reset and adev->in_gpu_reset to
  avoid reentering GPU recovery.
  v2: drop "? true : false" in the definition of amdgpu_in_reset
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix repeated flr issue (jqdeng, 2020-08-18, 1 file, -1/+2)
  Only the no-job-running test case needs to do recovery in the FLR
  notification. If there are jobs in the mirror list, let the guest driver hit
  the job timeout and then do the recovery.
  Signed-off-by: jqdeng <[email protected]>
  Acked-by: Nirmoy Das <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König2020-08-141-3/+10
| | | | | | | | | | | | | | The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix system hang issue during GPU reset (Dennis Li, 2020-07-27, 1 file, -10/+3)
  When the GPU hangs, the driver has multiple paths into
  amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and hive->in_reset
  are used to avoid re-entering GPU recovery. During GPU reset and resume it
  is unsafe for other threads to access the GPU, which may cause the GPU reset
  to fail. Therefore the new rw_semaphore adev->reset_sem is introduced, which
  protects the GPU from being accessed by external threads during recovery.
  v2:
  1. add an rwlock for some ioctls, debugfs and the file-close function.
  2. change to use dqm->is_resetting and dqm_lock for protection in the kfd
     driver.
  3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid
     re-entering GPU recovery for the same GPU hang.
  v3:
  1. change back to using adev->reset_sem to protect the kfd callback
     functions, because dqm_lock couldn't protect all code paths, for example
     free_mqd must be called outside of dqm_lock:
  [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
  [ 1230.177221] Call Trace:
  [ 1230.178249]  dump_stack+0x98/0xd5
  [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
  [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
  [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
  [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
  [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
  [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
  [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
  [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
  [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
  [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
  [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
  [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
  [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
  [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
  [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
  [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
  [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
  [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
  [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
  [ 1230.202831]  ksys_ioctl+0x98/0xb0
  [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
  [ 1230.205174]  do_syscall_64+0x5f/0x250
  [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  2. remove try_lock and introduce the atomic hive->in_reset, to avoid
     re-entering GPU recovery.
  v4:
  1. remove an unnecessary whitespace change in kfd_chardev.c
  2. remove commented-out code in amdgpu_device.c
  3. add a more detailed comment in the commit message
  4. define a wrapper function amdgpu_in_reset
  v5:
  1. Fix some style issues.
  Reviewed-by: Hawking Zhang <[email protected]>
  Suggested-by: Andrey Grodzovsky <[email protected]>
  Suggested-by: Christian König <[email protected]>
  Suggested-by: Felix Kuehling <[email protected]>
  Suggested-by: Lijo Lazar <[email protected]>
  Suggested-by: Luben Tukov <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: use static mmio offset for NV mailbox (Monk Liu, 2020-04-01, 1 file, -30/+22)
  what: with the new "req_init_data" handshake we need to use the mailbox
  before doing IP discovery, so in mxgpu_nv.c the original SOC15_REG method
  won't work because it depends on IP discovery completing first.
  how: the solution is to always use static MMIO offsets for the NV+ mailbox
  registers. The HW team confirmed that all MAILBOX registers will be at the
  same offset for all ASICs, so no IP discovery is needed for those registers.
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
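  The difference between the two access schemes can be sketched roughly as
  follows; the macro name and offset value are placeholders, not the real NV
  register definitions:

      /* Sketch only: why a fixed MMIO offset works before IP discovery.
       * SOC15_REG-style lookups need the per-ASIC register base tables that
       * IP discovery fills in; a hard-coded dword offset does not. The
       * offset below is a placeholder value. */
      #define NV_MAILBOX_MSGBUF_RCV_DW0_OFFSET  0x0040  /* placeholder */

      static u32 vf_mailbox_read_dw0(struct amdgpu_device *adev)
      {
              /* Read the mailbox receive register through a constant offset
               * that is guaranteed to be identical across ASICs, instead of
               * resolving it via the IP-discovery-dependent register tables. */
              return RREG32(NV_MAILBOX_MSGBUF_RCV_DW0_OFFSET);
      }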
* drm/amdgpu: introduce new request and its function (Monk Liu, 2020-04-01, 1 file, -8/+40)
  1) modify xgpu_nv_send_access_requests to support the new idh request
  2) introduce a new function, req_gpu_init_data(), which is used to notify
     the host to prepare vbios/ip-discovery/pfvf exchange
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: cleanup idh event/req for NV headers (Monk Liu, 2020-04-01, 1 file, -1/+0)
  1) drop the headers from AI in mxgpu_nv.c; it should refer to mxgpu_nv.h
  2) IDH_EVENT_MAX is not used and not aligned with the host side, so drop it
  3) IDH_TEXT_MESSAG is provided by the host but not defined in the guest
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: use true, false for bool variable in mxgpu_nv.c (zhengbin, 2019-12-23, 1 file, -2/+2)
  Fixes coccicheck warning:
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:255:2-20: WARNING: Assignment of 0/1 to bool variable
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:267:2-20: WARNING: Assignment of 0/1 to bool variable
  Reported-by: Hulk Robot <[email protected]>
  Signed-off-by: zhengbin <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix double gpu_recovery for NV of SRIOV (Monk Liu, 2019-12-18, 1 file, -1/+5)
  issue: gpu_recover() is re-entered by the mailbox interrupt handler in
  mxgpu_nv.c.
  fix: we need to bypass the gpu_recover() invocation in the mailbox interrupt
  as long as the timeout is not infinite (the TDR will be triggered
  automatically after the timeout, so there is no need to invoke gpu_recover()
  through the mailbox interrupt).
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add SRIOV mailbox backend for Navi1x (Jiange Zhao, 2019-09-16, 1 file, -0/+380)
  Mimic the Vega10 implementation and add a mailbox backend for Navi1x.
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Jiange Zhao <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>