path: root/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
Commit log (newest first); each entry lists the subject, author, date, files changed and lines -/+.
* drm/amdgpu: Implement unrecoverable error message handling for VFs (Ellen Pan, 2025-05-07, 1 file, -3/+31)
  This notification may arrive in the VF mailbox while polling for the response
  to another event. This patch covers the following scenarios:
  - If the VF is already in the RMA state, do not attempt to contact the host;
    the host will ignore the VF after sending the notification.
  - If the notification is detected during polling, set the RMA status and
    return an error to the caller.
  - If the notification arrives by interrupt, set the RMA status and queue a
    reset. This reset will fail and the VF will stop runtime services.
  Reviewed-by: Shravan Kumar Gande <[email protected]>
  Signed-off-by: Victor Skvortsov <[email protected]>
  Signed-off-by: Ellen Pan <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Implement Runtime Bad Page query for VFs (Ellen Pan, 2025-05-07, 1 file, -0/+28)
  The host will send a notification when new bad pages are available. Upon
  guest request, the first 256 bad page addresses will be placed into the
  PF2VF region. The guest should pause the PF2VF worker thread while the copy
  is in progress.
  Reviewed-by: Shravan Kumar Gande <[email protected]>
  Signed-off-by: Victor Skvortsov <[email protected]>
  Signed-off-by: Ellen Pan <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add support for CPERs on virtualization (Tony Yi, 2025-03-05, 1 file, -0/+14)
  Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as
  such, they need to query them from the host. To avoid hitting the host event
  guard, CPER queries need to be rate limited. CPER queries share the same RAS
  telemetry buffer as the error count query, so a mutex protecting the shared
  buffer was added as well.
  For readability, amdgpu_detect_virtualization was refactored into multiple
  individual functions.
  Signed-off-by: Tony Yi <[email protected]>
  Reviewed-by: Tao Zhou <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
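  A rough illustration of the scheme above, assuming hypothetical helper and
  constant names (the actual symbols live in mxgpu_nv.c and the RAS code):

      /* Sketch only: a time-based rate limit plus a mutex around a telemetry
       * buffer shared by the CPER query and the error-count query. The
       * interval value and host_copy_cper_records() are assumptions. */
      #define CPER_QUERY_MIN_INTERVAL_MS 100  /* assumed, not from the commit */

      static DEFINE_MUTEX(ras_telemetry_lock); /* protects the shared buffer */
      static unsigned long last_cper_query;    /* jiffies of the last query */

      static int vf_query_cpers(struct amdgpu_device *adev)
      {
              int ret;

              /* Rate-limit requests so the VF does not trip the host event guard. */
              if (time_before(jiffies, last_cper_query +
                              msecs_to_jiffies(CPER_QUERY_MIN_INTERVAL_MS)))
                      return -EBUSY;

              mutex_lock(&ras_telemetry_lock);
              ret = host_copy_cper_records(adev);  /* hypothetical mailbox request */
              last_cper_query = jiffies;
              mutex_unlock(&ras_telemetry_lock);
              return ret;
      }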
* drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry (Victor Skvortsov, 2024-11-11, 1 file, -2/+14)
  Add message handlers for RAS telemetry.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: process RAS fatal error MB notification (Vignesh Chander, 2024-06-27, 1 file, -0/+8)
  For the RAS error scenario, the VF guest driver will check the mailbox and
  set the fed flag to avoid unnecessary HW accesses. Additionally, poll for
  the reset completion message first to avoid accidentally spamming multiple
  reset requests to the host.
  v2: add another mailbox check to handle the case where kfd detects the
      timeout first
  v3: set the host_flr bit and use wait_for_reset
  Signed-off-by: Vignesh Chander <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter (Vignesh Chander, 2024-06-27, 1 file, -8/+15)
  This gives us clearer per-device logging.
  Signed-off-by: Vignesh Chander <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix sriov host flr handler (Yunxiang Li, 2024-06-14, 1 file, -23/+16)
  We send back the ready-to-reset message before we stop anything, which is
  wrong. Move it to the point where we are actually ready for the FLR to
  happen. In the current state, since we take tens of seconds to stop
  everything, it is very likely that the host would give up waiting and reset
  the GPU before we send ready, so behaviour would be the same as before. But
  this gets rid of the hack with reset_domain locking and also lets us tell,
  from the host side, how slow ready-to-reset actually is. The ready-to-reset
  speed can be improved later.
  Signed-off-by: Yunxiang Li <[email protected]>
  Acked-by: Christian König <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add reset_context flag for host FLR (Yunxiang Li, 2024-05-02, 1 file, -0/+1)
  There are other reset sources that pass NULL as the job pointer, such as
  amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check whether
  the FLR comes from the host does not work. Add a flag in reset_context to
  explicitly mark a host-triggered reset, and set this flag when we receive
  the host reset notification.
  Signed-off-by: Yunxiang Li <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix two resets triggered in a row (Yunxiang Li, 2024-05-02, 1 file, -1/+1)
  Sometimes a hung GPU causes multiple reset sources to schedule resets. The
  second source can trigger an unnecessary reset if it is scheduled after we
  call amdgpu_device_stop_pending_resets. Move
  amdgpu_device_stop_pending_resets to after the reset is done. Since at that
  point the GPU is supposedly in a good state, any reset scheduled after it
  would be a legitimate reset. Remove the unnecessary and incorrect checks for
  amdgpu_in_reset that were loosely serving this purpose.
  Signed-off-by: Yunxiang Li <[email protected]>
  Reviewed-by: Lijo Lazar <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: update vf to pf message retry from 2 to 5 (Zhigang Luo, 2024-05-02, 1 file, -1/+1)
  Increase the retry count so the host has enough time to complete the reset.
  Signed-off-by: Zhigang Luo <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
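  In practice a change like this only touches a retry constant; a minimal,
  hypothetical sketch of the surrounding retransmission loop (constant and
  helper names assumed, each attempt carrying its own polling timeout):

      /* Hypothetical sketch of a bounded VF->PF retransmission loop; raising
       * the retry count gives a host that is still finishing a reset more
       * time to answer. */
      #define SEND_MSG_RETRY  5       /* was 2 */

      static int vf_send_request(struct amdgpu_device *adev, enum idh_request req)
      {
              int i, r;

              for (i = 0; i < SEND_MSG_RETRY; i++) {
                      r = vf_trans_msg_and_poll_ack(adev, req); /* assumed helper */
                      if (!r)
                              return 0;       /* host acknowledged the request */
              }
              dev_err(adev->dev, "no ack from host after %d tries for req %d\n",
                      SEND_MSG_RETRY, req);
              return -ETIMEDOUT;
      }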
* drm/amdgpu: remove virt_init_data_exchange from poison consumption handler (Zhigang Luo, 2024-04-19, 1 file, -2/+0)
  The host will initiate an FLR for all poison consumption. The guest should
  wait for the FLR message to re-init data exchange.
  Signed-off-by: Zhigang Luo <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: trigger flr_work if reading pf2vf data failed (Zhigang Luo, 2024-03-20, 1 file, -0/+2)
  If reading pf2vf data fails 30 times in a row, something is wrong and
  flr_work needs to be triggered to recover. Also use dev_err to print the
  error message so we can tell which device has the issue, and add a warning
  message if waiting for IDH_FLR_NOTIFICATION_CMPL times out.
  Signed-off-by: Zhigang Luo <[email protected]>
  Acked-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
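  A minimal sketch of the failure-counter idea described above; the threshold
  macro, read helper and counter are assumptions, while adev->virt.flr_work is
  an existing driver member:

      /* Count consecutive pf2vf read failures and kick the FLR work item
       * once the threshold is reached (names largely assumed). */
      #define PF2VF_READ_FAIL_MAX  30

      static int pf2vf_fail_cnt;  /* assumed counter; per-device in reality */

      static void vf_check_pf2vf_health(struct amdgpu_device *adev)
      {
              if (vf_read_pf2vf_data(adev) == 0) {    /* hypothetical helper */
                      pf2vf_fail_cnt = 0;
                      return;
              }

              if (++pf2vf_fail_cnt >= PF2VF_READ_FAIL_MAX) {
                      dev_err(adev->dev,
                              "pf2vf read failed %d times, triggering flr_work\n",
                              PF2VF_READ_FAIL_MAX);
                      schedule_work(&adev->virt.flr_work);
                      pf2vf_fail_cnt = 0;
              }
      }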
* drm/amdgpu: Skip virt_exchange_init on SDMA poison consumption (Victor Skvortsov, 2024-03-20, 1 file, -1/+2)
  The host will initiate an FLR in the SDMA poison consumption scenario. The
  guest should wait for the FLR message to re-init data exchange.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Zhigang Luo <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add RAS_POISON_READY host response message (Victor Skvortsov, 2024-01-25, 1 file, -0/+6)
  In a non-FLR page avoidance scenario, the host driver will provide the bad
  pages in the pf2vf exchange region. Add a new host response message to
  indicate when the pf2vf exchange region has been updated.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Support passing poison consumption ras block to SRIOV (YiPeng Chai, 2024-01-25, 1 file, -5/+18)
  Support passing poison consumption ras blocks to SRIOV.
  Signed-off-by: YiPeng Chai <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Clean up errors in mxgpu_nv.c (Ran Sun, 2023-08-09, 1 file, -4/+2)
  Fix the following errors reported by checkpatch:
  ERROR: else should follow close brace '}'
  ERROR: that open brace { should be on the previous line
  Signed-off-by: Ran Sun <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: add RAS poison consumption handler for NV SRIOV (Tao Zhou, 2022-12-15, 1 file, -0/+6)
  Send the handling request to the host.
  Signed-off-by: Tao Zhou <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* Revert "drm/amdgpu: let mode2 reset fallback to default when failure"Victor Zhao2022-10-191-1/+0
| | | | | | | | | | | | This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: let mode2 reset fallback to default when failure (Victor Zhao, 2022-08-16, 1 file, -0/+1)
  - introduce the AMDGPU_SKIP_MODE2_RESET flag
  - let mode2 reset fall back to the default reset method if it fails
  v2: move this part out from the asic-specific part
  Signed-off-by: Victor Zhao <[email protected]>
  Acked-by: Andrey Grodzovsky <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: support reset flag set for gpu reset (Likun Gao, 2022-07-13, 1 file, -2/+10)
  Move reset_context out of the gpu recover function to make it configurable
  for different reset purposes. For resets requested via the gpu_recovery
  sysfs interface, force the full reset method. Otherwise, try soft reset by
  default if the ASIC supports it; if soft reset fails, fall back to full
  reset.
  Signed-off-by: Likun Gao <[email protected]>
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover (Andrey Grodzovsky, 2022-06-10, 1 file, -1/+1)
  We removed the wrapper that was queueing the recover function into the reset
  domain queue, which was using this name.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Move in_gpu_reset into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+2)
  We should have a single instance per entire reset domain.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Suggested-by: Lijo Lazar <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
* drm/amdgpu: Move reset sem into reset_domain (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+2)
  We want a single instance of the reset semaphore across all reset clients
  because, in the XGMI case, we should stop cross-device MMIO access since any
  of the devices could be in reset at that moment.
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
* drm/amdgpu: Rework reset domain to be refcounted. (Andrey Grodzovsky, 2022-02-09, 1 file, -2/+4)
  The reset domain now contains the register access semaphore and so needs to
  be present as long as each device in a hive needs it; it therefore cannot be
  bound to the XGMI hive life cycle. Address this by making the reset domain
  refcounted and pointed to by each member of the hive and by the hive itself.
  v4: Fix crash on boot with an XGMI hive by adding a type to reset_domain.
      XGMI will only create a new reset_domain if the previous one was of
      single-device type, meaning it is the first boot. Otherwise it will take
      a refcount on the existing reset_domain from the amdgpu device.
      Add a wrapper around reset_domain->refcount get/put and a wrapper around
      sends to the reset wq (Lijo)
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Acked-by: Christian König <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
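  The refcounting pattern described above maps naturally onto the kernel's
  kref; the sketch below is a simplified illustration with an assumed layout,
  not the driver's actual amdgpu_reset_domain definition:

      /* Simplified sketch of a refcounted reset domain shared between the
       * devices of a hive and the hive object itself (layout assumed). */
      enum reset_domain_type { RESET_DOMAIN_SINGLE_DEVICE, RESET_DOMAIN_XGMI_HIVE };

      struct reset_domain {
              struct kref refcount;
              enum reset_domain_type type;
              struct workqueue_struct *wq;    /* reset work is queued here */
      };

      static void reset_domain_release(struct kref *ref)
      {
              struct reset_domain *d = container_of(ref, struct reset_domain, refcount);

              destroy_workqueue(d->wq);
              kfree(d);
      }

      static struct reset_domain *reset_domain_create(enum reset_domain_type type)
      {
              struct reset_domain *d = kzalloc(sizeof(*d), GFP_KERNEL);

              if (!d)
                      return NULL;
              kref_init(&d->refcount);        /* creator holds one reference */
              d->type = type;
              d->wq = create_singlethread_workqueue("reset-domain");
              return d;
      }

      /* Each hive member takes a reference instead of creating its own domain,
       * so the domain outlives any single device and the hive alone. */
      static void reset_domain_get(struct reset_domain *d) { kref_get(&d->refcount); }
      static void reset_domain_put(struct reset_domain *d) { kref_put(&d->refcount, reset_domain_release); }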
* drm/amd/virt: For SRIOV send GPU reset directly to TDR queue. (Andrey Grodzovsky, 2022-02-09, 1 file, -3/+6)
  No need to trigger another work queue inside the work queue.
  v3:
  Problem: Extra reset caused by the host-side FLR notification following a
  guest-side triggered reset.
  Fix: Prevent queuing flr_work from the mailbox irq if the guest is already
  executing a reset.
  Suggested-by: Liu Shaoyun <[email protected]>
  Signed-off-by: Andrey Grodzovsky <[email protected]>
  Reviewed-by: Liu Shaoyun <[email protected]>
  Link: https://www.spinics.net/lists/amd-gfx/msg74114.html
* drm/amdgpu: SRIOV flr_work should use down_write (Victor Skvortsov, 2021-12-14, 1 file, -2/+3)
  A host-initiated VF FLR may fail if someone else is already holding a
  read_lock. Change from down_write_trylock to down_write to guarantee the
  reset goes through.
  Signed-off-by: Victor Skvortsov <[email protected]>
  Reviewed by: Shaoyun.liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF gets FLR notification. (Jiange Zhao, 2021-08-16, 1 file, -0/+2)
  When the guest receives the FLR notification from the host, it locks the
  adapter into the reset state. There will be no more job submission or
  hardware access after that. It should then send a response to the host that
  it is prepared for the host reset.
  Signed-off-by: Jiange Zhao <[email protected]>
  Signed-off-by: Peng Ju Zhou <[email protected]>
  Reviewed-by: Emily.Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Extend full access wait time in guest (Victor Zhao, 2021-08-09, 1 file, -4/+12)
  - Extend the wait time and add a retry; currently 6s * 2 attempts
  - Change the timing algorithm
  Signed-off-by: Victor Zhao <[email protected]>
  Signed-off-by: Peng Ju Zhou <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: SRIOV flr_work should take write_lock (Jingwen Chen, 2021-07-13, 1 file, -2/+2)
  [Why]
  If flr_work takes the read_lock, then other threads that take the read_lock
  can access hardware while the host is doing the VF FLR.
  [How]
  flr_work should take the write_lock to avoid this case.
  Signed-off-by: Jingwen Chen <[email protected]>
  Reviewed-by: Monk Liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/sriov: Stop data exchange for whole gpu reset (Jack Zhang, 2021-01-14, 1 file, -0/+1)
  [Why]
  When the host triggers a whole gpu reset, the guest keeps waiting until the
  host finishes the reset. But there is a work queue in the guest exchanging
  data between vf and pf which needs to access the frame buffer. During the
  whole gpu reset the frame buffer is not accessible, and this causes the call
  trace.
  [How]
  After the vf gets the reset notification from the pf, stop the data exchange.
  Signed-off-by: Jingwen Chen <[email protected]>
  Signed-off-by: Jack Zhang <[email protected]>
  Reviewed-by: Monk Liu <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu/SRIOV: Extend VF reset request wait period (Jiange Zhao, 2020-12-15, 1 file, -1/+10)
  In the virtualization case, when one VF sends too many FLR requests, the
  hypervisor stops responding to that VF's requests for a long period of time.
  This is called the event guard. During this cooling period the guest driver
  should wait instead of doing other things; after the period ends it can
  resume the reset process and return to normal.
  Currently, the guest driver waits 12 seconds and returns failure if it gets
  no response from the host.
  Solution: extend this waiting time in the guest driver and poll for the
  response periodically. Polling happens every 6 seconds and lasts for up to
  60 seconds.
  v2: change the max repetition count from a number to a macro.
  Signed-off-by: Jiange Zhao <[email protected]>
  Acked-by: Hawking Zhang <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
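  The polling scheme described above would look roughly like the hedged sketch
  below; the interval and repetition macros and the poll helper are
  illustrative, not the driver's actual names:

      /* Rough sketch: instead of a single 12 s wait, poll the mailbox every
       * 6 s, up to 10 repetitions (60 s total), to ride out the host's
       * event-guard cooling period. */
      #define VF_RESET_POLL_INTERVAL_MS  6000
      #define VF_RESET_POLL_REP_MAX      10   /* v2: repetition count as a macro */

      static int vf_wait_reset_reply(struct amdgpu_device *adev)
      {
              int rep;

              for (rep = 0; rep < VF_RESET_POLL_REP_MAX; rep++) {
                      if (vf_peek_msg(adev, IDH_FLR_NOTIFICATION_CMPL)) /* assumed helper */
                              return 0;
                      msleep(VF_RESET_POLL_INTERVAL_MS);
              }

              dev_warn(adev->dev, "timed out waiting for host reset completion\n");
              return -ETIMEDOUT;
      }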
* drm/amdgpu: Do gpu recovery when no job is running (Liu ChengZhe, 2020-09-15, 1 file, -1/+1)
  In function flr_work, we should do gpu recovery when no job is running. Fix
  the logic by inverting it.
  v2: modify the description
  Reviewed-by: Christian König <[email protected]>
  Signed-off-by: Liu ChengZhe <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: change reset lock from mutex to rw_semaphore (Dennis Li, 2020-08-24, 1 file, -12/+6)
  Clients don't need the reset lock for synchronization when no GPU recovery
  is in progress.
  v2: change to return the return value of down_read_killable.
  v3: if GPU recovery has begun, the VF ignores the FLR notification.
  Reviewed-by: Monk Liu <[email protected]>
  Acked-by: Christian König <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
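  The intent of the mutex-to-rwsem change is to let normal hardware access
  paths share the lock while recovery holds it exclusively; a minimal sketch
  under that assumption (not the driver's exact code):

      /* Minimal sketch of the reset lock as an rw_semaphore: ordinary
       * hardware access paths take it shared, GPU recovery takes it
       * exclusive. */
      static DECLARE_RWSEM(reset_sem);

      static int hw_access_path(struct amdgpu_device *adev)
      {
              /* v2 of the commit: propagate down_read_killable()'s return value */
              if (down_read_killable(&reset_sem))
                      return -EINTR;

              /* ... touch registers safely; no reset can start meanwhile ... */

              up_read(&reset_sem);
              return 0;
      }

      static void gpu_recovery_path(struct amdgpu_device *adev)
      {
              down_write(&reset_sem);  /* drain all readers, then exclude them */
              /* ... perform the reset ... */
              up_write(&reset_sem);
      }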
* drm/amdgpu: refine codes to avoid reentering GPU recovery (Dennis Li, 2020-08-24, 1 file, -2/+2)
  If other threads hold the reset lock, recovery will fail its try_lock.
  Therefore we introduce the atomics hive->in_reset and adev->in_gpu_reset to
  avoid reentering GPU recovery.
  v2: drop "? true : false" in the definition of amdgpu_in_reset
  Reviewed-by: Hawking Zhang <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Fix repeated flr issue (jqdeng, 2020-08-18, 1 file, -1/+2)
  Only the no-job-running test case needs to do recovery in the FLR
  notification. If there are jobs in the mirror list, let the guest driver hit
  the job timeout and then do the recovery.
  Signed-off-by: jqdeng <[email protected]>
  Acked-by: Nirmoy Das <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König2020-08-141-3/+10
| | | | | | | | | | | | | | The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix system hang issue during GPU reset (Dennis Li, 2020-07-27, 1 file, -10/+3)
  When the GPU hangs, the driver has multiple paths into
  amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and hive->in_reset
  are used to avoid re-entering GPU recovery. During GPU reset and resume it
  is unsafe for other threads to access the GPU, which may cause the GPU reset
  to fail. Therefore the new rw_semaphore adev->reset_sem is introduced, which
  protects the GPU from being accessed by external threads during recovery.
  v2:
  1. add an rwlock for some ioctls, debugfs and the file-close function.
  2. change to use dqm->is_resetting and dqm_lock for protection in the kfd
     driver.
  3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid
     re-entering GPU recovery for the same GPU hang.
  v3:
  1. change back to using adev->reset_sem to protect the kfd callback
     functions, because dqm_lock couldn't protect all code paths, for example
     free_mqd must be called outside of dqm_lock:
  [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
  [ 1230.177221] Call Trace:
  [ 1230.178249]  dump_stack+0x98/0xd5
  [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
  [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
  [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
  [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
  [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
  [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
  [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
  [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
  [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
  [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
  [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
  [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
  [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
  [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
  [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
  [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
  [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
  [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
  [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
  [ 1230.202831]  ksys_ioctl+0x98/0xb0
  [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
  [ 1230.205174]  do_syscall_64+0x5f/0x250
  [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  2. remove try_lock and introduce the atomic hive->in_reset, to avoid
     re-entering GPU recovery.
  v4:
  1. remove an unnecessary whitespace change in kfd_chardev.c
  2. remove commented-out code in amdgpu_device.c
  3. add a more detailed comment in the commit message
  4. define a wrapper function amdgpu_in_reset
  v5:
  1. Fix some style issues.
  Reviewed-by: Hawking Zhang <[email protected]>
  Suggested-by: Andrey Grodzovsky <[email protected]>
  Suggested-by: Christian König <[email protected]>
  Suggested-by: Felix Kuehling <[email protected]>
  Suggested-by: Lijo Lazar <[email protected]>
  Suggested-by: Luben Tukov <[email protected]>
  Signed-off-by: Dennis Li <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: use static mmio offset for NV mailbox (Monk Liu, 2020-04-01, 1 file, -30/+22)
  what: with the new "req_init_data" handshake we need to use the mailbox
  before doing IP discovery, so in mxgpu_nv.c the original SOC15_REG method
  won't work because it depends on IP discovery completing first.
  how: the solution is to always use static MMIO offsets for the NV+ mailbox
  registers. The HW team confirmed that all MAILBOX registers will be at the
  same offset for all ASICs, so no IP discovery is needed for those registers.
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
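  The difference between the two access schemes can be sketched roughly as
  follows; the macro name and offset value are placeholders, not the real NV
  register definitions:

      /* Sketch only: why a fixed MMIO offset works before IP discovery.
       * SOC15_REG-style lookups need the per-ASIC register base tables that
       * IP discovery fills in; a hard-coded dword offset does not. The
       * offset below is a placeholder value. */
      #define NV_MAILBOX_MSGBUF_RCV_DW0_OFFSET  0x0040  /* placeholder */

      static u32 vf_mailbox_read_dw0(struct amdgpu_device *adev)
      {
              /* Read the mailbox receive register through a constant offset
               * that is guaranteed to be identical across ASICs, instead of
               * resolving it via the IP-discovery-dependent register tables. */
              return RREG32(NV_MAILBOX_MSGBUF_RCV_DW0_OFFSET);
      }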
* drm/amdgpu: introduce new request and its function (Monk Liu, 2020-04-01, 1 file, -8/+40)
  1) modify xgpu_nv_send_access_requests to support the new idh request
  2) introduce a new function, req_gpu_init_data(), which is used to notify
     the host to prepare vbios/ip-discovery/pfvf exchange
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: cleanup idh event/req for NV headers (Monk Liu, 2020-04-01, 1 file, -1/+0)
  1) drop the headers from AI in mxgpu_nv.c; it should refer to mxgpu_nv.h
  2) IDH_EVENT_MAX is not used and not aligned with the host side, so drop it
  3) IDH_TEXT_MESSAG is provided by the host but not defined in the guest
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: use true, false for bool variable in mxgpu_nv.c (zhengbin, 2019-12-23, 1 file, -2/+2)
  Fixes coccicheck warning:
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:255:2-20: WARNING: Assignment of 0/1 to bool variable
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:267:2-20: WARNING: Assignment of 0/1 to bool variable
  Reported-by: Hulk Robot <[email protected]>
  Signed-off-by: zhengbin <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: fix double gpu_recovery for NV of SRIOV (Monk Liu, 2019-12-18, 1 file, -1/+5)
  issue: gpu_recover() is re-entered by the mailbox interrupt handler in
  mxgpu_nv.c.
  fix: we need to bypass the gpu_recover() invocation in the mailbox interrupt
  as long as the timeout is not infinite (the TDR will be triggered
  automatically after the timeout, so there is no need to invoke gpu_recover()
  through the mailbox interrupt).
  Signed-off-by: Monk Liu <[email protected]>
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
* drm/amdgpu: Add SRIOV mailbox backend for Navi1x (Jiange Zhao, 2019-09-16, 1 file, -0/+380)
  Mimic the Vega10 implementation and add a mailbox backend for Navi1x.
  Reviewed-by: Emily Deng <[email protected]>
  Signed-off-by: Jiange Zhao <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>