aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw
Commit message (Collapse)AuthorAgeFilesLines
* IB/mlx5: Fix obj_type mismatch for SRQ event subscriptionsOr Har-Toov2025-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix a bug where the driver's event subscription logic for SRQ-related events incorrectly sets obj_type for RMP objects. When subscribing to SRQ events, get_legacy_obj_type() did not handle the MLX5_CMD_OP_CREATE_RMP case, which caused obj_type to be 0 (default). This led to a mismatch between the obj_type used during subscription (0) and the value used during notification (1, taken from the event's type field). As a result, event mapping for SRQ objects could fail and event notification would not be delivered correctly. This fix adds handling for MLX5_CMD_OP_CREATE_RMP in get_legacy_obj_type, returning MLX5_EVENT_QUEUE_TYPE_RQ so obj_type is consistent between subscription and notification. Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://patch.msgid.link/r/8f1048e3fdd1fde6b90607ce0ed251afaf8a148c.1755088962.git.leon@kernel.org Signed-off-by: Or Har-Toov <[email protected]> Reviewed-by: Edward Srouji <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/hns: Fix dip entries leak on devices newer than hip09Junxian Huang2025-08-131-1/+1
| | | | | | | | | | DIP algorithm is also supported on devices newer than hip09, so free dip entries too. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/bnxt_re: Fix to initialize the PBL arrayAnantha Prabhu2025-08-131-0/+2
| | | | | | | | | | | | | memset the PBL page pointer and page map arrays before populating the SGL addresses of the HWQ. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Anantha Prabhu <[email protected]> Reviewed-by: Saravanan Vajravel <[email protected]> Reviewed-by: Selvin Xavier <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/bnxt_re: Fix a possible memory leak in the driverKalesh AP2025-08-131-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GID context reuse logic requires the context memory to be not freed if and when DEL_GID firmware command fails. But, if there's no subsequent ADD_GID to reuse it, the context memory must be freed when the driver is unloaded. Otherwise it leads to a memory leak. Below is the kmemleak trace reported: unreferenced object 0xffff88817a4f34d0 (size 8): comm "insmod", pid 1072504, jiffies 4402561550 hex dump (first 8 bytes): 01 00 00 00 00 00 00 00 ........ backtrace (crc ccaa009e): __kmalloc_cache_noprof+0x33e/0x400 0xffffffffc2db9d48 add_modify_gid+0x5e0/0xb60 [ib_core] __ib_cache_gid_add+0x213/0x350 [ib_core] update_gid+0xf2/0x180 [ib_core] enum_netdev_ipv4_ips+0x3f3/0x690 [ib_core] enum_all_gids_of_dev_cb+0x125/0x1b0 [ib_core] ib_enum_roce_netdev+0x14b/0x250 [ib_core] ib_cache_setup_one+0x2e5/0x540 [ib_core] ib_register_device+0x82c/0xf10 [ib_core] 0xffffffffc2df5ad9 0xffffffffc2da8b07 0xffffffffc2db174d auxiliary_bus_probe+0xa5/0x120 really_probe+0x1e4/0x850 __driver_probe_device+0x18f/0x3d0 Fixes: 4a62c5e9e2e1 ("RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails") Signed-off-by: Kalesh AP <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Sriharsha Basavapatna <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/bnxt_re: Fix to remove workload check in SRQ limit pathKashyap Desai2025-08-133-35/+2
| | | | | | | | | | | | | | | There should not be any checks of current workload to set srq_limit value to SRQ hw context. Remove all such workload checks and make a direct call to set srq_limit via doorbell SRQ_ARM. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kashyap Desai <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/bnxt_re: Fix to do SRQ armena by defaultKashyap Desai2025-08-131-2/+1
| | | | | | | | | | | | | | | | Whenever SRQ is created, make sure SRQ arm enable is always set. Driver is always ready to receive SRQ ASYNC event. Additional note - There is no need to do srq arm enable conditionally. See bnxt_qplib_armen_db in bnxt_qplib_create_cq(). Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kashyap Desai <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/hns: Fix querying wrong SCC context for DIP algorithmwenglianfa2025-08-132-3/+10
| | | | | | | | | | | | | | When using DIP algorithm, all QPs establishing connections with the same destination IP share the same SCC, which is indexed by dip_idx, but dip_idx isn't necessarily equal to qpn. Therefore, dip_idx should be used to query SCC context instead of qpn. Fixes: 124a9fbe43aa ("RDMA/hns: Append SCC context to the raw dump of QPC") Signed-off-by: wenglianfa <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Zhu Yanjun <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/erdma: Fix unset QPN of GSI QPBoshi Yu2025-08-131-0/+2
| | | | | | | | | | | | The QPN of the GSI QP was not set, which may cause issues. Set the QPN to 1 when creating the GSI QP. Fixes: 999a0a2e9b87 ("RDMA/erdma: Support UD QPs and UD WRs") Reviewed-by: Cheng Xu <[email protected]> Signed-off-by: Boshi Yu <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Zhu Yanjun <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
* RDMA/erdma: Fix ignored return value of init_kernel_qpBoshi Yu2025-08-131-1/+3
| | | | | | | | | | | The init_kernel_qp interface may fail. Check its return value and free related resources properly when it does. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Reviewed-by: Cheng Xu <[email protected]> Signed-off-by: Boshi Yu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2025-07-31104-48942/+1259
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...
| * RDMA/mana_ib: add support of multiple portsKonstantin Taranov2025-07-233-55/+69
| | | | | | | | | | | | | | | | | | | | | | | | If the HW indicates support of multiple ports for rdma, create an IB device with a port per netdev in the ethernet mana driver. CM is only available on port 1, but RC QPs are supported on all ports. Signed-off-by: Konstantin Taranov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Refactor optional counters steering codePatrisious Haddad2025-07-234-62/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there isn't a clear layer separation between the counters and the steering code, whereas the steering code is doing redundant access to the counter struct. Separate the fs.c and counters.c, where fs code won't access or be aware of counter structs but only the objects it needs. As a result, move mlx5_rdma_counter struct from the header file to be an internal struct for the counters file only. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Edward Srouji <[email protected]> Link: https://patch.msgid.link/9854d1fdb140e4a6552b7a2fd1a028cfe488a935.1753004208.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mrYishai Hadas2025-07-234-20/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of this enhancement, allow the creation of an MKEY associated with a DMA handle. Additional notes: MKEYs with TPH (i.e. TLP Processing Hints) attributes are currently not UMR-capable unless explicitly enabled by firmware or hardware. Therefore, to maintain such MKEYs in the MR cache, the TPH fields have been added to the rb_key structure, with a dedicated hash bucket. The ability to bypass the kernel verbs flow and create an MKEY with TPH attributes using DEVX has been restricted. TPH must follow the standard InfiniBand flow, where a DMAH is created with the appropriate security checks and management mechanisms in place. DMA handles are currently not supported in conjunction with On-Demand Paging (ODP). Re-registration of memory regions originally created with TPH attributes is currently not supported. Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Edward Srouji <[email protected]> Link: https://patch.msgid.link/1c485651cf8417694ddebb80446c5093d5a791a9.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * IB: Extend UVERBS_METHOD_REG_MR to get DMAHYishai Hadas2025-07-2326-11/+105
| | | | | | | | | | | | | | | | | | | | | | | | Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers. It will be used in mlx5 driver as part of the next patch from the series. Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Edward Srouji <[email protected]> Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Add DMAH object supportYishai Hadas2025-07-234-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces support for allocating and deallocating the DMAH object. Further details: ---------------- The DMAH API is exposed to upper layers only if the underlying device supports TPH. It uses the mlx5_core steering tag (ST) APIs to get a steering tag index based on the provided input. The obtained index is stored in the device-specific mlx5_dmah structure for future use. Upcoming patches in the series will integrate the allocated DMAH into the memory region (MR) registration process. Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Edward Srouji <[email protected]> Link: https://patch.msgid.link/778550776799d82edb4d05da249a1cff00160b50.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Fix incorrect MKEY maskingLeon Romanovsky2025-07-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | mkey_mask is __be64 type, while MLX5_MKEY_MASK_PAGE_SIZE is declared as unsigned long long. This causes to the static checkers errors: drivers/infiniband/hw/mlx5/umr.c:663:49: warning: invalid assignment: &= drivers/infiniband/hw/mlx5/umr.c:663:49: left side has type restricted __be64 drivers/infiniband/hw/mlx5/umr.c:663:49: right side has type int Fixes: e73242aa14d2 ("RDMA/mlx5: Optimize DMABUF mkey page size") Link: https://patch.msgid.link/e354d70b98dfa5ecf4c236a36cd36b64add9d9de.1753003467.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey()Leon Romanovsky2025-07-211-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Colin reported: "The variable zapped_blocks is a size_t type and is being assigned a int return value from the call to _mlx5r_umr_zap_mkey. Since zapped_blocks is an unsigned type, the error check for zapped_blocks < 0 will never be true." So separate return error and nblocks assignment. Fixes: e73242aa14d2 ("RDMA/mlx5: Optimize DMABUF mkey page size") Reported-by: Colin King (gmail) <[email protected]> Closes: https://lore.kernel.org/all/[email protected] Link: https://patch.msgid.link/71d8ea208ac7eaa4438af683b9afaed78625e419.1753003467.git.leon@kernel.org Reviewed-by: Zhu Yanjun <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: remove redundant check on err on return expressionColin Ian King2025-07-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Currently all paths that set err and then check it for an error perform immediate returns, hence err always zero at the end of the function _mlx5r_umr_zap_mkey. The return expression err ? err : nblocks has a redundant check on the err since err is always zero, so just return nblocks instead. Signed-off-by: Colin Ian King <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mana_ib: add additional port countersZhiyue Qiu2025-07-133-0/+34
| | | | | | | | | | | | | | | | | | | | Add packet and request port counters to mana_ib. Signed-off-by: Zhiyue Qiu <[email protected]> Signed-off-by: Konstantin Taranov <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Long Li <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mana_ib: Fix DSCP value in modify QPShiraz Saleem2025-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | Convert the traffic_class in GRH to a DSCP value as required by the HW. Fixes: e095405b45bb ("RDMA/mana_ib: Modify QP state") Signed-off-by: Shiraz Saleem <[email protected]> Signed-off-by: Konstantin Taranov <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Long Li <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/efa: Add CQ with external memory supportMichael Margolin2025-07-133-14/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an option to create CQ using external memory instead of allocating in the driver. The memory can be passed from userspace by dmabuf fd and an offset or a VA. One of the possible usages is creating CQs that reside in accelerator memory, allowing low latency asynchronous direct polling from the accelerator device. Add a capability bit to reflect on the feature support. Reviewed-by: Daniel Kranzdorf <[email protected]> Reviewed-by: Yonatan Nachum <[email protected]> Signed-off-by: Michael Margolin <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Optimize DMABUF mkey page sizeEdward Srouji2025-07-134-43/+327
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of DMABUF memory registration uses a fixed page size for the memory key (mkey), which can lead to suboptimal performance when the underlying memory layout may offer better page size. The optimization improves performance by reducing the number of page table entries required for the mkey, leading to less MTT/KSM descriptors that the HCA must go through to find translations, fewer cache-lines, and shorter UMR work requests on mkey updates such as when re-registering or reusing a cacheable mkey. To ensure safe page size updates, the implementation uses a 5-step process: 1. Make the first X entries non-present, while X is calculated to be minimal according to a large page shift that can be used to cover the MR length. 2. Update the page size to the large supported page size 3. Load the remaining N-X entries according to the (optimized) page shift 4. Update the page size according to the (optimized) page shift 5. Load the first X entries with the correct translations This ensures that at no point is the MR accessible with a partially updated translation table, maintaining correctness and preventing access to stale or inconsistent mappings, such as having an mkey advertising the new page size while some of the underlying page table entries still contain the old page size translations. Signed-off-by: Edward Srouji <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/bc05a6b2142c02f96a90635f9a4458ee4bbbf39f.1751979184.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * RDMA/mlx5: Align mkc page size capability check to PRMMichael Guralnik2025-07-132-9/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Align the capabilities checked when using the log_page_size 6th bit in the mkey context to the PRM definition. The upper and lower bounds are set by max/min caps, and modification of the 6th bit by UMR is allowed only when a specific UMR cap is set. Current implementation falsely assumes all page sizes up-to 2^63 are supported when the UMR cap is set. In case the upper bound cap is lower than 63, this might result a FW syndrome on mkey creation, e.g: mlx5_core 0000:c1:00.0: mlx5_cmd_out_err:832:(pid 0): CREATE_MKEY(0×200) op_mod(0×0) failed, status bad parameter(0×3), syndrome (0×38a711), err(-22) Previous cap enforcement is still correct for all current HW, FW and driver combinations. However, this patch aligns the code to be PRM compliant in the general case. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/eab4eeb4785105a4bb5eb362dc0b3662cd840412.1751979184.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * Optimize DMABUF mkey page size in mlx5Leon Romanovsky2025-07-131-2/+4
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From Edward: This patch series enables the mlx5 driver to dynamically choose the optimal page size for a DMABUF-based memory key (mkey), rather than always registering with a fixed page size. Previously, DMABUF memory registration used a fixed 4K page size for mkeys which could lead to suboptimal performance when the underlying memory layout may offer better page sizes. This approach did not take advantage of larger page size capabilities advertised by the HCA, and the driver was not setting the proper page size mask in the mkey mask when performing page size changes, potentially leading to invalid registrations when updating to a very large pages. This series improves DMABUF performance by dynamically selecting the best page size for a given memory region (MR) both at creation time and on page fault occurrences, based on the underlying layout and fixing related gaps and bugs. By doing so, we reduce the number of page table entries (and thus MTT/ KSM descriptors) that the HCA must traverse, which in turn reduces cache-line fetches. Thanks * mlx5-next: RDMA/mlx5: Fix UMR modifying of mkey page size net/mlx5: Expose HCA capability bits for mkey max page size Signed-off-by: Leon Romanovsky <[email protected]>
| * | RDMA/efa: Add Network HW statistics countersBasel Nassar2025-07-094-20/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update device API and request network counters. Expose newly added counters through ib core counters mechanism. Reviewed-by: David Shoolman <[email protected]> Reviewed-by: Yonatan Nachum <[email protected]> Signed-off-by: Basel Nassar <[email protected]> Signed-off-by: Michael Margolin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | Merge branch 'mlx5-next' into wip/leon-for-nextLeon Romanovsky2025-07-071-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * mlx5-next: net/mlx5: Check device memory pointer before usage net/mlx5: fs, fix RDMA TRANSPORT init cleanup flow net/mlx5: Add IFC bits for PCIe Congestion Event object net/mlx5: Small refactor for general object capabilities
| * | | RDMA/bnxt_re: Use macro instead of hard coded valueKalesh AP2025-07-072-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Defined a macro for the hard coded value. 2. "access" field in the request structure is of type "u8". Updated the mask accordingly. Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Shravya KN <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Hongguang Gao <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/bnxt_re: Support 2G message sizeSelvin Xavier2025-07-072-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bnxt_qplib_put_sges is calculating the length in a signed int. So handling the 2G message size is not working since it is considered as negative. Use a unsigned number to calculate the total message length. As per the spec, IB message size shall be between zero and 2^31 bytes (inclusive). So adding a check for 2G message size. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Shravya KN <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Saravanan Vajravel <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/bnxt_re: Fix size of uverbs_copy_to() in BNXT_RE_METHOD_GET_TOGGLE_MEMKalesh AP2025-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since both "length" and "offset" are of type u32, there is no functional issue here. Reviewed-by: Saravanan Vajravel <[email protected]> Signed-off-by: Shravya KN <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Fix -Wframe-larger-than issueJunxian Huang2025-07-071-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix -Wframe-larger-than issue by allocating memory for qpc struct with kzalloc() instead of using stack memory. Fixes: 606bf89e98ef ("RDMA/hns: Refactor for hns_roce_v2_modify_qp function") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Drop GFP_NOWARNJunxian Huang2025-07-071-13/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFP_NOWARN silences all warnings on dma_alloc_coherent() failure, which might otherwise help with troubleshooting. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Fix accessing uninitialized resourcesJunxian Huang2025-07-071-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hr_dev->pgdir_list and hr_dev->pgdir_mutex won't be initialized if CQ/QP record db are not enabled, but they are also needed when using SRQ with SRQ record db enabled. Simplified the logic by always initailizing the reosurces. Fixes: c9813b0b9992 ("RDMA/hns: Support SRQ record doorbell") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Get message length of ack_req from FWJunxian Huang2025-07-073-11/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACK_REQ_FREQ indicates the number of packets (after MTU fragmentation) HW sends before setting an ACK request. When MTU is greater than or equal to 1024, the current ACK_REQ_FREQ value causes HW to request an ACK for every MTU fragment. The processing of a large number of ACKs severely impacts HW performance when sending large size payloads. Get message length of ack_req from FW so that we can adjust this parameter according to different situations. There are several constraints for ACK_REQ_FREQ: 1. mtu * (2 ^ ACK_REQ_FREQ) should not be too large, otherwise it may cause some unexpected retries when sending large payload. 2. ACK_REQ_FREQ should be larger than or equal to LP_PKTN_INI. 3. ACK_REQ_FREQ must be equal to LP_PKTN_INI when using LDCP or HC3 congestion control algorithm. Fixes: 56518a603fd2 ("RDMA/hns: Modify the value of long message loopback slice") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Fix HW configurations not cleared in error flowwenglianfa2025-07-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hns_roce_clear_extdb_list_info() will eventually do some HW configurations through FW, and they need to be cleared by calling hns_roce_function_clear() when the initialization fails. Fixes: 7e78dd816e45 ("RDMA/hns: Clear extended doorbell info before using") Signed-off-by: wenglianfa <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/hns: Fix double destruction of rsv_qpwenglianfa2025-07-062-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rsv_qp may be double destroyed in error flow, first in free_mr_init(), and then in hns_roce_exit(). Fix it by moving the free_mr_init() call into hns_roce_v2_init(). list_del corruption, ffff589732eb9b50->next is LIST_POISON1 (dead000000000100) WARNING: CPU: 8 PID: 1047115 at lib/list_debug.c:53 __list_del_entry_valid+0x148/0x240 ... Call trace: __list_del_entry_valid+0x148/0x240 hns_roce_qp_remove+0x4c/0x3f0 [hns_roce_hw_v2] hns_roce_v2_destroy_qp_common+0x1dc/0x5f4 [hns_roce_hw_v2] hns_roce_v2_destroy_qp+0x22c/0x46c [hns_roce_hw_v2] free_mr_exit+0x6c/0x120 [hns_roce_hw_v2] hns_roce_v2_exit+0x170/0x200 [hns_roce_hw_v2] hns_roce_exit+0x118/0x350 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x1c8/0x304 [hns_roce_hw_v2] hns_roce_hw_v2_reset_notify_init+0x170/0x21c [hns_roce_hw_v2] hns_roce_hw_v2_reset_notify+0x6c/0x190 [hns_roce_hw_v2] hclge_notify_roce_client+0x6c/0x160 [hclge] hclge_reset_rebuild+0x150/0x5c0 [hclge] hclge_reset+0x10c/0x140 [hclge] hclge_reset_subtask+0x80/0x104 [hclge] hclge_reset_service_task+0x168/0x3ac [hclge] hclge_service_task+0x50/0x100 [hclge] process_one_work+0x250/0x9a0 worker_thread+0x324/0x990 kthread+0x190/0x210 ret_from_fork+0x10/0x18 Fixes: fd8489294dd2 ("RDMA/hns: Fix Use-After-Free of rsv_qp on HIP08") Signed-off-by: wenglianfa <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | Fix dma_unmap_sg() nents valueThomas Fourier2025-07-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dma_unmap_sg() functions should be called with the same nents as the dma_map_sg(), not the value the map function returned. Fixes: ed10435d3583 ("RDMA/erdma: Implement hierarchical MTT") Signed-off-by: Thomas Fourier <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Check CAP_NET_RAW in user namespace for devx createParav Pandit2025-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the capability check is done in the default init_user_ns user namespace. When a process runs in a non default user namespace, such check fails. Due to this when a process is running using Podman, it fails to create the devx object. Since the RDMA device is a resource within a network namespace, use the network namespace associated with the RDMA device to determine its owning user namespace. Fixes: a8b92ca1b0e5 ("IB/mlx5: Introduce DEVX") Signed-off-by: Parav Pandit <[email protected]> Link: https://patch.msgid.link/36ee87e92defd81410c6a2b33f9d6c0d6dcfd64c.1750963874.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Check CAP_NET_RAW in user namespace for anchor createParav Pandit2025-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the capability check is done in the default init_user_ns user namespace. When a process runs in a non default user namespace, such check fails. Due to this when a process is running using Podman, it fails to create the anchor. Since the RDMA device is a resource within a network namespace, use the network namespace associated with the RDMA device to determine its owning user namespace. Fixes: 0c6ab0ca9a66 ("RDMA/mlx5: Expose steering anchor to userspace") Signed-off-by: Parav Pandit <[email protected]> Link: https://patch.msgid.link/c2376ca75e7658e2cbd1f619cf28fbe98c906419.1750963874.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Check CAP_NET_RAW in user namespace for flow createParav Pandit2025-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the capability check is done in the default init_user_ns user namespace. When a process runs in a non default user namespace, such check fails. Due to this when a process is running using Podman, it fails to create the flow. Since the RDMA device is a resource within a network namespace, use the network namespace associated with the RDMA device to determine its owning user namespace. Fixes: 322694412400 ("IB/mlx5: Introduce driver create and destroy flow methods") Signed-off-by: Parav Pandit <[email protected]> Link: https://patch.msgid.link/a4dcd5e3ac6904ef50b19e56942ca6ab0728794c.1750963874.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Allocate IB device with net namespace supplied from core devMark Bloch2025-06-262-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new ib_alloc_device_with_net() API to allocate the IB device so that it is properly bound to the network namespace obtained via mlx5_core_net(). This change ensures correct namespace association (e.g., for containerized setups). Additionally, expose mlx5_core_net so that RDMA driver can use it. Signed-off-by: Shay Drory <[email protected]> Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMI: hfi1: drop cpumask_empty() call in hfi1/affinity.cYury Norov [NVIDIA]2025-06-261-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In few places, the driver tests a cpumask for emptiness immediately before calling functions that report emptiness themself. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA: hfi1: simplify hfi1_get_proc_affinity()Yury Norov [NVIDIA]2025-06-261-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function protects the for loop with affinity->num_core_siblings > 0 condition, which is redundant because the loop will break immediately in that case. Drop it and save one indentation level. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA: hfi1: use rounddown in find_hw_thread_mask()Yury Norov [NVIDIA]2025-06-261-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | num_cores_per_socket is calculated by dividing by node_affinity.num_online_nodes, but all users of this variable multiply it by node_affinity.num_online_nodes again. This effectively is the same as rounding it down by node_affinity.num_online_nodes. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA: hfi1: simplify init_real_cpu_mask()Yury Norov [NVIDIA]2025-06-261-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function opencodes cpumask_nth() and cpumask_clear_cpus(). The dedicated helpers are easier to use and usually much faster than opencoded for-loops. While there, drop useless clear of real_cpu_mask. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA: hfi1: simplify find_hw_thread_mask()Yury Norov [NVIDIA]2025-06-261-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function opencodes cpumask_nth() and cpumask_clear_cpus(). The dedicated helpers are easier to use and usually much faster than opencoded for-loops. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA: hfi1: fix possible divide-by-zero in find_hw_thread_mask()Yury Norov [NVIDIA]2025-06-261-20/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function divides number of online CPUs by num_core_siblings, and later checks the divider by zero. This implies a possibility to get and divide-by-zero runtime error. Fix it by moving the check prior to division. This also helps to save one indentation level. Signed-off-by: Yury Norov [NVIDIA] <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Add multiple priorities support to RDMA TRANSPORT userspace tablesPatrisious Haddad2025-06-253-19/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the creation of RDMA TRANSPORT tables over multiple priorities via matcher creation. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Link: https://patch.msgid.link/bb38e50ae4504e979c6568d41939402a4cf15635.1750148083.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/mlx5: Support driver APIs pre_destroy_cq and post_destroy_cqMark Zhang2025-06-253-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - pre_destroy_cq: Destroy FW CQ object so that no new CQ event would be generated; - post_destroy_cq: Release all resources. This patch, along with last one, fixes the crash below. Unable to handle kernel paging request at virtual address ffff8000114b1180 Mem abort info: ESR = 0x96000047 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000047 CM = 0, WnR = 1 swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000f4582000 [ffff8000114b1180] pgd=00000447fffff003, p4d=00000447fffff003, pud=00000447ffffe003, pmd=00000447ffffb003, pte=0000000000000000 Internal error: Oops: 96000047 [#1] SMP Modules linked in: udp_diag uio_pci_generic uio tcp_diag inet_diag binfmt_misc sn_core_odd(OE) rpcrdma(OE) xprtrdma(OE) ib_isert(OE) ib_iser(OE) ib_srpt(OE) ib_srp(OE) ib_ipoib(OE) kpatch_9658536(OK) kpatch_9322385(OK) kpatch_8843421(OK) kpatch_8636216(OK) vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sm4_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt sg acpi_ipmi ipmi_si ipmi_msghandler m1_uncore_ddrss_pmu m1_uncore_cmn_pmu team_yosemite9rc6(OE) vnic(OE) ip_tables mlx5_ib(OE) sd_mod ast mlx5_core(OE) i2c_algo_bit drm_vram_helper psample drm_kms_helper mlxdevm(OE) auxiliary(OE) mlxfw(OE) syscopyarea sysfillrect tls sysimgblt fb_sys_fops drm_ttm_helper nvme ttm nvme_core drm t10_pi i2c_designware_platform i2c_designware_core i2c_core ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE) [last unloaded: ipmi_devintf] CPU: 83 PID: 59375 Comm: kworker/u253:1 Kdump: loaded Tainted: G OE K 5.10.84-004.ali5000.alios7.aarch64 #1 Hardware name: Inspur AliServer-Xuanwu2.0AM-02-2UM1P-5B/AS1221MG1, BIOS 1.2.M1.AL.P.158.00 08/31/2023 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] pstate: 82c00089 (Nzcv daIf +PAN +UAO +TCO BTYPE=--) pc : native_queued_spin_lock_slowpath+0x1c4/0x31c lr : mlx5_ib_poll_cq+0x18c/0x2f8 [mlx5_ib] sp : ffff80002be1bc80 x29: ffff80002be1bc80 x28: ffff000810e69000 x27: ffff000810e69000 x26: ffff000810e69200 x25: 0000000000000000 x24: ffff8000117db000 x23: ffff04000156b780 x22: 0000000000000000 x21: ffff04000ce6c160 x20: ffff0008196a4000 x19: 0000000000000010 x18: 0000000000000020 x17: 0000000000000000 x16: 0000000000000000 x15: ffff040055a364e8 x14: ffffffffffffffff x13: ffff80002318bda8 x12: ffff0400358836e8 x11: 0000000000000040 x10: 0000000000000eb0 x9 : 0000000000000000 x8 : 0000000000000000 x7 : ffff04477fa20140 x6 : ffff8000114b1140 x5 : ffff04477fa20140 x4 : ffff8000114b1180 x3 : ffff000810e69200 x2 : ffff8000114b1180 x1 : 0000000001500000 x0 : ffff04477fa20148 Call trace: native_queued_spin_lock_slowpath+0x1c4/0x31c __ib_process_cq+0x74/0x1b8 [ib_core] ib_cq_poll_work+0x34/0xa0 [ib_core] process_one_work+0x1d8/0x4b0 worker_thread+0x180/0x440 kthread+0x114/0x120 Code: 910020e0 8b0400c4 f862d929 aa0403e2 (f8296847) ---[ end trace 387be2290557729c ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x9850817,7a60aa38 Memory Limit: none Starting crashdump kernel... Bye! Signed-off-by: Mark Zhang <[email protected]> Link: https://patch.msgid.link/aaf0072f350d1c7e8731f43b79e11a560bafb9e0.1750070205.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
| * | | RDMA/qib: Remove outdated driverDennis Dalessandro2025-06-1744-48350/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The qib Infiniband device has long been decommissioned from distros and unsupported by the company. We have kept it going in a compile only mode for a while now and it's time to remove the driver. It has a number of issues that are never going to be addressed and is a hindrance to improving hfi1 and rdmavt. Link: https://patch.msgid.link/r/174836076755.2436819.5981097575800950899.stgit@awdrv-04.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
| * | | RDMA/mana_ib: Add device statistics supportShiraz Saleem2025-06-124-2/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for mana device level statistics. Co-developed-by: Solom Tamawy <[email protected]> Signed-off-by: Solom Tamawy <[email protected]> Signed-off-by: Shiraz Saleem <[email protected]> Signed-off-by: Konstantin Taranov <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Long Li <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>