path: root/drivers/infiniband/hw/bnxt_re/main.c
Commit message | Author | Age | Files | Lines
...
* RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe2019-06-101-1/+2
| | | | | | | | | No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/drivers: Convert easy drivers to use ib_device_set_netdev()Jason Gunthorpe2019-04-091-1/+5
| | | | | | | | | | Drivers that never change their ndev dynamically do not need to use the get_netdev callback. Signed-off-by: Jason Gunthorpe <[email protected]> Acked-by: Selvin Xavier <[email protected]> Acked-by: Michal Kalderon <[email protected]> Acked-by: Adit Ranadive <[email protected]>
* RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky2019-04-081-0/+1
| | | | | | | Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA: Handle AH allocations by IB/coreLeon Romanovsky2019-04-081-0/+1
Simplify drivers by ensuring the lifetime of the ib_ah object. The changes in .create_ah() go hand in hand with the relevant update in .destroy_ah(). We use this opportunity to convert .destroy_ah() so that it cannot fail, as was suggested a long time ago, because there is nothing to do in case of a failure during destroy. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky2019-02-221-0/+1
| | | | | | | Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA: Handle PD allocations by IB/coreLeon Romanovsky2019-02-081-0/+1
The PD allocations in IB/core allow us to simplify drivers and the error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in hand with the relevant update in .dealloc_pd(). We use this opportunity to convert .dealloc_pd() so that it cannot fail, as was suggested a long time ago; failures do not happen in practice, as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Skip backing store allocation for 57500 seriesDevesh Sharma2019-02-071-1/+2
The backing store that holds the HW context data structures is allocated and initialized by the L2 driver. For the 57500 chip, the RoCE driver does not need to allocate and initialize additional memory. Skip the duplicate allocation and initialization for 57500 adapters. The driver continues as before for older chips. This patch also aligns the stats context memory to a 128 boundary, a requirement for the 57500 series of chips. Older chips do not care about alignment, so the change is unconditional. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
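The alignment requirement described above can be illustrated with a small userspace sketch. It assumes the "128 boundary" means 128 bytes; the names `STATS_CTX_ALIGN` and `align_up` are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: assumed 128-byte alignment for the 57500 series. */
#define STATS_CTX_ALIGN 128u

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

Because rounding up never hurts a chip that ignores alignment, applying it unconditionally, as the message describes, keeps the code path the same for old and new chips.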
* RDMA/bnxt_re: Add 64bit doorbells for 57500 seriesDevesh Sharma2019-02-071-33/+52
The new chip series has 64 bit doorbell for notification queues. Thus, both control and data path event queues need new routines to write 64 bit doorbell. Adding the same. There is new doorbell interface between the chip and driver. Changing the chip specific data structure definitions. Additional significant changes are listed below
- bnxt_re_net_ring_free/alloc takes a new argument
- bnxt_qplib_enable_nq and enable_rcfw uses new doorbell offset for new chip.
- DB mapping for NQ and CREQ now maps 8 bytes.
- DBR_DBR_* macros renames to DBC_DBC_*
- store nq_db_offset in a 32bit data type.
- got rid of __iowrite64_copy, used writeq instead.
- changed the DB header initialization to simpler scheme.
Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Add chip context to identify 57500 seriesDevesh Sharma2019-02-071-0/+34
| | | | | | | | | | | Adding setup and destroy routines for chip-context. The chip context would be used frequently in control and data path to take execution flow depending on the chip type. chip context structure pointer is added to the relevant data structures. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky2019-01-301-1/+1
| | | | | | | | | | | | All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
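The "embedded as the first member" contract described above can be sketched in userspace C. Everything here is a stand-in: `core_device`, `drv_device`, `bad_device`, `ALLOC_DEVICE` and `alloc_checked` are hypothetical names, and the check is done at run time for simplicity, whereas a real implementation may enforce it at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct core_device { int index; };        /* stand-in for the core device struct */

struct drv_device {                       /* hypothetical driver structure */
    struct core_device core;              /* correctly placed first */
    int private_state;
};

struct bad_device {                       /* counter-example: core is not first */
    int private_state;
    struct core_device core;
};

/* Returns zeroed memory only if the embedded member sits at offset 0. */
static void *alloc_checked(size_t size, size_t member_offset)
{
    if (member_offset != 0 || size < sizeof(struct core_device))
        return NULL;                      /* layout violates the contract */
    return calloc(1, size);
}

/* Allocate a driver structure, checking where `member` is embedded. */
#define ALLOC_DEVICE(drv_type, member) \
    ((drv_type *)alloc_checked(sizeof(drv_type), offsetof(drv_type, member)))
```

With the member at offset 0, a pointer to the driver structure and a pointer to the embedded core structure are the same address, which is what callers relying on this layout assume.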
* RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit2019-01-141-2/+4
Introduce and use the rdma_device_to_ibdev() API for those drivers which register one sysfs group, and also use it in ib_core. In a subsequent patch, the device->provider_ibdev one-to-one mapping no longer holds true while accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA: Rename port_callback to init_portParav Pandit2019-01-141-1/+1
| | | | | | | | | | | | | | Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Increase depth of control path command queueDevesh Sharma2018-12-191-0/+1
| | | | | | | | | | Increasing the depth of control path command queue to 8K entries to handle burst of commands. This feature needs support from FW and the driver/fw compatibility is checked from the interface version number. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Query HWRM Interface version from FWSelvin Xavier2018-12-191-0/+31
| | | | | | | | Get HWRM interface major, minor, build and patch version from FW for checking the FW/Driver compatibility. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Initialize ib_device_ops structKamal Heib2018-12-111-51/+45
| | | | | | | | Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Avoid accessing the device structure after it is freedSelvin Xavier2018-11-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When bnxt_re_ib_reg returns failure, the device structure gets freed. Driver tries to access the device pointer after it is freed. [ 4871.034744] Failed to register with netedev: 0xffffffa1 [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea [ 4871.046430] ================================================================== [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813 [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE 4.20.0-rc1+ #42 [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 4871.046449] Call Trace: [ 4871.046454] dump_stack+0x91/0xeb [ 4871.046458] print_address_description+0x6a/0x2a0 [ 4871.046461] kasan_report+0x176/0x2d0 [ 4871.046463] ? bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046466] bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046470] process_one_work+0x216/0x5b0 [ 4871.046471] ? process_one_work+0x189/0x5b0 [ 4871.046475] worker_thread+0x4e/0x3d0 [ 4871.046479] kthread+0x10e/0x140 [ 4871.046480] ? process_one_work+0x5b0/0x5b0 [ 4871.046482] ? kthread_stop+0x220/0x220 [ 4871.046486] ret_from_fork+0x3a/0x50 [ 4871.046492] The buggy address belongs to the page: [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 4871.046495] flags: 0x57ffffc0000000() [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000 [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 4871.046501] page dumped because: kasan: bad access detected Avoid accessing the device structure once it is freed. 
Fixes: 497158aa5f52 ("RDMA/bnxt_re: Fix the ib_reg failure cleanup") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Fix system hang when registration with L2 driver failsSelvin Xavier2018-11-211-0/+1
Driver doesn't release the rtnl lock if registration with the L2 driver (bnxt_re_register_netdev) fails, and this causes a hang while requesting the next lock.

[ 371.635416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 371.635417] kworker/u48:1 D 0 634 2 0x80000000
[ 371.635423] Workqueue: bnxt_re bnxt_re_task [bnxt_re]
[ 371.635424] Call Trace:
[ 371.635426] ? __schedule+0x36b/0xbd0
[ 371.635429] schedule+0x39/0x90
[ 371.635430] schedule_preempt_disabled+0x11/0x20
[ 371.635431] __mutex_lock+0x45b/0x9c0
[ 371.635433] ? __mutex_lock+0x16d/0x9c0
[ 371.635435] ? bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
[ 371.635438] ? wake_up_klogd+0x37/0x40
[ 371.635442] bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
[ 371.635447] bnxt_re_task+0xfd/0x180 [bnxt_re]
[ 371.635449] process_one_work+0x216/0x5b0
[ 371.635450] ? process_one_work+0x189/0x5b0
[ 371.635453] worker_thread+0x4e/0x3d0
[ 371.635455] kthread+0x10e/0x140
[ 371.635456] ? process_one_work+0x5b0/0x5b0
[ 371.635458] ? kthread_stop+0x220/0x220
[ 371.635460] ret_from_fork+0x3a/0x50
[ 371.635477] INFO: task NetworkManager:1228 blocked for more than 120 seconds.
[ 371.635478] Tainted: G B OE 4.20.0-rc1+ #42
[ 371.635479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Release the rtnl_lock correctly in the failure path. Fixes: de5c95d0f518 ("RDMA/bnxt_re: Fix system crash during RDMA resource initialization") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/drivers: Use core provided API for registering device attributesParav Pandit2018-10-171-43/+31
| | | | | | | | | Use rdma_set_device_sysfs_group() to register device attributes and simplify the driver. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Avoid resource leak in case the NQ registration failsSelvin Xavier2018-10-161-9/+22
| | | | | | | | | In case the NQ alloc/enable fails, free up the already allocated/enabled NQ before reporting failure. Also, track the alloc/enable using proper state checking. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Wait for delayed work to finish before device removalSelvin Xavier2018-10-161-1/+1
| | | | | | | | | | Delayed work bnxt_re_worker would be still running even after cancel_delayed_work returns. This causes crash as the driver proceeds with device removal. To make sure that the work is finished before returning, use cancel_delayed_work_sync. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Fix qp async event reportingDevesh Sharma2018-10-161-4/+9
| | | | | | | | | Reports affiliated async event on the qp-async event channel instead of global event channel. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Remove the unnecessary version macro definitionSelvin Xavier2018-10-161-1/+1
| | | | | | | | Version macro is not required as the driver is not maintaining the version. Removing the references of this macro too. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe2018-10-161-55/+38
|\ | | | | | | | | | | | | | | | | | | | | | | From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git This is required to resolve dependencies of the next series of RDMA patches. The code motion conflicts in drivers/infiniband/core/cache.c were resolved. Signed-off-by: Jason Gunthorpe <[email protected]>
| * RDMA/bnxt_re: Fix system crash during RDMA resource initializationSelvin Xavier2018-09-241-55/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses the L2 driver. The following sequence can trigger a crash Acquires the rtnl_lock -> Registers roce driver callback with L2 driver -> release the rtnl lock bnxt_re acquires the rtnl_lock -> Request for MSIx vectors -> release the rtnl_lock Issue happens when bnxt_re proceeds with remaining part of initialization and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic. The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are not initialized yet, <snip> [ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null) [ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.726674] PGD 0 [ 3551.726679] Oops: 0002 1 SMP ... [ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014 [ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000 [ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] ... [ 3551.726872] Call Trace: [ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re] [ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en] [ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en] [ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en] [ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en] [ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en] [ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250 [ 3551.726966] [<ffffffff88fd6d21>] ? 
__alloc_skb+0xa1/0x2d0 [ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210 [ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260 [ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30 [ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290 [ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210 [ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0 [ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880 ... [ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21 ... [ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.727175] RSP <ffff97e3173bf788> [ 3551.727177] CR2: 0000000000000000 Avoid this inconsistent state and system crash by acquiring the rtnl lock for the entire duration of device initialization. Re-factor the code to remove the rtnl lock from the individual function and acquire and release it from the caller. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* | RDMA: Fully setup the device name in ib_register_deviceJason Gunthorpe2018-09-261-2/+1
|/ | | | | | | | | | | | | | | | | | | | | | | The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <[email protected]> Acked-by: Adit Ranadive <[email protected]> Reviewed-by: Steve Wise <[email protected]> Acked-by: Devesh Sharma <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Reviewed-by: Michael J. Ruhl <[email protected]>
* RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changesDevesh Sharma2018-05-251-1/+54
The recent changes in Broadcom's ethernet driver (L2 driver) broke RoCE functionality in terms of MSIx vector allocation and de-allocation. The L2 driver may initiate MSIx vector reallocation depending on requests coming from the administrator. In such cases the L2 driver needs to free all the previously allocated MSIx vectors and reallocate and initialize them. If the RoCE driver is loaded and reshuffling is attempted, the kernel crashes, because the RoCE driver still holds the MSIx vectors while the L2 driver attempts to free the in-use vectors. Change the RoCE driver to fix the crashes described above. As part of the solution, the L2 driver tells the RoCE driver to release the MSIx vectors whenever there is a need. When the RoCE driver gets this message, it syncs up with all the running tasklets and IRQ handlers and releases the vectors. The L2 driver then sends one more message to the RoCE driver to resume the MSIx vectors. The L2 driver guarantees that the RoCE vectors do not change during reshuffling. Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.") Fixes: 08654eb213a8 ("bnxt_en: Change IRQ assignment for RDMA driver.") Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit2018-04-041-1/+0
| | | | | | | | | | | | | ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak2018-03-191-0/+1
| | | | | | | | | | | | | | Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Matan Barak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Avoid Hard lockup during error CQE processingSelvin Xavier2018-03-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hitting the following hardlockup due to a race condition in error CQE processing. [26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req [26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa [26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4 [26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace [26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded [26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, [26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [26156.472639] Call Trace: [26156.475379] <NMI> [<ffffffff98d0d722>] dump_stack+0x19/0x1b [26156.481833] [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140 [26156.489341] [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100 [26156.496256] [<ffffffff98787c24>] perf_event_overflow+0x14/0x20 [26156.502887] [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510 [26156.509813] [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50 [26156.516738] [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150 [26156.523273] [<ffffffff98d17be8>] do_nmi+0x218/0x460 [26156.528834] [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e [26156.534980] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.543268] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.551556] [<ffffffff987089c0>] ? 
native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.559842] <EOE> [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf [26156.567555] [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30 [26156.573696] [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re] [26156.581789] [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re] [26156.589493] [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re] The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to put the error QP in flush list. Since SQ and RQ of each error qp are added to two different flush list, we need to protect it using locks of corresponding CQs. Difference in order of acquiring the lock in SQ poll_cq and RQ poll_cq can cause a hard lockup. Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock. Instead of this lock, introduces qplib_cq.flush_lock to handle addition/deletion of QPs in flush list. Also, always invoke the flush_lock in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential deadlock. Other than the poll_cq context, the movement of QP to/from flush list can be done in modify_qp context or from an async error event from HW. Synchronize these operations using the bnxt_re verbs layer CQ locks. To achieve this, adds a call back to the HW abstraction layer(qplib) to bnxt_re ib_verbs layer in case of async error event. Also, removes the buddy cq functions as it is no longer required. Signed-off-by: Sriharsha Basavapatna <[email protected]> Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
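The fixed acquisition order described above (SQ CQ lock first, then RQ CQ lock) can be sketched with toy, single-threaded stand-ins for the locks. All names here (`toy_cq`, `toy_qp`, `qp_lock_flush`, and so on) are hypothetical and are not the driver's; the handling of a QP whose SQ and RQ share one CQ is an assumption added for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a CQ and its flush lock; not the driver's structures. */
struct toy_cq {
    int id;
    int locked;               /* 1 while the pretend flush lock is held */
};

struct toy_qp {
    struct toy_cq *scq;       /* CQ used by the send queue */
    struct toy_cq *rcq;       /* CQ used by the receive queue */
};

static int lock_order[2];     /* CQ ids in the order they were locked */
static int lock_count;

static void toy_lock(struct toy_cq *cq)
{
    assert(!cq->locked);      /* a real spinlock would spin forever here */
    cq->locked = 1;
    lock_order[lock_count++] = cq->id;
}

static void toy_unlock(struct toy_cq *cq)
{
    assert(cq->locked);
    cq->locked = 0;
}

/* Always take the SQ's CQ lock first, then the RQ's CQ lock if it differs. */
static void qp_lock_flush(struct toy_qp *qp)
{
    lock_count = 0;
    toy_lock(qp->scq);
    if (qp->rcq != qp->scq)
        toy_lock(qp->rcq);
}

/* Release in the reverse order of acquisition. */
static void qp_unlock_flush(struct toy_qp *qp)
{
    if (qp->rcq != qp->scq)
        toy_unlock(qp->rcq);
    toy_unlock(qp->scq);
}
```

The point of the sketch is only that every path moving a QP on or off the flush lists goes through one helper, so the order cannot differ between the SQ and RQ poll paths.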
* RDMA/bnxt_re: Fix the ib_reg failure cleanupSelvin Xavier2018-02-281-1/+4
| | | | | | | | Release the netdev references in the cleanup path. Invokes the cleanup routines if bnxt_re_ib_reg fails. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
* RDMA/bnxt_re: Avoid system hang during device un-regSelvin Xavier2018-02-201-4/+3
| | | | | | | | | | | | BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work requests posted together. Track schedule of multiple workqueue items by maintaining a per device counter and proceed with IB dereg only if this counter is zero. flush_workqueue is no longer required from NETDEV_UNREGISTER path. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
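The difference between a single in-progress flag and a per-device counter can be sketched with C11 atomics. The structure and function names below (`toy_dev`, `sched_count`, `work_scheduled`, `work_done`, `can_unregister`) are hypothetical and only illustrate the idea in the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state: number of work items queued but not finished. */
struct toy_dev {
    atomic_int sched_count;
};

/* Call when a work item is queued for this device. */
static void work_scheduled(struct toy_dev *dev)
{
    atomic_fetch_add(&dev->sched_count, 1);
}

/* Call at the end of the work handler. */
static void work_done(struct toy_dev *dev)
{
    atomic_fetch_sub(&dev->sched_count, 1);
}

/* Deregistration may proceed only when no work item is outstanding. */
static bool can_unregister(struct toy_dev *dev)
{
    return atomic_load(&dev->sched_count) == 0;
}
```

A boolean flag would be cleared by the first work item to finish even if a second one was still pending; the counter only reaches zero after every queued item has completed.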
* RDMA/bnxt_re: Fix system crash during load/unloadSelvin Xavier2018-02-201-0/+5
| | | | | | | | | | | | | During driver unload, the driver proceeds with cleanup without waiting for the scheduled events. So the device pointers get freed up and driver crashes when the events are scheduled later. Flush the bnxt_re_task work queue before starting device removal. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Add SRQ support for Broadcom adaptersDevesh Sharma2018-01-181-4/+98
Shared receive queue (SRQ) is defined as a pool of receive buffers shared among multiple QPs which belong to the same protection domain in a given process context. Use of SRQ reduces the memory footprint of IB applications. Broadcom adapters support SRQ; add the code changes to enable shared receive queues. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: expose detailed stats retrieved from HWSelvin Xavier2018-01-181-0/+1
| | | | | | | | | | | | | | | Broadcom's adapter supports more granular statistics to allow better understanding about the state of the chip when data traffic is flowing. Exposing the detailed stats to the consumer through the standard hook available in the kverbs interface. In order to retrieve all the information, driver implements a firmware command. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Add support for query firmware versionSelvin Xavier2018-01-171-10/+1
| | | | | | | | | | | | The device now reports firmware version thus, removing the hard coded values of the FW version string and redundant fw_rev hook from sysfs. Adding code to query firmware version from underlying device and report it through the kernel verb to get firmware version string. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Enable RoCE on virtual functionsSelvin Xavier2018-01-171-27/+108
RoCE can be used by virtual functions (VFs) as well. Add code changes to allow resource reservation and initialization, and to make the resources available to the RDMA applications running on those VFs. Currently, fifty percent of the total available resources are reserved for the PF and the remainder is divided equally among the active VFs. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
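The split described above is simple arithmetic, sketched here under stated assumptions: the names `resource_split` and `split_resources` are hypothetical, integer division rounds down, and the zero-VF case returning 0 per VF is a choice made for the sketch, not something the message specifies.

```c
#include <assert.h>

/* Hypothetical result type: PF share and the share of each active VF. */
struct resource_split {
    unsigned int pf;
    unsigned int per_vf;
};

/* Reserve half of `total` for the PF and divide the rest equally among VFs. */
static struct resource_split split_resources(unsigned int total,
                                             unsigned int active_vfs)
{
    struct resource_split s;

    s.pf = total / 2;                       /* fifty percent stays with the PF */
    s.per_vf = active_vfs ? (total - s.pf) / active_vfs : 0;
    return s;
}
```

For example, 1000 resources with 4 active VFs gives 500 to the PF and 125 to each VF.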
* bnxt_re: report RoCE device support at info levelJonathan Toppins2018-01-051-1/+1
| | | | | | | | | | | Reporting that a device doesn't support RoCE seems like a valuable piece of information to have when trying to determine why a driver is not binding to a device. Better to report this at info log level instead of requiring a user to enable all debug messages in the driver. Signed-off-by: Jonathan Toppins <[email protected]> Acked-By: Devesh Sharma <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* bnxt_re: Implement the shutdown hook of the L2-RoCE driver interfaceSomnath Kotur2017-10-251-1/+13
When the host is shutting down, it invokes the shutdown hook of the L2 driver, which attempts to free the MSI-X vectors but fails because some vectors are held by the RoCE driver. Implement the new hook in the L2 -> RoCE interface, which is invoked so that the RoCE driver can unregister the device and free the MSI-X vectors it had claimed, allowing L2 to proceed with its shutdown without failure. Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* bnxt_re: Fix incorrect usage of test_bit()Somnath Kotur2017-10-181-2/+3
test_bit() takes a bit number, while the 'flags' field in struct bnxt_qplib_rcfw was using actual BIT position converted values. Fix this by assigning bit numbers and using consistent APIs for all the flag values. Also log a message in case of failure. Thanks to Dan Carpenter for pointing this out. Suggested-by: Dan Carpenter <[email protected]> Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
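The bug class described above, passing a mask where a bit number is expected, can be reproduced in userspace. The helpers `toy_set_bit` and `toy_test_bit` below only imitate the calling convention of the kernel's bit operations (bit number first), and the flag names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace imitation: set bit number `nr` in the bitmap at `addr`. */
static void toy_set_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

/* Userspace imitation: test bit number `nr` in the bitmap at `addr`. */
static bool toy_test_bit(unsigned int nr, const unsigned long *addr)
{
    return (addr[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL;
}

/* Hypothetical flags, defined as bit numbers as the fix requires. */
enum { FLAG_A_BIT = 0, FLAG_B_BIT = 2 };

/* The same flag written as a mask (value 4); wrong input for toy_test_bit(). */
#define FLAG_B_MASK (1UL << FLAG_B_BIT)
```

Passing `FLAG_B_MASK` to `toy_test_bit()` asks about bit 4 rather than bit 2, so the test silently reads the wrong flag; this is why the flags need to be plain bit numbers.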
* bnxt_re: Remove RTNL lock dependency in bnxt_re_query_portSomnath Kotur2017-09-221-0/+4
When there is a NETDEV_UNREGISTER event, the bnxt_re driver calls ib_unregister_device() (RTNL lock held). ib_unregister_device() attempts to flush a worker queue scheduled by ib_core, and that queue might have a pending ib_query_port(). ib_query_port() in turn calls bnxt_re_query_port(), which, while querying the link speed using ib_get_eth_speed(), tries to acquire the rtnl_lock() that is already held by NETDEV_UNREGISTER. Fix the issue by removing the link speed query from bnxt_re_query_port(). Now the speed is queried after a successful ib_register_device() or whenever there is a NETDEV_CHANGE event. Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* bnxt_re: Fix race between the netdev register and unregister eventsSomnath Kotur2017-09-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | Upon receipt of the NETDEV_REGISTER event from the netdev notifier chain, the IB stack registration is spawned off to a workqueue since that also requires an rtnl lock. There could be 2 kinds of races between the NETDEV_REGISTER and the NETDEV_UNREGISTER event handling. a)The NETDEV_UNREGISTER event is received in rapid succession after the NETDEV_REGISTER event even before the work queue got a chance to run. b)The NETDEV_UNREGISTER event is received while the workqueue that handles registration with the IB stack is still in progress. Handle both the races with a bit flag that is set just before the work item is queued and cleared in the workqueue after the event is handled just before the workqueue item is freed. While adding the new flag, it was noted that the flags are all used in *_bit() operations which expect a bit number and not a literal constant with a bit set. So change the numbers to be bit numbers. Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* bnxt_re: Free up devices in module_exit pathSomnath Kotur2017-09-221-0/+16
| | | | | | | | Clean up all devices added to the bnxt_re_dev_list in the module_exit entry point. Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Implement the alloc/get_hw_stats callbackSomnath Kotur2017-08-181-0/+4
| | | | | | | | | Expose HW counters using the get_hw_stats callback Signed-off-by: Somnath Kotur <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Allocate multiple notification queuesSelvin Xavier2017-08-181-39/+67
Enable multiple interrupt vectors. The driver requests the maximum number of MSIx vectors based on the number of online CPUs and creates up to 9 MSIx vectors (1 for the control path and 8 for the data path). A tasklet is created for each of these vectors. NQs are assigned to CQs in round robin fashion. This patch also adds an IRQ affinity hint for the MSIx vector of each NQ. Signed-off-by: Ray Jui <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
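Round-robin assignment of NQs to CQs, as mentioned above, can be sketched with a cursor that wraps around. The function name `pick_nq` and the cursor parameter are hypothetical; how the driver actually stores the cursor is not stated in the message.

```c
#include <assert.h>

/*
 * Return the NQ index for the next CQ and advance the cursor.
 * `next` is the caller-owned cursor; `num_nqs` is the number of data path NQs.
 */
static unsigned int pick_nq(unsigned int *next, unsigned int num_nqs)
{
    unsigned int nq;

    assert(num_nqs > 0);
    nq = *next % num_nqs;
    *next = (nq + 1) % num_nqs;   /* wrap so the cursor stays in range */
    return nq;
}
```

With three NQs, successive CQs would land on NQ 0, 1, 2 and then 0 again, which spreads completion notifications across the available vectors.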
* RDMA/bnxt_re: Allow posting when QPs are in errorSelvin Xavier2017-07-241-1/+2
This patch allows the driver to post send and receive requests on QPs which are in the error state. Instead of flushing the QP in the context of polling error CQEs, the QPs are added to a flush list maintained per CQ and the QP state is moved to error. A QP is also added to the flush list if the user moves it to the error state using modify_qp. After polling the HW CQ in the poll_cq routine, this flush list is traversed and the driver completes work requests on each QP in the flush list until the budget expires. The QP is moved out of the flush list during QP destroy or during modify_QP to RESET. When ULPs post Work Requests while the QP is in the error state, the driver stores the ULP data and then increments the QP producer s/w index without ringing the doorbell. It then schedules a worker to invoke the CQ handler, since interrupts won't be generated by the HW for this request. Signed-off-by: Sriharsha Basavapatna <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configuredKalesh AP2017-07-241-6/+47
The current implementation does not program vlan header insertion in RoCE packets if no vlan is configured. Firmware does not add priority when there is no vlan tag in the packet. Modify the code to insert a vlan header when PFC is enabled on the interface. Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA: Remove useless MODULE_VERSIONLeon Romanovsky2017-07-241-1/+0
All modules in drivers/infiniband defined and used MODULE_VERSION, which was pointless because the kernel version describes their state more accurately than those arbitrary numbers. Signed-off-by: Leon Romanovsky <[email protected]> Acked-by: Sagi Grimbrg <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Acked-by: Selvin Xavier <[email protected]> Acked-by: Ram Amrani <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: Adit Ranadive <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Delete unsupported modify_port functionLeon Romanovsky2017-07-241-1/+0
| | | | | | | | | | | | There is no need to return always zero for function which is not supported. The IB stack treats uninitialized ib_device->functions as not implemented. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Reviewed-by: Yuval Shaia <[email protected]> Acked-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Specify RDMA component when allocating stats contextSomnath Kotur2017-07-201-0/+1
Starting with FW version 20.6.47, the firmware keeps separate statistics for L2 and RDMA. However, the driver needs to specify whether the context is for RDMA when allocating stat_ctx. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
* RDMA/bnxt_re: Remove FMR supportSelvin Xavier2017-06-141-4/+0
| | | | | | | | | | | Some issues observed with FMR implementation while running stress traffic. So removing the FMR verbs support for now. Signed-off-by: Selvin Xavier <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>