aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel/ice/ice_lib.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch '100GbE' of ↵Jakub Kicinski2025-07-251-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== libie: commonize adminq structure Michal Swiatkowski says: It is a prework to allow reusing some specific Intel code (eq. fwlog). Move common *_aq_desc structure to libie header and changing it in ice, ixgbe, i40e and iavf. Only generic adminq commands can be easily moved to common header, as rest is slightly different. Format remains the same. It will be better to correctly move it when it will be needed to commonize other part of the code. Move *_aq_str() to new libie module (libie_adminq) and use it across drivers. The functions are exactly the same in each driver. Some more adminq helpers/functions can be moved to libie_adminq when needed. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: use libie_aq_str iavf: use libie_aq_str ice: use libie_aq_str libie: add adminq helper for converting err to str iavf: use libie adminq descriptors i40e: use libie adminq descriptors ixgbe: use libie adminq descriptors ice, libie: move generic adminq descriptors to lib ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: use libie_aq_strMichal Swiatkowski2025-07-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Simple: s/ice_aq_str/libie_aq_str Add libie_aminq module in ice Kconfig. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
| * ice, libie: move generic adminq descriptors to libMichal Swiatkowski2025-07-241-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The descriptor structure is the same in ice, ixgbe and i40e. Move it to common libie header to use it across different driver. Leave device specific adminq commands in separate folders. This lead to a change that need to be done in filling/getting descriptor: - previous: struct specific_desc *cmd; cmd = &desc.params.specific_desc; - now: struct specific_desc *cmd; cmd = libie_aq_raw(&desc); Do this changes across the driver to allow clean build. The casting only have to be done in case of specific descriptors, for generic one union can still be used. Changes beside code moving: - change ICE_ prefix to LIBIE_ prefix (ice_ and libie_ too) - remove shift variables not otherwise needed (in libie_aq_flags) - fill/get descriptor data based on desc.params.raw whenever the descriptor isn't defined in libie - move defines from the libie_aq_sth structure outside - add libie_aq_raw helper and use it instead of explicit casting Reviewed by: Przemek Kitszel <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | net: Fix typosBjorn Helgaas2025-07-251-1/+1
|/ | | | | | | | | | Fix typos in comments and error messages. Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: David Arinzon <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
* ice: move ice_vsi_update_l2tsel to ice_lib.cJacob Keller2025-07-101-0/+35
| | | | | | | | | | | | | A future change is going to need to call ice_vsi_update_l2tsel from a new context outside of ice_virtchnl.c Since this function deals with a generic VSI, move it into ice_lib.c to enable calling it from other places in the ice driver. Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Madhu Chittim <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: add a separate Rx handler for flow director commandsMichal Kubiak2025-06-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "ice" driver implementation uses the control VSI to handle the flow director configuration for PFs and VFs. Unfortunately, although a separate VSI type was created to handle flow director queues, the Rx queue handler was shared between the flow director and a standard NAPI Rx handler. Such a design approach was not very flexible. First, it mixed hotpath and slowpath code, blocking their further optimization. It also created a huge overkill for the flow director command processing, which is descriptor-based only, so there is no need to allocate Rx data buffers. For the above reasons, implement a separate Rx handler for the control VSI. Also, remove from the NAPI handler the code dedicated to configuring the flow director rules on VFs. Do not allocate Rx data buffers to the flow director queues because their processing is descriptor-based only. Finally, allow Rx data queues to be allocated only for VSIs that have netdev assigned to them. This handler splitting approach is the first step in converting the driver to use the Page Pool (which can only be used for data queues). Test hints: 1. Create a VF for any PF managed by the ice driver. 2. In a loop, add and delete flow director rules for the VF, e.g.: for i in {1..128}; do q=$(( i % 16 )) ethtool -N ens802f0v0 flow-type tcp4 dst-port "$i" action "$q" done for i in {0..127}; do ethtool -N ens802f0v0 delete "$i" done Suggested-by: Maciej Fijalkowski <[email protected]> Suggested-by: Michal Swiatkowski <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Michal Kubiak <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* net: intel: rename 'hena' to 'hashcfg' for clarityJacob Keller2025-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | i40e, ice, and iAVF all use 'hena' as a shorthand for the "hash enable" configuration. This comes originally from the X710 datasheet 'xxQF_HENA' registers. In the context of the registers the meaning is fairly clear. However, on its own, hena is a weird name that can be more difficult to understand. This is especially true in ice. The E810 hardware doesn't even have registers with HENA in the name. Replace the shorthand 'hena' with 'hashcfg'. This makes it clear the variables deal with the Hash configuration, not just a single boolean on/off for all hashing. Do not update the register names. These come directly from the datasheet for X710 and X722, and it is more important that the names can be searched. Suggested-by: Przemek Kitszel <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* ice: receive LLDP on trusted VFsMateusz Pacuszka2025-04-111-6/+42
| | | | | | | | | | | | | | | | | | | When a trusted VF tries to configure an LLDP multicast address, configure a rule that would mirror the traffic to this VF, untrusted VFs are not allowed to receive LLDP at all, so the request to add LLDP MAC address will always fail for them. Add a forwarding LLDP filter to a trusted VF when it tries to add an LLDP multicast MAC address. The MAC address has to be added after enabling trust (through restarting the LLDP service). Signed-off-by: Mateusz Pacuszka <[email protected]> Co-developed-by: Larysa Zaremba <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: do not add LLDP-specific filter if not necessaryLarysa Zaremba2025-04-111-7/+16
| | | | | | | | | | | | | | | | | | | | Commit 34295a3696fb ("ice: implement new LLDP filter command") introduced the ability to use LLDP-specific filter that directs all LLDP traffic to a single VSI. However, current goal is for all trusted VFs to be able to see LLDP neighbors, which is impossible to do with the special filter. Make using the generic filter the default choice and fall back to special one only if a generic filter cannot be added. That way setups with "NVMs where an already existent LLDP filter is blocking the creation of a filter to allow LLDP packets" will still be able to configure software Rx LLDP on PF only, while all other setups would be able to forward them to VFs too. Reviewed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: Add E830 checksum offload supportPaul Greenwalt2025-03-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | E830 supports raw receive and generic transmit checksum offloads. Raw receive checksum support is provided by hardware calculating the checksum over the whole packet, regardless of type. The calculated checksum is provided to driver in the Rx flex descriptor. Then the driver assigns the checksum to skb->csum and sets skb->ip_summed to CHECKSUM_COMPLETE. Generic transmit checksum support is provided by hardware calculating the checksum given two offsets: the start offset to begin checksum calculation, and the offset to insert the calculated checksum in the packet. Support is advertised to the stack using NETIF_F_HW_CSUM feature. E830 has the following limitations when both generic transmit checksum offload and TCP Segmentation Offload (TSO) are enabled: 1. Inner packet header modification is not supported. This restriction includes the inability to alter TCP flags, such as the push flag. As a result, this limitation can impact the receiver's ability to coalesce packets, potentially degrading network throughput. 2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents support of Maximum Transmission Unit (MTU) greater than 1063 bytes. Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features(). When NETIF_F_HW_CSUM is requested the provided skb->csum_start and skb->csum_offset are passed to hardware in the Tx context descriptor generic checksum (GCS) parameters. Hardware calculates the 1's complement from skb->csum_start to the end of the packet, and inserts the result in the packet at skb->csum_offset. Co-developed-by: Alice Michael <[email protected]> Signed-off-by: Alice Michael <[email protected]> Co-developed-by: Eric Joyner <[email protected]> Signed-off-by: Eric Joyner <[email protected]> Signed-off-by: Paul Greenwalt <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2025-03-131-18/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: tools/testing/selftests/drivers/net/ping.py 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver") https://lore.kernel.org/netdev/[email protected]/ net/core/devmem.c a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") https://lore.kernel.org/netdev/[email protected]/ Adjacent changes: tools/testing/selftests/net/Makefile 6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.") 2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4") drivers/net/ethernet/broadcom/bnxt/bnxt.c 661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Paolo Abeni <[email protected]>
| * ice: do not configure destination override for switchdevLarysa Zaremba2025-03-051-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After switchdev is enabled and disabled later, LLDP packets sending stops, despite working perfectly fine before and during switchdev state. To reproduce (creating/destroying VF is what triggers the reconfiguration): devlink dev eswitch set pci/<address> mode switchdev echo '2' > /sys/class/net/<ifname>/device/sriov_numvfs echo '0' > /sys/class/net/<ifname>/device/sriov_numvfs This happens because LLDP relies on the destination override functionality. It needs to 1) set a flag in the descriptor, 2) set the VSI permission to make it valid. The permissions are set when the PF VSI is first configured, but switchdev then enables it for the uplink VSI (which is always the PF) once more when configured and disables when deconfigured, which leads to software-generated LLDP packets being blocked. Do not modify the destination override permissions when configuring switchdev, as the enabled state is the default configuration that is never modified. Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Reviewed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: use napi's irq affinity and rmap IRQ notifiersAhmed Zaki2025-02-271-7/+0
| | | | | | | | | | | | | | | | | | Delete the driver CPU affinity and aRFS rmap info, use the core's API instead. Signed-off-by: Ahmed Zaki <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
* | ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()Ahmed Zaki2025-02-271-1/+8
| | | | | | | | | | | | | | | | | | We set the NAPI's IRQ number in ice_vsi_set_napi_queues(). Clear the NAPI's IRQ in ice_vsi_clear_napi_queues(). Signed-off-by: Ahmed Zaki <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
* | ice: support Rx timestamp on flex descriptorSimei Su2025-02-141-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by the VF to request PTP capability and responded by the PF what capability is enabled for that VF. Hardware captures timestamps which contain only 32 bits of nominal nanoseconds, as opposed to the 64bit timestamps that the stack expects. To convert 32b to 64b, we need a current PHC time. VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF and responded by the PF with the current PHC time. Signed-off-by: Simei Su <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Co-developed-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: Don't check device type when checking GNSS presenceKarol Kolacinski2025-02-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't check if the device type is E810T as non-E810T devices can support GNSS too and PCA9575 check is enough to determine if GNSS is present or not. Rename ice_gnss_is_gps_present() to ice_gnss_is_module_present() because GNSS module supports multiple GNSS providers, not only GPS. Move functions related to PCA9575 from ice_ptp_hw.c to ice_common.c to be able to access them when PTP is disabled in the kernel, but GNSS is enabled. Remove logical AND with ICE_AQC_LINK_TOPO_NODE_TYPE_M in ice_get_pca9575_handle(), which has no effect, and reorder device type checks to check the device_id first, then set other variables. Signed-off-by: Karol Kolacinski <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: enable_rdma devlink paramMichal Swiatkowski2025-02-051-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement enable_rdma devlink parameter to allow user to turn RDMA feature on and off. It is useful when there is no enough interrupts and user doesn't need RDMA feature. Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jan Sokolowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: treat dyn_allowed only as suggestionMichal Swiatkowski2025-02-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can be needed to have some MSI-X allocated as static and rest as dynamic. For example on PF VSI. We want to always have minimum one MSI-X on it, because of that it is allocated as a static one, rest can be dynamic if it is supported. Change the ice_get_irq_res() to allow using static entries if they are free even if caller wants dynamic one. Adjust limit values to the new approach. Min and max in limit means the values that are valid, so decrease max and num_static by one. Set vsi::irq_dyn_alloc if dynamic allocation is supported. Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: get rid of num_lan_msix fieldMichal Swiatkowski2025-02-051-11/+14
|/ | | | | | | | | | | | Remove the field to allow having more queues than MSI-X on VSI. As default the number will be the same, but if there won't be more MSI-X available VSI can run with at least one MSI-X. Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: support FW Recovery ModeKonrad Knitter2025-01-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recovery Mode is intended to recover from a fatal failure scenario in which the device is not accessible to the host, meaning the firmware is non-responsive. The purpose of the Firmware Recovery Mode is to enable software tools to update firmware and/or device configuration so the fatal error can be resolved. Recovery Mode Firmware supports a limited set of admin commands required for NVM update. Recovery Firmware does not support hardware interrupts so a polling mode is used. The driver will expose only the minimum set of devlink commands required for the recovery of the adapter. Using an appropriate NVM image, the user can recover the adapter using the devlink flash API. Prior to 4.20 E810 Adapter Recovery Firmware supports only the update and erase of the "fw.mgmt" component. E810 Adapter Recovery Firmware doesn't support selected preservation of cards settings or identifiers. The following command can be used to recover the adapter: $ devlink dev flash <pci-address> <update-image.bin> component fw.mgmt overwrite settings overwrite identifier Newer FW versions (4.20 or newer) supports update of "fw.undi" and "fw.netlist" components. $ devlink dev flash <pci-address> <update-image.bin> Tested on Intel Corporation Ethernet Controller E810-C for SFP FW revision 3.20 and 4.30. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Konrad Knitter <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* ice: Add support for persistent NAPI configJoe Damato2024-11-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted. Tested with an E810-2CQDA2 NIC. Begin by setting the queue count to 4: $ sudo ethtool -L eth4 combined 4 Check the queue settings: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111: $ sudo ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}' None Check that worked: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451: $ sudo ethtool -L eth4 combined 2 Check the queue settings, noting that NAPI ID 8451 is gone: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, increase the number of queues back to 4: $ sudo ethtool -L eth4 combined 4 Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Signed-off-by: Joe Damato <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* ice: add E830 HW VF mailbox message limit supportPaul Greenwalt2024-10-081-0/+3
| | | | | | | | | | | | | | | | | E830 adds hardware support to prevent the VF from overflowing the PF mailbox with VIRTCHNL messages. E830 will use the hardware feature (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf(). To prevent a VF from overflowing the PF, the PF sets the number of messages per VF that can be in the PF's mailbox queue (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF, the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG register. Signed-off-by: Paul Greenwalt <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-09-131-7/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts (sort of) and no adjacent changes. This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock") from net, as it was superseded by commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.") in net-next. Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: Fix lldp packets dropping after changing the number of channelsMartyna Szapar-Mudlaw2024-09-091-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After vsi setup refactor commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <[email protected]> Closes: https://lore.kernel.org/intel-wired-lan/[email protected]/T/#u Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Martyna Szapar-Mudlaw <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: add basic devlink subfunctions supportPiotr Raczynski2024-09-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement devlink port handlers responsible for ethernet type devlink subfunctions. Create subfunction devlink port and setup all resources needed for a subfunction netdev to operate. Configure new VSI for each new subfunction, initialize and configure interrupts and Tx/Rx resources. Set correct MAC filters and create new netdev. For now, subfunction is limited to only one Tx/Rx queue pair. Only allocate new subfunction VSI with devlink port new command. Allocate and free subfunction MSIX interrupt vectors using new API calls with pci_msix_alloc_irq_at and pci_msix_free_irq. Support both automatic and manual subfunction numbers. If no subfunction number is provided, use xa_alloc to pick a number automatically. This will find the first free index and use that as the number. This reduces burden on users in the simple case where a specific number is not required. It may also be slightly faster to check that a number exists since xarray lookup should be faster than a linear scan of the dyn_ports xarray. Co-developed-by: Jacob Keller <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Piotr Raczynski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: export ice ndo_ops functionsPiotr Raczynski2024-09-061-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | Make some of the netdevice_ops functions visible from outside for another VSI type created netdev. Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Piotr Raczynski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: add new VSI type for subfunctionsPiotr Raczynski2024-09-061-2/+23
|/ | | | | | | | | | | | | | | | | Add required plumbing for new VSI type dedicated to devlink subfunctions. Make sure that the vsi is properly configured and destroyed. Also allow loading XDP and AF_XDP sockets. The first implementation of devlink subfunctions supports only one Tx/Rx queue pair per given subfunction. Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Piotr Raczynski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: check ICE_VSI_DOWN under rtnl_lock when preparing for resetLarysa Zaremba2024-09-031-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following scenario: .ndo_bpf() | ice_prepare_for_reset() | ________________________|_______________________________________| rtnl_lock() | | ice_down() | | | test_bit(ICE_VSI_DOWN) - true | | ice_dis_vsi() returns | ice_up() | | | proceeds to rebuild a running VSI | .ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply new configuration. Another example is .set_channels(). To avoid the race condition above, act only after reading ICE_VSI_DOWN under rtnl_lock. Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow") Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: check for XDP rings instead of bpf program when unconfiguringLarysa Zaremba2024-09-031-2/+2
| | | | | | | | | | | | | | | | | | | If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI without applying new ring configuration. When unconfiguring the VSI, we can encounter the state in which there is an XDP program but no XDP rings to destroy or there will be XDP rings that need to be destroyed, but no XDP program to indicate their presence. When unconfiguring, rely on the presence of XDP rings rather then XDP program, as they better represent the current state that has to be destroyed. Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: protect XDP configuration with a mutexLarysa Zaremba2024-09-031-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: move netif_queue_set_napi to rtnl-protected sectionsLarysa Zaremba2024-09-031-98/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not rtnl-locked when called from the reset. This creates the need to take the rtnl_lock just for a single function and complicates the synchronization with .ndo_bpf. At the same time, there no actual need to fill napi-to-queue information at this exact point. Fill napi-to-queue information when opening the VSI and clear it when the VSI is being closed. Those routines are already rtnl-locked. Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP queues, as this leads to out-of-bounds writes, such as one below. [ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0 [ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047 [ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2 [ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000003] Call Trace: [ +0.000003] <TASK> [ +0.000002] dump_stack_lvl+0x60/0x80 [ +0.000007] print_report+0xce/0x630 [ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0 [ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000003] kasan_report+0xe9/0x120 [ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000004] netif_queue_set_napi+0x1c2/0x1e0 [ +0.000005] ice_vsi_close+0x161/0x670 [ice] [ +0.000114] ice_dis_vsi+0x22f/0x270 [ice] [ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice] [ +0.000087] pci_dev_save_and_disable+0x82/0xd0 [ +0.000006] pci_reset_function+0x12d/0x230 [ +0.000004] reset_store+0xa0/0x100 [ +0.000006] ? __pfx_reset_store+0x10/0x10 [ +0.000002] ? __pfx_mutex_lock+0x10/0x10 [ +0.000004] ? __check_object_size+0x4c1/0x640 [ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0 [ +0.000006] vfs_write+0x5d6/0xdf0 [ +0.000005] ? fd_install+0x180/0x350 [ +0.000005] ? __pfx_vfs_write+0x10/0xA10 [ +0.000004] ? do_fcntl+0x52c/0xcd0 [ +0.000004] ? kasan_save_track+0x13/0x60 [ +0.000003] ? kasan_save_free_info+0x37/0x60 [ +0.000006] ksys_write+0xfa/0x1d0 [ +0.000003] ? __pfx_ksys_write+0x10/0x10 [ +0.000002] ? __x64_sys_fcntl+0x121/0x180 [ +0.000004] ? _raw_spin_lock+0x87/0xe0 [ +0.000005] do_syscall_64+0x80/0x170 [ +0.000007] ? _raw_spin_lock+0x87/0xe0 [ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000003] ? file_close_fd_locked+0x167/0x230 [ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000005] ? do_syscall_64+0x8c/0x170 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? fput+0x1a/0x2c0 [ +0.000004] ? filp_close+0x19/0x30 [ +0.000004] ? do_dup2+0x25a/0x4c0 [ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0 [ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? __count_memcg_events+0x113/0x380 [ +0.000005] ? handle_mm_fault+0x136/0x820 [ +0.000005] ? do_user_addr_fault+0x444/0xa80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7f2033593154 Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios") Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Amritha Nambiar <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: George Kuruvinakunnel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* ice: use irq_update_affinity_hint()Michal Schmidt2024-06-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. On the contrary, when the driver applies affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. ice reconfigures VSIs at runtime due to a MIB change (ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the affinity in ice_vsi_req_irq_msix(). 3. ice has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. I am not sure if updating the affinity hints is at all useful, because irqbalance ignores them since 2016 ([1]), but at least it's harmless. This ice change is similar to i40e commit d34c54d1739c ("i40e: Use irq_update_affinity_hint()"). [1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfdd Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Sunil Goutham <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.com Signed-off-by: Jakub Kicinski <[email protected]>
* ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()Larysa Zaremba2024-06-061-7/+7
| | | | | | | | | | | | | | | | | | | ice_pf_dcb_recfg() re-maps queues to vectors with ice_vsi_map_rings_to_vectors(), which does not restore the previous state for XDP queues. This leads to no AF_XDP traffic after rebuild. Map XDP queues to vectors in ice_vsi_map_rings_to_vectors(). Also, move the code around, so XDP queues are mapped independently only through .ndo_bpf(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <[email protected]>
* ice: add flag to distinguish reset from .ndo_bpf in XDP rings configLarysa Zaremba2024-06-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved. Also, dependency on the PF state reset flags creates an additional, fortunately less common problem: * PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success. Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <[email protected]>
* ice: remove af_xdp_zc_qps bitmapLarysa Zaremba2024-06-061-8/+0
| | | | | | | | | | | | | | | | | | | | | | | Referenced commit has introduced a bitmap to distinguish between ZC and copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this for us. The bitmap would be especially useful when restoring previous state after rebuild, if only it was not reallocated in the process. This leads to e.g. xdpsock dying after changing number of queues. Instead of preserving the bitmap during the rebuild, remove it completely and distinguish between ZC and copy-mode queues based on the presence of a device associated with the pool. Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap") Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <[email protected]>
* ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsiMateusz Polchlopek2024-05-061-22/+11
| | | | | | | | | | | | | | | Refactor struct ice_vsi_cfg_params to be embedded into struct ice_vsi. Prior to that the members of the struct were scattered around ice_vsi, and were copy-pasted for purposes of reinit. Now we have struct handy, and it is easier to have something sticky in the flags field. Suggested-by: Przemek Kitszel <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Vaishnavi Tipireddy <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* ice: move devlink port code to a separate filePiotr Raczynski2024-04-011-1/+0
| | | | | | | | | | | | | | | Keep devlink related code in a separate file. More devlink port code and handlers will be added here for other port operations. Remove no longer needed include of our devlink.h in ice_lib.c. Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Piotr Raczynski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* ice: move ice_devlink.[ch] to devlink folderMichal Swiatkowski2024-04-011-1/+1
| | | | | | | | | | | | | | | | Only moving whole files, fixing Makefile and bunch of includes. Some changes to ice_devlink file was done even in representor part (Tx topology), so keep it as final patch to not mess up with rebasing. After moving to devlink folder there is no need to have such long name for these files. Rename them to simple devlink. Reviewed-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* Merge branch '100GbE' of ↵Jakub Kicinski2024-03-291-48/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: use less resources in switchdev Michal Swiatkowski says: Switchdev is using one queue per created port representor. This can quickly lead to Rx queue shortage, as with subfunction support user can create high number of PRs. Save one MSI-X and 'number of PRs' * 1 queues. Refactor switchdev slow-path to use less resources (even no additional resources). Do this by removing control plane VSI and move its functionality to PF VSI. Even with current solution PF is acting like uplink and can't be used simultaneously for other use cases (adding filters can break slow-path). In short, do Tx via PF VSI and Rx packets using PF resources. Rx needs additional code in interrupt handler to choose correct PR netdev. Previous solution had to queue filters, it was way more elegant but needed one queue per PRs. Beside that this refactor mostly simplifies switchdev configuration. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: count representor stats ice: do switchdev slow-path Rx using PF VSI ice: change repr::id values ice: remove switchdev control plane VSI ice: control default Tx rule in lag ice: default Tx rule instead of to queue ice: do Tx through PF netdev in slow-path ice: remove eswitch changing queues algorithm ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: remove switchdev control plane VSIMichal Swiatkowski2024-03-251-48/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | For slow-path Rx and Tx PF VSI is used. There is no need to have control plane VSI. Remove all code related to it. Eswitch rebuild can't fail without rebuilding control plane VSI. Return void from ice_eswitch_rebuild(). Reviewed-by: Marcin Szycik <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: fix memory corruption bug with suspend and rebuildJesse Brandeburg2024-03-251-9/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ice driver would previously panic after suspend. This is caused from the driver *only* calling the ice_vsi_free_q_vectors() function by itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def(). This further causes the ice_rebuild() function to allocate a zero length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero length buffer, which corrupts memory. The fix entails making sure all the code referencing num_q_vectors only does so after it has been reset via ice_vsi_cfg_def(). I didn't perform a full bisect, but I was able to test against 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so sometime since then, this problem was introduced. Also clean up an un-needed init of a local variable in the function being modified. PANIC from 6.8.0-rc1: [1026674.915596] PM: suspend exit [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1 [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace: [1026677.388023] <TASK> [1026677.390589] ? __die+0x20/0x70 [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") Reported-by: Robert Elliott <[email protected]> Signed-off-by: Jesse Brandeburg <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* Merge branch '100GbE' of ↵David S. Miller2024-03-111-0/+37
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <[email protected]>
| * ice: Implement RSS settings for GTP using ethtoolTakeru Hayasaka2024-03-061-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following the addition of new GTP RSS hash options to ethtool.h, this patch implements the corresponding RSS settings for GTP packets in the Intel ice driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4 and IPv6, utilizing the newly defined hash options. The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t, gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load distribution for GTP traffic across multiple processing units. Signed-off-by: Takeru Hayasaka <[email protected]> Reviewed-by: Marcin Szycik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-03-071-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: fix typo in assignmentJesse Brandeburg2024-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | Fix an obviously incorrect assignment, created with a typo or cut-n-paste error. Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays") Signed-off-by: Jesse Brandeburg <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: do not disable Tx queues twice in ice_down()Maciej Fijalkowski2024-03-041-55/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | ice_down() clears QINT_TQCTL_CAUSE_ENA_M bit twice, which is not necessary. First clearing happens in ice_vsi_dis_irq() and second in ice_vsi_stop_tx_ring() - remove the first one. While at it, make ice_vsi_dis_irq() static as ice_down() is the only current caller of it. Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-02-291-17/+69
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: Fix ASSERT_RTNL() warning during certain scenariosAmritha Nambiar2024-02-201-17/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") invoked the netif_queue_set_napi() call. This kernel function requires to be called with rtnl_lock taken, otherwise ASSERT_RTNL() warning will be triggered. ice_vsi_rebuild() initiating this call is under rtnl_lock when the rebuild is in response to configuration changes from external interfaces (such as tc, ethtool etc. which holds the lock). But, the VSI rebuild generated from service tasks and resets (PFR/CORER/GLOBR) is not under rtnl lock protection. Handle these cases as well to hold lock before the kernel call (by setting the 'locked' boolean to false). netif_queue_set_napi() is also used to clear previously set napi in the q_vector unroll flow. Handle this for locked/lockless execution paths. Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Signed-off-by: Amritha Nambiar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: make ice_vsi_cfg_txq() staticMaciej Fijalkowski2024-02-021-73/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, XSK control path in ice driver calls directly ice_vsi_cfg_txq() whereas we have ice_vsi_cfg_single_txq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_txq() static. ice_vsi_cfg_txq() resides in ice_base.c and is rather big, so to reduce the code churn let us move the callers of it from ice_lib.c to ice_base.c. This change puts ice_qp_ena() on nice diet due to the checks and operations that ice_vsi_cfg_single_{r,t}xq() do internally. add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-182 (-182) Function old new delta ice_xsk_pool_setup 2165 1983 -182 Total: Before=472597, After=472415, chg -0.04% Signed-off-by: Maciej Fijalkowski <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: make ice_vsi_cfg_rxq() staticMaciej Fijalkowski2024-02-021-56/+0
|/ | | | | | | | | | | | | | | | Currently, XSK control path in ice driver calls directly ice_vsi_cfg_rxq() whereas we have ice_vsi_cfg_single_rxq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_rxq() static. ice_vsi_cfg_rxq() resides in ice_base.c and is rather big, so to reduce the code churn let us move two callers of it from ice_lib.c to ice_base.c. Signed-off-by: Maciej Fijalkowski <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>