path: root/drivers/net/ethernet/intel/ice/ice_main.c
Commit message    Author    Age    Files    Lines
* ice: fix NULL access of tx->in_use in ice_ll_ts_intr    Jacob Keller    2025-09-02    1    -5/+7
|
|   Recent versions of the E810 firmware have support for an extra interrupt to handle report of the "low latency" Tx timestamps coming from the specialized low latency firmware interface. Instead of polling the registers, software can wait until the low latency interrupt is fired.
|
|   This logic makes use of the Tx timestamp tracking structure, ice_ptp_tx, as it uses the same "ready" bitmap to track which Tx timestamps complete.
|
|   Unfortunately, the ice_ll_ts_intr() function does not check if the tracker is initialized before its first access. This results in NULL dereference or use-after-free bugs similar to the issues fixed in the ice_ptp_ts_irq() function.
|
|   Fix this by only checking the in_use bitmap (and other fields) if the tracker is marked as initialized. The reset flow will clear the init field under lock before it tears the tracker down, thus preventing any use-after-free or NULL access.
|
|   Fixes: 82e71b226e0e ("ice: Enable SW interrupt from FW for LL TS")
|   Signed-off-by: Jacob Keller <[email protected]>
|   Reviewed-by: Paul Menzel <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
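The guard described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: `tx_tracker_sketch` and `tracker_count_in_use()` are hypothetical stand-ins, not the driver's `ice_ptp_tx` or `ice_ll_ts_intr()`, and the tracker lock the commit mentions is left out so the snippet stays self-contained.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified stand-in for a Tx timestamp tracker. */
struct tx_tracker_sketch {
	bool init;             /* cleared before the tracker is torn down */
	unsigned long *in_use; /* bitmap of outstanding timestamps; NULL after teardown */
	size_t len;            /* number of valid bits in in_use */
};

/*
 * Count outstanding timestamps, touching in_use only when the tracker is
 * marked initialized. In a driver this check would sit inside the tracker
 * lock; locking is omitted in this sketch.
 */
size_t tracker_count_in_use(const struct tx_tracker_sketch *tx)
{
	size_t count = 0;

	if (!tx->init)
		return 0; /* not set up, or being torn down: never dereference in_use */

	for (size_t i = 0; i < tx->len; i++) {
		unsigned long word = tx->in_use[i / SKETCH_BITS_PER_LONG];

		if (word & (1UL << (i % SKETCH_BITS_PER_LONG)))
			count++;
	}
	return count;
}
```

The point of the pattern is ordering: teardown clears `init` first, so any later reader that honors the flag returns early instead of following a freed or NULL bitmap pointer.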
* ice: don't leave device non-functional if Tx scheduler config fails    Jacob Keller    2025-08-25    1    -5/+11
|
|   The ice_cfg_tx_topo function attempts to apply Tx scheduler topology configuration based on NVM parameters, selecting either a 5 or 9 layer topology.
|
|   As part of this flow, the driver acquires the "Global Configuration Lock", which is a hardware resource associated with programming the DDP package to the device. This "lock" is implemented by firmware as a way to guarantee that only one PF can program the DDP for a device. Unlike a traditional lock, once a PF has acquired this lock, no other PF will be able to acquire it again (including that PF) until a CORER of the device. Future requests to acquire the lock report that global configuration has already completed.
|
|   The following flow is used to program the Tx topology:
|
|    * Read the DDP package for scheduler configuration data
|    * Acquire the global configuration lock
|    * Program Tx scheduler topology according to DDP package data
|    * Trigger a CORER which clears the global configuration lock
|
|   This is followed by the flow for programming the DDP package:
|
|    * Acquire the global configuration lock (again)
|    * Download the DDP package to the device
|    * Release the global configuration lock.
|
|   However, if configuration of the Tx topology fails (i.e. ice_get_set_tx_topo returns an error code), the driver exits ice_cfg_tx_topo() immediately, and fails to trigger CORER.
|
|   While the global configuration lock is held, the firmware rejects most AdminQ commands, as it is waiting for the DDP package download (or Tx scheduler topology programming) to occur.
|
|   The current driver flows assume that the global configuration lock has been reset by CORER after programming the Tx topology. Thus, the same PF attempts to acquire the global lock again, and fails. This results in the driver reporting "an unknown error occurred when loading the DDP package". It then attempts to enter safe mode, but ultimately fails to finish ice_probe() since nearly all AdminQ commands report error codes, and the driver stops loading the device at some point during its initialization.
|
|   The only currently known way that ice_get_set_tx_topo() can fail is with certain older DDP packages which contain invalid topology configuration, on firmware versions which strictly validate this data. The most recent releases of the DDP have resolved the invalid data. However, it is still poor practice to essentially brick the device, and prevent access to the device even through safe mode or recovery mode. It is also plausible that this command could fail for some other reason in the future.
|
|   We cannot simply release the global lock after a failed call to ice_get_set_tx_topo(). Releasing the lock indicates to firmware that global configuration (downloading of the DDP) has completed. Future attempts by this or other PFs to load the DDP will fail with a report that the DDP package has already been downloaded. Then, PFs will enter safe mode as they realize that the package on the device does not meet the minimum version requirement to load. The reported error messages are confusing, as they indicate the version of the default "safe mode" package in the NVM, rather than the version of the file loaded from /lib/firmware.
|
|   Instead, we need to trigger CORER to clear global configuration. This is the lowest level of hardware reset which clears the global configuration lock and related state. It also clears any already downloaded DDP. Crucially, it does *not* clear the Tx scheduler topology configuration.
|
|   Refactor ice_cfg_tx_topo() to always trigger a CORER after acquiring the global lock, regardless of success or failure of the topology configuration.
|
|   We need to re-initialize the HW structure when we trigger the CORER. Thus, it makes sense for this to be the responsibility of ice_cfg_tx_topo() rather than its caller, ice_init_tx_topology(). This avoids needless re-initialization in cases where we don't attempt to update the Tx scheduler topology, such as if it has already been programmed.
|
|   There is one catch: failure to re-initialize the HW struct should stop ice_probe(). If this function fails, we won't have a valid HW structure and cannot ensure the device is functioning properly. To handle this, ensure ice_cfg_tx_topo() returns a limited set of error codes. Set aside one specifically, -ENODEV, to indicate that ice_init_tx_topology() should fail and stop probe.
|
|   Other error codes indicate failure to apply the Tx scheduler topology. This is treated as a non-fatal error, with an informational message informing the system administrator that the updated Tx topology did not apply. This allows the device to load and function with the default Tx scheduler topology, rather than failing to load entirely.
|
|   Note that this use of CORER will not result in loops with future PFs attempting to also load the invalid Tx topology configuration. The first PF will acquire the global configuration lock as part of programming the DDP. Each PF after this will attempt to acquire the global lock as part of programming the Tx topology, and will fail with the indication from firmware that global configuration is already complete. Tx scheduler topology configuration is only performed during driver init (probe or devlink reload) and not during cleanup for a CORER that happens after probe completes.
|
|   Fixes: 91427e6d9030 ("ice: Support 5 layer topology")
|   Signed-off-by: Jacob Keller <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
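The error-code contract in this commit (always reset after taking the lock, reserve -ENODEV for "stop probe", treat everything else as non-fatal) can be modeled with a small userspace sketch. All names here (`hw_sketch`, `cfg_tx_topo_sketch()`, `probe_should_abort()`) are hypothetical and only mimic the described control flow; they are not the driver's functions, and -EIO is an arbitrary stand-in for a topology failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the hardware state relevant to the flow. */
struct hw_sketch {
	bool topo_cmd_ok; /* would programming the Tx topology succeed? */
	bool reinit_ok;   /* would HW re-initialization after the reset succeed? */
	bool lock_held;   /* global configuration lock currently held */
	int corer_count;  /* number of core resets triggered */
};

/*
 * Once the lock is acquired, a core reset is triggered on both the success
 * and the failure path, so the lock never stays held. -ENODEV is reserved
 * for "HW structure could not be re-initialized".
 */
int cfg_tx_topo_sketch(struct hw_sketch *hw)
{
	int err = 0;

	hw->lock_held = true;          /* acquire global configuration lock */
	if (!hw->topo_cmd_ok)
		err = -EIO;            /* topology programming failed */

	hw->corer_count++;             /* reset regardless of err */
	hw->lock_held = false;         /* the reset clears the lock */

	if (!hw->reinit_ok)
		return -ENODEV;        /* fatal: no valid HW structure */
	return err;                    /* 0 or a non-fatal topology error */
}

/* Caller side: only -ENODEV stops probe; other errors fall back to defaults. */
bool probe_should_abort(int err)
{
	return err == -ENODEV;
}
```

With this split, a caller can log any other nonzero return as "updated Tx topology not applied" and continue loading with the default scheduler topology.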
* ice: use libie_aq_str    Michal Swiatkowski    2025-07-24    1    -59/+10
|
|   Simple:
|   s/ice_aq_str/libie_aq_str
|
|   Add libie_adminq module in ice Kconfig.
|
|   Reviewed-by: Przemek Kitszel <[email protected]>
|   Signed-off-by: Michal Swiatkowski <[email protected]>
|   Reviewed-by: Aleksandr Loktionov <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
* i40e: use libie adminq descriptors    Michal Swiatkowski    2025-07-24    1    -0/+4
|
|   Use libie_aq_desc instead of i40e_aq_desc. Make the changes needed to allow a clean build.
|
|   The get version descriptor is a little less detailed on i40e. To avoid messing with shifting or a union inside the libie descriptor, use the get version descriptor from i40e.
|
|   Move additional caps for i40e to libie.
|
|   Fix RCT in declarations that use libie_aq_desc.
|
|   Use libie_aq_raw() wherever it can be used.
|
|   The libie aq error is extended; cover it in the ice driver just to keep the build clean. In later patches the libie code for that will be used in each Intel driver.
|
|   Reviewed-by: Przemek Kitszel <[email protected]>
|   Signed-off-by: Michal Swiatkowski <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
* ice, libie: move generic adminq descriptors to lib    Michal Swiatkowski    2025-07-24    1    -19/+19
|
|   The descriptor structure is the same in ice, ixgbe and i40e. Move it to a common libie header to use it across different drivers. Leave device specific adminq commands in separate folders. This leads to a change in how a descriptor is filled or read:
|
|   - previous:
|       struct specific_desc *cmd;
|       cmd = &desc.params.specific_desc;
|   - now:
|       struct specific_desc *cmd;
|       cmd = libie_aq_raw(&desc);
|
|   Make these changes across the driver to allow a clean build. The casting only has to be done for specific descriptors; for generic ones the union can still be used.
|
|   Changes besides moving code:
|   - change ICE_ prefix to LIBIE_ prefix (ice_ and libie_ too)
|   - remove shift variables not otherwise needed (in libie_aq_flags)
|   - fill/get descriptor data based on desc.params.raw whenever the descriptor isn't defined in libie
|   - move defines from the libie_aq_sth structure outside
|   - add libie_aq_raw helper and use it instead of explicit casting
|
|   Reviewed-by: Przemek Kitszel <[email protected]>
|   Reviewed-by: Aleksandr Loktionov <[email protected]>
|   Signed-off-by: Michal Swiatkowski <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
* ice: add E835 device IDs    Dawid Osuchowski    2025-07-18    1    -0/+9
|
|   E835 is an enhanced version of the E830. It continues to use the same set of commands, registers and interfaces as other devices in the 800 Series.
|
|   Following device IDs are added:
|   - 0x1248: Intel(R) Ethernet Controller E835-CC for backplane
|   - 0x1249: Intel(R) Ethernet Controller E835-CC for QSFP
|   - 0x124A: Intel(R) Ethernet Controller E835-CC for SFP
|   - 0x1261: Intel(R) Ethernet Controller E835-C for backplane
|   - 0x1262: Intel(R) Ethernet Controller E835-C for QSFP
|   - 0x1263: Intel(R) Ethernet Controller E835-C for SFP
|   - 0x1265: Intel(R) Ethernet Controller E835-L for backplane
|   - 0x1266: Intel(R) Ethernet Controller E835-L for QSFP
|   - 0x1267: Intel(R) Ethernet Controller E835-L for SFP
|
|   Reviewed-by: Konrad Knitter <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Signed-off-by: Dawid Osuchowski <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
* ice: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()    Vladimir Oltean    2025-07-03    1    -22/+2
|
|   New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6.
|
|   It is time to convert the Intel ice driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely.
|
|   Signed-off-by: Vladimir Oltean <[email protected]>
|   Acked-by: Jacob Keller <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Reviewed-by: Vadim Fedorenko <[email protected]>
|   Reviewed-by: Milena Olech <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
* udp_tunnel: remove rtnl_lock dependency    Stanislav Fomichev    2025-06-19    1    -1/+0
|
|   Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because of udp_tunnel's RTNL dependency. Introduce new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep).
|
|   Cover more places in v4:
|
|   - netlink
|     - udp_tunnel_notify_add_rx_port (ndo_open)
|       - triggers udp_tunnel_nic_device_sync_work
|     - udp_tunnel_notify_del_rx_port (ndo_stop)
|       - triggers udp_tunnel_nic_device_sync_work
|     - udp_tunnel_get_rx_info (__netdev_update_features)
|       - triggers NETDEV_UDP_TUNNEL_PUSH_INFO
|     - udp_tunnel_drop_rx_info (__netdev_update_features)
|       - triggers NETDEV_UDP_TUNNEL_DROP_INFO
|     - udp_tunnel_nic_reset_ntf (ndo_open)
|   - notifiers
|     - udp_tunnel_nic_netdevice_event, depending on the event:
|       - triggers NETDEV_UDP_TUNNEL_PUSH_INFO
|       - triggers NETDEV_UDP_TUNNEL_DROP_INFO
|   - ethnl_tunnel_info_reply_size
|   - udp_tunnel_nic_set_port_priv (two intel drivers)
|
|   Cc: Michael Chan <[email protected]>
|   Suggested-by: Jakub Kicinski <[email protected]>
|   Signed-off-by: Stanislav Fomichev <[email protected]>
|   Link: https://patch.msgid.link/[email protected]
|   Signed-off-by: Jakub Kicinski <[email protected]>
* ice: add phase offset monitor for all PPS dpll inputs    Arkadiusz Kubalewski    2025-06-14    1    -0/+4
|
|   Implement a new admin command and helper function to handle and obtain CGU measurements for input pins.
|
|   Add new callback operations to control the dpll device-level feature "phase offset monitor," allowing it to be enabled or disabled. If the feature is enabled, provide users with measured phase offsets and notifications.
|
|   Initialize PPS DPLL with new callback operations if the feature is supported by the firmware.
|
|   Reviewed-by: Milena Olech <[email protected]>
|   Signed-off-by: Arkadiusz Kubalewski <[email protected]>
|   Acked-by: Vadim Fedorenko <[email protected]>
|   Link: https://patch.msgid.link/[email protected]
|   Signed-off-by: Jakub Kicinski <[email protected]>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net    Jakub Kicinski    2025-06-12    1    -1/+1
|\
| |   Cross-merge networking fixes after downstream PR (net-6.16-rc2).
| |
| |   No conflicts or adjacent changes.
| |
| |   Signed-off-by: Jakub Kicinski <[email protected]>
| * treewide, timers: Rename from_timer() to timer_container_of()    Ingo Molnar    2025-06-08    1    -1/+1
| |
| |   Move this API to the canonical timer_*() namespace.
| |
| |   [ tglx: Redone against pre rc1 ]
| |
| |   Signed-off-by: Ingo Molnar <[email protected]>
| |   Signed-off-by: Thomas Gleixner <[email protected]>
| |   Link: https://lore.kernel.org/all/[email protected]
* | ice: add link_down_events statistic    Martyna Szapar-Mudlaw    2025-06-09    1    -0/+3
|/
|   Introduce a link_down_events counter to the ice driver, incremented each time the link transitions from up to down. This counter can help diagnose issues related to link stability, such as port flapping or unexpected link drops.
|
|   The value is exposed via ethtool's get_link_ext_stats() interface.
|
|   Reviewed-by: Kory Maincent <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Signed-off-by: Tony Nguyen <[email protected]>
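The counting rule in this commit (increment only on an up-to-down transition) is edge detection on the link state. A minimal sketch, assuming a hypothetical `link_sketch` structure rather than the driver's own port state:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port link state; field names are illustrative. */
struct link_sketch {
	bool up;                   /* last observed link state */
	uint32_t link_down_events; /* number of up -> down transitions seen */
};

/* Record a new link state and count it only if the link just went down. */
void link_sketch_update(struct link_sketch *ls, bool now_up)
{
	if (ls->up && !now_up)
		ls->link_down_events++;
	ls->up = now_up;
}
```

Repeated "down" reports while the link is already down do not increase the counter, which is what makes the value useful for spotting flapping rather than just downtime.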
* ice: fix Tx scheduler error handling in XDP callback    Michal Kubiak    2025-05-30    1    -14/+33
|
|   When the XDP program is loaded, the XDP callback adds new Tx queues. This means that the callback must update the Tx scheduler with the new queue number. In the event of a Tx scheduler failure, the XDP callback should also fail and roll back any changes previously made for XDP preparation.
|
|   The previous implementation had a bug that not all changes made by the XDP callback were rolled back. This caused the crash with the following call trace:
|
|   [ +9.549584] ice 0000:ca:00.0: Failed VSI LAN queue config for XDP, error: -5
|   [ +0.382335] Oops: general protection fault, probably for non-canonical address 0x50a2250a90495525: 0000 [#1] SMP NOPTI
|   [ +0.010710] CPU: 103 UID: 0 PID: 0 Comm: swapper/103 Not tainted 6.14.0-net-next-mar-31+ #14 PREEMPT(voluntary)
|   [ +0.010175] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
|   [ +0.010946] RIP: 0010:__ice_update_sample+0x39/0xe0 [ice]
|   [...]
|   [ +0.002715] Call Trace:
|   [ +0.002452] <IRQ>
|   [ +0.002021] ? __die_body.cold+0x19/0x29
|   [ +0.003922] ? die_addr+0x3c/0x60
|   [ +0.003319] ? exc_general_protection+0x17c/0x400
|   [ +0.004707] ? asm_exc_general_protection+0x26/0x30
|   [ +0.004879] ? __ice_update_sample+0x39/0xe0 [ice]
|   [ +0.004835] ice_napi_poll+0x665/0x680 [ice]
|   [ +0.004320] __napi_poll+0x28/0x190
|   [ +0.003500] net_rx_action+0x198/0x360
|   [ +0.003752] ? update_rq_clock+0x39/0x220
|   [ +0.004013] handle_softirqs+0xf1/0x340
|   [ +0.003840] ? sched_clock_cpu+0xf/0x1f0
|   [ +0.003925] __irq_exit_rcu+0xc2/0xe0
|   [ +0.003665] common_interrupt+0x85/0xa0
|   [ +0.003839] </IRQ>
|   [ +0.002098] <TASK>
|   [ +0.002106] asm_common_interrupt+0x26/0x40
|   [ +0.004184] RIP: 0010:cpuidle_enter_state+0xd3/0x690
|
|   Fix this by performing the missing unmapping of XDP queues from q_vectors and setting the XDP rings pointer back to NULL after all those queues are released. Also, add an immediate exit from the XDP callback in case of ring preparation failure.
|
|   Fixes: efc2214b6047 ("ice: Add support for XDP")
|   Reviewed-by: Dawid Osuchowski <[email protected]>
|   Reviewed-by: Przemek Kitszel <[email protected]>
|   Reviewed-by: Jacob Keller <[email protected]>
|   Signed-off-by: Michal Kubiak <[email protected]>
|   Reviewed-by: Aleksandr Loktionov <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Tested-by: Jesse Brandeburg <[email protected]>
|   Tested-by: Saritha Sanigani <[email protected]> (A Contingent Worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
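The rollback this commit describes follows the usual unwind pattern: every step taken during XDP preparation has a matching undo on the failure path, and the rings pointer ends up NULL again. The sketch below is a simplified userspace model; `vsi_sketch` and `prepare_xdp_sketch()` are hypothetical names, and the scheduler failure is simulated with a flag rather than a real queue-configuration call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced view of the state touched during XDP preparation. */
struct vsi_sketch {
	int *xdp_rings;        /* XDP Tx rings; must be NULL when XDP is not set up */
	int num_xdp_txq;       /* number of XDP Tx queues */
	int mapped_to_vectors; /* rings currently mapped to interrupt vectors */
};

/*
 * Prepare XDP queues. If the (simulated) Tx scheduler update fails, undo
 * every earlier step so no stale ring mapping is left behind.
 */
int prepare_xdp_sketch(struct vsi_sketch *vsi, int nq, bool sched_cfg_ok)
{
	int err;

	vsi->xdp_rings = calloc((size_t)nq, sizeof(*vsi->xdp_rings));
	if (!vsi->xdp_rings)
		return -ENOMEM;        /* ring preparation failed: exit at once */
	vsi->num_xdp_txq = nq;
	vsi->mapped_to_vectors = nq;   /* map rings to vectors */

	if (!sched_cfg_ok) {
		err = -EIO;            /* scheduler rejected the new queues */
		goto unwind;
	}
	return 0;

unwind:
	vsi->mapped_to_vectors = 0;    /* unmap rings from vectors */
	free(vsi->xdp_rings);
	vsi->xdp_rings = NULL;         /* pointer back to NULL after release */
	vsi->num_xdp_txq = 0;
	return err;
}
```

In the crash above, the polling path kept using ring state that the failed setup had left mapped; clearing both the mapping and the pointer on the error path is what removes that stale reference.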
* Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux    Jakub Kicinski    2025-05-13    1    -8/+10
|\
| |   Tony Nguyen says:
| |
| |   ====================
| |   Prepare for Intel IPU E2000 (GEN3)
| |
| |   This is the first part in introducing RDMA support for idpf.
| |
| |   ----------------------------------------------------------------
| |   Tatyana Nikolova says:
| |
| |   To align with review comments, the patch series introducing RDMA RoCEv2 support for the Intel Infrastructure Processing Unit (IPU) E2000 line of products is going to be submitted in three parts:
| |
| |   1. Modify ice to use specific and common IIDC definitions and pass a core device info to irdma.
| |
| |   2. Add RDMA support to idpf and modify idpf to use specific and common IIDC definitions and pass a core device info to irdma.
| |
| |   3. Add RDMA RoCEv2 support for the E2000 products, referred to as GEN3 to irdma.
| |
| |   This first part is a 5 patch series based on the original "iidc/ice/irdma: Update IDC to support multiple consumers" patch to allow for multiple CORE PCI drivers, using the auxbus.
| |
| |   Patches:
| |   1) Move header file to new name for clarity and replace ice specific DSCP define with a kernel equivalent one in irdma
| |   2) Unify naming convention
| |   3) Separate header file into common and driver specific info
| |   4) Replace ice specific DSCP define with a kernel equivalent one in ice
| |   5) Implement core device info struct and update drivers to use it
| |   ----------------------------------------------------------------
| |
| |   v1: https://lore.kernel.org/[email protected]
| |
| |   IWL reviews:
| |   [v5] https://lore.kernel.org/[email protected]
| |   [v4] https://lore.kernel.org/[email protected]
| |   [v3] https://lore.kernel.org/[email protected]
| |   [v2] https://lore.kernel.org/[email protected]
| |   [v1] https://lore.kernel.org/[email protected]
| |
| |   * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux:
| |     iidc/ice/irdma: Update IDC to support multiple consumers
| |     ice: Replace ice specific DSCP mapping num with a kernel define
| |     iidc/ice/irdma: Break iidc.h into two headers
| |     iidc/ice/irdma: Rename to iidc_* convention
| |     iidc/ice/irdma: Rename IDC header file
| |   ====================
| |
| |   Link: https://patch.msgid.link/[email protected]
| |   Signed-off-by: Jakub Kicinski <[email protected]>
| * iidc/ice/irdma: Update IDC to support multiple consumers    Dave Ertman    2025-05-09    1    -4/+6
| |
| |   In preparation of supporting more than a single core PCI driver for RDMA, move ice specific structs like qset_params, qos_info and qos_params from iidc_rdma.h to iidc_rdma_ice.h.
| |
| |   Previously, the ice driver was just exporting its entire PF struct to the auxiliary driver, but since each core driver will have its own different PF struct, implement a universal struct that all core drivers can provide to the auxiliary driver through the probe call.
| |
| |   Reviewed-by: Przemek Kitszel <[email protected]>
| |   Signed-off-by: Dave Ertman <[email protected]>
| |   Co-developed-by: Mustafa Ismail <[email protected]>
| |   Signed-off-by: Mustafa Ismail <[email protected]>
| |   Co-developed-by: Shiraz Saleem <[email protected]>
| |   Signed-off-by: Shiraz Saleem <[email protected]>
| |   Co-developed-by: Tatyana Nikolova <[email protected]>
| |   Signed-off-by: Tatyana Nikolova <[email protected]>
| |   Signed-off-by: Tony Nguyen <[email protected]>
| * iidc/ice/irdma: Rename to iidc_* convention    Dave Ertman    2025-04-30    1    -4/+4
| |
| |   In preparation of supporting more than a single core PCI driver for RDMA, homogenize naming to iidc_rdma_* and IIDC_RDMA_* form.
| |
| |   Reviewed-by: Przemek Kitszel <[email protected]>
| |   Signed-off-by: Dave Ertman <[email protected]>
| |   Signed-off-by: Tatyana Nikolova <[email protected]>
| |   Signed-off-by: Tony Nguyen <[email protected]>
* | ice: support egress drop rules on PF    Larysa Zaremba    2025-04-11    1    -8/+55
|/
|   tc clsact qdisc allows us to add offloaded egress rules with commands such as the following one:
|
|   tc filter add dev <ifname> egress protocol lldp flower skip_sw action drop
|
|   Support the egress rule drop action when added to PF, with a few caveats:
|   * in switchdev mode, all PF traffic has to go uplink with an exception for LLDP that can be delegated to a single VSI at a time
|   * in legacy mode, we cannot delegate LLDP functionality to another VSI, so such packets from PF should not be blocked.
|
|   Also, simplify the rule direction logic, it was previously derived from actions, but actually can be inherited from the tc block (and flipped in case of port representors).
|
|   Reviewed-by: Michal Swiatkowski <[email protected]>
|   Signed-off-by: Larysa Zaremba <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Tested-by: Rafal Romanowski <[email protected]>
|   Signed-off-by: Tony Nguyen <[email protected]>
* treewide: Switch/rename to timer_delete[_sync]()    Thomas Gleixner    2025-04-05    1    -1/+1
|
|   timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines.
|
|   Conversion was done with coccinelle plus manual fixups where necessary.
|
|   Signed-off-by: Thomas Gleixner <[email protected]>
|   Signed-off-by: Ingo Molnar <[email protected]>
* ice: Add E830 checksum offload support    Paul Greenwalt    2025-03-18    1    -0/+18
|
|   E830 supports raw receive and generic transmit checksum offloads.
|
|   Raw receive checksum support is provided by hardware calculating the checksum over the whole packet, regardless of type. The calculated checksum is provided to driver in the Rx flex descriptor. Then the driver assigns the checksum to skb->csum and sets skb->ip_summed to CHECKSUM_COMPLETE.
|
|   Generic transmit checksum support is provided by hardware calculating the checksum given two offsets: the start offset to begin checksum calculation, and the offset to insert the calculated checksum in the packet. Support is advertised to the stack using NETIF_F_HW_CSUM feature.
|
|   E830 has the following limitations when both generic transmit checksum offload and TCP Segmentation Offload (TSO) are enabled:
|
|   1. Inner packet header modification is not supported. This restriction includes the inability to alter TCP flags, such as the push flag. As a result, this limitation can impact the receiver's ability to coalesce packets, potentially degrading network throughput.
|   2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents support of Maximum Transmission Unit (MTU) greater than 1063 bytes.
|
|   Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().
|
|   When NETIF_F_HW_CSUM is requested the provided skb->csum_start and skb->csum_offset are passed to hardware in the Tx context descriptor generic checksum (GCS) parameters. Hardware calculates the 1's complement from skb->csum_start to the end of the packet, and inserts the result in the packet at skb->csum_offset.
|
|   Co-developed-by: Alice Michael <[email protected]>
|   Signed-off-by: Alice Michael <[email protected]>
|   Co-developed-by: Eric Joyner <[email protected]>
|   Signed-off-by: Eric Joyner <[email protected]>
|   Signed-off-by: Paul Greenwalt <[email protected]>
|   Reviewed-by: Simon Horman <[email protected]>
|   Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
|   Signed-off-by: Tony Nguyen <[email protected]>
|   Link: https://patch.msgid.link/[email protected]
|   Signed-off-by: Paolo Abeni <[email protected]>
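The "two offsets" model of generic transmit checksum can be illustrated in software. The sketch below is an assumption-laden model, not the hardware's or the driver's implementation: it sums 16-bit big-endian words from a start offset to the end of the buffer, folds the carries, and stores the complement at an insertion offset. The function names and the use of absolute byte offsets are hypothetical, and the checksum field is assumed to be zero beforehand (a real stack may pre-seed it).

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum over buf[start..len), big-endian words. */
uint16_t ones_complement_sum(const uint8_t *buf, size_t start, size_t len)
{
	uint32_t sum = 0;
	size_t i = start;

	for (; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8; /* odd trailing byte, zero padded */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
	return (uint16_t)sum;
}

/*
 * Compute the checksum from csum_start to the end of the packet and write
 * it (big-endian) at insert_at. Both offsets are absolute byte offsets in
 * this sketch.
 */
void insert_generic_csum(uint8_t *buf, size_t len, size_t csum_start,
			 size_t insert_at)
{
	uint16_t csum = (uint16_t)~ones_complement_sum(buf, csum_start, len);

	buf[insert_at] = (uint8_t)(csum >> 8);
	buf[insert_at + 1] = (uint8_t)(csum & 0xff);
}
```

A quick sanity check for this kind of checksum: after insertion, summing the same range again yields 0xffff, which is how a receiver verifies it. Because the computation depends only on the two offsets and not on the protocol, the same mechanism covers any header layout the stack describes this way.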
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net    Paolo Abeni    2025-03-13    1    -2/+2
|\
| |   Cross-merge networking fixes after downstream PR (net-6.14-rc6).
| |
| |   Conflicts:
| |
| |   tools/testing/selftests/drivers/net/ping.py
| |     75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
| |     de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver")
| |   https://lore.kernel.org/netdev/[email protected]/
| |
| |   net/core/devmem.c
| |     a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
| |     1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
| |   https://lore.kernel.org/netdev/[email protected]/
| |
| |   Adjacent changes:
| |
| |   tools/testing/selftests/net/Makefile
| |     6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
| |     2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4")
| |
| |   drivers/net/ethernet/broadcom/bnxt/bnxt.c
| |     661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
| |     fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
| |
| |   Signed-off-by: Paolo Abeni <[email protected]>
| * ice: register devlink prior to creating health reporters    Przemek Kitszel    2025-03-05    1    -2/+2
| |
| |   ice_health_init() was introduced in the commit 2a82874a3b7b ("ice: add Tx hang devlink health reporter"). The call to it should have been put after ice_devlink_register(). It went unnoticed until the next reporter, added by Konrad, which receives events from FW. FW is reporting all events, also from prior driver load, and thus it is not unlikely to have something at the very beginning. And that results in a splat:
| |
| |   [ 24.455950] ? devlink_recover_notify.constprop.0+0x198/0x1b0
| |   [ 24.455973] devlink_health_report+0x5d/0x2a0
| |   [ 24.455976] ? __pfx_ice_health_status_lookup_compare+0x10/0x10 [ice]
| |   [ 24.456044] ice_process_health_status_event+0x1b7/0x200 [ice]
| |
| |   Do the analogous thing for the deinit path.
| |
| |   Fixes: 85d6164ec56d ("ice: add fw and port health reporters")
| |   Reviewed-by: Aleksandr Loktionov <[email protected]>
| |   Reviewed-by: Michal Swiatkowski <[email protected]>
| |   Reviewed-by: Konrad Knitter <[email protected]>
| |   Signed-off-by: Przemek Kitszel <[email protected]>
| |   Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel)
| |   Signed-off-by: Tony Nguyen <[email protected]>
* | ice: use napi's irq affinity and rmap IRQ notifiers    Ahmed Zaki    2025-02-27    1    -44/+3
| |
| |   Delete the driver CPU affinity and aRFS rmap info, use the core's API instead.
| |
| |   Signed-off-by: Ahmed Zaki <[email protected]>
| |   Link: https://patch.msgid.link/[email protected]
| |   Signed-off-by: Jakub Kicinski <[email protected]>
* | ice: Implement PTP support for E830 devices    Michal Michalik    2025-02-10    1    -2/+7
| |
| |   Add specific functions and definitions for E830 devices to enable PTP support.
| |
| |   E830 devices support direct write to GLTSYN_ registers without shadow registers and 64 bit read of PHC time.
| |
| |   Enable PTM for E830 device, which is required for cross timestamp, and add a dependency on PCIE_PTM for ICE_HWTS.
| |
| |   Check X86_FEATURE_ART for E830 as it may not be present in the CPU.
| |
| |   Cc: Anna-Maria Behnsen <[email protected]>
| |   Cc: Frederic Weisbecker <[email protected]>
| |   Cc: Thomas Gleixner <[email protected]>
| |   Reviewed-by: Przemek Kitszel <[email protected]>
| |   Co-developed-by: Jacob Keller <[email protected]>
| |   Signed-off-by: Jacob Keller <[email protected]>
| |   Co-developed-by: Milena Olech <[email protected]>
| |   Signed-off-by: Milena Olech <[email protected]>
| |   Co-developed-by: Paul Greenwalt <[email protected]>
| |   Signed-off-by: Paul Greenwalt <[email protected]>
| |   Signed-off-by: Michal Michalik <[email protected]>
| |   Co-developed-by: Karol Kolacinski <[email protected]>
| |   Signed-off-by: Karol Kolacinski <[email protected]>
| |   Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |   Signed-off-by: Tony Nguyen <[email protected]>
* | ice: Process TSYN IRQ in a separate function    Karol Kolacinski    2025-02-10    1    -15/+1
| |
| |   Simplify TSYN IRQ processing by moving it to a separate function and having appropriate behavior per PHY model, instead of multiple conditions not related to HW, but to specific timestamping modes.
| |
| |   When PTP is not enabled in the kernel, don't process timestamps and return IRQ_HANDLED.
| |
| |   Reviewed-by: Przemek Kitszel <[email protected]>
| |   Signed-off-by: Karol Kolacinski <[email protected]>
| |   Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |   Signed-off-by: Tony Nguyen <[email protected]>
* | ice: init flow director before RDMA    Michal Swiatkowski    2025-02-05    1    -2/+4
|/
|   Flow director needs only one MSI-X. Load it before RDMA to save MSI-X for it.
|
|   Reviewed-by: Jacob Keller <[email protected]>
|   Tested-by: Pucha Himasekhar Reddy <[email protected]>
|   Signed-off-by: Michal Swiatkowski <[email protected]>
|   Signed-off-by: Tony Nguyen <[email protected]>
* Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue    Jakub Kicinski    2025-01-18    1    -0/+53
|\
| |   Tony Nguyen says:
| |
| |   ====================
| |   ice: support FW Recovery Mode
| |
| |   Konrad Knitter says:
| |
| |   Enable update of card in FW Recovery Mode
| |
| |   * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
| |     ice: support FW Recovery Mode
| |     devlink: add devl guard
| |     pldmfw: enable selected component update
| |   ====================
| |
| |   Link: https://patch.msgid.link/[email protected]
| |   Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: support FW Recovery Mode    Konrad Knitter    2025-01-16    1    -0/+53
| |
| |   Recovery Mode is intended to recover from a fatal failure scenario in which the device is not accessible to the host, meaning the firmware is non-responsive.
| |
| |   The purpose of the Firmware Recovery Mode is to enable software tools to update firmware and/or device configuration so the fatal error can be resolved.
| |
| |   Recovery Mode Firmware supports a limited set of admin commands required for NVM update. Recovery Firmware does not support hardware interrupts so a polling mode is used.
| |
| |   The driver will expose only the minimum set of devlink commands required for the recovery of the adapter.
| |
| |   Using an appropriate NVM image, the user can recover the adapter using the devlink flash API.
| |
| |   Prior to 4.20, E810 Adapter Recovery Firmware supports only the update and erase of the "fw.mgmt" component.
| |
| |   E810 Adapter Recovery Firmware doesn't support selected preservation of card settings or identifiers.
| |
| |   The following command can be used to recover the adapter:
| |
| |   $ devlink dev flash <pci-address> <update-image.bin> component fw.mgmt overwrite settings overwrite identifier
| |
| |   Newer FW versions (4.20 or newer) support update of "fw.undi" and "fw.netlist" components.
| |
| |   $ devlink dev flash <pci-address> <update-image.bin>
| |
| |   Tested on Intel Corporation Ethernet Controller E810-C for SFP FW revision 3.20 and 4.30.
| |
| |   Reviewed-by: Przemek Kitszel <[email protected]>
| |   Signed-off-by: Konrad Knitter <[email protected]>
| |   Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |   Signed-off-by: Tony Nguyen <[email protected]>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2025-01-161-3/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR (net-6.13-rc8). Conflicts: drivers/net/ethernet/realtek/r8169_main.c 1f691a1fc4be ("r8169: remove redundant hwmon support") 152d00a91396 ("r8169: simplify setting hwmon attribute visibility") https://lore.kernel.org/[email protected] Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c 152f4da05aee ("bnxt_en: add support for rx-copybreak ethtool command") f0aa6a37a3db ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref") drivers/net/ethernet/intel/ice/ice_type.h 50327223a8bb ("ice: add lock to protect low latency interface") dc26548d729e ("ice: Fix quad registers read on E825") Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: Add correct PHY lane assignmentKarol Kolacinski2025-01-131-3/+3
| |     The driver always naively assumes that, for PTP purposes, the PHY lane to
| |     configure corresponds to the PF ID.
| |
| |     This is not true for some port configurations, e.g.:
| |     - 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs
| |       are 0 and 1
| |     - 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are
| |       0 and 1
| |
| |     Use correct PHY lane assignment by getting and parsing port options.
| |     This is read from the NVM by the FW and provided to the driver with
| |     the indication of active port split.
| |
| |     Remove ice_is_muxed_topo(), which is no longer needed.
| |
| |     Fixes: 4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices")
| |     Reviewed-by: Przemek Kitszel <[email protected]>
| |     Reviewed-by: Arkadiusz Kubalewski <[email protected]>
| |     Signed-off-by: Karol Kolacinski <[email protected]>
| |     Signed-off-by: Grzegorz Nitka <[email protected]>
| |     Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |     Signed-off-by: Tony Nguyen <[email protected]>
* | ice: add fw and port health reportersKonrad Knitter2025-01-141-0/+3
| |     Firmware generates global events and port-specific events.
| |
| |     Driver shall subscribe for health status events from firmware on
| |     supported FW versions >= 1.7.6. Driver shall expose those under a
| |     specific health reporter; two new reporters are introduced:
| |     - FW health reporter shall represent global events (problems with the
| |       image, recovery mode);
| |     - Port health reporter shall represent port-specific events (module
| |       failure).
| |
| |     Firmware only reports problems when they are detected; it does not
| |     store a list of active faults.
| |
| |     Driver will hold only the last global and the last port-specific event.
| |     Driver will report all events via devlink health report, so in case of
| |     multiple events from the same source they can be reviewed using the
| |     devlink autodump feature.
| |
| |     $ devlink health
| |
| |     pci/0000:b1:00.3:
| |       reporter fw
| |         state healthy error 0 recover 0 auto_dump true
| |       reporter port
| |         state error error 1 recover 0 last_dump_date 2024-03-17
| |         last_dump_time 09:29:29 auto_dump true
| |
| |     $ devlink health diagnose pci/0000:b1:00.3 reporter port
| |
| |       Syndrome: 262
| |       Description: Module is not present.
| |       Possible Solution: Check that the module is inserted correctly.
| |       Port Number: 0
| |
| |     Tested on Intel Corporation Ethernet Controller E810-C for SFP
| |
| |     Reviewed-by: Marcin Szycik <[email protected]>
| |     Co-developed-by: Sharon Haroni <[email protected]>
| |     Signed-off-by: Sharon Haroni <[email protected]>
| |     Co-developed-by: Nicholas Nunley <[email protected]>
| |     Signed-off-by: Nicholas Nunley <[email protected]>
| |     Co-developed-by: Brett Creeley <[email protected]>
| |     Signed-off-by: Brett Creeley <[email protected]>
| |     Signed-off-by: Konrad Knitter <[email protected]>
| |     Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
| |     Signed-off-by: Tony Nguyen <[email protected]>
* | ice: ice_probe: init ice_adapter after HW initPrzemek Kitszel2025-01-141-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | Move ice_adapter initialization to be after HW init, so it could use HW capabilities, like number of PFs. This is needed for devlink-resource based RSS LUT size management for PF/VF (not in this series). Reviewed-by: Marcin Szycik <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: minor: rename goto labels from err to unrollPrzemek Kitszel2025-01-141-8/+8
| | | | | | | | | | | | | | | | | | | | | | Clean up goto labels after previous commit, to conform to single naming scheme in ice_probe() and ice_init_dev(). Reviewed-by: Marcin Szycik <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: split ice_init_hw() out from ice_init_dev()Przemek Kitszel2025-01-141-11/+11
| |     Split the ice_init_hw() call out from ice_init_dev(). This allows
| |     ice_init_hw() to be called even earlier on the call path, which in turn
| |     allows ice_adapter init to be moved between the two (in a subsequent
| |     commit), so that ice_adapter knows the number of PFs.
| |
| |     Do the same for ice_deinit_hw(), so the init and deinit calls can be
| |     easily mirrored.
| |     The next commit will rename unrelated goto labels to use the unroll
| |     prefix.
| |
| |     Reviewed-by: Marcin Szycik <[email protected]>
| |     Signed-off-by: Przemek Kitszel <[email protected]>
| |     Reviewed-by: Kalesh AP <[email protected]>
| |     Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |     Signed-off-by: Tony Nguyen <[email protected]>
* | ice: c827: move wait for FW to ice_init_hw()Przemek Kitszel2025-01-141-37/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move call to ice_wait_for_fw() from ice_init_dev() into ice_init_hw(), where it fits better. This requires also to move ice_wait_for_fw() to ice_common.c. ice_is_pf_c827() is now used only in ice_common.c, so it could be static. CC: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Marcin Szycik <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
* | ice: Add MDD logging via devlink healthBen Shelton2024-12-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a devlink health reporter for MDD events. The 'dump' handler will return the information captured in each call to ice_handle_mdd_event(). A device reset (CORER/PFR) will put the reporter back in healthy state. Signed-off-by: Ben Shelton <[email protected]> Reviewed-by: Igor Bagnucki <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Co-developed-by: Przemek Kitszel <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: add Tx hang devlink health reporterPrzemek Kitszel2024-12-171-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Tx hang devlink health reporter, see struct ice_tx_hang_event to see what exactly is reported. For now dump descriptors with little metadata and skb diagnostic information. Reviewed-by: Igor Bagnucki <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Co-developed-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* | ice: rename devlink_port.[ch] to port.[ch]Przemek Kitszel2024-12-171-1/+1
|/ | | | | | | | | | | | | Drop "devlink_" prefix from files that sit in devlink/. I'm going to add more files there, and repeating "devlink" does not feel good. This is also the scheme used in most other places, most notably the devlink core files are named like that. devlink.[ch] stays as is. Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
* Merge tag 'net-6.13-rc2' of ↵Linus Torvalds2024-12-051-3/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can and netfilter. Current release - regressions: - rtnetlink: fix double call of rtnl_link_get_net_ifla() - tcp: populate XPS related fields of timewait sockets - ethtool: fix access to uninitialized fields in set RXNFC command - selinux: use sk_to_full_sk() in selinux_ip_output() Current release - new code bugs: - net: make napi_hash_lock irq safe - eth: - bnxt_en: support header page pool in queue API - ice: fix NULL pointer dereference in switchdev Previous releases - regressions: - core: fix icmp host relookup triggering ip_rt_bug - ipv6: - avoid possible NULL deref in modify_prefix_route() - release expired exception dst cached in socket - smc: fix LGR and link use-after-free issue - hsr: avoid potential out-of-bound access in fill_frame_info() - can: hi311x: fix potential use-after-free - eth: ice: fix VLAN pruning in switchdev mode Previous releases - always broken: - netfilter: - ipset: hold module reference while requesting a module - nft_inner: incorrect percpu area handling under softirq - can: j1939: fix skb reference counting - eth: - mlxsw: use correct key block on Spectrum-4 - mlx5: fix memory leak in mlx5hws_definer_calc_layout" * tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) net :mana :Request a V2 response version for MANA_QUERY_GF_STAT net: avoid potential UAF in default_operstate() vsock/test: verify socket options after setting them vsock/test: fix parameter types in SO_VM_SOCKETS_* calls vsock/test: fix failures due to wrong SO_RCVLOWAT parameter net/mlx5e: Remove workaround to 
avoid syndrome for internal port net/mlx5e: SD, Use correct mdev to build channel param net/mlx5: E-Switch, Fix switching to switchdev mode in MPV net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled net/mlx5: HWS: Properly set bwc queue locks lock classes net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout bnxt_en: handle tpa_info in queue API implementation bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() bnxt_en: refactor tpa_info alloc/free into helpers geneve: do not assume mac header is set in geneve_xmit_skb() mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4 ethtool: Fix wrong mod state in case of verbose and no_mask bitset ipmr: tune the ipmr_can_free_table() checks. netfilter: nft_set_hash: skip duplicated elements pending gc run netfilter: ipset: Hold module reference while requesting a module ...
| * ice: Fix VLAN pruning in switchdev modeMarcin Szycik2024-12-031-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In switchdev mode the uplink VSI should receive all unmatched packets, including VLANs. Therefore, VLAN pruning should be disabled if uplink is in switchdev mode. It is already being done in ice_eswitch_setup_env(), however the addition of ice_up() in commit 44ba608db509 ("ice: do switchdev slow-path Rx using PF VSI") caused VLAN pruning to be re-enabled after disabling it. Add a check to ice_set_vlan_filtering_features() to ensure VLAN filtering will not be enabled if uplink is in switchdev mode. Note that ice_is_eswitch_mode_switchdev() is being used instead of ice_is_switchdev_running(), as the latter would only return true after the whole switchdev setup completes. Fixes: 44ba608db509 ("ice: do switchdev slow-path Rx using PF VSI") Reviewed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Priya Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
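The check described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: `vsi_sketch`, `eswitch_mode_sketch`, and `set_vlan_filtering_sketch()` are hypothetical stand-ins, not the driver's types or its ice_set_vlan_filtering_features() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; none of these are the driver's types. */
enum eswitch_mode_sketch { ESWITCH_LEGACY, ESWITCH_SWITCHDEV };

struct vsi_sketch {
	enum eswitch_mode_sketch eswitch_mode; /* configured mode, known early */
	bool switchdev_running;                /* true only after full setup */
	bool vlan_pruning;                     /* Rx VLAN pruning state */
};

/*
 * Apply the requested VLAN filtering state, except on an uplink whose
 * configured eswitch mode is switchdev: there pruning stays disabled so
 * that all unmatched packets, including VLAN-tagged ones, are received.
 */
static void set_vlan_filtering_sketch(struct vsi_sketch *vsi, bool want)
{
	assert(vsi);

	/* Test the configured mode, not switchdev_running, because the
	 * latter only becomes true once switchdev setup has completed. */
	if (vsi->eswitch_mode == ESWITCH_SWITCHDEV) {
		vsi->vlan_pruning = false;
		return;
	}

	vsi->vlan_pruning = want;
}
```

Keying the decision on the configured mode means a bring-up path that runs in the middle of switchdev setup cannot re-enable pruning.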
* | module: Convert symbol namespace to string literalPeter Zijlstra2024-12-021-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* Merge branch '100GbE' of ↵Jakub Kicinski2024-11-161-1/+32
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-11-05 (ice, ixgbe, igc. igb, igbvf, e1000) For ice: Mateusz refactors and adds additional SerDes configuration values to be output. Przemek refactors processing of DDP and adds support for a flag field in the DDP's signature segment header. Joe Damato adds support for persistent NAPI config. Brett adjusts setting of Tx promiscuous based on unicast/multicast setting. Jake moves setting of pf->supported_rxdids to occur directly after DDP load and changes a small struct to use stack memory. Frederic Weisbecker adds WQ_UNBOUND flag to the workqueue. For ixgbe: Diomidis Spinellis removes a circular dependency. For igc: Vitaly removes an unneeded autoneg parameter. For igb: Johnny Park fixes a couple of typos. For igbvf: Wander Lairson Costa removes an unused spinlock. For e1000: Joe Damato adds RTNL lock to some calls where it is expected to be held. 
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: e1000: Hold RTNL when e1000_down can be called igbvf: remove unused spinlock igb: Fix 2 typos in comments in igb_main.c igc: remove autoneg parameter from igc_mac_info ixgbe: Break include dependency cycle ice: Unbind the workqueue ice: use stack variable for virtchnl_supported_rxdids ice: initialize pf->supported_rxdids immediately after loading DDP ice: only allow Tx promiscuous for multicast ice: Add support for persistent NAPI config ice: support optional flags in signature segment header ice: refactor "last" segment of DDP pkg ice: extend dump serdes equalizer values feature ice: rework of dump serdes equalizer values feature ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: Unbind the workqueueFrederic Weisbecker2024-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
| * ice: initialize pf->supported_rxdids immediately after loading DDPJacob Keller2024-11-131-0/+31
| |     The pf->supported_rxdids field is used to populate the list of valid
| |     RXDIDs that a VF may use when negotiating
| |     VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC.
| |
| |     The set of supported RXDIDs is dependent on the DDP, and can be read
| |     from the GLXFLXP_RXDID_FLAGS register.
| |
| |     The PF needs to send this list to the VF upon receiving the
| |     VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to use this list to
| |     validate the requested descriptor ID from the VF when programming the
| |     Rx queues. A future update to support VF live migration will also want
| |     to validate that the target VF can support the same descriptor ID when
| |     migrating.
| |
| |     Currently, pf->supported_rxdids is initialized inside the
| |     ice_vc_query_rxdid() function. This means that it is only ever
| |     initialized if at least one VF actually tries to negotiate
| |     VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily
| |     re-initialized every time the VF loads and requests the descriptor
| |     list.
| |
| |     This worked before because the PF only checks pf->supported_rxdids when
| |     programming the Rx queue if the VF actually negotiates the
| |     VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature.
| |
| |     This will be problematic for VF live migration. We need the list of
| |     supported Rx descriptor IDs when migrating. It is possible that no VF
| |     on the target PF has ever actually issued a
| |     VIRTCHNL_OP_GET_SUPPORTED_RXDIDs.
| |
| |     Refactor the driver to initialize pf->supported_rxdids during driver
| |     initialization after the DDP is loaded. This is simpler, avoids
| |     unnecessary duplicate work, and avoids issues with the live migration
| |     process.
| |
| |     Signed-off-by: Jacob Keller <[email protected]>
| |     Reviewed-by: Przemek Kitszel <[email protected]>
| |     Signed-off-by: Tony Nguyen <[email protected]>
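The idea of building the supported-RXDID set once, right after DDP load, and then only consulting it for validation can be sketched as follows. Everything here is an assumption made for illustration: the 64-entry bitmap, the `read_rxdid_flags_fn` callback, and `fake_flags_read()` are hypothetical and do not reproduce how the driver reads GLXFLXP_RXDID_FLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RXDID_COUNT_SKETCH 64u /* assumed number of descriptor IDs */

/* Hypothetical register accessor: nonzero means the loaded DDP enables
 * descriptor ID 'rxdid'. */
typedef uint32_t (*read_rxdid_flags_fn)(unsigned int rxdid);

/* Fake register contents, used only to exercise the sketch. */
static uint32_t fake_flags_read(unsigned int rxdid)
{
	return (rxdid == 1 || rxdid == 2) ? 1u : 0u;
}

/* Build the bitmap once, at init time, after the DDP has been loaded. */
static uint64_t init_supported_rxdids(read_rxdid_flags_fn rd)
{
	uint64_t supported = 0;
	unsigned int i;

	assert(rd);
	for (i = 0; i < RXDID_COUNT_SKETCH; i++)
		if (rd(i))
			supported |= (uint64_t)1 << i;

	return supported;
}

/* Validate a descriptor ID requested by a VF against the cached bitmap. */
static bool rxdid_is_supported(uint64_t supported, unsigned int rxdid)
{
	return rxdid < RXDID_COUNT_SKETCH && ((supported >> rxdid) & 1u);
}
```

Because the bitmap exists before any VF asks for it, a later consumer such as a migration check can validate a descriptor ID without depending on an earlier VF request.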
* | ndo_fdb_del: Add a parameter to report whether notification was sentPetr Machata2024-11-161-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | In a similar fashion to ndo_fdb_add, which was covered in the previous patch, add the bool *notified argument to ndo_fdb_del. Callees that send a notification on their own set the flag to true. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
* | ndo_fdb_add: Add a parameter to report whether notification was sentPetr Machata2024-11-161-1/+3
|/
|     Currently when FDB entries are added to or deleted from a VXLAN
|     netdevice, the VXLAN driver emits one notification, including the
|     VXLAN-specific attributes. The core however always sends a notification
|     as well, a generic one. Thus two notifications are unnecessarily sent
|     for these operations. A similar situation comes up with the bridge
|     driver, which also emits notifications on its own:
|
|      # ip link add name vx type vxlan id 1000 dstport 4789
|      # bridge monitor fdb &
|      [1] 1981693
|      # bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
|      de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
|      de:ad:be:ef:13:37 dev vx self permanent
|
|     In order to prevent this duplication, add a parameter to ndo_fdb_add,
|     bool *notified. The flag is primed to false, and if the callee sends a
|     notification on its own, it sets it to true, thus informing the core
|     that it should not generate another notification.
|
|     Signed-off-by: Petr Machata <[email protected]>
|     Reviewed-by: Amit Cohen <[email protected]>
|     Reviewed-by: Nikolay Aleksandrov <[email protected]>
|     Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
|     Signed-off-by: Jakub Kicinski <[email protected]>
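The out-parameter contract described above (the caller primes the flag to false, and a callee that notifies on its own sets it to true) can be sketched in userspace C. The names `fdb_add_fn`, `core_fdb_add()`, and the two example callbacks are hypothetical and only model the contract; they are not the kernel's ndo_fdb_add signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's FDB-add callback. */
typedef int (*fdb_add_fn)(const unsigned char *addr, bool *notified);

static int notifications_sent; /* counts every notification emitted */

/* A callee that emits its own, driver-specific notification. */
static int fdb_add_self_notifying(const unsigned char *addr, bool *notified)
{
	(void)addr;
	notifications_sent++; /* driver-specific notification */
	*notified = true;     /* tell the caller not to send another one */
	return 0;
}

/* A callee that leaves notification to the core. */
static int fdb_add_plain(const unsigned char *addr, bool *notified)
{
	(void)addr;
	(void)notified; /* flag stays false, as primed by the caller */
	return 0;
}

/* Core-side caller: prime the flag, call the driver, notify if needed. */
static int core_fdb_add(fdb_add_fn add, const unsigned char *addr)
{
	bool notified = false;
	int err;

	assert(add != NULL);
	err = add(addr, &notified);
	if (err)
		return err;

	if (!notified)
		notifications_sent++; /* generic notification from the core */

	return 0;
}
```

With either callback, exactly one notification is counted per successful add, which is the behaviour the commit message is after.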
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-10-101-32/+7
|\ | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
| * ice: Flush FDB entries before resetWojciech Drewek2024-10-081-21/+3
| |     Triggering the reset while in switchdev mode causes errors[1]. Rules
| |     are already removed by this time because switch content is flushed in
| |     case of the reset. This means that rules were deleted from HW but SW
| |     still thinks they exist, so when we get a SWITCHDEV_FDB_DEL_TO_DEVICE
| |     notification we try to delete a non-existent rule.
| |
| |     We can avoid these errors by clearing the rules early in the reset
| |     flow, before they are removed from HW. The switchdev API will get
| |     notified that the rule was removed, so we won't get a
| |     SWITCHDEV_FDB_DEL_TO_DEVICE notification.
| |     Remove the unnecessary ice_clear_sw_switch_recipes.
| |
| |     [1]
| |     ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
| |     ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
| |
| |     Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support")
| |     Reviewed-by: Mateusz Polchlopek <[email protected]>
| |     Signed-off-by: Wojciech Drewek <[email protected]>
| |     Tested-by: Sujai Buvaneswaran <[email protected]>
| |     Signed-off-by: Tony Nguyen <[email protected]>
| * ice: Fix netif_is_ice() in Safe ModeMarcin Szycik2024-10-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netif_is_ice() works by checking the pointer to netdev ops. However, it only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops, so in Safe Mode it always returns false, which is unintuitive. While it doesn't look like netif_is_ice() is currently being called anywhere in Safe Mode, this could change and potentially lead to unexpected behaviour. Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG") Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
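A pointer-identity check that accepts both ops tables can be sketched as below. The `*_sketch` types and objects are hypothetical stand-ins for struct net_device, ice_netdev_ops, and ice_netdev_safe_mode_ops; the sketch shows the technique the commit message describes, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct net_device_ops_sketch { int unused; };
struct net_device_sketch {
	const struct net_device_ops_sketch *netdev_ops;
};

/* Distinct objects have distinct addresses, which is what identifies them. */
static const struct net_device_ops_sketch ice_ops_sketch = { 0 };
static const struct net_device_ops_sketch ice_safe_mode_ops_sketch = { 0 };
static const struct net_device_ops_sketch other_ops_sketch = { 0 };

/* True if the device uses either the default or the Safe Mode ops table. */
static bool netif_is_ice_sketch(const struct net_device_sketch *dev)
{
	return dev && (dev->netdev_ops == &ice_ops_sketch ||
		       dev->netdev_ops == &ice_safe_mode_ops_sketch);
}
```

Comparing against only the default table would make the helper report false for a device running in Safe Mode, which is the surprise the commit removes.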
| * ice: Fix entering Safe ModeMarcin Szycik2024-10-081-3/+1
| |     If the DDP package is missing or corrupted, the driver should enter
| |     Safe Mode. Instead, an error is returned and probe fails.
| |
| |     To fix this, don't exit init if ice_init_ddp_config() returns an error.
| |
| |     Repro:
| |     * Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
| |     * Load ice
| |
| |     Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
| |     Reviewed-by: Przemek Kitszel <[email protected]>
| |     Signed-off-by: Marcin Szycik <[email protected]>
| |     Reviewed-by: Brett Creeley <[email protected]>
| |     Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |     Signed-off-by: Tony Nguyen <[email protected]>
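The control-flow change can be sketched as a fallback rather than an early return. `load_ddp_sketch()`, `init_sketch()`, and the mode enum are hypothetical; the error value -2 is just a placeholder and the real init path does far more than this.

```c
#include <assert.h>
#include <stdbool.h>

enum probe_mode_sketch { MODE_NORMAL, MODE_SAFE };

/* Hypothetical DDP loader: fails when the package file is unavailable. */
static int load_ddp_sketch(bool pkg_present)
{
	return pkg_present ? 0 : -2; /* placeholder error code */
}

/*
 * Init path after the fix: a DDP failure no longer aborts init. The
 * device continues with reduced functionality in Safe Mode.
 */
static int init_sketch(bool pkg_present, enum probe_mode_sketch *mode)
{
	int err;

	assert(mode);
	err = load_ddp_sketch(pkg_present);

	/* The buggy flow would do "if (err) return err;" here. */
	*mode = err ? MODE_SAFE : MODE_NORMAL;

	return 0;
}
```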
| * ice: fix memleak in ice_init_tx_topology()Przemek Kitszel2024-09-301-7/+1
| |     Fix leak of the FW blob (DDP pkg).
| |
| |     Make ice_cfg_tx_topo() const-correct, so ice_init_tx_topology() can
| |     avoid copying the whole FW blob. Copy just the topology section, and
| |     only when needed. Reuse the buffer allocated for the read of the
| |     current topology.
| |
| |     This was found by kmemleak, with the following trace for each PF:
| |         [<ffffffff8761044d>] kmemdup_noprof+0x1d/0x50
| |         [<ffffffffc0a0a480>] ice_init_ddp_config+0x100/0x220 [ice]
| |         [<ffffffffc0a0da7f>] ice_init_dev+0x6f/0x200 [ice]
| |         [<ffffffffc0a0dc49>] ice_init+0x29/0x560 [ice]
| |         [<ffffffffc0a10c1d>] ice_probe+0x21d/0x310 [ice]
| |
| |     Constify the ice_cfg_tx_topo() @buf parameter.
| |     This cascades further down to a few more functions.
| |
| |     Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
| |     CC: Larysa Zaremba <[email protected]>
| |     CC: Jacob Keller <[email protected]>
| |     CC: Pucha Himasekhar Reddy <[email protected]>
| |     CC: Mateusz Polchlopek <[email protected]>
| |     Signed-off-by: Przemek Kitszel <[email protected]>
| |     Reviewed-by: Jacob Keller <[email protected]>
| |     Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
| |     Signed-off-by: Tony Nguyen <[email protected]>