| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding new VM MAC, driver checks only *active* filters in
vsi->mac_filter_hash. Each MAC, even in non-active state is using resources.
To determine number of MACs VM uses, count VSI filters in *any* state.
Add i40e_count_all_filters() to simply count all filters, and rename
i40e_count_filters() to i40e_count_active_filters() to avoid ambiguity.
Fixes: cfb1d572c986 ("i40e: Add ensurance of MacVlan resources for every trusted VF")
Cc: [email protected]
Signed-off-by: Lukasz Czapnik <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i40e hardware has multiple hardware settings which define the Maximum
Frame Size (MFS) of the physical port. The firmware has an AdminQ command
(0x0603) to configure the MFS, but the i40e Linux driver never issues this
command.
In most cases this is no problem, as the NVM default value has the device
configured for its maximum value of 9728. Unfortunately, recent versions of
the iPXE intelxl driver now issue the 0x0603 Set Mac Config command,
modifying the MFS and reducing it from its default value of 9728.
This occurred as part of iPXE commit 6871a7de705b ("[intelxl] Use admin
queue to set port MAC address and maximum frame size"), a prerequisite
change for supporting the E800 series hardware in iPXE. Both the E700 and
E800 firmware support the AdminQ command, and the iPXE code shares much of
the logic between the two device drivers.
The ice E800 Linux driver already issues the 0x0603 Set Mac Config command
early during probe, and is thus unaffected by the iPXE change.
Since commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set"), the
i40e driver does check the I40E_PRTGL_SAH register, but it only logs a
warning message if its value is below the 9728 default. This register also
only covers received packets and not transmitted packets. A warning can
inform system administrators, but does not correct the issue. No
interactions from userspace cause the driver to write to PRTGL_SAH or issue
the 0x0603 AdminQ command. Only a GLOBR reset will restore the value to its
default value. There is no obvious method to trigger a GLOBR reset from
user space.
To fix this, introduce the i40e_aq_set_mac_config() function, similar to
the one from the ice driver. Call this during early probe to ensure that
the device configuration matches driver expectation. Unlike E800, the E700
firmware also has a bit to control whether the MAC should append CRC data.
It is on by default, but setting a 0 to this bit would disable CRC. The
i40e implementation must set this bit to ensure CRC will be appended by the
MAC.
In addition to the AQ command, instead of just checking the I40E_PRTGL_SAH
register, update its value to the 9728 default and write it back. This
ensures that the hardware is in the expected state, regardless of whether
the iPXE (or any other early boot driver) has modified this state.
This is a better user experience, as we now fix the issues with larger MTU
instead of merely warning. It also aligns with the way the ice E800 series
driver works.
A final note: The Fixes tag provided here is not strictly accurate. The
issue occurs as a result of an external entity (the iPXE intelxl driver),
and this is not a regression specifically caused by the mentioned change.
However, I believe the original change to just warn about PRTGL_SAH being
too low was an insufficient fix.
Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Link: https://github.com/ipxe/ipxe/commit/6871a7de705b6f6a4046f0d19da9bcd689c3bc8e
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration
later than the first, the error path wants to free the IRQs requested
so far. However, it uses the wrong dev_id argument for free_irq(), so
it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
<TASK>
free_irq+0x32/0x70
i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
i40e_vsi_request_irq+0x79/0x80 [i40e]
i40e_vsi_open+0x21f/0x2f0 [i40e]
i40e_open+0x63/0x130 [i40e]
__dev_open+0xfc/0x210
__dev_change_flags+0x1fc/0x240
netif_change_flags+0x27/0x70
do_setlink.isra.0+0x341/0xc70
rtnl_newlink+0x468/0x860
rtnetlink_rcv_msg+0x375/0x450
netlink_rcv_skb+0x5c/0x110
netlink_unicast+0x288/0x3c0
netlink_sendmsg+0x20d/0x430
____sys_sendmsg+0x3a2/0x3d0
___sys_sendmsg+0x99/0xe0
__sys_sendmsg+0x8a/0xf0
do_syscall_64+0x82/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
</TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this with inserting code to fail intentionally.
Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Subbaraya Sundeep <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no need to store the err string in hw->err_str. Simplify it and
use common helper. hw->err_str is still used for other purpouse.
It should be marked that previously for unknown error the numeric value
was passed as a string. Now the "LIBIE_AQ_RC_UNKNOWN" is used for such
cases.
Add libie_aminq module in i40e Kconfig.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use libie_aq_desc instead of i40e_aq_desc. Do needed changes to allow
clean build.
Get version descriptor is a little less detailed on i40e. To not mess up
with shifting or union inside libie desc use get version descriptor from
i40e.
Move additional caps for i40e to libie.
Fix RCT in declaration that is using libie_aq_desc;
Use libie_aq_raw() wherever it can be used.
The libie aq error is extended, cover it in ice driver just to clean
build. In next patches the libie code for that will be used in each
of intel driver.
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6.
It is time to convert the Intel i40e driver to the new API, so that
timestamping configuration can be removed from the ndo_eth_ioctl() path
completely.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Reviewed-by: Milena Olech <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drivers that are using ops lock and don't depend on RTNL lock
still need to manage it because udp_tunnel's RTNL dependency.
Introduce new udp_tunnel_nic_lock and use it instead of
rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from
udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to
grab udp_tunnel_nic_lock mutex and might sleep).
Cover more places in v4:
- netlink
- udp_tunnel_notify_add_rx_port (ndo_open)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_notify_del_rx_port (ndo_stop)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_get_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- udp_tunnel_drop_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- udp_tunnel_nic_reset_ntf (ndo_open)
- notifiers
- udp_tunnel_nic_netdevice_event, depending on the event:
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- ethnl_tunnel_info_reply_size
- udp_tunnel_nic_set_port_priv (two intel drivers)
Cc: Michael Chan <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| | |
Cross-merge networking fixes after downstream PR (net-6.16-rc2).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce a link_down_events counter to the i40e driver, incremented
each time the link transitions from up to down.
This counter can help diagnose issues related to link stability,
such as port flapping or unexpected link drops.
The value is exposed via ethtool's get_link_ext_stats() interface.
Co-developed-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Dawid Osuchowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The Intel i40e, iavf, and ice drivers all include a definition of the
packet classifier filter types used to program RSS hash enable bits. For
i40e, these bits are used for both the PF and VF to configure the PFQF_HENA
and VFQF_HENA registers.
For ice and iAVF, these bits are used to communicate the desired hash
enable filter over virtchnl via its struct virtchnl_rss_hashena. The
virtchnl.h header makes no mention of where the bit definitions reside.
Maintaining a separate copy of these bits across three drivers is
cumbersome. Move the definition to libie as a new pctype.h header file.
Each driver can include this, and drop its own definition.
The ice implementation also defined a ICE_AVF_FLOW_FIELD_INVALID, intending
to use this to indicate when there were no hash enable bits set. This is
confusing, since the enumeration is using bit positions. A value of 0
*should* indicate the first bit. Instead, rewrite the code that uses
ICE_AVF_FLOW_FIELD_INVALID to just check if the avf_hash is zero. From
context this should be clear that we're checking if none of the bits are
set.
The values are kept as bit positions instead of encoding the BIT_ULL
directly into their value. While most users will simply use BIT_ULL
immediately, i40e uses the macros both with BIT_ULL and test_bit/set_bit
calls.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i40e, ice, and iAVF all use 'hena' as a shorthand for the "hash enable"
configuration. This comes originally from the X710 datasheet 'xxQF_HENA'
registers. In the context of the registers the meaning is fairly clear.
However, on its own, hena is a weird name that can be more difficult to
understand. This is especially true in ice. The E810 hardware doesn't even
have registers with HENA in the name.
Replace the shorthand 'hena' with 'hashcfg'. This makes it clear the
variables deal with the Hash configuration, not just a single boolean
on/off for all hashing.
Do not update the register names. These come directly from the datasheet
for X710 and X722, and it is more important that the names can be searched.
Suggested-by: Przemek Kitszel <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs.
This flag is also used in other network adapters like ICE.
Usage:
- "on" - The problematic VF will be automatically reset
if a malformed descriptor is detected.
- "off" - The problematic VF will be disabled.
In cases where a VF sends malformed packets classified as malicious, it can
cause the Tx queue to freeze, rendering it unusable for several minutes. When
an MDD event occurs, this new implementation allows for a graceful VF reset to
quickly restore operational state.
Currently, VF queues are disabled if an MDD event occurs. This patch adds the
ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD
event logging throttling to avoid dmesg pollution and unifies the format of
Tx and Rx MDD messages.
Note: Standard message rate limiting functions like dev_info_ratelimited()
do not meet our requirements. Custom rate limiting is implemented,
please see the code for details.
Co-developed-by: Jan Sokolowski <[email protected]>
Signed-off-by: Jan Sokolowski <[email protected]>
Co-developed-by: Padraig J Connolly <[email protected]>
Signed-off-by: Padraig J Connolly <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
i40e_commit_partition_bw_setting() was added in 2017 by
commit 4fc8c6763957 ("i40e: genericize the partition bandwidth control")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The last use of i40e_del_filter() was removed in 2016 by
commit 9569a9a4547d ("i40e: when adding or removing MAC filters, correctly
handle VLANs")
Remove it.
Fix up a comment that referenced it.
Note: The __ version of this function is still used.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The last use of i40e_get_cur_guaranteed_fd_count() was removed in 2015 by
commit 04294e38a451 ("i40e: FD filters flush policy changes")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up the existing export namespace code along the same lines of
commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.
Scripted using
git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
do
awk -i inplace '
/^#define EXPORT_SYMBOL_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/^#define MODULE_IMPORT_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/MODULE_IMPORT_NS/ {
$0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
}
/EXPORT_SYMBOL_NS/ {
if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
$0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
$0 !~ /^my/) {
getline line;
gsub(/[[:space:]]*\\$/, "");
gsub(/[[:space:]]/, "", line);
$0 = $0 " " line;
}
$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
"\\1(\\2, \"\\3\")", "g");
}
}
{ print }' $file;
done
Requested-by: Masahiro Yamada <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own:
# ip link add name vx type vxlan id 1000 dstport 4789
# bridge monitor fdb &
[1] 1981693
# bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
de:ad:be:ef:13:37 dev vx self permanent
In order to prevent this duplicity, add a paremeter to ndo_fdb_add,
bool *notified. The flag is primed to false, and if the callee sends a
notification on its own, it sets it to true, thus informing the core that
it should not generate another notification.
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.
1. Thread T0 allocates a filter in i40e_add_filter() within
i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
refers to the already freed filter memory, causing corruption.
Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
please set promiscuous on manually for VF XX".
Exact code for stable reproduction Intel can't open-source now.
The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.
Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Michal Schmidt <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.
To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.
Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.
Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.
This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.
Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add "EEE: Enabled/Disabled" to dmesg for supported X710 Base-T/KR/KX
cards. According to the IEEE standard report the EEE ability and the
EEE Link Partner ability. Use the kernel's 'ethtool_keee' structure
and report EEE link modes.
Example:
dmesg | grep 'NIC Link is'
ethtool --show-eee <device>
Before:
NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Supported EEE link modes: Not reported
Advertised EEE link modes: Not reported
Link partner advertised EEE link modes: Not reported
After:
NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None, EEE: Enabled
Supported EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Link partner advertised EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-07-11 (net/intel)
This series contains updates to most Intel network drivers.
Tony removes MODULE_AUTHOR from drivers containing the entry.
Simon Horman corrects a kdoc entry for i40e.
Pawel adds implementation for devlink param "local_forwarding" on ice.
Michal removes unneeded call, and code, for eswitch rebuild for ice.
Sasha removed a no longer used field from igc.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Remove the internal 'eee_advert' field
ice: remove eswitch rebuild
ice: Add support for devlink local_forwarding param
i40e: correct i40e_addr_to_hkey() name in kdoc
net: intel: Remove MODULE_AUTHORs
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We are moving away from the Sourceforge email address. Rather than
removing or updating the email for the affected entries, remove the
MODULE_AUTHOR altogether as its usage is incorrect [1].
Link: https://lore.kernel.org/netdev/20200626115236.7f36d379@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1]
Reviewed-by: Jesse Brandeburg <[email protected]>
Acked-by: Alexander Lobakin <[email protected]> # libeth, libie
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit 6533e558c650 ("i40e: Fix reset path while removing
the driver") introduced a new PF state "__I40E_IN_REMOVE" to block
modifying the XDP program while the driver is being removed.
Unfortunately, such a change is useful only if the ".ndo_bpf()"
callback was called out of the rmmod context because unloading the
existing XDP program is also a part of driver removing procedure.
In other words, from the rmmod context the driver is expected to
unload the XDP program without reporting any errors. Otherwise,
the kernel warning with callstack is printed out to dmesg.
Example failing scenario:
1. Load the i40e driver.
2. Load the XDP program.
3. Unload the i40e driver (using "rmmod" command).
The example kernel warning log:
[ +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870
[...]
[ +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870
[...]
[ +0.002726] Call Trace:
[ +0.002457] <TASK>
[ +0.002119] ? __warn+0x80/0x120
[ +0.003245] ? unregister_netdevice_many_notify+0x7a9/0x870
[ +0.005586] ? report_bug+0x164/0x190
[ +0.003678] ? handle_bug+0x3c/0x80
[ +0.003503] ? exc_invalid_op+0x17/0x70
[ +0.003846] ? asm_exc_invalid_op+0x1a/0x20
[ +0.004200] ? unregister_netdevice_many_notify+0x7a9/0x870
[ +0.005579] ? unregister_netdevice_many_notify+0x3cc/0x870
[ +0.005586] unregister_netdevice_queue+0xf7/0x140
[ +0.004806] unregister_netdev+0x1c/0x30
[ +0.003933] i40e_vsi_release+0x87/0x2f0 [i40e]
[ +0.004604] i40e_remove+0x1a1/0x420 [i40e]
[ +0.004220] pci_device_remove+0x3f/0xb0
[ +0.003943] device_release_driver_internal+0x19f/0x200
[ +0.005243] driver_detach+0x48/0x90
[ +0.003586] bus_remove_driver+0x6d/0xf0
[ +0.003939] pci_unregister_driver+0x2e/0xb0
[ +0.004278] i40e_exit_module+0x10/0x5f0 [i40e]
[ +0.004570] __do_sys_delete_module.isra.0+0x197/0x310
[ +0.005153] do_syscall_64+0x85/0x170
[ +0.003684] ? syscall_exit_to_user_mode+0x69/0x220
[ +0.004886] ? do_syscall_64+0x95/0x170
[ +0.003851] ? exc_page_fault+0x7e/0x180
[ +0.003932] entry_SYSCALL_64_after_hwframe+0x71/0x79
[ +0.005064] RIP: 0033:0x7f59dc9347cb
[ +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48
[ +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb
[ +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748
[ +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000
[ +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000
[ +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000
[ +0.007151] </TASK>
[ +0.002204] ---[ end trace 0000000000000000 ]---
Fix this by checking if the XDP program is being loaded or unloaded.
Then, block only loading a new program while "__I40E_IN_REMOVE" is set.
Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP
callback to avoid unnecessary operations and checks.
Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver")
Signed-off-by: Michal Kubiak <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When EEH events occurs, the callback functions in the i40e, which are
managed by the EEH driver, will completely suspend and resume all IO
operations.
- In the PCI error detected callback, replaced i40e_prep_for_reset()
with i40e_io_suspend(). The change is to fully suspend all I/O
operations
- In the PCI error slot reset callback, replaced pci_enable_device_mem()
with pci_enable_device(). This change enables both I/O and memory of
the device.
- In the PCI error resume callback, replaced i40e_handle_reset_warning()
with i40e_io_resume(). This change allows the system to resume I/O
operations
Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Reviewed-by: Jacob Keller <[email protected]>
Tested-by: Robert Thomas <[email protected]>
Signed-off-by: Thinh Tran <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
introduced. These functions were factored out from the existing
i40e_suspend() and i40e_resume() respectively. This factoring was
done due to concerns about the logic of the I40E_SUSPENSED state, which
caused the device to be unable to recover. The functions are now used
in the EEH handling for device suspend/resume callbacks.
The function i40e_enable_mc_magic_wake() has been moved ahead of
i40e_io_suspend() to ensure it is declared before being used.
Tested-by: Robert Thomas <[email protected]>
Signed-off-by: Thinh Tran <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This driver currently doesn't support any control flags.
Use flow_rule_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c94510 ("inet: protect against too small
mtu values.")
We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.
It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Simon Horman <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add helper i40e_vsi_reconfig_tc(vsi) that configures TC
for given VSI using previously stored TC bitmap.
Effectively replaces open-coded patterns:
enabled_tc = vsi->tc_config.enabled_tc;
vsi->tc_config.enabled_tc = 0;
i40e_vsi_config_tc(vsi, enabled_tc);
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add a helper to access main VEB:
i40e_pf_get_main_veb(pf) replaces 'pf->veb[pf->lan_veb]'
Reviewed-by: Michal Schmidt <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the driver code there are 3 types of checks whether given
VSI is main or not:
1. vsi->type ==/!= I40E_VSI_MAIN
2. vsi ==/!= pf->vsi[pf->lan_vsi]
3. vsi->seid ==/!= pf->vsi[pf->lan_vsi]->seid
All of them are equivalent and can be consolidated. Convert cases
2 and 3 to case 1.
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
Add simple helper i40e_pf_get_main_vsi(pf) to access main VSI
that replaces pattern 'pf->vsi[pf->lan_vsi]'
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 07d44190a389 ("i40e/i40evf: Detect and recover hung queue
scenario") changes i40e_detect_recover_hung() argument type from
i40e_pf* to i40e_vsi* to be shareable by both i40e and i40evf.
Because the i40evf does not exist anymore and the function is
exclusively used by i40e we can revert this change.
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0ef2d5afb12d ("i40e: KISS the client interface") simplified
the client interface so in practice it supports only one client
per i40e netdev. But we have still 2 notification functions that
uses as parameter a pointer to VSI of netdevice associated with
the client. After the mentioned commit only possible and used
VSI is the main (LAN) VSI.
So refactor these functions so they are called with PF pointer argument
and the associated VSI (LAN) is taken inside them.
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The field is initialized always to zero and it is never read.
Remove it.
Reviewed-by: Michal Schmidt <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
net: intel: start The Great Code Dedup + Page Pool for iavf
Alexander Lobakin says:
Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and
libie) and switch iavf to Page Pool. Details are in the commit messages;
here's a summary:
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers. The first name that came to my mind was
"libie" -- "Intel Ethernet common library". Also this sounds like
"lovelie" (-> one word, no "lib I E" pls) and can be expanded as
"lib Internet Explorer" :P
The "generic", pure-software part is placed separately, so that it can be
easily reused in any driver by any vendor without linking to the Intel
pre-200G guts. In a few words, it's something any modern driver does the
same way, but nobody moved it level up (yet).
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks:
"can I share this?". The final destination is very ambitious: have only
one unified driver for at least i40e, ice, iavf, and idpf with a struct
ops for each generation. That's never gonna happen, right? But you still
can at least try.
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so that a driver can't
use much of the lib until it's converted. iavf is only the example, the
rest will eventually be converted soon on a per-driver basis. That is
when it gets really interesting. Stay tech.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
MAINTAINERS: add entry for libeth and libie
iavf: switch to Page Pool
iavf: pack iavf_ring more efficiently
libeth: add Rx buffer management
page_pool: add DMA-sync-for-CPU inline helper
page_pool: constify some read-only function arguments
slab: introduce kvmalloc_array_node() and kvcalloc_node()
iavf: drop page splitting and recycling
iavf: kill "legacy-rx" for good
net: intel: introduce {, Intel} Ethernet common library
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules.
Before introducing new changes, which would need to be copied over again,
start decoupling the already existing duplicate functionality into a new
module, which will be shared between several Intel Ethernet drivers.
Add the lookup table which converts 8/10-bit hardware packet type into
a parsed bitfield structure for easy checking packet format parameters,
such as payload level, IP version, etc. This is currently used by i40e,
ice and iavf and it's all the same in all three drivers.
The only difference introduced in this implementation is that instead of
defining a 256 (or 1024 in case of ice) element array, add unlikely()
condition to limit the input to 154 (current maximum non-reserved packet
type). There's no reason to waste 600 (or even 3600) bytes only to not
hurt very unlikely exception packets.
The hash computation function now takes payload level directly as a
pkt_hash_type. There's a couple cases when non-IP ptypes are marked as
L3 payload and in the previous versions their hash level would be 2, not
3. But skb_set_hash() only sees difference between L4 and non-L4, thus
this won't change anything at all.
The module is behind the hidden Kconfig symbol, which the drivers will
select when needed. The exports are behind 'LIBIE' namespace to limit
the scope of the functions.
Not that non-HW-specific symbols will live in yet another module,
libeth. This is done to easily distinguish pretty generic code ready
for reusing by any other vendor and/or for moving the layer up from
the code useful in Intel's 1-100G drivers only.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/[email protected]/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the MFS is set below the default (0x2600), a warning message is
reported like the following :
MFS for port 1 has been set below the default: 600
This message is a bit confusing as the number shown here (600) is in
fact an hexa number: 0x600 = 1536
Without any explicit "0x" prefix, this message is read like the MFS is
set to 600 bytes.
MFS, as per MTUs, are usually expressed in decimal base.
This commit reports both current and default MFS values in decimal
so it's less confusing for end-users.
A typical warning message looks like the following :
MFS for port 1 (1536) has been set below the default (9728)
Signed-off-by: Erwan Velu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Issue reported by customer during SRIOV testing, call trace:
When both i40e and the i40iw driver are loaded, a warning
in check_flush_dependency is being triggered. This seems
to be because of the i40e driver workqueue is allocated with
the WQ_MEM_RECLAIM flag, and the i40iw one is not.
Similar error was encountered on ice too and it was fixed by
removing the flag. Do the same for i40e too.
[Feb 9 09:08] ------------[ cut here ]------------
[ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is
flushing !WQ_MEM_RECLAIM infiniband:0x0
[ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966
check_flush_dependency+0x10b/0x120
[ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq
snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4
nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr
rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma
intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif
isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core
iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore
ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich
intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad
xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe
drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel
libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror
dm_region_hash dm_log dm_mod fuse
[ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not
tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1
[ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS
SE5C620.86B.02.01.0013.121520200651 12/15/2020
[ +0.000001] Workqueue: i40e i40e_service_task [i40e]
[ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120
[ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48
81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd
ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90
[ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282
[ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX:
0000000000000027
[ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI:
ffff94d47f620bc0
[ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09:
00000000ffff7fff
[ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12:
ffff94c5451ea180
[ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15:
ffff94c5f1330ab0
[ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000)
knlGS:0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4:
00000000007706f0
[ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ +0.000001] PKRU: 55555554
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] ? __warn+0x80/0x130
[ +0.000003] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? report_bug+0x195/0x1a0
[ +0.000005] ? handle_bug+0x3c/0x70
[ +0.000003] ? exc_invalid_op+0x14/0x70
[ +0.000002] ? asm_exc_invalid_op+0x16/0x20
[ +0.000006] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? check_flush_dependency+0x10b/0x120
[ +0.000002] __flush_workqueue+0x126/0x3f0
[ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
[ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core]
[ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core]
[ +0.000020] i40iw_close+0x4b/0x90 [irdma]
[ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e]
[ +0.000035] i40e_service_task+0x126/0x190 [i40e]
[ +0.000024] process_one_work+0x174/0x340
[ +0.000003] worker_thread+0x27e/0x390
[ +0.000001] ? __pfx_worker_thread+0x10/0x10
[ +0.000002] kthread+0xdf/0x110
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x2d/0x50
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork_asm+0x1b/0x30
[ +0.000004] </TASK>
[ +0.000001] ---[ end trace 0000000000000000 ]---
Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue")
Signed-off-by: Sindhu Devale <[email protected]>
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Reviewed-by: Mateusz Polchlopek <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Robert Ganzynkowicz <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/ip_gre.c
17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
net/ipv6/ip6_fib.c
d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().")
5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()")
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bug usually affects untrusted VFs, because they are limited to 18 MACs,
it affects them badly, not letting to create MAC all filters.
Not stable to reproduce, it happens when VF user creates MAC filters
when other MACVLAN operations are happened in parallel.
But consequence is that VF can't receive desired traffic.
Fix counter to be bumped only for new or active filters.
Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") followed by commit 23be7075b318 ("ice: fix software
generating extra interrupts") I'm seeing the similar issue also with
i40e driver.
In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.
The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.
The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
interrupt into built value
2) In i40e_update_enable_itr() enforces a software interrupt trigger
if we are exiting busy poll to catch any pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.
Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 = 45.633
sockperf: ---> percentile 99.990 = 37.013
sockperf: ---> percentile 99.900 = 35.910
sockperf: ---> percentile 99.000 = 33.390
sockperf: ---> percentile 90.000 = 28.626
sockperf: ---> percentile 75.000 = 27.741
sockperf: ---> percentile 50.000 = 26.743
sockperf: ---> percentile 25.000 = 25.614
sockperf: ---> <MIN> observation = 12.220
After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation = 195.841
sockperf: ---> percentile 99.999 = 45.026
sockperf: ---> percentile 99.990 = 39.009
sockperf: ---> percentile 99.900 = 35.922
sockperf: ---> percentile 99.000 = 33.482
sockperf: ---> percentile 90.000 = 28.902
sockperf: ---> percentile 75.000 = 27.821
sockperf: ---> percentile 50.000 = 26.860
sockperf: ---> percentile 25.000 = 25.685
sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-29 (net: intel)
This series contains updates to most Intel drivers.
Jesse moves declaration of pci_driver struct to remove need for forward
declarations in igb and converts Intel drivers to user newer power
management ops.
Sasha reworks power management flow on igc to avoid using rtnl_lock()
during those flows.
Maciej reorganizes i40e_nvm file to avoid forward declarations.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: avoid forward declarations in i40e_nvm.c
igc: Refactor runtime power management flow
net: intel: implement modern PM ops declarations
igb: simplify pci ops declaration
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Switch the Intel networking drivers to use the new power management ops
declaration formats and macros, which allows us to drop __maybe_unused,
as well as a bunch of ifdef checking CONFIG_PM.
This is safe to do because the compiler drops the unused functions,
verified by checking for any of the power management function symbols
being present in System.map for a build without CONFIG_PM.
If a driver has runtime PM, define the ops with pm_ptr(), and if the
driver has Simple PM, use pm_sleep_ptr(), as well as the new versions of
the macros for declaring the members of the pm_ops structs.
Checked with network-enabled allnoconfig, allyesconfig, allmodconfig on
x64_64.
Reviewed-by: Alan Brady <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are, especially with multi-attr arrays, many cases
of needing to iterate all attributes of a specific type
in a netlink message or a nested attribute. Add specific
macros to support that case.
Also convert many instances using this spatch:
@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
{
<... T x; ...>
-if (nla_type(nla) == ATTR) {
...
-}
}
@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
{
<... T x; ...>
-if (nla_type(nla) == ATTR) {
...
-}
}
@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
{
<... T x; ...>
-if (nla_type(nla) != ATTR) continue;
...
}
@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
{
<... T x; ...>
-if (nla_type(nla) != ATTR) continue;
...
}
Although I had to undo one bad change this made, and
I also adjusted some other code for whitespace and to
use direct variable initialization now.
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Disable NAPI before shutting down queues that this particular NAPI
contains so that the order of actions in i40e_queue_pair_disable()
mirrors what we do in i40e_queue_pair_enable().
Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
(skb_transport_header(skb) - skb_network_header(skb))
can be replaced by skb_network_header_len(skb)
Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len()
to catch cases were the transport_header was not set.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|