| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM
transactions") added a new mutex to protect concurrent PTM transactions.
This lock is acquired in igc_ptp_reset() in order to ensure the PTM
registers are properly disabled after a device reset.
The flow where the lock is acquired already holds a spinlock, so acquiring
a mutex leads to a sleep-while-locking bug, reported both by smatch,
and the kernel test robot.
The critical section in igc_ptp_reset() does correctly use the
readx_poll_timeout_atomic variants, but the standard PTM flow uses regular
sleeping variants. This makes converting the mutex to a spinlock a bit
tricky.
Instead, re-order the locking in igc_ptp_reset. Acquire the mutex first,
and then the tmreg_lock spinlock. This is safe because there is no other
ordering dependency on these locks, as this is the only place where both
locks were acquired simultaneously. Indeed, any other flow acquiring locks
in that order would be wrong regardless.
Signed-off-by: Jacob Keller <[email protected]>
Fixes: 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM transactions")
Link: https://lore.kernel.org/intel-wired-lan/[email protected]/
Link: https://lore.kernel.org/intel-wired-lan/[email protected]/T/#u
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Vitaly Lifshits <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the referenced commit, the shutdown just called idpf_remove(),
this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task
rescheduling reset. Without this flag set the shutdown process is
vulnerable to HW reset or any other triggering conditions (such as
default mailbox being destroyed).
When one of conditions checked in idpf_service_task becomes true,
vc_event_task can be rescheduled during shutdown, this leads to accessing
freed memory e.g. idpf_req_rel_vector_indexes() trying to read
vport->q_vector_idxs. This in turn causes the system to become defunct
during e.g. systemctl kexec.
Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown
process, instead just cancel the serv_task before cancelling
adapter->serv_task before cancelling adapter->vc_event_task to ensure that
reset will not be scheduled while we are doing a shutdown.
Fixes: 4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot")
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Larysa Zaremba <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Emil Tantilov <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of failing on rss_data->rss_key allocation the function is
freeing vport without freeing earlier allocated q_vector_idxs. Fix it.
Move from freeing in error branch to goto scheme.
Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Reviewed-by: Pavan Kumar Linga <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Suggested-by: Pavan Kumar Linga <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split offloads into csum, tso and other offloads so that tunneled
packets do not by default have all the offloads enabled.
Stateless offloads for encapsulated packets are not yet supported in
firmware/software but in the driver we were setting the features same as
non encapsulated features.
Fixed naming to clarify CSUM bits are being checked for Tx.
Inherit netdev features to VLAN interfaces as well.
Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Reviewed-by: Sridhar Samudrala <[email protected]>
Signed-off-by: Madhu Chittim <[email protected]>
Tested-by: Zachary Goldstein <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
As mentioned in the commit baeb705fd6a7 ("ice: always check VF VSI
pointer values"), we need to perform a null pointer check on the return
value of ice_get_vf_vsi() before using it.
Fixes: 6ebbe97a4881 ("ice: Add a per-VF limit on number of FDIR filters")
Signed-off-by: Xuanqiang Luo <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Get Tx Topology AQ command (opcode 0x0418) has different read flag
requirements depending on the hardware/firmware. For E810, E822, and E823
firmware the read flag must be set, and for newer hardware (E825 and E830)
it must not be set.
This results in failure to configure Tx topology and the following warning
message during probe:
DDP package does not support Tx scheduling layers switching feature -
please update to the latest DDP package and try again
The current implementation only handles E825-C but not E830. It is
confusing as we first check ice_is_e825c() and then set the flag in the set
case. Finally, we check ice_is_e825c() again and set the flag for all other
hardware in both the set and get case.
Instead, notice that we always need the read flag for set, but only need
the read flag for get on E810, E822, and E823 firmware. Fix the logic to
check the MAC type and set the read flag in get only on the older devices
which require it.
Fixes: ba1124f58afd ("ice: Add E830 device IDs, MAC type and registers")
Signed-off-by: Paul Greenwalt <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a mutex around the PTM transaction to prevent multiple transactors
Multiple processes try to initiate a PTM transaction, one or all may
fail. This can be reproduced by running two instances of the
following:
$ sudo phc2sys -O 0 -i tsn0 -m
PHC2SYS exits with:
"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
fails
Note: Normally two instance of PHC2SYS will not run, but one process
should not break another.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure that the PTP module is cleaned up if the igc_probe() fails by
calling igc_ptp_stop() on exit.
Fixes: d89f88419f99 ("igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support")
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
All functions in igc_ptp.c called from igc_main.c should check the
IGC_PTP_ENABLED flag. Adding check for this flag to stop and reset
functions.
Fixes: 5f2958052c58 ("igc: Add basic skeleton for PTP")
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move ktime_get_snapshot() into the loop. If a retry does occur, a more
recent snapshot will result in a more accurate cross-timestamp.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Tested-by: Avigail Dahan <[email protected]>
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i225/i226 hardware retries if it receives an inappropriate response
from the upstream device. If the device retries too quickly, the root
port does not respond.
The wait between attempts was reduced from 10us to 1us in commit
6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us"), which
said:
With the 10us interval, we were seeing PTM transactions take around
12us. Hardware team suggested this interval could be lowered to 1us
which was confirmed with PCIe sniffer. With the 1us interval, PTM
dialogs took around 2us.
While a 1us short cycle time was thought to be theoretically sufficient, it
turns out in practice it is not quite long enough. It is unclear if the
problem is in the root port or an issue in i225/i226.
Increase the wait from 1us to 4us. Increasing to 2us appeared to work in
practice on the setups we have available. A value of 4us was chosen due to
the limited hardware available for testing, with a goal of ensuring we wait
long enough without overly penalizing the response time when unnecessary.
The issue can be reproduced with the following:
$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.
PHC2SYS exits with:
"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
fails
Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Tested-by: Avigail Dahan <[email protected]>
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Writing to clear the PTM status 'valid' bit while the PTM cycle is
triggered results in unreliable PTM operation. To fix this, clear the
PTM 'trigger' and status after each PTM transaction.
The issue can be reproduced with the following:
$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.
PHC2SYS exits with:
"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
fails
This patch also fixes a hang in igc_probe() when loading the igc
driver in the kdump kernel on systems supporting PTM.
The igc driver running in the base kernel enables PTM trigger in
igc_probe(). Therefore the driver is always in PTM trigger mode,
except in brief periods when manually triggering a PTM cycle.
When a crash occurs, the NIC is reset while PTM trigger is enabled.
Due to a hardware problem, the NIC is subsequently in a bad busmaster
state and doesn't handle register reads/writes. When running
igc_probe() in the kdump kernel, the first register access to a NIC
register hangs driver probing and ultimately breaks kdump.
With this patch, igc has PTM trigger disabled most of the time,
and the trigger is only enabled for very brief (10 - 100 us) periods
when manually triggering a PTM cycle. Chances that a crash occurs
during a PTM trigger are not 0, but extremely reduced.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Tested-by: Avigail Dahan <[email protected]>
Signed-off-by: Christopher S M Hall <[email protected]>
Reviewed-by: Corinna Vinschen <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Corinna Vinschen <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SRIOV enabled, idpf ends up calling into idpf_remove() twice.
First via idpf_shutdown() and then again when idpf_remove() calls into
sriov_disable(), because the VF devices use the idpf driver, hence the
same remove routine. When that happens, it is possible for the adapter
to be NULL from the first call to idpf_remove(), leading to a NULL
pointer dereference.
echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
reboot
BUG: kernel NULL pointer dereference, address: 0000000000000020
...
RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
...
? idpf_remove+0x22/0x1f0 [idpf]
? idpf_remove+0x1e4/0x1f0 [idpf]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x19f/0x200
pci_stop_bus_device+0x6d/0x90
pci_stop_and_remove_bus_device+0x12/0x20
pci_iov_remove_virtfn+0xbe/0x120
sriov_disable+0x34/0xe0
idpf_sriov_configure+0x58/0x140 [idpf]
idpf_remove+0x1b9/0x1f0 [idpf]
idpf_shutdown+0x12/0x30 [idpf]
pci_device_shutdown+0x35/0x60
device_shutdown+0x156/0x200
...
Replace the direct idpf_remove() call in idpf_shutdown() with
idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
the bulk of the cleanup, such as stopping the init task, freeing IRQs,
destroying the vports and freeing the mailbox. This avoids the calls to
sriov_disable() in addition to a small netdev cleanup, and destroying
workqueues, which don't seem to be required on shutdown.
Reported-by: Yuying Ma <[email protected]>
Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
Reviewed-by: Madhu Chittim <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit 23c0e5a16bcc ("ixgbe: Add link management support for E610
device") introduced incorrect media type detection for E610 device. It
reproduces when advertised speed is modified after driver reload. Clear
the previous outdated PHY type high value.
Reproduction steps:
modprobe ixgbe
ethtool -s eth0 advertise 0x1000000000000
modprobe -r ixgbe
modprobe ixgbe
ethtool -s eth0 advertise 0x1000000000000
Result before the fix:
netlink error: link settings update failed
netlink error: Invalid argument
Result after the fix:
No output error
Fixes: 23c0e5a16bcc ("ixgbe: Add link management support for E610 device")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Signed-off-by: Piotr Kwapulinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Bharath R <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting from Meteor Lake, the Kumeran interface between the integrated
MAC and the I219 PHY works at a different frequency. This causes sporadic
MDI errors when accessing the PHY, and in rare circumstances could lead
to packet corruption.
To overcome this, introduce minor changes to the Kumeran idle
state (K1) parameters during device initialization. Hardware reset
reverts this configuration, therefore it needs to be applied in a few
places.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <[email protected]>
Tested-by: Avigail Dahan <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes TX frame drops in AF_XDP zero copy mode when budget < 4.
xsk_tx_peek_desc() consumed TX frame and it was ignored because of
low budget. Not even AF_XDP completion was done for dropped frames.
It can be reproduced on i226 by sending 100000x 60 B frames with
launch time set to minimal IPG (672 ns between starts of frames)
on 1Gbit/s. Always 1026 frames are not sent and are missing a
completion.
Fixes: 9acf59a752d4c ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Zdenek Bouska <[email protected]>
Reviewed-by: Song Yoong Siang <[email protected]>
Reviewed-by: Florian Bezdeka <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit b65969856d4f ("igc: Link queues to NAPI instances"), the XSK
queues were incorrectly unmapped from their NAPI instances. After
discussion on the mailing list and the introduction of a test to codify
the expected behavior, we can see that the unmapping causes the
check_xsk test to fail:
NETIF=enp86s0 ./tools/testing/selftests/drivers/net/queues.py
[...]
# Check| ksft_eq(q.get('xsk', None), {},
# Check failed None != {} xsk attr on queue we configured
not ok 4 queues.check_xsk
After this commit, the test passes:
ok 4 queues.check_xsk
Note that the test itself is only in net-next, so I tested this change
by applying it to my local net-next tree, booting, and running the test.
Cc: [email protected]
Fixes: b65969856d4f ("igc: Link queues to NAPI instances")
Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Gerhard Engleder <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
|
| | |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Merge in late fixes to prepare for the 6.15 net-next PR.
No conflicts, adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Current init logic ignores the error code from register_netdev(),
which will cause WARN_ON() on attempt to unregister it, if there was one,
and there is no info for the user that the creation of the netdev failed.
WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10
...
[ 3707.563641] unregister_netdev+0x1c/0x30
[ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf]
[ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf]
[ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf]
[ 3707.563739] idpf_remove+0xbf/0x780 [idpf]
[ 3707.563769] pci_device_remove+0xab/0x1e0
[ 3707.563786] device_release_driver_internal+0x371/0x530
[ 3707.563803] driver_detach+0xbf/0x180
[ 3707.563816] bus_remove_driver+0x11b/0x2a0
[ 3707.563829] pci_unregister_driver+0x2a/0x250
Introduce an error check and log the vport number and error code.
On removal make sure to check VPORT_REG_NETDEV flag prior to calling
unregister and free on the netdev.
Add local variables for idx, vport_config and netdev for readability.
Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Suggested-by: Tony Nguyen <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix using the untrusted value of proto->raw.pkt_len in function
ice_vc_fdir_parse_raw() by verifying if it does not exceed the
VIRTCHNL_MAX_SIZE_RAW_PACKET value.
Fixes: 99f419df8a5c ("ice: enable FDIR filters from raw binary patterns for VFs")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Mateusz Polchlopek <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add missing validation of tc and queue id values sent by a VF in
ice_vc_cfg_q_bw().
Additionally fixed logged value in the warning message,
where max_tx_rate was incorrectly referenced instead of min_tx_rate.
Also correct error handling in this function by properly exiting
when invalid configuration is detected.
Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration")
Reviewed-by: Jedrzej Jagielski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Lukasz Czapnik <[email protected]>
Co-developed-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add queue wraparound prevention in quanta configuration.
Ensure end_qid does not overflow by validating start_qid and num_queues.
Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration")
Reviewed-by: Jedrzej Jagielski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jan Glaza <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Queue IDs can be up to 4096, fix invalid check to stop
truncating IDs to 8 bits.
Fixes: bf93bf791cec8 ("ice: introduce ice_virtchnl.c and ice_virtchnl.h")
Reviewed-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Jedrzej Jagielski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jan Glaza <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the CONFIG_INFINIBAND_IRDMA symbol is not enabled as a module or a
built-in, then don't let the driver reserve resources for RDMA. The result
of this change is a large savings in resources for older kernels, and a
cleaner driver configuration for the IRDMA=n case for old and new kernels.
Implement this by avoiding enabling the RDMA capability when scanning
hardware capabilities.
Note: Loading the out-of-tree irdma driver in connection to the in-kernel
ice driver, is not supported, and should not be attempted, especially when
disabling IRDMA in the kernel config.
Fixes: d25a0fc41c1f ("ice: Initialize RDMA support")
Signed-off-by: Jesse Brandeburg <[email protected]>
Acked-by: Dave Ertman <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On E800 series hardware, if the start time for a periodic output signal is
programmed into GLTSYN_TGT_H and GLTSYN_TGT_L registers, the hardware logic
locks up and the periodic output signal never starts. Any future attempt to
reprogram the clock function is futile as the hardware will not reset until
a power on.
The ice_ptp_cfg_perout function has logic to prevent this, as it checks if
the requested start time is in the past. If so, a new start time is
calculated by rounding up.
Since commit d755a7e129a5 ("ice: Cache perout/extts requests and check
flags"), the rounding is done to the nearest multiple of the clock period,
rather than to a full second. This is more accurate, since it ensures the
signal matches the user request precisely.
Unfortunately, there is a race condition with this rounding logic. If the
current time is close to the multiple of the period, we could calculate a
target time that is extremely soon. It takes time for the software to
program the registers, during which time this requested start time could
become a start time in the past. If that happens, the periodic output
signal will lock up.
For large enough periods, or for the logic prior to the mentioned commit,
this is unlikely. However, with the new logic rounding to the period and
with a small enough period, this becomes inevitable.
For example, attempting to enable a 10MHz signal requires a period of 100
nanoseconds. This means in the *best* case, we have 99 nanoseconds to
program the clock output. This is essentially impossible, and thus such a
small period practically guarantees that the clock output function will
lock up.
To fix this, add some slop to the clock time used to check if the start
time is in the past. Because it is not critical that output signals start
immediately, but it *is* critical that we do not brick the function, 0.5
seconds is selected. This does mean that any requested output will be
delayed by at least 0.5 seconds.
This slop is applied before rounding, so that we always round up to the
nearest multiple of the period that is at least 0.5 seconds in the future,
ensuring a minimum of 0.5 seconds to program the clock output registers.
Finally, to ensure that the hardware registers programming the clock output
complete in a timely manner, add a write flush to the end of
ice_ptp_write_perout. This ensures we don't risk any issue with PCIe
transaction batching.
Strictly speaking, this fixes a race condition all the way back at the
initial implementation of periodic output programming, as it is
theoretically possible to trigger this bug even on the old logic when
always rounding to a full second. However, the window is narrow, and the
code has been refactored heavily since then, making a direct backport not
apply cleanly.
Fixes: d755a7e129a5 ("ice: Cache perout/extts requests and check flags")
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
GCC 7 is not as good as GCC 8+ in telling what is a compile-time
const, and thus could be used for static storage.
Fortunately keeping strings as const arrays is enough to make old
gcc happy.
Excerpt from the report:
My GCC is: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.
CC [M] drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o
drivers/net/ethernet/intel/ice/devlink/health.c:35:3: error: initializer element is not constant
ice_common_port_solutions, {ice_port_number_label}},
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/devlink/health.c:35:3: note: (near initialization for 'ice_health_status_lookup[0].solution')
drivers/net/ethernet/intel/ice/devlink/health.c:35:31: error: initializer element is not constant
ice_common_port_solutions, {ice_port_number_label}},
^~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/devlink/health.c:35:31: note: (near initialization for 'ice_health_status_lookup[0].data_label[0]')
drivers/net/ethernet/intel/ice/devlink/health.c:37:46: error: initializer element is not constant
"Change or replace the module or cable.", {ice_port_number_label}},
^~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/devlink/health.c:37:46: note: (near initialization for 'ice_health_status_lookup[1].data_label[0]')
drivers/net/ethernet/intel/ice/devlink/health.c:39:3: error: initializer element is not constant
ice_common_port_solutions, {ice_port_number_label}},
^~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 85d6164ec56d ("ice: add fw and port health reporters")
Reported-by: Qiuxu Zhuo <[email protected]>
Closes: https://lore.kernel.org/netdev/CY8PR11MB7134BF7A46D71E50D25FA7A989F72@CY8PR11MB7134.namprd11.prod.outlook.com
Reviewed-by: Michal Swiatkowski <[email protected]>
Suggested-by: Simon Horman <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
| | |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Almost all drivers except bond and nsim had same check if device
can perform XFRM offload on that specific packet. The check was that
packet doesn't have IPv4 options and IPv6 extensions.
In NIC drivers, the IPv4 HELEN comparison was slightly different, but
the intent was to check for the same conditions. So let's chose more
strict variant as a common base.
Reviewed-by: Zhu Yanjun <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The igb_ptp_feature_enable_82580 function correctly checks that unknown
flags are not passed to the function. However, it does not actually check
PTP_RISING_EDGE or PTP_FALLING_EDGE when configuring the external timestamp
function.
The data sheet for the 82580 product says:
Upon a change in the input level of one of the SDP pins that was
configured to detect Time stamp events using the TSSDP register, a time
stamp of the system time is captured into one of the two auxiliary time
stamp registers (AUXSTMPL/H0 or AUXSTMPL/H1).
For example to define timestamping of events in the AUXSTMPL0 and
AUXSTMPH0 registers, Software should:
1. Set the TSSDP.AUX0_SDP_SEL field to select the SDP pin that detects
the level change and set the TSSDP.AUX0_TS_SDP_EN bit to 1.
2. Set the TSAUXC.EN_TS0 bit to 1 to enable timestamping
The same paragraph is in the i350 and i354 data sheets.
The wording implies that the time stamps are captured at any level change.
There does not appear to be any way to only timestamp one edge of the
signal.
Reject requests which do not set both PTP_RISING_EDGE and PTP_FALLING_EDGE
when operating under PTP_STRICT_FLAGS mode via PTP_EXTTS_REQUEST2.
Fixes: 38970eac41db ("igb: support EXTTS on 82580/i354/i350")
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-1-ea930ba82459@intel.com
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
E610 NICs unlike the previous devices utilising ixgbe driver
are notified in the case of overheating by the FW ACI event.
In event of overheat when threshold is exceeded, FW suspends all
traffic and sends overtemp event to the driver. Then driver
logs appropriate message and disables the adapter instance.
The card remains in that state until the platform is rebooted.
This approach is a solution to the fact current version of the
E610 FW doesn't support reading thermal sensor data by the
SW. So give to user at least any info that overtemp event
has occurred, without interface disappearing from the OS
without any note.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Mateusz Polchlopek <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Jeremiah Lokan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add PTP support for E610 adapter. The E610 is based on X550 and adds
firmware managed link, enhanced security capabilities and support for
updated server manageability. It does not introduce any new PTP features
compared to X550.
Reviewed-by: Milena Olech <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Piotr Kwapulinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Tested-by: Bharath R <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Minor PTP register refactor, including logical grouping E825C 1-step
timestamping registers. Remove unused register definitions
(PHY_REG_GPCS_BITSLIP, PHY_REG_REVISION).
Also, apply preferred GENMASK macro (instead of ICE_M) for register
fields definition affected by this patch.
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Simplify ice_phy_reg_info_eth56g struct definition to include base
address for the very first quad. Use base address info and 'step'
value to determine address for specific PHY quad.
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Refactor the code by changing ice_ptp_init_phc_eth56g function
name to ice_ptp_init_phc_e825, to be consistent with the naming pattern
for other devices.
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
E830 supports raw receive and generic transmit checksum offloads.
Raw receive checksum support is provided by hardware calculating the
checksum over the whole packet, regardless of type. The calculated
checksum is provided to driver in the Rx flex descriptor. Then the driver
assigns the checksum to skb->csum and sets skb->ip_summed to
CHECKSUM_COMPLETE.
Generic transmit checksum support is provided by hardware calculating the
checksum given two offsets: the start offset to begin checksum calculation,
and the offset to insert the calculated checksum in the packet. Support is
advertised to the stack using NETIF_F_HW_CSUM feature.
E830 has the following limitations when both generic transmit checksum
offload and TCP Segmentation Offload (TSO) are enabled:
1. Inner packet header modification is not supported. This restriction
includes the inability to alter TCP flags, such as the push flag. As a
result, this limitation can impact the receiver's ability to coalesce
packets, potentially degrading network throughput.
2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents
support of Maximum Transmission Unit (MTU) greater than 1063 bytes.
Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually
exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not
enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the
defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and
NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity
of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().
When NETIF_F_HW_CSUM is requested the provided skb->csum_start and
skb->csum_offset are passed to hardware in the Tx context descriptor
generic checksum (GCS) parameters. Hardware calculates the 1's complement
from skb->csum_start to the end of the packet, and inserts the result in
the packet at skb->csum_offset.
Co-developed-by: Alice Michael <[email protected]>
Signed-off-by: Alice Michael <[email protected]>
Co-developed-by: Eric Joyner <[email protected]>
Signed-off-by: Eric Joyner <[email protected]>
Signed-off-by: Paul Greenwalt <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
| | |\ \ \
| | | |/
| | |/|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
tools/testing/selftests/drivers/net/ping.py
75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver")
https://lore.kernel.org/netdev/[email protected]/
net/core/devmem.c
a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
https://lore.kernel.org/netdev/[email protected]/
Adjacent changes:
tools/testing/selftests/net/Makefile
6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4")
drivers/net/ethernet/broadcom/bnxt/bnxt.c
661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Paolo Abeni <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Move the more esoteric helpers for netdev instance lock to
a dedicated header. This avoids growing netdevice.h to infinity
and makes rebuilding the kernel much faster (after touching
the header with the helpers).
The main netdev_lock() / netdev_unlock() functions are used
in static inlines in netdevice.h and will probably be used
most commonly, so keep them in netdevice.h.
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Introduce new dev_setup_tc for nft ndo_setup_tc paths.
Reviewed-by: Eric Dumazet <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For the drivers that use shaper API, switch to the mode where
core stack holds the netdev lock. This affects two drivers:
* iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly
remove these
* netdevsim - switch to _locked APIs to avoid deadlock
iavf_close diff is a bit confusing, the existing call looks like this:
iavf_close() {
netdev_lock()
..
netdev_unlock()
wait_event_timeout(down_waitqueue)
}
I change it to the following:
netdev_lock()
iavf_close() {
..
netdev_unlock()
wait_event_timeout(down_waitqueue)
netdev_lock() // reusing this lock call
}
netdev_unlock()
Since I'm reusing existing netdev_lock call, so it looks like I only
add netdev_unlock.
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Netlink error messages should not have a newline at the end of the
string.
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Gal Pressman <[email protected]>
Acked-by: Przemek Kitszel <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Cross-merge networking fixes after downstream PR (net-6.14-rc5).
Conflicts:
drivers/net/ethernet/cadence/macb_main.c
fa52f15c745c ("net: cadence: macb: Synchronize stats calculations")
75696dd0fd72 ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/[email protected]
Adjacent changes:
drivers/net/ethernet/intel/ice/ice_sriov.c
79990cf5e7ad ("ice: Fix deinitializing VF in error path")
a203163274a4 ("ice: simplify VF MSI-X managing")
net/ipv4/tcp.c
18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
297d389e9e5b ("net: prefix devmem specific helpers")
net/mptcp/subflow.c
8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join")
c3349a22c200 ("mptcp: consolidate subflow cleanup")
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Delete the driver CPU affinity info and use the core's napi config
instead.
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Delete the driver CPU affinity and aRFS rmap info, use the core's
API instead.
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We set the NAPI's IRQ number in ice_vsi_set_napi_queues(). Clear the
NAPI's IRQ in ice_vsi_clear_napi_queues().
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The linked series wants to add skb tx completion timestamps.
That needs a bit in skb_shared_info.tx_flags, but all are in use.
A per-skb bit is only needed for features that are configured on a
per packet basis. Per socket features can be read from sk->sk_tsflags.
Per packet tsflags can be set in sendmsg using cmsg, but only those in
SOF_TIMESTAMPING_TX_RECORD_MASK.
Per packet tsflags can also be set without cmsg by sandwiching a
send inbetween two setsockopts:
val |= SOF_TIMESTAMPING_$FEATURE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
write(fd, buf, sz);
val &= ~SOF_TIMESTAMPING_$FEATURE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Changing a datapath test from skb_shinfo(skb)->tx_flags to
skb->sk->sk_tsflags can change behavior in that case, as the tx_flags
is written before the second setsockopt updates sk_tsflags.
Therefore, only bits can be reclaimed that cannot be set by cmsg and
are also highly unlikely to be used to target individual packets
otherwise.
Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This
selects between clock and free running counter source for HW TX
timestamps. It is probable that all packets of the same socket will
always use the same source.
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Reviewed-by: Gerhard Engleder <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add an additional type of symmetric RSS hash type: OR-XOR.
The "Symmetric-OR-XOR" algorithm transforms the input as follows:
(SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)
Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap
of supported RXH_XFRM_* types.
Reviewed-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Gal Pressman <[email protected]>
Reviewed-by: Edward Cree <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-02-20
We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).
The main changes are:
1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing
2) Add network TX timestamping support to BPF sock_ops, from Jason Xing
3) Add TX metadata Launch Time support, from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
igc: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
net: stmmac: Add launch time support to XDP ZC
selftests/bpf: Add launch time request to xdp_hw_metadata
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
bpf: Support selective sampling for bpf timestamping
bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
bpf: Disable unsafe helpers in TX timestamping callbacks
bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
bpf: Add networking timestamping support to bpf_get/setsockopt()
selftests/bpf: Add rto max for bpf_setsockopt test
bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
metadata framework.
This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel I225-LM Ethernet controller. Below are the test steps and result.
Test 1: Send a single packet with the launch time set to 1 s in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
of the DUT.
Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 0.016 us, as shown in
printout of the xdp_hw_metadata application below.
0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
rx_hash: 0xE343384 with RSS type:0x1
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to User RX-time sec:0.0002 (183.103 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
delta to User RX-time sec:0.0001 (80.309 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
csum_start=34 csum_offset=6
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to HW Launch-time sec:1.0000 (1000000.000 usec)
0x562ff5dc8880: complete tx idx=4 addr=4018
HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
delta to HW TX-complete-time sec:0.0000 (0.016 usec)
HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
delta to User TX-complete-time sec:0.0000
(32.546 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
delta to User TX-complete-time sec:0.9999
(999929.768 usec)
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
0x562ff5dc8880: complete rx idx=132 addr=84110
Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
500 us in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
/dev/shm/result.log
2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
VLAN priority 1 to port 9091 of the DUT.
Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 0.016 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
Min delta: 0.005 us
Avr delta: 0.016 us
Max delta: 0.031 us
Total packets forwarded: 1000
Signed-off-by: Song Yoong Siang <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Faizal Rahim <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
|