diff options
| author | Chuang Wang <[email protected]> | 2025-11-11 06:43:24 +0000 |
|---|---|---|
| committer | Jakub Kicinski <[email protected]> | 2025-11-12 14:46:36 +0000 |
| commit | ac1499fcd40fe06479e9b933347b837ccabc2a40 (patch) | |
| tree | 96d127b378858f16830f6c06c2f7013ed5fc271f /lib/mpi/mpi-mod.c | |
| parent | net_sched: limit try_bulk_dequeue_skb() batches (diff) | |
| download | kernel-ac1499fcd40fe06479e9b933347b837ccabc2a40.tar.gz kernel-ac1499fcd40fe06479e9b933347b837ccabc2a40.zip | |
ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
The sit driver's packet transmission path calls: sit_tunnel_xmit() ->
update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called
to delete entries exceeding FNHE_RECLAIM_DEPTH+random.
The race window is between fnhe_remove_oldest() selecting fnheX for
deletion and the subsequent kfree_rcu(). During this time, the
concurrent path's __mkroute_output() -> find_exception() can fetch the
soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a
new dst using a dst_hold(). When the original fnheX is freed via RCU,
the dst reference remains permanently leaked.
CPU 0 CPU 1
__mkroute_output()
find_exception() [fnheX]
update_or_create_fnhe()
fnhe_remove_oldest() [fnheX]
rt_bind_exception() [bind dst]
RCU callback [fnheX freed, dst leak]
This issue manifests as a device reference count leak and a warning in
dmesg when unregistering the net device:
unregister_netdevice: waiting for sitX to become free. Usage count = N
Ido Schimmel provided the simple test validation method [1].
The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes().
Since rt_bind_exception() checks this field, setting it to zero prevents
the stale fnhe from being reused and bound to a new dst just before it
is freed.
[1]
ip netns add ns1
ip -n ns1 link set dev lo up
ip -n ns1 address add 192.0.2.1/32 dev lo
ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 route add 192.0.2.2/32 dev dummy1
ip -n ns1 link add name gretap1 up arp off type gretap \
local 192.0.2.1 remote 192.0.2.2
ip -n ns1 route add 198.51.0.0/16 dev gretap1
taskset -c 0 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
taskset -c 2 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
sleep 10
ip netns pids ns1 | xargs kill
ip netns del ns1
Cc: [email protected]
Fixes: 67d6d681e15b ("ipv4: make exception cache less predictible")
Signed-off-by: Chuang Wang <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Diffstat (limited to 'lib/mpi/mpi-mod.c')
0 files changed, 0 insertions, 0 deletions
