aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel/e1000/e1000_main.c
Commit message (Collapse)AuthorAgeFilesLines
...
| * e1000: Do not overestimate descriptor counts in Tx pre-checkAlexander Duyck2016-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code path is capable of grossly overestimating the number of descriptors needed to transmit a new frame. This specifically occurs if the skb contains a number of 4K pages. The issue is that the logic for determining the descriptors needed is ((S) >> (X)) + 1. When X is 12 it means that we were indicating that we required 2 descriptors for each 4K page when we only needed one. This change corrects this by instead adding (1 << (X)) - 1 to the S value instead of adding 1 after the fact. This way we get an accurate descriptor needed count as we are essentially doing a DIV_ROUNDUP(). Reported-by: Ivan Suzdal <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* | e1000: call ndo_stop() instead of dev_close() when running offline selftestStefan Assmann2016-04-061-4/+4
|/ | | | | | | | | | | | | | Calling dev_close() causes IFF_UP to be cleared which will remove the interfaces routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all. Signed-off-by: Stefan Assmann <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: fix a typo in the commentJean Sacren2015-12-131-1/+1
| | | | | | | | Use 'That' to replace 'The' so that the comment would make sense. Signed-off-by: Jean Sacren <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: clean up the checking logicJean Sacren2015-12-131-5/+5
| | | | | | | | | | | | | | | | | The checking logic needed some clean-up work, so we rewrite it by checking for break first. With that change in place, we can even move the second check for goto statement outside of the loop. As this is merely a cleanup, no functional change is involved. The questionable 'tmp != 0xFF' is intentionally left alone. Mark Rustad and Alexander Duyck contributed to this patch. CC: Mark Rustad <[email protected]> CC: Alex Duyck <[email protected]> Signed-off-by: Jean Sacren <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: Remove checkpatch coding style errorsJanusz Wolak2015-12-131-51/+65
| | | | | | Signed-off-by: Janusz Wolak <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: fix data race between tx_ring->next_to_cleanDmitriy Vyukov2015-12-131-1/+4
| | | | | | | | | | | | | | | | | e1000_clean_tx_irq cleans buffers and sets tx_ring->next_to_clean, then e1000_xmit_frame reuses the cleaned buffers. But there are no memory barriers when buffers gets recycled, so the recycled buffers can be corrupted. Use smp_store_release to update tx_ring->next_to_clean and smp_load_acquire to read tx_ring->next_to_clean to properly hand off buffers from e1000_clean_tx_irq to e1000_xmit_frame. The data race was found with KernelThreadSanitizer (KTSAN). Signed-off-by: Dmitry Vyukov <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* drivers/net/intel: use napi_complete_done()Jesse Brandeburg2015-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per Eric Dumazet's previous patches: (see commit (24d2e4a50737) - tg3: use napi_complete_done()) Quoting verbatim: Using napi_complete_done() instead of napi_complete() allows us to use /sys/class/net/ethX/gro_flush_timeout GRO layer can aggregate more packets if the flush is delayed a bit, without having to set too big coalescing parameters that impact latencies. </end quote> Tested configuration: low latency via ethtool -C ethx adaptive-rx off rx-usecs 10 adaptive-tx off tx-usecs 15 workload: streaming rx using netperf TCP_MAERTS igb: MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo ... Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589 Alignment Offset Bytes Bytes Recvs Bytes Sends Local Remote Local Remote Xfered Per Per Recv Send Recv Send Recv (avg) Send (avg) 8 8 0 0 1176930056 1475.36 797726 16384.00 71905 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo ... Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763 Alignment Offset Bytes Bytes Recvs Bytes Sends Local Remote Local Remote Xfered Per Per Recv Send Recv Send Recv (avg) Send (avg) 8 8 0 0 1175182320 50476.00 23282 16384.00 71816 i40e: Hard to test because the traffic is incoming so fast (24Gb/s) that GRO always receives 87kB, even at the highest interrupt rate. Other drivers were only compile tested. Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: Replace e1000_free_frag with skb_free_fragAlexander Duyck2015-05-121-12/+7
| | | | | | Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* e1000, e1000e: Use dma_rmb instead of rmb for descriptor read orderingAlexander Duyck2015-04-081-3/+3
| | | | | | | | | | This change replaces calls to rmb with dma_rmb in the case where we want to order all follow-on descriptor reads after the check for the descriptor status bit. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* ethernet: codespell comment spelling fixesJoe Perches2015-03-091-1/+1
| | | | | | | | | | | | | | | To test a checkpatch spelling patch, I ran codespell against drivers/net/ethernet/. $ git ls-files drivers/net/ethernet/ | \ while read file ; do \ codespell -w $file; \ done I removed a false positive in e1000_hw.h Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* e1000: add dummy allocator to fix race condition between mtu change and netpollSabrina Dubroca2015-03-061-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race condition between e1000_change_mtu's cleanups and netpoll, when we change the MTU across jumbo size: Changing MTU frees all the rx buffers: e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings -> e1000_clean_rx_ring Then, close to the end of e1000_change_mtu: pr_info -> ... -> netpoll_poll_dev -> e1000_clean -> e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag And when we come back to do the rest of the MTU change: e1000_up -> e1000_configure -> e1000_configure_rx -> e1000_alloc_jumbo_rx_buffers alloc_jumbo finds the buffers already != NULL, since data (shared with page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage, or at least not what is expected when in jumbo state. This results in an unusable adapter (packets don't get through), and a NULL pointer dereference on the next call to e1000_clean_rx_ring (other mtu change, link down, shutdown): BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330 [...] Call Trace: [<ffffffff81195445>] put_page+0x55/0x60 [<ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200 [<ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60 [<ffffffff815df5e0>] e1000_down+0x1c0/0x1d0 [<ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840 [<ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170 [<ffffffff81647050>] dev_set_mtu+0xa0/0x140 [<ffffffff81664218>] do_setlink+0x218/0xac0 [<ffffffff814459e9>] ? nla_parse+0xb9/0x120 [<ffffffff816652d0>] rtnl_newlink+0x6d0/0x890 [<ffffffff8104f000>] ? kvm_clock_read+0x20/0x40 [<ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100 [<ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260 By setting the allocator to a dummy version, netpoll can't mess up our rx buffers. The allocator is set back to a sane value in e1000_configure_rx. Fixes: edbbb3ca1077 ("e1000: implement jumbo receive with partial descriptors") Signed-off-by: Sabrina Dubroca <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: call netif_carrier_off early on downEliezer Tamir2015-03-061-1/+1
| | | | | | | | | | | | | | | | | When bringing down an interface netif_carrier_off() should be one the first things we do, since this will prevent the stack from queuing more packets to this interface. This operation is very fast, and should make the device behave much nicer when trying to bring down an interface under load. Also, this would Do The Right Thing (TM) if this device has some sort of fail-over teaming and redirect traffic to the other IF. Move netif_carrier_off as early as possible. Signed-off-by: Eliezer Tamir <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* net: e1000: support txtd update delay via xmit_moreFlorian Westphal2015-01-231-6/+9
| | | | | | | | | Don't update Tx tail descriptor if we queue hasn't been stopped and we know at least one more skb will be sent right away. Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* net: rename vlan_tx_* helpers since "tx" is misleading thereJiri Pirko2015-01-131-2/+3
| | | | | | | The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* ethernet/intel: Use napi_alloc_skbAlexander Duyck2014-12-101-1/+1
| | | | | | | | | | | | | | This change replaces calls to netdev_alloc_skb_ip_align with napi_alloc_skb. The advantage of napi_alloc_skb is currently the fact that the page allocation doesn't make use of any irq disable calls. There are few spots where I couldn't replace the calls as the buffer allocation routine is called as a part of init which is outside of the softirq context. Cc: Jeff Kirsher <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* ethernet/intel: Use eth_skb_pad and skb_put_padto helpersAlexander Duyck2014-12-091-6/+2
| | | | | | | | | | | | | | Update the Intel Ethernet drivers to use eth_skb_pad() and skb_put_padto instead of doing their own implementations of the function. Also this cleans up two other spots where skb_pad was called but the length and tail pointers were being manipulated directly instead of just having the padding length added via __skb_put. Cc: Jeff Kirsher <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* e1000: unset IFF_UNICAST_FLT on WMware 82545EMFrancesco Ruggeri2014-10-301-1/+4
| | | | | | | | | | | VMWare's e1000 implementation does not seem to support unicast filtering. This can be observed by configuring a macvlan interface on eth0 in a VM in VMWare Fusion 5.0.5, and trying to use that interface instead of eth0. Tested on 3.16. Signed-off-by: Francesco Ruggeri <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: switch to napi_gro_frags apiFlorian Westphal2014-09-121-17/+32
| | | | | | | | | | | | | | | napi_gro_frags allows skb re-use in case GRO can merge payload pages into an skb on the GRO lists. netperf TCP_STREAM, kvm-e1000 emulation, mtu 9k: Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec old: 87380 16384 16384 30.00 8985.78 new: 87380 16384 16384 30.00 9907.05 Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: convert to build_skbFlorian Westphal2014-09-121-104/+112
| | | | | | | | | | | | | | | Instead of preallocating Rx skbs, allocate them right before sending inbound packet up the stack. e1000-kvm, mtu1500, netperf TCP_STREAM: Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec old: 87380 16384 16384 60.00 4532.40 new: 87380 16384 16384 60.00 4599.05 Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: rename struct e1000_buffer to e1000_tx_bufferFlorian Westphal2014-09-121-11/+12
| | | | | | | | and remove *page, its only used for Rx. Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: add and use e1000_rx_buffer info for RxFlorian Westphal2014-09-121-18/+17
| | | | | | | | | | | | | | | | e1000 uses the same metadata struct for Rx and Tx. But Tx and Rx have different requirements. For Rx, we only need to store a buffer and a DMA address. Follow-up patch will remove skb for Rx, bringing rx_buffer_info down to 16 bytes on x86_64. [ buffer_info is 48 bytes ] Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: perform copybreak ahead of DMA unmapFlorian Westphal2014-09-121-30/+43
| | | | | | | | | Currently we unmap the DMA range, then copy to new skb. Change this so we can keep the mapping in case the data is copied. Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: move tbi workaround code into helper functionFlorian Westphal2014-09-121-29/+32
| | | | | | | | Its the same in both handlers. Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: move e1000_tbi_adjust_stats to where its usedFlorian Westphal2014-09-121-0/+77
| | | | | | | | ... and make it static. Signed-off-by: Florian Westphal <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: Fix TSO for non-accelerated vlan trafficVlad Yasevich2014-08-261-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This device claims TSO and checksum support for vlans. It also allows a user to control vlan acceleration offloading. As such, it is possible to turn off vlan acceleration and configure a vlan which will continue to support TSO. In such situation the packet passed down the the device will contain a vlan header and skb->protocol will be set to ETH_P_8021Q. The device assumes that skb->protocol contains network protocol value and uses that value to set up TSO and checksum information. This will results in corrupted frames sent on the wire. This patch extract the protocol value correctly and corrects TSO for non-accelerated traffic. CC: Jeff Kirsher <[email protected]> CC: Jesse Brandeburg <[email protected]> CC: Bruce Allan <[email protected]> CC: Carolyn Wyborny <[email protected]> CC: Don Skidmore <[email protected]> CC: Greg Rose <[email protected]> CC: Alex Duyck <[email protected]> CC: John Ronciak <[email protected]> CC: Mitch Williams <[email protected]> CC: Linux NICS <[email protected]> CC: [email protected] Signed-off-by: Vladislav Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* PCI: Remove DEFINE_PCI_DEVICE_TABLE macro useBenoit Taine2014-08-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/): // <smpl> @@ identifier i; declarer name DEFINE_PCI_DEVICE_TABLE; initializer z; @@ - DEFINE_PCI_DEVICE_TABLE(i) + const struct pci_device_id i[] = z; // </smpl> [bhelgaas: add semantic patch] Signed-off-by: Benoit Taine <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
* e1000: remove the check: skb->len<=0Yongjian Xu2014-06-041-5/+0
| | | | | | | | | There is no case skb->len would be 0 or 'negative'. Remove the check. Signed-off-by: Yongjian Xu <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: remove open-coded skb_cow_headFrancois Romieu2014-04-111-6/+5
| | | | | | | Signed-off-by: Francois Romieu <[email protected]> Cc: Jesse Brandeburg <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: fix possible reset_task running after adapter downVladimir Davydov2013-11-301-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | On e1000_down(), we should ensure every asynchronous work is canceled before proceeding. Since the watchdog_task can schedule other works apart from itself, it should be stopped first, but currently it is stopped after the reset_task. This can result in the following race leading to the reset_task running after the module unload: e1000_down_and_stop(): e1000_watchdog(): ---------------------- ----------------- cancel_work_sync(reset_task) schedule_work(reset_task) cancel_delayed_work_sync(watchdog_task) The patch moves cancel_delayed_work_sync(watchdog_task) at the beginning of e1000_down_and_stop() thus ensuring the race is impossible. Cc: Tushar Dave <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Vladimir Davydov <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: fix lockdep warning in e1000_reset_taskVladimir Davydov2013-11-301-33/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch fixes the following lockdep warning, which is 100% reproducible on network restart: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0+ #47 Tainted: GF ------------------------------------------------------- kworker/1:1/27 is trying to acquire lock: ((&(&adapter->watchdog_task)->work)){+.+...}, at: [<ffffffff8108a5b0>] flush_work+0x0/0x70 but task is already holding lock: (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&adapter->mutex){+.+...}: [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff816b8cbc>] mutex_lock_nested+0x4c/0x390 [<ffffffffa017233d>] e1000_watchdog+0x7d/0x5b0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 -> #0 ((&(&adapter->watchdog_task)->work)){+.+...}: [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810 [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff8108a5eb>] flush_work+0x3b/0x70 [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140 [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20 [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000] [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000] [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&adapter->mutex); lock((&(&adapter->watchdog_task)->work)); lock(&adapter->mutex); lock((&(&adapter->watchdog_task)->work)); *** DEADLOCK *** 3 locks held by kworker/1:1/27: #0: (events){.+.+.+}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510 #1: ((&adapter->reset_task)){+.+...}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510 #2: (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000] stack backtrace: CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF 3.12.0+ #47 Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0501 05/31/2007 Workqueue: events e1000_reset_task [e1000] ffffffff820f6000 ffff88007b9dba98 ffffffff816b54a2 0000000000000002 ffffffff820f5e50 ffff88007b9dbae8 ffffffff810ba936 ffff88007b9dbac8 ffff88007b9dbb48 ffff88007b9d8f00 ffff88007b9d8780 ffff88007b9d8f00 Call Trace: [<ffffffff816b54a2>] dump_stack+0x49/0x5f [<ffffffff810ba936>] print_circular_bug+0x216/0x310 [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff8108a5eb>] flush_work+0x3b/0x70 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140 [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20 [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000] [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000] [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108b906>] ? process_one_work+0x166/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff8108c960>] ? manage_workers+0x2c0/0x2c0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70 == The issue background == The problem occurs, because e1000_down(), which is called under adapter->mutex by e1000_reset_task(), tries to synchronously cancel e1000 auxiliary works (reset_task, watchdog_task, phy_info_task, fifo_stall_task), which take adapter->mutex in their handlers. So the question is what does adapter->mutex protect there? The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to private mutex from rtnl") as a replacement for rtnl_lock() taken in the asynchronous handlers. It targeted on fixing a similar lockdep warning issued when e1000_down() was called under rtnl_lock(), and it fixed it, but unfortunately it introduced the lockdep warning described above. Anyway, that said the source of this bug is that the asynchronous works were made to take rtnl_lock() some time ago, so let's look deeper and find why it was added there. The rtnl_lock() was added to asynchronous handlers by commit 338c15 ("e1000: fix occasional panic on unload") in order to prevent asynchronous handlers from execution after the module is unloaded (e1000_down() is called) as it follows from the comment to the commit: > Net drivers in general have an issue where timers fired > by mod_timer or work threads with schedule_work are running > outside of the rtnl_lock. > > With no other lock protection these routines are vulnerable > to races with driver unload or reset paths. > > The longer term solution to this might be a redesign with > safer locks being taken in the driver to guarantee no > reentrance, but for now a safe and effective fix is > to take the rtnl_lock in these routines. I'm not sure if this locking scheme fixed the problem or just made it unlikely, although I incline to the latter. Anyway, this was long time ago when e1000 auxiliary works were implemented as timers scheduling real work handlers in their routines. The e1000_down() function only canceled the timers, but left the real handlers running if they were running, which could result in work execution after module unload. Today, the e1000 driver uses sane delayed works instead of the pair timer+work to implement its delayed asynchronous handlers, and the e1000_down() synchronously cancels all the works so that the problem that commit 338c15 tried to cope with disappeared, and we don't need any locks in the handlers any more. Moreover, any locking there can potentially result in a deadlock. So, this patch reverts commits 0ef4ee and 338c15. Fixes: 0ef4eedc2e98 ("e1000: convert to private mutex from rtnl") Fixes: 338c15e470d8 ("e1000: fix occasional panic on unload") Cc: Tushar Dave <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Vladimir Davydov <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* e1000: prevent oops when adapter is being closed and reset simultaneouslyyzhu12013-11-301-0/+9
| | | | | | | | | | | | | | | | This change is based on a similar change made to e1000e support in commit bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed and reset simultaneously"). The same issue has also been observed on the older e1000 cards. Here, we have increased the RESET_COUNT value to 50 because there are too many accesses to e1000 nic on stress tests to e1000 nic, it is not enough to set RESET_COUT 25. Experimentation has shown that it is enough to set RESET_COUNT 50. Signed-off-by: yzhu1 <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* Merge branch 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds2013-11-131-7/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull DMA mask updates from Russell King: "This series cleans up the handling of DMA masks in a lot of drivers, fixing some bugs as we go. Some of the more serious errors include: - drivers which only set their coherent DMA mask if the attempt to set the streaming mask fails. - drivers which test for a NULL dma mask pointer, and then set the dma mask pointer to a location in their module .data section - which will cause problems if the module is reloaded. To counter these, I have introduced two helper functions: - dma_set_mask_and_coherent() takes care of setting both the streaming and coherent masks at the same time, with the correct error handling as specified by the API. - dma_coerce_mask_and_coherent() which resolves the problem of drivers forcefully setting DMA masks. This is more a marker for future work to further clean these locations up - the code which creates the devices really should be initialising these, but to fix that in one go along with this change could potentially be very disruptive. The last thing this series does is prise away some of Linux's addition to "DMA addresses are physical addresses and RAM always starts at zero". We have ARM LPAE systems where all system memory is above 4GB physical, hence having DMA masks interpreted by (eg) the block layers as describing physical addresses in the range 0..DMAMASK fails on these platforms. Santosh Shilimkar addresses this in this series; the patches were copied to the appropriate people multiple times but were ignored. Fixing this also gets rid of some ARM weirdness in the setup of the max*pfn variables, and brings ARM into line with every other Linux architecture as far as those go" * 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-arm: (52 commits) ARM: 7805/1: mm: change max*pfn to include the physical offset of memory ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit() ARM: DMA-API: better handing of DMA masks for coherent allocations ARM: 7857/1: dma: imx-sdma: setup dma mask DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks DMA-API: dcdbas: update DMA mask handing DMA-API: dma: edma.c: no need to explicitly initialize DMA masks DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks DMA-API: crypto: remove last references to 'static struct device *dev' DMA-API: crypto: fix ixp4xx crypto platform device support DMA-API: others: use dma_set_coherent_mask() DMA-API: staging: use dma_set_coherent_mask() DMA-API: usb: use new dma_coerce_mask_and_coherent() DMA-API: usb: use dma_set_coherent_mask() DMA-API: parport: parport_pc.c: use dma_coerce_mask_and_coherent() DMA-API: net: octeon: use dma_coerce_mask_and_coherent() DMA-API: net: nxp/lpc_eth: use dma_coerce_mask_and_coherent() ...
| * DMA-API: net: intel/e1000: replace dma_set_mask()+dma_set_coherent_mask() ↵Russell King2013-09-211-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | with new helper Replace the following sequence: dma_set_mask(dev, mask); dma_set_coherent_mask(dev, mask); with a call to the new helper dma_set_mask_and_coherent(). Acked-by: Jeff Kirsher <[email protected]> Signed-off-by: Russell King <[email protected]>
* | e1000: fix wrong queue idx calculationHong Zhiguo2013-11-011-2/+1
|/ | | | | | | | | tx_ring and adapter->tx_ring are already of type "struct e1000_tx_ring *" Signed-off-by: Hong Zhiguo <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
* net: vlan: add protocol argument to packet tagging functionsPatrick McHardy2013-04-191-1/+1
| | | | | | | | | | Add a protocol argument to the VLAN packet tagging functions. In case of HW tagging, we need that protocol available in the ndo_start_xmit functions, so it is stored in a new field in the skb. The new field fits into a hole (on 64 bit) and doesn't increase the sks's size. Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* net: vlan: prepare for 802.1ad VLAN filtering offloadPatrick McHardy2013-04-191-8/+14
| | | | | | | | | Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*Patrick McHardy2013-04-191-8/+8
| | | | | | | | | | Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* drivers:net: Remove dma_alloc_coherent OOM messagesJoe Perches2013-03-151-7/+0
| | | | | | | | | | | | | | | | | I believe these error messages are already logged on allocation failure by warn_alloc_failed and so get a dump_stack on OOM. Remove the unnecessary additional error logging. Around these deletions: o Alignment neatening. o Remove unnecessary casts of dma_alloc_coherent. o Hoist assigns from ifs. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* e1000: fix whitespace issues and multi-line commentsJeff Kirsher2013-02-161-164/+158
| | | | | | | | | Fixes whitespace issues, such as lines exceeding 80 chars, needless blank lines and the use of spaces where tabs are needed. In addition, fix multi-line comments to align with the networking standard. Signed-off-by: Jeff Kirsher <[email protected]> Tested-by: Aaron Brown <[email protected]>
* drivers: net: Remove remaining alloc/OOM messagesJoe Perches2013-02-081-11/+3
| | | | | | | | | | | | | | | | | | | alloc failures already get standardized OOM messages and a dump_stack. For the affected mallocs around these OOM messages: Converted kmallocs with multiplies to kmalloc_array. Converted a kmalloc/memcpy to kmemdup. Removed now unused stack variables. Removed unnecessary parentheses. Neatened alignment. Signed-off-by: Joe Perches <[email protected]> Acked-by: Arend van Spriel <[email protected]> Acked-by: Marc Kleine-Budde <[email protected]> Acked-by: John W. Linville <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* remove init of dev->perm_addr in driversJiri Pirko2013-01-091-2/+1
| | | | | | | | perm_addr is initialized correctly in register_netdevice() so to init it in drivers is no longer needed. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* drivers/net: fix up function prototypes after __dev* removalsGreg Kroah-Hartman2012-12-071-2/+1
| | | | | | | | | | | | The __dev* removal patches for the network drivers ended up messing up the function prototypes for a bunch of drivers. This patch fixes all of them back up to be properly aligned. Bonus is that this almost removes 100 lines of code, always a nice surprise. Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
* net/intel: remove __dev* attributesBill Pemberton2012-12-031-6/+6
| | | | | | | | | | | | | | | | | | | | | | CONFIG_HOTPLUG is going away as an option. As result the __dev* markings will be going away. Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit. Signed-off-by: Bill Pemberton <[email protected]> Cc: Jeff Kirsher <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Bruce Allan <[email protected]> Cc: Carolyn Wyborny <[email protected]> Cc: Don Skidmore <[email protected]> Cc: Greg Rose <[email protected]> Cc: Peter P Waskiewicz Jr <[email protected]> Cc: Alex Duyck <[email protected]> Cc: John Ronciak <[email protected]> Cc: Tushar Dave <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2012-10-021-0/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-09-281-0/+11
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <[email protected]>
| * | e1000: add byte queue limitsOtto Estuardo Solares Cabrera2012-09-151-0/+10
| | | | | | | | | | | | | | | | | | Signed-off-by: Otto Estuardo Solares Cabrera <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
| * | e1000: configure and read MDI settingsJesse Brandeburg2012-08-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the implementation in e1000 to allow ethtool to force MDI state, allowing users to work around some improperly behaving switches. Forcing in this driver is for now only allowed when auto-neg is enabled. To use must have the matching version of ethtool app that supports this functionality. Signed-off-by: Jesse Brandeburg <[email protected]> CC: Tushar Dave <[email protected]> Tested-by: Aaron Brown [email protected] Signed-off-by: Jeff Kirsher <[email protected]>
* | | Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds2012-10-011-1/+1
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)" * tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits) PCI: acpiphp: Handle PCIe ports without native hotplug capability PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots PCI/ACPI: Protect acpi_pci_roots list with mutex PCI/ACPI: Use acpi_pci_root info rather than looking it up again PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface PCI/ACPI: Protect acpi_pci_drivers list with mutex PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges PCI/ACPI: Use normal list for struct acpi_pci_driver PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots PCI: Fix default vga ref_count ia64/PCI: Clear host bridge aperture struct resource x86/PCI: Clear host bridge aperture struct resource PCI: Stop all children first, before removing all children Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()" PCI: Provide a default pcibios_update_irq() PCI: Discard __init annotations for pci_fixup_irqs() and related functions PCI: Use correct type when freeing bus resource list PCI: Check P2P bridge for invalid secondary/subordinate range PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot() ...
| * | netdev: make pci_error_handlers constStephen Hemminger2012-09-071-1/+1
| |/ | | | | | | Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
* / e1000: Small packets may get corrupted during padding by HWTushar Dave2012-09-181-0/+11
|/ | | | | | | | | | | On PCI/PCI-X HW, if packet size is less than ETH_ZLEN, packets may get corrupted during padding by HW. To WA this issue, pad all small packets manually. Signed-off-by: Tushar Dave <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>