path: root/net/tls/tls_device.c
Commit log, newest first. Each entry shows: subject (author, date; files changed, lines -removed/+added).

* tcp: move icsk_clean_acked to a better location (Eric Dumazet, 2025-03-24; 1 file, -4/+4)

  As a followup of my presentation in Zagreb for netdev 0x19:
  icsk_clean_acked is only used by TCP, and only if CONFIG_TLS_DEVICE is
  enabled, from tcp_ack().

  Rename it to tcp_clean_acked and move it into the tcp_sock structure,
  in the tcp_sock_read_rx group, for better cache locality in the TCP
  fast path.

  Define this field only when CONFIG_TLS_DEVICE is enabled, saving 8
  bytes on configs not using it.

  Signed-off-by: Eric Dumazet <[email protected]>
  Reviewed-by: Neal Cardwell <[email protected]>
  Reviewed-by: Sabrina Dubroca <[email protected]>
  Reviewed-by: Kuniyuki Iwashima <[email protected]>
  Link: https://patch.msgid.link/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

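  A rough sketch of the arrangement described above (the callback
  signature and the surrounding layout are assumptions, not the exact
  mainline code):

      struct tcp_sock {
          /* ... tcp_sock_read_rx cacheline group ... */
      #if IS_ENABLED(CONFIG_TLS_DEVICE)
          /* present only with TLS device offload compiled in,
           * saving 8 bytes otherwise */
          void (*tcp_clean_acked)(struct sock *sk, u32 acked_seq);
      #endif
          /* ... */
      };
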
* tls: implement rekey for TLS1.3 (Sabrina Dubroca, 2024-12-16; 1 file, -1/+1)

  This adds the possibility to change the key and IV when using TLS1.3.
  Changing the cipher or TLS version is not supported.

  Once we have updated the RX key, we can unblock the receive side. If
  the rekey fails, the context is unmodified and userspace is free to
  retry the update or close the socket.

  This change only affects tls_sw, since 1.3 offload isn't supported.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Acked-by: Jakub Kicinski <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tcp: add a helper for setting EOR on tail skb (Jakub Kicinski, 2024-06-04; 1 file, -9/+2)

  TLS uses (and hopefully PSP soon will use) EOR to prevent skbs with
  different decrypted state from getting merged, without adding new
  tests to the skb handling. In both cases, once the connection switches
  to an "encrypted" state, all subsequent skbs will be encrypted, so a
  single "EOR fence" is sufficient to prevent mixing.

  Add a helper for setting the EOR bit, to make this arrangement more
  explicit.

  Signed-off-by: Jakub Kicinski <[email protected]>
  Reviewed-by: Eric Dumazet <[email protected]>
  Reviewed-by: Willem de Bruijn <[email protected]>
  Signed-off-by: Paolo Abeni <[email protected]>

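  A minimal sketch of such a helper, assuming the obvious shape (the
  name and its exact location in the headers are assumptions):

      /* Mark the tail of the write queue with EOR so TCP won't
       * coalesce later (differently-encrypted) data into this skb. */
      static inline void tcp_write_collapse_fence(struct sock *sk)
      {
          struct sk_buff *skb = tcp_write_queue_tail(sk);

          if (skb)
              TCP_SKB_CB(skb)->eor = 1;
      }
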
* net: move skb ref helpers to new header (Mina Almasry, 2024-04-12; 1 file, -0/+1)

  Add a new header, linux/skbuff_ref.h, which contains all the
  skb_*_ref() helpers. Many of the consumers of skbuff.h do not actually
  use any of the skb ref helpers, and we can speed up compilation a bit
  by minimizing this header file.

  Additionally, a later patch in the series adds page_pool support to
  skb_frag_ref(), which requires some page_pool dependencies. We can now
  add these dependencies to skbuff_ref.h instead of the very ubiquitous
  skbuff.h.

  Signed-off-by: Mina Almasry <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: don't reset prot->aad_size and prot->tail_size for TLS_HW (Sabrina Dubroca, 2023-10-23; 1 file, -1/+1)

  Prior to commit 1a074f7618e8 ("tls: also use init_prot_info in
  tls_set_device_offload"), setting TLS_HW on TX didn't touch
  prot->aad_size and prot->tail_size. They are set to 0 during context
  allocation (tls_prot_info is embedded in tls_context, kzalloc'd by
  tls_ctx_create).

  When the RX key is configured, tls_set_sw_offload is called (for both
  TLS_SW and TLS_HW). If the TX key is configured in TLS_HW mode after
  the RX key has been installed, init_prot_info will now overwrite the
  correct values of aad_size and tail_size, breaking SW decryption and
  causing -EBADMSG errors to be returned to userspace.

  Since TLS_HW doesn't use aad_size and tail_size at all (for TLS1.2,
  tail_size is always 0, and aad_size is equal to TLS_HEADER_SIZE +
  rec_seq_size), we can simply drop this hunk.

  Fixes: 1a074f7618e8 ("tls: also use init_prot_info in tls_set_device_offload")
  Signed-off-by: Sabrina Dubroca <[email protected]>
  Acked-by: Jakub Kicinski <[email protected]>
  Tested-by: Ran Rozenstein <[email protected]>
  Link: https://lore.kernel.org/r/979d2f89a6a994d5bb49cae49a80be54150d094d.1697653889.git.sd@queasysnail.net
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: use fixed size for tls_offload_context_{tx,rx}.driver_state (Sabrina Dubroca, 2023-10-13; 1 file, -2/+2)

  driver_state is a flex array, but is always allocated by the tls core
  to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
  making that size explicit so that
  sizeof(struct tls_offload_context_{tx,rx}) works.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

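  A sketch of the change, with an assumed size value:

      #define TLS_DRIVER_STATE_SIZE_TX 16 /* illustrative value */

      struct tls_offload_context_tx {
          /* ... core TX offload state ... */

          /* was: u8 driver_state[] __aligned(8);
           * with an explicit size, sizeof(struct tls_offload_context_tx)
           * now covers the driver state too: */
          u8 driver_state[TLS_DRIVER_STATE_SIZE_TX] __aligned(8);
      };
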
* tls: remove tls_context argument from tls_set_device_offload (Sabrina Dubroca, 2023-10-13; 1 file, -7/+7)

  It's not really needed since we end up refetching it as tls_ctx. We
  can also remove the NULL check, since we have already dereferenced ctx
  in do_tls_setsockopt_conf.

  While at it, fix up the reverse xmas tree ordering.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: remove tls_context argument from tls_set_sw_offload (Sabrina Dubroca, 2023-10-13; 1 file, -1/+1)

  It's not really needed since we end up refetching it as tls_ctx. We
  can also remove the NULL check, since we have already dereferenced ctx
  in do_tls_setsockopt_conf.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: add a helper to allocate/initialize offload_ctx_tx (Sabrina Dubroca, 2023-10-13; 1 file, -14/+25)

  Simplify tls_set_device_offload a bit.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: also use init_prot_info in tls_set_device_offload (Sabrina Dubroca, 2023-10-13; 1 file, -10/+4)

  Most values are shared. Nonce size turns out to be equal to IV size
  for all offloadable ciphers.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: store iv directly within cipher_context (Sabrina Dubroca, 2023-10-13; 1 file, -11/+2)

  TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B; we don't get much benefit
  in cipher_context's size and can simplify the init code a bit.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: store rec_seq directly within cipher_context (Sabrina Dubroca, 2023-10-13; 1 file, -9/+2)

  TLS_MAX_REC_SEQ_SIZE is 8B; we don't get anything by using kmalloc.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

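  Together with the previous entry, cipher_context ends up holding both
  buffers inline, roughly as below (the 16B/4B split of the 20B iv+salt
  bound is an assumption; the 8B rec_seq bound is from the message):

      #define TLS_MAX_IV_SIZE      16 /* assumed split */
      #define TLS_MAX_SALT_SIZE     4 /* assumed split */
      #define TLS_MAX_REC_SEQ_SIZE  8

      struct cipher_context {
          char iv[TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE]; /* was: char *iv; */
          char rec_seq[TLS_MAX_REC_SEQ_SIZE];           /* was: char *rec_seq; */
      };
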
* tls: drop unnecessary cipher_type checks in tls offload (Sabrina Dubroca, 2023-10-13; 1 file, -7/+1)

  We should never reach tls_device_reencrypt, tls_enc_record, or
  tls_enc_skb with a cipher_type that can't be offloaded. Replace those
  checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
  hard-coding offloadable cipher types.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* tls: expand use of tls_cipher_desc in tls_set_device_offload (Sabrina Dubroca, 2023-08-28; 1 file, -18/+4)

  tls_set_device_offload is already getting iv and rec_seq sizes from
  tls_cipher_desc. We can now also check if the cipher_type coming from
  userspace is valid and can be offloaded.

  We can also remove the runtime check on rec_seq, since we validate it
  at compile time.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: rename tls_cipher_size_desc to tls_cipher_desc (Sabrina Dubroca, 2023-08-28; 1 file, -17/+17)

  We're going to add other fields to it to fully describe a cipher, so
  the "_size" name won't match the contents.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: reduce size of tls_cipher_size_desc (Sabrina Dubroca, 2023-08-28; 1 file, -2/+2)

  tls_cipher_size_desc indexes ciphers by their type, but we're not
  using indices 0..50 of the array. Each struct tls_cipher_size_desc is
  20B, so that's a lot of unused memory. We can reindex the array
  starting at the lowest used cipher_type.

  Introduce the get_cipher_size_desc helper to find the right item and
  avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
  explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
  add a new cipher.

  Signed-off-by: Sabrina Dubroca <[email protected]>
  Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net
  Signed-off-by: Jakub Kicinski <[email protected]>

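  A sketch of the reindexing and the accessor named in the message
  (bounds macros and array contents are illustrative):

      #define TLS_CIPHER_MIN TLS_CIPHER_AES_GCM_128  /* lowest used type */
      #define TLS_CIPHER_MAX TLS_CIPHER_ARIA_GCM_256 /* assumed highest */

      static const struct tls_cipher_size_desc
      tls_cipher_size_desc[TLS_CIPHER_MAX + 1 - TLS_CIPHER_MIN] = {
          /* [cipher_type - TLS_CIPHER_MIN] = { .iv = ..., .rec_seq = ..., }, */
      };

      static inline const struct tls_cipher_size_desc *
      get_cipher_size_desc(u16 cipher_type)
      {
          if (cipher_type < TLS_CIPHER_MIN || cipher_type > TLS_CIPHER_MAX)
              return NULL; /* avoid out-of-bounds access */
          return &tls_cipher_size_desc[cipher_type - TLS_CIPHER_MIN];
      }
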
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski, 2023-08-10; 1 file, -31/+33)

  Cross-merge networking fixes after downstream PR.

  No conflicts.

  Adjacent changes:

  drivers/net/ethernet/intel/igc/igc_main.c
    06b412589eef ("igc: Add lock to safeguard global Qbv variables")
    d3750076d464 ("igc: Add TransmissionOverrun counter")

  drivers/net/ethernet/microsoft/mana/mana_en.c
    a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive")
    a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h")
    92272ec4107e ("eth: add missing xdp.h includes in drivers")

  net/mptcp/protocol.h
    511b90e39250 ("mptcp: fix disconnect vs accept race")
    b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")

  tools/testing/selftests/net/mptcp/mptcp_join.sh
    c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test")
    03668c65d153 ("selftests: mptcp: join: rework detailed report")

  Signed-off-by: Jakub Kicinski <[email protected]>

| * net: tls: avoid discarding data on record close (Jakub Kicinski, 2023-08-06; 1 file, -31/+33)

  TLS records end with a 16B tag. For TLS device offload we only need to
  make space for this tag in the stream; the device will generate and
  replace it with the actual calculated tag.

  Long ago the code would just re-reference the head frag, which mostly
  worked but was suboptimal because it prevented TCP from combining the
  record into a single skb frag. I'm not sure if it was correct, as the
  first frag may be shorter than the tag.

  The commit under Fixes tried to replace that with using the page frag,
  and if the allocation failed, rolling back the data if the record was
  long enough. It achieves better fragment coalescing but is also buggy.

  We don't roll back the iterator, so unless we're at the end of send
  we'll skip the data we designated as tag and start the next record as
  if the rollback never happened. There's also the possibility that the
  record was constructed with MSG_MORE and the data came from a
  different syscall, and we already told user space that we "got it".

  Allocate a single dummy page and use it as fallback.

  Found by code inspection, and proven by forcing allocation failures.

  Fixes: e7b159a48ba6 ("net/tls: remove the record tail optimization")
  Signed-off-by: Jakub Kicinski <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

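  A sketch of the fallback (helper name hypothetical):

      static struct page *dummy_page; /* allocated once at init */

      static int __init tls_device_dummy_page_init(void)
      {
          /* The 16B tag's bytes are never read by the host: the device
           * overwrites them with the real tag. One shared page can
           * therefore back the tag frag of any record when page-frag
           * allocation fails, instead of rolling back accepted data. */
          dummy_page = alloc_page(GFP_KERNEL);
          return dummy_page ? 0 : -ENOMEM;
      }
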
* | net/tls: handle MSG_EOR for tls_device TX flow (Hannes Reinecke, 2023-07-28; 1 file, -1/+5)

  tls_push_data() handles MSG_MORE, but bails out on MSG_EOR. Seeing
  that MSG_EOR is basically the opposite of MSG_MORE, this patch adds
  handling of MSG_EOR by treating it as the absence of MSG_MORE.
  Consequently, we should return an error when both are set.

  Signed-off-by: Hannes Reinecke <[email protected]>
  Reviewed-by: Jakub Kicinski <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

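  The flag semantics this describes, as a hypothetical helper (not the
  actual patch):

      /* MSG_EOR is the opposite of MSG_MORE: both set is an error,
       * and MSG_EOR alone means the record ends here. */
      static int tls_eor_check(int flags, bool *more)
      {
          if ((flags & MSG_EOR) && (flags & MSG_MORE))
              return -EINVAL;
          *more = !!(flags & MSG_MORE);
          return 0;
      }
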
* net: Kill MSG_SENDPAGE_NOTLAST (David Howells, 2023-06-24; 1 file, -2/+1)

  Now that ->sendpage() has been removed, MSG_SENDPAGE_NOTLAST can be
  cleaned up. Things were converted to use MSG_MORE instead, but the
  protocol sendpage stubs still convert MSG_SENDPAGE_NOTLAST to
  MSG_MORE, which is now unnecessary.

  Signed-off-by: David Howells <[email protected]>
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) (David Howells, 2023-06-24; 1 file, -17/+0)

  Remove ->sendpage() and ->sendpage_locked(). sendmsg() with
  MSG_SPLICE_PAGES should be used instead. This allows multiple pages
  and multipage folios to be passed through.

  Signed-off-by: David Howells <[email protected]>
  Acked-by: Marc Kleine-Budde <[email protected]> # for net/can
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  cc: [email protected]
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

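  The replacement pattern on the caller side, as a kernel-style sketch
  (error handling elided; the wrapper function is hypothetical):

      static ssize_t splice_one_page(struct socket *sock, struct page *page,
                                     size_t offset, size_t size)
      {
          struct bio_vec bvec;
          struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES | MSG_MORE };

          /* point the message's iterator at the page... */
          bvec_set_page(&bvec, page, size, offset);
          iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
          /* ...and let sendmsg() splice it in */
          return sock_sendmsg(sock, &msg);
      }
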
* tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage (David Howells, 2023-06-24; 1 file, -2/+2)

  As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(),
  don't use it any deeper in the stack than the sendpage methods, but
  rather translate it to MSG_MORE and use that instead.

  Signed-off-by: David Howells <[email protected]>
  cc: Willem de Bruijn <[email protected]>
  cc: Bernard Metzler <[email protected]>
  cc: Jason Gunthorpe <[email protected]>
  cc: Leon Romanovsky <[email protected]>
  cc: John Fastabend <[email protected]>
  cc: Jakub Sitnicki <[email protected]>
  cc: David Ahern <[email protected]>
  cc: Karsten Graul <[email protected]>
  cc: Wenjia Zhang <[email protected]>
  cc: Jan Karcher <[email protected]>
  cc: "D. Wythe" <[email protected]>
  cc: Tony Lu <[email protected]>
  cc: Wen Gu <[email protected]>
  cc: Boris Pismenny <[email protected]>
  cc: Steffen Klassert <[email protected]>
  cc: Herbert Xu <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* net: tls: make the offload check helper take skb not socket (Jakub Kicinski, 2023-06-15; 1 file, -2/+2)

  All callers of tls_is_sk_tx_device_offloaded() currently do an
  equivalent of:

    if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))

  Have the helper accept skb and do the skb->sk check locally. Two
  drivers have local static inlines with similar wrappers already.

  While at it, change the ifdef condition to TLS_DEVICE. Only TLS_DEVICE
  selects SOCK_VALIDATE_XMIT, so the two are equivalent. This makes
  removing the duplicated IS_ENABLED() check in funeth more obviously
  correct.

  Signed-off-by: Jakub Kicinski <[email protected]>
  Acked-by: Maxim Mikityanskiy <[email protected]>
  Reviewed-by: Simon Horman <[email protected]>
  Acked-by: Tariq Toukan <[email protected]>
  Acked-by: Dimitris Michailidis <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

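  The helper's new shape, per the message (a sketch):

      static inline bool tls_is_skb_tx_device_offloaded(const struct sk_buff *skb)
      {
          /* do the skb->sk check locally instead of in every caller */
          return skb->sk && tls_is_sk_tx_device_offloaded(skb->sk);
      }
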
* tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES (David Howells, 2023-06-09; 1 file, -69/+23)

  Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES
  rather than directly splicing in the pages itself. With that, the
  tls_iter_offset union is no longer necessary and can be replaced with
  an iov_iter pointer, and the zc_page argument to tls_push_data() can
  also be removed.

  This allows ->sendpage() to be replaced by something that can handle
  multiple multipage folios in a single transaction.

  Signed-off-by: David Howells <[email protected]>
  Acked-by: Jakub Kicinski <[email protected]>
  cc: Chuck Lever <[email protected]>
  cc: Boris Pismenny <[email protected]>
  cc: John Fastabend <[email protected]>
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls/device: Support MSG_SPLICE_PAGES (David Howells, 2023-06-09; 1 file, -0/+26)

  Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes
  pages to be spliced from the source iterator if possible.

  This allows ->sendpage() to be replaced by something that can handle
  multiple multipage folios in a single transaction.

  Signed-off-by: David Howells <[email protected]>
  Reviewed-by: Jakub Kicinski <[email protected]>
  cc: Chuck Lever <[email protected]>
  cc: Boris Pismenny <[email protected]>
  cc: John Fastabend <[email protected]>
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls/device: Use splice_eof() to flush (David Howells, 2023-06-09; 1 file, -0/+23)

  Allow splice to end a TLS record when a splice/sendfile ends
  prematurely on an EOF condition (->splice_read() returned 0), after
  splice had already called TLS sendmsg() with MSG_MORE set even though
  the user didn't set MSG_MORE.

  Suggested-by: Linus Torvalds <[email protected]>
  Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
  Signed-off-by: David Howells <[email protected]>
  Reviewed-by: Jakub Kicinski <[email protected]>
  cc: Chuck Lever <[email protected]>
  cc: Boris Pismenny <[email protected]>
  cc: John Fastabend <[email protected]>
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg (David Howells, 2023-06-09; 1 file, -1/+2)

  Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as
  normal sendmsg for now. This means the data will just be copied until
  MSG_SPLICE_PAGES is handled.

  Signed-off-by: David Howells <[email protected]>
  cc: Chuck Lever <[email protected]>
  cc: Boris Pismenny <[email protected]>
  cc: John Fastabend <[email protected]>
  cc: Jens Axboe <[email protected]>
  cc: Matthew Wilcox <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski, 2023-05-26; 1 file, -13/+7)

  Cross-merge networking fixes after downstream PR.

  Conflicts:

  net/ipv4/raw.c
    3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
    c85be08fc4fa ("raw: Stop using RTO_ONLINK.")
    https://lore.kernel.org/all/[email protected]/

  Adjacent changes:

  drivers/net/ethernet/freescale/fec_main.c
    9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values")
    144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")

  Signed-off-by: Jakub Kicinski <[email protected]>

| * tls: rx: strp: preserve decryption status of skbs when needed (Jakub Kicinski, 2023-05-19; 1 file, -13/+7)

  When the receive buffer is small, we try to copy out the data from TCP
  into a skb maintained by TLS to prevent the connection from stalling.
  Unfortunately, if a single record is made up of a mix of decrypted and
  non-decrypted skbs, combining them into a single skb leads to loss of
  decryption status, resulting in decryption errors or data corruption.

  Similarly, when trying to use the TCP receive queue directly, we need
  to make sure that all the skbs within the record have the same status.
  If we don't, the mixed status will be detected correctly but we'll CoW
  the anchor, again collapsing it into a single paged skb without the
  decrypted status preserved. So the "fixup" code will not know which
  parts of the skb to re-encrypt.

  Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
  Tested-by: Shai Amiram <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>
  Reviewed-by: Simon Horman <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

| * tls: rx: device: fix checking decryption status (Jakub Kicinski, 2023-05-19; 1 file, -1/+1)

  skb->len covers the entire skb, including the frag_list. In fact we're
  guaranteed that rxm->full_len <= skb->len, so since the change under
  Fixes we were not checking decrypt status of any skb but the first.

  Note that the skb_pagelen() added here may feel a bit costly, but it's
  removed by subsequent fixes, anyway.

  Reported-by: Tariq Toukan <[email protected]>
  Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk")
  Tested-by: Shai Amiram <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>
  Reviewed-by: Simon Horman <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* | net: introduce and use skb_frag_fill_page_desc() (Yunsheng Lin, 2023-05-13; 1 file, -6/+4)

  Most users use __skb_frag_set_page()/skb_frag_off_set()/
  skb_frag_size_set() to fill the page desc for a skb frag. Introduce
  skb_frag_fill_page_desc() to do that.

  net/bpf/test_run.c does not call skb_frag_off_set() to set the offset;
  "copy_from_user(page_address(page), ...)" and 'shinfo' being part of
  the 'data' kzalloced in bpf_test_init() suggest that it is assuming
  the offset to be initialized as zero, so call
  skb_frag_fill_page_desc() with offset being zero for this case.

  Also, skb_frag_set_page() is not used anymore, so remove it.

  Signed-off-by: Yunsheng Lin <[email protected]>
  Reviewed-by: Leon Romanovsky <[email protected]>
  Reviewed-by: Simon Horman <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

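  A sketch of the consolidated helper (field names follow the bio_vec
  layout skb_frag_t had at the time; treat as illustrative):

      static inline void skb_frag_fill_page_desc(skb_frag_t *frag,
                                                 struct page *page,
                                                 int off, int size)
      {
          frag->bv_page = page;
          frag->bv_offset = off;
          skb_frag_size_set(frag, size);
      }
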
* net: tls: fix device-offloaded sendpage straddling records (Jakub Kicinski, 2023-03-06; 1 file, -0/+2)

  Adrien reports that incorrect data is transmitted when a single page
  straddles multiple records. We would transmit the same data in all
  iterations of the loop.

  Reported-by: Adrien Moulin <[email protected]>
  Link: https://lore.kernel.org/all/[email protected]
  Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()")
  Tested-by: Adrien Moulin <[email protected]>
  Reviewed-by: Tariq Toukan <[email protected]>
  Acked-by: Maxim Mikityanskiy <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* use less confusing names for iov_iter direction initializers (Al Viro, 2022-11-25; 1 file, -2/+2)

  READ/WRITE proved to be actively confusing - the meanings are "data
  destination, as used with read(2)" and "data source, as used with
  write(2)", but people keep interpreting those as "we read data from
  it" and "we write data to it", i.e. exactly the wrong way.

  Call them ITER_DEST and ITER_SOURCE - at least that is harder to
  misinterpret...

  Signed-off-by: Al Viro <[email protected]>

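  Usage of the new names, as a small sketch:

      static void iter_direction_example(void *buf, size_t len)
      {
          struct kvec kv = { .iov_base = buf, .iov_len = len };
          struct iov_iter iter;

          /* buf is a data source, as with write(2): */
          iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, len);
          /* buf is a data destination, as with read(2): */
          iov_iter_kvec(&iter, ITER_DEST, &kv, 1, len);
      }
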
* net/tls: Support 256 bit keys with TX device offload (Gal Pressman, 2022-09-23; 1 file, -0/+6)

  Add the missing clause for 256 bit keys in tls_set_device_offload(),
  and the needed adjustments in tls_device_fallback.c.

  Reviewed-by: Tariq Toukan <[email protected]>
  Signed-off-by: Gal Pressman <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* net/tls: Use cipher sizes structs (Gal Pressman, 2022-09-23; 1 file, -26/+29)

  Use the newly introduced cipher sizes structs instead of the repeated
  switch-case churn.

  Reviewed-by: Tariq Toukan <[email protected]>
  Signed-off-by: Gal Pressman <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* net/tls: Use RCU API to access tls_ctx->netdev (Maxim Mikityanskiy, 2022-08-11; 1 file, -9/+29)

  Currently, tls_device_down synchronizes with tls_device_resync_rx
  using RCU; however, the pointer to netdev is stored using WRITE_ONCE
  and loaded using READ_ONCE.

  Although this approach is technically correct (rcu_dereference is
  essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to
  store NULL), using the special RCU helpers for pointers is more valid,
  as it includes additional checks and might change the implementation
  transparently to the callers.

  Mark the netdev pointer as __rcu and use the correct RCU helpers to
  access it. For non-concurrent access, pass the right conditions that
  guarantee safe access (locks taken, refcount value). Also use the
  correct helper in mlx5e, where even READ_ONCE was missing.

  The transition to RCU exposes existing issues, fixed by this commit:

  1. bond_tls_device_xmit could read netdev twice, and it could become
     NULL the second time, after the NULL check passed.

  2. Drivers shouldn't stop processing the last packet if
     tls_device_down just set netdev to NULL, before tls_dev_del was
     called. This prevents a possible packet drop when transitioning to
     the fallback software mode.

  Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload")
  Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
  Signed-off-by: Maxim Mikityanskiy <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

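  The reader-side pattern after the conversion, as a sketch (the resync
  callee is hypothetical):

      /* in struct tls_context: */
      struct net_device __rcu *netdev;

      static void resync_reader(struct tls_context *tls_ctx)
      {
          struct net_device *dev;

          rcu_read_lock();
          dev = rcu_dereference(tls_ctx->netdev);
          if (dev)
              do_device_resync(dev); /* dev stays valid until
                                      * rcu_read_unlock() */
          rcu_read_unlock();
      }
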
* tls: rx: device: bound the frag walk (Jakub Kicinski, 2022-08-11; 1 file, -1/+7)

  We can't do skb_walk_frags() on the input skbs, because the input skb
  is really just a pointer to the tcp read queue. We need to bound the
  "is decrypted" check by the amount of data in the message.

  Note that the walk in tls_device_reencrypt() is after a CoW, so the
  skb there is safe to walk. Actually, in the current implementation it
  can't have frags at all, but whatever, maybe one day it will.

  Reported-by: Tariq Toukan <[email protected]>
  Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
  Tested-by: Ran Rozenstein <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* net/tls: Remove redundant workqueue flush before destroy (Tariq Toukan, 2022-08-01; 1 file, -1/+0)

  destroy_workqueue() safely destroys the workqueue after draining it.
  No need for the explicit call to flush_workqueue(). Remove it.

  Signed-off-by: Tariq Toukan <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* net/tls: Multi-threaded calls to TX tls_dev_del (Tariq Toukan, 2022-07-29; 1 file, -32/+31)

  Multiple TLS device-offloaded contexts can be added in parallel via
  concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
  sequential in tls_device_gc_task.

  This is not a sustainable behavior. It creates a rate gap between add
  and del operations (the addition rate outperforms the deletion rate).
  When running for long enough, the TLS device resources could get
  exhausted, failing to offload new connections.

  Replace the single-threaded garbage collector work with a per-context
  alternative, so they can be handled on several cores in parallel. Use
  a new dedicated destruct workqueue for this.

  Tested with mlx5 device:
  Before: 22141 add/sec,   103 del/sec
  After:  11684 add/sec, 11684 del/sec

  Signed-off-by: Tariq Toukan <[email protected]>
  Reviewed-by: Maxim Mikityanskiy <[email protected]>
  Signed-off-by: Saeed Mahameed <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

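  A sketch of the restructuring (names assumed):

      static struct workqueue_struct *destruct_wq; /* dedicated wq */

      struct tls_offload_ctx {
          struct work_struct destruct_work; /* per-context, not global */
          /* ... */
      };

      static void tls_device_queue_ctx_destruction(struct tls_offload_ctx *ctx)
      {
          /* each context destructs on its own work item, so .tls_dev_del
           * calls can run on several cores in parallel */
          queue_work(destruct_wq, &ctx->destruct_work);
      }
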
* net/tls: Perform immediate device ctx cleanup when possible (Tariq Toukan, 2022-07-29; 1 file, -8/+18)

  The TLS context destructor can be run in atomic context. Cleanup
  operations for device-offloaded contexts could require access and
  interaction with the device callbacks, which might sleep. Hence, the
  cleanup of such contexts must be deferred and completed inside an
  async work.

  For all others, this is not necessary, as cleanup is atomic. Invoke
  cleanup immediately for them, avoiding queueing redundant gc work.

  Signed-off-by: Tariq Toukan <[email protected]>
  Reviewed-by: Maxim Mikityanskiy <[email protected]>
  Signed-off-by: Saeed Mahameed <[email protected]>
  Signed-off-by: Jakub Kicinski <[email protected]>

* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski, 2022-07-29; 1 file, -1/+6)

  No conflicts.

  Signed-off-by: Jakub Kicinski <[email protected]>

| * net/tls: Remove the context from the list in tls_device_down (Maxim Mikityanskiy, 2022-07-24; 1 file, -1/+6)

  tls_device_down takes a reference on all contexts it's going to move
  to the degraded state (software fallback). If sk_destruct runs
  afterwards, it can reduce the reference counter back to 1 and return
  early without destroying the context. Then tls_device_down will
  release the reference it took and call tls_device_free_ctx. However,
  the context will still stay in tls_device_down_list forever. The list
  will contain an item, memory for which is released, making a memory
  corruption possible.

  Fix the above bug by properly removing the context from all lists
  before any call to tls_device_free_ctx.

  Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
  Signed-off-by: Maxim Mikityanskiy <[email protected]>
  Reviewed-by: Tariq Toukan <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* | tls: rx: device: add input CoW helper (Jakub Kicinski, 2022-07-26; 1 file, -10/+9)

  Wrap the remaining skb_cow_data() into a helper, so it's easier to
  replace down the lane. The new version will change the skb, so make
  sure relevant pointers get reloaded after the call.

  Signed-off-by: Jakub Kicinski <[email protected]>

* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski, 2022-07-21; 1 file, -3/+5)

  No conflicts.

  Signed-off-by: Jakub Kicinski <[email protected]>

| * net/tls: Fix race in TLS device down flow (Tariq Toukan, 2022-07-18; 1 file, -3/+5)

  The socket destruction flow and the tls_device_down function sync
  against each other using tls_device_lock and the context refcount, to
  guarantee the device resources are freed via tls_dev_del() by the end
  of tls_device_down.

  In the following unfortunate flow, this won't happen:
  - refcount is decreased to zero in tls_device_sk_destruct.
  - tls_device_down starts, skips the context as refcount is zero, goes
    all the way until it flushes the gc work, and returns without
    freeing the device resources.
  - only then, tls_device_queue_ctx_destruction is called, queues the gc
    work and frees the context's device resources.

  Solve it by decreasing the refcount in the socket's destruction flow
  under the tls_device_lock, for perfect synchronization. This does not
  slow down the common likely destructor flow, in which both the
  refcount is decreased and the spinlock is acquired, anyway.

  Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
  Reviewed-by: Maxim Mikityanskiy <[email protected]>
  Signed-off-by: Tariq Toukan <[email protected]>
  Reviewed-by: Jakub Kicinski <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

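  The fixed ordering, as a sketch (the queueing helper is hypothetical):

      static void tls_device_sk_destruct_sketch(struct tls_context *ctx)
      {
          unsigned long flags;

          spin_lock_irqsave(&tls_device_lock, flags);
          /* drop the reference under the lock so tls_device_down can
           * never see refcount == 0 while ctx is still on the list */
          if (refcount_dec_and_test(&ctx->refcount))
              queue_ctx_destruction(ctx); /* hypothetical */
          spin_unlock_irqrestore(&tls_device_lock, flags);
      }
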
* | tls: rx: read the input skb from ctx->recv_pkt (Jakub Kicinski, 2022-07-18; 1 file, -9/+16)

  Callers always pass ctx->recv_pkt into decrypt_skb_update(), and it
  propagates it to its callees. This may give someone the false
  impression that those functions can accept any valid skb containing a
  TLS record. That's not the case: the record sequence number is read
  from the context, and they can only take the next record coming out of
  the strp.

  Let the functions get the skb from the context instead of passing it
  in. This will also make it cleaner to return a different skb than
  ctx->recv_pkt as the decrypted one later on.

  Since we're touching the definition of decrypt_skb_update(), use this
  as an opportunity to rename it.

  Signed-off-by: Jakub Kicinski <[email protected]>
  Signed-off-by: David S. Miller <[email protected]>

* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski, 2022-07-14; 1 file, -2/+2)

  Conflicts:

  include/net/sock.h
    310731e2f161 ("net: Fix data-races around sysctl_mem.")
    e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
    https://lore.kernel.org/all/[email protected]/

  net/ipv4/fib_semantics.c
    747c14307214 ("ip: fix dflt addr selection for connected nexthop")
    d62607c3fe45 ("net: rename reference+tracking helpers")

  net/tls/tls.h
  include/net/tls.h
    3d8c51b25a23 ("net/tls: Check for errors in tls_device_init")
    587903142308 ("tls: create an internal header")

  Signed-off-by: Jakub Kicinski <[email protected]>

| * net/tls: Check for errors in tls_device_init (Tariq Toukan, 2022-07-14; 1 file, -2/+2)

  Add missing error checks in tls_device_init.

  Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
  Reported-by: Jakub Kicinski <[email protected]>
  Reviewed-by: Maxim Mikityanskiy <[email protected]>
  Signed-off-by: Tariq Toukan <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Jakub Kicinski <[email protected]>

* | tls: create an internal header (Jakub Kicinski, 2022-07-09; 1 file, -1/+2)

  include/net/tls.h is getting a little long, and is probably hard for
  driver authors to navigate. Split out the internals into a header
  which will live under net/tls/.

  While at it, move some static inlines with a single user into the
  source files, add a few tls_ prefixes and fix the spelling of
  'proccess'.

  Signed-off-by: Jakub Kicinski <[email protected]>

* tls: Add opt-in zerocopy mode of sendfile() (Boris Pismenny, 2022-05-19; 1 file, -13/+40)

  TLS device offload copies sendfile data to a bounce buffer before
  transmitting. This makes it possible to maintain a valid MAC on TLS
  records when the file contents change and a part of the TLS record
  has to be retransmitted at the TCP level.

  In many common use cases (like serving static files over HTTPS) the
  file contents are not changed on the fly. In many use cases breaking
  the connection is totally acceptable if the file is changed during
  transmission, because it would be received corrupted in any case.

  This commit allows optimizing performance for such use cases by
  providing a new optional mode of TLS sendfile(), in which the extra
  copy is skipped. Removing this copy improves performance
  significantly, as TLS and TCP sendfile perform the same operations,
  and the only overhead is TLS header/trailer insertion.

  The new mode can only be enabled with the new socket option named
  TLS_TX_ZEROCOPY_SENDFILE, on a per-socket basis. It preserves
  backwards compatibility with existing applications that rely on the
  copying behavior.

  The new mode is safe, meaning that unsolicited modifications of the
  file being sent can't break the integrity of the kernel. The worst
  thing that can happen is sending a corrupted TLS record, which is in
  any case not forbidden when using regular TCP sockets.

  Sockets other than TLS device offload are not affected by the new
  socket option. The actual status of zerocopy sendfile can be queried
  with sock_diag.

  Performance numbers in a single-core test with 24 HTTPS streams on
  nginx, under 100% CPU load:
  * non-zerocopy: 33.6 Gbit/s
  * zerocopy: 79.92 Gbit/s
  CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

  Signed-off-by: Boris Pismenny <[email protected]>
  Signed-off-by: Tariq Toukan <[email protected]>
  Signed-off-by: Maxim Mikityanskiy <[email protected]>
  Reviewed-by: Jakub Kicinski <[email protected]>
  Link: https://lore.kernel.org/r/[email protected]
  Signed-off-by: Paolo Abeni <[email protected]>

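  Userspace opt-in, as a sketch (the option name is taken from the
  message above and may differ in released uapi headers; error handling
  elided):

      #include <linux/tls.h>
      #include <sys/socket.h>

      #ifndef SOL_TLS
      #define SOL_TLS 282
      #endif

      static int enable_zerocopy_sendfile(int sk)
      {
          int one = 1;

          /* only affects TLS device offload sockets */
          return setsockopt(sk, SOL_TLS, TLS_TX_ZEROCOPY_SENDFILE,
                            &one, sizeof(one));
      }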