| author | Eric Biggers <[email protected]> | 2025-07-06 23:11:00 +0000 |
|---|---|---|
| committer | Eric Biggers <[email protected]> | 2025-07-11 21:29:42 +0000 |
| commit | 9f65592b7e1f24569bb6ced064df5b4319f725ce (patch) | |
| tree | 52f0b37eeb16174beb4f22d2bed273d9a7d10e3d /lib/net_utils.c | |
| parent | lib/crypto: x86/poly1305: Fix register corruption in no-SIMD contexts (diff) | |
lib/crypto: x86/poly1305: Fix performance regression on short messages

Restore the len >= 288 condition on using the AVX implementation, which
was incidentally removed by commit 318c53ae02f2 ("crypto: x86/poly1305 -
Add block-only interface"). This check took into account the overhead
in key power computation, kernel-mode "FPU", and tail handling
associated with the AVX code. Indeed, restoring this check slightly
improves performance for len < 256 as measured using poly1305_kunit on
an "AMD Ryzen AI 9 365" (Zen 5) CPU:

    Length      Before       After
    ======  ==========  ==========
         1     30 MB/s     36 MB/s
        16    516 MB/s    598 MB/s
        64   1700 MB/s   1882 MB/s
       127   2265 MB/s   2651 MB/s
       128   2457 MB/s   2827 MB/s
       200   2702 MB/s   3238 MB/s
       256   3841 MB/s   3768 MB/s
       511   4580 MB/s   4585 MB/s
       512   5430 MB/s   5398 MB/s
      1024   7268 MB/s   7305 MB/s
      3173   8999 MB/s   8948 MB/s
      4096   9942 MB/s   9921 MB/s
     16384  10557 MB/s  10545 MB/s

While the optimal threshold for this CPU might be slightly lower than
288 (see the len == 256 case), other CPUs would need to be tested too,
and these sorts of benchmarks can underestimate the true cost of
kernel-mode "FPU". Therefore, for now just restore the 288 threshold.
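
For illustration only, the length check described above can be sketched as a small standalone C helper. This is not the kernel code touched by this commit: the macro, enum, and function names below are hypothetical, and the `simd_usable` flag merely stands in for whatever check the real code uses to decide whether kernel-mode "FPU" may be used. Only the 288-byte threshold comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the commit message; the macro name is hypothetical. */
#define POLY1305_SIMD_MIN_LEN 288

/* Hypothetical labels for the two code paths being chosen between. */
enum poly1305_impl {
	POLY1305_IMPL_SCALAR,
	POLY1305_IMPL_AVX,
};

/*
 * Hypothetical helper: pick the AVX path only when SIMD is usable in the
 * current context AND the message is long enough to amortize the fixed
 * costs listed above (key power computation, kernel-mode "FPU" section,
 * tail handling). Everything else takes the scalar path.
 */
static enum poly1305_impl poly1305_pick_impl(size_t len, bool simd_usable)
{
	if (simd_usable && len >= POLY1305_SIMD_MIN_LEN)
		return POLY1305_IMPL_AVX;
	return POLY1305_IMPL_SCALAR;
}

/* Usage example: the boundary cases around the threshold. */
void poly1305_pick_impl_example(void)
{
	assert(poly1305_pick_impl(256, true) == POLY1305_IMPL_SCALAR);
	assert(poly1305_pick_impl(287, true) == POLY1305_IMPL_SCALAR);
	assert(poly1305_pick_impl(288, true) == POLY1305_IMPL_AVX);
	/* Long message, but SIMD not usable in this context. */
	assert(poly1305_pick_impl(4096, false) == POLY1305_IMPL_SCALAR);
}
```

Under this sketch a 256-byte message stays on the scalar path, which matches the one row in the table where "After" is slightly slower than "Before".
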
Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface")
Cc: [email protected]
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
Diffstat (limited to 'lib/net_utils.c')
0 files changed, 0 insertions, 0 deletions
