aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/entry/common.c
Commit message (Collapse)AuthorAgeFilesLines
* x86/syscall: Move sys_ni_syscall()Brian Gerst2025-03-191-38/+0
| | | | | | | | | | | | | | | | Move sys_ni_syscall() to kernel/process.c, and remove the now empty entry/common.c No functional changes. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Sohil Mehta <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/syscall/64: Move 64-bit syscall dispatch codeBrian Gerst2025-03-191-93/+0
| | | | | | | | | | | | | | | Move the 64-bit syscall dispatch code to syscall_64.c. No functional changes. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Sohil Mehta <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/syscall/32: Move 32-bit syscall dispatch codeBrian Gerst2025-03-191-321/+0
| | | | | | | | | | | | | | | Move the 32-bit syscall dispatch code to syscall_32.c. No functional changes. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Sohil Mehta <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/xen: Move Xen upcall handlerBrian Gerst2025-03-191-72/+0
| | | | | | | | | | | | | | | | Move the upcall handler to Xen-specific files. No functional changes. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Reviewed-by: Sohil Mehta <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry: Add __init to ia32_emulation_override_cmdline()Vitaly Kuznetsov2025-03-191-1/+1
| | | | | | | | | | | | | | ia32_emulation_override_cmdline() is an early_param() arg and these are only needed at boot time. In fact, all other early_param() functions in arch/x86 seem to have '__init' annotation and ia32_emulation_override_cmdline() is the only exception. Fixes: a11e097504ac ("x86: Make IA32_EMULATION boot time configurable") Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: https://lore.kernel.org/all/20241210151650.1746022-1-vkuznets%40redhat.com
* x86/entry: Fix kernel-doc warningDaniel Sneddon2025-02-251-0/+1
| | | | | | | | | | | | | The do_int80_emulation() function is missing a kernel-doc formatted description of its argument. This is causing a warning when building with W=1. Add a brief description of the argument to satisfy kernel-doc. Reported-by: kernel test robot <[email protected]> Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
* treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*Valentin Schneider2024-07-291-1/+1
| | | | | | | | | | | | | Context tracking state related symbols currently use a mix of the CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_SATE_ (e.g. CT_STATE_MASK) prefixes. Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix. Suggested-by: Frederic Weisbecker <[email protected]> Signed-off-by: Valentin Schneider <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
* x86/fred: Fix INT80 emulation for FREDXin Li (Intel)2024-04-181-0/+65
| | | | | | | | | | | | | | | Add a FRED-specific INT80 handler and document why it differs from the current one. Eventually, the common bits will be unified once FRED hw is available and it turns out that no further changes are needed but for now, keep the handlers separate for everyone's sanity's sake. [ bp: Zap duplicated commit message, massage. ] Fixes: 55617fb991df ("x86/entry: Do not allow external 0x80 interrupts") Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/bhi: Add support for clearing branch history at syscall entryPawan Gupta2024-04-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Alder Lake and new processors supports a hardware control BHI_DIS_S to mitigate BHI. For older processors Intel has released a software sequence to clear the branch history on parts that don't support BHI_DIS_S. Add support to execute the software sequence at syscall entry and VMexit to overwrite the branch history. For now, branch history is not cleared at interrupt entry, as malicious applications are not believed to have sufficient control over the registers, since previous register state is cleared at interrupt entry. Researchers continue to poke at this area and it may become necessary to clear at interrupt entry as well in the future. This mitigation is only defined here. It is enabled later. Signed-off-by: Pawan Gupta <[email protected]> Co-developed-by: Daniel Sneddon <[email protected]> Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]>
* x86/syscall: Don't force use of indirect calls for system callsLinus Torvalds2024-04-081-3/+3
| | | | | | | | | | | | | | | | | | | | Make <asm/syscall.h> build a switch statement instead, and the compiler can either decide to generate an indirect jump, or - more likely these days due to mitigations - just a series of conditional branches. Yes, the conditional branches also have branch prediction, but the branch prediction is much more controlled, in that it just causes speculatively running the wrong system call (harmless), rather than speculatively running possibly wrong random less controlled code gadgets. This doesn't mitigate other indirect calls, but the system call indirection is the first and most easily triggered case. Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]>
* x86/entry: Do not allow external 0x80 interruptsThomas Gleixner2023-12-071-1/+36
| | | | | | | | | | | | | | | | | | | | | | The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The kernel expects to receive a software interrupt as a result of the INT 0x80 instruction. However, an external interrupt on the same vector also triggers the same codepath. An external interrupt on vector 0x80 will currently be interpreted as a 32-bit system call, and assuming that it was a user context. Panic on external interrupts on the vector. To distinguish software interrupts from external ones, the kernel checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Cc: <[email protected]> # v6.0+
* x86/entry: Convert INT 0x80 emulation to IDTENTRYThomas Gleixner2023-12-071-1/+57
| | | | | | | | | | | | | | | | | | | | | | | | | There is no real reason to have a separate ASM entry point implementation for the legacy INT 0x80 syscall emulation on 64-bit. IDTENTRY provides all the functionality needed with the only difference that it does not: - save the syscall number (AX) into pt_regs::orig_ax - set pt_regs::ax to -ENOSYS Both can be done safely in the C code of an IDTENTRY before invoking any of the syscall related functions which depend on this convention. Aside of ASM code reduction this prepares for detecting and handling a local APIC injected vector 0x80. [ kirill.shutemov: More verbose comments ] Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Cc: <[email protected]> # v6.0+
* x86/entry/32: Clean up syscall fast exit testsBrian Gerst2023-10-131-26/+22
| | | | | | | | | | | | | | | | | | | | Merge compat and native code and clarify comments. No change in functionality expected. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Uros Bizjak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry/64: Use TASK_SIZE_MAX for canonical RIP testBrian Gerst2023-10-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | Using shifts to determine if an address is canonical is difficult for the compiler to optimize when the virtual address width is variable (LA57 feature) without using inline assembly. Instead, compare RIP against TASK_SIZE_MAX. The only user executable address outside of that range is the deprecated vsyscall page, which can fall back to using IRET. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Uros Bizjak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry/64: Convert SYSRET validation tests to CBrian Gerst2023-10-131-1/+42
| | | | | | | | | | | | | | | | | | No change in functionality expected. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Uros Bizjak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry/32: Remove SEP test for SYSEXITBrian Gerst2023-10-051-2/+1
| | | | | | | | | SEP must be already be present in order for do_fast_syscall_32() to be called on native 32-bit, so checking it again is unnecessary. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry/32: Convert do_fast_syscall_32() to bool return typeBrian Gerst2023-10-051-5/+5
| | | | | | | | Doesn't have to be 'long' - this simplifies the code a bit. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge tag 'v6.6-rc4' into x86/entry, to pick up fixesIngo Molnar2023-10-051-1/+1
|\ | | | | | | Signed-off-by: Ingo Molnar <[email protected]>
| * xen: simplify evtchn_do_upcall() call mazeJuergen Gross2023-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several functions involved for performing the functionality of evtchn_do_upcall(): - __xen_evtchn_do_upcall() doing the real work - xen_hvm_evtchn_do_upcall() just being a wrapper for __xen_evtchn_do_upcall(), exposed for external callers - xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but without any user Simplify this maze by: - removing the unused xen_evtchn_do_upcall() - removing xen_hvm_evtchn_do_upcall() as the only left caller of __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to xen_evtchn_do_upcall() Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
* | x86: Make IA32_EMULATION boot time configurableNikolay Borisov2023-09-141-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Distributions would like to reduce their attack surface as much as possible but at the same time they'd want to retain flexibility to cater to a variety of legacy software. This stems from the conjecture that compat layer is likely rarely tested and could have latent security bugs. Ideally distributions will set their default policy and also give users the ability to override it as appropriate. To enable this use case, introduce CONFIG_IA32_EMULATION_DEFAULT_DISABLED compile time option, which controls whether 32bit processes/syscalls should be allowed or not. This option is aimed mainly at distributions to set their preferred default behavior in their kernels. To allow users to override the distro's policy, introduce the 'ia32_emulation' parameter which allows overriding CONFIG_IA32_EMULATION_DEFAULT_DISABLED state at boot time. Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | x86: Introduce ia32_enabled()Nikolay Borisov2023-09-141-0/+4
|/ | | | | | | | | | | | | | | | IA32 support on 64bit kernels depends on whether CONFIG_IA32_EMULATION is selected or not. As it is a compile time option it doesn't provide the flexibility to have distributions set their own policy for IA32 support and give the user the flexibility to override it. As a first step introduce ia32_enabled() which abstracts whether IA32 compat is turned on or off. Upcoming patches will implement the ability to set IA32 compat state at boot time. Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge tag 'x86-entry-2021-06-29' of ↵Linus Torvalds2021-06-291-26/+61
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry code related updates from Thomas Gleixner: - Consolidate the macros for .byte ... opcode sequences - Deduplicate register offset defines in include files - Simplify the ia32,x32 compat handling of the related syscall tables to get rid of #ifdeffery. - Clear all EFLAGS which are not required for syscall handling - Consolidate the syscall tables and switch the generation over to the generic shell script and remove the CFLAGS tweaks which are not longer required. - Use 'int' type for system call numbers to match the generic code. - Add more selftests for syscalls * tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls: Don't adjust CFLAGS for syscall tables x86/syscalls: Remove -Wno-override-init for syscall tables x86/uml/syscalls: Remove array index from syscall initializers x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env x86/entry: Use int everywhere for system call numbers x86/entry: Treat out of range and gap system calls the same x86/entry/64: Sign-extend system calls on entry to int selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64 selftests/x86/syscall: Simplify message reporting in syscall_numbering selftests/x86/syscall: Update and extend syscall_numbering_64 x86/syscalls: Switch to generic syscallhdr.sh x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max x86/unistd: Define X32_NR_syscalls only for 64-bit kernel x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall x86/syscalls: Switch to generic syscalltbl.sh x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
| * x86/entry: Use int everywhere for system call numbersH. Peter Anvin (Intel)2021-05-251-28/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | System call numbers are defined as int, so use int everywhere for system call numbers. This is strictly a cleanup; it should not change anything user visible; all ABI changes have been done in the preceeding patches. [ tglx: Replaced the unsigned long cast ] Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
| * x86/entry: Treat out of range and gap system calls the sameH. Peter Anvin (Intel)2021-05-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current 64-bit system call entry code treats out-of-range system calls differently than system calls that map to a hole in the system call table. This is visible to the user if system calls are intercepted via ptrace or seccomp and the return value (regs->ax) is modified: in the former case, the return value is preserved, and in the latter case, sys_ni_syscall() is called and the return value is forced to -ENOSYS. The API spec in <asm-generic/syscalls.h> is very clear that only (int)-1 is the non-system-call sentinel value, so make the system call behavior consistent by calling sys_ni_syscall() for all invalid system call numbers except for -1. Although currently sys_ni_syscall() simply returns -ENOSYS, calling it explicitly is friendly for tracing and future possible extensions, and as this is an error path there is no reason to optimize it. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | Merge tag 'x86-asm-2021-06-28' of ↵Linus Torvalds2021-06-281-1/+1
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Micro-optimize and standardize the do_syscall_64() calling convention - Make syscall entry flags clearing more conservative - Clean up syscall table handling - Clean up & standardize assembly macros, in preparation of FRED - Misc cleanups and fixes * tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Make <asm/asm.h> valid on cross-builds as well x86/regs: Syscall_get_nr() returns -1 for a non-system call x86/entry: Split PUSH_AND_CLEAR_REGS into two submacros x86/syscall: Maximize MSR_SYSCALL_MASK x86/syscall: Unconditionally prototype {ia32,x32}_sys_call_table[] x86/entry: Reverse arguments to do_syscall_64() x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h> x86/asm: Use _ASM_BYTES() in <asm/nops.h> x86/asm: Add _ASM_BYTES() macro for a .byte ... opcode sequence x86/asm: Have the __ASM_FORM macros handle commas in arguments
| * x86/entry: Reverse arguments to do_syscall_64()H. Peter Anvin (Intel)2021-05-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | Reverse the order of arguments to do_syscall_64() so that the first argument is the pt_regs pointer. This is not only consistent with *all* other entry points from assembly, but it actually makes the compiled code slightly better. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall()Peter Zijlstra2021-06-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | Fix: vmlinux.o: warning: objtool: xen_pv_evtchn_do_upcall()+0x23: call to irq_enter_rcu() leaves .noinstr.text section Fixes: 359f01d1816f ("x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | x86/entry: Fix noinstr fail in __do_fast_syscall_32()Peter Zijlstra2021-06-221-1/+1
|/ | | | | | | | | | | Fix: vmlinux.o: warning: objtool: __do_fast_syscall_32()+0xf5: call to trace_hardirqs_off() leaves .noinstr.text section Fixes: 5d5675df792f ("x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry: Enable random_kstack_offset supportKees Cook2021-04-081-0/+3
| | | | | | | | | | | | | | Allow for a randomized stack offset on a per-syscall basis, with roughly 5-6 bits of entropy, depending on compiler and word size. Since the method of offsetting uses macros, this cannot live in the common entry code (the stack offset needs to be retained for the life of the syscall, which means it needs to happen at the actual entry point). Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscallsAndy Lutomirski2021-03-061-1/+2
| | | | | | | | | | | | | | | | | | | | | On a 32-bit fast syscall that fails to read its arguments from user memory, the kernel currently does syscall exit work but not syscall entry work. This confuses audit and ptrace. For example: $ ./tools/testing/selftests/x86/syscall_arg_fault_32 ... strace: pid 264258: entering, ptrace_syscall_info.op == 2 ... This is a minimal fix intended for ease of backporting. A more complete cleanup is coming. Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry") Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
* Merge branch 'x86/paravirt' into x86/entryIngo Molnar2021-02-121-3/+7
|\ | | | | | | | | | | | | | | | | | | Merge in the recent paravirt changes to resolve conflicts caused by objtool annotations. Conflicts: arch/x86/xen/xen-asm.S Signed-off-by: Ingo Molnar <[email protected]>
| * x86/entry: Fix noinstr failPeter Zijlstra2021-01-121-3/+7
| | | | | | | | | | | | | | | | | | | | | | vmlinux.o: warning: objtool: __do_fast_syscall_32()+0x47: call to syscall_enter_from_user_mode_work() leaves .noinstr.text section Fixes: 4facb95b7ada ("x86/entry: Unbreak 32bit fast syscall") Reported-by: Randy Dunlap <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcallThomas Gleixner2021-02-101-13/+6
| | | | | | | | | | | | | | | | | | | | | | | | To avoid yet another macro implementation reuse the existing run_sysvec_on_irqstack_cond() and move the set_irq_regs() handling into the called function. Makes the code even simpler. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* | x86/entry: Fix instrumentation annotationThomas Gleixner2021-02-101-1/+1
|/ | | | | | | | | | | | Embracing a callout into instrumentation_begin() / instrumentation_begin() does not really make sense. Make the latter instrumentation_end(). Fixes: 2f6474e4636b ("x86/entry: Switch XEN/PV hypercall entry to IDTENTRY") Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
* x86/entry: Move nmi entry/exit into common codeThomas Gleixner2020-11-041-34/+0
| | | | | | | | | | | | | | | | | Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well. Move it to common code and extend irqentry_state_t to carry lockdep state. [ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* x86/irq: Make run_on_irqstack_cond() typesafeThomas Gleixner2020-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | Sami reported that run_on_irqstack_cond() requires the caller to cast functions to mismatching types, which trips indirect call Control-Flow Integrity (CFI) in Clang. Instead of disabling CFI on that function, provide proper helpers for the three call variants. The actual ASM code stays the same as that is out of reach. [ bp: Fix __run_on_irqstack() prototype to match. ] Fixes: 931b94145981 ("x86/entry: Provide helpers for executing on the irqstack") Reported-by: Nathan Chancellor <[email protected]> Reported-by: Sami Tolvanen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Tested-by: Sami Tolvanen <[email protected]> Cc: <[email protected]> Link: https://github.com/ClangBuiltLinux/linux/issues/1052 Link: https://lkml.kernel.org/r/[email protected]
* x86/entry: Unbreak 32bit fast syscallThomas Gleixner2020-09-041-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andy reported that the syscall treacing for 32bit fast syscall fails: # ./tools/testing/selftests/x86/ptrace_syscall_32 ... [RUN] SYSEMU [FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732) ... [RUN] SYSCALL [FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732) The eason is that the conversion to generic entry code moved the retrieval of the sixth argument (EBP) after the point where the syscall entry work runs, i.e. ptrace, seccomp, audit... Unbreak it by providing a split up version of syscall_enter_from_user_mode(). - syscall_enter_from_user_mode_prepare() establishes state and enables interrupts - syscall_enter_from_user_mode_work() runs the entry work Replace the call to syscall_enter_from_user_mode() in the 32bit fast syscall C-entry with the split functions and stick the EBP retrieval between them. Fixes: 27d6b4d14f5c ("x86/entry: Use generic syscall entry function") Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* Merge branch 'locking/nmi' into x86/entryIngo Molnar2020-07-261-0/+34
|\ | | | | | | | | | | | | | | | | | | Resolve conflicts with ongoing lockdep work that fixed the NMI entry code. Conflicts: arch/x86/entry/common.c arch/x86/include/asm/idtentry.h Signed-off-by: Ingo Molnar <[email protected]>
| * x86/entry: Fix NMI vs IRQ state trackingPeter Zijlstra2020-07-101-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | While the nmi_enter() users did trace_hardirqs_{off_prepare,on_finish}() there was no matching lockdep_hardirqs_*() calls to complete the picture. Introduce idtentry_{enter,exit}_nmi() to enable proper IRQ state tracking across the NMIs. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Cleanup idtentry_enter/exitThomas Gleixner2020-07-241-3/+3
| | | | | | | | | | | | | | | | | | | | Remove the temporary defines and fixup all references. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Use generic interrupt entry/exit codeThomas Gleixner2020-07-241-166/+1
| | | | | | | | | | | | | | | | | | | | Replace the x86 code with the generic variant. Use temporary defines for idtentry_* which will be cleaned up in the next step. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Use generic syscall exit functionalityThomas Gleixner2020-07-241-216/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the x86 variant with the generic version. Provide the relevant architecture specific helper functions and defines. Use a temporary define for idtentry_exit_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Use generic syscall entry functionThomas Gleixner2020-07-241-173/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the syscall entry work handling with the generic version. Provide the necessary helper inlines to handle the real architecture specific parts, e.g. ptrace. Use a temporary define for idtentry_enter_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Move user return notifier out of loopThomas Gleixner2020-07-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Guests and user space share certain MSRs. KVM sets these MSRs to guest values once and does not set them back to user space values on every VM exit to spare the costly MSR operations. User return notifiers ensure that these MSRs are set back to the correct values before returning to user space in exit_to_usermode_loop(). There is no reason to evaluate the TIF flag indicating that user return notifiers need to be invoked in the loop. The important point is that they are invoked before returning to user space. Move the invocation out of the loop into the section which does the last preperatory steps before returning to user space. That section is not preemptible and runs with interrupts disabled until the actual return. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Consolidate 32/64 bit syscall entryThomas Gleixner2020-07-241-52/+41
| | | | | | | | | | | | | | | | | | | | | | | | 64bit and 32bit entry code have the same open coded syscall entry handling after the bitwidth specific bits. Move it to a helper function and share the code. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Consolidate check_user_regs()Thomas Gleixner2020-07-241-15/+9
| | | | | | | | | | | | | | | | | | | | | | The user register sanity check is sprinkled all over the place. Move it into enter_from_user_mode(). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | Merge branch 'x86/urgent' into x86/entry to pick up upstream fixes.Thomas Gleixner2020-07-101-2/+2
|\ \ | |/ |/|
| * x86/entry/common: Make prepare_exit_to_usermode() staticThomas Gleixner2020-07-091-1/+1
| | | | | | | | | | | | | | | | | | No users outside this file anymore. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
| * x86/entry: Mark check_user_regs() noinstrThomas Gleixner2020-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | It's called from the non-instrumentable section. Fixes: c9c26150e61d ("x86/entry: Assert that syscalls are on the right stack") Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
* | x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()Andy Lutomirski2020-07-061-22/+28
|/ | | | | | | | | | | | | | | | They were originally called _cond_rcu because they were special versions with conditional RCU handling. Now they're the standard entry and exit path, so the _cond_rcu part is just confusing. Drop it. Also change the signature to make them more extensible and more foolproof. No functional change -- it's pure refactoring. Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org