ACPI: APEI: handle synchronous exceptions in task work - kernel

diff options

author	Shuai Xue <[email protected]>	2025-07-14 11:42:12 +0000
committer	Rafael J. Wysocki <[email protected]>	2025-07-16 19:08:04 +0000
commit	c1f1fda141373d7253b4c1497043b0ef85f534ce (patch)
tree	e8f42ac2fb69e1e7ce44fc4f77d5fa8c4b905b18 /fs/btrfs/dev-replace.c
parent	ACPI: APEI: send SIGBUS to current task if synchronous memory error not recov... (diff)
download	kernel-c1f1fda141373d7253b4c1497043b0ef85f534ce.tar.gz kernel-c1f1fda141373d7253b4c1497043b0ef85f534ce.zip

ACPI: APEI: handle synchronous exceptions in task work

The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous errors use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, which: - will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop Issue 1: send wrong si_code in early_kill mode Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed a poisoned page. This is done by sending a SIGBUS signal with error code BUS_MCEERR_AR, indicating an action-required machine check error on read. However, when kill_proc() is called to terminate the processes who has the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered. To reproduce this problem: #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not factually correct. After this change: # STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as expected. Issue 2: a synchronous error infinite loop If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Since the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. To reproduce this problem: # STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed # STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400 To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data. Signed-off-by: Shuai Xue <[email protected]> Tested-by: Ma Wupeng <[email protected]> Reviewed-by: Kefeng Wang <[email protected]> Reviewed-by: Xiaofei Tan <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Jane Chu <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Reviewed-by: Hanjun Guo <[email protected]> Link: https://patch.msgid.link/[email protected] [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <[email protected]>

Diffstat (limited to 'fs/btrfs/dev-replace.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: