| author | Heming Zhao <[email protected]> | 2024-07-09 10:41:19 +0000 |
|---|---|---|
| committer | Song Liu <[email protected]> | 2024-07-12 01:30:17 +0000 |
| commit | fff42f213824fa434a4b6cf906b4331fe6e9302b (patch) | |
| tree | 8256583524f88dbb5ba545cefc8257ba44b06a41 /drivers/md/raid1.c | |
| parent | floppy: add missing MODULE_DESCRIPTION() macro (diff) | |
md-cluster: fix hanging issue while a new disk adding
The commit 1bbe254e4336 ("md-cluster: check for timeout while a
new disk adding") is syntactically correct but does not suit the
actual clustered code logic.
When a timeout occurs while adding a new disk, if recv_daemon()
bypasses the unlock for ack_lockres:CR, another node will keep waiting
to grab the EX lock. This will cause the cluster to hang indefinitely.
How to fix:
1. In dlm_lock_sync(), change the wait behaviour from waiting forever
to waiting with a timeout. This avoids the hang when another node
fails to handle a cluster msg. Another result of this change is
that if another node receives an unknown msg (e.g. a new msg_type),
the old code will hang, whereas the new code will time out and fail.
This could help cluster_md handle new msg_type values from different
nodes with different kernel/module versions (e.g. the user only
updates one leg's kernel and monitors the stability of the new
kernel).
2. By design, the old code for __sendmsg() always returns 0 (success),
because it must successfully unlock ->message_lockres. This commit
makes this function return an error number when an error occurs.
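
The two changes above can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions, not the kernel patch itself: it uses POSIX threads instead of the kernel's wait primitives, and struct lock_res, lock_res_init(), lock_res_complete(), lock_sync_timeout() and send_msg() are hypothetical names invented for illustration. It shows a wait that gives up after a deadline instead of blocking forever, and a sender that propagates the failure instead of always returning 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a lock resource (not the kernel structure). */
struct lock_res {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool            completed;  /* set when the lock request completes */
};

void lock_res_init(struct lock_res *r)
{
    pthread_mutex_init(&r->mu, NULL);
    pthread_cond_init(&r->cv, NULL);
    r->completed = false;
}

/* Analogue of a completion callback: marks the request done and wakes waiters. */
void lock_res_complete(struct lock_res *r)
{
    pthread_mutex_lock(&r->mu);
    r->completed = true;
    pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->mu);
}

/*
 * Bounded wait: returns 0 if the request completed before the deadline,
 * or a negative errno value (-ETIMEDOUT on timeout) otherwise.
 */
int lock_sync_timeout(struct lock_res *r, long timeout_ms)
{
    struct timespec deadline;
    bool done;
    int rc = 0;

    assert(timeout_ms >= 0);

    /* Build an absolute deadline; the default condvar clock is CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&r->mu);
    /* Loop to tolerate spurious wakeups; stop on completion or on error. */
    while (!r->completed && rc == 0)
        rc = pthread_cond_timedwait(&r->cv, &r->mu, &deadline);
    done = r->completed;
    r->completed = false;       /* consume the completion */
    pthread_mutex_unlock(&r->mu);

    return done ? 0 : -rc;
}

/*
 * Hypothetical sender: instead of unconditionally returning 0, it passes
 * the wait error back so the caller can fail the operation cleanly.
 */
int send_msg(struct lock_res *ack, long timeout_ms)
{
    int err = lock_sync_timeout(ack, timeout_ms);

    if (err)
        return err;             /* e.g. -ETIMEDOUT: the peer never answered */
    return 0;
}
```

With this shape, a caller that never receives a completion gets a negative error code after the timeout and can release its own locks, rather than leaving every other waiter blocked. The timeout value and the cleanup performed on failure are left to the caller in this sketch.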
Fixes: 1bbe254e4336 ("md-cluster: check for timeout while a new disk adding")
Signed-off-by: Heming Zhao <[email protected]>
Reviewed-by: Su Yue <[email protected]>
Acked-by: Yu Kuai <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
