path: root/io_uring/msg_ring.c
Commit message | Author | Age | Files | Lines
* io_uring/msg_ring: kill alloc_cache for io_kiocb allocations | Jens Axboe | 2025-09-18 | 1 | -22/+2
    A recent commit:

    fc582cd26e88 ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU")

    fixed an issue with not deferring freeing of io_kiocb structs that msg_ring allocates to after the current RCU grace period. But this only covers requests that don't end up in the allocation cache. If a request goes into the alloc cache, it can get reused before it is sane to do so. A recent syzbot report would seem to indicate that there's something there, however it may very well just be because of the KASAN poisoning that the alloc_cache handles manually.

    Rather than attempt to make the alloc_cache sane for that use case, just drop the usage of the alloc_cache for msg_ring request payload data.

    Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
    Link: https://lore.kernel.org/io-uring/[email protected]/
    Reported-by: [email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU | Jens Axboe | 2025-07-08 | 1 | -2/+2
    syzbot reports that defer/local task_work adding via msg_ring can hit a request that has been freed:

    CPU: 1 UID: 0 PID: 19356 Comm: iou-wrk-19354 Not tainted 6.16.0-rc4-syzkaller-00108-g17bbde2e1716 #0 PREEMPT(full)
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
    Call Trace:
     <TASK>
     dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
     print_address_description mm/kasan/report.c:408 [inline]
     print_report+0xd2/0x2b0 mm/kasan/report.c:521
     kasan_report+0x118/0x150 mm/kasan/report.c:634
     io_req_local_work_add io_uring/io_uring.c:1184 [inline]
     __io_req_task_work_add+0x589/0x950 io_uring/io_uring.c:1252
     io_msg_remote_post io_uring/msg_ring.c:103 [inline]
     io_msg_data_remote io_uring/msg_ring.c:133 [inline]
     __io_msg_ring_data+0x820/0xaa0 io_uring/msg_ring.c:151
     io_msg_ring_data io_uring/msg_ring.c:173 [inline]
     io_msg_ring+0x134/0xa00 io_uring/msg_ring.c:314
     __io_issue_sqe+0x17e/0x4b0 io_uring/io_uring.c:1739
     io_issue_sqe+0x165/0xfd0 io_uring/io_uring.c:1762
     io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1874
     io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:642
     io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:696
     ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
     </TASK>

    which is supposed to be safe with how requests are allocated. But msg ring requests alloc and free on their own, and hence must defer freeing to a sane time.

    Add an rcu_head and use kfree_rcu() in both spots where requests are freed. Only the one in io_msg_tw_complete() is strictly required as it has been visible on the other ring, but use it consistently in the other spot as well.

    This should not cause any other issues outside of KASAN rightfully complaining about it.

    Link: https://lore.kernel.org/io-uring/[email protected]/
    Reported-by: [email protected]
    Cc: [email protected]
    Fixes: 0617bb500bfa ("io_uring/msg_ring: improve handling of target CQE posting")
    Signed-off-by: Jens Axboe <[email protected]>
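The deferral this commit describes can be modelled in userspace. What follows is a toy sketch, not kernel code: `struct defer_head`, `msg_req_free_deferred()` and `grace_period_elapsed()` are hypothetical names, and an explicit function call stands in for the end of an RCU grace period, which the real kfree_rcu() tracks on the caller's behalf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct defer_head {
    struct defer_head *next;
};

/* Hypothetical request type with the embedded head the commit adds. */
struct msg_req {
    int payload;
    struct defer_head rcu;
};

/* Requests waiting for the (simulated) grace period to end. */
static struct defer_head *pending_frees;

/* Queue the request instead of freeing it immediately. */
static void msg_req_free_deferred(struct msg_req *req)
{
    assert(req != NULL);
    req->rcu.next = pending_frees;
    pending_frees = &req->rcu;
}

/* Simulated end of a grace period: free everything queued so far.
 * Returns how many requests were released. */
static int grace_period_elapsed(void)
{
    int freed = 0;

    while (pending_frees) {
        struct defer_head *head = pending_frees;

        pending_frees = head->next;
        /* Recover the enclosing request from its embedded head. */
        free((char *)head - offsetof(struct msg_req, rcu));
        freed++;
    }
    return freed;
}
```

Until `grace_period_elapsed()` runs, a reader that still holds a pointer to the request can dereference it safely, which is the property the KASAN report showed to be missing.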
* io_uring: finish IOU_OK -> IOU_COMPLETE transition | Jens Axboe | 2025-05-21 | 1 | -1/+1
    IOU_COMPLETE is more descriptive, in that it explicitly says that the return value means "please post a completion for this request". This patch completes the transition from IOU_OK to IOU_COMPLETE, replacing existing IOU_OK users.

    This is a purely mechanical change.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: don't pass ctx to tw add remote helper | Pavel Begunkov | 2025-03-28 | 1 | -1/+1
    Unlike earlier versions, io_msg_remote_post() creates a valid request with a proper context, so don't pass a context to io_req_task_work_add_remote() explicitly but derive it from the request.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg: initialise msg request opcode | Pavel Begunkov | 2025-03-28 | 1 | -0/+1
    It's risky to have msg request opcode set to garbage, so at least initialise it to nop. Later we might want to add a user inaccessible opcode for such cases.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/9afe650fcb348414a4529d89f52eb8969ba06efd.1743190078.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg: rename io_double_lock_ctx() | Pavel Begunkov | 2025-03-28 | 1 | -4/+4
    io_double_lock_ctx() doesn't lock both rings. Rename it to prevent any future confusion.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/9e5defa000efd9b0f5e169cbb6bad4994d46ec5c.1743190078.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: introduce type alias for io_tw_state | Caleb Sander Mateos | 2025-02-17 | 1 | -1/+1
    In preparation for changing how io_tw_state is passed, introduce a type alias io_tw_token_t for struct io_tw_state *. This allows for changing the representation in one place, without having to update the many functions that just forward their struct io_tw_state * argument. Also add a comment to struct io_tw_state to explain its purpose.

    Signed-off-by: Caleb Sander Mateos <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
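A small userspace sketch of why such an alias helps. Only the alias itself (`io_tw_token_t` for `struct io_tw_state *`) comes from the commit message; the `cancel` field and both functions are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical contents; the real struct lives in the kernel headers. */
struct io_tw_state {
    bool cancel;
};

/* The alias from the commit: most code names only the token type. */
typedef struct io_tw_state *io_tw_token_t;

/* Leaf function (hypothetical): the only place that inspects the state. */
static bool tw_should_cancel(io_tw_token_t tw)
{
    assert(tw != NULL);
    return tw->cancel;
}

/* Forwarding function (hypothetical): passes the token along untouched,
 * so it would not need editing if the representation changed. */
static int complete_request(int res, io_tw_token_t tw)
{
    return tw_should_cancel(tw) ? -1 : res;
}
```

If the token later stopped being a plain pointer, only the typedef and the leaf function would need to change; `complete_request()` and similar forwarders would compile unmodified.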
* io_uring/msg_ring: don't leave potentially dangling ->tctx pointer | Jens Axboe | 2025-01-23 | 1 | -2/+2
    For remote posting of messages, req->tctx is assigned even though it is never used. Rather than leave a dangling pointer, just clear it to NULL and use the previous check for a valid submitter_task to gate on whether or not the request should be terminated.

    Reported-by: Jann Horn <[email protected]>
    Fixes: b6f58a3f4aa8 ("io_uring: move struct io_kiocb from task_struct to io_uring_task")
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: Drop custom destructor | Gabriel Krisman Bertazi | 2024-12-27 | 1 | -7/+0
    kfree can handle slab objects nowadays. Drop the extra callback and just use kfree.

    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* switch io_msg_ring() to CLASS(fd) | Al Viro | 2024-11-15 | 1 | -11/+7
    Use CLASS(fd) to get the file for sync message ring requests, rather than open-code the file retrieval dance.

    Signed-off-by: Al Viro <[email protected]>
    Link: https://lore.kernel.org/r/20241115034902.GP3387508@ZenIV
    [axboe: make a more coherent commit message]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: move struct io_kiocb from task_struct to io_uring_task | Jens Axboe | 2024-11-06 | 1 | -2/+2
    Rather than store the task_struct itself in struct io_kiocb, store the io_uring specific task_struct. The life times are the same in terms of io_uring, and this avoids doing some dereferences through the task_struct. For the hot path of putting local task references, we can deref req->tctx instead, which we'll need anyway in that function regardless of whether it's local or remote references.

    This is mostly straightforward, except the original task PF_EXITING check needs a bit of tweaking. task_work is _always_ run from the originating task, except in the fallback case, where it's run from a kernel thread. Replace the potentially racy (in case of fallback work) checks for req->task->flags with current->flags. It's either still the original task, in which case PF_EXITING will be sane, or it has PF_KTHREAD set, in which case it's fallback work. Both cases should prevent moving forward with the given request.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/rsrc: add io_rsrc_node_lookup() helper | Jens Axboe | 2024-11-02 | 1 | -16/+15
    There are lots of spots open-coding this functionality, add a generic helper that does the node lookup in a speculation safe way.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/rsrc: unify file and buffer resource tables | Jens Axboe | 2024-11-02 | 1 | -2/+2
    For files, there's nr_user_files/file_table/file_data, and buffers have nr_user_bufs/user_bufs/buf_data. There's no reason why file_table and file_data can't be the same thing, and ditto for the buffer side. That gets rid of more io_ring_ctx state that's in two spots rather than just being in one spot, as it should be. Put all the registered file data in one location, and ditto on the buffer front.

    This also avoids having both io_rsrc_data->nodes being an allocated array, and ->user_bufs[] or ->file_table.nodes. There's no reason to have this information duplicated. Keep it in one spot, io_rsrc_data, along with how many resources are available.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: add support for sending a sync message | Jens Axboe | 2024-10-29 | 1 | -0/+29
    Normally MSG_RING requires both a source and a destination ring. But some users don't always have a ring available to send a message from, yet they still need to notify a target ring.

    Add support for using io_uring_register(2) without having a source ring, using a file descriptor of -1 for that. Internally those are called blind registration opcodes. Implement IORING_REGISTER_SEND_MSG_RING as a blind opcode, which simply takes an sqe that the application can put on the stack and use the normal liburing helpers to initialize it. Then the app can call:

    io_uring_register(-1, IORING_REGISTER_SEND_MSG_RING, &sqe, 1);

    and get the same behavior in terms of the target, where a CQE is posted with the details given in the sqe.

    For now this takes a single sqe pointer argument, and hence arg must be set to that, and nr_args must be 1. Could easily be extended to take an array of sqes, but for now let's keep it simple.

    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: refactor a few helper functions | Jens Axboe | 2024-10-29 | 1 | -11/+20
    Mostly just to skip them taking an io_kiocb, rather just pass in the ctx and io_msg directly.

    In preparation for being able to issue a MSG_RING request without having an io_kiocb. No functional changes in this patch.

    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: fix uninitialized use of target_req->flags | Jens Axboe | 2024-07-25 | 1 | -3/+3
    syzbot reports that KMSAN complains that 'nr_tw' is an uninit-value with the following report:

    BUG: KMSAN: uninit-value in io_req_local_work_add io_uring/io_uring.c:1192 [inline]
    BUG: KMSAN: uninit-value in io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240
     io_req_local_work_add io_uring/io_uring.c:1192 [inline]
     io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240
     io_msg_remote_post io_uring/msg_ring.c:102 [inline]
     io_msg_data_remote io_uring/msg_ring.c:133 [inline]
     io_msg_ring_data io_uring/msg_ring.c:152 [inline]
     io_msg_ring+0x1c38/0x1ef0 io_uring/msg_ring.c:305
     io_issue_sqe+0x383/0x22c0 io_uring/io_uring.c:1710
     io_queue_sqe io_uring/io_uring.c:1924 [inline]
     io_submit_sqe io_uring/io_uring.c:2180 [inline]
     io_submit_sqes+0x1259/0x2f20 io_uring/io_uring.c:2295
     __do_sys_io_uring_enter io_uring/io_uring.c:3205 [inline]
     __se_sys_io_uring_enter+0x40c/0x3ca0 io_uring/io_uring.c:3142
     __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3142
     x64_sys_call+0x2d82/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:427
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

    which is the following check:

    if (nr_tw < nr_wait)
            return;

    in io_req_local_work_add(). While nr_tw itself cannot be uninitialized, it does depend on req->flags, which off the msg ring issue path can indeed be uninitialized.

    Fix this by always clearing the allocated 'req' fully if we can't grab one from the cache itself.

    Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
    Reported-by: [email protected]
    Link: https://lore.kernel.org/io-uring/[email protected]/
    Signed-off-by: Jens Axboe <[email protected]>
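The fix pattern in this commit, zeroing a request whenever it comes from the allocator rather than from the cache, can be sketched in userspace as follows. This is an illustration under assumptions: `struct msg_req`, the one-slot `struct req_cache` and `msg_req_get()` are hypothetical, and calloc() stands in for a zeroing kernel allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request with a field that later code reads unconditionally. */
struct msg_req {
    unsigned int flags;
    void *ctx;
};

/* Hypothetical one-slot cache, standing in for the kernel's alloc cache. */
struct req_cache {
    struct msg_req *slot;
};

/* Take a request from the cache if possible; otherwise allocate one that
 * is fully zeroed, so no field is ever read uninitialized. */
static struct msg_req *msg_req_get(struct req_cache *cache)
{
    struct msg_req *req;

    assert(cache != NULL);
    req = cache->slot;
    if (req) {
        /* Cache hit: assumed to have been reset when it was put back. */
        cache->slot = NULL;
        return req;
    }
    /* Cache miss: calloc() returns zero-filled memory (or NULL). */
    return calloc(1, sizeof(*req));
}
```

With a plain malloc() on the miss path, `flags` would hold whatever the heap last contained, which is the kind of read KMSAN flagged.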
* io_uring/msg_ring: use kmem_cache_free() to free request | Jens Axboe | 2024-07-01 | 1 | -1/+1
    The change adding caching around the request allocated and freed for data messages changed a kmem_cache_free() to a kfree(), which isn't correct as the request came from slab in the first place. Fix that up and use the right freeing function if the cache is already at its limit.

    Note that the current mixing of kmem_cache_alloc and kfree is fine, but consistent alloc/free functions should be used as it's otherwise somewhat confusing.

    Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: check for dead submitter task | Jens Axboe | 2024-07-01 | 1 | -5/+10
    The change for improving the handling of the target CQE posting inadvertently dropped the NULL check for the submitter task on the target ring, reinstate that.

    Fixes: 0617bb500bfa ("io_uring/msg_ring: improve handling of target CQE posting")
    Reported-by: Pavel Begunkov <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: add an alloc cache for io_kiocb entries | Jens Axboe | 2024-06-24 | 1 | -2/+29
    With slab accounting, allocating and freeing memory has considerable overhead. Add a basic alloc cache for the io_kiocb allocations that msg_ring needs to do. Unlike other caches, this one is used by the sender, grabbing it from the remote ring. When the remote ring gets the posted completion, it'll free it locally. Hence it is separately locked, using ctx->msg_lock.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: improve handling of target CQE posting | Jens Axboe | 2024-06-24 | 1 | -41/+45
    Use the exported helper for queueing task_work for message passing, rather than rolling our own.

    Note that this is only done for strict data messages for now, file descriptor passing messages still rely on the kernel task_work. It could get converted at some point if it's performance critical.

    This improves peak performance of message passing by about 5x in some basic testing, with 2 threads just sending messages to each other. Before this change, it was capped at around 700K/sec, with the change it's at over 4M/sec.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: tighten requirement for remote posting | Jens Axboe | 2024-06-24 | 1 | -3/+1
    Currently this is gated on whether or not the target ring needs a local completion - and if so, whether or not we're running on the right task. The use case for same thread cross posting is probably a lot less relevant than remote posting. And since we're going to improve this situation anyway, just gate it on local posting and ignore what task we're currently running on.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring | Jens Axboe | 2024-05-01 | 1 | -6/+4
    Move the posting outside the checking and locking, it's cleaner that way.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it | linke li | 2024-04-26 | 1 | -1/+1
    In io_msg_exec_remote(), ctx->submitter_task is read using READ_ONCE at the beginning of the function, checked, and then re-read from ctx->submitter_task, voiding all guarantees of the checks. Reuse the value that was read by READ_ONCE to ensure the consistency of the task struct throughout the function.

    Signed-off-by: linke li <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
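The read-once-then-reuse pattern from this commit can be sketched in userspace with C11 atomics. A relaxed atomic load stands in for the kernel's READ_ONCE here; `struct task`, `struct ring` and `pick_target_pid()` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct task {
    int pid;
};

/* Hypothetical ring whose submitter can be changed by another thread. */
struct ring {
    _Atomic(struct task *) submitter_task;
};

/* Returns the pid of the task to notify, or -1 if there is none. */
static int pick_target_pid(struct ring *ctx)
{
    struct task *task;

    assert(ctx != NULL);
    /* Read the shared pointer exactly once into a local snapshot. */
    task = atomic_load_explicit(&ctx->submitter_task, memory_order_relaxed);
    if (!task)
        return -1;
    /* Use the snapshot that was checked, not ctx->submitter_task again:
     * a second read could observe a different (possibly NULL) value. */
    return task->pid;
}
```

The sketch covers only the consistency of the checked value; keeping the pointed-to task alive is a separate concern that it does not model.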
* io_uring: use io_file_from_index in io_msg_grab_file | Christoph Hellwig | 2023-06-20 | 1 | -3/+1
    Use io_file_from_index instead of open coding it.

    Signed-off-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: let target know allocated index | Pavel Begunkov | 2023-03-16 | 1 | -1/+3
    msg_ring requests transferring files support auto index selection via IORING_FILE_INDEX_ALLOC, however they don't return the selected index to the target ring and there is no other good way for the userspace to know where the received file is.

    Return the index for allocated slots and 0 otherwise, which is consistent with other fixed file installing requests.

    Cc: [email protected] # v6.0+
    Fixes: e6130eba8a848 ("io_uring: add support for passing fixed file descriptors")
    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://github.com/axboe/liburing/issues/809
    Signed-off-by: Jens Axboe <[email protected]>
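The return convention described above (the chosen index for auto-allocated slots, 0 otherwise) can be illustrated with a toy file table. Everything here is hypothetical: `SLOT_ALLOC` merely plays the role of IORING_FILE_INDEX_ALLOC, and `install_file()` is not the kernel's installer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS   4
#define SLOT_ALLOC (~0U)   /* hypothetical "pick a slot for me" marker */

/* Toy fixed-file table: a NULL entry means the slot is free. */
struct file_table {
    const void *slots[NR_SLOTS];
};

/* Install 'file' at 'slot'. Returns the chosen index when the slot was
 * auto-allocated, 0 for an explicit slot, or a negative errno. */
static int install_file(struct file_table *t, const void *file,
                        unsigned int slot)
{
    assert(t != NULL && file != NULL);

    if (slot == SLOT_ALLOC) {
        for (unsigned int i = 0; i < NR_SLOTS; i++) {
            if (!t->slots[i]) {
                t->slots[i] = file;
                return (int)i;   /* receiver learns where the file went */
            }
        }
        return -ENFILE;          /* table is full */
    }
    if (slot >= NR_SLOTS)
        return -EINVAL;
    t->slots[slot] = file;
    return 0;                    /* caller already knows the slot */
}
```

In the commit's terms, this return value is what ends up visible to the target ring, so the receiver can locate a file whose slot it did not choose.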
* io_uring/msg-ring: ensure flags passing works for task_work completions | Jens Axboe | 2023-01-29 | 1 | -1/+6
    If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting a message from a different thread, then we need to ensure that the fallback task_work that posts the CQE knows about the flags passing as well. If not we'll always be posting 0 as the flags.

    Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe")
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: Pass custom flags to the cqe | Breno Leitao | 2023-01-29 | 1 | -5/+19
    This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message ring operations (IORING_OP_MSG_RING). This new flag enables the sender to specify custom flags, which will be copied over to cqe->flags in the receiving ring. These custom flags should be specified using the sqe->file_index field.

    This mechanism provides additional flexibility when sending messages between rings.

    Signed-off-by: Breno Leitao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: fix remote queue to disabled ring | Pavel Begunkov | 2023-01-20 | 1 | -0/+8
    IORING_SETUP_R_DISABLED rings don't have the submitter task set, so it's not always safe to use ->submitter_task. Disallow posting msg_ring messages to disabled rings. Also add task NULL check for loosy sync around testing for IORING_SETUP_R_DISABLED.

    Cc: [email protected]
    Fixes: 6d043ee1164ca ("io_uring: do msg_ring in target task via tw")
    Signed-off-by: Pavel Begunkov <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: fix flagging remote execution | Pavel Begunkov | 2023-01-20 | 1 | -17/+23
    There are a couple of problems with queueing a tw in io_msg_ring_data() for remote execution. First, once we queue it the target ring can go away and so setting IORING_SQ_TASKRUN there is not safe. Secondly, the userspace might not expect IORING_SQ_TASKRUN.

    Extract a helper and uniformly use TWA_SIGNAL without TWA_SIGNAL_NO_IPI tricks for now, just as it was done in the original patch.

    Cc: [email protected]
    Fixes: 6d043ee1164ca ("io_uring: do msg_ring in target task via tw")
    Signed-off-by: Pavel Begunkov <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: fix missing lock on overflow for IOPOLL | Jens Axboe | 2023-01-19 | 1 | -9/+30
    If the target ring is configured with IOPOLL, then we always need to hold the target ring uring_lock before posting CQEs. We could just grab it unconditionally, but since we don't expect many target rings to be of this type, make grabbing the uring_lock conditional on the ring type.

    Link: https://lore.kernel.org/io-uring/Y8krlYa52%[email protected]/
    Reported-by: Xingyuan Mo <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
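The conditional locking described above can be sketched with a mock ring. This is a simplified illustration: `struct mock_ring` and its helpers are hypothetical, a boolean stands in for the uring_lock, and the sketch assumes the lock can always be taken, so it does not model any failure or retry path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring with a mock lock and counters for observation. */
struct mock_ring {
    bool iopoll;            /* ring was set up for polled I/O */
    bool locked;            /* mock stand-in for the uring_lock */
    int lock_acquisitions;
    int cqes_posted;
};

static void ring_lock(struct mock_ring *r)
{
    assert(!r->locked);
    r->locked = true;
    r->lock_acquisitions++;
}

static void ring_unlock(struct mock_ring *r)
{
    assert(r->locked);
    r->locked = false;
}

/* Post one CQE to the target, taking the lock only for IOPOLL rings. */
static void post_msg_cqe(struct mock_ring *target)
{
    assert(target != NULL);
    if (target->iopoll)
        ring_lock(target);
    target->cqes_posted++;
    if (target->iopoll)
        ring_unlock(target);
}
```

The common non-IOPOLL target pays no locking cost, which is the trade-off the commit message argues for.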
* io_uring/msg_ring: move double lock/unlock helpers higher up | Jens Axboe | 2023-01-19 | 1 | -24/+23
    In preparation for needing them somewhere else, move them and get rid of the unused 'issue_flags' for the unlock side.

    No functional changes in this patch.

    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: flag target ring as having task_work, if needed | Jens Axboe | 2022-12-08 | 1 | -0/+1
    Before the recent change, we didn't even wake the targeted task when posting the cqe remotely. Now we do wake it, but we do want to ensure that the recipient knows there's potential work there that needs to get processed to get the CQE posted.

    OR in IORING_SQ_TASKRUN for that purpose.

    Link: https://lore.kernel.org/io-uring/[email protected]/
    Fixes: 6d043ee1164c ("io_uring: do msg_ring in target task via tw")
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: do msg_ring in target task via tw | Pavel Begunkov | 2022-12-07 | 1 | -3/+53
    While executing in a context of one io_uring instance msg_ring manipulates another ring. We're trying to keep CQEs posting contained in the context of the ring-owner task, use task_work to send the request to the target ring's task when we're modifying its CQ or trying to install a file. Note, we can't safely use io_uring task_work infra and have to use task_work directly.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/4d76c7b28ed5d71b520de4482fbb7f660f21cd80.1670384893.git.asml.silence@gmail.com
    [axboe: use TWA_SIGNAL_NO_IPI]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: extract a io_msg_install_complete helper | Pavel Begunkov | 2022-12-07 | 1 | -13/+21
    Extract a helper called io_msg_install_complete() from io_msg_send_fd(), will be used later.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/1500ca1054cc4286a3ee1c60aacead57fcdfa02a.1670384893.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: get rid of double locking | Pavel Begunkov | 2022-12-07 | 1 | -36/+49
    We don't need to take both uring_locks at once, msg_ring can be split in two parts, first getting a file from the filetable of the first ring and then installing it into the second one.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/a80ecc2bc99c3b3f2cf20015d618b7c51419a797.1670384893.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: improve io_double_lock_ctx fail handling | Pavel Begunkov | 2022-12-07 | 1 | -0/+2
    msg_ring will fail the request if it can't lock rings, instead punt it to io-wq as was originally intended.

    Cc: [email protected]
    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/4697f05afcc37df5c8f89e2fe6d9c7c19f0241f9.1670384893.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: dont remove file from msg_ring reqs | Pavel Begunkov | 2022-12-07 | 1 | -4/+0
    We should not be messing with req->file outside of core paths. Clearing it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the request on failed io_double_lock_ctx() but clearly was originally intended to do retries instead.

    Cc: [email protected]
    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/e5ac9edadb574fe33f6d727cb8f14ce68262a684.1670384893.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: remove overflow param from io_post_aux_cqe | Dylan Yudaken | 2022-11-25 | 1 | -2/+2
    The only call sites which would not allow overflow are also call sites which would use the io_aux_cqe as they care about ordering.

    So remove this parameter from io_post_aux_cqe.

    Signed-off-by: Dylan Yudaken <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() | Harshit Mogalapalli | 2022-10-19 | 1 | -0/+3
    Syzkaller produced the below call trace:

    BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0
    Write of size 8 at addr 0000000000000070 by task repro/16399

    CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7
    Call Trace:
     <TASK>
     dump_stack_lvl+0xcd/0x134
     ? io_msg_ring+0x3cb/0x9f0
     kasan_report+0xbc/0xf0
     ? io_msg_ring+0x3cb/0x9f0
     kasan_check_range+0x140/0x190
     io_msg_ring+0x3cb/0x9f0
     ? io_msg_ring_prep+0x300/0x300
     io_issue_sqe+0x698/0xca0
     io_submit_sqes+0x92f/0x1c30
     __do_sys_io_uring_enter+0xae4/0x24b0
    ....
    RIP: 0033:0x7f2eaf8f8289
    RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289
    RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004
    RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0
    R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000
     </TASK>
    Kernel panic - not syncing: panic_on_warn set ...

    We don't have a NULL check on file_ptr in io_msg_send_fd() function, so when file_ptr is NULL src_file is also NULL and get_file() dereferences a NULL pointer and leads to above crash.

    Add a NULL check to fix this issue.

    Fixes: e6130eba8a84 ("io_uring: add support for passing fixed file descriptors")
    Reported-by: syzkaller <[email protected]>
    Signed-off-by: Harshit Mogalapalli <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring/msg_ring: check file type before putting | Jens Axboe | 2022-09-15 | 1 | -1/+2
    If we're invoked with a fixed file, follow the normal rules of not calling io_fput_file(). Fixed files are permanently registered to the ring, and do not need putting separately.

    Cc: [email protected]
    Fixes: aa184e8671f0 ("io_uring: don't attempt to IOPOLL for MSG_RING requests")
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: make io_kiocb_to_cmd() typesafe | Stefan Metzmacher | 2022-08-12 | 1 | -4/+4
    We need to make sure (at build time) that struct io_cmd_data is not cast to a structure that's larger.

    Signed-off-by: Stefan Metzmacher <[email protected]>
    Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: add allow_overflow to io_post_aux_cqe | Dylan Yudaken | 2022-07-25 | 1 | -2/+2
    Some use cases of io_post_aux_cqe would not want to overflow as is, but might want to change the flags/result. For example multishot receive requires in order CQE, and so if there is an overflow it would need to stop receiving until the overflow is taken care of.

    Signed-off-by: Dylan Yudaken <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: add support for passing fixed file descriptors | Jens Axboe | 2022-07-25 | 1 | -7/+123
    With IORING_OP_MSG_RING, one ring can send a message to another ring. Extend that support to also allow sending a fixed file descriptor to that ring, enabling one ring to pass a registered descriptor to another one.

    Arguments are extended to pass in:

    sqe->addr3       fixed file slot in source ring
    sqe->file_index  fixed file slot in destination ring

    IORING_OP_MSG_RING is extended to take a command argument in sqe->addr. If set to zero (or IORING_MSG_DATA), it sends just a message like before. If set to IORING_MSG_SEND_FD, a fixed file descriptor is sent according to the above arguments.

    Two common use cases for this are:

    1) Server needs to be shut down or restarted, pass file descriptors to another one

    2) Backend is split, and one accepts connections, while others then get the fd passed and handle the actual connection.

    Both of those are classic SCM_RIGHTS use cases, and it's not possible to support them with direct descriptors today.

    By default, this will post a CQE to the target ring, similarly to how IORING_MSG_DATA does it. If IORING_MSG_RING_CQE_SKIP is set, no message is posted to the target ring. The issuer is expected to notify the receiver side separately.

    Signed-off-by: Jens Axboe <[email protected]>
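The argument layout described in this commit can be sketched as a small decoder over a mock SQE. All names here are hypothetical stand-ins, and the numeric values of the mock constants are assumptions apart from the data command being zero, which the commit message states. The sketch only decodes the request; it transfers no files.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock command values; only "data == 0" is stated by the commit message. */
enum {
    MOCK_MSG_DATA    = 0,
    MOCK_MSG_SEND_FD = 1,
};

/* Hypothetical stand-in for IORING_MSG_RING_CQE_SKIP. */
#define MOCK_MSG_RING_CQE_SKIP (1U << 0)

/* Mock of the SQE fields involved in fd passing. */
struct mock_sqe {
    uint64_t addr;            /* command */
    uint64_t addr3;           /* fixed file slot in source ring */
    uint32_t file_index;      /* fixed file slot in destination ring */
    uint32_t msg_ring_flags;
};

struct fd_transfer {
    uint32_t src_slot;
    uint32_t dst_slot;
    bool post_cqe;            /* false when the sender asked to skip the CQE */
};

/* Returns 1 and fills 'xfer' for an fd transfer, 0 for a plain data
 * message, or -EINVAL for an unknown command. */
static int parse_msg_ring(const struct mock_sqe *sqe, struct fd_transfer *xfer)
{
    assert(sqe != NULL && xfer != NULL);

    switch (sqe->addr) {
    case MOCK_MSG_DATA:
        return 0;
    case MOCK_MSG_SEND_FD:
        xfer->src_slot = (uint32_t)sqe->addr3;
        xfer->dst_slot = sqe->file_index;
        xfer->post_cqe = !(sqe->msg_ring_flags & MOCK_MSG_RING_CQE_SKIP);
        return 1;
    default:
        return -EINVAL;
    }
}
```

When `post_cqe` comes back false, the receiver gets no completion for the transfer, so the sender has to announce the new descriptor through some other channel.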
* io_uring: kill extra io_uring_types.h includes | Pavel Begunkov | 2022-07-25 | 1 | -1/+0
    io_uring/io_uring.h already includes io_uring_types.h, no need to include it every time. Kill it in a bunch of places, it prepares us for following patches.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: don't expose io_fill_cqe_aux() | Pavel Begunkov | 2022-07-25 | 1 | -10/+1
    Deduplicate some code and add a helper for filling an aux CQE, locking and notification.

    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
* io_uring: move msg_ring into its own file | Jens Axboe | 2022-07-25 | 1 | -0/+65
    Signed-off-by: Jens Axboe <[email protected]>