virtio: remove per-work Arc<Mutex> clone from queue completion path#3173
virtio: remove per-work Arc<Mutex> clone from queue completion path#3173jstarks merged 13 commits intomicrosoft:mainfrom
Conversation
Add a complete() method on VirtioQueue that delegates to the internal used handler. This coexists with VirtioQueueCallbackWork::complete() during the migration—devices will be switched one at a time, then the old method will be removed.
Change process_rng_request to return the byte count instead of completing the descriptor internally. The caller completes via queue.complete().
Change process_pmem_request to return the byte count instead of completing the descriptor internally. The caller completes via queue.complete().
The caller already receives the byte count from process_9p_request. Switch from work.complete(bytes) to queue.complete(&mut work, bytes).
Change process_virtiofs_request to take &VirtioQueueCallbackWork and return the byte count instead of completing the descriptor internally. The caller completes via queue.complete().
At each call site, split work.consume().complete(n) into a let-binding followed by queue.complete(&mut work, n). After consume() releases the queue borrow, the queue is available for completion.
Pass &mut VirtioQueue to finish_io and poll_drain so they can call queue.complete() instead of work.complete(). In the main event loop, this comes from state.queue. In stop_queue, use worker.get_mut() to borrow both the BlkWorker and BlkQueueState simultaneously.
TX: complete_tx_packet borrow-splits self to access both virtio_state.tx_queue and active_state.pending_tx_packets. RX: change VirtioWorkPool::complete_packet to return (work, bytes) instead of completing internally. process_endpoint_rx completes via virtio_state.rx_queue.
…-work clone All devices now use VirtioQueue::complete(). Remove: - work.complete() method - Arc<Mutex<VirtioQueueUsedHandler>> from VirtioQueueCallbackWork - completed: bool tracking field - VirtioQueue Drop impl (Arc::get_mut check is dead now) Update test callback signatures to pass &mut VirtioQueue so tests can call queue.complete().
Now that no work item clones the Arc<Mutex<VirtioQueueUsedHandler>>, the indirection is pure overhead. Inline QueueCoreCompleteWork and Interrupt directly into VirtioQueue and delete the handler struct. VirtioQueue::complete() now calls complete_descriptor and delivers the interrupt inline, with no lock acquisition.
Move complete() calls from VirtioQueueCallbackWork to VirtioQueue, matching the new API where completion is done on the queue rather than the work item. Refactor write_packet so the caller (handle_host_rx) always completes the work item, ensuring completion even on early error returns.
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Pull request overview
Refactors virtio queue completion so that completion is performed by VirtioQueue (via VirtioQueue::complete(&mut work, bytes_written)) rather than by each VirtioQueueCallbackWork, eliminating per-descriptor Arc<Mutex<...>> cloning/locking overhead on the hot completion path.
Changes:
- Moves descriptor completion responsibility into
VirtioQueue, removing per-work used-ring handler ownership fromVirtioQueueCallbackWork. - Updates in-tree virtio devices and virtio tests to the new completion pattern.
- Adjusts a few device helper APIs (e.g., returning
bytes_written) so the caller can complete exactly once.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/devices/virtio/virtio/src/common.rs | Introduces VirtioQueue::complete and removes per-work completion handler plumbing. |
| vm/devices/virtio/virtio/src/tests.rs | Updates test callbacks to complete work via VirtioQueue. |
| vm/devices/virtio/virtiofs/src/virtio.rs | Makes request processing return bytes_written; completion is done by the queue. |
| vm/devices/virtio/virtio_vsock/src/lib.rs | Refactors RX/TX completion flow; ensures RX work is always completed even on error. |
| vm/devices/virtio/virtio_rng/src/lib.rs | Returns bytes_written from request processing; caller completes via queue. |
| vm/devices/virtio/virtio_pmem/src/lib.rs | Returns bytes_written; caller completes via queue. |
| vm/devices/virtio/virtio_p9/src/lib.rs | Completes 9p requests via state.queue.complete. |
| vm/devices/virtio/virtio_net/src/lib.rs | Updates TX/RX completion to go through the queue; adapts RX completion flow. |
| vm/devices/virtio/virtio_net/src/buffers.rs | Changes RX “completion” helper to return (work, bytes) so caller completes via queue. |
| vm/devices/virtio/virtio_console/src/lib.rs | Updates peek/consume paths to complete via queue. |
| vm/devices/virtio/virtio_blk/src/lib.rs | Routes IO completion through VirtioQueue::complete, including drain-on-stop logic. |
| /// Complete a descriptor previously obtained from this queue. | ||
| /// | ||
| /// Writes `bytes_written` to the used ring and delivers an interrupt | ||
| /// to the guest (unless interrupt suppression is active). | ||
| pub fn complete(&mut self, work: &mut VirtioQueueCallbackWork, bytes_written: u32) { | ||
| match self.complete.complete_descriptor(&work.work, bytes_written) { | ||
| Ok(true) => { | ||
| self.notify_guest.deliver(); | ||
| } |
There was a problem hiding this comment.
VirtioQueue::complete() no longer enforces the “exactly once” completion contract. Previously, VirtioQueueCallbackWork::complete() asserted it hadn’t already been completed; now a caller can accidentally complete the same work multiple times (or via the wrong queue), which can corrupt the used ring and (for packed queues) can trigger a panic due to mismatched completion context. Consider restoring a guard (e.g., a completed flag in VirtioQueueCallbackWork set by VirtioQueue::complete, or changing complete to consume the work item and/or validate it belongs to this queue).
| @@ -107,7 +109,7 @@ impl VirtioWorkPool { | |||
| } else { | |||
| packet.len + header_size() as u32 | |||
| }; | |||
| packet.work.complete(payload_len); | |||
| (packet.work, payload_len) | |||
| } | |||
There was a problem hiding this comment.
VirtioWorkPool::complete_packet() no longer completes anything; it now returns the work item + computed length for the caller to complete. This name is misleading and makes it easy to accidentally assume the descriptor has already been completed. Consider renaming to reflect the new behavior (e.g., take_packet_for_completion / take_completed_packet_work) or updating the API to actually perform completion.
Take VirtioQueueCallbackWork by value instead of &mut, enforcing the exactly-once completion contract at compile time via move semantics rather than a runtime assert. Also rename VirtioWorkPool::complete_packet to take_rx_work to reflect that it no longer completes the descriptor itself.
Every
VirtioQueueCallbackWorkcurrently clones anArc<Mutex<VirtioQueueUsedHandler>>so that it can independently callcomplete()on itself. This means every descriptor processed by every virtio device pays for an atomic ref-count increment/decrement pair plus a mutex lock on the completion path—overhead that adds up on high-throughput devices like virtio-net and virtio-blk.This PR moves completion responsibility from the work item to the queue: callers now invoke
VirtioQueue::complete(work, bytes_written)instead ofwork.complete(bytes_written). Because the queue already owns the used-ring handler, theArc<Mutex>clone per work item, thecompletedtracking flag, and theDropguard onVirtioQueueare all eliminated.All in-tree virtio device crates are updated to the new pattern. For virtio-vsock, the refactor also fixes a latent bug where
write_packetcould early-return on error without completing the work item; the caller now always completes it.