Skip to content

TCP client performance optimizations#177

Merged
jadamcrain merged 14 commits intomainfrom
feature/performance-analysis
Mar 15, 2026
Merged

TCP client performance optimizations#177
jadamcrain merged 14 commits intomainfrom
feature/performance-analysis

Conversation

@jadamcrain
Copy link
Member

Summary

  • Eliminate unnecessary Box<dyn Callback> allocation in Promise types by storing oneshot::Sender directly (3.02 → 2.02 allocs/request, -33%)
  • Fix ReadBuffer shift condition bug (self.end == self.len()self.end == self.buffer.len()) with regression test
  • Skip tracing span creation per transaction when decode level is disabled
  • Make MBAP parser iterative instead of recursive
  • Pass function code through to handle_response instead of recomputing
  • Implement ExactSizeIterator for BitIterator, RegisterIterator, WriteMultipleIterator
  • Update scursor 0.2.0 → 0.5.0
  • Fix perf test connection race using Listener<ClientState>
  • Add perf test README with valgrind profiling instructions

Measurements

All measurements taken with valgrind (deterministic instruction counts and allocation profiling).

Metric Before After Delta
allocs/request 3.02 2.02 -33%
instructions/request ~19,385 ~18,720 -3.4%

jadamcrain and others added 9 commits March 15, 2026 08:37
- Add PERFORMANCE_ANALYSIS.md documenting optimization opportunities
  in the TCP client hot path
- Fix perf test to use a Listener<ClientState> to wait for connection
  before benchmarking, eliminating the startup race condition
- Add README.md for perf test with valgrind profiling instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store oneshot::Sender directly in an enum variant instead of wrapping
it in a boxed closure. This removes one heap allocation per request
on the Channel API path while preserving the boxed callback path for
the FFI/CallbackSession API.

Measured with valgrind dhat: 3.02 -> 2.02 allocs/request (-33%)
Measured with callgrind: 19385 -> 18980 instructions/request (-2.1%)
The condition `self.end == self.len()` reduces to `self.begin == 0`,
so the shift never triggered when it was actually needed (begin > 0
and end at capacity). Changed to `self.end == self.buffer.len()`.

Added a test that reproduces the bug: filling the buffer to capacity,
consuming some bytes, then attempting to read more would panic with
"space for 0 bytes" because no shift occurred.
When all decode levels are Nothing (the default), skip creating the
per-transaction tracing::info_span. This avoids span allocation and
tx_id Display formatting on every request when logging is inactive.

Measured with callgrind: 18980 -> 18809 instructions/request (-0.9%)
Picks up first_chunk-based integer reads, const-generic read_array,
and Kani-verified panic-freedom guarantees. No API breakage.
Removes a redundant 8-arm enum match on RequestDetails per response.

Measured with callgrind: 18819 -> 18784 instructions/request (-0.2%)
Replace the recursive self.parse() call after header parsing with a
loop. Removes one function call/match overhead per parsed frame.

Measured with callgrind: 18784 -> 18733 instructions/request (-0.3%)
…ultipleIterator

These iterators already provided exact size_hint. Adding the marker
trait makes the contract explicit and may help the optimizer in
generic contexts.
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers significant performance enhancements to the TCP client by reducing heap allocations and CPU instructions, resulting in a 33% decrease in allocations per request and a 3.4% reduction in instructions per request. It also includes a crucial correctness fix for the ReadBuffer's internal logic and expands the performance testing infrastructure with new documentation and profiling guidance.

Highlights

  • Allocation Optimization: Eliminated unnecessary Box<dyn Callback> allocations in Promise types by directly storing oneshot::Sender.
  • Correctness Fix: Fixed a bug in ReadBuffer's shift condition (self.end == self.len() to self.end == self.buffer.len()) and added a regression test.
  • Tracing Performance: Optimized tracing by skipping span creation when the decode level is disabled.
  • Parser Refactoring: Refactored the MBAP parser from a recursive to an iterative approach.
  • Function Code Handling: Passed FunctionCode directly to handle_response to avoid recomputation.
  • Iterator Enhancements: Implemented ExactSizeIterator for BitIterator, RegisterIterator, and WriteMultipleIterator.
  • Dependency Update: Updated the scursor dependency from version 0.2.0 to 0.5.0.
  • Performance Test Stability: Addressed a performance test connection race using a Listener<ClientState>.
  • Documentation and Tooling: Added a PERFORMANCE_ANALYSIS.md document and updated the performance test README.md with Valgrind profiling instructions.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • Cargo.lock
    • Updated the scursor dependency version from 0.2.0 to 0.5.0 and its checksum.
  • PERFORMANCE_ANALYSIS.md
    • Added a new document detailing a comprehensive performance analysis of the TCP client hot path, identifying several optimization opportunities.
  • examples/perf/README.md
    • Added a new README file for the performance test, including usage instructions and guidance for profiling with Valgrind (callgrind and dhat).
  • examples/perf/src/main.rs
    • Added ConnectListener and wait_for_connected structs/functions to manage client connection states.
    • Modified the main function to use ConnectListener to ensure clients are connected before starting performance tests, preventing race conditions.
  • rodbus/Cargo.toml
    • Updated the scursor dependency version from 0.2.0 to 0.5.0.
  • rodbus/src/client/message.rs
    • Modified RequestDetails::handle_response to accept FunctionCode as an argument, removing the need for recomputation.
    • Refactored the Promise struct to use an enum PromiseInner to store either oneshot::Sender or Box<dyn Callback>, eliminating an unnecessary heap allocation for the oneshot case.
  • rodbus/src/client/requests/read_bits.rs
    • Refactored the Promise struct to use PromiseInner for BitsCallback to avoid unnecessary heap allocations.
    • Updated ReadBits::new_channel to use the new Promise::oneshot constructor.
  • rodbus/src/client/requests/read_registers.rs
    • Refactored the Promise struct to use PromiseInner for RegistersCallback to avoid unnecessary heap allocations.
    • Updated ReadRegisters::new_channel to use the new Promise::oneshot constructor.
  • rodbus/src/client/requests/write_multiple.rs
    • Implemented the ExactSizeIterator trait for WriteMultipleIterator.
  • rodbus/src/client/task.rs
    • Added a conditional check around the tracing::info_span! macro to only create a tracing span if the decode level is enabled, reducing overhead.
  • rodbus/src/common/buffer.rs
    • Corrected the ReadBuffer::read_some shift condition from self.end == self.len() to self.end == self.buffer.len().
    • Added a new test case shifts_data_when_buffer_full_with_consumed_bytes to validate the corrected shift logic.
  • rodbus/src/tcp/frame.rs
    • Refactored MbapParser::parse to use an iterative loop instead of a recursive call, improving robustness and potentially performance.
  • rodbus/src/types.rs
    • Implemented the ExactSizeIterator trait for BitIterator and RegisterIterator.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements several performance optimizations for the TCP client, as detailed in the description. Key changes include reducing heap allocations by refactoring Promise types, fixing a logic bug in ReadBuffer, and converting a recursive parser to an iterative one. The changes are supported by new tests and documentation. After a thorough review, I have not identified any issues or further improvement opportunities in the submitted code.

Replace iterator-based collect() with as_chunks::<2>() to eliminate
per-element bounds checking when building Vec<Indexed<u16>> from
register responses. Applies to both holding and input register reads.

Requires bumping rust-version to 1.94 for slice::as_chunks.

Measured with callgrind: 18733 -> 18145 instructions/request (-3.1%)
Notable: tokio 1.48 -> 1.50, tracing 0.1.41 -> 0.1.44
@jadamcrain jadamcrain merged commit 9e68238 into main Mar 15, 2026
30 checks passed
@jadamcrain jadamcrain deleted the feature/performance-analysis branch March 15, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant