feat: hole-punch nim-v1.15 peer#78

Open
rlve wants to merge 8 commits into libp2p:master from rlve:hp-nim-libp2p

rlve commented Mar 31, 2026

Contributor

Add nim-v1.15 to hole-punch interop tests as dialer/listener.

rlve requested a review from dhuseby as a code owner on March 31, 2026 12:45
richard-ramos self-requested a review on March 31, 2026 13:09

rlve commented Mar 31, 2026

Contributor Author

Oops, was working locally 🤔

rlve commented Apr 1, 2026

Contributor Author

I got the workflow to pass on my fork on a default GitHub runner, with small changes to avoid the Docker Compose interface_name feature, which is not available there: rlve#3

This suggests that the issue may be with the connectivity of the self-hosted runner.

cc: @richard-ramos

rlve changed the title from "add: hole-punch nim-v1.15 peer" to "feat: hole-punch nim-v1.15 peer" on May 4, 2026
@seetadev

Collaborator

Excellent contribution by @rlve on this PR.

Adding nim-v1.15 support to the hole-punch interoperability test suite as both dialer and listener is an important step toward improving cross-implementation compatibility and strengthening NAT traversal testing across the libp2p ecosystem. It is great to see continued work expanding interoperability coverage for the Nim implementation.

I also appreciate the persistence shown here in debugging the workflow issues, validating the setup independently on a fork, and identifying that the remaining failures may be related to self-hosted runner connectivity rather than the implementation itself. That kind of investigation adds a lot of confidence to the contribution and helps narrow down where the actual bottleneck exists.

This is clearly an important PR for improving hole-punch interoperability coverage, and the work put into iterating through multiple fixes and validating behavior is highly appreciated.

Since the Hole Punch Interoperability Tests (PR) / run-tests check is still failing, I am tagging @luca and @Sumanjeet here as well. Could you both please help investigate the remaining CI/CD or runner-related issues and resolve them as soon as possible so we can move this forward for merge?

Great work overall, @rlve. This is a meaningful contribution toward improving libp2p interoperability and hole-punch testing reliability across implementations.

CCing @johannamoran

@sumanjeet0012

Collaborator

Relay Connectivity Failure - RCA and Verified Fix

Failing Tests & Configurations

  1. rust-v0.56 x nim-v1.15 (tcp, noise, yamux)
  2. rust-v0.56 x nim-v1.15 (tcp, noise, mplex)
  3. nim-v1.15 x rust-v0.56 (tcp, noise, yamux)
  4. nim-v1.15 x rust-v0.56 (tcp, noise, mplex)
  5. nim-v1.15 x nim-v1.15 (tcp, noise, yamux)
  6. nim-v1.15 x nim-v1.15 (tcp, noise, mplex)

Error Status:

  • Nim nodes: Connection to relay timed out: Timeout exceeded! (after 30 seconds)
  • Rust nodes: Error: Failed to connect: Transport([... Other(Custom { kind: Other, error: Timeout })]) (after 30 seconds)

Root Cause Analysis

The failures occurred before relay reservation or hole-punching could begin.

The relay was sending TCP SYN-ACK responses correctly, but the NAT routers (dialer-router and listener-router) were dropping them. Due to Docker veth checksum offloading, conntrack marked incoming SYN-ACK packets as INVALID, preventing reverse NAT translation.

As a result:

  1. TCP handshakes never completed.
  2. Nodes could not connect to the relay.
  3. Relay reservations were never created.
  4. All hole-punching tests timed out after 30 seconds.
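
One way to check this diagnosis on a router container is to look at conntrack's per-CPU `invalid` counters, which should grow while the relay's SYN-ACKs are being dropped. The following is a minimal sketch, not part of this PR: it assumes conntrack-tools is installed on the router, and `sum_invalid` is a hypothetical helper name. It totals the `invalid=N` fields from `conntrack -S`-style output.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): sum the per-CPU "invalid" counters
# found in `conntrack -S`-style output read from stdin.
sum_invalid() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^invalid=[0-9]+$/) {
                split($i, kv, "=")      # kv[1] = "invalid", kv[2] = count
                total += kv[2]
            }
    } END { print total + 0 }'          # "+ 0" prints 0 when nothing matched
}

# Usage on a router container (assumes root and conntrack-tools):
#   conntrack -S | sum_invalid
```

Running it before and after a dial attempt and comparing the two totals would show whether packets are being classified as INVALID during the handshake.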

Resolution

  1. Modified images/linux/Dockerfile to install ethtool.
  2. Updated images/linux/run.sh to include iptables CHECKSUM fill targets for outgoing packets (iptables -t mangle -A POSTROUTING -p tcp -j CHECKSUM --checksum-fill).
  3. Re-enabled the INPUT DROP rule in images/linux/run.sh to properly simulate the NAT and drop untracked packets (to avoid TCP RSTs).
  4. Modified the Docker Compose generation in lib/run-single-test.sh to inject the net.netfilter.nf_conntrack_checksum=0 sysctl into the routers. This makes conntrack skip checksum verification on incoming packets, so it can track and translate the relay's SYN-ACK packets.
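
The router-side part of these steps can be sketched roughly as below. This is an illustration under stated assumptions, not the actual contents of images/linux/run.sh: the `run` wrapper, the `DRY_RUN` variable, and `apply_router_fixes` are hypothetical names, and the sysctl is shown as a runtime `sysctl -w` call although the fix above injects it through Docker Compose. The INPUT DROP rule is left out because its exact form is not given here.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# (DRY_RUN and run() are illustrative, not part of the PR.)
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

apply_router_fixes() {
    # Fill in TCP checksums on packets leaving the router
    # (the rule quoted in step 2 above).
    run iptables -t mangle -A POSTROUTING -p tcp -j CHECKSUM --checksum-fill

    # ASSUMPTION: runtime equivalent of the sysctl from step 4. The PR sets it
    # via Docker Compose; writing it from inside a container needs a writable
    # /proc/sys, which unprivileged containers usually lack.
    run sysctl -w net.netfilter.nf_conntrack_checksum=0
}

# Usage: apply_router_fixes              (prints the commands)
#        DRY_RUN=0 apply_router_fixes    (applies them; needs root)
```

Keeping the commands behind a dry-run wrapper makes it possible to review what would be applied before touching the router's netfilter state.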

Verification

The fixes were verified locally.
