feat: transport nim-v1.15 #77

Open
rlve wants to merge 13 commits into libp2p:master from rlve:transport-nim-libp2p

@rlve rlve commented Mar 31, 2026

Contributor

Add nim-v1.15 to transport interop tests.

@rlve rlve marked this pull request as ready for review March 31, 2026 10:19
@rlve rlve requested a review from dhuseby as a code owner March 31, 2026 10:19
@richard-ramos richard-ramos self-requested a review March 31, 2026 13:08

rlve commented Mar 31, 2026

Contributor Author

Looks like nim is failing only against python ws:

  → Results:
    → Total: 55
    ✓ Passed: 52
    ✗ Failed: 3
    - python-v0.x x nim-v1.15 (ws, noise, mplex)

    - nim-v1.15 x python-v0.x (ws, noise, yamux)
    - nim-v1.15 x python-v0.x (ws, noise, mplex)
  → Total time: 00:08:20

  ✗ 3 test(s) failed

From what I observed, the issue is that python sends observedAddr (protobuf field 4) as an empty bytes field (@[]) in its identify response. The field is present in the protobuf but contains zero bytes, which causes nim to reject the message when it tries to parse an empty MultiAddress.

cc: @richard-ramos

@rlve rlve changed the title from "add: transport nim-v1.15" to "feat: transport nim-v1.15" May 4, 2026
@seetadev seetadev self-requested a review June 14, 2026 21:57
@seetadev

Collaborator

Excellent contribution by @rlve on this PR.

Adding nim-v1.15 support to the transport interoperability test suite is an important step toward improving implementation diversity and strengthening compatibility testing across the libp2p ecosystem. It is great to see continued work ensuring the Nim implementation is represented more broadly across transport interoperability coverage.

I also want to appreciate the detailed debugging work done here. The investigation around the failing Python ↔ Nim WebSocket interoperability cases, particularly identifying the issue with Python sending an empty observedAddr field in the identify response that leads to MultiAddress parsing failures on the Nim side, is extremely valuable. This kind of protocol-level debugging helps uncover edge-case interoperability issues that improve the ecosystem as a whole, beyond just this individual PR.

It is clear a lot of effort has gone into iterating through multiple fixes, testing behavior across implementations, and narrowing down the root cause of the remaining failures. Contributions like this significantly strengthen long-term interoperability reliability.

Since Transport Interoperability Tests (PR) / run-tests is still failing, I am tagging @acul71 and @sumanjeet0012 here as well. Could you both please help investigate the remaining CI/CD and cross-implementation interoperability issues as soon as possible, so we can move this important PR forward?

Great work overall, @rlve, really appreciate the persistence and the technical depth behind this contribution. This is a meaningful improvement for libp2p interoperability testing.

CCing @johannamoran

@sumanjeet0012

Collaborator

Failing Tests Analysis

Snapshot: transport-da7832d3-162847-05-05-2026
Failed: 4 / 57 tests


1. python-v0.x (dialer) → nim-v1.15 (listener) — ws, noise, mplex

Exit code: 1

The ping actually succeeds (RTT ~1.14ms). The crash happens during teardown when Python tries to send the Mplex CLOSE frame — but Nim has already closed the WebSocket by then.

ConnectionClosedError: WebSocket connection closed by peer during write operation: code=1000
RuntimeError: Failed to send close message and Mplex isn't shutting down

Fix: Python should catch and swallow ConnectionClosed on stream teardown, or Nim should wait for the peer to close gracefully before dropping the WS connection.
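
A rough sketch of the first option, where the Python side tolerates a peer that has already closed the connection during teardown. All names here (`ConnectionClosedError`, `AlreadyClosedConn`, `send_close_best_effort`) are hypothetical stand-ins, not the API of py-libp2p or of any WebSocket library, and asyncio is used only to keep the example self-contained.

```python
import asyncio


class ConnectionClosedError(Exception):
    """Stand-in for the transport's 'peer already closed' error (hypothetical)."""


class AlreadyClosedConn:
    """Fake connection whose peer has already closed the WebSocket."""

    async def write(self, data: bytes) -> None:
        raise ConnectionClosedError("closed by peer during write: code=1000")


async def send_close_best_effort(conn, close_frame: bytes) -> bool:
    """Try to send the muxer close frame; report whether it was delivered.

    A peer that is already gone is not treated as an error during
    teardown, because there is nothing left to close on its side.
    """
    try:
        await conn.write(close_frame)
        return True
    except ConnectionClosedError:
        return False


# Example: the write fails, but teardown continues without raising.
delivered = asyncio.run(send_close_best_effort(AlreadyClosedConn(), b"\x04\x00"))
```

The design choice is to swallow only the "already closed" error and only on the teardown path, so that write failures in the middle of a live stream still surface.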


2 & 3. nim-v1.15 (dialer) → python-v0.x (listener) — ws, noise, yamux & mplex

Exit code: 255

Nim fails immediately after connecting with:

Failed to finish outgoing upgrade in internalConnect: Incorrect message received!

Key observation: Nim dials Python over TCP (both yamux and mplex) without any issues. This rules out Noise and muxer bugs entirely. The problem is specific to WebSocket.

Python's listener logs "Listener ready" before Nim connects, and the HTTP upgrade succeeds, so the failure occurs later, in the binary framing of the libp2p protocol negotiation over WebSocket. Python's WS library is likely sending frames in a format that Nim's WS client can't parse.

Fix: Capture the raw WS frames with tcpdump on the Docker network and compare what Python sends vs what Nim expects. Bug is likely in python-libp2p/transport/websocket/.
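
When comparing captures, a small decoder for the WebSocket base framing (RFC 6455) may make the two sides easier to diff. This is a sketch only: it assumes the capture has already been reassembled into one byte stream per direction and that the HTTP upgrade bytes have been stripped. `WsFrame` and `parse_ws_frame` are hypothetical names, not part of either implementation.

```python
import struct
from typing import NamedTuple


class WsFrame(NamedTuple):
    fin: bool        # final fragment of a message
    opcode: int      # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked: bool     # client-to-server frames must be masked
    payload: bytes   # unmasked payload
    size: int        # total bytes this frame occupied in the buffer


def parse_ws_frame(buf: bytes) -> WsFrame:
    """Decode one WebSocket frame from the start of buf."""
    if len(buf) < 2:
        raise ValueError("need at least 2 header bytes")
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    pos = 2
    if length == 126:    # 16-bit extended payload length
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:  # 64-bit extended payload length
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    mask = b""
    if masked:
        mask = buf[pos:pos + 4]
        pos += 4
    payload = buf[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    if masked:
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return WsFrame(fin, opcode, masked, payload, pos + length)
```

For example, an unmasked binary frame carrying the multistream-select header would be `b"\x82\x14" + b"\x13/multistream/1.0.0\n"`. Decoding each side's first few frames this way should show whether the opcode, fragmentation, or message boundaries differ from what the other side expects.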


4. nim-v1.15 (dialer) → lua-v0.1.0 (listener) — tcp, noise, yamux

Error: Timeout at 180s

No application logs from either container — but that's normal for Lua (the passing test lua → nim also emits no startup logs). The real signal is the asymmetry: Lua as dialer completes in 2s, Lua as listener hangs for 180s.

Nim's dialer never even gets a connection, which puts the failure at the socket bind/listen stage before any libp2p logic runs. Most likely causes:

  1. Lua listener binding to 127.0.0.1 instead of 0.0.0.0 — containers need to bind to all interfaces to receive traffic from other containers on the Docker network.
  2. Lua's coroutine scheduler blocking on startup (e.g., waiting on a Redis key it never gets) before calling listen().

Fix: Check the Lua listener code for how it parses and binds to the multiaddr provided via env vars.
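
The first cause can be checked mechanically. The sketch below is written in Python rather than Lua, purely as an illustration. It handles only simple `/ip4/<addr>/tcp/<port>` or `/ip6/<addr>/tcp/<port>` strings, and the function names are hypothetical; how the Lua listener actually receives its multiaddr is not shown in the logs.

```python
import ipaddress


def bind_host_and_port(multiaddr: str) -> tuple[str, int]:
    """Extract (host, port) from a simple /ip4|ip6/<addr>/tcp/<port> multiaddr."""
    parts = multiaddr.strip("/").split("/")
    if len(parts) < 4 or parts[0] not in ("ip4", "ip6") or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {multiaddr!r}")
    return parts[1], int(parts[3])


def reachable_from_other_containers(multiaddr: str) -> bool:
    """False when the listener would bind to a loopback-only address."""
    host, _port = bind_host_and_port(multiaddr)
    return not ipaddress.ip_address(host).is_loopback
```

Here `reachable_from_other_containers("/ip4/127.0.0.1/tcp/4001")` is `False`, while the same check on `/ip4/0.0.0.0/tcp/4001` is `True`.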


3 participants