Skip to content

feat: introduce serial link recovery#123

Open
sansmoraxz wants to merge 12 commits intogrid-x:masterfrom
sansmoraxz:feat/serial-recover
Open

feat: introduce serial link recovery#123
sansmoraxz wants to merge 12 commits intogrid-x:masterfrom
sansmoraxz:feat/serial-recover

Conversation

@sansmoraxz
Copy link
Copy Markdown
Contributor

@sansmoraxz sansmoraxz commented Dec 4, 2025

Attempt protection against shoddy wires in RTU connection.

Fixes: #122

@sansmoraxz

This comment was marked as resolved.

@sansmoraxz

This comment was marked as resolved.

@sansmoraxz sansmoraxz marked this pull request as ready for review December 7, 2025 05:37
Copy link
Copy Markdown
Contributor

@alexjoedt alexjoedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sorry for the late review! Really like the direction here.

I left a few comments, mostly around defaults and some small robustness things. Hope they're helpful, let me know what you think!

// Silent period after successful connection
ConnectDelay time.Duration
// Recovery timeout if the connection is lost
LinkRecoveryTimeout time.Duration
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that neither NewRTUClientHandler nor NewASCIIClientHandler set a default for LinkRecoveryTimeout. With a zero value, linkRecoveryDeadline is immediately in the past, so the recovery loop will never actually retry, it will always hit "link recovery timeout reached" on the first error.

The TCP transporter handles this by checking if mb.LinkRecoveryTimeout == 0 before giving up. Should we either set a sensible default here or add the same zero-check in the serial Send methods?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to be same for consistency IMO

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But really if went with default what would be sensible here? It would depend on parameters like bauderate, voltage stablity etc. Also if there's multiple requests in a loop line (with cetain devices turned off) would delay the entire request response cycle.

connDeadline := time.Now().Add(mb.Timeout)
linkRecoveryDeadline := time.Now().Add(mb.LinkRecoveryTimeout)

for {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud: if close() + connect() keeps succeeding but the remote device stays unresponsive, this could spin pretty fast until linkRecoveryDeadline. Would it be worth adding a small backoff delay between reconnect attempts, or capping the number of retries?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Speaking of which wouldn't this apply to TCP too?

Perhaps it can be tackled as a seperate PR

@@ -0,0 +1,117 @@
//go:build darwin || linux || freebsd || openbsd || netbsd
// +build darwin linux freebsd openbsd netbsd
Copy link
Copy Markdown
Contributor

@alexjoedt alexjoedt Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build tags include darwin but openPTY() uses syscall.TIOCSPTLCK and syscall.TIOCGPTN, which are Linux-only. This won't compile on macOS. Should we either restrict the tag to linux or use something like github.com/creack/pty for cross-platform PTY support?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I am dropping mac support. I don't have any mac device to test this with anyway. If required perhaps someone else can take a dig at this later on.

@sansmoraxz
Copy link
Copy Markdown
Contributor Author

sansmoraxz commented Mar 27, 2026

Also this has the unexpected surprise of writing to currently open terminals.

modbus/Makefile

Lines 12 to 16 in 79617db

test:
diagslave -m tcp -p 5020 & diagslave -m enc -p 5021 & go test -run TCP -v $(shell glide nv)
socat -d -d pty,raw,echo=0 pty,raw,echo=0 & diagslave -m rtu /dev/pts/1 & go test -run RTU -v $(shell glide nv)
socat -d -d pty,raw,echo=0 pty,raw,echo=0 & diagslave -m ascii /dev/pts/3 & go test -run ASCII -v $(shell glide nv)
go test -v -count=1 github.com/grid-x/modbus/cmd/modbus-cli

IMO instead of pts terminals perhaps better to have some tmp file that's auto trapped and deleted once tests are run.

@sansmoraxz sansmoraxz requested a review from alexjoedt April 2, 2026 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature: Auto recovery for serial connections

2 participants