
Wait for correct response id #77

Open
decadenza wants to merge 4 commits into grid-x:master from decadenza:master

Conversation

@decadenza

Due to noise/interference on the serial line (separate problem), I was having errors like:

error="modbus: response slave id '69' does not match request '1'"
error="modbus: response slave id '48' does not match request '1'"

And so on.

My system has only one slave with id 1.
I came across this issue on a different repository.
[image: automaton from the Modbus specification showing the master's handling of response reception]

As I understand it, the Modbus protocol specifies that the master should handle the "unexpected slave" case and keep waiting for the correct response until the configured timeout expires.

Checking the code, I realised that the client was not waiting for the full timeout before raising the error.

With this simple change my error rate has dropped, and when the error does occur, the correct "timeout" error is reported.

@decadenza
Author

Added error value err = fmt.Errorf("modbus: response timeout")

Contributor

@frzifus frzifus left a comment

Does it make sense to limit the amount of retries and to introduce a break between the retries?

@decadenza
Author

@frzifus Yes, that is what I was doing as a sub-optimal workaround. But I think it would be preferable to actually follow the specification. The problem arises when traffic is high. Any thoughts?

@hnicolaysen
Contributor

Hi @decadenza!

Please excuse the late reply here, I hope you're still around!
First of all, thank you for your contribution!

I really like your suggestion here, but I have a question left:
The automaton above shows a retry for the response reception. In your requested change, you also propose to re-send the request if the first response was not matching. Is this intended?

@decadenza
Author

Hi.
The re-send is intentional, but it only runs until maxTime, which is calculated from the tcpTimeout parameter.

As an alternative, the user would have to fire a re-attempt manually, but it seems more logical to use the timeout parameter that is already part of the library. What do you think?

@alexjoedt
Contributor

@decadenza sorry again for the late reply, if you're still around, please resolve the conflicts and then I’ll review this PR 👍

@decadenza
Author

It's been a long time since I last used this. The conflict was caused only by the addition of ctx to the Send method, so I fixed it.
Tests are passing. Give it a try. Thanks.

Contributor

@alexjoedt alexjoedt left a comment

The right fix for this class of problem is to move the "discard wrong-ID frames" logic into the transport layer. For RTU, readIncrementally could keep reading until it sees the expected slave ID or times out. For TCP, readResponse already has protocol-recovery logic that could be extended. That way, client.send stays clean and the fix applies correctly regardless of which transport is in use.


// Wait for correct response ID before throwing error.
var aduResponse []byte
maxTime := time.Now().Add(tcpTimeout)
Contributor

tcpTimeout is a constant defined in tcpclient.go and is meaningless here.
client.go's send method is shared by all transports (TCP, RTU, ASCII). For an RTU serial client the timeout is serialTimeout. There is no single correct timeout constant to use here; this approach needs to be rethought.

Author

I see your point. Unfortunately it is unlikely I'll have the time for this in the near future.

Comment on lines +649 to +652
if time.Now().After(maxTime) {
err = fmt.Errorf("modbus: response timeout")
return
}
Contributor

The loop never checks ctx.Done(), so it ignores context cancellation/deadline and keeps running past the context timeout - I guess ctx wasn't there yet when you opened the PR. The custom fmt.Errorf("modbus: response timeout") also hides the actual error from Verify, which callers could find useful. The ctx deadline should be the upper bound.

var aduResponse []byte
maxTime := time.Now().Add(tcpTimeout)
for {
aduResponse, err = mb.transporter.Send(ctx, aduRequest)
Contributor

Calling Send inside a loop re-transmits the request on every iteration. Both the TCP and RTU Send implementations write to the wire/serial port before reading the response, so every loop iteration sends a duplicate request. This is especially dangerous for write operations (writes would be executed multiple times).

The Modbus Serial Line spec defines the correct master behaviour explicitly:

In case of a reply received from an unexpected slave, the Response time-out is kept running.

So the master must keep waiting, not re-send the request. The retry logic belongs inside the transport's read path. For RTU, readIncrementally already has a state machine that could keep reading until it matches the correct slave ID or the deadline expires. For TCP, readResponse already has protocol-recovery logic that could be extended. That way, client.send stays clean and the fix applies correctly regardless of which transport is in use.
