Conversation
|
Added error value |
frzifus
left a comment
There was a problem hiding this comment.
Does it make sense to limit the amount of retries and to introduce a break between the retries?
|
@frzifus Yes, that is what I was doing as a sub-optimal measure. But I think it would be preferable to actually follow the specifications. The problem arises when the traffic is high. Any thoughts? |
|
Hi @decadenza! Please excuse the late reply here, I hope you're still around! I really like your suggestion here, but I have a question left: |
|
Hi. As an alternative, the user has to fire a re-attempt, but it looks more logical to use the timeout parameter that is already part of the library. What do you think? |
|
@decadenza sorry again for the late reply, if you're still around, please resolve the conflicts and then I’ll review this PR 👍 |
|
It's been long time since I last used this. The conflict was only caused by the addition of |
alexjoedt
left a comment
There was a problem hiding this comment.
The right fix for this class of problem is to move the "discard wrong-ID frames" logic into the transport layer. For RTU, readIncrementally could keep reading until it sees the expected slave ID or times out. For TCP, readResponse already has protocol-recovery logic that could be extended. That way, client.send stays clean and the fix applies correctly regardless of which transport is in use.
|
|
||
| // Wait for correct response ID before throwing error. | ||
| var aduResponse []byte | ||
| maxTime := time.Now().Add(tcpTimeout) |
There was a problem hiding this comment.
tcpTimeout is a constant defined in tcpclient.go and is meaningless here.
client.go's send method is shared by all transports (TCP, RTU, ASCII). For an RTU serial client the timeout is serialTimeout. There is no single correct timeout constant to use here; this approach needs to be rethought.
There was a problem hiding this comment.
I see your point. Unfortunately it is unlikely I'll have the time for this in the near future.
| if time.Now().After(maxTime) { | ||
| err = fmt.Errorf("modbus: response timeout") | ||
| return | ||
| } |
There was a problem hiding this comment.
The loop never checks ctx.Done(), so it will ignore context cancellation/deadline and continue running past the context timeout - i guess it was'nt there when you issued the PR. The custom fmt.Errorf("modbus: response timeout") also hides the actual error from Verify, which callers could find useful. The ctx deadline should be the upper bound.
| var aduResponse []byte | ||
| maxTime := time.Now().Add(tcpTimeout) | ||
| for { | ||
| aduResponse, err = mb.transporter.Send(ctx, aduRequest) |
There was a problem hiding this comment.
Calling Send inside a loop re-transmits the request on every iteration. Both the TCP and RTU Send implementations write to the wire/serial port before reading the response, so every loop iteration sends a duplicate request. This is especially dangerous for write operations (writes would be executed multiple times).
The Modbus Serial Line spec defines the correct master behaviour explicitly:
In case of a reply received from an unexpected slave, the Response time-out is kept running.
So the master must keep waiting, not re-send the request. The retry logic belongs inside the transport's read path. For RTU, readIncrementally already has a state machine that could keep reading until it matches the correct slave ID or the deadline expires. For TCP, readResponse already has protocol-recovery logic that could be extended. That way, client.send stays clean and the fix applies correctly regardless of which transport is in use.
Due to noise/interference on the serial line (separate problem), I was having errors like:
And so on.
My system has only one slave with id 1.

I came across this issue on a different repository.
As I understand it, Modbus protocol specifies that master should manage the "unexpected slave" and wait for the correct one for the specified timeout setting.
Checking the code I realised that the client was not waiting for the correct timeout before throwing the error.
With this simple change my error rate has reduced, and when it happens the correct "timeout" error is given.