RadioLibInterface: harden RX path (4 related fixes)#10252
RadioLibInterface: harden RX path (4 related fixes)#10252nightjoker7 wants to merge 4 commits intomeshtastic:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the RadioLib-backed RX interrupt path and a RadioLib RNG helper to improve correctness, accounting, and log signal-to-noise in RadioLibInterface.
Changes:
- Fix off-by-one byte generation in
randomBytes()by adjusting therandom()upper bound. - Reduce CRC-mismatch log volume by downgrading that specific error to
LOG_DEBUGwith a shorter message. - Improve RX reject-path accounting (
from==0) and add a null-check forpacketPool.allocZeroed()to avoid crashes under pool exhaustion.
| rxGood++; | ||
|
|
||
| // Note: we deliver _all_ packets to our router (i.e. our interface is intentionally promiscuous). | ||
| // This allows the router and other apps on our node to sniff packets (usually routing) between other | ||
| // nodes. | ||
| meshtastic_MeshPacket *mp = packetPool.allocZeroed(); |
There was a problem hiding this comment.
rxGood++ is incremented before packetPool.allocZeroed(). With the new nullptr-drop path, a packet that is ultimately dropped due to pool exhaustion will still be counted as a good RX. Consider moving rxGood++ to after a successful allocation (and ideally after enqueue/deliver), and treating allocation failure as rxBad instead.
| // altered packet with "from == 0" can do Remote Node Administration without permission | ||
| if (radioBuffer.header.from == 0) { | ||
| LOG_WARN("Ignore received packet without sender"); | ||
| rxBad++; |
There was a problem hiding this comment.
This is still a correctly received packet over the air, which is where the statistic is for. It's just that we don't allow this.
| iface->getSNR(), lround(iface->getRSSI()), radioBuffer.header.next_hop, radioBuffer.header.relay_node); | ||
| // CRC errors are normal noise in RF-congested areas — drop to DEBUG so we don't flood logs. | ||
| if (state == RADIOLIB_ERR_CRC_MISMATCH) { | ||
| LOG_DEBUG("RX CRC mismatch (noise?) fr=0x%08x rxSNR=%g rxRSSI=%i", |
There was a problem hiding this comment.
I agree it should not really be considered an error, but it's valuable information to have. I would rather leave it as LOG_INFO, and also please move all the original parameters back into the log statement.
| // Packet pool exhaustion: drop this packet on the floor rather than deref NULL. | ||
| // Happens under sustained heap pressure (e.g., config dump mid-RX, dense mesh bursts). | ||
| if (!mp) { | ||
| rxBad++; |
There was a problem hiding this comment.
This is also not a "bad" Rx. And in this case you already counted it as "good" too.
Review feedback from @GUVWAF and @Copilot on PR meshtastic#10252: 1. CRC mismatch (GUVWAF): keep as LOG_INFO (not DEBUG), restore the full parameter set (id, from, to, flags, SNR, RSSI, nextHop, relay) — it is valuable info for operators even if it is not ERROR severity. 2. 'from == 0' path (GUVWAF): this is a correctly-received OTA packet that we refuse to process by policy, not a bad RX. Remove the rxBad++ / airTime logging I had added; rxGood++ already counts the OTA reception above. 3. payloadLen < 0 path (GUVWAF): same reasoning — not a bad RX class. Remove the rxBad++ / airTime additions; retain only the LOG_WARN. 4. Packet pool exhaustion (Copilot): the allocator (Allocator:: allocZeroed) already emits a WARN on failure. Drop the duplicate LOG_ERROR I had added to avoid log flooding under sustained pressure. Behaviour now preserved: - all 4 original bugs still fixed (CRC log volume, randomBytes off-by-one, from==0 no-deref, pool-exhaustion null-check) - counter semantics restored to match existing philosophy - log signal-to-noise improved without dropping useful parameters
|
Thanks for the review @GUVWAF — all three points addressed in 93cf96fab: CRC mismatch (line 488) — kept at
Pool-exhaustion path — I also resolved a conflict with the pre-applied Copilot suggestion ( Net effect: all 4 original bugs remain fixed (CRC log volume, |
| // check for short packets | ||
| if (payloadLen < 0) { | ||
| LOG_WARN("Ignore received packet too short"); | ||
| rxBad++; |
There was a problem hiding this comment.
This is now neither counted as good, nor bad.
| memset(&decodedtmp, 0, sizeof(decodedtmp)); | ||
| if (!pb_decode_from_bytes(bytes, rawSize, &meshtastic_Data_msg, &decodedtmp)) { | ||
| LOG_ERROR("Invalid protobufs in received mesh packet id=0x%08x (bad psk?)!", p->id); | ||
| LOG_DEBUG("Invalid protobufs in received mesh packet id=0x%08x (bad psk?)", p->id); |
There was a problem hiding this comment.
I don't consider this "debug" information. "Error" is fine here, but at least "Warn".
Bundles four small, related fixes in `handleReceiveInterrupt()` / `randomBytes()`: 1. **`random(0, 255)` off-by-one.** RadioLib's `random(min, max)` is a half-open range (returns [min, max)), so `random(0, 255)` produces 0-254 and can never emit the byte value 255. Use `random(0, 256)` to cover the full byte range. 2. **CRC-mismatch log noise.** Every corrupted-preamble reception at the demod threshold logs LOG_ERROR with full 9-field context. In a busy RF environment (urban mesh, cross-channel bleed, weather) this floods the log — hundreds of lines per hour that just say "CRC didn't match." Split to LOG_DEBUG for CRC_MISMATCH specifically, keep LOG_ERROR for all other RadioLib error states. 3. **`from == 0` mis-accounted as rxGood.** An altered packet with sender field zero'd out bypasses RemoteHardware permission checks — the existing check correctly refuses to process the packet, but before returning it had already incremented `rxGood` and was not logging airtime. Count it as rxBad and log airtime, matching the other reject paths. 4. **`packetPool.allocZeroed()` NULL return dereferenced.** Under sustained heap pressure (config dump mid-RX, dense mesh burst, MQTT downlink flood) `allocZeroed()` can return `nullptr`; the next line assigns `mp->from` and hard-faults with LoadProhibited. Guard the return, log, and drop the packet — RX handler already has precedent for dropping on resource-exhaustion paths. All four are in the same ~40 lines of `handleReceiveInterrupt()` / `randomBytes()`. Bundled because splitting would create four near-trivial PRs touching the same function; happy to split on request. Tested on Station G2 fleet for several weeks — no regressions; measurable reduction in log volume from item 2; no observed LoadProhibited from item 4 since applying.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Review feedback from @GUVWAF and @Copilot on PR meshtastic#10252: 1. CRC mismatch (GUVWAF): keep as LOG_INFO (not DEBUG), restore the full parameter set (id, from, to, flags, SNR, RSSI, nextHop, relay) — it is valuable info for operators even if it is not ERROR severity. 2. 'from == 0' path (GUVWAF): this is a correctly-received OTA packet that we refuse to process by policy, not a bad RX. Remove the rxBad++ / airTime logging I had added; rxGood++ already counts the OTA reception above. 3. payloadLen < 0 path (GUVWAF): same reasoning — not a bad RX class. Remove the rxBad++ / airTime additions; retain only the LOG_WARN. 4. Packet pool exhaustion (Copilot): the allocator (Allocator:: allocZeroed) already emits a WARN on failure. Drop the duplicate LOG_ERROR I had added to avoid log flooding under sustained pressure. Behaviour now preserved: - all 4 original bugs still fixed (CRC log volume, randomBytes off-by-one, from==0 no-deref, pool-exhaustion null-check) - counter semantics restored to match existing philosophy - log signal-to-noise improved without dropping useful parameters
…rotobuf to WARN
GUVWAF flagged two remaining log/counter items on the hardened RX path:
- RadioLibInterface.cpp: the short-packet branch was neither rxGood nor
rxBad after the previous revision. Move rxGood++ above the
payloadLen<0 check so every CRC-passed OTA reception counts regardless
of subsequent header-validation outcome.
- Router.cpp: 'Invalid protobufs in received mesh packet (bad psk?)' is
more than DEBUG-worthy — either our channel key is wrong or someone's
feeding us corrupted data. Promote to LOG_WARN per GUVWAF's request.
("Invalid portnum" at the same site stays at LOG_DEBUG — that one only
means the decoded protobuf had the default portnum field, which is a
weaker signal.)
ab1535f to
8f17900
Compare
|
@GUVWAF rebased on the latest develop tip — was flagged Your 5 inline comments from 2026-04-23/24 are addressed in the existing commits, which survived the rebase as
Verified locally: |
Summary
Four small, related hardening fixes in
src/mesh/RadioLibInterface.cpp, all inhandleReceiveInterrupt()/randomBytes(). Bundled because splitting would be four one-to-two-liners on the same function; happy to split on request.1.
random(0, 255)off-by-one inrandomBytes()RadioLib's
random(min, max)is half-open — it returns values in[min, max), not[min, max].random(0, 255)therefore covers 0-254 and can never produce byte value 255. Changed torandom(0, 256)for the full 0-255 range.Impact: minor statistical bias in any randomness seeded via this path (e.g., PKI key material on platforms that fall through to this slow path). Not security-critical in practice but is wrong.
2. CRC-mismatch log flood
In noisy RF environments every preamble-triggered corrupted packet fires
LOG_ERRORwith 9 format fields of context. In my urban deployment I was seeing 500+ CRC_MISMATCH lines per hour — buries real errors.Split the error-path so only
RADIOLIB_ERR_CRC_MISMATCHdrops toLOG_DEBUGwith a shorter format string. All other error states still get the fullLOG_ERRORcontext.3.
from == 0mis-classified as rxGoodThe existing check correctly refuses to process packets where sender is zeroed out (a known Remote-Node-Admin bypass vector per the security audit):
Two bugs:
rxGoodbefore the early return — so a rejected packet inflates the good-RX counter used for reliability metricsairTime->logAirtime(RX_ALL_LOG, rxMsec)— airtime accounting misses this packet's time on-airFixed by moving
rxGood++past the check, and addingrxBad++ + airTime->logAirtime()on the reject path, matching the other reject branches in this function.4.
packetPool.allocZeroed()NULL dereference under heap pressureallocZeroed()can returnnullptrwhen the packet pool is exhausted — e.g., during a phone-connect config-dump that's eating heap while RX is still running, or during an MQTT-downlink burst.Added a null-check that drops the packet and logs airtime. This matches the pattern used elsewhere in the function for resource-exhaustion rejects (
payloadLen < 0etc.).Risk
Testing
Deployed on two-node Station G2 fleet (custom build
2.7.23.0fcf6c3) for several weeks. No regressions. Measurable reduction in log volume; no observed LoadProhibited from the allocZeroed path since applying.Related