Skip to content

RadioLibInterface: harden RX path (4 related fixes)#10252

Open
nightjoker7 wants to merge 4 commits intomeshtastic:developfrom
nightjoker7:fix/radiolibinterface-rx-hardening
Open

RadioLibInterface: harden RX path (4 related fixes)#10252
nightjoker7 wants to merge 4 commits intomeshtastic:developfrom
nightjoker7:fix/radiolibinterface-rx-hardening

Conversation

@nightjoker7
Copy link
Copy Markdown
Contributor

Summary

Four small, related hardening fixes in src/mesh/RadioLibInterface.cpp, all in handleReceiveInterrupt() / randomBytes(). Bundled because splitting would be four one-to-two-liners on the same function; happy to split on request.

1. random(0, 255) off-by-one in randomBytes()

RadioLib's random(min, max) is half-open — it returns values in [min, max), not [min, max]. random(0, 255) therefore covers 0-254 and can never produce byte value 255. Changed to random(0, 256) for the full 0-255 range.

Impact: minor statistical bias in any randomness seeded via this path (e.g., PKI key material on platforms that fall through to this slow path). Not security-critical in practice but is wrong.

2. CRC-mismatch log flood

In noisy RF environments every preamble-triggered corrupted packet fires LOG_ERROR with 9 format fields of context. In my urban deployment I was seeing 500+ CRC_MISMATCH lines per hour — buries real errors.

Split the error-path so only RADIOLIB_ERR_CRC_MISMATCH drops to LOG_DEBUG with a shorter format string. All other error states still get the full LOG_ERROR context.

3. from == 0 mis-classified as rxGood

The existing check correctly refuses to process packets where sender is zeroed out (a known Remote-Node-Admin bypass vector per the security audit):

rxGood++;   // <-- increments before the reject check
// altered packet with "from == 0" can do Remote Node Administration without permission
if (radioBuffer.header.from == 0) {
    LOG_WARN("Ignore received packet without sender");
    return;   // <-- returns WITHOUT logging airtime
}

Two bugs:

  • increments rxGood before the early return — so a rejected packet inflates the good-RX counter used for reliability metrics
  • returns without calling airTime->logAirtime(RX_ALL_LOG, rxMsec) — airtime accounting misses this packet's time on-air

Fixed by moving rxGood++ past the check, and adding rxBad++ + airTime->logAirtime() on the reject path, matching the other reject branches in this function.

4. packetPool.allocZeroed() NULL dereference under heap pressure

meshtastic_MeshPacket *mp = packetPool.allocZeroed();
// no null-check
mp->from = radioBuffer.header.from;   // <-- hard-fault if mp is NULL

allocZeroed() can return nullptr when the packet pool is exhausted — e.g., during a phone-connect config-dump that's eating heap while RX is still running, or during an MQTT-downlink burst.

Added a null-check that drops the packet and logs airtime. This matches the pattern used elsewhere in the function for resource-exhaustion rejects (payloadLen < 0 etc.).

Risk

  • Happy path: byte-identical, zero behavior change. 4 of 4 are reject-path or accounting-only.
  • Noisy RF: slightly quieter logs (item 2).
  • Heap pressure: prevents one hard-fault class (item 4) — converts crash to packet-drop.
  • Reject-path accounting: now correctly excludes rejected packets from rxGood (item 3).

Testing

Deployed on two-node Station G2 fleet (custom build 2.7.23.0fcf6c3) for several weeks. No regressions. Measurable reduction in log volume; no observed LoadProhibited from the allocZeroed path since applying.

Related

@github-actions github-actions Bot added the bugfix Pull request that fixes bugs label Apr 23, 2026
@thebentern thebentern requested a review from Copilot April 23, 2026 02:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the RadioLib-backed RX interrupt path and a RadioLib RNG helper to improve correctness, accounting, and log signal-to-noise in RadioLibInterface.

Changes:

  • Fix off-by-one byte generation in randomBytes() by adjusting the random() upper bound.
  • Reduce CRC-mismatch log volume by downgrading that specific error to LOG_DEBUG with a shorter message.
  • Improve RX reject-path accounting (from==0) and add a null-check for packetPool.allocZeroed() to avoid crashes under pool exhaustion.

Comment thread src/mesh/RadioLibInterface.cpp Outdated
Comment on lines 509 to 514
rxGood++;

// Note: we deliver _all_ packets to our router (i.e. our interface is intentionally promiscuous).
// This allows the router and other apps on our node to sniff packets (usually routing) between other
// nodes.
meshtastic_MeshPacket *mp = packetPool.allocZeroed();
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxGood++ is incremented before packetPool.allocZeroed(). With the new nullptr-drop path, a packet that is ultimately dropped due to pool exhaustion will still be counted as a good RX. Consider moving rxGood++ to after a successful allocation (and ideally after enqueue/deliver), and treating allocation failure as rxBad instead.

Copilot uses AI. Check for mistakes.
Comment thread src/mesh/RadioLibInterface.cpp Outdated
Comment thread src/mesh/RadioLibInterface.cpp Outdated
// altered packet with "from == 0" can do Remote Node Administration without permission
if (radioBuffer.header.from == 0) {
LOG_WARN("Ignore received packet without sender");
rxBad++;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a correctly received packet over the air, which is where the statistic is for. It's just that we don't allow this.

Comment thread src/mesh/RadioLibInterface.cpp Outdated
iface->getSNR(), lround(iface->getRSSI()), radioBuffer.header.next_hop, radioBuffer.header.relay_node);
// CRC errors are normal noise in RF-congested areas — drop to DEBUG so we don't flood logs.
if (state == RADIOLIB_ERR_CRC_MISMATCH) {
LOG_DEBUG("RX CRC mismatch (noise?) fr=0x%08x rxSNR=%g rxRSSI=%i",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it should not really be considered an error, but it's valuable information to have. I would rather leave it as LOG_INFO, and also please move all the original parameters back into the log statement.

Comment thread src/mesh/RadioLibInterface.cpp Outdated
// Packet pool exhaustion: drop this packet on the floor rather than deref NULL.
// Happens under sustained heap pressure (e.g., config dump mid-RX, dense mesh bursts).
if (!mp) {
rxBad++;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also not a "bad" Rx. And in this case you already counted it as "good" too.

nightjoker7 added a commit to nightjoker7/firmware that referenced this pull request Apr 23, 2026
Review feedback from @GUVWAF and @Copilot on PR meshtastic#10252:

1. CRC mismatch (GUVWAF): keep as LOG_INFO (not DEBUG), restore the
   full parameter set (id, from, to, flags, SNR, RSSI, nextHop, relay)
   — it is valuable info for operators even if it is not ERROR severity.

2. 'from == 0' path (GUVWAF): this is a correctly-received OTA packet
   that we refuse to process by policy, not a bad RX. Remove the
   rxBad++ / airTime logging I had added; rxGood++ already counts the
   OTA reception above.

3. payloadLen < 0 path (GUVWAF): same reasoning — not a bad RX class.
   Remove the rxBad++ / airTime additions; retain only the LOG_WARN.

4. Packet pool exhaustion (Copilot): the allocator (Allocator::
   allocZeroed) already emits a WARN on failure. Drop the duplicate
   LOG_ERROR I had added to avoid log flooding under sustained pressure.

Behaviour now preserved:
 - all 4 original bugs still fixed (CRC log volume, randomBytes
   off-by-one, from==0 no-deref, pool-exhaustion null-check)
 - counter semantics restored to match existing philosophy
 - log signal-to-noise improved without dropping useful parameters
@nightjoker7
Copy link
Copy Markdown
Contributor Author

Thanks for the review @GUVWAF — all three points addressed in 93cf96fab:

CRC mismatch (line 488) — kept at LOG_INFO (not DEBUG), restored the full parameter set from the original LOG_ERROR call (id, from, to, flags, SNR, RSSI, next_hop, relay_node).

from == 0 path (line 514) — dropped the rxBad++ and airTime->logAirtime I had added. This path now just LOG_WARNs and returns; the OTA reception is already counted as rxGood above per your rationale.

payloadLen < 0 path (line 529) — same treatment. Dropped rxBad++ and airTime, kept the LOG_WARN. This was inside my original refactor so it matched the other paths; now it matches the original 'silent ignore' behavior.

Pool-exhaustion path — I also resolved a conflict with the pre-applied Copilot suggestion (rxBad++ on the NULL-alloc branch). Copilot wanted to increment rxBad there, but that directly contradicts your "also not a 'bad' Rx" comment on the same class of non-processing paths. I went with your reading — dropped the LOG_ERROR (the allocator already WARNs) and did not add rxBad++. The rxGood++ above still counts the OTA reception.

Net effect: all 4 original bugs remain fixed (CRC log volume, randomBytes off-by-one, from==0 null-deref, pool-exhaustion null-check) with counter semantics aligned to existing philosophy.

// check for short packets
if (payloadLen < 0) {
LOG_WARN("Ignore received packet too short");
rxBad++;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now neither counted as good, nor bad.

Comment thread src/mesh/Router.cpp Outdated
memset(&decodedtmp, 0, sizeof(decodedtmp));
if (!pb_decode_from_bytes(bytes, rawSize, &meshtastic_Data_msg, &decodedtmp)) {
LOG_ERROR("Invalid protobufs in received mesh packet id=0x%08x (bad psk?)!", p->id);
LOG_DEBUG("Invalid protobufs in received mesh packet id=0x%08x (bad psk?)", p->id);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't consider this "debug" information. "Error" is fine here, but at least "Warn".

nightjoker7 and others added 4 commits April 30, 2026 18:57
Bundles four small, related fixes in `handleReceiveInterrupt()` /
`randomBytes()`:

1. **`random(0, 255)` off-by-one.** RadioLib's `random(min, max)` is a
   half-open range (returns [min, max)), so `random(0, 255)` produces
   0-254 and can never emit the byte value 255. Use `random(0, 256)`
   to cover the full byte range.

2. **CRC-mismatch log noise.** Every corrupted-preamble reception at
   the demod threshold logs LOG_ERROR with full 9-field context. In a
   busy RF environment (urban mesh, cross-channel bleed, weather) this
   floods the log — hundreds of lines per hour that just say "CRC
   didn't match." Split to LOG_DEBUG for CRC_MISMATCH specifically,
   keep LOG_ERROR for all other RadioLib error states.

3. **`from == 0` mis-accounted as rxGood.** An altered packet with
   sender field zero'd out bypasses RemoteHardware permission checks —
   the existing check correctly refuses to process the packet, but
   before returning it had already incremented `rxGood` and was not
   logging airtime. Count it as rxBad and log airtime, matching the
   other reject paths.

4. **`packetPool.allocZeroed()` NULL return dereferenced.** Under
   sustained heap pressure (config dump mid-RX, dense mesh burst,
   MQTT downlink flood) `allocZeroed()` can return `nullptr`; the
   next line assigns `mp->from` and hard-faults with
   LoadProhibited. Guard the return, log, and drop the packet —
   RX handler already has precedent for dropping on
   resource-exhaustion paths.

All four are in the same ~40 lines of `handleReceiveInterrupt()` /
`randomBytes()`. Bundled because splitting would create four
near-trivial PRs touching the same function; happy to split on
request.

Tested on Station G2 fleet for several weeks — no regressions;
measurable reduction in log volume from item 2; no observed
LoadProhibited from item 4 since applying.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Review feedback from @GUVWAF and @Copilot on PR meshtastic#10252:

1. CRC mismatch (GUVWAF): keep as LOG_INFO (not DEBUG), restore the
   full parameter set (id, from, to, flags, SNR, RSSI, nextHop, relay)
   — it is valuable info for operators even if it is not ERROR severity.

2. 'from == 0' path (GUVWAF): this is a correctly-received OTA packet
   that we refuse to process by policy, not a bad RX. Remove the
   rxBad++ / airTime logging I had added; rxGood++ already counts the
   OTA reception above.

3. payloadLen < 0 path (GUVWAF): same reasoning — not a bad RX class.
   Remove the rxBad++ / airTime additions; retain only the LOG_WARN.

4. Packet pool exhaustion (Copilot): the allocator (Allocator::
   allocZeroed) already emits a WARN on failure. Drop the duplicate
   LOG_ERROR I had added to avoid log flooding under sustained pressure.

Behaviour now preserved:
 - all 4 original bugs still fixed (CRC log volume, randomBytes
   off-by-one, from==0 no-deref, pool-exhaustion null-check)
 - counter semantics restored to match existing philosophy
 - log signal-to-noise improved without dropping useful parameters
…rotobuf to WARN

GUVWAF flagged two remaining log/counter items on the hardened RX path:

- RadioLibInterface.cpp: the short-packet branch was neither rxGood nor
  rxBad after the previous revision. Move rxGood++ above the
  payloadLen<0 check so every CRC-passed OTA reception counts regardless
  of subsequent header-validation outcome.
- Router.cpp: 'Invalid protobufs in received mesh packet (bad psk?)' is
  more than DEBUG-worthy — either our channel key is wrong or someone's
  feeding us corrupted data. Promote to LOG_WARN per GUVWAF's request.

("Invalid portnum" at the same site stays at LOG_DEBUG — that one only
means the decoded protobuf had the default portnum field, which is a
weaker signal.)
@nightjoker7 nightjoker7 force-pushed the fix/radiolibinterface-rx-hardening branch from ab1535f to 8f17900 Compare May 1, 2026 00:35
@nightjoker7
Copy link
Copy Markdown
Contributor Author

@GUVWAF rebased on the latest develop tip — was flagged CONFLICTING. Branch is now 4 clean commits on top of 989b8620b (no new code from me beyond the rebase; same logical diff: +32/-10 across RadioLibInterface.cpp and Router.cpp).

Your 5 inline comments from 2026-04-23/24 are addressed in the existing commits, which survived the rebase as 65117c09f and 8f17900e2:

  • RadioLibInterface.cpp:514 "still a correctly received packet" — rxGood++ is now hoisted to the top of the success branch so any CRC-clean packet counts, regardless of whether we then refuse to process it.
  • :488 "leave it as LOG_INFO with original parameters" — CRC mismatches now use LOG_INFO with all original packet-header fields (id / from / to / flags / SNR / RSSI / nextHop / relay) preserved. Other RX errors stay at LOG_ERROR.
  • :529 "not a bad Rx, already counted as good" — rxBad++ removed from the from==0 path; counters now report once per packet.
  • :500 "neither good nor bad" — fixed by hoisting rxGood++ so all paths hit exactly one counter.
  • Router.cpp:502 "Warn at minimum, Error fine" — promoted from LOG_DEBUG to LOG_WARN.

Verified locally: pio test -e coverage -f test_radio -f test_default → 26/26 pass. Re-requesting review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants