MeshService: null-check sender node before dereferencing has_user on hot RX path #10229

Open
nightjoker7 wants to merge 5 commits into meshtastic:develop from nightjoker7:fix/meshservice-from-node-null-deref

Conversation

@nightjoker7
Contributor

Summary

Null-check nodeDB->getMeshNode(mp->from) before dereferencing ->has_user in MeshService::handleFromRadio().

Problem

In src/mesh/MeshService.cpp::handleFromRadio():

nodeDB->updateFrom(*mp); // update our DB state based off sniffing every RX packet from the radio
...
} else if (mp->which_payload_variant == meshtastic_MeshPacket_decoded_tag && !nodeDB->getMeshNode(mp->from)->has_user &&
           nodeInfoModule && !isPreferredRebroadcaster && !nodeDB->isFull()) {

The code assumes that updateFrom(*mp) guarantees mp->from is present in the NodeDB by the time the next line runs, but it does not. Looking at NodeDB::updateFrom():

void NodeDB::updateFrom(const meshtastic_MeshPacket &mp)
{
    if (mp.from == getNodeNum()) {
        LOG_DEBUG("Ignore update from self");
        return;                               // <-- early return, no insert
    }
    if (mp.which_payload_variant == meshtastic_MeshPacket_decoded_tag && mp.from) {
        ...
        meshtastic_NodeInfoLite *info = getOrCreateMeshNode(getFrom(&mp));
        if (!info) {
            return;                           // <-- early return on alloc failure
        }
        ...
    }
    // (nothing inserted for encrypted-tag packets or mp.from == 0)
}

Three paths through updateFrom leave the DB unchanged:

  1. mp.from == getNodeNum() — the packet is from ourselves (can happen on loopback / MQTT echo / local re-injection).
  2. getOrCreateMeshNode() returns nullptr — the DB is at MAX_NUM_NODES, heap is below MINIMUM_SAFE_FREE_HEAP, and no non-favorite / non-ignored / non-verified node exists to evict.
  3. Encrypted-tag packet with mp.from == 0 — skipped entirely.

In any of these cases, getMeshNode(mp->from) on the next line returns NULL and dereferencing ->has_user is a null-pointer dereference → hard fault / watchdog reset.

The !nodeDB->isFull() clause at the end of the same if does not protect against this — short-circuit evaluation means the has_user dereference happens first.

This path runs on every inbound radio packet, so it is one of the hottest code paths in the firmware. Any node on a crowded mesh that pushes the DB to capacity is at risk.
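
A minimal standalone sketch of the failure mode, with stubbed stand-ins (StubNodeDB and the sender number are illustrative only, not the firmware's real types):

#include <cstdio>

struct NodeInfoLite {
    bool has_user = false;
};

struct StubNodeDB {
    // Models the bad case: the sender never made it into the DB.
    NodeInfoLite *getMeshNode(unsigned /*from*/) { return nullptr; }
    bool isFull() { return true; }
};

int main()
{
    StubNodeDB db;
    unsigned from = 0x12345678; // hypothetical sender

    // Pre-patch shape: undefined behavior (crash) when getMeshNode returns
    // nullptr, because && evaluates strictly left to right, so ->has_user
    // runs before !db.isFull() is ever reached:
    // if (!db.getMeshNode(from)->has_user && !db.isFull()) { ... }

    // Patched shape: the nullptr test short-circuits the chain, so the
    // dereference never happens for an unknown sender.
    if (db.getMeshNode(from) != nullptr && !db.getMeshNode(from)->has_user && !db.isFull())
        puts("would send NodeInfo");
    else
        puts("sender unknown or DB full: skip NodeInfo send");
    return 0;
}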

Fix

Add the standard nodeDB->getMeshNode(mp->from) != nullptr guard earlier in the short-circuit chain, mirroring the pattern already used correctly elsewhere (e.g. Router.cpp::perhapsSetChannel, RoutingModule::isRoutingRequest, ReliableRouter.cpp::shouldRelayPkiEncrypted):

} else if (mp->which_payload_variant == meshtastic_MeshPacket_decoded_tag && nodeDB->getMeshNode(mp->from) != nullptr &&
           !nodeDB->getMeshNode(mp->from)->has_user && nodeInfoModule && !isPreferredRebroadcaster && !nodeDB->isFull()) {

If getMeshNode returns null, we skip the "send NodeInfo to unknown node" path entirely — which is the correct behavior, because we have no record of the peer to respond to anyway.

Testing

  • Patched firmware running on a 4-node fleet (RAK4631 / Heltec Station G2 / T114 / Heltec V3) for several days with a 250-node-capacity NodeDB close to saturation. No crashes on inbound traffic of any portnum.
  • Builds cleanly against develop.

Risk

Minimal. One-line change that adds a null-check before an existing dereference on a hot path. No behavior change on the happy path; previously-crashing path now takes the same "skip NodeInfo send" branch it would take if the node were already known.

…hot RX path

In handleFromRadio(), the "heard a new node, send our NodeInfo" branch
dereferences nodeDB->getMeshNode(mp->from)->has_user without first
confirming the lookup succeeded.

The preceding updateFrom(*mp) call does NOT guarantee mp->from is now
in the DB. It early-returns in three cases: when mp.from equals our
own nodenum (local loopback / MQTT echo), when getOrCreateMeshNode
fails because the DB is full and no non-favorite non-ignored
non-verified entry is available for eviction, and when mp.from == 0
on an encrypted-tag packet. In any of those paths, getMeshNode returns
NULL on the next line and ->has_user dereferences it, producing a
null-pointer crash. The trailing !nodeDB->isFull() check does not
guard this because short-circuit evaluation reaches the dereference
first.

Add a nullptr check to the short-circuit chain, matching the pattern
already used correctly in Router.cpp, RoutingModule.cpp, and
ReliableRouter.cpp. If the sender is not in the DB we simply skip the
NodeInfo-send branch, which is the correct behavior.
@github-actions github-actions Bot added the bugfix Pull request that fixes bugs label Apr 21, 2026
@thebentern thebentern requested a review from Copilot April 21, 2026 20:34

Copilot AI left a comment


Pull request overview

Fixes a potential null-pointer dereference in the mesh RX pipeline by guarding nodeDB->getMeshNode(mp->from) before accessing ->has_user in MeshService::handleFromRadio(), preventing hard faults when updateFrom() doesn’t create/retain a NodeDB entry.

Changes:

  • Add a getMeshNode(mp->from) != nullptr short-circuit check before dereferencing has_user in the hot RX path.

Comment thread on src/mesh/MeshService.cpp (outdated)
Comment on lines +97 to +98
} else if (mp->which_payload_variant == meshtastic_MeshPacket_decoded_tag && nodeDB->getMeshNode(mp->from) != nullptr &&
!nodeDB->getMeshNode(mp->from)->has_user && nodeInfoModule && !isPreferredRebroadcaster && !nodeDB->isFull()) {

Copilot AI Apr 21, 2026


getMeshNode(mp->from) is now called twice in the same condition. NodeDB::getMeshNode() performs a linear scan over numMeshNodes, so on this hot RX path this doubles the lookup work for packets where the node exists. Consider caching the pointer in a local variable (or otherwise restructuring the condition) so the node lookup happens once and the subsequent has_user check uses the same result.
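
For context, the lookup being doubled is roughly shaped like this (an assumed sketch based on the linear-scan description above, not a verbatim copy of NodeDB.cpp):

meshtastic_NodeInfoLite *NodeDB::getMeshNode(NodeNum n)
{
    // O(numMeshNodes) scan; calling it twice per RX packet doubles this work
    for (int i = 0; i < numMeshNodes; i++)
        if (meshNodes->at(i).num == n)
            return &meshNodes->at(i);
    return NULL;
}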

Contributor

@copilot apply changes based on this feedback

@nightjoker7
Contributor Author

Adding production evidence from a small deployment: this crash fires in normal operation in multi-channel RF areas, not just in theoretical edge cases.

Observed 2026-04-22 on two Station-G2 nodes (ROUTER_LATE + CLIENT_BASE) in an urban mesh:

  • Both nodes rebooted independently within 19 minutes of each other (04:55 and 05:14 UTC), after 10.25h and 3.7h of uptime respectively. A neighbor node went down ~2h later after 6.47 days up
  • NodeDB at MAX_NUM_NODES (250/250) on both — exactly the isFull() condition the PR describes
  • 12h of meshlogger captures: 417 packets (9.2% of total RX traffic) from 10+ distinct senders not in NodeDB, whose payloads fail channel-key decryption because they're on a different Meshtastic channel (channel hash 115 vs our 45). Their LoRa still demodulates, so the firmware enters handleFromRadio(), updateFrom() short-circuits because getOrCreateMeshNode() returns null (DB full + heap pressure), and execution reaches exactly the getMeshNode(mp->from)->has_user dereference this PR null-checks

Loudest example: sender 0xF67C9BA3, ~5/hr all day, peaks +10 dB SNR (1 hop away).

After patching (this PR + #10219 + #10226), same rogue packets continue to arrive (91+ from that one sender alone since patch), no crashes on the MeshService::handleFromRadio path.

Real-world class of trigger: any mesh near MAX_NUM_NODES in an RF area with multiple Meshtastic channels will hit this at a measurable rate. The null-check converts a remote-DoS-over-RF into a benign decode-fail.

@nightjoker7
Contributor Author

Post-hoc correction — the crash rate I cited was amplified by an independent bug in our meshlogger client that I only caught after the first comment. Want to be intellectually honest about what the evidence actually supports.

The Python meshtastic.tcp_interface.TCPInterface(host) call leaks a socket + reader thread every time the config handshake times out. Our meshlogger was retrying on a flat 15-second interval, and each retry triggers a full config-dump + NodeDB-dump on hw71's 250-entry DB. That's a compound reconnect storm happening on top of the cross-channel RF traffic — the crash rate in my first comment reflects both pressures combined.

Once we patched the client (L4 pre-probe before handshake, explicit connectNow=False with iface.close() on handshake timeout, 5-minute handshake-timeout backoff, exponential backoff otherwise) and flashed firmware with this PR + #10226 + #10219, post-flash window has been stable for 3+ hours under the same cross-channel senders still broadcasting.

Revised framing: this PR is not "the sole fix for DoS-over-RF." In-field crash rate depends heavily on client behavior too. But the underlying null-deref — getMeshNode(mp->from)->has_user on a nullptr return when NodeDB is full and getOrCreateMeshNode() can't evict — is still a real hard-fault path that any deployment can hit when:

  • NodeDB sits near MAX_NUM_NODES (common on routers in active mesh areas)
  • Packets arrive from senders not in local NodeDB (cross-channel, corruption, cold-boot race, MQTT downlink of previously-unheard nodes)
  • Any concurrent heap pressure is present (phone reconnect dump, traceroute burst, MQTT flood, etc.)

My data should be read as "this bug does fire in production, just not always at the rate I originally reported" — not "this bug alone causes the crash rate I observed." The null-check is still correct and the fault class is still real under normal operation; a better-behaved client just hits it less often.

@nightjoker7
Contributor Author

Client-side fix for the meshlogger bug I mentioned above — filed as meshtastic/python#918. Independent of the firmware null-deref here, but was amplifying the trigger rate in my deployment.

Review feedback from @thebentern (via Copilot) on PR meshtastic#10229: the null-check
and has_user read called getMeshNode(mp->from) twice in the same condition,
doubling the linear-scan work on every RX packet.

Cache the result in a local fromNode pointer, then do the null-check and
has_user read against the cached pointer. Same behaviour; one scan instead
of two. Also eliminates the TOCTOU window where a concurrent task could
evict the node between calls.
@nightjoker7
Contributor Author

Addressed in a6a93d281: cached getMeshNode(mp->from) in a local fromNode pointer, one scan per packet, null-check against the cached pointer. Same behaviour as before with no TOCTOU window.
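
For reference, the cached-lookup condition presumably reads along these lines (fromNode is the name from the commit message; the rest mirrors the diff quoted earlier):

meshtastic_NodeInfoLite *fromNode = nodeDB->getMeshNode(mp->from); // one linear scan per packet
...
} else if (mp->which_payload_variant == meshtastic_MeshPacket_decoded_tag && fromNode != nullptr && !fromNode->has_user &&
           nodeInfoModule && !isPreferredRebroadcaster && !nodeDB->isFull()) {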

