Null-check pool.allocZeroed() at 13 sites across 9 files#10262
Null-check pool.allocZeroed() at 13 sites across 9 files#10262nightjoker7 wants to merge 5 commits intomeshtastic:developfrom
Conversation
Fifteen sites across nine files called pool.allocZeroed() and immediately dereferenced the result without null-checking. On static MemoryPool-backed pools, allocZeroed() returns nullptr when slots are exhausted (with a LOG_WARN) — every dereference crashes. Sites fixed: - main.cpp:911 — low-entropy key warning on boot - mesh/NodeDB.cpp:1883 — duplicate-public-key warning - mesh/PhoneAPI.cpp:767 — sendNotification helper - mesh/Router.cpp:327 — duty-cycle-exceeded notification - modules/PositionModule.cpp:399 — tracker power-save notify - modules/SerialModule.cpp:98 — invalid serial config notify - modules/Telemetry/AirQualityTelemetry.cpp:418,432 — sensor sleep notify (2 sites) - modules/Telemetry/EnvironmentTelemetry.cpp:646 — sensor sleep notify - mqtt/MQTT.cpp:484,505 — MQTT publish proxy message (2 sites) - mqtt/MQTT.cpp:687,703 — invalid-config client notifications (2 sites) - mqtt/MQTT.cpp:888 — map report MeshPacket alloc Sister PRs in flight (meshtastic#10261 covers Router::allocForSending) harden the same allocation class on different call sites. Matching the existing style: sites that already null-check (AdminModule::sendWarning, RadioInterface::sendErrorNotification, MQTT.cpp:671) use the same `if (cn) { ... }` or `if (!cn) return;` pattern now applied uniformly.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds defensive null-checks around pool.allocZeroed() call sites to prevent null dereferences when static pools are exhausted under load.
Changes:
- Guard
allocZeroed()results before dereferencing across MQTT, telemetry, routing, and notification paths - In some paths, early-return or skip sending a notification when allocation fails
- Add a warning log when
packetPoolis exhausted during MQTT map reporting
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/mqtt/MQTT.cpp | Adds null checks for proxy messages, config notifications, and map-report packet allocation |
| src/modules/Telemetry/EnvironmentTelemetry.cpp | Skips client notification send when allocation fails while still scheduling sleep behavior |
| src/modules/Telemetry/AirQualityTelemetry.cpp | Skips client notification send when allocation fails (two similar sites) |
| src/modules/SerialModule.cpp | Guards client notification allocation in invalid-config path |
| src/modules/PositionModule.cpp | Guards client notification allocation in power-save tracker notify path |
| src/mesh/Router.cpp | Guards notification allocation on duty-cycle exceeded path |
| src/mesh/PhoneAPI.cpp | Adds early return if notification allocation fails |
| src/mesh/NodeDB.cpp | Guards notification allocation for duplicate public key warning |
| src/main.cpp | Guards notification allocation for low-entropy key warning |
…ation-null-checks
Five issues raised by @thebentern and @Copilot: 1. **MQTT map report alloc failure (thebentern, MQTT.cpp:890)**: changed LOG_WARN to LOG_ERROR — map reports aren't retried, so losing one on alloc failure is worth error-level visibility. 2. **MQTT proxy publish silently drops (Copilot, MQTT.cpp:486+509)**: added LOG_WARN on pool exhaustion so operators can see why a publish failed instead of just getting a silent false return. 3. **Low-entropy warning spin (Copilot, main.cpp:918)**: nodeDB->hasWarned was only set on allocZeroed() success, so if the pool was full we'd retry every tick and re-LOG_WARN forever. Set unconditionally — the LOG_WARN is what matters for visibility, the notification is best-effort. 4. **Level parameter ignored (Copilot, PhoneAPI.cpp:772)**: sendNotification took a level parameter but hardcoded cn->level = WARNING. Honor the caller's level. Also switched strncpy to NUL-terminating form because meshtastic_ClientNotification.message may not be NUL-terminated if the source string fills the buffer. 5. **sprintf overflow (Copilot, NodeDB.cpp:1887)**: the public-key-duplicate warning used sprintf with a remote-controlled p.long_name (up to 40 bytes) formatted into a ~200-byte warning template. Switched to snprintf against sizeof(cn->message).
|
Addressed all 5 review points in 055bf7b52: map-report LOG_ERROR, proxy-publish LOG_WARN, hasWarned unconditional, level-parameter honored, sprintf→snprintf for public-key warning. See commit message for per-issue detail. |
| // Allocate MeshPacket and fill it | ||
| meshtastic_MeshPacket *mp = packetPool.allocZeroed(); | ||
| if (!mp) { | ||
| LOG_ERROR("MQTT Map report: packet pool exhausted"); |
There was a problem hiding this comment.
On packetPool exhaustion this returns immediately without updating last_report_to_map, so perhapsReportToMap() will attempt again on every runOnce() iteration (20–200ms) and can spam logs while the pool remains exhausted. Consider throttling this error path (e.g., set last_report_to_map or a separate last_map_report_alloc_fail timestamp) so failures back off to the configured publish interval.
| LOG_ERROR("MQTT Map report: packet pool exhausted"); | |
| LOG_ERROR("MQTT Map report: packet pool exhausted"); | |
| last_report_to_map = millis(); |
| // notice; if the ClientNotification pool is exhausted we'd otherwise spin every tick | ||
| // retrying + spamming LOG_WARN. |
There was a problem hiding this comment.
This comment says we’d otherwise “spin every tick retrying”, but this check runs in setup() (single-shot). Consider rewording to reflect the actual behavior (e.g., avoiding repeated warnings across restarts / multiple setup invocations) or remove the “every tick” phrasing to prevent confusion.
| // notice; if the ClientNotification pool is exhausted we'd otherwise spin every tick | |
| // retrying + spamming LOG_WARN. | |
| // notice; if the ClientNotification pool is exhausted we should not retry on a later | |
| // setup invocation and emit the same LOG_WARN again. |
| if (!msg) { | ||
| LOG_WARN("MQTT proxy publish skipped: message pool exhausted"); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
allocZeroed()/MemoryPool already emits warnings when the pool is exhausted, so this additional LOG_WARN can double/triple-log and flood output when publishes are frequent. Prefer relying on the allocator log, lowering severity, or adding throttling.
…+ reword setup comment Three line-item nits from Copilot's post-fixup pass on meshtastic#10262: 1. perhapsReportToMap() pool-exhaustion path: didn't update last_report_to_map, so the caller in runOnce() would retry every 20-200 ms while the pool stayed exhausted, turning sustained pool pressure into a log-flood. Back off to the configured publish interval by stamping last_report_to_map = millis() on allocation failure. 2. MQTT::publish() proxy overloads: MemoryPool::alloc already LOG_WARNs on exhaustion, so the caller-side LOG_WARN("MQTT proxy publish skipped: message pool exhausted") was double-logging under the same stress that triggers the allocator's own warn. Drop both (char* and uint8_t* overloads) and rely on the allocator log. 3. main.cpp low-entropy-key warn block: the comment said the unconditional hasWarned assignment prevented spinning 'every tick', but setup() is single-shot. Reword to reflect the actual behavior: hasWarned persists in NodeDB so the warning is suppressed across subsequent boots.
Summary
13 sites across 9 files called `pool.allocZeroed()` and immediately dereferenced the result without null-checking. When the static MemoryPool is exhausted, `allocZeroed()` returns nullptr (with a LOG_WARN in Allocator::allocZeroed) — every dereference that follows is a crash.
From MemoryPool.h:149-160: static pools return nullptr when no free slot is available. That's a real failure mode under burst load.
Sites fixed
Style
Matches the existing pattern used in already-null-checked sites:
Used `if (!cn) return;` for single-exit contexts and `if (cn) { ... }` where the surrounding code needs to run regardless (eg. `sleepOnNextExecution = true;` still happens in the telemetry modules even when the notification can't be sent).
Related