Generate final command stream and C-models for all Ambiq configs via run_matrix.py #6
Copilot wants to merge 1 commit
…tput files with configs in names Agent-Logs-Url: https://github.com/AmbiqAI/vela_example_generator/sessions/887f5f08-27b4-48bb-ac59-b9b567b8a82a Co-authored-by: mbhssain <214526976+mbhssain@users.noreply.github.com>
| # Vela intermediate binary outputs (regenerable via run_matrix.py) | ||
| example_models/**/*_vela.npz | ||
|
|
||
| # Weight C-array headers for matrix outputs – the raw weight data is large |
Track src/*.txt files and ignore *_weights.h.
mbhssain left a comment
Update the PR to track src/*.txt and ignore *_weights.h.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds a matrix runner to generate Vela command streams and Ethos-U “direct driver” C-model artifacts across all Ambiq system configurations and memory modes, and updates the Vela pipeline invocation for Vela 5.0.0 compatibility.
Changes:
- Adds `run_matrix.py` to orchestrate full config × memory-mode × model generation with filtering and skip logic.
- Updates `run_vela_pipeline.py` to use Vela's `--output-dir` flag.
- Commits generated headers/C sources and Vela summary CSVs under `example_models/`, and updates `.gitignore` to exclude regenerable large intermediates.
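The matrix and skip logic summarized above can be sketched as follows. This is a minimal illustration, not the code in `run_matrix.py`: the configuration, memory-mode, and model names are taken from the PR summary, while `is_valid`, `build_matrix`, and `output_stem` are hypothetical helpers. The validity rule is a name-based assumption chosen because it reproduces the counts quoted in the PR summary (252 combinations, 72 skipped, 180 generated); the PR itself says the real script derives compatibility from `ambiq_final.ini`.

```python
from itertools import product

# Names as listed in the PR summary.
SYS_CONFIGS = ("AmbiqLP_SRAM", "AmbiqLP_HBLRAM", "AmbiqLP_PSRAM",
               "AmbiqHP_SRAM", "AmbiqHP_HBLRAM", "AmbiqHP_PSRAM")
MEM_MODES = ("Sram_Only", "Shared_Sram", "Dedicated_Sram", "Shared_Sram_256KB",
             "Dedicated_Sram_256KB", "Dedicated_Sram_384KB", "Dedicated_Sram_512KB")
MODELS = ("conlarge_xl", "efficientnet_lite0_s8_lg", "fc_in__200__o_32_relu",
          "ic", "mobilenet_v3_sm_min_s8_md", "resnet_v1_8_32_tfs_int8")


def is_valid(sys_config: str, mem_mode: str) -> bool:
    """Assumed rule: *_SRAM system configs only pair with Sram_Only."""
    sram_system = sys_config.endswith("_SRAM")
    shared_or_dedicated = mem_mode.startswith(("Shared_", "Dedicated_"))
    return not (sram_system and shared_or_dedicated)


def build_matrix(models=MODELS, sys_configs=SYS_CONFIGS, mem_modes=MEM_MODES):
    """Return (jobs, skipped), each a list of (model, sys_config, mem_mode)."""
    jobs, skipped = [], []
    for combo in product(models, sys_configs, mem_modes):
        _, sys_config, mem_mode = combo
        (jobs if is_valid(sys_config, mem_mode) else skipped).append(combo)
    return jobs, skipped


def output_stem(model: str, sys_config: str, mem_mode: str) -> str:
    """Hypothetical naming helper matching the stems seen in the generated files."""
    return f"{model}_{sys_config}_{mem_mode}"


jobs, skipped = build_matrix()
print(len(jobs) + len(skipped), len(skipped), len(jobs))
```

Passing a subset, for example `build_matrix(models=("ic",))`, mimics the effect of a model filter; how the real `--models`, `--sys-configs`, and `--mem-modes` options are parsed is not shown in this review.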
Reviewed changes
Copilot reviewed 257 out of 1083 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| example_models//_buffers.c | Generated region buffers + region accessor functions for Ethos-U invocation |
| example_models//_buffers.h | Generated function prototypes for region accessors |
| example_models//_meta.h | Generated tensor region/offset/size metadata and region constants |
| example_models//summary*.csv | Generated Vela performance summary outputs per configuration |
| */ | ||
| #include <stddef.h> | ||
| #include <stdint.h> | ||
| #include "efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights.h" |
The generated *_buffers.c unconditionally includes a *_weights.h header, but the PR description indicates *_weights.h is excluded via .gitignore. In a clean checkout this will cause compilation failures. Consider either (mandatory) committing the weights headers for checked-in outputs, or (preferred) modifying the generator so the checked-in sources do not depend on *_weights.h (e.g., use linker-provided symbols/extern declarations in a separate optional translation unit, or a compile-time feature flag to omit weights references when weights aren’t present).
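One way to catch this before merging is a small check that every quoted `#include` in a generated source resolves to a file that is actually present. The sketch below is an assumption-laden illustration: `missing_local_includes` is a hypothetical helper, and it only looks in the directory of the including file, whereas a real build may add further include paths.

```python
import re
from pathlib import Path

# Matches quoted includes such as: #include "model_weights.h"
# Angle-bracket system includes (<stddef.h>) are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def missing_local_includes(source_path):
    """Return quoted include names that do not exist next to source_path."""
    source_path = Path(source_path)
    text = source_path.read_text(encoding="utf-8", errors="replace")
    return [name for name in INCLUDE_RE.findall(text)
            if not (source_path.parent / name).is_file()]
```

Run over each checked-in `*_buffers.c` in a clean checkout, a non-empty result would flag exactly the situation described in this comment.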
| uint8_t* get_region_base_ptr(int region) { | ||
| switch(region) { | ||
| case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1; | ||
| case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region | ||
| default: return (uint8_t*)0; // unused region | ||
| } | ||
| } | ||
|
|
||
| size_t get_region_size(int region) { | ||
| switch(region) { | ||
| case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1); | ||
| case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size; | ||
| default: return 0; | ||
| } | ||
| } |
get_region_base_ptr / get_region_size are defined with generic global symbol names in each generated config directory. Linking more than one generated model/config into the same binary will cause duplicate symbol link errors (or unintended symbol interposition). Make these functions model/config-specific (e.g., prefix with the model+config stem), or expose them as static and provide a unique exported wrapper/struct containing function pointers for each model.
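On the generator side, the first option (model/config-specific names) could look roughly like this. It is a hedged sketch only: `c_identifier` and `accessor_prototypes` are hypothetical names, and the generator's actual structure is not visible in this review.

```python
import re


def c_identifier(stem: str) -> str:
    """Turn a model+config stem into a valid C identifier fragment."""
    ident = re.sub(r"[^0-9A-Za-z_]", "_", stem)
    # C identifiers cannot start with a digit; add a neutral prefix if needed.
    return f"m_{ident}" if ident[:1].isdigit() else ident


def accessor_prototypes(stem: str) -> str:
    """Emit prototypes whose names are unique per model/config stem."""
    prefix = c_identifier(stem)
    return (
        f"uint8_t* {prefix}_get_region_base_ptr(int region);\n"
        f"size_t {prefix}_get_region_size(int region);\n"
    )


print(accessor_prototypes("efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram"))
```

Because each stem already encodes model, system config, and memory mode, two generated directories would no longer export the same symbol names.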
| __attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[1510640] = {0}; | ||
|
|
||
| uint8_t* get_region_base_ptr(int region) { | ||
| switch(region) { | ||
| case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1; | ||
| case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region | ||
| default: return (uint8_t*)0; // unused region | ||
| } | ||
| } | ||
|
|
||
| size_t get_region_size(int region) { | ||
| switch(region) { | ||
| case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1); | ||
| case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size; |
The scratch/region buffer size is hard-coded in the C file while the same value is also emitted into the corresponding *_meta.h (..._SCRATCH_SIZE). This creates drift risk if regeneration/generator changes ever diverge. Prefer using the generated macro for the array length (and similarly use generated region-id macros for switch cases) so the buffers layer stays consistent with metadata.
Suggested change
| - __attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[1510640] = {0}; | |
| - uint8_t* get_region_base_ptr(int region) { | |
| - switch(region) { | |
| - case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1; | |
| - case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region | |
| - default: return (uint8_t*)0; // unused region | |
| - } | |
| - } | |
| - size_t get_region_size(int region) { | |
| - switch(region) { | |
| - case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1); | |
| - case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size; | |
| + __attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_SIZE] = {0}; | |
| + uint8_t* get_region_base_ptr(int region) { | |
| + switch(region) { | |
| + case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_REGION: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1; | |
| + case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_WEIGHTS_REGION: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region | |
| + default: return (uint8_t*)0; // unused region | |
| + } | |
| + } | |
| + size_t get_region_size(int region) { | |
| + switch(region) { | |
| + case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_REGION: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1); | |
| + case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_WEIGHTS_REGION: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size; | |
| // Base-pointer array length for Ethos-U | ||
| #define ETHOSU_MAX_REGIONS 8 |
ETHOSU_MAX_REGIONS is defined as a generic macro in each generated *_meta.h. If multiple meta headers are included in a single build, this risks macro redefinition warnings/errors and makes it harder to change per-model in the future. Prefer either a project-level common definition, or a model-prefixed macro name (e.g., <MODEL>_<CONFIG>_ETHOSU_MAX_REGIONS), or guard with #ifndef ETHOSU_MAX_REGIONS if a single global value is intended.
Suggested change
| - // Base-pointer array length for Ethos-U | |
| - #define ETHOSU_MAX_REGIONS 8 | |
| + // Base-pointer array length for Ethos-U | |
| + #ifndef ETHOSU_MAX_REGIONS | |
| + #define ETHOSU_MAX_REGIONS 8 | |
| + #endif | |
| switch(region) { | ||
| case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1; | ||
| case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region | ||
| default: return (uint8_t*)0; // unused region |
Returning (uint8_t*)0 is less idiomatic and less clear than returning NULL. Consider including <stddef.h> (already present) and returning NULL for unknown regions; this also makes it easier to add diagnostics (assert/log) later if desired.
Suggested change
| - default: return (uint8_t*)0; // unused region | |
| + default: return NULL; // unused region | |
Summary
Implements automated matrix generation of Vela command streams and C-model files across all system configurations and memory modes defined in `config/ambiq_final.ini`, for all primary models in `example_models/`.
Changes
New: `run_matrix.py`
Orchestrates `run_vela_pipeline.py` across the full matrix of:
- System configs: `AmbiqLP_SRAM`, `AmbiqLP_HBLRAM`, `AmbiqLP_PSRAM`, `AmbiqHP_SRAM`, `AmbiqHP_HBLRAM`, `AmbiqHP_PSRAM`
- Memory modes: `Sram_Only`, `Shared_Sram`, `Dedicated_Sram`, `Shared_Sram_256KB`, `Dedicated_Sram_256KB`, `Dedicated_Sram_384KB`, `Dedicated_Sram_512KB`
- Models: `conlarge_xl`, `efficientnet_lite0_s8_lg`, `fc_in__200__o_32_relu`, `ic`, `mobilenet_v3_sm_min_s8_md`, `resnet_v1_8_32_tfs_int8`
Key features:
- Skips invalid combinations (`const_mem_area=Sram`; `SRAM` sys-configs + `Shared/Dedicated_Sram` memory modes are invalid; 72 combos skipped out of 252 total)
- Parses `ambiq_final.ini` to resolve `inherit` chains for memory mode compatibility
- Supports `--models`, `--sys-configs`, `--mem-modes` filters and `--dry-run`
Fix: `run_vela_pipeline.py`
Updated the Vela CLI flag from the ambiguous `--output` to `--output-dir` for compatibility with Vela 5.0.0.
Generated outputs under `example_models/`
180 output directories were generated (6 models × 30 valid combos each).
`.gitignore`
- `*_weights.h`: excluded (up to 90 MB per config for large models; regenerable via `run_matrix.py`)
- `*_vela.npz`: excluded (binary Vela intermediate; regenerable via `run_matrix.py`)
To regenerate the full outputs including weights, rerun `run_matrix.py`.