
Generate final command stream and C-models for all Ambiq configs via run_matrix.py#6

Draft
Copilot wants to merge 1 commit into add_models from copilot/generate-final-command-stream


Copilot AI commented Apr 7, 2026

Summary

Implements automated matrix generation of Vela command streams and C-model files across all system configurations and memory modes defined in config/ambiq_final.ini, for all primary models in example_models/.

Changes

New: run_matrix.py

Orchestrates run_vela_pipeline.py across the full matrix of:

  • 6 system configs: AmbiqLP_SRAM, AmbiqLP_HBLRAM, AmbiqLP_PSRAM, AmbiqHP_SRAM, AmbiqHP_HBLRAM, AmbiqHP_PSRAM
  • 7 memory modes: Sram_Only, Shared_Sram, Dedicated_Sram, Shared_Sram_256KB, Dedicated_Sram_256KB, Dedicated_Sram_384KB, Dedicated_Sram_512KB
  • 6 models: conlarge_xl, efficientnet_lite0_s8_lg, fc_in__200__o_32_relu, ic, mobilenet_v3_sm_min_s8_md, resnet_v1_8_32_tfs_int8
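The matrix arithmetic can be checked with a short C sketch. It hard-codes the three lists above and assumes the skip rule listed under Key features reduces to "an SRAM system config is only valid with Sram_Only"; run_matrix.py itself derives compatibility from config/ambiq_final.ini, so this is an illustration of the counts, not its actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Matrix axes as listed in the PR description. */
static const char *const SYS_CONFIGS[] = {
    "AmbiqLP_SRAM", "AmbiqLP_HBLRAM", "AmbiqLP_PSRAM",
    "AmbiqHP_SRAM", "AmbiqHP_HBLRAM", "AmbiqHP_PSRAM",
};
static const char *const MEM_MODES[] = {
    "Sram_Only", "Shared_Sram", "Dedicated_Sram", "Shared_Sram_256KB",
    "Dedicated_Sram_256KB", "Dedicated_Sram_384KB", "Dedicated_Sram_512KB",
};
static const char *const MODELS[] = {
    "conlarge_xl", "efficientnet_lite0_s8_lg", "fc_in__200__o_32_relu",
    "ic", "mobilenet_v3_sm_min_s8_md", "resnet_v1_8_32_tfs_int8",
};

static bool ends_with(const char *s, const char *suffix) {
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + (n - m), suffix) == 0;
}

/* Assumed rule: SRAM system configs accept only the Sram_Only memory mode. */
static bool is_valid_combo(const char *sys_config, const char *mem_mode) {
    return !ends_with(sys_config, "_SRAM") || strcmp(mem_mode, "Sram_Only") == 0;
}

/* Walks model x sys-config x mem-mode and tallies valid vs skipped combos. */
static void count_matrix(size_t *total, size_t *skipped, size_t *valid) {
    *total = *skipped = *valid = 0;
    for (size_t m = 0; m < COUNT(MODELS); ++m) {
        for (size_t s = 0; s < COUNT(SYS_CONFIGS); ++s) {
            for (size_t k = 0; k < COUNT(MEM_MODES); ++k) {
                ++*total;
                if (is_valid_combo(SYS_CONFIGS[s], MEM_MODES[k]))
                    ++*valid;
                else
                    ++*skipped;
            }
        }
    }
    assert(*total == *skipped + *valid);
}
```

Under that assumption the counts come out to 252 total, 72 skipped and 180 valid, matching the figures reported in this PR.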

Key features:

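  • Automatically detects and skips incompatible combinations: Vela rejects const_mem_area=Sram, and the SRAM system configs (AmbiqLP_SRAM, AmbiqHP_SRAM) are invalid with any Shared_Sram or Dedicated_Sram memory mode, including the _256KB/_384KB/_512KB variants. That is 2 × 6 = 12 skipped combos per model, or 72 of 252 in total.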
  • Parses ambiq_final.ini to resolve inherit chains for memory mode compatibility
  • Supports --models, --sys-configs, --mem-modes filters and --dry-run
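Resolving an inherit chain amounts to looking a key up in a section and falling back to its parent until the key is found. The sketch below models already-parsed sections as in-memory C structs; the section names, keys and values are hypothetical stand-ins loosely modeled on Vela's Memory_Mode.* naming, not copied from ambiq_final.ini, and the real parsing is done in Python by run_matrix.py.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One key=value pair inside an ini section. */
typedef struct {
    const char *key;
    const char *value;
} Entry;

/* A parsed section; `inherit` names the parent section, or is NULL. */
typedef struct {
    const char *name;
    const char *inherit;
    const Entry *entries;
    size_t n_entries;
} Section;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_INHERIT_DEPTH 16 /* crude guard against cyclic inherit chains */

/* Hypothetical sections for illustration only; not taken from ambiq_final.ini. */
static const Entry SHARED_SRAM[] = {
    {"const_mem_area", "Axi1"},
    {"arena_mem_area", "Axi0"},
};
static const Entry SHARED_SRAM_256KB[] = {
    {"arena_cache_size", "262144"},
};
static const Section SECTIONS[] = {
    {"Memory_Mode.Shared_Sram", NULL, SHARED_SRAM, COUNT(SHARED_SRAM)},
    {"Memory_Mode.Shared_Sram_256KB", "Memory_Mode.Shared_Sram",
     SHARED_SRAM_256KB, COUNT(SHARED_SRAM_256KB)},
};

static const Section *find_section(const char *name) {
    for (size_t i = 0; i < COUNT(SECTIONS); ++i)
        if (strcmp(SECTIONS[i].name, name) == 0) return &SECTIONS[i];
    return NULL;
}

/* Returns the value of `key` in section `name`, walking up the inherit chain.
   Returns NULL if no section in the (depth-bounded) chain defines the key. */
static const char *resolve_key(const char *name, const char *key) {
    assert(key != NULL);
    for (int depth = 0; name != NULL && depth < MAX_INHERIT_DEPTH; ++depth) {
        const Section *s = find_section(name);
        if (s == NULL) return NULL; /* unknown section or dangling inherit */
        for (size_t i = 0; i < s->n_entries; ++i)
            if (strcmp(s->entries[i].key, key) == 0) return s->entries[i].value;
        name = s->inherit; /* fall back to the parent section */
    }
    return NULL;
}
```

The depth cap stands in for real cycle detection; a production parser would more likely report the cycle than silently return NULL.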

Fix: run_vela_pipeline.py

Replaced the ambiguous Vela CLI flag --output with --output-dir for compatibility with Vela 5.0.0.

Generated outputs under example_models/

180 output directories were generated (6 models × 30 valid combos each), organized as:

example_models/<model_dir>/<sys_config>_<mem_mode>/
  <model_stem>_<sys_config>_<mem_mode>_cmd_data.h   ← NPU command stream
  <model_stem>_<sys_config>_<mem_mode>_meta.h        ← tensor offsets/sizes
  <model_stem>_<sys_config>_<mem_mode>_buffers.h     ← region buffer API
  <model_stem>_<sys_config>_<mem_mode>_buffers.c
  <model_stem>_<sys_config>_<mem_mode>_run.c         ← minimal Ethos-U runner
  <model_stem>_summary_<sys_config>.csv              ← Vela performance summary
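The per-config naming scheme fits in a single snprintf format. The helper below is hypothetical (build_output_path is not part of the PR) and assumes the caller already knows both <model_dir> and <model_stem>; the summary CSV uses a different pattern and is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds
   example_models/<model_dir>/<sys_config>_<mem_mode>/<model_stem>_<sys_config>_<mem_mode><suffix>
   Returns 0 on success, -1 on an encoding error or if `out` is too small. */
static int build_output_path(char *out, size_t out_size,
                             const char *model_dir, const char *model_stem,
                             const char *sys_config, const char *mem_mode,
                             const char *suffix) {
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "example_models/%s/%s_%s/%s_%s_%s%s",
                     model_dir, sys_config, mem_mode,
                     model_stem, sys_config, mem_mode, suffix);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Assuming model_dir and model_stem are both "ic", the inputs AmbiqHP_PSRAM, Dedicated_Sram and "_cmd_data.h" yield example_models/ic/AmbiqHP_PSRAM_Dedicated_Sram/ic_AmbiqHP_PSRAM_Dedicated_Sram_cmd_data.h.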

.gitignore

  • *_weights.h — excluded (up to 90 MB per config for large models; regenerable via run_matrix.py)
  • *_vela.npz — excluded (binary Vela intermediate; regenerable via run_matrix.py)

To regenerate the full outputs including weights, run:

python run_matrix.py

…tput files with configs in names

Agent-Logs-Url: https://github.com/AmbiqAI/vela_example_generator/sessions/887f5f08-27b4-48bb-ac59-b9b567b8a82a

Co-authored-by: mbhssain <214526976+mbhssain@users.noreply.github.com>
Comment thread .gitignore
# Vela intermediate binary outputs (regenerable via run_matrix.py)
example_models/**/*_vela.npz

# Weight C-array headers for matrix outputs – the raw weight data is large

Track src/*.txt files and ignore *_weights.h.


mbhssain left a comment


Update the PR to track src/*.txt and ignore *_weights.h.

mbhssain requested a review from Copilot April 7, 2026 21:27
Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a matrix runner to generate Vela command streams and Ethos-U “direct driver” C-model artifacts across all Ambiq system configurations and memory modes, and updates the Vela pipeline invocation for Vela 5.0.0 compatibility.

Changes:

  • Adds run_matrix.py to orchestrate full config × memory-mode × model generation with filtering and skip logic.
  • Updates run_vela_pipeline.py to use Vela’s --output-dir flag.
  • Commits generated headers/C sources and Vela summary CSVs under example_models/, and updates .gitignore to exclude regenerable large intermediates.

Reviewed changes

Copilot reviewed 257 out of 1083 changed files in this pull request and generated 5 comments.

  • example_models/**/*_buffers.c: Generated region buffers and region accessor functions for Ethos-U invocation
  • example_models/**/*_buffers.h: Generated function prototypes for the region accessors
  • example_models/**/*_meta.h: Generated tensor region/offset/size metadata and region constants
  • example_models/**/*_summary_*.csv: Generated Vela performance summary outputs per configuration


*/
#include <stddef.h>
#include <stdint.h>
#include "efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights.h"

The generated *_buffers.c unconditionally includes a *_weights.h header, but the PR description says *_weights.h is excluded via .gitignore, so a clean checkout will fail to compile. Consider either committing the weights headers for the checked-in outputs or, preferably, changing the generator so the checked-in sources do not depend on *_weights.h (e.g., extern declarations or linker-provided symbols in a separate, optional translation unit, or a compile-time feature flag that omits the weights references when the weights are not present).

Copilot uses AI. Check for mistakes.
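The compile-time flag option from the comment above can be sketched as follows. MODEL_HAS_WEIGHTS, model_weights.h, model_weights and model_weights_size are all hypothetical names, and the region ids and scratch size are placeholders rather than values from the generated files. Without the flag, the file compiles on a clean checkout and the weights region simply reports as absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef MODEL_HAS_WEIGHTS
/* Hypothetical regenerated header declaring model_weights[] and model_weights_size. */
#include "model_weights.h"
#endif

#define WEIGHTS_REGION 0   /* placeholder ids; real ones would come from *_meta.h */
#define SCRATCH_REGION 1
#define SCRATCH_SIZE 1024u /* placeholder scratch size */

__attribute__((aligned(32))) static uint8_t scratch_region[SCRATCH_SIZE];

uint8_t *get_region_base_ptr(int region) {
    switch (region) {
    case SCRATCH_REGION: return scratch_region;
#ifdef MODEL_HAS_WEIGHTS
    case WEIGHTS_REGION: return (uint8_t *)model_weights;
#endif
    default: return NULL; /* unused region, or weights not built in */
    }
}

size_t get_region_size(int region) {
    switch (region) {
    case SCRATCH_REGION: return sizeof(scratch_region);
#ifdef MODEL_HAS_WEIGHTS
    case WEIGHTS_REGION: return model_weights_size;
#endif
    default: return 0;
    }
}
```

A build that has regenerated the weights would pass -DMODEL_HAS_WEIGHTS; a clean checkout builds the scratch-only variant.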
Comment on lines +15 to +29
uint8_t* get_region_base_ptr(int region) {
    switch(region) {
        case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1;
        case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region
        default: return (uint8_t*)0; // unused region
    }
}

size_t get_region_size(int region) {
    switch(region) {
        case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1);
        case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size;
        default: return 0;
    }
}

get_region_base_ptr / get_region_size are defined with generic global symbol names in each generated config directory. Linking more than one generated model/config into the same binary will cause duplicate symbol link errors (or unintended symbol interposition). Make these functions model/config-specific (e.g., prefix with the model+config stem), or expose them as static and provide a unique exported wrapper/struct containing function pointers for each model.

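A sketch of the second option above: the accessors become static, and the only exported symbol is a struct whose name carries the model/config prefix. The model_regions_t type and the ic_AmbiqHP_PSRAM stem are hypothetical, the scratch size is a placeholder, and the weights region is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor type, declared once in a shared header. */
typedef struct {
    uint8_t *(*base_ptr)(int region);
    size_t (*size)(int region);
} model_regions_t;

/* One generated translation unit; "ic_AmbiqHP_PSRAM" is a hypothetical stem. */
__attribute__((aligned(32))) static uint8_t ic_AmbiqHP_PSRAM_region_1[256]; /* placeholder size */

/* static: these names stay private to this TU, so other configs may reuse them. */
static uint8_t *get_region_base_ptr(int region) {
    return region == 1 ? ic_AmbiqHP_PSRAM_region_1 : NULL; /* weights region omitted */
}

static size_t get_region_size(int region) {
    return region == 1 ? sizeof(ic_AmbiqHP_PSRAM_region_1) : 0;
}

/* The only external symbol, uniquely prefixed per model/config. */
const model_regions_t ic_AmbiqHP_PSRAM_regions = {get_region_base_ptr, get_region_size};
```

Application code then selects a model through its struct (e.g., ic_AmbiqHP_PSRAM_regions.base_ptr(1)), so any number of generated configs can be linked into one binary. If function pointers are unwanted, prefixing the two function names is the smaller generator change.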
Comment on lines +13 to +26
__attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[1510640] = {0};

uint8_t* get_region_base_ptr(int region) {
    switch(region) {
        case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1;
        case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region
        default: return (uint8_t*)0; // unused region
    }
}

size_t get_region_size(int region) {
    switch(region) {
        case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1);
        case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size;

The scratch/region buffer size is hard-coded in the C file while the same value is also emitted into the corresponding *_meta.h (..._SCRATCH_SIZE). This creates drift risk if regeneration/generator changes ever diverge. Prefer using the generated macro for the array length (and similarly use generated region-id macros for switch cases) so the buffers layer stays consistent with metadata.

Suggested change
- __attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[1510640] = {0};
- uint8_t* get_region_base_ptr(int region) {
-     switch(region) {
-         case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1;
-         case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region
-         default: return (uint8_t*)0; // unused region
-     }
- }
- size_t get_region_size(int region) {
-     switch(region) {
-         case 1: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1);
-         case 0: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size;
+ __attribute__((aligned(32))) static uint8_t efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1[efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_SIZE] = {0};
+ uint8_t* get_region_base_ptr(int region) {
+     switch(region) {
+         case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_REGION: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1;
+         case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_WEIGHTS_REGION: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region
+         default: return (uint8_t*)0; // unused region
+     }
+ }
+ size_t get_region_size(int region) {
+     switch(region) {
+         case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_SCRATCH_REGION: return sizeof(efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1);
+         case efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_WEIGHTS_REGION: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights_size;

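If the generator has to keep emitting a literal array length, a C11 static_assert against the metadata macro is an alternative way to catch drift at build time. MODEL_SCRATCH_SIZE and model_region_1 are hypothetical stand-ins for the ..._SCRATCH_SIZE macro in *_meta.h and the generated buffer; only the 1510640-byte figure comes from the generated file quoted above.

```c
#include <assert.h> /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical stand-in for the value emitted into *_meta.h. */
#define MODEL_SCRATCH_SIZE 1510640u

/* Literal length, as the generator currently emits it. */
__attribute__((aligned(32))) static uint8_t model_region_1[1510640] = {0};

/* Breaks the build if the literal and the metadata ever disagree. */
static_assert(sizeof(model_region_1) == MODEL_SCRATCH_SIZE,
              "scratch buffer length drifted from *_meta.h");
```

Using the macro directly as the array length, as suggested, makes this check unnecessary; the assertion only earns its keep while the two values are emitted independently.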
Comment on lines +12 to +13
// Base-pointer array length for Ethos-U
#define ETHOSU_MAX_REGIONS 8

ETHOSU_MAX_REGIONS is defined as a generic macro in each generated *_meta.h. If multiple meta headers are included in a single build, this risks macro redefinition warnings/errors and makes it harder to change per-model in the future. Prefer either a project-level common definition, or a model-prefixed macro name (e.g., <MODEL>_<CONFIG>_ETHOSU_MAX_REGIONS), or guard with #ifndef ETHOSU_MAX_REGIONS if a single global value is intended.

Suggested change
- // Base-pointer array length for Ethos-U
- #define ETHOSU_MAX_REGIONS 8
+ // Base-pointer array length for Ethos-U
+ #ifndef ETHOSU_MAX_REGIONS
+ #define ETHOSU_MAX_REGIONS 8
+ #endif

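Whichever definition wins, ETHOSU_MAX_REGIONS is meant to size the per-region arrays handed to the Ethos-U driver. The sketch fills such arrays from the two accessors; the stub regions and their sizes are hypothetical, and the parallel address/size array layout is an assumption to check against the Ethos-U core driver header in use. Note that C already permits redefining a macro with an identical replacement list, so the guard mostly matters when two headers disagree on the value, in which case it silently keeps the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ETHOSU_MAX_REGIONS
#define ETHOSU_MAX_REGIONS 8
#endif

/* Hypothetical stand-ins for a generated *_buffers.c: region 0 = weights, 1 = scratch. */
static const uint8_t weights[16] = {0};
static uint8_t scratch[64];

static uint8_t *get_region_base_ptr(int region) {
    switch (region) {
    case 0: return (uint8_t *)weights;
    case 1: return scratch;
    default: return NULL; /* unused region */
    }
}

static size_t get_region_size(int region) {
    switch (region) {
    case 0: return sizeof(weights);
    case 1: return sizeof(scratch);
    default: return 0;
    }
}

/* Fills all ETHOSU_MAX_REGIONS address/size slots (unused slots become 0)
   and returns one past the highest region that has a buffer. */
static int fill_region_tables(uint64_t base_addr[ETHOSU_MAX_REGIONS],
                              size_t base_addr_size[ETHOSU_MAX_REGIONS]) {
    int used = 0;
    for (int r = 0; r < ETHOSU_MAX_REGIONS; ++r) {
        uint8_t *p = get_region_base_ptr(r);
        base_addr[r] = (uint64_t)(uintptr_t)p;
        base_addr_size[r] = get_region_size(r);
        if (p != NULL) used = r + 1;
    }
    assert(used <= ETHOSU_MAX_REGIONS);
    return used;
}
```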
switch(region) {
    case 1: return efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_region_1;
    case 0: return (uint8_t*)efficientnet_lite0_s8_lg_AmbiqLP_HBLRAM_Shared_Sram_weights; // weights region
    default: return (uint8_t*)0; // unused region

Returning (uint8_t*)0 is less idiomatic and less clear than returning NULL. Since <stddef.h> is already included, consider returning NULL for unknown regions; this also makes it easier to add diagnostics (assert/log) later if desired.

Suggested change
- default: return (uint8_t*)0; // unused region
+ default: return NULL; // unused region

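Returning NULL also gives callers a clear way to tell an unused region from a real buffer. A minimal sketch of the assert-style diagnostic mentioned above, using a hypothetical require_region helper and a placeholder scratch buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t scratch[64]; /* placeholder scratch region */

uint8_t *get_region_base_ptr(int region) {
    switch (region) {
    case 1: return scratch;
    default: return NULL; /* unused region */
    }
}

/* Hypothetical helper: fails loudly in debug builds (assert is a no-op
   under NDEBUG) when code asks for a region the model does not use. */
static uint8_t *require_region(int region) {
    uint8_t *p = get_region_base_ptr(region);
    assert(p != NULL && "requested an Ethos-U region the model does not use");
    return p;
}
```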