inst64: Drive on-the-fly transpose; add the snitch integration harness by DanielKellerM · Pull Request #113 · pulp-platform/iDMA

DanielKellerM · 2026-06-10T14:14:16Z

Summary

Stacked on #112: the first 9 commits belong to that PR, so review only the last 2 commits here. This adds the frontend and system side of the on-the-fly transpose: the Snitch inst64 frontend learns to issue transpose transfers, and a standalone integration harness verifies the full path end-to-end.

inst64 frontend

- Transpose requests are encoded in spare DMCPY argb bits: {enable[5], mode[7:6], tensor_m[19:8], tensor_n[31:20]} (register form only).
- A ComputeEnable parameter gates everything at compile time: with it cleared, the frontend elaborates exactly as before (NumDim stays 2, no expander).
- With transpose enabled, NumDim=4, strides widen to the address width, and idma_transpose_midend is spliced between the request FIFO and the nd_midend.
- Malformed requests are rejected with an accelerator error response instead of mis-executing: transpose without hardware, reserved element mode, zero tensor dimensions, dst not bus-aligned.
- An elaboration assert cross-checks the generated backend's baked compute capability against the frontend's.
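
As an illustration of the encoding and the reject rules listed above, here is a minimal Python sketch of the bit packing. Only the bit positions come from the PR description. The function names, the choice of code 3 as the reserved element mode, and the order in which the reject checks run are assumptions made for this example and are not taken from the RTL.

```python
# Bit layout quoted in the PR description: enable[5], mode[7:6],
# tensor_m[19:8], tensor_n[31:20]. Everything else here is illustrative.
ENABLE_BIT = 5
MODE_LSB, MODE_BITS = 6, 2
M_LSB, M_BITS = 8, 12
N_LSB, N_BITS = 20, 12

# Assumption: the PR does not say which 2-bit code is the reserved mode.
RESERVED_MODE = 3


def _field(word, lsb, bits):
    """Extract an unsigned bit field from a 32-bit word."""
    return (word >> lsb) & ((1 << bits) - 1)


def pack_transpose(mode, tensor_m, tensor_n, enable=True):
    """Build the transpose-related argb bits (all other argb bits stay 0)."""
    for name, value, bits in (("mode", mode, MODE_BITS),
                              ("tensor_m", tensor_m, M_BITS),
                              ("tensor_n", tensor_n, N_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((int(bool(enable)) << ENABLE_BIT) | (mode << MODE_LSB)
            | (tensor_m << M_LSB) | (tensor_n << N_LSB))


def unpack_transpose(argb):
    """Split the argb word back into its transpose fields."""
    return {
        "enable": bool(_field(argb, ENABLE_BIT, 1)),
        "mode": _field(argb, MODE_LSB, MODE_BITS),
        "tensor_m": _field(argb, M_LSB, M_BITS),
        "tensor_n": _field(argb, N_LSB, N_BITS),
    }


def reject_reason(argb, dst, bus_bytes, has_transpose_hw):
    """Return why a request would be rejected, or None if it looks valid.

    The check order is an assumption; the PR only lists the conditions.
    """
    req = unpack_transpose(argb)
    if not req["enable"]:
        return None  # plain copy, nothing transpose-specific to validate
    if not has_transpose_hw:
        return "transpose without hardware"
    if req["mode"] == RESERVED_MODE:
        return "reserved element mode"
    if req["tensor_m"] == 0 or req["tensor_n"] == 0:
        return "zero tensor dimension"
    if dst % bus_bytes != 0:
        return "dst not bus-aligned"
    return None
```

For example, `reject_reason(pack_transpose(1, 3, 5), dst=0x1004, bus_bytes=8, has_transpose_hw=True)` reports the unaligned destination, while the same request with `dst=0x1000` passes.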

Copilot

Pull request overview

This PR wires the on-the-fly transpose feature end-to-end for the Snitch inst64 frontend and adds a standalone Snitch integration harness, while also extending the backend generation flow to optionally include compute support in selected variants.

Changes:

- Add a typed per-transfer opt.compute capability (transpose op + params) and route it through legalizer/backend/transport to a write-seam compute dispatcher and transpose engine.
- Extend the inst64 frontend to decode transpose requests from spare DMCPY argb bits, expand transpose geometry via a new idma_transpose_midend, and reject malformed transpose requests.
- Add new SV/DPI-C testbenches (engine-level + ND/back-to-back) and a Snitch inst64 integration harness + Makefile flow (snitch_transpose_sweep), plus docs and Bender target support (split_rtl).

Reviewed changes

Copilot reviewed 36 out of 37 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| util/mario/util.py | Add parsing for `--compute-ids` configuration strings (ops + fd/hd). |
| util/mario/transport_layer.py | Pass compute enable/op info into transport-layer templating context. |
| util/mario/legalizer.py | Pass compute enable flag into legalizer templating context. |
| util/mario/backend.py | Enforce “single AXI write port” constraint for compute-enabled variants and pass ops into backend templating context. |
| util/gen_idma.py | Add `--compute-ids` CLI and propagate compute config into generators. |
| test/tb_idma_transpose_nd.sv | Multi-tile end-to-end transpose test via ND midend → compute backend → AXI sim mem. |
| test/tb_idma_transpose_b2b.sv | End-to-end back-to-back transpose regression to distinct destinations. |
| test/tb_idma_otf_transpose.sv | Standalone transpose engine SV testbench using DPI-C golden model. |
| test/midend/tb_idma_transpose_midend.sv | Unit test for transpose geometry expansion midend. |
| test/midend/tb_idma_nd_midend_b2b.sv | Back-to-back ND midend base-address reload regression under backpressure. |
| test/idma_transpose_dpi.c | DPI-C golden model for element-granular transpose verification. |
| test/idma_test.sv | Extend request-driving task to optionally program transpose compute options. |
| systems/snitch/test/tb_idma_inst64_transpose.sv | Snitch `inst64` end-to-end transpose integration test (incl. rejects + no-leak). |
| systems/snitch/test/tb_idma_inst64_copy.sv | Snitch `inst64` plain-copy regression. |
| systems/snitch/test/idma_inst64_tb_pkg.sv | Package/types/constants for the standalone Snitch harness. |
| systems/snitch/test/idma_inst64_drv_if.sv | Accelerator-bus BFM tasks, including `DMCPY`-encoded transpose launch helpers. |
| systems/snitch/test/idma_inst64_base.sv | Base harness instantiating `idma_inst64_top` + AXI sim memories. |
| systems/snitch/README.md | Document Snitch harness purpose, build flow, and transpose contract. |
| systems/snitch/Makefile | Standalone build + sim/sweep targets for the Snitch harness. |
| systems/snitch/.gitignore | Ignore build products for the Snitch harness flow. |
| src/midend/idma_transpose_midend.sv | New combinational transpose geometry expander (NumDim=4) for ND midend. |
| src/midend/idma_nd_midend.sv | Add non-synth assertion enforcing stride width == address width. |
| src/include/idma/typedef.svh | Extend `options_t` with typed `compute` options field. |
| src/idma_pkg.sv | Define compute op enum, transpose params, compute options, and feature enable struct. |
| src/frontend/inst64/idma_inst64_top.sv | Add `ComputeEnable` param, decode/validate transpose from `DMCPY`, splice transpose midend, widen strides to addr width, add backend capability cross-check. |
| src/db/idma_tilelink.yml | Forward compute options into write datapath request struct where needed. |
| src/db/idma_axi.yml | Forward compute options; extend AXI write template to accept strobe mask + beat-done pulse. |
| src/backend/tpl/idma_transport_layer.sv.tpl | Add write-seam compute integration (dispatcher + mask/beat-done plumbing). |
| src/backend/tpl/idma_legalizer.sv.tpl | Force decouple on compute transfers; propagate compute options into mutable transfer opts and write datapath req. |
| src/backend/tpl/idma_backend.sv.tpl | Add compute-enabled variant metadata (`ComputeEnable`), enforce NO_ERROR_HANDLING, increase meta FIFO depth for compute latency, propagate compute options into write datapath req. |
| src/backend/idma_otf_transpose.sv | New transpose engine (tile ping-pong) producing per-byte strobe mask. |
| src/backend/idma_otf_compute.sv | New write-seam compute dispatcher (currently transpose only). |
| src/backend/idma_axi_write.sv | Add external strobe mask input and a strobe-independent “beat accepted” pulse output. |
| jobs/backend_rw_axi/transpose_none.txt | Add job artifact/marker for transpose-none configuration (empty in this diff). |
| idma.mk | Add compute-enabled variant list (`IDMA_VIDMA_IDS`), propagate to generator, add simulation targets for transpose regressions, include `split_rtl` in vsim script target set. |
| doc/transpose-engine-routing-plan.md | Detailed routing/signaling plan and rationale for transpose integration. |
| Bender.yml | Add compute RTL, new midend, Snitch harness sources, transpose tests, and introduce `split_rtl` generated-file selection. |

+    /// Extra write-descriptor slots covering the compute (transpose) tile-fill latency
+    localparam int unsigned ComputeFifoDepth = ${"StrbWidth" if enable_compute else "32'd0"};
+% if enable_compute:
+
+    /// Per-op compute set baked into this variant (frontends may cross-check)
+    localparam idma_pkg::compute_enable_t ComputeEnable =
+        '{${', '.join("%s: 1'b1" % op for op in compute_ops)}};
+`ifndef SYNTHESIS
+    // no engine flush on abort: compute is incompatible with error handling
+    initial assert (ErrorCap == idma_pkg::NO_ERROR_HANDLING) else
+        $fatal(1, "compute requires ErrorCap == NO_ERROR_HANDLING");
+`endif
+% endif
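
To make the two template expressions in the hunk above easier to read, the following Python sketch evaluates the same inline expressions outside the template engine. The helper names are hypothetical, and the op name `transpose` is only a guess at how the generator spells the struct field; the string expressions themselves are copied from the hunk.

```python
def render_fifo_depth(enable_compute):
    """Mirror of the ComputeFifoDepth expression in the template hunk."""
    return "StrbWidth" if enable_compute else "32'd0"


def render_compute_enable(compute_ops):
    """Mirror of the ComputeEnable struct literal in the template hunk."""
    return "'{" + ", ".join("%s: 1'b1" % op for op in compute_ops) + "}"


# Hypothetical op list; the real field names come from idma_pkg.
print(render_fifo_depth(True))               # StrbWidth
print(render_fifo_depth(False))              # 32'd0
print(render_compute_enable(["transpose"]))  # '{transpose: 1'b1}
```

With more than one op the join yields a comma-separated SystemVerilog struct assignment pattern, one `name: 1'b1` pair per op.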


+  // full/empty token
+  always_ff @(posedge clk_i or negedge rst_ni) begin
+    if (!rst_ni || clear_i || exec_done) begin
+      full_q <= 2'b00;
+    end else begin


Verbatim copy of rtl/datamover_engine.sv from pulp-platform/datamover@d58a985 (branch cdurrer/konark), the transpose core of the Ratha HWPE.

Co-authored-by: Sergio Mazzola <smazzola@iis.ee.ethz.ch>
Co-authored-by: Cyrill Durrer <cdurrer@iis.ee.ethz.ch>

Rework the imported datamover_engine.sv to iDMA conventions: plain valid/ready with byte/strb ports, no hwpe_stream/hci dependencies, transpose only. Runtime element size (int8/fp16/fp32), element-granular edge strobe, ping-pong tile banks with a half-area FullDuplex=0 option, and a standalone DPI-C golden regression.
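
For orientation, an element-granular transpose reference can be written in a few lines. The sketch below is not the DPI-C golden model from this PR; it is a hypothetical Python equivalent that assumes a dense row-major source and destination and element sizes of 1, 2 or 4 bytes (int8/fp16/fp32).

```python
def transpose_bytes(src, rows, cols, elem_bytes):
    """Transpose a dense row-major rows x cols tensor stored as raw bytes.

    Assumptions: no padding between rows, and the destination is a dense
    row-major cols x rows tensor with the same element size.
    """
    if elem_bytes not in (1, 2, 4):
        raise ValueError("element size must be 1, 2 or 4 bytes")
    if len(src) != rows * cols * elem_bytes:
        raise ValueError("source length does not match the tensor shape")
    out = bytearray(len(src))
    for r in range(rows):
        for c in range(cols):
            s = (r * cols + c) * elem_bytes  # byte offset of src[r][c]
            d = (c * rows + r) * elem_bytes  # byte offset of dst[c][r]
            out[d:d + elem_bytes] = src[s:s + elem_bytes]
    return bytes(out)


# 2 x 3 tensor of one-byte elements: [[0, 1, 2], [3, 4, 5]]
print(list(transpose_bytes(bytes(range(6)), 2, 3, 1)))  # [0, 3, 1, 4, 2, 5]
```

Bytes inside an element keep their order; only whole elements move, which is what "element-granular" means here.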

compute_options_t carries {enable, op, params} in the request options; transpose_options_t packs the element mode and tensor shape; compute_enable_t is the compile-time per-op build gate.

idma_transpose_midend derives the NumDim=4 tiled walk (row / row-tile / col-tile) from the tensor shape and the bus StrbWidth, leaving the generic nd_midend to walk it; the geometry folds to shifts except one stride product. Guards the domain (StrbWidth >= 4, reserved mode, zero dims) and documents the tile-padded access contract; nd_midend asserts strides match the address width.
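
The exact strides the midend emits are not spelled out here, so the following Python sketch only illustrates the general shape of such a walk. It assumes square tiles whose edge is StrbWidth / element size elements, a tensor padded up to whole tiles, and a loop nest ordered row, row-tile, col-tile from innermost to outermost. All names are hypothetical.

```python
def tiled_walk(tensor_m, tensor_n, strb_width, elem_bytes):
    """Yield (row, first_col, length) row segments, one tile after another.

    Assumptions (not confirmed by the RTL): the tile edge is
    strb_width // elem_bytes elements and edge tiles are padded to full size.
    """
    if strb_width < 4:
        raise ValueError("StrbWidth must be at least 4")
    if tensor_m == 0 or tensor_n == 0:
        raise ValueError("tensor dimensions must be non-zero")
    if elem_bytes not in (1, 2, 4):
        raise ValueError("element size must be 1, 2 or 4 bytes")
    tile = strb_width // elem_bytes
    row_tiles = -(-tensor_m // tile)  # ceiling division
    col_tiles = -(-tensor_n // tile)
    for col_tile in range(col_tiles):        # outermost dimension
        for row_tile in range(row_tiles):
            for row in range(tile):          # rows inside one tile
                yield (row_tile * tile + row, col_tile * tile, tile)


# 5 x 3 tensor of 2-byte elements on an 8-byte bus: 4 x 4 tiles, 2 x 1 of them
segments = list(tiled_walk(5, 3, strb_width=8, elem_bytes=2))
print(len(segments))  # 8
```

Because the walk covers the padded 8 x 4 region rather than the 5 x 3 tensor, it touches addresses beyond the logical tensor, which is one way to read the tile-padded access contract mentioned above.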

idma_otf_compute latches the per-transfer compute options and runtime-selects one op per transfer; the AXI write manager gains an external strobe mask and a strobe-independent beat-done so edge tiles drain. Compute support is decided at generation time: IDMA_VIDMA_IDS entries (variant[:ops][:fd|hd]) render the seam, the per-op ComputeEnable set and the transpose duplex into the listed variants only; non-compute variants are untouched. The write-side FIFOs grow by a tile to clear the legalizer in-flight bound, and compute variants require NO_ERROR_HANDLING.
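
The entry format variant[:ops][:fd|hd] can be illustrated with a small parser. This is a sketch under assumptions, not the parser added to util/mario/util.py: the function name is hypothetical, multiple ops are assumed to be comma-separated, and an omitted field is reported as empty or None because the defaults are not stated here.

```python
DUPLEX_TAGS = ("fd", "hd")


def parse_compute_id(entry):
    """Split one 'variant[:ops][:fd|hd]' entry into (variant, ops, duplex)."""
    fields = entry.split(":")
    variant = fields[0]
    if not variant:
        raise ValueError("missing variant name in %r" % entry)
    rest = fields[1:]
    duplex = None
    # A trailing fd/hd tag is the duplex field, whether or not ops are given.
    if rest and rest[-1] in DUPLEX_TAGS:
        duplex = rest.pop()
    if len(rest) > 1:
        raise ValueError("too many fields in %r" % entry)
    # Assumption: several ops would be separated by commas.
    ops = tuple(op for op in rest[0].split(",") if op) if rest else ()
    return variant, ops, duplex


print(parse_compute_id("rw_axi:transpose:hd"))  # ('rw_axi', ('transpose',), 'hd')
print(parse_compute_id("rw_axi:fd"))            # ('rw_axi', (), 'fd')
print(parse_compute_id("rw_axi"))               # ('rw_axi', (), None)
```

Treating a trailing `fd` or `hd` as the duplex field resolves the ambiguity of an entry with a single optional field, at the cost of reserving those two strings as op names.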

Multi-tile aligned and edge transposes through the rw_axi backend, back-to-back geometry-leak checks, an nd_midend burst-address regression, a field-for-field midend unit test and launch_tf transpose options; the engine regression runs in both duplex modes.

Decode the transpose from spare DMCPY argb bits into opt.compute, expand NumDim to 4 with addr-width strides and splice the transpose midend between the request FIFO and the nd_midend, gated by a ComputeEnable parameter. Malformed requests (no hardware, reserved mode, zero dims, unaligned dst) get an error response and the backend's baked compute set is cross-checked at elaboration.

Standalone BFM harness driving the accelerator port: copy and transpose testbenches and a sweep covering all element sizes, tiling, edge, back-to-back, leak and reject cases, registered behind the snitch_cluster target; the flow regenerates the RTL before compiling.

lint-authors requires a blank line after the Authors block (the YAML folded header-regex carries a trailing newline) and the plural "Authors:" tag. Normalize the new transpose/snitch files accordingly.

Copilot AI review requested due to automatic review settings June 10, 2026 14:14

DanielKellerM requested review from micprog and thommythomaso as code owners June 10, 2026 14:14

Copilot started reviewing on behalf of DanielKellerM June 10, 2026 14:14

DanielKellerM marked this pull request as draft June 10, 2026 14:18

Copilot AI reviewed Jun 10, 2026

DanielKellerM mentioned this pull request Jun 10, 2026

build: Add per-top trimmed vsim compile scripts #116

Merged

FrancescoConti and others added 9 commits June 11, 2026 16:50

idma_pkg: Add the per-transfer compute request model

c55a1ed

doc: Add the transpose engine routing plan

a811cf8

DanielKellerM force-pushed the systems/snitch-integration branch from 30bf0a1 to 854427a Compare June 11, 2026 14:52

test: Add Solderpad license headers to transpose files

1cccfcb

DanielKellerM wants to merge 10 commits into pulp-platform:devel from DanielKellerM:systems/snitch-integration


3 participants
