The frame is the harness. The trace is what lets the next agent understand the run.
Agents do not need more autonomy before they have inspectable traces.
A local-first Rust library and CLI for AI agent workflow traces.
Development status: local MVP. Traceframe is public, installable from source, and verified by CI, but the trace schema and CLI are still intentionally narrow. Expect breaking schema/CLI changes while the project is tested against real agent workflows. Use it first for local harness inspection, examples, and failure analysis.
Traceframe records what an AI agent actually did: model calls, tool calls, permission decisions, command results, errors, final state, and the order in which those things happened. It is a small Rust crate and CLI for local harness engineering, not a SaaS dashboard.
A serious agent harness has multiple layers:
- Runtime and sandbox controls: the agent harness, containers, OS confinement, and native approval modes.
- Policy decision gateways: a policy layer that decides whether an agent may perform a capability.
- Trace capture: the ordered run artifact showing what happened, which decisions were made, which tools ran, what failed, and how the run ended.
- Review and conversion: humans or follow-up agents turn failed traces into policies, tests, evals, or workflow fixes.
- Export surfaces: OpenTelemetry, dashboards, issue reports, PR comments, or HTML summaries.
Traceframe owns the third layer, trace capture. It does not try to own the whole stack.
A policy layer answers:
What is this agent allowed to do?
Traceframe answers:
What did this agent actually do, and why did it fail?
Agent failures are often hard to review after the fact. A transcript is not a trace. A shell log is not a trace. A permission decision alone does not explain the full episode around it.
The fix is a durable artifact: an agent should leave an ordered record of what happened, one that another agent or a human can inspect after the run. Traceframe is deliberately narrow: it records the run, and nothing more.
Traceframe takes a narrow stance:
- Local-first. A trace is a file you can inspect, diff, archive, attach to an issue, or hand to another agent.
- Append-only trace files. Each event is one durable record. Partial writes are recoverable and agent-readable.
- Harness-oriented. Events are about runs, model calls, tool calls, permission decisions, errors, and final state.
- No SaaS dependency. Dashboards and OpenTelemetry can come later as export surfaces, not as the core contract.
- Useful failure artifacts. A failed run should become a policy, test, eval, or workflow improvement.
From this repository:
cargo install --path .
traceframe run --file run.traceframe --run-id run-demo -- cargo test
traceframe summary --file run.traceframe
traceframe inspect --file run.traceframe
traceframe render --file run.traceframe --html traceframe.html
For longer workflows, keep a trace open and append events as the harness runs:
traceframe init --file workflow.traceframe --run-id run-demo
traceframe record --file workflow.traceframe --kind model.call --payload '{"provider":"openai","model":"gpt"}'
traceframe record --file workflow.traceframe --kind permission.decision --payload '{"capability":"fs.write:README.md","decision":"allow"}'
traceframe exec --file workflow.traceframe -- cargo test
traceframe finish --file workflow.traceframe --status success
traceframe verify --file workflow.traceframe
traceframe summary --file workflow.traceframe
traceframe inspect --file workflow.traceframe
traceframe render --file workflow.traceframe --html traceframe.html
The record command remains available for raw structured events. For day-to-day
harness use, the run, exec, and finish commands avoid hand-writing the common
tool.call, tool.result, and run.finished payloads. The summary, inspect, and
render commands also work on open traces, so interrupted agent runs can still
be reviewed.
Once runs accumulate under .traceframe/runs/, rebuild the local ledger. Omit
--file when you want run to use the default local run directory:
traceframe run --run-id run-demo -- cargo test
traceframe ledger rebuild
traceframe ledger list
traceframe ledger list --status failed
traceframe ledger show --run-id run-demo
The ledger is a derived catalog, not a database and not a second source of truth. If it is stale or deleted, rebuild it from the trace files.
For host hooks, ingest the JSON payload from stdin instead of wrapping each
command manually. Use --dir for per-session traces: traceframe derives the
run id from the payload's session, writes <dir>/<run_id>.traceframe, and
creates it on first use, so the wired command needs no --run-id or
--init-if-missing:
traceframe hook ingest \
--source generic \
--dir .traceframe/runs <<'JSON'
{"hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"cargo test"},"session_id":"host-session"}
JSON
--source is a free-form label the host chooses (default generic);
traceframe stores it verbatim and never names a specific harness.
To target one explicit trace file instead, pass --file (with
--init-if-missing for the first event). Pass exactly one of --file or
--dir.
To wire a host so it pipes hook payloads into hook ingest, use the idempotent
installer. It merges traceframe entries into a local hooks file (default
.agent/hooks.json), or prints a snippet to paste by hand when the host's
settings file is global or delicate:
traceframe hook install
traceframe hook install --print
The wired command is traceframe hook ingest --source generic --dir .traceframe/runs:
it derives the run id from the host session id and writes one
<run-id>.traceframe per session. To capture a real agent session end to end
(wire → run a real agent → verify → render), run
scripts/capture-session.sh (set AGENT_CMD to
launch your harness, or let it fall back to a traceframe run recording);
examples/agent-session.traceframe is a
real (sanitized) capture of one such session.
Once a run is recorded, audit it against capability/permission policy:
traceframe policy-check --file .traceframe/runs/agent-run-demo.traceframe
policy-check fails when a permission deny is never resolved by a later allow,
or when a sensitive public capability (git push / git.push) ran without a
recorded permission allow.
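The two rules above can be sketched in a few lines of Rust. This is an illustration only, not Traceframe's implementation: the `Event` enum, the `SENSITIVE` list, and `policy_findings` are hypothetical names, events are reduced to a capability string, and the sketch assumes that an allow must appear before the sensitive call it covers.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced view of trace events; the real schema is richer.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    /// A permission.decision event: capability plus allow (true) or deny (false).
    Permission { capability: String, allow: bool },
    /// A tool.call event, reduced to the capability it exercised.
    ToolCall { capability: String },
}

/// Capabilities treated as sensitive in this sketch (assumption).
const SENSITIVE: &[&str] = &["git.push"];

/// Returns one finding per rule violation; an empty list means the run is clean.
fn policy_findings(events: &[Event]) -> Vec<String> {
    let mut allowed: HashSet<&str> = HashSet::new();
    let mut open_denies: Vec<&str> = Vec::new();
    let mut findings = Vec::new();

    for event in events {
        match event {
            Event::Permission { capability, allow } => {
                let cap = capability.as_str();
                if *allow {
                    allowed.insert(cap);
                    // A later allow resolves earlier denies for the same capability.
                    open_denies.retain(|c| *c != cap);
                } else if !open_denies.contains(&cap) {
                    open_denies.push(cap);
                }
            }
            Event::ToolCall { capability } => {
                let cap = capability.as_str();
                let sensitive = SENSITIVE.iter().any(|s| *s == cap);
                // Assumption: the allow must have been recorded before the call.
                if sensitive && !allowed.contains(cap) {
                    findings.push(format!("sensitive capability ran without allow: {cap}"));
                }
            }
        }
    }
    // Denies still open at the end of the trace were never resolved.
    for cap in open_denies {
        findings.push(format!("unresolved deny: {cap}"));
    }
    findings
}
```

A gate built on this shape would exit non-zero whenever `policy_findings` returns a non-empty list.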
Use verify + policy-check as a gate in CI or a pre-push hook so a public
action is blocked when the run behind it isn't clean. See
docs/ci-gate.md and the example
.github/workflows/evidence-gate.yml.
See docs/hooks.md for the agent-hook pattern, the installer,
and the policy-check rules.
run_id: run-agent-demo
status: failed
events: 8
model_calls: 1
tool_calls: 1
tool_results: 1
permission_decisions: 2
errors: 1
duration_ms: 110
See examples/agent-run.traceframe
for a sample run with an allowed permission, a denied permission, a failed tool
result, and a final failed state.
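A summary like the one above is essentially a count of events by kind plus a duration. The sketch below shows that idea under stated assumptions: `summarize` is a hypothetical helper, events are reduced to `(kind, ts_ms)` pairs in trace order, and the duration is taken as last timestamp minus first, which may not be how Traceframe computes `duration_ms`.

```rust
use std::collections::BTreeMap;

/// Counts events by kind and derives a duration from the first and last
/// timestamps. Input is (kind, ts_ms) pairs in trace order (reduced shape).
fn summarize(events: &[(&str, u64)]) -> (BTreeMap<String, usize>, u64) {
    let mut counts = BTreeMap::new();
    for (kind, _) in events {
        *counts.entry(kind.to_string()).or_insert(0) += 1;
    }
    // Assumption: duration is the span between the first and last event.
    let duration_ms = match (events.first(), events.last()) {
        (Some((_, start)), Some((_, end))) => end.saturating_sub(*start),
        _ => 0,
    };
    (counts, duration_ms)
}
```

Fed eight events shaped like the sample run (one run.started, one model.call, two permission.decision, one tool.call, one tool.result, one error, one run.finished), the counts add up to the `events: 8` line shown above.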
Rust harnesses can write traces directly with TraceRecorder:
use traceframe::trace::TraceRecorder;
let recorder = TraceRecorder::start(
".traceframe/runs/my-agent-run.traceframe",
"my-agent-run",
true,
)?;
recorder.model_call("openai", "gpt-5.5")?;
recorder.permission_decision("fs.write:README.md", "allow")?;
recorder.tool_call("shell", "cargo test", ["cargo", "test"])?;
recorder.tool_result("shell", "cargo test", true, Some(0), Some(320))?;
recorder.finish("success", Some("harness completed"))?;
See docs/harness-integration.md and
examples/harness-recorder.rs.
v0.1 supports one run per trace file. The public contract is the event model; the current local file encoding is line-delimited JSON for simple append, inspection, and recovery.
Required event fields:
- version
- run_id
- event_id
- kind
- ts_ms
- seq
- payload
Supported event kinds:
- run.started
- model.call
- tool.call
- tool.result
- permission.decision
- error
- run.finished
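To make the field list concrete, here is a minimal sketch of one event serialized as a single JSON line using only the standard library. The field names and event kinds come from the lists above; everything else is an assumption for illustration, including the `Event` struct, the Rust field types, the field order, and the `escape` and `to_json_line` helpers. The real crate may encode events differently.

```rust
/// Event kinds listed above.
const SUPPORTED_KINDS: [&str; 7] = [
    "run.started", "model.call", "tool.call", "tool.result",
    "permission.decision", "error", "run.finished",
];

/// Hypothetical in-memory shape for one event; field types are assumptions.
struct Event {
    version: u32,
    run_id: String,
    event_id: String,
    kind: String,
    ts_ms: u64,
    seq: u64,
    /// Payload kept as an already-serialized JSON object.
    payload_json: String,
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one event as one newline-terminated JSON line, rejecting unknown kinds.
fn to_json_line(e: &Event) -> Result<String, String> {
    if !SUPPORTED_KINDS.iter().any(|k| *k == e.kind.as_str()) {
        return Err(format!("unsupported event kind: {}", e.kind));
    }
    Ok(format!(
        "{{\"version\":{},\"run_id\":\"{}\",\"event_id\":\"{}\",\"kind\":\"{}\",\"ts_ms\":{},\"seq\":{},\"payload\":{}}}\n",
        e.version,
        escape(&e.run_id),
        escape(&e.event_id),
        escape(&e.kind),
        e.ts_ms,
        e.seq,
        e.payload_json,
    ))
}
```

Keeping each event on its own newline-terminated line is what makes appending cheap and lets a reader process the file record by record.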
Traceframe's CLI is designed for both humans and agents:
- commands print stable, aligned summaries;
- run creates, records, and closes a command trace in one step;
- wrapped command stdout/stderr is preserved;
- exec returns the wrapped command's exit code;
- hook ingest lets any agent host append tool, result, permission, and error events from stdin;
- hook install idempotently wires a local hooks file (or prints a snippet to paste by hand) so a host pipes payloads into hook ingest;
- policy-check audits a trace for unresolved permission denies and sensitive capabilities (git push) that ran without a recorded allow;
- command traces include argv, exit code, duration, byte counts, and bounded stdout/stderr previews;
- open traces can still be summarized, inspected, and rendered;
- ledger rebuild/list/show gives agents a stable catalog once many local runs exist;
- the raw trace file remains the source of truth when an agent needs to inspect or pass the run evidence to another step.
Every public change should pass the same gate that CI runs:
cargo fmt --check
cargo clippy -- -D warnings
cargo test
cargo llvm-cov --workspace --all-targets --fail-under-lines 80
sh scripts/check-release-readiness.sh
sh scripts/host-smoke.sh
sh scripts/hook-smoke.sh
The 80% line-coverage threshold is intentionally modest for v0.1, but it is a floor, not a target. New command behavior should come with focused tests before it is treated as part of the tool.
scripts/host-smoke.sh is the deeper dogfood path. It creates real success,
failed, manual, and open traces in a temporary workspace, renders HTML, rebuilds
the ledger, filters by status, and verifies the Rust harness example.
scripts/hook-smoke.sh separately simulates the host-hook path used by generic
agent workflows.
Traceframe stores traces as local append-only files. The trace file is the source of truth. The current implementation uses line-delimited JSON because it is simple to append, inspect, diff, and recover from. A database may be added later as a derived local index, but not as the primary record of what the agent did.
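One reason line-delimited records are easy to recover from is that an interrupted write can only damage the last line. The sketch below illustrates that property; it is not Traceframe's recovery code, and `split_complete_records` is a hypothetical helper that treats a record as complete only when it ends with a newline.

```rust
/// Splits raw trace text into complete records and a trailing partial record.
/// A record counts as complete only when it is terminated by a newline.
fn split_complete_records(raw: &str) -> (Vec<&str>, Option<&str>) {
    let mut records = Vec::new();
    let mut rest = raw;
    while let Some(pos) = rest.find('\n') {
        let line = &rest[..pos];
        // Skip blank lines; keep every other newline-terminated record.
        if !line.trim().is_empty() {
            records.push(line);
        }
        rest = &rest[pos + 1..];
    }
    // Whatever follows the last newline is an unfinished write, if anything.
    let partial = if rest.is_empty() { None } else { Some(rest) };
    (records, partial)
}
```

A reader built this way can still summarize every complete event in an interrupted run and report the truncated tail separately instead of failing on it.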
The run ledger is the first derived storage layer. It catalogs local trace files
for discovery, filtering, and handoff, but it is intentionally rebuildable from
.traceframe/runs/*.traceframe.
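Because the ledger is derived, rebuilding it amounts to rescanning the run directory. The following sketch shows only the discovery step, under the assumption that each file stem is the run id (as in the `<dir>/<run_id>.traceframe` layout described earlier). `list_run_ids` is a hypothetical helper, and a real ledger entry would also carry fields such as status, which this sketch omits.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists run ids found in a directory by scanning for *.traceframe files.
/// Assumption: the file stem is the run id.
fn list_run_ids(dir: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("traceframe") {
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                ids.push(stem.to_string());
            }
        }
    }
    // Sort so the catalog order does not depend on directory iteration order.
    ids.sort();
    Ok(ids)
}
```

Since nothing here is stored outside the trace files, deleting the catalog loses no information; it can always be recomputed.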
See docs/storage.md for the storage decision record and
tradeoffs.
This repo ships an agent-facing skill at
skills/traceframe. Install or copy it into your agent
harness's skill directory when agents should know the correct Traceframe
operating contract, commands, and release gate.
Traceframe deliberately starts with one contract: capture a local, ordered, inspectable record of an agent run. Runtime control, permission policy, dashboards, OpenTelemetry export, eval suites, and prompt management can connect around that trace contract, but they should not define v0.1.
Traceframe is pre-1.0. While the project is in local-MVP/alpha territory, breaking changes to the event schema, CLI flags, or output contracts may happen without a major version bump. The short-term goal is not feature breadth; it is to prove that the local trace contract is useful inside real agent workflows.
- Agents are primary operators; humans are reviewers and operators.
- A trace must help explain a real run, not decorate a dashboard.
- A failed run should become a test, policy, eval, or workflow improvement.
- Local trace evidence comes before SaaS.
- Export surfaces come after the core trace contract is useful.