Arakiss/traceframe

traceframe - inspectable traces for AI agent workflows

The frame is the harness. The trace is what lets the next agent understand the run.

License: MIT | Rust 1.94+ | Trace: local file | Status: local MVP

traceframe

Agents do not need more autonomy before they have inspectable traces.

A local-first Rust library and CLI for AI agent workflow traces.

Development status: local MVP. Traceframe is public, installable from source, and verified by CI, but the trace schema and CLI are still intentionally narrow. Expect breaking schema/CLI changes while the project is tested against real agent workflows. Use it first for local harness inspection, examples, and failure analysis.

Traceframe records what an AI agent actually did: model calls, tool calls, permission decisions, command results, errors, final state, and the order in which those things happened. It is a small Rust crate and CLI for local harness engineering, not a SaaS dashboard.

Where it fits

A serious agent harness has multiple layers:

  1. Runtime and sandbox controls: the agent harness, containers, OS confinement, and native approval modes.
  2. Policy decision gateways: a policy layer that decides whether an agent may perform a capability.
  3. Trace capture: the ordered run artifact showing what happened, which decisions were made, which tools ran, what failed, and how the run ended.
  4. Review and conversion: humans or follow-up agents turn failed traces into policies, tests, evals, or workflow fixes.
  5. Export surfaces: OpenTelemetry, dashboards, issue reports, PR comments, or HTML summaries.

Traceframe owns layer 3. It does not try to own the whole stack.

A policy layer answers:

What is this agent allowed to do?

Traceframe answers:

What did this agent actually do, and why did it fail?

Why

Agent failures are often hard to review after the fact. A transcript is not a trace. A shell log is not a trace. A permission decision alone does not explain the full episode around it.

The fix is a durable artifact: an agent should leave an ordered record of what happened, one that another agent or a human can inspect after the run.

Traceframe takes a narrow stance:

  • Local-first. A trace is a file you can inspect, diff, archive, attach to an issue, or hand to another agent.
  • Append-only trace files. Each event is one durable record. Partial writes are recoverable and agent-readable.
  • Harness-oriented. Events are about runs, model calls, tool calls, permission decisions, errors, and final state.
  • No SaaS dependency. Dashboards and OpenTelemetry can come later as export surfaces, not as the core contract.
  • Useful failure artifacts. A failed run should become a policy, test, eval, or workflow improvement.

Install

From this repository:

cargo install --path .

Quick start

traceframe run --file run.traceframe --run-id run-demo -- cargo test
traceframe summary --file run.traceframe
traceframe inspect --file run.traceframe
traceframe render --file run.traceframe --html traceframe.html

For longer workflows, keep a trace open and append events as the harness runs:

traceframe init --file workflow.traceframe --run-id run-demo
traceframe record --file workflow.traceframe --kind model.call --payload '{"provider":"openai","model":"gpt"}'
traceframe record --file workflow.traceframe --kind permission.decision --payload '{"capability":"fs.write:README.md","decision":"allow"}'
traceframe exec --file workflow.traceframe -- cargo test
traceframe finish --file workflow.traceframe --status success
traceframe verify --file workflow.traceframe
traceframe summary --file workflow.traceframe
traceframe inspect --file workflow.traceframe
traceframe render --file workflow.traceframe --html traceframe.html

record remains available for raw structured events. For day-to-day harness use, run, exec, and finish avoid hand-writing the common tool.call, tool.result, and run.finished payloads. summary, inspect, and render also work on open traces so interrupted agent runs can still be reviewed.

Omit --file when you want run to write to the default local run directory. Once runs accumulate under .traceframe/runs/, rebuild the local ledger and query it:

traceframe run --run-id run-demo -- cargo test
traceframe ledger rebuild
traceframe ledger list
traceframe ledger list --status failed
traceframe ledger show --run-id run-demo

The ledger is a derived catalog, not a database and not a second source of truth. If it is stale or deleted, rebuild it from the trace files.

For host hooks, ingest the JSON payload from stdin instead of wrapping each command manually. Use --dir for per-session traces: traceframe derives the run id from the payload's session, writes <dir>/<run_id>.traceframe, and creates it on first use, so the wired command needs no --run-id or --init-if-missing:

traceframe hook ingest \
  --source generic \
  --dir .traceframe/runs <<'JSON'
{"hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"cargo test"},"session_id":"host-session"}
JSON

--source is a free-form label the host chooses (default generic); traceframe stores it verbatim and never names a specific harness.

To target one explicit trace file instead, pass --file (with --init-if-missing for the first event). Pass exactly one of --file or --dir.
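
The per-session layout described above (<dir>/<run_id>.traceframe) can be sketched in a few lines of Rust. This is a minimal illustration, not traceframe's implementation: trace_path_for_session is a hypothetical helper, and both the character filter and the choice to reuse the session id as the run id are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `<dir>/<run_id>.traceframe` for one host session.
/// Assumption: the run id is the session id with unusual characters replaced by
/// `_`. The real derivation rule inside traceframe may differ.
fn trace_path_for_session(dir: &Path, session_id: &str) -> PathBuf {
    let run_id: String = session_id
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '-' || c == '_' {
                c
            } else {
                '_'
            }
        })
        .collect();
    dir.join(format!("{run_id}.traceframe"))
}
```

With the payload shown earlier, a session id of host-session would map to .traceframe/runs/host-session.traceframe under this sketch, so every hook event from that session lands in the same file.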

To wire a host so it pipes hook payloads into hook ingest, use the idempotent installer. It merges traceframe entries into a local hooks file (default .agent/hooks.json), or prints a snippet to paste by hand when the host's settings file is global or delicate:

traceframe hook install
traceframe hook install --print

The wired command is traceframe hook ingest --source generic --dir .traceframe/runs. It derives the run id from the host session id and writes one <run_id>.traceframe per session.

To capture a real agent session end to end (wire → run a real agent → verify → render), run scripts/capture-session.sh. Set AGENT_CMD to launch your harness, or let the script fall back to a traceframe run recording. examples/agent-session.traceframe is a real (sanitized) capture of one such session.

Once a run is recorded, audit it against capability/permission policy:

traceframe policy-check --file .traceframe/runs/agent-run-demo.traceframe

policy-check fails when a permission deny is never resolved by a later allow, or when a sensitive public capability (git push / git.push) ran without a recorded permission allow.
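
The two rules above can be sketched over a simplified in-memory event list. This is an illustration under stated assumptions, not the policy-check implementation: the Event enum and policy_violations are hypothetical, a deny is treated as resolved only by a later allow for the same capability, and a sensitive capability needs an allow recorded earlier in the trace.

```rust
/// Simplified event for illustration; real traceframe events carry more fields.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Permission { capability: String, allow: bool },
    ToolCall { capability: String },
}

/// Returns one message per violation of the two rules described above.
fn policy_violations(events: &[Event], sensitive: &[&str]) -> Vec<String> {
    let mut violations = Vec::new();
    for (i, event) in events.iter().enumerate() {
        match event {
            // Rule 1: a deny must be followed by an allow for the same capability.
            Event::Permission { capability, allow: false } => {
                let resolved = events[i + 1..].iter().any(|later| {
                    matches!(later, Event::Permission { capability: c, allow: true } if c == capability)
                });
                if !resolved {
                    violations.push(format!("unresolved deny: {capability}"));
                }
            }
            // Rule 2: a sensitive capability needs an allow recorded before it ran.
            Event::ToolCall { capability } if sensitive.contains(&capability.as_str()) => {
                let allowed = events[..i].iter().any(|earlier| {
                    matches!(earlier, Event::Permission { capability: c, allow: true } if c == capability)
                });
                if !allowed {
                    violations.push(format!("no recorded allow: {capability}"));
                }
            }
            _ => {}
        }
    }
    violations
}
```

An empty result means the trace is clean under these two rules. Any other result would correspond to a failing check.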

Use verify + policy-check as a gate in CI or a pre-push hook so a public action is blocked when the run behind it isn't clean. See docs/ci-gate.md and the example .github/workflows/evidence-gate.yml.
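
As a rough sketch of such a gate written in Rust rather than shell: run verify first, then policy-check, and block on the first failure. Only the two documented subcommands and their --file flag are used. evidence_gate and run_traceframe are hypothetical names, and the sketch assumes both commands exit non-zero when the trace is not clean.

```rust
use std::process::Command;

/// Runs `traceframe <args>` and reports whether it exited successfully.
/// Assumption: a failed verify or policy-check exits with a non-zero status.
fn run_traceframe(args: &[&str]) -> bool {
    Command::new("traceframe")
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false) // a missing binary counts as a failed gate
}

/// Hypothetical gate: the trace passes only if verify and then policy-check
/// both succeed. The runner is injected so the logic can be tested without
/// the CLI installed; pass `run_traceframe` for real use.
fn evidence_gate(trace_file: &str, mut run: impl FnMut(&[&str]) -> bool) -> bool {
    run(&["verify", "--file", trace_file]) && run(&["policy-check", "--file", trace_file])
}
```

In a pre-push hook, a wrapper binary would call evidence_gate with run_traceframe and exit non-zero when it returns false. Because of the short-circuit, policy-check is skipped when verify already failed.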

See docs/hooks.md for the agent-hook pattern, the installer, and the policy-check rules.

Example output

run_id: run-agent-demo
status: failed
events: 8
model_calls: 1
tool_calls: 1
tool_results: 1
permission_decisions: 2
errors: 1
duration_ms: 110

See examples/agent-run.traceframe for a sample run with an allowed permission, a denied permission, a failed tool result, and a final failed state.
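
The counters in that summary are tallies of event kinds. A minimal sketch of the counting step follows; count_kinds is a hypothetical helper, and the eight-event breakdown in the usage note is an assumed composition that happens to match the totals above, not a dump of the example file.

```rust
use std::collections::BTreeMap;

/// Counts how many times each event kind appears in a run.
/// The input is the `kind` field of each event, in trace order.
fn count_kinds<'a>(kinds: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for kind in kinds {
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```

For example, a run with one run.started, one model.call, two permission.decision, one tool.call, one tool.result, one error, and one run.finished event yields eight events in total, with permission.decision counted twice.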

Library API

Rust harnesses can write traces directly with TraceRecorder:

use traceframe::trace::TraceRecorder;

let recorder = TraceRecorder::start(
    ".traceframe/runs/my-agent-run.traceframe",
    "my-agent-run",
    true,
)?;

recorder.model_call("openai", "gpt-5.5")?;
recorder.permission_decision("fs.write:README.md", "allow")?;
recorder.tool_call("shell", "cargo test", ["cargo", "test"])?;
recorder.tool_result("shell", "cargo test", true, Some(0), Some(320))?;
recorder.finish("success", Some("harness completed"))?;

See docs/harness-integration.md and examples/harness-recorder.rs.

Event model

v0.1 supports one run per trace file. The public contract is the event model; the current local file encoding is line-delimited JSON for simple append, inspection, and recovery.

Required event fields:

  • version
  • run_id
  • event_id
  • kind
  • ts_ms
  • seq
  • payload

Supported event kinds:

  • run.started
  • model.call
  • tool.call
  • tool.result
  • permission.decision
  • error
  • run.finished
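
For readers who think in types, the required fields and supported kinds above could be modelled roughly as follows. The field names and kind strings come from the lists above. The Rust field types, the EventKind enum, and the choice to hold the payload as a raw JSON string are assumptions for illustration, not the crate's actual definitions.

```rust
/// The seven supported event kinds, mapped to their wire names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    RunStarted,
    ModelCall,
    ToolCall,
    ToolResult,
    PermissionDecision,
    Error,
    RunFinished,
}

impl EventKind {
    const ALL: [EventKind; 7] = [
        EventKind::RunStarted,
        EventKind::ModelCall,
        EventKind::ToolCall,
        EventKind::ToolResult,
        EventKind::PermissionDecision,
        EventKind::Error,
        EventKind::RunFinished,
    ];

    fn as_str(self) -> &'static str {
        match self {
            EventKind::RunStarted => "run.started",
            EventKind::ModelCall => "model.call",
            EventKind::ToolCall => "tool.call",
            EventKind::ToolResult => "tool.result",
            EventKind::PermissionDecision => "permission.decision",
            EventKind::Error => "error",
            EventKind::RunFinished => "run.finished",
        }
    }

    /// Returns None for any kind outside the supported list.
    fn parse(name: &str) -> Option<EventKind> {
        Self::ALL.iter().copied().find(|kind| kind.as_str() == name)
    }
}

/// One trace event. All field types here are assumed.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    version: String,
    run_id: String,
    event_id: String,
    kind: EventKind,
    ts_ms: u64,      // timestamp in milliseconds
    seq: u64,        // position of the event within the run
    payload: String, // raw JSON text for the kind-specific payload
}
```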

CLI experience

Traceframe's CLI is designed for both humans and agents:

  • commands print stable, aligned summaries;
  • run creates, records, and closes a command trace in one step;
  • wrapped command stdout/stderr is preserved;
  • exec returns the wrapped command's exit code;
  • hook ingest lets any agent host append tool, result, permission, and error events from stdin;
  • hook install idempotently wires a local hooks file (or prints a snippet to paste by hand) so a host pipes payloads into hook ingest;
  • policy-check audits a trace for unresolved permission denies and sensitive capabilities (git push) that ran without a recorded allow;
  • command traces include argv, exit code, duration, byte counts, and bounded stdout/stderr previews;
  • open traces can still be summarized, inspected, and rendered;
  • ledger rebuild/list/show gives agents a stable catalog once many local runs exist;
  • the raw trace file remains the source of truth when an agent needs to inspect or pass the run evidence to another step.
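
One way to picture the bounded stdout/stderr previews mentioned above is a byte-limited prefix that never splits a UTF-8 character. bounded_preview is a hypothetical helper and the limit is a parameter here; the source does not state the bound traceframe actually uses.

```rust
/// Returns at most `max_bytes` bytes from the start of `output`,
/// backing off to the nearest character boundary so the slice stays valid UTF-8.
fn bounded_preview(output: &str, max_bytes: usize) -> &str {
    if output.len() <= max_bytes {
        return output;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Recording the full byte count next to the preview lets a reviewer see that output was truncated without storing all of it in the trace.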

Quality gate

Every public change should pass the same gate that CI runs:

cargo fmt --check
cargo clippy -- -D warnings
cargo test
cargo llvm-cov --workspace --all-targets --fail-under-lines 80
sh scripts/check-release-readiness.sh
sh scripts/host-smoke.sh
sh scripts/hook-smoke.sh

The 80% line-coverage threshold is intentionally modest for v0.1, but it is a floor, not a target. New command behavior should come with focused tests before it is treated as part of the tool.

scripts/host-smoke.sh is the deeper dogfood path. It creates real success, failed, manual, and open traces in a temporary workspace, renders HTML, rebuilds the ledger, filters by status, and verifies the Rust harness example. scripts/hook-smoke.sh separately simulates the host-hook path used by generic agent workflows.

Storage model

Traceframe stores traces as local append-only files. The trace file is the source of truth. The current implementation uses line-delimited JSON because it is simple to append, inspect, diff, and recover from. A database may be added later as a derived local index, but not as the primary record of what the agent did.
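
The append and recovery properties of line-delimited files can be sketched with the standard library alone. This is an illustration, not traceframe's code: append_record and complete_records are hypothetical, and treating an unterminated final line as a partial write is an assumed recovery rule.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one JSON record as a single newline-terminated line.
/// The file is opened in append mode, so existing records are never rewritten.
fn append_record(path: &Path, json_line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format!("{json_line}\n").as_bytes())
}

/// Returns only the records that were fully written.
/// Assumption: a final line with no trailing newline is an interrupted write.
fn complete_records(contents: &str) -> Vec<&str> {
    let mut records: Vec<&str> = contents.split('\n').collect();
    // The last element is either empty (clean ending) or a truncated record.
    records.pop();
    records
        .into_iter()
        .filter(|line| !line.trim().is_empty())
        .collect()
}
```

Under this rule an interrupted run loses at most its last event, and everything before it stays readable by a human or a follow-up agent.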

The run ledger is the first derived storage layer. It catalogs local trace files for discovery, filtering, and handoff, but it is intentionally rebuildable from .traceframe/runs/*.traceframe.
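
Because the ledger is rebuildable, its first step amounts to discovering trace files on disk. A minimal sketch of that discovery step follows; discover_traces is a hypothetical function, and what the real ledger rebuild reads from each file is not shown here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists every `*.traceframe` file directly inside `runs_dir`, sorted by path.
/// A catalog built from this list can always be thrown away and rebuilt.
fn discover_traces(runs_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut traces = Vec::new();
    for entry in fs::read_dir(runs_dir)? {
        let path = entry?.path();
        let is_trace = path.extension().and_then(|ext| ext.to_str()) == Some("traceframe");
        if path.is_file() && is_trace {
            traces.push(path);
        }
    }
    traces.sort();
    Ok(traces)
}
```

Sorting keeps the catalog order stable across rebuilds, which makes a rebuilt ledger easy to diff against the previous one.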

See docs/storage.md for the storage decision record and tradeoffs.

Agent Skill

This repo ships an agent-facing skill at skills/traceframe. Install or copy it into your agent harness's skill directory when agents should know the correct Traceframe operating contract, commands, and release gate.

Product boundaries

Traceframe deliberately starts with one contract: capture a local, ordered, inspectable record of an agent run. Runtime control, permission policy, dashboards, OpenTelemetry export, eval suites, and prompt management can connect around that trace contract, but they should not define v0.1.

Versioning and changelog

Traceframe is pre-1.0. While the project is in local-MVP/alpha territory, breaking changes to the event schema, CLI flags, or output contracts may happen without a major version bump. The short-term goal is not feature breadth; it is to prove that the local trace contract is useful inside real agent workflows.

Design principles

  • Agents are primary operators; humans are reviewers and operators.
  • A trace must help explain a real run, not decorate a dashboard.
  • A failed run should become a test, policy, eval, or workflow improvement.
  • Local trace evidence comes before SaaS.
  • Export surfaces come after the core trace contract is useful.
