MartinLoop

MartinLoop gives AI coding agents budgets, stop conditions, rollback rules, and receipts.

Built from thousands of agent runs where the problem was not intelligence -- it was uncontrolled execution.

Get started: npx -y martin-loop@latest start
Try the demo: npx -y martin-loop@latest demo

MartinLoop is part of the NVIDIA Inception program.

Why MartinLoop

AI coding agents are useful, but unbounded retry loops are expensive.

A task that looked like a small fix can become dozens of attempts, a blown token budget, and a diff nobody trusts. MartinLoop gives every run an explicit contract: objective, verifier, budget, scope, receipts, and a clear stop condition.

Use it when AI coding work needs to stay bounded, inspectable, and safe to review before it becomes expensive or destructive.

Why Teams Adopt MartinLoop

  • It turns agent behavior into inspectable run receipts you can actually review.
  • It enforces hard stop conditions before runaway retries spend more money.
  • It adds rollback-aware rules so failed attempts do not silently leave unsafe changes behind.
  • It helps teams compare outcomes across agents under one governed flow.

Teams use MartinLoop when they need governed agent execution that can be reviewed and trusted.

2-Minute Install Path

npx -y martin-loop@latest start
npx -y martin-loop@latest demo
cd martin-loop-demo
npm install
npx -y martin-loop@latest run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test"

Quick Start

Try MartinLoop in a disposable demo workspace:

npx -y martin-loop@latest start
npx -y martin-loop@latest demo
npx -y martin-loop@latest --version
cd martin-loop-demo
npm install
npx -y martin-loop@latest run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test"
npx -y martin-loop@latest dossier --latest
npx -y martin-loop@latest share --latest

Optional global install:

npm install -g martin-loop
martin-loop --version

If this flow is useful, open an issue with feedback so we can keep improving the public experience.

The start command prints the first-run guided path. The run command automatically checks doctor, session-start, and preflight first, then executes once the environment is ready. You can still run those commands directly when you want to inspect the governed checks before anything executes.

Inspect-first flow:

npx -y martin-loop@latest doctor
npx -y martin-loop@latest session-start
npx -y martin-loop@latest preflight "Summarize the demo workspace and prove tests still pass" --verify "npm test"

share --latest writes three files into the selected run directory under share/: run-receipt.json, run-receipt.md, and proof-card.svg.

Release notes for the current root package: MartinLoop 0.3.6.

Visual Proof

MartinLoop turns an AI coding run into an inspectable execution record: budget used, verifier result, changed files, rollback evidence, and final receipt.

[Image: MartinLoop CLI showing a governed agent run]

Ungoverned agents can retry until cost and scope drift. MartinLoop adds budget caps, verifier gates, and audit evidence so the run has a clear stop condition.

[Image: MartinLoop governed run compared with an unbounded retry loop]

Proof Receipts

Proof receipts are local share bundles for governed AI coding runs. They show the task, spend, budget, verifier result, receipt integrity, and any evidence boundary that should not be rounded into confidence.

This real governed run spent $0.51 against a $3.00 budget. The verifier passed and the receipt integrity was signed, but the proof stayed at EVIDENCE_BOUNDARY because rollback evidence was not recorded.

[Image: MartinLoop CLI proof receipt for a governed run with spend, budget, verifier, integrity, and evidence boundary]

Generate your own receipt after a governed run:

npx -y martin-loop@latest run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test"
npx -y martin-loop@latest runs verify --latest
npx -y martin-loop@latest share --latest

Example receipt files: Markdown and JSON.

Run This Audit Yourself

Use this lane from a clean temp directory to verify the public CLI flow exactly as shipped:

npx -y martin-loop@0.3.6 --version
npx -y martin-loop@0.3.6 start
npx -y martin-loop@0.3.6 demo
cd martin-loop-demo
npm install
npx -y martin-loop@0.3.6 run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" --json
npx -y martin-loop@0.3.6 dossier --latest --json
npx -y martin-loop@0.3.6 share --latest --json

For deterministic installs, pin an exact version (martin-loop@0.3.6). Use martin-loop@latest when you want the newest published release instead. Plain npx martin-loop can resolve a stale local cache on some machines.

Expected share bundle outputs:

  • share/run-receipt.json
  • share/run-receipt.md
  • share/proof-card.svg
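
If you want to script that check, a minimal sketch could look like the following. The three file names come from the list above; the helper name `missingShareFiles` and the assumption that the bundle sits in a `share` folder directly under the run directory are illustrative, not part of the MartinLoop API.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// File names taken from the expected-outputs list; the layout is an assumption.
const EXPECTED_SHARE_FILES = [
  "run-receipt.json",
  "run-receipt.md",
  "proof-card.svg",
] as const;

// Hypothetical helper: returns the expected files missing from <runDir>/share.
function missingShareFiles(runDir: string): string[] {
  const shareDir = join(runDir, "share");
  return EXPECTED_SHARE_FILES.filter((name) => !existsSync(join(shareDir, name)));
}

// An empty array means all three files were found.
const missing = missingShareFiles(process.cwd());
console.log(missing.length === 0 ? "share bundle complete" : `missing: ${missing.join(", ")}`);
```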

See It In Action

The point is not that every governed run is always cheaper. The point is that every run becomes inspectable and enforceable: budget policy, verifier result, stop reason, and evidence are explicit.

For a deterministic public repro lane, use the benchmark workspace and compare governed execution to unbounded retry behavior:

  • npx martin-loop bench --suite under-3-challenge
  • npx martin-loop bench --suite ralphy-engineering-50

Ralph-Style Loops

A Ralph-style loop is the failure mode where an AI coding agent keeps trying without knowing when continuing is unsafe, uneconomical, or unlikely to succeed.

MartinLoop keeps the useful part of the loop, then adds brakes:

  • stop before budget overspend
  • classify unsafe or invalid actions before execution
  • write an audit record for every attempt
  • preserve rollback and verifier evidence for review
  • reduce runaway context growth with compact run summaries
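
To make the brakes concrete, here is a minimal sketch of a bounded retry loop. It is an illustration under stated assumptions, not MartinLoop's implementation: `runBounded`, the stop-reason strings, and the flat per-attempt cost estimate are all hypothetical.

```typescript
// All names here are hypothetical; this only illustrates the idea of "brakes".
type StopReason = "verified" | "budget_exhausted" | "max_iterations";

interface Attempt {
  costUsd: number;
  verifierPassed: boolean;
}

interface AttemptRecord extends Attempt {
  attempt: number;
}

interface LoopResult {
  stopReason: StopReason;
  spentUsd: number;
  records: AttemptRecord[];
}

function runBounded(
  attemptFn: (attempt: number) => Attempt,
  limits: { maxUsd: number; maxIterations: number; estimatedCostPerAttemptUsd: number },
): LoopResult {
  const records: AttemptRecord[] = [];
  let spentUsd = 0;

  for (let attempt = 1; attempt <= limits.maxIterations; attempt++) {
    // Brake 1: stop before the next attempt could overspend the budget.
    if (spentUsd + limits.estimatedCostPerAttemptUsd > limits.maxUsd) {
      return { stopReason: "budget_exhausted", spentUsd, records };
    }

    const result = attemptFn(attempt);
    spentUsd += result.costUsd;

    // Brake 2: every attempt leaves an audit record, pass or fail.
    records.push({ attempt, ...result });

    // Brake 3: only a passing verifier counts as done.
    if (result.verifierPassed) {
      return { stopReason: "verified", spentUsd, records };
    }
  }
  return { stopReason: "max_iterations", spentUsd, records };
}

// Simulated agent: costs $0.50 per attempt and passes the verifier on attempt 3.
const outcome = runBounded(
  (n) => ({ costUsd: 0.5, verifierPassed: n === 3 }),
  { maxUsd: 3, maxIterations: 5, estimatedCostPerAttemptUsd: 0.5 },
);
console.log(outcome.stopReason, outcome.spentUsd, outcome.records.length);
```

The design point is that the loop always ends with an explicit stop reason, so a reviewer can tell a verified result from a run that simply ran out of budget or attempts.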

Failure Taxonomy (12 Runtime Classes)

Public governed runs use one canonical taxonomy: the 12 runtime FailureClass values from @martin/contracts.

See the canonical table: Failure Taxonomy (12 Runtime Classes).

What It Does

  • Budget caps stop the next attempt before a configured USD, token, or iteration limit is exceeded.
  • Verifier gates require a real check, such as npm test, before a run can count as complete.
  • Policy checks block unsafe verifier commands, risky path changes, and secret-like task inputs before execution.
  • Failure classification uses canonical runtime classes for triage and reporting. See Failure Taxonomy (12 Runtime Classes).
  • Run receipts capture stop reason, verifier evidence, budget posture, integrity state, and the next safe action.
  • martin share --latest turns the latest governed run into a local share bundle with a redacted JSON receipt, Markdown recap, and proof-card SVG.
  • MCP integration gives hosts one write-capable execution entrypoint plus richer planning, inspection, and review helpers.
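
As an illustration of the path side of those policy checks, the sketch below tests a write against allow and deny patterns. It is a simplified assumption of how such a check could work: `globToRegExp` and `isWriteAllowed` are hypothetical helpers, only `*` and `**` are supported, and the glob semantics MartinLoop actually applies may differ.

```typescript
// Hypothetical helper: converts a simple glob ("*" and "**" only) into a RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters other than "*".
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")    // "*" stays within one path segment
    .replace(/\u0000/g, ".*");  // "**" may cross path segments
  return new RegExp(`^${pattern}$`);
}

interface Scope {
  allow: string[];
  deny: string[];
}

// Deny patterns win. An empty allow list means "no allow restriction" (an assumption).
function isWriteAllowed(path: string, scope: Scope): boolean {
  if (scope.deny.some((glob) => globToRegExp(glob).test(path))) return false;
  if (scope.allow.length === 0) return true;
  return scope.allow.some((glob) => globToRegExp(glob).test(path));
}

const scope: Scope = { allow: ["src/**"], deny: ["src/secrets/**", "**/.env"] };
console.log(isWriteAllowed("src/auth/login.ts", scope));   // true
console.log(isWriteAllowed("src/secrets/key.pem", scope)); // false
console.log(isWriteAllowed("README.md", scope));           // false
```

Checking deny before allow keeps a broad allow pattern such as `src/**` from accidentally exposing a sensitive subdirectory.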

How It Works

| Layer | Purpose |
| --- | --- |
| Task contract | Objective, verifier plan, repo root, allowed paths, denied paths, acceptance criteria, workspace, project, and budget. |
| Policy and budget | Defaults come from martin.config.yaml; CLI flags can override them. Budget preflight blocks attempts that would exceed policy. |
| Agent adapters | Claude CLI, Codex CLI, Gemini CLI, direct-provider, and verifier-only adapters normalize execution results. |
| Safety and verification | Scope checks, verifier command checks, prompt integrity, and grounding decide whether work can continue. |
| Persistence | JSONL run records, evidence summaries, and repo-backed artifacts make every run inspectable later. Each loop record is locally signed (HMAC, per-runs-root key) and dossier/runs get/runs verify/challenge/badge report an integrity verdict (verified / tamper_detected / unsigned) so post-hoc edits to a record are detectable, not just inspectable. |
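
The signing idea in the Persistence layer can be sketched with Node's built-in crypto module. This is a minimal illustration, not MartinLoop's record format: the `SignedRecord` shape, the `sign` and `verify` helpers, and the choice of SHA-256 are assumptions (the text above only states that records are HMAC-signed with a per-runs-root key). The three verdict strings are the ones named above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type IntegrityVerdict = "verified" | "tamper_detected" | "unsigned";

// Hypothetical record shape: one serialized run record plus an optional signature.
interface SignedRecord {
  payload: string;    // for example, one JSONL line
  signature?: string; // hex-encoded HMAC; absent for unsigned records
}

function sign(payload: string, key: string): SignedRecord {
  const signature = createHmac("sha256", key).update(payload).digest("hex");
  return { payload, signature };
}

function verify(record: SignedRecord, key: string): IntegrityVerdict {
  if (!record.signature) return "unsigned";
  const expected = createHmac("sha256", key).update(record.payload).digest();
  const actual = Buffer.from(record.signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const ok = actual.length === expected.length && timingSafeEqual(actual, expected);
  return ok ? "verified" : "tamper_detected";
}

const record = sign('{"attempt":1,"verifierPassed":true}', "local-runs-root-key");
const edited = { ...record, payload: '{"attempt":1,"verifierPassed":false}' };

console.log(verify(record, "local-runs-root-key")); // verified
console.log(verify(edited, "local-runs-root-key")); // tamper_detected
console.log(verify({ payload: "{}" }, "local-runs-root-key")); // unsigned
```

Because the key is local, a check like this detects edits made after the record was written, but it does not prove who produced the record.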

Trust Boundaries

  • Cost and token outputs always include provenance (actual, estimated, or unavailable).
  • For Codex specifically, MartinLoop reports authoritative usage only when the host exposes it; otherwise MartinLoop labels usage as estimated and avoids presenting it as settled accounting.
  • Receipt integrity must be verified before a run is treated as trustworthy evidence for external review.
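
One way to keep provenance attached to a number is to carry it in the type. The sketch below is illustrative only: the three provenance labels come from the list above, while `UsageFigure` and `describeCost` are hypothetical names rather than part of the MartinLoop SDK.

```typescript
type Provenance = "actual" | "estimated" | "unavailable";

// Hypothetical shape: a cost figure that always travels with its provenance.
interface UsageFigure {
  provenance: Provenance;
  costUsd?: number; // omitted when provenance is "unavailable"
}

// Renders a cost so an estimate is never displayed like settled accounting.
function describeCost(usage: UsageFigure): string {
  if (usage.provenance === "unavailable" || usage.costUsd === undefined) {
    return "cost unavailable";
  }
  const amount = `$${usage.costUsd.toFixed(2)}`;
  return usage.provenance === "actual" ? `${amount} (actual)` : `~${amount} (estimated)`;
}

console.log(describeCost({ provenance: "actual", costUsd: 0.51 }));   // $0.51 (actual)
console.log(describeCost({ provenance: "estimated", costUsd: 1.2 })); // ~$1.20 (estimated)
console.log(describeCost({ provenance: "unavailable" }));             // cost unavailable
```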

CLI

martin-loop doctor
martin-loop demo
martin-loop session-start [--host <claude|codex|gemini|generic>]
martin-loop phase status|contract|session-start|preflight|run [--execute]
martin-loop preflight <objective> [options]
martin-loop run <objective> [options]
martin-loop bench --suite <suiteId>
martin-loop triage
martin-loop dossier (--latest | --loop-id <id> | --file <path>)
martin-loop runs list|get|attempt|verify ...
martin-loop mcp print-config --host <codex|claude|gemini|generic>
martin-loop mcp install --host <codex|claude|gemini|generic>
martin-loop challenge [--loop-id <id> | --file <path> | --latest]
martin-loop share (--loop-id <id> | --file <path> | --latest) [--out-dir <path>]
martin-loop badge [--format svg|json] [--runs-dir <path>]

Common options:

--budget <n>            Hard cost cap in USD
--budget-usd <n>        Alias for --budget
--soft-limit-usd <n>    Soft budget threshold in USD
--verify <cmd>          Verifier command after each attempt
--proof                 Use the no-spend proof adapter
--max-iterations <n>    Maximum number of attempts
--max-tokens <n>        Maximum token budget
--engine <name>         Adapter to use: claude, codex, gemini, or openai
--cwd <path>            Repo root for the run
--allow-path <glob>     Restrict writes to this path pattern; repeatable
--deny-path <glob>      Block this path pattern; repeatable
--runs-dir <path>       Override the local Martin runs root
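
To show how the budget options could interact, here is a minimal sketch of a preflight decision. The field names mirror the flags above and the SDK budget example, but `preflightBudget`, the decision strings, and the idea that the soft limit produces a warning rather than a stop are assumptions, not documented MartinLoop behavior.

```typescript
interface BudgetPolicy {
  maxUsd: number;         // hard cap, as in --budget
  softLimitUsd?: number;  // as in --soft-limit-usd
  maxTokens?: number;     // as in --max-tokens
  maxIterations?: number; // as in --max-iterations
}

interface BudgetState {
  spentUsd: number;
  tokensUsed: number;
  attemptsRun: number;
}

interface NextAttemptEstimate {
  costUsd: number;
  tokens: number;
}

type PreflightDecision = "proceed" | "proceed_with_warning" | "block";

// Hypothetical check run before each attempt, using projected (not just current) spend.
function preflightBudget(
  policy: BudgetPolicy,
  state: BudgetState,
  next: NextAttemptEstimate,
): PreflightDecision {
  const projectedUsd = state.spentUsd + next.costUsd;
  const projectedTokens = state.tokensUsed + next.tokens;

  if (projectedUsd > policy.maxUsd) return "block";
  if (policy.maxTokens !== undefined && projectedTokens > policy.maxTokens) return "block";
  if (policy.maxIterations !== undefined && state.attemptsRun >= policy.maxIterations) return "block";
  if (policy.softLimitUsd !== undefined && projectedUsd > policy.softLimitUsd) {
    return "proceed_with_warning";
  }
  return "proceed";
}

const policy: BudgetPolicy = { maxUsd: 3, softLimitUsd: 2.25, maxIterations: 3, maxTokens: 20_000 };
const estimate: NextAttemptEstimate = { costUsd: 0.5, tokens: 4_000 };

console.log(preflightBudget(policy, { spentUsd: 0.5, tokensUsed: 4_000, attemptsRun: 1 }, estimate));
console.log(preflightBudget(policy, { spentUsd: 2.0, tokensUsed: 8_000, attemptsRun: 2 }, estimate));
console.log(preflightBudget(policy, { spentUsd: 2.75, tokensUsed: 8_000, attemptsRun: 2 }, estimate));
```

With these inputs the three calls return proceed, proceed_with_warning, and block in that order: the first projection stays under the soft limit, the second crosses it, and the third would exceed the hard cap.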

Examples below use npx martin-loop so they work without a global install. If you install martin-loop globally, the martin alias works too.

Use martin-loop share --latest after dossier when you want a redacted bundle you can hand to another person without sending raw run-store files.

More detail: CLI reference and configuration reference.

[Image: MartinLoop CLI terminal output]

Benchmarks

MartinLoop ships a public deterministic benchmark workspace in benchmarks/ plus the installed-package bench command.

From an installed package:

npx martin-loop bench --suite under-3-challenge
npx martin-loop bench --suite ralphy-engineering-50

From a clean public clone:

pnpm install --frozen-lockfile
pnpm bench:build
pnpm bench:eval
pnpm bench:report:ralphy

Equivalent workspace-filter commands:

pnpm --filter @martin/benchmarks build
pnpm --filter @martin/benchmarks test
pnpm --filter @martin/benchmarks eval
pnpm --filter @martin/benchmarks report:ralphy

The installed-package command reads the shipped public fixtures. The repo-clone workflow runs the public benchmark workspace directly.

MCP

Run the standalone MCP package directly:

npx -y @martinloop/mcp

Add it to common hosts:

# Codex
codex mcp add martin-loop -- npx -y @martinloop/mcp
# Claude Code on macOS or Linux
claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp
# Claude Code on Windows (npx wrapped in cmd /c)
claude mcp add --transport stdio --scope user martin-loop -- cmd /c npx -y @martinloop/mcp

Generate host config from the root CLI:

npx martin-loop mcp print-config --host codex --transport stdio --profile minimal
npx martin-loop mcp print-config --host claude --transport stdio --profile diagnostic
npx martin-loop mcp print-config --host gemini --transport stdio --profile full-local
npx martin-loop mcp print-config --host generic --transport stdio --profile github-review

The root martin-loop package and the standalone @martinloop/mcp package move on separate version lines. The root package line here is 0.3.6; the current standalone MCP package is 0.3.1.

The public MCP release train labels are:

  • 0.1.4 operator foundation
  • 0.2.0 cockpit expansion
  • 0.2.5 public MCP package line
  • 0.2.7 usability and review release
  • 0.3.0 host adoption and onboarding release
  • 0.3.1 review and handoff release

The standalone MCP registry/server identifier is io.github.Keesan12/martin-loop.

More detail: MCP setup, MCP tool reference, and MCP compatibility.

SDK

npm install martin-loop

import { MartinLoop, createClaudeCliAdapter } from "martin-loop";

const loop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() }),
  defaults: {
    workspaceId: "my-workspace",
    projectId: "my-project",
    budget: {
      maxUsd: 3,
      softLimitUsd: 2.25,
      maxIterations: 3,
      maxTokens: 20_000,
    },
  },
});

const result = await loop.run({
  task: {
    title: "Fix auth regression",
    objective: "Fix the failing auth regression tests",
    verificationPlan: ["pnpm test"],
    repoRoot: process.cwd(),
  },
});

console.log(result.decision.status);

The root SDK also exports createCodexCliAdapter, createGeminiCliAdapter, createDirectProviderAdapter, createOpenAiCompatibleAdapter, and createVerifierOnlyAdapter.

More detail: SDK reference and package map.

Examples

Development

Requirements:

  • Node.js 20+
  • pnpm 10.x

git clone https://github.com/Keesan12/martin-loop.git
cd martin-loop
pnpm install --frozen-lockfile
pnpm lint
pnpm test
pnpm build
pnpm public:copy-scan
pnpm public:git-surface
pnpm oss:validate
pnpm public:smoke
pnpm release:matrix:local

Standalone MCP validation:

pnpm --filter @martinloop/mcp lint
pnpm --filter @martinloop/mcp test
pnpm --filter @martinloop/mcp build
pnpm --filter @martinloop/mcp smoke:pack
pnpm --filter @martinloop/mcp smoke:published:pack
pnpm --filter @martinloop/mcp verify:release

Contributing

Issues, bug reports, workflow feedback, and focused pull requests are welcome. Public-facing docs should stay concise, user-centered, and accurate.

git checkout -b feat/your-feature
pnpm lint
pnpm test
git commit -m "feat: describe what you built"
git push -u origin feat/your-feature

Star this repo if you think AI coding needs budgets, brakes, and receipts.

martinloop.com · support@martinloop.com

License

Apache-2.0. See LICENSE.