🧰 cured-harness

🌐 English · Português

Curated index of agent harnesses — frameworks, coding CLIs, memory layers, orchestration platforms and enterprise systems — used as study and benchmark references.

Coding · Multi-agent · Memory · Assistants · Spec-driven · Self-improvement · Workflow · Protocol · Eval · Enterprise · Skills

by Alan Nicolas

32 projects. 11 categories. Opinionated curation. This repo contains no original code: every listed project is a fork/clone kept for analysis. See 🛠 Local mirror to reproduce the bench.

💻 Coding agents & CLIs (Terminal / CLI, Autonomous / Multi-role, Templates / App-builders)
🤝 Multi-agent orchestration
🧠 Memory & knowledge
🦾 Personal assistants
📐 Spec-driven & methodology
🧬 Self-improvement loops
⚙️ Workflow & durable execution
🔌 Protocol & infrastructure
📊 Evaluation & observability
🏢 Enterprise platforms
🧩 Cross-agent skills
🗒️ Patterns observed
🛠 Local mirror
🤝 Contributing
📄 License

💻 Coding agents & CLIs

Terminal / CLI

Project	Stack	Differential
anthropics/claude-code	TS + Ink + Bun	Snapshot (~512K LOC) of Anthropic's official CLI; canonical reference for terminal-UI coding agents.
openai/codex	TS + Rust	OpenAI's official CLI coding agent with IDE integrations and a focus on local automation.
Aider-AI/aider	Python	Most-used pair-programming CLI in open source; git-native with auto-commits per iteration.
All-Hands-AI/OpenHands	Python	Most-starred autonomous agent on GitHub (formerly OpenDevin); a standard reference on SWE-bench.

Autonomous / Multi-role

Project	Stack	Differential
gsd-build/gsd-2	TS (Pi SDK)	Auto-milestones with no human in the loop; RTK compresses shell output across long runs.
garrytan/gstack	TS + Playwright	Personal software factory by YC's president; 24 specialized agents in a multi-role workflow.
obra/superpowers	TS (Claude Code plugin)	Pure red/green TDD with parallel subagents; reports a 94% PR rejection rate.

Templates / App-builders

Project	Stack	Differential
JCodesMore/ai-website-cloner-template	Next.js + shadcn	Pixel-perfect site cloning via parallel builder agents in the `/clone-website` command.

🤝 Multi-agent orchestration

Project	Stack	Differential
bmad-code-org/BMAD-METHOD	TS	12+ personas (PM, architect, UX…) with "Party Mode" running several personas in one session; agile applied to agents.
crewAIInc/crewAI	Python	Framework for collaborative role-playing agents with shared goals.
microsoft/autogen	Python	Microsoft's multi-agent conversational framework; research baseline for message-based coordination.
paperclipai/paperclip	TS + React + Postgres	Manages agents like employees — org charts, budgets and goals.
grandamenium/claude-remote-manager	Bash + TS	Runs Claude Code 24/7 under Telegram control; persistent cron jobs survive restarts.

🧠 Memory & knowledge

Project	Stack	Differential
mem0ai/mem0	Python	Leading memory layer in 2025; native vector store and a simple API for LLM apps.
milla-jovovich/mempalace	Python + Chroma	96.6% R@5 with verbatim storage (no paraphrasing); local-first, no API keys.
garrytan/gbrain	TS + PGLite + pgvector	95% recall@5; entity self-wiring without an LLM; 30-minute setup.
MemoriLabs/Memori	Python	LLM-agnostic, agent-native memory infrastructure; 81.95% on LoCoMo, SQL-backed.
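The memory rows above quote retrieval scores as R@5 / recall@5. As a reminder of what that number measures, here is a minimal sketch of recall@k, assuming each query comes with a set of gold-relevant memory IDs. The function names are illustrative and not taken from any listed project, and each benchmark (LoCoMo and others) defines its own exact protocol; some report the looser "any relevant item in the top k" hit rate instead.

```python
from statistics import mean


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    top_k = set(retrieved[:k])  # set() so duplicate hits are not double-counted
    return len(top_k & relevant) / len(relevant)


def hit_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Looser variant: 1.0 if any relevant item is in the top k, else 0.0."""
    return 1.0 if set(retrieved[:k]) & relevant else 0.0


def benchmark_recall(queries: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Mean recall@k over (retrieved, relevant) pairs, expressed as a percentage."""
    return 100 * mean(recall_at_k(retrieved, relevant, k) for retrieved, relevant in queries)
```

For example, with queries `[(["m1", "m2", "m3", "m4", "m5", "m6"], {"m1", "m6"}), (["m9", "m2"], {"m2"})]` the first query scores 0.5 (m6 falls outside the top 5) and the second 1.0, so `benchmark_recall` returns 75.0.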

🦾 Personal assistants

Project	Stack	Differential
openclaw/openclaw	TS + SwiftUI + Kotlin	20+ channels (WhatsApp, Telegram, iMessage…) running on-device.
NousResearch/hermes-agent	Python asyncio	Auto-creates skills; runs on a $5 VPS or serverless with hibernation; multi-model.

📐 Spec-driven & methodology

Project	Stack	Differential
github/spec-kit	Python + TS	Executable specs that generate the implementation; GitHub's official spec-driven development toolkit.
gsd-build/get-shit-done	TS	Solves "context rot" via spec discipline and meta-prompting; in use at Amazon, Google and Shopify.

🧬 Self-improvement loops

Systems that iterate, mutate and optimize against a metric, in the same family as karpathy/autoresearch and the "AI scientist" lineage. See also alvinreal/awesome-autoresearch for the full index.

Project	Stack	Differential
ShengranHu/ADAS	Python	Automated Design of Agentic Systems (ICLR 2025); meta-agents that invent novel agent architectures by programming them in code.
SakanaAI/AI-Scientist-v2	Python	Workshop-level autonomous scientific discovery via agentic tree search; removes v1's template dependency and generalizes across domains.
gepa-ai/gepa	Python	GEPA (Genetic-Pareto) — ICLR 2026 Oral; reflective prompt evolution that outperforms RL (GRPO); optimizes any textual parameter against any metric via natural-language reflection.
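The loop these systems share can be reduced to mutate, measure, keep. The sketch below is plain greedy hill climbing with hypothetical `improve` and `mutate_char` helpers. ADAS, AI-Scientist-v2 and GEPA replace the random mutation with LLM-proposed edits, tree search or Pareto-based selection, so read it as the bare skeleton, not as any of their algorithms.

```python
import random
import string
from typing import Callable, TypeVar

T = TypeVar("T")


def improve(candidate: T,
            mutate: Callable[[T, random.Random], T],
            score: Callable[[T], float],
            steps: int = 100,
            seed: int = 0) -> tuple[T, float]:
    """Greedy mutate-measure-keep loop: accept a mutation only if it strictly improves the score."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best, best_score = candidate, score(candidate)
    for _ in range(steps):
        child = mutate(best, rng)       # propose a variant of the current best
        child_score = score(child)      # measure it against the metric
        if child_score > best_score:    # keep it only if it is better
            best, best_score = child, child_score
    return best, best_score


def mutate_char(s: str, rng: random.Random) -> str:
    """Toy mutation operator: overwrite one random position with a random lowercase letter."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(string.ascii_lowercase) + s[i + 1:]
```

Usage: `improve("aaaaaaa", mutate_char, lambda s: sum(a == b for a, b in zip(s, "harness")), steps=2000)` climbs toward "harness". In a real harness the candidate would be a prompt, program or agent design, and `score` an eval run.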

⚙️ Workflow & durable execution

Project	Stack	Differential
vercel/workflow	TS + Next.js + PG	Deterministic replay via event log; split VM + step runtime.
github/gh-aw	Go + Markdown	Agentic workflows written in natural language, executed sandboxed inside GitHub Actions.
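Deterministic replay via an event log works roughly like this: each completed step's result is appended to a log, and after a crash the workflow code runs again from the top, with finished steps answered from the log instead of re-executed. A minimal in-memory sketch, assuming results are plain values and steps are identified by name and order; `Workflow.step` is a hypothetical helper, not vercel/workflow's API, and a real engine would persist the log durably.

```python
from typing import Any, Callable


class Workflow:
    """Replays recorded step results from an append-only event log, then runs new steps live."""

    def __init__(self, log: list[dict[str, Any]]):
        self.log = log      # shared, append-only record of completed steps
        self.cursor = 0     # index of the next event to replay

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.log):
            event = self.log[self.cursor]
            if event["step"] != name:
                # Replay only works if the workflow code takes the same path every time.
                raise RuntimeError(
                    f"non-deterministic replay: log has {event['step']!r}, code asked for {name!r}")
            self.cursor += 1
            return event["result"]  # replay: the side effect is not re-run
        result = fn()  # live execution; if it raises, nothing is recorded
        self.log.append({"step": name, "result": result})
        self.cursor += 1
        return result
```

If a workflow crashes in its second step, a fresh `Workflow(log)` over the same log skips the first step's side effect and resumes where it failed.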

🔌 Protocol & infrastructure

Project	Stack	Differential
modelcontextprotocol/servers	TS + Python	Official Model Context Protocol repo of reference servers (50+).
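MCP messages are JSON-RPC 2.0. A minimal sketch of a client-side `tools/call` request and of reading the text parts of its result; the `echo` tool in the usage is hypothetical, and a real client must first complete the `initialize` handshake, which is omitted here. Over the stdio transport each message travels as one line of JSON, which `json.dumps` without indentation satisfies.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # MCP requires request ids not to be reused within a session


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def text_content(response: str) -> list[str]:
    """Extract the text parts of a `tools/call` result, raising on protocol or tool errors."""
    msg = json.loads(response)
    if "error" in msg:                      # JSON-RPC level error (e.g. unknown tool)
        raise RuntimeError(msg["error"]["message"])
    result = msg["result"]
    if result.get("isError"):               # the tool ran but reported a failure
        raise RuntimeError(f"tool error: {result['content']}")
    return [part["text"] for part in result["content"] if part["type"] == "text"]
```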

📊 Evaluation & observability

Project	Stack	Differential
langfuse/langfuse	TS + Next.js	Most popular open-source observability for LLM apps; tracing, eval and prompt management.

🏢 Enterprise platforms

Project	Stack	Differential
dataelement/Clawith	TS	"OpenClaw for teams": digital employees with `soul.md` + `memory.md`, org chart and multi-tenant delegation.
microsoft/agent-governance-toolkit	Multi-lang	Sub-ms policy enforcement covering 10/10 of the OWASP Agentic Top 10; runtime security.
langgenius/dify	Python + TS	Most popular open-source low-code platform for LLM apps (129k+ stars); production-ready and built for teams.
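Runtime policy enforcement comes down to checking every tool call against rules before it runs. A deliberately small sketch of a deny-by-default, first-match-wins rule list with per-tool call quotas; `Rule`, `Policy` and the dotted tool names are hypothetical and do not mirror agent-governance-toolkit's actual API or policy language.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    pattern: str                      # glob over tool names, e.g. "fs.*" (hypothetical naming)
    allow: bool
    max_calls: Optional[int] = None   # per-tool quota; None means unlimited


@dataclass
class Policy:
    rules: list[Rule]
    used: dict[str, int] = field(default_factory=dict)  # calls granted so far, per tool

    def check(self, tool: str) -> bool:
        """Return True if the call may proceed. First matching rule wins; no match means deny."""
        for rule in self.rules:
            if not fnmatchcase(tool, rule.pattern):
                continue
            if not rule.allow:
                return False
            count = self.used.get(tool, 0)
            if rule.max_calls is not None and count >= rule.max_calls:
                return False
            self.used[tool] = count + 1
            return True
        return False  # deny by default
```

Ordering matters: a narrow deny such as `fs.delete*` has to come before a broad allow such as `fs.*`.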

🧩 Cross-agent skills

Project	Stack	Differential
alchaincyf/huashu-design	Skill (multi-agent)	Agent-agnostic skill (Claude Code, Cursor, Codex, OpenClaw, Hermes) that delivers ready-to-ship design — animations, clickable prototypes, slide decks, infographics — from a single prompt.

🗒️ Patterns observed

A cross-cutting view that groups harnesses by recurring technique or architecture:

Pattern	Projects	Note
First-class memory	gbrain, mempalace, gsd-2, hermes-agent, mem0, Memori	Memory as a separate, measurable component, not a bolt-on.
Spec-driven	spec-kit, get-shit-done, superpowers	Pragmatic alternative to "vibe coding"; specs drive execution.
Multi-persona / party-mode	BMAD-METHOD, crewAI, autogen, paperclip, gstack	Coordination across multiple roles inside a single harness.
Parallel subagents	superpowers, ai-website-cloner-template, gstack	Task fan-out to specialized agents.
Durable / replay	vercel/workflow, gh-aw	Event-sourced; survives crashes; reproducible.
Local-first	mempalace, openclaw, gbrain, hermes-agent	No cloud dependency; on-device or self-hosted.
Governance / policy	agent-governance-toolkit, Clawith	Runtime security and agent org charts.
24/7 persistent	claude-remote-manager, hermes-agent, gsd-2	Cron, hibernation, automatic resume.
Self-improvement / autoresearch	ADAS, AI-Scientist-v2, gepa, superpowers	A loop that measures, mutates and optimizes — code, prompts or architecture.
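The parallel-subagents pattern is a fan-out/fan-in: one orchestrator hands work to several specialized agents concurrently and collects their results. A minimal asyncio sketch, assuming each subagent is an async callable from task text to a result string (hypothetical stand-ins for real model calls). Errors and timeouts are captured per role so one failure does not discard the other results.

```python
import asyncio
from typing import Awaitable, Callable

Subagent = Callable[[str], Awaitable[str]]


async def fan_out(task: str, subagents: dict[str, Subagent],
                  timeout: float = 60.0) -> dict[str, str]:
    """Send one task to every specialized subagent concurrently and gather results by role."""

    async def run(role: str, agent: Subagent) -> tuple[str, str]:
        try:
            return role, await asyncio.wait_for(agent(task), timeout)
        except Exception as exc:  # includes timeouts; one failure should not sink the batch
            return role, f"error: {exc!r}"

    pairs = await asyncio.gather(*(run(role, agent) for role, agent in subagents.items()))
    return dict(pairs)
```

Harnesses usually split the job into a different subtask per role; sending the same task to every role keeps the sketch short.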

🛠 Local mirror

To clone/update every project in parallel:

./update.sh

The script reads repos.tsv, clones any repo that is missing, and runs `git pull --ff-only` on the rest.

Legend: [+] cloned · [↑] updated · [=] up-to-date · [x] error
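For readers who want the logic rather than the script, here is a Python sketch of what the description above implies: parse repos.tsv, clone missing repos on the listed branch, fast-forward the rest, run in parallel, and report with the legend's symbols. It is not update.sh itself; skipping blank and `#` lines, the mirror layout (`root/name`) and comparing HEAD before and after the pull to tell [↑] from [=] are all assumptions of this sketch.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import NamedTuple


class Repo(NamedTuple):
    name: str
    url: str
    branch: str


def read_repos(tsv_text: str) -> list[Repo]:
    """Parse name<TAB>url<TAB>branch lines; blank lines and '#' comments are skipped (assumption)."""
    repos = []
    for line in tsv_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, url, branch = line.split("\t")  # ValueError if a row lacks exactly 3 fields
        repos.append(Repo(name.strip(), url.strip(), branch.strip()))
    return repos


def plan(repo: Repo, root: Path) -> list[str]:
    """Return the git command for one repo: clone if missing, else fast-forward-only pull."""
    dest = root / repo.name
    if not dest.exists():
        return ["git", "clone", "--branch", repo.branch, repo.url, str(dest)]
    return ["git", "-C", str(dest), "pull", "--ff-only"]


def head(dest: Path) -> str:
    """Current commit of a checkout ('' if it is not a git repo)."""
    out = subprocess.run(["git", "-C", str(dest), "rev-parse", "HEAD"],
                         capture_output=True, text=True)
    return out.stdout.strip()


def sync(repo: Repo, root: Path) -> str:
    """Run the planned command and map the outcome onto the README legend."""
    dest = root / repo.name
    existed = dest.exists()
    before = head(dest) if existed else ""
    result = subprocess.run(plan(repo, root), capture_output=True, text=True)
    if result.returncode != 0:
        return f"[x] {repo.name}"
    if not existed:
        return f"[+] {repo.name}"
    return f"[=] {repo.name}" if head(dest) == before else f"[↑] {repo.name}"


def sync_all(tsv_path: Path, root: Path, workers: int = 8) -> list[str]:
    """Sync every repo using a thread pool; each thread waits on its own git subprocess."""
    repos = read_repos(tsv_path.read_text(encoding="utf-8"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: sync(r, root), repos))
```

Usage, with a hypothetical `mirror/` directory: `print("\n".join(sync_all(Path("repos.tsv"), Path("mirror"))))`. Threads are enough here because the work happens in git subprocesses, not in Python.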

Contributing

PRs welcome. To add a harness:

Add an entry to repos.tsv in the form name<TAB>url<TAB>branch.
Add the row to the matching section's table in the README: [owner/repo](url) | Stack | One-sentence description with the technical differential.
If it represents a new recurring pattern, add it under 🗒️ Patterns observed.
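A small helper can catch malformed entries before opening a PR. This sketch builds the repos.tsv line and the README row from the formats given above; requiring an `https` URL and ending the description with a period are assumptions of this sketch, not rules stated by the repo.

```python
from urllib.parse import urlparse


def tsv_entry(name: str, url: str, branch: str) -> str:
    """Build one repos.tsv line in the form name<TAB>url<TAB>branch."""
    for value in (name, url, branch):
        if not value or any(ch in value for ch in "\t\r\n"):
            raise ValueError(f"empty field or stray tab/newline: {value!r}")
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:   # assumption: https clone URLs only
        raise ValueError(f"expected an https URL, got {url!r}")
    return f"{name}\t{url}\t{branch}"


def readme_row(url: str, stack: str, description: str) -> str:
    """Render a README row: [owner/repo](url) | Stack | One-sentence description."""
    owner_repo = urlparse(url).path.strip("/").removesuffix(".git")
    if owner_repo.count("/") != 1:
        raise ValueError(f"cannot derive owner/repo from {url!r}")
    sentence = description.strip()
    if not sentence.endswith("."):
        sentence += "."   # assumption: rows end with a full stop
    return f"[{owner_repo}]({url}) | {stack} | {sentence}"
```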

Inclusion criterion: the project must be a harness — something that orchestrates, executes, or gives memory/tools to an LLM. Pure model libraries (no agent loop) and generic infrastructure (raw vector DBs, etc.) are out of scope.

📄 License

This list is released under CC0-1.0.
