Code Atlas turns a repo into a knowledge graph so humans and AI agents can explore code structure fast.
Instead of reading files one-by-one, you can ask:
- where a symbol is defined
- who calls it
- how two symbols are connected
- what may break if a symbol changes
flowchart LR
A[Repo Path or GitHub URL] --> B[Scan Files]
B --> C[Parse + Resolve Symbols]
C --> D[Build Graph Nodes + Edges]
D --> E[Query / Export / Visualize]
Think of it as:
- Scanner finds code files.
- Extractors read syntax and relationships.
- Graph Store saves structure and maintains fast lookup indexes.
- Query Engine answers navigation/debug questions.
Install dependencies first:
uv sync
Run tests (project test suite only):
uv run pytest tests
Install command-line entry points in your current environment:
uv pip install -e .
Then run as a normal command-line app:
code-atlas
Start the interactive CLI:
code-atlas
or:
uv run python main.py
Default graph file is:
tmp/code-atlas.graph.json
sequenceDiagram
participant U as You
participant CLI as Code Atlas CLI
participant IDX as Indexer
participant G as Graph Store
participant V as Browser Visual
U->>CLI: index https://github.com/owner/repo
CLI->>IDX: scan + parse + resolve
IDX->>G: write tmp/code-atlas.graph.json
U->>CLI: find auth
CLI->>G: query nodes
U->>CLI: callers python://pkg.auth:login
CLI->>G: query edges
U->>CLI: visual login
CLI->>V: open interactive HTML subgraph
help
index <repo-or-github-url> [--out PATH] [--exclude dir1,dir2]
load [PATH]
stats
find <name> [--limit N]
callers <symbol> [--limit N]
related <file> [--depth N] [--limit N]
path <from> <to> [--max-depth N]
impact <symbol> [--depth N] [--limit N]
export graphml [--out PATH]
export neo4j [--out DIR]
visual <symbol> [--depth N] [--limit N] [--out PATH]
visual-all [--limit N] [--out PATH]
raw on|off
where
clear
exit
stats now includes quality and coverage signals:
- confidence distribution (count + %) for edges: high, medium, low
- extraction coverage per language:
  - files seen
  - files indexed
  - coverage percentage
  - parser mode (ast, tree-sitter, regex-fallback, stub)
index also prints clear progress checkpoints (prepare source, scan files, write graph) to make longer indexing runs easier to track.
Index progress is shown live on a single terminal line while indexing runs.
flowchart LR
A[Extracted Edges] --> B[Confidence Buckets]
B --> C[high/medium/low %]
D[Scanned Files by Language] --> E[Indexed Files by Language]
E --> F[Coverage % + Parser Mode]
C --> G[stats panel]
F --> G
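As a rough illustration of the two stats signals above, the sketch below computes a confidence distribution and a coverage percentage. The edge record shape (a dict with a "confidence" key) and both function names are assumptions made for this example, not Code Atlas's actual data model.

```python
from collections import Counter


def confidence_distribution(edges):
    """Return {bucket: (count, percent)} for the high/medium/low buckets."""
    counts = Counter(edge["confidence"] for edge in edges)
    total = sum(counts.values())
    return {
        bucket: (
            counts[bucket],
            round(100 * counts[bucket] / total, 1) if total else 0.0,
        )
        for bucket in ("high", "medium", "low")
    }


def coverage_percent(files_seen, files_indexed):
    """Share of scanned files that were actually indexed, as a percentage."""
    return round(100 * files_indexed / files_seen, 1) if files_seen else 0.0


# Hypothetical edge records; "confidence" is an assumed field name.
edges = [
    {"type": "CALLS", "confidence": "high"},
    {"type": "CALLS", "confidence": "high"},
    {"type": "IMPORTS", "confidence": "medium"},
    {"type": "CALLS", "confidence": "low"},
]
print(confidence_distribution(edges))
print(coverage_percent(files_seen=4, files_indexed=3))
```

Both helpers guard against an empty input so that a repo with no edges or no files of a given language reports 0.0 instead of raising a division error.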
Code Atlas now keeps an incremental cache at tmp/code-atlas.cache.json.
How it works:
- Scan files and compute a content hash per file.
- Compare with previous cache entries (hash + language + parser mode).
- If unchanged, reuse cached nodes/edges (cache hit).
- If changed/new, re-extract only that file.
- If deleted, drop its cached contribution.
- Save updated graph + cache for next run.
flowchart LR
A[Scan Files] --> B[Hash + Compare Cache]
B --> C{Changed?}
C -- No --> D[Reuse cached contribution]
C -- Yes --> E[Re-extract file]
D --> F[Merge graph]
E --> F
F --> G[Write graph JSON]
F --> H[Write cache JSON]
stats includes an Incremental Cache section with:
- cache_hits
- reindexed_files
- deleted_files
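The hash-and-compare step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the cache entry layout, the choice of SHA-256, and the names file_hash and plan_reindex are all assumptions.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Content hash for one file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(current_files, previous_cache):
    """Classify files as cache hits, files to re-extract, or deleted files.

    current_files:  {path: (content_bytes, language, parser_mode)}
    previous_cache: {path: {"hash": ..., "language": ..., "parser_mode": ...}}
    """
    hits, reindex = [], []
    for path, (content, language, parser_mode) in current_files.items():
        key = {
            "hash": file_hash(content),
            "language": language,
            "parser_mode": parser_mode,
        }
        cached = previous_cache.get(path)
        # A hit requires hash, language, and parser mode to all match.
        if cached is not None and all(cached.get(k) == v for k, v in key.items()):
            hits.append(path)
        else:
            reindex.append(path)
    deleted = [path for path in previous_cache if path not in current_files]
    return {
        "cache_hits": sorted(hits),
        "reindexed_files": sorted(reindex),
        "deleted_files": sorted(deleted),
    }


# Hypothetical previous cache and current scan.
previous = {
    "a.py": {"hash": file_hash(b"x = 1\n"), "language": "python", "parser_mode": "ast"},
    "old.py": {"hash": file_hash(b"pass\n"), "language": "python", "parser_mode": "ast"},
}
current = {
    "a.py": (b"x = 1\n", "python", "ast"),
    "b.py": (b"y = 2\n", "python", "ast"),
}
print(plan_reindex(current, previous))
```

Including the parser mode in the comparison means a file is re-extracted when, for example, Tree-sitter becomes available and replaces the regex fallback, even though the file content is unchanged.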
Benchmarks are generated by scripts/benchmark_incremental.py and stored in:
docs/benchmarks.md
Current snapshot (cold run vs warm incremental run):
| Repo | Lang | Full Index (s) | Incremental Re-index (s) | Speedup | Cache Hits | Reindexed Files |
|---|---|---|---|---|---|---|
| pallets/flask | python | 0.20 | 0.09 | 2.27x | 83 | 0 |
| axios/axios | typescript | 0.09 | 0.05 | 1.87x | 280 | 0 |
| google/gson | java | 0.75 | 0.31 | 2.45x | 259 | 0 |
A symbol is any named code entity represented in the graph.
flowchart TD
A[module] --> B[class]
A --> C[function]
B --> D[method]
C --> E[call target symbol]
Examples:
- python://code_atlas.query (module)
- python://code_atlas.query:find_symbol (function)
- python://pkg.mod:Class.method (method)
Tip: use find <text> first to discover valid symbol IDs.
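Judging only from the example IDs above, a symbol ID appears to have the form language://module, optionally followed by :qualified.name. The sketch below splits an ID along those lines; the SymbolId type and parse_symbol_id function are hypothetical helpers for illustration, and the real ID grammar may allow more than this.

```python
from typing import NamedTuple, Optional


class SymbolId(NamedTuple):
    language: str
    module: str
    qualname: Optional[str]  # None when the ID names a module


def parse_symbol_id(symbol_id: str) -> SymbolId:
    """Split 'lang://module[:qualname]' into its parts (assumed format)."""
    language, sep, rest = symbol_id.partition("://")
    if not sep or not language or not rest:
        raise ValueError(f"not a symbol ID: {symbol_id!r}")
    module, _, qualname = rest.partition(":")
    return SymbolId(language, module, qualname or None)


print(parse_symbol_id("python://code_atlas.query"))
print(parse_symbol_id("python://code_atlas.query:find_symbol"))
print(parse_symbol_id("python://pkg.mod:Class.method"))
```

A bare name such as find_symbol raises ValueError here, which mirrors the tip above: resolve a name to a full ID with find before passing it to commands that expect a symbol.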
flowchart LR
A[CLI Shell\ncode_atlas/cli/app.py] --> B[Repo Source\nrepo_source.py]
B --> C[Scanner\nscanner.py]
C --> D[Indexer\nindexer.py]
D --> E1[Python Extractor\nextractors/python_extractor.py]
D --> E2[TypeScript Extractor\nextractors/typescript_extractor.py]
D --> E3[Go Extractor\nextractors/go_extractor.py]
D --> E4[Java Extractor\nextractors/java_extractor.py]
D --> E5[Stub Extractor\nextractors/stub_extractor.py]
E1 --> F[Resolver\nimports/self/local symbols]
E2 --> G[TS Nodes + Edges]
E3 --> H2[Go Nodes + Edges]
E4 --> J2[Java Nodes + Edges]
E5 --> G2[File Nodes]
F --> H[Graph Store\ngraph.py + models.py]
G --> H
H2 --> H
J2 --> H
G2 --> H
H --> I[Query Engine\nquery.py]
H --> J[Exporters\nexporters.py]
I --> K[find/callers/path/impact]
J --> L[JSON/GraphML/Neo4j/HTML]
flowchart TD
A[Read .py file] --> B[ast.parse]
B --> C[Collect imports + defs]
C --> D[Walk classes/functions]
D --> E[Extract calls + inheritance]
E --> F[Resolve names best-effort]
F --> G[Emit nodes + edges]
Edges currently include:
- CONTAINS
- IMPORTS
- CALLS
- INHERITS
Resolution is best-effort (Python is dynamic), so edges carry confidence.
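To make the extraction flow more concrete, here is a small sketch that uses the standard-library ast module to collect definitions and unresolved call names from one module. It is an illustration under stated assumptions, not the project's extractor: the extract_calls function, the edge tuple shape, and the rule assigning "medium" to bare-name calls and "low" to attribute calls are all invented for this example.

```python
import ast


def extract_calls(source: str, module: str):
    """Return (definitions, call_edges) for one Python module's source."""
    tree = ast.parse(source)
    definitions, edges = [], []

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualname = f"{scope}.{child.name}" if scope else child.name
                definitions.append(f"python://{module}:{qualname}")
                visit(child, qualname)
                continue
            if isinstance(child, ast.Call) and scope:
                func = child.func
                # Bare names (helper()) vs attribute calls (self.log()).
                if isinstance(func, ast.Name):
                    edges.append((f"python://{module}:{scope}", func.id, "medium"))
                elif isinstance(func, ast.Attribute):
                    edges.append((f"python://{module}:{scope}", func.attr, "low"))
            visit(child, scope)

    visit(tree, "")
    return definitions, edges


source = '''
def helper():
    return 1

class Service:
    def run(self):
        helper()
        self.log("done")
'''
defs, edges = extract_calls(source, "pkg.mod")
print(defs)
print(edges)
```

The attribute call self.log cannot be tied to a definition without knowing the type of self, which is the kind of case where a lower confidence value is useful. Module-level calls are skipped in this sketch because they have no enclosing symbol to attach to.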
flowchart LR
A[path from A to B] --> B[Shortest directed traversal]
C[impact X] --> D[Reverse traversal from X]
B --> E[Debug dependency chains]
D --> F[Estimate change risk]
- path helps explain how two symbols connect.
- impact shows the likely upstream breakage surface.
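The two traversals can be sketched with plain breadth-first search. This is a minimal illustration that assumes edges are (source, target) pairs; the function names, defaults, and return shapes are hypothetical and may differ from the query engine's behavior.

```python
from collections import deque


def shortest_path(edges, source, target, max_depth=10):
    """Breadth-first search over directed edges; returns a node list or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def impact(edges, symbol, depth=3):
    """Walk edges backwards from symbol; returns {dependent: distance}."""
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    distances = {}
    queue = deque([(symbol, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for caller in reverse.get(node, []):
            if caller not in distances and caller != symbol:
                distances[caller] = dist + 1
                queue.append((caller, dist + 1))
    return distances


# Hypothetical dependency edges: (caller, callee).
edges = [("cli", "query"), ("query", "graph"), ("api", "query"), ("graph", "models")]
print(shortest_path(edges, "cli", "models"))
print(impact(edges, "graph", depth=3))
```

Because edges point from caller to callee, path follows them forward while impact follows them backward: anything that can reach the changed symbol is a candidate for breakage, and the distance gives a rough ordering of risk.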
flowchart LR
A[Graph Store] --> B[visual <symbol>]
A --> B2[visual-all]
A --> C[export graphml]
A --> D[export neo4j]
B --> E[Interactive HTML in browser]
B2 --> E
C --> F[Gephi / graph tools]
D --> G[Neo4j import]
Default artifact locations (under git-ignored tmp/):
- tmp/code-atlas.graph.json
- tmp/graph-view.html
- tmp/graph-view-all.html
- tmp/code-atlas.graphml
- tmp/neo4j/nodes.csv
- tmp/neo4j/edges.csv
The browser graph now includes:
- robust graph rendering with automatic 3D/2D fallback depending on dataset size/browser support
- search by node name/id
- edge-type filters (CALLS, IMPORTS, CONTAINS, INHERITS)
- confidence-colored edges
- path highlight between two nodes (directed or undirected)
- interactive node details panel
- camera navigation + Fit, Pause/Resume, and Reset controls
- full-graph mode via visual-all with a default node cap (800) for browser performance
index .
stats
find find_symbol
callers python://code_atlas.query:find_symbol
path python://code_atlas.cli:_cmd_interactive python://code_atlas.query:find_symbol
impact python://code_atlas.query:find_symbol --depth 3
visual find_symbol
export graphml --out tmp/repo.graphml
export neo4j --out tmp/neo4j
- Deep semantic extraction is strongest for Python right now.
- TypeScript extraction now handles richer symbols (classes, interfaces, methods/properties), more import styles (default/named/namespace/side-effect), and improved call resolution; dynamic patterns are still best-effort.
- Go and Java use Tree-sitter parsing when available, with regex fallback when parser dependencies are missing.
- Other languages currently use a fallback file-level extractor.
- Dynamic runtime behavior cannot be perfectly resolved statically.
flowchart LR
A[Tree-sitter expansion\nJS/Java + richer TS/Go] --> B[Better symbol resolution]
B --> C[Incremental indexing cache]
C --> D[More query intelligence]
Code Atlas includes an MCP server so AI agents can call graph tools directly.
Start the MCP server (stdio transport):
code-atlas-mcp
Exposed MCP tools:
- index_repo(source, out?)
- stats(graph?)
- find_symbol(graph, query, limit?)
- callers(graph, symbol, limit?)
- path_between(graph, source, target, max_depth?)
- impact_of_symbol(graph, symbol, depth?, limit?)
- related_files(graph, file, depth?, limit?)
Tool responses follow a structured shape:
{
"ok": true,
"data": {},
"meta": { "duration_ms": 12 }
}
Errors follow the same envelope with structured codes/messages:
{
"ok": false,
"error": {
"code": "GRAPH_NOT_FOUND",
"message": "Graph file not found: ..."
}
}
Common MCP error codes:
- SOURCE_NOT_FOUND, INVALID_SOURCE, PERMISSION_DENIED
- GRAPH_NOT_FOUND, INVALID_GRAPH
- INDEX_FAILED, STATS_FAILED, FIND_FAILED, CALLERS_FAILED, PATH_FAILED, IMPACT_FAILED, RELATED_FAILED
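One way a tool handler could be wrapped to produce this envelope is sketched below. The run_tool wrapper and the example handlers are hypothetical, and mapping FileNotFoundError to GRAPH_NOT_FOUND is an assumption made for illustration; only the envelope shape and the error code names come from the description above.

```python
import time


def run_tool(handler, failure_code, **kwargs):
    """Call handler(**kwargs) and wrap the outcome in the ok/error envelope."""
    started = time.perf_counter()
    try:
        data = handler(**kwargs)
    except FileNotFoundError as exc:
        # Assumed mapping: a missing graph file becomes GRAPH_NOT_FOUND.
        return {"ok": False, "error": {"code": "GRAPH_NOT_FOUND", "message": str(exc)}}
    except Exception as exc:
        # Any other failure falls back to the tool-specific code.
        return {"ok": False, "error": {"code": failure_code, "message": str(exc)}}
    duration_ms = int((time.perf_counter() - started) * 1000)
    return {"ok": True, "data": data, "meta": {"duration_ms": duration_ms}}


# Hypothetical handlers standing in for real graph queries.
def fake_stats(graph):
    return {"nodes": 3, "edges": 2}


def missing_graph(graph):
    raise FileNotFoundError(f"Graph file not found: {graph}")


print(run_tool(fake_stats, "STATS_FAILED", graph="tmp/code-atlas.graph.json"))
print(run_tool(missing_graph, "STATS_FAILED", graph="tmp/missing.json"))
```

Keeping success and failure in one envelope lets a client branch on the ok field first and only then look at data or error.code.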
Client setup and examples:
- MCP client snippets: docs/mcp-configs.md
- 5-call walkthrough demo: demo-mcp.md
Add these files and reference them in your portfolio:
- docs/assets/cli-workflow.png: interactive CLI indexing/query flow
- docs/assets/visual-workflow.png: graph UI with filters and path highlight
- docs/assets/mcp-workflow.png: MCP tool-call sequence and outputs
- docs/assets/code-atlas-demo.gif: short end-to-end animated demo
