Demo mantis on mantis#8
…m live testing
A flagship example that exercises most of the SDK end-to-end against the project's own data,
and two real agent-runtime fixes surfaced by running it live.
examples/repo_radar/:
- sources.py: pull PRs/issues (GitHub REST, both repos), a git contributor rollup, and the
meeting-notes Google Doc into DataFrames (verified live: 1000 PRs, 306 issues, 27 authors,
48 notes). plain ingestion, runnable standalone.
- repo_radar.py: 4 phases — build a portfolio of maps (spaces.create), per-map notebook delta
analysis with checkpoints (week-over-week), provider-scoped agent synthesis, and a markdown
briefing. phases degrade gracefully; defaults to direct-backend (base_url="").
- README documenting why it's impossible in the single-space UI.
agent fixes (found while smoke-testing against the live stack):
- websockets connect: support both extra_headers (<=13) and additional_headers (>=14).
- normalize the agent runtime's untyped assistant frames ({sender:"ai", message, partial});
only the final (partial=false) frame becomes text. handle `heartbeat` like typing.
- ask() now uses an IDLE timeout (resets on each event/heartbeat) instead of a hard total —
long claude_code/opencode runs are heartbeat-punctuated.
- agents.session(all_spaces=…, mode=…): send agent_initialization on connect; the ack is
best-effort (delivered via channel-layer group broadcast that may not reach a headless
socket), so we proceed after a grace window rather than failing.
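A minimal sketch of the idle-timeout read loop described in the agent fixes above, not the SDK's actual code. The `recv` callable, the `type` key on frames, and the function name are assumptions made for illustration; only the `{sender: "ai", message, partial}` shape and the `heartbeat`, typing and `chat_complete` events come from the commit message.

```python
import asyncio


async def read_until_complete(recv, idle_timeout=90.0):
    """Return the final assistant text; the timer restarts after every frame.

    `recv` is a hypothetical coroutine function returning one decoded frame (dict).
    """
    final_text = None
    while True:
        # wait_for bounds the gap between two frames, not the whole run,
        # so a long run that keeps sending heartbeats never times out.
        frame = await asyncio.wait_for(recv(), timeout=idle_timeout)
        kind = frame.get("type")  # assumed key; untyped ai frames have none
        if kind in ("heartbeat", "typing"):
            continue
        # only the committed frame (partial=false) becomes the answer text
        if frame.get("sender") == "ai" and not frame.get("partial", False):
            final_text = frame.get("message")
        if kind == "chat_complete":
            return final_text
```

Because the timeout is applied per `recv()` call, a silent socket still fails after `idle_timeout` seconds, while a heartbeat-punctuated run can last arbitrarily long.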
Verified live: 3/4 maps created from real data; notebook cells ran in-kernel; a claude_code
agent streamed init→heartbeats→complete and the SDK surfaced its real error text
("Could not load credentials") into the brief. Remaining gaps are stack model credentials
(Bedrock/OpenAI), not SDK code. 55 unit tests pass, ruff clean.
… space
The agent's MCP tools (inspect/search/bags/points) require an X-Space-State-ID header to
know which space/map to act on. The composer only sets that header when ws/chat is given a
space_state_id — which the browser mints when you open a space, but a headless SDK session
never had. Result: the agent could talk but not inspect ("mantis://current ... failed").
Fix:
- new client.space_states resource: create()/list() over POST/GET /api/space-state/ — the
same cookie-auth endpoint the frontend uses (verified live; the /api/v1/me/ API-key variant
needs a separate key the SDK doesn't have, so cookie endpoint is the right choice).
- agents.session(space_id=...) now auto-mints a space-state (auto_space_state=True, default)
and threads &space_state_id= onto the ws/chat connect; pass space_state_id= to reuse one, or
auto_space_state=False to skip. all_spaces sessions don't mint (no single space to scope).
Verified live: opencode agent scoped to a space now inspects it and reports real content
("The 'Mantis Radar — issues' space centers on software repository normalization, ...") where
before it failed the tool call. 59 unit tests pass, ruff clean.
SDK:
- client.aliases: resolve()/get()/set() over /api/{getSpaceFromAlias,getAliasFromSpaceId,
setSpaceAlias}/, plus resolve_or_create_space(alias) — the idempotency guardrail: reuse the
space if the alias resolves, else mint a DETERMINISTIC uuid5 space id so concurrent first
runs converge instead of racing.
- spaces.create() now accepts explicit space_id + map_id, so a map can be created INTO an
existing space and refreshed in place on re-runs (backend get_or_creates the space and
updates the map of that id — verified the serializer honors both).
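The deterministic-id idea can be sketched with the standard library alone. The namespace string and helper names below are hypothetical, not the SDK's; the point is only that `uuid5` maps the same alias to the same id on every machine, so two concurrent first runs target one space.

```python
import uuid

# Hypothetical namespace for illustration; any fixed UUID works as long as
# every run uses the same one.
RADAR_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "mantis-sdk/repo-radar")


def deterministic_space_id(alias: str) -> str:
    """Same alias in, same space id out, on every run and every machine."""
    return str(uuid.uuid5(RADAR_NAMESPACE, f"space:{alias}"))


def deterministic_map_id(alias: str, map_name: str) -> str:
    """Stable per-map id so a re-run updates the map instead of adding one."""
    return str(uuid.uuid5(RADAR_NAMESPACE, f"map:{alias}:{map_name}"))
```

Passing ids derived this way as `space_id` and `map_id` is what lets a re-run refresh in place, assuming the backend upserts by id as the commit message states.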
Repo Radar reworked to the single-space pattern: maintain ONE aliased space (/space/m4m)
holding the 4 radar maps, each with a stable uuid5 map_id, aliased once on first run.
Verified live end-to-end:
- run 1: created space + alias m4m-radar, 3 maps upserted (prs 400'd on a data edge — handled).
- run 2: REUSED the same space (not new), identical map ids, no alias error.
- space holds 3 maps after 2 runs (not 6) — no duplication, no orphans.
Depends on the MantisAPI `alias-idempotency` branch for safe alias re-set (the guardrail
also avoids needing it). 65 unit tests pass, ruff clean.
A wrong MANTISAPI_PATH made github_authors raise CalledProcessError, which the build_maps loop didn't catch (it only caught MantisError) → the whole run crashed instead of skipping that one map. Now github_authors raises a clear ValueError when the path isn't a git repo, and build_maps catches any per-source exception so one bad source never aborts the run.
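The per-source isolation described above might look roughly like this. `build_frames` and the `sources` mapping are illustrative names, not the real `build_maps` signature.

```python
def build_frames(sources):
    """Run every loader; a failing source is recorded and skipped.

    `sources` maps a source name to a zero-argument callable (hypothetical shape).
    """
    frames, skipped = {}, {}
    for name, loader in sources.items():
        try:
            frames[name] = loader()
        except Exception as exc:  # deliberately broad: one bad source must not abort the run
            skipped[name] = f"{type(exc).__name__}: {exc}"
    return frames, skipped
```

Keeping the error text in `skipped` means the final briefing can still say which source was dropped and why.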
…eate
Two issues surfaced running the demo:
1. PRs map 400'd ("Invalid request data"): the only semantic column was the PR body, and
~7/12 PRs have an empty body, so the backend had nothing to embed. Fall back to the PR
title (always present, and the richest signal) when the body is empty; same defensive
fallback for issues. Verified: 0/12 empty summaries now.
2. Agent synthesis 500'd on POST /api/space-state/: space-state is unique on
(space, name, created_by), so re-running for the same space/user blindly re-POSTing the
same name hit an integrity error. space_states.create() is now get-or-create — it lists
and reuses an existing state of that name before POSTing, making it idempotent across runs.
66 unit tests pass, ruff clean.
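Both fixes are small enough to sketch. The helper names and the dict shapes are assumptions; the real code works on DataFrames and on the `/api/space-state/` endpoint.

```python
def summary_text(item):
    """Use the body when it has content, otherwise fall back to the title."""
    body = (item.get("body") or "").strip()
    return body or item.get("title", "")


def get_or_create_state(list_states, create_state, name):
    """Reuse an existing state with this name before creating a new one.

    `list_states` and `create_state` are hypothetical callables standing in
    for the GET and POST requests.
    """
    for state in list_states():
        if state.get("name") == name:
            return state
    return create_state(name)
```

List-then-create is not atomic, so two simultaneous runs could still both POST; it removes the failure on sequential re-runs, which is the case the commit describes.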
…opped
The agent run produced its full answer but then dangled for the entire idle timeout: the backend sends the committed final text and *then* a chat_complete envelope over a kafka→socket bridge that sometimes drops the envelope, leaving ask() blocked on recv(). Now, once we see the committed final ai frame (sender=ai, partial=false), we wait only a short grace (8s) for a terminal event before finishing cleanly, so the run ends right after the answer instead of hanging ~90s. final_grace is instance-overridable for tests.
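A rough sketch of the grace window, under the assumption that frames are dicts with a `type` key and that `recv` is a coroutine function. Neither name is taken from the SDK.

```python
import asyncio


async def wait_for_terminal(recv, final_grace=8.0):
    """Call after the committed final ai frame has been seen.

    Returns True if a chat_complete frame arrives within the grace window,
    False if the window elapses first. Either way the caller can finish.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + final_grace
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False
        try:
            frame = await asyncio.wait_for(recv(), timeout=remaining)
        except asyncio.TimeoutError:
            return False  # envelope was probably dropped; the answer is already in hand
        if frame.get("type") == "chat_complete":
            return True
        # heartbeats and other frames do not extend the grace window
```

Using a fixed deadline rather than a per-frame timeout keeps late heartbeats from stretching the wait back toward the full idle timeout.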
…hors, per-speaker notes
spaces.create now sends map_name (defaults to space_name) — without it the backend names
every map "Untitled Map". repo_radar passes a friendly title per map.
radar sources now match the demo's intent:
- github_prs / github_issues: ALL active (open) items across both repos, each carrying
created_at + updated_at date facets (was "all" states, single date).
- github_authors: everyone who has EVER committed to either repo, via the GitHub API
(/contributors for the complete roster + a bounded /commits pass for subjects and latest
date) — drops the local-clone dependency and the MANTISAPI_PATH tilde footgun entirely.
- meeting_notes: one point PER SPEAKER SEGMENT (829 across 35 meetings) with speakers,
timestamp, meeting, and date facets, instead of one blob per meeting.
…content)
Walks both repos via the Git Trees API, fetches the first 1500 chars of each source file, and creates a new "code" map in the space so the codebase is navigable alongside PRs/issues/authors/notes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atch all errors
- Code source: focus on .py/.ts/.tsx/.js only, skip tests, sort by size, cap at 800 files to keep embedding under 10min
- Move code map last so smaller maps don't queue behind it
- Increase stall_timeout to 1800s and cell execution to 300s
- Phase 3: catch any exception (not just MantisError) so a WebSocket failure doesn't crash the whole script
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In a multi-map space, the kernel's `maps` list contains all maps. The delta code was always reading maps[0] regardless of which map the notebook was created for, so every map reported the same point count. The delta code now templates the target map_id into the cell code and looks the map up by id.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comparing a Python uuid.UUID to a plain string always evaluates to False, even when the string spells the same id. The kernel stores map_id as a UUID object, so the lookup must compare str(m.map_id) against the templated id.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
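The pitfall and the lookup by id can be reproduced outside the kernel. The `maps` objects below are stand-ins built with `SimpleNamespace`; the real kernel objects are not shown in this PR.

```python
import uuid
from types import SimpleNamespace

target = "12345678-1234-5678-1234-567812345678"

# Stand-ins for the kernel's map objects, which hold map_id as a uuid.UUID.
maps = [
    SimpleNamespace(map_id=uuid.uuid4(), name="prs"),
    SimpleNamespace(map_id=uuid.UUID(target), name="issues"),
]

# A UUID never equals a str, so this naive lookup finds nothing.
naive = [m for m in maps if m.map_id == target]


def find_map(maps, map_id):
    """Compare canonical string forms so UUID objects and strings both match."""
    return next((m for m in maps if str(m.map_id) == str(map_id)), None)


match = find_map(maps, target)
```

This assumes the templated id is already in the canonical lowercase hyphenated form; an uppercase string would need normalizing through `uuid.UUID(...)` first.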
- New client.featured_chat resource: .set(), .get(), .clear(), .clone()
- repo_radar phase 3 now pins the synthesis conversation so visitors see the briefing by default when they open the space
…pinning
The SDK was using its local chat_id (sdk-{uuid}) which is just the WS
routing key. Agno generates its own session_id server-side and sends
it back in message frames as 'chat_id'. Now captured as server_chat_id.
…session
The backend only persists to Agno when it generates the session_id
itself (chat_id='new' triggers uuid generation). Passing 'sdk-{uuid}'
meant the chat was never stored, so featured chat pinning/cloning
couldn't find it later.
Agno persists sessions asynchronously at the end of arun(). If the SDK closes the WebSocket immediately after getting the final text, the disconnect cancels the persistence. A 2s delay gives Agno time to flush.
Agno doesn't reliably persist WS sessions. Now the SDK sends the conversation messages along with the pin request so the backend can create the chat row directly if needed.
…d from commit history
…rove prompt
- Increased commit scan from 50 to 500 (5 pages) so most active files get author/date metadata instead of 'unknown'
- Skip code map in notebook delta (rebuilds last, kernel sees 0 points)
- Restructured synthesis prompt for a scannable team digest format
Code Review
This pull request introduces the 'Repo Radar' example tool, which automates the creation of a portfolio of maps, performs notebook delta analysis, and uses an agent to synthesize a weekly briefing. To support this, the Mantis SDK is extended with new resources for space states, aliases, and featured chats, alongside enhancements to the agent session handling. Feedback from the code review highlights several critical improvements, including handling missing environment variables gracefully to prevent crashes, optimizing sequential HTTP requests to avoid rate limits, removing hardcoded sleep delays, robustly checking websocket signatures instead of catching generic TypeErrors, and ensuring the portability of generated markdown briefings and charts by using relative paths.
| the agent is scoped to the one m4m space (it auto-mints a space-state so its MCP tools can | ||
| inspect the maps). all_spaces mode would reason across every accessible space instead.""" | ||
| print(f"\n=== PHASE 3: agent synthesis (provider={provider}, all_spaces={USE_ALL_SPACES}) ===") | ||
| email = os.environ["MANTIS_USER_EMAIL"] |
Using os.environ["MANTIS_USER_EMAIL"] directly outside of the try block will raise a KeyError and crash the entire script if the environment variable is not set. Since this workflow is designed to degrade gracefully when components or credentials are missing, it is safer to use os.getenv and handle the missing email gracefully.
email = os.getenv("MANTIS_USER_EMAIL")
if not email:
    return "_(agent synthesis unavailable: MANTIS_USER_EMAIL is not set)_", None, None
| for i, commit in enumerate(commits): | ||
| sha = commit.get("sha") | ||
| login = (commit.get("author") or {}).get("login") or "unknown" | ||
| date = (commit.get("commit", {}).get("author", {}).get("date") or "")[:10] | ||
| if not sha: | ||
| continue | ||
| try: | ||
| detail = requests.get( | ||
| f"{_API}/repos/{repo}/commits/{sha}", | ||
| headers=_gh_headers(), timeout=15, | ||
| ).json() | ||
| except Exception: | ||
| continue |
This loop performs a synchronous HTTP request for every single commit returned by _paginate (up to 500 commits per repository, across multiple repositories). This can result in up to 1,500 sequential HTTP requests, which will be extremely slow (taking several minutes) and will likely trigger GitHub's rate limits or abuse detection. Consider parallelizing these requests using a ThreadPoolExecutor, or reducing the default max_pages to a smaller number (e.g., 1 or 2), or using the GitHub GraphQL API to fetch commit file details in bulk.
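One way the ThreadPoolExecutor suggestion could look, as a sketch rather than a drop-in patch. `fetch_one` is a hypothetical callable that would wrap the per-commit `requests.get(...)` call from the snippet above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_details(shas, fetch_one, max_workers=8):
    """Fetch commit details concurrently; a failed fetch is skipped, as in the original loop."""
    details = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map each future back to its sha so results can be keyed correctly
        futures = {pool.submit(fetch_one, sha): sha for sha in shas}
        for future in as_completed(futures):
            sha = futures[future]
            try:
                details[sha] = future.result()
            except Exception:
                continue
    return details
```

Parallelism shortens wall-clock time but sends the same number of requests, so a small pool and a lower page cap are still worth considering for rate limits.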
| import asyncio | ||
| # allow Agno time to persist the session before the WS disconnects | ||
| await asyncio.sleep(2) |
Hardcoding a 2-second sleep in the close() method of AgentSession introduces a significant and blocking delay every time a session is closed. This is particularly problematic in automated workflows, tests, or loops where sessions are frequently opened and closed. If the backend requires time to persist the session, this synchronization should ideally be handled via a proper protocol message/acknowledgment from the server, or the sleep duration should be configurable rather than hardcoded.
| # the header kwarg was renamed extra_headers → additional_headers in websockets 14. | ||
| # try the new name, fall back to the old so we work across the pinned range (>=10.4). | ||
| try: | ||
| self._ws = await websockets.connect( | ||
| self._ws_url(), additional_headers=headers, max_size=None, open_timeout=self.timeout, | ||
| ) | ||
| except TypeError: | ||
| self._ws = await websockets.connect( | ||
| self._ws_url(), extra_headers=headers, max_size=None, open_timeout=self.timeout, | ||
| ) |
Catching a generic TypeError on the websockets.connect call to handle library version differences can be risky. If websockets.connect raises a TypeError due to any other argument mismatch or internal bug, it will silently fall back to the except block and attempt a second connection, which might fail with a confusing error or mask the real issue. A more robust approach is to inspect the signature of websockets.connect or check the websockets version beforehand to determine the correct keyword argument.
| import inspect | |
| import websockets | |
| connect_kwargs = { | |
| "max_size": None, | |
| "open_timeout": self.timeout, | |
| } | |
| sig = inspect.signature(websockets.connect) | |
| if "additional_headers" in sig.parameters: | |
| connect_kwargs["additional_headers"] = headers | |
| else: | |
| connect_kwargs["extra_headers"] = headers | |
| self._ws = await websockets.connect(self._ws_url(), **connect_kwargs) |
| 'N. Title - Speakers (timestamp): discussion…'. we emit one row per segment so the map | ||
| captures who said what, when — the body is semantic; speakers/meeting/date are facets.""" | ||
| url = f"https://docs.google.com/document/d/{gdoc_id}/export?format=txt" | ||
| text = requests.get(url, timeout=30).text |
Accessing .text directly on the response of requests.get without calling raise_for_status() can lead to silent failures. If the Google Doc is private, the ID is invalid, or the request fails, the server might return an HTML error page. The regex parsing will then silently fail to find any meetings, returning an empty DataFrame instead of raising an error. Calling raise_for_status() ensures that any HTTP errors are caught immediately.
| resp = requests.get(url, timeout=30) | |
| resp.raise_for_status() | |
| text = resp.text |
| chart_png = "/tmp/repo_radar_contributors.png" | ||
| Path(chart_png).write_bytes(png) |
Writing the chart image to a hardcoded /tmp/repo_radar_contributors.png path makes the generated markdown briefing less portable. If REPO_RADAR_BRIEF is configured to write to a different directory, the markdown will still reference the absolute /tmp path, which won't render if the files are moved or viewed on another machine. Writing the chart to the same directory as the briefing file and using a relative path in the markdown makes the report fully self-contained and portable.
| brief_dir = Path(os.getenv("REPO_RADAR_BRIEF", "/tmp/repo_radar_brief.md")).parent | |
| chart_png = str(brief_dir / "repo_radar_contributors.png") | |
| Path(chart_png).write_bytes(png) |
| lines += ["", "## Synthesis", synthesis or "_n/a_"] | ||
| chart = metrics.get("_chart") | ||
| if chart: | ||
| lines += ["", f""] |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cf9927f77
| for name, map_id in notebook_maps.items(): | ||
| try: | ||
| nb = client.notebooks.from_map(map_id, name=f"radar-{name}", | ||
| user_id=os.getenv("MANTIS_USER_EMAIL")) |
Do not pass email as notebook user_id
When running the documented cookie + MANTIS_USER_EMAIL setup, this passes the email address into NotebooksResource.create as user_id; that API sends user_id to /api/notebook/create and documents/defaults it as config.internal_user_id (mantis_sdk/notebook.py:205-214). On deployments expecting the backend user UUID, each notebook create/session call fails validation, so phase 2 produces no deltas or chart even though map creation succeeded. Use the configured internal user id (or require MANTIS_INTERNAL_USER_ID) instead of the agent email.
| except MantisError: | ||
| return None # backend returns 400 when not found; treat as "no such alias" |
Do not treat all alias errors as misses
If /api/getSpaceFromAlias returns auth, permission, server, or connection errors (for example an expired cookie or a transient backend failure), resolve() converts them to None, so resolve_or_create_space() treats the alias as absent and proceeds with a deterministic new space id. That hides the real failure and can create or update the wrong radar space once the caller continues. Only swallow the specific not-found/400 response and propagate other MantisErrors.
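A sketch of the narrower handling this comment asks for. The `MantisError` class here is a stand-in, and its `status_code` attribute is an assumption; the real exception may expose the HTTP status differently.

```python
class MantisError(Exception):
    """Stand-in for the SDK error; `status_code` is an assumed attribute."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def resolve_alias(lookup, alias):
    """Return the space id, or None only when the backend reports not-found.

    `lookup` is a hypothetical callable wrapping /api/getSpaceFromAlias.
    """
    try:
        return lookup(alias)
    except MantisError as exc:
        if exc.status_code == 400:  # backend's not-found response, per the diff comment
            return None
        raise  # auth, permission, server and connection errors must surface
```

With this shape, an expired cookie stops the run instead of silently steering `resolve_or_create_space()` toward a new space id.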
This pull request introduces "Repo Radar," a comprehensive, headless, and scriptable intelligence tool for the Mantis project. It automates the creation of a weekly briefing by aggregating data from GitHub repositories and meeting notes, performing notebook-based delta analyses, synthesizing insights with an agent, and assembling the results into a markdown report. The workflow is designed to degrade gracefully if any stack is unavailable, and its automation covers use cases that the Mantis UI cannot handle.
Key additions and improvements include:
New End-to-End Intelligence Workflow:
repo_radar.py, a script that orchestrates four phases: (1) building a portfolio of maps from project data, (2) running notebook delta analyses to track changes week-over-week, (3) synthesizing a briefing using an agent, and (4) assembling a markdown report with results and charts. The script is headless, scriptable, and supports scheduled runs.
Documentation:
README.md for Repo Radar, explaining its purpose, the four-phase workflow, why it cannot be replicated in the Mantis UI, requirements, and instructions for running and configuring the tool via environment variables.