examples: add session_monitor.py for session-boundary consistency monitoring #773
agent-morrow wants to merge 1 commit into anthropics:main
Conversation
Force-pushed from 07770d8 to 0ef88d9, then from fc226f4 to b09c33d.
@claude review
LGTM — new example file only, no library changes, follows existing patterns.
Extended reasoning...
Overview
This PR adds a single new file examples/session_monitor.py (360 lines) that demonstrates session-level behavioral consistency monitoring using the SDK's public hook and client APIs (PreToolUse, PostToolUse, query(), receive_response(), get_context_usage()). No library code is modified.
Security risks
None. This is example code that only uses the SDK's public API surface. The _log method writes to a caller-provided Path, not to externally-controlled input. No auth, crypto, or permission-sensitive code is involved.
Level of scrutiny
Low — this is additive example code in the examples/ directory with no CODEOWNERS. It follows the same patterns as existing examples (e.g., hooks.py): async functions, ClaudeAgentOptions with HookMatcher, ClaudeSDKClient context manager, and receive_response() iteration. All imports resolve to real SDK exports.
Other factors
The one bug found (epsilon placement in KL divergence) is a latent nit — as the bug report itself explains, ZeroDivisionError is structurally impossible in the JSD context because the midpoint is always positive when lhs[tool] > 0. The inline comment will flag this for the author. The code is well-documented, self-contained, and provides both a deterministic demo and an optional live mode, making it a useful addition to the examples collection.
```python
current_total = sum(self._current.tool_counts.values()) or 1
baseline_distribution = {
    tool: self._baseline.tool_counts.get(tool, 0) / baseline_total
    for tool in all_tools
}
```
🟡 Nit: The epsilon in the KL divergence helper is misplaced — math.log(lhs[tool] / rhs[tool] + 1e-10) adds epsilon to the ratio (post-division), not to the denominator. If rhs[tool] were 0, ZeroDivisionError would fire before the epsilon takes effect. The fix is math.log(lhs[tool] / (rhs[tool] + 1e-10)). This is latent in the current JSD usage (the midpoint is always positive when lhs[tool] > 0), but the misplaced epsilon gives a false sense of numerical safety in example code others may adapt.
Extended reasoning...
What the bug is
The kl_divergence inner function computes:
lhs[tool] * math.log(lhs[tool] / rhs[tool] + 1e-10)

Due to Python operator precedence, this parses as math.log((lhs[tool] / rhs[tool]) + 1e-10). The epsilon is added to the ratio p/q, not to the denominator q. The standard numerical-safety pattern for KL divergence is math.log(lhs[tool] / (rhs[tool] + 1e-10)), which prevents division by zero when rhs[tool] == 0.
How it would manifest
If rhs[tool] is ever 0 while lhs[tool] > 0, the expression lhs[tool] / rhs[tool] evaluates first (before the + 1e-10), raising a ZeroDivisionError. The epsilon never gets a chance to help.
Why it does not trigger today
In the JSD context, rhs is always the midpoint distribution: midpoint[tool] = 0.5 * (baseline_distribution[tool] + current_distribution[tool]). The generator expression filters on if lhs[tool] > 0, meaning lhs[tool] (which is either baseline_distribution[tool] or current_distribution[tool]) is positive. Since the midpoint averages two non-negative values and at least one of them (lhs[tool]) is positive, midpoint[tool] >= 0.5 * lhs[tool] > 0. Division by zero is structurally impossible.
Step-by-step proof with a concrete example
Consider baseline_distribution = {"Bash": 1.0} and current_distribution = {"Bash": 0.0, "Read": 1.0}. Then midpoint = {"Bash": 0.5, "Read": 0.5}. For kl_divergence(baseline_distribution, midpoint), tool="Bash": lhs["Bash"] = 1.0 > 0, rhs["Bash"] = 0.5, so we compute 1.0 * math.log(1.0 / 0.5 + 1e-10) = math.log(2.0000000001) — works fine. Now imagine someone calls kl_divergence(baseline_distribution, current_distribution) directly (not via JSD): tool="Bash": lhs["Bash"] = 1.0 > 0, rhs["Bash"] = 0.0, so 1.0 / 0.0 → ZeroDivisionError before + 1e-10 is reached.
Impact and fix
Since this is example code that users may copy and adapt for general KL divergence computation, the misleading epsilon placement could lead to bugs in derivative code. The fix is simply adding parentheses: math.log(lhs[tool] / (rhs[tool] + 1e-10)). This is a cosmetic/correctness nit — the current code works correctly for its specific JSD use case.
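For completeness, the corrected helper with the parenthesized denominator (a sketch, not the PR's final code) stays finite even when rhs contains a true zero:

```python
import math

def kl_divergence_fixed(lhs, rhs, eps=1e-10):
    # Epsilon now guards the denominator, so rhs[tool] == 0 cannot
    # raise ZeroDivisionError; the term becomes large but finite.
    return sum(
        lhs[tool] * math.log(lhs[tool] / (rhs[tool] + eps))
        for tool in lhs
        if lhs[tool] > 0
    )

baseline = {"Bash": 1.0}
current = {"Bash": 0.0, "Read": 1.0}

value = kl_divergence_fixed(baseline, current)
print(math.isfinite(value), value > 0)  # True True
```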
What this adds
examples/session_monitor.py, a self-contained example for measuring session-level behavioral consistency with the SDK's current hooks.

How the example is structured
The example has two lanes:
- A deterministic demo that exercises the monitor without live API calls.
- A live mode: set CLAUDE_SESSION_MONITOR_LIVE=1 to attach the same monitor to the real SDK hooks and use token-drop heuristics as a boundary signal.

What it tracks
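One of the tracked signals is the per-segment tool-use distribution. A minimal stand-in (hypothetical names, not the file's actual code) of the counting a PreToolUse hook would do:

```python
import asyncio
from collections import Counter

class SegmentStats:
    """Tool-use counts for one session segment (hypothetical stand-in
    for the per-boundary tracking in examples/session_monitor.py)."""

    def __init__(self):
        self.tool_counts = Counter()

async def pre_tool_use_hook(tool_name, stats):
    # A real PreToolUse hook receives richer input from the SDK;
    # here we only record which tool is about to run.
    stats.tool_counts[tool_name] += 1

async def main():
    stats = SegmentStats()
    for name in ["Bash", "Read", "Read", "Bash", "Write"]:
        await pre_tool_use_hook(name, stats)
    return stats.tool_counts

counts = asyncio.run(main())
print(dict(counts))  # {'Bash': 2, 'Read': 2, 'Write': 1}
```

Normalizing these counts per segment yields the distributions that feed the Jensen-Shannon divergence comparison discussed in the review above.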
Why this matters
Long-running agent sessions can cross summarization or compaction boundaries and silently lose task vocabulary, shift tool-use patterns, or change response style. This example shows how to surface that behavior with the hook surface that already exists today.
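The token-drop heuristic used as a boundary signal in live mode can be sketched in a few lines (illustrative threshold and names; the example's actual logic may differ): a compaction boundary is inferred when reported context usage drops sharply between turns.

```python
def detect_boundary(prev_tokens, current_tokens, drop_ratio=0.5):
    """Heuristic: a large drop in reported context-token usage between
    turns suggests a summarization/compaction boundary. The 0.5 ratio
    is an illustrative threshold, not the example's actual value."""
    if prev_tokens <= 0:
        return False
    return current_tokens < prev_tokens * drop_ratio

# Token readings across turns, e.g. sampled via get_context_usage().
readings = [1_000, 4_000, 9_000, 2_000, 3_500]
boundaries = [
    i for i in range(1, len(readings))
    if detect_boundary(readings[i - 1], readings[i])
]
print(boundaries)  # [3]: the 9_000 -> 2_000 drop exceeds 50%
```

The weakness of this heuristic (a large but benign drop also triggers it) is exactly why a native compaction hook, as proposed in #772, would be preferable.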
Usage

Run the deterministic demo (invocation assumed from the file path):

    python examples/session_monitor.py

Optional live mode:

    CLAUDE_SESSION_MONITOR_LIVE=1 python examples/session_monitor.py
No extra dependencies beyond the SDK itself.
Connection to Issue #772
This is a companion to #772. It shows what is possible with the current hooks, while also making clear why a native compaction hook would still be better than token-drop heuristics.