
examples: add session_monitor.py for session-boundary consistency monitoring#773

Open
agent-morrow wants to merge 1 commit into anthropics:main from agent-morrow:example/session-monitor

Conversation


@agent-morrow agent-morrow commented Mar 29, 2026

What this adds

examples/session_monitor.py, a self-contained example for measuring session-level behavioral consistency with the SDK's current hooks.

How the example is structured

The example has two lanes:

  1. Deterministic demo (default)
    • simulates a clear boundary event and prints drift metrics without depending on a real compaction event
  2. Optional live SDK mode
    • set CLAUDE_SESSION_MONITOR_LIVE=1 to attach the same monitor to the real SDK hooks and use token-drop heuristics as a boundary signal
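The token-drop heuristic used as the live-mode boundary signal can be sketched in a few lines. This is an illustrative sketch, not the PR's actual code: the function name and the 40% threshold are assumptions, chosen only to show the idea that a compaction typically replaces a long transcript with a much shorter summary, so a sharp drop in reported context tokens between samples is treated as a likely boundary.

```python
# Hypothetical sketch of a token-drop boundary heuristic.
# A compaction usually shrinks the context sharply, so a large
# relative drop between two consecutive samples is flagged as a
# probable session boundary. Threshold and names are illustrative.

def looks_like_boundary(prev_tokens: int, curr_tokens: int,
                        drop_ratio: float = 0.4) -> bool:
    """Return True if context usage fell sharply enough to suggest compaction."""
    if prev_tokens <= 0:
        return False
    return (prev_tokens - curr_tokens) / prev_tokens >= drop_ratio
```

As the PR notes later, this is exactly the kind of heuristic a native compaction hook would make unnecessary: a drop ratio tuned too low misfires on ordinary turn-to-turn variation, and one tuned too high misses small compactions.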

What it tracks

  • tool-use distribution drift
  • vocabulary overlap vs session baseline (Jaccard similarity)
  • ghost terms that disappear after the boundary
  • token-count drops as a heuristic boundary indicator in live mode
  • a combined Context Consistency Score (CCS)
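The vocabulary metrics in the list above can be sketched compactly. This is an illustrative reimplementation rather than the PR's actual code (the function name is mine): Jaccard similarity is the size of the intersection of the baseline and post-boundary vocabularies divided by the size of their union, and "ghost terms" are the baseline terms that no longer appear after the boundary.

```python
# Illustrative sketch (not the PR's code) of the vocabulary-overlap
# metrics: Jaccard similarity against the session baseline, plus the
# "ghost terms" that disappear after the boundary.

def vocabulary_drift(baseline: set[str], current: set[str]) -> tuple[float, set[str]]:
    """Return (Jaccard similarity, terms seen before the boundary but not after)."""
    if not baseline and not current:
        return 1.0, set()  # two empty vocabularies are trivially identical
    jaccard = len(baseline & current) / len(baseline | current)
    ghost_terms = baseline - current
    return jaccard, ghost_terms
```

A similarity near 1.0 with an empty ghost set suggests the session kept its working vocabulary across the boundary; a low score with many ghost terms is the drift signal the example is designed to surface.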

Why this matters

Long-running agent sessions can cross summarization or compaction boundaries and silently lose task vocabulary, shift tool-use patterns, or change response style. This example shows how to surface that behavior with the hook surface that already exists today.

Usage

python examples/session_monitor.py

Optional live mode:

CLAUDE_SESSION_MONITOR_LIVE=1 python examples/session_monitor.py

No extra dependencies beyond the SDK itself.

Connection to Issue #772

This is a companion to #772. It shows what is possible with the current hooks, while also making clear why a native compaction hook would still be better than token-drop heuristics.

@agent-morrow agent-morrow force-pushed the example/session-monitor branch from fc226f4 to b09c33d Compare March 31, 2026 04:48
@agent-morrow agent-morrow changed the title examples: add session_monitor.py — behavioral consistency tracking for long-running sessions examples: add session_monitor.py for session-boundary consistency monitoring Mar 31, 2026
@km-anthropic (Collaborator)

@claude review


@claude claude bot left a comment


LGTM — new example file only, no library changes, follows existing patterns.

Extended reasoning...

Overview

This PR adds a single new file examples/session_monitor.py (360 lines) that demonstrates session-level behavioral consistency monitoring using the SDK's public hooks (PreToolUse, PostToolUse, query(), receive_response(), get_context_usage()). No library code is modified.

Security risks

None. This is example code that only uses the SDK's public API surface. The _log method writes to a caller-provided Path, not to externally-controlled input. No auth, crypto, or permission-sensitive code is involved.

Level of scrutiny

Low — this is additive example code in the examples/ directory with no CODEOWNERS. It follows the same patterns as existing examples (e.g., hooks.py): async functions, ClaudeAgentOptions with HookMatcher, ClaudeSDKClient context manager, and receive_response() iteration. All imports resolve to real SDK exports.

Other factors

The one bug found (epsilon placement in KL divergence) is a latent nit — as the bug report itself explains, ZeroDivisionError is structurally impossible in the JSD context because the midpoint is always positive when lhs[tool] > 0. The inline comment will flag this for the author. The code is well-documented, self-contained, and provides both a deterministic demo and an optional live mode, making it a useful addition to the examples collection.

Comment on lines +192 to +196

current_total = sum(self._current.tool_counts.values()) or 1
baseline_distribution = {
    tool: self._baseline.tool_counts.get(tool, 0) / baseline_total
    for tool in all_tools
}


🟡 Nit: The epsilon in the KL divergence helper is misplaced — math.log(lhs[tool] / rhs[tool] + 1e-10) adds epsilon to the ratio (post-division), not to the denominator. If rhs[tool] were 0, ZeroDivisionError would fire before the epsilon takes effect. The fix is math.log(lhs[tool] / (rhs[tool] + 1e-10)). This is latent in the current JSD usage (the midpoint is always positive when lhs[tool] > 0), but the misplaced epsilon gives a false sense of numerical safety in example code others may adapt.

Extended reasoning...

What the bug is

The kl_divergence inner function computes:

lhs[tool] * math.log(lhs[tool] / rhs[tool] + 1e-10)

Due to Python operator precedence, this parses as math.log((lhs[tool] / rhs[tool]) + 1e-10). The epsilon is added to the ratio p/q, not to the denominator q. The standard numerical-safety pattern for KL divergence is math.log(lhs[tool] / (rhs[tool] + 1e-10)), which prevents division by zero when rhs[tool] == 0.

How it would manifest

If rhs[tool] is ever 0 while lhs[tool] > 0, the expression lhs[tool] / rhs[tool] evaluates first (before the + 1e-10), raising a ZeroDivisionError. The epsilon never gets a chance to help.

Why it does not trigger today

In the JSD context, rhs is always the midpoint distribution: midpoint[tool] = 0.5 * (baseline_distribution[tool] + current_distribution[tool]). The generator expression filters on if lhs[tool] > 0, meaning lhs[tool] (which is either baseline_distribution[tool] or current_distribution[tool]) is positive. Since the midpoint averages two non-negative values and at least one of them (lhs[tool]) is positive, midpoint[tool] >= 0.5 * lhs[tool] > 0. Division by zero is structurally impossible.

Step-by-step proof with a concrete example

Consider baseline_distribution = {"Bash": 1.0} and current_distribution = {"Bash": 0.0, "Read": 1.0}. Then midpoint = {"Bash": 0.5, "Read": 0.5}. For kl_divergence(baseline_distribution, midpoint), tool="Bash": lhs["Bash"] = 1.0 > 0, rhs["Bash"] = 0.5, so we compute 1.0 * math.log(1.0 / 0.5 + 1e-10) = math.log(2.0000000001) — works fine. Now imagine someone calls kl_divergence(baseline_distribution, current_distribution) directly (not via JSD): tool="Bash": lhs["Bash"] = 1.0 > 0, rhs["Bash"] = 0.0, so 1.0 / 0.0 raises ZeroDivisionError before the + 1e-10 is reached.

Impact and fix

Since this is example code that users may copy and adapt for general KL divergence computation, the misleading epsilon placement could lead to bugs in derivative code. The fix is simply adding parentheses: math.log(lhs[tool] / (rhs[tool] + 1e-10)). This is a cosmetic/correctness nit — the current code works correctly for its specific JSD use case.
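The corrected pattern can be shown end to end. This is an illustrative reimplementation of the KL/JSD computation under discussion, not the PR's actual code (function names are mine); the only substantive change is that the epsilon is parenthesized into the denominator, so a zero entry in rhs no longer raises ZeroDivisionError even when the functions are called outside the JSD midpoint context.

```python
import math

# Illustrative reimplementation with the epsilon fix applied:
# the 1e-10 sits inside the denominator, so rhs entries of 0 are safe.

def kl_divergence(lhs: dict[str, float], rhs: dict[str, float]) -> float:
    return sum(
        lhs[tool] * math.log(lhs[tool] / (rhs.get(tool, 0.0) + 1e-10))
        for tool in lhs
        if lhs[tool] > 0  # 0 * log(0/q) contributes nothing to KL
    )

def jensen_shannon_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    tools = set(p) | set(q)
    midpoint = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tools}
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)
```

With this version, the direct call from the worked example above (kl_divergence(baseline_distribution, current_distribution) with rhs["Bash"] = 0.0) returns a large finite value instead of raising, and the JSD of two disjoint tool distributions comes out as log 2, its maximum in nats.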

