Draft
124 commits
db388bd
few more updates for new categories
haoranpb Apr 8, 2026
57c004e
Refactor evaluation and dataset operations for improved workspace setup
haoranpb Apr 9, 2026
8e2f216
enable skipping container setup in action
haoranpb Apr 9, 2026
69a8db8
fix missing implementation for MockEvaluationPipeline
haoranpb Apr 9, 2026
7549d92
Refactor evaluation result classes to be more generic
haoranpb Apr 11, 2026
f32dd00
Merge branch 'main' into fix/more-ready-for-categories
haoranpb Apr 12, 2026
a4089b9
Improve readability of GitHub Action summary
haoranpb Apr 12, 2026
99af6b2
fix failing tests
haoranpb Apr 12, 2026
e1b0b93
Code Review POC
haoranpb Apr 12, 2026
1a68d78
Merge branch 'main' into category/code-review
haoranpb Apr 13, 2026
3ec10a0
fix merge conflict resolution mistake
haoranpb Apr 13, 2026
4e52832
Merge branch 'main' into category/code-review
haoranpb Apr 13, 2026
a9f59d9
Make container parameters optional in evaluate and run commands
haoranpb Apr 13, 2026
065e1aa
Merge branch 'category/code-review' of https://github.com/microsoft/B…
haoranpb Apr 13, 2026
4ad4bd9
Enhance code review functionality by adding expected review comments …
haoranpb Apr 13, 2026
92951c4
better handling container for not required categories
haoranpb Apr 13, 2026
7902610
Merge branch 'main' into category/code-review
haoranpb Apr 20, 2026
dad9289
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb May 5, 2026
f1c4894
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb May 11, 2026
aa48a29
prefer copilot.exe executable
haoranpb May 12, 2026
a244503
Normalize code-review dataset and preserve eval outputs
WaelAbuSeada May 16, 2026
9f6c353
Fix code-review branch setup and workflow wiring
WaelAbuSeada May 20, 2026
1a58e44
Require review.json and add log-based recovery fallback
WaelAbuSeada May 20, 2026
d0e8076
Harden code-review prompt for Windows copilot.cmd parsing
WaelAbuSeada May 20, 2026
0b764ef
Experiment: use al-code-review skill template
WaelAbuSeada May 20, 2026
83a4b28
Add skip-container-setup option to evaluation workflows
WaelAbuSeada May 20, 2026
394d005
Fix codereview lint issues in pipeline helpers
WaelAbuSeada May 20, 2026
ae0f1d2
Revert "Fix codereview lint issues in pipeline helpers"
WaelAbuSeada May 20, 2026
143075b
Expand code-review detailed table metrics
WaelAbuSeada May 21, 2026
7411246
Expand code-review detailed table metrics
WaelAbuSeada May 21, 2026
0d6e7ad
Update config and container setup action
WaelAbuSeada May 21, 2026
c7131a4
Update config and container setup action
WaelAbuSeada May 21, 2026
213ce7f
Remove unused apply_patch import from code-review evaluate
WaelAbuSeada May 21, 2026
2e1ced0
Refactor code-review metrics into pipeline and split comment display …
WaelAbuSeada May 21, 2026
b9babe6
Merge category/code-review into experiment/code-review-al-skill
WaelAbuSeada May 21, 2026
558d8ad
Normalize code-review test-run instance IDs to valid pattern
WaelAbuSeada May 21, 2026
3ff6876
Normalize code-review test-run instance IDs to valid pattern
WaelAbuSeada May 21, 2026
be4ccd9
Use plain code-review IDs (security_001 style) and relax ID pattern
WaelAbuSeada May 21, 2026
6e68751
Use plain code-review IDs (security_001 style) and relax ID pattern
WaelAbuSeada May 21, 2026
9b4e5b1
Revert instance_id regex to original strict pattern
WaelAbuSeada May 21, 2026
d691d26
Revert instance_id regex to original strict pattern
WaelAbuSeada May 21, 2026
32e499b
Rename code-review test IDs to strict non-vsoadmin format
WaelAbuSeada May 21, 2026
4c7e03c
Rename code-review test IDs to strict non-vsoadmin format
WaelAbuSeada May 21, 2026
54c618f
fix: add dataset-path input to setup-bc-container action
WaelAbuSeada May 21, 2026
85503e8
fix: add dataset-path input to setup-bc-container action
WaelAbuSeada May 21, 2026
05673bc
feat: add precision and recall to detailed results table
WaelAbuSeada May 21, 2026
06ee0b9
feat: add precision and recall to detailed results table
WaelAbuSeada May 21, 2026
db9c805
fix: apply pre-commit lint and typing fixes
WaelAbuSeada May 21, 2026
f5ffe80
fix: apply pre-commit lint and typing fixes
WaelAbuSeada May 21, 2026
7b6f871
chore: remove UI instruction file
WaelAbuSeada May 21, 2026
8835a18
chore: remove UI instruction file
WaelAbuSeada May 21, 2026
b05a635
fix: review code changes from applied entry patch
WaelAbuSeada May 21, 2026
aabee80
fix: review code changes from applied entry patch
WaelAbuSeada May 21, 2026
0e11f9d
fix: support simplified code-review patch materialization
WaelAbuSeada May 21, 2026
1f7a5bd
fix: support simplified code-review patch materialization
WaelAbuSeada May 21, 2026
7d4ee94
fix: tighten code-review diff and parsing behavior
WaelAbuSeada May 21, 2026
6777610
fix: tighten code-review diff and parsing behavior
WaelAbuSeada May 21, 2026
f64ecdf
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb May 29, 2026
ef84b18
cleanup after merge from main
haoranpb May 29, 2026
a711b3e
Refactor evaluation workflows to use dynamic runner and container req…
haoranpb May 29, 2026
0c58e8c
make run step OS independent
haoranpb May 29, 2026
b076b98
fix score mismatch
haoranpb May 29, 2026
df11718
extract github action related commands
haoranpb May 29, 2026
541f6e4
test should not test runner name
haoranpb May 29, 2026
c9193e5
make code review patches proper git diff
haoranpb May 29, 2026
4408974
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb May 29, 2026
859ec99
refactor to separate the logics
haoranpb Jun 1, 2026
0feba63
make more steps OS independent
haoranpb Jun 1, 2026
db12ed4
skip leaderboard update and stricter field for codereview result
haoranpb Jun 1, 2026
820b767
simplify import/export
haoranpb Jun 1, 2026
7848f4b
move CodeReviewResultSummary into codereview result file
haoranpb Jun 1, 2026
64f37c0
strongly type CodeReviewResultSummary and reuse metrics util
haoranpb Jun 1, 2026
d216e42
separate leaderboard from summary and make it generic
haoranpb Jun 1, 2026
49a5cef
fix failing tests
haoranpb Jun 1, 2026
35c5045
Potential fix for pull request finding 'Module imports itself'
haoranpb Jun 1, 2026
eaa1a2c
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb Jun 1, 2026
e00e939
add CodeReview to mock tests
haoranpb Jun 1, 2026
13b568c
Merge branch 'main' of https://github.com/microsoft/BC-Bench into cat…
haoranpb Jun 3, 2026
0fef385
Merge category/code-review into experiment/code-review-al-skill
WaelAbuSeada Jun 4, 2026
d34742b
Remove skills and instructions from category branch
WaelAbuSeada Jun 4, 2026
f450ae2
Merge remote-tracking branch 'origin/category/code-review' into exper…
WaelAbuSeada Jun 4, 2026
6c2437b
Keep instructions/skills on experiment and enable skill-based code re…
WaelAbuSeada Jun 4, 2026
d063ac2
Add skill/instruction read diagnostics from hook logs
WaelAbuSeada Jun 4, 2026
a9b3e3f
Set instructions disabled in shared config
WaelAbuSeada Jun 4, 2026
d519e67
Add session-log skill diagnostics and enable custom instructions
WaelAbuSeada Jun 4, 2026
9eab86e
Restore al-test-generation assets on category branch
WaelAbuSeada Jun 4, 2026
7013327
Add domain-aware code-review prompts, parsing, and results
WaelAbuSeada Jun 9, 2026
40a2bdd
Remove domain routing from code-review prompt
WaelAbuSeada Jun 9, 2026
b16dc2d
Tune code-review experiment: harness diff-scoping, domain discipline,…
WaelAbuSeada Jun 9, 2026
d2c4b34
Remove domain-drop scoring; slim privacy entries to in-domain findings
WaelAbuSeada Jun 10, 2026
92d62d6
Slim privacy code-review entries (003/012/015) to eliminate OOD findi…
WaelAbuSeada Jun 10, 2026
6e56fee
Slim security code-review entries 001-007 to eliminate OOD findings (…
WaelAbuSeada Jun 10, 2026
3621550
Slim security code-review entries 008-016 to eliminate OOD findings
WaelAbuSeada Jun 10, 2026
dbed30a
Slim security code-review entries 008-016 to eliminate OOD findings
WaelAbuSeada Jun 10, 2026
2c8798e
Slim style negative-control entries 001-008 to eliminate OOD findings
WaelAbuSeada Jun 10, 2026
9a9092e
Fix style negative-control 001/008 OOD findings
WaelAbuSeada Jun 10, 2026
f5bece9
Reduce OOD in style positive entries 012/014/016/017
WaelAbuSeada Jun 10, 2026
10a6d4b
Eliminate OOD in style positive entry 011
WaelAbuSeada Jun 10, 2026
603bfe6
Eliminate OOD in style positive entry 013
WaelAbuSeada Jun 10, 2026
d027ae8
Clean up diagnostic logging from comment judge
WaelAbuSeada Jun 10, 2026
2d9223b
Remove skill-read diagnostics from hook logs
WaelAbuSeada Jun 10, 2026
6387220
Move comment_judge into code-review namespace
WaelAbuSeada Jun 10, 2026
8560fcc
Fix code-review upgrade entries 001-009 to eliminate out-of-domain fi…
WaelAbuSeada Jun 10, 2026
02c0836
Fix code-review performance negatives to eliminate out-of-domain find…
WaelAbuSeada Jun 10, 2026
2a2a170
Improve code-review summary layout with grouped tables and explanations
WaelAbuSeada Jun 11, 2026
7154f36
Fix code-review performance positives to eliminate out-of-domain find…
WaelAbuSeada Jun 11, 2026
41ec36a
Remove temporary batch_verify.py eval driver script
WaelAbuSeada Jun 11, 2026
302620e
Sort detailed results by domain then natural instance id
WaelAbuSeada Jun 11, 2026
23ce854
Enrich 9 zero-expected performance entries with true-positive findings
WaelAbuSeada Jun 11, 2026
0e3a522
Renumber performance and privacy entries to be contiguous
WaelAbuSeada Jun 11, 2026
f601f0b
Fix Copilot CLI v1.0.61 metrics + Linux hook firing
WaelAbuSeada Jun 11, 2026
7652da8
Switch Copilot Linux hook to Python; capture process logs in artifacts
WaelAbuSeada Jun 11, 2026
0f510f1
WIP: Pass-1 OOD cleanup for non-perf code-review entries (16)
WaelAbuSeada Jun 11, 2026
fac1718
Add code-review probe harness and enrichment scripts
WaelAbuSeada Jun 12, 2026
68e81c4
Enrich 28 zero-expected code-review entries with in-domain findings
WaelAbuSeada Jun 12, 2026
79ec33f
Fix pre-commit lint failures across hook and tools scripts
WaelAbuSeada Jun 12, 2026
652ac9b
Port code-review category infra and dataset enrichment from experimen…
WaelAbuSeada Jun 12, 2026
f5825f2
Potential fix for pull request finding 'Variable defined multiple times'
WaelAbuSeada Jun 12, 2026
740a31a
Merge branch 'category/code-review' into experiment/code-review-al-skill
WaelAbuSeada Jun 12, 2026
1d91024
Restore default config.yaml on category branch
WaelAbuSeada Jun 12, 2026
3efaba1
Merge branch 'category/code-review' into experiment/code-review-al-skill
WaelAbuSeada Jun 12, 2026
e77c93b
Include domain and suggestedCode in code-review finding schema
WaelAbuSeada Jun 12, 2026
8f4cc63
Merge branch 'category/code-review' into experiment/code-review-al-skill
WaelAbuSeada Jun 12, 2026
355c26c
Switch Copilot CLI auth to built-in Actions token (org billing)
WaelAbuSeada Jun 13, 2026
1 change: 1 addition & 0 deletions .github/workflows/claude-evaluation.yml
@@ -23,6 +23,7 @@ on:
         options:
           - "bug-fix"
           - "test-generation"
+          - "code-review"
       test-run:
         description: "Indicate this is a test run (with few entries)"
         required: false
18 changes: 6 additions & 12 deletions .github/workflows/copilot-evaluation.yml
@@ -30,6 +30,7 @@ on:
         options:
           - "bug-fix"
           - "test-generation"
+          - "code-review"
       test-run:
         description: "Indicate this is a test run (with few entries)"
         required: false
@@ -83,6 +84,7 @@ jobs:
     permissions:
       contents: read
       id-token: write
+      copilot-requests: write
     name: ${{ matrix.entry }}
     strategy:
       fail-fast: false
@@ -122,21 +124,11 @@ jobs:
       - name: Install GitHub Copilot CLI
         run: npm install -g @github/copilot@1.0.57

-      - name: Select PAT based on job index
-        id: select-pat
-        shell: pwsh
-        run: |
-          $patIndex = ${{ strategy.job-index }} % 4
-          echo "pat_index=$patIndex" >> $env:GITHUB_OUTPUT
-
       - name: Run GitHub Copilot CLI for entry ${{ matrix.entry }}
         timeout-minutes: 120
         shell: pwsh
         env:
-          COPILOT_GITHUB_TOKEN: ${{ steps.select-pat.outputs.pat_index == '0' &&
-            secrets.COPILOT_PAT || (steps.select-pat.outputs.pat_index == '1' &&
-            secrets.COPILOT_PAT2 || (steps.select-pat.outputs.pat_index == '2'&&
-            secrets.COPILOT_PAT3 || secrets.COPILOT_PAT4)) }}
+          COPILOT_GITHUB_TOKEN: ${{ github.token }}
         run: |
           Write-Output "::add-mask::$env:COPILOT_GITHUB_TOKEN"

@@ -153,7 +145,9 @@ jobs:
         if: always()
         with:
           name: evaluation-results-${{ github.run_id }}-${{ matrix.entry }}
-          path: ${{ env.EVALUATION_RESULTS_DIR }}/**/*.jsonl
+          path: |
+            ${{ env.EVALUATION_RESULTS_DIR }}/**/*.jsonl
+            ${{ env.EVALUATION_RESULTS_DIR }}/**/*.log
           retention-days: ${{ inputs.test-run && 1 || 30 }}

   summarize-results:
3 changes: 2 additions & 1 deletion .github/workflows/summarize-results.yml
@@ -108,7 +108,8 @@ jobs:
             --use-capi ${{ !inputs.mock && '--storage braintrust --storage kusto' || '' }}

       - name: Update leaderboard in a new branch
-        if: ${{ !inputs.mock && !inputs.skip-leaderboard }}
+        # WIP for code-review category
+        if: ${{ !inputs.mock && !inputs.skip-leaderboard && inputs.category != 'code-review' }}
         run: |
           git fetch origin main
81 changes: 81 additions & 0 deletions dataset/codereview.jsonl

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/_data/code-review.json
@@ -0,0 +1,4 @@
+{
+  "runs": [],
+  "aggregate": []
+}
20 changes: 20 additions & 0 deletions evaluator/scores.py
@@ -19,3 +19,23 @@ def __call__(self, *, metadata: dict, **kwargs: object) -> bool:
 class PostPatchPassedRate:
     def __call__(self, *, metadata: dict, **kwargs: object) -> bool:
         return metadata.get("post_patch_passed", False)
+
+
+class PrecisionScore:
+    def __call__(self, *, metadata: dict, **kwargs: object) -> float:
+        return float(metadata.get("precision", 0.0))
+
+
+class RecallScore:
+    def __call__(self, *, metadata: dict, **kwargs: object) -> float:
+        return float(metadata.get("recall", 0.0))
+
+
+class F1Score:
+    def __call__(self, *, metadata: dict, **kwargs: object) -> float:
+        return float(metadata.get("f1", 0.0))
+
+
+class ValidReviewOutput:
+    def __call__(self, *, metadata: dict, **kwargs: object) -> bool:
+        return bool(metadata.get("valid_review_output", False))
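Reviewer note: the scorers above only read `precision`, `recall`, and `f1` out of `metadata`; this diff does not show where those values are computed. The sketch below is one plausible way to derive them from match counts. `ReviewCounts` and `review_metadata` are hypothetical names, and returning 0.0 when a denominator is zero is an assumption (the pipeline may treat negative-control entries differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCounts:
    """Hypothetical container for one evaluation entry's match counts."""

    matched: int    # generated findings that matched an expected finding
    generated: int  # all findings the agent wrote to review.json
    expected: int   # all findings listed in the dataset entry


def review_metadata(counts: ReviewCounts) -> dict[str, float]:
    """Derive precision, recall, and F1; 0.0 on empty denominators (assumption)."""
    precision = counts.matched / counts.generated if counts.generated else 0.0
    recall = counts.matched / counts.expected if counts.expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


class F1Score:
    """Same shape as the scorer added in evaluator/scores.py."""

    def __call__(self, *, metadata: dict, **kwargs: object) -> float:
        return float(metadata.get("f1", 0.0))


if __name__ == "__main__":
    metadata = review_metadata(ReviewCounts(matched=2, generated=4, expected=2))
    print(metadata, F1Score()(metadata=metadata))
```

Keeping the arithmetic in the pipeline and leaving the scorers as thin readers means a missing key degrades to 0.0 instead of raising inside the scorer.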
3 changes: 2 additions & 1 deletion scripts/BCBenchUtils.psm1
@@ -490,13 +490,14 @@ function Get-BCBenchDatasetPath {
     param(
         [Parameter(Mandatory = $true)]
         # Category validation lives only here: every caller resolves the dataset path through this function, so there's no need to duplicate ValidateSet on each caller.
-        [ValidateSet("bug-fix", "test-generation")]
+        [ValidateSet("bug-fix", "test-generation", "code-review")]
         [string] $Category
     )

     switch ($Category) {
         "bug-fix" { $DatasetName = "bcbench.jsonl" }
         "test-generation" { $DatasetName = "bcbench.jsonl" }
+        "code-review" { $DatasetName = "codereview.jsonl" }
     }

     [string] $projectRoot = Split-Path $PSScriptRoot -Parent
34 changes: 27 additions & 7 deletions src/bcbench/agent/copilot/metrics.py
@@ -34,7 +34,12 @@ def parse_metrics(output_lines: Sequence[str], session_log_path: Path | None = N
         output_lines: Lines from Copilot CLI stderr output
         session_log_path: Optional path to session log file for tool usage parsing

-    Expected output format (new, v1.0.2+):
+    Expected output format (newest, v1.0.61+):
+        Changes +23 -0
+        AI Credits 58.4 (1m 14s)
+        Tokens ↑ 413.9k (368.1k cached) • ↓ 4.5k (500 reasoning)
+
+    Previous output format (v1.0.2..v1.0.60):
         Changes +17 -0
         Requests 0.33 Premium (1m 45s)
         Tokens ↑ 317.5k • ↓ 4.3k • 255.0k (cached)
@@ -83,26 +88,41 @@ def parse_metrics(output_lines: Sequence[str], session_log_path: Path | None = N
         seconds = float(duration_match.group(2))
         execution_time = minutes * 60 + seconds

-    # New format: "Requests 0.33 Premium (1m 45s)" — extract session time from parenthesized duration
+    # New format (v1.0.2+): "Requests 0.33 Premium (1m 45s)" — extract session time from parenthesized duration
     if execution_time is None:
         requests_match = re.search(r"Requests\s+[\d.]+\s+Premium\s+\((?:(\d+)m\s*)?(\d+(?:\.\d+)?)s\)", output_text)
         if requests_match:
             minutes = int(requests_match.group(1)) if requests_match.group(1) else 0
             seconds = float(requests_match.group(2))
             execution_time = minutes * 60 + seconds

+    # Newest format (v1.0.61+): "AI Credits 58.4 (1m 14s)" — "Requests N Premium" was renamed to "AI Credits N"
+    if execution_time is None:
+        credits_match = re.search(r"AI Credits\s+[\d.]+\s+\((?:(\d+)m\s*)?(\d+(?:\.\d+)?)s\)", output_text)
+        if credits_match:
+            minutes = int(credits_match.group(1)) if credits_match.group(1) else 0
+            seconds = float(credits_match.group(2))
+            execution_time = minutes * 60 + seconds
+
     # Token usage — legacy format: "1.3m in, 11.6k out"
     usage_match = re.search(r"(\d+(?:\.\d+)?[km]?)\s+in,\s*(\d+(?:\.\d+)?[km]?)\s+out", output_text)
     if usage_match:
         prompt_tokens = _parse_token_count(usage_match.group(1))
         completion_tokens = _parse_token_count(usage_match.group(2))

-    # New format: "Tokens ↑ 317.5k • ↓ 4.3k • 255.0k (cached)"
+    # New format (v1.0.2+): "Tokens ↑ 317.5k • ↓ 4.3k • 255.0k (cached)"
+    # Newest format (v1.0.61+): "Tokens ↑ 413.9k (368.1k cached) • ↓ 4.5k (500 reasoning)"
+    # Use separate ↑ / ↓ lookups to tolerate inline "(N cached)" / "(N reasoning)" annotations
+    # between the two values.
     if prompt_tokens is None:
-        tokens_match = re.search(r"Tokens\s+[^\d]*(\d+(?:\.\d+)?[km]?)\s*[•·]\s*[^\d]*(\d+(?:\.\d+)?[km]?)", output_text)
-        if tokens_match:
-            prompt_tokens = _parse_token_count(tokens_match.group(1))
-            completion_tokens = _parse_token_count(tokens_match.group(2))
+        tokens_line_match = re.search(r"Tokens\s+([^\n]+)", output_text)
+        if tokens_line_match:
+            tokens_line = tokens_line_match.group(1)
+            up_match = re.search(r"\u2191\s*(\d+(?:\.\d+)?[km]?)", tokens_line)
+            down_match = re.search(r"\u2193\s*(\d+(?:\.\d+)?[km]?)", tokens_line)
+            if up_match and down_match:
+                prompt_tokens = _parse_token_count(up_match.group(1))
+                completion_tokens = _parse_token_count(down_match.group(1))

     if execution_time is not None or llm_duration is not None or prompt_tokens is not None or completion_tokens is not None or turn_count is not None:
         return AgentMetrics(
17 changes: 15 additions & 2 deletions src/bcbench/agent/shared/config.yaml
@@ -50,6 +50,19 @@ prompt:
     {{task}}
     {% endif %}

+  code-review-template: |
+    /al-code-review
+
+    Review ONLY the current working-tree AL file changes for this evaluation entry.
+    Use the working tree diff only (git diff HEAD), and focus on changed *.al files.
+    Do NOT review committed history or the HEAD commit, and do NOT compare commits (for example, do NOT use HEAD~1..HEAD or origin/main comparisons).
+
+    Save findings to a file named "review.json" in the repository root.
+    The file must contain valid JSON with a top-level object named findings.
+    Each finding must include: filePath, lineNumber, severity, issue, recommendation, domain, suggestedCode
+    Allowed severity values are: critical, high, medium, low.
+    If there are no findings, write an empty findings list.
+
 # controls:
 # 1. whether to copy custom instructions from `src/bcbench/agent/shared/instructions/<sanitized-repo>/`
 # - Copilot: copies to repo/.github/ and renames AGENTS.md to copilot-instructions.md
@@ -59,14 +72,14 @@ prompt:
 # NOTE: the canonical source file is AGENTS.md; it is automatically renamed
 # to the agent-specific filename (AgentType.instruction_filename) during setup
 instructions:
-  enabled: false
+  enabled: true

 # controls:
 # 1. whether to copy skills from `src/bcbench/agent/shared/instructions/<sanitized-repo>/skills/`
 # - Copilot: copies to repo/.github/skills/
 # - Claude: copies to repo/.claude/skills/
 skills:
-  enabled: false
+  enabled: true

 # controls:
 # 1. whether to copy custom agents from `src/bcbench/agent/shared/instructions/<sanitized-repo>/agents/`
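Reviewer note: the `code-review-template` prompt specifies the shape of `review.json` in prose only. A minimal validation sketch of that contract follows, assuming `findings` is a JSON list, as the "empty findings list" wording suggests. `validate_review` is a hypothetical helper and is not the parser used by the pipeline.

```python
import json

REQUIRED_FIELDS = (
    "filePath",
    "lineNumber",
    "severity",
    "issue",
    "recommendation",
    "domain",
    "suggestedCode",
)
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}


def validate_review(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the review looks well-formed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        return ["top-level 'findings' list is missing"]

    problems: list[str] = []
    for index, finding in enumerate(data["findings"]):
        if not isinstance(finding, dict):
            problems.append(f"finding {index}: not an object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in finding]
        if missing:
            problems.append(f"finding {index}: missing {', '.join(missing)}")
        severity = finding.get("severity")
        if severity not in ALLOWED_SEVERITIES:
            problems.append(f"finding {index}: severity {severity!r} not allowed")
    return problems


if __name__ == "__main__":
    print(validate_review('{"findings": []}'))
```

A check of this kind could feed a boolean such as `valid_review_output`, though how the pipeline actually sets that flag is not visible in this diff.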
51 changes: 51 additions & 0 deletions src/bcbench/agent/shared/hooks/log_tool_usage.py
@@ -0,0 +1,51 @@
+"""Copilot/Claude PreToolUse hook: log tool invocations to a JSONL file.
+
+Reads the hook payload from stdin and appends one JSON line per call to the
+path in BCBENCH_TOOL_LOG. Used by both Copilot CLI (Linux runners) and Claude
+hooks via the `bash` field of the hook command spec; the legacy .ps1 in this
+directory mirrors the same behavior for the Windows `powershell` field.
+"""
+
+import contextlib
+import json
+import os
+import sys
+
+
+def _extract_tool_name(payload: dict) -> str | None:
+    name = payload.get("tool_name") or payload.get("toolName")
+    if name != "lsp":
+        return name
+
+    args = payload.get("toolArgs") or payload.get("tool_input")
+    if isinstance(args, str):
+        try:
+            args = json.loads(args)
+        except json.JSONDecodeError:
+            args = None
+    if isinstance(args, dict) and (op := args.get("operation")):
+        return f"lsp:{op}"
+    return name
+
+
+def main() -> None:
+    try:
+        payload = json.loads(sys.stdin.read() or "{}")
+    except json.JSONDecodeError:
+        return
+
+    name = _extract_tool_name(payload)
+    log_path = os.environ.get("BCBENCH_TOOL_LOG")
+    if not name or not log_path:
+        return
+
+    entry = {"tool_name": name, "timestamp": payload.get("timestamp", "")}
+    with open(log_path, "a", encoding="utf-8") as f:
+        f.write(json.dumps(entry) + "\n")
+
+
+if __name__ == "__main__":
+    with contextlib.suppress(Exception):
+        # Never block tool execution — silently fail.
+        main()
+    sys.exit(0)
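Reviewer note: the hook writes one `{"tool_name": ..., "timestamp": ...}` object per line. The sketch below shows how a consumer might tally that log. `count_tool_usage` is a hypothetical helper; the parsing code that actually reads the log is not part of this diff. Skipping malformed lines is an assumption, made because the hook appends without locking and a line could be partially written.

```python
import json
from collections import Counter
from collections.abc import Iterable


def count_tool_usage(lines: Iterable[str]) -> Counter[str]:
    """Count tool invocations from JSONL lines shaped like the hook's output."""
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: tolerate a truncated or corrupt line
        if isinstance(entry, dict) and (name := entry.get("tool_name")):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"tool_name": "view", "timestamp": ""}',
        '{"tool_name": "lsp:hover", "timestamp": ""}',
        '{"tool_name": "view", "timestamp": ""}',
    ]
    print(count_tool_usage(sample))
```

To read a real log, pass an open file handle, since iterating a text file yields its lines.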