Agentic fuzzer + 7 filter bug fixes by PLNech · Pull Request #12 · algolia/rtk

PLNech · 2026-03-20T16:25:00Z

Summary

- Built an agentic fuzzer (139 static tests, 35 command families, 7 heuristics)
- Found 15 filter bugs; fixed 7 in this PR
- Failure rate: 28% to 16% (the remaining failures are by design or known limitations)

Fixes

- docker ps/images: accept arbitrary flags with smart passthrough (FUZZ-006)
- pip: respect the user's --format flag instead of overriding it with --format=json (FUZZ-007)
- npm: route subcommands correctly instead of hardcoding npm run (FUZZ-008)
- find: delegate to system find when predicates are detected (FUZZ-009)
- cargo: preserve the stderr channel for clippy/check output (FUZZ-010)
- git branch -a: show all remotes without dedup when the user asks for them (FUZZ-011)
- gh pr diff: raise max_lines from 100 to 500 to avoid silent truncation (FUZZ-009/gh)

Remaining issues

- #9 docker ps --format json / --no-trunc should pass through
- #10 pip: uv doesn't support --format=columns/freeze/--not-required
- #11 Add agentic fuzzer to CI as regression gate

Test plan

- cargo fmt, clippy, test (427 pass)
- Fuzzer: 15.8% failure rate, down from 28%
- Manual: rtk docker ps -a, rtk pip list, rtk npm list, rtk find . -maxdepth 2 -name '*.rs'

- install.sh now points to algolia/rtk at pinned v0.22.2 (overridable via RTK_VERSION env var)
- All docs, scripts, and CI workflows updated from rtk-ai/rtk to algolia/rtk
- CHANGELOG historical upstream links preserved for attribution
- All GitHub Actions workflows target the 'main' branch (not 'master')
- New ci.yml: build/test/clippy/fmt on ubuntu + macOS
- New telemetry-guard CI job: blocks telemetry.rs, ureq, sha2, hostname deps, and phone-home patterns in source
- CLAUDE.md documents fork maintenance strategy, sync policy, and telemetry exclusion rules

fork: establish algolia/rtk identity with no-telemetry CI guard

release: v0.22.3

feat: support git -C directory option

release: v0.22.4

gh: when the --json flag is present, bypass all filtering and pass the raw JSON output through. RTK was reformatting structured JSON into lossy human-readable summaries, breaking downstream jq/python consumers.

grep: detect -c/--count mode and pass rg output through verbatim. The file:count format was being misinterpreted as file:linenum:content, producing gibberish like "📄 101 (1): prefix_saturated".

Closes bug-report: 2026-03-19-gh-json-output-rewritten.md
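A minimal sketch of the two detection checks above, assuming the arguments RTK forwards to the tool are available as a string slice. The helper names gh_should_passthrough and grep_is_count_mode are hypothetical, not RTK's actual functions, and combined short flags such as -ic are not handled.

```rust
/// Hypothetical helper: true when a `gh` invocation requests JSON output,
/// in which case the raw output should be printed unchanged.
fn gh_should_passthrough(args: &[&str]) -> bool {
    args.iter().any(|a| *a == "--json" || a.starts_with("--json="))
}

/// Hypothetical helper: true when rg runs in count mode (`-c` / `--count`),
/// whose `file:count` lines must not be parsed as `file:line:content`.
fn grep_is_count_mode(args: &[&str]) -> bool {
    args.iter().any(|a| *a == "-c" || *a == "--count")
}
```

When either check returns true, the caller would skip its filter and print the tool's output as-is.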

scripts/fuzz-rtk.py uses Qwen 3.5 (via the Algolia inference proxy) to generate diverse command invocations targeting format-changing flags, then compares raw vs RTK output with 6 heuristic checks: JSON integrity, line expansion, emoji injection, exit code match, data loss, and format preservation.

First run: 64 tests, 30 failures (47% failure rate). Key findings:
- grep: most rg flags crash with exit 2 (-l, --vimgrep, -A/-B/-C, --json)
- git log: custom --format/--pretty output mangled by compact filter
- git log: --graph characters stripped, --stat/--patch data lost

Bug reports documented in bug-reports/2026-03-19-fuzz-findings.md
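The fuzzer itself is Python (scripts/fuzz-rtk.py); this Rust sketch only illustrates three of the six checks (exit code match, line expansion, emoji injection) under simple assumptions. The Outcome struct and the failure labels are hypothetical, and the emoji test covers only two Unicode blocks.

```rust
/// Captured result of one command run, either raw or through RTK (hypothetical).
struct Outcome {
    stdout: String,
    exit_code: i32,
}

/// Rough emoji test: Misc Symbols/Dingbats and the main pictograph blocks only.
fn is_emoji(c: char) -> bool {
    matches!(c as u32, 0x1F300..=0x1FAFF | 0x2600..=0x27BF)
}

/// Returns the labels of the heuristics that flag the RTK output.
fn heuristic_failures(raw: &Outcome, rtk: &Outcome) -> Vec<&'static str> {
    let mut failures = Vec::new();
    // Exit code match: filtering must not change success/failure semantics.
    if raw.exit_code != rtk.exit_code {
        failures.push("EXIT_CODE_MISMATCH");
    }
    // Line expansion: a compression filter should never emit more lines.
    if rtk.stdout.lines().count() > raw.stdout.lines().count() {
        failures.push("LINE_EXPANSION");
    }
    // Emoji injection: an emoji appears in RTK output but not in raw output.
    if rtk.stdout.chars().any(|c| is_emoji(c) && !raw.stdout.contains(c)) {
        failures.push("EMOJI_INJECTION");
    }
    failures
}
```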

RTK's grep had short flags colliding with rg:
- -l (RTK: --max-len) vs rg -l (--files-with-matches)
- -c (RTK: --context-only) vs rg -c (--count)
- -m (RTK: --max) vs rg -m (--max-count)

Removed the colliding short flags from Clap, so -l, -c, and -m now flow through to rg via extra_args as users expect. Generalized passthrough to all format-changing rg flags: -l, --json, --vimgrep, -h, --count-matches, and context flags (-A/-B/-C). These bypass RTK's grouping filter because they produce different output formats. Found via agentic fuzzing: 14/16 grep tests were failing before.
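A sketch of the generalized passthrough test, assuming the forwarded rg arguments arrive as a string slice; rg_needs_passthrough and both constants are hypothetical names. Context flags are matched both bare (-C 3) and with an attached count (-C3).

```rust
/// Format-changing rg flags listed in the commit message above.
const FORMAT_FLAGS: &[&str] = &["-l", "--json", "--vimgrep", "-h", "--count-matches"];
/// Context flags; they may carry an attached number such as `-C3`.
const CONTEXT_FLAGS: &[&str] = &["-A", "-B", "-C"];

/// True if any argument should make RTK skip its grouping filter.
fn rg_needs_passthrough(args: &[&str]) -> bool {
    args.iter().any(|a| {
        FORMAT_FLAGS.contains(a)
            || CONTEXT_FLAGS.iter().any(|f| {
                // "-C" alone or "-C<digits>" counts as a context flag.
                a.strip_prefix(*f)
                    .map_or(false, |rest| rest.chars().all(|c| c.is_ascii_digit()))
            })
    })
}
```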

When the user provides --format, --pretty, --graph, --stat, --numstat, --shortstat, --patch, -p, --raw, --name-only, or --name-status, RTK now passes git log output through verbatim instead of applying its compact filter. These flags indicate the user wants specific output formatting that RTK's compression would mangle. --oneline without other detail flags still gets RTK's default compression, since it produces compatible output. Found via agentic fuzzing: 13/16 git-log tests were failing.
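A sketch of the git log check, with the flag list copied from the paragraph above; git_log_passthrough and DETAIL_FLAGS are hypothetical names. Values attached with = (for example --format=%h) are stripped before comparing, so --oneline alone still falls through to the compact filter.

```rust
/// Flags from the commit message that signal user-chosen git log formatting.
const DETAIL_FLAGS: &[&str] = &[
    "--format", "--pretty", "--graph", "--stat", "--numstat", "--shortstat",
    "--patch", "-p", "--raw", "--name-only", "--name-status",
];

/// True when git log output should be passed through verbatim.
fn git_log_passthrough(args: &[&str]) -> bool {
    args.iter().any(|a| {
        // Compare only the flag name, so `--format=%h` matches `--format`.
        let name = a.split_once('=').map_or(*a, |(flag, _)| flag);
        DETAIL_FLAGS.contains(&name)
    })
}
```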

- Add 11 new command families: find, cat, tree, cargo-test, cargo-clippy, git-branch, git-stash, curl, wc, env, diff
- Add STATIC_TESTS dict for deterministic regression without LLM
- Extract _run_single_test() reusable helper
- Switch to qwen3.5-35b-fp8 for faster throughput
- Add FUZZ-RTK.md project README documenting architecture and usage

New families: git-diff, git-show, git-branch, find, cargo-build, cargo-clippy, diff, env, curl. Fix cargo 2>&1 safety false positive. Found 8+ new bugs across ls, find, diff, git-branch, git-show, wc.

Fixes discovered by agentic fuzzer (Run 2, bugs 8/10/11/12):
- diff: exit code 1 when files differ (matches diff convention)
- git branch: passthrough for --format/--sort flags
- git show: passthrough for --name-only/--name-status/--raw/--no-patch/-p
- ls: passthrough when -l flag present (preserves metadata users expect)

Static fuzzer: 20 fail → 13 fail (65 tests)

- git diff: passthrough for --name-only/--name-status flags
- cargo build/clippy/check: passthrough for --message-format=json
- grep: disable_help_flag so -h flows to rg instead of Clap
- fuzz-rtk.py: remove false cat -b test case

Maps -q to --brief flag, shells out to real diff for brief mode. Preserves exit codes (0=identical, 1=differ).

…rmat check
- grep/rg: sort lines before FORMAT_ALTERED comparison to handle non-deterministic rg thread ordering (was causing false positives)
- wc: skip FORMAT_ALTERED check entirely (filename stripping is intentional compression, not a format bug)

Results: 47% → 8% failure rate (5 remaining are design choices)
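The comparison fix can be sketched as an order-insensitive line check. The fuzzer is Python, so this Rust version (with the hypothetical name same_lines_ignoring_order) illustrates the idea rather than reproducing the script.

```rust
/// Order-insensitive comparison: rg searches files in parallel, so matches
/// from different files can arrive in any order between two runs.
fn same_lines_ignoring_order(raw: &str, filtered: &str) -> bool {
    let mut a: Vec<&str> = raw.lines().collect();
    let mut b: Vec<&str> = filtered.lines().collect();
    a.sort_unstable();
    b.sort_unstable();
    a == b
}
```

Sorting keeps duplicates, so a dropped or repeated match is still detected; only the ordering is ignored.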

…UZZ-005) Five bug classes discovered by LLM-powered fuzzing:
- FUZZ-001: git format flags not passed through
- FUZZ-002: cargo --message-format=json mangled
- FUZZ-003: git diff --name-only filtered incorrectly
- FUZZ-004: grep -h intercepted by Clap
- FUZZ-005: missing flag support (diff -q, ls -l, git branch)

New bug classes tested:
- separator: -- handling between tool flags and args
- large-output: truncation behavior on 100+ line results
- stderr-commands: commands producing output on stderr
- gh-issue/gh-api: GitHub CLI subcommands
- empty-output: no-match/clean-state edge cases

New heuristic (#7): STDERR_LOSS, which detects when raw stderr content vanishes through RTK's filter pipeline. Preflight now skips vault/API checks for static-only runs. Multi-command families use RTK_MAP for command translation. Found 3 new bugs: cargo test -- separator loss, cargo clippy stderr loss, find -name Clap misinterpretation.
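One plausible reading of STDERR_LOSS, assumed here rather than taken from the fuzzer: the raw command wrote something to stderr, but nothing reached stderr through RTK, even if the text was moved to stdout (the cargo clippy case). The function name stderr_lost is hypothetical.

```rust
/// Hypothetical STDERR_LOSS check: raw stderr had content, RTK's stderr is empty.
fn stderr_lost(raw_stderr: &str, rtk_stderr: &str) -> bool {
    !raw_stderr.trim().is_empty() && rtk_stderr.trim().is_empty()
}
```

This channel-based test deliberately tolerates compression: a one-line summary on stderr counts as preserved.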

New families: docker-ps, docker-images, pip, go-test, npm, ruff, pytest, git-tag, git-remote, cargo-test (expanded), cargo-clippy (expanded). New bugs: FUZZ-006 (docker flags rejected), FUZZ-007 (pip format override), FUZZ-008 (npm hardcoded to run), FUZZ-009-012 (find, cargo stderr, git branch, diff data loss). LLM round on weak families: 94% failure rate confirms docker/pip/npm/find are comprehensively broken.

Add trailing_var_arg to DockerCommands::Ps and Images so Clap accepts extra flags. Format-changing flags (--format, -q, --quiet) trigger passthrough; content flags (-a, --filter, --no-trunc) pass through while RTK still filters the output. Also adds shared has_output_format_flag() utility to utils.rs for reuse across modules. Fixes 12 fuzzer failures.
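A sketch in the spirit of the shared has_output_format_flag() utility. Its real signature in utils.rs is not shown in this PR, so this hypothetical wants_custom_format takes the flag list as a parameter, letting each module pass its own. Combined short flags such as -aq are not handled.

```rust
/// Flags that make `docker ps` / `docker images` emit a different format.
const DOCKER_FORMAT_FLAGS: &[&str] = &["--format", "-q", "--quiet"];

/// True if the user asked for a format RTK should not touch.
fn wants_custom_format(args: &[&str], format_flags: &[&str]) -> bool {
    args.iter().any(|a| {
        // `--format={{.ID}}` and `--format {{.ID}}` both match `--format`.
        let name = a.split_once('=').map_or(*a, |(flag, _)| flag);
        format_flags.iter().any(|f| *f == name)
    })
}
```

Content flags such as -a or --no-trunc return false here, so RTK would still filter that output, matching the behaviour described above.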

…ZZ-007) When the user specifies --format=freeze/columns/etc., skip injecting --format=json and pass the raw output through. Unknown subcommands now pass through instead of erroring. Fixes format override for pip list/outdated. uv compatibility for edge flags (--not-required) is a separate uv issue.
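A sketch of the pip argument logic, assuming RTK builds the final pip argument list itself; pip_list_args is a hypothetical name. It returns the arguments plus a flag saying whether RTK's JSON filter should run.

```rust
/// Hypothetical builder: inject `--format=json` only when the user chose no format.
fn pip_list_args(user_args: &[&str]) -> (Vec<String>, bool) {
    let user_chose_format = user_args
        .iter()
        .any(|a| *a == "--format" || a.starts_with("--format="));
    let mut args: Vec<String> = user_args.iter().map(|a| a.to_string()).collect();
    if !user_chose_format {
        args.push("--format=json".to_string());
    }
    // true = parse and filter the JSON; false = pass the user's format through.
    (args, !user_chose_format)
}
```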

…-008) Only npm run/run-script goes through the boilerplate filter. All other subcommands (list, outdated, config, view, install, etc.) pass through to npm directly. Previously all input was prepended with "run", breaking npm list, npm config, npm --version, etc. Fixes 3 fuzzer failures.
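A sketch of the routing rule with a hypothetical NpmRoute enum: only run and run-script reach the filter, and everything else (including bare flags like --version) is forwarded untouched instead of being prefixed with "run".

```rust
/// Where an `rtk npm ...` invocation should go (hypothetical).
#[derive(Debug, PartialEq)]
enum NpmRoute {
    /// `npm run <script>`: output goes through the boilerplate filter.
    FilteredRun(Vec<String>),
    /// Any other subcommand or flag: run npm directly, output untouched.
    Passthrough(Vec<String>),
}

fn route_npm(args: &[&str]) -> NpmRoute {
    let owned: Vec<String> = args.iter().map(|a| a.to_string()).collect();
    match args.first() {
        Some(&"run") | Some(&"run-script") => NpmRoute::FilteredRun(owned),
        _ => NpmRoute::Passthrough(owned),
    }
}
```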

Restructured Find command to use trailing_var_arg with manual parsing. When system-find predicates (-name, -type, -maxdepth, etc.) are present, delegates to system find. Without predicates, uses RTK's glob finder. Fixes 3 fuzzer failures where Clap rejected valid find predicates.
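A sketch of the backend choice, assuming trailing_var_arg has already collected the arguments into a flat list. Beyond -name, -type, and -maxdepth from the paragraph above, the extra entries in FIND_PREDICATES are real find predicates added as an assumption; all names here are hypothetical.

```rust
/// Predicates that only system `find` understands (partly assumed list).
const FIND_PREDICATES: &[&str] = &[
    "-name", "-iname", "-type", "-maxdepth", "-mindepth",
    "-path", "-newer", "-size", "-mtime", "-exec",
];

#[derive(Debug, PartialEq)]
enum FindBackend {
    /// Hand the untouched argument list to the system `find` binary.
    System,
    /// Use RTK's own glob-based finder.
    Glob,
}

fn choose_find_backend(args: &[&str]) -> FindBackend {
    if args.iter().any(|a| FIND_PREDICATES.contains(a)) {
        FindBackend::System
    } else {
        FindBackend::Glob
    }
}
```

The glob-finder arguments in the test below are illustrative only; RTK's own argument convention is not shown in this PR.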

When cargo subcommand output was primarily on stderr (clippy, check), the filtered output was printed to stdout, causing STDERR_LOSS. Now detects the original channel and preserves it.
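A sketch of channel preservation, assuming "primarily on stderr" is judged by comparing the trimmed byte length of each raw stream; the Channel enum and both helper names are hypothetical.

```rust
/// Output channel a filtered result should be written to (hypothetical).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Channel {
    Stdout,
    Stderr,
}

/// Picks the stream that carried most of the raw output. cargo clippy and
/// cargo check write their diagnostics to stderr, so those summaries stay there.
fn original_channel(raw_stdout: &str, raw_stderr: &str) -> Channel {
    if raw_stderr.trim().len() > raw_stdout.trim().len() {
        Channel::Stderr
    } else {
        Channel::Stdout
    }
}

/// Writes the filtered text to the chosen stream.
fn emit(channel: Channel, text: &str) {
    match channel {
        Channel::Stdout => println!("{text}"),
        Channel::Stderr => eprintln!("{text}"),
    }
}
```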

When user explicitly passes -a/--all or -r/--remotes, show full remote list instead of deduplicating against local branches. RTK's auto-added -a (when no list flag given) still deduplicates for token savings.
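A sketch of the two behaviours, assuming git branch -a lists remote branches as remotes/<remote>/<name>; both function names are hypothetical. Entries such as remotes/origin/HEAD -> origin/main are not special-cased.

```rust
/// True when the user explicitly asked for remote branches, so the full
/// remote list is shown without deduplication.
fn user_requested_remotes(user_args: &[&str]) -> bool {
    user_args
        .iter()
        .any(|a| matches!(*a, "-a" | "--all" | "-r" | "--remotes"))
}

/// Used only when RTK added `-a` itself: drops `remotes/<remote>/<name>`
/// entries whose `<name>` is also a local branch.
fn dedup_remotes<'a>(local: &[&str], remotes: &[&'a str]) -> Vec<&'a str> {
    remotes
        .iter()
        .copied()
        .filter(|r| {
            // "remotes/origin/feat/x" -> "feat/x"
            let name = r.splitn(3, '/').nth(2).unwrap_or(*r);
            !local.iter().any(|l| *l == name)
        })
        .collect()
}
```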

…9/gh) compact_diff() was called with max_lines=100 for PR diffs, silently dropping files beyond ~2 in multi-file PRs. Agents doing PR review received incomplete diffs with no truncation signal. Raised to 500. Includes regression test with synthetic 5-file PR diff.

Replace clap subcommand enum with flat trailing_var_arg for compose, then manually route subcommands after splitting global flags. Fixes: docker compose -f deploy/docker-compose.yml build
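A sketch of the manual split, assuming only -f/--file and -p/--project-name take a separate value. docker compose has other value-taking global options (for example --env-file) that this sketch would misclassify; split_compose_args is a hypothetical name.

```rust
/// Splits `docker compose` args into (global options, subcommand, subcommand args).
fn split_compose_args<'a>(args: &[&'a str]) -> (Vec<&'a str>, Option<&'a str>, Vec<&'a str>) {
    const VALUE_FLAGS: &[&str] = &["-f", "--file", "-p", "--project-name"];
    let mut globals = Vec::new();
    let mut i = 0;
    while i < args.len() {
        let a = args[i];
        if VALUE_FLAGS.contains(&a) {
            // The flag and its value both belong to the global section.
            globals.push(a);
            if let Some(v) = args.get(i + 1) {
                globals.push(*v);
            }
            i += 2;
        } else if a.starts_with('-') {
            // Boolean global flag, or a `--flag=value` form.
            globals.push(a);
            i += 1;
        } else {
            // First bare word is the subcommand; the rest are its arguments.
            return (globals, Some(a), args[i + 1..].to_vec());
        }
    }
    (globals, None, Vec::new())
}
```

For the example in the commit message, the globals are -f and deploy/docker-compose.yml, the subcommand is build, and it has no further arguments.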

filter_curl_output now pretty-prints JSON with serde_json, keeping actual data values intact. Previous behaviour replaced values with type names (string, int) via json_cmd, destroying API response data.

Commands containing =$( or =` are now skipped entirely by the rewrite hook. The ENV_PREFIX regex misparsed VAR=$(grep ...) as an env prefix, mangling the command. Safer to skip than to risk broken shell syntax.
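The guard amounts to a substring test; here is a sketch with a hypothetical should_skip_rewrite function. It errs on the side of skipping, so any command containing =$( or =` is left untouched, even when the assignment is not at the start.

```rust
/// Hypothetical rewrite-hook guard: skip commands that assign the output of a
/// command substitution, since an env-prefix regex cannot parse them safely.
fn should_skip_rewrite(command: &str) -> bool {
    command.contains("=$(") || command.contains("=`")
}
```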

PLNech and others added 30 commits March 17, 2026 21:15

fix: resolve all clippy -D warnings (pre-existing)

435ac51

fix: rustfmt formatting for has_limit_flag chain

f9d442b

fix: resolve remaining clippy -D warnings across 20 files

b76690e

fix: cargo fmt formatting for container.rs and detector.rs; tighten pre-commit hook to match CI

e6963b3

ci: parallelize fmt/telemetry-guard, add rust-cache, skip release build on PRs

09784e9

ci: use actions/cache@v4 for Rust build cache (zero third-party trust)

1a4b175

Merge pull request #1 from algolia/fork/algolia-identity

3e2fa5d

release: bump to v0.22.3, drop homebrew job, update version refs

d2d8a34

Merge pull request #3 from algolia/release/v0.22.3

e458192

feat: support git -C (directory) global option for cross-repo commands

16c324e

Merge pull request #5 from algolia/fix/git-global-opts

dd16e7c

release: bump to v0.22.4 with git -C support

450c4bb

release: update version refs to 0.22.4 across docs

f89894e

Merge pull request #7 from algolia/release/v0.22.4

9287247

fix: correct git_cmd() doc comment per PR #5 review feedback

3c4126d

feat: add RTK_SKIP_TRACKING env var to exclude fuzzer from rtk gain stats

fa315c6

feat: expand static tests to 65 across 17 families, add demo page

69eb684

docs: add Run 2 fuzzer findings (210 tests, 6 new bugs across diff/find/ls/git-branch)

68eb1e0

fix: round 2 passthrough fixes from agentic fuzzing

3553b33

fix: add diff -q/--brief passthrough to system diff

e1e168e

PLNech added 12 commits March 20, 2026 11:15

fix: cargo clippy/check preserve stderr channel (FUZZ-010)

829bef5

fix: git branch -a shows all remotes without dedup (FUZZ-011)

d2cde3c

docs: add remaining bug report files from fuzzing session

91b7f8d

fix: docker compose global flags (-f, -p) forwarded correctly

196fd54

fix: curl preserves JSON response values instead of schema-ifying

cfcde03

fix: hook skips rewrite for subshell/backtick assignments

794a8ed

PLNech force-pushed the main branch 2 times, most recently from 43f54e2 to 37ac0de (March 31, 2026 13:35)

PLNech wants to merge 42 commits into main from feature/agentic-fuzzing
