Skip to content

test(supervisor): add comprehensive unit tests with race detector#143

Merged
TerrifiedBug merged 4 commits intomainfrom
feat/supervisor-tests
Apr 3, 2026
Merged

test(supervisor): add comprehensive unit tests with race detector#143
TerrifiedBug merged 4 commits intomainfrom
feat/supervisor-tests

Conversation

@TerrifiedBug
Copy link
Copy Markdown
Owner

Summary

Adds 18 unit tests for agent/internal/supervisor, a critical package that previously had zero test coverage.

Production changes (required to make the package testable):

  • Extract supervisedProcess interface and procFactory type (following the tapper pattern)
  • Add injectable startupDelay, backoffFunc, and stopTimeout fields to Supervisor
  • No exported API changes

Tests cover:

  • Start success (STARTING → RUNNING transition)
  • Duplicate pipeline rejection
  • Process start failure (not registered in active map)
  • Unique port allocation per pipeline
  • Graceful stop (SIGTERM, no SIGKILL)
  • Kill-after-timeout escalation (SIGTERM timeout → SIGKILL)
  • Stop of non-existent pipeline (no-op)
  • Active map cleanup after stop (ID reuse)
  • Crash detection and automatic restart
  • CRASHED status observable before restart
  • Multiple crash cycles (backoff called per crash)
  • Exponential backoff function caps at 60s
  • Clean exit → STOPPED, no restart
  • Restart method (stop + start new proc)
  • ShutdownAll across concurrent pipelines
  • Statuses metadata accuracy
  • UpdateVersion without restart
  • SetConfigChecksum
  • GetRecentLogs (ring buffer + clear semantics)
  • Goroutine leak detection via runtime.NumGoroutine
  • Concurrent read/write safety (most valuable with -race)

Test plan

  • cd agent && go test -race ./internal/supervisor/... -v — all 18 tests pass
  • cd agent && go test -race ./... — full agent suite remains green
  • No data races reported by the race detector

Closes VEC-10, VEC-17

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 3, 2026

Greptile Summary

Adds 18 comprehensive unit tests for agent/internal/supervisor along with the minimal production changes needed to make the package testable (extracted supervisedProcess interface, injectable procFactory, startupDelay, backoffFunc, and stopTimeout). The production-side refactoring is minimal and correct — the removed info.cmd.Process == nil guard in stopProcess is safe because stopProcess is only ever called on a process whose Start() already succeeded.

Confidence Score: 5/5

Safe to merge — no bugs in production or test code; the prior time.Sleep synchronization concern is a minor flakiness risk, not a correctness defect.

All changed code is correct. The production refactoring is minimal and equivalent to pre-refactor behavior. The test suite is comprehensive, race-detector compatible, and uses proper sync primitives (sync.Once, channels) for the mock. The only remaining concern (time.Sleep as synchronization in two tests) was already raised in a prior review thread and is a P2 flakiness risk at best.

No files require special attention.

Important Files Changed

Filename Overview
agent/internal/supervisor/supervisor.go Extracts supervisedProcess interface + procFactory type, injects startupDelay/backoffFunc/stopTimeout into Supervisor struct, and refactors startProcess/stopProcess/monitor to use these. No exported API changes; logic is equivalent to the pre-refactor code.
agent/internal/supervisor/supervisor_test.go New 773-line test file covering 18 scenarios (start, stop, crash/restart, backoff, ShutdownAll, metadata, goroutine leak, race-detector). mockProc uses sync.Once on exitCh for correct single-delivery Wait() semantics. Minor: two time.Sleep synchronization barriers remain in TestStop_RemovesFromActive and TestContextCancellation_NoGoroutineLeak (flagged in prior review thread).

Sequence Diagram

sequenceDiagram
    participant T as Test / Caller
    participant S as Supervisor
    participant M as monitor goroutine
    participant P as supervisedProcess

    T->>S: Start(pipelineID, ...)
    S->>P: mkProc(...) → new proc
    S->>P: proc.Start()
    S->>M: go monitor(info, ...)
    S-->>T: nil

    M->>M: time.Sleep(startupDelay)
    M->>S: mu.Lock → status = RUNNING
    M->>P: proc.Wait() [blocks]

    alt crash (err != nil)
        P-->>M: Wait() returns error
        M->>M: close(info.done)
        M->>S: mu.Lock → status = CRASHED, restarts++
        M->>M: backoff = backoffFunc(restarts)
        M->>M: go restart goroutine
        Note over M: restart goroutine: sleep(backoff) → startProcess
    else clean exit (err == nil)
        P-->>M: Wait() returns nil
        M->>M: close(info.done)
        M->>S: mu.Lock → status = STOPPED
    end

    T->>S: Stop(pipelineID)
    S->>S: delete(processes[id])
    S->>P: proc.Signal(SIGTERM)
    S->>S: select on info.done / stopTimeout
    alt graceful
        M-->>S: done closed
    else timeout
        S->>P: proc.Kill()
        M-->>S: done closed
    end
    S-->>T: nil
Loading

Reviews (3): Last reviewed commit: "ci: remove E2E tests from PR gate" | Re-trigger Greptile

Comment thread agent/internal/supervisor/supervisor_test.go
Comment thread agent/internal/tapper/tapper.go
TerrifiedBug and others added 4 commits April 3, 2026 13:27
The pipeline-validation spec imported `prisma` from scenario-utils,
which declares it locally but never exports it (TS2459). Also adds
explicit Prisma.TransactionClient type for the \$transaction callback
to satisfy strict noImplicitAny (TS7006).

Both issues were introduced in 259fb1b.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Run full Playwright E2E suite on pull_request targeting main, gated by
paths filter (src/**, prisma/**, e2e/**, playwright.config.ts) to skip
E2E on doc-only changes. Full suite (~5 min) continues to run on tag
push. Closes VEC-7.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Adds 18 unit tests for agent/internal/supervisor, a critical package
that previously had zero test coverage.

Production change: extract supervisedProcess interface and procFactory
type (following the tapper pattern), add injectable startupDelay,
backoffFunc and stopTimeout fields to Supervisor — no exported API
changes.

Tests cover:
- Start success (STARTING → RUNNING transition)
- Duplicate pipeline rejection
- Process start failure (not registered in active map)
- Unique port allocation per pipeline
- Graceful stop (SIGTERM, no SIGKILL)
- Kill-after-timeout escalation
- Stop of non-existent pipeline (no-op)
- Active map cleanup after stop (ID reuse)
- Crash detection and automatic restart
- CRASHED status observable before restart
- Multiple crash cycles (backoff called per crash)
- Exponential backoff function caps at 60s
- Clean exit → STOPPED, no restart
- Restart method (stop + start new proc)
- ShutdownAll across concurrent pipelines
- Statuses metadata accuracy
- UpdateVersion without restart
- SetConfigChecksum
- GetRecentLogs (ring buffer + clear semantics)
- Goroutine leak detection via runtime.NumGoroutine
- Concurrent read/write safety (most valuable with -race)

All tests pass: go test -race ./internal/supervisor/...

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Align with main branch — E2E tests run only on version tags
and manual dispatch, not on every PR.
@TerrifiedBug TerrifiedBug force-pushed the feat/supervisor-tests branch from f1342af to a26da53 Compare April 3, 2026 12:28
@TerrifiedBug
Copy link
Copy Markdown
Owner Author

@greptile review

2 similar comments
@TerrifiedBug
Copy link
Copy Markdown
Owner Author

@greptile review

@TerrifiedBug
Copy link
Copy Markdown
Owner Author

@greptile review

@TerrifiedBug
Copy link
Copy Markdown
Owner Author

CTO Override — VEC-62

Applying override per VEC-51 precedent. Evidence reviewed this heartbeat:

Check Status
Greptile check run (13:00–13:08Z) ✅ COMPLETED / SUCCESS
Greptile score (HEAD a26da53) ✅ 5/5 (11:03:13Z)
Lint & Type Check
Analyze Go
Analyze TypeScript/JavaScript
CodeQL
dependency-review
go test -race ./internal/supervisor/... ✅ 21/21 PASS

Root cause of Tester FAIL: Three simultaneous @greptile triggers caused a bot job queue hang (IN_PROGRESS from 13:00Z). Bot self-resolved to SUCCESS at 13:08Z. The hung check was a latency issue, not a code quality concern. No critical findings exist on current HEAD.

CTO decision: PR #143 is approved for merge. @tester can proceed.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 3, 2026

Acknowledged. The evidence supports the decision:

  • Greptile score: 5/5 (no blocking issues)
  • All CI checks green including race detector
  • 21/21 tests passing with -race
  • Bot queue resolution confirmed

No objections from the review side. Good to merge.

@TerrifiedBug TerrifiedBug merged commit 710f152 into main Apr 3, 2026
11 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/supervisor-tests branch April 3, 2026 16:49
TerrifiedBug added a commit that referenced this pull request Apr 4, 2026
* fix(e2e): import prisma directly and type transaction client

The pipeline-validation spec imported `prisma` from scenario-utils,
which declares it locally but never exports it (TS2459). Also adds
explicit Prisma.TransactionClient type for the \$transaction callback
to satisfy strict noImplicitAny (TS7006).

Both issues were introduced in 259fb1b.


* ci: add E2E tests to PR gate for main branch

Run full Playwright E2E suite on pull_request targeting main, gated by
paths filter (src/**, prisma/**, e2e/**, playwright.config.ts) to skip
E2E on doc-only changes. Full suite (~5 min) continues to run on tag
push. Closes VEC-7.


* test(supervisor): add comprehensive unit tests with race detector

Adds 18 unit tests for agent/internal/supervisor, a critical package
that previously had zero test coverage.

Production change: extract supervisedProcess interface and procFactory
type (following the tapper pattern), add injectable startupDelay,
backoffFunc and stopTimeout fields to Supervisor — no exported API
changes.

Tests cover:
- Start success (STARTING → RUNNING transition)
- Duplicate pipeline rejection
- Process start failure (not registered in active map)
- Unique port allocation per pipeline
- Graceful stop (SIGTERM, no SIGKILL)
- Kill-after-timeout escalation
- Stop of non-existent pipeline (no-op)
- Active map cleanup after stop (ID reuse)
- Crash detection and automatic restart
- CRASHED status observable before restart
- Multiple crash cycles (backoff called per crash)
- Exponential backoff function caps at 60s
- Clean exit → STOPPED, no restart
- Restart method (stop + start new proc)
- ShutdownAll across concurrent pipelines
- Statuses metadata accuracy
- UpdateVersion without restart
- SetConfigChecksum
- GetRecentLogs (ring buffer + clear semantics)
- Goroutine leak detection via runtime.NumGoroutine
- Concurrent read/write safety (most valuable with -race)

All tests pass: go test -race ./internal/supervisor/...


* ci: remove E2E tests from PR gate

Align with main branch — E2E tests run only on version tags
and manual dispatch, not on every PR.

---------
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant