Skip to content

Add Unix domain socket control interface for TUI (#482)#488

Open
wesm wants to merge 10 commits intomainfrom
feat/tui-control
Open

Add Unix domain socket control interface for TUI (#482)#488
wesm wants to merge 10 commits intomainfrom
feat/tui-control

Conversation

@wesm
Copy link
Collaborator

@wesm wesm commented Mar 9, 2026

Summary

  • Add a newline-delimited JSON protocol over a Unix domain socket that allows external tools to programmatically query state and trigger mutations in running TUI instances
  • Support 12 commands: 4 queries (get-state, get-filter, get-jobs, get-selected) and 8 mutations (set-filter, clear-filter, set-hide-closed, select-job, set-view, close-review, cancel-job, rerun-job)
  • Socket created at {DataDir}/tui.{PID}.sock on every TUI launch with 0600 permissions (umask-based, no TOCTOU window); runtime metadata written to tui.{PID}.json for discoverability
  • Stale sockets from dead processes cleaned on startup via platform-specific PID checks (signal-0 on Unix, tasklist on Windows)
  • select-job rejects jobs hidden by active filters; close-review/cancel-job only reflow selection when the mutated job is the currently selected row
  • Add --control-socket flag for custom socket paths
  • 34 unit and integration tests covering all commands, socket safety, permissions, stale detection, cross-platform PID checks, and runtime metadata

🤖 Generated with Claude Code

@roborev-ci
Copy link

roborev-ci bot commented Mar 9, 2026

roborev: Combined Review (26a9403)

Verdict: The PR successfully adds a Unix domain socket control plane to the TUI, but contains medium-severity issues regarding selection state and safe file cleanup.

Medium

  1. Invalid selection state on job close
    File: cmd/robore v/tui/control_handlers.go:430
    handleCtrlCloseReview can leave an invalid selection when closing the
    currently selected job with hideClosed=true and no other visible jobs exist. In that branch, if no fallback index is found, selection is left pointing to a now-hidden job.
    Suggested fix: When idx < 0, explicitly clear selection (selectedIdx = -1,
    selectedJobID = 0).

  2. Unsafe file removal in cleanup
    File: [cmd/roborev/tui/control_runtime.go:132](/home/roborev/repos/roborev/cmd/roborev/tui
    /control_runtime.go:132)
    CleanupStaleTUIRuntimes deletes info.SocketPath from runtime JSON without validating file type. If a runtime JSON is corrupted or manually edited, cleanup can remove an arbitrary non-socket file, posing a local data-loss risk
    .
    Suggested fix: Use os.Lstat before removal and only delete when the path is a Unix socket (ModeSocket), otherwise skip deleting that path and just remove the metadata.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Mar 9, 2026

roborev: Combined Review (816043a)

Verdict: The PR introduces a useful TUI control socket and runtime metadata, but requires fixes for concurrency
issues, safe cleanup validation, and cross-platform test compatibility.

Medium

  • Concurrent State Mutation in Response Payloads
    • Locations: [control_handlers.go:74](/home/roborev/repos/roborev/cmd/roborev/tui/control
      _handlers.go:74), control_handlers.go:101, [control_handlers.go:148](/home/rob
      orev/repos/roborev/cmd/roborev/tui/control_handlers.go:148), [control.go:208](/home/roborev/repos/roborev/cmd/roborev/tui/control.go:20
    • Issue: buildStateResponse/buildFilterResponse return slices directly from model, and makeJobSnapshot returns pointer fields (Closed, Verdict) directly from jobs. Those responses are marshaled in connection goroutines, while the Bubble Tea update loop can mutate model state concurrently,
      creating a risk for race conditions and inconsistent JSON.
    • Suggested Fix: Deep-copy slices and pointer-backed values before putting them in controlResponse.Data (or marshal to bytes on the update goroutine and pass immutable bytes to the socket writer).
  • Unsafe Stale-
    Runtime Cleanup

    • Location: control_runtime.go:130
    • Issue: CleanupStaleTUIR untimes calls os.Remove(info.SocketPath) when the metadata PID is dead, without verifying the path is a stale socket. If stale metadata points to a currently active socket (custom shared path) or a non-socket file, it can delete the wrong target.
    • Suggested Fix
      :
      Before removal, verify the socket type and stale state (using the same safety checks as removeStaleSocket), and only unlink when connecting proves there is no listener (ECONNREFUSED).
  • Test Incompatibility on Windows (SOCK_DGRAM)

    • Location: cmd /roborev/tui/control_test.go (in TestRemoveStaleSocket_IncompatibleSocketRefused)
    • Issue: syscall.Socket(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0) is used to create a datagram Unix socket
      . Windows does not support SOCK_DGRAM for AF_UNIX, causing this test to unconditionally fail on Windows builds.
    • Suggested Fix: Skip the test on Windows or move it to a control_unix_test.go file.
  • Test Incompatibility
    on Windows (File Permissions)

    • Location: cmd/roborev/tui/control_test.go (in TestControlSocketPermissions)
    • Issue: The test strictly asserts POSIX file permissions (perm&0077 == 0). On
      Windows, newly created files do not respect umask and typically reflect default ACLs (evaluating as 0666), causing the test to reliably fail.
    • Suggested Fix: Skip the file permission assertion on Windows.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Mar 9, 2026

roborev: Combined Review (4b20dda)

Verdict: The TUI control-socket implementation has medium-severity bugs related to startup ordering and stale runtime metadata that need to be addressed.

Medium

  • Startup ordering
    makes the newly published control endpoint intermittently unusable.
    cmd/roborev/tui/tui.go:867 cmd/robore
    v/tui/control.go:180

    Run starts the socket listener and writes runtime metadata before p.Run() starts
    the Bubble Tea event loop. queryViaProgram/mutateViaProgram immediately call p.Send(...), which is a no-op before the program is running, so a client that connects as soon as it discovers the socket can only hit the 3-second "response timeout" path. The current
    tests miss this because TestControlSocketRoundtrip starts p.Run() before the listener. Suggested fix: gate control handling on an explicit “program ready” signal, or only advertise/accept the socket after the event loop is live.

  • Runtime metadata goes stale after remote filter changes. cmd
    /roborev/tui/control_handlers.go:180
    cmd/roborev/tui/tui.go
    :877
    [cmd/roborev/tui/control_runtime.go:152](/home/roborev/repos/
    roborev/cmd/roborev/tui/control_runtime.go#L152)
    The runtime file records RepoFilter/BranchFilter, but it is only written once during startup. set-filter and clear-filter mutate m.active RepoFilter/m.activeBranchFilter without rewriting that file, so any tool using runtime metadata for discovery will keep seeing the old filters after successful control commands. Suggested fix: rewrite the runtime metadata whenever the exported runtime state changes, and add a test that mutates filters then re-reads the metadata.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 10 commits March 9, 2026 18:06
Expose a newline-delimited JSON protocol over a Unix socket that
allows external tools to programmatically query state and trigger
mutations in a running TUI instance. Supports 12 commands: 4 queries
(get-state, get-filter, get-jobs, get-selected) and 8 mutations
(set-filter, clear-filter, set-hide-closed, select-job, set-view,
close-review, cancel-job, rerun-job).

Socket created at {DataDir}/tui.{PID}.sock on every TUI launch with
0600 permissions. Runtime metadata written to tui.{PID}.json for
discoverability. Stale sockets from dead processes cleaned on startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ata gating

- Lstat the socket path before removal: only remove verified-stale
  Unix sockets (dial fails), refuse to delete regular files or sockets
  with a live listener
- Create parent directories with MkdirAll before net.Listen so fresh
  installs and custom --control-socket paths in missing dirs work
- Only write tui.{PID}.json runtime metadata after the listener starts
  successfully, preventing external tools from discovering a broken
  endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously removeStaleSocket treated any dial failure as proof that
a socket was stale. Non-ECONNREFUSED errors (e.g. DGRAM sockets,
permission denied) are ambiguous and could indicate a live
non-stream socket. Now only ECONNREFUSED triggers removal; other
dial errors return an error to the caller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The manual unwrapping of net.OpError → os.SyscallError was fragile
and would miss ECONNREFUSED if Go or the OS wrapped the errno
differently. errors.Is walks the full chain regardless of nesting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- select-job: reject jobs hidden by filters or hide-closed, not just
  jobs missing from m.jobs
- close-review / cancel-job: only reflow selection when the mutated
  job is the currently selected row, preventing unexpected cursor
  movement when operating on a different job
- Runtime metadata: populate RepoFilter/BranchFilter from the
  model's resolved active filters instead of raw CLI config, so
  auto-filter-by-repo/branch is advertised correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- isProcessAlive: use platform-specific build files so Windows uses
  tasklist instead of signal 0, and Unix handles EPERM (process
  exists but owned by another user)
- Socket permissions: set umask(0177) before net.Listen so the
  socket is created with 0600 from the start, closing the TOCTOU
  window between Listen and Chmod
- Accept loop: add 100ms backoff on transient errors to prevent
  tight CPU-pegging loops on EMFILE or similar
- Runtime metadata: add regression test verifying that auto-filter
  repo/branch settings are reflected in the written metadata

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- isProcessAlive: use errors.Is(err, syscall.EPERM) instead of
  direct equality to handle wrapped permission errors
- Extract buildTUIRuntimeInfo helper used by both Run() and the
  regression test, so the test exercises the actual metadata
  construction path rather than manually copying model fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close the pipe write end before Kill so bubbletea's readLoop sees
EOF and exits before shutdown closes the cancel reader. Wait for
Run to complete before returning from cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Deep-copy pointer fields (Closed, Verdict) and slices
  (RepoFilter, FilterStack) in control response builders so
  json.Marshal in connection goroutines doesn't race with
  mutations in the Bubble Tea update loop
- CleanupStaleTUIRuntimes now uses removeStaleSocket to verify
  socket type and liveness before removal, preventing deletion of
  non-socket files or live sockets at shared custom paths
- Move SOCK_DGRAM and POSIX permission tests to control_unix_test.go
  with !windows build tag since these features don't exist on Windows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite all assertions in control_test.go and control_unix_test.go
to use testify assert/require instead of raw if/t.Errorf/t.Fatalf
patterns, matching the project-wide migration in #487.

Update CLAUDE.md and AGENTS.md to document that testify is the
standard assertion library for all Go tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wesm wesm force-pushed the feat/tui-control branch from 4b20dda to e3af1b1 Compare March 9, 2026 23:13
@roborev-ci
Copy link

roborev-ci bot commented Mar 9, 2026

roborev: Combined Review (e3af1b1)

Verdict: The TUI control interface implementation is mostly complete, but contains a couple of medium-severity edge cases concerning startup ordering and stale metadata.

Medium

  • Control requests can hang during startup
    :
    The socket is opened and advertised before tea.Program.Run() starts processing messages. In [cmd/roborev/tui/tui.go:861](/home/roborev/repos/roborev/cmd/roborev/tui/tui.go
    #L861) the listener and runtime metadata are created before cmd/roborev/tui/tui.go:887,
    but cmd/roborev/tui/control.go:176 and cmd/roborev/tui/control.go:
    194
    call p.Send(...) before the timeout select. Bubble Tea’s Program.Send blocks until the program has started, so a
    client that discovers the runtime immediately can stall indefinitely if startup is slow or fails.
    • Suggested fix: Only advertise the socket after the program is ready, or gate request handling behind a readiness channel and put the timeout around the full send/response path. Ensure test coverage for requests that land before p.Run() is accepting messages.
  • Runtime metadata becomes stale after control-side filter changes: The metadata includes repo_filter/branch_filter in [cmd/roborev/tui/control_runtime.go:13](/home/roborev/repos/roborev/cmd/
    roborev/tui/control_runtime.go#L13) and is written once in [cmd/roborev/tui/tui.go:877](/home/roborev/repos/roborev/cmd/roborev/tui/tui
    .go#L877), but the new mutations in cmd/roborev/tui/control_handlers.go:166 through cmd/roborev/tui/control_handlers.go:270 can change those filters without rewriting the metadata file.
    This means discovery logic can keep routing based on startup filters even after set-filter/clear-filter.
    • Suggested fix: Refresh the runtime file whenever mutable discovery fields change, or remove mutable filter state from the metadata entirely. Add coverage for metadata state after a successful filter mutation.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant