misc: Phoenix runner management scripts #1314
Add a toolkit for managing GitHub Actions self-hosted runners on Phoenix login nodes:
- check-runners.sh: quick per-node health check
- list-runners.sh: detailed runner table (name, node, status, slurm, RSS)
- restart-runner.sh: restart one runner with proper login shell PATH
- restart-all.sh: restart all runners in place
- rebalance-runners.sh: auto-compute optimal distribution and move runners
- create-runner.sh: register and start a new runner
- rerun-failed.sh: find and rerun failed GHA workflows on open PRs
- config.sh: shared config (nodes, cgroup limit, discovery helpers)
Remove the outdated starting-phoenix-runners.md, which referenced the old SOCKS5 proxy setup that is no longer needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code Review
Incremental review from:
New findings since last Claude review (frontier/ scripts are new in this push):
1. Tarball download race condition
    if [ ! -f "${SHARED_DIR}/${TARBALL}" ]; then
        curl -fsSL ... -o "${SHARED_DIR}/${TARBALL}"
    fi
If N runners are deployed simultaneously and the tarball doesn't yet exist, all N pass the
Fix: download (or verify) the tarball once before entering the parallel loop in
    # In deploy-runners.sh, before the for loop:
    bash "${SHARED_DIR}/make-runner.sh" --download-only "${NUM}" 2>/dev/null || true
Or simpler: download once in
2.
Previously reported, still open:
… config
Simplify scripts to be Phoenix-aware: org, runner group, label, nodes, and parent dirs are all in config.sh rather than passed as arguments. Add GitHub API helpers (gh_list_runners, gh_registration_token) so list-runners.sh shows both API status and process state. create-runner.sh now only needs a name and node.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scripts for deploying, restarting, and stopping self-hosted runners on OLCF Frontier login nodes, modeled after the existing Phoenix runner scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed discovery
- Add misc/common/rerun-failed.sh (site-agnostic, moved from phoenix/)
- Replace misc/phoenix/rerun-failed.sh with a thin wrapper to common/
- Add misc/frontier/config.sh modeled after phoenix/config.sh:
  - Frontier constants, SSH_OPTS with BatchMode for MOTD suppression
  - find_pids() filters stdout through grep -E '^[0-9]+$' to strip MOTD noise
  - start_runner() uses timeout+setsid+nohup (no login shell needed)
  - stop_runner(), find_node(), gh_list_runners(), gh_registration_token()
- Rewrite make-runner.sh: sources config.sh, no runner.pid, uses start_runner()
- Rewrite deploy-runners.sh: sources config.sh, calls $SCRIPT_DIR/make-runner.sh
- Rewrite restart-offline-runners.sh: GitHub API query, CWD-based discovery, stop stale processes before restart, parallel restarts, final status report
- Rewrite stop-runner.sh: find_node() + stop_runner() + gh_remove_runner()
- Add check-runners.sh: SSH per-node health check (name/status/RSS)
- Update README.md: new scripts table, no runner.pid references, updated troubleshooting guide
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e-runner
- list-runners.sh: new script using parallel SSH sweep across all 11 nodes simultaneously instead of serial per-runner discovery; flags stale runner.node entries with a warning in the node column
- move-runner.sh: new script to relocate a runner between login nodes with one retry on start failure
- restart-offline-runners.sh: add retry logic (sleep 5 + second attempt) and runner.node self-healing (detects and corrects stale node entries when a runner is found on a different node than recorded)
- make-runner.sh: replace hardcoded RUNNER_VERSION with dynamic GitHub API lookup falling back to pinned version; print selected version at startup
- misc/frontier/README.md: document new scripts, update quick reference and troubleshooting sections, note runner.node self-healing behavior
- misc/common/README.md: new file documenting site-agnostic shared scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…y logic
- config.sh: add sync_runner_nodes(), a parallel sweep of all login nodes that updates runner.node files before any action, ensuring accuracy even when runners are manually restarted on different nodes
- make-runner.sh: use gh_latest_runner_version() with pinned fallback
- restart-offline-runners.sh: call sync_runner_nodes first; self-heal stale runner.node on restart; retry start_runner once on failure
- check-runners.sh: call sync_runner_nodes first
- list-runners.sh: new; parallel SSH sweep + GitHub API combined view
- move-runner.sh: new; move a runner to a different login node
- misc/common/README.md: new; document shared scripts
- misc/frontier/README.md: update for new scripts and sync behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
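The "dynamic lookup with pinned fallback" idea from the make-runner.sh bullets can be sketched as below. This is not the PR's gh_latest_runner_version(); the function name latest_runner_version, the LOOKUP_CMD override, and the pinned value passed by the caller are all illustrative assumptions. The default lookup assumes the gh CLI is installed and authenticated.

```shell
#!/usr/bin/env bash
# Sketch only: latest_runner_version and LOOKUP_CMD are hypothetical names.

# Usage: latest_runner_version <pinned-fallback-version>
latest_runner_version() {
    local pinned="$1" tag
    # LOOKUP_CMD allows the network call to be stubbed; by default, ask the
    # GitHub API (via the gh CLI) for the latest actions/runner release tag.
    tag=$(${LOOKUP_CMD:-gh api repos/actions/runner/releases/latest --jq .tag_name} 2>/dev/null || true)
    tag="${tag#v}"   # strip a leading "v" from the tag, if present
    # Accept only something that looks like X.Y.Z; otherwise fall back.
    if [[ "$tag" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
        echo "$tag"
    else
        echo "$pinned"
    fi
}
```

Validating the tag shape before using it means an API error page or an empty response falls through to the pinned version instead of producing a broken download URL.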
Claude Code Review
Incremental review from:
Previously reported issues, now fixed in this push:
New finding:
This push introduces
    # Frontier (misc/frontier/config.sh)
    SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=10 -o BatchMode=yes -o ServerAliveInterval=10 -o ServerAliveCountMax=3"
    # Phoenix (misc/phoenix/config.sh) ← newly added in this push
    SSH_OPTS="-o ConnectTimeout=5"
The missing options matter for unattended scripted use:
Suggested fix, align with frontier's opts:
    SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5 -o BatchMode=yes -o ServerAliveInterval=10 -o ServerAliveCountMax=3"
Still open from previous review:
start_runner() was launching run.sh via absolute path without cd-ing into the runner directory first. The Runner.Listener process inherited the SSH login shell CWD (user home dir), so find_pids() CWD matching always failed for runners started this way. Fix: cd into $dir before running ./run.sh so the process CWD matches what find_pids() expects. Relative log path follows automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- frontier/deploy-runners.sh: pre-download tarball once before spawning parallel make-runner.sh instances; use atomic tmp+mv to prevent concurrent curl writes corrupting the tarball
- frontier/make-runner.sh: same atomic tmp+mv for solo-invocation safety
- common/rerun-failed.sh: guard gh run view with || continue so an expired/deleted run skips rather than exits the whole script; switch pipe loops to process substitution so continue works correctly and remove the dead rerun_count variable (incremented in a subshell, never read)
- frontier/list-runners.sh, phoenix/list-runners.sh, restart-all.sh, rebalance-runners.sh: replace for dir in $(find_runner_dirs) with while IFS= read -r dir; do ... done < <(find_runner_dirs) to eliminate word-splitting and glob expansion on runner paths
- phoenix/check-runners.sh, phoenix/list-runners.sh: guard rss value before arithmetic expansion: if SSH fails and rss="?", the expression $(( CGROUP_LIMIT - rss )) is a syntax error that exits under set -euo pipefail; default to 0 instead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
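The "atomic tmp+mv" pattern named in the first two bullets can be sketched as follows. The PR's actual code is not shown in this thread, so fetch_atomic and write_demo are hypothetical names, and the stand-in writer replaces a real curl call so the sketch runs offline.

```shell
#!/usr/bin/env bash
# Sketch: publish a file atomically so readers never see a partial download.
# fetch_atomic and write_demo are illustrative names, not from the PR.

# Usage: fetch_atomic <dest> <command...>
# The command is invoked with the temp path appended as its last argument.
fetch_atomic() {
    local dest="$1"; shift
    # Already present: nothing to do.
    [ -f "$dest" ] && return 0
    # Temp file in the same directory, so mv is a rename on one filesystem.
    local tmp
    tmp=$(mktemp "${dest}.tmp.XXXXXX") || return 1
    if "$@" "$tmp"; then
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Stand-in writer for the demo; a real caller might instead pass:
#   fetch_atomic "$cache/runner.tar.gz" curl -fsSL "$url" -o
write_demo() { printf 'payload\n' > "$1"; }
```

Because each caller writes to its own temp file and the rename happens only after a successful write, two concurrent callers can at worst both download; neither can leave a half-written file at the destination.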
Move functions identical across both sites into a common library:
gh_registration_token, gh_latest_runner_version, gh_remove_runner,
get_runner_name, find_pids, find_node, start_runner, stop_runner
Both config.sh files now source runner-lib.sh after defining their
site constants (ORG, NODES, SSH_OPTS), keeping only site-specific
logic locally:
- frontier/config.sh: gh_list_runners, find_runner_dirs, sync_runner_nodes
- phoenix/config.sh: gh_list_runners, find_runner_dirs, has_slurm
Harmonize start_runner() across both sites:
- Use bash -lc on both (was frontier-only) for login shell PATH
- Use timeout 15 + synchronous SSH (was phoenix's background SSH + poll loop)
- cd into runner dir before run.sh so CWD-based discovery works
- Standardize log file to runner.log (was runner-nohup.log on phoenix)
- Use $SSH_OPTS variable (added to phoenix config) throughout all
phoenix scripts instead of hardcoded -o ConnectTimeout=5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
/proc/$p/exe is intrinsic to the binary and cannot change after exec, making it more reliable than CWD which depends on how the process was launched. Also correctly excludes Runner.Worker processes that share the same directory, since their exe is bin/Runner.Worker not bin/Runner.Listener. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
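A minimal sketch of the exe-based mapping described above, assuming the layout <runner-dir>/bin/Runner.Listener that the commit message implies. runner_dir_from_exe is a hypothetical helper name, not a function from the PR.

```shell
#!/usr/bin/env bash
# Sketch: map a resolved executable path back to its runner directory.
# runner_dir_from_exe is an illustrative name.
runner_dir_from_exe() {
    local exe="$1"
    # Only the listener identifies a runner; Runner.Worker lives in the
    # same bin/ directory and must not be counted.
    [ "$(basename "$exe")" = "Runner.Listener" ] || return 1
    # <runner-dir>/bin/Runner.Listener -> <runner-dir>
    dirname "$(dirname "$exe")"
}

# On Linux the path would typically come from /proc, for example:
#   exe=$(readlink -f "/proc/$pid/exe" 2>/dev/null || true)
```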
Mirrors phoenix/rebalance-runners.sh with frontier-specific changes:
- calls sync_runner_nodes first to correct any stale runner.node files
- updates runner.node after each successful move
- no has_slurm check (not applicable on frontier)
With 22 runners across 11 nodes the target is 2 per node.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
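The per-node target arithmetic can be sketched as an even split with the remainder spread over the first nodes. The PR's actual distribution algorithm is not shown in this thread, so this is an assumption about one reasonable approach; compute_targets is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print "<node> <target>" for each node.
# Every node gets floor(total / nodes); the first (total % nodes) nodes get one extra.
compute_targets() {
    local total="$1"; shift
    local n=$# base extra i=0 node
    [ "$n" -gt 0 ] || return 1
    base=$(( total / n ))
    extra=$(( total % n ))
    for node in "$@"; do
        if [ "$i" -lt "$extra" ]; then
            echo "$node $(( base + 1 ))"
        else
            echo "$node $base"
        fi
        i=$(( i + 1 ))
    done
}
```

With 22 runners and 11 node names this yields a target of 2 for every node, matching the figure in the commit message; with 5 runners on 2 nodes it yields 3 and 2.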
Claude Code Review
Incremental review from:
New findings since last Claude review:
1.
    existing=$(ls -d "$parent_dir"/actions-runner-* 2>/dev/null | sed '...' | sort -n | tail -1)
When
Minimal fix, append:
    existing=$(ls -d "$parent_dir"/actions-runner-* 2>/dev/null | sed 's/.*actions-runner-//' | sort -n | tail -1 || true)
Previously reported, still open:
Extract shared logic from frontier/phoenix into common scripts:
- runner-lib.sh: add gh_list_runners(), has_slurm() (portable, grep PATH for slurm keyword), sweep_all_nodes() (exe-based, parallel SSH), CGROUP_LIMIT default
- check-runners.sh: new common script (exe-based discovery, slurm column, conditional cgroup footer); both sites now show slurm status
- list-runners.sh: new common script (parallel sweep, slurm column, stale runner.node detection, conditional cgroup footer)
- rebalance-runners.sh: new common script (optional sync_runner_nodes hook, writes runner.node after start, checks slurm after start)
All site scripts (frontier/, phoenix/) reduced to thin wrappers that source config.sh then the common implementation. phoenix/create-runner.sh fixed to write runner.node after successful start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
restart-offline-runners.sh is superseded by rebalance-runners.sh, which handles offline runners as part of its distribution algorithm and also calls sync_runner_nodes to self-heal stale runner.node files. Update frontier/README.md: remove restart-offline-runners.sh references, replace CWD-based discovery language with EXE-based, add rebalance-runners.sh. Update phoenix/README.md: fix config.sh description (functions now in runner-lib.sh), fix list-runners.sh description (now parallel sweep), fix slurm PATH wording (portable grep, not /opt/slurm). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
stop-runner, move-runner, restart-runner, and restart-all were each only available on one cluster. Extract implementations to common/ and add thin wrappers for both sites so all operational scripts work on both Frontier and Phoenix.
Improvements over the originals:
- stop-runner: searches find_runner_dirs() by name instead of hardcoded SHARED_DIR path; fixes runner_id lookup to use process substitution
- move-runner: optional sync_runner_nodes hook (runs on Frontier, no-op on Phoenix); searches find_runner_dirs() by name
- restart-runner: writes runner.node after successful start (was missing)
- restart-all: adds optional sync_runner_nodes hook; writes runner.node after each successful start (was missing)
Update common/README.md and both site READMEs to reflect full script inventory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code Review
Incremental review from:
New findings since last Claude review (this push: refactored to
1.
    SITE_SCRIPT_DIR="$MISC_DIR/$SITE"
    source "$MISC_DIR/$SITE/config.sh"
    source "$MISC_DIR/common/$CMD.sh" "$@"
No common script references
Previously reported, still open:
- runner-lib.sh: use sys.argv[1] in get_runner_name (prevent path injection); verify process dead after SIGKILL in stop_runner; warn on sweep_all_nodes SSH failure; log SSH failure in start_runner
- rebalance-runners.sh: fix busy-runner grep with trailing slash; handle stop_runner failure before attempting move
- restart-all.sh: replace space-delimited restart_list with three separate arrays (space-safe); fix busy grep trailing slash; handle stop_runner failure
- restart-runner.sh: exit 1 on start failure (was silently exiting 0)
- stop-runner.sh: handle gh_list_runners API failure with warning
- rerun-failed.sh: guard empty failed_jobs to avoid blank bullet; add per-job display and expired-run guard to master branch block
- deploy-runners.sh: track background PIDs; detect and report failures
- create-runner.sh: use #!/usr/bin/env bash; atomic curl download
- frontier/config.sh: document RUNNER_GROUP="phoenix" is intentional
- phoenix/config.sh: use #!/usr/bin/env bash
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All shared commands (check-runners, list-runners, move-runner, rebalance-runners, restart-all, restart-runner, stop-runner, rerun-failed) previously existed as near-identical thin wrappers in both frontier/ and phoenix/. Replace them all with a single dispatcher:
    bash misc/runner.sh <site> <command> [args...]
Site directories now contain only truly site-specific files:
- frontier/: config.sh, make-runner.sh, deploy-runners.sh
- phoenix/: config.sh, create-runner.sh
Also fix rebalance-runners.sh to source common/check-runners.sh directly (was calling the now-deleted site wrapper). Update all READMEs to document the new dispatcher invocation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
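The routing a dispatcher like this needs can be sketched as a lookup function. The sketch assumes that a site-specific script takes precedence over a shared one with the same name, which this thread does not confirm; resolve_command and its root argument are illustrative names, not the PR's code.

```shell
#!/usr/bin/env bash
# Sketch: resolve <site> <command> to a script path.
# Assumption: site-specific scripts win over common ones.
resolve_command() {
    local root="$1" site="$2" cmd="$3"
    # A site is considered valid if it ships a config.sh.
    if [ ! -f "$root/$site/config.sh" ]; then
        echo "ERROR: Unknown site '$site'." >&2
        return 1
    fi
    if [ -f "$root/$site/$cmd.sh" ]; then
        echo "$root/$site/$cmd.sh"
    elif [ -f "$root/common/$cmd.sh" ]; then
        echo "$root/common/$cmd.sh"
    else
        echo "ERROR: Unknown command '$cmd' for site '$site'." >&2
        return 1
    fi
}
```

A dispatcher built on this would source the site's config.sh first and then source the resolved script with the remaining arguments, so the shared implementation sees the site constants.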
Consolidates all runner management scripts under misc/runners/ to
separate them from unrelated misc/ files. Structure:
misc/runners/runner.sh dispatcher
misc/runners/common/ shared scripts and library
misc/runners/frontier/ Frontier config + make/deploy scripts
misc/runners/phoenix/ Phoenix config + create script
All paths are computed from ${BASH_SOURCE[0]} so no internal path
changes are needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
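Computing paths from ${BASH_SOURCE[0]} is a common bash idiom; a minimal sketch follows. MISC_DIR mirrors the variable name seen in the dispatcher excerpts in this thread, while COMMON_DIR is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: locate the directory holding this script, independent of the
# caller's working directory, then derive sibling paths from it.
MISC_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
COMMON_DIR="$MISC_DIR/common"   # illustrative name
echo "script directory: $MISC_DIR"
```

Because every path is derived from the script's own location, moving the whole tree (for example from misc/ to misc/runners/) requires no edits inside the scripts.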
frontier/make-runner.sh and phoenix/create-runner.sh were the same
concept with different names and minor behavioral differences. Unified
into common/create-runner.sh with site-specific behavior driven by two
config.sh hooks:
TARBALL_CACHE_DIR
Frontier: "$SHARED_DIR" — tarball cached on shared Lustre,
reused across parallel deployments (prevents download races).
Phoenix: "" — each runner downloads independently.
runner_install_dir <name> [override-dir]
Frontier: "$SHARED_DIR/<name>"
Phoenix: auto-numbered "$RUNNER_PARENT_DIRS[0]/actions-runner-N"
frontier/make-runner.sh is now a 3-line wrapper that derives the name
"frontier-<N>" from a number argument and delegates to create-runner.
deploy-runners.sh updated to reference $TARBALL_CACHE_DIR instead of
$SHARED_DIR directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
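The Phoenix-style auto-numbering can be sketched as below, including the "|| true" guard that the review comments in this thread suggest. next_runner_dir is a hypothetical name; the PR's runner_install_dir also handles an override directory, which is omitted here.

```shell
#!/usr/bin/env bash
# Sketch: pick the next free "actions-runner-N" directory under a parent.
# next_runner_dir is an illustrative name.
next_runner_dir() {
    local parent="$1" existing
    # ls exits non-zero when nothing matches; "|| true" keeps that from
    # aborting a script that runs with set -e and pipefail.
    existing=$(ls -d "$parent"/actions-runner-* 2>/dev/null \
        | sed 's/.*actions-runner-//' | sort -n | tail -1 || true)
    # No existing runners: start numbering at 1.
    echo "$parent/actions-runner-$(( ${existing:-0} + 1 ))"
}
```

The sketch assumes every existing suffix is numeric; a directory such as actions-runner-old would need extra filtering.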
All new runner installs now go to RUNNER_PARENT_DIRS[1]: /storage/project/r-sbryngelson3-0/sbryngelson3/mfc-runners-2 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code Review
Incremental review from:
New findings since last Claude review:
1. When
    mkdir -p "$RUNNER_DIR"
    tarball_path="$RUNNER_DIR/runner-download.tmp.$$"
    curl -fsSL "$TARBALL_URL" -o "$tarball_path"
If
    if [ -d "$RUNNER_DIR" ]; then
        echo "ERROR: Directory already exists: $RUNNER_DIR" >&2
        exit 1
    fi
The user must then manually remove the orphaned directory before retrying. The same issue applies if extraction succeeds but
Suggested fix. Add a cleanup trap immediately after the directory is created:
    mkdir -p "$RUNNER_DIR"
    trap 'rm -rf "$RUNNER_DIR"' ERR
    tarball_path="$RUNNER_DIR/runner-download.tmp.$$"
    curl -fsSL "$TARBALL_URL" -o "$tarball_path"
Clear the trap once start succeeds:
    if start_runner "$TARGET_NODE" "$RUNNER_DIR"; then
        trap - ERR # runner is up; don't clean up on subsequent errors
        ...
Still open from previous reviews:
Claude Code Review
Head SHA:
Files Changed (19)
Summary
Findings
1. [Bug] File:
    existing=
With
Fix: append
    existing=$(ls -d "$parent"/actions-runner-* 2>/dev/null | sed 's/.*actions-runner-//' | sort -n | tail -1 || true)
2. [Correctness] File:
Current: Frontier's equivalent (
The missing options matter for unattended non-interactive scripts:
Suggested fix, align with Frontier:
    SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5 -o BatchMode=yes -o ServerAliveInterval=10 -o ServerAliveCountMax=3"
3. [Minor] File:
    SITE_SCRIPT_DIR="$MISC_DIR/$SITE"
    source "$MISC_DIR/$SITE/config.sh"
    source "$MISC_DIR/common/$CMD.sh" "$@"
Non-findings (resolved from prior review rounds)
Caution
Review failed
The pull request is closed.
Walkthrough
Introduces a comprehensive GitHub Actions runner management infrastructure for Frontier and Phoenix systems. Adds a shared library (runner-lib.sh) with common GitHub API and process management functions. Implements site-agnostic utility scripts (check-runners.sh, create-runner.sh, list-runners.sh, move-runner.sh, rebalance-runners.sh, restart-runner.sh, restart-all.sh, stop-runner.sh, rerun-failed.sh) for runner operations. Adds site-specific configurations for Frontier and Phoenix with node discovery and installation directory logic. Includes a dispatcher script (runner.sh) that routes commands to site or common implementations. Provides comprehensive documentation for both sites and removes obsolete Phoenix startup documentation.
Pull request overview
Adds a new misc/runners/ toolkit to manage GitHub Actions self-hosted runners on Phoenix and Frontier (health checks, listing, restarting, moving, rebalancing, creation, and rerunning failed CI), and removes the obsolete Phoenix runner startup doc (SOCKS5 proxy no longer needed).
Changes:
- Introduce a site-dispatched runner management CLI (misc/runners/runner.sh) with shared library code under misc/runners/common/.
- Add site configs/docs for Phoenix and Frontier plus Frontier convenience scripts for provisioning.
- Remove misc/starting-phoenix-runners.md (outdated proxy-based procedure).
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| misc/starting-phoenix-runners.md | Removes outdated Phoenix runner startup documentation. |
| misc/runners/runner.sh | Dispatcher for site/common runner management commands. |
| misc/runners/common/runner-lib.sh | Shared GitHub API + SSH process discovery + start/stop primitives. |
| misc/runners/common/check-runners.sh | Per-node runner health check. |
| misc/runners/common/list-runners.sh | Combined GitHub API + live process table. |
| misc/runners/common/rebalance-runners.sh | Rebalance algorithm for distributing runners across login nodes. |
| misc/runners/common/restart-runner.sh | Restart a single runner and verify SLURM is in PATH. |
| misc/runners/common/restart-all.sh | Restart all runners (dry-run by default). |
| misc/runners/common/move-runner.sh | Move a runner between login nodes by name. |
| misc/runners/common/stop-runner.sh | Stop and deregister a runner. |
| misc/runners/common/create-runner.sh | Download/register/start a new runner. |
| misc/runners/common/rerun-failed.sh | Rerun failed workflows on open PRs and master. |
| misc/runners/common/README.md | Documentation for shared commands and library. |
| misc/runners/phoenix/config.sh | Phoenix-specific constants and runner directory discovery. |
| misc/runners/phoenix/README.md | Phoenix operator documentation and usage examples. |
| misc/runners/frontier/config.sh | Frontier-specific constants, shared dir, and node sync helper. |
| misc/runners/frontier/make-runner.sh | Frontier convenience wrapper for numbered runner creation. |
| misc/runners/frontier/deploy-runners.sh | Parallel runner deployment across multiple login nodes. |
| misc/runners/frontier/README.md | Frontier operator documentation and usage examples. |

    # Return the directory where a named runner should be installed.
    # Auto-increments the actions-runner-N suffix within RUNNER_PARENT_DIRS[0].
    # Args: $1 = runner name (unused; directory is numbered, not named), $2 = optional override dir
    runner_install_dir() {
        local override="${2:-}"
        [ -n "$override" ] && echo "$override" && return
        local parent="${RUNNER_PARENT_DIRS[1]}"
        local existing next_num
        existing=$(ls -d "$parent"/actions-runner-* 2>/dev/null | sed 's/.*actions-runner-//' | sort -n | tail -1)
        next_num=$(( ${existing:-0} + 1 ))
        echo "$parent/actions-runner-$next_num"

    echo "ERROR: Unknown command '$CMD' for site '$SITE'." >&2
    echo "Common: check-runners list-runners move-runner rebalance-runners restart-all restart-runner stop-runner rerun-failed" >&2
    echo "Frontier: make-runner deploy-runners" >&2
    echo "Phoenix: create-runner" >&2

    ssh $SSH_OPTS "$node" '
        for p in $(ps aux | grep Runner.Listener | grep -v grep | awk "{print \$2}"); do
            exe=$(readlink -f /proc/$p/exe 2>/dev/null || true)
            [ -z "$exe" ] && continue
            dir=$(dirname "$(dirname "$exe")")
            rss=$(ps -p $p -o rss= 2>/dev/null | awk "{printf \"%.0f\", \$1/1024}" || echo 0)
            slurm=$(tr "\0" "\n" < /proc/$p/environ 2>/dev/null | grep -c "^PATH=.*slurm" || echo 0)
            [ "$slurm" -gt 0 ] && slurm_ok="ok" || slurm_ok="MISSING"
            echo "RUNNER '"$node"' $dir $rss $slurm_ok"
        done
    ' 2>/dev/null > "$tmpdir/$node.out" &

    start_runner() {
        local node="$1" dir="$2"
        timeout 15 ssh $SSH_OPTS "$node" \
            "cd $dir && setsid bash -lc 'nohup ./run.sh >> runner.log 2>&1 < /dev/null &'" \

    echo "=== Per-node memory ==="
    for node in "${NODES[@]}"; do
        count=$(ssh $SSH_OPTS "$node" \
            "ps aux | grep Runner.Listener | grep -v grep | wc -l" 2>/dev/null || echo 0)

    for p in $(ps aux | grep Runner.Listener | grep -v grep | awk "{print \$2}"); do
        found=1
        exe=$(readlink -f /proc/$p/exe 2>/dev/null || echo "???")
        dir=$(dirname "$(dirname "$exe")" 2>/dev/null || echo "???")
        name=$(basename "$dir")
        worker=$(ps aux | grep "Runner.Worker" | grep "$dir" | grep -v grep | awk "{print \$2}" | head -1)

    exe=$(readlink -f /proc/$p/exe 2>/dev/null || echo "???")
    dir=$(dirname "$(dirname "$exe")" 2>/dev/null || echo "???")
    name=$(basename "$dir")
    worker=$(ps aux | grep "Runner.Worker" | grep "$dir" | grep -v grep | awk "{print \$2}" | head -1)
    [ -n "$worker" ] && status="BUSY" || status="idle"
    rss=$(ps -p $p -o rss= 2>/dev/null | awk "{printf \"%.0f\", \$1/1024}" || echo "?")
    slurm=$(tr "\0" "\n" < /proc/$p/environ 2>/dev/null | grep -c "^PATH=.*slurm" || echo 0)
    [ "$slurm" -gt 0 ] && slurm_ok="ok" || slurm_ok="MISSING"
    printf " %-30s %5s slurm=%-7s %s MB\n" "$name" "$status" "$slurm_ok" "$rss"

    # Sites: frontier phoenix
    # Common: check-runners list-runners move-runner rebalance-runners
    # restart-all restart-runner stop-runner rerun-failed
    # Frontier: make-runner deploy-runners
    # Phoenix: create-runner
    #

    ssh $SSH_OPTS "$1" '
        for p in $(ps aux | grep Runner.Listener | grep -v grep | awk "{print \$2}"); do
            exe=$(readlink -f /proc/$p/exe 2>/dev/null || true)
            [ "$exe" = "'"$2"'/bin/Runner.Listener" ] && echo "$p"
        done
    ' 2>/dev/null | grep -E '^[0-9]+$' | tr '\n' ' ' || true
Summary
- Remove misc/starting-phoenix-runners.md (SOCKS5 proxy setup no longer needed)
Scripts
- config.sh
- check-runners.sh
- list-runners.sh
- restart-runner.sh
- restart-all.sh
- rebalance-runners.sh
- create-runner.sh
- rerun-failed.sh
Test plan
- check-runners.sh tested on live Phoenix nodes
- list-runners.sh tested on live Phoenix nodes
- rebalance-runners.sh dry run tested (correctly identifies moves needed)
🤖 Generated with Claude Code