12 changes: 6 additions & 6 deletions architecture/README.md
@@ -171,8 +171,8 @@ The inference routing system transparently intercepts AI inference API calls fro

**How it works end-to-end:**

-1. An operator configures cluster-level inference via `openshell cluster inference set --provider <name> --model <id>`. This stores a reference to the named provider and model on the gateway.
-2. When a sandbox starts, the supervisor fetches an inference bundle from the gateway via the `GetInferenceBundle` RPC. The gateway resolves the stored provider reference into a complete route: endpoint URL, API key, supported protocols, provider type, and auth metadata. The sandbox refreshes this bundle eagerly in the background every 5 seconds by default (override with `OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`).
+1. An operator configures gateway-level inference via `openshell inference set --provider <name> --model <id>`. This stores a default provider/model route on the gateway. Operators can also configure one sandbox to use a different provider/model through a gateway-owned sandbox inference override.
+2. When a sandbox starts, the supervisor fetches an inference bundle from the gateway via the `GetInferenceBundle` RPC, passing its sandbox ID. The gateway resolves that sandbox's override if one exists, otherwise falls back to the gateway default, then resolves provider references into complete routes: endpoint URL, API key, supported protocols, provider type, and auth metadata. The sandbox refreshes this bundle eagerly in the background every 5 seconds by default (override with `OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`).
3. The agent sends requests to `https://inference.local` using standard OpenAI or Anthropic SDK calls.
4. The sandbox proxy intercepts the HTTPS CONNECT to `inference.local` (bypassing OPA policy evaluation), TLS-terminates the connection using the sandbox's ephemeral CA, and parses the HTTP request.
5. Known inference API patterns are detected (e.g., `POST /v1/chat/completions` for OpenAI, `POST /v1/messages` for Anthropic, `GET /v1/models` for model discovery). Matching requests are forwarded to the first compatible route by the `openshell-router`, which rewrites the auth header, injects provider-specific default headers (e.g., `anthropic-version` for Anthropic), and overrides the model field in the request body.
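The refresh cadence described in step 2 -- a 5-second default overridable via `OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS` -- can be sketched as follows. This is a minimal illustration, not the supervisor's actual code; the handling of invalid or zero values is an assumption.

```rust
use std::time::Duration;

/// Default refresh cadence for the inference bundle, per the docs: 5 seconds.
const DEFAULT_REFRESH_SECS: u64 = 5;

/// Derive the bundle refresh interval from an optional env-style override.
/// Missing, unparsable, or zero values fall back to the default (assumed
/// behavior; the real supervisor may reject invalid values instead).
fn refresh_interval(env_value: Option<&str>) -> Duration {
    let secs = env_value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(DEFAULT_REFRESH_SECS);
    Duration::from_secs(secs)
}
```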
@@ -184,9 +184,9 @@ The inference routing system transparently intercepts AI inference API calls fro
- The sandbox never sees the real API key for the backend -- credential isolation is maintained through the gateway's bundle resolution.
- Routing is explicit via `inference.local`; OPA network policy is not involved in inference routing.
- Provider-specific behavior (auth header style, default headers, supported protocols) is centralized in `InferenceProviderProfile` definitions in `openshell-core`. Supported inference provider types are openai, anthropic, and nvidia.
-- Cluster inference is managed via CLI (`openshell cluster inference set/get`).
+- Gateway inference is managed via CLI (`openshell inference set/get`), with optional per-sandbox overrides under `openshell inference sandbox`.

-**Inference routes** are stored on the gateway as protobuf objects (`InferenceRoute` in `proto/inference.proto`). Cluster inference uses a managed singleton route entry keyed by `inference.local` and configured from provider + model settings. Endpoint, credentials, and protocols are resolved from the referenced provider record at bundle fetch time, so rotating a provider's API key takes effect on the next bundle refresh without reconfiguring the route.
+**Inference routes** are stored on the gateway as protobuf objects (`InferenceRoute` in `proto/inference.proto`). Cluster inference uses a managed default route entry keyed by `inference.local`. Sandbox inference overrides use gateway-owned route records keyed by sandbox ID. Endpoint, credentials, and protocols are resolved from the referenced provider record at bundle fetch time, so rotating a provider's API key takes effect on the next bundle refresh without reconfiguring the route.

**Components involved:**

@@ -196,7 +196,7 @@ The inference routing system transparently intercepts AI inference API calls fro
| Inference pattern detection | `crates/openshell-sandbox/src/l7/inference.rs` | Matches HTTP method + path against known inference API patterns |
| Local inference router | `crates/openshell-router/src/lib.rs` | Selects a compatible route by protocol and proxies to the backend |
| Provider profiles | `crates/openshell-core/src/inference.rs` | Centralized auth, headers, protocols, and endpoint defaults per provider type |
-| Gateway inference service | `crates/openshell-server/src/inference.rs` | Stores cluster inference config, resolves bundles with credentials from provider records |
+| Gateway inference service | `crates/openshell-server/src/inference.rs` | Stores cluster inference defaults and sandbox overrides, resolves bundles with credentials from provider records |
| Proto definitions | `proto/inference.proto` | `ClusterInferenceConfig`, `ResolvedRoute`, bundle RPCs |
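The inference pattern detection component above matches HTTP method + path pairs against the known inference API shapes named in step 5. A minimal sketch, using only the patterns quoted in this document (enum and function names are illustrative, not the actual types in `crates/openshell-sandbox/src/l7/inference.rs`; attributing `GET /v1/models` to the OpenAI protocol is an assumption):

```rust
/// Which wire protocol a detected call belongs to (illustrative).
#[derive(Debug, PartialEq)]
enum Protocol { OpenAi, Anthropic }

/// Which known inference API the request targets (illustrative).
#[derive(Debug, PartialEq)]
enum Call { ChatCompletions, Messages, ListModels }

/// Match method + path against the known inference API patterns.
/// Unknown requests return None and would not be routed as inference.
fn detect(method: &str, path: &str) -> Option<(Protocol, Call)> {
    match (method, path) {
        ("POST", "/v1/chat/completions") => Some((Protocol::OpenAi, Call::ChatCompletions)),
        ("POST", "/v1/messages") => Some((Protocol::Anthropic, Call::Messages)),
        ("GET", "/v1/models") => Some((Protocol::OpenAi, Call::ListModels)),
        _ => None,
    }
}
```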

### Container and Build System
@@ -238,7 +238,7 @@ The CLI is the primary way users interact with the platform. It provides command
- **Sandbox management** (`openshell sandbox`): Create sandboxes (with optional file upload and provider auto-discovery), connect to sandboxes via SSH, and delete sandboxes.
- **Top-level commands**: `openshell status` (cluster health), `openshell logs` (sandbox logs), `openshell forward` (port forwarding), `openshell policy` (sandbox policy management), `openshell settings` (effective sandbox settings and global/sandbox key updates).
- **Provider management** (`openshell provider`): Create, update, list, and delete external service credentials.
-- **Inference management** (`openshell cluster inference`): Configure cluster-level inference by specifying a provider and model. The gateway resolves endpoint and credential details from the named provider record.
+- **Inference management** (`openshell inference`): Configure gateway-level inference by specifying a provider and model. Optionally configure individual sandboxes to use a different provider/model. The gateway resolves endpoint and credential details from the named provider record.

The CLI resolves which gateway to operate on through a priority chain: explicit `--gateway` flag, then the `OPENSHELL_GATEWAY` environment variable, then the active gateway set by `openshell gateway select`. Gateway names are exposed to shell completion from local metadata, and `openshell gateway select` opens an interactive chooser on a TTY while falling back to a printed list in non-interactive use. The CLI supports TLS client certificates for mutual authentication with the gateway.
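The gateway resolution priority chain above (explicit flag, then `OPENSHELL_GATEWAY`, then the selected gateway) can be sketched as a simple first-match-wins chain. Function and parameter names are illustrative, not the CLI's actual internals:

```rust
/// Resolve the target gateway using the documented priority chain:
/// --gateway flag > OPENSHELL_GATEWAY env var > `openshell gateway select`.
/// Returns None only when no source supplies a gateway.
fn resolve_gateway(
    flag: Option<&str>,
    env_var: Option<&str>,
    selected: Option<&str>,
) -> Option<String> {
    flag.or(env_var).or(selected).map(|s| s.to_string())
}
```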

15 changes: 9 additions & 6 deletions architecture/gateway.md
@@ -88,7 +88,7 @@ Proto definitions consumed by the gateway:
|------------|---------|---------|
| `proto/openshell.proto` | `openshell.v1` | `OpenShell` service, public sandbox resource model, provider/SSH/watch/policy messages, supervisor session messages (`ConnectSupervisor`, `RelayStream`, `RelayFrame`) |
| `proto/compute_driver.proto` | `openshell.compute.v1` | Internal `ComputeDriver` service, driver-native sandbox observations, compute watch stream envelopes |
-| `proto/inference.proto` | `openshell.inference.v1` | `Inference` service: `SetClusterInference`, `GetClusterInference`, `GetInferenceBundle` |
+| `proto/inference.proto` | `openshell.inference.v1` | `Inference` service: `SetClusterInference`, `GetClusterInference`, sandbox override RPCs, `GetInferenceBundle` |
| `proto/datamodel.proto` | `openshell.datamodel.v1` | `Provider` |
| `proto/sandbox.proto` | `openshell.sandbox.v1` | Sandbox supervisor policy, settings, and config messages |

@@ -395,27 +395,30 @@ These RPCs support the sandbox-initiated policy recommendation pipeline. The san

Defined in `proto/inference.proto`, implemented in `crates/openshell-server/src/inference.rs` as `InferenceService`.

-The gateway acts as the control plane for inference configuration. It stores a single managed cluster inference route (named `inference.local`) and delivers resolved route bundles to sandbox pods. The gateway does not execute inference requests -- sandboxes connect directly to inference backends using the credentials and endpoints provided in the bundle.
+The gateway acts as the control plane for inference configuration. It stores a managed cluster inference route (named `inference.local`), optional per-sandbox inference overrides, and delivers resolved route bundles to sandbox pods. The gateway does not execute inference requests -- sandboxes connect directly to inference backends using the credentials and endpoints provided in the bundle.

#### Cluster Inference Configuration

-The gateway manages a single cluster-wide inference route that maps to a provider record. When set, the route stores only a `provider_name` and `model_id` reference. At bundle resolution time, the gateway looks up the referenced provider and derives the endpoint URL, API key, protocols, and provider type from it. This late-binding design means provider credential rotations are automatically reflected in the next bundle fetch without updating the route itself.
+The gateway manages a cluster-wide default inference route that maps to a provider record. When set, the route stores only a `provider_name` and `model_id` reference. At bundle resolution time, the gateway looks up the referenced provider and derives the endpoint URL, API key, protocols, and provider type from it. This late-binding design means provider credential rotations are automatically reflected in the next bundle fetch without updating the route itself.
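The late-binding design above can be sketched as follows: the stored route holds only references, and credentials are pulled from the provider record at resolution time. Struct and field names here are illustrative assumptions, not the actual proto schema or crate types:

```rust
use std::collections::HashMap;

/// Stored route: only references, no credentials (late binding).
struct InferenceRoute { provider_name: String, model_id: String }

/// Provider record holding the actual endpoint and API key.
#[derive(Clone)]
struct Provider { endpoint: String, api_key: String }

/// Fully materialized route as delivered in a bundle.
#[derive(Debug, PartialEq)]
struct ResolvedRoute { endpoint: String, api_key: String, model_id: String }

/// Resolve at bundle-fetch time, so rotating the key on the provider
/// record is picked up on the next fetch without touching the route.
fn resolve(
    route: &InferenceRoute,
    providers: &HashMap<String, Provider>,
) -> Option<ResolvedRoute> {
    let p = providers.get(&route.provider_name)?;
    Some(ResolvedRoute {
        endpoint: p.endpoint.clone(),
        api_key: p.api_key.clone(),
        model_id: route.model_id.clone(),
    })
}
```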

| RPC | Description |
|-----|-------------|
| `SetClusterInference` | Configures the cluster inference route. Validates `provider_name` and `model_id` are non-empty, verifies the named provider exists and has a supported type for inference (openai, anthropic, nvidia), validates the provider has a usable API key, then upserts the `inference.local` route record. Increments a monotonic `version` on each update. Returns the configured `provider_name`, `model_id`, and `version`. |
| `GetClusterInference` | Returns the current cluster inference configuration (`provider_name`, `model_id`, `version`). Returns `NotFound` if no cluster inference is configured, or `FailedPrecondition` if the stored route has empty provider/model metadata. |
+| `SetSandboxInference` | Configures one sandbox's `inference.local` override after validating that the sandbox ID exists. The gateway stores it under `sandbox/<sandbox_id>/inference.local` and exposes it to the sandbox as the normal `inference.local` route. |
+| `GetSandboxInference` | Returns one sandbox's configured override. Returns `NotFound` when no override is set (the sandbox falls back to the cluster default). |
+| `ClearSandboxInference` | Removes one sandbox's override so the sandbox falls back to the cluster default on the next bundle refresh. |
| `GetInferenceBundle` | Returns the resolved inference route bundle for sandbox consumption. See [Route Bundle Delivery](#route-bundle-delivery) below. |
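The validation that `SetClusterInference` performs before upserting the route can be sketched as below. The supported-type list comes from this document; the function shape and error strings are illustrative assumptions (the real service also checks for a usable API key, omitted here):

```rust
/// Provider types this document names as supported for inference.
const SUPPORTED: [&str; 3] = ["openai", "anthropic", "nvidia"];

/// Sketch of SetClusterInference's validation: non-empty references,
/// provider must exist (provider_type is None when the lookup fails),
/// and the provider type must be supported for inference.
fn validate_set_cluster_inference(
    provider_name: &str,
    model_id: &str,
    provider_type: Option<&str>,
) -> Result<(), String> {
    if provider_name.is_empty() || model_id.is_empty() {
        return Err("provider_name and model_id must be non-empty".to_string());
    }
    match provider_type {
        None => Err(format!("provider {provider_name:?} not found")),
        Some(t) if !SUPPORTED.contains(&t) => {
            Err(format!("provider type {t:?} not supported for inference"))
        }
        Some(_) => Ok(()),
    }
}
```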

#### Route Bundle Delivery

-The `GetInferenceBundle` RPC resolves the managed cluster route into a `GetInferenceBundleResponse` containing fully materialized route data that sandboxes can use directly.
+The `GetInferenceBundle` RPC resolves the sandbox override for the requested sandbox ID, falls back to the cluster default when no override exists, and returns a `GetInferenceBundleResponse` containing fully materialized route data that sandboxes can use directly.

-The trait method delegates to `resolve_inference_bundle(store)` (`crates/openshell-server/src/inference.rs`), which takes `&Store` instead of `&self`. This extraction decouples bundle resolution from `ServerState`, enabling direct unit testing against an in-memory SQLite store without constructing a full server.
+The trait method delegates to `resolve_inference_bundle(store, sandbox_id)` (`crates/openshell-server/src/inference.rs`), which takes `&Store` instead of `&self`. This extraction decouples bundle resolution from `ServerState`, enabling direct unit testing against an in-memory SQLite store without constructing a full server.
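The override-then-default fallback, and the store-by-reference shape that makes it unit-testable, can be sketched together. The map-backed `Store` here is a toy stand-in for the gateway's SQLite-backed store; the key layout follows the `sandbox/<sandbox_id>/inference.local` convention quoted earlier, and everything else is an illustrative assumption:

```rust
use std::collections::HashMap;

/// Toy stand-in for the gateway store: route record name ->
/// (provider_name, model_id) reference.
type Store = HashMap<String, (String, String)>;

/// Sketch of the fallback order in bundle resolution: the sandbox's
/// override record wins; otherwise the cluster default `inference.local`
/// record is used. Taking `&Store` (not a server handle) mirrors the
/// extraction that makes the real resolver directly unit-testable.
fn pick_route<'a>(store: &'a Store, sandbox_id: &str) -> Option<&'a (String, String)> {
    let override_key = format!("sandbox/{sandbox_id}/inference.local");
    store.get(&override_key).or_else(|| store.get("inference.local"))
}
```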

The `GetInferenceBundleResponse` includes:

-- **`routes`** -- a list of `ResolvedRoute` messages containing base URL, model ID, API key, protocols, and provider type. Currently contains zero or one routes (the managed cluster route).
+- **`routes`** -- a list of `ResolvedRoute` messages containing base URL, model ID, API key, protocols, and provider type. A sandbox override replaces the cluster default for that sandbox's `inference.local` route.
- **`revision`** -- a hex-encoded hash computed from route contents. Sandboxes compare this value to detect when their route set has changed.
- **`generated_at_ms`** -- epoch milliseconds when the bundle was assembled.
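The `revision` field's change-detection property can be illustrated with a content-derived hash: identical route sets yield an identical revision, so a sandbox only needs string equality to detect a change. The actual hash algorithm is not specified in this document; the sketch below uses std's `DefaultHasher` purely for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Compute a hex-encoded revision from route contents (illustrative
/// tuple shape: base URL, model ID, API key). Equal route sets produce
/// equal revisions within a process, which is all change detection needs.
fn revision(routes: &[(String, String, String)]) -> String {
    let mut h = DefaultHasher::new();
    routes.hash(&mut h);
    format!("{:016x}", h.finish())
}
```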
