Fournos (φούρνος) = "oven" in Greek.
Fournos is a Kubernetes operator that schedules benchmark jobs via Kueue and executes them as Tekton PipelineRuns on remote clusters through the FORGE framework.
Jobs are submitted as FournosJob custom resources. The operator watches
for new CRs, creates Kueue Workloads for quota management, waits for
admission, then launches the corresponding Tekton PipelineRun.
The following operators must be installed in the cluster before deploying Fournos:
- Red Hat OpenShift Pipelines (1.21)
- Red Hat build of Kueue (1.3)
- Builds for Red Hat OpenShift Operator (1.7)
- Red Hat OpenShift GitOps (1.20), only needed for the GitOps deployment of Fournos
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pre-commit install
```

Create a FournosJob resource. Use `generateName` for automatic unique naming
and `displayName` for a human-readable label:
```yaml
apiVersion: fournos.dev/v1
kind: FournosJob
metadata:
  generateName: sample-run-benchmark-
spec:
  owner: perf-team
  displayName: sample-run-benchmark
  cluster: cluster-1
  pipeline: forge-full
  forge:
    project: llmd
    args:
      - cks
    configOverrides:
      batch_size: 64
  env:
    OCPCI_SUITE: regression
    OCPCI_VARIANT: nightly
```

```shell
FOURNOS_NAMESPACE=fournos-$USER-dev
oc create -f config/forge/samples/job-full.yaml -n $FOURNOS_NAMESPACE   # returns the generated name, e.g. forge-full-sample-x7k2m
oc get FournosJobs -n $FOURNOS_NAMESPACE -w                             # watch status transitions
oc patch FournosJob <name> -n $FOURNOS_NAMESPACE --type merge -p '{"spec":{"shutdown":"Stop"}}'        # graceful stop (runs finally tasks)
oc patch FournosJob <name> -n $FOURNOS_NAMESPACE --type merge -p '{"spec":{"shutdown":"Terminate"}}'   # immediate terminate (skips finally tasks)
oc delete FournosJob -n $FOURNOS_NAMESPACE <name>                       # cleanup
```

| Field | Required | Description |
|---|---|---|
| `spec.forge.project` | yes | FORGE project path |
| `spec.forge.args` | yes | List of arguments passed to FORGE |
| `spec.forge.configOverrides` | no | Arbitrary YAML overrides passed to the test framework |
| `spec.env` | no | Environment variables passed to the pipeline as a KEY=VALUE env file |
| `spec.cluster` | \* | Pin to a specific cluster (Kueue ResourceFlavor) |
| `spec.hardware.gpuType` | \* | Short GPU model name, e.g. `a100`, `h200`. The operator prepends the `FOURNOS_GPU_RESOURCE_PREFIX` (default `fournos/gpu-`) automatically, so do not include the full resource path. |
| `spec.hardware.gpuCount` | with `gpuType` | Number of GPUs (minimum 1) |
| `spec.owner` | no | Team or individual that owns this job |
| `spec.displayName` | no | Human-readable job name (defaults to `metadata.name`) |
| `spec.pipeline` | no | Tekton Pipeline name (default: `fournos-full`) |
| `spec.priority` | no | Kueue WorkloadPriorityClass name |
| `spec.secretRefs` | no | Vault entry names to mount into the pipeline. The operator looks up each name as a K8s Secret and verifies it carries the `fournos.dev/vault-entry=true` label. Secrets must be synced from Vault first (see Synchronizing secrets from Vault). |
| `spec.exclusive` | no | If `true`, locks the target cluster so no other FournosJob can run there. Requires `spec.cluster`. |
| `spec.shutdown` | no | Shutdown action: `Stop` cancels gracefully (Tekton `CancelledRunFinally`, runs finally tasks); `Terminate` cancels immediately (Tekton `Cancelled`, skips finally tasks). Both wait for the PipelineRun to finish before releasing Kueue quota. |

\* At least one of `spec.cluster` or `spec.hardware` must be provided. Both can be
set together to pin a hardware request to a specific cluster.
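As an illustration of hardware-based scheduling (no cluster pin), a job can request GPUs and let Kueue pick the cluster. This sample is a sketch; the `gpuType` and `gpuCount` values are illustrative:

```yaml
apiVersion: fournos.dev/v1
kind: FournosJob
metadata:
  generateName: sample-gpu-benchmark-
spec:
  owner: perf-team
  forge:
    project: llmd
    args:
      - cks
  hardware:
    gpuType: h200    # short model name only; the operator prepends fournos/gpu-
    gpuCount: 2
```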
The operator writes status to .status:
| Field | Description |
|---|---|
| `phase` | `Pending` → `Admitted` → `Running` → `Succeeded` / `Failed` / `Stopping` → `Stopped` |
| `cluster` | Cluster assigned by Kueue |
| `pipelineRun` | Name of the Tekton PipelineRun |
| `dashboardURL` | Tekton Dashboard link (if configured) |
| `message` | Error details on failure |
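The phase progression can be read as a small state machine. The following Python sketch encodes one plausible transition graph; the table above only lists the happy path and terminal phases, so the exact graph here is an assumption, not the operator's code:

```python
# Sketch of the FournosJob phase state machine implied by the status table.
# The precise set of allowed transitions is an assumption.
TRANSITIONS = {
    "Pending": {"Admitted", "Failed"},
    "Admitted": {"Running", "Failed"},
    "Running": {"Succeeded", "Failed", "Stopping"},
    "Stopping": {"Stopped"},
}
TERMINAL = {"Succeeded", "Failed", "Stopped"}

def can_transition(current: str, new: str) -> bool:
    """True if moving from `current` to `new` follows the sketched graph."""
    return new in TRANSITIONS.get(current, set())
```

For example, `can_transition("Running", "Stopping")` holds, while a terminal phase such as `Stopped` admits no further transitions.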
Prerequisites: Podman, kind, and kubectl.

```shell
make dev-setup   # creates a kind cluster, installs Tekton + Kueue + CRD, applies mock resources
make dev-run     # starts the operator locally (connects to the kind cluster)
```

Both targets default to the fournos-local-dev namespace. Override with
`FOURNOS_NAMESPACE=<YOUR_NAMESPACE> make dev-setup dev-run`.

In another terminal:

```shell
FOURNOS_NAMESPACE=fournos-local-dev make test   # run the integration test suite
```

Tear down when finished:

```shell
make dev-teardown   # deletes the kind cluster
```

dev-setup installs real Tekton Pipelines and Kueue controllers into the kind
cluster, but substitutes lightweight mock Tasks (echo + sleep) in place of the
real FORGE runner. The dev environment uses its own Kueue config
(dev/mock-kueue-config.yaml) with four mock clusters and synthetic GPU quotas,
plus matching kubeconfig Secrets (cluster-{1..4}-kubeconfig).
```shell
make lint   # lint (fournos/ + tests/)
make test   # integration tests (operator must be running)
```

FORGE on the hub: config/forge/ is the real OpenShift configuration for this repo: ImageStreams, Builds, Tekton Tasks and Pipelines, and sample jobs you apply to a cluster. It is not the same as the lightweight stand-ins under dev/mock-pipelines/, which make dev-setup installs on kind for local testing only.
Prepare the namespace:

```shell
FOURNOS_NAMESPACE=fournos-$USER-dev
oc create ns $FOURNOS_NAMESPACE
oc label ns/$FOURNOS_NAMESPACE fournos.dev/queue-access=true
```

Deploy the operator:

```shell
oc apply -n $FOURNOS_NAMESPACE -f manifests/crd.yaml
for rbac_file in manifests/rbac/*.yaml; do
  cat $rbac_file | NAMESPACE=$FOURNOS_NAMESPACE envsubst | oc apply -f- -n $FOURNOS_NAMESPACE
done
oc apply -n $FOURNOS_NAMESPACE -f manifests/deployment.yaml
```

Three things are needed to make a target cluster available to Fournos:
- Create a kubeconfig Secret so the operator can reach the cluster:

  ```shell
  FOURNOS_NAMESPACE=fournos-$USER-dev
  CLUSTER_NAME=<name>
  oc create secret generic ${CLUSTER_NAME}-kubeconfig \
    --from-file=kubeconfig=/path/to/auth/kubeconfig \
    -n $FOURNOS_NAMESPACE
  ```

  The secret name must match the FOURNOS_KUBECONFIG_SECRET_PATTERN (default
  `{cluster}-kubeconfig`).

- Add a ResourceFlavor and quota in config/kueue-config.yaml. Add a new
  `ResourceFlavor` with a matching `fournos.dev/cluster` `nodeLabel`, and list
  it under the `fournos-queue` ClusterQueue with the appropriate GPU/CPU
  quotas. Then apply:

  ```shell
  oc apply -f config/kueue-config.yaml
  ```

- Verify connectivity by submitting a lightweight validate-only job. Edit
  `cluster` (and optionally `hardware`) in
  config/fournos-validation/samples/test-connectivity-job.yaml to match the new
  target, then:

  ```shell
  FOURNOS_NAMESPACE=fournos-$USER-dev
  oc create -f config/fournos-validation/samples/test-connectivity-job.yaml -n $FOURNOS_NAMESPACE
  oc get fournosjobs -n $FOURNOS_NAMESPACE -w   # should reach Succeeded
  ```

This runs the fournos-validate-only pipeline, which only checks `oc cluster-info`
against the target; no FORGE workload is launched. If the job reaches
Succeeded, the kubeconfig secret and Kueue quota are correctly configured. If
it fails, check the operator logs and the PipelineRun status for details.
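The `{cluster}-kubeconfig` naming convention can be sketched in Python. This helper is hypothetical (the real operator may resolve names differently); it only illustrates str.format-style substitution of the FOURNOS_KUBECONFIG_SECRET_PATTERN setting:

```python
import os

def kubeconfig_secret_name(cluster: str) -> str:
    """Resolve a cluster name to its kubeconfig Secret name.

    Hypothetical helper illustrating the FOURNOS_KUBECONFIG_SECRET_PATTERN
    convention (default: {cluster}-kubeconfig).
    """
    pattern = os.environ.get("FOURNOS_KUBECONFIG_SECRET_PATTERN", "{cluster}-kubeconfig")
    return pattern.format(cluster=cluster)
```

With the default pattern, `kubeconfig_secret_name("cluster-1")` yields `cluster-1-kubeconfig`, matching the Secret created in the first step above.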
Apply the production FORGE assets from config/forge/ (not the kind mocks in dev/mock-pipelines/). Deploy the cluster configuration (Builds + Tekton):
```shell
oc apply -n $FOURNOS_NAMESPACE -f config/forge/images/is_forge.yaml
cat config/forge/images/build_forge-main.yaml \
  | sed 's/psap-automation/'$FOURNOS_NAMESPACE'/g' \
  | oc apply -n $FOURNOS_NAMESPACE -f-
oc create -n $FOURNOS_NAMESPACE -f config/forge/images/buildrun_forge-main.yaml
for wf_file in config/forge/workflows/*.yaml; do
  cat "$wf_file" | NAMESPACE=$FOURNOS_NAMESPACE envsubst '$NAMESPACE' | oc apply -f- -n $FOURNOS_NAMESPACE
done
```

Pipeline jobs can reference Kubernetes Secrets via spec.secretRefs. These
secrets originate in a HashiCorp Vault instance. Because there is no permanent
programmatic access to the vault, secrets are synchronized manually on demand —
whenever the vault content changes.
The sync script reads vault entries and creates one Opaque Secret per entry,
using the vault entry name directly as the K8s Secret name. Entries whose
names are not valid DNS-1123 subdomain names are skipped with an error.
Individual keys within an entry that are not valid K8s Secret data keys
(allowed: alphanumeric, -, _, .) are also skipped.
Existing secrets are updated in-place.
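The filtering rules above can be sketched as two validators. The regexes mirror the DNS-1123 subdomain rule and the allowed Secret data-key characters; the actual sync script may implement the checks differently:

```python
import re

# DNS-1123 subdomain: lowercase alphanumeric segments, '-' allowed inside a
# segment, segments joined by '.', at most 253 characters overall.
DNS1123_SUBDOMAIN = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
# K8s Secret data keys: alphanumeric plus '-', '_', '.'
SECRET_DATA_KEY = re.compile(r"^[A-Za-z0-9._-]+$")

def valid_entry_name(name: str) -> bool:
    """Can this vault entry name be used directly as a K8s Secret name?"""
    return len(name) <= 253 and DNS1123_SUBDOMAIN.match(name) is not None

def valid_data_key(key: str) -> bool:
    """Can this vault key be used as a K8s Secret data key?"""
    return SECRET_DATA_KEY.match(key) is not None
```

So an entry named `my-creds` is synced, `My_Creds` is skipped (uppercase and `_` are not valid in Secret names), and within an entry a key like `token.json` is kept while `bad key` is dropped.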
```shell
# 1. Set the required environment variables
export VAULT_ADDR="https://vault.example.com"   # Vault server URL
export VAULT_TOKEN="s.xxxxx"                    # your short-lived token
export VAULT_SECRET_PATH="path/to/secrets"      # directory path within the KV engine

# 2. Sync all vault entries under the configured path
python hacks/sync_vault_secrets.py -n $FOURNOS_NAMESPACE

# 3. Preview without touching the cluster
python hacks/sync_vault_secrets.py -n $FOURNOS_NAMESPACE --dry-run
```

Makefile shortcuts (VAULT_ADDR, VAULT_TOKEN, and VAULT_SECRET_PATH must be set):

```shell
make sync-vault-secrets           # syncs all entries
make sync-vault-secrets-dry-run   # preview only
```

The synced secrets are labelled fournos.dev/vault-entry=true and
app.kubernetes.io/managed-by=fournos-vault-sync for easy identification.
Reference them in a FournosJob by their vault entry name; the operator
verifies the Secret exists and was imported from Vault:

```yaml
spec:
  secretRefs:
    - my-creds
```

All settings are read from environment variables with the FOURNOS_ prefix:
| Variable | Default | Description |
|---|---|---|
| FOURNOS_NAMESPACE | required | Kubernetes namespace |
| FOURNOS_TEKTON_DASHBOARD_URL | | Tekton Dashboard base URL |
| FOURNOS_KUBECONFIG_SECRET_PATTERN | {cluster}-kubeconfig | Pattern for resolving cluster names to Secret names |
| FOURNOS_KUEUE_LOCAL_QUEUE_NAME | fournos-queue | Kueue LocalQueue name |
| FOURNOS_GPU_RESOURCE_PREFIX | fournos/gpu- | Resource name prefix for GPU types |
| FOURNOS_LOG_LEVEL | INFO | Logging level |
| FOURNOS_GC_INTERVAL_SEC | 300 | Resource GC interval (seconds) |
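As an illustration of how these settings compose, here is a sketch of reading a setting with its documented default and building a GPU resource name from `spec.hardware.gpuType`. The function names are hypothetical; only the variable names and defaults come from the table above:

```python
import os

# Defaults from the configuration table; FOURNOS_NAMESPACE has no default.
DEFAULTS = {
    "FOURNOS_KUBECONFIG_SECRET_PATTERN": "{cluster}-kubeconfig",
    "FOURNOS_KUEUE_LOCAL_QUEUE_NAME": "fournos-queue",
    "FOURNOS_GPU_RESOURCE_PREFIX": "fournos/gpu-",
    "FOURNOS_LOG_LEVEL": "INFO",
    "FOURNOS_GC_INTERVAL_SEC": "300",
}

def setting(name: str) -> str:
    """Read a FOURNOS_* setting from the environment, falling back to defaults."""
    if name in os.environ:
        return os.environ[name]
    if name in DEFAULTS:
        return DEFAULTS[name]
    raise KeyError(f"required setting {name} is not set")

def gpu_resource_name(gpu_type: str) -> str:
    """Prefix a short GPU model name, e.g. 'a100' -> 'fournos/gpu-a100'."""
    return setting("FOURNOS_GPU_RESOURCE_PREFIX") + gpu_type
```

This mirrors why `spec.hardware.gpuType` must be the short model name only: the prefix is always prepended.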
FournosJob CR ──→ Operator ──→ Kueue Workload ──→ (admission) ──→ Tekton PipelineRun ──→ FORGE ──→ target cluster
The operator runs as a single-replica Deployment using kopf. On each
FournosJob, it:

- Creates a Kueue Workload with the requested GPU resources (owned by the FournosJob via ownerReferences)
- Polls (5 s timer) for Kueue admission and the assigned cluster
- Launches a Tekton PipelineRun with FORGE parameters (owned by the FournosJob via ownerReferences)
- Watches the PipelineRun until completion
- Deletes the Workload to release Kueue quota
Setting spec.shutdown on a FournosJob triggers cancellation of the
PipelineRun and transitions to phase=Stopping. Stop uses Tekton's
CancelledRunFinally (runs finally cleanup tasks); Terminate uses
Cancelled (skips finally tasks). In both cases the operator keeps
the Kueue Workload alive until the PipelineRun finishes, ensuring the
cluster slot is not released prematurely. Once done, the Workload is
deleted and the job moves to phase=Stopped.
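The shutdown semantics map directly onto Tekton's PipelineRun cancellation statuses. A sketch of the merge patch the operator might apply (the helper name is hypothetical; the status values are Tekton's):

```python
# Mapping from FournosJob spec.shutdown to the value patched into the
# PipelineRun's spec.status field (Tekton cancellation semantics).
SHUTDOWN_TO_TEKTON_STATUS = {
    "Stop": "CancelledRunFinally",   # graceful: finally tasks still run
    "Terminate": "Cancelled",        # immediate: finally tasks are skipped
}

def pipelinerun_cancel_patch(shutdown: str) -> dict:
    """Build the merge patch applied to the PipelineRun for a shutdown action."""
    return {"spec": {"status": SHUTDOWN_TO_TEKTON_STATUS[shutdown]}}
```

Either way, the patch only cancels the PipelineRun; releasing the Kueue Workload is deferred until the run actually finishes, as described above.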
Deleting a FournosJob automatically cascade-deletes its Workload and PipelineRun through Kubernetes owner references.
Target clusters need nothing installed — FORGE runs on the hub cluster inside
Tekton Task pods and communicates with targets via oc/kubectl through
kubeconfig Secrets.
For a detailed breakdown of the CRD, scheduling, operator internals, and key design decisions, see the Design Document.