feat: add Civo Project cluster template#24

Open
jokestax wants to merge 67 commits into main from saas-rishi

Conversation

@jokestax
Contributor

No description provided.

jokestax and others added 30 commits April 2, 2026 20:31
* feat: add GPU cluster template and catalog items for Friday demo

- Create templates/civo/gpu-cluster/ with isGpu=true and is_gpu Terraform var
- Backfill is_gpu=false on workload-cluster template for consistency
- Add nvidia-nim-operator catalog item (k8s-nim-operator v3.0.2)
- Add runai catalog item (control-plane + cluster-agent v2.20.0)

Closes #35, Closes #36, Closes #37, Closes #38

* feat: add konstruct.yaml metadata to gpu-cluster and workload-cluster templates

Required for template registration via Konstruct UI/API.

* fix: correct terraform module path in template default URLs

The module lives under civo-github/, not at the repo root.

* fix: update open-webui default chart version to 13.3.1

* fix: cert-manager template fixes from gpu cluster dry run

- Downgrade cert-manager from v1.20.1 to v1.16.5 (selectableFields CRD
  incompatibility with older K8s versions)
- Disable startupapicheck (hangs on Talos clusters)
- Add ServerSideApply=true sync option (required for CRD apply)
- Remove duplicate 35-cert-manager.yaml (caused ArgoCD warnings)
- Add ServerSideApply to ollama-openwebui catalog app

* fix: add ServerSideApply to NIM operator and Run.ai catalog apps

The dry run showed that CRD-heavy charts need ServerSideApply
to stay under the annotation size limit during ArgoCD sync.
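
As a rough illustration of the annotation size limit mentioned above: client-side apply stores the full manifest in the `kubectl.kubernetes.io/last-applied-configuration` annotation, and Kubernetes caps the total size of an object's annotations at 256 KiB. The sketch below estimates whether a manifest would exceed that cap. The function name and both sample manifests are hypothetical, and the JSON length only approximates what the API server measures.

```python
import json

# Kubernetes rejects objects whose annotations total more than 256 KiB.
ANNOTATION_LIMIT_BYTES = 256 * 1024


def exceeds_annotation_limit(manifest: dict) -> bool:
    """Estimate whether client-side apply would overflow the annotation limit.

    Assumption: the last-applied-configuration annotation is roughly the
    compact JSON encoding of the manifest itself.
    """
    encoded = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
    return len(encoded) > ANNOTATION_LIMIT_BYTES


# Hypothetical manifests: a small ConfigMap and a CRD with a very large schema.
small = {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "demo"}}
large_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "examples.demo.example.com"},
    "spec": {"description": "x" * 300_000},
}
```

With ServerSideApply=true the manifest is not copied into that annotation, which is why the sync option sidesteps the limit for large CRDs.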

* fix: update Run.ai catalog to use public runai-cluster chart

- Switch from JFrog control-plane chart to GCS runai-cluster chart
  (publicly accessible, no auth required)
- Simplify to single Application (cluster agent only, no separate
  control-plane chart needed)
- Update chart version to 2.16.79
- Replace domain/TLS config with cluster URL/UID inputs

* fix(runai): restructure catalog for ArgoCD hook compatibility

ArgoCD interprets helm.sh/hook annotations as hooks and blocks sync.
Restructured Run.ai catalog to use pre-rendered manifests (hooks stripped)
instead of direct Helm chart source.
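
One way to picture the hook stripping described above, as a sketch only: given already-parsed manifests (for example the output of `helm template`), drop every resource that carries a `helm.sh/hook` annotation. The commit does not say whether the hook resources themselves or only their annotations were removed, so dropping the resources is an assumption here. `strip_helm_hooks` and the sample resource names are hypothetical.

```python
from typing import Any

HOOK_ANNOTATION = "helm.sh/hook"


def strip_helm_hooks(manifests: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the manifests that do not carry a helm.sh/hook annotation."""
    kept = []
    for manifest in manifests:
        # metadata or annotations may be missing or null in rendered output.
        annotations = (manifest.get("metadata") or {}).get("annotations") or {}
        if HOOK_ANNOTATION not in annotations:
            kept.append(manifest)
    return kept


# Hypothetical rendered chart: one regular resource and one hook Job.
rendered = [
    {"kind": "Deployment", "metadata": {"name": "agent"}},
    {
        "kind": "Job",
        "metadata": {
            "name": "pre-install",
            "annotations": {"helm.sh/hook": "pre-install"},
        },
    },
]
```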

Catalog now outputs 3 Applications:
- Parent app (wave 47): points at runai environment dir
- Prereqs app (wave 49): creates runai, monitoring, runai-scale-adjust,
  runai-reservation namespaces
- Manifests app (wave 50): deploys rendered runai-cluster 2.16.79 chart

Static files in static/ dir are copied to gitops repo alongside rendered
templates. Manifests use <RUNAI_CLUSTER_URL> and <RUNAI_CLUSTER_UID>
token placeholders for cluster-specific values.
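
The token placeholders above could be filled in along these lines. This is a minimal sketch, not the templating code the catalog actually uses: `fill_tokens`, the `clusterUrl`/`clusterUid` keys, and the sample values are all hypothetical. Only the `<RUNAI_CLUSTER_URL>` and `<RUNAI_CLUSTER_UID>` token names come from the commit message.

```python
import re

# Tokens look like <UPPER_SNAKE_CASE>, e.g. <RUNAI_CLUSTER_URL>.
TOKEN_PATTERN = re.compile(r"<([A-Z0-9_]+)>")


def fill_tokens(text: str, values: dict[str, str]) -> str:
    """Replace every <TOKEN> in text, raising KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for token <{name}>")
        return values[name]

    return TOKEN_PATTERN.sub(substitute, text)


# Hypothetical manifest fragment and cluster-specific values.
manifest = "clusterUrl: <RUNAI_CLUSTER_URL>\nclusterUid: <RUNAI_CLUSTER_UID>\n"
filled = fill_tokens(
    manifest,
    {
        "RUNAI_CLUSTER_URL": "https://runai.example.com",
        "RUNAI_CLUSTER_UID": "00000000-0000-0000-0000-000000000000",
    },
)
```

Failing on an unknown token, rather than leaving it in place, keeps an unfilled placeholder from reaching the gitops repo unnoticed.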

* fix: set civo-gpu demo defaults in NIM and Run.ai catalog values

* fix(nim): disable upgradeCRD hook in catalog template

ArgoCD blocks on helm.sh/hook pre-upgrade Job. Setting
operator.upgradeCRD=false prevents the hook from being rendered.

Add catalog templates for: envoy-gateway, external-dns, cert-manager,
external-secrets, vault, prometheus (kube-prometheus-stack), grafana,
reloader, and groundcover. Each template includes Chart.yaml, values.yaml
with @input annotations for UI form generation, and an ArgoCD Application
template.

Closes #41

PodSecurityPolicy (PSP) was removed in Kubernetes 1.25. The groundcover
chart 0.36.3 creates PSP resources by default, causing sync failures on
clusters running 1.25 or later. Disable pspEnabled across all subcharts
(agent, loki, victoria-metrics, promscale, timescaledb) with a
configurable toggle defaulting to false.
2 participants