This project is an Internal Developer Platform API built on FastAPI, Kubernetes, Terraform, PostgreSQL/SQLite, Redis, Helm, Prometheus, Grafana, and JWT authentication. It extends the original Kubernetes manifests into a mini Heroku/Render/Railway-style platform for provisioning infrastructure and deploying containerized applications through REST APIs.
The API receives authenticated platform requests, validates input, stores metadata in the database, and orchestrates Kubernetes or Terraform operations through service-layer modules.
Request flow for application deployment:
- User authenticates with JWT.
- API validates Docker image, namespace, port, replica, ingress, and autoscaling inputs.
- A deployment row is created in the database.
- Kubernetes service layer creates namespace, Deployment, Service, Ingress, and HPA.
- Deployment status, URL, autoscaling settings, and errors are persisted.
- Users query deployment status, logs, metrics, and cluster health through API endpoints.
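The request flow above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the project's service layer: DeploymentRequest, DeploymentStore, deploy() and the status strings pending/running/failed are assumed names, the Kubernetes step is passed in as a callable (a no-op stands in for dry-run mode), and the validation rules are plausible defaults rather than the API's exact checks.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# RFC 1035 label: lowercase letter first, then lowercase alphanumerics or '-',
# at most 63 characters. Valid for Namespace, Service and Deployment names.
NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


@dataclass
class DeploymentRequest:
    name: str
    image: str
    port: int
    replicas: int = 1
    min_replicas: int = 1
    max_replicas: int = 1
    cpu_threshold: int = 80  # HPA target CPU utilisation, in percent

    def validate(self) -> List[str]:
        """Return every validation error at once (empty list means valid)."""
        errors = []
        if not NAME_RE.fullmatch(self.name):
            errors.append("name must be a lowercase RFC 1035 label")
        if not self.image.strip():
            errors.append("image is required")
        if not 1 <= self.port <= 65535:
            errors.append("port must be between 1 and 65535")
        if not 1 <= self.min_replicas <= self.max_replicas:
            errors.append("need 1 <= min_replicas <= max_replicas")
        elif not self.min_replicas <= self.replicas <= self.max_replicas:
            errors.append("replicas must lie within [min_replicas, max_replicas]")
        if not 1 <= self.cpu_threshold <= 100:
            errors.append("cpu_threshold must be a percentage from 1 to 100")
        return errors


@dataclass
class DeploymentRecord:
    id: int
    request: DeploymentRequest
    status: str = "pending"
    error: Optional[str] = None


@dataclass
class DeploymentStore:
    """In-memory stand-in for the deployments table."""
    rows: Dict[int, DeploymentRecord] = field(default_factory=dict)

    def create(self, req: DeploymentRequest) -> DeploymentRecord:
        record = DeploymentRecord(id=len(self.rows) + 1, request=req)
        self.rows[record.id] = record
        return record


def deploy(req: DeploymentRequest, store: DeploymentStore,
           apply: Callable[[DeploymentRequest], None]) -> DeploymentRecord:
    """Validate, create the row, run the Kubernetes step, persist the outcome."""
    errors = req.validate()
    if errors:
        # Rejected before any row exists (e.g. surfaced to the client as HTTP 422).
        raise ValueError("; ".join(errors))
    record = store.create(req)
    try:
        apply(req)  # would create namespace, Deployment, Service, Ingress, HPA
        record.status = "running"
    except Exception as exc:
        # Keep the failure on the row so users can query it later.
        record.status = "failed"
        record.error = str(exc)
    return record
```

Creating the row before touching the cluster means a failed Kubernetes call still leaves a queryable record with its error, which is what makes the last step of the flow (querying status) possible.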
app/ FastAPI app, configuration, logging
api/ Route handlers and Pydantic schemas
auth/ JWT, RBAC, rate limiting
database/ SQLAlchemy models and session lifecycle
services/ Kubernetes, Terraform, deployment, monitoring logic
web/ Developer dashboard served by FastAPI
kubernetes/ Cluster RBAC and network policy examples
terraform/ AWS Terraform templates
helm/ Helm chart for the API itself
monitoring/ Prometheus and Grafana examples
scripts/ Bootstrap, migration, production checklist helpers
tests/ Unit tests
Auth:
POST /auth/register
POST /auth/login
GET /auth/me
Infrastructure:
POST /infrastructure/create
GET /infrastructure/{id}
DELETE /infrastructure/{id}
Deployments:
POST /deployments
GET /deployments/{id}
DELETE /deployments/{id}
Kubernetes:
POST /namespace/create
POST /service/expose
POST /autoscaling/create
POST /kubernetes/ingress/create
Monitoring:
GET /cluster/health
GET /metrics
GET /logs/{pod}?namespace=default
Swagger/OpenAPI is available at /docs.
The developer dashboard is available at /dashboard/.
Create an environment file:
cp .env.example .env
For local development without a Kubernetes cluster or Terraform credentials, keep:
KUBERNETES_DRY_RUN=true
TERRAFORM_DRY_RUN=true
DATABASE_URL=sqlite:///./idp.db
ENABLE_PUBLIC_REGISTRATION=true
Install dependencies and run the API:
python3 -m venv .venv
./.venv/bin/pip install -r requirements.txt
./.venv/bin/uvicorn app.main:app --reload
Open the dashboard:
http://127.0.0.1:8000/dashboard/
The dashboard lets developers register/login, deploy Docker images, see app status, delete deployments, and fetch pod logs. It also includes app-template and image-catalog dropdowns so developers can start from known defaults and still override the generated values.
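The .env settings for local development can be read into a typed settings object at startup. The sketch below is an assumption about how that might look, not the project's configuration module: Settings, env_bool and load_settings are hypothetical names, and the accepted boolean spellings are a choice made here. The defaults follow the rest of this README: dry-run on unless explicitly disabled, public registration off (as in the Helm defaults), SQLite as the local database.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean variable; unset or blank falls back to the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    # Fail loudly instead of silently treating a typo as "false".
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


@dataclass(frozen=True)
class Settings:
    kubernetes_dry_run: bool
    terraform_dry_run: bool
    database_url: str
    enable_public_registration: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Dry-run defaults to True so a missing variable never triggers real changes.
    return Settings(
        kubernetes_dry_run=env_bool(env, "KUBERNETES_DRY_RUN", True),
        terraform_dry_run=env_bool(env, "TERRAFORM_DRY_RUN", True),
        database_url=env.get("DATABASE_URL", "sqlite:///./idp.db"),
        enable_public_registration=env_bool(env, "ENABLE_PUBLIC_REGISTRATION", False),
    )
```

Passing the environment as a mapping keeps the loader testable without mutating os.environ.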
Register and log in:
curl -X POST http://localhost:8000/auth/register \
-H "Content-Type: application/json" \
-d '{"username":"platform-user","password":"change-me-123"}'
TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"platform-user","password":"change-me-123"}' | jq -r .access_token)
Deploy an application:
curl -X POST http://localhost:8000/deployments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "demo-api",
"image": "nginx:1.25",
"port": 80,
"replicas": 2,
"min_replicas": 1,
"max_replicas": 5,
"cpu_threshold": 70
}'
The infrastructure API records each request, returns 202 Accepted, and queues the Terraform work for a worker that updates the infrastructure status. Local development can use the in-process background backend, while production should use the Redis-backed worker queue. The Terraform integration renders terraform/main.tf.j2 and can create or destroy AWS resources; it runs in dry-run mode by default. Before enabling real execution:
- Create an encrypted S3 backend bucket.
- Create a DynamoDB lock table.
- Set TERRAFORM_STATE_BUCKET and TERRAFORM_LOCK_TABLE to your own bucket and lock-table names.
- Use IAM roles with least privilege.
- Review generated plans before production use.
- For production, move the background job behind a durable queue or use Terraform Cloud, Atlantis, GitHub Actions, or Argo Workflows for plan approval and audit history.
- Set TERRAFORM_JOB_BACKEND=redis and run the worker with python -m services.infra_worker.
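Some of the checklist items above can be enforced mechanically before a real apply. The following is a hypothetical preflight gate, not part of the documented API: the function name, the production flag and the "background" backend value are assumptions, while the config keys mirror the /infrastructure/create example below. IAM scoping and plan review still need a human.

```python
from typing import List, Mapping

# Keys that must be present before Terraform may touch real AWS resources.
REQUIRED_FOR_APPLY = ("aws_region", "state_bucket", "lock_table")


def terraform_preflight(config: Mapping[str, str], dry_run: bool,
                        job_backend: str, production: bool) -> List[str]:
    """Return blocking problems; an empty list means execution may proceed."""
    if dry_run:
        return []  # dry-run mode makes no real infrastructure changes
    problems = [f"missing config.{key}" for key in REQUIRED_FOR_APPLY
                if not config.get(key)]
    if production and job_backend != "redis":
        # The in-process backend loses queued work on restart.
        problems.append("production runs need TERRAFORM_JOB_BACKEND=redis "
                        "(or an external workflow engine)")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets the API report every missing setting in a single response.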
Example:
curl -X POST http://localhost:8000/infrastructure/create \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "platform-dev",
"cloud_provider": "aws",
"config": {
"aws_region": "us-east-1",
"eks_role_arn": "arn:aws:iam::123456789012:role/EKSClusterRole",
"node_role_arn": "arn:aws:iam::123456789012:role/EKSNodeRole",
"state_bucket": "company-terraform-state",
"lock_table": "company-terraform-locks"
}
}'
The initial response reports a status such as queued; poll GET /infrastructure/{id} to follow it through provisioning to ready or failed. The Helm chart deploys a Terraform worker when worker.enabled=true.
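The asynchronous lifecycle can be modelled as a small state machine. This is a minimal sketch: only the status names (queued, provisioning, ready, failed) and the 202 response come from this README. InfraJobs, submit() and work_one() are hypothetical, and an in-process queue.Queue stands in for both the background and the Redis job backends.

```python
import itertools
import queue
from typing import Callable, Dict, Tuple

# Allowed status transitions; "ready" and "failed" are terminal.
TRANSITIONS = {"queued": {"provisioning"}, "provisioning": {"ready", "failed"}}


class InfraJobs:
    """In-process stand-in for the request handler plus the Terraform worker."""

    def __init__(self, run_terraform: Callable[[dict], None]) -> None:
        self._run = run_terraform
        self._ids = itertools.count(1)
        self._queue: "queue.Queue[int]" = queue.Queue()
        self.status: Dict[int, str] = {}
        self.errors: Dict[int, str] = {}
        self._configs: Dict[int, dict] = {}

    def _transition(self, job_id: int, new: str) -> None:
        old = self.status[job_id]
        if new not in TRANSITIONS.get(old, set()):
            raise RuntimeError(f"illegal transition {old} -> {new}")
        self.status[job_id] = new

    def submit(self, config: dict) -> Tuple[int, dict]:
        """Like POST /infrastructure/create: record, enqueue, answer 202."""
        job_id = next(self._ids)
        self.status[job_id] = "queued"
        self._configs[job_id] = config
        self._queue.put(job_id)
        return 202, {"id": job_id, "status": "queued"}

    def work_one(self) -> None:
        """One worker iteration: take a job, run Terraform, record the outcome."""
        job_id = self._queue.get()  # blocks until a job is available
        self._transition(job_id, "provisioning")
        try:
            self._run(self._configs[job_id])
        except Exception as exc:
            self.errors[job_id] = str(exc)
            self._transition(job_id, "failed")
        else:
            self._transition(job_id, "ready")
        finally:
            self._queue.task_done()
```

Polling GET /infrastructure/{id} then amounts to reading status[id]. With the Redis backend, the queue and the status records would live outside the API process so that a separate worker can consume them.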
Render or install the API chart:
helm template idp-api helm/charts/idp-api \
--set secrets.databaseUrl='postgresql://user:pass@postgres:5432/idp' \
--set secrets.secretKey='replace-with-long-random-secret'
helm upgrade --install idp-api helm/charts/idp-api \
--set image.repository=registry.example.com/idp-api \
--set image.tag=v1 \
--set secrets.databaseUrl='postgresql://user:pass@postgres:5432/idp' \
--set secrets.secretKey='replace-with-long-random-secret'
The chart intentionally fails if image.tag is empty. Use a release tag or digest rather than latest.
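The chart performs its own check in its templates. The Python function below is a hypothetical illustration of the same policy applied to a full image reference, which is also the shape a deployment request carries (for example nginx:1.25). The function name and messages are assumptions. A reference pinned by @sha256 digest counts as immutable; a missing tag or the mutable latest tag is rejected.

```python
import re

DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def classify_image(ref: str) -> str:
    """Return 'digest' or 'tag' for an acceptable reference, else raise ValueError."""
    name, at, digest = ref.partition("@")
    if at:
        if not DIGEST_RE.fullmatch(digest):
            raise ValueError(f"malformed digest in {ref!r}")
        return "digest"
    # A tag is the text after the last ':' that follows the last '/', so a
    # registry port such as localhost:5000/app is not mistaken for a tag.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    tag = name[colon + 1:] if colon > last_slash else ""
    if not tag:
        raise ValueError(f"{ref!r} has no tag; pin a release tag or digest")
    if tag == "latest":
        raise ValueError(f"{ref!r} uses the mutable 'latest' tag")
    return "tag"
```

A production admission rule could go one step further and accept only the "digest" result.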
Implemented:
- JWT authentication
- Role-aware user model
- Protected infrastructure, deployment, Kubernetes, and monitoring APIs
- Redis-backed rate limiting with local fallback
- Non-root Docker container
- Security headers and restricted CORS origin configuration
- Production startup validation that rejects a weak or default SECRET_KEY
- Helm defaults with public registration disabled, debug disabled, a read-only root filesystem, and dropped Linux capabilities
- Kubernetes RBAC and network-policy examples
- No production secrets hardcoded in the Helm chart; they are supplied at install time (for example via secrets.databaseUrl and secrets.secretKey)
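"Redis-backed rate limiting with local fallback" can be read as a fixed-window counter that prefers a shared store and degrades to a per-process counter. The sketch below is one possible design under that assumption, not the project's limiter: SharedCounter, RateLimiter and the key format are hypothetical, and a Redis adapter is assumed to expose an incr() built on INCR plus EXPIRE.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Optional, Protocol, Tuple


class SharedCounter(Protocol):
    """Anything with Redis-like INCR + EXPIRE semantics (assumed interface)."""

    def incr(self, key: str, ttl_seconds: int) -> int:
        """Atomically increment key, (re)set its TTL and return the new count."""
        ...


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: int,
                 shared: Optional[SharedCounter] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.shared = shared
        self.clock = clock
        self._local: Dict[Tuple[str, int], int] = defaultdict(int)

    def _local_incr(self, client_id: str, window_index: int) -> int:
        # Drop counters from earlier windows so memory stays bounded.
        for stale in [k for k in self._local if k[1] < window_index]:
            del self._local[stale]
        self._local[(client_id, window_index)] += 1
        return self._local[(client_id, window_index)]

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window)
        if self.shared is not None:
            try:
                key = f"ratelimit:{client_id}:{window_index}"
                return self.shared.incr(key, self.window) <= self.limit
            except ConnectionError:
                # Assumes the adapter maps its client's connection errors to
                # the built-in ConnectionError; fall back to this process.
                pass
        return self._local_incr(client_id, window_index) <= self.limit
```

With the shared counter, the limit holds across all API replicas. In fallback mode, each process counts on its own, so the effective limit is multiplied by the replica count until Redis recovers.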
Recommended before production:
- Use AWS Secrets Manager, External Secrets Operator, or sealed-secrets.
- Keep public registration disabled unless you add an invite/admin onboarding flow.
- Replace SQLite with managed PostgreSQL.
- Use Alembic migrations.
- Run Terraform through the Redis worker queue, Terraform Cloud, Atlantis, GitHub Actions, or another workflow engine with audit history.
- Enforce tenant-aware namespace ownership.
- Add admission policies with Kyverno or OPA Gatekeeper.
- Use image allowlists and vulnerability scanning.
- Require immutable image digests for production deployments.
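Tenant-aware namespace ownership is listed as a recommendation rather than an implemented feature. One way it could work is sketched below. Everything in it is hypothetical: the ownership map (a database table with explicit tenant IDs in a real deployment), the "<tenant>-" prefix convention, and the function and exception names.

```python
from typing import Dict


class NamespaceOwnershipError(PermissionError):
    """Raised when a tenant targets a namespace it does not own."""


def authorize_namespace(tenant_id: str, namespace: str,
                        owners: Dict[str, str]) -> None:
    """Raise unless `namespace` is owned by, or claimable by, `tenant_id`."""
    owner = owners.get(namespace)
    if owner is None:
        # Unclaimed: allow only names inside the tenant's prefix, then claim it.
        # This also keeps tenants out of system namespaces like kube-system.
        if not namespace.startswith(f"{tenant_id}-"):
            raise NamespaceOwnershipError(
                f"{namespace!r} is outside tenant {tenant_id!r}'s prefix")
        owners[namespace] = tenant_id
    elif owner != tenant_id:
        raise NamespaceOwnershipError(f"{namespace!r} belongs to another tenant")
```

Running this check in the API before any Kubernetes call complements, rather than replaces, cluster-side admission policies such as Kyverno or OPA Gatekeeper.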
The API exposes Prometheus metrics at /metrics. Example scrape configuration and a Grafana dashboard starter live in monitoring/.
Recommended production stack:
- Prometheus Operator
- Grafana dashboards for API latency, error rate, Kubernetes deployment state, and Terraform failures
- Loki or OpenSearch for structured logs
- Alertmanager alerts for failed provisions, high error rate, and unhealthy clusters
The GitHub Actions workflow installs dependencies, runs linting and tests, and builds the Docker image. The registry push and Kubernetes deployment steps are intentionally left as placeholders until the registry, cluster, and secret strategy are configured.
Phase 1: Architecture and folder structure are represented by the layered app layout.
Phase 2: FastAPI backend includes auth, validation, database models, OpenAPI, health checks, and rate limiting.
Phase 3: Kubernetes integration creates namespaces, deployments, services, ingress, HPA, status, logs, and safe deletes.
Phase 4: Terraform automation renders AWS templates and supports apply/destroy with remote-state configuration.
Phase 5: Monitoring exposes Prometheus metrics, cluster health, pod logs, and dashboard examples.
Phase 6: CI/CD builds, lints, tests, and prepares image/deployment stages.
Phase 7: Production hardening is documented in scripts/prod_checklist.md and should be completed before real cloud use.
Next steps:
- Move long-running deploy/provision tasks to Celery, RQ, Temporal, or Argo Workflows.
- Add per-tenant quotas for namespaces, replicas, CPU, memory, and load balancers.
- Use GitOps with ArgoCD for reconciliation and auditability.
- Split API, worker, scheduler, and webhook receiver into separate deployments.
- Use PostgreSQL row-level ownership checks and explicit tenant IDs.
- Add blue/green and canary deployment strategies with Argo Rollouts or Flagger.