Merged
5 changes: 5 additions & 0 deletions .env.example
@@ -5,11 +5,16 @@ REDIS_URL=redis://localhost:6379/0
SECRET_KEY=replace-with-a-long-random-secret
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=60
ALLOWED_ORIGINS=http://localhost:8000,http://127.0.0.1:8000
ENABLE_PUBLIC_REGISTRATION=true
AWS_REGION=us-east-1
DEFAULT_INGRESS_DOMAIN=apps.local
KUBERNETES_NAMESPACE_PREFIX=tenant
KUBERNETES_DRY_RUN=true
TERRAFORM_DRY_RUN=true
TERRAFORM_STATE_BUCKET=replace-me-terraform-state
TERRAFORM_LOCK_TABLE=replace-me-terraform-locks
TERRAFORM_JOB_BACKEND=background
TERRAFORM_JOB_REDIS_URL=redis://localhost:6379/1
TERRAFORM_JOB_QUEUE_NAME=terraform-jobs
RATE_LIMIT_REQUESTS_PER_HOUR=100
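The new `TERRAFORM_JOB_BACKEND`, `TERRAFORM_JOB_REDIS_URL`, and `TERRAFORM_JOB_QUEUE_NAME` settings choose how queued Terraform work is handed off. `services/infra_queue.py` is not among the files shown here, so the following is only a hypothetical sketch of how a dispatcher like `enqueue_infrastructure_job` might branch on the backend. The `enqueue_job` name, the JSON payload shape, the `run_job` callback, and the injected client (any object with a redis-py style `rpush(name, *values)`) are all assumptions.

```python
import json
from typing import Any, Callable, Optional, Protocol


class InfrastructureQueueError(RuntimeError):
    """Raised when a job cannot be handed to the configured backend."""


class ListClient(Protocol):
    """Structural type for a client exposing redis-py's rpush(name, *values)."""

    def rpush(self, name: str, *values: Any) -> int: ...


def enqueue_job(
    action: str,
    infra_id: int,
    backend: str,
    queue_name: str,
    schedule: Callable[..., None],
    run_job: Callable[[str, int], None],
    client: Optional[ListClient] = None,
) -> None:
    """Hypothetical dispatcher: schedule in-process work or push to a Redis list."""
    if action not in {"provision", "destroy"}:
        raise InfrastructureQueueError(f"Unknown action: {action}")
    if backend == "background":
        # Shaped like FastAPI's BackgroundTasks.add_task(func, *args).
        schedule(run_job, action, infra_id)
        return
    if backend == "redis":
        if client is None:
            raise InfrastructureQueueError("Redis backend selected but no client configured")
        payload = json.dumps({"action": action, "infrastructure_id": infra_id})
        try:
            client.rpush(queue_name, payload)
        except Exception as exc:  # e.g. a redis ConnectionError
            raise InfrastructureQueueError(f"Could not enqueue job: {exc}") from exc
        return
    raise InfrastructureQueueError(f"Unsupported backend: {backend}")
```

In the API, `schedule` would be `background_tasks.add_task` and `client` something like `redis.Redis.from_url(settings.TERRAFORM_JOB_REDIS_URL)`; a separate worker could then consume the same list, for example with `BLPOP`. Raising one error type for every failure lets the route map all of them to a single `503`.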
14 changes: 13 additions & 1 deletion Dockerfile
@@ -1,7 +1,19 @@
FROM python:3.11-slim

ARG TARGETARCH
ARG TERRAFORM_VERSION=1.8.5

# Create non-root user
RUN useradd -m appuser
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates curl unzip \
&& arch="${TARGETARCH:-amd64}" \
&& curl -fsSL "https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_${arch}.zip" -o /tmp/terraform.zip \
&& unzip /tmp/terraform.zip -d /usr/local/bin \
&& chmod 0755 /usr/local/bin/terraform \
&& rm -f /tmp/terraform.zip \
&& apt-get purge -y --auto-remove curl unzip \
&& rm -rf /var/lib/apt/lists/* \
&& useradd -m appuser
WORKDIR /app

COPY requirements.txt ./
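The Dockerfile falls back to `amd64` when BuildKit does not supply `TARGETARCH`. The helper below is only an illustrative Python restatement of that shell logic (the function name is hypothetical); it shows which HashiCorp release archive each architecture value resolves to.

```python
from typing import Optional

RELEASES_BASE = "https://releases.hashicorp.com/terraform"


def terraform_zip_url(version: str, target_arch: Optional[str]) -> str:
    """Build the release URL the Dockerfile downloads (illustrative helper)."""
    # ${TARGETARCH:-amd64} substitutes the default when the variable is unset or empty.
    arch = target_arch or "amd64"
    return f"{RELEASES_BASE}/{version}/terraform_{version}_linux_{arch}.zip"
```

With the default `TERRAFORM_VERSION=1.8.5`, an arm64 build resolves to `terraform_1.8.5_linux_arm64.zip`, while a plain `docker build` without BuildKit's platform arguments gets the amd64 archive.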
20 changes: 17 additions & 3 deletions README.md
Expand Up @@ -83,6 +83,7 @@ For local development without a Kubernetes cluster or Terraform credentials, kee
KUBERNETES_DRY_RUN=true
TERRAFORM_DRY_RUN=true
DATABASE_URL=sqlite:///./idp.db
ENABLE_PUBLIC_REGISTRATION=true
```

Install dependencies and run the API:
@@ -133,13 +134,15 @@ curl -X POST http://localhost:8000/deployments \

## Terraform

The infrastructure API renders [terraform/main.tf.j2](terraform/main.tf.j2) and can create or destroy AWS resources. It is dry-run by default. Before enabling real execution:
The infrastructure API records requests, returns `202 Accepted`, and queues Terraform work for a worker that updates the infrastructure status. Local development can use the in-process `background` backend, while production should use the Redis-backed worker queue. It renders [terraform/main.tf.j2](terraform/main.tf.j2) and can create or destroy AWS resources. It is dry-run by default. Before enabling real execution:

- Create an encrypted S3 backend bucket.
- Create a DynamoDB lock table.
- Replace `TERRAFORM_STATE_BUCKET` and `TERRAFORM_LOCK_TABLE`.
- Use IAM roles with least privilege.
- Review generated plans before production use.
- For production, move the background job behind a durable queue or use Terraform Cloud, Atlantis, GitHub Actions, or Argo Workflows for plan approval and audit history.
- Set `TERRAFORM_JOB_BACKEND=redis` and run the worker with `python -m services.infra_worker`.

Example:

@@ -152,11 +155,16 @@ curl -X POST http://localhost:8000/infrastructure/create \
"cloud_provider": "aws",
"config": {
"aws_region": "us-east-1",
"eks_role_arn": "arn:aws:iam::123456789012:role/EKSRole"
"eks_role_arn": "arn:aws:iam::123456789012:role/EKSClusterRole",
"node_role_arn": "arn:aws:iam::123456789012:role/EKSNodeRole",
"state_bucket": "company-terraform-state",
"lock_table": "company-terraform-locks"
}
}'
```

The initial response will have a status such as `queued`; poll `GET /infrastructure/{id}` for `provisioning`, `ready`, or `failed`. The Helm chart deploys a Terraform worker when `worker.enabled=true`.

## Helm Deployment

Render or install the API chart:
@@ -173,6 +181,8 @@ helm upgrade --install idp-api helm/charts/idp-api \
--set secrets.secretKey='replace-with-long-random-secret'
```

The chart intentionally fails if `image.tag` is empty. Use a release tag or digest rather than `latest`.

## Security

Implemented:
@@ -182,15 +192,19 @@ Implemented:
- Protected infrastructure, deployment, Kubernetes, and monitoring APIs
- Redis-backed rate limiting with local fallback
- Non-root Docker container
- Security headers and restricted CORS origin configuration
- Production startup validation for weak/default `SECRET_KEY`
- Helm defaults for public registration disabled, debug disabled, read-only root filesystem, and dropped Linux capabilities
- Kubernetes RBAC and network-policy examples
- No hardcoded production secret requirement in Helm

Recommended before production:

- Use AWS Secrets Manager, External Secrets Operator, or sealed-secrets.
- Keep public registration disabled unless you add an invite/admin onboarding flow.
- Replace SQLite with managed PostgreSQL.
- Use Alembic migrations.
- Run Terraform through a job queue or workflow engine rather than synchronous HTTP requests.
- Run Terraform through the Redis worker queue, Terraform Cloud, Atlantis, GitHub Actions, or another workflow engine with audit history.
- Enforce tenant-aware namespace ownership.
- Add admission policies with Kyverno or OPA Gatekeeper.
- Use image allowlists and vulnerability scanning.
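The README tells clients to poll `GET /infrastructure/{id}` until the record leaves `queued`/`provisioning`. A minimal client-side loop might look like this, assuming `fetch_status` wraps an authenticated `GET /infrastructure/{id}` request and returns the response's `status` field; that wrapper, the polling interval, and the timeout are assumptions rather than part of this PR.

```python
import time
from typing import Callable

# Statuses after which the create flow stops changing, per the README.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_infrastructure(
    fetch_status: Callable[[int], str],
    infra_id: int,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the record reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status(infra_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"infrastructure {infra_id} still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)
```

Injecting `sleep` keeps the loop testable without real waits. The delete flow uses different statuses (`delete_queued`, `deleting`, `deleted`, `delete_failed`), so a caller polling after `DELETE` would pass its own terminal set.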
3 changes: 3 additions & 0 deletions api/routes/auth.py
@@ -1,5 +1,6 @@
from fastapi import APIRouter, HTTPException, Depends
from sqlalchemy.orm import Session
from app.config import settings
from database.models import User
from auth.jwt_utils import get_password_hash, verify_password, create_access_token
from auth.rbac import get_current_user
@@ -10,6 +11,8 @@

@router.post("/register")
def register(request: RegisterRequest, db: Session = Depends(get_db)):
if not settings.ENABLE_PUBLIC_REGISTRATION:
raise HTTPException(status_code=403, detail="Public registration is disabled")
user = db.query(User).filter(User.username == request.username).first()
if user:
raise HTTPException(status_code=400, detail="Username already registered")
21 changes: 17 additions & 4 deletions api/routes/catalog.py
@@ -6,9 +6,9 @@
{
"id": "nginx-web",
"name": "Nginx web app",
"description": "Official nginx image that serves a default HTTP page on port 80.",
"description": "Official nginx image pinned to a stable tag for a tiny static web app.",
"default_app_name": "nginx-web",
"image": "nginx:1.25",
"image": "nginx:1.25-alpine",
"port": 80,
"replicas": 2,
"min_replicas": 1,
@@ -30,7 +30,7 @@
{
"id": "whoami-api",
"name": "Whoami API",
"description": "Tiny HTTP app that returns request and container details.",
"description": "Tiny HTTP app that returns request metadata; useful for smoke tests.",
"default_app_name": "whoami-api",
"image": "traefik/whoami:v1.10",
"port": 80,
@@ -51,13 +51,26 @@
"max_replicas": 5,
"cpu_threshold": 70,
},
{
"id": "echo-server",
"name": "Echo server",
"description": "Small request/response test service for ingress and header debugging.",
"default_app_name": "echo-server",
"image": "ealen/echo-server:0.9.2",
"port": 80,
"replicas": 2,
"min_replicas": 1,
"max_replicas": 5,
"cpu_threshold": 70,
},
]

IMAGE_CATALOG = [
{"label": "nginx 1.25", "image": "nginx:1.25", "port": 80},
{"label": "nginx 1.25 alpine", "image": "nginx:1.25-alpine", "port": 80},
{"label": "httpd 2.4", "image": "httpd:2.4", "port": 80},
{"label": "traefik whoami", "image": "traefik/whoami:v1.10", "port": 80},
{"label": "nginx hello demo", "image": "nginxdemos/hello:plain-text", "port": 80},
{"label": "echo server", "image": "ealen/echo-server:0.9.2", "port": 80},
]


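The catalog now pins explicit tags, and the Helm section asks for a release tag or digest rather than `latest`. A check along these lines could keep future catalog entries pinned. It is a hypothetical helper with deliberately simplified parsing: it only looks for a tag after the last `/` so that a registry port such as `localhost:5000` is not mistaken for one.

```python
from typing import Dict, List


def is_pinned(image: str) -> bool:
    """True if the reference carries a digest or an explicit non-'latest' tag."""
    if "@sha256:" in image:
        return True
    # Only the final path segment can hold the tag; earlier colons are registry ports.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # untagged references resolve to "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in {"", "latest"}


def unpinned_images(catalog: List[Dict[str, object]]) -> List[str]:
    """List catalog images that fail the pinning check."""
    return [str(entry["image"]) for entry in catalog if not is_pinned(str(entry["image"]))]
```

Every image in the new `IMAGE_CATALOG` passes this check. Note that it only rejects missing and `latest` tags; it cannot tell whether a descriptive tag such as `plain-text` is immutable, which only a digest guarantees.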
58 changes: 37 additions & 21 deletions api/routes/infrastructure.py
@@ -1,61 +1,77 @@
from fastapi import APIRouter, Depends, HTTPException
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from sqlalchemy.orm import Session

from api.schemas import InfrastructureCreateRequest, InfrastructureResponse
from auth.rbac import get_current_user
from database.models import Infrastructure, User
from database.session import get_db
from services.infra_service import provision_infrastructure, destroy_infrastructure
from services.infra_queue import InfrastructureQueueError, enqueue_infrastructure_job
from services.infra_service import validate_infrastructure_config

router = APIRouter()

@router.post("/create", response_model=InfrastructureResponse, status_code=201)

@router.post("/create", response_model=InfrastructureResponse, status_code=202)
def create_infrastructure(
request: InfrastructureCreateRequest,
background_tasks: BackgroundTasks,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user),
):
validation_error = validate_infrastructure_config(request.name, request.cloud_provider, request.config)
if validation_error:
raise HTTPException(status_code=400, detail=validation_error)

infra = Infrastructure(
owner_id=current_user.id,
name=request.name,
cloud_provider=request.cloud_provider,
config=request.config,
status="provisioning",
status="queued",
)
db.add(infra)
db.commit()
db.refresh(infra)
result = provision_infrastructure(request.name, request.cloud_provider, request.config)
if result is True:
infra.status = "ready"
else:
try:
enqueue_infrastructure_job("provision", infra.id, background_tasks)
except InfrastructureQueueError as exc:
infra.status = "failed"
infra.last_error = str(result)
db.add(infra)
db.commit()
db.refresh(infra)
infra.last_error = str(exc)
db.add(infra)
db.commit()
raise HTTPException(status_code=503, detail=str(exc))
return infra


@router.delete("/{id}")
def delete_infrastructure(
id: int,
background_tasks: BackgroundTasks,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user),
):
infra = db.query(Infrastructure).filter(Infrastructure.id == id, Infrastructure.owner_id == current_user.id).first()
if not infra:
raise HTTPException(status_code=404, detail="Infrastructure not found")
result = destroy_infrastructure(infra.name, infra.cloud_provider, infra.config or {})
if result is True:
infra.status = "deleted"
db.add(infra)
db.commit()
return {"id": id, "status": "deleted"}
infra.status = "delete_failed"
infra.last_error = str(result)
if infra.status in {"deleting", "deleted"}:
return {"id": id, "status": infra.status}
if infra.status in {"queued", "provisioning"}:
raise HTTPException(status_code=409, detail="Infrastructure is still provisioning")

infra.status = "delete_queued"
infra.last_error = None
db.add(infra)
db.commit()
raise HTTPException(status_code=500, detail=str(result))
try:
enqueue_infrastructure_job("destroy", infra.id, background_tasks)
except InfrastructureQueueError as exc:
infra.status = "delete_failed"
infra.last_error = str(exc)
db.add(infra)
db.commit()
raise HTTPException(status_code=503, detail=str(exc))
return {"id": id, "status": infra.status}


@router.get("/{id}", response_model=InfrastructureResponse)
def get_infrastructure(
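The rewritten `DELETE /infrastructure/{id}` handler decides what to do from the record's current status before it touches the queue. Pulling the same branch order out as a pure function, shown here purely for illustration (the function name and its return labels are hypothetical, not part of the PR), makes those rules easy to unit-test.

```python
def plan_delete(status: str) -> str:
    """Mirror the branch order of the PR's delete_infrastructure handler."""
    if status in {"deleting", "deleted"}:
        return "return_current"   # respond with the existing status, no new job
    if status in {"queued", "provisioning"}:
        return "conflict"         # handler raises HTTP 409
    return "enqueue_destroy"      # handler sets delete_queued and enqueues "destroy"
```

Read this way, a record already in `delete_queued` falls through to the last branch, so a repeated `DELETE` before the worker picks up the job enqueues a second destroy job.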
18 changes: 9 additions & 9 deletions api/schemas.py
@@ -10,13 +10,13 @@ class TokenResponse(BaseModel):


class RegisterRequest(BaseModel):
username: str = Field(..., min_length=3, max_length=64)
username: str = Field(..., min_length=3, max_length=64, regex=r"^[a-zA-Z0-9_.-]+$")
password: str = Field(..., min_length=8, max_length=128)


class LoginRequest(BaseModel):
username: str
password: str
username: str = Field(..., min_length=3, max_length=64)
password: str = Field(..., min_length=8, max_length=128)


class InfrastructureCreateRequest(BaseModel):
@@ -92,16 +92,16 @@ class ServiceExposeRequest(BaseModel):


class IngressRequest(BaseModel):
namespace: str
name: str
service_name: str
namespace: str = Field(..., min_length=3, max_length=63, regex=r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
name: str = Field(..., min_length=3, max_length=63, regex=r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
service_name: str = Field(..., min_length=3, max_length=63, regex=r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
service_port: int = Field(..., ge=1, le=65535)
host: str
host: str = Field(..., min_length=3, max_length=253, regex=r"^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$")


class AutoscalingRequest(BaseModel):
namespace: str = "default"
deployment: str
namespace: str = Field(default="default", min_length=3, max_length=63, regex=r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
deployment: str = Field(..., min_length=3, max_length=63, regex=r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
min_replicas: int = Field(..., ge=1)
max_replicas: int = Field(..., ge=1)
cpu_threshold: int = Field(..., ge=10, le=95)
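The new `Field(..., regex=...)` constraints (Pydantic v1 syntax; Pydantic v2 renamed the argument to `pattern`) follow the Kubernetes DNS-1123 label format, with a tighter 3-character minimum than Kubernetes itself requires. A quick standard-library check of the same pattern and length bounds might look like this; `is_valid_label` is a hypothetical name. It uses `re.fullmatch` because in Python a bare `$` also matches just before a trailing newline.

```python
import re

# Same pattern the PR applies to namespace, name, service_name and deployment.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def is_valid_label(value: str, min_length: int = 3, max_length: int = 63) -> bool:
    """Length bounds from the schemas plus a full-string DNS-1123 label match."""
    if not min_length <= len(value) <= max_length:
        return False
    return DNS1123_LABEL.fullmatch(value) is not None
```

`host` uses a separate pattern that also permits dots, so it is not covered by this helper.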
35 changes: 33 additions & 2 deletions app/config.py
@@ -1,5 +1,10 @@
from functools import lru_cache
from pydantic import BaseSettings, Field
from pydantic import BaseSettings, Field, validator


PRODUCTION_ENVIRONMENTS = {"production", "prod", "staging"}
DEFAULT_SECRET_KEY = "change-me-in-production"


class Settings(BaseSettings):
PROJECT_NAME: str = "Cloud Infrastructure Provisioning API"
@@ -8,19 +13,45 @@ class Settings(BaseSettings):
DEBUG: bool = Field(default=True, env="APP_DEBUG")
DATABASE_URL: str = Field(default="sqlite:///./idp.db", env="DATABASE_URL")
REDIS_URL: str = Field(default="redis://localhost:6379/0", env="REDIS_URL")
SECRET_KEY: str = Field(default="change-me-in-production", env="SECRET_KEY")
SECRET_KEY: str = Field(default=DEFAULT_SECRET_KEY, env="SECRET_KEY")
JWT_ALGORITHM: str = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES: int = Field(default=60, env="ACCESS_TOKEN_EXPIRE_MINUTES")
ALLOWED_ORIGINS: str = Field(default="http://localhost:8000,http://127.0.0.1:8000", env="ALLOWED_ORIGINS")
ENABLE_PUBLIC_REGISTRATION: bool = Field(default=True, env="ENABLE_PUBLIC_REGISTRATION")
AWS_REGION: str = Field(default="us-east-1", env="AWS_REGION")
DEFAULT_INGRESS_DOMAIN: str = Field(default="apps.local", env="DEFAULT_INGRESS_DOMAIN")
KUBERNETES_NAMESPACE_PREFIX: str = Field(default="tenant", env="KUBERNETES_NAMESPACE_PREFIX")
KUBERNETES_DRY_RUN: bool = Field(default=False, env="KUBERNETES_DRY_RUN")
TERRAFORM_DRY_RUN: bool = Field(default=True, env="TERRAFORM_DRY_RUN")
TERRAFORM_STATE_BUCKET: str = Field(default="replace-me-terraform-state", env="TERRAFORM_STATE_BUCKET")
TERRAFORM_LOCK_TABLE: str = Field(default="replace-me-terraform-locks", env="TERRAFORM_LOCK_TABLE")
TERRAFORM_JOB_BACKEND: str = Field(default="background", env="TERRAFORM_JOB_BACKEND")
TERRAFORM_JOB_REDIS_URL: str = Field(default="redis://localhost:6379/1", env="TERRAFORM_JOB_REDIS_URL")
TERRAFORM_JOB_QUEUE_NAME: str = Field(default="terraform-jobs", env="TERRAFORM_JOB_QUEUE_NAME")
RATE_LIMIT_REQUESTS_PER_HOUR: int = Field(default=100, env="RATE_LIMIT_REQUESTS_PER_HOUR")
REQUIRE_AUTH_FOR_PLATFORM_APIS: bool = Field(default=True, env="REQUIRE_AUTH_FOR_PLATFORM_APIS")

@validator("SECRET_KEY")
def require_strong_secret_for_non_local(cls, value, values):
environment = values.get("ENVIRONMENT", "local").lower()
if environment in PRODUCTION_ENVIRONMENTS and (value == DEFAULT_SECRET_KEY or len(value) < 32):
raise ValueError("SECRET_KEY must be changed to a random value of at least 32 characters")
return value

@validator("TERRAFORM_JOB_BACKEND")
def validate_terraform_job_backend(cls, value):
if value not in {"background", "redis"}:
raise ValueError("TERRAFORM_JOB_BACKEND must be either 'background' or 'redis'")
return value

@property
def allowed_origins_list(self) -> list[str]:
return [origin.strip() for origin in self.ALLOWED_ORIGINS.split(",") if origin.strip()]

@property
def is_production_like(self) -> bool:
return self.ENVIRONMENT.lower() in PRODUCTION_ENVIRONMENTS

class Config:
env_file = ".env"
case_sensitive = True
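The new `SECRET_KEY` validator rejects weak keys only when `ENVIRONMENT` is production-like. Pydantic v1 fills `values` with fields validated earlier in declaration order, so the check works only if `ENVIRONMENT` is declared above `SECRET_KEY` (that declaration falls outside the hunk shown); otherwise `values.get` returns the `"local"` fallback. The rules are restated below as plain functions for testing. This is a hypothetical sketch, not the PR's code; `generate_secret_key` is an illustrative name built on the standard-library `secrets` module.

```python
import secrets
from typing import List, Optional

PRODUCTION_ENVIRONMENTS = {"production", "prod", "staging"}
DEFAULT_SECRET_KEY = "change-me-in-production"


def secret_key_error(environment: str, secret_key: str) -> Optional[str]:
    """Return the validator's error message, or None when the key is acceptable."""
    if environment.lower() not in PRODUCTION_ENVIRONMENTS:
        return None  # local/dev environments may keep the insecure default
    if secret_key == DEFAULT_SECRET_KEY or len(secret_key) < 32:
        return "SECRET_KEY must be changed to a random value of at least 32 characters"
    return None


def parse_origins(raw: str) -> List[str]:
    """Same splitting as Settings.allowed_origins_list: trim entries, drop blanks."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


def generate_secret_key() -> str:
    """32 random bytes, URL-safe base64 encoded (43 characters)."""
    return secrets.token_urlsafe(32)
```

Keeping the rules in plain functions like these would let the validator and any startup checks share one tested implementation.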