Skip to content

feat: add onCompletion webhook hooks to TaskSpawner#1240

Open
j-bennet wants to merge 3 commits into
kelos-dev:mainfrom
datagravity-ai:feat/on-completion-hook-upstream
Open

feat: add onCompletion webhook hooks to TaskSpawner#1240
j-bennet wants to merge 3 commits into
kelos-dev:mainfrom
datagravity-ai:feat/on-completion-hook-upstream

Conversation

@j-bennet

@j-bennet j-bennet commented May 29, 2026

Copy link
Copy Markdown

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a new spec.onCompletion field to TaskSpawnerSpec that configures outbound HTTP webhook notifications when spawned Tasks reach terminal phases (Succeeded or Failed). This enables push-based alerting to external systems (Slack incoming webhooks, PagerDuty, custom dashboards) without requiring users to poll the Kubernetes API or set up custom Prometheus alerting rules.

Key design decisions:

  • Hooks config is serialized into a Task annotation (kelos.dev/on-completion) at spawn time so the reporting loop can dispatch without looking up the TaskSpawner
  • Webhook dispatch runs in the existing reporting loop (not inline in the task controller) to keep reconciliation fast
  • Phase filtering defaults to both Succeeded and Failed when not specified
  • Idempotency via kelos.dev/webhook-report-phase annotation prevents duplicate deliveries
  • Custom headers from a referenced Secret — all keys in the Secret are sent as HTTP headers (e.g. Authorization, X-Api-Key)
  • SSRF protection: HTTPS-only CRD validation, runtime IP blocklist (IPv4/IPv6 private ranges), DNS resolution of hostnames, and redirect guard on all HTTP clients

Which issue(s) this PR is related to:

Fixes #749

Special notes for your reviewer:

This is Phase 1 of the incremental adoption path described in #749. Future phases will add built-in Slack notification formatting and consolidate GitHubReporting into the same framework.

The webhook payload includes: task name, namespace, spawner name, phase, message, agent type, model, start/completion times, outputs, and results (cost/tokens).

The secretRef field references a Secret whose keys are used as HTTP request headers. For example, a Secret with keys Authorization and X-Api-Key will set both headers on the webhook request.

Does this PR introduce a user-facing change?

Add `spec.onCompletion` field to TaskSpawner for configuring outbound webhook notifications when tasks reach terminal phases (Succeeded/Failed). Supports phase filtering and custom HTTP headers via secretRef (all Secret keys are sent as request headers).

Summary by cubic

Adds spec.onCompletion to TaskSpawner to send HTTPS webhooks when spawned Tasks finish (Succeeded/Failed). This enables push alerts to Slack/PagerDuty/custom endpoints without polling.

  • New Features

    • Configure one or more webhooks with optional phase filters (default: both).
    • Hooks are stamped onto each Task via kelos.dev/on-completion and dispatched from the reporting loop; delivery deduped with kelos.dev/webhook-report-phase using merge-patch retries.
    • Custom headers from a Secret are supported (all Secret keys are sent as headers).
    • Payload includes task/namespace/spawner, phase/message, agent type, model, start/completion times, outputs, and results.
  • Bug Fixes

    • Idempotency: persist the report-phase even when some hooks fail to prevent duplicate deliveries on subsequent cycles.
    • Reliability and security: run webhook reporting before source polling; reject URLs with embedded credentials; allow '@' in URL paths; strengthen SSRF guards (HTTPS-only, private/loopback/link-local and 0.0.0.0/8 IP blocks, DNS and redirect checks), and cache the SSRF-wrapped HTTP client.
    • Observability: log a warning and fall back to the default transport when a custom HTTP transport cannot be wrapped for SSRF protection.

Written for commit c8898c5. Summary will update on new commits.

Review in cubic

Add a new `spec.onCompletion` field to TaskSpawnerSpec that configures
outbound HTTP webhook notifications when spawned Tasks reach terminal
phases (Succeeded or Failed).

Key design:
- Hooks config is serialized into a Task annotation at spawn time so the
  reporting loop can dispatch without looking up the TaskSpawner
- SSRF protection: HTTPS-only, runtime IP blocklist (IPv4/IPv6 private
  ranges), DNS resolution of hostnames, and redirect guard
- Idempotency via annotation with conflict-retrying merge patch
- All keys in the referenced Secret are sent as HTTP headers
- Phase filtering defaults to both Succeeded and Failed

Fixes kelos-dev#749

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 9 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread internal/reporting/webhook_test.go
Comment thread cmd/kelos-spawner/reconciler.go Outdated
Comment thread internal/reporting/webhook.go Outdated
Comment thread internal/reporting/webhook.go
Comment thread cmd/kelos-spawner/main.go
Comment thread api/v1alpha1/taskspawner_types.go Outdated
- Fix partial hook failure causing duplicate deliveries: persist the
  idempotency annotation even on partial failure to prevent re-delivery
  to hooks that already succeeded
- Run webhook reporting before source-polling (not in defer) to avoid
  operating on a potentially-cancelled context
- Fix CRD URL pattern to only block @ in the authority (host) portion,
  allowing legitimate @ in URL paths (e.g. MS Teams webhooks)
- Fix ssrfSafeTransport to cleanly fall back when base is not
  *http.Transport, with extracted ssrfDialContext function
- Cache the SSRF-wrapped HTTP client in WebhookReporter to avoid
  per-call transport cloning
- Add 0.0.0.0/8 to private IP ranges (routes to loopback on Linux)
- Guard against empty DNS result panic in ssrfDialContext
- Move privateRanges to package-level var to avoid per-call allocation
- Add missing onCompletionAnnotation call to stamp hooks onto tasks
- Reject URLs with embedded credentials (userinfo) at both CRD and
  runtime validation layers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 7 files (changes from recent commits).

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread internal/reporting/webhook.go
When the injected HTTP client uses a non-standard RoundTripper that
cannot be type-asserted to *http.Transport, emit a log message so
operators know the SSRF transport falls back to DefaultTransport
settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@j-bennet

j-bennet commented Jun 3, 2026

Copy link
Copy Markdown
Author

I would appreciate if you could take a look, @gjkim42. Thank you.

@knechtionscoding

Copy link
Copy Markdown
Contributor

Bump @gjkim42 (assuming this is something we want, I don't think it adds a large amount of complexity or surface area)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Categorizes issue or PR as related to a new feature needs-actor needs-priority needs-triage release-note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API: Add onCompletion notification hooks to TaskSpawner for outbound event delivery on task terminal phases

2 participants