Skip to content

Block private and loopback dials in webhook HTTP client#5190

Open
JAORMX wants to merge 2 commits into
mainfrom
pr/webhook-ssrf-fix
Open

Block private and loopback dials in webhook HTTP client#5190
JAORMX wants to merge 2 commits into
mainfrom
pr/webhook-ssrf-fix

Conversation

@JAORMX

@JAORMX JAORMX commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Summary

Follow-up to the round-2 review of #4564 (item #5). A tenant with rights to create MCPWebhookConfig could previously point url at any HTTPS endpoint, including 169.254.169.254 (cloud metadata), 127.0.0.1, RFC1918 ranges, and IPv6 loopback / link-local / ULA addresses. The webhook HTTP transport built a bare http.Transport with no DialContext; the wrapping networking.ValidatingTransport only checks the URL scheme — it does not validate the resolved peer address — so cross-tenant access to cloud metadata or in-cluster services was unblocked.

This PR wires networking.ProtectedDialerControl into the inner transport's DialContext so private, loopback, and link-local destinations are rejected at dial time, regardless of whether ValidatingTransport's HTTPS check is bypassed via InsecureAllowHTTP. The hook is held in an atomic.Pointer so test overrides remain race-free if a future test introduces t.Parallel().

protectedDialerControl is exported as ProtectedDialerControl in pkg/networking so the webhook package can install it without duplicating the body. Cross-package test injection (SetDialerControlForTesting, SetDialerControlForTestMain, AllowAnyDialerControl) is added so the existing httptest-based suite in pkg/webhook and its subpackages keeps working; subpackage TestMain functions install the permissive override at suite startup.

Type of change

  • Bug fix

Test plan

  • Unit tests (task testpkg/webhook/... and pkg/networking/... pass with -race)
  • Linting (scoped golangci-lint reports 0 issues)
  • New TestClientSSRFGuardBlocksPrivateAddress covers IPv4 loopback, RFC1918, and 169.254.169.254, plus IPv6 loopback (::1), link-local (fe80::1), and ULA (fc00::1).
  • New TestBuildTransportInstallsDialerGuard asserts DialContext is wired on the inner transport.

Special notes for reviewers

  • The three exported test helpers in pkg/webhook/dialer_testing.go add public surface area in service of cross-package test injection. A reviewer raised the option of moving them to a pkg/webhook/internal/dialer subpackage; that's a viable follow-up but feels like scope creep here. Each helper has a Production code MUST NOT call this function deterrent in its godoc and the testing.TB argument signals test-scoped intent.
  • Item Figure out ergonomics for exposing directories #6 (InsecureSkipVerify conflates cert-skip with plaintext-HTTP) and item Implement secret injection #5's broader IPv6 coverage in pkg/networking itself are tracked separately.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the size/S Small PR: 100-299 lines changed label May 5, 2026
@JAORMX JAORMX force-pushed the pr/webhook-ssrf-fix branch from f1ad404 to a48d30f Compare May 5, 2026 10:16
@github-actions github-actions Bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels May 5, 2026
jhrozek
jhrozek previously approved these changes May 13, 2026
@github-actions github-actions Bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels May 18, 2026
A tenant with rights to create MCPWebhookConfig could previously point
url at any HTTPS endpoint, including 169.254.169.254, 127.0.0.1, RFC1918
ranges, and IPv6 loopback or link-local addresses. The webhook HTTP
transport built a bare http.Transport with no DialContext; the wrapping
networking.ValidatingTransport only checks the URL scheme, not the
resolved peer address, so cross-tenant access to cloud metadata or
in-cluster services was unblocked.

Wire networking.ProtectedDialerControl into the inner transport's
DialContext so private, loopback, and link-local destinations are
rejected at dial time, regardless of whether ValidatingTransport's
HTTPS check is bypassed via InsecureAllowHTTP. The hook is held in an
atomic.Pointer so test overrides remain race-free if a future test
introduces t.Parallel().

Export protectedDialerControl as ProtectedDialerControl in pkg/networking
so the webhook package can install it without duplicating the body.

Add cross-package test injection (SetDialerControlForTesting,
SetDialerControlForTestMain, AllowAnyDialerControl) so existing
httptest-based tests in pkg/webhook and its subpackages continue to
work; subpackage TestMain functions install the permissive override
at suite startup. The pkg/runner webhook middleware integration test
installs the same override, since its production webhook clients dial
httptest servers on 127.0.0.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the pr/webhook-ssrf-fix branch from 3fa55f1 to cb9b4ad Compare May 29, 2026 11:11
@JAORMX JAORMX requested a review from lujunsan as a code owner May 29, 2026 11:11
@github-actions github-actions Bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels May 29, 2026
@codecov

codecov Bot commented May 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.84%. Comparing base (6f63ac0) to head (cb9b4ad).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5190      +/-   ##
==========================================
+ Coverage   68.77%   68.84%   +0.07%     
==========================================
  Files         629      630       +1     
  Lines       63914    63931      +17     
==========================================
+ Hits        43955    44012      +57     
+ Misses      16703    16664      -39     
+ Partials     3256     3255       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX requested a review from jhrozek May 29, 2026 11:32

@jhrozek jhrozek left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One non-blocking design note from reviewing the SSRF fix. The change itself is correct and closes a real gap — this is just a thought on where the plumbing could live longer-term.

Comment thread pkg/webhook/client.go Outdated
Control: func(network, address string, c syscall.RawConn) error {
return (*dialerControl.Load())(network, address, c)
},
}).DialContext,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking / follow-up thought, not for this PR.

This hand-builds a transport with its own dialer guard, but pkg/networking.HttpClientBuilder already does almost exactly this: it installs ProtectedDialerControl and gives tests a clean opt-out via WithPrivateIPs(true) — no package-level global, no atomic.Pointer, and no exported test helpers. The bespoke dialerControl global (plus dialer_testing.go, which is the only *_testing.go file in the repo) is really what forces us to export the guard-disabling helpers.

As far as I can tell, only two gaps stop webhook from just using the builder today:

  • mTLS client certs (ClientCertPath/ClientKeyPath)
  • a tls.Config.InsecureSkipVerify knob distinct from the HTTP-allow knob

So a possible follow-up (bigger than this PR): add WithClientCert(certPath, keyPath) and WithInsecureSkipVerify(bool) to HttpClientBuilder, then have webhook call the builder and delete the global + dialer_testing.go entirely. Happy to file an issue for it rather than expand scope here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the builder is clearly the better long-term home for this, and folding webhook into it would let us delete the global dialerControl pointer and dialer_testing.go entirely. I'll file a follow-up issue for adding WithClientCert/WithInsecureSkipVerify to HttpClientBuilder and migrating webhook onto it. This is non-blocking for the SSRF fix here, so no changes needed in this PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: went ahead and did this in the PR rather than filing a follow-up. Added WithClientCert/WithInsecureSkipVerify to HttpClientBuilder, migrated buildTransport to build the client through it, and replaced the dialerControl global/atomic.Pointer/dialer_testing.go machinery with a plain atomic.Bool test-only flag. All existing tests (including the SSRF guard integration test) pass unchanged.

Webhook hand-rolled its own transport with a package-level dialerControl
global (an atomic.Pointer to an arbitrary Control function) duplicating
SSRF-guard logic that pkg/networking.HttpClientBuilder already provides.
Add the two capabilities the builder was missing (mTLS client certs and
InsecureSkipVerify) so webhook can build its client through it instead
of maintaining a parallel implementation. The remaining test-only
private-IP override is now a plain atomic.Bool instead of an injectable
function pointer.
@JAORMX JAORMX requested a review from rdimitrov as a code owner July 3, 2026 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants