
fix security: feat(resources): harden HTTP resource ingestion against private-network SSRF #1133

Open
13ernkastel wants to merge 3 commits into volcengine:main from 13ernkastel:security/http-resource-ssrf-guard

13ernkastel commented Mar 31, 2026

Summary

This change closes an authenticated SSRF path in the HTTP resource ingestion flow. Before this patch, /api/v1/resources accepted arbitrary remote URLs, the parser stack issued server-side HEAD and GET requests with redirects enabled, and the fetched content could then be read back through the normal content APIs. A low-privilege API caller could abuse this to reach loopback, RFC 1918, link-local, or cloud metadata services accessible from the OpenViking host.

CVSS v3.1: 8.8 High (CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:L/A:L)
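The severity line can be cross-checked against the CVSS v3.1 base-score equations from the FIRST specification. The calculator below is a minimal sketch written for that purpose (base_score and roundup are illustrative names, not OpenViking code). Evaluated on the vector as written, it returns 9.1 (Critical) rather than 8.8, so either the score or one of the metrics (for example S:C versus S:U) may need revisiting.

```python
import math

# CVSS v3.1 base metric weights, per the FIRST CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal, using the spec's float-safe Roundup definition."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a full base vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    scope = metrics["S"]

    # Impact Sub-Score combines confidentiality, integrity and availability.
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))
```

For comparison, the same code scores CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H at 8.8.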

Root Cause

  • Remote-target validation only distinguished URL input from direct filesystem paths.
  • The HTML fetch path performed outbound requests without checking whether the destination resolved to a private or otherwise non-public address.
  • Permission-style security rejections in the parser path could be swallowed as parse failures instead of surfacing as hard API errors.
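The third root cause is a common exception-handling pitfall. The following minimal sketch contrasts a catch-all handler with the re-raise approach described for resource_processor.py. It assumes a base OpenVikingError type, as named in the PR; PermissionDeniedError, fetch, and the two process_resource_* functions are hypothetical stand-ins.

```python
import logging

logger = logging.getLogger(__name__)


class OpenVikingError(Exception):
    """Stand-in for the framework's base error type (name taken from the PR)."""


class PermissionDeniedError(OpenVikingError):
    """Hypothetical subclass representing a PERMISSION_DENIED rejection."""


def fetch(url: str) -> str:
    # Hypothetical fetcher: pretend the network guard blocks loopback targets.
    if url.startswith("http://127."):
        raise PermissionDeniedError(f"blocked non-public target: {url}")
    return "<html>ok</html>"


def process_resource_before(url: str) -> dict:
    try:
        return {"content": fetch(url), "warnings": []}
    except Exception as exc:  # too broad: a security rejection becomes a warning
        return {"content": None, "warnings": [f"parse failed: {exc}"]}


def process_resource_after(url: str) -> dict:
    try:
        return {"content": fetch(url), "warnings": []}
    except OpenVikingError:
        raise  # framework errors (e.g. permission denials) reach the API layer
    except Exception as exc:
        logger.warning("parse failed for %s: %s", url, exc)
        return {"content": None, "warnings": [f"parse failed: {exc}"]}
```

With the catch-all version, the API caller only sees a degraded parse result. With the re-raise version, the request fails with a structured permission error.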

What Changed

  • Added openviking/utils/network_guard.py to extract destination hosts, resolve them, reject non-public addresses, and build per-request httpx validation hooks.
  • Enforced public-target validation in openviking/server/local_input_guard.py and enabled it for the HTTP /api/v1/resources route.
  • Added enforce_public_remote_targets in openviking/service/resource_service.py so the service injects request validation into the parser chain and preserves the enforcement flag for watch-based reprocessing.
  • Threaded the request validator through openviking/utils/media_processor.py into openviking/parse/parsers/html.py so URL detection, HTML fetches, downloads, redirects, and proxy inheritance are all checked consistently.
  • Updated openviking/utils/resource_processor.py to re-raise OpenVikingError so blocked requests terminate as structured permission failures instead of degrading into parse warnings.
  • Added regression coverage in tests/server/test_api_local_input_security.py for loopback HTTP targets, private git/SSH targets, and parser-level enforcement.

Simple PoC (localhost-only, pre-patch behavior)

This is a controlled reproduction against a local test instance only. It demonstrates the SSRF primitive without targeting cloud metadata or real internal services.

  1. Start a loopback-only HTTP server that returns a unique token.
mkdir -p /tmp/ov-ssrf-poc
printf 'SSRF_PROOF_TOKEN_9f2d1b\n' > /tmp/ov-ssrf-poc/index.html
python3 -m http.server 8765 --bind 127.0.0.1 --directory /tmp/ov-ssrf-poc
  2. Ask OpenViking to ingest that loopback URL.
curl -X POST http://localhost:1933/api/v1/resources \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key" \
  -d '{
    "path": "http://127.0.0.1:8765/",
    "wait": true,
    "reason": "localhost-only ssrf reproduction"
  }'
  3. Use the returned root_uri to inspect the imported tree and then read back the stored content.
curl -X GET "http://localhost:1933/api/v1/fs/tree?uri=<root_uri>" \
  -H "X-API-Key: your-key"
curl -X GET "http://localhost:1933/api/v1/content/read?uri=<stored_uri>" \
  -H "X-API-Key: your-key"

Before this patch, the content APIs can return SSRF_PROOF_TOKEN_9f2d1b, proving that the server fetched a loopback-only resource and exposed the response through normal ingestion APIs. After this patch, the initial POST /api/v1/resources request is rejected with PERMISSION_DENIED.

Validation

  • uv run --no-project --python 3.12 python -m py_compile openviking/utils/network_guard.py openviking/server/local_input_guard.py openviking/server/routers/resources.py openviking/service/resource_service.py openviking/utils/media_processor.py openviking/utils/resource_processor.py openviking/parse/parsers/html.py tests/server/test_api_local_input_security.py
  • Local dynamic verification against the parser path showed that the vulnerable flow could previously send HEAD and GET requests to a loopback-only HTTP server and retrieve a unique response token.
  • After this patch, the same loopback target is blocked at precheck, URL detection, and fetch time with PERMISSION_DENIED, and the loopback server receives no requests.
  • A direct pytest run was not completed because the local runtime environment is incomplete: importing openviking currently requires additional dependencies such as requests and the bundled AGFS setup.
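The dynamic verification above (the loopback server returns the token before the patch and receives no requests after it) can be reproduced in miniature with the standard library. This is a sketch under stated assumptions: guarded_fetch is a hypothetical stand-in for the patched fetch path that accepts only public IP literals, and CountingHandler and run_check are illustrative names. The PoC token is reused.

```python
import http.server
import ipaddress
import threading
import urllib.request
from urllib.parse import urlsplit

TOKEN = "SSRF_PROOF_TOKEN_9f2d1b"


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Loopback-only server that records every request it receives."""

    hits: list[str] = []

    def do_GET(self) -> None:
        self.hits.append(self.path)
        body = f"{TOKEN}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # silence the default stderr access log


# Direct opener that ignores HTTP(S)_PROXY so loopback traffic stays local.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def guarded_fetch(url: str) -> str:
    """Hypothetical guarded fetch: only public IP literals are allowed here."""
    host = urlsplit(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        raise PermissionError(f"sketch only accepts IP literals: {url}") from None
    if not ip.is_global:
        raise PermissionError(f"PERMISSION_DENIED: non-public target {url}")
    with OPENER.open(url, timeout=5) as resp:
        return resp.read().decode()


def run_check() -> tuple[str, bool, int]:
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    try:
        # Pre-patch behaviour: an unguarded fetch reaches the loopback server.
        with OPENER.open(url, timeout=5) as resp:
            unguarded_body = resp.read().decode()
        hits_before = len(CountingHandler.hits)
        blocked = False
        try:
            guarded_fetch(url)
        except PermissionError:
            blocked = True
        # Post-patch behaviour: the guard rejects before any request is sent.
        return unguarded_body, blocked, len(CountingHandler.hits) - hits_before
    finally:
        server.shutdown()
        server.server_close()
```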

Follow-up Hardening

  • Extend the same transport-level validation to non-httpx repository fetchers such as the GitHub ZIP download path and git clone, so repository ingestion gets equivalent redirect and proxy protections.
  • Consider connection-time IP pinning if the project wants stronger resistance to DNS rebinding between validation and connect.
  • If controlled intranet ingestion is a legitimate product requirement, gate it behind an explicit administrator allowlist rather than implicit access to private networks.
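The last two follow-up items can be combined: resolve once, validate every answer (allowing private ranges only when an administrator has listed them), and hand the caller the exact address to dial. This is a minimal sketch under those assumptions; pin_destination and the allowlist parameter are hypothetical, and the built-in PermissionError stands in for the project's PERMISSION_DENIED error.

```python
import ipaddress
import socket
from collections.abc import Iterable


def _address_allowed(ip, networks) -> bool:
    # Normalise IPv4-mapped IPv6 so ::ffff:10.0.0.1 is judged as 10.0.0.1.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # An explicit administrator allowlist overrides the default public-only rule.
    if any(ip in net for net in networks):
        return True
    return ip.is_global and not ip.is_multicast


def pin_destination(host: str, port: int, allowlist: Iterable[str] = ()) -> tuple[str, int]:
    """Resolve host once, validate every answer, and return the address to dial."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowlist]
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    blocked = [str(ip) for ip in addresses if not _address_allowed(ip, networks)]
    if not addresses or blocked:
        raise PermissionError(f"{host!r} resolves to disallowed address(es): {blocked}")
    # The caller should connect to this exact IP rather than re-resolving host,
    # so a second DNS answer cannot swap in a private address after validation.
    return str(addresses[0]), port
```

Dialing a pinned IP while still sending the original hostname in the Host header and TLS SNI generally requires transport-level customization in the HTTP client, which makes this change more invasive than the hook-based checks.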

Code Walkthrough

  1. openviking/utils/network_guard.py
    Introduces the destination-host parser, DNS resolution checks, non-public address rejection, and reusable httpx request hooks.
  2. openviking/server/local_input_guard.py and openviking/server/routers/resources.py
    Keep the existing remote-input contract but now require public remote targets for server-side resource ingestion.
  3. openviking/service/resource_service.py
    Adds enforce_public_remote_targets, re-validates remote sources, injects the request validator into parser kwargs, and preserves the boolean flag for watch-task reprocessing.
  4. openviking/utils/media_processor.py and openviking/parse/parsers/html.py
    Propagate the validator into URL detection and fetch helpers so outbound HEAD and GET requests, redirects, and proxy inheritance are checked consistently.
  5. openviking/utils/resource_processor.py
    Re-raises framework security errors instead of flattening them into parse warnings.
  6. tests/server/test_api_local_input_security.py
    Adds regression tests that fail if loopback or private-network fetches become reachable again through the HTTP resource ingestion path.
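Point 4 of the walkthrough is that validation must run on every outbound hop, not only on the submitted URL. The PR does this with httpx request hooks. As a dependency-free analogue (a swapped-in technique, not the PR's code), the standard library's urllib.request lets a custom HTTPRedirectHandler re-check each redirect target, and ProxyHandler({}) stops proxy settings from being inherited from the environment. Apart from the urllib classes, all names below are hypothetical.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


def is_public_url(url: str) -> bool:
    """Compact public-target check for IP literals and DNS names."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
        except socket.gaierror:
            return False
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    for ip in addresses:
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        if not ip.is_global or ip.is_multicast:
            return False
    return True


class PublicOnlyRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-validate every redirect hop before urllib follows it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # urllib has already made newurl absolute relative to req.full_url.
        if not is_public_url(newurl):
            raise PermissionError(f"redirect to non-public target blocked: {newurl}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_public_opener() -> urllib.request.OpenerDirector:
    # ProxyHandler({}) disables HTTP(S)_PROXY inheritance from the environment.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}), PublicOnlyRedirectHandler()
    )


def fetch_public(url: str, timeout: float = 10.0) -> bytes:
    # Validate the submitted URL first; redirects are checked by the handler.
    if not is_public_url(url):
        raise PermissionError(f"non-public target blocked: {url}")
    with build_public_opener().open(url, timeout=timeout) as resp:
        return resp.read()
```

The same shape applies to HEAD probes used for URL detection, since they go through the same opener and redirect handler.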

13ernkastel marked this pull request as ready for review March 31, 2026 13:08
@github-actions

Failed to generate code suggestions for PR


13ernkastel changed the title from "Harden HTTP resource ingestion against private-network SSRF" to "security: feat(resources): harden HTTP resource ingestion against private-network SSRF" on Mar 31, 2026
13ernkastel changed the title from "security: feat(resources): harden HTTP resource ingestion against private-network SSRF" to "fix security: feat(resources): harden HTTP resource ingestion against private-network SSRF" on Apr 2, 2026
Labels

None yet

Projects

Status: Backlog
