security: validate Hydra _target_ before instantiate() to prevent ACE (CWE-913) #40

Open
Allen930311 wants to merge 1 commit into SapienzaNLP:main from Allen930311:fix/hydra-instantiate-arbitrary-code-execution

@Allen930311

Summary

hydra.utils.instantiate() is called in multiple places with config loaded directly from HuggingFace Hub (a config.yaml fully controlled by the model author). A malicious model can set _target_ to any Python callable — e.g. os.system, builtins.exec, or torch.hub.load pointing to an attacker-controlled GitHub repo — achieving arbitrary code execution on the machine that loads the model.

Affected call sites (all reachable via Relik.from_pretrained("attacker/model")):

| File | Function | Line |
| --- | --- | --- |
| `relik/inference/utils.py` | `_instantiate_index()` | ~222 |
| `relik/inference/utils.py` | `_instantiate_retriever()` | ~42 |
| `relik/inference/utils.py` | `load_reader()` | ~371 |
| `relik/inference/annotator.py` | `Relik.from_pretrained()` | ~775 |
| `relik/retriever/indexers/base.py` | `BaseDocumentIndex.from_pretrained()` | ~549 |

Same vulnerability class as CVE-2025-23304 (NeMo) and CVE-2026-22584 (Uni2TS).

Fix

Added _validate_hydra_target(config) in relik/inference/utils.py, which raises ValueError for any top-level _target_ that does not start with the relik. prefix. It is called before every hydra.utils.instantiate() invocation in all affected files.

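from omegaconf import DictConfig, OmegaConf

# Only targets under these module prefixes may be instantiated.
_SAFE_HYDRA_PREFIXES = ("relik.",)

def _validate_hydra_target(config: DictConfig) -> None:
    """Raise ValueError if the top-level _target_ is outside the allowlist."""
    # A config without a _target_ key is left alone (select returns None).
    target = OmegaConf.select(config, "_target_", default=None)
    if target is not None and not any(
        target.startswith(p) for p in _SAFE_HYDRA_PREFIXES
    ):
        raise ValueError(
            f"Unsafe Hydra _target_ '{target}': only targets within "
            f"{_SAFE_HYDRA_PREFIXES} are permitted."
        )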
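
The validator above inspects only the top-level _target_ key. Hydra's instantiate() processes nested nodes recursively by default, so a stricter variant could walk the whole config and check every _target_ it finds. The following is a minimal sketch of that idea and is not part of this PR. It assumes the config has already been converted to plain dicts and lists (for example with OmegaConf.to_container), and the names SAFE_PREFIXES, iter_targets, and validate_all_targets are hypothetical.

```python
from typing import Any, Iterator

# Assumption: the same allowlist as the PR's _SAFE_HYDRA_PREFIXES.
SAFE_PREFIXES = ("relik.",)


def iter_targets(node: Any) -> Iterator[Any]:
    """Yield every `_target_` value found anywhere in a nested dict/list config."""
    if isinstance(node, dict):
        if "_target_" in node:
            yield node["_target_"]
        for value in node.values():
            yield from iter_targets(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_targets(item)


def validate_all_targets(config: Any) -> None:
    """Hypothetical helper: raise ValueError on the first disallowed `_target_`."""
    for target in iter_targets(config):
        # Non-string targets are rejected too; str.startswith accepts a tuple.
        if not isinstance(target, str) or not target.startswith(SAFE_PREFIXES):
            raise ValueError(f"Unsafe Hydra _target_ {target!r}")
```

With this variant, a config whose top-level target is under relik. but which carries a nested _target_: os.system in one of its arguments would be rejected as well.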

Test plan

  • Relik.from_pretrained("legit/model") with a valid relik.* target continues to work
  • Config with _target_: os.system raises ValueError before hydra.utils.instantiate is called
  • Config with _target_: torch.hub.load raises ValueError (blocks the torch.hub.load ACE bypass)
  • All existing tests pass
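
The config-level cases in this plan could be checked along the following lines. This is only a sketch: validate_target is a dependency-free stand-in that mirrors the logic of _validate_hydra_target on a plain mapping rather than a DictConfig, and is_rejected and the relik.reader.SomeReader path are hypothetical names used for illustration.

```python
from typing import Any, Mapping

_SAFE_HYDRA_PREFIXES = ("relik.",)


def validate_target(config: Mapping[str, Any]) -> None:
    # Stand-in for the PR's _validate_hydra_target, using a plain mapping
    # so the sketch runs without omegaconf or hydra installed.
    target = config.get("_target_")
    if target is not None and not str(target).startswith(_SAFE_HYDRA_PREFIXES):
        raise ValueError(f"Unsafe Hydra _target_ '{target}'")


def is_rejected(config: Mapping[str, Any]) -> bool:
    """Return True if validation raises ValueError for this config."""
    try:
        validate_target(config)
    except ValueError:
        return True
    return False


# Cases from the test plan (the class path below is hypothetical).
assert not is_rejected({"_target_": "relik.reader.SomeReader"})
assert is_rejected({"_target_": "os.system"})
assert is_rejected({"_target_": "torch.hub.load"})
```

In the real test suite the same assertions would be written against _validate_hydra_target with DictConfig inputs, ideally also asserting that hydra.utils.instantiate is never reached for the rejected configs.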

Security Impact

CVSS 3.1: AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H = 8.8 HIGH
A user who runs Relik.from_pretrained("attacker/model") with a malicious model executes attacker code with the privileges of the Python process.

`hydra.utils.instantiate()` is called with config loaded directly from
HuggingFace Hub (config.yaml supplied by the model author). A malicious
model can set `_target_` to any Python callable (e.g. `os.system`,
`torch.hub.load` with an attacker-controlled hubconf.py), achieving
arbitrary code execution on the loading machine.

Add `_validate_hydra_target()` in `relik/inference/utils.py` which
rejects any `_target_` that does not start with the `relik.` prefix.
Apply the guard before every `hydra.utils.instantiate()` call in:
- `_instantiate_retriever()` (utils.py)
- `_instantiate_index()` (utils.py)
- `load_reader()` (utils.py)
- `Relik.from_pretrained()` (annotator.py)
- `BaseDocumentIndex.from_pretrained()` (retriever/indexers/base.py)

Same vulnerability class as CVE-2025-23304 (NeMo) and CVE-2026-22584 (Uni2TS).