Google provider base-depends on google-cloud-aiplatform[evaluation], force-installing litellm/scikit-learn for every user #69323

Description

@brunohelloprint

Repo: apache/airflow · Labels: kind:bug, provider:google


Apache Airflow Provider(s) version

apache-airflow-providers-google 22.0.0; also confirmed on the latest release, 22.2.0.

What happened

apache-airflow-providers-google declares google-cloud-aiplatform[evaluation] as an unconditional base dependency (no extra marker):

  • 22.0.0: google-cloud-aiplatform[evaluation]>=1.145.0
  • 22.2.0: google-cloud-aiplatform[evaluation]>=1.155.0
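
One way to check this against an installed provider is to read the distribution's Requires-Dist metadata and list the requirements that carry extras but have no `extra` marker. The sketch below is a simplified check, not a full PEP 508 parser: it treats any requirement whose marker mentions `extra` as optional, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

# Matches a distribution name and an optional [extras] group at the start of a requirement.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[([^\]]*)\])?")


def base_requirements_with_extras(requirement_lines):
    """Return (name, extras) for requirements that request extras unconditionally."""
    found = []
    for line in requirement_lines:
        spec, _, marker = line.partition(";")
        if "extra" in marker:
            # Only pulled in when the user asks for a provider extra; skip it.
            continue
        match = _REQ.match(spec)
        if match and match.group(2):
            extras = sorted(e.strip() for e in match.group(2).split(",") if e.strip())
            found.append((match.group(1), extras))
    return found


def installed_base_requirements_with_extras(dist_name):
    """Run the check against an installed distribution; empty list if it is absent."""
    try:
        lines = requires(dist_name) or []
    except PackageNotFoundError:
        return []
    return base_requirements_with_extras(lines)


if __name__ == "__main__":
    print(installed_base_requirements_with_extras("apache-airflow-providers-google"))
```

With the provider installed, an entry for google-cloud-aiplatform with the evaluation extra in this list would correspond to the base dependency described above.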

The [evaluation] extra transitively installs a full ML/eval stack that has nothing to do with the core Google hooks/operators:

apache-airflow-providers-google
└── google-cloud-aiplatform[evaluation]
    ├── litellm            → huggingface-hub → tokenizers → tqdm
    ├── scikit-learn
    └── ruamel-yaml

Anyone who installs the provider only for, say, GCSHook / BigQueryHook still gets litellm, scikit-learn, and friends.

What you think should happen instead

The Vertex AI evaluation feature set should be opt-in, not forced on all provider users. Either:

  1. Move it behind a provider extra (e.g. apache-airflow-providers-google[vertex-eval]), or
  2. Base-depend on google-cloud-aiplatform without the [evaluation] extra (users who need eval add it themselves).
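
If the evaluation stack became opt-in, code paths that need it could fail with an actionable message instead of a bare ImportError. The following is a minimal sketch of that pattern, not the provider's actual mechanism: `require_optional` and `OptionalFeatureUnavailable` are hypothetical names, and `vertex-eval` is only the example extra name from option 1.

```python
import importlib


class OptionalFeatureUnavailable(ImportError):
    """Raised when an optional dependency is not installed (hypothetical name)."""


def require_optional(module_name: str, install_hint: str):
    """Import module_name, or raise an error that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise OptionalFeatureUnavailable(
            f"{module_name!r} is not installed; install it with: {install_hint}"
        ) from exc
```

An evaluation-only code path could then call, for example, `require_optional("sklearn", "pip install 'apache-airflow-providers-google[vertex-eval]'")` at the point of use, so users of the core hooks never import the evaluation stack.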

Why it matters

  • Image-size bloat: litellm + scikit-learn + the HF/tokenizers chain is a large install for users who never touch Vertex evaluation.
  • CVE noise: litellm ships a steady stream of proxy-server CVEs (e.g. auth bypass, sandbox RCE, privilege escalation). These surface in pip-audit for every provider user, even when litellm is never imported and its proxy is never run, and each one has to be triaged or ignored downstream.

How to reproduce

pip install apache-airflow-providers-google==22.2.0
pip show litellm scikit-learn      # both present, transitively via the [evaluation] extra
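
The same check can be scripted, for instance in a CI step that inspects a built image. This is a sketch using only the standard library; the helper name `installed_versions` is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in dist_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["litellm", "scikit-learn"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run in an environment where only the provider was installed, a non-None version for either name reproduces the report.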

Anything else

The evaluation extra of google-cloud-aiplatform is the sole source of litellm/scikit-learn/ruamel-yaml in a provider install. Making it optional would let downstreams that use google-genai (the newer unified SDK) for Vertex AI avoid the old aiplatform eval SDK entirely.
