Repo: apache/airflow · Labels: kind:bug, provider:google
Title
Google provider base-depends on google-cloud-aiplatform[evaluation], force-installing litellm / scikit-learn for every user
Apache Airflow Provider(s) version
apache-airflow-providers-google 22.0.0; also confirmed on the latest release, 22.2.0.
What happened
apache-airflow-providers-google declares google-cloud-aiplatform[evaluation] as an unconditional base dependency (no extra marker):
- 22.0.0: google-cloud-aiplatform[evaluation]>=1.145.0
- 22.2.0: google-cloud-aiplatform[evaluation]>=1.155.0
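A Requires-Dist entry with no environment marker is installed for everyone. An entry gated on an extra carries a marker such as ; extra == "...". The sketch below separates the two kinds using only the standard library and reads the provider's metadata through importlib.metadata.requires(). It assumes the provider is installed in the current environment. The helper names split_requirements and provider_base_requirements are hypothetical, and the marker check is a rough substring heuristic rather than a full PEP 508 parser.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(req_lines):
    """Split Requires-Dist strings into (unconditional, extra-gated) lists.

    Rough heuristic: a requirement counts as extra-gated if the marker part
    (after ';') mentions 'extra'. Other markers (e.g. python_version) still
    count as base dependencies, since they apply to every user on that Python.
    """
    base, gated = [], []
    for line in req_lines:
        spec, _, marker = line.partition(";")
        if "extra" in marker:
            gated.append(spec.strip())
        else:
            base.append(spec.strip())
    return base, gated


def provider_base_requirements(dist="apache-airflow-providers-google"):
    """Return the unconditional requirements of an installed distribution."""
    try:
        lines = requires(dist) or []
    except PackageNotFoundError:
        return []
    base, _ = split_requirements(lines)
    return base


if __name__ == "__main__":
    # Show whichever aiplatform requirement the installed provider forces on everyone.
    for spec in provider_base_requirements():
        if spec.startswith("google-cloud-aiplatform"):
            print(spec)
```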
The [evaluation] extra transitively installs a full ML/eval stack that has nothing to do with the core Google hooks/operators:
apache-airflow-providers-google
└── google-cloud-aiplatform[evaluation]
├── litellm → huggingface-hub → tokenizers → tqdm
├── scikit-learn
└── ruamel-yaml
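The chain in the tree can be traced mechanically. The sketch below assumes the dependency graph has already been collected into a plain dict, for example by a dependency-tree tool or by walking importlib.metadata. The GRAPH shown here only transcribes the tree above as written, and dependency_paths is a hypothetical helper that lists every path from the provider to a given package.

```python
from collections import deque

# Transcription of the tree above; each key maps to the packages it pulls in.
GRAPH = {
    "apache-airflow-providers-google": ["google-cloud-aiplatform[evaluation]"],
    "google-cloud-aiplatform[evaluation]": ["litellm", "scikit-learn", "ruamel-yaml"],
    "litellm": ["huggingface-hub"],
    "huggingface-hub": ["tokenizers"],
    "tokenizers": ["tqdm"],
}


def dependency_paths(graph, root, target):
    """Return every acyclic path from root to target, in breadth-first order."""
    paths = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for child in graph.get(node, []):
            if child not in path:  # skip cycles
                queue.append(path + [child])
    return paths


for pkg in ("litellm", "scikit-learn", "tqdm"):
    for path in dependency_paths(GRAPH, "apache-airflow-providers-google", pkg):
        print(" -> ".join(path))
```

With a graph collected from a real environment, every path to litellm or scikit-learn running through the [evaluation] node would confirm that the extra is what brings them in.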
Anyone who installs the provider only for, say, GCSHook / BigQueryHook still gets litellm, scikit-learn, and friends.
What you think should happen instead
The Vertex AI evaluation feature set should be opt-in, not forced on all provider users. Either:
- Move it behind a provider extra (e.g. apache-airflow-providers-google[vertex-eval]), or
- Base-depend on google-cloud-aiplatform without the [evaluation] extra (users who need eval add it themselves).
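Either option means code that uses the evaluation SDK must fail with an actionable message when the optional packages are missing. The sketch below shows that guard using only importlib. Several names in it are assumptions for illustration: import_optional and load_eval_sdk are hypothetical helpers, the eval SDK is assumed to be importable as vertexai.evaluation, and vertex-eval is just the example extra name proposed above.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise an ImportError saying how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is not installed; it is only needed for this feature. "
            f"Install it with: {install_hint}"
        ) from err


# Hypothetical usage in an evaluation operator: the import runs only when the
# operator executes, so GCSHook/BigQueryHook users never need the extra.
def load_eval_sdk() -> ModuleType:
    return import_optional(
        "vertexai.evaluation",  # assumed module path of the aiplatform eval SDK
        "pip install 'apache-airflow-providers-google[vertex-eval]'",
    )
```

Deferring the import to call time keeps plain hook and operator imports working in images that never install the extra.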
Why it matters
- Image-size bloat: litellm + scikit-learn + the HF/tokenizers chain is a large install for users who never touch Vertex evaluation.
- CVE noise: litellm ships a steady stream of proxy-server CVEs (e.g. auth bypass, sandbox RCE, privilege escalation) that surface in pip-audit for every provider user, even when litellm is never imported and its proxy is never run; each one has to be triaged or ignored downstream.
How to reproduce
pip install apache-airflow-providers-google==22.2.0
pip show litellm scikit-learn # both present, transitively via the [evaluation] extra
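The same check can be run from Python, for example in a CI step that should fail if these packages reappear once the dependency is made optional. The sketch below uses only the standard library. The package list comes from this report. The helper names installed_versions and assert_absent are hypothetical, and ruamel.yaml is used as the distribution's canonical name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions the report says arrive only via the [evaluation] extra.
UNWANTED = ("litellm", "scikit-learn", "ruamel.yaml")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def assert_absent(names=UNWANTED):
    """Raise SystemExit listing any of `names` that are installed."""
    present = {n: v for n, v in installed_versions(names).items() if v is not None}
    if present:
        raise SystemExit(f"unexpected transitive dependencies: {present}")
```

Against a 22.2.0 install, assert_absent() would be expected to exit with litellm and scikit-learn listed, matching the pip show output.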
Anything else
The evaluation extra of google-cloud-aiplatform is the sole source of litellm/scikit-learn/ruamel-yaml in a provider install. Making it optional would let downstreams that use google-genai (the newer unified SDK) for Vertex AI avoid the old aiplatform eval SDK entirely.