[SPARK-56763][SPARK-56535][INFRA][3.5] Recover branch-3.5 CI #55740
sarutak wants to merge 6 commits into apache:branch-3.5
I don't quite understand. Why a pip constraint? Are these packages needed by some other packages? Can we just pin the packages we need to specific versions? It's an infra Docker image.
Usually CI breaks because a new version of something started using something new. It's an old branch, so it must have worked at some point in time. Could you show some evidence to support your fix? Like why the specific version
I'll pin
Regarding beniget

scipy uses pythran (
beniget 0.4.2 and later added code to handle Python 3.10 match statement AST nodes (
Upgrading gast to 0.5.4 or later is not possible because it violates pythran's constraint. Pinning beniget==0.4.1 is the only solution.

Regarding pyproject-metadata

scipy's build system, meson-python, uses pyproject-metadata for metadata processing. pyproject-metadata 0.9.0 introduced breaking changes for PEP 639 support and implicitly requires
The older version of meson-python used in the Dockerfile environment is not compatible with the API changes in pyproject-metadata 0.9.0, causing the build to fail at the metadata generation stage. Pinning pyproject-metadata==0.8.1 maintains compatibility with the existing meson-python version.
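
The two pins discussed above could be expressed as a pip constraints file along the following lines. This is a minimal sketch, not the PR's actual diff: the file path `/tmp/pip-constraints.txt`, the `pypy3` interpreter name, and the `scipy` install target are assumptions made for illustration. The environment-variable form is shown because constraints supplied through `PIP_CONSTRAINT` are generally also seen by the isolated build environments pip creates, which is where build-time dependencies such as beniget and pyproject-metadata get resolved. A plain version pin on the install command line would not reach them.

```dockerfile
# Hypothetical path and package list; the real Dockerfile may differ.
# Write the two pins named in the comment above into a constraints file.
RUN printf 'beniget==0.4.1\npyproject-metadata==0.8.1\n' > /tmp/pip-constraints.txt

# PIP_CONSTRAINT is the environment-variable form of pip's --constraint option.
# Constraints only limit versions; they do not install anything by themselves.
RUN PIP_CONSTRAINT=/tmp/pip-constraints.txt pypy3 -m pip install scipy
```

Because a constraints file only restricts versions of packages that would be installed anyway, this approach avoids adding beniget or pyproject-metadata as explicit top-level installs.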
### What changes were proposed in this pull request?

Add `apt-get update` before `apt-get install` for R-related dev libraries to avoid a stale package index causing 404 errors.

### Why are the changes needed?

The `apt-get install` for R dev dependencies (libtiff5-dev, libharfbuzz-dev, etc.) is in a separate RUN layer from the earlier `apt-get update`, so when the package index becomes stale (packages are superseded on the Ubuntu archive), the install fails with 404.

### Does this PR introduce *any* user-facing change?

No.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.
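
The pattern this commit message describes can be sketched as below. It is an illustration only: the package list is shortened to the two libraries named above, and the trailing cleanup of `/var/lib/apt/lists` is a common optional convention, not something the commit message states.

```dockerfile
# Refresh the package index in the same RUN layer as the install, so the
# install never relies on an index cached in an earlier, possibly stale layer.
RUN apt-get update && apt-get install -y \
    libtiff5-dev \
    libharfbuzz-dev \
    && rm -rf /var/lib/apt/lists/*
```

Keeping `apt-get update` and `apt-get install` in one `RUN` instruction means Docker caches or rebuilds them together, which is what prevents the 404 errors on superseded packages.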
…ran for PyPy 3.8 compatibility
8531ecb to 542b7ea
### What changes were proposed in this pull request?

Fix the broken `Base image build` CI workflow on `branch-3.5` by addressing multiple issues in `dev/infra/Dockerfile` and `python/mypy.ini`:

- Stale apt package index: Add `apt-get update` before each `apt-get install` that runs in a separate `RUN` layer from the initial update, preventing 404 errors when packages are superseded on the Ubuntu 20.04 archive.
- EOL Python get-pip.py URLs: Use version-specific `get-pip.py` URLs (`pip/3.9/get-pip.py`, `pip/3.8/get-pip.py`) since the generic endpoint no longer supports Python 3.8/3.9.
- PyPy 3.8 build dependency conflicts: Pin `beniget==0.4.1` and `pyproject-metadata==0.8.1` via pip constraints to resolve incompatibilities in the scipy build chain:
  - `beniget>=0.4.2` references `gast.MatchStar`, which doesn't exist in `gast<=0.5.3` (required by pythran). See beniget#108.
  - `pyproject-metadata>=0.9.0` introduced breaking PEP 639 changes incompatible with the older meson-python in this environment. See meson-python changelog.
- mypy failures from transitive dependencies: Add `follow_imports = skip` for `pydantic` and `sqlalchemy` in `python/mypy.ini`. These packages are not used by Spark directly but are pulled in transitively via `mlflow` (`mlflow` → `sqlalchemy`, `mlflow` → `mlflow-tracing` → `pydantic`). Newer versions of these packages ship type annotations incompatible with the Python 3.9 + older mypy version in this CI environment, causing spurious type-check errors on code Spark doesn't own.
- Add `libuv1-dev` for R: Install `libuv1-dev`, which is required by the R `httpuv`/`fs` package (a transitive dependency of `rmarkdown` and `testthat`). Without it, the R package `fs` fails to compile from source when the Docker layer cache is invalidated. This is the same fix applied to master and other branches in SPARK-56540 / #55414.
- Pin `plotly<6.0`: Plotly 6.0 introduced breaking changes in datetime handling that cause PySpark plot-related test failures. This is the same fix applied to master in SPARK-51143 / #49863.
- Misc: Update `FULL_REFRESH_DATE` to force a full image rebuild.

### Why are the changes needed?

The `branch-3.5` CI has been broken for an extended period. The Ubuntu 20.04 (focal) base image is aging, and upstream package repositories have rotated or removed packages that the Dockerfile previously fetched without version pins. Multiple unrelated failures compound to make the Docker image unbuildable.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- `Base image build` workflow passes on GitHub Actions.
- `docker build dev/infra` succeeds locally.

### Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Opus 4.6
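
For the get-pip.py item in the description, the version-specific URLs might be used roughly as follows. This is a hedged sketch rather than the PR's actual change: the `python3.9` and `pypy3` interpreter names and the `curl` invocation are assumptions. Only the `pip/3.9/get-pip.py` and `pip/3.8/get-pip.py` path segments come from the description, combined with the standard `bootstrap.pypa.io` host.

```dockerfile
# Assumed interpreter names; adjust to whatever the image actually installs.
# Each EOL Python version fetches the bootstrap script published for it,
# instead of the generic script that no longer supports that version.
RUN curl -sS https://bootstrap.pypa.io/pip/3.9/get-pip.py | python3.9
RUN curl -sS https://bootstrap.pypa.io/pip/3.8/get-pip.py | pypy3
```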