Skip to content

fix: MPI runs for smoke tests with surfaceGenerator#4068

Open
jhuang2601 wants to merge 7 commits into
developfrom
fix/jhuang/MPI_smokeTests
Open

fix: MPI runs for smoke tests with surfaceGenerator#4068
jhuang2601 wants to merge 7 commits into
developfrom
fix/jhuang/MPI_smokeTests

Conversation

@jhuang2601
Copy link
Copy Markdown
Contributor

@jhuang2601 jhuang2601 commented May 28, 2026

Another attempt to fix CI failures for MPI runs of two newly added smoke tests using surfaceGenerator, see PR#4062

The original code is functionally correct but generates results ordering which is not strictly deterministic across runs.

  • Enable mpiCommOrder="1"in the XML inputs
  • Replace pointer-based ordering of elemLocations with a comparator using deterministic region/subregion indices

@jhuang2601 jhuang2601 requested a review from castelletto1 as a code owner May 28, 2026 21:25
@jhuang2601 jhuang2601 requested a review from bd713 May 28, 2026 21:26
@jhuang2601 jhuang2601 self-assigned this May 28, 2026
@jhuang2601 jhuang2601 added type: bug Something isn't working ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI ci: run code coverage enables running of the code coverage CI jobs flag: requires rebaseline Requires rebaseline branch in integratedTests labels May 28, 2026
@jhuang2601 jhuang2601 requested a review from OmarDuran May 29, 2026 13:53
@bd713 bd713 requested a review from rrsettgast as a code owner May 29, 2026 14:43
@rrsettgast
Copy link
Copy Markdown
Contributor

@jhuang2601 was mpiCommOrder="1" insufficient to solve the ordering issue?

@jhuang2601
Copy link
Copy Markdown
Contributor Author

jhuang2601 commented Jun 1, 2026

Right @rrsettgast, it was not effective in resolving the issue for these two examples.
Hence Bertrand's fix with deterministic region/subregion indices.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make MPI smoke-test results involving SurfaceGenerator deterministic so CI baselines don’t fluctuate between runs. It does this by (1) forcing consistent MPI communication ordering in the relevant XML inputs and (2) ensuring deterministic iteration/ordering for the element-location map used during surface generation.

Changes:

  • Enable deterministic MPI message handling for the two affected smoke tests by setting mpiCommOrder="1" in their XML inputs.
  • Replace pointer-address-based ordering of elemLocations with a deterministic comparator based on element region/subregion indices.
  • Update the ALM ATS decks to run both serial and MPI partitions, and bump integrated test baselines accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/coreComponents/physicsSolvers/surfaceGeneration/SurfaceGenerator.hpp Introduces a deterministic comparator and an ElemLocMapType alias; updates function signatures to use it.
src/coreComponents/physicsSolvers/surfaceGeneration/SurfaceGenerator.cpp Switches elemLocations to the deterministic map type and updates insertion/iteration accordingly.
inputFiles/poromechanicsFractures/Contact/AugmentedLagrangianMultipliers/AugmentedLagrangianMultipliers.ats Adds an MPI partition configuration for the two smoke tests.
inputFiles/poromechanicsFractures/Contact/AugmentedLagrangianMultipliers/ALM_inclinedFault_ISG_smoke.xml Enables mpiCommOrder="1" for deterministic MPI comm ordering in the test.
inputFiles/poromechanicsFractures/Contact/AugmentedLagrangianMultipliers/ALM_inclinedFault_ISG_DruckerPrager_smoke.xml Enables mpiCommOrder="1" for deterministic MPI comm ordering in the test.
BASELINE_NOTES.md Records the baseline update for PR #4068.
.integrated_tests.yaml Points integrated tests to the new PR #4068 baseline tarball.

Copy link
Copy Markdown
Contributor

@jafranc jafranc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever fix ! I can't think of any tie-blocking insertion configuration. 👍

Comment thread src/coreComponents/physicsSolvers/surfaceGeneration/SurfaceGenerator.hpp Outdated
…erator.hpp

Co-authored-by: Jacques Franc <49998870+jafranc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci: run code coverage enables running of the code coverage CI jobs ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI flag: requires rebaseline Requires rebaseline branch in integratedTests type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants