feat: implement PHIX validation for schools and daycares#152

Open
eswarchandravidyasagar wants to merge 23 commits into main from feat/PHIX-validation

Conversation

@eswarchandravidyasagar
Collaborator

  • Added PHIX validation module to validate school/daycare names against the official PHIX reference list.
  • Integrated validation into the preprocessing step in orchestrator.py.
  • Configurable options added to parameters.yaml for enabling validation and handling unmatched facilities.
  • Created unit tests for the validation module covering various scenarios.
  • Added documentation for the validation plan and updated the plans directory.

@jangevaare
Member

We don't have redistribution permission for the PHIX reference list file, so it will need to be removed and the commits squashed. It would also blow up the size of this repository and its history.

Users will have to bring their own PHIX reference list.

Comment thread config/parameters.yaml Outdated
# Path to PHIX reference Excel file (relative to project root)
reference_file: PHIX Reference Lists v5.2 - 2025Jun30.xlsx
# Minimum fuzzy match score (0-100) to consider a match
match_threshold: 85
Member

Is this required? It should be exact. A fuzzy threshold could bypass the very issues we'd like to protect against, such as similarly named schools being accidentally selected when a Panorama user creates a forecast query.
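
To illustrate the concern, here is a minimal sketch of how a threshold of 85 can accept a near-identical but different facility name where exact matching rejects it. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy matcher the PR uses, so the score will differ from the real one; the facility names and the `normalize`, `fuzzy_score`, and `exact_match` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(name.lower().split())


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (difflib stand-in, not the PR's matcher)."""
    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def exact_match(a: str, b: str) -> bool:
    """True only when the normalized names are identical."""
    return normalize(a) == normalize(b)


# Hypothetical names: two distinct schools that differ by one character.
requested = "St. Mary Catholic School"
reference = "St. Marys Catholic School"

score = fuzzy_score(requested, reference)
accepted_by_fuzzy = score >= 85  # match_threshold from the config under review
accepted_by_exact = exact_match(requested, reference)

print(f"fuzzy accepts: {accepted_by_fuzzy}, exact accepts: {accepted_by_exact}")
```

With these two names the fuzzy check accepts the pair and the exact check does not, which is the accidental-selection case described above.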

@jangevaare
Member

We likely need a mapping file that converts the PHU names in the PHIX reference document to standardized PHU acronyms (which should also be enforced for template folders, etc.).

We may also need to allow this map to be many-to-one, to handle PHUs that have merged since the reference document was last updated.

@jangevaare
Member

I know it is important to run this early in the pipeline, before other processing, but I also wonder whether we can emit something in the per-PDF validation log confirming that a valid facility is being used for the target PHU.
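
One possible shape for such a log entry, as a sketch only. The record fields, the `facility_check_record` and `to_log_line` helpers, and the JSON-lines format are assumptions; the existing per-PDF validation log may use a different structure.

```python
import json


def facility_check_record(
    pdf_name: str,
    facility: str,
    target_phu: str,
    reference_by_phu: dict[str, set[str]],
) -> dict:
    """Build one log record stating whether the facility is valid for the PHU."""
    allowed = reference_by_phu.get(target_phu, set())
    return {
        "pdf": pdf_name,
        "check": "facility_in_target_phu",
        "facility": facility,
        "phu": target_phu,
        "passed": facility in allowed,  # exact membership, no fuzzy matching
    }


def to_log_line(record: dict) -> str:
    """Serialize a record as a single JSON line."""
    return json.dumps(record, sort_keys=True)


# Hypothetical reference data and file name.
reference_by_phu = {"EXHU": {"Example Elementary School"}}
record = facility_check_record(
    "notice_0001.pdf", "Example Elementary School", "EXHU", reference_by_phu
)
print(to_log_line(record))
```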

- Updated `validate_phix.py` to remove fuzzy matching and implement strict exact matching for facility names against the PHIX reference list.
- Introduced PHU alias mapping to restrict validation to specific Public Health Units (PHUs) using a YAML configuration file.
- Enhanced the `validate_facilities` function to support PHU scoping and improved error handling for unmatched facilities.
- Updated tests to reflect changes in matching strategy and added new tests for PHU alias mapping and validation behavior.
- Modified documentation to clarify the new validation process and configuration options.
@codecov

codecov Bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 75.06297% with 99 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pipeline/orchestrator.py | 16.66% | 38 Missing and 2 partials ⚠️ |
| pipeline/validate_phix.py | 86.89% | 19 Missing and 19 partials ⚠️ |
| pipeline/preprocess.py | 47.05% | 13 Missing and 5 partials ⚠️ |
| pipeline/validate_pdfs.py | 88.00% | 2 Missing and 1 partial ⚠️ |
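
As a quick cross-check of the figures in this report, the per-file counts can be summed and compared with the headline number. The sketch below assumes the "99 lines" counts both missing and partial lines; the total patch size it derives is an inference from the reported percentage, not a number stated in the report.

```python
# (missing, partial) line counts per file, copied from the report above.
report = {
    "pipeline/orchestrator.py": (38, 2),
    "pipeline/validate_phix.py": (19, 19),
    "pipeline/preprocess.py": (13, 5),
    "pipeline/validate_pdfs.py": (2, 1),
}

# Missing plus partial lines across all files.
uncovered = sum(missing + partial for missing, partial in report.values())

patch_coverage = 75.06297  # percent, as reported

# Inferred patch size: uncovered lines are (100 - coverage)% of the total.
total_lines = round(uncovered / (1 - patch_coverage / 100))
covered = total_lines - uncovered

print(uncovered, total_lines, covered)
print(round(100 * covered / total_lines, 5))
```

The sum reproduces the 99 uncovered lines, and the inferred patch size reproduces the reported percentage when rounded to five decimals.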


eswarchandravidyasagar and others added 20 commits February 5, 2026 16:22
… column prefix, support multiple facility columns
…ch when PHIX ID is verified, otherwise inexact match
Bumps the minor-and-patch group with 4 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [babel](https://github.com/python-babel/babel), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).


Updates `pypdf` from 6.6.0 to 6.6.2
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.6.0...6.6.2)

Updates `babel` from 2.17.0 to 2.18.0
- [Release notes](https://github.com/python-babel/babel/releases)
- [Changelog](https://github.com/python-babel/babel/blob/master/CHANGES.rst)
- [Commits](python-babel/babel@v2.17.0...v2.18.0)

Updates `ty` from 0.0.12 to 0.0.14
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.12...0.0.14)

Updates `git-changelog` from 2.7.0 to 2.7.1
- [Release notes](https://github.com/pawamoy/git-changelog/releases)
- [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md)
- [Commits](pawamoy/git-changelog@2.7.0...2.7.1)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: babel
  dependency-version: 2.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.14
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: git-changelog
  dependency-version: 2.7.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 3 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [pillow](https://github.com/python-pillow/Pillow) and [ty](https://github.com/astral-sh/ty).


Updates `pypdf` from 6.6.2 to 6.7.0
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.6.2...6.7.0)

Updates `pillow` from 12.1.0 to 12.1.1
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@12.1.0...12.1.1)

Updates `ty` from 0.0.14 to 0.0.17
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.14...0.0.17)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.17
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 2 updates: [pypdf](https://github.com/py-pdf/pypdf) and [ty](https://github.com/astral-sh/ty).


Updates `pypdf` from 6.7.0 to 6.7.2
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.7.0...6.7.2)

Updates `ty` from 0.0.17 to 0.0.18
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.17...0.0.18)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.7.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.18
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 4 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [ty](https://github.com/astral-sh/ty), [git-changelog](https://github.com/pawamoy/git-changelog) and [pypandoc](https://github.com/JessicaTegner/pypandoc).


Updates `pypdf` from 6.7.2 to 6.9.0
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.7.2...6.9.0)

Updates `ty` from 0.0.18 to 0.0.23
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.18...0.0.23)

Updates `git-changelog` from 2.7.1 to 2.9.0
- [Release notes](https://github.com/pawamoy/git-changelog/releases)
- [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md)
- [Commits](pawamoy/git-changelog@2.7.1...2.9.0)

Updates `pypandoc` from 1.16.2 to 1.17
- [Release notes](https://github.com/JessicaTegner/pypandoc/releases)
- [Changelog](https://github.com/JessicaTegner/pypandoc/blob/master/release.md)
- [Commits](JessicaTegner/pypandoc@v1.16.2...v1.17)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.23
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: git-changelog
  dependency-version: 2.9.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: pypandoc
  dependency-version: '1.17'
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 4 updates: [pypdf](https://github.com/py-pdf/pypdf), [pytest-cov](https://github.com/pytest-dev/pytest-cov), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).


Updates `pypdf` from 6.9.0 to 6.9.1
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.9.0...6.9.1)

Updates `pytest-cov` from 7.0.0 to 7.1.0
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest-cov@v7.0.0...v7.1.0)

Updates `ty` from 0.0.23 to 0.0.24
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.23...0.0.24)

Updates `git-changelog` from 2.9.0 to 2.9.2
- [Release notes](https://github.com/pawamoy/git-changelog/releases)
- [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md)
- [Commits](pawamoy/git-changelog@2.9.0...2.9.2)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.24
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: git-changelog
  dependency-version: 2.9.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 3 updates: [pypdf](https://github.com/py-pdf/pypdf), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).


Updates `pypdf` from 6.9.1 to 6.9.2
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.9.1...6.9.2)

Updates `ty` from 0.0.24 to 0.0.26
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.24...0.0.26)

Updates `git-changelog` from 2.9.2 to 2.9.3
- [Release notes](https://github.com/pawamoy/git-changelog/releases)
- [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md)
- [Commits](pawamoy/git-changelog@2.9.2...2.9.3)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.26
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: git-changelog
  dependency-version: 2.9.3
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 5 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [pypdf](https://github.com/py-pdf/pypdf) | `6.9.2` | `6.10.0` |
| [pillow](https://github.com/python-pillow/Pillow) | `12.1.1` | `12.2.0` |
| [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz) | `3.14.3` | `3.14.5` |
| [pytest](https://github.com/pytest-dev/pytest) | `9.0.2` | `9.0.3` |
| [ty](https://github.com/astral-sh/ty) | `0.0.26` | `0.0.29` |



Updates `pypdf` from 6.9.2 to 6.10.0
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.9.2...6.10.0)

Updates `pillow` from 12.1.1 to 12.2.0
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@12.1.1...12.2.0)

Updates `rapidfuzz` from 3.14.3 to 3.14.5
- [Release notes](https://github.com/rapidfuzz/RapidFuzz/releases)
- [Changelog](https://github.com/rapidfuzz/RapidFuzz/blob/main/CHANGELOG.rst)
- [Commits](rapidfuzz/RapidFuzz@v3.14.3...v3.14.5)

Updates `pytest` from 9.0.2 to 9.0.3
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@9.0.2...9.0.3)

Updates `ty` from 0.0.26 to 0.0.29
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.26...0.0.29)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.10.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: pillow
  dependency-version: 12.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: rapidfuzz
  dependency-version: 3.14.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.29
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v45.0.0...v82.0.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 82.0.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the minor-and-patch group with 3 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [pre-commit](https://github.com/pre-commit/pre-commit) and [ty](https://github.com/astral-sh/ty).


Updates `pypdf` from 6.10.0 to 6.10.2
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](py-pdf/pypdf@6.10.0...6.10.2)

Updates `pre-commit` from 4.5.1 to 4.6.0
- [Release notes](https://github.com/pre-commit/pre-commit/releases)
- [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md)
- [Commits](pre-commit/pre-commit@v4.5.1...v4.6.0)

Updates `ty` from 0.0.29 to 0.0.32
- [Release notes](https://github.com/astral-sh/ty/releases)
- [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ty@0.0.29...0.0.32)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.10.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: pre-commit
  dependency-version: 4.6.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ty
  dependency-version: 0.0.32
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>