fix: make South Lanarkshire schedule parsing defensive#2077
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughModule-level logging and defensive parsing were added to SouthLanarkshireCouncil.parse_data: header/detail cells are extracted and validated, rows with missing cells or "Area" are skipped, td text is normalized once, cadence is derived with a guarded split, and the date regex runs against the full normalized td text. ChangesSouth Lanarkshire schedule parsing
Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py`:
- Line 87: The current defensive split that sets schedule_cadence =
td_text.split(" ", 1)[1] if " " in td_text else "" should be changed to validate
and fail loudly: in the parsing code that assigns schedule_cadence from td_text
(look for the schedule_cadence variable and the method that processes td_text in
SouthLanarkshireCouncil), check for the expected space separator and either
raise a ValueError with a clear message including td_text when the format is
unexpected, or at minimum emit a logger.warning indicating malformed schedule
text; then perform the split (td_text.split(" ", 1)[1]) only after validation so
downstream checks like "Fortnightly" in schedule_cadence are not silently
skipped.
- Around line 73-77: Add a logger and emit warnings/debugs when rows are
skipped: import logging and create logger = logging.getLogger(__name__), then in
the parsing loop where th_cell and td_cell are checked (the block around th_cell
= row.find("th") / td_cell = row.find("td")), replace the silent continue with
logger.warning("Skipping row: missing th or td cell") before continue; likewise,
where schedule_type == "Area" is silently skipped, add logger.debug("Skipping
area header row") before that continue. Reference these changes in the
SouthLanarkshireCouncil parser method where th_cell, td_cell and schedule_type
are used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dddbb33e-3700-4b77-9da3-0cd5b6d38d37
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
|
Merged into the June 2026 release PR #2110. Thanks! |
Closes #2018.
Summary
This makes the South Lanarkshire Council parser more defensive when reading the collection schedule table.
The current parser assumes that every schedule has at least two space-separated parts:
row.find("td").get_text().strip().split(" ")[1]
For some South Lanarkshire rows this can raise:
IndexError: list index out of range
This PR stores the and cells, skips malformed rows, and safely defaults schedule_cadence to an empty string when there is no second token.
Testing
poetry run pytest uk_bin_collection/tests/step_defs/ -k "SouthLanarkshireCouncil"
Summary by CodeRabbit