fix: harden cotswold address search and data extraction by InertiaUK · Pull Request #2071 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-13T12:40:03Z

Summary

Fix address construction to avoid duplicating postcode when already present in house_number
Click combobox before typing for reliable Salesforce Lightning component focus
Skip search header in dropdown results (first li is "Search", not an address)
Add text-based extraction fallback for Chrome versions where Shadow DOM hides table elements
Reduce unnecessary sleeps (29s to 14s) for faster scraping
Use data-cell-value attribute with get_text fallback for robust date extraction

Testing

Full address path: 19 SUMMERS WAY, MORETON-IN-MARSH, GL56 0GB + GL56 0GB ✅
Short address path: 7 + GL7 1NX ✅
Tested via API end-to-end ✅

Summary by CodeRabbit

Refactor
- Improved the Cotswold District Council waste collection data extraction process with a more robust parsing approach for bin types and collection dates.

- fix address construction: avoid duplicating postcode when it's already in the house_number field - click combobox before typing for reliable lightning component focus - skip search header in dropdown results (first li is "Search", not an address) - add text-based extraction fallback for chrome versions where shadow dom hides table elements - reduce unnecessary sleeps (29s -> 14s) for faster scraping - use data-cell-value attribute with get_text fallback for robust date extraction Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-13T12:40:19Z

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 22 minutes and 12 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ffe2213-3328-448c-8a4e-47b7175eb328

📥 Commits

Reviewing files that changed from the base of the PR and between 47ef293 and a2375ad.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py

📝 Walkthrough

Walkthrough

The PR rewrites the Cotswold District Council waste-collection enquiry integration with a new Selenium interaction flow and HTML parsing strategy. It updates Selenium imports, adds a bin-type collection, introduces date validation and parsing helpers, and refactors the main parse_data method to use combobox/dropdown navigation and targeted HTML extraction with fallback scanning.

Changes

Cotswold District Council waste enquiry rewrite

Layer / File(s)	Summary
Imports and bin type configuration `uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`	Adds WebDriverWait and expected_conditions imports for Selenium waits, and defines `_BIN_TYPES` collection used as a fallback pattern for identifying bin types in the HTML parsing fallback path.
Date validation and parsing helpers `uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`	Introduces `_looks_like_date` to recognize date-like strings ("today", "tomorrow", or weekday-prefixed text via regex), and `_parse_date` to convert raw date text to configured format, supporting special keywords and "%a, %d %B" parsing with year rollover for past dates.
Selenium automation and HTML parsing rewrite `uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`	Rewrites `parse_data` to use combobox input targeting by role, clicks the final dropdown option, advances via Next button with `execute_script`, waits for "Collection day" text, then extracts bins from `tr.slds-hint-parent` rows; falls back to `_BIN_TYPES` pattern scanning when those rows are absent, and formats dates via `_parse_date` helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

robbrad/UKBinCollectionData#1726: Both PRs modify Selenium-driven interactions in a council-specific parse_data flow—Cotswold rewrites the dropdown/next/wait-and-parse sequence, while WestOxfordshire changes the combobox option locator and click target—so they share direct code-level changes around Selenium combobox dropdown handling.

Suggested reviewers

dp247

Poem

🐰 A rabbit hops through bin day queries,
Selenium scripts dance with dropdown berries,
Date parsing magic with year-rollover care,
Tomorrow's rubbish in patterns laid bare! 🗑️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: harden cotswold address search and data extraction' directly aligns with the PR's main objectives: improving address search reliability, preventing postcode duplication, and enhancing data extraction robustness.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-13T12:42:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (a2375ad).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2071   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`:
- Around line 59-63: The code currently clicks all_opts[0] when only one
li[`@role`='presentation'] exists, but that first element is the non-selectable
"Search" header; modify the branch handling for the single-option case in the
method in CotswoldDistrictCouncil.py so it does not click the header: either
skip clicking when len(all_opts) == 1 (no-op) or raise/assert that a real
address option exists before clicking; specifically update the logic around the
all_opts variable and the XPath lookup so only a real address option (e.g.
all_opts[-1] when len>1) is clicked and the single-item case is ignored or fails
fast.
- Around line 136-141: In CotswoldDistrictCouncil (the method that parses
raw_date into `parsed`), the year rollover check uses a full datetime so a
parsed date at midnight is treated as earlier than "today" and gets bumped;
change the comparison to date-only granularity (e.g., compare parsed.date() with
datetime.now().date() or convert `parsed` and the current time to date objects)
before deciding to replace the year, so a collection that falls on today is not
advanced to next year.
- Around line 104-114: The fallback loop that scans lines and calls
self._parse_date(raw_date, current_year) can raise ValueError for malformed
strings (so a single bad line aborts the whole scrape); modify the loop around
the data["bins"].append call to wrap the call to self._parse_date in a
try/except (catch ValueError and continue/skip that entry), and consider
tightening the predicate in self._looks_like_date to require a day+day/month
pattern before attempting parse so only plausibly parsable strings reach
_parse_date; update references in the fallback loop (lines, raw_date, line,
data["bins"]) and the methods _looks_like_date and _parse_date accordingly.
- Around line 102-103: The bare "except: continue" in CotswoldDistrictCouncil's
parsing loop swallows parse failures (e.g. from _parse_date) so rows are
silently dropped; replace it with explicit handling: catch only AttributeError
and ValueError, log the offending row (e.g., the local "row" or "tr" HTML
element) and the exception (using logging.exception or the class logger), and
then re-raise the exception (or allow it to propagate) so unexpected format
changes fail loudly instead of being ignored. Ensure you keep references to
_parse_date in the same loop and remove the blanket except to avoid hiding other
errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 256f248f-91e2-4367-b6db-5c177ed09b06

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 47ef293.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py

coderabbitai · 2026-05-13T12:43:17Z

+            all_opts = driver.find_elements(By.XPATH, "//li[@role='presentation']")
+            if len(all_opts) > 1:
+                all_opts[-1].click()
+            else:
+                all_opts[0].click()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Single-option branch likely clicks the "Search" header.

According to the PR description, the first li[@ROLE='presentation'] is the "Search" entry, not an address. The else branch here clicks all_opts[0], i.e. exactly that header, so when the user types something that returns no address matches (only the header remains) the form will not advance. Either skip the click entirely in that case or assert at least one real address option is present.

🛠️ Suggested guard

- all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']") - if len(all_opts) > 1: - all_opts[-1].click() - else: - all_opts[0].click() + all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']") + if len(all_opts) > 1: + # First li is the "Search" header; pick the last actual address result. + all_opts[-1].click() + else: + raise ValueError( + "No address results returned for the supplied paon/postcode" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

all_opts = driver.find_elements(By.XPATH, "//li[@role='presentation']")

if len(all_opts) > 1:

all_opts[-1].click()

else:

all_opts[0].click()

all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']")

if len(all_opts) > 1:

# First li is the "Search" header; pick the last actual address result.

all_opts[-1].click()

else:

raise ValueError(

"No address results returned for the supplied paon/postcode"

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py` around lines 59 - 63, The code currently clicks all_opts[0] when only one li[`@role`='presentation'] exists, but that first element is the non-selectable "Search" header; modify the branch handling for the single-option case in the method in CotswoldDistrictCouncil.py so it does not click the header: either skip clicking when len(all_opts) == 1 (no-op) or raise/assert that a real address option exists before clicking; specifically update the logic around the all_opts variable and the XPath lookup so only a real address option (e.g. all_opts[-1] when len>1) is clicked and the single-item case is ignored or fails fast.

coderabbitai · 2026-05-13T12:43:17Z

+                    except Exception:
+                        continue


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Bare except … : continue silently drops bins on parse failures.

Any ValueError raised by _parse_date (e.g. unexpected date format) or missing attribute on th/td is swallowed here, so the bin is silently omitted from the output and a council formatting change will look like a partial-but-successful scrape. At minimum, log the offending row; preferably let unexpected formats raise so regressions are caught early.

♻️ Suggested change

- except Exception: - continue + except (ValueError, AttributeError) as row_err: + # Surface unexpected row shapes instead of silently skipping. + print(f"Skipping row due to parse error: {row_err!r}") + continue

Based on learnings: "when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except Exception:

continue

except (ValueError, AttributeError) as row_err:

# Surface unexpected row shapes instead of silently skipping.

print(f"Skipping row due to parse error: {row_err!r}")

continue

🧰 Tools

🪛 Ruff (0.15.12)

[error] 102-103: try-except-continue detected, consider logging the exception

(S112)

[warning] 102-102: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py` around lines 102 - 103, The bare "except: continue" in CotswoldDistrictCouncil's parsing loop swallows parse failures (e.g. from _parse_date) so rows are silently dropped; replace it with explicit handling: catch only AttributeError and ValueError, log the offending row (e.g., the local "row" or "tr" HTML element) and the exception (using logging.exception or the class logger), and then re-raise the exception (or allow it to propagate) so unexpected format changes fail loudly instead of being ignored. Ensure you keep references to _parse_date in the same loop and remove the blanket except to avoid hiding other errors.

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

fix: address CodeRabbit review feedback

a2375ad

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden cotswold address search and data extraction#2071

fix: harden cotswold address search and data extraction#2071
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/cotswold-address-and-extraction

InertiaUK commented May 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 13, 2026

Uh oh!

coderabbitai Bot May 13, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

InertiaUK commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

InertiaUK commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading

codecov Bot commented May 13, 2026 •

edited

Loading