fix: harden cotswold address search and data extraction#2071
Conversation
- fix address construction: avoid duplicating postcode when it's already in the house_number field - click combobox before typing for reliable lightning component focus - skip search header in dropdown results (first li is "Search", not an address) - add text-based extraction fallback for chrome versions where shadow dom hides table elements - reduce unnecessary sleeps (29s -> 14s) for faster scraping - use data-cell-value attribute with get_text fallback for robust date extraction Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
Your plan currently allows 2 reviews/hour. Refill in 22 minutes and 12 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR rewrites the Cotswold District Council waste-collection enquiry integration with a new Selenium interaction flow and HTML parsing strategy. It updates Selenium imports, adds a bin-type collection, introduces date validation and parsing helpers, and refactors the main parse_data method to use combobox/dropdown navigation and targeted HTML extraction with fallback scanning. ChangesCotswold District Council waste enquiry rewrite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2071 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`:
- Around line 59-63: The code currently clicks all_opts[0] when only one
li[`@role`='presentation'] exists, but that first element is the non-selectable
"Search" header; modify the branch handling for the single-option case in the
method in CotswoldDistrictCouncil.py so it does not click the header: either
skip clicking when len(all_opts) == 1 (no-op) or raise/assert that a real
address option exists before clicking; specifically update the logic around the
all_opts variable and the XPath lookup so only a real address option (e.g.
all_opts[-1] when len>1) is clicked and the single-item case is ignored or fails
fast.
- Around line 136-141: In CotswoldDistrictCouncil (the method that parses
raw_date into `parsed`), the year rollover check uses a full datetime so a
parsed date at midnight is treated as earlier than "today" and gets bumped;
change the comparison to date-only granularity (e.g., compare parsed.date() with
datetime.now().date() or convert `parsed` and the current time to date objects)
before deciding to replace the year, so a collection that falls on today is not
advanced to next year.
- Around line 104-114: The fallback loop that scans lines and calls
self._parse_date(raw_date, current_year) can raise ValueError for malformed
strings (so a single bad line aborts the whole scrape); modify the loop around
the data["bins"].append call to wrap the call to self._parse_date in a
try/except (catch ValueError and continue/skip that entry), and consider
tightening the predicate in self._looks_like_date to require a day+day/month
pattern before attempting parse so only plausibly parsable strings reach
_parse_date; update references in the fallback loop (lines, raw_date, line,
data["bins"]) and the methods _looks_like_date and _parse_date accordingly.
- Around line 102-103: The bare "except: continue" in CotswoldDistrictCouncil's
parsing loop swallows parse failures (e.g. from _parse_date) so rows are
silently dropped; replace it with explicit handling: catch only AttributeError
and ValueError, log the offending row (e.g., the local "row" or "tr" HTML
element) and the exception (using logging.exception or the class logger), and
then re-raise the exception (or allow it to propagate) so unexpected format
changes fail loudly instead of being ignored. Ensure you keep references to
_parse_date in the same loop and remove the blanket except to avoid hiding other
errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 256f248f-91e2-4367-b6db-5c177ed09b06
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py
| all_opts = driver.find_elements(By.XPATH, "//li[@role='presentation']") | ||
| if len(all_opts) > 1: | ||
| all_opts[-1].click() | ||
| else: | ||
| all_opts[0].click() |
There was a problem hiding this comment.
Single-option branch likely clicks the "Search" header.
According to the PR description, the first li[@ROLE='presentation'] is the "Search" entry, not an address. The else branch here clicks all_opts[0], i.e. exactly that header, so when the user types something that returns no address matches (only the header remains) the form will not advance. Either skip the click entirely in that case or assert at least one real address option is present.
🛠️ Suggested guard
- all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']")
- if len(all_opts) > 1:
- all_opts[-1].click()
- else:
- all_opts[0].click()
+ all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']")
+ if len(all_opts) > 1:
+ # First li is the "Search" header; pick the last actual address result.
+ all_opts[-1].click()
+ else:
+ raise ValueError(
+ "No address results returned for the supplied paon/postcode"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| all_opts = driver.find_elements(By.XPATH, "//li[@role='presentation']") | |
| if len(all_opts) > 1: | |
| all_opts[-1].click() | |
| else: | |
| all_opts[0].click() | |
| all_opts = driver.find_elements(By.XPATH, "//li[`@role`='presentation']") | |
| if len(all_opts) > 1: | |
| # First li is the "Search" header; pick the last actual address result. | |
| all_opts[-1].click() | |
| else: | |
| raise ValueError( | |
| "No address results returned for the supplied paon/postcode" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 59 - 63, The code currently clicks all_opts[0] when only one
li[`@role`='presentation'] exists, but that first element is the non-selectable
"Search" header; modify the branch handling for the single-option case in the
method in CotswoldDistrictCouncil.py so it does not click the header: either
skip clicking when len(all_opts) == 1 (no-op) or raise/assert that a real
address option exists before clicking; specifically update the logic around the
all_opts variable and the XPath lookup so only a real address option (e.g.
all_opts[-1] when len>1) is clicked and the single-item case is ignored or fails
fast.
| except Exception: | ||
| continue |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Bare except … : continue silently drops bins on parse failures.
Any ValueError raised by _parse_date (e.g. unexpected date format) or missing attribute on th/td is swallowed here, so the bin is silently omitted from the output and a council formatting change will look like a partial-but-successful scrape. At minimum, log the offending row; preferably let unexpected formats raise so regressions are caught early.
♻️ Suggested change
- except Exception:
- continue
+ except (ValueError, AttributeError) as row_err:
+ # Surface unexpected row shapes instead of silently skipping.
+ print(f"Skipping row due to parse error: {row_err!r}")
+ continueBased on learnings: "when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception: | |
| continue | |
| except (ValueError, AttributeError) as row_err: | |
| # Surface unexpected row shapes instead of silently skipping. | |
| print(f"Skipping row due to parse error: {row_err!r}") | |
| continue |
🧰 Tools
🪛 Ruff (0.15.12)
[error] 102-103: try-except-continue detected, consider logging the exception
(S112)
[warning] 102-102: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/CotswoldDistrictCouncil.py`
around lines 102 - 103, The bare "except: continue" in CotswoldDistrictCouncil's
parsing loop swallows parse failures (e.g. from _parse_date) so rows are
silently dropped; replace it with explicit handling: catch only AttributeError
and ValueError, log the offending row (e.g., the local "row" or "tr" HTML
element) and the exception (using logging.exception or the class logger), and
then re-raise the exception (or allow it to propagate) so unexpected format
changes fail loudly instead of being ignored. Ensure you keep references to
_parse_date in the same loop and remove the blanket except to avoid hiding other
errors.
Summary
liis "Search", not an address)data-cell-valueattribute withget_textfallback for robust date extractionTesting
19 SUMMERS WAY, MORETON-IN-MARSH, GL56 0GB+GL56 0GB✅7+GL7 1NX✅Summary by CodeRabbit