fix(WirralCouncil): Rewrite for new LocalGov Drupal site#2055
fix(WirralCouncil): Rewrite for new LocalGov Drupal site#2055InertiaUK wants to merge 2 commits into
Conversation
|
Warning Review limit reached
Your plan currently allows 2 reviews/hour. Refill in 27 minutes and 52 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughWirralCouncil.py's ChangesWirral Council HTTP Session and Calendar Parsing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2055 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py`:
- Line 29: The three calls that disable TLS verification (the s.get(...) that
assigns to r1, and the similar calls assigning to r2 and r3) must not pass
verify=False; remove that parameter (or set verify=True) so the requests.Session
's' uses normal certificate/hostname validation when contacting wirral.gov.uk;
update the get calls in the WirralCouncil class/method that perform
postcode/UPRN requests to omit verify=False and ensure no InsecureRequestWarning
is emitted.
- Around line 161-162: In WirralCouncil (class WirralCouncil) remove the silent
"except ValueError: pass" handlers used during date parsing and instead log the
offending raw input and re-raise the error (or simply let the ValueError
propagate) so upstream code can detect format regressions; update both
occurrences that currently swallow ValueError (the two try/except blocks shown)
to either (a) process_logger.error("Unexpected date format", extra={"input":
raw_date_text}) then raise, or (b) delete the except so the original ValueError
bubbles up—ensure the logged variable contains the original string being parsed
so the failing input is visible in logs.
- Line 94: The code uses a single current_year = datetime.now().year for all
parsed months which breaks the six-month window when it crosses year boundaries;
update the parsing in WirralCouncil (the loop that computes collection_date from
month headings and the "Next collection" fallback) to track the previous_month
during iteration and increment the year whenever the parsed month number
decreases (or whenever a parsed month is earlier than datetime.now().month and
you haven’t wrapped yet), and when building collection_date use the adjusted
year (not the fixed current_year); for the "Next collection" fallback (the logic
that chooses between current_year and current_year+1) compute the candidate
dates for both years and pick the one that is not in the past so January/Feb of
next year aren’t treated as stale.
- Around line 78-79: The code sets action = form2.get("action", "") and builds
post_url = f"{base}{action}" if action.startswith("/") else action, which yields
an empty post_url when action is "" and later causes s.post("", ...) errors;
update the logic so when action is empty or missing you fall back to the current
page url (variable url) and still handle leading-slash actions by prefixing
base: i.e. check if not action or action == "" then set post_url = url, else if
action.startswith("/") set post_url = f"{base}{action}" else set post_url =
action; ensure s.post uses post_url.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a812dcb-9f00-4d08-9e01-5a9190a32a9e
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Avoid silently swallowing ValueError during date parsing.
Both fallback branches pass on ValueError, which hides legitimate format changes (e.g. month renamed, day rendered as "1st", new bin-type label) behind an eventual generic "No collection data found on page". Per prior guidance for this repo, council parsers should fail loudly when the upstream HTML deviates from expectations so regressions are caught early rather than producing empty results.
At minimum, log the offending input; preferably, drop the bare except so unexpected formats surface to the caller (the outer "No collection data found" check still gives a clean error for genuinely empty pages).
Based on learnings: "when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."
Also applies to: 200-201
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py` around lines
161 - 162, In WirralCouncil (class WirralCouncil) remove the silent "except
ValueError: pass" handlers used during date parsing and instead log the
offending raw input and re-raise the error (or simply let the ValueError
propagate) so upstream code can detect format regressions; update both
occurrences that currently swallow ValueError (the two try/except blocks shown)
to either (a) process_logger.error("Unexpected date format", extra={"input":
raw_date_text}) then raise, or (b) delete the except so the original ValueError
bubbles up—ensure the logged variable contains the original string being parsed
so the failing input is visible in logs.
da859e0 to
fdf6fe2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py`:
- Around line 111-116: The code finds structured day elements into the variable
day_items via container.find_all("div",
class_=re.compile(r"localgov-waste-collection-day")) but never processes them;
update the function that parses month containers to handle these structured
elements (or remove the lookup). If you choose to process them, iterate over
day_items, extract the day number and collection types from each element (use
the same container variable and the class pattern localgov-waste-collection-day
to locate inner text or sub-elements), convert to the existing month/day
collection format used elsewhere in this function, and append to the month
results; alternatively, remove the day_items lookup and always fall back to the
existing text parsing path to avoid dead code. Ensure you modify the function
containing day_items so the parsed collections are added to the same output
structure used by the rest of the parser.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56c3fba3-808e-469e-a048-34e398da940f
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
| # Find day entries - each has a day number and collection info | ||
| day_items = container.find_all( | ||
| "div", class_=re.compile(r"localgov-waste-collection-day") | ||
| ) | ||
| if green_match: | ||
| date_str = green_match.group(1) | ||
| collection_date = datetime.strptime(date_str, "%d %B %Y").strftime( | ||
| date_format | ||
| ) | ||
| bindata["bins"].append( | ||
| { | ||
| "type": "Green bin (non-recyclable waste)", | ||
| "collectionDate": collection_date, | ||
| } | ||
| ) | ||
|
|
||
| # Sort by collection date | ||
| bindata["bins"].sort( | ||
| key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) | ||
| ) | ||
|
|
||
| except Exception as e: | ||
| print(f"An error occurred: {e}") | ||
| raise | ||
| finally: | ||
| # Close the driver | ||
| if driver: | ||
| driver.quit() | ||
|
|
||
| return bindata | ||
| if not day_items: |
There was a problem hiding this comment.
Structured day elements are found but never processed.
When day_items is non-empty, the code skips directly past the if not day_items: block without extracting any collection data from those elements. This means if the council's page uses the localgov-waste-collection-day structure, all months will be silently skipped, resulting in either missing data or a "No collection data found" error.
Either implement processing for the structured elements, or remove the day_items lookup and always use the text fallback.
🐛 Suggested fix: process structured elements or remove dead code
Option A: Always use text fallback (simplest)
- day_items = container.find_all(
- "div", class_=re.compile(r"localgov-waste-collection-day")
- )
-
- if not day_items:
- # Fallback: parse text content
- text = container.get_text(separator="\n", strip=True)
+ # Parse text content from container
+ text = container.get_text(separator="\n", strip=True)Option B: Process structured elements when present
day_items = container.find_all(
"div", class_=re.compile(r"localgov-waste-collection-day")
)
- if not day_items:
+ if day_items:
+ # Process structured day elements
+ for day_item in day_items:
+ # Extract day number and bin type from structured markup
+ day_text = day_item.get_text(separator=" ", strip=True)
+ # TODO: Parse day number and bin types from day_item structure
+ pass
+ else:
# Fallback: parse text content🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py` around lines
111 - 116, The code finds structured day elements into the variable day_items
via container.find_all("div",
class_=re.compile(r"localgov-waste-collection-day")) but never processes them;
update the function that parses month containers to handle these structured
elements (or remove the lookup). If you choose to process them, iterate over
day_items, extract the day number and collection types from each element (use
the same container variable and the class pattern localgov-waste-collection-day
to locate inner text or sub-elements), convert to the existing month/day
collection format used elsewhere in this function, and append to the month
results; alternatively, remove the day_items lookup and always fall back to the
existing text parsing path to avoid dead code. Ensure you modify the function
containing day_items so the parsed collections are added to the same output
structure used by the rest of the parser.
Summary
wdp.wirral.gov.uk) no longer resolves — council migrated towirral.gov.uk/bins-and-recycling/bin-collection-datesrunning LocalGov DrupalTest
Summary by CodeRabbit
Bug Fixes
Refactor