Skip to content

fix(WirralCouncil): Rewrite for new LocalGov Drupal site#2055

Open
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix-wirral-localgov-drupal
Open

fix(WirralCouncil): Rewrite for new LocalGov Drupal site#2055
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix-wirral-localgov-drupal

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • Old URL (wdp.wirral.gov.uk) no longer resolves — council migrated to wirral.gov.uk/bins-and-recycling/bin-collection-dates running LocalGov Drupal
  • Rewrote as pure requests (no Selenium needed): GET form tokens → POST postcode → POST UPRN selection → parse calendar
  • Returns full 6-month calendar with all collection types

Test

  • Postcode: CH44 2BH
  • UPRN: 42029386

Summary by CodeRabbit

  • Bug Fixes

    • More reliable retrieval of Wirral bin collection dates with improved address validation, clearer error handling when address options are missing, and a “Next collection” fallback to surface upcoming pickups.
  • Refactor

    • Reworked the backend fetch/parsing flow for Wirral Council to improve stability and make schedule extraction more robust across varied page layouts.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 27 minutes and 52 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3665a968-8654-41c9-ab6e-45ae679f3350

📥 Commits

Reviewing files that changed from the base of the PR and between fdf6fe2 and 11c4903.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
📝 Walkthrough

Walkthrough

WirralCouncil.py's parse_data method was completely reimplemented to replace Selenium-driven browser automation with HTTP requests and BeautifulSoup HTML parsing. The new flow posts a postcode to retrieve an address list, selects a UPRN, fetches the calendar page, and parses bin collection dates using month headings and day elements, with multi-stage fallbacks.

Changes

Wirral Council HTTP Session and Calendar Parsing

Layer / File(s) Summary
Imports and HTTP Session Workflow
uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
Selenium imports removed; requests and BeautifulSoup maintained. parse_data reimplemented to build a session, POST postcode to retrieve address list, select provided or first-matched UPRN, and POST UPRN to retrieve calendar content, with explicit error handling for missing forms and unmatched UPRNs.
Calendar Parsing with Month Iteration and Fallbacks
uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
Calendar parsing changed from regex extraction of fixed bin types ("Grey", "Green") to iterating h3 month headings and extracting day entries from div.localgov-waste-collection-day elements; includes structured element-based path and text-content fallback; "Next collection" section parsing added as final fallback if primary parsing yields no bins.
Result Sorting and Return
uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py
Parsed bins are sorted by collectionDate and returned in standard {"bins": [...]} structure.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • robbrad/UKBinCollectionData#1764: Both PRs replace Selenium/form-based retrieval workflows with requests+BeautifulSoup parsing, rewriting council parse_data methods to extract month/day entries and produce sorted bin collection records.

Suggested reviewers

  • dp247

Poem

🐰 I hopped from Selenium's clicky parade,
To quiet requests where no browsers stayed.
BeautifulSoup nibbles months and their dates,
Fallback crumbs find the bins at the gates.
Sorted and tidy — a rabbit's clean parade. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: rewriting WirralCouncil to work with the new LocalGov Drupal site after the council's migration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (11c4903).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2055   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py`:
- Line 29: The three calls that disable TLS verification (the s.get(...) that
assigns to r1, and the similar calls assigning to r2 and r3) must not pass
verify=False; remove that parameter (or set verify=True) so the requests.Session
's' uses normal certificate/hostname validation when contacting wirral.gov.uk;
update the get calls in the WirralCouncil class/method that perform
postcode/UPRN requests to omit verify=False and ensure no InsecureRequestWarning
is emitted.
- Around line 161-162: In WirralCouncil (class WirralCouncil) remove the silent
"except ValueError: pass" handlers used during date parsing and instead log the
offending raw input and re-raise the error (or simply let the ValueError
propagate) so upstream code can detect format regressions; update both
occurrences that currently swallow ValueError (the two try/except blocks shown)
to either (a) process_logger.error("Unexpected date format", extra={"input":
raw_date_text}) then raise, or (b) delete the except so the original ValueError
bubbles up—ensure the logged variable contains the original string being parsed
so the failing input is visible in logs.
- Line 94: The code uses a single current_year = datetime.now().year for all
parsed months which breaks the six-month window when it crosses year boundaries;
update the parsing in WirralCouncil (the loop that computes collection_date from
month headings and the "Next collection" fallback) to track the previous_month
during iteration and increment the year whenever the parsed month number
decreases (or whenever a parsed month is earlier than datetime.now().month and
you haven’t wrapped yet), and when building collection_date use the adjusted
year (not the fixed current_year); for the "Next collection" fallback (the logic
that chooses between current_year and current_year+1) compute the candidate
dates for both years and pick the one that is not in the past so January/Feb of
next year aren’t treated as stale.
- Around line 78-79: The code sets action = form2.get("action", "") and builds
post_url = f"{base}{action}" if action.startswith("/") else action, which yields
an empty post_url when action is "" and later causes s.post("", ...) errors;
update the logic so when action is empty or missing you fall back to the current
page url (variable url) and still handle leading-slash actions by prefixing
base: i.e. check if not action or action == "" then set post_url = url, else if
action.startswith("/") set post_url = f"{base}{action}" else set post_url =
action; ensure s.post uses post_url.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a812dcb-9f00-4d08-9e01-5a9190a32a9e

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and da859e0.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py

Comment thread uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py Outdated
Comment on lines +161 to +162
except ValueError:
pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid silently swallowing ValueError during date parsing.

Both fallback branches pass on ValueError, which hides legitimate format changes (e.g. month renamed, day rendered as "1st", new bin-type label) behind an eventual generic "No collection data found on page". Per prior guidance for this repo, council parsers should fail loudly when the upstream HTML deviates from expectations so regressions are caught early rather than producing empty results.

At minimum, log the offending input; preferably, drop the bare except so unexpected formats surface to the caller (the outer "No collection data found" check still gives a clean error for genuinely empty pages).

Based on learnings: "when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors."

Also applies to: 200-201

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py` around lines
161 - 162, In WirralCouncil (class WirralCouncil) remove the silent "except
ValueError: pass" handlers used during date parsing and instead log the
offending raw input and re-raise the error (or simply let the ValueError
propagate) so upstream code can detect format regressions; update both
occurrences that currently swallow ValueError (the two try/except blocks shown)
to either (a) process_logger.error("Unexpected date format", extra={"input":
raw_date_text}) then raise, or (b) delete the except so the original ValueError
bubbles up—ensure the logged variable contains the original string being parsed
so the failing input is visible in logs.

@InertiaUK InertiaUK force-pushed the fix-wirral-localgov-drupal branch from da859e0 to fdf6fe2 Compare May 12, 2026 15:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py`:
- Around line 111-116: The code finds structured day elements into the variable
day_items via container.find_all("div",
class_=re.compile(r"localgov-waste-collection-day")) but never processes them;
update the function that parses month containers to handle these structured
elements (or remove the lookup). If you choose to process them, iterate over
day_items, extract the day number and collection types from each element (use
the same container variable and the class pattern localgov-waste-collection-day
to locate inner text or sub-elements), convert to the existing month/day
collection format used elsewhere in this function, and append to the month
results; alternatively, remove the day_items lookup and always fall back to the
existing text parsing path to avoid dead code. Ensure you modify the function
containing day_items so the parsed collections are added to the same output
structure used by the rest of the parser.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56c3fba3-808e-469e-a048-34e398da940f

📥 Commits

Reviewing files that changed from the base of the PR and between da859e0 and fdf6fe2.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py

Comment on lines +111 to +116
# Find day entries - each has a day number and collection info
day_items = container.find_all(
"div", class_=re.compile(r"localgov-waste-collection-day")
)
if green_match:
date_str = green_match.group(1)
collection_date = datetime.strptime(date_str, "%d %B %Y").strftime(
date_format
)
bindata["bins"].append(
{
"type": "Green bin (non-recyclable waste)",
"collectionDate": collection_date,
}
)

# Sort by collection date
bindata["bins"].sort(
key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
)

except Exception as e:
print(f"An error occurred: {e}")
raise
finally:
# Close the driver
if driver:
driver.quit()

return bindata
if not day_items:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Structured day elements are found but never processed.

When day_items is non-empty, the code skips directly past the if not day_items: block without extracting any collection data from those elements. This means if the council's page uses the localgov-waste-collection-day structure, all months will be silently skipped, resulting in either missing data or a "No collection data found" error.

Either implement processing for the structured elements, or remove the day_items lookup and always use the text fallback.

🐛 Suggested fix: process structured elements or remove dead code

Option A: Always use text fallback (simplest)

-            day_items = container.find_all(
-                "div", class_=re.compile(r"localgov-waste-collection-day")
-            )
-
-            if not day_items:
-                # Fallback: parse text content
-                text = container.get_text(separator="\n", strip=True)
+            # Parse text content from container
+            text = container.get_text(separator="\n", strip=True)

Option B: Process structured elements when present

             day_items = container.find_all(
                 "div", class_=re.compile(r"localgov-waste-collection-day")
             )

-            if not day_items:
+            if day_items:
+                # Process structured day elements
+                for day_item in day_items:
+                    # Extract day number and bin type from structured markup
+                    day_text = day_item.get_text(separator=" ", strip=True)
+                    # TODO: Parse day number and bin types from day_item structure
+                    pass
+            else:
                 # Fallback: parse text content
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/WirralCouncil.py` around lines
111 - 116, The code finds structured day elements into the variable day_items
via container.find_all("div",
class_=re.compile(r"localgov-waste-collection-day")) but never processes them;
update the function that parses month containers to handle these structured
elements (or remove the lookup). If you choose to process them, iterate over
day_items, extract the day number and collection types from each element (use
the same container variable and the class pattern localgov-waste-collection-day
to locate inner text or sub-elements), convert to the existing month/day
collection format used elsewhere in this function, and append to the month
results; alternatively, remove the day_items lookup and always fall back to the
existing text parsing path to avoid dead code. Ensure you modify the function
containing day_items so the parsed collections are added to the same output
structure used by the rest of the parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant