Skip to content

fix: CalderdaleCouncil - rewrite as pure requests#2058

Open
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/calderdale-pure-requests
Open

fix: CalderdaleCouncil - rewrite as pure requests#2058
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/calderdale-pure-requests

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • Rewrote CalderdaleCouncil scraper to use direct HTTP POST to the JSP form instead of Selenium
  • The council's server-side form has no reCAPTCHA, so pure requests work reliably
  • Two-step flow: POST postcode → get address dropdown, POST UPRN → get collection table
  • Added UPRN zero-padding (.zfill(12)) since the council uses 12-digit padded UPRNs
  • Removed web_driver from input.json as Selenium is no longer needed

Testing

  • Tested directly on VPS: returns Recycling, Waste, and Garden collection dates
  • Tested via production API proxy: returns correct data for geocoded addresses

Summary by CodeRabbit

  • Refactor

    • Calderdale collection retrieval revamped for faster, more reliable results with improved postcode/UPRN handling, cleaner date extraction, and consistent sorting of upcoming collections (no browser automation required).
  • Chores

    • Council configuration entries updated: Calderdale and North Hertfordshire adjustments and a minor wiki note text fix for EnvironmentFirst.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 24 minutes and 46 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c95316a-f655-472e-86c7-f4d7705d3aa8

📥 Commits

Reviewing files that changed from the base of the PR and between 54024c8 and e7bdec8.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
📝 Walkthrough

Walkthrough

CalderdaleCouncil's collection parser was migrated from Selenium web driver automation to HTTP requests with BeautifulSoup parsing. The implementation validates address lookups via two sequential POST requests, extracts bin collection dates from HTML tables, and sorts results by collection date. Configuration entries were updated to reflect the removal of web driver requirements and related metadata changes.

Changes

Calderdale Council Parser Migration: Selenium to HTTP Requests

Layer / File(s) Summary
Parser implementation migration: imports, validation, and HTTP requests
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
Import statements changed from Selenium modules to requests and BeautifulSoup. Parser validates uprn and postcode, creates HTTP session with User-Agent header, performs two POST requests to retrieve UPRN dropdown options and collection table, extracts collection dates from HTML table rows, and sorts results by date.
Configuration updates: remove web_driver and update council metadata
uk_bin_collection/tests/input.json
Calderdale Council configuration entry no longer requires web_driver field. EnvironmentFirst wiki_note instruction text was updated. North Hertfordshire District Council block was re-anchored with same structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • dp247

Poem

🐰 A scraper once bound by the browser's long chain,
Now dances through HTTP with elegant refrain,
BeautifulSoup sips the HTML delight,
No Selenium sleepy—just POST and parse right!
The bins march on time, their collections are clear.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: rewriting CalderdaleCouncil from a Selenium-based approach to pure requests/HTTP POST implementation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1 1 0 0
View the top 1 failed test(s) by shortest run time
::uk_bin_collection.tests.step_defs.test_validate_council
Stack Traces | 0s run time
.../tests/step_defs/test_validate_council.py:24: in <module>
    @pytest.fixture(params=get_council_list())
.../tests/step_defs/test_validate_council.py:19: in get_council_list
    data = json.load(file)
.../usr/lib/python3.12/json/__init__.py:293: in load
    return loads(fp.read(),
<frozen codecs>:322: in decode
    ???
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 41442: invalid start byte

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (2)

1-1: ⚡ Quick win

Remove unused import.

The re module is imported but never used in this file.

♻️ Proposed cleanup
-import re
 from datetime import datetime
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
Remove the unused top-level import of the re module in CalderdaleCouncil.py:
delete the line importing "re" (the unused symbol `re`) so the file no longer
contains an unnecessary import.

98-99: ⚡ Quick win

Consider explicit error handling for date parsing failures.

The ValueError exception is caught and silently ignored when date parsing fails. This could hide format changes from the council website or data quality issues. Based on learnings, explicit failures help detect format changes early.

Consider logging the error or raising an exception with context about which row/bin type failed to parse.

♻️ Proposed improvement

Option 1: Log and continue (if some rows may legitimately lack dates):

                     try:
                         parsed = datetime.strptime(date_text, "%A %d %B %Y")
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": parsed.strftime(date_format),
                             }
                         )
                     except ValueError:
+                        # Log unexpected date format for monitoring
+                        print(f"Warning: Could not parse date '{date_text}' for bin type '{bin_type}'")
                         continue

Option 2: Raise with context (if all rows should have valid dates):

                     try:
                         parsed = datetime.strptime(date_text, "%A %d %B %Y")
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": parsed.strftime(date_format),
                             }
                         )
                     except ValueError:
-                        continue
+                        raise ValueError(
+                            f"Failed to parse date '{date_text}' for bin type '{bin_type}'. "
+                            f"Expected format: 'Day DD Month YYYY' (e.g., 'Monday 12 May 2026')"
+                        )

Based on learnings: Functions handling parsing should validate formats explicitly to detect unexpected changes early.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 98 - 99, The silent except ValueError: continue in CalderdaleCouncil.py
hides date parsing failures; update that block (inside the CalderdaleCouncil
class/method that parses row dates) to either log the error with context
(include the offending date string, row data and bin type) and then continue, or
raise a new ValueError with a descriptive message so failures surface; use the
module logger or the class logger (e.g., self.logger or
logging.getLogger(__name__)) and include the original exception details when
logging/raising to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 881: The "wiki_note" value contains a Unicode replacement character (�);
update the string for the "wiki_note" field to replace the U+FFFD with the
intended character (e.g., an em dash — or a regular hyphen -) so the text reads
correctly ("...Replace the XXXXXXXXXX with the UPRN of your property — you can
use..."); ensure the file is saved in UTF-8 encoding after making the
replacement so the character is preserved.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: Remove the unused top-level import of the re module in
CalderdaleCouncil.py: delete the line importing "re" (the unused symbol `re`) so
the file no longer contains an unnecessary import.
- Around line 98-99: The silent except ValueError: continue in
CalderdaleCouncil.py hides date parsing failures; update that block (inside the
CalderdaleCouncil class/method that parses row dates) to either log the error
with context (include the offending date string, row data and bin type) and then
continue, or raise a new ValueError with a descriptive message so failures
surface; use the module logger or the class logger (e.g., self.logger or
logging.getLogger(__name__)) and include the original exception details when
logging/raising to aid debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3adcab2-453e-4077-8b0f-82ae464e3798

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 27f3b85.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py

"wiki_command_url_override": "https://environmentfirst.co.uk/house.php?uprn=XXXXXXXXXX",
"wiki_name": "Environment First",
"wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your property\u2014you can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
"wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your property�you can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the encoding corruption in wiki_note.

The wiki_note field contains a replacement character (U+FFFD), indicating the original character was lost during encoding. This will display incorrectly to users.

🔧 Proposed fix

The character should likely be an em dash (—) or regular dash (-):

-        "wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your property�you can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
+        "wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your property—you can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your propertyyou can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
"wiki_note": "For properties with collections managed by Environment First, such as Lewes and Eastbourne. Replace the XXXXXXXXXX with the UPRN of your propertyyou can use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find this."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/tests/input.json` at line 881, The "wiki_note" value
contains a Unicode replacement character (�); update the string for the
"wiki_note" field to replace the U+FFFD with the intended character (e.g., an em
dash — or a regular hyphen -) so the text reads correctly ("...Replace the
XXXXXXXXXX with the UPRN of your property — you can use..."); ensure the file is
saved in UTF-8 encoding after making the replacement so the character is
preserved.

@InertiaUK InertiaUK force-pushed the fix/calderdale-pure-requests branch from 27f3b85 to 54024c8 Compare May 12, 2026 15:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py (2)

1-1: 💤 Low value

Remove unused re import.

The re module is imported but never used in the code.

🧹 Proposed fix
-import re
 from datetime import datetime
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` at line 1,
The module imports the unused re symbol which should be removed to avoid linter
warnings; edit CalderdaleCouncil.py and delete the top-level "import re"
statement (or remove re from the import list if combined with others) so only
actually used modules remain, then run tests/linter to confirm no remaining
references to re in the file.

90-99: ⚡ Quick win

Consider raising on unexpected date format instead of silent continue.

If strptime fails, the code silently continues, which could mask upstream format changes from the council. If the date format changes, this scraper would return partial or empty data without any indication of why.

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. This ensures format changes are detected early."

🔍 Proposed fix to fail explicitly
                     try:
                         parsed = datetime.strptime(date_text, "%A %d %B %Y")
                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": parsed.strftime(date_format),
                             }
                         )
                     except ValueError:
-                        continue
+                        raise ValueError(
+                            f"Unable to parse date '{date_text}' for bin type '{bin_type}'"
+                        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py` around
lines 90 - 99, The current try/except in CalderdaleCouncil that parses date_text
(where parsed = datetime.strptime(date_text, "%A %d %B %Y") and appends to
data["bins"]) silently continues on ValueError; replace the silent continue with
an explicit raise (or re-raise) including contextual info (date_text, bin_type
and the expected format) so failures surface (e.g., raise ValueError with a
message like "Unexpected date format for {date_text} when parsing bins of type
{bin_type}; expected '%A %d %B %Y'") so upstream callers detect format changes
early.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Around line 28-36: The two HTTP POST calls using session.post (the one that
sets data with "postcode" and the other between lines 53-64) do not include a
timeout, so the scraper can hang; update both session.post invocations to pass a
timeout argument (e.g., timeout=10 or a module-level constant like
DEFAULT_TIMEOUT) so requests will fail fast, and keep the existing
resp.raise_for_status() logic unchanged to handle HTTP errors.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py`:
- Line 1: The module imports the unused re symbol which should be removed to
avoid linter warnings; edit CalderdaleCouncil.py and delete the top-level
"import re" statement (or remove re from the import list if combined with
others) so only actually used modules remain, then run tests/linter to confirm
no remaining references to re in the file.
- Around line 90-99: The current try/except in CalderdaleCouncil that parses
date_text (where parsed = datetime.strptime(date_text, "%A %d %B %Y") and
appends to data["bins"]) silently continues on ValueError; replace the silent
continue with an explicit raise (or re-raise) including contextual info
(date_text, bin_type and the expected format) so failures surface (e.g., raise
ValueError with a message like "Unexpected date format for {date_text} when
parsing bins of type {bin_type}; expected '%A %d %B %Y'") so upstream callers
detect format changes early.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a98ec4d-2fdc-4102-8b6c-fec2ee398bad

📥 Commits

Reviewing files that changed from the base of the PR and between 27f3b85 and 54024c8.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • uk_bin_collection/tests/input.json

Comment thread uk_bin_collection/uk_bin_collection/councils/CalderdaleCouncil.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant