
feat: add torfaen council scraper via itouchvision portal #2082

Open
InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:feat/torfaen-council

Conversation

@InertiaUK

@InertiaUK InertiaUK commented May 18, 2026

Summary

Adds a new scraper for Torfaen County Borough Council (W06000020). The council uses an iTouchVision iCollectionDay portal for bin lookups.

  • Selenium-based (iTouchVision is fully JS-rendered)
  • Enters postcode, selects address from dropdown by matching house_number
  • Parses collection cards for bin type + next collection dates
  • Falls back to first address if no match found
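
The address-selection step described above can be sketched without a browser. This is a minimal illustration, not the code in the PR: `pick_address` is a hypothetical helper, the option labels are made up, and it assumes the dropdown's placeholder entry has already been removed from the list.

```python
import re


def pick_address(options, house_number):
    """Return the index of the dropdown label that matches house_number.

    options: visible address labels in dropdown order (placeholder excluded).
    Falls back to index 0, the first address, when nothing matches.
    """
    wanted = re.sub(r"\s+", " ", house_number or "").strip().lower()
    if wanted:
        for index, label in enumerate(options):
            normalised = re.sub(r"\s+", " ", label).strip().lower()
            # Require a whole leading token so "4" does not match "44 ...".
            if (
                normalised == wanted
                or normalised.startswith(wanted + " ")
                or normalised.startswith(wanted + ",")
            ):
                return index
    return 0  # fallback: first address in the list


# Hypothetical labels for illustration only.
labels = ["4 Wayfield Crescent, NP44 1NH", "44 Wayfield Crescent, NP44 1NH"]
print(pick_address(labels, "44 Wayfield Crescent"))
```

Matching on a whole leading token is a design choice in this sketch: a plain prefix match would let "4" select "44 Wayfield Crescent".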

Testing

  • NP44 1NH + '44 Wayfield Crescent' - returns 6 bins
  • Tested via Grid and local Selenium
  • Confirmed working through production API wrapper end-to-end

Summary by CodeRabbit

  • New Features
    • Added support for Torfaen Council bin collection schedules. Users can now check their waste collection dates by entering their postcode.

Review Change Stack

Selenium-based scraper for the iTouchVision iCollectionDay portal used
by Torfaen County Borough Council. Enters postcode, selects address from
dropdown, then parses collection cards for bin type and next dates.

Requires Selenium webdriver.
@coderabbitai

coderabbitai Bot commented May 18, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 24 minutes and 12 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7144b4c3-134c-4ecd-afd7-71e838cb6254

📥 Commits

Reviewing files that changed from the base of the PR and between 815df57 and 6f9f97c.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py
📝 Walkthrough

Walkthrough

This PR introduces a new Selenium-based council scraper module for Torfaen Council that queries an iTouchVision collection-day service by postcode, parses the returned HTML for upcoming bin collection dates, normalizes date strings, and returns structured bin data. Test configuration is added alongside the implementation.

Changes

Torfaen Council Bin Collection Scraper

| Layer / File(s) | Summary |
|---|---|
| Date parsing helper and module constants: `uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py` (lines 1–31) | Module imports Selenium and datetime utilities, defines ITV_URL endpoint constant, and implements _parse_date(text) helper that converts weekday/month text into normalized datetime values using current year with month-boundary adjustment, returning None on parse failure. |
| Web scraping and bin data extraction: `uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py` (lines 34–132) | CouncilClass.parse_data initializes Selenium webdriver, loads the ITV page, searches by postcode, optionally selects a dropdown option based on PAON prefix matching, parses HTML for "Your next collections" sections, extracts bin types and date strings via regex, normalizes dates using _parse_date, de-duplicates (bin type, date) pairs per page, and sorts results chronologically. Exception handling prints and re-raises errors; webdriver is always terminated in a finally block. |
| Test configuration: `uk_bin_collection/tests/input.json` (lines 2880–2889) | New TorfaenCouncil test entry in the configuration JSON with house number, postcode, skip_get_url flag, scraper URL, webdriver requirement, wiki metadata, and LAD24CD identifier. |
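
The de-duplication and chronological sort mentioned in the walkthrough can be illustrated in isolation. This is a rough sketch rather than the PR's code: `build_bins` is a hypothetical helper, and the `"%d/%m/%Y"` value for `date_format` is an assumption about the project's shared constant. The `bins`, `type` and `collectionDate` keys are the ones shown in the review diff.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumed value of the shared date_format constant


def build_bins(pairs, date_format=DATE_FORMAT):
    """Turn (bin_type, datetime) pairs into de-duplicated, date-sorted bins."""
    seen = set()
    bins = []
    for bin_type, when in pairs:
        collection_date = when.strftime(date_format)
        key = (bin_type, collection_date)
        if key in seen:
            continue  # same bin on the same day was already recorded
        seen.add(key)
        bins.append({"type": bin_type, "collectionDate": collection_date})
    # Sort on the parsed date, not the string, so months order correctly.
    bins.sort(key=lambda b: datetime.strptime(b["collectionDate"], date_format))
    return {"bins": bins}
```

Sorting on the parsed value matters because day-first strings do not sort chronologically as text.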

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A new council joins the quest so bright,
Torfaen's bins shall shine with scraped delight,
With Selenium's click and dates parsed true,
Collection day secrets, now in review! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a new council scraper for Torfaen using the iTouchVision portal, which matches the primary purpose of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests


@codecov
codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (6f9f97c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2082   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py`:
- Around line 20-31: The _parse_date function currently returns None for
unparseable dates which lets callers silently skip bad input; change it to raise
a descriptive ValueError (include the raw text and attempted formats) instead of
returning None, and update the code that calls _parse_date (the
collection-parsing loop that currently ignores None) to catch that ValueError
and fail fast or propagate it so upstream can detect format regressions;
reference and modify _parse_date and the caller in the collection parsing code
to implement this behavior.
- Line 41: The code only reads kwargs.get("paon") into user_paon and then later
hard-selects index 1 when no exact match, which can return the wrong property;
update the logic to accept either kwargs.get("paon") or
kwargs.get("house_number") (e.g., set user_paon = kwargs.get("paon") or
kwargs.get("house_number")), normalize both the user_paon and candidate paon
strings before comparing, and change the fallback in the matching loop (the
block around where candidates are filtered and index 1 is chosen between lines
64-79) to select the first best candidate (e.g., first non-empty match or
candidates[0]) instead of always picking index 1 so you don't return another
property’s collections. Ensure this uses the same matching routine used
elsewhere in TorfaenCouncil.py so comparisons are consistent.
- Around line 120-125: The code currently returns data even when data["bins"] is
empty which can hide parser failures; inside the TorfaenCouncil parsing method
(the block that sorts using data["bins"].sort and datetime.strptime with
date_format), add an explicit check after extraction/sorting: if not
data.get("bins"): raise a clear exception (e.g., ValueError or a custom
ParserError) with a descriptive message like "No bins extracted from
TorfaenCouncil" so schema/HTML drift surfaces immediately instead of returning
an empty payload.
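
The second and third findings above could be addressed along these lines. This is only a sketch under assumptions: `resolve_paon` and `require_bins` are hypothetical helper names that do not appear in the PR, and the error message follows the wording suggested in the finding.

```python
def resolve_paon(kwargs):
    """Read the house identifier from either kwarg and normalise it."""
    raw = kwargs.get("paon") or kwargs.get("house_number") or ""
    # Collapse runs of whitespace and lowercase for consistent comparison.
    return " ".join(str(raw).split()).lower()


def require_bins(data, council="TorfaenCouncil"):
    """Raise instead of returning an empty payload."""
    if not data.get("bins"):
        raise ValueError(f"No bins extracted from {council}")
    return data


print(resolve_paon({"house_number": " 44  Wayfield Crescent "}))
```

Raising on an empty result means a change in the portal's HTML shows up as an error rather than as a response with no collections.
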
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d31f0a8-ca29-48e7-aa1f-08017674507b

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 815df57.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py

Comment on lines +20 to +31
def _parse_date(text):
    text = text.strip()
    current_year = datetime.now().year
    for fmt in ["%A %d %B", "%d %B", "%A %d %b", "%d %b"]:
        try:
            parsed = datetime.strptime(text, fmt).replace(year=current_year)
            if parsed.month < datetime.now().month - 1:
                parsed = parsed.replace(year=current_year + 1)
            return parsed
        except ValueError:
            continue
    return None

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unparseable collection dates instead of silently skipping them.

Line 31 returns None, and Lines 109-111 silently ignore failed parses. That can hide upstream format changes and emit incomplete data.

Proposed fix
 def _parse_date(text):
@@
-    return None
+    raise ValueError(f"Unsupported date format: {text!r}")
@@
-                for date_str in date_matches:
-                    parsed = _parse_date(date_str)
-                    if parsed:
-                        cd = parsed.strftime(date_format)
-                        key = (bin_type, cd)
-                        if key not in seen:
-                            seen.add(key)
-                            data["bins"].append({
-                                "type": bin_type,
-                                "collectionDate": cd,
-                            })
+                for date_str in date_matches:
+                    parsed = _parse_date(date_str)
+                    cd = parsed.strftime(date_format)
+                    key = (bin_type, cd)
+                    if key not in seen:
+                        seen.add(key)
+                        data["bins"].append({
+                            "type": bin_type,
+                            "collectionDate": cd,
+                        })

Based on learnings: prefer explicit failures on unexpected parsing formats instead of silent defaults.

Also applies to: 108-111
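
For reference, a standalone variant of the fail-fast behaviour proposed above might look like the following. It is a sketch, not the PR's implementation: `parse_collection_date` and its `today` parameter are hypothetical (the parameter exists only to make the year rollover testable), and it assumes an English locale for `%A`, `%B` and `%b`. It keeps the same four formats and the same month-boundary heuristic as `_parse_date`, but parses with the year attached instead of calling `replace` afterwards.

```python
from datetime import datetime

FORMATS = ("%A %d %B", "%d %B", "%A %d %b", "%d %b")


def parse_collection_date(text, today=None):
    """Parse a year-less date string, raising ValueError if no format fits."""
    today = today or datetime.now()
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            # Append the current year so the date is validated in that year.
            parsed = datetime.strptime(f"{cleaned} {today.year}", f"{fmt} %Y")
        except ValueError:
            continue
        # Same heuristic as the PR: a month well behind the current one
        # is taken to mean next year (e.g. "5 Jan" seen in December).
        if parsed.month < today.month - 1:
            parsed = parsed.replace(year=today.year + 1)
        return parsed
    raise ValueError(
        f"Unsupported date format: {text!r} (tried {', '.join(FORMATS)})"
    )


print(parse_collection_date("Monday 25 May", today=datetime(2026, 5, 18)))
```

Including the attempted formats in the message follows the suggestion in the review prompt to report both the raw text and what was tried.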


Comment thread uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py Outdated
