feat: add torfaen council scraper via itouchvision portal #2082
InertiaUK wants to merge 2 commits into
Selenium-based scraper for the iTouchVision iCollectionDay portal used by Torfaen County Borough Council. Enters postcode, selects address from dropdown, then parses collection cards for bin type and next dates. Requires Selenium webdriver.
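The last stage of that flow, turning parsed cards into bin entries, can be sketched as below. This is an illustration rather than the PR's code: `build_payload` and `DATE_FORMAT` are hypothetical names, and the `%d/%m/%Y` value is an assumption about the project's shared `date_format`. It takes bin types and dates that have already been parsed, so no browser is needed to run it.

```python
from datetime import datetime

# Assumption: the project's shared date_format is day/month/year.
DATE_FORMAT = "%d/%m/%Y"


def build_payload(collections):
    """Turn (bin_type, datetime) pairs into a {"bins": [...]} payload.

    Duplicate (type, date) pairs are dropped and the result is sorted
    by collection date, earliest first.
    """
    data = {"bins": []}
    seen = set()
    for bin_type, when in collections:
        cd = when.strftime(DATE_FORMAT)
        key = (bin_type, cd)
        if key not in seen:
            seen.add(key)
            data["bins"].append({"type": bin_type, "collectionDate": cd})
    # Sort on the parsed date, not the string, so 02/06 does not sort before 20/05.
    data["bins"].sort(
        key=lambda b: datetime.strptime(b["collectionDate"], DATE_FORMAT)
    )
    return data


# Illustrative input only; these are not real Torfaen collection dates.
example = build_payload([
    ("Recycling", datetime(2025, 5, 20)),
    ("General Waste", datetime(2025, 5, 13)),
    ("Recycling", datetime(2025, 5, 20)),
])
```

Sorting on the parsed date rather than the formatted string matters because day-first strings do not sort chronologically.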
📝 Walkthrough
This PR introduces a new Selenium-based council scraper module for Torfaen Council that queries an iTouchVision collection-day service by postcode, parses the returned HTML for upcoming bin collection dates, normalizes date strings, and returns structured bin data. Test configuration is added alongside the implementation.
Changes: Torfaen Council Bin Collection Scraper
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2082   +/-   ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py`:
- Around line 20-31: The _parse_date function currently returns None for
unparseable dates which lets callers silently skip bad input; change it to raise
a descriptive ValueError (include the raw text and attempted formats) instead of
returning None, and update the code that calls _parse_date (the
collection-parsing loop that currently ignores None) to catch that ValueError
and fail fast or propagate it so upstream can detect format regressions;
reference and modify _parse_date and the caller in the collection parsing code
to implement this behavior.
- Line 41: The code only reads kwargs.get("paon") into user_paon and then later
hard-selects index 1 when no exact match, which can return the wrong property;
update the logic to accept either kwargs.get("paon") or
kwargs.get("house_number") (e.g., set user_paon = kwargs.get("paon") or
kwargs.get("house_number")), normalize both the user_paon and candidate paon
strings before comparing, and change the fallback in the matching loop (the
block around where candidates are filtered and index 1 is chosen between lines
64-79) to select the first best candidate (e.g., first non-empty match or
candidates[0]) instead of always picking index 1 so you don't return another
property’s collections. Ensure this uses the same matching routine used
elsewhere in TorfaenCouncil.py so comparisons are consistent.
- Around line 120-125: The code currently returns data even when data["bins"] is
empty which can hide parser failures; inside the TorfaenCouncil parsing method
(the block that sorts using data["bins"].sort and datetime.strptime with
date_format), add an explicit check after extraction/sorting: if not
data.get("bins"): raise a clear exception (e.g., ValueError or a custom
ParserError) with a descriptive message like "No bins extracted from
TorfaenCouncil" so schema/HTML drift surfaces immediately instead of returning
an empty payload.
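The second and third findings above can be sketched as two small helpers. This is a minimal illustration under stated assumptions, not the PR's code: `normalise`, `pick_address`, and `require_bins` are hypothetical names, and the dropdown is assumed to expose plain label strings that begin with the house number or name. One deliberate difference from the finding: when a PAON is supplied but nothing matches, the sketch raises instead of falling back to the first candidate, because any fallback would still return another property's collections.

```python
def normalise(text):
    """Upper-case, turn commas into spaces, and collapse whitespace."""
    return " ".join(str(text).upper().replace(",", " ").split())


def pick_address(options, user_paon=None):
    """Return the index of the dropdown option matching user_paon.

    options: list of address label strings (hypothetical shape; any
    placeholder entry is assumed to have been removed already).
    """
    if not options:
        raise ValueError("No addresses returned for postcode")
    if not user_paon:
        # No house number/name supplied: the first real option is the best guess.
        return 0
    wanted = normalise(user_paon)
    for index, label in enumerate(options):
        candidate = normalise(label)
        # Match whole leading tokens only, so "1" does not match "12 High Street".
        if candidate == wanted or candidate.startswith(wanted + " "):
            return index
    raise ValueError(f"No address matching {user_paon!r} in {len(options)} options")


def require_bins(data, council="TorfaenCouncil"):
    """Fail loudly when nothing was extracted, so HTML drift is visible."""
    if not data.get("bins"):
        raise ValueError(f"No bins extracted from {council}")
    return data
```

A caller would resolve the PAON with something like `kwargs.get("paon") or kwargs.get("house_number")`, as the finding suggests, before passing it to `pick_address`.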
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d31f0a8-ca29-48e7-aa1f-08017674507b
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/TorfaenCouncil.py
def _parse_date(text):
    text = text.strip()
    current_year = datetime.now().year
    for fmt in ["%A %d %B", "%d %B", "%A %d %b", "%d %b"]:
        try:
            parsed = datetime.strptime(text, fmt).replace(year=current_year)
            if parsed.month < datetime.now().month - 1:
                parsed = parsed.replace(year=current_year + 1)
            return parsed
        except ValueError:
            continue
    return None
Fail fast on unparseable collection dates instead of silently skipping them.
Line 31 returns None, and Lines 109-111 silently ignore failed parses. That can hide upstream format changes and emit incomplete data.
Proposed fix
 def _parse_date(text):
@@
-    return None
+    raise ValueError(f"Unsupported date format: {text!r}")
@@
-    for date_str in date_matches:
-        parsed = _parse_date(date_str)
-        if parsed:
-            cd = parsed.strftime(date_format)
-            key = (bin_type, cd)
-            if key not in seen:
-                seen.add(key)
-                data["bins"].append({
-                    "type": bin_type,
-                    "collectionDate": cd,
-                })
+    for date_str in date_matches:
+        parsed = _parse_date(date_str)
+        cd = parsed.strftime(date_format)
+        key = (bin_type, cd)
+        if key not in seen:
+            seen.add(key)
+            data["bins"].append({
+                "type": bin_type,
+                "collectionDate": cd,
+            })
Based on learnings: prefer explicit failures on unexpected parsing formats instead of silent defaults.
Also applies to: 108-111
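For reference, a fail-fast parser along the lines of this comment might look like the sketch below. It is not the PR's implementation: `parse_collection_date`, the `today` parameter, and `grace_days` are hypothetical, and the year rule (take the first candidate year that is not more than `grace_days` in the past) is a swapped-in heuristic, not the month comparison used in the reviewed code. The year is appended before parsing so that "29 February" is validated against a real year and not `strptime`'s default of 1900.

```python
from datetime import date, datetime, timedelta

# Same formats as the reviewed _parse_date; the portal text carries no year.
FORMATS = ["%A %d %B", "%d %B", "%A %d %b", "%d %b"]


def parse_collection_date(text, today=None, grace_days=31):
    """Parse a year-less date string, raising ValueError if no format fits.

    today and grace_days are hypothetical parameters added so the
    year inference is testable and explicit.
    """
    text = text.strip()
    today = today or date.today()
    earliest = today - timedelta(days=grace_days)
    for fmt in FORMATS:
        # Try the current year first, then roll over to the next one.
        for year in (today.year, today.year + 1):
            try:
                parsed = datetime.strptime(f"{text} {year}", f"{fmt} %Y")
            except ValueError:
                continue
            if parsed.date() >= earliest:
                return parsed
    raise ValueError(f"Unsupported date format: {text!r} (tried {FORMATS})")
```

Passing `today` explicitly keeps the December-to-January rollover deterministic in tests; production callers can omit it.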
Summary
Adds a new scraper for Torfaen County Borough Council (W06000020). The council uses an iTouchVision iCollectionDay portal for bin lookups.
Testing