feat(BassetlawDistrictCouncil): add new scraper using ReCollect API#2049
feat(BassetlawDistrictCouncil): add new scraper using ReCollect API#2049InertiaUK wants to merge 2 commits into
Conversation
Uses ReCollect API (area BassetlawUK, service 50015) with multi-strategy address resolution: full address text, postcode qualifier expansion, and combined paon+postcode search. Test address: 2 Albert Road, DN22 6JD Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Warning Review limit reached
Your plan currently allows 2 reviews/hour. Refill in 23 minutes and 59 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a new Bassetlaw District Council integration module that fetches bin collection data via the Recollect API, resolving addresses to place IDs through multiple fallback strategies and returning scheduled pickups over a 60-day window. Test fixtures are added to support the integration. ChangesBassetlaw District Council Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2049 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`:
- Around line 122-128: The parsing silently skips malformed payloads: change the
logic around events_data, the events loop and the day check to fail fast by
validating the upstream contract and raising exceptions when it's violated;
specifically, in the method that assigns events_data (variable events_data)
ensure the "events" key exists and is a list (raise ValueError/RuntimeError if
missing or wrong type instead of using events_data.get("events", []")), and
inside the for loop that iterates events, validate each event contains a
non-empty "day" field (raise an exception mentioning the offending event when
day is missing) so the scraper surfaces ReCollect contract changes rather than
producing incomplete bins.
- Around line 21-22: parse_data currently only reads paon from kwargs, so
user_paon becomes None when the fixture supplies house_number; update the
assignment to read house_number as a fallback (e.g., set user_paon =
kwargs.get("paon") or kwargs.get("house_number")) so the subsequent matching
logic in parse_data (uses of user_paon around the postcode matching and
selection at the branches that pick results) will prefer house_number when paon
is absent; ensure the same fallback is used wherever user_paon is referenced in
this class.
- Around line 45-88: The code currently defaults to the first candidate when no
PAON match is found (assigning place_id = parcels[0]["place_id"] and place_id =
addresses[0]["place_id"]) and also ignores multiple qualifiers; instead, detect
ambiguity and fail loudly: when more than one parcel/qualifier/address exists
and no explicit PAON match was found, raise a clear exception (e.g., ValueError
or a custom AddressAmbiguousError) rather than selecting the first entry; update
the logic in BassetlawDistrictCouncil.py around the parcels, qualifiers and
addresses handling (the blocks that set place_id from parcels, qualifiers[0],
and addresses) to return/raise an error indicating an ambiguous postcode so
callers can handle it explicitly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55d53489-61ca-49c1-85f0-3d89d5a3250d
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py
| # If postcode returns parcels directly, match by house number | ||
| parcels = [r for r in results if r.get("type") == "parcel" and r.get("place_id")] | ||
| if parcels: | ||
| if user_paon: | ||
| for p in parcels: | ||
| name = p.get("name", "") | ||
| if name.lower().startswith(user_paon.lower()): | ||
| place_id = p["place_id"] | ||
| break | ||
| if not place_id and parcels: | ||
| place_id = parcels[0]["place_id"] | ||
|
|
||
| # If postcode returns qualifiers, get all addresses for that postcode | ||
| if not place_id: | ||
| qualifiers = [r for r in results if r.get("type") == "place_qualifier"] | ||
| if qualifiers: | ||
| qual = qualifiers[0] | ||
| # Fetch all addresses in this postcode area | ||
| params2 = {"q": "", "locale": "en-GB", "place_qualifier_id": qual["qualifier_id"]} | ||
| response2 = requests.get(suggest_url, headers=HEADERS, params=params2, timeout=30) | ||
| response2.raise_for_status() | ||
| addresses = [r for r in response2.json() if r.get("type") == "parcel" and r.get("place_id")] | ||
|
|
||
| if not addresses: | ||
| # Qualifier empty search fails — try with postcode text inside qualifier | ||
| params3 = {"q": postcode, "locale": "en-GB", "place_qualifier_id": qual["qualifier_id"]} | ||
| response3 = requests.get(suggest_url, headers=HEADERS, params=params3, timeout=30) | ||
| response3.raise_for_status() | ||
| addresses = [r for r in response3.json() if r.get("type") == "parcel" and r.get("place_id")] | ||
|
|
||
| if addresses and user_paon: | ||
| for addr in addresses: | ||
| name = addr.get("name", "") | ||
| if name.lower().startswith(user_paon.lower()): | ||
| place_id = addr["place_id"] | ||
| break | ||
| # Match house number at start of address | ||
| num_match = re.match(r"^(\d+[a-zA-Z]?)\b", user_paon.strip()) | ||
| if num_match and name.startswith(num_match.group(1) + " "): | ||
| place_id = addr["place_id"] | ||
| break | ||
|
|
||
| if not place_id and addresses: | ||
| place_id = addresses[0]["place_id"] |
There was a problem hiding this comment.
Don't default ambiguous postcode matches to the first candidate.
Line 55 and Line 88 silently pick the first returned parcel/address when no exact PAON match is found. On a multi-property postcode, that can return a valid-looking schedule for the wrong household. The same ambiguity exists at Line 61 when multiple qualifiers are returned.
Suggested change
- if not place_id and parcels:
- place_id = parcels[0]["place_id"]
+ if not place_id and len(parcels) == 1:
+ place_id = parcels[0]["place_id"]
+ elif not place_id and parcels:
+ raise ValueError(
+ f"Ambiguous postcode match for {postcode}; please provide a more specific address"
+ )
- if qualifiers:
+ if len(qualifiers) == 1:
qual = qualifiers[0]
+ elif qualifiers:
+ raise ValueError(
+ f"Ambiguous postcode qualifier match for {postcode}"
+ )
+ else:
+ qual = None
- if not place_id and addresses:
- place_id = addresses[0]["place_id"]
+ if not place_id and len(addresses) == 1:
+ place_id = addresses[0]["place_id"]
+ elif not place_id and addresses:
+ raise ValueError(
+ f"Ambiguous address match for {postcode}; please provide a more specific address"
+ )Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`
around lines 45 - 88, The code currently defaults to the first candidate when no
PAON match is found (assigning place_id = parcels[0]["place_id"] and place_id =
addresses[0]["place_id"]) and also ignores multiple qualifiers; instead, detect
ambiguity and fail loudly: when more than one parcel/qualifier/address exists
and no explicit PAON match was found, raise a clear exception (e.g., ValueError
or a custom AddressAmbiguousError) rather than selecting the first entry; update
the logic in BassetlawDistrictCouncil.py around the parcels, qualifiers and
addresses handling (the blocks that set place_id from parcels, qualifiers[0],
and addresses) to return/raise an error indicating an ambiguous postcode so
callers can handle it explicitly.
| events_data = response.json() | ||
|
|
||
| seen = set() | ||
| for event in events_data.get("events", []): | ||
| day = event.get("day") | ||
| if not day: | ||
| continue |
There was a problem hiding this comment.
Raise on malformed event payloads instead of silently dropping them.
events_data.get("events", []) and if not day: continue turn upstream contract drift into an empty or incomplete bins list. If ReCollect changes the payload shape, this scraper should fail fast rather than under-report collections.
Suggested change
- events_data = response.json()
+ events_data = response.json()
+ events = events_data.get("events")
+ if not isinstance(events, list):
+ raise ValueError(
+ f"Unexpected events payload for place_id {place_id}: missing 'events' list"
+ )
seen = set()
- for event in events_data.get("events", []):
+ for event in events:
day = event.get("day")
if not day:
- continue
+ raise ValueError(
+ f"Unexpected event payload for place_id {place_id}: missing 'day'"
+ )Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`
around lines 122 - 128, The parsing silently skips malformed payloads: change
the logic around events_data, the events loop and the day check to fail fast by
validating the upstream contract and raising exceptions when it's violated;
specifically, in the method that assigns events_data (variable events_data)
ensure the "events" key exists and is a list (raise ValueError/RuntimeError if
missing or wrong type instead of using events_data.get("events", []")), and
inside the for loop that iterates events, validate each event contains a
non-empty "day" field (raise an exception mentioning the offending event when
day is missing) so the scraper surfaces ReCollect contract changes rather than
producing incomplete bins.
Summary
BassetlawUK, service50015)Summary by CodeRabbit
New Features
Tests