Skip to content

feat(BassetlawDistrictCouncil): add new scraper using ReCollect API#2049

Open
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/bassetlaw-test-postcode
Open

feat(BassetlawDistrictCouncil): add new scraper using ReCollect API#2049
InertiaUK wants to merge 2 commits into
robbrad:masterfrom
InertiaUK:fix/bassetlaw-test-postcode

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 12, 2026

Summary

  • New scraper for Bassetlaw District Council using the ReCollect API (area BassetlawUK, service 50015)
  • Multi-strategy address resolution: full address text search, postcode qualifier expansion, and combined paon+postcode fallback
  • Test address: 2 Albert Road, DN22 6JD (verified returning Green Waste, Blue Recycling, and Glass bins)

Summary by CodeRabbit

  • New Features

    • Added support for Bassetlaw District Council bin collection schedules.
  • Tests

    • Updated test configuration to include Bassetlaw District Council and refined North Hertfordshire District Council test data.

Review Change Stack

Uses ReCollect API (area BassetlawUK, service 50015) with multi-strategy
address resolution: full address text, postcode qualifier expansion,
and combined paon+postcode search.

Test address: 2 Albert Road, DN22 6JD

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 23 minutes and 59 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d340679-9df5-4939-bc20-6483ff3d0b4b

📥 Commits

Reviewing files that changed from the base of the PR and between cf6b757 and a8aba99.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py
📝 Walkthrough

Walkthrough

This PR adds a new Bassetlaw District Council integration module that fetches bin collection data via the Recollect API, resolving addresses to place IDs through multiple fallback strategies and returning scheduled pickups over a 60-day window. Test fixtures are added to support the integration.

Changes

Bassetlaw District Council Integration

Layer / File(s) Summary
Bassetlaw council integration module
uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py
HTTP headers and configuration constants (AREA, SERVICE) establish the module foundation. Address-to-place_id resolution uses three staged Recollect API strategies: direct PAON text search, postcode-based lookup with parcel/qualifier matching, and combined PAON+postcode fallback. If no place_id is found, ValueError is raised. Event fetching queries the Recollect events endpoint over a 60-day window, filters to pickup events, extracts bin types from event flags, deduplicates by (bin_type, collection_date), sorts chronologically, and returns the structured bin data object.
Test fixture for Bassetlaw
uk_bin_collection/tests/input.json
BassetlawDistrictCouncil test entry defines LAD24CD, house number, postcode, skip_get_url flag, empty UPRN, base URL, and wiki name for integration testing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A new warren joins the fold,
Bassetlaw's bins in Recollect's hold,
Three address paths and sixty days wide,
Events are sorted, duplicates denied!
Test fixtures rest, all ready to play. 🗑️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(BassetlawDistrictCouncil): add new scraper using ReCollect API' directly and clearly describes the main change: adding a new scraper for Bassetlaw District Council using the ReCollect API.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (a8aba99).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2049   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`:
- Around line 122-128: The parsing silently skips malformed payloads: change the
logic around events_data, the events loop and the day check to fail fast by
validating the upstream contract and raising exceptions when it's violated;
specifically, in the method that assigns events_data (variable events_data)
ensure the "events" key exists and is a list (raise ValueError/RuntimeError if
missing or wrong type instead of using events_data.get("events", []")), and
inside the for loop that iterates events, validate each event contains a
non-empty "day" field (raise an exception mentioning the offending event when
day is missing) so the scraper surfaces ReCollect contract changes rather than
producing incomplete bins.
- Around line 21-22: parse_data currently only reads paon from kwargs, so
user_paon becomes None when the fixture supplies house_number; update the
assignment to read house_number as a fallback (e.g., set user_paon =
kwargs.get("paon") or kwargs.get("house_number")) so the subsequent matching
logic in parse_data (uses of user_paon around the postcode matching and
selection at the branches that pick results) will prefer house_number when paon
is absent; ensure the same fallback is used wherever user_paon is referenced in
this class.
- Around line 45-88: The code currently defaults to the first candidate when no
PAON match is found (assigning place_id = parcels[0]["place_id"] and place_id =
addresses[0]["place_id"]) and also ignores multiple qualifiers; instead, detect
ambiguity and fail loudly: when more than one parcel/qualifier/address exists
and no explicit PAON match was found, raise a clear exception (e.g., ValueError
or a custom AddressAmbiguousError) rather than selecting the first entry; update
the logic in BassetlawDistrictCouncil.py around the parcels, qualifiers and
addresses handling (the blocks that set place_id from parcels, qualifiers[0],
and addresses) to return/raise an error indicating an ambiguous postcode so
callers can handle it explicitly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55d53489-61ca-49c1-85f0-3d89d5a3250d

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and cf6b757.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py

Comment thread uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py Outdated
Comment on lines +45 to +88
# If postcode returns parcels directly, match by house number
parcels = [r for r in results if r.get("type") == "parcel" and r.get("place_id")]
if parcels:
if user_paon:
for p in parcels:
name = p.get("name", "")
if name.lower().startswith(user_paon.lower()):
place_id = p["place_id"]
break
if not place_id and parcels:
place_id = parcels[0]["place_id"]

# If postcode returns qualifiers, get all addresses for that postcode
if not place_id:
qualifiers = [r for r in results if r.get("type") == "place_qualifier"]
if qualifiers:
qual = qualifiers[0]
# Fetch all addresses in this postcode area
params2 = {"q": "", "locale": "en-GB", "place_qualifier_id": qual["qualifier_id"]}
response2 = requests.get(suggest_url, headers=HEADERS, params=params2, timeout=30)
response2.raise_for_status()
addresses = [r for r in response2.json() if r.get("type") == "parcel" and r.get("place_id")]

if not addresses:
# Qualifier empty search fails — try with postcode text inside qualifier
params3 = {"q": postcode, "locale": "en-GB", "place_qualifier_id": qual["qualifier_id"]}
response3 = requests.get(suggest_url, headers=HEADERS, params=params3, timeout=30)
response3.raise_for_status()
addresses = [r for r in response3.json() if r.get("type") == "parcel" and r.get("place_id")]

if addresses and user_paon:
for addr in addresses:
name = addr.get("name", "")
if name.lower().startswith(user_paon.lower()):
place_id = addr["place_id"]
break
# Match house number at start of address
num_match = re.match(r"^(\d+[a-zA-Z]?)\b", user_paon.strip())
if num_match and name.startswith(num_match.group(1) + " "):
place_id = addr["place_id"]
break

if not place_id and addresses:
place_id = addresses[0]["place_id"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't default ambiguous postcode matches to the first candidate.

Line 55 and Line 88 silently pick the first returned parcel/address when no exact PAON match is found. On a multi-property postcode, that can return a valid-looking schedule for the wrong household. The same ambiguity exists at Line 61 when multiple qualifiers are returned.

Suggested change
-                if not place_id and parcels:
-                    place_id = parcels[0]["place_id"]
+                if not place_id and len(parcels) == 1:
+                    place_id = parcels[0]["place_id"]
+                elif not place_id and parcels:
+                    raise ValueError(
+                        f"Ambiguous postcode match for {postcode}; please provide a more specific address"
+                    )

-                if qualifiers:
+                if len(qualifiers) == 1:
                     qual = qualifiers[0]
+                elif qualifiers:
+                    raise ValueError(
+                        f"Ambiguous postcode qualifier match for {postcode}"
+                    )
+                else:
+                    qual = None

-                    if not place_id and addresses:
-                        place_id = addresses[0]["place_id"]
+                    if not place_id and len(addresses) == 1:
+                        place_id = addresses[0]["place_id"]
+                    elif not place_id and addresses:
+                        raise ValueError(
+                            f"Ambiguous address match for {postcode}; please provide a more specific address"
+                        )

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`
around lines 45 - 88, The code currently defaults to the first candidate when no
PAON match is found (assigning place_id = parcels[0]["place_id"] and place_id =
addresses[0]["place_id"]) and also ignores multiple qualifiers; instead, detect
ambiguity and fail loudly: when more than one parcel/qualifier/address exists
and no explicit PAON match was found, raise a clear exception (e.g., ValueError
or a custom AddressAmbiguousError) rather than selecting the first entry; update
the logic in BassetlawDistrictCouncil.py around the parcels, qualifiers and
addresses handling (the blocks that set place_id from parcels, qualifiers[0],
and addresses) to return/raise an error indicating an ambiguous postcode so
callers can handle it explicitly.

Comment on lines +122 to +128
events_data = response.json()

seen = set()
for event in events_data.get("events", []):
day = event.get("day")
if not day:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raise on malformed event payloads instead of silently dropping them.

events_data.get("events", []) and if not day: continue turn upstream contract drift into an empty or incomplete bins list. If ReCollect changes the payload shape, this scraper should fail fast rather than under-report collections.

Suggested change
-        events_data = response.json()
+        events_data = response.json()
+        events = events_data.get("events")
+        if not isinstance(events, list):
+            raise ValueError(
+                f"Unexpected events payload for place_id {place_id}: missing 'events' list"
+            )

         seen = set()
-        for event in events_data.get("events", []):
+        for event in events:
             day = event.get("day")
             if not day:
-                continue
+                raise ValueError(
+                    f"Unexpected event payload for place_id {place_id}: missing 'day'"
+                )

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/BassetlawDistrictCouncil.py`
around lines 122 - 128, The parsing silently skips malformed payloads: change
the logic around events_data, the events loop and the day check to fail fast by
validating the upstream contract and raising exceptions when it's violated;
specifically, in the method that assigns events_data (variable events_data)
ensure the "events" key exists and is a list (raise ValueError/RuntimeError if
missing or wrong type instead of using events_data.get("events", []")), and
inside the for loop that iterates events, validate each event contains a
non-empty "day" field (raise an exception mentioning the offending event when
day is missing) so the scraper surfaces ReCollect contract changes rather than
producing incomplete bins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant