feat: add orkney islands council scraper#2096
Uses Jadu FAQ search to match a street to its collection area. Mainland areas (01-15) have embedded Google Calendar iCal feeds with dated events and RRULE recurrence. Island areas return a day of the week. Handles EXDATE exclusions and RECURRENCE-ID overrides for holiday changes. Pure HTTP, no Selenium needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
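To make the EXDATE and RECURRENCE-ID handling concrete, here is a minimal sketch of expanding a weekly recurrence. It is not the code from this PR: the function name `expand_weekly`, its parameters, and the sample dates are hypothetical, and it only covers a fixed-interval weekly rule rather than full RRULE parsing.

```python
from datetime import date, timedelta


def expand_weekly(start, until, interval_weeks=1, exdates=(), overrides=None):
    """Expand a simple weekly recurrence into concrete dates.

    exdates   -- occurrence dates to drop (iCal EXDATE)
    overrides -- maps an original occurrence date (iCal RECURRENCE-ID)
                 to the date it was moved to
    """
    overrides = overrides or {}
    excluded = set(exdates)
    results = []
    current = start
    while current <= until:
        if current in overrides:
            # A RECURRENCE-ID override replaces the generated occurrence.
            results.append(overrides[current])
        elif current not in excluded:
            results.append(current)
        current += timedelta(weeks=interval_weeks)
    # Deduplicate in case an override lands on another occurrence.
    return sorted(set(results))


# Hypothetical fortnightly collection: one date cancelled, one moved.
dates = expand_weekly(
    start=date(2024, 12, 9),
    until=date(2025, 1, 20),
    interval_weeks=2,
    exdates=[date(2024, 12, 23)],
    overrides={date(2025, 1, 6): date(2025, 1, 7)},
)
```

Checking overrides before exclusions means a moved collection is still reported even if the feed also lists its original date as excluded.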
📝 Walkthrough
This PR adds a new Orkney Islands Council bin collection parser that fetches waste collection schedules from the council's MyBins FAQ. It supports two collection schedule formats: mainland areas use embedded Google Calendar iCal feeds with recurring events, while island areas use simple text-based day-of-week answers. Test configuration entries are added for the new council.
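For the island areas, a day-of-week answer still has to become a date before it can be reported. How the PR does this is not shown here; one plausible approach, sketched with a hypothetical helper `next_collection_date` and the assumption that a collection falling on today counts as the next one, is:

```python
from datetime import date, timedelta

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def next_collection_date(day_name, today):
    """Return the next date (today included) that falls on day_name."""
    # list.index raises ValueError for an unrecognised day name.
    target = WEEKDAYS.index(day_name.strip().lower())
    # date.weekday() uses Monday == 0, matching the WEEKDAYS order.
    days_ahead = (target - today.weekday()) % 7
    return today + timedelta(days=days_ahead)
```

Passing `today` in explicitly, rather than calling `date.today()` inside the helper, keeps the function deterministic and easy to test.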
Sequence Diagram
sequenceDiagram
participant parse_data
participant _parse_faq_detail
participant _extract_calendar_id
participant _parse_google_calendar
participant _expand_ical_events
parse_data->>_parse_faq_detail: FAQ detail HTML
_parse_faq_detail->>_extract_calendar_id: Extract ID from HTML
_extract_calendar_id->>_parse_google_calendar: Calendar ID
_parse_google_calendar->>_expand_ical_events: iCal text
_expand_ical_events->>_parse_google_calendar: Expanded events list
_parse_google_calendar->>parse_data: Sorted bin entries
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 5 passed
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##           master    #2096   +/-   ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
Actionable comments posted: 2
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py (3)
322-326: 💤 Low value. Prefix unused loop variables with underscore.
The loop variables `uid` and `rec_date` are not used in the loop body. Prefix them with `_` to indicate they are intentionally unused.
Proposed fix:
- for (uid, rec_date), (o_date, o_summary) in overrides.items():
+ for (_uid, _rec_date), (o_date, o_summary) in overrides.items():
369-377: 💤 Low value. Simplify redundant date parsing branches.
The `elif "T" in raw` and `else` branches perform identical operations. This can be simplified.
Proposed fix:
  try:
      if len(raw) == 8:
          return datetime.strptime(raw, "%Y%m%d")
-     elif "T" in raw:
-         return datetime.strptime(raw[:8], "%Y%m%d")
      else:
+         # Handle datetime formats like '20241209T000000Z'
          return datetime.strptime(raw[:8], "%Y%m%d")
  except ValueError:
      return None
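As a standalone illustration of the simplified date parsing, the sketch below collapses all branches into a single slice-and-parse step. The function name `parse_ical_date` is hypothetical, since the name of the method in the PR is not shown in this review.

```python
from datetime import datetime


def parse_ical_date(raw):
    """Parse the date part of an iCal DATE or DATE-TIME value, or return None."""
    if not raw or len(raw) < 8:
        return None
    try:
        # '20241209' and '20241209T000000Z' both start with YYYYMMDD.
        return datetime.strptime(raw[:8], "%Y%m%d")
    except ValueError:
        return None
```

Note that only the date part is kept, so any time or timezone suffix is discarded, which matches the behaviour of the branches being merged.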
130-158: ⚡ Quick win. Catch specific exceptions instead of a broad `Exception`.
The broad `except Exception` clauses catch too much, including unrelated programming errors. Since only base64 decoding failures are expected here, catch the specific exceptions.
Proposed fix:
+import binascii
+
 # In _extract_calendar_id method:
 if print_link and print_link.get("data-calendar-source"):
     try:
         return base64.b64decode(
             print_link["data-calendar-source"]
         ).decode("utf-8")
-    except Exception:
+    except (ValueError, UnicodeDecodeError, binascii.Error):
         pass
Apply similarly to the other two try-except blocks in this method.
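A small sketch of the narrowed exception handling, pulled out into a hypothetical helper `decode_calendar_source` (the PR keeps this logic inline in `_extract_calendar_id`). It catches `ValueError` and `TypeError`: `binascii.Error` and `UnicodeDecodeError` are both subclasses of `ValueError`, and `base64.b64decode` raises a plain `ValueError` for a non-ASCII string, so `ValueError` covers every decoding failure.

```python
import base64


def decode_calendar_source(encoded):
    """Decode a base64 calendar source, returning None if it is malformed."""
    try:
        return base64.b64decode(encoded).decode("utf-8")
    except (ValueError, TypeError):
        # ValueError: bad padding (binascii.Error), non-ASCII input,
        #             or bytes that are not valid UTF-8 (UnicodeDecodeError).
        # TypeError:  input that is neither str nor bytes-like, e.g. None.
        return None
```

Unrelated failures, such as an `AttributeError` from a typo, still propagate and stay visible.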
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 882: The value for the JSON key "wiki_note" contains a corrupted Unicode
character `�`; locate the "wiki_note" entry in
uk_bin_collection/tests/input.json and replace the corrupted character with the
intended punctuation (likely an em-dash "—" or a simple hyphen "-", e.g.
"...Replace the XXXXXXXXXX with the UPRN of your property — you can use..."),
ensuring the file remains valid UTF-8 and the string is properly escaped if
needed.
In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py`:
- Around line 48-55: The session.get calls (e.g., the call using search_url with
params and the other session.get calls later) lack a timeout and can hang
indefinitely; fix by adding a timeout parameter (e.g., timeout=10) to each
session.get invocation (the one that assigns response and calls
response.raise_for_status and the other session.get calls referred to) so the
HTTP request will fail fast on unresponsive servers, and propagate or catch
requests.exceptions.Timeout/RequestException where appropriate to handle
failures gracefully.
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py`:
- Around line 322-326: The for-loop over overrides.items() declares unused
variables uid and rec_date; change their names to _uid and _rec_date (or simply
_ , _rec_date) in the loop header (for (_uid, _rec_date), (o_date, o_summary) in
overrides.items():) to signal they are intentionally unused while leaving the
rest of the logic (the o_date check against now and horizon and the
results.append dedupe using results) unchanged.
- Around line 369-377: The date-parsing block contains redundant branches;
replace the current try block so it always attempts to parse the first 8
characters when the input has at least 8 characters and returns None otherwise.
Concretely, in the function that parses `raw` (the block that currently checks
len(raw) == 8, `elif "T" in raw`, etc.), change the logic to: if `raw` is truthy
and `len(raw) >= 8`, call `datetime.strptime(raw[:8], "%Y%m%d")` inside the
try/except and return None on ValueError; otherwise return None. This removes
the duplicate `elif "T" in raw`/`else` branches while preserving behavior.
- Around line 130-158: The try/except blocks around
base64.b64decode(...).decode("utf-8") are catching broad Exception; narrow them
to only the decoding-related exceptions (e.g., catch binascii.Error, TypeError
and UnicodeDecodeError) so you don't swallow unrelated errors. Update the three
blocks that decode the calendar source (the one using
print_link["data-calendar-source"], the one decoding src_list[0] from
print_link["href"], and the one decoding src_list[0] from iframe["src"]) to
replace "except Exception:" with "except (binascii.Error, TypeError,
UnicodeDecodeError):" and import binascii at the top.
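The timeout fix from the inline comment on `session.get` could be applied in one place rather than at every call site. The helper below is a sketch under assumptions: `get_with_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, and `session` is assumed to behave like a `requests.Session`.

```python
DEFAULT_TIMEOUT = 10  # seconds; the example value suggested in the review


def get_with_timeout(session, url, timeout=DEFAULT_TIMEOUT, **kwargs):
    """Call session.get with an explicit timeout and raise on HTTP errors.

    `session` is assumed to behave like requests.Session, so a stalled
    server surfaces as a timeout exception instead of hanging forever.
    """
    response = session.get(url, timeout=timeout, **kwargs)
    response.raise_for_status()
    return response
```

Callers would then write `get_with_timeout(session, search_url, params=params)` and decide per call whether to catch the resulting exception or let it propagate.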
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 573b6123-3ba6-43b0-89e4-84b82aefed5f
📒 Files selected for processing (2)
uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py