feat: add orkney islands council scraper #2096

Open
InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:feat/orkney-islands-council

@InertiaUK
Contributor

@InertiaUK InertiaUK commented May 22, 2026

Summary

  • New scraper for Orkney Islands Council (population ~22k, Scotland)
  • Uses Jadu FAQ search to match street/island name to collection area
  • Mainland areas (01-15) have embedded Google Calendar iCal feeds with dated collection events
  • Handles iCal RRULE recurrence (FREQ=WEEKLY with INTERVAL), EXDATE exclusions, RECURRENCE-ID overrides
  • Island areas return day-of-week schedule
  • Pure HTTP with requests - no Selenium needed
  • 3 bin types on mainland: Black/grey (general), Glass and plastic, Paper/card/metal
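
The recurrence handling described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `expand_weekly` and its parameters are hypothetical, it covers only FREQ=WEEKLY with INTERVAL and EXDATE, and it ignores UNTIL, COUNT, BYDAY, and RECURRENCE-ID overrides.

```python
from datetime import date, timedelta


def expand_weekly(start, interval, exdates, window_start, window_end):
    """Expand a FREQ=WEEKLY rule into dates inside [window_start, window_end].

    start    -- DTSTART of the recurring event (hypothetical parameter name)
    interval -- RRULE INTERVAL (1 = weekly, 2 = fortnightly, ...)
    exdates  -- set of dates listed in EXDATE, which are skipped
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    step = timedelta(weeks=interval)
    current = start
    # Jump close to the window start without overshooting, then step
    # forward until the first occurrence on or after window_start.
    if current < window_start:
        weeks_behind = (window_start - current).days // 7
        current += step * (weeks_behind // interval)
        while current < window_start:
            current += step
    occurrences = []
    while current <= window_end:
        if current not in exdates:
            occurrences.append(current)
        current += step
    return occurrences


# Fortnightly rule starting 2026-01-05 with one excluded date.
dates = expand_weekly(
    date(2026, 1, 5), 2, {date(2026, 2, 2)}, date(2026, 1, 20), date(2026, 3, 1)
)
print(dates)
```

Keeping the excluded dates in a set makes the EXDATE check a constant-time lookup per occurrence.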

Test plan

  • Tested with Albert Street, Kirkwall (KW15 1HP, Area 01) - 26 bins from iCal
  • End-to-end verified through Kepthouse API

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Orkney Islands Council bin collection schedules, enabling users to retrieve household waste and recycling collection dates across all areas
    • Service now supports both mainland calendar-based and island-area text-based collection information formats for accurate scheduling
  • Tests

    • Updated council configuration test data for the new council entry

Uses Jadu FAQ search to match street to collection area. Mainland areas
(01-15) have embedded Google Calendar iCal feeds with dated events and
RRULE recurrence. Island areas return day-of-week. Handles EXDATE
exclusions and RECURRENCE-ID overrides for holiday changes.
Pure HTTP - no Selenium needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5121926-6467-46bd-8b5d-2d1041537f13

📥 Commits

Reviewing files that changed from the base of the PR and between c3ed383 and fc96993.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py
📝 Walkthrough

Walkthrough

This PR adds a new Orkney Islands Council bin collection parser that fetches waste collection schedules from the council's MyBins FAQ. It supports two collection schedule formats: mainland areas use embedded Google Calendar iCal feeds with recurring events, while island areas use simple text-based day-of-week answers. Test configuration entries are added for the new council.
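
For the island areas, turning a day-of-week answer into a concrete date could look something like the sketch below. The helper name `next_weekday` is hypothetical, and returning today's date when today is the collection day is an assumption made here, not something the PR states.

```python
from datetime import date, timedelta

DAY_NAMES = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def next_weekday(day_name, today):
    """Return the next date on or after `today` that falls on `day_name`.

    Raises ValueError if `day_name` is not a recognised English day name.
    """
    target = DAY_NAMES.index(day_name.strip().lower())
    # date.weekday() uses Monday == 0, matching the order of DAY_NAMES.
    days_ahead = (target - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


print(next_weekday("Monday", date(2026, 5, 22)))
```

The modulo keeps the offset in the range 0 to 6, so no special case is needed when the target day is earlier in the week than today.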

Changes

Orkney Islands Council Implementation

Layer / File(s) | Summary
Test configuration for Orkney Islands Council (uk_bin_collection/tests/input.json) | Test input entries define Orkney Islands Council LAD code, PAON search parameter, postcode, target URL, and wiki documentation. Minor formatting adjustments to existing EnvironmentFirst and North Hertfordshire entries.
Module imports and entry point (uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py, lines 1–98) | Module wiring and CouncilClass.parse_data entry point requiring PAON parameter; searches MyBins FAQ, fetches matching detail page, dispatches to format-specific parser, and raises ValueError on no match or missing content.
Calendar parsing pipeline (uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py, lines 99–329) | FAQ dispatcher routes to calendar or island parsing. Calendar extraction searches HTML locations and base64-decodes Google Calendar source. Calendar parser fetches iCal feed, expands recurring events over ~180 days (handling EXDATE exclusions and RECURRENCE-ID overrides), normalizes bin types, and returns sorted dated entries.
Island day parsing and helpers (uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py, lines 330–415) | iCal field parsing handles parameterized fields, line folding, and unescaping. Date parsing converts YYYYMMDD and UTC datetime formats. Island day parser extracts and validates day name, computes next weekday occurrence, and emits single "General Waste" bin entry.
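
The line folding, parameterized fields, and unescaping mentioned in the last row can be illustrated with a small sketch. The helper names below are hypothetical and this is not the PR's implementation; it follows the general iCalendar conventions (a folded line continues after a line break plus one space or tab) and does not handle quoted parameter values that contain colons or semicolons.

```python
import re


def unfold_ical(text):
    """Join folded content lines: a line break followed by one space or tab."""
    text = text.replace("\r\n", "\n")
    return re.sub(r"\n[ \t]", "", text)


def parse_property(line):
    """Split 'NAME;PARAM=x:value' into (name, params, value)."""
    head, _, value = line.partition(":")
    name, *raw_params = head.split(";")
    params = dict(p.split("=", 1) for p in raw_params if "=" in p)
    return name.upper(), params, value


def unescape_text(value):
    """Undo backslash escapes used in iCalendar TEXT values."""
    return re.sub(
        r"\\([\\;,nN])",
        lambda m: "\n" if m.group(1) in "nN" else m.group(1),
        value,
    )


raw = (
    "BEGIN:VEVENT\r\n"
    "SUMMARY:Glass and\r\n"
    "  plastic\r\n"
    "DTSTART;VALUE=DATE:20260525\r\n"
    "END:VEVENT\r\n"
)
lines = unfold_ical(raw).splitlines()
print([parse_property(line) for line in lines])
```

Unfolding before splitting into lines means every later step can treat one line as one property.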

Sequence Diagram

sequenceDiagram
  participant parse_data
  participant _parse_faq_detail
  participant _extract_calendar_id
  participant _parse_google_calendar
  participant _expand_ical_events
  parse_data->>_parse_faq_detail: FAQ detail HTML
  _parse_faq_detail->>_extract_calendar_id: Extract ID from HTML
  _extract_calendar_id->>_parse_google_calendar: Calendar ID
  _parse_google_calendar->>_expand_ical_events: iCal text
  _expand_ical_events->>_parse_google_calendar: Expanded events list
  _parse_google_calendar->>parse_data: Sorted bin entries

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • dp247

Poem

🐰 A parser for islands swept by northern breeze,
Mainland calendars and island days with ease,
iCal events expand through recurring spree,
While island folk just count the days, you see,
Orkney's waste now tracked digitally! 🗓️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately reflects the main change: adding a new scraper implementation for Orkney Islands Council. It is concise, specific, and clearly summarizes the primary contribution.
Docstring Coverage | ✅ Passed | Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (fc96993).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2096   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py (3)

322-326: 💤 Low value

Prefix unused loop variables with underscore.

The loop variables uid and rec_date are not used in the loop body. Prefix them with _ to indicate they are intentionally unused.

Proposed fix
-        for (uid, rec_date), (o_date, o_summary) in overrides.items():
+        for (_uid, _rec_date), (o_date, o_summary) in overrides.items():
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py` around
lines 322 - 326, The for-loop over overrides.items() declares unused variables
uid and rec_date; change their names to _uid and _rec_date (or simply _ ,
_rec_date) in the loop header (for (_uid, _rec_date), (o_date, o_summary) in
overrides.items():) to signal they are intentionally unused while leaving the
rest of the logic (the o_date check against now and horizon and the
results.append dedupe using results) unchanged.

369-377: 💤 Low value

Simplify redundant date parsing branches.

The elif "T" in raw and else branches perform identical operations. This can be simplified.

Proposed fix
         try:
             if len(raw) == 8:
                 return datetime.strptime(raw, "%Y%m%d")
-            elif "T" in raw:
-                return datetime.strptime(raw[:8], "%Y%m%d")
             else:
+                # Handle datetime formats like '20241209T000000Z'
                 return datetime.strptime(raw[:8], "%Y%m%d")
         except ValueError:
             return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py` around
lines 369 - 377, The date-parsing block contains redundant branches; replace the
current try block so it always attempts to parse the first 8 characters when the
input has at least 8 characters and returns None otherwise. Concretely, in the
function that parses `raw` (the block that currently checks len(raw) == 8, `elif
"T" in raw`, etc.), change the logic to: if `raw` is truthy and `len(raw) >= 8`,
call `datetime.strptime(raw[:8], "%Y%m%d")` inside the try/except and return
None on ValueError; otherwise return None. This removes the duplicate `elif "T"
in raw`/`else` branches while preserving behavior.
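
As a standalone illustration of the simplification described above, a helper along these lines would treat all three original branches the same way. The name `parse_ical_date` is hypothetical, and note that slicing to the first 8 characters discards any time and timezone part, so a UTC timestamp near midnight could correspond to a different local date.

```python
from datetime import datetime


def parse_ical_date(raw):
    """Parse 'YYYYMMDD' or 'YYYYMMDDTHHMMSSZ' into a datetime at midnight.

    Returns None for empty, too-short, or invalid input.
    """
    if not raw or len(raw) < 8:
        return None
    try:
        # Only the date portion is used; any 'T...' suffix is ignored.
        return datetime.strptime(raw[:8], "%Y%m%d")
    except ValueError:
        return None


print(parse_ical_date("20241209T000000Z"))
```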

130-158: ⚡ Quick win

Catch specific exceptions instead of bare Exception.

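The broad except Exception clauses catch too much, including unrelated programming errors such as AttributeError or NameError. (KeyboardInterrupt is not affected, since it derives from BaseException rather than Exception.) Since you're only expecting base64 decoding failures, catch the specific exceptions.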

Proposed fix
+import binascii
+
 # In _extract_calendar_id method:
         if print_link and print_link.get("data-calendar-source"):
             try:
                 return base64.b64decode(
                     print_link["data-calendar-source"]
                 ).decode("utf-8")
-            except Exception:
+            except (ValueError, UnicodeDecodeError, binascii.Error):
                 pass

Apply similarly to the other two try-except blocks in this method.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py` around
lines 130 - 158, The try/except blocks around
base64.b64decode(...).decode("utf-8") are catching broad Exception; narrow them
to only the decoding-related exceptions (e.g., catch binascii.Error, TypeError
and UnicodeDecodeError) so you don't swallow unrelated errors. Update the three
blocks that decode the calendar source (the one using
print_link["data-calendar-source"], the one decoding src_list[0] from
print_link["href"], and the one decoding src_list[0] from iframe["src"]) to
replace "except Exception:" with "except (binascii.Error, TypeError,
UnicodeDecodeError):" and import binascii at the top.
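
A minimal sketch of the narrowed exception handling, assuming a hypothetical helper `decode_calendar_source` that wraps the decode step. In the standard library, `binascii.Error` and `UnicodeDecodeError` are both subclasses of `ValueError`, so catching `ValueError` and `TypeError` covers the failure modes listed above.

```python
import base64


def decode_calendar_source(encoded):
    """Base64-decode a calendar source string, returning None on bad input."""
    try:
        return base64.b64decode(encoded).decode("utf-8")
    except (ValueError, TypeError):
        # ValueError covers binascii.Error (bad padding) and
        # UnicodeDecodeError (decoded bytes are not valid UTF-8);
        # TypeError covers non-string, non-bytes input such as None.
        return None


sample = base64.b64encode(b"example@group.calendar.google.com").decode("ascii")
print(decode_calendar_source(sample))
```

Unrelated errors, for example an AttributeError from a mistyped attribute, would still propagate and be visible during testing.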
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 882: The value for the JSON key "wiki_note" contains a corrupted Unicode
character `�`; locate the "wiki_note" entry in
uk_bin_collection/tests/input.json and replace the corrupted character with the
intended punctuation (likely an em-dash "—" or a simple hyphen "-", e.g.
"...Replace the XXXXXXXXXX with the UPRN of your property — you can use..."),
ensuring the file remains valid UTF-8 and the string is properly escaped if
needed.

In `@uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py`:
- Around line 48-55: The session.get calls (e.g., the call using search_url with
params and the other session.get calls later) lack a timeout and can hang
indefinitely; fix by adding a timeout parameter (e.g., timeout=10) to each
session.get invocation (the one that assigns response and calls
response.raise_for_status and the other session.get calls referred to) so the
HTTP request will fail fast on unresponsive servers, and propagate or catch
requests.exceptions.Timeout/RequestException where appropriate to handle
failures gracefully.
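
One way to apply the timeout suggestion consistently is a small wrapper, sketched below. The helper name `get_with_timeout` and the 10-second default are illustrative assumptions; `session` is expected to behave like a `requests.Session`, whose `get` method accepts a `timeout` argument.

```python
DEFAULT_TIMEOUT = 10  # seconds; an illustrative value, not taken from the PR


def get_with_timeout(session, url, params=None, timeout=DEFAULT_TIMEOUT):
    """Issue a GET that cannot hang indefinitely on an unresponsive server.

    `session` is duck-typed: anything with a requests-style `get` works.
    """
    response = session.get(url, params=params, timeout=timeout)
    # Surface HTTP 4xx/5xx responses as exceptions instead of parsing them.
    response.raise_for_status()
    return response
```

With a real `requests.Session`, a request that exceeds the limit raises `requests.exceptions.Timeout`, which callers can catch alongside `requests.exceptions.RequestException`.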

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 573b6123-3ba6-43b0-89e4-84b82aefed5f

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and c3ed383.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/OrkneyIslandsCouncil.py

Comment thread uk_bin_collection/tests/input.json Outdated