feat: add inverclyde council scraper #2094

Open
InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:feat/inverclyde-council

Conversation

@InertiaUK InertiaUK commented May 22, 2026

Summary

  • New scraper for Inverclyde Council (population ~78k, Scotland)
  • Parses the council's street-sorted PDF via pdfplumber table extraction
  • Computes fortnightly alternating collection dates from a fixed reference date
  • Handles house number ranges (e.g. "2 to 24, 38 and 40") and 28 addresses with split recycling/residual days
  • Pure HTTP with requests + pdfplumber - no Selenium needed
  • Nominatim geocode fallback for street name resolution from postcode
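
To make the house-number handling concrete, here is a minimal sketch of how a Detail string such as "2 to 24, 38 and 40" could be expanded into individual numbers. The function name `parse_house_numbers` is hypothetical and not taken from the PR, and the rule that a range keeps the parity of its first number (one side of the street) is an assumption, not something the PR states.

```python
from __future__ import annotations

import re


def parse_house_numbers(detail: str) -> set[int]:
    """Expand text like '2 to 24, 38 and 40' into a set of house numbers."""
    numbers: set[int] = set()
    range_pattern = r"(\d+)\s+to\s+(\d+)"

    # Ranges first. ASSUMPTION: when both ends share parity, the range covers
    # only that side of the street (2, 4, ..., 24); otherwise every number.
    for start_text, end_text in re.findall(range_pattern, detail, flags=re.IGNORECASE):
        start, end = int(start_text), int(end_text)
        step = 2 if (end - start) % 2 == 0 else 1
        numbers.update(range(start, end + 1, step))

    # Then the stand-alone numbers that remain once the ranges are removed.
    remainder = re.sub(range_pattern, " ", detail, flags=re.IGNORECASE)
    numbers.update(int(n) for n in re.findall(r"\d+", remainder))
    return numbers


def matches_house(paon: str, detail: str) -> bool:
    """True when the leading digits of the house number appear in the Detail text."""
    found = re.match(r"\s*(\d+)", paon)
    return bool(found) and int(found.group(1)) in parse_house_numbers(detail)
```

A call such as `matches_house("12", "2 to 24, 38 and 40")` would then select that row, while `matches_house("13", ...)` would fall through to a more general street entry.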

Notes

  • The council's GIS noticeboard is broken (500 errors on data methods)
  • Reference date (wc 16 March 2026 for Week 1) derived from 2026-27 calendar PDFs
  • Garden waste only collected March-November

Test plan

  • Tested with PA16 8AA (Brougham Street, Greenock) - Thursday Week 1
  • Returns Blue Recycling, Brown Garden Waste, Black General Waste, Green Food Waste
  • End-to-end verified through Kepthouse API

Summary by CodeRabbit

  • New Features
    • Added Inverclyde Council support with automated bin collection scheduling for the next 8 weeks, covering recycling and residual waste services with address-specific collection day details.

Parses council's street-sorted PDF (pdfplumber) and computes fortnightly
collection dates. Handles house number ranges and split recycling/residual
days. Matches street name from address or Nominatim geocode fallback.
Pure HTTP - no Selenium needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 22, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 18 minutes and 35 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c555dceb-3036-4e9e-8c5b-293b17e921b4

📥 Commits

Reviewing files that changed from the base of the PR and between 91d5f6a and 320f02f.

📒 Files selected for processing (1)
  • uk_bin_collection/tests/input.json
📝 Walkthrough

Walkthrough

Adds a complete bin collection scraper for Inverclyde Council. The scraper downloads a static PDF of street uplift days, parses street/town/detail rows, resolves postcodes to streets via Nominatim with fallbacks, matches rows by street-name similarity and optional house-number logic, computes upcoming collection dates using a fortnightly recycling calendar, and returns sorted bin entries with type variants and per-address overrides.
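
As a rough illustration of the row-parsing step in that walkthrough, the sketch below turns already-extracted tables into row dictionaries. It takes input shaped like the output of pdfplumber's `page.extract_tables()` (a list of tables, each a list of rows of cell strings or `None`). The five-column order and the function name `rows_from_tables` are assumptions made for the example; the actual PDF layout is not shown in this PR.

```python
def rows_from_tables(tables):
    """Convert extracted tables into dicts, skipping header and blank rows."""
    rows = []
    for table in tables:
        for raw in table:
            # Empty cells come back as None; normalise them to "".
            cells = [(cell or "").strip() for cell in raw]
            if not any(cells):
                continue  # blank row
            if cells[0].lower() == "street":
                continue  # header row, repeated on every page
            # ASSUMED column order: street, town, detail, day, calendar.
            # Pad short rows so the unpacking below never fails.
            street, town, detail, day, calendar = (cells + [""] * 5)[:5]
            rows.append(
                {
                    "street": street,
                    "town": town,
                    "detail": detail,
                    "day": day,
                    "calendar": calendar,
                }
            )
    return rows
```

Keeping this step separate from the download makes it possible to test the filtering against small hand-written tables without fetching the PDF.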

Changes

Inverclyde Council Scraper Implementation

| Layer | File(s) | Summary |
|---|---|---|
| Test input data setup | uk_bin_collection/tests/input.json | Corrects EnvironmentFirst wiki_note placeholder text, re-serializes North Hertfordshire entry, and adds new InverclydeCouncil test entry with all required fields (LAD24CD, postcode, paon, skip_get_url, url, wiki_name, wiki_note). |
| Constants and CouncilClass definition | uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (lines 1–43, 227–245) | Declares PDF URL, day-name normalization mapping, and week-1 reference date for fortnightly recycling offset. Defines CouncilClass and documents the scraping approach and input expectations. |
| Date and day normalization helpers | uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (lines 45–116) | Adds utility functions to normalize day names, map day strings to weekday indices, determine recycling week 1 vs 2 using fortnightly offset, and compute next collection dates for multiple bin types over upcoming weeks. |
| PDF table extraction | uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (lines 119–144) | Implements PDF download and table parsing that extracts street/town/detail rows from each page, filters headers and blank rows, and returns structured row dictionaries. |
| Street row matching and house-number resolution | uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (lines 146–225) | Implements progressive matching logic that tries exact, containment, and word-level matches. When paon is provided, narrows results by searching for house-number matches in the Detail field (including numeric ranges "X to Y"), with fallbacks to general street entries. |
| Main orchestration and output generation | uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (lines 246–452) | Orchestrates parse_data: validates postcode, downloads/parses PDF, resolves street via Nominatim with paon fallback, normalizes matched day field, maps PDF calendar to recycling week or no-recycling mode. Generates bin entries for recycling (blue/brown/black/food-caddy based on week and garden month) or no-recycling (residual/black only) schedules. Applies per-address recycling-day overrides from Detail field, filters to future dates, sorts by collectionDate, and returns final bindata structure. |
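
The progressive matching described for the street-row layer (exact, then containment, then word-level) might look roughly like this. It is a sketch only: `match_street`, the `street` key and the small `GENERIC_WORDS` list are hypothetical, and ignoring generic suffixes at the word level is an assumption added so that "Street" alone does not match every row.

```python
# ASSUMPTION: generic suffixes are ignored for the word-level pass.
GENERIC_WORDS = {"street", "road", "avenue", "lane", "place", "drive", "terrace"}


def normalise(name):
    """Lower-case and collapse whitespace so comparisons are stable."""
    return " ".join(name.lower().split())


def match_street(street, rows):
    """Return the rows for the first matching tier that yields any result."""
    target = normalise(street)
    named = [(normalise(r["street"]), r) for r in rows if r["street"].strip()]

    # Tier 1: exact match on the normalised name.
    exact = [r for name, r in named if name == target]
    if exact:
        return exact

    # Tier 2: one name contained in the other (e.g. "12 Brougham Street").
    contained = [r for name, r in named if name in target or target in name]
    if contained:
        return contained

    # Tier 3: at least one shared distinctive word.
    words = set(target.split()) - GENERIC_WORDS
    return [r for name, r in named if words & (set(name.split()) - GENERIC_WORDS)]
```

Stopping at the first tier that produces a result keeps loose word-level matches from overriding an exact hit.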

Sequence Diagram

sequenceDiagram
  participant ParseData as parse_data()
  participant PDF as parse_pdf_data()
  participant Nominatim as Nominatim API
  participant RowMatch as find_best_matching_row()
  participant DateGen as next_dates_for_bin_types()
  participant Output as bin entries dict

  ParseData->>PDF: download and parse PDF
  PDF-->>ParseData: rows with street/town/detail
  ParseData->>Nominatim: resolve postcode to street
  alt Nominatim success
    Nominatim-->>ParseData: street name
  else Nominatim fallback
    ParseData->>Nominatim: resolve paon+postcode
    Nominatim-->>ParseData: street name or use paon as fallback
  end
  ParseData->>RowMatch: find_best_matching_row(street, paon, rows)
  RowMatch-->>ParseData: matched row with detail/day/calendar
  ParseData->>DateGen: compute bin dates for week pattern
  DateGen-->>ParseData: initial bin entry list
  ParseData->>ParseData: apply per-address recycling overrides from Detail
  ParseData->>ParseData: filter to future dates and sort
  ParseData-->>Output: return sorted bins dict
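
The last two steps of the diagram (filter to future dates, then sort) can be sketched in a few lines. The `bins` and `collectionDate` keys follow the names used in this walkthrough; the `type` key, the `%d/%m/%Y` date format and the function name `finalise` are assumptions for illustration.

```python
from datetime import date, datetime

DATE_FORMAT = "%d/%m/%Y"  # ASSUMED string format for collectionDate


def finalise(entries, today):
    """Drop past collections, sort the rest by date, and wrap them in a dict."""

    def as_date(entry):
        return datetime.strptime(entry["collectionDate"], DATE_FORMAT).date()

    # ">= today" keeps a collection that is due today.
    upcoming = [entry for entry in entries if as_date(entry) >= today]
    # Sort on the parsed date, since dd/mm/YYYY strings do not sort chronologically.
    upcoming.sort(key=as_date)
    return {"bins": upcoming}
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the result reproducible in tests.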

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • dp247

Poem

🐰 A new council joins the warren's fold,
Where PDF streets turn bold.
Nominatim whispers street names true,
Fortnightly calendars start anew.
Bins aligned in rows, week by week—
Inverclyde's collection, no longer unique! 🗑️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the primary change (adding a new council scraper for Inverclyde), which aligns with the substantial code additions and PR objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (320f02f).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2094   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py (1)

91-96: 💤 Low value

Consider capturing datetime.now() once to avoid subtle midnight-boundary inconsistency.

Lines 91 and 94 call datetime.now() separately. If execution spans midnight, today could be the previous date while hour reads from the new day, potentially producing off-by-one behavior. The downstream >= today filter in parse_data mitigates this, but capturing the timestamp once is cleaner.

♻️ Suggested fix
+    now = datetime.now()
+    today = now.date()
-    today = datetime.now().date()
     # Find the next occurrence of this weekday (or today if it matches)
     days_ahead = (day_idx - today.weekday()) % 7
-    if days_ahead == 0 and datetime.now().hour >= 19:
+    if days_ahead == 0 and now.hour >= 19:
         days_ahead = 7
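
As a standalone illustration of the single-timestamp approach in the suggested fix, the sketch below takes the timestamp as a parameter. The function name `next_collection_date` and the `cutoff_hour` parameter are hypothetical; only the weekday arithmetic and the 19:00 rollover come from the diff above.

```python
from datetime import datetime, timedelta


def next_collection_date(day_idx, now, cutoff_hour=19):
    """Next date falling on weekday `day_idx` (Monday is 0), from one timestamp."""
    # Both the date and the hour come from the same `now`, so they cannot
    # disagree if the call happens to straddle midnight.
    today = now.date()
    days_ahead = (day_idx - today.weekday()) % 7
    if days_ahead == 0 and now.hour >= cutoff_hour:
        days_ahead = 7  # today's collection is treated as already done
    return today + timedelta(days=days_ahead)
```

A caller would pass `datetime.now()` once at the top of the run and reuse that value for every bin type.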
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py` around
lines 91 - 96, The code computes today and then re-calls datetime.now() for the
hour check, risking a midnight boundary bug; capture the current timestamp once
(e.g., assign now = datetime.now()), use now.date() for today and now.hour for
the 19-hour comparison, and then compute days_ahead and next_day using those
single timestamp-derived values (affecting the variables today, days_ahead, and
next_day).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 882: The string value for the "wiki_note" key contains a mojibake
character in "property�you"; update the "wiki_note" value to replace the invalid
character with a normal separator (for example "property; you" or "property -
you") so the note reads correctly for users, ensuring you only change the
separator and keep the rest of the text (including the UPRN placeholder and URL)
intact.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70d899f1-98e4-49fc-a082-ec9e02e6578a

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 91d5f6a.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/InverclydeCouncil.py

Comment thread uk_bin_collection/tests/input.json Outdated