feat: add scottish borders council scraper #2087

Open
InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:feat/scottish-borders-council

@InertiaUK InertiaUK commented May 22, 2026

Summary

  • New scraper for Scottish Borders Council (population ~116k)
  • Uses Bartec Municipal Portal at scotborders-live-portal.bartecmunicipal.com
  • Pure HTTP with requests + BeautifulSoup - no Selenium needed
  • 3-step form flow: GET CSRF token, POST postcode, POST UPRN selection
  • Parses Syncfusion DropDownList (addresses) and Schedule (calendar events) JSON embedded in script tags
  • Supports both direct Bartec UPRN and house number matching for address selection
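
The script-tag parsing described in the summary could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the PR: the function name `find_embedded_arrays` and the sample markup are hypothetical, and the real portal's script contents may be shaped differently. It scans for JSON arrays of objects, skipping any fragment that is not valid JSON.

```python
import json
from typing import Any


def find_embedded_arrays(script_text: str, required_key: str) -> list[list[dict[str, Any]]]:
    """Return each JSON array of objects whose first item contains required_key.

    Hypothetical helper: the real scraper may locate its JSON differently.
    """
    decoder = json.JSONDecoder()
    found: list[list[dict[str, Any]]] = []
    pos = script_text.find("[")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and reports where it ended.
            value, end = decoder.raw_decode(script_text, pos)
        except json.JSONDecodeError:
            # Not valid JSON here (for example plain JavaScript); try the next bracket.
            pos = script_text.find("[", pos + 1)
            continue
        if value and isinstance(value[0], dict) and required_key in value[0]:
            found.append(value)
            pos = script_text.find("[", end)
        else:
            # Step inside the array in case a matching array is nested within it.
            pos = script_text.find("[", pos + 1)
    return found
```

With a key such as `"Subject"` this would pick out calendar events; a different key would be needed for the address list, depending on what the portal actually emits.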

Test plan

  • Tested with postcode TD6 9QQ, UPRN 116043056 (1 Chiefswood Road, Melrose)
  • Returns alternating General Waste and Recycling collections (fortnightly)
  • Also tested with house number "1" instead of Bartec UPRN - correctly resolves
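
The two selection paths exercised in the test plan (direct UPRN, or house number) might look something like the sketch below. It is an assumption-laden illustration: the `"Address"` key, the helper names `parse_uprn` and `select_address`, and the choice to raise when nothing matches are all hypothetical, and the second UPRN in the usage note is invented sample data.

```python
import re
from typing import Optional


def parse_uprn(raw: object) -> Optional[str]:
    """Return the UPRN as a digit string, or None if it is missing or non-numeric."""
    if raw is None:
        return None
    text = str(raw).strip()
    # JSON may carry the UPRN as a float, e.g. 116043056.0.
    if text.endswith(".0"):
        text = text[:-2]
    return text if text.isascii() and text.isdigit() else None


def select_address(addresses, user_uprn=None, user_paon=None):
    """Pick an address dict by UPRN first, then by leading house number."""
    # Entries without a usable UPRN are skipped rather than raising.
    usable = [a for a in addresses if parse_uprn(a.get("UPRN")) is not None]
    wanted_uprn = parse_uprn(user_uprn)
    if wanted_uprn is not None:
        for addr in usable:
            if parse_uprn(addr.get("UPRN")) == wanted_uprn:
                return addr
    if user_paon:
        wanted_paon = str(user_paon).strip().lower()
        for addr in usable:
            # Compare against the first token of the address text ("Address" key is assumed).
            first = re.match(r"\s*(\w+)", str(addr.get("Address", "")))
            if first and first.group(1).lower() == wanted_paon:
                return addr
    raise ValueError("No address matched the given UPRN or house number")
```

For example, given `[{"UPRN": 116043056.0, "Address": "1 Chiefswood Road, Melrose"}, {"UPRN": "116043057", "Address": "10 Chiefswood Road, Melrose"}]`, calling `select_address(..., user_paon="1")` selects the first entry and not the second, because the whole leading token is compared.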

Summary by CodeRabbit

  • New Features

    • Added Scottish Borders Council support, enabling users to retrieve waste bin collection schedules by entering their postcode and property reference number.
  • Configuration Updates

    • Expanded council configuration to include Scottish Borders Council; reorganized existing council entries for improved consistency.

Uses Bartec Municipal Portal. Pure HTTP with requests + BeautifulSoup.
3-step form flow with CSRF tokens: postcode search, address select by
UPRN, then parse Syncfusion Schedule JSON for collection events.
Also supports house number matching as fallback for address selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 22, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 19 minutes and 32 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 16f8ee27-5162-4ce1-890c-b47ba1d7613c

📥 Commits

Reviewing files that changed from the base of the PR and between b97b36f and 1c22de4.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py
📝 Walkthrough

This PR introduces a new web scraper for Scottish Borders Council that obtains CSRF tokens from the portal, searches for addresses by postcode, and extracts bin collection schedules from the portal's calendar events. The test configuration is updated to include the new council and to fix a Unicode punctuation character in an existing entry.
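
As a rough illustration of the CSRF step, the hidden `__RequestVerificationToken` field can be read from the page and validated before it is reused. The PR itself uses BeautifulSoup; this sketch swaps in the standard library's `html.parser` so it stays self-contained, and the names `_TokenFinder` and `get_csrf_token` are hypothetical.

```python
from html.parser import HTMLParser


class _TokenFinder(HTMLParser):
    """Records the first <input name="__RequestVerificationToken"> it sees."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <input ... /> via handle_startendtag.
        if tag != "input" or self.found:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "__RequestVerificationToken":
            self.found = True
            self.value = attr_map.get("value")


def get_csrf_token(html: str) -> str:
    """Return the token value, raising ValueError if the field or its value is missing."""
    finder = _TokenFinder()
    finder.feed(html)
    if not finder.found:
        raise ValueError("CSRF token input not found")
    token = (finder.value or "").strip()
    if not token:
        raise ValueError("CSRF token input found but missing value")
    return token
```

Failing early here means later POST requests never carry a `None` or empty token.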

Changes

Scottish Borders Council Scraper Implementation and Testing

| Layer / File(s) | Summary |
| --- | --- |
| Scottish Borders Scraper Implementation: uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py | New council class that establishes HTTP sessions with portal headers, extracts CSRF tokens from hidden form fields, POSTs postcode to retrieve address options and matches them against provided UPRN or property number (paon), then extracts calendar event JSON to build a chronologically sorted bins list with collection dates. Raises ValueError when CSRF, address matches, or events are missing. |
| Test Configuration: uk_bin_collection/tests/input.json | Added ScottishBordersCouncil test entry with postcode, UPRN, portal URL, and metadata. Fixed unicode punctuation in EnvironmentFirst wiki note. Re-indented NorthHertfordshireDistrictCouncil object without changing values. |

Sequence Diagram

sequenceDiagram
  participant Client as parse_data
  participant Portal as Portal Server
  participant Session as HTTP Session
  participant Parser as HTML/JSON Parser
  
  Client->>Session: Create session with headers
  Session->>Portal: GET calendar page
  Portal-->>Session: HTML with CSRF token
  Client->>Parser: Extract CSRF via _get_csrf_token
  Parser-->>Client: __RequestVerificationToken value
  
  Client->>Session: POST postcode to address handler
  Session->>Portal: postcode request
  Portal-->>Session: JSON with address dropdown options
  Client->>Parser: Match UPRN (provided, paon match, or first)
  Parser-->>Client: Selected UPRN and address
  
  Client->>Session: GET search results page
  Session->>Portal: UPRN selection confirmation
  Portal-->>Session: HTML with refreshed CSRF
  Client->>Parser: Re-extract CSRF from results
  Parser-->>Client: Updated CSRF token
  
  Client->>Session: POST premises selection
  Session->>Portal: Selected UPRN confirmation
  Portal-->>Session: Rendered page with calendar events
  Client->>Parser: Extract isJson blocks and find Subject/StartTime
  Parser-->>Client: Calendar event objects
  
  Client->>Parser: Build bins list with collectionDate mapping
  Parser-->>Client: Sorted bins by chronological date
  Client->>Client: Return bindata dict with bins

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • dp247

Poem

🐰 A Scottish border scraper hops to life,
CSRF tokens dancing, free from strife,
Postcode searches find each bin's true place,
Calendar events sorted at a quickened pace,
Collection dates arranged in perfect line,
The rabbit's work makes scheduling divine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new Scottish Borders Council scraper implementation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (1c22de4).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2087   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py`:
- Around line 20-25: The _get_csrf_token function currently assumes the found
input has a value; update it to validate that the input element returned by
soup.find("input", {"name": "__RequestVerificationToken"}) actually contains a
non-empty "value" attribute and raise a clear ValueError if missing.
Specifically, in _get_csrf_token check token is not None and that
token.get("value") (or token.has_attr("value") and token["value"].strip()) is
truthy; if not, raise an error like "CSRF token input found but missing value"
so callers don't receive None in subsequent requests.
- Around line 50-66: The HTTP requests in ScottishBordersCouncil (the
session.get(self.BASE_URL) and the session.post(... handler=SearchPostcode ...))
lack timeouts; update both calls to pass timeout=REQUEST_TIMEOUT so they won't
hang indefinitely (keep response.raise_for_status() as-is); search for the
GET/POST occurrences in the method that calls _get_csrf_token and add the
timeout argument to each request.
- Around line 150-177: The parser currently silent-returns bindata with empty
"bins" when no valid events are found; modify the logic in the event-processing
routine (the loop over events that builds bindata["bins"], using variables
events, subject, start_time, collection_date and date_format) to detect after
the loop if bindata["bins"] is empty and, in that case, raise a descriptive
exception (e.g., ValueError or a custom ParseError) that includes context (e.g.,
number of events processed and a hint that dates/subjects were invalid) instead
of returning {"bins": []}; keep the existing parsing/continue behavior for
individual invalid events but ensure the top-level failure is raised from the
same function that currently returns bindata.
- Around line 137-142: The loop that parses matched JSON fragments can crash on
malformed input; wrap the json.loads(raw_json) call inside a try/except that
catches json.JSONDecodeError (and optionally ValueError) so a single bad
fragment is skipped and the loop continues, optionally logging a warning; keep
the existing logic that checks parsed, isinstance(parsed[0], dict) and "Subject"
in parsed[0] and only set events = parsed and break when a valid fragment is
found (referencing variables all_matches, raw_json, parsed, events in
ScottishBordersCouncil.py).
- Around line 86-102: The code converts UPRN values with int(addr.get("UPRN",
0)) which can raise ValueError/TypeError for None, empty or non-numeric UPRNs;
add a small helper (e.g., safe_uprn_str or parse_uprn) and use it wherever UPRNs
are read (the blocks referencing selected_uprn, addresses, user_uprn, user_paon
and the calls to addr.get("UPRN", 0)) to validate and convert to a numeric
string safely: attempt to coerce to str, strip, check numeric (or catch
ValueError/TypeError around int()), return None for invalid values, and skip
those addresses instead of letting an exception propagate.
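
The "Around line 150-177" finding above (skip individual bad events, but fail loudly when nothing valid remains) could be sketched as follows. Several details are assumptions rather than facts from the PR: that `StartTime` is an ISO 8601 string, that the output items use a `"type"` key alongside `collectionDate`, that the project's shared date format is `%d/%m/%Y`, and the function name `build_bins` itself.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumed to match the project's shared date_format


def build_bins(events):
    """Turn calendar events into a date-sorted bins dict, raising if none are valid."""
    parsed = []
    for event in events:
        subject = str(event.get("Subject") or "").strip()
        start_time = event.get("StartTime")
        if not subject or not isinstance(start_time, str):
            continue  # skip events with no usable subject or start time
        try:
            # Assumes an ISO 8601 timestamp such as "2026-05-26T00:00:00".
            when = datetime.fromisoformat(start_time)
        except ValueError:
            continue  # skip events whose date cannot be parsed
        parsed.append((when, subject))
    if not parsed:
        raise ValueError(
            f"No valid collections parsed from {len(events)} event(s); "
            "subjects or dates were missing or invalid"
        )
    parsed.sort(key=lambda pair: pair[0])
    return {
        "bins": [
            {"type": subject, "collectionDate": when.strftime(DATE_FORMAT)}
            for when, subject in parsed
        ]
    }
```

Raising from the same function that would otherwise return `{"bins": []}` keeps a portal markup change from looking like a property with no collections.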

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6028689f-1519-41f9-afe9-5af2f6da8777

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and b97b36f.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py

Comment thread uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/ScottishBordersCouncil.py Outdated

1 participant