This repository provides a file-based WebScrape pipeline:
- Python collectors and pipelines gather and normalize public data.
- Daily scrape jobs write deterministic JSON and JSONL artifacts.
- OpenClaw reads those artifacts and performs downstream analysis/alerting.
- A Next.js static-export frontend shows run health and recent changes.
The scripts in this repository do not call LLMs.
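Since downstream consumers read the JSONL artifacts line by line, a small reader sketch may help (the file name and record fields shown are hypothetical; consult the schemas in `config/` for the real shapes):

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed record per JSONL line, skipping blank lines."""
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```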
## Quickstart

From the repository root:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
```

Run the daily scrape (ideally once per day, around the same time):
```bash
python3 -m collector.pipeline.scrape
# or one profile only:
python3 -m collector.pipeline.scrape --profile <profile>
```

Generate report window(s) after the scrape:
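To run the scrape once per day at a fixed time, a cron entry is one option (the install path and venv location below are assumptions, not part of the repository):

```cron
# m  h  dom mon dow  command
30 6  *   *   *   cd /opt/webscrape && .venv/bin/python -m collector.pipeline.scrape
```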
```bash
python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 1
python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 7
```

Analyze the report outputs in:

```
data/reports/<profile>/YYYY-MM-DD/<slug>/last-<N>-days.json
```
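A minimal Python sketch for locating and loading one of these report artifacts, assuming `base` points at the `data/` directory (the helper name and any record fields are hypothetical; the authoritative field list is in the schemas under `config/`):

```python
import json
from pathlib import Path


def load_report(base: Path, profile: str, date: str, slug: str, days: int) -> dict:
    """Load a single report artifact from the data/reports/<profile>/<date>/<slug>/ layout."""
    path = base / "reports" / profile / date / slug / f"last-{days}-days.json"
    with path.open() as f:
        return json.load(f)
```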
## Repository layout

- `config/`: JSON schemas for run, report, and source validation.
- `collector/`: Python collection and report utilities.
- `data/`: local runtime data (target registry/history plus generated outputs, snapshots, and report exports).
- `frontend/`: Next.js app exported to static files (`frontend/out/`) that fetches JSON in the browser.
## Frontend build

```bash
cd frontend
npm install
npm run build
```

`npm run build` writes static files to `frontend/out/`. Deploy `out/` with your static server of choice (for example, Nginx).
Important: the frontend fetches data in the browser from `/data/...` paths, so your static hosting layout must expose those paths at the same web root as the frontend.
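As an illustration, a minimal Nginx server block serving the exported frontend and the JSON artifacts from one web root (all paths here are assumptions about your deployment, not repository requirements):

```nginx
server {
    listen 80;
    # Web root holds the contents of frontend/out/ plus a data/ directory
    root /var/www/webscrape;

    location / {
        try_files $uri $uri/ /index.html;
    }

    location /data/ {
        # JSON artifacts fetched by the browser at runtime
        default_type application/json;
    }
}
```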
## Documentation

- `SKILL.md`: concise day-to-day operating guide for OpenClaw.
- `reference.md`: advanced operations (rediscovery, cleanup, manual source edits, troubleshooting).
- `frontend-build.md`: frontend build/deploy runbook.