This repository provides a file-based WebScrape pipeline:
- Python collectors and pipelines gather and normalize public data.
- Daily scrape jobs write deterministic JSON and JSONL artifacts.
- OpenClaw reads those artifacts and performs downstream analysis/alerting.
- A Next.js static-export frontend shows run health and recent changes.
The scripts in this repository do not call LLMs.
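Since downstream consumers read the JSONL artifacts line by line, a small reader sketch may help (the file name and record fields shown are hypothetical; consult the schemas in `config/` for the real shapes):

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed record per JSONL line, skipping blank lines."""
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```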
## Quickstart

From the repository root:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
```

Run the daily scrape (ideally once per day, around the same time):
```bash
python3 -m collector.pipeline.scrape
# or one profile only:
python3 -m collector.pipeline.scrape --profile <profile>
```

Generate report window(s) after the scrape:
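To run the scrape once per day at a fixed time, a cron entry is one option (the install path and venv location below are assumptions, not part of the repository):

```cron
# m  h  dom mon dow  command
30 6  *   *   *   cd /opt/webscrape && .venv/bin/python -m collector.pipeline.scrape
```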
```bash
python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 1
python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 7
```

Analyze the report outputs in:

```
data/reports/<profile>/YYYY-MM-DD/<slug>/last-<N>-days.json
```
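A minimal Python sketch for locating and loading one of these report artifacts, assuming `base` points at the `data/` directory (the helper name and any record fields are hypothetical; the authoritative field list is in the schemas under `config/`):

```python
import json
from pathlib import Path


def load_report(base: Path, profile: str, date: str, slug: str, days: int) -> dict:
    """Load a single report artifact from the data/reports/<profile>/<date>/<slug>/ layout."""
    path = base / "reports" / profile / date / slug / f"last-{days}-days.json"
    with path.open() as f:
        return json.load(f)
```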
## Repository layout

- `config/`: JSON schemas for run, report, and source validation.
- `collector/`: Python collection and report utilities.
- `data/`: local runtime data (target registry/history plus generated outputs, snapshots, and report exports).
- `frontend/`: Next.js app exported to static files (`frontend/out/`) that fetches JSON in the browser.
## Frontend build

```bash
cd frontend
npm install
npm run build
```

`npm run build` writes static files to `frontend/out/`. Deploy `out/` with your static server of choice (for example, Nginx).
Important: the frontend fetches data in the browser from `/data/...` paths, so your static hosting layout must expose those paths at the same web root as the frontend.
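As an illustration, a minimal Nginx server block serving the exported frontend and the JSON artifacts from one web root (all paths here are assumptions about your deployment, not repository requirements):

```nginx
server {
    listen 80;
    # Web root holds the contents of frontend/out/ plus a data/ directory
    root /var/www/webscrape;

    location / {
        try_files $uri $uri/ /index.html;
    }

    location /data/ {
        # JSON artifacts fetched by the browser at runtime
        default_type application/json;
    }
}
```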
## Documentation

- `SKILL.md`: concise day-to-day operating guide for OpenClaw.
- `reference.md`: advanced operations (rediscovery, cleanup, manual source edits, troubleshooting).
- `frontend-build.md`: frontend build/deploy runbook.