
amazeeio/webscrape


WebScrape Pipeline (OpenClaw Data Provider)

This repository provides a file-based WebScrape pipeline:

  • Python collectors and pipelines gather and normalize public data.
  • Daily scrape jobs write deterministic JSON and JSONL artifacts.
  • OpenClaw reads those artifacts and performs downstream analysis/alerting.
  • A Next.js static-export frontend shows run health and recent changes.
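
One way to read "deterministic" for the JSONL artifacts is that the same records always serialize to the same bytes, regardless of key order in memory. The sketch below illustrates that idea only; the helper name `to_jsonl` and the sample record fields are hypothetical and are not taken from the collector code.

```python
import json


def to_jsonl(records):
    """Serialize records as JSONL with stable key order and separators.

    Hypothetical helper: the repository's real writer may differ.
    """
    return "".join(
        json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + "\n"
        for record in records
    )


# The same record with keys in a different order yields identical output.
a = to_jsonl([{"url": "https://example.com", "status": 200}])
b = to_jsonl([{"status": 200, "url": "https://example.com"}])
```

Sorting keys and fixing separators keeps diffs between daily runs limited to real data changes.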

The scripts in this repository do not call LLMs.

Quick Start

From repository root:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt

Run the daily scrape (ideally once per day, at about the same time):

python3 -m collector.pipeline.scrape
# or one profile only:
python3 -m collector.pipeline.scrape --profile <profile>

After the scrape, generate one or more report windows:

python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 1
python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 7
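
If the scrape and both report windows run from a scheduler, the commands above can be assembled in one place. This is a minimal sketch that uses only the module names and flags shown above; the function names `build_commands` and `run_all` are illustrative and not part of the repository.

```python
import subprocess
import sys


def build_commands(profile, slug, windows=(1, 7), python=sys.executable):
    """Return the scrape command followed by one report command per window."""
    scrape = [python, "-m", "collector.pipeline.scrape", "--profile", profile]
    reports = [
        [python, "-m", "collector.pipeline.report",
         "--profile", profile, "--target", slug, "--days", str(days)]
        for days in windows
    ]
    return [scrape, *reports]


def run_all(commands):
    """Run each command in order from the repository root; stop on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands("<profile>", "<slug>"))` from the repository root, inside the activated virtual environment, would run the scrape first and the reports only if it succeeds.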

Analyze report outputs in:

  • data/reports/<profile>/YYYY-MM-DD/<slug>/last-<N>-days.json
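
The path pattern above can be built programmatically when a downstream consumer needs to locate a report. The sketch assumes only the directory layout shown; the helper names are hypothetical, and the report's internal JSON structure is not described here, so `load_report` returns whatever the file contains.

```python
import json
from datetime import date
from pathlib import Path


def report_path(profile, day, slug, days, root=Path("data/reports")):
    """Build data/reports/<profile>/YYYY-MM-DD/<slug>/last-<N>-days.json."""
    return root / profile / day.isoformat() / slug / f"last-{days}-days.json"


def load_report(path):
    """Parse a report file; the schema is not assumed here."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Example with placeholder profile and slug values.
example = report_path("my-profile", date(2024, 1, 2), "my-target", 7)
```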

Layout

  • config/: JSON schemas for run, report, and source validation.
  • collector/: Python collection and report utilities.
  • data/: local runtime data (target registry/history plus generated outputs, snapshots, and report exports).
  • frontend/: Next.js app exported to static files (frontend/out/); it fetches JSON data in the browser.

Frontend setup (Next.js static export)

cd frontend
npm install
npm run build

npm run build creates static files in frontend/out/. Deploy out/ with your static server (for example Nginx).

Important: the frontend fetches data in the browser from:

  • /data/...

Your static host must therefore serve those paths from the same web root as the frontend.
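
One way to satisfy this is to copy the exported frontend and the data directory into a single directory before deploying. The sketch below is an assumption about a workable layout, not the project's documented deploy procedure (see frontend-build.md for that). The function name is hypothetical, and copying all of data/ may expose more than the frontend needs.

```python
import shutil
from pathlib import Path


def assemble_web_root(frontend_out, data_dir, web_root):
    """Copy the static export to web_root and the data files to web_root/data."""
    web_root = Path(web_root)
    # dirs_exist_ok requires Python 3.8+; existing files are overwritten.
    shutil.copytree(frontend_out, web_root, dirs_exist_ok=True)
    shutil.copytree(data_dir, web_root / "data", dirs_exist_ok=True)
    return web_root
```

A server-level alias or symlink that maps /data/ to the data directory would achieve the same result without copying.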

Documentation Split

  • SKILL.md: concise day-to-day operating guide for OpenClaw.
  • reference.md: advanced operations (rediscovery, cleanup, manual source edits, troubleshooting).
  • frontend-build.md: frontend build/deploy runbook.
