johnsamuelwrites/ShExStatements

ShExStatements

ShExStatements allows users to generate Shape Expressions (ShEx) from simple CSV statements, CSV files, and spreadsheets. It can be used from the command line, via a REST API, or through a modern web interface.

Python compatibility

  • Core CSV/Spreadsheet to ShEx conversion supports modern Python versions including Python 3.13.
  • CI runs on Python 3.12 and 3.13, plus 3.14-dev (allowed to fail) to detect future breakages early.

Ways to use ShExStatements

ShExStatements currently supports three primary usage modes:

  1. WASM runtime in the browser (static frontend, no backend required)
  2. Docker runtime (React frontend + FastAPI backend)
  3. Python runtime (CLI and legacy Flask interface)

Quick start

1) Using Python (CLI)

Set up a virtual environment and install shexstatements:

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install shexstatements

Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.

$ shexstatements.sh examples/language.csv

2) Using Docker (Frontend + Backend)

Run the containerized stack:

cd docker
docker compose up

This starts the React frontend (http://localhost:3000) and the FastAPI backend (http://localhost:8000).

For development mode with hot reloading:

cd docker
docker compose -f docker-compose.yml -f docker-compose.dev.yml up

Build from source

Terminal

Clone the ShExStatements repository.

$ git clone https://github.com/johnsamuelwrites/ShExStatements.git

Go to the ShExStatements directory.

$ cd ShExStatements

Install ShExStatements and the modules it requires, here into a virtual environment.

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .

Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.

$ ./shexstatements.sh examples/language.csv

CSV files can also use other delimiters, such as a semicolon. For example, the following command works with a file that uses a semicolon as the delimiter.

$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"

Sometimes the CSV file starts with a header line. In that case, use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).

$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
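
To make the effect of --delim and --skipheader concrete, here is a minimal Python sketch that reads delimited text with the standard csv module and optionally drops the first row. It only illustrates the idea behind the two options and is not the parser that ShExStatements uses internally. The function name read_statements and the sample rows are hypothetical.

```python
import csv
import io


def read_statements(text, delim=",", skip_header=False):
    """Split delimited text into rows, optionally dropping the first row."""
    # Blank lines come back from csv.reader as empty lists and are ignored.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delim) if row]
    # Same idea as --skipheader: the first line is a header, not data.
    return rows[1:] if skip_header else rows


# Hypothetical sample: a header line followed by two semicolon-delimited rows.
sample = (
    "node;property;value\n"
    "@language;wdt:P31;wd:Q34770\n"
    "@language;wdt:P1705;LITERAL\n"
)
rows = read_statements(sample, delim=";", skip_header=True)
print(rows)
```

Without skip_header=True the header line would be returned as the first row, which is why a file with a header needs the option.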

It is also possible to work with spreadsheet files such as .ods, .xls or .xlsx.

$ shexstatements.sh examples/language.ods
$ shexstatements.sh examples/language.xls
$ shexstatements.sh examples/language.xlsx

In all of the above cases, the shape expression generated by ShExStatements looks like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
  wdt:P31 [ wd:Q34770  ] ;# instance of a language
  wdt:P1705 LITERAL ;# native name
  wdt:P17 .+ ;# spoken in country
  wdt:P2989 .+ ;# grammatical cases
  wdt:P282 .+ ;# writing system
  wdt:P1098 .+ ;# speakers
  wdt:P1999 .* ;# UNESCO language status
  wdt:P2341 .+ ;# indigenous to
}
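
Each line inside the shape is a triple constraint: a property, a value expression, and an optional cardinality. In ShEx, `.` accepts any value, `+` means one or more, `*` means zero or more, and no marker means exactly one. The hypothetical helpers below rebuild such lines from tuples. They only illustrate the layout of the output above and are not the generator that ShExStatements uses.

```python
def format_constraint(prop, value, cardinality="", comment=""):
    """Render one triple constraint in the layout of the output above."""
    line = f"  {prop} {value}{cardinality} ;"
    if comment:
        line += f"# {comment}"
    return line


def format_shape(name, constraints):
    """Wrap the rendered constraints in a named shape block."""
    body = "\n".join(format_constraint(*c) for c in constraints)
    return f"<{name}> {{\n{body}\n}}"


# (property, value expression, cardinality, comment), taken from the output above.
constraints = [
    ("wdt:P1705", "LITERAL", "", "native name"),
    ("wdt:P17", ".", "+", "spoken in country"),
    ("wdt:P1999", ".", "*", "UNESCO language status"),
]
print(format_shape("language", constraints))
```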

ShExStatements also accepts application profiles with the following columns:

Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation

Shape expressions can be generated from such a file with the -ap option:

$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
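
An application profile can be inspected with the standard csv module before it is converted. The sketch below maps each row to the column names listed above. The sample row is invented for illustration: its values, including the Mand and Repeat flags, are assumptions and are not taken from examples/languageap.csv.

```python
import csv
import io

# Column names as listed in the application profile header above.
AP_COLUMNS = [
    "Entity_name", "Property", "Property_label", "Mand",
    "Repeat", "Value", "Value_type", "Annotation",
]

# Hypothetical profile: the header plus one illustrative row.
profile = (
    "Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation\n"
    "language,wdt:P31,instance of,true,false,wd:Q34770,,instance of a language\n"
)

reader = csv.DictReader(io.StringIO(profile))
if reader.fieldnames != AP_COLUMNS:
    raise ValueError("unexpected application profile header")

rows = list(reader)
for row in rows:
    print(row["Entity_name"], row["Property"], row["Value"])
```

Because the header is consumed as column names here, the same file needs --skipheader when it is passed to the command above.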

Objectives

  • Easily generate shape expressions (ShEx) from CSV files and spreadsheets
  • Simple syntax

Documentation and examples

Detailed documentation is available here, with example CSV files in the examples folder.

Test cases and coverage

All test cases can be run as follows:

$ python3 -m tests.tests

A code coverage report can also be generated by running the unit tests with the coverage tool.

$ coverage run --source=shexstatements -m unittest tests.tests
$ coverage report -m

Web Interface

Modern Web Interface (v1.0+)

ShExStatements now includes a modern, feature-rich web interface built with React and TypeScript.

Using Docker (recommended):

cd docker
docker compose up

Access the interface at http://localhost:3000

Features:

  • Split-pane editor with Monaco Editor (VS Code-like experience)
  • Syntax highlighting for ShExStatements and ShEx output
  • Dark mode support
  • File upload support (CSV, ODS, XLS, XLSX)
  • Multiple delimiter options (comma, pipe, semicolon)
  • Real-time error display
  • Copy output to clipboard
  • Runtime selector (Auto, API, WASM)

Static GitHub Pages (WASM)

The frontend can run conversion directly in the browser using Python-on-WASM (Pyodide), so it can be deployed as a static site on GitHub Pages.

  1. Enable GitHub Pages in repository settings (source: GitHub Actions).
  2. Push to main or master.
  3. The workflow .github/workflows/pages.yml builds and deploys the frontend with VITE_RUNTIME_MODE=wasm.

In WASM runtime:

  • CSV conversion to ShEx is supported in-browser.
  • Spreadsheet uploads (.xlsx, .xls, .ods) are also supported in-browser.
  • Pyodide dynamically installs Python dependencies (shexstatements, ply, and spreadsheet libraries) in the browser runtime.

Legacy Web Interface

The original Flask-based interface is still available:

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
$ ./shexstatements.sh -r

Then open http://127.0.0.1:5000/ in a browser.

API

ShExStatements provides a REST API for programmatic access.

Modern API (v1.0+)

The new FastAPI-based API provides a convert endpoint:

curl -X POST http://localhost:8000/api/v1/convert \
  -H "Content-Type: application/json" \
  -d '{"content": "@shape|prop|value", "delimiter": "|", "output_format": "shex"}'
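
The same request can be prepared from Python with the standard library. This is a minimal sketch that reuses the URL and JSON fields from the curl example above. The helper name build_convert_request is hypothetical, and the layout of the response is not described here, so the sketch only builds the request and leaves sending it as a commented-out step.

```python
import json
import urllib.request

# URL and JSON fields are copied from the curl example above.
CONVERT_URL = "http://localhost:8000/api/v1/convert"


def build_convert_request(content, delimiter="|", output_format="shex"):
    """Build (but do not send) a POST request for the convert endpoint."""
    payload = {
        "content": content,
        "delimiter": delimiter,
        "output_format": output_format,
    }
    return urllib.request.Request(
        CONVERT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_convert_request("@shape|prop|value")
print(request.get_method(), request.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```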

API documentation

Detailed API documentation (modern v1 API and legacy compatibility notes) is available here.

Deployment Modes

  • Standalone Python application: CLI + legacy Flask UI (./shexstatements.sh).
  • Docker application: React frontend + FastAPI backend (docker compose up).
  • Static GitHub Pages frontend: WASM runtime (no backend required for CSV-to-ShEx).

Demonstration

Online demonstrations are also available:

Author

Conference Proceedings

  • ShExStatements: Simplifying Shape Expressions for Wikidata, John Samuel, Wiki Workshop 2021 (held at The Web Conference 2021), 14 April 2021 (PDF, Slides)

Acknowledgements

  • Wikidata Community

Archives and Releases

Licence

All code is released under the GPLv3+ licence. The associated documentation and other content are released under CC-BY-SA.