
feat(export): include trip_h3 (th3index) in cross-platform portability export #24

Merged
estebanzimanyi merged 2 commits into MobilityDB:master from estebanzimanyi:feat/portability-export-th3index
Jun 5, 2026


@estebanzimanyi estebanzimanyi commented May 10, 2026

Member

BerlinMOD portability surface — chapter 1 + 17 R-queries + th3index

This is the cross-platform code PR for the BerlinMOD benchmark. It
contains three groups of changes that together make the same SQL
runnable on MobilityDB, MobilityDuck, and MobilitySpark, and make
every platform return the same row counts.

The companion docs PR is #26, which carries the dated benchmark
reports and the beta testing harness.

What this PR adds

1. Portable SQL files (cross-platform dialect)

BerlinMOD/
├── berlinmod_chapter1_queries_portable.sql              (existing — unchanged shape)
├── berlinmod_chapter1_queries_th3index_portable.sql     ← NEW (h3 variant)
├── berlinmod_r_queries_portable.sql                     ← NEW (17 R-queries portable)
├── berlinmod_r_queries_th3index_portable.sql            ← NEW (R-queries h3 variant)
├── mobilityduck_schema_adapter.sql                      ← NEW (DuckDB view layer)
└── berlinmod_load.sql                                   ← fixed: ORDER BY on LIMIT-10 views
  • The portable forms use named functions (eIntersects, eContains,
    eDwithin, …) instead of PG-specific infix operators (&&, @>,
    <->). Each platform parses them identically.
  • The th3index variants add the h3 cell-set prefilter clause
    (everIntersectsH3IndexSet_Th3Index(geoToH3IndexSet(G, 7), T.trip_h3))
    on every spatial-against-static predicate.
  • mobilityduck_schema_adapter.sql is a view layer that exposes the
    canonical R-queries column names (Trips(TripId, VehicleId, Trip),
    Licences1, …) on top of the cross-platform CSV-loaded schema.
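
The h3 prefilter in the th3index variants follows the usual filter-then-refine pattern: a cheap cell-set intersection discards pairs that cannot intersect, and the exact predicate runs only on the survivors. The sketch below illustrates that pattern in Python, with a toy square grid standing in for H3 cells and bounding boxes standing in for regions and trips. The names `cells_of_box`, `boxes_intersect`, `join`, the grid size, and the sample data are all hypothetical; none of this is the MobilityDB implementation.

```python
from itertools import product

CELL = 10.0  # toy grid cell size (assumption; real H3 cells are hexagons)

def cells_of_box(box):
    """Return the set of grid cells that a box (xmin, ymin, xmax, ymax) touches."""
    xmin, ymin, xmax, ymax = box
    xs = range(int(xmin // CELL), int(xmax // CELL) + 1)
    ys = range(int(ymin // CELL), int(ymax // CELL) + 1)
    return set(product(xs, ys))

def boxes_intersect(a, b):
    """Exact predicate: do two axis-aligned boxes overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def join(regions, trips, prefilter):
    """Return the exact hits and the number of pairs that reached the exact predicate."""
    hits, refined = [], 0
    for (r, rbox), (t, tbox) in product(regions.items(), trips.items()):
        if prefilter and not (cells_of_box(rbox) & cells_of_box(tbox)):
            continue  # no shared cell, so the boxes cannot intersect
        refined += 1
        if boxes_intersect(rbox, tbox):
            hits.append((r, t))
    return sorted(hits), refined

# Hypothetical sample data: two static regions and three trip extents.
regions = {"R1": (0, 0, 15, 15), "R2": (50, 50, 55, 55)}
trips = {"T1": (12, 12, 30, 30), "T2": (16, 16, 18, 18), "T3": (80, 80, 90, 90)}

hits_plain, refined_plain = join(regions, trips, prefilter=False)
hits_h3, refined_h3 = join(regions, trips, prefilter=True)
```

The prefilter is sound in this toy setting because any point shared by two boxes lies in a cell that both cell sets contain, so it can only remove pairs the exact predicate would reject. It may still pass false positives (here R1 and T2 share a cell but do not overlap), which is why the exact predicate remains in the query.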

2. Deterministic LIMIT-10 parameter views

berlinmod_load.sql and the chapter-1 portable files previously used
LIMIT 10 (or LIMIT 100) without ORDER BY, so the rows selected
depended on the underlying table's physical insertion order, which
differs across platforms. This PR adds ORDER BY <PrimaryKey> LIMIT 10
on all six R-queries parameter views (Licences1, Licences2, Instants1,
Periods1, Points1, Regions1) and the four chapter-1 views.

Result: 17/17 R-queries return identical row counts on PostgreSQL and
DuckDB from the same generated CSV files.
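
The effect of the ORDER BY fix can be reproduced in miniature with SQLite from Python. This is a sketch under assumptions: the `Licences` table, its columns, the sample values, and LIMIT 3 (instead of 10) are hypothetical stand-ins, and SQLite stands in for the real platforms. Two loaders insert the same rows in different physical order; with ORDER BY on the key, both return the same parameter rows.

```python
import sqlite3

def top_licences(insert_order, order_by):
    """Load the same rows in a given physical order, then take LIMIT 3."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Licences (LicenceId INTEGER, Licence TEXT)")
    con.executemany("INSERT INTO Licences VALUES (?, ?)", insert_order)
    clause = "ORDER BY LicenceId " if order_by else ""
    rows = con.execute(
        f"SELECT LicenceId, Licence FROM Licences {clause}LIMIT 3"
    ).fetchall()
    con.close()
    return rows

rows = [(i, f"B-XX {i:03d}") for i in range(1, 7)]
platform_a = rows                  # one loader inserts in key order
platform_b = list(reversed(rows))  # another inserts in a different order

ordered_a = top_licences(platform_a, order_by=True)
ordered_b = top_licences(platform_b, order_by=True)
```

Calling `top_licences(..., order_by=False)` on the two inputs is not guaranteed to agree, since SQL leaves the row order of an unordered LIMIT unspecified; that is the nondeterminism the fix removes.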

3. trip_h3 column in the cross-platform CSV export

berlinmod_portability_export() writes trip_h3 (a th3index hex-WKB
at H3 resolution 7) alongside trip. All three platforms consume
the column directly for the h3 prefilter variant of the benchmark.

Row counts after the fix (BerlinMOD sf 0.005)

Q1:72  Q2:1  Q3:6  Q4:80  Q5:100  Q6:0  Q7:26  Q8:75  Q9:94
Q10:21 Q11:0 Q12:0 Q13:278 Q14:1  Q15:118 Q16:2 Q17:1

Identical on PostgreSQL, DuckDB, and Spark.

Coordinated PRs

Test plan

  • PG canonical R-queries via SELECT berlinmod_R_queries(1, false)
    returns the row counts above.
  • Portable R-queries via psql -f berlinmod_r_queries_portable.sql
    returns the same row counts.
  • DuckDB after sourcing mobilityduck_schema_adapter.sql returns
    the same row counts.
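
Each step of the test plan comes down to comparing per-query row counts against the table above. A minimal Python sketch of that comparison follows. The expected counts are copied from this PR description; `diff_counts` and the `observed` dictionary are hypothetical, and how the counts are collected from each platform is left out.

```python
# Expected row counts at BerlinMOD sf 0.005, as listed in this PR.
EXPECTED = {
    "Q1": 72, "Q2": 1, "Q3": 6, "Q4": 80, "Q5": 100, "Q6": 0,
    "Q7": 26, "Q8": 75, "Q9": 94, "Q10": 21, "Q11": 0, "Q12": 0,
    "Q13": 278, "Q14": 1, "Q15": 118, "Q16": 2, "Q17": 1,
}

def diff_counts(expected, observed):
    """Return {query: (expected, observed)} for every mismatch or missing query."""
    return {
        q: (n, observed.get(q))
        for q, n in expected.items()
        if observed.get(q) != n
    }

# Hypothetical observed counts from one platform run; Q13 is wrong on purpose.
observed = dict(EXPECTED, Q13=277)
mismatches = diff_counts(EXPECTED, observed)
```

An empty result means the platform matches on all 17 queries; a missing query shows up with `None` as its observed value.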

- Add LICENSE file (PostgreSQL License, 2020-2026, ULB + MobilityDB contributors)
- Update copyright headers in all SQL/sh files to standard ecosystem form
- Update CI workflow (main.yml): feat/** and fix/** branch triggers,
  paths-ignore for *.md and doc/**, workflow_dispatch, concurrency cancellation
- Restructure README into 7 numbered sections; add Cross-Platform Portability
  section (§4) documenting the portable SQL dialect and ecosystem platforms;
  add Contributing section explaining the perennial master branch model
- Add berlinmod_chapter1_queries_portable.sql: Q1–Q6 in the portable named-
  function dialect (eIntersects/eContains instead of operator symbols),
  compatible with MobilityDB, MobilityDuck, and MobilitySpark
- Add berlinmod_portability_export() to berlinmod_export.sql: exports
  vehicles, trips (as WKT tgeompoint text), query_licences, query_instants,
  and query_points in the shared cross-platform schema consumed by
  MobilityDuck and MobilitySpark
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request May 11, 2026
Adds `BerlinMOD/benchmarks/` for dated benchmark reports against the
BerlinMOD chapter-1 query set on each ecosystem platform.

Initial entries:

- `README.md` — index and `<Platform>_<scope>_<topic>_<YYYY-MM-DD>.md`
  naming convention.

- `MobilityDB_chapter1_th3index_2026-05-11.md` — index-matrix
  measurements (none / GiST(trip) / SP-GiST(trip) / GiST(trip_h3) /
  combinations) for Q1, Q2, Q4, Q6.  Headline result: SP-GiST(trip)
  on Q1 at 3.09× speedup over baseline (5951 ms → 1927 ms);
  SP-GiST(trip) + GiST(trip_h3) + h3 prefilter at 3.16×.  All
  configurations return matching row counts — the h3 prefilter is
  sound.  Documents the polygon-coverage soundness contract, the
  selectivity of the prefilter (cross-join 162 000 → 55 720 → 3 836
  true hits), and the index recommendations for the four query
  shapes.

- `CrossPlatform_th3index_readiness_2026-05-11.md` — inventory of
  what's needed to replicate the bench on MobilityDuck (~4–5
  person-days; th3index registration, h3indexset surface, zone-map
  pushdown verification) and MobilitySpark (~1.5 person-days
  after JMEOS regen; PR MobilityDB#9 carries the UDFs).

- `run_bench.sh` — reproduce script with the exact query and
  index-configuration matrix.
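
The headline figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in this commit message. It assumes the chain 162 000 → 55 720 → 3 836 means pairs before the prefilter, pairs surviving it, and pairs passing the exact predicate; the variable names are illustrative.

```python
baseline_ms = 5951   # Q1 with no index
spgist_ms = 1927     # Q1 with SP-GiST(trip)
speedup = baseline_ms / spgist_ms

cross_join = 162_000  # candidate pairs before the prefilter (assumed reading)
after_h3 = 55_720     # pairs surviving the h3 prefilter (assumed reading)
true_hits = 3_836     # pairs passing the exact predicate

kept = after_h3 / cross_join      # share of pairs the prefilter keeps
precision = true_hits / after_h3  # share of survivors that are true hits

print(f"speedup {speedup:.2f}x, kept {kept:.1%}, precision {precision:.1%}")
```

Under that reading the prefilter discards roughly two thirds of the cross-join, and about one surviving pair in fifteen is a true hit.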

## Coordinated PRs

- MobilityDB **PR #938** — `geoToH3IndexSet` + the
  `everIntersectsH3IndexSet_Th3Index` prefilter (open; the polygon
  walker on this PR returns the sound cell-set the bench consumes).
- MobilityDB **PR #940** — lift framework helper that demotes LINEAR
  to STEP for STEP-only result types (open).
- MobilityDB-BerlinMOD **PR MobilityDB#24** — the shared CSV carries the
  `trip_h3` column and a th3index-variant chapter-1 SQL file.
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request May 11, 2026
Extends `BerlinMOD/benchmarks/` from the chapter-1 subset to the full
17-query suite plus a beta testing harness for privileged testers
across all three platforms.

New files:

- `MobilityDB_rqueries_2026-05-11.md` — 17 R-queries × index matrix
  (none / GiST(trip + trajectory) / SP-GiST(trip + trajectory)) on
  the bench-driving MobilityDB build.  Total runtimes: 569 / 348 /
  340 s respectively.  Per-query highlights: Q14 48×, Q13 8.7×,
  Q10 6.2× under GiST.  All three configurations return identical
  row counts.

- `CrossPlatform_rqueries_readiness_2026-05-11.md` — sibling
  readiness document for replicating the same matrix on
  MobilityDuck and MobilitySpark.  Inventories the 12 MEOS
  temporal functions and 4 PostGIS functions used by the
  R-queries, marks the gap items per platform (`tDwithin`,
  `whenTrue` on MobilityDuck; `whenTrue` verify on MobilitySpark),
  and lays out the sequencing.

- `BETA_TESTING.md` — tester recipe and report-back template.
  Lists the four portable query files, the per-query expected row
  counts, and the per-platform invocation.  Single entry point
  for privileged testers across MobilityDB / MobilityDuck /
  MobilitySpark.

- `run_full_bench.sh` — reproduce script for the 17-query matrix.

README.md is updated to index the new reports and the beta testing
harness.

The portable SQL files the reports reference live on
`MobilityDB-BerlinMOD` PR MobilityDB#24 (sibling PR on the same repo).
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request May 11, 2026
Two open issues prevent the Spark side of the bench from completing
the 17 R-queries today.  Both are documented in the cross-platform
timings doc so reviewers and beta testers see the gap shape rather
than missing numbers.

1.  GEOS context init crash on the first spatial UDF call
    (`libgeos_c.so` SEGV with `context handle is uninitialized, call
    initGEOS`).  Affects Q2..Q17 — every query that uses a spatial
    UDF.  Q1 and QRT (relational only) complete.  No open PR yet.

2.  `UNRESOLVED_ROUTINE` on `everEqH3IndexTh3Index` and
    `everIntersectsH3IndexSet_Th3Index` — these h3 UDFs are referenced
    by the as-shipped Spark q02/q04/q05/q06/q10 but only registered on
    PR MobilityDB#9 (`Th3IndexUDFs.java`).  PR MobilityDB#9's source has JMEOS API drift
    that prevents a clean rebuild against the current JMEOS jar.
    Per a parallel session: the h3-related MobilityDB PRs (#807,
    #866, #893, #938, MobilitySpark MobilityDB#9, MobilityDB-BerlinMOD MobilityDB#24)
    are being consolidated into a single multi-commit PR.  Once that
    is issued, `feedback_issued_pr_treat_as_landed.md` permits using
    the consolidated UDFs for downstream work.

Spark column in the side-by-side table is now `blocked (GEOS)` or
`blocked (GEOS + h3 PR)` per query.  Total row marks Spark as `n/a`
until both blockers resolve.
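
The per-query Spark status follows mechanically from the two blockers. A small Python sketch of that mapping is shown below. It assumes the shipped Spark files q02/q04/q05/q06/q10 correspond to Q2, Q4, Q5, Q6, and Q10; the function name and the `ok` label for completing queries are hypothetical.

```python
SPATIAL = {f"Q{i}" for i in range(2, 18)}  # Q2..Q17 use a spatial UDF
H3 = {"Q2", "Q4", "Q5", "Q6", "Q10"}       # queries that call the h3 UDFs (assumed mapping)

def spark_status(query):
    """Map a query name to its Spark column entry in the timings table."""
    if query in SPATIAL and query in H3:
        return "blocked (GEOS + h3 PR)"
    if query in SPATIAL:
        return "blocked (GEOS)"
    return "ok"  # hypothetical label; the source only says Q1 and QRT complete

queries = ["QRT", "Q1"] + [f"Q{i}" for i in range(2, 18)]
statuses = {q: spark_status(q) for q in queries}
```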
Bare-name portable variants of the chapter-1 and 17 R-queries runnable on MobilityDB (PostgreSQL), MobilityDuck (DuckDB), and MobilitySpark (Spark); their th3index-accelerated counterparts plus berlinmod_th3index_setup.sql; the trip_h3 column in the export/load path; and a MobilityDuck schema adapter.
@estebanzimanyi estebanzimanyi force-pushed the feat/portability-export-th3index branch from fc51ae5 to 4b30598 Compare May 22, 2026 06:07
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request Jun 5, 2026
…banner)

Adds doc/contributing/reviewer-guide.md mirroring the canonical reviewer-
guide structure used in MobilityDB / MobilityDuck / MobilitySpark / JMEOS,
scoped to MobilityDB-BerlinMOD.

Same canonical path (doc/contributing/reviewer-guide.md) as the other
four ecosystem repos — reviewers landing in any of the five find the
same structure at the same place.

- Dependency chain: PR MobilityDB#23 (ecosystem-standards foundation) → PR MobilityDB#24
  (extends portability_export with trip_h3 — pairs with MobilityDB
  PR #938 geo_to_h3index_set).  Cross-repo dependency on MobilityDB
  th3index branch (#807 / #866 / #893) documented per
  feedback_issued_pr_treat_as_landed.md.
- Tier ranking + per-tier review notes.
- Standards checklist: license header, portable-name convention,
  CSV portability, loader cross-platform impact.
- Cross-repo links to the other four ecosystem reviewer guides.

Wires visibility:

- .github/PULL_REQUEST_TEMPLATE.md links to the guide so contributors
  are prompted to update it in any commit that opens / closes /
  restructures a PR.
- README.md gains a 'For contributors and reviewers' section pointing
  to the guide.
@estebanzimanyi estebanzimanyi merged commit f2dafd9 into MobilityDB:master Jun 5, 2026
@estebanzimanyi estebanzimanyi deleted the feat/portability-export-th3index branch June 7, 2026 12:22