feat(export): include trip_h3 (th3index) in cross-platform portability export #24
Merged
estebanzimanyi merged 2 commits on Jun 5, 2026
- Add LICENSE file (PostgreSQL License, 2020-2026, ULB + MobilityDB contributors)
- Update copyright headers in all SQL/sh files to the standard ecosystem form
- Update CI workflow (`main.yml`): `feat/**` / `fix/**` branch triggers, `paths-ignore` for `*.md` and `doc/**`, `workflow_dispatch`, concurrency cancellation
- Restructure README into 7 numbered sections; add a Cross-Platform Portability section (§4) documenting the portable SQL dialect and ecosystem platforms; add a Contributing section explaining the perennial master branch model
- Add `berlinmod_chapter1_queries_portable.sql`: Q1–Q6 in the portable named-function dialect (`eIntersects`/`eContains` instead of operator symbols), compatible with MobilityDB, MobilityDuck, and MobilitySpark
- Add `berlinmod_portability_export()` to `berlinmod_export.sql`: exports vehicles, trips (as WKT tgeompoint text), query_licences, query_instants, and query_points in the shared cross-platform schema consumed by MobilityDuck and MobilitySpark
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request on May 11, 2026
Adds `BerlinMOD/benchmarks/` for dated benchmark reports against the BerlinMOD chapter-1 query set on each ecosystem platform. Initial entries:

- `README.md`: index and `<Platform>_<scope>_<topic>_<YYYY-MM-DD>.md` naming convention.
- `MobilityDB_chapter1_th3index_2026-05-11.md`: index-matrix measurements (none / GiST(trip) / SP-GiST(trip) / GiST(trip_h3) / combinations) for Q1, Q2, Q4, Q6. Headline result: SP-GiST(trip) on Q1 at 3.09× speedup over baseline (5951 ms → 1927 ms); SP-GiST(trip) + GiST(trip_h3) + h3 prefilter at 3.16×. All configurations return matching row counts: the h3 prefilter is sound. Documents the polygon-coverage soundness contract, the selectivity of the prefilter (cross-join 162 000 → 55 720 → 3 836 true hits), and the index recommendations for the four query shapes.
- `CrossPlatform_th3index_readiness_2026-05-11.md`: inventory of what's needed to replicate the bench on MobilityDuck (~4–5 person-days; th3index registration, h3indexset surface, zone-map pushdown verification) and MobilitySpark (~1.5 person-days after JMEOS regen; MobilitySpark PR #9 carries the UDFs).
- `run_bench.sh`: reproduce script with the exact query and index-configuration matrix.

## Coordinated PRs

- MobilityDB **PR #938**: `geoToH3IndexSet` + the `everIntersectsH3IndexSet_Th3Index` prefilter (open; the polygon walker on this PR returns the sound cell set the bench consumes).
- MobilityDB **PR #940**: lift framework helper that demotes LINEAR to STEP for STEP-only result types (open).
- MobilityDB-BerlinMOD **PR #24**: the shared CSV carries the `trip_h3` column, and the PR adds a th3index-variant chapter-1 SQL file.
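The polygon-coverage soundness contract can be illustrated with a toy analogue: a coarse cell-set test may discard a (trip, region) pair only when no exact hit is possible, so the coverage of the static geometry must include every cell the geometry touches. The sketch below is a minimal model, not the MobilityDB implementation: it swaps H3 hexagons for square grid cells, trips for lists of sampled points, and polygons for axis-aligned rectangles, and `cell_of`, `cover_rect`, and `join` are hypothetical names. It counts the same three funnel stages as the report (cross-join pairs, prefilter candidates, true hits) and checks that the prefilter changes no result.

```python
import math
import random

CELL = 1.0  # toy cell edge length; plays the role of the H3 resolution


def cell_of(x, y):
    """Map a point to its square grid cell (a stand-in for an H3 cell id)."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def cover_rect(xmin, ymin, xmax, ymax):
    """All cells the closed rectangle touches: a sound (superset) coverage."""
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    return {(i, j) for i in range(cx0, cx1 + 1) for j in range(cy0, cy1 + 1)}


def inside(pt, rect):
    """Exact predicate: the point lies in the closed rectangle."""
    x, y = pt
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def join(trips, rects, use_prefilter):
    """Return (pairs, candidates, hits) for the trips x regions cross-join."""
    trip_cells = {tid: {cell_of(x, y) for x, y in pts} for tid, pts in trips.items()}
    pairs, candidates, hits = 0, 0, set()
    for rid, rect in rects.items():
        cover = cover_rect(*rect)
        for tid, pts in trips.items():
            pairs += 1
            # Cheap cell-set test first, like "prefilter AND exact predicate" in SQL.
            if use_prefilter and not (cover & trip_cells[tid]):
                continue
            candidates += 1
            if any(inside(p, rect) for p in pts):
                hits.add((rid, tid))
    return pairs, candidates, hits


rng = random.Random(7)
trips = {t: [(rng.uniform(0, 50), rng.uniform(0, 50)) for _ in range(20)] for t in range(200)}
rects = {}
for r in range(30):
    x, y = rng.uniform(0, 45), rng.uniform(0, 45)
    rects[r] = (x, y, x + rng.uniform(0.5, 5), y + rng.uniform(0.5, 5))

pairs, cands, hits = join(trips, rects, use_prefilter=True)
_, _, exact = join(trips, rects, use_prefilter=False)
print(pairs, cands, len(hits), hits == exact)  # last value is True
```

If `cover_rect` kept only cells whose centre falls inside the rectangle, points near the rectangle's edge could be dropped by the prefilter; that is the failure mode a superset coverage rules out.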
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request on May 11, 2026
Extends `BerlinMOD/benchmarks/` from the chapter-1 subset to the full 17-query suite, plus a beta testing harness for privileged testers across all three platforms. New files:

- `MobilityDB_rqueries_2026-05-11.md`: 17 R-queries × index matrix (none / GiST(trip + trajectory) / SP-GiST(trip + trajectory)) on the bench-driving MobilityDB build. Total runtimes: 569 / 348 / 340 s respectively. Per-query highlights: Q14 48×, Q13 8.7×, Q10 6.2× under GiST. All three configurations return identical row counts.
- `CrossPlatform_rqueries_readiness_2026-05-11.md`: sibling readiness document for replicating the same matrix on MobilityDuck and MobilitySpark. Inventories the 12 MEOS temporal functions and 4 PostGIS functions used by the R-queries, marks the gap items per platform (`tDwithin` and `whenTrue` on MobilityDuck; `whenTrue` still to verify on MobilitySpark), and lays out the sequencing.
- `BETA_TESTING.md`: tester recipe and report-back template. Lists the four portable query files, the per-query expected row counts, and the per-platform invocation. Single entry point for privileged testers across MobilityDB / MobilityDuck / MobilitySpark.
- `run_full_bench.sh`: reproduce script for the 17-query matrix.

README.md is updated to index the new reports and the beta testing harness. The portable SQL files the reports reference live on `MobilityDB-BerlinMOD` PR #24 (sibling PR on the same repo).
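The speedups quoted in these reports are plain baseline-to-indexed runtime ratios, and the prefilter funnel is a chain of pass rates. The snippet below recomputes them from the figures stated in the two commit messages; the helper and variable names are illustrative only.

```python
def speedup(baseline, indexed):
    """Runtime ratio of the unindexed baseline to an indexed configuration."""
    return baseline / indexed


# Chapter-1 Q1: no index vs SP-GiST(trip), in milliseconds.
q1_spgist = speedup(5951, 1927)

# 17 R-queries, total seconds: none / GiST(trip + trajectory) / SP-GiST(trip + trajectory).
total_gist = speedup(569, 348)
total_spgist = speedup(569, 340)

# Prefilter funnel on the chapter-1 cross-join: pairs -> h3 candidates -> true hits.
pairs, candidates, hits = 162_000, 55_720, 3_836
prefilter_pass = candidates / pairs  # share of pairs surviving the h3 test
exact_pass = hits / candidates       # share of candidates that are true hits

print(f"Q1 SP-GiST: {q1_spgist:.2f}x")                                 # Q1 SP-GiST: 3.09x
print(f"totals: GiST {total_gist:.2f}x, SP-GiST {total_spgist:.2f}x")  # totals: GiST 1.64x, SP-GiST 1.67x
print(f"prefilter keeps {prefilter_pass:.1%}, of which {exact_pass:.1%} are hits")
# prefilter keeps 34.4%, of which 6.9% are hits
```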
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request on May 11, 2026
Two open issues prevent the Spark side of the bench from completing
the 17 R-queries today. Both are documented in the cross-platform
timings doc so reviewers and beta testers see the gap shape rather
than missing numbers.
1. GEOS context init crash on the first spatial UDF call
(`libgeos_c.so` SEGV with `context handle is uninitialized, call
initGEOS`). Affects Q2..Q17 — every query that uses a spatial
UDF. Q1 and QRT (relational only) complete. No open PR yet.
2. `UNRESOLVED_ROUTINE` on `everEqH3IndexTh3Index` and
`everIntersectsH3IndexSet_Th3Index` — these h3 UDFs are referenced
by the as-shipped Spark q02/q04/q05/q06/q10 but only registered on
MobilitySpark PR #9 (`Th3IndexUDFs.java`). That PR's source has JMEOS API drift
that prevents a clean rebuild against the current JMEOS jar.
According to a parallel session, the h3-related PRs (MobilityDB #807,
#866, #893, #938; MobilitySpark #9; MobilityDB-BerlinMOD #24)
are being consolidated into a single multi-commit PR. Once that PR
is issued, `feedback_issued_pr_treat_as_landed.md` permits using
the consolidated UDFs for downstream work.
The Spark column in the side-by-side table now reads `blocked (GEOS)` or
`blocked (GEOS + h3 PR)` per query, and the Total row marks Spark as `n/a`
until both blockers are resolved.
Bare-name portable variants of the chapter-1 and 17 R-queries runnable on MobilityDB (PostgreSQL), MobilityDuck (DuckDB), and MobilitySpark (Spark); their th3index-accelerated counterparts plus berlinmod_th3index_setup.sql; the trip_h3 column in the export/load path; and a MobilityDuck schema adapter.
fc51ae5 to 4b30598
estebanzimanyi added a commit to estebanzimanyi/MobilityDB-BerlinMOD that referenced this pull request on Jun 5, 2026
…banner)

Adds `doc/contributing/reviewer-guide.md`, mirroring the canonical reviewer-guide structure used in MobilityDB / MobilityDuck / MobilitySpark / JMEOS, scoped to MobilityDB-BerlinMOD. It uses the same canonical path (`doc/contributing/reviewer-guide.md`) as the other four ecosystem repos, so reviewers landing in any of the five find the same structure at the same place.

- Dependency chain: PR #23 (ecosystem-standards foundation) → PR #24 (extends portability_export with trip_h3; pairs with MobilityDB PR #938 geo_to_h3index_set). Cross-repo dependency on the MobilityDB th3index branch (#807 / #866 / #893) documented per feedback_issued_pr_treat_as_landed.md.
- Tier ranking + per-tier review notes.
- Standards checklist: license header, portable-name convention, CSV portability, loader cross-platform impact.
- Cross-repo links to the other four ecosystem reviewer guides.

Wires visibility:

- `.github/PULL_REQUEST_TEMPLATE.md` links to the guide so contributors are prompted to update it in any commit that opens / closes / restructures a PR.
- README.md gains a 'For contributors and reviewers' section pointing to the guide.
BerlinMOD portability surface — chapter 1 + 17 R-queries + th3index
Cross-platform code PR for the BerlinMOD benchmark. It makes three groups of
changes that together let the same SQL run on MobilityDB, MobilityDuck,
and MobilitySpark, and make every platform return the same row counts.
The companion docs PR is #26, which carries the dated benchmark reports
and the beta testing harness.
What this PR adds
1. Portable SQL files (cross-platform dialect)
- Bare-name portable variants of the chapter-1 and 17 R-queries use named functions (`eIntersects`, `eContains`, `eDwithin`, …) instead of PG-specific infix operators (`&&`, `@>`, `<->`). Each platform parses them identically.
- Their th3index-accelerated counterparts add an h3 prefilter (`everIntersectsH3IndexSet_Th3Index(geoToH3IndexSet(G, 7), T.trip_h3)`) on every spatial-against-static predicate.
- `mobilityduck_schema_adapter.sql` is a view layer that exposes the canonical R-queries column names (`Trips(TripId, VehicleId, Trip)`, `Licences1`, …) on top of the cross-platform CSV-loaded schema.

2. Deterministic LIMIT-10 parameter views
`berlinmod_load.sql` and the chapter-1 portable files previously used `LIMIT 10` (or `LIMIT 100`) without `ORDER BY`, so the rows selected depended on the underlying table's physical insertion order, which differs across platforms. This PR adds `ORDER BY <PrimaryKey> LIMIT 10` to all six R-queries parameter views (`Licences1`, `Licences2`, `Instants1`, `Periods1`, `Points1`, `Regions1`) and to the four chapter-1 views.

Result: 17/17 R-queries return identical row counts on PostgreSQL and DuckDB from the same generated CSV files.
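The nondeterminism fixed above is easy to reproduce with any SQL engine. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL and DuckDB; the `licences` table and its columns are hypothetical, not the BerlinMOD schema. The same 20 rows are loaded in two physical orders, and only the `ORDER BY` query is guaranteed to pick the same 10 rows from both.

```python
import sqlite3

# Hypothetical parameter source: 20 (vehicle_id, licence) rows.
ROWS = [(i, f"B-{i:04d}") for i in range(1, 21)]


def load(rows):
    """Create an in-memory table whose rows are inserted in the given order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE licences (vehicle_id INTEGER, licence TEXT)")
    con.executemany("INSERT INTO licences VALUES (?, ?)", rows)
    return con


def sample(con, ordered):
    """Pick 10 rows, optionally pinned by ORDER BY on the key column."""
    sql = "SELECT vehicle_id, licence FROM licences"
    if ordered:
        sql += " ORDER BY vehicle_id"
    return con.execute(sql + " LIMIT 10").fetchall()


platform_a = load(ROWS)                  # one physical insertion order
platform_b = load(list(reversed(ROWS)))  # another platform, another order

# Without ORDER BY the result follows scan order, so the samples can differ.
print(sample(platform_a, False) == sample(platform_b, False))
# With ORDER BY both must return the 10 smallest keys.
print(sample(platform_a, True) == sample(platform_b, True))  # True
```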
3. `trip_h3` column in the cross-platform CSV export
`berlinmod_portability_export()` writes `trip_h3` (a th3index as hex-encoded WKB at H3 resolution 7) alongside `trip`. All three platforms consume the column directly for the h3 prefilter variant of the benchmark.
Row counts after the fix (BerlinMOD sf 0.005)
Identical on PostgreSQL, DuckDB, and Spark.
Coordinated PRs
- `geoToH3IndexSet` + sound polygon coverage: feat(h3): static-geometry → H3 cell set public API (POINT/LINESTRING/POLYGON/MULTI*/GeometryCollection) MobilityDB#938
- fix(lifting): demote LINEAR to STEP when result type is not continuous MobilityDB#940
- Add the dated three-platform BerlinMOD benchmark reports #26
Test plan
- `SELECT berlinmod_R_queries(1, false)` returns the row counts above.
- `psql -f berlinmod_r_queries_portable.sql` returns the same row counts.
- `mobilityduck_schema_adapter.sql` returns the same row counts.
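The test plan reduces to one check: every platform must report the same per-query row count. A minimal comparison sketch in Python follows; the dictionary layout is an assumption, the counts in the usage lines are made-up placeholders (not BerlinMOD results), and a query a platform could not run (such as a blocked Spark query) is modelled as a missing entry.

```python
from typing import Dict, List, Optional, Tuple

Counts = Dict[str, int]
Mismatch = Tuple[str, int, Optional[int]]


def compare_row_counts(expected: Counts, observed: Dict[str, Counts]) -> Dict[str, List[Mismatch]]:
    """Return {platform: [(query, expected, got), ...]} for every disagreement.

    A query missing from a platform's results is reported with got=None.
    """
    report: Dict[str, List[Mismatch]] = {}
    for platform, counts in observed.items():
        bad = [(q, want, counts.get(q)) for q, want in expected.items() if counts.get(q) != want]
        if bad:
            report[platform] = bad
    return report


# Placeholder numbers for illustration only, not measured row counts.
expected = {"Q1": 10, "Q2": 4, "Q3": 7}
observed = {
    "postgresql": {"Q1": 10, "Q2": 4, "Q3": 7},
    "duckdb": {"Q1": 10, "Q2": 4, "Q3": 7},
    "spark": {"Q1": 10},  # Q2 and Q3 not run
}
print(compare_row_counts(expected, observed))
# {'spark': [('Q2', 4, None), ('Q3', 7, None)]}
```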