
Add the dated three-platform BerlinMOD benchmark reports #26

Closed
estebanzimanyi wants to merge 31 commits into MobilityDB:master from estebanzimanyi:doc/benchmarks-th3index

estebanzimanyi commented May 11, 2026

Member

Adds the dated BerlinMOD benchmark reports for the three ecosystem platforms (MobilityDB on PostgreSQL, MobilityDuck on DuckDB, MobilitySpark on Spark). This PR is the companion of the portable-SQL code in #24. It carries:

- the chapter-1 R-query result matrix at scalefactor 0.005 across the GiST and SP-GiST index configurations;
- the deterministic LIMIT-10 parameter views that make all 17 R-queries return identical row counts on the three platforms from the same generated CSV;
- the cross-platform minDistance Q5 timing comparison, with validated same-workload numbers in which MobilityDB and MobilitySpark land within roughly one percent of each other on the 665-row canonical query.

Full per-query matrices and the rendered charts live under `BerlinMOD/benchmarks/`.

Adds `BerlinMOD/benchmarks/` for dated benchmark reports against the
BerlinMOD chapter-1 query set on each ecosystem platform.

Initial entries:

- `README.md` — index and `<Platform>_<scope>_<topic>_<YYYY-MM-DD>.md`
  naming convention.

- `MobilityDB_chapter1_th3index_2026-05-11.md` — index-matrix
  measurements (none / GiST(trip) / SP-GiST(trip) / GiST(trip_h3) /
  combinations) for Q1, Q2, Q4, Q6.  Headline result: SP-GiST(trip)
  on Q1 at 3.09× speedup over baseline (5951 ms → 1927 ms);
  SP-GiST(trip) + GiST(trip_h3) + h3 prefilter at 3.16×.  All
  configurations return matching row counts — the h3 prefilter is
  sound.  Documents the polygon-coverage soundness contract, the
  selectivity of the prefilter (cross-join 162 000 → 55 720 → 3 836
  true hits), and the index recommendations for the four query
  shapes.

- `CrossPlatform_th3index_readiness_2026-05-11.md` — inventory of
  what's needed to replicate the bench on MobilityDuck (~4–5
  person-days; th3index registration, h3indexset surface, zone-map
  pushdown verification) and MobilitySpark (~1.5 person-days
  after JMEOS regen; PR MobilityDB#9 carries the UDFs).

- `run_bench.sh` — reproduce script with the exact query and
  index-configuration matrix.
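
The selectivity and speedup figures quoted for the h3 prefilter in the list above can be turned into rates with a few lines of arithmetic. This is an illustrative sketch only: the variable names are made up here, and the inputs are the counts and timings quoted in the `MobilityDB_chapter1_th3index_2026-05-11.md` entry.

```python
# Counts quoted for the h3 prefilter (variable names are illustrative).
cross_join_pairs = 162_000  # candidate pairs before any filtering
prefilter_pairs = 55_720    # pairs that survive the h3 prefilter
true_hits = 3_836           # pairs that satisfy the exact predicate

# Share of the cross-join that the prefilter lets through.
prefilter_pass_rate = prefilter_pairs / cross_join_pairs
# Share of prefilter survivors that turn out to be true hits.
prefilter_precision = true_hits / prefilter_pairs

# Headline Q1 timing: baseline versus SP-GiST(trip), in milliseconds.
baseline_ms, spgist_ms = 5951, 1927
speedup = baseline_ms / spgist_ms

print(f"prefilter pass rate: {prefilter_pass_rate:.1%}")
print(f"prefilter precision: {prefilter_precision:.1%}")
print(f"Q1 speedup: {speedup:.2f}x")
```

A sound prefilter may keep false positives but must never drop a true hit, which is why matching row counts across all configurations are reported as the soundness check.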

## Coordinated PRs

- MobilityDB **PR #938** — `geoToH3IndexSet` + the
  `everIntersectsH3IndexSet_Th3Index` prefilter (open; the polygon
  walker on this PR returns the sound cell-set the bench consumes).
- MobilityDB **PR #940** — lift framework helper that demotes LINEAR
  to STEP for STEP-only result types (open).
- MobilityDB-BerlinMOD **PR MobilityDB#24** — the shared CSV carries the
  `trip_h3` column and a th3index-variant chapter-1 SQL file.

estebanzimanyi force-pushed the doc/benchmarks-th3index branch from 44d746f to b3a597b on May 11, 2026 at 08:23

Extends `BerlinMOD/benchmarks/` from the chapter-1 subset to the full
17-query suite plus a beta testing harness for privileged testers
across all three platforms.

New files:

- `MobilityDB_rqueries_2026-05-11.md` — 17 R-queries × index matrix
  (none / GiST(trip + trajectory) / SP-GiST(trip + trajectory)) on
  the bench-driving MobilityDB build.  Total runtimes: 569 / 348 /
  340 s respectively.  Per-query highlights: Q14 48×, Q13 8.7×,
  Q10 6.2× under GiST.  All three configurations return identical
  row counts.

- `CrossPlatform_rqueries_readiness_2026-05-11.md` — sibling
  readiness document for replicating the same matrix on
  MobilityDuck and MobilitySpark.  Inventories the 12 MEOS
  temporal functions and 4 PostGIS functions used by the
  R-queries, marks the gap items per platform (`tDwithin`,
  `whenTrue` on MobilityDuck; `whenTrue` verify on MobilitySpark),
  and lays out the sequencing.

- `BETA_TESTING.md` — tester recipe and report-back template.
  Lists the four portable query files, the per-query expected row
  counts, and the per-platform invocation.  Single entry point
  for privileged testers across MobilityDB / MobilityDuck /
  MobilitySpark.

- `run_full_bench.sh` — reproduce script for the 17-query matrix.

README.md is updated to index the new reports and the beta testing
harness.

The portable SQL files the reports reference live on
`MobilityDB-BerlinMOD` PR MobilityDB#24 (sibling PR on the same repo).

Audit `MobilityDuck/src/` registration against the 12 MEOS temporal
functions and 4 PostGIS spatial functions used by the 17 R-queries.
All required UDFs are registered in the current MobilityDuck build:
`atTime`, `atValues`, `valueAtTimestamp`, `trajectory`, `length`,
`startTimestamp`, `stbox`, `eDwithin`, `tDwithin`, `whenTrue`,
`expandSpace`, `aDisjoint`, plus the PostGIS surface via the DuckDB
`spatial` extension and `&&` as a registered named scalar function.

End-to-end validation: Q4 returns 80 rows on MobilityDuck against
the cross-platform CSV-loaded bench DB, matching the PG-native Q4
exactly.  Q10 (which uses both `tDwithin` and `whenTrue`) executes
end-to-end and returns 21 rows; the row count differs from PG-native
(4) because the cross-platform CSV-loaded `Trips` groups rows at
trip granularity while `berlinmod_load.sql` splits per
`(vehicleid, startdate, seqno)`.  This is a data-loading layout
difference, not a function gap.

This audit supersedes the MobilityDuck function-gap entries in
`CrossPlatform_rqueries_readiness_2026-05-11.md`:

- "Register `tDwithin(tgeompoint, tgeompoint, float)`" — already
  registered.
- "Register `whenTrue(tbool)`" — already registered.

Updated estimate for MobilityDuck beta-readiness on the standard
R-queries: 1–1.5 person-days (data-loading alignment + bench
driver), down from the original ~4–5 person-days that assumed
function-registration work.

The th3index prefilter variant remains separate scope (~4–5
person-days for the h3 port).

Companion to the MobilityDuck audit committed prior.  Confirms by
direct repo grep that every MEOS temporal function and PostGIS
spatial function used by the 17 R-queries is already registered via
`spark.udf().register(...)` on the `MobilitySpark-parity` mainline.

Specifically the two UDFs flagged in the cross-platform readiness
doc as "to register" are already present:

- `tDwithin(tgeompoint, tgeompoint, double)` — DistanceUDFs.java
  (2 overloads).
- `whenTrue(tbool)` — TemporalUDFs.java.

The th3index/h3 UDFs needed by the h3 prefilter variant
(`tgeompointToTh3Index`, `geoToH3IndexSet`,
`everIntersectsH3IndexSetTh3Index`, plus the three `everEq*`
overloads) are on MobilitySpark PR MobilityDB#9 (`Th3IndexUDFs.java`),
which is CI-blocked on the JMEOS regen against latest MEOS.  They are
NOT required by the standard R-queries portable file.

Updated estimate for MobilitySpark beta-readiness on the standard
R-queries: ~0.5 person-day (extend `BerlinMODBench.java` to
dispatch all 17 queries via the portable SQL).  Function
registration is complete; JMEOS regen is only on the th3index
variant path.

Combined three-platform status summary added at the foot of the
audit file:

| Platform | Standard R-queries | th3index variant |
|---|---|---|
| MobilityDB | Bench published (PR MobilityDB#26) | h3 prefilter pushed (PR #938) |
| MobilityDuck | 0 functions missing | h3 port not started |
| MobilitySpark | 0 functions missing | PR MobilityDB#9 open, CI-blocked |

Beta testers can run the standard R-queries portable file on all
three platforms today.
…nable

Captures the audit + execution check against the 17 R-queries on
MobilityDB, MobilityDuck, and MobilitySpark.

Each platform runs the standard R-queries end-to-end today:

| Platform | Driver | Files |
|---|---|---|
| MobilityDB | `SELECT berlinmod_R_queries(1, false)` | `BerlinMOD/berlinmod_r_queries.sql` |
| MobilityDuck | `duckdb <db>` + adapter | `BerlinMOD/mobilityduck_schema_adapter.sql` + `berlinmod_r_queries_portable.sql` |
| MobilitySpark | `BerlinMODBench <dir> <out.json> <runs>` | `MobilitySpark-parity/berlinmod/q01.sql … q17.sql` |

Row-count parity on MobilityDuck: 10 of 17 queries return the PG
canonical row counts identically.  The remaining 7 (Q3, Q5, Q8,
Q10, Q13, Q14, Q16) differ in row counts because the cross-platform
CSV-loaded `Trips` table groups rows at trip granularity while the
PG canonical splits per `(vehicleid, startdate, seqno)`.  Both
layouts are valid; the convergence is open work (the document
lists two options for closing the gap).

MobilitySpark consumes the cross-platform layout natively and will
track the MobilityDuck column.

The h3 prefilter variant is platform-gated (MobilityDuck — h3 port
not started; MobilitySpark — PR MobilityDB#9 CI-blocked on JMEOS regen).

Beta testers can launch on all three platforms today with the
standard R-queries.

After the `ORDER BY` fix on the LIMIT-10 parameter views
(`berlinmod_load.sql`), all 17 R-queries return the same row counts
on PostgreSQL, MobilityDuck, and Spark when consuming the same
generated CSV files.
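
To illustrate why the `ORDER BY` matters for a `LIMIT` view, here is a minimal sketch using Python's built-in `sqlite3` module rather than any of the three bench engines. The table and view names (`licences`, `licences_first2`) and the sample values are hypothetical; the actual parameter views live in `berlinmod_load.sql` and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licences (licence TEXT)")

# Rows arrive in an arbitrary order, as they might from a CSV loader.
conn.executemany(
    "INSERT INTO licences VALUES (?)",
    [("B-ZZ 9",), ("B-AA 1",), ("B-MM 5",), ("B-CC 3",)],
)

# Without ORDER BY, which rows a LIMIT keeps depends on the engine's scan
# order. With ORDER BY, the selected subset is fixed by the data alone.
conn.execute(
    "CREATE VIEW licences_first2 AS "
    "SELECT licence FROM licences ORDER BY licence LIMIT 2"
)

rows = sorted(r[0] for r in conn.execute("SELECT licence FROM licences_first2"))
print(rows)
```

Because the subset no longer depends on physical row order, any engine that loads the same CSV should select the same parameter rows, which is the property the row-count parity relies on.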

Updates:

- `BETA_TESTING.md` — reference row counts updated to the
  deterministic values: Q1:72 Q2:1 Q3:6 Q4:80 Q5:100 Q6:0 Q7:26
  Q8:75 Q9:94 Q10:21 Q11:0 Q12:0 Q13:278 Q14:1 Q15:118 Q16:2 Q17:1.
- `ThreePlatform_beta_status_2026-05-11.md` — per-query parity
  matrix simplified to a single column; the previous "✅/❌ per
  query" table is no longer needed.  Open-work list narrowed to
  the h3 prefilter variant (which is gated by separate work on
  MobilityDuck and MobilitySpark).
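
A tester could compare observed row counts against the reference values above with a small script. This is a sketch under stated assumptions: the helper names `parse_counts` and `mismatches` are hypothetical and not part of the bench tooling, and the "observed" values are invented for the demonstration (the Q10 value of 4 mirrors the earlier PG-native count mentioned for the non-aligned layout).

```python
# Reference row counts as listed for BETA_TESTING.md.
REFERENCE = (
    "Q1:72 Q2:1 Q3:6 Q4:80 Q5:100 Q6:0 Q7:26 Q8:75 Q9:94 "
    "Q10:21 Q11:0 Q12:0 Q13:278 Q14:1 Q15:118 Q16:2 Q17:1"
)

def parse_counts(text: str) -> dict[str, int]:
    """Turn 'Q1:72 Q2:1 ...' into {'Q1': 72, 'Q2': 1, ...}."""
    return {q: int(n) for q, n in (item.split(":") for item in text.split())}

def mismatches(observed: dict[str, int], reference: dict[str, int]) -> dict:
    """Return {query: (observed, expected)} for every query that differs."""
    return {
        q: (observed.get(q), expected)
        for q, expected in reference.items()
        if observed.get(q) != expected
    }

reference = parse_counts(REFERENCE)

# Hypothetical tester output: identical to the reference except for Q10.
observed = dict(reference)
observed["Q10"] = 4

print(mismatches(observed, reference))
```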
Bench rerun with the ORDER-BY-deterministic LIMIT-10 views on
`berlinmod_h3bench`.  Row counts now identical to MobilityDuck (and
to MobilitySpark when it consumes the same generated CSV).

Result matrix (seconds, single run per cell):

| Config | Total |
|---|---|
| none | 334.30 |
| GiST(trip+traj) | 173.23 |
| SP-GiST(trip+traj) | 177.04 |

Per-query highlights (GiST over baseline):

  Q14: 51× (ST_Contains on valueAtTimestamp)
  Q10: 8.0× (trip×trip tDwithin)
  Q15: 8.0× (trajectory × point × period)
  Q13: 6.1× (trajectory × region × period)
  Q9 : 3.1× (atTime + length)

SP-GiST is within run-to-run noise of GiST on the total; the two trade
wins per query (SP-GiST is better on Q4 / Q6 / Q17, slower on Q1).
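
For orientation, the totals in the result matrix translate into overall speedups as follows. This is plain arithmetic on the reported totals; since each cell is a single run, it gives a relative gap between GiST and SP-GiST but no noise estimate.

```python
# Total runtimes in seconds, as reported in the result matrix.
totals = {
    "none": 334.30,
    "GiST(trip+traj)": 173.23,
    "SP-GiST(trip+traj)": 177.04,
}

baseline = totals["none"]
# Speedup of each indexed configuration over the unindexed baseline.
speedups = {
    config: baseline / seconds
    for config, seconds in totals.items()
    if config != "none"
}

# Relative gap between the two indexed totals, measured against GiST.
gist = totals["GiST(trip+traj)"]
spgist = totals["SP-GiST(trip+traj)"]
relative_gap = (spgist - gist) / gist

for config, factor in speedups.items():
    print(f"{config}: {factor:.2f}x over baseline")
print(f"SP-GiST total is {relative_gap:.1%} above the GiST total")
```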
Adds a top-level "Benchmark results" section at the start of the
README so a visitor landing on the repo home page immediately sees
where the bench documentation lives.  Links the directory README and
the three high-value entry points (tester guide, three-platform
status, MobilityDB matrix), plus the headline number for quick
orientation.

Until this branch merges, the same files are visible via PR MobilityDB#26's
"Files changed" tab.
…0.005)

Adds `CrossPlatform_timings_2026-05-11.md` with three Mermaid
`xychart-beta` bar charts (one per platform) and a side-by-side
table for the 17 R-queries at scalefactor 0.005.

Numbers captured locally on this machine.  Row counts identical
across the three platforms (the deterministic ORDER BY fix on the
LIMIT-10 parameter views guarantees this).

Timings in seconds:

| Query | MobilityDB on PostgreSQL 17.8, GiST(trip + trajectory) | MobilityDuck on DuckDB, zone-map filtering |
|---|---|---|
| Q1 | 0.78 | 0.01 |
| Q2 | 0.15 | 0.00 |
| Q3 | 5.70 | 0.41 |
| Q4 | 15.19 | 0.79 |
| Q5 | 80.61 | 81.34 |
| Q6 | 4.23 | 0.31 |
| Q7 | 9.24 | 0.68 |
| Q8 | 1.18 | 0.14 |
| Q9 | 9.81 | 6.19 |
| Q10 | 6.46 | 6.24 |
| Q11 | 2.31 | 0.62 |
| Q12 | 2.37 | 0.65 |
| Q13 | 4.55 | 7.54 |
| Q14 | 0.44 | 0.54 |
| Q15 | 4.13 | 7.49 |
| Q16 | 16.35 | 3.28 |
| Q17 | 9.74 | 0.70 |
| Total | 173.23 | 125.12 |

MobilitySpark — partial; refresh in progress.

Repo-root `INDEX.md` (in the persistent local worktree, not part
of this PR) embeds the same charts inline so the local view has
the comparison without having to navigate into the directory.

Two open issues prevent the Spark side of the bench from completing
the 17 R-queries today.  Both are documented in the cross-platform
timings doc so reviewers and beta testers see the gap shape rather
than missing numbers.

1.  GEOS context init crash on the first spatial UDF call
    (`libgeos_c.so` SEGV with `context handle is uninitialized, call
    initGEOS`).  Affects Q2..Q17 — every query that uses a spatial
    UDF.  Q1 and QRT (relational only) complete.  No open PR yet.

2.  `UNRESOLVED_ROUTINE` on `everEqH3IndexTh3Index` and
    `everIntersectsH3IndexSet_Th3Index` — these h3 UDFs are referenced
    by the as-shipped Spark q02/q04/q05/q06/q10 but only registered on
    PR MobilityDB#9 (`Th3IndexUDFs.java`).  PR MobilityDB#9's source has JMEOS API drift
    that prevents a clean rebuild against the current JMEOS jar.
    Per a parallel session: the h3-related MobilityDB PRs (#807,
    #866, #893, #938, MobilitySpark MobilityDB#9, MobilityDB-BerlinMOD MobilityDB#24)
    are being consolidated into a single multi-commit PR.  Once that
    is issued, `feedback_issued_pr_treat_as_landed.md` permits using
    the consolidated UDFs for downstream work.

Spark column in the side-by-side table is now `blocked (GEOS)` or
`blocked (GEOS + h3 PR)` per query.  Total row marks Spark as `n/a`
until both blockers resolve.

MobilitySpark now runs the 17 R-queries on `--master local[4]` with a
per-thread GEOS context (MobilityDB#949) on top of the lwgeom WKT/GMT TLS
foundation (MobilityDB#815).  `ThreePlatform_beta_status_2026-05-12`
records the unblocked state across all three platforms;
`CrossPlatform_timings_2026-05-12` carries the per-query timings.

Q5 is the only outstanding gap on MobilitySpark — a pre-existing
geo_from_text parse path crashes the JVM, separate from this beta.

The bare-name nearestApproachDistance UDF on MobilitySpark previously
resolved to a tgeo × geometry overload, which fed the second tgeo's
hex-WKB to geo_from_text and aborted the JVM on parse failure.

Fixes:
- MobilitySpark commit 73887f1: keep the tgeo × tgeo registration of
  nearestApproachDistance under the bare SQL name.
- MobilityDB commit b6bf3f6d6 (on PR #949): geo_from_text / geog_in
  return NULL on WKT parse failure instead of dereferencing the failed
  parser result.

Q5 timings: MobilityDB 80.6 s, MobilityDuck 81.3 s, MobilitySpark
local[4] 508.4 s (synchronous-NAD cross-join cost dominates).

BETA_TESTING.md no longer reports Q5 as skipped on Spark.
…text

State the current capabilities and the per-query numbers; omit PR
references, commit SHAs, "previously blocked / now runs" narrative,
and "underlying fixes" sections.
…esults

Replace the placeholder section with measured timings. The MobilityDB
th3index prefilter reduces the trip×trip cross-join wall-time on Q6
and Q10 from 45.41 s to 1.88 s at sf 0.005 (24x).  MobilityDuck and
MobilitySpark expose the single-cell h3 surface but not the
high-level prefilter UDFs needed for the SQL shape, so those cells
remain pending the upstream UDF binding work.

Define the per-query time budget as max(20 x slowest other platform,
30 min) and render exceedances as a hatched ">cap" bar at the 30-min
ceiling, distinct from "n/a" (query shape not defined on that
platform). Replace the implementation-detail framing on the
MobilitySpark Q10-Q17 cells with a plain "pending" marker.
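
The budget rule can be sketched as a pair of small functions. This is one reading of the definition above, not the chart script itself: the function names are hypothetical, using `None` to stand for a query shape that is not defined on a platform is an assumption, and the sample inputs are the Q5 timings reported for the three platforms.

```python
from typing import Optional

THIRTY_MIN_S = 30 * 60  # the 30-minute term of the budget, in seconds

def time_budget_s(other_platform_times_s: list[float]) -> float:
    """Per-query budget: max(20 x slowest other platform, 30 min)."""
    slowest_other = max(other_platform_times_s)
    return max(20 * slowest_other, THIRTY_MIN_S)

def cell_label(elapsed_s: Optional[float], budget_s: float) -> str:
    """Label for one cell of the timing table."""
    if elapsed_s is None:        # assumption: shape not defined on this platform
        return "n/a"
    if elapsed_s > budget_s:     # exceedance, drawn as the hatched bar
        return ">cap"
    return f"{elapsed_s:.2f}"

# Q5 on MobilitySpark, budgeted against the MobilityDB and MobilityDuck timings.
budget = time_budget_s([80.61, 81.34])
print(budget, cell_label(508.4, budget))
```

With these inputs 20 x 81.34 s is below 30 minutes, so the 30-minute term sets the budget and the 508.4 s Spark run stays within it.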
…ub-matrix

Capture Q1-Q17 timings on MobilitySpark local[4] against the current
SRID-3812 + lift-tpfn + GEOS-reentrant stack so the standard matrix no
longer has pending cells. Q11/Q12/Q14 exceed the 30-min per-query cap
and are rendered as hatched >cap bars; Q16/Q17 needed a load-time SRID
derivation patch in BerlinMODDemo so that geoTimeStbox parses query
WKT against the dataset SRID. Restructure the doc with an introductory
section, an R-query shape categorization (relational / trip x static /
trip x trip / trip x region / aggregated), the MEST mrtree/mquadtree/mkdtree
sub-matrix, and consistent tree-family naming (R-tree, quadtree,
k-d tree) instead of bare GiST/SP-GiST. Mark the MobilityDuck column as
no-index, since the loader does not build a TRTREE or DuckDB spatial
RTREE today. Extend run_full_bench.sh with mest_mrtree_N, mest_mquadtree_N,
mest_mkdtree_N configs.
…TRTREE status

Q5 profiling shows that >99% of its 100 s wall time is spent in 100
ST_Distance(MultiLineString, MultiLineString) calls; the aggregate
itself takes 47 ms. A naive min-of-pairs SQL rewrite is 2.5x slower
because GEOS internal indexing on one big call beats 14,400 small
calls, and using the materialised trajectory column takes the same
time as calling trajectory(Trip). The proper optimisation is a
MEOS-side fused aggregate with an STBox bbox prefilter, which is out
of scope here.

MEST on the trip x region shape: Q13 sees a clean 2.6x speedup over
R-tree (1.77 s vs 4.55 s) and 9x over th3index (15.89 s). Q14 is too
cheap to differentiate. Q16 has the trip x trip x region triple
cross-join that hurts MEST's per-trip-decomposition entry count.

The MobilityDuck-indexed bench row is blocked upstream: CREATE INDEX
... USING TRTREE crashes with a DuckDB internal-error assertion on
any table, including a 2-row test fixture. DuckDB Spatial's RTREE on
GEOMETRY cannot be used here because the portable BerlinMOD R-queries
predicate on tgeompoint, not on a derived geometry column.

Q5's exact form remains the bench reference. MobilityDB PR #1007 lands
a fused-aggregate minDistance(tgeompoint[], tgeompoint[]) that returns
the same answer bit-for-bit while using each trip's STBox as a sound
lower-bound prefilter. At BerlinMOD-Brussels sf 0.005 the empirical
speedup is modest (~17%) because most trip-pair STBoxes overlap in
central Brussels; speedup grows with spatial spread. Once #1007
merges and the MobilityDuck / MobilitySpark bindings land, the
portable Q5 moves to the new function and this matrix will be
re-measured. Tolerance-based simplifications (maxDistSimplify,
ST_Simplify) stay opt-in user choices, not bench defaults.
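
The idea of a sound lower-bound prefilter can be sketched in a few lines. This is a simplified illustration, not the MEOS implementation: it ignores the temporal dimension, treats each trip as a list of 2D points, and uses hypothetical function names. The point it demonstrates is that skipping a pair whose bounding-box distance is already no smaller than the best exact distance found so far cannot change the minimum.

```python
from itertools import product
from math import hypot, inf

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def bbox(points: list[Point]) -> Box:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_distance(a: Box, b: Box) -> float:
    """Distance between two boxes: a lower bound on any point-pair distance."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def exact_distance(p: list[Point], q: list[Point]) -> float:
    return min(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in product(p, q))

def min_distance(group_a: list[list[Point]], group_b: list[list[Point]]):
    """Minimum distance over all trip pairs, plus the number of exact calls."""
    best, exact_calls = inf, 0
    boxes_b = [bbox(trip) for trip in group_b]
    for trip_a in group_a:
        box_a = bbox(trip_a)
        for trip_b, box_b in zip(group_b, boxes_b):
            # Sound skip: the box distance never exceeds the exact distance.
            if bbox_distance(box_a, box_b) >= best:
                continue
            exact_calls += 1
            best = min(best, exact_distance(trip_a, trip_b))
    return best, exact_calls

group_a = [[(0.0, 0.0), (1.0, 1.0)]]
group_b = [[(2.0, 1.0), (3.0, 3.0)], [(10.0, 10.0), (12.0, 12.0)]]
print(min_distance(group_a, group_b))
```

In this toy input the far-away second trip is pruned without an exact computation. When most boxes overlap, as reported for central Brussels, few pairs are pruned and the gain stays modest.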
Q5 moves to the minDistance(tgeompoint, tgeompoint) aggregate over the licence cross-join with an everEqTh3IndexTh3Index cell-membership prefilter. The single PostgreSQL process runs Q5 in 18.86 s and MobilitySpark runs it in 9.60 s on local[4] (21.56 s on the local[1] single-thread reference); MobilitySpark is faster because it parallelises the licence cross-join across worker threads while running the same MEOS kernel and prefilter.

The MobilityDuck Q5 cell keeps the prior 81.34 s value and is marked not re-run because of the upstream DuckDB v1.4.4 icu autoload outage on amd64. The mermaid xychart bars and the matplotlib SVGs are regenerated for Q5 only; the other queries are untouched since they were not re-run.

The Q5 cardinality is now stated as a function of the licence self-join structure of this dataset: query_licences has 100 rows but 72 distinct licence strings, so the self-join admits 3019 distinct licence-string pairs before the prefilter and MobilitySpark returns 665 surviving groups.

The prior Q5 figure of 18.86 s came from a non-comparable run using
hand-made ten-row licence views on berlinmod_h3bench, a different
workload than the Spark leg. Replace it with the validated canonical
portable Q5 measured on the same bench CSV (1620 trips, 141 vehicles,
sf 0.005, th3index ever_eq prefilter): MobilityDB single PostgreSQL
process 9.50 s (median of 10.33 / 9.39 / 9.50) and MobilitySpark local[4]
9.60 s (median of 11.234 / 9.598 / 9.192), with the local[1] single-thread
reference at 21.56 s. Both engines return 665 surviving licence groups;
this exact row-count parity serves as the correctness cross-check. Reframe the
narrative as a diagnostic of the same shared MEOS minDistance kernel at
different degrees of parallelism rather than a speedup multiple over the
old ST_Distance(ST_Collect(...)) baseline. Update every Q5 cell, the
mermaid bar, the render_bench_chart.py source of truth, and regenerate
both cross-platform SVGs. MobilityDuck Q5 is kept at its prior 81.34 s
annotated as not re-run because of the upstream DuckDB v1.4.4 icu
autoload outage.
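
The reported medians and the "within roughly one percent" figure can be reproduced from the listed timings. The snippet below is a small arithmetic check using Python's standard library; the dictionary keys are labels chosen here for readability.

```python
from statistics import median

# Q5 timings in seconds, as listed above.
runs = {
    "MobilityDB (single PostgreSQL process)": [10.33, 9.39, 9.50],
    "MobilitySpark local[4]": [11.234, 9.598, 9.192],
}

medians = {engine: median(times) for engine, times in runs.items()}

mobilitydb = medians["MobilityDB (single PostgreSQL process)"]
spark = medians["MobilitySpark local[4]"]
# Gap between the two medians, relative to the MobilityDB median.
relative_gap = abs(spark - mobilitydb) / mobilitydb

for engine, value in medians.items():
    print(f"{engine}: {value:.2f} s")
print(f"relative gap: {relative_gap:.1%}")
```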

estebanzimanyi changed the title from "doc(benchmarks): BerlinMOD chapter 1 th3index + GiST/SP-GiST bench report" to "Add the dated three-platform BerlinMOD benchmark reports" on May 16, 2026

@estebanzimanyi

Member Author

Superseded by #29, which restructures these same 18 benchmark-report files around what each measurement licenses. The older blended layout here is incompatible with that structure, and #29 covers the identical file set — closing this in its favour.

estebanzimanyi added a commit that referenced this pull request Jun 5, 2026
doc(bench): restructure cross-platform timings by what each measurement licenses (supersedes #26)