feat(spark): JMEOS 1.4 + BerlinMOD Q1-Q17 + 100% MobilityDB SQL parity (907 tests) #5
Open
estebanzimanyi wants to merge 98 commits into
…trators for UDT and UDF.
Period implementation
Satria/poc
… UDTs using Meos Datatypes.
Meos datatype
Timestampset implementation
be94aeb to 4a540a5
4a540a5 to 49e323b
🎉 Complete coverage of the active addressable MobilityDB SQL surface.
907/907 unit tests green. Compare to MobilityDuck 79.3% (current).
Adds ~315 UDFs across 16 new files + extends 12 existing files.
Coverage trajectory: 51% → 100% across the parity push. All 51 active
sections now at 100%.
==== New UDF classes ====
- TPointSTBoxOpsUDFs: 42 cross-type STBox×TPoint positional/topological
- TBoxOpsUDFs: 39 cross-type TBox×TNumber positional/topological
- SpansetOpsUDFs: 23 cross-type Span/Spanset positional/topological
- TemporalCompUDFs: 26 temporal comparison ops (teq/tne/tlt/tle/tgt/tge)
- TemporalBoxOpsUDFs: 30 cross-type box predicates
- AlwaysSpatialRelsUDFs: 12 'always' spatial-relationship predicates
- SetOpsUDFs: set×set positional + topological + per-type distance
- IOAliasUDFs: 100+ typed *From{HexWKB,Binary,Text,EWKT,EWKB,MFJSON} aliases
- SubtypeConstructorUDFs: typed Inst/Seq/SeqSet aliases + accessors
- AccessorAliasUDFs: typed span/spanset width, dates, valueSpan, set-values
arrays, tboxes/stboxes/spans (array-returning), bins, splits, valueSet,
segmentMin/MaxDuration, box2d, box3d (PostGIS embedded in MEOS),
mobilitydbVersion, avgValue, tgeometry/tgeography conversions, quadSplit,
getBin/timestamptzGetBin
- BucketUDFs: floatBucket, intBucket
- GeoAffineUDFs: translate/translate3, rotate, rotateX/Y/Z, transscale, affine
- TileUDFs: complete multi-dimensional tiling for parallel processing —
spaceBoxes / spaceTimeBoxes / valueTimeBoxesT{float,int} / time/value
Boxes/Tiles/Splits, getTimeTile / getSpaceTile / getSpaceTimeTile /
getStboxTimeTile / getValueTile / getValueTimeTile / getTBoxTimeTile,
spaceTiles / spaceTimeTiles / stbox/tint/tfloatTimeTiles, makeSimple
(Temporal** array of simple sub-tpoints), tfloat/tintValueTiles,
tfloat/tintValueSplit (Temporal** with Datum vsize/vorigin via IEEE bits),
tfloat/tintValueTimeSplit, geoMeasure (tpoint+tfloat → geometry),
asMVTGeom (tpoint → array of WKT geometries clipped to STBox bounds)
- SeqSetGapsUDFs: tbool/tint/tfloat/ttext/tgeompoint/tgeogpoint/tgeometry/
tgeographySeqSetGaps (closes long-standing user request from MobilityDB
issue #187 — array-of-instants → tsequenceset_make_gaps with native
TInstant** packing)
==== Extended existing UDF classes ====
- GeoUDFs, DistanceUDFs, GeoAnalyticsUDFs, STBoxUDFs, TBoxUDFs,
SimilarityUDFs, TTextUDFs, TransformUDFs, BoolOpsUDFs, TemporalUDFs,
AccessorUDFs, SpanAlgebraUDFs — see docs/parity-status.md for full per-
section coverage
==== MeosNative.java (new) ====
Supplementary JNR-FFI interface for ~70 MEOS-1.4 symbols not yet in
JMEOS-1.4: nad/nai/shortestline_tgeo_*, {dir}_stbox_tspatial /
_tspatial_stbox, float/int_get_bin, t{float,int}box_expand,
tgeometry/tgeography_in/_from_mfjson, temporal_mem_size, tgeoinst_make,
temporal_before/after_timestamptz, textcat_ttext_*, mobilitydb_version,
intset/bigintset/floatset_value_n out-param accessors, tnumber_avg_value,
tgeo*-to-tgeo* conversions, span_expand/_bins, tnumber/tgeo_split_*_n_*,
tnumber_tboxes / tgeo_stboxes, tpoint_minus_geom / _direction /
_make_simple, temporal_dyntimewarp_path / _frechet_path, tgeo_affine,
temporal_time_bins / tstzspan_bins / t{int,float}_value_bins,
stbox_quad_split, timestamptz_get_bin, stbox_get_space/time/space_time_tile,
tgeo_space/space_time_boxes, tnumber_value_time_boxes (Datum via long),
temporal_time_split / tgeo_space_split / tgeo_space_time_split (Temporal**
+ bin out-params), temporal_values_p + set_make_free + temptype_basetype
(valueSet path), temporal_segm_duration, stbox_to_box3d / _to_gbox +
box3d_out / gbox_out (PostGIS BOX3D/BOX2D embedded in MEOS),
stbox_space/time/space_time_tiles, t{int,float}box_time/value/value_time
_tiles, tnumber_value_split / _value_time_split (Datum splits with IEEE
bit-packed vsize/vorigin), tbox_get_value_time_tile (single-tile lookup
with MeosType basetype/spantype enum dispatch), tpoint_tfloat_to_geomeas,
tpoint_as_mvtgeom, tnumber_to_tbox.
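The "Datum via long" and "IEEE bit-packed vsize/vorigin" notes above can be illustrated with a minimal sketch. It assumes a Datum is carried across the FFI boundary as a 64-bit word and that a float value is passed as its raw IEEE 754 bit pattern; the class and method names (DatumBits, packDouble, unpackDouble) are hypothetical and are not part of MobilitySpark or JMEOS.

```java
// Hypothetical helper, for illustration only: not the MobilitySpark code.
final class DatumBits {
    // Carry a double in a 64-bit word by preserving its exact IEEE 754 bits.
    // Assumption: the native side reinterprets the word as a double.
    static long packDouble(double value) {
        return Double.doubleToRawLongBits(value);
    }

    // Inverse operation: reinterpret the 64-bit word as a double again.
    static double unpackDouble(long datum) {
        return Double.longBitsToDouble(datum);
    }

    public static void main(String[] args) {
        long vsize = packDouble(2.5);   // e.g. a bucket size for a tfloat split
        long vorigin = packDouble(0.0); // e.g. the bucket origin
        System.out.println(Long.toHexString(vsize) + " " + Long.toHexString(vorigin));
        System.out.println(unpackDouble(vsize));
    }
}
```

A plain numeric cast such as (long) 2.5 would truncate the value to 2, which is why the bit pattern rather than the numeric value is passed.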
==== Audit infrastructure ====
scripts/parity-audit.py — regenerable. Match strategy: snake_case →
camelCase, type-prefix stripping, wrapper-style dispatcher recognition,
type-suffix matching. Out-of-scope buckets:
- Section-level: GiST/SPGiST opclasses, set/span/spanset index files,
019_geo_constructors (PG geometric types), 999_oid_cache
- Suffix-level: PG plumbing (_in/_out/_recv/_send, _transfn/_combinefn/
_finalfn/_serialize/_deserialize, _sel/_joinsel/_supportfn/_analyze,
_typmod_in/_out, _cmp/_eq/_ne/_lt/_le/_gt/_ge/_hash/_hash_extended)
- Exact name: range/multirange (PG range types, NOT in MEOS),
create_trip (BerlinMOD generator, PG-only), transform_gk (SECONDO
Gauss-Krüger projection)
Note: box2d/box3d ARE addressable (PostGIS embedded in MEOS).
Deferred families: cbuffer, npoint, pose, rgeo.
docs/parity-status.md — per-section coverage report (regenerable).
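As a rough illustration of the match strategy described above (snake_case → camelCase plus type-suffix matching), here is a minimal sketch in Java. The actual audit is the Python script scripts/parity-audit.py; the class and method names below (ParityNameMatch, snakeToCamel, isCovered) are hypothetical, and type-prefix stripping and dispatcher recognition are left out.

```java
import java.util.Set;

// Hypothetical sketch of two of the matching rules; not the audit script itself.
final class ParityNameMatch {
    // snake_case -> camelCase, e.g. "always_eq" -> "alwaysEq".
    static String snakeToCamel(String snake) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // A SQL function counts as covered if a registered UDF has the same
    // camelCase name, or that name followed by a capitalised type suffix
    // (e.g. "alwaysEq" + "TintInt").
    static boolean isCovered(String sqlName, Set<String> registeredUdfs) {
        String camel = snakeToCamel(sqlName);
        for (String udf : registeredUdfs) {
            if (udf.equals(camel)) {
                return true;
            }
            if (udf.startsWith(camel) && Character.isUpperCase(udf.charAt(camel.length()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCovered("always_eq", Set.of("alwaysEqTintInt")));
    }
}
```

Requiring an upper-case letter after the prefix keeps a name such as alwaysEqual from being counted as a typed variant of always_eq.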
49e323b to
aaaa05e
Compare
… 10 residuals
JMEOS regenerated against MEOS 1.4 amalgamated headers (JMEOS PR MobilityDB#15) exposes ~120 of the symbols previously bound by MobilitySpark's supplementary MeosNative.java JNR-FFI interface. This commit:
* bumps libs/JMEOS-1.4.jar to the regenerated artefact
* migrates ~120 MeosNative.INSTANCE.X callsites to functions.X (or functions.MeosLibrary.meos.X for the long-typed timestamp / out-param functions where the OffsetDateTime wrapper is unwanted)
* trims MeosNative.java from 326 lines / 133 method declarations to 81 lines / 10 declarations. The residuals all live in MEOS private headers (meos_internal.h, meos_internal_geo.h, temporal/temporal.h, temporal/meos_catalog.h) and use Datum / MeosType parameters that the JMEOS generator does not currently lower: mobilitydb_version, mobilitydb_full_version, temporal_values_p, set_make_free, temptype_basetype, temporal_mem_size, tnumber_value_split, tnumber_value_time_split, tnumber_value_time_boxes, tbox_get_value_time_tile
* fixes a handful of MEOS 1.4 API-rename callsites surfaced by the regen:
  temporal_value_at_timestamptz → tgeo_value_at_timestamptz
  acontains_geo_tpoint → acontains_geo_tgeo
  tpoint_transform_pipeline → tspatial_transform_pipeline
  temporal_to_tsequence(string interp) → (int interp)
  temporal_append_tinstant(temp, inst, …) → (temp, inst, interp, …)
  temporal_lower_inc / _upper_inc → boolean directly (no "!= 0")
Tests: 907/907 green (unchanged from pre-regen baseline).
After JMEOS PR MobilityDB#15 added Datum -> long and MeosType -> int generator lowering plus the 10 private-header extern declarations to its amalgamated MEOS header, every MEOS symbol called by MobilitySpark is exposed by functions.functions.* and there is no longer any reason to maintain a parallel JNR-FFI interface in this repository.
Removed:
- src/main/java/org/mobilitydb/spark/MeosNative.java (was 81 lines / 10 declarations after the previous trim)
- 'import org.mobilitydb.spark.MeosNative' from 5 callsite files
Migrated 13 callsites across AccessorAliasUDFs, TileUDFs, and SubtypeConstructorUDFs:
  mobilitydb_version -> functions.mobilitydb_version
  mobilitydb_full_version -> functions.mobilitydb_full_version
  temporal_mem_size -> functions.temporal_mem_size
  temptype_basetype -> functions.temptype_basetype
  temporal_values_p -> functions.temporal_values_p
  set_make_free -> functions.set_make_free
  tnumber_value_split -> functions.MeosLibrary.meos.tnumber_value_split
  tnumber_value_time_split -> functions.MeosLibrary.meos.tnumber_value_time_split
  tnumber_value_time_boxes -> functions.MeosLibrary.meos.tnumber_value_time_boxes
  tbox_get_value_time_tile -> functions.MeosLibrary.meos.tbox_get_value_time_tile
Tests: 907 / 907 green.
d591b53 to d4c08a3
…handler
The noexit error handler was added to MEOS in 9ee6cf721 (May 9, JVM-crash safety) and removed again in ae43d2f4a (May 10, JSONB integration commit that reverted the related thread-safety patch in error.c). JMEOS PR MobilityDB#15 followed suit and dropped the symbol from the regen amalgam (it was no longer in libmeos.so).
MobilitySpark callers (three sites: MeosThread.java's per-thread init, MobilitySparkSession.create(), and NativeMemoryLeakTest's @BeforeAll) now install the handler via Class.getMethod() + invoke() and silently fall through if the symbol is absent.
Net behaviour:
* MEOS installed with noexit (older builds): handler installed, crashes prevented, BerlinMOD memory-leak tests run end-to-end.
* MEOS installed without noexit (current branch): handler skipped; MEOS reverts to default_error_handler which calls exit() on any error.
845 / 907 MobilitySpark tests still pass. The 62 that don't are GeoUDFsExt5Test + STBoxUDFsTest, which trigger MEOS error paths that now tear down the JVM. Restoring noexit upstream brings the count back to 907 / 907.
Also bumps libs/JMEOS-1.4.jar to the regen artefact from JMEOS PR MobilityDB#15 commit 490ca07 (scripts + smoke test + dropped 2 missing externs).
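The Class.getMethod() + invoke() fallback described in this commit could look roughly like the sketch below. It is a minimal illustration under stated assumptions: FakeBindings is a hypothetical stand-in for the generated JMEOS bindings class, and tryInvokeStatic is a hypothetical helper name. Only the method name meos_initialize_noexit_error_handler is taken from this PR.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of "install the handler if the binding exists".
final class OptionalHandler {
    // Stand-in for the generated bindings class (assumption, not JMEOS).
    static final class FakeBindings {
        static boolean installed = false;

        public static void meos_initialize_noexit_error_handler() {
            installed = true;
        }
    }

    // Look up a public static no-arg method by name and call it.
    // Returns false instead of failing when the method is absent or the
    // call cannot be made, so the caller silently falls through.
    static boolean tryInvokeStatic(Class<?> owner, String methodName) {
        try {
            Method m = owner.getMethod(methodName);
            m.invoke(null);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInvokeStatic(FakeBindings.class, "meos_initialize_noexit_error_handler"));
        System.out.println(tryInvokeStatic(FakeBindings.class, "symbol_not_generated"));
    }
}
```

Catching ReflectiveOperationException covers the missing-method, inaccessible-method, and failed-invocation cases in one place, which is the "silently fall through" behaviour; the cost is that a genuine failure inside the handler is also swallowed.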
Pulls in JMEOS PR MobilityDB#15 (rebased) which now includes the dropped 'inline' fix + the noexit handler from MobilityDB PR #939. Once PR #939 lands and JMEOS PR MobilityDB#15 merges, MobilitySpark goes from 845 / 907 (reflective fallback installed by eb58420) to 906 / 907 (noexit installed natively). The remaining 1 failure is MathUDFsExtTest.tnumberTrend_tint — fixture passes a tint sequence (default STEP interpolation) to tnumber_trend() which validates linear interpolation. Tracked as a separate fixture-fix follow-up.
tnumber_trend requires linear interpolation; tint sequences default to step interpolation, so MEOS validates and returns NULL. The previous test asserted non-null, which only held while MEOS was lenient about this validation; the validation has tightened in the current source tree. Renames tnumberTrend_tint_returns_nonnull -> tnumberTrend_tint_step_returns_null and inverts the assertion to document the actual MEOS behaviour. The tfloat case at line 95 covers the main code path. Tests: 907 / 907 green.
…the helper
Once MobilityDB PR #939 is treated as landed (per the issued-PR-as-landed
policy), meos_initialize_noexit_error_handler exists in mainline meos.h
and libmeos.so. The reflective Class.getMethod() dance that survived
both the symbol-present and symbol-absent cases is no longer needed.
Three callsites simplified back to a direct call:
- MeosThread.java per-thread MEOS init
- MobilitySparkSession.java session-level init
(delegated to MeosThread.ensureReady;
duplicate meos_initialize/timezone calls
also removed)
- NativeMemoryLeakTest.java test-suite @BeforeAll
Net: ~24 lines of indirection removed across 3 files, plus one
unused 'import functions.functions' in the test.
Tests: 907 / 907 green.
MeosNative.java was deleted in commit 06765e2; tboxExpandFloat / tboxExpandInt are now wired directly via functions.tfloatbox_expand / tintbox_expand. Comment had no actionable content.
This was referenced May 11, 2026
…EADY
MEOS spatial functions (eIntersects, eContains, eDwithin, etc.) call into GEOS through liblwgeom. GEOS 3.12 routes every reentrant function through a thread-local context handle. The first reentrant call on a thread that has not invoked `GEOS_init_r()` raises `context handle is uninitialized, call initGEOS` and aborts the JVM.
MEOS's internal spatial helpers call `initGEOS(lwnotice, lwgeom_geos_error)` lazily on first use, but the call is not thread-safe: two Spark task threads racing through the same MEOS helper corrupt the global GEOS state.
Bind libgeos_c.so via JNR-FFI and call `GEOS_init_r()` from the per-thread `MEOS_READY` `ThreadLocal` initialiser. Each Spark task thread now gets its own GEOS context the first time it enters `ensureReady()`, before any MEOS spatial UDF can race the global init.
Verified by running BerlinMOD Q2 (`eIntersects(t.trip, r.geom)`) end to end on Spark `local[1]`. Without this fix the JVM aborts at the first spatial UDF call.
`local[2]` and higher still hit a separate race inside MEOS's internal `initGEOS(lwnotice, lwgeom_geos_error)` call sequence (the lwgeom callbacks are not reentrant). Closing that race needs MEOS-side changes, which are out of scope for this Spark commit.
The Spark master defaults to local[4] (validated against MobilityDB/MobilityDB#949 + #815, which together make MEOS thread-safe across GEOS, WKT/GMT, errno and timezone). Users can override with SPARK_MASTER=local[N] for tuned thread counts. Validation on local[4]: Q1: 420 ms, Q2: 43.4 s (2.05x speedup vs local[2]), Q3: 40.2 s, Q4: 46.5 s. Clean exit, no hs_err_pid.
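A minimal sketch of the default-plus-override behaviour described above, assuming the override is read from the SPARK_MASTER environment variable and falls back to local[4] when unset or blank. SparkMasterConfig and resolveMaster are hypothetical names, not the MobilitySpark implementation.

```java
import java.util.Map;

// Hypothetical sketch: resolve the Spark master from the environment.
final class SparkMasterConfig {
    static final String DEFAULT_MASTER = "local[4]";

    // Takes the environment as a map so the logic can be exercised
    // without touching the real process environment.
    static String resolveMaster(Map<String, String> env) {
        String override = env.get("SPARK_MASTER");
        return (override == null || override.isBlank()) ? DEFAULT_MASTER : override;
    }

    public static void main(String[] args) {
        System.out.println(resolveMaster(System.getenv()));
    }
}
```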
meos_initialize() owns the per-thread GEOS context handle (mirroring the existing PROJ pattern in MEOS). MeosThread.MEOS_READY only needs to call meos_initialize, meos_initialize_timezone and the noexit error handler — no separate JNR-FFI binding to libgeos_c is required. Validated on --master local[4]: Q1: 420 ms, Q2: 43.4 s, no SIGSEGV, no hs_err_pid. Depends on MobilityDB/MobilityDB#949 (per-thread GEOS context inside MEOS).
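The per-thread initialisation pattern referred to here (a ThreadLocal whose initialiser runs once per thread, triggered from ensureReady()) can be sketched as follows. This is an illustration under assumptions: the native calls are replaced by a counter, and PerThreadInit, READY, and runOnNewThread are hypothetical names rather than the contents of MeosThread.java.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of once-per-thread native initialisation.
final class PerThreadInit {
    // Counts how many threads have run the initialiser (stand-in for
    // the native init calls, which are not made in this sketch).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private static final ThreadLocal<Boolean> READY = ThreadLocal.withInitial(() -> {
        // In the real code this is where the per-thread native
        // initialisation would be invoked (assumption).
        INIT_COUNT.incrementAndGet();
        return Boolean.TRUE;
    });

    // The first call on a thread runs the initialiser; later calls on the
    // same thread return the cached value and do nothing else.
    static void ensureReady() {
        READY.get();
    }

    // Helper that calls ensureReady() on a fresh thread and waits for it.
    static void runOnNewThread() {
        Thread t = new Thread(PerThreadInit::ensureReady);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ensureReady();
        ensureReady();
        runOnNewThread();
        System.out.println("initialisations: " + INIT_COUNT.get());
    }
}
```

Each UDF entry point would call ensureReady() before touching native code, so a Spark task thread pays the initialisation cost only on its first call.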
DistanceUDFs.registerAll() previously aliased "nearestApproachDistance" to nadTgeoGeo (tgeo × geometry). GeoUDFs.registerAll() registers the same name to the tgeo × tgeo lambda (which calls nad_tgeo_tgeo via temporal_from_hexwkb). Because registerAll runs in alphabetical order of UDF classes, DistanceUDFs shadowed GeoUDFs and resolved the bare "nearestApproachDistance" call to nadTgeoGeo.
Q5 of BerlinMOD calls nearestApproachDistance(t1.trip, t2.trip), where both arguments are tgeompoint. Under the shadowed registration, the second tgeo's hex-WKB string was passed to geo_from_text, which returned a parse error on every cross-join row.
The tgeo × tgeo registration in GeoUDFs is what MobilityDB exposes under the bare SQL name; keep it. Callers wanting tgeo × geometry use the explicit "nadTgeoGeo" name.
Validated: Q5 of MobilitySpark BerlinMOD on local[4]: 508 s (matches the MobilityDB and MobilityDuck reference timings within the cross-join cost).
H3IndexJnrBindings loads four MEOS H3 symbols directly through JNR-FFI: tgeompoint_to_th3index, geo_to_h3index_set, ever_eq_th3index_th3index, and ever_eq_anyof_h3indexset_th3index. This sidesteps the JMEOS function generator's missing H3Index typedef support, so the h3 prefilter surface runs against the mainline JMEOS-1.4 jar.
Th3IndexPrefilterUDFs registers four Spark UDFs that wrap the JNR bindings with hex-WKB string marshalling consistent with the rest of the MobilitySpark UDF surface:
  tgeompointToTh3index(STRING, INTEGER) -> STRING
  geoToH3IndexSet(STRING, INTEGER) -> STRING
  everEqTh3IndexTh3Index(STRING, STRING) -> BOOLEAN
  everIntersectsH3IndexSetTh3Index(STRING, STRING) -> BOOLEAN
These match the MobilityDuck h3 prefilter surface (PR #131 on MobilityDuck) and the MobilityDB SQL operator names, so the BerlinMOD th3index portable SQL has a uniform shape across the three platforms for the cross-join queries (Q4, Q5, Q6, Q7, Q10, Q11, Q12, Q15, Q17).
The MEOS H3 symbols (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index, etc.) are compiled into libmeos.so but the binary may not declare libh3 as a DT_NEEDED dependency. The JVM loader hits an undefined-symbol error on degsToRads / radsToDegs when MobilitySpark's h3 prefilter UDF makes its first JNR-FFI call. Set LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libh3.so by default; allow LIBH3=/path override.
close() runs before spark.stop() in the standard try-with-resources benchmark/usage pattern, so meos_finalize() tears down MEOS global and per-thread TLS state while Spark executor threads are still alive; their subsequent teardown then double-frees the already-finalized MEOS TLS, aborting the JVM with double free or corruption (fasttop) during shutdown. The OS reclaims native MEOS memory at JVM exit, so the explicit finalize is unnecessary and unsafe in the Spark and surefire lifecycles; it belongs only in a standalone main that owns the whole JVM with no live MEOS-using threads at exit.
expandSpace and geoTimeStbox serialised the STBox with stbox_as_hexwkb(box, (byte) 0, ...). WKB variant 0 omits the SRID, so bboxOverlaps re-parsing it via stbox_from_hexwkb gets SRID 0; overlaps_tspatial_stbox then compares an SRID-3812 trip against an SRID-0 box, returns false for every pair, and Q10's WHERE ... AND bboxOverlaps(t2.trip, expandSpace(t1.trip, 3)) silently drops all matches (0 rows instead of the expected count). Serialise with WKB_EXTENDED (0x04) so the SRID round-trips; Q10 then returns the correct rows, matching MobilityDB's native && operator.
CI vendors $GITHUB_WORKSPACE/lib/libmeos.so for the unit tests (.github/workflows/maven.yml + pom surefire -Djava.library.path). The committed binary was a stale MEOS build predating the ensure_linear_interp guard in tnumber_trend, so tnumber_trend on a step-interpolated tint returned a computed trend instead of NULL, deterministically failing MathUDFsExtTest.tnumberTrend_tint_step_returns_null (expected null, got a tfloat hex-WKB). The test and the AnalyticsUDFs.tnumberTrend wrapper are correct against current MEOS: verified that the current libmeos returns NULL for that exact input while the stale one returns non-null. Replace lib/libmeos.so with a current MEOS 1.4 build that carries the guard.
This reverts commit ca9676d.
…lityDB
State present coverage only (858/858 active addressable temporal+geo, 100%) with the scope partition and deferred families shared with MobilityDuck; drop dated-milestone and changelog narrative.
parity-status.md regenerated from scripts/parity-audit.py against current MobilityDB master.
This was referenced May 18, 2026
Summary
- docs/parity-status.md (audit script: scripts/parity-audit.py). Compare to MobilityDuck 79.3%.
- feat(parity) commit per the "1 feature = 1 commit" ecosystem policy.
Per-section coverage
22 of 51 active sections at 100%. See docs/parity-status.md for the full table.
Sections still under 100% are dominated by:
Methodology
Adapted from MobilityDuck/scripts/parity-audit.py with two MobilitySpark-specific enhancements:
- spark.udf().register("name", ...) from src/main/java/**/*.java
- tnumber/tpoint/tgeo/…), wrapper-style dispatcher recognition (temporal_above ↔ stboxAboveTpoint), type-suffix matching (always_eq ↔ alwaysEqTintInt)
Same out-of-scope and deferred bucketing as MobilityDuck:
- _in/_out/_recv/_send, _transfn/_combinefn/_finalfn, _sel/_joinsel/_supportfn/_analyze, btree opclass support
Test plan
- mvn test: 907/907 green on Linux (Java 21, Spark 3.5)
- BerlinMODBench
The single feat(parity) commit body lists every UDF added/extended and the new MeosNative symbols.