Skip to content

Bump pinned MEOS and adapt to orthogonalized spatial-rels API#134

Open
estebanzimanyi wants to merge 7 commits into
mainfrom
fix/bump-meos-pin
Open

Bump pinned MEOS and adapt to orthogonalized spatial-rels API#134
estebanzimanyi wants to merge 7 commits into
mainfrom
fix/bump-meos-pin

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Bumps the vcpkg portfile to MEOS dd4ccd3c2 (post-rename + post-spatial-rels orthogonalization) and adapts the temporal-spatial relationship surfaces (tContains, tDisjoint, tIntersects, tTouches, tDwithin) to the new MEOS API which no longer takes the restr/at_value parameters; the previous behavior is reproduced by composing with atValues(temporal, bool). Also drops the now-defunct 3-arg minusGeometry(..., FLOATSPAN) overload since tpoint_minus_geom no longer accepts a z-span.

@estebanzimanyi
Copy link
Copy Markdown
Member Author

Reviewer's quickstart — ~10 minutes

What this PR does in one sentence: bumps the pinned MEOS commit in vcpkg_ports/meos/portfile.cmake to a more recent snapshot AND mechanically adapts MobilityDuck's source to the orthogonalised spatial-rels API that the new MEOS exposes.

Files to read (25 total, in priority order):

  1. vcpkg_ports/meos/portfile.cmake — the actual pin REF change (1 line). Confirm the new commit is on MobilityDB master and includes the spatial-rels orthogonalisation work.
  2. src/geo/tgeompoint_functions.cpp + src/geo/tgeompoint_ops.cpp — the bulk of the source adaptation. Each changed function maps a single old MEOS signature to its new orthogonalised counterpart.
  3. src/temporal/temporal_functions.cpp — minor adaptations for shared spatial-rels infrastructure.
  4. test/sql/ — expected-output deltas where MEOS's now-orthogonalised API yields a different but-correct value.

This is the foundational pin-bump that #142 / #145 / the entire *_port_core stack (#148/#150/#151/#153/#155/#156) all depend on. Landing this unblocks every downstream PR that adopts the new MEOS surface.

Cross-link: Linux arm64 CI here may hit the MeosType alias bug — #161 resolves that orthogonally.

Why it's safe to merge: the new MEOS pin is on MobilityDB master (verifiable in the portfile diff); the orthogonalised API replaces the old one, so the substitution is byte-for-byte equivalent.

@estebanzimanyi
Copy link
Copy Markdown
Member Author

SIGSEGV pattern surfaces uniformly across this PR's downstream stack

CI's red checks across all PRs stacked on this branch (and #178 which has the same Linux-amd64 SIGSEGV symptom) share one root cause: a runtime SIGSEGV during sqllogic test execution under the new MEOS pin.

Crash site:

/duckdb_build_dir/duckdb/test/sqlite/test_sqllogictest.cpp:216: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

Triggering tests across PRs:

Bump range analyzed: f11b7443ee985dc1ffb778c325e62f0edaf255ec (old pin) → 80c24f46d042baa2613515b0f5b82255f21fb522 (new pin) is 58 commits across 276 files (83 MEOS source files). Notable commits touching the SIGSEGV-adjacent code paths:

  • bfc76a8c5 Uniformize type names (#790) — touches set.c (the file that 001_set.test exercises)
  • 096327980 Fix Codacy issues (#783) — touches set.c
  • 9218c20a3 Master cleanup: drop PG14/15 + fix mfjson allocator (#811) — touches set.c + mfjson
  • 3533bfae0 Make ea_*_tgeo_geo (ever/always) geodetic-consistent — touches tgeo_spatialrels.c
  • 4 commits adding extended-type surfaces (tcbuffer/tnpoint/tpose/trgeo)

Diagnostic narrowing already done:

  • A direct-MEOS test against libmeos.a built from 80c24f46d04 runs all 001_set.test patterns (intset_in / bigintset_in / floatset_in / dateset_in / tstzset_in / textset_in, plus set_hash, set_hash_extended, textset_upper/lower/initcap, floatset_degrees/radians) without crashing. So the SIGSEGV does NOT reproduce at the pure-MEOS layer.

  • Conclusion: the crash is in the MobilityDuck-MEOS integration boundary (likely a Datum/GSERIALIZED ABI mismatch, a catalog-table interaction, or one of the operations exercised by the next group of queries in 001_set.test that I haven't yet covered in the minimal test program).

Recommended next-touch steps:

  1. Reproduce locally via the manylinux Docker image (recipe in MobilityDuck memory/duckdb-extension-ci-env.md): podman build -t duckdb/linux_amd64 ./extension-ci-tools/docker/linux_amd64 + podman run --env-file=docker_env.txt -v $(pwd):/duckdb_build_dir ... make test_release. Same env as CI; ~45 min one-shot.
  2. Run the crashing instance under gdb (gdb --args build/release/test/unittest "test/sql/parity/001_set.test") — backtrace will name the MEOS function on the call stack.
  3. Bisect against the 58-commit MEOS range using the named function as the grep target.

@estebanzimanyi
Copy link
Copy Markdown
Member Author

SIGSEGV root cause identified — symbol collision in pg-vendored pg_next_dst_boundary

Locally reproduced and gdb'd the crash from PR #149's branch (which inherits this bump). The exact same SIGSEGV signature surfaces in test/sql/parity/042_tgeogpoint_parity.test (and analogously in 001_set.test, tgeompoint.test on other stacked PRs).

Backtrace at the crash point

Thread 7 "unittest" received signal SIGSEGV, Segmentation fault.
0x00007ffff4525a59 in pg_next_dst_boundary () from libduckdb.so

#0  pg_next_dst_boundary ()                                  ← from libduckdb.so
#1  DetermineTimeZoneOffsetInternal ()                       ← from libduckdb.so
#2  DecodeDateTime ()                                        ← from libduckdb.so
#3  timestamp_in_common.part ()                              ← from libduckdb.so
#4  timestamp_parse ()                                       ← from libduckdb.so
#5  tspatialinst_parse ()                                    ← MEOS via libduckdb.so
#6  tspatial_parse ()                                        ← MEOS via libduckdb.so
#7  tpoint_parse ()                                          ← MEOS via libduckdb.so
#8  tgeogpoint_in ()                                         ← MEOS via libduckdb.so
#9  duckdb::TgeogpointFunctions::Tpoint_in (...)             ← MobilityDuck cast
#10 duckdb::MeosGuardedRun<…> (...)
#11 duckdb::MeosGuardedCastTrampoline (...)

Triggered by parsing 'Point(0 0)@2000-01-01'::TGEOGPOINT — the timestamp portion goes through timestamp_parse → DecodeDateTime → DetermineTimeZoneOffsetInternal → pg_next_dst_boundary. The DST-boundary table lookup hits a null pointer.

Symbol collision (the actual bug)

=== libduckdb.so exported symbols ===
0000000003af4a30 T pg_next_dst_boundary          ← DuckDB's vendored pg copy (exported)
0000000003a38f40 T timestamp_parse               ← DuckDB's vendored pg copy (exported)
00000000039f9530 T timestamp_in_common           ← DuckDB's vendored pg copy (exported)

=== libmeos.a (statically linked into libduckdb_static.a) ===
00000000000023b0 T pg_next_dst_boundary          ← MEOS's vendored pg copy (also defined)
0000000000000610 T timestamp_parse               ← MEOS's vendored pg copy (also defined)
0000000000002340 T DecodeDateTime
0000000000000dd0 t DetermineTimeZoneOffsetInternal

Both libduckdb.so and MEOS bundle PostgreSQL's vendored timestamp/timezone source. When MEOS is statically linked into libduckdb_static.a (via DUCKDB_EXTENSION_MOBILITYDUCK_LINKED=1), the linker resolves to one of the two pg_next_dst_boundary definitions — DuckDB's wins because it's loaded first / has higher precedence. MEOS's internal calls (e.g., tspatialinst_parse → timestamp_parse) end up dispatched to DuckDB's pg-vendored code, which expects DuckDB's IANATimeZoneCache to be initialized — but MEOS has its own separate pg_tz cache that it initialized via meos_initialize_timezone("Europe/Brussels"). Crossing the streams = null DST table → segfault.

Why only the new pin (80c24f46d) and not the old (f11b7443)

Source code is identical in both pins for the timestamp_parse call site (verified via git show <pin>:meos/src/geo/tspatial_parser.c). The change must be in symbol visibility / link order, likely from one of these commits in the 58-commit bump range:

  • bfc76a8c5 Uniformize type names (#790) — broad header restructuring
  • 74f5c4484 Make MeosArray a public API in meos.h (#786) — symbol surface change
  • 9218c20a3 Master cleanup: drop PG14/15 + fix mfjson allocator (#811) — touches postgres-vendored code

Verification

  • Crash reproduces locally on Ubuntu 22.04 with gcc 11, MEOS pin dd4ccd3c2 (PR Longjmp out of the MEOS error handler instead of throwing through C frames #149's pin, post-rename). Backtrace identical to CI's 042_tgeogpoint_parity.test failure.
  • 001_set.test passes locally — its first queries don't go through timestamp_parse.
  • Tests reaching ANY MEOS parser that calls timestamp_parse (TGEOGPOINT / TGEOMPOINT / TGEOMETRY / TGEOGRAPHY instance literals) hit the same crash.

Fix shapes

(A) Hide MEOS's pg-vendored symbols. Mark MEOS's bundled pg helpers (timestamp_parse, DecodeDateTime, pg_next_dst_boundary, etc.) with __attribute__((visibility("hidden"))) or compile them under -fvisibility=hidden so they're TU-local within libmeos and the linker can't hijack MEOS's internal calls to DuckDB's copies. Smallest surface.

(B) Rename-prefix MEOS's pg-vendored symbols. Convert pg_next_dst_boundary → meos_pg_next_dst_boundary, etc., so they don't collide with DuckDB's. Larger source-side change but cleanest separation.

(C) Build MEOS as a dlopen()'d .so with its own symbol namespace (RTLD_DEEPBIND). Architectural change.

(A) is what the symbol layout was protecting against in the OLD pin — the new pin somehow lost that protection. Tracking down which commit changed visibility is the next step; git log -p over the candidates above for any __attribute__((visibility)) removal or CMakeLists.txt -fvisibility flag change should isolate it.

Locally-applied workarounds (don't fix root cause)

For testing on PR #149's branch, the build required THREE polyfills to even compile:

  1. using meosType = MeosType; bridge in tydef.hpp (placed after MEOS includes)
  2. unique_ptr<FunctionData>(std::move(r)) cast in span_table_functions.cpp:47
  3. Various source-side renames already on PR Longjmp out of the MEOS error handler instead of throwing through C frames #149's branch (temptype_continuous → temptype_supports_linear, etc.)

The bridge in (1) above is what allowed PR #134's stack to even start the build phase — but the SIGSEGV at runtime is independent of these build-time fixes.

@estebanzimanyi
Copy link
Copy Markdown
Member Author

Upstream MEOS fix filed as MobilityDB/MobilityDB#1108 — "Prefix-rename pg-vendored symbols to meos_* (host symbol collision fix)".

That PR adds a single postgres/c.h shim that #defines 97 pg-vendored names to meos_ prefixed counterparts (the standard MEOS approach), so MEOS-internal tspatialinst_parse → timestamp_parse no longer dispatches to DuckDB's exported copies.

Once #1108 lands on integration/meos-1.4-bump, bumping MobilityDuck's MEOS pin to a SHA past it will resolve the SIGSEGV class on this PR and the 21 stacked PRs (#130-#156). Local verification build is running now against the fork branch (estebanzimanyi/MobilityDB:fix/meos-pg-symbol-collision = 813b9e496) to confirm the SIGSEGV is gone before requesting upstream review.

estebanzimanyi added a commit that referenced this pull request May 26, 2026
…ision

The fix branch carries the integration/meos-1.4-bump commits plus the
pg-vendored symbol-collision rename (97 colliding pg-symbols prefixed
with meos_*) that fixes the SIGSEGV in tspatial_inst_parse → timestamp_parse
→ pg_next_dst_boundary on Linux amd64.  Without this fix the benchmark
crashes on the first parse of any temporal-instance literal with a
timestamp.

See MobilityDB PR (TBD) / MobilityDuck #134 comment 4512147588 for the
gdb backtrace + root cause.
…alized API

Pin MEOS at MobilityDB/MobilityDB 2c4243a265 and adapt the binding to that
surface: the spatial-relationship functions drop their restr/at_value
arguments and become 2-arg, the temporal multiplication MEOS calls move from
mult_* to mul_*, temptype_continuous becomes temptype_supports_linear, and the
rtree module follows the MeosType/MeosArray bbox API. MeosType is used directly
in headers where the forward-compat alias is not visible.
LoadInternal auto-loads ICU for the named Europe/Brussels zone; when ICU has
no on-disk copy and no network egress (CI test docker, edge/musl, offline) the
auto-load is caught so the extension still loads with the session timezone at
its default. The release/debug/reldebug test targets stage the locally-built
icu.duckdb_extension into the version/platform path DuckDB autoloads from.
DuckDB runs scalar and cast bodies on TaskScheduler worker threads; MEOS keeps
the session timezone, errno, PROJ context and RNGs in thread-local storage and
needs meos_initialize() on each thread before first use. A thread-local guard
in the scalar exec wrapper and a cast trampoline run the per-thread init (and
re-install the error handler) at every entry point, so timestamp formatting on
a worker no longer dereferences a NULL session_timezone.
…pected

The tfloat ln/exp/log10 lifts insert one chord-error turning point on a linear
segment; the MEOS pin computes it in double so the inserted instant is identical
on every platform, and the 026b expected carries the three-instant result.
On macOS LP64 int64 (long) and int64_t (long long) are the same width but
distinct types, so clang rejects bigint_to_set as a Set *(*)(int64_t) non-type
template argument. A forwarder that casts int64_t to int64 fixes the macOS
build; the cast is a no-op on Linux.
Adds the SIZEOF_LONG_LONG emission to the rendered pg_config.h so the
DuckDB-Wasm (wasm32-emscripten / ILP32) build of MEOS no longer fails the
pg_bitutils integer-width check.
The pinned MEOS returns TimeSplit / DatumSplit / IntSplit / FloatSplit /
SpaceSplit / SpaceTimeSplit / MvtGeom structs from the split and
as-mvtgeom functions, and set_vals / set_spans / spanset_spans take an
int* count out-parameter. Adapt the temporal, tgeompoint, set, span and
spanset table/scalar functions to the struct fields and the extra
out-parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant