Skip to content

Add the BerlinMOD nine-query three-form parity matrix for NebulaStream#15

Open
estebanzimanyi wants to merge 2 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/berlinmod-streaming-forms
Open

Add the BerlinMOD nine-query three-form parity matrix for NebulaStream#15
estebanzimanyi wants to merge 2 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/berlinmod-streaming-forms

Conversation

@estebanzimanyi

@estebanzimanyi estebanzimanyi commented May 21, 2026

Copy link
Copy Markdown
Member

Add the BerlinMOD nine-query parity matrix for NebulaStream across the three query forms (instantaneous, sequence, and full streaming), as the reference surface for measuring streaming MEOS parity. Base of the streaming-parity series.

@estebanzimanyi estebanzimanyi force-pushed the feat/berlinmod-streaming-forms branch from a5ff0f0 to 6d3f885 Compare May 21, 2026 06:07
@estebanzimanyi estebanzimanyi changed the title feat(berlinmod): scaffold BerlinMOD-Q1..Q4 × 3 forms (12 YAMLs, 12/27 cells) feat(berlinmod): scaffold BerlinMOD-Q1..Q4+Q7 × 3 forms (21 YAMLs, 15/27 cells) on NebulaStream May 21, 2026
@estebanzimanyi estebanzimanyi force-pushed the feat/berlinmod-streaming-forms branch from 6d3f885 to 395e364 Compare May 21, 2026 06:40
@estebanzimanyi estebanzimanyi changed the title feat(berlinmod): scaffold BerlinMOD-Q1..Q4+Q7 × 3 forms (21 YAMLs, 15/27 cells) on NebulaStream feat(berlinmod): scaffold the full BerlinMOD-9 × 3-form parity matrix on NebulaStream (33 YAMLs, 27/27 cells) May 21, 2026
@marianaGarcez

Copy link
Copy Markdown
Contributor

These are great additions, happy to review when you're ready!

@marianaGarcez marianaGarcez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good and adds new queries. I do not alter the existing code or add functions

@marianaGarcez marianaGarcez marked this pull request as ready for review May 29, 2026 10:14
@estebanzimanyi estebanzimanyi changed the title feat(berlinmod): scaffold the full BerlinMOD-9 × 3-form parity matrix on NebulaStream (33 YAMLs, 27/27 cells) Add the BerlinMOD nine-query three-form parity matrix for NebulaStream Jun 4, 2026
…aho-mqtt-cpp

The Nix environment provides paho-mqtt-cpp as a shared-only library and does
not include libmeos.  Four changes make the build pass:

- flake.nix: add stock paho-mqtt-c + paho-mqtt-cpp to baseThirdPartyDeps;
  pass -DNES_ENABLE_MEOS=OFF in both defaultPackage and devShell cmakeFlags.

- nes-plugins/CMakeLists.txt: introduce NES_ENABLE_MQTT and NES_ENABLE_MEOS
  options that gate the respective plugin subdirectories via
  activate_optional_plugin().

- nes-plugins/Sources/MQTTSource, nes-plugins/Sinks/MQTTSink: select
  PahoMqttCpp::paho-mqttpp3-static when available, fall back to the shared
  PahoMqttCpp::paho-mqttpp3 target (Nix only ships the shared variant).

- nes-physical-operators: gate all MEOS-specific add_plugin calls and the
  nes-meos link behind if(NES_ENABLE_MEOS) in three CMakeLists files
  (top-level, Functions/Meos, Aggregation/Function/Meos).

- CMakeLists.txt (root): declare option(NES_ENABLE_MEOS) and
  add_compile_definitions(NES_ENABLE_MEOS) before the nes-* subdirectory
  loop so the preprocessor symbol is visible to all sibling components.

- nes-query-optimizer/LowerToPhysicalWindowedAggregation.cpp: guard the
  TemporalSequenceAggregationPhysicalFunction include and instantiation
  with #ifdef NES_ENABLE_MEOS so the translation unit compiles and links
  when MEOS is disabled.
… on NebulaStream (33 YAMLs, 27/27 cells)

Additive scaffold for the BerlinMOD-9 × 3 streaming-form parity contract
on MobilityNebula, sibling to the existing SNCB Q-series and matching
the MobilityFlink MobilityDB#3 / MobilityKafka MobilityDB#1 streaming-form definitions.

All 27 cells covered:

  Q1 'which vehicles have appeared'      — full (continuous + windowed + snapshot)
  Q2 'where is vehicle X at time T'      — full
  Q3 'vehicles within 5 km of P'         — full
  Q4 'vehicles inside region R (polygon)'— full
  Q5 'pairs of vehicles meeting near P'  — partial (emit per-vehicle trajectories near P; consumer joins)
  Q6 'cumulative distance per vehicle'   — partial (emit TEMPORAL_SEQUENCE; consumer computes length)
  Q7 'first passage of vehicle through POI' × {POI1, POI2, POI3}
                                          — full (per-POI fan-out)
  Q8 'vehicles within d of LINESTRING'   — full (edwithin_tgeo_geo with LINESTRING geometry)
  Q9 'distance between X and Y at time T'— partial (emit X and Y trajectories; consumer joins)

18 of 27 cells are FULL (the BerlinMOD-Q semantic is computed entirely
inside NebulaStream). 9 cells are PARTIAL — NebulaStream emits the
per-window inputs (trajectory, candidate vehicles) and a consumer
post-processes for the final BerlinMOD-Q answer. The partial pattern
is the natural expression of these queries in NebulaStream's current
SQL surface; the path to FULL is documented per-Q in
docs/berlinmod-streaming-forms.md (a stream-self-join for Q5/Q9, a
temporal_length scalar function for Q6).

Form mapping to NebulaStream windows:

  continuous: SLIDING(time_utc, SIZE 1 SEC, ADVANCE BY 1 SEC)
  windowed:   TUMBLING(time_utc, SIZE 10 SEC)
  snapshot:   TUMBLING(time_utc, SIZE 5 SEC)

MEOS-side surface consumed (already exposed by PR MobilityDB#14 + follow-ups):

  edwithin_tgeo_geo — Q3 (POINT predicate), Q4 (POLYGON, d=0.0),
                      Q5 (POINT predicate), Q7 (per-POI POINT),
                      Q8 (LINESTRING predicate)
  TEMPORAL_SEQUENCE — Q2 / Q5 / Q6 / Q9 (per-window per-vehicle trajectory)

No new MEOS PhysicalFunction classes added; no C++ changes; no SNCB
Q-series modifications. All 33 YAMLs are additive in a new
Queries/berlinmod/ subdirectory.

Add (additions):
  Queries/berlinmod/q1_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q2_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q3_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q4_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q7_poi{1,2,3}_{continuous,windowed,snapshot}.yaml (9, full via fan-out)
  Queries/berlinmod/q8_{continuous,windowed,snapshot}.yaml          (3, LINESTRING predicate)
  Queries/berlinmod/q9_{continuous,windowed,snapshot}.yaml          (3, partial)
  Input/input_berlinmod.csv  (sample data: 3 vehicles × 21 events, 14 simulated seconds)
  docs/berlinmod-streaming-forms.md

Validation: every YAML parses cleanly via python3 yaml.safe_load.
Runtime verification gated on the NebulaStream test harness.

Coverage: 27 of 27 cells (100 %), with 18 FULL and 9 PARTIAL annotated
explicitly per Q. Path to FULL for the 9 PARTIAL cells is one
MobilityNebula C++ PhysicalFunction class each (or a NebulaStream
upstream stream-self-join), documented in
docs/berlinmod-streaming-forms.md.
@estebanzimanyi estebanzimanyi force-pushed the feat/berlinmod-streaming-forms branch from 395e364 to bfdc695 Compare June 11, 2026 21:13
estebanzimanyi added a commit to estebanzimanyi/MobilityNebula that referenced this pull request Jun 11, 2026
The PAIR_MEETING aggregation (added in MobilityDB#17) hardcoded the meeting-distance
threshold at 200 m via a static constexpr DMEET_METRES, with the PR body
noting parameterization as future work. This PR lands that future work:
PAIR_MEETING now takes a fifth argument — a numeric constant in metres —
and the physical operator uses it per-query.

## Surface

  PAIR_MEETING(lon, lat, ts, vehicle_id, dMeet)
                                          ^^^^^ new fifth arg (numeric constant, metres)

The first four args remain FieldAccess (lon, lat, ts, vehicle_id); the
fifth is pulled from the parser's constantBuilder as a numeric literal,
parsed via std::stod, and threaded through the logical→physical lowering
chain into the lower() lambda alongside the existing state pointers.

## Files (9, all stacked on MobilityDB#18MobilityDB#17MobilityDB#16MobilityDB#15)

| Layer | File |
|---|---|
| Physical .hpp | PairMeetingAggregationPhysicalFunction.hpp — `DMEET_METRES` constexpr → `DEFAULT_DMEET_METRES` + instance field `dMeetMetres` |
| Physical .cpp | PairMeetingAggregationPhysicalFunction.cpp — constructor takes dMeet; lower() passes it to the captureless lambda via `nautilus::val<double>` |
| Logical .hpp  | PairMeetingAggregationLogicalFunction.hpp — constructor + create() factory take dMeet; getter `getDMeetMetres()` |
| Logical .cpp  | PairMeetingAggregationLogicalFunction.cpp — initialize field; Registrar deserialize path uses DEFAULT_DMEET_METRES (see Serde caveat below) |
| Parser        | AntlrSQLQueryPlanCreator.cpp — both PAIR_MEETING dispatch sites (lexer-token case + funcName string-name case) extract the constant from constantBuilder, std::stod it, pass to create() |
| Lowering      | LowerToPhysicalWindowedAggregation.cpp — pmDescriptor->getDMeetMetres() flows to the physical constructor |
| YAMLs (×3)    | Queries/berlinmod/q5_continuous.yaml, q5_snapshot.yaml, q5_windowed.yaml — add `, 200.0` as the explicit fifth arg; comments updated to reflect the parameterization |

## Serde round-trip caveat (out of scope for this PR)

`AggregationLogicalFunctionRegistryArguments` is strongly typed to
`vector<FieldAccessLogicalFunction>` — there is no slot for a numeric
constant in the existing Registrar interface, and
`SerializableAggregationFunction` has no proto field for it either. As
a result:

- The parser path (live query execution) is FULLY parameterized — dMeet
  flows from SQL to physical correctly.
- The Serde deserialize path falls back to DEFAULT_DMEET_METRES
  (preserves the 200 m scaffold behaviour). Round-trip fidelity for the
  dMeet value requires (a) adding a new field to
  SerializableAggregationFunction.proto, (b) extending
  AggregationLogicalFunctionRegistryArguments to carry it, and (c)
  threading both through Serialize/Register. That's an infrastructure
  change touching every registered aggregation; tracked as a follow-up.

## Build / test verification

Cannot compile-verify locally — NebulaStream needs the full C++23 +
vcpkg toolchain. Submitted for maintainer build verification (cc
@marianaGarcez). Expected to compile cleanly; the only construction-time
behaviour change is the constructor signature (5 params → 6 params for
physical, 5 → 6 for logical create/ctor); the only runtime behaviour
change is that dMeet is now read from the instance field instead of the
class constexpr (the lambda receives it via the nautilus::val<double>
extra arg).

## Mirrors the CROSS_DISTANCE shape

CROSS_DISTANCE (also added by MobilityDB#17, hardcoded VID_A=100, VID_B=200) has
the exact same parameterization pattern; a sibling PR can apply the
same change with (lon, lat, ts, vid, vid_a, vid_b) — 6 args total
instead of 5. Holding for separate PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants