Cypher/GFQL: replace bounded reentry hidden-column handshake with an explicit ReentryPlan #987

@lmeyerov

Problem

Bounded MATCH ... WITH ... MATCH ... reentry currently works, but the internal design is hard to reason about.

Today the same concept is spread across multiple mechanisms:

  • start_nodes_query in graphistry/compute/gfql/cypher/lowering.py
  • hidden __cypher_reentry_* columns and expression rewrites in graphistry/compute/gfql/cypher/lowering.py
  • _cypher_entity_projection_meta side-channel metadata
  • _compiled_query_reentry_state() stitching logic in graphistry/compute/gfql_unified.py

That makes the compiler/runtime contract implicit instead of explicit. A senior compiler / graph language / GPU engineer joining the project would have to reconstruct the model from several places at once.

Why This Matters

  • harder to audit vectorization and backend purity
  • harder to extend to the next row-seeded features
  • hidden invariants across compiler + runtime increase maintenance risk
  • lowering.py and gfql_unified.py are longer and conceptually denser than they need to be

Proposed Refactor

Treat bounded reentry as a first-class plan/runtime concept rather than a protocol assembled from side channels.

Recommended steps:

  1. Introduce an explicit ReentryPlan (or SeededMatchPlan) dataclass.

    • carried alias
    • id column
    • carried scalar outputs
    • ordering contract
    • trailing match alias contract
  2. Replace the current hidden-property rewrite protocol.

    • stop encoding carried scalars as synthetic __cypher_reentry_* property accesses
    • instead carry an explicit scalar mapping in the plan contract
  3. Move runtime stitching into a dedicated reentry module.

    • keep gfql_unified.py as dispatch/orchestration
    • move reentry-specific assembly/validation into a smaller targeted runtime helper module
  4. Make row-order and seed-row semantics explicit.

    • preserve order as part of the contract, not as an inferred merge behavior
  5. Split lowering.py by concern where useful.

    • general lowering
    • result projection planning
    • bounded reentry planning
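The first two steps above can be sketched as a small dataclass. This is a minimal illustration only, not the existing code: the field names, the `SeedOrder` enum, and the `required_seed_columns()` / `validate()` helpers are all hypothetical, and the exact shape of the "trailing match alias contract" is assumed here to be a single alias name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Mapping


class SeedOrder(Enum):
    """Hypothetical ordering contract for rows produced after reentry."""
    PRESERVE_SEED_ROWS = "preserve_seed_rows"  # output follows seed-row order
    UNORDERED = "unordered"                    # no ordering guarantee


@dataclass(frozen=True)
class ReentryPlan:
    """Hypothetical explicit contract for MATCH ... WITH ... MATCH reentry."""
    carried_alias: str   # alias carried across the WITH boundary
    id_column: str       # column holding the ids that seed the trailing MATCH
    trailing_alias: str  # alias the trailing MATCH binds the seeds to (assumed shape)
    # Explicit scalar mapping: output name -> source column in the seed rows.
    # This replaces synthetic __cypher_reentry_* property accesses.
    carried_scalars: Mapping[str, str] = field(default_factory=dict)
    order: SeedOrder = SeedOrder.PRESERVE_SEED_ROWS

    def required_seed_columns(self) -> List[str]:
        # De-duplicate while keeping first-seen order.
        return list(dict.fromkeys([self.id_column, *self.carried_scalars.values()]))

    def validate(self, seed_columns: Iterable[str]) -> None:
        """Fail early if the seed rows cannot satisfy the plan."""
        available = set(seed_columns)
        missing = [c for c in self.required_seed_columns() if c not in available]
        if missing:
            raise ValueError(f"seed rows are missing columns: {missing}")
        if self.id_column in self.carried_scalars:
            raise ValueError("a carried scalar output must not shadow the id column")


# Usage: the column names could come from df.columns on a pandas or cudf frame.
plan = ReentryPlan(
    carried_alias="a",
    id_column="id",
    trailing_alias="a",
    carried_scalars={"a_score": "score"},
)
plan.validate(["id", "score", "extra"])
```

Because the plan is frozen and carries only names and an ordering flag, the runtime helper from step 3 could accept it as its single input alongside the seed rows, which is what would make the contract readable from one place.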

Non-Goals

Success Criteria

  • existing bounded-reentry semantics stay unchanged
  • current pandas + cudf bounded-reentry tests stay green
  • the reentry contract becomes readable from one place
  • a reduction in the low hundreds of LOC across lowering.py + gfql_unified.py is plausible from collapsing duplicate protocol layers
  • follow-on work for multi-alias row carriers / optional null-extension becomes easier to reason about

Context

Current bounded-reentry hardening/validation work is in PR #975.
This issue is the follow-on cleanup/refactor lane, not a request to reopen that PR scope.
