10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -8,12 +8,22 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
## [Development]
<!-- Do Not Erase This Section - Used for tracking unreleased changes -->

### Internal
- **GFQL / Cypher bounded-reentry runtime extraction (#987 Step 3)**: Moved bounded-reentry data-frame execution helpers (`_compiled_query_reentry_state`, `_compiled_query_scalar_reentry_state`, `_compiled_query_freeform_reentry_state`, `_freeform_broadcast_row_to_nodes`, `_union_scalar_reentry_results`, `_apply_optional_reentry_null_fill`, `_aligned_reentry_rows`, `_reentry_carry_payload`, `_ordered_reentry_start_nodes`, `_reentry_validation_error`, the two suggestion constants) out of `graphistry/compute/gfql_unified.py` into a new `graphistry/compute/gfql/cypher/reentry/execution.py` module so the bounded-reentry contract assembled at compile time (`ReentryPlan`) and the matching data-frame stitching live next to each other. `_entity_projection_meta_entry` moved to `graphistry/compute/gfql/cypher/result_postprocess.py` next to `WholeRowProjectionMeta` since it is shared between the connected-OPTIONAL-MATCH and bounded-reentry paths. Pure-move refactor — no semantic change; `gfql_unified.py` shrinks by ~440 LOC and now re-exports the moved private names via aliased imports so existing tests reaching into `graphistry.compute.gfql_unified._compiled_query_reentry_state` continue to work.

### Documentation
- **GFQL component-labeling examples + README clarity (#1324)**: Added concise WCC/SCC labeling examples for `compute_cugraph`, `compute_igraph('clusters')`, and local Cypher `CALL graphistry.cugraph.*` write/row modes in GFQL docs, clarified that component IDs are partition labels (not stable semantic IDs), and tightened the main README GFQL intro sentence for readability.
- **GFQL / Cypher docs — variable-length boundary refresh (#973)**: Updated direct-Cypher capability docs (`docs/source/gfql/cypher.rst`, `docs/source/gfql/spec/cypher_mapping.md`) to reflect current support for connected variable-length patterns and bounded/exact variable-length `WHERE` pattern predicates, while preserving explicit fail-fast notes for remaining path/list-carrier and advanced row-shaping gaps.

### Changed
- **GFQL / Cypher lowering — bounded/exact variable-length `WHERE` pattern predicates (#973)**: Removed the pre-normalization compiler gate that rejected bounded/exact variable-length `WHERE` pattern predicates and now lower these shapes through the existing WHERE-pattern rewrite and row-filter paths. Converted the old fail-fast test into positive execution coverage and added boolean-wrapper amplification (`OR`/`XOR`/`NOT`) for bounded variable-length `WHERE` predicates in `graphistry/tests/compute/gfql/cypher/test_lowering.py`.

### Tests
- **GFQL / Cypher two-MATCH reentry varlen regression hardening (#1001)**: Strengthened reentry varlen acceptance assertions from shape-only checks to exact expected rows, and added forward/reverse split-vs-connected query equivalence regressions to guard against wrong-row drift in the `match5-25/26` query family.

### Internal
- **GFQL / Cypher row-carrier follow-through cleanup (#989, post-#1260 split)**: Retired transitional lowering-level bounded-reentry delegator shims (`_map_terminal_reentry_query`, `_drop_bare_alias_items_from_stage`, `_rewrite_multi_whole_row_prefix`, `_compile_bounded_reentry_query`) that only forwarded into `graphistry/compute/gfql/cypher/reentry/runtime.py`. Lowering now calls runtime-owned reentry helpers directly at use sites, and the split-guard tests were trimmed to keep only projection-planning delegator assertions.

## [0.55.1 - 2026-05-05]

### Tests
52 changes: 23 additions & 29 deletions docs/source/gfql/cypher.rst
@@ -206,11 +206,11 @@ Support Matrix
- Execute directly through ``g.gfql("...")``. Helper translation to a single ``Chain`` is stricter.
* - Variable-length relationship patterns
- Partial
- Direct Cypher supports endpoint-only traversals such as ``[*2]``,
``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``, plus bounded
connected multi-relationship patterns where the row shape stays in the
current supported subset. Path/list-carrier uses, bounded/exact
``WHERE`` pattern predicates, and broader branching/path-shaping cases
- Direct Cypher supports endpoint traversals such as ``[*2]``,
``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``; connected
multi-relationship variable-length patterns; and bounded/exact/fixed-point
variable-length ``WHERE`` pattern predicates in the current row-shaped
subset. Path/list-carrier uses and unsupported path/row-shaping cases
still fail fast.
* - ``CREATE`` / ``DELETE`` / ``SET``
- Not supported
@@ -236,9 +236,10 @@ Pattern Matching Forms
- Node labels and multi-label node patterns such as ``(p:Person:Admin)``.
- Relationship direction forms ``->``, ``<-``, and undirected ``-[]-``.
- Relationship type alternation such as ``[r:KNOWS|HATES]``.
- Single variable-length relationship patterns when they are the only
relationship in the connected pattern, including ``[*n]``, ``[*m..n]``,
``[*]``, and typed forms such as ``[:R*2..4]``.
- Single variable-length relationship patterns, including ``[*n]``,
``[*m..n]``, ``[*]``, and typed forms such as ``[:R*2..4]``.
- Connected patterns that mix variable-length and fixed-length relationships,
such as ``MATCH (a)-[:R*2]->()-[:S]->(c) RETURN c``.
- Connected comma-separated patterns such as
``MATCH (a)-[:A]->(b), (b)-[:B]->(c)``.
- Repeated ``MATCH`` clauses when they stay connected through shared aliases.
@@ -255,40 +256,35 @@ WHERE Forms
- Label predicates such as ``WHERE b:Foo:Bar``.
- Relationship-type predicates such as ``WHERE type(r) = 'KNOWS'``.
- Positive relationship-existence pattern predicates such as
``WHERE (n)-[:R]->()`` and bare fixed-point variable-length existence checks
such as ``WHERE (n)-[*]-()``.
- One positive relationship-existence pattern predicate may be combined with
ordinary row filters through top-level ``AND``, for example
``WHERE n.kind = 'x' AND (n)-[:R*]->() AND n.id <> 'a'``.
``WHERE (n)-[:R]->()`` and variable-length existence checks such as
``WHERE (n)-[*]-()`` and ``WHERE (n)-[:R*2]->()``.
- Pattern predicates can be combined with row predicates in the current
boolean subset, including ``AND`` / ``OR`` / ``XOR`` and ``NOT`` forms.
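
A minimal sketch of the filtering semantics described in the boolean-subset bullet above: the pattern predicate is treated as one boolean per row and then combined with an ordinary row predicate. It uses plain Python rather than the GFQL runtime, and the `NODES`, `EDGES`, `has_outgoing`, and `select_ids` names are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy graph: node rows and (source, type, destination) edges.
NODES: List[Dict[str, str]] = [
    {"id": "a", "kind": "x"},
    {"id": "b", "kind": "y"},
    {"id": "c", "kind": "x"},
]
EDGES: List[Tuple[str, str, str]] = [("a", "R", "b"), ("b", "R", "c")]


def has_outgoing(node_id: str, rel_type: str) -> bool:
    """Stand-in for a pattern predicate such as ``(n)-[:R]->()``."""
    return any(src == node_id and typ == rel_type for src, typ, _ in EDGES)


def select_ids(predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep the ids of the rows for which the combined predicate holds."""
    return [row["id"] for row in NODES if predicate(row)]


# WHERE n.kind = 'x' AND (n)-[:R]->()
and_ids = select_ids(lambda n: n["kind"] == "x" and has_outgoing(n["id"], "R"))
# WHERE n.kind = 'x' OR (n)-[:R]->()
or_ids = select_ids(lambda n: n["kind"] == "x" or has_outgoing(n["id"], "R"))
# WHERE n.kind = 'x' XOR (n)-[:R]->()  (exactly one side is true)
xor_ids = select_ids(lambda n: (n["kind"] == "x") != has_outgoing(n["id"], "R"))
# WHERE NOT (n)-[:R]->()
not_ids = select_ids(lambda n: not has_outgoing(n["id"], "R"))

print(and_ids, or_ids, xor_ids, not_ids)
```

The point of the sketch is that once the pattern predicate is reduced to a per-row boolean, the wrappers compose like any other row filter. How the compiler actually lowers these shapes is not shown here.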

Variable-Length Relationship Boundary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Direct Cypher multihop support is intentionally narrow in the current landing
slice. The supported direct forms include endpoint traversals and bounded
connected multi-relationship patterns where the result stays in the current
row-shaping subset, for example:
Direct Cypher multihop support remains intentionally bounded. The supported
direct forms include endpoint traversals, connected multi-relationship
patterns, and variable-length ``WHERE`` pattern predicates where the result
stays in the current row-shaping subset, for example:

- ``MATCH (a)-[*2]->(b) RETURN b``
- ``MATCH (a)-[:R*1..3]->(b) RETURN b``
- ``MATCH (a)<-[*2]-(b) RETURN b``
- ``MATCH (a)-[:R*1..2]-(b) RETURN b``
- ``MATCH (a)-[:R*2]->(b)-[:S]->(c) RETURN c``
- ``MATCH (a)-[:R]->(b), (b)-[:S*1..2]->(c) RETURN a.id AS a_id, c.id AS c_id``
- ``MATCH (n) WHERE (n)-[:R*2]->() RETURN n``
- ``MATCH (n) WHERE NOT (n)-[:R*2]->() RETURN n.id AS id``
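
The exact and bounded ``WHERE`` pattern predicates listed above can be read as hop-bounded existence checks. The sketch below illustrates that reading in plain Python under simplifying assumptions: it follows directed walks only, ignores Cypher's relationship-uniqueness rule, and does not use the GFQL runtime. The `EDGES`, `NODES`, and `has_bounded_walk` names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical toy graph as (source, type, destination) triples.
EDGES: List[Tuple[str, str, str]] = [
    ("a", "R", "b"),
    ("b", "R", "c"),
    ("c", "S", "d"),
]
NODES: List[str] = ["a", "b", "c", "d"]


def has_bounded_walk(
    edges: List[Tuple[str, str, str]],
    start: str,
    rel_type: str,
    min_hops: int,
    max_hops: int,
) -> bool:
    """Return True if a directed walk of rel_type edges with a length in
    [min_hops, max_hops] starts at ``start``."""
    if min_hops <= 0:
        return True  # a zero-length walk always exists
    frontier: Set[str] = {start}
    for hop in range(1, max_hops + 1):
        # Advance every frontier node by one edge of the requested type.
        frontier = {
            dst for src, typ, dst in edges if typ == rel_type and src in frontier
        }
        if not frontier:
            return False  # no walk of this length, so none longer either
        if hop >= min_hops:
            return True
    return False


# MATCH (n) WHERE (n)-[:R*2]->() RETURN n
exact_two = [n for n in NODES if has_bounded_walk(EDGES, n, "R", 2, 2)]
# MATCH (n) WHERE NOT (n)-[:R*2]->() RETURN n.id AS id
not_exact_two = [n for n in NODES if not has_bounded_walk(EDGES, n, "R", 2, 2)]
# MATCH (n) WHERE (n)-[:R*1..2]->() RETURN n
one_to_two = [n for n in NODES if has_bounded_walk(EDGES, n, "R", 1, 2)]

print(exact_two, not_exact_two, one_to_two)
```

Expanding the frontier hop by hop keeps the check bounded by ``max_hops``, which is why exact (``*2``) and ranged (``*1..2``) forms share one routine in this sketch.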

The current compiler explicitly rejects these remaining subfamilies with
``GFQLValidationError`` instead of attempting unsound execution:

- path/list-carrier use of a variable-length relationship alias, such as
``RETURN r`` or ``count(r)``
- exact or bounded variable-length ``WHERE`` pattern predicates such as
``WHERE (n)-[:R*2]-()``
- top-level ``OR`` / ``NOT`` around variable-length ``WHERE`` pattern
predicates, or more than one positive pattern predicate in the same
``WHERE`` clause
- branching connected multihop patterns, or shapes that would require
unsupported path/relationship-carrier row shaping around a variable-length
segment
- shapes that still require unsupported path/relationship-carrier row shaping
around a variable-length segment
- connected multi-pattern relationship-alias projection such as
``RETURN r`` / ``r.prop`` when it would require unsupported row shaping
- multi-alias ``RETURN *`` projections that would require unsupported
@@ -431,10 +427,8 @@ Not Supported Today

- Variable-length relationship aliases used as path/list carriers, such as
``RETURN r`` or ``count(r)``.
- Exact or bounded variable-length ``WHERE`` pattern predicates such as
``WHERE (n)-[:R*2]-()``.
- Branching connected multihop patterns, or connected multihop shapes that
still require unsupported path/relationship-carrier row shaping.
- Connected multihop shapes that still require unsupported
path/relationship-carrier row shaping.
- Multiple disconnected ``MATCH`` patterns used as arbitrary joins.
- Multi-pattern re-entry shapes beyond the bounded single
``MATCH ... WITH ... MATCH ... RETURN`` form.
20 changes: 10 additions & 10 deletions docs/source/gfql/spec/cypher_mapping.md
@@ -36,11 +36,11 @@ When translating from Cypher, you'll encounter three scenarios:
### Direct Translations
- Graph patterns: `(a)-[r]->(b)` → chain operations
- Property filters: WHERE clauses embed into operations
- Path traversals: direct `g.gfql("MATCH ...")` supports endpoint-only single
variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`.
Native GFQL still gives you the full explicit hop surface, including output
slicing, intermediate-hop aliasing, and rewrites for currently unsupported
direct-Cypher multihop shapes.
- Path traversals: direct `g.gfql("MATCH ...")` supports single and connected
variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`,
including bounded/exact variable-length `WHERE` pattern predicates in the
current row-shaped subset. Native GFQL still gives you the full explicit hop
surface (output slicing, intermediate-hop aliasing, and custom rewrites).
- Pattern composition: Multiple patterns become sequential operations
- Same-path constraints: `WHERE` across steps → `g.gfql([...], where=[...])`

@@ -255,10 +255,10 @@ g.gfql([
### Edge Patterns

Rows using `[*...]` below show the native GFQL rewrite for the same traversal
intent. Direct `g.gfql("MATCH ...")` now accepts these endpoint-only
single-variable-length relationship forms, while native GFQL remains the more
explicit option when you need intermediate-hop control or unsupported mixed
pattern shapes.
intent. Direct `g.gfql("MATCH ...")` accepts these variable-length forms in
the supported direct-Cypher subset, while native GFQL remains the more explicit
option when you need intermediate-hop control or advanced path/list-carrier
semantics.

| Cypher / intent | Python | Wire Protocol (compact) |
|-----------------|--------|-------------------------|
@@ -274,7 +274,7 @@ pattern shapes.
| `-[r:BOUGHT {amount: gt(100)}]->` | `e_forward({"type": "BOUGHT", "amount": gt(100)}, name="r")` | `{"type": "Edge", "direction": "forward", "edge_match": {"type": "BOUGHT", "amount": {"type": "GT", "val": 100}}, "name": "r"}` |

When you need constraints on intermediate hops, path/list-carrier semantics, or
mixed connected patterns beyond the current direct-Cypher subset, use repeated
advanced row-shaping beyond the current direct-Cypher subset, use repeated
single-hop GFQL steps with aliases instead of collapsing the traversal into one
multihop edge operator.

83 changes: 6 additions & 77 deletions graphistry/compute/gfql/cypher/lowering.py
@@ -5590,34 +5590,6 @@ def _is_variable_length_relationship_pattern(relationship: RelationshipPattern)
)


def _reject_unsupported_variable_length_where_pattern_predicates(query: CypherQuery) -> None:
    if query.where is None:
        return
    predicates: List[WherePatternPredicate] = [
        predicate for predicate in query.where.predicates if isinstance(predicate, WherePatternPredicate)
    ]
    if query.where.expr_tree is not None:
        predicates.extend(_where_expr_tree_pattern_predicates(query.where.expr_tree))
    for predicate in predicates:
        relationships = [
            element
            for element in predicate.pattern
            if isinstance(element, RelationshipPattern)
        ]
        for relationship in relationships:
            if not _is_variable_length_relationship_pattern(relationship):
                continue
            if relationship.min_hops is None and relationship.max_hops is None and relationship.to_fixed_point:
                continue
            raise _unsupported(
                "Cypher WHERE pattern predicates currently support only bare variable-length fixed-point relationships, not exact or bounded hop counts",
                field="where",
                value=boolean_expr_to_text(query.where.expr_tree) if query.where.expr_tree is not None else None,
                line=predicate.span.line,
                column=predicate.span.column,
            )


def _reject_nonterminal_variable_length_relationship_patterns(query: CypherQuery) -> None: # noqa: ARG001
"""No-op: variable-length rels in connected patterns are now supported.

@@ -7243,8 +7215,10 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText:
# `_collect_secondary_property_refs` would fail-fast on what is in fact a
# forwarding pattern, blocking IC3 even after #1248 admits the prefix WITH.
secondary_forwarding_re = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

cleaned_with_stages_tail = tuple(
_drop_bare_alias_items_from_stage(
_reentry_runtime._drop_bare_alias_items_from_stage(
stage, secondary_aliases, identifier_re=secondary_forwarding_re
)
for stage in query.with_stages[1:]
@@ -7373,52 +7347,6 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText:
return rewritten_query, rewritten_prefix_stage, tuple(sorted(secondary_aliases))


def _map_terminal_reentry_query(
    compiled_query: CompiledCypherQuery,
    *,
    transform: Callable[[CompiledCypherQuery], CompiledCypherQuery],
) -> CompiledCypherQuery:
    from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

    return _reentry_runtime._map_terminal_reentry_query(compiled_query, transform=transform)


def _drop_bare_alias_items_from_stage(
    stage: ProjectionStage,
    aliases: AbstractSet[str],
    *,
    identifier_re: "re.Pattern[str]",
) -> ProjectionStage:
    from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

    return _reentry_runtime._drop_bare_alias_items_from_stage(stage, aliases, identifier_re=identifier_re)


def _rewrite_multi_whole_row_prefix(
    prefix_stage: ProjectionStage,
    *,
    query: CypherQuery,
    reentry_first_alias: Optional[str],
) -> Tuple[ProjectionStage, Tuple[ProjectionStage, ...], Dict[str, Tuple[str, ...]]]:
    from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

    return _reentry_runtime._rewrite_multi_whole_row_prefix(
        prefix_stage,
        query=query,
        reentry_first_alias=reentry_first_alias,
    )


def _compile_bounded_reentry_query(
    query: CypherQuery,
    *,
    params: Optional[Mapping[str, Any]] = None,
) -> CompiledCypherQuery:
    from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

    return _reentry_runtime._compile_bounded_reentry_query(query, params=params)

def _compile_call_query(
query: CypherQuery,
*,
@@ -8331,7 +8259,6 @@ def _attach_graph_context(result: CompiledCypherQuery) -> CompiledCypherQuery:

normalizer = ASTNormalizer()
query = normalizer.rewrite_shortest_path(query)
_reject_unsupported_variable_length_where_pattern_predicates(query)
_reject_variable_length_path_alias_references(query, params=params)
query = normalizer.rewrite_where_pattern_predicates(query)

@@ -8349,7 +8276,9 @@ def _attach_graph_context(result: CompiledCypherQuery) -> CompiledCypherQuery:
params=params,
)
if query.reentry_matches:
return _attach_graph_context(_compile_bounded_reentry_query(query, params=params))
from graphistry.compute.gfql.cypher.reentry import runtime as _reentry_runtime

return _attach_graph_context(_reentry_runtime._compile_bounded_reentry_query(query, params=params))
if query.call is not None:
return _attach_graph_context(_compile_call_query(query, params=params))
if query.row_sequence:
2 changes: 2 additions & 0 deletions graphistry/compute/gfql/cypher/reentry/__init__.py
@@ -6,6 +6,8 @@
- prefix carry-column / order helpers (``carry``)
- AST/query rewriters that retarget reentry expressions onto carried columns
(``rewrite``)
- compile-time bounded-reentry query rewrites (``runtime``)
- data-frame execution stitching for bounded reentry (``execution``; #987 Step 3)

Public symbols are re-exported from ``cypher.lowering`` so existing imports
(``from graphistry.compute.gfql.cypher.lowering import _reentry_hidden_column_name``)