diff --git a/CHANGELOG.md b/CHANGELOG.md index e060a73887..e06ab1765e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Documentation - **GFQL component-labeling examples + README clarity (#1324)**: Added concise WCC/SCC labeling examples for `compute_cugraph`, `compute_igraph('clusters')`, and local Cypher `CALL graphistry.cugraph.*` write/row modes in GFQL docs, clarified that component IDs are partition labels (not stable semantic IDs), and tightened the main README GFQL intro sentence for readability. +- **GFQL / Cypher docs — variable-length boundary refresh (#973)**: Updated direct-Cypher capability docs (`docs/source/gfql/cypher.rst`, `docs/source/gfql/spec/cypher_mapping.md`) to reflect current support for connected variable-length patterns and bounded/exact variable-length `WHERE` pattern predicates, while preserving explicit fail-fast notes for remaining path/list-carrier and advanced row-shaping gaps. + +### Changed +- **GFQL / Cypher lowering — bounded/exact variable-length `WHERE` pattern predicates (#973)**: Removed the pre-normalization compiler gate that rejected bounded/exact variable-length `WHERE` pattern predicates and now lower these shapes through the existing WHERE-pattern rewrite and row-filter paths. Converted the old fail-fast test into positive execution coverage and added boolean-wrapper amplification (`OR`/`XOR`/`NOT`) for bounded variable-length `WHERE` predicates in `graphistry/tests/compute/gfql/cypher/test_lowering.py`. ## [0.55.1 - 2026-05-05] diff --git a/docs/source/gfql/cypher.rst b/docs/source/gfql/cypher.rst index 31652697e9..4e4dad0188 100644 --- a/docs/source/gfql/cypher.rst +++ b/docs/source/gfql/cypher.rst @@ -206,11 +206,11 @@ Support Matrix - Execute directly through ``g.gfql("...")``. Helper translation to a single ``Chain`` is stricter. 
* - Variable-length relationship patterns - Partial - - Direct Cypher supports endpoint-only traversals such as ``[*2]``, - ``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``, plus bounded - connected multi-relationship patterns where the row shape stays in the - current supported subset. Path/list-carrier uses, bounded/exact - ``WHERE`` pattern predicates, and broader branching/path-shaping cases + - Direct Cypher supports endpoint traversals such as ``[*2]``, + ``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``; connected + multi-relationship variable-length patterns; and bounded/exact/fixed-point + variable-length ``WHERE`` pattern predicates in the current row-shaped + subset. Path/list-carrier uses and unsupported path/row-shaping cases still fail fast. * - ``CREATE`` / ``DELETE`` / ``SET`` - Not supported @@ -236,9 +236,10 @@ Pattern Matching Forms - Node labels and multi-label node patterns such as ``(p:Person:Admin)``. - Relationship direction forms ``->``, ``<-``, and undirected ``-[]-``. - Relationship type alternation such as ``[r:KNOWS|HATES]``. -- Single variable-length relationship patterns when they are the only - relationship in the connected pattern, including ``[*n]``, ``[*m..n]``, - ``[*]``, and typed forms such as ``[:R*2..4]``. +- Single variable-length relationship patterns, including ``[*n]``, + ``[*m..n]``, ``[*]``, and typed forms such as ``[:R*2..4]``. +- Connected patterns that mix variable-length and fixed-length relationships, + such as ``MATCH (a)-[:R*2]->()-[:S]->(c) RETURN c``. - Connected comma-separated patterns such as ``MATCH (a)-[:A]->(b), (b)-[:B]->(c)``. - Repeated ``MATCH`` clauses when they stay connected through shared aliases. @@ -255,19 +256,18 @@ WHERE Forms - Label predicates such as ``WHERE b:Foo:Bar``. - Relationship-type predicates such as ``WHERE type(r) = 'KNOWS'``. 
- Positive relationship-existence pattern predicates such as - ``WHERE (n)-[:R]->()`` and bare fixed-point variable-length existence checks - such as ``WHERE (n)-[*]-()``. -- One positive relationship-existence pattern predicate may be combined with - ordinary row filters through top-level ``AND``, for example - ``WHERE n.kind = 'x' AND (n)-[:R*]->() AND n.id <> 'a'``. + ``WHERE (n)-[:R]->()`` and variable-length existence checks such as + ``WHERE (n)-[*]-()`` and ``WHERE (n)-[:R*2]->()``. +- Pattern predicates can be combined with row predicates in the current + boolean subset, including ``AND`` / ``OR`` / ``XOR`` and ``NOT`` forms. Variable-Length Relationship Boundary ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Direct Cypher multihop support is intentionally narrow in the current landing -slice. The supported direct forms include endpoint traversals and bounded -connected multi-relationship patterns where the result stays in the current -row-shaping subset, for example: +Direct Cypher multihop support remains intentionally bounded. 
The supported +direct forms include endpoint traversals, connected multi-relationship +patterns, and variable-length ``WHERE`` pattern predicates where the result +stays in the current row-shaping subset, for example: - ``MATCH (a)-[*2]->(b) RETURN b`` - ``MATCH (a)-[:R*1..3]->(b) RETURN b`` @@ -275,20 +275,16 @@ row-shaping subset, for example: - ``MATCH (a)-[:R*1..2]-(b) RETURN b`` - ``MATCH (a)-[:R*2]->(b)-[:S]->(c) RETURN c`` - ``MATCH (a)-[:R]->(b), (b)-[:S*1..2]->(c) RETURN a.id AS a_id, c.id AS c_id`` +- ``MATCH (n) WHERE (n)-[:R*2]->() RETURN n`` +- ``MATCH (n) WHERE NOT (n)-[:R*2]->() RETURN n.id AS id`` The current compiler explicitly rejects these remaining subfamilies with ``GFQLValidationError`` instead of attempting unsound execution: - path/list-carrier use of a variable-length relationship alias, such as ``RETURN r`` or ``count(r)`` -- exact or bounded variable-length ``WHERE`` pattern predicates such as - ``WHERE (n)-[:R*2]-()`` -- top-level ``OR`` / ``NOT`` around variable-length ``WHERE`` pattern - predicates, or more than one positive pattern predicate in the same - ``WHERE`` clause -- branching connected multihop patterns, or shapes that would require - unsupported path/relationship-carrier row shaping around a variable-length - segment +- shapes that still require unsupported path/relationship-carrier row shaping + around a variable-length segment - connected multi-pattern relationship-alias projection such as ``RETURN r`` / ``r.prop`` when it would require unsupported row shaping - multi-alias ``RETURN *`` projections that would require unsupported @@ -431,10 +427,8 @@ Not Supported Today - Variable-length relationship aliases used as path/list carriers, such as ``RETURN r`` or ``count(r)``. -- Exact or bounded variable-length ``WHERE`` pattern predicates such as - ``WHERE (n)-[:R*2]-()``. -- Branching connected multihop patterns, or connected multihop shapes that - still require unsupported path/relationship-carrier row shaping. 
+- Connected multihop shapes that still require unsupported + path/relationship-carrier row shaping. - Multiple disconnected ``MATCH`` patterns used as arbitrary joins. - Multi-pattern re-entry shapes beyond the bounded single ``MATCH ... WITH ... MATCH ... RETURN`` form. diff --git a/docs/source/gfql/spec/cypher_mapping.md b/docs/source/gfql/spec/cypher_mapping.md index a920bfe116..754611ab5c 100644 --- a/docs/source/gfql/spec/cypher_mapping.md +++ b/docs/source/gfql/spec/cypher_mapping.md @@ -36,11 +36,11 @@ When translating from Cypher, you'll encounter three scenarios: ### Direct Translations - Graph patterns: `(a)-[r]->(b)` → chain operations - Property filters: WHERE clauses embed into operations -- Path traversals: direct `g.gfql("MATCH ...")` supports endpoint-only single - variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`. - Native GFQL still gives you the full explicit hop surface, including output - slicing, intermediate-hop aliasing, and rewrites for currently unsupported - direct-Cypher multihop shapes. +- Path traversals: direct `g.gfql("MATCH ...")` supports single and connected + variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`, + including bounded/exact variable-length `WHERE` pattern predicates in the + current row-shaped subset. Native GFQL still gives you the full explicit hop + surface (output slicing, intermediate-hop aliasing, and custom rewrites). - Pattern composition: Multiple patterns become sequential operations - Same-path constraints: `WHERE` across steps → `g.gfql([...], where=[...])` @@ -255,10 +255,10 @@ g.gfql([ ### Edge Patterns Rows using `[*...]` below show the native GFQL rewrite for the same traversal -intent. Direct `g.gfql("MATCH ...")` now accepts these endpoint-only -single-variable-length relationship forms, while native GFQL remains the more -explicit option when you need intermediate-hop control or unsupported mixed -pattern shapes. +intent. 
Direct `g.gfql("MATCH ...")` accepts these variable-length forms in +the supported direct-Cypher subset, while native GFQL remains the more explicit +option when you need intermediate-hop control or advanced path/list-carrier +semantics. | Cypher / intent | Python | Wire Protocol (compact) | |-----------------|--------|-------------------------| @@ -274,7 +274,7 @@ pattern shapes. | `-[r:BOUGHT {amount: gt(100)}]->` | `e_forward({"type": "BOUGHT", "amount": gt(100)}, name="r")` | `{"type": "Edge", "direction": "forward", "edge_match": {"type": "BOUGHT", "amount": {"type": "GT", "val": 100}}, "name": "r"}` | When you need constraints on intermediate hops, path/list-carrier semantics, or -mixed connected patterns beyond the current direct-Cypher subset, use repeated +advanced row-shaping beyond the current direct-Cypher subset, use repeated single-hop GFQL steps with aliases instead of collapsing the traversal into one multihop edge operator. diff --git a/graphistry/compute/gfql/cypher/lowering.py b/graphistry/compute/gfql/cypher/lowering.py index a9e83b3bd9..2d6fdedf41 100644 --- a/graphistry/compute/gfql/cypher/lowering.py +++ b/graphistry/compute/gfql/cypher/lowering.py @@ -5590,34 +5590,6 @@ def _is_variable_length_relationship_pattern(relationship: RelationshipPattern) ) -def _reject_unsupported_variable_length_where_pattern_predicates(query: CypherQuery) -> None: - if query.where is None: - return - predicates: List[WherePatternPredicate] = [ - predicate for predicate in query.where.predicates if isinstance(predicate, WherePatternPredicate) - ] - if query.where.expr_tree is not None: - predicates.extend(_where_expr_tree_pattern_predicates(query.where.expr_tree)) - for predicate in predicates: - relationships = [ - element - for element in predicate.pattern - if isinstance(element, RelationshipPattern) - ] - for relationship in relationships: - if not _is_variable_length_relationship_pattern(relationship): - continue - if relationship.min_hops is None and 
relationship.max_hops is None and relationship.to_fixed_point: - continue - raise _unsupported( - "Cypher WHERE pattern predicates currently support only bare variable-length fixed-point relationships, not exact or bounded hop counts", - field="where", - value=boolean_expr_to_text(query.where.expr_tree) if query.where.expr_tree is not None else None, - line=predicate.span.line, - column=predicate.span.column, - ) - - def _reject_nonterminal_variable_length_relationship_patterns(query: CypherQuery) -> None: # noqa: ARG001 """No-op: variable-length rels in connected patterns are now supported. @@ -8331,7 +8303,6 @@ def _attach_graph_context(result: CompiledCypherQuery) -> CompiledCypherQuery: normalizer = ASTNormalizer() query = normalizer.rewrite_shortest_path(query) - _reject_unsupported_variable_length_where_pattern_predicates(query) _reject_variable_length_path_alias_references(query, params=params) query = normalizer.rewrite_where_pattern_predicates(query) diff --git a/graphistry/tests/compute/gfql/cypher/test_lowering.py b/graphistry/tests/compute/gfql/cypher/test_lowering.py index b6b9b3c81e..9b691997ad 100644 --- a/graphistry/tests/compute/gfql/cypher/test_lowering.py +++ b/graphistry/tests/compute/gfql/cypher/test_lowering.py @@ -1738,6 +1738,48 @@ def test_lower_match_query_emits_row_anti_semi_filter_for_bound_alias_negated_wh assert [op.get("type") for op in binding_ops] == ["Node", "Edge", "Node"] +def test_lower_match_query_emits_row_anti_semi_filter_for_bound_alias_negated_bounded_varlen_where_pattern() -> None: + lowered = lower_match_query( + _parse_query("MATCH (a)-[:R]->(b) WHERE NOT (b)-[:R*1..2]->(a) RETURN a.id AS a_id, b.id AS b_id") + ) + + assert len(lowered.row_pre_filters) == 1 + anti = lowered.row_pre_filters[0] + assert isinstance(anti, ASTCall) + assert anti.function == "anti_semi_apply" + assert anti.params.get("join_aliases") == ["b", "a"] + binding_ops = anti.params.get("binding_ops") + assert isinstance(binding_ops, list) + assert 
[op.get("type") for op in binding_ops] == ["Node", "Edge", "Node"] + edge = binding_ops[1] + assert edge.get("min_hops") == 1 + assert edge.get("max_hops") == 2 + assert edge.get("to_fixed_point") is False + + +def test_lower_match_query_emits_row_marker_for_xor_wrapped_bounded_varlen_where_pattern() -> None: + lowered = lower_match_query( + _parse_query("MATCH (n) WHERE (n)-[:R*2]->() XOR n.id = 'd' RETURN n.id AS id") + ) + + assert len(lowered.row_pre_filters) == 1 + marker = lowered.row_pre_filters[0] + assert isinstance(marker, ASTCall) + assert marker.function == "semi_apply_mark" + assert marker.params.get("join_aliases") == ["n"] + out_col = marker.params.get("out_col") + assert isinstance(out_col, str) and out_col.startswith("__gfql_where_pattern_") + assert lowered.row_where is not None + assert " XOR " in lowered.row_where.text + assert out_col in lowered.row_where.text + binding_ops = marker.params.get("binding_ops") + assert isinstance(binding_ops, list) + edge = binding_ops[1] + assert edge.get("min_hops") == 2 + assert edge.get("max_hops") == 2 + assert edge.get("to_fixed_point") is False + + def test_lower_match_query_rejects_where_pattern_predicate_introducing_new_aliases() -> None: with pytest.raises(GFQLValidationError, match="cannot introduce new aliases"): lower_cypher_query(_parse_query("MATCH (n) WHERE (n)-[r]->(a) RETURN n")) @@ -5258,22 +5300,119 @@ def test_connected_variable_length_typed_mixed() -> None: @pytest.mark.parametrize( - "query", + "query,expected_rows", [ - "MATCH (n) WHERE (n)-[:REL1*2]-() RETURN n", - "MATCH (n) WHERE (n)-[*2]-() RETURN n", - "MATCH (n) WHERE (n)<-[:REL1*1..2]-() RETURN n", - "MATCH (n) WHERE (n)-[:REL1*2]-() AND n.id <> 'a' RETURN n", + ( + "MATCH (n) WHERE (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id", + [{"id": "a"}, {"id": "b"}, {"id": "c"}], + ), + ( + "MATCH (n) WHERE (n)-[*2]->() RETURN n.id AS id ORDER BY id", + [{"id": "a"}, {"id": "b"}, {"id": "c"}], + ), + ( + "MATCH (n) WHERE 
(n)<-[:REL1*1..2]-() RETURN n.id AS id ORDER BY id", + [{"id": "b"}, {"id": "c"}, {"id": "d"}], + ), + ( + "MATCH (n) WHERE (n)-[:REL1*2]->() AND n.id <> 'a' RETURN n.id AS id ORDER BY id", + [{"id": "b"}, {"id": "c"}], + ), ], ) -def test_string_cypher_failfast_rejects_bounded_variable_length_where_pattern_predicates(query: str) -> None: - graph = _mk_empty_graph() +def test_string_cypher_executes_bounded_variable_length_where_pattern_predicates( + query: str, + expected_rows: list[dict[str, object]], +) -> None: + graph = _mk_graph( + pd.DataFrame({"id": ["a", "b", "c", "d"]}), + pd.DataFrame( + { + "s": ["a", "b", "c"], + "d": ["b", "c", "d"], + "type": ["REL1", "REL1", "REL1"], + } + ), + ) - with pytest.raises(GFQLValidationError) as exc_info: - graph.gfql(query) + result = graph.gfql(query) + assert result._nodes.to_dict(orient="records") == expected_rows - assert exc_info.value.code == ErrorCode.E108 - assert "WHERE pattern predicates" in exc_info.value.message + +@pytest.mark.parametrize( + "query,expected_rows", + [ + ( + "MATCH (n) WHERE (n)-[:REL1*2]->() OR n.id = 'd' RETURN n.id AS id ORDER BY id", + [{"id": "a"}, {"id": "b"}, {"id": "d"}], + ), + ( + "MATCH (n) WHERE (n)-[:REL1*2]->() XOR n.id = 'd' RETURN n.id AS id ORDER BY id", + [{"id": "a"}, {"id": "b"}, {"id": "d"}], + ), + ( + "MATCH (n) WHERE NOT (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id", + [{"id": "c"}, {"id": "d"}], + ), + ], +) +def test_string_cypher_executes_bounded_variable_length_where_pattern_boolean_wrappers( + query: str, + expected_rows: list[dict[str, object]], +) -> None: + graph = _mk_graph( + pd.DataFrame({"id": ["a", "b", "c", "d"]}), + pd.DataFrame( + { + "s": ["a", "b", "c"], + "d": ["b", "c", "d"], + "type": ["REL1", "REL1", "REL1"], + } + ), + ) + + result = graph.gfql(query) + assert result._nodes.to_dict(orient="records") == expected_rows + + +def test_string_cypher_executes_conjoined_bounded_varlen_where_predicates_across_edge_types() -> None: + graph = _mk_graph( 
+ pd.DataFrame({"id": ["a", "b", "c", "d", "e"]}), + pd.DataFrame( + { + "s": ["a", "b", "c", "a", "b"], + "d": ["b", "c", "d", "e", "e"], + "type": ["REL1", "REL1", "REL1", "REL2", "REL2"], + } + ), + ) + + rows_forward = graph.gfql( + "MATCH (n) WHERE (n)-[:REL1*2]->() AND (n)-[:REL2*1]->() RETURN n.id AS id ORDER BY id" + )._nodes.to_dict(orient="records") + assert rows_forward == [{"id": "b"}] + + +def test_string_cypher_executes_xor_between_bounded_reverse_and_forward_where_patterns() -> None: + graph = _mk_graph( + pd.DataFrame({"id": ["a", "b", "c", "d"]}), + pd.DataFrame( + { + "s": ["a", "b", "c"], + "d": ["b", "c", "d"], + "type": ["REL1", "REL1", "REL1"], + } + ), + ) + + result = graph.gfql( + "MATCH (n) WHERE (n)<-[:REL1*1..2]-() XOR (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id" + ) + assert result._nodes.to_dict(orient="records") == [ + {"id": "a"}, + {"id": "c"}, + {"id": "d"}, + ]
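
Reviewer sketch (not part of the patch): the boolean-wrapper expectations above can be cross-checked against a plain reading of the Cypher predicates on the same `a -> b -> c -> d` chain fixture. This is a minimal, dependency-free illustration only. It does not call graphistry and it does not model the GFQL lowering or the `semi_apply_mark` / `anti_semi_apply` row filters. The helper names `has_varlen_path` and `select` are hypothetical and exist only for this sketch. It also treats a match as any walk of the requested length, which is presumably equivalent to Cypher path matching on this acyclic fixture but is an assumption in general.

```python
from typing import Callable, Dict, List, Set, Tuple

# Fixture copied from the tests in this diff: a -> b -> c -> d, all typed REL1.
NODES: List[str] = ["a", "b", "c", "d"]
EDGES: List[Tuple[str, str, str]] = [
    ("a", "b", "REL1"),
    ("b", "c", "REL1"),
    ("c", "d", "REL1"),
]


def has_varlen_path(
    start: str,
    rel_type: str,
    min_hops: int,
    max_hops: int,
    reverse: bool = False,
) -> bool:
    """Hypothetical helper: True if a walk of min_hops..max_hops edges of
    rel_type leaves `start` (following edges backwards when reverse=True)."""
    adjacency: Dict[str, Set[str]] = {}
    for src, dst, etype in EDGES:
        if etype != rel_type:
            continue
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, set()).add(dst)

    frontier: Set[str] = {start}
    for hop in range(1, max_hops + 1):
        # Advance every frontier node by one edge.
        frontier = {nxt for node in frontier for nxt in adjacency.get(node, set())}
        if not frontier:
            return False  # no walk of this length exists, so none longer does
        if hop >= min_hops:
            return True
    return False


def select(predicate: Callable[[str], bool]) -> List[str]:
    """Hypothetical helper: ids of nodes satisfying the predicate, sorted like ORDER BY id."""
    return sorted(n for n in NODES if predicate(n))


def forward_2(n: str) -> bool:
    # Mirrors (n)-[:REL1*2]->()
    return has_varlen_path(n, "REL1", 2, 2)


def reverse_1_2(n: str) -> bool:
    # Mirrors (n)<-[:REL1*1..2]-()
    return has_varlen_path(n, "REL1", 1, 2, reverse=True)


# WHERE NOT (n)-[:REL1*2]->()
not_rows = select(lambda n: not forward_2(n))
# WHERE (n)-[:REL1*2]->() OR n.id = 'd'
or_rows = select(lambda n: forward_2(n) or n == "d")
# WHERE (n)-[:REL1*2]->() XOR n.id = 'd'
xor_literal_rows = select(lambda n: forward_2(n) != (n == "d"))
# WHERE (n)<-[:REL1*1..2]-() XOR (n)-[:REL1*2]->()
xor_pattern_rows = select(lambda n: reverse_1_2(n) != forward_2(n))

print(not_rows, or_rows, xor_literal_rows, xor_pattern_rows)
```

Under these assumptions the four lists line up with the `NOT`, `OR`, and both `XOR` expectations in the boolean-wrapper tests in this diff, which makes the sketch a quick way to sanity-check further parametrized cases before adding them.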