Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

### Documentation
- **GFQL component-labeling examples + README clarity (#1324)**: Added concise WCC/SCC labeling examples for `compute_cugraph`, `compute_igraph('clusters')`, and local Cypher `CALL graphistry.cugraph.*` write/row modes in GFQL docs, clarified that component IDs are partition labels (not stable semantic IDs), and tightened the main README GFQL intro sentence for readability.
- **GFQL / Cypher docs — variable-length boundary refresh (#973)**: Updated direct-Cypher capability docs (`docs/source/gfql/cypher.rst`, `docs/source/gfql/spec/cypher_mapping.md`) to reflect current support for connected variable-length patterns and bounded/exact variable-length `WHERE` pattern predicates, while preserving explicit fail-fast notes for remaining path/list-carrier and advanced row-shaping gaps.

### Changed
- **GFQL / Cypher lowering — bounded/exact variable-length `WHERE` pattern predicates (#973)**: Removed the pre-normalization compiler gate that rejected bounded/exact variable-length `WHERE` pattern predicates and now lower these shapes through the existing WHERE-pattern rewrite and row-filter paths. Converted the old fail-fast test into positive execution coverage and added boolean-wrapper amplification (`OR`/`XOR`/`NOT`) for bounded variable-length `WHERE` predicates in `graphistry/tests/compute/gfql/cypher/test_lowering.py`.

## [0.55.1 - 2026-05-05]

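The #1324 entry says component IDs are partition labels, not stable semantic IDs. The stdlib-only sketch below shows what that means in practice. It swaps in union-find as a simple WCC labeler, so it is not `compute_cugraph` or `compute_igraph`, and `wcc_labels` / `same_partition` are hypothetical helpers. Labeling the same graph with a different node order changes the numbers but not the grouping.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple


def wcc_labels(nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Weakly connected component labels via union-find; label values are arbitrary."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, d in edges:  # edge direction is ignored, so components are weak
        rs, rd = find(s), find(d)
        if rs != rd:
            parent[rd] = rs

    # Number components in first-seen order; reordering the input changes the numbers.
    label_of_root: Dict[Hashable, int] = {}
    return {n: label_of_root.setdefault(find(n), len(label_of_root)) for n in parent}


def same_partition(x: Dict[Hashable, int], y: Dict[Hashable, int]) -> bool:
    """Compare labelings as partitions: which nodes share a label, not the label values."""

    def groups(labels: Dict[Hashable, int]) -> Set[FrozenSet[Hashable]]:
        by_label: Dict[int, Set[Hashable]] = {}
        for node, label in labels.items():
            by_label.setdefault(label, set()).add(node)
        return {frozenset(g) for g in by_label.values()}

    return x.keys() == y.keys() and groups(x) == groups(y)


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
labels1 = wcc_labels(nodes, edges)
labels2 = wcc_labels(reversed(nodes), edges)  # same graph, different node order
```

Comparing partitions instead of raw IDs is the safer way to test component output.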
52 changes: 23 additions & 29 deletions docs/source/gfql/cypher.rst
@@ -206,11 +206,11 @@ Support Matrix
- Execute directly through ``g.gfql("...")``. Helper translation to a single ``Chain`` is stricter.
* - Variable-length relationship patterns
- Partial
-- Direct Cypher supports endpoint-only traversals such as ``[*2]``,
-``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``, plus bounded
-connected multi-relationship patterns where the row shape stays in the
-current supported subset. Path/list-carrier uses, bounded/exact
-``WHERE`` pattern predicates, and broader branching/path-shaping cases
+- Direct Cypher supports endpoint traversals such as ``[*2]``,
+``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``; connected
+multi-relationship variable-length patterns; and bounded/exact/fixed-point
+variable-length ``WHERE`` pattern predicates in the current row-shaped
+subset. Path/list-carrier uses and unsupported path/row-shaping cases
still fail fast.
* - ``CREATE`` / ``DELETE`` / ``SET``
- Not supported
@@ -236,9 +236,10 @@ Pattern Matching Forms
- Node labels and multi-label node patterns such as ``(p:Person:Admin)``.
- Relationship direction forms ``->``, ``<-``, and undirected ``-[]-``.
- Relationship type alternation such as ``[r:KNOWS|HATES]``.
-- Single variable-length relationship patterns when they are the only
-relationship in the connected pattern, including ``[*n]``, ``[*m..n]``,
-``[*]``, and typed forms such as ``[:R*2..4]``.
+- Single variable-length relationship patterns, including ``[*n]``,
+``[*m..n]``, ``[*]``, and typed forms such as ``[:R*2..4]``.
+- Connected patterns that mix variable-length and fixed-length relationships,
+such as ``MATCH (a)-[:R*2]->()-[:S]->(c) RETURN c``.
- Connected comma-separated patterns such as
``MATCH (a)-[:A]->(b), (b)-[:B]->(c)``.
- Repeated ``MATCH`` clauses when they stay connected through shared aliases.
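A connected pattern such as ``MATCH (a)-[:R*2]->()-[:S]->(c) RETURN c`` can be read as an exact-length segment unrolled into fixed hops, and a bounded segment as a union over hop counts. The stdlib sketch below assumes a plain `(src, dst, type)` edge list and openCypher's rule that a relationship is matched at most once per path. `endpoints` is a hypothetical helper, not graphistry's executor, and it returns a set, so row multiplicity is ignored.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (src, dst, rel_type)


def endpoints(edges: List[Edge], steps: List[str]) -> Set[str]:
    """End nodes reached by following the relationship types in `steps`, in order."""
    found: Set[str] = set()

    def walk(node: str, used: frozenset, remaining: List[str]) -> None:
        if not remaining:
            found.add(node)
            return
        for i, (s, d, t) in enumerate(edges):
            # each relationship may be used at most once within one path
            if s == node and t == remaining[0] and i not in used:
                walk(d, used | {i}, remaining[1:])

    for start in {s for s, _, _ in edges}:  # nodes without out-edges cannot start a match
        walk(start, frozenset(), steps)
    return found


edges: List[Edge] = [
    ("a", "b", "R"), ("b", "c", "R"), ("c", "x", "S"),
    ("b", "y", "S"), ("a", "z", "S"),
]
# MATCH (a)-[:R*2]->()-[:S]->(c): the exact [*2] segment unrolls to R, R, then S
exact = endpoints(edges, ["R", "R", "S"])
# MATCH (a)-[:R*1..2]->()-[:S]->(c): a bounded segment is the union over hop counts
bounded = endpoints(edges, ["R", "S"]) | endpoints(edges, ["R", "R", "S"])
```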
@@ -255,40 +256,35 @@ WHERE Forms
- Label predicates such as ``WHERE b:Foo:Bar``.
- Relationship-type predicates such as ``WHERE type(r) = 'KNOWS'``.
- Positive relationship-existence pattern predicates such as
-``WHERE (n)-[:R]->()`` and bare fixed-point variable-length existence checks
-such as ``WHERE (n)-[*]-()``.
-- One positive relationship-existence pattern predicate may be combined with
-ordinary row filters through top-level ``AND``, for example
-``WHERE n.kind = 'x' AND (n)-[:R*]->() AND n.id <> 'a'``.
+``WHERE (n)-[:R]->()`` and variable-length existence checks such as
+``WHERE (n)-[*]-()`` and ``WHERE (n)-[:R*2]->()``.
+- Pattern predicates can be combined with row predicates in the current
+boolean subset, including ``AND`` / ``OR`` / ``XOR`` and ``NOT`` forms.

Variable-Length Relationship Boundary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Direct Cypher multihop support is intentionally narrow in the current landing
-slice. The supported direct forms include endpoint traversals and bounded
-connected multi-relationship patterns where the result stays in the current
-row-shaping subset, for example:
+Direct Cypher multihop support remains intentionally bounded. The supported
+direct forms include endpoint traversals, connected multi-relationship
+patterns, and variable-length ``WHERE`` pattern predicates where the result
+stays in the current row-shaping subset, for example:

- ``MATCH (a)-[*2]->(b) RETURN b``
- ``MATCH (a)-[:R*1..3]->(b) RETURN b``
- ``MATCH (a)<-[*2]-(b) RETURN b``
- ``MATCH (a)-[:R*1..2]-(b) RETURN b``
- ``MATCH (a)-[:R*2]->(b)-[:S]->(c) RETURN c``
- ``MATCH (a)-[:R]->(b), (b)-[:S*1..2]->(c) RETURN a.id AS a_id, c.id AS c_id``
+- ``MATCH (n) WHERE (n)-[:R*2]->() RETURN n``
+- ``MATCH (n) WHERE NOT (n)-[:R*2]->() RETURN n.id AS id``

The current compiler explicitly rejects these remaining subfamilies with
``GFQLValidationError`` instead of attempting unsound execution:

- path/list-carrier use of a variable-length relationship alias, such as
``RETURN r`` or ``count(r)``
-- exact or bounded variable-length ``WHERE`` pattern predicates such as
-``WHERE (n)-[:R*2]-()``
-- top-level ``OR`` / ``NOT`` around variable-length ``WHERE`` pattern
-predicates, or more than one positive pattern predicate in the same
-``WHERE`` clause
-- branching connected multihop patterns, or shapes that would require
-unsupported path/relationship-carrier row shaping around a variable-length
-segment
+- shapes that still require unsupported path/relationship-carrier row shaping
+around a variable-length segment
- connected multi-pattern relationship-alias projection such as
``RETURN r`` / ``r.prop`` when it would require unsupported row shaping
- multi-alias ``RETURN *`` projections that would require unsupported
@@ -431,10 +427,8 @@ Not Supported Today

- Variable-length relationship aliases used as path/list carriers, such as
``RETURN r`` or ``count(r)``.
-- Exact or bounded variable-length ``WHERE`` pattern predicates such as
-``WHERE (n)-[:R*2]-()``.
-- Branching connected multihop patterns, or connected multihop shapes that
-still require unsupported path/relationship-carrier row shaping.
+- Connected multihop shapes that still require unsupported
+path/relationship-carrier row shaping.
- Multiple disconnected ``MATCH`` patterns used as arbitrary joins.
- Multi-pattern re-entry shapes beyond the bounded single
``MATCH ... WITH ... MATCH ... RETURN`` form.
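For the variable-length ``WHERE`` pattern predicates in the boundary list above (for example ``WHERE (n)-[:R*2]->()`` and its ``NOT`` form), a stdlib sketch of openCypher existence semantics may help. `has_varlen_path` and `where` are hypothetical helpers over an assumed `(src, dst, type)` edge list, not graphistry's lowering. One consequence shapes the sketch: for a pure existence check the upper bound never changes the answer, because any qualifying path of `min..max` hops has a valid prefix of exactly `min` hops, so the search only needs `min_hops`.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (src, dst, rel_type)


def has_varlen_path(
    edges: List[Edge],
    start: str,
    rel_type: Optional[str],
    min_hops: int,
    direction: str = "forward",
) -> bool:
    """True if a path of exactly `min_hops` distinct relationships leaves `start`.

    For existence this also answers [*min..max] and [*] (min_hops=1): any longer
    qualifying path has a valid prefix of exactly min_hops relationships.
    """

    def steps(node: str) -> Iterator[Tuple[int, str]]:
        for i, (s, d, t) in enumerate(edges):
            if rel_type is not None and t != rel_type:
                continue
            if direction in ("forward", "undirected") and s == node:
                yield i, d
            if direction in ("reverse", "undirected") and d == node:
                yield i, s

    stack: List[Tuple[str, frozenset]] = [(start, frozenset())]
    while stack:
        node, used = stack.pop()
        if len(used) == min_hops:
            return True
        for i, nxt in steps(node):
            if i not in used:  # a relationship appears at most once per path
                stack.append((nxt, used | {i}))
    return False


nodes = ["a", "b", "c", "d"]
edges: List[Edge] = [("a", "b", "R"), ("b", "c", "R"), ("c", "d", "S")]


def where(pred: Callable[[str], bool]) -> List[str]:
    """Keep the nodes n for which the predicate holds, like MATCH (n) WHERE ..."""
    return [n for n in nodes if pred(n)]


exact = where(lambda n: has_varlen_path(edges, n, "R", 2))                   # (n)-[:R*2]->()
bounded = where(lambda n: has_varlen_path(edges, n, "R", 1))                 # (n)-[:R*1..3]->()
negated = where(lambda n: not has_varlen_path(edges, n, "R", 2))             # NOT (n)-[:R*2]->()
reverse = where(lambda n: has_varlen_path(edges, n, "R", 1, "reverse"))      # (n)<-[:R*1..2]-()
any_rel = where(lambda n: has_varlen_path(edges, n, None, 1, "undirected"))  # (n)-[*]-()
```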
20 changes: 10 additions & 10 deletions docs/source/gfql/spec/cypher_mapping.md
@@ -36,11 +36,11 @@ When translating from Cypher, you'll encounter three scenarios:
### Direct Translations
- Graph patterns: `(a)-[r]->(b)` → chain operations
- Property filters: WHERE clauses embed into operations
-- Path traversals: direct `g.gfql("MATCH ...")` supports endpoint-only single
-variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`.
-Native GFQL still gives you the full explicit hop surface, including output
-slicing, intermediate-hop aliasing, and rewrites for currently unsupported
-direct-Cypher multihop shapes.
+- Path traversals: direct `g.gfql("MATCH ...")` supports single and connected
+variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`,
+including bounded/exact variable-length `WHERE` pattern predicates in the
+current row-shaped subset. Native GFQL still gives you the full explicit hop
+surface (output slicing, intermediate-hop aliasing, and custom rewrites).
- Pattern composition: Multiple patterns become sequential operations
- Same-path constraints: `WHERE` across steps → `g.gfql([...], where=[...])`

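The `[*2]`, `[*1..3]`, and `[*]` forms named above map onto three hop fields that appear elsewhere in this PR: the lowering tests assert `min_hops`, `max_hops`, and `to_fixed_point` on the lowered edge, and the removed gate treated `min_hops is None and max_hops is None and to_fixed_point` as bare `[*]`. The regex, `HopSpec`, and `parse_hop_spec` below are hypothetical illustrations of that mapping, not graphistry's Cypher parser, and they deliberately reject forms outside the documented subset.

```python
import re
from typing import NamedTuple, Optional


class HopSpec(NamedTuple):
    rel_type: Optional[str]
    min_hops: Optional[int]
    max_hops: Optional[int]
    to_fixed_point: bool


# [*], [*n], [*m..n], optionally typed as [:R*...]
_HOP_RE = re.compile(r"^\[(?::(\w+))?\*(?:(\d+)(?:\.\.(\d+))?)?\]$")


def parse_hop_spec(text: str) -> HopSpec:
    match = _HOP_RE.match(text)
    if match is None:
        raise ValueError(f"outside this sketch's subset: {text!r}")
    rel_type, lo, hi = match.groups()
    if lo is None:
        # bare [*]: no explicit bounds (the removed gate's fixed-point case)
        return HopSpec(rel_type, None, None, True)
    min_hops = int(lo)
    max_hops = int(hi) if hi is not None else min_hops  # [*n] means exactly n hops
    if max_hops < min_hops:
        raise ValueError(f"empty hop range: {text!r}")
    return HopSpec(rel_type, min_hops, max_hops, False)


print(parse_hop_spec("[:R*2..4]"))
```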
@@ -255,10 +255,10 @@ g.gfql([
### Edge Patterns

Rows using `[*...]` below show the native GFQL rewrite for the same traversal
-intent. Direct `g.gfql("MATCH ...")` now accepts these endpoint-only
-single-variable-length relationship forms, while native GFQL remains the more
-explicit option when you need intermediate-hop control or unsupported mixed
-pattern shapes.
+intent. Direct `g.gfql("MATCH ...")` accepts these variable-length forms in
+the supported direct-Cypher subset, while native GFQL remains the more explicit
+option when you need intermediate-hop control or advanced path/list-carrier
+semantics.

| Cypher / intent | Python | Wire Protocol (compact) |
|-----------------|--------|-------------------------|
@@ -274,7 +274,7 @@ pattern shapes.
| `-[r:BOUGHT {amount: gt(100)}]->` | `e_forward({"type": "BOUGHT", "amount": gt(100)}, name="r")` | `{"type": "Edge", "direction": "forward", "edge_match": {"type": "BOUGHT", "amount": {"type": "GT", "val": 100}}, "name": "r"}` |

When you need constraints on intermediate hops, path/list-carrier semantics, or
-mixed connected patterns beyond the current direct-Cypher subset, use repeated
+advanced row-shaping beyond the current direct-Cypher subset, use repeated
single-hop GFQL steps with aliases instead of collapsing the traversal into one
multihop edge operator.

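The advice above to use repeated single-hop steps with aliases can be illustrated without graphistry. Expanding one hop at a time keeps the intermediate node bound as `mid`, so it can be filtered; collapsing the traversal into one multihop operator leaves nothing to attach that filter to. `two_hop_rows`, the `kind` property, and the toy graph are hypothetical. In GFQL the same idea would be a chain of single-hop steps with `name=` aliases, which this sketch does not call.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]  # (src, dst)


def two_hop_rows(edges: List[Edge], mid_ok: Callable[[str], bool]) -> List[Dict[str, str]]:
    """Rows for (a)-->(mid)-->(c), expanded one hop at a time so `mid` stays bound."""
    rows: List[Dict[str, str]] = []
    for i, (a, mid) in enumerate(edges):  # hop 1 binds a and mid
        if not mid_ok(mid):  # a constraint on the intermediate hop
            continue
        for j, (src, c) in enumerate(edges):  # hop 2 continues from mid
            if src == mid and j != i:  # do not reuse the first relationship
                rows.append({"a": a, "mid": mid, "c": c})
    return rows


kind = {"a": "x", "b": "hub", "c": "x", "d": "leaf", "e": "x"}
edges: List[Edge] = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]

all_paths = two_hop_rows(edges, lambda node: True)
via_hub = two_hop_rows(edges, lambda node: kind[node] == "hub")
```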
29 changes: 0 additions & 29 deletions graphistry/compute/gfql/cypher/lowering.py
@@ -5590,34 +5590,6 @@ def _is_variable_length_relationship_pattern(relationship: RelationshipPattern)
)


-def _reject_unsupported_variable_length_where_pattern_predicates(query: CypherQuery) -> None:
-    if query.where is None:
-        return
-    predicates: List[WherePatternPredicate] = [
-        predicate for predicate in query.where.predicates if isinstance(predicate, WherePatternPredicate)
-    ]
-    if query.where.expr_tree is not None:
-        predicates.extend(_where_expr_tree_pattern_predicates(query.where.expr_tree))
-    for predicate in predicates:
-        relationships = [
-            element
-            for element in predicate.pattern
-            if isinstance(element, RelationshipPattern)
-        ]
-        for relationship in relationships:
-            if not _is_variable_length_relationship_pattern(relationship):
-                continue
-            if relationship.min_hops is None and relationship.max_hops is None and relationship.to_fixed_point:
-                continue
-            raise _unsupported(
-                "Cypher WHERE pattern predicates currently support only bare variable-length fixed-point relationships, not exact or bounded hop counts",
-                field="where",
-                value=boolean_expr_to_text(query.where.expr_tree) if query.where.expr_tree is not None else None,
-                line=predicate.span.line,
-                column=predicate.span.column,
-            )
-
-
def _reject_nonterminal_variable_length_relationship_patterns(query: CypherQuery) -> None: # noqa: ARG001
"""No-op: variable-length rels in connected patterns are now supported.

@@ -8331,7 +8303,6 @@ def _attach_graph_context(result: CompiledCypherQuery) -> CompiledCypherQuery:

normalizer = ASTNormalizer()
query = normalizer.rewrite_shortest_path(query)
-    _reject_unsupported_variable_length_where_pattern_predicates(query)
_reject_variable_length_path_alias_references(query, params=params)
query = normalizer.rewrite_where_pattern_predicates(query)

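The condition deleted from `lowering.py` is small enough to restate. The sketch below is a simplified mirror of the removed `_reject_unsupported_variable_length_where_pattern_predicates` check, assuming its input is already known to be variable-length. `VarLenRel` and `rejected_by_old_gate` are hypothetical stand-ins, and the hop values for `[:R*2]` and `[:R*1..2]` follow the lowering tests in this PR.

```python
from typing import Dict, NamedTuple, Optional


class VarLenRel(NamedTuple):
    """Hypothetical stand-in for the hop fields read by the removed gate."""
    min_hops: Optional[int]
    max_hops: Optional[int]
    to_fixed_point: bool


def rejected_by_old_gate(rel: VarLenRel) -> bool:
    """Simplified mirror of the removed check: only bare fixed-point rels passed."""
    if rel.min_hops is None and rel.max_hops is None and rel.to_fixed_point:
        return False  # bare [*]: accepted before and after this change
    return True  # exact or bounded hop counts: previously raised (E108)


cases: Dict[str, VarLenRel] = {
    "(n)-[*]-()": VarLenRel(None, None, True),
    "(n)-[:R*2]->()": VarLenRel(2, 2, False),
    "(n)-[:R*1..2]->()": VarLenRel(1, 2, False),
}
previously_rejected = [text for text, rel in cases.items() if rejected_by_old_gate(rel)]
# With the gate gone, all three shapes now reach rewrite_where_pattern_predicates.
```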
161 changes: 150 additions & 11 deletions graphistry/tests/compute/gfql/cypher/test_lowering.py
@@ -1738,6 +1738,48 @@ def test_lower_match_query_emits_row_anti_semi_filter_for_bound_alias_negated_wh
assert [op.get("type") for op in binding_ops] == ["Node", "Edge", "Node"]


+def test_lower_match_query_emits_row_anti_semi_filter_for_bound_alias_negated_bounded_varlen_where_pattern() -> None:
+    lowered = lower_match_query(
+        _parse_query("MATCH (a)-[:R]->(b) WHERE NOT (b)-[:R*1..2]->(a) RETURN a.id AS a_id, b.id AS b_id")
+    )
+
+    assert len(lowered.row_pre_filters) == 1
+    anti = lowered.row_pre_filters[0]
+    assert isinstance(anti, ASTCall)
+    assert anti.function == "anti_semi_apply"
+    assert anti.params.get("join_aliases") == ["b", "a"]
+    binding_ops = anti.params.get("binding_ops")
+    assert isinstance(binding_ops, list)
+    assert [op.get("type") for op in binding_ops] == ["Node", "Edge", "Node"]
+    edge = binding_ops[1]
+    assert edge.get("min_hops") == 1
+    assert edge.get("max_hops") == 2
+    assert edge.get("to_fixed_point") is False
+
+
+def test_lower_match_query_emits_row_marker_for_xor_wrapped_bounded_varlen_where_pattern() -> None:
+    lowered = lower_match_query(
+        _parse_query("MATCH (n) WHERE (n)-[:R*2]->() XOR n.id = 'd' RETURN n.id AS id")
+    )
+
+    assert len(lowered.row_pre_filters) == 1
+    marker = lowered.row_pre_filters[0]
+    assert isinstance(marker, ASTCall)
+    assert marker.function == "semi_apply_mark"
+    assert marker.params.get("join_aliases") == ["n"]
+    out_col = marker.params.get("out_col")
+    assert isinstance(out_col, str) and out_col.startswith("__gfql_where_pattern_")
+    assert lowered.row_where is not None
+    assert " XOR " in lowered.row_where.text
+    assert out_col in lowered.row_where.text
+    binding_ops = marker.params.get("binding_ops")
+    assert isinstance(binding_ops, list)
+    edge = binding_ops[1]
+    assert edge.get("min_hops") == 2
+    assert edge.get("max_hops") == 2
+    assert edge.get("to_fixed_point") is False
+
+
def test_lower_match_query_rejects_where_pattern_predicate_introducing_new_aliases() -> None:
with pytest.raises(GFQLValidationError, match="cannot introduce new aliases"):
lower_cypher_query(_parse_query("MATCH (n) WHERE (n)-[r]->(a) RETURN n"))
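The first new lowering test pins `join_aliases == ["b", "a"]` for `WHERE NOT (b)-[:R*1..2]->(a)`: the negated pattern must start at the row's `b` and end at its `a`, and rows with any such path are dropped. Below is a stdlib sketch of that anti-semi-join, assuming rows are dicts keyed by alias and a single relationship type. `reach_pairs` and `anti_semi_filter` are hypothetical helpers, not graphistry's `anti_semi_apply`. The toy graph shows why the bound matters: rows on a 4-cycle survive a `1..2` check but not a `1..3` check.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (src, dst), all of type R


def reach_pairs(edges: List[Edge], min_hops: int, max_hops: int) -> Set[Tuple[str, str]]:
    """(start, end) pairs linked by a path of min_hops..max_hops distinct relationships."""
    pairs: Set[Tuple[str, str]] = set()
    stack = [(s, s, frozenset()) for s in {u for u, _ in edges}]
    while stack:
        start, node, used = stack.pop()
        if len(used) >= min_hops:
            pairs.add((start, node))
        if len(used) == max_hops:
            continue
        for i, (s, d) in enumerate(edges):
            if s == node and i not in used:
                stack.append((start, d, used | {i}))
    return pairs


def anti_semi_filter(
    rows: List[Dict[str, str]], join_aliases: Sequence[str], pattern: Set[Tuple[str, str]]
) -> List[Dict[str, str]]:
    """Keep rows whose join-alias values have no matching pattern binding."""
    return [r for r in rows if tuple(r[k] for k in join_aliases) not in pattern]


edges: List[Edge] = [
    ("p", "q"), ("q", "r"), ("r", "p"),              # 3-cycle
    ("w", "x"), ("x", "y"), ("y", "z"), ("z", "w"),  # 4-cycle
]
rows = [{"a": s, "b": d} for s, d in edges]  # MATCH (a)-[:R]->(b)

# WHERE NOT (b)-[:R*1..2]->(a): the pattern starts at b and ends at a
kept = anti_semi_filter(rows, ["b", "a"], reach_pairs(edges, 1, 2))
kept_3 = anti_semi_filter(rows, ["b", "a"], reach_pairs(edges, 1, 3))
```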
@@ -5258,22 +5300,119 @@ def test_connected_variable_length_typed_mixed() -> None:


@pytest.mark.parametrize(
-    "query",
+    "query,expected_rows",
    [
-        "MATCH (n) WHERE (n)-[:REL1*2]-() RETURN n",
-        "MATCH (n) WHERE (n)-[*2]-() RETURN n",
-        "MATCH (n) WHERE (n)<-[:REL1*1..2]-() RETURN n",
-        "MATCH (n) WHERE (n)-[:REL1*2]-() AND n.id <> 'a' RETURN n",
+        (
+            "MATCH (n) WHERE (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id",
+            [{"id": "a"}, {"id": "b"}, {"id": "c"}],
+        ),
+        (
+            "MATCH (n) WHERE (n)-[*2]->() RETURN n.id AS id ORDER BY id",
+            [{"id": "a"}, {"id": "b"}, {"id": "c"}],
+        ),
+        (
+            "MATCH (n) WHERE (n)<-[:REL1*1..2]-() RETURN n.id AS id ORDER BY id",
+            [{"id": "b"}, {"id": "c"}, {"id": "d"}],
+        ),
+        (
+            "MATCH (n) WHERE (n)-[:REL1*2]->() AND n.id <> 'a' RETURN n.id AS id ORDER BY id",
+            [{"id": "b"}, {"id": "c"}],
+        ),
],
)
-def test_string_cypher_failfast_rejects_bounded_variable_length_where_pattern_predicates(query: str) -> None:
-    graph = _mk_empty_graph()
+def test_string_cypher_executes_bounded_variable_length_where_pattern_predicates(
+    query: str,
+    expected_rows: list[dict[str, object]],
+) -> None:
+    graph = _mk_graph(
+        pd.DataFrame({"id": ["a", "b", "c", "d"]}),
+        pd.DataFrame(
+            {
+                "s": ["a", "b", "c"],
+                "d": ["b", "c", "d"],
+                "type": ["REL1", "REL1", "REL1"],
+            }
+        ),
+    )

-    with pytest.raises(GFQLValidationError) as exc_info:
-        graph.gfql(query)
+    result = graph.gfql(query)
+    assert result._nodes.to_dict(orient="records") == expected_rows

-    assert exc_info.value.code == ErrorCode.E108
-    assert "WHERE pattern predicates" in exc_info.value.message
+
+@pytest.mark.parametrize(
+    "query,expected_rows",
+    [
+        (
+            "MATCH (n) WHERE (n)-[:REL1*2]->() OR n.id = 'd' RETURN n.id AS id ORDER BY id",
+            [{"id": "a"}, {"id": "b"}, {"id": "d"}],
+        ),
+        (
+            "MATCH (n) WHERE (n)-[:REL1*2]->() XOR n.id = 'd' RETURN n.id AS id ORDER BY id",
+            [{"id": "a"}, {"id": "b"}, {"id": "d"}],
+        ),
+        (
+            "MATCH (n) WHERE NOT (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id",
+            [{"id": "c"}, {"id": "d"}],
+        ),
+    ],
+)
+def test_string_cypher_executes_bounded_variable_length_where_pattern_boolean_wrappers(
+    query: str,
+    expected_rows: list[dict[str, object]],
+) -> None:
+    graph = _mk_graph(
+        pd.DataFrame({"id": ["a", "b", "c", "d"]}),
+        pd.DataFrame(
+            {
+                "s": ["a", "b", "c"],
+                "d": ["b", "c", "d"],
+                "type": ["REL1", "REL1", "REL1"],
+            }
+        ),
+    )
+
+    result = graph.gfql(query)
+    assert result._nodes.to_dict(orient="records") == expected_rows
+
+
+def test_string_cypher_executes_conjoined_bounded_varlen_where_predicates_across_edge_types() -> None:
+    graph = _mk_graph(
+        pd.DataFrame({"id": ["a", "b", "c", "d", "e"]}),
+        pd.DataFrame(
+            {
+                "s": ["a", "b", "c", "a", "b"],
+                "d": ["b", "c", "d", "e", "e"],
+                "type": ["REL1", "REL1", "REL1", "REL2", "REL2"],
+            }
+        ),
+    )
+
+    rows_forward = graph.gfql(
+        "MATCH (n) WHERE (n)-[:REL1*2]->() AND (n)-[:REL2*1]->() RETURN n.id AS id ORDER BY id"
+    )._nodes.to_dict(orient="records")
+    assert rows_forward == [{"id": "b"}]
+
+
+def test_string_cypher_executes_xor_between_bounded_reverse_and_forward_where_patterns() -> None:
+    graph = _mk_graph(
+        pd.DataFrame({"id": ["a", "b", "c", "d"]}),
+        pd.DataFrame(
+            {
+                "s": ["a", "b", "c"],
+                "d": ["b", "c", "d"],
+                "type": ["REL1", "REL1", "REL1"],
+            }
+        ),
+    )
+
+    result = graph.gfql(
+        "MATCH (n) WHERE (n)<-[:REL1*1..2]-() XOR (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id"
+    )
+    assert result._nodes.to_dict(orient="records") == [
+        {"id": "a"},
+        {"id": "c"},
+        {"id": "d"},
+    ]



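The `semi_apply_mark` lowering test and the boolean-wrapper execution tests share one idea: evaluate each pattern predicate once per row into a boolean marker column, then evaluate the surrounding `OR` / `XOR` / `NOT` as an ordinary row expression. The stdlib sketch below applies that idea to the same four-node `REL1` chain. Only the `__gfql_where_pattern_` prefix comes from the tests; the marker suffixes and the helpers `exists_hops` and `select` are hypothetical, so this illustrates the approach rather than graphistry's executor. It yields the same ids that the `OR`, `XOR`, `NOT`, and reverse-`XOR`-forward tests expect.

```python
from typing import Callable, Dict, List, Set, Tuple

nodes = ["a", "b", "c", "d"]
edges: List[Tuple[str, str]] = [("a", "b"), ("b", "c"), ("c", "d")]  # all REL1


def step(frontier: Set[str], reverse: bool) -> Set[str]:
    """One REL1 hop from every node in the frontier."""
    if reverse:
        return {s for s, d in edges if d in frontier}
    return {d for s, d in edges if s in frontier}


def exists_hops(start: str, hops: int, reverse: bool = False) -> bool:
    """True if some walk of exactly `hops` REL1 edges leaves `start`.

    The chain is acyclic, so walks and relationship-unique paths coincide.
    """
    frontier = {start}
    for _ in range(hops):
        frontier = step(frontier, reverse)
    return bool(frontier)


FWD2 = "__gfql_where_pattern_fwd2"    # (n)-[:REL1*2]->()
REV12 = "__gfql_where_pattern_rev12"  # (n)<-[:REL1*1..2]-(); existence needs only 1 hop
rows: List[Dict[str, object]] = [
    {"id": n, FWD2: exists_hops(n, 2), REV12: exists_hops(n, 1, reverse=True)} for n in nodes
]


def select(pred: Callable[[Dict[str, object]], bool]) -> List[str]:
    """Row filter over the marked rows, returning ids in ORDER BY id order."""
    return sorted(str(r["id"]) for r in rows if pred(r))


or_ids = select(lambda r: bool(r[FWD2]) or r["id"] == "d")
xor_ids = select(lambda r: bool(r[FWD2]) != (r["id"] == "d"))
not_ids = select(lambda r: not r[FWD2])
rev_xor_fwd_ids = select(lambda r: bool(r[REV12]) != bool(r[FWD2]))
```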