feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3) by lmeyerov · Pull Request #1229 · graphistry/pygraphistry

lmeyerov · 2026-04-28T07:30:00Z

Summary

Slice 3 of #1031. AND-joined positive WHERE pattern predicates now lift into structured WhereClause.predicates as N WherePatternPredicate entries. Closes the len(pattern_leaves) > 1 E108 gate that #1217 (slice 1) deferred.

MATCH (n) WHERE (n)-[:R]->() AND (n)-[:T]->() RETURN n.id  -- now works

What changed

Parser (cypher/parser.py):

pattern_atom now splits the greedy WHERE_PATTERN lexer token (which gobbles pattern AND pattern AND ... as a single match) back into individual pattern-item texts via _WHERE_PATTERN_ITEM_RE.finditer. Emits one BooleanExpr(op="pattern") per item, joined by _rebuild_and_tree. Upstream _split_top_level_and_pattern_leaves then naturally extracts N pattern leaves.
_build_where_with_pattern_lift — drop the len(pattern_leaves) > 1 E108 gate; build N WherePatternPredicates.
_parse_where_pattern_predicate_text renamed to _parse_single_where_pattern_predicate_text and its in-helper multi-item gate removed (caller now splits before invocation).

ast_normalizer (cypher/ast_normalizer.py):

_rewrite_where_pattern_predicates_to_matches — drop the matching gate. Run per-predicate validation independently. Pack the N patterns into a single appended MatchClause with patterns: Tuple[Tuple[PatternElement, ...], ...] of N entries (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only.

Tests

New: test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed — runtime contract: rows where ALL patterns exist.
Updated: test_lower_match_query_supports_multiple_where_pattern_predicates (was ..._rejects_...) — assert the lift + compile path now succeeds.

Test plan

pytest graphistry/tests/compute/gfql/ — 1574 passed, 87 skipped, 15 xfailed, no regressions.
mypy graphistry/compute/gfql/cypher/parser.py graphistry/compute/gfql/cypher/ast_normalizer.py — clean.

Out of scope (deferred, tracked in plan.md)

Slice 2 (NOT-pattern, IC10 benchmark unblock) — needs anti-semi-join lowering. Will be a stacked PR-2 on this branch.
Slice 4 (OR/XOR-around-pattern) — needs row-level pattern-existence column OR build the AntiSemiApply/SemiApply executor properly. Stacked PR-4.
Multi-positive against multiple bound aliases (MATCH (n), (m) WHERE (n)-[:R]->(m) AND (n)-[:T]->(m) RETURN n, m) hits a pre-existing engine limit ("Cypher row projection from repeated MATCH aliases is not yet supported"). Not a slice 3 regression — same limit existed for the single-positive bound-aliases case before this PR. Filed mentally as a separate engine cleanup; out of scope here.

Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 — umbrella; slice 3 of 4.
refactor(gfql/cypher): switch to Earley parser; retire LALR pattern-shape cruft (#1031 slice 1) #1217 — slice 1 (Earley swap + grammar groundwork).
Cypher WHERE: static validation gap on row-boolean shapes (OR / NOT / XOR among row predicates) — emergent from #1217 Earley swap #1219 — row-boolean WHERE shapes (separate emergent issue from refactor(gfql/cypher): switch to Earley parser; retire LALR pattern-shape cruft (#1031 slice 1) #1217).

…ce 3) AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND (n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N `WherePatternPredicate` entries. The ast_normalizer packs them into a single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement, ...], ...]` carries one tuple per predicate (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only. Changes: - `parser.py::pattern_atom` — split the greedy `WHERE_PATTERN` lexer token (which gobbles `pattern AND pattern AND ...` chains as a single match) back into individual pattern-item texts via `_WHERE_PATTERN_ITEM_RE.finditer` and emit one `BooleanExpr(op="pattern")` per item, joined by an AND-tree via `_rebuild_and_tree`. - `parser.py::_build_where_with_pattern_lift` — drop the `len(pattern_leaves) > 1` E108 gate; build N WherePatternPredicates. - `parser.py::_parse_single_where_pattern_predicate_text` — rename from `_parse_where_pattern_predicate_text` and remove its in-helper multi-item gate (caller now splits before invocation). - `ast_normalizer.py::_rewrite_where_pattern_predicates_to_matches` — drop the matching gate; loop over predicates running per-predicate validation (must include relationship; no new aliases); emit a single appended MatchClause with N patterns. Tests: - Add `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed` for the runtime contract: rows where ALL patterns exist. - Update the legacy rejection test to assert the new lift + compile path. Verified: - 1574 GFQL tests pass (was 1573 baseline + 1 new + 1 updated). - mypy clean on `parser.py` and `ast_normalizer.py`. Out of scope: - Slice 2 (NOT-pattern, IC10 unblock): needs anti-semi-join lowering. Tracked in plan.md; will be a separate stacked PR. - Slice 4 (OR/XOR-around-pattern): needs row-level pattern-existence column or AntiSemiApply executor build. Tracked in plan.md. - Multi-positive against multiple bound aliases (`MATCH (n), (m) WHERE (n)-[:R]->(m) AND (n)-[:T]->(m) RETURN n, m`): hits a separate pre-existing engine limit ("repeated MATCH aliases" projection). Not introduced by this slice; out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lmeyerov · 2026-04-29T00:35:13Z

PR Review: #1229 — feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)

Branch: m4-1031-slice-3-multi-positive @ 761356608
Base: master @ 864ceb20a
Files: parser.py, ast_normalizer.py, test_lowering.py, CHANGELOG.md
LOC: +115 / −85

Blockers

None.

Important

None.

Suggestions

[Quality, trivial] _parse_single_where_pattern_predicate_text docstring could note that input is a non-AND single-pattern text. Cosmetic.
[Testing, optional] Add a sibling test_lower_match_query_rejects_multi_pattern_introducing_new_aliases that asserts the per-predicate alias-introduction validation fires when one of N patterns introduces a new alias. ~5 LOC. The existing single-pattern test plus the loop-level inspection in ast_normalizer.py:484-510 cover the same code path; this is documentation, not safety net.

(Wave 1 also flagged the bound-aliases multi-pattern case hitting "repeated MATCH aliases" projection limit — rejected as a slice 3 finding because it's a pre-existing engine limit affecting single-positive bound-aliases too. PR body documents.)

Human checks required

Approve and merge (after CI green).
Confirm Closes #1031 is not set — slice 4 (OR/XOR-around-pattern) and slice 2 (NOT-pattern, IC10 unblock) remain. Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 stays open until those land.
Coordination signal on Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 thread post-merge: "slice 3 done; slice 2 next."

Methodology

Per ~/Work/graphistry/.agents/skills/review/SKILL.md. Two waves (Wave 1 in-session, Wave 2 post-CI-rerun). 10 dimensions. Adversarial in adversarial/wave-1.md. Two consecutive non-significant waves → CONVERGED.

CI signal + cross-repo pairing

Wave 1 surfaced one TCK strict-improvement drift: expr-pattern1-20 (MATCH (n) WHERE (n)-[:R1]-() AND (n)-[:R2]-() RETURN n) — exactly slice 3's target shape. Pre-PR: xfail rejecting; post-PR: passes.
Paired: tck-gfql m4-1031-slice-3-multi-positive branch (commit 0387dc7) moves the scenario to DIRECT_CYPHER_XFAIL_MATCHES_EXPECTED_BASE_KEYS. pygraphistry CI's Resolve tck-gfql ref step auto-picks the matching branch.
Final CI: 131/131 SUCCESS, 1 SKIPPED, 0 failures. Pairing well-formed; no other latent drift detected (neighbor scenarios expr-pattern1-19/21 are slice 2/4 territory, still rejected as expected).

Recommendation

Approve and merge with --admin. No defects. Trivial SUGGESTIONs not blocking. Cross-repo pairing complete.

lmeyerov merged commit 202abd3 into master Apr 29, 2026
261 of 263 checks passed

lmeyerov deleted the m4-1031-slice-3-multi-positive branch April 29, 2026 00:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)#1229

feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)#1229
lmeyerov merged 1 commit intomasterfrom
m4-1031-slice-3-multi-positive

lmeyerov commented Apr 28, 2026

Uh oh!

lmeyerov commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lmeyerov commented Apr 28, 2026

Summary

What changed

Tests

Test plan

Out of scope (deferred, tracked in plan.md)

Related

Uh oh!

lmeyerov commented Apr 29, 2026

PR Review: #1229 — feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)

Blockers

Important

Suggestions

Human checks required

Methodology

CI signal + cross-repo pairing

Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant