Skip to content

feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)#1229

Merged
lmeyerov merged 1 commit intomasterfrom
m4-1031-slice-3-multi-positive
Apr 29, 2026
Merged

feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)#1229
lmeyerov merged 1 commit intomasterfrom
m4-1031-slice-3-multi-positive

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

Summary

Slice 3 of #1031. AND-joined positive WHERE pattern predicates now lift into structured WhereClause.predicates as N WherePatternPredicate entries. Closes the len(pattern_leaves) > 1 E108 gate that #1217 (slice 1) deferred.

MATCH (n) WHERE (n)-[:R]->() AND (n)-[:T]->() RETURN n.id  -- now works

What changed

Parser (cypher/parser.py):

  • pattern_atom now splits the greedy WHERE_PATTERN lexer token (which gobbles pattern AND pattern AND ... as a single match) back into individual pattern-item texts via _WHERE_PATTERN_ITEM_RE.finditer. Emits one BooleanExpr(op="pattern") per item, joined by _rebuild_and_tree. Upstream _split_top_level_and_pattern_leaves then naturally extracts N pattern leaves.
  • _build_where_with_pattern_lift — drop the len(pattern_leaves) > 1 E108 gate; build N WherePatternPredicates.
  • _parse_where_pattern_predicate_text renamed to _parse_single_where_pattern_predicate_text and its in-helper multi-item gate removed (caller now splits before invocation).

ast_normalizer (cypher/ast_normalizer.py):

  • _rewrite_where_pattern_predicates_to_matches — drop the matching gate. Run per-predicate validation independently. Pack the N patterns into a single appended MatchClause with patterns: Tuple[Tuple[PatternElement, ...], ...] of N entries (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only.

Tests

  • New: test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed — runtime contract: rows where ALL patterns exist.
  • Updated: test_lower_match_query_supports_multiple_where_pattern_predicates (was ..._rejects_...) — assert the lift + compile path now succeeds.

Test plan

  • pytest graphistry/tests/compute/gfql/1574 passed, 87 skipped, 15 xfailed, no regressions.
  • mypy graphistry/compute/gfql/cypher/parser.py graphistry/compute/gfql/cypher/ast_normalizer.py — clean.

Out of scope (deferred, tracked in plan.md)

  • Slice 2 (NOT-pattern, IC10 benchmark unblock) — needs anti-semi-join lowering. Will be a stacked PR-2 on this branch.
  • Slice 4 (OR/XOR-around-pattern) — needs row-level pattern-existence column OR build the AntiSemiApply/SemiApply executor properly. Stacked PR-4.
  • Multi-positive against multiple bound aliases (MATCH (n), (m) WHERE (n)-[:R]->(m) AND (n)-[:T]->(m) RETURN n, m) hits a pre-existing engine limit ("Cypher row projection from repeated MATCH aliases is not yet supported"). Not a slice 3 regression — same limit existed for the single-positive bound-aliases case before this PR. Filed mentally as a separate engine cleanup; out of scope here.

Related

…ce 3)

AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND
(n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N
`WherePatternPredicate` entries.  The ast_normalizer packs them into a
single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement,
...], ...]` carries one tuple per predicate (multi-pattern cartesian
within MATCH), preserving the lowering invariant that only the FINAL
match is connected — pre-binding seeds remain node-only.

Changes:

- `parser.py::pattern_atom` — split the greedy `WHERE_PATTERN` lexer
  token (which gobbles `pattern AND pattern AND ...` chains as a single
  match) back into individual pattern-item texts via
  `_WHERE_PATTERN_ITEM_RE.finditer` and emit one
  `BooleanExpr(op="pattern")` per item, joined by an AND-tree via
  `_rebuild_and_tree`.
- `parser.py::_build_where_with_pattern_lift` — drop the
  `len(pattern_leaves) > 1` E108 gate; build N WherePatternPredicates.
- `parser.py::_parse_single_where_pattern_predicate_text` — rename from
  `_parse_where_pattern_predicate_text` and remove its in-helper
  multi-item gate (caller now splits before invocation).
- `ast_normalizer.py::_rewrite_where_pattern_predicates_to_matches` —
  drop the matching gate; loop over predicates running per-predicate
  validation (must include relationship; no new aliases); emit a single
  appended MatchClause with N patterns.

Tests:

- Add `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed`
  for the runtime contract: rows where ALL patterns exist.
- Update the legacy rejection test to assert the new lift + compile path.

Verified:
- 1574 GFQL tests pass (was 1573 baseline + 1 new + 1 updated).
- mypy clean on `parser.py` and `ast_normalizer.py`.

Out of scope:
- Slice 2 (NOT-pattern, IC10 unblock): needs anti-semi-join lowering.
  Tracked in plan.md; will be a separate stacked PR.
- Slice 4 (OR/XOR-around-pattern): needs row-level pattern-existence
  column or AntiSemiApply executor build.  Tracked in plan.md.
- Multi-positive against multiple bound aliases (`MATCH (n), (m) WHERE
  (n)-[:R]->(m) AND (n)-[:T]->(m) RETURN n, m`): hits a separate
  pre-existing engine limit ("repeated MATCH aliases" projection).
  Not introduced by this slice; out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmeyerov
Copy link
Copy Markdown
Contributor Author

PR Review: #1229 — feat(gfql/cypher): multi-positive WHERE pattern predicates (#1031 slice 3)

Branch: m4-1031-slice-3-multi-positive @ 761356608
Base: master @ 864ceb20a
Files: parser.py, ast_normalizer.py, test_lowering.py, CHANGELOG.md
LOC: +115 / −85

Blockers

None.

Important

None.

Suggestions

  1. [Quality, trivial] _parse_single_where_pattern_predicate_text docstring could note that input is a non-AND single-pattern text. Cosmetic.
  2. [Testing, optional] Add a sibling test_lower_match_query_rejects_multi_pattern_introducing_new_aliases that asserts the per-predicate alias-introduction validation fires when one of N patterns introduces a new alias. ~5 LOC. The existing single-pattern test plus the loop-level inspection in ast_normalizer.py:484-510 cover the same code path; this is documentation, not safety net.

(Wave 1 also flagged the bound-aliases multi-pattern case hitting "repeated MATCH aliases" projection limit — rejected as a slice 3 finding because it's a pre-existing engine limit affecting single-positive bound-aliases too. PR body documents.)

Human checks required

  1. Approve and merge (after CI green).
  2. Confirm Closes #1031 is not set — slice 4 (OR/XOR-around-pattern) and slice 2 (NOT-pattern, IC10 unblock) remain. Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 stays open until those land.
  3. Coordination signal on Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 thread post-merge: "slice 3 done; slice 2 next."

Methodology

Per ~/Work/graphistry/.agents/skills/review/SKILL.md. Two waves (Wave 1 in-session, Wave 2 post-CI-rerun). 10 dimensions. Adversarial in adversarial/wave-1.md. Two consecutive non-significant waves → CONVERGED.

CI signal + cross-repo pairing

  • Wave 1 surfaced one TCK strict-improvement drift: expr-pattern1-20 (MATCH (n) WHERE (n)-[:R1]-() AND (n)-[:R2]-() RETURN n) — exactly slice 3's target shape. Pre-PR: xfail rejecting; post-PR: passes.
  • Paired: tck-gfql m4-1031-slice-3-multi-positive branch (commit 0387dc7) moves the scenario to DIRECT_CYPHER_XFAIL_MATCHES_EXPECTED_BASE_KEYS. pygraphistry CI's Resolve tck-gfql ref step auto-picks the matching branch.
  • Final CI: 131/131 SUCCESS, 1 SKIPPED, 0 failures. Pairing well-formed; no other latent drift detected (neighbor scenarios expr-pattern1-19/21 are slice 2/4 territory, still rejected as expected).

Recommendation

Approve and merge with --admin. No defects. Trivial SUGGESTIONs not blocking. Cross-repo pairing complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant