
Add comprehensive test suite for template, substitution, and FOL patterns#7

Open
sileod wants to merge 2 commits into main from claude/fix-template-digit-bugs-iCtxf

sileod (Owner) commented Apr 11, 2026

Summary

This PR adds a comprehensive test suite (tests/test_template_digit_and_constraint.py) covering critical functionality in template preprocessing, substitution mechanics, constraints, and first-order logic (FOL) pattern generation. The tests validate both unit-level behavior and end-to-end integration scenarios.

Key Changes

  • Template Preprocessing Tests (TestDefaultPreprocessTemplate): Validates default_preprocess_template() behavior including bare-digit wrapping, edge cases with escaped characters, multi-digit numbers, and the double-wrapping footgun when mixing explicit {n} syntax with preprocessing.

  • Substitution Tests (TestSubstitution): Tests the Substitution class including variable replacement via N[?←X] patterns, chaining mechanics, handling of escaped question marks, and fallback to .format() when no substitution pattern is present. Includes a documented bug test for digits in rendered content.

  • Constraint Tests (TestConstraint): Validates substring-based constraint checking (0∉1) for preventing unwanted overlaps between rendered children, including bidirectional and multi-way constraints.

  • Rule Rendering Integration Tests (TestRuleRenderIntegration): Tests end-to-end rendering of terminal and composite rules with multiple children, including callable templates and digit reordering.

  • FOL Pattern Unit Tests (TestFOLPatternsUnit): Validates specific FOL template patterns including:

    • Entity substitution into properties and predicates
    • Universal and existential quantifier chains with proper variable binding
    • Adjective chain constraints preventing duplicate adjectives
    • Proper placeholder (?) handling through the substitution chain
  • FOL Generation Smoke Tests: Parametrized tests across 60 random seeds validating:

    • No bare question marks leak into English output
    • No unresolved placeholders remain in TPTP output
    • No double whitespace in generated TPTP
    • Free variables are properly bound by quantifiers
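
The substring constraint described in the Constraint Tests bullet can be illustrated with a minimal sketch. It assumes that `0∉1` means "rendered child 0 must not occur as a substring of rendered child 1"; the helper names `satisfies` and `satisfies_all` are hypothetical and are not taken from the library.

```python
def satisfies(constraint: str, rendered: list[str]) -> bool:
    """Check one 'i∉j' constraint: rendered[i] must not be a substring of rendered[j]."""
    left, right = (int(part) for part in constraint.split("∉"))
    return rendered[left] not in rendered[right]


def satisfies_all(constraints: list[str], rendered: list[str]) -> bool:
    """Multi-way check: every constraint must hold."""
    return all(satisfies(c, rendered) for c in constraints)


# One-directional: "red" occurs inside "big red", so 0∉1 is violated.
one_way = satisfies("0∉1", ["red", "big red"])

# Bidirectional: 0∉1 holds here, but 1∉0 is violated because "red" is in "red ball".
two_way = satisfies_all(["0∉1", "1∉0"], ["red ball", "red"])
```

A bidirectional pair such as `0∉1` plus `1∉0` rejects overlap in either direction, which is presumably what the adjective-chain tests rely on to avoid duplicate adjectives.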

Notable Implementation Details

  • Uses minimal helper functions (_Stub, _make_grammar, _fp, _node) to construct test grammars and parse trees without full grammar initialization overhead
  • Documents known bugs (e.g., digits in rendered content causing IndexError) with failing tests that will pass once fixed
  • Covers both happy paths and edge cases (escaped characters, multi-digit numbers, constraint violations)
  • Tests are organized by functional area with clear docstrings explaining fragile areas and expected behavior

https://claude.ai/code/session_01HdENPeusFQ7i6CHZRVJtmr

claude added 2 commits April 11, 2026 10:47
…FOL patterns

Tests cover the fragile areas identified around digit-based template processing:

- default_preprocess_template: bare digits → {n}, ← skip, multi-digit, and
  the footgun where explicit {0} syntax gets double-wrapped to {{0}}
- Substitution: N[?←X] chaining mechanics, \? escape, fallback to format,
  and a confirmed bug where digits in rendered content corrupt wrap()
- Constraint: ∉ substring logic, multi-condition, bidirectional checks
- Rule render integration: terminal/unary/binary rules, reversed digit order,
  callable templates
- FOL patterns: entity-into-property substitution, X_quantifier placeholder
  chain, full universal/existential quantifier chains, adjective constraint
- FOL generation smoke: 60 seeds × 4 checks (no ? in eng, no (?) in tptp,
  no double whitespace, no free X without quantifier binding)
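
The four smoke checks listed above could look roughly like the following. This is a sketch under assumptions: `smoke_problems` is a hypothetical helper, the variable name `X` and the TPTP binder shape `! [X] :` / `? [X] :` are assumed, and the free-variable check is deliberately coarse (it only asks whether any binder for `X` exists, not whether each occurrence is in scope).

```python
import re


def smoke_problems(eng: str, tptp: str, var: str = "X") -> list[str]:
    """Return a description of each smoke check that fails for one generated sample."""
    problems = []
    if "?" in eng:
        problems.append("bare ? in English")
    if "(?)" in tptp:
        problems.append("unresolved (?) in TPTP")
    if "  " in tptp:
        problems.append("double whitespace in TPTP")
    # Coarse binding check: the variable is used but no quantifier list mentions it.
    uses_var = re.search(rf"\b{var}\b", tptp)
    binds_var = re.search(rf"[!?]\s*\[[^\]]*\b{var}\b[^\]]*\]\s*:", tptp)
    if uses_var and not binds_var:
        problems.append(f"{var} used without a quantifier")
    return problems


clean = smoke_problems("everything red is big", "![X]:(red(X)=>big(X))")
broken = smoke_problems("who is red?", "red(?)  & big(X)")
```

In a pytest suite, a helper like this would typically be called from a test parametrized over the seeds, asserting that the returned list is empty.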

The test_digits_in_rendered_content_crash_substitution test exposes a real
bug: Substitution.wrap() converts ALL bare digits in substituted content to
{n} format placeholders, so content like 'pred5(X)' becomes 'pred{5}(X)'
and format() crashes with IndexError. Currently safe in FOL (predicates use
letters preda–predj) but unprotected at the API level.
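
The failure mode can be reproduced in isolation. The snippet below is a simplified stand-in, not the library's code: `naive_wrap` is a hypothetical function that mimics the described behaviour of wrapping every bare digit run in the already-substituted string.

```python
import re


def naive_wrap(text: str) -> str:
    """Wrap every run of digits as a format slot, including digits in rendered content."""
    return re.sub(r"(\d+)", r"{\1}", text)


args = ["pred5(?)", "X"]                    # rendered children
substituted = args[0].replace("?", args[1])  # 'pred5(X)'
wrapped = naive_wrap(substituted)            # 'pred{5}(X)'

error = None
try:
    wrapped.format(*args)                    # slot 5 does not exist: only 2 args
except IndexError as exc:
    error = exc
```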

https://claude.ai/code/session_01HdENPeusFQ7i6CHZRVJtmr
Bug 1 – Substitution.wrap() corrupts digits in rendered content:
  The old implementation applied wrap() (digit→{n}) to the entire substituted
  string, so a rendered arg like 'pred5(?)' would become 'pred{5}(X)' and
  crash with IndexError when .format() looked for positional arg 5.
  Fix: split the template at N[?←X] boundaries with re.split; wrap only the
  original template text parts and the replacement string X (e.g. '1'→'{1}'
  for format-slot chaining), never the rendered content of the substituted arg.
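
A minimal sketch of the split-then-wrap idea, under stated assumptions: the `N[?←X]` pattern is approximated by the regex below, `render` and `wrap` are hypothetical names, escaped `\?` handling is omitted, and braces in rendered content are escaped so that `.format()` leaves them alone (an extra precaution that the commit message does not mention).

```python
import re

SUB = re.compile(r"(\d+)\[\?←([^\]]+)\]")  # assumed shape of N[?←X]
BARE = re.compile(r"(\d+)")


def wrap(text: str) -> str:
    """Turn bare digits into format slots: '1' -> '{1}'."""
    return BARE.sub(r"{\1}", text)


def render(template: str, args: list[str]) -> str:
    # With two capture groups, re.split yields: text, N, X, text, N, X, ..., text
    parts = SUB.split(template)
    out = []
    for i in range(0, len(parts), 3):
        out.append(wrap(parts[i]))            # template text: safe to wrap
        if i + 2 < len(parts):
            n, x = int(parts[i + 1]), parts[i + 2]
            # Rendered content is never wrapped; only its braces are escaped.
            content = args[n].replace("{", "{{").replace("}", "}}")
            out.append(content.replace("?", wrap(x)))
    return "".join(out).format(*args)


chained = render("0[?←1]", ["pred5(?)", "X"])
literal = render("all 0[?←X]", ["p(?)"])
```

Here `'1'` is wrapped to `'{1}'` so the replacement chains into format slot 1, while the `5` in `pred5` passes through untouched.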

Bug 2 – default_preprocess_template double-wraps explicit {n} syntax:
  The regex r'(\d+)' matched the digit inside '{0}', turning '{0}' into '{{0}}'
  which .format() renders back as the literal string '{0}'.
  Fix: add negative lookbehind/lookahead (?<!{)(\d+)(?!}) so already-braced
  digits are skipped; bare digits are still wrapped as before.
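
The difference between the two regexes can be checked directly. The `preprocess` helper below is a hypothetical stand-in for `default_preprocess_template()` that only performs the digit wrapping.

```python
import re

OLD = re.compile(r"(\d+)")
NEW = re.compile(r"(?<!{)(\d+)(?!})")


def preprocess(template: str, pattern: re.Pattern) -> str:
    """Wrap digits matched by `pattern` as format slots."""
    return pattern.sub(r"{\1}", template)


template = "{0} and 1"

old_pre = preprocess(template, OLD)      # '{{0}} and {1}'
new_pre = preprocess(template, NEW)      # '{0} and {1}'

old_out = old_pre.format("a", "b")       # '{0} and b': the explicit slot is lost
new_out = new_pre.format("a", "b")       # 'a and b'

# Braced numbers with three or more digits still get an interior match.
three_digit = preprocess("{123}", NEW)   # '{1{2}3}'
```

One caveat this sketch surfaces: the lookarounds inspect only the characters immediately adjacent to the match, so for a braced number of three or more digits, backtracking can match an interior digit. One- and two-digit braced slots such as `{0}` and `{10}` are left intact.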

Also adds depth-bound tests (section 7 in the test file):
  - FOL generation: assert min_depth ≤ height ≤ max_depth for 60 seeds
  - Simple recursive grammar: same bounds check for 30 seeds
  - Without min_depth: verify heights vary and stay ≤ max_depth
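
The shape of such a bounds check can be shown on a toy grammar. Everything here is hypothetical: `generate` and `height` are illustrative helpers rather than the library's API, and height is assumed to count nodes on the longest root-to-leaf path.

```python
import random


def generate(rng: random.Random, min_depth: int, max_depth: int, depth: int = 1):
    """Toy recursive grammar S -> 'a' | (S, S), forced to respect depth bounds."""
    must_recurse = depth < min_depth
    can_recurse = depth < max_depth
    if can_recurse and (must_recurse or rng.random() < 0.5):
        return (
            generate(rng, min_depth, max_depth, depth + 1),
            generate(rng, min_depth, max_depth, depth + 1),
        )
    return "a"


def height(tree) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if isinstance(tree, str):
        return 1
    return 1 + max(height(child) for child in tree)


heights = [height(generate(random.Random(seed), 3, 6)) for seed in range(30)]
```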

https://claude.ai/code/session_01HdENPeusFQ7i6CHZRVJtmr
