Skip to content

[Benchmarking]: Benchmarking prototype - dataset: CUAD#228

Open
acarbonetto wants to merge 93 commits into
mainfrom
acarbo/poc-benchmark-cuad
Open

[Benchmarking]: Benchmarking prototype - dataset: CUAD#228
acarbonetto wants to merge 93 commits into
mainfrom
acarbo/poc-benchmark-cuad

Conversation

@acarbonetto

@acarbonetto acarbonetto commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Issue #, if available:

N/A

Description of changes:

Add an integration test that benchmarks a small prototype dataset.

This PR adds three tests CuadBenchmarkEvaluate, CuadBenchmarkQuery, and CuadBenchmarkBuild, and a small prototype dataset for runnings.

Test Results:

02-CuadBenchmarkEvaluate.PASS.json
01-CuadBenchmarkQuery.PASS.json
00-CuadBenchmarkBuild.PASS.json

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
@acarbonetto acarbonetto marked this pull request as ready for review May 4, 2026 16:31
@oussamahansal oussamahansal changed the title Benchmarking prototype - dataset: CUAD [Benchmarking]: Benchmarking prototype - dataset: CUAD May 28, 2026
Ian Robinson and others added 18 commits June 8, 2026 08:41
)

Co-authored-by: ianrob <ianrob@amazon.co.uk>
…DB with Neo4j, fix notebooks and scripts (#189)

* Consolidate environment configuration

- Delete redundant docker/.env.jupyter and docker/.env.template
- Consolidate into single notebooks/.env.template (source of truth)
- Replace 3 S3 bucket vars with single S3_BUCKET_NAME + prefixes
- Align DynamoDB table name (graphrag-toolkit-batch-table)
- Fix env var names (INCLUDE_DOMAIN_LABELS, AWS_REGION)
- Update models to claude-sonnet-4-6, AWS_PROFILE to default
- Add .env and output.log to .gitignore

* Replace FalkorDB with Neo4j and standardize Docker

- Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
- Rename all services to hybrid convention across all compose files
- Add neo4j Python driver and build-essential to dev Dockerfile
- Fix dev-reset.sh: run docker compose down before rebuilding
- Add mysql cleanup to dev-reset.sh
- Update container/volume names in reset scripts
- Remove lexical-graph-src mount from main compose (fix dev mode detection)
- Fix dev compose mount path and remove non-existent mysql schema mount

* Fix AWS setup scripts and add prompt upload

- Add bedrock:InvokeModel to batch inference IAM role policy
- Align bucket name to graphrag-toolkit-ACCOUNT_ID pattern
- Fix AWS profile default from padmin to default
- Add S3 prompt file upload (extract from JSON, upload as .txt)
- Apply all fixes to both .sh and .ps1 scripts

* Standardize and fix notebooks

- Fix titles to match numbering (00-Setup through 04-Cloud-Querying)
- Standardize dotenv loading (%dotenv magic)
- Fix collection_id web-docs to best-practices in notebook 01
- Add Neo4jGraphStoreFactory registration to notebook 03
- Add Bedrock prompt provider section to notebook 04
- Add GPU requirement warning for BGEReranker
- Remove hardcoded ARNs, use placeholder comments
- Fix hardcoded region to os.environ in notebook 03
- Remove empty trailing cells, clear all outputs

* Fix documentation

- Update README Quick Start to correct setup order
- Add .env sync instructions after setup script
- Update model references to claude-sonnet-4-6
- Add RESPONSE_MODEL and EVALUATION_MODEL to docs
- Fix BATCH_ROLE_NAME to bedrock-batch-inference-role
- Align DynamoDB table name in batch_processing.md and aws_integration.md
- Replace padmin with your-profile in setup docs
- Replace ccms-rag-extract with graphrag-toolkit bucket names

* Update examples/lexical-graph-hybrid-dev/notebooks/.env.template

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

* Update examples/lexical-graph-hybrid-dev/README.md

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

* Remove quotes from env vars, add AWS CLI quickstart link

Remove quotes from all env var values in README.md and .env.template.
Add link to AWS CLI quickstart guide and aws sts get-caller-identity
verification command in prerequisites.

* Remove redundant .env paths from .gitignore

The bare .env pattern on line 31 already matches .env files in all
subdirectories, making the explicit paths for local-dev and hybrid-dev
notebooks redundant.

* Replace duplicated env vars in aws_integration.md with link to .env.template

* Remove duplicate .env configuration sections from README

Replace repeated env vars listing and setup instructions with a link
to notebooks/.env.template as the single source of truth.

* Use brackets for optional profile arg in setup docs

* Remove hardcoded profile defaults from setup scripts

Remove hardcoded 'default' profile from setup-bedrock-batch.sh and .ps1.
Scripts now accept an optional profile argument — when omitted, AWS CLI
uses its default credential chain (env vars, instance profile, etc.).

- .sh: use PROFILE_ARGS conditional variable
- .ps1: use @ProfileArgs splatting
- Update README.md and setup-bedrock-batch-doc.md accordingly

---------

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
* update documentation

* fix broken mdx files

* small comments

---------

Co-authored-by: Oussama Hansal <haooussa@amazon.com>
Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) from 5.6.2 to 5.18.1.
- [Release notes](https://github.com/withastro/astro/releases)
- [Changelog](https://github.com/withastro/astro/blob/astro@5.18.1/packages/astro/CHANGELOG.md)
- [Commits](https://github.com/withastro/astro/commits/astro@5.18.1/packages/astro)

---
updated-dependencies:
- dependency-name: astro
  dependency-version: 5.18.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Co-authored-by: Oussama Hansal <haooussa@amazon.com>
The Neptune `create_config` helper constructed a botocore Config without
`max_pool_connections`, which falls back to botocore's default of 10.
Under the default `CompositeTraversalBasedRetriever` both
`ChunkBasedSearch` and `EntityNetworkSearch` run in parallel with
`ThreadPoolExecutor(max_workers=num_workers=10)` each, driving up to
20 concurrent Neptune queries and queuing half of them on the pool.

Apply a 32-connection default via `setdefault`, giving ~60% headroom
above the code-provable peak while preserving explicit overrides via
the existing `config=` kwarg on `GraphStoreFactory.for_graph_store`.

Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
When the caller passes a sized iterable (e.g. a list), compute the
total batch count once per pipeline run and include it in the
per-batch log message as "batch: N/M". True streaming iterators
without __len__ keep the existing "batch: N" format.

In non-TTY environments (CloudWatch, ECS, CI/CD) the per-batch log is
the only progress signal; knowing "batch 7 of 12" versus just
"batch 7" is the difference between estimating completion and
guessing. Zero extra materialization cost — only len() is probed.

Follow-up to #96 (closes #97).

Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
* feat(opensearch): thread client_kwargs through OpenSearchIndex

Expose a `client_kwargs` dict on `OpenSearchIndex.for_index` and
`create_opensearch_vector_client` so callers can override OpenSearch-py
client settings (e.g. `pool_maxsize`, `timeout`, `max_retries`) without
patching the library.

The underlying `create_os_client` / `create_os_async_client` helpers
already accept `**kwargs`, but nothing upstream was passing values in,
so the query clients fell back to urllib3's default pool size of 1.
Under concurrent retrieval this causes "connection pool is full"
warnings and repeated TLS handshakes.

Usage:

    VectorStoreFactory.for_vector_store(
        uri,
        client_kwargs={"pool_maxsize": 10, "timeout": 60},
    )

The kwarg is optional; when unset, behaviour is unchanged.

* feat(opensearch): default pool_maxsize to 32

urllib3 falls back to pool_maxsize=1 when not set, which serializes
concurrent queries behind a single TLS connection. 32 is a sane
baseline that matches the typical fan-out of the traversal-based
retrievers. Explicit overrides (including the existing
``pool_maxsize=1`` used by ``index_exists``) still win via
``setdefault``.

* fix(opensearch): make all hardcoded client kwargs overridable via client_kwargs

Addresses reviewer feedback on PR #204. Previously only pool_maxsize
used setdefault; use_ssl, verify_certs, connection_class, timeout,
max_retries, retry_on_timeout were unconditionally hardcoded as
explicit kwargs alongside **kwargs, so any caller-supplied override
for those keys triggered:

  TypeError: OpenSearch() got multiple values for keyword argument ...

Move all defaults into a single dict merged with caller kwargs
(caller wins). Same fix applied to both create_os_client and
create_os_async_client. Adds a test per client that exercises
pool_maxsize + timeout override while confirming the remaining
defaults still flow through.

---------

Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
Co-authored-by: Ian Robinson <ianrob@amazon.com>
Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) and [@astrojs/starlight](https://github.com/withastro/starlight/tree/HEAD/packages/starlight). These dependencies needed to be updated together.

Updates `astro` from 5.18.1 to 6.1.8
- [Release notes](https://github.com/withastro/astro/releases)
- [Changelog](https://github.com/withastro/astro/blob/main/packages/astro/CHANGELOG.md)
- [Commits](https://github.com/withastro/astro/commits/astro@6.1.8/packages/astro)

Updates `@astrojs/starlight` from 0.32.5 to 0.38.3
- [Release notes](https://github.com/withastro/starlight/releases)
- [Changelog](https://github.com/withastro/starlight/blob/main/packages/starlight/CHANGELOG.md)
- [Commits](https://github.com/withastro/starlight/commits/@astrojs/starlight@0.38.3/packages/starlight)

---
updated-dependencies:
- dependency-name: astro
  dependency-version: 6.1.8
  dependency-type: direct:production
- dependency-name: "@astrojs/starlight"
  dependency-version: 0.38.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add documentation site URL and PyPI link  to the top of the root

* address comments

---------

Co-authored-by: Oussama Hansal <haooussa@amazon.com>
---------

Co-authored-by: Oussama Hansal <haooussa@amazon.com>
…cs-site (#219)

Co-authored-by: Oussama Hansal <haooussa@amazon.com>
…ph (#213)

Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.1.1 to 1.2.2.
- [Release notes](https://github.com/theskumar/python-dotenv/releases)
- [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md)
- [Commits](theskumar/python-dotenv@v1.1.1...v1.2.2)

---
updated-dependencies:
- dependency-name: python-dotenv
  dependency-version: 1.2.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ts (#223)

KeywordVSSProvider and GraphSummary are retrieval-time components but
incorrectly default to GraphRAGConfig.extraction_llm. When a user sets
response_llm to an active model but leaves extraction_llm as default,
traversal-based search fails silently at keyword extraction.

This changes both components to use response_llm, consistent with
KeywordProvider and QueryModeRetriever.

Co-authored-by: Abdellah Ghassel <abdghsl@amazon.com>
mykola-pereyma and others added 23 commits June 8, 2026 08:44
…oss examples (#272)

* Use .env.template pattern for environment configuration across examples

Standardize environment variable handling across lexical-graph notebooks,
workshop, and byokg-rag examples to follow the pattern established in
lexical-graph-local-dev and lexical-graph-hybrid-dev.

Changes:
- Add .env.template files for lexical-graph/notebooks, workshop, and byokg-rag
- Refactor all 5 byokg-rag notebooks to use dotenv instead of hardcoded values
- Add python-dotenv to byokg-rag dependencies
- Update READMEs with environment setup instructions
- Create byokg-rag README with notebook descriptions and required env vars

* Update examples/lexical-graph/workshop/readme.md

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

* Update examples/byokg-rag/README.md

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

* Update examples/lexical-graph/workshop/.env.template

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

* Address PR review feedback

- Add placeholder value for GRAPH_IDENTIFIER in byokg-rag .env.template
- Restore trailing newlines in notebook JSON files
- Remove redundant 'fill in your values' text from READMEs
- Remove bare GRAPH_STORE= and VECTOR_STORE= lines from .env.templates
  to avoid unsetting existing environment variables
- Move python-dotenv from main requirements to optional dev dependency

* Use <YOUR_GRAPH_IDENTIFIER> placeholder style consistent with hybrid-dev

* Simplify env config README sections — remove redundant intro text

* Move python-dotenv to notebook pip install cells

Remove python-dotenv from pyproject.toml dev dependency. Add it to
the pip install cells in each byokg-rag notebook instead, since it is
only needed for notebook usage, not the library runtime.

* Fix cell ordering in local_graph notebook and add python-dotenv to prerequisites

- Swap pip install cell before dotenv usage cell in local_graph notebook
- Add python-dotenv to README prerequisites for discoverability

---------

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Extend security-scan.yml to scan lexical-graph/src/ with Bandit,
matching the existing byokg-rag pattern. Adds lexical-graph/** to
trigger paths and a separate bandit job for lexical-graph.
Bumps [actions/github-script](https://github.com/actions/github-script) from 8 to 9.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@v8...v9)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '9'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 4 to 7.
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](astral-sh/setup-uv@v4...v7)

---
updated-dependencies:
- dependency-name: astral-sh/setup-uv
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [pre-commit](https://github.com/pre-commit/pre-commit) to permit the latest version.
- [Release notes](https://github.com/pre-commit/pre-commit/releases)
- [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md)
- [Commits](pre-commit/pre-commit@v3.5.0...v4.6.0)

---
updated-dependencies:
- dependency-name: pre-commit
  dependency-version: 4.6.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2 to 3.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](softprops/action-gh-release@v2...v3)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-version: '3'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@v4...v6)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
- hash_utils.py: Add usedforsecurity=False to MD5 (used for content IDs, not crypto)
- proposition_extractor.py: Add # nosec B615 for HuggingFace downloads (model name is user-configurable, pinning not feasible)
- pg_vector_indexes.py: Add # nosec B608 for SQL identifier interpolation (table/column names from internal config, not user input)

bandit -r lexical-graph/src/ -ll now exits with code 0 (0 medium+ findings).
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
The validation check used '== 3' which incorrectly rejected exactly 3
parameters (the actual maximum supported by the function signature).
Changed to '> 3' to allow the full range of 0-3 parameters.

Added unit tests covering all boundary cases:
- 0, 1, 2, 3 parameters accepted
- 4, 5 parameters rejected
- Missing name rejected
- Parameters marked required and not hidden
…les fix

Bump upload-pages-artifact from v3 to v5 and add include-hidden-files: true
to preserve .nojekyll in the docs artifact. Without this flag, v4+ excludes
dotfiles by default which could affect GitHub Pages deployment.
Replace 3,402 files with :: in their names (invalid on Windows/NTFS) with
ZIP archives (amazon.zip, ecorp.zip, ntsb.zip, wiki-aircraft.zip).
This fixes the inability to clone the repository on Windows.
- Add zip_source parameter for explicit ZIP archive reading
- Add filename_sanitizer parameter for customizable filesystem-safe names
- Include windows_safe_filename helper (replaces :: with __)
- Update 2-Querying.ipynb and setup.py to use zip_source
- Add Windows compatibility docs to workshop readme
- Add 17 unit tests for new functionality
The : character is illegal in Windows/NTFS filenames. The previous
implementation only replaced :: with __ but left single : intact,
causing NotADirectoryError on Windows.

Now replaces :: with __ first, then remaining : with _:
  aws::abc:1234 → aws__abc_1234
…dows

On Windows, open() without encoding defaults to the system codepage
(cp1252) which cannot encode Unicode characters like snowman or globe
emoji. Adding encoding='utf-8' ensures consistent behavior across
platforms.
…rieval, storage, and utils (#300)

* [FEATURE] Add unit tests for lexical-graph storage/graph utilities

Adds unit test coverage for two undertested modules. Tests only, no source
changes.

- storage/graph/graph_utils.py: 21.5% -> 97.5%
- storage/graph/query_tree.py: 25.6% -> 100%

Full-suite coverage: 46.10% -> 46.86% (+488 lines, +71 tests).

* [FEATURE] Bump lexical-graph coverage threshold from 45% to 46%

Locks in the gain from the new graph_utils and query_tree tests
(46.10% -> 46.86%).

* [FEATURE] Trim AI-flavoured comments and tighten test names in new graph tests

Drop the module/class boilerplate docstrings and restate-the-code inline
comments added in the previous coverage commits, keeping only the WHY
comments (branch mappings, LIFO ordering). Rename several tests for
clarity: fix a typo (dedupd to deduped), drop uppercase AND/OR from
identifiers, and reshape the property-based node_id names to describe
the outcome instead of the input shape. Fold the dash case into the
non-alnum test it overlaps with.

* [FEATURE] Bump lexical-graph coverage from 47% to 58% with broad unit-test additions

Adds unit tests across retrieval utils, processors, query-context providers, retrievers, storage wrappers, vector index helpers, and shared utils. Pushes the full-suite coverage from 47% to 58.15% (1305 to 1611 tests) and lifts the CI gate from 47% to 56%.

Covered modules (each new test file targets one source module):

retrieval/utils: statement_utils, chunk_utils, entity_utils, vector_utils, query_decomposition
retrieval/processors: zero_scores, truncate_statements, update_chunk_metadata, rerank_statements
retrieval/query_context: query_mode, keyword_provider, keyword_nlp_provider, keyword_vss_provider, entity_provider, entity_from_top_statement_provider, entity_vss_provider, entity_context_provider
retrieval/retrievers: query_mode_retriever, chunk_cosine_search, chunk_based_search, topic_based_search, entity_based_search, entity_network_search, entity_context_search, composite_traversal_based_retriever, chunk_based_semantic_search, semantic_chunk_beam_search, traversal_based_base_retriever
retrieval/summary: graph_summary
storage/graph: multi_tenant_graph_store, neptune_graph_stores (helpers + factories)
storage/vector: vector_store, multi_tenant_vector_store, read_only_vector_store, neptune_vector_indexes, s3_vector_indexes helpers
utils: reranker_utils, bedrock_utils, llm_cache (predict/stream/cache paths)

Tests use mocks for external services (graph stores, LLMs, boto3 clients, embedding caches) so no infrastructure is required. pg_vector_indexes and opensearch_vector_indexes are not covered here because psycopg2-binary and llama-index-vector-stores-opensearch are not in the test env; the gate at 56 reflects the actual achievable ceiling on this configuration.

* [FEATURE] Remove non-deterministic test_is_unique per reviewer feedback

Drop the new_query_var uniqueness assertion. uuid4 collision is theoretically
possible (1 in 2^122), so the assertion is not strictly reproducible. The hex
body and prefix tests already validate the output shape; an accidental fixed
return would still fail those.

Per acarbonetto review comment on PR #1.

* [FIX] Add license header to retrievers/__init__.py test package

The empty __init__.py was failing the license-header-check job, which
gates the test job. Recent CI runs showed "unknown%" in the coverage
report comment because the artifact never uploaded.

* [FEATURE] Address Mykola's review comments on PR #1 test additions

* test_reranker_utils: add end-to-end TF-IDF ranking test using the real
  tfidf_matcher backend (no mocks), confirming relevance ordering on
  plain English input.
* test_rerank_statements: strengthen model and bedrock dispatch tests to
  verify scores actually land on statements, not just that the scorer
  was called.
* test_entity_searches: add edge-case tests for empty graph-store
  results, store-side exceptions, and missing entity contexts.

* [FEATURE] Enhance tests for composite traversal retriever and neptune vector indexes with pytest fixtures and mock dependencies

* [FEATURE] Raise coverage of four lexical-graph modules to >=80% per Andrew's review

Add unit tests for the modules acarbonetto flagged below 80% on fork PR #1:
rerank_statements (69->99%), keyword_vss_provider (72->100%),
entity_context_provider (75->96%), neptune_graph_stores (66->92%).

Covers the previously-untested scoring-method bodies in rerank_statements,
the topic-content path in keyword_vss_provider, the entity context tree
builders in entity_context_provider, and intercept_before_parse plus the
NeptuneDatabaseClient query/exception paths called out by Kiro.

* [FIX] Return one content string per topic in KeywordVSSProvider._get_topic_content

The result-collection loop iterated over each topic's joined statement string
character by character, so _get_topic_content returned a list of single
characters instead of per-topic content. Append the future result directly and
skip topics with no statements.
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Lexical Graph Coverage Report: The coverage is at 59.55% (target: 80%). Download the HTML report here.

@mykola-pereyma mykola-pereyma left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested inline

@mykola-pereyma mykola-pereyma left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: duplicates, dead code, and minor issues flagged inline.

Comment thread integration-tests/build-tests.sh Outdated
Comment thread integration-tests/build-tests.sh Outdated
Comment thread benchmark-tests/run_evaluation.py
Comment thread integration-tests/benchmark-data/cuad-prototype/qa.json
Comment thread .gitignore Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants