[Benchmarking]: Benchmarking prototype - dataset: CUAD#228
Open
acarbonetto wants to merge 93 commits into
Open
[Benchmarking]: Benchmarking prototype - dataset: CUAD#228acarbonetto wants to merge 93 commits into
acarbonetto wants to merge 93 commits into
Conversation
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
21 tasks
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
…DB with Neo4j, fix notebooks and scripts (#189) * Consolidate environment configuration - Delete redundant docker/.env.jupyter and docker/.env.template - Consolidate into single notebooks/.env.template (source of truth) - Replace 3 S3 bucket vars with single S3_BUCKET_NAME + prefixes - Align DynamoDB table name (graphrag-toolkit-batch-table) - Fix env var names (INCLUDE_DOMAIN_LABELS, AWS_REGION) - Update models to claude-sonnet-4-6, AWS_PROFILE to default - Add .env and output.log to .gitignore * Replace FalkorDB with Neo4j and standardize Docker - Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose - Rename all services to hybrid convention across all compose files - Add neo4j Python driver and build-essential to dev Dockerfile - Fix dev-reset.sh: run docker compose down before rebuilding - Add mysql cleanup to dev-reset.sh - Update container/volume names in reset scripts - Remove lexical-graph-src mount from main compose (fix dev mode detection) - Fix dev compose mount path and remove non-existent mysql schema mount * Fix AWS setup scripts and add prompt upload - Add bedrock:InvokeModel to batch inference IAM role policy - Align bucket name to graphrag-toolkit-ACCOUNT_ID pattern - Fix AWS profile default from padmin to default - Add S3 prompt file upload (extract from JSON, upload as .txt) - Apply all fixes to both .sh and .ps1 scripts * Standardize and fix notebooks - Fix titles to match numbering (00-Setup through 04-Cloud-Querying) - Standardize dotenv loading (%dotenv magic) - Fix collection_id web-docs to best-practices in notebook 01 - Add Neo4jGraphStoreFactory registration to notebook 03 - Add Bedrock prompt provider section to notebook 04 - Add GPU requirement warning for BGEReranker - Remove hardcoded ARNs, use placeholder comments - Fix hardcoded region to os.environ in notebook 03 - Remove empty trailing cells, clear all outputs * Fix documentation - Update README Quick Start to correct setup order - Add .env sync instructions after setup script - Update model references to claude-sonnet-4-6 - Add RESPONSE_MODEL and EVALUATION_MODEL to docs - Fix BATCH_ROLE_NAME to bedrock-batch-inference-role - Align DynamoDB table name in batch_processing.md and aws_integration.md - Replace padmin with your-profile in setup docs - Replace ccms-rag-extract with graphrag-toolkit bucket names * Update examples/lexical-graph-hybrid-dev/notebooks/.env.template Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com> * Update examples/lexical-graph-hybrid-dev/README.md Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com> * Remove quotes from env vars, add AWS CLI quickstart link Remove quotes from all env var values in README.md and .env.template. Add link to AWS CLI quickstart guide and aws sts get-caller-identity verification command in prerequisites. * Remove redundant .env paths from .gitignore The bare .env pattern on line 31 already matches .env files in all subdirectories, making the explicit paths for local-dev and hybrid-dev notebooks redundant. * Replace duplicated env vars in aws_integration.md with link to .env.template * Remove duplicate .env configuration sections from README Replace repeated env vars listing and setup instructions with a link to notebooks/.env.template as the single source of truth. * Use brackets for optional profile arg in setup docs * Remove hardcoded profile defaults from setup scripts Remove hardcoded 'default' profile from setup-bedrock-batch.sh and .ps1. Scripts now accept an optional profile argument — when omitted, AWS CLI uses its default credential chain (env vars, instance profile, etc.). - .sh: use PROFILE_ARGS conditional variable - .ps1: use @ProfileArgs splatting - Update README.md and setup-bedrock-batch-doc.md accordingly --------- Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Co-authored-by: zollie <>
Co-authored-by: zollie <>
* update documentation * fix broken mdx files * small comments --------- Co-authored-by: Oussama Hansal <haooussa@amazon.com>
Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) from 5.6.2 to 5.18.1. - [Release notes](https://github.com/withastro/astro/releases) - [Changelog](https://github.com/withastro/astro/blob/astro@5.18.1/packages/astro/CHANGELOG.md) - [Commits](https://github.com/withastro/astro/commits/astro@5.18.1/packages/astro) --- updated-dependencies: - dependency-name: astro dependency-version: 5.18.1 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Co-authored-by: Oussama Hansal <haooussa@amazon.com>
The Neptune `create_config` helper constructed a botocore Config without `max_pool_connections`, which falls back to botocore's default of 10. Under the default `CompositeTraversalBasedRetriever` both `ChunkBasedSearch` and `EntityNetworkSearch` run in parallel with `ThreadPoolExecutor(max_workers=num_workers=10)` each, driving up to 20 concurrent Neptune queries and queuing half of them on the pool. Apply a 32-connection default via `setdefault`, giving ~60% headroom above the code-provable peak while preserving explicit overrides via the existing `config=` kwarg on `GraphStoreFactory.for_graph_store`. Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
When the caller passes a sized iterable (e.g. a list), compute the total batch count once per pipeline run and include it in the per-batch log message as "batch: N/M". True streaming iterators without __len__ keep the existing "batch: N" format. In non-TTY environments (CloudWatch, ECS, CI/CD) the per-batch log is the only progress signal; knowing "batch 7 of 12" versus just "batch 7" is the difference between estimating completion and guessing. Zero extra materialization cost — only len() is probed. Follow-up to #96 (closes #97). Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
* feat(opensearch): thread client_kwargs through OpenSearchIndex
Expose a `client_kwargs` dict on `OpenSearchIndex.for_index` and
`create_opensearch_vector_client` so callers can override OpenSearch-py
client settings (e.g. `pool_maxsize`, `timeout`, `max_retries`) without
patching the library.
The underlying `create_os_client` / `create_os_async_client` helpers
already accept `**kwargs`, but nothing upstream was passing values in,
so the query clients fell back to urllib3's default pool size of 1.
Under concurrent retrieval this causes "connection pool is full"
warnings and repeated TLS handshakes.
Usage:
VectorStoreFactory.for_vector_store(
uri,
client_kwargs={"pool_maxsize": 10, "timeout": 60},
)
The kwarg is optional; when unset, behaviour is unchanged.
* feat(opensearch): default pool_maxsize to 32
urllib3 falls back to pool_maxsize=1 when not set, which serializes
concurrent queries behind a single TLS connection. 32 is a sane
baseline that matches the typical fan-out of the traversal-based
retrievers. Explicit overrides (including the existing
``pool_maxsize=1`` used by ``index_exists``) still win via
``setdefault``.
* fix(opensearch): make all hardcoded client kwargs overridable via client_kwargs
Addresses reviewer feedback on PR #204. Previously only pool_maxsize
used setdefault; use_ssl, verify_certs, connection_class, timeout,
max_retries, retry_on_timeout were unconditionally hardcoded as
explicit kwargs alongside **kwargs, so any caller-supplied override
for those keys triggered:
TypeError: OpenSearch() got multiple values for keyword argument ...
Move all defaults into a single dict merged with caller kwargs
(caller wins). Same fix applied to both create_os_client and
create_os_async_client. Adds a test per client that exercises
pool_maxsize + timeout override while confirming the remaining
defaults still flow through.
---------
Co-authored-by: voidwisp <voidwisp@users.noreply.github.com>
Co-authored-by: Ian Robinson <ianrob@amazon.com>
Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) and [@astrojs/starlight](https://github.com/withastro/starlight/tree/HEAD/packages/starlight). These dependencies needed to be updated together. Updates `astro` from 5.18.1 to 6.1.8 - [Release notes](https://github.com/withastro/astro/releases) - [Changelog](https://github.com/withastro/astro/blob/main/packages/astro/CHANGELOG.md) - [Commits](https://github.com/withastro/astro/commits/astro@6.1.8/packages/astro) Updates `@astrojs/starlight` from 0.32.5 to 0.38.3 - [Release notes](https://github.com/withastro/starlight/releases) - [Changelog](https://github.com/withastro/starlight/blob/main/packages/starlight/CHANGELOG.md) - [Commits](https://github.com/withastro/starlight/commits/@astrojs/starlight@0.38.3/packages/starlight) --- updated-dependencies: - dependency-name: astro dependency-version: 6.1.8 dependency-type: direct:production - dependency-name: "@astrojs/starlight" dependency-version: 0.38.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add documentation site URL and PyPI link to the top of the root * address comments --------- Co-authored-by: Oussama Hansal <haooussa@amazon.com>
--------- Co-authored-by: Oussama Hansal <haooussa@amazon.com>
…cs-site (#219) Co-authored-by: Oussama Hansal <haooussa@amazon.com>
…ph (#213) Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.1.1 to 1.2.2. - [Release notes](https://github.com/theskumar/python-dotenv/releases) - [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md) - [Commits](theskumar/python-dotenv@v1.1.1...v1.2.2) --- updated-dependencies: - dependency-name: python-dotenv dependency-version: 1.2.2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ts (#223) KeywordVSSProvider and GraphSummary are retrieval-time components but incorrectly default to GraphRAGConfig.extraction_llm. When a user sets response_llm to an active model but leaves extraction_llm as default, traversal-based search fails silently at keyword extraction. This changes both components to use response_llm, consistent with KeywordProvider and QueryModeRetriever. Co-authored-by: Abdellah Ghassel <abdghsl@amazon.com>
…oss examples (#272) * Use .env.template pattern for environment configuration across examples Standardize environment variable handling across lexical-graph notebooks, workshop, and byokg-rag examples to follow the pattern established in lexical-graph-local-dev and lexical-graph-hybrid-dev. Changes: - Add .env.template files for lexical-graph/notebooks, workshop, and byokg-rag - Refactor all 5 byokg-rag notebooks to use dotenv instead of hardcoded values - Add python-dotenv to byokg-rag dependencies - Update READMEs with environment setup instructions - Create byokg-rag README with notebook descriptions and required env vars * Update examples/lexical-graph/workshop/readme.md Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com> * Update examples/byokg-rag/README.md Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com> * Update examples/lexical-graph/workshop/.env.template Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com> * Address PR review feedback - Add placeholder value for GRAPH_IDENTIFIER in byokg-rag .env.template - Restore trailing newlines in notebook JSON files - Remove redundant 'fill in your values' text from READMEs - Remove bare GRAPH_STORE= and VECTOR_STORE= lines from .env.templates to avoid unsetting existing environment variables - Move python-dotenv from main requirements to optional dev dependency * Use <YOUR_GRAPH_IDENTIFIER> placeholder style consistent with hybrid-dev * Simplify env config README sections — remove redundant intro text * Move python-dotenv to notebook pip install cells Remove python-dotenv from pyproject.toml dev dependency. Add it to the pip install cells in each byokg-rag notebook instead, since it is only needed for notebook usage, not the library runtime. * Fix cell ordering in local_graph notebook and add python-dotenv to prerequisites - Swap pip install cell before dotenv usage cell in local_graph notebook - Add python-dotenv to README prerequisites for discoverability --------- Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Extend security-scan.yml to scan lexical-graph/src/ with Bandit, matching the existing byokg-rag pattern. Adds lexical-graph/** to trigger paths and a separate bandit job for lexical-graph.
Bumps [actions/github-script](https://github.com/actions/github-script) from 8 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v8...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 4 to 7. - [Release notes](https://github.com/astral-sh/setup-uv/releases) - [Commits](astral-sh/setup-uv@v4...v7) --- updated-dependencies: - dependency-name: astral-sh/setup-uv dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [pre-commit](https://github.com/pre-commit/pre-commit) to permit the latest version. - [Release notes](https://github.com/pre-commit/pre-commit/releases) - [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md) - [Commits](pre-commit/pre-commit@v3.5.0...v4.6.0) --- updated-dependencies: - dependency-name: pre-commit dependency-version: 4.6.0 dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2 to 3. - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](softprops/action-gh-release@v2...v3) --- updated-dependencies: - dependency-name: softprops/action-gh-release dependency-version: '3' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](actions/setup-node@v4...v6) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
- hash_utils.py: Add usedforsecurity=False to MD5 (used for content IDs, not crypto) - proposition_extractor.py: Add # nosec B615 for HuggingFace downloads (model name is user-configurable, pinning not feasible) - pg_vector_indexes.py: Add # nosec B608 for SQL identifier interpolation (table/column names from internal config, not user input) bandit -r lexical-graph/src/ -ll now exits with code 0 (0 medium+ findings).
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
The validation check used '== 3' which incorrectly rejected exactly 3 parameters (the actual maximum supported by the function signature). Changed to '> 3' to allow the full range of 0-3 parameters. Added unit tests covering all boundary cases: - 0, 1, 2, 3 parameters accepted - 4, 5 parameters rejected - Missing name rejected - Parameters marked required and not hidden
…les fix Bump upload-pages-artifact from v3 to v5 and add include-hidden-files: true to preserve .nojekyll in the docs artifact. Without this flag, v4+ excludes dotfiles by default which could affect GitHub Pages deployment.
Replace 3,402 files with :: in their names (invalid on Windows/NTFS) with ZIP archives (amazon.zip, ecorp.zip, ntsb.zip, wiki-aircraft.zip). This fixes the inability to clone the repository on Windows.
- Add zip_source parameter for explicit ZIP archive reading - Add filename_sanitizer parameter for customizable filesystem-safe names - Include windows_safe_filename helper (replaces :: with __) - Update 2-Querying.ipynb and setup.py to use zip_source - Add Windows compatibility docs to workshop readme - Add 17 unit tests for new functionality
The : character is illegal in Windows/NTFS filenames. The previous implementation only replaced :: with __ but left single : intact, causing NotADirectoryError on Windows. Now replaces :: with __ first, then remaining : with _: aws::abc:1234 → aws__abc_1234
…dows On Windows, open() without encoding defaults to the system codepage (cp1252) which cannot encode Unicode characters like snowman or globe emoji. Adding encoding='utf-8' ensures consistent behavior across platforms.
…rieval, storage, and utils (#300) * [FEATURE] Add unit tests for lexical-graph storage/graph utilities Adds unit test coverage for two undertested modules. Tests only, no source changes. - storage/graph/graph_utils.py: 21.5% -> 97.5% - storage/graph/query_tree.py: 25.6% -> 100% Full-suite coverage: 46.10% -> 46.86% (+488 lines, +71 tests). * [FEATURE] Bump lexical-graph coverage threshold from 45% to 46% Locks in the gain from the new graph_utils and query_tree tests (46.10% -> 46.86%). * [FEATURE] Trim AI-flavoured comments and tighten test names in new graph tests Drop the module/class boilerplate docstrings and restate-the-code inline comments added in the previous coverage commits, keeping only the WHY comments (branch mappings, LIFO ordering). Rename several tests for clarity: fix a typo (dedupd to deduped), drop uppercase AND/OR from identifiers, and reshape the property-based node_id names to describe the outcome instead of the input shape. Fold the dash case into the non-alnum test it overlaps with. * [FEATURE] Bump lexical-graph coverage from 47% to 58% with broad unit-test additions Adds unit tests across retrieval utils, processors, query-context providers, retrievers, storage wrappers, vector index helpers, and shared utils. Pushes the full-suite coverage from 47% to 58.15% (1305 to 1611 tests) and lifts the CI gate from 47% to 56%. Covered modules (each new test file targets one source module): retrieval/utils: statement_utils, chunk_utils, entity_utils, vector_utils, query_decomposition retrieval/processors: zero_scores, truncate_statements, update_chunk_metadata, rerank_statements retrieval/query_context: query_mode, keyword_provider, keyword_nlp_provider, keyword_vss_provider, entity_provider, entity_from_top_statement_provider, entity_vss_provider, entity_context_provider retrieval/retrievers: query_mode_retriever, chunk_cosine_search, chunk_based_search, topic_based_search, entity_based_search, entity_network_search, entity_context_search, composite_traversal_based_retriever, chunk_based_semantic_search, semantic_chunk_beam_search, traversal_based_base_retriever retrieval/summary: graph_summary storage/graph: multi_tenant_graph_store, neptune_graph_stores (helpers + factories) storage/vector: vector_store, multi_tenant_vector_store, read_only_vector_store, neptune_vector_indexes, s3_vector_indexes helpers utils: reranker_utils, bedrock_utils, llm_cache (predict/stream/cache paths) Tests use mocks for external services (graph stores, LLMs, boto3 clients, embedding caches) so no infrastructure is required. pg_vector_indexes and opensearch_vector_indexes are not covered here because psycopg2-binary and llama-index-vector-stores-opensearch are not in the test env; the gate at 56 reflects the actual achievable ceiling on this configuration. * [FEATURE] Remove non-deterministic test_is_unique per reviewer feedback Drop the new_query_var uniqueness assertion. uuid4 collision is theoretically possible (1 in 2^122), so the assertion is not strictly reproducible. The hex body and prefix tests already validate the output shape; an accidental fixed return would still fail those. Per acarbonetto review comment on PR #1. * [FIX] Add license header to retrievers/__init__.py test package The empty __init__.py was failing the license-header-check job, which gates the test job. Recent CI runs showed "unknown%" in the coverage report comment because the artifact never uploaded. * [FEATURE] Address Mykola's review comments on PR #1 test additions * test_reranker_utils: add end-to-end TF-IDF ranking test using the real tfidf_matcher backend (no mocks), confirming relevance ordering on plain English input. * test_rerank_statements: strengthen model and bedrock dispatch tests to verify scores actually land on statements, not just that the scorer was called. * test_entity_searches: add edge-case tests for empty graph-store results, store-side exceptions, and missing entity contexts. * [FEATURE] Enhance tests for composite traversal retriever and neptune vector indexes with pytest fixtures and mock dependencies * [FEATURE] Raise coverage of four lexical-graph modules to >=80% per Andrew's review Add unit tests for the modules acarbonetto flagged below 80% on fork PR #1: rerank_statements (69->99%), keyword_vss_provider (72->100%), entity_context_provider (75->96%), neptune_graph_stores (66->92%). Covers the previously-untested scoring-method bodies in rerank_statements, the topic-content path in keyword_vss_provider, the entity context tree builders in entity_context_provider, and intercept_before_parse plus the NeptuneDatabaseClient query/exception paths called out by Kiro. * [FIX] Return one content string per topic in KeywordVSSProvider._get_topic_content The result-collection loop iterated over each topic's joined statement string character by character, so _get_topic_content returned a list of single characters instead of per-topic content. Append the future result directly and skip topics with no statements.
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Signed-off-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
|
Lexical Graph Coverage Report: The coverage is at 59.55% (target: 80%). Download the HTML report here. |
mykola-pereyma
left a comment
Collaborator
There was a problem hiding this comment.
Review: duplicates, dead code, and minor issues flagged inline.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue #, if available:
N/A
Description of changes:
Add an integration test that benchmarks a small prototype dataset.
This PR adds three tests
CuadBenchmarkEvaluate,CuadBenchmarkQuery, andCuadBenchmarkBuild, and a small prototype dataset for runnings.Test Results:
02-CuadBenchmarkEvaluate.PASS.json
01-CuadBenchmarkQuery.PASS.json
00-CuadBenchmarkBuild.PASS.json
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.