
[Benchmarking]: Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA#282

Open
oussamahansal wants to merge 4 commits into acarbo/poc-benchmark-cuad from poc-benchmark-concurrentqa

@oussamahansal (Collaborator) commented May 21, 2026

Issue #, if available:

Description of changes:

  • Benchmarking prototype covering the CUAD, ConcurrentQA, Wikihow, and PGA datasets

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@oussamahansal oussamahansal marked this pull request as draft May 21, 2026 16:28
Comment thread integration-tests/benchmark.wikihow
@oussamahansal oussamahansal changed the title from "Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow" to "Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA" May 21, 2026
@mykola-pereyma (Collaborator)

Please extend the description to cover:

  • Required env vars: GRAPH_STORE, VECTOR_STORE, BENCHMARK_DATA_S3_URI, AWS_REGION_NAME, S3_RESULTS_BUCKET, S3_RESULTS_PREFIX, BATCH_INFERENCE_ROLE
  • Pipeline flow: extract → build → query → evaluate
  • How to run prototype mode: set BENCHMARK_IS_PROTOTYPE=true
  • Which metrics are produced: correctness, idk (LLM-as-judge)
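
For illustration, the items requested above could be summarised in the description with a small sketch like the one below. It is not taken from the PR's scripts: the helper names (`missing_env_vars`, `is_prototype`, `run_pipeline`) and the `stage_runners` mapping are hypothetical, and the rule that only the value `true` enables prototype mode is an assumption. Only the variable names and the stage order come from the list above.

```python
import os

# Variable names as listed in the review comment.
REQUIRED_ENV_VARS = (
    "GRAPH_STORE",
    "VECTOR_STORE",
    "BENCHMARK_DATA_S3_URI",
    "AWS_REGION_NAME",
    "S3_RESULTS_BUCKET",
    "S3_RESULTS_PREFIX",
    "BATCH_INFERENCE_ROLE",
)

# Stage order as listed in the review comment.
PIPELINE_STAGES = ("extract", "build", "query", "evaluate")


def missing_env_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def is_prototype(env):
    """Assumption: only 'true' (case-insensitive) switches prototype mode on."""
    return env.get("BENCHMARK_IS_PROTOTYPE", "").strip().lower() == "true"


def run_pipeline(env, stage_runners):
    """Run the stages in order.

    stage_runners is a hypothetical mapping of stage name to a callable
    that accepts a single bool (whether prototype mode is on).
    """
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing required env vars: " + ", ".join(missing))
    prototype = is_prototype(env)
    completed = []
    for stage in PIPELINE_STAGES:
        stage_runners[stage](prototype)
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # Report what is missing from the real environment without running anything.
    print("missing:", missing_env_vars(os.environ))
    print("prototype mode:", is_prototype(os.environ))
```

Failing fast on missing variables before the first stage avoids a partial run that only breaks once the query or evaluate stage needs the S3 results location.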

Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_build.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_query.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_extract.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_extract.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_extract.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_build.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_extract.py Outdated
Comment thread integration-tests/test-scripts/graphrag_toolkit_tests/benchmark_query.py Outdated
@oussamahansal oussamahansal marked this pull request as ready for review May 28, 2026 16:24
@oussamahansal oussamahansal changed the title from "Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA" to "[Benchmarking]: Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA" May 28, 2026

3 participants