[Benchmarking]: Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA by oussamahansal · Pull Request #282 · awslabs/graphrag-toolkit

oussamahansal · 2026-05-21T16:28:45Z

Issue #, if available:

Description of changes:

Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow, PGA

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mykola-pereyma · 2026-05-21T22:28:56Z

Please extend description:

- Required env vars: GRAPH_STORE, VECTOR_STORE, BENCHMARK_DATA_S3_URI, AWS_REGION_NAME, S3_RESULTS_BUCKET, S3_RESULTS_PREFIX, BATCH_INFERENCE_ROLE
- Pipeline flow: extract → build → query → evaluate
- How to run prototype mode: set BENCHMARK_IS_PROTOTYPE=true
- What metrics are produced: correctness, idk (LLM-as-judge)
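
For illustration, a pre-flight check along the lines of the items above might look like the sketch below. Only the environment variable names, the stage order, and the `BENCHMARK_IS_PROTOTYPE=true` setting come from this thread; the helper names (`missing_env_vars`, `is_prototype`, `check_environment`) and the case-insensitive parsing of `true` are assumptions, not code from this PR.

```python
import os

# Variable names are taken from the list above; everything else is illustrative.
REQUIRED_ENV_VARS = (
    "GRAPH_STORE",
    "VECTOR_STORE",
    "BENCHMARK_DATA_S3_URI",
    "AWS_REGION_NAME",
    "S3_RESULTS_BUCKET",
    "S3_RESULTS_PREFIX",
    "BATCH_INFERENCE_ROLE",
)

# Stage order as described in the comment: extract, build, query, evaluate.
PIPELINE_STAGES = ("extract", "build", "query", "evaluate")


def missing_env_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def is_prototype(env):
    """Assumed parsing: BENCHMARK_IS_PROTOTYPE=true (any case) enables prototype mode."""
    return env.get("BENCHMARK_IS_PROTOTYPE", "").strip().lower() == "true"


def check_environment(env=None):
    """Fail fast on missing variables, then report the mode and stage order."""
    env = os.environ if env is None else env
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing required env vars: " + ", ".join(missing))
    return {"prototype": is_prototype(env), "stages": list(PIPELINE_STAGES)}
```

Calling `check_environment()` before the extract stage would surface a missing variable immediately rather than partway through a run; whether the benchmark scripts in this PR validate their environment this way is not stated here.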

Oussama Hansal added 2 commits May 12, 2026 09:51

benchmarking concurrentQA dataset

5be3bff

concurrentQA and wikihow becnhmarking

cefecc4

oussamahansal marked this pull request as draft May 21, 2026 16:28

acarbonetto reviewed May 21, 2026


Comment thread integration-tests/benchmark.wikihow

acarbonetto approved these changes May 21, 2026


add pga dataset

82128e0

oussamahansal changed the title ~~Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow~~ Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA May 21, 2026

mykola-pereyma requested changes May 21, 2026


adress comments

0e43dba

oussamahansal requested a review from mykola-pereyma May 22, 2026 17:47

mykola-pereyma approved these changes May 26, 2026


oussamahansal marked this pull request as ready for review May 28, 2026 16:24

oussamahansal changed the title ~~Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA~~ [Benchmarking]: Benchmarking prototype - dataset: CUAD, ConcurrentQA, Wikihow and PGA May 28, 2026

oussamahansal wants to merge 4 commits into acarbo/poc-benchmark-cuad from poc-benchmark-concurrentqa

3 participants
