
Fix ILIAS/INQUIRE evaluation workflow: configurable paths, format mismatch, and sample data generator#5

Open
Copilot wants to merge 2 commits into main from copilot/create-sample-dataset-for-evaluation

Conversation


Copilot AI commented Apr 11, 2026

Users had no clear path to evaluate on ILIAS/INQUIRE: compute_embeds.py had hardcoded researcher-specific paths, eval_retrieval.py silently broke when loading distractor embeddings due to a key mismatch, and there was no way to test the pipeline without downloading large datasets.

compute_embeds.py — remove hardcoded paths

  • --shard_dir and --out_dir are now required CLI arguments (previously hardcoded to /u/ericx003/data/ilias/yfcc100m and ./yfcc_embeds)
  • Added module docstring with output format spec and example commands for both ILIAS (YFCC100M) and INQUIRE (iNaturalist)
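The change above can be sketched with argparse. The argument names --shard_dir and --out_dir come from the PR; the surrounding parser structure is an illustrative assumption, not the actual compute_embeds.py code.

```python
"""Sketch of the compute_embeds.py CLI change: --shard_dir and --out_dir
become required arguments with no hardcoded defaults."""
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Embed WebDataset shards.")
    # Previously these defaulted to researcher-specific paths; now the
    # caller must supply them explicitly.
    parser.add_argument("--shard_dir", required=True,
                        help="Directory containing input .tar shards")
    parser.add_argument("--out_dir", required=True,
                        help="Directory to write embedding files")
    return parser


# Example invocation with explicit paths:
args = build_parser().parse_args(
    ["--shard_dir", "./sample/distractors", "--out_dir", "./embeds"]
)
print(args.shard_dir, args.out_dir)
```

With required=True, omitting either flag makes argparse print a usage error and exit, rather than silently falling back to someone else's filesystem layout.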

eval_retrieval.py — fix distractor loading format mismatch

compute_embeds.py writes {"keys": ..., "embeddings": ...} but the distractor loader called data['image_embeddings'], causing a KeyError at eval time.

Added load_distractor_embeddings_from_dir() that accepts both formats:

# compute_embeds.py output → "embeddings" key
# precompute_embeddings.py output → "image_embeddings" key

Also added an explicit .float() cast on distractor tensors, since compute_embeds.py defaults to fp16 storage.
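The key fallback and the upcast can be sketched as follows. The real load_distractor_embeddings_from_dir() operates on torch tensors loaded from disk; this sketch uses numpy and an in-memory dict so it stays dependency-light, and the helper name load_embeddings is hypothetical.

```python
"""Illustrative sketch of the dual-format fallback described above."""
import numpy as np


def load_embeddings(data: dict) -> np.ndarray:
    # compute_embeds.py writes {"keys": ..., "embeddings": ...};
    # precompute_embeddings.py writes {"image_embeddings": ...}.
    if "embeddings" in data:
        emb = data["embeddings"]
    elif "image_embeddings" in data:
        emb = data["image_embeddings"]
    else:
        raise KeyError("no 'embeddings' or 'image_embeddings' key found")
    # compute_embeds.py stores fp16; cast up front so downstream
    # similarity math runs in fp32 (the numpy analog of .float()).
    return np.asarray(emb, dtype=np.float32)
```

Raising a descriptive KeyError for unknown formats keeps the failure at load time with a clear message, instead of the opaque KeyError: 'image_embeddings' that surfaced mid-evaluation before the fix.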

create_sample_dataset.py — new synthetic data generator

Generates minimal WebDataset tar shards for smoke-testing the pipeline end-to-end without large downloads:

# Image-only shards for compute_embeds.py (mimics YFCC100M / iNaturalist)
python create_sample_dataset.py --mode distractors --out_dir ./sample/distractors --n_images 200

# Image+caption shards for precompute_embeddings.py (mimics ILIAS-core / INQUIRE queries)
python create_sample_dataset.py --mode pairs --out_dir ./sample/pairs --n_images 50 --captions_per_image 3
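A WebDataset shard is just a tar archive in which each sample's files share a basename and differ by extension (000000.jpg, 000000.txt, ...). The sketch below is a hypothetical miniature of create_sample_dataset.py under that assumption, with placeholder bytes standing in for real JPEG data so it needs only the standard library.

```python
"""Hypothetical mini-generator for WebDataset-style tar shards."""
import io
import tarfile


def write_shard(path: str, n_samples: int, with_captions: bool = False) -> None:
    with tarfile.open(path, "w") as tar:
        for i in range(n_samples):
            base = f"{i:06d}"
            # Placeholder bytes, not a decodable JPEG; a real generator
            # would synthesize an actual image here.
            img = b"\xff\xd8\xff" + bytes(16)
            info = tarfile.TarInfo(name=f"{base}.jpg")
            info.size = len(img)
            tar.addfile(info, io.BytesIO(img))
            if with_captions:  # pairs mode: add a .txt caption per image
                cap = f"synthetic caption {i}".encode()
                info = tarfile.TarInfo(name=f"{base}.txt")
                info.size = len(cap)
                tar.addfile(info, io.BytesIO(cap))


write_shard("shard-000000.tar", n_samples=3, with_captions=True)
```

With with_captions=False this mirrors the --mode distractors layout (image-only), and with_captions=True mirrors --mode pairs.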

README.md — evaluation documentation

Expanded evaluation section with a staged walkthrough (distractor embeddings → paired embeddings → retrieval eval) for both ILIAS and INQUIRE, plus a complete smoke-test example using create_sample_dataset.py.

Copilot AI linked an issue Apr 11, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Add sample dataset for evaluating ILIAS and INQUIRE" to "Fix ILIAS/INQUIRE evaluation workflow: configurable paths, format mismatch, and sample data generator" Apr 11, 2026
Copilot AI requested a review from jacobsn April 11, 2026 11:36
@jacobsn jacobsn requested review from EricX003 and removed request for jacobsn April 11, 2026 12:39
@jacobsn jacobsn marked this pull request as ready for review April 11, 2026 12:39


Development

Successfully merging this pull request may close these issues.

Evaluation on ILIAS and INQUIRE
