
STT Evaluation: Gemini Files API instead of signed URLs#685

Open
AkhileshNegi wants to merge 6 commits into main from bugfix/stt-evaluation

Conversation


@AkhileshNegi AkhileshNegi commented Mar 18, 2026

Summary

Target issue is #672

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, it is covered by test cases.

Notes

  • New Features

    • Added audio file upload and deletion for batch STT processing.
    • Automatic cleanup of uploaded audio files after batch jobs complete.
  • Improvements

    • Replaced signed-URL flow with direct file uploads for more reliable handling.

Testing

Tested with a 53-file dataset for STT Evaluation; all transcriptions were successful.


@AkhileshNegi AkhileshNegi self-assigned this Mar 18, 2026
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Mar 18, 2026
@AkhileshNegi AkhileshNegi marked this pull request as ready for review March 18, 2026 06:31
@AkhileshNegi AkhileshNegi linked an issue Mar 18, 2026 that may be closed by this pull request

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

Refactors STT batch flow to upload audio to the Gemini File API (instead of generating signed S3 URLs), adds helpers to upload and delete Gemini files, changes create_stt_batch_requests to accept file URIs and explicit MIME types, and performs post-run Gemini-file cleanup after batch completion.

Changes

Cohort / File(s) Summary
Gemini core & API
backend/app/core/batch/gemini.py
Added _upload_to_gemini, upload_audio_file(content, mime_type, display_name) -> (file_name, file_uri), and delete_files(file_names) -> (success_count, failure_count). Refactored upload_file() to reuse helper. Replaced URL-based MIME detection with explicit mime_type handling via mimetypes. Updated create_stt_batch_requests(file_uris, mime_types, prompt, keys=None) to accept file URIs and mime types and validate list lengths.
STT batch orchestration
backend/app/crud/stt_evaluations/batch.py
Removed signed-URL generation; now downloads audio from S3, determines MIME, uploads audio to Gemini (thread pool), and collects file_uris, mime_types, and gemini_file_names. Removed signed_url_expires_in param. Persisted Gemini file IDs/names in batch config for cleanup and updated logs/metrics to reflect uploads.
Cleanup & polling
backend/app/crud/stt_evaluations/cron.py
After batch jobs become terminal, collects gemini_audio_file_ids from job configs and calls delete_files via asyncio.to_thread for best-effort cleanup; logs deletion outcomes and continues finalization even on deletion failures.
Tests
backend/app/tests/core/batch/test_gemini.py
Refactored fixtures to mock Gemini client; added tests for upload_audio_file (success/error/default display name) and delete_files (all-success/partial-failure/empty). Updated tests for create_stt_batch_requests to use file_uris + mime_types; removed URL-based MIME inference tests and adjusted assertions.

Sequence Diagram

sequenceDiagram
    participant App as STT Batch Service
    participant S3 as AWS S3
    participant GeminiFile as Gemini File API
    participant GeminiBatch as Gemini Batch API

    App->>S3: Download audio bytes
    S3-->>App: Audio bytes + inferred MIME

    App->>GeminiFile: Upload audio (bytes, mime_type)
    GeminiFile-->>App: file_uri, file_name

    Note over App: Aggregate file_uris & mime_types (thread pool)

    App->>GeminiBatch: create_stt_batch_requests(file_uris, mime_types, prompt)
    GeminiBatch-->>App: Batch request payloads / job IDs

    App->>GeminiBatch: Submit & poll jobs
    GeminiBatch-->>App: STT results

    App->>GeminiFile: delete_files(gemini_audio_file_ids)
    GeminiFile-->>App: deletion confirmations / partial failures
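The "Aggregate file_uris & mime_types (thread pool)" step in the diagram can be sketched with a stub uploader; upload_one and the five-field result tuple are assumptions for illustration, not the actual helper signature:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(samples, upload_one):
    """Upload every sample concurrently; return parallel lists plus failures.

    `upload_one(sample)` is assumed to return
    (sample_id, file_uri, file_name, mime_type, error), with file_uri None on failure.
    """
    file_uris, mime_types, sample_keys, gemini_file_names, failed = [], [], [], [], []
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(upload_one, s) for s in samples]
        for fut in as_completed(futures):
            sample_id, file_uri, file_name, mime_type, error = fut.result()
            # Defensive check: require all three fields, not just file_uri.
            if file_uri and mime_type and file_name:
                file_uris.append(file_uri)
                mime_types.append(mime_type)
                sample_keys.append(str(sample_id))
                gemini_file_names.append(file_name)
            else:
                failed.append((sample_id, error))
    return file_uris, mime_types, sample_keys, gemini_file_names, failed
```

Because as_completed yields in completion order, the four success lists stay mutually aligned but are not in submission order — the per-sample key travels with its URI.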

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Evaluation: STT #571 — Modifies GeminiBatchProvider methods and create_stt_batch_requests shape; closely related changes to Gemini upload/URI handling.

Suggested reviewers

  • kartpop
  • vprashrex

Poem

🐰 Hopping bytes to Gemini's door,
I upload sounds and zip once more.
Jobs hum softly, files depart—
I tidy crumbs with a brave heart.
Thump-thump, the pipeline leaps with glee!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: replacing signed URLs with Gemini Files API for STT evaluation, which is the core refactoring across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.




codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 79.34783% with 38 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/stt_evaluations/batch.py 25.71% 26 Missing ⚠️
backend/app/crud/stt_evaluations/cron.py 7.69% 12 Missing ⚠️



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
backend/app/crud/stt_evaluations/batch.py (1)

97-134: Add type hint for mime_type in tuple return and handle potential None value.

The _upload_to_gemini function returns mime_type which can be None on error, but on success at line 145, mime_types.append(mime_type) appends it without a null check. While the conditional if file_uri: guards this path, the type system doesn't guarantee mime_type is non-None when file_uri is truthy.

Consider adding an explicit check for defensive coding:

🛡️ Suggested defensive check
         for completed_task in as_completed(upload_tasks):
             sample, file_uri, file_name, mime_type, error = completed_task.result()
-            if file_uri:
+            if file_uri and mime_type and file_name:
                 file_uris.append(file_uri)
                 mime_types.append(mime_type)
                 sample_keys.append(str(sample.id))
                 gemini_file_names.append(file_name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/batch.py` around lines 97 - 134, Update the
_upload_to_gemini return type to explicitly mark mime_type as optional (e.g.,
tuple[STTSample, str | None, str | None, str | None, str | None]) and ensure
callers treat mime_type as possibly None: when handling the returned (sample,
file_uri, file_name, mime_type, error) from _upload_to_gemini, only append
mime_type to mime_types (or use a safe fallback like "audio/mpeg") after
checking mime_type is not None and file_uri is truthy; adjust the code paths
around upload_provider.upload_audio_file, file_uri checks and mime_types.append
to perform this defensive null check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e89169a-6920-4f23-8db5-666d9930b908

📥 Commits

Reviewing files that changed from the base of the PR and between 9eab012 and 5e87f10.

📒 Files selected for processing (4)
  • backend/app/core/batch/gemini.py
  • backend/app/crud/stt_evaluations/batch.py
  • backend/app/crud/stt_evaluations/cron.py
  • backend/app/tests/core/batch/test_gemini.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/stt_evaluations/batch.py (1)

196-234: ⚠️ Potential issue | 🟠 Major

Clean up uploaded Gemini files when batch submission never succeeds.

If every start_batch_job() call raises, gemini_file_names never get written into any batch job config, so the later cleanup in poll_stt_run() has nothing to delete. That leaves all uploaded audio orphaned until Gemini’s TTL expires.

♻️ Proposed fix
-    if not batch_jobs:
-        raise Exception("Batch submission failed for all models")
+    if not batch_jobs:
+        if gemini_file_names:
+            upload_provider.delete_files(gemini_file_names)
+        raise Exception("Batch submission failed for all models")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/batch.py` around lines 196 - 234, The loop
can fail for every model leaving gemini_file_names orphaned; after the loop when
batch_jobs is empty, ensure you clean up the uploaded Gemini files before
raising. Add a call (and log) to the existing cleanup helper—or implement a
small cleanup routine—to delete gemini_file_names (referencing
gemini_file_names) in the same function (where batch_jobs and first_batch_job_id
are checked) so poll_stt_run() has nothing left to delete later; do this cleanup
just before raising the Exception to avoid leaving uploaded audio behind.
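The delete_files(file_names) -> (success_count, failure_count) contract from the walkthrough implies best-effort semantics: one bad file must not abort cleanup of the rest, and cleanup must never block run finalization. A minimal sketch, with delete_one standing in for the real Gemini client delete call:

```python
def delete_files(file_names, delete_one):
    """Best-effort deletion: try every file, never raise, report counts.

    `delete_one(name)` is a stand-in for the actual Gemini Files API delete call.
    """
    success, failure = 0, 0
    for name in file_names:
        try:
            delete_one(name)
            success += 1
        except Exception:
            # Log and continue; a failed delete only leaves an orphan
            # until Gemini's file TTL expires.
            failure += 1
    return success, failure
```

Returning counts rather than raising lets the caller log partial failures while still finalizing the batch, matching the "continues finalization even on deletion failures" behavior described for cron.py.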
🧹 Nitpick comments (1)
backend/app/tests/core/batch/test_gemini.py (1)

15-24: Please make the shared pytest fixtures typed factories.

These new fixtures return concrete instances directly and omit annotations, which makes the setup less reusable across cases and doesn’t match the repo’s testing conventions.

As per coding guidelines, **/*.py: Always add type hints to all function parameters and return values in Python code; backend/app/tests/**/*.py: Use factory pattern for test fixtures in backend/app/tests/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/core/batch/test_gemini.py` around lines 15 - 24, The
fixtures currently return concrete instances without type hints or factory
flexibility; change both to typed factory fixtures: make mock_genai_client a
pytest fixture that returns a Callable[[], MagicMock] with explicit typing (from
typing import Callable) that when called creates and returns a MagicMock, and
make provider a pytest fixture that returns a Callable[[MagicMock],
GeminiBatchProvider] (or Callable[[], GeminiBatchProvider] if you prefer it to
construct its own mock) with full type hints and a docstring; update signatures
to include return types and parameter types, reference the existing symbols
mock_genai_client, provider, GeminiBatchProvider, MagicMock and pytest.fixture
so tests can call the factory to get fresh instances instead of reusing a single
concrete object.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f9b617a-e689-4871-b3db-46c3de3626e7

📥 Commits

Reviewing files that changed from the base of the PR and between 5e87f10 and 447fc91.

📒 Files selected for processing (4)
  • backend/app/core/batch/gemini.py
  • backend/app/crud/stt_evaluations/batch.py
  • backend/app/crud/stt_evaluations/cron.py
  • backend/app/tests/core/batch/test_gemini.py

Comment on lines +154 to +176
 for completed_task in as_completed(upload_tasks):
     result = completed_task.result()
     if result.file_uri:
         file_uris.append(result.file_uri)
         mime_types.append(result.mime_type)
         sample_keys.append(str(result.sample.id))
         gemini_file_names.append(result.file_name)
     else:
-        failed_samples.append((sample, error))
+        failed_samples.append((result.sample, result.error))
         logger.error(
-            f"[start_stt_evaluation_batch] Failed to generate signed URL | "
-            f"sample_id: {sample.id}, error: {error}"
+            f"[start_stt_evaluation_batch] Failed to upload to Gemini | "
+            f"sample_id: {result.sample.id}, error: {result.error}"
         )

 if failed_samples:
     logger.warning(
-        f"[start_stt_evaluation_batch] Signed URL failures | "
+        f"[start_stt_evaluation_batch] Gemini upload failures | "
         f"run_id: {run.id}, failed_count: {len(failed_samples)}, "
-        f"succeeded_count: {len(signed_urls)}"
+        f"succeeded_count: {len(file_uris)}"
     )

-if not signed_urls:
-    raise Exception("Failed to generate signed URLs for any audio files")
+if not file_uris:
+    raise Exception("Failed to upload audio files to Gemini for any samples")
Copy link


⚠️ Potential issue | 🟠 Major

Don’t silently drop samples whose Gemini upload fails.

This path only logs failed uploads and then builds the batch from the successful subset. Since process_completed_stt_batch() later creates STTResult rows only for returned batch responses, those samples never get a persisted failure and the run can finish “completed” with missing results. Either create failed result rows here or abort the run when any upload fails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/batch.py` around lines 154 - 176, The
current loop logs failed Gemini uploads but drops those samples, causing missing
STTResult rows; update the upload-failure path in start_stt_evaluation_batch so
that for each entry in failed_samples you persist a failure STTResult (or
equivalent model) tied to the sample and run with the error message/state
instead of silently skipping them; use the same persistence API used by
process_completed_stt_batch (or the STTResult creation helper) so downstream
logic sees explicit failed results, and only raise/abort if you prefer the
alternative behavior (document choice) rather than leaving samples unrecorded.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/core/batch/gemini.py (1)

504-518: ⚠️ Potential issue | 🟡 Minor

Clarify URI vs file-name examples in STT request docs.

The docstring says file_uris but example values like "files/abc123" look like file names, not URIs. Please align wording/examples to avoid misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/batch/gemini.py` around lines 504 - 518, The docstring for
create_stt_batch_requests is inconsistent about whether parameter file_uris are
full URIs or simple file names; update the docstring to be explicit and align
the example: either (A) state that file_uris must be Gemini-style relative URIs
(e.g., "files/abc123") and keep the example, or (B) change the example to a full
URI (e.g., "https://.../files/abc123") if you intend full URLs. Edit the
parameter description for file_uris and the example block so they consistently
show the chosen format and ensure mime_types and prompt wording remain
unchanged.
♻️ Duplicate comments (2)
backend/app/crud/stt_evaluations/batch.py (1)

154-176: ⚠️ Potential issue | 🟠 Major

Don’t silently drop samples when Gemini upload fails.

This path logs upload failures but still submits batches for the successful subset only. Those failed samples won’t produce STTResult rows, so runs can complete with missing per-sample outcomes. Persist explicit failed results for failed_samples (or fail the run before submission).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/stt_evaluations/batch.py` around lines 154 - 176, The
current loop logs Gemini upload failures into failed_samples but then proceeds
to submit only the successful uploads, leaving missing STTResult rows; update
the post-upload handling in start_stt_evaluation_batch to persist explicit
failure results for each entry in failed_samples (or alternatively abort the
run): iterate failed_samples and create a failed STTResult (or call the existing
helper that writes STTResult rows) for each sample with status/error fields
populated (include sample id, run id, error message from result.error and an
appropriate failure status), and ensure any downstream submission logic uses the
persisted results (or raise an exception to stop the run) so failures are not
silently dropped.
backend/app/core/batch/gemini.py (1)

347-351: ⚠️ Potential issue | 🟠 Major

Use a supported MIME type for JSONL uploads.

mime_type="jsonl" is likely to fail Files API MIME validation; use a standard type (application/json) for batch JSONL uploads.

Suggested fix
             uploaded_file = self._upload_to_gemini(
                 content=content,
                 suffix=".jsonl",
-                mime_type="jsonl",
+                mime_type="application/json",
                 display_name=f"batch-input-{int(time.time())}",
             )
Official Gemini Files API docs: is `mime_type="jsonl"` supported in UploadFileConfig, and what MIME type is recommended for JSONL batch input files?
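A related pitfall behind this finding: Python's stdlib mimetypes registry does not know the .jsonl extension, which is why an explicit standard type is needed for batch-input uploads. A small helper illustrating the fallback (pick_mime is a hypothetical name, not a function in this PR):

```python
import mimetypes


def pick_mime(filename: str, fallback: str = "application/json") -> str:
    """Resolve a MIME type from the filename, falling back to a standard type.

    The stdlib registry knows common audio extensions (e.g. .mp3 -> audio/mpeg)
    but has no entry for ".jsonl", so JSONL batch inputs need an explicit fallback.
    """
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or fallback
```

For the audio samples themselves, guess_type on the S3 key is usually sufficient; the fallback only matters for extensions outside the registry.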
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/batch/gemini.py` around lines 347 - 351, The upload uses an
unsupported mime_type string "jsonl" which may fail the Gemini Files API
validation; update the call to _upload_to_gemini (where uploaded_file is
created) to pass a standard MIME type such as "application/json" (keep
suffix=".jsonl" and display_name as-is) so the Files API accepts the JSONL batch
input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18a9403c-08ef-4940-87c0-243f4819a32e

📥 Commits

Reviewing files that changed from the base of the PR and between 447fc91 and e114095.

📒 Files selected for processing (3)
  • backend/app/core/batch/gemini.py
  • backend/app/crud/stt_evaluations/batch.py
  • backend/app/crud/stt_evaluations/cron.py

@AkhileshNegi AkhileshNegi requested a review from kartpop March 18, 2026 07:34

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

STT Evaluation: Debug STT errors in ATREE evaluation run

1 participant