
feat: support word-level timestamps for faster-whisper #9621

Open
eglia wants to merge 1 commit into mudler:master from eglia:stt-wordlevel


@eglia (Contributor) commented Apr 30, 2026

Description
This PR extends the transcription endpoints to include word-level timestamps. Currently this is implemented only for the faster-whisper backend, but it could be added to other backends that support word timings. I also tried to add it for whisper.cpp, but that appears to be blocked by an outstanding issue.
The individual words are returned twice: once within each segment and once as a top-level list. The top-level list complies with the OpenAI spec, while the per-segment words preserve the segment grouping, which might be useful.
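To illustrate the duplicated layout, here is a minimal Python sketch of the response shape, assuming per-word start/end times are already available in seconds (faster-whisper produces them when `WhisperModel.transcribe` is called with `word_timestamps=True`). The `Word` and `Segment` classes and the `to_verbose_json` helper are hypothetical names for illustration, not the code in this PR. Only the `words` entries with `word`, `start`, and `end` follow the OpenAI `verbose_json` format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Word:
    # One recognized word; start/end are assumed to be in seconds.
    word: str
    start: float
    end: float


@dataclass
class Segment:
    # One transcription segment together with the words it contains (hypothetical type).
    id: int
    text: str
    start: float
    end: float
    words: list[Word] = field(default_factory=list)


def to_verbose_json(segments: list[Segment]) -> dict[str, Any]:
    """Build a verbose_json-style payload (hypothetical helper).

    Each word appears twice: nested inside its segment (keeps the grouping)
    and in a flat top-level "words" list (the OpenAI-style field).
    """
    out_segments = []
    all_words = []
    for seg in segments:
        seg_words = [{"word": w.word, "start": w.start, "end": w.end} for w in seg.words]
        out_segments.append(
            {"id": seg.id, "text": seg.text, "start": seg.start, "end": seg.end, "words": seg_words}
        )
        # Copy each dict so the top-level list does not alias the nested ones.
        all_words.extend(dict(w) for w in seg_words)
    return {
        "text": "".join(s.text for s in segments).strip(),
        "segments": out_segments,
        "words": all_words,
    }


if __name__ == "__main__":
    demo = [
        Segment(0, " Hello world.", 0.0, 1.2, [Word(" Hello", 0.0, 0.5), Word(" world.", 0.5, 1.2)]),
        Segment(1, " Bye.", 1.5, 2.0, [Word(" Bye.", 1.5, 2.0)]),
    ]
    payload = to_verbose_json(demo)
    print(len(payload["words"]))  # 3
```

Flattening at serialization time keeps the per-segment data as the single source of truth, so the top-level list cannot drift out of sync with the segments.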

Additionally, the endpoint now returns timestamps in seconds, as the OpenAI spec requires.

This PR fixes #9306

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

Signed-off-by: Andreas Egli <github@kharan.ch>

Development

Successfully merging this pull request may close these issues.

[whisper] word timings in verbose_json
