feat: audio transcription API in LLM gateway #44017
base: master
Conversation
No issues found across 3 files
3 files reviewed, 2 comments
```python
"file": file_tuple,
# Use JSON to collect input/output tokens for billing
# Other formats are not supported yet ("text", "srt", "verbose_json", "vtt")
"response_format": "json",
```
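For context, `file_tuple` here is presumably the (filename, payload, content type) triple that multipart encoders accept, e.g. `requests`' `files=` argument or the OpenAI SDK's `file=` parameter. A minimal sketch (the helper name is hypothetical, not PR code):

```python
def build_file_tuple(filename: str, payload: bytes, content_type: str = "audio/mpeg"):
    """Assemble the (name, bytes, MIME type) triple used for multipart uploads.

    Hypothetical helper: the PR's actual construction of file_tuple may differ.
    """
    return (filename, payload, content_type)
```

The triple is then passed as the `file` entry of the multipart form alongside `response_format`.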
style: hardcoded response format to json limits flexibility. the openai transcription api supports multiple formats (text, srt, verbose_json, vtt) that users might need
```suggestion
"response_format": data.get("response_format", "json"),
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
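If caller-supplied formats are accepted, the gateway would still need to gate them, since the code comment notes that billing relies on token usage that only the JSON-style formats report. A hedged sketch of such a guard (function and set names are assumptions, not PR code):

```python
# Formats the gateway can bill today, per the PR's code comment.
# OpenAI's "verbose_json" also carries usage data, but the PR marks it
# unsupported for now, so it is excluded here.
BILLABLE_FORMATS = {"json"}

def resolve_response_format(data: dict) -> str:
    """Return the requested response_format, defaulting to and enforcing "json"."""
    requested = data.get("response_format", "json")
    if requested not in BILLABLE_FORMATS:
        raise ValueError(f"unsupported response_format: {requested!r}")
    return requested
```

This keeps the suggested `data.get("response_format", "json")` flexibility while preserving the billing constraint.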
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/api/llm_gateway/http.py
Line: 410:410
How can I resolve this? If you propose a fix, please make it concise.

```python
class TranscriptionRequestSerializer(serializers.Serializer):
    model = serializers.ChoiceField(
        choices=["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"],
        default="gpt-4o-transcribe",
        help_text="Transcription model",
    )
    prompt = serializers.CharField(
        required=False,
        help_text="Optional text prompt to guide the style, vocabulary or continue a previous audio segment",
    )
    language = serializers.CharField(
        required=False,
        help_text="Language of the input audio in ISO-639-1 format. See https://github.com/openai/whisper#available-models-and-languages",
    )
    temperature = serializers.FloatField(
        required=False,
        min_value=0.0,
        max_value=1.0,
        help_text="Optional temperature between 0 and 1",
    )
```
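For readers without DRF at hand, the validation this serializer performs can be mirrored in plain Python. This is a behavioral sketch under that assumption, not the actual implementation:

```python
ALLOWED_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"}

def validate_transcription_request(data: dict) -> dict:
    """Mirror the serializer's rules: model choices with a default,
    optional prompt/language, and temperature bounded to [0, 1]."""
    validated = {"model": data.get("model", "gpt-4o-transcribe")}
    if validated["model"] not in ALLOWED_MODELS:
        raise ValueError(f"invalid model: {validated['model']!r}")
    for optional in ("prompt", "language"):
        if optional in data:
            validated[optional] = data[optional]
    if "temperature" in data:
        temperature = float(data["temperature"])
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        validated["temperature"] = temperature
    return validated
```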
style: missing file field validation in serializer. the audio_file validation happens in the view (line 392-397 in http.py), but the serializer should validate file uploads
```suggestion
class TranscriptionRequestSerializer(serializers.Serializer):
    file = serializers.FileField(
        required=True,
        help_text="Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm)",
    )
    model = serializers.ChoiceField(
        choices=["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"],
        default="gpt-4o-transcribe",
        help_text="Transcription model",
    )
    prompt = serializers.CharField(
        required=False,
        help_text="Optional text prompt to guide the style, vocabulary or continue a previous audio segment",
    )
    language = serializers.CharField(
        required=False,
        help_text="Language of the input audio in ISO-639-1 format. See https://github.com/openai/whisper#available-models-and-languages",
    )
    temperature = serializers.FloatField(
        required=False,
        min_value=0.0,
        max_value=1.0,
        help_text="Optional temperature between 0 and 1",
    )
```
does the MultiPartParser already handle file validation before the serializer, making this suggestion unnecessary?
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/api/llm_gateway/serializers.py
Line: 167:186
How can I resolve this? If you propose a fix, please make it concise.
🦔 Preview instance
✅ Preview deployment ready
URL: https://do-ci-hobby-pr-44017.posthog.cc
Mode: 🔄 Preview (persistent)
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="posthog/api/llm_gateway/serializers.py">
<violation number="1" location="posthog/api/llm_gateway/serializers.py:168">
P2: Consider adding a file size validator to prevent large file uploads that could cause memory issues. OpenAI's transcription API limits files to 25MB, so enforcing this at the serializer level is recommended.</violation>
</file>
Reply to cubic to teach it or ask questions. Tag @cubic-dev-ai to re-run a review.
```python
class TranscriptionRequestSerializer(serializers.Serializer):
    file = serializers.FileField(
```
P2: Consider adding a file size validator to prevent large file uploads that could cause memory issues. OpenAI's transcription API limits files to 25MB, so enforcing this at the serializer level is recommended.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At posthog/api/llm_gateway/serializers.py, line 168:
<comment>Consider adding a file size validator to prevent large file uploads that could cause memory issues. OpenAI's transcription API limits files to 25MB, so enforcing this at the serializer level is recommended.</comment>
<file context>
@@ -165,6 +165,10 @@ class ErrorResponseSerializer(serializers.Serializer):
class TranscriptionRequestSerializer(serializers.Serializer):
+ file = serializers.FileField(
+ required=True,
+ help_text="Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm)",
</file context>
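A minimal sketch of the suggested size guard, written as a standalone function; in DRF it would presumably live in a `validate_file(self, value)` method checking `value.size`, and the 25 MB figure is OpenAI's documented limit:

```python
# OpenAI's documented 25 MB cap for transcription uploads
MAX_AUDIO_BYTES = 25 * 1024 * 1024

def check_audio_size(size_bytes: int) -> int:
    """Raise ValueError if the upload exceeds the transcription size limit."""
    if size_bytes > MAX_AUDIO_BYTES:
        raise ValueError(
            f"audio file is {size_bytes} bytes; the transcription API accepts at most {MAX_AUDIO_BYTES}"
        )
    return size_bytes
```

Failing fast here avoids buffering an upload the upstream API would reject anyway.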
Problem
To deploy the PostHog mobile app to TestFlight (PR: PostHog/Array#224), audio transcription needs to be moved from the client to the backend.
For context: example of the OpenAI Transcription API response:
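For the default `json` format, a representative response shape per the linked docs (illustrative only; exact fields, particularly the `usage` breakdown, vary by model):

```json
{
  "text": "Hello, this is a transcribed sentence.",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "output_tokens": 9,
    "total_tokens": 23
  }
}
```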
Docs: https://platform.openai.com/docs/guides/speech-to-text
Changes
- Added `/v1/audio/transcriptions` endpoint to LLM Gateway
- Billing via the `posthog` callback
- Supported models (`gpt-4o-transcribe`, `whisper-1`)

How did you test this code?
Manually tested via mobile app voice recording
Screen.Recording.2025-12-26.at.20.29.41.mov