Conversation

@a-lider a-lider commented Dec 26, 2025

Problem

To deploy the PostHog mobile app to TestFlight (PR: PostHog/Array#224), audio transcription needs to be moved from the client to the backend.

For context: example of the OpenAI Transcription API response:

Transcription(
    text="Audio recording...",
    usage=UsageTokens(
        input_tokens=163,           # Total input (audio + text prompt)
        output_tokens=26,           # Transcribed text tokens
        total_tokens=189,
        input_token_details=UsageTokensInputTokenDetails(
            audio_tokens=163,       # Audio portion
            text_tokens=0           # Text prompt portion
        )
    )
)

Docs: https://platform.openai.com/docs/guides/speech-to-text
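
For illustration only, a client might assemble the multipart request for such a transcription call roughly as sketched below. The field names mirror the OpenAI speech-to-text API; the helper, filename, and MIME type are hypothetical:

```python
def build_transcription_request(audio_bytes, model="gpt-4o-transcribe",
                                response_format="json", **options):
    """Assemble multipart form parts for POST /v1/audio/transcriptions.

    Field names follow the OpenAI speech-to-text API; optional parameters
    (prompt, language, temperature) are forwarded only when supplied.
    """
    fields = {"model": model, "response_format": response_format}
    for key in ("prompt", "language", "temperature"):
        if key in options:
            fields[key] = options[key]
    # Filename and MIME type here are illustrative placeholders.
    files = {"file": ("recording.m4a", audio_bytes, "audio/m4a")}
    return fields, files
```

The returned `fields` and `files` dicts match the shape accepted by common HTTP clients for multipart uploads.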

Changes

  • Added /v1/audio/transcriptions endpoint to LLM Gateway
  • Token-based billing via LiteLLM posthog callback
  • Added manual cost definitions for OpenAI transcription models in manual-providers.ts
    • OpenRouter's model cost data doesn't include speech-to-text models (e.g. gpt-4o-transcribe, whisper-1)
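
As a rough illustration of the token-based billing, the usage counts from a response like the example above could be priced as follows. The rates here are made-up placeholders; the real numbers live in the manual cost definitions in manual-providers.ts:

```python
# Placeholder per-token rates for illustration only; actual pricing comes
# from the manual cost definitions in manual-providers.ts.
AUDIO_INPUT_RATE = 6.00 / 1_000_000   # assumed USD per input token
TEXT_OUTPUT_RATE = 10.00 / 1_000_000  # assumed USD per output token

def transcription_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one transcription call from its usage token counts."""
    return input_tokens * AUDIO_INPUT_RATE + output_tokens * TEXT_OUTPUT_RATE

# Using the usage numbers from the example response above:
cost = transcription_cost(163, 26)
```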

How did you test this code?

Manually tested via mobile app voice recording

Screen.Recording.2025-12-26.at.20.29.41.mov

@assign-reviewers-posthog assign-reviewers-posthog bot requested review from a team December 26, 2025 19:31
@a-lider a-lider requested a review from a team December 26, 2025 19:31

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 3 files


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 2 comments


"file": file_tuple,
# Use JSON to collect input/output tokens for billing
# Other formats are not supported yet ("text", "srt", "verbose_json", "vtt")
"response_format": "json",

style: hardcoded response format to json limits flexibility. the openai transcription api supports multiple formats (text, srt, verbose_json, vtt) that users might need

Suggested change
                "response_format": "json",
                "response_format": data.get("response_format", "json"),

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Path: posthog/api/llm_gateway/http.py, line 410

Comment on posthog/api/llm_gateway/serializers.py, lines 167 to 186
class TranscriptionRequestSerializer(serializers.Serializer):
    model = serializers.ChoiceField(
        choices=["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"],
        default="gpt-4o-transcribe",
        help_text="Transcription model",
    )
    prompt = serializers.CharField(
        required=False,
        help_text="Optional text prompt to guide the style, vocabulary or continue a previous audio segment",
    )
    language = serializers.CharField(
        required=False,
        help_text="Language of the input audio in ISO-639-1 format. See https://github.com/openai/whisper#available-models-and-languages",
    )
    temperature = serializers.FloatField(
        required=False,
        min_value=0.0,
        max_value=1.0,
        help_text="Optional temperature between 0 and 1",
    )

style: missing file field validation in serializer. the audio_file validation happens in the view (line 392-397 in http.py), but the serializer should validate file uploads

Suggested change
class TranscriptionRequestSerializer(serializers.Serializer):
    file = serializers.FileField(
        required=True,
        help_text="Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm)",
    )
    model = serializers.ChoiceField(
        choices=["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"],
        default="gpt-4o-transcribe",
        help_text="Transcription model",
    )
    prompt = serializers.CharField(
        required=False,
        help_text="Optional text prompt to guide the style, vocabulary or continue a previous audio segment",
    )
    language = serializers.CharField(
        required=False,
        help_text="Language of the input audio in ISO-639-1 format. See https://github.com/openai/whisper#available-models-and-languages",
    )
    temperature = serializers.FloatField(
        required=False,
        min_value=0.0,
        max_value=1.0,
        help_text="Optional temperature between 0 and 1",
    )

does the MultiPartParser already handle file validation before the serializer, making this suggestion unnecessary?


Path: posthog/api/llm_gateway/serializers.py, lines 167 to 186

@a-lider a-lider removed the request for review from a team December 26, 2025 19:34
@a-lider a-lider added the hobby-preview Keep hobby deployment droplet alive for preview label Dec 26, 2025

github-actions bot commented Dec 26, 2025

🦔 Preview instance

Preview deployment ready

🌐 Access the instance

URL: https://do-ci-hobby-pr-44017.posthog.cc

SSH: ssh [email protected]

IP: 24.199.102.98

Mode: 🔄 Preview (persistent)
Commit: 26abd64
Workflow run: #6678

Full instance details
Droplet ID: 539915241
Droplet IP: 24.199.102.98
SSH: ssh [email protected]
URL: https://do-ci-hobby-pr-44017.posthog.cc
Deployment output
Downloading pynacl (1.3MiB)
Downloading cryptography (4.3MiB)
 Downloaded pynacl
 Downloaded cryptography
Installed 14 packages in 7ms
🔄 Preview mode enabled - checking for existing droplet for PR #44017
✅ Found existing droplet: do-ci-hobby-pr-44017 (ID: 539915241)
  IP: 24.199.102.98
  Updating to SHA: 26abd64
🔄 Updating existing deployment to SHA: 26abd64fab3a97620692e2fb236836d4c4fd51c8
✅ Updated POSTHOG_APP_TAG to 26abd64fab3a97620692e2fb236836d4c4fd51c8
🐋 Pulling new Docker images...
✅ Images pulled successfully
🔄 Restarting services...
✅ Services restarted
⏳ Waiting for services to stabilize...
✅ Deployment updated successfully
✅ Preview deployment updated successfully
🌐 URL: https://do-ci-hobby-pr-44017.posthog.cc


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).




class TranscriptionRequestSerializer(serializers.Serializer):
    file = serializers.FileField(

@cubic-dev-ai cubic-dev-ai bot Dec 26, 2025


P2: Consider adding a file size validator to prevent large file uploads that could cause memory issues. OpenAI's transcription API limits files to 25MB, so enforcing this at the serializer level is recommended.
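
One possible shape for that 25MB check, sketched dependency-free (in the actual DRF serializer it would be attached via the FileField's validators and raise serializers.ValidationError; names here are illustrative):

```python
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # OpenAI's documented 25MB file limit

def validate_audio_size(uploaded_file):
    """Reject uploads larger than the OpenAI transcription limit.

    In the DRF serializer this would raise serializers.ValidationError;
    ValueError keeps this sketch free of framework imports.
    """
    if uploaded_file.size > MAX_AUDIO_BYTES:
        raise ValueError(f"Audio file exceeds the {MAX_AUDIO_BYTES} byte limit")
    return uploaded_file
```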

Path: posthog/api/llm_gateway/serializers.py, line 168
