Add new ML inference examples for image and video generation #28
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30f894a9dc
```python
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```
Cache the FLUX pipeline instead of recreating it per request
generate_image rebuilds FluxPipeline on every call, so each request pays the full model initialization and offload cost instead of reusing a warm worker. In practice, every request after the first behaves like a cold start (large latency spikes and higher timeout risk), even though the endpoint is configured to keep workers warm.
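One way to address this is to build the pipeline lazily and keep it in a module-level cache so every call on a warm worker reuses the same object. A minimal sketch of that pattern; the `_get_pipe` helper and the `generate_image` signature shown here are illustrative, not the example's actual code:

```python
import torch
from diffusers import FluxPipeline

_pipe = None  # cached pipeline, reused across requests on a warm worker


def _get_pipe():
    """Build the FLUX pipeline on first use and reuse it afterwards."""
    global _pipe
    if _pipe is None:
        _pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
        )
        _pipe.enable_model_cpu_offload()
    return _pipe


def generate_image(prompt: str):
    # Only the first request pays the initialization/offload cost.
    pipe = _get_pipe()
    return pipe(prompt, num_inference_steps=4).images[0]
```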
```python
# ADA_24 gives us an RTX 4090-class GPU with 24GB — plenty of room.
gpu_config = LiveServerless(
    name="02_02_flux_schnell",
    gpus=[GpuGroup.AMPERE_80],
```
Use the intended GPU class for the FLUX worker
The code comment documents this worker as fitting on ADA_24, but the config requests GpuGroup.AMPERE_80. The mismatch makes the example harder to run in environments that only expose 24GB GPUs and needlessly raises cost and capacity requirements, which can leave deployments unschedulable even though the model fits in the stated VRAM.
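If the 24GB class described in the comment is the intent, the fix is to request that group instead. A sketch, assuming the SDK exposes an ADA_24 member on GpuGroup (the exact enum name may differ):

```python
# Request the RTX 4090-class 24GB GPU the comment describes.
gpu_config = LiveServerless(
    name="02_02_flux_schnell",
    gpus=[GpuGroup.ADA_24],  # assumed enum member name for the 24GB ADA class
)
```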
Summary
Notes
Validation