
Add new ML inference examples for image and video generation #28

Open

max4c wants to merge 3 commits into main from codex/ml-inference-media-examples

Conversation

max4c commented on Feb 15, 2026

Summary

  • Add four new ML inference examples under 02_ml_inference:
    • 02_text_to_image (FLUX.1-schnell)
    • 03_image_to_image (Stable Diffusion img2img)
    • 04_text_to_video (Diffusers text-to-video)
    • 05_image_to_video (Stable Video Diffusion)
  • Add example API/app/demo/mothership configs and docs
  • Update repository docs (README.md and 02_ml_inference/README.md)
  • Update uv.lock for added runtime dependencies
  • Include fallback image assets (poddy.jpg)

Notes

  • text-to-image now accepts a request-level hf_token with an environment fallback
  • GIF output FPS is clamped to 25, and the response fps field reflects the encoded output (both behaviors are sketched below)
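
A minimal sketch of how those two behaviors could be wired up (function names and the HF_TOKEN variable are illustrative assumptions, not the exact identifiers in the PR):

import os

MAX_GIF_FPS = 25  # mirrors the clamp described above

def resolve_hf_token(request_token=None):
    # Prefer the token supplied on the request; otherwise fall back to the
    # environment (HF_TOKEN is an assumed variable name).
    return request_token or os.environ.get("HF_TOKEN")

def clamp_fps(requested_fps):
    # GIF encoding is capped at 25 fps; the response's fps field reports
    # the value actually encoded, not the value requested.
    return min(int(requested_fps), MAX_GIF_FPS)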

Validation

  • Smoke-tested locally


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30f894a9dc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 54 to 58
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()


P1: Cache the FLUX pipeline instead of recreating it per request

generate_image rebuilds FluxPipeline on every call, so each request pays the full model initialization/offload cost instead of reusing a warm worker. In practice this turns steady-state requests into repeated cold starts (large latency spikes and a higher timeout risk) once traffic exceeds a single call, even though the endpoint is configured to keep workers warm.

Useful? React with 👍 / 👎.
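
A minimal sketch of the suggested fix: cache the pipeline at module scope so warm workers reuse it across requests (the loader function name is illustrative):

import torch
from diffusers import FluxPipeline

_PIPE = None  # cached across requests within a warm worker

def get_pipe():
    global _PIPE
    if _PIPE is None:
        # Load once; subsequent requests reuse the warm pipeline.
        _PIPE = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
        )
        _PIPE.enable_model_cpu_offload()
    return _PIPE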

# ADA_24 gives us an RTX 4090-class GPU with 24GB — plenty of room.
gpu_config = LiveServerless(
    name="02_02_flux_schnell",
    gpus=[GpuGroup.AMPERE_80],


P2: Use the intended GPU class for the FLUX worker

This worker is documented as fitting on ADA_24, but the config requests GpuGroup.AMPERE_80. That mismatch makes the example harder to run in environments that only expose 24GB GPUs and unnecessarily raises cost/capacity requirements, which can leave deployments unschedulable despite the model's stated VRAM profile.

Useful? React with 👍 / 👎.
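
A sketch of the config with the documented GPU class, assuming GpuGroup.ADA_24 is the identifier matching the 24GB card named in the example's own comment:

# Request the documented 24GB ADA-class GPU instead of AMPERE_80.
gpu_config = LiveServerless(
    name="02_02_flux_schnell",
    gpus=[GpuGroup.ADA_24],
)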

