
Conversation

@fyuan1316 (Contributor) commented Jan 28, 2026

Summary by CodeRabbit

  • Documentation
    • Added Triton Inference Server as a new runtime example with full configuration and step-by-step usage for NVIDIA GPU deployments.
    • Updated the runtime comparisons table to include Triton's target hardware, supported frameworks, and configuration notes.



coderabbitai bot commented Jan 28, 2026

Walkthrough

Adds a Triton Inference Server example to the custom inference runtime docs, including a full ClusterServingRuntime YAML configured for NVIDIA GPUs, startup commands/env vars, resource settings, startupProbe, supportedModelFormats, usage steps, and an update to the runtime comparison table.
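
For orientation, here is a heavily trimmed sketch of what a ClusterServingRuntime of this shape could look like. It assumes KServe's `serving.kserve.io/v1alpha1` API; the name, image, args, and resource values are illustrative, not the exact YAML from the PR:

```yaml
# Illustrative sketch only — not the PR's exact YAML. Assumes KServe's
# v1alpha1 ClusterServingRuntime API; names and values are placeholders.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: triton-runtime            # hypothetical name
spec:
  containers:
    - name: kserve-container
      image: nvcr.io/nvidia/tritonserver:25.02-py3   # official NGC image
      args:
        - tritonserver
        - --model-repository=/mnt/models             # KServe's default model mount
      env:
        - name: OMP_NUM_THREADS
          value: "1"
      resources:
        limits:
          nvidia.com/gpu: "1"                        # targets NVIDIA GPUs
  supportedModelFormats:
    - name: triton
      autoSelect: true
```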

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation: Triton Inference Server Runtime**<br>`docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx` | Adds a new "Specific Runtime Examples" entry for Triton with a full ClusterServingRuntime YAML (GPU accelerator metadata, container args/env, resources, startupProbe, supportedModelFormats), usage/preparation steps, and a new row in the runtime comparison table. |

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

Suggested reviewers

  • typhoonzero

Poem

🐰 I hopped through YAML, neat and spry,
Triton lights the GPU sky,
Containers, probes, and models align,
Docs now hum with runtime shine,
🥕🚀

🚥 Pre-merge checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add triton runtime' directly describes the main change: adding documentation for the Triton Inference Server runtime with YAML configuration and usage steps. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |





@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:

- Around lines 329-333: Add a Kubernetes startupProbe entry to the Triton runtime YAML so the pod is not considered ready until the model finishes loading. Specifically, insert a startupProbe block (mirroring the pattern used in other runtimes) immediately before the supportedModelFormats section in the Triton runtime example (near the runAsUser key and before `supportedModelFormats: - name: triton`), probing the model server endpoint until it is healthy.
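
A minimal sketch of such a probe, assuming Triton's default HTTP port (8000) and its standard `/v2/health/ready` endpoint; the thresholds are illustrative and should mirror whatever the other runtime examples use:

```yaml
# Sketch only — port and thresholds are assumptions, not the PR's values.
startupProbe:
  httpGet:
    path: /v2/health/ready    # Triton reports ready once models have loaded
    port: 8000                # Triton's default HTTP/REST port
  periodSeconds: 10
  failureThreshold: 60        # tolerate up to ~10 minutes of model loading
```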
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)

308-312: Unused environment variable MODEL_REPO.

The MODEL_REPO environment variable is defined on lines 311-312 but is never referenced in the container command (lines 302-307). Either remove it, or wire it into the command if it is intended to be used.

🔧 Suggested fix: Remove unused environment variable
```diff
       env:
         - name: OMP_NUM_THREADS
           value: "1"
-        - name: MODEL_REPO
-          value: '{{ index .Annotations "aml-model-repo" }}'
       image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
```
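
If the annotation is actually meant to drive the model repository path, the opposite fix also works: consume the variable in the command instead of deleting it. A sketch, relying on Kubernetes' `$(VAR)` expansion in container args:

```yaml
# Sketch: keeps MODEL_REPO and references it from the args; assumes the
# aml-model-repo annotation resolves to a valid repository path.
args:
  - tritonserver
  - --model-repository=$(MODEL_REPO)   # expanded from the env var below
env:
  - name: MODEL_REPO
    value: '{{ index .Annotations "aml-model-repo" }}'
```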

313-313: Internal registry image may not be accessible to users.

The image 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3 appears to reference an internal registry. Consider adding a clarifying comment as the other examples do, or using the official NVIDIA NGC image reference (e.g., nvcr.io/nvidia/tritonserver:25.02-py3) for better accessibility.

🔧 Suggested fix: Use official NVIDIA image
```diff
-      image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
+      image: nvcr.io/nvidia/tritonserver:25.02-py3  # Replace with your actual image if needed
```
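
If the internal registry is kept instead, the runtime may also need a pull secret. A sketch, assuming a KServe version whose ServingRuntimeSpec supports imagePullSecrets, and a docker-registry secret (hypothetically named internal-registry-cred) that already exists:

```yaml
# Sketch only — the secret name is hypothetical and must exist beforehand.
spec:
  imagePullSecrets:
    - name: internal-registry-cred
  containers:
    - name: kserve-container
      image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
```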


cloudflare-workers-and-pages bot commented Jan 28, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: 217ba40
Status: ✅  Deploy successful!
Preview URL: https://49a3ab18.alauda-ai.pages.dev
Branch Preview URL: https://add-triton-rt.alauda-ai.pages.dev

View logs
