Add triton runtime #88
base: master
Conversation
Walkthrough

Adds a Triton Inference Server example to the custom inference runtime docs, including a full ClusterServingRuntime YAML configured for NVIDIA GPUs, startup commands/env vars, resource settings, startupProbe, supportedModelFormats, usage steps, and an update to the runtime comparison table.
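The PR carries the full runtime definition; purely as an illustration of the shape such a resource takes (not the PR's actual YAML), a minimal KServe ClusterServingRuntime for Triton might look like the sketch below. The runtime name, image tag, HTTP port, and GPU count are assumptions, not values from the change.

```yaml
# Illustrative sketch only — name, image tag, port, and resource values are assumptions.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: triton-runtime                # hypothetical name
spec:
  supportedModelFormats:
    - name: triton
      autoSelect: true
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: nvcr.io/nvidia/tritonserver:25.02-py3   # official NGC image
      args:
        - tritonserver
        - --model-repository=/mnt/models
        - --http-port=8080
      env:
        - name: OMP_NUM_THREADS
          value: "1"
      resources:
        limits:
          nvidia.com/gpu: "1"         # schedule onto an NVIDIA GPU node
```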
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:

- Around lines 329-333: Add a Kubernetes startupProbe entry to the Triton runtime YAML so the pod is not considered ready until the model finishes loading. Specifically, insert a startupProbe block (mirroring the pattern used in other runtimes) immediately before the supportedModelFormats section in the Triton runtime example (near the runAsUser key and before `supportedModelFormats: - name: triton`), so that the model server endpoint is probed until it is healthy.
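For reference, a startupProbe of the kind this comment asks for could look roughly like the sketch below. The path assumes Triton's standard HTTP readiness endpoint; the port and thresholds are placeholder values, not taken from the PR.

```yaml
# Hypothetical startupProbe — port and thresholds are illustrative.
startupProbe:
  httpGet:
    path: /v2/health/ready     # Triton's HTTP readiness endpoint
    port: 8080                  # match the port the container serves HTTP on
  periodSeconds: 10
  failureThreshold: 30          # tolerate up to ~5 minutes of model loading
```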
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)
308-312: Unused environment variable `MODEL_REPO`.

The `MODEL_REPO` environment variable is defined on lines 311-312 but is not used anywhere in the container command (lines 302-307). Either remove it or use it in the command if it's intended for some purpose.

🔧 Suggested fix: Remove unused environment variable

```diff
         env:
           - name: OMP_NUM_THREADS
             value: "1"
-          - name: MODEL_REPO
-            value: '{{ index .Annotations "aml-model-repo" }}'
         image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
```
313-313: Internal registry image may not be accessible to users.

The image `152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3` appears to reference an internal registry. Consider adding a comment similar to other examples, or use the official NVIDIA NGC image reference (e.g., `nvcr.io/nvidia/tritonserver:25.02-py3`) for better accessibility.

🔧 Suggested fix: Use official NVIDIA image

```diff
-        image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
+        image: nvcr.io/nvidia/tritonserver:25.02-py3 # Replace with your actual image if needed
```
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (review thread resolved)
Deploying alauda-ai with Cloudflare Pages

- Latest commit: 217ba40
- Status: ✅ Deploy successful!
- Preview URL: https://49a3ab18.alauda-ai.pages.dev
- Branch Preview URL: https://add-triton-rt.alauda-ai.pages.dev