Description
MLServer currently provides out-of-the-box support for several frameworks like Scikit-Learn, XGBoost, and LightGBM. However, it lacks a dedicated, native runtime for ONNX.
As ONNX is the industry standard for model interoperability, adding a first-class mlserver-onnx runtime would eliminate the need for users to write custom Python wrappers for every deployment. This would streamline the path from training (in PyTorch, TensorFlow, etc.) to production serving via MLServer.
Proposed Requirements
- Dedicated Runtime: A new mlserver-onnx package that implements the MLModel interface.
- Metadata Auto-discovery: The runtime should automatically parse the .onnx file to infer input/output names, shapes, and types, reducing manual configuration in model-settings.json.
- Execution Providers: Support for hardware acceleration (e.g., CUDAExecutionProvider, OpenVINOExecutionProvider) through the parameters field.
- Standardized Data Handling: Optimized mapping between MLServer's InferenceRequest and ONNX Runtime's tensor format.
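To make the metadata auto-discovery and data-handling requirements concrete, here is a minimal sketch of the translation layer such a runtime would need. ONNX Runtime reports element types as strings like "tensor(float)", while MLServer's inference protocol uses datatype names like "FP32"; the helper names below are hypothetical, not part of any existing API:

```python
# Sketch of the type/shape mapping needed for metadata auto-discovery.
# Datatype names follow the V2 inference protocol; helper names are
# illustrative only.

# ONNX Runtime element type string -> V2 protocol datatype
_ONNX_TO_V2_DATATYPE = {
    "tensor(float)": "FP32",
    "tensor(double)": "FP64",
    "tensor(float16)": "FP16",
    "tensor(int64)": "INT64",
    "tensor(int32)": "INT32",
    "tensor(int16)": "INT16",
    "tensor(int8)": "INT8",
    "tensor(uint8)": "UINT8",
    "tensor(bool)": "BOOL",
    "tensor(string)": "BYTES",
}

def to_v2_datatype(onnx_type: str) -> str:
    """Translate an ONNX Runtime element type into a V2 datatype."""
    try:
        return _ONNX_TO_V2_DATATYPE[onnx_type]
    except KeyError:
        raise ValueError(f"Unsupported ONNX element type: {onnx_type}")

def normalise_shape(shape: list) -> list:
    """Replace symbolic/dynamic dimensions (e.g. 'batch', None) with -1."""
    return [dim if isinstance(dim, int) else -1 for dim in shape]
```

Applied to a typical image model input reported by ONNX Runtime, normalise_shape(["batch", 3, 224, 224]) yields [-1, 3, 224, 224], which can be written directly into the generated metadata.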
Proposed Configuration Example
The user experience should be as simple as defining the implementation in model-settings.json:
{
  "name": "resnet50-onnx",
  "implementation": "mlserver_onnx.ONNXModel",
  "parameters": {
    "uri": "./model.onnx",
    "extra": {
      "execution_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"]
    }
  }
}
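Once served, the model would be reachable over MLServer's standard V2 REST endpoint with no extra client code. A minimal client-side payload might look like the following; the input name "input" and shape are assumptions standing in for whatever the runtime auto-discovers from the model:

```python
import json

# V2 inference request payload for the resnet50-onnx model above.
# "input" and the shape are placeholders for the auto-discovered metadata.
payload = {
    "inputs": [
        {
            "name": "input",
            "shape": [1, 3, 224, 224],
            "datatype": "FP32",
            "data": [0.0] * (1 * 3 * 224 * 224),  # dummy all-zero image
        }
    ]
}

# This body would be POSTed to, e.g.:
#   http://localhost:8080/v2/models/resnet50-onnx/infer
body = json.dumps(payload)
```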