
Add VLLMOpenAIModelClass parent class with cancellation support and health probes #998

Merged
christineyu123 merged 3 commits into master from vllm_openaimodelclass_cancellation
Mar 24, 2026
Conversation

@christineyu123
Contributor

Summary

  • Adds VLLMOpenAIModelClass (clarifai/runners/models/vllm_openai_class.py) — a parent class that vLLM model implementations can subclass instead of writing a standalone model.py from scratch. Subclasses only need to implement load_model() to set self.server, self.client, self.model, and self.cancellation_handler.
  • Adds VLLMCancellationHandler — manages per-request cancellation via register_item_abort_callback. When the Clarifai runtime signals an abort, it sets a threading.Event and closes the underlying httpx response, causing vLLM to detect is_disconnected(), abort the engine, and free KV cache.
  • openai_stream_transport in VLLMOpenAIModelClass handles both /chat/completions and /responses endpoints with full cancellation support (including the race condition where abort arrives before the stream starts).
  • handle_liveness_probe / handle_readiness_probe hit vLLM's /health endpoint directly when a server is running; fall back to the base class behaviour when no server is set.
  • Adds tests/runners/test_vllm_openai_class.py with 17 unit tests covering VLLMCancellationHandler state transitions, health probe delegation, and streaming cancellation behaviour — no real vLLM process required.
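The per-request cancellation flow described above can be sketched with the standard library alone. This is an illustrative model of the mechanics (an abort callback sets a `threading.Event` and closes the active response so the server sees a disconnect), not the actual `VLLMCancellationHandler` API — class and method names here are hypothetical:

```python
import threading


class CancellationSketch:
    """Illustrative model of per-request cancellation (not the real class).

    The runtime's abort callback calls abort(); the streaming code calls
    attach_response() once the HTTP stream (e.g. an httpx response) exists.
    Closing the response lets the server detect the disconnect and free
    resources such as the KV cache.
    """

    def __init__(self):
        self._aborted = threading.Event()
        self._response = None

    def attach_response(self, response):
        # Handle the race where the abort arrived before the stream started:
        # close immediately instead of streaming a doomed response.
        self._response = response
        if self._aborted.is_set():
            response.close()

    def abort(self):
        # Invoked from the runtime's abort callback (possibly another thread).
        self._aborted.set()
        if self._response is not None:
            self._response.close()

    @property
    def aborted(self):
        return self._aborted.is_set()
```

The `threading.Event` makes the aborted flag safe to check from the streaming thread while the abort callback fires from elsewhere, which mirrors the race-condition handling the PR tests for.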
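The health-probe delegation can likewise be sketched in a few lines. This is a hedged approximation, assuming the probe simply treats an HTTP 200 from vLLM's `/health` endpoint as healthy and otherwise defers to the base class; the function name and parameters are illustrative, not the real method signatures:

```python
import urllib.request


def readiness_probe_sketch(server_url, base_probe, fetch=urllib.request.urlopen):
    """Illustrative probe delegation (names hypothetical).

    server_url: base URL of a running vLLM server, or None if no server is set.
    base_probe: zero-arg fallback implementing the base class behaviour.
    fetch: injectable HTTP getter, defaulting to urllib.request.urlopen.
    """
    if server_url is None:
        # No server running: fall back to the base class behaviour.
        return base_probe()
    try:
        # Probe vLLM's /health endpoint directly; 200 means ready.
        with fetch(f"{server_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused / timed out: the server is not ready.
        return False
```

Injecting `fetch` is what lets the PR's unit tests exercise this logic without a real vLLM process.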

Test plan

  • All 17 new unit tests pass locally (uv run pytest tests/runners/test_vllm_openai_class.py -v)
  • CI green on all matrix targets (macOS/Ubuntu/Windows × Python 3.11/3.12)
  • Manual smoke-test: subclass VLLMOpenAIModelClass in a model upload (e.g. trinity-mini-thinking-vllm) and verify abort cancels the running generation

🤖 Generated with Claude Code

@christineyu123 christineyu123 requested a review from a team March 23, 2026 15:31
@github-actions

Code Coverage

| Package | Line Rate |
| --- | --- |
| clarifai | 45% |
| clarifai.cli | 62% |
| clarifai.cli.templates | 67% |
| clarifai.cli.templates.toolkits | 100% |
| clarifai.client | 65% |
| clarifai.client.auth | 67% |
| clarifai.constants | 100% |
| clarifai.datasets | 100% |
| clarifai.datasets.export | 69% |
| clarifai.datasets.upload | 75% |
| clarifai.datasets.upload.loaders | 37% |
| clarifai.models | 100% |
| clarifai.rag | 0% |
| clarifai.runners | 52% |
| clarifai.runners.models | 59% |
| clarifai.runners.pipeline_steps | 39% |
| clarifai.runners.pipelines | 72% |
| clarifai.runners.utils | 62% |
| clarifai.runners.utils.data_types | 72% |
| clarifai.schema | 100% |
| clarifai.urls | 58% |
| clarifai.utils | 65% |
| clarifai.utils.evaluation | 16% |
| clarifai.workflows | 95% |
| **Summary** | 60% (11553 / 19181) |

Minimum allowed line rate is 50%

@ackizilkale (Contributor) left a comment
Looks good!

@christineyu123 christineyu123 merged commit a9ecd2b into master Mar 24, 2026
11 checks passed
@christineyu123 christineyu123 deleted the vllm_openaimodelclass_cancellation branch March 24, 2026 16:01
@ackizilkale ackizilkale mentioned this pull request Mar 24, 2026
