1 change: 1 addition & 0 deletions docs/usage/README.md
@@ -10,3 +10,4 @@ Start with:
- [Single-GPU GRPO Example](grpo-single-gpu-example.md)
- [Operators](../operators/README.md)
- [Weight Sync Bridge](weight-sync-bridge.md)
- [Vime Rollout LogP Probe](vime-rollout-logp-probe.md)
143 changes: 143 additions & 0 deletions docs/usage/vime-rollout-logp-probe.md
@@ -0,0 +1,143 @@
# Vime Rollout LogP Probe

This page documents the minimal WS5 proof-of-concept for issue #120. It wires
one existing RL-Kernel operator into a single Vime rollout path using Vime's
public `--custom-generate-function-path` hook.

## Issue Checklist Mapping

- Smallest adapter/shim: `custom_generate` wraps one Vime generate call and one
RL-Kernel `logp` probe.
- Explicit opt-in: `RL_KERNEL_VIME_LOGP_PROBE=1` is required before RL-Kernel is
invoked.
- Instrumentation: `Sample.metadata["rl_kernel"]["vime_logp_probe"]` records
structured evidence, including whether the operator was invoked and the
process-local `call_count`.
- Minimal Vime example: start from the #117 fully-async Qwen2.5-0.5B baseline
and add the exact command/config below.
- Smoke test with mocks: `tests/test_vime_rollout_logp_probe.py` installs a fake
Vime module when full Vime dependencies are unavailable.
- Fallback behavior: when RL-Kernel or its CUDA extension is unavailable,
non-strict mode keeps the native generated sample unchanged; native Vime/vLLM
generation failures still surface normally.

## What It Proves

The probe proves that a Vime rollout can invoke RL-Kernel from an opt-in custom
generate shim. It does not replace vLLM sampling or rollout-side logprob
computation. Vime's HTTP rollout path returns selected logprobs, not logits, so
the shim runs a small deterministic synthetic tensor through RL-Kernel's `logp`
operator and records structured evidence in `Sample.metadata`.
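
For orientation, the quantity the probe records can be sketched in plain Python. This assumes that the `logp` operator returns the log-probability of each selected token (a log-softmax over the vocabulary axis followed by a gather); that semantics is an assumption here, not something this page specifies. The inputs are the synthetic values from `_probe_tensors`, and `selected_logprobs` is a hypothetical helper name.

```python
import math

# Synthetic inputs copied from the probe: 2 rows of 4 logits, one token id per row.
LOGITS = [
    [0.25, 1.5, -0.5, 0.0],
    [2.0, -1.0, 0.5, 0.25],
]
TOKEN_IDS = [1, 0]


def selected_logprobs(logits, token_ids):
    """Reference log-softmax + gather (assumed semantics of `logp`)."""
    result = []
    for row, token_id in zip(logits, token_ids):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        result.append(row[token_id] - log_sum_exp)
    return result


reference = selected_logprobs(LOGITS, TOKEN_IDS)
print(len(reference), sum(reference))
```

Under that assumption, the recorded `output_shape` would be `(2,)` and `output_sum` would be close to the printed sum, which gives a quick sanity check on the metadata.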

## Entry Point

Use this Vime custom generate path:

```text
rl_engine.integrations.vime.rollout_logp_probe.custom_generate
```

Enable the probe with:

```bash
export RL_KERNEL_VIME_LOGP_PROBE=1
```

Optional strict mode:

```bash
export RL_KERNEL_VIME_LOGP_STRICT=1
```

Strict mode raises if RL-Kernel import or backend dispatch fails. Without strict
mode, the shim records fallback metadata and returns Vime's native generated
sample unchanged.
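
The difference between the two modes can be sketched as follows. This is a simplified illustration that mirrors the shim's flag parsing and error handling, not the shim itself; `env_flag`, `guarded_probe`, and `broken_probe` are hypothetical names.

```python
import os

# Truthy spellings accepted by the shim's flag parsing.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name):
    """Return True when the variable holds a truthy value (case-insensitive)."""
    return os.environ.get(name, "").strip().lower() in TRUE_VALUES


def guarded_probe(probe, strict):
    """Run `probe`; re-raise on failure in strict mode, else report a fallback."""
    try:
        return {"invoked": True, "fallback": False, "value": probe()}
    except Exception as exc:
        if strict:
            raise
        return {
            "invoked": False,
            "fallback": True,
            "fallback_reason": f"{type(exc).__name__}: {exc}",
        }


def broken_probe():
    # Hypothetical stand-in for a missing CUDA extension.
    raise ImportError("extension not built")


os.environ["RL_KERNEL_VIME_LOGP_STRICT"] = "Yes"
print(env_flag("RL_KERNEL_VIME_LOGP_STRICT"))
print(guarded_probe(broken_probe, strict=False))
```

With `strict=True` the same call would propagate the `ImportError` instead of returning fallback metadata.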

## Minimal Vime Command

Start from the #117 baseline
`vime/examples/fully_async/run-qwen2.5-0.5B-fully_async.sh` and add this line
inside the script's `ROLLOUT_ARGS` array:

```bash
--custom-generate-function-path rl_engine.integrations.vime.rollout_logp_probe.custom_generate
```

Make RL-Kernel importable inside the Ray job runtime environment. Either install
RL-Kernel in the image, or include the checkout path and opt-in variable in the
script's `RUNTIME_ENV_JSON`:

```json
{
  "env_vars": {
    "PYTHONPATH": "/path/to/RL-Kernel:/root/Megatron-LM/:${SCRIPT_DIR}",
    "RL_KERNEL_VIME_LOGP_PROBE": "1",
    "CUDA_DEVICE_MAX_CONNECTIONS": "1",
    "NCCL_NVLS_ENABLE": "${HAS_NVLINK}"
  }
}
```

Then run the baseline script as usual, from the root of the Vime checkout:

```bash
bash examples/fully_async/run-qwen2.5-0.5B-fully_async.sh
```

For a direct `train_async.py` or `ray job submit` command, the exact Vime
argument to add is:

```bash
--custom-generate-function-path rl_engine.integrations.vime.rollout_logp_probe.custom_generate
```

The run should produce samples whose metadata contains:

```python
sample.metadata["rl_kernel"]["vime_logp_probe"]
```

Expected fields include:

```text
enabled
invoked
call_count
op
backend
fallback
fallback_reason
output_shape
output_sum
```

The `invoked` field proves the shim reached `kernel_registry.get_op("logp")`
and executed the returned operator for that sample. `call_count` counts
successful invocations within the current process only, so each rollout worker
process keeps its own counter.
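
One way to inspect these fields after a run is a small classifier over the metadata. The sketch below is illustrative only: `probe_outcome` is a hypothetical helper, and the sample is a hand-written stand-in rather than a real Vime `Sample`.

```python
from types import SimpleNamespace


def probe_outcome(sample):
    """Classify a sample as missing, disabled, invoked, fallback, or unknown."""
    metadata = getattr(sample, "metadata", None) or {}
    probe = metadata.get("rl_kernel", {}).get("vime_logp_probe")
    if probe is None:
        return "missing"  # the shim never wrote probe metadata
    if not probe.get("enabled"):
        return "disabled"  # RL_KERNEL_VIME_LOGP_PROBE was not set
    if probe.get("invoked"):
        return "invoked"  # the logp operator ran for this sample
    return "fallback" if probe.get("fallback") else "unknown"


# Hypothetical sample with hand-written metadata, for illustration only.
sample = SimpleNamespace(
    metadata={
        "rl_kernel": {
            "vime_logp_probe": {
                "enabled": True,
                "invoked": False,
                "fallback": True,
                "fallback_reason": "ImportError: extension not built",
            }
        }
    }
)
print(probe_outcome(sample))
```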

## Fallback Behavior

The shim always calls Vime's native `vime.rollout.vllm_rollout.generate` first.
If native Vime/vLLM generation is unavailable, the run fails the same way the
native Vime path would fail; the shim does not hide that failure. If the probe is
disabled, it records `enabled=False` and returns the sample. If RL-Kernel is
unavailable, a backend is unavailable, or the CUDA extension is not built,
non-strict mode records `fallback=True` and returns the native generated sample
unchanged.

This keeps pure Vime inference and native RL paths unaffected when the probe is
disabled or when RL-Kernel cannot run.

## Local Smoke Test

The mock smoke test does not require a full Vime installation:

```bash
python -m pytest tests/test_vime_rollout_logp_probe.py
```

The test installs a fake `vime.rollout.vllm_rollout.generate`, exercises the
custom generate shim, verifies that RL-Kernel `logp` dispatch was invoked, and
checks non-strict fallback behavior.
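
The fake-module technique can be sketched independently of the test file. The code below is an assumption about the general approach (stub modules registered in `sys.modules`), not a copy of the test; `fake_generate` and `install_fake_vime` are hypothetical names.

```python
import asyncio
import sys
import types
from types import SimpleNamespace


async def fake_generate(args, sample, sampling_params, evaluation=False):
    """Hypothetical stand-in for Vime's native generate; tags the sample."""
    sample.response = "fake response"
    return sample


def install_fake_vime():
    """Register stub vime, vime.rollout, and vime.rollout.vllm_rollout modules."""
    vime = types.ModuleType("vime")
    rollout = types.ModuleType("vime.rollout")
    vllm_rollout = types.ModuleType("vime.rollout.vllm_rollout")
    vllm_rollout.generate = fake_generate
    vime.rollout = rollout
    rollout.vllm_rollout = vllm_rollout
    sys.modules.update(
        {
            "vime": vime,
            "vime.rollout": rollout,
            "vime.rollout.vllm_rollout": vllm_rollout,
        }
    )


install_fake_vime()
from vime.rollout.vllm_rollout import generate  # resolves to the stub

result = asyncio.run(generate(None, SimpleNamespace(metadata={}), {}))
print(result.response)
```

Because this overwrites any real `vime` entries in `sys.modules`, a pytest test would more likely use `monkeypatch.setitem(sys.modules, ...)` so the original entries are restored afterwards.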
4 changes: 4 additions & 0 deletions rl_engine/integrations/__init__.py
@@ -0,0 +1,4 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 RL-Kernel Contributors

"""Framework integration helpers for RL-Kernel."""
22 changes: 22 additions & 0 deletions rl_engine/integrations/vime/__init__.py
@@ -0,0 +1,22 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 RL-Kernel Contributors

"""Opt-in Vime integration helpers."""

from rl_engine.integrations.vime.rollout_logp_probe import (
    ENV_ENABLED,
    ENV_STRICT,
    METADATA_KEY,
    RLKernelProbeResult,
    custom_generate,
    run_logp_probe,
)

__all__ = [
    "ENV_ENABLED",
    "ENV_STRICT",
    "METADATA_KEY",
    "RLKernelProbeResult",
    "custom_generate",
    "run_logp_probe",
]
149 changes: 149 additions & 0 deletions rl_engine/integrations/vime/rollout_logp_probe.py
@@ -0,0 +1,149 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 RL-Kernel Contributors

from __future__ import annotations

import inspect
import os
from dataclasses import asdict, dataclass
from typing import Any

ENV_ENABLED = "RL_KERNEL_VIME_LOGP_PROBE"
ENV_STRICT = "RL_KERNEL_VIME_LOGP_STRICT"
METADATA_KEY = "rl_kernel"

_TRUE_VALUES = {"1", "true", "yes", "on"}
_CALL_COUNT = 0


@dataclass(frozen=True)
class RLKernelProbeResult:
    """Structured evidence that the Vime shim reached RL-Kernel."""

    enabled: bool
    invoked: bool
    call_count: int = 0
    op: str = "logp"
    backend: str | None = None
    fallback: bool = False
    fallback_reason: str | None = None
    output_shape: tuple[int, ...] | None = None
    output_sum: float | None = None


def _env_enabled(name: str) -> bool:
    return os.environ.get(name, "").strip().lower() in _TRUE_VALUES


def _probe_tensors():
    import torch

    from rl_engine.platforms.device import device_ctx

    logits = torch.tensor(
        [
            [0.25, 1.5, -0.5, 0.0],
            [2.0, -1.0, 0.5, 0.25],
        ],
        device=device_ctx.device,
        dtype=torch.float32,
    )
    token_ids = torch.tensor([1, 0], device=device_ctx.device, dtype=torch.long)
    return logits, token_ids


def _fallback(reason: str) -> RLKernelProbeResult:
    return RLKernelProbeResult(
        enabled=True,
        invoked=False,
        call_count=_CALL_COUNT,
        fallback=True,
        fallback_reason=reason,
    )


def _supports_evaluation_arg(fn: Any) -> bool:
    try:
        parameters = inspect.signature(fn).parameters.values()
    except (TypeError, ValueError):
        return False
    return any(
        param.name == "evaluation" or param.kind == inspect.Parameter.VAR_KEYWORD
        for param in parameters
    )


def run_logp_probe(*, strict: bool | None = None) -> RLKernelProbeResult:
    """Invoke one RL-Kernel logp operator on a small deterministic tensor.

    The probe intentionally uses synthetic tensors instead of Vime rollout
    logits. Vime's rollout HTTP path exposes selected logprobs, not logits, so
    this is an invocation proof rather than a rollout-logprob replacement.
    """

    if strict is None:
        strict = _env_enabled(ENV_STRICT)

    try:
        from rl_engine.kernels.registry import kernel_registry

        logits, token_ids = _probe_tensors()
        op = kernel_registry.get_op("logp")
        output = op(logits, token_ids)
        global _CALL_COUNT
        _CALL_COUNT += 1
        return RLKernelProbeResult(
            enabled=True,
            invoked=True,
            call_count=_CALL_COUNT,
            backend=op.__class__.__name__,
            output_shape=tuple(output.shape),
            output_sum=float(output.detach().float().sum().item()),
        )
    except Exception as exc:
        if strict:
            raise
        return _fallback(f"{type(exc).__name__}: {exc}")


def _record_probe(sample: Any, result: RLKernelProbeResult) -> None:
    metadata = getattr(sample, "metadata", None)
    if not isinstance(metadata, dict):
        metadata = {}
        setattr(sample, "metadata", metadata)

    rl_kernel_metadata = metadata.get(METADATA_KEY)
    if not isinstance(rl_kernel_metadata, dict):
        rl_kernel_metadata = {}
        metadata[METADATA_KEY] = rl_kernel_metadata

    rl_kernel_metadata["vime_logp_probe"] = asdict(result)


async def custom_generate(
    args: Any,
    sample: Any,
    sampling_params: dict[str, Any],
    evaluation: bool = False,
) -> Any:
    """Vime ``--custom-generate-function-path`` entry point.

    This shim preserves Vime's native generation path and only adds opt-in
    RL-Kernel invocation evidence. Enable it with
    ``RL_KERNEL_VIME_LOGP_PROBE=1``.
    """

    from vime.rollout.vllm_rollout import generate

    if _supports_evaluation_arg(generate):
        sample = await generate(args, sample, sampling_params, evaluation=evaluation)
    else:
        sample = await generate(args, sample, sampling_params)

    if not _env_enabled(ENV_ENABLED):
        _record_probe(sample, RLKernelProbeResult(enabled=False, invoked=False))
        return sample

    result = run_logp_probe(strict=_env_enabled(ENV_STRICT))
    _record_probe(sample, result)
    return sample