RL-Align · inaniloquentee · Jun 16, 2026
@@ -10,3 +10,4 @@ Start with:
 - [Single-GPU GRPO Example](grpo-single-gpu-example.md)
 - [Operators](../operators/README.md)
 - [Weight Sync Bridge](weight-sync-bridge.md)
+- [Vime Rollout LogP Probe](vime-rollout-logp-probe.md)
@@ -0,0 +1,143 @@
+# Vime Rollout LogP Probe
+
+This page documents the minimal WS5 proof-of-concept for issue #120. It wires
+one existing RL-Kernel operator into a single Vime rollout path using Vime's
+public `--custom-generate-function-path` hook.
+
+## Issue Checklist Mapping
+
+- Smallest adapter/shim: `custom_generate` wraps one Vime generate call and one
+  RL-Kernel `logp` probe.
+- Explicit opt-in: `RL_KERNEL_VIME_LOGP_PROBE=1` is required before RL-Kernel is
+  invoked.
+- Instrumentation: `Sample.metadata["rl_kernel"]["vime_logp_probe"]` records
+  structured evidence, including whether the operator was invoked and the
+  process-local `call_count`.
+- Run the minimal Vime example: start from the #117 fully-async Qwen2.5-0.5B
+  baseline and add the exact command/config below.
+- Smoke test with mocks: `tests/test_vime_rollout_logp_probe.py` installs a fake
+  Vime module when full Vime dependencies are unavailable.
+- Fallback behavior: when RL-Kernel or its CUDA extension is unavailable,
+  non-strict mode records the fallback in metadata and otherwise leaves the
+  native generated sample unchanged; native Vime/vLLM generation failures still
+  surface normally.
+
+## What It Proves
+
+The probe proves that a Vime rollout can invoke RL-Kernel from an opt-in custom
+generate shim. It does not replace vLLM sampling or rollout-side logprob
+computation. Vime's HTTP rollout path returns selected logprobs, not logits, so
+the shim runs a small deterministic synthetic tensor through RL-Kernel's `logp`
+operator and records structured evidence in `Sample.metadata`.
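+
+As a sanity reference for `output_shape` and `output_sum`, the sketch below
+recomputes the probe values in pure Python. It assumes the `logp` operator
+returns the log-softmax of each logits row gathered at the matching token ID;
+that semantics is an assumption here, not something this page specifies, and
+`reference_logp` is a hypothetical helper name.
+
+```python
+import math
+
+# Synthetic inputs copied from the probe's `_probe_tensors` helper.
+LOGITS = [
+    [0.25, 1.5, -0.5, 0.0],
+    [2.0, -1.0, 0.5, 0.25],
+]
+TOKEN_IDS = [1, 0]
+
+
+def reference_logp(logits, token_ids):
+    """Assumed semantics: log_softmax(row)[token_id] for each row."""
+    out = []
+    for row, token_id in zip(logits, token_ids):
+        peak = max(row)  # subtract the max for numerical stability
+        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
+        out.append(row[token_id] - log_norm)
+    return out
+
+
+values = reference_logp(LOGITS, TOKEN_IDS)
+print(len(values), sum(values))
+```
+
+Under that assumption the probe output has one value per row, so
+`output_shape` would be `(2,)`, and both values are negative.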
+
+## Entry Point
+
+Use this Vime custom generate path:
+
+```text
+rl_engine.integrations.vime.rollout_logp_probe.custom_generate
+```
+
+Enable the probe with:
+
+```bash
+export RL_KERNEL_VIME_LOGP_PROBE=1
+```
+
+Optional strict mode:
+
+```bash
+export RL_KERNEL_VIME_LOGP_STRICT=1
+```
+
+Strict mode re-raises any exception from the probe, such as a failed RL-Kernel
+import or backend dispatch. Without strict mode, the shim records fallback
+metadata and returns Vime's native generated sample otherwise unchanged.
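+
+Both variables are parsed the same way. The sketch below mirrors the shim's
+`_env_enabled` helper; `flag_enabled` is a stand-in name used only for this
+illustration.
+
+```python
+import os
+
+# Values the shim treats as "on", compared after strip() and lower().
+TRUE_VALUES = {"1", "true", "yes", "on"}
+
+
+def flag_enabled(name, environ=None):
+    """Return True when the named variable holds a recognised truthy value."""
+    environ = os.environ if environ is None else environ
+    return environ.get(name, "").strip().lower() in TRUE_VALUES
+
+
+env = {"RL_KERNEL_VIME_LOGP_PROBE": " True ", "RL_KERNEL_VIME_LOGP_STRICT": "0"}
+probe_on = flag_enabled("RL_KERNEL_VIME_LOGP_PROBE", env)
+strict_on = flag_enabled("RL_KERNEL_VIME_LOGP_STRICT", env)
+```
+
+Any other value, including an unset variable, leaves the corresponding mode off.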
+
+## Minimal Vime Command
+
+Start from the #117 baseline
+`vime/examples/fully_async/run-qwen2.5-0.5B-fully_async.sh` and add this line
+inside the script's `ROLLOUT_ARGS` array:
+
+```bash
+--custom-generate-function-path rl_engine.integrations.vime.rollout_logp_probe.custom_generate
+```
+
+Make RL-Kernel importable inside the Ray job runtime environment. Either install
+RL-Kernel in the image, or include the checkout path and opt-in variable in the
+script's `RUNTIME_ENV_JSON`:
+
+```json
+{
+  "env_vars": {
+    "PYTHONPATH": "/path/to/RL-Kernel:/root/Megatron-LM/:${SCRIPT_DIR}",
+    "RL_KERNEL_VIME_LOGP_PROBE": "1",
+    "CUDA_DEVICE_MAX_CONNECTIONS": "1",
+    "NCCL_NVLS_ENABLE": "${HAS_NVLINK}"
+  }
+}
+```
+
+Then run the baseline script as usual (the path below is relative to the Vime
+checkout root):
+
+```bash
+bash examples/fully_async/run-qwen2.5-0.5B-fully_async.sh
+```
+
+For a direct `train_async.py` or `ray job submit` command, the exact Vime
+argument to add is:
+
+```bash
+--custom-generate-function-path rl_engine.integrations.vime.rollout_logp_probe.custom_generate
+```
+
+The run should produce samples whose metadata contains:
+
+```python
+sample.metadata["rl_kernel"]["vime_logp_probe"]
+```
+
+Expected fields include:
+
+```text
+enabled
+invoked
+call_count
+op
+backend
+fallback
+fallback_reason
+output_shape
+output_sum
+```
+
+The `invoked` field proves the shim reached `kernel_registry.get_op("logp")`
+and executed the returned operator for that sample. `call_count` counts
+successful probe invocations within the current process only, so it is not
+aggregated across rollout workers.
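+
+A downstream consumer could read the record defensively, as in the sketch
+below. `probe_record` is a hypothetical helper, and `SimpleNamespace` stands in
+for a Vime `Sample`; only the metadata keys come from this page.
+
+```python
+from types import SimpleNamespace
+
+
+def probe_record(sample):
+    """Return the probe metadata dict, or None when the shim did not run."""
+    metadata = getattr(sample, "metadata", None)
+    if not isinstance(metadata, dict):
+        return None
+    rl_kernel = metadata.get("rl_kernel")
+    if not isinstance(rl_kernel, dict):
+        return None
+    return rl_kernel.get("vime_logp_probe")
+
+
+# Stand-in sample shaped like a non-strict fallback result.
+sample = SimpleNamespace(
+    metadata={
+        "rl_kernel": {
+            "vime_logp_probe": {
+                "enabled": True,
+                "invoked": False,
+                "call_count": 0,
+                "fallback": True,
+                "fallback_reason": "ImportError: illustrative reason",
+            }
+        }
+    }
+)
+
+record = probe_record(sample)
+status = "invoked" if record and record["invoked"] else "not invoked"
+```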
+
+## Fallback Behavior
+
+The shim always calls Vime's native `vime.rollout.vllm_rollout.generate` first.
+If native Vime/vLLM generation is unavailable, the run fails the same way the
+native Vime path would fail; the shim does not hide that failure. If the probe is
+disabled, the shim records `enabled=False` and returns the sample. If RL-Kernel
+is unavailable, a backend is unavailable, or the CUDA extension is not built,
+non-strict mode records `fallback=True` with a `fallback_reason` and returns the
+native generated sample with no change other than that metadata entry.
+
+This keeps pure Vime inference and native RL paths unaffected when the probe is
+disabled or when RL-Kernel cannot run.
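+
+The strict and non-strict branches can be sketched without RL-Kernel installed.
+In the example below, `failing_op` and `probe` are hypothetical stand-ins for
+the real operator dispatch and `run_logp_probe`; only the control flow follows
+the shim.
+
+```python
+def failing_op():
+    """Stand-in for an RL-Kernel dispatch whose backend is unavailable."""
+    raise RuntimeError("CUDA extension not built")
+
+
+def probe(op, strict):
+    """Run `op`; re-raise in strict mode, otherwise return fallback fields."""
+    try:
+        op()
+        return {"invoked": True, "fallback": False, "fallback_reason": None}
+    except Exception as exc:
+        if strict:
+            raise
+        reason = f"{type(exc).__name__}: {exc}"
+        return {"invoked": False, "fallback": True, "fallback_reason": reason}
+
+
+non_strict = probe(failing_op, strict=False)
+
+try:
+    probe(failing_op, strict=True)
+    strict_raised = False
+except RuntimeError:
+    strict_raised = True
+```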
+
+## Local Smoke Test
+
+The mock smoke test does not require a full Vime installation:
+
+```bash
+python -m pytest tests/test_vime_rollout_logp_probe.py
+```
+
+The test installs a fake `vime.rollout.vllm_rollout.generate`, exercises the
+custom generate shim, verifies that RL-Kernel `logp` dispatch was invoked, and
+checks non-strict fallback behavior.
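+
+One way to build such a fake is to register stub modules in `sys.modules`
+before the shim imports Vime. The sketch below shows the idea only; it is not
+the contents of the test file, and `install_fake_vime` and `fake_generate` are
+hypothetical names.
+
+```python
+import asyncio
+import sys
+import types
+from types import SimpleNamespace
+
+
+async def fake_generate(args, sample, sampling_params, evaluation=False):
+    """Stub generate: marks the sample and returns it without calling vLLM."""
+    sample.response = "fake"
+    return sample
+
+
+def install_fake_vime():
+    """Register stub `vime`, `vime.rollout`, `vime.rollout.vllm_rollout`."""
+    vime = types.ModuleType("vime")
+    rollout = types.ModuleType("vime.rollout")
+    vllm_rollout = types.ModuleType("vime.rollout.vllm_rollout")
+    vllm_rollout.generate = fake_generate
+    vime.rollout = rollout
+    rollout.vllm_rollout = vllm_rollout
+    sys.modules["vime"] = vime
+    sys.modules["vime.rollout"] = rollout
+    sys.modules["vime.rollout.vllm_rollout"] = vllm_rollout
+
+
+install_fake_vime()
+from vime.rollout.vllm_rollout import generate  # resolves to the stub
+
+result = asyncio.run(generate(None, SimpleNamespace(), {}))
+```
+
+In a pytest suite, `monkeypatch.setitem(sys.modules, ...)` would restore the
+original entries after each test.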
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: Apache-2.0
+# Copyright (c) 2026 RL-Kernel Contributors
+
+"""Framework integration helpers for RL-Kernel."""
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: Apache-2.0
+# Copyright (c) 2026 RL-Kernel Contributors
+
+"""Opt-in Vime integration helpers."""
+
+from rl_engine.integrations.vime.rollout_logp_probe import (
+    ENV_ENABLED,
+    ENV_STRICT,
+    METADATA_KEY,
+    RLKernelProbeResult,
+    custom_generate,
+    run_logp_probe,
+)
+
+__all__ = [
+    "ENV_ENABLED",
+    "ENV_STRICT",
+    "METADATA_KEY",
+    "RLKernelProbeResult",
+    "custom_generate",
+    "run_logp_probe",
+]
@@ -0,0 +1,149 @@
+# SPDX-License-Identifier: Apache-2.0
+# Copyright (c) 2026 RL-Kernel Contributors
+
+from __future__ import annotations
+
+import inspect
+import os
+from dataclasses import asdict, dataclass
+from typing import Any
+
+ENV_ENABLED = "RL_KERNEL_VIME_LOGP_PROBE"
+ENV_STRICT = "RL_KERNEL_VIME_LOGP_STRICT"
+METADATA_KEY = "rl_kernel"
+
+_TRUE_VALUES = {"1", "true", "yes", "on"}
+_CALL_COUNT = 0
+
+
+@dataclass(frozen=True)
+class RLKernelProbeResult:
+    """Structured evidence that the Vime shim reached RL-Kernel."""
+
+    enabled: bool
+    invoked: bool
+    call_count: int = 0
+    op: str = "logp"
+    backend: str | None = None
+    fallback: bool = False
+    fallback_reason: str | None = None
+    output_shape: tuple[int, ...] | None = None
+    output_sum: float | None = None
+
+
+def _env_enabled(name: str) -> bool:
+    return os.environ.get(name, "").strip().lower() in _TRUE_VALUES
+
+
+def _probe_tensors():
+    import torch
+
+    from rl_engine.platforms.device import device_ctx
+
+    logits = torch.tensor(
+        [
+            [0.25, 1.5, -0.5, 0.0],
+            [2.0, -1.0, 0.5, 0.25],
+        ],
+        device=device_ctx.device,
+        dtype=torch.float32,
+    )
+    token_ids = torch.tensor([1, 0], device=device_ctx.device, dtype=torch.long)
+    return logits, token_ids
+
+
+def _fallback(reason: str) -> RLKernelProbeResult:
+    return RLKernelProbeResult(
+        enabled=True,
+        invoked=False,
+        call_count=_CALL_COUNT,
+        fallback=True,
+        fallback_reason=reason,
+    )
+
+
+def _supports_evaluation_arg(fn: Any) -> bool:
+    try:
+        parameters = inspect.signature(fn).parameters.values()
+    except (TypeError, ValueError):
+        return False
+    return any(
+        param.name == "evaluation" or param.kind == inspect.Parameter.VAR_KEYWORD
+        for param in parameters
+    )
+
+
+def run_logp_probe(*, strict: bool | None = None) -> RLKernelProbeResult:
+    """Invoke one RL-Kernel logp operator on a small deterministic tensor.
+
+    The probe intentionally uses synthetic tensors instead of Vime rollout
+    logits. Vime's rollout HTTP path exposes selected logprobs, not logits, so
+    this is an invocation proof rather than a rollout-logprob replacement.
+    """
+
+    global _CALL_COUNT
+    if strict is None:
+        strict = _env_enabled(ENV_STRICT)
+
+    try:
+        from rl_engine.kernels.registry import kernel_registry
+
+        logits, token_ids = _probe_tensors()
+        op = kernel_registry.get_op("logp")
+        output = op(logits, token_ids)
+        _CALL_COUNT += 1
+        return RLKernelProbeResult(
+            enabled=True,
+            invoked=True,
+            call_count=_CALL_COUNT,
+            backend=op.__class__.__name__,
+            output_shape=tuple(output.shape),
+            output_sum=float(output.detach().float().sum().item()),
+        )
+    except Exception as exc:
+        if strict:
+            raise
+        return _fallback(f"{type(exc).__name__}: {exc}")
+
+
+def _record_probe(sample: Any, result: RLKernelProbeResult) -> None:
+    metadata = getattr(sample, "metadata", None)
+    if not isinstance(metadata, dict):
+        metadata = {}
+        setattr(sample, "metadata", metadata)
+
+    rl_kernel_metadata = metadata.get(METADATA_KEY)
+    if not isinstance(rl_kernel_metadata, dict):
+        rl_kernel_metadata = {}
+        metadata[METADATA_KEY] = rl_kernel_metadata
+
+    rl_kernel_metadata["vime_logp_probe"] = asdict(result)
+
+
+async def custom_generate(
+    args: Any,
+    sample: Any,
+    sampling_params: dict[str, Any],
+    evaluation: bool = False,
+) -> Any:
+    """Vime ``--custom-generate-function-path`` entry point.
+
+    This shim preserves Vime's native generation path and only adds opt-in
+    RL-Kernel invocation evidence. Enable it with
+    ``RL_KERNEL_VIME_LOGP_PROBE=1``.
+    """
+
+    from vime.rollout.vllm_rollout import generate
+
+    # Pass `evaluation` only when the installed generate() accepts it.
+    if _supports_evaluation_arg(generate):
+        sample = await generate(args, sample, sampling_params, evaluation=evaluation)
+    else:
+        sample = await generate(args, sample, sampling_params)
+
+    if not _env_enabled(ENV_ENABLED):
+        _record_probe(sample, RLKernelProbeResult(enabled=False, invoked=False))
+        return sample
+
+    result = run_logp_probe(strict=_env_enabled(ENV_STRICT))
+    _record_probe(sample, result)
+    return sample