Labels
34 labels
- Tasks involving RL loss functions (such as DPO and GRPO) and the mathematical logic of alignment.
- Tasks involving Ray actor management, cross-node scheduling, and communication synchronization.
- Tasks involving the interaction of vLLM inference and DeepSpeed training endpoints.
- Optimizations or bugs specific to NVIDIA GPUs (such as FlashInfer or TMA optimizations).
- Tasks paused because a prerequisite is missing, for example when upstream vLLM has not yet released a new version.