A Blog post by ServiceNow-AI on Hugging Face
TL;DR. vLLM V1 matched our vLLM V0 reference after we fixed four things: processed rollout logprobs, V1-specific runtime defaults, the inflight weight-update path, and the fp32 lm_head used for the final projection. We fixed the backend behavior before changing the RL objective.
The reference run used vLLM 0.8.5; the V1 runs used vLLM 0.18.1. Figure 1 shows the final result. The red run is the initial V1 attempt, and the green run is the final V1 run after the fixes described below.
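To make the first fix concrete, here is a minimal sketch of why "processed" rollout logprobs can matter. It assumes that "processed" means logprobs computed after sampling transforms such as temperature scaling, and "raw" means logprobs taken directly from the model logits; the excerpt above does not define the term, so treat this as an illustration and not as the authors' code. The function names and the toy logits are hypothetical, and nothing here calls vLLM.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def raw_and_processed_logprobs(logits, temperature):
    """Return (raw, processed) logprobs for one decoding step.

    raw:       log-softmax of the unmodified logits.
    processed: log-softmax after temperature scaling, i.e. the
               distribution the sampler actually draws from
               (assumption: temperature is the only transform).
    """
    raw = log_softmax(logits)
    processed = log_softmax([x / temperature for x in logits])
    return raw, processed


# Toy three-token vocabulary (hypothetical values).
logits = [2.0, 1.0, 0.0]
raw, processed = raw_and_processed_logprobs(logits, temperature=0.5)

# Largest per-token disagreement between the two views.
gap = max(abs(r - p) for r, p in zip(raw, processed))
print(f"max |raw - processed| = {gap:.3f}")
```

If the trainer computes importance ratios against the sampling distribution but the inference backend reports raw logprobs, a gap like this appears as a mismatch even when the weights are identical. At a temperature of 1.0 the two views coincide in this sketch.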