Why are format rewards always 0 when using Qwen3-VL-8B-Instruct?

I set up the Qwen3-VL model and ran grpo_classification.py. However, my format rewards are always 0, even though I did not modify the format_reward function.
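
To help narrow this down, here is a minimal sketch of the kind of check a format reward often performs. It assumes the reward is a regex match on a `<think>...</think><answer>...</answer>` layout. The pattern and the name `format_reward_sketch` are hypothetical and are not taken from grpo_classification.py, so the actual function may differ.

```python
import re

# Assumed pattern: <think>...</think> followed by <answer>...</answer>.
# Hypothetical stand-in, not the pattern used in grpo_classification.py.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def format_reward_sketch(completions):
    """Return 1.0 for each completion matching the assumed format, else 0.0."""
    return [
        1.0 if FORMAT_PATTERN.match(text.strip()) else 0.0
        for text in completions
    ]


samples = [
    "<think>It has fur and whiskers.</think>\n<answer>cat</answer>",
    "The image shows a cat.",  # no tags, so the assumed format check fails
]
rewards = format_reward_sketch(samples)

# Print each raw completion next to its score to see why a reward is 0.
for text, reward in zip(samples, rewards):
    print(repr(text), reward)
```

If the real check works along these lines, a reward that is always 0 would mean the model's completions never contain the expected tags, so printing the raw completions next to their scores is a quick way to confirm or rule that out.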