What would the performance be without reasoning, applying GRPO directly to the answers produced by the model? This experiment would help show whether the reasoning trace actually improves performance.
Thank you for your work!