Empty model responses occur in roughly 1-3% of evaluations and are hurting performance.
This appears to affect multiple OpenRouter models. The task is to investigate the cause and add retries in verifiers when a model response is empty.
Relevant examples: evaluation 1, evaluation 2.
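
As a starting point for discussion, here is a minimal sketch of what a retry on empty responses could look like. It is not the verifiers API: `call_with_retries`, `is_empty`, `EmptyResponseError`, and the `call_model` callable are hypothetical names, and the attempt count and backoff values are placeholder assumptions. It also assumes that a response counts as empty when it is `None` or whitespace-only.

```python
import time
from typing import Callable, Optional


class EmptyResponseError(RuntimeError):
    """Raised when every attempt returned an empty response (hypothetical)."""


def is_empty(text: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings both count as empty.
    return text is None or not text.strip()


def call_with_retries(
    call_model: Callable[[], Optional[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `call_model` until it returns non-empty text or attempts run out.

    `call_model` is a stand-in for whatever function performs the request;
    `sleep` is injectable so the backoff can be skipped in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        text = call_model()
        if not is_empty(text):
            return text
        # Exponential backoff between attempts, none after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise EmptyResponseError(
        f"model returned an empty response on all {max_attempts} attempts"
    )
```

Raising after the final attempt, rather than returning an empty string, keeps empty responses visible in evaluation logs so the underlying rate can still be measured while the root cause is investigated.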