v0.3.3

@pan-x-c pan-x-c released this 27 Nov 12:42
· 44 commits to main since this release
fbf6c96

Overview

This is a bug-fix release that addresses many bugs present in the 0.3 series. We recommend that all users on versions 0.3.0, 0.3.1, or 0.3.2 upgrade to this version.

Explorer

  1. Over Rollout: This mechanism allows the explorer to proceed with fewer tasks than the full batch size. It effectively increases throughput in scenarios where some tasks take significantly longer to complete than others.
  2. Make the prompt truncation configurable
  3. Fix logprobs calculation when temperature is not 1.0
  4. Support RoPE scaling
  5. Support loading custom chat template files
  6. Support recording workflow running status
  7. Optimize the aggregation of workflow metrics
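
The Over Rollout idea from item 1 can be illustrated with a minimal, hedged sketch. This is not the project's implementation: `simulated_rollout`, `explore_step`, and the `min_finished` parameter are hypothetical names introduced only for this example, and cancelling the stragglers is a simplification, since these notes do not say how the real explorer handles unfinished tasks.

```python
import asyncio
from typing import List, Tuple


async def simulated_rollout(task_id: int, delay: float) -> int:
    """Hypothetical stand-in for one rollout task: sleep, then return its id."""
    await asyncio.sleep(delay)
    return task_id


async def explore_step(delays: List[float], min_finished: int) -> Tuple[List[int], int]:
    """Launch a full batch but return as soon as `min_finished` tasks are done.

    Returns the sorted ids of finished tasks and the number of stragglers.
    """
    pending = {
        asyncio.ensure_future(simulated_rollout(i, d)) for i, d in enumerate(delays)
    }
    finished: List[int] = []

    # Collect results as they arrive instead of waiting for the whole batch.
    while pending and len(finished) < min_finished:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        finished.extend(t.result() for t in done)

    # Assumption for this sketch only: slow tasks are cancelled so the demo
    # exits cleanly. A real system might keep them running for a later step.
    stragglers = len(pending)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    return sorted(finished), stragglers


if __name__ == "__main__":
    # Four tasks, one of which is much slower than the others.
    ids, left = asyncio.run(explore_step([0.01, 0.02, 0.5, 0.03], min_finished=3))
    print("finished:", ids, "stragglers:", left)
```

In this sketch the step completes after roughly the third-fastest task instead of the slowest one, which is where the throughput gain comes from when task durations vary widely.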

Trainer

  1. Update PPO policy loss calculation
  2. Fix loss aggregation for KL loss and entropy loss
  3. Optimize Trainer checkpoint saving

Buffer

  1. Support registering custom Task Reader
  2. Support token-level reward
  3. Fix some bugs

Others

  1. Add "Learn to Ask" and "Frozen Lake" examples
  2. Update Dockerfile

What's Changed

Full Changelog: v0.3.2...v0.3.3