v0.3.3
Overview
This is a bug-fix release that addresses many bugs present in the 0.3 series. We recommend that all users of versions 0.3.0, 0.3.1, or 0.3.2 upgrade to this version.
Explorer
- Over Rollout: This mechanism allows the explorer to proceed with fewer tasks than the full batch size. It effectively increases throughput in scenarios where some tasks take significantly longer to complete than others.
- Make the prompt truncation configurable
- Fix logprobs calculation when temperature is not 1.0
- Support rope scaling
- Support loading custom chat template files
- Support recording workflow running status
- Optimize the aggregation of workflow metrics
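The Over Rollout item above can be pictured with a small, generic asyncio sketch: start a full batch of tasks, then return as soon as a chosen fraction has finished. This is only an illustration of the idea, not the project's implementation. The names `run_task`, `collect_batch`, and `min_ratio` are hypothetical, and cancelling the stragglers is a simplification made here, since these notes do not say how unfinished tasks are handled.

```python
import asyncio
import math


async def run_task(task_id: int, duration: float) -> int:
    """Stand-in for one rollout task; sleeps to simulate variable latency."""
    await asyncio.sleep(duration)
    return task_id


async def collect_batch(durations: list, min_ratio: float) -> list:
    """Return once ceil(min_ratio * batch_size) tasks have finished.

    `min_ratio` is a hypothetical knob used only for this sketch.
    """
    tasks = [asyncio.create_task(run_task(i, d)) for i, d in enumerate(durations)]
    needed = max(1, math.ceil(min_ratio * len(tasks)))

    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
        if len(finished) >= needed:
            break  # proceed without waiting for the slowest tasks

    # Simplification: cancel whatever is still running.
    for task in tasks:
        if not task.done():
            task.cancel()
    # Wait for cancellations to settle; cancelled tasks are returned as exceptions.
    await asyncio.gather(*tasks, return_exceptions=True)
    return sorted(finished)


if __name__ == "__main__":
    # Task 2 is much slower than the rest, so the batch proceeds without it.
    print(asyncio.run(collect_batch([0.01, 0.02, 0.5, 0.03], min_ratio=0.75)))
```

With one slow task out of four and a ratio of 0.75, the batch returns after the three fast tasks instead of waiting for the slow one, which is where the throughput gain comes from.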
Trainer
- Update PPO policy loss calculation
- Fix loss aggregation for KL loss and entropy loss
- Optimize Trainer checkpoint saving
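To show why the aggregation mode matters for terms such as the KL and entropy losses mentioned above, here is a minimal pure-Python sketch comparing two common ways to reduce per-token losses over padded sequences. The mode names `token-mean` and `seq-mean-token-mean` are used here as illustrative assumptions; these notes do not confirm which values `loss_agg_mode` accepts, and `aggregate_loss` is a hypothetical helper.

```python
def aggregate_loss(per_token, mask, mode):
    """Reduce per-token losses to a scalar.

    per_token, mask: lists of equal-shaped sequences; mask is 1 for valid
    tokens and 0 for padding. Each sequence is assumed to contain at least
    one valid token.
    """
    if mode == "token-mean":
        # Every valid token in the batch carries equal weight.
        total = sum(l * m for seq, ms in zip(per_token, mask) for l, m in zip(seq, ms))
        count = sum(m for ms in mask for m in ms)
        return total / count
    if mode == "seq-mean-token-mean":
        # Average within each sequence first, so every sequence carries equal weight.
        seq_means = []
        for seq, ms in zip(per_token, mask):
            valid = sum(ms)
            seq_means.append(sum(l * m for l, m in zip(seq, ms)) / valid)
        return sum(seq_means) / len(seq_means)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
    mask = [[1, 1, 1, 1], [1, 0, 0, 0]]
    print(aggregate_loss(losses, mask, "token-mean"))           # (4 + 4) / 5
    print(aggregate_loss(losses, mask, "seq-mean-token-mean"))  # (1 + 4) / 2
```

The two modes give different values whenever sequence lengths differ: the first favors long sequences, the second weights short and long sequences equally.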
Buffer
- Support registering custom Task Reader
- Support token-level reward
- Fix some bugs
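Registering a custom reader, as in the first Buffer item above, usually follows a name-to-class registry pattern. The sketch below shows that general pattern only. It does not use the project's actual API: `register_reader`, `build_reader`, and `InlineListReader` are all hypothetical names chosen for illustration.

```python
_READERS = {}  # hypothetical registry: reader name -> reader class


def register_reader(name):
    """Class decorator that records a reader class under `name`."""
    def decorator(cls):
        if name in _READERS:
            raise ValueError(f"reader already registered: {name}")
        _READERS[name] = cls
        return cls
    return decorator


@register_reader("inline_list")
class InlineListReader:
    """Toy reader that serves tasks from an in-memory list."""

    def __init__(self, tasks):
        self._tasks = list(tasks)
        self._pos = 0

    def read(self, batch_size):
        """Return up to `batch_size` tasks and advance the cursor."""
        batch = self._tasks[self._pos:self._pos + batch_size]
        self._pos += len(batch)
        return batch


def build_reader(name, **kwargs):
    """Look up a registered reader by name and instantiate it."""
    return _READERS[name](**kwargs)


if __name__ == "__main__":
    reader = build_reader("inline_list", tasks=["a", "b", "c"])
    print(reader.read(2))
    print(reader.read(2))
```

The benefit of this design is that a configuration file can select a reader by name, and user code can add new readers without modifying the framework.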
Others
- Add "Learn to Ask" and "Frozen Lake" examples
- Update Dockerfile
What's Changed
- Fix doc/config issues about prompt/response/sequence lengths by @yanxi-chen in #370
- Limit the number of historical document versions by @pan-x-c in #372
- Add learn_to_ask example by @chenyushuo in #356
- Update main readme by @yanxi-chen in #374
- Fix default_sampling_params and simple_workflow by @chenyushuo in #373
- Fix metrics in trainer by @chenyushuo in #381
- Minor updates for BOTS example by @HYLcool in #385
- Fix vLLM prompt logprobs calculation by @pan-x-c in #384
- Fix alfworld dataset loading to use correct train/test split by @shiweijiezero in #378
- Bug fix when set total_steps. by @chenyushuo in #386
- Add more useful options in OptimizerConfig by @garyzhang99 in #371
- Add rope_scaling and rope_theta to config by @chenyushuo in #390
- Add chat_template_path and trust_remote_code by @hiyuchang in #379
- Add loss_agg_mode for kl and entropy_loss by @pan-x-c in #388
- Bug fix in benchmark ckpt loading and megatron hf save by @chenyushuo in #392
- Support Over Rollout by @pan-x-c in #376
- Make Alfworld Rollout Parallel by @hiyuchang in #393
- Make buffer reader registerable by @pan-x-c in #395
- [Example] Frozen_Lake by @hiyuchang in #375
- Add token_level_reward to Experience by @chenyushuo in #404
- Support recording workflow running status by @pan-x-c in #397
- Bug fix for trainer_state saving by @chenyushuo in #408
- Fix dynamic timeout by @pan-x-c in #409
- Typo fix by @chenyushuo in #411
- Fix repeat times in evaluation by @chenyushuo in #410
- Add truncate_status to experience by @hiyuchang in #407
- Add trainer_strategy and save_hf_checkpoint by @pan-x-c in #412
- Save bf16 model by @chenyushuo in #414
- Add Docker image build and push workflow by @pan-x-c in #413
Full Changelog: v0.3.2...v0.3.3