Release v0.3.3 · modelscope/Trinity-RFT

Overview

This is a bug-fix release that addresses many bugs present in the 0.3.x series. We recommend that all users on versions 0.3.0, 0.3.1, or 0.3.2 upgrade to this version.

Explorer

Over Rollout: This mechanism allows the explorer to proceed with fewer tasks than the full batch size. It effectively increases throughput in scenarios where some tasks take significantly longer to complete than others.
Make the prompt truncation configurable
Fix logprobs calculation when temperature is not 1.0
Support rope scaling
Support loading custom chat template files
Support recording workflow running status
Optimize the aggregation of workflow metrics
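
The Over Rollout idea described above can be illustrated with a small, library-independent sketch. It is not the Trinity-RFT implementation: `run_task`, `collect_with_over_rollout`, and `min_finished` are hypothetical names, and cancelling the stragglers is an assumption made only so the example terminates cleanly. The point is that the caller stops waiting once enough tasks have finished, instead of waiting for the slowest one.

```python
import asyncio


async def run_task(task_id: int, duration: float) -> int:
    """Stand-in for a rollout task; `duration` simulates uneven task latency."""
    await asyncio.sleep(duration)
    return task_id


async def collect_with_over_rollout(durations, min_finished):
    """Launch every task, but return once `min_finished` of them are done.

    Returns (sorted ids of finished tasks, number of tasks still unfinished).
    """
    pending = {
        asyncio.create_task(run_task(i, d)) for i, d in enumerate(durations)
    }
    finished = []
    while pending and len(finished) < min_finished:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        finished.extend(task.result() for task in done)

    # Assumption for this sketch only: stragglers are cancelled.
    leftover = len(pending)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(finished), leftover
```

For example, `asyncio.run(collect_with_over_rollout([0.01, 0.02, 0.5, 0.03], min_finished=3))` returns after roughly 0.03 seconds rather than 0.5, because the one slow task is not awaited.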

Trainer

Update PPO policy loss calculation
Fix loss aggregation for KL loss and entropy loss
Optimize Trainer checkpoint saving
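
To show why the aggregation mode matters for a per-token loss such as a KL or entropy term, here is a minimal sketch in plain Python. The function `aggregate_loss` is hypothetical, and the three mode names are assumptions chosen for illustration; they may not match the values accepted by `loss_agg_mode` in Trinity-RFT.

```python
def aggregate_loss(token_losses, mask, mode="token-mean"):
    """Reduce per-token losses to a scalar.

    token_losses, mask: lists of per-sequence lists with matching shapes;
    a mask entry of 1 marks a response token, 0 marks padding.
    """
    seq_sums = [
        sum(loss * m for loss, m in zip(losses, ms))
        for losses, ms in zip(token_losses, mask)
    ]
    seq_counts = [sum(ms) for ms in mask]

    if mode == "token-mean":
        # Every valid token in the batch carries equal weight.
        return sum(seq_sums) / max(sum(seq_counts), 1)
    if mode == "seq-mean-token-sum":
        # Sum within each sequence, then average over sequences.
        return sum(seq_sums) / len(seq_sums)
    if mode == "seq-mean-token-mean":
        # Average within each sequence, then average over sequences.
        per_seq = [s / max(c, 1) for s, c in zip(seq_sums, seq_counts)]
        return sum(per_seq) / len(per_seq)
    raise ValueError(f"unknown aggregation mode: {mode}")
```

With one four-token sequence and one single-token sequence, the three modes give three different scalars for the same per-token values, so the choice changes how much long responses dominate the gradient.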

Buffer

Support registering custom Task Reader
Support token-level reward
Fix some bugs
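
As a rough illustration of what a registerable reader can look like, the following sketch uses a generic decorator-based registry. All names here (`READERS`, `register_reader`, `InMemoryJsonlReader`, `read`) are hypothetical and are not the Trinity-RFT API; consult the project documentation for the actual registration interface.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical registry mapping a reader name to its class.
READERS: Dict[str, type] = {}


def register_reader(name: str):
    """Class decorator that records a reader class under `name`."""
    def decorator(cls):
        if name in READERS:
            raise ValueError(f"reader already registered: {name}")
        READERS[name] = cls
        return cls
    return decorator


@register_reader("in_memory_jsonl")
class InMemoryJsonlReader:
    """Toy reader that parses JSON lines held in memory and yields batches."""

    def __init__(self, lines: List[str]):
        self.tasks = [json.loads(line) for line in lines]

    def read(self, batch_size: int) -> Iterator[List[dict]]:
        for start in range(0, len(self.tasks), batch_size):
            yield self.tasks[start:start + batch_size]
```

A caller can then look up the class by its configured name, for example `READERS["in_memory_jsonl"](lines)`, without importing the concrete class directly.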

Others

Add "Learn to Ask" and "Frozen Lake" examples
Update Dockerfile

What's Changed

Fix doc/config issues about prompt/response/sequence lengths by @yanxi-chen in #370
Limit the number of historical document versions by @pan-x-c in #372
Add learn_to_ask example by @chenyushuo in #356
Update main readme by @yanxi-chen in #374
Fix default_sampling_params and simple_workflow by @chenyushuo in #373
Fix metrics in trainer by @chenyushuo in #381
Minor updates for BOTS example by @HYLcool in #385
Fix vLLM prompt logprobs calculation by @pan-x-c in #384
Fix alfworld dataset loading to use correct train/test split by @shiweijiezero in #378
Bug fix when set total_steps. by @chenyushuo in #386
Add more useful options in OptimizerConfig by @garyzhang99 in #371
Add rope_scaling and rope_theta to config by @chenyushuo in #390
Add chat_template_path and trust_remote_code by @hiyuchang in #379
Add loss_agg_mode for kl and entropy_loss by @pan-x-c in #388
Bug fix in benchmark ckpt loading and megatron hf save by @chenyushuo in #392
Support Over Rollout by @pan-x-c in #376
Make Alfworld Rollout Parallel by @hiyuchang in #393
Make buffer reader registerable by @pan-x-c in #395
[Example] Frozen_Lake by @hiyuchang in #375
Add token_level_reward to Experience by @chenyushuo in #404
Support recording workflow running status by @pan-x-c in #397
Bug fix for trainer_state saving by @chenyushuo in #408
Fix dynamic timeout by @pan-x-c in #409
Typo fix by @chenyushuo in #411
Fix repeat times in evaluation by @chenyushuo in #410
Add truncate_status to experience by @hiyuchang in #407
Add trainer_strategy and save_hf_checkpoint by @pan-x-c in #412
Save bf16 model by @chenyushuo in #414
Add Docker image build and push workflow by @pan-x-c in #413

Full Changelog: v0.3.2...v0.3.3
