Fix ZeRO-3 + PEFT mixed-dtype error for core trainers by albertvillanova · Pull Request #6091 · huggingface/trl

albertvillanova · 2026-06-17T12:52:57Z

Fixes a TypeError: output tensor must have the same type as input tensor that broke training with DeepSpeed ZeRO Stage 3 + non-quantized PEFT (LoRA) after deepspeed 0.19.2 was released.

Fix #6089.

Motivation

DeepSpeed 0.19.2 changed _configure_distributed_model to skip the blanket module.bfloat16() cast for ZeRO-Init models:

PR: Mixed-precision: per-policy param/buffer dtype cast (preserve fp32 buffers) deepspeedai/DeepSpeed#8066

Before 0.19.2, that cast was accidentally unifying all parameter dtypes, including PEFT LoRA adapter parameters. After 0.19.2 the cast is skipped, exposing a latent bug in DeepSpeed's _allgather_params_coalesced: output buffers are allocated using the dtype of the first persistent parameter, so when persistent parameters have mixed dtypes the subsequent all_gather_into_tensor call raises a TypeError.

The mixed-dtype situation arises because:

The base model is loaded in bf16 via ZeRO-Init → base model parameters have ds_tensor.dtype = bfloat16
PEFT's default autocast_adapter_dtype=True upcasts LoRA adapter parameters to fp32 (intended for QLoRA stability, not needed for non-quantized bf16 training)
Both base model params and LoRA params end up in persistent_parameters → dtype mismatch on all-gather
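The failure mode described above can be illustrated with a toy model. This is a minimal sketch in pure Python, not DeepSpeed's code: `FakeParam` and `allgather_coalesced` are hypothetical names, dtypes are plain strings, and the only behavior borrowed from the description is that the output buffer takes the dtype of the first persistent parameter.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a ZeRO-3 partitioned parameter."""
    name: str
    dtype: str  # e.g. "bfloat16" or "float32"


def allgather_coalesced(params):
    """Toy all-gather: every output buffer uses the first param's dtype."""
    buffer_dtype = params[0].dtype
    for p in params:
        if p.dtype != buffer_dtype:
            # Same message as the error reported in the PR description.
            raise TypeError("output tensor must have the same type as input tensor")
    return buffer_dtype


base = FakeParam("model.layers.0.weight", "bfloat16")
lora_fp32 = FakeParam("lora_A.weight", "float32")   # autocast_adapter_dtype=True
lora_bf16 = FakeParam("lora_A.weight", "bfloat16")  # autocast_adapter_dtype=False
```

With `[base, lora_bf16]` the toy gather succeeds, while `[base, lora_fp32]` raises the `TypeError`, which is the situation the workaround avoids.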

A fix has been reported upstream:

In the meantime, TRL can add a short-term workaround on its side. TRL already does something analogous for QLoRA in SFTTrainer. We can extend this to also handle non-quantized PEFT + ZeRO-3 by passing autocast_adapter_dtype=False to get_peft_model(), which prevents PEFT from upcasting adapters to fp32 and keeps them in bf16 to match the base model.

Solution

Pass autocast_adapter_dtype=False to get_peft_model() when ZeRO Stage 3 is active and the model is not quantized. This prevents PEFT from upcasting LoRA adapter parameters to fp32, keeping them in the base model's dtype (bf16) and eliminating the mismatch.
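The gating condition can be sketched as a small helper that decides which extra keyword arguments to forward to get_peft_model(). This is an illustration only, not the code merged in the PR: `peft_model_kwargs` and `parse_version` are hypothetical names, and the PEFT ≥ 0.12.0 guard is taken from the PR overview.

```python
import re

MIN_PEFT_VERSION = (0, 12, 0)  # version guard mentioned in the PR overview


def parse_version(version):
    """Hypothetical helper: leading numeric components, padded to three."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def peft_model_kwargs(zero3_enabled, is_quantized, peft_version):
    """Return extra kwargs for get_peft_model() under the described gating."""
    kwargs = {}
    if (
        zero3_enabled
        and not is_quantized
        and parse_version(peft_version) >= MIN_PEFT_VERSION
    ):
        # Keep LoRA adapters in the base model's dtype instead of fp32.
        kwargs["autocast_adapter_dtype"] = False
    return kwargs
```

A trainer would then call something like `get_peft_model(model, peft_config, **peft_model_kwargs(...))`, so every other configuration keeps PEFT's default behavior.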

The fp32 upcast (autocast_adapter_dtype=True) is a QLoRA-specific concern: with a 4-bit quantized base model, higher-precision adapters compensate for the coarse weight representation. For non-quantized bf16 training, keeping LoRA adapters in bf16 is correct and causes no stability regression: this matches how FSDP2 handles non-quantized LoRA.

The existing QLoRA workaround (manual cast to bf16 for is_loaded_in_4bit/is_loaded_in_8bit) is left in place.

Note

Medium Risk
Changes PEFT adapter dtype and DeepSpeed version constraints across multiple core trainers; affects distributed ZeRO-3 + LoRA training paths but is narrowly gated and leaves QLoRA behavior unchanged.

Overview
Fixes DeepSpeed ZeRO Stage 3 training with non-quantized LoRA, which started failing on the first optimizer step with a mixed-dtype TypeError after DeepSpeed 0.19.2 (issue #6089).

DPO, GRPO, RLOO, Reward, and SFT trainers now call get_peft_model(..., autocast_adapter_dtype=False) when ZeRO-3 is active, the model is not 4/8-bit quantized, and PEFT is ≥ 0.12.0, so LoRA weights stay in bf16 instead of being upcast to fp32. Quantized (QLoRA) paths still use the existing manual bf16 cast on trainable params. A shared _is_quantized_model flag replaces repeated getattr checks.
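A shared quantization flag of the kind described above might look like the following. This is a hedged sketch: the function name `is_quantized_model` is an assumption, and `SimpleNamespace` objects stand in for real models. Only the two attribute names, is_loaded_in_4bit and is_loaded_in_8bit, come from the PR text.

```python
from types import SimpleNamespace


def is_quantized_model(model):
    """True if the model reports 4-bit or 8-bit loading.

    getattr with a default covers models that lack the attributes.
    """
    return bool(
        getattr(model, "is_loaded_in_4bit", False)
        or getattr(model, "is_loaded_in_8bit", False)
    )


# Stand-ins for real models, for illustration only.
plain_bf16 = SimpleNamespace()
qlora_4bit = SimpleNamespace(is_loaded_in_4bit=True)
int8_model = SimpleNamespace(is_loaded_in_4bit=False, is_loaded_in_8bit=True)
```

Computing the flag once and reusing it keeps the QLoRA branch (manual bf16 cast) and the non-quantized branch (autocast_adapter_dtype=False) from drifting apart.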

Dependencies: the temporary deepspeed<0.19.2 cap is removed from pyproject.toml (deepspeed and dev extras), allowing newer DeepSpeed now that TRL handles the dtype mismatch.

Reviewed by Cursor Bugbot for commit ed91b94. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment · 2026-06-17T12:55:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-06-17T13:40:43Z

Thanks, the other trainers don't require the same fix?

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f515838.

qgallouedec

thanks

albertvillanova added 4 commits June 17, 2026 12:07

Define _is_quantized_model in SFT

31789bb

Use _is_quantized_model in SFT

01ddc21

Fix ZeRO-3 + PEFT dtype mismatch by passing autocast_adapter_dtype=False for non-quantized models

dcb88f2

Revert "Hotfix CI: Temporarily pin deepspeed < 0.19.2 (#6090)"

eb5c709

This reverts commit dc17985.

albertvillanova added 12 commits June 18, 2026 11:00

Define _is_quantized_model in DPO

d1f5ab2

Use _is_quantized_model in DPO

2bbe742

Fix ZeRO-3 + PEFT dtype mismatch for non-quantized models in DPO

a27e2b4

Define _is_quantized_model in GRPO

618d649

Use _is_quantized_model in GRPO

f542382

Fix ZeRO-3 + PEFT dtype mismatch for non-quantized models in GRPO

25049e0

Define _is_quantized_model in RLOO

9ca74a9

Use _is_quantized_model in RLOO

3eeafb5

Fix ZeRO-3 + PEFT dtype mismatch for non-quantized models in RLOO

b4c55ad

Define _is_quantized_model in Reward

6cda62d

Use _is_quantized_model in Reward

772fd49

Fix ZeRO-3 + PEFT dtype mismatch for non-quantized models in Reward

f515838

cursor Bot reviewed Jun 18, 2026

Comment thread trl/trainer/dpo_trainer.py Outdated

albertvillanova changed the title ~~Fix ZeRO-3 + PEFT (LoRA) TypeError caused by mixed-dtype persistent parameters~~ Fix ZeRO-3 + PEFT (LoRA) TypeError caused by mixed-dtype persistent parameters for core trainers Jun 18, 2026

albertvillanova added 2 commits June 18, 2026 11:08

Merge remote-tracking branch 'upstream/main' into fix-6089

563dbc6

Add PEFT version guard

46bff2e

albertvillanova changed the title ~~Fix ZeRO-3 + PEFT (LoRA) TypeError caused by mixed-dtype persistent parameters for core trainers~~ Fix ZeRO-3 + PEFT mixed-dtype TypeError for core trainers Jun 18, 2026

albertvillanova changed the title ~~Fix ZeRO-3 + PEFT mixed-dtype TypeError for core trainers~~ Fix ZeRO-3 + PEFT mixed-dtype error for core trainers Jun 18, 2026

albertvillanova mentioned this pull request Jun 18, 2026

Align KTO with DPO: Fix ZeRO-3 + PEFT dtype mismatch for non-quantized models #6093

Merged

qgallouedec approved these changes Jun 19, 2026

Merge remote-tracking branch 'upstream/main' into fix-6089

ed91b94

albertvillanova merged commit bf6a7b5 into main Jun 19, 2026
13 checks passed

albertvillanova deleted the fix-6089 branch June 19, 2026 08:44

albertvillanova merged 19 commits into main from fix-6089