
Add Qwen3-VL training adaptation for DFlash #475

Open
gq112 wants to merge 3 commits into sgl-project:main from gq112:dflash-qwen3-vl
gq112 commented Feb 28, 2026

Motivation

This PR adds Qwen3-VL training support to the DFlash framework (see #461).

The goal is to enable end-to-end DFlash draft model training and integration with Qwen3-VL target models under the SpecForge training pipeline.

Modifications

Added online DFlash training support for Qwen3-VL.
Added multimodal position ID handling for DFlash, including Qwen3-VL mRoPE alignment.
Enabled hidden-state and position ID extraction on the HF target side for VLM inputs.
Updated target embedding / LM head loading to handle Qwen3-VL weight layouts.
Added a Qwen3-VL chat template, draft config, and example training script.
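
For readers unfamiliar with mRoPE, the sketch below illustrates the general idea behind multimodal position IDs in the Qwen-VL family: each token gets three position components (temporal, height, width) instead of one. It is a simplified, image-only illustration and not the code in this PR. The function name `mrope_position_ids` and the segment format are hypothetical, the grid sizes are assumed to be given after spatial merging, and video handling plus any Qwen3-VL-specific details are left out.

```python
def mrope_position_ids(segments):
    """Build 3-component (temporal, height, width) position IDs.

    Hypothetical helper for illustration only.
    segments: list of ("text", n_tokens) or ("image", (grid_h, grid_w)),
    where the image grid is assumed to be the post-merge token grid.
    Returns [t_ids, h_ids, w_ids], each of length = total token count.
    """
    t_ids, h_ids, w_ids = [], [], []
    next_pos = 0  # first position available to the next segment
    for kind, spec in segments:
        if kind == "text":
            # Text tokens: all three components share one increasing index.
            for i in range(spec):
                t_ids.append(next_pos + i)
                h_ids.append(next_pos + i)
                w_ids.append(next_pos + i)
            next_pos += spec
        elif kind == "image":
            grid_h, grid_w = spec
            # Image tokens (row-major): temporal stays fixed, while height
            # and width follow the token's row and column in the grid.
            for row in range(grid_h):
                for col in range(grid_w):
                    t_ids.append(next_pos)
                    h_ids.append(next_pos + row)
                    w_ids.append(next_pos + col)
            # The next segment starts after the largest ID used so far.
            next_pos += max(grid_h, grid_w)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return [t_ids, h_ids, w_ids]


if __name__ == "__main__":
    # Two text tokens, a 2x2 image, then two more text tokens.
    for name, ids in zip(
        ("t", "h", "w"),
        mrope_position_ids([("text", 2), ("image", (2, 2)), ("text", 2)]),
    ):
        print(name, ids)
```

Because image tokens advance the running position by `max(grid_h, grid_w)` rather than by their token count, position IDs no longer equal token indices. Presumably this is why the draft model needs position IDs supplied from the target side rather than a plain `arange`.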

Accuracy Test

Coming soon.

Benchmark & Profiling

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
