Investigate Qwen3.5-9B GRPO: how does Unsloth actually support it? (#15)

## Background
Unsloth advertises Qwen3.5 support, but when we tried `unsloth/Qwen3.5-9B`:
- It loads as `Qwen3_5ForConditionalGeneration` (a vision-language model).
- It crashes during GRPO generation with:
  ```
  RuntimeError: The size of tensor a (16) must match the size of tensor b (0)
  at non-singleton dimension 1
  ```
  raised in `compute_3d_position_ids`, a multimodal position-encoding function.
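For context, here is a minimal sketch of what text-only 3D (M-RoPE) position IDs look like. It assumes Qwen3.5 follows the Qwen2-VL convention in `transformers`, where a batch with no image or video tokens gets ordinary 1D positions copied onto all three axes (temporal, height, width). The helper name `text_only_mrope_position_ids` is hypothetical; this is not Qwen3.5's actual `compute_3d_position_ids`. The error reports size 0 at dimension 1, which suggests one tensor in that function was empty for a text-only batch. That is a hypothesis to verify, not a confirmed cause.

```python
import numpy as np


def text_only_mrope_position_ids(attention_mask):
    """Hypothetical helper: 3D position IDs for a text-only batch.

    Assumes the Qwen2-VL convention: with no vision tokens, the three
    M-RoPE axes (temporal, height, width) all carry the same 1D positions.
    attention_mask: (batch, seq) array of 0/1 (left padding allowed).
    Returns an int64 array of shape (3, batch, seq).
    """
    mask = np.asarray(attention_mask, dtype=np.int64)
    # 1D positions count only real tokens; padded slots get a dummy value of 1.
    pos = np.cumsum(mask, axis=-1) - 1
    pos = np.where(mask == 0, 1, pos)
    # Copy the same positions onto all three axes.
    return np.broadcast_to(pos, (3,) + pos.shape).copy()
```

With a left-padded prompt `[[0, 1, 1, 1]]`, all three axes come out as `[1, 0, 1, 2]`. If Qwen3.5 uses this convention, a text-only GRPO rollout should never need a vision-derived tensor here, so an empty one is worth tracing.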

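We switched to Qwen3-8B (a pure-text CausalLM), which works fine. However, Qwen3.5's hybrid architecture is interesting: it mixes Gated DeltaNet linear-attention layers with standard Transformer attention (a Mamba-like hybrid rather than a pure Transformer), and it may be more efficient.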
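To illustrate where the efficiency could come from, here is a minimal sketch of the gated delta rule for a single head in its naive recurrent form, $S_t = S_{t-1}\,\alpha_t (I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top$ and $o_t = S_t q_t$. This is a simplification under stated assumptions. Real implementations use chunked, parallel kernels, add normalization and gating around the layer, and Qwen3.5's exact layer layout is not described in this issue. The key point is that the state `S` has a fixed size regardless of sequence length, unlike a growing KV cache.

```python
import numpy as np


def gated_delta_rule(q, k, v, alpha, beta):
    """Naive recurrent gated delta rule for one head (illustrative sketch).

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) values in (0, 1].
    Keys are assumed to be L2-normalized, as in the Gated DeltaNet paper.
    Returns per-step outputs (T, d_v) and the final state (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))  # fixed-size memory, independent of T
    outputs = np.empty((T, d_v))
    for t in range(T):
        k_t = k[t]
        # Decay old memory (alpha), erase what was stored under k_t (delta rule),
        # then write the new value v_t under key k_t.
        S = alpha[t] * (S - beta[t] * np.outer(S @ k_t, k_t)) + beta[t] * np.outer(v[t], k_t)
        outputs[t] = S @ q[t]
    return outputs, S
```

With `alpha = beta = 1`, writing the same key twice overwrites the old value instead of summing it, which is the "delta" part of the rule.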

## Questions
1. Does Unsloth's Qwen3.5 support require specific model variants (e.g., text-only vs multimodal)?
2. Is there a \`Qwen3.5-9B-Base\` or similar text-only variant?
3. Does GRPO on Qwen3.5 need a special generation config (e.g., disabling 3D position IDs for text-only prompts)?
4. Check Unsloth Discord / GitHub issues for known workarounds
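For questions 1 and 2, a quick first check is which class a checkpoint declares in the `architectures` field of its `config.json`. Below is a minimal sketch of a hypothetical helper, `architecture_kind`, that applies a naming heuristic. It is only a heuristic: the `ForConditionalGeneration` suffix also covers encoder-decoder models, so it flags "needs a closer look" rather than proving a checkpoint is multimodal.

```python
def architecture_kind(config: dict) -> list[str]:
    """Hypothetical heuristic: classify entries of config["architectures"].

    Hugging Face configs list the model class name(s); the suffix hints at
    whether the checkpoint is a plain causal LM or a wrapper model
    (vision-language or encoder-decoder) that may need extra inputs.
    """
    kinds = []
    for name in config.get("architectures") or []:
        if name.endswith("ForCausalLM"):
            kinds.append("text-only causal LM")
        elif name.endswith("ForConditionalGeneration"):
            kinds.append("conditional-generation wrapper (check for vision inputs)")
        else:
            kinds.append("unknown")
    return kinds
```

In practice, the config could be fetched with `huggingface_hub.hf_hub_download(repo_id=..., filename="config.json")` and parsed with `json.load`. For `Qwen3_5ForConditionalGeneration` this flags a wrapper, while Qwen3-8B's `Qwen3ForCausalLM` comes back as a text-only causal LM.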

## References
- Unsloth Qwen3.5 docs: https://unsloth.ai/docs/models/qwen3.5/fine-tune
- Qwen3.5 blog (Gated DeltaNet): https://qwen.ai/blog?id=qwen3.5
- Related issues: unslothai/unsloth#3003, #3864, #3149
