[bugfix] fix qwen3_vl #73

Merged

Jintao-Huang merged 2 commits into modelscope:main from Jintao-Huang:fix_qwen3_vl on May 12, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request modifies the transformer block to support keyword arguments during activation checkpointing by wrapping the forward function and converting kwargs into positional arguments. This ensures that tensor keyword arguments are correctly tracked by the autograd graph. Feedback suggests that other tensor inputs currently captured via closure, such as attention_bias and packed_seq_params, should also be passed as explicit arguments to ensure robust autograd tracking during recomputation.

Comment on lines +113 to +124
def wrapped_forward(hidden_states, attention_mask, context, context_mask, rotary_pos_emb, padding_mask,
                    *extra_args):
    # forward_func and extra_kwargs_keys come from the enclosing scope; the
    # flattened positional extra_args are rebuilt into the original kwargs.
    extra_kwargs = dict(zip(extra_kwargs_keys, extra_args))
    return forward_func(
        hidden_states,
        attention_mask,
        context,
        context_mask,
        rotary_pos_emb,
        padding_mask,
        **extra_kwargs,
    )
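
As a usage note (not part of this diff): reentrant activation-checkpointing APIs only forward positional arguments to the checkpointed function, so the caller has to flatten the extra keyword tensors into positional ones before invoking the wrapper. Below is a minimal, self-contained sketch of that call pattern; torch.utils.checkpoint stands in for the checkpoint API this code path actually uses, and forward_func, extra_kwargs, and the tensor names are hypothetical.

import torch
from torch.utils.checkpoint import checkpoint

def forward_func(hidden_states, attention_mask, **extra_kwargs):
    # Stand-in for the transformer block forward, which accepts extra
    # keyword-only tensor inputs.
    out = hidden_states
    for value in extra_kwargs.values():
        if isinstance(value, torch.Tensor):
            out = out + value
    return out

extra_kwargs = {'extra_feature': torch.randn(4, 16, requires_grad=True)}
extra_kwargs_keys = list(extra_kwargs)

def wrapped_forward(hidden_states, attention_mask, *extra_args):
    # Rebuild the keyword arguments from the flattened positional tensors.
    return forward_func(hidden_states, attention_mask,
                        **dict(zip(extra_kwargs_keys, extra_args)))

hidden_states = torch.randn(4, 16, requires_grad=True)

# The kwargs are passed as explicit positional inputs, so the checkpoint
# machinery records them and recomputation produces gradients for them.
out = checkpoint(wrapped_forward, hidden_states, None,
                 *[extra_kwargs[k] for k in extra_kwargs_keys],
                 use_reentrant=True)
out.sum().backward()
print(extra_kwargs['extra_feature'].grad is not None)  # True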

Severity: medium

The wrapped_forward function correctly converts positional arguments back to keyword arguments to bypass the torch.utils.checkpoint limitation. However, note that attention_bias and packed_seq_params (from the outer _checkpointed_forward scope) are still captured via closure inside custom_forward. While this might be acceptable if they don't require gradients, it is generally safer to pass all Tensor inputs as explicit arguments to the checkpointed function to ensure they are correctly tracked by the autograd engine during activation recomputation.
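
For illustration of the safer pattern described above, here is a minimal self-contained sketch (hypothetical module and tensor names, with torch.utils.checkpoint standing in for the checkpoint API used in this code path): attention_bias-style tensors are passed as explicit checkpoint inputs rather than being captured by the closure.

import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(16, 16)
hidden_states = torch.randn(4, 16, requires_grad=True)
attention_bias = torch.randn(4, 16, requires_grad=True)

def closure_forward(hidden_states):
    # Risky pattern: attention_bias is captured from the enclosing scope, so
    # it is never registered as an input of the checkpointed region.
    return layer(hidden_states) + attention_bias

def explicit_forward(hidden_states, attention_bias):
    # Safer pattern: every tensor the block depends on is an explicit
    # argument, so autograd tracks it through the recomputed forward.
    return layer(hidden_states) + attention_bias

out = checkpoint(explicit_forward, hidden_states, attention_bias,
                 use_reentrant=True)
out.sum().backward()
print(attention_bias.grad is not None)  # True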

Jintao-Huang merged commit f59a5d7 into modelscope:main on May 12, 2026
1 check passed

Labels: None yet

Projects: None yet

1 participant