
[bugfix] fix tie_word_embeddings #74

Merged
Jintao-Huang merged 3 commits into modelscope:main from Jintao-Huang:fix_tie_word_embeddings on May 12, 2026

Conversation

Jintao-Huang (Collaborator) commented on May 12, 2026:

```python
from modelscope import AutoConfig

config = AutoConfig.from_pretrained('Qwen/Qwen3-VL-30B-A3B-Thinking')

print(config.tie_word_embeddings)  # False
print(config.text_config.tie_word_embeddings)  # True
```
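
For context: Megatron's `untie_embeddings_and_output_weights` is the logical inverse of Hugging Face's `tie_word_embeddings`, and for Qwen3-VL the effective value lives on `text_config`, not the top-level config. A minimal sketch of the lookup (the helper name is hypothetical, not from this diff):

```python
# Hypothetical helper, not the actual parser code: prefer text_config,
# since the top-level tie_word_embeddings is stale for Qwen3-VL.
def get_untie_embeddings_and_output_weights(config) -> bool:
    text_config = getattr(config, 'text_config', config)
    # Megatron inverts the Hugging Face flag.
    return not text_config.tie_word_embeddings
```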


gemini-code-assist (Bot) left a comment:

Code Review

This pull request modifies the configuration parser to preserve the untie_embeddings_and_output_weights setting in megatron_config when merging sub-configurations for Qwen3-VL models. A review comment identifies a potential bug where the setting is unconditionally overwritten with None if it was not previously defined, and provides a suggestion to only restore the value if it was explicitly present.
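
A minimal sketch of the conditional restore the review asks for, assuming `megatron_config` is a plain dict and `sub_configs` is the iterable being merged (both names are assumptions, not the actual diff):

```python
# Hypothetical sketch: snapshot with a sentinel so a missing key is
# distinguishable from an explicit None, then restore only if the
# parent config actually defined the key.
_MISSING = object()
untie = megatron_config.get('untie_embeddings_and_output_weights', _MISSING)
for sub_config in sub_configs:
    megatron_config.update(sub_config)
if untie is not _MISSING:
    megatron_config['untie_embeddings_and_output_weights'] = untie
```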

Comment thread on src/mcore_bridge/config/parser.py (outdated)
Jintao-Huang (Collaborator, Author) commented:

/gemini review


gemini-code-assist (Bot) left a comment:

Code Review

This pull request modifies the configuration parser to preserve the untie_embeddings_and_output_weights setting when merging sub-configurations, specifically to support Qwen3-VL models. Feedback suggests moving this preservation logic outside the configuration loop to improve efficiency and prevent unintended side effects where the first sub-configuration's value might override others if the parent configuration is missing the key.
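
A hedged reconstruction of the side effect described above, again assuming a dict-based merge (the loop shape is an assumption, not the actual diff):

```python
# Problematic shape: snapshot/restore inside the loop. On iteration 2
# the snapshot picks up whatever iteration 1 restored, so the first
# sub-config's value can override later ones when the parent lacks the key.
for sub_config in sub_configs:
    untie = megatron_config.get('untie_embeddings_and_output_weights')
    megatron_config.update(sub_config)
    if untie is not None:
        megatron_config['untie_embeddings_and_output_weights'] = untie

# Hoisted shape: snapshot once before the loop, restore once after it.
untie = megatron_config.get('untie_embeddings_and_output_weights')
for sub_config in sub_configs:
    megatron_config.update(sub_config)
if untie is not None:
    megatron_config['untie_embeddings_and_output_weights'] = untie
```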

Comment thread on src/mcore_bridge/config/parser.py (outdated)
Jintao-Huang (Collaborator, Author) commented:

/gemini review


gemini-code-assist (Bot) left a comment:

Code Review

This pull request updates the configuration parser in src/mcore_bridge/config/parser.py to ensure that the untie_embeddings_and_output_weights setting is preserved when merging sub-configurations. The reviewer suggested generalizing the inline comment to improve maintainability and recommended investigating whether other configuration flags, such as moe_router_pre_softmax, also require protection from being overridden by conflicting defaults in sub-configs.

Comment on lines +96 to +97:

```python
# fix Qwen/Qwen3-VL-30B-A3B-Thinking
untie_embeddings_and_output_weights = megatron_config.get('untie_embeddings_and_output_weights')
```

Severity: medium

The comment is model-specific; it would be better to describe the general logic of preserving parent configuration settings, to keep the code maintainable as more models are added. Additionally, consider whether other boolean flags that are inverted during conversion (such as moe_router_pre_softmax at line 78) should also be protected from being overridden by sub-configurations with conflicting defaults.

Suggested change:

```python
    # Preserve parent config's settings as sub-configs might have conflicting defaults.
    # This is particularly important for 'untie_embeddings_and_output_weights'.
    untie_embeddings_and_output_weights = megatron_config.get('untie_embeddings_and_output_weights')
```
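
A sketch generalizing the suggestion to a list of protected keys, per the reviewer's note about `moe_router_pre_softmax` (the tuple contents and merge shape are illustrative assumptions, not the actual patch):

```python
# Hypothetical generalization: snapshot every parent value whose
# sub-config default may conflict, merge, then restore what existed.
PROTECTED_KEYS = (
    'untie_embeddings_and_output_weights',
    'moe_router_pre_softmax',  # also inverted during conversion, per the review
)

snapshot = {k: megatron_config[k] for k in PROTECTED_KEYS if k in megatron_config}
for sub_config in sub_configs:
    megatron_config.update(sub_config)
megatron_config.update(snapshot)
```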

Jintao-Huang merged commit a443770 into modelscope:main on May 12, 2026 (1 check passed).
Jintao-Huang added a commit that referenced this pull request May 12, 2026
