Added Eagle training support for Kimi-K2 #108
Open
xuhaojie-2025 wants to merge 17 commits into sgl-project:main from
Conversation
…hidden states generation (sgl-project#57)

* add local data path support and more assistant
* small refactor
* separate out the data-preprocess logic
* add support for qwen3 eagle train
* fix
* Update README.md
* fix
* fix and add test
* fix code style
* feat: add training scripts for qwen3-8B
* fix
* add 235B config
* fix chat template
* fix chat template
* updated badges
* Update README.md
* add wandb args check
* fix
* opt error log
* remove local

Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Contributor
Warning: Gemini is unable to generate a summary due to a potential policy violation.
Collaborator
Can you fix the conflict?
Author

I have resolved the conflict based on upstream/main and re-submitted the code.
jvmncs reviewed Aug 27, 2025
```python
self.head_dim = getattr(
    config, "head_dim", config.hidden_size // config.num_attention_heads
)
<<<<<<< HEAD
```
looks like this conflict snuck into the last commit
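For context, the line under review uses a standard `head_dim` fallback pattern; once the stray `<<<<<<< HEAD` marker is removed, it behaves as in this minimal sketch (the config objects here are hypothetical stand-ins for a `transformers` `PretrainedConfig`):

```python
from types import SimpleNamespace

# Hypothetical configs: one with an explicit head_dim, one without.
cfg_with = SimpleNamespace(hidden_size=4096, num_attention_heads=32, head_dim=192)
cfg_without = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

def resolve_head_dim(config):
    # Prefer an explicit head_dim if the config defines one (some models set a
    # head_dim that differs from hidden_size // num_heads); otherwise fall back
    # to the usual ratio.
    return getattr(config, "head_dim",
                   config.hidden_size // config.num_attention_heads)

print(resolve_head_dim(cfg_with))     # 192 (explicit attribute wins)
print(resolve_head_dim(cfg_without))  # 128 (4096 // 32 fallback)
```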
@xuhaojie-2025 Trying to use this for kimi-k2-0905 but having a bit of a time getting it working. Library issues, some stray bad lines, not using trust_remote_code in various places, outdated kimi_k2.py with bad refs to qk_head_dim, etc. I can struggle through, but I'm wondering if you have an updated or functional branch/commit somewhere I can look at?
add support for Kimi-K2 eagle train

* add a target model for Kimi-K2 in specforge/modeling/target/kimi_k2.py
* add a Kimi-K2 config in configs/kimi-k2-eagle3.json
* fix the chat template in specforge/data/template.py
* adapt hidden-state generation to Kimi-K2's special dialogue template in specforge/data/preprocessing.py
* the Kimi-K2 tokenizer cannot be loaded as a fast tokenizer automatically; a script generates tokenizer.json so the fast-tokenizer interface can be used
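The chat-template item above is model-specific; as an illustration only, here is a minimal sketch of rendering a conversation with Kimi-K2-style role markers. The special token names below are assumptions for illustration, not taken from this PR — the authoritative template lives in the model's tokenizer config and in specforge/data/template.py:

```python
# Sketch only: the role/token names here are assumed, not from the PR.
ROLE_TOKENS = {
    "system": "<|im_system|>",
    "user": "<|im_user|>",
    "assistant": "<|im_assistant|>",
}

def render_conversation(messages):
    """Render messages as <role-token>role<|im_middle|>content<|im_end|> runs."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        parts.append(f"{ROLE_TOKENS[role]}{role}<|im_middle|>{content}<|im_end|>")
    return "".join(parts)

demo = render_conversation([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(demo)
```

The point of a per-model template like this is that hidden-state generation must tokenize training conversations exactly as the target model would at inference time; any mismatch in role markers shifts the loss mask.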