Conversation

@ajkv-google ajkv-google commented Jan 15, 2026

Summary

  1. Updated the experimental DLRM code to work on Trillium chips.

  2. Added training scripts for HSTU using both the Keras and JAX trainers, demonstrating how the HSTU model in this library can be trained on TPU with either trainer. The hyperparameters (e.g., vocab_size) are set for the Amazon Books dataset, but they can be adjusted for other datasets and use cases. Training was verified on a Trillium chip and ran successfully with both trainers.
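The pattern described in item 2 — one hyperparameter bundle tuned for a reference dataset, overridable for others — can be sketched as a frozen dataclass. All field names and default values below are illustrative placeholders, not the library's actual config or the real Amazon Books settings:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HSTUConfig:
    """Illustrative HSTU hyperparameter bundle (hypothetical names/values)."""

    vocab_size: int           # item vocabulary size; dataset-dependent
    max_seq_len: int = 200    # maximum user history length
    embedding_dim: int = 128  # item embedding width
    num_layers: int = 4       # HSTU blocks
    num_heads: int = 8        # attention heads per block
    learning_rate: float = 1e-3


# Configure for one dataset, then derive a variant for another
# by overriding only the dataset-dependent fields.
books_cfg = HSTUConfig(vocab_size=400_000)               # placeholder value
small_cfg = replace(books_cfg, vocab_size=50_000, max_seq_len=100)
```

Keeping the config immutable (`frozen=True`) and deriving variants with `replace` makes it easy to hand the same bundle to either trainer without one script mutating the other's settings.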

@ajkv-google ajkv-google requested a review from vlad-karp January 15, 2026 22:31

@vlad-karp vlad-karp left a comment

Not a hard change request, but please consider whether we can avoid using the global mesh.
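The review suggestion above — avoiding a global mesh — usually means threading the mesh through function parameters instead of reading a module-level variable. A minimal JAX sketch of that pattern (function and variable names here are hypothetical, not from this PR):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec


def make_mesh(axis_name: str = "data") -> Mesh:
    """Build a 1-D mesh over all local devices (TPU cores, or CPU in tests)."""
    devices = np.array(jax.devices())
    return Mesh(devices, axis_names=(axis_name,))


def shard_batch(batch, mesh: Mesh) -> jax.Array:
    """Shard a batch along its leading axis using the mesh passed in,
    rather than capturing a module-level global mesh."""
    spec = PartitionSpec(mesh.axis_names[0])
    return jax.device_put(batch, NamedSharding(mesh, spec))


# The caller owns the mesh and passes it explicitly.
mesh = make_mesh()
x = shard_batch(np.ones((8, 4), dtype=np.float32), mesh)
```

Passing the mesh explicitly keeps the trainers testable on a single-device CPU mesh and avoids hidden coupling between modules that would otherwise all read the same global.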

@ajkv-google ajkv-google merged commit 4317d1c into main Jan 15, 2026
3 checks passed