Core Contributors: Peng Li, Zihan Zhuang, Yangfan Gao, Yi Dong, Sixian Li, Changhao Jiang, Tao Gui, Xipeng Qiu
The Humanoid Intelligence Team from FudanNLP and OpenMOSS
Status: Research release. The initial codebase, model checkpoints, datasets, and deployment framework are fully open-source. More powerful models and improved training recipes are under development. Contributions, issues, and PRs are welcome!
For more information, refer to our project page and technical report.
Humanoid robots can perform diverse actions (greeting, dancing, backflipping), but these motions are typically hard-coded or task-specific. FRoM-W1 is an open-source framework for general humanoid whole-body motion control driven by natural language instructions. It operates in two stages:
- H-GPT: A language-driven whole-body motion generation model trained on large-scale human motion data. It uses Chain-of-Thought (CoT) prompting to improve instruction understanding and generalization.
- H-ACT: Retargets generated human motions into robot-specific actions, trains motion-tracking policies via RL in simulation, and deploys them on real robots through a modular sim-to-real framework.
We evaluate FRoM-W1 on Unitree H1 and G1 robots. Results show strong performance on the HumanML3D-X benchmark for whole-body motion generation, and RL fine-tuning consistently improves both tracking accuracy and task success rates.
- H-GPT and H-ACT module codebases (H-GPT, H-ACT)
- Sim-to-real deployment framework RoboJuDo
- CoT datasets (HumanML3D-X, Motion-X) and δHumanML3D-X benchmark
- SMPL-X baselines and eval model checkpoints (T2M, MotionDiffuse, MLD, T2M-GPT)
- Technical Report and Project Page
- More powerful models (in progress)
Due to license restrictions, we cannot publicly share all data. Below are download and processing references.
H-GPT Module
| Dataset | Download Guide |
|---|---|
| HumanML3D | Original HumanML3D repo; backup link |
| KIT-ML | Original KIT-ML repo; backup link |
| Motion-X | Original Motion-X repo; processing guide HERE |
| HumanML3D-X | Process via the Motion-X repo and this guide. Uses the original HumanML3D split with re-calculated mean/std. CoT data is on HuggingFace. |
| δHumanML3D-X | Same as HumanML3D-X, with perturbed instruction variants on HuggingFace. |
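HumanML3D-X keeps the original HumanML3D split but re-calculates Mean.npy and Std.npy. The sketch below shows what such a normalization step could look like. It assumes three things: each split entry maps to new_joint_vecs/<name>.npy, each motion is a (frames, dims) array, and the statistics are plain per-dimension values. The original HumanML3D scripts additionally rescale some feature groups, which this sketch omits. The helper names `load_split_motions`, `compute_mean_std`, and `normalize` are hypothetical, and using train.txt as the default split is also an assumption.

```python
import os

import numpy as np


def load_split_motions(data_root, split_file="train.txt"):
    """Yield (frames, dims) arrays for every name listed in a split file."""
    with open(os.path.join(data_root, split_file)) as f:
        names = [line.strip() for line in f if line.strip()]
    for name in names:
        # Assumption: each split entry maps to new_joint_vecs/<name>.npy.
        yield np.load(os.path.join(data_root, "new_joint_vecs", f"{name}.npy"))


def compute_mean_std(motions, eps=1e-8):
    """Plain per-dimension mean/std over every frame of every motion."""
    stacked = np.concatenate(
        [np.asarray(m, dtype=np.float64) for m in motions], axis=0
    )
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    # Clamp near-zero std so that normalization never divides by zero.
    std = np.maximum(std, eps)
    return mean, std


def normalize(motion, mean, std):
    """Standardize a motion with precomputed statistics."""
    return (np.asarray(motion) - mean) / std
```

The resulting arrays could then be written with `np.save(os.path.join(data_root, "Mean.npy"), mean)`, and likewise for Std.npy. Clamping the standard deviation keeps constant feature dimensions from producing NaNs during training.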
Expected structure for each dataset:
H-GPT/datasets/{dataset_name}/data/
├── new_joint_vecs/
├── new_joints/
├── texts/
├── cots/
├── Mean.npy
├── Std.npy
├── all.txt
├── train.txt
├── train_val.txt
├── val.txt
└── test.txt
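Before training, it can help to confirm that a prepared dataset folder matches the layout listed above. The sketch below shows one way to check that. The entry names are copied from the expected structure, while the helper `missing_entries` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Entry names copied from the expected H-GPT dataset layout.
EXPECTED_DIRS = ("new_joint_vecs", "new_joints", "texts", "cots")
EXPECTED_FILES = (
    "Mean.npy", "Std.npy", "all.txt", "train.txt",
    "train_val.txt", "val.txt", "test.txt",
)


def missing_entries(data_dir):
    """Return expected directories (with a trailing '/') and files absent from data_dir."""
    root = Path(data_dir)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

To use it, call `missing_entries(...)` on H-GPT/datasets/{dataset_name}/data/, substituting your own dataset folder name. An empty list means every expected entry is present. The check only tests for existence and does not validate file contents.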
H-ACT Module
| Dataset | Download Guide |
|---|---|
| AMASS | Download and processing procedures from human2humanoid |
| AMASS-H1 | Retargeted for Unitree H1; box link (from human2humanoid) |
| AMASS-G1 | Retargeted for Unitree G1; link coming soon |
We retrained these SMPL-X baseline models and fully open-sourced them:
SMPL-X Baseline Codebases (forked repos):
- T2M · MotionDiffuse · MLD · T2M-GPT
Checkpoints (HuggingFace):
- Eval model · T2M · MotionDiffuse · MLD · T2M-GPT (all SMPL-X format)
H-GPT
| Model | Download |
|---|---|
| H-GPT w/o CoT | LoRA weights; merge with Llama-3.1 via this script |
| H-GPT | LoRA weights; merge with Llama-3.1 |
| H-GPT++ w/o CoT | LoRA weights; merge with Llama-3.1 |
| H-GPT++ | LoRA weights; merge with Llama-3.1 |
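The H-GPT checkpoints are released as LoRA adapters, and the repo's own script merges them into the Llama-3.1 base weights. The standard LoRA formulation folds a low-rank update into each adapted weight matrix as W' = W + (alpha / r) · B A. As a hedged illustration of what that merge does numerically, the sketch below applies this formula with NumPy on toy shapes. The function name and the alpha/rank values are illustrative and not taken from the H-GPT release. For real checkpoints, libraries such as Hugging Face PEFT perform this step through `merge_and_unload()`.

```python
import numpy as np


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a dense weight: W' = W + (alpha / rank) * B @ A.

    weight: (out_features, in_features) base matrix
    lora_a: (rank, in_features); lora_b: (out_features, rank)
    """
    if lora_a.shape != (rank, weight.shape[1]) or lora_b.shape != (weight.shape[0], rank):
        raise ValueError("LoRA factor shapes do not match the base weight")
    scaling = alpha / rank
    return weight + scaling * (lora_b @ lora_a)


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
A = rng.standard_normal((2, 4))
B = rng.standard_normal((6, 2))
W_merged = merge_lora(W, A, B, alpha=16, rank=2)

# The merged layer equals "base output + scaled adapter output" for any input.
x = rng.standard_normal(4)
assert np.allclose(W_merged @ x, W @ x + (16 / 2) * (B @ (A @ x)))
```

Merging removes the extra adapter matmul at inference time. The trade-off is that the base weights are no longer shared between variants such as H-GPT and H-GPT++.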
H-ACT
| Policy | Download |
|---|---|
| H1-Full | Teacher (TBD), Student |
| H1-Clean | Teacher (TBD), Student |
| G1-Full | Teacher (TBD), Student |
| G1-Clean | Teacher (TBD), Student |
FRoM-W1/
├── H-GPT/ # Motion generation module
│ ├── hGPT/ # Core package (models, data, metrics, losses)
│ ├── configs/ # OmegaConf YAML configs (exp + arch)
│ ├── scripts/ # Inference entry points
│ └── motionx_processing.md # Dataset preparation guide
├── H-ACT/ # Action execution module
│ ├── retarget/ # SMPL-X → robot joint retargeting (submodule)
│ ├── human2humanoid/ # RL policy training framework (submodule)
│ └── RoboJuDo/ # Sim-to-real deployment (submodule)
├── assets/ # Images and media
├── QUICKSTART.md # Step-by-step setup guide
├── requirements.txt
├── LICENSE # Apache 2.0
└── README.md
The QUICKSTART.md guide walks through the full pipeline:
Text Instruction → H-GPT (motion generation) → Retarget (SMPL-X → robot joints)
→ Policy (RL training) → RoboJuDo (sim-to-real deployment) → Real Robot
# 1. Setup
conda create -n fromw1 python=3.10
conda activate fromw1
pip install -r requirements.txt
# 2. Generate whole-body motion from text (H-GPT)
cd H-GPT
CUDA_VISIBLE_DEVICES=0 python -m scripts.demo \
--cfg_assets ./configs/assets.yaml \
--cfg configs/exp/1217_config_motionx_stage2_body_hands_llama_vqvae2kx1k_cotv3_t2mx.yaml \
--task t2m \
--example ./scripts/instructions.txt
# 3. Visualize
python -m hGPT.data.motionx.visualization.plot_3d_global \
--path ./results/<result_folder>
# 4. Retarget to robot joints (H-ACT)
cd ../H-ACT/retarget
python main.py
For dataset preparation, model downloads, deps folder setup, and full deployment, follow QUICKSTART.md.
H-GPT is trained in three stages, selected by the TRAIN.STAGE config field:
| Stage | TRAIN.STAGE | Description |
|---|---|---|
| VQ-VAE | "vae" | Train the whole-body motion tokenizer (convolutional encoder/decoder + vector quantization) |
| LM Pretrain | "lm_pretrain" | Fine-tune Llama-3.1-8B via LoRA to generate motion tokens (VQ-VAE frozen) |
| LM Instruct | "lm_instruct" | Instruction-tune with Chain-of-Thought data |
See the H-GPT README for detailed training commands and evaluation protocols.
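In the VQ-VAE stage, encoder outputs are mapped to discrete codebook entries. The frozen tokenizer then gives the LM stages fixed motion-token targets. The sketch below covers only the nearest-neighbour lookup at the core of vector quantization. It omits the convolutional encoder/decoder, codebook and commitment losses, and straight-through gradients. The helper names and the `<motion_{}>` token template are hypothetical, and H-GPT's actual vocabulary extension may differ.

```python
import numpy as np


def quantize(latents, codebook):
    """Nearest-neighbour vector quantization.

    latents: (T, D) encoder outputs; codebook: (K, D) code vectors.
    Returns (indices, quantized) with quantized[t] == codebook[indices[t]].
    """
    # Squared Euclidean distance from every latent to every code, shape (T, K).
    dists = (
        np.sum(latents ** 2, axis=1, keepdims=True)
        - 2.0 * latents @ codebook.T
        + np.sum(codebook ** 2, axis=1)
    )
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]


def to_motion_tokens(indices, template="<motion_{}>"):
    """Render code indices as LM token strings (hypothetical token format)."""
    return "".join(template.format(int(i)) for i in indices)


codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2], [4.0, 6.0]])
idx, quantized = quantize(latents, codebook)
print(idx.tolist())           # [0, 1, 2]
print(to_motion_tokens(idx))  # <motion_0><motion_1><motion_2>
```

Expanding the squared distance avoids materializing a (T, K, D) difference tensor. This matters when the codebook and sequence length are large.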
- human2humanoid: RL-based motion tracking (primary framework)
- Beyondmimic: requires CSV-formatted motion data; convert with retarget/scripts/pkl_2_csv.py
- TWIST: Alternative tracking strategy
- RoboJuDo: Unified sim-to-real deployment with pretrained policies
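Beyondmimic expects CSV-formatted motion, and retarget/scripts/pkl_2_csv.py handles the conversion in this repo. The sketch below only illustrates the general idea of flattening per-frame arrays into CSV rows, so use pkl_2_csv.py for real data. Several details are assumptions rather than the script's actual format: the pickle keys (`root_trans`, `root_rot`, `dof`), the column order (root position, root quaternion, joint positions), and the function name `motion_pkl_to_csv`.

```python
import csv
import pickle


def motion_pkl_to_csv(pkl_path, csv_path):
    """Flatten one retargeted motion into CSV rows (hypothetical schema).

    Assumes the pickle holds a dict of per-frame sequences:
      "root_trans": [[x, y, z], ...]         root position
      "root_rot":   [[qx, qy, qz, qw], ...]  root orientation quaternion
      "dof":        [[q_0, ..., q_n], ...]   joint positions
    Returns the number of frames written.
    """
    with open(pkl_path, "rb") as f:
        motion = pickle.load(f)
    trans, rot, dof = motion["root_trans"], motion["root_rot"], motion["dof"]
    if not (len(trans) == len(rot) == len(dof)):
        raise ValueError("per-frame arrays have different lengths")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for t, r, q in zip(trans, rot, dof):
            # One row per frame: root position, root rotation, then joint positions.
            writer.writerow([float(v) for v in (*t, *r, *q)])
    return len(trans)
```

Writing one flat row per frame keeps the file readable by any CSV loader. The consumer must know the column order, though, because no header is written.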
We thank Biao Jiang for discussions on motion generation models, and Tairan He and Ziwen Zhuang for their help in motion tracking. We are grateful to all the open-source datasets and projects that made this work possible.
If you find this work useful, please star ⭐ the repo and cite:
@article{DBLP:journals/corr/abs-2601-12799,
author = {Peng Li and
Zihan Zhuang and
Yangfan Gao and
Yi Dong and
Sixian Li and
Changhao Jiang and
Shihan Dou and
Zhiheng Xi and
Enyu Zhou and
Jixuan Huang and
Hui Li and
Jingjing Gong and
Xingjun Ma and
Tao Gui and
Zuxuan Wu and
Qi Zhang and
Xuanjing Huang and
Yu{-}Gang Jiang and
Xipeng Qiu},
title = {FRoM-W1: Towards General Humanoid Whole-Body Control with Language
Instructions},
journal = {CoRR},
volume = {abs/2601.12799},
year = {2026},
url = {https://doi.org/10.48550/arXiv.2601.12799},
doi = {10.48550/ARXIV.2601.12799},
eprinttype = {arXiv},
eprint = {2601.12799},
timestamp = {Tue, 24 Mar 2026 08:45:06 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2601-12799.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}