FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

Core Contributors: Peng Li, Zihan Zhuang, Yangfan Gao, Yi Dong, Sixian Li, Changhao Jiang, Tao Gui, Xipeng Qiu

The Humanoid Intelligence Team from FudanNLP and OpenMOSS

📌 Status: Research release — the initial codebase, model checkpoints, datasets, and deployment framework are fully open-source. More powerful models and improved training recipes are under development. Contributions, issues, and PRs are welcome!

🔥 Introduction

For more information, refer to our project page and technical report.

Humanoid robots can perform diverse actions — greeting, dancing, backflipping — but these motions are typically hard-coded or task-specific. FRoM-W1 is an open-source framework for general humanoid whole-body motion control using natural language, operating in two stages:

H-GPT — A language-driven whole-body motion generation model trained on large-scale human motion data. Uses Chain-of-Thought (CoT) prompting to improve instruction understanding and generalization.
H-ACT — Retargets generated human motions into robot-specific actions, trains motion tracking policies via RL in simulation, and deploys them on real robots through a modular sim-to-real framework.

We evaluate FRoM-W1 on Unitree H1 and G1 robots. Results show strong performance on the HumanML3D-X benchmark for whole-body motion generation, and RL fine-tuning consistently improves both tracking accuracy and task success rates.

📑 Roadmap

🎉 H-GPT and H-ACT module codebases (H-GPT, H-ACT)
🎉 Sim-to-real deployment framework RoboJuDo
CoT datasets (HumanML3D-X, Motion-X) and δHumanML3D-X benchmark
SMPL-X baselines and eval model checkpoints (T2M, MotionDiffuse, MLD, T2M-GPT)
🎉 Technical Report and Project Page
More powerful models (in progress)

💾 Datasets

Due to license restrictions, we cannot publicly share all of the data. Below are references for downloading and processing each dataset.

H-GPT Module

Dataset	Download Guide
HumanML3D	Original HumanML3D repo — backup link
KIT-ML	Original KIT-ML repo — backup link
Motion-X	Original Motion-X repo — processing guide HERE
HumanML3D-X	Process with the Motion-X repo and this guide. Uses the original HumanML3D split with recalculated mean/std. CoT data is on HuggingFace.
δHumanML3D-X	Same as HumanML3D-X, with perturbed instruction variants on HuggingFace.
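HumanML3D-X reuses the HumanML3D split but recalculates `Mean.npy`/`Std.npy`. To illustrate what those statistics are for, here is a minimal NumPy sketch, assuming plain per-dimension statistics over all training frames and z-score normalization of the `new_joint_vecs` features. The function names are hypothetical, and the original HumanML3D preprocessing may apply extra per-group scaling to `Std`, so treat this as an approximation rather than the exact recipe.

```python
import numpy as np


def compute_mean_std(motions):
    """Per-dimension mean/std over every frame of every training motion.

    motions: list of arrays shaped (num_frames, feat_dim), e.g. loaded from
    new_joint_vecs/<id>.npy for the IDs in train.txt (assumed layout).
    """
    frames = np.concatenate(motions, axis=0)  # (total_frames, feat_dim)
    return frames.mean(axis=0), frames.std(axis=0)


def normalize(features, mean, std, eps=1e-8):
    """Z-score each feature dimension; eps guards against zero variance."""
    return (features - mean) / (std + eps)


def denormalize(normed, mean, std, eps=1e-8):
    """Invert normalize() to recover features in their original units."""
    return normed * (std + eps) + mean
```

The resulting arrays could be persisted with `np.save("Mean.npy", mean)` and `np.save("Std.npy", std)`; model outputs are then passed through `denormalize` before visualization or retargeting.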

Expected structure for each dataset:

H-GPT/datasets/{dataset_name}/data/
├── new_joint_vecs/
├── new_joints/
├── texts/
├── cots/
├── Mean.npy
├── Std.npy
├── all.txt
├── train.txt
├── train_val.txt
├── val.txt
└── test.txt
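Before training or evaluation, it can help to confirm that a prepared dataset matches this layout. A minimal stdlib sketch follows; the helper names are hypothetical, and the one-ID-per-line split format is an assumption based on the HumanML3D convention rather than something documented here.

```python
from pathlib import Path

# Entries from the expected layout above.
EXPECTED_DIRS = ("new_joint_vecs", "new_joints", "texts", "cots")
EXPECTED_FILES = (
    "Mean.npy", "Std.npy", "all.txt", "train.txt",
    "train_val.txt", "val.txt", "test.txt",
)


def missing_entries(data_root):
    """Return expected directories (with a trailing '/') and files absent under data_root."""
    root = Path(data_root)
    missing = [f"{d}/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def read_split(data_root, split):
    """Read motion IDs from <split>.txt, assuming one ID per line."""
    text = (Path(data_root) / f"{split}.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `missing_entries("H-GPT/datasets/<dataset_name>/data")` should return an empty list once a dataset is fully prepared.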

H-ACT Module

Dataset	Download Guide
AMASS	Download and processing procedures from human2humanoid
AMASS-H1	Retargeted for Unitree H1 — box link (from human2humanoid)
AMASS-G1	Retargeted for Unitree G1 — link coming soon

📏 Baselines

We retrained these SMPL-X baseline models and fully open-sourced them:

SMPL-X Baseline Codebases (forked repos):

T2M · MotionDiffuse · MLD · T2M-GPT

Checkpoints (HuggingFace):

Eval model · T2M · MotionDiffuse · MLD · T2M-GPT (all SMPL-X format)

🧠 Models

H-GPT

Model	Download
H-GPT w.o. CoT	LoRA weights — merge with Llama-3.1 via this script
H-GPT	LoRA weights — merge with Llama-3.1
H-GPT++ w.o. CoT	LoRA weights — merge with Llama-3.1
H-GPT++	LoRA weights — merge with Llama-3.1
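The H-GPT checkpoints are released as LoRA adapters that must be merged into Llama-3.1 before use. The merge script is not reproduced here; the NumPy sketch below only illustrates the standard LoRA arithmetic for a single linear layer, W' = W + (alpha / r) · B A, using hypothetical shapes. Libraries such as Hugging Face PEFT apply the same update across a whole model (for example via `merge_and_unload()`).

```python
import numpy as np


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into a frozen linear weight.

    W: (d_out, d_in) base weight, A: (r, d_in) down-projection,
    B: (d_out, r) up-projection. Returns W + (alpha / r) * B @ A.
    """
    if A.shape != (r, W.shape[1]) or B.shape != (W.shape[0], r):
        raise ValueError("adapter shapes do not match the base weight and rank")
    return W + (alpha / r) * (B @ A)
```

After merging, a single matmul with the merged weight gives the same output as running the base layer and the adapter side by side, so the merged model can be served without any LoRA-specific code.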

H-ACT

Policy	Download
H1-Full	Teacher (TBD), Student
H1-Clean	Teacher (TBD), Student
G1-Full	Teacher (TBD), Student
G1-Clean	Teacher (TBD), Student

🏗️ Repository Structure

FRoM-W1/
├── H-GPT/                         # Motion generation module
│   ├── hGPT/                      #  Core package (models, data, metrics, losses)
│   ├── configs/                   #  OmegaConf YAML configs (exp + arch)
│   ├── scripts/                   #  Inference entry points
│   └── motionx_processing.md      #  Dataset preparation guide
├── H-ACT/                         # Action execution module
│   ├── retarget/                  #  SMPL-X → robot joint retargeting (submodule)
│   ├── human2humanoid/            #  RL policy training framework (submodule)
│   └── RoboJuDo/                  #  Sim-to-real deployment (submodule)
├── assets/                        #  Images and media
├── QUICKSTART.md                  #  Step-by-step setup guide
├── requirements.txt
├── LICENSE                        #  Apache 2.0
└── README.md

🚀 Quick Start

The QUICKSTART.md guide walks through the full pipeline:

Text Instruction → H-GPT (motion generation) → Retarget (SMPL-X → robot joints)
 → Policy (RL training) → RoboJuDo (sim-to-real deployment) → Real Robot

Minimal inference

# 1. Setup
conda create -n fromw1 python=3.10
conda activate fromw1
pip install -r requirements.txt

# 2. Generate whole-body motion from text (H-GPT)
cd H-GPT
CUDA_VISIBLE_DEVICES=0 python -m scripts.demo \
  --cfg_assets ./configs/assets.yaml \
  --cfg configs/exp/1217_config_motionx_stage2_body_hands_llama_vqvae2kx1k_cotv3_t2mx.yaml \
  --task t2m \
  --example ./scripts/instructions.txt

# 3. Visualize
python -m hGPT.data.motionx.visualization.plot_3d_global \
  --path ./results/<result_folder>

# 4. Retarget to robot joints (H-ACT)
cd ../H-ACT/retarget
python main.py

For dataset preparation, model downloads, deps folder setup, and full deployment, follow QUICKSTART.md.

🛠️ Model Training and Evaluation

H-GPT

H-GPT is trained in three stages, selected by the `TRAIN.STAGE` config field:

Stage	`TRAIN.STAGE`	Description
VQ-VAE	`"vae"`	Train whole-body motion tokenizer (convolutional encoder/decoder + vector quantization)
LM Pretrain	`"lm_pretrain"`	Finetune Llama-3.1-8B via LoRA to generate motion tokens (VQ-VAE frozen)
LM Instruct	`"lm_instruct"`	Instruction-tune with Chain-of-Thought data
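In the VQ-VAE stage, a motion is encoded and each latent vector is snapped to its nearest codebook entry, yielding discrete indices that the LM stages learn to generate. The sketch below shows only that nearest-neighbour quantization step on already-encoded latents, plus one possible way to serialize indices as text tokens for an LLM. The `<motion_k>` token format and all function names are hypothetical, not H-GPT's actual vocabulary; the convolutional encoder/decoder and codebook training are omitted.

```python
import re

import numpy as np

# Hypothetical text form for a motion code; H-GPT's real token format may differ.
TOKEN_PATTERN = re.compile(r"<motion_(\d+)>")


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2).

    latents: (T, D) encoder outputs; codebook: (K, D).
    Returns (indices of shape (T,), quantized vectors of shape (T, D)).
    """
    # Pairwise squared distances between every latent and every code: (T, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def indices_to_text(indices):
    """Serialize code indices into a token string an LLM can emit."""
    return "".join(f"<motion_{int(k)}>" for k in indices)


def text_to_indices(text):
    """Parse generated text back into code indices, ignoring other tokens."""
    return [int(k) for k in TOKEN_PATTERN.findall(text)]
```

At inference time the direction reverses: indices parsed from the LLM output are looked up in the codebook and passed to the decoder to reconstruct a motion.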

See the H-GPT README for detailed training commands and evaluation protocols.

H-ACT

human2humanoid: RL-based motion tracking (primary framework)
BeyondMimic: requires CSV-formatted motion data; convert with retarget/scripts/pkl_2_csv.py
TWIST: alternative tracking strategy
RoboJuDo: unified sim-to-real deployment with pretrained policies
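BeyondMimic consumes motions as CSV, and `retarget/scripts/pkl_2_csv.py` performs that conversion. The column layout that script writes is not documented here, so the stdlib sketch below only shows the general idea of flattening per-frame robot state into one CSV row per frame. The key names (`root_pos`, `root_quat`, `dof_pos`), the column order, and the quaternion convention are all assumptions; check the actual script before relying on any of them. Loading the `.pkl` file itself (e.g. with `pickle` or `joblib`) is left out.

```python
import csv


def motion_to_rows(motion):
    """Flatten one motion into CSV rows, one row per frame.

    motion: dict with hypothetical keys
      'root_pos'  -> T x 3 root positions,
      'root_quat' -> T x 4 root orientations (convention assumed),
      'dof_pos'   -> T x J joint angles.
    Each row is [root_pos..., root_quat..., dof_pos...].
    """
    keys = ("root_pos", "root_quat", "dof_pos")
    if len({len(motion[k]) for k in keys}) != 1:
        raise ValueError("all per-frame arrays must have the same number of frames")
    return [
        [*pos, *quat, *dof]
        for pos, quat, dof in zip(motion["root_pos"], motion["root_quat"], motion["dof_pos"])
    ]


def write_motion_csv(motion, fh):
    """Write the flattened rows (no header) to an open text file handle."""
    csv.writer(fh).writerows(motion_to_rows(motion))
```

Checking that all arrays share one frame count up front catches truncated retargeting output before it silently produces a shorter CSV.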

🙏 Acknowledgements

We thank Biao Jiang for discussions on motion generation models, and Tairan He and Ziwen Zhuang for their help in motion tracking. We are grateful to all the open-source datasets and projects that made this work possible.

📄 Citation

If you find this work useful, please star ⭐ the repo and cite:

@article{DBLP:journals/corr/abs-2601-12799,
  author       = {Peng Li and
                  Zihan Zhuang and
                  Yangfan Gao and
                  Yi Dong and
                  Sixian Li and
                  Changhao Jiang and
                  Shihan Dou and
                  Zhiheng Xi and
                  Enyu Zhou and
                  Jixuan Huang and
                  Hui Li and
                  Jingjing Gong and
                  Xingjun Ma and
                  Tao Gui and
                  Zuxuan Wu and
                  Qi Zhang and
                  Xuanjing Huang and
                  Yu{-}Gang Jiang and
                  Xipeng Qiu},
  title        = {FRoM-W1: Towards General Humanoid Whole-Body Control with Language
                  Instructions},
  journal      = {CoRR},
  volume       = {abs/2601.12799},
  year         = {2026},
  url          = {https://doi.org/10.48550/arXiv.2601.12799},
  doi          = {10.48550/ARXIV.2601.12799},
  eprinttype   = {arXiv},
  eprint       = {2601.12799},
  timestamp    = {Tue, 24 Mar 2026 08:45:06 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2601-12799.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
