Recent research shows that emotions can enhance users' cognition and influence information communication. While research on visual emotion analysis is extensive, limited work has been done on helping users generate emotionally rich image content. Existing work on emotional image generation relies on discrete emotion categories, making it challenging to capture complex and subtle emotional nuances accurately. Additionally, these methods struggle to control the specific content of generated images based on text prompts. In this paper, we introduce the task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, a general emotional image generation model that generates images based on free text prompts and Valence-Arousal (V-A) values. It leverages a novel emotion-embedding mapping network to fuse V-A values into textual features, enabling the capture of emotions in alignment with intended input prompts. A novel loss function is also proposed to enhance emotion expression. The experimental results show that our method effectively generates images representing specific emotions with the desired content and outperforms existing techniques.
We introduce the task of Continuous Emotional Image Content Generation (C-EICG) and present EmotiCrafter, a novel emotional image generation model that:
- Accepts free-form text prompts
- Conditions on Valence-Arousal (V-A) values
- Leverages a new emotion-embedding mapping network to fuse V-A signals into text features
- Uses a custom loss function to improve emotional fidelity
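To make the idea of conditioning text features on V-A values concrete, here is a deliberately simplified toy sketch. It is not the EmotiCrafter emotion-embedding mapping network. It swaps in a plain additive scheme: a fixed random linear projection (a stand-in for learned weights) turns the pair (valence, arousal) into an offset that is added to every token feature. The function names, the 4-dimensional features, and the additive design are all assumptions made for illustration.

```python
import random


def make_projection(dim, seed=0):
    """Random 2 -> dim linear map; a stand-in for learned weights (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(dim)]


def fuse_va(text_features, valence, arousal, projection):
    """Add a V-A-dependent offset to every token feature vector (toy scheme)."""
    offset = [row[0] * valence + row[1] * arousal for row in projection]
    return [[x + o for x, o in zip(token, offset)] for token in text_features]


# Toy "prompt": 3 tokens with 4-dimensional features (real encoders are far larger).
tokens = [
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
]
proj = make_projection(dim=4)

neutral = fuse_va(tokens, 0.0, 0.0, proj)   # zero V-A leaves the toy features unchanged
tense = fuse_va(tokens, -2.0, 2.5, proj)    # same prompt, shifted by the V-A offset
```

In this toy version the prompt content and the emotion signal stay separable: the same token features are reused, and only the V-A offset changes between `neutral` and `tense`.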
👉 Try EmotiCrafter Demo on Hugging Face 🤗
conda env create -f environment.yml

git clone https://github.com/idvxlab/EmotiCrafter
cd EmotiCrafter

You need to download the Stable Diffusion XL Base 1.0 model and place it appropriately.
You can download the pretrained model from this url and place it appropriately.
python preprocess.py --sdxl_path [pretrained SDXL]

python train.py \
--batch_size 768 \
--lr 0.001 \
--epochs 200 \
--save_dir ./ckpt \
--scale_factor 1.5 \
--enable_density True

Make sure you have your environment activated and model paths ready.
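One way to check both conditions before running inference is a small preflight script. This is a sketch under assumptions: `CONDA_DEFAULT_ENV` is the variable conda sets for the active environment, the environment name `emotion` is taken from the `conda activate` command in this README, and the two paths are hypothetical placeholders that you should replace with your own checkpoint and SDXL locations.

```python
import os
from pathlib import Path


def env_is_active(name, environ=None):
    """True if conda reports `name` as the currently active environment."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == name


def missing_paths(paths):
    """Return the labels of all entries whose path does not exist."""
    return [label for label, p in paths.items() if not Path(p).exists()]


# Hypothetical locations; substitute the paths you actually use.
paths = {
    "ckpt_path": "./ckpt",
    "sdxl_path": "./stable-diffusion-xl-base-1.0",
}

if not env_is_active("emotion"):
    print("Warning: conda environment 'emotion' does not appear to be active.")
for label in missing_paths(paths):
    print(f"Warning: {label} not found at {paths[label]}")
```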
conda activate emotion

python inference.py \
--prompt "A man is running fast" \
--arousal 2.5 \
--valence -2 \
--ckpt_path [pretrained_eit] \
--sdxl_path [pretrained_sdxl] \
--seed 0

python inference5x5.py \
--prompt "A man is running fast" \
--ckpt_path [pretrained_eit] \
--sdxl_path [pretrained_sdxl] \
--seed 0

The raw image data has been uploaded to this url. However, EmotiCrafter did not use image data for model training.
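If you want to sweep V-A values yourself rather than rely on `inference5x5.py`, the single-image command shown earlier can be scripted. The sketch below only builds the command lines and does not run them. It uses the flags that appear in this README, while the grid values, the helper name `build_inference_commands`, and the placeholder paths are assumptions. Where `inference.py` writes its output is not stated here, so check whether successive runs overwrite each other before executing the commands.

```python
import shlex


def build_inference_commands(prompt, ckpt_path, sdxl_path, values, seed=0):
    """Build one inference.py command per (arousal, valence) pair in the grid."""
    commands = []
    for arousal in values:
        for valence in values:
            commands.append([
                "python", "inference.py",
                "--prompt", prompt,
                "--arousal", str(arousal),
                "--valence", str(valence),
                "--ckpt_path", ckpt_path,
                "--sdxl_path", sdxl_path,
                "--seed", str(seed),
            ])
    return commands


# Assumed 5x5 grid; adjust to the V-A range your checkpoint expects.
GRID = [-3.0, -1.5, 0.0, 1.5, 3.0]

commands = build_inference_commands(
    prompt="A man is running fast",
    ckpt_path="path/to/pretrained_eit",    # placeholder
    sdxl_path="path/to/pretrained_sdxl",   # placeholder
    values=GRID,
)

# Print the first two commands as copy-pasteable shell lines.
for cmd in commands[:2]:
    print(shlex.join(cmd))
```

Each entry in `commands` is an argument list, so it can be passed directly to `subprocess.run(cmd, check=True)` once the paths are filled in.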
We thank the authors of Stable Diffusion XL (SDXL), FindingEmo, OASIS, and Emotic for their excellent work, which made this project possible.

If you use EmotiCrafter in your research or applications, please cite our work:
@inproceedings{dang2025emoticrafter,
title={Emoticrafter: Text-to-emotional-image generation based on valence-arousal model},
author={Dang, Shengqi and He, Yi and Ling, Long and Qian, Ziqing and Zhao, Nanxuan and Cao, Nan},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15218--15228},
year={2025}
}