MMHU:

A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

1Texas A&M University    2Brown University    3Johns Hopkins University    4UT Austin
*equal contribution    corresponding author
Image

We propose MMHU, a large-scale dataset for human behavior understanding. We collected 57k human instances exhibiting diverse behaviors, such as playing with a mobile phone, holding an item, or using a mobility device, across diverse scenes including city streets, schools, parks, and alleys. We provide rich annotations, including motion and trajectory, text descriptions of human motions, and labels for behaviors critical to driving safety. MMHU can facilitate many downstream tasks such as motion prediction, motion generation, and behavior VQA.

MMHU Dataset

Statistics of the MMHU dataset.

Image

Some examples of MMHU: each sample contains the motion sequence rendered on the original image, the behavior tags, and the text descriptions.
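To make the per-sample contents concrete, here is a minimal sketch of how one MMHU sample might be structured, based only on the fields named above (motion sequence, trajectory, behavior tags, and text description). The class name, field names, and shapes are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass

# Hypothetical container for one MMHU sample; the actual released format
# (file layout, keypoint convention, tag vocabulary) may differ.
@dataclass
class MMHUSample:
    motion: list          # per-frame human pose keypoints rendered on the image
    trajectory: list      # (x, y) positions of the person over time
    behavior_tags: list   # e.g. ["holding an item"]
    description: str      # free-form text describing the motion

sample = MMHUSample(
    motion=[[0.0] * 17 for _ in range(8)],   # 8 frames of 17 dummy keypoint values
    trajectory=[(0.0, 0.0), (0.5, 0.2)],
    behavior_tags=["holding an item"],
    description="A pedestrian walks along the sidewalk holding a bag.",
)
print(sample.behavior_tags[0])
```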

Annotation Pipeline

We collect data from three sources: the Waymo dataset, YouTube videos, and self-collected or commissioned driving videos. We design a labeling pipeline that yields high-quality annotations with minimal human effort.
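One common way to keep human effort minimal in such a pipeline is confidence-based triage: automatic label proposals are accepted when a model is confident and routed to annotators otherwise. The sketch below illustrates this generic pattern; it is an assumption for illustration, not MMHU's exact pipeline design.

```python
# Generic human-in-the-loop triage: keep high-confidence model proposals
# automatically, send uncertain ones to human annotators for verification.
def triage(proposals, threshold=0.9):
    """proposals: list of (label, confidence) pairs from an automatic labeler."""
    auto, manual = [], []
    for label, confidence in proposals:
        (auto if confidence >= threshold else manual).append(label)
    return auto, manual

auto, manual = triage([("walking", 0.97), ("using a phone", 0.62)])
print(auto)    # labels kept without review
print(manual)  # labels routed to annotators
```

Raising the threshold trades annotation cost for label quality: fewer proposals pass automatically, but those that do are more reliable.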

Image

Facilitating Downstream Tasks

Current motion generation approaches cannot generate human motion in street contexts (left of each example). After fine-tuning on MMHU, they generate such motions properly (right of each example).

Image

Fine-tuning on MMHU also improves the performance of baseline models on motion prediction, intention prediction, and behavior VQA.

Image

BibTeX


@misc{li2025mmhumassivescalemultimodalbenchmark,
      title={MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding}, 
      author={Renjie Li and Ruijie Ye and Mingyang Wu and Hao Frank Yang and Zhiwen Fan and Hezhen Hu and Zhengzhong Tu},
      year={2025},
      eprint={2507.12463},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.12463}, 
}