InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios

*Equal contribution
1 The University of Hong Kong, 2 Centre for Transformative Garment Production, 3 Great Bay University, 4 Dongguan Key Laboratory for Intelligence and Information Technology, 5 Shandong University, 6 National Institute of Informatics
Poster: pairs of characters performing a variety of interactive postures.

Abstract

We introduce InterAct, a new multi-modal MoCap dataset of 241 motion sequences in which two people act out a realistic, coherent scenario, with each complete interaction lasting one minute or longer. We simultaneously model both people's activities and target objective-driven, dynamic, and semantically consistent interactions, which often span a longer duration and cover a larger space. The speech audio, body motions, and facial expressions of both persons are captured. Most previous works either consider only one person or focus solely on the conversational gestures of two people, assuming that each actor's body orientation and/or position stays constant or barely changes over the interaction. Our work is the first to capture and model such long-term, dynamic interactions between two people. To facilitate further research, the data and code will be made public upon acceptance.

Dataset at a Glance


Breakdown

Pie charts summarizing the dataset composition by relationship, emotion, and gender.

Video Overview

Examples

What's Included in the Dataset

  • Motion Capture Data: High-quality MoCap sequences of two-person interactions in BVH format
  • Facial Expressions: Detailed facial expression data in ARKit format
  • Facial Templates: Facial mesh templates for each actor
  • Speech Audio: Speech audio of two actors for every sequence
  • Scenario Descriptions: Relationship and actor directions for each scenario
  • Annotation Data: Per-frame {sit, walk, stand} labels for every BVH file (see the loading sketch after this list)
  • Rendered Visualizations: Rendered visualizations of the dataset, with Blender files for body and face rendering
  • Additional Facial Dataset: TIMIT facial expression data in ARKit format for lip accuracy fine-tuning
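
The per-frame annotation labels are meant to align with the frame data stored in the MOTION section of each BVH file. Below is a minimal Python sketch of how such an alignment might be done; the file names and the one-label-per-line layout are assumptions for illustration, not the dataset's documented format.

```python
from pathlib import Path


def read_bvh_frames(bvh_path):
    """Return (num_frames, frame_time, frames) parsed from a BVH file's MOTION section."""
    lines = Path(bvh_path).read_text().splitlines()
    motion_start = next(i for i, ln in enumerate(lines) if ln.strip() == "MOTION")
    num_frames = int(lines[motion_start + 1].split()[-1])    # "Frames: N"
    frame_time = float(lines[motion_start + 2].split()[-1])  # "Frame Time: t"
    frames = [
        [float(v) for v in ln.split()]
        for ln in lines[motion_start + 3 : motion_start + 3 + num_frames]
    ]
    return num_frames, frame_time, frames


def read_frame_labels(label_path):
    """Read one {sit, walk, stand} label per line (assumed layout)."""
    return [ln.strip() for ln in Path(label_path).read_text().splitlines() if ln.strip()]


if __name__ == "__main__":
    # Hypothetical file names; the released dataset may use a different naming scheme.
    n, dt, frames = read_bvh_frames("sequence_001_actor1.bvh")
    labels = read_frame_labels("sequence_001_actor1_labels.txt")
    print(f"{n} frames at {1.0 / dt:.1f} fps, {len(labels)} labels")
```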

License

The InterAct dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • Non-Commercial: You may not use the material for commercial purposes.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

BibTeX

      
@article{huang2024interact,
  title={InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios},
  author={Yinghao Huang and Leo Ho and Dafei Qin and Mingyi Shi and Taku Komura},
  year={2024},
  eprint={2405.11690},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}