Computer Graphics and Visualization Lab

University of Hong Kong

Computer Graphics and Visualization

The CGVU Lab, led by Prof. Taku Komura, is part of the Department of Computer Science at the University of Hong Kong. Our research focuses on physically-based animation and the application of machine learning techniques to animation synthesis.

[Group photo]

Meet the Team

Principal Investigator

Taku Komura

Professor

Physical Simulation, Character Animation, 3D Modelling

Research Staff

Floyd M. Chitalu

Senior Researcher, since Nov. 2022.

Physical Simulation

Yinghao Huang

Postdoc, since Aug. 2023.

Human Pose Estimation, Human Motion Generation

Chen Peng

Postdoc, since Sep. 2023.

Physically-Based Animation, Fluid Simulation

Graduate Students

Linxu Fan

PhD, since Nov. 2019.

Physical Simulation

Zhiyang Dou

PhD, since Aug. 2020.
Co-supervised by Prof. Wenping Wang.

Character Animation, Geometric Computing

Dafei Qin

PhD, since Sep. 2020.

Facial Animation, Neural Rendering

Mingyi Shi

PhD, since Nov. 2020.

3D Human Motion, Generative AI

Jintao Lu

PhD, since Sep. 2021.

Human Scene Interaction, Motion Control

Huancheng Lin

MPhil, since Sep. 2022.

Physical Simulation

Kemeng Huang

PhD, since Sep. 2022.

Physical Simulation, High Performance Computing

Guying Lin

MPhil, since Sep. 2022.
Co-supervised by Prof. Wenping Wang.

Neural Implicit Surface Representation

Wenjia Wang

PhD, since Jan. 2023.

3D Reconstruction, Human Pose Estimation, Human Motion Generation

Zhouyingcheng Liao

PhD, since Jan. 2023.

Neural Cloth Simulation, Character Animation

Yuke Lou

MPhil, since Sep. 2023.

Motion Generation

Xiaohan Ye

PhD, since Sep. 2023.

Physics Simulation, Motion Control

Research Assistants

Leo Ho

Research Assistant, since Aug. 2023.

Digital Humans, Motion Synthesis

Xinyu Lu

Research Assistant, since Sep. 2023.

Physically-Based Animation, Simulation

Recent Publications

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

The synthesis of realistic and physically plausible human-scene interaction animations presents a critical and complex challenge in computer vision and embodied AI. Recent advances primarily focus on developing specialized character controllers for individual interaction tasks, such as contacting and carrying, often overlooking the need to establish a unified policy for versatile skills. This limitation hinders the ability to generate high-quality motions across a variety of challenging human-scene interaction tasks that require the integration of multiple skills, e.g., walking to a chair and sitting down while carrying a box. To address this issue, we present TokenHSI, a unified controller designed to synthesize various types of human-scene interaction animations. The key innovation of our framework is the use of tokenized proprioception for the simulated character, combined with various task observations, complemented by a masking mechanism that enables the selection of tasks on demand. In addition, our unified policy network is equipped with flexible input size capabilities, enabling efficient adaptation of learned foundational skills to new environments and tasks. By introducing additional input tokens to the pre-trained policy, we can not only modify interaction targets but also integrate learned skills to address diverse challenges. Overall, our framework facilitates the generation of a wide range of character animations, significantly improving flexibility and adaptability in human-scene interactions.
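
A minimal PyTorch sketch of the task-tokenization idea described above (not the TokenHSI implementation; the observation sizes, network widths, and transformer backbone are assumptions for illustration): each task observation becomes a token alongside a proprioception token, and a key-padding mask switches task tokens on or off on demand.

```python
# Minimal sketch (hypothetical sizes) of a token-based policy with an on-demand task mask.
import torch
import torch.nn as nn

class TokenPolicy(nn.Module):
    def __init__(self, proprio_dim=64, task_dims=(16, 24, 12), d_model=128, act_dim=28):
        super().__init__()
        self.proprio_embed = nn.Linear(proprio_dim, d_model)
        self.task_embeds = nn.ModuleList([nn.Linear(d, d_model) for d in task_dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, proprio, task_obs, task_mask):
        # One token per input: proprioception first, then each task observation.
        tokens = [self.proprio_embed(proprio)] + [
            emb(obs) for emb, obs in zip(self.task_embeds, task_obs)
        ]
        x = torch.stack(tokens, dim=1)                      # (B, 1 + n_tasks, d_model)
        # The proprioception token is never masked; inactive task tokens are dropped.
        pad = torch.cat([torch.zeros_like(task_mask[:, :1]), ~task_mask], dim=1)
        x = self.encoder(x, src_key_padding_mask=pad)
        return self.head(x[:, 0])                           # action from the proprio token

policy = TokenPolicy()
proprio = torch.randn(2, 64)
task_obs = [torch.randn(2, 16), torch.randn(2, 24), torch.randn(2, 12)]
task_mask = torch.tensor([[True, False, False], [False, True, True]])  # active tasks
print(policy(proprio, task_obs, task_mask).shape)           # torch.Size([2, 28])
```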

DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
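
A minimal PyTorch sketch of the two-branch split described above (not the DICE release, which is Transformer-based; the MLP branches, feature size, and MANO/FLAME-style vertex count are illustrative assumptions): one branch regresses global vertex locations, the other a local deformation field with per-vertex contact probabilities.

```python
# Minimal sketch: shared image features feed two branches, one for global mesh
# vertices and one for local deformation plus contact. Sizes are hypothetical.
import torch
import torch.nn as nn

class TwoBranchRecovery(nn.Module):
    def __init__(self, feat_dim=256, n_verts=778 + 5023, d_model=128):
        super().__init__()
        self.n_verts = n_verts
        self.query = nn.Parameter(torch.randn(n_verts, d_model))   # per-vertex queries
        self.proj = nn.Linear(feat_dim, d_model)
        # Branch A: global mesh vertex locations (hand + face).
        self.vertex_branch = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 3))
        # Branch B: local deformation field plus per-vertex contact probability.
        self.deform_branch = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 3 + 1))

    def forward(self, img_feat):                     # img_feat: (B, feat_dim)
        ctx = self.proj(img_feat)[:, None].expand(-1, self.n_verts, -1)
        q = self.query[None].expand(ctx.shape[0], -1, -1)
        x = torch.cat([q, ctx], dim=-1)
        verts = self.vertex_branch(x)                # (B, n_verts, 3)
        out = self.deform_branch(x)
        deform, contact = out[..., :3], out[..., 3:].sigmoid()
        return verts, deform, contact

verts, deform, contact = TwoBranchRecovery()(torch.randn(2, 256))
print(verts.shape, deform.shape, contact.shape)
```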


CBIL: Collective Behavior Imitation Learning for Fish from Real Videos

Reproducing realistic collective behaviors presents a captivating yet formidable challenge. Traditional rule-based methods rely on hand-crafted principles, limiting motion diversity and realism in generated collective behaviors. Recent imitation learning methods learn from data but often require ground truth motion trajectories and struggle with authenticity, especially in high-density groups with erratic movements. In this paper, we present a scalable approach, Collective Behavior Imitation Learning (CBIL), for learning fish schooling behavior directly from videos, without relying on captured motion trajectories. Our method first leverages Video Representation Learning, where a Masked Video AutoEncoder (MVAE) extracts implicit states from video inputs in a self-supervised manner. The MVAE effectively maps 2D observations to implicit states that are compact and expressive for the following imitation learning stage. Then, we propose a novel adversarial imitation learning method to effectively capture complex movements of schools of fish, allowing for efficient imitation of the distribution of motion patterns measured in the latent space. It also incorporates bio-inspired rewards alongside priors to regularize and stabilize training. Once trained, CBIL can be used for various animation tasks with the learned collective motion priors. We further show its effectiveness across different species. Finally, we demonstrate the application of our system in detecting abnormal fish behavior from in-the-wild videos.
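
A minimal PyTorch sketch of the adversarial imitation stage in latent space (hypothetical shapes, and an AMP-style least-squares objective chosen for illustration rather than CBIL's exact formulation): a discriminator scores latent-state transitions, and its output becomes a style reward for the simulated agents.

```python
# Minimal sketch: a discriminator over latent-state transitions provides a style reward.
import torch
import torch.nn as nn

latent_dim = 32  # size of the implicit state produced by the masked video autoencoder

disc = nn.Sequential(
    nn.Linear(2 * latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1))

def style_reward(z_t, z_next):
    """Reward the policy for transitions the discriminator mistakes for video data."""
    logits = disc(torch.cat([z_t, z_next], dim=-1))
    # Least-squares GAN style reward, clipped to stay non-negative.
    return torch.clamp(1.0 - 0.25 * (logits - 1.0) ** 2, min=0.0)

def disc_loss(z_real, z_real_next, z_fake, z_fake_next):
    """Least-squares objective: push video transitions to +1, policy rollouts to -1."""
    real = disc(torch.cat([z_real, z_real_next], dim=-1))
    fake = disc(torch.cat([z_fake, z_fake_next], dim=-1))
    return ((real - 1.0) ** 2).mean() + ((fake + 1.0) ** 2).mean()

z = torch.randn(8, latent_dim)
print(style_reward(z, z + 0.1 * torch.randn_like(z)).shape)  # torch.Size([8, 1])
```

In practice this style reward would be combined with the bio-inspired rewards and priors mentioned above during policy training.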

Analytic rotation-invariant modelling of anisotropic finite elements

Anisotropic hyperelastic distortion energies are used to solve many problems in fields like computer graphics and engineering with applications in shape analysis, deformation, design, mesh parameterization, biomechanics and more. However, formulating a robust anisotropic energy that is low-order and yet sufficiently non-linear remains a challenging problem for achieving the convergence promised by Newton-type methods in numerical optimization. In this paper, we propose a novel analytic formulation of an anisotropic energy that is smooth everywhere, low-order, rotationally invariant, and at least twice differentiable. At its core, our approach utilizes implicit rotation factorizations with invariants of the Cauchy-Green tensor that arises from the deformation gradient. The versatility and generality of our analysis are demonstrated through a variety of examples, where we also show that the constitutive law suggested by the anisotropic version of the well-known As-Rigid-As-Possible energy is the foundational parametric description of both passive and active elastic materials. The generality of our approach means that we can systematically derive the force and force-Jacobian expressions for use in implicit and quasistatic numerical optimization schemes, and we can also use our analysis to rewrite, simplify, and speed up several existing anisotropic and isotropic distortion energies with guaranteed inversion-safety.
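
A minimal NumPy sketch of the rotation-invariant building blocks mentioned above: the anisotropic invariants of the right Cauchy-Green tensor C = F^T F along a fibre direction. The quadratic fibre penalty is illustrative only, not the paper's constitutive law.

```python
# Minimal sketch: anisotropic invariants of C = F^T F and a toy fibre energy
# that is invariant under any rotation applied to the deformation gradient F.
import numpy as np

def anisotropic_invariants(F, a):
    """I4 = a^T C a (squared stretch along fibre a), I5 = a^T C^2 a."""
    C = F.T @ F
    I4 = a @ C @ a
    I5 = a @ (C @ C) @ a
    return I4, I5

def toy_fibre_energy(F, a, mu=1.0):
    """Penalize deviation of the fibre stretch from 1; unchanged when F -> Q F."""
    I4, _ = anisotropic_invariants(F, a)
    return 0.5 * mu * (np.sqrt(I4) - 1.0) ** 2

a = np.array([1.0, 0.0, 0.0])                 # fibre direction
F = np.diag([1.2, 0.9, 1.0])                  # a simple stretch
Q, _ = np.linalg.qr(np.random.randn(3, 3))    # an orthogonal matrix (C is unchanged by Q)
print(toy_fibre_energy(F, a), toy_fibre_energy(Q @ F, a))  # equal: rotation-invariant
```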

EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a latent space can be a non-trivial effort. On the other hand, accelerating generation by naively increasing the sampling step size, e.g., DDIM, often leads to quality degradation as it fails to approximate the complex denoising distribution. To address these issues, we propose EMDM, which captures the complex distribution during multiple sampling steps in the diffusion model, allowing for far fewer sampling steps and significant acceleration in generation. This is achieved by a conditional denoising diffusion GAN to capture multimodal data distributions among arbitrary (and potentially larger) step sizes conditioned on control signals, enabling fewer-step motion sampling with high fidelity and diversity. To minimize undesired motion artifacts, geometric losses are imposed during network learning. As a result, EMDM achieves real-time motion generation and significantly improves the efficiency of motion diffusion models compared to existing methods while achieving high-quality motion generation. Our code is available at https://github.com/Frank-ZY-Dou/EMDM.
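
A minimal sketch of few-step sampling with an x0-predicting generator (a placeholder network, a simplified re-noising rule via the forward marginal, and hypothetical motion dimensions; not EMDM's exact sampler): at each of a handful of steps the generator maps the noisy motion, the timestep, and the condition to a clean-motion sample, which is re-noised to the next, earlier timestep.

```python
# Minimal sketch of few-step ancestral sampling with an x0-predicting generator.
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def sample_motion(generator, cond, shape, n_steps=4):
    steps = torch.linspace(T - 1, 0, n_steps + 1).long()       # e.g. 999 -> 0 in 4 jumps
    x = torch.randn(shape)                                      # pure-noise motion window
    for t, s in zip(steps[:-1], steps[1:]):
        x0 = generator(x, t.expand(shape[0]), cond)             # generator predicts clean motion
        if s > 0:
            a = alpha_bar[s]
            # Simplified re-noising via the forward marginal q(x_s | x0).
            x = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)
        else:
            x = x0
    return x

# Hypothetical usage: a stand-in generator and an unused condition.
gen = lambda x, t, c: x * 0.0                                   # placeholder network
motion = sample_motion(gen, cond=None, shape=(2, 196, 263))     # (batch, frames, features)
print(motion.shape)
```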

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

We present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Previous methods explored shape generation with different representations, but they suffer from limited topologies and poor geometric details. To generate high-quality surfaces of arbitrary topologies, we use the Unsigned Distance Field (UDF) as our surface representation to accommodate arbitrary topologies. Furthermore, we propose a new pipeline that employs a point-based AutoEncoder to learn a compact and continuous latent space for accurately encoding the UDF and supporting high-resolution mesh extraction. We further show that our new pipeline significantly outperforms prior approaches to learning distance fields, such as the grid-based AutoEncoder, which is not scalable and incapable of learning an accurate UDF. In addition, we adopt a curriculum learning strategy to efficiently embed various surfaces. With the pretrained shape latent space, we employ a latent diffusion model to acquire the distribution of various shapes. We present extensive experiments using Surf-D for unconditional generation, category-conditional generation, image-conditional generation, and text-to-shape tasks. The experiments demonstrate the superior performance of Surf-D in shape generation conditioned on multiple modalities.
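
A minimal PyTorch sketch of the encode/decode idea (hypothetical sizes, not the Surf-D release): a point-based encoder pools per-point features into a compact latent code, and a decoder maps the latent and a query point to an unsigned distance kept non-negative with a softplus.

```python
# Minimal sketch: PointNet-style encoding of surface samples into a latent code,
# plus a decoder that evaluates the unsigned distance at arbitrary query points.
import torch
import torch.nn as nn

class PointUDFAutoEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Softplus())                  # UDF values are >= 0

    def encode(self, surface_pts):                             # (B, N, 3) surface samples
        return self.point_mlp(surface_pts).max(dim=1).values   # (B, latent_dim)

    def decode(self, z, queries):                              # queries: (B, Q, 3)
        z = z[:, None].expand(-1, queries.shape[1], -1)
        return self.decoder(torch.cat([z, queries], dim=-1)).squeeze(-1)

ae = PointUDFAutoEncoder()
z = ae.encode(torch.rand(2, 2048, 3))                          # latent code per shape
udf = ae.decode(z, torch.rand(2, 4096, 3))                     # distances at query points
print(z.shape, udf.shape)                                      # (2, 256) (2, 4096)
```

A latent diffusion model would then be trained over codes like `z`, and a mesh extracted from the decoded UDF.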

Contact