MOVIN: Real-time Motion Capture using a Single LiDAR

Jang, Deok-Kyeong; Yang, Dongseok; Jang, Deok-Yun; Choi, Byeoli; Jin, Taeil; Lee, Sung-Hee

dc.contributor.author	Jang, Deok-Kyeong	en_US
dc.contributor.author	Yang, Dongseok	en_US
dc.contributor.author	Jang, Deok-Yun	en_US
dc.contributor.author	Choi, Byeoli	en_US
dc.contributor.author	Jin, Taeil	en_US
dc.contributor.author	Lee, Sung-Hee	en_US
dc.contributor.editor	Chaine, Raphaëlle	en_US
dc.contributor.editor	Deng, Zhigang	en_US
dc.contributor.editor	Kim, Min H.	en_US
dc.date.accessioned	2023-10-09T07:35:45Z
dc.date.available	2023-10-09T07:35:45Z
dc.date.issued	2023
dc.identifier.issn	1467-8659
dc.identifier.uri	https://doi.org/10.1111/cgf.14961
dc.identifier.uri	https://diglib.eg.org:443/handle/10.1111/cgf14961
dc.description.abstract	Recent advancements in technology have brought forth new forms of interactive applications, such as the social metaverse, where end users interact with each other through their virtual avatars. In such applications, precise full-body tracking is essential for an immersive experience and a sense of embodiment with the virtual avatar. However, current motion capture systems are not easily accessible to end users due to their high cost, the special skills required to operate them, or the discomfort associated with wearable devices. In this paper, we present MOVIN, a data-driven generative method for real-time motion capture with global tracking, using a single LiDAR sensor. Our autoregressive conditional variational autoencoder (CVAE) model learns the distribution of pose variations conditioned on the given 3D point cloud from LiDAR. As a central factor for high-accuracy motion capture, we propose a novel feature encoder that learns the correlation between the historical 3D point cloud data and global and local pose features, resulting in effective learning of the pose prior. Global pose features include root translation, root rotation, and foot contacts, while local features comprise joint positions and rotations. Subsequently, a pose generator takes the sampled latent variable together with the features from the previous frame to generate a plausible current pose. Our framework accurately predicts the performer's 3D global information and local joint details while maintaining temporally coherent movement across frames. We demonstrate the effectiveness of our architecture through quantitative and qualitative evaluations against state-of-the-art methods. Additionally, we implement a real-time application to showcase our method in real-world scenarios. The MOVIN dataset is available at https://movin3d.github.io/movin_pg2023/.	en_US
dc.publisher	The Eurographics Association and John Wiley & Sons Ltd.	en_US
dc.subject	CCS Concepts: Computing methodologies -> Motion capture; Motion processing; Neural networks
dc.subject	Computing methodologies
dc.subject	Motion capture
dc.subject	Motion processing
dc.subject	Neural networks
dc.title	MOVIN: Real-time Motion Capture using a Single LiDAR	en_US
dc.description.seriesinformation	Computer Graphics Forum
dc.description.sectionheaders	Motion Capture and Generation
dc.description.volume	42
dc.description.number	7
dc.identifier.doi	10.1111/cgf.14961
dc.identifier.pages	12 pages
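The abstract describes an autoregressive CVAE: at each frame a latent variable is sampled and a pose generator combines it with the current point-cloud features and the previous frame's pose. The sketch below illustrates only that generic inference loop, plus the standard reparameterization trick and KL term used to train CVAEs. It is not the MOVIN implementation. All names (`reparameterize`, `kl_to_standard_normal`, `run_autoregressive`, `toy_generator`) are hypothetical, the per-frame point-cloud "features" are plain vectors standing in for the learned feature encoder, and sampling from a standard normal prior at inference time is an assumption.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I), sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)), the usual CVAE latent regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def run_autoregressive(
    frames: Sequence[Vector],  # hypothetical per-frame point-cloud feature vectors
    initial_pose: Vector,
    generator: Callable[[Vector, Vector, Vector], Vector],
    latent_dim: int,
    seed: int = 0,
) -> List[Vector]:
    """Generate one pose per frame, feeding each prediction back as the next condition."""
    rng = random.Random(seed)
    prev_pose = list(initial_pose)
    poses: List[Vector] = []
    for cloud_feat in frames:
        # Assumption: at inference the latent is drawn from a standard normal prior.
        z = reparameterize([0.0] * latent_dim, [0.0] * latent_dim, rng)
        pose = generator(z, cloud_feat, prev_pose)
        poses.append(pose)
        prev_pose = pose  # autoregressive step: current output conditions the next frame
    return poses


def toy_generator(z: Vector, cloud_feat: Vector, prev_pose: Vector) -> Vector:
    """Hypothetical stand-in for a learned decoder: blend the previous pose toward
    the observed feature, perturbed slightly by the mean of the latent sample."""
    noise = sum(z) / len(z)
    return [0.5 * p + 0.5 * c + 0.01 * noise for p, c in zip(prev_pose, cloud_feat)]


if __name__ == "__main__":
    observations = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
    for pose in run_autoregressive(observations, [0.0, 0.0], toy_generator, latent_dim=8):
        print([round(v, 3) for v in pose])
```

Feeding each predicted pose back as the condition for the next frame is what makes the loop autoregressive; in a learned model this is what lets the generator produce temporally coherent motion rather than independent per-frame estimates.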