
dc.contributor.author: Luvizon, Diogo C.
dc.contributor.author: Habermann, Marc
dc.contributor.author: Golyanik, Vladislav
dc.contributor.author: Kortylewski, Adam
dc.contributor.author: Theobalt, Christian
dc.contributor.editor: Myszkowski, Karol
dc.contributor.editor: Niessner, Matthias
dc.date.accessioned: 2023-05-03T06:10:51Z
dc.date.available: 2023-05-03T06:10:51Z
dc.date.issued: 2023
dc.identifier.issn: 1467-8659
dc.identifier.uri: https://doi.org/10.1111/cgf.14768
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf14768
dc.description.abstract: In this work, we consider the problem of estimating the 3D position of multiple humans in a scene, as well as their body shape and articulation, from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users, as it enables affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, their individual shapes, as well as the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and the scene point cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks, where we consistently outperform previous methods, and we qualitatively demonstrate that our method is robust to in-the-wild conditions, including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human
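The per-frame point-cloud reconstruction mentioned in the abstract amounts to a standard pinhole back-projection of a depth map. Below is a minimal numpy sketch of that generic step, not the authors' implementation: the function name and the intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a per-frame depth map (H x W, in metres) into an
    (H*W) x 3 point cloud, assuming known pinhole intrinsics."""
    h, w = depth.shape
    # Pixel coordinate grids: u along the image width, v along the height.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a 4x4 depth map of a flat surface 2 m from the camera.
cloud = backproject_depth(np.full((4, 4), 2.0), fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

In the actual method, the depth comes from normalized disparity predictions whose scale and shift are resolved using the detected 2D body joints and joint angles; the sketch above only shows the geometric lifting once metric depth is available.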
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: CCS Concepts: Computing methodologies -> Motion capture; Scene understanding
dc.subject: Computing methodologies
dc.subject: Motion capture
dc.subject: Scene understanding
dc.title: Scene-Aware 3D Multi-Human Motion Capture from a Single Camera
dc.description.seriesinformation: Computer Graphics Forum
dc.description.sectionheaders: Capturing Human Pose and Appearance
dc.description.volume: 42
dc.description.number: 2
dc.identifier.doi: 10.1111/cgf.14768
dc.identifier.pages: 371-383
dc.identifier.pages: 13 pages


