3D hypothesis clustering for cross-view matching in multi-person motion capture

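Abstract: We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine cross-view correspondences for the 2D joints in the presence of noise. We propose a 3D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2D space into a clustering problem in a 3D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2D joints for the same person across different views, from which the 3D joints can be effectively inferred. We then assemble the inferred 3D joints to form full-body skeletons for all persons in a bottom-up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion, closely interacting people, and few cameras. We have evaluated our method on many datasets, and our results show that it has significantly lower estimation errors than many state-of-the-art methods.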
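
The clustering idea named in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: it uses multiview geometry only (no photometric appearance or bone-length evidence), it assumes each 2D joint detection has already been back-projected to a viewing ray by a calibrated camera, and every name and threshold below (`ray_midpoint`, `make_hypotheses`, `cluster_hypotheses`, `max_gap`, `radius`) is hypothetical. Pairs of rays from different views that nearly intersect produce 3D hypotheses, and hypotheses that lie close together are merged into clusters.

```python
import itertools
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, s):
    return tuple(o + s * d for o, d in zip(origin, direction))


def ray_midpoint(o1, d1, o2, d2):
    """Closest approach of two lines: returns (midpoint, gap) or None if parallel."""
    w = _sub(o1, o2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if denom < 1e-12:
        return None
    p1 = _along(o1, d1, (b * e - c * d) / denom)
    p2 = _along(o2, d2, (a * e - b * d) / denom)
    mid = tuple((x + y) / 2 for x, y in zip(p1, p2))
    return mid, math.dist(p1, p2)


def make_hypotheses(detections, max_gap=0.05):
    """detections: list of (view_id, ray_origin, ray_direction).
    Returns a list of (point3d, (i, j)) for cross-view pairs whose rays nearly meet."""
    hyps = []
    for i, j in itertools.combinations(range(len(detections)), 2):
        vi, oi, di = detections[i]
        vj, oj, dj = detections[j]
        if vi == vj:
            continue  # a person is seen at most once per view
        result = ray_midpoint(oi, di, oj, dj)
        if result is not None and result[1] <= max_gap:
            hyps.append((result[0], (i, j)))
    return hyps


def cluster_hypotheses(hyps, radius=0.1):
    """Single-linkage clustering via union-find; returns (centre, matched_indices) pairs."""
    parent = list(range(len(hyps)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in itertools.combinations(range(len(hyps)), 2):
        if math.dist(hyps[a][0], hyps[b][0]) <= radius:
            parent[find(a)] = find(b)

    groups = {}
    for idx in range(len(hyps)):
        groups.setdefault(find(idx), []).append(idx)

    clusters = []
    for members in groups.values():
        pts = [hyps[m][0] for m in members]
        centre = tuple(sum(axis) / len(pts) for axis in zip(*pts))
        matched = sorted({d for m in members for d in hyps[m][1]})
        clusters.append((centre, matched))
    return clusters


def build_demo_detections():
    """Synthetic, noise-free scene: three cameras observing the same joint of two people."""
    cameras = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    joints = [(0.0, 0.0, 5.0), (1.0, 1.0, 4.0)]
    return [(v, cam, _sub(joint, cam)) for v, cam in enumerate(cameras) for joint in joints]


if __name__ == "__main__":
    for centre, matched in cluster_hypotheses(make_hypotheses(build_demo_detections())):
        print(centre, matched)
```

Each returned cluster pairs an estimated 3D joint position with the indices of the 2D detections that support it, which mirrors the abstract's statement that a cluster encodes the matched 2D joints of one person across views. With real, noisy detections the thresholds would need tuning, and the extra cues the abstract lists would be needed to reject spurious hypotheses.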

Bibliographic Details
Main Authors: Miaopeng Li, Zimeng Zhou, Xinguo Liu
Format: Article
Language: English
Published: SpringerOpen 2020-06-01
Series: Computational Visual Media
Subjects: multi-person motion capture; cross-view matching; clustering; human pose estimation
Online Access: http://link.springer.com/article/10.1007/s41095-020-0171-y
id doaj-126b0ea5f44a48198a91b0dc390aaa56
record_format Article
spelling doaj-126b0ea5f44a48198a91b0dc390aaa56
2020-11-25T03:37:41Z
eng
SpringerOpen
Computational Visual Media
2096-0433
2096-0662
2020-06-01
volume 6, issue 2, pages 147-156
10.1007/s41095-020-0171-y
3D hypothesis clustering for cross-view matching in multi-person motion capture
Miaopeng Li (State Key Lab of CAD&CG, Zhejiang University)
Zimeng Zhou (State Key Lab of CAD&CG, Zhejiang University)
Xinguo Liu (State Key Lab of CAD&CG, Zhejiang University)
http://link.springer.com/article/10.1007/s41095-020-0171-y
multi-person motion capture
cross-view matching
clustering
human pose estimation
collection DOAJ
language English
format Article
sources DOAJ
author Miaopeng Li
Zimeng Zhou
Xinguo Liu
title 3D hypothesis clustering for cross-view matching in multi-person motion capture
publisher SpringerOpen
series Computational Visual Media
issn 2096-0433
2096-0662
publishDate 2020-06-01
description Abstract We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine cross-view correspondences for the 2D joints in the presence of noise. We propose a 3D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2D space into a clustering problem in a 3D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2D joints for the same person across different views, from which the 3D joints can be effectively inferred. We then assemble the inferred 3D joints to form full-body skeletons for all persons in a bottom–up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion, closely interacting people, and few cameras. We have evaluated our method on many datasets, and our results show that it has significantly lower estimation errors than many state-of-the-art methods.
topic multi-person motion capture
cross-view matching
clustering
human pose estimation
url http://link.springer.com/article/10.1007/s41095-020-0171-y