Transferable Videorealistic Speech Animation

Chang, Yao-Jen; Ezzat, Tony

Date

2005

Abstract

Image-based videorealistic speech animation achieves significant visual realism at the cost of the collection of a large 5- to 10-minute video corpus from the specific person to be animated. This requirement hinders its use in broad applications, since a large video corpus for a specific person under a controlled recording setup may not be easily obtained. In this paper, we propose a model transfer and adaptation algorithm which allows for a novel person to be animated using only a small video corpus. The algorithm starts with a multidimensional morphable model (MMM) previously trained from a different speaker with a large corpus, and transfers it to the novel speaker with a much smaller corpus. The algorithm consists of 1) a novel matching-by-synthesis algorithm which semi-automatically selects new MMM prototype images from the new video corpus and 2) a novel gradient descent linear regression algorithm which adapts the MMM phoneme models to the data in the novel video corpus. Encouraging experimental results are presented in which a morphable model trained from a performer with a 10-minute corpus is transferred to a novel person using a 15-second movie clip of him as the adaptation video corpus.
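
For readers unfamiliar with the term, the general idea of fitting a linear regression by gradient descent can be illustrated with a minimal sketch. This is a generic textbook version on scalar data, not the adaptation algorithm of the paper, whose details the abstract does not give. The function name `fit_linear_gd`, the learning rate, the step count, and the toy data are all assumptions made for illustration.

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every sample.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed, noise-free).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear_gd(xs, ys)
```

On this noise-free toy data the fitted `w` and `b` approach the generating slope and intercept; with real, noisy data the learning rate and number of steps would need tuning.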

BibTeX

@inproceedings {10.2312:SCA:SCA05:143-152,
booktitle = {Symposium on Computer Animation},
editor = {D. Terzopoulos and V. Zordan and K. Anjyo and P. Faloutsos},
title = {{Transferable Videorealistic Speech Animation}},
author = {Chang, Yao-Jen and Ezzat, Tony},
year = {2005},
publisher = {The Eurographics Association},
ISSN = {1727-5288},
ISBN = {1-59593-198-8},
DOI = {10.2312/SCA/SCA05/143-152}
}

URI

http://dx.doi.org/10.2312/SCA/SCA05/143-152

Collections

SCA 05: Eurographics/SIGGRAPH Symposium on Computer Animation