Recognising Human-Object Interactions Using Attention-based LSTMs
Abstract
Recognising human-object interactions (HOIs) in videos is a challenging task, especially when a human can interact with multiple objects. This paper addresses HOI recognition by proposing a hierarchical framework that analyses human-object interactions in a video sequence. The framework first uses LSTMs to capture human motion and temporal object information independently. It then fuses this information through a bilinear layer to aggregate human-object features, which are fed to a global deep LSTM that learns high-level information about HOIs. The proposed approach applies an attention mechanism to the LSTMs so that they focus on the most important parts of the human and object temporal information.
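The abstract describes the architecture only at a high level. The NumPy sketch below illustrates one plausible reading of the pipeline (two independent LSTM streams, temporal attention on each, per-frame bilinear fusion, a stacked global LSTM, and a linear classifier). It is not the authors' implementation: the layer sizes, random untrained weights, the softmax-over-time attention form, fusing frame by frame, the two-layer global LSTM, and every function and parameter name (`lstm_forward`, `temporal_attention`, `bilinear`, `hoi_forward`, `params`) are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, in_dim, hid_dim):
    # Hypothetical small random initialisation; gate rows are stacked as
    # [input, forget, output, candidate].
    return {
        "W": rng.normal(0.0, 0.1, (4 * hid_dim, in_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hid_dim, hid_dim)),
        "b": np.zeros(4 * hid_dim),
    }


def lstm_forward(x, p):
    """Standard single-layer LSTM over x of shape (T, in_dim); returns all hidden states (T, H)."""
    H = p["U"].shape[1]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in x:
        g = p["W"] @ x_t + p["U"] @ h + p["b"]
        i, f, o = sigmoid(g[:H]), sigmoid(g[H:2 * H]), sigmoid(g[2 * H:3 * H])
        c = f * c + i * np.tanh(g[3 * H:])
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def temporal_attention(hs, w):
    """Assumed soft attention over time: score each step, softmax, reweight the states."""
    scores = hs @ w                        # one score per time step, shape (T,)
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()                   # weights sum to 1 over time
    return hs * alpha[:, None], alpha


def bilinear(h, o, W, b):
    """Bilinear fusion: z[k] = h^T W[k] o + b[k] for each output channel k."""
    return np.einsum("i,kij,j->k", h, W, o) + b


def hoi_forward(human_seq, object_seq, params):
    # 1. Independent LSTM streams for human motion and object features.
    hh = lstm_forward(human_seq, params["human_lstm"])
    ho = lstm_forward(object_seq, params["object_lstm"])
    # 2. Attention over each stream's time steps.
    hh, a_h = temporal_attention(hh, params["w_att_h"])
    ho, a_o = temporal_attention(ho, params["w_att_o"])
    # 3. Per-frame bilinear fusion into human-object features.
    fused = np.stack([bilinear(h, o, params["W_bil"], params["b_bil"])
                      for h, o in zip(hh, ho)])
    # 4. Stacked ("deep") global LSTM; classify from its last hidden state.
    g = fused
    for layer in params["global_lstm"]:
        g = lstm_forward(g, layer)
    logits = params["W_cls"] @ g[-1] + params["b_cls"]
    return logits, a_h, a_o


# Usage with toy sizes: frames, human/object input dims, hidden, fused, classes.
rng = np.random.default_rng(0)
T, D_H, D_O, H, K, C = 8, 6, 4, 16, 12, 5
params = {
    "human_lstm": init_lstm(rng, D_H, H),
    "object_lstm": init_lstm(rng, D_O, H),
    "w_att_h": rng.normal(0.0, 0.1, H),
    "w_att_o": rng.normal(0.0, 0.1, H),
    "W_bil": rng.normal(0.0, 0.1, (K, H, H)),
    "b_bil": np.zeros(K),
    "global_lstm": [init_lstm(rng, K, H), init_lstm(rng, H, H)],
    "W_cls": rng.normal(0.0, 0.1, (C, H)),
    "b_cls": np.zeros(C),
}
logits, a_h, a_o = hoi_forward(rng.normal(size=(T, D_H)),
                               rng.normal(size=(T, D_O)), params)
print(logits.shape)  # (5,)
```

Reweighting each time step by its attention weight, rather than summing the steps into a single vector, is an assumed design choice that keeps a per-frame sequence available for the bilinear fusion and the global LSTM. In practice such a model would be trained end to end in a deep-learning framework rather than run with random weights.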
BibTeX
@inproceedings {10.2312:cgvc.20191269,
booktitle = {Computer Graphics and Visual Computing (CGVC)},
editor = {Vidal, Franck P. and Tam, Gary K. L. and Roberts, Jonathan C.},
title = {{Recognising Human-Object Interactions Using Attention-based LSTMs}},
author = {Almushyti, Muna and Li, Frederick W. B.},
year = {2019},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-096-3},
DOI = {10.2312/cgvc.20191269}
}