Semantic Wordification of Document Collections
Date
2012
Author
Paulovich, Fernando V.
Toledo, Franklina M. B.
Telles, Guilherme P.
Minghim, Rosane
Nonato, Luis Gustavo
Abstract
Word clouds have become one of the most widely accepted visual resources for document analysis and visualization, motivating the development of several methods for building layouts of keywords extracted from textual data. Existing methods are effective at conveying content, but they cannot preserve semantic relationships among keywords while still linking the word cloud to the underlying document groups that generated it. Such a representation is highly desirable for exploratory analysis of document collections. In this paper we present a novel approach to building document clouds, named ProjCloud, that aims to provide both semantic layouts and links to document sets. ProjCloud generates a semantically consistent layout from a set of documents. Through a multidimensional projection, it is possible to visualize the neighborhood relationships between highly related documents and their corresponding word clouds simultaneously. Additionally, we propose a new algorithm for building word clouds inside polygons, which employs spectral sorting to maintain the semantic relationships among words. The effectiveness and flexibility of our methodology are confirmed by comparisons with existing methods. The technique automatically constructs projection-based layouts that the user may choose to examine as point clouds or as the corresponding word clouds, allowing a high degree of control over the exploratory process.
BibTeX
@article {10.1111:j.1467-8659.2012.03107.x,
journal = {Computer Graphics Forum},
title = {{Semantic Wordification of Document Collections}},
author = {Paulovich, Fernando V. and Toledo, Franklina M. B. and Telles, Guilherme P. and Minghim, Rosane and Nonato, Luis Gustavo},
year = {2012},
publisher = {The Eurographics Association and Blackwell Publishing Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/j.1467-8659.2012.03107.x}
}