Ferret: Reviewing Tabular Datasets for Manipulation
Date
2023
Author
Lange, Devin
Sahai, Shaurya
Phillips, Jeff M.
Lex, Alexander
Abstract
How do we ensure the veracity of science? The act of manipulating or fabricating scientific data has led to many high-profile fraud cases and retractions. Detecting manipulated data, however, is a challenging and time-consuming endeavor. Automated detection methods are limited due to the diversity of data types and manipulation techniques. Furthermore, patterns automatically flagged as suspicious can have reasonable explanations. Instead, we propose a nuanced approach where experts analyze tabular datasets, e.g., as part of the peer-review process, using a guided, interactive visualization approach. In this paper, we present an analysis of how manipulated datasets are created and the artifacts these techniques generate. Based on these findings, we propose a suite of visualization methods to surface potential irregularities. We have implemented these methods in Ferret, a visualization tool for data forensics work. Ferret makes potential data issues salient and provides guidance on spotting signs of tampering and differentiating them from truthful data.
BibTeX
@article {10.1111:cgf.14822,
journal = {Computer Graphics Forum},
title = {{Ferret: Reviewing Tabular Datasets for Manipulation}},
author = {Lange, Devin and Sahai, Shaurya and Phillips, Jeff M. and Lex, Alexander},
year = {2023},
publisher = {The Eurographics Association and John Wiley \& Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.14822}
}