Auto-Tuning Complex Array Layouts for GPUs
Abstract
The continuing evolution of Graphics Processing Units (GPUs) has brought rapid performance increases over the years. But with each new hardware generation, the constraints for programming them efficiently have changed. Programs have to be tuned to one specific hardware generation to unleash its full potential. This is time-consuming and costly, as vendors tend to release a new generation every 18 months. It is therefore important to auto-tune GPU code to achieve GPU-specific improvements, using either static or empirical profiling to adjust parameters or to change the kernel implementation. We introduce a new approach to automatically improve memory access on GPUs. Our system generates an application-specific library that abstracts the memory access for complex arrays on the host and GPU side. This allows the code to be optimized by exchanging the memory layout without recompiling the application, as all necessary layouts are pre-compiled into the library. Our implementation is able to speed up real-world applications by up to an order of magnitude and even outperforms hand-tuned implementations.
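To make the idea of exchanging a memory layout behind a fixed access interface concrete, here is a minimal host-side C++ sketch. It is not the authors' library or its API: the names `Layout`, `flat_index`, and `RecordArray` are hypothetical, and it assumes only two layouts, array-of-structures (AoS) and structure-of-arrays (SoA), for records made of a fixed number of `float` fields. The layout is chosen at run time, so the calling code does not need to be recompiled to switch it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layouts for an array of records with `fields` floats each.
enum class Layout { AoS, SoA };

// Maps (element i, field f) to a flat offset for the chosen layout.
inline std::size_t flat_index(Layout layout, std::size_t count,
                              std::size_t fields, std::size_t i,
                              std::size_t f) {
    return layout == Layout::AoS ? i * fields + f   // fields of one record adjacent
                                 : f * count + i;   // all values of one field adjacent
}

// Hypothetical accessor: callers use at(i, f) and never see the layout.
class RecordArray {
public:
    RecordArray(Layout layout, std::size_t count, std::size_t fields)
        : layout_(layout), count_(count), fields_(fields),
          data_(count * fields, 0.0f) {}

    float& at(std::size_t i, std::size_t f) {
        return data_[flat_index(layout_, count_, fields_, i, f)];
    }

    // Exposes the underlying flat storage, e.g. for copying to a device.
    const std::vector<float>& raw() const { return data_; }

private:
    Layout layout_;
    std::size_t count_;
    std::size_t fields_;
    std::vector<float> data_;
};
```

With this sketch, `RecordArray a(Layout::SoA, 4, 3); a.at(2, 1) = 5.0f;` writes to a different flat offset than the same call on an AoS array, while the calling code stays identical. On a GPU, SoA-style layouts often let neighbouring threads read neighbouring addresses, but which layout is faster depends on the kernel and the hardware, which is the kind of decision an auto-tuner would make.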
BibTeX
@inproceedings {10.2312:pgv.20141085,
booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
editor = {Margarita Amor and Markus Hadwiger},
title = {{Auto-Tuning Complex Array Layouts for GPUs}},
author = {Weber, Nicolas and Goesele, Michael},
year = {2014},
publisher = {The Eurographics Association},
ISSN = {1727-348X},
ISBN = {978-3-905674-59-0},
DOI = {10.2312/pgv.20141085}
}