dc.contributor.author | Ha, Linh | en_US |
dc.contributor.author | Krueger, Jens | en_US |
dc.contributor.author | Silva, Claudio T. | en_US |
dc.date.accessioned | 2015-02-23T09:30:13Z | |
dc.date.available | 2015-02-23T09:30:13Z | |
dc.date.issued | 2009 | en_US |
dc.identifier.issn | 1467-8659 | en_US |
dc.identifier.uri | http://dx.doi.org/10.1111/j.1467-8659.2009.01542.x | en_US |
dc.description.abstract | Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques and developing new sorting approaches are crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known Graphics Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution. | en_US |
dc.publisher | The Eurographics Association and Blackwell Publishing Ltd | en_US |
dc.title | Fast Four-Way Parallel Radix Sorting on GPUs | en_US |
dc.description.seriesinformation | Computer Graphics Forum | en_US |
dc.description.volume | 28 | en_US |
dc.description.number | 8 | en_US |
dc.identifier.doi | 10.1111/j.1467-8659.2009.01542.x | en_US |
dc.identifier.pages | 2368-2378 | en_US |