Show simple item record

dc.contributor.author: Gross, Julian [en_US]
dc.contributor.author: Köster, Marcel [en_US]
dc.contributor.author: Krüger, Antonio [en_US]
dc.contributor.editor: Ritsos, Panagiotis D. and Xu, Kai [en_US]
dc.date.accessioned: 2020-09-10T06:27:50Z
dc.date.available: 2020-09-10T06:27:50Z
dc.date.issued: 2020
dc.identifier.isbn: 978-3-03868-122-9
dc.identifier.uri: https://doi.org/10.2312/cgvc.20201151
dc.identifier.uri: https://diglib.eg.org:443/handle/10.2312/cgvc20201151
dc.description.abstract: Nearest neighbor search algorithms on GPUs have been improving for years. Starting with tree-based approaches in the mid-1970s, the field has moved on to state-of-the-art methods that are hash-based or grid-based. Leveraging high-performance hardware functionality decreases the runtime of these search algorithms. Furthermore, memory consumption has been decreased significantly as well by using shared memory. In the scope of these enhancements, particles have been reordered by different constraints that simplify neighbor processing. However, inspecting the existing algorithms reveals underused capabilities caused by algorithm design. Exploiting these capabilities in a smart way can increase occupancy and efficiency on GPUs. In this paper, we present a neighbor processing approach that is based on dynamic load balancing. We rely on a lightweight workload-analysis phase that is applied during neighbor processing to distribute work throughout all warps in a thread group on the fly. In different domains, the neighbor function is often symmetric and, thus, commutative in its arguments. In contrast to prior work, we use this domain knowledge to reduce the number of memory accesses considerably. Measurements of the newly introduced features on our evaluation scenarios show a comparable runtime performance to state-of-the-art methods. Increasing the overall workload by processing million-particle domains leads to significant improvements in terms of runtime. At the same time, we minimize global memory consumption to enable more particles to be processed compared to current approaches. [en_US]
dc.publisher: The Eurographics Association [en_US]
dc.subject: Computing methodologies
dc.subject: Shared memory algorithms
dc.subject: Massively parallel algorithms
dc.subject: Graphics processors
dc.title: CLAWS: Computational Load Balancing for Accelerated Neighbor Processing on GPUs using Warp Scheduling [en_US]
dc.description.seriesinformation: Computer Graphics and Visual Computing (CGVC)
dc.description.sectionheaders: Graphics
dc.identifier.doi: 10.2312/cgvc.20201151
dc.identifier.pages: 53-61
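
The abstract notes that a symmetric neighbor function lets an implementation cut down on memory accesses. The sketch below illustrates only that general idea on the CPU, in Python, and is not the CLAWS implementation: the names `kernel`, `accumulate_naive`, and `accumulate_symmetric`, the 1D positions, and the weighting formula are all assumptions made for illustration. It compares evaluating every ordered pair against evaluating each unordered pair once and writing the result to both particles.

```python
from itertools import combinations


def kernel(a, b):
    """Hypothetical symmetric neighbor function: kernel(a, b) == kernel(b, a)."""
    return 1.0 / (1.0 + abs(a - b))


def accumulate_naive(positions, radius):
    """Evaluate the kernel once per ordered pair (i, j) within the radius."""
    sums = [0.0] * len(positions)
    evaluations = 0
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and abs(p - q) <= radius:
                sums[i] += kernel(p, q)
                evaluations += 1
    return sums, evaluations


def accumulate_symmetric(positions, radius):
    """Evaluate the kernel once per unordered pair and credit both particles."""
    sums = [0.0] * len(positions)
    evaluations = 0
    for i, j in combinations(range(len(positions)), 2):
        p, q = positions[i], positions[j]
        if abs(p - q) <= radius:
            w = kernel(p, q)  # one evaluation serves both i and j
            sums[i] += w
            sums[j] += w
            evaluations += 1
    return sums, evaluations
```

Calling both functions on the same input, for example `accumulate_naive([0.0, 0.5, 1.0, 3.0], 1.0)` and `accumulate_symmetric([0.0, 0.5, 1.0, 3.0], 1.0)`, yields the same per-particle sums while the symmetric variant performs half as many kernel evaluations. On a GPU the write to the second particle would additionally need an atomic or a reduction step, which this sequential sketch leaves out.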

