A Performance Model for Designing Provably Efficient GPU Algorithms
October 2, 2017, 11AM
Over the past two decades, advances in Graphics Processing Units (GPUs) have made them increasingly effective at solving general computationally intensive tasks. However, many existing algorithms are not well-suited to the unique architectural features of the GPU. While a great deal of work has been focused on using GPUs to efficiently solve specific problems, there remains no de-facto GPU performance model for developing or analyzing algorithms. As a result, many state-of-the-art GPU algorithms rely on heuristics and optimizations to maximize performance.
In this work, we present an in-depth study of modern GPU architectures to identify key features that drive performance. We use the results of this study to develop a model that allows us to analytically evaluate the performance of existing algorithms on the GPU. To verify the accuracy of the model, we consider the fundamental problems of searching and sorting. Using our model, we show that several state-of-the-art algorithms exhibit asymptotically sub-optimal performance on the GPU. Empirical results verify this analysis and indicate that our model accurately identifies the performance bottlenecks of these algorithms. We develop novel techniques that we use to design searching and sorting algorithms that exhibit optimal memory access patterns and maximize overall performance. We empirically measure the performance of our GPU-efficient algorithms on several GPU platforms and demonstrate that they significantly out-perform current state-of-the-art implementations.