Daniel Evangelakos will guide us through High-Performance and Scalable Radix Sorting, by Duane Merrill and Andrew Grimshaw.
Here, high performance is achieved by performing the sort on a GPU. You need not have read this paper to attend; all are welcome!
Sorting is arguably the most common task performed by computing devices, so it has long been seen as a natural and important problem of computer science study. Radix sorting is an important historical (and often overlooked) approach to ordering data. Suppose you have a collection C of length L strings of letters drawn from a known alphabet. (The number of symbols in this alphabet is its “radix”.) If we have one bucket for each letter in the alphabet, it’s pretty clear that tossing each string in the bucket labeled with the first letter of the string is a quick approximation to sorting C.
The important observation in radix sort is that we can sort the data in L passes, with each pass considering letters in each position *starting at the end*:
i = L
while i > 0:
“toss” all strings from C based on column i into appropriate buckets
re-collect strings from buckets in bucket order into C
decrement i
result: C is sorted
It’s interesting to note that (1) the algorithm sorts without using comparison, and (2) the running time of the algorithm does not vary and is proportional to the number of strings times L. This approach may perform better than many of your favorite sorts.
This sort was the basis for the design of sorting machines popular long before computers. It’s also related to approaches used in “sorting networks” that route data between parallel processors and
memories.
Grimshaw has made many important contributions to parallel and grid (structured distributed) computing. Merrill, his student, now works at NVidia research and is a three-time Cavalier.