CUDA Sorting¶
Pyculib provides routines for sorting arrays on CUDA GPUs.
Sorting Large Arrays¶
The pyculib.sorting.RadixSort
class is recommended for
sorting large (approx. more than 1 million items) arrays of numeric types.
-
class
pyculib.sorting.
RadixSort
(maxcount, dtype, descending=False, stream=0)¶ Provides radix sort and radix select.
The algorithm implemented here is best for large arrays (
N > 1e6
) due to the latency introduced by its use of multiple kernel launches. It is recommended to usesegmented_sort
instead for batches of smaller arrays.Parameters: - maxcount (int) – Maximum number of items to sort
- dtype (numpy.dtype) – The element type to sort
- descending (bool) – Sort in descending order?
- stream – The CUDA stream to run the kernels in
-
argselect
(k, keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.select
but returns the new sorted indices.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.
-
argsort
(keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.sort
but returns the new sorted indices.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.
-
close
()¶ Explicitly release internal resources
Called automatically when the object is deleted.
-
init_arg
(size)¶ Initialize an empty CUDA ndarray of uint32 with ascending integers starting from zero
Parameters: size (int) – Number of elements for the output array Returns: An array with values [0, 1, 2, ...m size - 1 ]
-
select
(k, keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace k-select on
keys
.Memory transfer is performed automatically.
Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32
dtype is supported in this version. - begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
-
sort
(keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace sort on
keys
. Memory transfer is performed automatically.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32
dtype is supported in this version. - begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Sorting Many Small Arrays¶
Using pyculib.sorting.RadixSort
on small (approx. less than
1 million items) arrays has significant overhead due to multiple kernel
launches.
A better alternative is to use pyculib.sorting.segmented_sort()
-which launches a single kernel for sorting a batch of many small arrays.
-
pyculib.sorting.
segmented_sort
(keys, vals, segments, stream=0)¶ Performs an inplace sort on small segments (N < 1e6).
Parameters: - keys (numpy.ndarray) – Keys to sort inplace.
- vals (numpy.ndarray) – Values to be reordered inplace along the sort. Only the
uint32
dtype is supported in this implementation. - segments (numpy.ndarray) – Segment separation location. e.g.
array([3, 6, 8])
for segments ofkeys[:3]
,keys[3:6]
,keys[6:8]
,keys[8:]
. - stream – Optional. A cuda stream in which the kernels are executed.