Using Mutable Arrays for Faster Sorting
February 19, 2009
Various people have tried improving the Data.List.sort function in Haskell, see Data.List.sort, not so good? and Sorting At Speed. In this post, I will also attempt improving sort performance. I will use a simple algorithm:

Mergesort with Mutable Arrays
Below we will see why converting a linked list to an array improves sorting performance.
Linked lists versus mutable arrays
I have depicted a linked list graphically below. Data constructors are shown as boxes, pointers as arrows, and elements as ellipses. As seen we need two pointers for each element. As Haskell’s linked lists are immutable we must recreate the list if we want to change one or more elements.

A Singly Linked List
A Haskell mutable array is depicted below. We only use one pointer per element and the lookup operation and the update operation are both O(1).

An Array
Mergesort
Mergesort for immutable linked lists reallocates (recreates) a new list for each traversal. That is, it recreates the list for each time it has visited all elements once. There are log(n) traversals. All these memory allocations and latter cleanups by the garbage collector takes time. See Sorting At Speed for GHC’s implementation of mergesort.
Mergesort, using mutable arrays, do not need to recreate the list for each traversal. It uses two list operations, update and lookup, which are both O(1). Like mergesort for immutable linked lists, mergesort for mutable arrays still needs to traverse the list log(n) times.
The use of one pointer per element and no need to reallocate the list for each traversal, makes it seem likely that converting a linked list to a mutable array, sorting and converting back again, may be faster than sorting a linked list directly.
Benchmarks
Up until now it has all been theory and I will need to do measurements to back up the claim, that we can sort faster by converting to a mutable array. I do not know all the idiosyncrasies of modern hardware or which optimizations GHC may or may not do. There may be GHC optimizations, which makes immutable linked lists a lot faster than I imagine.
I have therefore implemented a mergesort using mutable arrays. I have used this program to benchmark my implementation. The program was compiled with GHC 6.10.1 using the optimization flag -O2. And I ran the benchmark program as: “./Benchmark +RTS -K32388608 -RTS >/dev/null”. I have tested both with lists of Ints and list of strings.
The benchmark program calls System.Mem.performGC before and after sorting. The last call to System.Mem.performGC is measured as part of the sorting. In this way I am internalizing the cost of garbage collection into the sort function. I use this in stead of measuring memory use directly. The the first call is done, so that we start with a clean sheet before each measurement.
| Iterations | Number of elements | Data.List.sort (ms) | Array sort (ms) | Speedup (%) |
|---|---|---|---|---|
| 25000 | 10 | 264 | 272 | -4 |
| 50000 | 10 | 684 | 572 | 16 |
| 100000 | 10 | 2396 | 2216 | 7 |
| 25000 | 20 | 1136 | 1056 | 7 |
| 10000 | 40 | 916 | 808 | 11 |
| 10000 | 50 | 1184 | 1016 | 14 |
| 10000 | 75 | 1872 | 1548 | 17 |
| 5000 | 100 | 1288 | 1048 | 18 |
| 5000 | 200 | 2912 | 2420 | 16 |
| 2000 | 400 | 2492 | 2024 | 18 |
| 2000 | 600 | 4000 | 3184 | 20 |
| 2000 | 800 | 5624 | 4276 | 23 |
| 1000 | 1000 | 3656 | 2744 | 24 |
| 20 | 10000 | 2276 | 1592 | 30 |
| 5 | 100000 | 8844 | 4260 | 51 |
| 2 | 1000000 | 62559 | 24677 | 60 |
| Iterations | Number of elements | Data.List.sort (ms) | Array sort (ms) | Speedup (%) |
|---|---|---|---|---|
| 10000 | 10 | 304 | 312 | -3 |
| 20000 | 10 | 604 | 624 | -4 |
| 40000 | 10 | 1216 | 1252 | -3 |
| 10000 | 20 | 524 | 504 | 3 |
| 5000 | 50 | 632 | 544 | 13 |
| 5000 | 100 | 1320 | 1072 | 18 |
| 2000 | 200 | 1156 | 980 | 15 |
| 1000 | 400 | 1236 | 1008 | 18 |
| 750 | 600 | 1504 | 1184 | 21 |
| 500 | 800 | 1404 | 1080 | 23 |
| 500 | 1000 | 1828 | 1404 | 23 |
| 10 | 10000 | 992 | 716 | 27 |
| 5 | 100000 | 8860 | 4296 | 51 |
As can be seen, the speedups for large lists are quite good, but not for small lists. The speedup is even slighty negative for some small lists. That the benefits are better for large lists is not surprising, when considering that the cost of converting to and from arrays is O(n), whereas the gains are O(n * log(n)).
Sorting Ints shows better speedup than sorting strings. I speculate, that this is due to strings being more expensive to compare than Ints. As both Data.List.sort and my own sorting algorithm uses the same number of comparisons, the actual compare operations are equally expensive for both algorithms. When a constant part, the compare operations, is larger, then the (percentage) speedup gets smaller.
Taking the beauty out of functional programming?
Some might argue that using mutable arrays removes the beauty from functional programming – and they would be right. I do not see a problem though, as the nasty mutable code can be hidden behind a nice functional interface. Furthermore, a sorting algorithm only needs to be implemented once, but can be used man times.
Future Improvements
Although my sorting algorithm is, in most cases, faster than the stock GHC one, there is still room for improvement.
As shown, we are converting a linked list to an array, and back to a linked list again. As we are converting to an array, we might as well convert to an array of unboxed types in stead. Using unboxed types would avoid one pointer per element. Initial experiments shows significant speedups, as compared to my current sorting algorithm. Of cause, this only works for types which can be unboxed, which fortunately includes popular types such as Int, Double and Float. Using rewrite rules we can even hide the choice of using unboxed types from the user of the sorting algorithm.
Haskell has other types of arrays, such as Parallel arrays (module GHC.PArr) which is claimed to be fast. Choosing another array implementations may improve sorting performance.
Lastly, as Don Stewart explains in his blogpost “A Journal of Haskell Programming Write Haskell as fast as C: exploiting strictness, laziness and recursion” we may use GHC core to gain better performance.
February 20, 2009 at 10:49 am
Of course, your via-arrays sort is going to be stricter than Data.List.sort, since it’ll sort the whole list no matter how much or little of it is demanded. Conversely, take 1 . Data.List.sort is O(n).
February 22, 2009 at 11:29 pm
Lilac: Good point. Had really only thought about lazyness in binary terms.
February 20, 2009 at 10:54 pm
lilac: true, but it’s not necessarily that much of a loss; both end up forcing the entire list anyway.
In theory you could use an STArray implementation of, say, heapsort, that returned elements lazily, since the top of the heap moves upwards through the array as you pull successive elements from it. O(n) to establish the heap property, then O(k*log(n)) to retrieve k elements.
February 22, 2009 at 11:38 pm
Richard Barrell: I tried implementing the STArray based heapsort. Unfortunately it was a lot slower than GHC’s Data.List.sort for list where length xs < 100, and about the same speed for larger lists. Of cause it can be that I implemented it poorly. Or maybe heapsort requires more comparison operations than mergesort. I will need to investigate further…
That said, heapsort also has the problem of being unstable. I do not how much of a problem this is. To be earnest, I cannot remember ever requiring a sorting algorithm to be stable.