Using Mutable Arrays for Faster Sorting

February 19, 2009

Various people have tried to improve the Data.List.sort function in Haskell; see Data.List.sort, not so good? and Sorting At Speed. In this post, I will also attempt to improve sort performance. I will use a simple algorithm:

[Figure: Mergesort with Mutable Arrays]

Below we will see why converting a linked list to an array can improve sorting performance.

Linked lists versus mutable arrays

I have depicted a linked list graphically below. Data constructors are shown as boxes, pointers as arrows, and elements as ellipses. As seen, we need two pointers for each element. As Haskell’s linked lists are immutable, we must recreate the list if we want to change one or more elements.

[Figure: A Singly Linked List]

A Haskell mutable array is depicted below. We only need one pointer per element, and the lookup operation and the update operation are both O(1).

[Figure: An Array]
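As a small, self-contained illustration (my own example, not code from the post's benchmark), this is what using a mutable array in the ST monad looks like; readArray and writeArray are the O(1) lookup and update operations mentioned above:

    import Control.Monad.ST (ST, runST)
    import Data.Array.ST (STArray, newListArray, readArray, writeArray)

    -- Illustrative sketch: copy a (non-empty) list into a mutable array,
    -- overwrite the first element, and read it back.
    -- Both readArray and writeArray are O(1).
    firstDoubled :: [Int] -> Int
    firstDoubled xs = runST $ do
      arr <- newListArray (0, length xs - 1) xs :: ST s (STArray s Int Int)
      x   <- readArray arr 0
      writeArray arr 0 (2 * x)
      readArray arr 0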

Mergesort

Mergesort for immutable linked lists allocates (recreates) a new list for each traversal. That is, it recreates the list each time it has visited all elements once. There are log(n) traversals. All these memory allocations, and the later cleanups by the garbage collector, take time. See Sorting At Speed for GHC’s implementation of mergesort.

Mergesort using mutable arrays does not need to recreate the list for each traversal. It uses two array operations, update and lookup, which are both O(1). Like mergesort for immutable linked lists, mergesort for mutable arrays still needs to traverse the elements log(n) times.

Needing only one pointer per element, and not having to reallocate the list for each traversal, makes it seem likely that converting a linked list to a mutable array, sorting it, and converting back again may be faster than sorting the linked list directly.
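To make the idea concrete, here is a rough sketch of the approach, written by way of illustration rather than being the exact implementation benchmarked below: copy the list into an STArray, mergesort it in place using a second array as scratch space for merging, and read the result back out as a list.

    import Control.Monad (when, forM_)
    import Control.Monad.ST (ST, runST)
    import Data.Array.ST (STArray, newListArray, newArray_, readArray, writeArray, getElems)

    -- Illustrative sketch: convert the list to a mutable array,
    -- mergesort it in place, and convert back.
    sortViaArray :: Ord a => [a] -> [a]
    sortViaArray xs = runST $ do
        let n = length xs
        arr <- newListArray (0, n - 1) xs
        tmp <- newArray_ (0, n - 1)
        msort arr tmp 0 (n - 1)
        getElems arr

    -- Sort arr[lo..hi], using tmp as scratch space for merging.
    msort :: Ord a => STArray s Int a -> STArray s Int a -> Int -> Int -> ST s ()
    msort arr tmp lo hi = when (lo < hi) $ do
        let mid = (lo + hi) `div` 2
        msort arr tmp lo mid
        msort arr tmp (mid + 1) hi
        merge arr tmp lo mid hi

    -- Merge the sorted runs arr[lo..mid] and arr[mid+1..hi] back into arr.
    merge :: Ord a => STArray s Int a -> STArray s Int a -> Int -> Int -> Int -> ST s ()
    merge arr tmp lo mid hi = do
        -- copy the range into the scratch array, then merge back into arr
        forM_ [lo .. hi] $ \i -> readArray arr i >>= writeArray tmp i
        let go i j k
              | k > hi    = return ()
              | i > mid   = readArray tmp j >>= writeArray arr k >> go i (j + 1) (k + 1)
              | j > hi    = readArray tmp i >>= writeArray arr k >> go (i + 1) j (k + 1)
              | otherwise = do
                  x <- readArray tmp i
                  y <- readArray tmp j
                  if x <= y  -- "<=" keeps the sort stable
                    then writeArray arr k x >> go (i + 1) j (k + 1)
                    else writeArray arr k y >> go i (j + 1) (k + 1)
        go lo (mid + 1) lo

The conversions at either end, newListArray and getElems, are each O(n); the sort itself does the usual O(n * log(n)) work in between.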

Benchmarks

Up until now it has all been theory, and I need measurements to back up the claim that we can sort faster by converting to a mutable array. I do not know all the idiosyncrasies of modern hardware or which optimizations GHC may or may not do; there may be GHC optimizations which make immutable linked lists a lot faster than I imagine.

I have therefore implemented a mergesort using mutable arrays, and I have used this program to benchmark my implementation. The program was compiled with GHC 6.10.1 using the optimization flag -O2, and I ran the benchmark program as: “./Benchmark +RTS -K32388608 -RTS >/dev/null”. I have tested both with lists of Ints and with lists of strings.

The benchmark program calls System.Mem.performGC before and after sorting. The last call to System.Mem.performGC is measured as part of the sorting; in this way I internalize the cost of garbage collection into the sort function, instead of measuring memory use directly. The first call is done so that we start with a clean sheet before each measurement.
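In outline, a single measurement looks roughly like the sketch below. The names and details are my own simplification rather than the actual benchmark program; it times Data.List.sort here, and the array sort is timed in the same way.

    import Control.Exception (evaluate)
    import Data.List (sort)
    import System.CPUTime (getCPUTime)
    import System.Mem (performGC)

    -- Illustrative sketch: time one sort, charging the final garbage
    -- collection to the sort itself.
    timeSortMs :: [Int] -> IO Double
    timeSortMs xs = do
      _ <- evaluate (length xs)        -- building the input list is not measured
      performGC                        -- start with a clean sheet
      start <- getCPUTime
      _ <- evaluate (length (sort xs)) -- force the sort to run to completion
      performGC                        -- internalize the cost of garbage collection
      end <- getCPUTime
      return (fromIntegral (end - start) / 1e9)  -- picoseconds to milliseconds

    main :: IO ()
    main = timeSortMs [100000, 99999 .. 1] >>= print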

Int sort results

Iterations | Number of elements | Data.List.sort (ms) | Array sort (ms) | Speedup (%)
25000 | 10 | 264 | 272 | -4
50000 | 10 | 684 | 572 | 16
100000 | 10 | 2396 | 2216 | 7
25000 | 20 | 1136 | 1056 | 7
10000 | 40 | 916 | 808 | 11
10000 | 50 | 1184 | 1016 | 14
10000 | 75 | 1872 | 1548 | 17
5000 | 100 | 1288 | 1048 | 18
5000 | 200 | 2912 | 2420 | 16
2000 | 400 | 2492 | 2024 | 18
2000 | 600 | 4000 | 3184 | 20
2000 | 800 | 5624 | 4276 | 23
1000 | 1000 | 3656 | 2744 | 24
20 | 10000 | 2276 | 1592 | 30
5 | 100000 | 8844 | 4260 | 51
2 | 1000000 | 62559 | 24677 | 60
String sort results

Iterations | Number of elements | Data.List.sort (ms) | Array sort (ms) | Speedup (%)
10000 | 10 | 304 | 312 | -3
20000 | 10 | 604 | 624 | -4
40000 | 10 | 1216 | 1252 | -3
10000 | 20 | 524 | 504 | 3
5000 | 50 | 632 | 544 | 13
5000 | 100 | 1320 | 1072 | 18
2000 | 200 | 1156 | 980 | 15
1000 | 400 | 1236 | 1008 | 18
750 | 600 | 1504 | 1184 | 21
500 | 800 | 1404 | 1080 | 23
500 | 1000 | 1828 | 1404 | 23
10 | 10000 | 992 | 716 | 27
5 | 100000 | 8860 | 4296 | 51

As can be seen, the speedups for large lists are quite good, but not for small lists. The speedup is even slightly negative for some small lists. That the benefits are better for large lists is not surprising, considering that the cost of converting to and from arrays is O(n), whereas the gains are O(n * log(n)).

Sorting Ints shows better speedup than sorting strings. I speculate that this is because strings are more expensive to compare than Ints. As both Data.List.sort and my own sorting algorithm use the same number of comparisons, the compare operations cost the same for both algorithms. When this shared cost, the compare operations, makes up a larger part of the total, the (percentage) speedup gets smaller.

Taking the beauty out of functional programming?

Some might argue that using mutable arrays removes the beauty from functional programming – and they would be right. I do not see a problem though, as the nasty mutable code can be hidden behind a nice functional interface. Furthermore, a sorting algorithm only needs to be implemented once, but can be used many times.

Future Improvements

Although my sorting algorithm is, in most cases, faster than the stock GHC one, there is still room for improvement.

As shown, we are converting a linked list to an array and back to a linked list again. Since we are converting to an array anyway, we might as well convert to an array of unboxed values instead. Using unboxed values avoids one pointer per element. Initial experiments show significant speedups compared to my current sorting algorithm. Of course, this only works for types which can be unboxed, which fortunately include popular types such as Int, Double and Float. Using rewrite rules we can even hide the choice of unboxed arrays from the user of the sorting algorithm.
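As a sketch of what the unboxed variant could look like for Int (again my own illustration: the helpers are duplicated and specialized to STUArray s Int Int here, whereas a library version would generalize them over the MArray class):

    import Control.Monad (when, forM_)
    import Control.Monad.ST (ST, runST)
    import Data.Array.ST (STUArray, newListArray, newArray_, readArray, writeArray, getElems)

    -- Illustrative sketch: the same convert/sort/convert-back scheme, but the
    -- intermediate array is unboxed, so the Ints are stored directly instead
    -- of behind one pointer each.
    sortIntsUnboxed :: [Int] -> [Int]
    sortIntsUnboxed xs = runST $ do
        let n = length xs
        arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
        tmp <- newArray_ (0, n - 1)       :: ST s (STUArray s Int Int)
        msortU arr tmp 0 (n - 1)
        getElems arr

    -- In-place mergesort on arr[lo..hi], with tmp as scratch space.
    msortU :: STUArray s Int Int -> STUArray s Int Int -> Int -> Int -> ST s ()
    msortU arr tmp lo hi = when (lo < hi) $ do
        let mid = (lo + hi) `div` 2
        msortU arr tmp lo mid
        msortU arr tmp (mid + 1) hi
        -- copy the range into tmp, then merge the two runs back into arr
        forM_ [lo .. hi] $ \i -> readArray arr i >>= writeArray tmp i
        let go i j k
              | k > hi    = return ()
              | i > mid   = readArray tmp j >>= writeArray arr k >> go i (j + 1) (k + 1)
              | j > hi    = readArray tmp i >>= writeArray arr k >> go (i + 1) j (k + 1)
              | otherwise = do
                  x <- readArray tmp i
                  y <- readArray tmp j
                  if x <= y
                    then writeArray arr k x >> go (i + 1) j (k + 1)
                    else writeArray arr k y >> go i (j + 1) (k + 1)
        go lo (mid + 1) lo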

Haskell has other types of arrays, such as parallel arrays (module GHC.PArr), which are claimed to be fast. Choosing another array implementation may improve sorting performance.

Lastly, as Don Stewart explains in his blog post “Write Haskell as fast as C: exploiting strictness, laziness and recursion”, we may inspect GHC Core to gain better performance.


4 Responses to “Using Mutable Arrays for Faster Sorting”

  1. lilac Says:

    Of course, your via-arrays sort is going to be stricter than Data.List.sort, since it’ll sort the whole list no matter how much or little of it is demanded. Conversely, take 1 . Data.List.sort is O(n).

  2. Richard Barrell Says:

    lilac: true, but it’s not necessarily that much of a loss; both end up forcing the entire list anyway.
    In theory you could use an STArray implementation of, say, heapsort, that returned elements lazily, since the top of the heap moves upwards through the array as you pull successive elements from it. O(n) to establish the heap property, then O(k*log(n)) to retrieve k elements.

    • Mads Lindstrøm Says:

      Richard Barrell: I tried implementing the STArray based heapsort. Unfortunately it was a lot slower than GHC’s Data.List.sort for lists where length xs < 100, and about the same speed for larger lists. Of course it may be that I implemented it poorly. Or maybe heapsort requires more comparison operations than mergesort. I will need to investigate further…

      That said, heapsort also has the problem of being unstable. I do not know how much of a problem this is. To be honest, I cannot remember ever requiring a sorting algorithm to be stable.

