Last updated: Friday 31st October 2008, 8:09 PT by AHD
The topic of sorting occupies a prominent position in computer science. It has been estimated that anywhere from 25% to 50% of all computing time in the commercial world is spent on sorting data. Many sorting methods have been conceived. Some are more appropriate than others, depending on the nature of the data to be sorted. This document describes two sorting algorithms with different performance characteristics. Both algorithms sort a list of integers into ascending order.
The Bubble Sort
The bubble sort algorithm sorts elements by comparing neighbours and, if they are out of order, swapping them. After the first pass through the sequence, the highest-valued element is in position. After the second pass, the last two elements are in position. The sequence is sorted by completing n such passes. Because each pass i leaves one more element in its correct (sorted) position, each subsequent pass need only process n - i + 1 elements of the sequence. That is, the running time of the ith pass is n - i + 1, and the overall running time is big O of the summation:

    sum of (n - i + 1), for i = 1 to n

This can be rewritten as:

    O(n + (n-1) + (n-2) + (n-3) + ... + 2 + 1)

which is n(n+1)/2, i.e. (n² + n)/2, which is O(n²).
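The summation above can be checked numerically. This small Python snippet (an illustrative check, not part of the original notes) confirms that the per-pass costs sum to n(n+1)/2:

```python
def pass_cost_total(n):
    """Total elements processed over all n bubble-sort passes,
    where pass i processes n - i + 1 elements."""
    return sum(n - i + 1 for i in range(1, n + 1))

# The closed form n(n+1)/2 matches for any n:
for n in (1, 10, 1000):
    assert pass_cost_total(n) == n * (n + 1) // 2
```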
The basic bubble sort makes exactly size - 1 passes through the list, regardless of the original ordering of the list. Suppose the list contains 1000 values and only list[1] and list[2] are out of order:
245, 270, 250, 310, 320, 330, 340, . . .
after the first pass the contents are now
245, 250, 270, 310, 320, 330, 340, . . .
(in sorted order). The algorithm then continues to make 998 more passes through the list, needlessly. The bubble sort can be improved by observing the following: after a single pass, all list elements beyond the last one swapped must be in order and need not be considered further. With this alteration, after a single pass the data above is sorted; the second pass confirms this, and the sort is complete. In the best case, where a bubble sort is performed on a list which is already sorted, the result is size - 1 comparisons and no swaps, i.e. O(n). The worst case is when the list is in reverse order: there are O(n) comparisons on each of O(n) passes, making the overall running time O(n²).
See the accompanying program bubblesort.py.
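As the bubblesort.py file itself is not reproduced here, the following is one possible sketch of the improved bubble sort described above, using the last-swap observation to shorten each pass:

```python
def bubble_sort(items):
    """Sort a list of integers into ascending order (in place).

    Improvement from the notes: after each pass, all elements beyond
    the position of the last swap must already be in order, so the
    next pass need not consider them.
    """
    boundary = len(items) - 1            # last index that may be unsorted
    while boundary > 0:
        last_swap = 0
        for i in range(boundary):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                last_swap = i            # rightmost position swapped this pass
        boundary = last_swap             # everything past here is sorted
    return items
```

On an already-sorted list the first pass makes size - 1 comparisons, records no swaps, and the sort stops immediately, giving the O(n) best case described above.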
The Selection Sort
A selection sort makes repeated passes through the list, from one end to the other. On the first pass, the algorithm finds (i.e. selects) the smallest value and swaps it with the element at index 0. Index 0 is then sorted. Next, the remainder of the list is used to select the next smallest value, and this is swapped with index 1.
1. Set marker to 0
2. While marker < list size:
3. Find the smallest element in the range marker to list size - 1
4. Swap that element with the element at position marker
5. Increase marker by 1
For example, if the original list looks like this:
8 6 10 2 16 4
after the first pass through the list, the list looks like this:
2 6 10 8 16 4
This process repeats: on each pass, the smallest remaining value is selected and swapped into the next position. The full trace:
index: 0 1 2 3 4 5
original list: 8 6 10 2 16 4
after pass 1: 2 6 10 8 16 4
after pass 2: 2 4 10 8 16 6
after pass 3: 2 4 6 8 16 10
after pass 4: 2 4 6 8 16 10
after pass 5: 2 4 6 8 10 16
Note that the algorithm requires a nested loop, with both loops dependent on the size of the list, n. This means that the algorithm runs in O(n²) time.
Efficiency of the selection sort
There are two types of operation involved in sorting a list of data: comparisons and assignments (i.e. exchanges or swaps). Since the two operations require different resources in terms of computing, they are evaluated separately. If we are just comparing integers in a list of integers, the comparison is fast, but if you have to compare several fields in a complex data structure, comparisons can take considerably longer. An assignment (an exchange) requires that data be physically moved within the computer's memory. Therefore, assignments are almost always more time-consuming than comparisons. In general, algorithms that minimize the number of assignments are to be preferred.
Selection sort is unusual amongst sorting algorithms in that, for a given list length, it always performs exactly the same number of both assignments (swaps = size - 1) and comparisons (comparisons = size(size - 1)/2, approximately size²/2).
In the selection sort algorithm, the outer loop iterates exactly size - 1 times. The first time through the outer loop, the inner loop iterates size - 1 times. The second time through, the inner loop iterates size - 2 times, and so forth. Therefore the total count of inner-loop iterations is
    (size - 1) + (size - 2) + ... + 2 + 1 = (size - 1) * size / 2

which is approximately 0.5 size², i.e. about n² / 2 operations.
Sorting routines are often analysed for two separate measures:
1. The number of times two list elements are compared
2. The number of times two list elements are moved
For the selection sort, the number of comparisons is O(n2), and the number of data movement or swaps is O(n). The latter is evident in the preceding algorithm because the swapping portion is within the outer loop, but outside the inner loop.
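These counts can be made concrete by instrumenting the sort. The sketch below (an illustrative addition, not from the original notes) counts both measures and shows that they are identical for any ordering of a list of a given length:

```python
def selection_sort_counts(items):
    """Selection sort that also counts comparisons and swaps.

    For a list of length n it performs n(n-1)/2 comparisons
    and n - 1 swaps, regardless of the initial ordering.
    """
    comparisons = swaps = 0
    size = len(items)
    for marker in range(size - 1):
        smallest = marker
        for i in range(marker + 1, size):
            comparisons += 1
            if items[i] < items[smallest]:
                smallest = i
        items[marker], items[smallest] = items[smallest], items[marker]
        swaps += 1
    return comparisons, swaps

# Sorted, reversed, and unordered input of length 6 all give the
# same counts: 6*5/2 = 15 comparisons and 6 - 1 = 5 swaps.
for data in ([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], [8, 6, 10, 2, 16, 4]):
    assert selection_sort_counts(data) == (15, 5)
```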
The running time of the selection sort is largely independent of the data: the number of comparisons and swaps remains the same regardless of the original ordering of the list contents. The only part of the algorithm that can vary is the number of times the then-clause of the inner loop (recording a new smallest element) is executed. For selection sort, then, the best-case and worst-case numbers of comparisons are both O(n²). This means that even if the original list is nearly sorted, the algorithm performs the same amount of work as if it were completely random.