Anne
Dawson: CSCI201A_LAB9_FA04.htm
Last
updated: Tuesday 30th November 2004, 11:24 PT
This
document is subject to change without notice.
Please
report any errors or omissions in this document:
adawson@coquitlamcollege.com
Special instructions: For this assignment you may work in
teams of 2, or alone.
CSCI201A –
Data and Program Organization (Data Structures)
Fall 2004
Lab Assignment 9 (over two lab
sessions)
Purpose: To
investigate the performance of Quicksort under different conditions.
Reference
textbooks: Reference text books will be available in class for this lab.
Specification :
The purpose of this lab is to investigate the efficiency of the
Quicksort algorithm under different conditions. Quicksort is the fastest of all sort algorithms. It was devised by the British computer
scientist C.A.R. Hoare. The Quicksort algorithm works by choosing one value
from the array (or vector) to act as a pivot. The array is then split in two parts (partitioned) by
placing values lower than the pivot value on the left, and values >= the
pivot on the right. This
partitioning is done recursively on each side until the array is sorted. After
each partitioning step, the array gets closer to being sorted. The performance of the Quicksort
algorithm depends to a large extent on the value of the pivot chosen to
partition (divide) the array (or vector) of values to be sorted. Different textbooks suggest different
methods of selecting the pivot.
Some suggest taking the first element in the array and using this as the
pivot, others suggest using the last element, others suggest using the middle
element, but a proven popular method is the "Median-of-three Partitioning
Method" described in the Weiss textbook (see references below). In fact, Weiss
warns against using the first element as the pivot and explains in detail why.
The performance of the Quicksort algorithm also depends on the array to be
sorted. For instance, Quicksort
works best when the array is already nearly sorted, and the pivot is exactly in
the middle, and each side of the pivot has equal numbers of elements. In this case, we have a binary split,
and a diagram of the partitions looks like a binary tree (see quicksort.htm and Data Structures and abstraction Using
C++ by Mark R. Headington and David D. Riley, page 580). With an array of size
16, we have a single partition of 16 elements at the root of the tree, two
partitions of 8 elements each at the next level, 4 partitions of 4 elements
each at the next level and so on.
In the best case, (when we have a true binary split), the performance of
the algorithm depends on the number of elements to be sorted multiplied by the
number of levels of partitions (log N), this means we have a performance of O(N
log N). If the pivot is not chosen
well (e.g. if the smallest value in the array is chosen as the pivot), then the
resulting split is not binary, then the performance degenerates to O(N2). However, the nature of the Quicksort
algorithm means that even in the average case, the performance is O(N log N).
In this lab you will investigate the choice of the pivot (first
element, last element, middle element and median of three elements) on the
performance of a Quicksort on a large array. You will also look at the affect of the array itself. For example, you should investigate the
performance of the Quicksort on a large array: 1) whose values are unique and
randomly distributed, 2) whose values are unique but nearly sorted, 3) on an
array whose values are to a large extent duplicated (i.e. 50% or more of the
values in the array are the duplicates).
You will need to use a timing routine and count the number of swaps and
comparisons. You should prepare a
Powerpoint presentation to present your results in a textual and graphical
fashion.
Data Structures and Problem Solving Using Java by
Mark Allen Weiss, ISBN: 0-201-74835-5
Addison Wesley (2nd Edition)
Data Structures and Problem Solving Using C++ by
Mark Allen Weiss, ISBN:0-201-61250-X
Addison Wesley (2nd Edition)
Data Structures and
abstraction Using C++ by Mark R. Headington and David D. Riley
Data Structures and Other Objects Using C++ by Michael Main and
Walter Savitch
Addison Wesley, ISBN 0-201-70297-5 (Second Edition)
Assignment
points will be based on the following marking scheme:
|
Category |
Points |
Description |
|
Comments |
10 |
Source
code should be commented as recommended. |
|
Coding
Style |
10 |
The
source code should be indented properly, use meaningful variable names, use good
programming practices, etc. See the book, course web site and lectures for
examples. Program prompts and/or outputs should be user-friendly. |
|
Pivot - first element |
10 |
Implemented
as per specification |
|
Pivot - last element |
10 |
Implemented
as per specification |
|
Pivot
- middle element |
10 |
Implemented
as per specification |
|
Pivot
- median of three |
10 |
Implemented
as per specification |
|
Large array (unique and random) |
10 |
Implemented
as per specification |
|
Array (unique and nearly sorted) |
10 |
Implemented
as per specification |
|
Array (50% duplicated) |
10 |
Implemented
as per specification |
|
Presentation of results |
10 |
As
per specification |
Source Code Format : As stated above, the source code
should follow good programming style. In addition, each file should start with
a comment block that contains the following information:
#
File: lab9.cpp or
lab9.java
#
Purpose:
To investigate the performance of Quicksort
#
Programmer 1: [your name]
#
Programmer 2: [your partner's name]
#
Course: CSCI201A
#
Date: