Anne Dawson: CSCI201A_LAB9_FA04.htm   

 

Last updated: Tuesday 30th November 2004, 11:24 PT

 

This document is subject to change without notice.

 

Please report any errors or omissions in this document:

adawson@coquitlamcollege.com

 

Special instructions:  For this assignment you may work in teams of 2, or alone.

 

CSCI201A – Data and Program Organization (Data Structures)

Fall 2004

Lab Assignment 9 (over two lab sessions)

Purpose: To investigate the performance of Quicksort under different conditions.

Reference textbooks: Reference text books will be available in class for this lab.

Specification :

The purpose of this lab is to investigate the efficiency of the Quicksort algorithm under different conditions.  Quicksort is the fastest of all sort algorithms.  It was devised by the British computer scientist C.A.R. Hoare. The Quicksort algorithm works by choosing one value from the array (or vector) to act as a pivot.  The array is then split in two parts (partitioned) by placing values lower than the pivot value on the left, and values >= the pivot on the right.  This partitioning is done recursively on each side until the array is sorted. After each partitioning step, the array gets closer to being sorted.  The performance of the Quicksort algorithm depends to a large extent on the value of the pivot chosen to partition (divide) the array (or vector) of values to be sorted.  Different textbooks suggest different methods of selecting the pivot.  Some suggest taking the first element in the array and using this as the pivot, others suggest using the last element, others suggest using the middle element, but a proven popular method is the "Median-of-three Partitioning Method" described in the Weiss textbook (see references below). In fact, Weiss warns against using the first element as the pivot and explains in detail why. The performance of the Quicksort algorithm also depends on the array to be sorted.  For instance, Quicksort works best when the array is already nearly sorted, and the pivot is exactly in the middle, and each side of the pivot has equal numbers of elements.  In this case, we have a binary split, and a diagram of the partitions looks like a binary tree (see quicksort.htm and  Data Structures and abstraction Using C++ by Mark R. Headington and David D. Riley, page 580). With an array of size 16, we have a single partition of 16 elements at the root of the tree, two partitions of 8 elements each at the next level, 4 partitions of 4 elements each at the next level and so on.  In the best case, (when we have a true binary split), the performance of the algorithm depends on the number of elements to be sorted multiplied by the number of levels of partitions (log N), this means we have a performance of O(N log N).  If the pivot is not chosen well (e.g. if the smallest value in the array is chosen as the pivot), then the resulting split is not binary, then the performance degenerates to O(N2).  However, the nature of the Quicksort algorithm means that even in the average case, the performance is O(N log N).

 

In this lab you will investigate the choice of the pivot (first element, last element, middle element and median of three elements) on the performance of a Quicksort on a large array.  You will also look at the affect of the array itself.  For example, you should investigate the performance of the Quicksort on a large array: 1) whose values are unique and randomly distributed, 2) whose values are unique but nearly sorted, 3) on an array whose values are to a large extent duplicated (i.e. 50% or more of the values in the array are the duplicates).  You will need to use a timing routine and count the number of swaps and comparisons.  You should prepare a Powerpoint presentation to present your results in a textual and graphical fashion. 

 

Data Structures and Problem Solving Using Java by Mark Allen Weiss, ISBN: 0-201-74835-5

Addison Wesley (2nd Edition)

 

Data Structures and Problem Solving Using C++ by Mark Allen Weiss, ISBN:0-201-61250-X

Addison Wesley (2nd Edition)

 

 Data Structures and abstraction Using C++ by Mark R. Headington and David D. Riley

 

Data Structures and Other Objects Using C++ by Michael Main and Walter Savitch

Addison Wesley,  ISBN 0-201-70297-5 (Second Edition)

 

 

Assignment points will be based on the following marking scheme:

Category

Points

Description

Comments

10

Source code should be commented as recommended.

Coding Style

10

The source code should be indented properly, use meaningful variable names, use good programming practices, etc. See the book, course web site and lectures for examples. Program prompts and/or outputs should be user-friendly.

Pivot - first element                       

10

Implemented as per specification

Pivot - last element

10

Implemented as per specification

Pivot - middle element

10

Implemented as per specification

Pivot - median of three

10

Implemented as per specification

Large array (unique and random)

10

Implemented as per specification

Array (unique and nearly sorted)

10

Implemented as per specification

Array (50% duplicated)

10

Implemented as per specification

Presentation of results

10

As per specification

Source Code Format : As stated above, the source code should follow good programming style. In addition, each file should start with a comment block that contains the following information:

#  File:         lab9.cpp or lab9.java

#  Purpose:      To investigate the performance of Quicksort

#  Programmer 1: [your name] 

#  Programmer 2: [your partner's name]   

#  Course:       CSCI201A

#  Date: