Anne Dawson: CSCI201A_FA_FA04.htm   

 

Last updated: Friday 29th October 2004, 12:55 PT

 

This document is subject to change without notice.

 

Please report any errors or omissions in this document:

adawson@coquitlamcollege.com

 

Special instructions:

For this assignment you may work in teams of 2, or alone.

 

 

CSCI201A

Data and Program Organization (Data Structures)

Fall 2004

Final Assignment

Implementing a Spelling Dictionary using a Hash Table.

 

One natural use of a hash table is to hold the words in a spelling dictionary. This makes looking up the words very efficient.  In this assignment, you will implement and use such a dictionary.

 

For the first part of the assignment, you should write a class that implements an Open Hash Table of strings. That is, the table is stored as an array of simple linked lists of strings (or, more precisely, an array of pointers to such lists). The Hash class must include:

 

A function add(s) that adds the string s to the table, if it is not already there.

 

A function remove(s) that removes the string s from the table, if it's there.

 

A function contains(s) that returns a boolean value that checks whether the string s is in the table.

 

A function size() that returns the number of strings in the table.

 

A hash function to compute the hash code of a string. This function can be private.

 

A constructor that accepts the size of the table as a parameter.

 

The second part of the assignment is to write a main program that uses your hash table class. The main program will behave like the standard Linux utility program ispell, when that program is used interactively to check individual words. When ispell runs, you are prompted to enter words. When you type a word, there are three possible responses: An output of "ok" means that the word is in the dictionary. An output of "not found" means that the word is not in the dictionary and neither are there any near misses. Otherwise, the output is a list of near misses, that is, words that are in the dictionary and are similar to what you typed. The ispell program recognises several types of near-misses: single-letter omissions, single-letter additions, transpositions of two neighbouring letters, substitution of one letter for another, and two words run together. Your program should do the same.

 

 

The file http://www.coquitlamcollege.com/adawson/words.txt contains 234,936 words, with one word per line. (This is not the same dictionary used by ispell.) Your program will use this file for its dictionary. It should create a hash table of an appropriate size for the number of words. It should read the words from the file and add them all to the hash table. Then it should prompt the user to enter words. For each word, it should check whether the word is in the dictionary. If so, it should say "ok". If not, it should look for all possible near misses. If the program finds any near misses in the dictionary, it should print them. If not, it should say "not found".

 

 

One issue that you will have to settle is what to do with upper-case letters. Note that both the words in the dictionary and the user's input can contain upper case letters. I suggest that you simply convert all letters to lower case.

 

Searching for near misses will require a significant amount of programming. You will have to generate every possible near miss and look for it in the dictionary.

 

Specifically:

 

Construct every string that can be made by deleting one letter from the word. (n possibilities, where n is the length of the word)

 

Construct every string that can be made by inserting any letter of the alphabet at any position in the string. (26*(n+1) possibilities)

 

Construct every string that can be made by swapping two neighbouring characters in the string. (n-1 possibilities)

 

Construct every string that can be made by replacing each letter in the word with some letter of the alphabet. (26*n possibilities (including the original word n times, which is probably easier than avoiding it))

 

Construct every pair of strings that can be made by inserting a space into the word. (n-1 pairs of words; you have to check separately in the dictionary for each word in the pair)

 

Ideally, you should list the near misses in alphabetical order, as ispell does.

 

If you would like to do some extra credit work on this assignment, write a program that can spellcheck files in addition to individual words.

 

 

Your main program should start with a comment block that contains the following information:

 

#  File:       fa.cpp or fa.java

#  Purpose:    Implementation of a Spelling Dictionary using a Hash Table.

#  Programmer: [your name]   

#  Partner:    [your partner's name]

#  Course:     CSCI201A

#  Date:      

 

Program points will be based on the following marking scheme:

Marking Scheme:

CSCI201A   -   Final Assignment

 

Implementing a Spelling Dictionary using a Hash Table.

Student name(s):

Category

Points

Description

Algorithms and Pseudocode for member function operations

10

An informal description of the steps that must be taken to solve the problem.

ref: http://www.coquitlamcollege.com/adawson/Pseudocode.htm

Style

10

The source code should be commented appropriately, indented properly, use meaningful variable names and identifiers, and use good programming practices. Screen prompts and results should be user-friendly.

add(s)

10

Implemented as per algorithm.

remove(s)

10

Implemented as per algorithm.

contains(s)

10

Implemented as per algorithm.

size()

10

Implemented as per algorithm.

hash function

10

The hash function should generate minimal collisions.

near miss programming

20

Implemented as per specification

Presentation

10

Marks will be awarded for the presentation of the assignment, and performance in the Q&A session which follows.

 

 

Assignment Presentations :  In the final class of the semester, each individual student or team will present their final assignment to the CSCI201A group.