Anne
Dawson: CSCI201A_FA_FA04.htm
Last
updated: Friday 29th October 2004, 12:55 PT
This
document is subject to change without notice.
Please
report any errors or omissions in this document:
adawson@coquitlamcollege.com
Special instructions:
For this assignment you may work in teams of 2, or alone.
CSCI201A
Data and Program Organization (Data Structures)
Fall 2004
Final
Assignment
Implementing a Spelling Dictionary using a Hash Table.
One natural use of a hash table is to hold the words in a
spelling dictionary. This makes looking up the words very efficient. In this assignment, you will implement
and use such a dictionary.
For the first part of the assignment, you should write a class
that implements an Open Hash Table of strings. That is, the table is stored as
an array of simple linked lists of strings (or, more precisely, an array of
pointers to such lists). The Hash class must include:
A function add(s) that adds the string s to the table, if
it is not already there.
A function remove(s) that removes the string s from the
table, if it's there.
A function contains(s) that returns a boolean value that
checks whether the string s is in the table.
A function size() that returns the number of strings in
the table.
A hash function to compute the hash code of a string. This
function can be private.
A constructor that accepts the size of the table as a parameter.
The second part of the assignment is to write a main program
that uses your hash table class. The main program will behave like the standard
Linux utility program ispell, when that program is used interactively to
check individual words. When ispell runs, you are prompted to enter
words. When you type a word, there are three possible responses: An output of
"ok" means that the word is in the dictionary. An output of "not
found" means that the word is not in the dictionary and neither are there
any near misses. Otherwise, the output is a list of near misses, that is, words
that are in the dictionary and are similar to what you typed. The ispell
program recognises several types of near-misses: single-letter omissions,
single-letter additions, transpositions of two neighbouring letters,
substitution of one letter for another, and two words run together. Your
program should do the same.
The file http://www.coquitlamcollege.com/adawson/words.txt contains
234,936 words, with one word per line. (This is not the same dictionary used by
ispell.) Your program will use this file for its dictionary. It should
create a hash table of an appropriate size for the number of words. It should
read the words from the file and add them all to the hash table. Then it should
prompt the user to enter words. For each word, it should check whether the word
is in the dictionary. If so, it should say "ok". If not, it should
look for all possible near misses. If the program finds any near misses in the
dictionary, it should print them. If not, it should say "not found".
One issue that you will have to settle is what to do with
upper-case letters. Note that both the words in the dictionary and the user's
input can contain upper case letters. I suggest that you simply convert all
letters to lower case.
Searching for near misses will require a significant amount of
programming. You will have to generate every possible near miss and look for it
in the dictionary.
Specifically:
Construct every string that can be made by deleting one letter from
the word. (n possibilities, where n is the length of the word)
Construct every string that can be made by inserting any letter
of the alphabet at any position in the string. (26*(n+1) possibilities)
Construct every string that can be made by swapping two
neighbouring characters in the string. (n-1 possibilities)
Construct every string that can be made by replacing each letter
in the word with some letter of the alphabet. (26*n possibilities (including
the original word n times, which is probably easier than avoiding it))
Construct every pair of strings that can be made by
inserting a space into the word. (n-1 pairs of words; you have to check
separately in the dictionary for each word in the pair)
Ideally, you should list the near misses in alphabetical order,
as ispell does.
If you would like to do some extra credit work on this
assignment, write a program that can spellcheck files in addition to individual
words.
Your main program should start with a comment block that
contains the following information:
#
File: fa.cpp or fa.java
#
Purpose:
Implementation of a Spelling Dictionary using a Hash Table.
#
Programmer: [your name]
#
Partner: [your
partner's name]
#
Course:
CSCI201A
#
Date:
Program
points will be based on the following marking scheme:
Marking
Scheme:
CSCI201A
- Final Assignment
Implementing a Spelling Dictionary using a Hash Table.
Student
name(s):
|
Category |
Points |
Description |
|
Algorithms and Pseudocode for member function operations |
10 |
An informal description of the
steps that must be taken to solve the problem. |
|
Style |
10 |
The source code should be commented
appropriately, indented properly, use meaningful variable names and
identifiers, and use good programming practices. Screen prompts and results
should be user-friendly. |
|
add(s) |
10 |
Implemented as per algorithm. |
|
remove(s) |
10 |
Implemented as per algorithm. |
|
contains(s) |
10 |
Implemented as per algorithm. |
|
size() |
10 |
Implemented as per algorithm. |
|
hash function |
10 |
The hash function should generate minimal
collisions. |
|
near miss programming |
20 |
Implemented as per specification |
|
Presentation |
10 |
Marks will be awarded for the presentation of
the assignment, and performance in the Q&A session which follows. |
Assignment Presentations : In the final class of the semester,
each individual student or team will present their final assignment to the
CSCI201A group.