Anne Dawson: CSCI110A_FA_SP04.htm
Last updated: Wednesday 17th March 2004, 10:02 PT
This document is subject to change without notice.
Please report any errors or omissions in this document:
adawson@coquitlamcollege.com
Special instructions: For this assignment you may work in teams of 2, or alone.
This assignment is due at the start of the last class of the semester
(Week 13 Class 2).
Final Assignment Specification
------------------------------
Due date: At the start of class (Week 13 Class 2).
The aim of this assignment is to produce a VB program (sim.vbp, sim.frm) that
compares two text files and reports on their similarity. For example, if the
two text files are identical, the program reports a similarity factor of 1.0.
The similarity factor is the product of the ratios of five counts taken for
each file (see examples below): the total number of characters (including
spaces and other non-alphabetic characters), the number of alphabetic
characters, the number of non-alphabetic characters, the number of words and
the number of lines. Words are delimited by spaces or other non-alphabetic
characters. The filenames of the files to be compared must be entered into
textboxes on the form. The program outputs the results as follows:
Total characters
Total alphabetic characters
Total non-alphabetic characters
Total words
Total lines
Similarity factor (see example below)
as well as a graphical representation showing the differences between the files.
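The counting rules above can be sketched as follows. This is an illustrative
Python sketch, not the required VB solution, and the name count_stats is
hypothetical. It assumes that line terminators are not counted as characters
and that only the letters A-Z and a-z count as alphabetic; both assumptions
agree with the figures given for test2.txt in the examples below.

```python
def count_stats(text):
    """Return (total, alphabetic, non_alphabetic, words, lines) for text.

    Assumptions (not stated outright in the specification): line terminators
    are not counted as characters, and only A-Z / a-z are alphabetic.
    """
    lines = text.splitlines()
    total = alphabetic = words = 0
    for line in lines:
        in_word = False  # a word is a maximal run of alphabetic characters
        for ch in line:
            total += 1
            if ch.isascii() and ch.isalpha():
                alphabetic += 1
                if not in_word:
                    words += 1  # first letter of a new word
                in_word = True
            else:
                in_word = False  # any non-alphabetic character ends the word
    return total, alphabetic, total - alphabetic, words, len(lines)


# The four lines of test2.txt as shown in the examples.
sample = (
    "'test2.txt\n"
    + "1234567890" * 8 + "\n"
    + "This is a the other test file.\n"
    + "note numbers are not counted as alphabetic characters\n"
)
print(count_stats(sample))  # (173, 76, 97, 17, 4)
```

Treating every non-alphabetic character as a word delimiter means that
'test2.txt contributes two words (test and txt), which is what the stated
total of 17 words requires.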
Example 1
---------
'test1.txt
12345678901234567890123456789012345678901234567890123456789012345678901234567890
This is a test file. It contains text
to be used to test the sim.vbp VB project
which looks at the similarity between two
(2) text files. The text files can
contain non-alphabetic characters like
!@#$%^& and uppercase and lower case alphabetic characters.
Case may be ignored.
Total characters: 370
Total alphabetic characters: 231
Total non-alphabetic characters: 139
Total words: 49
Total lines: 9
'test2.txt
12345678901234567890123456789012345678901234567890123456789012345678901234567890
This is a the other test file.
note numbers are not counted as alphabetic characters
Total characters: 173
Total alphabetic characters: 76
Total non-alphabetic characters: 97
Total words: 17
Total lines: 4
Similarity factor: 370/173 * 231/76 * 139/97 * 49/17 * 9/4
                 = 2.139 * 3.039 * 1.433 * 2.882 * 2.25 = 60.404
If similarity factor > 1 then similarity factor = 1/similarity factor
Similarity factor: 1/60.404 = 0.017
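The calculation above can be sketched as follows (illustrative Python, not the
required VB solution; similarity_factor is a hypothetical name). Each count of
the first file is divided by the matching count of the second, the five ratios
are multiplied, and a product greater than 1 is inverted. The sketch multiplies
the unrounded ratios, so the intermediate product is about 60.41 rather than
60.404; the final factor still rounds to 0.017.

```python
def similarity_factor(stats_a, stats_b):
    """Multiply the ratios of matching counts; invert the product if > 1.

    stats_a and stats_b are (total, alphabetic, non_alphabetic, words, lines).
    """
    factor = 1.0
    for a, b in zip(stats_a, stats_b):
        factor *= a / b  # raises ZeroDivisionError if a count in stats_b is 0
    if factor > 1:
        factor = 1 / factor
    return factor


test1 = (370, 231, 139, 49, 9)  # counts stated for test1.txt
test2 = (173, 76, 97, 17, 4)    # counts stated for test2.txt
print(round(similarity_factor(test1, test2), 3))  # 0.017
print(similarity_factor(test2, test2))            # 1.0
```

The specification does not say what to report when a count in the second file
is zero (for example, an empty file). This sketch would fail with a division
by zero in that case, so a robust program needs its own rule for it.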
Example 2
---------
'test2.txt
12345678901234567890123456789012345678901234567890123456789012345678901234567890
This is a the other test file.
note numbers are not counted as alphabetic characters
Total characters: 173
Total alphabetic characters: 76
Total non-alphabetic characters: 97
Total words: 17
Total lines: 4
'test2.txt
12345678901234567890123456789012345678901234567890123456789012345678901234567890
This is a the other test file.
note numbers are not counted as alphabetic characters
Total characters: 173
Total alphabetic characters: 76
Total non-alphabetic characters: 97
Total words: 17
Total lines: 4
Similarity factor: 173/173 * 76/76 * 97/97 * 17/17 * 4/4 = 1
If similarity factor > 1 then similarity factor = 1/similarity factor
Similarity factor: 1
Submission instructions
-----------------------
At the start of class (Week 13 Class 2) you should save just your program files
(sim.vbp and sim.frm) to your folder in CSCI110A\Week13\FA.
If you are working in a team, both team members save the same files to their
own folder.
Marking Scheme
--------------
The following marking scheme applies:
Course Code: CSCI110A
Semester: SP04
Assignment Code: Final Assignment
Lab Spec: Similarity between files (sim.vbp, sim.frm)
Instructor Name: Dr Anne Dawson
Student1 Name:
Student1 Number:
Student2 Name:
Student2 Number:
DESIGN
1. The program has appropriate modularity, i.e. functions and
   procedures are used where it makes sense to use them.          /10
2. Appropriate data types and control structures are used.        /10
3. The program is robust (handles exceptional circumstances).     /10
4. The program is efficient (does not contain unnecessary
   statements).                                                   /10
MAINTAINABILITY
5. The program is commented appropriately and uses meaningful
   identifiers.                                                   /10
6. The program is indented (spaced out) correctly, to aid the
   understanding of the code.                                     /10
7. The code is easy to follow.                                    /10
CORRECTNESS
8. The program runs as intended and includes text and graphic
   displays of results.                                           /20
9. Comprehensive test data and results are supplied.              /10
% Complete:
Total: /100
Date: