Anne Dawson: CSCI110A_FA_SP04.htm   

 

Last updated: Wednesday 17th March 2004, 10:02 PT

 

This document is subject to change without notice.

 

Please report any errors or omissions in this document:

adawson@coquitlamcollege.com

 

Special instructions: For this assignment you may work alone or in a team of two.

This assignment is due at the start of the last class of the semester (Week 13 Class 2).

 

Final Assignment Specification

------------------------------

 

Due date: At the start of class (Week 13 Class 2).

 

The aim of this assignment is to produce a VB program (sim.vbp, sim.frm) that compares two text files and reports on their similarity. For example, if the two text files are identical, the program reports a similarity factor of 1.0. The similarity factor is calculated (see the examples below) from five counts taken for each file: the total number of characters (including spaces and other non-alphabetic characters), the number of alphabetic characters, the number of non-alphabetic characters, the number of words and the number of lines. Words are delimited by spaces or other non-alphabetic characters. The filenames of the two files to be compared must be entered into textboxes on the form. The program outputs the results as follows:

 

Total characters

Total alphabetic characters

Total non-alphabetic characters

Total words

Total lines

Similarity factor (see example below)

 

as well as a graphical representation showing the differences between the files.
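
The specification does not prescribe the form of the graphical representation. One possibility is a pair of bars per count, one for each file. The sketch below is written in Python rather than VB, purely to illustrate the idea, and builds such a chart as text using the counts from Example 1. The name bar_chart, the labels, the bar width and the choice to scale each pair of bars to the larger of the two counts are all assumptions, not part of the assignment.

```python
# Hypothetical labels for the five counts listed in the specification.
LABELS = ["Characters", "Alphabetic", "Non-alphabetic", "Words", "Lines"]


def bar_chart(labels, counts_a, counts_b, width=40):
    """Return a text bar chart comparing the counts of two files.

    Each pair of bars is scaled so that the larger count fills `width`
    columns (an illustrative choice; the spec does not define the chart).
    """
    rows = []
    for label, a, b in zip(labels, counts_a, counts_b):
        peak = max(a, b, 1)  # avoid dividing by zero when both counts are 0
        rows.append(f"{label:<15} A {'#' * round(a * width / peak)} {a}")
        rows.append(f"{'':<15} B {'=' * round(b * width / peak)} {b}")
    return "\n".join(rows)


# Counts for test1.txt and test2.txt as listed in Example 1.
print(bar_chart(LABELS, [370, 231, 139, 49, 9], [173, 76, 97, 17, 4]))
```

In a VB form the same scaling could drive the widths of shapes or lines drawn on the form, so the text bars here stand in for whatever drawing method is chosen.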

 

 

Example 1

---------

 

'test1.txt

12345678901234567890123456789012345678901234567890123456789012345678901234567890

This is a test file. It contains text

to be used to test the sim.vbp VB project

which looks at the similarity between two

(2) text files. The text files can

contain non-alphabetic characters like

!@#$%^& and uppercase and lower case alphabetic characters.

Case may be ignored.

 

Total characters: 370

Total alphabetic characters: 231

Total non-alphabetic characters: 139

Total words: 49

Total lines: 9

 

 

 

'test2.txt

12345678901234567890123456789012345678901234567890123456789012345678901234567890

This is a the other test file.

note numbers are not counted as alphabetic characters

 

Total characters: 173

Total alphabetic characters: 76

Total non-alphabetic characters: 97

Total words: 17

Total lines: 4
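
The five totals can be computed in a single pass over the lines of a file. The sketch below is written in Python rather than VB, purely for illustration. It assumes that line terminators are not counted as characters and that only the letters A-Z and a-z count as alphabetic. Under those assumptions it gives the totals listed above for test2.txt. The name count_stats is hypothetical.

```python
import re


def count_stats(text):
    """Return the five counts used by the similarity factor.

    Assumptions (not stated in the spec): line terminators are not
    counted as characters, and only A-Z / a-z are alphabetic.
    """
    lines = text.splitlines()
    total = sum(len(line) for line in lines)
    alphabetic = sum(len(re.findall(r"[A-Za-z]", line)) for line in lines)
    # A word is a maximal run of letters, so spaces, digits and
    # punctuation all act as delimiters.
    words = sum(len(re.findall(r"[A-Za-z]+", line)) for line in lines)
    return {
        "characters": total,
        "alphabetic": alphabetic,
        "non_alphabetic": total - alphabetic,
        "words": words,
        "lines": len(lines),
    }


# The contents of test2.txt, including its name line and ruler line.
TEST2 = "\n".join([
    "'test2.txt",
    "1234567890" * 8,
    "This is a the other test file.",
    "note numbers are not counted as alphabetic characters",
])
print(count_stats(TEST2))
```

Note that "'test2.txt" contributes two words (test and txt) because the digit and the full stop act as delimiters, and the ruler line contributes 80 non-alphabetic characters and no words.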

 

 

 

Similarity factor:  370/173 * 231/76 * 139/97 * 49/17 * 9/4  =

                     2.139  * 3.039  * 1.433  * 2.882 * 2.25 =  60.404

(each ratio is rounded to 3 decimal places before multiplying)

 

If similarity factor > 1 then similarity factor = 1/similarity factor

 

Similarity factor: 1/60.404 = 0.017
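
The calculation in this example can be sketched as a product of ratios followed by the inversion rule. The sketch is in Python rather than VB, purely for illustration. The name similarity_factor is hypothetical, and rejecting zero counts is an assumption, since the specification does not say how an empty file should be handled.

```python
def similarity_factor(counts_a, counts_b):
    """Multiply the ratios of corresponding counts; invert if above 1.

    counts_a and counts_b hold the five counts for each file in the
    same order (characters, alphabetic, non-alphabetic, words, lines).
    """
    # Assumption: a zero count is treated as an error, because it would
    # either cause a division by zero or force the factor to 0.
    if 0 in counts_a or 0 in counts_b:
        raise ValueError("counts must all be non-zero")
    factor = 1.0
    for a, b in zip(counts_a, counts_b):
        factor *= a / b  # ratios are not rounded before multiplying
    if factor > 1:
        factor = 1 / factor
    return factor


test1 = (370, 231, 139, 49, 9)
test2 = (173, 76, 97, 17, 4)
print(round(similarity_factor(test1, test2), 3))
print(similarity_factor(test2, test2))
```

Because of the inversion rule, the result does not depend on which file is entered first.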

 

 

 

 

Example 2

---------

 

 

'test2.txt

12345678901234567890123456789012345678901234567890123456789012345678901234567890

This is a the other test file.

note numbers are not counted as alphabetic characters

 

Total characters: 173

Total alphabetic characters: 76

Total non-alphabetic characters: 97

Total words: 17

Total lines: 4

 

 

 

'test2.txt

12345678901234567890123456789012345678901234567890123456789012345678901234567890

This is a the other test file.

note numbers are not counted as alphabetic characters

 

Total characters: 173

Total alphabetic characters: 76

Total non-alphabetic characters: 97

Total words: 17

Total lines: 4

 

 

Similarity factor:  173/173 * 76/76 * 97/97 * 17/17 * 4/4  =  1

 

If similarity factor > 1 then similarity factor = 1/similarity factor

 

Similarity factor: 1

 

 

Submission instructions

-----------------------

 

At the start of class (Week 13 Class 2), save only your program files (sim.vbp and sim.frm) to your folder in CSCI110A\Week13\FA.

 

If you are working in a team, each team member must save the same files to their own folder.

 

 

 

Marking Scheme

--------------

 

The following marking scheme applies:

 

 

Course Code:      CSCI110A

Semester:         SP04

Assignment Code:  Final Assignment

Lab Spec:         Similarity between files (sim.vbp, sim.frm)

Instructor Name:  Dr Anne Dawson

                  Student1 Name:

                  Student1 Number:

                  Student2 Name:

                  Student2 Number:

 

DESIGN

 

1.  The program has appropriate modularity

    i.e. functions and procedures are used

    where it makes sense to use them.                           /10

                       

2.  Appropriate data types and control structures

    are used.                                                   /10

 

3.  The program is robust

    (handles exceptional circumstances).                        /10

 

4.  The program is efficient

    (does not contain unnecessary statements).                  /10

 

MAINTAINABILITY

 

5.  The program is commented appropriately and uses

    meaningful identifiers.                                     /10

 

6.  The program is indented (spaced out) correctly, to

    aid the understanding of the code.                          /10

 

7.  The code is easy to follow.                                 /10

 

CORRECTNESS

 

8.  The program runs as intended and includes text and graphic

    displays of results.                                        /20

 

9.  Comprehensive test data and results are supplied.           /10

              

 

 

                                               % Complete:

                                                    Total:      /100

                                                     Date: