CSc 120: Writer Bot Hash Table
Background
This program is behaviorally identical to the Writer Bot program from
Assignment 9,
with these exceptions:
-
The program will be altered to use a hash table ADT rather than a Python built-in dictionary, as specified in Programming Requirements.
-
The input will include the table size for the hash table, as specified under Input Format.
-
Some error checking is required as specified under Errors.
Restrictions
-
For the long problem, your code should follow the style guidelines
for the class. You must follow the updated guidelines for commenting classes.
-
You may not use concepts and/or short-hand syntax not yet covered in class. The restrictions include the following:
-
dictionary or list comprehensions, e.g., [n * 2 for i in range(10)]
-
with open (explicitly open and close the file instead)
-
the ternary operator (use an if instead)
-
nested functions (using def within a function to define another function)
-
exceptions (try/except)
-
type annotations
-
lambda expressions
-
generators and user defined iterators
-
default arguments
-
decorators
-
importing libraries (unless a library is explicitly mentioned in the specification)
Markov Chain Algorithm
As in
Assignment 9.
Definitions
As in
Assignment 9.
Expected Behavior
Write a program, in a file writer_bot_ht.py, that generates random text
from a given source text. Your program should behave as follows:
-
Use input() (without arguments) to read the name of the
the source file sfile. Do not
prompt the user for input. Do not hard-code the file name
into your program.
-
Use input() (without arguments) to read the size of the
the hash table M. Do not
prompt the user for input.
-
Use input() (without arguments) to read in the prefix size n.
Do not prompt the user for input.
-
Use input() (without arguments) to read in the number of words to be generated for the random text.
Do not prompt the user for input.
-
Read sfile and build the Markov chain table of prefixes to suffixes
according to the description above.
-
Construct the randomly generated text according to the Markov chain algorithm. Construct a list to hold the words of the generated text.
-
Print out the generated text list accoring to the Output format below.
Input Format
As in
Assignment 9 and repeated here.
Each line of the input file is a sequence of characters separated by whitespace.
The file may consists of any number of lines with any number of words on each line.
Output Format
As in
Assignment 9 and repeated here.
Print out the list of generated text ten words per line.
Any extra words will be printed on the last line. For example, if the generated text has only nine words, the output will consist of one line of nine words.
If the text has 109 words, the output will consist of eleven lines of output, the first ten lines having ten words and the last line having nine.
Programming Requirements
-
The example discussed above shows a table for prefixes of size two. Your program must work for a prefix of arbitrary size n.
-
Instead of using a Python dictionary to build the table mapping prefixes to suffixes, implement a hash table ADT.
Previously, tuples were used to represent the prefixes. In this assignment, you will use strings to represent a prefix. For example, instead of using the tuple
('Half, 'a')
you will use the string
'Half a'
Since the prefixes will be the keys in the hash table, you will use the hash function specified in the Hash Function section to hash a string representing a prefix to an integer. You are required to use the given hash function. Implement the following class, Hashtable
:
-
class Hashtable
-
An object of this class is a hash table that uses linear probing to handle collisions. It should
contain (at least) the following attributes:
-
_pairs: the underlying Python list implementing the hash table.
-
_size: the size of the hash table.
The class must implement (at least) the following methods:
-
__init__(self, size): initializes the object's
attributes as follows: _pairs is set to a list of size size;
_size is set to size.
-
put(self, key, value): hashes key and inserts the key/value pair in the hash table.
Use linear probing with a decrement of 1 to resolve collisions.
-
get(self, key): looks up key in the hash table and if found, returns the corresonding value. If not found, it returns None.
-
__contains__(self, key): looks up key in the hash table and if found returns True and otherwise returns False.
-
__str__(self) returns a string representation of a Hashtable object.
Note: Do not use append() or list concatenation
in the put() method. (You will receive a deduction if you do!)
The code that uses the hashtable ATD will modify a key's corresponding value when
needed.
-
As before, a prefix may have one or more suffixes. You must use
a list to represent the possible suffixes. When a new suffix is encountered
for an existing prefix, you must append the new suffix to the end of the list.
This is important for matching the autograder output: the order in which suffixes
are stored in the list will affect the choices made during text generation and will impact the output.
For example, suppose that
'Half a'
hashes to integer i and that 'a bee'
hashes to integer j.
The following shows the contents of the hash table at those indices for the Eric the Bee example:
hash table at index i is ['Half a', [ 'bee' ]]
...
hash table at index j is ['a bee', [ 'philosophically', 'be', 'Due' ]]
-
In addition, during text generation, when a prefix has more than one suffix, the suffix will be randomly
chosen from the list. You will use the Python random number generator as in Assignment 9 to do this.
As in that assigment, in order for your output to match the tester and grading scripts, you must seed the random
number generator. To do this, define the following constant at the top of your program:
SEED = 8
-
You must define the constant NONWORD, which must be a word that cannot exist in the original text. In Assignment 9, a space was used for this, however, we need spaces to delineate the words of a prefix. Consequently, define NONWORD as the single character @ as follows:
NONWORD = '@'
-
As you can imagine, when generating the output for larger text, it is not useful to print out the random text
one word at a time. During the text generation phase, create a list to hold the words of the generated text.
When the text generation is complete, print the output as specified in the Output format section.
Hash Function
The hash function covered in lecture that hashes a string to an integer
multiples the position of each character in the string by its ord value.
That is straightforward but not robust enough for use in practice.
A better approach is to compute a polynomial whose coefficients are the ord values of the individual characters in the string. Using
Horner's rule to
compute the polynomial, and using 31 for the value of x, gives us the following
hash function:
def _hash(self, key):
p = 0
for c in key:
p = 31*p + ord(c)
return p % self._size
Errors
The following are errors:
-
The input value n for the prefix size is less than one.
Program behavior: Use normal program logic to detect this (i.e., if statements).
Give the following error message:
Error message:
"ERROR: specified prefix size is less than one"
-
The input value for the size of the generated text is less than one.
Program behavior: Use normal program logic to detect this (i.e., if statements).
Give the following error message:
Error message:
"ERROR: specified size of the generated text is less than one"
Terminate the program after the
errors above are reported. You may use Python's system library.
First, include the following import statement at the top of your program:
import sys
After printing the appropriate error message, exit your program like so:
sys.exit(0)
Examples
Some examples of generating random text from different source texts are shown
here.
Reference
The Markov chain algorithm is used to solve a variety of problems. Using it for random text generation has been described in many places, most notably in The Practice of Programming, by Kernighan and Pike,
which can be found
here.