CMU 15-190 Spring 2017: Topics in Intermediate Programming
Homework 5 (Due Wednesday 1-Mar, at 11:59pm)



  1. Huffman Coding with a full alphabet
    First, find a reliable data source that provides the distribution of the letters A-Z in everyday English-language writing. Next, using that distribution, construct a variable-length Huffman coding. Then, write a function encodeLetters(s) that takes a string s of uppercase letters A-Z and returns a string of 1's and 0's, which encodes that string using the Huffman coding you just derived. Also, write decodeLetters(s) that takes a string of 1's and 0's and runs the process in reverse, effectively decoding them back to the original string. So, for any string of letters A-Z, it should be true that (decodeLetters(encodeLetters(s)) == s). Write a test function for these, too.

  2. Better Huffman Coding
    Intead of encoding just uppercase letters, go further and encode arbitrary English text. For this, you might have to compute the distributions of the characters yourself. These will include uppercase, lowercase, digits, punctuation, and whitespace. You can ignore all other characters (special characters, etc). You might refer to Project Gutenberg if you need to find large collections of plaintext. For example, Alice's Adventures in Wonderland in plaintext can be found here. Write encodeArbitraryText(s) and decodeArbitraryText(s) and a simple test function.

  3. Best Huffman Coding
    Continuing, if you have time and inclination, instead of encoding to a string of 1's and 0's, go all the way and turn these into a list of unsigned 32-bit integers (that is, a list of integers each with a value between 0 and 1111...1111 (32 1's), which is 2**32-1). Provide encodeTextToIntList(s) and decodeTextFromIntList(L), and a simple test function. Finally, encode a large block of text and in a comment at the top of your file, include these 4 values: (1) the number of characters in the text; (2) the number of bytes required to store those characters if we encoded one character/byte (same as (1), of course); (3) the number of bytes required with your compressed list of int's (remember, 4 bytes per 32-bit int); and finally, (4) the amount of compression achieved (answer to (3) divided by answer to (2)). How'd we do?

  4. What to Submit
    Each member of the group should submit their own PDF file, 15-190-hw4.pdf, containing this information:
    1. Your name and andrew id
    2. The names and andrew id's of your groupmates
    3. The start/stop times you worked, and the total time you worked.
    4. Your thoughts about this exercise -- useful, interesting, etc? This will help us improve the 15-190 experience as we go.
    5. The solutions from above. Note: one way to create a PDF is to first make a Word file, then export that to PDF.
    You can submit your hw to the 15-190 autolab site here.

Have fun!