CMU 15-190 Spring 2017: Topics in Intermediate Programming
Homework 5 (Due Wednesday 1-Mar, at 11:59pm)

This hw is optional. It is for students who want to get a bit more out of the optional/advanced talk last week, and in particular for those who may wish to get 3 units of pass/fail credit for 15-190 (for whom it is required). We hope that's quite a few of you!

We will grade the hw, but loosely, and chiefly based on effort. You should expect to invest 1-2 hours on it and you'll be fine. This is not about the grade, of course, but about the learning. So take your time, discuss issues with your groupmates, have fun, and enjoy.

Please track when you start and stop working on this, and with whom you collaborate. You'll need that for your submission. And again, have fun.

Huffman Coding with a full alphabet
First, find a reliable data source that provides the distribution of the letters A-Z in everyday English-language writing. Next, using that distribution, construct a variable-length Huffman coding. Then, write a function encodeLetters(s) that takes a string s of uppercase letters A-Z and returns a string of 1's and 0's, which encodes that string using the Huffman coding you just derived. Also, write decodeLetters(s) that takes a string of 1's and 0's and runs the process in reverse, effectively decoding them back to the original string. So, for any string of letters A-Z, it should be true that (decodeLetters(encodeLetters(s)) == s). Write a test function for these, too.

Better Huffman Coding
Intead of encoding just uppercase letters, go further and encode arbitrary English text. For this, you might have to compute the distributions of the characters yourself. These will include uppercase, lowercase, digits, punctuation, and whitespace. You can ignore all other characters (special characters, etc). You might refer to Project Gutenberg if you need to find large collections of plaintext. For example, Alice's Adventures in Wonderland in plaintext can be found here. Write encodeArbitraryText(s) and decodeArbitraryText(s) and a simple test function.

Best Huffman Coding
Continuing, if you have time and inclination, instead of encoding to a string of 1's and 0's, go all the way and turn these into a list of unsigned 32-bit integers (that is, a list of integers each with a value between 0 and 1111...1111 (32 1's), which is 2**32-1). Provide encodeTextToIntList(s) and decodeTextFromIntList(L), and a simple test function. Finally, encode a large block of text and in a comment at the top of your file, include these 4 values: (1) the number of characters in the text; (2) the number of bytes required to store those characters if we encoded one character/byte (same as (1), of course); (3) the number of bytes required with your compressed list of int's (remember, 4 bytes per 32-bit int); and finally, (4) the amount of compression achieved (answer to (3) divided by answer to (2)). How'd we do?

What to Submit
Each member of the group should submit their own PDF file, 15-190-hw4.pdf, containing this information:
1. Your name and andrew id
2. The names and andrew id's of your groupmates
3. The start/stop times you worked, and the total time you worked.
4. Your thoughts about this exercise -- useful, interesting, etc? This will help us improve the 15-190 experience as we go.
5. The solutions from above. Note: one way to create a PDF is to first make a Word file, then export that to PDF.
You can submit your hw to the 15-190 autolab site here.

Have fun!