Week 2

Faithful followers of the Thing a Week project, rejoice! Unexpectedly, I have a submission for week 2 (two days late, but who is paying attention anyhow?).

I’ve decided that DEFLATE (the algorithm used by “zip”, “gzip”, and a few other compression programs) would be a fun algorithm to implement. At this point, I am basically recreating the wheel, because the “zlib” library already does what I am planning on doing and more, but my ultimate goal is to analyze different compression techniques in use, so the logical place to start is by implementing them.

DEFLATE works by first taking the raw data and performing a lempel-zip encoding on the data (scanning through data for duplicates, and replacing duplicate data with a reference to where the data is duplicated). Then, this lempel-ziv encoded data is huffman-coded (codes of varying lengths are created for each symbol/character based on frequency of use). The huffman tree and the huffman-coded data are then written to the file.

This week, I have only implemented huffman-encoding. The huffman trees do not currently adhere to all the properties needed by DEFLATE, but the basic functionality of encoding symbols into a huffman tree seems to be working.

http://joshodom.net/weekly/week2.tar.gz

Leave a Comment