LZ77 compression

LZ77 compression

The first algorithm to use the Lempel-Ziv substitutional compression schemes, proposed in 1977. LZ77 compressionkeeps track of the last n bytes of data seen, and when aphrase is encountered that has already been seen, it outputs apair of values corresponding to the position of the phrase inthe previously-seen buffer of data, and the length of thephrase. In effect the compressor moves a fixed-size "window"over the data (generally referred to as a "sliding window"),with the position part of the (position, length) pairreferring to the position of the phrase within the window.

The most commonly used algorithms are derived from theLZSS scheme described by James Storer and Thomas Szymanskiin 1982. In this the compressor maintains a window of size Nbytes and a "lookahead buffer", the contents of which it triesto find a match for in the window:

while (lookAheadBuffer not empty)get a pointerelse output the first character in the lookahead buffer; shift the window 1 character along; }

Decompression is simple and fast: whenever a (POSITION,LENGTH) pair is encountered, go to that POSITION in the windowand copy LENGTH bytes to the output.

Sliding-window-based schemes can be simplified by numberingthe input text characters mod N, in effect creating a circularbuffer. The sliding window approach automatically creates theLRU effect which must be done explicitly in LZ78 schemes.Variants of this method apply additional compression to theoutput of the LZSS compressor, which include a simplevariable-length code (LZB), dynamic Huffman coding(LZH), and Shannon-Fano coding (ZIP 1.x), all of whichresult in a certain degree of improvement over the basicscheme, especially when the data are rather random and theLZSS compressor has little effect. An algorithm was developedwhich combines the ideas behind LZ77 and LZ78 to produce ahybrid called LZFG. LZFG uses the standard sliding window,but stores the data in a modified trie data structure andproduces as output the position of the text in the trie.Since LZFG only inserts complete *phrases* into thedictionary, it should run faster than other LZ77-basedcompressors.

All popular archivers (arj, lha, zip, zoo) arevariations on LZ77.

[comp.compression FAQ].