Javascript LZMA Decompression

Aug 13 2011 Published by under Linux,Programming

In modern browsers, g-zip compression is a standard feature. The typical compression ratio for a plain text file is 30%, reducing the download time of web content by 70% and making it load 2-3 times faster. In spite of the speed up, g-zip is an old algorithm based on LZ77. Since then, newer algorithms have been invented, with LZMA being the standard. On Linux, LZMA typically produces files half the size compared to g-zip. This tutorial will show you how to use an LZMA compressed file produced by the standard lzma command on Unix machines directly in a client side web application. The rest of the post assumes you have the JavaScript libraries for LZMA and binary AJAX set up.

First, Make a Compressed File

echo "Hello, world." | lzma -3 > hello.lzma

Next,  Read Binary Data

<script src="../src/jquery-1.4.4-binary-ajax.js"></script>
<script src="../src/jdataview.js"></script>
<script>
function unzip(data) {
    // Make a view on the data
    var view = new jDataView(data);

    var int_arr = new Array;

    while (view.tell() < view.length) {

       int_arr.push(view.getUint8(view.tell()));

   }
   console.log(int_arr.length);
   console.log(int_arr);

}

// Download the file
$.get('hello.lzma', unzip, 'binary');
</script>

This is a pretty simple step, except the while loop counter may be unintuitive. getUint8 increments the file pointer, though it wasn’t documented in the API specification. I spent an hour or so comparing the output in hex. One of the problem was that

5d 00 00 08 00 0d 00 00 00 00 00 00 00 00

is the same as

5d 00 00 08 00 0d ff ff ff ff ff ff ff ff

in little Endian. You can try it in the decompressor, just replace the bytes in the hello world lzma on compression level 3. However, I figured out the problem as soon as I compared view.length and int_arr.length. They were multiples of 2! That always has significance in computing, in this case it meant I was reading every other byte. After correcting the while loop, I moved onto decoding the binary.

Third, Enjoy the Decoding

Yes, this is a rather boring thing to do while waiting, but do enjoy it.

    lzma.decompress(int_arr, function(result) {
        $('body').append($('<textarea></textarea>').val(result));
    })

Benefits

Using LZMA compression rather than g-zip, I was able to reduce a g-zipped file to 2/3 of its size, reducing the download time by 33%. The LZMA decompression algorithm could be improved to use an array to store results, joining them at the end, rather than appending to a string. It is not recommended to use this method unless you have large files. The libraries themselves take up about 50kb with g-zip. Furthermore, it is unsuitable for downloads where files are sent directly to the user, without being used by the application, since the user would have the decompression utilities.

One response so far