Content Compression in Microsoft IIS Server
How to make IIS deliver compressed content to HTTP 1.1-compliant browsers.
by Konstantin Balashov

Posted April 26, 2004

About This Article
This article is adapted from the book Speed Up Your Site: Web Site Optimization by Andrew B. King. Copyright © 2003 by New Riders Publishing. Reproduced by permission of Pearson Education Inc., publishing as New Riders Publishing. All rights reserved. Peachpit Press: www.peachpit.com.

Compression technology seems to offer something for nothing. Compression saves bandwidth and speeds up Web sites by removing redundancy to reduce the amount of data sent. Although the cost of compression is certainly not zero, over networks such as the Internet, transmission time, not rendering, is usually the bottleneck. This article shows you how to compress the text in your content to minimize bandwidth costs and maximize speed.

Compression algorithms trade time for space by preprocessing files to create smaller versions of them. A compressed file is later decompressed to reconstruct the original, or an approximation thereof. The compression process naturally includes two components: encoding and decoding. Encoding compresses the data, and decoding decompresses it; decoding usually runs faster than encoding. Because bandwidth is more precious than CPU cycles, network concerns usually trump server load considerations.
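
To make the encoding/decoding split concrete, here is a minimal sketch in Python using the standard zlib module (a dictionary-based coder of the kind described below); the sample data and compression level are arbitrary choices for illustration:

    import zlib

    original = b"Compression saves bandwidth by removing redundancy. " * 100

    # Encoding: preprocess the data into a smaller representation.
    compressed = zlib.compress(original, 6)

    # Decoding: reconstruct an exact copy of the original (lossless).
    restored = zlib.decompress(compressed)

    assert restored == original
    print(len(original), "bytes compressed to", len(compressed), "bytes")

On highly redundant text like this, the compressed output is a small fraction of the input, which is why the transmission savings usually outweigh the extra CPU work.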

Text Compression Algorithms
There are three major approaches to text compression:

  • Dictionary-based (LZ stands for Lempel and Ziv).
  • Block sorting-based (BWT, or Burrows-Wheeler Transform).
  • Symbol probability prediction-based (PPM, or Prediction by Partial Matching).

Most file-compression formats, such as arj, zip, gz, lzh, and rar (as well as the image formats GIF and PNG), are based on dictionary algorithms that exploit previously occurring phrases: they save space by substituting the distance back to the last occurrence and the length of the phrase. The LZW (Lempel, Ziv, and Welch) algorithm used in the GIF format is different; it substitutes a dictionary index for the phrase. LZ-based algorithms are very fast, with moderate compression ratios and modest memory requirements (< 100K).
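
As a toy illustration of that distance/length substitution (not how zip or gzip are actually implemented; real coders add hash-based matching and entropy coding on top), the hypothetical helper below factors a byte string into literals and (distance, length) matches:

    def lz77_tokens(data, window=4096, min_match=3):
        """Greedy LZ77-style factorization into literals and (distance, length) pairs."""
        tokens = []
        i = 0
        while i < len(data):
            best_len, best_dist = 0, 0
            # Search the sliding window of previously seen text for the longest match.
            for j in range(max(0, i - window), i):
                length = 0
                while i + length < len(data) and data[j + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            if best_len >= min_match:
                tokens.append(("match", best_dist, best_len))   # back-reference
                i += best_len
            else:
                tokens.append(("literal", data[i]))             # raw byte
                i += 1
        return tokens

    print(lz77_tokens(b"blah blah blah blah"))
    # Five literals for "blah ", then a single ("match", 5, 14) covering the rest.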

BWT algorithms make a block-sorting transform over the text. The result of the transform is text of the same size, but letters are magically grouped. It can then be efficiently compressed with a very fast and simple coding technique. Block-sorting transforms make sense when they are applied to a big data block. BWT algorithms are fast, with high compression ratios and moderate memory requirements (1 MB or more).
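
bzip2 is the most widely deployed BWT-based compressor, and Python's standard bz2 module wraps it, so a quick sketch is easy; the sample payload is a placeholder, and the compresslevel argument selects the block size (1 means 100 KB blocks, 9 means 900 KB), which is why bigger blocks help:

    import bz2

    # Placeholder sample; in practice this would be a large HTML or text payload.
    sample = b"<p>Block sorting groups similar characters together.</p>\n" * 2000

    # compresslevel selects the BWT block size: 1 = 100 KB blocks, 9 = 900 KB blocks.
    compressed = bz2.compress(sample, compresslevel=9)
    restored = bz2.decompress(compressed)

    assert restored == sample
    print(len(sample), "bytes compressed to", len(compressed), "bytes")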

PPM algorithms calculate the probability distribution for every symbol and then optimally encode it. Most PPM implementations are slow, sophisticated, and memory intensive (8 MB or more).

The efficiency of these lossless algorithms is measured in bits per character (bpc), or the number of bits needed to encode each character. The English language has an effective bpc of 1.3, so theoretically if a compression algorithm "knew" all the idioms and structure of this language, it could approach this figure. To give you an idea of how effective the current Web compression algorithms are, Table 1 illustrates lossless compression algorithms and their efficiencies.
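
To see where real coders land relative to that figure, divide the compressed size in bits by the number of input characters. The sketch below compares the standard zlib (LZ-based) and bz2 (BWT-based) modules; article.txt is a hypothetical large English text file, and results vary widely with the input:

    import bz2
    import zlib

    # Hypothetical input: any reasonably large English text file.
    with open("article.txt", "rb") as f:
        text = f.read()

    for name, compress in (("zlib (LZ)", zlib.compress), ("bz2 (BWT)", bz2.compress)):
        bpc = len(compress(text)) * 8 / len(text)
        print(f"{name}: {bpc:.2f} bits per character")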
