GZIP is a lossless compressed data format. The deflation algorithm used by GZIP (also zip and zlib) is an open-source, patent-free variation of LZ77 (Lempel-Ziv 1977). It finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer to the previous one, in the form of a (distance, length) pair; distances are limited to 32K bytes and lengths to 258 bytes. When a string does not occur anywhere in the previous 32K bytes, it is emitted as a sequence of literal bytes. (In this description, “string” means an arbitrary sequence of bytes, not just printable characters.)
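As a quick sketch of why duplicated strings matter, the snippet below feeds a deliberately repetitive buffer through Python’s `zlib` module, which implements the same deflate algorithm. The sample text is made up for illustration; any input with repeats behaves similarly.

```python
import zlib

# Hypothetical sample input: a buffer full of duplicated strings,
# exactly the kind of redundancy deflate is built to exploit.
data = b"the quick brown fox jumped over the lazy dog. " * 100

# zlib.compress uses deflate; level 9 favors ratio over speed.
compressed = zlib.compress(data, level=9)

print(len(data), len(compressed))
# Each repeat after the first is encoded as a short (distance, length)
# pair instead of the literal bytes, so the output is far smaller.
```

Because every repetition falls well within the 32K-byte window, deflate can point back to the first occurrence every time.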
HTML/XML/JavaScript/text compression: Does it make sense?
The short answer is “only if it can get there quicker.” In 99% of all cases, compressing the data makes sense. However, there are several problems that need to be solved to enable seamless transmission from the server to the consumer.
Let’s create a simple scenario: an HTML file containing a large music listing in the form of a table, available at http://12.17.228.53:8080/music.htm. This file is 679,188 bytes in length.
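The original music.htm file is no longer reachable, but we can approximate its character with a hypothetical table of near-identical rows and see how well standard gzip handles it. The row contents below are invented for illustration.

```python
import gzip

# Build a hypothetical table-heavy HTML file, mimicking a large
# music listing: thousands of structurally identical <tr> rows.
rows = "".join(
    f"<tr><td>Artist {i}</td><td>Album {i}</td><td>1999</td></tr>"
    for i in range(10_000)
)
html = f"<html><body><table>{rows}</table></body></html>".encode()

# gzip wraps deflate output in the GZIP container format.
compressed = gzip.compress(html, compresslevel=9)

print(len(html), len(compressed))
# Markup this repetitive typically shrinks by well over 90%.
```

The repeated tags and cell structure are exactly the duplicated strings deflate looks for, which is why tabular HTML compresses so dramatically.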
Let’s track this download over a 28K modem and then compare the results before and after compression.
The theoretical throughput of a 28K modem is 3,600 bytes per second. Reality is closer to 2,400 bytes per second, but for the sake of this article we will work at the theoretical maximum. With no modem compression, the file would download in 188.66 seconds. With modem compression running, we can expect a download time of about 90 seconds, which indicates roughly a 2:1 compression factor: the number of packets transmitted from modem to modem effectively “halves” the file size. Note, however, that the server still had to keep the TCP/IP subsystem open to send all 679,188 bytes to the modem for transmission.

What happens if we compress the data on the server, prior to transmission? Using standard techniques (which are not optimized for HTML), we can expect the 679,188-byte file to compress down to 48,951 bytes. This is a 92.79% compression factor. We are now transmitting only 48,951 bytes (plus some header information, which should also be compressed, but that’s another story). Modem compression no longer plays a factor because the data is already compressed.

Compression clearly makes sense as long as it’s seamless and doesn’t kill server performance.
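The arithmetic above can be checked directly. The sizes and throughput below are the article’s own figures; only the variable names are mine.

```python
# Figures from the article: file size, server-side compressed size,
# and the theoretical throughput of a 28K modem.
original_size = 679_188    # bytes
compressed_size = 48_951   # bytes
throughput = 3_600         # bytes per second

uncompressed_time = original_size / throughput   # seconds, no compression
compressed_time = compressed_size / throughput   # seconds, pre-compressed
reduction = 1 - compressed_size / original_size  # fraction saved

print(f"{uncompressed_time:.2f}s -> {compressed_time:.2f}s, "
      f"{reduction:.2%} smaller")
# → 188.66s -> 13.60s, 92.79% smaller
```

Server-side compression beats even the 2:1 ratio of modem compression by a wide margin, which is the article’s core point.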
A lot remains to be done. Better algorithms need to be invented that compress the data stream more efficiently than gzip; remember, gzip was designed before HTML came along. Any technique that adds a new compression algorithm will require a thin client to decode it, and possibly tunneling techniques to make it “firewall friendly.” To sum up, we need better HTML-aware compression, a lightweight client-side decoder, and firewall-friendly transport.