How does gzip work?
Content on WhatAnswers is provided "as is" for informational purposes. While we strive for accuracy, we make no guarantees. Content is AI-assisted and should not be used as professional advice.
Last updated: April 8, 2026
Key Facts
- Created in 1992 by Jean-loup Gailly and Mark Adler
- Uses DEFLATE algorithm combining LZ77 and Huffman coding
- Typical compression ratios of 2:1 to 3:1 for text files
- Standardized in RFC 1952 in 1996
- Commonly used with .gz file extension
Overview
Gzip (GNU zip) is a widely used file compression format and software application that originated in the early 1990s as part of the GNU Project. Created by Jean-loup Gailly and Mark Adler in 1992, gzip was developed as a free software replacement for the proprietary compress program, which used the LZW algorithm and was subject to patent restrictions. The name "gzip" stands for GNU zip, reflecting its origins in Richard Stallman's GNU Project. Gzip quickly became popular in Unix-like systems and was standardized in RFC 1952 in 1996. The format uses the .gz file extension and is commonly used for compressing single files, though it's often combined with tar to create .tar.gz archives for multiple files. By 2023, gzip remained one of the most common compression formats on the web, with approximately 40% of websites using it for HTTP compression according to W3Techs surveys. Its longevity stems from its open-source nature, good compression ratios, and widespread support across operating systems and applications.
How It Works
Gzip operates using the DEFLATE compression algorithm, which combines two techniques: LZ77 (Lempel-Ziv 1977) and Huffman coding. First, the LZ77 algorithm scans the input data for repeated sequences of bytes. When it finds a match to a previous occurrence (within a 32KB sliding window), it replaces the repeated sequence with a pair of numbers: a distance back to the previous occurrence and the length of the match. This process eliminates redundancy in the data. Next, Huffman coding takes the resulting stream of literals and length-distance pairs and assigns variable-length codes to each symbol, with more frequent symbols receiving shorter codes. The algorithm builds optimal prefix codes based on symbol frequencies, creating a binary tree structure. Gzip typically uses two Huffman trees: one for literals and lengths, and another for distances. The compressed output includes these code trees followed by the encoded data. During decompression, gzip reverses this process: it reads the Huffman trees, decodes the bitstream back into literals and pointers, then uses the LZ77 pointers to reconstruct the original data. The entire process operates on a per-file basis, with each file compressed independently.
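The compress-then-decompress round trip described above can be sketched with Python's standard-library gzip module, which wraps the same DEFLATE implementation (via zlib). The input text here is deliberately repetitive so LZ77 has matches to exploit; the byte values checked at the end come from the RFC 1952 header layout:

```python
import gzip

# Repetitive input gives LZ77 plenty of back-references to find
data = b"the quick brown fox jumps over the lazy dog. " * 50

compressed = gzip.compress(data)
restored = gzip.decompress(compressed)

assert restored == data                # lossless round trip
assert compressed[:2] == b"\x1f\x8b"   # gzip magic number (RFC 1952)
assert compressed[2] == 8              # compression method 8 = DEFLATE
print(len(data), "->", len(compressed), "bytes")
```

On highly repetitive input like this, the compressed size is a small fraction of the original; real-world text typically lands in the 2:1 to 3:1 range cited in the key facts above.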
Why It Matters
Gzip's significance extends across multiple domains due to its efficiency and ubiquity. On the web, gzip compression reduces bandwidth usage by 50-70% for text-based content like HTML, CSS, and JavaScript, leading to faster page loads and reduced hosting costs. Major web servers like Apache and Nginx include built-in gzip support, and content delivery networks (CDNs) use it extensively. In software distribution, gzip compresses source code archives and package repositories, saving storage space and download time. System administrators rely on gzip for log file compression, where it can reduce multi-gigabyte logs to manageable sizes for archival. The format's open specification and lack of patent restrictions enabled widespread adoption in open-source software: its DEFLATE implementation was generalized into the zlib library, now embedded in countless applications, and the format is supported by archiving tools such as 7-Zip. While newer algorithms like Brotli and Zstandard offer better compression ratios, gzip remains the baseline due to its universal support, making it essential for backward compatibility and systems with limited computational resources.
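A rough illustration of the bandwidth savings on markup-heavy content is easy to reproduce; the snippet below compresses a synthetic HTML fragment (the exact ratio depends entirely on the input, so the 50-70% figure above should be read as typical, not guaranteed):

```python
import gzip

# Synthetic, repetitive markup, loosely typical of HTML served over HTTP
html = b"<div class='item'><span>hello world</span></div>\n" * 200

compressed = gzip.compress(html, compresslevel=9)
saved = 1 - len(compressed) / len(html)
print(f"{len(html)} -> {len(compressed)} bytes ({saved:.0%} saved)")
```

Web servers perform exactly this transformation on the fly when a client sends `Accept-Encoding: gzip`, responding with `Content-Encoding: gzip` so the browser knows to decompress.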
Sources
- Wikipedia (CC BY-SA 4.0)