How to tgz a folder in Linux

Content on WhatAnswers is provided "as is" for informational purposes. While we strive for accuracy, we make no guarantees. Content is AI-assisted and should not be used as professional advice.

Last updated: April 4, 2026

Quick Answer: To compress a folder into a tar.gz file in Linux, use the command `tar -czf archive.tar.gz folder_name/` which combines compression and archiving into a single file. This creates a gzipped tar archive that preserves the folder structure and reduces file size significantly for storage and transfer.

Key Facts

What It Is

A tar.gz file is a compressed archive that combines two Linux technologies: tar (Tape Archive) for bundling files and directories, and gzip for compression. The tar utility groups multiple files and folders into a single archive while preserving directory structures, permissions, and symbolic links, creating what's called a tarball. Gzip then compresses this tarball, reducing its size significantly for efficient storage and transmission over networks. The .tar.gz extension (sometimes written as .tgz) indicates that the file has been both tarred and gzip-compressed.

The tar command appeared in 1979 as part of Version 7 Unix, created to write tape archives for backup purposes. Gzip was written by Jean-loup Gailly and Mark Adler in 1992 as a free, more effective replacement for the older compress utility, and it became the standard compression tool on Linux. The combination of tar and gzip became the de facto standard for distributing software, sharing large datasets, and creating backups in Unix and Linux environments. The format gained widespread adoption because it preserved file metadata and handled directory structures better than competing compression methods.

There are several compression methods that can be combined with tar, each with different compression ratios and speeds. Gzip (tar.gz) offers good compression with fast processing, making it ideal for general-purpose archiving and distribution. Bzip2 (tar.bz2) provides better compression ratios but processes more slowly, suitable for long-term storage where speed is less critical. XZ compression (tar.xz) achieves the highest compression ratios but requires the most computational resources. Choosing the right compression method depends on balancing compression ratio, processing speed, and compatibility requirements.
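A quick way to see these trade-offs is to compress the same folder with all three methods and compare the results. This sketch assumes gzip, bzip2, and xz are all installed, and uses a hypothetical `demo/` folder of generated text:

```shell
# Create a sample directory with some compressible text data (hypothetical).
mkdir -p demo
seq 1 100000 > demo/numbers.txt

# Same folder, three compressors: -z is gzip, -j is bzip2, -J is xz.
tar -czf demo.tar.gz demo/   # gzip: fastest, good ratio
tar -cjf demo.tar.bz2 demo/  # bzip2: slower, usually smaller
tar -cJf demo.tar.xz demo/   # xz: slowest, usually smallest

# Compare the resulting sizes.
ls -l demo.tar.gz demo.tar.bz2 demo.tar.xz
```

On highly repetitive text like this, xz will typically produce the smallest file, but the ranking and the time taken depend heavily on the data, so it is worth running this comparison on a representative sample of your own files.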

How It Works

The tar command processes folders recursively, reading each file and directory while preserving metadata like timestamps, ownership, and permissions in the archive. The -c flag tells tar to create an archive, -z enables gzip compression, -f specifies the output filename, and the folder path indicates what to archive. The command `tar -czf archive.tar.gz folder_name/` creates the archive in one operation, with tar automatically applying gzip compression before writing to disk. The result is a single file containing the complete folder structure and all contents, compressed and ready for distribution.
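The basic workflow can be exercised end to end with a throwaway folder (the names `folder_name/` and `archive.tar.gz` here are just placeholders):

```shell
# Make a small folder to archive (hypothetical example data).
mkdir -p folder_name
echo "hello" > folder_name/greeting.txt

# -c create an archive, -z gzip-compress it, -f write to the named file.
tar -czf archive.tar.gz folder_name/

# Confirm the archive exists and list what it contains.
ls -l archive.tar.gz
tar -tzf archive.tar.gz
```

The listing shows paths relative to where tar was run (`folder_name/greeting.txt`), which is why archiving from the parent directory, rather than with absolute paths, keeps archives portable.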

Consider a system administrator backing up a web server's configuration directory using `tar -czf config-backup.tar.gz /etc/apache2/`. This creates a single archive file containing all Apache configuration files, modules, and subdirectories with their original permissions intact. When distributed to a remote server for recovery, the administrator simply extracts it with `tar -xzf config-backup.tar.gz`, and the entire directory structure is restored with correct file permissions and ownership. Another example involves a software developer packaging their project with `tar -czf myproject-1.0.tar.gz src/ docs/ Makefile`, creating a distributable package that users can extract and build on any Linux system.
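The permission-preserving round trip described above can be demonstrated without touching real system directories. This sketch uses a hypothetical `appconf/` folder instead of `/etc/apache2/`, and passes `-p` on extraction so the recorded modes are restored exactly regardless of the current umask:

```shell
# Create a folder with a file whose permissions we want preserved.
mkdir -p appconf
echo "listen 8080" > appconf/app.conf
chmod 640 appconf/app.conf

# Archive it, then restore into a separate directory to simulate recovery.
tar -czf appconf-backup.tar.gz appconf/
mkdir -p restore
tar -xzpf appconf-backup.tar.gz -C restore/

# The extracted file keeps its original 640 mode.
stat -c '%a %n' restore/appconf/app.conf   # prints: 640 restore/appconf/app.conf
```

Without `-p`, GNU tar applies the extracting user's umask to the restored modes, which is usually fine for source code but matters for configuration files with deliberately restrictive permissions.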

To create a tar.gz archive effectively, navigate to the parent directory containing the folder you want to compress and use `tar -czf output_name.tar.gz folder_to_archive/`. For excluding specific files or patterns, use the --exclude flag: `tar -czf archive.tar.gz --exclude='*.log' --exclude='.git' project/` prevents log files and git metadata from being included. To verify the archive contents before extraction, use `tar -tzf archive.tar.gz` to list all files without decompressing. When extracting, use `tar -xzf archive.tar.gz` to decompress and restore the full directory structure in the current working directory.
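The exclude-and-verify workflow above can be sketched with a hypothetical project tree, confirming that excluded patterns really stay out of the archive before it is shipped anywhere:

```shell
# Hypothetical project with files we do and don't want archived.
mkdir -p project/.git
echo "source" > project/main.c
echo "noise"  > project/build.log

# Create the archive while excluding logs and git metadata.
tar -czf project.tar.gz --exclude='*.log' --exclude='.git' project/

# Inspect the contents without extracting: main.c is in, build.log is not.
tar -tzf project.tar.gz
```

Listing with `-t` before distributing is cheap insurance: it catches both accidentally included junk and typos in exclude patterns, which tar would otherwise silently ignore.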

Why It Matters

Tar.gz compression is critical for efficient data management in Linux environments, often reducing storage requirements by 50-90% for text-heavy content such as logs, source code, and configuration files; already-compressed formats like JPEG or MP4 shrink very little further. For organizations managing terabytes of data, this translates directly into savings on storage infrastructure, backup systems, and bandwidth for data transfers. A 1GB directory of text-based files might compress to 200-400MB, enabling faster transfers and reduced network congestion. In cloud environments where data transfer is billed per gigabyte, effective compression can meaningfully reduce costs.

The tar.gz format is essential across numerous industries and applications with widespread practical impact. Linux distributions like Ubuntu, Fedora, and Debian distribute source code and software packages using tar.gz archives, enabling millions of developers to download and compile software efficiently. Web hosting companies use tar.gz backups to store customer data and enable disaster recovery, protecting against data loss from hardware failures or security breaches. Research institutions and scientific organizations rely on tar.gz for sharing large datasets with collaborators worldwide, with formats like HDF5 and NetCDF often distributed as gzipped tar archives containing gigabytes of scientific data.

Future developments will likely see increased adoption of newer compression algorithms like Zstandard (zstd) and Brotli that offer better compression ratios with faster processing than gzip. Container technologies like Docker already leverage compression for efficient image distribution, suggesting that advanced compression will become standard for any large-scale data management. As data volumes continue to grow exponentially with IoT devices, machine learning datasets, and multimedia content, compression algorithms will become increasingly important for sustainable data management and energy efficiency. Integration with distributed storage systems will enable streaming decompression, allowing users to access archived data without fully extracting entire archive files.

Common Misconceptions

Many users mistakenly believe that tar.gz format requires less disk space during extraction than the original uncompressed folder, but extraction actually requires enough space for both the compressed file and the extracted contents temporarily. During the extraction process, disk space must accommodate the full uncompressed folder plus the original tar.gz file, which can be problematic on storage-constrained systems. Some users experience failures when extracting large archives on systems without adequate free space, leading them to incorrectly conclude that the compression failed. Understanding this distinction helps prevent extraction failures and informs storage planning for backup and distribution scenarios.
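To plan for this, you can measure the exact uncompressed size of an archive before extracting it, by decompressing to a pipe instead of to disk. The folder name here is hypothetical:

```shell
# Build a sample archive (hypothetical data).
mkdir -p data
seq 1 50000 > data/rows.txt
tar -czf data.tar.gz data/

# Exact size in bytes of the uncompressed tar stream, without writing it to disk:
zcat data.tar.gz | wc -c

# Compare against free space on the target filesystem before extracting.
df -h .
```

`gzip -l archive.tar.gz` reports a similar figure more cheaply, but it stores the uncompressed size modulo 2^32, so for archives that may exceed 4GB the `zcat | wc -c` pipeline is the reliable check.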

Another misconception is that tar.gz provides encryption or security protection, when it actually only archives and compresses, with no password protection or data scrambling. Compressed files are not immediately human-readable, but the data inside a tar.gz archive is not encrypted, and anyone with the file can extract and read its contents. For sensitive data requiring confidentiality, encrypt the archive separately with a tool like GPG, or pipe tar's output through an encryption utility such as OpenSSL. This distinction is critical for security-conscious organizations handling confidential information that needs both compression and encryption.
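One hedged way to add encryption on top of tar.gz is to pipe the archive through `openssl enc`; this sketch assumes OpenSSL 1.1.1 or newer (for `-pbkdf2`), and the inline password and folder name are placeholders for illustration only:

```shell
# Hypothetical sensitive folder.
mkdir -p secret
echo "api-key=12345" > secret/creds.txt

# Compress and encrypt in one pipeline; tar writes to stdout with -f -.
# An inline password is shown only for illustration; prefer -pass file: or a prompt.
tar -czf - secret/ | openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass pass:changeme -out secret.tar.gz.enc

# Decrypt and extract in one step.
mkdir -p restored
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:changeme \
    -in secret.tar.gz.enc | tar -xzf - -C restored/
```

GPG (`gpg -c secret.tar.gz`) is the more common choice for files shared with other people, since recipients are more likely to have it; the OpenSSL pipeline avoids writing the unencrypted archive to disk at all.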

Some users assume that all compression utilities can read tar.gz files interchangeably, but compatibility issues can arise with older or non-standard implementations. While most modern tools including Windows, macOS, and Linux can handle tar.gz files, some older systems might require specific utilities or conversion steps. Additionally, some software incorrectly handles symbolic links, file permissions, or special characters in filenames when processing tar archives created on different systems. Testing archive compatibility across target systems before relying on them for critical operations ensures that extraction will succeed without unexpected data loss or corruption.


Related Questions

What's the difference between tar.gz and zip?

Tar.gz is the Linux standard and preserves Unix file permissions and symbolic links, while zip is more universal but can lose that metadata. Zip is better for cross-platform sharing with non-technical users, while tar.gz is superior for Unix/Linux development and system administration. Tar.gz also tends to compress collections of small, similar files better than zip, because gzip compresses the entire tar stream as a single unit, whereas zip compresses each file independently.

How do I extract a tar.gz file?

Use the command `tar -xzf archive.tar.gz` to extract the archive in the current directory, or `tar -xzf archive.tar.gz -C /path/` to extract to a specific location. The -x flag extracts, -z handles gzip decompression, and -f specifies the filename. You can preview contents with `tar -tzf archive.tar.gz` before extracting.
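A minimal extraction example, with hypothetical names, showing the one detail that commonly trips people up: tar does not create the `-C` target directory for you.

```shell
# Build a small archive to extract (hypothetical contents).
mkdir -p src
echo 'int main(void) { return 0; }' > src/main.c
tar -czf src.tar.gz src/

# tar will not create the -C target directory, so make it first.
mkdir -p /tmp/unpacked
tar -xzf src.tar.gz -C /tmp/unpacked

ls /tmp/unpacked/src
```

If the target directory does not exist, tar exits with an error rather than creating it, so scripts should pair `-C` with an explicit `mkdir -p`.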

Can I exclude files when creating a tar.gz?

Yes, use the --exclude flag: `tar -czf archive.tar.gz --exclude='*.log' --exclude='.git' folder/` to omit matching files from the archive. You can use multiple --exclude flags for different patterns, or --exclude-from=file.txt to read patterns from a file. This is useful for excluding build artifacts, cache files, or version control metadata.
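For longer exclusion lists, `--exclude-from` keeps the patterns in a file, one glob per line. This sketch uses a hypothetical project tree and pattern file:

```shell
# Hypothetical project tree with artifacts we want to skip.
mkdir -p app/node_modules
echo "code"  > app/index.js
echo "dep"   > app/node_modules/lib.js
echo "trace" > app/debug.log

# One glob pattern per line; matched against each entry's name.
printf '%s\n' '*.log' 'node_modules' > exclude.txt

tar -czf app.tar.gz --exclude-from=exclude.txt app/

# Verify: index.js is in, node_modules and the log are not.
tar -tzf app.tar.gz
```

Keeping the pattern file in version control makes backup scripts reproducible, much like a `.gitignore` for archives.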

Sources

  1. Wikipedia: Tar (computing) (CC BY-SA 4.0)
  2. Wikipedia: Gzip (CC BY-SA 4.0)
