Why do we call file systems a tree when they can have symbolic links?

Last updated: April 1, 2026

Quick Answer: We call file systems a tree because Unix's original hierarchical directory structure, designed around 1971, was shaped like a mathematical tree: a single root directory, subdirectories as branches, and files as leaves, with each directory having exactly one parent. The term stuck even after symbolic links arrived in 4.2BSD in 1983 and turned the namespace from a strict tree into a directed graph. The underlying inode-based directory hierarchy is still tree-like, and Linux gives up on a path lookup after following 40 symbolic links, so a circular link produces an error rather than an infinite loop. The tree label persists because it remains a useful mental model for everyday file navigation.

Overview: The Tree Metaphor and Its Unix Origins

The term tree for file systems comes directly from computer science's use of the tree data structure — a hierarchical arrangement where each node has exactly one parent (except the root, which has none) and no cycles exist. When Unix introduced its hierarchical file system in the early 1970s, the directory structure genuinely resembled a mathematical tree: a single root directory at the top, directories branching outward into subdirectories, and files sitting at the leaf positions with no paths looping back. The name was natural, intuitive, and technically accurate at the time of its coining.

The problem is that symbolic links (symlinks), introduced in 4.2BSD in 1983, allow a directory entry to point to any other path in the system. A symlink can make a single file or directory reachable from multiple locations, and it can create loops where following a path leads back to where it started. This means the modern Unix, Linux, and macOS file system is, strictly speaking, no longer a pure tree. Mathematically, the namespace is a general directed graph that may contain cycles. Operating system limits do not stop circular symlinks from being created; they only bound how many links a single path lookup will follow.

Yet we still say the file system tree, and this phrasing appears throughout official Linux documentation, university operating systems textbooks, shell tool manpages, and everyday developer conversation. The reason is a combination of historical inertia, the fact that the underlying inode-based structure genuinely is still tree-shaped at the directory level, and the continued usefulness of the tree mental model for navigating, organizing, and reasoning about files in everyday work.

Technical Reality: Trees, DAGs, and Graphs in File System Structure

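To understand why the terminology is both imprecise and defensible, it helps to distinguish three kinds of structure: a tree, in which every node except the root has exactly one parent; a directed acyclic graph (DAG), in which a node may have several parents but no path leads back to its starting point; and a general directed graph, in which cycles are allowed.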

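The key insight is that Unix and Linux file systems operate at two distinct structural layers that behave differently: the inode-based directory hierarchy, which the ban on hard-linking directories keeps tree-shaped, and the path namespace including symbolic links, which forms a general directed graph.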

When Dennis Ritchie and Ken Thompson designed Unix around 1969–1971 at Bell Labs, they explicitly chose a simple, clean hierarchical design as a deliberate improvement over the more complex file system structures in earlier systems. The original Unix Programmer's Manual (1st Edition, November 1971) describes the file system as a hierarchy with a single root, and the directory structure at that time was a genuine tree. Hard links were present from the very beginning, technically making the structure a DAG at the file level — but the critical restriction against hard-linking directories preserved the tree structure at the directory level, which is the structure users actually navigate and reason about.

Symbolic links changed the picture significantly. A symlink is a special file whose entire content is a path string. When the kernel resolves a path and encounters a symlink, it substitutes the symlink's target string and continues resolution from that point. This mechanism allows directories to appear in multiple locations, enables cross-filesystem references that hard links cannot make, and creates the possibility of resolution cycles. Linux handles cycles by maintaining a hop counter during path resolution: the counter increments with each symlink followed, and once it exceeds 40 (the value of MAXSYMLINKS), the system call fails with ELOOP, whose error message reads "Too many levels of symbolic links".
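The substitution-and-counter mechanism can be sketched as a toy model in Python. This is not the kernel's algorithm, only an illustration under simplifying assumptions: the namespace is a plain dictionary, every symlink target is an absolute path, and "." and ".." are not handled. The names resolve, TooManyLinks, and the symlinks dictionary are hypothetical.

```python
MAXSYMLINKS = 40  # the limit the article cites for Linux


class TooManyLinks(Exception):
    """Stands in for the ELOOP error in this toy model."""


def resolve(path, symlinks, max_hops=MAXSYMLINKS):
    """Resolve an absolute path against a toy namespace.

    `symlinks` maps an absolute path to its (absolute) target string.
    """
    hops = 0
    pending = [part for part in path.split("/") if part]
    resolved = ""
    while pending:
        candidate = resolved + "/" + pending.pop(0)
        target = symlinks.get(candidate)
        if target is None:
            # Ordinary component: accept it and move on.
            resolved = candidate
            continue
        hops += 1
        if hops > max_hops:
            raise TooManyLinks(path)
        # Substitute the target string, then keep resolving the
        # components that followed the symlink in the original path.
        pending = [part for part in target.split("/") if part] + pending
        resolved = ""
    return resolved or "/"
```

With a namespace such as {"/current": "/releases/v2"}, resolving "/current/bin/app" walks through the link and ends at "/releases/v2/bin/app". A pair of links that point at each other never settles, so the counter eventually trips and the lookup fails instead of spinning forever.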

Common Misconceptions About File System Structure

Misconception 1: The file system is a tree. This statement is technically imprecise for any modern Unix, Linux, macOS, or Windows system in widespread use today. At the namespace level — the paths you actually traverse — the presence of symbolic links makes the file system a directed graph, not a tree. The more accurate statement is: the underlying inode-based directory hierarchy is tree-shaped (or a DAG due to hard links on files), but the full namespace including symbolic links forms a general directed graph. Most documentation and textbooks simplify this to tree because the concept is more useful for everyday file navigation than graph theory, and for most real-world use the tree model holds well enough.

Misconception 2: Symbolic links break or corrupt the file system. Symlinks do not break anything. They are a deliberate, well-supported, and widely useful feature that has been part of Unix since 1983. The kernel bounds path resolution with the hop counter limit described above. Tools like find, rsync, and tar provide explicit options to control whether symlinks are followed or preserved as links (find -L, rsync -L or --copy-links, tar -h or --dereference), and Python's os.walk defaults to followlinks=False, so it does not descend into symlinked directories unless asked. The graph nature of the namespace is a known, well-managed property of the system, not a defect.
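The effect of the os.walk default can be seen with a small experiment on a Unix-like system. The sketch below builds a throwaway directory containing one real subdirectory and one symlink to it; the helper names walked_dirs and build_example are hypothetical.

```python
import os
import tempfile


def walked_dirs(top, followlinks=False):
    """Return the directories os.walk descends into, relative to top."""
    return sorted(
        os.path.relpath(dirpath, top)
        for dirpath, _dirnames, _filenames in os.walk(top, followlinks=followlinks)
    )


def build_example():
    """Create top/real/ and a symlink top/alias pointing at real."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "real"))
    os.symlink("real", os.path.join(top, "alias"), target_is_directory=True)
    return top
```

Calling walked_dirs(build_example()) visits only the top directory and real, while passing followlinks=True also descends into alias. The symlinked directory is still listed among the directory names in both cases; the default only controls whether the walk enters it.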

Misconception 3: Windows uses a fundamentally different file system structure. Windows NTFS has supported directory junction points since Windows 2000 (released February 2000) and full symbolic links for both files and directories since Windows Vista (released January 2007) via the CreateSymbolicLink Win32 API call. The NTFS directory structure faces exactly the same theoretical graph issues as Unix when symlinks are present, yet Windows documentation also consistently uses the phrase directory tree throughout its official materials. The simplification to tree in naming is a universal convention across all major operating systems, not a quirk specific to the Unix tradition.

Practical Implications for Developers and System Administrators

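Understanding that a file system is a graph rather than a pure tree has concrete consequences for anyone writing tools or scripts that traverse directory structures. A recursive walk that follows symlinks can visit the same directory more than once or fail to terminate, and a path that appears to sit inside one directory may resolve to somewhere else entirely.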

The persistence of the tree metaphor is ultimately a lesson in how naming conventions outlive their technical precision once they become embedded in culture, documentation, education, and tooling ecosystems. The tree model is genuinely useful for understanding how to navigate a file system, reasoning about permission inheritance through directory hierarchies, and organizing directory structures for projects. The graph reality matters when you are writing tools that traverse the file system programmatically, auditing security configurations for symlink-based vulnerabilities, or managing complex deployment environments that use symbolic links extensively for version management or configuration abstraction across environments.
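For tools that do need to follow symlinks during traversal, one common defensive approach is to remember which directories have already been visited, identified by device and inode number. The following is a minimal sketch under that assumption, not a replacement for a hardened traversal library; walk_following_links is a hypothetical name.

```python
import os


def walk_following_links(top):
    """Yield each directory reachable from top once, following symlinks."""
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks to the real directory
        except OSError:
            continue  # dangling link, ELOOP, or permission problem
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited under another name
        seen.add(key)
        yield path
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=True):
                        stack.append(entry.path)
        except OSError:
            continue
```

Because the visited set is keyed on the directory's identity rather than its path, a symlink that points back at an ancestor is recognized and skipped, and the walk terminates even when the namespace contains a cycle.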

Related Questions

What is the difference between a hard link and a symbolic link?

A hard link is a direct directory entry pointing to an inode (the actual on-disk data structure), meaning multiple filenames literally refer to the same physical file. Deleting one name does not remove the data as long as at least one other hard link exists. A symbolic link is an indirect reference: a special file that contains a path string, which the kernel follows when the symlink is encountered during path resolution. Hard links cannot cross file system boundaries or link to directories in most systems, while symbolic links can do both freely. A file with 3 hard links will show a link count of 3 in ls -l output; symlinks are shown as a separate file type indicated by the letter l at the start of the permissions field.
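The difference is easy to observe from Python on a Unix-like system. The sketch below creates a file, a hard link, and a symlink in a temporary directory, then deletes the original name; link_demo and the file names are hypothetical.

```python
import os
import tempfile


def link_demo():
    d = tempfile.mkdtemp()
    original = os.path.join(d, "original.txt")
    hard = os.path.join(d, "hard.txt")
    soft = os.path.join(d, "soft.txt")
    with open(original, "w") as f:
        f.write("data")

    os.link(original, hard)           # second name for the same inode
    os.symlink("original.txt", soft)  # separate file holding a path string

    info = {
        "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(original).st_nlink,
        "symlink_is_link": os.path.islink(soft),
        "symlink_target": os.readlink(soft),
    }

    os.remove(original)  # drop one name; the data survives via hard.txt
    with open(hard) as f:
        info["hard_after_delete"] = f.read()
    # os.path.exists follows symlinks, so a dangling link reports False.
    info["soft_dangling"] = not os.path.exists(soft)
    return info
```

After the original name is removed, the hard link still reads the data, while the symlink is left dangling because the path string it stores no longer resolves.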

How does Linux detect and prevent infinite loops from circular symbolic links?

Linux uses a simple hop counter during path resolution: every time the kernel follows a symbolic link while resolving a path, it increments this counter, and if the count exceeds 40 (the value of MAXSYMLINKS defined in the kernel source), the system call fails with an ELOOP error, reported as "Too many levels of symbolic links". The limit is far higher than legitimate real-world symlink chains normally reach, so it provides a practical safety margin. The counter is scoped to a single path resolution and starts again with each new lookup, so it does not penalize unrelated paths. This approach is simpler and cheaper than tracking visited inodes, which would require memory proportional to the number of links visited.
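The error can be reproduced from user space. The sketch below, which assumes a Unix-like system, creates two symlinks that point at each other and then asks the kernel to resolve one of them; stat_circular_symlinks is a hypothetical helper name.

```python
import errno
import os
import tempfile


def stat_circular_symlinks():
    """Create a -> b and b -> a, then try to resolve a."""
    d = tempfile.mkdtemp()
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    os.symlink("b", a)
    os.symlink("a", b)  # creating the loop is allowed

    os.lstat(a)  # inspecting the link itself works; nothing is followed

    try:
        os.stat(a)  # following the link triggers path resolution
    except OSError as exc:
        return exc.errno, os.strerror(exc.errno)
    return None
```

The returned error number can be compared against errno.ELOOP. Note that the loop is only detected when something tries to follow it; creating the links and inspecting them with lstat both succeed.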

What is an inode in a Unix file system?

An inode (index node) is a data structure stored on disk that contains all metadata about a file except its name and actual content: permissions, ownership (user and group IDs), timestamps for last access, last modification, and last inode change (the ctime field, which is often mistaken for a creation time), file size in bytes, and pointers to the data blocks on disk where the file's content resides. Some file systems, ext4 among them, additionally record a creation time. File names exist only in directory entries, which map a human-readable name to an inode number. This is exactly why hard links work, since multiple names in different locations can map to the identical inode number. In the ext4 file system commonly used in Linux, each inode is 256 bytes by default, and the total number of inodes (and therefore the maximum number of files) is fixed at file system creation time, which can cause a disk to run out of inodes before running out of raw storage space.
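Most of this metadata is visible through Python's os module on a Unix-like system. The sketch below reads a file's inode fields and the inode totals of the file system that holds a path; describe_inode and inode_usage are hypothetical helper names, and some file systems report zero for the inode totals.

```python
import os
import stat


def describe_inode(path):
    """Return the inode metadata for path (no file name is stored there)."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "links": st.st_nlink,
        "mode": stat.filemode(st.st_mode),  # same style as ls -l, e.g. -rw-------
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime": st.st_mtime,
    }


def inode_usage(path="/"):
    """Return total and free inode counts for the file system holding path."""
    vfs = os.statvfs(path)
    return {"total": vfs.f_files, "free": vfs.f_ffree}
```

Watching the free count from inode_usage is one way to notice the situation described above, where a volume holding many small files exhausts its inodes while plenty of block storage remains.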

Why can't you create hard links to directories in most file systems?

Hard links to directories are prohibited in most Unix-like systems to prevent cycles in the directory graph, which would cause tree traversal algorithms used by tools like find, du, and backup utilities to loop infinitely without special cycle detection. If directory hard links were permitted, you could create a structure where directory A contains directory B, and B also contains a hard-linked entry for A, creating a genuine cycle at the inode level that a simple path-based traversal could not safely handle. The POSIX standard explicitly permits implementations to restrict directory hard links. Linux and most BSD variants enforce this restriction, returning "Operation not permitted" for such attempts; macOS does the same for ordinary use, apart from a narrow exception that HFS+ made for Time Machine backups. Only the root user on some legacy Unix systems could create directory hard links, and even then the practice was strongly discouraged in system documentation.
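The restriction can be checked directly. The sketch below, assuming a Unix-like system, attempts to hard-link a directory and reports the error number the kernel returns; try_directory_hard_link is a hypothetical helper name.

```python
import os
import tempfile


def try_directory_hard_link():
    """Attempt to hard-link a directory; return the errno, or None on success."""
    d = tempfile.mkdtemp()
    target = os.path.join(d, "subdir")
    os.mkdir(target)
    try:
        os.link(target, os.path.join(d, "alias"))
    except OSError as exc:
        return exc.errno
    return None
```

On Linux the returned value is expected to equal errno.EPERM, which is the code behind the "Operation not permitted" message; other systems may differ in the exact code they report.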

How do other hierarchical data structures compare to a file system in terms of graph theory?

A file system's directory structure is the most complex of several common hierarchical structures in everyday computing when symbolic links are considered. The DNS (Domain Name System) namespace is a strict tree: each domain name has exactly one parent, with no cross-links or cycles in the hierarchy itself. XML and HTML documents also form strict trees where each element has exactly one parent node. Git's commit history is a DAG: commits can have multiple parents via merge commits, but cycles cannot occur in practice because each commit's hash is computed from content that includes its parents' hashes. Unlike all of these, a Unix file system with symbolic links is a general directed graph, the only one of these common structures that can contain cycles at the namespace level.
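The three categories in this comparison can be told apart mechanically. The sketch below classifies a small directed graph, given as a dictionary from each node to the nodes it points to, as a tree, a DAG, or a cyclic graph. It is an illustrative helper with hypothetical names, it uses recursion and so suits only small graphs, and it treats an empty graph as a DAG.

```python
def classify(edges):
    """Classify a directed graph as 'tree', 'dag', or 'cyclic graph'."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = dict.fromkeys(nodes, WHITE)

    def has_cycle(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True  # edge back into the current path
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    if any(colour[n] == WHITE and has_cycle(n) for n in sorted(nodes)):
        return "cyclic graph"

    indegree = dict.fromkeys(nodes, 0)
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    roots = [n for n in nodes if indegree[n] == 0]
    # A tree has one root, and every other node has exactly one parent.
    if len(roots) == 1 and all(indegree[n] == 1 for n in nodes if n != roots[0]):
        return "tree"
    return "dag"
```

A directory-only hierarchy such as {"/": ["home", "etc"], "home": ["alice"]} classifies as a tree, a file reachable from two directories makes the graph a DAG, and two entries that point at each other, as circular symlinks do, make it a cyclic graph.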

Sources

  1. Unix filesystem, Wikipedia, CC BY-SA 4.0
  2. Symbolic link, Wikipedia, CC BY-SA 4.0
  3. Inode, Wikipedia, CC BY-SA 4.0
  4. Hard link, Wikipedia, CC BY-SA 4.0