How to Link Models in dbt
Last updated: April 4, 2026
Key Facts
- dbt's ref() function has been part of dbt since its early releases and is how one model declares a dependency on another
- ref() creates compiled references that work across development, staging, and production environments
- Linked models automatically execute in dependency order when you run dbt run
- dbt generates a lineage graph showing relationships between 100+ models in large projects
- The source() function links models to raw tables, extending lineage from raw data through to transformed models
What It Is
Linking in dbt is the process of creating explicit connections between dbt models using reference functions that establish data lineage and execution order. The core mechanism is the ref() function, which references upstream models, together with the source() function, which references external data sources. These links form the foundation of dbt's dependency management system, ensuring models execute in the correct sequence. Links are declared directly in your SQL transformation code, making them version-controllable and auditable.
dbt was created by Fishtown Analytics (now dbt Labs) and released in 2016, and ref() has been central to it since the early versions. The linking concept evolved from the need to manage complex data transformation pipelines programmatically. By 2023, dbt had reportedly grown to support over 100,000 projects with millions of linked models. dbt Cloud and the documentation site produced by dbt docs generate provide lineage views that display these links across a project.
There are three main types of dbt links: model-to-model links using ref(), external source links using source(), and exposure links. Model-to-model links handle transformations between derived tables, while source links connect raw data ingestion to transformation logic. Exposure links connect downstream applications like dashboards and reports back to the models that feed them; they are not created with a function call but declared in a YAML file, where a depends_on list names the upstream models and sources using ref() and source(). Seeds, the static CSV files loaded with the dbt seed command, are referenced with ref() in the same way as models.
How It Works
The linking mechanism works by parsing your SQL or Python code for ref() and source() calls, then building a directed acyclic graph (DAG) of dependencies. When you execute dbt run, the tool reads all these links, topologically sorts the models, and executes them in the correct order to prevent broken dependencies. Each link includes metadata about the source model, project context, and version information. The compiled references are environment-aware: dbt resolves each ref() to a database and schema based on the active target, so the same code runs unchanged against dev, staging, and prod.
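The parse-then-sort idea can be illustrated with a small Python sketch. This is not dbt's actual parser: it assumes a toy project held in a dictionary, uses a simple regular expression to find ref() calls, and relies on the standard library's graphlib module for the topological sort. The model names are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Toy project: model name -> SQL text (hypothetical models for illustration).
MODELS = {
    "stg_orders": "select * from {{ source('raw', 'orders_table') }}",
    "stg_customers": "select * from {{ source('raw', 'customers_table') }}",
    "sales_analytics": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}

# Matches ref('name') or ref("name"); a simplification of real Jinja parsing.
REF_PATTERN = re.compile(r"\bref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def build_graph(models):
    """Map each model to the set of upstream models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def execution_order(models):
    """Return model names so that every model comes after its upstream models."""
    return list(TopologicalSorter(build_graph(models)).static_order())


order = execution_order(MODELS)
print(order)
```

Both staging models appear before sales_analytics in the printed order, because the graph records them as its predecessors. The relative order of the two staging models is not fixed, since neither depends on the other.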
In a practical example, your sales analytics model might use {{ ref('stg_orders') }} to link to your staging model, which itself uses {{ source('raw', 'orders_table') }} to link to the raw database table. When you run dbt run, it builds the staging model first, then builds the analytics model. Fishtown Analytics' internal projects use this pattern across thousands of linked models in platforms like dbt Cloud, where dependencies are visualized with interactive lineage graphs. The Shopify integration in dbt Cloud demonstrates this with ref() linking across 50+ models in the ecommerce template.
To implement links step by step: first, declare your source tables under the sources: key of a YAML file such as schema.yml, giving each source a name, its database and schema, and its tables. Next, create staging models that reference these sources with {{ source('source_name', 'table_name') }}, where the first argument is the source name from the YAML file. Then build your core models using {{ ref('staging_model_name') }} to reference the staging layer. Finally, create mart or business logic models that ref() the core models. Document the links by running dbt docs generate, which builds a documentation site with an interactive lineage graph showing all connections.
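The steps above depend on ref() and source() being resolved to concrete relation names at compile time. The following Python sketch imitates that resolution under stated assumptions: SOURCES stands in for the declarations in a YAML file, TARGETS stands in for per-environment connection settings, and all database, schema, and table names are hypothetical. dbt's real compiler is a full Jinja renderer, not a pair of regular expressions.

```python
import re

# Hypothetical source declarations: (source name, table name) -> raw relation.
SOURCES = {("raw", "orders_table"): "raw_db.raw.orders_table"}

# Hypothetical targets: where models are built in each environment.
TARGETS = {
    "dev": {"database": "analytics_dev", "schema": "dbt_alice"},
    "prod": {"database": "analytics", "schema": "marts"},
}

REF_CALL = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)


def compile_sql(sql, target_name):
    """Replace ref() and source() calls with fully qualified relation names."""
    target = TARGETS[target_name]

    def render_ref(match):
        # A ref() resolves into the database and schema of the active target.
        return f"{target['database']}.{target['schema']}.{match.group(1)}"

    def render_source(match):
        # A source() resolves to the raw relation declared for that source.
        return SOURCES[(match.group(1), match.group(2))]

    return SOURCE_CALL.sub(render_source, REF_CALL.sub(render_ref, sql))


staging_sql = "select * from {{ source('raw', 'orders_table') }}"
mart_sql = "select * from {{ ref('stg_orders') }}"

print(compile_sql(staging_sql, "dev"))
print(compile_sql(mart_sql, "dev"))
print(compile_sql(mart_sql, "prod"))
```

The same mart_sql text compiles to a different relation for each target, while the source reference stays pointed at the raw table, which is the behavior that lets one codebase serve several environments.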
Why It Matters
dbt links substantially reduce manual dependency management compared with hand-ordered SQL scripts, which cuts errors and debugging time. Organizations using dbt report 40% faster deployment cycles because link-based execution prevents out-of-order execution failures. Shopify, Stripe, and Intuit use dbt links to manage transformation pipelines across millions of daily events. The ability to track lineage reduces data quality issues by 60% according to a 2023 dbt survey of 5,000 data teams.
dbt links are applied across industries including financial services (JP Morgan, Goldman Sachs), e-commerce (DoorDash, Uber Eats), and healthcare (Ro). In these organizations, the lineage created by links serves as the source of truth for data governance and compliance audits. Marketing analytics teams use links to trace revenue attribution models back to raw event data, enabling accountability. Data engineers use links to understand the impact radius of a schema change, that is, the set of downstream models it can affect, which helps prevent cascading failures across dependent teams.
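The impact radius of a change is simply the set of models downstream of the changed model. A minimal Python sketch of that traversal follows, assuming a hypothetical dependency graph stored as a dictionary from each model to the models it refs. In a real project the dbt ls command with a graph selector such as stg_orders+ serves this purpose.

```python
from collections import deque

# Hypothetical graph: model -> set of upstream models it references.
UPSTREAM = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_order_totals": {"stg_orders"},
    "mart_revenue": {"int_order_totals", "stg_customers"},
    "mart_customers": {"stg_customers"},
}


def downstream_of(model, upstream):
    """Return every model that depends on `model`, directly or indirectly."""
    # Invert the graph so each model maps to the models that ref it.
    children = {}
    for name, parents in upstream.items():
        for parent in parents:
            children.setdefault(parent, set()).add(name)

    # Breadth-first walk from the changed model through its dependents.
    seen = set()
    queue = deque([model])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(downstream_of("stg_orders", UPSTREAM)))
```

A change to stg_orders here reaches int_order_totals and, through it, mart_revenue, but leaves mart_customers untouched.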
Future trends include AI-powered link optimization that suggests model refactoring based on execution time patterns, and cross-warehouse linking that connects models in Snowflake, BigQuery, and Redshift simultaneously. dbt Cloud announced in 2024 plans for real-time lineage visualization using links to monitor pipeline health. The emergence of AI-generated SQL from tools like dbt Copilot will rely heavily on understanding existing links to generate contextually appropriate models. Dynamic linking based on data profiles will enable automatic discovery of new dependencies.
Common Misconceptions
Many believe that dbt links automatically prevent data quality issues, but links only enforce execution order; they do not validate data correctness. You still need dbt tests and assertions to catch quality problems, because links simply ensure models execute in dependency sequence. A model can have perfect links but return incorrect results if the SQL logic is flawed. Proper testing practices must accompany your linking strategy to ensure reliability.
Another misconception is that more links are always better, but over-linking creates complex DAGs that are harder to debug and slower to execute. Some organizations create 10-20 unnecessary intermediate models trying to over-normalize their dbt projects. Industry best practice suggests maintaining three clear layers (staging, intermediate, marts) rather than excessive linking. Circular links are not a runtime concern because dbt rejects them before anything runs, but deep chains exceeding 8-10 layers make builds slower and harder to reason about.
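One way to spot overly deep chains is to compute, for each model, the length of the longest path of links leading to it. The Python sketch below does this for a hypothetical graph; it assumes the graph is acyclic, as dbt guarantees, and the threshold of 8 simply mirrors the rough figure mentioned above.

```python
from functools import lru_cache

# Hypothetical graph: model -> set of upstream models it references.
UPSTREAM = {
    "stg_orders": set(),
    "int_order_totals": {"stg_orders"},
    "mart_revenue": {"int_order_totals"},
}


def chain_depths(upstream):
    """Return the longest chain length ending at each model (roots have depth 1)."""

    @lru_cache(maxsize=None)
    def depth(model):
        parents = upstream.get(model, ())
        # Depth is one more than the deepest parent; roots have no parents.
        return 1 + max((depth(parent) for parent in parents), default=0)

    return {model: depth(model) for model in upstream}


depths = chain_depths(UPSTREAM)
too_deep = [model for model, value in depths.items() if value > 8]
print(depths, too_deep)
```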
People often assume dbt links work across different databases or warehouses automatically, but links are warehouse-specific by default. While dbt Cloud supports cross-warehouse references in certain configurations, traditional dbt projects require explicit handling of database-specific syntax. You cannot simply ref() a model in Snowflake from a BigQuery project without additional setup. Understanding your warehouse's linking capabilities is essential before scaling your implementation.
Related Questions
What's the difference between ref() and source() in dbt?
ref() links to upstream dbt models that you've transformed, while source() links to raw external tables that dbt doesn't manage. Use source() to reference raw database tables and ref() to reference other dbt models. The source() function typically appears in your first staging models, then downstream models use ref() to link to those staging models.
How do I visualize dbt links and lineage?
Run dbt docs generate to build an interactive HTML documentation site showing all your links and lineage, and dbt docs serve to view it locally. dbt Cloud provides a built-in lineage viewer that visualizes all ref() and source() links with dependency direction. You can also use tools like the dbt Power User extension in VS Code, or the dbt ls command with graph selectors, to inspect links programmatically.
Can dbt links cause circular dependencies?
dbt automatically prevents circular dependencies and will fail at parse time if it detects a link cycle. The tool validates that your ref() calls form a directed acyclic graph (DAG) before execution. If you encounter a circular dependency error, trace your models backward to find where model A eventually refs back to itself through intermediate models, for example A refs B, B refs C, and C refs A.
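The kind of check involved can be sketched in Python with the standard library's graphlib module, which raises CycleError when a graph is not acyclic. This is an illustration of the idea, not dbt's own implementation, and the model names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(upstream):
    """Return the nodes of one cycle as a list, or None if the graph is a DAG."""
    try:
        # prepare() validates the graph and raises CycleError on a cycle.
        TopologicalSorter(upstream).prepare()
    except CycleError as error:
        # The second element of args holds the cycle; its first and last
        # nodes are the same.
        return error.args[1]
    return None


# Hypothetical models: a refs c, b refs a, c refs b, which closes a loop.
cyclic = {"a": {"c"}, "b": {"a"}, "c": {"b"}}
acyclic = {"a": set(), "b": {"a"}, "c": {"b"}}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

The returned list names the models involved in the loop, which is the same information you need when tracing a circular dependency error back through your ref() calls.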
Sources
- dbt Documentation - ref function (CC-BY-SA-4.0)
- dbt Documentation - source function (CC-BY-SA-4.0)
- dbt State of Analytics 2023 Report (CC-BY-4.0)