What is a dbt DAG?

A dbt DAG (directed acyclic graph) is the dependency graph of a dbt project's models. Every time one model references another with the ref() function, dbt records an edge, and the resulting graph determines build order, powers lineage visualization, and defines what is upstream and downstream of any model. Because the DAG is derived from the code itself, it is always current, unlike a hand-drawn architecture diagram.

What do staging, intermediate, and mart layers mean in dbt?

They are dbt's conventional model layers. Staging models (stg_) clean and standardize one source table each: renaming, typing, and light fixes. Intermediate models (int_) combine and enrich staged data with joins and business logic. Marts are the consumer-facing layer that BI tools and analysts query: fact tables (fct_) hold events and dimension tables (dim_) hold context. Data flows left to right through these layers in the DAG.

What does ref() do in dbt?

ref() is how one dbt model refers to another: you select from {{ ref('stg_payments') }} instead of hard-coding a table name. It does two jobs: it resolves to the correct schema and table for the environment you are running in, and it declares a dependency edge that dbt uses to build the DAG. Every arrow in a dbt lineage graph exists because of a ref() call in someone's SQL.

How do dbt tests work?

dbt tests are assertions attached to models and columns that run against the built data: not_null and unique on keys, accepted_values on status codes, relationships to verify foreign keys resolve, plus custom SQL tests for business rules. They run on every build, so a change that violates an assumption fails immediately rather than surfacing as a wrong number in a dashboard weeks later. They are the data layer's regression suite.

Why should an analyst be able to read a dbt DAG?

Because the DAG answers the questions analysts get asked. Where does this dashboard number come from? Walk upstream. What breaks if the source field changes? Walk downstream. Why do two marts disagree? Compare their upstream paths. An analyst who can open the lineage graph and read the models answers these directly instead of queuing them for a data engineer, which is the same self-sufficiency SQL provides.

Reading a dbt DAG: The Map of How Your Data Is Built

Written by Ahmed, a Senior Technical Business Analyst with 10+ years in banking and payments delivery, at Analyst Engineering.

A dbt DAG is the dependency graph of your data transformations, built automatically from the ref() calls in the SQL itself. That makes it the rarest kind of diagram: one that cannot go stale. Learn to read it and you can answer where any number comes from and what any change will break.

A dbt DAG is the directed acyclic graph of a dbt project’s models: every model is a node, and every time one model selects from another via the ref() function, dbt records an edge. dbt is the transformation framework from dbt Labs that turned SQL files plus version control into the standard way analytics teams build warehouses. It uses this graph to decide build order, to power its lineage view, and to answer the two questions that matter: what is upstream of this model (where its numbers come from) and what is downstream (what breaks if it changes). Because the graph is derived from the code, it is always exactly what the code says, unlike every architecture diagram you have ever distrusted. Reading one takes SQL literacy plus about twenty minutes of convention. Here are the conventions.
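Before the conventions, a quick aside on what "decide build order" means in practice. The sketch below is not dbt's implementation; it is a minimal illustration using Python's standard graphlib. The five-model project is hypothetical: stg_accounts, and the assumption that dim_accounts reads from it, are invented for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical project: each model maps to the set of models it selects from.
# stg_accounts and the dim_accounts dependency are assumptions for illustration.
DEPENDS_ON = {
    "stg_payments": set(),
    "stg_accounts": set(),
    "int_payments_enriched": {"stg_payments", "stg_accounts"},
    "fct_payments": {"int_payments_enriched"},
    "dim_accounts": {"stg_accounts"},
}


def build_order(depends_on):
    """Return one valid build order: every model comes after all of its parents."""
    return list(TopologicalSorter(depends_on).static_order())


def is_acyclic(depends_on):
    """True when a build order exists, i.e. no model depends on itself via a loop."""
    try:
        build_order(depends_on)
    except CycleError:
        return False
    return True


print(build_order(DEPENDS_ON))
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # a two-model loop is not a DAG
```

Several orders can be valid at once (the two staging models can be built in either order), which is why the sketch talks about "one valid build order" rather than "the" order. The "acyclic" in DAG is what guarantees that at least one exists.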

What does a dbt DAG look like?

A typical project’s DAG reads left to right through four conventional layers. For a payments warehouse, they look like this.

Sources are the raw tables that dbt reads but does not build: the landing zone, often the bronze layer of a medallion architecture. Staging models (stg_) clean one source table each: rename, type, standardize, nothing clever. Intermediate models (int_) do the joins and enrichment, here attaching account context to payments. Marts are what consumers query: fct_payments holds the events and dim_accounts holds the context, which is exactly the star schema shape with the construction lines still visible.
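Because the layers are encoded in the model names, the left-to-right rule can be checked mechanically. The following is a rough sketch under stated assumptions, not a dbt feature: the numeric layer ranks, the rule that anything without a known prefix counts as a source, and the stg_refunds model are all invented for illustration.

```python
# Prefixes come from the naming convention; the numeric ranks are an assumption.
LAYER_BY_PREFIX = {"stg_": 1, "int_": 2, "fct_": 3, "dim_": 3}
LAYER_NAMES = {0: "source", 1: "staging", 2: "intermediate", 3: "mart"}


def layer_of(name):
    """Classify a node by prefix; anything unprefixed is assumed to be a source."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if name.startswith(prefix):
            return layer
    return 0


def backward_edges(edges):
    """Return (parent, child) edges that flow right to left across the layers."""
    return [(p, c) for p, c in edges if layer_of(p) > layer_of(c)]


EDGES = [
    ("raw.payments", "stg_payments"),
    ("stg_payments", "int_payments_enriched"),
    ("int_payments_enriched", "fct_payments"),
    ("fct_payments", "stg_refunds"),  # hypothetical, deliberately wrong: staging reads a mart
]

print(LAYER_NAMES[layer_of("int_payments_enriched")])
print(backward_edges(EDGES))
```

An edge within the same layer (one intermediate model reading another) is allowed here; only edges that point back toward an earlier layer are flagged.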

Every arrow exists because of one line of SQL. When int_payments_enriched contains from {{ ref('stg_payments') }}, that ref() call both resolves the table name for the environment and declares the edge. The DAG is not documentation about the code; it is the code, projected.
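To see how a line of SQL becomes an arrow, here is a toy Python sketch that pulls ref() calls out of a model's SQL and turns them into edges. It is only an illustration: dbt renders the Jinja properly rather than pattern-matching, and this regex handles only the single-argument form of ref(). The SQL body, the column names, and stg_accounts are hypothetical.

```python
import re

# Toy pattern for {{ ref('model_name') }}; an approximation, not dbt's parser.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def extract_edges(model_name, sql):
    """Return the (parent, child) edges implied by the ref() calls in one model."""
    parents = sorted(set(REF_PATTERN.findall(sql)))
    return [(parent, model_name) for parent in parents]


# Hypothetical SQL for int_payments_enriched; columns are invented.
INT_PAYMENTS_ENRICHED_SQL = """
select p.*, a.account_type
from {{ ref('stg_payments') }} as p
left join {{ ref('stg_accounts') }} as a
  on p.account_id = a.account_id
"""

edges = extract_edges("int_payments_enriched", INT_PAYMENTS_ENRICHED_SQL)
print(edges)
# [('stg_accounts', 'int_payments_enriched'), ('stg_payments', 'int_payments_enriched')]
```

Two ref() calls, two arrows into int_payments_enriched. A hard-coded table name in the same query would produce no arrow at all, which is why hard-coding makes a dependency invisible to the DAG.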

What can you do once you can read it?

Walk upstream to explain a number. The settlement dashboard’s total looks wrong; the DAG shows it reads fct_payments, which reads int_payments_enriched, which reads two staging models. That is your investigation path, and combined with SQL at each node, it is the point-of-divergence method applied to data: check each hop until the number stops being right. This upstream walk is column-level lineage’s table-level cousin, and dbt gives it to you for free.
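The upstream walk is a plain graph traversal. As a minimal sketch, assuming a hypothetical parent map for the payments example (settlement_dashboard, stg_accounts, and raw.accounts are illustrative names), a breadth-first search returns the investigation path nearest hop first:

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes it reads from.
PARENTS = {
    "settlement_dashboard": ["fct_payments"],
    "fct_payments": ["int_payments_enriched"],
    "int_payments_enriched": ["stg_payments", "stg_accounts"],
    "stg_payments": ["raw.payments"],
    "stg_accounts": ["raw.accounts"],
}


def upstream(node, parents):
    """Return every ancestor of node, nearest first (breadth-first)."""
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(upstream("settlement_dashboard", PARENTS))
# ['fct_payments', 'int_payments_enriched', 'stg_payments', 'stg_accounts',
#  'raw.payments', 'raw.accounts']
```

That list is the order in which to check the number: start at the mart and work back toward the source. dbt's own node selection syntax expresses the same walk with a leading plus sign, as in +fct_payments.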

Walk downstream to scope a change. The core system is renaming a field in raw.payments. Downstream of stg_payments in the DAG sits everything that change can break, through to the dashboard. That is impact analysis as a graph query instead of a guess, the same payoff a traceability matrix gives you for requirements, and it is how you avoid being the team that discovers a breaking change from a broken Monday report.
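The downstream walk is the same traversal with the arrows followed forward. Here is a rough Python sketch, assuming a hypothetical edge list for the payments example (stg_accounts, raw.accounts, settlement_dashboard, and the dim_accounts dependency are illustrative):

```python
from collections import defaultdict, deque

# Hypothetical (parent, child) edges for the payments warehouse.
EDGES = [
    ("raw.payments", "stg_payments"),
    ("raw.accounts", "stg_accounts"),
    ("stg_payments", "int_payments_enriched"),
    ("stg_accounts", "int_payments_enriched"),
    ("stg_accounts", "dim_accounts"),
    ("int_payments_enriched", "fct_payments"),
    ("fct_payments", "settlement_dashboard"),
]


def downstream(node, edges):
    """Return everything that depends on node, directly or indirectly."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, order, queue = {node}, [], deque([node])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("raw.payments", EDGES))
# ['stg_payments', 'int_payments_enriched', 'fct_payments', 'settlement_dashboard']
```

Note what is absent from the result: dim_accounts does not depend on raw.payments in this sketch, so the rename cannot break it. Knowing what is out of scope is as useful as knowing what is in. In dbt's node selection syntax the equivalent is a trailing plus sign, as in stg_payments+.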

Read the diff to understand a change. dbt projects live in Git, so every change to the warehouse’s logic is a readable diff: which model changed, what the transformation used to say, what it says now. The analyst who reads the diff knows why the numbers moved this week; the one who cannot waits to be told.

Why do dbt tests matter as much as the DAG?

Because the DAG tells you how data is built, and the tests tell you whether the assumptions held. dbt tests are assertions attached to nodes: not_null and unique on fct_payments.uetr, accepted_values on the status column (the state machine, enforced), relationships verifying every payment’s account exists in dim_accounts, plus custom SQL tests for business rules like “no settled payment has a null settlement date.” They run on every build, so a violation fails the pipeline instead of surfacing as a quietly wrong dashboard three weeks later.
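The logic of these assertions is simple enough to sketch in a few lines of Python. This is an in-memory approximation, not how dbt runs them (dbt compiles each test to a SQL query against the warehouse and treats any returned row as a failure). The sample rows, the status values, and the account ids are all invented for illustration.

```python
from collections import Counter


def not_null(rows, column):
    """Failing rows: the column is missing a value."""
    return [r for r in rows if r[column] is None]


def unique(rows, column):
    """Failing values: non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows, column, values):
    """Failing rows: a non-null value outside the allowed set."""
    return [r for r in rows if r[column] is not None and r[column] not in values]


def relationships(rows, column, parent_rows, parent_column):
    """Failing rows: a non-null key with no match in the parent table."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [r for r in rows if r[column] is not None and r[column] not in parent_keys]


# Hypothetical data; every value here is made up.
FCT_PAYMENTS = [
    {"uetr": "u-001", "status": "settled", "account_id": "A1", "settlement_date": "2024-03-01"},
    {"uetr": "u-002", "status": "pending", "account_id": "A2", "settlement_date": None},
    {"uetr": "u-002", "status": "settled", "account_id": "A9", "settlement_date": None},
    {"uetr": None, "status": "unknown", "account_id": "A1", "settlement_date": None},
]
DIM_ACCOUNTS = [{"account_id": "A1"}, {"account_id": "A2"}]
ACCEPTED_STATUSES = {"pending", "settled", "rejected"}  # assumed state machine

failures = {
    "not_null_uetr": not_null(FCT_PAYMENTS, "uetr"),
    "unique_uetr": unique(FCT_PAYMENTS, "uetr"),
    "accepted_values_status": accepted_values(FCT_PAYMENTS, "status", ACCEPTED_STATUSES),
    "relationships_account_id": relationships(
        FCT_PAYMENTS, "account_id", DIM_ACCOUNTS, "account_id"
    ),
    # Custom business rule: no settled payment has a null settlement date.
    "settled_has_settlement_date": [
        r for r in FCT_PAYMENTS
        if r["status"] == "settled" and r["settlement_date"] is None
    ],
}


def build_passes(results):
    """The build is green only when every test returns zero failures."""
    return all(len(found) == 0 for found in results.values())


for name, found in failures.items():
    print(name, len(found))
print("build passes:", build_passes(failures))
```

Every test has the same shape: a query for the rows that should not exist. An empty result means the assumption held.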

This should sound familiar, because it is regression testing for data: a codified suite of what must always hold, run on every change, growing every time a defect teaches you a new assertion. A QA analyst who can read the DAG and its tests can review a data change the way they review any change: what does it touch, what depends on it, and what proves it still works.

The takeaway

A dbt DAG is the dependency graph of your transformations, built from ref() calls, so it is always current by construction. Read it left to right through the conventional layers (sources, staging, intermediate, marts) and it gives you the two walks that answer most data questions: upstream to explain a number, downstream to scope a change. The tests attached to its nodes are the data regression suite that keeps every build honest.

If you can read SQL, you can read a dbt project, and the DAG is the map that makes the whole warehouse navigable.
