>_ Analyst Engineering

Reading a dbt DAG: The Map of How Your Data Is Built

Written by Ahmed, a Senior Technical Business Analyst with 10+ years in banking and payments delivery, at Analyst Engineering.

Key takeaways

  • A dbt DAG is the dependency graph of your transformations, built automatically from the ref() calls in your SQL, so it is always exactly what the code says, never a stale diagram.
  • The conventional layers read left to right: sources, staging models that clean one source each, intermediate models that join and enrich, and marts (fct_ and dim_ tables) that consumers query.
  • Because the DAG is real, it gives you real answers: upstream of a model is where its numbers come from, downstream is what breaks if it changes.
  • dbt tests attach to nodes in the graph: not_null, unique, accepted_values, and relationships assertions that run on every dbt build, which amounts to regression testing for data.

A dbt DAG is the dependency graph of your data transformations, built automatically from the ref() calls in the SQL itself. That makes it the rarest kind of diagram: one that cannot go stale. Learn to read it and you can answer where any number comes from and what any change will break.

A dbt DAG is the directed acyclic graph of a dbt project’s models: every model is a node, and every time one model selects from another via the ref() function, dbt records an edge. Raw tables enter the graph the same way, through source() calls. dbt is the transformation framework from dbt Labs that turned SQL files plus version control into the standard way analytics teams build warehouses. It uses this graph to decide build order, to power its lineage view, and to answer the two questions that matter: what is upstream of this model (where its numbers come from) and what is downstream (what breaks if it changes). Because the graph is derived from the code, it is always exactly what the code says, which makes it unlike every architecture diagram you have ever distrusted. Reading one takes SQL literacy plus about twenty minutes of convention. Here are the conventions.
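To make the build-order point concrete, here is a minimal Python sketch. It is not dbt’s implementation: it assumes a hand-written dependency map for the payments models this guide uses as its example, and it leans on the standard library’s graphlib to produce an order in which every node comes after everything it selects from. DEPENDS_ON, build_order, and is_acyclic are illustrative names.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each node lists the nodes it selects from.
# In a real project dbt derives this from ref() and source() calls.
DEPENDS_ON = {
    "stg_payments": {"raw.payments"},
    "stg_accounts": {"raw.accounts"},
    "int_payments_enriched": {"stg_payments", "stg_accounts"},
    "fct_payments": {"int_payments_enriched"},
    "dim_accounts": {"stg_accounts"},
}


def build_order(depends_on):
    """Return the nodes so that each one appears after all of its parents."""
    return list(TopologicalSorter(depends_on).static_order())


def is_acyclic(depends_on):
    """True when the graph has no cycle, i.e. a valid build order exists."""
    try:
        build_order(depends_on)
    except CycleError:
        return False
    return True


order = build_order(DEPENDS_ON)
print(order)
```

The "acyclic" in DAG is what makes a build order possible at all: if two models selected from each other, no ordering would satisfy both, which is why is_acyclic returns False for a map such as {"a": {"b"}, "b": {"a"}}.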

What does a dbt DAG look like?

A typical project’s DAG reads left to right through conventional layers, here for a payments warehouse:

Figure: A dbt DAG, sources through staging and intermediate to marts. Left to right: source nodes raw.payments and raw.accounts feed staging models stg_payments and stg_accounts. Both staging models feed int_payments_enriched. The intermediate model feeds the mart fct_payments; stg_accounts also feeds dim_accounts. The marts feed a settlement dashboard. Every arrow corresponds to a ref() call (or, for the source nodes, a source() call) in a model's SQL.

Sources are the raw tables dbt reads but does not build, the landing zone, often the bronze layer of a medallion architecture. Staging models (stg_) clean one source table each: rename columns, cast types, standardize values, and nothing cleverer. Intermediate models (int_) do the joins and enrichment, here attaching account context to payments. Marts are what consumers query: fct_payments holding the events and dim_accounts holding the context, which is exactly the star schema shape with the construction lines still visible.
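The naming conventions above are regular enough to read mechanically. As a rough sketch, the Python below sorts node names into layers by prefix. Treating "raw." as the marker for a source is an assumption drawn from this guide’s example names, not a dbt rule, and layer_of and group_by_layer are hypothetical helpers.

```python
# Prefix conventions described in the text. The "raw." source marker is an
# assumption based on this guide's example, not something dbt enforces.
LAYER_BY_PREFIX = {
    "stg_": "staging",
    "int_": "intermediate",
    "fct_": "mart",
    "dim_": "mart",
}


def layer_of(node_name):
    """Guess a node's layer from its name alone."""
    if node_name.startswith("raw."):
        return "source"
    for prefix, layer in LAYER_BY_PREFIX.items():
        if node_name.startswith(prefix):
            return layer
    return "unclassified"


def group_by_layer(node_names):
    """Bucket node names by layer, keeping the input order within each bucket."""
    layers = {}
    for name in node_names:
        layers.setdefault(layer_of(name), []).append(name)
    return layers


NODES = [
    "raw.payments", "raw.accounts", "stg_payments", "stg_accounts",
    "int_payments_enriched", "fct_payments", "dim_accounts",
]
layers = group_by_layer(NODES)
print(layers)
```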

Every arrow exists because of one line of SQL. When int_payments_enriched contains from {{ ref('stg_payments') }}, that ref() call both resolves the table name for the environment and declares the edge. The arrows out of the source nodes are declared the same way, by source() calls in the staging models. The DAG is not documentation about the code; it is the code, projected.
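The two jobs of ref() can be imitated in a few lines of Python. This is a toy under stated assumptions: dbt itself compiles models with Jinja, whereas this sketch uses a regular expression that only recognizes the single-argument form of ref() and ignores source() calls. The model SQL, the column names, and the dev_analytics schema are all hypothetical.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
# Assumption: single-argument ref() only; this is a toy, not dbt's parser.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical body for int_payments_enriched; column names are invented.
INT_PAYMENTS_ENRICHED_SQL = """
select p.*, a.account_type
from {{ ref('stg_payments') }} as p
join {{ ref('stg_accounts') }} as a
  on p.account_id = a.account_id
"""


def declared_edges(model_name, sql):
    """Job one: every ref() in the SQL declares an edge parent -> model."""
    return [(parent, model_name) for parent in REF_PATTERN.findall(sql)]


def resolve_refs(sql, schema):
    """Job two: swap each ref() for a schema-qualified table name."""
    return REF_PATTERN.sub(lambda match: f"{schema}.{match.group(1)}", sql)


edges = declared_edges("int_payments_enriched", INT_PAYMENTS_ENRICHED_SQL)
dev_sql = resolve_refs(INT_PAYMENTS_ENRICHED_SQL, "dev_analytics")
print(edges)
print(dev_sql)
```

Because the same text yields both the edge list and the runnable SQL, the two cannot drift apart, which is the sense in which the DAG is the code, projected.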

What can you do once you can read it?

Walk upstream to explain a number. The settlement dashboard’s total looks wrong; the DAG shows it reads fct_payments, which reads int_payments_enriched, which reads two staging models. That is your investigation path, and combined with SQL at each node, it is the point-of-divergence method applied to data: check each hop until you reach the last node where the number is still right, and the defect sits in the next step downstream. This upstream walk is column-level lineage’s table-level cousin, and dbt gives it to you for free.
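The upstream walk is an ordinary graph traversal. The Python sketch below assumes the same hand-written dependency map as the example warehouse, with settlement_dashboard as a hypothetical identifier for the dashboard node, and lists everything that feeds a node, nearest ancestors first.

```python
from collections import deque

# Hypothetical dependency map for the example warehouse: node -> its parents.
DEPENDS_ON = {
    "stg_payments": {"raw.payments"},
    "stg_accounts": {"raw.accounts"},
    "int_payments_enriched": {"stg_payments", "stg_accounts"},
    "fct_payments": {"int_payments_enriched"},
    "dim_accounts": {"stg_accounts"},
    "settlement_dashboard": {"fct_payments", "dim_accounts"},
}


def upstream(node, depends_on):
    """Breadth-first walk over parents: every node that feeds `node`."""
    seen = {node}
    queue = deque([node])
    order = []
    while queue:
        current = queue.popleft()
        # sorted() only makes the output deterministic across runs.
        for parent in sorted(depends_on.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


path = upstream("fct_payments", DEPENDS_ON)
print(path)
```

The returned list is the investigation path for a wrong number in fct_payments. In a real project the dbt CLI can answer the same question with its graph operators, for example dbt ls --select +fct_payments.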

Walk downstream to scope a change. The core system is renaming a field in raw.payments. Downstream of stg_payments in the DAG sits everything that change can break, through to the dashboard. That is impact analysis as a graph query instead of a guess, the same payoff a traceability matrix gives you for requirements, and it is how you avoid being the team that discovers a breaking change from a broken Monday report.
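Impact analysis is the same traversal run in the other direction. As a sketch under the same assumptions (a hand-written dependency map, with settlement_dashboard as a hypothetical name for the dashboard node), the Python below inverts the parent map and collects everything that depends on a given node.

```python
from collections import defaultdict, deque

# Hypothetical dependency map for the example warehouse: node -> its parents.
DEPENDS_ON = {
    "stg_payments": {"raw.payments"},
    "stg_accounts": {"raw.accounts"},
    "int_payments_enriched": {"stg_payments", "stg_accounts"},
    "fct_payments": {"int_payments_enriched"},
    "dim_accounts": {"stg_accounts"},
    "settlement_dashboard": {"fct_payments", "dim_accounts"},
}


def invert(depends_on):
    """Turn node -> parents into node -> children."""
    children = defaultdict(set)
    for child, parents in depends_on.items():
        for parent in parents:
            children[parent].add(child)
    return children


def downstream(node, depends_on):
    """Breadth-first walk over children: everything a change to `node` can reach."""
    children = invert(depends_on)
    seen = {node}
    queue = deque([node])
    order = []
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, ())):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impact = downstream("raw.payments", DEPENDS_ON)
print(impact)
```

Note what the walk leaves out: dim_accounts does not depend on raw.payments, so the field rename cannot break it. With dbt itself, the trailing-plus selector gives the equivalent listing for a model, for example dbt ls --select stg_payments+.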

Read the diff to understand a change. dbt projects live in Git, so every change to the warehouse’s logic is a readable diff: which model changed, what the transformation used to say, what it says now. The analyst who reads the diff knows why the numbers moved this week; the one who cannot waits to be told.

Why do dbt tests matter as much as the DAG?

Because the DAG tells you how data is built, and the tests tell you whether the assumptions held. dbt tests are assertions attached to nodes: not_null and unique on fct_payments.uetr, accepted_values on the status column (the state machine, enforced), relationships verifying every payment’s account exists in dim_accounts, plus custom SQL tests for business rules like “no settled payment has a null settlement date.” With dbt build they run alongside the models on every build, so a violation fails the pipeline instead of surfacing as a quietly wrong dashboard three weeks later.
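To show what these assertions actually check, here is a Python sketch of their logic over a few in-memory rows. In dbt the tests are declared in the project and compiled to SQL that selects failing records; this sketch only mirrors that "return the failures, pass on zero" shape. The sample rows, the accepted status values, and every function name here are hypothetical.

```python
from collections import Counter

# Invented sample data; the third row is deliberately broken in three ways.
FCT_PAYMENTS = [
    {"uetr": "u-001", "status": "settled", "account_id": "A1", "settlement_date": "2024-01-02"},
    {"uetr": "u-002", "status": "pending", "account_id": "A2", "settlement_date": None},
    {"uetr": "u-002", "status": "settled", "account_id": "A9", "settlement_date": None},
]
DIM_ACCOUNTS = [{"account_id": "A1"}, {"account_id": "A2"}]
ACCEPTED_STATUSES = {"pending", "settled", "rejected"}  # assumed state machine


def not_null(rows, column):
    """Rows where the column is missing or null."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Rows whose non-null value in the column appears more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [row for row in rows if counts.get(row.get(column), 0) > 1]


def accepted_values(rows, column, values):
    """Rows with a non-null value outside the allowed set."""
    return [row for row in rows
            if row.get(column) is not None and row[column] not in values]


def relationships(rows, column, parent_rows, parent_column):
    """Rows whose non-null key has no match in the parent table."""
    parent_keys = {parent[parent_column] for parent in parent_rows}
    return [row for row in rows
            if row.get(column) is not None and row[column] not in parent_keys]


def settled_without_date(rows):
    """Custom rule: no settled payment has a null settlement date."""
    return [row for row in rows
            if row["status"] == "settled" and row["settlement_date"] is None]


def run_tests():
    """Run every assertion and report the number of failing rows per test."""
    failures = {
        "not_null_uetr": not_null(FCT_PAYMENTS, "uetr"),
        "unique_uetr": unique(FCT_PAYMENTS, "uetr"),
        "accepted_values_status": accepted_values(FCT_PAYMENTS, "status", ACCEPTED_STATUSES),
        "relationships_account": relationships(FCT_PAYMENTS, "account_id", DIM_ACCOUNTS, "account_id"),
        "settled_has_date": settled_without_date(FCT_PAYMENTS),
    }
    return {name: len(rows) for name, rows in failures.items()}


results = run_tests()
print(results)
```

A test passes when its failure count is zero. Here the duplicated uetr, the orphan account A9, and the settled payment with no settlement date each surface as a non-zero count, which is the point at which a build would stop.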

This should sound familiar, because it is regression testing for data: a codified suite of what must always hold, run on every change, growing every time a defect teaches you a new assertion. A QA analyst who can read the DAG and its tests can review a data change the way they review any change: what does it touch, what depends on it, and what proves it still works.

The takeaway

A dbt DAG is the dependency graph of your transformations, built from ref() calls, so it is always current by construction. Read it left to right through the conventional layers (sources, staging, intermediate, marts) and it gives you the two walks that answer most data questions: upstream to explain a number, downstream to scope a change. The tests attached to its nodes are the data regression suite that keeps every build honest.

If you can read SQL, you can read a dbt project, and the DAG is the map that makes the whole warehouse navigable.

About the author

Analyst Engineering is written by Ahmed, a Senior Technical Business Analyst with 10+ years of banking and payments delivery experience: ISO 20022 and SWIFT messaging, payments API integration, Kafka event validation, and production support. Every article comes from real delivery work, and each one is reviewed and updated as tools and standards change.
