Star Schema vs Snowflake Schema: Which Shape Fits Your Warehouse
Written by Ahmed at Analyst Engineering, a Senior Technical Business Analyst with 10+ years in banking and payments delivery.
Key takeaways
- Both schemas put measurable events in a central fact table surrounded by descriptive dimension tables. The only structural difference is whether the dimensions are denormalized (star) or normalized into sub-dimensions (snowflake).
- A star schema trades storage for simplicity: one join per dimension, queries analysts can actually write, and the shape most BI tools expect.
- A snowflake schema trades simplicity for normalization: less redundancy and cleaner hierarchy maintenance, at the cost of chains of joins.
- For analyst-facing marts the star wins almost every time. Snowflaking earns its keep in large, shared dimensions with deep hierarchies that change independently.
Star and snowflake schemas both organize a warehouse around a central fact table and its dimensions. The whole difference is whether the dimensions are denormalized into single wide tables (star) or normalized into chains of sub-dimensions (snowflake), and that one choice decides how hard the warehouse is to query.
A star schema and a snowflake schema model the same thing, measurable events in a central fact table surrounded by descriptive dimension tables, and differ in exactly one structural decision: in a star, each dimension is one denormalized table joined directly to the fact, while in a snowflake, dimensions are normalized into multiple related tables, so reaching an attribute can take a chain of joins. That single decision cascades into everything an analyst feels: how hard queries are to write, how well BI tools cope, how much redundancy the warehouse carries, and where a hierarchy like branch to region gets maintained. Knowing both shapes, and which one you are looking at, is core data literacy for anyone who queries a warehouse with SQL.
I care about this distinction as a payments analyst because the reporting side of every payment platform lands in one of these shapes. The payments you trace through the message flow during the day become rows in a fct_payments table by night, and whether the surrounding dimensions are starred or snowflaked decides how painful your Monday morning query is.
What do the two shapes actually look like?
Both schemas center on a fact table at a declared grain, one row per payment, with dimensions around it. Here is the same payments warehouse in both shapes:
In the star, dim_account is one wide table: the account attributes, its branch attributes, and its region attributes all denormalized together, so region totals are one join away from the fact. In the snowflake, that same dimension normalizes into a chain:
Same data, same grain, same measures. The difference is entirely in how far an attribute sits from the fact.
Star vs snowflake at a glance
| Dimension of comparison | Star schema | Snowflake schema |
|---|---|---|
| Dimension tables | One wide, denormalized table per dimension | Normalized into chains of sub-dimension tables |
| Joins per query | One hop from fact to each dimension | Multi-hop chains to reach outer attributes |
| Redundancy | Higher: region name repeats on every account row | Lower: each attribute stored once |
| Query writing | Simple, predictable, analyst-friendly | Harder: you must know the join path |
| BI tool fit | The shape most tools and vendors assume | Awkward: generated SQL gets complex |
| Hierarchy maintenance | Updates touch many denormalized rows | Hierarchy maintained in exactly one place |
| Best for | Analyst-facing marts and reporting | Very large shared dimensions with deep hierarchies |
Why does the star usually win for analysts?
The star wins because the person paying the schema’s costs is the analyst writing queries, and the star makes every query one predictable shape: the fact joined one hop to each dimension you need. Group by any attribute, filter by any attribute, and the join path is always the same. That predictability is why the star is the default recommendation of dimensional modeling since Ralph Kimball formalized it, and why BI tools generate their SQL against it.
The snowflake’s storage savings, meanwhile, bought more when storage was expensive. In a modern columnar warehouse, a repeated region name compresses to almost nothing, so the redundancy the snowflake eliminates is cheap, while the join chains it introduces are a permanent tax on every query and every person who writes one. When I find a warehouse where simple questions need five joins, the cause is usually snowflaking that saved megabytes and cost months.
The honest case for snowflaking is narrower but real: a genuinely huge dimension where redundancy hurts, or a hierarchy that several fact tables share and that must be maintained in exactly one place, branch reorganizations being a classic banking example. Then normalizing that one dimension is a deliberate, local decision, not a philosophy applied to the whole warehouse.
How do you read a schema you did not design?
Read a warehouse schema the way you read any system: by testing it. Find the fact table, it is the long table whose name or row count gives it away, and establish its grain: what does one row mean? One payment? One payment per day? Getting the grain wrong is the root of most wrong numbers in reports. Then walk the foreign keys outward and see whether each dimension is one hop (star) or a chain (snowflake), which tells you how to write your joins.
Two checks pay for themselves every time. First, confirm the grain by counting: if fct_payments is one row per payment, the count for yesterday should match the source system’s count, which is a small reconciliation. Second, look up every field you use in the data dictionary, because amount in a payments fact could be instructed, settled, or converted, and the schema diagram will not tell you which. The shape tells you how to join; the dictionary tells you what you joined.
The takeaway
Star and snowflake schemas are the same dimensional model with one different decision: dimensions denormalized into single wide tables, or normalized into chains. The star costs some redundancy and buys simple, predictable, one-hop queries, which is why it is the right default for analyst-facing marts and the shape BI tools expect. The snowflake buys normalization and costs join complexity, and it earns that cost only for large shared dimensions with hierarchies that must live in one place.
Find the fact table, establish the grain, walk the keys, and either shape becomes readable in an afternoon.
About the author
Analyst Engineering is written by Ahmed, a Senior Technical Business Analyst with 10+ years of banking and payments delivery experience: ISO 20022 and SWIFT messaging, payments API integration, Kafka event validation, and production support. Every article comes from real delivery work, and each one is reviewed and updated as tools and standards change.
Related articles
- The Data Dictionary: Every Field, Defined Once What a data dictionary is, how to build one, and why a single authoritative definition of every field prevents the ambiguity that breaks integrations and reports.
- SQL for Analysts: Query the State, Find the Truth The SQL a technical analyst actually needs: SELECT, WHERE, JOIN, GROUP BY, and reading state during analysis and testing. Not for reports, for finding the truth.
- Medallion Architecture: Bronze, Silver, and Gold, Explained The medallion architecture organizes a lakehouse into bronze (raw), silver (cleaned), and gold (business-ready) layers. What each layer owns, with a diagram.
- Reading a dbt DAG: The Map of How Your Data Is Built A dbt DAG is the dependency graph of your transformations, built from ref() calls. How to read one, the staging to marts convention, and why tests live on nodes.
Newsletter
Subscribe
Practical, no-fluff playbooks for technical analysts who analyze, code, test, and support. New articles straight to your inbox.
No spam. Unsubscribe anytime.