Medallion Architecture: Bronze, Silver, and Gold, Explained

Written by Ahmed, a Senior Technical Business Analyst with 10+ years in banking and payments delivery, at Analyst Engineering.

Key takeaways

  • The medallion architecture organizes data into three layers of increasing quality: bronze holds raw data exactly as it arrived, silver holds cleaned and conformed data, gold holds business-ready aggregates and marts.
  • Bronze is the audit trail: append-only, unmodified, replayable. If silver logic turns out wrong, you rebuild it from bronze, which is why bronze is never 'cleaned up.'
  • Each boundary is a contract an analyst can test: bronze-to-silver owns validation and deduplication, silver-to-gold owns business rules and aggregation.
  • When a number in a gold dashboard looks wrong, the layered design turns the investigation into the same hop-by-hop trace as a failed payment: walk it back through silver to bronze until the divergence appears.

The medallion architecture organizes a data platform into three layers of increasing quality: bronze lands the raw data untouched, silver cleans and conforms it, gold serves it business-ready. The layers are not bureaucracy; each boundary is a contract you can test and a hop you can trace a wrong number back through.

The medallion architecture is a layered data design, popularized by Databricks for the lakehouse, that moves data through three stages of progressively higher quality: bronze holds raw data exactly as it arrived from source systems; silver holds that data cleaned, validated, deduplicated, and conformed; and gold holds the business-ready tables that dashboards and reports actually consume. Each layer has one job and each boundary between layers is a defined transformation, and that structure is what makes a data platform diagnosable rather than a single opaque pipeline. For an analyst, the mental model transfers directly from payments: the layers are the message flow of the data world, and you trace a wrong number through them the same way you trace a failed payment, hop by hop to the point of divergence.

What does each layer own?

Each layer owns one kind of work, and keeping the work in its layer is the whole discipline:

Figure: Medallion architecture: bronze, silver, and gold layers. Left to right: sources (core banking extracts, payment events) flow into Bronze (raw, as received, append-only), then through validation and deduplication into Silver (cleaned, conformed, entity level), then through business rules and aggregation into Gold (marts, star schemas, business-ready), which feeds dashboards, regulatory reports, and reconciliation. Bronze is the append-only audit trail, replayable when logic changes.

Bronze lands data exactly as the source sent it: the core banking extract, the stream of payment events, the files, all append-only with load metadata. Nothing is fixed, filtered, or deduplicated here, deliberately, because bronze is the audit trail and the replay source. When silver’s logic turns out to be wrong, and eventually it will, you correct the logic and rebuild from bronze; if bronze had been “cleaned,” the original truth would be gone. This is the same reason event consumers keep the original message rather than discarding failures.
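
As a rough illustration of "exactly as the source sent it," here is a minimal Python sketch of a bronze landing step. It is not any particular platform's API: the in-memory list, the function name land_in_bronze, and the metadata fields (source, loaded_at, payload_sha256) are all assumptions made for the example. The point it tries to show is that the payload is stored untouched and that duplicates and malformed records are kept.

```python
import hashlib
import json
from datetime import datetime, timezone


def land_in_bronze(bronze, raw_payload, source, loaded_at=None):
    """Append one raw record to bronze without altering or filtering it."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    bronze.append({
        "raw_payload": raw_payload,          # stored exactly as received
        "source": source,                    # load metadata (hypothetical field)
        "loaded_at": loaded_at.isoformat(),  # load metadata (hypothetical field)
        # Fingerprint of the payload, useful later when proving nothing changed
        "payload_sha256": hashlib.sha256(raw_payload.encode("utf-8")).hexdigest(),
    })
    return bronze[-1]


bronze = []  # stand-in for an append-only table
event = json.dumps({"uetr": "u-001", "status": "SETTLED", "amount": "100.00"})

land_in_bronze(bronze, event, source="payment-events")
land_in_bronze(bronze, event, source="payment-events")       # duplicate delivery is kept
land_in_bronze(bronze, "not-json", source="payment-events")  # malformed input is kept too
```

Nothing here validates or deduplicates. That is deliberate: those decisions belong to silver, where they can be changed and replayed.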

Silver is where the cleaning lives, in one place, once: types standardized, invalid records quarantined, duplicates collapsed, codes conformed, entities resolved. The output is data an engineer or analyst can build on without re-cleaning it, still at the level of individual payments and accounts. Gold is where the business meaning lives: the joins, aggregations, star schemas, and marts that answer business questions such as daily settlement totals and fees by scheme. These are the tables a regulator or a dashboard actually reads.
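
The split of work between silver and gold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the status set, the field names (uetr, status, amount, value_date), and the "last record per UETR wins" deduplication rule are all invented for the example, and a real platform would typically do this in SQL or Spark.

```python
import json
from collections import defaultdict
from decimal import Decimal, InvalidOperation

VALID_STATUSES = {"PENDING", "SETTLED", "REJECTED"}  # assumed known set


def build_silver(bronze_rows):
    """Return (silver, quarantine): typed, validated, one row per UETR."""
    silver, quarantine = {}, []
    for row in bronze_rows:
        try:
            rec = json.loads(row["raw_payload"])
            status = rec["status"].strip().upper()  # conform codes
            amount = Decimal(rec["amount"])          # standardize types
            uetr, day = rec["uetr"], rec["value_date"]
        except (ValueError, KeyError, TypeError, AttributeError, InvalidOperation):
            quarantine.append(row)                   # unparseable: visible, not dropped
            continue
        if status not in VALID_STATUSES:
            quarantine.append(row)                   # invalid: visible, not dropped
            continue
        # Assumed dedup rule: the last record seen for a UETR wins
        silver[uetr] = {"uetr": uetr, "status": status,
                        "amount": amount, "value_date": day}
    return list(silver.values()), quarantine


def build_gold(silver_rows):
    """Daily settled totals: the single place 'settled' is defined."""
    totals = defaultdict(Decimal)
    for r in silver_rows:
        if r["status"] == "SETTLED":
            totals[r["value_date"]] += r["amount"]
    return dict(totals)


def _raw(**fields):
    return {"raw_payload": json.dumps(fields)}


bronze_rows = [
    _raw(uetr="u-001", status="settled ", amount="100.00", value_date="2024-03-01"),
    _raw(uetr="u-001", status="SETTLED", amount="100.00", value_date="2024-03-01"),
    _raw(uetr="u-002", status="SETTLED", amount="50.25", value_date="2024-03-01"),
    _raw(uetr="u-003", status="UNKNOWN", amount="10.00", value_date="2024-03-01"),
    {"raw_payload": "not-json"},
]

silver, quarantine = build_silver(bronze_rows)
gold = build_gold(silver)
```

With this sample, five bronze rows become two silver payments and two quarantined rows, and gold holds one daily total. Because bronze is untouched, changing the status set or the dedup rule is a matter of rerunning build_silver.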

Why do the layer boundaries matter to an analyst?

Because each boundary is a contract you can specify and test, exactly like an API contract. The bronze-to-silver boundary owns validation and deduplication, so its contract is testable: every silver payment has a valid status from the known set, no duplicate UETRs survive, quarantined records are counted and visible. The silver-to-gold boundary owns business rules, so its contract is testable too: gold’s daily total equals the sum of silver’s payments for that day, with one definition of “settled” applied consistently. Writing those checks is negative test design pointed at data, and automating them is the same scripting you already do against APIs.
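
The contracts described above could be written as plain checks that return a list of violations. The sketch below is one possible shape, with assumptions labeled: the status set, the field names, the duplicates_collapsed counter, and the reading of "settled" as status == "SETTLED" are illustrative choices, not a standard.

```python
from collections import Counter
from decimal import Decimal

VALID_STATUSES = {"PENDING", "SETTLED", "REJECTED"}  # assumed known set


def check_bronze_to_silver(silver, quarantine, bronze_count, duplicates_collapsed):
    """Contract for the bronze-to-silver boundary; returns violations."""
    failures = []
    bad_status = [r["uetr"] for r in silver if r["status"] not in VALID_STATUSES]
    if bad_status:
        failures.append(f"invalid status for UETRs: {bad_status}")
    dupes = [u for u, n in Counter(r["uetr"] for r in silver).items() if n > 1]
    if dupes:
        failures.append(f"duplicate UETRs in silver: {dupes}")
    # Every bronze row must be accounted for: kept, collapsed, or quarantined
    accounted = len(silver) + duplicates_collapsed + len(quarantine)
    if accounted != bronze_count:
        failures.append(f"row count: bronze={bronze_count} accounted={accounted}")
    return failures


def check_silver_to_gold(silver, gold):
    """Contract for the silver-to-gold boundary; returns violations."""
    expected = {}
    for r in silver:
        if r["status"] == "SETTLED":  # the one assumed definition of settled
            day = r["value_date"]
            expected[day] = expected.get(day, Decimal("0")) + r["amount"]
    return [
        f"{day}: gold={gold.get(day)} silver_sum={expected.get(day)}"
        for day in sorted(set(expected) | set(gold))
        if expected.get(day) != gold.get(day)
    ]


silver = [
    {"uetr": "u-001", "status": "SETTLED", "amount": Decimal("100.00"), "value_date": "2024-03-01"},
    {"uetr": "u-002", "status": "SETTLED", "amount": Decimal("50.25"), "value_date": "2024-03-01"},
]
quarantine = [{"raw_payload": "not-json"}, {"raw_payload": "{\"status\": \"UNKNOWN\"}"}]

boundary_1 = check_bronze_to_silver(silver, quarantine, bronze_count=5, duplicates_collapsed=1)
boundary_2_ok = check_silver_to_gold(silver, {"2024-03-01": Decimal("150.25")})
boundary_2_bad = check_silver_to_gold(silver, {"2024-03-01": Decimal("250.25")})
```

An empty list means the contract holds. The deliberately wrong gold total in the last call produces one violation naming the day and both figures, which is the starting point for a trace.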

The layers also give an investigation its hops. When a gold dashboard shows a number that looks wrong, you walk it back: is gold’s aggregate faithful to silver? Is silver’s cleaned record faithful to bronze? Is bronze faithful to the source? The first boundary where the answers diverge is the defect, the same point-of-divergence method as a failed payment investigation. Without layers, that investigation is archaeology inside one giant script; with them, it is three checks.
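
The walk-back can be reduced to a small helper. This sketch assumes you have already recomputed the same figure (for example, one day's settled total) independently at each layer; how you compute each figure is platform-specific and not shown. The function name and the sample numbers are hypothetical.

```python
from decimal import Decimal


def first_divergence(hops):
    """Walk back through (layer, figure) pairs, gold first.

    Returns the (downstream, upstream) names of the first boundary where
    the figures differ, or None if every hop agrees.
    """
    for (down_name, down_val), (up_name, up_val) in zip(hops, hops[1:]):
        if down_val != up_val:
            return (down_name, up_name)
    return None


# Same measure recomputed at each layer, ordered from the dashboard backwards
hops = [
    ("gold", Decimal("140.25")),
    ("silver", Decimal("140.25")),
    ("bronze", Decimal("150.25")),
    ("source", Decimal("150.25")),
]

defect = first_divergence(hops)  # ("silver", "bronze")
```

Here gold is faithful to silver, but silver disagrees with bronze, so the defect sits in the bronze-to-silver transformation and the gold logic can be left alone.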

Where does the pattern fit with everything else?

The medallion pattern describes quality layers, not the whole platform, and it composes with the other structures. Gold frequently contains warehouse-style star schemas; the transformations between layers are typically built and ordered as a dbt DAG; the bronze layer is fed by the same batch files and event streams you already reason about, with Kafka topics landing in bronze the way events land in a consumer. In a data mesh, each domain often runs its own medallion internally and publishes its gold tables as data products.
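
To make "built and ordered as a DAG" concrete, the sketch below orders a handful of hypothetical models with Python's standard-library graphlib. It is not dbt and does not use dbt's API; dbt derives a similar graph from the references between models. The model names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it depends on (hypothetical names)
dependencies = {
    "bronze_payment_events": set(),
    "bronze_core_extract": set(),
    "silver_payments": {"bronze_payment_events"},
    "silver_accounts": {"bronze_core_extract"},
    "gold_daily_settlement": {"silver_payments", "silver_accounts"},
}

# static_order() yields every model after all of its dependencies
build_order = list(TopologicalSorter(dependencies).static_order())
```

The exact order among independent models can vary, but each silver model always follows its bronze input, and the gold model always comes last.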

The pattern’s banking relevance is direct: bronze’s append-only record supports the audit and replay obligations regulators care about, and gold’s single set of business definitions is what keeps the settlement figure in the regulatory report equal to the one in the executive dashboard. Two reports disagreeing on the same number is the data platform’s version of a reconciliation break, and one gold layer with one definition is the design that prevents it.

The takeaway

The medallion architecture moves data through three owned layers: bronze holds the raw, replayable truth as received, silver holds it cleaned and conformed in one place, and gold holds the business-ready tables consumers actually read. Each boundary is a testable contract, and together they turn “the dashboard is wrong” from archaeology into a hop-by-hop trace.

Treat the layers like the services of a payment flow: know what each one owns, test each contract, and walk the hops when a number diverges.

About the author

Analyst Engineering is written by Ahmed, a Senior Technical Business Analyst with 10+ years of banking and payments delivery experience: ISO 20022 and SWIFT messaging, payments API integration, Kafka event validation, and production support. Every article comes from real delivery work, and each one is reviewed and updated as tools and standards change.
