The Medallion Architecture is a data design pattern promoted by Databricks for organizing data pipelines on the lakehouse. Instead of dumping everything into a single lake, you structure it into layers: Bronze → Silver → Gold.
- Bronze Layer (Raw)
  - What it contains: Raw, unprocessed data.
  - Goal: Capture everything as-is with minimal transformation.
  - Examples:
    - JSON logs from Kafka
    - CSVs dropped in S3
- Silver Layer (Clean and Enriched Data)
  - What it contains: Data that is cleansed, standardized, and enriched.
  - Goal: Improve quality and usability for downstream users.
  - Typical operations:
    - Deduplication
    - Data type casting
    - Joining with reference data
    - Handling missing values
- Gold Layer (Business-Level/Curated Data)
  - What it contains: Aggregated, business-ready datasets.
  - Goal: Serve analytics, BI dashboards, and ML models.
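
To make the three layers concrete, here is a minimal sketch in plain Python of one small dataset moving through Bronze, Silver, and Gold. It is an illustration under assumptions, not how Databricks implements the pattern: the sample records, the `COUNTRY_NAMES` lookup, the function names, and the choice to fill a missing amount with `0.0` are all hypothetical. In a real lakehouse each step would typically read from and write to tables rather than in-memory lists.

```python
import json
from collections import defaultdict

# Hypothetical raw events, standing in for JSON log lines landed from Kafka or S3.
RAW_LINES = [
    '{"order_id": "1", "country": "us", "amount": "10.50"}',
    '{"order_id": "1", "country": "us", "amount": "10.50"}',  # duplicate record
    '{"order_id": "2", "country": "de", "amount": null}',     # missing value
    '{"order_id": "3", "country": "de", "amount": "4.00"}',
]

# Hypothetical reference data used to enrich records in the Silver step.
COUNTRY_NAMES = {"us": "United States", "de": "Germany"}


def to_bronze(lines):
    """Bronze: keep every record as-is; only parse the JSON envelope."""
    return [json.loads(line) for line in lines]


def to_silver(bronze):
    """Silver: deduplicate, cast types, handle missing values, join reference data."""
    seen = set()
    silver = []
    for row in bronze:
        if row["order_id"] in seen:  # deduplication on order_id
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),  # type casting
            # Join with reference data; unknown codes fall back to "Unknown".
            "country": COUNTRY_NAMES.get(row["country"], "Unknown"),
            # Assumed policy: treat a missing amount as 0.0.
            "amount": float(row["amount"]) if row["amount"] is not None else 0.0,
        })
    return silver


def to_gold(silver):
    """Gold: aggregate to a business-level metric (revenue per country)."""
    revenue = defaultdict(float)
    for row in silver:
        revenue[row["country"]] += row["amount"]
    return dict(revenue)


bronze = to_bronze(RAW_LINES)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'United States': 10.5, 'Germany': 4.0}
```

Note that Bronze still holds all four records, including the duplicate and the null amount, so the Silver and Gold logic can be changed and re-run later without going back to the source system.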