The Medallion Architecture is a data design pattern promoted by Databricks for organizing data pipelines on the lakehouse. Instead of dumping everything into a single lake, you structure the data into three layers of increasing quality: Bronze → Silver → Gold.

  1. Bronze Layer (Raw)
    • What it contains: Raw, unprocessed data.
    • Goal: Capture everything as-is with minimal transformation.
    • Examples:
      • JSON logs from Kafka
      • CSVs dropped in S3
  2. Silver Layer (Clean and Enriched Data)
    • What it contains: Data that is cleansed, standardized, and enriched.
    • Goal: Improve quality and usability for downstream users.
    • Typical operations:
      • Deduplication
      • Data type casting
      • Joining with reference data
      • Handling missing values
  3. Gold Layer (Business-Level Data/Curated Data)
    • What it contains: Aggregated, business-ready datasets.
    • Goal: Serve analytics, BI dashboards, and ML models.
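
To make the flow between the three layers concrete, here is a minimal sketch in plain Python. It is not Databricks or Delta Lake code; the sample records, the `COUNTRY_NAMES` lookup, and the function names `to_bronze`, `to_silver`, and `to_gold` are all hypothetical, and filling a missing amount with 0.0 is just one possible way to handle missing values.

```python
import json
from collections import defaultdict

# Hypothetical raw events, standing in for JSON logs from Kafka or files in S3.
RAW_LINES = [
    '{"order_id": "1", "country": "us", "amount": "10.50"}',
    '{"order_id": "1", "country": "us", "amount": "10.50"}',  # duplicate record
    '{"order_id": "2", "country": "de", "amount": null}',     # missing value
    '{"order_id": "3", "country": "de", "amount": "4.25"}',
]

# Hypothetical reference data used to enrich the Silver layer.
COUNTRY_NAMES = {"us": "United States", "de": "Germany"}


def to_bronze(lines):
    """Bronze: capture every record as-is; only the JSON envelope is parsed."""
    return [json.loads(line) for line in lines]


def to_silver(bronze):
    """Silver: deduplicate, cast types, handle missing values, join reference data."""
    seen = set()
    silver = []
    for row in bronze:
        if row["order_id"] in seen:  # deduplication on order_id
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),  # data type casting
            "country": COUNTRY_NAMES.get(row["country"], "Unknown"),  # reference join
            # Assumption: a missing amount is treated as 0.0.
            "amount": float(row["amount"]) if row["amount"] is not None else 0.0,
        })
    return silver


def to_gold(silver):
    """Gold: aggregate into a business-ready dataset (revenue per country)."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["country"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(RAW_LINES)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'United States': 10.5, 'Germany': 4.25}
```

In a real lakehouse each function would typically read from and write to its own table, so that Bronze remains a replayable record of what arrived and Silver and Gold can be rebuilt from it.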