What is Data Engineering?

Data engineering is the practice of taking raw data from a data source and processing it so that it is stored and organized for a downstream use case, such as:
- data analytics
- business intelligence (BI)
- machine learning (ML) models

Framework of Data Engineering

Ingest
- Data ingestion is the process of bringing data from one or more data sources into a data platform.
- These data sources can be files stored on-premises or in cloud storage services (e.g. Microsoft SharePoint), databases, applications and, increasingly, data streams that produce real-time events.
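
A minimal sketch of file-based ingestion, assuming a CSV source. The `ingest_csv` helper, the sample columns, and the `_source` / `_loaded_at` metadata fields are hypothetical names chosen for illustration, and an in-memory string stands in for a real file or cloud object. Rows are landed as raw strings with no cleaning applied:

```python
import csv
import io
from datetime import datetime, timezone


def ingest_csv(source, source_name):
    """Read every row from a CSV file-like object and tag it with load metadata.

    Hypothetical helper: values stay as raw strings, nothing is cleaned here.
    """
    loaded_at = datetime.now(timezone.utc).isoformat()
    reader = csv.DictReader(source)
    return [
        {**row, "_source": source_name, "_loaded_at": loaded_at}
        for row in reader
    ]


# An in-memory string stands in for a real file (assumption for the example).
raw_file = io.StringIO("order_id,amount\n1,10.50\n2,abc\n")
rows = ingest_csv(raw_file, "orders.csv")
```

Note that the invalid amount `abc` is ingested as-is; deciding what to do with it is left to the transformation step.
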
Transform
- Data transformation takes raw ingested data and uses a series of steps (referred to as “transformations”) to filter, standardize, clean and finally aggregate it so it’s stored in a usable way.
- Medallion architecture
  - A popular pattern that divides the transformation phase into three stages: Bronze, Silver, and Gold
  - Bronze - raw ingested data and its history
  - Silver - filtered, cleaned, and augmented data
  - Gold - business-level aggregates
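
The three medallion stages can be illustrated with a small sketch in plain Python. The sample records, the field names, and the `to_silver` / `to_gold` functions are made up for this example; a real platform would typically keep each stage in its own table rather than in lists:

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (made-up sample data).
bronze = [
    {"order_id": "1", "country": " us ", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "abc"},   # unparseable amount
    {"order_id": "3", "country": "US", "amount": "4.50"},
    {"order_id": "3", "country": "US", "amount": "4.50"},  # duplicate row
]


def to_silver(records):
    """Filter out invalid and duplicate rows, then standardize the fields."""
    seen = set()
    silver = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if rec["order_id"] in seen:
            continue  # drop repeated order ids
        seen.add(rec["order_id"])
        silver.append({
            "order_id": int(rec["order_id"]),
            "country": rec["country"].strip().upper(),
            "amount": amount,
        })
    return silver


def to_gold(records):
    """Aggregate silver records into total revenue per country."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["country"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 15.0}
```

Bronze is never modified, so the Silver and Gold stages can be rebuilt from it if the cleaning rules change.
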
Orchestrate
- Data orchestration refers to how a data pipeline that performs ingestion and transformation is scheduled and monitored. It also covers controlling the individual pipeline steps and handling failures (e.g. by executing a retry run).
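
The step control and retry handling described above might look like the following sketch. The `run_with_retries` and `run_pipeline` functions and the two sample steps are hypothetical; the sketch deliberately leaves out scheduling and monitoring, which a dedicated orchestration tool would provide:

```python
import time


def run_with_retries(step, max_attempts=3, delay_seconds=0.0):
    """Run one pipeline step, retrying on failure up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            print(f"{step.__name__} failed on attempt {attempt}: {exc}")
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure
            time.sleep(delay_seconds)


def run_pipeline(steps):
    """Run steps in order; a step that exhausts its retries stops the pipeline."""
    results = {}
    for step in steps:
        results[step.__name__] = run_with_retries(step)
    return results


# Hypothetical steps: ingest fails once (simulated outage), then succeeds.
calls = {"ingest": 0}


def ingest():
    calls["ingest"] += 1
    if calls["ingest"] == 1:
        raise ConnectionError("source temporarily unavailable")
    return "ingested"


def transform():
    return "transformed"


results = run_pipeline([ingest, transform])
```

Here `transform` only runs after `ingest` has succeeded on its second attempt, which is the ordering guarantee an orchestrator is expected to enforce.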