Data engineering is the practice of taking raw data from a data source and processing it so it is stored and organized for a downstream use case such as:
data analytics
business intelligence (BI)
machine learning (ML) models
Framework of Data Engineering
Ingest
Data ingestion is the process of bringing data from one or more data sources into a data platform.
These data sources can be files stored on-premises or in cloud storage services (e.g. Microsoft SharePoint), databases, applications and, increasingly, data streams that produce real-time events.
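A minimal sketch of ingestion, assuming the simplest case of a file source: the function name `ingest_csv` and the list-of-dicts "landing zone" are illustrative stand-ins, since a real platform would write to cloud storage or a warehouse table instead.

```python
import csv
import io

def ingest_csv(source: io.TextIOBase) -> list[dict]:
    """Read rows from a CSV source into the platform's landing zone.

    Here a list of dicts stands in for the real destination
    (cloud storage, a warehouse table, etc.).
    """
    return list(csv.DictReader(source))

# A small in-memory "file" standing in for an on-premises CSV export.
raw = io.StringIO("id,amount\n1,10.5\n2,7.0\n")
rows = ingest_csv(raw)
```

Note that ingestion deliberately does no cleaning: values arrive as raw strings, and making them usable is the job of the transformation phase.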
Transform
Data transformation takes raw ingested data and uses a series of steps (referred to as “transformations”) to filter, standardize, clean and finally aggregate it so it’s stored in a usable way.
Medallion architecture
is a popular pattern that divides the transformation phase into three stages: Bronze, Silver, and Gold
Bronze - raw ingestion and history
Silver - filtered, cleaned, augmented
Gold - business-level aggregates
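The three medallion stages can be sketched with plain Python, assuming a toy orders dataset; the field names and the revenue-per-region aggregate are illustrative, and in practice each stage would be a persisted table rather than an in-memory list.

```python
from datetime import datetime, timezone

# Bronze: raw records exactly as ingested, plus load metadata; nothing is dropped.
bronze = [
    {"order_id": "1", "amount": " 10.5 ", "region": "eu"},
    {"order_id": "2", "amount": "7.0",    "region": "EU"},
    {"order_id": "3", "amount": "",       "region": "us"},  # bad record kept for history
]
bronze = [{**r, "_loaded_at": datetime.now(timezone.utc).isoformat()} for r in bronze]

# Silver: filter out invalid rows, standardize types and values.
silver = [
    {"order_id": int(r["order_id"]),
     "amount": float(r["amount"]),
     "region": r["region"].strip().upper()}
    for r in bronze
    if r["amount"].strip()  # drop rows with a missing amount
]

# Gold: business-level aggregate - revenue per region.
gold: dict[str, float] = {}
for r in silver:
    gold[r["region"]] = gold.get(r["region"], 0.0) + r["amount"]
```

The design point is that each stage only ever reads from the one before it, so a fix to the cleaning logic can rebuild Silver and Gold without re-ingesting the source.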
Orchestrate
Data orchestration refers to how a data pipeline that performs ingestion and transformation is scheduled and monitored, as well as how the individual pipeline steps are controlled and failures are handled (e.g. by executing a retry run).
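A minimal sketch of the control and failure-handling side of orchestration, assuming a linear pipeline; the `run_pipeline` helper and its parameters are hypothetical, and a real orchestrator (e.g. a scheduler with a DAG of tasks) would add scheduling, logging, and alerting on top.

```python
import time

def run_pipeline(steps, max_retries=2, delay_s=0.0):
    """Run named pipeline steps in order, retrying a failing step
    before giving up, so a transient error does not fail the run."""
    for name, step in steps:
        for attempt in range(1, max_retries + 2):
            try:
                step()
                break  # step succeeded; move on to the next one
            except Exception:
                if attempt > max_retries:
                    raise RuntimeError(f"step {name!r} failed after {attempt} attempts")
                time.sleep(delay_s)  # back off, then execute a retry run

# Example: the "transform" step fails once, then succeeds on the retry run.
calls = {"n": 0}
def flaky_transform():
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient failure")

run_pipeline([("ingest", lambda: None), ("transform", flaky_transform)])
```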