When building a new data ingestion pipeline, a data engineer first examines how the data will be extracted and what form it takes.
First, what is the interface to the data? Common examples include:
- A database behind an application, such as a Postgres or MySQL database
- A layer of abstraction on top of a system, such as a REST API
- A stream processing platform such as Apache Kafka
- A shared network file system or cloud storage bucket (e.g., a SharePoint folder) containing comma-separated values (CSV) files
- A data warehouse or data lake
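As a rough illustration of how the interface shapes extraction code, the sketch below reads the same rows through two of the interfaces listed above: a SQL database connection and a CSV file. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for Postgres or MySQL, an in-memory buffer stands in for a file on a shared drive or bucket, and the `orders` table, its columns, and the helper function names are all hypothetical.

```python
import csv
import io
import sqlite3


def extract_from_database(conn):
    """Pull rows through a SQL interface and return them as dicts."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("SELECT id, amount FROM orders ORDER BY id")
    return [dict(row) for row in cursor]


def extract_from_csv(file_obj):
    """Pull rows from a CSV file; every value arrives as a string."""
    return list(csv.DictReader(file_obj))


# Hypothetical source data: SQLite stands in for a Postgres or MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# An in-memory buffer stands in for a CSV file on a shared drive or bucket.
csv_file = io.StringIO("id,amount\n1,9.5\n2,20.0\n")

db_rows = extract_from_database(conn)
csv_rows = extract_from_csv(csv_file)
```

In this sketch the database interface returns typed values (`1`, `9.5`), while the CSV interface returns strings (`"1"`, `"9.5"`), so the choice of interface also determines how much type handling the ingestion code has to do.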
In addition to the interface, the structure of the data will vary. Here are some common examples:
- JSON from a REST API
- Well-structured data from a MySQL database
- CSV, fixed-width format (FWF), and other flat file formats
- JSON in flat files
- Stream output from Kafka
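To make the difference in structure concrete, the following minimal sketch parses the same two records from three of the structures listed above (JSON, CSV, and fixed-width) into one common shape. The payloads, field names, column positions, and parser function names are assumptions made up for illustration, not taken from any particular source system.

```python
import csv
import io
import json

# Hypothetical payloads describing the same two records in three structures.
json_payload = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]'
csv_payload = "id,name\n1,Ada\n2,Linus\n"
fwf_payload = "1    Ada  \n2    Linus\n"  # id: characters 0-4, name: characters 5-9


def parse_json(text):
    """JSON carries its own field names and basic types."""
    return [{"id": int(r["id"]), "name": r["name"]} for r in json.loads(text)]


def parse_csv(text):
    """CSV carries field names in the header row; values are strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


def parse_fwf(text, columns=((0, 5), (5, 10))):
    """Fixed-width data carries no names; the column positions must be known."""
    records = []
    for line in text.splitlines():
        id_field, name_field = (line[start:end].strip() for start, end in columns)
        records.append({"id": int(id_field), "name": name_field})
    return records
```

All three parsers return the same list of records, but each one needs a different amount of outside knowledge: the JSON payload describes itself, the CSV payload needs type conversion, and the fixed-width payload needs the column layout supplied separately.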