When building a new data ingestion pipeline, a data engineer first examines how the data will be extracted and what form it takes.

First, what is the interface to the data?

  1. A database behind an application, such as a Postgres or MySQL database
  2. A layer of abstraction on top of a system, such as a REST API
  3. A stream processing platform such as Apache Kafka
  4. A shared network file system or cloud storage bucket (e.g. a SharePoint folder) containing comma-separated value (CSV) files
  5. A data warehouse or data lake
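
As a rough illustration of how the interface shapes extraction code, the sketch below wraps two of the interfaces above behind functions that both yield records as dictionaries. It is a minimal sketch under stated assumptions, not a recommended design: Python's built-in `sqlite3` stands in for a Postgres or MySQL connection, a local folder stands in for a network share, and the function names `extract_from_database` and `extract_from_csv_folder` are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Dict, Iterator


def extract_from_database(conn: sqlite3.Connection, table: str) -> Iterator[Dict[str, object]]:
    """Yield each row of `table` as a dict (interface 1: a database behind an application).

    Assumption: sqlite3 stands in for Postgres/MySQL, and `table` is a trusted name,
    since table names cannot be passed as bound query parameters.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def extract_from_csv_folder(folder: Path) -> Iterator[Dict[str, object]]:
    """Yield each row of every CSV file in `folder` as a dict (interface 4: shared files).

    Assumption: every file has a header row; a local folder stands in for the share.
    """
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as handle:
            yield from csv.DictReader(handle)
```

Because both functions yield plain dictionaries, downstream steps do not need to know which interface a record came from. Note that the database rows keep their column types, while every CSV value arrives as a string.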

Beyond the interface, the structure of the data also varies. Here are some common examples:

  1. JSON from a REST API
  2. Well-structured data from a MySQL database
  3. CSV, fixed-width format (FWF), and other flat file formats
  4. JSON in flat files
  5. Stream output from Kafka
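
To make the differences in structure concrete, here is a small sketch that parses the same record from JSON, CSV, and fixed-width text using only the Python standard library. The function names, the sample field names (`id`, `name`), and the column offsets passed to `parse_fixed_width` are hypothetical and chosen purely for illustration.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def parse_json(text: str) -> List[Dict[str, object]]:
    """Parse a JSON array of objects; value types (int, str, ...) are preserved."""
    return json.loads(text)


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row; every value comes back as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_fixed_width(text: str, fields: List[Tuple[str, int, int]]) -> List[Dict[str, str]]:
    """Parse fixed-width text.

    `fields` lists (name, start, end) character offsets for each column.
    The layout must be supplied by the caller because the format carries no header.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        records.append({name: line[start:end].strip() for name, start, end in fields})
    return records
```

For example, `parse_json('[{"id": 1, "name": "Ada"}]')` keeps `id` as the integer `1`, whereas `parse_csv("id,name\n1,Ada\n")` and `parse_fixed_width("1    Ada       \n", [("id", 0, 5), ("name", 5, 15)])` both return `id` as the string `"1"`. Flat-file formats therefore usually need an explicit typing step after parsing.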