What is DuckDB?

DuckDB is an in-process, embedded relational database management system (RDBMS).

  • Embedded means that
    • DuckDB runs within another process, such as your application or notebook, and is not accessed over a network.
    • DuckDB does not require a separate database server process to run.
  • Like SQLite, the database engine is bundled with the application rather than running as a separate service, as PostgreSQL or MySQL do.
  • Each persistent database is a single file; an in-memory database uses no file at all.
  • It is a vectorized database: it achieves high performance by processing data in batches (vectors) of values rather than one row at a time, which amortizes interpretation overhead and enables efficient use of CPU caches and SIMD instructions.

DuckDB as a zero-copy layer

  • Like traditional OLAP cubes (SSAS, SAP BW) and modern OLAP systems (ClickHouse, Druid), DuckDB targets analytical workloads, but it consists of only a single database file, or no file at all when used as a zero-copy layer.
  • One use case is to read a set of CSV or Parquet files, transform the data, and write the result somewhere else, using DuckDB purely as a compute engine.