What is DuckDB?
DuckDB is an in-process, embedded relational database management system (RDBMS) designed for analytical (OLAP) workloads.
- Embedded means that DuckDB runs within another process, such as your application or notebook, and is not accessed over a network.
- As a result, DuckDB does not require a separate database server process to run.
- Like SQLite, DuckDB is bundled with the application rather than running as a separate service like PostgreSQL or MySQL.
- A persistent database is stored in a single file; DuckDB can also run in memory without any database file.
- It uses vectorized query execution: it processes data in batches (vectors) rather than one row at a time, which amortizes interpretation overhead and makes efficient use of CPU caches and SIMD instructions.
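To illustrate why batching amortizes interpretation overhead, here is a toy expression interpreter in plain Python. It is not DuckDB's actual engine; the `Col`/`Mul` node types, the `Interpreter` class, and the batch size of 1024 are assumptions made for illustration. It evaluates `price * qty` both row-at-a-time and batch-at-a-time and counts how often an expression node has to be interpreted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Col:
    """Reference to an input column by position."""
    index: int


@dataclass
class Mul:
    """Multiply two sub-expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Mul]


class Interpreter:
    def __init__(self) -> None:
        self.calls = 0  # how many times an expression node was interpreted

    def eval_row(self, expr: Expr, row: Sequence[float]) -> float:
        """Row-at-a-time: walk the whole expression tree once per row."""
        self.calls += 1
        if isinstance(expr, Col):
            return row[expr.index]
        return self.eval_row(expr.left, row) * self.eval_row(expr.right, row)

    def eval_batch(self, expr: Expr, columns: Sequence[List[float]]) -> List[float]:
        """Vectorized: walk the tree once per batch, then loop tightly over values."""
        self.calls += 1
        if isinstance(expr, Col):
            return columns[expr.index]
        left = self.eval_batch(expr.left, columns)
        right = self.eval_batch(expr.right, columns)
        return [a * b for a, b in zip(left, right)]


def run_row_at_a_time(expr: Expr, columns, interp: Interpreter) -> List[float]:
    n = len(columns[0])
    return [interp.eval_row(expr, [col[i] for col in columns]) for i in range(n)]


def run_vectorized(expr: Expr, columns, interp: Interpreter, batch_size: int = 1024) -> List[float]:
    n = len(columns[0])
    out: List[float] = []
    for start in range(0, n, batch_size):
        batch = [col[start:start + batch_size] for col in columns]
        out.extend(interp.eval_batch(expr, batch))
    return out


price = [float(i) for i in range(10_000)]
qty = [2.0] * 10_000
expr = Mul(Col(0), Col(1))  # price * qty

row_interp, vec_interp = Interpreter(), Interpreter()
row_result = run_row_at_a_time(expr, [price, qty], row_interp)
vec_result = run_vectorized(expr, [price, qty], vec_interp)

assert row_result == vec_result
print(row_interp.calls, vec_interp.calls)  # 30000 30
```

With 10,000 rows, the row-at-a-time run interprets the 3-node tree 30,000 times, while the batched run does so 30 times (10 batches of up to 1024 rows, 3 nodes each); the multiplications themselves happen in a tight inner loop. Real vectorized engines also benefit from CPU caches and SIMD, which this Python sketch does not model.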
DuckDB as a zero-copy layer
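- Like traditional OLAP cubes (SSAS, SAP BW) or modern OLAP systems (ClickHouse, Druid), DuckDB can act as a fast analytical layer; when used as a zero-copy layer, however, it keeps at most a single database file, or none at all, and queries the source data where it lives instead of loading a copy.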
- One use case is to read a set of CSV or Parquet files, transform the data, and write the result somewhere else, using DuckDB only as a compute engine.