DuckDB Practical Uses — Belgavi.AI Lab

DuckDB is SQLite for analytics: in-process, columnar, fast on a single machine, no server. It looks like a curiosity until you actually use it. Then it replaces Pandas for larger datasets, makes Parquet workflows trivial, and embeds in apps for analytical features. By 2026 it's a daily tool for many engineers.

In-process columnar SQL

Single library: a C++ core with clients for Python, R, Java, Node, and others. Open files directly: SELECT * FROM 'data.parquet'. Joins, aggregations, window functions. No server, no schema declaration, no setup. Fast: competitive with Spark on single-machine workloads.

Replacing Pandas for big data

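Pandas loads everything into memory, so it struggles once a dataset approaches available RAM. DuckDB streams the file through the query in chunks and reads only the columns it needs. duckdb.sql("SELECT category, AVG(price) FROM 'huge.parquet' GROUP BY category").df() returns a small DataFrame for further work. Often 10-100x faster on aggregations.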

Cloud storage support

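Read directly from S3, GCS, R2 with the httpfs extension: SELECT * FROM 's3://bucket/data.parquet'. Filters and column selections are pushed down into the Parquet scan, so DuckDB fetches only the row groups and columns the query needs rather than the whole file. Useful for cheap ad-hoc analytics without a data warehouse.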

Embedded analytics in apps

Ship DuckDB inside your app for in-app analytics. User uploads CSV → DuckDB queries it client-side. Far lighter than spinning up a warehouse for a single feature. A Wasm build covers browser deployment.

Where it doesn't fit

OLTP (use Postgres). Concurrent writers (single-writer model). Distributed scale (single machine). Production OLAP with many users (use a warehouse). DuckDB is a tool, not the warehouse.

DuckDB replaces Pandas for big data, makes Parquet ergonomic, embeds in apps. Not a warehouse; a powerful local tool.
