We are proud to release the first major version of DuckDB, v1.0.0, codenamed "Snow Duck".
This version is the culmination of almost six years of research and development. Today we are shipping an innovative database system with a backwards-compatible storage format.
Check out our
New blog post by @mraasveldt: Multi-Database Support in DuckDB
DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a
DuckDB was recently covered in
@andy_pavlo
's Advanced Database Systems course at CMU. The lecture covers DuckDB's history, internals, and integration with other systems.
Slides:
Recording:
We are proud to release DuckDB v0.10.0:
Some highlights:
– A reworked and much faster CSV reader
– Fixed-length arrays
– Multi-database support
– Secrets manager
– Temporary memory manager
– Adaptive lossless floating-point compression
– New CLI editor
DuckDB 0.7.0 "Labradorius" released with #JSON support, parallel and partitioned export to CSV and Parquet, UPSERT, @DataPolars integration, and much more in our release announcement blog post:
DuckDB is introducing support for vector similarity search through the new VSS extension.
Read @Maxxen_'s blog post for a sneak preview of the new extension's capabilities:
We wrote a performance guide for DuckDB users! This guide covers topics such as the effects of schema (constraints, indexing) and hardware (CPU, memory, disk). We also share best practices for querying Parquet files and tips for tuning your workload.
New blog post: Access 150k+ Datasets from Hugging Face with DuckDB
This blog post, co-authored by the @huggingface and DuckDB teams, describes how you can use the hf:// prefix in DuckDB to access datasets in Hugging Face repositories.
Read more at
DuckDB has introduced native Delta Lake support.
In our new blog post, @samansmink walks through the design and implementation of the new Delta Lake extension.
Read more at
New blog post by @lnkuiper – No Memory? No Problem. External Aggregation in DuckDB
The post describes how DuckDB can efficiently aggregate over many more groups than fit in memory, allowing it to complete the 50 GB variant of the
DuckDB's co-creator, Hannes Mühleisen, recently became a professor of data engineering at Radboud University. The recording of his inaugural lecture, titled "The Ancient Art of Data Management", is now available.
New blog post by @szarnyasg: Command Line Data Processing: Using DuckDB as a Unix Tool
This blog post shows how DuckDB stacks up against classic Unix tools (such as cut, grep, sort, and sed) when performing simple data processing steps.
Read more at
DuckDB 0.6.0 "Oxyura" released with improved storage, higher performance for CSV loading and indexing, new SQL syntax, better memory management, shell tweaks, and so many new features that @mraasveldt wrote a separate blog post to explain it all:
DuckDB supports querying buckets in AWS S3 Express One Zone. The related guide shows that DuckDB can read a Parquet file from an S3 Express One Zone bucket at about 1.2 gigabytes per second!
PS: You may also have noticed that we started rolling
New blog post by @__AlexMonahan__: SQL Gymnastics: Bending SQL into flexible new shapes
In this post, Alex presents pure SQL queries to implement dynamic groupings and aggregate functions using DuckDB's friendly SQL extensions. The queries can be used to
We have revamped one of our core operators, aggregation. It now scales better both with many unique groups and with a large number of cores. Thanks to these improvements, you can expect better performance when running large aggregations on big machines.
The Awesome DuckDB repository, maintained by @davidgasquez, has grown to more than 100 entries in less than a year. If you are aware of more cool projects using DuckDB, please consider submitting a PR!
Did you know that you can connect to a DuckDB database file via HTTPS or S3 with just two SQL statements? We have a new guide that explains how to do this.
New blog post by @hfmuehleisen: duckplyr: dplyr powered by DuckDB
The post describes the new R package duckplyr, which translates the dplyr API to DuckDB’s execution engine.
Read more at
There are now a lot of handy tools and cool projects built around DuckDB. You can find a list of these in the Awesome DuckDB repository maintained by @davidgasquez.
See the list and contribute your project at
This blog post by Cal Paterson, "DuckDB Isn't Just Fast", discusses some of DuckDB's characteristics beyond sheer processing speed: developer ergonomics, scalability using out-of-core processing, and ease of setup.
New blog post: JupySQL enables SQL cells in Jupyter, supports DuckDB, and also enables plotting larger-than-memory datasets using DuckDB! JupySQL is an active fork of ipython-sql being enhanced by the folks at @ploomber. Let us know what you think!
The new DuckDB landing page has several code snippets for SQL features and DuckDB's APIs. You can use the "Live Demo" button to execute the queries on an example dataset in your browser using the DuckDB shell that runs in WebAssembly.
Note: the demo
DuckDB's documentation is now available for offline use both as a PDF and as a ZIP archive, which contains the static HTML of the website.
Head to the DuckDB website to grab a copy.
We released DuckDB v1.0.0 a week ago.
There is a growing list of tools integrating with DuckDB, applications that use DuckDB, and extensions created for DuckDB. You can find a list of these in the Awesome DuckDB repository, maintained by @davidgasquez.
The list is never
We have started publishing the recordings of DuckCon #4. We are first releasing the “State of the Duck” talk by DuckDB's co-creators, Hannes Mühleisen (@hfmuehleisen) and Mark Raasveldt (@mraasveldt).
Video:
Slides:
Special thanks
Lambda functions are one of the most popular features in DuckDB. We recently added list_reduce, a new scalar function that supports lambdas, and lambda functions now have their own documentation page.
Note that this feature is currently only available in DuckDB's
MotherDuck, the ducking simple data warehouse, is now Generally Available! 🍾🥂 Thank you to our community of thousands of users who have tested, validated, and helped improve MotherDuck over the last year.❤️🦆
New blog post by @carlo_piovesan: Extensions for DuckDB-Wasm
Thanks to recent developments, DuckDB-Wasm users can now load DuckDB extensions, allowing them to run extensions in the browser.
DuckDB was included in @InfoWorlds's best open-source software list as a "tiny-but-powerful project" that provides just enough OLAP for most use cases. The award praised the lightweight nature and many features of DuckDB.
This blog post is a short summary of the ICDE 2024 (@icdeconf) paper authored by @lnkuiper, @peterabcz, and @hfmuehleisen: Robust External Hash Aggregation in the Solid State Age.
The paper is available at
DuckDB's co-creator @hfmuehleisen announced support for Delta Lake (@DeltaLakeOSS) in DuckDB at last week's @Data_AI_Summit.
You can rewatch the keynote segment below:
For more information, see the delta extension's documentation:
DuckDB's co-creator @hfmuehleisen will give a keynote tomorrow at @GOTOamst.
Hannes, who is also a professor of data engineering at @Radboud_Uni, will give an overview of the last decades of data management, discuss why relational systems are still prevailing, and why
New post by @lnkuiper: Shredding deeply nested #JSON one vector at a time
Querying JSON as a table is as easy as SELECT * FROM 'file.json';
It's fast too, thanks to DuckDB's lists/structs and the yyjson parser by @ibireme.
We rolled out an updated syntax highlighter and a new color scheme in the DuckDB documentation.
The highlighter now knows all of DuckDB's keywords and functions. The color scheme is based on the Bluloco theme.
New blog post by @szarnyasg: Analyzing Railway Traffic in the Netherlands
This tutorial demonstrates some of DuckDB's key query features using datasets that capture the railway traffic in the Netherlands.
Did you know that DuckDB supports function chaining? This allows function calls to be rewritten in a more readable manner. See the Even Friendlier SQL with DuckDB blog post for details:
“With DuckDB as a browser for the data cloud, relational datasets are always just a hyperlink away.” – That's a great line. Thanks for this nice blog post, @NikolasGoebel!
We have released DuckDB v0.10.3, a bugfix release.
The command 'pip install duckdb --upgrade' already delivers the new version. DuckDB clients in other package management systems (CRAN, Maven, Homebrew, etc.) will be updated in the coming days.
For the release notes and binary
🚨 A reminder for our old users and a pointer to our new followers: the DuckDB CLI client has a tldr page.
If you have @tldr_pages installed, you can get examples of the most common command line arguments with:
$ tldr duckdb
We extended our performance guide with a new recommendation: avoid joining on VARCHAR-typed columns (i.e., strings). The accompanying microbenchmark demonstrates a case where performing a large join on BIGINT columns is 2.6× faster than evaluating the same join on VARCHAR
DuckDB's co-creator Hannes Mühleisen gave a talk this week at the Hasso-Plattner-Institut (@HPI_DE) titled "Two Tier Architectures are Anachronistic". The recording is now available online.
Today's keynote at the @Data_AI_Summit will (again) feature DuckDB co-creator @hfmuehleisen, who will talk about DuckDB's support for Delta Lake (@deltalakeoss).
Follow the live stream (starting at 8.30am Pacific Time, in approximately half an hour):
DuckCon #4 will feature a talk by @polinaeterna of @huggingface titled “Hugging a Duck: democratizing data access and exploration with DuckDB and Hugging Face Hub”.
The talk will explain how they use DuckDB to allow people to easily explore over 250k public datasets on the
A reminder: DuckDB has a tldr page. If you have @tldr_pages installed, you can get examples of the most common command-line arguments by running
$ tldr duckdb
We have released DuckDB v0.10.1, a bugfix release. For installation instructions, see:
This release fixes several issues with the CSV parser and tackles scenarios which previously resulted in out-of-memory (OOM) errors (details in 🧵).
We held DuckCon #4 today in Amsterdam. Thanks to all speakers and attendees for making this an amazing event, and to @RillData for sponsoring the drinks & snacks!
The speaker decks are available on the event's site:
The recordings will be published in
DuckDB's DevRel, @szarnyasg, gave a talk last November at the @oredev conference titled "DuckDB: Harnessing in-process analytics for data science and beyond". The recording is now available:
The slide deck is here:
A new DuckDB article is out on Datanami with quotes from @hfmuehleisen:
DuckDB Walks to the Beat of Its Own Analytics Drum
“DuckDB has this different angle,” Mühleisen said. “It’s more like something that you put into a workflow rather than something
We have released DuckDB v0.9.1 today. This is a bug fix release for various issues discovered after we released 0.9.0. There are no new features, just bug fixes. Database files created by DuckDB v0.9.0 can be read by DuckDB v0.9.1.
Laurens Kuiper will present his paper "Robust External Hash Aggregation in the Solid State Age" tomorrow at ICDE 2024 in Utrecht. This work describes the techniques that make larger-than-memory aggregation possible in DuckDB. The paper is co-authored by Peter Boncz (@peterabcz)
DuckCon #4 is next week in Amsterdam, on Feb 2 (Friday). Subash Roul from @fivetran is going to talk about building data lakes using DuckDB.
See the rest of the talks and the registration link at