Liquid Clustering vs Partitioning vs Z-Ordering: When to Use What
If you're working with large-scale tables and need to optimize query performance, you'll typically choose between liquid clustering, partitioning, and Z-ordering. Each has its use case, but recent benchmarks suggest that liquid clustering is often the fastest of the three, especially when query patterns change over time.
Here’s a straight comparison of what each one does, when it helps, and how they stack up.
1. Liquid Clustering (Delta Lake 3.1 and later)
What it does
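Liquid clustering physically groups rows with similar values in one or more clustering columns into the same files. It does this incrementally: OPTIMIZE clusters only the data that is not yet well clustered instead of re-sorting the whole table. There is no separate clustering index. Reads skip files using the per-file min/max statistics that Delta Lake already records.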
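To make the data-skipping idea concrete, here is a minimal sketch in plain Python. It is not Delta Lake code: the FileStats class, the file size, and the filter range are all made up for illustration. It only shows why a reader that knows each file's min and max can skip more files when similar values sit together.

```python
import random
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file metadata: min and max of one column."""
    min_value: int
    max_value: int


def build_files(values, rows_per_file):
    """Split values into consecutive 'files' and record min/max for each."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append(FileStats(min(chunk), max(chunk)))
    return files


def files_to_scan(files, lo, hi):
    """Count files whose [min, max] range overlaps the filter lo <= x <= hi."""
    return sum(1 for f in files if f.max_value >= lo and f.min_value <= hi)


random.seed(0)
values = list(range(10_000))
random.shuffle(values)

# Same rows, two layouts: arrival order vs. clustered (sorted) order.
unclustered = build_files(values, rows_per_file=1_000)
clustered = build_files(sorted(values), rows_per_file=1_000)

unclustered_scans = files_to_scan(unclustered, 2_500, 2_600)
clustered_scans = files_to_scan(clustered, 2_500, 2_600)
print(unclustered_scans, clustered_scans)
```

In the clustered layout the filter range falls inside a single file, so only that file has to be read. In the shuffled layout nearly every file's range overlaps the filter, so the statistics rule out very little.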
Why it's better
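Reported benchmarks show 2–6x faster performance than traditional Z-ordering, though actual gains depend on data and workload
Changing the clustering columns doesn’t require rewriting existing data
Supports evolving query patterns
No repartitioning or full-table re-sorting jobs; OPTIMIZE clusters new data incrementally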
When to use
You’re on Delta Lake 3.1+ or a platform that supports liquid clustering and want fast performance with minimal tuning
Your table is large and query patterns change over time
You want adaptive clustering without frequent rewrites
Limitations
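Available in Delta Lake 3.1+ and on Databricks; Snowflake and Apache Iceberg have their own clustering and sort-order features, which are related but not the same thing
Must be enabled per table by declaring clustering columns with CLUSTER BY, and can’t be combined with partitioning or Z-ordering on the same table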
Bottom line
If supported by your engine, use liquid clustering by default. It’s the most scalable and lowest-maintenance option.
2. Partitioning
What it does
Physically splits data into directories or files based on a column’s value. Commonly used for date, region, or other categorical fields.
When it helps
Queries filter on one specific column (e.g., WHERE date = '2024-07-01')
Data is stable and doesn’t evolve rapidly
You want to minimize scanned files in predictable patterns
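A rough sketch of partition pruning, assuming a toy in-memory "table" where each partition directory is a dictionary key. The row data and function names are hypothetical and only mimic the directory-per-value layout described above.

```python
from collections import defaultdict

# Hypothetical rows: (date, region, amount).
rows = [
    ("2024-07-01", "eu", 10),
    ("2024-07-01", "us", 20),
    ("2024-07-02", "eu", 30),
    ("2024-07-03", "us", 40),
]


def partition_by_date(rows):
    """Group rows by date, mimicking one directory per partition value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"date={row[0]}"].append(row)
    return dict(partitions)


def read_with_pruning(partitions, date):
    """Open only the directory matching the filter; skip all the others."""
    return partitions.get(f"date={date}", [])


table = partition_by_date(rows)
matched = read_with_pruning(table, "2024-07-01")
print(sorted(table), matched)
```

A filter on the partition column touches one directory and nothing else. A filter on region alone would get no help from this layout and would have to read every directory.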
Limitations
Doesn’t work well with high-cardinality or skewed columns
Over-partitioning leads to small files and poor performance
Requires redesign if query patterns shift
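The small-files problem is mostly arithmetic. The sketch below uses invented numbers (a 500 GB table, 365 date partitions, one million user_id partitions) and assumes data is spread evenly, which real tables rarely are.

```python
def average_file_size_mb(table_size_gb, partition_count, files_per_partition=1):
    """Average file size if data were spread evenly across all partitions."""
    total_files = partition_count * files_per_partition
    return table_size_gb * 1024 / total_files


# Hypothetical 500 GB table partitioned two different ways.
by_date = average_file_size_mb(table_size_gb=500, partition_count=365)
by_user = average_file_size_mb(table_size_gb=500, partition_count=1_000_000)
print(round(by_date), round(by_user, 3))
```

Partitioning by date leaves files of well over a gigabyte each, which can still be split into comfortably sized files. Partitioning by a high-cardinality column like user_id leaves about half a megabyte per partition, so the engine spends more time listing and opening files than reading data.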
Best for
Time-series data
Stable, low-cardinality filters
Legacy systems or simple use cases
3. Z-Ordering
What it does
Sorts data across multiple columns using a space-filling curve (Z-order curve) so that rows with similar values are physically close. Improves file skipping during reads.
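The curve itself is easy to sketch. This is the textbook Morton encoding for two small integers, not the code Delta Lake runs internally; the function name and the 4x4 grid are illustrative assumptions.

```python
def interleave_bits(x, y, bits=16):
    """Return the Morton (Z-order) code for two non-negative integers."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


# Sort a 4x4 grid of (x, y) points along the Z-order curve.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
print(z_sorted[:4])
```

The first four points in Z-order form one 2x2 corner of the grid, so a file holding them covers a narrow range of both x and y. That is what lets min/max statistics prune files on either column, where a plain sort by x would only help filters on x.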
When it helps
Queries filter across multiple dimensions (e.g., user_id, device_type)
You can’t partition on all relevant columns
You want better performance without increasing partition complexity
Limitations
Requires explicit OPTIMIZE <table> ZORDER BY (...) commands
Sorting loses effectiveness as new data is appended
Doesn’t adapt automatically to query patterns
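Here is a small sketch of why appends erode a sorted layout. It models files as (min, max) ranges over a made-up user_id column, with invented batch contents; it stands in for the effect only, not for how any engine stores or compacts data.

```python
def file_ranges(values, rows_per_file):
    """Return (min, max) for each consecutive chunk of rows, one per 'file'."""
    ranges = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def scanned(ranges, target):
    """Count files whose range could contain the target value."""
    return sum(1 for lo, hi in ranges if lo <= target <= hi)


# Hypothetical table: ids 0..999, freshly sorted into four files.
existing = list(range(1000))
optimized = file_ranges(existing, rows_per_file=250)
before = scanned(optimized, 400)

# Three later appends, each landing as its own file and spanning most of the id range.
appended = [list(range(offset, 1000, 100)) for offset in (3, 7, 11)]
with_appends = optimized + [(min(b), max(b)) for b in appended]
after_append = scanned(with_appends, 400)

# Re-sorting everything restores tight, non-overlapping ranges.
all_rows = sorted(existing + [v for b in appended for v in b])
after_reoptimize = scanned(file_ranges(all_rows, rows_per_file=250), 400)
print(before, after_append, after_reoptimize)
```

A lookup that touched one file right after sorting touches four once the appended files arrive, because each new file's range covers the target. Re-sorting brings it back to one, which is the recurring job Z-ordering depends on.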
Best for
Delta Lake tables without access to liquid clustering
Moderate-scale datasets
Multi-column filter queries where partitions fall short
Which One Should You Use?
If you're using Delta Lake 3.1+ or a platform that supports it, liquid clustering is the recommended default. It offers faster reads, requires less manual maintenance, and adapts to changing data and query patterns without full-table rewrites.
Partitioning and Z-ordering still have valid use cases, especially in older systems. But if you're building or optimizing a modern lakehouse, start with liquid clustering wherever possible.


