Liquid Clustering vs Partitioning vs Z-Ordering: When to Use What

Jul 26, 2025

If you're working with large-scale tables and need to optimize query performance, you'll typically choose between liquid clustering, partitioning, and Z-ordering. Each has its place, but benchmarks comparing them suggest that liquid clustering is often the fastest, especially for workloads whose query patterns change over time.

Here’s a straight comparison of what each one does, when it helps, and how they stack up.

1. Liquid Clustering (Delta Lake 3.0 and beyond)

What it does
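Liquid clustering organizes a table's data by one or more clustering columns instead of a fixed directory layout. Data is clustered incrementally (for example, when OPTIMIZE runs), so changing the clustering columns doesn't force a rewrite of existing files. Readers then use per-file statistics to skip files that can't match a filter.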
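To make data skipping concrete, here is a minimal Python sketch of how per-file min/max statistics let a reader ignore files that can't contain a value. It is an illustration only, not Delta Lake code: the `FileStats` class, the `files_to_read` helper, and the sample ranges are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one column."""
    name: str
    min_val: int
    max_val: int


def files_to_read(files, value):
    """Return names of files whose [min, max] range could contain `value`."""
    return [f.name for f in files if f.min_val <= value <= f.max_val]


# Unclustered: every file spans almost the whole key range, so nothing is skipped.
unclustered = [FileStats("a", 1, 98), FileStats("b", 3, 100), FileStats("c", 2, 99)]

# Clustered: each file covers a narrow, non-overlapping range.
clustered = [FileStats("a", 1, 33), FileStats("b", 34, 66), FileStats("c", 67, 100)]

print(files_to_read(unclustered, 50))  # ['a', 'b', 'c']
print(files_to_read(clustered, 50))    # ['b']
```

The tighter and less overlapping the per-file ranges, the more files a filter can skip, which is the effect any clustering scheme is trying to achieve.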

Why it's better

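Reported benchmarks show 2–6x faster performance than traditional Z-ordering, though results depend on the workload
Changing the clustering columns doesn’t require rewriting existing files
Supports evolving query patterns
No need for repartitioning or full-table re-sorting jobs, because clustering is applied incrementally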

When to use

You’re on Delta Lake 3.0+ (or another engine that supports it) and want fast performance with minimal tuning
Your table is large and query patterns change over time
You want adaptive clustering without frequent rewrites

Limitations

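Availability depends on your engine: it is a Delta Lake feature (3.0+), while platforms like Snowflake and Iceberg offer their own clustering or sort-order mechanisms instead
Must be enabled explicitly per table by declaring clustering columns
Can’t be combined with partitioning or Z-ordering on the same table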
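As a rough illustration of what enabling it looks like, the sketch below builds the Delta Lake SQL statements for declaring clustering columns, changing them later, and triggering clustering. The helper functions, the `events` table, and its columns are hypothetical; the `CLUSTER BY` and `OPTIMIZE` syntax follows Delta Lake's SQL, but check your engine's documentation for the exact version support.

```python
def create_clustered_table_sql(table, columns_ddl, cluster_cols):
    """Build a CREATE TABLE statement that declares clustering columns."""
    keys = ", ".join(cluster_cols)
    return f"CREATE TABLE {table} ({columns_ddl}) USING DELTA CLUSTER BY ({keys})"


def change_clustering_sql(table, cluster_cols):
    """Build an ALTER TABLE statement that switches the clustering columns."""
    keys = ", ".join(cluster_cols)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_sql(table):
    """Build the OPTIMIZE statement that triggers incremental clustering."""
    return f"OPTIMIZE {table}"


# Hypothetical table and columns, for illustration only.
statements = [
    create_clustered_table_sql(
        "events",
        "user_id BIGINT, device_type STRING, event_date DATE",
        ["user_id"],
    ),
    change_clustering_sql("events", ["user_id", "device_type"]),
    optimize_sql("events"),
]

for stmt in statements:
    print(stmt)
```

In a Spark session with Delta Lake configured, each string could be passed to `spark.sql(...)`. Note that no `ZORDER BY` or `PARTITIONED BY` clause appears anywhere: the clustering columns are the only layout declaration.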

Bottom line
If supported by your engine, use liquid clustering by default. It’s the most scalable and lowest-maintenance option.

2. Partitioning

What it does
Physically splits data into directories or files based on a column’s value. Commonly used for date, region, or other categorical fields.
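The directory layout can be sketched in a few lines of Python. This is a toy in-memory model, assuming Hive-style `column=value` directory names; the `/data/events` root, the helper names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def partition_path(table_root, column, value):
    """Hive-style partition directory, e.g. /data/events/date=2024-07-01."""
    return f"{table_root}/{column}={value}"


def write_partitioned(rows, table_root, column):
    """Group rows by the partition column, one 'directory' per distinct value."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(table_root, column, row[column])].append(row)
    return dict(layout)


def prune(layout, table_root, column, value):
    """Keep only the directory matching an equality filter on the partition column."""
    wanted = partition_path(table_root, column, value)
    return {path: rows for path, rows in layout.items() if path == wanted}


rows = [
    {"date": "2024-07-01", "user_id": 1},
    {"date": "2024-07-01", "user_id": 2},
    {"date": "2024-07-02", "user_id": 3},
]

layout = write_partitioned(rows, "/data/events", "date")
print(sorted(layout))
# ['/data/events/date=2024-07-01', '/data/events/date=2024-07-02']

scanned = prune(layout, "/data/events", "date", "2024-07-01")
print(sum(len(r) for r in scanned.values()))  # 2 rows read, 1 directory scanned
```

A filter on the partition column resolves to a directory lookup, which is why partitioning works so well for that one column and does nothing for filters on any other column.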

When it helps

Queries filter on one specific column (e.g., WHERE date = '2024-07-01')
Data is stable and doesn’t evolve rapidly
You want to minimize scanned files in predictable patterns

Limitations

Doesn’t work well with high-cardinality or skewed columns
Over-partitioning leads to small files and poor performance
Requires redesign if query patterns shift
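The small-files problem is easy to see with back-of-the-envelope arithmetic. The sketch below assumes data is spread evenly across partitions, which real tables rarely achieve; the function name and the table sizes are made up for illustration.

```python
def avg_file_size_mb(table_size_gb, num_partitions, files_per_partition=1):
    """Average file size in MB, assuming data is spread evenly across partitions."""
    total_files = num_partitions * files_per_partition
    return table_size_gb * 1024 / total_files


# A 100 GB table partitioned by day for one year: a few hundred MB per file.
by_day = avg_file_size_mb(100, 365)

# The same table partitioned by a column with a million distinct values:
# about a tenth of a megabyte per file.
by_user = avg_file_size_mb(100, 1_000_000)

print(f"by day:  {by_day:.1f} MB per file")
print(f"by user: {by_user:.4f} MB per file")
```

With skewed data the picture is worse: a few partitions hold most of the rows while the rest are nearly empty, so the reader pays per-file overhead without much pruning benefit.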

Best for

Time-series data
Stable, low-cardinality filters
Legacy systems or simple use cases

3. Z-Ordering

What it does
Sorts data across multiple columns using a space-filling curve (Z-order curve) so that rows with similar values are physically close. Improves file skipping during reads.
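For intuition, a Z-order value can be computed by interleaving the bits of the column values. The sketch below is the textbook two-dimensional version on small non-negative integers. It is not Delta Lake's implementation, which has to handle arbitrary column types and value distributions, and the function name is hypothetical.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single Z-order value.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting points by their Z-order value traces the Z-shaped curve.
points = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because both columns contribute bits at every level, rows that are close in both dimensions tend to get nearby Z-order values and so land in the same files, which keeps per-file min/max ranges narrow for each column.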

When it helps

Queries filter across multiple dimensions (e.g., user_id, device_type)
You can’t partition on all relevant columns
You want better performance without increasing partition complexity

Limitations

Requires explicit OPTIMIZE ... ZORDER BY (...) commands
Clustering degrades as new data is appended, so the command has to be rerun periodically
Doesn’t adapt automatically to query patterns
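The maintenance burden looks roughly like this in practice. The sketch builds the Delta Lake `OPTIMIZE ... ZORDER BY` statement, optionally restricted to recent partitions so that each run rewrites less data. The helper name, the `events` table, and its columns are hypothetical, and the `WHERE` variant assumes the table is partitioned by `event_date`.

```python
def zorder_sql(table, zorder_cols, partition_filter=None):
    """Build an OPTIMIZE ... ZORDER BY statement.

    `partition_filter`, if given, is placed in a WHERE clause before ZORDER BY
    and is expected to reference partition columns only.
    """
    stmt = f"OPTIMIZE {table}"
    if partition_filter:
        stmt += f" WHERE {partition_filter}"
    cols = ", ".join(zorder_cols)
    return f"{stmt} ZORDER BY ({cols})"


# Hypothetical table: Z-order everything, or only recent partitions.
full = zorder_sql("events", ["user_id", "device_type"])
recent = zorder_sql(
    "events", ["user_id", "device_type"], "event_date >= '2024-07-01'"
)

print(full)
print(recent)
```

A statement like this typically has to be scheduled as a recurring job, since files appended after the last run are not Z-ordered until the next one.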

Best for

Delta Lake tables without access to liquid clustering
Moderate-scale datasets
Multi-column filter queries where partitions fall short

Which One Should You Use?

If you're using Delta Lake 3.0+ or a platform that supports it, liquid clustering is the recommended default. It offers faster reads, requires less manual maintenance, and adapts to changing data and query patterns without costly rewrites.

Partitioning and Z-ordering still have valid use cases, especially in older systems. But if you're building or optimizing a modern lakehouse, start with liquid clustering wherever possible.

Nan's Byte Sized Data
