@DrakeLin DrakeLin commented Jan 20, 2026

Description

This PR adds documentation to PROTOCOL.md for two table properties that control per-column statistics collection:

  • delta.dataSkippingStatsColumns: Allows users to explicitly specify which columns should have statistics collected. Column names can refer to struct fields using dot notation (e.g., a.b.c), in which case all leaf fields within that struct are included.
  • delta.dataSkippingNumIndexedCols: Specifies the number of leading leaf columns in the table schema for which to collect statistics. Defaults to 32.
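
To make the two selection rules concrete, here is a minimal Python sketch of how a writer might decide which leaf columns get statistics. It is an illustration, not the Spark implementation. The nested-dict schema representation and the helper names `leaf_paths` and `stats_columns` are hypothetical. The sketch also assumes that an explicit `delta.dataSkippingStatsColumns` list takes precedence over `delta.dataSkippingNumIndexedCols`, and that column names contain no dots or commas that would need escaping.

```python
DEFAULT_NUM_INDEXED_COLS = 32  # default stated in the PR description


def leaf_paths(schema, prefix=()):
    """Return the dotted paths of all leaf fields, in schema order.

    Hypothetical schema model: a dict maps field names either to a type
    name (a leaf column) or to another dict (a struct).
    """
    paths = []
    for name, field in schema.items():
        path = prefix + (name,)
        if isinstance(field, dict):
            # Struct: recurse so that only leaf fields are returned.
            paths.extend(leaf_paths(field, path))
        else:
            paths.append(".".join(path))
    return paths


def stats_columns(schema, properties):
    """Pick the leaf columns for which statistics would be collected."""
    leaves = leaf_paths(schema)
    explicit = properties.get("delta.dataSkippingStatsColumns")
    if explicit is not None:
        # Assumption: the explicit list wins over the leading-column count.
        wanted = [c.strip() for c in explicit.split(",") if c.strip()]
        # Naming a struct selects every leaf field nested inside it.
        return [
            leaf
            for leaf in leaves
            if any(leaf == w or leaf.startswith(w + ".") for w in wanted)
        ]
    limit = int(
        properties.get("delta.dataSkippingNumIndexedCols", DEFAULT_NUM_INDEXED_COLS)
    )
    # Otherwise take the leading leaf columns in schema order.
    return leaves[:limit]


schema = {
    "id": "long",
    "a": {"b": {"c": "int", "d": "string"}, "e": "double"},
    "ts": "timestamp",
}

by_name = stats_columns(schema, {"delta.dataSkippingStatsColumns": "a.b"})
by_count = stats_columns(schema, {"delta.dataSkippingNumIndexedCols": "2"})
by_default = stats_columns(schema, {})
```

With this example schema, naming the struct `a.b` selects both of its leaf fields, while a count of 2 selects the first two leaves in schema order, one of which is nested.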

Both properties are already implemented in Spark today. See:

val DATA_SKIPPING_NUM_INDEXED_COLS = buildConfig[Int](

@DrakeLin DrakeLin requested a review from tdas as a code owner January 20, 2026 23:23