Shared-data Enhancements
Provides a Query Cache, aligned with the Shared-nothing architecture.
Supports synchronous materialized views, aligned with the Shared-nothing architecture.
Data Lake Analytics
Optimizes Iceberg V2 queries by reducing repeated reads of delete files, which lowers memory usage and improves query performance.
Provides Time Travel query capability for Iceberg, allowing data to be read from a specified BRANCH or TAG by specifying TIMESTAMP or VERSION.
Supports Delta Lake column mapping.
Data Cache related improvements:
Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves cache hit rate, and reduces fluctuations in query performance.
Unifies parameters for Data Cache in both Shared-data architecture and lake query scenarios.
Provides an Adaptive IO strategy optimization for Data Cache, which adaptively routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
Enables asynchronous delivery of query fragments for lake queries. This removes the restriction that FE must obtain all files to be queried before BE can start executing, so FE can fetch the file list while BE executes the query in parallel, shortening overall latency for lake queries that involve a large number of files. (Currently, the optimization is complete for Hudi and Delta Lake; Iceberg is not yet done.)
Supports automatic collection of external table statistics, which can collect more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance.
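The Segmented LRU idea mentioned under the Data Cache improvements can be sketched as follows. This is a minimal illustration and not the StarRocks implementation: the class name `SLRUCache`, the two segment sizes, and the promote-on-second-access rule are assumptions chosen for the example. New entries land in a probationary segment and only move to the protected segment after being read again, so a one-off large scan can only evict other probationary entries.

```python
from collections import OrderedDict


class SLRUCache:
    """Hypothetical two-segment LRU cache (probationary + protected)."""

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()  # entries seen once
        self.protected = OrderedDict()  # entries read again after insertion

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self._promote(key, value)  # a repeat read earns protection
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation[key] = value
            self.probation.move_to_end(key)
        else:
            self.probation[key] = value
            self._trim_probation()

    def _promote(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the coldest protected entry instead of dropping it.
            old_key, old_value = self.protected.popitem(last=False)
            self.probation[old_key] = old_value
            self._trim_probation()

    def _trim_probation(self):
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict least recently used
```

With this layout, hot keys that were read at least twice survive a burst of one-time inserts, which is the cache-pollution scenario the item describes.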
Performance Improvement and Query Optimization
Provides Arrow Flight interface for more efficient reading of large data volumes in query results.
[Experimental] Offers a preliminary query feedback feature for automatic optimization of slow queries. The system collects slow queries, analyzes each query plan against its execution details for potential optimizations, and may generate a tailored optimization guide. If the optimizer generates the same bad plan for subsequent identical queries, the system may locally optimize the plan to attempt to produce a better one.
Enables the pushdown of multi-column OR predicates, allowing a SQL with multi-column OR conditions (e.g., a = xxx OR b = yyy) to utilize certain column indexes, thus reducing data read volume and improving query performance.
Further optimizes query performance for TPC-DS, with a roughly 30% performance improvement in TPC-DS 1TB Iceberg queries.
Supports Python UDFs, offering more convenient function customization compared to Java UDFs.
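To illustrate why pushing down a multi-column OR predicate reduces the data read, here is a small conceptual sketch. It is not how StarRocks evaluates predicates internally; the helper names `build_index` and `rows_matching_or` and the in-memory dictionary index are assumptions made for the example. Each column has its own index from value to row ids, and `a = 1 OR b = 'y'` becomes a union of two index lookups, so only matching rows are fetched instead of scanning every row.

```python
from collections import defaultdict


def build_index(rows, column):
    """Build a hypothetical per-column index: value -> set of row ids."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


def rows_matching_or(rows, indexes, predicates):
    """Evaluate (col1 = v1 OR col2 = v2 ...) using only index lookups.

    predicates is a list of (column, value) pairs combined with OR.
    """
    candidate_ids = set()
    for column, value in predicates:
        # Union the row ids from each column's index; no full scan needed.
        candidate_ids |= indexes[column].get(value, set())
    return [rows[i] for i in sorted(candidate_ids)]


rows = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "y"},
    {"a": 3, "b": "z"},
    {"a": 1, "b": "y"},
]
indexes = {"a": build_index(rows, "a"), "b": build_index(rows, "b")}
matched = rows_matching_or(rows, indexes, [("a", 1), ("b", "y")])
```

Here `matched` contains the three rows that satisfy either condition, and the row with `a = 3, b = 'z'` is never touched.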
Storage Engine
Provides unified expression partitioning, supporting arbitrary multi-level partitioning, where each level can be any expression.
Introduces a generic aggregate function state storage framework, which, in addition to the originally supported aggregate functions like SUM/MIN/MAX, can now conveniently support almost all other aggregate functions.
Supports Vector Index, offering two types of indexes: IVFPQ and HNSW, enabling fast approximate nearest neighbor searches (ANN) in large-scale, high-dimensional vectors, commonly required in deep learning and machine learning.
In the Shared-nothing architecture, Backup/Restore now supports more objects, such as Logical View and External Catalog, and also supports expression partitioning and List partitioning.
Optimizes log printing to avoid occupying too much disk space.
Accurately displays the status of BE/CN during a graceful exit.
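The multi-level expression partitioning item above can be pictured with a short sketch. This is only a conceptual model, not StarRocks code or DDL: the function `partition_rows` and the two example expressions (month of a date column, upper-cased region) are assumptions for illustration. Each level is an arbitrary expression over the row, and the tuple of results identifies the partition.

```python
from collections import defaultdict
from datetime import date


def partition_rows(rows, partition_exprs):
    """Group rows by the tuple of values produced by each level's expression."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(expr(row) for expr in partition_exprs)
        partitions[key].append(row)
    return dict(partitions)


# Two hypothetical levels: month of the event date, then normalized region.
partition_exprs = [
    lambda r: r["dt"].strftime("%Y-%m"),
    lambda r: r["region"].upper(),
]

events = [
    {"dt": date(2024, 12, 1), "region": "eu"},
    {"dt": date(2024, 12, 15), "region": "us"},
    {"dt": date(2025, 1, 2), "region": "eu"},
    {"dt": date(2024, 12, 20), "region": "eu"},
]
partitions = partition_rows(events, partition_exprs)
```

Adding a third level is just another expression in the list, which is the flexibility the "arbitrary multi-level" wording refers to.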
Loading
Provides a Batch Commit feature, which consolidates multiple concurrent Stream Loads of a table into a single ingestion transaction, thus improving the throughput of real-time data ingestion.
INSERT OVERWRITE now supports automatic creation of partitions based on imported data and only overwrites partitions containing data, simplifying partial data recovery.
Some data ingestion improvements for INSERT from FILES:
INSERT now supports matching columns by name (the default is matching by position).
INSERT supports PROPERTIES to set some parameters, like strict_mode, max_filter_ratio, and timeout (strict_mode replaces enable_insert_strict, and differs slightly from it).
Enables pushing down the target table schema when using INSERT from FILES, so that a more accurate source data schema can be inferred.
FILES now provides the ability to list files specified by the path parameter.
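The INSERT OVERWRITE behavior described above (create partitions from the incoming data, overwrite only partitions that receive data) can be modeled in a few lines. This is a conceptual sketch rather than StarRocks code: the table is represented as a plain dictionary from partition key to rows, and `insert_overwrite` and `partition_of` are hypothetical names.

```python
def insert_overwrite(table, new_rows, partition_of):
    """Return a new table where only partitions present in new_rows are replaced.

    table: dict mapping partition key -> list of rows
    partition_of: function mapping a row to its partition key
    """
    incoming = {}
    for row in new_rows:
        incoming.setdefault(partition_of(row), []).append(row)

    result = dict(table)     # partitions without incoming data stay untouched
    result.update(incoming)  # replace existing partitions, create missing ones
    return result


table = {
    "2024-01": [{"month": "2024-01", "v": 1}],
    "2024-02": [{"month": "2024-02", "v": 2}, {"month": "2024-02", "v": 3}],
}
new_rows = [
    {"month": "2024-02", "v": 20},
    {"month": "2024-03", "v": 30},
]
updated = insert_overwrite(table, new_rows, lambda r: r["month"])
```

In this model the `2024-01` partition is left as is, `2024-02` is fully replaced by the single new row, and `2024-03` is created automatically, which is why the feature is convenient for recovering part of a table.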
ETA: December 2024