Description
Example: TPCH Q7 / Hive / TPCH Q8 (screenshots)
I reran TPCDS Q60 on a medium cluster; the query elapsed time was stable at 16s, with planning time at 13s. I took a JFR profile on the coordinator and saw:
- The planning thread is repeatedly blocking in socket reads to the remote metastore service.
- The total blocking time on those reads is ~11–12s.
- The end-to-end planning time (~13s) lines up almost exactly with the cumulative socket-read time.
The hotspot was in IcebergHiveMetadata.getTableStatistics, reached through ThriftHiveMetastore. This confirms that network I/O to the metastore, not local CPU, is the dominant cause of slow planning.
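For anyone reproducing this, here is a minimal sketch of how the cumulative socket-read time can be tallied from a JFR recording with the standard jdk.jfr.consumer API (the class name and the aggregation are illustrative; the recording path is passed as the first argument):

import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class SumSocketReadTime
{
    public static void main(String[] args) throws Exception
    {
        long totalMillis = 0;
        for (RecordedEvent event : RecordingFile.readAllEvents(Path.of(args[0]))) {
            // jdk.SocketRead events carry the blocking duration of each socket read
            if (event.getEventType().getName().equals("jdk.SocketRead")) {
                totalMillis += event.getDuration().toMillis();
            }
        }
        System.out.printf("Total jdk.SocketRead time: %d ms%n", totalMillis);
    }
}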
I then manually enabled the metastore cache by hard-coding the values in com/facebook/presto/hive/metastore/InMemoryCachingHiveMetastore.java:
private InMemoryCachingHiveMetastore(...)
{
    ...
    // Original scope-based configuration, commented out for this experiment:
    // switch (metastoreCacheScope) {
    //     case PARTITION:
    //         partitionCacheExpiresAfterWriteMillis = expiresAfterWriteMillis;
    //         partitionCacheRefreshMills = refreshMills;
    //         partitionCacheMaxSize = maximumSize;
    //         cacheExpiresAfterWriteMillis = OptionalLong.of(0);
    //         cacheRefreshMills = OptionalLong.of(0);
    //         cacheMaxSize = 0;
    //         break;
    //     case ALL:
    //         partitionCacheExpiresAfterWriteMillis = expiresAfterWriteMillis;
    //         partitionCacheRefreshMills = refreshMills;
    //         partitionCacheMaxSize = maximumSize;
    //         cacheExpiresAfterWriteMillis = expiresAfterWriteMillis;
    //         cacheRefreshMills = refreshMills;
    //         cacheMaxSize = maximumSize;
    //         break;
    //     default:
    //         throw new IllegalArgumentException("Unknown metastore-cache-scope: " + metastoreCacheScope);
    // }
    // Hack: force-enable both the partition cache and the general metadata cache
    // with an effectively unlimited TTL, refresh interval, and size.
    partitionCacheExpiresAfterWriteMillis = OptionalLong.of(999999999999L);
    partitionCacheRefreshMills = OptionalLong.of(999999999999L);
    partitionCacheMaxSize = 999999999999L;
    cacheExpiresAfterWriteMillis = OptionalLong.of(999999999999L);
    cacheRefreshMills = OptionalLong.of(999999999999L);
    cacheMaxSize = 999999999999L;
    ...
}
With this change, the query planning time for TPCDS Q65 and Q56 dropped from over 13s to 1.6s. However, the config property that should control this cache has no effect, because the cache TTL is hard-coded to 0.
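For reference, assuming the standard Hive connector property names (worth double-checking against the current docs), these are the catalog properties that would normally drive this configuration; with the Iceberg connector they currently have no effect because the TTL is forced to 0. Values below are purely illustrative:

hive.metastore-cache-scope=ALL
hive.metastore-cache-ttl=2d
hive.metastore-refresh-interval=1d
hive.metastore-cache-maximum-size=10000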
@agrawalreetika pointed out that Iceberg metadata caching was disabled in #24326. In that issue, @imjalpreet mentioned: "The ideal long-term solution would involve allowing caching of specific metadata, such as table statistics or the list of table names, which do not violate the ACID contract. However, as an immediate fix, we should disallow enabling the metastore cache when using the Iceberg connector." However, the mentioned metadata caches have not been implemented yet. @imjalpreet's #24376 adds caching for getTable(), but it does not cover getTableStatistics().
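To illustrate what such a targeted cache might look like, here is a minimal sketch that memoizes the remote getTableStatistics call per (database, table) with a TTL, following the same Guava LoadingCache pattern the existing metastore caching uses. The class name, key encoding, and generic statistics type are all hypothetical, not the actual Presto API, and a real fix would also need invalidation consistent with Iceberg's ACID contract:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;

// Hypothetical sketch: memoize remote statistics lookups per (database, table)
// with a TTL and a size bound, so repeated planner calls skip the network round trip.
public final class TableStatisticsCacheSketch<S>
{
    private final LoadingCache<String, S> cache;

    public TableStatisticsCacheSketch(BiFunction<String, String, S> remoteLoader, long ttlMillis, long maxSize)
    {
        this.cache = CacheBuilder.newBuilder()
                .expireAfterWrite(ttlMillis, TimeUnit.MILLISECONDS)
                .maximumSize(maxSize)
                .build(new CacheLoader<String, S>()
                {
                    @Override
                    public S load(String key)
                    {
                        // Key is "database\0table"; split it back apart for the remote call.
                        int separator = key.indexOf('\0');
                        return remoteLoader.apply(key.substring(0, separator), key.substring(separator + 1));
                    }
                });
    }

    public S getTableStatistics(String database, String table)
    {
        // Served from cache when present; otherwise falls through to the metastore.
        return cache.getUnchecked(database + '\0' + table);
    }
}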
@zacw7: When you did the runs earlier this year that showed similar performance between Iceberg and Hive, was it before #24326 was merged? I'm wondering why you didn't see this issue then.