
Fix CTAS for non-hdfs storages, also fixes multi storage cases #256

Open · wants to merge 8 commits into base: main
Conversation

@vikrambohra vikrambohra commented Nov 19, 2024

Summary

This PR introduces the following changes:

  1. Fix CTAS for non-HDFS storage types
    While extracting the UUID from a snapshot, the code builds the database path without the endpoint (scheme) and then checks whether that path is a prefix of the manifestList recorded in the snapshot, so the prefix check fails for schemed storages.

Example
ManifestList (from snapshot): s3://bucket-name/database/table-uuid/file.avro
Database prefix: bucket-name/database (not a prefix of above)

Fix: Strip the endpoint (scheme) from the manifest list path by resolving the correct storage from the tableLocation.

After fix
ManifestList stripped: bucket-name/database/table-uuid/file.avro
Database prefix: bucket-name/database (is a prefix of above)

  2. Fix the multiple-storage scenario
    The code assumes that storage is always the cluster default. This fails when the default is a scheme-less storage (HDFS) and the db.table is stored in a storage with a scheme (S3, BlobFs).

Fix: Resolve the correct storage by extracting the tableLocation from the table props and checking the scheme (endpoint). A combined sketch of both fixes appears after this list.

  3. Add a method to the Storage interface to check if the tableLocation exists.
  4. Add a method to StorageClient to check if a specified path exists.
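
A minimal sketch of fixes 1 and 2 above, as a plain-JDK stand-in (the real code uses the Storage/StorageClient abstractions reviewed below, so the names here are illustrative only):

import java.util.List;
import java.util.Optional;

public class StorageResolutionSketch {

  // Stand-in for the configured endpoints from cluster.yaml, e.g. "s3://",
  // or "" for the scheme-less HDFS default.
  static Optional<String> resolveEndpoint(String tableLocation, List<String> endpoints) {
    return endpoints.stream()
        // An empty (scheme-less) endpoint would match everything, so only
        // scheme-bearing endpoints participate in the prefix match.
        .filter(e -> !e.isEmpty() && tableLocation.startsWith(e))
        .findFirst();
  }

  public static void main(String[] args) {
    String manifestList = "s3://bucket-name/database/table-uuid/file.avro";
    String databasePrefix = "bucket-name/database";

    // Resolve the storage endpoint from the location; fall back to the default.
    String endpoint = resolveEndpoint(manifestList, List.of("s3://", "")).orElse("");

    // Strip the endpoint: "s3://bucket-name/..." -> "bucket-name/...".
    String stripped = manifestList.substring(endpoint.length());

    System.out.println(stripped.startsWith(databasePrefix)); // true after the fix
  }
}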

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.
  1. Updated TableUUIDGeneratorTest
  2. Added TableUUIDGeneratorMultiStorageTest
  3. Ran the Docker setup for both S3 and HDFS
HDFS

scala> spark.sql("CREATE TABLE openhouse.db.t1 (name string)")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SHOW TBLPROPERTIES openhouse.db.t1").show(100, false)
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
|key                                       |value                                                                                                              |
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
|openhouse.tableUri                        |LocalHadoopCluster.db.t1                                                                                           |
|openhouse.creationTime                    |1732661881733                                                                                                      |
|current-snapshot-id                       |none                                                                                                               |
|write.metadata.delete-after-commit.enabled|true                                                                                                               |
|write.metadata.previous-versions-max      |28                                                                                                                 |
|openhouse.tableCreator                    |openhouse                                                                                                          |
|openhouse.lastModifiedTime                |1732661881733                                                                                                      |
|openhouse.tableType                       |PRIMARY_TABLE                                                                                                      |
|policies                                  |                                                                                                                   |
|openhouse.tableId                         |t1                                                                                                                 |
|openhouse.tableUUID                       |8faacae2-3b3c-4b24-8169-251576b31e04                                                                               |
|openhouse.databaseId                      |db                                                                                                                 |
|openhouse.clusterId                       |LocalHadoopCluster                                                                                                 |
|format                                    |iceberg/orc                                                                                                        |
|openhouse.tableVersion                    |INITIAL_VERSION                                                                                                    |
|write.format.default                      |orc                                                                                                                |
|write.parquet.compression-codec           |zstd                                                                                                               |
|openhouse.tableLocation                   |/data/openhouse/db/t1-8faacae2-3b3c-4b24-8169-251576b31e04/00000-13b06eb9-e1db-4857-b152-842fbe52b2eb.metadata.json|
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------+


scala> spark.sql("INSERT INTO openhouse.db.t1 values ('value1')")
res2: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql("SELECT * FROM openhouse.db.t1").show()
+------+
|  name|
+------+
|value1|
+------+


scala> spark.sql("CREATE TABLE openhouse.db.ctas1 AS SELECT * FROM openhouse.db.t1")
res4: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql("SELECT * FROM openhouse.db.ctas1").show()
+------+
|  name|
+------+
|value1|
+------+


scala> spark.sql("DROP TABLE openhouse.db.ctas1").show()
++
||
++
++


S3

scala> spark.sql("CREATE TABLE openhouse.db.t1 (name string)")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SHOW TBLPROPERTIES openhouse.db.t1").show(100, false)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
|key                                       |value                                                                                                                    |
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
|openhouse.tableLocation                   |s3://openhouse-bucket/db/t1-9f9c16ca-49e5-4cc6-a32d-4116bfd28ded/00000-b7feaf18-4a32-4624-b903-0ae92d85d482.metadata.json|
|openhouse.clusterId                       |LocalS3Cluster                                                                                                           |
|current-snapshot-id                       |none                                                                                                                     |
|openhouse.lastModifiedTime                |1732662172134                                                                                                            |
|write.metadata.delete-after-commit.enabled|true                                                                                                                     |
|write.metadata.previous-versions-max      |28                                                                                                                       |
|openhouse.tableCreator                    |openhouse                                                                                                                |
|openhouse.tableUri                        |LocalS3Cluster.db.t1                                                                                                     |
|openhouse.tableType                       |PRIMARY_TABLE                                                                                                            |
|policies                                  |                                                                                                                         |
|openhouse.tableId                         |t1                                                                                                                       |
|openhouse.creationTime                    |1732662172134                                                                                                            |
|openhouse.databaseId                      |db                                                                                                                       |
|format                                    |iceberg/orc                                                                                                              |
|openhouse.tableVersion                    |INITIAL_VERSION                                                                                                          |
|openhouse.tableUUID                       |9f9c16ca-49e5-4cc6-a32d-4116bfd28ded                                                                                     |
|write.format.default                      |orc                                                                                                                      |
|write.parquet.compression-codec           |zstd                                                                                                                     |
+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+


scala> spark.sql("INSERT INTO openhouse.db.t1 values ('value1')")
res2: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql("SELECT * FROM openhouse.db.t1").show()
+------+
|  name|
+------+
|value1|
+------+


scala> spark.sql("CREATE TABLE openhouse.db.ctas1 AS SELECT * FROM openhouse.db.t1")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM openhouse.db.ctas1").show()
+------+
|  name|
+------+
|value1|
+------+


scala> spark.sql("DROP TABLE openhouse.db.ctas1").show()
++
||
++
++

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@HotSushi HotSushi left a comment

Agree with the Storage API changes. One piece of feedback on not introducing Iceberg at the REST layer.


vikrambohra commented Nov 22, 2024

@HotSushi @jainlavina Addressed the comments. Some changes in the latest commit:

  1. Removed tableLocationExists() from the Storage API; we don't need to construct the table location since we fetch it from the table properties.
  2. Changed pathExists in the StorageClient API to fileExists to be explicit about the check, since we now check the absolute path of the metadata.json file (the table location) instead of only the path up to the table directory.
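
For illustration, a fileExists check along those lines could look like this (a sketch assuming a Hadoop FileSystem-backed client; the actual StorageClient implementations may differ):

import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileExistsSketch {
  private final FileSystem fs;

  public FileExistsSketch(FileSystem fs) {
    this.fs = fs;
  }

  // Explicit file-existence check, per the rename from pathExists to
  // fileExists: the path must exist AND be a file (e.g. the metadata.json
  // table location), not just a directory prefix.
  public boolean fileExists(String path) {
    try {
      Path p = new Path(path);
      return fs.exists(p) && fs.getFileStatus(p).isFile();
    } catch (IOException e) {
      throw new UncheckedIOException("Could not check existence of " + path, e);
    }
  }
}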

public Storage getStorageFromPath(String path) {
  for (Storage storage : storages) {
    if (storage.isConfigured()) {
      if (StorageType.LOCAL.equals(storage.getType())) {

Shouldn't this be a fallback only if the path does not start with any other configured endpoint?
What if the path is an S3 storage path but local storage is also configured for some other tables?


➕ let's have if (StorageType.LOCAL.equals(storage.getType())) { in the fallback
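
A sketch of the suggested ordering (reusing the Storage/StorageType names from the snippet above; an illustration of the review suggestion, not the final implementation):

// Match scheme-bearing storages first; use LOCAL only as a fallback when no
// configured endpoint prefixes the path.
public Storage getStorageFromPath(String path) {
  Storage localStorage = null;
  for (Storage storage : storages) {
    if (!storage.isConfigured()) {
      continue;
    }
    if (StorageType.LOCAL.equals(storage.getType())) {
      localStorage = storage; // remember, but don't match yet
    } else if (path.startsWith(storage.getClient().getEndpoint())) {
      return storage;
    }
  }
  if (localStorage != null) {
    return localStorage; // fallback for scheme-less local paths
  }
  throw new IllegalArgumentException("No configured storage found for path: " + path);
}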


Done. Please check.

    extractFromTblPropsIfExists(databaseId + "." + tableId, tableProperties, DB_RAW_KEY);
String tblIdFromProps =
    extractFromTblPropsIfExists(databaseId + "." + tableId, tableProperties, TBL_RAW_KEY);
String tableURI = String.format("%s.%s", databaseId, tableId);

@rohitkum2506 Can you check whether the table reinstatement logic for replication is still intact? Especially at:

String tableLocation = extractFromTblPropsIfExists(tableURI, tableProperties, TBL_LOC_RAW_KEY);



@HotSushi HotSushi left a comment


For extractUUIDFromTableProperties, the logic before would check for directory existence; that check is missing now. We should be able to incorporate it as part of Storage.isPathValid().

For extractUUIDFromExistingManifestListPath we end up calling storage.getClient().getEndpoint(); we should introduce better layering or redefine the logic. But I'm ok pursuing this in a future PR.

* Checks if the provided path is a valid path for this storage type. It defaults to checking if
* the path starts with the endpoint (scheme) specified in cluster.yaml
*
* @param path path to a file/object

nit: can you add an example of what a path string looks like for HDFS + S3, similar to the StorageClient doc? Also add a doc note explicitly saying that a prefix check wouldn't work; only an object-existence check works.
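
For instance, the requested Javadoc might read (illustrative wording only):

/**
 * Checks if the provided path is a valid path for this storage type. It defaults to
 * checking if the path starts with the endpoint (scheme) specified in cluster.yaml.
 *
 * <p>Example paths: HDFS /data/openhouse/db/table-uuid/file.metadata.json;
 * S3 s3://bucket-name/db/table-uuid/file.metadata.json
 *
 * <p>Note: a prefix check alone is not sufficient; only an object-existence check
 * confirms the file is actually there.
 *
 * @param path path to a file/object
 */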

storageManager, dbIdFromProps, tblIdFromProps, tableUUIDProperty);
if (TableType.REPLICA_TABLE != tableType && !doesPathExist(previousPath)) {
log.error("Previous tableLocation: {} doesn't exist", previousPath);
if (TableType.REPLICA_TABLE != tableType && !storage.getClient().exists(tableLocation)) {

storage.getClient().exists(tableLocation) -> storage.isPathValid(tableLocation), after incorporating the existence-check feedback?

 * @return true if endpoint is specified in cluster.yaml else false
 */
default boolean isPathValid(String path) {
  return path.startsWith(getClient().getEndpoint());

Shouldn't we also check for object existence here?


isPathValid should check three things:

  • it's appropriate for the right storage (i.e., the prefix matches s3://)
  • its directory structure is valid
    • the rootPrefix is intact (i.e., /data/openhouse/db/tb/metadata.json has rootPrefix data/openhouse)
    • the db/table directory is intact (i.e., /data/openhouse/db/tb/metadata.json has the directory structure created by storage.allocateTableLocation(), /db/tb-uuid)
  • the object exists (i.e., /data/openhouse/db/tb/metadata.json)

To achieve this, you might need to change the signature to: boolean isPathValid(String path, String dbName, String tbName, String tbUuid)
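
A sketch of what that expanded check could look like (assumes the StorageClient accessors getEndpoint(), getRootPrefix(), and exists() discussed in this thread; an illustration of the suggestion, not the final code):

// Validates (1) storage prefix, (2) directory structure under the root prefix,
// and (3) object existence, per the three checks listed above.
default boolean isPathValid(String path, String dbName, String tbName, String tbUuid) {
  // 1. The path belongs to this storage (e.g. starts with "s3://").
  if (!path.startsWith(getClient().getEndpoint())) {
    return false;
  }
  // 2. The rootPrefix and the /db/tb-uuid directory structure are intact.
  String expectedPrefix =
      getClient().getEndpoint()
          + getClient().getRootPrefix()
          + String.format("/%s/%s-%s", dbName, tbName, tbUuid);
  if (!path.startsWith(expectedPrefix)) {
    return false;
  }
  // 3. The object itself (e.g. the metadata.json file) exists.
  return getClient().exists(path);
}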

 */
@Override
public boolean isPathValid(String path) {
  return (super.isPathValid(path) || path.startsWith("/"));

We'll also need to validate the rootPrefix as well; basically this method needs to be equivalent to:

java.nio.file.Path previousPath =
    InternalRepositoryUtils.constructTablePath(
        storageManager, dbIdFromProps, tblIdFromProps, tableUUIDProperty);

!doesPathExist(previousPath)

in TableUUIDGenerator.

Preconditions.checkArgument(
    path.startsWith(getEndpoint()), String.format("Invalid S3 URL format %s", path));
try {
  URI uri = new URI(path);

Can we use the S3 utilities to get bucket/key information? For example:

import java.net.URI;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Utilities;
import software.amazon.awssdk.services.s3.model.S3Uri;

public class S3UriParser {
    public static void main(String[] args) {
        // Build S3Utilities from a client; parseUri makes no network calls.
        S3Client s3Client = S3Client.create();
        S3Utilities s3Utilities = s3Client.utilities();
        String s3UriString = "s3://my-bucket/path/to/file.txt";
        S3Uri s3Uri = s3Utilities.parseUri(URI.create(s3UriString));

        String bucket = s3Uri.bucket().orElse(null);
        String key = s3Uri.key().orElse(null);

        System.out.println("Bucket: " + bucket);
        System.out.println("Key: " + key);
    }
}

String manifestListPathString =
    new Gson().fromJson(snapshotStr, JsonObject.class).get(manifestListKey).getAsString();
manifestListPathString =
    StringUtils.removeStart(manifestListPathString, storage.getClient().getEndpoint());

Is there a better way to achieve this without calling storage.getClient().getEndpoint()? I'm ok with deferring this part to a future PR.
