-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[openhouse] Implement TableOperations-specific FileIO instance #241
base: main
Are you sure you want to change the base?
[openhouse] Implement TableOperations-specific FileIO instance #241
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to the OH community! Aligned on PR direction but left some comments
@@ -88,6 +92,8 @@ public class OpenHouseCatalog extends BaseMetastoreCatalog | |||
|
|||
protected Map<String, String> properties; | |||
|
|||
private Cache<TableOperations, FileIO> fileIOCloser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these look like temporary duplication of code that will be available in upstream iceberg with version bump. Can we add a todo identifiying the PR and version of iceberg that introduced the fileIOcloser and instructions on removing the code duplication later?
.snapshotApi(snapshotApi) | ||
.cluster(cluster) | ||
.build(); | ||
FileIO tableOperationsLocalFileIO = loadFileIO(properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also change the signature of loadFileIO to following:
protected FileIO loadFileIO(TableIdentifier tableIdentifier, Map<String, String> properties) {
reason:
Each table can have its own storage, table1 -> hdfs, table2 -> s3, table3 -> adls . We'd need to instantiate/load the right FileIO per table and properties.
@@ -592,4 +603,25 @@ private TableMetadata createStagedMetadata() { | |||
return new StaticTableOperations(tableLocation, fileIO).refresh(); | |||
} | |||
} | |||
|
|||
@Override | |||
public void close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add unit tests to validate this logic?
@@ -41,6 +42,7 @@ dependencies { | |||
exclude group: 'com.zaxxer', module: 'HikariCP-java7' | |||
exclude group: 'org.apache.commons', module: 'commons-lang3' | |||
} | |||
implementation("com.github.ben-manes.caffeine:caffeine:" + caffeineVersion) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2,6 +2,9 @@ | |||
|
|||
import static com.linkedin.openhouse.javaclient.OpenHouseTableOperations.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for fileio per table, we might also need similar changes in OpenHouseInternalCatalog https://github.com/linkedin/openhouse/blob/main/iceberg/openhouse/internalcatalog/src/main/java/com/linkedin/openhouse/internal/catalog/OpenHouseInternalCatalog.java
@@ -88,6 +92,8 @@ public class OpenHouseCatalog extends BaseMetastoreCatalog | |||
|
|||
protected Map<String, String> properties; | |||
|
|||
private Cache<TableOperations, FileIO> fileIOCloser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. How long the entries are present in the cache? Wondering if the cache will be populated with too many objects if there are more tables operation done in a short period of time. Do you think is going to consume more memory?
private Cache<TableOperations, FileIO> newFileIOCloser() { | ||
return Caffeine.newBuilder() | ||
.weakKeys() | ||
.removalListener( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the listener is invoked to perform close?
Summary
[Issue][(https://github.com/linkedin/openhouse/issues/#nnn)] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.](#240)
In order to implement FileIOs that are TableOperations-specific, creating a FileIO instance on a per TableOperations-basis is required.
This is similar to how other Iceberg-compliant Catalogs are implemented: apache/iceberg#10893 and implemented in the same way (taken from the HiveCatalog / GlueCatalog implementations before it got refactored into FileIOTracker)
This feature is needed to implement data access tokens.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
TODO
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.