Add REST Catalog tests to Spark 3.5 integration test #11093
base: main
Conversation
Force-pushed from f01f8e5 to 26f3d99. Commits:
- Add REST Catalog tests to Spark 3.4 integration test
- tmp save
- Fix integ tests
- Revert "Add REST Catalog tests to Spark 3.4 integration test" (reverts commit d052416)
- unneeded changes
- fix test
- retrigger checks
- Fix integ test
- Fix port already in use
- Fix unmatched validation catalog
- spotless
- Fix sqlite related test failures
Force-pushed from 9481bd4 to 8bcd853.
...5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
```java
@@ -521,7 +524,7 @@ public void testFilesTableTimeTravelWithSchemaEvolution() throws Exception {
            optional(3, "category", Types.StringType.get())));

    spark.createDataFrame(newRecords, newSparkSchema).coalesce(1).writeTo(tableName).append();

    table.refresh();
```
is this refresh only needed for REST?
Yes, RESTTableOperations and the other TableOperations implementations have different mechanisms for refreshing metadata.
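As a hedged illustration of the point above (not Iceberg's actual RESTTableOperations code; the class below is hypothetical), a client that caches table metadata will not observe a commit made through another path until it explicitly refreshes, which is why the test needed `table.refresh()`:

```java
// Hypothetical sketch: a client that caches metadata until refresh() is
// called, illustrating why an explicit table.refresh() can be required
// before reading state that was changed through another connection.
class CachingMetadataClient {
  private int serverVersion = 1;  // state held by the "server"
  private int cachedVersion = 1;  // state this client last loaded

  void commitOnServer() { serverVersion++; }        // another writer commits
  void refresh() { cachedVersion = serverVersion; } // re-read from the server
  int cachedMetadataVersion() { return cachedVersion; }
}
```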
```java
// JdbcCatalog, then different jdbc connections could provide different views of table
// status even belonging to the same catalog. Reference:
// https://www.sqlite.org/inmemorydb.html
System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
```
is this strictly needed to make tests pass? I don't think we set this in any other tests for that specific purpose
Yes, this is needed for the tests to pass. I attempted to run the tests without this line, and here are the results: https://github.com/apache/iceberg/actions/runs/11297060489/job/31423193928?pr=11093
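A hedged sketch of why a pool size of 1 matters here (class names hypothetical, not Iceberg's pool implementation): each connection to an in-memory SQLite database holds its own private state, so two operations routed to different pooled connections can see different table status. With a single pooled connection, every sequential caller shares one view:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Each "connection" owns private state, mimicking sqlite :memory:,
// where every new connection opens a fresh, empty database.
class InMemoryConnection {
  final Map<String, String> tables = new HashMap<>();
}

// A trivial pool: with maxSize = 1, sequential acquire/release cycles
// hand back the same connection, so all callers see the same tables.
class ConnectionPool {
  private final Queue<InMemoryConnection> idle = new ArrayDeque<>();
  private final int maxSize;

  ConnectionPool(int maxSize) { this.maxSize = maxSize; }

  InMemoryConnection acquire() {
    InMemoryConnection c = idle.poll();
    return c != null ? c : new InMemoryConnection();
  }

  void release(InMemoryConnection c) {
    if (idle.size() < maxSize) idle.offer(c);
  }
}
```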
```java
System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
restServer.start(false);
restCatalog = RCKUtils.initCatalogClient();
System.clearProperty("rest.port");
```
maybe we should pass a Map<String, String> to the REST server rather than having to set system properties (which then also need to be cleared again). @danielcweeks thoughts on this?
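A minimal sketch of that suggestion (class and key names hypothetical, not the actual RESTCatalogServer API): the server is handed an explicit configuration map scoped to the instance, so nothing JVM-global has to be set and later cleared:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server that receives its configuration directly instead of
// reading system properties (which would need clearing afterwards).
class MapConfiguredServer {
  private final Map<String, String> config;

  MapConfiguredServer(Map<String, String> config) {
    this.config = new HashMap<>(config); // defensive copy
  }

  String setting(String key, String defaultValue) {
    return config.getOrDefault(key, defaultValue);
  }
}
```

Because the map is scoped to the server instance, parallel test classes could each start a server on its own port without clobbering shared JVM state.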
```java
@BeforeEach
public void before() {
  this.validationCatalog =
```
what happens if we don't do any changes to how the validation catalog is configured?
Something like: you create a table in the REST catalog, but validate against the Hive catalog.
my point is that I think all tests should be passing when we don't do any changes to the validation catalog
If I understand correctly, the original validationCatalog is set to a HadoopCatalog if the catalogName of the catalog being tested is testhadoop; otherwise, validationCatalog is the same as catalog (which is strictly a HiveCatalog, as defined by the TestBase class).

That worked in the old days, when only two types of catalogs, Hadoop or Hive, were being tested: if the test subject is not a Hadoop catalog, setting the validation catalog to the Hive catalog suffices for validation purposes. However, with the introduction of a third type, the REST catalog, this breaks: when the test subject is a REST catalog and we don't change how validationCatalog is initialized, validationCatalog will be set to a Hive catalog. In that case, performing test actions on the REST catalog while validating the post-change state on the Hive catalog won't work.

Unless you are suggesting we change the TestBase class so that the catalog being tested does not strictly need to be a HiveCatalog and can be any type of catalog.
```java
-    Map<String, String> catalogProperties = RCKUtils.environmentCatalogConfig();
+    Map<String, String> catalogProperties = Maps.newHashMap();
+    catalogProperties.putAll(RCKUtils.environmentCatalogConfig());
+    catalogProperties.putAll(Maps.fromProperties(System.getProperties()));
```
as mentioned in a comment further above, maybe we should consider passing a Map<String, String> to RESTCatalogServer rather than relying on system properties (which have to be cleared after they were configured)
Done.
that means we can revert this line here: catalogProperties.putAll(Maps.fromProperties(System.getProperties()));
Reverted.
This reverts commit ae29c41.
```java
// JdbcCatalog, then different jdbc connections could provide different views of table
// status even belonging to the same catalog. Reference:
// https://www.sqlite.org/inmemorydb.html
System.setProperty(CatalogProperties.CLIENT_POOL_SIZE, "1");
```
There was this conversation getting lost (not sure why): #11093 (comment)
…seWithCatalog.java Co-authored-by: Eduard Tudenhoefner <[email protected]>
```java
new RESTServerExtension(
    Map.of(
        RESTCatalogServer.REST_PORT, String.valueOf(RCKUtils.findFreePort()),
        CatalogProperties.WAREHOUSE_LOCATION, warehouse.getAbsolutePath(),
```
I commented on this yesterday but can't find the comment anymore. The warehouse is currently only used for Hadoop, and the REST server will create a separate temp directory for the REST catalog's warehouse, so we can remove this
Done
…seWithCatalog.java
```java
  port = socket.getLocalPort();
  socket.close();
} catch (IOException e) {
  throw new RuntimeException(e);
```
Nit: This could use try-with-resources, and we should probably use UncheckedIOException rather than RuntimeException.
Both done.
Btw, since RCKUtils is package protected again and this findFreePort util is needed on the Spark integ side, I just moved the util out of RCKUtils.
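Under the two suggestions in the review (try-with-resources and UncheckedIOException), the reworked utility might look roughly like this (a sketch; the containing class name is hypothetical, findFreePort is the name used in the thread):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

class PortUtil {
  // Bind to port 0 so the OS picks an unused port; try-with-resources
  // guarantees the probe socket is closed even if getLocalPort() throws.
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // unchecked, per the review nit
    }
  }
}
```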
```java
}

private static void initRESTCatalog() {
  restCatalog = RCKUtils.initCatalogClient(REST_SERVER_EXTENSION.config());
```
So rather than creating the client like this and making RCKUtils public, I think you can actually create the client in the extension and just grab it out of the context:
```java
// In the RESTServerExtension class
@Override
public void beforeAll(ExtensionContext context) throws Exception {
  // ... other init
  // store the client
  Store store = context.getStore(Namespace.GLOBAL);
  store.put("rest-client", RCKUtils.initCatalogClient(...));
}
```
In this class then just use
```java
@Override
public void beforeEach(ExtensionContext context) {
  // get the rest catalog client from the context store
  Store store = context.getStore(Namespace.GLOBAL);
  this.restCatalog = (RESTCatalog) store.get("rest-client");
}
```
I made RCKUtils package protected again in the latest commit. One difference from what you suggested above: the Spark integ tests rely on the REST client being initialized in a static context, so initializing restCatalog in beforeEach might be hard. I solved the problem by making RESTServerExtension control the client life cycle alongside the server life cycle. Anyway, in the end, RCKUtils is package protected again.
```java
@@ -76,15 +78,22 @@ static Map<String, String> environmentCatalogConfig() {
                HashMap::new));
  }

-  static RESTCatalog initCatalogClient() {
+  public static RESTCatalog initCatalogClient() {
```
I think we can get away with leaving this as package protected and not expose the inner workings of the RCK and Extension. Take a look at my example below.
Done
Hey @haizhou-zhao this is looking really close. Some minor comments on how I think we can improve the Extension handling to better isolate the test, but other than that, I think it looks good.
For issue: #11079