Apache Iceberg version

1.6.1

Query engine

Spark

Please describe the bug 🐞

I have a custom Catalog implementation that overrides TableOperations.locationProvider. But running DML queries with Spark doesn't seem to invoke that overridden method. Instead, it calls LocationProviders.locationsFor (the BaseMetastoreTableOperations implementation of locationProvider) directly, in core/src/main/java/org/apache/iceberg/SerializableTable.java (line 244 at commit 11d21b2), when running something like spark.sql("INSERT INTO ...").

It looks like this may have broken in #9029 (@przemekd, @aokolnychyi) -- I tried reverting that change and my locationProvider is now invoked as expected.
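For context, the kind of override that's being bypassed looks roughly like this -- a minimal sketch with hypothetical names (CustomTableOperations, CustomLocationProvider), not my actual implementation:

```java
import java.util.Map;
import org.apache.iceberg.BaseMetastoreTableOperations;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.LocationProvider;

// Hypothetical TableOperations returned by the custom Catalog.
public abstract class CustomTableOperations extends BaseMetastoreTableOperations {

  @Override
  public LocationProvider locationProvider() {
    // Overrides the default, which delegates to
    // LocationProviders.locationsFor(current().location(), current().properties()).
    return new CustomLocationProvider(current().location(), current().properties());
  }

  // Hypothetical provider that places data files under a custom prefix.
  private static class CustomLocationProvider implements LocationProvider {
    private final String dataLocation;

    CustomLocationProvider(String tableLocation, Map<String, String> properties) {
      this.dataLocation = tableLocation + "/custom-data";
    }

    @Override
    public String newDataLocation(String filename) {
      return dataLocation + "/" + filename;
    }

    @Override
    public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
      return dataLocation + "/" + spec.partitionToPath(partitionData) + "/" + filename;
    }
  }
}
```

With a catalog wired up this way, newDataLocation should determine where an INSERT writes data files, but the SerializableTable path above falls back to the default provider instead.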
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
@jamesbornholt I see. It looks like that PR fixed one issue but also introduced a new one. I wasn't aware that you can provide a custom default location provider by implementing a custom Catalog. But I still argue that a Spark job shouldn't fail if it only reads data from a table and doesn't have that custom location provider implementation available.
I can change the code to use table.locationProvider() to get the table's location provider, but instead of letting it possibly fail right away, we could wrap it in a custom Result class such as:
```java
import java.io.Serializable;

/**
 * A serializable container holding either a value or the exception thrown
 * while producing it, so the failure is deferred until the value is used.
 */
public class Result<T> implements Serializable {
  private static final long serialVersionUID = 1L;

  private final T value;
  private final Exception error;

  private Result(T value, Exception error) {
    this.value = value;
    this.error = error;
  }

  public static <T> Result<T> success(T value) {
    return new Result<>(value, null);
  }

  public static <T> Result<T> failure(Exception error) {
    return new Result<>(null, error);
  }

  public boolean isSuccess() {
    return error == null;
  }

  // Unpacks the value, rethrowing the captured failure only at this point.
  public T getValue() {
    if (error != null) {
      throw new IllegalStateException("Cannot get value from a failed result", error);
    }
    return value;
  }

  public Exception getError() {
    if (error == null) {
      throw new IllegalStateException("No error present in a successful result");
    }
    return error;
  }
}
```
and it would only get unpacked when the location provider is truly needed.
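For concreteness, here is a minimal sketch of how that deferred unpacking could look -- a hypothetical DeferredLocationProvider wrapper, not the actual SerializableTable code:

```java
import java.io.Serializable;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.LocationProvider;

// Hypothetical sketch: capture either the provider or the failure up front,
// and rethrow only when a write path actually asks for the provider.
public class DeferredLocationProvider implements Serializable {
  private final Result<LocationProvider> provider;

  public DeferredLocationProvider(Table table) {
    Result<LocationProvider> captured;
    try {
      captured = Result.success(table.locationProvider());
    } catch (RuntimeException e) {
      // e.g. the custom provider implementation isn't available on this node
      captured = Result.failure(e);
    }
    this.provider = captured;
  }

  // Read-only code paths never call this, so they never observe the error.
  public LocationProvider get() {
    return provider.getValue();
  }
}
```

Read-only queries never call get(), so they would complete even when the custom provider class can't be loaded.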
Let me know what you think about that, @jamesbornholt @aokolnychyi.
I agree it’s desirable to allow read queries to succeed without the custom location provider available. I like the idea of just deferring the exception like this!