For study, I run a standalone Spark 3.5.3 cluster locally, and I have implemented my own Iceberg REST catalog.
My REST catalog is based on the Iceberg 1.6.1 spec.
I run the add_files procedure provided by Spark, like the example below.
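(A minimal sketch of the kind of call I mean; the catalog name and the source Parquet path are illustrative placeholders, and the target table name is taken from the error message below.)

```python
from pyspark.sql import SparkSession

# Illustrative sketch only: "rest" is a placeholder catalog name and the
# parquet source location is a placeholder, not the exact command used.
spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CALL rest.system.add_files(
        table => 'dataquery.yearly_month_clicks',
        source_table => '`parquet`.`s3://dataquery-warehouse/raw/yearly_month_clicks/`'
    )
""").show()
```

The following error occurs: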
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: s3://dataquery-warehouse/iceberg/dataquery/yearly_month_clicks/metadata/stage-31-task-1619-manifest-855c8009-c073-48b0-9fd7-e12c1daf8930.avro
at org.apache.iceberg.hadoop.Util.getFs(Util.java:58)
at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:97)
at org.apache.iceberg.spark.SparkTableUtil.buildManifest(SparkTableUtil.java:368)
at org.apache.iceberg.spark.SparkTableUtil.lambda$importSparkPartitions$1e94a719$1(SparkTableUtil.java:796)
at org.apache.spark.sql.Dataset.$anonfun$mapPartitions$1(Dataset.scala:3414)
at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:198)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.iceberg.hadoop.Util.getFs(Util.java:56)
From my point of view, Spark tries to write the staging manifest under the Iceberg table's metadata location.
Here, the metadata location starts with s3://, so the scheme is fixed as "s3".
Spark then asks Hadoop for a file system for that scheme, but no implementation is registered for "s3"; only the s3a scheme is served by Hadoop's S3AFileSystem, so s3a seems to be the expected scheme.
How can I overcome this issue?
Thanks, sincerely.
This is actually related to #11541. add_files uses some Hadoop FileSystem classes under the hood, so you currently must have a fully set up Hadoop configuration in your runtime to run add_files. Once #11541 is completed we should be able to fix this for add_files and use S3FileIO instead of the Hadoop FileSystem classes.
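In the meantime, one possible workaround (an illustrative sketch, assuming hadoop-aws and its AWS SDK bundle are on the classpath; the endpoint and credentials are placeholders) is to map the bare "s3" scheme to S3AFileSystem in the Hadoop configuration that Spark passes to its executors:

```python
from pyspark.sql import SparkSession

# Illustrative sketch, not the project's documented fix: register S3A as the
# implementation for the "s3" scheme so Hadoop can serve s3:// paths.
# Assumes hadoop-aws (and its AWS SDK bundle) is on the classpath; the
# endpoint and credentials below are placeholders.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)
```

With fs.s3.impl set, Hadoop's FileSystem lookup for the s3 scheme should resolve to S3A instead of throwing UnsupportedFileSystemException.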