Am I using the newest version of the library?
Is there an existing issue for this?
Current Behavior
For some time I have been using my own compiled spark-excel jar, because the default one did not work on AWS EMR (latest version 7.9.0).
After the "thin" assembly was introduced I ran a first test and it worked very well for reading Excel files. However, we occasionally write Excel files too, and there it fails with a NoSuchMethodError.
Expected Behavior
The root cause of the failure is the presence of pre-installed Hadoop 3.4.1 jars on AWS EMR. Hadoop 3.4.1 (the latest version) ships commons-compress 1.22. POI also needs commons-compress and uses an API introduced in 1.25 (it currently binds 1.27). When running spark-submit, the class loader pulls the Hadoop-provided commons-compress first, so the needed API method is missing and the job crashes.
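A quick way to confirm which copy of commons-compress wins on a given cluster is to ask the JVM where it resolved the class from. A minimal sketch (the object name `WhichJar` is mine, purely illustrative):

```scala
// Diagnostic sketch: report which jar the running JVM resolved a class from.
// On EMR, calling this for a commons-compress class should point at the
// Hadoop-provided commons-compress 1.22 jar rather than the one POI expects.
object WhichJar {
  def locate(className: String): String = {
    val cls = Class.forName(className)
    // Classes loaded by the bootstrap loader (JDK classes) have no
    // CodeSource, hence the fallback string.
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("<bootstrap/jdk>")
  }
}
```

Running something like `WhichJar.locate("org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream")` in `spark-shell` on the cluster shows the offending jar path.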
I assume this issue holds true for any other "Spark Cluster service provided by your favourite cloud provider".
Now we could try to force a different class-load ordering (I haven't investigated that) or patch the EMR installation (which will be tricky), but I guess shading would be the best option here. Or does someone have a better idea of how to cope with this?
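For the class-load-ordering route, one option (untested by me on EMR) is Spark's experimental `userClassPathFirst` settings, which invert the usual parent-first delegation so user-supplied jars win over the cluster-provided ones. A sketch, with an illustrative application jar name:

```shell
# Sketch, not verified on EMR: prefer user-supplied jars over the
# Hadoop-provided ones. Both settings are marked experimental in the
# Spark configuration docs and can break other dependencies.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars commons-compress-1.27.1.jar \
  your-app.jar
```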
If shading is the way to go, I would suggest also offering a classifier for "emr" (or a more generic name, since this issue will come up in other cluster environments too). @nightscape, what are your thoughts on this?
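To illustrate what shading would mean here, a sketch of a relocation rule using sbt-assembly (assuming an sbt-based build; the project's actual build setup may differ, and the shaded package prefix is my invention):

```scala
// Sketch, assuming sbt-assembly: relocate commons-compress so the
// spark-excel jar carries its own copy under a private package name
// and cannot collide with the Hadoop-provided 1.22 on the cluster.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename(
    "org.apache.commons.compress.**" -> "shaded.spark_excel.commons.compress.@1"
  ).inAll
)
```

The equivalent with the Maven Shade plugin would be a `<relocation>` entry with the same pattern.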
BR
Christian
Steps To Reproduce
Here are some details on the issue:
The error message:

The change in commons-compress (released in 1.25.0):

Environment
- Spark version: 3.5.5
- Spark-Excel version: 3.5.6_0.31.2
- OS: Amazon Linux 2023
- Cluster environment: AWS EMR 7.9
Anything else?
No response