[VL] Gluten runs slower for some TPC-H queries than Vanilla Spark #8466

VaibhavFRI · 2025-01-08T09:01:42Z

Backend

VL (Velox)

Bug description

I am testing Gluten with the Velox backend using the TPC-H benchmark scripts provided in the repo. A few queries (q7, q9, q10, q12) run slower with Gluten than with vanilla Spark.
What is the reason for the slower performance of these queries, and how can it be improved?
I am running the tests on an ARM-based AWS instance:
m7g.4xlarge, vCPUs = 16, memory = 64 GB
Spark version: 3.5.2
Data size: scale factor SF=100
Below are the shell scripts used to run the tests.
For Gluten

GLUTEN_JAR=/path/to/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar
SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=12g \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.driver.memory=4G    \
  --conf spark.executor.instances=1   \
  --conf spark.executor.memory=30G   \
  --conf spark.executor.cores=16  \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"

For Vanilla Spark

SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=12g \
  --conf spark.driver.memory=4G    \
  --conf spark.executor.instances=1   \
  --conf spark.executor.memory=30G   \
  --conf spark.executor.cores=16  \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g

Spark version

Spark-3.5.x

Spark configurations

GLUTEN_JAR=/path/to/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar
SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=12g \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.driver.memory=4G \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=30G \
  --conf spark.executor.cores=16 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"

System information

Gluten Version: 1.3.0-SNAPSHOT
Commit: 4dfdfd7
CMake Version: 3.28.3
System: Linux-6.8.0-1021-aws
Arch: aarch64
CPU Name:
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

FelixYBW · 2025-01-08T21:20:46Z

You may refer to the notebook for the configurations:
https://github.com/apache/incubator-gluten/blob/main/tools/workload/benchmark_velox/native_sql_initialize.ipynb

In the Gluten test, your spark.executor.memory is too high and spark.memory.offHeap.size is too small. You also need to set the parallelism. spark.executor.cores=16 is too high for Gluten; you may use 2x4 or 4x4.

Are you in the Velox Slack channel?

VaibhavFRI · 2025-01-09T05:08:08Z

Thanks, I will refer to these configurations and test again.
What are the recommended off-heap memory size and executor memory size with Gluten and with vanilla Spark?

I have sent a request to be added to the Slack channel.

FelixYBW · 2025-01-09T07:15:52Z

> Thanks, I will refer to these configurations and test again. What are the recommended off-heap memory size and executor memory size with Gluten and with vanilla Spark?
>
> I have sent a request to be added to the Slack channel.

You may join the ASF Slack workspace and then the incubator-gluten channel.
Invitation link: join.slack.com/t/the-asf/shared_invite/zt-2x74bfj6u-_3mH6Njlq6lZoIAZKNsutw

VaibhavFRI · 2025-01-16T06:16:00Z

Hi @FelixYBW,
I followed your suggestions and used the following settings for Gluten and vanilla Spark:
With Gluten:
off-heap memory = 30 GB, total executor memory = 12 GB, executors 4x4, parallelism = 32
With vanilla Spark:
off-heap memory = 12 GB, total executor memory = 30 GB, executors 4x4, parallelism = 32
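
For reference, one way the Gluten settings above could map onto Spark confs is sketched below. It is only an illustration under assumptions the thread does not confirm: the 30 GB off-heap and 12 GB executor memory figures are read as totals split evenly across 4 executors (4 x 30 GB would not fit on a 64 GB instance), "4x4" is read as 4 executors with 4 cores each, and "parallelism" is taken to mean spark.sql.shuffle.partitions. The script only prints the flags; it does not start Spark.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values reported in the thread for the Gluten run.
# ASSUMPTION: the memory figures are totals across all executors.
TOTAL_OFFHEAP_GB=30
TOTAL_HEAP_GB=12
NUM_EXECUTORS=4          # ASSUMPTION: "4x4" = 4 executors x 4 cores
CORES_PER_EXECUTOR=4
PARALLELISM=32           # ASSUMPTION: mapped to spark.sql.shuffle.partitions

# Split the totals evenly and express them in MB (integer arithmetic).
OFFHEAP_MB=$(( TOTAL_OFFHEAP_GB * 1024 / NUM_EXECUTORS ))
HEAP_MB=$(( TOTAL_HEAP_GB * 1024 / NUM_EXECUTORS ))

# Collect the flags in an array so values are quoted safely.
CONF_ARGS=(
  --conf "spark.executor.instances=${NUM_EXECUTORS}"
  --conf "spark.executor.cores=${CORES_PER_EXECUTOR}"
  --conf "spark.executor.memory=${HEAP_MB}m"
  --conf "spark.memory.offHeap.enabled=true"
  --conf "spark.memory.offHeap.size=${OFFHEAP_MB}m"
  --conf "spark.sql.shuffle.partitions=${PARALLELISM}"
)

# Print one "--conf key=value \" line per setting.
printf '  %s %s \\\n' "${CONF_ARGS[@]}"
```

The printed lines could replace the corresponding --conf entries in the spark-shell command from the original script. If the figures were meant per executor, the division step would not apply.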

Observations:
While most queries now get a speedup with Gluten, Q10 is still performing worse.
For a few queries (Q22, Q13), the speedup compared to Vanilla is minimal.
Could you help explain why Q10 is underperforming and why some queries (like Q22, Q13) do not show much improvement compared to Vanilla?

VaibhavFRI added the bug and triage labels Jan 8, 2025

FelixYBW changed the title ~~Gluten runs slower for some TPC-H queries than Vanilla Spark~~ [VL] Gluten runs slower for some TPC-H queries than Vanilla Spark Jan 8, 2025
