[VL] Gluten runs slower for some TPC-H queries than Vanilla Spark #8466

VaibhavFRI opened this issue Jan 8, 2025 · 4 comments
Labels
bug Something isn't working triage


@VaibhavFRI

VaibhavFRI commented Jan 8, 2025

Backend

VL (Velox)

Bug description

I am testing Gluten with the Velox backend using the TPC-H benchmark scripts provided in the repo. A few queries (Q7, Q9, Q10, Q12) run slower with Gluten than with vanilla Spark.
What is the reason for the slower performance on these queries, and how can it be improved?
I am running the tests on an Arm-based AWS instance:
m7g.4xlarge, vCPUs = 16, memory = 64 GB
Spark version: 3.5.2
Data size: scale factor SF=100
Below are the shell scripts used to run the tests:
For Gluten

GLUTEN_JAR=/path/to/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar
SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=12g \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.driver.memory=4G    \
  --conf spark.executor.instances=1   \
  --conf spark.executor.memory=30G   \
  --conf spark.executor.cores=16  \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"

For Vanilla Spark

SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=12g \
  --conf spark.driver.memory=4G    \
  --conf spark.executor.instances=1   \
  --conf spark.executor.memory=30G   \
  --conf spark.executor.cores=16  \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g


Spark version

Spark-3.5.x


System information

Gluten Version: 1.3.0-SNAPSHOT
Commit: 4dfdfd7
CMake Version: 3.28.3
System: Linux-6.8.0-1021-aws
Arch: aarch64
CPU Name:
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

@VaibhavFRI VaibhavFRI added bug Something isn't working triage labels Jan 8, 2025
@FelixYBW
Contributor

FelixYBW commented Jan 8, 2025

You may refer to the notebook for the configurations:
https://github.com/apache/incubator-gluten/blob/main/tools/workload/benchmark_velox/native_sql_initialize.ipynb
 

In your Gluten test, spark.executor.memory is too high and spark.memory.offHeap.size is too small. You also need to set the parallelism. spark.executor.cores=16 is too high for Gluten; you may use 2x4 or 4x4.
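A minimal sketch of how this advice could be expressed as launch flags, assuming 4 executors x 4 cores on the 16-vCPU node and a shuffle parallelism of 2 tasks per core (Spark's tuning guide suggests 2-3 tasks per CPU core). The helper name `gluten_confs` and the per-executor memory sizes are hypothetical placeholders, not values recommended in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints spark-shell --conf flags, one per line,
# for a 4 executors x 4 cores layout. Memory sizes are placeholders only.
gluten_confs() {
  local executors=4 cores=4
  local offheap=7g   # placeholder: Gluten's native (Velox) execution uses the off-heap pool
  local heap=3g      # placeholder: smaller JVM heap, since most work runs off-heap
  # 2 tasks per core across all executors: 2 * 4 * 4 = 32
  local parallelism=$(( 2 * executors * cores ))
  # printf reuses the format for every argument; "--" stops option parsing
  printf -- '--conf %s\n' \
    "spark.executor.instances=${executors}" \
    "spark.executor.cores=${cores}" \
    "spark.executor.memory=${heap}" \
    "spark.memory.offHeap.enabled=true" \
    "spark.memory.offHeap.size=${offheap}" \
    "spark.sql.shuffle.partitions=${parallelism}" \
    "spark.default.parallelism=${parallelism}"
}

gluten_confs
```

The output can be spliced into the spark-shell command from the issue, e.g. `${SPARK_HOME}/bin/spark-shell $(gluten_confs) ...`; the unquoted expansion relies on the values containing no spaces.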

Are you in the Velox Slack channel?

@FelixYBW FelixYBW changed the title Gluten runs slower for some TPC-H queries than Vanilla Spark [VL] Gluten runs slower for some TPC-H queries than Vanilla Spark Jan 8, 2025
@VaibhavFRI
Author

VaibhavFRI commented Jan 9, 2025

Thanks, I will refer to these configurations and test again.
What are the recommended off-heap memory size and executor memory size for Gluten and for vanilla Spark?

I have sent a request to be added to the Slack channel.

@FelixYBW
Contributor

FelixYBW commented Jan 9, 2025


You may join the ASF Slack workspace and then the incubator-gluten channel.
Invitation link: join.slack.com/t/the-asf/shared_invite/zt-2x74bfj6u-_3mH6Njlq6lZoIAZKNsutw

@VaibhavFRI
Author

Hi @FelixYBW,
I followed your suggested settings and used the following configurations for Gluten and vanilla Spark:
With Gluten:
off-heap memory = 30 GB, total executor memory = 12 GB, executors 4x4, parallelism = 32
With vanilla Spark:
off-heap memory = 12 GB, total executor memory = 30 GB, executors 4x4, parallelism = 32
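If the figures above are totals across the four executors (an assumption; the comment does not say whether the off-heap figure is per executor or in total), the per-executor flag values and a rough node memory budget can be worked out with a small sketch like the one below. The helper names `per_executor_mb` and `fits_on_node` are hypothetical, and the 8 GB overhead in the check assumes the earlier `spark.executor.memoryOverhead=2g` is kept for each of the four executors.

```shell
#!/usr/bin/env bash
# Rough memory-budget sketch for one worker node.
# Assumption: the quoted sizes are totals across all executors on the node.

# Split a total size in GB evenly across executors; prints MB per executor.
per_executor_mb() {
  local total_gb=$1 executors=$2
  echo $(( total_gb * 1024 / executors ))
}

# Sum heap + off-heap + overhead (all GB) and compare with node RAM.
# Returns success only if the sum fits.
fits_on_node() {
  local heap_gb=$1 offheap_gb=$2 overhead_gb=$3 node_gb=$4
  local used=$(( heap_gb + offheap_gb + overhead_gb ))
  echo "requested ${used} GB of ${node_gb} GB"
  (( used <= node_gb ))
}

# Gluten run: 30 GB off-heap and 12 GB heap in total, split over 4 executors.
echo "spark.memory.offHeap.size=$(per_executor_mb 30 4)m"   # 7680m
echo "spark.executor.memory=$(per_executor_mb 12 4)m"       # 3072m

# 12 GB heap + 30 GB off-heap + 8 GB overhead on a 64 GB m7g.4xlarge
fits_on_node 12 30 8 64   # prints "requested 50 GB of 64 GB"
```

Integer arithmetic in MB avoids the fractional 7.5 GB that a GB-based split would produce.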

Observations:
While most queries now get a speedup with Gluten, Q10 still performs worse.
For a few queries (Q22, Q13), the speedup over vanilla Spark is minimal.
Could you help explain why Q10 underperforms and why some queries (such as Q22 and Q13) show little improvement over vanilla Spark?

