Update ADAM dependency version to 0.27.0 #33
base: master
@mlinderm Would adding Scala 2.12 support be useful? Spark 2.4.3 supports Scala 2.12 but getting things running on the binary distribution is a nightmare. |
Merged build finished. Test PASSed. |
Test PASSed. |
@heuermh I am observing a substantial performance degradation with the new version. On a 16-core workstation, the time for calling CNVs in 2535 samples went from 9m17s with the old version to 12m22s with the new version (ADAM 0.27 and Spark 2.4.3). The difference seems to be in the PCA/SVD step. Do you have any guesses as to what might have changed between the old Spark/ADAM version and the current version? |
There are a lot of code changes between 0.24.0 and 0.27.0 in ADAM, but not much that would affect numerical method performance (see bigdatagenomics/adam@maint_spark2_2.11-0.24.0...maint_spark2_2.11-0.27.0). I imagine differences between Spark 2.1.x and 2.4.3 might be more significant though. Is there a smaller benchmark we could use to reproduce what you are seeing, say with fewer samples? |
There are datasets ranging from 500 samples to 2535 samples on the AMP BDG cluster at |
Great, thanks. I'll also take a look tomorrow. |
I tried several Spark distributions with the original code base (older ADAM). The performance degradation seems to occur between Spark 2.2.3 and 2.3.3: for 500 samples, 2.2.3 took 1m38s while 2.3.3 took 2m13s. One guess was that it was an issue with an upgrade to Breeze, but changing the Breeze dependency to 0.13.2 with Spark 2.4.3 did not improve performance. |
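To put the timings quoted in this thread on a common scale, here is a minimal sketch that converts them to seconds and computes the relative slowdown. The helper names `to_seconds` and `slowdown` are made up for illustration and are not part of this project; only the four timings come from the comments above. The two comparisons differ in sample count and in which components changed, so the ratios are only roughly comparable.

```python
import re


def to_seconds(timing: str) -> int:
    """Convert a timing such as '9m17s' or '45s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", timing)
    if match is None:
        raise ValueError(f"unrecognized timing: {timing!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2))
    return minutes * 60 + seconds


def slowdown(before: str, after: str) -> float:
    """Ratio of the newer run time to the older run time (1.0 = unchanged)."""
    return to_seconds(after) / to_seconds(before)


# Timings as reported in the comments above.
reported = {
    "2535 samples, old version vs ADAM 0.27 + Spark 2.4.3": ("9m17s", "12m22s"),
    "500 samples, Spark 2.2.3 vs Spark 2.3.3 (older ADAM)": ("1m38s", "2m13s"),
}

for label, (before, after) in reported.items():
    print(f"{label}: {slowdown(before, after):.2f}x")
# prints 1.33x for the first comparison and 1.36x for the second
```

Both comparisons come out at roughly a one-third increase in run time, which does not contradict the guess that most of the regression arrives with the Spark upgrade rather than with ADAM, though it does not prove it either.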
Fixes #34
Updates ADAM dependency version to 0.27.0. Note the workaround in the `maven-shade-plugin` configuration to prevent runtime conflicts with Parquet and Avro versions.