[GLUTEN-6705][CH] Basic Support Delta write #6767

Merged: baibaichen merged 5 commits into apache:main from feature/DeltaNativeWrite2 on Aug 9, 2024


baibaichen (Contributor) commented Aug 9, 2024

What changes were proposed in this pull request?

Rework #6706 after #6761

In the ClickHouse backend, we are preparing to support native Delta writes with Spark 3.5 and Delta 3.2. This PR refactors the code so that CHColumnarWriteFilesRDD supports DelayedCommitProtocol. The basic idea is to introduce CHColumnarWrite:

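```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.execution.datasources.WriteTaskResult
import org.apache.spark.sql.vectorized.ColumnarBatch

trait CHColumnarWrite[T <: FileCommitProtocol] {
  // Task-level lifecycle hooks
  def setupTask(): Unit
  def abortTask(): Unit
  def jobId: String
  def taskAttemptContext: TaskAttemptContext

  // The commit protocol this write wraps (e.g. HadoopMapReduce or Delayed)
  def committer: T
  // Commits the task using the batch returned by the columnar write
  def commitTask(batch: ColumnarBatch): Option[WriteTaskResult]
}
```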
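To make the per-task lifecycle concrete, here is a minimal sketch of how a write RDD might drive such a trait for one task: set up, commit, and abort on failure. It is not the actual CHColumnarWriteFilesRDD code. The names `SimpleColumnarWrite`, `WriteTaskRunner`, and `RecordingWrite` are hypothetical, and the Spark and Hadoop types are replaced by plain Scala stand-ins (a batch is a `Seq[String]`, a task result is a `String`) so the example runs without dependencies.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, dependency-free stand-in for CHColumnarWrite:
// a "batch" is a Seq[String] and a task result is a String.
trait SimpleColumnarWrite {
  def setupTask(): Unit
  def abortTask(): Unit
  def commitTask(batch: Seq[String]): Option[String]
}

object WriteTaskRunner {
  // Drives one task: setup first, then commit; on any failure during
  // commit, abort the task and rethrow so the task is reported as failed.
  def runWriteTask(write: SimpleColumnarWrite, batch: Seq[String]): Option[String] = {
    write.setupTask()
    try {
      write.commitTask(batch)
    } catch {
      case e: Throwable =>
        write.abortTask()
        throw e
    }
  }
}

// Test double that records lifecycle calls so their order can be checked.
// Returning None for an empty batch is an assumption of this sketch.
class RecordingWrite(failOnCommit: Boolean) extends SimpleColumnarWrite {
  val calls = ArrayBuffer.empty[String]
  def setupTask(): Unit = calls += "setup"
  def abortTask(): Unit = calls += "abort"
  def commitTask(batch: Seq[String]): Option[String] = {
    calls += "commit"
    if (failOnCommit) throw new RuntimeException("native write failed")
    if (batch.isEmpty) None else Some(s"wrote ${batch.size} rows")
  }
}
```

Calling `setupTask()` before the guarded block is one common choice: a task that never finished setup has nothing to abort. Whether the real RDD orders the calls exactly this way is not stated in this PR.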

The main difference between DelayedCommitProtocol and HadoopMapReduceCommitProtocol is how they set up and commit a task. Here is the class hierarchy:

GlutenCommitProtocol
  |_ HadoopMapReduceCommitProtocolWrite
  |_ CHDelayedCommitProtocolWrite
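The difference in task setup and commit can be illustrated with a toy model of the two commit styles. This sketch assumes the usual behaviour of the underlying protocols, simplified: a Hadoop MapReduce-style committer stages a task's files under a `_temporary` attempt directory and promotes them at commit time (with FileOutputCommitter's v1 algorithm the final move to the output directory actually happens at job commit), while Delta's DelayedCommitProtocol writes files directly under the table path and only reports them in the task commit message, leaving the Delta log commit to the driver. All names here (`ToyFs`, `ToyWrite`, `StagingWrite`, `DelayedWrite`) are hypothetical, and a `Map` stands in for the file system.

```scala
import scala.collection.mutable

// Toy file system: path -> content (hypothetical stand-in for HDFS/S3).
final class ToyFs {
  val files = mutable.LinkedHashMap.empty[String, String]
}

// Hypothetical common interface; the two subclasses differ only in
// how a task is set up and what "commit" means.
sealed trait ToyWrite {
  def setupTask(): Unit
  def write(name: String, content: String): Unit
  def commitTask(): Seq[String] // paths reported back to the driver
}

// Hadoop MapReduce style: stage under a per-attempt temp dir, then move.
final class StagingWrite(fs: ToyFs, outputDir: String, attemptId: String) extends ToyWrite {
  private val stagingDir = s"$outputDir/_temporary/$attemptId"
  private val staged = mutable.ArrayBuffer.empty[String]

  def setupTask(): Unit = staged.clear()

  def write(name: String, content: String): Unit = {
    fs.files(s"$stagingDir/$name") = content
    staged += name
  }

  def commitTask(): Seq[String] = staged.toList.map { name =>
    val from = s"$stagingDir/$name"
    val to = s"$outputDir/$name"
    fs.files(to) = fs.files.remove(from).get // "rename" the staged file into place
    to
  }
}

// Delayed style: write straight to the final location and only remember
// what was added; the driver later records these paths in a transaction log.
final class DelayedWrite(fs: ToyFs, tablePath: String) extends ToyWrite {
  private val added = mutable.ArrayBuffer.empty[String]

  def setupTask(): Unit = added.clear()

  def write(name: String, content: String): Unit = {
    val path = s"$tablePath/$name"
    fs.files(path) = content
    added += path
  }

  def commitTask(): Seq[String] = added.toList
}
```

In the delayed style, files are already in place before the task commits, so visibility is controlled by the log commit rather than by renames; that is why the two styles need different setup and commit logic behind a common interface.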

This PR also reverts SparkWriteFilesCommitProtocol and moves it back to the Velox module.

(Fixes: #6705)

How was this patch tested?

Using existing UTs.


github-actions bot commented Aug 9, 2024

#6705


github-actions bot commented Aug 9, 2024

Run Gluten Clickhouse CI

@baibaichen baibaichen force-pushed the feature/DeltaNativeWrite2 branch from 17dcde5 to 0dca33c on August 9, 2024 09:30

github-actions bot commented Aug 9, 2024

Run Gluten Clickhouse CI

@github-actions github-actions bot added the CORE (works for Gluten Core) label Aug 9, 2024

github-actions bot commented Aug 9, 2024

Run Gluten Clickhouse CI

@baibaichen baibaichen merged commit 2dd5632 into apache:main Aug 9, 2024
42 checks passed
GlutenPerfBot (Contributor) commented:

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_08_09_2024_time.csv log/native_master_08_08_2024_ec62a76c88_time.csv difference percentage
q1 14.17 13.83 -0.340 97.60%
q2 14.01 14.31 0.299 102.13%
q3 4.72 5.09 0.373 107.90%
q4 70.34 68.76 -1.580 97.75%
q5 7.84 7.91 0.070 100.89%
q6 3.36 3.50 0.141 104.21%
q7 7.00 7.11 0.107 101.53%
q8 5.02 4.84 -0.182 96.37%
q9 24.25 24.26 0.004 100.02%
q10 9.90 9.12 -0.773 92.19%
q11 36.92 37.13 0.215 100.58%
q12 1.45 1.51 0.067 104.61%
q13 6.33 6.28 -0.059 99.06%
q14a 45.41 45.93 0.518 101.14%
q14b 40.99 41.44 0.449 101.09%
q15 2.65 2.54 -0.112 95.78%
q16 45.96 44.87 -1.089 97.63%
q17 4.84 4.80 -0.038 99.21%
q18 6.61 6.91 0.300 104.55%
q19 2.37 3.65 1.287 154.36%
q20 1.55 1.58 0.030 101.94%
q21 1.17 1.19 0.020 101.70%
q22 7.67 7.64 -0.031 99.60%
q23a 102.40 103.47 1.068 101.04%
q23b 126.57 125.67 -0.898 99.29%
q24a 101.93 94.32 -7.616 92.53%
q24b 93.38 95.84 2.463 102.64%
q25 4.19 6.26 2.073 149.50%
q26 3.27 3.33 0.062 101.90%
q27 4.24 3.96 -0.273 93.57%
q28 31.87 30.52 -1.354 95.75%
q29 12.10 9.44 -2.662 78.00%
q30 7.39 5.19 -2.202 70.21%
q31 7.21 6.93 -0.278 96.15%
q32 1.22 1.33 0.105 108.61%
q33 4.13 4.33 0.197 104.78%
q34 3.71 4.10 0.392 110.56%
q35 7.84 7.82 -0.022 99.72%
q36 4.74 4.50 -0.235 95.04%
q37 4.57 4.79 0.223 104.87%
q38 13.56 13.05 -0.507 96.26%
q39a 3.04 3.60 0.561 118.48%
q39b 2.72 3.26 0.540 119.83%
q40 4.10 3.77 -0.326 92.05%
q41 0.61 0.61 0.002 100.27%
q42 0.89 0.86 -0.025 97.24%
q43 4.21 4.43 0.220 105.22%
q44 11.53 9.53 -1.997 82.68%
q45 3.29 3.20 -0.082 97.50%
q46 3.74 4.31 0.571 115.26%
q47 17.76 17.75 -0.009 99.95%
q48 5.19 5.16 -0.034 99.35%
q49 9.40 8.36 -1.031 89.03%
q50 25.76 21.24 -4.516 82.47%
q51 9.93 9.49 -0.437 95.60%
q52 1.04 1.10 0.061 105.93%
q53 2.53 2.33 -0.201 92.05%
q54 4.14 3.90 -0.234 94.35%
q55 1.04 1.05 0.019 101.84%
q56 4.26 4.14 -0.121 97.15%
q57 10.20 10.69 0.492 104.82%
q58 2.39 2.40 0.012 100.49%
q59 10.80 10.63 -0.165 98.48%
q60 4.47 4.05 -0.414 90.73%
q61 5.89 4.01 -1.880 68.09%
q62 4.94 4.94 0.001 100.02%
q63 2.34 2.29 -0.050 97.86%
q64 59.62 63.06 3.436 105.76%
q65 16.99 17.84 0.845 104.97%
q66 3.80 3.73 -0.069 98.18%
q67 382.25 395.30 13.052 103.41%
q68 3.53 4.63 1.099 131.10%
q69 5.57 5.30 -0.267 95.21%
q70 11.83 11.05 -0.780 93.41%
q71 3.35 2.50 -0.850 74.63%
q72 219.09 214.28 -4.809 97.80%
q73 2.56 2.36 -0.197 92.29%
q74 23.94 22.70 -1.237 94.83%
q75 26.26 26.38 0.117 100.45%
q76 11.44 11.51 0.067 100.58%
q77 2.18 2.27 0.093 104.26%
q78 49.59 49.68 0.086 100.17%
q79 3.89 4.00 0.109 102.79%
q80 12.32 12.47 0.144 101.17%
q81 4.76 4.93 0.176 103.71%
q82 6.67 6.74 0.066 100.99%
q83 1.77 1.70 -0.071 95.97%
q84 2.87 2.57 -0.306 89.36%
q85 7.17 7.50 0.329 104.58%
q86 3.87 3.84 -0.039 98.99%
q87 13.25 14.26 1.008 107.60%
q88 21.38 21.16 -0.221 98.97%
q89 3.47 3.68 0.212 106.11%
q90 3.15 3.17 0.026 100.84%
q91 2.20 2.45 0.255 111.58%
q92 1.27 1.32 0.052 104.06%
q93 39.93 40.10 0.173 100.43%
q94 25.30 24.72 -0.575 97.73%
q9 90.52 88.30 -2.218 97.55%
q5 2.52 2.67 0.143 105.67%
q96 17.49 17.29 -0.200 98.86%
q97 1.98 1.89 -0.094 95.27%
q98 9.70 9.41 -0.296 96.95%
q99 9.70 9.41 -0.296 96.95%
total 2152.54 2142.96 -9.573 99.56%
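Reading the report: from the rows themselves, the difference column is the second file's time minus the first file's, and the percentage column is the second divided by the first. For q1, 13.83 - 14.17 = -0.34 and 13.83 / 14.17 ≈ 97.60%. If the 08_09 file is the run that includes this commit, a percentage above 100% means that query got faster. The printed times are rounded, so recomputing from them can differ slightly from the printed percentage (e.g. q19). A small helper, with the hypothetical name `PerfReport.compare`, reproduces the calculation:

```scala
object PerfReport {
  // Returns (difference, percentage) as the report's columns relate:
  // difference = second - first, percentage = second / first * 100.
  // This relationship is inferred from the report rows, not from the tool.
  def compare(first: Double, second: Double): (Double, Double) =
    (second - first, second / first * 100.0)
}
```

For example, `PerfReport.compare(14.17, 13.83)` gives roughly `(-0.34, 97.60)`, matching the q1 row.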

GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_08_09_2024_time.csv log/native_master_08_08_2024_ec62a76c88_time.csv difference percentage
q1 39.24 39.68 0.443 101.13%
q2 30.35 30.10 -0.256 99.16%
q3 52.92 53.18 0.266 100.50%
q4 42.74 43.41 0.669 101.56%
q5 105.04 104.55 -0.484 99.54%
q6 11.00 10.72 -0.280 97.45%
q7 115.54 116.01 0.472 100.41%
q8 114.48 115.70 1.218 101.06%
q9 167.90 167.67 -0.231 99.86%
q10 64.31 64.62 0.313 100.49%
q11 26.59 26.21 -0.384 98.56%
q12 29.35 30.37 1.021 103.48%
q13 51.31 51.23 -0.080 99.84%
q14 22.89 25.81 2.924 112.77%
q15 52.64 53.57 0.933 101.77%
q16 19.38 17.90 -1.484 92.34%
q17 131.87 134.39 2.518 101.91%
q18 197.44 198.60 1.156 100.59%
q19 26.01 24.95 -1.056 95.94%
q20 40.12 40.56 0.439 101.09%
q21 381.48 375.35 -6.126 98.39%
q22 16.02 15.51 -0.515 96.79%
total 1738.62 1740.09 1.473 100.08%

Labels: CLICKHOUSE, CORE (works for Gluten Core), VELOX
Closes issue #6705: [CH] Basic Support Write Paruet in Delta
3 participants