Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#6270] (Improvement) unified cache framework #6271

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sunxiaojian
Copy link
Contributor

What changes were proposed in this pull request?

Unified cache framework

Why are the changes needed?

Fix: #6270

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

N/A

@tengqm
Copy link
Contributor

tengqm commented Jan 16, 2025

lgtm

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 16, 2025

I am concerned that this will introduce extra dependencies that may conflict with Spark runtime jars. Could you analysis which jars are introduced and could we shade them?

@sunxiaojian
Copy link
Contributor Author

I am concerned that this will introduce extra dependencies that may conflict with Spark runtime jars. Could you analysis which jars are introduced and could we shade them?

@FANNG1 The dependency of caffeine is relatively simple, and there are currently no packages that conflict with Spark. Spark itself does not depend on caffeine, so there should be no need shade them yet

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 17, 2025

@jerryshao WDYT?

@jerqi
Copy link
Contributor

jerqi commented Jan 17, 2025

I am concerned that this will introduce extra dependencies that may conflict with Spark runtime jars. Could you analysis which jars are introduced and could we shade them?

@FANNG1 The dependency of caffeine is relatively simple, and there are currently no packages that conflict with Spark. Spark itself does not depend on caffeine, so there should be no need shade them yet

You should shade caffeine Because caffeine may be used by other projects like Iceberg. Caffeine is a common libary.

@sunxiaojian
Copy link
Contributor Author

sunxiaojian commented Jan 17, 2025

I am concerned that this will introduce extra dependencies that may conflict with Spark runtime jars. Could you analysis which jars are introduced and could we shade them?

@FANNG1 The dependency of caffeine is relatively simple, and there are currently no packages that conflict with Spark. Spark itself does not depend on caffeine, so there should be no need shade them yet

You should shade caffeine Because caffeine may be used by other projects like Iceberg. Caffeine is a common libary.

ok, I got it, have another question, Do all references in Gravitino need to be shaded ?

@jerqi
Copy link
Contributor

jerqi commented Jan 17, 2025

I am concerned that this will introduce extra dependencies that may conflict with Spark runtime jars. Could you analysis which jars are introduced and could we shade them?

@FANNG1 The dependency of caffeine is relatively simple, and there are currently no packages that conflict with Spark. Spark itself does not depend on caffeine, so there should be no need shade them yet

You should shade caffeine Because caffeine may be used by other projects like Iceberg. Caffeine is a common libary.

ok, I got it, have another question, Do all references in Gravitino need to be shaded ?

You just need to shade the common libraries.

@sunxiaojian
Copy link
Contributor Author

@jerqi I understand that it should be adding a 'spark-common-runtime' module, right?

@jerqi
Copy link
Contributor

jerqi commented Jan 17, 2025

@jerqi I understand that it should be adding a 'spark-common-runtime' module, right?

No need. We should include spark-common jar in spark-runtime jar.

@@ -171,3 +174,18 @@ configurations {
artifacts {
add("testArtifacts", testJar)
}

tasks.withType<ShadowJar>(ShadowJar::class.java) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could shade the jar in spark-runtime module, like spark-connector/v3.5/spark-runtime/build.gradle.kts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add this to every spark-runtime version package? Should we extract the common packages or handle them directly in the common module? This way, we can avoid redundant processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be inclined to use @FANNG1 's way to shade jars in spark-runtime, so we don't have to do the shade in this common component, and change all the implementation to shadow.

@tengqm
Copy link
Contributor

tengqm commented Jan 18, 2025

From a long term perspective, we are really supposed to be cautious when introducing new dependencies. We may need to revisit our existing ones and see if those are necessary.
Most modern software developed in Java, Python, Go, Rust etc have a messy graph of package dependencies. This is unavoidable, but still something we as a team should watch out.
It is perfectly fine to introduce an external dependency when the benefits we gain from it is justified. But bear in mind, all of them are animals in our zoo, and we are gonna take care of it since the day it is incorporated.
I'm too old-fashioned to appreciate this kind of wide-range dependencies, for fear of maintenance burden, supply chain attacks, upgrade efforts, to name a few.

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 20, 2025

From a long term perspective, we are really supposed to be cautious when introducing new dependencies. We may need to revisit our existing ones and see if those are necessary. Most modern software developed in Java, Python, Go, Rust etc have a messy graph of package dependencies. This is unavoidable, but still something we as a team should watch out. It is perfectly fine to introduce an external dependency when the benefits we gain from it is justified. But bear in mind, all of them are animals in our zoo, and we are gonna take care of it since the day it is incorporated. I'm too old-fashioned to appreciate this kind of wide-range dependencies, for fear of maintenance burden, supply chain attacks, upgrade efforts, to name a few.

I'm inclined to the point of @tengqm , I doubt whether it's worthwhile to unify the cache framework by introducing another dependency.

@sunxiaojian
Copy link
Contributor Author

@tengqm @FANNG1 Usually, only one caching method is kept in a project, which makes it convenient for us to review or solve only one type of problem when it occurs, including reference issues, vulnerability issues, or other usage issues. Of course, this is not a mandatory option.

@tengqm
Copy link
Contributor

tengqm commented Jan 20, 2025

@tengqm @FANNG1 Usually, only one caching method is kept in a project, which makes it convenient for us to review or solve only one type of problem when it occurs, including reference issues, vulnerability issues, or other usage issues. Of course, this is not a mandatory option.

Cool. Then this PR is actually unifying caching to a single package?
A big yes from me then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] unified cache framework
5 participants