[Umbrella][Feature] Support spark engine #155

Open
2 of 8 tasks
Alibaba-HZY opened this issue Dec 10, 2024 · 3 comments
Labels
feature New feature or request

Comments

@Alibaba-HZY

Alibaba-HZY commented Dec 10, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

In the roadmap, we plan to support the Spark engine.
For KV tables, Spark can support micro-batch read, batch read, streaming write, and batch write.
For log tables, Spark can support micro-batch read, streaming write, and batch write.
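
The planned support matrix above can be written down as a small lookup, which may help when splitting the work into sub-tasks. This is only a sketch: `Capability` is a local stand-in whose constant names mirror four values of Spark's `org.apache.spark.sql.connector.catalog.TableCapability` enum, while `TableKind`, `PLANNED`, and `supports` are hypothetical names that do not come from Fluss or Spark.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class FlussSparkCapabilities {
    // Local stand-in: names mirror Spark's TableCapability constants,
    // declared here so the sketch compiles without Spark on the classpath.
    enum Capability { BATCH_READ, MICRO_BATCH_READ, BATCH_WRITE, STREAMING_WRITE }

    // Hypothetical enum for the two table kinds named in this issue.
    enum TableKind { KV, LOG }

    // Planned support per table kind, copied from the motivation text.
    static final Map<TableKind, Set<Capability>> PLANNED;

    static {
        Map<TableKind, Set<Capability>> m = new EnumMap<>(TableKind.class);
        m.put(TableKind.KV, EnumSet.of(
                Capability.MICRO_BATCH_READ,
                Capability.BATCH_READ,
                Capability.STREAMING_WRITE,
                Capability.BATCH_WRITE));
        // Log tables: same as KV tables, minus batch read.
        m.put(TableKind.LOG, EnumSet.of(
                Capability.MICRO_BATCH_READ,
                Capability.STREAMING_WRITE,
                Capability.BATCH_WRITE));
        PLANNED = Collections.unmodifiableMap(m);
    }

    // Returns true if the given table kind is planned to support the capability.
    static boolean supports(TableKind kind, Capability capability) {
        return PLANNED.get(kind).contains(capability);
    }

    private FlussSparkCapabilities() {}
}
```

In a real connector, a set like this would presumably be returned from the `capabilities()` method of the `Table` implementation, using Spark's own enum instead of the stand-in.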

Following @wuchong's suggestion, I will create some sub-tasks.

Task list

Solution

In the "streaming read" task, I'll implement streaming read and introduce some base classes, such as:

  • flussSparkTable, which implements org.apache.spark.sql.connector.catalog.Table
  • SparkCatalog, which implements org.apache.spark.sql.connector.catalog.TableCatalog
  • SparkInternalRow and SparkTypeUtils
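
As a rough illustration of what a type utility like SparkTypeUtils might do, here is a minimal sketch that maps Fluss type names to Spark SQL type names. It rests on several assumptions: the issue does not describe SparkTypeUtils, so its purpose is a guess. The class and method names (`SparkTypeNameSketch`, `toSparkTypeName`) are hypothetical. The Fluss-side names are assumed to follow the usual SQL-style naming. Plain strings are used so the sketch runs without Spark or Fluss on the classpath. A real implementation would presumably convert between the two projects' type objects, and would also cover parameterized types such as decimals and timestamps.

```java
import java.util.Locale;
import java.util.Map;

final class SparkTypeNameSketch {
    // Assumed Fluss type name -> Spark SQL simple type name.
    // Only parameter-free primitive types are covered in this sketch.
    private static final Map<String, String> FLUSS_TO_SPARK = Map.of(
            "BOOLEAN", "boolean",
            "TINYINT", "tinyint",
            "SMALLINT", "smallint",
            "INT", "int",
            "BIGINT", "bigint",
            "FLOAT", "float",
            "DOUBLE", "double",
            "STRING", "string",
            "BYTES", "binary");

    // Looks up the Spark SQL type name; the lookup ignores case and
    // surrounding whitespace. Unknown names are rejected, not guessed.
    static String toSparkTypeName(String flussTypeName) {
        String key = flussTypeName.trim().toUpperCase(Locale.ROOT);
        String sparkName = FLUSS_TO_SPARK.get(key);
        if (sparkName == null) {
            throw new IllegalArgumentException(
                    "Type not covered by this sketch: " + flussTypeName);
        }
        return sparkName;
    }

    private SparkTypeNameSketch() {}
}
```

Failing fast on unmapped types is a deliberate choice here: silently falling back to a default type could hide schema mismatches between the two systems.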

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
@Alibaba-HZY Alibaba-HZY added the feature New feature or request label Dec 10, 2024
@XuQianJin-Stars
Contributor

Which versions of Spark are supported?

@wuchong
Member

wuchong commented Dec 14, 2024

Thanks @Alibaba-HZY for creating this issue! Could you create some sub-tasks so that others can help out?

Regarding the priorities: because Fluss is a streaming storage, IMO the order can be streaming read > batch read > union read > streaming write > batch write.

Regarding the versions, let's start with Spark 3.5 and extend to Spark 3.4 and 3.3 in the future.

@Alibaba-HZY Alibaba-HZY changed the title [Feature] Support spark engine [Umbrella][Feature] Support spark engine Dec 16, 2024
@Alibaba-HZY
Author

> Thanks @Alibaba-HZY for creating this issue! Could you create some sub-tasks so that others can help out?
>
> Regarding the priorities: because Fluss is a streaming storage, IMO the order can be streaming read > batch read > union read > streaming write > batch write.
>
> Regarding the versions, let's start with Spark 3.5 and extend to Spark 3.4 and 3.3 in the future.

ok done
