How to query data in Apache Hudi using StarRocks #22947
Closed
Replies: 2 comments 1 reply
-
Thank you.
-
See https://dev.to/michaelmt66/build-an-open-source-lakehouse-with-minimun-code-effort-spark-hudi-dbt-hivemetastore-trino-1amg for another example. You can swap out Trino for StarRocks.
-
Prerequisites
For this tutorial, you need to:
Have Docker Desktop or the Podman container runtime installed
Installing it is out of scope for this tutorial.
Have a MySQL client
Installing it is out of scope for this tutorial.
Have a StarRocks or CelerData database cluster
Setting it up is out of scope for this tutorial.
Have an Apache Hudi environment
Setting it up is out of scope for this tutorial. However, we are using the Apache Hudi Docker container for this tutorial.
Note: This article was written for Apache Hudi 0.13.1.
Configure Apache Hudi
Follow the Hudi Docker Quickstart at https://hudi.apache.org/docs/docker_demo
Modify docker-compose_hadoop284_hive233_spark244.yml or docker-compose_hadoop284_hive233_spark244_mac_aarch64.yml to include StarRocks in the Hudi Docker Compose setup. You also need to apply apache/hudi#8700, if it has not been merged yet, to fix the Docker networking issues.
Do all the steps in the Hudi Docker Compose quickstart. When you can run show tables with Beeline, you know that the tables are ready and SR should be able to connect. It should look like this:
Create an external catalog and query the data.
Log in to the SR container within the Hudi Docker Compose setup, then execute the SQL commands to read from the COW tables.
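The original commands are not preserved here, so the following is only a minimal sketch of what this step could look like. The catalog name hudi_catalog_hms is hypothetical. The metastore URI thrift://hivemetastore:9083, the default database, and the stock_ticks_cow table are assumptions based on the Hudi Docker demo. The host and port assume the StarRocks FE is reachable on its usual MySQL-protocol port 9030.

```shell
# Sketch only: names, host, and port below are assumptions, adjust to your setup.
SR_HOST="${SR_HOST:-127.0.0.1}"   # assumed address of the StarRocks FE
SR_PORT="${SR_PORT:-9030}"        # assumed FE MySQL-protocol port

# Write the SQL statements to a file so they can be reviewed before running.
cat > hudi_catalog.sql <<'SQL'
-- Hypothetical catalog name; the metastore URI assumes the Hudi demo's hivemetastore service.
CREATE EXTERNAL CATALOG hudi_catalog_hms
PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://hivemetastore:9083"
);
SET CATALOG hudi_catalog_hms;
SHOW DATABASES;
USE `default`;
SHOW TABLES;
-- stock_ticks_cow is the COW table name assumed from the Hudi Docker demo.
SELECT symbol, max(ts) FROM stock_ticks_cow GROUP BY symbol HAVING symbol = 'GOOG';
SQL

# Run the statements only if a MySQL client is installed; report connection failures.
if command -v mysql >/dev/null 2>&1; then
  mysql -h "$SR_HOST" -P "$SR_PORT" -u root --connect-timeout=5 < hudi_catalog.sql \
    || echo "Could not run the statements against $SR_HOST:$SR_PORT" >&2
fi
```

If you are already inside the SR container with an interactive MySQL session open, you can paste the statements from hudi_catalog.sql directly instead.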
Tip: The query should be faster the second time you run it, because StarRocks's query caching is enabled out of the box.
Note: The Hudi quickstart stores all the data in an HDFS container in the Docker Compose setup. Therefore we don't need the S3 location and credentials.
Output should look similar to this:
Run it again and it'll execute against the query cache.
Issue (workaround provided): Hudi exception reading data. com.google.common.util.concurrent.UncheckedExecutionException #23374
Also I would recommend you try querying the MOR tables.
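As a rough sketch of what querying the MOR tables might look like: the table names stock_ticks_mor_ro (read-optimized view) and stock_ticks_mor_rt (real-time view) are assumptions taken from the Hudi Docker demo, and hudi_catalog_hms is a hypothetical name for an external Hudi catalog that you have already created.

```shell
# Sketch only: catalog name, table names, host, and port are assumptions.
SR_HOST="${SR_HOST:-127.0.0.1}"   # assumed address of the StarRocks FE
SR_PORT="${SR_PORT:-9030}"        # assumed FE MySQL-protocol port

cat > hudi_mor_queries.sql <<'SQL'
-- Read-optimized view of the MOR table (name assumed from the Hudi Docker demo).
SELECT symbol, max(ts)
FROM hudi_catalog_hms.`default`.stock_ticks_mor_ro
GROUP BY symbol HAVING symbol = 'GOOG';

-- Real-time view of the same MOR table (name assumed from the Hudi Docker demo).
SELECT symbol, max(ts)
FROM hudi_catalog_hms.`default`.stock_ticks_mor_rt
GROUP BY symbol HAVING symbol = 'GOOG';
SQL

# Run the queries only if a MySQL client is installed; report connection failures.
if command -v mysql >/dev/null 2>&1; then
  mysql -h "$SR_HOST" -P "$SR_PORT" -u root --connect-timeout=5 < hudi_mor_queries.sql \
    || echo "Could not run the queries against $SR_HOST:$SR_PORT" >&2
fi
```

Comparing the two results is a quick way to see whether the real-time view picks up records that have not been compacted yet.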
If you run into any issues, check the fe.log and be.log files.
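One way to skim those logs is a small helper like the one below. The function name recent_errors is hypothetical, and the log locations depend on how StarRocks is deployed, so the path is left as an argument rather than guessed.

```shell
# Hypothetical helper: print the most recent lines that mention ERROR or Exception.
# Usage: recent_errors <log-file> [max-lines]
recent_errors() {
  local log_file="$1"
  local max_lines="${2:-20}"
  # -n prefixes each match with its line number in the log file.
  grep -nE 'ERROR|Exception' "$log_file" | tail -n "$max_lines"
}
```

For example, recent_errors /path/to/fe.log 50 prints up to the last 50 matching lines, each prefixed with its line number so you can jump to the surrounding context.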