Common Data (read and write) Patterns in StarRocks #34159
Moved to https://forum.starrocks.io/t/common-data-read-and-write-patterns-in-starrocks/125
I get this question a lot: what is the best way to read and write data with StarRocks?
TL;DR: As of right now, if you care about open table formats, it is more performant to have some other application write the Iceberg, Hudi, or Delta Lake data and to use StarRocks as the read query engine (most of the value of an OLAP database is its read query performance). If you don't care about open table formats, use the StarRocks native internal format.
Scenario A: When using the default StarRocks storage format for your data
Although you can insert records using the MySQL interface, it was not designed for bulk inserts/upserts or for fast single-record writes. Generally speaking, the suggested pattern is to load data with one of the StarRocks data loading tools (for example, Stream Load, Broker Load, or Routine Load) and then read it back using SQL, as in the sketch below.
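A minimal Broker Load sketch, assuming Parquet files sitting in S3; the database, table, label, bucket path, and credentials below are all hypothetical placeholders:

```sql
-- Bulk-load Parquet files from S3 into a native StarRocks table.
LOAD LABEL mydb.orders_20231001
(
    DATA INFILE("s3://my-bucket/orders/*.parquet")
    INTO TABLE orders
    FORMAT AS "parquet"
)
WITH BROKER
(
    "aws.s3.region" = "us-west-2",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

-- Then read it back with ordinary SQL.
SELECT COUNT(*) FROM mydb.orders;
```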
Note
If you need to import data for a one-off job or a POC from another database or from an open table format (data lake), you can use the External Catalog feature to hook up the source and then run CTAS (CREATE TABLE AS SELECT), INSERT INTO ... SELECT, or INSERT INTO ... VALUES against a table within StarRocks, as sketched below.
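A minimal sketch of that one-off pattern, assuming an Iceberg source registered in a Hive metastore; the catalog name, metastore URI, and table names are hypothetical:

```sql
-- Hook up the source once via an external catalog.
CREATE EXTERNAL CATALOG iceberg_src
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- One-off copy into a native StarRocks table via CTAS...
CREATE TABLE mydb.orders_local AS
SELECT * FROM iceberg_src.sales_db.orders;

-- ...or append into an existing table.
INSERT INTO mydb.orders_local
SELECT * FROM iceberg_src.sales_db.orders
WHERE order_date >= '2023-10-01';
```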
Scenario B: When using Apache Iceberg for storing your data
We support Apache Iceberg via StarRocks' External Catalog feature. Although you can insert records using the MySQL interface, it was not designed for bulk inserts/upserts or for fast single-record writes. Generally speaking, the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
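A minimal sketch of the write/read split, assuming a Spark session that already has an Iceberg catalog named `ice` configured; all database and table names are hypothetical:

```sql
-- Spark SQL side: create and write the Iceberg table.
CREATE TABLE ice.sales_db.orders (id BIGINT, amount DOUBLE) USING iceberg;
INSERT INTO ice.sales_db.orders VALUES (1, 9.99), (2, 24.50);

-- StarRocks side: read through an Iceberg external catalog
-- (created as in the Note under Scenario A).
SELECT id, amount FROM iceberg_src.sales_db.orders;
```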
Scenario C: When using Apache Hudi for storing your data
We support Apache Hudi via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Apache Hudi, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
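A minimal read-side sketch, assuming the Hudi table was written by Spark and is registered in a Hive metastore; the catalog name, URI, and table names are hypothetical:

```sql
-- StarRocks only reads Hudi here; writes happen in Spark or another tool.
CREATE EXTERNAL CATALOG hudi_src
PROPERTIES
(
    "type" = "hudi",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hudi_src.sales_db.orders LIMIT 10;
```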
Scenario D: When using Delta Lake for storing your data
We support Delta Lake via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Delta Lake, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
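The read-side setup mirrors the Hudi sketch above, assuming a Hive metastore; only the catalog type changes (names remain hypothetical):

```sql
CREATE EXTERNAL CATALOG delta_src
PROPERTIES
(
    "type" = "deltalake",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM delta_src.sales_db.orders LIMIT 10;
```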
Scenario E: When using Apache Hive for storing your data
We support Apache Hive via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Apache Hive, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
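Again mirroring the pattern above, a minimal read-side sketch with hypothetical names:

```sql
CREATE EXTERNAL CATALOG hive_src
PROPERTIES
(
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hive_src.sales_db.orders LIMIT 10;
```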
More info
See the StarRocks Data Loading FAQ at https://github.com/StarRocks/starrocks/wiki/FAQ:-StarRocks-Data-Loading and our support for the various open table formats at #24659.