Common Data (read and write) Patterns in StarRocks #34159
Moved to https://forum.starrocks.io/t/common-data-read-and-write-patterns-in-starrocks/125
I get this question a lot: what is the best way to read and write data with StarRocks?
TL;DR: As of right now, if you care about open table formats, it is more performant to have some other application write the Iceberg, Hudi, or Delta Lake data and to use StarRocks as the read query engine (most of the value of an OLAP database is its read query performance). If you don't care about open table formats, use the StarRocks native internal format.
Scenario A: When using the default StarRocks storage format for your data
Although you can insert records using the MySQL interface, it was not designed for bulk inserts/upserts or for fast single-record writes. Generally speaking, the suggested pattern is to load data with one of the StarRocks data loading tools (for example, Stream Load, Broker Load, or Routine Load) and then read it back using SQL, as in the sketch below.
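A minimal Broker Load sketch, assuming Parquet files sitting in S3; the database, table, label, bucket path, and credentials below are all hypothetical placeholders:

```sql
-- Bulk-load Parquet files from S3 into a native StarRocks table.
LOAD LABEL mydb.orders_20231001
(
    DATA INFILE("s3://my-bucket/orders/*.parquet")
    INTO TABLE orders
    FORMAT AS "parquet"
)
WITH BROKER
(
    "aws.s3.region" = "us-west-2",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

-- Then read it back with ordinary SQL.
SELECT COUNT(*) FROM mydb.orders;
```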
Note
If you need to import data for a one-off job or a POC from another database or from an open table format (data lake), you can use the External Catalog feature to hook up the source and then run CTAS (CREATE TABLE AS SELECT), INSERT INTO ... SELECT, or INSERT INTO ... VALUES against a table within StarRocks, as sketched below.
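A minimal sketch of that one-off pattern, assuming an Iceberg source registered in a Hive metastore; the catalog name, metastore URI, and table names are hypothetical:

```sql
-- Hook up the source once via an external catalog.
CREATE EXTERNAL CATALOG iceberg_src
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- One-off copy into a native StarRocks table via CTAS...
CREATE TABLE mydb.orders_local AS
SELECT * FROM iceberg_src.sales_db.orders;

-- ...or append into an existing table.
INSERT INTO mydb.orders_local
SELECT * FROM iceberg_src.sales_db.orders
WHERE order_date >= '2023-10-01';
```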
Scenario B: When using Apache Iceberg for storing your data
We support Apache Iceberg via StarRocks' External Catalog feature. Although you can insert records using the MySQL interface, it was not designed for bulk inserts/upserts or for fast single-record writes. Generally speaking, the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
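A minimal sketch of the write/read split, assuming a Spark session that already has an Iceberg catalog named `ice` configured; all database and table names are hypothetical:

```sql
-- Spark SQL side: create and write the Iceberg table.
CREATE TABLE ice.sales_db.orders (id BIGINT, amount DOUBLE) USING iceberg;
INSERT INTO ice.sales_db.orders VALUES (1, 9.99), (2, 24.50);

-- StarRocks side: read through an Iceberg external catalog
-- (created as in the Note under Scenario A).
SELECT id, amount FROM iceberg_src.sales_db.orders;
```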
Scenario C: When using Apache Hudi for storing your data
We support Apache Hudi via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Apache Hudi, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
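A minimal read-side sketch, assuming the Hudi table was written by Spark and is registered in a Hive metastore; the catalog name, URI, and table names are hypothetical:

```sql
-- StarRocks only reads Hudi here; writes happen in Spark or another tool.
CREATE EXTERNAL CATALOG hudi_src
PROPERTIES
(
    "type" = "hudi",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hudi_src.sales_db.orders LIMIT 10;
```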
Scenario D: When using Delta Lake for storing your data
We support Delta Lake via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Delta Lake, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
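The read-side setup mirrors the Hudi sketch above, assuming a Hive metastore; only the catalog type changes (names remain hypothetical):

```sql
CREATE EXTERNAL CATALOG delta_src
PROPERTIES
(
    "type" = "deltalake",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM delta_src.sales_db.orders LIMIT 10;
```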
Scenario E: When using Apache Hive for storing your data
We support Apache Hive via StarRocks' External Catalog feature. As of Oct 2023, StarRocks does not support writing to Apache Hive, so the suggested pattern is to write the data using Apache Spark or another tool and then read it using StarRocks via SQL, as in the sketch below.
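Again mirroring the pattern above, a minimal read-side sketch with hypothetical names:

```sql
CREATE EXTERNAL CATALOG hive_src
PROPERTIES
(
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hive_src.sales_db.orders LIMIT 10;
```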
More info
See the StarRocks Data Loading FAQ at https://github.com/StarRocks/starrocks/wiki/FAQ:-StarRocks-Data-Loading and our support for the various open table formats at #24659.