website/docs/streaming-lakehouse/overview.md (+2 −2)
```diff
@@ -8,7 +8,7 @@ sidebar_position: 1
 
 Lakehouse represents a new, open architecture that combines the best elements of data lakes and data warehouses.
 It combines data lake scalability and cost-effectiveness with data warehouse reliability and performance.
-The wellknown data lake format such like [Apache Iceberg](https://iceberg.apache.org/), [Apache Paimon](https://paimon.apache.org/), [Apache Hudi](https://hudi.apache.org/) and [Delta Lake](https://delta.io/) play key roles in the Lakehouse architecture,
+The well-known data lake format such like [Apache Iceberg](https://iceberg.apache.org/), [Apache Paimon](https://paimon.apache.org/), [Apache Hudi](https://hudi.apache.org/) and [Delta Lake](https://delta.io/) play key roles in the Lakehouse architecture,
 facilitating a harmonious balance between data storage, reliability, and analytical capabilities within a single, unified platform.
 
 Lakehouse, as a modern architecture, is effective in addressing the complex needs of data management and analytics.
@@ -17,7 +17,7 @@ With these data lake formats, you will get into a contradictory situation:
 
 1. If you require low latency, then you write and commit frequently, which means many small Parquet files. This becomes inefficient for
 reads which must now deal with masses of small files.
-2. If you require read efficiency, then you accumulate data until you can write to large Parquet files, but this introduces
+2. If you require reading efficiency, then you accumulate data until you can write to large Parquet files, but this introduces
 much higher latency.
 
 Overall, these data lake formats typically achieve data freshness at best within minute-level granularity, even under optimal usage conditions.
```
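The "contradictory situation" the second hunk describes (commit frequency versus file size) can be made concrete with a short sketch. The following is a minimal illustration, not part of the diff or the documentation it edits: it assumes pyarrow for Parquet writing, and the record schema, file naming, and batch sizes are invented for the example. The single `flush_every` knob is the tradeoff: a small value gives fresh data but many small files, a large value gives fewer, larger files but higher latency.

```python
# Minimal sketch of the latency-vs-read-efficiency tradeoff described above.
# Assumes pyarrow; schema, paths, and batch sizes are illustrative only.
import pyarrow as pa
import pyarrow.parquet as pq


def flush(records: list[dict], path: str) -> None:
    """Commit one Parquet file from a list of buffered records."""
    pq.write_table(pa.Table.from_pylist(records), path)


def ingest(stream, flush_every: int) -> None:
    """Buffer incoming records and commit a Parquet file every `flush_every` records.

    Small `flush_every` -> frequent commits, low latency, but many small files
                           (reads slow down on per-file open/plan overhead).
    Large `flush_every` -> few large files and efficient scans, but a record can
                           sit unreadable in the buffer for much longer.
    """
    buffer, file_no = [], 0
    for record in stream:
        buffer.append(record)
        if len(buffer) >= flush_every:
            flush(buffer, f"part-{file_no:05d}.parquet")
            buffer, file_no = [], file_no + 1
    if buffer:  # commit whatever is left at the end of the stream
        flush(buffer, f"part-{file_no:05d}.parquet")


# ingest(events, flush_every=100)      # fresh data, fragmented into small files
# ingest(events, flush_every=100_000)  # efficient scans, minute-level-or-worse latency
```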