
GitBook: [#65] fix: Title
1ambda authored and gitbook-bot committed Nov 9, 2021
1 parent 6c08caf commit 26b1bbe
Showing 119 changed files with 181 additions and 99 deletions.
Binary file added .gitbook/assets/image (11) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (11) (1) (1) (1).png
Binary file modified .gitbook/assets/image (11) (1) (1).png
Binary file modified .gitbook/assets/image (11) (1).png
Binary file modified .gitbook/assets/image (11).png
Binary file modified .gitbook/assets/image (12) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (12) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (12) (1) (1) (1).png
Binary file modified .gitbook/assets/image (12) (1) (1).png
Binary file modified .gitbook/assets/image (12) (1).png
Binary file modified .gitbook/assets/image (12).png
Binary file modified .gitbook/assets/image (14) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (14) (1) (1) (1).png
Binary file modified .gitbook/assets/image (14) (1) (1).png
Binary file modified .gitbook/assets/image (14) (1).png
Binary file modified .gitbook/assets/image (14).png
Binary file modified .gitbook/assets/image (15) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (15) (1) (1) (1).png
Binary file modified .gitbook/assets/image (15) (1) (1).png
Binary file modified .gitbook/assets/image (15) (1).png
Binary file modified .gitbook/assets/image (15).png
Binary file modified .gitbook/assets/image (19) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (19) (1) (1) (1).png
Binary file modified .gitbook/assets/image (19) (1) (1).png
Binary file modified .gitbook/assets/image (19) (1).png
Binary file modified .gitbook/assets/image (19).png
Binary file modified .gitbook/assets/image (2) (1) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (2) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (2) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (2) (1) (1) (1).png
Binary file modified .gitbook/assets/image (2) (1) (1).png
Binary file modified .gitbook/assets/image (2) (1).png
Binary file modified .gitbook/assets/image (2).png
Binary file added .gitbook/assets/image (20) (1) (1) (1).png
Binary file modified .gitbook/assets/image (20) (1) (1).png
Binary file modified .gitbook/assets/image (20) (1).png
Binary file modified .gitbook/assets/image (20).png
Binary file added .gitbook/assets/image (21) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (21) (1) (1) (1).png
Binary file modified .gitbook/assets/image (21) (1) (1).png
Binary file modified .gitbook/assets/image (21) (1).png
Binary file modified .gitbook/assets/image (21).png
Binary file added .gitbook/assets/image (22) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (22) (1) (1) (1).png
Binary file modified .gitbook/assets/image (22) (1) (1).png
Binary file modified .gitbook/assets/image (22) (1).png
Binary file modified .gitbook/assets/image (22).png
Binary file modified .gitbook/assets/image (23) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (23) (1) (1) (1).png
Binary file modified .gitbook/assets/image (23) (1) (1).png
Binary file modified .gitbook/assets/image (23) (1).png
Binary file modified .gitbook/assets/image (23).png
Binary file added .gitbook/assets/image (24) (1) (1) (1).png
Binary file modified .gitbook/assets/image (24) (1) (1).png
Binary file modified .gitbook/assets/image (24) (1).png
Binary file modified .gitbook/assets/image (24).png
Binary file added .gitbook/assets/image (25) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (25) (1) (1) (1).png
Binary file modified .gitbook/assets/image (25) (1) (1).png
Binary file modified .gitbook/assets/image (25) (1).png
Binary file modified .gitbook/assets/image (25).png
Binary file added .gitbook/assets/image (26) (1) (1) (1).png
Binary file modified .gitbook/assets/image (26) (1) (1).png
Binary file modified .gitbook/assets/image (26) (1).png
Binary file modified .gitbook/assets/image (26).png
Binary file added .gitbook/assets/image (32) (1) (1) (1).png
Binary file modified .gitbook/assets/image (32) (1) (1).png
Binary file modified .gitbook/assets/image (32) (1).png
Binary file modified .gitbook/assets/image (32).png
Binary file added .gitbook/assets/image (34) (1).png
Binary file modified .gitbook/assets/image (34).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1) (1).png
Binary file modified .gitbook/assets/image (5) (1).png
Binary file modified .gitbook/assets/image (5).png
8 changes: 4 additions & 4 deletions 01-data-infra/1.1.md
@@ -2,7 +2,7 @@
description: Data Infrastructure
---

# 1.1 Data Pipeline (Full)
# 1.1 Data Pipeline

This chapter walks through the building blocks of a data pipeline. We assume a small fictional company, Udon Market (우동마켓), and look at the problems it runs into as it grows, one data-infrastructure component at a time.

@@ -40,7 +40,7 @@ description: Data Infrastructure

The easiest approach is to connect directly to the production service DB and look at the data. Of course (...) you should not do that. Instead, by attaching a [Read Replica](https://aws.amazon.com/ko/rds/features/read-replicas/) to AWS RDS, you can inspect the data replicated from the production DB in real time / asynchronously, with only a slight lag (at least for Udon Market, which has little data).

![AWS RDS Read Replica (https://aws.amazon.com/ko/rds/features/read-replicas)](<../.gitbook/assets/image (2) (1) (1) (1) (1).png>)
![AWS RDS Read Replica (https://aws.amazon.com/ko/rds/features/read-replicas)](<../.gitbook/assets/image (2) (1) (1) (1) (1) (1).png>)

If you use [AWS Aurora](https://aws.amazon.com/ko/rds/aurora/?aurora-whats-new.sort-by=item.additionalFields.postDateTime\&aurora-whats-new.sort-order=desc) as your RDB, it behaves a bit differently internally, but you can likewise add an [Aurora Read Replica](https://docs.aws.amazon.com/ko\_kr/AmazonRDS/latest/AuroraUserGuide/Aurora.Replication.html).
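As a rough illustration (not part of this commit's diff), such a replica could be created programmatically with boto3. This is a hedged sketch: the instance identifiers, region, and instance class below are assumptions, not values from the book.

```python
import boto3

# Sketch: create a read replica of a production RDS instance so analysts
# can query replicated data instead of the live service DB.
# "udon-market-prod" and the instance class are hypothetical values.
rds = boto3.client("rds", region_name="ap-northeast-2")

response = rds.create_db_instance_read_replica(
    DBInstanceIdentifier="udon-market-analytics-replica",  # name of the new replica (assumed)
    SourceDBInstanceIdentifier="udon-market-prod",         # production instance (assumed)
    DBInstanceClass="db.r5.large",                         # replicas can be sized independently
    PubliclyAccessible=False,
)

print(response["DBInstance"]["DBInstanceStatus"])  # e.g. "creating"
```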

@@ -131,7 +131,7 @@ Presto can be used via JDBC / ODBC, and queries can also be run over its HTTP API

Even users who don't know SQL can connect the Google Sheet they already use to the analytics tool and, with [Widgets](https://redash.io/help/user-guide/querying/query-parameters), pull operational data using nothing but the mouse.

![Redash Query Parameter (https://redash.io/help/user-guide/querying/query-parameters)](<../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1).png>)
![Redash Query Parameter (https://redash.io/help/user-guide/querying/query-parameters)](<../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1) (1).png>)



@@ -163,7 +163,7 @@

\[Figure] Diagram of periodically copying RDS data into S3

![RDB Replica Pattern (https://aws.amazon.com/ko/blogs/database/building-data-lakes-and-implementing-data-retention-policies-with-amazon-rds-snapshot-export-to-amazon-s3/)](<../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1).png>)
![RDB Replica Pattern (https://aws.amazon.com/ko/blogs/database/building-data-lakes-and-implementing-data-retention-policies-with-amazon-rds-snapshot-export-to-amazon-s3/)](<../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1).png>)

![S3 Export Pattern (https://aws.amazon.com/ko/blogs/database/building-data-lakes-and-implementing-data-retention-policies-with-amazon-rds-snapshot-export-to-amazon-s3/)](<../.gitbook/assets/image (1) (1).png>)
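A hedged sketch of the snapshot-export-to-S3 pattern shown above, using boto3 (not part of this commit's diff; the ARNs, bucket, role, and KMS key names are placeholders, not values from the book):

```python
import boto3

rds = boto3.client("rds", region_name="ap-northeast-2")

# Sketch: export an RDS snapshot to S3 (Parquet) so it can be queried from the data lake.
# Every identifier below is hypothetical.
rds.start_export_task(
    ExportTaskIdentifier="udon-market-export-2021-11-09",
    SourceArn="arn:aws:rds:ap-northeast-2:123456789012:snapshot:udon-market-snapshot",
    S3BucketName="udon-market-data-lake",
    S3Prefix="rds-exports/",
    IamRoleArn="arn:aws:iam::123456789012:role/rds-s3-export-role",
    KmsKeyId="alias/rds-export-key",  # snapshot exports must be encrypted with a KMS key
)
```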

2 changes: 1 addition & 1 deletion 01-data-infra/1.2-processing.md
@@ -1,3 +1,3 @@
# 1.2 Data Processing

This chapter has not been published yet (to be released after Dec 10, 2021)
This is the hands-on portion of the Practical AWS Pipeline course.
2 changes: 1 addition & 1 deletion 01-data-infra/1.2.md
@@ -4,4 +4,4 @@ description: 리

# 1.2 Data Ingestion

This chapter has not been published yet (to be released after Dec 10, 2021)
This is the hands-on portion of the Practical AWS Pipeline course.
2 changes: 1 addition & 1 deletion 01-data-infra/1.3-storage.md
@@ -4,4 +4,4 @@ description: 소

# 1.3 Data Storage

This chapter has not been published yet (to be released after Dec 10, 2021)
This is the hands-on portion of the Practical AWS Pipeline course.
2 changes: 1 addition & 1 deletion 01-data-infra/1.3.md
@@ -4,4 +4,4 @@ description: 석

# 1.4 Data Analysis

This chapter has not been published yet (to be released after Dec 10, 2021)
This is the hands-on portion of the Practical AWS Pipeline course.
6 changes: 3 additions & 3 deletions 02-processing/2.2-batch/2.1.2-spark-architecture.md
@@ -147,7 +147,7 @@ root
<bound method RDD.id of MapPartitionsRDD[31] at javaToPython at NativeMethodAccessorImpl.java:0>
```

![](<../../.gitbook/assets/image (2) (1) (1) (1) (1) (1) (1).png>)![](<../../.gitbook/assets/image (1) (1) (1) (1) (1) (1) (1).png>)
![](<../../.gitbook/assets/image (2) (1) (1) (1) (1) (1) (1) (1).png>)![](<../../.gitbook/assets/image (1) (1) (1) (1) (1) (1) (1).png>)

The screenshots above show the Stage information available in the [Spark UI](https://spark.apache.org/docs/latest/web-ui.html): the Spark operations executed up to the `toPandas()` call.
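As a minimal sketch of the kind of code that produces those stages (not part of this commit's diff; the local SparkSession and the CSV path are assumptions, not the book's notebook):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("stage-demo").getOrCreate()

# Reading and filtering only build up a plan; nothing runs yet.
df = spark.read.option("header", True).csv("./airbnb_listings.csv")  # hypothetical path
filtered = df.select("id", "price").where("price is not null")

# toPandas() is an action: it triggers the job whose stages appear in the Spark UI.
pandas_df = filtered.toPandas()
print(pandas_df.head())
```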

@@ -440,7 +440,7 @@ The Dataset API supports the typed languages Scala / Java. Py

****

![Catalyst Optimizer Overview (https://blog.bi-geek.com/en/spark-sql-optimizador-catalyst/)](<../../.gitbook/assets/image (23) (1) (1).png>)
![Catalyst Optimizer Overview (https://blog.bi-geek.com/en/spark-sql-optimizador-catalyst/)](<../../.gitbook/assets/image (23) (1) (1) (1).png>)



@@ -450,7 +450,7 @@



![Catalyst Optimizer (Databricks Slide)](<../../.gitbook/assets/image (2) (1).png>)
![Catalyst Optimizer (Databricks Slide)](<../../.gitbook/assets/image (2) (1) (1).png>)



18 changes: 9 additions & 9 deletions 02-processing/2.2-batch/2.1.3-spark-concept.md
@@ -17,17 +17,17 @@

What an RDD actually does can be divided broadly into Transformations and Actions.

![Spark Transformation and Action (https://medium.com/analytics-vidhya/spark-rdd-low-level-api-basics-using-pyspark-a9a322b58f6)](<../../.gitbook/assets/image (5).png>)
![Spark Transformation and Action (https://medium.com/analytics-vidhya/spark-rdd-low-level-api-basics-using-pyspark-a9a322b58f6)](<../../.gitbook/assets/image (5) (1).png>)



You repeatedly apply various **Transformations** such as filter and map, and finally call an **Action** such as count() to produce a result. In the Spark UI, Transformations and Actions are visualized as shown below.

![Spark UI - Transformation and Action](<../../.gitbook/assets/image (21) (1) (1).png>)
![Spark UI - Transformation and Action](<../../.gitbook/assets/image (21) (1) (1) (1).png>)

&#x20;

![Spark Multiple Actions (Slide)](<../../.gitbook/assets/image (26) (1) (1).png>)
![Spark Multiple Actions (Slide)](<../../.gitbook/assets/image (26) (1) (1) (1).png>)

In Spark, you can call an **Action** on the same DataFrame several times, or modify the DataFrame (a **Transformation**) and then call an **Action** again. The figure above illustrates this.
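A small sketch of that idea (not part of this commit's diff; it assumes an existing SparkSession named `spark` as in the book's notebooks, and the path and column names are hypothetical):

```python
from pyspark.sql import functions as F

# Transformations: nothing is executed yet, Spark only records the lineage.
events = spark.read.parquet("s3://udon-market/events/")       # hypothetical path
purchases = events.where(F.col("event_type") == "purchase")   # transformation
by_brand = purchases.groupBy("brand").count()                 # transformation

# Actions: each of these triggers an actual computation (a separate job).
total = purchases.count()
by_brand.show(10)

# A further transformation followed by another action is also fine.
purchases.where(F.col("price") > 100).count()
```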

@@ -297,7 +297,7 @@ The Partition of an RDB (such as MySQL), the Partition of Hive, the Partition of Kafka, and

In a data system, a Partition is a "part" split off from the whole.

![AWS RDS Sharding (https://aws.amazon.com/blogs/database/sharding-with-amazon-relational-database-service/)](<../../.gitbook/assets/image (22) (1).png>)
![AWS RDS Sharding (https://aws.amazon.com/blogs/database/sharding-with-amazon-relational-database-service/)](<../../.gitbook/assets/image (22) (1) (1).png>)
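For instance, the number of partitions of a Spark DataFrame can be inspected and changed like this (an illustrative sketch, not part of this commit's diff; it assumes an existing DataFrame `df` with an `id` column):

```python
# Assumes an existing SparkSession `spark` and DataFrame `df`, as elsewhere in this chapter.
print(df.rdd.getNumPartitions())          # how many partitions the data is currently split into

repartitioned = df.repartition(10, "id")  # full shuffle into 10 partitions, distributed by id
coalesced = repartitioned.coalesce(5)     # shrink to 5 partitions without a full shuffle
print(coalesced.rdd.getNumPartitions())
```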



@@ -452,7 +452,7 @@ dfConverted.repartition(col("id"))



![](<../../.gitbook/assets/image (6) (1) (1).png>)![](<../../.gitbook/assets/image (19) (1) (1).png>)
![](<../../.gitbook/assets/image (6) (1) (1).png>)![](<../../.gitbook/assets/image (19) (1) (1) (1).png>)

Transformation Types ([Databricks Blog](https://databricks.com/glossary/what-are-transformations))

@@ -465,7 +465,7 @@ There is an additional way of classifying Transformations, related to Shuffle, that you should know



![Deep Dive Into Spark Transformation and Action](<../../.gitbook/assets/image (20) (1).png>)
![Deep Dive Into Spark Transformation and Action](<../../.gitbook/assets/image (20) (1) (1).png>)



@@ -519,7 +519,7 @@ The terms Spark Job and Batch Job come up a lot.



![Spark RDD and Stages 1 (Link)](<../../.gitbook/assets/image (15) (1) (1) (1).png>)
![Spark RDD and Stages 1 (Link)](<../../.gitbook/assets/image (15) (1) (1) (1) (1).png>)

![Spark RDD and Stages 2 (Link)](<../../.gitbook/assets/image (13) (1) (1).png>)

@@ -540,7 +540,7 @@ When Spark runs and an Action is called, an execution plan is created and the Ex



![Spark Dag Schedulers 2 (Link)](<../../.gitbook/assets/image (24) (1).png>)
![Spark Dag Schedulers 2 (Link)](<../../.gitbook/assets/image (24) (1) (1).png>)



@@ -559,5 +559,5 @@ Stages can also be divided into two kinds.

Putting everything described so far into a single picture, it looks like the following.

![](<../../.gitbook/assets/image (14) (1) (1).png>)
![](<../../.gitbook/assets/image (14) (1) (1) (1).png>)

10 changes: 5 additions & 5 deletions 02-processing/2.2-batch/2.1.4-spark-architecture.md
@@ -14,7 +14,7 @@ Spark consists of two main components: the **Driver*
As explained in a later section of this chapter, the Cluster Manager is a cluster such as Yarn on Hadoop / AWS EMR, or Kubernetes, that manages resources so that many Spark jobs can run.\


![Spark Cluster Mode Overview (https://spark.apache.org/docs/latest/cluster-overview.html)](<../../.gitbook/assets/image (2) (1) (1) (1) (1) (1).png>)
![Spark Cluster Mode Overview (https://spark.apache.org/docs/latest/cluster-overview.html)](<../../.gitbook/assets/image (2) (1) (1) (1) (1) (1) (1).png>)

\
\
@@ -25,7 +25,7 @@ Unlike a single-machine API framework such as Spring Boot, Spark



![Spark Collect Overview (Link)](<../../.gitbook/assets/image (26) (1).png>)
![Spark Collect Overview (Link)](<../../.gitbook/assets/image (26) (1) (1).png>)



@@ -91,7 +91,7 @@ spark.driver.memory # memory (GiB) used by the Driver



![Spark Executor Overview (Link)](<../../.gitbook/assets/image (14) (1).png>)
![Spark Executor Overview (Link)](<../../.gitbook/assets/image (14) (1) (1).png>)

\
The **Executor** performs the distributed work requested by the Spark Driver, and holds the data stored in a distributed way via cache(). Its count and resources can be adjusted as requested, using the following options.
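The option list itself is collapsed in this diff. Purely as an illustrative sketch (not the book's elided list, and with made-up values), Driver / Executor resources are commonly set like this:

```python
from pyspark.sql import SparkSession

# Illustrative values only.
spark = (
    SparkSession.builder
    .appName("resource-config-sketch")
    .config("spark.driver.memory", "4g")       # memory for the Driver
    .config("spark.executor.memory", "8g")     # memory per Executor
    .config("spark.executor.cores", "4")       # cores per Executor
    .config("spark.executor.instances", "10")  # number of Executors (static allocation)
    .getOrCreate()
)
# Note: in practice spark.driver.memory is usually passed to spark-submit,
# since it must be known before the Driver JVM starts.
```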
@@ -130,7 +130,7 @@ df.foreach(persist)

![foreachRDD 1 (Spark Streaming Programming Techniques)](<../../.gitbook/assets/image (27) (1).png>)

![foreachRDD 2 (Spark Streaming Programming Techniques)](<../../.gitbook/assets/image (32) (1).png>)
![foreachRDD 2 (Spark Streaming Programming Techniques)](<../../.gitbook/assets/image (32) (1) (1).png>)

\
\
@@ -256,7 +256,7 @@ Client mode and Cluster mode can be distinguished as follows. The job's

![Spark Client Mode (출처 표기)](<../../.gitbook/assets/image (4) (1) (1) (1) (1) (1) (1) (1) (1).png>)

![Spark Cluster Mode (출처 표기)](<../../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1) (1) (1).png>)
![Spark Cluster Mode (출처 표기)](<../../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1).png>)



18 changes: 9 additions & 9 deletions 02-processing/2.2-batch/2.1.5-spark-memory-management.md
@@ -16,7 +16,7 @@ Spark runs on the JVM. When using PySpark, an external



![Spark Local Mode (Spark in Action)](<../../.gitbook/assets/image (12) (1) (1) (1).png>)
![Spark Local Mode (Spark in Action)](<../../.gitbook/assets/image (12) (1) (1) (1) (1).png>)

![Spark Client Mode (Spark in Action)](<../../.gitbook/assets/image (18) (1) (1) (1) (1).png>)

@@ -92,9 +92,9 @@ Options such as spark.driver.cores and **spark.executor.memory** apply to individual compo



![Spark < 1.6 Memory (Decoding Memory in Spark)](<../../.gitbook/assets/image (25) (1) (1) (1).png>)
![Spark < 1.6 Memory (Decoding Memory in Spark)](<../../.gitbook/assets/image (25) (1) (1) (1) (1).png>)

![Spark >= 1.6 Memory (Decoding Memory in Spark)](<../../.gitbook/assets/image (24) (1) (1).png>)
![Spark >= 1.6 Memory (Decoding Memory in Spark)](<../../.gitbook/assets/image (24) (1) (1) (1).png>)



@@ -241,7 +241,7 @@ The Off-heap feature can be used to avoid GC, but if it


![PySpark Memory Configuration (Decoding Memory in Spark)](<../../.gitbook/assets/image (14) (1) (1) (1).png>)
![PySpark Memory Configuration (Decoding Memory in Spark)](<../../.gitbook/assets/image (14) (1) (1) (1) (1).png>)



@@ -266,7 +266,7 @@ If you use PySpark, there are two more memory options you can set

[AWS EMR](https://aws.amazon.com/ko/emr/) is a managed big-data cluster service provided by AWS.

![On-prem Hadoop vs AWS EMR (https://aws.amazon.com/blogs/big-data/migrate-and-deploy-your-apache-hive-metastore-on-amazon-emr/)](<../../.gitbook/assets/image (5) (1).png>)
![On-prem Hadoop vs AWS EMR (https://aws.amazon.com/blogs/big-data/migrate-and-deploy-your-apache-hive-metastore-on-amazon-emr/)](<../../.gitbook/assets/image (5) (1) (1).png>)



@@ -289,7 +289,7 @@ yarn.node-labels.am.default-node-label-expression: 'CORE'



![EMR Node Types (EMR Best Practices)](<../../.gitbook/assets/image (23) (1) (1) (1) (1).png>)
![EMR Node Types (EMR Best Practices)](<../../.gitbook/assets/image (23) (1) (1) (1) (1) (1).png>)



@@ -402,15 +402,15 @@ When using EMR, the maximizeResourceAllocation option lets the user have the Dri

### Spark on Kubernetes

![Spark on Kubernetes (https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/)](<../../.gitbook/assets/image (21) (1) (1) (1).png>)
![Spark on Kubernetes (https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/)](<../../.gitbook/assets/image (21) (1) (1) (1) (1).png>)



From Spark 3.1+ (the GA release), Kubernetes can be used as the Cluster Manager. When running Spark on Kubernetes, some EMR-specific features are unavailable ([EMRFS S3-Optimized Committer](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-committer-reqs.html), [EMR Decommissioning](https://aws.amazon.com/blogs/big-data/spark-enhancements-for-elasticity-and-resiliency-on-amazon-emr/), [EMR Autoscaling](2.1.5-spark-memory-management.md#spark-memory-config), etc.), but there are still several advantages.



![EMR on EC2 vs EMR on EKS (AWS Blog)](<../../.gitbook/assets/image (20) (1) (1).png>)
![EMR on EC2 vs EMR on EKS (AWS Blog)](<../../.gitbook/assets/image (20) (1) (1) (1).png>)



@@ -479,7 +479,7 @@ If a common in-memory serialization format such as Arrow is used, Seriali



![Python and JVM Processes Comm. (https://www.edureka.co/blog/spark-with-python-pyspark)](<../../.gitbook/assets/image (23) (1) (1) (1).png>)
![Python and JVM Processes Comm. (https://www.edureka.co/blog/spark-with-python-pyspark)](<../../.gitbook/assets/image (23) (1) (1) (1) (1).png>)



10 changes: 5 additions & 5 deletions 02-processing/2.2-batch/2.1.x-spark-cache.md
@@ -23,7 +23,7 @@ reviews.csv -> airbnb_reviews.csv



![Partition and Cache (https://www.nvidia.com/ko-kr/ai-data-science/spark-ebook/introduction-spark-processing/)](<../../.gitbook/assets/image (5) (1) (1) (1).png>)
![Partition and Cache (https://www.nvidia.com/ko-kr/ai-data-science/spark-ebook/introduction-spark-processing/)](<../../.gitbook/assets/image (5) (1) (1) (1) (1).png>)



@@ -102,7 +102,7 @@ Looking at the Spark UI, you can see the following.



![Spark UI - Executors Tab](<../../.gitbook/assets/image (11) (1) (1) (1).png>)
![Spark UI - Executors Tab](<../../.gitbook/assets/image (11) (1) (1) (1) (1).png>)

![Spark UI - Storage Tab](<../../.gitbook/assets/image (7) (1) (1) (1) (1).png>)

@@ -185,7 +185,7 @@ Spark adjusts memory through a number of options. Later, in the Spark Memo



![Spark Memory Management (https://www.tutorialdocs.com/article/spark-memory-management.html)](<../../.gitbook/assets/image (12) (1) (1) (1) (1) (1).png>)
![Spark Memory Management (https://www.tutorialdocs.com/article/spark-memory-management.html)](<../../.gitbook/assets/image (12) (1) (1) (1) (1) (1) (1).png>)



@@ -197,9 +197,9 @@



![](<../../.gitbook/assets/image (14) (1) (1) (1) (1).png>)
![](<../../.gitbook/assets/image (14) (1) (1) (1) (1) (1).png>)

![](<../../.gitbook/assets/image (15) (1) (1) (1) (1).png>)
![](<../../.gitbook/assets/image (15) (1) (1) (1) (1) (1).png>)

If you look at the Environment tab of the Spark UI, you can see the memory settings like this.
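The same values can also be read programmatically (a sketch, not part of this commit's diff; it assumes an existing SparkSession `spark`):

```python
# Prints the memory-related settings that also appear in the Spark UI's Environment tab.
conf = spark.sparkContext.getConf()
for key in ("spark.driver.memory", "spark.executor.memory", "spark.memory.fraction"):
    print(key, "=", conf.get(key, "not set"))
```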

12 changes: 6 additions & 6 deletions 02-processing/2.2-batch/2.1.x-spark-dataframe.md
@@ -297,9 +297,9 @@ df\



![Window Function Basic 1 (learnsql.com)](<../../.gitbook/assets/image (34).png>)
![Window Function Basic 1 (learnsql.com)](<../../.gitbook/assets/image (34) (1).png>)

![Window Function Basic 2 (learnsql.com)](<../../.gitbook/assets/image (19) (1).png>)
![Window Function Basic 2 (learnsql.com)](<../../.gitbook/assets/image (19) (1) (1).png>)

\
Most data processing frameworks support Window Functions.
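As a small PySpark sketch of a window function (not part of this commit's diff; `sales` and its columns are hypothetical, not the exercise dataset used below):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank product categories within each brand by total sales: a typical window-function pattern.
w = Window.partitionBy("brand").orderBy(F.col("total_sales").desc())

ranked = (
    sales.groupBy("brand", "category")
    .agg(F.sum("price").alias("total_sales"))
    .withColumn("rank", F.rank().over(w))
)
ranked.where(F.col("rank") == 1).show()
```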
@@ -362,7 +362,7 @@ spark.sql("""

#### 1. The product category with the highest sales (total sales amount) per brand over the entire period

![Window Function Ranking (learnsql.com)](<../../.gitbook/assets/image (24).png>)
![Window Function Ranking (learnsql.com)](<../../.gitbook/assets/image (24) (1).png>)

```sql
spark.sql("""
@@ -530,7 +530,7 @@

#### 3. 전체 기간동안 브랜드별 매출(판매 금액의 합) 을 구하되, 자신보다 한단계 높은 순위 또는 낮은 순위의 매출도 같이 표시하기

![Window Function Lag and Lead (learnsql.com)](<../../.gitbook/assets/image (12).png>)
![Window Function Lag and Lead (learnsql.com)](<../../.gitbook/assets/image (12) (1).png>)

```sql
spark.sql("""
Expand Down Expand Up @@ -604,7 +604,7 @@ rank() 와 dense\_rank() 의 차이는 무엇일까요? row\_number() 등 다른

#### 4. 일별로 모든 브랜드를 통틀어, 판매 금액의 합산을 누적으로 구하면 매출의 변화량은 어떤지 살펴보기

![Window Function Relative Rows 1 (learnsql.com)](<../../.gitbook/assets/image (25) (1).png>)
![Window Function Relative Rows 1 (learnsql.com)](<../../.gitbook/assets/image (25) (1) (1).png>)

![Window Function Relative Rows 2 (learnsql.com)](<../../.gitbook/assets/image (6) (1).png>)

Expand Down Expand Up @@ -680,7 +680,7 @@ Shuffle 이 발생할까요? 발생한다면 어떻게 데이터가 나누어지

### Window Function - Attribution

![Attribution Model Overview (https://www.silverdisc.co.uk/blog/2017/09/22/attribution-modelling-%E2%80%93-moving-away-last-click-attribution)](<../../.gitbook/assets/image (5) (1) (1) (1) (1) (1).png>)
![Attribution Model Overview (https://www.silverdisc.co.uk/blog/2017/09/22/attribution-modelling-%E2%80%93-moving-away-last-click-attribution)](<../../.gitbook/assets/image (5) (1) (1) (1) (1) (1) (1).png>)



