From 6e1aa804d3e7f1a9bb9a63f908dcb0dcb4435fec Mon Sep 17 00:00:00 2001
From: wonchul
Date: Wed, 10 Jan 2024 14:15:51 +0900
Subject: [PATCH] added test code

---
 _posts/pytorch/2024-01-09-torchrun.md | 39 ++++++++++++++-------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/_posts/pytorch/2024-01-09-torchrun.md b/_posts/pytorch/2024-01-09-torchrun.md
index 8f84e90..21cad46 100644
--- a/_posts/pytorch/2024-01-09-torchrun.md
+++ b/_posts/pytorch/2024-01-09-torchrun.md
@@ -80,7 +80,7 @@ torch.distributed.init_process_group(
 )
 torch.distributed.barrier()
 ```
-
+------------------------------------------------------------------
 ### Dataset
 
 #### You must use `DistributedSampler`, and as in the code below
@@ -118,29 +118,12 @@ I experimented with this a bit. I found that we should use the formula:
 num_worker = 4 * num_GPU
 
 Though factors of 2 and 8 also work well, a lower factor (<2) significantly reduces overall performance. Here, the number of workers has no impact on GPU memory allocation. Also, nowadays there are many CPU cores in a machine with few GPUs (<8), so the above formula is practical.
-
 #### On Linux, you can check the number of processors as follows; `num_workers` can be set to at most the number of processors.
 ```cmd
 cat /proc/cpuinfo | grep processor
 ```
 
-### Model
-
-#### `DistributedDataParallel`
-```python
-from torch.nn.parallel import DistributedDataParallel as DDP
-model = model.cuda(args.rank)
-model = DDP(module=model, device_ids=[args.rank])
-```
-
-### Train
-
-During training, call the `sampler`'s `set_epoch()` at the start of every epoch so that `shuffle` actually takes effect.
-```python
-train_sampler.set_epoch(epoch)
-```
-
-## DEMO
+#### DEMO
 ```python
 # dataset.py
 class SimpleDataset(torch.utils.data.Dataset):
@@ -318,6 +301,24 @@ cost 4.337230443954468 ms
 
 As shown above, for the data from 1 to 20, GPU0 and GPU1 each build their own batches, and because train_dataset is shuffled, the order differs every epoch.
 The code is available on [github](https://github.com/wonchul-kim/distributed_training).
+------------------------------------------------------------------
+### Model
+
+#### `DistributedDataParallel`
+```python
+from torch.nn.parallel import DistributedDataParallel as DDP
+model = model.cuda(args.rank)
+model = DDP(module=model, device_ids=[args.rank])
+```
+------------------------------------------------------------------
+### Train
+
+During training, call the `sampler`'s `set_epoch()` at the start of every epoch so that `shuffle` actually takes effect.
+```python
+train_sampler.set_epoch(epoch)
+```
+
+
 ## references:
 - [Guidelines for assigning num_workers to DataLoader](https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/4)
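
The first hunk shows only the tail of the post's `torch.distributed.init_process_group(...)` call, so the arguments it passes are not visible. A minimal sketch of how that initialization is commonly done under `torchrun`, assuming the `nccl` backend and the environment variables (`LOCAL_RANK`, `MASTER_ADDR`, `MASTER_PORT`) that `torchrun` exports, would be:

```python
import os
import torch

def init_distributed() -> int:
    # torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT
    # for every process it launches, so env:// initialization needs no extra args.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)   # bind this process to one GPU
    torch.distributed.init_process_group(
        backend="nccl",                 # GPU training; use "gloo" for CPU-only runs
        init_method="env://",
    )
    torch.distributed.barrier()         # wait until every rank has joined
    return local_rank
```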
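
The `### Dataset` section says a `DistributedSampler` must be used, but the hunk boundary cuts off the code it refers to. A minimal sketch of the usual wiring, assuming the process group is already initialized and using hypothetical names (`build_loader`, `train_dataset`), looks like:

```python
from torch.utils.data import DataLoader, DistributedSampler

def build_loader(train_dataset, batch_size: int = 4, num_workers: int = 4):
    # Each rank iterates over a disjoint shard of the dataset; shuffling happens
    # inside the sampler and is re-seeded per epoch via sampler.set_epoch(epoch).
    sampler = DistributedSampler(train_dataset, shuffle=True)
    loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        sampler=sampler,        # do not pass shuffle=True together with a sampler
        num_workers=num_workers,
        pin_memory=True,
    )
    return loader, sampler
```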
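
The quoted guideline num_worker = 4 * num_GPU, bounded by the processor count that `cat /proc/cpuinfo | grep processor` reports, can be written as a small helper. The function below is an illustration of that rule of thumb, not code from the post:

```python
import os
import torch

def suggested_num_workers(factor: int = 4) -> int:
    """Heuristic from the quoted guideline: `factor` workers per GPU,
    capped at the number of logical processors the OS reports."""
    num_gpu = max(torch.cuda.device_count(), 1)
    num_cpu = os.cpu_count() or 1   # same count as the /proc/cpuinfo command above
    return min(factor * num_gpu, num_cpu)
```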
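
The demo's `dataset.py` starts defining `SimpleDataset` over the numbers 1 to 20, but its body falls outside the hunk. A plausible minimal version, which is an assumption rather than the post's actual implementation, is:

```python
import torch

class SimpleDataset(torch.utils.data.Dataset):
    """Toy map-style dataset over the integers 1..n (the demo uses 1..20)."""

    def __init__(self, n: int = 20):
        self.data = list(range(1, n + 1))

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.tensor(self.data[idx], dtype=torch.float32)
```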
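
The `### Train` section shows only the `train_sampler.set_epoch(epoch)` call. A sketch of where it sits in a per-epoch loop follows; everything except that call (the argument names, the loss, the optimizer step) is assumed, and the DDP-wrapped `model` comes from the `### Model` snippet:

```python
def train(model, train_loader, train_sampler, optimizer, criterion, num_epochs, device):
    model.train()
    for epoch in range(num_epochs):
        # Without set_epoch() the DistributedSampler reuses the same shuffle
        # order every epoch; calling it re-seeds the shuffle for this epoch.
        train_sampler.set_epoch(epoch)
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()   # DDP all-reduces gradients across ranks here
            optimizer.step()
```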