From 6e1aa804d3e7f1a9bb9a63f908dcb0dcb4435fec Mon Sep 17 00:00:00 2001
From: wonchul
Date: Wed, 10 Jan 2024 14:15:51 +0900
Subject: [PATCH] added test code

---
 _posts/pytorch/2024-01-09-torchrun.md | 39 ++++++++++++++-------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/_posts/pytorch/2024-01-09-torchrun.md b/_posts/pytorch/2024-01-09-torchrun.md
index 8f84e90..21cad46 100644
--- a/_posts/pytorch/2024-01-09-torchrun.md
+++ b/_posts/pytorch/2024-01-09-torchrun.md
@@ -80,7 +80,7 @@ torch.distributed.init_process_group(
 )
 torch.distributed.barrier()
 ```
-
+------------------------------------------------------------------
 ### Dataset
 
 #### You must use `DistributedSampler`, and as in the code below
@@ -118,29 +118,12 @@ I experimented with this a bit. I found that we should use the formula:
 num_worker = 4 * num_GPU
 
 Though factors of 2 and 8 also work well, a lower factor (<2) significantly reduces overall performance. Here, the number of workers has no impact on GPU memory allocation. Also, nowadays there are many CPU cores in a machine with few GPUs (<8), so the above formula is practical.
-
 #### On Linux, you can check the number of processors as follows; `num_workers` can be set to at most the number of processors.
 ```cmd
 cat /proc/cpuinfo | grep processor
 ```
 
-### Model
-
-#### `DistributedDataParallel`
-```python
-from torch.nn.parallel import DistributedDataParallel as DDP
-model = model.cuda(args.rank)
-model = DDP(module=model, device_ids=[args.rank])
-```
-
-### Train
-
-During training, call the `sampler`'s `set_epoch()` at the start of every epoch so that `shuffle` actually takes effect.
-```python
-train_sampler.set_epoch(epoch)
-```
-
-## DEMO
+#### DEMO
 ```python
 # dataset.py
 class SimpleDataset(torch.utils.data.Dataset):
@@ -318,6 +301,24 @@ cost 4.337230443954468 ms
 
 As shown above, for the data from 1 to 20, GPU0 and GPU1 each build their own batches, and because train_dataset is shuffled, the order differs every epoch.
 The code is available on [github](https://github.com/wonchul-kim/distributed_training).
+------------------------------------------------------------------
+### Model
+
+#### `DistributedDataParallel`
+```python
+from torch.nn.parallel import DistributedDataParallel as DDP
+model = model.cuda(args.rank)
+model = DDP(module=model, device_ids=[args.rank])
+```
+------------------------------------------------------------------
+### Train
+
+During training, call the `sampler`'s `set_epoch()` at the start of every epoch so that `shuffle` actually takes effect.
+```python
+train_sampler.set_epoch(epoch)
+```
+
+
 ## references:
 - [Guidelines for assigning num_workers to DataLoader](https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/4)
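
The first hunk shows only the tail of the post's `torch.distributed.init_process_group(...)` call, so the arguments it passes are not visible. A minimal sketch of how that initialization is commonly done under `torchrun`, assuming the `nccl` backend and the environment variables (`LOCAL_RANK`, `MASTER_ADDR`, `MASTER_PORT`) that `torchrun` exports, would be:

```python
import os
import torch

def init_distributed() -> int:
    # torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT
    # for every process it launches, so env:// initialization needs no extra args.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)   # bind this process to one GPU
    torch.distributed.init_process_group(
        backend="nccl",                 # GPU training; use "gloo" for CPU-only runs
        init_method="env://",
    )
    torch.distributed.barrier()         # wait until every rank has joined
    return local_rank
```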
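
The `### Dataset` section says a `DistributedSampler` must be used, but the hunk boundary cuts off the code it refers to. A minimal sketch of the usual wiring, assuming the process group is already initialized and using hypothetical names (`build_loader`, `train_dataset`), looks like:

```python
from torch.utils.data import DataLoader, DistributedSampler

def build_loader(train_dataset, batch_size: int = 4, num_workers: int = 4):
    # Each rank iterates over a disjoint shard of the dataset; shuffling happens
    # inside the sampler and is re-seeded per epoch via sampler.set_epoch(epoch).
    sampler = DistributedSampler(train_dataset, shuffle=True)
    loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        sampler=sampler,        # do not pass shuffle=True together with a sampler
        num_workers=num_workers,
        pin_memory=True,
    )
    return loader, sampler
```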
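
The quoted guideline num_worker = 4 * num_GPU, bounded by the processor count that `cat /proc/cpuinfo | grep processor` reports, can be written as a small helper. The function below is an illustration of that rule of thumb, not code from the post:

```python
import os
import torch

def suggested_num_workers(factor: int = 4) -> int:
    """Heuristic from the quoted guideline: `factor` workers per GPU,
    capped at the number of logical processors the OS reports."""
    num_gpu = max(torch.cuda.device_count(), 1)
    num_cpu = os.cpu_count() or 1   # same count as the /proc/cpuinfo command above
    return min(factor * num_gpu, num_cpu)
```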
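
The demo's `dataset.py` starts defining `SimpleDataset` over the numbers 1 to 20, but its body falls outside the hunk. A plausible minimal version, which is an assumption rather than the post's actual implementation, is:

```python
import torch

class SimpleDataset(torch.utils.data.Dataset):
    """Toy map-style dataset over the integers 1..n (the demo uses 1..20)."""

    def __init__(self, n: int = 20):
        self.data = list(range(1, n + 1))

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.tensor(self.data[idx], dtype=torch.float32)
```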
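
The `### Train` section shows only the `train_sampler.set_epoch(epoch)` call. A sketch of where it sits in a per-epoch loop follows; everything except that call (the argument names, the loss, the optimizer step) is assumed, and the DDP-wrapped `model` comes from the `### Model` snippet:

```python
def train(model, train_loader, train_sampler, optimizer, criterion, num_epochs, device):
    model.train()
    for epoch in range(num_epochs):
        # Without set_epoch() the DistributedSampler reuses the same shuffle
        # order every epoch; calling it re-seeds the shuffle for this epoch.
        train_sampler.set_epoch(epoch)
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()   # DDP all-reduces gradients across ranks here
            optimizer.step()
```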