- If shuffling is done in the `sampler`, do not shuffle again in the `DataLoader`.
- Divide `batch_size` and `num_workers` by the number of GPUs.


- Case 1: pass the `DistributedSampler` via `sampler` together with a per-GPU `batch_size`
```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Shuffle in the sampler only; the DataLoader itself must not shuffle.
train_sampler = DistributedSampler(dataset=train_dataset, shuffle=True)
val_sampler = DistributedSampler(dataset=val_dataset, shuffle=False)

# batch_size and num_workers are divided across processes, since every rank
# builds its own DataLoader.
train_dataloader = DataLoader(dataset=train_dataset,
                              batch_size=int(args.batch_size/args.world_size),
                              shuffle=False,
                              num_workers=int(len(args.device_ids)*4/args.world_size),
                              sampler=train_sampler,
                              pin_memory=True)

val_dataloader = DataLoader(dataset=val_dataset,
                            batch_size=int(args.batch_size/args.world_size),
                            shuffle=False,
                            num_workers=int(len(args.device_ids)*4/args.world_size),
                            sampler=val_sampler,
                            pin_memory=True)
```
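
One detail not shown in the snippet (a minimal sketch, not from the post): with `shuffle=True`, `DistributedSampler` only changes its shuffle order between epochs if the loop tells it which epoch it is, so `set_epoch()` should be called every epoch; `num_epochs` below is a placeholder.

```python
# Minimal sketch: update the sampler's epoch so each epoch uses a different
# shuffle order (otherwise every epoch yields batches in the same order).
for epoch in range(num_epochs):        # num_epochs is a placeholder
    train_sampler.set_epoch(epoch)
    for batch in train_dataloader:
        ...                            # forward / backward / optimizer step
```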


In the case 1 code, `num_workers` is set to 4 times the number of GPUs; the reasoning behind this comes up in the following discussion:
> harsv (Hars Vardhan) says the following in [Guidelines for assigning num_workers to DataLoader](https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/4):
> I experimented with this a bit. I found that we should use the formula:
> num_worker = 4 * num_GPU.
> Though a factor of 2 and 8 also work good but lower factor (<2) significantly reduces overall performance. Here, worker has no impact on GPU memory allocation. Also, nowadays there are many CPU cores in a machine with few GPUs (<8), so the above formula is practical.
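
As a rough sketch of that heuristic (an illustration, not the post's code; it assumes one DDP process per GPU, so `world_size` equals the GPU count):

```python
import torch

# Heuristic from the quote: about 4 data-loading workers per GPU in total.
# With one process per GPU, each rank's DataLoader gets total/world_size = 4
# workers, which is what int(len(args.device_ids)*4/args.world_size) computes
# in the case 1 code above.
num_gpus = max(torch.cuda.device_count(), 1)
world_size = num_gpus                       # assumption: one process per GPU
workers_per_rank = (4 * num_gpus) // world_size
```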


- Case 2: use the `DistributedSampler` through a `batch_sampler`
To do this, wrap the sampler in a `BatchSampler` and pass it to the `DataLoader` as follows:
```python
import torch

# The DistributedSampler is wrapped in a BatchSampler, so the DataLoader
# receives it via batch_sampler instead of sampler/batch_size.
train_sampler = torch.utils.data.distributed.DistributedSampler(dataset=train_dataset, shuffle=False)
train_batch_sampler = torch.utils.data.BatchSampler(train_sampler, int(batch_size/args.world_size), drop_last=True)
train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_sampler=train_batch_sampler,
                                               num_workers=num_workers)
```
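
Note that this follows from the `DataLoader` API rather than anything specific to DDP: when `batch_sampler` is given, the `batch_size`, `shuffle`, `sampler`, and `drop_last` arguments must be left at their defaults, since the batch sampler already decides all of them, and each element the loader yields is a complete per-rank batch.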



#### Checking the number of processors on Linux

`num_workers` can be set at most to the number of available processors; the count can be checked as follows.
```cmd
cat /proc/cpuinfo | grep processor
```
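
The same count is available from Python, which makes it easy to cap the worker heuristic (a small sketch; `os.cpu_count()` reports logical processors, like the command above):

```python
import os
import torch

# num_workers cannot usefully exceed the number of logical processors,
# so cap the 4-workers-per-GPU heuristic at os.cpu_count().
num_gpus = max(torch.cuda.device_count(), 1)
num_workers = min(4 * num_gpus, os.cpu_count() or 1)
```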
The corresponding part of the fuller example, inside `if __name__ == '__main__':`, then looks roughly as follows (the trailing `val_dataloader` arguments are not visible in the excerpt and are reconstructed to match the case 1 block):

```python
if __name__ == '__main__':
    ...

    train_dataset = SimpleDataset()
    val_dataset = SimpleDataset()
    val_sampler = torch.utils.data.distributed.DistributedSampler(dataset=val_dataset, shuffle=False)

    # case 1)
    # train_sampler = torch.utils.data.distributed.DistributedSampler(dataset=train_dataset, shuffle=True)
    # train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset,
    #                                                batch_size=int(batch_size/args.world_size),
    #                                                shuffle=False,
    #                                                num_workers=int(num_workers/args.world_size),
    #                                                sampler=train_sampler)

    # case 2)
    train_sampler = torch.utils.data.distributed.DistributedSampler(dataset=train_dataset, shuffle=False)
    train_batch_sampler = torch.utils.data.BatchSampler(train_sampler, int(batch_size/args.world_size), drop_last=True)
    train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset,
                                                   batch_sampler=train_batch_sampler,
                                                   num_workers=num_workers)

    val_dataloader = torch.utils.data.DataLoader(dataset=val_dataset,
                                                 batch_size=int(batch_size/args.world_size),
                                                 shuffle=False,
                                                 num_workers=int(num_workers/args.world_size),
                                                 sampler=val_sampler)
```
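
For completeness, a script structured like this is typically launched with `torchrun`; the script name and GPU count below are placeholders.

```cmd
torchrun --nproc_per_node=4 train.py
```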
