Hi, I want to train a small version of the RWKV-V5-169m model from scratch.
I implemented it with Hugging Face Transformers:
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
config = AutoConfig.from_pretrained("RWKV/rwkv-4-169m-pile")
tiny_rwkv_configs = {
    "num_hidden_layers": 4,
    "hidden_size": 256,
    "intermediate_size": 1024,
    "attention_hidden_size": 256,
    "vocab_size": 20480,
}
# Apply the tiny settings to the config,
# e.g., config.num_hidden_layers = tiny_rwkv_configs["num_hidden_layers"]
for key, value in tiny_rwkv_configs.items():
    setattr(config, key, value)
model = AutoModelForCausalLM.from_config(config)
"""
initialize dataloader, optimizer, etc
"""
for sample in dataloader:
    outputs = model(sample)
    loss = outputs.loss
However, when I call backward on the loss, I get the following error:
You are using a CUDA device ('NVIDIA A100-PCIE-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/transformers/optimization.py:429: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Sanity Checking: | | 0/? [00:00<?, ?it/s]/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
Sanity Checking DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
/nvme1/zecheng/modelzipper/projects/state-space-model/custom_dataset/AR_ywj.py:116: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
attention_mask = torch.tensor(attention_mask, dtype=torch.long)
Epoch 0: 0%| | 3/1398 [00:00<02:55, 7.95it/s, v_num=tzc, train_lm_loss=nan.0, train_ppl=[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7faa88159617 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7faa8811498d in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7faa88215128 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7faa8914b250 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7faa8914f078 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7faa89165910 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7faa89165c18 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xc819d (0x7faacd94619d in /home/amax/anaconda3/envs/zecheng/bin/../lib/libstdc++.so.6)
frame #8: <unknown function> + 0x8609 (0x7fab09939609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7fab0985e353 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7faa88159617 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7faa8811498d in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7faa88215128 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7faa8914b250 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7faa8914f078 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7faa89165910 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7faa89165c18 in /home/amax/anaconda3/envs/zecheng/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xc819d (0x7faacd94619d in /home/amax/anaconda3/envs/zecheng/bin/../lib/libstdc++.so.6)
frame #8: <unknown function> + 0x8609 (0x7fab09939609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7fab0985e353 in /lib/x86_64-linux-gnu/libc.so.6)
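The trace above comes from the NCCL watchdog thread, so it does not show which Python line triggered the illegal memory access. CUDA kernels run asynchronously by default, and the error is often reported later than the call that caused it. A minimal sketch for narrowing this down, assuming the variable is set at the very top of the training script before torch initialises CUDA (the helper name `sync_launch_enabled` is hypothetical, only there to confirm the setting):

```python
import os

# Assumption: this runs before `import torch`, so CUDA sees the setting
# when it is initialised. CUDA_LAUNCH_BLOCKING=1 makes kernel launches
# synchronous, so the Python traceback points at the failing call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def sync_launch_enabled() -> bool:
    """Hypothetical helper: report whether synchronous launches are requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print(sync_launch_enabled())
```

This slows training down, so it is only meant for a short debugging run on a single GPU.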
It is worth noting that I am training the model from scratch, with only a 4-layer RWKV and custom settings, and the loss becomes nan.0.
@www Has anyone else encountered this issue?
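One thing that may be worth ruling out with the custom settings: the config sets vocab_size to 20480, while the tokenizer is still the one loaded from RWKV/rwkv-4-169m-pile. If that tokenizer can emit ids of 20480 or higher, the embedding lookup would index past the end of its table, which on GPU can show up as an illegal memory access. Whether that is the cause here is only an assumption. A minimal sketch of the check in plain Python, with a hypothetical helper name and made-up token ids:

```python
def find_out_of_range_ids(input_ids, vocab_size):
    """Return the sorted token ids that a table of `vocab_size` rows cannot index."""
    return sorted(
        {tok for row in input_ids for tok in row if tok < 0 or tok >= vocab_size}
    )


# Made-up batch of token ids, for illustration only.
batch = [[12, 845, 20479], [3, 31000, 50276]]
bad = find_out_of_range_ids(batch, vocab_size=20480)
print(bad)  # → [31000, 50276]
```

In the real training loop, the same check could be run on each batch's input ids (converted with `.tolist()`) before the forward pass, and `len(tokenizer)` could be compared against `config.vocab_size`.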