train error #22

ZTYyy · 2023-12-23T04:21:25Z

I ran train.py and got error below

Traceback (most recent call last): File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 86, in <module> loss = model(next(train_loader)) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 326, in forward logits = self.net(x_inp, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 272, in forward x = self.transformer(x) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 245, in forward x = block(x) + x File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 206, in forward attn = self.attn(q, k, v) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) TypeError: forward() takes 2 positional arguments but 4 were given

the output is

`No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Using StableAdamWUnfused-v1

training: 0%| | 0/100000 [00:00<?, ?it/s]
training: 0%| | 0/100000 [00:00<?, ?it/s]
`

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

The text was updated successfully, but these errors were encountered:

ZTYyy · 2023-12-23T07:20:50Z

This is my script:

`import gzip
import random

import numpy as np
import torch
import torch.optim as optim
import tqdm
from torch.utils.data import DataLoader, Dataset

from long_net.model import LongNetTransformer, AutoregressiveWrapper
from zeta.optim import StableAdamWUnfused

constants

NUM_BATCHES = int(1e5)
BATCH_SIZE = 4
GRADIENT_ACCUMULATE_EVERY = 4
LEARNING_RATE = 2e-4
VALIDATE_EVERY = 100
GENERATE_EVERY = 500
GENERATE_LENGTH = 512
SEQ_LEN = 8196

helpers

def cycle(loader):
while True:
for data in loader:
yield data

def decode_token(token):
return str(chr(max(32, token)))

def decode_tokens(tokens):
return "".join(list(map(decode_token, tokens)))

instantiate GPT-like decoder model

model = LongNetTransformer(num_tokens=256, dim=512, depth=8)

model = AutoregressiveWrapper(model, max_seq_len=SEQ_LEN)

model.cuda()

prepare enwik8 data

with open("./MGYG000002546-uvig-560334.txt") as file:
X = np.fromstring(file.read(int(95e6)), dtype=np.uint8)
trX, vaX = np.split(X, [int(90e6)])
data_train, data_val = torch.from_numpy(trX), torch.from_numpy(vaX)

class TextSamplerDataset(Dataset):
def init(self, data, seq_len):
super().init()
self.data = data
self.seq_len = seq_len

def __getitem__(self, index):
    rand_start = torch.randint(0, self.data.size(0) - self.seq_len, (1,))
    full_seq = self.data[rand_start : rand_start + self.seq_len + 1].long()
    return full_seq  # .cuda()

def __len__(self):
    return self.data.size(0) // self.seq_len

train_dataset = TextSamplerDataset(data_train, SEQ_LEN)
val_dataset = TextSamplerDataset(data_val, SEQ_LEN)
train_loader = cycle(DataLoader(train_dataset, batch_size=BATCH_SIZE))
val_loader = cycle(DataLoader(val_dataset, batch_size=BATCH_SIZE))

optimizer

optim = StableAdamWUnfused(model.parameters(), lr=LEARNING_RATE)

training

for i in tqdm.tqdm(range(NUM_BATCHES), mininterval=10.0, desc="training"):
model.train()

for __ in range(GRADIENT_ACCUMULATE_EVERY):
    loss = model(next(train_loader))
    loss.backward()

print(f"training loss: {loss.item()}")
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
optim.step()
optim.zero_grad()

if i % VALIDATE_EVERY == 0:
    model.eval()
    with torch.no_grad():
        loss = model(next(val_loader))
        print(f"validation loss: {loss.item()}")

if i % GENERATE_EVERY == 0:
    model.eval()
    inp = random.choice(val_dataset)[:-1]
    prime = decode_tokens(inp)
    print("%s \n\n %s", (prime, "*" * 100))

    sample = model.generate(inp[None, ...], GENERATE_LENGTH)
    output_str = decode_tokens(sample[0])
    print(output_str)`

kyegomez · 2023-12-23T15:31:18Z

The error is in model.cuda you can take that off or say model.to("cpu")

ZTYyy · 2023-12-23T15:56:41Z

Thank you, but I try both and got the same error.

kyegomez · 2023-12-23T20:25:34Z

@ZTYyy can you please show me the stack trace

ZTYyy · 2023-12-24T04:10:15Z

Sorry, I don't know how to give you more trace.

2023-12-24 12:16:41,972 - root - ERROR - forward() takes 2 positional arguments but 4 were given
Traceback (most recent call last):
File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 91, in
loss = model(next(train_loader))
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 325, in forward
logits = self.net(x_inp, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 271, in forward
x = self.transformer(x)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 244, in forward
x = block(x) + x
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 205, in forward
attn = self.attn(q, k, v)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given

ZTYyy · 2023-12-25T03:15:32Z

I want to try using my genomic data to train this model, because it is the only model I have found that allows for complete input of a genome (I am using a bacterial genome with a length of around 5 million base pairs).

ZTYyy · 2023-12-25T06:20:50Z

I think the problem might be in the line "attn = self.attn(q, k, v)" in model.py.
"self.attn" is a DilatedAttention class, and its forward() can only accept one input value "def forward(self, x: torch.Tensor) -> torch.Tensor:".
But three are given here.

kyegomez · 2024-01-07T03:42:03Z

@ZTYyy Please attempt to run your script once more, I believe the error has been eliminated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

train error #22

train error #22

ZTYyy commented Dec 23, 2023 •

edited by polar-sh bot

Loading

ZTYyy commented Dec 23, 2023

kyegomez commented Dec 23, 2023

ZTYyy commented Dec 23, 2023

kyegomez commented Dec 23, 2023

ZTYyy commented Dec 24, 2023

ZTYyy commented Dec 25, 2023

ZTYyy commented Dec 25, 2023

kyegomez commented Jan 7, 2024

train error #22

train error #22

Comments

ZTYyy commented Dec 23, 2023 • edited by polar-sh bot Loading

Upvote & Fund

ZTYyy commented Dec 23, 2023

constants

helpers

instantiate GPT-like decoder model

model.cuda()

prepare enwik8 data

optimizer

training

kyegomez commented Dec 23, 2023

ZTYyy commented Dec 23, 2023

kyegomez commented Dec 23, 2023

ZTYyy commented Dec 24, 2023

ZTYyy commented Dec 25, 2023

ZTYyy commented Dec 25, 2023

kyegomez commented Jan 7, 2024

ZTYyy commented Dec 23, 2023 •

edited by polar-sh bot

Loading