Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

train error #22

Open
ZTYyy opened this issue Dec 23, 2023 · 8 comments
Open

train error #22

ZTYyy opened this issue Dec 23, 2023 · 8 comments

Comments

@ZTYyy
Copy link

ZTYyy commented Dec 23, 2023

I ran train.py and got error below

Traceback (most recent call last): File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 86, in <module> loss = model(next(train_loader)) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 326, in forward logits = self.net(x_inp, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 272, in forward x = self.transformer(x) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 245, in forward x = block(x) + x File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 206, in forward attn = self.attn(q, k, v) File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) TypeError: forward() takes 2 positional arguments but 4 were given

the output is

`No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Using StableAdamWUnfused-v1

training: 0%| | 0/100000 [00:00<?, ?it/s]
training: 0%| | 0/100000 [00:00<?, ?it/s]
`

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar
@ZTYyy
Copy link
Author

ZTYyy commented Dec 23, 2023

This is my script:

`import gzip
import random

import numpy as np
import torch
import torch.optim as optim
import tqdm
from torch.utils.data import DataLoader, Dataset

from long_net.model import LongNetTransformer, AutoregressiveWrapper
from zeta.optim import StableAdamWUnfused

constants

NUM_BATCHES = int(1e5)
BATCH_SIZE = 4
GRADIENT_ACCUMULATE_EVERY = 4
LEARNING_RATE = 2e-4
VALIDATE_EVERY = 100
GENERATE_EVERY = 500
GENERATE_LENGTH = 512
SEQ_LEN = 8196

helpers

def cycle(loader):
while True:
for data in loader:
yield data

def decode_token(token):
return str(chr(max(32, token)))

def decode_tokens(tokens):
return "".join(list(map(decode_token, tokens)))

instantiate GPT-like decoder model

model = LongNetTransformer(num_tokens=256, dim=512, depth=8)

model = AutoregressiveWrapper(model, max_seq_len=SEQ_LEN)

model.cuda()

prepare enwik8 data

with open("./MGYG000002546-uvig-560334.txt") as file:
X = np.fromstring(file.read(int(95e6)), dtype=np.uint8)
trX, vaX = np.split(X, [int(90e6)])
data_train, data_val = torch.from_numpy(trX), torch.from_numpy(vaX)

class TextSamplerDataset(Dataset):
def init(self, data, seq_len):
super().init()
self.data = data
self.seq_len = seq_len

def __getitem__(self, index):
    rand_start = torch.randint(0, self.data.size(0) - self.seq_len, (1,))
    full_seq = self.data[rand_start : rand_start + self.seq_len + 1].long()
    return full_seq  # .cuda()

def __len__(self):
    return self.data.size(0) // self.seq_len

train_dataset = TextSamplerDataset(data_train, SEQ_LEN)
val_dataset = TextSamplerDataset(data_val, SEQ_LEN)
train_loader = cycle(DataLoader(train_dataset, batch_size=BATCH_SIZE))
val_loader = cycle(DataLoader(val_dataset, batch_size=BATCH_SIZE))

optimizer

optim = StableAdamWUnfused(model.parameters(), lr=LEARNING_RATE)

training

for i in tqdm.tqdm(range(NUM_BATCHES), mininterval=10.0, desc="training"):
model.train()

for __ in range(GRADIENT_ACCUMULATE_EVERY):
    loss = model(next(train_loader))
    loss.backward()

print(f"training loss: {loss.item()}")
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
optim.step()
optim.zero_grad()

if i % VALIDATE_EVERY == 0:
    model.eval()
    with torch.no_grad():
        loss = model(next(val_loader))
        print(f"validation loss: {loss.item()}")

if i % GENERATE_EVERY == 0:
    model.eval()
    inp = random.choice(val_dataset)[:-1]
    prime = decode_tokens(inp)
    print("%s \n\n %s", (prime, "*" * 100))

    sample = model.generate(inp[None, ...], GENERATE_LENGTH)
    output_str = decode_tokens(sample[0])
    print(output_str)`

@kyegomez
Copy link
Owner

The error is in model.cuda you can take that off or say model.to("cpu")

@ZTYyy
Copy link
Author

ZTYyy commented Dec 23, 2023

Thank you, but I try both and got the same error.

@kyegomez
Copy link
Owner

@ZTYyy can you please show me the stack trace

@ZTYyy
Copy link
Author

ZTYyy commented Dec 24, 2023

Sorry, I don't know how to give you more trace.

2023-12-24 12:16:41,972 - root - ERROR - forward() takes 2 positional arguments but 4 were given
Traceback (most recent call last):
File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 91, in
loss = model(next(train_loader))
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 325, in forward
logits = self.net(x_inp, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 271, in forward
x = self.transformer(x)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 244, in forward
x = block(x) + x
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 205, in forward
attn = self.attn(q, k, v)
File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given

@ZTYyy
Copy link
Author

ZTYyy commented Dec 25, 2023

I want to try using my genomic data to train this model, because it is the only model I have found that allows for complete input of a genome (I am using a bacterial genome with a length of around 5 million base pairs).

@ZTYyy
Copy link
Author

ZTYyy commented Dec 25, 2023

I think the problem might be in the line "attn = self.attn(q, k, v)" in model.py.
"self.attn" is a DilatedAttention class, and its forward() can only accept one input value "def forward(self, x: torch.Tensor) -> torch.Tensor:".
But three are given here.

@kyegomez
Copy link
Owner

kyegomez commented Jan 7, 2024

@ZTYyy Please attempt to run your script once more, I believe the error has been eliminated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants