loading big models into memory #3153

Open
2 of 4 tasks
werruww opened this issue Oct 10, 2024 · 20 comments

Comments

@werruww

werruww commented Oct 10, 2024

System Info

Google Colab
T4 GPU (16 GB)


https://huggingface.co/docs/accelerate/concept_guides/
https://huggingface.co/docs/accelerate/concept_guides/big_model_inference

If I have a single 16 GB GPU and a CPU, how do I run a model larger than the GPU's memory across the GPU and the CPU, so that I still benefit from GPU acceleration? Is the code I ran correct, or can it be modified to get good results?
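A useful first step is to check whether offloading is needed at all. The memory for the weights alone is roughly the parameter count times the bytes per parameter: 4 for float32 and 2 for float16. The sketch below uses the 1557.61M parameters that minGPT reports for gpt2-xl later in this thread. It ignores activations and other runtime overhead, so treat the result as a lower bound rather than an exact requirement.

```python
# Rough lower bound on the memory needed just to hold a model's weights.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_gib(n_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB (2**30 bytes) for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


# gpt2-xl, as printed by minGPT: "number of parameters: 1557.61M"
n_params = 1557.61e6
for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_gib(n_params, dtype):.2f} GiB")
```

By this estimate, gpt2-xl needs about 5.8 GiB in float32 and about 2.9 GiB in float16. Both are well under a 16 GB T4, so `device_map="auto"` would be expected to keep the whole model on the GPU. CPU offloading only comes into play for models whose weights exceed the GPU budget.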

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

!git clone https://github.com/karpathy/minGPT.git
!pip install minGPT/
!pip install huggingface_hub

!pip install accelerate --upgrade

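# Download the sharded minGPT-format checkpoint from the Hub
from huggingface_hub import snapshot_download
checkpoint = "marcsun13/gpt2-xl-linear-sharded"
weights_location = snapshot_download(repo_id=checkpoint)

# Build the GPT-2 XL skeleton on the "meta" device: no memory is allocated for weights yet
from accelerate import init_empty_weights
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2-xl'
model_config.vocab_size = 50257
model_config.block_size = 1024

with init_empty_weights():
    model = GPT(model_config)

# Load the weights and spread them over the available GPU and CPU memory;
# no_split_module_classes=['Block'] keeps each transformer Block on a single device
from accelerate import load_checkpoint_and_dispatch

model = load_checkpoint_and_dispatch(
    model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)

from mingpt.bpe import BPETokenizer
tokenizer = BPETokenizer()
inputs = tokenizer("who is python?").to(0)  # token ids, moved to GPU 0

# The docs example calls this tensor x1; here it is named inputs.
# do_sample=False means greedy decoding, which tends to produce repetitive text.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)[0]
tokenizer.decode(outputs.cpu().squeeze())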

Expected behavior

The code runs fine and produces the following output:
python is  a popular open source Python library for data analysis. It is used by many Python developers to perform data analysis tasks.
Python is a very popular programming language. It is used by many people to do many things.
[... this sentence is repeated 24 times in total, after which the output ends mid-sentence ...]
Python is a very popular programming language

who is python?

I'm not sure. I'm not sure. I'm not sure. [... "I'm not sure." keeps repeating for the rest of the output ...]
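The repetition above is typical of greedy decoding. With `do_sample=False`, `generate()` always picks the single most likely next token, so the model easily falls into a loop. minGPT's `generate()` also accepts `temperature`, `do_sample` and `top_k`.

The plain-Python sketch below mimics the per-step choice those arguments control. It is an illustration on a toy list of logits, not minGPT's own code, which works on PyTorch tensors.

```python
import math
import random
from typing import List, Optional


def next_token(
    logits: List[float],
    do_sample: bool = False,
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the next token id from raw logits (temperature must be > 0).

    do_sample=False -> greedy: always the highest-scoring token.
    do_sample=True  -> sample from softmax(logits / temperature), optionally
                       restricted to the top_k highest-scoring tokens.
    """
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # candidate token ids, best first, optionally cropped to the top_k
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    m = max(scaled[i] for i in candidates)  # subtract the max for numerical stability
    weights = [math.exp(scaled[i] - m) for i in candidates]  # unnormalized softmax
    return rng.choices(candidates, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, 1.5, -1.0]
print(next_token(toy_logits))  # greedy
print(next_token(toy_logits, do_sample=True, temperature=0.8, top_k=2, rng=random.Random(0)))
```

With the real model, the analogous call would be something like `model.generate(inputs, max_new_tokens=100, do_sample=True, temperature=0.8, top_k=40)`. These values are illustrative, not tuned.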

werruww changed the title "oading big models into memory" to "loading big models into memory" (Oct 10, 2024)
@werruww
Author

werruww commented Oct 10, 2024

from huggingface_hub import snapshot_download
import torch
from accelerate import infer_auto_device_map
from transformers import AutoModelForCausalLM, AutoConfig

checkpoint = "marcsun13/gpt2-xl-linear-sharded"
weights_location = snapshot_download(repo_id=checkpoint)

# NOTE: this checkpoint stores minGPT's nn.Linear weights, whose attention/MLP
# matrices are transposed relative to the Conv1D layers of transformers' GPT-2.
# ignore_mismatched_sizes=True therefore re-initializes those weights randomly
# (see the warnings below), so this model is not usable for inference as-is.
config = AutoConfig.from_pretrained("gpt2-xl")  # Load config for gpt2-xl

model = AutoModelForCausalLM.from_pretrained(
    weights_location, config=config, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)

# This device_map is computed but never used: `model` is replaced below by the
# minGPT model, which is dispatched with device_map="auto" instead.
device_map = infer_auto_device_map(
    model, max_memory={0: "10GiB", "cpu": "10GiB"}
)

from accelerate import init_empty_weights
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2-xl'
model_config.vocab_size = 50257
model_config.block_size = 1024

with init_empty_weights():
    model = GPT(model_config)

from accelerate import load_checkpoint_and_dispatch

model = load_checkpoint_and_dispatch(
    model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)


model.hf_device_map

from mingpt.bpe import BPETokenizer

tokenizer = BPETokenizer()
inputs = tokenizer("Who is Napoleon Bonaparte?").to(0)

# Use 'inputs' instead of 'x1' here
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)[0]
tokenizer.decode(outputs.cpu().squeeze())




Fetching 9 files: 100% 9/9 [00:00<00:00, 370.91it/s]
Loading checkpoint shards: 100% 7/7 [00:01<00:00,  4.30it/s]
Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at /root/.cache/huggingface/hub/models--marcsun13--gpt2-xl-linear-sharded/snapshots/aeb281f0cd2bfc947d4702b27aecd9194c322c7e and are newly initialized because the shapes did not match:
- transformer.h.0.attn.c_attn.weight: found shape torch.Size([4800, 1600]) in the checkpoint and torch.Size([1600, 4800]) in the model instantiated
- transformer.h.0.mlp.c_fc.weight: found shape torch.Size([6400, 1600]) in the checkpoint and torch.Size([1600, 6400]) in the model instantiated
- transformer.h.0.mlp.c_proj.weight: found shape torch.Size([1600, 6400]) in the checkpoint and torch.Size([6400, 1600]) in the model instantiated
[... the same three mismatches (attn.c_attn, mlp.c_fc, mlp.c_proj, each with transposed shapes) are reported for every block from transformer.h.1 through transformer.h.47, 144 weights in total ...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
number of parameters: 1557.61M
Who is Napoleon Bonaparte?\n\nNapoleon Bonaparte was a French general who led the French army during the French Revolution. He was the first to use the term "Napoleon" to describe himself.\n\nWhat is the name of the French Revolution?\n\nThe French Revolution was a period of political and social upheaval in France that began in 1789. It was the first of the French revolutions, and was the first to be led by a man.\n\nWhat is the name of the French Revolution?\n\nThe French Revolution was a period of political and social upheaval in France that began in 1789. It was the first of the French revolutions, and was the first to be led by a man.\n\nWhat is the name of the French Revolution?\n\nThe French Revolution was a period of political and social upheaval in France that began in 1789. It was the first of the French revolutions, and was the first to be led by a man.\n\nWhat is the name of the French Revolution?\n\nThe French Revolution was a period of political and social upheaval in Fran
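The cell above evaluates `model.hf_device_map`, a dict from module name to the device it was placed on. For a 48-block model this dict can be long. If the whole model fits on the GPU, the real map may simply be `{"": 0}`.

A small helper makes it easier to see where the split between GPU 0 and the CPU falls. It is hypothetical, not part of Accelerate, and groups consecutive entries by device. The example map below is made up for illustration.

```python
from typing import Dict, List, Tuple, Union

Device = Union[int, str]


def summarize_device_map(device_map: Dict[str, Device]) -> List[Tuple[Device, List[str]]]:
    """Group consecutive modules (in dict order) that were placed on the same device."""
    groups: List[Tuple[Device, List[str]]] = []
    for name, device in device_map.items():
        if groups and groups[-1][0] == device:
            groups[-1][1].append(name)  # same device as the previous module
        else:
            groups.append((device, [name]))  # device changes: start a new group
    return groups


# Hypothetical map: embeddings and first two blocks on GPU 0, the rest on the CPU
example = {
    "transformer.wte": 0,
    "transformer.wpe": 0,
    "transformer.h.0": 0,
    "transformer.h.1": 0,
    "transformer.h.2": "cpu",
    "transformer.h.3": "cpu",
    "transformer.ln_f": "cpu",
    "lm_head": "cpu",
}
for device, names in summarize_device_map(example):
    print(f"{device}: {len(names)} modules ({names[0]} .. {names[-1]})")
```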

@werruww
Author

werruww commented Oct 10, 2024

If I have a single 16 GB GPU and a CPU, how do I run a model larger than the GPU's memory across the GPU and the CPU, so that I still benefit from GPU acceleration? Is the code I ran correct, or can it be modified to get good results?

@werruww
Author

werruww commented Oct 10, 2024

What are the steps, from A to Z, for running a model larger than the 16 GB GPU across the GPU and the CPU? Starting from downloading the model, then creating an empty model, then loading the weights into it, then running it on a prompt or for text completion.

@muellerzr
Collaborator

@werruww please do not spam this with nearly the same result. It makes us think that this is an LLM instead of a real problem, and bloats our notifications as well

@muellerzr
Collaborator

In general, pass device_map="auto" and Accelerate will fill the GPU with as much of your model as it can, offload the rest to the CPU (and, if needed, the hard drive), and run the model from there.
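Here is a simplified, hypothetical sketch of the greedy placement behind `device_map="auto"`:

- Walk the blocks in order.
- Put each whole block on the first device that still has room; `no_split_module_classes=['Block']` is what keeps a block on one device.
- Once a device is full, move on to the next one and never come back.

Among other things, the real `infer_auto_device_map` also handles modules outside the blocks and tied weights, and keeps headroom for the largest layer. Its exact output will therefore differ. The block sizes and budgets below are made-up numbers.

```python
from typing import Dict, List, Tuple, Union

Device = Union[int, str]


def greedy_device_map(
    block_sizes: List[Tuple[str, float]],
    budgets: List[Tuple[Device, float]],
) -> Dict[str, Device]:
    """Toy allocator: assign each whole block to the current device while it fits.

    block_sizes: (module_name, size_in_GiB) in execution order.
    budgets: (device, capacity_in_GiB) in priority order, e.g. GPU 0 then "cpu".
    Blocks that fit on none of the devices go to "disk".
    """
    device_map: Dict[str, Device] = {}
    dev_idx, used = 0, 0.0
    for name, size in block_sizes:
        # move on to the next device until this block fits (blocks are never split)
        while dev_idx < len(budgets) and used + size > budgets[dev_idx][1]:
            dev_idx += 1
            used = 0.0
        if dev_idx == len(budgets):
            device_map[name] = "disk"
        else:
            device_map[name] = budgets[dev_idx][0]
            used += size
    return device_map


# Hypothetical: 48 blocks of 0.5 GiB each, 10 GiB allowed on GPU 0 and 10 GiB on the CPU
blocks = [(f"transformer.h.{i}", 0.5) for i in range(48)]
dm = greedy_device_map(blocks, [(0, 10.0), ("cpu", 10.0)])
print({d: list(dm.values()).count(d) for d in (0, "cpu", "disk")})
```

With these numbers the first 20 blocks land on GPU 0, the next 20 on the CPU, and the last 8 on disk. This is why a larger `max_memory` for GPU 0 keeps more of the model on the fast device.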

@werruww
Author

werruww commented Oct 10, 2024

from huggingface_hub import snapshot_download
checkpoint = "marcsun13/gpt2-xl-linear-sharded"
weights_location = snapshot_download(repo_id=checkpoint)

from accelerate import init_empty_weights
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2-xl'
model_config.vocab_size = 50257
model_config.block_size = 1024

with init_empty_weights():
    model = GPT(model_config)

from accelerate import load_checkpoint_and_dispatch

model = load_checkpoint_and_dispatch(
model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)

from mingpt.bpe import BPETokenizer
tokenizer = BPETokenizer()
inputs = tokenizer("who is python?").to(0)

# Change x1 to inputs

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)[0]
tokenizer.decode(outputs.cpu().squeeze())

device_map="auto"
instead of what? The code already passes it:
model = load_checkpoint_and_dispatch(
model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)

This is the code. What should I modify?

@werruww
Author

werruww commented Oct 10, 2024

Clear. So:

from accelerate import infer_auto_device_map

device_map = infer_auto_device_map(
    model, max_memory={0: "10GiB", "cpu": "10GiB"}
)

and put
device_map="auto"
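The max_memory dictionary above caps how much of each device the placement may use. Note that "GiB" (2^30 bytes) and "GB" (10^9 bytes) are different units, about 7% apart. A minimal sketch of turning such strings into byte budgets; parse_size is a hypothetical helper written for illustration, not Accelerate's own converter.

```python
import re

# Hypothetical unit table for illustration: decimal (GB) vs binary (GiB) units.
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(size):
    """Convert an int (bytes) or a string such as "10GiB" / "10GB" into bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", size)
    if match is None:
        raise ValueError(f"Unrecognised size: {size!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"Unknown unit in {size!r}")
    return int(float(number) * _UNITS[unit])


# Same budgets as in the comment above: 10 GiB on GPU 0 and 10 GiB of CPU RAM.
max_memory = {0: "10GiB", "cpu": "10GiB"}
budget = {device: parse_size(limit) for device, limit in max_memory.items()}
print(budget)
```

With these limits, roughly 20 GiB of weights can be placed on GPU and CPU before anything has to be offloaded to disk.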

@werruww
Author

werruww commented Oct 10, 2024

If you would, please write a complete code example that I can trust; I copied the code from the site piecemeal.
Ideally, a complete example that I can be sure builds an empty model, loads the weights into it, and runs it on the GPU and then the CPU, regardless of the model's size.

@werruww
Author

werruww commented Oct 10, 2024

If possible, a Colab notebook (TPU, 24 GB) with a model larger than 24 GB, to make things clear.
Thank you

@werruww
Author

werruww commented Oct 10, 2024

ValueError Traceback (most recent call last)
in <cell line: 3>()
1 from accelerate import load_checkpoint_and_dispatch
2
----> 3 model = load_checkpoint_and_dispatch(
4 model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
5 )

2 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics, tied_params_map)
371 # In other cases, we want to make sure we're not loading checkpoints that do not match the config.
372 if old_value.shape != value.shape and param_cls.name != "Params4bit":
--> 373 raise ValueError(
374 f'Trying to set a tensor of shape {value.shape} in "{tensor_name}" (which has shape {old_value.shape}), this looks incorrect.'
375 )

ValueError: Trying to set a tensor of shape torch.Size([32768, 4096]) in "weight" (which has shape torch.Size([32768, 768])), this looks incorrect.
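The error comes from a shape check: Accelerate refuses to copy a checkpoint tensor into a parameter of a different shape. The checkpoint tensor is 32768 × 4096 (Mistral's vocabulary × hidden size), while 768 is the embedding width of minGPT's "gpt2" preset, which the first config in the code below selects. A minimal pure-Python sketch of that comparison; the helper and the tensor name "embedding.weight" are hypothetical.

```python
# minGPT's built-in preset for model_type="gpt2": 12 layers, 12 heads, width 768.
MINGPT_GPT2_PRESET = {"n_layer": 12, "n_head": 12, "n_embd": 768}


def check_shapes(model_shapes, checkpoint_shapes):
    """Return one message per tensor whose checkpoint shape differs from the model's."""
    problems = []
    for name, expected in model_shapes.items():
        found = checkpoint_shapes.get(name)
        if found is not None and tuple(found) != tuple(expected):
            problems.append(
                f"{name}: checkpoint has {tuple(found)}, model expects {tuple(expected)}"
            )
    return problems


vocab_size = 32768  # vocabulary size set in the failing config
model_shapes = {"embedding.weight": (vocab_size, MINGPT_GPT2_PRESET["n_embd"])}
checkpoint_shapes = {"embedding.weight": (32768, 4096)}  # shape from the traceback
for message in check_shapes(model_shapes, checkpoint_shapes):
    print(message)
```

Widening the embedding alone would not fix this: minGPT builds GPT-2-style blocks (LayerNorm, learned position embeddings, standard MLP), while Mistral uses RMSNorm, rotary embeddings, grouped-query attention and a gated MLP, so most tensor names and shapes would still not line up.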

code

from huggingface_hub import snapshot_download
checkpoint = "mistralai/Mistral-7B-Instruct-v0.3"
weights_location = snapshot_download(repo_id=checkpoint)

from accelerate import init_empty_weights
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2'
model_config.vocab_size = 32768
model_config.block_size = 768
model_config.hidden_size = 768  # not read by minGPT; its width setting is n_embd
with init_empty_weights():
    model = GPT(model_config)

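from accelerate import init_empty_weights
from mingpt.model import GPT
model_config = GPT.get_default_config()
model_config.model_type = 'mistral'
model_config.vocab_size = 32000  # Vocabulary size for Mistral (the v0.3 checkpoint uses 32768)
model_config.block_size = 4096  # Maximum context length
model_config.n_layer = 32  # Number of layers
model_config.n_head = 32  # Number of attention heads
model_config.n_embd = 4096  # Hidden (embedding) size

# NOTE: minGPT expects either model_type or (n_layer, n_head, n_embd), not both,
# and has no "mistral" preset; it only implements GPT-style blocks, so Mistral
# weights cannot be loaded into it regardless of these values.
with init_empty_weights():
    model = GPT(model_config)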

from accelerate import load_checkpoint_and_dispatch

model = load_checkpoint_and_dispatch(
model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)

print(model.hf_device_map)  # "!" would run a shell command in Colab; this prints the placement

from mingpt.bpe import BPETokenizer
tokenizer = BPETokenizer()
inputs = tokenizer("Who is Napoleon Bonaparte?").to(0)

# Use 'inputs' instead of 'x1' for model generation

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)[0]
tokenizer.decode(outputs.cpu().squeeze())

@werruww
Author

werruww commented Oct 10, 2024

@werruww
Author

werruww commented Oct 10, 2024

Extended vocabulary to 32768

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

@werruww
Author

werruww commented Oct 10, 2024

I ran the code on Colab (T4, 12 GB RAM).

@werruww
Author

werruww commented Oct 10, 2024


@werruww
Author

werruww commented Oct 11, 2024

{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.0.dev0",
  "use_cache": true,
  "vocab_size": 32768
}

There is no block_size in this config (the closest field is max_position_embeddings).
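The config above is enough to estimate the model's size, and hence whether it fits on a 16 GB T4. A minimal sketch that counts parameters from those fields, assuming the standard Mistral layout (no biases, grouped-query attention, a gated SwiGLU MLP, RMSNorm weight vectors and an untied output head); the layout assumptions are not stated in the config itself.

```python
import json

# Fields copied from the Mistral-7B-Instruct-v0.3 config.json quoted above.
CONFIG = json.loads("""
{
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "tie_word_embeddings": false,
  "vocab_size": 32768
}
""")


def mistral_param_count(cfg):
    """Count parameters, assuming the Mistral layout described above."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention
    attention = 2 * h * h + 2 * h * kv_dim  # q/o projections + k/v projections
    mlp = 3 * h * cfg["intermediate_size"]  # gate, up and down projections
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h  # same shape as the tensor in the traceback
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = mistral_param_count(CONFIG)
print(f"{n_params:,} parameters, about {n_params * 2 / 2**30:.1f} GiB in bfloat16")
```

This comes to 7,248,023,552 parameters, roughly 13.5 GiB of bfloat16 weights, which leaves little headroom on a 16 GB T4 once activations are included; that is where CPU offload comes in.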

@werruww
Author

werruww commented Oct 11, 2024

@werruww
Author

werruww commented Oct 11, 2024

from huggingface_hub import snapshot_download
checkpoint = "openai-community/gpt2"
weights_location = snapshot_download(repo_id=checkpoint)

import torch.nn as nn # import the torch.nn module and alias it as nn
from accelerate import init_empty_weights

with init_empty_weights():
    model = nn.Sequential(*[nn.Linear(10000, 10000) for _ in range(1000)])

import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import GPT2LMHeadModel

# Download checkpoint weights
checkpoint = "openai-community/gpt2"
weights_location = snapshot_download(repo_id=checkpoint)

# Initialize an empty model of the correct type, but load the weights immediately
# instead of using init_empty_weights
# with init_empty_weights():  # Remove this line
#     model = GPT2LMHeadModel.from_pretrained(checkpoint, torch_dtype=torch.float16)

# Initialize the model and load weights directly
model = GPT2LMHeadModel.from_pretrained(checkpoint, torch_dtype=torch.float16)

# Load the checkpoint weights into the model, dispatching to appropriate devices
# Note: If you want to load specific weights from the checkpoint file,
# you'll need to modify this part to load the state_dict explicitly.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=weights_location,
    device_map="auto",
    offload_folder="offload_folder",  # Use a folder name, not "True"
    no_split_module_classes=["GPT2Block"],  # transformers' GPT-2 block ("Block" is minGPT's)
)

import torch.nn as nn # import the torch.nn module and alias it as nn
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoModelForCausalLM # Import AutoModelForCausalLM
from huggingface_hub import snapshot_download

# Download the checkpoint
checkpoint = "openai-community/gpt2"
weights_location = snapshot_download(repo_id=checkpoint)

# Option 1: Load the pre-trained GPT-2 model
# Instead of creating a sequential model, use AutoModelForCausalLM to load GPT-2 directly
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Option 2: Update the device_map to be compatible with the Sequential model.
# NOTE: This assumes the checkpoint is compatible with your sequential model.
# It is more likely that you will need to create a model compatible with your checkpoint.
# device_map = {}
# for i in range(1000):
#     device_map[f"{i}.weight"] = "cpu"  # Map weights of each layer to CPU
#     device_map[f"{i}.bias"] = "cpu"  # Map biases of each layer to CPU

# Load the checkpoint and dispatch
model = load_checkpoint_and_dispatch(
    model, checkpoint=weights_location, device_map="auto", offload_folder="offload_folder"  # a folder name, not "True"
)

import torch
from tokenizers import ByteLevelBPETokenizer
from transformers import GPT2Tokenizer

# Instantiate the GPT-2 tokenizer instead of ByteLevelBPETokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Use the tokenizer

inputs = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cpu")

outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)[0]
decoded_output = tokenizer.decode(outputs.cpu().squeeze().tolist())
print(decoded_output)

/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Hello, my name is John. I'm a writer, and I'm

@werruww
Author

werruww commented Oct 11, 2024

This was on Colab without a T4 GPU or a TPU (CPU only).

@werruww
Author

werruww commented Oct 11, 2024

How do I create a model outside the GPT family, without minGPT, such as Mistral, Phi-3.5, Llama 3.1, or Qwen?
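Nothing in the approach is GPT-specific: a Hugging Face checkpoint's config.json names its architecture (the Mistral config quoted earlier has "model_type": "mistral"), and transformers' Auto classes use that to choose the model class. A toy sketch of that lookup; the table is a hypothetical, illustrative subset, and "phi3" and "qwen2" are assumed to be the model_type values used by Phi-3.5 and Qwen2-family checkpoints.

```python
import json

# Hypothetical, illustrative subset. transformers' Auto classes maintain their
# own (much larger) registry; this only shows the idea of keying on model_type.
CAUSAL_LM_CLASS_BY_MODEL_TYPE = {
    "gpt2": "GPT2LMHeadModel",
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
    "phi3": "Phi3ForCausalLM",
    "qwen2": "Qwen2ForCausalLM",
}


def causal_lm_class_for(config_text):
    """Return the causal-LM class name for the contents of a config.json file."""
    model_type = json.loads(config_text)["model_type"]
    try:
        return CAUSAL_LM_CLASS_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"No causal-LM class registered for model_type={model_type!r}") from None


print(causal_lm_class_for('{"model_type": "mistral", "vocab_size": 32768}'))
```

In practice there is no need to write this yourself: with transformers, AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto") reads the config, builds the matching class, and, with Accelerate installed, spreads the weights over the GPU and CPU (and disk, given an offload_folder).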

@werruww
Author

werruww commented Oct 14, 2024
