Using base model on GPU with no bfloat16 #163

Open

yichen0104 opened this issue May 28, 2024 · 1 comment
@yichen0104

Hi. I'm trying to run the mistral-7B-v0.1 model using mistral-inference on an Nvidia Tesla V100 32GB GPU. Since the V100 does not support bfloat16, I would like to know whether the runtime code can be configured to run in fp16 mode, or whether it will always raise an error identical to the one in Issue #160. I've tried both mistral-demo and the sample Python code in the README, and both produced the same error. Thanks in advance.
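
For reference, a GPU's bfloat16 support can be checked from PyTorch itself; this is a generic snippet, not specific to mistral-inference:

import torch

# bfloat16 needs compute capability >= 8.0 (Ampere); the V100 is 7.0.
print(torch.cuda.get_device_capability(0))   # e.g. (7, 0) on a V100
print(torch.cuda.is_bf16_supported())        # False on pre-Ampere GPUs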

@the-crypt-keeper

@yichen0104 The underlying library actually supports it; the problem is just that the dtype is not exposed via the CLI. I was able to make it work on my 2x3060 + 2xP100 machine by applying the following patch:

diff --git a/src/mistral_inference/main.py b/src/mistral_inference/main.py
index a5ef3a0..d97c4c9 100644
--- a/src/mistral_inference/main.py
+++ b/src/mistral_inference/main.py
@@ -42,7 +42,7 @@ def load_tokenizer(model_path: Path) -> MistralTokenizer:

 def interactive(
     model_path: str,
-    max_tokens: int = 35,
+    max_tokens: int = 512,
     temperature: float = 0.7,
     num_pipeline_ranks: int = 1,
     instruct: bool = False,
@@ -62,7 +62,7 @@ def interactive(
     tokenizer: Tokenizer = mistral_tokenizer.instruct_tokenizer.tokenizer

     transformer = Transformer.from_folder(
-        Path(model_path), max_batch_size=3, num_pipeline_ranks=num_pipeline_ranks
+        Path(model_path), max_batch_size=3, num_pipeline_ranks=num_pipeline_ranks, dtype=torch.float16
     )

     # load LoRA
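
If you are using the README's Python API rather than the CLI, the same override should work by passing dtype to Transformer.from_folder directly. A minimal sketch, assuming a local model folder path and noting that the import path may differ slightly between mistral-inference versions:

import torch
from pathlib import Path
from mistral_inference.transformer import Transformer  # may be mistral_inference.model in older releases

# Load the base model in fp16 instead of the default bfloat16,
# which pre-Ampere GPUs such as the V100 cannot use.
model = Transformer.from_folder(
    Path("/path/to/mistral-7B-v0.1"),  # placeholder path
    max_batch_size=3,
    dtype=torch.float16,
)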
