Add error logging for failed tokenizer loading
Signed-off-by: Khaled Sulayman <[email protected]>
khaledsulayman committed Nov 11, 2024
1 parent 2aa3aee commit a6ee9b3
Showing 1 changed file with 10 additions and 5 deletions.
src/instructlab/sdg/utils/chunkers.py (10 additions, 5 deletions)
@@ -305,18 +305,23 @@ def create_tokenizer(self, model_name: str):

ipdb.set_trace()

pylint warning (GitHub Actions), line 306: C0415: Import outside toplevel (ipdb) (import-outside-toplevel)
pylint failure (GitHub Actions), line 306: E0401: Unable to import 'ipdb' (import-error)

         model_path = Path(model_name)
+        error_info_message = "Please run ilab model download {download_args} and try again"
         try:
             if is_model_safetensors(model_path):
                 tokenizer = AutoTokenizer.from_pretrained(model_path)
+                error_info_message = error_info_message.format(download_args=f"--repository {model_path}")
             elif is_model_gguf(model_path):
-                tokenizer = AutoTokenizer.from_pretrained(model_path.parent, gguf_file=model_path.name)
+                model_dir, model_filename = model_path.parent, model_path.name
+                tokenizer = AutoTokenizer.from_pretrained(model_dir, gguf_file=model_filename)
+                error_info_message = error_info_message.format(download_args=f"--repository {model_dir} --filename {model_filename}")
             else:
                 raise Exception(f"Received path to invalid model format {model_path}")
             logger.info(f"Successfully loaded tokenizer from: {model_path}")
             return tokenizer

pylint warning (GitHub Actions), line 320: W0719: Raising too general exception: Exception (broad-exception-raised)

-        except Exception as e:
+        except (OSError, ValueError) as e:
             logger.error(
-                f"Failed to load tokenizer as model was not found at {model_path}."
-                "Please run `ilab model download {model_name} and try again\n"
-                "{str(e)}"
+                str(e),
+                f"Failed to load tokenizer as model was not found at {model_path}. {error_info_message}"
             )
             raise
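
Reviewer sketch (not part of the commit): a minimal, self-contained illustration of the pattern this change is aiming for, under stated assumptions. The names `download_hint`, `load_with_hint`, and the `loader` parameter are hypothetical and stand in for the real `AutoTokenizer.from_pretrained` call so the example runs without any model files. Two details differ from the diff on purpose. First, the hint is built before loading is attempted, because in the diff `error_info_message.format(...)` runs only after `from_pretrained` returns, so the `{download_args}` placeholder appears to stay unfilled when loading raises. Second, the log call uses a single format string, since `logging` treats every argument after the first as a `%`-style formatting argument for the message.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)

# Template text taken from the diff; the flags are the ones the commit prints.
HINT_TEMPLATE = "Please run ilab model download {download_args} and try again"


def download_hint(model_path: Path, is_gguf: bool) -> str:
    """Build the download hint for a safetensors directory or a GGUF file."""
    if is_gguf:
        # GGUF: the repository is the parent directory, the file is named explicitly.
        args = f"--repository {model_path.parent} --filename {model_path.name}"
    else:
        args = f"--repository {model_path}"
    return HINT_TEMPLATE.format(download_args=args)


def load_with_hint(model_path: Path, is_gguf: bool, loader: Callable[[Path], object]):
    """Call the (hypothetical) loader; log one message with the hint on failure."""
    # Built up front so the hint is complete even if the loader raises.
    hint = download_hint(model_path, is_gguf)
    try:
        tokenizer = loader(model_path)
    except (OSError, ValueError) as e:
        # One format string; the remaining arguments fill its %s placeholders.
        logger.error("Failed to load tokenizer from %s. %s\n%s", model_path, hint, e)
        raise
    logger.info("Successfully loaded tokenizer from: %s", model_path)
    return tokenizer
```

In the real method, `loader` would correspond to the branch-specific `AutoTokenizer.from_pretrained(...)` call; it is injected here only to keep the sketch runnable in isolation.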
