FutureWarning: You are using `torch.load` with `weights_only=False` #1429

mskaif · 2024-10-23T05:14:18Z

Describe the bug
    FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
      state = torch.load(filename, lambda storage, loc: storage)

This warning is triggered by every torch.load call in Stanza. It does not cause any problems with data processing at the moment, but the long warnings are distracting.

To Reproduce
Steps to reproduce the behavior:

Upgrade torch to 2.4.1.

Expected behavior
No warning.

Environment (please complete the following information):

OS: Windows
Python version: 3.12.7
Stanza version: 1.9.2


mskaif · 2024-10-23T05:19:20Z

The warning can be suppressed by running the following before calling Stanza functions, but this is not a real solution:

    import warnings
    warnings.simplefilter(action='ignore', category=FutureWarning)

source: ultralytics/ultralytics#14994 (comment)

AngledLuffa · 2024-10-23T06:20:25Z

Aware of it. One limitation is that the current save files contain plenty of things other than weights: config strings and numbers, mostly. Would those still work?
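On the question of config strings and numbers: as far as I know, the weights_only unpickler handles primitive pickle types natively, so only custom classes (Enums, NumPy objects, and so on) need allowlisting. A small round-trip sketch; the checkpoint layout and key names are made up for illustration:

```python
import io

import torch

# A checkpoint that mixes weights with plain config values. Strings, ints,
# floats, bools, None, lists, tuples and dicts are primitive pickle types,
# so the weights_only unpickler accepts them without any allowlisting.
checkpoint = {
    "model": {"weight": torch.zeros(2, 3), "bias": torch.ones(2)},
    "config": {"lang": "en", "hidden_dim": 400, "dropout": 0.33,
               "use_charlm": True, "charlm_path": None, "layers": [1, 2]},
}

buffer = io.BytesIO()
torch.save(checkpoint, buffer)
buffer.seek(0)

loaded = torch.load(buffer, map_location="cpu", weights_only=True)
print(loaded["config"]["lang"], loaded["config"]["hidden_dim"])
```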



AngledLuffa · 2024-10-24T06:51:49Z

Some of the models can be updated to use weights_only=True right away, but others require resaving with enums or other data structures removed. Will have to investigate some more.
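A sketch of the Enum case described here, using a hypothetical Mode enum: saving the member itself makes a weights_only=True load fail, while resaving its .name (a plain string) and rebuilding the Enum after loading keeps the file loadable.

```python
import enum
import io
import pickle

import torch


class Mode(enum.Enum):  # hypothetical config enum, for illustration only
    TRAIN = "train"
    PREDICT = "predict"


def save_to_bytes(obj):
    """Serialize obj with torch.save into a rewound in-memory buffer."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    buf.seek(0)
    return buf


# Saving the Enum member pickles a reference to the Mode class, which the
# weights_only unpickler rejects because Mode is not allowlisted.
try:
    torch.load(save_to_bytes({"mode": Mode.TRAIN}), weights_only=True)
    enum_loaded = True
except pickle.UnpicklingError:
    enum_loaded = False

# Resaving with the member's name instead keeps the file weights_only-safe;
# the Enum is rebuilt from the string after loading.
state = torch.load(save_to_bytes({"mode": Mode.TRAIN.name}), weights_only=True)
mode = Mode[state["mode"]]
print(enum_loaded, mode)
```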


mskaif · 2024-10-25T05:19:14Z


Sorry for not getting back earlier. I'm using the built-in models like so:

    STANZA_PIPE = stanza.Pipeline(
        lang="en",
        dir=settings.STANZA_DATA_DIR,
        processors="tokenize,mwt,pos",
        download_method=None,
        use_gpu=False,
    )

The affected calls in this pipeline are at:

    tokenization\trainer.py:82
    mwt\trainer.py:201
    pos\trainer.py:139
    common\pretrain.py:56
    common\char_model.py:271
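For call sites like these, the change is roughly the following sketch. load_on_cpu is a hypothetical helper name, not a Stanza function; map_location="cpu" should behave like the lambda storage, loc: storage argument shown in the original warning.

```python
import io

import torch


def load_on_cpu(source):
    """Hypothetical helper: load a checkpoint onto the CPU, weights only.

    map_location="cpu" has the same effect as passing
    `lambda storage, loc: storage`, and setting weights_only explicitly
    avoids relying on a default that changes between PyTorch releases.
    """
    return torch.load(source, map_location="cpu", weights_only=True)


# Usage with an in-memory checkpoint; a filename works the same way.
buf = io.BytesIO()
torch.save({"weight": torch.ones(2, 2)}, buf)
buf.seek(0)
state = load_on_cpu(buf)
```

Passing weights_only explicitly is also what silences the FutureWarning, since the warning is only emitted when the argument is left unset.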

Thank you for the commit!


dvrogozh · 2024-12-07T00:11:19Z

Please be aware that in PyTorch 2.6 this warning will become an error. This was reported to PyTorch as:

[XPU] model works with 2.5.1 while break with nightly build pytorch/pytorch#142123

I posted more details in pytorch/pytorch#142123 (comment), but in short, a PR on the PyTorch side has flipped the default of weights_only from False to True for the upcoming PyTorch 2.6.

You could consider adding an explicit list of allowed safe globals, following an approach similar to the one used in Hugging Face Transformers and Accelerate. For reference, see:

Add safe_globals to resume training on PyTorch 2.6 huggingface/transformers#34632

AngledLuffa · 2024-12-07T00:15:00Z

I am finishing up some model training and will be able to make a new release with the updated models soon.

dvrogozh · 2024-12-07T00:32:00Z

@AngledLuffa: note that the failure reported in pytorch/pytorch#142123 is currently not fixed in the latest Stanza from the main branch (I tried 539760c; see the log below). The repro is:

    import stanza
    pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
    sentence = "Some sentence"
    pos_pipeline(sentence)

#1430, which was previously merged into Stanza, is not enough to handle this case. The failure happens at this torch.load() call:

stanza/stanza/models/common/pretrain.py, line 56 at 539760c:

    data = torch.load(self.filename, lambda storage, loc: storage)

Full log:

    2024-12-06 16:29:16 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
    Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.9.0.json: 392kB [00:00, 70.6MB/s]
    2024-12-06 16:29:17 INFO: Downloaded file to /home/dvrogozh/stanza_resources/resources.json
    2024-12-06 16:29:17 WARNING: Language en package default expects mwt, which has been added
    2024-12-06 16:29:17 INFO: Loading these models for language: en (English):
    ===============================
    | Processor | Package         |
    -------------------------------
    | tokenize  | combined        |
    | mwt       | combined        |
    | pos       | combined_charlm |
    ===============================

    2024-12-06 16:29:17 INFO: Using device: xpu
    2024-12-06 16:29:17 INFO: Loading: tokenize
    2024-12-06 16:29:18 INFO: Loading: mwt
    2024-12-06 16:29:18 INFO: Loading: pos
    /home/dvrogozh/git/pytorch/pytorch/torch/_weights_only_unpickler.py:515: UserWarning: Detected pickle protocol 3 in the checkpoint, which was not the default pickle protocol used by `torch.load` (2). The weights_only Unpickler might not support all instructions implemented by this protocol, please file an issue for adding support if you encounter this.
      warnings.warn(
    Traceback (most recent call last):
      File "/home/dvrogozh/tmp/st.py", line 3, in <module>
        pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
      File "/home/dvrogozh/git/stanza/stanza/pipeline/core.py", line 308, in __init__
        self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
      File "/home/dvrogozh/git/stanza/stanza/pipeline/processor.py", line 193, in __init__
        self._set_up_model(config, pipeline, device)
      File "/home/dvrogozh/git/stanza/stanza/pipeline/pos_processor.py", line 32, in _set_up_model
        self._trainer = Trainer(pretrain=self.pretrain, model_file=config['model_path'], device=device, args=args, foundation_cache=pipeline.foundation_cache)
      File "/home/dvrogozh/git/stanza/stanza/models/pos/trainer.py", line 34, in __init__
        self.load(model_file, pretrain, args=args, foundation_cache=foundation_cache)
      File "/home/dvrogozh/git/stanza/stanza/models/pos/trainer.py", line 174, in load
        emb_matrix = pretrain.emb
      File "/home/dvrogozh/git/stanza/stanza/models/common/pretrain.py", line 50, in emb
        self.load()
      File "/home/dvrogozh/git/stanza/stanza/models/common/pretrain.py", line 56, in load
        data = torch.load(self.filename, lambda storage, loc: storage)
      File "/home/dvrogozh/git/pytorch/pytorch/torch/serialization.py", line 1480, in load
        raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
    _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
            (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
            (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
            WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.

    Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
    (pytorch.xpu) dvrogozh@willow-spr03:~/tmp$ cat st.py
    import stanza

    pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
    sentence = "Some sentence"
    pos_pipeline(sentence)
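The rejected global here, numpy.core.multiarray._reconstruct, is what pickle records for a NumPy array, which suggests the pretrain file stores at least one array (presumably the embedding matrix, though that is an assumption). Instead of allowlisting NumPy internals, one option when resaving is to store a tensor. A sketch with a random stand-in matrix under a hypothetical "emb" key:

```python
import io
import pickle

import numpy as np
import torch


def roundtrip(obj, weights_only):
    """Save obj with torch.save and load it back from memory."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    buf.seek(0)
    return torch.load(buf, map_location="cpu", weights_only=weights_only)


emb = np.random.rand(4, 3).astype(np.float32)  # stand-in embedding matrix

# A NumPy array pickles via numpy's _reconstruct global, which the
# weights_only unpickler rejects by default.
try:
    roundtrip({"emb": emb}, weights_only=True)
    numpy_ok = True
except pickle.UnpicklingError:
    numpy_ok = False

# Storing a tensor instead keeps the file loadable with weights_only=True;
# .numpy() recovers an array after loading if callers still expect one.
data = roundtrip({"emb": torch.from_numpy(emb)}, weights_only=True)
restored = data["emb"].numpy()
```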

AngledLuffa · 2024-12-07T00:36:53Z

Got it, but that's the main branch. The merged updates are in the dev branch, which at that line has torch.load(... weights_only=True):

stanza/stanza/models/common/pretrain.py, line 60 at 5754ec0:

    try:
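The dev-branch snippet is cut off at try:. One common shape for such a block is sketched below: attempt weights_only=True first and fall back to weights_only=False only for trusted files. This is an assumption about the general pattern, not a copy of Stanza's code, and load_trusted_checkpoint is a hypothetical name.

```python
import datetime
import io
import logging
import pickle

import torch

logger = logging.getLogger(__name__)


def load_trusted_checkpoint(source):
    """Hypothetical helper: try weights_only=True first, then fall back.

    The fallback uses weights_only=False, which can execute arbitrary code
    while unpickling, so only call this on checkpoints you trust.
    """
    try:
        return torch.load(source, map_location="cpu", weights_only=True)
    except pickle.UnpicklingError:
        logger.warning("weights_only load failed; retrying with weights_only=False")
        if hasattr(source, "seek"):
            source.seek(0)  # rewind in-memory or open file objects before retrying
        return torch.load(source, map_location="cpu", weights_only=False)


# Usage: a date object stands in for an old save file's non-weight data,
# which the weights_only unpickler rejects by default.
buf = io.BytesIO()
torch.save({"weight": torch.ones(2), "created": datetime.date(2024, 10, 23)}, buf)
buf.seek(0)
state = load_trusted_checkpoint(buf)
```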

dvrogozh · 2024-12-07T00:50:58Z


Ah, sorry. I missed that.

AngledLuffa · 2024-12-23T07:24:25Z

This should now be pushed in v1.10.0

YuhuYang · 2025-01-27T15:38:25Z

This problem still exists in 1.10.1 if I use the zh-hans 1.9.0 models. If I use the zh-hans 1.10.0 models instead, this error arises:

    ValueError: md5 for D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt is 48f993223d568afedc2893f7cd76719c, expected 68fb709f2a556b132b4915f2b3893ce7
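When an md5 mismatch like this appears, it can help to hash the file on disk yourself and compare the result with the values in resources.json. A minimal sketch with hashlib; file_md5 is a hypothetical helper:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For the file in the error above, file_md5 on gsdsimp.pt should return 48f993223d568afedc2893f7cd76719c, the value Stanza reports.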

AngledLuffa · 2025-01-27T15:45:35Z

It sounds like you have the old models on your system and aren't downloading them. Which is weird, since the Pipeline should automatically download the models for the new version. Are you creating the Pipeline in a way that stops it from downloading?

YuhuYang · 2025-01-27T15:53:56Z

Thanks for your reply! This is my whole script:

    import stanza
    import os
    nlp = stanza.Pipeline(lang='zh-hans')
    input_path = r"cleanned"
    output_path = r"parsed"
    text_name = [file.split('.')[0] for file in os.listdir(input_path) if file.endswith('.txt')]
    text_list = [os.path.join(input_path, file) for file in os.listdir(input_path) if file.endswith('.txt')]
    texts = [open(i, "r", encoding='utf-8') for i in text_list]
    for i in range(len(texts)):
        doc = nlp(texts[i].read())
        # Print the output in CoNLL-U format
        with open(output_path + "/" + text_name[i] + ".conllu", "w", encoding='utf-8') as f:
            for sentence in doc.sentences:
                for word in sentence.words:
                    f.write(f"{word.id}\t{word.text}\t{word.lemma}\t{word.upos}\t{word.xpos}\t{word.feats}\t{word.head}\t{word.deprel}\t{word.deps}\t{word.misc}\n")
                f.write("\n")

I have tried downloading 1.10.1, 1.9.0, and 1.8.0 manually, but this bug still occurs. If I use the old models, it shows:

    2025-01-27 23:27:30 INFO: Using device: cpu
    2025-01-27 23:27:30 INFO: Loading: tokenize
    2025-01-27 23:27:30 ERROR: Cannot load model from D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt
    Traceback (most recent call last):
      File "d:\papers_2\cooperation\华语树库\stanza_parse.py", line 7, in <module>
        nlp = stanza.Pipeline(lang='zh-hans')
      File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\core.py", line 308, in __init__
        self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
      File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\processor.py", line 193, in __init__
        self._set_up_model(config, pipeline, device)
      File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\tokenize_processor.py", line 44, in _set_up_model
        self._trainer = Trainer(model_file=config['model_path'], device=device)
      File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\models\tokenization\trainer.py", line 20, in __init__
        self.load(model_file)
      File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\models\tokenization\trainer.py", line 84, in load
        checkpoint = torch.load(filename, lambda storage, loc: storage, weights_only=True)
      File "D:\Anaconda\envs\stanza\lib\site-packages\torch\serialization.py", line 1383, in load
        raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
    _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
            (1) Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
            (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
            WeightsUnpickler error: Unsupported global: GLOBAL builtins.set was not an allowed global by default. Please use `torch.serialization.add_safe_globals([set])` to allowlist this global if you trust this class/function.

    Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

AngledLuffa · 2025-01-27T15:59:30Z

I highly recommend using backticks (`) to format code.

When the script is running, does it print something like this?

    Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES

It should. The md5sum of the version 1.10 tokenizer model is 48f993223d568afedc2893f7cd76719c, so it looks like you downloaded the right model but somehow didn't download the matching resources file. Are you able to download this file?

https://github.com/stanfordnlp/stanza-resources/blob/main/resources_1.10.0.json

The block for the Chinese models is here:

https://github.com/stanfordnlp/stanza-resources/blob/f06522caadca99c72200e20ee158fe5e63b75e97/resources_1.10.0.json#L12350

Your local version of the resources file should look like that.
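To check a local resources.json against the linked one, the expected md5 can be read out of the JSON directly. The sketch assumes entries are nested as resources[lang][processor][package]["md5"]; that layout is an assumption, so compare it with the linked resources_1.10.0.json block. expected_md5 is a hypothetical helper.

```python
import json


def expected_md5(resources_path, lang, processor, package):
    """Return the md5 recorded for one model in a Stanza resources.json.

    Assumption: entries are nested as resources[lang][processor][package]["md5"].
    """
    with open(resources_path, encoding="utf-8") as f:
        resources = json.load(f)
    return resources[lang][processor][package]["md5"]
```

If the assumed layout holds, expected_md5(path, "zh-hans", "tokenize", "gsdsimp") on a 1.10 resources file should give 48f993223d568afedc2893f7cd76719c, matching the model on disk; an older value such as 68fb709f2a556b132b4915f2b3893ce7 would point to a stale resources.json.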

YuhuYang · 2025-01-27T16:08:39Z

Now it is working! It must have been that I used the wrong resources.json. Thank you very much!

mskaif added the bug label Oct 23, 2024

AngledLuffa added commits that referenced this issue between Oct 24 and Oct 28, 2024, each with the message "For the simple use cases where no Enum or unexpected config objects are in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429":

772099d (Oct 24, 2024)
80be642 (Oct 24, 2024)
6e4ca78 (Oct 24, 2024)
73b850e (Oct 24, 2024)
56fe3c5 (Oct 25, 2024)
087d633 (Oct 27, 2024)
89f3e12 (Oct 28, 2024)
c972332 (Oct 28, 2024)

AngledLuffa mentioned this issue in Weights only #1430 (Merged) on Oct 28, 2024.

dvrogozh mentioned this issue Dec 7, 2024 in [XPU] model works with 2.5.1 while break with nightly build pytorch/pytorch#142123 (Closed)

stanfordnlp deleted a comment from huiyan2021 Dec 17, 2024
