FutureWarning: You are using torch.load with weights_only=False #1429
Comments
The warning can be suppressed by adding the following before calling stanza functions, but that is a workaround, not a solution:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
source: ultralytics/ultralytics#14994 (comment)
|
Aware of it. There's a limitation: we are currently saving plenty of things other than weights in the model files, mostly config strings and numbers. Would those still work?
|
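A narrower version of the workaround above, as a minimal sketch: instead of silencing every FutureWarning, filter only the one whose message starts with the torch.load text quoted in this issue. The helper name `silence_torch_load_warning` is hypothetical, and the message pattern is an assumption copied from the warning text in the bug description; it would need adjusting if PyTorch rewords the warning.

```python
import warnings

# Assumption: this prefix is taken from the warning text quoted in this issue.
# The `message` argument is a regex matched against the start of the message.
TORCH_LOAD_WARNING = r"You are using `torch\.load` with `weights_only=False`"


def silence_torch_load_warning():
    """Ignore only the torch.load FutureWarning, leaving other warnings visible."""
    warnings.filterwarnings(
        "ignore",
        message=TORCH_LOAD_WARNING,
        category=FutureWarning,
    )


# Usage: call silence_torch_load_warning() once, before building the pipeline.
```

This is still a workaround rather than a fix, but unrelated FutureWarnings from other libraries keep showing up.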
…re in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429
Some of the models can be updated to use |
sorry for not getting back earlier. I'm using the built-in models like so: affected from the pipeline are: Thank you for the commit! |
Please be aware that on PyTorch 2.6 this warning will become an error. That was reported to PyTorch as pytorch/pytorch#142123, and I posted more details in pytorch/pytorch#142123 (comment). In short, a PR on the PyTorch side has flipped the default of weights_only to True. You can consider adding an explicit list of allowed safe globals, following the approach taken in Hugging Face Transformers and Accelerate. For reference, see huggingface/transformers#34632. |
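To illustrate what an explicit list of allowed globals means, here is a minimal sketch using only the standard library's pickle module. It is not PyTorch's implementation: with torch, the allowlisting is done through torch.serialization.add_safe_globals, as the warning text says. The names `AllowlistUnpickler` and `restricted_loads` and the sample config are hypothetical, made up for this example.

```python
import io
import pickle
from collections import OrderedDict


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly allowed."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)  # set of (module, name) pairs

    def find_class(self, module, name):
        # Called whenever the pickle stream references a class or function.
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data, allowed=()):
    """Load pickled bytes, permitting only the allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# Hypothetical checkpoint-like payload: plain strings and numbers need no
# globals, but the OrderedDict does.
config = {"lang": "zh-hans", "dims": OrderedDict(hidden=100)}
blob = pickle.dumps(config)

try:
    restricted_loads(blob)
except pickle.UnpicklingError as err:
    print(err)  # the OrderedDict global is rejected

# Allowlisting the one extra global makes the same bytes loadable.
loaded = restricted_loads(blob, allowed={("collections", "OrderedDict")})
```

Plain strings, numbers, lists, and dicts load without any allowlisting, which is why config strings and numbers in a save file are not the problem; non-default classes are.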
I am finishing up some model training and will be able to make a new release with the updated models soon. |
@AngledLuffa : note that at the moment the failure reported in pytorch/pytorch#142123 is not fixed in the latest stanza from main branch (I tried 539760c - see log below). The repro is with:
The previously merged #1430 is not enough to handle this case. The failure happens at this line: stanza/stanza/models/common/pretrain.py Line 56 in 539760c
Full log:
|
Got it, but that's the main branch. The updates merged in are in the dev branch, which at that line has stanza/stanza/models/common/pretrain.py Line 60 in 5754ec0
|
Ah, sorry. I missed that. |
This should now be pushed in v1.10.0 |
ValueError: md5 for D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt is 48f993223d568afedc2893f7cd76719c, expected 68fb709f2a556b132b4915f2b3893ce7 |
It sounds like you have the old models on your system and aren't downloading them. Which is weird, since the Pipeline should automatically download the models for the new version. Are you creating the Pipeline in a way that stops it from downloading? |
Thanks for your reply! This is my whole script:
import stanza
import os
nlp = stanza.Pipeline(lang='zh-hans')
input_path = r"cleanned"
output_path = r"parsed"
text_name = [file.split('.')[0] for file in os.listdir(input_path) if file.endswith('.txt')]
text_list = [os.path.join(input_path, file) for file in os.listdir(input_path) if file.endswith('.txt')]
texts = [open(i, "r", encoding='utf-8') for i in text_list]
for i in range(len(texts)):
    doc = nlp(texts[i].read())
    # Write the output in CoNLL-U format
    with open(output_path + "/" + text_name[i] + ".conllu", "w", encoding='utf-8') as f:
        for sentence in doc.sentences:
            for word in sentence.words:
                f.write(f"{word.id}\t{word.text}\t{word.lemma}\t{word.upos}\t{word.xpos}\t{word.feats}\t{word.head}\t{word.deprel}\t{word.deps}\t{word.misc}\n")
            f.write("\n")
I have tried downloading 1.10.1, 1.9.0, and 1.8.0 manually, but the error is still raised. If I use the old models, it shows:
2025-01-27 23:27:30 INFO: Using device: cpu
2025-01-27 23:27:30 INFO: Loading: tokenize
2025-01-27 23:27:30 ERROR: Cannot load model from D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt
Traceback (most recent call last):
  File "d:\papers_2\cooperation\华语树库\stanza_parse.py", line 7, in <module>
    nlp = stanza.Pipeline(lang='zh-hans')
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\core.py", line 308, in __init__
    self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\processor.py", line 193, in __init__
    self._set_up_model(config, pipeline, device)
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\tokenize_processor.py", line 44, in _set_up_model
    self._trainer = Trainer(model_file=config['model_path'], device=device)
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\models\tokenization\trainer.py", line 20, in __init__
    self.load(model_file)
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\models\tokenization\trainer.py", line 84, in load
    checkpoint = torch.load(filename, lambda storage, loc: storage, weights_only=True)
  File "D:\Anaconda\envs\stanza\lib\site-packages\torch\serialization.py", line 1383, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL builtins.set was not an allowed global by default. Please use `torch.serialization.add_safe_globals([set])` to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
|
Highly recommend using backticks ` to format code.
When the script is running, does it say something like
Checking for updates to resources.json in case models have been updated. Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
It should. The md5sum for the tokenizer model for version 1.10 is 48f993223d568afedc2893f7cd76719c, so it looks like you downloaded the right model but somehow didn't download the resources. Are you able to download this file?
https://github.com/stanfordnlp/stanza-resources/blob/main/resources_1.10.0.json
The block for the Chinese models is here:
https://github.com/stanfordnlp/stanza-resources/blob/f06522caadca99c72200e20ee158fe5e63b75e97/resources_1.10.0.json#L12350
Your local version of the resources file should look like that.
|
Now it is working! It seems I was using the wrong resources.json. Thank you very much!
|
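For anyone hitting the same md5 mismatch, a minimal sketch for checking a downloaded model file by hand. The helper names `file_md5` and `matches_expected` are hypothetical and not part of stanza; the expected value would be read manually from the resources JSON for the installed version.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's md5 against the value listed in the resources file."""
    return file_md5(path) == expected_md5.strip().lower()


# Usage (paths and values are placeholders to fill in locally):
# matches_expected(r"...\zh-hans\tokenize\gsdsimp.pt", "<md5 from resources.json>")
```

If the digest matches the value in a newer resources file but not the local one, the model is current and the local resources.json is stale, which is the situation described above.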
Describe the bug
FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(filename, lambda storage, loc: storage)
This warning is triggered by every torch.load call in stanza. It does not cause any problem with data processing at the moment, but the long warnings are distracting.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
no error
Environment (please complete the following information):