Dataset loading issue for german_rag_evals on Windows #211
Hi! |
Yes, I can do that, @clefourrier. @Pommel4711, here is the command I use to run the evaluation: It works (worked) without the |
Here is a Colab with code that shows that the dataset can be loaded without setting |
Interesting, thanks a lot! |
@clefourrier and @Pommel4711
Can you please check that? |
@PhilipMay For reference, I'm running on this commit: Thanks for your help! |
Hm, I'm going to ping @lhoestq on this then because it seems like a |
trust_remote_code=True
to allow custom code to be run.
Good idea. Thanks. |
OSes that don't support SIGALRM are supported thanks to a Anyway feel free to update |
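For readers following along: `signal.SIGALRM` only exists on POSIX systems, which is why the attribute lookup fails on Windows. The snippet below is a minimal sketch of one way to guard an alarm-based timeout; `run_with_timeout` is a hypothetical helper written for illustration and is not the code used inside `datasets`.

```python
import signal


def _raise_timeout(signum, frame):
    """Signal handler that turns SIGALRM into a TimeoutError."""
    raise TimeoutError


def run_with_timeout(func, seconds, default=None):
    """Call func() and return its result.

    On platforms with SIGALRM the call is aborted after `seconds`
    (an integer) and `default` is returned. Where the alarm cannot be
    installed, func() simply runs without a timeout.
    """
    try:
        previous = signal.signal(signal.SIGALRM, _raise_timeout)
    except (AttributeError, ValueError):
        # AttributeError: the platform has no SIGALRM (e.g. Windows).
        # ValueError: handlers can only be set from the main thread.
        return func()

    signal.alarm(seconds)
    try:
        return func()
    except TimeoutError:
        return default
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    print(run_with_timeout(lambda: "finished in time", 5))
```

The fallback here silently drops the timeout, which is a design choice of this sketch; a library may prefer to raise an explicit error instead.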
No problem for the transfer if needed |
I copied the
(lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
--model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
--tasks "./examples/tasks/all_german_rag_evals.txt" ^
--override_batch_size 1 ^
--use_chat_template ^
--custom_tasks "community_tasks/german_rag_evals.py" ^
--output_dir "./evals/"
Traceback (most recent call last):
File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
from lighteval.main_accelerate import CACHE_DIR, main
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
from lighteval.evaluator import evaluate, make_results_table
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
from lighteval.logging.evaluation_tracker import EvaluationTracker
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 32, in <module>
from datasets import Dataset, load_dataset
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\__init__.py", line 26, in <module>
from .inspect import (
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\inspect.py", line 32, in <module>
from .load import (
ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py) |
Hi @Pommel4711 , |
Yes, I did update datasets as @lhoestq suggested.
Despite updating the records I get a new error. Any further suggestions would be greatly appreciated. Thank you! |
Just to be sure, how did you update the package, and what is the current version you are running? |
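One way to answer this without importing the package (which would trigger the ImportError shown above) is to read the installed metadata and locate the module file. This is a minimal sketch; `describe_package` is a hypothetical helper name, and the package names in the usage lines are simply the ones discussed in this thread.

```python
import importlib.metadata
import importlib.util


def describe_package(name: str) -> dict:
    """Report the installed version and file location of a top-level package.

    The package itself is never imported, so a broken install can still
    be inspected.
    """
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # no installed distribution with this name

    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


if __name__ == "__main__":
    for pkg in ("datasets", "lighteval"):
        print(describe_package(pkg))
```

If the reported location is not inside the active environment, or the version does not match what pip reports, the environment probably contains a stale or mixed install.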
Issue with
|
Thanks a lot for the detailed steps! |
I tried running ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py) Do you have any other suggestions on how to resolve this issue? Thank you! |
cc @lhoestq this sounds like a datasets issue, you can transfer the issue to your lib if needed :) |
I was unable to reproduce the issue even following the steps. I think it is indeed a |
Maybe I found the problem with the dataset. I followed the steps mentioned in this comment to resolve the issue without deleting the file. Instead, I tried upgrading via pip install -U datasets. However, after the upgrade, I noticed that the load.py file remains unchanged and is not the same as the one from this link. But then I am left with this error:
(lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
--model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
--tasks "./examples/tasks/all_german_rag_evals.txt" ^
--override_batch_size 1 ^
--use_chat_template ^
--custom_tasks "community_tasks/german_rag_evals.py" ^
--output_dir "./evals/"
Using either accelerate or text-generation to run this script is advised.
main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)), {
Test all gather {
Not running in a parallel setup, nothing to test
} [0:00:00.001000]
Creating model configuration {
} [0:00:00]
Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Tokenizer truncation and padding size set to the left side.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:21<00:00, 7.09s/it]
Using Data Parallelism, putting model on device cpu
Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size=-1)
} [0:00:23.565683]
Tasks loading {
} [0:00:00.061002]
} [0:00:23.641685]
Traceback (most recent call last):
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 89, in <module>
main(args)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
return fn(*args, **kwargs)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 91, in main
task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 133, in get_task_dict
custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 168, in create_custom_tasks_module
dataset_module = dataset_module_factory(str(custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
).get_module()
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument `trust_remote_code=True` to allow custom code to be run. |
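The traceback above shows `create_custom_tasks_module` handing the custom tasks path to `dataset_module_factory`, and that call is what runs into the `trust_remote_code` check. For comparison, the snippet below is a minimal sketch of loading a local Python file as a module with the standard library only. `load_module_from_path` and its default module name are hypothetical, and this is not presented as how lighteval does or should implement it.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: str, module_name: str = "custom_tasks_module") -> ModuleType:
    """Import a local .py file as a module without going through `datasets`."""
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {file_path}")

    module = importlib.util.module_from_spec(spec)
    # Register before executing so code inside the file can look itself up.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Usage would look like `tasks_module = load_module_from_path("community_tasks/german_rag_evals.py")`, after which the attributes defined in that file are available on `tasks_module`.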
Are you trying to run the evaluation in offline mode? I got the same error, but I am running offline and have replaced the HF links with local paths, and the same trust_remote_code error keeps arising. |
I always run this with an internet connection, but I don't know what the problem is. I switched to Linux and it worked. |
@Pommel4711 now I also have the same issue. I am on Linux, so the OS should not be the root cause of the problem. |
@Pommel4711 I found a solution that works for me. See here: #278 It is by adding But this should not be required. IMO this should be considered as a bug in lighteval. |
can you try uninstalling and reinstalling |
You mean a |
I double checked and actually the Anyway there seems to be a dataset called I couldn't find this dataset on HF though, is it a local dataset of yours ? |
Ah it's |
I think this is not how lighteval is supposed to work. |
So it looks like |
This may be the case and may be the cause of this issue. |
@NathanHB we have new insights into this issue - see comments from me above. |
Interesting, I'll take a look this week |
Hello, I don't know what I'm doing wrong. I received the following error as indicated in the title.
My input was as shown on this website: Hugging Face - Ger-RAG-eval.
The output was as follows:
I discovered that the argument trust_remote_code=True must be passed as part of the model_args parameter. To fix the issue, I tried the following code, but unfortunately, the error persisted. Maybe this can help.
When I entered the command accelerate env, I received the following output: Copy-and-paste the text below in your GitHub issue