[FT] Pipeline does not fully handle trust_remote_code to load dataset #362

Open
Sanahm opened this issue Oct 15, 2024 · 0 comments
Labels
feature request New feature/request

Sanahm commented Oct 15, 2024

Issue encountered

trust_remote_code is now mandatory when loading a dataset that is defined by a loading script, but the pipeline has no way to pass it through.

Solution/Feature

Add trust_remote_code as a parameter to create_custom_tasks_module() and get_custom_tasks() (lighteval.tasks.registry), and expose it in Pipeline.pipeline_parameters so the pipeline can forward it when loading custom tasks:

class Pipeline:
    def _init_tasks_and_requests(self, tasks):
        with htrack_block("Tasks loading"):
            with local_ranks_zero_first() if self.launcher_type == ParallelismManager.NANOTRON else nullcontext():
                # If some tasks are provided as task groups, we load them separately
                custom_tasks = self.pipeline_parameters.custom_tasks_directory
                tasks_groups_dict = None
                if custom_tasks:
                    _, tasks_groups_dict = get_custom_tasks(custom_tasks, trust_remote_code=self.pipeline_parameters.trust_remote_code)
                if tasks_groups_dict and tasks in tasks_groups_dict:
                    tasks = tasks_groups_dict[tasks]

                ...
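
To make the flow of the flag easier to follow, here is a minimal, self-contained sketch of the threading described above. The function bodies and the PipelineParameters dataclass are simplified stand-ins written for illustration, not the actual lighteval implementation; only the names create_custom_tasks_module, get_custom_tasks, custom_tasks_directory and trust_remote_code come from this issue.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in: the real function imports a tasks module from a path.
def create_custom_tasks_module(custom_tasks: str, trust_remote_code: bool = False) -> dict:
    # A real implementation would forward the flag to the dataset loading call.
    return {"source": custom_tasks, "trust_remote_code": trust_remote_code}


# Hypothetical stand-in: returns the module and a made-up task group mapping.
def get_custom_tasks(custom_tasks: str, trust_remote_code: bool = False):
    module = create_custom_tasks_module(custom_tasks, trust_remote_code=trust_remote_code)
    tasks_groups_dict = {"my_group": "task_a,task_b"}  # illustrative data only
    return module, tasks_groups_dict


# Simplified stand-in for the pipeline parameters, with the proposed new field.
@dataclass
class PipelineParameters:
    custom_tasks_directory: Optional[str] = None
    trust_remote_code: bool = False  # proposed field, defaulting to the safe value


params = PipelineParameters(custom_tasks_directory="my_tasks.py", trust_remote_code=True)
module, groups = get_custom_tasks(
    params.custom_tasks_directory,
    trust_remote_code=params.trust_remote_code,
)
```

Defaulting the flag to False keeps the current behaviour for users who do not opt in, so remote code runs only when it is requested explicitly.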

A more general fix might be to have get_custom_tasks() accept generic **kwargs as an additional input and forward them.
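
A rough sketch of what the **kwargs variant could look like is given here. Both load_task_module and get_custom_tasks_generic are hypothetical names used only for illustration, and the revision argument is an arbitrary example of an extra option, not something this issue asks for.

```python
# Hypothetical loader that receives whatever options the caller forwards.
def load_task_module(path: str, **loader_kwargs) -> dict:
    # A real loader would pass these options on to the dataset loading call.
    return {"path": path, "loader_kwargs": dict(loader_kwargs)}


# Hypothetical generic variant: extra keyword arguments are forwarded untouched.
def get_custom_tasks_generic(custom_tasks: str, **kwargs):
    module = load_task_module(custom_tasks, **kwargs)
    tasks_groups_dict = {}  # no task groups in this toy example
    return module, tasks_groups_dict


module_generic, _ = get_custom_tasks_generic(
    "my_tasks.py", trust_remote_code=True, revision="main"
)
```

The trade-off is that **kwargs avoids a signature change for every new option, but a misspelled option name is forwarded silently instead of raising an error at the call site.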

Sanahm added the feature request label Oct 15, 2024