Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ZeroshotSimulatorContext #6

Open
Innixma opened this issue Apr 23, 2023 · 0 comments
Open

Refactor ZeroshotSimulatorContext #6

Innixma opened this issue Apr 23, 2023 · 0 comments

Comments

@Innixma
Copy link
Collaborator

Innixma commented Apr 23, 2023

ZeroshotSimulatorContext was written early on in the codebase and is hacky. It would be good to revisit and clean up the class, especially in terms of variable naming consistency with the rest of the package and more elegantly incorporating zpp/gt objects.

Most critically, the following code in init is very confusing, and we should make the most of property functions to minimize confusion and code complexity:

        self.df_results_by_dataset_vs_automl, \
        self.df_raw, \
        self.dataset_name_to_tid_dict, \
        self.dataset_to_tid_dict, \
        self.dataset_name_to_fold_dict, \
        self.unique_dataset_folds, \
        self.unique_datasets, \
        self.rank_scorer_vs_automl = self.align_valid_folds(
            df_results_by_dataset=df_results_by_dataset,
            df_results_by_dataset_automl=df_results_by_dataset_automl,
            df_raw=df_raw,
            folds=folds,
            score_against_only_automl=self.score_against_only_automl,
            pct=self.pct,
        )
        self.dataset_parent_to_fold_map = self._compute_dataset_parent_to_fold_map()

        tmp = self.df_results_by_dataset_vs_automl[['dataset', 'tid', 'problem_type']]
        self.dataset_to_problem_type_dict = tmp[['dataset', 'problem_type']].drop_duplicates().set_index(
            'dataset').squeeze().to_dict()
        self.tid_to_problem_type_dict = tmp[['tid', 'problem_type']].drop_duplicates().set_index(
            'tid').squeeze().to_dict()

For example:

  1. self.unique_dataset_folds can simply be self.tasks as a property
  2. self.unique_datasets can simply be self.datasets as a property
  3. self.dataset_to_tid_dict can be simplified to a method call
  4. ditto for self.dataset_name_to_fold_dict, which should be named self.task_to_fold_dict
@Innixma Innixma transferred this issue from Innixma/autogluon-zeroshot May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant