0.11.0 release #153

Merged
merged 10 commits into from
Sep 12, 2024

rodrigo-arenas
Owner

This PR adds the following features and changes:

Features:

  • Added a parameter use_cache, which defaults to True. When enabled, the algorithm will skip re-evaluating solutions that have already been evaluated, retrieving the performance metrics from the cache instead.
    If use_cache is set to False, the algorithm will always re-evaluate solutions, even if they have been seen before, to obtain fresh performance metrics.
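    A minimal sketch of the caching behaviour described above, in plain Python. The CachedEvaluator class, its attributes, and the tuple-based cache key are hypothetical names chosen for illustration; they are not taken from the library's source.

```python
from typing import Callable, Dict, Sequence, Tuple


class CachedEvaluator:
    """Hypothetical wrapper illustrating a use_cache-style switch."""

    def __init__(self, evaluate: Callable[[Sequence], float], use_cache: bool = True):
        self.evaluate = evaluate      # the expensive scoring function
        self.use_cache = use_cache    # mirrors the use_cache flag
        self.cache: Dict[Tuple, float] = {}
        self.calls = 0                # number of real evaluations performed

    def __call__(self, individual: Sequence) -> float:
        key = tuple(individual)       # assumes gene values are hashable
        if self.use_cache and key in self.cache:
            # Already evaluated: reuse the stored metric.
            return self.cache[key]
        # Not cached (or caching disabled): evaluate again.
        self.calls += 1
        score = self.evaluate(individual)
        if self.use_cache:
            self.cache[key] = score
        return score
```

    With use_cache=True, a repeated individual costs one real evaluation; with use_cache=False, every call re-runs the scoring function, which matters when the score is noisy (for example, when cross-validation splits are reshuffled).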

  • Added a parameter in GAFeatureSelectionCV named warm_start_configs, which defaults to None. This is a list of predefined hyperparameter configurations used to seed the initial population. Each element is a dictionary that maps hyperparameter names to the values used for that individual.

    Example:

    warm_start_configs = [
        {"min_weight_fraction_leaf": 0.02, "bootstrap": True, "max_depth": None, "n_estimators": 100},
        {"min_weight_fraction_leaf": 0.4, "bootstrap": True, "max_depth": 5, "n_estimators": 200},
    ]

The genetic algorithm will initialize part of the population with these configurations to warm-start the optimization process. The remaining individuals in the population will be initialized randomly according to the defined hyperparameter space.

This parameter is useful when prior knowledge of good hyperparameter configurations exists, allowing the algorithm to focus on refining known good solutions while still exploring new areas of the hyperparameter space. If set to None, the entire population will be initialized randomly.
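
The seeding behaviour described above can be sketched roughly as follows. This is an illustration only: init_population, the param_space format (a dict of candidate value lists), and the seed argument are assumptions made for this example and do not reflect the library's internals.

```python
import random
from typing import Any, Dict, List, Optional


def init_population(
    param_space: Dict[str, List[Any]],
    population_size: int,
    warm_start_configs: Optional[List[Dict[str, Any]]] = None,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical initializer: warm-start configs first, random fill after."""
    rng = random.Random(seed)
    configs = warm_start_configs or []
    # Seed part of the population with the predefined configurations.
    population = [dict(cfg) for cfg in configs[:population_size]]
    # Fill the remaining slots by sampling the hyperparameter space at random.
    while len(population) < population_size:
        population.append(
            {name: rng.choice(values) for name, values in param_space.items()}
        )
    return population


# Simplified space for illustration; values are plain candidate lists.
param_space = {
    "min_weight_fraction_leaf": [0.02, 0.1, 0.4],
    "bootstrap": [True, False],
    "max_depth": [None, 5, 10],
    "n_estimators": [100, 200, 300],
}
warm_start_configs = [
    {"min_weight_fraction_leaf": 0.02, "bootstrap": True, "max_depth": None, "n_estimators": 100},
    {"min_weight_fraction_leaf": 0.4, "bootstrap": True, "max_depth": 5, "n_estimators": 200},
]
population = init_population(param_space, 5, warm_start_configs, seed=0)
```

Here the first two individuals are the warm-start configurations and the other three are drawn at random; passing warm_start_configs=None yields a fully random population.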

  • Introduced a novelty search strategy to the GASearchCV class. This strategy rewards solutions that are more distinct from others in the population by incorporating a novelty score into the fitness evaluation. The novelty score encourages exploration and promotes diversity, reducing the risk of premature convergence to local optima.

    * Novelty Score: Calculated based on the distance between an individual and its nearest neighbors in the population. Individuals with higher novelty scores are more distinct from the rest of the population.
    
    * Fitness Evaluation: The overall fitness is now a combination of the traditional performance score and the novelty score, allowing the algorithm to balance between exploiting known good solutions and exploring new, diverse ones.
    
    * Improved Exploration: This strategy helps explore new areas of the hyperparameter space, increasing the likelihood of discovering better solutions and avoiding local optima.
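
The novelty idea above can be sketched in a few lines. The PR text does not give the exact distance measure, the number of neighbors, or how the two scores are combined, so the Euclidean distance, the parameter k, and the weighted sum with novelty_weight below are all assumptions for illustration; novelty_scores and combined_fitness are hypothetical names.

```python
import math
from typing import List, Sequence


def novelty_scores(population: List[Sequence[float]], k: int = 2) -> List[float]:
    """Mean Euclidean distance from each individual to its k nearest neighbors."""
    scores = []
    for i, individual in enumerate(population):
        # Distances to every other individual, closest first.
        distances = sorted(
            math.dist(individual, other)
            for j, other in enumerate(population)
            if j != i
        )
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def combined_fitness(performance: float, novelty: float, novelty_weight: float = 0.1) -> float:
    """Assumed combination: performance plus a weighted novelty bonus."""
    return performance + novelty_weight * novelty


# Individuals encoded as numeric vectors (an assumption of this sketch).
population = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
novelty = novelty_scores(population, k=2)
```

In this toy population the individual at (10.0, 10.0) sits far from the cluster near the origin, so it receives the largest novelty score and the largest bonus once the scores are combined.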
    

API Changes:

  • Dropped support for Python 3.8

@rodrigo-arenas rodrigo-arenas merged commit 1314a7c into master Sep 12, 2024
0 of 18 checks passed