Adding a seed argument to the ClusterBasedNormalizer #574

AndresAlgaba · 2022-11-02T12:00:56Z

Problem Description

Fitting the ClusterBasedNormalizer involves some randomness, so repeated fits on the same data can give different results. This also causes reproducibility issues in the other SDV libraries, e.g., sdv-dev/CTGAN#213.

Expected behavior

The BayesianGaussianMixture used to fit the distribution has a random_state argument that the ClusterBasedNormalizer could expose as a seed for reproducibility purposes (see https://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html).
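
To illustrate the idea, here is a minimal sketch using scikit-learn directly rather than the ClusterBasedNormalizer itself. The helper name fit_means, the number of components, the prior type, and the toy data are assumptions made for the example and are not taken from the RDT code. It only shows that passing the same integer random_state to BayesianGaussianMixture yields the same fitted means on the same data.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_means(data, random_state=None):
    """Fit a BayesianGaussianMixture on a 1-D array and return its component means.

    Hypothetical helper for illustration; the hyperparameters are assumptions.
    """
    model = BayesianGaussianMixture(
        n_components=3,
        weight_concentration_prior_type='dirichlet_process',
        n_init=1,
        random_state=random_state,  # fixed seed -> repeatable initialization
    )
    model.fit(data.reshape(-1, 1))
    return model.means_


# Toy bimodal column, generated with its own fixed seed.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])

first = fit_means(data, random_state=42)
second = fit_means(data, random_state=42)

# With the same seed and the same data, the two fits should agree.
same_seed_matches = np.allclose(first, second)
print(same_seed_matches)
```

A seed argument on the ClusterBasedNormalizer could simply be forwarded to random_state in the same way, with None keeping the current non-deterministic behavior.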

Additional context

I have only looked at the ClusterBasedNormalizer, but it may be that other methods could use the same approach for reproducibility purposes.

AndresAlgaba added the pending review label Nov 2, 2022

AndresAlgaba mentioned this issue Nov 2, 2022

Add seed argument to the ClusterBasedNormalizer #575

Closed

amontanez24 mentioned this issue Nov 23, 2022

Add ability to control randomness #584

Closed

npatki removed the pending review label Dec 13, 2022
