Mini-batch training on GMM #19

Daisy-GENG · 2022-05-24T16:22:15Z

Hi,

I want to implement mini-batch training for a GMM as discussed in #7. However, I am a little bit confused by the code gmm.reset_parameters(torch.Tensor(fvectors[:500].astype(np.float32))). I am not sure whether this is related to my version of pycave, or whether my understanding of the code in #7 is wrong. My code doesn't work.

My code is as follows:

import torch

from pycave.bayes.gmm import GaussianMixture as GM
from dataloader.gmm_dataset import gmm_dataset

train_gmm_dataset = gmm_dataset(data_path)
train_dataset_loader = torch.utils.data.DataLoader(dataset=train_gmm_dataset,
                                                        batch_size=train_dataloader_config["batch_size"],
                                                        shuffle=train_dataloader_config["shuffle"],
                                                        num_workers=train_dataloader_config["num_workers"])

for i, data in enumerate(train_dataset_loader):  # data:[1, pt, 3]
    data = torch.squeeze(data, 0)
    gmm = GM(num_components=2, covariance_type="diag", init_strategy="kmeans")
    gmm.model_.reset_parameters(data)  
    history = gmm.fit(train_dataset_loader)

And the error is:

`GaussianMixture` has not been fitted yet

Thank you so much!

Best regards,
Daisy

borchero · 2022-05-24T16:24:08Z

Issue #7 still referred to PyCave version 2. In PyCave v3, you don't need to call gmm.model_.reset_parameters: the model_ attribute will only be available once fit has returned without error.

I believe that this should be the line that causes your error.

Daisy-GENG · 2022-05-24T16:36:14Z

So is there a similar way to implement batch training in PyCave version 3 using a DataLoader? My whole dataset is large, so I cannot load all the data into memory at once.

Thank you so much!

Best regards,
Daisy

borchero · 2022-05-24T16:40:20Z

Ah, sorry! Yes, you can simply set the batch size when initializing the GMM. In your case, you might, for example, use:

gmm = GM(..., batch_size=8192)

This will automatically take care of loading data in batches, both for initialization and for GMM training. Note that you might be better off with init_strategy='kmeans++', since kmeans is quite costly to run. You'll need PyCave 3.1.3 for that, though (there was a bug in the kmeans++ initialization before).
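
For intuition on why batch-wise loading works for a GMM at all, here is a minimal standard-library sketch of one EM iteration that visits the data in chunks and only keeps running sums between chunks. It is not PyCave's implementation: the function names (gaussian_pdf, em_step_batched), the 1-D setup, and the toy data are all made up for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step_batched(data, weights, means, variances, batch_size):
    """One EM iteration for a 1-D GMM, reading `data` in chunks of `batch_size`.

    Hypothetical helper for illustration only (not part of PyCave).
    """
    k = len(weights)
    resp_sum = [0.0] * k  # sum of responsibilities per component
    x_sum = [0.0] * k     # responsibility-weighted sum of x
    xx_sum = [0.0] * k    # responsibility-weighted sum of x^2
    n = 0

    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]  # only this chunk is processed at a time
        for x in batch:
            # E-step: responsibilities of each component for this point
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            for j in range(k):
                r = joint[j] / total
                resp_sum[j] += r
                x_sum[j] += r * x
                xx_sum[j] += r * x * x
        n += len(batch)

    # M-step: uses only the accumulated sums, never the raw data
    new_weights = [s / n for s in resp_sum]
    new_means = [x_sum[j] / resp_sum[j] for j in range(k)]
    new_vars = [max(xx_sum[j] / resp_sum[j] - new_means[j] ** 2, 1e-6)
                for j in range(k)]
    return new_weights, new_means, new_vars


# Toy data and starting parameters (assumed values, chosen only for the demo)
data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
init = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])

full = em_step_batched(data, *init, batch_size=len(data))  # everything in one batch
mini = em_step_batched(data, *init, batch_size=3)          # same update, three chunks
```

Because the M-step depends only on the accumulated sums, the chunked pass gives the same parameter update as a pass over the full data, so the batch size affects memory use rather than the result of an iteration in this sketch.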

justuswill mentioned this issue on Mar 9, 2023: GMM with Mini-Batches #51 (Open)

hashim19 mentioned this issue on Jan 2, 2024: GMM Training with Mini-Batch #57 (Open)
