Modify Tybalt to handle missing values for incomplete data #156
Hi Yagmur, do you have details on how you are implementing this? In prior work with a different architecture, we used https://www.biorxiv.org/content/10.1101/039800v1.full. I think we would need more details to provide any guidance.
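For context, the zero-masking corruption described in that line of work amounts to something like the following sketch (an illustration of the general denoising-autoencoder technique, not code from that paper):

```python
import numpy as np

def corrupt(x, fraction=0.1, seed=None):
    """Zero out a random `fraction` of the entries of x (zero-masking noise)."""
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) >= fraction  # False at positions to corrupt
    return x * keep
```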
Dear Dr. Greene, thank you for your reply and the paper. I see that in the paper the corrupted values are masked with zeros. In my case, the original data set may itself contain zeros, while missing values are represented separately as numpy.nan, so I cannot simply overwrite the missing values with zeros during preprocessing. I believe replacing the missing values (numpy.nan) with any value would affect the binary cross-entropy loss, even if I multiply the input vector and the reconstruction by the "missingness vector" m at the end. Please correct me if I am wrong. Instead, I need to omit the missing values when calculating the loss. What I need is therefore a loss function that creates a mask for the missing values in the original data and applies this mask to the original and predicted values before the calculation.

To sum up, the pipeline we have in mind is as follows:

1. Get the original data, which may initially contain missing values (numpy.nan), and preprocess it while omitting the missing values.

As far as I can tell, I only need to modify the reconstruction error, keras.metrics.binary_crossentropy(), and not the KL term. I have therefore been working on a custom binary cross-entropy function that masks the original and predicted values wherever the original data is missing:
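A minimal sketch of what such a masked loss could look like (an illustration assuming a TensorFlow backend, not necessarily the exact function discussed in this thread):

```python
import tensorflow as tf
import keras.backend as K

def masked_binary_crossentropy(y_true, y_pred):
    # True where the original value is missing (NaN), False otherwise.
    missing = tf.math.is_nan(y_true)
    mask = K.cast(tf.logical_not(missing), K.floatx())
    # Overwrite NaNs with 0 so the log terms stay finite; the mask
    # removes their contribution from the loss afterwards.
    y_true = tf.where(missing, tf.zeros_like(y_true), y_true)
    # Clip predictions away from exactly 0 and 1, as Keras does internally.
    eps = K.epsilon()
    y_pred = K.clip(y_pred, eps, 1.0 - eps)
    ce = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    # Average each sample's loss over its observed entries only.
    return K.sum(ce * mask, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)
```

Dividing by the count of observed entries keeps the per-sample scale comparable to keras.metrics.binary_crossentropy, which averages over all features.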
After testing this function with the variables in the code snippet below, the loss is 0.659456. However, when I use it in place of keras.metrics.binary_crossentropy(), the loss graph is empty and both axes show unexpected tick values (-0.04, -0.02, 0, 0.02, 0.04). Do I need to make other modifications to the training pipeline or the model? I am also not sure whether I need to keep the mean calculation at the end of the custom loss function. Is vae_loss() calculated per sample? Thank you very much for your time and suggestions!
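For reference, a smoke test of a loss like the one sketched above could look like this (the arrays here are illustrative assumptions, not the variables from the original snippet):

```python
import numpy as np
import keras.backend as K

# One sample with four features, one of which is missing.
y_true = np.array([[1.0, 0.0, np.nan, 1.0]], dtype="float32")
y_pred = np.array([[0.9, 0.2, 0.5, 0.7]], dtype="float32")

loss = masked_binary_crossentropy(K.constant(y_true), K.constant(y_pred))
print(K.eval(loss))  # mean cross-entropy over the three observed entries
```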
Nice explanation @yagmuronay - a couple quick things to consider:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Dear Dr. Greg,
Firstly, thank you very much for this well-documented research. This is not an issue but rather a feature that I need for my thesis: I would like to modify this model so that it can handle missing values in the training and test data, because the data set I want to use contains missing values in every sample. Imputing the missing values before training, e.g. with zeros, is not an option, as this would introduce bias and the value zero has a meaning in our data set. I therefore started by replacing the reconstruction error with a custom binary cross-entropy function that can handle missing values. I tested this function in another notebook and it seemed to work. However, I observe NaN loss values during training, even when using only the reconstruction error. If you have any tips on how to handle missing values in the model, I would be very grateful for your help.
Kind regards,
Yagmur Onay
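One plausible source of the NaN losses (an assumption, not a confirmed diagnosis for this issue): NaNs in the network input propagate through every layer, so the total loss becomes NaN even when the loss function masks missing targets. A common workaround is to zero-fill the encoder input and keep the NaNs only in the targets seen by the masked loss:

```python
import numpy as np

# Illustrative data with injected missingness (hypothetical shapes).
rng = np.random.default_rng(0)
x_train = rng.random((100, 20)).astype("float32")
x_train[rng.random(x_train.shape) < 0.1] = np.nan

x_input = np.nan_to_num(x_train, nan=0.0)   # what the encoder sees
x_target = x_train                          # what the masked loss sees

# Hypothetical call; Tybalt's VAE uses x as both input and target,
# so the training setup would need to accept a separate target:
# vae.fit(x_input, x_target, epochs=..., batch_size=...)
```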