Skip to content
This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

fix: change padding style to tf style #4

Merged
merged 1 commit into from
Jun 24, 2022

Conversation

unemployed-denizen
Copy link
Contributor

Main changes are the paddings in convolution layers (I have done the experiments on tf to make sure that the padding works). I also rewiring the last down conv layer, since it is not used in the tf version. Please check the examples to see if the quality of output is increased.
I also edited test_estimator.py so that it fits the new version of pytorch, and the inference is also supporting GPU.

@tuan3w
Copy link
Owner

tuan3w commented Jun 18, 2022

Hi @unemployed-denizen,
Thanks for this investigation. I haven't learned more about Pytorch/TensorFlow for years, so I'm not sure about this.
Just checked Spectrogram in Audacity and spectrogram differences. Here is what I found.

  • Spectrogram in Audacity:

Screenshot from 2022-06-18 18-57-50

Screenshot from 2022-06-18 18-52-18
From top to bottom: Old result, reference result, new result.

  • The shape of output:
    • Old result: [2, 481280]
    • Reference & new result: [2, 479832]
  • AVG Spec difference between each result and the original output:
    • Old result: 1.8341962099075317
    • New result: 0.0863308310508728

The shape of the new output matches the shape of the original one but there is still a large enough difference. Do you have any ideas to improve this?

@unemployed-denizen
Copy link
Contributor Author

Hi @unemployed-denizen, Thanks for this investigation. I haven't learned more about Pytorch/TensorFlow for years, so I'm not sure about this. Just checked Spectrogram in Audacity and spectrogram differences. Here is what I found.

  • Spectrogram in Audacity:

Screenshot from 2022-06-18 18-57-50

Screenshot from 2022-06-18 18-52-18 From top to bottom: Old result, reference result, new result.

  • The shape of output:

    • Old result: [2, 481280]
    • Reference & new result: [2, 479832]
  • AVG Spec difference between each result and the original output:

    • Old result: 1.8341962099075317
    • New result: 0.0863308310508728

The shape of the new output matches the shape of the original one but there is still a large enough difference. Do you have any ideas to improve this?

For shape issue, it's all about how the padding of stft and istft works. I changed both of them to 'centre' so that the padding works in the right way.
For computation accuracy, it might depend on the stft lib and framwork itself. Even if the procedure is 100% same, same input will give different output (the difference could be small but would not be 100% same). It would be hard to figure out what happends in the underlying computation process.
By the way, I would recommend Demucs v3.0.4, since it is SOTA in 2022.

@tuan3w
Copy link
Owner

tuan3w commented Jun 24, 2022

Hi @unemployed-denizen,

Sorry for the late reply. Actually, I don't have much passion for ML nowadays because ML is now more about scaling computing power.
Yes, I'm using the latest version of Demucs. My evaluation shows that your result is better, but checking the spectrogram shows that my old result seems better at higher frequencies.
image

However, I'm going to merge this. Anyone who uses this library should be aware of this difference.

Again, thanks for your time.

Ref issue: #2

@tuan3w tuan3w changed the title changed padding to tf style fix: change padding style to tf style Jun 24, 2022
@tuan3w tuan3w merged commit 88249f5 into tuan3w:master Jun 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants