This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

difference reason #2

Open
generalwave opened this issue Jun 12, 2020 · 9 comments

Comments

@generalwave

generalwave commented Jun 12, 2020

1. Batch norm: `nn.BatchNorm2d(out_channels, eps=1e-3, momentum=0.01)`
2. Padding: PyTorch pads the left/top first; TensorFlow and Keras pad the right/bottom first.
```python
from math import ceil, floor

import torch.nn as nn
import torch.nn.functional as functional


class Conv2dKeras(nn.Conv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding='same', dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        # Build the underlying conv with padding=0; we pad manually in forward().
        super(Conv2dKeras, self).__init__(
            in_channels, out_channels, kernel_size, stride,
            0, dilation, groups, bias, padding_mode)
        self.keras_mode = padding

    def _padding_size(self, size, idx):
        # Total padding needed so that output size = ceil(input / stride),
        # as in Keras/TensorFlow 'same' padding.
        output = (size[idx] + self.stride[idx] - 1) // self.stride[idx]
        padding = ((output - 1) * self.stride[idx]
                   + (self.kernel_size[idx] - 1) * self.dilation[idx]
                   + 1 - size[idx])
        return max(0, padding)

    def forward(self, x):
        if self.keras_mode == 'same':
            size = x.shape[2:]
            row = self._padding_size(size, 0)
            col = self._padding_size(size, 1)
            # When the total padding is odd, put the extra zero on the
            # right/bottom, matching TensorFlow/Keras.
            x = functional.pad(x, [floor(col / 2), ceil(col / 2),
                                   floor(row / 2), ceil(row / 2)])
        return super(Conv2dKeras, self).forward(x)
```
@james34602

@tuan3w
The only fatal error in your implementation is the concatenation.
https://github.com/deezer/spleeter/blob/39af9502ab1156c013f17f8d8cd1c53d46459857/spleeter/model/functions/unet.py#L127
Each U-Net encoder convolutional layer's output is concatenated with the corresponding decoder output.
It is the raw conv output that gets concatenated, not the encoder batch-norm or activation output.
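In code, the skip connection then carries the raw conv output. A hypothetical sketch of one encoder layer (channel counts and tensor sizes are made up, not Spleeter's actual ones):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2)
bn = nn.BatchNorm2d(16, eps=1e-3)

x = torch.randn(1, 1, 64, 64)
conv_out = conv(x)                          # what the skip connection carries
enc_out = F.leaky_relu(bn(conv_out), 0.2)   # what feeds the next encoder layer

# Decoder output at the matching resolution (stand-in tensor here)
dec_out = torch.randn_like(conv_out)
skip = torch.cat([conv_out, dec_out], dim=1)  # concat the conv output, not enc_out
```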

Minor issues to fix:

  1. Batch normalization epsilon is set to 1e-3.
  2. Leaky ReLU alpha is 0.2 in official Spleeter, not 0.3.
  3. The 4-stems model changes all encoder and decoder activations to ELU (exponential linear unit).
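A minimal PyTorch sketch of those three fixes (the channel count is a placeholder):

```python
import torch.nn as nn

# 1. Batch norm epsilon 1e-3 (PyTorch's default is 1e-5)
bn = nn.BatchNorm2d(32, eps=1e-3, momentum=0.01)

# 2. Leaky ReLU slope 0.2 (Keras' LeakyReLU defaults to 0.3)
act_2stems = nn.LeakyReLU(negative_slope=0.2)

# 3. 4-stems model: ELU activations throughout the encoder and decoder
act_4stems = nn.ELU()
```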

Here is my implementation of Spleeter in C, verified correct, with a VST demo:
https://github.com/james34602/SpleeterRT/blob/master/Source/spleeter.c

@generalwave
I don't think the problem is about CNN padding, is it?

tuan3w pushed a commit that referenced this issue Jul 12, 2020
@tuan3w
Owner

tuan3w commented Jul 12, 2020

Thanks @james34602 and @generalwave.

The output quality seems better now. However, I still see some differences in the waveform output; I'm not sure whether they come from a bug or from differences in the preprocessing step.

@james34602

james34602 commented Jul 12, 2020

@tuan3w What's the MSE/MAE of the output mask between your output and official Spleeter (TensorFlow)?
If the masks are identical or close (error on the order of 1e-3), then your implementation is correct.
You don't have to worry about differences caused by minor processing.
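The check being suggested is just an element-wise error between the two masks; a toy sketch with made-up values standing in for the flattened masks:

```python
# Toy stand-ins for the two implementations' output masks
mask_pytorch = [0.10, 0.52, 0.93, 0.31]
mask_tf      = [0.10, 0.52, 0.94, 0.30]

n = len(mask_pytorch)
mse = sum((a - b) ** 2 for a, b in zip(mask_pytorch, mask_tf)) / n
mae = sum(abs(a - b) for a, b in zip(mask_pytorch, mask_tf)) / n
print(f"MSE={mse:.1e}  MAE={mae:.1e}")  # small values -> the masks agree
```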

@tuan3w
Owner

tuan3w commented Jul 12, 2020

Hi @james34602 ,
Here are the spectrograms of the output audio.
[image: spectrogram comparison]

The top one is from my implementation, the bottom is from Spleeter. As you can see, the audio generated by Spleeter seems to have a little more noise at high frequencies than mine.

@james34602

I've been busy with my own projects recently; maybe I can help you find the remaining bugs in the future.

@generalwave
Author

@james34602 The padding difference between PyTorch and TensorFlow has quite a big impact. Training from scratch is fine, but for a model obtained by model conversion, the converted parts must keep the original padding scheme: the padding in both the CNN and the transposed CNN differs from PyTorch's, and both need changes. For reference, I just pushed my PyTorch implementation; the training and inference parts differ slightly from the original files.
https://github.com/generalwave/spleeter.pytorch

@james34602

@generalwave
In my experience, TensorFlow and Matlab padding are almost identical.
As for differences between PyTorch and TensorFlow, I don't know of any special case other than padding='same'.
I've converted SRGAN PyTorch CNN weights into Matlab, and the two predicted identical results.
Even if TensorFlow and PyTorch padding differ, in theory it can be fully resolved by pre-padding with zeros.
Spleeter has not released its official training set, so training from scratch and matching the original paper's results is impossible.

@generalwave
Author

PyTorch matching Matlab is probably because the image size and padding scheme happened to line up.
When the kernel padding is asymmetric, the zero padding is not the same.
I'm not saying the model parameters must match Spleeter's; I mean that if you use PyTorch's padding scheme you need to train from scratch, and the results can then match Spleeter's.
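A toy PyTorch demonstration of the asymmetric case: with stride 2 the total 'same' padding can be odd, and which side gets the extra zero changes the result (the input and kernel values here are made up):

```python
import torch
import torch.nn.functional as F

x = torch.arange(4.0).reshape(1, 1, 1, 4)  # toy 1x4 input: [0, 1, 2, 3]
w = torch.ones(1, 1, 1, 3)                 # toy 1x3 summing kernel

# Keras/TF 'same' with stride 2: output = ceil(4/2) = 2,
# total padding = (2-1)*2 + 3 - 4 = 1 (odd) -> TF pads [left=0, right=1]
tf_style = F.conv2d(F.pad(x, [0, 1, 0, 0]), w, stride=2)

# Putting the extra zero on the left instead changes the windows
left_style = F.conv2d(F.pad(x, [1, 0, 0, 0]), w, stride=2)

print(tf_style.flatten().tolist())    # windows [0,1,2], [2,3,0] -> [3.0, 5.0]
print(left_style.flatten().tolist())  # windows [0,0,1], [1,2,3] -> [1.0, 6.0]
```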

@james34602

Perhaps the padding in that SRGAN happened to make the input and output the same size, identical to Matlab's 'same', so the results matched.
Personally I've had no problem implementing TF or PyTorch CNNs in C: set stride, padding, dilation and offset correctly, then hand it to im2col() and gemm() and it's done.
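As a rough sketch of that im2col + GEMM recipe (pure Python and 1-D for brevity; the actual C code in SpleeterRT works on 2-D patches):

```python
# 1-D im2col: unfold the input into patches, then a matrix product
# with the kernel reproduces the (cross-correlation) convolution.
def im2col_1d(x, kernel_size, stride=1):
    return [x[i:i + kernel_size]
            for i in range(0, len(x) - kernel_size + 1, stride)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]                   # toy difference kernel

cols = im2col_1d(x, len(w), stride=1)  # [[1,2,3], [2,3,4], [3,4,5]]
# "GEMM" step: each output sample is a dot product of a patch with the kernel
out = [sum(a * b for a, b in zip(col, w)) for col in cols]
print(out)  # [-2.0, -2.0, -2.0]
```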
