This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

difference reason #2

Open
generalwave opened this issue Jun 12, 2020 · 9 comments

Comments

@generalwave

generalwave commented Jun 12, 2020

1. Batch norm: `nn.BatchNorm2d(out_channels, eps=1e-3, momentum=0.01)`
2. Padding: PyTorch pads the left/top first; TensorFlow and Keras pad the right/bottom first.
```python
from math import ceil, floor

import torch.nn as nn
import torch.nn.functional as functional


class Conv2dKeras(nn.Conv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding='same', dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        # Build the underlying conv with padding=0; we pad manually in forward().
        super(Conv2dKeras, self).__init__(
            in_channels, out_channels, kernel_size, stride,
            0, dilation, groups, bias, padding_mode)
        self.keras_mode = padding

    def _padding_size(self, size, idx):
        # Total padding needed so that output size = ceil(input / stride),
        # as in Keras/TensorFlow 'same' padding.
        output = (size[idx] + self.stride[idx] - 1) // self.stride[idx]
        padding = ((output - 1) * self.stride[idx]
                   + (self.kernel_size[idx] - 1) * self.dilation[idx]
                   + 1 - size[idx])
        return max(0, padding)

    def forward(self, x):
        if self.keras_mode == 'same':
            size = x.shape[2:]
            row = self._padding_size(size, 0)
            col = self._padding_size(size, 1)
            # When the total padding is odd, put the extra zero on the
            # right/bottom, matching TensorFlow/Keras.
            x = functional.pad(x, [floor(col / 2), ceil(col / 2),
                                   floor(row / 2), ceil(row / 2)])
        return super(Conv2dKeras, self).forward(x)
```
@james34602

@tuan3w
The only fatal error in your implementation is the concatenation.
https://github.com/deezer/spleeter/blob/39af9502ab1156c013f17f8d8cd1c53d46459857/spleeter/model/functions/unet.py#L127
Each U-Net encoder convolutional layer's output is concatenated with the corresponding decoder output.
It is the raw conv output that gets concatenated, not the encoder batch-norm or activation output.
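In code, the skip connection then carries the raw conv output. A hypothetical sketch of one encoder layer (channel counts and tensor sizes are made up, not Spleeter's actual ones):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2)
bn = nn.BatchNorm2d(16, eps=1e-3)

x = torch.randn(1, 1, 64, 64)
conv_out = conv(x)                          # what the skip connection carries
enc_out = F.leaky_relu(bn(conv_out), 0.2)   # what feeds the next encoder layer

# Decoder output at the matching resolution (stand-in tensor here)
dec_out = torch.randn_like(conv_out)
skip = torch.cat([conv_out, dec_out], dim=1)  # concat the conv output, not enc_out
```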

Minor issues to fix:

  1. Batch normalization epsilon is set to 1e-3.
  2. Leaky ReLU alpha is 0.2 in official Spleeter, not 0.3.
  3. The 4-stems model changes all encoder and decoder activations to ELU (exponential linear unit).
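A minimal PyTorch sketch of those three fixes (the channel count is a placeholder):

```python
import torch.nn as nn

# 1. Batch norm epsilon 1e-3 (PyTorch's default is 1e-5)
bn = nn.BatchNorm2d(32, eps=1e-3, momentum=0.01)

# 2. Leaky ReLU slope 0.2 (Keras' LeakyReLU defaults to 0.3)
act_2stems = nn.LeakyReLU(negative_slope=0.2)

# 3. 4-stems model: ELU activations throughout the encoder and decoder
act_4stems = nn.ELU()
```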

Here is my implementation of Spleeter in C, verified correct, with a VST demo:
https://github.com/james34602/SpleeterRT/blob/master/Source/spleeter.c

@generalwave
I don't think the problem is about CNN padding, is it?

tuan3w pushed a commit that referenced this issue Jul 12, 2020
@tuan3w
Owner

tuan3w commented Jul 12, 2020

Thanks @james34602 and @generalwave.

The output quality seems better now. However, I still see some differences in the waveform output; I'm not sure whether they come from a bug or from differences in the preprocessing step.

@james34602

james34602 commented Jul 12, 2020

@tuan3w What's the MSE/MAE of the output mask between your output and official Spleeter (TensorFlow)?
If the masks are identical or close (error on the order of 1e-3), then your implementation is correct.
You don't have to worry about differences caused by minor processing.
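The check being suggested is just an element-wise error between the two masks; a toy sketch with made-up values standing in for the flattened masks:

```python
# Toy stand-ins for the two implementations' output masks
mask_pytorch = [0.10, 0.52, 0.93, 0.31]
mask_tf      = [0.10, 0.52, 0.94, 0.30]

n = len(mask_pytorch)
mse = sum((a - b) ** 2 for a, b in zip(mask_pytorch, mask_tf)) / n
mae = sum(abs(a - b) for a, b in zip(mask_pytorch, mask_tf)) / n
print(f"MSE={mse:.1e}  MAE={mae:.1e}")  # small values -> the masks agree
```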

@tuan3w
Owner

tuan3w commented Jul 12, 2020

Hi @james34602 ,
Here are the spectrograms of the output audio.
[image: spectrogram comparison]

The top one is from my implementation, the bottom is from Spleeter. As you can see, the audio generated by Spleeter seems to have a little more noise at high frequencies than mine.

@james34602

I've been busy with my own projects recently; maybe I can help you find the remaining bugs in the future.

@generalwave
Author

@james34602 The padding difference between PyTorch and TensorFlow has quite a big impact. Training from scratch is fine, but for a model obtained by model conversion, the converted parts must keep the original padding scheme: the padding in both the CNN and the transposed CNN differs from PyTorch's, and both need changes. For reference, I just pushed my PyTorch implementation; the training and inference parts differ slightly from the original files.
https://github.com/generalwave/spleeter.pytorch

@james34602

@generalwave
In my experience, TensorFlow and Matlab padding are almost identical.
As for differences between PyTorch and TensorFlow, I don't know of any special case other than padding='same'.
I've converted SRGAN PyTorch CNN weights into Matlab, and the two predicted identical results.
Even if TensorFlow and PyTorch padding differ, in theory it can be fully resolved by pre-padding with zeros.
Spleeter has not released its official training set, so training from scratch and matching the original paper's results is impossible.

@generalwave
Author

PyTorch matching Matlab is probably because the image size and padding scheme happened to line up.
When the kernel padding is asymmetric, the zero padding is not the same.
I'm not saying the model parameters must match Spleeter's; I mean that if you use PyTorch's padding scheme you need to train from scratch, and the results can then match Spleeter's.
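A toy PyTorch demonstration of the asymmetric case: with stride 2 the total 'same' padding can be odd, and which side gets the extra zero changes the result (the input and kernel values here are made up):

```python
import torch
import torch.nn.functional as F

x = torch.arange(4.0).reshape(1, 1, 1, 4)  # toy 1x4 input: [0, 1, 2, 3]
w = torch.ones(1, 1, 1, 3)                 # toy 1x3 summing kernel

# Keras/TF 'same' with stride 2: output = ceil(4/2) = 2,
# total padding = (2-1)*2 + 3 - 4 = 1 (odd) -> TF pads [left=0, right=1]
tf_style = F.conv2d(F.pad(x, [0, 1, 0, 0]), w, stride=2)

# Putting the extra zero on the left instead changes the windows
left_style = F.conv2d(F.pad(x, [1, 0, 0, 0]), w, stride=2)

print(tf_style.flatten().tolist())    # windows [0,1,2], [2,3,0] -> [3.0, 5.0]
print(left_style.flatten().tolist())  # windows [0,0,1], [1,2,3] -> [1.0, 6.0]
```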

@james34602

Perhaps the padding in that SRGAN happened to make the input and output the same size, identical to Matlab's 'same', so the results matched.
Personally I've had no problem implementing TF or PyTorch CNNs in C: set stride, padding, dilation and offset correctly, then hand it to im2col() and gemm() and it's done.
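As a rough sketch of that im2col + GEMM recipe (pure Python and 1-D for brevity; the actual C code in SpleeterRT works on 2-D patches):

```python
# 1-D im2col: unfold the input into patches, then a matrix product
# with the kernel reproduces the (cross-correlation) convolution.
def im2col_1d(x, kernel_size, stride=1):
    return [x[i:i + kernel_size]
            for i in range(0, len(x) - kernel_size + 1, stride)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]                   # toy difference kernel

cols = im2col_1d(x, len(w), stride=1)  # [[1,2,3], [2,3,4], [3,4,5]]
# "GEMM" step: each output sample is a dot product of a patch with the kernel
out = [sum(a * b for a, b in zip(col, w)) for col in cols]
print(out)  # [-2.0, -2.0, -2.0]
```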
