
Training with my own data #48

Closed
cszer opened this issue Jun 21, 2020 · 9 comments

cszer commented Jun 21, 2020

Hello, thanks for this awesome project. I have a strange issue. I prepared my own dataset with images of size 542x1024, and when training starts I always get:

```
N/A% (0 of 200) | | Elapsed Time: 0:00:00 ETA: --:--:--
N/A% (0 of 946) | | Elapsed Time: 0:00:00 ETA: --:--:--
[torch.Size([2, 256, 34, 64]), torch.Size([2, 256, 34, 64])]
[torch.Size([2, 128, 68, 128]), torch.Size([2, 128, 68, 128])]
[torch.Size([2, 64, 136, 256]), torch.Size([2, 64, 136, 256])]
[torch.Size([2, 32, 272, 512]), torch.Size([2, 64, 271, 512])]
Dimension error when torch.cat(x,1)
```

Maybe it's a stride/padding issue. Please help me.


JiawangBian commented Jun 21, 2020

Often we require that the image width and height be divisible by 64. Here 1024/64 = 16, which is fine, but 542/64 ≈ 8.47 is not. So I suggest that you cut the top border of the image to make the height 8*64 = 512. Also, do not forget to change the intrinsic parameters (c_y = c_y - offset_y).
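A minimal sketch of that crop-plus-intrinsics adjustment (the helper name and the example intrinsics values are made up for illustration; this function is not part of the repo):

```python
import numpy as np

def crop_to_multiple(img, K, multiple=64):
    """Crop rows off the top of `img` so its height is a multiple of
    `multiple`, and shift the principal point c_y accordingly.
    img: H x W (x C) array; K: 3x3 intrinsics matrix."""
    h = img.shape[0]
    new_h = (h // multiple) * multiple
    offset_y = h - new_h              # rows removed from the top
    img = img[offset_y:]              # keep the bottom new_h rows
    K = K.copy()
    K[1, 2] -= offset_y               # c_y = c_y - offset_y
    return img, K

# Example with the 542x1024 images from this issue (intrinsics are dummies):
img = np.zeros((542, 1024, 3), dtype=np.uint8)
K = np.array([[720.0, 0.0, 512.0],
              [0.0, 720.0, 271.0],
              [0.0, 0.0, 1.0]])
cropped, K2 = crop_to_multiple(img, K)
# cropped height: 512; c_y: 271 - 30 = 241
```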


cszer commented Jun 22, 2020

Thank you, it works! But there is a new issue now, a problem with nn.DataParallel: `RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)`. When I use only one card everything is fine, but it's impossible to train on a single 2070 Super.


cszer commented Jun 22, 2020

This issue occurs at the decoder stage, in every network (disp net, pose net).

JiawangBian (Owner) commented:

I suggest that you train the model on one GPU, because the batch size (4) is small. You can also downsample your images to half resolution, i.e., 256x512. If you want to try multi-GPU training, I suggest that you replace the DepthDecoder with the following parallel version.

```python
import numpy as np
import torch
import torch.nn as nn

# ConvBlock, Conv3x3, and upsample come from the repo's layers module.

class DepthDecoder_parallel(nn.Module):
    def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1, use_skips=True):
        super(DepthDecoder_parallel, self).__init__()

        self.alpha = 10
        self.beta = 0.01

        self.num_output_channels = num_output_channels
        self.use_skips = use_skips
        self.upsample_mode = 'nearest'
        self.scales = scales

        self.num_ch_enc = num_ch_enc
        self.num_ch_dec = np.array([16, 32, 64, 128, 256])

        # decoder: build plain lists first, then convert them to nn.ModuleList
        # below so that every conv is registered and replicated by DataParallel
        self.upconvs0 = []
        self.upconvs1 = []
        self.dispconvs = []
        self.i_to_scaleIdx_conversion = {}

        for i in range(4, -1, -1):
            # upconv_0
            num_ch_in = self.num_ch_enc[-1] if i == 4 else self.num_ch_dec[i + 1]
            num_ch_out = self.num_ch_dec[i]
            self.upconvs0.append(ConvBlock(num_ch_in, num_ch_out))

            # upconv_1
            num_ch_in = self.num_ch_dec[i]
            if self.use_skips and i > 0:
                num_ch_in += self.num_ch_enc[i - 1]
            num_ch_out = self.num_ch_dec[i]
            self.upconvs1.append(ConvBlock(num_ch_in, num_ch_out))

        for cnt, s in enumerate(self.scales):
            self.dispconvs.append(Conv3x3(self.num_ch_dec[s], self.num_output_channels))
            if s in range(4, -1, -1):
                self.i_to_scaleIdx_conversion[s] = cnt

        self.upconvs0 = nn.ModuleList(self.upconvs0)
        self.upconvs1 = nn.ModuleList(self.upconvs1)
        self.dispconvs = nn.ModuleList(self.dispconvs)
        self.sigmoid = nn.Sigmoid()

    def init_weights(self):
        return

    def forward(self, input_features):
        self.outputs = []

        # decoder
        x = input_features[-1]
        for cnt, i in enumerate(range(4, -1, -1)):
            x = self.upconvs0[cnt](x)
            x = [upsample(x)]
            if self.use_skips and i > 0:
                x += [input_features[i - 1]]
            x = torch.cat(x, 1)
            x = self.upconvs1[cnt](x)
            if i in self.scales:
                idx = self.i_to_scaleIdx_conversion[i]
                self.outputs.append(self.alpha * self.sigmoid(self.dispconvs[idx](x)) + self.beta)

        self.outputs = self.outputs[::-1]
        return self.outputs
```

JiawangBian (Owner) commented:

and replace the PoseDecoder with:

```python
import torch
import torch.nn as nn

class PoseDecoder_Parallel(nn.Module):
    def __init__(self, num_ch_enc, num_input_features=1, num_frames_to_predict_for=1, stride=1):
        super(PoseDecoder_Parallel, self).__init__()

        self.num_ch_enc = num_ch_enc
        self.num_input_features = num_input_features

        if num_frames_to_predict_for is None:
            num_frames_to_predict_for = num_input_features - 1
        self.num_frames_to_predict_for = num_frames_to_predict_for

        self.conv_squeeze = nn.Conv2d(self.num_ch_enc[-1], 256, 1)

        self.convs_pose = []
        self.convs_pose.append(nn.Conv2d(num_input_features * 256, 256, 3, stride, 1))
        self.convs_pose.append(nn.Conv2d(256, 256, 3, stride, 1))
        self.convs_pose.append(nn.Conv2d(256, 6 * num_frames_to_predict_for, 1))

        self.relu = nn.ReLU()

        # register the convs so DataParallel replicates them
        self.convs_pose = nn.ModuleList(self.convs_pose)

    def forward(self, input_features):
        last_features = [f[-1] for f in input_features]

        cat_features = [self.relu(self.conv_squeeze(f)) for f in last_features]
        cat_features = torch.cat(cat_features, 1)

        out = cat_features
        for i in range(3):
            out = self.convs_pose[i](out)
            if i != 2:
                out = self.relu(out)

        out = out.mean(3).mean(2)
        pose = 0.01 * out.view(-1, 6)
        return pose
```
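As an aside, the earlier suggestion to downsample images to half resolution also requires rescaling the camera intrinsics. A minimal sketch (the helper name and example values are hypothetical, not from the repo):

```python
import numpy as np

def downsample_intrinsics(K, sx, sy):
    """Scale a 3x3 intrinsics matrix when resizing the image
    by factors (sx, sy) in width and height."""
    K = K.copy()
    K[0] *= sx   # f_x and c_x scale with the width
    K[1] *= sy   # f_y and c_y scale with the height
    return K

# Halving a 512x1024 image in both dimensions (dummy intrinsics):
K = np.array([[720.0, 0.0, 512.0],
              [0.0, 720.0, 271.0],
              [0.0, 0.0, 1.0]])
K_half = downsample_intrinsics(K, 0.5, 0.5)
# f_x: 720 -> 360, c_x: 512 -> 256, c_y: 271 -> 135.5
```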


cszer commented Jun 22, 2020

Thanks, I simply rewrote your code and the issue is solved.
[Two screenshots attached: 2020-06-22 17-07-54 and 2020-06-22 17-07-51]


cszer commented Jun 22, 2020

I think it was an OrderedDict issue.
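For context, this is likely why the ModuleList-based rewrite above works: submodules stored in a plain Python container (a list or an OrderedDict) are not registered with the parent nn.Module, so nn.DataParallel never replicates them to the other GPUs and their weights stay on device 0. A minimal demonstration:

```python
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = [nn.Conv2d(3, 8, 3)]  # NOT registered as a submodule

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv2d(3, 8, 3)])  # registered

# Parameters hidden in a plain list are invisible to .parameters(), .to(),
# .cuda(), and to nn.DataParallel's per-GPU replication.
print(len(list(PlainList().parameters())))   # 0
print(len(list(Registered().parameters())))  # 2 (weight + bias)
```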

@cszer cszer closed this as completed Jun 22, 2020
@JiawangBian JiawangBian added the good first issue Good for newcomers label Jun 26, 2020
@JiawangBian JiawangBian pinned this issue Jun 26, 2020
zhengmiao1996 commented:

> Hello, thanks for this awesome project. I have a strange issue. I prepared my own dataset with images of size 542x1024, and when training starts I always get:
> ```
> N/A% (0 of 200) | | Elapsed Time: 0:00:00 ETA: --:--:--
> N/A% (0 of 946) | | Elapsed Time: 0:00:00 ETA: --:--:--
> [torch.Size([2, 256, 34, 64]), torch.Size([2, 256, 34, 64])]
> [torch.Size([2, 128, 68, 128]), torch.Size([2, 128, 68, 128])]
> [torch.Size([2, 64, 136, 256]), torch.Size([2, 64, 136, 256])]
> [torch.Size([2, 32, 272, 512]), torch.Size([2, 64, 271, 512])]
> Dimension error when torch.cat(x,1)
> ```
> Maybe it's a stride/padding issue. Please help me.

I have trouble preparing my own data. Can you show me your code for preparing your own data? Thanks!

JiawangBian (Owner) commented:

The image resolution should be divisible by 32, so you can change the resolution to 512x1024.
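A quick, hypothetical helper for finding the nearest compatible resolution (rounding down; not part of the repo):

```python
def nearest_valid(h, w, multiple=32):
    """Round height and width down to the nearest multiple,
    e.g. as a target size before resizing or cropping."""
    return (h // multiple) * multiple, (w // multiple) * multiple

print(nearest_valid(542, 1024))  # (512, 1024)
```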
