
efficientAD_det: model.ae.encoder.enconv1.weight 1536, there was a problem during conversion #1605

Open
watertianyi opened this issue Dec 5, 2024 · 13 comments

@watertianyi commented Dec 5, 2024

Env

  • GPU: RTX 3060 Ti
  • OS: Ubuntu 20.04
  • CUDA: 11.8
  • TensorRT: 8.5.3.1

Your problem

I trained the model through anomalib and used https://github.com/wang-xinyu/tensorrtx/tree/master/efficient_ad. Can you help me find out what the problem is? I am using the small version.

```
mkdir build
cd build
cmake ..
make
./efficientAD_det -s model.wts model.engine
```

The following error appeared:
```
[12/05/2024-13:39:40] [E] [TRT] 3: ae.encoder.enconv1:kernel weights has count 0 but 1536 was expected
[12/05/2024-13:39:40] [E] [TRT] 4: ae.encoder.enconv1: count of 0 weights in kernel, but kernel dimensions (4,4) with 3 input channels, 32 output channels and 1 groups were specified. Expected Weights count is 3 * 4*4 * 32 / 1 = 1536
[12/05/2024-13:39:40] [E] [TRT] 4: [convolutionNode.cpp::computeOutputExtents::58] Error Code 4: Internal Error (ae.encoder.enconv1: number of kernel weights does not match tensor dimensions)
[12/05/2024-13:39:40] [E] [TRT] 3: [network.cpp::addResize::1382] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addResize::1382, condition: input.getDimensions().nbDims > 0)
efficientAD_det: /tensorrt_ad/efficient_ad/src/model.cpp:242: nvinfer1::ILayer interpConvRelu(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, int, int, int, int, int, std::string, int): Assertion `interpolateLayer != nullptr' failed.
Aborted (core dumped)
```

The parsed wts content is shown in this screenshot: (screenshot: 2024-12-05 15-10-40)
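For anyone hitting the same count-0 error, the mismatch is usually visible by listing what the wts file actually contains. A standalone diagnostic sketch, not part of the repo; it only assumes the standard tensorrtx .wts layout (a count on the first line, then one `name size hex hex ...` line per tensor):

```cpp
// dump_wts.cpp -- hypothetical diagnostic tool: print every tensor name and
// element count in a .wts file, so a naming mismatch such as
// "ae.encoder.enconv1" vs. "model.ae.encoder.enconv1" shows up at a glance.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "usage: dump_wts <file.wts>\n";
        return 1;
    }
    std::ifstream input(argv[1]);
    int32_t count = 0;
    input >> count;  // first line of a .wts file: number of tensors
    for (int32_t i = 0; i < count; ++i) {
        std::string name;
        uint32_t size = 0;
        input >> name >> std::dec >> size;
        std::cout << name << "  count=" << size << "\n";
        for (uint32_t j = 0; j < size; ++j) {  // skip the hex-encoded values
            std::string hex;
            input >> hex;
        }
    }
    return 0;
}
```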

@watertianyi (Author)

Running gen_wts.py with the model from https://github.com/B1SH0PP/EfficientAD_TRT, the error is as follows:

```
Traceback (most recent call last):
  File "/media/hjq/EC3C5BDA3C5B9E80/win10/hjq_code/gpt/CV/AD/tensorrt_ad/efficient_ad/datas/models1/gen_wts.py", line 12, in <module>
    model = torch.load(pt_file, map_location=torch.device('cuda'))
  File "/media/hjq/EC3C5BDA3C5B9E80/ubuntu/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/torch/serialization.py", line 1360, in load
    return _load(
  File "/media/hjq/EC3C5BDA3C5B9E80/ubuntu/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/torch/serialization.py", line 1848, in _load
    result = unpickler.load()
  File "/media/hjq/EC3C5BDA3C5B9E80/ubuntu/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/torch/serialization.py", line 1837, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'anomalib.models.efficient_ad'
```

@wang-xinyu (Owner)

@B1SH0PP Can you help here?

@watertianyi (Author)

@wang-xinyu

The first problem has been solved, but I don't know how to solve the second one. Another question: during TensorRT inference, why is the inference time longer when the batch size is set to 4, 8, 16, or 32 than when it is set to 1? How can this be solved?

@watertianyi (Author) commented Dec 13, 2024

@wang-xinyu
The model was trained on my own data in anomalib. The input data in C++ is the same as in Python (I compared them), but the output data is very different. During TensorRT inference I cannot print the model's intermediate layer outputs. How do I resolve the accuracy loss?
C++ output: (screenshot: Image_20241213181825)
Python output: (screenshot: Image_20241213181845)

@watertianyi (Author)

@wang-xinyu @B1SH0PP
After a period of troubleshooting, I found something more serious than the accuracy loss: no matter what data is input, the output is the same. I suspect the wts file is damaged, so I modified a few places:

  1. The author uses efficientAD_medium, but I trained efficientAD_small, so I modified the network definition:

```cpp
/* PDN_small_teacher */
// no BN added after the convolutional layer
// auto BN2 = NormalizeInput(network, *InputData);
auto teacher1 = convRelu(network, weightMap, *InputData, 128, 4, 1, 0, 1, "teacher.conv1", true);
auto avgPool1 = avgPool2d(network, *teacher1->getOutput(0), 2, 2, 0);
auto teacher2 = convRelu(network, weightMap, *avgPool1->getOutput(0), 256, 4, 1, 0, 1, "teacher.conv2", true);
auto avgPool2 = avgPool2d(network, *teacher2->getOutput(0), 2, 2, 0);
auto teacher3 = convRelu(network, weightMap, *avgPool2->getOutput(0), 256, 3, 1, 0, 1, "teacher.conv3", true);
auto teacher4 = convRelu(network, weightMap, *teacher3->getOutput(0), 384, 4, 1, 0, 1, "teacher.conv4", false);

/* PDN_small_student */
// auto BN3 = NormalizeInput(network, *InputData);
auto student1 = convRelu(network, weightMap, *InputData, 128, 4, 1, 0, 1, "student.conv1", true);
auto avgPool3 = avgPool2d(network, *student1->getOutput(0), 2, 2, 0);
auto student2 = convRelu(network, weightMap, *avgPool3->getOutput(0), 256, 4, 1, 0, 1, "student.conv2", true);
auto avgPool4 = avgPool2d(network, *student2->getOutput(0), 2, 2, 0);
auto student3 = convRelu(network, weightMap, *avgPool4->getOutput(0), 256, 3, 1, 0, 1, "student.conv3", true);
auto student4 = convRelu(network, weightMap, *student3->getOutput(0), 768, 4, 1, 0, 1, "student.conv4", false);
```

  2. When converting the trained anomalib .pt model to wts, every key carries an extra `model` field (red box in the screenshot below). To convert the engine successfully, I deleted the `model` field. (screenshot: Image_20241215163922)

Changes 1 and 2 above lead to a serious loss of accuracy, and also cause different inputs to produce identical outputs.

  1. Can the model name field in the wts be modified?
  2. I restored the field in the modified code to `model`, and the following error appeared:

```
[12/15/2024-16:26:54] [E] [TRT] 3: model.ae.decoder.deconv1.conv:kernel weights has count 65536 but 1024 was expected
[12/15/2024-16:26:54] [E] [TRT] 4: model.ae.decoder.deconv1.conv: count of 65536 weights in kernel, but kernel dimensions (4,4) with 1 input channels, 64 output channels and 1 groups were specified. Expected Weights count is 1 * 4*4 * 64 / 1 = 1024
[12/15/2024-16:26:54] [E] [TRT] 4: [convolutionNode.cpp::computeOutputExtents::58] Error Code 4: Internal Error (model.ae.decoder.deconv1.conv: number of kernel weights does not match tensor dimensions)
[12/15/2024-16:26:54] [E] [TRT] 3: [network.cpp::addResize::1382] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addResize::1382, condition: input.getDimensions().nbDims > 0)
```

It's urgent. Can you help me?
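One way to avoid editing the wts file at all is to make the C++ lookup tolerant of the prefix. A hedged sketch; the helper name is hypothetical and not from the repo:

```cpp
// Hypothetical lookup helper (not part of tensorrtx): resolve a weight under
// either the bare layer name or the anomalib-style "model." prefix, so the
// .wts file can be left exactly as gen_wts.py wrote it.
#include <NvInfer.h>
#include <cassert>
#include <map>
#include <string>

static nvinfer1::Weights getWeight(std::map<std::string, nvinfer1::Weights>& weightMap,
                                   const std::string& name) {
    auto it = weightMap.find(name);
    if (it == weightMap.end()) {
        it = weightMap.find("model." + name);  // fall back to the prefixed key
    }
    assert(it != weightMap.end() && "weight key not found under either name");
    return it->second;
}
```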

@watertianyi (Author)

@wang-xinyu @B1SH0PP

I also found a problem: the author uses an input with three dimensions, while the Python version has four dimensions, with a batch size added. Does the code need to be modified?
```cpp
/* create network object */
INetworkDefinition* network = builder->createNetworkV2(0U);  // parse the ONNX network file; TensorRT model class
// INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

/* create input tensor {3, kInputH, kInputW} */
// ITensor* InputData = network->addInput(kInputTensorName, dt, Dims4{kBatchSize, 3, kInputH, kInputW});
ITensor* InputData = network->addInput(kInputTensorName, dt, Dims3{3, kInputH, kInputW});
assert(InputData);
```
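For reference, `createNetworkV2(0U)` builds an implicit-batch network: the `Dims3` input describes a single image and the batch size is supplied at execution time, which is why the three-dimensional input can still serve PyTorch's four-dimensional NCHW batches. A hedged sketch of the two execution styles; `context` and `buffers` are assumed to come from the repo's existing inference setup:

```cpp
#include <NvInfer.h>

static void runBatch(nvinfer1::IExecutionContext* context, void** buffers, int batchSize) {
    // Implicit batch (matches createNetworkV2(0U) + the Dims3 input): the
    // batch size is a runtime argument. Deprecated in TRT 8.x, still available.
    context->execute(batchSize, buffers);

    // Explicit batch (matches the commented-out kEXPLICIT_BATCH + Dims4 input):
    // the batch is part of the tensor shape, so no batch argument is passed.
    // context->executeV2(buffers);
}
```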

@wang-xinyu (Owner)

3-dimension is fine. Can you try the same model which B1SH0PP was using?

@watertianyi (Author)

OK, I'll try.

@watertianyi (Author) commented Dec 16, 2024

@wang-xinyu
I tried the author's original code and there was no problem after testing. However, when I ran the modified code, the accuracy was 0.1 worse.

```cpp
ICudaEngine* build_efficientAD_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config,
                                      DataType dt, float& gd, float& gw, std::string& wts_name) {
    /* create network object */
    INetworkDefinition* network = builder->createNetworkV2(0U);

    /* create input tensor {3, kInputH, kInputW} */
    ITensor* InputData = network->addInput(kInputTensorName, dt, Dims3{3, kInputH, kInputW});
    assert(InputData);

    /* create weight map */
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* AE */
    // auto BN1 = NormalizeInput(network, *InputData);
    // encoder
    auto enconv1 = convRelu(network, weightMap, *InputData, 32, 4, 2, 1, 1, "model.ae.encoder.enconv1", true);
    auto enconv2 = convRelu(network, weightMap, *enconv1->getOutput(0), 32, 4, 2, 1, 1, "model.ae.encoder.enconv2", true);
    auto enconv3 = convRelu(network, weightMap, *enconv2->getOutput(0), 64, 4, 2, 1, 1, "model.ae.encoder.enconv3", true);
    auto enconv4 = convRelu(network, weightMap, *enconv3->getOutput(0), 64, 4, 2, 1, 1, "model.ae.encoder.enconv4", true);
    auto enconv5 = convRelu(network, weightMap, *enconv4->getOutput(0), 64, 4, 2, 1, 1, "model.ae.encoder.enconv5", true);
    auto enconv6 = convRelu(network, weightMap, *enconv5->getOutput(0), 64, 8, 1, 0, 1, "model.ae.encoder.enconv6", false);
    // decoder
    auto deconv1 = interpConvRelu(network, weightMap, *enconv6->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv1", 3);
    auto deconv2 = interpConvRelu(network, weightMap, *deconv1->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv2", 8);
    auto deconv3 = interpConvRelu(network, weightMap, *deconv2->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv3", 15);
    auto deconv4 = interpConvRelu(network, weightMap, *deconv3->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv4", 32);
    auto deconv5 = interpConvRelu(network, weightMap, *deconv4->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv5", 63);
    auto deconv6 = interpConvRelu(network, weightMap, *deconv5->getOutput(0), 64, 4, 1, 2, 1, "model.ae.decoder.deconv6", 127);
    auto deconv7 = interpConvRelu(network, weightMap, *deconv6->getOutput(0), 64, 3, 1, 1, 1, "model.ae.decoder.deconv7", 56);
    auto deconv8 = convRelu(network, weightMap, *deconv7->getOutput(0), 384, 3, 1, 1, 1, "model.ae.decoder.deconv8", false);

    /* PDN_small_teacher */
    auto teacher1 = convRelu(network, weightMap, *InputData, 128, 4, 1, 0, 1, "model.teacher.conv1", true);
    auto avgPool1 = avgPool2d(network, *teacher1->getOutput(0), 2, 2, 0);
    auto teacher2 = convRelu(network, weightMap, *avgPool1->getOutput(0), 256, 4, 1, 0, 1, "model.teacher.conv2", true);
    auto avgPool2 = avgPool2d(network, *teacher2->getOutput(0), 2, 2, 0);
    auto teacher3 = convRelu(network, weightMap, *avgPool2->getOutput(0), 256, 3, 1, 0, 1, "model.teacher.conv3", true);
    auto teacher4 = convRelu(network, weightMap, *teacher3->getOutput(0), 384, 4, 1, 0, 1, "model.teacher.conv4", false);

    /* PDN_small_student */
    auto student1 = convRelu(network, weightMap, *InputData, 128, 4, 1, 0, 1, "model.student.conv1", true);
    auto avgPool3 = avgPool2d(network, *student1->getOutput(0), 2, 2, 0);
    auto student2 = convRelu(network, weightMap, *avgPool3->getOutput(0), 256, 4, 1, 0, 1, "model.student.conv2", true);
    auto avgPool4 = avgPool2d(network, *student2->getOutput(0), 2, 2, 0);
    auto student3 = convRelu(network, weightMap, *avgPool4->getOutput(0), 256, 3, 1, 0, 1, "model.student.conv3", true);
    auto student4 = convRelu(network, weightMap, *student3->getOutput(0), 768, 4, 1, 0, 1, "model.student.conv4", false);

    /* PDN_medium_teacher */
    // no BN added after the convolutional layer
    // auto teacher1 = convRelu(network, weightMap, *InputData, 256, 4, 1, 0, 1, "teacher.conv1", true);
    // auto avgPool1 = avgPool2d(network, *teacher1->getOutput(0), 2, 2, 0);
    // auto teacher2 = convRelu(network, weightMap, *avgPool1->getOutput(0), 512, 4, 1, 0, 1, "teacher.conv2", true);
    // auto avgPool2 = avgPool2d(network, *teacher2->getOutput(0), 2, 2, 0);
    // auto teacher3 = convRelu(network, weightMap, *avgPool2->getOutput(0), 512, 1, 1, 0, 1, "teacher.conv3", true);
    // auto teacher4 = convRelu(network, weightMap, *teacher3->getOutput(0), 512, 3, 1, 0, 1, "teacher.conv4", true);
    // auto teacher5 = convRelu(network, weightMap, *teacher4->getOutput(0), 384, 4, 1, 0, 1, "teacher.conv5", true);
    // auto teacher6 = convRelu(network, weightMap, *teacher5->getOutput(0), 384, 1, 1, 0, 1, "teacher.conv6", false);

    // /* PDN_medium_student */
    // auto student1 = convRelu(network, weightMap, *InputData, 256, 4, 1, 0, 1, "student.conv1", true);
    // auto avgPool3 = avgPool2d(network, *student1->getOutput(0), 2, 2, 0);
    // auto student2 = convRelu(network, weightMap, *avgPool3->getOutput(0), 512, 4, 1, 0, 1, "student.conv2", true);
    // auto avgPool4 = avgPool2d(network, *student2->getOutput(0), 2, 2, 0);
    // auto student3 = convRelu(network, weightMap, *avgPool4->getOutput(0), 512, 1, 1, 0, 1, "student.conv3", true);
    // auto student4 = convRelu(network, weightMap, *student3->getOutput(0), 512, 3, 1, 0, 1, "student.conv4", true);
    // auto student5 = convRelu(network, weightMap, *student4->getOutput(0), 768, 4, 1, 0, 1, "student.conv5", true);
    // auto student6 = convRelu(network, weightMap, *student5->getOutput(0), 768, 1, 1, 0, 1, "student.conv6", false);

    /* postCalculate */
    auto normal_teacher_output = NormalizeTeacherMap(network, weightMap, *teacher4->getOutput(0));
    std::vector<ITensor*> layer_vec{};
    slice(network, *student4->getOutput(0), layer_vec);
```

The normalization helpers also use the `model.`-prefixed keys:

```cpp
static ILayer* NormalizeFinalMap(INetworkDefinition* network, std::map<std::string, Weights>& weightMap,
                                 ITensor& input, std::string name) {
    float* qa = (float*)weightMap["model.quantiles.qa_" + name].values;
    float* qb = (float*)weightMap["model.quantiles.qb_" + name].values;
    int len = weightMap["model.quantiles.qa_" + name].count;
    // ...

static IScaleLayer* NormalizeTeacherMap(INetworkDefinition* network, std::map<std::string, Weights>& weightMap,
                                        ITensor& input) {
    float* mean = (float*)weightMap["model.mean_std.mean"].values;
    float* std = (float*)weightMap["model.mean_std.std"].values;
    int len = weightMap["model.mean_std.mean"].count;
    // ...
```

@wang-xinyu (Owner)

Double-check your layers one by one. You can also mark the middle layers as outputs, to compare the tensor values with PyTorch.
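A minimal sketch of the marking step, assuming `network` and a layer such as `teacher2` from build_efficientAD_engine() are in scope:

```cpp
// Mark an intermediate tensor as an extra engine output so its values can be
// copied back to the host and diffed against the PyTorch activations.
nvinfer1::ITensor* probe = teacher2->getOutput(0);
probe->setName("debug_teacher_conv2");  // becomes an extra binding name
network->markOutput(*probe);
// At inference time, allocate a host/device buffer for "debug_teacher_conv2"
// like any other binding and compare it element-wise with the PyTorch tensor.
```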

@watertianyi (Author) commented Dec 17, 2024

@wang-xinyu
Yes, I printed each layer as follows. The accuracy loss seems to jump around:

```
teacher conv4: tensor(-0.0262, device='cuda:0', dtype=torch.float16)
teacher conv4: tensor(-0.4404, device='cuda:0', dtype=torch.float16)
teacher conv4: tensor(0.5142, device='cuda:0', dtype=torch.float16)

cpu_output_data[0]:-0.0233255
cpu_output_data[1]:-0.471282
cpu_output_data[2]:0.299254

teacher relu3: tensor(0.1133, device='cuda:0', dtype=torch.float16)
teacher relu3: tensor(0.0643, device='cuda:0', dtype=torch.float16)
teacher relu3: tensor(0., device='cuda:0', dtype=torch.float16)

cpu_output_data[0]:0.126856
cpu_output_data[1]:0
cpu_output_data[2]:0

teacher avgpoo2: torch.Size([1, 256, 61, 61])
teacher avgpoo2: tensor(0.2576, device='cuda:0', dtype=torch.float16)
teacher avgpoo2: tensor(0., device='cuda:0', dtype=torch.float16)
teacher avgpoo2: tensor(0., device='cuda:0', dtype=torch.float16)

cpu_output_data[0]:0.265905
cpu_output_data[1]:0.000384835
cpu_output_data[2]:0

teacher relu2: torch.Size([1, 256, 123, 123])
teacher relu2: tensor(0.3022, device='cuda:0', dtype=torch.float16)
teacher relu2: tensor(0.0109, device='cuda:0', dtype=torch.float16)
teacher relu2: tensor(0., device='cuda:0', dtype=torch.float16)

cpu_output_data[0]:0.320983
cpu_output_data[1]:0.00683483
cpu_output_data[2]:0

teacher avgpool1: torch.Size([1, 128, 126, 126])
teacher avgpool1: tensor(0.0789, device='cuda:0', dtype=torch.float16)
teacher avgpool1: tensor(0., device='cuda:0', dtype=torch.float16)
teacher avgpool1: tensor(0.0011, device='cuda:0', dtype=torch.float16)

cpu_output_data[batch * kOutputSize]:2032128
cpu_output_data[0]:0.0785448
cpu_output_data[1]:0
cpu_output_data[2]:0.0017153

teacher relu1: torch.Size([1, 128, 253, 253])
teacher relu1: tensor(0.0658, device='cuda:0', dtype=torch.float16)
teacher relu1: tensor(0., device='cuda:0', dtype=torch.float16)
teacher relu1: tensor(0., device='cuda:0', dtype=torch.float16)

cpu_output_data[batch * kOutputSize]:8193152
cpu_output_data[0]:0.0733184
cpu_output_data[1]:0
cpu_output_data[2]:0.000657938

student:

teacher conv4: torch.Size([1, 768, 56, 56])
teacher conv4: tensor(0.4897, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher conv4: tensor(-0.0524, device='cuda:0', dtype=torch.float16,grad_fn=)
teacher conv4: tensor(-0.4678, device='cuda:0', dtype=torch.float16,grad_fn=)

cpu_output_data[batch * kOutputSize]:2408448
cpu_output_data[0]:0.566343
cpu_output_data[1]:-0.0974327
cpu_output_data[2]:-0.555574

teacher relu3: torch.Size([1, 256, 59, 59])
teacher relu3: tensor(0.1377, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu3: tensor(0., device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu3: tensor(0.1207, device='cuda:0', dtype=torch.float16, grad_fn=)

cpu_output_data[batch * kOutputSize]:891136
cpu_output_data[0]:0.0886817
cpu_output_data[1]:0
cpu_output_data[2]:0

teacher avgpoo2: torch.Size([1, 256, 61, 61])
teacher avgpoo2: tensor(0., device='cuda:0', dtype=torch.float16, grad_fn=)
teacher avgpoo2: tensor(0.1450, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher avgpoo2: tensor(0.1967, device='cuda:0', dtype=torch.float16, grad_fn=)

cpu_output_data[batch * kOutputSize]:952576
cpu_output_data[0]:0
cpu_output_data[1]:0.141481
cpu_output_data[2]:0.172758

teacher relu2: torch.Size([1, 256, 123, 123])
teacher relu2: tensor(0., device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu2: tensor(0.2939, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu2: tensor(0.1024, device='cuda:0', dtype=torch.float16, grad_fn=)

cpu_output_data[batch * kOutputSize]:3873024
cpu_output_data[0]:0
cpu_output_data[1]:0.241007
cpu_output_data[2]:0.0987995

teacher avgpool1: torch.Size([1, 128, 126, 126])
teacher avgpool1: tensor(0.1506, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher avgpool1: tensor(0.1250, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher avgpool1: tensor(0.0928, device='cuda:0', dtype=torch.float16, grad_fn=)

cpu_output_data[batch * kOutputSize]:2032128
cpu_output_data[0]:0.150846
cpu_output_data[1]:0.124322
cpu_output_data[2]:0.0962757

teacher relu1: torch.Size([1, 128, 253, 253])
teacher relu1: tensor(0.1478, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu1: tensor(0.1027, device='cuda:0', dtype=torch.float16, grad_fn=)
teacher relu1: tensor(0.0696, device='cuda:0', dtype=torch.float16, grad_fn=)

cpu_output_data[batch * kOutputSize]:8193152
cpu_output_data[0]:0.147407
cpu_output_data[1]:0.112791
cpu_output_data[2]:0.0832989

torch.pow distance_st: torch.Size([1, 384, 56, 56])
torch.pow distance_st: tensor(0.2851, device='cuda:0', grad_fn=)
torch.pow distance_st: tensor(0.0027, device='cuda:0', grad_fn=)
torch.pow distance_st: tensor(0.2538, device='cuda:0', grad_fn=)

cpu_output_data[batch * kOutputSize]:1204224
cpu_output_data[0]:0.365506
cpu_output_data[1]:0.00344336
cpu_output_data[2]:0.518179

torch.mean map_st: torch.Size([1, 1, 56, 56])
torch.mean map_st: tensor(0.0607, device='cuda:0')
torch.mean map_st: tensor(0.0556, device='cuda:0')
torch.mean map_st: tensor(0.0551, device='cuda:0')

cpu_output_data[batch * kOutputSize]:3136
cpu_output_data[0]:0.0713918
cpu_output_data[1]:0.0725485
cpu_output_data[2]:0.067341

torch.mean map_stae: torch.Size([1, 1, 56, 56])
torch.mean map_stae: tensor(0.0198, device='cuda:0')
torch.mean map_stae: tensor(0.0088, device='cuda:0')
torch.mean map_stae: tensor(0.0064, device='cuda:0')
cpu_output_data[batch * kOutputSize]:3136
cpu_output_data[0]:0.0148362
cpu_output_data[1]:0.00799429
cpu_output_data[2]:0.00749832

F.pad map_st: torch.Size([1, 1, 64, 64])
F.pad map_st: tensor(0.0671, device='cuda:0')
F.pad map_st: tensor(0.0639, device='cuda:0')
F.pad map_st: tensor(0.0952, device='cuda:0')
cpu_output_data[batch * kOutputSize]:4096
cpu_output_data[0]:0.0619231
cpu_output_data[1]:0.0552825
cpu_output_data[2]:0.0812732

F.pad map_stae: torch.Size([1, 1, 64, 64])
F.pad map_stae: tensor(0.1895, device='cuda:0')
F.pad map_stae: tensor(0.1639, device='cuda:0')
F.pad map_stae: tensor(0.3578, device='cuda:0')
cpu_output_data[batch * kOutputSize]:4096
cpu_output_data[0]:0.18965
cpu_output_data[1]:0.13389
cpu_output_data[2]:0.302589

F.interpolate map_st: torch.Size([1, 1, 256, 256])
F.interpolate map_st: tensor(0.0283, device='cuda:0')
F.interpolate map_st: tensor(0.0335, device='cuda:0')
F.interpolate map_st: tensor(0.0334, device='cuda:0')
cpu_output_data[batch * kOutputSize]:65536
cpu_output_data[0]:0.0291607
cpu_output_data[1]:0.0388513
cpu_output_data[2]:0.042677

F.interpolate map_stae: torch.Size([1, 1, 256, 256])
F.interpolate map_stae: tensor(0.0046, device='cuda:0')
F.interpolate map_stae: tensor(0.0052, device='cuda:0')
F.interpolate map_stae: tensor(0.0071, device='cuda:0')
cpu_output_data[batch * kOutputSize]:65536
cpu_output_data[0]:0.00428357
cpu_output_data[1]:0.00367584
cpu_output_data[2]:0.00533737

normalize map_st: torch.Size([1, 1, 256, 256])
normalize map_st: tensor(-0.0653, device='cuda:0', grad_fn=)
normalize map_st: tensor(-0.0583, device='cuda:0', grad_fn=)
normalize map_st: tensor(-0.0585, device='cuda:0', grad_fn=)
cpu_output_data[batch * kOutputSize]:65536
cpu_output_data[0]:-0.064137
cpu_output_data[1]:-0.051166
cpu_output_data[2]:-0.0460452

normalize map_stae: torch.Size([1, 1, 256, 256])
normalize map_stae: tensor(-0.0354, device='cuda:0', grad_fn=)
normalize map_stae: tensor(-0.0351, device='cuda:0', grad_fn=)
normalize map_stae: tensor(-0.0340, device='cuda:0', grad_fn=)
cpu_output_data[batch * kOutputSize]:65536
cpu_output_data[0]:-0.0355803
cpu_output_data[1]:-0.0359128
cpu_output_data[2]:-0.0350038

anomaly_map: torch.Size([1, 1, 256, 256])
anomaly_map: tensor(-0.0504, device='cuda:0', grad_fn=)
anomaly_map: tensor(-0.0467, device='cuda:0', grad_fn=)
anomaly_map: tensor(-0.0462, device='cuda:0', grad_fn=)
cpu_output_data[batch * kOutputSize]:65536
cpu_output_data[0]:-0.0641369
cpu_output_data[1]:-0.051166
cpu_output_data[2]:-0.0460452

```
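To turn eyeball comparisons like the above into a single number, a small hypothetical helper (not from the repo) can report the worst element-wise gap, assuming the PyTorch reference has been exported as a flat float array in the same layout as cpu_output_data:

```cpp
// Hypothetical comparison helper: largest absolute difference between the
// TensorRT host output and a PyTorch reference in the same memory layout.
#include <cmath>
#include <cstddef>
#include <cstdio>

static float maxAbsDiff(const float* trt, const float* ref, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        worst = std::fmax(worst, std::fabs(trt[i] - ref[i]));
    }
    return worst;
}

// Usage, assuming kOutputSize elements were copied back into cpu_output_data:
// std::printf("max |diff| = %f\n", maxAbsDiff(cpu_output_data, torch_ref, kOutputSize));
```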

@watertianyi (Author)

@wang-xinyu There was an error in my code; the accuracy difference is now 0.0013. Why does the inference time still increase linearly with batch size?

@wang-xinyu (Owner)

Maybe your GPU's compute resources are already fully used even when bs=1.
Check the NVIDIA docs for further performance improvements: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#performance-guidelines
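To separate launch overhead from real compute, per-batch latency can be measured with CUDA events. A hedged sketch in the tensorrtx style; treat `context`, `buffers`, and `stream` as assumptions taken from the repo's existing inference code:

```cpp
#include <NvInfer.h>
#include <cstdio>
#include <cuda_runtime_api.h>

static void timeOneBatch(nvinfer1::IExecutionContext* context, void** buffers,
                         cudaStream_t stream, int batchSize) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, stream);
    context->enqueueV2(buffers, stream, nullptr);  // async inference on `stream`
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);  // wait only for this batch to finish
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // If per-image latency stays flat as batchSize grows, the GPU still has
    // headroom; if total time grows linearly, it is already saturated at bs=1.
    std::printf("batch %d: %.3f ms total, %.3f ms per image\n", batchSize, ms, ms / batchSize);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```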
