Detail of BinaryDenseNet or BinaryResNet18E #8

Open
LaVieEnRoseSMZ opened this issue Aug 6, 2019 · 12 comments

Comments

@LaVieEnRoseSMZ

Hi, I recently read your newly released paper "Back to Simplicity: How to Train Accurate BNNs from Scratch?". It is a very good paper and has inspired me a lot.

However, I am a little confused about the implementation. I am not familiar with the code structure of MXNet. Could you please write a more detailed README, a tutorial, or something similar that explains the code and the training details?

Thanks a lot in advance~

@Jopyth
Collaborator

Jopyth commented Aug 6, 2019

Hi, I agree that the framework could definitely benefit from more tutorials and documentation. However, it would be best if these hit the "sweet spot" and cover what you (or anyone else who stumbles upon this) are interested in. In the title you name the details of the networks, but in your comment you mention training in general. Which resources have you used or found already? Did you manage to build the framework on your machine?

In the meantime, here are a few links that might help you get started, depending on what you are looking for:

  • The Examples contain code for building the models (ResNetE, BinaryDenseNet) and for training on ImageNet or Cifar (image_classification.py).
  • The training script has a lot of parameters. You could check out the pages we have already created in the Wiki (e.g. regarding Hyperparameters). There are also pages listing the options you need to pass to the training script to train or reproduce the results of a particular model. This page contains lots of experiments. Unfortunately, I have not yet updated it to include the latest experiment results, but you can already find the configuration details, e.g. for BinaryDenseNet (reduction), in the supplementary material.

Please let me know if this already helps, or whether you would like additional information.

@LaVieEnRoseSMZ
Author

Thanks very much for your patience and detailed answer. I have spent days reproducing your work in PyTorch and already get 56.3% using the hyperparameters from your wiki, which shows how solid your work is. I have also found your supplementary material via arXiv, including the detailed log of BinaryResNetE at 58.1%, and I will try to reproduce this result too.

Thanks again for the detailed logs and supplementary material; they really help us a lot~

@lgeiger

lgeiger commented Aug 9, 2019

We have also started trying out BinaryDenseNets and the first results seem promising.

@LaVieEnRoseSMZ Which supplementary materials are you referring to? It doesn't seem like they are included in the arXiv version: https://arxiv.org/pdf/1906.08637.pdf

@LaVieEnRoseSMZ
Author

I am referring to the URL given in the comments of the arXiv paper, which points to the supplementary material.

@lgeiger

lgeiger commented Aug 9, 2019

Thanks. That's helpful 👍

@LaVieEnRoseSMZ
Author

I still have one more problem in reading the code of the binary layer.

I cannot find the definition and implementation of "det_sign", which is used to quantize activations and weights. Could you please point me to the URL of this part of the code?

Thanks a lot in advance~

@Jopyth
Collaborator

Jopyth commented Aug 9, 2019

As described in Overview of changes, you can find the parts of the code for det_sign in this commit.
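
For readers porting this to another framework, the forward behaviour of a deterministic sign can be sketched in a few lines of plain Python. This is only an illustration, not the BMXNet code: the function names below are hypothetical, and mapping exactly zero to +1 is an assumption that should be checked against the commit mentioned above.

```python
def det_sign(x: float) -> float:
    """Deterministic sign: +1.0 for x >= 0, otherwise -1.0.

    Assumption: zero is mapped to +1.0 so the output is always binary.
    """
    return 1.0 if x >= 0 else -1.0


def binarize(values):
    """Apply det_sign element-wise (hypothetical helper for illustration)."""
    return [det_sign(v) for v in values]


weights = [0.7, -0.2, 0.0, -1.5]
binary_weights = binarize(weights)
print(binary_weights)
```

The same element-wise mapping would apply to activations; only the backward pass needs special treatment, because the derivative of the sign function is zero almost everywhere.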

@LaVieEnRoseSMZ
Author

One more question about the gradient_cancel layer: does it do exactly what the supplementary material describes?
[image: excerpt from the supplementary material]

Thanks a lot for your patient answers~

@Jopyth
Collaborator

Jopyth commented Aug 9, 2019

The combination of gradient_cancelling + det_sign (e.g. QActivation with det_sign as activation function) does exactly what is described in the above part. You can check the implementation of gradient cancelling:

// Forward pass: identity, the input is copied to the output unchanged.
template<int req>
struct gradcancel_forward {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* out_data, const DType* in_data) {
    KERNEL_ASSIGN(out_data[i], req, in_data[i]);
  }
};

// Backward pass: the incoming gradient is passed through only where
// |input| <= threshold; everywhere else it is cancelled (set to zero).
template<int req>
struct gradcancel_backward {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType* in_grad, const DType* out_grad,
                                  const DType* in_data, const float threshold) {
    KERNEL_ASSIGN(in_grad[i], req,
                  math::fabs(in_data[i]) <= threshold ? out_grad[i] : DType(0));
  }
};
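
For anyone reimplementing this outside MXNet, the two kernels above can be mirrored in plain Python as a rough sketch. The Python function names simply reuse the struct names, and the threshold value of 1.0 in the example is an assumption chosen for illustration, not a value taken from the framework.

```python
def gradcancel_forward(in_data):
    """Forward pass: identity, values pass through unchanged."""
    return list(in_data)


def gradcancel_backward(out_grad, in_data, threshold):
    """Backward pass: keep the gradient where |input| <= threshold, else zero."""
    return [g if abs(x) <= threshold else 0.0
            for g, x in zip(out_grad, in_data)]


inputs = [-1.5, -1.0, 0.3, 2.0]
upstream = [0.1, 0.2, 0.3, 0.4]

outputs = gradcancel_forward(inputs)
# Threshold 1.0 is an illustrative assumption.
grads = gradcancel_backward(upstream, inputs, threshold=1.0)
print(outputs)
print(grads)
```

In a PyTorch port, the same logic would typically live in the forward and backward methods of a custom autograd function, with the sign applied after the identity forward.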

@LaVieEnRoseSMZ
Author

Hi~ I am reproducing the BinaryDenseNet from the paper. Going through the code, I found three versions of DenseNet called densenet, densenet_x and densenet_y. Which version exactly was used in your final experiments? I also couldn't find DenseNet28 anywhere; would you mind showing me the code?

Thanks a lot in advance~

@yanghaojin
Collaborator

yanghaojin commented Sep 26, 2019

We use "densenet.py" for all the experiments. Networks like BinaryDenseNet28/37 are created with the DenseNet-specific configurations (initial feature number, reduction rate, growth rate) described in the supplementary material published with our paper: https://owncloud.hpi.de/s/1jrAUnqRAfg0TXH
There you will find the detailed network configurations, network architectures, training logs, etc.
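
To see how these three configuration values interact, here is a small sketch that traces the channel count through a DenseNet-style network. It is not taken from densenet.py: the function name is hypothetical, the numbers are made up for illustration (the real ones are in the supplementary material), and treating the reduction rate as a divisor applied in each transition layer is an assumption.

```python
def densenet_channels(init_features, growth_rate, block_layers, reduction):
    """Trace the channel count after each dense block and transition.

    Assumptions: every layer in a block adds `growth_rate` channels, and each
    transition divides the channel count by `reduction` (rounded down).
    """
    channels = init_features
    trace = []
    for i, num_layers in enumerate(block_layers):
        channels += num_layers * growth_rate
        trace.append(("block", i, channels))
        # No transition after the last dense block.
        if i != len(block_layers) - 1:
            channels = int(channels / reduction)
            trace.append(("transition", i, channels))
    return trace


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
for stage in densenet_channels(init_features=64, growth_rate=32,
                               block_layers=[2, 2], reduction=2.0):
    print(stage)
```

Changing the number of layers per block, the growth rate, or the reduction in such a trace shows how differently sized variants can come out of one model file.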

@lgeiger

lgeiger commented Oct 7, 2019

Thanks for sharing all the details and the supplementary materials. We were able to reproduce your experiments (which unfortunately isn't the case with many other papers) 👍

@LaVieEnRoseSMZ If you are looking for a reimplementation of this paper using Larq (a Keras and TensorFlow based BNN library), you can also check out the pretrained models and training code at https://larq.dev/models/ and https://github.com/larq/zoo.
