
Will module output not be quantized when the model is directly trained after Calibration? #336

Open
tusiqi1 opened this issue Oct 11, 2024 · 5 comments

Comments


tusiqi1 commented Oct 11, 2024

No description provided.


tusiqi1 commented Oct 12, 2024

  1. Model structure and training data: (screenshot)
  2. Calibration code: (screenshot)
  3. Code that trains the quantized model starting from the initial input_scale and output_scale: (screenshot; a minimal sketch of this flow is included below)
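For reference, a minimal sketch of this calibration + QAT flow, assuming the optimum-quanto API (quantize, Calibration, freeze); the model, data, loss and optimizer below are placeholders rather than the actual code from the screenshots:

import torch
from optimum.quanto import Calibration, freeze, qint8, quantize

# Placeholder model and calibration batch (assumptions, not the original code).
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
)
calib_batch = torch.randn(8, 16)

# Step 2: quantize weights and activations, then record input/output scales
# by running a few samples under the Calibration context.
quantize(model, weights=qint8, activations=qint8)
with Calibration():
    model(calib_batch)

# Step 3: quantization-aware training starting from the calibrated scales.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):
    output = model(torch.randn(8, 16))
    loss = output.sum()  # placeholder objective, purely for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Freeze to materialize the quantized weights once training is done.
freeze(model)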

I found that when I executed step 3, output quantization was never applied. The reason is that when the model first runs under

with Calibration():
    model(input)

the calibrate_output hook for the whole model executes the code in the red box: (screenshot)

Why does the code in the red box disable output quantization?
Or is my use of calibration + QAT incorrect?


dacorvo commented Oct 14, 2024

As you can see in the comment on the line you highlighted, the quantization of the outputs is disabled because the operation immediately following is not compatible with quantized inputs. This means that when the Tensor reaches that operation, it will be immediately dequantized: the streamline optimization policy removes the spurious quantize/dequantize.

If you want to disable this behaviour, just pass streamline=False during calibration.
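For example, a minimal sketch assuming the placeholder setup from the earlier snippet (calib_batch stands in for your calibration input):

with Calibration(streamline=False):
    model(calib_batch)

This keeps the per-module output quantizers enabled even where the next operation would otherwise dequantize the tensor immediately.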


tusiqi1 commented Oct 15, 2024

> As you can see in the comment on the line you highlighted, the quantization of the outputs is disabled because the operation immediately following is not compatible with quantized inputs. This means that when the Tensor reaches that operation, it will be immediately dequantized: the streamline optimization policy removes the spurious quantize/dequantize.
>
> If you want to disable this behaviour, just pass streamline=False during calibration.

(screenshot)
When I set a breakpoint at the location shown above, the entire model is passed to the calibrate_output() function, which causes output quantization to be turned off.

Is this right?


dacorvo commented Oct 15, 2024

Just use with Calibration(streamline=False): to disable this behaviour.


tusiqi1 commented Oct 15, 2024

> Just use with Calibration(streamline=False): to disable this behaviour.

Thank you very much. I'll think more about the purpose of streamline myself.
