Description
The BaseQuantizer._forward_impl method in the Torch backend performs several conditional checks on every forward pass, including debug counters, enablement checks, and tracing-state checks.
While each check is cheap in isolation, the quantizer sits in the hot path of deep models and is invoked extremely frequently, so the cumulative overhead becomes measurable in large workloads.
The result is unnecessary branching during the forward pass and degraded performance.
Location
layers.py:L432–451
Problem Summary
The current implementation repeatedly evaluates internal state flags during execution:
- quantizer enablement
- debug counters
- tracing mode
- runtime state checks
These decisions are static most of the time and do not need to be recomputed on every forward call.
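As an illustration of the pattern described above (a simplified sketch, not the actual BaseQuantizer code; class and flag names here are hypothetical), every call re-evaluates flags that rarely change:

```python
class BranchyQuantizer:
    """Illustrative only: re-checks mostly-static state on every forward call."""

    def __init__(self):
        self.enabled = True
        self.debug = False
        self.tracing = False
        self.call_count = 0

    def forward(self, x):
        self.call_count += 1        # debug counter bumped on every call
        if not self.enabled:        # enablement check, usually static
            return x
        if self.tracing:            # tracing-mode check, usually static
            return x                # pass through while tracing
        if self.debug:              # debug-only bookkeeping
            print(f"call {self.call_count}")
        return x * 0.5              # stand-in for the actual quantize op
```

All four checks run on every invocation even though the flags typically change only at setup or mode switches.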
Suggested Fix
Refactor the quantizer to use a state-based dispatch approach:
- Maintain separate forward implementations for each state
- Switch the forward pointer when the quantizer state changes
- Eliminate repeated branching inside the hot path
Example idea:

```python
self.forward = self._forward_enabled
# or
self.forward = self._forward_disabled
```

This moves decision-making outside the hot loop and keeps execution minimal.
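A minimal sketch of the state-based dispatch idea (hypothetical names, plain Python rather than a torch.nn.Module for brevity): the forward pointer is rebound once, at the point where the state changes, so the hot path contains no flag checks at all.

```python
class DispatchQuantizer:
    """Illustrative only: swaps the forward implementation on state change."""

    def __init__(self):
        self.forward = self._forward_enabled  # initial dispatch target

    def _forward_enabled(self, x):
        # Hot path: no enablement/debug/tracing branches here.
        return x * 0.5                        # stand-in for the quantize op

    def _forward_disabled(self, x):
        return x                              # identity when disabled

    def set_enabled(self, enabled: bool):
        # The decision is made once, outside the hot loop.
        self.forward = (
            self._forward_enabled if enabled else self._forward_disabled
        )
```

Note that for an actual torch.nn.Module subclass, assigning self.forward as an instance attribute shadows the class-level forward that Module.__call__ looks up, which generally works but should be checked against scripting/tracing and hook behavior in the target PyTorch version.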
Expected Benefits
- Reduced branching in the forward path
- Lower runtime overhead in deep models
- Cleaner separation of quantizer states
- Improved performance in training and inference