We should work out a good way to raise warnings for models which won't convert in an 'optimal' way -- essentially any egregious violation of our model optimisation guide. It's slightly complicated for a few reasons:
* We need to decide where to raise warnings: in Larq, or in the LCE converter?
* Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.
* There are different degrees of 'sub-optimal'. In the particular example you raise, when using the LCE optimised kernels on Aarch64, not being able to write bitpacked activations doesn't have a huge measurable latency impact (< 10%).
* We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but could be implemented in the future).
* We need to decide where to raise warnings: in Larq, or in the LCE converter?
My two cents: I would try to decouple Larq from any conversion-related logic, including when to warn. (TensorFlow doesn't give you a warning either when you use an operator that cannot be efficiently converted by TFLite.)
* Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.
How about a `target` argument to the converter, where one could specify "reference" or "Aarch64" (or others later), with the latter being the default (if desired) for compatibility? To make the flatbuffer target-dependent, a metadata field named `target` could be used. Then the runtime could check the intended target at initialization and abort or warn.
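To make that concrete, here is a rough sketch of how the user-facing call could look. The `target` argument and its values are just the proposal above (not something the converter currently accepts), and the tiny model is only there to make the snippet self-contained:

```python
import larq
import larq_compute_engine as lce
import tensorflow as tf

# A small binarised model, purely for illustration.
model = tf.keras.Sequential([
    larq.layers.QuantConv2D(
        32, 3,
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        kernel_constraint="weight_clip",
        input_shape=(32, 32, 3),
    ),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# `target="aarch64"` is the proposed argument, not part of the current
# signature; the converter would stamp it into a "target" metadata field so
# the runtime can warn or abort on a mismatch at initialization.
flatbuffer_bytes = lce.convert_keras_model(model, target="aarch64")

with open("model.tflite", "wb") as f:
    f.write(flatbuffer_bytes)
```

Only the metadata would need to differ between targets, which is what would let the runtime check the intended target when it initializes the model.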
* We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but could be implemented in the future).
I think being able to specify a target would already help with this, but depending on how the warnings are logged, you could push the responsibility onto the user to control what's logged:
* explicit verbosity settings passed to `convert` (I generally don't like this approach)
* if the warnings go through Python's `logging` module (and even in some other cases), the user can fully control levels and handlers, and you could provide helpers like this:
```python
with lce.converter_log_level('q'):
    lce.convert_keras_model(
        model_with_unsupported_op,
        target='reference',
    )
```
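Such a helper could be as simple as the sketch below. Both `converter_log_level` itself and the logger name "larq_compute_engine" are assumptions on my part -- it depends on which logger the converter actually writes its warnings to:

```python
import contextlib
import logging

@contextlib.contextmanager
def converter_log_level(level):
    """Temporarily change the level of the (assumed) LCE converter logger."""
    logger = logging.getLogger("larq_compute_engine")  # assumed logger name
    previous_level = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous_level)

# e.g. silence everything below ERROR for one conversion:
# with converter_log_level(logging.ERROR):
#     lce.convert_keras_model(model_with_unsupported_op, target='reference')
```

The nice part of the `logging` route is that users who want finer control can skip the helper entirely and attach their own handlers or filters to that logger.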
Originally posted by @AdamHillier in #541 (comment)