⬆️ [email protected] #722
⬆️ [email protected] #722
Conversation
```c++
Operation* LarqDialect::materializeConstant(OpBuilder& builder, Attribute value,
                                            Type type, Location loc) {
  // Materialize folded attributes as arith.constant ops; return nullptr if the
  // attribute/type combination cannot be represented that way.
  if (arith::ConstantOp::isBuildableWith(value, type))
    return builder.create<mlir::arith::ConstantOp>(loc, type, value);
  return nullptr;
}
```
This is required to enable constant folding with the switch to `arith.ConstantOp`. TensorFlow does something similar as well; see here for more information about this.
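For context, here is a minimal sketch of how the hook gets exercised (hypothetical op, not code from this PR; the exact `fold` signature depends on the MLIR version bundled with TF 2.8): when an op's `fold` returns an `Attribute` instead of an existing `Value`, MLIR's folding driver asks the owning dialect to materialize that attribute as a constant op, which with the hook above becomes an `arith.constant`.

```c++
// Hypothetical Larq-dialect op used purely for illustration.
OpFoldResult MyIdentityLikeOp::fold(ArrayRef<Attribute> operands) {
  // Returning an Attribute (rather than a Value) tells the folding driver to
  // call LarqDialect::materializeConstant, which builds an
  // mlir::arith::ConstantOp for the folded value.
  if (operands[0]) return operands[0];
  return {};
}
```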
In the TensorFlow example they first try to make it into a `TFL::ConstOp` before they try the `arith::ConstantOp`. I guess we don't need that here because the larq dialect is used for TF ops and not TFL ops?
If I understand their code correctly, it will create a `tfl.pseudo_const` if the value is opaque, which isn't supported by `arith::ConstantOp`. I don't think this is a case that we will encounter. In general, `TFL::ConstOp` will always be folded to `arith::ConstantOp` if possible, so I think it should be fine to keep it like this for now.
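For reference, the two-step TensorFlow pattern discussed above would look roughly like the sketch below. This is an uncompiled illustration, not the actual TF implementation; the `ElementsAttr` check and the `TFL::ConstOp` builder overload are assumptions.

```c++
// Illustrative sketch of "try the dialect-specific constant op first, then
// fall back to arith.constant"; dialect name and builder signatures are assumed.
Operation* ExampleTFLStyleDialect::materializeConstant(OpBuilder& builder,
                                                       Attribute value,
                                                       Type type, Location loc) {
  // Attributes that arith::ConstantOp cannot represent (e.g. opaque constants)
  // would be materialized as the dialect's own constant op instead.
  if (auto elements = value.dyn_cast<ElementsAttr>()) {
    if (!arith::ConstantOp::isBuildableWith(value, type))
      return builder.create<TFL::ConstOp>(loc, elements);  // assumed builder
  }
  if (arith::ConstantOp::isBuildableWith(value, type))
    return builder.create<mlir::arith::ConstantOp>(loc, type, value);
  return nullptr;
}
```

Since the Larq dialect only wraps TF-level ops, the plain `arith::ConstantOp` path in this PR should cover everything we fold.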
* Update converter to sync with upstream tensorflow
* Update larq_compute_engine/mlir/tf_to_tfl_flatbuffer.cc

Co-authored-by: Tom Bannink <[email protected]>
I've removed the old Make builds and replaced them with the new CMake build system. I've only done something quite basic: building the library and the example and benchmark binaries. I tested the benchmark binary locally and that seems to work, so I think we are ready for a last round of reviews now.
Awesome! 🚀 Just for reference, so we don't forget: once this is merged and we've released a new version, we will also need to update the build docs.
I triggered a test release, let's see how this goes.
The Windows build looks like it succeeded within 12 minutes, but actually the build failed with
The Linux builds fail because the manylinux toolchain bazel files can't be found; they've probably been updated to newer versions.
I pinned the Windows VM version to 2019 in 5bf5748; let's see if this fixes the issue. Alternatively we could also change the
I pinned the docker image to 2.8 in 3923692, which didn't seem to fix things. It seems like the TensorFlow repo removed their toolchains from the repo, so I copied the toolchain from the TF Addons repo in bdb6f20. They use the same docker container to build the wheels as we do, so I expect this to work. I triggered a new build to test these changes.
Two builds failed:
Update: the Linux build fails with:
But in the latest build it is the 3.8 version of the Linux build that fails and the 3.9 one passes, so that seems random:
The timeout increase did not work; it still times out after 6 hours: https://github.com/larq/compute-engine/runs/6036809795?check_suite_focus=true. Perhaps that is some global setting or a GitHub Actions maximum? In any case, this time two out of the three Windows builds passed since they finished in five and a half hours; only one timed out. This behaviour was also seen earlier without TF 2.8, and it was always tight with this 6-hour maximum, so I do not consider it related to this branch. My proposal is to merge this in and then investigate making the releases more stable (random Linux build failures and Windows build timeouts) in a separate PR. I can open an issue for that.
I agree, let's merge it. I think I've seen this before, and in that particular case the system had run out of memory. That could be fixed by adding swap space or by changing the number of jobs that bazel runs in parallel. But maybe this problem is different.
Co-authored-by: Lukas Geiger <[email protected]>
What do these changes do?
This PR updates LCE to TensorFlow 2.8.0. Quite a few changes have been made, and most likely more are needed. I'll highlight some of the more controversial changes below in the form of a comment.
How Has This Been Tested?
All unit tests have been run locally. Fixes have been made as compilation, linking, or testing errors showed up. However, some of them do not succeed yet, most notably the MLIR tests. This PR is therefore still a draft.
Benchmark Results
Latency has not been verified yet.
Related issue number
N/A.
Tasks
Open tasks before being able to turn the draft PR into an actual PR: