I saw that! Fair warning: this will be long. As with some of my other rants in these issues and discussions, you're the first to ask this question, but I'm sure you won't be the last, so this response will serve as a draft of what we will eventually make available on heywillow.io and elsewhere :). TLDR: we have no plans to support Wyoming and likely never will. If you want to know why, read on. A little background/clarification:
With that, I've done extensive research and testing with Home Assistant Voice and its component pieces, including Wyoming. They are both tremendous achievements and very interesting, but they are problematic in terms of aligning with the overall goals of Willow. In no particular order:
Overall, Willow and WIS are much more performant because we do things like stuff audio frames into packets at MTU, plus everything else I've described. They're also much more robust (accurate) even in the most challenging conditions because HTTP/HTTPS use TCP (of course), with retransmissions. Willow and WIS also support high-quality, performant audio compression to reduce wifi airtime and network utilization even further. Some Willow users enable it because they have especially challenging wifi environments: situations where the 2.4 GHz wifi spectrum is completely stomped on, with dozens of visible networks competing for airtime. Implementing this with Wyoming and the HA Voice ecosystem would likely be challenging, and it is currently unsupported.

So the overall concern with supporting Wyoming and the HA Voice ecosystem is that (in my view) it's fundamentally incapable of meeting our goals. This is a very good and valid question, but I don't want to create a scenario where the fundamental user experience with WIS and HA Voice doesn't (and almost certainly can't) compare to native Willow. Also note that I'm in no way an authority on Wyoming/HA Voice or a true expert with them, so I may be wrong on some of these points, but based on documentation, code review, and testing I believe all of this is reasonably accurate.

Willow is a very small team with three less-than-fulltime developers. Certainly at this time we need to focus our energy and time on our project goals and users. Our goal is to truly be the best voice interface in the world, even beating commercial implementations. We have more than enough to do just to catch up with the tens/hundreds of millions of dollars Amazon has put into Alexa over nearly a decade ;).

In terms of the BOX-3, I'm not sure what you mean here. Willow has had full support for the BOX-3 since mid-September and obviously the BOX and BOX-Lite since inception. We've moved on to supporting even more hardware like the M5Stack CoreS3.

I understand this may sound critical, as I tend to write very matter-of-factly, but I want to be very clear: what HA has done is fantastic and I support anything that enables the goals we're all interested in: privacy, local hosting, and flexibility. The HA Voice approach has other advantages over Willow (like ease of use and tight integration), and in the end users should use what works best for them and their situation. Willow has no HA user monetization approaches (and never will) or other motivations to try to convince people we're "better", because we're not. All of this is very user and situation specific.

Willow is designed with the approach I've described because our monetization strategy is commercial and "enterprise" applications, the only markets I understand and have experience with. I have many hard-learned lessons from 20 years of experience at the intersection of voice, audio, networks, and systems. In the end these points may not make a meaningful difference to end users, but in my experience they almost always do, with the difference often being whether it works or doesn't.
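For readers unfamiliar with the "stuff audio frames into packets at MTU" point above, here is a minimal illustrative sketch of the general idea. It is not Willow or WIS code, and none of these names come from the project: `pack_frames`, the 1500-byte MTU, and the 40-byte IPv4 + TCP header overhead are all assumptions chosen for the example. It coalesces many small PCM frames into payloads sized to fill a packet, so fewer, fuller packets go over the air.

```python
from typing import Iterable, Iterator

# Assumptions for illustration only (not taken from Willow/WIS):
MTU = 1500                 # typical Ethernet/wifi MTU in bytes
IP_TCP_OVERHEAD = 40       # IPv4 (20) + TCP (20) headers, no options
MAX_PAYLOAD = MTU - IP_TCP_OVERHEAD  # 1460 bytes of audio per packet


def pack_frames(frames: Iterable[bytes],
                max_payload: int = MAX_PAYLOAD) -> Iterator[bytes]:
    """Coalesce small audio frames into payloads of at most max_payload bytes."""
    buf = bytearray()
    for frame in frames:
        buf.extend(frame)
        # Emit as many full payloads as the buffer currently holds.
        while len(buf) >= max_payload:
            yield bytes(buf[:max_payload])
            del buf[:max_payload]
    # Flush whatever is left at the end of the stream.
    if buf:
        yield bytes(buf)


if __name__ == "__main__":
    # One second of 16 kHz, 16-bit mono audio in 10 ms frames (320 bytes each).
    frames = [b"\x00" * 320 for _ in range(100)]
    payloads = list(pack_frames(frames))
    print(len(frames), "frames ->", len(payloads), "payloads")
```

Sending each 10 ms frame on its own would mean 100 small writes per second of audio; packing them yields 22 payloads for the same data. In practice TCP does its own segmentation, so this only approximates what ends up on the wire.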
-
@kristiankielhofner I appreciate the very thorough response! What you guys have built is very impressive. I have tested various demos such as Picovoice, Sensory, OpenWakeWord, and DSP Concepts on different hardware platforms like STM32, Arduino, XMOS, and Raspberry Pi, and I have still had the best speech-to-text results from Willow Inference Server. Whisper just seems to work very well with a variety of mics and without requiring any extra DSP. I have recently been using the M5 Atom Echo with HA, and the out-of-the-box experience (I haven't messed with any of the voice settings yet) using both Home Assistant Cloud and faster-whisper has sadly been underwhelming. I am excited for my ESP32-S3-BOX-3 to arrive so I can test out the full Willow experience!
-
Home Assistant recently announced wake word support in their latest Year of the Voice update, which brought attention to the Wyoming protocol, a "peer-to-peer protocol for voice assistants".
It would be awesome to be able to hook up WIS as the speech-to-text inference server for Home Assistant using the Wyoming protocol, since WIS is way faster than a Raspberry Pi 4 (which is what I am using).
Are you guys planning a Wyoming integration anytime soon? I've seen mention of the WAS protocol but haven't found documentation or implementation details for it.
Home Assistant also mentioned they are targeting the ESP32-S3-BOX-3 for future voice satellite improvements.