OTGW integration randomly losing data #258
Comments
Hi @rhtenhove. So, turn off the native HA integration of the OTGW component, i.e. stop using the serial-over-network integration, and just use the MQTT option. Using both serial and MQTT at the same time seems to give you problems, so try to stop using the "serial integration" to find out whether that is more reliable. Let me know what you find.
To further the investigation, I've done a packet capture, and was "lucky" to catch a malformed packet and a timeout shortly after each other. The problem still persists, and it's quite frustrating since it makes it a lot harder to manage my multiple rooms and TRVs.
The malformed-packet capture shows 3 packets that arrived at the same time the Mosquitto logs show the failure. The first 2 appear fine, while the third seems to be unrecovered packet loss? Based on the
The Mosquitto timeout log was shown 25 seconds after this packet was seen in the packet capture. Same thing,
This was captured on a Raspberry Pi 5 in the Mosquitto Add-on container of Home Assistant OS using:
tcpdump -i eth0 -w - "(tcp port 1883) and (tcp[24:4]=0x4f544757 and tcp[28:4]=0x2f76616c and tcp[32:4]=0x75652f6f and tcp[36:4]=0x7467772d and tcp[40:4]=0x39344239 and tcp[44:4]=0x37453135 and tcp[48:4]=0x43413833)"
The latter part is to filter on the topic
Malformed packet
Timeout
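For anyone who wants to reuse the tcpdump filter above with their own device, here is a minimal Python sketch that decodes the hex words back to the topic prefix they match and rebuilds the same byte-match expression from a topic string. The function names are made up for illustration. The start offset of 24 is simply taken from the command above; it presumably depends on the TCP header length and the size of the MQTT fixed header, so treat it as an assumption that may need adjusting for other captures.

```python
# Hex words copied from the tcpdump filter in the comment above.
FILTER_WORDS = [
    0x4f544757, 0x2f76616c, 0x75652f6f, 0x7467772d,
    0x39344239, 0x37453135, 0x43413833,
]


def filter_words_to_topic(words):
    """Decode 4-byte big-endian words back to the ASCII text they match."""
    return b"".join(w.to_bytes(4, "big") for w in words).decode("ascii")


def topic_to_filter(topic, start=24):
    """Build a tcpdump byte-match expression for an ASCII topic prefix.

    `start` is the byte offset from the beginning of the TCP header
    (assumption: 24, as used in the original command).
    tcpdump only supports 1-, 2- and 4-byte comparisons, so the topic
    is split into chunks of those sizes.
    """
    data = topic.encode("ascii")
    terms = []
    offset = 0
    while offset < len(data):
        remaining = len(data) - offset
        size = 4 if remaining >= 4 else 2 if remaining >= 2 else 1
        chunk = data[offset:offset + size]
        terms.append(f"tcp[{start + offset}:{size}]=0x{chunk.hex()}")
        offset += size
    return " and ".join(terms)


topic = filter_words_to_topic(FILTER_WORDS)
print(topic)                   # OTGW/value/otgw-94B97E15CA83
print(topic_to_filter(topic))  # reproduces the byte-match part of the filter
```

Matching on raw bytes like this keeps the capture small, but it only catches packets where the topic starts at exactly that offset.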
I hope you're able to make cheese out of this 😉 Thanks for your help and for making this firmware!
Hello @rhtenhove, edit: !!! ---> NOT true anymore: "Up till now, no malformed MQTT packets and no HA unavailability messages." @rhtenhove, when you have simultaneous MQTT and serial (telnet) sessions in use, maybe that is the issue too?
I had a similar idea that the ESP8266 was just being overloaded; however, I've already disabled all sessions, given the OTGW a power cycle, and made sure not to open any new session (such as telnet or the web browser), even temporarily, to ensure it is only connected to the MQTT broker. Sadly, it has been like that for 8 days and the problem still occurs. I do see that every day there is a time block of several hours where the problem does not occur, but not the same time block every day. There is also no discernible pattern with people in the house, heat demand, or anything of the sort. I'm going to try a different Wi-Fi channel and see what that does. I had already chosen the least occupied channel here (1), and made sure my Zigbee network is on a non-overlapping channel (25). But perhaps my neighbours do something odd most of the day, who knows.
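As a quick sanity check on the channel choice, here is a small Python sketch that compares the frequency ranges of 2.4 GHz Wi-Fi and Zigbee channels. It uses the standard centre-frequency formulas (Wi-Fi channel n at 2407 + 5n MHz for channels 1 to 13, Zigbee channel k at 2405 + 5(k - 11) MHz for channels 11 to 26). The channel widths (22 MHz for Wi-Fi, 2 MHz for Zigbee) are nominal values assumed for illustration; real-world interference also depends on signal strength and spectral masks, so this is only a rough check.

```python
def wifi_range_mhz(channel, width_mhz=22.0):
    """Approximate (low, high) frequency range of a 2.4 GHz Wi-Fi channel."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz Wi-Fi channel between 1 and 13")
    center = 2407 + 5 * channel
    return (center - width_mhz / 2, center + width_mhz / 2)


def zigbee_range_mhz(channel, width_mhz=2.0):
    """Approximate (low, high) frequency range of a 2.4 GHz Zigbee channel."""
    if not 11 <= channel <= 26:
        raise ValueError("expected a Zigbee channel between 11 and 26")
    center = 2405 + 5 * (channel - 11)
    return (center - width_mhz / 2, center + width_mhz / 2)


def overlaps(a, b):
    """True if two (low, high) ranges share any frequency."""
    return a[0] < b[1] and b[0] < a[1]


# Wi-Fi channel 1 versus Zigbee channel 25, as in the setup described above.
print(wifi_range_mhz(1))     # (2401.0, 2423.0)
print(zigbee_range_mhz(25))  # (2474.0, 2476.0)
print(overlaps(wifi_range_mhz(1), zigbee_range_mhz(25)))  # False

# Wi-Fi channels that would overlap Zigbee channel 25 under these assumptions.
clashing = [n for n in range(1, 14)
            if overlaps(wifi_range_mhz(n), zigbee_range_mhz(25))]
print(clashing)  # [12, 13]
```

Under these assumptions, Wi-Fi channel 1 and Zigbee channel 25 sit at opposite ends of the band, so direct overlap between the two networks looks unlikely to be the cause.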
A few days ago I received a fresh OTGW with a Wemos D1 (for WiFi) and everything prepared (soldered and flashed by nodo shop).
After connecting my devices (had to do this first after setting up wifi, as it would not boot without):
And updating:
All seemed well; however, in Home Assistant I get various warnings and errors (Command PR failed and timeouts) from the OpenTherm Gateway integration. I gave it some time, and the log collected quite a lot of these.
At first I also had MQTT enabled, and was checking the web interface, but I understand combining these causes unreliable results. So I disabled MQTT, closed the web interface, power cycled the OTGW, and opened telnet to see what's going on.
The HA logbook is full of the devices switching to unknown and then back on again, and the OTGW thermostat climate entity will lose temperature and setpoint. Automations don't behave well because of all the status changes. In telnet I see several "Not processed, received from OTGW" messages, but I don't know what these mean. I've added part of the log at the bottom of this issue.
Given that there seemed little that could have gone wrong (it was mostly plug and play), I'm not sure where to look now for what's misbehaving. The OTGW has internet, DNS, NTP, and access to the world if need be, no limitations there.
When it is working, it works just fine: data is read correctly and states are correctly changed (both thermostat and boiler). However, it will randomly go bananas, and I can't relate this to anything.
Any ideas? 🙏
Logging and debug
Home assistant logs:
and
Here's the debug output:
Telnet log when MQTT fails