@functionpointer functionpointer commented Mar 6, 2025

HMS inverters, especially HMS-1600-4T, have been notorious for losing connection with OpenDTU.
See #2137, #2278, #2277

It seems the inverter moves to a neighboring frequency, and ChangeChannelCommand can't bring it back.
However, if we find the new frequency and tune to it, OpenDTU can communicate just fine.

This PR introduces a FrequencyManager that automatically searches for the frequency of unreachable HMS inverters.
The search starts around the configured frequency (inverterTargetFrequency) and slowly moves away.
To prevent spamming the whole band, the search is slow and does not increase the number of messages sent.

On every fetch (every 5 s by default, see DTU_POLL_INTERVAL), each message is sent up to 5 times (1 + MAX_RESEND_COUNT). This stays the same, but FrequencyManager decides the frequency for each transmission.
The first one will always be on inverterTargetFrequency, while the following ones slowly move away by this formula:

((failed_fetch_count + cmd_retransmit_count) % 20) * (cmd_retransmit_count%2==0?-1:1) * channel_width.

Example with inverterTargetFrequency=865
first fetch: 865; 864.75; 865.25; 864.50; 865.50;
second fetch: 865; 864.50; 865.50; 864.25; 865.75;
third fetch: 865; 864.25; 865.75; 864; 866;
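
The example rows can be reproduced with a few lines of Python. This is only a sketch of one reading of the strategy, not the code in this PR: `search_frequencies` and `CHANNEL_WIDTH_MHZ` are hypothetical names, the 0.25 MHz step is inferred from the rows above, and `failed_fetch_count` is assumed to be 0 for the first fetch shown. The sketch pairs the retransmissions (-1, +1, -2, +2 steps) so that it matches the rows; evaluating the one-line formula literally would put the first retransmission at 865.25 instead, so treat the exact ordering as an open detail.

```python
CHANNEL_WIDTH_MHZ = 0.25  # assumption: step size inferred from the example rows


def search_frequencies(target_mhz, failed_fetch_count, transmissions=5):
    """Return the frequency used for each transmission of one fetch.

    The first transmission stays on the target frequency. Retransmissions
    alternate below/above the target and move further out with every
    failed fetch, wrapping after 20 steps as in the formula above.
    """
    frequencies = [target_mhz]
    for retransmit in range(1, transmissions):
        steps = (failed_fetch_count + (retransmit + 1) // 2) % 20
        sign = -1 if retransmit % 2 == 1 else 1
        frequencies.append(target_mhz + sign * steps * CHANNEL_WIDTH_MHZ)
    return frequencies


for fetch in range(3):
    print(fetch, search_frequencies(865.0, fetch))
```

Because only the retransmissions leave the target frequency, a reachable inverter is still polled on inverterTargetFrequency first, and the number of messages per fetch does not change.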

At this point, I have not tested the code yet. I plan on doing that in the coming few days.
Comments and testing are welcome.
Code reviews are welcome only to a limited extent; I think they should wait until we know the strategy works.

@tbnobody
Owner

Wouldn't it make more sense to implement it the same way as the original dtu does it?
#2137 (comment)

@functionpointer
Author

I don't quite see a well-defined strategy in those logs.
If you see one, I would be happy to implement it.

The current strategy is fairly simple, and it has been reliable for a few days now.
In fact, I have not seen any inverter not on the inverterTargetFrequency even once (according to HA history).
Maybe that's due to insufficient testing, or maybe it is sufficient to send ChangeChannelCommand on more frequencies than invBootFrequency.

Fixes HMS inverters losing connection sometimes.
Fixes tbnobody#2277, tbnobody#2278, tbnobody#2137.

HMS inverters sometimes decide to change their own frequency.
They then listen neither on our inverterTargetFrequency nor on inverterBootFrequency.

FrequencyManager deals with this by slowly scanning the band once isReachable()==false.
@functionpointer
Author

This has been working great for me

@functionpointer functionpointer marked this pull request as ready for review March 14, 2025 14:18
@fr-fhe

fr-fhe commented Mar 19, 2025

This has been working great for me

Hello,
Still working great after 5 days?

My inverters have been hell ever since the sun came out.
When power exceeds 1000W (HMS-1600), they lose the frequency.

I am impatient to try your version.

@schlimmchen
Contributor

@tbnobody Do you agree that approving the workflow is okay? @fr-fhe could test these changes if there was a firmware ready to install.

@fishpepper

fishpepper commented Mar 19, 2025

I compiled and installed commit functionpointer@802ca4e one day ago. Seems to work as before, but unfortunately it did not fix the issue for me.

The HMS2000 (blue) stops reporting when reaching around 900W of power. The HMS 800 (orange) works and reports fine:
image

Both seem to produce just fine, it's only the connection to the DTU dropping out.

This looks rather strange to me. It always stops sending data at around 900W and returns to normal when going below 900W.
I think this might not be only a frequency hopping issue (at least in my case). Maybe the inverter warms up and shifts the frequency? I just added the inverter and DTU temperature and the rx stats to my logging and will report back after some sunny days.

@functionpointer
Author

@fishpepper that is unfortunate. Could you post some console logs during the outage?
This PR is specifically written to address frequency shifting.

@fishpepper

I did reset "CMT2300A frequency" in the settings yesterday to the default and today the service was running fine above 900W for the first time in months. Maybe saving this setting triggered a reset in something used by the frequency hopping?
I have no serial log right now as the device is outside in the shed. I will continue watching it and add a Raspberry Pi if necessary.

@functionpointer
Author

You don't actually need a Raspberry Pi, the console on the webpage is enough. Make sure you capture several minutes.

FYI, I wrote a longer-term logging script, if you want to automate capturing logs: https://github.com/functionpointer/OpenDTU-Logger

@fishpepper

perfect! logger is running. i will report back with positive or negative results! any ideas why setting the base frequency made a difference?

@functionpointer
Author

functionpointer commented Mar 21, 2025 via email

@fishpepper

A total connection loss is unlikely, the HMS800 reported fine.

Unfortunately the HMS2000 lost the connection again at ~900W
image
Temperature does not seem to be the issue.

I have a log file of the whole day. I attached the part where it lost connection to this post (~9:40 the first time, recovered and then lost it completely at 9:42):
opendtu_log1.txt
I replaced the serial numbers with the xxx__HMS200/HMS800

@fishpepper

one more dataset where the connection is lost and restored:
image

and one picture where it is easier to see the dropouts:
image

and the log file
opendtu_log2.txt

@functionpointer
Author

Huh, it seems to not answer on any frequency.
Maybe the inverter creates its own interference at high power, causing your chosen channel to fail.
With the ChangeChannelCommand now on all frequencies, we may be causing the inverter to go back to the broken channel repeatedly.

I have pushed a small change that restores old behavior for ChangeChannelCommand.
Only the other commands are still being scanned.

Could you test it?

@fishpepper

sure! just uploaded commit 90882579c73b6418f7bdd1748c459422292f538d. I will report back :) Thanks!

@fishpepper

Unfortunately this did not help. Actually, 9088257 seems worse than 802ca4e. The reception loss seems to be more frequent:
image

before:
image

I only had one day of full reception above 900W (21.03.25) with 802ca4e. Am I the only one testing? Would it help others to post the binaries somewhere? I will revert back to 802ca4e for now.

@WIsbrecht

WIsbrecht commented Apr 1, 2025

Unfortunately this did not help. Actually, 9088257 seems worse than 802ca4e. The reception loss seems to be more frequent: image

before: image

I only had one day of full reception above 900W (21.03.25) with 802ca4e. Am I the only one testing? Would it help others to post the binaries somewhere? I will revert back to 802ca4e for now.

I've been running this pull request ever since functionpointer published it. But I've never had connection problems before.
Did you try the original hoymiles DTU? Just to make sure it's behaving differently in a known good environment. This 900 W issue you are currently experiencing seems weird to me. Maybe it behaves the same with an original hoymiles DTU and the issue is with your setup?

  • Waldemar

@functionpointer
Author

Aw that's disappointing :(

Do you have logs of the new (90882579) version losing connection?
It probably won't show much new information, but I'd like to take a look regardless.

@fishpepper

I've been running this pull request ever since functionpointer published it. But I've never had connection problems before.

I think the problems exist when running an HMS2000 with firmware revision 1.0.27 and another HMS 800 (?) with the same DTU. How is your setup?

Did you try the original hoymiles DTU?

I just ordered one to give it a try and/or upgrade the firmware of the HMS2000.

@functionpointer
Author

functionpointer commented Apr 2, 2025

I am running:

  • HMS 1800 4T firmware 1.0.27
  • HMS 1800 4T firmware 1.0.18
  • HMS 800 2T firmware 1.0.4
  • HMS 800 2T firmware 1.0.4

Before this PR, I had connectivity issues every few days. Every time, one or two of the HMS1800s were affected. The HMS800s never had issues.

With both versions of this PR it has been working great. Not a single connection loss for weeks now.

@fishpepper

I am running

  • HMS2000 4T with firmware 1.0.27 (hw 270692628 @ 01.10)
  • HMS800 2T with firmware 1.0.4 (hw 270614788 @ 01.00)

The DTU is 4m away from the HMS. The HMS800 is always working fine, the HMS2000 usually stops around 900W. With the first commit i had one single day where it worked fine all day.

My DTU settings are (just to be safe i did not mess those up):

  • Interval: 10s
  • CMT2300A power: 5dBm
  • Region: Europe (860.25-923.5MHz)
  • CMT2300A Frequency: 865.0MHz

@WIsbrecht

I'm running three HMS-1600-4T, but only two are added to the OpenDTU.

  • HMS-1600-4T Firmware Version 2.0.4
  • HMS-1600-4T Firmware Version 1.0.27

If you suspect there might be some kind of interference between your HMS-2000 and HMS-800, you could try disconnecting your HMS-800 and see if the 900 W anomaly still exists.

@fishpepper

good idea, i will disconnect the panels of the HMS800 tonight and report back :)

@fishpepper

I unplugged the HMS800 (DC and AC) and disabled it in openDTU. Connection was lost at 895W :( I will try the original DTU now.

@functionpointer
Author

I unplugged the HMS800 (DC and AC) and disabled it in openDTU. Connection was lost at 895W :( I will try the original DTU now.

Actually, i think that is a good sign. Interference would be pretty annoying to deal with. This way the problem is inherent to the HMS2000 inverter itself.

If the original DTU works, i think it is time to try a few different inverterTargetFrequency settings.
If that ends up working i may implement a new strategy that sweeps the target frequency automatically.

@WIsbrecht

WIsbrecht commented Apr 3, 2025

I would suggest testing your HMS-2000 with the original hoymiles DTU - maybe your HMS-2000 is defective. You never know.
Perhaps make sure your HMS-2000 is actually still feeding into the mains when OpenDTU loses connection at 900 W. Do you have an energy meter or a current clamp meter?

@fishpepper

The original DTU connected just fine and happily reports 1.2kW right now. I will keep watching it.

Perhaps make sure your HMS-2000 is actually still feeding into the mains when OpenDTU loses connection at 900 W.

It keeps producing. You can also see this in the following graph (only HMS2000 Power and Yield per Day shown):
image

Is there an easy way to capture the original DTU traffic? Something like #1650 ?
If there is no simple solution i could crack open the original DTU and attach a logic analyzer with SPI decoding to the CMT pins...

@WIsbrecht

Ok, good to know for sure your HMS-2000 is working as expected.
When you are done testing with the Hoymiles DTU, could you try placing the OpenDTU at another location? Or try playing with the "CMT2300A Transmitting power" setting in OpenDTU.

@fishpepper

fishpepper commented Apr 3, 2025

The openDTU is 4m from the HMS2000 and HMS800 (both at the same location). Direct line of sight. The hoymiles DTU is currently placed ~10m with several metal objects in between. I doubt my problems are due to reception/transmit power :(

@WIsbrecht

I don't think it's due to too little power, but maybe too much power. There can be radio interference due to placement and too much RF power. Try the same location as your Hoymiles DTU.

@fishpepper

The hoymiles DTU is now 3m from the HMS800 and HMS2000. I will have to wait for more sun tomorrow :)
To me, it looks rather strange that there are so many reports where it skips reception at around 900W. It looks like a bug/misinterpretation of some data to me. If there is a lot of sun, I might try to selectively cover/uncover the cells rapidly while above 900W. Would be interesting to see if it regains connectivity immediately on those events.

@functionpointer
Author

functionpointer commented Apr 3, 2025

Data misinterpretation seems unlikely, both my HMS1800 happily reported over 1000W for several hours today.
Additionally, the logs show no answers to any commands at all. Not even broken things we failed to decode.

That's why my assumption is self-interference. At high power levels, the converters might make noise that interferes with the radio. The Hoymiles DTU might be switching to different frequencies that still work.

That's why I suggested trying some frequencies in the DTU settings, ideally with the older commit 802ca4e. It should allow changing the channel more quickly.

@fishpepper

The mystery is why the Hoymiles DTU works. I think I will crack it open tomorrow afternoon and attach a Saleae logic analyzer to the SPI bus and capture some logs. That way we will see how the DTU manages to keep the connection alive :)

@fishpepper

Hoymiles DTU in close proximity worked fine as well.
I cracked open the DTU and found a serial port between the main CPU and a smaller CPU controlling the rf link.
For now I have set up a serial logger on RX and TX; to me it looks like the serial protocol (0x7E .... 0x7F). I will sniff the whole day tomorrow and write a decoder to see what's going on. I also attached a logic analyzer to RX, TX, and the SPI bus, but the capture on the raspi cannot keep up with the data. So for now only serial logs...

@fishpepper

fishpepper commented Apr 5, 2025

I have a question on the ChannelChangeCommand. If you look at my old log:

Mar 22 14:36:45 home python[869622]: TX ChannelChangeCommand 868.00 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:45 home python[869622]: 2025-03-22 14:36:45 [INFO] Received: RX Period End
Mar 22 14:36:45 home python[869622]: All missing
Mar 22 14:36:45 home python[869622]: Nothing received, resend whole request
Mar 22 14:36:45 home python[869622]: TX ChannelChangeCommand 868.00 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:45 home python[869622]: 2025-03-22 14:36:45 [INFO] Received: RX Period End
Mar 22 14:36:45 home python[869622]: All missing
Mar 22 14:36:45 home python[869622]: Nothing received, resend whole request
Mar 22 14:36:45 home python[869622]: TX ChannelChangeCommand 864.75 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:46 home python[869622]: 2025-03-22 14:36:46 [INFO] Received: RX Period End
Mar 22 14:36:46 home python[869622]: All missing
Mar 22 14:36:46 home python[869622]: Nothing received, resend whole request
Mar 22 14:36:46 home python[869622]: TX ChannelChangeCommand 865.25 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:46 home python[869622]: 2025-03-22 14:36:46 [INFO] Received: RX Period End
Mar 22 14:36:46 home python[869622]: All missing

The log says 868.00MHz, 864.75 MHz, and 865.25MHz but the command that is being sent is always the same:

 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56

You change the frequency at the CMT2300A but the packet you send always contains 865 MHz (0x14) as target?
Is this intentional?
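
The observation can be checked mechanically against a log capture. The sketch below is an illustration only: `payloads_by_frequency` is a hypothetical helper, the line format is assumed to be exactly the one in the excerpt above, and only the TX lines are evaluated.

```python
import re
from collections import defaultdict

# A few lines copied from the excerpt above.
LOG_EXCERPT = """\
Mar 22 14:36:45 home python[869622]: TX ChannelChangeCommand 868.00 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:45 home python[869622]: Nothing received, resend whole request
Mar 22 14:36:45 home python[869622]: TX ChannelChangeCommand 864.75 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
Mar 22 14:36:46 home python[869622]: All missing
Mar 22 14:36:46 home python[869622]: TX ChannelChangeCommand 865.25 MHz --> 56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
"""

# "TX <command name> <frequency> MHz --> <hex bytes>"
TX_LINE = re.compile(r"TX (\w+) ([\d.]+) MHz --> ((?:[0-9A-F]{2} ?)+)")


def payloads_by_frequency(log_text):
    """Map each tuned frequency (MHz) to the set of payloads sent on it."""
    result = defaultdict(set)
    for match in TX_LINE.finditer(log_text):
        _command, frequency, payload = match.groups()
        result[float(frequency)].add(payload.strip())
    return dict(result)


by_frequency = payloads_by_frequency(LOG_EXCERPT)
distinct_payloads = set().union(*by_frequency.values())
print(sorted(by_frequency), len(distinct_payloads))
```

For this excerpt the radio is tuned to three different frequencies while only one distinct payload is sent, which is the behavior being asked about.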

@tbnobody
Owner

tbnobody commented Apr 5, 2025

The log says 868.00MHz, 864.75 MHz, and 865.25MHz but the command that is being sent is always the same:

56 A0 09 A4 5C 80 16 67 96 02 15 21 14 14 56
You change the frequency at the CMT2300A but the packet you send always contains 865 MHz (0x14) as target?
Is this intentional?

Yes. At first, the inverter always listens at 865. With the channel change command, the frequency is changed to the working frequency.
If the inverter does not receive any packet within 15 minutes, the inverter frequency falls back to 865.
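
As a rough model of this behavior, the sketch below maps a channel byte to a frequency and applies the 15-minute fallback. It rests on assumptions that are not confirmed in this thread: a band base of 860 MHz and a 0.25 MHz channel spacing (chosen because they reproduce the 0x14 = 865 MHz pairing mentioned above), and a fallback that triggers at exactly 15 minutes. All names are hypothetical.

```python
BASE_FREQUENCY_MHZ = 860.0    # assumption: band base for the Europe setting
CHANNEL_WIDTH_MHZ = 0.25      # assumption: channel spacing
BOOT_FREQUENCY_MHZ = 865.0    # from the thread: the inverter first listens at 865
FALLBACK_TIMEOUT_S = 15 * 60  # from the thread: fallback after 15 minutes


def channel_to_mhz(channel):
    """Convert a channel byte such as 0x14 to a frequency in MHz."""
    return BASE_FREQUENCY_MHZ + channel * CHANNEL_WIDTH_MHZ


def mhz_to_channel(mhz):
    """Convert a frequency in MHz to the nearest channel number."""
    return round((mhz - BASE_FREQUENCY_MHZ) / CHANNEL_WIDTH_MHZ)


def expected_listen_frequency(working_mhz, seconds_since_last_packet):
    """Frequency the inverter is expected to listen on.

    Assumption: the fallback happens once the silence reaches the timeout;
    the exact boundary is not specified in the thread.
    """
    if seconds_since_last_packet >= FALLBACK_TIMEOUT_S:
        return BOOT_FREQUENCY_MHZ
    return working_mhz


print(channel_to_mhz(0x14), expected_listen_frequency(868.0, 20 * 60))
```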

@kwaaak

kwaaak commented Apr 10, 2025

May I ask about your DTU hardware? I also had poor reception of an HMS device when production was high.
I am using the Fusion DTU board. After replacing the included adhesive antenna with a generic 868 MHz rubber rod antenna, reception is perfect. Using v25.4.4

@stefan123t
Contributor

stefan123t commented Jun 12, 2025

@broth-itk posted some new DTU Pro S logs with actual Channel Change Command Sequence here:
#1218 (comment)

He first set it manually to Channel #1, then Channel #5, Channel #10 and finally back again to "Auto"
communication_log.txt

These used to be his IDs and SW

DTU: 80164633 (v00.02.23)
HMS 1: 80729785 (1800-4T, v1.00.27)
HMS 2: 81313070 (2000-4T, v1.00.27)
HMS 3: 83199047 (350-1T, v1.00.14)

I have been able to determine the following ChannelChangeCommand 0x56 with Frame 0x01 and 0x02 respectively

     66 [TX] Frame: 7E 56 80164633 80164633 01 15212714 50 7F
     41 [TX] Frame: 7E 56 80164633 80164633 01 15212114 56 7F
      9 [TX] Frame: 7E 56 80164633 80164633 01 15211814 6F 7F
      4 [TX] Frame: 7E 56 80164633 80164633 01 15210C14 7B 7F
     10 [TX] Frame: 7E 56 83199047 83199047 02 15211814 6C 7F
      6 [TX] Frame: 7E 56 83199047 83199047 02 15210C14 78 7F
      4 [TX] Frame: 7E 56 83199047 83199047 02 15210F14 7B 7F
      4 [TX] Frame: 7E 56 81313070 81313070 02 15210F14 7B 7F
      2 [TX] Frame: 7E 56 80729785 80729785 02 15210F14 7B 7F
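
In all of the frames listed above, the byte just before the closing 7F equals the XOR of every byte between the opening 7E and that byte. The sketch below checks this; it is an observation drawn from these frames rather than a documented protocol rule, and `xor_checksum` and `frame_is_consistent` are hypothetical names.

```python
def xor_checksum(data: bytes) -> int:
    """XOR all bytes together."""
    checksum = 0
    for byte in data:
        checksum ^= byte
    return checksum


def frame_is_consistent(frame_hex: str) -> bool:
    """Check the 7E/7F delimiters and the XOR byte of one logged frame."""
    raw = bytes.fromhex(frame_hex)  # whitespace between groups is ignored
    if len(raw) < 4 or raw[0] != 0x7E or raw[-1] != 0x7F:
        raise ValueError("frame is not delimited by 7E ... 7F")
    return xor_checksum(raw[1:-2]) == raw[-2]


# Frames copied from the list above.
TX_FRAMES = [
    "7E 56 80164633 80164633 01 15212714 50 7F",
    "7E 56 80164633 80164633 01 15212114 56 7F",
    "7E 56 83199047 83199047 02 15211814 6C 7F",
    "7E 56 81313070 81313070 02 15210F14 7B 7F",
]

for frame in TX_FRAMES:
    print(frame, frame_is_consistent(frame))
```

A check like this is useful when sniffing the serial link, because a frame that fails it was probably truncated or corrupted during capture.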

Here are the HMS ChannelChangeCommand 0xD6 responses to these commands.

    120 [RX] Frame: 7E D6 80164633 80164633 01 001521 E3 7F
     18 [RX] Frame: 7E D6 83199047 80164633 02 001521 4E 7F
      4 [RX] Frame: 7E D6 81313070 80164633 02 001521 F3 7F
      2 [RX] Frame: 7E D6 80729785 80164633 02 001521 E3 7F

We can also see the following 0x06 REQ_RF_SVERSISON commands but only a single received response 0x86

     62 [TX] Frame: 7E 06 81313070 81313070 00 06 7F
     58 [TX] Frame: 7E 06 80729785 80729785 00 06 7F
     39 [TX] Frame: 7E 06 83199047 83199047 00 06 7F
      1 [RX] Frame: 7E 86 83199047 80164633 02 1124831990470502000200000005 52 7F

Similarly, there is 0x07 REQ_RF_RVERSISON with SubCmd = 0x00, all of it in UsartNrf_Send_NetCmdToNrfCmd()

      1 [TX] Frame: 7E 07 80164633 80164633 00 07 7F
      1 [RX] Frame: 7E 87 80164633 80164633 02 10FD801646330502000200000005 8B 7F

Maybe @rejoe2 can recall what these Commands were used for in the ancient MI Gen2 protocol, i.e. pre Gen3 HM and HMS/HMT ?
AFAIR these used to be sent in the very first logs on mikrocontroller.net too. Usually right after boot up.

Similar to the 0x02 BROADCAST Command, the 0x06 REQ_RF_SVERSISON / 0x07 REQ_RF_RVERSISON are IMO a kind of init for the RF protocol, i.e. the Hoymiles DTU uses it to tell the inverter that it should connect to a (new) DTU and train to answer it on a specific RF channel.

I will have to search through the channel_change_communication_log.txt to find these commands in context ...

2025-06-09 13:13:19.401703 [RX] Frame: 7E8780164633801646330210FD8016463305020002000000058B7F
2025-06-09 13:13:19.407326 [TX] Frame: 7E07801646338016463300077F
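
To find such commands in context, the timestamped lines can be split into fields. The following is a minimal sketch, assuming the line format shown above and the byte layout suggested by the spaced frames earlier in this comment (command, two 4-byte IDs, one frame byte, payload, checksum). The field names `first_id`, `second_id` and `frame_no` are hypothetical labels, not names from the Hoymiles firmware.

```python
import re
from dataclasses import dataclass
from datetime import datetime

LINE = re.compile(r"^(\S+ \S+) \[(RX|TX)\] Frame: ([0-9A-Fa-f]+)$")


@dataclass
class Frame:
    timestamp: datetime
    direction: str   # "RX" or "TX"
    command: int     # e.g. 0x07 or 0x87
    first_id: str    # first 4-byte ID, as hex text
    second_id: str   # second 4-byte ID, as hex text
    frame_no: int    # single byte following the IDs
    payload: bytes   # everything between frame_no and the checksum
    checksum: int


def parse_line(line: str) -> Frame:
    """Parse one '<date> <time> [RX|TX] Frame: <hex>' log line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized log line")
    stamp, direction, hex_text = match.groups()
    raw = bytes.fromhex(hex_text)
    # Shortest frame: 7E, command, 2 x 4 ID bytes, frame byte, checksum, 7F.
    if len(raw) < 13 or raw[0] != 0x7E or raw[-1] != 0x7F:
        raise ValueError("frame is not delimited by 7E ... 7F")
    return Frame(
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"),
        direction=direction,
        command=raw[1],
        first_id=raw[2:6].hex(),
        second_id=raw[6:10].hex(),
        frame_no=raw[10],
        payload=raw[11:-2],
        checksum=raw[-2],
    )


rx = parse_line(
    "2025-06-09 13:13:19.401703 [RX] Frame: "
    "7E8780164633801646330210FD8016463305020002000000058B7F"
)
tx = parse_line("2025-06-09 13:13:19.407326 [TX] Frame: 7E07801646338016463300077F")
print(hex(rx.command), rx.first_id, rx.payload.hex(), hex(tx.command))
```

Sorting the parsed frames by `timestamp` and filtering on `command` would then show which requests precede each 0x86/0x87 response.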
