Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAProxy crashing on start, thread 2 is about to kill the process. #224

Open
ahuston-0 opened this issue Dec 11, 2023 · 11 comments
Open

HAProxy crashing on start, thread 2 is about to kill the process. #224

ahuston-0 opened this issue Dec 11, 2023 · 11 comments

Comments

@ahuston-0
Copy link

Not sure if I should file this here or with the upstream, but my haproxy setup was working fine for months until a few days ago. I'm not sure what changed, but now it crashes with "Thread 2 is about to kill the process" and then no traffic goes through.

Environment Details
OS: NixOS 23.11
Docker 24.0.5
Storage is OverlayFS2 on ZFS 2.2.2

HAProxy config

global
 # log stdout format raw local0 info
  log stdout format raw local0
  crt-base /etc/ssl/certs/

defaults
  log global
  mode http
  timeout client 2000m
  timeout connect 200s
  timeout server 2000m
  timeout http-request 2000m

#Application Setup
frontend ContentSwitching
  bind *:80
 # bind *:443 ssl crt /etc/ssl/certs/cloudflare.pem
  bind *:443 ssl crt /etc/ssl/certs/origin_ca_ecc_root_new.pem
  mode  http
  option httplog

  # Front-end acess control list
  acl host_glances hdr(host) -i monit.mydomain.xyz
  acl host_glances hdr(host) -i glances.mydomain.xyz
  # Backend-forwarding
  use_backend glances_nodes if host_glances

backend glances_nodes
  mode http
  server server glances:61208

Docker Compose file

services:
  haproxy:
    privileged: false
    restart: always
    image:latest
     volumes:
      - ./haproxy:/usr/local/etc/haproxy:ro
      - ./certs:/etc/ssl/certs:ro
    ports:
      - 9080:80
      - 9443:443
      - 8404:8404
      - 25565:25565
    environment:
      - PUID=997
      - PGID=100
    networks:
      - web
      - default
networks:
  web:
    name: haproxy-net

Crash logs

haproxy-1 | [NOTICE] (1) : New worker (8) forked
haproxy-1 | [NOTICE] (1) : Loading success.
haproxy-1 | Thread 2 is about to kill the process.
haproxy-1 | Thread 1 : id=0x7fc626c57100 act=1 glob=0 wq=0 rq=1 tl=0 tlsz=0 rqsz=1
haproxy-1 | 1/1 stuck=0 prof=0 harmless=0 isolated=0
haproxy-1 | cpu_ns: poll=0 now=6035627479 diff=6035627479
haproxy-1 | curr_task=0
haproxy-1 | *>Thread 2 : id=0x7fc626c4b700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
haproxy-1 | 1/2 stuck=1 prof=0 harmless=0 isolated=0
haproxy-1 | cpu_ns: poll=97660 now=2001614429 diff=2001516769
haproxy-1 | curr_task=0
haproxy-1 | call trace(10):
haproxy-1 | | 0x564a2f23d661 [eb ba 66 66 2e 0f 1f 84]: ha_thread_dump+0x91/0x93
haproxy-1 | | 0x564a2f23d8a1 [64 48 8b 53 10 64 48 8b]: ha_panic+0x111/0x3f4
haproxy-1 | | 0x7fc626f86140 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13140
haproxy-1 | | 0x564a2f284282 [48 83 7a 28 00 74 36 48]: fd_reregister_all+0x32/0xb1
haproxy-1 | | 0x564a2f0b5709 [b8 01 00 00 00 5b 5d 41]: main+0x4ec9
haproxy-1 | | 0x564a2f21345e [85 c0 0f 84 5a 03 00 00]: main+0x162c1e
haproxy-1 | | 0x7fc626f7aea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
haproxy-1 | | 0x7fc626e9aa2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

@Darlelet
Copy link
Contributor

If you still encounter the issue please file an issue on HAProxy directly (https://github.com/haproxy/haproxy/issues)

"Thread X is about to kill the process" means haproxy watchdog noticed that one thread has become unresponsive and to prevent further issues the watchdog decided to abort the process.

Since you're deploying using the :latest tag, it's very likely that you are hitting a bug or limitation which only happens on haproxy 2.9 (which was just released, see 81e9df2) and didn't show up with the previous version (2.8)

@ahuston-0
Copy link
Author

For reference, I've managed to get a copy of my haproxy config running with the nixos haproxy service, so the issue is isolated to docker.

@ahuston-0
Copy link
Author

If you still encounter the issue please file an issue on HAProxy directly (https://github.com/haproxy/haproxy/issues)

"Thread X is about to kill the process" means haproxy watchdog noticed that one thread has become unresponsive and to prevent further issues the watchdog decided to abort the process.

Since you're deploying using the :latest tag, it's very likely that you are hitting a bug or limitation which only happens on haproxy 2.9 (which was just released, see 81e9df2) and didn't show up with the previous version (2.8)

I think I've seen the same on haproxy lts but let me file a bug with upstream. Thanks

@Darlelet
Copy link
Contributor

Darlelet commented Jan 13, 2024

Any news about this issue? Do you still encounter the crash with :latest or :2.9.2 tag which where some high cpu usage related bugs were addressed?

Thanks

@ahuston-0
Copy link
Author

Hi, I've upgraded to the latest version and will get back in a few hours. I did discover though that the original issue I was having that was causing these logs is some memory/FD bug. If I dont add a nofile limit to the container it was just consuming like 200GB of memory and then crashing (this is on a server with ~256GB of memory available).

@ahuston-0
Copy link
Author

This is what I added

    ulimits:
      nofile:
        soft: 1024
        hard: 4096

@Darlelet
Copy link
Contributor

Does it instantly ramp up to 200GB of used memory or is it slowing getting there, which would suggest a leak somewhere?

In the first case, maybe this is not related to haproxy itself but to docker engine update (which removed or increased an existing limit), see haproxy/haproxy#2043. If that's the case, you could also mitigate using maxconn or fd-hard-limit global parameters in haproxy config file.

@ahuston-0
Copy link
Author

Does it instantly ramp up to 200GB of used memory or is it slowing getting there, which would suggest a leak somewhere?

In the first case, maybe this is not related to haproxy itself but to docker engine update (which removed or increased an existing limit), see haproxy/haproxy#2043. If that's the case, you could also mitigate using maxconn or fd-hard-limit global parameters in haproxy config file.

It was happening over the span of like a minute or two. I can probably time it and get back. I'll check out those two settings and see if they help.

Regarding the CPU utilization issue, I did the upgrade last night and 12 hours in we're at 0.6% CPU so I think that one might have worked.

@ahuston-0
Copy link
Author

Does it instantly ramp up to 200GB of used memory or is it slowing getting there, which would suggest a leak somewhere?

In the first case, maybe this is not related to haproxy itself but to docker engine update (which removed or increased an existing limit), see haproxy/haproxy#2043. If that's the case, you could also mitigate using maxconn or fd-hard-limit global parameters in haproxy config file.

It looks like this ended up working. I removed the ulimits settings in docker-compose.yml and replaced it with maxconn 60000 in haproxy.cfg and now theres no more high memory utlization and crashing. Then for the other issue having upgraded to 2.9.2 seems to have fixed the high CPU utilization.

@Darlelet
Copy link
Contributor

Great news, thanks for sharing your positive results with us.

@LaurentGoderre
Copy link
Member

Can we close this as solved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants