Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite load for a specific container #3385

Closed
1 task done
FrancYescO opened this issue Nov 13, 2024 · 39 comments
Closed
1 task done

Infinite load for a specific container #3385

FrancYescO opened this issue Nov 13, 2024 · 39 comments
Labels
bug Something isn't working

Comments

@FrancYescO
Copy link

πŸ” Check for existing issues

  • Completed

How is Dozzle deployed?

Standalone Deployment

πŸ“¦ Dozzle version

8.7.5

βœ… Command used to run Dozzle

as doc

πŸ› Describe the bug / provide steps to reproduce it

i have a specific container that does not load logs, is like the same infinite load as if you set a log filter that have no result, or if a container have a 0b of logs (can we fix also these 2cases and get the loading bar instead of spinner if /stream endpoint connection is fullfilled? :D)... but in this case the logs are populated just the /stream api returns empty

going backward with versions i noticed the bug popups in >=v7.0.5 while v7.0.4 correctly loads the logs of this container

for reference is what i posted here #3222 (comment)

all other container on the same host shows no issue, so i assume is something in the content of logs that breaks things, (is a pretty standard nginx log output)... this is how it ends actually:

...
2023-06-01T13:42:08.858573533Z 172.18.0.13 - - [01/Jun/2023:13:42:08 +0000] "GET /api/rpc/command/get-file-libraries?file-id=3b37f927-c632-809f-8002-67f1ddd6fbf5&features=storage/pointer-map HTTP/1.1" 200 113778 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/113.0.0.0 Safari/537.36"
2023-09-21T10:01:32.969509513Z 2023/09/21 10:01:26 [emerg] 1#1: host not found in upstream "penpot-backend" in /etc/nginx/nginx.conf:85
2023-09-21T10:01:33.372310975Z nginx: [emerg] host not found in upstream "penpot-backend" in /etc/nginx/nginx.conf:85
2023-10-03T09:02:23.994639918Z 2023/10/03 09:02:23 [emerg] 1#1: host not found in upstream "penpot-backend" in /etc/nginx/nginx.conf:85
2023-10-03T09:02:24.415120539Z nginx: [emerg] host not found in upstream "penpot-backend" in /etc/nginx/nginx.conf:85

πŸ’» Environment

πŸ“Έ If applicable, add screenshots to help explain your bug

No response

πŸ“œ If applicable, attach your Dozzle logs. You many need to enable debug mode. See https://dozzle.dev/guide/debugging.

No response

@FrancYescO FrancYescO added the bug Something isn't working label Nov 13, 2024
@amir20
Copy link
Owner

amir20 commented Nov 13, 2024

I reviewed the differences between versions 7.0.4 and 7.0.5 at v7.0.4...v7.0.5. One thing stood out: in version 7.0.5, I follow containers and only stream logs since the container started. This change prevents old logs from appearing when containers are restarted. I doubt this is the source of the problem.

I need a way to reproduce this issue on my end.

  1. How are you starting Nginx?
  2. If you create a dummy Nginx instance, does the same issue occur?
  3. Have you confirmed that no filters, log levels, or stdout/stderr toggles are set?
  4. What does the API endpoint /stream return?

I can't think of anything else.

@FrancYescO
Copy link
Author

FrancYescO commented Nov 13, 2024

is not a vanilla nginx is the one embedded in the frontend of this stack: https://github.com/penpot/penpot/blob/develop/docker/images/docker-compose.yaml runned via portainer and with restart: unless-stopped instead of always

  1. No. i have ~30services on this host only this container have the issue

  2. Sure.

  3. After ~6s just pings :ping

@amir20
Copy link
Owner

amir20 commented Nov 14, 2024

I am stumped on this one. Seems like your frontend container is based off nginxinc/nginx-unprivileged. docker run nginxinc/nginx-unprivileged seems to work with Dozzle.

Not sure what can be done to debug. Maybe completely destroy the container and recreate it.

@FrancYescO
Copy link
Author

i've recreated the container and the bug (but also older logs :D) disappeared, so actually i have no way to replicate we don't know the culprit but maybe we can just close this isse....

@amir20 amir20 closed this as completed Nov 15, 2024
@FrancYescO
Copy link
Author

FrancYescO commented Nov 18, 2024

i reuse this issue but i'm not sure this is the same bug ... today with 8.8.0 i was trying to see log of a specific container but figured out that it was searching for logs of wrong-destroyed container id:

i got my container id e9fbd40f2b29 running than deployed a new image via portainer, so the container e9fbd40f2b29 got stopped and restarted with a new id 20c05b0abb96, dozzle did not switched to the new container and also after refreshing the page/clearing cache looks like was still seeing the old id e9fbd40f2b29 as running (but with no logs)

each click on the container in the sidebar result in dozzle logs says that the containerid was not found but in the webpage the "Container not found" message did not show (was just loading a few sec than shows no log, not the same as if you put a non-existent id in the URL)

recreating the container did not help this time, instead restarting dozzle container fixed the issue.... is there some sort of cache somewhere?

@amir20
Copy link
Owner

amir20 commented Nov 18, 2024

Hey @FrancYescO,

I haven't seen this issue personally. Someone else created it a while back, but I can't find it right now. I'll need your help to debug. If I can't reproduce the issue, it will be difficult to fix.

Yes, Dozzle has an internal "store" that caches and stores all statistics for containers. It usually works well. I listen for specific Docker events to update the internal store.

It seems like there might be an issue with Portainer, as the other case I remember was similar.

Can you try to reproduce this state again? If you can consistently get into this state, it would be helpful to enable DEBUG logs for Dozzle and also keep docker system events open while this is happening.

Most likely, some unusual event is occurring, and Dozzle isn't refreshing its own store.

Thanks!

@FrancYescO
Copy link
Author

i'll keep an eye on it to find a way to easily reproduce... surely i can say that seems is the first time i've seen the issue and my deployment was ever done in the same way

listen for specific Docker events

at what events are you listening for destroy and recreate? ... maybe there is some edge-case event the is getting ignored

@amir20
Copy link
Owner

amir20 commented Nov 18, 2024

Dozzle listens to start, destroy, die, rename and few other ones.

You can find it at

case "start":
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
if container, err := s.client.FindContainer(ctx, event.ActorID); err == nil {
list, _ := s.client.ListContainers(ctx)
// make sure the container is in the list of containers when using filter
valid := lo.ContainsBy(list, func(item Container) bool {
return item.ID == container.ID
})
if valid {
log.Debug().Str("id", container.ID).Msg("container started")
s.containers.Store(container.ID, &container)
s.newContainerSubscribers.Range(func(c context.Context, containers chan<- Container) bool {
select {
case containers <- container:
case <-c.Done():
}
return true
})
}
}
cancel()
case "destroy":
log.Debug().Str("id", event.ActorID).Msg("container destroyed")
s.containers.Delete(event.ActorID)
case "die":
s.containers.Compute(event.ActorID, func(c *Container, loaded bool) (*Container, bool) {
if loaded {
log.Debug().Str("id", c.ID).Msg("container died")
c.State = "exited"
return c, false
} else {
return c, true
}
})
case "health_status: healthy", "health_status: unhealthy":
healthy := "unhealthy"
if event.Name == "health_status: healthy" {
healthy = "healthy"
}
s.containers.Compute(event.ActorID, func(c *Container, loaded bool) (*Container, bool) {
if loaded {
log.Debug().Str("id", c.ID).Str("health", healthy).Msg("container health status changed")
c.Health = healthy
return c, false
} else {
return c, true
}
})
case "rename":

@FrancYescO
Copy link
Author

Btw just happened again...

I'll keep trying to isolate the issue but pretty sure is something with portainer webhook redeploy (or recreate?) methods

@FrancYescO
Copy link
Author

FrancYescO commented Nov 20, 2024

i'm not 100% sure but this should be an extract during that bug happend .. maybe we miss kill? or the rename after kill-stop-die-destroy do some magic ?

...
2024-11-20T10:41:57.305370996+01:00 image pull registry.org/image:latest
2024-11-20T10:41:57.418400610+01:00 container create 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
2024-11-20T10:41:57.436166302+01:00 container kill 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.120718193+01:00 container stop 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.125265685+01:00 container die 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.136485321+01:00 container destroy 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.145454536+01:00 container rename 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
2024-11-20T10:41:58.804224198+01:00 container start 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
...

@amir20
Copy link
Owner

amir20 commented Nov 20, 2024

Hmmm....

I don't see anything wrong here. I see two containers. The one starting 21 and the other with 6e.

  1. 6e gets created. This event is ignored by Dozzle.
  2. Then renamed. Which gets picked up but ignored because Dozzle never picked up the created event.
  3. Then started which should be handled correclty.

And then other container:

  1. Container 21 get killed. Which is ignored. stoped which is ignored.
  2. die should be picked up.
  3. And finally destroy should remove it.

One thing you can do is run with --level trace for Dozzle. You should see something like "event=.... received container event" which output all the events Dozzle sees.

Perhaps compare those with system events. There might be something else happening. How are you getting to this state? I tried setting a Portainer a while ago but it just seemed more complicated and I couldn't figure out how to use it properly.

@FrancYescO
Copy link
Author

It happened 2times during a stack update via portainer webhook (you should register at least a free account for the business edition)

But this happens various time during the day and all keeps working correctly so I'm still not able to find an easy replicable path...

@amir20
Copy link
Owner

amir20 commented Nov 20, 2024

But this happens various time during the day and all keeps working correctly so I'm still not able to find an easy replicable path.

I wonder if this is a race issue, as it occurs sometimes but not always.

  1. What specific action are you taking on Portainer? Are you updating a container?
  2. And I know you said this earlier, but what is the broken behavior?
  3. Is it that the IDs and names are incorrect? Or something else.

@FrancYescO
Copy link
Author

  1. CI push an image > call portainer stack webhook > portainer pull and redeploy the stack (the updated pushed image)
  2. container is shown on the sidebar but not loading logs as it points to a wrong container ID
  3. ID (the one showed also in browser URL)

@amir20
Copy link
Owner

amir20 commented Nov 20, 2024

I see. So wrong id, correct name. I didn't realize this is part of a stack. I can try it out.

@amir20
Copy link
Owner

amir20 commented Nov 21, 2024

@FrancYescO, is dozzle part of your stack, or is it always running separately, and the stack is updated independently?

@FrancYescO
Copy link
Author

@FrancYescO, is dozzle part of your stack, or is it always running separately, and the stack is updated independently?

I have various stack, and dozzle is part of another stack

@amir20
Copy link
Owner

amir20 commented Nov 21, 2024

@FrancYescO, no luck. I set up Portainer Community Edition and created a new stack like this:

services:
  echo:
    image: amir20/echo:latest
    command: -r
    deploy:
      replicas: 3 # Specifies three replicas
      update_config:
        parallelism: 1 # Update one container at a time
        delay: 10s # Wait 10s between updating each container
      restart_policy:
        condition: on-failure
        max_attempts: 3

Everything was working well in Dozzle. I redeployed the stack by changing amir20/echo:latest to amir20/echo:v2. I checked the logs, and everything looked fine. In Dozzle, all the IDs matched those in docker ps, and everything functioned properly.

I wonder if the issue is related to the webhooks. Unfortunately, that's beyond the time I have available. I can't seem to reproduce it easily. If you find a reliable way to reproduce it without setting up the entire pipeline, that would be helpful.

@FrancYescO
Copy link
Author

in my case the image, have ever the same "latest" tag.... but i don't think that matter... this happened 2 time more today (while the stack was probably redeployed with no dozzle issue 20+ more), but i was still not tracing the events properly i'll update here if i find something more

@FrancYescO
Copy link
Author

Ah what surely I should say is that during the portainer redeploy you will NEVER be redirected to the new container (but this can be a different issue related to #3036 )

@FrancYescO
Copy link
Author

Here we are again, and now i'm sure, here a result of the dozzle filtered log grep -E "e4c57f33f6d8|c8cabeb803ed|Dozzle"

e4c57f33f6d8 > container that remain orphaned after the stack update
c8cabeb803ed > the container that replaced e4c57f33f6d8
Dozzle > just to show when i've restarted dozzle

"Error response from daemon: No such container: e4c57f33f6d8" is me clicking on the cointainer in the sidebar and getting no logs...

looks dozzle never received events from c8cabeb803ed, the only thing i see before the Dozzle restart is "starting to stream stats"...

2024-11-22T09:01:50.058746494Z {"level":"info","time":"2024-11-22T09:01:50Z","message":"Dozzle version v8.8.1"}
2024-11-22T11:04:52.017508387Z {"level":"info","time":"2024-11-22T11:04:52Z","message":"Dozzle version v8.8.1"}
2024-11-22T14:03:57.154746555Z {"level":"trace","event":"create","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"received container event"}
2024-11-22T14:03:57.899557141Z {"level":"trace","event":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"received container event"}
2024-11-22T14:03:57.899577988Z {"level":"debug","action":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"container event"}
2024-11-22T14:03:57.899582791Z {"level":"debug","action":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"container event"}
2024-11-22T14:03:58.380355811Z {"level":"debug","container":"e4c57f33f6d8","host":"sasasa","time":"2024-11-22T14:03:58Z","message":"starting to stream stats"}
2024-11-22T14:03:58.380398024Z {"level":"trace","event":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"received container event"}
2024-11-22T14:03:58.380429289Z {"level":"debug","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"Finding container"}
2024-11-22T14:03:58.402801270Z {"level":"debug","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container started"}
2024-11-22T14:03:58.402823770Z {"level":"debug","action":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container event"}
2024-11-22T14:03:58.402845560Z {"level":"debug","action":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container event"}
2024-11-22T14:18:38.038759596Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:18:38Z","message":"Streaming logs for container"}
2024-11-22T14:28:53.477980308Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:28:53Z","message":"Streaming logs for container"}
2024-11-22T14:34:27.811490031Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:34:27Z","message":"Streaming logs for container"}
2024-11-22T14:35:18.459489416Z {"level":"debug","container":"e4c57f33f6d8","time":"2024-11-22T14:35:18Z","message":"streaming ended"}
2024-11-22T14:35:18.459538628Z {"level":"debug","event":"container-stopped","container":"e4c57f33f6d8","time":"2024-11-22T14:35:18Z","message":"received event"}
2024-11-22T14:35:19.570267074Z {"level":"debug","container":"c8cabeb803ed","host":"sasasa","time":"2024-11-22T14:35:19Z","message":"starting to stream stats"}
2024-11-22T14:38:30.085802175Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:38:30Z","message":"Streaming logs for container"}
2024-11-22T14:38:30.086887681Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:38:30Z","message":"unknown error while streaming logs"}
2024-11-22T14:39:16.178559669Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:39:16Z","message":"Streaming logs for container"}
2024-11-22T14:39:16.180725431Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:39:16Z","message":"unknown error while streaming logs"}
2024-11-22T14:39:26.350330671Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:39:26Z","message":"Streaming logs for container"}
2024-11-22T14:39:26.351109664Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:39:26Z","message":"unknown error while streaming logs"}
2024-11-22T14:42:49.928094353Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:42:49Z","message":"Streaming logs for container"}
2024-11-22T14:42:49.929305570Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:42:49Z","message":"unknown error while streaming logs"}
2024-11-22T14:43:40.137630195Z {"level":"info","time":"2024-11-22T14:43:40Z","message":"Dozzle version v8.8.1"}
2024-11-22T14:43:40.209060944Z {"level":"debug","id":"c8cabeb803ed","time":"2024-11-22T14:43:40Z","message":"Finding container"}
2024-11-22T14:43:43.596250130Z {"level":"debug","container":"c8cabeb803ed","host":"sasasa","time":"2024-11-22T14:43:43Z","message":"starting to stream stats"}
2024-11-22T14:43:55.880136993Z {"level":"debug","id":"c8cabeb803ed","since":"2024-11-22T14:35:19Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:43:55Z","message":"Streaming logs for container"}

@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

Can you attach the whole logs? I am looking for other errors such as "docker store unexpectedly disconnected from docker events". It doesn't make sense why there is a bunch of "received container event" events and then suddenly they stop.

@FrancYescO
Copy link
Author

@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

OK it's not what I thought. But I have a wild guess. Give me a little time. Let me check something and provide a PR.

@amir20 amir20 reopened this Nov 22, 2024
@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

#3417 I am just taking a guess here. I noticed that after e4c57f33f6d8 started, no other events were received. So I thought maybe it's blocking for some reason.

I don't know why it would be but this would be a fix. Can you try pr-3417?

@FrancYescO
Copy link
Author

ok i'll leave this pr running, i'll update if happens again...

@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

If it works, it will make me go crazy why. Because currently, it shouldn't block.

By the way, how many people are using Dozzle in a browser? Is it just you or is it a few?

@FrancYescO
Copy link
Author

few, but often non-concurrent

@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

Hmm, okay. That could explain what I am seeing. I'll add more debugging logs later. If my intuition is correct, a deadlock occurs after a new container is found because multiple sessions are trying to write to a channel.

@amir20
Copy link
Owner

amir20 commented Nov 22, 2024

I am guessing that didn't solve your problem. I didn't realize you had originally greped the logs so only logs with e4c57f33f6d8 were being displayed. Looking at the full log, it looks to be fine.

It's crazy that I don't see c8cabeb803ed at all in the logs except for stats. I am pretty stumped right now.

@amir20
Copy link
Owner

amir20 commented Nov 23, 2024

Alright, I think we need to do some deeper investigation. I have enabled stack trace dump in #3418. You can deploy with pr-3418.

Next time this happens, go to https://<path-to-dozzle>/debug. You should see a page with link to goroutines. Send me that page. I want to understand why container store is not receiving any events. This will tell me if that thread is stuck some where. For a good state, we should see something like /dozzle/internal/docker/container_store.go:176 and container_store.go:56.

Let me know!

@amir20
Copy link
Owner

amir20 commented Nov 25, 2024

I am going to merge the debug API to master. I felt adding an extra 500KB is not worth it. But whatever. If it was merged before then we could have troubleshooted this issue sooner.

So you can try master if you want.

@FrancYescO
Copy link
Author

I've deleted my comments, was nothing related to this issue...still waiting for this to rehappen.

@amir20
Copy link
Owner

amir20 commented Nov 26, 2024

So has this not happened yet?

@FrancYescO
Copy link
Author

Now that i'm waiting for it, no πŸ˜…

Unfortunately the stack update now are fewer per day so less possibilities to retrigger...and I was just wondering if this can happen only if there are concurrent users using dozzle on the same container

@amir20
Copy link
Owner

amir20 commented Nov 26, 2024

It would increase the chances since Dozzle has a complicated backend with all kinds of multi threading and semaphores. Once it happens, the stack trace will help me if any of the routines are stuck any where. The worst thing that could happen is you say it didn't work but the stack trace looks right. Because then I wouldn't know how to go about investigating.

Let's see how next few days go.

@amir20
Copy link
Owner

amir20 commented Dec 3, 2024

Any update?

@FrancYescO
Copy link
Author

Unfortunately not.

I'm start thinking the bug was caused only if multiple user are on the same container log that is being redeployed, but i'm still on the pr where you just added debug and the issue still not popupped again...

@amir20
Copy link
Owner

amir20 commented Dec 4, 2024

Even if multiple users were present, I would imagine it would be the same thing as having multiple tabs open for one user. Which is something I do often.

I am going to close this. You can update to the latest release. As long as debug is enabled, you should be able to get me stracktrace.

Let's open a new one if it is found.

@amir20 amir20 closed this as not planned Won't fix, can't repro, duplicate, stale Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants