Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SIGSEGV in fra-ai #3346

Merged
merged 15 commits into from
Jan 14, 2025
Merged

Fix SIGSEGV in fra-ai #3346

merged 15 commits into from
Jan 14, 2025

Conversation

leszko
Copy link
Contributor

@leszko leszko commented Jan 10, 2025

FFMPEG causes SIGSEGV, which is very hard to analyze. The bug can be in LPMS or in the FFMPEG library itself. This PR extracts running ffmpeg to a separate process.

Notes:

  • I've used system ffmpeg instead of the one from LPMS; We could switch to LPMS, but then we need to build an additional binary (which I find redundant)
  • Additionally I've added Timeouts for the requests done against MediaMTX; It's not strictly related to SIGSEGV, but I noticed there were some hanging connections between Livepeer and MediaMTX

@github-actions github-actions bot added docker Pull requests that update Docker code go Pull requests that update Go code labels Jan 10, 2025
Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 0% with 29 lines in your changes missing coverage. Please review.

Project coverage is 33.72602%. Comparing base (3e89259) to head (e52f462).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
server/ai_live_video.go 0.00000% 12 Missing ⚠️
media/rtmp2segment.go 0.00000% 11 Missing ⚠️
media/mediamtx.go 0.00000% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3346         +/-   ##
===================================================
- Coverage   33.73323%   33.72602%   -0.00721%     
===================================================
  Files            141         141                 
  Lines          37426       37434          +8     
===================================================
  Hits           12625       12625                 
- Misses         24080       24088          +8     
  Partials         721         721                 
Files with missing lines Coverage Δ
media/mediamtx.go 0.00000% <0.00000%> (ø)
media/rtmp2segment.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e89259...e52f462. Read the comment docs.

Files with missing lines Coverage Δ
media/mediamtx.go 0.00000% <0.00000%> (ø)
media/rtmp2segment.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)

@leszko leszko changed the title Add debug logs to investigate crashing fra-ai Fix SIGSEGV in fra-ai Jan 13, 2025
@leszko leszko marked this pull request as ready for review January 13, 2025 12:37
@leszko leszko requested review from mjh1 and j0sh January 13, 2025 12:37
@mjh1
Copy link
Contributor

mjh1 commented Jan 13, 2025

So purely running it as a separate binary fixes the problem? Or you think there's some bug fixed in a later version of ffmpeg which we're using when running it as a separate binary?

@leszko
Copy link
Contributor Author

leszko commented Jan 13, 2025

So purely running it as a separate binary fixes the problem? Or you think there's some bug fixed in a later version of ffmpeg which we're using when running it as a separate binary?

It doesn't fix the SIGSEGV, but it makes it way better handled, because not the whole Gateway pod is restarted (which closes all streams).

At the same time, I think we don't need to spend more time on that right now, because maybe the impact is really low or maybe it's even zero impact (e.g. it might have happened only while closing the stream or something). So, I'd wait and see if we need to still investigate it.

VideoEncoder: ffmpeg.ComponentOptions{Name: "copy"},
Muxer: ffmpeg.ComponentOptions{Name: "segment"},
}})
cmd := exec.Command("ffmpeg",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j0sh what were the reasons to use lpms transcode3 here? Guessing mostly ease of use since it was in go-livepeer already?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct - we don't want to require additional executables outside go-livepeer itself (but this fix is fine for now since it gets us out of a pickle)

@leszko leszko merged commit 1a4c7c7 into master Jan 14, 2025
17 of 18 checks passed
@leszko leszko deleted the rafal/debug-sigsegv branch January 14, 2025 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker Pull requests that update Docker code go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants