Using RecorderService manim-voiceover hangs (before or after releasing the 'r' key?) #56

azampa · 2023-06-07T15:05:47Z

Description of bug / unexpected behavior

I rendered a file with CoquiService, and both manim and manim-voiceover worked correctly. I then switched to RecorderService, selected the input device (e.g. 13 - default), and started recording by pressing the 'r' key. When I release the 'r' key, manim-voiceover hangs as if stuck in an infinite idle loop. Inspecting the ./media/voiceovers folder, I see that no file has been produced, so I suspect that manim-voiceover is still waiting for the user to press 'r'...

Expected behavior

Manim-voiceover should start recording the voiceover as soon as the user presses 'r'. After releasing the 'r' key manim-voiceover should ask to choose from the following options:

l to [l]isten to the recording
r to [r]e-record
a to [a]ccept the recording

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
# from manim_voiceover.services.gtts import GTTSService
# from manim_voiceover.services.coqui import CoquiService
from manim_voiceover.services.recorder import RecorderService
from math import *

class RecordVoiceover(VoiceoverScene):
    
    def construct(self):
        circle = Circle()

        # self.set_speech_service(GTTSService(lang='it',tld='it',transcription_model='base'))
        self.set_speech_service(RecorderService())
        # self.set_speech_service(CoquiService(
        #                                         model_name='tts_models/it/mai_male/glow-tts',
        #                                         transcription_model='base'
        #                                     )
        #                        )

        with self.voiceover(
                text='''Ora creo un <bookmark mark="A"/> cerchio,
                        poi lo muovo a <bookmark mark="B"/> destra
                        e infine lo <bookmark mark="C"/> elimino.
                     ''') as tracker:
            self.wait_until_bookmark('A')
            self.play(Create(circle,run_time=0.5))
            self.wait_until_bookmark('B')
            self.play(circle.animate.shift(RIGHT),run_time=0.5)
            self.wait_until_bookmark('C')
            self.play(FadeOut(circle),run_time=0.5)

Additional media files

Images/GIFs

Logs

Terminal output


System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Linux Ubuntu 23.04
RAM: 32 GB
Python version (python/py/python3 --version): 3.10.11
Installed modules (provide output from pip list):

Package                        Version
------------------------------ ------------
accelerate                     0.19.0
aiohttp                        3.8.4
aiosignal                      1.3.1
anyascii                       0.3.2
appdirs                        1.4.4
async-timeout                  4.0.2
attrs                          23.1.0
audioread                      3.0.0
azure-cognitiveservices-speech 1.29.0
Babel                          2.12.1
backports.cached-property      1.0.2
bangla                         0.0.2
blinker                        1.6.2
bnnumerizer                    0.0.2
bnunicodenormalizer            0.1.1
boltons                        23.0.0
brotlipy                       0.7.0
build                          0.10.0
CacheControl                   0.12.11
certifi                        2023.5.7
cffi                           1.15.1
charset-normalizer             3.1.0
clean-fid                      0.1.35
cleo                           2.0.1
click                          8.1.3
click-default-group            1.2.2
clip-anytorch                  2.5.2
cloup                          0.13.1
cmake                          3.26.3
colorama                       0.4.6
colour                         0.1.5
contourpy                      1.0.7
coqpit                         0.0.17
crashtest                      0.4.1
cryptography                   41.0.1
cycler                         0.11.0
Cython                         0.29.28
dataclasses                    0.8
dateparser                     1.1.8
decorator                      5.1.1
deepl                          1.14.0
distlib                        0.3.6
docker-pycreds                 0.4.0
docopt                         0.6.2
dulwich                        0.21.5
einops                         0.6.1
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.0
Flask                          2.3.2
fonttools                      4.39.4
frozenlist                     1.3.3
fsspec                         2023.5.0
ftfy                           6.1.1
future                         0.18.3
g2pkk                          0.1.2
gitdb                          4.0.10
GitPython                      3.1.31
glcontext                      2.3.7
gruut                          2.2.3
gruut-ipa                      0.13.0
gruut-lang-de                  2.0.0
gruut-lang-en                  2.0.0
gruut-lang-es                  2.0.0
gruut-lang-fr                  2.0.2
gTTS                           2.3.2
html5lib                       1.1
huggingface-hub                0.15.1
idna                           3.4
imageio                        2.31.0
importlib-metadata             6.6.0
importlib-resources            5.12.0
inflect                        5.6.0
installer                      0.7.0
isosurfaces                    0.1.0
itsdangerous                   2.1.2
jamo                           0.4.1
jaraco.classes                 3.2.3
jeepney                        0.8.0
jieba                          0.42.1
Jinja2                         3.1.2
joblib                         1.2.0
jsonlines                      1.2.0
jsonmerge                      1.9.0
jsonschema                     4.17.3
k-diffusion                    0.0.15
keyring                        23.13.1
kiwisolver                     1.4.4
kornia                         0.6.12
lazy_loader                    0.2
librosa                        0.10.0.post2
lit                            16.0.5.post0
llvmlite                       0.39.1
lockfile                       0.12.2
manim                          0.17.3
manim-voiceover                0.3.3.post0
ManimPango                     0.4.3
mapbox-earcut                  1.0.0
markdown-it-py                 2.2.0
MarkupSafe                     2.1.3
matplotlib                     3.7.1
mdurl                          0.1.0
mecab-python3                  1.0.5
moderngl                       5.8.2
moderngl-window                2.4.1
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multidict                      6.0.4
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
nltk                           3.8.1
num2words                      0.5.12
numba                          0.56.4
numpy                          1.23.5
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.1
pandas                         2.0.2
pathtools                      0.1.2
pexpect                        4.8.0
Pillow                         9.5.0
pip                            23.1.2
pkginfo                        1.9.6
pkgutil_resolve_name           1.3.10
platformdirs                   3.5.1
poetry                         1.5.1
poetry-core                    1.6.1
poetry-plugin-export           1.4.0
pooch                          1.6.0
protobuf                       3.19.6
psutil                         5.9.5
ptyprocess                     0.7.0
PyAudio                        0.2.13
pycairo                        1.23.0
pycparser                      2.21
pydub                          0.25.1
pyglet                         1.5.27
Pygments                       2.15.1
pynndescent                    0.5.10
pynput                         1.7.6
pyOpenSSL                      23.2.0
pyparsing                      3.0.9
pypinyin                       0.49.0
pyproject_hooks                1.0.0
pyrr                           0.10.3
pyrsistent                     0.19.3
pysbd                          0.3.4
PySocks                        1.7.1
python-crfsuite                0.9.9
python-dateutil                2.8.2
python-dotenv                  0.21.1
python-slugify                 8.0.1
python-xlib                    0.33
pyttsx3                        2.90
pytz                           2023.3
PyWavelets                     1.4.1
PyYAML                         6.0
rapidfuzz                      2.15.1
regex                          2023.6.3
requests                       2.31.0
requests-toolbelt              1.0.0
resize-right                   0.0.2
rich                           13.4.1
scikit-image                   0.21.0
scikit-learn                   1.2.2
scipy                          1.10.1
screeninfo                     0.8.1
SecretStorage                  3.3.3
sentry-sdk                     1.25.0
setproctitle                   1.3.2
setuptools                     67.7.2
shellingham                    1.5.1
six                            1.16.0
skia-pathops                   0.7.4
smmap                          5.0.0
soundfile                      0.12.1
sox                            1.4.1
soxr                           0.3.5
srt                            3.5.2
stable-ts                      2.6.2
svgelements                    1.9.5
sympy                          1.12
tensorboardX                   2.6
text-unidecode                 1.3
threadpoolctl                  3.1.0
tifffile                       2023.4.12
tiktoken                       0.3.1
tokenizers                     0.13.3
tomli                          2.0.1
tomlkit                        0.11.8
torch                          2.0.1
torchaudio                     2.0.2
torchdiffeq                    0.2.3
torchsde                       0.2.5
torchvision                    0.15.2
tqdm                           4.65.0
trainer                        0.0.20
trampoline                     0.1.2
transformers                   4.29.2
triton                         2.0.0
trove-classifiers              2023.5.24
TTS                            0.14.3
typing_extensions              4.6.3
tzdata                         2023.3
tzlocal                        5.0.1
umap-learn                     0.5.1
unidic-lite                    1.0.8
urllib3                        1.26.15
virtualenv                     20.23.0
wandb                          0.15.4
watchdog                       2.2.1
wcwidth                        0.2.6
webencodings                   0.5.1
Werkzeug                       2.3.4
wheel                          0.40.0
yarl                           1.9.2
zipp                           3.15.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): TeX Live 2022/Debian
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Additional comments


osolmaz · 2023-06-08T19:13:04Z

I can't reproduce this locally. Can you maybe insert some breakpoints (import ipdb; ipdb.set_trace()) to manim-voiceover source locally and tell me which line causes the hang?
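As an alternative to interactive breakpoints, a hang can also be located with the standard library's faulthandler module, which can print every thread's traceback after a timeout. The sketch below is not part of manim-voiceover; the helper names `dump_stack_if_stuck` and `disarm` are hypothetical and chosen only for illustration.

```python
import faulthandler
import sys


def dump_stack_if_stuck(seconds, file=None):
    """Arm a watchdog that writes all thread tracebacks after `seconds`.

    `file` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(
        seconds, repeat=False, file=target, exit=False
    )


def disarm():
    """Cancel the watchdog once the suspect code has returned normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `dump_stack_if_stuck(30)` just before the suspect call and `disarm()` right after it should print the line on which the process is stuck if the call does not return within 30 seconds.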

azampa · 2023-06-09T07:53:07Z

Well, following your suggestion, I determined that manim-voiceover hangs in services/recorder/utility.py between line 160 (reached) and line 163 (never reached). It reaches line 179, so it seems that MyListener() is unable to detect the 'r' key press, i.e. self.listener.key_pressed is always false!

azampa · 2023-06-09T13:53:43Z

After more debugging, I determined that MyListener.on_press() is never called (although the listener is initialised and started), so key_pressed can never become True and its value remains None!
Of course, I don't understand the reason for this behaviour...
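The symptom described here (a wait that never ends because `key_pressed` never becomes true) can be illustrated with a small sketch. This is not the manim-voiceover code: `FakeListener` and `wait_for_key` are hypothetical names, and the only assumption carried over from the thread is that the listener exposes a `key_pressed` attribute that starts as None. A bounded wait like this would turn a silent hang into a failure that can be reported.

```python
import time


class FakeListener:
    """Stand-in for a keyboard listener whose callback may never fire."""

    def __init__(self):
        # Stays None until a key press is delivered to the listener.
        self.key_pressed = None


def wait_for_key(listener, timeout=5.0, poll_interval=0.05):
    """Poll listener.key_pressed; return True once set, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if listener.key_pressed:
            return True
        time.sleep(poll_interval)
    # No key event arrived in time: report it instead of looping forever.
    return False
```

With a listener that never receives events, `wait_for_key(FakeListener(), timeout=1.0)` returns False after about one second rather than blocking indefinitely.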

osolmaz · 2023-06-10T06:16:29Z

Thank you, I’ll investigate this soon.

azampa · 2023-06-10T10:24:55Z

Ok, I found the origin of the issue: pynput is not meant to work on Wayland, only on Xorg. When I switched to Ubuntu on Xorg, everything worked as expected.

Knowing this, you should either look for an alternative to pynput that also works on Wayland (such as this which, unfortunately, is currently unmaintained), or warn users to avoid Wayland when using manim-voiceover on Ubuntu...
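The second option (warning users) could be approached roughly as follows. This is only a sketch, not code from manim-voiceover: the function names are hypothetical, and the check is a heuristic that assumes the desktop session sets the usual `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY` environment variables.

```python
import os
import warnings


def session_is_wayland(environ=None):
    """Return True if the environment variables suggest a Wayland session."""
    env = os.environ if environ is None else environ
    session_type = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session_type:
        return session_type == "wayland"
    # Fall back to WAYLAND_DISPLAY, which Wayland compositors normally set.
    return bool(env.get("WAYLAND_DISPLAY"))


def warn_if_wayland(environ=None):
    """Emit a warning when key presses may never reach the recorder."""
    if session_is_wayland(environ):
        warnings.warn(
            "Wayland session detected: pynput may not receive key presses, "
            "so RecorderService can appear to hang. Consider logging in "
            "with an Xorg session.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

Calling `warn_if_wayland()` before the recording prompt would at least tell the user why pressing 'r' appears to do nothing.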

osolmaz · 2023-06-13T03:45:03Z

Then this might relate to #44; you can follow the discussion there. (Note that I had issues with Gradio, which is why I haven't merged it yet.) The CLI-based approach was a quick hack to get the MVP going, and the ideal solution would be a standalone UI that works without cross-platform compatibility issues.

I found the following options:

https://github.com/PySimpleGUI/PySimpleGUI
https://github.com/hoffstadt/DearPyGui
https://github.com/beeware/toga

We could also go for an Electron app or a locally hosted web app like Jupyter Notebook. That would be more future-proof, but it would make shipping Python and JS code in the same package more complicated. Open to suggestions.
cc @o-alexandre-felipe

azampa added the bug Something isn't working label Jun 7, 2023

azampa assigned osolmaz Jun 7, 2023

azampa changed the title ~~Using RecorderService manim-voiceover hangs after the release of 'r' key~~ Using RecorderService manim-voiceover hangs (before or after releasing the 'r' key?) Jun 8, 2023
