Conversation

@ggmarts04
No description provided.

google-labs-jules bot and others added 30 commits May 22, 2025 06:06
This commit introduces changes to allow the model to be deployed on RunPod Serverless.

Key changes include:

- **`runpod_handler.py`**: A new handler script that serves as the entry point for RunPod. It processes video and audio URL inputs, downloads the files, and then uses the existing `Predictor` class to perform inference.
- **`Dockerfile`**: A new Dockerfile to build the container image for RunPod. It includes system dependencies (ffmpeg, libgl1), Python packages from `requirements.txt`, the downloader, and sets the appropriate CMD for the RunPod Python environment.
- **`requirements.txt`**: Added the `runpod` package, which is necessary for the RunPod serverless environment.
- **`RUNPOD_DEPLOYMENT.md`**: New documentation detailing how to build the Docker image, deploy it to RunPod, and make requests to the serverless endpoint.

The existing core inference logic in `predict.py` and `scripts/inference.py` remains largely unchanged, with the new handler acting as an adapter for the RunPod environment and URL-based inputs.
feat: Adapt model for RunPod serverless deployment
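A minimal handler of the shape described above might look like this sketch. The `event["input"]` payload and `runpod.serverless.start` entry point are part of the `runpod` package's documented interface; everything else (field names, helper names, `Predictor` usage) is an assumption, and the real `runpod_handler.py` may differ:

```python
# Sketch of a RunPod serverless handler that adapts URL inputs to a local
# Predictor. The runpod-specific lines are commented out so the sketch is
# self-contained; `predictor` stands in for the project's existing class.
import os
import tempfile
import urllib.request


def parse_event(event):
    """Pull the two required URLs out of a RunPod event payload."""
    inputs = event.get("input", {})
    return inputs["video_url"], inputs["audio_url"]


def download_to_tmp(url, suffix):
    """Download a remote file to a temporary path and return that path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    urllib.request.urlretrieve(url, path)
    return path


def handler(event):
    video_url, audio_url = parse_event(event)
    video = download_to_tmp(video_url, ".mp4")
    audio = download_to_tmp(audio_url, ".wav")
    # output = predictor.predict(video=video, audio=audio)  # existing Predictor
    # return {"output": output}


# import runpod
# runpod.serverless.start({"handler": handler})
```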
The previous Dockerfile was missing `curl`, which is required for downloading
the `pget` tool during the image build process. This resulted in a
"curl: not found" error.

This commit adds `curl` to the `apt-get install` command in the Dockerfile
to ensure it is available.
fix: Add curl to Dockerfile system dependencies
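The fix amounts to one extra package in the install line; a hypothetical excerpt (the actual package list in the Dockerfile may differ):

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg libgl1 curl \
    && rm -rf /var/lib/apt/lists/*
```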
This commit addresses two main issues:

1.  **Fixes `pget` 404 error during local testing:**
    The `runpod_handler.py` script had a local testing block
    that attempted to download a `CHANGELOG.md` file from a
    URL that resulted in a 404 error. This was causing
    RunPod's local test run to fail. The `audio_url` in the
    `mock_event` has been changed to a valid URL (the README.md URL,
    the same one used for `video_url`, which is sufficient for test
    purposes).

2.  **Prevents errors from existing symlinks:**
    The `predict.py` script's `setup()` method attempted to create
    symbolic links without checking if they already existed.
    This could lead to errors if the script was run multiple
    times or if the links were already present. The script has
    been updated to check for the existence of these links
    before attempting to create them.

These changes should improve the reliability of deploying and running this model on RunPod Serverless.
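The symlink guard described in point 2 can be sketched as follows (the helper name is hypothetical; `os.path.lexists` is used rather than `os.path.exists` so that stale, broken links are also detected):

```python
import os


def ensure_symlink(src, dst):
    """Create dst -> src only if nothing is already at dst.

    lexists() is true for broken symlinks too, so a leftover link from a
    previous run will not trigger FileExistsError on a second setup() call.
    """
    if not os.path.lexists(dst):
        os.symlink(src, dst)
```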
This commit addresses the following issues:

1.  **Fixes Whisper model loading:**
    The `Audio2Feature` class was attempting to load the Whisper
    model from a hardcoded local path (`checkpoints/whisper/tiny.pt`)
    which does not exist. This caused a `RuntimeError`.
    The `model_path` in `latentsync/whisper/audio2feature.py` has been
    changed from `"checkpoints/whisper/tiny.pt"` to `"tiny"`. This allows
    the Whisper library's `load_model` function to correctly identify
    the model by name and handle its download and caching automatically.

2.  **Improves error handling in inference:**
    The `predict.py` script used `os.system()` to call the main
    inference script (`scripts/inference.py`). This did not check the
    exit code of the subprocess, potentially masking errors if the
    inference script failed. `os.system()` has been replaced with
    `subprocess.check_call()`, which will raise a `CalledProcessError`
    if the inference script returns a non-zero exit code. This ensures
    that failures during the actual model inference are properly
    propagated and reported in the RunPod logs.

These changes should resolve the `RuntimeError: Model checkpoints/whisper/tiny.pt not found` and provide more robust error reporting.
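The substitution in point 2 can be illustrated with a small wrapper (names are illustrative; the real command includes the inference script's arguments):

```python
import subprocess


def run_inference(cmd):
    """Run the inference command and propagate failures.

    os.system(" ".join(cmd)) returns the exit status, but nothing forces
    the caller to inspect it; subprocess.check_call raises
    CalledProcessError on any non-zero exit, so failures surface in the
    RunPod logs instead of being silently swallowed.
    """
    subprocess.check_call(cmd)
```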
This commit addresses issues related to checkpoint and model loading:

1.  **Ensure `setup_env.sh` Execution:**
    The `Dockerfile` was modified to execute `setup_env.sh` during
    the image build. This script downloads specific critical checkpoints
    like `whisper/tiny.pt` and `latentsync_unet.pt` from Hugging Face
    into the `checkpoints/` directory.

2.  **Correct Whisper Model Path:**
    The `latentsync/whisper/audio2feature.py` was updated to use
    `model_path="checkpoints/whisper/tiny.pt"`, aligning with the file
    downloaded by `setup_env.sh`.

3.  **Ensure Main Model Archive (`model.tar`) is Always Downloaded:**
    Modified `predict.py`'s `setup()` method to always call
    `download_weights(MODEL_URL, MODEL_CACHE)`. This removes a
    previous condition that skipped downloading `model.tar` if the
    `checkpoints/` directory already existed (e.g., due to
    `setup_env.sh`). This is crucial because `model.tar` contains
    auxiliary models (e.g., for face detection, VGG) that are needed
    for symbolic linking and are not covered by `setup_env.sh`.
    The extraction of `model.tar` will occur after `setup_env.sh`
    has placed its files, potentially overwriting some, which is
    acceptable as `model.tar` is considered more comprehensive for
    the auxiliary files.

4.  **Path Confirmation and Error Handling Kept:**
    - Confirmed `predict.py` uses the correct path for `latentsync_unet.pt`.
    - Retained the improved error handling in `predict.py` that uses
      `subprocess.check_call()` for more robust error reporting.

These changes aim to create a more reliable setup process in which all
necessary model components and checkpoints are downloaded and placed
correctly for the application to run on RunPod Serverless.
This commit addresses an error during the Docker build process where
`setup_env.sh` failed because `hf_transfer` was not available.

The `huggingface-cli` attempts to use `hf_transfer` for faster
downloads when `HF_HUB_ENABLE_HF_TRANSFER=1` is set (which it is
in this Dockerfile). However, the `hf_transfer` package was not
installed.

This commit modifies the `Dockerfile` to include `hf_transfer` in the
`pip install` command, ensuring it's available in the environment.
This should allow `setup_env.sh` to execute successfully and download
the necessary checkpoints using the faster transfer method.
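A hypothetical Dockerfile excerpt showing the relationship between the environment variable and the package (exact install lines may differ):

```dockerfile
# HF_HUB_ENABLE_HF_TRANSFER=1 makes huggingface-cli hand downloads to the
# hf_transfer backend, so the package must actually be installed.
ENV HF_HUB_ENABLE_HF_TRANSFER=1
RUN pip install --no-cache-dir huggingface_hub hf_transfer
```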
This commit addresses a `RuntimeError: stack expects a non-empty TensorList`
which was preceded by ffmpeg-related errors: "Unrecognized option 'crf'"
and "Error: Could not open video."

The errors indicated a problem with processing the input video.
The `latentsync/utils/util.py` file contained an `ffmpeg` command
within its `read_video` function, specifically for changing the
video's FPS to 25. This command included `-crf 18` as an output
option.

While `-crf` is a valid ffmpeg option for encoding, it's suspected
that in the specific execution environment or ffmpeg version within
the Docker container, this option (or its interaction with other
options in that command) was causing ffmpeg to fail and not produce
a valid temporary video file. This failure then led to `cv2.VideoCapture`
not being able to open the video, resulting in no frames for face
detection, and ultimately the `torch.stack` error.

This commit removes the `-crf 18` option from this specific
intermediate ffmpeg command. FFmpeg will use its default quality
settings for this temporary transcoding. The final video encoding
in `lipsync_pipeline.py` still uses `-crf 18` appropriately.

This change aims to allow the intermediate video processing to complete
successfully, enabling proper video frame loading and subsequent face
detection.
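The shape of the change can be sketched as a command builder (the function name and exact flags are illustrative, not the literal code in `read_video`):

```python
def fps_resample_cmd(src, dst, fps=25):
    """Build the intermediate re-encode command that normalises FPS.

    -crf 18 was dropped here so ffmpeg falls back to its default quality
    for this temporary file; the final encode in lipsync_pipeline.py
    still passes -crf 18.
    """
    return ["ffmpeg", "-y", "-i", src, "-r", str(fps), dst]
```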
This commit addresses the "Unrecognized option 'crf'" error that
occurred during the final ffmpeg command in the lipsync pipeline.
The issue was likely caused by the Python script not running within
the 'latentsync' conda environment, where a fully featured ffmpeg
was installed by `setup_env.sh`. Instead, it was probably using the
container's default ffmpeg, which may be older or lack certain
features.

The `Dockerfile`'s `CMD` instruction has been modified to use
`conda run -n latentsync --no-capture-output python runpod_handler.py`.
This ensures that the `runpod_handler.py` script, and consequently
any ffmpeg processes it invokes via `subprocess`, will use the
ffmpeg version from the 'latentsync' conda environment.

This change should make the correct ffmpeg version available to the
application, resolving errors about unrecognized options such as '-crf'.
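The resulting instruction, as an excerpt:

```dockerfile
# Launch inside the latentsync env so every subprocess, including ffmpeg,
# resolves binaries from that environment's PATH.
CMD ["conda", "run", "-n", "latentsync", "--no-capture-output", "python", "runpod_handler.py"]
```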
ggmarts04 and others added 4 commits May 22, 2025 23:41
This commit modifies `setup_env.sh` to address issues with `conda activate`
not working as expected when the script is executed by Docker's `RUN`
instruction.

The `conda activate latentsync` line has been removed. Instead, all
commands that need to operate within the `latentsync` conda environment
(such as `conda install`, `pip install`, and `huggingface-cli download`)
are now prefixed with `conda run -n latentsync --no-capture-output`.

This change ensures that these commands correctly use the specified
conda environment, leading to reliable installation of dependencies
(including `cog` and the correct `ffmpeg` version from `conda-forge`)
during the Docker build process. This should resolve previous errors
related to `conda activate` failing and subsequent `ModuleNotFound` or
incorrect `ffmpeg` versions being used.
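A hypothetical before/after illustrating the pattern (the actual package lists in `setup_env.sh` may differ):

```shell
# Before: fails in a Docker RUN layer, whose non-interactive shell has not
# been initialised for `conda activate`.
#   conda activate latentsync
#   pip install -r requirements.txt

# After: every command names the environment explicitly.
conda run -n latentsync --no-capture-output conda install -y -c conda-forge ffmpeg
conda run -n latentsync --no-capture-output pip install -r requirements.txt
```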
This commit updates `setup_env.sh` to improve the reliability of
commands executed within the Conda environment during Docker builds.

Changes include:
1.  Corrected the `conda install` syntax to use the `-n <envname>`
    flag directly, instead of wrapping with `conda run`.
2.  Modified `pip install` to be invoked via `python -m pip install ...`
    within `conda run`. This ensures the `pip` associated with the
    Conda environment's Python is used.
3.  Modified `huggingface-cli download` to be invoked via
    `python -m huggingface_hub.commands.cli download ...` within
    `conda run`. This ensures the CLI commands from the `huggingface-hub`
    package are correctly found and executed using the environment's Python.

These changes are intended to prevent "command not found" (exit code 127)
errors that can occur if shell activation or PATH issues prevent
executables like `pip` or `huggingface-cli` from being found directly
during scripted Conda operations in Docker.
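Putting the three changes together, the commands take this shape (a sketch; flags and paths are illustrative, and the module invocation is quoted from the commit description):

```shell
# 1. conda install targets the env with -n rather than via `conda run`.
conda install -n latentsync -y -c conda-forge ffmpeg

# 2. pip goes through the environment's own interpreter.
conda run -n latentsync --no-capture-output python -m pip install -r requirements.txt

# 3. The Hugging Face CLI is likewise invoked as a module of the env's Python.
conda run -n latentsync --no-capture-output python -m huggingface_hub.commands.cli download ...
```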