
Conversation

@Bhagawat8

No description provided.

Bhagawat8 and others added 2 commits February 8, 2025 17:50
@juan-altatech

FYI, getting an error when testing this out:

/latent-sync# ./inference.sh --unet_config_path "configs/unet/second_stage.yaml" \
               --inference_ckpt_path "checkpoints/latentsync_unet.pt" \
               --inference_steps 20 \
               --guidance_scale 1.5 \
               --video_path "assets/demo1_video.mp4" \
               --audio_path "assets/demo1_audio.wav" \
               --video_out_path "video_out.mp4" \
               --superres "both"
Input video path: assets/demo1_video.mp4
Input audio path: assets/demo1_audio.wav
Loaded checkpoint path: checkpoints/latentsync_unet.pt
Initial seed: 1247
Affine transforming 250 faces...
  0%|                                                                                                                                                                                                                        | 0/250 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/latent-sync/scripts/inference.py", line 167, in <module>
    main(config, args)
  File "/latent-sync/scripts/inference.py", line 136, in main
    pipeline(
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 335, in __call__
    faces, original_video_frames, boxes, affine_matrices = self.affine_transform_video(video_path)
  File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 270, in affine_transform_video
    face, box, affine_matrix = self.image_processor.affine_transform(frame)
  File "/latent-sync/latentsync/utils/image_processor.py", line 147, in affine_transform
    detected_faces = self.fa.get_landmarks(image)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 113, in get_landmarks
    return self.get_landmarks_from_image(image_or_path, detected_faces, return_bboxes, return_landmark_score)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 144, in get_landmarks_from_image
    detected_faces = self.face_detector.detect_from_image(image.copy())
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/sfd_detector.py", line 45, in detect_from_image
    bboxlist = detect(self.face_detector, image, device=self.device)[0]
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 17, in detect
    return batch_detect(net, img, device)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 36, in batch_detect
    olist = net(img_batch)  # patched uint8_t overflow error
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/net_s3fd.py", line 75, in forward
    h = F.max_pool2d(h, 2, 2)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/_jit_internal.py", line 499, in fn
    return if_false(*args, **kwargs)
  File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/functional.py", line 796, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (128x540x1). Calculated output size: (128x270x0). Output size is too small
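For what it's worth, the failure is inside face_alignment's S3FD face detector, and the size in the error ((128x540x1), i.e. a width of 1 pixel by the time the tensor reaches that pooling layer) suggests a frame with a degenerate shape is being handed to the detector, rather than a problem with the superres changes themselves. A quick sanity check on the input video, purely as a diagnostic sketch (MIN_SIDE is an arbitrary threshold of my own, not anything from LatentSync):

# Diagnostic sketch, not part of LatentSync: list frames whose spatial size
# looks too small for face detection. The S3FD detector pools the image
# several times, so a frame with a near-zero dimension ends up producing a
# zero-sized output and raises exactly this RuntimeError.
import cv2

video_path = "assets/demo1_video.mp4"
MIN_SIDE = 64  # assumption: arbitrary threshold for "suspiciously small"

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    if min(h, w) < MIN_SIDE:
        print(f"frame {idx}: suspicious size {w}x{h}")
    idx += 1
cap.release()
print(f"checked {idx} frames")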

@Nishad-007

@Bhagawat8 Is it tested? If yes, does it output at the same resolution?

@aklacar1

For whoever lands here like me, or in any similar fork:

CodeFormer and GFPGAN are face upscalers. If anyone suggests using them in restore_video as part of a Python lib, that is most likely an AI hallucination, so do not even try to use it that way. It is possible to implement, but as a first step, the best bet for most users is to run them outside of LatentSync as a separate post-processing step.
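If it helps, this is roughly what "outside of LatentSync" could look like: take the video LatentSync already produced and run GFPGAN over it as a separate pass. This is only a sketch, assuming the gfpgan pip package and a locally downloaded GFPGANv1.4.pth checkpoint; paths and settings are placeholders, and audio is not handled here (it would need to be muxed back in afterwards, e.g. with ffmpeg):

# Hedged sketch: GFPGAN as a separate post-processing pass over LatentSync's
# output, instead of wiring it into restore_video.
import cv2
from gfpgan import GFPGANer

in_path = "video_out.mp4"          # LatentSync output
out_path = "video_out_gfpgan.mp4"  # enhanced result (video only, no audio)

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # assumption: checkpoint downloaded locally
    upscale=1,                    # keep the original resolution
    arch="clean",
    channel_multiplier=2,
    bg_upsampler=None,
)

cap = cv2.VideoCapture(in_path)
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # enhance() returns (cropped_faces, restored_faces, restored_img)
    _, _, restored = restorer.enhance(
        frame, has_aligned=False, only_center_face=False, paste_back=True
    )
    writer.write(cv2.resize(restored, (w, h)))

cap.release()
writer.release()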

@Saisankarwork123

For whoever lands here like me, or in any similar fork:

CodeFormer and GFPGAN are face upscalers. If anyone suggests using them in restore_video as part of a Python lib, that is most likely an AI hallucination, so do not even try to use it that way. It is possible to implement, but as a first step, the best bet for most users is to run them outside of LatentSync as a separate post-processing step.

Unrelated, but is it possible to upload an image instead of a video for LatentSync? I cannot see any option for a picture in the Gradio demo.

@aklacar1

aklacar1 commented Mar 19, 2025

Unrelated, but is it possible to upload an image instead of a video for LatentSync? I cannot see any option for a picture in the Gradio demo.

As far as I know, that is not possible. It should, however, be possible to extend the code to do something like that, by using a model that can animate images and passing the result to LatentSync.
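For anyone who wants to experiment with that, the plumbing on the LatentSync side is simple: it only needs a video file, so a still image first has to become a clip of roughly the audio's length. A minimal sketch of that step only, with hypothetical paths and an assumed fps; an image-animation model would replace the static loop below:

# Hedged sketch: turn a still image into a video clip that can be passed to
# inference.sh via --video_path. The static loop is a stand-in for a real
# image-animation model, as suggested above.
import wave
import cv2

image_path = "face.png"            # hypothetical input image
audio_path = "assets/demo1_audio.wav"
video_path = "face_as_video.mp4"   # then pass this as --video_path
fps = 25                           # assumption

with wave.open(audio_path, "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()

frame = cv2.imread(image_path)
h, w = frame.shape[:2]
writer = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
for _ in range(int(duration * fps)):
    writer.write(frame)
writer.release()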
