Super resolution enhancement step to improve the quality of the generated (lipsynced) region #146
Conversation
My Readme file
FYI, getting an error when testing this out:
./inference.sh --unet_config_path "configs/unet/second_stage.yaml" \
--inference_ckpt_path "checkpoints/latentsync_unet.pt" \
--inference_steps 20 \
--guidance_scale 1.5 \
--video_path "assets/demo1_video.mp4" \
--audio_path "assets/demo1_audio.wav" \
--video_out_path "video_out.mp4" \
--superres "both"
Input video path: assets/demo1_video.mp4
Input audio path: assets/demo1_audio.wav
Loaded checkpoint path: checkpoints/latentsync_unet.pt
Initial seed: 1247
Affine transforming 250 faces...
0%| | 0/250 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/latent-sync/scripts/inference.py", line 167, in <module>
main(config, args)
File "/latent-sync/scripts/inference.py", line 136, in main
pipeline(
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 335, in __call__
faces, original_video_frames, boxes, affine_matrices = self.affine_transform_video(video_path)
File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 270, in affine_transform_video
face, box, affine_matrix = self.image_processor.affine_transform(frame)
File "/latent-sync/latentsync/utils/image_processor.py", line 147, in affine_transform
detected_faces = self.fa.get_landmarks(image)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 113, in get_landmarks
return self.get_landmarks_from_image(image_or_path, detected_faces, return_bboxes, return_landmark_score)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 144, in get_landmarks_from_image
detected_faces = self.face_detector.detect_from_image(image.copy())
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/sfd_detector.py", line 45, in detect_from_image
bboxlist = detect(self.face_detector, image, device=self.device)[0]
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 17, in detect
return batch_detect(net, img, device)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 36, in batch_detect
olist = net(img_batch) # patched uint8_t overflow error
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/net_s3fd.py", line 75, in forward
h = F.max_pool2d(h, 2, 2)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/_jit_internal.py", line 499, in fn
return if_false(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/functional.py", line 796, in _max_pool2d
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (128x540x1). Calculated output size: (128x270x0). Output size is too small
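The failure above comes from `max_pool2d` receiving a tensor whose last spatial dimension is 1 (input size 128x540x1), so halving it produces 0. A likely cause (an assumption, not confirmed by the traceback alone) is the frame reaching the face detector in an unexpected layout, e.g. a channel axis where a spatial axis is expected. A minimal sketch reproducing the same error:

```python
import torch
import torch.nn.functional as F

# A (batch, channels, H, W) tensor with W == 1, matching the
# "Given input size: (128x540x1)" in the traceback above.
x = torch.zeros(1, 128, 540, 1)
try:
    # Kernel 2, stride 2 over a dimension of size 1 yields size 0.
    F.max_pool2d(x, 2, 2)
except RuntimeError as e:
    print(e)  # "... Output size is too small"
```

Checking `frame.shape` right before `self.fa.get_landmarks(image)` in `image_processor.py` should confirm whether the video is being decoded with a degenerate dimension.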
@Bhagawat8 Is it tested? If yes, does it output at the same resolution?
For whoever, like me, lands here or on any similar fork: CodeFormer and GFPGAN are face upscalers. If code imports them in restore_video as if they were part of a Python library, that is most likely an AI hallucination, so do not try to use it that way. Integrating them is possible, but as a first step the best bet for most users is to apply super resolution outside of LatentSync, as a separate post-processing step.
Unrelated, but is it possible to upload an image instead of a video for Latent-Sync? I cannot see any option for a picture in the Gradio demo.
As far as I know, that is not possible. It would, though, most likely be possible to extend the code by using a model that can animate images and passing the result to LatentSync.