Super resolution enhancement step to improve the quality of the generated (lipsynced) region #146
Conversation
My Readme file
FYI, getting an error when testing this out:
./inference.sh --unet_config_path "configs/unet/second_stage.yaml" \
--inference_ckpt_path "checkpoints/latentsync_unet.pt" \
--inference_steps 20 \
--guidance_scale 1.5 \
--video_path "assets/demo1_video.mp4" \
--audio_path "assets/demo1_audio.wav" \
--video_out_path "video_out.mp4" \
--superres "both"
Input video path: assets/demo1_video.mp4
Input audio path: assets/demo1_audio.wav
Loaded checkpoint path: checkpoints/latentsync_unet.pt
Initial seed: 1247
Affine transforming 250 faces...
0%| | 0/250 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/latentsync/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/latent-sync/scripts/inference.py", line 167, in <module>
main(config, args)
File "/latent-sync/scripts/inference.py", line 136, in main
pipeline(
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 335, in __call__
faces, original_video_frames, boxes, affine_matrices = self.affine_transform_video(video_path)
File "/latent-sync/latentsync/pipelines/lipsync_pipeline.py", line 270, in affine_transform_video
face, box, affine_matrix = self.image_processor.affine_transform(frame)
File "/latent-sync/latentsync/utils/image_processor.py", line 147, in affine_transform
detected_faces = self.fa.get_landmarks(image)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 113, in get_landmarks
return self.get_landmarks_from_image(image_or_path, detected_faces, return_bboxes, return_landmark_score)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/api.py", line 144, in get_landmarks_from_image
detected_faces = self.face_detector.detect_from_image(image.copy())
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/sfd_detector.py", line 45, in detect_from_image
bboxlist = detect(self.face_detector, image, device=self.device)[0]
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 17, in detect
return batch_detect(net, img, device)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/detect.py", line 36, in batch_detect
olist = net(img_batch) # patched uint8_t overflow error
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/face_alignment/detection/sfd/net_s3fd.py", line 75, in forward
h = F.max_pool2d(h, 2, 2)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/_jit_internal.py", line 499, in fn
return if_false(*args, **kwargs)
File "/root/miniconda3/envs/latentsync/lib/python3.10/site-packages/torch/nn/functional.py", line 796, in _max_pool2d
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (128x540x1). Calculated output size: (128x270x0). Output size is too small
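The failure above comes from `max_pool2d` receiving a tensor whose last spatial dimension is 1 (input size 128x540x1), so halving it produces 0. A likely cause (an assumption, not confirmed by the traceback alone) is the frame reaching the face detector in an unexpected layout, e.g. a channel axis where a spatial axis is expected. A minimal sketch reproducing the same error:

```python
import torch
import torch.nn.functional as F

# A (batch, channels, H, W) tensor with W == 1, matching the
# "Given input size: (128x540x1)" in the traceback above.
x = torch.zeros(1, 128, 540, 1)
try:
    # Kernel 2, stride 2 over a dimension of size 1 yields size 0.
    F.max_pool2d(x, 2, 2)
except RuntimeError as e:
    print(e)  # "... Output size is too small"
```

Checking `frame.shape` right before `self.fa.get_landmarks(image)` in `image_processor.py` should confirm whether the video is being decoded with a degenerate dimension.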
@Bhagawat8 Is it tested? If yes, does it output at the same resolution?
For whoever, like me, lands here or on any similar fork: CodeFormer and GFPGAN are face upscalers. If code imports them in restore_video as if they were part of a Python library, that is most likely an AI hallucination, so do not try to use it that way. Integrating them is possible, but as a first step the best bet for most users is to apply super resolution outside of LatentSync, as a separate post-processing step.
Unrelated, but is it possible to upload an image instead of a video for Latent-Sync? I cannot see any option for a picture in the Gradio demo.
As far as I know, that is not possible. It would, though, most likely be possible to extend the code by using a model that can animate images and passing the result to LatentSync.