Failure at "audio_callback" in gui_diff.py preventing usage #99

Open
danieloneill opened this issue Nov 1, 2023 · 2 comments

It works fine when I select my USB headset directly, but selecting pipewire, default (which is a Pulse backend), or JACK produces errors. I suspect that some or all of these are sounddevice issues.

Either way, I get no audio with any device selection other than the USB headset itself.

event: start_vc
input device:21:default (ALSA)
output device:21:default (ALSA)
crossfade_time:0.06
buffer_num:4
samplerate:44100
block_time:0.8
prefix_pad_length:3.1100000000000003
mix_mode:None
using_cuda:True
 [DDSP Model] Combtooth Subtractive Synthesiser
 [Loading] /Sabrent/gpt/DDSP-SVC/exp/diffusion-test/model_100000.pt
 [Encoder Model] Content Vec
 [Loading] pretrain/contentvec/checkpoint_best_legacy_500.pt
2023-10-31 17:04:17 | INFO | fairseq.tasks.hubert_pretraining | current directory is /Sabrent/gpt/DDSP-SVC
2023-10-31 17:04:17 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-10-31 17:04:17 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}

Starting callback
Infering...
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
| Load HifiGAN:  pretrain/nsf_hifigan/model
...
sola_shift: 0
Exception ignored from cffi callback <function _StreamBase.__init__.<locals>.callback_ptr at 0x7fa5f96b6f70>:
Traceback (most recent call last):
  File "/Sabrent/gpt/DDSP-SVC/venv/lib64/python3.9/site-packages/sounddevice.py", line 886, in callback_ptr
    return _wrap_callback(
  File "/Sabrent/gpt/DDSP-SVC/venv/lib64/python3.9/site-packages/sounddevice.py", line 2687, in _wrap_callback
    callback(*args)
  File "/Sabrent/gpt/DDSP-SVC/gui_diff.py", line 489, in audio_callback
    outdata[:] = temp_wav[: - self.crossfade_frame, None].repeat(1, 2).cpu().numpy()
ValueError: could not broadcast input array from shape (35280,2) into shape (35280,64)
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
event: stop_vc
Audio block passed.
ENDing VC

When using "pipewire":

event: start_vc
input device:21:default (ALSA)
output device:21:default (ALSA)
crossfade_time:0.06
buffer_num:4
samplerate:44100
block_time:0.8
prefix_pad_length:3.1100000000000003
mix_mode:None
using_cuda:True
 [DDSP Model] Combtooth Subtractive Synthesiser
/Sabrent/gpt/DDSP-SVC/venv/lib64/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 [Loading] /Sabrent/gpt/DDSP-SVC/exp/diffusion-test/model_100000.pt
 [Encoder Model] Content Vec
 [Loading] pretrain/contentvec/checkpoint_best_legacy_500.pt
2023-10-31 17:04:17 | INFO | fairseq.tasks.hubert_pretraining | current directory is /Sabrent/gpt/DDSP-SVC
2023-10-31 17:04:17 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-10-31 17:04:17 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}

Starting callback
Infering...
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
| Load HifiGAN:  pretrain/nsf_hifigan/model
...
Audio block passed.
Removing weight norm...
sola_shift: 0
Exception ignored from cffi callback <function _StreamBase.__init__.<locals>.callback_ptr at 0x7fa5801e1700>:
Traceback (most recent call last):
  File "/Sabrent/gpt/DDSP-SVC/venv/lib64/python3.9/site-packages/sounddevice.py", line 886, in callback_ptr
    return _wrap_callback(
  File "/Sabrent/gpt/DDSP-SVC/venv/lib64/python3.9/site-packages/sounddevice.py", line 2687, in _wrap_callback
    callback(*args)
  File "/Sabrent/gpt/DDSP-SVC/gui_diff.py", line 489, in audio_callback
    outdata[:] = temp_wav[: - self.crossfade_frame, None].repeat(1, 2).cpu().numpy()
ValueError: could not broadcast input array from shape (35280,2) into shape (35280,64)
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
event: stop_vc
Audio block passed.
ENDing VC
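
Both tracebacks above fail on the same assignment: a 2-channel block is written into an output buffer that has 64 channels. The following is a minimal sketch that reproduces the shape mismatch with NumPy alone and shows one possible callback-side workaround. `write_stereo` is a hypothetical helper, not part of gui_diff.py or sounddevice, and the sketch assumes the output buffer is a float32 array of shape (frames, channels).

```python
import numpy as np


def write_stereo(outdata, mono):
    """Copy a mono block into the first two channels of outdata.

    Hypothetical helper: any channels beyond the first two are silenced,
    so the write succeeds however many channels the stream was opened with.
    """
    frames, channels = outdata.shape
    if mono.shape[0] != frames:
        raise ValueError(f"expected {frames} frames, got {mono.shape[0]}")
    outdata[:] = 0.0
    # mono[:, None] has shape (frames, 1) and broadcasts over 1 or 2 columns.
    outdata[:, : min(2, channels)] = mono[:, None]


frames = 35280
mono = np.linspace(-1.0, 1.0, frames, dtype=np.float32)
outdata = np.empty((frames, 64), dtype=np.float32)

try:
    # Same pattern as the failing line: a (frames, 2) block into (frames, 64).
    outdata[:] = np.repeat(mono[:, None], 2, axis=1)
except ValueError as exc:
    print("direct assignment failed:", exc)

write_stereo(outdata, mono)
print("channels carrying audio:", int(np.any(outdata != 0, axis=0).sum()))
```

This only avoids the exception. Whether channels 0 and 1 are actually routed to the headset depends on the backend's channel mapping, which this sketch does not address.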

The last one, JACK, is the most baffling. The process dies with SIGKILL, which I'm not sending myself. I also see no messages about it in journalctl, so I'm not sure what's causing it:

event: start_vc
input device:22:G733 Gaming Headset Mono (JACK Audio Connection Kit)
output device:25:G733 Gaming Headset Analog Stereo (JACK Audio Connection Kit)
crossfade_time:0.06
buffer_num:4
samplerate:44100
block_time:0.8
prefix_pad_length:3.1100000000000003
mix_mode:None
using_cuda:True
 [DDSP Model] Combtooth Subtractive Synthesiser
 [Loading] /Sabrent/gpt/DDSP-SVC/exp/diffusion-test/model_100000.pt

Starting callback
Infering...
Audio block passed.
Killed
(venv) [doneill@galena DDSP-SVC]$ 
yxlllc (Owner) commented Nov 1, 2023

In my tests, MME is the only stable driver. The others behave unpredictably, which may be a problem with the sounddevice library.

@danieloneill (Author)

I've found that if I modify sounddevice.py to force 1 input channel and 2 output channels, it works as expected. It seems the output stream is being opened with the device's "available channels" count, which on pipewire devices is typically 64, while the audio sample array only contains 2 channels of data.
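
The same effect can probably be had without patching sounddevice.py, by passing the channel counts when the stream is opened. This is a rough sketch, assuming the GUI creates its stream through `sd.Stream`; I have not checked how gui_diff.py actually does this. `pick_channels` and `open_stream` are hypothetical names introduced for illustration. `sd.query_devices` and the `channels` pair argument of `sd.Stream` are existing sounddevice features.

```python
def pick_channels(max_in, max_out, want_in=1, want_out=2):
    """Clamp the requested channel counts to what the devices report."""
    if max_in < 1 or max_out < 1:
        raise ValueError("device has no usable input or output channels")
    return min(want_in, max_in), min(want_out, max_out)


def open_stream(callback, input_device, output_device, samplerate, blocksize):
    """Open a duplex stream with explicit channel counts (hypothetical helper)."""
    # Imported lazily so the pure helper above works without PortAudio installed.
    import sounddevice as sd

    max_in = sd.query_devices(input_device, "input")["max_input_channels"]
    max_out = sd.query_devices(output_device, "output")["max_output_channels"]
    channels = pick_channels(max_in, max_out)
    return sd.Stream(
        samplerate=samplerate,
        blocksize=blocksize,
        device=(input_device, output_device),
        channels=channels,  # (input, output) instead of the device maximum
        dtype="float32",
        callback=callback,
    )


# On a device reporting 64 channels each way this requests 1 in, 2 out.
print(pick_channels(64, 64))
```

Note that the callback at line 489 always produces a 2-channel block, so a mono-only output device (where the clamp yields 1 output channel) would still need separate handling.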
