
aTENNuate (paper)

aTENNuate is a network that can be configured for real-time speech enhancement on raw audio waveforms. It can perform tasks such as audio denoising, super-resolution, and de-quantization. This repo contains the network definition and a set of pre-trained weights for the aTENNuate model.

Note that this repo is meant for evaluating denoising performance on custom audio samples, and is not optimized for inference. It also does not contain the recurrent configuration of the network, so it cannot be used directly for real-time inference by itself. Evaluation should ideally be done on a batch of .wav files at once, as expected by the denoise.py script.

Please contact Brainchip Inc. to learn more about the full real-time audio denoising solution, and please consider citing our work if you find this repo useful.

Quickstart

All you need is a working Python environment. Then run the following

pip install attenuate

To run the pre-trained network on custom audio samples, simply put the .wav files (or any other format supported by librosa) into the noisy_samples directory (or any directory of your choice), and run the following

import torch
from attenuate import Denoiser

model = Denoiser()
model.eval()

with torch.no_grad():
    model.from_pretrained("PeaBrane/aTENNuate")
    model.denoise('noisy_samples', denoised_dir='test_samples')

# denoised_samples = model.denoise('noisy_samples')  # return torch tensors instead

The denoised samples will then be saved as .wav files in the directory passed as denoised_dir (test_samples in the example above).

Training

The network should be easy to interface with your custom training pipeline. It expects an input of shape (batch, 1, length) and produces an output of the same shape; the audio can be sampled at any frequency (though the pre-trained weights operate at 16000 Hz). Note that length should be a multiple of 256, due to the downsampling behavior of the network.
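Since length must be a multiple of 256, inputs generally need to be right-padded before being fed to the network. A minimal sketch of the required padding arithmetic (the helper name pad_amount is our own, not part of the repo):

```python
def pad_amount(length: int, multiple: int = 256) -> int:
    """Number of trailing samples needed to reach the next multiple."""
    return (-length) % multiple

# e.g. a 1-second clip at 16 kHz needs 128 extra samples:
print(pad_amount(16000))  # 128
print(pad_amount(4096))   # 0 (4096 is already a multiple of 256)
```

With PyTorch tensors, the padding itself can then be applied with torch.nn.functional.pad(x, (0, pad_amount(x.shape[-1]))).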

The model supports torch.compile for training, but the FFT operations will still be performed in eager mode, since complex numbers are not supported by the compiler. The model does not yet stably support torch.amp, due to the sensitivity of the SSM layers. It is recommended to train the model with TensorFloat-32 instead, which can be enabled as follows

import torch

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
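The (batch, 1, length) interface described above can be exercised with a minimal training-loop sketch. This is our own illustration, not code from the repo: a stand-in nn.Conv1d is used in place of Denoiser() so the snippet is self-contained, but any module with the same input/output shape plugs in the same way.

```python
import torch
import torch.nn as nn

# Stand-in for the aTENNuate Denoiser(); same (batch, 1, length) interface.
model = nn.Conv1d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy (noisy, clean) waveform pair; length 4096 is a multiple of 256.
noisy = torch.randn(4, 1, 4096)
clean = torch.randn(4, 1, 4096)

optimizer.zero_grad()
loss = nn.functional.l1_loss(model(noisy), clean)  # waveform-domain L1 loss
loss.backward()
optimizer.step()
```

The loss function here is an arbitrary choice for illustration; substitute whatever objective your pipeline uses.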

Denoising samples

DNS1 synthetic test samples, no reverb

Noisy Sample Denoised Sample
Noisy Sample 1 Denoised Sample 1
Noisy Sample 2 Denoised Sample 2
Noisy Sample 3 Denoised Sample 3

DNS1 real recordings

Noisy Sample Denoised Sample
Noisy Sample 1 Denoised Sample 1
Noisy Sample 2 Denoised Sample 2
Noisy Sample 3 Denoised Sample 3

Contribution

Please submit a GitHub issue if you find any bugs. If you'd like to contribute a new feature, feel free to open a GitHub issue to discuss, or email [email protected].
