
TensorFlow Lite Audio Output

Phil Schatzmann edited this page Oct 5, 2024 · 6 revisions

The goal of this blog is to give a quick introduction to using TensorFlow Lite for Microcontrollers to create audio output with the help of the audio-tools library.

Hello World

The starting point is the good overview provided by the "Hello World" example of TensorFlow Lite, which describes how to create, train and use a model that is based on the sine function.

Converting into Audio

We can use this model to output the result as a tone with the help of the following sketch:

#include "AudioTools.h"
#include "AudioTools/AudioLibs/TfLiteAudioStream.h"
#include "model.h"

TfLiteSineReader tf_reader(20000,0.3);  // Audio generation logic 
TfLiteAudioStream tf_stream;            // Audio source -> no classification so N is 0
I2SStream i2s;                          // Audio destination
StreamCopy copier(i2s, tf_stream);      // copy tf_stream to i2s
int channels = 1;
int samples_per_second = 16000;


void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Warning);

  // Input from tensorflow
  auto tcfg = tf_stream.defaultConfig();
  tcfg.channels = channels;
  tcfg.sample_rate = samples_per_second;
  tcfg.kTensorArenaSize = 2 * 1024;
  tcfg.model = g_model;
  tcfg.reader = &tf_reader;
  tf_stream.begin(tcfg);

  // Output to I2S
  auto cfg = i2s.defaultConfig(TX_MODE);
  cfg.channels = channels;
  cfg.sample_rate = samples_per_second;
  i2s.begin(cfg);

}

void loop() { copier.copy(); }

As in any other audio sketch, we copy the audio data from the source (TfLiteAudioStream) to the sink (I2SStream).
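The pitch of the resulting tone follows from two numbers in the sketch: the second constructor argument of TfLiteSineReader (0.3), which is the amount the sine input advances per frame, and the sample rate (16000). The following is a minimal sketch of that arithmetic, assuming the model is invoked once per output frame. The helper names toneFrequency and incrementFor are hypothetical and not part of the audio-tools library.

```cpp
#include <cmath>

// Hypothetical helpers for illustration; not part of audio-tools.
constexpr float kTwoPi = 6.28318530718f;

// Frequency in Hz of the tone produced when the sine input advances by
// `increment` radians per frame at `sampleRate` frames per second.
inline float toneFrequency(float increment, int sampleRate) {
  return increment * sampleRate / kTwoPi;
}

// Inverse: the increment (radians per frame) needed for a given frequency.
inline float incrementFor(float frequencyHz, int sampleRate) {
  return kTwoPi * frequencyHz / sampleRate;
}
```

With the values from the sketch, toneFrequency(0.3f, 16000) gives roughly 764 Hz. To get a 440 Hz tone at the same sample rate, incrementFor(440.0f, 16000) suggests an increment of about 0.173.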

The TfLiteSineReader class

The heart of the processing is the TfLiteSineReader class, which is provided by the framework and is defined as follows:

class TfLiteSineReader : public TfLiteReader {
  public: TfLiteSineReader(int16_t range=32767, float increment=0.01 ){
    this->increment = increment;
    this->range = range;
  }
  
  virtual int read(TfLiteAudioStream *parent, int16_t*data, int sampleCount) {
    int channels = parent->config().channels;
    float two_pi = 2 * PI;
    // setup on first call
    if (p_interpreter==nullptr){
      p_interpreter = parent->interpreter();
      input = p_interpreter->input(0);
      output = p_interpreter->output(0);
    }
    for (int j=0; j<sampleCount; j+=channels){
      // Quantize the input from floating-point to integer
      input->data.int8[0] = TfQuantizer::quantize(actX,input->params.scale, input->params.zero_point);
      
      // Invoke TF Model
      TfLiteStatus invoke_status = p_interpreter->Invoke();
      // Dequantize the output and scale it to the int16 range
      data[j] = TfQuantizer::dequantizeToNewRange(output->data.int8[0], output->params.scale, output->params.zero_point, range);
      for (int i=1;i<channels;i++){
          data[j+i] = data[j];
      }
      // Increment X
      actX += increment;
      if (actX>two_pi){
        actX-=two_pi;
      }
    }

    return sampleCount;
  }
  // member variables (actX, increment, range, p_interpreter, input, output) not shown
};

As you can see, we just provide an array of int16_t data generated by the TensorFlow model!
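The two TfQuantizer calls in the class convert between float values and the int8 tensors of the quantized model. The following is a minimal sketch of the standard TensorFlow Lite affine quantization scheme (real = scale * (q - zero_point)). The function names are hypothetical, and the scaling step assumes that the helper used above multiplies the dequantized value by the requested range; the actual library implementation may differ in rounding and clamping.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the audio-tools implementation.

// float -> int8: divide by the scale, shift by the zero point, clamp to int8.
inline int8_t quantizeValue(float value, float scale, int zeroPoint) {
  int q = static_cast<int>(std::lround(value / scale)) + zeroPoint;
  return static_cast<int8_t>(std::max(-128, std::min(127, q)));
}

// int8 -> float: remove the zero point and multiply by the scale.
inline float dequantizeValue(int8_t q, float scale, int zeroPoint) {
  return (q - zeroPoint) * scale;
}

// int8 -> int16 sample: dequantize, clamp to [-1, 1], scale by range.
// Assumption: the model output represents a sine value in [-1, 1].
inline int16_t dequantizeToRange(int8_t q, float scale, int zeroPoint,
                                 int16_t range) {
  float v = dequantizeValue(q, scale, zeroPoint);
  v = std::max(-1.0f, std::min(1.0f, v));
  return static_cast<int16_t>(std::lround(v * range));
}
```

For example, with an illustrative scale of 1/128 and a zero point of 0, the int8 value 64 dequantizes to 0.5, so with a range of 20000 the resulting audio sample is 10000.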

Dependencies

GitHub

The full example can be found on GitHub.

Summary

This is a pretty bad way to generate a sine tone, and the audio-tools library provides better ways to do this. However, the goal was to give a simple introduction as a stepping stone...
