Noise Cancellation using Web Audio API

20 Nov 2019


Apple recently announced the AirPods Pro, this time with an active noise cancellation (ANC) feature, which made me sit down and try to write my own noise cancellation code. I know it can't be done by software alone due to physical limitations (which I will explain later), but playing around with audio is always fun. So I want to write an app that suppresses background noise using JavaScript, and in particular the Web Audio API. To get there, I'd like to go back and implement some mandatory basics first and walk through some basic audio processing how-to. Use Chrome for the best compatibility.

FFT

The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith, Ph.D. is by far the greatest source for learning about digital signal processing. I learned from it 10 years ago, when I had zero knowledge about the subject, and now, after reading through it again, I can see how clear it is. This book is a really good start for anyone who would like to dive in and explore the subject further. For the lazy ones, here is a quick simulator to show "how sound works". The sound we hear is a wave, which we can see in the Time Domain. Each sound wave is composed of multiple frequencies, and we can see the decomposition of these frequencies in the Frequency Domain. To convert from the time domain to the frequency domain, one uses the Fourier Transform. On computers we use the DFT (Discrete Fourier Transform), since digital samples are, after all, discrete, and the FFT (Fast Fourier Transform) is simply a fast algorithm for computing it.
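
Just to make the time-to-frequency idea concrete, here is a naive DFT written straight from its definition. This snippet is purely illustrative (the AnalyserNode we'll use below performs a proper FFT for us):

// Naive DFT: turns N time-domain samples into N frequency-bin magnitudes.
// O(N^2) and only for illustration - real code uses an FFT (O(N log N)).
function dft(samples) {
  const N = samples.length;
  const magnitudes = new Array(N);
  for (let k = 0; k < N; k++) {          // frequency bin k
    let re = 0, im = 0;
    for (let n = 0; n < N; n++) {        // time-domain sample n
      const angle = (2 * Math.PI * k * n) / N;
      re += samples[n] * Math.cos(angle);
      im -= samples[n] * Math.sin(angle);
    }
    magnitudes[k] = Math.sqrt(re * re + im * im);
  }
  return magnitudes;
}

// A pure sine that completes 4 cycles over 64 samples shows up in frequency bin 4:
const samples = Array.from({ length: 64 }, (_, n) => Math.sin(2 * Math.PI * 4 * n / 64));
console.log(dft(samples)[4]); // ≈ 32 (N/2); every other bin except the mirror bin 60 is ≈ 0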

Let's take a look at how to implement a simple sound wave visualization in JS.

FFT simulator using the Web Audio API (interactive demo): a Time Domain graph and, below it (after a DFT), a Frequency Domain graph. Controls: audio source (oscillator [computer-generated waveform] / microphone), frequency, detune, and wave shape (sine, square, sawtooth, triangle).

You can find the source here.
The Web Audio API is incredible! Ten years ago I never thought I would be able to do sound processing in JS. Yet it's not beginner-friendly, and the examples given by MDN are not enough - they are usually the most basic scripts, without classes or proper resource management. The main thing to know about the Web Audio API is that you use it by creating "nodes" that the audio passes through. Every node can manipulate the samples - for example, change the amplitude, add gain, or apply other sound effects. You start by obtaining the audio from a source (an mp3 file, the microphone, a computer-generated waveform via an OscillatorNode, …), connect it to a series of nodes (if you would like to manipulate the sound), and then connect it to the audio output (my laptop's speakers, in my case).

The simulation above uses the Web Audio API to play a sound wave and show its time and frequency domains. To use the API, we first initialize an AudioContext:

audioCtx = new (window.AudioContext || window.webkitAudioContext)();
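
One gotcha worth mentioning: because of autoplay policies, Chrome may create the context in a "suspended" state until the user interacts with the page. A minimal way to handle it (assuming some playButton element that starts the simulation) is to resume the context from the user's gesture:

// Browsers may start the AudioContext "suspended" until a user gesture;
// resuming it inside the click handler that starts playback is the safe bet.
// (playButton is just an assumed button element that triggers play().)
playButton.addEventListener("click", async () => {
  if (audioCtx.state === "suspended") {
    await audioCtx.resume();
  }
});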

We'll create a class to handle our simulator (it seems basic, but I haven't seen it in many examples online):

class FFTSimulator {
  constructor() {
    this._source = "oscillator";
    this.isPlaying = false;
    ...
  }

Our simulation sound source can be one of the following:

  1. OscillatorNode - using the simulator's parameters you can define the shape, frequency, and detune of the sound wave.
  2. Microphone

The waveform appears on the Time Domain graph.

Using the AnalyserNode we can obtain the waveform's FFT decomposition - meaning the frequencies that together make up the sound wave. The frequencies appear on the Frequency Domain graph.

So we connect our sound source (microphone/OscillatorNode) to the analyser, which gives us two arrays (one containing the time domain, the other the frequency domain) that we can display on screen, and then we connect the analyser to our speakers (audioCtx.destination).

  async play() {
    this.analyser = audioCtx.createAnalyser();

    switch (this._source) {
      case "oscillator":
        // create oscillator (produces waveforms)
        this.oscillator = audioCtx.createOscillator();
        this.oscillator.type = this._oscillatorType; // e.g. "sine"
        this.oscillator.frequency.value = this._oscillatorFrequency; // 440 Hz
        ...

        // connect audio nodes and start
        this.oscillator.connect(this.analyser);
        this.oscillator.start(0);
        break;

      case "microphone":
        initUserMediaFromBrowser();
        if (!navigator.mediaDevices.getUserMedia) {
          console.log("getUserMedia not supported on your browser!");
          break;
        }
        let constraints = { audio: true };
        this._microphoneStream = await navigator.mediaDevices.getUserMedia(
          constraints
        );
        this._microphone = audioCtx.createMediaStreamSource(
          this._microphoneStream
        );
        this._microphone.connect(this.analyser);
        break;
    }

    // connect the analyser to the output, and start the FFT graph visualizations
    this.analyser.connect(audioCtx.destination);
    this.visualize();
  }

For the visualization we use visualize(). This function shows the analyser's purpose: we call getByteTimeDomainData(timeBuffer) and getByteFrequencyData(frequencyBuffer), which fill the given arrays with waveform and frequency samples. You can check how the Chromium team implemented their FFT analysis here. We also call requestAnimationFrame in order to animate the graphs.

  visualize() {
    this.analyser.fftSize = 2048;
    var timeBufferLength = this.analyser.fftSize;
    var frequencyBufferLength = this.analyser.frequencyBinCount;
    var timeBuffer = new Uint8Array(timeBufferLength);
    var frequencyBuffer = new Uint8Array(frequencyBufferLength);

    clearBackground(this.canvasTime);
    clearBackground(this.canvasFrequency);

    this.draw = function() {
      this.drawHandler = requestAnimationFrame(this.draw);
      this.analyser.getByteTimeDomainData(timeBuffer);
      this.analyser.getByteFrequencyData(frequencyBuffer);
      drawGraph(this.canvasTime, timeBufferLength, timeBuffer);
      drawGraph(
        this.canvasFrequency,
        frequencyBufferLength,
        frequencyBuffer,
        true
      );
    }.bind(this);
    this.draw();
  }
}
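
The drawing helpers (clearBackground and drawGraph) are in the full source linked below; roughly, they are plain Canvas 2D code along these lines (a sketch, not the exact implementation - the last argument switches to frequency-style rendering):

// A rough sketch of the canvas helpers used in visualize() - not the exact code.
function clearBackground(canvas) {
  const ctx = canvas.getContext("2d");
  ctx.fillStyle = "#000";
  ctx.fillRect(0, 0, canvas.width, canvas.height);
}

function drawGraph(canvas, bufferLength, buffer, isFrequency = false) {
  const ctx = canvas.getContext("2d");
  clearBackground(canvas);
  ctx.strokeStyle = "#0f0";
  ctx.beginPath();
  const sliceWidth = canvas.width / bufferLength;
  for (let i = 0; i < bufferLength; i++) {
    // The analyser's byte data is in the range 0-255 (128 is silence in the time domain).
    const x = i * sliceWidth;
    const y = isFrequency
      ? canvas.height - (buffer[i] / 255) * canvas.height // frequency bins grow up from the bottom
      : (buffer[i] / 255) * canvas.height;                // waveform oscillates around the middle
    if (i === 0) ctx.moveTo(x, y);
    else ctx.lineTo(x, y);
  }
  ctx.stroke();
}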

Those are the basics of creating a simple audio visualization. You can find the full source here. Moving on.

How Noise Cancellation Works

The sound we hear is a wave. Let's take, for example, a simple sine wave, where $A$ is the amplitude, $\omega$ is the angular frequency, and $\varphi$ is the phase:

$$y_1(t) = A \sin(\omega t + \varphi)$$

Because of the superposition principle, sound waves combine into a resulting wave by adding their amplitudes at each point in time. In its most basic form, noise cancellation works by creating a second waveform, $y_2(t) = -A \sin(\omega t + \varphi)$, duplicated from the original wave but with an inverted amplitude. This process is called interference; when the result is that the waves cancel each other out, it's called destructive interference.

This way, at every point in time $t$ we get destructive interference, since:

$$y_1(t) + y_2(t) = A \sin(\omega t + \varphi) - A \sin(\omega t + \varphi) = 0$$
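
A quick numeric sanity check (a toy snippet, not part of the simulator code): sampling $y_1$ and its inverted copy $y_2$ and summing them point by point really does give silence.

// Toy check: a 364 Hz sine and its inverted copy sum to zero at every sample.
const A = 1, freq = 364, sampleRate = 48000;
for (let n = 0; n < 5; n++) {
  const t = n / sampleRate;
  const y1 = A * Math.sin(2 * Math.PI * freq * t);
  const y2 = -A * Math.sin(2 * Math.PI * freq * t); // the "anti-wave"
  console.log((y1 + y2).toFixed(6)); // 0.000000 every time
}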

Here is a simulation that demonstrates this phenomenon. Change the phase of the inverted wave and notice how the sound changes:

Waves interference simulator using an Audio Worklet (interactive demo): a Time Domain graph of the oscillator wave and its anti-phase copy. Controls: audio output (oscillator wave / anti-phase oscillator wave), frequency, detune, inverted wave phase shift, and wave shape (sine, square, sawtooth, triangle).

You can find the source here.

Unlike the previous simulation, which used only the OscillatorNode and the microphone input, this simulation uses a rather new feature of the Web Audio API called Audio Worklet.

This feature lets us process audio using code written in JS. We can also write WebAssembly audio-processing code for that purpose, but that's for another post. Audio Worklet replaces the deprecated ScriptProcessorNode.

I honestly didn't think it would work well - I expected a slight shift in phase between the input and the processed output waveform, a "glitch" in the audio - but it proved me wrong, and after reading about it I can see why. In the old ScriptProcessorNode design, the audio processing was executed asynchronously on the main thread (causing audio latency and UI "jank"). In the current design, the audio processing runs on a separate thread (the audio rendering thread) inside the AudioWorkletProcessor, while the AudioWorkletNode object lives on the main thread, connected to the input and output audio nodes as we've seen before.


So in order to process audio, we need two components that communicate with each other:

  1. AudioWorkletProcessor
  2. AudioWorkletNode


Writing your own processing code with AudioWorkletProcessor is pretty easy - we only need to implement the process function (custom parameters are optional). The main thing to notice is that the AudioWorkletProcessor code lives in a separate file.

Let's go back to WHY we wanted audio processing in the first place - given an input waveform $y(t)$, we want to produce its inverted waveform $-y(t)$. So we need to multiply each point in the sample by $-1$ to flip the amplitude vertically.

for (let i = 0; i < size; ++i)
  output[channel][(i+phase) % size] = -1.0 * input[channel][i];

When phase == 0, the code just flips the amplitude and we get destructive interference - we hear nothing! When phase != 0, the code flips the amplitude but also adds a shift, and we can notice the sound changing as the phase changes. Furthermore, if we change the phase so that the waves overlap again, we get constructive interference - the amplitude of the resulting sound is twice the original amplitude, so we hear the same sound, but twice as loud. Cool, isn't it?

noise-cancellation-processor.js

class NoiseCancellationProcessor extends AudioWorkletProcessor {

  static get parameterDescriptors() {
    return [{
      name: 'phase',
      defaultValue: 0,
      minValue: 0,
      maxValue: 180,
    }]
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const output = outputs[0];
    const phase = parameters['phase'][0];

    for (let channel = 0; channel < output.length; ++channel) {
      // if you just want the inverted input copied to the output (no phase shift):
      //    const inverted = input[channel].map(x => -1.0 * x);
      //    output[channel].set(inverted);
      const size = input[channel].length;
      for (let i = 0; i < size; ++i)
        output[channel][(i + phase) % size] = -1.0 * input[channel][i];
    }

    return true;
  }
}

console.log('Registering processor');
registerProcessor('noise-cancellation-processor', NoiseCancellationProcessor);


In our main JS file, we create the AudioWorkletNode in the play function, just like any other audio node. The simulation uses the following routing of nodes:

The oscillator comes first and its job is to generate a sine/square/… waveform. We connect it to analyser1 (to visualize it) and then to the audio output. The AudioWorkletNode takes its input from the oscillator and generates the inverted wave; its output is then passed to analyser2 (for visualization) and to the audio output.

wave-interference-simulator.js

  async play() {
    this.analyser1 = audioCtx.createAnalyser();
    this.analyser2 = audioCtx.createAnalyser();

    this.oscillator = audioCtx.createOscillator();

    await audioCtx.audioWorklet.addModule(
      "/projects/noise-cancellation/noise-cancellation-processor.js");
    this.noiseReducer = new AudioWorkletNode(
      audioCtx,
      "noise-cancellation-processor"
    );
    const phaseParam = this.noiseReducer.parameters.get("phase");
    phaseParam.setValueAtTime(this._antiWavePhase, audioCtx.currentTime);

    // connect waveform y1
    this.oscillator.connect(this.analyser1);
    this.analyser1.connect(audioCtx.destination);

    // connect waveform y2
    this.oscillator.connect(this.noiseReducer);
    this.noiseReducer.connect(this.analyser2);
    this.analyser2.connect(audioCtx.destination);

    this.oscillator.start(0);
    this.visualize();
  }


Problems Creating a Real-Life Noise Reducer

So by now we have a good understanding of the physics behind basic noise cancellation using destructive wave interference, and of how to implement it using the Web Audio API. We simply need to take the waveform recorded by the microphone and multiply each point by $-1$. But we can't celebrate yet.

There are a few problems with implementing noise cancellation in software:

  1. Processing Time (Latency) - how long it takes from the moment we sample the wave until we produce the opposite wave. If someone outside shouts "Peaky", and by the time we've processed the output the person is already shouting "Blinders!!", then through our "special JavaScript noise-canceling earphones" we would hear something like "BPleiankyders!!" (just remember that the "Peaky" part is multiplied by $-1$).
  2. Phase Difference - caused by the latency (it's really the same issue), but worth clarifying. When trying to cancel even a simple sine waveform, a small change in phase can produce noise. You can check that out in the simulation above - refresh the page, or change the frequency to 364 Hz; at this frequency the waveform moves slowly across the graph, so you can observe the changes. Shift the phase even by one step (equivalent to one data point) and you will immediately notice a sound!
  3. Amplitude Difference - the amplitude (sound pressure, or intensity) the microphone captures is not the one we hear: since we have an AirPod in our ear, it functions as an earplug and reduces the intensity of outside sounds. So when we calculate our inverse, destructive wave from the surrounding sound, we need to lower its amplitude. By how much? We would have to experiment and determine exactly by how many percent (or dB) to reduce it. Note that the value differs per earphone - Beats headphones block surrounding noise better than the cheap earbuds handed out on a tourist bus - so the amplitude scaling depends on the earphone model (see the sketch after this list).
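
For the amplitude part, the attenuation itself is easy to express with a GainNode. A sketch might look like the following, where 0.4 is a made-up factor and microphoneSource / noiseReducerNode are placeholder names for the nodes we built earlier - the real number would have to be calibrated per earphone model:

// Sketch only: scale the anti-wave before it reaches the speakers.
// 0.4 is an arbitrary attenuation factor; microphoneSource and noiseReducerNode
// are placeholder names for the media-stream source and the AudioWorkletNode.
const antiWaveGain = audioCtx.createGain();
antiWaveGain.gain.setValueAtTime(0.4, audioCtx.currentTime);

microphoneSource.connect(noiseReducerNode);  // invert the wave
noiseReducerNode.connect(antiWaveGain);      // attenuate it
antiWaveGain.connect(audioCtx.destination);  // play it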

Fake JavaScript Active Noise Cancellation Simulation

Like Moses, who knew he couldn't enter the promised land, so are we: implementing an "active" "noise" "cancellation" simulation knowing it won't work, no matter how hard we pray. Physics (or God) simply prevents us from doing so. Still, you can do your best to configure the parameters below so it works with your setup. But I'll warn you in advance - your ears will bleed in the process!

  1. On your phone, open this webpage again and use the first simulation on this page to play a sine waveform. I recommend using the default 364 Hz frequency and the lowest detune level.
  2. On your computer, start the following simulation. It simply records the audio and produces the "anti-wave" with parameters of your choosing. Play with the Amplitude Level and Phase until you hear noise reduction.
  3. To avoid echo effects, use earphones connected to your computer.


Fake active noise cancellation simulator (interactive demo): a Time Domain graph of the microphone input and the generated anti-wave. Controls: audio source (microphone), show microphone waveform, show anti-wave, anti-wave phase shift, and anti-wave amplitude (0-100%).

You can find the source here.

As you can see, all this simulation does is connect the microphone to the noise reducer (an AudioWorkletNode) that creates the anti-wave, and then connect that to the audio destination (the speakers). The analysers are only there to display the waves.
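
Roughly, the routing looks like this (a sketch with illustrative names - the real code is in the linked source):

// Sketch of the routing described above (names and the module path are illustrative).
async function startFakeANC() {
  await audioCtx.audioWorklet.addModule("noise-cancellation-processor.js");

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const microphone = audioCtx.createMediaStreamSource(stream);
  const noiseReducer = new AudioWorkletNode(audioCtx, "noise-cancellation-processor");
  const micAnalyser = audioCtx.createAnalyser();      // displays the recorded wave
  const antiWaveAnalyser = audioCtx.createAnalyser(); // displays the anti-wave

  microphone.connect(micAnalyser);
  microphone.connect(noiseReducer);            // the worklet produces the anti-wave...
  noiseReducer.connect(antiWaveAnalyser);
  noiseReducer.connect(audioCtx.destination);  // ...which is played through the speakers
}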

What’s Next?

Since it's obvious that I would need hardware to implement this for real, and since I don't have an ultra-low-latency machine at home, I won't be implementing a real-time (and working) version of active noise cancellation anytime soon. It could be a great project someday. Also, multiplying the waveform by -1 and shifting it (even with sound prediction) is pretty basic. For a real-life implementation, here is an incredible video by Adam Cook from MathWorks, where he explains in detail how he designed and implemented a real-time active noise control system using a Speedgoat Simulink Real-Time target (a machine equipped with an ultra-low-latency operating system and ultra-low-latency A/D and D/A converters). For further explanation of his design and implementation, you can check out this link.