Dynamics Processing: Compressor/Limiter, part 2

In part 1 I detailed how I built the envelope detector that I will now use in my Unity compressor/limiter. To reiterate, the envelope detector extracts the amplitude contour of the audio that will be used by the compressor to determine when to compress the signal’s gain. The response of the compressor is determined by the attack time and the release time of the envelope, with higher values resulting in a smoother envelope, and hence, a gentler response in the compressor.

The compressor script is a MonoBehaviour component that can be attached to any GameObject. Here are the fields and corresponding inspector GUI:

public class Compressor : MonoBehaviour
{
    [AudioSlider("Threshold (dB)", -60f, 0f)]
    public float threshold = 0f;		// in dB
    [AudioSlider("Ratio (x:1)", 1f, 20f)]
    public float ratio = 1f;
    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;
    [AudioSlider("Pre-gain (dB)", -12f, 24f)]
    public float preGain = 0f;			// in dB, amplifies the audio signal prior to envelope detection.
    [AudioSlider("Post-gain (dB)", -12f, 24f)]
    public float postGain = 0f;			// in dB, amplifies the audio signal after compression.
    [AudioSlider("Attack time (ms)", 0f, 200f)]
    public float attackTime = 10f;		// in ms
    [AudioSlider("Release time (ms)", 10f, 3000f)]
    public float releaseTime = 50f;		// in ms
    [AudioSlider("Lookahead time (ms)", 0, 200f)]
    public float lookaheadTime = 0f;	// in ms

    public ProcessType processType = ProcessType.Compressor;
    public DetectionMode detectMode = DetectionMode.Peak;

    private EnvelopeDetector[] m_EnvelopeDetector;
    private Delay m_LookaheadDelay;

    private delegate float SlopeCalculation (float ratio);
    private SlopeCalculation m_SlopeFunc;
    
    // Continued...
Compressor/Limiter Unity inspector GUI.

The two most important parameters for a compressor are the threshold and the ratio. When a signal exceeds the threshold, the compressor reduces the level of the overshoot according to the ratio. For example, if the threshold is -2 dB with a ratio of 4:1 and the compressor encounters a signal peak of +2 dB, the gain reduction will be 3 dB, bringing the signal’s new level to -1 dB. The ratio can be read as a percentage: a 4:1 ratio means that the portion of the signal above the threshold is reduced by 75% (1 – 1/4 = 0.75). The difference between the threshold and the signal peak (4 dB in this example) is scaled by that factor to arrive at the 3 dB reduction (4 * 0.75 = 3). When the ratio is ∞:1, the compressor becomes a limiter. The compressor’s output can be visualized by a plot of amplitude in vs. amplitude out:

Plot of amplitude in vs. amplitude out of a compressor with 4:1 ratio.

When the ratio is ∞:1, the resulting amplitude after the threshold would be a straight horizontal line in the above plot, effectively preventing any levels from exceeding the threshold. It can easily be seen how this then would exhibit the behavior of a limiter. From these observations, we can derive the equations we need for the compressor.

compressor gain = slope * (threshold – envelope value) if envelope value >= threshold, otherwise 0

slope = 1 – (1 / ratio), or for limiting, slope = 1
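As a quick check of the two equations above, here is a minimal sketch in plain C# with no Unity dependencies. The class and method names are my own, chosen for illustration, and are not part of the component. All levels are in dB.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class CompressorMath
{
    // slope = 1 - (1 / ratio); a limiter (ratio of infinity:1) uses slope = 1.
    public static double Slope(double ratio, bool limiter)
    {
        return limiter ? 1.0 : 1.0 - (1.0 / ratio);
    }

    // Gain change in dB (zero or negative) for a given envelope level.
    public static double GainDb(double envelopeDb, double thresholdDb, double slope)
    {
        if (envelopeDb < thresholdDb) {
            return 0.0;    // below the threshold: leave the signal alone
        }
        return slope * (thresholdDb - envelopeDb);
    }
}
```

With the earlier example (threshold -2 dB, ratio 4:1, peak at +2 dB), `GainDb(2.0, -2.0, Slope(4.0, false))` gives -3 dB, so the peak ends up at -1 dB. In limiter mode the same call gives -4 dB, which puts the peak exactly at the threshold.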

All amplitude values are in dB for these equations. We saw both of these equations earlier in the example I gave, and both are pretty straightforward. These elements can now be combined to make up the compressor/limiter. The Awake method is called as soon as the component is initialized in the scene.

 

void Awake ()
{
    if (processType == ProcessType.Compressor) {
        m_SlopeFunc = CompressorSlope;
    } else if (processType == ProcessType.Limiter) {
        m_SlopeFunc = LimiterSlope;
    }

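    // Convert from ms to s. Local copies are used so that the inspector fields stay in ms.
    float attackSec = attackTime / 1000f;
    float releaseSec = releaseTime / 1000f;

    // Handle stereo max number of channels for now.
    // sampleRate is assumed to be set elsewhere in the class (e.g. from AudioSettings.outputSampleRate).
    m_EnvelopeDetector = new EnvelopeDetector[2];
    m_EnvelopeDetector[0] = new EnvelopeDetector(attackSec, releaseSec, detectMode, sampleRate);
    m_EnvelopeDetector[1] = new EnvelopeDetector(attackSec, releaseSec, detectMode, sampleRate);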
}

Here is the full compressor/limiter code in Unity’s audio callback method. When the component is placed on the GameObject that holds the audio listener, the data array will contain the audio signal prior to being sent to the system’s output.

void OnAudioFilterRead (float[] data, int numChannels)
{
    float postGainAmp = AudioUtil.dB2Amp(postGain);

    if (preGain != 0f) {
        float preGainAmp = AudioUtil.dB2Amp(preGain);
        for (int k = 0; k < data.Length; ++k) {
            data[k] *= preGainAmp;
        }
    }

    float[][] envelopeData = new float[numChannels][];

    if (numChannels == 2) {
        float[][] channels;
        AudioUtil.DeinterleaveBuffer(data, out channels, numChannels);
        m_EnvelopeDetector[0].GetEnvelope(channels[0], out envelopeData[0]);
        m_EnvelopeDetector[1].GetEnvelope(channels[1], out envelopeData[1]);
        for (int n = 0; n < envelopeData[0].Length; ++n) {
            envelopeData[0][n] = Mathf.Max(envelopeData[0][n], envelopeData[1][n]);
        }
    } else if (numChannels == 1) {
        m_EnvelopeDetector[0].GetEnvelope(data, out envelopeData[0]);
    } else {
        // Only mono and stereo are handled; bail out rather than dereference a null envelope below.
        return;
    }

    m_Slope = m_SlopeFunc(ratio);

    for (int i = 0, j = 0; i < data.Length; i+=numChannels, ++j) {
        m_Gain = m_Slope * (threshold - AudioUtil.Amp2dB(envelopeData[0][j]));
        m_Gain = Mathf.Min(0f, m_Gain);
        m_Gain = AudioUtil.dB2Amp(m_Gain);
        for (int chan = 0; chan < numChannels; ++chan) {
            data[i+chan] *= (m_Gain * postGainAmp);
        }
    }
}

And quickly, here is the helper method for deinterleaving a multichannel buffer:

public static void DeinterleaveBuffer (float[] source, out float[][] output, int sourceChannels)
{
    int channelLength = source.Length / sourceChannels;

    output = new float[sourceChannels][];

    for (int i = 0; i < sourceChannels; ++i) {
        output[i] = new float[channelLength];

        for (int j = 0; j < channelLength; ++j) {
            output[i][j] = source[j*sourceChannels+i];
        }
    }
}

First off, the component includes a few utility functions that convert between linear amplitude and dB values, which we can see used in the audio callback above. Pre-gain is applied to the audio signal prior to extracting the envelope. For multichannel audio, Unity unfortunately gives us an interleaved buffer, so this needs to be deinterleaved before sending it to the envelope detector. (Recall that the detector uses a recursive filter and thus has state variables. This could of course be handled differently in the envelope detector, but it’s simpler to work on single continuous data buffers.)
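The AudioUtil.dB2Amp and AudioUtil.Amp2dB helpers are not listed in this post. As a rough sketch of what such conversions usually look like, assuming the standard 20 * log10 amplitude convention, something like the following would do. The class name, the method names and the silence floor are assumptions of this sketch, not the actual AudioUtil code.

```csharp
using System;

// Hypothetical stand-in for the AudioUtil conversion helpers; the real ones are not shown in this post.
public static class DbConvert
{
    // Floor used to avoid log10(0) = -infinity on digital silence (an assumption of this sketch).
    public const double MinAmp = 1e-6;   // -120 dB

    // dB to linear amplitude: amp = 10^(dB / 20).
    public static double DbToAmp(double db)
    {
        return Math.Pow(10.0, db / 20.0);
    }

    // Linear amplitude to dB: dB = 20 * log10(amp), clamped at the floor.
    public static double AmpToDb(double amp)
    {
        return 20.0 * Math.Log10(Math.Max(amp, MinAmp));
    }
}
```

For instance, `DbToAmp(-6.0)` is roughly 0.5 and `AmpToDb(1.0)` is 0. The floor matters in a compressor because an envelope value of exactly 0 would otherwise turn into negative infinity in the gain calculation.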

When working with multichannel audio, each channel will have a unique envelope. These could of course be processed separately, but that would disturb the relative levels between the channels. Instead, I take the maximum of the two envelope values at each sample and use that for the compressor. Another option would be to take the average of the two.

I then calculate the slope value based on whether the component is set to compressor or limiter mode (via a function delegate). The following loop is just realizing the equations posted earlier, and converting the dB gain value to linear amplitude before applying it to the audio signal along with post-gain.

This completes the basic compressor/limiter component. However, there are two important elements missing: soft knee processing and lookahead. From the plot earlier in the post, we see that once the signal reaches the threshold, the compressor kicks in rather abruptly. This point is called the knee of the compressor, and if we want this transition to happen more gently, we can interpolate within a zone around the threshold.

It’s common, especially in limiters, to have a lookahead feature that compensates for the obvious lag of the envelope detector. In other words, when the attack and release times are non-zero, the resulting envelope lags behind the audio signal as a result of the filtering. The compressor/limiter will actually miss attenuating the peaks in the signal that it needs to because of this lag. That’s where lookahead comes in. In truth, it’s a bit of a misnomer because we can obviously not see into the future of an audio signal, but we can delay the audio to achieve the same effect. This means that we extract the envelope as normal, but delay the audio output so that the compressor gain value lines up with the audio peaks that it is meant to attenuate.
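To make the delay idea concrete, here is a minimal sketch of a fixed-length delay line in plain C#. It is not the Delay class referenced by the m_LookaheadDelay field (that class is not shown here), and the eventual lookahead implementation may well look different. The class name and its single Process method are hypothetical.

```csharp
using System;

// Hypothetical fixed-length delay line, for illustration only.
public class SimpleDelay
{
    private readonly float[] m_Buffer;
    private int m_Pos;

    public SimpleDelay(int delaySamples)
    {
        if (delaySamples < 1) {
            throw new ArgumentOutOfRangeException("delaySamples");
        }
        m_Buffer = new float[delaySamples];   // starts out as silence
        m_Pos = 0;
    }

    // Stores the new sample and returns the one written delaySamples calls ago.
    public float Process(float input)
    {
        float delayed = m_Buffer[m_Pos];
        m_Buffer[m_Pos] = input;
        m_Pos = (m_Pos + 1) % m_Buffer.Length;
        return delayed;
    }
}
```

The envelope would still be extracted from the undelayed signal, while the gain is applied to the output of `Process`, so the gain reduction is already in place when the peak arrives. With interleaved audio, one delay line per channel would be needed.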

I will be implementing these two remaining features in a future post.

Dynamics processing: Compressor/Limiter, part 1

Lately I’ve been busy developing an audio-focused game in Unity, whose built-in audio engine is notorious for being extremely basic and lacking in features. (As of this writing, Unity 5, in which the entire built-in audio engine is being overhauled, has not yet been released.) For this project I have created all the DSP effects myself as script components, whose behavior is driven by Unity’s coroutines. In order to have slightly more control over the final mix of these elements, it became clear that I needed a compressor/limiter. This particular post is written with Unity/C# in mind, but the theory and code are easy enough to adapt to other uses. In this first part we’ll be looking at writing the envelope detector, which is needed by the compressor to do its job.

An envelope detector (also called a follower) extracts the amplitude envelope from an audio signal based on three parameters: an attack time, release time, and detection mode. The attack/release times are fairly straightforward, simply defining how quickly the detection responds to rising and falling amplitudes. There are typically two modes of calculating the envelope of a signal: by its peak value or its root mean square value. A signal’s peak value is just the instantaneous sample value while the root mean square is measured over a series of samples, and gives a more accurate account of the signal’s power. The root mean square is calculated as:

rms = sqrt( (1/n) * (x_1^2 + x_2^2 + … + x_n^2) ),

where n is the number of data values. In other words, we sum together the squares of all the sample values in the buffer, find the average by dividing by n, and then take the square root. In audio processing, however, we normally bound the sample size (n) to some fixed number (called windowing). This effectively means that we calculate the RMS value over the past n samples.

(As an aside, multiplying by 1/n assigns equal weights to all the terms, making it a rectangular window. Other window functions can be used instead, which would favor terms in the middle of the window. This can give a smoother RMS estimate, since brand new samples (or old ones at the end of the window) have less influence over the result.)

Now that we’ve seen the two modes of detecting a signal’s envelope, we can move on to look at the role of the attack/release times. These values are used in calculating coefficients for a first-order recursive filter (also called a leaky integrator) that processes the values we get from the audio buffer (through one of the two detection methods). Simply stated, we get the sample values from the audio signal and pass them through a low-pass filter to smooth out the envelope.

We calculate the coefficients using the time-constant equation:

g = e ^ ( -1 / (time * sample rate) ),

where time is in seconds, and sample rate in Hz. Once we have our gain coefficients for attack/release, we put them into our leaky integrator equation:

out = in + g * (out – in),

where in is the input sample we detected from the incoming audio, g is either the attack or release gain, and out is the envelope sample value. Here it is in code:

public void GetEnvelope (float[] audioData, out float[] envelope)
{
    envelope = new float[audioData.Length];

    m_Detector.Buffer = audioData;

    for (int i = 0; i < audioData.Length; ++i) {
        float envIn = m_Detector[i];

        if (m_EnvelopeSample < envIn) {
            m_EnvelopeSample = envIn + m_AttackGain * (m_EnvelopeSample - envIn);
        } else {
            m_EnvelopeSample = envIn + m_ReleaseGain * (m_EnvelopeSample - envIn);
        }

        envelope[i] = m_EnvelopeSample;
    }
}

(Source: code is based on “Envelope detector” from http://www.musicdsp.org/archive.php?classid=2#97, with detection modes added by me.)
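The m_AttackGain and m_ReleaseGain values used above come from the time-constant equation. As a small sketch of how the two formulas fit together, here they are in plain C#. The class and method names are hypothetical, and treating a time of zero as "no smoothing" is a choice made in this sketch.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class EnvelopeCoeff
{
    // g = e^(-1 / (time * sampleRate)), with time in seconds and sampleRate in Hz.
    public static double FromTime(double timeSeconds, double sampleRate)
    {
        if (timeSeconds <= 0.0) {
            return 0.0;   // no smoothing: the output simply equals the input
        }
        return Math.Exp(-1.0 / (timeSeconds * sampleRate));
    }

    // One step of the leaky integrator: out = in + g * (out - in).
    public static double Step(double input, double previousOut, double g)
    {
        return input + g * (previousOut - input);
    }
}
```

A 10 ms attack at 44100 Hz gives a g of about 0.9977. Feeding a constant input of 1 into an envelope that starts at 0 for 441 samples (10 ms) brings the envelope to 1 – 1/e, or about 63% of the way, which is what the time constant means in practice.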

The envelope sample is calculated based on whether the current audio sample is rising or falling, with the envIn sample resulting from one of the two detection modes. This is implemented similarly to what is known as a functor in C++. I prefer this method to having another branching structure inside the loop because among other things, it’s more extensible and results in cleaner code (as well as being modular). It could be implemented using delegates/function pointers, but the advantage of a functor is that it retains its own state, which is useful for the RMS calculation as we will see. Here is how the interface and classes are declared for the detection modes:

public interface IEnvelopeDetection
{
    float[] Buffer { set; get; }
    float this [int index] { get; }

    void Reset ();
}

We then have two classes that implement this interface, one for each mode:

public class DetectPeak : IEnvelopeDetection
{
    private float[] m_Buffer;

    /// <summary>
    /// Sets the buffer to extract envelope data from. The original buffer data is held by reference (not copied).
    /// </summary>
    public float[] Buffer
    {
        set { m_Buffer = value; }
        get { return m_Buffer; }
    }

    /// <summary>
    /// Returns the envelope data at the specified position in the buffer.
    /// </summary>
    public float this [int index]
    {
        get { return Mathf.Abs(m_Buffer[index]); }
    }

    public DetectPeak () {}
    public void Reset () {}
}

This particular class involves a rather trivial operation of just returning the absolute value of a signal’s sample. The RMS detection class is more involved.

/// <summary>
/// Calculates and returns the root mean square value of the buffer. A circular buffer is used to simplify the calculation, which avoids
/// the need to sum up all the terms in the window each time.
/// </summary>
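public float this [int index]
{
    get {
        float sampleSquared = m_Buffer[index] * m_Buffer[index];
        float total = 0f;
        float rmsValue;

        if (m_Iter < m_RmsWindow.Length-1) {
            // Window not yet full: average over the number of samples seen so far.
            // m_Iter (not index) is used because index restarts at 0 with every audio block.
            total = m_LastTotal + sampleSquared;
            rmsValue = Mathf.Sqrt((1f / (m_Iter+1)) * total);
            // Only advance while filling, so the counter can never overflow.
            m_Iter++;
        } else {
            // Window full: add the newest term and drop the oldest one.
            // The clamp guards against a slightly negative total from rounding error.
            total = Mathf.Max(0f, m_LastTotal + sampleSquared - m_RmsWindow.Read());
            rmsValue = Mathf.Sqrt((1f / m_RmsWindow.Length) * total);
        }

        m_RmsWindow.Write(sampleSquared);
        m_LastTotal = total;

        return rmsValue;
    }
}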

public DetectRms ()
{
    m_Iter = 0;
    m_LastTotal = 0f;
    // Set a window length to an arbitrary 128 for now.
    m_RmsWindow = new RingBuffer<float>(128);
}

public void Reset ()
{
    m_Iter = 0;
    m_LastTotal = 0f;
    m_RmsWindow.Clear(0f);
}

The RMS calculation in this class is an optimization of the general equation I stated earlier. Instead of summing all the values in the window again for each new sample, a ring buffer is used to save each new term. Since there is only ever one new term to include in the calculation, it can be simplified by storing all the squared sample values in the ring buffer, adding the newest to our previous total and subtracting the oldest. We are then left with just a multiply and a square root, instead of having to redundantly add together 128 terms (or however big n is). An iterator variable ensures that the state of the detector remains consistent across successive audio blocks.
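Since the RingBuffer class is not listed here, the running-sum idea may be easier to follow in a self-contained form. The following is a sketch in plain C# that uses an ordinary array as the circular buffer. It is not the DetectRms class itself, and the names are hypothetical.

```csharp
using System;

// Hypothetical self-contained running RMS; not the DetectRms or RingBuffer<T> classes from this post.
public class RunningRms
{
    private readonly double[] m_Squares;  // squared samples currently in the window
    private int m_Pos;                    // next slot to overwrite (the oldest value once the window is full)
    private int m_Count;                  // number of valid entries, up to the window length
    private double m_Total;               // running sum of the values in m_Squares

    public RunningRms(int windowLength)
    {
        if (windowLength < 1) {
            throw new ArgumentOutOfRangeException("windowLength");
        }
        m_Squares = new double[windowLength];
    }

    // Feeds one sample and returns the RMS over the last windowLength samples.
    public double Next(double sample)
    {
        double squared = sample * sample;

        // Swap the oldest term for the new one (the slot holds 0 while the window is still filling).
        m_Total += squared - m_Squares[m_Pos];
        m_Squares[m_Pos] = squared;
        m_Pos = (m_Pos + 1) % m_Squares.Length;

        if (m_Count < m_Squares.Length) {
            m_Count++;
        }

        // Guard against a tiny negative total caused by rounding.
        return Math.Sqrt(Math.Max(0.0, m_Total) / m_Count);
    }
}
```

Each call costs one add, one subtract, one divide and one square root regardless of the window length. While the window is still filling, the sum is divided by the number of samples seen so far rather than by the full window length.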

In the envelope detector class, the detection mode is selected by assigning an instance of the corresponding class to the m_Detector member variable:

public class EnvelopeDetector
{
    protected float m_AttackTime;
    protected float m_ReleaseTime;
    protected float m_AttackGain;
    protected float m_ReleaseGain;
    protected float m_SampleRate;
    protected float m_EnvelopeSample;

    protected DetectionMode m_DetectMode;
    protected IEnvelopeDetection m_Detector;

    // Continued...
public DetectionMode DetectMode
{
    get { return m_DetectMode; }
    set {
        // Store the new mode and switch on it (not on the previous value).
        m_DetectMode = value;

        switch (value) {
            case DetectionMode.Peak:
                m_Detector = new DetectPeak();
                break;

            case DetectionMode.Rms:
                m_Detector = new DetectRms();
                break;
        }
    }
}

Now that we’ve looked at extracting the envelope from an audio signal, we will look at using it to create a compressor/limiter component to be used in Unity. That will be upcoming in part 2.