Tag Archives: envelope

Dynamics Processing: Compressor/Limiter, part 3

In part 1 of this series of posts, I went over creating an envelope detector that detects both peak amplitude and RMS values. In part 2, I used it to create a compressor/limiter. There were two common features missing from that compressor plug-in, however, that I will go over in this final part: soft knee and lookahead. Also, as I have stated in the previous parts, this effect is being created with Unity in mind, but the theory and code is easily adaptable to other uses.

Let’s start with lookahead since it is very straightforward to implement. Lookahead is common in limiters and compressors because any non-zero attack/release times will cause the envelope to lag behind the audio due to the filtering, and as a result, it won’t attenuate the right part of the signal corresponding to the envelope. This can be fixed by delaying the output of the audio so that it lines up with the signal’s envelope. The amount we delay the audio by is the lookahead time, so an extra field is needed in the compressor:

public class Compressor : MonoBehaviour
{
    [AudioSlider("Threshold (dB)", -60f, 0f)]
    public float threshold = 0f;		// in dB
    [AudioSlider("Ratio (x:1)", 1f, 20f)]
    public float ratio = 1f;
    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;
    [AudioSlider("Pre-gain (dB)", -12f, 24f)]
    public float preGain = 0f;			// in dB, amplifies the audio signal prior to envelope detection.
    [AudioSlider("Post-gain (dB)", -12f, 24f)]
    public float postGain = 0f;			// in dB, amplifies the audio signal after compression.
    [AudioSlider("Attack time (ms)", 0f, 200f)]
    public float attackTime = 10f;		// in ms
    [AudioSlider("Release time (ms)", 10f, 3000f)]
    public float releaseTime = 50f;		// in ms
    [AudioSlider("Lookahead time (ms)", 0, 200f)]
    public float lookaheadTime = 0f;	// in ms

    public ProcessType processType = ProcessType.Compressor;
    public DetectionMode detectMode = DetectionMode.Peak;

    private EnvelopeDetector[] m_EnvelopeDetector;
    private Delay m_LookaheadDelay;

    private delegate float SlopeCalculation (float ratio);
    private SlopeCalculation m_SlopeFunc;

    // Continued...

I won’t actually go over implementing the delay itself since it is very straightforward (it’s just a simple circular buffer delay line). The one thing I will say is that if you want the lookahead time to be modifiable in real time, the circular buffer needs to be initialized to a maximum length allowed by the lookahead time (in my case 200ms), and then you need to keep track of the actual time/position in the buffer that will move based on the current delay time.

The delay comes after the envelope is extracted from the audio signal and before the compressor gain is applied:

void OnAudioFilterRead (float[] data, int numChannels)
{
    // Calculate pre-gain & extract envelope
    // ...

    // Delay the incoming signal for lookahead.
    if (lookaheadTime > 0f) {
        m_LookaheadDelay.SetDelayTime(lookaheadTime, sampleRate);
        m_LookaheadDelay.Process(data, numChannels);
    }

    // Apply compressor gain
    // ...
}

That’s all there is to the lookahead, so now we turn our attention to the last feature. The knee of the compressor is the area around the threshold where compression kicks in. This can either be a hard knee (the compressor kicks in abruptly as soon as the threshold is reached) or a soft knee (compression is more gradual around the threshold, known as the knee width). Comparing the two in a plot illustrates the difference clearly.

Hard knee in black and soft knee in light blue (threshold is -24 dB).

Hard knee in black and soft knee in light blue (threshold is -24 dB).

There are two common ways of specifying the knee width. One is an absolute value in dB, and the other is as a factor of the threshold as a value between 0 and 1. The latter is one that I’ve found to be most common, so it will be what I use. In the diagram above, for example, the threshold is -24 dB, so a knee value of 1.0 results in a knee width of 24 dB around the threshold. Like the lookahead feature, a new field will be required:

public class Compressor : MonoBehaviour
{
    [AudioSlider("Threshold (dB)", -60f, 0f)]
    public float threshold = 0f;		// in dB
    [AudioSlider("Ratio (x:1)", 1f, 20f)]
    public float ratio = 1f;
    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;
    [AudioSlider("Pre-gain (dB)", -12f, 24f)]
    public float preGain = 0f;			// in dB, amplifies the audio signal prior to envelope detection.
    [AudioSlider("Post-gain (dB)", -12f, 24f)]
    public float postGain = 0f;			// in dB, amplifies the audio signal after compression.
    [AudioSlider("Attack time (ms)", 0f, 200f)]
    public float attackTime = 10f;		// in ms
    [AudioSlider("Release time (ms)", 10f, 3000f)]
    public float releaseTime = 50f;		// in ms
    [AudioSlider("Lookahead time (ms)", 0, 200f)]
    public float lookaheadTime = 0f;	// in ms

    public ProcessType processType = ProcessType.Compressor;
    public DetectionMode detectMode = DetectionMode.Peak;

    private EnvelopeDetector[] m_EnvelopeDetector;
    private Delay m_LookaheadDelay;

    private delegate float SlopeCalculation (float ratio);
    private SlopeCalculation m_SlopeFunc;

    // Continued...

At the start of our process block (OnAudioFilterRead()), we set up for a possible soft knee compression:

float kneeWidth = threshold * knee * -1f; // Threshold is in dB and will always be either 0 or negative, so * by -1 to make positive.
float lowerKneeBound = threshold - (kneeWidth / 2f);
float upperKneeBound = threshold + (kneeWidth / 2f);

Still in the processing block, we calculate the compressor slope as normal according to the equation from part 2:

slope = 1 – (1 / ratio), for compression

slope = 1, for limiting

To calculate the actual soft knee, I will use linear interpolation. First I check if the knee width is > 0 for a soft knee. If it is, the slope value is scaled by the linear interpolation factor if the envelope value is within the knee bounds:

slope *= ((envValue – lowerKneeBound) / kneeWidth) * 0.5

The compressor gain is then determined using the same equation as before, except instead of calculating in relation to the threshold, we use the lower knee bound:

gain = slope * (lowerKneeBound – envValue)

The rest of the calculation is the same:

for (int i = 0, j = 0; i < data.Length; i+=numChannels, ++j) {
    envValue = AudioUtil.Amp2dB(envelopeData[0][j]);
    slope = m_SlopeFunc(ratio);

    if (kneeWidth > 0f && envValue > lowerKneeBound && envValue < upperKneeBound) { // Soft knee
        // Lerp the compressor slope value.
        // Slope is multiplied by 0.5 since the gain is calculated in relation to the lower knee bound for soft knee.
        // Otherwise, the interpolation's peak will be reached at the threshold instead of at the upper knee bound.
        slope *= ( ((envValue - lowerKneeBound) / kneeWidth) * 0.5f );
        gain = slope * (lowerKneeBound - envValue);
    } else { // Hard knee
        gain = slope * (threshold - envValue);
        gain = Mathf.Min(0f, gain);
    }

    gain = AudioUtil.dB2Amp(gain);

    for (int chan = 0; chan < numChannels; ++chan) {
        data[i+chan] *= (gain * postGainAmp);
    }
}

In order to verify that the soft knee is calculated correctly, it is best to plot the results. To do this I just created a helper method that calculates the compressor values for a range of input values from -90 dB to 0 dB. Here is the plot of a compressor with a threshold of -12.5 dB, a 4:1 ratio, and a knee of 0.4:

Compressor with a threshold of -12.5 dB, 4:1 ratio, and knee of 0.4.

Compressor with a threshold of -12.5 dB, 4:1 ratio, and knee of 0.4.

Of course this also works when the compressor is in limiter mode, which will result in a gentler application of the limiting effect.

Compressor in limiter mode with a threshold of -18 dB, and knee of 0.6.

Compressor in limiter mode with a threshold of -18 dB, and knee of 0.6.

That concludes this series on building a compressor/limiter component.

Dynamics processing: Compressor/Limiter, part 1

Lately I’ve been busy developing an audio-focused game in Unity, whose built-in audio engine is notorious for being extremely basic and lacking in features. (As of this writing, Unity 5 has not yet been released, in which its entire built-in audio engine is being overhauled). For this project I have created all the DSP effects myself as script components, whose behavior is driven by Unity’s coroutines. In order to have slightly more control over the final mix of these elements, it became clear that I needed a compressor/limiter. This particular post is written with Unity/C# in mind, but the theory and code is easy enough to adapt to other uses. In this first part we’ll be looking at writing the envelope detector, which is needed by the compressor to do its job.

An envelope detector (also called a follower) extracts the amplitude envelope from an audio signal based on three parameters: an attack time, release time, and detection mode. The attack/release times are fairly straightforward, simply defining how quickly the detection responds to rising and falling amplitudes. There are typically two modes of calculating the envelope of a signal: by its peak value or its root mean square value. A signal’s peak value is just the instantaneous sample value while the root mean square is measured over a series of samples, and gives a more accurate account of the signal’s power. The root mean square is calculated as:

rms = sqrt ( (1/n) * (x12 + x22 + … + xn2) ),

where n is the number of data values. In other words, we sum together the squares of all the sample values in the buffer, find the average by dividing by n, and then taking the square root. In audio processing, however, we normally bound the sample size (n) to some fixed number (called windowing). This effectively means that we calculate the RMS value over the past n samples.

(As an aside, multiplying by 1/n effectively assigns equal weights to all the terms, making it a rectangular window. Other window equations can be used instead which would favor terms in the middle of the window. This results in even greater accuracy of the RMS value since brand new samples (or old ones at the end of the window) have less influence over the signal’s power.)

Now that we’ve seen the two modes of detecting a signal’s envelope, we can move on to look at the role of the attack/release times. These values are used in calculating coefficients for a first-order recursive filter (also called a leaky integrator) that processes the values we get from the audio buffer (through one of the two detection methods). Simply stated, we get the sample values from the audio signal and pass them through a low-pass filter to smooth out the envelope.

We calculate the coefficients using the time-constant equation:

g = e ^ ( -1 / (time * sample rate) ),

where time is in seconds, and sample rate in Hz. Once we have our gain coefficients for attack/release, we put them into our leaky integrator equation:

out = in + g * (out – in),

where in is the input sample we detected from the incoming audio, g is either the attack or release gain, and out is the envelope sample value. Here it is in code:

public void GetEnvelope (float[] audioData, out float[] envelope)
{
    envelope = new float[audioData.Length];

    m_Detector.Buffer = audioData;

    for (int i = 0; i < audioData.Length; ++i) {
        float envIn = m_Detector[i];

        if (m_EnvelopeSample < envIn) {
            m_EnvelopeSample = envIn + m_AttackGain * (m_EnvelopeSample - envIn);
        } else {
            m_EnvelopeSample = envIn + m_ReleaseGain * (m_EnvelopeSample - envIn);
        }

        envelope[i] = m_EnvelopeSample;
    }
}

(Source: code is based on “Envelope detector” from http://www.musicdsp.org/archive.php?classid=2#97, with detection modes added by me.)

The envelope sample is calculated based on whether the current audio sample is rising or falling, with the envIn sample resulting from one of the two detection modes. This is implemented similarly to what is known as a functor in C++. I prefer this method to having another branching structure inside the loop because among other things, it’s more extensible and results in cleaner code (as well as being modular). It could be implemented using delegates/function pointers, but the advantage of a functor is that it retains its own state, which is useful for the RMS calculation as we will see. Here is how the interface and classes are declared for the detection modes:

public interface IEnvelopeDetection
{
    float[] Buffer { set; get; }
    float this [int index] { get; }

    void Reset ();
}

We then have two classes that implement this interface, one for each mode:

A signal’s peak value is the instantaneous sample value while the root mean square is measured over a series of samples, and gives a more accurate account of the signal’s power.

public class DetectPeak : IEnvelopeDetection
{
    private float[] m_Buffer;

    /// <summary>
    /// Sets the buffer to extract envelope data from. The original buffer data is held by reference (not copied).
    /// </summary>
    public float[] Buffer
    {
        set { m_Buffer = value; }
        get { return m_Buffer; }
    }

    /// <summary>
    /// Returns the envelope data at the specified position in the buffer.
    /// </summary>
    public float this [int index]
    {
        get { return Mathf.Abs(m_Buffer[index]); }
    }

    public DetectPeak () {}
    public void Reset () {}
}

This particular class involves a rather trivial operation of just returning the absolute value of a signal’s sample. The RMS detection class is more involved.

/// <summary>
/// Calculates and returns the root mean square value of the buffer. A circular buffer is used to simplify the calculation, which avoids
/// the need to sum up all the terms in the window each time.
/// </summary>
public float this [int index]
{
    get {
        float sampleSquared = m_Buffer[index] * m_Buffer[index];
        float total = 0f;
        float rmsValue;

        if (m_Iter < m_RmsWindow.Length-1) {
            total = m_LastTotal + sampleSquared;
            rmsValue = Mathf.Sqrt((1f / (index+1)) * total);
        } else {
            total = m_LastTotal + sampleSquared - m_RmsWindow.Read();
            rmsValue = Mathf.Sqrt((1f / m_RmsWindow.Length) * total);
        }

        m_RmsWindow.Write(sampleSquared);
        m_LastTotal = total;
        m_Iter++;

        return rmsValue;
    }
}

public DetectRms ()
{
    m_Iter = 0;
    m_LastTotal = 0f;
    // Set a window length to an arbitrary 128 for now.
    m_RmsWindow = new RingBuffer<float>(128);
}

public void Reset ()
{
    m_Iter = 0;
    m_LastTotal = 0f;
    m_RmsWindow.Clear(0f);
}

The RMS calculation in this class is an optimization of the general equation I stated earlier. Instead of continually summing together all the  values in the window for each new sample, a ring buffer is used to save each new term. Since there is only ever 1 new term to include in the calculation, it can be simplified by storing all the squared sample values in the ring buffer and using it to subtract from our previous total. We are just left with a multiply and square root, instead of having to redundantly add together 128 terms (or however big n is). An iterator variable ensures that the state of the detector remains consistent across successive audio blocks.

In the envelope detector class, the detection mode is selected by assigning the corresponding class to the ivar:

public class EnvelopeDetector
{
    protected float m_AttackTime;
    protected float m_ReleaseTime;
    protected float m_AttackGain;
    protected float m_ReleaseGain;
    protected float m_SampleRate;
    protected float m_EnvelopeSample;

    protected DetectionMode m_DetectMode;
    protected IEnvelopeDetection m_Detector;

    // Continued...
public DetectionMode DetectMode
{
    get { return m_DetectMode; }
    set {
        switch(m_DetectMode) {
            case DetectionMode.Peak:
                m_Detector = new DetectPeak();
                break;

            case DetectionMode.Rms:
                m_Detector = new DetectRms();
                break;
        }
    }
}

Now that we’ve looked at extracting the envelope from an audio signal, we will look at using it to create a compressor/limiter component to be used in Unity. That will be upcoming in part 2.

The Making of a Plug-In: Part 5 (Beta & new features)

In this post I’m going to discuss two new features I have added to the Match Envelope plug-in that I’m pretty excited about.  And with some additional bug fixing, it’s in a good workable state, so I can offer up the plug-in as a Beta version.

The first of the new features I added is an option to invert the envelope, so instead of matching the source audio that you extract the envelope from, the resulting audio is opposite in shape.  Of course this can further be tweaked by use of the “match strength” and “gain” parameters.  The actual process of doing this is very simple; take the interpolated value (from cubic interpolation) ‘ival’ and subtract it from 1 (digital waveforms in floating point representation all have sample values between -1 to 1).  As simple as this was, it did introduce a bug that took a long time to find.

Occasionally I would get artefacting after applying the process when inverting the envelope, and I discovered that the resuling interpolated value was nan in these cases (nan = not a number).  When the source audio consisted of sharp attacks, and thus sharp rises in the waveform, the interpolated value exceeded 1 by a small amount.  Then, when it gets to this point in the formula for calculating match strength:

pow(ival, matchStr),

it will result in an imaginary number when matchStr is around 0.5.  In essence, this part of the formula ends up trying to square-root a negative number.  The easy fix for this was just to take the absolute value of ‘ival’ such that pow(fabs(ival), matchStr)) will then never result in nan.  This is a better solution than to just “floor” ival to 0 if it is negative because this will actually alter the audio slightly by messing with the interpolation.

The second feature I added is a “window offset” parameter, which shifts the envelope left or right.  Oddly enough, this one took less time to implement and required less testing/fixing than the invert feature even though it sounds more complex to code.  In fact, it’s pretty simple as well.  Similar to a circular buffer, instead of shifting the elements around in the buffers that contain the envelope data, I just offset the cursor that points to the location in the buffer.

If I want to shift the envelope to the left (the result will in effect anticipate the source audio’s envelope), the cursor needs to be offset by a positive number.  If I want to shift the envelope to the right (emulating a delay of the envelope), the cursor needs to be offset by a negative number.  This may seem a bit unintuitive at first, but we need to consider how the offset/placement of the cursor affects how the processing begins. i.e. if it is negatively offset, it will be delayed in processing the envelope data, thus shifting the envelope to the right.

Before moving on, here is a new video showcasing these new features, this time with some slightly more exciting audio for demonstration.

Lastly, in terms of bug fixing, I had to deal with an issue that arose in Soundforge due to the fact that it does not save the state of a plug-in once it’s invoked.  In other words, when a plug-in is opened in Audacity, it is not “destroyed” until the program is exited; it only switches to a suspend state while it’s not active.  This means that a plug-in’s constructor will only be called once, thus saving the state of parameters/variables between uses as long as the host isn’t closed.  Soundforge does not do this, however, and this caused problems with parameter and variable reinitialization.

Fortunately a fix was found, but it has further reaffirmed that VST is not really built for offline processing, and while we can certainly coax them into it, I’ve hit several limitations in terms of what I can accomplish within the bounds of the VST SDK as well as inadequate support for the offline capabilities it does provide by hosts.

As such, this might be one of the last entries in this particular making-of series on the Match Envelope plug-in.  I have learned a ton through this process, and through sticking it out when it became clear that this particular process is probably better suited to a standalone app or command-line program where I could have had much more control over things.

I look very much forward to developing my next plug-in, however, which will most assuredly be a real-time process of some kind, and I am excited to learn and to tackle the challenges that await!  Without further ado, here are links to the beta of the Match Envelope VST plug-in:

Match Envelope beta v0.9.2.23 — Mac

Match Envelope beta v0.9.2.23 — Win

Cheers!

The Making of a Plug-In: Part 4 (Alpha Version)

After all the work that’s gone into this plug-in, it’s pretty darn cool to see it up and running in a host program!  I’ve been messing around with it a bit in Audacity, just trying out different parameter setting and seeing it all come together.  There’s more work to be done in order to take care of some additional issues that may crop up, but for the most part I have a working alpha version of the Match Envelope plug-in.  I will likely graduate it to beta once I take care of these issues, which isn’t far off, but in the meantime I am providing the alpha version for anyone interested to give it a shot (links at the end of this post).  I also recorded a short little video demonstrating the basic functionality of the plug-in (again, these particular audio files were chosen to illustrate the effect visually):

One of the primary issues that I need to address is differing sample rates between the source and destination audio.  This is something I have yet to try out in code, but I expect some sort of scaling of certain parameters will solve the problem rather than forbidding mixed sample rates (which would be a bit of a pain).

If, for example, we choose a window size of 30ms with the source audio at a sample rate of 44.1kHz, it will contain 1323 samples per window.  If the destination audio’s sample rate is at 48kHz, however, it will contain 1440 samples per window.  There can thus quite easily be a mismatch in the number of windows in an envelope profile between the two audio streams as well as window boundaries not matching up, which will cause the audio files to not sync or match properly.

This brings up another related issue, evident in the UI.  The Envelope Extractor contains the parameter for window duration, but what if the user selects a different value for the source envelope and the destination?  Right now it results in an incorrect match.  Should this be forbidden, or perhaps even ignored by forcing the same window duration on both envelopes?  Or perhaps there is a way to turn it into a feature.  I am undecided on this at the moment and need to do some testing and exploring of various options before addressing this issue.

In addition, I will have to decide on whether implementing user-saved programs is of any use in this plug-in.  Currently this functionality is not supported, but I think it would be a good idea to include some obvious presets, and allow users to save their own.

Other than that, I am excited to share the first distributable version of this plug-in!  The Mac OS X version has been tested on 10.6.8 and 10.8 in Audacity, and the Windows version using Windows 7 in both Audacity and Soundforge.  If you’re interested, give it a try, and though you’re certainly not obligated to, feedback is welcome:

christian@christianfloisand.com

Edit: Beta version available in Downloads

The Making of a Plug-In: Part 3 (Solving the UI Issue)

I’m both happy and relieved that progress on making the Match Envelope plug-in is proving to be successful (so far, anyway)!  It’s up and running, albeit in skeleton form, on Audacity (both Mac and Windows) and Soundforge (Windows).  As I was expecting, it hasn’t been without it’s fair share of challenges, and one of the biggest has been dealing with the UI — how will the user interact with the plug-in efficiently with the inherent limitations involved in the interface?

The crux of the problem stems from the offline-only capability of the Match Envelope plug-in.  Similar to processes like normalization, where the entire audio buffer needs to be scanned to determine its peak before scanning it a second time to apply it, I need to scan the entire audio buffer (or at least as much as the user has selected) in order to get the envelope profile before then applying that onto a different audio buffer.

This part of the challenge I foresaw as I began development.  I knew of VST’s offline features, however, and I planned to explore these options that would solve some of the interface difficulties I knew I would encounter.  What I didn’t count on was that host programs widely do not support VST offline functions, and in fact, Steinberg has all but removed the example source code for offline plug-ins from the 2.4 SDK (I’m not currently up to speed on VST3 as of yet).  Thus I have been forced to use the normal VST SDK functions to handle my plug-in.

So here is the root cause of perhaps the main challenge I had to deal with: the host program that invokes the plug-in is responsible for sending the audio buffer in blocks to the processing function, which is the only place I have access to the audio stream.  The function prototype looks like this:

void processReplacing (float **inputs, float **outputs, VstInt32 sampleFrames)

inputs‘ contains the actual audio sample data that the host has sent to the plug-in, ‘outputs‘ is where, after processing, the plug-in places the modified audio, and ‘sampleFrames‘ is the number of samples (block size) in the audio sample data.  As I mentioned earlier, not only do I need to scan the audio buffer first to acquire the envelope profile, I need to divide the audio data into windows whose size is determined by the user.  It’s pretty obvious that the number of samples in the window size will not equal the number of samples in sampleFrames (at least not 99.99998% of the time), effectively complicating the implementation of this function three-fold.

How should I handle cases where the window size is less than sampleFrames?  More than sampleFrames?  More than double sampleFrames?  Complicating matters further is that different hosts will pass different block sizes in for sampleFrames, and there is no way to tell exactly what it will be until processReplacing() is invoked.  Here is the pseudocode I used to tackle this problem:

The code determines how many windows it can process in any given loop iteration of processReplacing() given sampleFrames and windowSize and storing leftover samples in a variable that is carried over into the next iteration.  Once the end of a window is reached, the values copied from the audio buffer (our source envelope) are averaged together, or its peak is found, whichever the user has specified, and that value is then stored in the envelope data buffer.  The reasoning behind handling large windows separately from small ones is to avoid a conditional test with every sample processed to determine if the end of the window is reached.

Once this part of the plug-in began to take shape, another problem cropped up.  The plug-in requires three steps (one is optional) taken by the user in order to use it:

  1. Acquire source envelope profile from an audio track,
  2. Acquire the destination audio’s envelope profile to use the match % parameter (optional),
  3. Select the audio to apply the envelope profile onto and process.

It became clear that, since I was not using VST offline capabilities, the plug-in would need to be opened and reopened 2 – 3 times in order to make this work.  This isn’t exactly ideal and wasn’t what I had in mind for the interface, but the upside is that its been a huge learning experience.  As such, I decided to split the Match Envelope plug-in into two halves: the Envelope Extractor, and the Envelope Matcher.

I felt this was a good way to go because it separated two distinct elements of the plug-in as well as clarifying which parameters belong with which process.  i.e. The match % or gain parameters have no effect on the actual extraction of the envelope profile, only during the processing onto the destination audio.  Myself, like many others I assume, like fiddling around with parameters and settings on plug-ins, and it can get very frustrating at times when/if they have no apparent effect, and this can create confusion and possibly thoughts of bad design towards the software.

To communicate between the two halves, I implemented a system of writing the envelope data extracted to a temporary binary file that is read by the Envelope Matcher half in order to process the envelope, and this has proven to work very well.  In debug mode I am writing a lot of data out to temporary debug files in order to monitor what the plug-in is doing and how all the calculations are being done.

From Envelope Extractor:

From Envelope Matcher:

Some of these non-ideal interface features I plan on tackling with a custom GUI, which offers much more flexibility than the extremely limited default UI.  Regardless, I’m excited that I’ve made it this far and am very close to having a working version of this plug-in up and running on at least two hosts so far (Adobe Audition also supports VST and as far as I know, offline processing, but I have not been able to test it as I don’t own it yet).

After this is done, I do plan on exploring other plug-in types to compare and contrast features and flexibility (AU, RTAS, etc.), and I may find a better solution for the interface. Of course, the plug-in could work as a standalone app where I have total control over the UI and functionality, but it would lack the benefit of doing processing right from within the host.

The Making of a Plug-In: Part 2

This entry in my making of a plug-in series will detail what went into finalizing the prototype program for the Match Envelope plug-in.  A prototype of this kind is usually a command-line program wherein much of the code is actually written to implement functionality and features, and then later transferred into a plug-in’s SDK (in my case, the VST SDK).  Plug-ins, by their very nature, are not self-executable programs and need a host to run, so it is more efficient to test the fundamental code structure within a command-line program.

Having now completed my prototype, I want to first share some of the things I improved upon as well as new features I implemented.  One of the main features of the plug-in that I mentioned in part 1 was the match % parameter.  This effectively lets you control how strongly the envelope you’re matching affects the audio, and rather than being a linear effect, it is proportional to the difference between the amplitude of the envelope and the amplitude of the audio.  Originally this was the formula I used (from part 1):

We could see that this mostly gave me the results I was after, but if we look closely at the resulting waveform, there is some asymmetry in comparing it to the original (look at the bottom of the waveforms).  One of the mistakes in this formula was in comparing the interpolated value ‘ival’ with the actual sample amplitude of the ‘buffer’.  To remedy this, I now also extract the envelope of the destination audio that we are applying the envelope on to (with the same window width used to extract the source envelope) and use this value to compare the difference with ‘ival’.  This ensures a more consistent and accurate comparison of amplitudes.

The other mistake in the original formula was to linearly affect ‘a‘, the alpha value that is input by the user in %, by the term that calculates the difference between the amplitudes. So while the a value does affect the resulting ival proportionally, a itself was not.  The final equation then, just became:

I use two strategies when deciding on an appropriate mathematical formula for what I need.  One is considering how I want a value to change over time, or over some range of values, and turning to a kind of equation that does that (i.e. should it be a linear change, exponential, logarithmic, cyclical, etc.).  This leads to the second method, and that is to use a graph to visualize the shape of change I am after; this leads to an equation that defines that graph.

Here is a quick graphic and audio to illustrate these changes using the same flute source as the envelope and triangle wave as its destination from part1:

Flute envelope applied with 80msec window size at 100% match

Shortly, we will be seeing some much more interesting musical examples of the plug-in at work.  But before that, we can see another feature at work above that I implemented since last time: junction smoothing.

In addition to specifying the length of the envelope, the user also specifies a value (in msec) to smooth the transition from the envelope match to the original, unmodified audio.  Longer values will obviously make the transition more gradual, while shorter makes it more abrupt.  The process of implementing this feature turned out to be reasonably simple.  This is the basic equation:

where ‘ival’ is the interpolated value and ‘jpos’ is the current position within the bounds of the junction smoothing specified by the user.  ‘jpos’ starts at 0, and once the smoothing begins, it increments (within a normalized value) until it hits 1 at the end of the smoothing.  The larger ‘jpos’ gets, the less of the actual interpolated value we end up with in our ‘jval’ result, which is used to scale the audio buffer (just as ‘ival’ does outside of junction smoothing).  In other words, when ‘jpos’ hits 1, ‘jval’ will be 1 and so we multiply our audio buffer by 1; then we have reached the end of our process and the original audio continues on unmodified.

Before we move on, here is a musical example.  This very famous opening of Debussy’s “Prelude to the Afternoon of a Faun” seemed like a good excerpt to test my plug-in on.  This is the original audio (the opening is very very soft as most classical recordings are of quiet moments to preserve dynamic range, so I had to amplify it which is why there is some audible low noise):

Opening of “Prelude to the Afternoon of a Faun”

Using this as the source envelope, I used a window size of 250msec at 90% match to apply on to this flute line that I recorded in Logic, doubling the original from the audio above (the flute sample is from Vienna Special Edition).

Unmodified flute doubling of the Prelude opening

The result:

Flute doubling after Match Envelope

And here is the result mixed with the original audio:

Doubled flute line mixed with original audio

Junction smoothing was of course applied during the process to let the flute line fade out as in its original incarnation.  Without smoothing it would have abruptly cut off.  This gives us a seamless transition from the end of the flute solo into the orchestral answer.

It was very important in this example to specify a fairly large window duration, because we don’t want to capture the tremolo of the original flute solo as this would fight against the tremolo of the sampled recording that we are applying the envelope on to.  We can hear a little bit of this in places even with a 250msec window, so this will be something I intend to test further to see how this may be avoided or at least minimized.

This brings me to the other major challenge I faced in developing this plug-in since part 1: stereo handling.  Dealing with stereo files isn’t complicated in itself, but there were a few complexities I encountered along the way specific to how I wanted the plug-in to behave. Instead of only allowing a 1-to-1 correspondence (i.e. only supporting mono to mono, or stereo to stereo), I decided to allow for the two additional situations of mono to stereo and stereo to mono.

The first two cases are easy enough to deal with, but what should happen if the source envelope is mono and the destination audio is stereo, and vice versa?  I decided to allocate 2-dimensional arrays for the envelope buffers to hold the mono/stereo amplitude data and then use bitwise flags to store the states of each envelope:

This saves on having multiple variables representing channel states for each envelope, so I only have to pass around one variable that contains all of this information that is then parsed in the appropriate places to retrieve this information.

As we can see, only one variable (‘envFlags’) is used in the extraction of the source envelope, and we can find out mono/stereo information by using bitwise AND with the corresponding enum definition of the flag we’re after.  Furthermore, in the case of the source envelope being stereo but destination audio being mono, I combine the amplitude data of the two channels into one, according to either average or peak extraction method (also specified by user).

The difficulty in implementing this feature wasn’t so much in how to get it done, but how to get it done more efficiently, without a massive number of parameters passing around and a whole lot of conditional statements within the main processing loop to determine how many channels each envelope has.  We can see some of this at work within the main process:

I use another variable (‘stereo_src’) that extracts some information from the bitwise flags to take care of the case where the source envelope is mono but destination audio is stereo.  Since the loop covers the channels of the destination audio (in interleaved format), I needed a way to restrict out of bounds indexing of the source envelope.  If the source envelope is mono, ‘stereo_src’ will be 0, so the indexing of it will not exceed its limit.  If both envelopes are stereo, ‘stereo_src’ will be 1, so it will effectively “follow” the same indexing as the destination audio.

For the next, and last, musical example of this entry, we change things up a whole lot.  I’m going to show the application of this plug-in to electronic dance music.  This match envelope plug-in can emulate, or function as a kind of side chain compressor, which is quite commonly found in EDM.  Here is a simple kick drum pattern and a synth patch that goes on top:

Kick drum pattern

Synth pattern

By applying the match envelope plug-in to the synth pattern using the kick drum pattern above as the source envelope (window size of 100msec and 65% match), we achieve the kind of pumping pattern in the synth so characteristic of this kind of music.  The result, and mix, are as follows:

Match Envelope applied to synth pattern

Modified synth pattern mixed with percussion and bass

This part has really covered the preliminary features of what I’m planning to include in the Match Envelope plug-in.  As I stated at the start, the next step is to transfer the code into the VST SDK (that’ll be part 3), but this also comes with its share of considerations and complications, mainly dealing with UI.  How should this appear to the user?  How do you neatly package it all together to make it easy and efficient to use?  How should be parameters be presented so that they are intuitive?

Most all plug-ins/hosts offer up a default UI, which is what I’ll be working with initially, but eventually a nice graphic custom GUI will be needed (part 4? 5? 42?).

The Making of a Plug-In: Part 1

Well, it’s finally time to do something useful with all this stuff.  Not that command-line programs aren’t useful, but they have their limitations — especially these days.  Making a plug-in is a great way to apply all the things I’ve been doing, which really culminates into making a deliverable product that has a use.  I have thus decided to make a Match Envelope plug-in, inspired by a suggestion from my good friend Igor (thanks man!).  This first part of the “making of” blog will cover some of the initial conception and development of the prototype program as well as introducing some of the planned features and parameters of the plug-in.  Focus will not be on actual C++ code at this point, but more on the math and the concepts behind it.

This Match Envelope plug-in will be similar in many ways to an Envelope Follower.  It extracts the envelope from a source audio file and applies it to a destination audio file.  Envelope followers tend to be more geared towards MIDI and there are not too many (to my knowledge) stand-alone plug-ins that give you what you need.  They can also be found as features in filters and other kinds of plug-ins.

The usefulness of them can vary quite a bit as we will see in more detail.  Commonly we see Envelope Followers used to sync up a sound to a drum loop for example.  Another benefit of using this plug-in involves layering sounds together.  If a seamless blend is desired, the envelopes of the different layers must match fairly close or else we will hear the distinct layers.  With a mix of more percussive sounds with sustained ones, an envelope matcher can be quite useful.

Furthermore, Match Envelope can be used to approximate the attack or release of instruments in an orchestra that have been recorded.  Let’s say you wish to double the flute line, or even the bass line with something; the Match Envelope plug-in can assist in blending the two layers together.  As will be discussed below, there will be parameters and features to control the effect because there are situations where we certainly don’t want a “lifeless” 100% match.

In addition to its uses in music, it can also be used in sound design, where layers upon layers of different sound sources are often combined to great effect.  Given some of the cool and varied applications of this plug-in, and considering that there are not a great many of them out there, I felt this was an exciting project to take on!

So how does it work and where do we start?  To extract the envelope shape from the source audio, we use windows of a certain size that will either take the peak amplitude or the average amplitude within that window and store it in a buffer.  We then take those amplitude values and apply them onto the destination audio, effectively recreating its amplitude shape.  Fairly straightforward in concept, but there’s more to it than that.

If we just apply the extracted envelope values to the audio, we’ll get a staircase.  Thus we turn to interpolation.  At first I went with linear interpolation, given by the formula

where we want y(x) with position x between points (x0, y0) and (x1, y1), and this resulted in a fairly good and accurate match.  However, we need a better quality interpolation; one that will give us a smoother, more accurate curve that will result in better quality audio.  For this we turn to cubic interpolation.  The cubic equation will look familiar to most;

but as it has four unknowns (the coefficients a, b, c, d) we need four points in order to interpolate or solve it (remember that from math class?).  Solving this equation isn’t terribly complicated; we take the derivative of y(x) and then solve both equations for x = 0 and x = 1.  This assumes that the distance between successive x values is 1.  For a more detailed explanation of how to solve this equation for the coefficients go here: http://www.paulinternet.nl/?page=bicubic

There was one problem I encountered after implementing this, however, that is worth mentioning.  My window sizes were not of the unit value (1) and even moreso, can be changed by the user, so I did not have a constant for the distance between x values.  The solution is pretty simple (almost too much so as I was heavily focused on workarounds that were far too complex).  Basically we just scale the x increment value, which keeps track of our position, by the inverse of the window duration.

Let’s say we set our window duration to 20 milliseconds.  That gives us our window size of 882 samples (assuming a sample rate of 44.1kHz).  Our x increment value is then 2.268×10-5 (0.02 / 882) based on a window size of 0.02.  But we need a window size that is effectively 1, so 1 / 0.02 is 50, and this is our scaling factor.  The increment value is now 1.1334×10-3.

One final detail that caused a small issue was rounding error.  At the boundaries of some windows, I would end up with an x value of 3.999999 or similar.  This caused sample error that did not sound very good, but the solution was a simple matter of adding a very small value to the x position at the end of each window loop (0.0000001 for instance).  Some additional testing with varying window sizes will be done to make sure no more sample/rounding error occurs.

While on the topic of windows, their function is to affect the smoothness of the extracted envelope as well as its accuracy.  Smaller window sizes will result in a closer match to the source audio’s envelope, and larger ones will be more of an approximation.  Before moving on, let’s get a visual of how I’m testing out the functionality of the plug-in.

To really get a good idea of how the program is working, using something simple like a triangle wave is great for visualizing the outcome.

As I mentioned previously, there are two ways of taking the amplitude within each window: taking the maximum (peak) value, or taking the average.  Just below we see the difference between them, and it is the intention at this point to have this as an option for which to use.

Here we can see a number of things already at work.  The difference between peak and average amplitude is not very much.  But in this case, the source audio is quite smooth and the destination waveform is completely constant.  A following example will demonstrate the difference between these better.  We also see the effects of different window sizes in the visual above.  A very large window size (perhaps around 500 ms to 1 s) would retain much more of the shape of the destination audio and might be more useful for longer sustained sounds, while a shorter window length is better for capturing percussive sounds.  The example below uses a different audio as the source, with more percussive attacks.

There is a clear difference between peak and average, at least visually.  We will hear at the end of this post that they don’t differ a huge amount to our ears (at least not with these examples).

Before I wrap up this post, I want to discuss two additional features I’m planning at this point.  One is just a simple gain, that scales the result of the envelope match.  An extension of this feature will be to add an option where the user can specify that the amplitude at the end of the envelope will match the amplitude of the rest of the audio (i.e. at the junction point where the envelope ends, and the rest of the audio continues unmodified).  This would be useful if the user just wants to match a section from a source audio file.

Secondly, I have implemented a parameter called “match strength %” that will control how strongly the envelope will modify the destination audio.  The cool thing about this is that it’s proportional to the difference in amplitude between the envelope and the destination audio.  To accomplish this, I scale the interpolated value, ival, by the equation

where buffer is the amplitude of the destination audio and a is a value in % specified by the user.  We can see it visually in the image below.

The bigger the distance between the two amplitudes, the stronger the envelope affects the audio.  This ensures that the general shape of the envelope remains, while retaining more of the original shape in relation to the specified % by the user.  The middle waveform shows a gain factor of 1.6, but this was incorrectly implemented, as the gain modifies the interpolated value, ival, instead of the result of ival * buffer.  As it is in the image above, gain is also proportional, but I intend it to be a linear effect.

Here are a couple of examples of how this process sounds at this stage.  These audio samples use just the triangle wave that I have been using to test the program layered on top of the audio I extracted the envelope from.  Listen to how the triangle wave follows the shape of the audio and see if you can hear a difference between the peak version and the average version (the peak version has slightly stronger attacks from the triangle wave).

Peak windowing: layered triangle wave over source audio

Average windowing: layered triangle wave over source audio

This has been a long and wordy post, so it’s time to wrap up.  I’m pleased so far with the functionality and behavior of the plug-in and the parameters and features I am planning on implementing.  It should provide a good amount of flexibility to shape a given waveform/audio file from an extracted envelope source, and should have some fun and productive uses.