Dynamics processing: Compressor/Limiter, part 1

Lately I’ve been busy developing an audio-focused game in Unity, whose built-in audio engine is notorious for being extremely basic and lacking in features. (As of this writing, Unity 5, which overhauls the entire built-in audio engine, has not yet been released.) For this project I have created all the DSP effects myself as script components, with their behavior driven by Unity’s coroutines. In order to have more control over the final mix of these elements, it became clear that I needed a compressor/limiter. This particular post is written with Unity/C# in mind, but the theory and code are easy enough to adapt to other uses. In this first part we’ll look at writing the envelope detector, which the compressor needs to do its job.

An envelope detector (also called a follower) extracts the amplitude envelope from an audio signal based on three parameters: an attack time, release time, and detection mode. The attack/release times are fairly straightforward, simply defining how quickly the detection responds to rising and falling amplitudes. There are typically two modes of calculating the envelope of a signal: by its peak value or its root mean square value. A signal’s peak value is just the instantaneous sample value while the root mean square is measured over a series of samples, and gives a more accurate account of the signal’s power. The root mean square is calculated as:

rms = sqrt( (1/n) * (x_1^2 + x_2^2 + … + x_n^2) ),

where n is the number of sample values. In other words, we sum the squares of all the sample values in the buffer, divide by n to get the average, and then take the square root. For example, for the four samples {0.5, -0.5, 0.5, -0.5}, rms = sqrt((0.25 + 0.25 + 0.25 + 0.25) / 4) = 0.5. In audio processing, however, we normally bound the sample size n to some fixed number (called windowing). This effectively means that we calculate the RMS value over the past n samples.

(As an aside, multiplying by 1/n effectively assigns equal weights to all the terms, making it a rectangular window. Other window equations can be used instead which would favor terms in the middle of the window. This results in even greater accuracy of the RMS value since brand new samples (or old ones at the end of the window) have less influence over the signal’s power.)

Now that we’ve seen the two modes of detecting a signal’s envelope, we can move on to look at the role of the attack/release times. These values are used in calculating coefficients for a first-order recursive filter (also called a leaky integrator) that processes the values we get from the audio buffer (through one of the two detection methods). Simply stated, we get the sample values from the audio signal and pass them through a low-pass filter to smooth out the envelope.

We calculate the coefficients using the time-constant equation:

g = e ^ ( -1 / (time * sample rate) ),

where time is in seconds, and sample rate in Hz. Once we have our gain coefficients for attack/release, we put them into our leaky integrator equation:

out = in + g * (out – in),

where in is the input sample we detected from the incoming audio, g is either the attack or release gain, and out is the envelope sample value. Here it is in code:

public void GetEnvelope (float[] audioData, out float[] envelope)
{
    envelope = new float[audioData.Length];

    m_Detector.Buffer = audioData;

    for (int i = 0; i < audioData.Length; ++i) {
        float envIn = m_Detector[i];

        if (m_EnvelopeSample < envIn) {
            m_EnvelopeSample = envIn + m_AttackGain * (m_EnvelopeSample - envIn);
        } else {
            m_EnvelopeSample = envIn + m_ReleaseGain * (m_EnvelopeSample - envIn);
        }

        envelope[i] = m_EnvelopeSample;
    }
}

(Source: code is based on “Envelope detector” from http://www.musicdsp.org/archive.php?classid=2#97, with detection modes added by me.)
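One thing the snippet above doesn’t show is how m_AttackGain and m_ReleaseGain are computed. A minimal sketch using the time-constant equation from earlier might look like this (the method itself is my addition; only the field names match the EnvelopeDetector class shown further down):

protected void CalculateGains ()
{
    // g = e^(-1 / (time * sampleRate)), with time in seconds and sample rate in Hz.
    // A time of 0 gives an instantaneous response (no smoothing in that direction).
    m_AttackGain = (m_AttackTime > 0f) ? Mathf.Exp(-1f / (m_AttackTime * m_SampleRate)) : 0f;
    m_ReleaseGain = (m_ReleaseTime > 0f) ? Mathf.Exp(-1f / (m_ReleaseTime * m_SampleRate)) : 0f;
}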

The envelope sample is calculated based on whether the current audio sample is rising or falling, with the envIn sample coming from one of the two detection modes. The detection modes are implemented similarly to what is known as a functor in C++. I prefer this approach to having another branching structure inside the loop because, among other things, it’s more extensible and modular and results in cleaner code. It could be implemented using delegates/function pointers, but the advantage of a functor is that it retains its own state, which is useful for the RMS calculation as we will see. Here is how the interface for the detection modes is declared:

public interface IEnvelopeDetection
{
    float[] Buffer { set; get; }
    float this [int index] { get; }

    void Reset ();
}

We then have two classes that implement this interface, one for each mode:

public class DetectPeak : IEnvelopeDetection
{
    private float[] m_Buffer;

    /// <summary>
    /// Sets the buffer to extract envelope data from. The original buffer data is held by reference (not copied).
    /// </summary>
    public float[] Buffer
    {
        set { m_Buffer = value; }
        get { return m_Buffer; }
    }

    /// <summary>
    /// Returns the envelope data at the specified position in the buffer.
    /// </summary>
    public float this [int index]
    {
        get { return Mathf.Abs(m_Buffer[index]); }
    }

    public DetectPeak () {}
    public void Reset () {}
}

This particular class involves a rather trivial operation of just returning the absolute value of a signal’s sample. The RMS detection class is more involved.

public class DetectRms : IEnvelopeDetection
{
    private float[] m_Buffer;
    private RingBuffer<float> m_RmsWindow;  // Circular buffer holding the squared samples in the window.
    private float m_LastTotal;              // Running total of the squared samples currently in the window.
    private int m_Iter;                     // Number of samples processed so far (persists across audio blocks).

    /// <summary>
    /// Sets the buffer to extract envelope data from. The original buffer data is held by reference (not copied).
    /// </summary>
    public float[] Buffer
    {
        set { m_Buffer = value; }
        get { return m_Buffer; }
    }

    /// <summary>
    /// Calculates and returns the root mean square value of the buffer. A circular buffer is used to simplify the calculation, which avoids
    /// the need to sum up all the terms in the window each time.
    /// </summary>
    public float this [int index]
    {
        get {
            float sampleSquared = m_Buffer[index] * m_Buffer[index];
            float total = 0f;
            float rmsValue;

            if (m_Iter < m_RmsWindow.Length-1) {
                // Window not yet full: average over the samples accumulated so far.
                total = m_LastTotal + sampleSquared;
                rmsValue = Mathf.Sqrt((1f / (index+1)) * total);
            } else {
                // Window full: add the newest squared sample and subtract the oldest one from the running total.
                total = m_LastTotal + sampleSquared - m_RmsWindow.Read();
                rmsValue = Mathf.Sqrt((1f / m_RmsWindow.Length) * total);
            }

            m_RmsWindow.Write(sampleSquared);
            m_LastTotal = total;
            m_Iter++;

            return rmsValue;
        }
    }

    public DetectRms ()
    {
        m_Iter = 0;
        m_LastTotal = 0f;
        // Set a window length to an arbitrary 128 for now.
        m_RmsWindow = new RingBuffer<float>(128);
    }

    public void Reset ()
    {
        m_Iter = 0;
        m_LastTotal = 0f;
        m_RmsWindow.Clear(0f);
    }
}

The RMS calculation in this class is an optimization of the general equation stated earlier. Instead of summing all the values in the window from scratch for each new sample, a ring buffer stores every squared sample. Since only one term enters (and one leaves) the window per sample, we just add the new squared sample to the previous running total and subtract the oldest one read from the ring buffer. That leaves a single multiply and square root, instead of redundantly adding together 128 terms (or however big n is). An iterator variable ensures that the state of the detector remains consistent across successive audio blocks.

In the envelope detector class, the detection mode is selected by assigning the corresponding class to the ivar:

public class EnvelopeDetector
{
    protected float m_AttackTime;
    protected float m_ReleaseTime;
    protected float m_AttackGain;
    protected float m_ReleaseGain;
    protected float m_SampleRate;
    protected float m_EnvelopeSample;

    protected DetectionMode m_DetectMode;
    protected IEnvelopeDetection m_Detector;

    // Continued...
public DetectionMode DetectMode
{
    get { return m_DetectMode; }
    set {
        // Store the new mode and swap in the matching detection object.
        m_DetectMode = value;

        switch (m_DetectMode) {
            case DetectionMode.Peak:
                m_Detector = new DetectPeak();
                break;

            case DetectionMode.Rms:
                m_Detector = new DetectRms();
                break;
        }
    }
}
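Putting it together, using the detector from a compressor script might look roughly like this (the construction details are assumptions on my part; the actual class may expose them differently):

EnvelopeDetector detector = new EnvelopeDetector();   // hypothetical default constructor
detector.DetectMode = DetectionMode.Rms;
// Attack/release times (and thus m_AttackGain/m_ReleaseGain) would also be set here,
// e.g. a 10 ms attack and 100 ms release at the project's sample rate.

float[] envelope;
detector.GetEnvelope(audioBuffer, out envelope);      // audioBuffer: the current block of samples
// envelope[i] now holds the smoothed amplitude for sample i, ready for the compressor's gain stage.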

Now that we’ve looked at extracting the envelope from an audio signal, we will look at using it to create a compressor/limiter component to be used in Unity. That will be upcoming in part 2.

Beat Synchronization in Unity

Update

Due to some valuable advice (courtesy of Tazman-audio), I’ve made a few small changes that ensure that synchronization stays independent of framerate. My original strategy for handling this issue was to grab the current sample of the audio source’s playback and compare that to the next expected beat’s sample value (discussed in more detail below). Although this was working fine, Unity’s documentation makes little mention of how accurate this value is, aside from it being preferable to using Time.time. Furthermore, the initial sync between the start of audio playback and the BeatCheck function suffered from a small discrepancy.

Here is the change to the Start method in the “BeatSynchronizer” script that enforces synching with the start of the audio:

public float bpm = 120f; // Tempo in beats per minute of the audio clip.
public float startDelay = 1f; // Number of seconds to delay the start of audio playback.
public delegate void AudioStartAction(double syncTime);
public static event AudioStartAction OnAudioStart;

void Start ()
{
    double initTime = AudioSettings.dspTime;
    audio.PlayScheduled(initTime + startDelay);
    if (OnAudioStart != null) {
        OnAudioStart(initTime + startDelay);
    }
}

The PlayScheduled method starts the audio clip’s playback at the absolute time (on the audio system’s dsp timeline) given in the function argument. The correct start time is then this initial value plus the given delay. This same value is then broadcast to all the beat counters that have subscribed to the AudioStartAction event, which ensures their alignment with the audio.
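On the receiving side, a beat counter’s subscription also needs to accept the sync time. Something along these lines works (a sketch; the actual BeatCounter code in the project may differ slightly):

void OnEnable ()
{
    BeatSynchronizer.OnAudioStart += (syncTime) => {
        // Align the first expected beat with the scheduled start of playback (assumed).
        nextBeatSample = (float)syncTime * audioSource.clip.frequency;
        StartCoroutine(BeatCheck());
    };
}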

This necessitated a small change to the BeatCheck method as well, as can be seen below. The current sample is now calculated using the audio system’s dsp time instead of the clip’s sample position, which also alleviated the need to wrap the current sample position when the audio clip loops.

IEnumerator BeatCheck ()
{
    while (audioSource.isPlaying) {
        currentSample = (float)AudioSettings.dspTime * audioSource.clip.frequency;

        if (currentSample >= (nextBeatSample + sampleOffset)) {
            foreach (GameObject obj in observers) {
                obj.GetComponent<BeatObserver>().BeatNotify(beatType);
            }
            nextBeatSample += samplePeriod;
        }

        yield return new WaitForSeconds(loopTime / 1000f);
    }
}

Lastly, I decided to add a nice feature to the beat synchronizer that allows you to scale up the beat values by an integer constant. This is very useful for cases where you might want to synch to beats that transcend one measure. For example, you could synchronize to the downbeat of the second measure of a four-measure group by selecting the following values in the inspector:

Scaling up the beat values by a factor of 4 treats each beat as a measure instead of a single beat (assuming 4/4 time).

This same feature exists for the pattern counter as well, allowing a great deal of flexibility and control over what you can synchronize to. There is a new example scene in the project demonstrating this.

Github project here.

I did, however, come across a possible bug in the PlayScheduled function: a short burst of noise can be heard occasionally when running a scene. I’ve encountered this both in the Unity editor (version 4.3.3) and in the build product. This does not happen when starting the audio using Play or checking “Play On Awake”.

Original Post

Lately I’ve been experimenting and brainstorming different ways in which audio can be tied in with gameplay, or even drive gameplay to some extent. This is quite challenging because audio/music is so abstract, but rhythm is one element that has been successfully incorporated into gameplay for some time.  To experiment with this in Unity, I wrote a set of scripts that handle beat synchronization to an audio clip.  The github project can be found here.

The way I set this up to work is by comparing the current sample of the audio data to the sample of the next expected beat to occur.  Another approach would be to compare the time values, but this is less accurate and less flexible.  Sample accuracy ensures that the game logic follows the actual audio data, and avoids the issues of framerate drops that can affect the time values.

The following script handles the synchronization of all the beat counters to the start of audio playback:

public float bpm = 120f; // Tempo in beats per minute of the audio clip.
public float startDelay = 1f; // Number of seconds to delay the start of audio playback.
public delegate void AudioStartAction();
public static event AudioStartAction OnAudioStart;

void Start ()
{
    StartCoroutine(StartAudio());
}

IEnumerator StartAudio ()
{
    yield return new WaitForSeconds(startDelay);

    audio.Play();

    if (OnAudioStart != null) {
        OnAudioStart();
    }
}

To accomplish this, each beat counter instance adds itself to the event OnAudioStart, seen here in the “BeatCounter” script:

void OnEnable ()
{
    BeatSynchronizer.OnAudioStart += () => { StartCoroutine(BeatCheck()); };
}

When OnAudioStart is called above, all beat counters that have subscribed to this event are notified, and in this case each one starts the coroutine BeatCheck, which contains most of the logic for determining when beats occur. (The () => {} statement is C#’s lambda syntax.)

The BeatCheck coroutine runs at a specific frequency given by loopTime, instead of running each frame in the game loop. For example, if a high degree of accuracy isn’t required, this can save on the CPU load by setting the coroutine to run every 40 or 50 milliseconds instead of the 10 – 15 milliseconds that it may take for each frame to execute in the game loop.  However, since the coroutine yields to WaitForSeconds (see below), setting the loop time to 0 will effectively cause the coroutine to run as frequently as the game loop since execution of the coroutine in this case happens right after Unity’s Update method.

IEnumerator BeatCheck ()
{
    while (audioSource.isPlaying) {
        currentSample = audioSource.timeSamples;

        // Reset next beat sample when audio clip wraps.
        if (currentSample < previousSample) {
            nextBeatSample = 0f;
        }

        if (currentSample >= (nextBeatSample + sampleOffset)) {
            foreach (GameObject obj in observers) {
                obj.GetComponent<BeatObserver>().BeatNotify(beatType);
            }
            nextBeatSample += samplePeriod;
        }
        
        previousSample = currentSample;

        yield return new WaitForSeconds(loopTime / 1000f);
    }
}

Furthermore, the fields that track the current and next sample positions are declared as floats, which may seem wrong at first since there is no such thing as a fractional sample. However, the sample period (the number of samples between each beat in the audio) is calculated from the BPM of the audio and the note value of the beat to check, so it is likely to result in a floating point value. In other words:

samplePeriod = (60 / (bpm * beatValue)) * sampleRate

where beatValue is a constant that defines the ratio of the beat to a quarter note.  For instance, for an eighth beat, beatValue = 2 since there are two eighths in a quarter.  For a dotted quarter beat, beatValue = 1 / 1.5; the ratio of one quarter to a dotted quarter.

If samplePeriod is truncated to an int, drift would occur due to loss of precision when comparing the sample values, especially for longer clips of music.
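As a quick sanity check, here is the calculation with some concrete numbers (variable names assumed):

float samplePeriod = (60f / (bpm * beatValue)) * sampleRate;

// At 120 BPM and a 44100 Hz clip:
//   quarter notes   (beatValue = 1):       22050 samples per beat
//   eighth notes    (beatValue = 2):       11025 samples per beat
//   dotted quarters (beatValue = 1/1.5f):  33075 samples per beat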

When it is determined that a beat has occurred in the audio, the script notifies its observers along with the type of beat that triggered the event (the beat type is a user-defined value that allows different actions to be taken depending on the type of beat). The observers (any Unity object) are easily added through the script’s inspector panel:

The beat counter’s inspector panel.

Each object observing a beat counter also contains a beat observer script that serves two functions: it allows control over the tolerance/sensitivity of the beat event, and it sets the corresponding bit in a bit mask for the beat that just occurred, which the user can poll in the object’s own script to take appropriate action.

public void BeatNotify (BeatType beatType)
{
    beatMask |= beatType;
    StartCoroutine(WaitOnBeat(beatType));
}

IEnumerator WaitOnBeat (BeatType beatType)
{
    yield return new WaitForSeconds(beatWindow / 1000f);
    beatMask ^= beatType;
}

To illustrate how a game object might respond to and take action when a beat occurs, the following script activates an animation trigger on the down-beat and rotates the object during an up-beat by 45 degrees:

void Update ()
{
    if ((beatObserver.beatMask & BeatType.DownBeat) == BeatType.DownBeat) {
        anim.SetTrigger("DownBeatTrigger");
    }
    if ((beatObserver.beatMask & BeatType.UpBeat) == BeatType.UpBeat) {
        transform.Rotate(Vector3.forward, 45f);
    }
}

Finally, here is a short video demonstrating the example scene set up in the project:

Beat Synchronization in Unity Demo from Christian on Vimeo.

AdVerb: Building a Reverb Plug-In Using Modulating Comb Filters

Some time ago, I began exploring the early reverb algorithms of Schroeder and Moorer, whose work dates back to the 1960s and 70s respectively. Their designs and theories still inform the making of algorithmic reverbs today. Recently I took it upon myself to continue experimenting with the Moorer design I left off with in an earlier post. This resulted in the complete reverb plug-in “AdVerb”, which is available for free in downloads. Let me share what went into designing and implementing this effect.

One of the foremost challenges in basing a reverb design on Schroeder or Moorer is that it tends to sound a little metallic: with the number of comb filters they suggest, the echo density doesn’t build up quickly or densely enough. The series of all-pass filters that comes after the comb filter section helps to diffuse the reverb tail, but I found that the delaying all-pass filters added a little metallic sound of their own. One obvious way of overcoming this is to add more comb filters (today’s computers can certainly handle it). More importantly, however, the delay times of the comb filters need to be mutually prime so that their frequency responses don’t overlap, which would result in increased beating in the reverb tail.

To arrive at the values for the 8 comb filters I’m using, I wrote a simple little script that calculated the greatest common divisor between all the delay times I chose and made sure that each result was 1. This required a bit of tweaking of the numbers; as you can imagine, finding 8 mutually prime delay times is not as easy as it sounds, especially when trying to keep the range between them small. It’s not as important for the two all-pass filters to be mutually prime because they are in series, not in parallel like the comb filters.
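The check itself is nothing fancy; something along these lines does the job (a sketch, not the actual script I used):

static int Gcd (int a, int b)
{
    while (b != 0) {
        int t = b;
        b = a % b;
        a = t;
    }
    return a;
}

// True if every pair of delay times (in samples) has a GCD of 1, i.e. they are mutually prime.
static bool AllMutuallyPrime (int[] delayTimes)
{
    for (int i = 0; i < delayTimes.Length; ++i)
        for (int j = i + 1; j < delayTimes.Length; ++j)
            if (Gcd(delayTimes[i], delayTimes[j]) != 1)
                return false;
    return true;
}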

I also discovered, after a number of tests, that the tap delay used to generate the early reflections (based on Moorer’s design) was causing some problems in my sound. I’m still a bit unsure as to why (it could be poorly chosen tap delay times or something to do with mixing), but it was enough of a problem that I decided to discard the tap delay network and just focus on comb filters and all-pass filters. It was then that I took an idea from Dattorro and Frenette, who both showed how modulated comb/all-pass filters can help smear the echo density and add warmth to the reverb. Dattorro is responsible for the well-known plate reverbs that use modulating all-pass filters in series.

The idea behind a modulated delay line is that some oscillator (usually a low-frequency sine wave) modulates the delay value according to a frequency rate and amplitude.  This is actually the basis for chorusing and flanging effects.  In a reverb, however, the values need to be kept very small so that the chorusing effect will not be evident.

I had fun experimenting with these modulated delay lines, and so I eventually decided to modulate one of the all-pass filters as well and give control of it to the user, which offers a great deal more fun and crazy ways to use this plug-in.  Let’s take a look at the modulated all-pass filter (the modulated comb filter is very similar).  We already know what an all-pass filter looks like, so here’s just the modulated delay line:

Modulated all-pass filter.

The oscillator modulates the value currently in the delay line that we then use to interpolate, resulting in the actual value.  In code it looks like this:

double offset, read_offset, fraction, next;
size_t read_pos;

offset = (delay_length / 2.) * (1. + sin(phase) * depth);
phase += phase_incr;
if (phase > TWO_PI) phase -= TWO_PI;
if (offset > delay_length) offset = delay_length;

read_offset = ((size_t)delay_buffer->p - (size_t)delay_buffer->p_head) / sizeof(double) - offset;
if (read_offset < 0) {
    read_offset = read_offset + delay_length;
} else if (read_offset > delay_length) {
    read_offset = read_offset - delay_length;
}

read_pos = (size_t)read_offset;
fraction = read_offset - read_pos;
if (read_pos != delay_length - 1) {
    next = *(delay_buffer->p_head + read_pos + 1);
} else {
    next = *delay_buffer->p_head;
}

return *(delay_buffer->p_head + read_pos) + fraction * (next - *(delay_buffer->p_head + read_pos));

In case that looks a little daunting, we’ll step through the C code (apologies for the pointer arithmetic!).  At the top we calculate the offset using the delay length in samples as our base point.  The following lines are easily seen as incrementing and wrapping the phase of the oscillator as well as capping the offset to the delay length.

The next line calculates the current position in the buffer from the current position pointer, p, and the buffer head, p_head.  This is accomplished by casting the pointer addresses to integral values and dividing by the size of the data type of each buffer element.  The read_offset position will determine where in the delay buffer we read from, so it needs to be clamped to the buffer’s length as well.

The rest is simply linear interpolation (albeit with some pointer arithmetic: *(delay_buffer->p_head + read_pos + 1) is equivalent to delay_buffer->p_head[read_pos + 1]). Once we have our modulated delay value, we can finish processing the all-pass filter:

delay_val = get_modulated_delay_value(allpass_filter);

// don't write the modulated delay_val into the buffer, only use it for the output sample
*delay_buffer->p = sample_in + (*delay_buffer->p * allpass_filter->g);
sample_out = delay_val - (allpass_filter->g * sample_in);

The final topology of the reverb is given below:

Topology of the AdVerb plug-in.

The pre-delay is implemented by a simple delay line, and the low-pass filters are of the one-pole IIR variety.  Putting the LPFs inside the comb filters’ feedback loops simulates the absorption of energy that sound undergoes as it comes in contact with surfaces and travels through air.  This factor can be controlled with a damping parameter in the plug-in.
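One common form of such a one-pole low-pass filter, with the damping parameter acting as the feedback coefficient (my formulation, not necessarily the exact one used in the plug-in), is:

lpf_out = (1 - damping) * in + damping * lpf_out_prev,

where a larger damping value rolls off more of the high frequencies circulating in the comb filter’s feedback path.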

The moving-average filter is there for an extra bit of high frequency roll-off, and I chose it because this particular filter is an FIR type with linear phase, so it won’t add further disturbance to the modulated samples entering it. The last (normal) all-pass filter in the series serves to add extra diffusion to the reverb tail.

Here are some short sound samples using a selection of presets included in the plug-in:

Piano, “Medium Room” preset

The preceding sample demonstrates a normal reverb setting.  Following are a few samples that demonstrate a couple of subtle and not-so-subtle effects:

Piano, “Make it Vintage” preset

Piano, “Bad Grammar” preset

Flute, “Shimmering Tail” preset

Feel free to get in touch regarding any questions or comments on “AdVerb“.

Building a Tone Generator for iOS using Audio Units

Over the past month or so, I’ve been working with a friend and colleague, George Hufnagl, on building an iOS app for audiophiles that includes several useful functions and references for both linear and interactive media.  The app is called “Pocket Audio Tools” and will be available sometime in mid-August for iPhone, with iPad native and OS X desktop versions to follow.  Among the functions included in the app is one that displays frequency values for all pitches in the MIDI range with adjustable A4 tuning.  As an extension of this, we decided to include playback of a pure sine wave for any pitch selected as further reference.  While standard audio playback on iOS is quite straightforward, building a tone generator requires manipulation of the actual sample data, and for that level of control, you need to access the lowest audio layer on iOS — Audio Units.

To do this, we set up an Objective-C class (simply derived from NSObject) that takes care of all the operations we need to initialize and play back our sine tone.  To keep things reasonably short, I’m just going to focus on the specifics of initializing and using Audio Units.

Here is what we need to initialize our tone generator; we’re setting up an output unit with input supplied by a callback function.

AudioComponentDescription defaultOutputDescription = { 0 };
defaultOutputDescription.componentType = kAudioUnitType_Output;
defaultOutputDescription.componentSubType = kAudioUnitSubType_RemoteIO;
defaultOutputDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
defaultOutputDescription.componentFlags = 0;
defaultOutputDescription.componentFlagsMask = 0;

AudioComponent defaultOutput = AudioComponentFindNext(NULL, &defaultOutputDescription);
OSErr error = AudioComponentInstanceNew(defaultOutput, &_componentInstance);

Above, we set the type and sub type to an output Audio Unit whose I/O interfaces directly with iOS (specified by RemoteIO). Currently on iOS, no third-party AUs are allowed, so the only manufacturer is Apple. The two flag fields are set to 0 as per the Apple documentation. Next we search for the default Audio Component by passing NULL as the first argument to the function call, and then we create the instance.

Once an Audio Unit instance has been created, its properties are customized through calls to AudioUnitSetProperty(). Here we set up the callback function used to populate the buffers with the sine wave for our tone generator.

AURenderCallbackStruct input;
input.inputProc = RenderTone;
input.inputProcRefCon = (__bridge void *)(self);
error = AudioUnitSetProperty(_componentInstance, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &input, sizeof(input));

After setting the callback function RenderTone (we’ll define this function later) and our user data (self in this case, because we want the instance of the class we’re defining to be the user data in the callback routine), we set this property on the AU instance. The scope of the Audio Unit refers to the context to which the property applies, in this case input. The “0” argument is the element number; we want the first one, so we specify 0.

Next we need to specify the type of stream the audio callback function will be expecting.  Again we use the AudioUnitSetProperty() function call, this time passing in an instance of  AudioStreamBasicDescription.

AudioStreamBasicDescription streamFormat = { 0 };
streamFormat.mSampleRate = _sampleRate;
streamFormat.mFormatID = kAudioFormatLinearPCM;
streamFormat.mFormatFlags = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
streamFormat.mBytesPerPacket = 4; // 4 bytes for 'float'
streamFormat.mBytesPerFrame = 4; // sizeof(float) * 1 channel
streamFormat.mFramesPerPacket = 1; // 1 channel
streamFormat.mChannelsPerFrame = 1; // 1 channel
streamFormat.mBitsPerChannel = 8 * 4; // 1 channel * 8 bits/byte * sizeof(float)
error = AudioUnitSetProperty(_componentInstance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0,
                                     &streamFormat, sizeof(AudioStreamBasicDescription));

The format identifier in our case is linear PCM since we’ll be dealing with non-compressed audio data.  We’ll also be using native floating point as our sample data, and because we’ll be using only one channel, our buffers don’t need to be interleaved.  The bytes per packet field is set to 4 since we’re using floating point numbers (which is of course 4 bytes in size), and bytes per frame is the same because in mono format, 1 frame is equal to 1 sample.  In uncompressed formats, we need 1 frame per packet, and we specify 1 channel for mono.  Bits per channel is just calculated as 8 * sizeof our data type, which is float.

This completes the setup of our Audio Unit instance, and we then just initialize it with a call to

AudioUnitInitialize(_componentInstance);

Before looking at the callback routine, there is one extra feature that I included in this tone generator.  Simply playing back a sine wave results in noticeable clicks and pops in the audio due to the abrupt amplitude changes, so I added fade ins and fade outs to the playback.  This is accomplished simply by storing arrays of fade in and fade out amplitude values that shape the audio based on the current playback state.  For our purposes, setting a base amplitude level of 0.85 and using linear interpolation to calculate the arrays using this value is sufficient.  Now we can examine the callback function.
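As a rough illustration (not the app’s actual code; the curve length here is an arbitrary placeholder, and only the 0.85 base level comes from the text above), the fade-in array could be filled like this:

const int kToneGeneratorFadeInSamples = 2048;   // placeholder length
const float kToneGeneratorAmplitude = 0.85f;    // base amplitude level

float[] fadeInCurve = new float[kToneGeneratorFadeInSamples];
for (int i = 0; i < kToneGeneratorFadeInSamples; ++i) {
    // Linear ramp from 0 up to the base amplitude; the fade-out curve is the same ramp reversed.
    fadeInCurve[i] = kToneGeneratorAmplitude * i / (kToneGeneratorFadeInSamples - 1);
}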

OSStatus RenderTone (void* inRefCon, AudioUnitRenderActionFlags* ioActionFlags,
                     const AudioTimeStamp* inTimeStamp,
                     UInt32 inBusNumber,
                     UInt32 inNumberFrames,
                     AudioBufferList* ioData)
{
    PAToneGenerator *toneGenerator = (__bridge PAToneGenerator*)(inRefCon);
    Float32 *buffer = (Float32*)ioData->mBuffers[0].mData;

    for (UInt32 frame = 0; frame < inNumberFrames; ++frame)
    {

We only need to consider the following arguments: inRefCon, inNumberFrames, and ioData.  First we need to cast our user data to the right type (our tone generator class), and then get our buffer that we will be filling with data.  From here, we need to determine the amplitude of our sine wave based on the playback state.

switch ( toneGenerator->_state ) {
    case TG_STATE_IDLE:
        toneGenerator->_amplitude = 0.f;
        break;

    case TG_STATE_FADE_IN:
        if ( toneGenerator->_fadeInPosition < kToneGeneratorFadeInSamples ) {
            toneGenerator->_amplitude = toneGenerator->_fadeInCurve[toneGenerator->_fadeInPosition];
            ++toneGenerator->_fadeInPosition;
        } else {
            toneGenerator->_fadeInPosition = 0;
            toneGenerator->_state = TG_STATE_SUSTAINING;
        }
        break;

    case TG_STATE_FADE_OUT:
        if ( toneGenerator->_fadeOutPosition < kToneGeneratorFadeOutSamples ) {
            toneGenerator->_amplitude = toneGenerator->_fadeOutCurve[toneGenerator->_fadeOutPosition];
            ++toneGenerator->_fadeOutPosition;
        } else {
            toneGenerator->_fadeOutPosition = 0;
            toneGenerator->_state = TG_STATE_IDLE;
        }
        break;

    case TG_STATE_SUSTAINING:
        toneGenerator->_amplitude = kToneGeneratorAmplitude;
        break;

    default:
        toneGenerator->_amplitude = 0.f;
        break;
}

Once we have the amplitude, we simply fill the buffer with the sine wave.

buffer[frame] = sinf(toneGenerator->_phase) * toneGenerator->_amplitude;

toneGenerator->_phase += toneGenerator->_phase_incr;
toneGenerator->_phase = ( toneGenerator->_phase > kTwoPi ? toneGenerator->_phase - kTwoPi : toneGenerator->_phase );

One important thing to bear in mind is that the callback function is continually called after we start the Audio Unit with

AudioOutputUnitStart(_componentInstance);

We don’t need to start and stop the entire instance each time a tone is played because that could very well introduce some delays in playback, so the callback continually runs in the background while the Audio Unit is active.  That way, calls to play and stop the tone are as simple as changing the playback state.

- (void)playTone
{
    if ( _isActive) {
        _state = TG_STATE_FADE_IN;
    }
}

- (void)stopTone
{
    if ( _isActive ) {
        if ( _state == TG_STATE_FADE_IN ) {
            _state = TG_STATE_IDLE;
        } else {
            _state = TG_STATE_FADE_OUT;
        }
    }
}

The reason for the extra check in stopTone() is to prevent clicks from occurring if the playback time is very short. In other words, if the playback state has not yet reached its sustain point, we don’t want any sound output.

To finish off, we can stop the Audio Unit by calling

AudioOutputUnitStop(_componentInstance);

and free the resources with calls to

AudioUnitUninitialize(_componentInstance);
AudioComponentInstanceDispose(_componentInstance);
_componentInstance = nil;

That completes this look at writing a tone generator for output on iOS.  Apple’s Audio Units layer is a sophisticated and flexible system that can be used for all sorts of audio needs including effects, analysis, and synthesis.  Keep an eye out for Pocket Audio Tools, coming soon to iOS!

DFT Spectral Analysis with OpenGL

The idea for this little experiment/project of mine came about while I was exploring OpenGL through the use of Cinder, a C++ library that focuses on creative coding.  Among its many features, it supports 2D and 3D graphics through OpenGL.  As I was experimenting with building a particle engine, I started thinking about how audio can drive various graphical parameters and elements.  We don’t see this very much, but it has some intriguing applications in video games and other interactive media.

We all know that randomness is a large part of sound and visual effects in games these days.  To illustrate, let’s take an example of a magic spell; both its visual and aural element will have randomness, but will these sync up together?  In other words, if a particular instance of the spell’s sound effect contains a lot of high harmonics, does the visual aspect of it respond by being brighter or more energetic?  If the visual and aural parts of the spell are completely independent, there is less chance that they will behave as one cohesive unit, where the visual element is accurately reflective of the aural part.  It’s a subtle thing, and not applicable in every case, but I strongly believe that these details add much to immersion, and impress upon us a more organic game world.

So now, how does DSP fit into this?  Analysing a sound in the time domain (represented by the wave form) doesn’t give us any useful information as to the frequency content or the particular energy it contains across its spectrum.  The DFT (Discrete Fourier Transform) is used to convert a signal into the frequency domain where this information resides.  Instead of displaying amplitude (y axis) vs. time (x axis) in the time domain, the frequency domain plots the amplitude of each frequency component (y axis) vs. the frequency (x axis).  This is a plot of the frequency domain in Audacity:

Frequency domain plot of a piano chord.

Spectral analysis using the DFT is nothing new and has been around for some time, but it’s thanks to the FFT (Fast Fourier Transform) and the dramatically increased floating point performance of modern computers that using the DFT for analysis in real time has become practical. Computed directly, the DFT is very slow. The FFT uses a highly optimized algorithm (the Cooley-Tukey is the most widely used, but there are several others) to calculate the DFT of a signal. Going into detail on the math of the FFT is beyond my experience, but fortunately we don’t need to know its intricacies to use it effectively.

The DFT works by breaking a signal down into its individual components.  This is related to the Fourier Theorem, which states that any complex waveform can be constructed by the addition of sine waves — the building blocks of a signal.  Each sine wave represents a single frequency component; its amplitude is the strength of that frequency.  The DFT results in a complex signal with both cosine and sine wave components.  This information isn’t terribly useful to us in this form.  From the complex-valued signal, however, we find the magnitude of each frequency component by converting it into polar form.  In rectangular form a complex number is represented as:

a + jb (DSP normally uses “j” instead of “i”), where a is the real part and b the imaginary part

In polar form, a complex number is represented by an angle (the argument) and the magnitude (absolute value).  The absolute value is what we’re interested in, and is calculated according to Pythagoras’ Theorem:

m = sqrt( a^2 + b^2 )

This value, plotted against the frequency (given by the index of the complex value in the output array), gives us the spectrum of the signal. In other words, if we take the FFT of an N-point signal, we get N complex values (real/imaginary pairs) as output. Each index of that output represents a frequency along the x axis, with 0 being DC, N/2 being Nyquist, and N corresponding to the sampling rate. The magnitude of each complex value gives us the amplitude of that frequency component in the signal. A final important point to make here is that we’re only interested in the frequency points up to N/2, because anything above Nyquist aliases in discrete time (the spectrum of a signal is actually mirrored around N/2). Here is a sample plot of a piano chord I recorded, using 5 512-sample buffers from the audio:

Frequency plot of 5 512-sample buffers of a piano chord.

This plot shows us that most of the energy lies in the frequency range 0 – 0.04 (assuming a sampling rate of 48kHz, this would equate to approximately 0 – 1920 Hz).  Now that we can extract the frequency information from an audio signal, I want to talk about how I use this data along with the RMS value to drive the particle engine I made.

There are a number of particle parameters I considered: the number of particles, how they’re generated/constructed, their radius, and their color. Particles are only generated during transients, so I check a previous RMS value against the new one to determine whether particles should be constructed. This method isn’t foolproof, but in my tests it has been working quite well for my purposes. The number of particles generated is also related to the RMS of the signal: the louder it is, the more particles are made. They are evenly divided across the whole spectrum, so that at least a few particles from each frequency range are present. The radius of each particle is determined by its frequency component (its position along the x axis of the frequency plot above): the lower the frequency, the larger the particle. Finally, the color, or more specifically the brightness, of each particle is determined by the frequency amplitude. Let me go into a little more detail on each of these before presenting a little video I made demonstrating the program.

First, here is the function that gathers all the information (I’m using Portaudio for real-time playback, so all the audio processing happens in a callback function):

int audioCallback (const void* input, void* output, unsigned long samples, const PaStreamCallbackTimeInfo *timeInfo, PaStreamCallbackFlags statusFlags, void* userData)
{
    const float* in = (const float*)input;
    float* out = (float*)output;
    PaAudioData* data = (PaAudioData*)userData;

    memset(data->buffer, 0, sizeof(double) * SAMPLE_BLOCK * 2);
    float rms = CFDsp::cf_SSE_calcRms(in, samples);

    for (int i = 0; i < samples; ++i) {
        data->buffer[i*2] = *in * hammingWindow(i, SAMPLE_BLOCK);

        *out++ = *in++;
        *out++ = *in++;
    }

    // Calculate DFT using FFT.
    fft(data->buffer, SAMPLE_BLOCK);

    data->amp_prev = data->amp_now;
    data->amp_now = rms;

    return paContinue;
}

Since the FFT function I’m using requires interleaved data, it needs to be zeroed out first and then each real-valued sample stored in the even-numbered indices.  The RMS value is acquired using a function I wrote using SSE intrinsics for optimal speed.

float cf_SSE_calcRms (const float* inBuffer, const int length)
{
    assert(length % 4 == 0);

    __m128* mbuf = (__m128*)inBuffer;
    __m128 m1, m2;
    __m128 macc = _mm_set1_ps(0.f);
    __m128 _1oN = _mm_set1_ps(1.f / static_cast<float>(length));

    for (int i = 0; i < length; i+=4) {
        m1 = _mm_mul_ps(*mbuf, *mbuf);
        m2 = _mm_mul_ps(m1, _1oN);
        macc = _mm_add_ps(m2, macc);
        mbuf++;
    }

    __declspec(align(16)) float result[4];
    _mm_store_ps(result, macc);

    return sqrtf(result[0] + result[1] + result[2] + result[3]);
}

Inside the for loop, a Hamming window is applied to the signal prior to sending it to the FFT, tapering it at the edges to minimize or eliminate spectral leakage. In the application’s update method, I test for the presence of a transient in the current buffer. If one exists, particles are added.

if (audioData.isTransient())
{
    Vec2i randPos;

    // 50 pixel border.
    randPos.x = Rand::randInt(50, getWindowWidth() - 50);
    randPos.y = Rand::randInt(50, getWindowHeight() - 50);
    mParticleController.addParticles(randPos, audioData);
}

The adding of particles is handled by the ParticleController class:

void ParticleController::addParticles (const Vec2i& loc, const PaAudioData& data)
{
    int stride = static_cast<int>((64.f - powf((64.f * data.amp_now), 1.32f)) + 0.5f);
    stride = (stride < 1 ? 1 : stride);

    // 512 == sample block size; represents the horizontal frequency axis in the spectral analysis.
    for (int i = 0; i < 512; i+=stride) {
        Vec2f randVec = Rand::randVec2f() * 6.5f;
        mParticles.push_back(Particle(loc + randVec, i/1024.f, data.mag(i) * 18.f));
    }
}

First I calculate a stride value that controls the step size of the following for loop (this evenly divides the particles across the x axis). As you can see, it is determined by the volume (calculated as RMS) of the current audio buffer. When creating each individual particle, the i/1024 argument determines the radius; as I mentioned, this is based on the frequency value along the x axis. To constrain it within reasonable bounds, this value is further calculated as

sqrt( 1 / (i / 1024) )

The color of each particle is first determined using HSV. The hue is randomly chosen, but the saturation and value (brightness) are based on the data.mag(i) * 18 argument, i.e. the amplitude of the frequency component. This is then converted into an RGB value to set the color of the individual particle.

One of the primary challenges of getting this working properly was mapping the output of the DFT analysis and the RMS values to usable values in the particle engine. That’s the reason for several of the “magic numbers” in the function above. (1024 is not one of them; it’s double our sample length. As I said previously, we only want the spectrum from 0 to 0.5, and since the data is stored in interleaved format, the buffer is actually 1024 values long.) Quite a lot of experimenting went into fine-tuning these values to get good results, but more can be done in this area, which I am interested in exploring further. I feel that this is a good first step, however, and I’ll be looking at ways to integrate this process into further projects and ideas I have brewing.
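To make that bookkeeping concrete, here is a rough sketch (hypothetical names, not the actual project code) of pulling magnitudes out of an interleaved FFT buffer like the one above:

// fftData holds the interleaved FFT output: [re0, im0, re1, im1, ...].
// Only the first numBins/2 bins lie below Nyquist and are therefore useful.
float[] Magnitudes (double[] fftData, int numBins)
{
    float[] mags = new float[numBins / 2];
    for (int k = 0; k < numBins / 2; ++k) {
        double re = fftData[2 * k];
        double im = fftData[2 * k + 1];
        mags[k] = (float)Math.Sqrt(re * re + im * im);   // m = sqrt(a^2 + b^2)
    }
    return mags;
}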

With all of that out of the way, here is the video demonstrating the result (note: my FPS took a hit due to recording, and my way-old, out-of-date video card didn’t help matters either):

DFT Spectral Particle Engine from Christian on Vimeo.

A Game of Tic-Tac-Toe Using the FMOD Sound Engine

With this post I’m taking a slight diversion away from low-level DSP and plug-ins to share a fun little experimental project I just completed.  It’s a game of Tic-Tac-Toe using the FMOD sound engine with audio based on the minimalist piano piece “Für Alina” by Arvo Pärt.  In the video game industry there are two predominant middleware audio tools that sound designers and composers use to define the behavior of audio within a game.  Audiokinetic’s Wwise is one, and Firelight Technologies’ FMOD is the other.  Both have their strengths and weaknesses, but I chose to work with FMOD on this little project because Wwise’s Mac authoring tool is still in the alpha stage.  I used FMOD’s newest version, FMOD Studio.

The fun and interesting part of this little project was working with the fairly unusual approach to the audio. I explored many different ways to implement the sound so that it reflected the subtlety of the original music while reacting to player actions and the state of the game. Here is a video demonstrating the result. Listen for subtle changes in the audio as the game progresses. The behavior of the audio is all based on a few simple rules and patterns governed by player action.

A Game of Tic-Tac-Toe with Arvo Part from Christian on Vimeo.

The game is available for download (Mac OS X 10.7+ only).

Upgrading Dyner with Parameter Smoothing

It’s been just over a month since I released my free ring modulator plug-in, Dyner.  Since then I’ve gotten some very nice feedback on it, and thankfully no crash reports (well.. that I know of). Additionally, I’ve been tweaking a few things like the GUI controls and doing some minor optimizations, but more importantly I just finished adding smoothing to the depth parameter, so I figured it was time to upgrade to version 1.0.4 and put that out.  Also, since parameter smoothing is a pretty important feature in DSP for achieving good quality sound, I thought I’d give a brief overview of what I did.

First, let’s hear the difference. Previously, artifacts were most easily heard using a low frequency setting with the sine wave while changing the depth somewhat rapidly. Other wave forms, due to their rich harmonic content, obscured most of the artifacts that resulted from automating the depth. So here, then, are two audio samples, “before” and “after”:

Before parameter smoothing

After parameter smoothing

In writing the parameter smoother class, there were a few features I felt were important to include:

  • Lightweight and easy to use,
  • Flexible in terms of how the interpolated smoothing is calculated,
  • Reusable for other plug-ins or DSP effects.

One thing I wanted to avoid was having to insert a whole bunch of new code into my plug-in to “check for this”, “update that”, “initialize”, etc. The more operations that are handled inside the parameter smoother class, the better. To accomplish this, I decided to have the class hold a reference to the parameter value that’s receiving the smoothing. Once the object is initialized with this reference, it automatically sees the new value whenever the user changes the parameter, and I compare that to the previous value stored within the class to detect the change. In other words,

Constructor signature

Instantiating the class; first argument is passed by reference.

Since the first argument (initialValue) is passed by reference and stored internally as such, changing paramDepth outside the object automatically changes its reference inside as well, since they point to the same address space. Once this connection is made, there is no need to call an external function to check whether the parameter value has changed, or to explicitly set the change. This minimizes function calls and makes the class very easy to use and implement in any project: just instantiate the object as above, and then call one method within the audio’s process loop to fetch the parameter value.

Parameter smoothing itself is typically handled by interpolating from the current value to the target value that has just been set by the user, making the change more gradual to avoid the artifacts that can result from an abrupt jump (other methods might include passing parameter values through a low-pass filter). Linear interpolation is often good enough for this task. It’s very light on computation and easy to implement. But I wanted to leave it open to implement different curve shapes that the interpolation can be mapped onto, especially for cases where the smoothing takes place over a longer stretch of samples, where a specific curve shape becomes more noticeable. This was easily handled through a callback function (also seen in the above image of the constructor), which allows me to define a custom function that the interpolation follows.

Callback function signature.
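To make the design concrete, here is a rough sketch of what such a smoother might look like (the names, defaults, and exact update logic are my assumptions rather than the actual Dyner code, which also tracks its state with bitwise flags as described further below):

public class ParameterSmoother
{
    public delegate float CurveFunc (float t);   // t in [0, 1] -> shaped position in [0, 1].

    private CurveFunc m_Curve;
    private int m_DurationSamples;   // How many samples a full ramp spans.
    private int m_Stride;            // Recompute the interpolation only every m_Stride samples.

    private float m_Current;         // Value currently being output.
    private float m_Start;           // Value at the start of the current ramp.
    private float m_Target;          // Value being ramped toward.
    private int m_Pos;               // Position along the ramp, in samples.

    // Pass t => t for plain linear interpolation, or any other curve shape.
    public ParameterSmoother (float initialValue, int durationSamples, int stride, CurveFunc curve)
    {
        m_Current = m_Start = m_Target = initialValue;
        m_DurationSamples = durationSamples;
        m_Stride = stride;
        m_Curve = curve;
    }

    // Call once per sample in the processing loop, passing the raw (possibly just-changed) parameter value.
    public float GetParameterValue (float rawValue)
    {
        if (rawValue != m_Target) {      // A new value was set, possibly while a ramp is already running:
            m_Start = m_Current;         // restart the ramp from wherever we are right now.
            m_Target = rawValue;
            m_Pos = 0;
        }

        if (m_Pos < m_DurationSamples) {
            if (m_Pos % m_Stride == 0) { // Only recompute every 'stride' samples (the staircase effect).
                float t = (float)m_Pos / m_DurationSamples;
                m_Current = m_Start + (m_Target - m_Start) * m_Curve(t);
            }
            ++m_Pos;
        } else {
            m_Current = m_Target;        // Ramp finished; hold the target value.
        }

        return m_Current;
    }
}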

For example, to define an exponential curve, I could write the following function, and then pass it as an argument (which is basically a function pointer) to the constructor:

Exponential function (x^1.84).

Initializing with function pointer.

Here we can see the difference graphically:

Smoothing with linear interpolation.

Smoothing with exponential interpolation.

Being able to specify the number of samples over which the parameter smoothing takes place is also a fairly obvious customization that I included.  This way, I can adapt the smoothing to be equal to the block size that the host passes to my plug-in, for example.  Another option I wanted to include was the ability to calculate new interpolated values at a different rate than the sampling rate.  In other words, instead of calculating new values every sample in the processing loop, I can have it calculate every 2, 4, 12, 48, etc. samples by setting the stride variable that you might have noticed above.  It’s rarely necessary to calculate a new value every sample, and in fact, the smoothing in Dyner happens every 4 samples.  This could be increased as well if a custom callback is used that is of greater complexity than simple linear interpolation (e.g. mapping to an S-curve).  The resulting interpolation looks like this with a stride value of 24:

Linear smoothing with a stride of 24 samples.

Lastly, you might have noticed that the above graph (apart from the staircases) looks different from the preceding ones. This illustrates a critical feature that needed attention: allowing the parameter smoothing to change course when a new value is set while smoothing is already taking place due to a previous change. In such a case, the current state needs to be saved and a new ramp begins toward the new target parameter value. I did this by defining four states that I keep track of using bitwise flags.

State-handling for the parameter smoother.

Updating the state flags using bitwise operations.

As can be seen above, comparison between the parameter reference value and its previous setting determines whether a new change has occurred.  mPos is the position index along the interpolation line/curve, so when it is non-zero it means smoothing is currently processing due to a previous value change.

It’s also worth mentioning that the actual method that retrieves the smoothed parameter value (getParameterValue) needs to be called every frame of processing so it can properly update itself.  It replaces any direct use of the original parameter for processing calculations.

Example of using the class method instead of the actual parameter inside the processing loop.