
Custom Game Engine on iOS: Audio

Having previously covered the general architecture and the graphics system, we come now to the audio part of the game engine. One might be surprised (or unsurprised depending on one’s expectations) that it is conceptually very similar to how the graphics work. As a quick recap of what was covered in the last part, the platform sends some memory to the game representing a bitmap that the game draws into for each frame, then takes that bitmap at the end of the frame and renders it onto the screen.

For my audio system, it works very much the same. The platform sends some memory representing an audio buffer to the game layer. Like the graphics system, the game layer does all the heavy lifting, handling the mixing of audio sources from music and sound effects into the buffer passed in by the platform layer. The platform layer then takes this audio buffer and sends it to the audio output provided by the operating system.

To do this, the platform layer needs to talk to Core Audio, which powers the audio for all of Apple’s platforms. Core Audio is a purely C-based API, and can be a little cumbersome to deal with at times (albeit very powerful). However, since the game layer handles all the mixing and all the platform layer is concerned with is one master audio buffer, calls to the Core Audio API are minimal.

In order to encapsulate data around the audio buffer, the bridging layer declares a PlatformAudio struct (recall that the bridging layer is a .h/.cpp file pair that connects the game, written in C++, to the iOS platform layer written in Swift):

struct PlatformAudio {
    double sampleRate;
    uint32_t channels;
    uint32_t bytesPerFrame;
    void *samples;
};

Initialization of this struct and the audio system as a whole takes place in the didFinishLaunching method of the AppDelegate:

class AppDelegate: UIResponder, UIApplicationDelegate {

    typealias AudioSample = Int16
    var audioOutput: AudioComponentInstance?
    var platformAudio: PlatformAudio!

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Platform initialization
        
        if initAudio(sampleRate: 48000.0, channels: 2) {
            ios_audio_initialize(&platformAudio)
            AudioOutputUnitStart(audioOutput!)
        }
        
        ios_game_startup()
        
        let screen = UIScreen.main
        window = UIWindow(frame: screen.bounds)
        window?.rootViewController = ViewController()
        window?.makeKeyAndVisible()
        
        return true
    }
}

The AudioComponentInstance object represents an Audio Unit, which in Core Audio is required for working with audio at a low level, and provides the lowest latency for audio processing. After initializing the platform layer (seen back in the first part of this series), the audio system is initialized on the OS side first, and then in the game layer (via the bridging interface). Once that is done, the output unit is started — it will become clearer what this actually does very soon.

The audio interface in the bridging layer consists of three functions:

struct PlatformAudio* ios_create_platform_audio(double sampleRate, uint16_t channels, uint16_t bytesPerSample);
void ios_audio_initialize(struct PlatformAudio *platformAudio);
void ios_audio_deinitialize(struct PlatformAudio *platformAudio);

Before moving on to have a look at the initAudio method, here is the implementation for these functions (in the bridging .cpp file):

static bbAudioBuffer audioBuffer = {0};

struct PlatformAudio*
ios_create_platform_audio(double sampleRate, uint16_t channels, uint16_t bytesPerSample) {
    static PlatformAudio platformAudio = {};
    platformAudio.sampleRate = sampleRate;
    platformAudio.channels = channels;
    platformAudio.bytesPerFrame = bytesPerSample * channels;
    return &platformAudio;
}

void
ios_audio_initialize(struct PlatformAudio *platformAudio) {
    platform.audioSampleRate = platformAudio->sampleRate;
    audioBuffer.sampleRate = platformAudio->sampleRate;
    audioBuffer.channels = platformAudio->channels;
    audioBuffer.samples = (int16_t *)calloc(audioBuffer.sampleRate, platformAudio->bytesPerFrame);
    audioBuffer.mixBuffer = (float *)calloc(audioBuffer.sampleRate, sizeof(float) * platformAudio->channels);
    platformAudio->samples = audioBuffer.samples;
}

void
ios_audio_deinitialize(struct PlatformAudio *platformAudio) {
    free(audioBuffer.samples);
    free(audioBuffer.mixBuffer);
}

Pretty straightforward, really. The ios_create_platform_audio function is called at the start of initAudio:

private func initAudio(sampleRate: Double, channels: UInt16) -> Bool {
    let bytesPerSample = MemoryLayout<AudioSample>.size
    if let ptr = ios_create_platform_audio(sampleRate, channels, UInt16(bytesPerSample)) {
        platformAudio = ptr.pointee
    } else {
        return false
    }

    var streamDescription = AudioStreamBasicDescription(mSampleRate: sampleRate,
                                                        mFormatID: kAudioFormatLinearPCM,
                                                        mFormatFlags: kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked,
                                                        mBytesPerPacket: platformAudio.bytesPerFrame,
                                                        mFramesPerPacket: 1,
                                                        mBytesPerFrame: platformAudio.bytesPerFrame,
                                                        mChannelsPerFrame: platformAudio.channels,
                                                        mBitsPerChannel: UInt32(bytesPerSample * 8),
                                                        mReserved: 0)
    
    var desc = AudioComponentDescription()
    desc.componentType = kAudioUnitType_Output
    desc.componentSubType = kAudioUnitSubType_RemoteIO
    desc.componentManufacturer = kAudioUnitManufacturer_Apple
    desc.componentFlags = 0
    desc.componentFlagsMask = 0
    
    guard let defaultOutputComponent = AudioComponentFindNext(nil, &desc) else {
        return false
    }
    
    var status = AudioComponentInstanceNew(defaultOutputComponent, &audioOutput)
    if let audioOutput = audioOutput, status == noErr {
        var input = AURenderCallbackStruct()
        input.inputProc = ios_render_audio
        withUnsafeMutableBytes(of: &platformAudio) { ptr in
            input.inputProcRefCon = ptr.baseAddress
        }
        
        var dataSize = UInt32(MemoryLayout<AURenderCallbackStruct>.size)
        status = AudioUnitSetProperty(audioOutput, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &input, dataSize)
        if status == noErr {
            dataSize = UInt32(MemoryLayout<AudioStreamBasicDescription>.size)
            status = AudioUnitSetProperty(audioOutput, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &streamDescription, dataSize)
            if status == noErr {
                status = AudioUnitInitialize(audioOutput)
                return status == noErr
            }
        }
    }
    
    return false
}

After creating the PlatformAudio instance, the method proceeds to set up the output Audio Unit on the Core Audio side. Core Audio needs to know what kind of audio it will be dealing with and how the data is laid out in memory in order to interpret it correctly, and this requires an AudioStreamBasicDescription instance that is eventually set as a property on the audio output unit.

The first property is easy enough, just being the sample rate of the audio. For the mFormatID parameter, I pass in a constant specifying that the audio data will be uncompressed — just standard linear PCM. Next, I pass in some flags for the mFormatFlags parameter specifying that the audio samples will be packed signed 16-bit integers. Another flag that can be set here is one that specifies that the audio will be non-interleaved, meaning that all samples for each channel are grouped together, and each channel is laid out end-to-end. As I have omitted this flag, the audio is interleaved, meaning that the samples for each channel are interleaved in a single buffer as in the diagram below:

Interleaved audio layout.
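To make the two layouts concrete, here is a small sketch (hypothetical helper functions, not part of the engine) of how a frame/channel pair maps to a buffer index in each case:

#include <cstddef>

// Interleaved: frames are stored as L0 R0 L1 R1 L2 R2 ...
// so one buffer holds all channels, frame by frame.
inline size_t interleavedIndex(size_t frame, size_t channel, size_t channelCount) {
    return frame * channelCount + channel;
}

// Non-interleaved (planar): each channel is laid out end-to-end,
// L0 L1 L2 ... followed by R0 R1 R2 ...
inline size_t planarIndex(size_t frame, size_t channel, size_t frameCount) {
    return channel * frameCount + frame;
}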

(As a quick side-note, although the final format of the audio is signed 16-bit integers, the game layer mixes in floating point. This is a common workflow in audio: mix and process at a higher resolution and sample rate than the final output.)
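The game-side mixer itself isn’t shown in this post, but the final step it implies, collapsing the float mix buffer into the interleaved 16-bit output buffer, might look something like this sketch (the names echo bbAudioBuffer above; the actual game code may differ):

#include <algorithm>
#include <cstdint>

// Sketch: quantize the float mix buffer into interleaved signed 16-bit samples.
// Assumes mixBuffer holds frameCount * channels floats roughly in [-1, 1].
static void mixdownToInt16(const float *mixBuffer, int16_t *samples,
                           uint32_t frameCount, uint32_t channels) {
    for (uint32_t i = 0; i < frameCount * channels; ++i) {
        float s = std::min(1.0f, std::max(-1.0f, mixBuffer[i])); // clamp to avoid wrap-around
        samples[i] = (int16_t)(s * 32767.0f);                    // scale to the 16-bit range
    }
}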

The rest of the fields in the stream description require a bit of calculation. Well, except for mFramesPerPacket, which is set to 1 for uncompressed audio; and since there is 1 frame per packet, mBytesPerPacket is the same as the number of bytes per frame. mChannelsPerFrame is just going to be the number of channels, and mBitsPerChannel is just going to be the size of an audio sample expressed as bits. The bytes per frame value, as seen above, is simply calculated from the bit depth of the audio (bytes per sample) and the number of channels.
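As a quick worked example (not code from the engine), the format used here, signed 16-bit samples in 2 channels at 48 kHz, gives:

#include <cstdint>

const uint32_t bytesPerSample = 2;                          // sizeof(int16_t)
const uint32_t channels       = 2;                          // stereo
const uint32_t bytesPerFrame  = bytesPerSample * channels;  // 4  -> mBytesPerFrame
const uint32_t bytesPerPacket = bytesPerFrame * 1;          // 4  -> mBytesPerPacket (1 frame per packet)
const uint32_t bitsPerChannel = bytesPerSample * 8;         // 16 -> mBitsPerChannel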

Next, I need to get the output Audio Component: an Audio Unit in the Core Audio system that will send audio to the output hardware of the device. To find this component, an AudioComponentDescription is required and needs to be configured with parameters that return the desired unit (iOS contains a number of built-in units, from various I/O units to mixer and effect units). To find the right output unit, I specify “output” for the type, “remote I/O” for the sub type (the RemoteIO unit is the only one that connects to the audio hardware for I/O), and “Apple” as the manufacturer.

Once the component is found with a call to AudioComponentFindNext, I initialize the audio output unit with this component. This Audio Unit (and Core Audio in general) works on a “pull model” — you register a function with Core Audio, which will then call you whenever it needs audio to fill its internal buffers. This function gets called on a high-priority thread, and runs at a faster rate than the game’s update function. Effectively this means you have less time to do audio processing per call than you do for simulating and rendering a frame, so the audio processing needs to be fast enough to keep up. Missing an audio update means the buffer that is eventually sent to the hardware is most likely empty, resulting in artifacts like clicks or pops caused by the discontinuity with the audio in the previous buffer.
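To put rough numbers on that (illustrative only; the actual request size depends on the device and audio session configuration), a hypothetical 512-frame request at 48 kHz represents a little over 10 ms of audio, compared to roughly 16.7 ms per video frame at 60 fps:

const double sampleRate    = 48000.0;
const double framesPerCall = 512.0;                       // hypothetical request size
const double audioBudget   = framesPerCall / sampleRate;  // ~0.0107 s of audio per callback
const double frameBudget   = 1.0 / 60.0;                  // ~0.0167 s per 60 fps video frame
// The mix has to be produced well within audioBudget on every call, or the buffer
// handed to the hardware goes out with a gap and the listener hears a click.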

In order to set the callback function on the Audio Unit, I need an AURenderCallbackStruct instance that takes a pointer to the callback function and a context pointer. Once I have this, it is set as a property on the Audio Unit by calling AudioUnitSetProperty, and specifying “input” as the scope (this tells Core Audio that this property is for audio coming in to the unit). Next I take the stream description that was initialized earlier and set it as a property on the Audio Unit, also on the “input” scope (i.e. this tells the Audio Unit about the audio data coming in to it). Finally, the Audio Unit is initialized and is then ready for processing. The call we saw earlier to start the Audio Unit after initialization tells the OS to start calling this callback function to receive audio.

The callback function itself is actually quite simple:

fileprivate func ios_render_audio(inRefCon: UnsafeMutableRawPointer,
                                  ioActionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
                                  inTimeStamp: UnsafePointer<AudioTimeStamp>,
                                  inBusNumber: UInt32,
                                  inNumberFrames: UInt32,
                                  ioData: UnsafeMutablePointer<AudioBufferList>?) -> OSStatus
{
    var platformAudio = inRefCon.assumingMemoryBound(to: PlatformAudio.self).pointee
    ios_process_audio(&platformAudio, inNumberFrames)
    
    let buffer = ioData?.pointee.mBuffers.mData
    buffer?.copyMemory(from: platformAudio.samples, byteCount: Int(platformAudio.bytesPerFrame * inNumberFrames))
    
    return noErr
}

Similar to the graphics system, here is where the call is made to the bridging layer to process (i.e. fill) the audio buffer with data from the game. Core Audio calls this function with the number of frames it needs as well as the buffer(s) to place the data in. Once the game layer is done processing the audio, the data is copied into the buffer provided by the OS. The ios_process_audio function simply forwards the call to the game layer after specifying how many frames of audio the system requires:

void
ios_process_audio(struct PlatformAudio *platformAudio, uint32_t frameCount) {
    audioBuffer.frameCount = frameCount;
    process_audio(&audioBuffer, &gameMemory, &platform);
}

The last part to cover in the audio system of my custom game engine is how to handle audio with regard to the lifecycle of the application. We saw how audio is initialized in the didFinishLaunching method of the AppDelegate, so naturally the audio is shut down in the applicationWillTerminate method:

func applicationWillTerminate(_ application: UIApplication) {
    if let audioOutput = audioOutput {
        AudioOutputUnitStop(audioOutput)
        AudioUnitUninitialize(audioOutput)
        AudioComponentInstanceDispose(audioOutput)
    }
    
    ios_game_shutdown()
    
    ios_audio_deinitialize(&platformAudio)
    ios_platform_shutdown()
}

When the user hits the Home button and sends the game into the background, audio processing needs to stop, and when the game is brought back to the foreground again, it needs to resume playing. Stopping the Audio Unit will halt the callback function that processes audio from the game, and starting it will cause Core Audio to resume calling the function as needed.

func applicationWillEnterForeground(_ application: UIApplication) {
    if let audioOutput = audioOutput {
        AudioOutputUnitStart(audioOutput)
    }
    
    if let vc = window?.rootViewController as? ViewController {
        vc.startGame()
    }
}

func applicationDidEnterBackground(_ application: UIApplication) {
    if let audioOutput = audioOutput {
        AudioOutputUnitStop(audioOutput)
    }
    
    if let vc = window?.rootViewController as? ViewController {
        vc.stopGame()
    }
}

This completes my detailed overview of the three critical pieces of any game engine: the platform, the graphics, and the audio. And as I did with the blog post on the graphics system, here is a short demo of the audio system running in the game engine:

Pure Data and libpd: Integrating with Native Code for Interactive Testing

Over the past couple of years, I’ve built up a nice library of DSP code, including effects, oscillators, and utilities. One thing that always bothered me, however, is how to test this code in an efficient and reliable way. The two main methods I have used in the past have their pros and cons, but ultimately didn’t satisfy me.

One is to process an effect or generate a source into a wave file that I can open with an audio editor so I can listen to the result and examine the output. This method is okay, but it is tedious and doesn’t allow for real-time adjustment of parameters or any sort of instant feedback.
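For reference, that offline workflow can be as simple as dumping the processed samples to a 16-bit PCM WAV file. Here is a minimal sketch of such a writer (not the code I actually used; it assumes a little-endian host and mono output):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal 16-bit PCM WAV writer for offline listening tests (mono).
static bool writeWav(const char *path, const std::vector<float> &samples, uint32_t sampleRate)
{
    FILE *f = std::fopen(path, "wb");
    if (!f) return false;

    const uint16_t channels      = 1;
    const uint16_t bitsPerSample = 16;
    const uint16_t blockAlign    = channels * bitsPerSample / 8;
    const uint32_t byteRate      = sampleRate * blockAlign;
    const uint32_t dataSize      = (uint32_t)samples.size() * blockAlign;
    const uint32_t riffSize      = 36 + dataSize;
    const uint32_t fmtSize       = 16;
    const uint16_t audioFormat   = 1;   // 1 = linear PCM

    // RIFF header, "fmt " chunk, then the "data" chunk.
    std::fwrite("RIFF", 1, 4, f); std::fwrite(&riffSize, 4, 1, f); std::fwrite("WAVE", 1, 4, f);
    std::fwrite("fmt ", 1, 4, f); std::fwrite(&fmtSize, 4, 1, f);
    std::fwrite(&audioFormat, 2, 1, f); std::fwrite(&channels, 2, 1, f);
    std::fwrite(&sampleRate, 4, 1, f);  std::fwrite(&byteRate, 4, 1, f);
    std::fwrite(&blockAlign, 2, 1, f);  std::fwrite(&bitsPerSample, 2, 1, f);
    std::fwrite("data", 1, 4, f); std::fwrite(&dataSize, 4, 1, f);

    // Clamp and quantize each float sample to signed 16-bit.
    for (float s : samples) {
        int16_t q = (int16_t)(std::min(1.0f, std::max(-1.0f, s)) * 32767.0f);
        std::fwrite(&q, 2, 1, f);
    }
    std::fclose(f);
    return true;
}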

For effects like filters, I can also generate a text file containing the frequency/phase response data that I can view in a plotting application. This is useful in some ways, but this is audio — I want to hear it!

Lately I’ve gotten to know Pure Data a little more, so I thought about using it for interactive testing of my DSP modules. On its own, Pure Data does not interact with code of course, but that’s where libpd comes in. This is a great library that wraps up much of Pure Data’s functionality so that you can use it right from your own code (it works with C, C++, Objective-C, Java, and others). Here is how I integrated it with my own code to set up a nice flexible testing framework; and this is just one application of using libpd and Pure Data together — the possibilities go far beyond this!

First we start with the Pure Data patches. The receiver patch is opened and maintained in code by libpd, and has two responsibilities: 1) generate a test tone that the effect is applied to, and 2) receive messages from the control patch and dispatch them to C++.

Receiver patch, opened by libpd.

The control patch is opened in Pure Data and acts as the interactive patch. It has controls for setting the frequency and volume of the synthesizer tone that acts as the source, as well as controls for the filter effect that is being tested.

Control patch, opened in Pure Data; it serves as the interactive UI for testing.

As can be seen from the patches above, they communicate with each other via the netsend/netreceive objects by opening a port on the local machine. Since I’m only sending simple data to the receiver patch, I opted for UDP rather than TCP as the network protocol. (Disclaimer: my knowledge of network programming is akin to asking “what is a for loop”).

Hopefully the purpose of these two patches is clear, so we can now move on to seeing how libpd brings it all together in code. It is worth noting that libpd does not output audio to the hardware; it only processes the data. Pure Data, for example, commonly uses PortAudio to send the audio data to the sound card, but I will be using Core Audio instead. Additionally, I’m using the C++ wrapper from libpd.

An instance of PdBase is first created with the desired input/output channels and sample rate, and a struct holds the data that needs to be kept around; its purpose will become clear further on.

// Signature of the effect-processing callback (matches EffectProcess, defined further down).
typedef void (*EffectProc)(AudioBufferList*, UInt32);

struct TestData
{
    AudioUnit outputUnit;
    EffectProc effectProc;

    PdBase* pd;
    Patch pdPatch;
    float* pdBuffer;
    int pdTicks;
    int pdSamplesPerBlock;

    CFRingBuffer<float> ringBuffer;
    int maxFramesPerSlice;
    int framesInReserve;
};

int main(int argc, const char * argv[])
{
    PdBase pd;
    pd.init(0, 2, 48000); // No input needed for tests.

    TestData testData;
    testData.pd = &pd;
    testData.pdPatch = pd.openPatch("receiver.pd", ".");
}

Next, we ask Core Audio for an output Audio Unit that we can use to send audio data to the sound card.

int main(int argc, const char * argv[])
{
    PdBase pd;
    pd.init(0, 2, 48000); // No input needed for tests.

    TestData testData;
    testData.pd = &pd;
    testData.pdPatch = pd.openPatch("receiver.pd", ".");

    {
        AudioComponentDescription outputcd = {0};
        outputcd.componentType = kAudioUnitType_Output;
        outputcd.componentSubType = kAudioUnitSubType_DefaultOutput;
        outputcd.componentManufacturer = kAudioUnitManufacturer_Apple;

        AudioComponent comp = AudioComponentFindNext(NULL, &outputcd);
        if (comp == NULL)
        {
            std::cerr << "Failed to find matching Audio Unit.\n";
            exit(EXIT_FAILURE);
        }

        OSStatus error;
        error = AudioComponentInstanceNew(comp, &testData.outputUnit);
        if (error != noErr)
        {
            std::cerr << "Failed to open component for Audio Unit.\n";
            exit(EXIT_FAILURE);
        }

        Float64 sampleRate = 48000;
        UInt32 dataSize = sizeof(sampleRate);
        error = AudioUnitSetProperty(testData.outputUnit,
                                     kAudioUnitProperty_SampleRate,
                                     kAudioUnitScope_Input,
                                     0, &sampleRate, dataSize);

        AudioUnitInitialize(testData.outputUnit);
    }
}

The next part needs some explanation, because we need to consider how the Pure Data patch interacts with Core Audio’s render callback function that we will provide. This function will be called continuously on a high priority thread with a certain number of frames that we need to fill with audio data. Pure Data, by default, processes 64 samples per channel per block. It’s unlikely that these two numbers (the number of frames that Core Audio wants and the number of frames processed by Pure Data) will always agree. For example, in my initial tests, Core Audio specified its maximum block size to be 512 frames, but it actually asked for 470 & 471 (alternating) when it ran. Rather than trying to force the two to match block sizes, I use a ring buffer as a medium between the two — that is, read sample data from the opened Pure Data patch into the ring buffer, and then read from the ring buffer into the buffers provided by Core Audio.
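The CFRingBuffer<float> used in the TestData struct is my own class and isn’t shown in this post; as a rough sketch, the interface it needs (resize/write/read) boils down to something like this (no locking is required here because both ends are touched from the render callback thread):

#include <cstddef>
#include <vector>

// Sketch of a minimal float ring buffer with the interface used below.
// Call resize() before use; reads and writes wrap around the end of the buffer.
class SimpleRingBuffer {
public:
    void resize(size_t n) { data.assign(n, 0.0f); readPos = writePos = 0; }

    void write(float sample) {
        data[writePos] = sample;
        writePos = (writePos + 1) % data.size();
    }

    float read() {
        float sample = data[readPos];
        readPos = (readPos + 1) % data.size();
        return sample;
    }

private:
    std::vector<float> data;
    size_t readPos = 0;
    size_t writePos = 0;
};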

Fortunately, Core Audio can be queried for the maximum number of frames it will ask for, so this will determine the number of samples we read from the Pure Data patch. We can read a multiple of Pure Data’s 64-sample block by specifying a value for “ticks” in libpd, and this value will just be equal to the maximum frames from Core Audio divided by Pure Data’s block size. The actual number of samples read/processed will of course be multiplied by the number of channels (2 in this case for stereo).

The final point on this is to handle the case where the actual number of frames processed in a block is less than the maximum. Obviously it would only take a few blocks for the ring buffer’s write pointer to catch up with the read pointer and cause horrible audio artifacts. To account for this, I make the ring buffer twice as long as the number of samples required per block to give it some breathing room, and also keep track of the number of frames in reserve currently in the ring buffer at the end of each block. When this number exceeds the number of frames being processed in a block, no processing from the patch occurs, giving the ring buffer a chance to empty out its backlog of frames.

int main(int argc, const char * argv[])
{
    <snip> // As above.

    UInt32 framesPerSlice;
    UInt32 dataSize = sizeof(framesPerSlice);
    AudioUnitGetProperty(testData.outputUnit,
                         kAudioUnitProperty_MaximumFramesPerSlice,
                         kAudioUnitScope_Global,
                         0, &framesPerSlice, &dataSize);
    testData.pdTicks = framesPerSlice / pd.blockSize();
    testData.pdSamplesPerBlock = (pd.blockSize() * 2) * testData.pdTicks; // 2 channels for stereo output.
    testData.maxFramesPerSlice = framesPerSlice;

    AURenderCallbackStruct renderCallback;
    renderCallback.inputProc = AudioRenderProc;
    renderCallback.inputProcRefCon = &testData;
    AudioUnitSetProperty(testData.outputUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input,
                         0, &renderCallback, sizeof(renderCallback));

    testData.pdBuffer = new float[testData.pdSamplesPerBlock];
    testData.ringBuffer.resize(testData.pdSamplesPerBlock * 2); // Twice as long as needed in order to give it some buffer room.
    testData.effectProc = EffectProcess;
}

With the output Audio Unit and Core Audio now set up, let’s look at the render callback function. It reads the audio data from the Pure Data patch if needed into the ring buffer, which in turn fills the buffer list provided by Core Audio. The buffer list is then passed on to the callback that processes the effect being tested.

OSStatus AudioRenderProc (void *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp *inTimeStamp,
                          UInt32 inBusNumber,
                          UInt32 inNumberFrames,
                          AudioBufferList *ioData)
    {
        TestData *testData = (TestData *)inRefCon;

        // No input channels are used, but libpd still requires a valid buffer pointer.
        float inBuffer[1] = {0.0f};

        // Only read from Pd patch if the sample excess is less than the number of frames being processed.
        // This effectively empties the ring buffer when it has enough samples for the current block, preventing the
        // write pointer from catching up to the read pointer.
        if (testData->framesInReserve < inNumberFrames)
        {
            testData->pd->processFloat(testData->pdTicks, inBuffer, testData->pdBuffer);
            for (int i = 0; i < testData->pdSamplesPerBlock; ++i)
            {
                testData->ringBuffer.write(testData->pdBuffer[i]);
            }
            testData->framesInReserve += (testData->maxFramesPerSlice - inNumberFrames);
        }
        else
        {
            testData->framesInReserve -= inNumberFrames;
        }

        // NOTE: Audio data from Pd patch is interleaved, whereas Core Audio buffers are non-interleaved.
        for (UInt32 frame = 0; frame < inNumberFrames; ++frame)
        {
            Float32 *data = (Float32 *)ioData->mBuffers[0].mData;
            data[frame] = testData->ringBuffer.read();
            data = (Float32 *)ioData->mBuffers[1].mData;
            data[frame] = testData->ringBuffer.read();
        }

        if (testData->effectProc != nullptr)
        {
            testData->effectProc(ioData, inNumberFrames);
        }

        return noErr;
    }

Finally, let’s see the callback function that processes the filter. It’s about as simple as it gets — it just processes the filter effect being tested on the audio signal that came from Pure Data.

void EffectProcess(AudioBufferList* audioData, UInt32 numberOfFrames)
{
    for (UInt32 frame = 0; frame < numberOfFrames; ++frame)
    {
        Float32 *data = (Float32 *)audioData->mBuffers[0].mData;
        data[frame] = filter.left.sample(data[frame]);
        data = (Float32 *)audioData->mBuffers[1].mData;
        data[frame] = filter.right.sample(data[frame]);
    }
}

Not quite done yet, though, since we need to subscribe the open libpd instance of Pure Data to the messages we want to receive from the control patch. The messages received will then be dispatched inside the C++ code to handle appropriate behavior.

int main(int argc, const char * argv[])
{
    <snip> // As above.

    pd.subscribe("fromPd_filterfreq");
    pd.subscribe("fromPd_filtergain");
    pd.subscribe("fromPd_filterbw");
    pd.subscribe("fromPd_filtertype");
    pd.subscribe("fromPd_quit");

    // Start audio processing.
    pd.computeAudio(true);
    AudioOutputUnitStart(testData.outputUnit);

    bool running = true;
    while (running)
    {
        while (pd.numMessages() > 0)
        {
            Message msg = pd.nextMessage();
            switch (msg.type)
            {
                case pd::PRINT:
                    std::cout << "PRINT: " << msg.symbol << "\n";
                    break;

                case pd::BANG:
                    std::cout << "BANG: " << msg.dest << "\n";
                    if (msg.dest == "fromPd_quit")
                    {
                        running = false;
                    }
                    break;

                case pd::FLOAT:
                    std::cout << "FLOAT: " << msg.num << "\n";
                    if (msg.dest == "fromPd_filterfreq")
                    {
                        filter.left.setFrequency(msg.num);
                        filter.right.setFrequency(msg.num);
                    }
                    else if (msg.dest == "fromPd_filtertype")
                    {
                        // (filterType is just an array containing the available filter types.)
                        filter.left.setState(filterType[(unsigned int)msg.num]);
                        filter.right.setState(filterType[(unsigned int)msg.num]);
                    }
                    else if (msg.dest == "fromPd_filtergain")
                    {
                        filter.left.setGain(msg.num);
                        filter.right.setGain(msg.num);
                    }
                    else if (msg.dest == "fromPd_filterbw")
                    {
                        filter.left.setBandwidth(msg.num);
                        filter.right.setBandwidth(msg.num);
                    }
                    break;

                default:
                    std::cout << "Unknown Pd message.\n";
                    std::cout << "Type: " << msg.type << ", " << msg.dest << "\n";
                    break;
            }
        }
    }
}

Once the test has ended by banging the stop_test button on the control patch, cleanup is as follows:

int main(int argc, const char * argv[])
{
    <snip> // As above.

    pd.unsubscribeAll();
    pd.computeAudio(false);
    pd.closePatch(testData.pdPatch);

    AudioOutputUnitStop(testData.outputUnit);
    AudioUnitUninitialize(testData.outputUnit);
    AudioComponentInstanceDispose(testData.outputUnit);

    delete[] testData.pdBuffer;

    return 0;
}

The raw synth tone in the receiver patch used as the test signal is actually built with the PolyBLEP oscillator I made and discussed in a previous post. So it’s also possible (and very easy) to compile custom Pure Data externals into libpd, and that’s pretty awesome! Here is a demonstration of what I’ve been talking about — testing a state-variable filter on a raw synth tone:

Pure Data & libpd Interactive Demo from Christian on Vimeo.

Building a Tone Generator for iOS using Audio Units

Over the past month or so, I’ve been working with a friend and colleague, George Hufnagl, on building an iOS app for audiophiles that includes several useful functions and references for both linear and interactive media.  The app is called “Pocket Audio Tools” and will be available sometime in mid-August for iPhone, with iPad native and OS X desktop versions to follow.  Among the functions included in the app is one that displays frequency values for all pitches in the MIDI range with adjustable A4 tuning.  As an extension of this, we decided to include playback of a pure sine wave for any pitch selected as further reference.  While standard audio playback on iOS is quite straightforward, building a tone generator requires manipulation of the actual sample data, and for that level of control, you need to access the lowest audio layer on iOS — Audio Units.
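As a quick aside, the pitch-to-frequency mapping mentioned above is the standard equal-temperament formula, with the A4 reference as the adjustable tuning parameter. A sketch (not the app’s actual code):

#include <cmath>

// Equal-temperament frequency for a MIDI note; note 69 is A4 (440 Hz by default).
static double frequencyForMidiNote(int midiNote, double a4 = 440.0)
{
    return a4 * std::pow(2.0, (midiNote - 69) / 12.0);
}
// e.g. frequencyForMidiNote(60) is roughly 261.63 Hz (middle C).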

To build the tone generator, we set up an Objective-C class (simply derived from NSObject) that takes care of all the operations we need to initialize and play back our sine tone.  To keep things reasonably short, I’m just going to focus on the specifics of initializing and using Audio Units.

Here is what we need to initialize our tone generator; we’re setting up an output unit with input supplied by a callback function.

AudioComponentDescription defaultOutputDescription = { 0 };
defaultOutputDescription.componentType = kAudioUnitType_Output;
defaultOutputDescription.componentSubType = kAudioUnitSubType_RemoteIO;
defaultOutputDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
defaultOutputDescription.componentFlags = 0;
defaultOutputDescription.componentFlagsMask = 0;

AudioComponent defaultOutput = AudioComponentFindNext(NULL, &defaultOutputDescription);
OSStatus error = AudioComponentInstanceNew(defaultOutput, &_componentInstance);

Above, we set our type and sub type to an output Audio Unit whose I/O interfaces directly with the device hardware (specified by RemoteIO).  Currently on iOS, no third-party AUs are allowed, so the only manufacturer is Apple.  The two flag fields are set to 0 as per the Apple documentation.  Next we search for the matching Audio Component by passing in NULL as the first argument to the function call, and then we create the instance.

Once an Audio Unit instance has been created, its properties are customized through calls to AudioUnitSetProperty().  Here we set up the callback function used to populate the buffers with the sine wave for our tone generator.

AURenderCallbackStruct input;
input.inputProc = RenderTone;
input.inputProcRefCon = (__bridge void *)(self);
error = AudioUnitSetProperty(_componentInstance, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &input, sizeof(input));

After setting the callback function RenderTone (we’ll define this function later) and our user data (self in this case because we want the instance of the class we’re defining to be our user data in the callback routine), we set this property to the AU instance.  The scope of the Audio Unit refers to the context for which the property applies, in this case input.  The “0” argument is the element number, and we want the first one, so we specify “0”.

Next we need to specify the type of stream the audio callback function will be expecting.  Again we use the AudioUnitSetProperty() function call, this time passing in an instance of  AudioStreamBasicDescription.

AudioStreamBasicDescription streamFormat = { 0 };
streamFormat.mSampleRate = _sampleRate;
streamFormat.mFormatID = kAudioFormatLinearPCM;
streamFormat.mFormatFlags = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
streamFormat.mBytesPerPacket = 4; // 4 bytes for 'float'
streamFormat.mBytesPerFrame = 4; // sizeof(float) * 1 channel
streamFormat.mFramesPerPacket = 1; // 1 frame per packet for uncompressed (PCM) audio
streamFormat.mChannelsPerFrame = 1; // 1 channel
streamFormat.mBitsPerChannel = 8 * 4; // 1 channel * 8 bits/byte * sizeof(float)
error = AudioUnitSetProperty(_componentInstance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0,
                                     &streamFormat, sizeof(AudioStreamBasicDescription));

The format identifier in our case is linear PCM since we’ll be dealing with non-compressed audio data.  We’ll also be using native floating point as our sample data, and because we’ll be using only one channel, our buffers don’t need to be interleaved.  The bytes per packet field is set to 4 since we’re using floating point numbers (which are of course 4 bytes each), and bytes per frame is the same because in mono format, 1 frame is equal to 1 sample.  In uncompressed formats, we need 1 frame per packet, and we specify 1 channel for mono.  Bits per channel is just calculated as 8 * sizeof our data type, which is float.

This completes the setup of our Audio Unit instance, and we then just initialize it with a call to

AudioUnitInitialize(_componentInstance);

Before looking at the callback routine, there is one extra feature that I included in this tone generator.  Simply playing back a sine wave results in noticeable clicks and pops in the audio due to the abrupt amplitude changes, so I added fade-ins and fade-outs to the playback.  This is accomplished simply by storing arrays of fade-in and fade-out amplitude values that shape the audio based on the current playback state.  For our purposes, setting a base amplitude level of 0.85 and using linear interpolation to build the arrays up to this value is sufficient.  Now we can examine the callback function.

OSStatus RenderTone (void* inRefCon, AudioUnitRenderActionFlags* ioActionFlags,
                     const AudioTimeStamp* inTimeStamp,
                     UInt32 inBusNumber,
                     UInt32 inNumberFrames,
                     AudioBufferList* ioData)
{
    PAToneGenerator *toneGenerator = (__bridge PAToneGenerator*)(inRefCon);
    Float32 *buffer = (Float32*)ioData->mBuffers[0].mData;

    for (UInt32 frame = 0; frame < inNumberFrames; ++frame)
    {

We only need to consider the following arguments: inRefCon, inNumberFrames, and ioData.  First we need to cast our user data to the right type (our tone generator class), and then get our buffer that we will be filling with data.  From here, we need to determine the amplitude of our sine wave based on the playback state.

switch ( toneGenerator->_state ) {
    case TG_STATE_IDLE:
        toneGenerator->_amplitude = 0.f;
        break;

    case TG_STATE_FADE_IN:
        if ( toneGenerator->_fadeInPosition < kToneGeneratorFadeInSamples ) {
            toneGenerator->_amplitude = toneGenerator->_fadeInCurve[toneGenerator->_fadeInPosition];
            ++toneGenerator->_fadeInPosition;
        } else {
            toneGenerator->_fadeInPosition = 0;
            toneGenerator->_state = TG_STATE_SUSTAINING;
        }
        break;

    case TG_STATE_FADE_OUT:
        if ( toneGenerator->_fadeOutPosition < kToneGeneratorFadeOutSamples ) {
            toneGenerator->_amplitude = toneGenerator->_fadeOutCurve[toneGenerator->_fadeOutPosition];
            ++toneGenerator->_fadeOutPosition;
        } else {
            toneGenerator->_fadeOutPosition = 0;
            toneGenerator->_state = TG_STATE_IDLE;
        }
        break;

    case TG_STATE_SUSTAINING:
        toneGenerator->_amplitude = kToneGeneratorAmplitude;
        break;

    default:
        toneGenerator->_amplitude = 0.f;
        break;
}
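The fade curves indexed above aren’t shown in the post; they could be built with straightforward linear interpolation up to the base amplitude, along these lines (a sketch; the table lengths here are made up, and in the real class the tables would live in the _fadeInCurve/_fadeOutCurve instance variables):

// Sketch: linearly interpolated fade tables rising to / falling from the base amplitude.
#define kToneGeneratorFadeInSamples  4410   // hypothetical: ~0.1 s at 44.1 kHz
#define kToneGeneratorFadeOutSamples 4410
#define kToneGeneratorAmplitude      0.85f

static float fadeInCurve[kToneGeneratorFadeInSamples];
static float fadeOutCurve[kToneGeneratorFadeOutSamples];

static void buildFadeCurves(void)
{
    for (int i = 0; i < kToneGeneratorFadeInSamples; ++i) {
        float t = (float)i / (float)(kToneGeneratorFadeInSamples - 1);   // 0 -> 1
        fadeInCurve[i] = t * kToneGeneratorAmplitude;                    // ramp up to 0.85
    }
    for (int i = 0; i < kToneGeneratorFadeOutSamples; ++i) {
        float t = (float)i / (float)(kToneGeneratorFadeOutSamples - 1);  // 0 -> 1
        fadeOutCurve[i] = (1.0f - t) * kToneGeneratorAmplitude;          // ramp down from 0.85
    }
}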

Once we have the amplitude, we simply fill the buffer with the sine wave.

buffer[frame] = sinf(toneGenerator->_phase) * toneGenerator->_amplitude;

toneGenerator->_phase += toneGenerator->_phase_incr;
toneGenerator->_phase = ( toneGenerator->_phase > kTwoPi ? toneGenerator->_phase - kTwoPi : toneGenerator->_phase );
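The phase increment used above isn’t derived in the post; it is just the angular step per sample for the selected frequency, recomputed whenever the pitch changes.  A sketch:

#include <math.h>

// Sketch: radians to advance the oscillator phase per sample.
// This is the value that would be stored in _phase_incr when the frequency changes.
static inline double phaseIncrement(double frequencyHz, double sampleRate)
{
    return 2.0 * M_PI * frequencyHz / sampleRate;
}
// e.g. 440 Hz at 44.1 kHz gives roughly 0.0627 radians per sample.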

One important thing to bear in mind is that the callback function is continually called after we start the Audio Unit with

AudioOutputUnitStart(_componentInstance);

We don’t need to start and stop the entire instance each time a tone is played because that could very well introduce some delays in playback, so the callback continually runs in the background while the Audio Unit is active.  That way, calls to play and stop the tone are as simple as changing the playback state.

- (void)playTone
{
    if ( _isActive) {
        _state = TG_STATE_FADE_IN;
    }
}

- (void)stopTone
{
    if ( _isActive ) {
        if ( _state == TG_STATE_FADE_IN ) {
            _state = TG_STATE_IDLE;
        } else {
            _state = TG_STATE_FADE_OUT;
        }
    }
}

The reason for the extra check in stopTone() is to prevent clicks from occurring if the playback time is very short.  In other words, if the playback state has not yet reached its sustain point, we don’t want any sound output.

To finish off, we can stop the Audio Unit by calling

AudioOutputUnitStop(_componentInstance);

and free the resources with calls to

AudioUnitUninitialize(_componentInstance);
AudioComponentInstanceDispose(_componentInstance);
_componentInstance = nil;

That completes this look at writing a tone generator for output on iOS.  Apple’s Audio Units layer is a sophisticated and flexible system that can be used for all sorts of audio needs including effects, analysis, and synthesis.  Keep an eye out for Pocket Audio Tools, coming soon to iOS!