
Animating Views with Custom Timing Functions on iOS

View animation is a ubiquitous part of the iOS UI. From presenting and dismissing view controllers to navigation transitions, and on a smaller scale, subviews like buttons and labels, animations are everywhere, and they are an important part of adding polish to the look and feel of an app.

iOS already provides default animations for the aforementioned actions on view controllers and navigation controllers, as well as a few others, but animating subviews or customizing view/navigation controller transitions is left up to the developer. Typically this is done using the block-based animation methods in the UIView class (-animateWithDuration:animations:, etc.). One of these variants allows you to pass in an option specifying one of four predefined animation curves, or timing functions: ease in, ease out, ease in-out (the default), and linear. The timing function is what drives the pacing of the animation. Linear is fine for certain properties like alpha or color, but for movement-based animations, a curve with acceleration/deceleration feels much better. However, the predefined easing functions are very subtle, and there will certainly be times when you want something better.

For more control and flexibility, layer-based animations are available. On iOS, each view is backed by a CALayer object that defines the view's visual appearance and geometry. In fact, when you manipulate a view's properties such as frame, center, or alpha, the view actually applies those changes to its backing CALayer object.

Animating CALayers is done through one of the subclasses of CAAnimation, which offers additional options for defining animations, including basic and key-frame animation. As with the block-based methods of UIView, a timing function can be specified, with the same set of predefined easing curves plus an additional system-default option. The timing function in this case is an instance of CAMediaTimingFunction, and it also allows you to create a custom function with a call to +functionWithControlPoints::::. However, this custom timing function is created by supplying the control points of a cubic Bezier curve, and while that does offer some additional flexibility in creating more unique animations, it can't handle piecewise functions (e.g. for bounces), multiple curves (e.g. elastic or spring-like functions), etc. To achieve those effects, you would either have to stitch animations together or use a combination of custom Bezier curves with key-frame animations. Both approaches end up being tedious and cumbersome.

Ideally, the block-based animation methods of UIView would allow you to pass in a custom timing function block like this:

[UIView animateWithDuration:timingFunction:animations:];

where timingFunction is just a block that takes in the current animation time and its duration, and returns a fractional value between 0 and 1. That's all a timing function is! This would allow you to, for example, use any of the easing functions here, or supply your own homebuilt version. So in the true spirit of DIY (since Apple couldn't be bothered to do it), let's make that method!
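For example, a simple linear timing block under this scheme would just return the fraction of the animation completed:

NSTimeInterval (^linearTiming)(CGFloat, CGFloat) = ^NSTimeInterval(CGFloat t, CGFloat d) {
    return t / d; // 0 at the start of the animation, 1 at the end
};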

To begin with, here is the actual signature of our method with an example of how it is used:

[UIView fs_animateWithDuration:1.2 timingFunction:FSEaseOutBack animations:^{
    aView.fs_center = newCenter;
} completion:^{
    centerYConstraint.constant = ...;
    centerXConstraint.constant = ...;
}];

If constraint-based layout is being used, the same rules apply as with the other animation methods: once an animation has completed, constraints need to be updated to reflect the new position and/or bounds of the view.

Since the block-based animation methods are class methods, that clues us in that we need to keep some global state for in-flight animations. For now, I’m not going to worry about supporting multiple concurrent or nested animations, so I’ll keep it simple and make the state static.

typedef NSTimeInterval(^FSAnimationTimingBlock)(CGFloat,CGFloat);
typedef void(^FSCompletionBlock)(void);

static const NSTimeInterval fsAnimationFrameTime = 1.0/60.0; // 60 fps
static FSAnimationTimingBlock fsAnimationTimingFunc;
static FSCompletionBlock fsAnimationCompletionFunc;
static NSTimeInterval fsAnimationDuration;
static NSMutableArray<NSTimer*> *fsAnimationTimers;
static BOOL fsAnimationInFlight = NO;
static NSUInteger fsAnimationTimerCount = 0;
static NSUInteger fsAnimationTimersCompleted = 0;

Now we create a category on UIView for our block-based animation method:

@interface UIView (FSAnimations)
@property (nonatomic) CGPoint fs_center;
@property (nonatomic) CGRect fs_bounds;
@property (nonatomic) CGFloat fs_alpha;
// Additional animatable properties to support...

+ (void)fs_animateWithDuration:(NSTimeInterval)duration timingFunction:(FSAnimationTimingBlock)timingFunc animations:(void(^)(void))animations completion:(FSCompletionBlock)completion;

@end

@implementation UIView (FSAnimations)

+ (void)fs_animateWithDuration:(NSTimeInterval)duration timingFunction:(FSAnimationTimingBlock)timingFunc animations:(void (^)(void))animations completion:(FSCompletionBlock)completion {
    // Set up global state for this animation.
    fsAnimationTimingFunc = [timingFunc copy];
    fsAnimationCompletionFunc = [completion copy];
    fsAnimationDuration = duration;
    fsAnimationInFlight = YES;
    fsAnimationTimers = [NSMutableArray array];
    
    // Run the animation block which queues up the animations for each property of the views contained in the block.
    animations();
    
    // Run the animations.
    FSLaunchQueuedAnimations();
}

@end

The reason we define custom animatable properties like fs_center, fs_alpha, etc. instead of setting them directly (i.e. aView.center = newCenter;) will become clearer as we move forward. In the method above, the global state of the animation is set up and then we execute the block that defines which properties on which views will be animated. Recall that we had a line that looks like this in the example above:

aView.fs_center = newCenter;

That line is executed when the animations block runs in the method above, and here is what the property's setter does:

- (void)setFs_center:(CGPoint)newCenter {
    NSAssert(fsAnimationInFlight, @"Property %@ can only be called inside a 'fs_animateWithDuration:...' block.", NSStringFromSelector(_cmd));
    
    CGPoint baseValue = self.center;
    CGPoint targetValue = newCenter;
    
    __block NSTimeInterval currentTime = 0.0;
    NSTimer *timer = [NSTimer timerWithTimeInterval:fsAnimationFrameTime repeats:YES block:^(NSTimer * _Nonnull timer) {
        currentTime += timer.timeInterval;
        NSTimeInterval t = fsAnimationTimingFunc(currentTime, fsAnimationDuration);
        CGPoint val = CGPointMake(lerp(baseValue.x, t, targetValue.x), lerp(baseValue.y, t, targetValue.y));
        self.center = val;
        
        FSCheckTimerAndCompleteIfLast(timer, currentTime);
    }];
    
    [fsAnimationTimers addObject:timer];
}

(We can now see the reason for using custom properties like fs_center instead of the view's center property. While it is possible to override an existing property's setter in a category, we would need to set the new value of the real property inside it, which in this case would cause infinite recursion.)

When an animatable fs_ property is set inside an animation block, it sets up the local state for that property, including its timer, which is queued up to run after this has been done for all properties in the block. Inside the timer's block, we can see that the timing function we supply is used to calculate a value, t, used to blend between the initial state and the target state of the property (lerp here is just a normal linear interpolation function; a minimal version is sketched below). The intermediate value of the property is set, and the block then checks whether this timer has reached the end of the animation's duration. If it has, and it's the last timer to do so, it runs the animation's completion block we provided and then resets the global state.

static void
FSCheckTimerAndCompleteIfLast(NSTimer *timer, NSTimeInterval currentTime) {
    if (currentTime >= fsAnimationDuration) {
        [timer invalidate];
        
        if (++fsAnimationTimersCompleted == fsAnimationTimerCount) {
            fsAnimationInFlight = NO;
            fsAnimationCompletionFunc();

            fsAnimationCompletionFunc = nil;
            fsAnimationTimingFunc = nil;            
            fsAnimationTimerCount = 0;
            fsAnimationTimersCompleted = 0;
            [fsAnimationTimers removeAllObjects];
            fsAnimationTimers = nil;
        }
    }
}
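
For completeness, the lerp helper used in the setter above could be as simple as the following (a minimal sketch; the argument order matches the call sites, with the blend factor as the middle parameter):

// Minimal linear interpolation: blends from 'a' to 'b' by factor 't'
// (t = 0 gives a, t = 1 gives b; values of t outside 0..1 overshoot,
// which is what allows easing curves like ease-out-back to work).
static inline CGFloat lerp(CGFloat a, NSTimeInterval t, CGFloat b) {
    return a + (b - a) * (CGFloat)t;
}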

The function to start the animations simply runs through all the queued-up timers in the array and schedules them on the current run loop (which must be the main thread, since UIKit drawing must happen on the main thread). This is done to keep the animations as closely synced as possible, as opposed to starting each timer in its respective property's setter (which could cause problems with short durations and/or a long list of views and properties being animated).

static void
FSLaunchQueuedAnimations() {
    fsAnimationTimerCount = fsAnimationTimers.count;
    [fsAnimationTimers enumerateObjectsUsingBlock:^(NSTimer * _Nonnull timer, NSUInteger idx, BOOL * _Nonnull stop) {
        [[NSRunLoop currentRunLoop] addTimer:timer forMode:NSDefaultRunLoopMode];
    }];
}

The only thing remaining is an example of an easing function that can be used as a custom timing function for this method. This is one of Robert Penner’s Easing Functions (the link referenced above is a handy visual aid for them all).

NSTimeInterval(^FSEaseOutBack)(CGFloat, CGFloat) = ^NSTimeInterval(CGFloat t, CGFloat d) {
    CGFloat s = 1.70158;
    t = t/d - 1.0;
    return (t * t * ((s + 1.0) * t + s) + 1.0);
};

With that, we have a concise and convenient way of animating with a much wider variety of easing functions. This method can quite easily be used to customize existing transition animations on view/navigation controllers via their transitioningDelegate (a UIViewControllerTransitioningDelegate).

Finally, here is a short demo of several animations using the method we just created. The timing functions used in this demo include EaseOutBack, EaseOutElastic, EaseOutBounce, and EaseInBack.

(Video: animation_demo)


Improving the interface to NSUserDefaults using the Objective-C Runtime

The NSUserDefaults class is commonly used in iOS/macOS apps to store user settings or configuration options, and can also be a convenient way of caching a small number of values or objects while an app is active. Its interface, while not terrible, is a little clunky and relies on strings to identify the value you wish to store in the defaults database. Typical usage looks like this:

NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
//...
BOOL hasAppRun = [defaults boolForKey:@"HasAppRun"];
if (!hasAppRun) {
    // App's first launch
    //...
    [defaults setBool:YES forKey:@"HasAppRun"];
}

Other than being a little wordy, the main issue here is dealing with the strings used to identify the defaults property you are interested in. Either your app's codebase will be littered with magic string constants, or you'll have an ever-evolving global strings table (or file, or something else to manage your app's global string constants). What we would like is something a bit cleaner, like this:

MyUserDefaults *defaults = [MyUserDefaults sharedDefaults];
//...
if (!defaults.hasAppRun) {
    // App's first launch
    //...
    defaults.hasAppRun = YES;
}

In its simplest case, MyUserDefaults would just be a wrapper around NSUserDefaults, but as we see below, this only adds to the amount of boilerplate we have to manage for each new default property that is added.

@implementation MyUserDefaults

- (void)setHasAppRun:(BOOL)hasAppRun {
    [[NSUserDefaults standardUserDefaults] setBool:hasAppRun forKey:@"HasAppRun"];
}

- (BOOL)hasAppRun {
    return [[NSUserDefaults standardUserDefaults] boolForKey:@"HasAppRun"];
}

@end

What we can do instead is take advantage of the dynamic nature of Objective-C to automatically handle any properties that are added (or removed) from MyUserDefaults. Once this has been set up in a base class (we’ll call it FSUserDefaults), MyUserDefaults only needs to inherit from this class and then declare its properties like this:

@interface MyUserDefaults : FSUserDefaults
@property (nonatomic) BOOL hasAppRun;
@property (nonatomic) NSString *userName;
// Other properties...
@end

@implementation MyUserDefaults
@dynamic hasAppRun;
@dynamic userName;
@end

Just add the property to the interface and declare it as @dynamic in the implementation. That’s it! So how is this going to work?

First of all, methods in Objective-C are really just C functions with two hidden parameters: self and _cmd, where self is the object on which the method was invoked, and _cmd is the selector. A selector in Objective-C is just a way to identify a method, but we can get some very useful information about a property just from its selector. Before we get to that, however, we need to have a look at what @dynamic is.

Declaring a property as @dynamic tells the Objective-C compiler that its implementation methods (the setter & getter) will be resolved at runtime. Normally the compiler generates these for you, as well as synthesizing the instance variable that backs the property. e.g.

@implementation MyUserDefaults

// Unless you override either the setter or getter, these are normally auto-generated by the compiler.
- (void)setUserName:(NSString *)userName {
    _userName = userName;
}

- (NSString *)userName {
    return _userName;
}

@end

However, with the @dynamic directive, no implementation methods for the property exist at first, which would normally cause an exception to be raised when the property is accessed in any way. Before this happens, though, the Objective-C runtime gives you an opportunity to either handle the missing implementation or forward the message invocation to another object. To handle it on the receiving object, we override +(BOOL)resolveInstanceMethod:(SEL)sel, inherited from NSObject, and as stated above, we can get all the information we need from the selector argument of this method. Furthermore, to dynamically add a method implementation to a class, we use the function class_addMethod(Class, SEL, IMP, const char*). We already have the first two arguments to the function (the class is just our FSUserDefaults base class). IMP is a function pointer to the method implementation we are providing, and the character string is the type encoding of the method (its return type and arguments). This is the information we need going forward.

Let’s use the userName property as an example. When we access this property by trying to set a value on it for the first time, +resolveInstanceMethod: will be called and the selector argument will be -setUserName:. We can get the name of the selector as a string from the Objective-C runtime function sel_getName(SEL), which takes a selector as its argument. In this case, it will be “setUserName:”, and we can get the property name by stripping off “set” from the beginning and making the ‘U’ lowercase (if instead the selector is a getter, its string is already equal to the property name). The property name string can then be used to retrieve the actual property from the class using the function class_getProperty(Class, const char*).

Here is what we have so far:

+ (BOOL)resolveInstanceMethod:(SEL)sel {
    const char *selectorName = sel_getName(sel);
    const char *propertyName = selectorName;
    BOOL isSetter = NO;
    
    if ((strlen(selectorName) > 3) && (strncmp("set", selectorName, 3) == 0)) {
        propertyName = ...; // Strip off "set" and make first character lowercase.
        isSetter = YES;
    }
    
    objc_property_t property = class_getProperty(self, propertyName);
    if (isSetter) {
        free((void *)propertyName);
    }

    //...
}
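
The elided name-derivation step could be implemented along these lines (a sketch only; note that the result is heap-allocated, which is why the code above calls free() on it):

// Hypothetical helper: turns a setter selector name like "setUserName:" into
// "userName". The result is heap-allocated, matching the free() calls above.
// (Assumes <stdlib.h> and <string.h> are available.)
static const char *FSPropertyNameFromSetter(const char *selectorName) {
    size_t length = strlen(selectorName) - 3 - 1; // drop the "set" prefix and the trailing ':'
    char *propertyName = malloc(length + 1);
    strlcpy(propertyName, selectorName + 3, length + 1);
    if (propertyName[0] >= 'A' && propertyName[0] <= 'Z') {
        propertyName[0] += 'a' - 'A'; // lowercase the first character
    }
    return propertyName;
}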

Given the property, we can now get its attributes as a string by calling property_getAttributes(objc_property_t). This formatted string contains all the information relating to the property, including its type, the class name (if the type is an object type), and attributes such as readonly, copy, etc. The two we are interested in are the type and, in one special case, the class name.

The formatted attribute string always begins with a ‘T’, followed immediately by a character code that indicates its type corresponding to what @encode returns when given a type. e.g. ‘i’ is int, ‘f’ is float, ‘I’ is unsigned int, etc. With this information we can determine which method implementation (IMP) is needed for the given selector as well as the type encoding for the class_addMethod function.

Recall that Objective-C methods are just C functions with two parameters (id for the object the method is called on, and the selector), followed by the method's arguments. IMP is just a function pointer to this C function (its declaration is id (*IMP)(id, SEL, ...)). So based on the type that we retrieved from the property's attribute string, we assign our method implementation to the proper C function, which is simply defined in the same file. Since our example uses the userName property, whose type is NSString (ultimately just an id, i.e. a generic object), we define this C function as follows:

static void idSetter(id self, SEL _cmd, id value) {
    const char *propertyName = ...; // Get property name from sel_getName(_cmd), stripping away "set" and making first character lowercase.
    [[NSUserDefaults standardUserDefaults] setObject:value forKey:[NSString stringWithUTF8String:propertyName]];
    free((void *)propertyName);
}

A function similar to the one above is required for each type we need to handle, but fortunately NSUserDefaults has only six types we need to deal with: id (objects), NSURL (a special case), BOOL, NSInteger, float, and double. NSUserDefaults handles NSURL objects differently than a normal id type, so this is where we need the class name from the property's attribute string. Immediately following the type encoding character (which is '@' in the case of an object) is the class name. We simply check whether this is equal to "NSURL", and if it is, select the corresponding IMP function for NSURL instead of the generic id variant.

Above I gave the setter version of the IMP function; the getter is very much the same except that it returns a value and takes no value argument:

static id idGetter(id self, SEL _cmd) {
    const char *propertyName = ...; // Get property name from sel_getName(_cmd).
    id value = [[NSUserDefaults standardUserDefaults] objectForKey:[NSString stringWithUTF8String:propertyName]];
    free((void *)propertyName);
    return value;
}
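
As noted above, NSURL-typed properties get their own IMP pair, since NSUserDefaults provides dedicated -setURL:forKey: and -URLForKey: methods. A hypothetical setter variant, reusing the name-derivation sketch from earlier:

// Hypothetical NSURL-specific setter IMP; stores the value with -setURL:forKey:
// instead of -setObject:forKey:. Uses the FSPropertyNameFromSetter sketch above.
static void urlSetter(id self, SEL _cmd, NSURL *value) {
    const char *propertyName = FSPropertyNameFromSetter(sel_getName(_cmd));
    [[NSUserDefaults standardUserDefaults] setURL:value
                                           forKey:[NSString stringWithUTF8String:propertyName]];
    free((void *)propertyName);
}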

In the final version of the FSUserDefaults class, I have used a lot of C macro magic to avoid having to duplicate the above setter and getter functions for each supported type, but it is given here in its simpler form for readability purposes. (The Github link to the project can be found below).

Finally, we need the type encoding string to pass to class_addMethod that indicates what the signature of the IMP function we are adding is. Since the first two arguments are always id and SEL, this string has the format “r@:a” where ‘r’ is the return type and ‘a’ is the argument type (the second and third character must always be ‘@’ and ‘:’). The type encoding string that corresponds to our example is then “v@:@”, where v indicates a void return type.

We can now complete the implementation of +resolveInstanceMethod: by calling class_addMethod and returning YES to tell the runtime system we have dynamically added the method for this selector.

+ (BOOL)resolveInstanceMethod:(SEL)sel {
    const char *selectorName = sel_getName(sel);
    const char *propertyName = selectorName;
    BOOL isSetter = NO;
    
    if ((strlen(selectorName) > 3) && (strncmp("set", selectorName, 3) == 0)) {
        propertyName = ...; // Strip off "set" and make first character lowercase.
        isSetter = YES;
    }
    
    objc_property_t property = class_getProperty(self, propertyName);
    if (isSetter) {
        free((void *)propertyName);
    }

    if (property != NULL) {
        const char *propertyAttrs = property_getAttributes(property);
        char propertyType = propertyAttrs[1];
        
        const char *methodTypeString;
        int index;
        if (isSetter) {
            methodTypeString = "v@:_";
            index = 3;
        } else {
            methodTypeString = "_@:";
            index = 0;
        }
        
        char *typeEncodingString = malloc(sizeof(char) * (strlen(methodTypeString) + 1));
        strlcpy(typeEncodingString, methodTypeString, strlen(methodTypeString) + 1);
        typeEncodingString[index] = propertyType;
        
        IMP methodImpl = ...; // Select and set C function corresponding to propertyType
        
        class_addMethod(self, sel, methodImpl, typeEncodingString);
        free(typeEncodingString);
        
        return YES;
    }
    
    return [super resolveInstanceMethod:sel];
}

This class can then be dropped into any new iOS or macOS app for a much cleaner way of using NSUserDefaults. I created this class a couple of years ago, and have used it in almost every app since.

There are many other powerful and effective ways to use the Objective-C runtime. Reflection in general is great for metaprogramming, building tools, debugging, and generic serialization.

This project is available on GitHub here.

UIResponder: An Overlooked Message-Passing Mechanism on iOS

Wow, it's been a while since I last posted on here! I am still doing a little bit of audio programming, but only as part of the audio system for a game/game engine I'm building from scratch. Aside from that, I've been doing a lot of iOS programming as part of my day job, and that got me thinking that I should diversify my blog and talk about anything programming-related that I find interesting and/or valuable. So, on to the topic at hand.

When developing an iOS app, before too long you will run into the issue of how to deliver messages or events to other parts of your app. These may include any touch events, actions, completions (e.g. as a result of async operations), or other messages, and there are several ways in which to handle these, each with their own pros/cons depending on context. Perhaps the most well-known is the delegate, which is used extensively throughout UIKit for everything from UITableViews and UICollectionViews to UITextFields and UIPickerViews. Blocks are another option that can replace delegates in some cases, and typically allow you to initialize and define behaviour all in one place without the disjointness of delegates.

For example, let’s say we have a custom lightweight table view. Blocks allow us to do this:

NSArray *match = @[@"Two", @"Four", @"Eight"];
ATableView *tv = [ATableView new];
tv.rowHeight = 44.0;
tv.data = @[@"One", @"Two", @"Three"];
tv.onSelected = ^void(NSString *item) {
    if ([match containsObject:item]) {
        // Do something...
    }
};

Yet another tool at our disposal are notifications, which are handled by NSNotificationCenter on iOS (and macOS). This is a broadcast system that delivers messages to any part of your app that may be listening. However, notifications do require some resource management as observers need to be removed once they are no longer interested in being notified (or during shutdown/deactivation). This makes them a little heavy-handed for simple message passing.
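
For reference, the typical subscribe/unsubscribe pairing looks something like this (the notification name and handler selector here are purely illustrative):

// Subscribe; the observer must later be removed when it is no longer interested.
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(somethingHappened:)
                                             name:@"FSSomethingHappenedNotification"
                                           object:nil];
// ...and at shutdown/deactivation:
[[NSNotificationCenter defaultCenter] removeObserver:self];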

I'll also briefly mention two final options for the sake of completeness (though there may be others that I'm forgetting at the moment). The target-action mechanism used by UIControl adds target-selector pairs to a list that the control invokes when a certain event occurs. And finally, NSInvocation encapsulates the message-sending mechanism of Objective-C into an object that can be passed around and invoked by different parts of your app.

So, why UIResponder, and how does it fit into this? The UIResponder class forms the foundation of handling events in UIKit, and implements methods that respond to touch events, motion events, etc. Most importantly for our case though, is its nextResponder property. UIKit manages what is called the responder chain for all objects that inherit from UIResponder and that are currently visible or active in the app. This includes views, buttons, view controllers, the window, and even the application itself. A more detailed look at the responder chain and how each object establishes its next responder connection can be seen here, but to put it simply, responder objects “walk up” the responder chain via the nextResponder property until the event is handled or there are no responders left in the chain. This behaviour can be leveraged for sending messages between responder objects using categories on UIResponder.

As an example, let’s return to the custom table view referenced above because this is something I implemented recently in an app I am working on. Instead of using a delegate to notify an object that a row has been selected, this table view uses a category on UIResponder.

// Declared in ATableView.h
@interface UIResponder (ATableView)
- (void)fs_tableView:(ATableView *)tableView selectedRow:(ATableViewRow *)row atIndex:(NSUInteger)index;
@end

The implementation of this method looks like this:

// Defined in ATableView.m
@implementation UIResponder (ATableView)
- (void)fs_tableView:(ATableView *)tableView selectedRow:(ATableViewRow *)row atIndex:(NSUInteger)index {
    [[self nextResponder] fs_tableView:tableView selectedRow:row atIndex:index];
}
@end

In fact, that implementation is almost identical no matter how you define your method in the category. Only the method name and (possibly) signature change, but the basic mechanism of invoking it on the next responder stays the same. In several cases I have had methods that differ only by name and have the same signature ((id)sender), so the implementation can further be simplified using C macros:

#define FS_UIRESPONDER_IMPL(name)       \
- (void)name:(id)sender {               \
    [[self nextResponder] name:sender]; \
}

@implementation UIResponder (AControl)
FS_UIRESPONDER_IMPL(fs_viewItem)
FS_UIRESPONDER_IMPL(fs_editItem)
FS_UIRESPONDER_IMPL(fs_deleteItem)
@end

And that’s it for setup! To send this event when a row has been selected, the table view would invoke this method when it detects a touch up event inside a table row. Here is how it looks in my case:

- (void)touchesEnded:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    UITouch *touch = touches.allObjects.firstObject;
    if (touch) {
        CGPoint touchPoint = [touch locationInView:self];
        
        ATableViewRow *selectedRow = ...
        // Determine which, if any, row has been selected.
        
        if (selectedRow) {
            [self fs_tableView:self selectedRow:selectedRow atIndex:index];
        }
    }
        
    [super touchesEnded:touches withEvent:event];
}

To be notified of this message/event, any UIResponder object implements the method and handles any logic accordingly, and invokes super if the event should continue through the responder chain. In this case, it would typically be in a view controller. e.g.:

// Implemented in the view controller containing the table view
- (void)fs_tableView:(ATableView *)tableView selectedRow:(ATableViewRow *)row atIndex:(NSUInteger)index {
    // Do something...
    // Call super if needed. i.e. if an ancestor also wants to know about this event.
    // [super fs_tableView:tableView selectedRow:row atIndex:index];
}

This strategy is also great for when table view cells need to send messages. For example, if a cell has one or more buttons, you can invoke the UIResponder category method in the button's action to pass the event up the responder chain, which will pass through the table view, its superview if one exists, and on to the view controller where you can handle the event.

// Defined in ATableViewRow.m
// A UIButton either has its action set via IBAction in Interface Builder, or added with -addTarget:action:forControlEvents:
- (IBAction)buttonPressed:(id)sender {
    [self fs_editItem:sender];
}

I find this to be a nice alternative to delegates or notifications when it may be somewhat cumbersome to pass instances around and when you don't want to have to think about subscribing and unsubscribing observers for notifications. It offers flexibility similar to delegates, in that you can define the method signatures however you want, with the added benefit of being able to notify more than one responder very easily.

As a final example, I used this in a custom popover bubble view to notify other views and view controllers when the popover was dismissed by the user, following a similar pattern to the -viewWillDisappear:/-viewDidDisappear: pair of methods that exist on UIViewController. For the custom popover, I have two category methods on UIResponder called -fs_popoverWillDismiss: and -fs_popoverDidDismiss: that are handled in the view controller that presented the popover, updating the UI with some fade animations and re-enabling some controls. For such a simple, lightweight view this is a nice, concise way for it to pass along its events.

Pure Data and libpd: Integrating with Native Code for Interactive Testing

Over the past couple of years, I've built up a nice library of DSP code, including effects, oscillators, and utilities. One thing that has always bothered me, however, is how to test this code in an efficient and reliable way. The two main methods I have used in the past have their pros and cons, but ultimately neither satisfied me.

One is to process an effect or generate a source into a wave file that I can open with an audio editor so I can listen to the result and examine the output. This method is okay, but it is tedious and doesn’t allow for real-time adjustment of parameters or any sort of instant feedback.

For effects like filters, I can also generate a text file containing the frequency/phase response data that I can view in a plotting application. This is useful in some ways, but this is audio — I want to hear it!

Lately I've gotten to know Pure Data a little better, so I thought about using it for interactive testing of my DSP modules. On its own, Pure Data does not interact with code of course, but that's where libpd comes in. This is a great library that wraps up much of Pure Data's functionality so that you can use it right from your own code (it works with C, C++, Objective-C, Java, and others). Here is how I integrated it with my own code to set up a nice, flexible testing framework; and this is just one application of using libpd and Pure Data together. The possibilities go far beyond this!

First we start with the Pure Data patches. The receiver patch is opened and maintained in code by libpd, and has two responsibilities: 1) generate a test tone that the effect is applied to, and 2) receive messages from the control patch and dispatch them to C++.

Receiver patch, opened by libpd.

The control patch is opened in Pure Data and acts as the interactive patch. It has controls for setting the frequency and volume of the synthesizer tone that acts as the source, as well as controls for the filter effect that is being tested.

Control patch, opened in Pure Data, and serves as the interactive UI for testing.

As can be seen from the patches above, they communicate with each other via the netsend/netreceive objects by opening a port on the local machine. Since I'm only sending simple data to the receiver patch, I opted for UDP rather than TCP as the network protocol. (Disclaimer: my knowledge of network programming is akin to asking "what is a for loop".)

Hopefully the purpose of these two patches is clear, so we can now move on to seeing how libpd brings it all together in code. It is worth noting that libpd does not output audio to the hardware; it only processes the data. Pure Data itself, for example, commonly uses PortAudio to send the audio data to the sound card, but I will be using Core Audio instead. Additionally, I'm using the C++ wrapper from libpd.

An instance of PdBase is first created with the desired input/output channels and sample rate, and a struct holds the data that we will need to keep around, the purpose of which will become clear further on.

struct TestData
{
    AudioUnit outputUnit;
    EffectProc effectProc;

    PdBase* pd;
    Patch pdPatch;
    float* pdBuffer;
    int pdTicks;
    int pdSamplesPerBlock;

    CFRingBuffer<float> ringBuffer;
    int maxFramesPerSlice;
    int framesInReserve;
};

int main(int argc, const char * argv[])
{
    PdBase pd;
    pd.init(0, 2, 48000); // No input needed for tests.

    TestData testData;
    testData.pd = &pd;
    testData.pdPatch = pd.openPatch("receiver.pd", ".");
}

Next, we ask Core Audio for an output Audio Unit that we can use to send audio data to the sound card.

int main(int argc, const char * argv[])
{
    PdBase pd;
    pd.init(0, 2, 48000); // No input needed for tests.

    TestData testData;
    testData.pd = &pd;
    testData.pdPatch = pd.openPatch("receiver.pd", ".");

    {
        AudioComponentDescription outputcd = {0};
        outputcd.componentType = kAudioUnitType_Output;
        outputcd.componentSubType = kAudioUnitSubType_DefaultOutput;
        outputcd.componentManufacturer = kAudioUnitManufacturer_Apple;

        AudioComponent comp = AudioComponentFindNext(NULL, &outputcd);
        if (comp == NULL)
        {
            std::cerr << "Failed to find matching Audio Unit.\n";
            exit(EXIT_FAILURE);
        }

        OSStatus error;
        error = AudioComponentInstanceNew(comp, &testData.outputUnit);
        if (error != noErr)
        {
            std::cerr << "Failed to open component for Audio Unit.\n";
            exit(EXIT_FAILURE);
        }

        Float64 sampleRate = 48000;
        UInt32 dataSize = sizeof(sampleRate);
        error = AudioUnitSetProperty(testData.outputUnit,
                                     kAudioUnitProperty_SampleRate,
                                     kAudioUnitScope_Input,
                                     0, &sampleRate, dataSize);

        AudioUnitInitialize(testData.outputUnit);
    }
}

The next part needs some explanation, because we need to consider how the Pure Data patch interacts with Core Audio’s render callback function that we will provide. This function will be called continuously on a high priority thread with a certain number of frames that we need to fill with audio data. Pure Data, by default, processes 64 samples per channel per block. It’s unlikely that these two numbers (the number of frames that Core Audio wants and the number of frames processed by Pure Data) will always agree. For example, in my initial tests, Core Audio specified its maximum block size to be 512 frames, but it actually asked for 470 & 471 (alternating) when it ran. Rather than trying to force the two to match block sizes, I use a ring buffer as a medium between the two — that is, read sample data from the opened Pure Data patch into the ring buffer, and then read from the ring buffer into the buffers provided by Core Audio.

Fortunately, Core Audio can be queried for the maximum number of frames it will ask for, so this will determine the number of samples we read from the Pure Data patch. We can read a multiple of Pure Data’s 64-sample block by specifying a value for “ticks” in libpd, and this value will just be equal to the maximum frames from Core Audio divided by Pure Data’s block size. The actual number of samples read/processed will of course be multiplied by the number of channels (2 in this case for stereo).

The final point on this is to handle the case where the actual number of frames processed in a block is less than the maximum. Obviously it would only take a few blocks for the ring buffer's write pointer to catch up with the read pointer and cause horrible audio artifacts. To account for this, I make the ring buffer twice as long as the number of samples required per block to give it some breathing room, and I also keep track of the number of frames currently in reserve in the ring buffer at the end of each block. When this number exceeds the number of frames being processed in a block, no processing from the patch occurs, giving the ring buffer a chance to empty out its backlog of frames.

int main(int argc, const char * argv[])
{
    <snip> // As above.

    UInt32 framesPerSlice;
    UInt32 dataSize = sizeof(framesPerSlice);
    AudioUnitGetProperty(testData.outputUnit,
                         kAudioUnitProperty_MaximumFramesPerSlice,
                         kAudioUnitScope_Global,
                         0, &framesPerSlice, &dataSize);
    testData.pdTicks = framesPerSlice / pd.blockSize();
    testData.pdSamplesPerBlock = (pd.blockSize() * 2) * testData.pdTicks; // 2 channels for stereo output.
    testData.maxFramesPerSlice = framesPerSlice;

    AURenderCallbackStruct renderCallback;
    renderCallback.inputProc = AudioRenderProc;
    renderCallback.inputProcRefCon = &testData;
    AudioUnitSetProperty(testData.outputUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input,
                         0, &renderCallback, sizeof(renderCallback));

    testData.pdBuffer = new float[testData.pdSamplesPerBlock];
    testData.ringBuffer.resize(testData.pdSamplesPerBlock * 2); // Twice as long as needed in order to give it some buffer room.
    testData.effectProc = EffectProcess;
}

With the output Audio Unit and Core Audio now set up, let’s look at the render callback function. It reads the audio data from the Pure Data patch if needed into the ring buffer, which in turn fills the buffer list provided by Core Audio. The buffer list is then passed on to the callback that processes the effect being tested.

OSStatus AudioRenderProc (void *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp *inTimeStamp,
                          UInt32 inBusNumber,
                          UInt32 inNumberFrames,
                          AudioBufferList *ioData)
    {
        TestData *testData = (TestData *)inRefCon;

        // Don't require input, but libpd requires valid array.
        float inBuffer[0];

        // Only read from Pd patch if the sample excess is less than the number of frames being processed.
        // This effectively empties the ring buffer when it has enough samples for the current block, preventing the
        // write pointer from catching up to the read pointer.
        if (testData->framesInReserve < inNumberFrames)
        {
            testData->pd->processFloat(testData->pdTicks, inBuffer, testData->pdBuffer);
            for (int i = 0; i < testData->pdSamplesPerBlock; ++i)
            {
                testData->ringBuffer.write(testData->pdBuffer[i]);
            }
            testData->framesInReserve += (testData->maxFramesPerSlice - inNumberFrames);
        }
        else
        {
            testData->framesInReserve -= inNumberFrames;
        }

        // NOTE: Audio data from Pd patch is interleaved, whereas Core Audio buffers are non-interleaved.
        for (UInt32 frame = 0; frame < inNumberFrames; ++frame)
        {
            Float32 *data = (Float32 *)ioData->mBuffers[0].mData;
            data[frame] = testData->ringBuffer.read();
            data = (Float32 *)ioData->mBuffers[1].mData;
            data[frame] = testData->ringBuffer.read();
        }

        if (testData->effectProc != nullptr)
        {
            testData->effectProc(ioData, inNumberFrames);
        }

        return noErr;
    }

Finally, let’s see the callback function that processes the filter. It’s about as simple as it gets — it just processes the filter effect being tested on the audio signal that came from Pure Data.

void EffectProcess(AudioBufferList* audioData, UInt32 numberOfFrames)
{
    for (UInt32 frame = 0; frame < numberOfFrames; ++frame)
    {
        Float32 *data = (Float32 *)audioData->mBuffers[0].mData;
        data[frame] = filter.left.sample(data[frame]);
        data = (Float32 *)audioData->mBuffers[1].mData;
        data[frame] = filter.right.sample(data[frame]);
    }
}

Not quite done yet, though, since we need to subscribe the open libpd instance of Pure Data to the messages we want to receive from the control patch. The messages received will then be dispatched inside the C++ code to handle appropriate behavior.

int main(int argc, const char * argv[])
{
    <snip> // As above.

    pd.subscribe("fromPd_filterfreq");
    pd.subscribe("fromPd_filtergain");
    pd.subscribe("fromPd_filterbw");
    pd.subscribe("fromPd_filtertype");
    pd.subscribe("fromPd_quit");

    // Start audio processing.
    pd.computeAudio(true);
    AudioOutputUnitStart(testData.outputUnit);

    bool running = true;
    while (running)
    {
        while (pd.numMessages() > 0)
        {
            Message msg = pd.nextMessage();
            switch (msg.type)
            {
                case pd::PRINT:
                    std::cout << "PRINT: " << msg.symbol << "\n";
                    break;

                case pd::BANG:
                    std::cout << "BANG: " << msg.dest << "\n";
                    if (msg.dest == "fromPd_quit")
                    {
                        running = false;
                    }
                    break;

                case pd::FLOAT:
                    std::cout << "FLOAT: " << msg.num << "\n";
                    if (msg.dest == "fromPd_filterfreq")
                    {
                        filter.left.setFrequency(msg.num);
                        filter.right.setFrequency(msg.num);
                    }
                    else if (msg.dest == "fromPd_filtertype")
                    {
                        // (filterType is just an array containing the available filter types.)
                        filter.left.setState(filterType[(unsigned int)msg.num]);
                        filter.right.setState(filterType[(unsigned int)msg.num]);
                    }
                    else if (msg.dest == "fromPd_filtergain")
                    {
                        filter.left.setGain(msg.num);
                        filter.right.setGain(msg.num);
                    }
                    else if (msg.dest == "fromPd_filterbw")
                    {
                        filter.left.setBandwidth(msg.num);
                        filter.right.setBandwidth(msg.num);
                    }
                    break;

                default:
                    std::cout << "Unknown Pd message.\n";
                    std::cout << "Type: " << msg.type << ", " << msg.dest << "\n";
                    break;
            }
        }
    }
}

Once the test has ended by banging the stop_test button on the control patch, cleanup is as follows:

int main(int argc, const char * argv[])
{
    <snip> // As above.

    pd.unsubscribeAll();
    pd.computeAudio(false);
    pd.closePatch(testData.pdPatch);

    AudioOutputUnitStop(testData.outputUnit);
    AudioUnitUninitialize(testData.outputUnit);
    AudioComponentInstanceDispose(testData.outputUnit);

    delete[] testData.pdBuffer;

    return 0;
}

The raw synth tone in the receiver patch used as the test signal is actually built with the PolyBLEP oscillator I made and discussed in a previous post. So it’s also possible (and very easy) to compile custom Pure Data externals into libpd, and that’s pretty awesome! Here is a demonstration of what I’ve been talking about — testing a state-variable filter on a raw synth tone:

Pure Data & libpd Interactive Demo from Christian on Vimeo.

Custom Pure Data External: PolyBLEP Sawtooth Oscillator

I have recently gotten back into Pure Data in the interest of familiarizing myself more with it, and perhaps to integrate it with Unity for a future project. For anyone who may not be familiar with it, more info can be found here. It's a great environment for building synthesizers and experimenting with a wide variety of audio processing techniques in a modular, graphical way.

Oscillators are the foundation of synthesizers, of course, and Pure Data comes with two modules that serve this purpose: osc~ for sine waves, and phasor~ for a sawtooth (it's actually just a ramp in the range 0 to 1, so it needs a couple of minor additions to turn it into a proper sawtooth wave). However, the phasor~ module (not being band-limited) suffers from aliasing noise. This can be avoided either by passing it through an anti-aliasing filter, or by using the sinesum message to construct a sawtooth wave according to the Fourier theorem, summing together sine waves (i.e. creating a wavetable). Both of these methods have some drawbacks, though.

Sawtooth wave using Pure Data's phasor~ module.

Constructing a sawtooth from a wavetable of 12 harmonics (source: http://en.flossmanuals.net/pure-data/).

A robust anti-aliasing filter is often computationally expensive, whereas wavetables sacrifice the quality of the waveform and are not truly band-limited across the oscillator's entire range (unless the wavetable is divided into sections constructed with a decreasing number of harmonics as the frequency increases). A wavetable sawtooth typically lacks richness and depth due to the missing harmonics in its lower range. These issues can be fixed by constructing the sawtooth using band-limited steps (BLEPs), which are based on band-limited impulse trains (BLITs), at the expense of some increased complexity in the algorithm. Fortunately, Pure Data allows custom modules (externals) to be written in C and then used in any patch just like a normal module.

The process behind this method is to construct a naive sawtooth wave using the equation

sample = 2 * (f / SR * t) - 1

where f is the frequency (Hz), SR is the sample rate (Hz), and t is a time index (in this case, the sample count). At the waveform's discontinuities, we insert a PolyBLEP to round off the sharp corners, which is calculated by a low-order polynomial (hence "Poly"nomial Band-Limited Step). The polynomial is of the form

x² + 2x + 1

The equation for the PolyBLEP is based on the discussion of the topic in this KVR forum thread. The consensus in the thread is that the PolyBLEP is far superior to using wavetables, but sounds slightly duller than using minBLEPs (which are far more complicated still, and require precalculating the BLEP with an FFT before integrating it with the naive waveform). PolyBLEP oscillators strike a good balance between high quality, minimal aliasing noise, and reasonable complexity.
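
As a rough sketch of that idea (assuming t is the oscillator's normalized phase in the range 0 to 1 and dt = f / SR; this is the common form from that discussion, not necessarily the exact code in the external), the correction and its use look something like this:

// PolyBLEP residual: a low-order polynomial that smooths the two samples
// around a discontinuity. t is the normalized phase in [0, 1), dt = f / SR.
static double poly_blep(double t, double dt)
{
    if (t < dt) {                    // sample just after the discontinuity
        t /= dt;
        return t + t - t * t - 1.0;  // 2t - t^2 - 1
    }
    else if (t > 1.0 - dt) {         // sample just before the discontinuity
        t = (t - 1.0) / dt;
        return t * t + t + t + 1.0;  // t^2 + 2t + 1
    }
    return 0.0;                      // no correction away from discontinuities
}

// In the oscillator's perform routine (per sample):
//   double saw = 2.0 * phase - 1.0;   // naive sawtooth in [-1, 1]
//   saw -= poly_blep(phase, dt);      // subtract the correction at the edges
//   phase += dt; if (phase >= 1.0) phase -= 1.0;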

Pure Data patch using the polyblep~ module for a sawtooth wave.

Here is a quick sample of the PolyBLEP sawtooth recorded from Pure Data. Of course, PolyBLEPs can be used with other waveforms as well, including triangle and square waves, but the sawtooth is a popular choice for synths due to its rich sound.

The GitHub page can be found here, with projects for Mac (Xcode 5) and Windows (Visual Studio 2010).

Dynamics Processing: Compressor/Limiter, part 3

In part 1 of this series of posts, I went over creating an envelope detector that detects both peak amplitude and RMS values. In part 2, I used it to create a compressor/limiter. There were two common features missing from that compressor plug-in, however, that I will cover in this final part: soft knee and lookahead. Also, as I have stated in the previous parts, this effect is being created with Unity in mind, but the theory and code are easily adaptable to other uses.

Let's start with lookahead, since it is very straightforward to implement. Lookahead is common in limiters and compressors because any non-zero attack/release time causes the envelope to lag behind the audio due to the filtering, and as a result the gain reduction won't line up with the part of the signal that produced the envelope. This can be fixed by delaying the audio output so that it lines up with the signal's envelope. The amount we delay the audio by is the lookahead time, so an extra field is needed in the compressor:

public class Compressor : MonoBehaviour
{
    [AudioSlider("Threshold (dB)", -60f, 0f)]
    public float threshold = 0f;		// in dB
    [AudioSlider("Ratio (x:1)", 1f, 20f)]
    public float ratio = 1f;
    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;
    [AudioSlider("Pre-gain (dB)", -12f, 24f)]
    public float preGain = 0f;			// in dB, amplifies the audio signal prior to envelope detection.
    [AudioSlider("Post-gain (dB)", -12f, 24f)]
    public float postGain = 0f;			// in dB, amplifies the audio signal after compression.
    [AudioSlider("Attack time (ms)", 0f, 200f)]
    public float attackTime = 10f;		// in ms
    [AudioSlider("Release time (ms)", 10f, 3000f)]
    public float releaseTime = 50f;		// in ms
    [AudioSlider("Lookahead time (ms)", 0, 200f)]
    public float lookaheadTime = 0f;	// in ms

    public ProcessType processType = ProcessType.Compressor;
    public DetectionMode detectMode = DetectionMode.Peak;

    private EnvelopeDetector[] m_EnvelopeDetector;
    private Delay m_LookaheadDelay;

    private delegate float SlopeCalculation (float ratio);
    private SlopeCalculation m_SlopeFunc;

    // Continued...

I won't go over implementing the delay itself in detail since it is very straightforward (it's just a simple circular-buffer delay line). The one thing I will say is that if you want the lookahead time to be modifiable in real time, the circular buffer needs to be allocated to the maximum length allowed by the lookahead time (in my case 200 ms), and then you need to keep track of the actual position in the buffer, which moves based on the current delay time.
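
That said, a minimal sketch of the idea might look something like this (illustrative only, not the actual Delay class used in this project; it assumes interleaved sample data as delivered by OnAudioFilterRead):

using UnityEngine;

// Illustrative circular-buffer delay line. The buffer is allocated for the
// maximum lookahead time; the effective delay moves within it.
public class Delay
{
    private float[] m_Buffer;    // interleaved circular buffer
    private int m_WritePos;
    private int m_DelaySamples;  // current delay, in interleaved samples
    private int m_NumChannels;

    public Delay (float maxDelayMs, float sampleRate, int numChannels)
    {
        m_NumChannels = numChannels;
        int maxFrames = Mathf.CeilToInt(maxDelayMs / 1000f * sampleRate);
        m_Buffer = new float[Mathf.Max(1, maxFrames * numChannels)];
    }

    public void SetDelayTime (float delayMs, float sampleRate)
    {
        int frames = Mathf.CeilToInt(delayMs / 1000f * sampleRate);
        m_DelaySamples = Mathf.Clamp(frames * m_NumChannels, 0, m_Buffer.Length - 1);
    }

    public void Process (float[] data, int numChannels)
    {
        for (int i = 0; i < data.Length; ++i) {
            int readPos = m_WritePos - m_DelaySamples;
            if (readPos < 0) readPos += m_Buffer.Length;

            m_Buffer[m_WritePos] = data[i];  // store the dry sample
            data[i] = m_Buffer[readPos];     // output the delayed sample
            m_WritePos = (m_WritePos + 1) % m_Buffer.Length;
        }
    }
}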

The delay comes after the envelope is extracted from the audio signal and before the compressor gain is applied:

void OnAudioFilterRead (float[] data, int numChannels)
{
    // Calculate pre-gain & extract envelope
    // ...

    // Delay the incoming signal for lookahead.
    if (lookaheadTime > 0f) {
        m_LookaheadDelay.SetDelayTime(lookaheadTime, sampleRate);
        m_LookaheadDelay.Process(data, numChannels);
    }

    // Apply compressor gain
    // ...
}

That's all there is to the lookahead, so now we turn our attention to the last feature. The knee of the compressor is the area around the threshold where compression kicks in. This can either be a hard knee (the compressor kicks in abruptly as soon as the threshold is reached) or a soft knee (compression is applied gradually over a region around the threshold known as the knee width). Comparing the two in a plot illustrates the difference clearly.

Hard knee in black and soft knee in light blue (threshold is -24 dB).

There are two common ways of specifying the knee width. One is an absolute value in dB, and the other is as a factor of the threshold as a value between 0 and 1. The latter is one that I’ve found to be most common, so it will be what I use. In the diagram above, for example, the threshold is -24 dB, so a knee value of 1.0 results in a knee width of 24 dB around the threshold. Like the lookahead feature, a new field will be required:

    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;

At the start of our process block (OnAudioFilterRead()), we set up for a possible soft knee compression:

float kneeWidth = threshold * knee * -1f; // Threshold is in dB and will always be either 0 or negative, so * by -1 to make positive.
float lowerKneeBound = threshold - (kneeWidth / 2f);
float upperKneeBound = threshold + (kneeWidth / 2f);

Still in the processing block, we calculate the compressor slope as normal according to the equation from part 2:

slope = 1 - (1 / ratio), for compression

slope = 1, for limiting

To calculate the actual soft knee, I will use linear interpolation. First I check if the knee width is > 0 for a soft knee. If it is, the slope value is scaled by the linear interpolation factor if the envelope value is within the knee bounds:

slope *= ((envValue - lowerKneeBound) / kneeWidth) * 0.5

The compressor gain is then determined using the same equation as before, except instead of calculating in relation to the threshold, we use the lower knee bound:

gain = slope * (lowerKneeBound - envValue)

The rest of the calculation is the same:

for (int i = 0, j = 0; i < data.Length; i+=numChannels, ++j) {
    envValue = AudioUtil.Amp2dB(envelopeData[0][j]);
    slope = m_SlopeFunc(ratio);

    if (kneeWidth > 0f && envValue > lowerKneeBound && envValue < upperKneeBound) { // Soft knee
        // Lerp the compressor slope value.
        // Slope is multiplied by 0.5 since the gain is calculated in relation to the lower knee bound for soft knee.
        // Otherwise, the interpolation's peak will be reached at the threshold instead of at the upper knee bound.
        slope *= ( ((envValue - lowerKneeBound) / kneeWidth) * 0.5f );
        gain = slope * (lowerKneeBound - envValue);
    } else { // Hard knee
        gain = slope * (threshold - envValue);
        gain = Mathf.Min(0f, gain);
    }

    gain = AudioUtil.dB2Amp(gain);

    for (int chan = 0; chan < numChannels; ++chan) {
        data[i+chan] *= (gain * postGainAmp);
    }
}

In order to verify that the soft knee is calculated correctly, it is best to plot the results. To do this I just created a helper method that calculates the compressor values for a range of input values from -90 dB to 0 dB. Here is the plot of a compressor with a threshold of -12.5 dB, a 4:1 ratio, and a knee of 0.4:

Compressor with a threshold of -12.5 dB, 4:1 ratio, and knee of 0.4.

Of course this also works when the compressor is in limiter mode, which will result in a gentler application of the limiting effect.

Compressor in limiter mode with a threshold of -18 dB, and knee of 0.6.

That concludes this series on building a compressor/limiter component.

Dynamics Processing: Compressor/Limiter, part 2

In part 1 I detailed how I built the envelope detector that I will now use in my Unity compressor/limiter. To reiterate, the envelope detector extracts the amplitude contour of the audio that will be used by the compressor to determine when to compress the signal’s gain. The response of the compressor is determined by the attack time and the release time of the envelope, with higher values resulting in a smoother envelope, and hence, a gentler response in the compressor.

The compressor script is a MonoBehaviour component that can be attached to any GameObject. Here are the fields and corresponding inspector GUI:

public class Compressor : MonoBehaviour
{
    [AudioSlider("Threshold (dB)", -60f, 0f)]
    public float threshold = 0f;		// in dB
    [AudioSlider("Ratio (x:1)", 1f, 20f)]
    public float ratio = 1f;
    [AudioSlider("Knee", 0f, 1f)]
    public float knee = 0.2f;
    [AudioSlider("Pre-gain (dB)", -12f, 24f)]
    public float preGain = 0f;			// in dB, amplifies the audio signal prior to envelope detection.
    [AudioSlider("Post-gain (dB)", -12f, 24f)]
    public float postGain = 0f;			// in dB, amplifies the audio signal after compression.
    [AudioSlider("Attack time (ms)", 0f, 200f)]
    public float attackTime = 10f;		// in ms
    [AudioSlider("Release time (ms)", 10f, 3000f)]
    public float releaseTime = 50f;		// in ms
    [AudioSlider("Lookahead time (ms)", 0, 200f)]
    public float lookaheadTime = 0f;	// in ms

    public ProcessType processType = ProcessType.Compressor;
    public DetectionMode detectMode = DetectionMode.Peak;

    private EnvelopeDetector[] m_EnvelopeDetector;
    private Delay m_LookaheadDelay;

    private delegate float SlopeCalculation (float ratio);
    private SlopeCalculation m_SlopeFunc;
    
    // Continued...
Compressor/Limiter Unity inspector GUI.

The two most important parameters for a compressor are the threshold and the ratio. When a signal exceeds the threshold, the compressor reduces the level of the signal by the given ratio. For example, if the threshold is -2 dB with a ratio of 4:1 and the compressor encounters a signal peak of +2 dB, the gain reduction will be 3 dB, resulting in a new signal level of -1 dB. The ratio is effectively a percentage applied to the overshoot: a 4:1 ratio means that the amount by which the signal exceeds the threshold is reduced by 75% (1 - 1/4 = 0.75). The difference between the threshold and the signal peak (4 dB in this example) is scaled by the ratio to arrive at the 3 dB reduction (4 * 0.75 = 3). When the ratio is ∞:1, the compressor becomes a limiter. The compressor’s output can be visualized by a plot of amplitude in vs. amplitude out:

Plot of amplitude in vs. amplitude out of a compressor with 4:1 ratio.

When the ratio is ∞:1, the portion of the curve above the threshold becomes a straight horizontal line in the above plot, effectively preventing any levels from exceeding the threshold. It is easy to see how this exhibits the behavior of a limiter. From these observations, we can derive the equations we need for the compressor.

compressor gain = slope * (threshold - envelope value) if envelope value >= threshold, otherwise 0

slope = 1 - (1 / ratio) for compression, or slope = 1 for limiting

All amplitude values are in dB for these equations. We saw both of these equations earlier in the example I gave, and both are pretty straightforward. These elements can now be combined to make up the compressor/limiter. The Awake method is called as soon as the component is initialized in the scene.

void Awake ()
{
    if (processType == ProcessType.Compressor) {
        m_SlopeFunc = CompressorSlope;
    } else if (processType == ProcessType.Limiter) {
        m_SlopeFunc = LimiterSlope;
    }

    // Convert from ms to s.
    attackTime /= 1000f;
    releaseTime /= 1000f;

    // Handle stereo max number of channels for now.
    m_EnvelopeDetector = new EnvelopeDetector[2];
    m_EnvelopeDetector[0] = new EnvelopeDetector(attackTime, releaseTime, detectMode, sampleRate);
    m_EnvelopeDetector[1] = new EnvelopeDetector(attackTime, releaseTime, detectMode, sampleRate);
}
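The CompressorSlope and LimiterSlope methods assigned to the delegate aren't shown in this excerpt; based on the slope equations above, they presumably amount to something like this (a sketch, not the original code):

// Compression: slope = 1 - (1 / ratio).
private float CompressorSlope (float ratio)
{
    return 1f - (1f / ratio);
}

// Limiting: the slope is always 1, regardless of the ratio.
private float LimiterSlope (float ratio)
{
    return 1f;
}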

Here is the full compressor/limiter code in Unity’s audio callback method. When the component is placed on the same GameObject as the audio listener, the data array contains the audio signal just before it is sent to the system’s output.

void OnAudioFilterRead (float[] data, int numChannels)
{
    float postGainAmp = AudioUtil.dB2Amp(postGain);

    // Apply pre-gain to the signal before envelope detection.
    if (preGain != 0f) {
        float preGainAmp = AudioUtil.dB2Amp(preGain);
        for (int k = 0; k < data.Length; ++k) {
            data[k] *= preGainAmp;
        }
    }

    float[][] envelopeData = new float[numChannels][];

    if (numChannels == 2) {
        // Unity provides an interleaved buffer, so split it into separate channels
        // before extracting an envelope for each.
        float[][] channels;
        AudioUtil.DeinterleaveBuffer(data, out channels, numChannels);
        m_EnvelopeDetector[0].GetEnvelope(channels[0], out envelopeData[0]);
        m_EnvelopeDetector[1].GetEnvelope(channels[1], out envelopeData[1]);
        // Use the maximum of the two envelopes so the channels' relative levels are preserved.
        for (int n = 0; n < envelopeData[0].Length; ++n) {
            envelopeData[0][n] = Mathf.Max(envelopeData[0][n], envelopeData[1][n]);
        }
    } else if (numChannels == 1) {
        m_EnvelopeDetector[0].GetEnvelope(data, out envelopeData[0]);
    } else {
        // Error...
    }

    m_Slope = m_SlopeFunc(ratio);

    for (int i = 0, j = 0; i < data.Length; i+=numChannels, ++j) {
        // Gain reduction in dB: slope * (threshold - envelope), clamped so it never boosts,
        // then converted to linear amplitude.
        m_Gain = m_Slope * (threshold - AudioUtil.Amp2dB(envelopeData[0][j]));
        m_Gain = Mathf.Min(0f, m_Gain);
        m_Gain = AudioUtil.dB2Amp(m_Gain);
        for (int chan = 0; chan < numChannels; ++chan) {
            data[i+chan] *= (m_Gain * postGainAmp);
        }
    }
}

And quickly, here is the helper method for deinterleaving a multichannel buffer:

public static void DeinterleaveBuffer (float[] source, out float[][] output, int sourceChannels)
{
    int channelLength = source.Length / sourceChannels;

    output = new float[sourceChannels][];

    for (int i = 0; i < sourceChannels; ++i) {
        output[i] = new float[channelLength];

        for (int j = 0; j < channelLength; ++j) {
            output[i][j] = source[j*sourceChannels+i];
        }
    }
}

First off, there are a few utility functions included in the component that convert between linear amplitude and dB values, which you can see used in the method above. Pre-gain is applied to the audio signal prior to extracting the envelope. For multichannel audio, Unity unfortunately gives us an interleaved buffer, so it needs to be deinterleaved before being sent to the envelope detector (recall that the detector uses a recursive filter and thus has state variables; this could be handled differently in the envelope detector, but it’s simpler to work on single, continuous data buffers).
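The amplitude/dB conversion utilities themselves aren't shown in the post; presumably they implement the standard 20 * log10 mapping, roughly like this (the small floor guarding against log(0) is my own choice for this sketch):

public static float Amp2dB (float amp)
{
    // Avoid log(0); the 1e-6 floor (-120 dB) is an arbitrary choice for this sketch.
    return 20f * Mathf.Log10(Mathf.Max(amp, 1e-6f));
}

public static float dB2Amp (float dB)
{
    return Mathf.Pow(10f, dB / 20f);
}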

When working with multichannel audio, each channel has its own envelope. These could be processed separately, but that would disturb the relative levels between the channels. Instead, I take the maximum of the two envelope values and use that for the compressor. Another option would be to take the average of the two.

I then calculate the slope value based on whether the component is set to compressor or limiter mode (via a function delegate). The loop that follows simply implements the equations posted earlier, converting the dB gain value to linear amplitude before applying it to the audio signal along with the post-gain.

This completes the compressor/limiter component. However, there are two important elements missing: soft knee processing, and lookahead. From the plot earlier in the post, we see that once the signal reaches the threshold, the compressor kicks in rather abruptly. This point is called the knee of the compressor, and if we want this transition to happen more gently, we can interpolate within a zone around the threshold.

It’s common, especially in limiters, to have a lookahead feature that compensates for the inherent lag of the envelope detector. When the attack and release times are non-zero, the resulting envelope lags behind the audio signal as a result of the filtering, so the compressor/limiter will miss attenuating the very peaks it needs to catch. That’s where lookahead comes in. The name is a bit of a misnomer, since we obviously can’t see into the future of an audio signal, but we can delay the audio to achieve the same effect: the envelope is extracted as normal, while the audio output is delayed so that the compressor’s gain value lines up with the peaks it is meant to attenuate.
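As a rough illustration of the concept, a lookahead delay can be as simple as a ring buffer per channel (the Delay class declared in the component's fields isn't shown, so this stand-in is purely illustrative): the envelope is computed from the incoming samples as usual, while the samples written back to the output are read out of the delay line.

// Illustrative only: a fixed-length delay line. In the processing loop, the gain computed
// from the (undelayed) envelope would be applied to the delayed sample, e.g.
// data[i+chan] = delay[chan].Process(data[i+chan]) * gain * postGainAmp;
class DelayLine
{
    private readonly float[] m_Buffer;
    private int m_WriteIndex;

    // delaySamples would be lookaheadTime (in ms) * sampleRate / 1000.
    public DelayLine (int delaySamples)
    {
        m_Buffer = new float[Mathf.Max(1, delaySamples)];
    }

    public float Process (float input)
    {
        float delayed = m_Buffer[m_WriteIndex]; // The sample written delaySamples ago.
        m_Buffer[m_WriteIndex] = input;
        m_WriteIndex = (m_WriteIndex + 1) % m_Buffer.Length;
        return delayed;
    }
}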

I will be implementing these two remaining features in a future post.