I have recently gotten back into Pure Data in the interest of familiarizing myself more with it, and perhaps to integrate it with Unity for a future project. For anyone who may not be familiar with it, more info on it can be found here. It’s a great environment for building synthesizers and experimenting with a great variety of audio processing techniques in a modular, graphically based way.
Oscillators are of course the foundation of synthesizers, and Pure Data comes with two modules that serve this purpose: osc~ for sine waves, and phasor~ for a sawtooth (it’s actually just a unipolar ramp in the range 0 – 1, so it needs a couple of minor additions to turn it into a proper sawtooth wave). However, the phasor~ module, not being band-limited, suffers from aliasing noise. This can be avoided either by passing it through an anti-aliasing filter, or by using the sinesum message to construct a sawtooth wave according to the Fourier theorem of summing together sine waves (i.e. creating a wavetable). Both of these methods have drawbacks, though.
A robust anti-aliasing filter is often computationally expensive, whereas wavetables sacrifice the quality of the waveform and are not truly band-limited across their entire range (unless the wavetable is divided into sections constructed with a decreasing number of harmonics as the frequency increases). A wavetable sawtooth typically lacks richness and depth in its lower range because of the missing harmonics there. These issues can be fixed by constructing the sawtooth using band-limited steps (BLEPs), which are based on band-limited impulse trains (BLITs), at the expense of some added complexity in the algorithm. Fortunately, Pure Data allows custom modules, written in C, to be built and then used in any patch just like a normal module.
The process behind this method is to construct a naive sawtooth wave using the equation
sample = (2 * (f / SR * t)) – 1
where f is frequency (Hz), SR is sample rate (Hz), and t is a time index (in this case, the sample count). At the waveform’s discontinuities, we insert a PolyBLEP to round off the sharp corners; it is calculated by a low-order polynomial (hence “Poly”nomial Band-Limited Step) of the form
x² + 2x + 1
The equation for the PolyBLEP is based on the discussion of the topic in this KVR forum thread. The consensus in the thread is that the PolyBLEP is far superior to using wavetables, but sounds slightly duller than minBLEPs (which are more complicated still, requiring precalculation of the BLEP via FFT before integrating it with the naive waveform). PolyBLEP oscillators strike a good balance between high quality, minimal aliasing noise, and reasonable complexity.
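To make this concrete, here is a minimal sketch of a PolyBLEP sawtooth in C. This is only an illustration of the technique, not the actual polyblep~ source; the function names are my own.

```c
#include <assert.h>
#include <math.h>

/* PolyBLEP residual, one branch for each side of the discontinuity.
   t is the oscillator phase in [0, 1); dt is the per-sample phase
   increment (frequency / sample rate). */
static double poly_blep(double t, double dt)
{
    if (t < dt) {                       /* just after the wrap */
        t /= dt;
        return 2.0 * t - t * t - 1.0;   /* 2t - t^2 - 1 */
    } else if (t > 1.0 - dt) {          /* just before the wrap */
        t = (t - 1.0) / dt;
        return t * t + 2.0 * t + 1.0;   /* t^2 + 2t + 1 */
    }
    return 0.0;                         /* away from the discontinuity */
}

/* One sample of the band-limited sawtooth: the naive ramp from the
   equation above, minus the PolyBLEP correction at the wrap points. */
double polyblep_saw(double phase, double dt)
{
    double naive = 2.0 * phase - 1.0;
    return naive - poly_blep(phase, dt);
}
```

Away from the discontinuity the residual is zero and the naive ramp passes through untouched; only the couple of samples around each wrap are smoothed.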
Pure Data patch using the polyblep~ module for a sawtooth wave.
Here is a quick sample of the PolyBLEP sawtooth recorded from Pure Data. Of course, PolyBLEPs can be used with other waveforms as well, including triangle and square waves, but the sawtooth is a popular choice for synths due to its rich sound.
The GitHub page can be found here, with projects for Mac (Xcode 5) and Windows (Visual Studio 2010).
Some time ago, I began exploring the early reverb algorithms of Schroeder and Moorer, whose work dates back to the 1960s and 70s respectively. Their designs and theories still inform the making of algorithmic reverbs today. Recently I took it upon myself to continue experimenting with the Moorer design I left off with in an earlier post. This resulted in the complete reverb plug-in “AdVerb”, which is available for free in downloads. Let me share what went into designing and implementing this effect.
One of the foremost challenges in basing a reverb design on Schroeder or Moorer is that it tends to sound a little metallic: with the number of comb filters they suggest, the echo density doesn’t build up quickly or densely enough. The series of all-pass filters that comes after the comb filter section helps diffuse the reverb tail, but I found that the delaying all-pass filters added a little metallic sound of their own. One obvious way of overcoming this is to add more comb filters (today’s computers can certainly handle it). More importantly, however, the delay times of the comb filters need to be mutually prime so that their frequency responses don’t overlap, which would result in increased beating in the reverb tail.
To arrive at my values for the 8 comb filters I’m using, I wrote a simple little script that calculated the greatest common divisor between every pair of the delay times I chose and made sure each result was 1. This required a bit of tweaking of the numbers; as you can imagine, finding 8 mutually prime delay times is not as easy as it sounds, especially when trying to keep the range between them small. It’s not as important for the two all-pass filters to be mutually prime because they are in series, not in parallel like the comb filters.
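The check itself is trivial: Euclid’s algorithm over every pair of delay times. A sketch of it in C (the delay values you feed it are whatever candidates you’re testing):

```c
#include <assert.h>

/* Euclid's algorithm for the greatest common divisor. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns 1 if every pair of delay times is mutually prime (gcd == 1),
   0 otherwise. */
int all_mutually_prime(const int *delays, int n)
{
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (gcd(delays[i], delays[j]) != 1)
                return 0;
    return 1;
}
```

Running this while nudging the candidate delay times up or down by a sample or two quickly converges on a mutually prime set.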
I also discovered, after a number of tests, that the tap delay used to generate the early reflections (based on Moorer’s design) was causing some problems in my sound. I’m still a bit unsure as to why (it could be poorly chosen tap delay times or something to do with mixing), but it was enough that I decided to discard the tap delay network and focus on comb filters and all-pass filters. It was then that I took an idea from Dattorro and Frenette, who both showed how modulated comb/all-pass filters can help smear the echo density and add warmth to the reverb. Dattorro is responsible for the well-known plate reverbs that use modulating all-pass filters in series.
The idea behind a modulated delay line is that an oscillator (usually a low-frequency sine wave) varies the delay time at a given rate and amplitude. This is actually the basis for chorusing and flanging effects. In a reverb, however, the modulation values need to be kept very small so that the chorusing effect is not evident.
I had fun experimenting with these modulated delay lines, and so I eventually decided to modulate one of the all-pass filters as well and give control of it to the user, which offers a great deal more fun and crazy ways to use this plug-in. Let’s take a look at the modulated all-pass filter (the modulated comb filter is very similar). We already know what an all-pass filter looks like, so here’s just the modulated delay line:
Modulated all-pass filter.
The oscillator modulates the position we read from in the delay line, and we interpolate between the neighboring samples to get the actual value. In code it looks like this:
In case that looks a little daunting, we’ll step through the C code (apologies for the pointer arithmetic!). At the top we calculate the offset using the delay length in samples as our base point. The following lines are easily seen as incrementing and wrapping the phase of the oscillator as well as capping the offset to the delay length.
The next line calculates the current position in the buffer from the current position pointer, p, and the buffer head, p_head. This is accomplished by casting the pointer addresses to integral values and dividing by the size of the data type of each buffer element. The read_offset position will determine where in the delay buffer we read from, so it needs to be clamped to the buffer’s length as well.
The rest is simply linear interpolation (albeit with some pointer arithmetic: delay_buffer->p_head + read_pos + 1 is equivalent to &delay_buffer->p_head[read_pos + 1]). Once we have our modulated delay value, we can finish processing the all-pass filter:
delay_val = get_modulated_delay_value(allpass_filter);
// don't write the modulated delay_val into the buffer, only use it for the output sample
*delay_buffer->p = sample_in + (*delay_buffer->p * allpass_filter->g);
sample_out = delay_val - (allpass_filter->g * sample_in);
The final topology of the reverb is given below:
Topology of the AdVerb plug-in.
The pre-delay is implemented by a simple delay line, and the low-pass filters are of the one-pole IIR variety. Putting the LPFs inside the comb filters’ feedback loops simulates the absorption of energy that sound undergoes as it comes in contact with surfaces and travels through air. This factor can be controlled with a damping parameter in the plug-in.
The first-order moving-average filter is there for an extra bit of high-frequency roll-off; I chose it because it is an FIR filter with linear phase, so it won’t add further disturbance to the modulated samples entering it. The last (unmodulated) all-pass filter in the series adds extra diffusion to the reverb tail.
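To illustrate the LPF-in-the-feedback-loop idea, here is a sketch of a damped comb filter in C (names and struct layout are illustrative, not AdVerb’s actual code):

```c
#include <assert.h>

/* A comb filter with a one-pole low-pass in its feedback path. */
typedef struct {
    float *buffer;   /* delay buffer */
    int    length;   /* delay length in samples */
    int    pos;      /* current read/write index */
    float  feedback; /* comb feedback gain */
    float  damping;  /* one-pole LPF coefficient, 0..1 */
    float  lp_state; /* LPF memory */
} damped_comb;

float damped_comb_process(damped_comb *c, float in)
{
    float out = c->buffer[c->pos];

    /* one-pole IIR low-pass applied to the delayed signal before it is
       fed back: higher damping rolls off more high-frequency energy,
       simulating absorption by air and surfaces */
    c->lp_state = out * (1.0f - c->damping) + c->lp_state * c->damping;

    c->buffer[c->pos] = in + c->lp_state * c->feedback;
    c->pos = (c->pos + 1) % c->length;
    return out;
}
```

With damping set to 0 this degenerates to a plain feedback comb; as damping approaches 1, each pass around the loop loses more treble, shortening the decay of high frequencies relative to lows.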
Here are some short sound samples using a selection of presets included in the plug-in:
Piano, “Medium Room” preset
The preceding sample demonstrates a normal reverb setting. Following are a few samples that demonstrate a couple of subtle and not-so-subtle effects:
Piano, “Make it Vintage” preset
Piano, “Bad Grammar” preset
Flute, “Shimmering Tail” preset
Feel free to get in touch regarding any questions or comments on “AdVerb“.
The adventure continues. This time we occupy the world of tremolo as a digital signal processing effect, also known as amplitude modulation. My studies in audio programming have progressed quite far, I must say, covering the likes of filters and delays (get your math hats ready), reverb, and even plug-in development. To really solidify what I’ve been learning, though, I decided to go back and create a program from scratch that applies tremolo and vibrato to an existing audio file, and that’s where this blog entry comes in. For now I am just covering the tremolo effect, as there is plenty to discuss on that alone; vibrato will be the subject of the next blog entry.
Tremolo in and of itself is pretty straightforward to implement, both on generated signals and on existing soundfiles. (Vibrato, on the other hand, is easy enough to apply to signals being generated, but a bit more complex when it comes to sound input.) Nonetheless, several challenges were met along the way that required a fair amount of research, experimentation and problem solving to overcome, and in doing so I’ve only expanded my knowledge of DSP and audio programming. I think this is why I enjoy adventure games so much: chasing down solutions, and the feeling you get when you solve a problem!
The tremolo effect is implemented by simply multiplying a signal by an LFO (low-frequency oscillator). While LFOs normally range from 0 to 20 Hz, a cap of 10 Hz works well for tremolo. The other specification we need is depth, the amount of modulation the LFO applies to the original signal, specified in percent. A modulation depth of 100%, for example, will alternate between full signal strength and complete suppression of the signal at the frequency of the LFO. For a more subtle effect, a depth of around 30% or so results in a much smoother amplitude variance. With this information we can develop a mathematical formula for the modulating signal on which to base our code. This is also where I encountered one of my first big challenges. The formula I used at first (from the book Audio Programming) was:
ModSignal = 1 + DEPTH * sin(w * FREQ * n)
where w = 2 * pi / samplerate and n is the sample index. This signal, derived from the LFO defined by the sine operation, is used to modulate the incoming sound signal:
Signal = Signal * ModSignal
This produced the desired tremolo effect quite nicely. But when the original signal approached full amplitude, overmodulation would occur, resulting in a nasty digital distortion. As can be seen in the above equation, the modulating signal exceeds 1 whenever the sine term is positive. Essentially the equation applies a DC offset, which takes a normally bipolar signal and shifts it up or down. This is what we want for the tremolo effect, but after realizing it was causing the distortion in the output, I set about finding a new equation to calculate the modulating signal. After some searching, I found this:
ModSignal = (1 – DEPTH) + DEPTH * sin²(w * FREQ * n)
This equation was much better in that it never exceeds 1, so it won’t overmodulate the original signal. I did, however, make one personal modification: after experimenting in the main processing loop, I decided not to square the sine operation. Ideally we want to perform as few calculations as possible (especially costly ones) within loops; this matters particularly in audio, where responsiveness and efficiency are critical in real-time applications. To compensate, I scale the DEPTH parameter from a percentage to a range of 0 – 0.5. From here we can get into the code. First, initialization occurs:
Then the main processing loop:
With expandability and flexibility in mind, I began creating my own “oscillator” class which can be seen here:
This is where the power of C++ and object-oriented programming starts to show. It affords the programmer much-needed flexibility and efficiency in creating objects that can be ported between different programs and reused in the future, which is definitely important to me, as I can utilize these in upcoming plug-ins or standalone audio apps. Furthermore, designing it with flexibility in mind allows for modulating the modulator, so to speak. In other words, we can time-vary the modulation frequency or depth through the use of envelopes or other oscillators: values extracted from an envelope or oscillator can be passed into the “oscillator” class, which processes and updates its internal data with the proper function calls. This allows for anything from ramp-ups of the tremolo effect to entirely new and more complex effects derived from amplitude modulation itself!
But now let’s get on to the listening part! For this demonstration I extracted a short segment of the Great Fairy Fountain theme from the Zelda 25th Anniversary CD release, probably my favorite theme from all of Zelda.
This brings up another challenge that had to be overcome during the development of this program. Prior to this, most of the work I had been studying in the book “Audio Programming” dealt with mono soundfiles. This time I really wanted to get into handling stereo files, and that presented a few problems, as I had to learn exactly how to properly process the buffer that holds all the sound data for stereo files. I am using libsndfile (http://www.mega-nerd.com/libsndfile/) for I/O on the soundfile being processed, and this required me to search around and adapt my code to work properly with the library. At one point I was getting very subtle distortion in all of my outputs, as well as tremolo rates that were double (or even quadruple) the rates I had specified. It took a lot of investigation and trial and error before I discovered that the root of the problem lay in how I was handling the stereo files.
In closing off this blog entry, here is a further processing I did on the Zelda sample. After applying tremolo to it using the program I wrote, I put it through the pitch shifter VST plug-in I implemented to come up with a very eerie result. ‘Till next time!
For this blog entry, I felt a good topic to cover would be additive synthesis and its implementation in C. Based on Fourier theory, additive synthesis builds more complex sounds by combining simple periodic waveforms (such as sine or cosine waves). Put another way, any waveform can be expressed as a sum of sine waves; the principle of the harmonic series tells us that a sound can contain an infinite number of overtones, or partials. These overtones, each with its own frequency and amplitude, form the building blocks of a more complex waveform when expressed as simple sinusoidal waves. Mathematically this can be defined (for partials k = 1 to N, w = (2πf)/sample rate, and ø as the phase offset) as:
sample(n) = ∑ (k = 1 to N) a_k * cos(k * w * n + ø)
We can now, for example, create a sawtooth wave using this equation. But first, here is a snippet of C code that generates a simple sine wave and outputs it to a raw binary data file (these can be read by most audio editing software like Audacity) as well as a text file that can be plotted (I used Gnuplot):
Here is the graph of the sine wave as well as the soundfile (converted to .wav from .raw):
Now let’s move on to generating the sawtooth wave, where we’ll see how the simple sinusoidal wave above is transformed into a more complex waveform through the Fourier series equation. I set the total number of partials to 30 for this demonstration. A sawtooth wave contains all partials of the harmonic series, with the amplitude of each equal to the reciprocal of its partial number (a_k = 1/k), and we set the phase offset ø = −π/2.
Here is the graph and the soundfile output of the sawtooth wave:
Now for something really funky! In the Fourier series equation as we defined it above, both amplitude and frequency are unvarying. This, as we know, is not how sound behaves, and it really isn’t very interesting. To generate much more interesting and complex waveforms, we make both the amplitude (a) and the frequency (f) finite, time-varying functions. I set up my code to read in a breakpoint file for both amplitude and frequency, each containing a few random values at arbitrary time locations to illustrate this. (A breakpoint file contains data set up like coordinates, or points, that can be plotted; in this case we have amplitude vs. time and frequency vs. time.) Here is the plot of the amplitude stream …
… and the frequency stream …
… and the C code that implements this:
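A sketch of the breakpoint lookup with linear interpolation at the heart of that code (the breakpoint struct and function names are assumptions):

```c
#include <assert.h>
#include <math.h>

/* A single breakpoint: a value at a point in time. */
typedef struct {
    double time;
    double value;
} breakpoint;

/* Return the interpolated value at time t from a sorted array of
   n breakpoints, clamping outside the covered time range. */
double bp_value_at(const breakpoint *points, int n, double t)
{
    if (t <= points[0].time)     return points[0].value;
    if (t >= points[n - 1].time) return points[n - 1].value;

    int i = 0;
    while (points[i + 1].time < t)  /* find the enclosing segment */
        ++i;

    double span = points[i + 1].time - points[i].time;
    double frac = (t - points[i].time) / span;
    return points[i].value + frac * (points[i + 1].value - points[i].value);
}
```

In the synthesis loop, t is simply n / samplerate, so each sample picks up the amplitude and frequency in effect at that instant.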
The code that reads the breakpoint data uses linear interpolation to find the value at time n / sample rate. The normalize_buffer routine adjusts the resulting amplitude to fit within the values of -1 and +1 and then scales it to the desired amplitude set by the user (in this case 0.7). Following is a snapshot of the resulting waveform (just before the 1 second mark) in Gnuplot as well as the soundfile output; the result of just summing together simple sinusoidal waves as seen in the first Gnuplot above!
This is a short and simple example of how flexible and powerful the Fourier series can be at creating textures and complex waveforms through additive synthesis. The possibilities just escalate from here, depending on the time-varying functions that determine the amplitude and frequencies applied to the individual sinusoidal waves, i.e. the harmonics.
It must be pointed out that the code implementing these routines is brute force, not optimal. For instance, calculating these 3-second waves at a 44.1 kHz sampling rate with 30 partials requires 30 iterations through 132,300 (3 × 44,100) samples of data, for a total of 3,969,000 loop iterations! Calculating sine and cosine values is quite a costly operation, so this is unacceptable for most practical purposes. One alternative is wavetable synthesis, whereby the data for one period of a basic wave is stored in a table that the program can then look up. This is much more efficient (though depending on the size of the table, there can be a loss in precision) as redundant trigonometric calculations don’t have to be repeated. That is potentially a topic for another time, however.
For now, hope this was an interesting look into additive synthesis using the Fourier series and how with relatively few lines of code we can make some pretty cool things happen.
I decided to start this little blog about my current endeavors in audio programming because, since I started, I’ve already learned a great deal of fascinating and wonderful things about audio in the analog and, especially, the digital domain. Some of these things I already knew, but my understanding of them has deepened; other concepts are completely new. Sharing this knowledge, and the discoveries and challenges I encounter along the way, seemed like a good idea.
Sound is such an amazing thing! I’ve always known (and been told, as I’m sure we all have) that math is a huge, inseparable part of it. But precisely how much, and to what complexity, I didn’t fully know until I dove into audio programming. Advanced trigonometry, integrals, and even complex numbers are all there in the theory behind waveforms and signal processing. Fortunately, math was consistently my best subject in school, and trigonometry was one of my favorite areas of it.
What further steered me in this direction was my growing fascination with audio implementation in video games. As I taught myself the various middleware tools used in the industry (FMOD, Wwise and UDK) it really became clear how much I loved it and how interested I was in how the process of implementation and integration of audio in video games could add to the gameplay, immersion and the overall experience.
With that little introduction out of the way, I’ll end this first post with a little example of what I’ve picked up so far. I’m reading through the book “Audio Programming” (Boulanger and Lazzarini), and early on it walks through writing a real-time ring modulator. Building on this, I adapted it to accept stereo input/output, as it was originally mono. You input two frequencies (one for the left channel and one for the right) that are then multiplied with the stereo input signal, resulting in a ring-modulated stereo output. (Ring modulation is a fairly simple DSP effect that just multiplies two signals together, producing strong inharmonic partials and a usually very bell-like sound.) Here is a snippet of my modified code, in which I had to create my own stereo oscillator structure and send it to the callback function that modulates both channels:
And here is a recording of my digital piano being played into the real-time ring modulator (which I did with a single microphone, so the recording is in mono unfortunately):