"Linux Gazette...making Linux just a little more fun!"

Audio Processing Pipelines

By Adrian J. Chung

For decades experienced Unix users have employed many text processing tools to make document editing tasks much easier. Console utilities such as sed, awk, cut, paste, and join, though useful in isolation, only realise their full potential when combined together through the use of pipes.

Recently Linux has been used for more than just processing of ASCII text. The growing popularity of various multimedia formats, in the form of images and audio data, has spurred on the development of tools to deal with such files. Many of these tools have graphical user interfaces and cannot operate in the absence of user interaction. There are, however, a growing number of tools which can be operated in batch mode with their interfaces disabled. Some tools are even designed to be used from the command prompt or within shell scripts.

It is this class of tools that this article will explore. Complex media manipulation functions can often be effected by combining simple tools together using techniques normally applied to text processing filters. The focus will be on audio stream processing as these formats work particularly well with the Unix filter pipeline paradigm.

Sound Sample Translator

There are a multitude of sound file formats and converting between them is a frequent operation. The sound exchange utility sox fulfills this role and is invoked at the command prompt:

sox sample.wav sample.aiff

The above command will convert a WAV file to AIFF format. One can also change the sample rate, bits per sample (8 or 16), and number of channels:

sox sample.aiff -r 8000 -b -c 1 low.aiff

low.aiff will be at 8000 single byte samples per second in a single channel.

sox sample.aiff -r 44100 -w -c 2 high.aiff

high.aiff will be at 44100 16-bit samples per second in stereo.

When sox cannot guess the destination format from the file extension it is necessary to specify this explicitly:

sox sample.wav -t aiff sample.000

The "-t raw" option indicates a special headerless format that contains only raw sample data:

sox sample.wav -t raw -r 11025 -sw -c 2 sample.000

As the file has no header specifying the sample rate, bits per sample, channels etc, it is a good idea to set these explicitly at the command line. This is necessary when converting from the raw format:

sox -t raw -r 11025 -sw -c 2 sample.000 sample.aiff
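Because a raw file holds nothing but sample data, its playing time follows from its size and the format options alone. The sketch below is only an illustration: raw_seconds is a hypothetical helper, not part of sox, and a file of zero bytes stands in for real audio.

```shell
# raw_seconds FILE RATE BYTES_PER_SAMPLE CHANNELS
# Hypothetical helper: whole seconds of audio held in a headerless file.
raw_seconds() {
    size=$(( $(wc -c < "$1") ))          # file size in bytes
    echo $(( size / ($2 * $3 * $4) ))    # divide by bytes per second
}

# Stand-in for sample.000: 132300 zero bytes, i.e. silence.
tmp=$(mktemp)
head -c 132300 /dev/zero > "$tmp"

# 11025 samples/s, 2 bytes per sample, 2 channels = 44100 bytes/s.
raw_seconds "$tmp" 11025 2 2
rm -f "$tmp"
```

If the options given on the command line do not match the way the file was written, the same arithmetic shows how far the perceived duration will be off.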

One need not use the "-t raw" option if the file extension is .raw. The option is essential, however, when the raw samples are coming from standard input or being sent to standard output, which is done by using "-" in place of the file name:

sox -t raw -r 11025 -sw -c 2 - sample.aiff < sample.raw

sox sample.aiff -t raw -r 11025 -sw -c 2 - > sample.raw

Why would we want to do this? This usage style allows sox to be used as a filter in a command pipeline.

Play It Faster/Slower

Normally sox uses interpolation to change the sample rate without altering the pitch or tempo of the sound. A raw stream carries no header, however, so a second sox has to take our word for its sample rate. By piping the output of one sox to the input of another and declaring unequal sample rates, we can bypass the interpolation and effectively slow down a sound sample:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | sox -t raw -r 32000 -sw -c 2 - slow.aiff

or speed it up:

sox sample.aiff -t raw -r 32000 -sw -c 2 - | sox -t raw -r 44100 -sw -c 2 - fast.aiff
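How much slower or faster the result plays follows directly from the two rates: the same number of samples is spread over a different number of samples per second. As a rough sketch, with relabelled_ms being a hypothetical helper and the ten-second clip length an assumption chosen for illustration:

```shell
# relabelled_ms LENGTH_MS PRODUCED_RATE DECLARED_RATE
# Hypothetical helper: new length of a clip whose samples were produced
# at one rate but are declared to the second sox as having another.
relabelled_ms() {
    length_ms=$1 produced_rate=$2 declared_rate=$3
    echo $(( length_ms * produced_rate / declared_rate ))
}

relabelled_ms 10000 44100 32000   # the slow.aiff case: the clip grows
relabelled_ms 10000 32000 44100   # the fast.aiff case: the clip shrinks
```

The pitch moves by the inverse ratio, so the slowed sample also sounds lower and the faster one higher.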

Simple Editing

Suppose one wants a sample consisting of the first two seconds of some other sound file. We can do this using sox in a command pipeline as shown here:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 352800 | sox -t raw -r 44100 -sw -c 2 - twosecs.aiff

The input file sample.aiff is converted to 44.1kHz samples, each two bytes in two channels. Thus two seconds of sound is represented in 44100x2x2x2 = 352800 bytes of data which are stripped off using "head -c 352800". This is then converted back to AIFF format and stored in twosecs.aiff.
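The byte count is easy to get wrong by hand, so it can be worth letting the shell do the multiplication. A minimal sketch follows; pcm_bytes is a hypothetical helper name, not something supplied by sox.

```shell
# pcm_bytes RATE BYTES_PER_SAMPLE CHANNELS SECONDS
# Hypothetical helper: size in bytes of a stretch of raw audio.
pcm_bytes() {
    rate=$1 bytes_per_sample=$2 channels=$3 seconds=$4
    echo $(( rate * bytes_per_sample * channels * seconds ))
}

pcm_bytes 44100 2 2 2   # two seconds of 16-bit stereo at 44.1kHz
pcm_bytes 44100 2 2 1   # one second of the same format
```

The result can be substituted straight into a pipeline, for example as head -c "$(pcm_bytes 44100 2 2 2)".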

Likewise to extract the last second of a sample:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c 176400 |
sox -t raw -r 44100 -sw -c 2 - lastsec.aiff

and the third second:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c +352801 | head -c 176400 | sox -t raw -r 44100 -sw -c 2 - thirdsec.aiff

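Note that with 16-bit samples the argument to "tail -c +N" must be odd, otherwise the raw samples become misaligned. With two channels it should be one more than a multiple of four, as 352801 is here, so that the left and right channels are not swapped.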
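The alignment rule can be tried out without any audio at all. In the sketch below each letter stands for one byte and each group of four letters for one 16-bit stereo frame; skip_arg is a hypothetical helper that works out the "+N" argument.

```shell
# Three stand-in frames of 4 bytes each: AAAA, BBBB, CCCC.
frames='AAAABBBBCCCC'
frame_bytes=4

# skip_arg FRAMES
# Hypothetical helper: the +N argument that makes "tail -c" skip a
# number of whole frames. tail counts bytes from 1, so N is one more
# than the number of bytes skipped.
skip_arg() {
    echo $(( $1 * frame_bytes + 1 ))
}

# Skip one frame, then keep one frame: the output is the second frame.
printf '%s' "$frames" | tail -c "+$(skip_arg 1)" | head -c "$frame_bytes"
echo

# An even argument starts one byte early and straddles two frames.
printf '%s' "$frames" | tail -c +4 | head -c "$frame_bytes"
echo
```

With real data at 44.1kHz, skipping two seconds means skipping 88200 frames, and skip_arg 88200 gives the 352801 used above.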

One can extract parts of different samples and join them together into one file via nested sub-shell commands:

(sox sample-1.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 
sox sample-2.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 ) | 
sox -t raw -r 44100 -sw -c 2 - newsample.aiff

Here we invoke a child shell that outputs raw samples to standard output from two different files. This is piped to a sox process executing in the parent shell which creates the resulting file.
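The grouping behaviour itself can be checked with ordinary text before trusting it with audio. This is a sketch only: the two files are stand-ins created in a temporary directory, and cat takes the place of the final sox.

```shell
# Two stand-in "sample files" in a scratch directory.
tmp=$(mktemp -d)
printf '1111222233' > "$tmp/one.raw"
printf 'aaaabbbbcc' > "$tmp/two.raw"

# The parentheses run both commands in one child shell, so their
# outputs arrive on a single pipe, in order, as one stream.
( head -c 4 "$tmp/one.raw"
  head -c 4 "$tmp/two.raw" ) | cat > "$tmp/joined.raw"

joined=$(cat "$tmp/joined.raw")
rm -rf "$tmp"
echo "$joined"
```

Joining raw streams this way only makes sense when every piece has the same rate, sample size and channel count, which is why each sox in the example above is given identical format options.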

Desktop Sound Output

Sounds can be sent to the OSS (open sound system) device /dev/dsp with the "-t ossdsp" option:

sox sample.aiff -t ossdsp /dev/dsp

The sox package usually includes a platform-independent script play that invokes sox with the appropriate options. The previous command could be invoked simply by

play sample.aiff

Audio samples played this way monopolise the output hardware. Another sound capable application must wait until the audio device is freed before attempting to play more samples. Desktop environments such as GNOME and KDE provide facilities to play more than one audio sample simultaneously. Samples may be issued by different applications at any time without having to wait, although not every audio application knows how to do this for each of the various desktops. sox is one such program that lacks this capability. However, with a little investigation of the audio media services provided by GNOME and KDE, one can devise ways to overcome this shortcoming.

There are quite a few packages that allow audio device sharing. One common strategy is to run a background server to which client applications must send their samples to be played. The server then grabs control of the sound device and forwards the audio data to it. Should more than one client send samples at the same time the server mixes them together and sends a single combined stream to the output device.

The Enlightened Sound Daemon (ESD) uses this method. The server, esd, can often be found running in the background of GNOME desktops. The ESD package goes by the name esound on most distributions and includes a few simple client applications such as:

esdplay - plays sound samples stored in one of the more popular file formats (WAV, AU, or AIFF)
esdcat - submits raw sound samples to the server. This tool is a natural fit for terminating a pipeline of sound filters.

This command will play the first second of a sample via ESD:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 | esdcat

One can also arrange to play samples stored in formats that ESD does not understand but can be read by sox:

sox sample.cdr -t raw -r 44100 -sw -c 2 - | esdcat

In some cases samples can sound better when played this way. Some versions of ESD introduce significant distortion and noise when given sounds recorded at a low sample rate.

The Analog RealTime Synthesizer (ARtS) is similar to ESD but is often used with KDE. The background server is artsd with the corresponding client programs, artsplay and artscat. To play the last two seconds of a sample:

sox sample.cdr -t raw -r 44100 -sw -c 2 - | tail -c 352800 | artscat

Neither ESD nor ARtS depends on any one particular desktop environment. With some work, one could in theory use ESD with KDE and ARtS with GNOME. Each can even be used within a console login session. Thus one can mix samples, encoded in a plethora of formats, with or without the graphical desktop interface.

Music as a Sample Source

Having covered what goes on the end of an audio pipeline, we should consider what can be placed at the start. Sometimes one would like to manipulate samples extracted from music files in MP3, MIDI, or module (MOD, XM, S3M, etc) format. Command line tools exist for each of these formats that will output raw samples to standard output.

For MP3 music one can use "maplay -s"

maplay -s music.mp3 | artscat

The file music.mp3 must be encoded at 44.1kHz stereo to play properly. For any other encoding, artscat or esdcat must be told the sample rate and number of channels:

maplay -s mono22khz.mp3 | esdcat -r 22050 -m
maplay -s mono22khz.mp3 | artscat -r 22050 -c 1

Alternatively one can use "mpg123 -s". Additional arguments ensure that the output is at the required rate and number of channels:

mpg123 -s -r 44100 --stereo lowfi.mp3 | artscat

Users of Ogg Vorbis may use the following:

ogg123 -d raw -f - music.ogg | artscat

Piping is not really necessary here since ogg123 has built-in ESD and ARtS output drivers. Nevertheless, it is still useful to have access to a raw stream of sample data which one can feed through a pipeline.

Music files also can be obtained in MIDI format. If (like me) you have an old sound card with poor sequencer hardware, you may find that timidity can work wonders. Normally this package converts MIDI files into sound samples for direct output to the sound device. Carefully chosen command line options can redirect this output:

timidity -Or1sl -o - -s 44100 music.mid | artscat

The "-o -" sends sample data to standard output, "-Or1sl" ensures that the samples are 16-bit signed format, and "-s 44100" sets the sample rate appropriately.

If you're a fan of the demo scene you might want to play a few music modules on your desktop. Fortunately mikmod can play most of the common module formats. The application can also output directly to the sound device or via ESD. The current stable version of libmikmod, 3.1.9, does not seem to be ARtS aware yet. One can remedy this using a command pipeline:

mikmod -d stdout -q -f 44100 music.mod | artscat

The -q is needed to turn off the curses interface which also uses standard output. If you still want access to this interface you should try the following:

mikmod -d pipe,pipe=artscat -f 44100 music.mod

Only the later versions of mikmod know how to create their own output pipelines.

Effects Filters

Let us return to the pipeline friendly sox. In addition to its format conversion capabilities, there is a small library of effects filters. Here are some examples:

Add echo:

play sample.aiff echo 1 0.6 150 0.6

Add vibration:

play sample.aiff vibro 20 0.9

Add severe distortion:

play sample.aiff flanger 0.7 0.7 4 0.8 2
play sample.aiff phaser 0.6 0.6 4 0.6 2

Band pass filter, which sounds like a bad phone connection:

play sample.aiff band 3000 700

or like listening through a thick blanket:

play sample.aiff band 0 700

Make a chorus of sounds from one sample:

play sample.aiff chorus 0.7 0.7 20 1 5 2 -s

Hidden messages? Play it backwards:

play sample.aiff reverse

Warning: Depending on the size of the sample, this can use up a lot of memory and/or disk space.

Putting It All Together

The major components of an audio command pipeline have now been covered. Let us see how they can be combined together to perform a few non-trivial functions:

Play a music module on the KDE desktop with a chorus effect:

mikmod -d stdout -q -f 44100 music.xm | sox -t raw -r 44100 -sw -c 2 - -t raw - chorus 0.7 0.7 80 0.5 2 1 -s | artscat

Play a song in Ogg Vorbis format with the first 4 seconds removed:

ogg123 -d raw -f - music.ogg | tail -c +705601 | artscat

Convert a MIDI file to Ogg Vorbis format introducing a little added echo:

timidity -Or1sl -o - -s 44100 music.mid | sox -t raw -r 44100 -sw -c 2 - -t raw - echo 1 0.6 80 0.6 | oggenc -o music.ogg --raw -

The pipeline has been terminated with the Ogg Vorbis encoder, oggenc, configured here to accept raw sample data from standard input.

Convert a 32kHz mono MP3 file to 44.1kHz stereo Ogg Vorbis file, lowering the volume in the process:

maplay -s mono32.mp3 | sox -v 0.5 -t raw -r 32000 -sw -c 1 - -t raw -r 44100 -c 2 - split | oggenc -o music.ogg --raw -

Concatenate all AIFF files in the current directory into a single WAV file:

for x in *.aiff
do sox "$x" -v 0.5 -t raw -r 8000 -bu -c 1 -
done | sox -t raw -r 8000 -bu -c 1 - all.wav

Hopefully these examples hint at what can be accomplished with the pipeline technique. One cannot argue against using interactive applications with elaborate graphical user interfaces. They often can perform much more complicated tasks while saving the user from having to memorise pages of argument flags. There will always be instances where command pipelines are more suitable however. Converting a large number of sound samples will require some form of scripting. Interactive programs cannot be invoked as part of an at or cron job.
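A batch conversion of the kind just mentioned might be sketched as follows. The file names are made up for illustration, aiff_name is a hypothetical helper, and the loop only prints the sox commands so that the plan can be inspected before anything is converted.

```shell
# aiff_name FILE
# Hypothetical helper: swap a .wav extension for .aiff.
aiff_name() {
    echo "${1%.wav}.aiff"
}

# Stand-in file names; in real use this would be: for f in *.wav
for f in intro.wav "door slam.wav"; do
    # Dry run: print the command. Drop the "echo" to run sox itself.
    echo sox "$f" "$(aiff_name "$f")"
done
```

Quoting "$f" keeps file names containing spaces intact once the echo is removed, and the finished loop can be saved as a script and scheduled with at or cron.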

Audio pipelines can also be used to save disk space. One need not store a dozen copies of what is essentially the same sample with different modifications applied. Instead, create a dozen scripts each with a different pipeline of filters. These can be invoked when the modified version of the sound sample is called for. The altered sound is generated on demand.
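One way to organise such scripts is a single lookup from a variant name to the effect arguments, so that only the master sample is kept on disk. The sketch below is an assumption about how this could be arranged, not a feature of sox: effect_args and show_variant are hypothetical names, the effect arguments are copied from the examples earlier in this article, and the command is printed rather than run. Which effects are available depends on the installed version of sox.

```shell
# effect_args VARIANT
# Hypothetical lookup: variant name -> sox effect arguments.
effect_args() {
    case $1 in
        echo)    echo "echo 1 0.6 150 0.6" ;;
        vibro)   echo "vibro 20 0.9" ;;
        reverse) echo "reverse" ;;
        *)       echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

# show_variant FILE VARIANT
# Print (rather than run) the command that plays one variant.
show_variant() {
    args=$(effect_args "$2") || return 1
    echo "play $1 $args"
}

show_variant sample.aiff echo
```

Adding a thirteenth variation then costs one more line in the case statement rather than one more copy of the sample.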

I encourage you to experiment with the tools described in this article. Try combining them together in increasingly elaborate sequences. Most importantly, remember to have fun while doing so.

Adrian J Chung

When not teaching undergraduate computing at the University of the West Indies, Trinidad, Adrian writes system level scripts to manage a network of Linux boxes and conducts experiments in interfacing various scripting environments with home-brew computer graphics renderers and data visualization libraries.