There's
a growing trend in the world of electronic-music performance.
Next time you go to a show, chances are you'll see a laptop
onstage, functioning as a performer. In interactive composition
and performance, control of a piece includes a computer that
has been programmed to sense significant musical features from
a human performer and produce its own music in response.
Nothing about the idea is new — people have been writing
and playing interactive works for more than 25 years. But the
pioneers worked for institutions that could spend hundreds of
thousands of dollars on specialized computer systems. Now that
PCs are intertwined with everyday life, interactive music systems
have trickled down to the proletarian sphere of individual musicians.
In this column, I'll take a brief look at the evolution of interactive
music systems and give an overview of some performance approaches
that are commonly used. (See the sidebar “References and
Recordings” for additional resources.)
SOME RECENT HISTORY
By the end of the 1960s, Max Mathews, the father of computer
music, was increasingly dissatisfied with the music that computers
were producing. Music created from coded scores was dry and
lifeless. In an effort to transmit micromodulations —
the uncountable variations in embouchure, bow position, breath
pressure, and so on, that give live music a dynamic dimension
— Mathews began the pursuit of what he called the intelligent
machine that would respond to performers' nuances. Conductor
was an early system that Mathews designed for Pierre Boulez,
who was the musical director of the New York Philharmonic at
the time. Boulez was enthusiastic about electronic elements
in performance, but felt constrained by having to follow a tape.
The Conductor system allowed electronic elements to be dynamically
controlled by external devices such as joysticks and percussion
instruments.
In 1977, composer Joel Chadabe snatched the first Synclavier
off the production line and had it outfitted with special software
that created melodies based on predefined parameters such as
harmony and interval content. The Synclavier was interfaced
with two modified theremins. One antenna controlled the tempo
(note durations), while the other controlled relative volumes
of four Synclavier voices (in effect, overall timbre). Chadabe
wrote that performing with the system was like having “a
conversation with a clever friend.” He could do things
like cue clarinet sounds to play slowly; but since he did not
know which pitches would play, the notes he heard then influenced
his next control gesture.
Meanwhile, at Boulez's research brainchild, IRCAM, in Paris,
work was under way on a digital signal-processing computer that
was capable of any synthesis configuration as well as real-time
audio processing. The 4X workstation was completed in the early
1980s and was like nothing the world had ever seen. Miller Puckette
created a Macintosh-based interface for the 4X in which processes
and controls were represented graphically. Patches could be
created by drawing patch cords between modules, and processing
algorithms could be switched on and off by various gates. He
named the program Max in honor of Mathews.
Max was later ported to the NeXT personal computer, where it
could be run with the help of peripheral hardware processors
in a configuration called the ISPW (IRCAM Signal Processing
Workstation). Though far more economical than the 4X, the ISPW
remained a pricey hardware-software combination. Max was then
released commercially as a kind of erector set for MIDI input,
processing, and output and is now under active development by
Cycling '74 (the sidebar “On the Web” provides URLs
for all the developers mentioned in this article) for both the
Mac and Windows computers. The tools for interactivity were
now within the means of independent musicians.
SAY WHAT?
So what is meant, exactly, by machine responses to a human player?
Author-composer Robert Rowe classifies interactions into three
broad categories. The first concerns the type of “listening”
a computer is doing. The second describes the computer response
types. The third describes the nature of the partnership between
performer and computer. As for listening, computers can listen
generally or specifically.
General listening means that the computer senses general characteristics
such as register, loudness, or density. Specific listening can
come in two forms. One, score following, involves moment-by-moment
estimations of a performer's tempo. One commercial score follower
is SmartMusic, a practice aid for music students, by MakeMusic
Inc. The program has accompaniments to standard repertoire for
most solo instruments. A piece's accompaniment plays along with
a soloist, whose tempo is tracked with a microphone. A less
rigorous form of listening, score orientation, does not make
continual tempo estimations but responds to selected highlights,
such as a trigger from a pedal or a high note at a given pitch.
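To make the two kinds of listening concrete, here is a minimal Python sketch that assumes the performance arrives as a list of timestamped MIDI notes. The `Note` class and both function names are hypothetical. The tempo estimate is a plain least-squares fit, and it assumes every performed onset has already been matched to its beat in the score; a real score follower has to make that match itself, moment by moment, and its actual method is not shown here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Note:
    onset: float   # seconds since the performance started
    pitch: int     # MIDI note number (60 = middle C)
    velocity: int  # MIDI velocity, 1-127


def general_features(notes, start, end):
    """General listening: summarize register, loudness, and density
    for the notes whose onsets fall in the window [start, end)."""
    window = [n for n in notes if start <= n.onset < end]
    if not window:
        return {"register": None, "loudness": 0.0, "density": 0.0}
    return {
        "register": mean(n.pitch for n in window),     # average MIDI pitch
        "loudness": mean(n.velocity for n in window),  # average velocity
        "density": len(window) / (end - start),        # notes per second
    }


def estimate_tempo(score_beats, onset_times):
    """Simplified score following: fit onset = a + b * beat by least
    squares and convert b (seconds per beat) to beats per minute.
    Assumes each onset is already matched to its beat in the score."""
    n = len(score_beats)
    mean_beat = sum(score_beats) / n
    mean_onset = sum(onset_times) / n
    cov = sum((b - mean_beat) * (t - mean_onset)
              for b, t in zip(score_beats, onset_times))
    var = sum((b - mean_beat) ** 2 for b in score_beats)
    return 60.0 / (cov / var)


phrase = [Note(0.0, 60, 80), Note(0.5, 64, 100), Note(1.0, 67, 90), Note(1.5, 72, 110)]
features = general_features(phrase, 0.0, 2.0)      # register 65.75, loudness 95, density 2.0
bpm = estimate_tempo([0, 1, 2, 3], [0.0, 0.5, 1.0, 1.5])  # 120.0
```

A performer who speeds up compresses the onset times, which lowers the fitted seconds-per-beat and raises the estimated tempo; an accompaniment can then follow that estimate.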
So much for listening. Now we can consider three forms of response.
Transformative responses create variations on a performance.
For example, Max can be configured to invert intervals, play
a phrase backwards, transpose notes, arpeggiate chords, sense
the current harmony and add a bass note, create chords from
a melody, and more. Generative responses are based on material
that the computer creates on its own, such as algorithmic creation
of melodies from a library of pitches and rhythms (see “Game
of Chance” in the November 2003 EM for more on algorithmic
composition). Sequenced responses consist of stored musical
passages that are kept on hand to be played when triggered.
For example, in a score-oriented listening system, certain events
in a score, such as a long, loud middle A, might trigger a preset
melody. The performer might then create variations on the melody
using a continuous-control pedal that changes the sequence's
tempo or dynamics.
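Once a phrase is stored as MIDI note numbers, the transformative, generative, and sequenced responses described above reduce to simple list operations. The sketch below is illustrative Python, not Max code; the function names, the default thresholds, and the reading of “middle A” as MIDI note 69 (A440) are assumptions.

```python
import random


def invert(pitches, axis):
    """Mirror each pitch around an axis note, so rising intervals fall."""
    return [2 * axis - p for p in pitches]


def retrograde(pitches):
    """Play the phrase backwards."""
    return pitches[::-1]


def transpose(pitches, semitones):
    """Shift every pitch by the same number of semitones."""
    return [p + semitones for p in pitches]


def generate_melody(pitch_library, rhythm_library, length, seed=None):
    """Generative response: draw (pitch, duration) pairs at random from stored libraries."""
    rng = random.Random(seed)
    return [(rng.choice(pitch_library), rng.choice(rhythm_library))
            for _ in range(length)]


def cues_sequence(pitch, velocity, duration, target=69, min_velocity=100, min_seconds=1.0):
    """Score-oriented cue for a sequenced response: True for a long, loud
    note at the target pitch (MIDI 69 = A440 by default, an assumption)."""
    return pitch == target and velocity >= min_velocity and duration >= min_seconds


motif = [60, 64, 67]                      # C, E, G
inverted = invert(motif, 60)              # [60, 56, 53]
backwards = retrograde(motif)             # [67, 64, 60]
up_an_octave = transpose(motif, 12)       # [72, 76, 79]
new_line = generate_melody([60, 62, 64, 67, 69], [0.25, 0.5, 1.0], 8, seed=1)
```

Seeding the random generator makes a generative response repeatable in rehearsal; leaving `seed=None` gives a different melody each time.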
Finally, we can think of two roles the computer might play in
a performance. In one, the computer extends the player's instrument,
augmenting a solo performance with features such as filtering,
effects, or pitch doubling. In the second, the computer creates
another personality, so that it plays a kind of duet with a
musician. Sophisticated implementations of duet partnering may
rely on techniques of artificial intelligence to perform tasks
such as identifying phrase beginnings and endings or sensing changes
of scale, mode, or key.
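Reliable phrase detection is a hard problem, but a crude heuristic shows what the task involves: treat any silence longer than some threshold as a phrase boundary. This Python sketch is a simple stand-in for the AI techniques mentioned above, not an example of them; the function name and the half-second threshold are arbitrary assumptions.

```python
def segment_phrases(onsets, durations, gap_threshold=0.5):
    """Split a monophonic line into phrases wherever the silence between one
    note's end and the next note's start exceeds gap_threshold seconds.
    Returns (start_index, end_index) pairs, with end_index exclusive."""
    if not onsets:
        return []
    phrases = []
    start = 0
    for i in range(1, len(onsets)):
        previous_end = onsets[i - 1] + durations[i - 1]
        if onsets[i] - previous_end > gap_threshold:
            phrases.append((start, i))  # close the current phrase
            start = i                   # and begin a new one at this note
    phrases.append((start, len(onsets)))
    return phrases


# Three legato notes, a 1.5-second rest, then two more notes.
boundaries = segment_phrases([0.0, 0.5, 1.0, 3.0, 3.5], [0.5] * 5)  # [(0, 3), (3, 5)]
```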
MESSAGE IN A CABLE
The previous examples described MIDI responses. MIDI is an effective
vehicle for interaction, given its discrete, event-based format.
Incoming events can be marked with time stamps, easily cataloged,
and complemented by stored catalogs of algorithms or sequences.
MIDI, however, provides an incomplete representation of a performance.
Notably absent is any description of timbral variation. But
an extension to Max called MSP adds the ISPW audio-processing
modules to the environment, letting today's computer owners
explore what was once only possible with the 4X, at less than
one one-hundredth of the cost.
While an audio-based system has the advantage of being more
closely tied acoustically to a performance, it lacks many of
the flexibilities of a MIDI-based system. Responses such as
playing a phrase in reverse or inverting all pitches around
a given note are easy to implement with MIDI's unambiguous event
types, but much more difficult to perform with a stream of audio
samples. Polyphony is another area where MIDI has the edge: a
chord is easily recognizable as a set of discrete pitches. This
level of analysis is impossible for an acoustic signal, as no
one has been able to create a program that can distinguish
simultaneous pitches from the overtones of a fundamental pitch.
Acoustic systems, then, are typically based on input from a
monophonic instrument.
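Because MIDI delivers each note as a discrete, timestamped event, recognizing a chord can be as simple as grouping note-ons that arrive almost together. Here is a minimal Python sketch; the function name is hypothetical, and the 30 ms grouping tolerance is an illustrative assumption rather than a value from any standard.

```python
def group_chords(events, tolerance=0.03):
    """Group (time, pitch) note-on events into chords: a note joins the
    current chord if its onset is within `tolerance` seconds of that
    chord's first note. Returns a list of (chord_time, [pitches])."""
    chords = []
    for time, pitch in sorted(events):
        if chords and time - chords[-1][0] <= tolerance:
            chords[-1][1].append(pitch)      # same chord, slightly spread attack
        else:
            chords.append((time, [pitch]))   # start a new chord
    return chords


played = [(0.0, 60), (0.01, 64), (0.02, 67), (1.0, 62), (1.005, 65)]
chords = group_chords(played)   # [(0.0, [60, 64, 67]), (1.0, [62, 65])]
```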
Pitch trackers can identify the fundamental of a monophonic
instrument or signal. With a pitch-tracking module, a signal's
frequency can be sent to an oscillator to control its pitch,
or the signal may be transposed. Other audio-based applications
could include using the volume of an acoustic signal to modify
the index of a frequency-modulating oscillator, or mapping MIDI
controller values to audio processes such as reverb time, filter
frequencies, or stereo placement. Analysis modules can, for
example, examine incoming speech, separate noisy sibilants from
periodic vowels, and process each differently.
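The following Python sketch ties the pitch-tracking and volume-mapping ideas together. It estimates the fundamental of a monophonic buffer by autocorrelation (one common approach among several, not necessarily what any particular Max/MSP object uses), measures the buffer's RMS level, and drives a simple FM voice whose carrier follows the tracked pitch and whose modulation index rises with the player's volume. The sample rate, the 80 to 1,000 Hz search range, the index scaling, and all function names are assumptions.

```python
import math

SAMPLE_RATE = 8000  # assumed rate, kept low so the pure-Python loops stay fast


def track_pitch(samples, sample_rate=SAMPLE_RATE, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental of a monophonic buffer by autocorrelation:
    the lag at which the signal best matches a delayed copy of itself
    is taken as one period."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("buffer too short for the requested pitch range")
    return sample_rate / best_lag


def rms(samples):
    """Root-mean-square level, a simple measure of how loudly the player is playing."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def fm_voice(carrier_hz, ratio, index, n_samples, sample_rate=SAMPLE_RATE):
    """Simple FM: a sine carrier whose phase is modulated by a second sine
    at carrier_hz * ratio; a larger index gives a brighter spectrum."""
    mod_hz = carrier_hz * ratio
    return [math.sin(2 * math.pi * carrier_hz * t / sample_rate
                     + index * math.sin(2 * math.pi * mod_hz * t / sample_rate))
            for t in range(n_samples)]


# A 200 Hz test tone stands in for 0.1 second of microphone input.
mic = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]
pitch = track_pitch(mic)            # the tracked fundamental drives the carrier
index = min(5.0, 10.0 * rms(mic))   # louder input -> larger index, capped at 5
voice = fm_voice(pitch, ratio=2.0, index=index, n_samples=800)
```

Capping the index keeps a sudden fortissimo from pushing the FM voice into harsh, noisy timbres; the cap and the scaling factor are the knobs a performer would tune by ear.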
OSC (Open Sound Control) is a protocol introduced by the Center
for New Media and Audio Technologies (CNMAT) at the University
of California at Berkeley in the late 1990s to enable real-time
control of computer-synthesis processes from gestural devices.
OSC is not built on MIDI messages, but MIDI messages can easily
be mapped into OSC, so OSC can express anything the MIDI
protocol can. OSC offers increased resolution and definition of
gestures and synthesis parameters, as well as more accurate
time control. It is transmitted over networks of computers,
which means that it is well suited for broadcast performances
of computers and performers interacting with each other from
different places. The Gibson guitar company has also developed
the MaGIC specification, which sends an electric guitar's audio
signal over an Ethernet network, giving guitarists the opportunity
to participate in these simulcast collaborations.
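An OSC message is a null-padded address string, a type-tag string beginning with a comma, and big-endian binary arguments. The following Python sketch encodes int32 and float32 arguments following the OSC 1.0 layout and maps a MIDI note-on onto OSC. The `/midi/<channel>/noteon` address scheme, the host, and the port are hypothetical, since OSC leaves address naming to each application. OSC itself is transport-independent; UDP is a common choice, as in `send_osc`.

```python
import socket
import struct


def osc_string(s):
    """OSC-string: ASCII bytes, null-terminated, padded with nulls to a multiple of 4."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Encode an OSC message with int32 ('i') and float32 ('f') arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)   # big-endian 32-bit integer
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)   # big-endian IEEE 754 float
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(tags) + payload


def midi_note_on_to_osc(channel, note, velocity):
    """Map a MIDI note-on to a hypothetical OSC address, rescaling velocity to 0.0-1.0."""
    return osc_message(f"/midi/{channel}/noteon", note, velocity / 127.0)


def send_osc(packet, host="127.0.0.1", port=9000):
    """Send one OSC packet as a UDP datagram (host and port are placeholders)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


packet = midi_note_on_to_osc(1, 60, 127)  # 28 bytes: address, ",if", int32, float32
```

Unlike MIDI's 7-bit velocity, the OSC version carries a 32-bit float, which is one concrete sense in which OSC offers finer resolution.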
SUBTLE MANIPULATIONS
Joel Chadabe probably chose the theremin for his original Synclavier
system because that instrument is practically unparalleled in
its sensitivity to micromodulations. Ironically, as the sound
capabilities of electronic instruments have evolved, their player
interfaces have become increasingly rudimentary. Interactive
performances often feature experimental-instrument types that
push the sensing envelope. Instruments like Don Buchla's Lightning
allow movements in space to be translated into MIDI control
signals.
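Turning a gesture into MIDI usually comes down to two steps: scale a sensor reading into the 0 to 127 range, then wrap it in a Control Change message. This Python sketch assumes the sensor reports a position between known limits; the function names are hypothetical, and it does not describe how Lightning itself maps space to MIDI.

```python
def scale_to_cc(position, low, high):
    """Map a sensor reading in [low, high] to a 7-bit MIDI controller value (0-127).
    Readings outside the range are clamped to its ends."""
    clamped = min(max(position, low), high)
    return round((clamped - low) / (high - low) * 127)


def control_change(channel, controller, value):
    """Build a 3-byte MIDI Control Change message.
    channel is 1-16; controller and value are 7-bit (0-127)."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI channels run from 1 to 16")
    return bytes([0xB0 | (channel - 1), controller & 0x7F, value & 0x7F])


# A hand halfway up a hypothetical 0.0-1.0 sensing range, sent as controller 1 on channel 1.
message = control_change(1, 1, scale_to_cc(0.5, 0.0, 1.0))  # bytes 0xB0, 0x01, 0x40
```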
Massachusetts Institute of Technology Media Lab composer Tod
Machover heads the development of hyperinstruments that generate
various control signals. The conducting dataglove translates
a conductor's left-hand movements into controls by tracking
the angle of each finger relative to the back of the hand, as
well as the angle of the joints of each finger. Hyperstrings
augment the capabilities of string instruments. One commission
by cellist Yo-Yo Ma used sensors that tracked bow angle,
bow pressure, wrist angle, and left-hand finger positions. Data
from the cello motions and an analysis of the instrument's audio
were fed into a computer that generated audio in response.
GET WITH THE PROGRAM
FIG. 1: Cycling '74's Max/MSP combines customized MIDI and audio
processing. Plain black patch cords carry MIDI-based processes,
while black-and-yellow striped patch cords carry audio processes.
This example creates simple FM synthesis.
Max/MSP is the software most commonly used in interactive music
applications (see Fig. 1). Its graphical front end facilitates
algorithm configuration, while the essential issues of event
scheduling and input tracking are kept “under the hood.”
This allows users to focus on music rather than computer cycles.
The Max environment has also spawned two offshoots. Pd (“pure
data” or “public domain”) is a version introduced
by Miller Puckette and distributed as free, open-source software.
It runs on all major operating systems and is under continual
development by a community of users. Yet another version, jMax,
is written in Java and is available from IRCAM's Web site.
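The event scheduling that Max keeps “under the hood” can be pictured as a priority queue of timestamped callbacks. The following Python sketch is a generic scheduler built on that assumption. It runs in logical time rather than against a real-time audio clock, and it is not how Max is actually implemented; the class and method names are hypothetical.

```python
import heapq
import itertools


class Scheduler:
    """Fire callbacks in time order; events at the same time fire in the order scheduled."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker, so callbacks are never compared
        self.now = 0.0                   # current logical time in seconds

    def schedule(self, delay, callback, *args):
        """Queue callback(*args) to run `delay` seconds after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), callback, args))

    def run_until(self, end_time):
        """Advance logical time to end_time, firing every event due by then."""
        while self._queue and self._queue[0][0] <= end_time:
            when, _, callback, args = heapq.heappop(self._queue)
            self.now = when
            callback(*args)
        self.now = end_time


# A metronome that reschedules itself every half second.
clock = Scheduler()
ticks = []


def tick():
    ticks.append(clock.now)
    clock.schedule(0.5, tick)


clock.schedule(0.0, tick)
clock.run_until(2.0)   # ticks == [0.0, 0.5, 1.0, 1.5, 2.0]
```

Because a callback can schedule further events, the same mechanism handles delays, echoes, and repeating patterns without any polling loop in the musical logic.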
Other systems suited to interactivity include Symbolic Sound's
Kyma system, an audio processor and sound-programming language
for Macintosh and Windows. Like Max, it is visually oriented,
but processing and synthesis modules are arranged on a timeline.
Kyma includes pitch and amplitude trackers, and it can be configured
to wait for a specific event (such as a middle C) before, for
example, running a script to generate notes (see Fig. 2).
FIG. 2: In Symbolic Sound's Kyma, processes are dragged onto a multitrack
timeline. Processes such as waitForLowG (center-left, track
1) are used to tell the system when to start various tasks.
James McCartney's SuperCollider, a free program for the Macintosh,
is a text-based programming environment. Although the absence
of a graphical interface makes SuperCollider harder to learn
than some programs, it also permits a greater degree of efficiency
and flexibility. For example, the number of active oscillators
can be assigned to a variable. Changing the number of oscillators
in a patch is simply a matter of changing the value assigned
to that variable, rather than adding or removing objects and
patch cords from the screen.
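The oscillator-count example translates directly into any text-based language. This Python sketch (not SuperCollider code) builds an additive tone from a bank of sine oscillators whose size is a single argument; the function name, the 1/k amplitude rolloff, and the 44.1 kHz sample rate are assumptions.

```python
import math


def additive_tone(fundamental_hz, num_oscillators, n_samples, sample_rate=44100):
    """Sum num_oscillators harmonic sine oscillators; partial k sits at
    k * fundamental_hz with amplitude 1/k. Changing the texture means
    changing one number, not rewiring a patch."""
    out = [0.0] * n_samples
    for k in range(1, num_oscillators + 1):
        freq = k * fundamental_hz
        for t in range(n_samples):
            out[t] += math.sin(2 * math.pi * freq * t / sample_rate) / k
    return out


thin = additive_tone(220.0, 3, 512)    # three oscillators
rich = additive_tone(220.0, 12, 512)   # twelve: only the argument changed
```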
Kyma's developer, Carla Scaletti, has pointed out that these
programs are computer music languages. Most commercial music
software falls into the category of a utility, meaning programs
that perform common, well-defined functions. It's true that
many utilities are quite complex — your average digital
audio sequencer is an example. But they cannot match the open-endedness
and flexibility of general-purpose languages that enable users
to configure whatever synthesis and audio-processing algorithms
they want, nor can they provide the same ability to tailor these
processes to customized input and output routings. You can take
all the features of your favorite commercial synths and combine
them in one custom environment, provided you have the computer
memory (and the patience!) to cobble them together. For those
wanting individualized performance environments, computer music
languages are the only way to fly.
INTO THE FUTURE
Interactive music raises intriguing questions about musical
intelligence, compositional methodology, and collaboration —
questions that only become more intriguing as computing power
advances. This is a pursuit likely to become an important current
of 21st-century music.