Friday, January 9, 2009

Dissonance and ‘Frisson’: Harmonic Spectral Complexity and Rate of Change of Complexity

“It does feel like the brighter vowels are placed in chords that will benefit from the ‘frisson’ of their higher overtones. (This compositional device also resonates with the use of solfege as an intonational device—the built-in brightening of 3rd and 7th degrees of the scale from their syllables.)”
  —  Liz Garnett, BCU, commenting on choral [and barbershop] orchestration in the previous CMT blog post.
“Fris•son´ [free-sohn]
{1770–80; < F: shiver, shudder, OF friçons (pl.) < LL frictiōnem, acc. of frictiō shiver (taken as deriv. of frīgēre to be cold)}
n.: a sudden, passing sensation of excitement; a shudder of emotion; a thrill.”
The previous CMT post led composer/vocal coach/musicologist Liz Garnett at the Birmingham Conservatoire to comment on the multi-factorial origins of the perceptual interest that’s aroused by harmonically complex polyphony. I especially liked Liz’s expression, ‘frisson’, when she was describing the sizzle of close harmony and the beat-frequencies among intertwining harmonics. The word ‘frisson’ exactly captures the sense that I experienced when I was singing tenor in a barbershop quartet for four years in the early 1990s. Maybe the word is vivid and accurate for you as well.

Liz averred that she had not studied the nature of the effect systematically [yet] but is interested in doing so. She and I both agree that the effect is real and that it deserves systematic study. The matter is of interest (a) to composers, who, by knowing additional dimensions of how the underlying acoustics and cognitive processes of music work, will be better able to write music that reliably and efficiently utilizes those processes; (b) to performers and conductors, who, by more fully understanding how a piece operates, will be better able to interpret and deliver the effects that the piece has to offer; (c) to recording engineers, sound-reinforcement engineers, and architects, who, through new insights into the harmonic properties of the sounds they are helping to produce, will be better able to create ‘systems’ and performance venues that do justice to the live aesthetics; and (d) to musicologists and music theorists, who want to enable us to make sense of it all. Maybe this somewhat esoteric topic even holds some interest for regular audience members—at least science-friendly, numerate ones like you.

There are a variety of wonderful recent books on the cognitive psychology of music; many of them have been cited in CMT posts over the past two years. But none of them, so far as I am aware, has specifically and quantitatively examined the mechanisms by which music captures and retains attention as a function of its harmonic spectral content and trajectory.

But how to go about doing this? Well, first we need a collection of examples of compositions and performances that illustrate the effect we want to study—the ‘frisson’ of harmonic/spectral trajectories. Many barbershop and other close-harmony choral pieces are ideal examples, insofar as they generate most of their perceptual interest precisely in the way that Liz and I were discussing. The Eric Whitacre ‘Lux Aurumque’ piece is perfect: it is harmonically rich, but it is rhythmically and textually very simple—elegant. Many pieces of instrumental chamber music will also be analyzable by the methods I have in mind, and contain passages where much of the cognitive content is generated by close-harmony polyphony. So, in response to Liz Garnett’s suggestion and with an interest in continuity with the CMT post that stimulated it, I thought I’d take the Dale Warland Singers’ recording of ‘Lux Aurumque’ and see what I could make of it with some digital spectrum analysis.

“I write bars, for the musicians—because they have to be ‘together’. It’s difficult when you have multiple different speeds going on at the same time. But don’t think that this is mathematics. There are some [mathematical] constructions in it [i.e., in my compositional design], but the whole thing is ‘music’. In my piano concerto I developed the polyphony to a much higher level of complexity. I think in harmony and rhythm I have found ‘it’. But in melody, I still search…”
  —  György Ligeti, interview with Dorle Soria, 1987, Musical America 107(4).
As I often try to do here on CMT, I invoke disciplines outside music itself, to see what techniques and analytical formalisms they may have to offer us. The cognitive robotics and artificial intelligence (AI) communities, for example, are not ordinarily ‘collaborators’ with music theory people, except on ‘music information retrieval’ (MIR) and a few other subjects. Consider, though, the recent work of Jürgen Schmidhuber and colleagues at TUM and IDSIA.

Jürgen Schmidhuber, Professor of Cognitive Robotics, TUM
“‘Interestingness’ becomes the first derivative [with respect to time] of ‘subjective beauty’ [as a function of time]. As the learning agent improves its data compression algorithm, formerly apparently-random data parts become subjectively more regular and ‘beautiful’. Such progress in [adaptive, dynamic] data compression is measured and maximized by the ‘curiosity drive’ [that is exhibited by the agent’s behavior]: create action sequences that extend the observation history and that yield previously unknown, unpredictable-but-quickly-learnable algorithmic regularity. That is how beauty arises and how our interest is captured.”
  —  Jürgen Schmidhuber, Professor of Cognitive Robotics, Technische Univ Munich and Co-Director, IDSIA, Lugano, Switzerland, 2007.
Jürgen Schmidhuber is not the first to have linked aesthetic ‘interest’ with the time-derivative (the rate of change) of ‘beauty’, nor is he the first to have associated ‘beauty’ with ‘complexity’. Fred Lerdahl at Columbia and many others have had similar thoughts in the past. But Jürgen takes the thought further, perhaps, than others have done, in his analysis of the detailed adaptive encoding/compression behaviors that accompany cognitive ‘interest’—its ‘origination–persistence–dissipation’ life-cycle.

And although the first-order time-derivative (slope) of harmonic complexity c(t) is by no means the only predictor of musical interest or subjective ‘frisson’ or ‘sizzle’, it is one of the things we can readily examine, to investigate how good or poor a correlate of ‘frisson’ it is. That’s what I set about doing this past week. The preliminary results of my initial explorations appear further down this page...

“Complexity that works is built up out of modules that work perfectly, layered one over the other.”
  —  Kevin Kelly, ‘Out of Control: The New Biology of Machines, Social Systems, & the Economic World’, 1995.
 Vowels, Formants F1 and F2
You may like to know that there are a variety of ‘freeware’ software applications that can help you do this sort of thing, in case you wish to undertake some investigation yourself. SFS from the Phonetics Dept at UCL is the tool that I happened to use, although there are several other purpose-built spectral analysis programs for harmonic and formant analysis of speech that allow importing .mp3 or .wav audio files. Or you can use the signal-processing modules of MATLAB or other full-function analytics packages.

The more challenging thing, in my view, is to select a quantitative complexity index (or devise an entirely original one) that adequately represents the harmonic complexity of the polyphony at each instant, and that lets us analyze the resulting timeseries of complexity values in sync with the music’s score. The mathematics and computer science literature on Kolmogorov complexity is especially extensive.

But Kolmogorov complexity is computationally harder to implement than I wanted to take on right now. So I considered Lempel-Ziv complexity (LZC), another complexity metric that has received much attention over the past 30 years. I have used LZC in a recent project in my software-engineering day-job, and I have coded it in C and C++ in the past. But LZC is a ‘left-to-right reading’ or ‘directional’ complexity metric, which is undesirable for what we are aiming to do here. We don’t want a complexity metric that gives undue emphasis or de-emphasis to harmonics on the ‘left’ of the input string (the low-frequency bands, bass) vs. the ‘middle’ (the mid-range registers, tenor and alto) vs. the ‘right’ (the high-frequency bands, soprano, etc.).
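To make that directionality concrete, here is a quick LZ76 phrase-counting sketch of my own (a minimal illustration written for this post, not the LZC code from my day-job project; the class name is mine). Fed one of the intensity strings that appears later in this post, and then the same string reversed, it returns different counts, even though both strings contain exactly the same symbols:

    // LZ76Demo.java -- minimal LZ76 phrase count, illustrating that the
    // metric is directional (reading order matters).
    public class LZ76Demo {
        // Grow each phrase while it already occurs in the text to its left;
        // a phrase that cannot be extended further is 'new' and bumps the counter.
        static int lz76(String s) {
            int n = s.length(), c = 0, i = 0;
            while (i < n) {
                int j = i;
                while (j < n && s.substring(0, j).indexOf(s.substring(i, j + 1)) >= 0) {
                    j++;
                }
                c++;            // s[i..j] is a new phrase
                i = j + 1;
            }
            return c;
        }

        public static void main(String[] args) {
            String a = "1020000000000000";
            String b = new StringBuilder(a).reverse().toString();
            System.out.println(lz76(a));  // prints 5
            System.out.println(lz76(b));  // prints 3 -- same symbols, mirrored
        }
    }

A complexity value that changes when you mirror the spectrum left-for-right is exactly what we don’t want for analyzing harmonics.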

For my initial study, I therefore settled on a simple, non-directional ‘compositional complexity’ metric, with a separate ‘channel’ for each of [any number of] the significant harmonics that exist (in the context of the chord prevailing at each moment). My complexity metric has no fancy adjustments for pitch error or jitter or phasing or nonlinear Fletcher-Munson- or Robinson-Dadson-type aural sensitivity curves or other important things. It’s a basic, exploratory tool, nothing more. It is not meant to be a definitive representation of the sound or the waveform or the neurophysiology of our hearing of the sound. It is not the product of some extensive, scholarly project. It’s just ‘lunch’—featuring left-over things I know from other analytics projects I’ve done, plus a few hours of Java coding and testing. In other words, my complexity metric is just a quick, simple way to see whether the interpretive path I’m on makes sense and whether it might be productive—productive for me, or sufficiently intriguing for you or others to pursue further.

This ‘compositional complexity’ metric is designed to assign the same complexity value to a given polyphony/harmonics ‘pattern’ regardless of its performance dynamics. For example, the strings A=‘1020000000000000’ and B=‘2070000000000000’ sound the same pitches. The pitches have different sound intensities and different relative loudnesses with respect to each other, so the timbres are actually substantially different. But for the preliminary purposes of this experiment, A and B are harmonically ‘the same’, according to our simple ‘compositional complexity’ metric, in terms of the pitches and beat-frequencies, etc., that they contain. Each has only two loudness levels instantiated and only two harmonics instantiated; therefore, their complexity values are identical.

What next? The sound intensity needs to be determined at each moment in time (in elapsed milliseconds, in the .mp3), with each moment/sample tagged with the corresponding notation in the music score. For simplicity’s sake, I arbitrarily chose 3-bit quantization of the sound intensity (8 levels, coded as ‘0’ to ‘7’), corresponding roughly to qualitative ‘nil’, ‘ppp’, ‘pp’, ‘p’, ‘mp’, ‘mf’, ‘f’, and ‘fff’ dynamics. I measured the sound level of each harmonic, ascertained from the SFS digital spectrograms of the Warland ‘Lux Aurumque’ .mp3, at each point in time. In other words, my quantization of this .mp3 did not just assign one dynamics code at each moment but instead assigned a multi-frequency ‘vector’ comprised of up to 16 three-bit dynamics codes at each moment, one for each harmonic present [plus the ‘0’=‘nil’ sound intensity code for any harmonic that was ‘implied’ by the prevailing chord structure but un-voiced or absent in the sampled audio].
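If you want to script that quantization step rather than eyeball it, a minimal sketch follows. (The dB breakpoints and the class name are invented for illustration; I assigned my own codes by inspecting the SFS spectrogram levels, not with fixed thresholds like these.)

    // Quantize3Bit.java -- map a measured harmonic level (in dB) to a
    // 3-bit dynamics code '0'..'7' ('nil' through 'fff'). The breakpoints
    // are illustrative assumptions, not the ones behind the figures below.
    public class Quantize3Bit {
        static char quantize(double db) {
            final double[] breakpoints = {-60, -50, -40, -30, -25, -20, -15};
            int code = 0;
            for (double b : breakpoints) {
                if (db > b) code++;   // each breakpoint exceeded raises the code
            }
            return (char) ('0' + code);
        }

        public static void main(String[] args) {
            // Hypothetical levels for four harmonics at one instant:
            double[] harmonicLevelsDb = {-18, -35, -70, -70};
            StringBuilder vector = new StringBuilder();
            for (double db : harmonicLevelsDb) vector.append(quantize(db));
            System.out.println(vector);   // prints "6300"
        }
    }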

 Harmonics
Here is a jpeg SFS screenshot of one of the ‘Lux’ vowels in the piece...

 SFS, spectrogram of Whitacre ‘Lux Aurumque’, m. 6
I then wrote a simple Java program to accept these 16-vectors as input and calculate the compositional complexity of each one. Now, we don’t want a coarser or finer representation of the frequency domain (EQ bands) to cause us to compute wildly different complexity index values. We want our complexity metric to be ‘normed’ and to vary, say, over a range between 0.0 and 1.0, regardless of how many or few bands or channels we use to measure the polyphony and harmonics. The nj are the counts for each harmonic ‘band’ in the sample. C varies from values close to zero for simple, flute-like, near-sine-wave sounds up to 1.0 for all-hell-broke-loose dissonance.

The equation that governs the ‘normalization’ of the computed complexity index is a function of the number of ‘bands’—the EQ spectrogram ‘length’ L (expressed in bins, with each note’s fundamental in bin #1, its octave in bin #2, and so on). The normalization is also a function of the number of sound-intensity levels, which was N=8 for my 3-bit quantization, as shown in this equation.

 Normed Complexity
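(In case the equation jpeg doesn’t load for you: the exact formula is in the image, but a normalized-entropy form consistent with the prose above, which I offer only as a plausible stand-in of mine and not a transcription, would be

    C \;=\; \frac{-\sum_{j} (n_j/L)\,\log_2 (n_j/L)}{\log_2 N}

where nj is the count for the j-th ‘band’ (in my stand-in reading, the number of bins carrying intensity code j), the nj sum to L, and N=8 is the number of quantization levels. A form like this is permutation-invariant across bins, hence non-directional as required, and it reaches 1.0 only when all N levels are equally represented.)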
The normalization calculation is done automatically in my little Java program. If you like, you can click on the jpeg of the Java code below and download a copy to play with.

 complexity.java program, DSM
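Since the program above is only a jpeg, here is a minimal stand-in sketch of my own that you can type in directly. It implements the normalized-entropy form guessed at above; again, that form is my assumption, and the screenshot remains authoritative for what I actually ran. Its first two output lines verify the A/B dynamics-invariance example from earlier.

    // ComplexitySketch.java -- stand-in for complexity.java: reads intensity
    // vectors (codes '0'..'7') from stdin and prints a complexity index
    // normalized to [0,1]. The entropy formula is my assumption, not
    // necessarily the one in the screenshot above.
    import java.util.Scanner;

    public class ComplexitySketch {
        static final int N = 8;   // 3-bit quantization: codes '0'..'7'

        static double complexity(String vector) {
            int L = vector.length();
            int[] counts = new int[N];                  // n_j: bins carrying code j
            for (char ch : vector.toCharArray()) counts[ch - '0']++;
            double h = 0.0;
            for (int nj : counts) {
                if (nj == 0) continue;
                double p = (double) nj / L;
                h -= p * (Math.log(p) / Math.log(2));   // Shannon entropy, in bits
            }
            return h / (Math.log(N) / Math.log(2));     // normalize by log2(N)
        }

        public static void main(String[] args) {
            // A and B sound the same harmonics at different dynamics;
            // the metric assigns them identical complexity:
            System.out.println(complexity("1020000000000000"));
            System.out.println(complexity("2070000000000000"));
            // Then score any vectors you type at the keyboard:
            Scanner in = new Scanner(System.in);
            while (in.hasNextLine()) {
                String v = in.nextLine().trim();
                if (!v.isEmpty()) System.out.println(v + " -> " + complexity(v));
            }
        }
    }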
You can download a free copy of the Java Development Kit (JDK) from Sun Microsystems here. Extract it to your hard drive. Then compile the complexity.java program in a command window with the ‘javac’ compiler command (quoting the path, since ‘Program Files’ contains a space):

     C:\> "\Program Files\Java\jdk1.6.0_11\bin\javac" complexity.java

To run the program, use the java command:

     C:\> "\Program Files\Java\jdk1.6.0_11\bin\java" complexity

You don’t need to be a software developer/programmer to do this. It’s easy. Entering some intensity vectors from the keyboard as strings gives you output like what you see below…

 Output of complexity.java with example 16-band harmonics and 3-bit quantization
Then we take that timeseries of complexity values computed from our spectral analysis measurements of the .mp3 music file and use difference equations (in Excel or whatever you like) to estimate the first derivative (slope, rate of change) of the complexity at each time point. When we plot it, it looks like this…

DSM, complexity analysis, 16-harmonic, 3-bit quantization, Eric Whitacre, ‘Lux Aurumque’, mm. 6 – 8, Dale Warland Singers
What we see here is that the moments having that ‘frisson’ Liz and I were referring to do indeed have high complexity levels. And the heart-pounding, emotionally evocative compositional structures/moments tend to be ones where the first derivative of complexity has large positive or negative values—moments of dramatic change in the polyphony’s harmonics and dynamics.
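If you’d rather script the differencing step than do it in Excel, a central-difference estimate is only a few lines. Here is a minimal sketch (the sample times and complexity values are placeholders, not data from the Warland recording):

    // SlopeSketch.java -- estimate dc/dt from a sampled complexity timeseries
    // using central differences (one-sided differences at the endpoints).
    public class SlopeSketch {
        static double[] slope(double[] t, double[] c) {
            int n = c.length;
            double[] dcdt = new double[n];
            for (int i = 0; i < n; i++) {
                int lo = Math.max(i - 1, 0);
                int hi = Math.min(i + 1, n - 1);
                dcdt[i] = (c[hi] - c[lo]) / (t[hi] - t[lo]);
            }
            return dcdt;
        }

        public static void main(String[] args) {
            double[] t = {0, 250, 500, 750, 1000};        // elapsed ms (illustrative)
            double[] c = {0.10, 0.22, 0.45, 0.40, 0.15};  // complexity C (illustrative)
            double[] d = slope(t, c);
            for (int i = 0; i < t.length; i++) {
                System.out.printf("t=%4.0f ms  c=%.2f  dc/dt=%+.5f per ms%n",
                                  t[i], c[i], d[i]);
            }
        }
    }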

“The art of ‘simplicity’ is a puzzle of ‘complexity’.”
  —  Douglas Horton, Harvard Divinity School, 1955.
When the performance is delivered in an echoic space like a cathedral, you get significantly non-zero spectral complexities that linger for hundreds of milliseconds after a note is released. The ‘shock’ value of this is almost as great as the ‘voix celeste’ beat-frequency implied-pitch effects we were discussing before. The effect is somewhat like SONAR: your voices are acoustically ‘imaging’ the geometry of the performance hall, much as a bat would do … flying around in the upper reaches of Chartres Cathedral at Christmas Eve midnight Mass. The composer has created a gesture that causes you to acoustically plumb your ‘situatedness’ and physicality as a human being in the cathedral, much like a bat. The polyphony—the frisson and its decay—causes you to realize your physical humanness in a very dramatic, palpable way. You are performatively ‘present’ more vividly than at almost any other time in your life. Through the ‘frisson’ you become intensely aware of your own personhood and that of the other singers (or instrumental players) around you. It is breathtaking to hear; tremendously exciting or emotionally unnerving to perform.

Incidentally (but unsurprisingly), expressively effective close-harmony ‘frisson’ doesn’t come suddenly out of ‘nowhere’. It has a foundation—a methodical, deliberate architecture. We look at the c(t) and dc(t)/dt curves, and we see that the effect builds progressively, logically, over a phrase or a portion of a phrase. You feel this happening when you sing it, of course. These spectral analysis and complexity timeseries plots only confirm quantitatively/objectively what most of us feel intuitively/subjectively. The same is true when you perform a similarly orchestrated close-harmony piece of instrumental music. The acclaim that Eric Whitacre has accrued over the past 10 years redounds to his skill in engineering these kinds of harmonic gestures. The ambiguity of the C#m [I] → C#m7/9 (D#m7) [II6] V-of-V supertonic motion, and the other things Whitacre has going on here, are impressive.

“Simplicity does not precede complexity but instead follows it.”
  —  Alan J. Perlis, ‘Software Metrics’, 1981.
A more extended, time-consuming effort will be required to statistically evaluate the correlation of the spectral ‘brightening’ (more acoustic energy in the higher harmonics) with certain vowel and consonant sounds, and with short time-scale structures associating the text’s phonetics with timbre, pitch, and spectrum. It’s more work than I have time to devote to right now. But this CMT post is meant to convince you that studying effects like these quantitatively is surely feasible—feasible even using available freeware and simple complexity metrics that you can code for free in Java. In other words, there is a real possibility for you to do this without a big expenditure for software or expensive computers. If you are a student in a conservatory, though, you probably will want to get copies of several of the books in the list below ...

 Vowels, formants F1 and F2
What else? Well, it’s important to note that you can’t write a piece that has constant low levels of complexity—or constant, excessive complexity—and expect people to sit still for it. Nor can you whip-saw people with changes in dc(t)/dt that are too frequent or too fast. You either won’t capture the artists’/audience’s attention at all, or you will rapidly lose it, annoying them in the process. Beauty and interest may indeed be functions of the first time-derivative, as Jürgen Schmidhuber says, but that doesn’t mean that ‘more is better’ or that ‘all frisson, all the time’ is a categorical aesthetic good. You need to write with a storyline—compose/arrange with a good ‘plot’ in mind. Of course, this comes as no surprise. For me, the surprise is being able now to quantitatively see what we already knew or believed—understanding the mechanisms of musical ‘interest’ in terms of spectral complexity timeseries. It’s a new way to understand the ‘mechanics’ of how good writing works, and to identify the faults and reasons why other compositions don’t work. Simple. And pretty useful, I think.

 Wundt curve, preference as function of complexity, elegant brinkspersonship
Thanks for your interest in this topic! Please feel free to post a comment or email me anytime, about this or other aspects of how we transmit/receive/perceive musical aesthetic value. If there’s enough interest, I’ll continue with more posts in this vein. And thanks especially to you, Liz, for the dialogue.

“The complexity of things—the things within things—is endless. Nothing is easy; nothing is simple... I was looking forward to telling the truth, or some of it, in all its complexity, to a person who would not be surprised or outraged by it.”
  —  Alice Munro, ‘Carried Away’, 2006.

John Rhys-Davies, ‘Indiana Jones: Raiders of the Lost Ark’, Lucasfilm, 1981
“Indy, why does the floor move?”
  —  Sallah (John Rhys-Davies) to Indiana Jones, upon opening the Well of Souls, ‘Raiders of the Lost Ark’, Lucasfilm, 1981.
 Well of Souls, ‘Indiana Jones: Raiders of the Lost Ark’, Lucasfilm, 1981
“It is our inimitable frisson of complexity—all writhing and beauteous complexity down here.”
  —  Snakes’ reply, to Sallah and Indiana, from the bottom of Well of Souls, ‘Raiders of the Lost Ark’, Lucasfilm, 1981.

