Harmonic Entropy

Introduction[edit]

Harmonic Entropy, sometimes abbreviated as "HE", is a simple model to quantify the extent to which musical chords exhibit various psychoacoustic effects, lumped together in a single construct called psychoacoustic concordance. It was invented by Paul Erlich and developed extensively on the Yahoo! tuning and harmonic_entropy lists. Various later contributions to the model have been made by Steve Martin, Mike Battaglia, Keenan Pepper, and others.

Background[edit]

The general workings of the human auditory system lead to a plethora of well-documented and sonically interesting phenomena that can occur when a musical chord is played:

  • The perception of partial timbral fusion of the chord into one complex sound
  • The appearance of a virtual fundamental pitch in the bass
  • Critical band effects, such as timbral beatlessness relative to mistunings of the chord in the surrounding area
  • The appearance of a quick fluttering effect sometimes known as periodicity buzz

There has been much research in the literature specifically on the musical implications of critical band effects (e.g. Sethares's work), which are perhaps the psychoacoustic phenomena that readers are most familiar with. However, the modern xenharmonic community has displayed immense interest in exploring the other effects mentioned above as well, which have proven extremely important to the development of modern xenharmonic music.

These effects sometimes behave differently, and do not always appear strictly in tandem with one another. For instance, Paul Erlich has noted that most models for beatlessness measure 10:12:15 and 4:5:6 as being identical, whereas the latter exhibits more timbral fusion and a more salient virtual fundamental than the former. However, suppose we want to come up with a combined measure for how often effects such as the above tend to occur. It is then useful to note that

  • effects such as these tend to appear most strongly for those chords with large subsets that correspond to simple chunks of the harmonic series
  • the effects produced exhibit some degree of tolerance for mistuning

This enables us to speak of a general notion of the psychoacoustic concordance of a chord: the degree to which effects such as the above will appear when an arbitrary musical chord is played. Additionally, chords which are very inharmonic often exhibit a quality known as psychoacoustic discordance.

While psychoacoustic concordance is not a feature universal to all styles of music, it has been utilized significantly in Western music in the study of intonation. For instance, flexible-pitch ensembles operating within 12-EDO, such as barbershop quartets and string ensembles, will often adjust intonationally from the underlying 12-EDO reference to maximize the concordance of individual chords. Indeed, the entire history of Western tuning theory -- from meantone temperament, to the various Baroque well-temperaments, to 12-EDO itself, to the modern theory of regular temperament -- can be seen as an attempt to reason mathematically about how to generate manageable tuning systems that will maximize concordance and minimize discordance.

The Harmonic Entropy model is a simple way of quantifying how much an arbitrary chord will exhibit psychoacoustic concordance.

Concordance has often been confused with actual musical consonance, a confusion made more common by the psychoacoustics literature's use of the name sensory consonance, which most often refers to phenomena related to roughness and beatlessness specifically. This is not to be confused with the more familiar construct of tonal stability, typically just called "consonance" in Western common-practice music theory and sometimes clarified as "musical consonance" in the music cognition literature. To make matters worse, the literature has also at times referred to concordance -- and not tonal stability -- as tonal consonance, often in reference to phenomena related to virtual pitch integration, creating a complete terminological mess. As a result, the term "consonance" is avoided entirely in this article.

Model[edit]

The original Harmonic Entropy model limited itself to working with dyads. More recently, work by Steve Martin and others has extended this basic idea to higher-cardinality chords. This article will concern itself with dyads, as the dyadic case is still the most well-developed, and many of the ideas extend naturally to larger chords without need for much exposition.

The general idea of Harmonic Entropy is to first develop a discrete probability distribution quantifying how strongly an arbitrary incoming dyad "matches" each element in a set of basis rational intervals, and then to measure how evenly distributed the resulting probabilities are. If the distribution for some dyad is spread out very evenly, such that there is no clear "victor" basis interval that dominates the distribution, the dyad is considered to be more discordant; at the other extreme, if the distribution concentrates on one or a small set of basis rationals, the dyad is considered to be more concordant. A clear mathematical way of quantifying this is via the Shannon entropy of the probability distribution:

$$H(d) = -\sum_{b} p_d(b) \log_\beta p_d(b)$$

where H(d) is the Shannon entropy of the dyad d, the b are all of the basis rationals in the set, pd(b) is the probability assigned to basis rational b given an input dyad of d, and the base β of the logarithm reflects the units of information being used (by convention, we set β=e, corresponding to the use of nats). This is the Harmonic Entropy of the dyad d.
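
As a minimal illustration of this definition, the following Python sketch (the helper name is ours, not part of the model) computes the Shannon entropy in nats of a probability distribution over basis rationals. A distribution concentrated on a single "victor" rational yields a low value, while an evenly spread distribution yields a high one:

    import numpy as np

    def shannon_entropy(probs):
        """Shannon entropy, in nats, of a discrete probability distribution."""
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]                      # 0 * log(0) is taken to be 0
        return float(-np.sum(p * np.log(p)))

    # Probability mass concentrated on one basis rational: concordant, low entropy
    print(shannon_entropy([0.90, 0.05, 0.03, 0.02]))
    # Probability mass spread evenly: discordant, high entropy (= log 4 nats)
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))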

In order to systematically assign a probability distribution to this dyad, we start by defining a spreading function that dictates how the dyad is "smeared" out in log-frequency space, representing the auditory system's tolerance for mistuning. The typical choice that we will assume here for a spreading function is a Gaussian distribution, with mean centered on the incoming dyad and standard deviation s typically taken as a free parameter in the system.

A fairly typical choice of settings for a basic dyadic HE model would be:

  • The basis set is all those rationals bounded by some maximum Tenney height, with the bound typically notated as N and set to at least 10000.
  • The spreading function is typically a Gaussian distribution with a frequency deviation of 1% either way, or about s=~17 cents.

Other spreading functions have also been explored, such as the use of the "Vos function" of a·exp(b|x|) rather than the Gaussian distribution. We will assume the Gaussian distribution as the spreading function for the remainder of this article, so that the spreading function for a dyad d can be written as follows:

$$S(x-d) = \frac{1}{s\sqrt{2\pi}} \exp\left(-\frac{(x-d)^2}{2s^2}\right)$$

where the notation S(x-d) is chosen to make clear that we are translating S to be centered around the dyad d, which is now the mean of the Gaussian. In this notation, s becomes the standard deviation of the Gaussian, being an ASCII-friendly version of the more familiar symbol σ for representing the standard deviation. We assume here that the variable x represents cents, and will adopt this convention for the remainder of the article. Note that in previous expositions on Harmonic Entropy, s was sometimes given in units representing a percentage of linear-frequency deviation; we allow s to stand for cents here to simplify the notation. To convert from a percentage to cents, the formula cents = 1200*log2(1+percentage) can be used.
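
As a sketch of these conventions (the function names are ours), the percentage-to-cents conversion and the Gaussian spreading function might be written as:

    import numpy as np

    def percent_to_cents(percentage):
        """Convert a linear-frequency deviation (e.g. 0.01 for 1%) into cents."""
        return 1200 * np.log2(1 + percentage)

    def gaussian_spread(x, d, s):
        """Gaussian spreading function S(x - d) with mean d and standard deviation s, in cents."""
        return np.exp(-((x - d) ** 2) / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

    s = percent_to_cents(0.01)                 # ~17.2 cents for a 1% frequency deviation
    print(s)
    print(gaussian_spread(702.0, 700.0, s))    # spread of a 700-cent dyad, evaluated 2 cents away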

It is also common to use as a basis set all those rationals bounded by some maximum Weil height, with a typical cutoff for N set to at least 100. This has sometimes been referred to as seeding HE with the "Farey sequence of order N" and its reciprocals, so references in Paul's work to "Farey series HE" vs "Tenney series HE" are sometimes seen.
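
A straightforward, unoptimized way of generating either kind of basis set is to scan reduced fractions n/d and keep those under the chosen height bound. A Python sketch, assuming Tenney height n·d and Weil height max(n, d):

    from math import gcd

    def tenney_basis(N):
        """All reduced fractions n/d > 0 with Tenney height n*d <= N."""
        return [(n, d) for d in range(1, N + 1)
                for n in range(1, N // d + 1) if gcd(n, d) == 1]

    def weil_basis(N):
        """All reduced fractions n/d > 0 with Weil height max(n, d) <= N."""
        return [(n, d) for d in range(1, N + 1)
                for n in range(1, N + 1) if gcd(n, d) == 1]

    print(len(tenney_basis(10000)))   # basis set for "Tenney series" HE with N = 10000
    print(len(weil_basis(100)))       # basis set for "Farey series" HE with N = 100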

Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the domain-integral approach, works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the complexity-normalization approach, has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.

Domain-Integral Probabilities[edit]

For sets of basis rationals which are nowhere dense, the log-frequency spectrum can be divided up into domains assigned to the basis rationals. Each rational is assigned a domain with lower bound equal to the mediant of itself and its nearest lower neighbor, and likewise with upper bound equal to the mediant of itself and its nearest upper neighbor. If no such neighbor exists, ±∞ is used instead. Mathematically, this can be represented via the following expression:

$$p_d(b) = \int_{\text{¢}(b_l)}^{\text{¢}(b_u)} S(x-d)\, dx$$

where S(x-d) is the spreading function centered on d, bl and bu are the domain lower and upper bounds associated with basis rational b, and ¢(f) = 1200·log2(f) is the "cents" function converting frequency ratios to cents. Normally, bl is set equal to the mediant of b and its nearest lower neighbor (if it exists), or -∞ if not; likewise with bu.
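
The following Python sketch (our own illustration, using a deliberately tiny basis set restricted to one octave) computes domain-integral probabilities by sorting the basis rationals, taking the mediants of neighbours as domain bounds, and integrating the Gaussian spreading function over each domain via the normal CDF:

    import numpy as np
    from math import erf, log2, sqrt

    def cents(n, d):
        return 1200 * log2(n / d)

    def gaussian_cdf(x, mean, s):
        """Cumulative distribution of a Gaussian with the given mean and standard deviation s (cents)."""
        return 0.5 * (1 + erf((x - mean) / (s * sqrt(2))))

    def domain_integral_probs(basis, dyad_cents, s):
        """Probability of each basis rational: the integral of S(x - d) over its mediant-bounded domain."""
        basis = sorted(basis, key=lambda r: r[0] / r[1])
        # The mediant of adjacent fractions a/b and c/d is (a + c)/(b + d); convert it to cents.
        bounds = [-np.inf]
        for (n1, d1), (n2, d2) in zip(basis, basis[1:]):
            bounds.append(cents(n1 + n2, d1 + d2))
        bounds.append(np.inf)
        probs = [gaussian_cdf(hi, dyad_cents, s) - gaussian_cdf(lo, dyad_cents, s)
                 for lo, hi in zip(bounds, bounds[1:])]
        return basis, probs

    # Toy basis confined to a single octave, purely for illustration
    toy_basis = [(1, 1), (6, 5), (5, 4), (4, 3), (3, 2), (8, 5), (5, 3), (2, 1)]
    basis, probs = domain_integral_probs(toy_basis, 702.0, 17.0)
    print(max(zip(probs, basis)))   # the 3/2 domain should dominate for a ~702-cent dyad

The Harmonic Entropy of the dyad is then simply the Shannon entropy of the resulting list of probabilities.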

This process can be summarized by the following picture, taken from William Sethares' paper on Harmonic Entropy:

File:HarmonicEntropySethares.png

Note the difference in terminology here - in this example, the f_{j+n} are the basis rationals, the r_{j+n} are the domains for each basis rational, and the bounds for each domain are the mediants between each f_{j+n} and its nearest neighbor. The probability assigned to each basis rational is then the area under the spreading function curve over that rational's domain. The entropy of this probability distribution is then the Harmonic Entropy for that dyad.

In the case where the set of basis rationals consists of a finite set bounded by Tenney or Weil height, the resulting set of widths is conjectured to have interesting mathematical properties, leading to mathematically nice conceptual simplifications of the model. These simplifications are explained below.

Complexity-Normalization Probabilities[edit]

It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff N as a basis set, the domain widths for rationals sufficiently far from the cutoff seem to be proportional to 1/sqrt(nd). While it is still an open conjecture that this pattern holds for arbitrarily large N, the assumption is sometimes made that this is the case, and hence that for these basis rational sets, 1/sqrt(nd) "approximations" to the width are sufficient to estimate domain-integral Harmonic Entropy. This modifies the expression for the pd(b) as follows, noting that for the moment the "probabilities" won't sum to 1:

$$q_d(b) = \frac{S(\text{¢}(b)-d)}{\sqrt{n_b \cdot d_b}}$$

where the qd(b) now represent the unnormalized "probabilities", and nb and db are the numerator and denominator, respectively, of basis rational b. Again, the set of basis rationals here is assumed to be all of those rationals of Tenney Height ≤ N for some N.

A similar observation for the use of Weil-bounded subsets of the rationals suggests domain widths of 1/max(n,d), yielding instead the following formula:

$$q_d(b) = \frac{S(\text{¢}(b)-d)}{\max(n_b, d_b)}$$

where this time the set of basis rationals is assumed to be all of those of Weil Height ≤ N for some N.

In both cases, the general approach is the same: the value of the spreading function, taken at the value of cents(b), is divided by some sort of "complexity" function representing how much weight is given to that rational number. While the two complexity functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary complexities as follows:

$$q_d(b) = \frac{S(\text{¢}(b)-d)}{\|b\|}$$

where ||b|| denotes a complexity function mapping from rational numbers to non-negative reals.

Since these "probabilities" don't sum to 1, the result is not a true probability distribution, which invalidates a direct application of the Shannon entropy. To rectify this, the distribution is normalized so that the probabilities do sum to 1:

$$p_d(b) = \frac{q_d(b)}{\sum_{b'} q_d(b')}$$

The pd(b) are then used directly to compute the entropy.
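
Putting the pieces together, here is a sketch of the complexity-normalization computation for a single dyad, assuming the Tenney-height basis set and sqrt(n·d) complexity (the function names are ours):

    import numpy as np
    from math import gcd

    def harmonic_entropy(dyad_cents, basis, s=17.0):
        """Dyadic HE via complexity normalization: Gaussian spread s (cents), sqrt(n*d) complexity."""
        cents = np.array([1200 * np.log2(n / d) for n, d in basis])
        complexity = np.array([np.sqrt(n * d) for n, d in basis])
        # The Gaussian's constant factor cancels under normalization, so it is omitted here.
        q = np.exp(-((cents - dyad_cents) ** 2) / (2 * s ** 2)) / complexity
        p = q / q.sum()                            # normalize so the probabilities sum to 1
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))       # Shannon entropy, in nats

    # Basis set: all reduced n/d with Tenney height n*d <= 10000
    basis = [(n, d) for d in range(1, 10001)
             for n in range(1, 10000 // d + 1) if gcd(n, d) == 1]

    print(harmonic_entropy(702.0, basis))   # close to 3/2: comparatively low entropy
    print(harmonic_entropy(750.0, basis))   # far from any simple ratio: higher entropy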

This approach to assigning probabilities to basis rationals is useful because it hypothetically makes it possible to consider the HE of sets of rationals which are dense in the reals, or even the entire set of positive rationals ℚ+, although the best way to do this is a subject of ongoing research.

Examples[edit]

In all of these examples, the x-axis represents the width in cents of the dyad, and the y-axis represents discordance rather than concordance, measured in nats of Shannon entropy. Note that by convention, the value for s is typically expressed as a percentage of frequency deviation; this can be converted to cents via

$$\text{cents} = 1200 \cdot \log_2(1 + \text{percentage})$$

The first example uses as a spreading function the Gaussian distribution with s ≈ 17 cents (a linear-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10000, using the complexity-normalization approach with complexity function sqrt(n·d):

File:HE Tenney N 10000 s 17cents.png

This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The complexity function here is max(n,d):

File:HE Weil N 100 s 17cents.png

The following image (from Paul Erlich) compares the domain-integral and complexity-normalization approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 cents, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:

File:HE Tenney mediant vs sqrt nd Paul.png

Harmonic Rényi Entropy[edit]

An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of Shannon entropy by replacing it with Rényi entropy, a q-analog of Shannon's original entropy. The Harmonic Rényi Entropy of order a of an incoming dyad can be defined as follows:

$$H_a(d) = \frac{1}{1-a} \log_\beta \sum_b p_d(b)^a$$

where the p_d(b) are the probabilities assigned by dyad d to each basis rational b. Being a q-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as a→1, a fact which can be verified using L'Hôpital's rule as found here.
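
A short Python sketch of this definition (the helper is our own), treating a = 1 as the Shannon limit and a = ∞ as the min-entropy:

    import numpy as np

    def renyi_entropy(probs, a):
        """Renyi entropy of order a, in nats, of a discrete probability distribution."""
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]
        if a == 1:
            return float(-np.sum(p * np.log(p)))   # Shannon entropy as the a -> 1 limit
        if np.isinf(a):
            return float(-np.log(p.max()))         # min-entropy
        return float(np.log(np.sum(p ** a)) / (1 - a))

    dist = [0.7, 0.2, 0.08, 0.02]
    for a in (0, 1, 2, np.inf):
        # Renyi entropy is non-increasing in a: Hartley >= Shannon >= collision >= min-entropy
        print(a, renyi_entropy(dist, a))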

The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in this paper and this RFC. More precisely, the Rényi entropy of order ∞, also called the min-entropy, is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn. In a musical context, by considering the incoming dyad as analogous to a cryptographic code which is attempting to be "cracked" by an intelligent auditory system, we can consider that the analogous "worst-case attacker" would be a "best-case auditory system" which has complete awareness of the probability distribution for any incoming dyad. This analogy would view such an auditory system as actively attempting to choose the most probable rational, rather than drawing a rational at random weighted by the distribution.

The use of a=∞ min-entropy would reflect this view. In contrast, the use of a=1 Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter a interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.

Some psychoacoustic effects naturally fit into this paradigm, such as the virtual pitch integration process, which actually does attempt to find a single victor when matching incoming chords with chunks of the harmonic series. Other psychoacoustic effects, such as that of beatlessness, may instead be better viewed as "dumb" processes whereby nothing in particular is being "chosen," but where a more uniform distribution of matching rational numbers for a dyad simply generates a more discordant sonic effect. Different values of a can differentiate between the predominance given to these two types of effect in the overall construct of psychoacoustic concordance.

Certain values of a reduce to simpler expressions and have special names.

a=0: Harmonic Hartley Entropy[edit]

$$H_0(d) = \log_\beta |R|$$

where |R| is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same Harmonic Hartley Entropy. The Hartley Entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: all Rényi Entropies are always guaranteed to be less than the Hartley Entropy.

File:HRE a=0.png

Harmonic Hartley Entropy (a=0) with the basis set all rationals with Tenney height ≤ 10000. Note that the choice of spreading function makes no difference in the end result at all.

a=1: Harmonic Shannon Entropy (Harmonic Entropy)[edit]

$$H_1(d) = -\sum_b p_d(b) \log_\beta p_d(b)$$

This is Paul's original Harmonic Entropy. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.

File:HE Tenney N 10000 s 17cents.png

Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and sqrt(n·d) complexity.

a=2: Harmonic Collision Entropy[edit]

$$H_2(d) = -\log_\beta \sum_b p_d(b)^2 = -\log_\beta P(P_d = Q_d)$$

where Pd and Qd are independent and identically distributed random variables corresponding to the same dyad, and the collision entropy is the same as the negative log of the probability that the two variables produce the same outcome.
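
As a quick illustration (our own toy example, not from the original exposition), the a=2 entropy of a distribution can be checked against the empirical collision rate of two independent draws:

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.6, 0.25, 0.1, 0.05])            # toy distribution over four basis rationals

    h2_direct = -np.log(np.sum(p ** 2))             # collision entropy: -log sum_b p(b)^2

    draws = rng.choice(len(p), size=(2, 200_000), p=p)
    collision_rate = np.mean(draws[0] == draws[1])  # empirical P(P_d = Q_d)
    print(h2_direct, -np.log(collision_rate))       # the two agree up to sampling noise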

File:HE Tenney N 10000 s 17cents a=2.png

Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and sqrt(n·d) complexity.

a=∞: Harmonic Min-Entropy[edit]

$$H_\infty(d) = -\log_\beta \max_b p_d(b)$$

This is the min-entropy, which simply takes the negative log of the largest probability in the distribution. This can be thought of as representing the "strength" of the incoming dyad from being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the a=∞ case is guaranteed to be a lower bound among all Rényi entropies.

File:HE Tenney N 10000 s 17cents a=7.png

Harmonic Rényi Entropy with a=7, with the high value of a chosen to approximate min-entropy (a=∞). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 cents), and the complexity function sqrt(n·d).

Convolution-Based Expression For Quickly Computing Rényi Entropy[edit]

Below is given a derivation that expresses Harmonic Rényi Entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the Fast Fourier Transform. The derivation below depends on the use of complexity-normalization probabilities, although it may be possible to extend it to domain-integral probabilities instead.

Preliminaries[edit]

The Harmonic Rényi Entropy is defined as

$$H_a(d) = \frac{1}{1-a} \log_\beta \sum_b p_d(b)^a$$

As before, we can write the pd(b) as follows:

$$p_d(b) = \frac{q_d(b)}{\sum_{b'} q_d(b')}$$

where the qd are "unnormalized" probabilities, and the denominator above is the sum of these unnormalized probabilities, so that all of the pd sum to 1.

To simplify notation, we first rewrite the denominator as a "normalization" function:

$$\psi(d) = \sum_b q_d(b)$$

and putting back into the original equation, we get

$$H_a(d) = \frac{1}{1-a} \log_\beta \left( \sum_b \left( \frac{q_d(b)}{\psi(d)} \right)^a \right)$$

Since ψ(d) is the same for each basis interval b, we can pull it out of the summation to obtain:

$$H_a(d) = \frac{1}{1-a} \log_\beta \left( \frac{\sum_b q_d(b)^a}{\psi(d)^a} \right)$$

To simplify, we can also rewrite the numerator, the sum of "raw" (unnormalized) pseudo-probabilities, as a function:

$$\rho_a(d) = \sum_b q_d(b)^a$$

Finally, we put this all together to obtain a simplified version of the Harmonic Rényi Entropy equation:

$$H_a(d) = \frac{1}{1-a} \log_\beta \left( \frac{\rho_a(d)}{\psi(d)^a} \right)$$

We thus reduce the term inside the logarithm to the quotient of the functions ρa(d) and ψ(d)a. Our aim is now to express each of these two functions in terms of a convolution product.

Convolution product for ψ(d)[edit]

ψ(d), the normalization function, is written as follows:

$$\psi(d) = \sum_b q_d(b)$$

Again, each qd(b) is defined as follows:

$$q_d(b) = \frac{S(\text{¢}(b)-d)}{\|b\|}$$

Assuming we are treating the d as constant, it is clear that the qd(b) are all scaled, translated, flipped versions of the spreading function S. We can use this property to rewrite each one as a convolution with a delta distribution:

$$q_d(b) = \left(S \ast \frac{\delta_{-\text{¢}(b)}}{\|b\|}\right)(-d)$$

Putting this back into the original summation, we obtain

$$\psi(d) = \sum_b \left(S \ast \frac{\delta_{-\text{¢}(b)}}{\|b\|}\right)(-d)$$

We note that the left factor in the convolution product is always S, and does not depend on b in any way. Since convolution is linear and thus distributes over addition, we can factor the S out of the summation to obtain

$$\psi(d) = \left[S \ast \left(\sum_b \frac{\delta_{-\text{¢}(b)}}{\|b\|}\right)\right](-d)$$

We can clean up this notation by defining the auxiliary distribution K:

$$K(d) = \left(\sum_b \frac{\delta_{-\text{¢}(b)}}{\|b\|}\right)$$

Which leaves us with the final expression:

$$\psi(d) = \left[S \ast K\right](-d)$$

Convolution product for ρa(d)[edit]

The derivation for ρa(d) proceeds similarly. Recall that the function is written as follows:

$$\rho_a(d) = \sum_b q_d(b)^a$$

The expression for each qd(b)a is:

$$q_d(b)^a = \frac{S(\text{¢}(b)-d)^a}{\|b\|^a}$$

We can again express this as a convolution of the function Sa, meaning the spreading function S taken to the a'th power, and a delta distribution:

$$q_d(b)^a = \left(S^a \ast \frac{\delta_{-\text{¢}(b)}}{\|b\|^a}\right)(-d)$$

Putting this back into the original summation and factoring as before, we obtain

$$\rho_a(d) = \left[S^a \ast \left(\sum_b \frac{\delta_{-\text{¢}(b)}}{\|b\|^a}\right)\right](-d)$$

And again we clean up notation by defining the auxiliary distribution

$$K^a(d) = \left(\sum_b \frac{\delta_{-\text{¢}(b)}}{\|b\|^a}\right)$$

so that

$$\rho_a(d) = \left[S^a \ast K^a\right](-d)$$

We have now succeeded in representing ρa(d) as a convolution.

Note that the function Ka(d) involves a slight abuse of notation, as it is not literally K(d) taken to the a'th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the a'th power.

Round-up[edit]

Taking all of this together, we can rewrite the original expression for Harmonic Rényi Entropy as follows:

$$H_a(d) = \frac{1}{1-a} \log_\beta \left( \frac{\left[S^a \ast K^a\right](-d)}{\left[S \ast K\right]^a(-d)} \right)$$

where the expression

$$\left[S \ast K\right]^a(-d)$$

represents the convolution of S and K, taken to the a'th power.

We have thus succeeded in representing Harmonic Rényi Entropy in terms of two convolution products, each of which can be computed in O(N log N) time.
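
The following Python sketch puts the whole procedure together for a ≠ 1 (all parameter choices and function names are ours; scipy's fftconvolve stands in for the FFT-based convolution, and the basis rationals are snapped onto a discretized cents axis). Because the Gaussian S is an even function, the reflections indicated by the (-d) arguments above can be dropped and the spikes placed directly at +¢(b):

    import numpy as np
    from math import gcd
    from scipy.signal import fftconvolve

    a = 2.0        # Renyi order (a != 1; treat a = 1 separately as the Shannon limit)
    s = 17.0       # Gaussian spread in cents
    N = 10000      # Tenney height bound for the basis set
    step = 0.5     # resolution of the cents axis

    # Padded axis: only rationals within a few standard deviations of 0..1200c can matter
    grid = np.arange(-200.0, 1400.0 + step, step)

    # K and "K^a" as arrays of weighted spikes at the grid-snapped cents of each basis rational
    K = np.zeros_like(grid)
    Ka = np.zeros_like(grid)
    for den in range(1, N + 1):
        for num in range(1, N // den + 1):
            if gcd(num, den) != 1:
                continue
            c = 1200 * np.log2(num / den)
            if grid[0] <= c <= grid[-1]:
                i = int(round((c - grid[0]) / step))
                w = 1 / np.sqrt(num * den)        # 1/sqrt(n*d) weight
                K[i] += w
                Ka[i] += w ** a

    # Spreading function sampled on a symmetric window
    kx = np.arange(-6 * s, 6 * s + step, step)
    S = np.exp(-kx ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

    psi = fftconvolve(K, S, mode="same")          # psi(d)   = sum_b S(d - cents(b)) / ||b||
    rho = fftconvolve(Ka, S ** a, mode="same")    # rho_a(d) = sum_b S(d - cents(b))^a / ||b||^a

    HE = np.log(rho / psi ** a) / (1 - a)         # H_a(d), in nats, at every grid point

    mask = (grid >= 0) & (grid <= 1200)
    for c, h in zip(grid[mask][::400], HE[mask][::400]):
        print(f"{c:7.1f} cents   H_a = {h:.3f}")

The dominant cost is the two FFT-based convolutions over the cents grid, rather than a separate sum over the basis set for every dyad, which is what makes whole-curve computations fast.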

References[edit]

Paul Erlich article

William Sethares article

Harmonic entropy (TonalSoft encyclopedia)

Harmonic entropy group on Yahoo

Harmonic entropy graph calculator (JavaScript)