Following up on an earlier post, I’ve been investigating why digital pianos sound so bad in stereo. I did a spectral analysis on the outputs of my Yamaha CP50 stage piano. The results confirm what I and others have been hearing: because of how they are sampled, stereo pianos don’t reproduce well except in headphones.
The graphs below show the levels of the first 5 harmonics produced by each note on my CP50, at the L and R line-level outputs as well as the L/mono output. The L/mono jack sums the L and R channels when a 1/4″ phone plug is inserted in the L jack only, although acoustically this is a really bad idea.
To create these graphs I struck and held each note on my CP50 (with the keyboard set to a fixed medium-loud strike velocity) and recorded 1 second of audio from both L and R line-level outputs. Reverb and all other on-board effects were disabled, and EQ pots were at their center positions. For each recording I calculated a mean spectrum (an average of several length-8192 windowed FFTs) after discarding the attack transient (about 20ms). From each mean spectrum I extracted the amplitudes of the first 5 spectral peaks. I did the same for the mono signal, derived by summing L+R channels (which is exactly what the unit’s “L/mono” line-level output does if there is no phone plug in the R jack).
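The measurement procedure above can be sketched in Python with NumPy. This is a minimal illustration, not the script used for the graphs: the Hann window, the 50% overlap, the averaging of magnitudes (rather than power), the 3% search band around each harmonic, and the function names `mean_spectrum` and `harmonic_levels_db` are all assumptions.

```python
import numpy as np

def mean_spectrum(x, fs, n_fft=8192, skip_s=0.020):
    """Average magnitude spectrum of x after discarding the attack transient."""
    x = np.asarray(x, dtype=float)[int(round(skip_s * fs)):]
    hop = n_fft // 2                  # 50% overlap (assumption)
    window = np.hanning(n_fft)        # Hann window (assumption)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = [np.abs(np.fft.rfft(window * x[s:s + n_fft])) for s in starts]
    if not frames:
        raise ValueError("signal is shorter than one FFT frame")
    # Scale so that a bin-centred, unit-amplitude sine reads about 1.0.
    mag = np.mean(frames, axis=0) / (window.sum() / 2)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, mag

def harmonic_levels_db(freqs, mag, f0, n_harmonics=5, tol=0.03):
    """Peak level (dB) found near each of the first n_harmonics multiples of f0."""
    levels = []
    for k in range(1, n_harmonics + 1):
        # Search a small band around k*f0; piano partials run slightly sharp.
        half_width = max(tol * k * f0, freqs[1])
        band = np.abs(freqs - k * f0) <= half_width
        levels.append(20 * np.log10(mag[band].max() + 1e-12))
    return levels
```

Given arrays `left` and `right` holding one recorded note, the mono case would be analysed by passing `left + right` to `mean_spectrum` and comparing the resulting harmonic levels with those of the individual channels.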
Timbre of the L/mono Output
These measurements confirm what I and many others have heard and complained about: the L/mono output gets the piano’s timbre egregiously wrong, and in a way that isn’t consistent across the keyboard. For instance the fundamental is 6-12dB down throughout the mid-range from D4 to A5. The 3rd and 5th harmonics are 6-12dB down throughout the bass octave from F2 to F3. Many other timbral irregularities are apparent from the graphs.
The culprit is destructive interference, caused by summing L and R signals that were recorded at spaced microphones when the original piano was sampled. Sound from a given string arrives at the two mics at slightly different times, and this delay produces a phase difference between the recorded signals. The phase difference depends on mic placement relative to the string that is sounding, as well as the wavelength of the harmonic in question: a given delay is a larger fraction of a cycle at shorter wavelengths. The worst case is when both channels contain a spectral component with the same level but opposite phase, since summing these components results in complete cancellation.
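To put numbers on this, the sketch below computes how much a single harmonic drops when L and R are summed with a given phase difference, relative to a perfectly in-phase sum. It treats the harmonic in each channel as a steady sinusoid; the function names, the `level_ratio` parameter, and the 343 m/s speed of sound are illustrative assumptions, not values taken from the CP50 samples.

```python
import numpy as np

def mono_sum_change_db(phase_deg, level_ratio=1.0):
    """Level of L+R for one harmonic, in dB relative to an in-phase sum.

    phase_deg:   phase difference between the L and R components
    level_ratio: amplitude of the R component relative to L (1.0 = equal)
    """
    phi = np.deg2rad(phase_deg)
    summed = np.abs(1.0 + level_ratio * np.exp(1j * phi))
    in_phase = 1.0 + level_ratio
    # Floor the magnitude to avoid log(0) at exact cancellation.
    return 20 * np.log10(np.maximum(summed, 1e-12) / in_phase)

def phase_from_path_difference_deg(freq_hz, path_difference_m, c=343.0):
    """Phase difference produced by unequal path lengths to two mics."""
    return 360.0 * freq_hz * path_difference_m / c

for phase in (0, 90, 120, 170):
    print(phase, round(float(mono_sum_change_db(phase)), 1))
```

With equal levels in both channels, a 90° difference costs about 3dB and a 120° difference about 6dB, and the loss grows without bound as the difference approaches 180°. When the levels are unequal, the cancellation is only partial even at 180°.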
The fundamental of note A#4 (A# above middle C) is close to worst-case: in the L/mono output the fundamental is 24dB below what it should be. The mean spectrum of my A#4 recording shows this clearly:
The fundamental is essentially absent. The brain supplies the missing fundamental so we actually perceive the correct pitch, but the timbre of this note sounds thin.
At note F3, summing L+R channels to mono causes the 3rd harmonic to drop out, and the 2nd harmonic is weak as well:
This note sounds wrong for obvious reasons.
For every note above E2 the L/mono signal has at least one harmonic that is well below its expected level. Below E2 the first 5 harmonics have wavelengths longer than about 1m, so phase differences between spaced mics are insignificant.
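The wavelength figure can be checked with a short calculation. The sketch below assumes equal temperament with A4 = 440Hz, ideal (exactly integer) harmonics, and a speed of sound of 343 m/s; the function names are illustrative.

```python
def note_frequency(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

def harmonic_wavelength_m(midi_note, harmonic, c=343.0):
    """Wavelength in air of the given harmonic, assuming ideal harmonics."""
    return c / (harmonic * note_frequency(midi_note))

E2 = 40  # MIDI note number of E2, with middle C = C4 = 60

for k in range(1, 6):
    print(k, round(harmonic_wavelength_m(E2, k), 2))
```

Under these assumptions the fundamental of E2 (about 82.4Hz) has a wavelength of roughly 4.2m and its 5th harmonic roughly 0.8m, which is in line with the approximate 1m figure above.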
- L and R channels contain surprisingly different spectra, with levels of most harmonics differing by 6dB or more but with no discernible pattern across the keyboard. This is probably due to the relationship between mic placement and standing waves within the cabinet of the piano that was sampled. The R channel has a more consistent timbre, but neither channel gets it right on its own. To get the most natural timbre we need to play back both channels, but in such a way that they don’t interfere (e.g. headphones).
- Contrary to the advice I often see given in musicians’ forums, better (more accurate) stereo monitors are not the solution. More accurate reproduction will serve only to more faithfully reproduce the problematic phase relationships inherent in the recorded samples.
- In both L and R channels the fundamental falls off steeply (24dB/oct) below note F2, as does the 2nd harmonic below F1. This suggests a steep high-pass filter around 85Hz. The roll-off could be a property of the original piano that was sampled, or a filter could have been applied in post-processing of the samples. Either way, for accurate reproduction of this sample library the sound system doesn’t need to go below 80Hz.
- The level of upper harmonics falls with rising pitch, for two reasons. (1) This is normal for any piano; notes above about E6 typically produce only the fundamental and maybe a weak 2nd harmonic. (2) Higher-frequency spectral components damp out faster, so in a mean (i.e. time-averaged) spectrum they show up with lower amplitude. In any case, spectral content falls off rapidly above 4kHz. For accurate reproduction of these samples the sound system doesn’t need to go above 10kHz (even 5kHz might be acceptable).
- The measurements presented here are specific to the sample library in the Yamaha CP50. However, from numerous comments in musicians’ forums I gather this behavior is typical of many other makes and models.
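As a rough check on the 24dB/oct roll-off noted in the list above: that slope is what a 4th-order high-pass filter produces well below its cutoff. The sketch below evaluates the magnitude response of an ideal 4th-order Butterworth high-pass at 85Hz. The Butterworth shape is purely an assumption for illustration; the measurements do not identify the filter type, or whether the roll-off comes from a filter at all.

```python
import numpy as np

def butterworth_highpass_db(freq_hz, cutoff_hz=85.0, order=4):
    """Magnitude (dB) of an ideal analog Butterworth high-pass filter."""
    ratio = cutoff_hz / np.asarray(freq_hz, dtype=float)
    return -10.0 * np.log10(1.0 + ratio ** (2 * order))

# Response at the cutoff, one octave below, and two octaves below.
for f in (85.0, 42.5, 21.25):
    print(f, round(float(butterworth_highpass_db(f)), 1))
```

Under this assumption the response is 3dB down at 85Hz, about 24dB down one octave lower, and about 48dB down two octaves lower, i.e. 24dB per octave once well below the cutoff.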
It will be difficult to make this piano sound right except in stereo headphones. In headphones destructive interference does not occur, since L and R signals are electrically and acoustically isolated. The brain blends the different spectra presented to the ears, smoothing out the timbral irregularities in the individual channels and resulting, perceptually, in the most natural timbre obtainable.
For live playback, stereo might be acceptable or it might not, depending on several factors. Near-field monitors can give good results for one listener, if they are placed at a very wide angle to mimic headphones. The listener’s head will partially block the R signal from reaching the left ear and vice versa, limiting the degree to which L and R channels interfere.
Otherwise, stereo playback results in L and R channels summing acoustically (at least on the mid-line between speakers) so at the ears the destructive interference will be the same as if the signals had been summed electrically to mono. This is why stereo pianos often sound bad in live situations.
To the extent that the reverberant field dominates at the listener position and presents a decorrelated sum of L and R recorded signals, the effects of destructive interference can be masked somewhat. Omnidirectional loudspeakers will likely perform best, since they put more energy into the reverberant field and allow the greatest number of room reflections to decorrelate the L and R acoustic signals. A pair of highly directional loudspeakers (e.g. typical horn-based PA systems and full-range monitors) presents the worst-case scenario: since these aim for a high ratio of direct to reverberant sound, they maximize the potential for audible destructive interference.
The results here suggest that using the R channel only (possibly in dual mono) will give much better results than playing back L and R channels in stereo, unless room reflections can be used to adequately mask the interference between channels. The L/mono jack should never be used on its own!
Update (18.9.2017): Mono playback of just the L or R channel avoids the phasing issues discussed here, but causes its own phasing issues in polyphonic music (i.e. chords). See this post for details.