Quantization noise in ecasound
Suppose we use ecasound to generate a low-level (-78dBFS) 1kHz tone at 16-bit resolution, like this:
ecasound -t:12 -i:tone,sine,1000 -eadb:-78 \
    -f:s16_le -o:tone_undithered.wav
We should expect the output spectrum to have a sharp peak at 1kHz and nothing else, but the actual spectrum looks like this:
(This is an average of 16 Hann-windowed, 50%-overlapped FFT records of length 2^16. The reference line is at -96dB, the level of the least-significant bit.) The distortion products at harmonics and sub-harmonics of 1kHz are obvious. They result from re-quantization of ecasound’s internal signal representation (32-bit floating point) to 16-bit resolution at the output. This distortion could be eliminated with dither, if only dither were implemented in ecasound.
What is dither?
However digital audio is produced, at some point the individual samples get quantized to the resolution of the final storage format (usually 16- or 24-bit). Generally the samples are rounded at the least-significant bit. This introduces rounding error, and the rounding error effectively adds noise to the original signal. Quantization noise doesn’t look or sound like regular noise: because the rounding error is correlated with the original signal, the quantization noise has a spectrum that contains deterministic peaks, which get modulated by the original signal. That is, quantization (or re-quantization) adds harmonic distortion.
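To make the correlation concrete, here is a minimal Python sketch (not ecasound's actual code) that rounds the -78dBFS, 1kHz tone from the example above to integer steps. The 44100Hz sample rate and the mapping of full scale onto 32767 are my assumptions for illustration; ecasound's internal scaling may differ.

```python
import math

FS = 44100          # sample rate in Hz (assumed)
F0 = 1000           # tone frequency in Hz
LEVEL_DB = -78.0    # tone level in dBFS
FULL_SCALE = 32767  # assumed mapping of +/-1.0 onto 16-bit integers

def quantized_tone(n_samples):
    """Return (ideal, quantized) tone samples in units of one 16-bit step."""
    amp = FULL_SCALE * 10 ** (LEVEL_DB / 20)
    ideal = [amp * math.sin(2 * math.pi * F0 * n / FS) for n in range(n_samples)]
    quantized = [round(x) for x in ideal]   # plain rounding, no dither
    return ideal, quantized

ideal, quantized = quantized_tone(FS)       # one second of audio
error = [q - x for q, x in zip(quantized, ideal)]

levels = sorted(set(quantized))
print("peak amplitude in steps:", round(max(ideal), 3))
print("distinct output levels: ", levels)
print("largest rounding error: ", round(max(abs(e) for e in error), 3))
```

Under these assumptions the tone spans only about four steps either side of zero, so the output is a coarse staircase using the nine levels from -4 to 4. The rounding error is completely determined by the input sample, which is why it shows up as distortion tied to the tone rather than as a smooth noise floor.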
Quantization distortion can be reduced (even eliminated) by adding low-level noise before quantizing. Generally the added noise is at the level of half the least-significant bit; for 16-bit audio that’s -102dB. This is called dither and there are many variations on the basic idea. Dither can eliminate quantization distortion, but at the cost of raising the noise floor. Fortunately, noise shaping can be used to render the dither signal minimally audible. I would argue that any time a signal is re-quantized to lower resolution, it should be done with dither.
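As a rough illustration of why added noise helps, the sketch below quantizes a constant level that sits a quarter of the way between two steps, with and without dither. I've used triangular (TPDF) dither, the sum of two uniform random values, which is one common variant and my choice for this example; it is not necessarily what any particular tool does. The `quantize` helper is hypothetical.

```python
import random

def quantize(x, dither=False, rng=random):
    """Round x (in units of one quantization step) to an integer.

    With dither=True, triangular noise spanning +/-1 step is added first
    (the sum of two independent uniform values in +/-0.5 step).
    """
    if dither:
        x += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(x)

rng = random.Random(1)           # fixed seed so the run is repeatable
value = 0.25                     # a level one quarter of the way between two steps
n = 100_000

plain = [quantize(value) for _ in range(n)]
dithered = [quantize(value, dither=True, rng=rng) for _ in range(n)]

print("undithered mean:", sum(plain) / n)
print("dithered mean:  ", round(sum(dithered) / n, 3))
```

Without dither every sample rounds to zero and the information is lost. With dither the individual samples are noisy, but their average stays close to the original 0.25: the rounding error has been traded for noise that is no longer tied to the signal.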
Dithering output from ecasound
The undithered re-quantization from 32-bit float to 16-bit integer is internal to ecasound so we’re stuck with it for now. Fortunately ecasound is open-source, so someone could fix this in the source. I’ve had a look, but it’s a bit beyond me and I haven’t had time.
Instead, we can get ecasound to output its 32-bit floats directly, then use another tool to do the re-quantization to 16-bit ourselves. Here is how to use sox to dither the 1kHz tone example above:
ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-78 \
    -o:stdout | sox -q -c 1 -r 44100 -b 32 -e float -L -t raw - \
    -e signed -c 1 -b 16 -t wav tone_dithered.wav dither
That’s a mouthful: most of it is about explicitly defining the input and output audio stream formats, which we have to do since we’re piping raw data via stdout. The output spectrum now looks like this:
Why should we care?
Arguably, at 16-bit resolution quantization noise may be inaudible: at levels below -96dBFS it’s buried in the noise floor of most audio equipment. But there are other reasons to want dither. For example, I want to use ecasound to test my LADSPA plugins for distortion. Alas, the quantization distortion introduced by ecasound masks the distortion signals I’m looking for. I can fix this by using sox to dither, but this feels like a workaround; it would be much nicer if ecasound had built-in dithering.
Another reason to dither is dynamic range. A common misconception is that a 16-bit signal allows for only 96dB of dynamic range, since the lowest-level signal that can be represented is at the level of the least-significant bit, which is 2^-16 = -96.3dBFS. For example, if we try to generate a -102dBFS tone without dither, like this:
ecasound -t:12 -i:tone,sine,1000 -eadb:-102 \
    -f:s16_le,1,44100 -o:lowlevel_undithered.wav
then the output will contain absolute silence: every sample has a magnitude of at most -102dBFS, which is too small to reach the first quantization step, so they all round to zero in the 16-bit re-quantization at the output.
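The arithmetic behind this can be checked with a few lines of Python. This is only a sketch: the `to_db` and `from_db` helpers are mine, and I'm assuming that full scale maps onto 32767 when converting to 16-bit integers, which may not be exactly what ecasound does.

```python
import math

def to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)

def from_db(db):
    """Convert decibels to an amplitude ratio."""
    return 10 ** (db / 20)

FS = 44100          # sample rate in Hz, as in the command above
FULL_SCALE = 32767  # assumed mapping of +/-1.0 onto 16-bit integers

print("2^-16 in dB:", round(to_db(2 ** -16), 1))

# Peak of the -102dBFS tone, measured in 16-bit steps.
peak_steps = FULL_SCALE * from_db(-102)
print("peak of a -102dBFS tone, in 16-bit steps:", round(peak_steps, 2))

# One second of the tone, rounded to integers without dither.
tone = [peak_steps * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
rounded = [round(x) for x in tone]
print("non-zero samples after rounding:", sum(1 for q in rounded if q != 0))
```

Under that scaling assumption the tone peaks at roughly a quarter of one step, so plain rounding leaves no non-zero samples at all.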
Dither extends the dynamic range of 16-bit audio well beyond 96dB. If we generate the same -102dBFS tone, but with dithered re-quantization to 16 bits like this:
ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-102 \
    -o:stdout | sox -q -c 1 -r 44100 -b 32 -e float -L -t raw - \
    -e signed -c 1 -b 16 -t wav lowlevel_dithered.wav dither
then the output isn’t silent, and the spectrum clearly shows our 1kHz tone:
To listen to the result, play it back with 66dB of gain:
ecasound -i:lowlevel_dithered.wav -eadb:66 -o:alsa
As Helmholtz discovered more than a century ago, we don’t hear a waveform: we perceive its Fourier transform. (Actually just the magnitude; it seems we don’t perceive absolute phase.)
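The same idea can be tested numerically: look at a single Fourier component of the quantized samples instead of the samples themselves. The sketch below is not what sox does internally. It uses TPDF dither (my choice of variant), a hypothetical `tone_amplitude` helper, and the assumed 32767 full-scale mapping, and it measures the 1kHz component of a -102dBFS tone after rounding to integer steps.

```python
import math
import random

FS = 44100          # sample rate in Hz
F0 = 1000           # tone frequency in Hz
FULL_SCALE = 32767  # assumed mapping of +/-1.0 onto 16-bit integers

def tone_amplitude(samples, freq, fs):
    """Estimate the amplitude of the component at `freq` with a one-bin DFT."""
    w = 2 * math.pi * freq / fs
    re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
    im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

rng = random.Random(1)                       # fixed seed for repeatability
amp = FULL_SCALE * 10 ** (-102 / 20)         # about a quarter of one step
ideal = [amp * math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]

# Plain rounding versus rounding after adding triangular (TPDF) dither.
undithered = [round(x) for x in ideal]
dithered = [round(x + rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))
            for x in ideal]

print("tone amplitude going in:   ", round(amp, 3))
print("recovered, undithered:     ", round(tone_amplitude(undithered, F0, FS), 3))
print("recovered, TPDF dithered:  ", round(tone_amplitude(dithered, F0, FS), 3))
```

The undithered samples are all zero, so there is nothing to recover. The dithered samples only ever take values of -1, 0 or 1, yet the 1kHz component comes back at close to its original amplitude, because averaging over a second of audio pulls the tone out of the noise.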
Dither with libsndfile?
Since ecasound can be made to use libsndfile for writing audio files, and since libsndfile has some dithering code, perhaps we can get ecasound to dither like this:
ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-78 \
    -f:s16_le,1,44100 -o:sndfile,tone_undithered.wav
But no, it appears not:
On closer inspection of the libsndfile code I don’t think dither is fully implemented. It looks like there are some code stubs, but they don’t do anything.