Dither and ecasound

Ecasound is an incredibly versatile audio processing tool but it doesn’t have dither.  I wish it did.

Quantization noise in ecasound

Suppose we use ecasound to generate a low-level (-78dBFS) 1kHz tone at 16-bit resolution, like this:

ecasound -t:12 -i:tone,sine,1000 -eadb:-78 \
         -f:s16_le -o:tone_undithered.wav

We should expect the output spectrum to have a sharp peak at 1kHz and nothing else, but the actual spectrum looks like this:

[Figure: nodither]

(This is an average of 16 Hanning-windowed 50%-overlapped FFT records of length 2^16.  The reference line is at -96dB, the level of the least-significant bit.)  The distortion products at harmonics and sub-harmonics of 1kHz are obvious.  They result from re-quantization of ecasound’s internal signal representation (32-bit floating point) to 16-bit resolution at the output.  This distortion could be eliminated with dither, if only this were implemented in ecasound.

What is dither?

However digital audio is produced, at some point the individual samples get quantized to the resolution of the final storage format (usually 16- or 24-bit).  Generally the samples are rounded at the least-significant bit.  This introduces rounding error, and the rounding error effectively adds noise to the original signal.  Quantization noise doesn’t look or sound like regular noise: because the rounding error is correlated with the original signal, the quantization noise has a spectrum that contains deterministic peaks, which get modulated by the original signal.  That is, quantization (or re-quantization) adds harmonic distortion.
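A small numerical sketch can make that correlation concrete.  The Python below is not ecasound’s code: it assumes plain round-to-nearest and a float-to-integer scale factor of 32768 (both assumptions), and every name in it is made up for illustration.  It quantizes a -78dBFS, 1kHz tone sampled at 44.1kHz and checks that the rounding error repeats exactly along with the signal.

```python
import math

FS = 44100         # sample rate in Hz, as in the ecasound examples
FREQ = 1000        # tone frequency in Hz
LEVEL_DB = -78.0   # tone level in dBFS
SCALE = 32768      # assumed float-to-16-bit scale factor (not taken from ecasound)

# 1000/44100 reduces to 10/441: the sampled tone repeats every 441 samples.
G = math.gcd(FS, FREQ)
PERIOD = FS // G       # 441 samples
CYCLES = FREQ // G     # 10 cycles of the tone per period


def tone_sample(n):
    """Sample n of the tone, in units of one 16-bit quantization step."""
    k = (n * CYCLES) % PERIOD          # exact phase index, no float drift
    amplitude = 10 ** (LEVEL_DB / 20) * SCALE
    return amplitude * math.sin(2 * math.pi * k / PERIOD)


def quantize(x):
    """Round to the nearest integer step, with no dither."""
    return math.floor(x + 0.5)


samples = [tone_sample(n) for n in range(2 * PERIOD)]
quantized = [quantize(x) for x in samples]
errors = [q - x for q, x in zip(quantized, samples)]

levels = sorted(set(quantized))
print("distinct output levels:", levels)
print("largest rounding error (steps):", max(abs(e) for e in errors))
print("error repeats with the signal:", errors[:PERIOD] == errors[PERIOD:])
```

Under these assumptions the tone spans only about four steps either side of zero, and the error is an exact function of the signal’s phase.  A sequence that repeats every 441 samples at 44.1kHz can only contain multiples of 100Hz, so the error shows up as discrete spectral lines rather than as a smooth noise floor.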

Quantization distortion can be reduced (even eliminated) by adding low-level noise before quantizing.  Generally the added noise is at the level of half the least-significant bit; for 16-bit audio that’s about -102dBFS.  This is called dither, and there are many variations on the basic idea.  The cost is a raised noise floor.  Fortunately, noise shaping can be used to render the dither signal minimally audible.  I would argue that any time a signal is re-quantized to lower resolution, it should be done with dither.
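To see why adding noise helps, here is a minimal Python sketch of TPDF (triangular probability density) dither, one common variant.  The dither amplitude (two uniform values of ±0.5 step summed), the round-to-nearest quantizer and the test value of 0.3 steps are all assumptions chosen for illustration, not anything taken from ecasound or sox.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer step."""
    return math.floor(x + 0.5)


def tpdf_dither(rng):
    """Sum of two independent uniform values: triangular PDF spanning +/-1 step."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


rng = random.Random(1)   # fixed seed so the run is repeatable
x = 0.3                  # a constant input, 0.3 of a quantization step
n = 100_000

plain = [quantize(x) for _ in range(n)]
dithered = [quantize(x + tpdf_dither(rng)) for _ in range(n)]

mean_plain = sum(plain) / n
mean_dithered = sum(dithered) / n
print("undithered average output:", mean_plain)
print("dithered average output:  ", mean_dithered)
```

Without dither the 0.3 is simply lost: every output sample is 0.  With dither each individual sample is noisier, but the average comes out close to 0.3, so the information below one step survives as signal plus noise instead of turning into distortion.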

Dithering output from ecasound

The undithered re-quantization from 32-bit float to 16-bit integer is internal to ecasound so we’re stuck with it for now.  Fortunately ecasound is open-source, so someone could fix this in the source.  I’ve had a look, but it’s a bit beyond me and I haven’t had time.

Instead, we can get ecasound to output its 32-bit floats directly, then use another tool to do the re-quantization to 16-bit ourselves.  Here is how to use sox to dither the 1kHz tone example above:

ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-78 \
  -o:stdout | sox -q -c 1 -r 44100 -b 32 -e float -L -t raw - \
  -e signed -c 1 -b 16 -t wav tone_dithered.wav dither

That’s a mouthful: most of it is about explicitly defining the input and output audio stream formats, which we have to do since we’re piping raw data via stdout.  The output spectrum now looks like this:

[Figure: dither1]

This is much better: a sharp peak at 1kHz, no other deterministic peaks, and a flat noise floor (the spectrum of the TPDF dither signal plus quantization noise) as expected.

Why should we care?

Arguably, at 16-bit resolution quantization noise may be inaudible: at or below -96dBFS it’s buried in the noise floor of most audio equipment.  But there are other reasons to want dither.  For example, I want to use ecasound to test my LADSPA plugins for distortion.  Alas, the quantization distortion introduced by ecasound masks the distortion signals I’m looking for.  I can fix this by using sox to dither, but this feels like a workaround; it would be much nicer if ecasound had built-in dithering.

Dynamic range

Another reason to dither is dynamic range.  A common misconception is that a 16-bit signal allows for only 96dB of dynamic range since the lowest-level signal that can be represented is at the level of the least-significant bit, which is 2^-16 = -96.3dBFS.  For example, if we try to generate a -102dBFS tone without dither, like this

ecasound -t:12 -i:tone,sine,1000 -eadb:-102 \
         -f:s16_le,1,44100 -o:lowlevel_undithered.wav

then the output will contain absolute silence: all samples are smaller in magnitude than -102dBFS (about 2^-17 of full scale), so they round to zero in the 16-bit re-quantization at the output.
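The arithmetic behind that silence can be checked in a few lines of Python.  This is only a sketch: the scale factor of 32768 and the round-to-nearest quantizer are assumptions about how the float-to-16-bit conversion works, not details taken from ecasound.

```python
import math

FS = 44100      # sample rate in Hz
SCALE = 32768   # assumed float-to-16-bit scale factor (not taken from ecasound)


def dbfs_to_amplitude(db):
    """Convert a level in dBFS to a linear amplitude (full scale = 1.0)."""
    return 10 ** (db / 20)


def quantize(x):
    """Round to the nearest integer step, with no dither."""
    return math.floor(x + 0.5)


# The level quoted in the text for the least-significant bit.
lsb_db = 20 * math.log10(2 ** -16)

# Peak of the -102dBFS tone, measured in quantization steps.
peak_steps = dbfs_to_amplitude(-102) * SCALE

# One second of the tone, quantized without dither.
tone = [peak_steps * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
quantized = [quantize(x) for x in tone]

print("2^-16 in dB:", lsb_db)
print("peak of the tone in steps:", peak_steps)
print("distinct output values:", sorted(set(quantized)))
```

Under these assumptions the tone peaks at roughly a quarter of a step, never gets as far as the half-step rounding threshold, and so every sample comes out as zero.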

Dither extends the dynamic range of 16-bit audio well beyond 96dB.   If we generate the same -102dBFS tone, but with dithered re-quantization to 16 bits like this:

ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-102 \
  -o:stdout | sox -q -c 1 -r 44100 -b 32 -e float -L -t raw - \
  -e signed -c 1 -b 16 -t wav lowlevel_dithered.wav dither

then the output isn’t silent, and the spectrum clearly shows our 1kHz tone:

[Figure: lowlevel_dither]

With only a naive understanding of digital audio one might think this impossible.  Of course the output waveform looks nothing like a 1kHz sine wave:

[Figure: lowlevel_waveform]

Yet when this signal is amplified, what we hear is a 1kHz tone clearly audible against a background of noise, just as the spectrum above predicts.  Try listening yourself:

ecasound -i:lowlevel_dithered.wav -eadb:66 -o:alsa

As Helmholtz discovered more than a century ago, we don’t hear a waveform: we perceive its Fourier transform.  (Actually just the magnitude; it seems we don’t perceive absolute phase.)
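The survival of the tone can also be checked numerically rather than by ear.  The following Python is a rough sketch under stated assumptions: TPDF dither spanning ±1 step, round-to-nearest quantization and a scale factor of 32768 are illustrative choices and may not match what sox or ecasound do internally.  It dithers a -102dBFS, 1kHz tone down to integer steps, then measures the amplitude at 1000Hz and, for comparison, at 1500Hz with a single-bin DFT.

```python
import cmath
import math
import random

FS = 44100                         # sample rate in Hz
N = FS                             # one second: 1000Hz and 1500Hz fit whole cycles
SCALE = 32768                      # assumed float-to-16-bit scale factor
PEAK = 10 ** (-102 / 20) * SCALE   # tone peak in quantization steps (about 0.26)


def quantize(x):
    """Round to the nearest integer step."""
    return math.floor(x + 0.5)


def tpdf_dither(rng):
    """Sum of two independent uniform values: triangular PDF spanning +/-1 step."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def amplitude_at(signal, freq):
    """Amplitude of the sinusoidal component at freq, via a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


rng = random.Random(1)   # fixed seed so the run is repeatable
tone = [PEAK * math.sin(2 * math.pi * 1000 * n / FS) for n in range(N)]
dithered = [quantize(x + tpdf_dither(rng)) for x in tone]

print("output values used:", sorted(set(dithered)))
print("tone amplitude going in (steps):", PEAK)
print("amplitude recovered at 1000Hz:  ", amplitude_at(dithered, 1000))
print("amplitude found at 1500Hz:      ", amplitude_at(dithered, 1500))
```

The output takes no values other than -1, 0 and 1, so it looks nothing like a sine wave, yet the amplitude measured at 1000Hz should come out close to the 0.26 steps that went in, with far less at 1500Hz.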

Dither with libsndfile?

Since ecasound can be made to use libsndfile for writing audio files, and since libsndfile has some dithering code, perhaps we can get ecasound to dither like this:

ecasound -t:12 -f:f32_le,1,44100 -i:tone,sine,1000 -eadb:-78 \
         -f:s16_le,1,44100 -o:sndfile,tone_undithered.wav

But no, it appears not:

[Figure: lowlevel_sndfile]

No sign of dither here, although interestingly many of the distortion products are lower than they were in the undithered example above; I have no idea why.

On closer inspection of the libsndfile code I don’t think dither is fully implemented.  It looks like there are some code stubs, but they don’t do anything.

5 thoughts on “Dither and ecasound”

  1. Hi Richard,

    After ironing out a crossover using ecasound I noticed that there was all this distortion at low output levels. At first I blamed my amps, but some testing proved that this wasn’t the problem and it was upstream. I thought of your post, here, about dither. Could that be to blame? So I’ve implemented your approach where ecasound’s output is piped to Sox, and then out to the DACs via ALSA. Worked like a charm!

    It’s really impressive just how BAD the truncated audio data sounded when the “volume” level got down to low levels. The sound was distorted, and the lowest levels sounded like it was barely cutting in and out, almost like a dirty relay contact was the cause, or a diode had been inserted after the amp. It was definitely noticeable even on casual listening, because sometimes a quiet passage would sound distorted even if the overall volume level was not set all that low.

    How can we convince Kai that dither needs to be implemented in ecasound asap??? I would rather not have to pipe through Sox.

    • Hi Charlie,

      Agreed! Dithering via sox feels like a kludge. I thought of writing a ladspa dither plugin, but it would be limited to simple (e.g. TPDF) dither without noise shaping. Alas, people are going to want noise shaping, which would need to be done at the same point as the actual truncation (i.e. internal to ecasound) so the truncation noise signal can be pulled out and filtered.

      But it sounds to me like your underlying problem isn’t just output stage re-quantization. I’ve never heard quantization as bad as you describe. But dither covers a multitude of sins; it may be that adding dither is masking another problem. Where is the volume control in your chain? Maybe you have a poorly implemented software volume control somewhere?

      • The problematic distortion that disappeared when I implemented dither via Sox was definitely more than I would have anticipated. I am controlling volume in software (in VLC) which I assume is just manipulating an internal float. Happens for all source formats, e.g. mp3, wav, etc. in the same way. I am afraid that the volume operations are just a black box for me, although someone could comb through the source code to determine exactly what is going on.

        I would prefer to control volume (attenuation and gain) in the analog realm after the DAC but it’s not practical in my application.

      • I came back to add this:
        One thing that is changing when I pipe ecasound output to Sox is that ecasound is no longer sending output to alsa. I wonder if there is some issue going on there…

        I’m definitely no longer experiencing any problems with the Sox pipe, but it does require more CPU ticks and memory that way.

        • Is vlc outputting at 16-bit? If so, and if vlc’s software volume doesn’t use dither, then you have two sources of quantization distortion: vlc’s re-quantization to 16-bit (after scaling by the volume control), plus ecasound’s re-quantization to 16-bit at the final output. At low levels this could be bad! Your sox dither eliminates the second source of distortion, but can do nothing about the first.

          It’s essential to keep at least 24-bit resolution all the way through the processing chain. When I pipe audio from mpd to ecasound I pass the samples as 32-bit floats so there’s no re-quantization in the middle of the chain. Can you coerce vlc to do the same?
