Maximum slew rate typically found in music/voice
TimFox:
A serious suggestion for the mathematically inclined to answer the original question:

1. Assume the "music/voice" to be analyzed is on a conventional audio CD.
2. The 16-bit digital waveform, sampled at 44.1 kHz, goes through a reconstruction filter (something like \$\sin(x)/x\$) to give the audio output.
3. Determine the code that gives you a maximum-amplitude 1 kHz sine wave and define its reconstructed output as the reference level.
4. Put the full-scale digital square wave (zero and max on alternating samples) into the same reconstruction filter and compute the slew rate at the output.
5. Determine the code that gives you a 20 kHz square wave and do the same.
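A minimal numerical sketch of steps 3 and 4, assuming Python with NumPy; the variable names, oversampling factor, and edge trimming are choices made here, not from the post:

```python
import numpy as np

FS = 44100   # CD sample rate, samples per second
OS = 32      # oversampling factor for evaluating the reconstruction
N  = 512     # number of CD-rate samples to simulate

def reconstruct(x, os=OS):
    """Ideal sin(x)/x (Whittaker-Shannon) reconstruction of samples x,
    evaluated on a time grid os times denser than the sample grid."""
    k = np.arange(len(x))
    t = np.arange(len(x) * os) / os   # time in units of sample periods
    # sum_k x[k] * sinc(t - k); O(N^2), but fine for a sketch
    return np.array([np.dot(x, np.sinc(ti - k)) for ti in t])

def max_slew(y, os=OS):
    """Maximum slope of y, in full-scale units per second; edges are
    trimmed to avoid truncation ripple from the finite sinc sum."""
    y = y[len(y) // 4 : -len(y) // 4]
    return np.max(np.abs(np.diff(y))) * FS * os

# Step 3: reference = full-scale 1 kHz sine (full scale = +/-1.0 here).
ref = reconstruct(np.sin(2 * np.pi * 1000 * np.arange(N) / FS))

# Step 4: "square wave" alternating between the extreme codes each sample.
sq = reconstruct(np.where(np.arange(N) % 2 == 0, 1.0, -1.0))

print(f"1 kHz sine max slew:  {max_slew(ref):.4g} FS/s")  # ~2*pi*1000  = 6.3e3
print(f"alternating max slew: {max_slew(sq):.4g} FS/s")   # ~2*pi*22050 = 1.4e5
```

The alternating-sample case reconstructs to a full-scale tone at half the sample rate, so its measured slew comes out near 2π × 22050 per second in full-scale units, which is the figure derived in the next post.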
Nominal Animal:
Let's expand on what TimFox described. Conventional audio CDs contain metadata (including error correction) and uncompressed pulse-code modulated 16-bit stereo (two-channel) data sampled at 44100 samples per second. The maximum slew rate is therefore 2¹⁶ = 65536 quantization steps per one 44100'th of a second, or 65536 × 44100 = 2890137600 steps/second ≃ 2.89 quantization steps per nanosecond.

If the audio CD data contains an alternating sample sequence (-32768, +32767, -32768, +32767, ...), per the Nyquist-Shannon theorem it should be reconstructed as a perfect 22050 Hz sine wave at maximum amplitude. This leads to the \$2 \pi f V_{pk}\$ (here, \$138544 \, V_{pk}\$ per second, or \$0.1385 \, V_{pk}\$ per microsecond) minimum slew rate required for both the incoming circuitry before the ADC and the output amplifier circuitry after the DAC.

Does the DAC have more stringent slew rate requirements? Slewing from rail to rail in a single sample period is \$2 \, V_{pk}\$ in one 44100'th of a second, or \$88200 \, V_{pk}\$ per second, which is less than the aforementioned reconstruction limit. This means that with a theoretically perfect brick-wall low-pass filter, the \$0.1385 \, V_{pk}\$ per microsecond slew rate suffices, but this slew rate is \$1.5707\$ times (\$\pi/2\$) as fast as just slewing from rail to rail in a single sample period. This affects our choice of DACs: just being able to slew from rail to rail in a single sample period is not sufficient; the DAC needs to slew basically 1.5707 times the rail-to-rail range in a single sample period. If we did a Fourier analysis of the error spectrum at different (higher than necessary) DAC slew rates, we'd find that a faster slew rate pushes the error noise somewhat higher in the output spectrum, which makes it easier to filter out this particular error using analog circuits. Note that we're still assuming a brick-wall low-pass filter at 22050 Hz for reconstructing the highest-frequency components in the audio signal, though.

For the ADC, there is no "slew rate" requirement as such, if we assume instantaneous sampling in time, or integration of the input signal over the duration of each sample. Existing ADCs differ, but their frequency response is known and can be compensated with filters ahead of the ADC. The main problem is that for perfect signal capture, we'd again need a brick-wall low-pass filter at 22050 Hz, plus AC coupling (i.e., rejecting the DC component). In practice, human hearing does not go below 10 Hz - 20 Hz, so signals below 10 Hz can be rejected as well, giving us a brick-wall 10 Hz to 22050 Hz band-pass input filter requirement.

Here we get into the realm that will forever weird out Audiophōles. If we replace the exact DAC with an oversampled dithering DAC, we can push all quantization noise to higher frequencies, so that we don't need a brick-wall filter; a more realistic low-pass filter can reconstruct our original signal perfectly. Looking up delta-sigma modulation shows how these are done in practice, noting that the intermediate steps (within the modulation scheme) do require much higher clock rates than the sample rate used. It is also useful to note that single-bit delta-sigma modulation is exactly pulse-density modulation.

It is also useful to understand what kind of voltages are used to process and transfer audio signals. The most common standard is line level, which has \$V_{pk}\$ of 1.414 V at the 0 dBV (1 V RMS) reference used in "consumer" devices, and 1.095 V at the 0 dBu (decibels unloaded, 0.775 V RMS) reference used in "pro" devices, with signals clipped to somewhere between ±1.5 V and ±2.0 V.

Thus, an initial assumption of \$V_{pk} \approx 1.4 \text{ V}\$ for consumer line-level audio that does not clip is sensible. Combining all of the above with a rough estimate of \$f = 20 \text{ kHz}\$ for the highest frequency we humans care about, we can say that at line levels, the maximum slew rate needed is \$2 \pi f V_{pk} \approx 0.2 \text{ V/µs}\$. To understand the range of slew rates we should consider, let's consider superhuman hearing that can detect components up to \$f = 25 \text{ kHz}\$, and a fully clipping signal with, say, \$V_{pk} = 2 \text{ V}\$. The slew rate we get for this is \$\approx 0.31 \text{ V/µs}\$.

Because we are talking about stereo audio, however, we do need to consider the one oddity of human hearing: time discrimination. Humans can detect audio signal time separation down to 10 µs, which corresponds to 100 kHz. That is, because of the exact mechanism of human hearing (which is very much a spectrum analyzer, rather than time-domain sampling), humans can detect much smaller time delays than the maximum frequency they can hear would suggest. For engineers: the time-domain discrimination for changes in the spectrum detected in each ear is 10 µs. In turn, this means that even though 20 Hz .. 20 kHz bandwidth per ear suffices, we may need much higher bandwidth to properly represent 3D audio effects, because of our extreme time-domain discrimination ability! This also explains why 192 kHz audio sampling, even when band-limited to say 10 Hz ... 20 kHz, can produce a superior stereo/3D audio experience. Of course, that only really applies when the speaker configuration matches the microphone configuration, preferably a human head acoustic model with earlobes and all.

It also turns out that most 3D effects do not rely on high time-domain discrimination at all, but more on spectrum shaping; basically, our earlobes and the shape of our head cause sound spectra to be filtered differently based on their direction, with the time-domain separation being just "fine tuning" on top of that. You can investigate and experiment with this further by looking into the open-source OpenAL library.
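A quick back-of-envelope check of the slew-rate figures above (a minimal sketch; the scenario values are the ones quoted in the post):

```python
import math

def slew_v_per_us(f_hz, v_pk):
    """Peak slew rate 2*pi*f*Vpk of a sine wave, in V/us."""
    return 2 * math.pi * f_hz * v_pk / 1e6

print(slew_v_per_us(22050, 1.0))    # 0.1385 V/us per volt of peak amplitude
print(slew_v_per_us(20000, 1.414))  # ~0.18 V/us: consumer line level, 20 kHz (the ~0.2 above)
print(slew_v_per_us(25000, 2.0))    # ~0.31 V/us: clipping signal, "superhuman" 25 kHz

# Rail-to-rail in one sample period, versus the reconstruction requirement:
fs, v_pk = 44100, 1.0
recon = 2 * math.pi * (fs / 2) * v_pk   # 2*pi*f*Vpk at f = fs/2, in V/s
rail  = 2 * v_pk * fs                   # 2*Vpk per sample period, in V/s
print(recon / rail)                     # pi/2 = 1.5707...
```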
nctnico:
--- Quote from: tom66 on September 15, 2023, 06:40:51 pm ---
Would it matter - if the human ear can't typically hear above 20 kHz, the maximum slew rate is a function of frequency, since any energy above that frequency is going to be ignored by the low-pass filter that is the human auditory system. There will be some audiophiles who claim that sampling above ~44 kHz is necessary for one reason or another, but AFAIK there's no scientific basis for those claims.
--- End quote ---

The problem is that - like with any filter - you'll see / hear distortions at much lower frequencies. Brick-wall filtering at 20 kHz gives nasty effects as well. So yes, for excellent audio quality you'll need to sample at much higher frequencies. Some of the higher-end audio amplifiers have bandwidths up to 200 kHz or more in order to have the lowest phase shift in the audio band. In the end, a CD is pretty bad when it comes to frequency and dynamic range.

--- Quote from: Nominal Animal on September 16, 2023, 06:05:51 pm ---
Because we are talking about stereo audio, however, we do need to consider the one oddity of human hearing: time discrimination. Humans can detect audio signal time separation down to 10 µs, which corresponds to 100 kHz. That is, because of the exact mechanism of human hearing (which is very much a spectrum analyzer, rather than time-domain sampling), humans can detect much smaller time delays than the maximum frequency they can hear would suggest. For engineers: the time-domain discrimination for changes in the spectrum detected in each ear is 10 µs. In turn, this means that even though 20 Hz .. 20 kHz bandwidth per ear suffices, we may need much higher bandwidth to properly represent 3D audio effects, because of our extreme time-domain discrimination ability!
--- End quote ---

No! You don't need a higher sample rate to phase-shift a signal by a small amount. In the case of audio, you need a higher sample rate / bandwidth to preserve the frequency/phase response better. There is a slight difference there.
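A minimal sketch of the claim in that last paragraph (all names are choices made here, and the FFT phase ramp is just one way to realize a fractional delay): a band-limited signal can indeed be delayed by far less than one sample period without raising the sample rate.

```python
import numpy as np

fs = 44100
N  = 4800                         # chosen so the test tone lands exactly on an FFT bin
f0 = 3675.0                       # test tone, well below Nyquist (fs/12)
n  = np.arange(N)
x  = np.sin(2 * np.pi * f0 * n / fs)

def frac_delay(x, tau_s, fs):
    """Delay the band-limited signal x by tau_s seconds (tau_s may be a
    tiny fraction of a sample period) via an FFT phase ramp. The delay
    is circular, which is exact here because the tone is bin-aligned."""
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau_s), n=len(x))

y = frac_delay(x, 10e-6, fs)      # 10 us: less than half the 22.7 us sample period

# Recover the achieved delay from the phase at the tone's FFT bin.
k = round(f0 * N / fs)            # bin 400, exact for this N
dphi = np.angle(np.fft.rfft(y)[k]) - np.angle(np.fft.rfft(x)[k])
print(-dphi / (2 * np.pi * f0) * 1e6)   # ~10.0 us
```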
Nominal Animal:
--- Quote from: nctnico on September 16, 2023, 06:15:41 pm ---
No! You don't need a higher sample rate to phase-shift a signal by a small amount.
--- End quote ---

That statement makes no sense. First, phase-shifting a discretized signal is not exactly the same as delaying it, because in the reconstituted signal the wave still starts at a sampling window boundary - at best, at a sample boundary. As it is the leading edge of a wave packet that is detected, the sinusoidal phases start at zero degrees, and shifting them to nonzero causes a leading error in the reconstituted signal, which can be audible as a high-frequency "click"-type noise. That is, it is true that for a band- or low-pass limited signal, higher time resolution does correspond to greater phase resolution, mathematically. For a discretized signal, phase shift is problematic at the leading wavefront; and the leading wavefront is what is involved in the time discrimination here.

Second, phase resolution in a discrete signal is solely determined by the sample rate \$f_s\$ and the sinusoidal component frequency \$f\$: \$360° \, f / f_s\$. At the Nyquist-Shannon limit of half the sample rate, there are only two possible phases, 0° and 180°, because the discrete signal consists of alternating values. At a quarter of the sample rate, there are four possible phases, and so on. Phase resolution in a discrete signal is therefore always inversely proportional to frequency.

--- Quote from: nctnico on September 16, 2023, 06:15:41 pm ---
In the case of audio, you need a higher sample rate to preserve the frequency/phase response better.
--- End quote ---

That's not exactly it, though: for mono audio, none of this matters. We only need a higher sample rate to preserve the phase resolution of the difference between the signals reaching each ear, at audible frequencies. Consider a stereo audio signal
$$\begin{cases} L(t) = 0, & t \lt 0 \text{ or } t \gt 1 \\ R(t) = 0, & t \lt \tau \text{ or } t \gt 1+\tau \\ L(t) = (1 - t)^2 \sin\bigl(2 \pi f t\bigr), & 0 \le t \le 1 \\ R(t) = (1 + \tau - t)^2 \sin\bigl(2 \pi f (t - \tau)\bigr), & \tau \le t \le 1+\tau \\ \end{cases}$$
where \$\tau\$ is the delay on arrival for the right channel, \$t\$ is time, and \$f\$ the fundamental frequency. For such a signal with a sharp leading edge or peak, it is \$\tau\$ that has the 10 µs resolution, even though our hearing is limited to approximately \$20 \text{ Hz } \le f \le 20000 \text{ Hz}\$ or so (as if we had 50 µs \$t\$ sampling intervals). If we stored our audio as \$S(t) = \bigl(L(t)+R(t)\bigr)/2\$ and \$D(t) = \bigl(L(t) - R(t)\bigr)/2\$, only the latter (difference) would need the higher time resolution (sample rate), with both having the same bandwidth. To reconstitute, we'd need to use the higher time resolution for both channels, and apply \$L(t) = S(t) + D(t)\$ and \$R(t) = S(t) - D(t)\$.

For a discrete audio signal, phase is a useful mathematical tool, though. For example, the peak sensitivity of human hearing is a bit below 4000 Hz; let's approximate it as 3675 Hz, or 1/12th of the CD audio sample rate of 44100 samples per second. This means that at that frequency, the phase resolution of CD audio is 360°/12 = 30°. If we produce two sinusoidal signals at 3675 Hz (each full wave taking about 272 µs), one for each ear, both starting at zero phase, but one minutely delayed, humans can generally discriminate down to a 10 µs difference. That corresponds to a 360° × 10 µs / 272 µs ≃ 13° phase difference at 3675 Hz. Thus, CD stereo audio does not have sufficient phase resolution at 3675 Hz to match human hearing.

Yet, "phase" does not convey the correct underlying concept, and may lead to problems: for example, the case above of the initial samples at the leading edge of a phase-shifted wave packet, which tripped up nctnico too. Humans are born with about 3500 hair cells in each ear, each cell having a bundle of 50 to 100 hairs, stereocilia, which sense a specific frequency range. We can think of each ear as a spectrum analyzer with about 3500 channels (each channel consisting of a few dozen frequency samplers within its range). Our brain can detect delays down to 10 µs between the activation of a pair of corresponding channels in each ear. Therefore, instead of pure sinusoidal signals, it is better to use the concept of wave packets - those with a steep leading edge or peak - for arrival-time discrimination. In some sense, the "spectrum analyzer" does 100,000 spectra per second; but this isn't exactly correct either, as most changes are detected at a much, much lower rate. Overall, we're deep into human physiology and psychoacoustic modeling here.
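The numbers above, and the wave-packet pair from the equations, can be checked directly (a minimal sketch; the names are choices made here):

```python
import numpy as np

fs  = 44100.0
f   = 3675.0            # near peak hearing sensitivity; exactly fs/12
tau = 10e-6             # interaural delay at the limit of discrimination

# Phase step of a discrete signal at frequency f: 360 * f / fs degrees.
print(360 * f / fs)     # 30 degrees at 3675 Hz
# Phase difference corresponding to a 10 us interaural delay at f:
print(360 * f * tau)    # ~13.2 degrees: finer than the 30-degree step

# The wave-packet pair L(t), R(t) from the equations above, discretized:
t = np.arange(int(1.2 * fs)) / fs
L = np.where((t >= 0) & (t <= 1), (1 - t)**2 * np.sin(2 * np.pi * f * t), 0.0)
R = np.where((t >= tau) & (t <= 1 + tau),
             (1 + tau - t)**2 * np.sin(2 * np.pi * f * (t - tau)), 0.0)

# Mid/side storage: only the difference D carries the fine timing.
S, D = (L + R) / 2, (L - R) / 2
```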
nctnico:
--- Quote from: Nominal Animal on September 16, 2023, 09:12:36 pm ---
--- Quote from: nctnico on September 16, 2023, 06:15:41 pm ---
No! You don't need a higher sample rate to phase-shift a signal by a small amount.
--- End quote ---

That statement makes no sense. First, phase-shifting a discretized signal is not exactly the same as delaying it, because in the reconstituted signal the wave still starts at a sampling window boundary - at best, at a sample boundary. As it is the leading edge of a wave packet that is detected, the sinusoidal phases start at zero degrees, and shifting them to nonzero causes a leading error in the reconstituted signal, which can be audible as a high-frequency "click"-type noise. That is, it is true that for a band- or low-pass limited signal, higher time resolution does correspond to greater phase resolution, mathematically. For a discretized signal, phase shift is problematic at the leading wavefront; and the leading wavefront is what is involved in the time discrimination here.

Second, phase resolution in a discrete signal is solely determined by the sample rate \$f_s\$ and the sinusoidal component frequency \$f\$: \$360° \, f / f_s\$. At the Nyquist-Shannon limit of half the sample rate, there are only two possible phases, 0° and 180°, because the discrete signal consists of alternating values. At a quarter of the sample rate, there are four possible phases, and so on. Phase resolution in a discrete signal is therefore always inversely proportional to frequency.
--- End quote ---

You are forgetting the actual sample values / resolution. You can sample a sine wave at a random phase and still end up with a sine wave. If your DAC resolution is infinite, then your phase resolution will also be infinite. And this holds true up to (not at) the Nyquist-Shannon limit.
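A minimal sketch of that argument (a construction made here, not from the post): sample a tone at an arbitrary phase, quantize to 16 bits, and recover the phase by a least-squares fit. The recovered phase is far finer than the 30° per-sample "step" discussed above, limited only by amplitude resolution and noise.

```python
import numpy as np

fs, f0 = 44100, 3675.0
n = np.arange(4410)                      # 0.1 s of samples
phase_true = 1.234567                    # radians; arbitrary, not sample-aligned

x  = np.sin(2 * np.pi * f0 * n / fs + phase_true)
xq = np.round(x * 32767) / 32767         # 16-bit quantization

# Fit xq ~ a*sin(w n) + b*cos(w n); then phase = atan2(b, a), since
# sin(w n + p) = cos(p) sin(w n) + sin(p) cos(w n).
w = 2 * np.pi * f0 / fs
A = np.column_stack([np.sin(w * n), np.cos(w * n)])
a, b = np.linalg.lstsq(A, xq, rcond=None)[0]
print(np.arctan2(b, a))                  # ~1.234567, despite only 12 samples per cycle
```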