Maximum slew rate typically found in music/voice
Nominal Animal:

--- Quote from: gf on September 21, 2023, 08:05:31 am ---I think the main problem is that the original signal violates the sampling theorem.
--- End quote ---
I don't see any way of low-pass filtering the original signal that would not "smear" the attack phase of the wave packet.

Another way to put it, I guess, is that while humans typically cannot sense sounds above 20 kHz or so, our time discrimination (the arrival time difference between the two ears) is such that components above 20 kHz are involved, with the time discrimination ability measured to be on the order of 10 µs.

The reason I keep talking about wave packets is that they match the physical operation of the stereocilia and hair cells much better.  It is the time difference between activation of the corresponding hair cells in the two ears that has the 10 µs time discrimination ability.

Now, how would you model that as a simple frequency response?  I don't know exactly how.  It is much easier to do using a wave packet model: if you do an infinite-width Fourier analysis of the waveform, you indeed do have frequencies above the range at which humans can hear a continuous sinusoidal signal.  It should not be a surprise that a pure frequency-domain representation cannot capture a sensor that mixes the frequency domain (stereocilia and hair cells, each responding to a narrow band of frequencies) with the time domain (band activation time differences in the auditory brainstem).
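
To illustrate that, here is a rough numerical sketch (my own toy example, with an assumed 8 kHz carrier and a fast attack): even though the carrier itself is well within the audible band, the packet's Fourier spectrum has energy above 20 kHz.

--- Code: ---
# Toy example: spectrum of a short wave packet with a sharp attack.
# Assumptions (mine, for illustration): 8 kHz carrier, ~0.3 ms decay,
# analyzed at 192 kHz.
import numpy as np

fs = 192_000                      # analysis sample rate, Hz
t = np.arange(0, 0.002, 1 / fs)   # 2 ms window
packet = np.sin(2 * np.pi * 8_000 * t) * np.exp(-t / 0.0003)

spectrum = np.abs(np.fft.rfft(packet))
freqs = np.fft.rfftfreq(len(packet), 1 / fs)

# The abrupt onset spreads energy far beyond the 8 kHz carrier,
# so a small but real fraction of it lies above 20 kHz.
frac = np.sum(spectrum[freqs > 20_000] ** 2) / np.sum(spectrum ** 2)
print(f"energy above 20 kHz: {frac:.1%}")
--- End code ---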


The uncontested facts are that a typical human ear can sense acoustic waves within 20 Hz to 20 kHz or so, and can detect differences as small as 10 µs between the initial arrival times at each ear.  The peak sensitivity is a bit below 4000 Hz, which means that at that frequency the time discrimination is on the order of one 25th of the period.  Another sensitivity peak is around 1000 Hz, where the time discrimination is on the order of one hundredth of the period.
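
As a quick sanity check on those ratios (my own arithmetic):

--- Code: ---
# Sanity check of the time-discrimination ratios above.
itd = 10e-6                        # interaural time discrimination, ~10 us
for f_hz in (4_000, 1_000):        # sensitivity peaks mentioned above
    period = 1 / f_hz
    print(f"{f_hz} Hz: period {period * 1e6:.0f} us,"
          f" 10 us = 1/{period / itd:.0f} of a period")
# -> 4000 Hz: 1/25 of a period; 1000 Hz: 1/100 of a period
--- End code ---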

The question seems to be whether CD audio can capture that or not.

My understanding is that in optimal conditions (i.e., volume set according to the background noise level, and background noise with no sharp spectral peaks), mono CD audio conveys basically all perceivable information to a typical human, if we exclude dynamic range changes, like an orchestra playing some parts very loud and some parts very quietly; human hearing effectively has an automatic gain control relative to the surrounding noise floor, if you will.  For stereo, however, it does not convey all the information the human auditory senses need to perceive the exact direction of specific types of sounds, "smearing" their directionality.  The reason for this belief is the physical structure of the hearing apparatus: stereocilia waving in a fluid, triggering nerve cells in groups, with group-wise arrival time differences between the ears processed in the auditory brainstem (via the cochlear nucleus).  It is neither a purely time-domain nor a purely frequency-domain apparatus, but a mix.

I've probably messed up my explanations and my math above (I do make errors often), but the above paragraph is basically what I've been trying to convey.

The end result, in my opinion, is that 24-bit quantization allows a larger dynamic range, and a 192 kHz sample rate better captures the stereophonic information human hearing can detect; it is not just audiophoolery, nor done only because it makes filtering and quantization noise shaping easier.
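
For concreteness, here are the rule-of-thumb numbers behind that opinion (my own back-of-the-envelope figures, assuming the usual ~6.02 dB per bit for ideal quantization):

--- Code: ---
# Back-of-the-envelope numbers behind the 24-bit / 192 kHz argument.
# Assumption: ~6.02 dB of dynamic range per bit (ideal quantization).
for bits in (16, 24):
    print(f"{bits}-bit: ~{bits * 6.02:.0f} dB dynamic range")

# Sample period vs. the ~10 us interaural discrimination figure.
for fs_hz in (44_100, 192_000):
    print(f"{fs_hz} Hz: sample period {1e6 / fs_hz:.1f} us")
# -> 16-bit ~96 dB, 24-bit ~144 dB; 22.7 us at 44.1 kHz, 5.2 us at 192 kHz
--- End code ---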
CatalinaWOW:
Or, saying the same thing as Nominal (I think) in somewhat different words: our brain is not just a simple RMS meter connected to a microphone and a front-end amplifier.  There are at least two types of sensors, and literally hundreds or thousands of each type.  Non-linear responses are clearly evident, and there is evidence of temporal variation on time scales ranging from under a second to decades.  Our experience of sound is the brain combining and interpreting all of those different inputs.

Reconstructing the pressure variations in the ear canal produced by the original sound source is the goal of all audio systems, but a simple performance measure taken in the 20 Hz to 20 kHz band at some amplitude is a necessary but not necessarily sufficient criterion.

Unfortunately, at our current level of understanding we can't really define a better measure, so judging how it sounds is all we can do.  And that is subjective, and also subject to social pressures.  For most of us CD-quality sound is more than good enough, and for a somewhat smaller subset of us, streaming-quality digital audio is just fine.  But I believe there are some who really do hear defects in CD-quality sound, and many more who have convinced themselves they can, for a variety of reasons.
NiHaoMike:
Those who use equalization / "room correction" will want more than 16 bits of playback capability: by the very nature of how it operates, it reduces some frequencies, so if you start with just 16 bits, you'll end up with fewer than 16 bits of resolution at those frequencies. Starting with 24 bits (or even 20 "real" bits), there is room for some pretty aggressive adjustments while still keeping at least 16 bits of resolution at every frequency in the range.
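
A rough sketch of that bookkeeping (my numbers, using the same ~6.02 dB-per-bit rule as above): a cut of G dB at some frequency discards roughly G / 6.02 bits of resolution there.

--- Code: ---
# Sketch: resolution left at a frequency after a digital EQ cut,
# assuming the usual ~6.02 dB-per-bit rule for ideal quantization.
def bits_left(source_bits: int, cut_db: float) -> float:
    return source_bits - cut_db / 6.02

for source in (16, 24):
    print(f"{source}-bit source, 24 dB cut ->"
          f" ~{bits_left(source, 24.0):.1f} bits left")
# -> 16-bit source drops to ~12 bits; 24-bit source still has ~20 bits
--- End code ---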
metebalci:
When reading the posts, I kept thinking that there must be some studies on this topic (I read a fair number of psychoacoustics and auditory neuroscience papers many years ago). Here is the abstract of one I quickly found, from 2007:

"Misalignment in timing between drivers in a speaker system and temporal smearing of signals in components and cables have long been alleged to cause degradation of fidelity in audio reproduction. It has also been noted that listeners prefer higher sampling rates (e.g., 96 kHz) than the 44.1 kHz of the digital compact disk, even though the 22 kHz Nyquist frequency of the latter already exceeds the nominal single-tone high-frequency hearing limit fmax∼18 kHz. These qualitative and anecdotal observations point to the possibility that human hearing may be sensitive to temporal errors, τ, that are shorter than the reciprocal of the limiting angular frequency [2πfmax] ~ 9us, thus necessitating bandwidths in audio equipment that are much higher than fmax in order to preserve fidelity. The blind trials of the present work provide quantitative proof of this by assessing the discernability of time misalignment between signals from spatially displaced speakers. The experiment found a displacement threshold of d≈2 mm corresponding to a delay discrimination of τ≈6 μs."

http://boson.physics.sc.edu/~kunchur/papers/Audibility-of-time-misalignment-of-acoustic-signals---Kunchur.pdf
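
The quoted displacement-to-delay conversion is just the speed of sound; checking it (my arithmetic, assuming ~343 m/s in air):

--- Code: ---
# Check of the displacement-to-delay conversion in the quoted abstract.
c = 343.0    # speed of sound in air at ~20 C, m/s (assumed)
d = 2e-3     # displacement threshold from the paper, m
print(f"tau = d / c = {d / c * 1e6:.1f} us")   # ~5.8 us, matching tau ~ 6 us
--- End code ---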

So it looks like a 20 kHz bandwidth is not enough to replicate the natural listening experience.

I don't think enough is known about how this works (in terms of neural coding, early processing in the brain, etc.; at least that was the case about 10 years ago), so the only way forward is doing experiments, and this gives a starting point. Moving to 96 or 192 kHz is not all that ungrounded.
tggzzz:
The concept of "degradation of fidelity" misses the elephant in the room.

Consider recording an orchestra in a concert hall. Where exactly should the microphones be placed? The conductor's podium? The first row of the audience? Centre, right, or left? The centre of the audience? With or without a complete audience?

The concept of a single place for the microphones is, of course, too simplistic. But it does highlight the point that there cannot be a single "correct" sound - it is all a choice made by the recording engineer.

Back when I had ears and CDs were new, a friend and I did A-B comparisons between a CD and vinyl. His setup was good, but not audiophool-grade. We could tell a difference, but we could not tell which was which, nor which was better.