There is often confusion regarding latency between simple "reaction times", audio/video sync, single-source latency (like the delay between the timer display on a music player and the actual sound), and musician latency - where someone creates the sound and there is a delay before they hear it, as with guitar effects or MIDI instruments. All of these have different latency tolerances, and on top of that, tolerance varies between individuals.
Which one are we talking about in this thread?
For "single source" playback, latency really doesn't matter - which is why MP3 still rules even though it requires a fairly large buffer, and therefore high latency. Because there is only one source of audio, there is nothing to compare it against, so latency can be seconds. It is also why MP3 works fine for recording and playback, but not for real-time transmission, regardless of how fast a processor you use.
For audio/video sync the tolerance is different, because it involves both visual and audible stimuli. I have no hard figures on this, but I vaguely remember reading about Bluetooth audio causing sync problems with its stock codec at ~150ms latency, and that under ~40ms is required for the offset to be imperceptible.
For cases where one is *creating* the sound - singers, drummers, guitarists, other musicians - the figure is generally even lower, roughly 20ms. Perception of delay here connects actual physical movements to audible sounds, again a different route through the brain. I'm not sure about that figure, though: I've used guitar effects for years, and even in the 90s, when latency was >40ms, I didn't notice it. Also, as coppice rightly pointed out, 10ms is roughly 3.5m of acoustic distance - which is easily how far a guitarist can stand from his amp. I think 10ms isn't an unreasonable delay. Even with ASIO MIDI drivers on my PC I get about 15ms.
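As a back-of-the-envelope sketch of that distance equivalence (assuming ~343 m/s for the speed of sound in air at room temperature):

```python
# Convert processing latency to the equivalent acoustic distance in air.
# Speed of sound is ~343 m/s at 20 degrees C (an assumption; it varies
# a little with temperature).
SPEED_OF_SOUND_M_S = 343.0

def latency_to_distance_m(latency_ms: float) -> float:
    """Distance sound travels in air during latency_ms milliseconds."""
    return SPEED_OF_SOUND_M_S * latency_ms / 1000.0

for ms in (10, 15, 20, 40):
    print(f"{ms:>3} ms  ~=  {latency_to_distance_m(ms):.2f} m from the speaker")
```

So 10ms of processing delay is acoustically no different from standing ~3.4m further from your amp.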
Also, people forget about buffers. Digital processes generally require a certain "block" of audio to work with (like the MP3 example above). So it doesn't matter how fast your processor is: if a process requires a chunk of 2k samples at 48ksps, that's ~42ms of buffer latency right there. This is why modern real-time audio codecs work on smaller chunks of audio, and also why some resort to higher sample rates even though there isn't much difference in perceived quality beyond 48k - doubling the sample rate halves the buffer latency for the same block size.
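The arithmetic behind that buffer-latency floor is simple enough to sketch:

```python
# A block-based process cannot emit anything until a full block of N
# samples has arrived, so block size / sample rate is a hard latency
# floor no matter how fast the CPU is.
def buffer_latency_ms(block_samples: int, sample_rate_hz: int) -> float:
    """Minimum latency (ms) imposed by waiting for one full block."""
    return 1000.0 * block_samples / sample_rate_hz

print(buffer_latency_ms(2048, 48_000))  # ~42.7 ms - the figure above
print(buffer_latency_ms(2048, 96_000))  # same block at 96 ksps: ~21.3 ms
print(buffer_latency_ms(128, 48_000))   # small ASIO-style block: ~2.7 ms
```

Note this is only the floor: processing time and any output buffering add on top of it.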
Sadly, any topic surrounding perception, or subjective things like audio latency or quality, tends to get quite heated, mostly because what you "hear" is also shaped by expectations.
You can change how much "latency" someone hears just by telling them numbers.