EEVblog 1658 - TUTORIAL: Mean vs Median
iMo:
I've been using the median for on-the-fly filtering of the incoming data from my 34401A. It removes outliers such as random EMI spikes, and it does some low-pass filtering as well.
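For anyone wanting to try this, here is a minimal sketch of a running median filter. The window length of 5 and the sample readings are made up for illustration, not taken from a real 34401A log.

```python
from collections import deque
from statistics import median

def running_median(samples, window=5):
    """Yield the median of the last `window` samples as each new one arrives."""
    buf = deque(maxlen=window)  # oldest sample drops out automatically
    for x in samples:
        buf.append(x)
        yield median(buf)

# Hypothetical readings in volts, with one EMI-style spike at index 3.
readings = [10.0001, 10.0002, 10.0001, 12.5, 10.0002, 10.0003, 10.0002]
filtered = list(running_median(readings, window=5))
print(filtered)
```

The spike never reaches the output because a single outlier cannot become the middle value of the window. A longer window rejects longer bursts at the cost of more lag.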

Btw, there are sectors (like health) where using the "average/mean" is almost considered a swear word (the health sector is the World of Outliers).

Also, I'm not sure the median is less compute-intensive than the average/mean (as was said in the video).
Try calculating the average/mean of 1000 floating-point numbers vs. the median of 1000 floating-point numbers (the sorting is a pretty expensive exercise compared to 1000 adds and a single divide).
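A rough sketch of that comparison, using only the Python standard library. The data is synthetic (Gaussian noise around 10.0 plus one injected spike), so treat the numbers as illustrative only.

```python
import random
from statistics import fmean, median

random.seed(1)
data = [random.gauss(10.0, 0.001) for _ in range(1000)]
data[123] = 50.0  # a single injected spike

avg = fmean(data)   # one pass: 1000 adds and one divide
med = median(data)  # sorts a copy of the data first, then picks the middle

print(f"mean   = {avg:.5f}")
print(f"median = {med:.5f}")
```

The mean is pulled off by the one spike while the median barely moves, but the median pays for it with a sort. A selection algorithm such as quickselect can find the middle value without a full sort, though it is still more work than a running sum.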
golden_labels:
Being introduced to statistics through Gaussian-ish distributions leads to developing some poor, or outright wrong, intuitions. It took me years to recover, and I guess I’ll never be completely cured.

One of the mistakes is unconditional belief in the mode ≤ median ≤ mean (or mirrored) rule. It doesn’t hold universally. Multimodal distributions often shred it to pieces:
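A tiny made-up sample (my own numbers, not the distribution from the original post) shows the median landing outside the interval between mode and mean:

```python
from statistics import mean, median, mode

# Hypothetical two-humped sample: a cluster at 0 and a bigger one at 10.
sample = [0, 0, 0, 4, 4, 6, 10, 10, 10, 10]

m_mode = mode(sample)
m_median = median(sample)
m_mean = mean(sample)

print(f"mode={m_mode}, median={m_median}, mean={m_mean}")

# Neither ordering of the "rule" holds for this sample.
rule_holds = (m_mode <= m_median <= m_mean) or (m_mean <= m_median <= m_mode)
print("rule holds:", rule_holds)
```

Here the median sits below both the mean and the mode, so neither the rule nor its mirror image applies.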



But unimodal ones may also produce unexpected results:


Distribution design by Glen_B of StackOverflow(1)

Another common, but wrong, intuition concerns long tails. With the normal distribution we’re used to treating extreme values as irrelevant, similar to how a cone filled with liquid holds most of it in the widest section. While this is true for the normal distribution, it is not true in general. A tail may look unimpressive on a graph, yet it may contain a significant part of the population.
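To put numbers on that, here is a small sketch comparing tail mass for a standard normal against a standard Cauchy distribution. The Cauchy is my own choice of heavy-tailed example, and "scale unit" means sigma for the normal but the scale parameter for the Cauchy, so the comparison is only indicative.

```python
import math

def normal_tail(k):
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy variable (location 0, scale 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

for k in (1, 3, 10):
    print(f"beyond {k:2d} scale units: "
          f"normal {normal_tail(k):.2e}, Cauchy {cauchy_tail(k):.2e}")
```

Beyond 3 scale units the normal holds roughly a quarter of a percent of the population, while the Cauchy still holds about a fifth, even though both curves look like similar bells on a linear plot.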

(1) Both smooth and empirical distributions with that property also exist. But they either don’t show the effect to that extent, or are empirical data with an unknown underlying model. I chose Glen_B’s one, as it’s readily available, simple, and the effect is dramatic.
coppice:

--- Quote from: golden_labels on January 02, 2025, 12:56:44 pm ---Being introduced to statistics through Gaussian-ish distributions leads to developing some poor, or outright wrong, intuitions. It took me years to recover, and I guess I’ll never be completely cured.

--- End quote ---
Don't most people start with uniform distributions, like tossing a die, then work up to the idea that munging a lot of those together tends towards a Gaussian distribution, often with its tails trimmed by the uniform distribution having limits?

--- Quote from: golden_labels on January 02, 2025, 12:56:44 pm ---One of the mistakes is unconditional belief in the mode ≤ median ≤ mean (or mirrored) rule. It doesn’t hold universally. Multimodal distributions often shred it to pieces:

--- End quote ---
They do tend to show some poorly chosen examples in books and presentations, but it should be obvious that the mode can be practically anywhere. Many things combine a strong tendency towards one value with everything that isn't that value being more or less Gaussian. You see these narrow spikes superimposed on a bell-like distribution all over the place.

--- Quote from: golden_labels on January 02, 2025, 12:56:44 pm ---Another common, but wrong, intuition concerns long tails. With the normal distribution we’re used to treating extreme values as irrelevant, similar to how a cone filled with liquid holds most of it in the widest section. While this is true for the normal distribution, it is not true in general. A tail may look unimpressive on a graph, yet it may contain a significant part of the population.

--- End quote ---
People are pretty weak on the effect of the tails, especially when there is asymmetry in them.
* Let's say you use the common practice of producing a Gaussian-like distribution by summing 12 random integers and dividing by 12. The tails are very much trimmed. Try using that to model a rare event and you will probably conclude the rare event never actually occurs, because the tail areas where many of those rare events show up are missing.
* Asymmetry in the tails gets really interesting. If you can get to zero on one side, but the other side is unbounded, you often tend towards a Pareto distribution instead of a Gaussian one. If people have studied enough stats to get beyond the intuitive idea that munging a lot of distributions together tends towards Gaussian, and have learned the central limit theorem properly, they should have noted that convergence towards Gaussian requires a stable mean. No stable mean means the distribution will probably be something funky, Pareto being one of the interesting forms of funkiness.
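The trimmed tails in the first bullet are easy to check. This sketch assumes uniform floats in [0, 1) in place of random integers, and the sample count of 100,000 is arbitrary.

```python
import math
import random

N = 12
random.seed(0)

def pseudo_gauss():
    """Mean of N uniform [0, 1) draws: bell-shaped, but confined to [0, 1)."""
    return sum(random.random() for _ in range(N)) / N

# Variance of one uniform [0, 1) draw is 1/12, so the mean of N draws
# has a standard deviation of sqrt(1/12/N), which is 1/12 for N = 12.
sigma = math.sqrt(1.0 / 12.0 / N)
z_limit = (1.0 - 0.5) / sigma  # largest reachable deviation, in sigmas

samples = [pseudo_gauss() for _ in range(100_000)]
worst = max(abs(s - 0.5) / sigma for s in samples)

# What a true Gaussian would still put beyond that hard limit.
true_tail = math.erfc(z_limit / math.sqrt(2))

print(f"hard limit: {z_limit:.1f} sigma, worst seen: {worst:.2f} sigma")
print(f"true Gaussian mass beyond the limit: {true_tail:.2e}")
```

The generator cannot produce anything beyond 6 sigma, however long it runs, whereas a true Gaussian keeps a small but nonzero probability out there.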
Picuino:
Things are especially different for measurements that are bounded on one side but not on the other. Wages, for example.

http://news.bbc.co.uk/2/hi/uk_news/magazine/7581120.stm

TimFox:
An interesting example of the statistics of resistor production lots: two reports cited on p. 56 of D. Self, Small Signal Audio Design, Focal Press, 2015.
(From his discussion of improving statistical accuracy with multiple resistors.)
1.  He cites H. Kroeze's data from 211 10 kΩ 1% metal film resistors (from the same batch): mean 9995 Ω (0.05% low), apparently Gaussian distribution with standard deviation 10 Ω (0.1%).
2.  His own data from 100 cheap Yageo 1000 Ω 1% metal film resistors: mean 997.66 Ω, std dev 2.10 Ω (0.21%), one outlier at 0.7%, all others within 0.5%, again apparently Gaussian.
His conclusion is that the manufacturer doesn't select the best parts from a production run to sell at a tighter tolerance. He suspects that old-style carbon composition resistors may be subject to that selection.
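As a rough illustration of the multiple-resistor idea, here is a sketch using the Yageo figures above. It assumes the parts are independent and roughly Gaussian, and it only covers the simple case of n equal resistors in series. It is not taken from the book.

```python
import math

MEAN_R, STD_R = 997.66, 2.10  # ohms, figures quoted above for the 1000 ohm batch

def series_stats(n, mean=MEAN_R, std=STD_R):
    """Mean, std dev and relative spread of n independent resistors in series."""
    total_mean = n * mean
    total_std = math.sqrt(n) * std  # independent variances add
    return total_mean, total_std, total_std / total_mean

for n in (1, 2, 4, 10):
    m, s, rel = series_stats(n)
    print(f"{n:2d} in series: {m:8.2f} ohm, sigma {s:5.2f} ohm ({100 * rel:.3f}%)")
```

The relative spread shrinks as 1/sqrt(n), so four parts halve it. The systematic offset of the batch mean from nominal does not average out this way.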