Author Topic: EEVblog 1658 - TUTORIAL: Mean vs Median  (Read 5062 times)


Offline iMo

  • Super Contributor
  • ***
  • Posts: 5662
  • Country: gw
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #25 on: January 02, 2025, 10:07:07 am »
I've been using the median for on-the-fly filtering of the incoming data from my 34401A. It removes outliers like random EMI spikes etc. and does low-pass filtering as well.
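A minimal sketch of such a running median filter, for illustration only. The window length of 5 and the sample readings are made up, and this is not necessarily how the poster's own filter is built.

```python
from collections import deque
from statistics import median


def median_filter(samples, window=5):
    """Yield the median of the most recent `window` samples (fewer at start-up)."""
    buf = deque(maxlen=window)  # the oldest sample drops out automatically
    for x in samples:
        buf.append(x)
        yield median(buf)


# Hypothetical DMM readings in volts, with one EMI spike at index 3.
readings = [10.0001, 10.0002, 10.0000, 12.5, 10.0001, 9.9999, 10.0002]
filtered = list(median_filter(readings, window=5))
print(filtered)
```

The spike never reaches the output because a single outlier cannot become the middle value of the window. A moving mean over the same window would smear it across five output samples instead.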

Btw there are industrial sectors (like Health) where using the "average/mean" is almost considered a swear word (the Health sector is The World of Outliers)..

Also, I'm not sure the median is less compute-intensive than the average/mean (as was said in the video).
Try calculating the average/mean of 1000 floating-point numbers vs. the median of 1000 fp numbers (sorting is a pretty expensive exercise compared to 1000 adds and a single divide).
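For reference, a full sort is not strictly required for a median: a selection algorithm finds the middle element in expected linear time, although it still does far more work per element than a running sum. The sketch below is a plain quickselect written for illustration. The function names are hypothetical, and it is not tuned for speed.

```python
import random
import statistics


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        pivots = [v for v in items if v == pivot]
        highs = [v for v in items if v > pivot]
        if k < len(lows):
            items = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            items = highs


def median_by_selection(values):
    """Median without sorting the whole list."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


data = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(median_by_selection(data), statistics.median(data), sum(data) / len(data))
```

Either way the point stands: the mean needs n additions and one division, while any median needs comparisons and data movement on top of holding all n values.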
« Last Edit: January 02, 2025, 10:21:27 am by iMo »
Readers discretion is advised..
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1557
  • Country: pl
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #26 on: January 02, 2025, 12:56:44 pm »
Being introduced to statistics with Gaussian-ish distributions leads to developing some poor, or outright wrong, intuition. Took me years to recover, and I guess I’ll never be completely cured.

One of the mistakes is unconditional belief in the mode ≤ median ≤ mean (or mirrored) rule. It doesn’t hold universally. Multimodal distributions often shred it to pieces:
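A tiny made-up sample is enough to see the ordering fail. The data below is hypothetical (a cluster at 0 plus a second group between 6 and 10) and is only meant as a sketch of the effect, not as one of the distributions discussed in this post.

```python
import statistics

# Hypothetical two-cluster sample: four zeros and a spread-out upper group.
sample = [0, 0, 0, 0, 6, 7, 8, 9, 10]

mode = statistics.mode(sample)
median = statistics.median(sample)
mean = statistics.mean(sample)

print(f"mode={mode}, mean={mean:.2f}, median={median}")
```

Here the mean lands between the mode and the median, so neither mode ≤ median ≤ mean nor its mirror image holds.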



But unimodal ones may also produce unexpected results:


Distribution design by Glen_B of StackOverflow(1)

Another common, but wrong, intuition concerns long tails. With the normal distribution we’re used to treating extreme values as irrelevant, similar to how a cone filled with liquid holds most of it in the widest section. While this is true for the normal distribution, it is not true in general. A tail may look unimpressive on a graph, yet contain a significant part of the population.
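To put a rough number on this, the sketch below compares the probability of landing more than three standard deviations above the mean for a normal distribution and for a Pareto distribution. The Pareto shape parameter alpha = 3 is an arbitrary choice made for this illustration (it needs alpha > 2 so that the standard deviation exists).

```python
import math


def normal_tail_beyond(k):
    """P(X > mean + k*sd) for any normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail_beyond(k, alpha=3.0, x_min=1.0):
    """P(X > mean + k*sd) for a Pareto(alpha, x_min) distribution, alpha > 2."""
    mean = alpha * x_min / (alpha - 1)
    var = x_min**2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))
    threshold = mean + k * math.sqrt(var)
    return (x_min / threshold) ** alpha  # Pareto survival function


print(f"normal: {normal_tail_beyond(3):.5f}")
print(f"pareto: {pareto_tail_beyond(3):.5f}")
```

With these assumed parameters the Pareto tail beyond "three sigma" holds roughly ten times as much of the population as the normal one, even though both would look like a thin sliver on a plot.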


(1) Both smooth and empirical distributions with that property also exist. But they either do not show the effect to that extent, or are empirical data with an unknown underlying model. I chose Glen_B’s one, as it’s readily available, simple, and the effect is dramatic.
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #27 on: January 02, 2025, 04:41:25 pm »
Being introduced to statistics with Gaussian-ish distributions leads to developing some poor, or outright wrong, intuition. Took me years to recover, and I guess I’ll never be completely cured.
Don't most people start with linear distributions, like tossing a die, then work up to the idea that munging a lot of those together tends towards a Gaussian distribution, often with its tails trimmed by the linear distribution having limits?
One of the mistakes is unconditional belief in the mode ≤ median ≤ mean (or mirrored) rule. It doesn’t hold universally. Multimodal distributions often shred it to pieces:
They do tend to show some poorly chosen examples in books and presentations, but it should be obvious that the mode can be practically anywhere. Many things combine a strong tendency to one value, while everything which isn't that value is more or less Gaussian. You see these narrow spikes imposed on a bell like distribution all over the place.
Another common, but wrong, intuition concerns long tails. With the normal distribution we’re used to treating extreme values as irrelevant, similar to how a cone filled with liquid holds most of it in the widest section. While this is true for the normal distribution, it is not true in general. A tail may look unimpressive on a graph, yet contain a significant part of the population.
People are pretty weak on the effect of the tails, especially when there is asymmetry in them.
  • Let's say you use the common practice of producing a Gaussian-like distribution by summing 12 random integers, and dividing by 12. The tails are very much trimmed. Try using that to model a rare event and you will probably conclude the rare event never actually occurs. You have none of the tail areas in which many of those rare events show up.
  • Asymmetry in the tails gets really interesting. If you can get to zero on one side, but the other is unbounded, you often tend towards a Pareto distribution instead of a Gaussian one. If people have studied enough stats to get beyond the intuitive idea that munging a lot of distributions together tends towards Gaussian, and have learned the central limit theorem properly, they should have noted that convergence towards Gaussian requires a stable (finite) mean and a finite variance. With no stable mean, the distribution will probably be something funky, Pareto being one of the interesting forms of funkiness.
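The trimmed tails in the first bullet can be quantified exactly. The sketch below assumes the usual variant of the trick: sum 12 independent uniform(0, 1) variates and subtract 6, which gives zero mean and unit variance. That sum follows the Irwin-Hall distribution, whose CDF has a closed form. The function names are hypothetical.

```python
import math

N = 12  # number of uniform(0, 1) variates summed; the sum has variance N/12 = 1


def irwin_hall_cdf(x, n=N):
    """Exact CDF of the sum of n independent uniform(0, 1) variates."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** n
    return total / math.factorial(n)


def sum12_upper_tail(z):
    """P(sum - 6 > z); by symmetry this equals the CDF at 6 - z."""
    return irwin_hall_cdf(N / 2 - z)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (3, 4, 5, 6):
    print(z, sum12_upper_tail(z), normal_upper_tail(z))
```

At 4 sigma the approximation already under-represents the tail several times over, at 5 sigma by more than a factor of a hundred, and beyond 6 sigma it can never produce a value at all.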
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1119
  • Country: es
    • Picuino web
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #28 on: January 02, 2025, 05:26:27 pm »
It is especially different in measurements that are bounded on one side, but not bounded on the other. For example, in wages.
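A quick sketch with invented figures (ten hypothetical annual wages, one of them very large; none of this is taken from the linked article) shows how far the two can drift apart:

```python
import statistics

# Hypothetical annual wages; bounded below by zero, unbounded above.
wages = [22_000, 24_000, 25_000, 27_000, 30_000,
         32_000, 35_000, 40_000, 55_000, 400_000]

mean_wage = statistics.mean(wages)
median_wage = statistics.median(wages)
below_mean = sum(w < mean_wage for w in wages)

print(mean_wage, median_wage, below_mean)
```

In this made-up sample nine out of ten people earn less than the "average" wage, while the median still describes a typical earner.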

http://news.bbc.co.uk/2/hi/uk_news/magazine/7581120.stm

 

Offline TimFox

  • Super Contributor
  • ***
  • Posts: 9285
  • Country: us
  • Retired, now restoring antique test equipment
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #29 on: January 02, 2025, 06:52:47 pm »
An interesting example of the statistics of resistor production lots: two reports cited on p. 56 of D. Self, Small Signal Audio Design, Focal Press, 2015.
(His discussion of improving statistical accuracy with multiple resistors.)
1.  He cites H. Kroeze, data from 211 10 kΩ 1% metal film resistors (from the same batch): mean 9995 Ω (0.05% low), apparently Gaussian distribution with standard deviation 10 Ω (0.1%).
2.  His own data from 100 cheap Yageo 1000 Ω 1% metal film resistors: mean 997.66 Ω, std dev 2.10 Ω (0.21%), one outlier at 0.7%, all others within 0.5%, again apparently Gaussian.
His conclusion is that the manufacturer doesn't select the best parts from a production run to sell at tighter tolerance.  He suspects that old-style carbon composition resistors may be subject to that selection.
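As a sketch of why combining several resistors helps: if the deviations of the individual parts are independent and share the same standard deviation, the relative spread of N parts in series shrinks by a factor of sqrt(N). The function below is hypothetical and uses the Yageo figures quoted above. The independence assumption is mine, and a shared offset of the batch mean does not average out this way.

```python
import math


def series_relative_sd(nominal_ohms, sd_ohms, count):
    """Relative standard deviation of `count` equal resistors in series,
    assuming independent, identically distributed deviations."""
    total = count * nominal_ohms
    total_sd = math.sqrt(count) * sd_ohms  # variances add, so sd grows as sqrt(N)
    return total_sd / total


for n in (1, 2, 4, 10):
    print(n, f"{100 * series_relative_sd(997.66, 2.10, n):.3f}%")
```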
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1557
  • Country: pl
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #30 on: January 03, 2025, 12:08:07 pm »
Don't most people start with linear distributions, like tossing a die, then work up to the idea that munging a lot of those together tends towards a Gaussian distribution, often with its tails trimmed by the linear distribution having limits?
The introduction to probability is indeed the binomial and uniform distributions. But in statistics I’d say it’s the normal distribution. You have a point, though: for some people it might be the binomial distribution that builds the concept of irrelevant tails.

They do tend to show some poorly chosen examples in books and presentations, but it should be obvious that the mode can be practically anywhere. Many things combine a strong tendency to one value, while everything which isn't that value is more or less Gaussian. You see these narrow spikes imposed on a bell like distribution all over the place.
I would argue it’s not obvious, in particular for unimodal distributions. All common distributions obey the inequality and most well-known ones do too. While in the strict sense they’re not rare, it’s actually hard to find a theoretical distribution that doesn’t observe the mean-median-mode inequality, if you want it to be elegant or, worse, smooth and not complicated. This is why I didn’t even try, but used somebody else’s work.

An interesting example of the statistics of resistor production lots: (…) His conclusion is that the manufacturer doesn't select the best parts from a production run to sell at tighter tolerance.  He suspects that old-style carbon composition resistors may be subject to that selection.
The notched normal distribution in resistors was a common belief in the past. Unfortunately I don’t know if that was true or not. I was hoping to test it one day, but it is too late now: for a long time the 1% and 5% ranges have come from different processes, not from binning. Now we would need to measure bananas’ curvature.(1)


(1) This common anti-EU disinformation was indeed a distorted presentation of the binning process used to mark bananas for wholesale market.
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #31 on: January 03, 2025, 12:32:27 pm »
They do tend to show some poorly chosen examples in books and presentations, but it should be obvious that the mode can be practically anywhere. Many things combine a strong tendency to one value, while everything which isn't that value is more or less Gaussian. You see these narrow spikes imposed on a bell like distribution all over the place.
I would argue it’s not obvious, in particular for unimodal distributions. All common distributions obey the inequality and most well-known ones do too. While in the strict sense they’re not rare, it’s actually hard to find a theoretical distribution that doesn’t observe the mean-median-mode inequality, if you want it to be elegant or, worse, smooth and not complicated. This is why I didn’t even try, but used somebody else’s work.
All common distributions relate to one driving force, and they do not generally have funky modal values. However, in the real world many things combine multiple distributions you might find in a book. For example, you might measure all the distances walked by all employees in their day at work. You'll probably find a fairly Gaussian mix of short journeys around the campus, but one strong line much longer than that, which is the distance from the local train station where they arrive in the morning and leave at night. The overall distribution you are seeing is a mix of two (or more) fairly Gaussian distributions. This is extremely common, and you have to be pretty dumb not to realise this blending of distributions is going to happen all over the place.
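A minimal sketch of that walking example as a two-component Gaussian mixture. All numbers are invented for illustration: 80% of journeys around 200 m with a wide spread, 20% around 1200 m (the station walk) with a narrow one.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x):
    """Hypothetical mix: campus walks plus the station-to-office walk (metres)."""
    return 0.8 * normal_pdf(x, 200, 80) + 0.2 * normal_pdf(x, 1200, 30)


grid = range(0, 1501)  # 1 m resolution
density = [mixture_pdf(x) for x in grid]

# Local maxima of the density on the grid, i.e. the modes of the blend.
modes = [x for x in grid[1:-1]
         if density[x] > density[x - 1] and density[x] > density[x + 1]]
mixture_mean = 0.8 * 200 + 0.2 * 1200

print(modes, mixture_mean)
```

With these assumed parameters the blend has two separate peaks, and its mean falls in the gap between them, where hardly any journeys actually occur.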

 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1557
  • Country: pl
Re: EEVblog 1658 - TUTORIAL: Mean vs Median
« Reply #32 on: January 03, 2025, 12:53:09 pm »
Some interesting quirks in distributions. In Poland, the matura is an exam you may take after finishing high school. There are three mandatory subjects: Polish language, maths, and a foreign language. You may also take additional subjects, as well as sit some exams at the “extended” level. To pass, you need to collect 1/3 of the points for each subject. Some histograms follow.

First, the most famous one, the basic Polish exam (horizontal scale is the number of points (score)).



The pronounced anomaly around 1/3 points (the threshold): you can guess yourself, where this came from. If you need a hint: those are written essays, judged subjectively by the teachers.

Another one, from extended Polish exam. You can still see a small anomaly by the threshold, but what is more interesting are the ends:



This is a great example of the truncated Gaussian distribution. In many other cases we only assume Gaussian-like behavior, while the underlying model doesn’t allow for longer tails. Or it does, but the histogram is cut to shape (as with Picuino’s example). But here it may be an actual case of the rating process collecting all extra-good students into a single, final bin. And of course the 0, which collected not only poor performance, but also all people who for one reason or another should really be on the negative side.

Now let’s discuss some weirder occurrences. You’d assume normality or at least something like a Poisson distribution, right? Here is the basic maths level:
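The "everything beyond the scale lands in the end bin" effect (usually called censoring in statistics texts) can be sketched analytically. The model below is an assumption made purely for illustration: a normally distributed underlying "ability" with a made-up mean of 30 and standard deviation of 10, rounded to whole points and clamped to a 0..40 scale. It is not fitted to the exam data.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))


def censored_histogram(mu, sigma, max_score):
    """Probability of each integer score 0..max_score when an underlying
    normal value is rounded to the nearest point and clamped to the scale."""
    probs = []
    for score in range(max_score + 1):
        lo = -math.inf if score == 0 else score - 0.5          # bottom bin absorbs the left tail
        hi = math.inf if score == max_score else score + 0.5   # top bin absorbs the right tail
        probs.append(normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
    return probs


probs = censored_histogram(mu=30, sigma=10, max_score=40)
print(f"top bin: {probs[-1]:.3f}, next bin down: {probs[-2]:.3f}")
print(f"zero bin: {probs[0]:.4f}, next bin up: {probs[1]:.4f}")
```

Both end bins come out taller than their neighbours, which is the same shape as the spikes at 0 and at the maximum score described above.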



Other than once more teachers trying to squeeze out points to push students over the thresholds and the “genius collector” bin, what on Earth is this? :D
You think it may be Poisson overlaid with normal distribution from very good maths pupils, who didn’t choose the extended exam for some reason? Well, then look at the extended level:



I don’t even know what happens here. In particular, note that the mode is below the threshold, even though this is an exam which pupils choose themselves.

Finally, if you think that all exams produce the hump, possibly left-skewed, a surprise. This is basic English:



I have a hint to the examining agency: maybe, just maybe the English exam is too easy, if it captures only the far low side of the actual distribution?

Source (in Polish): Grabowska, Grabowska et al. “Osiągnięcia maturzystów w 2011 roku: sprawozdanie z egzaminu maturalnego w 2011 roku”, Centralna Komisja Egzaminacyjna 2011
« Last Edit: January 03, 2025, 01:02:28 pm by golden_labels »
People imagine AI as T1000. What we got so far is glorified T9.
 
The following users thanked this post: Nominal Animal, Xena E

