Well, you already did explain the basics. 2 points can describe a signal of 1 Hz, assuming it's a perfect signal.
Not exactly.
The 2 points will reliably pick up the FREQUENCY (assuming appropriate anti-aliasing), but not necessarily the accurate timing of the signals.
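This phase dependence is easy to see numerically (a small sketch; the 0.3 rad phase offset is just an arbitrary pick): at exactly 2 samples per cycle the alternation at 1 Hz is still visible, but the apparent amplitude depends entirely on where in the cycle the samples land.

```python
import numpy as np

fs, f, phase = 2.0, 1.0, 0.3          # sampling at exactly 2 x f, arbitrary phase
n = np.arange(8)
x = np.sin(2 * np.pi * f * n / fs + phase)

# The record alternates +/- sin(phase): the 1 Hz periodicity is visible,
# but the apparent amplitude is |sin(phase)|, not the true amplitude 1.0.
print(np.allclose(x, np.sin(phase) * (-1) ** n))  # -> True
```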
This shouldn't irk me, but that is a complete fallacy, and easily disproven. Sample a 1 Hz sine signal at 2 Hz, perfectly in phase, and you will get these results:
Sample #, Sample phase (deg), Value
0    0      0.0
1    180    0.0
2    0      0.0
3    180    0.0
Looks like a complete failure to detect any signal... it looks like you are missing half the information.
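The counter-example above is easy to reproduce numerically (a minimal NumPy sketch, using the 2 Hz rate and in-phase sampling assumed above):

```python
import numpy as np

fs = 2.0      # sample rate (Hz), exactly twice the signal frequency
f = 1.0       # signal frequency (Hz)
n = np.arange(4)
t = n / fs    # sample instants: 0.0, 0.5, 1.0, 1.5 s

# Sampling perfectly in phase lands every sample on a zero crossing.
samples = np.sin(2 * np.pi * f * t)
print(np.allclose(samples, 0.0))  # -> True: every sample is (numerically) zero
```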
That is a GOOD point!
The following article seems to agree with my answer, but your counter-example seems perfectly correct and reasonable.
https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node149.html
As I said in a much earlier post in this thread. It can easily become an extremely complicated subject area.
tl;dr
I agree, there is a contradiction somewhere
EDIT:
I've researched this issue further. Remember I said that things can get VERY complicated.
It seems that Sample Rate = 2 × Maximum Frequency is the absolute bare minimum, and is only borderline supposed to work. As well as anti-aliasing, you need to make sure that you are NOT sampling at, or too near, the zero-crossing points of the waveform. Then it should work as expected.
In practice, because you should be well above this "bare minimum", you shouldn't have this issue.
But really you want to keep well above the 2 × Max Freq Nyquist rate, which I (and others) have already said in other posts in this thread.
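A quick way to see why staying well above the bare minimum helps (a sketch; the 16 Hz rate and 4 s record are just convenient choices that give a whole number of cycles): sample well above 2 × f and both amplitude and frequency come back from an FFT regardless of phase.

```python
import numpy as np

fs = 16.0                      # well above 2 x 1 Hz
f, amp, phase = 1.0, 0.7, 1.3  # arbitrary phase: result should not depend on it
t = np.arange(0, 4.0, 1 / fs)  # 4 s record = a whole number of cycles
x = amp * np.sin(2 * np.pi * f * t + phase)

spectrum = np.fft.rfft(x) / (len(x) / 2)   # scale so bin magnitude = amplitude
freqs = np.fft.rfftfreq(len(x), 1 / fs)
k = np.argmax(np.abs(spectrum))

# Recovers the true frequency and amplitude despite the arbitrary phase.
print(round(freqs[k], 6), round(abs(spectrum[k]), 6))  # -> 1.0 0.7
```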
EDIT2:
On reflection, even the zero-crossing avoidance is probably NOT enough, as you would then just get a pile of samples at the same voltage. So we are still waiting for the magic Electronics mega guru to quickly appear and explain away this phenomenon . . .
It is enough information (samples) to say that the signal is either exactly 1 Hz or a fixed DC voltage. If the DC is blocked (AC input coupling mode on a scope), then it would work. But that seems unsatisfactory, because allowing DC seems perfectly reasonable.
Here is how I look at it. Say we were sampling a bandwidth-limited periodic signal at 15 Hz, with a trigger point of 0.0, and got the following 15 values:
-0.70
-0.60
-0.50
-0.40
-0.30
-0.20
-0.10
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
According to the theorem (when used off the cuff), you should have all the information you need to deduce the amplitude and frequency of the signal being sampled, as the frequency is definitely much less than Fs/2. However, the best that can be done is some math that comes back with "well, I know the amplitude is going to be more than X, and the frequency is going to be less than Y Hz".
I'm sure you agree that you don't have enough information to say much more than that, as we are just not sampling slowly enough (over a long enough window) to get enough information. We could know more about this signal if we had access to 15 samples, just spaced further apart in time.
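One way to make that ambiguity concrete (a sketch; the two sine parameter pairs are just illustrative picks that keep amplitude × frequency roughly constant): two sines with very different amplitudes and frequencies fit those 15 samples essentially equally well, so the record pins down neither value.

```python
import numpy as np

# The 15-sample record above: a ramp from -0.7 to +0.7, sampled at 15 Hz.
t = np.arange(-7, 8) / 15.0
record = 0.1 * np.arange(-7, 8)

# Two very different candidate sines, chosen so that A * f is the same
# (A * f ~ 1.5 / (2*pi)); both reproduce the record almost exactly.
cand1 = 10.0 * np.sin(2 * np.pi * 0.02387 * t)
cand2 = 20.0 * np.sin(2 * np.pi * 0.011935 * t)

print(np.max(np.abs(cand1 - record)) < 0.01)  # -> True
print(np.max(np.abs(cand2 - record)) < 0.01)  # -> True
```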
The same thing holds at high frequencies. Suppose we were to get these 15 samples (note that these are the same values as above, but with every other sign flipped):
-0.70
0.60
-0.50
0.40
-0.30
0.20
-0.10
0.00
0.10
-0.20
0.30
-0.40
0.50
-0.60
0.70
Then the best I can say is "Well, we have a high-frequency signal, with an amplitude higher than X, and the frequency is going to be greater than (15/2 - Y) Hz" (with exactly the same X and Y values as in the low-frequency example). And I have to say this because we are not sampling fast enough, or long enough, to get any more certainty about this signal.
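The every-other-sign flip in the second record corresponds exactly to a frequency shift of Fs/2 (a minimal sketch, using a 1 Hz sine and the 15 Hz rate from the example; any low frequency would do):

```python
import numpy as np

fs, f = 15.0, 1.0
n = np.arange(15)
low = np.sin(2 * np.pi * f * n / fs)              # low-frequency sine, sampled
high = np.sin(2 * np.pi * (f + fs / 2) * n / fs)  # same sine shifted up by fs/2

# Flipping every other sign of the low-frequency record reproduces the
# high-frequency record exactly: sample for sample, they are identical.
print(np.allclose(high, (-1) ** n * low))  # -> True
```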
I guess the upshot is that, when sampling at X samples per second, a bandwidth of 0 Hz to X/2 Hz is the limit only when you have access to an infinite series of samples. When you only have a limited number of samples, the usable bandwidth will always be a tad smaller, both at the DC end and at the Fs/2 end.