Your 400 Msps means that at best, you can look at a 200 MHz signal. But that only gives you the bare minimum requirement of 2 samples per cycle that Nyquist demands. Realistically, you're probably limited to more like 100 MHz and 4 samples per cycle - and many would say that's still not enough.
Sorry, that's very wrong. See my previous reply to begin to get an understanding.
Sorry, but that's very right. In fact, it's 'Digital Scopes 101'.
To be really picky, Nyquist says that you have to sample at twice the frequency of the highest frequency component in your signal.
You need to understand the definition of "signal" in the sampling theorem. Many people don't.
In particular, if you have a repetitive waveform, the signal bandwidth approaches zero. That is of key importance to sub-sampling, which is a common technique in digital radios and DSP.
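
As a rough illustration of sub-sampling, the sketch below folds a tone into the first Nyquist zone and checks whether a narrow band stays inside a single zone. The function names and the 410 MHz / 5 MHz figures are hypothetical, chosen only to make the arithmetic visible; they are not taken from any particular scope or radio.

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sampled tone, folded into 0..f_sample/2."""
    f = f_signal % f_sample
    return min(f, f_sample - f)


def fits_one_nyquist_zone(f_low, f_high, f_sample):
    """True if the band [f_low, f_high] does not straddle a multiple of f_sample/2."""
    half = f_sample / 2
    return int(f_low // half) == int(f_high // half)


# Hypothetical example: a 5 MHz wide band centred on 410 MHz, sampled at 400 MS/s.
f_s = 400e6
centre_alias = alias_frequency(410e6, f_s)               # lands at 10 MHz
band_ok = fits_one_nyquist_zone(407.5e6, 412.5e6, f_s)   # band stays in one zone
straddles = fits_one_nyquist_zone(190e6, 210e6, f_s)     # crosses 200 MHz, so it aliases onto itself
print(centre_alias, band_ok, straddles)
```

What matters here is the width of the band relative to the sample rate, not the absolute frequency at which the band sits.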
So, if you have a 200 MHz square wave, sampling at 400 Msps will only recover the fundamental and you'll have a 200 MHz sine wave rather than a square wave. This is for the typical 'real-time' sampling scenario and assumes that the scope front end has enough bandwidth to pass the harmonics in the first place.
Partially correct.
Firstly, note that I carefully made the distinction between capturing one-off single-shot signals and capturing repetitive signals. You seem to have missed that.
Secondly, you would benefit from understanding an important use of oscilloscopes with non-repetitive high speed signals: eye diagrams.
Finally, mixing MS/s and MHz willy-nilly is likely to lead to very misleading statements and confusion.
Realistically, a 300 MHz front end will give you mushy-looking square waves at 50 MHz because you'll only get the third, fifth, and maybe a bit of the seventh harmonics. But the 400 Msps limit won't pass more than 200 MHz components so you'd have to switch to the 10 Gsps RIS sampling to even see that much.
The latter is the key point, and why RIS and the many similar variants are useful.
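
To put rough numbers on the 50 MHz square wave example, here is a minimal sketch that lists the odd harmonics, their amplitudes after the front end, and whether each falls below Nyquist for real-time sampling. It assumes an ideal square wave (harmonic amplitude 4/(πn)) and a single-pole roll-off with its -3 dB point at 300 MHz; real scope front ends usually roll off differently, so treat the amplitudes as indicative only. The function name is hypothetical.

```python
import math


def square_wave_harmonics(f_fund, f_3db, f_sample, max_harmonic=9):
    """Return (n, frequency, amplitude after front end, below Nyquist?) per odd harmonic."""
    rows = []
    for n in range(1, max_harmonic + 1, 2):
        f = n * f_fund
        ideal = 4 / (math.pi * n)                    # Fourier amplitude of an ideal square wave
        gain = 1 / math.sqrt(1 + (f / f_3db) ** 2)   # assumed single-pole front-end response
        rows.append((n, f, ideal * gain, f < f_sample / 2))
    return rows


# 50 MHz square wave, 300 MHz front end, 400 MS/s real-time sampling
for n, f, amp, ok in square_wave_harmonics(50e6, 300e6, 400e6):
    print(f"harmonic {n}: {f / 1e6:.0f} MHz, amplitude {amp:.3f}, below Nyquist: {ok}")
```

With these inputs only the fundamental and the third harmonic (150 MHz) sit below the 200 MHz Nyquist limit of real-time sampling, while the fifth (250 MHz) is still inside the front-end bandwidth. That gap is what the higher equivalent sample rate recovers.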
For digital signals in particular, you would do well to concentrate on a waveform's risetime rather than its irrelevant fundamental frequency. See a brief intro to the theory and some practical measurements at
https://entertaininghacks.wordpress.com/2018/05/08/digital-signal-integrity-and-bandwidth-signals-risetime-is-important-period-is-irrelevant/
There's nothing new there, but people repeatedly misunderstand the point.
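
A quick way to see why risetime matters and period does not is the common rule of thumb BW ≈ 0.35 / t_rise, which holds for a single-pole (roughly Gaussian) response. The sketch below applies it, together with the usual root-sum-of-squares estimate of the risetime a scope would display. Both formulas are approximations, and the function names and example values are hypothetical.

```python
import math


def bandwidth_from_risetime(t_rise):
    """Approximate signal bandwidth in Hz from the 10-90% risetime in seconds."""
    return 0.35 / t_rise


def displayed_risetime(t_signal, scope_bandwidth):
    """Estimate the risetime shown by a scope of the given bandwidth (RSS approximation)."""
    t_scope = 0.35 / scope_bandwidth
    return math.sqrt(t_signal ** 2 + t_scope ** 2)


# Example values: a 250 ps edge viewed through a 300 MHz front end
print(bandwidth_from_risetime(250e-12) / 1e9, "GHz")
print(displayed_risetime(250e-12, 300e6) * 1e9, "ns")
```

Neither function takes the repetition rate as an input: an edge with a given risetime has the same spectral content whether it repeats at 1 kHz or 50 MHz.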
The 40 Ksps scope you mentioned doesn't violate this rule because it takes multiple cycles to build up an image of the signal.
Correct, but you are repeating what I wrote!
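
The idea of building up an image over many cycles can be sketched as follows. A repetitive waveform is sampled once every few cycles, with each sample landing slightly later in the cycle than the previous one, and the samples are then folded back onto a single period. This is a simplified sequential equivalent-time scheme; the 1 MHz signal, the 25 skipped cycles, and the function name are hypothetical, and it is not a description of how any specific scope implements RIS.

```python
import math


def equivalent_time_capture(f_signal, cycles_skipped, points_per_cycle, waveform):
    """Sample a repetitive waveform slowly, then fold the samples onto one period."""
    period = 1.0 / f_signal
    step = period / points_per_cycle             # effective time resolution
    interval = cycles_skipped * period + step    # real time between samples
    folded = []
    for k in range(points_per_cycle):
        t = k * interval                         # real acquisition time
        t_in_cycle = (t / period) % 1.0 * period # position within one period
        folded.append((t_in_cycle, waveform(t)))
    folded.sort()
    return interval, folded


f_sig = 1e6  # hypothetical 1 MHz repetitive signal


def sine(t):
    return math.sin(2 * math.pi * f_sig * t)


interval, trace = equivalent_time_capture(f_sig, 25, 100, sine)
real_rate = 1.0 / interval     # just under 40 kS/s
effective_rate = f_sig * 100   # 100 points per cycle, i.e. 100 MS/s equivalent
print(f"real rate {real_rate:.0f} S/s, effective rate {effective_rate:.0f} S/s")
```

This only works because the waveform repeats. A single-shot event gets one pass, so it is limited by the real-time sample rate.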
Building a generator that puts out a good square wave at 200 MHz (or even 50 MHz) is left as an exercise for the student. 
It is trivial, even with jellybean CMOS digital logic, to get 250 ps risetimes. See, for example,
https://www.eevblog.com/forum/testgear/show-us-your-square-wave/msg1902941/#msg1902941