Some clarifications in no particular order:
1) Averaging a series of short windows reduces the variance. Summing a series of short windows increases the dynamic range. Divide by N or 1. Pick one.
2) A window in time is a moving average in frequency. In traditional DSP speak a multiplication by a rectangular window in time is a convolution with sinc(x) in frequency. A Gaussian window in one domain is a Gaussian smoother in the other.
3) Not being able to select the record length, FFT length, number of segments, average vs. sum, and time-domain window type is the motivation for my comment. None of this is difficult to implement, nor is it particularly compute intensive. The absence of these options is a comment on the mathematical prowess of the programmers. No more.
4) 20*log10(20,000) is 86 dB. That's a substantial increase in dynamic range if you choose not to normalize the sum of the FFTs. 1/sqrt(20,000) is a factor of 141 reduction in the standard deviation (a factor of 20,000 in the variance) if you average instead. Lots of DSOs have sufficient memory to record more than 20,000 segments.
5) A good FFT is a nice feature on a scope, but not a replacement for an SA. But sometimes all you have is a scope. Lots of people (including me) don't have access to an SA. I tried to fix that, but the SA was much too buggy to keep. I'm now considering spending over twice as much.
6) At present, for most scopes the only answer is to move the data to a PC and run MATLAB or Octave.
7) There is no need for guessing. All the math was nailed down in the 1940s by Wiener et al. It's actually astonishingly easy to do it all on the back of a cocktail napkin if you know the transform pairs in time and frequency. Ronald Bracewell has an excellent collection of graphs of common functions in both domains in his classic text on the Fourier transform. Almost any signal you can come up with can be modeled as the sum of other signals with simple transforms.
8) Wraparound is a VERY serious issue. Discontinuities at the start and end of a window MUST be handled correctly or you will get GIGO.
9) A long FFT has higher variance relative to the average of a bunch of short FFTs. It also has narrower frequency bins and greater dynamic range. Pick one.
10) No information is discarded by chopping a long series into short pieces. It's just being used in different ways.
11) The change in dynamic range assumes Gaussian distributed additive random noise. I should have stated that assumption despite it being so common I refer to it as "sprinkling Gauss water on the problem".
12) If a long trace consists of a series of segments with gaps between them, it gets rather messy unless every segment was captured on a consistent trigger.
13) Couldn't agree more that most DSO FFTs are toys.
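
For anyone who wants to try the trade-off from points 1, 4, 9 and 10 on a PC, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not what any scope firmware does: the input is synthetic unit-variance white Gaussian noise (the assumption from point 11), the segments are contiguous and non-overlapping, and the function name `combined_power_spectrum` and its parameters are made up for this example.

```python
import math
import numpy as np


def combined_power_spectrum(x, seg_len, window="rect", average=True):
    """Chop x into non-overlapping segments, window and FFT each one,
    then average (divide by N) or sum (divide by 1) the power spectra.
    Hypothetical helper for illustration only."""
    n_seg = len(x) // seg_len
    segs = np.reshape(x[: n_seg * seg_len], (n_seg, seg_len))
    # Time-domain window: Hann tapers the segment ends, "rect" leaves them alone
    w = np.hanning(seg_len) if window == "hann" else np.ones(seg_len)
    power = np.abs(np.fft.rfft(segs * w, axis=1)) ** 2
    total = power.sum(axis=0)
    return total / n_seg if average else total


def relative_spread(p):
    """Std/mean of the spectrum estimate across bins (DC and Nyquist excluded)."""
    p = p[1:-1]
    return p.std() / p.mean()


rng = np.random.default_rng(0)
n_total, seg_len = 1 << 16, 256
n_seg = n_total // seg_len
noise = rng.standard_normal(n_total)  # assumed Gaussian additive noise

# One long FFT: many narrow bins, noisy estimate in each bin
long_psd = np.abs(np.fft.rfft(noise)) ** 2 / n_total

# Many short FFTs, averaged: few wide bins, much smoother estimate
avg_psd = combined_power_spectrum(noise, seg_len, average=True) / seg_len

# Same short FFTs, summed: identical shape, just not divided by N
sum_psd = combined_power_spectrum(noise, seg_len, average=False) / seg_len

print("bins, long FFT:      ", long_psd.size)
print("bins, short FFTs:    ", avg_psd.size)
print("spread, long FFT:    ", relative_spread(long_psd))
print("spread, averaged:    ", relative_spread(avg_psd))
print("expected 1/sqrt(N):  ", 1 / math.sqrt(n_seg))

# The arithmetic from point 4 for 20,000 segments
db_20k = 20 * math.log10(20_000)   # about 86.0 dB
std_factor_20k = math.sqrt(20_000)  # about 141
print("20*log10(20000) dB:  ", db_20k)
print("sqrt(20000):         ", std_factor_20k)
```

With these settings the spread for the single long FFT should come out near 1 and the averaged one near 1/sqrt(256), at the price of 129 bins instead of 32769. Passing `window="hann"` is one way to try a tapered window on each segment.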
OT: I HATE "smileys" being inserted where they don't belong!