Even if you have only one or two periods of input and reasonable SNR, I can show that the FFT is optimal in the sense of the MAP (maximum a posteriori probability) rule, and that, to keep that optimality, you cannot do much better in terms of computational complexity. Any attempt to reduce computation (time-domain approaches, sparse approaches, etc.) makes the solution suboptimal in the MAP sense.

First of all, when distinguishing between several possible waveforms in the presence of noise, the MAP rule reduces to a minimum-distance rule: the candidate waveform closest to the input is the most likely one. For equal-energy candidates (like our unit-amplitude sinusoids), distance is measured by the dot product, i.e. the integral of the product of the two waveforms. A higher dot product means a smaller distance.

In this case we have a chunk of input waveform and know that it contains a portion of a sine plus some noise. Our symbol dictionary is simply the set of all sinusoids with amplitude 1.

**The problem can then be defined as: find the most likely sinusoid that the input waveform chunk is believed to contain.** If we simply take the dot product between the input waveform chunk and every possible sinusoid, then pick the one with the highest dot product, we have satisfied the MAP rule. The Fourier transform is exactly that: a dot product between the input waveform and every possible sinusoid.
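A minimal numerical sketch of this (my own toy example, with a made-up frequency and noise level): score every candidate frequency by its dot product with the noisy input and pick the maximum. The brute-force correlation matrix and the FFT give identical scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
true_k = 17                      # true frequency, in bins (arbitrary choice)
x = np.cos(2 * np.pi * true_k * t / n + 0.7) + 0.5 * rng.standard_normal(n)

# Brute-force MAP: dot product against every candidate frequency.
# Complex exponentials handle the unknown phase; |dot product| is the score.
candidates = np.exp(-2j * np.pi * np.outer(np.arange(n), t) / n)
scores = np.abs(candidates @ x)          # O(n^2)
best = int(np.argmax(scores[: n // 2]))

# The FFT computes exactly the same scores in O(n log n).
assert np.allclose(scores, np.abs(np.fft.fft(x)))
print(best)  # 17
```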

However, the FFT as computed is not a FT (Fourier transform). It is a special case of it, properly named the discrete Fourier series (DFS). Recall that a signal that is periodic in the time domain is discrete in the frequency domain, and that a signal that is discrete in the time domain is periodic in the frequency domain. The DFS is the Fourier transform of a signal that is both discrete and periodic, in both the time and frequency domains.
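You can verify the implicit periodicity numerically (my own illustration, arbitrary signal): the FFT of one chunk carries the same information as the transform of that chunk repeated. Repeating a chunk r times concentrates all the energy onto every r-th bin, with the same coefficients scaled by r.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 64, 3
x = rng.standard_normal(n)

X_one = np.fft.fft(x)                # transform of the single chunk
X_rep = np.fft.fft(np.tile(x, r))    # transform of the chunk repeated r times

# Nonzero bins of the repeated signal sit at multiples of r, equal to r * X_one;
# every other bin is zero. The FFT already "sees" the chunk as periodic.
assert np.allclose(X_rep[::r], r * X_one)
mask = np.ones(r * n, dtype=bool)
mask[::r] = False
assert np.allclose(X_rep[mask], 0)
```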

Suppose our input waveform chunk looks like this:

If we compute an FFT on it, we are actually computing the FT of this, which is the input chunk repeated:

(and the repetition goes on to infinity).

However, that is not what we want: we want the FT of just the input chunk, with zeros everywhere else. To do that, we (conceptually) multiply the input with a rect, which is equivalent to convolving with a sinc in the frequency domain, i.e. interpolating it.

The FFT output might look like this:

The black lines are the discrete output points, while the red line is the interpolated waveform.

Every point on the red waveform is mathematically equivalent to the dot product between a particular sinusoid and the input waveform chunk.
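One way to see this numerically (my own sketch, with a deliberately off-bin tone): zero-padding before the FFT simply samples more points on that same interpolated curve, and the original bins reappear unchanged among them.

```python
import numpy as np

n, pad = 64, 16
t = np.arange(n)
x = np.cos(2 * np.pi * 10.3 * t / n)    # frequency falls between bins

X_coarse = np.fft.fft(x)                # n samples of the interpolated curve
X_fine = np.fft.fft(x, n * pad)         # n*pad samples of the same curve

# Every coarse bin is also a fine bin: zero-padding adds no information,
# it only evaluates the sinc-interpolated spectrum on a denser grid.
assert np.allclose(X_coarse, X_fine[::pad])
```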

The peak pointed to by the blue arrow is the maximum point. This peak can be extracted without actually interpolating the FFT output, by deriving the formula of the sinc convolution and then finding local maxima.
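A sketch of that idea (my own illustration, with made-up numbers): locate the coarse FFT peak, then maximize the continuous-frequency dot-product magnitude directly with a bounded 1-D search, instead of interpolating the whole spectrum.

```python
import numpy as np
from scipy.optimize import minimize_scalar

n = 256
t = np.arange(n)
f_true = 17.37 / n                      # true frequency, off the bin grid
x = np.cos(2 * np.pi * f_true * t + 0.4)

# Coarse estimate: the maximum FFT bin.
k = int(np.argmax(np.abs(np.fft.fft(x)[: n // 2])))

def neg_mag(f):
    # |dot product with the sinusoid at frequency f|: one point
    # on the continuous (red) curve, evaluated exactly.
    return -np.abs(np.exp(-2j * np.pi * f * t) @ x)

# Refine within one bin on either side of the coarse peak.
res = minimize_scalar(neg_mag, bounds=((k - 1) / n, (k + 1) / n),
                      method="bounded")
print(round(res.x * n, 2))  # close to 17.37
```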

Regardless of what you do, to make full use of the input data you must look at every sample, so your minimum computational complexity is O(n). The FFT plus peak finding has a complexity of O(n log n), which is very close to O(n) and, I would say, pretty much as good as you can get. Any sublinear approach, such as the sparse FFT variants, will not give you the MAP result, and is therefore not optimal in terms of extracting the maximum information from the input waveform.

Please, before you comment that new and shiny approaches do far better than the FFT, go and review your first-year signals-and-systems theory and Fourier identities, and try deriving exactly what you are actually computing. There is a time and place for sparse approaches, but when you have evenly spaced input data and want the best possible estimate of frequency and phase, you cannot do better than a Fourier-based approach.