Actually, thinking about it again, could one not just demodulate directly by undersampling? Take two samples per carrier period, a quarter period apart, with every second sample going into a second buffer, and then find the magnitude by treating the two buffers as the real and imaginary parts of a complex signal. Some example Octave code to illustrate:
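% Test signal: carrier with a period of 1 and a t^2 envelope
f = @(t) t.^2 .* sin(2*pi*t + 0.1);
t_i = 0:1:4;          % one sample per carrier period
t_q = t_i + 0.25;     % second buffer, a quarter period later
I = f(t_i);
Q = f(t_q);
plot(t_q, sqrt(I.^2 + Q.^2), 'bo');   % magnitude of I + jQ
hold on;
t = 0:0.1:5;
plot(t, t.^2, 'r');   % the envelope that was modulated onto the carrier
hold off;
legend('Demodulated', 'Transmitted');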
The catch, I suppose, is that you would need to lock the sampling to the carrier so that you don't get a beat frequency. Apart from needing a (software) PLL or some such for the frequency locking, does anyone see a problem here? Obviously you could get better noise performance by also sampling at the other two quadrants.
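
For what it's worth, here is a rough Octave sketch of what an unlocked sample clock would do in the simplest case. The numbers are made up for illustration: a constant unit envelope, a carrier 1% above the sample rate, and no noise. Under those assumptions each buffer on its own beats at the difference frequency, while the magnitude only ripples slightly, because the quarter-period delay is no longer exactly 90 degrees at the real carrier frequency. A time-varying envelope or noise would add to that, so treat this as a sanity check rather than a full answer.

```octave
% Assumed values for illustration only (not taken from the code above).
fs  = 1;       % rate of each buffer: one I/Q pair per nominal carrier period
f0  = 1.01;    % actual carrier frequency, 1% off the nominal rate
phi = 0.1;     % arbitrary carrier phase
n   = 0:199;   % 200 sample pairs = two beat periods at |f0 - fs| = 0.01

s = @(t) sin(2*pi*f0*t + phi);   % unit-amplitude carrier, constant envelope

I = s(n/fs);                     % first buffer
Q = s((n + 0.25)/fs);            % second buffer, a quarter nominal period later
mag = sqrt(I.^2 + Q.^2);

% Quadrature error: the 0.25/fs delay is 90 degrees plus eps_q at f0.
eps_q = 2*pi*(f0 - fs)*0.25/fs;

printf('I swings over     [%.3f, %.3f]\n', min(I), max(I));
printf('mag stays within  [%.4f, %.4f]\n', min(mag), max(mag));
printf('bound on |mag^2 - 1| = |sin(eps_q)| = %.4f\n', abs(sin(eps_q)));
```

The bound comes from I^2 + Q^2 = 1 - sin(2*theta + eps_q)*sin(eps_q), where theta is the carrier phase at the I sample. If you need the phase as well as the magnitude, the rotation of I + jQ at the beat frequency still has to be tracked or locked out.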