The noise floor in each of your "-80dB", "noisefloor" and "noloop" sample data files is approx. 0.57mV RMS, equivalent to an IF level (at the ADC input) of -52 dBm (at full IF bandwidth).
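The conversion from RMS voltage to dBm used above can be sketched as follows. This is a minimal sketch that assumes a 50 ohm reference impedance, which is what the 0dBm = 0.223607 V RMS figure quoted further down corresponds to; the variable names are only illustrative.

```octave
% Convert an RMS voltage at the ADC input to dBm.
% Assumption: 50 ohm reference, so 0 dBm corresponds to 0.223607 V RMS.
v_ref = sqrt(50 * 1e-3);                 % volts RMS for 1 mW into 50 ohm
v_rms = 0.57e-3;                         % measured noise floor, volts RMS
level_dBm = 20 * log10(v_rms / v_ref);   % about -51.9 dBm
printf("%.2f dBm\n", level_dBm);
```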
[ In your "-80dB" samples, the 15kHz signal already drowns in the noise floor -- at this frequency I cannot see a statistically significant difference between your "-80dB", "noisefloor" and "noloop" samples, i.e. none that would exceed the variance of the noise floor. ]
The maximum possible signal level is limited by the mixer's compression point, which the datasheet specifies as -6dBm. Your data show that -6dBm is not yet the limit, since your "0dB" samples show a compression of only ~1.1dB compared to the "10dB" samples (whose ADC-measured IF level is -6.3dBm for the 15kHz fundamental). So a maximum IF level of about 3dBm seems feasible (at least for the RF frequency you used when you created the data files -- it may be different at other RF frequencies).
If I define "dynamic range" as the ratio of maximum signal level to noise floor level, then it is ~55dB for the ADC-based power detector at full IF bandwidth. Because the power of 4k samples is summed/averaged, the variance of the noise floor should be relatively low (approx. +/- 1dB), i.e. the averaging acts as a VBW filter, but it cannot lower the noise floor. A lower noise floor can only be obtained by reducing the noise bandwidth, of course.
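The point that power averaging narrows the spread of the reading without lowering its level can be illustrated with a small simulation. This is only a sketch under an idealized assumption (white Gaussian noise of unit power, not the measured noise), so the absolute numbers will differ from the real data; the trial count and sample counts are arbitrary choices.

```octave
% Power averaging over ns samples: the spread of the dB reading shrinks
% with ns, but the average level does not drop.
% Assumption: white Gaussian noise with unit power (0 dB reference).
randn("state", 1);                 % fixed seed for repeatability
trials = 500;                      % repeated measurements per setting
ns_list = [40 4000];               % samples averaged per measurement
lvl_mean = zeros(size(ns_list));
lvl_std = zeros(size(ns_list));
for k = 1:numel(ns_list)
  x = randn(trials, ns_list(k));
  p_dB = 10 * log10(mean(x .^ 2, 2));   % one averaged power reading per trial
  lvl_mean(k) = mean(p_dB);
  lvl_std(k) = std(p_dB);
  printf("ns = %4d: mean = %+.2f dB, stdev = %.2f dB\n", ...
         ns_list(k), lvl_mean(k), lvl_std(k));
end
```

In this idealized case the mean stays near 0 dB for both settings while the standard deviation falls roughly with the square root of the number of averaged samples.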
OTOH, the AD8307 datasheet claims a noise floor of -78dBm, which should then give a dynamic range of 81dB. Multiple ADC samples of the AD8307 output can still be averaged to reduce the variance of the measured noise floor (-> VBW filter). But I'm in fact surprised that you don't get a better DR from the AD8307 than from the ADC-based power detector. The transfer function depicted in the datasheet starts to flatten below -70dBm, but that's already close to the noise floor, so I would not expect it to explain a DR loss of more than a couple of dB. The AD8307 of course measures anything presented to its input. Does the IF signal coming out of the mixer possibly already contain a lot of noise (i.e. significantly more than -78dBm)? If yes, then that could explain why the relatively large intrinsic DR of the AD8307 cannot be exploited. But I have no idea how much noise is actually coming out of the mixer. The noise in the ADC samples is the sum of the noise already present in the IF signal and the ADC noise, and I have no feeling for which component dominates.
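How two uncorrelated noise contributions combine can be sketched as a power sum. The two input levels below are hypothetical placeholders chosen only to show the arithmetic; they are not measured values for the mixer or the ADC.

```octave
% Total power of two uncorrelated noise sources, levels given in dBm.
% Both input levels are hypothetical, chosen only to show the arithmetic.
if_noise_dBm  = -55;               % hypothetical noise delivered by the mixer
adc_noise_dBm = -55;               % hypothetical ADC self-noise
total_mW = 10 ^ (if_noise_dBm / 10) + 10 ^ (adc_noise_dBm / 10);
total_dBm = 10 * log10(total_mW);  % equal levels add up to +3.01 dB
printf("total = %.2f dBm\n", total_dBm);
```

If one contribution is 10dB below the other, it raises the total by only about 0.4dB, so the weaker source is hard to identify from the sum alone.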
Regarding the spurs in the spectrum: The dominant spurs have a spacing of about 143kHz, which is a little bit less than fs/6. I don't worry so much about the spurs near fs/6, near fs/3 and close to Nyquist, but they also appear close to DC, which is nasty for SA usage, since they are still inside the passband of the RBW filter, even with significantly reduced bandwidth. I still wonder whether the frequency of 143kHz is just an unfortunate coincidence, or whether it is related to the sampling. If the spurs were exactly at DC, fs/6, fs/3 and Nyquist, then they would likely be sampling-related. But the spacing is a little bit less than fs/6. Do you find any signal source on the board whose frequency is an integer multiple of 143kHz, or pulse widths that are an integer fraction of ~7us, or anything which could produce IM products at these frequencies? What actually is the PS3120 IC on the board? I can't find a datasheet, but it seems to be a step-up converter (so I guess it has an oscillator, too). +5V from USB is also a potential candidate for noise and spurs. Was the receiver LO actually turned on when you captured the "noisefloor" and "noloop" samples? Does it make a difference (regarding spurs) whether it is on or off?
Edit:
It is anticipated that the dynamic range will increase to 70dB - 80dB in network analyzer mode. The sweep times will also improve in future.
(Average) noise floor at 321 Hz noise bandwidth (-> 4000 samples, Hamming window, one bin) is about -83dBm (which would imply a DR of 86dB when the maximum level is 3dBm). Since the noise is not white, it is also frequency-dependent. But the variance of the noise floor (when you repeat the measurement multiple times) is rather high, say +/- 10dB standard deviation (just a rough initial guess), so the desired signal still needs to be well above the average noise floor in order to stand out with statistical significance. The 15kHz signal in your "-70dB" samples (which is about -66.8dBm) can be clearly distinguished from the noise floor, with a repeatability of about +/-1dB. Still, it is too weak for estimating the phase. The 15kHz signal in your "-80dB" samples is likely above the average noise floor, too, but cannot be distinguished from noise due to the variance. For 4k samples, the sampling alone already limits the sweep rate to < 200 readings per second. Tuning the PLLs takes some additional time. [ And vector measurements require at least two (1-port) or three (1.5-port) readings per frequency point, reducing the sweep rate further. ]
Edit:
Since the noise distributions are skewed in dB space, which is not so easy to calculate analytically, I've done a Monte Carlo / bootstrap simulation to get closer estimates for the noise floors and their variability. Below are the results for 4000 samples. dBm levels are absolute IF voltage levels at the ADC input (0dBm = 0.223607 V RMS).
Noise floor levels:
Full IF bandwidth RMS averaging (integrate power over samples):
dBm_mean = -51.446
dBm_95pct = -51.237
dBm_stdev = 0.12724
RMS averaging of filtered samples (642 Hz noise BW, NA mode):
dBm_mean = -84.318
dBm_median = -84.104
dBm_95pct = -79.171
dBm_stdev = 3.3625
Vector averaging (DFT-like) of filtered samples (642 Hz noise BW, NA mode):
dBm_mean = -85.697
dBm_median = -84.673
dBm_95pct = -78.324
dBm_stdev = 5.6498
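As a plausibility check on the vector-averaging figures, the spread of a single complex bin can be simulated for ideal noise. This sketch assumes circular complex Gaussian noise with unit mean power, which the measured (non-white, non-i.i.d.) noise only approximates, so it is a comparison point rather than a reproduction of the numbers above.

```octave
% Spread of a single DFT-bin magnitude in dB for complex Gaussian noise.
% Assumption: ideal circular complex Gaussian noise with unit mean power.
randn("state", 2);                                 % fixed seed
n = 100000;                                        % number of simulated bins
v = (randn(n, 1) + 1i * randn(n, 1)) / sqrt(2);    % unit mean power
mag_dB = 20 * log10(abs(v));                       % Rayleigh magnitude in dB
mean_dB = mean(mag_dB);
median_dB = median(mag_dB);
stdev_dB = std(mag_dB);
printf("mean   = %.2f dB\n", mean_dB);
printf("median = %.2f dB\n", median_dB);
printf("stdev  = %.2f dB\n", stdev_dB);
```

For this ideal case the theoretical values are a mean of about -2.5dB, a median of about -1.6dB (both relative to the mean power) and a standard deviation of about 5.6dB, which is close to the mean/median offset and the stdev listed for vector averaging.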
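% Monte Carlo / bootstrap simulation script (Octave)
N=10000;          % number of simulated measurements (trials)
NS=4000;          % samples per measurement
num_samples = NS
fs=72e6/6/14;     % sampling rate in Hz
freq=15000;       % IF test signal frequency in Hz
t=[0:NS-1]/fs;    % sample times in seconds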
% quadrature LO
lo = exp(-1i * 2 * pi * freq * t);
% window function (lowpass)
w = hamming(NS, "periodic")';
w /= sum(w);
% corresponding bandpass fir filter centered at freq
bp = 2 * w .* cos(2 * pi * freq * t);
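% bootstrap resampling of noise from given distribution
% still not perfect, since actual noise is not i.i.d.
nf1 = textread("tg-15kHz-test-signals/tg-15kHz-80dB-after-caps.dat");
nf2 = textread("tg-15kHz-test-signals/tg-15kHz-noisefloor-after-caps.dat");
nf3 = textread("tg-15kHz-test-signals/tg-15kHz-noloop-after-caps.dat");
nf = [ nf1 nf2 nf3 ]; % combine all three (one column per file)
% scale ADC counts to volts (/ 65536 * 3.3) and divide by the 0dBm reference
% voltage (0.223607 V RMS), so that 20*log10 of the RMS value reads in dBm
% note: nf(1:NS) keeps only the first NS values in column-major order
nf = nf(1:NS)' / 65536 * 3.3 / 0.223607;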
x = zeros(N,NS); % one row of NS resampled noise values per trial
for i = 1:N
  x(i,:) = nf(randi(length(nf),1,NS));
end
% alternatively, Gaussian noise with same power
% normalized to dBm
% x = std(nf) * randn(N,NS);
% sine wave for checking correct scaling of calculation
% x = repmat(sqrt(2) * cos(2 * pi * freq * t), N, 1);
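y = zeros(N,NS); % bandpass-filtered samples
V = zeros(N,1); % filtered vector average
for i = 1:N
  xi = x(i,:);
  % circular convolution with the bandpass FIR filter, done via FFT
  y(i,:) = real(ifft(fft(xi) .* fft(bp)));
  % mix down with the quadrature LO, then lowpass with the window
  V(i) = sum(xi .* lo .* w);
end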
disp("")
disp("Noise floor levels:")
disp("")
disp("Full IF bandwidth RMS averaging (integrate power over samples):")
dBm_mean = mean(20*log10(std(x')))
dBm_95pct = sort(20*log10(std(x')))(floor(0.95*N)+1)
dBm_stdev = std(20*log10(std(x')))
disp("")
disp("RMS averaging of filtered samples (642 Hz noise BW, NA mode):")
dBm_mean = mean(20*log10(std(y')))
dBm_median = median(20*log10(std(y')))
dBm_95pct = sort(20*log10(std(y')))(floor(0.95*N)+1)
dBm_stdev = std(20*log10(std(y')))
disp("")
disp("Vector averaging (DFT-like) of filtered samples (642 Hz noise BW, NA mode):")
dBm_mean = mean(20*log10(abs(V) * sqrt(2)))
dBm_median = median(20*log10(abs(V) * sqrt(2)))
dBm_95pct = sort(20*log10(abs(V) * sqrt(2)))(floor(0.95*N)+1)
dBm_stdev = std(20*log10(abs(V) * sqrt(2)))
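The scaling check hinted at by the commented-out sine wave line can be run on its own as sketched below. It assumes the same NS, fs and freq as the script and writes out the periodic Hamming window explicitly (0.54 - 0.46*cos), so it does not depend on the noise data files; a cosine with an RMS value of 1 relative to the reference should then read 0dB.

```octave
% Scaling check for the vector average: a cosine at the LO frequency with
% an RMS value of 1 (relative to the reference voltage) should read 0 dB.
NS = 4000;                                       % samples per measurement
fs = 72e6 / 6 / 14;                              % sampling rate in Hz
freq = 15000;                                    % test frequency in Hz
t = (0:NS-1) / fs;
lo = exp(-1i * 2 * pi * freq * t);               % quadrature LO
w = 0.54 - 0.46 * cos(2 * pi * (0:NS-1) / NS);   % periodic Hamming window
w = w / sum(w);                                  % unity gain at DC
x = sqrt(2) * cos(2 * pi * freq * t);            % test tone, RMS value of 1
V = sum(x .* lo .* w);                           % filtered vector average
level_dB = 20 * log10(abs(V) * sqrt(2));         % factor sqrt(2) as in script
printf("%.4f dB\n", level_dB);
```

With these parameters the tone covers exactly 70 cycles in 4000 samples, so it sits on a bin centre and the window causes no scalloping loss.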