After about a week of testing I found out why the results were off.
The RBW and VBW settings have a big impact on displayed attenuation when using the tracking generator.
I couldn't get the analyzer to show the correct attenuation value with the default VBW setting. I had to manually override the VBW down to 10 kHz and leave the RBW at its default (1 MHz) for a 3 GHz span. Narrowing the RBW below the default also shifted the readings. Plot averaging was set to 100, but 10 would have worked just as well.
Siglent have an AppNote on making attenuator readings, but it says nothing about changing the RBW or VBW settings. The screenshots just show them at default.
https://siglentna.com/application-note/attenuator-verification-spectrum-analyzer/
Here are 3 plots of a 40 dB attenuator showing different RBW and VBW settings. The most accurate results are found with a 1 MHz RBW (default) and a 10 kHz VBW (although 30 kHz wasn't much different).
rbw30k_vbw30k.jpg - Reducing the RBW to a non-default 30 kHz has a huge impact. Almost 4.5 dB off.
rbw1m_vbw1m.jpg - Default RBW and VBW settings (1 MHz). Off by about 0.5 dB.
rbw1m_vbw10k.jpg - Default RBW but a manual VBW of 10 kHz. This is the most accurate plot.
If you're wondering why I think the last plot is the most accurate: I contacted MiniCircuits with the serial number, and they very helpfully handed over the test data for the sample of attenuators tested from the same batch as my device.
Top marks to MiniCircuits for customer service! I even got to speak with an engineer.
I also ran some tests using Zero Span, and T-Power for spot readings. T-Power was the most accurate, as it didn't seem to be affected by changes to the RBW or VBW settings. The T-Power results agreed with the last plot (rbw1m_vbw10k), and were also within the test data range provided by MiniCircuits.
If you do use T-Power to make spot readings, I found that there's a technique for getting the most accurate results. Normalize the analyzer with a pass-through on the cables and take a reading, then connect the attenuator and take another reading. Don't change the frequency between readings. Just moving the frequency off target and back again results in a small error. This technique does mean that you have to fiddle about with cables for every different frequency reading though.
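For anyone who wants to log the spot readings, here is a minimal Python sketch of the arithmetic behind the two-reading technique. It only does the subtraction and an optional check against a datasheet range. The function names are my own, and the numbers in the example are placeholders for illustration, not my measured values or the MiniCircuits test data.

```python
def attenuation_db(reference_dbm: float, with_attenuator_dbm: float) -> float:
    """Attenuation in dB from two power readings taken at the same frequency.

    reference_dbm: reading with the pass-through in place
    with_attenuator_dbm: reading with the attenuator inserted
    """
    return reference_dbm - with_attenuator_dbm


def within_range(value_db: float, low_db: float, high_db: float) -> bool:
    """True if the value lies inside the given test-data range (inclusive)."""
    return low_db <= value_db <= high_db


if __name__ == "__main__":
    # Placeholder readings, for illustration only (not measured values).
    reference_dbm = -0.2
    with_attenuator_dbm = -40.1

    result = attenuation_db(reference_dbm, with_attenuator_dbm)
    print(f"Attenuation: {result:.2f} dB")

    # Placeholder limits; substitute the range from your own test data.
    print("Within range:", within_range(result, 39.5, 40.5))
```

Both readings have to be taken at the same frequency without retuning in between, otherwise the small retuning error mentioned above ends up in the result.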