You don't need SOL(T) to do the calibration. In fact, SOLT is not really a good calibration method outside of lower-frequency (few GHz) applications. We almost always calibrate our systems with TRL, as this does not require perfectly known standards. (Don't ask me about the math though, as I'm not that familiar with it either.) The issue with TRL at lower frequencies is that the line standard becomes physically long, since it has to be a reasonable fraction of a wavelength.
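To put a number on "the line becomes large", here is a rough sketch. It assumes the usual rule of thumb that the TRL line is about a quarter wavelength longer than the thru at the band centre. The function name and the `eps_eff` parameter (effective permittivity of the line, 1.0 for air) are my own for illustration, not from any particular tool.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def quarter_wave_length_m(freq_hz, eps_eff=1.0):
    """Physical length of a line that is a quarter wavelength at freq_hz.

    eps_eff is the effective relative permittivity of the line
    (assumption: 1.0, i.e. an air line, unless stated otherwise).
    """
    return C0 / (4.0 * freq_hz * math.sqrt(eps_eff))


if __name__ == "__main__":
    # Print the extra line length needed at a few centre frequencies.
    for f in (100e6, 1e9, 10e9, 50e9):
        length_mm = quarter_wave_length_m(f) * 1e3
        print(f"{f / 1e9:6.1f} GHz: {length_mm:8.2f} mm")
```

At 10 GHz that is a few millimetres, which is easy to build. At 100 MHz the same quarter wave is around three quarters of a metre in air, which is why TRL gets impractical down there.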
Right now, from what I have been told, the difficult part of making a high-end VNA is not getting the performance in the first place - it is stability and speed. That means keeping the performance within a fraction of a dB for many hours or days, and allowing fast measurements so the system can be characterized better. And don't think this is just something needed for production test - colleagues of mine working on microwave for biomedical and spectroscopy applications sometimes do measurements that take 72 hours (as they need to determine how a parameter changes as a yeast culture grows, for example). As such, they need that stability over time to ensure that they are measuring the DUT and not the drift of the VNA.
The beauty of a VNA calibration is that (to some extent at least) the performance of the VNA's components doesn't matter that much - you calibrate that out anyway. Of course, you run into limits set by dynamic range and the raw port match - if your output connector on the VNA has an S11 of -15 dB, good luck measuring the S11 of a device that should have extremely low reflection (i.e. a very high return loss)...
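For what it's worth, here is a minimal sketch of the textbook one-port, three-term error model, just to show what "calibrate that out" means and where it stops working. All the numbers are made up: a hypothetical VNA with -15 dB raw source match and -20 dB raw directivity, ideal short/open/load standards, and a -40 dB DUT. The function names are mine, and this is not how any particular instrument implements it.

```python
import numpy as np


def raw_reading(gamma, e00, e11, e10e01):
    """Raw reflection seen by the receiver for a true reflection gamma."""
    return e00 + e10e01 * gamma / (1 - e11 * gamma)


def solve_error_terms(gamma_known, gamma_meas):
    """Solve for e00, e11 and delta = e00*e11 - e10e01 from three standards."""
    g = np.asarray(gamma_known, dtype=complex)
    m = np.asarray(gamma_meas, dtype=complex)
    # Linearised model: m = e00 + (g*m)*e11 - g*delta
    A = np.column_stack([np.ones_like(g), g * m, -g])
    e00, e11, delta = np.linalg.solve(A, m)
    return e00, e11, delta


def correct(gamma_meas, e00, e11, delta):
    """Apply the one-port correction to a raw reading."""
    return (gamma_meas - e00) / (gamma_meas * e11 - delta)


# Hypothetical raw hardware (assumed values, for illustration only).
e00_true = 10 ** (-20 / 20) * np.exp(-1j * 1.1)   # directivity, -20 dB
e11_true = 10 ** (-15 / 20) * np.exp(1j * 0.7)    # source match, -15 dB
e10e01_true = 0.9 * np.exp(1j * 0.3)              # reflection tracking

# Calibrate with ideal short, open and load (assumption: perfect standards).
standards = [-1.0, 1.0, 0.0]
measured = [raw_reading(g, e00_true, e11_true, e10e01_true) for g in standards]
e00, e11, delta = solve_error_terms(standards, measured)

# Measure a well-matched DUT (-40 dB reflection).
dut = 10 ** (-40 / 20) * np.exp(1j * 2.0)
raw = raw_reading(dut, e00_true, e11_true, e10e01_true)
fixed = correct(raw, e00, e11, delta)

# Same DUT after the directivity has drifted by 1 % since calibration.
raw_drift = raw_reading(dut, e00_true * 1.01, e11_true, e10e01_true)
fixed_drift = correct(raw_drift, e00, e11, delta)


def db(x):
    return 20 * np.log10(abs(x))


if __name__ == "__main__":
    print(f"true DUT         : {db(dut):7.2f} dB")
    print(f"raw reading      : {db(raw):7.2f} dB")
    print(f"corrected        : {db(fixed):7.2f} dB")
    print(f"corrected, drift : {db(fixed_drift):7.2f} dB")
```

In this idealised setup the correction recovers the DUT exactly, even though the raw reading is dominated by the VNA's own reflections. The catch is the last line: a 1 % change in the raw directivity after calibration leaves an error of roughly a tenth of the DUT's reflection, so the poorer the raw hardware, the more any drift or noise eats into what you can actually measure.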