From experience: I've used the NE3210S01 as the amplifier in a parallel feedback oscillator. What I observed is that the S21 phase given in the datasheet for a certain bias, frequency and termination is more or less consistent. For the same conditions, simulation gave S-parameters that sometimes differed by a relatively large amount in both phase and amplitude (for example, an S21 phase of 30 degrees versus 90 degrees), and sometimes were almost the same. There are different simulators and different simulation approaches. Tools such as ADS have an EM simulator called "Momentum", but there is also SPICE-style simulation using building blocks, and the results differ too. It would be really interesting to hear from true experts in this field, but before they arrive, I'll share some more thoughts. Maybe you can try some transient simulations and check whether any unexpected wiggling is added to the amplified frequency, and whether the S21 phase in the transient simulation has the expected value.
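To put a number on how far a simulated S21 is from the datasheet one, something like the sketch below may help. The values are made up for illustration (they are NOT taken from the NE3210S01 datasheet), and the helper names `polar_to_complex` and `compare_s21` are my own. Taking the ratio of the two complex values gives the gain difference and the phase difference in one step, with the phase wrapped to within ±180 degrees.

```python
import cmath
import math


def polar_to_complex(mag, phase_deg):
    """Convert a magnitude/angle pair (datasheet style) to a complex number."""
    return cmath.rect(mag, math.radians(phase_deg))


def compare_s21(s21_ref, s21_sim):
    """Return (gain difference in dB, phase difference in degrees).

    The phase difference is wrapped to within +/-180 degrees because it is
    taken from the angle of the complex ratio s21_sim / s21_ref.
    """
    ratio = s21_sim / s21_ref
    gain_diff_db = 20 * math.log10(abs(ratio))
    phase_diff_deg = math.degrees(cmath.phase(ratio))
    return gain_diff_db, phase_diff_deg


# Hypothetical values for illustration only, not real device data.
s21_datasheet = polar_to_complex(3.0, 30.0)
s21_simulated = polar_to_complex(2.5, 90.0)

gain_diff, phase_diff = compare_s21(s21_datasheet, s21_simulated)
print(f"gain difference: {gain_diff:.2f} dB, phase difference: {phase_diff:.1f} deg")
```

Running the same comparison at every frequency point of interest shows quickly whether the simulator is off by a constant phase offset (often a reference plane issue) or by something that changes with frequency.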
What always worried me was amplifier stability. I had no measurement equipment at all, and after going through many papers I got the feeling that the best way is to have a VNA and/or a signal generator, feed the amplifier's ports, and measure what comes back and what comes through. Many old designs, especially those from 20-30 years ago, have very strange copper matching structures, which look as if the initial design was tuned by adding large copper pieces and cutting them, and the final design then just copied that layout. All those frequency triplers and quadruplers are stable and have a lot of copper tuning areas that do not look like analytically calculated open/short stubs at specific offsets. I think many such designs were made by prototype tuning (cutting/adding copper) and measurement.
There is a great article called "The oscillator as a reflection amplifier: an intuitive approach to oscillator design" by John W. Boyles. I recommend it to everyone interested in matching, because it shows how the same matching at a single frequency can behave very differently as the active device "travels" along its S-parameter curve. It is about oscillators, but it may help you understand how to make the opposite: a stable amplifier. It is from 1986 and the frequency was 3.8 GHz, so obviously some measurement equipment was used. The article was written 37 years ago and it is very detailed. In my opinion, you just need to build the match and measure it, and then maybe go through a few iterations. Simulation may help, but at some point it can turn into endless, useless parameter tuning.
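To illustrate the "device travels along its S-parameter curve" idea, here is a minimal sketch using the standard two-port formula for the input reflection coefficient, Γ_in = S11 + S12·S21·Γ_L / (1 − S22·Γ_L). This is not the procedure from the article, just the textbook relation. The S-parameters and the load `gamma_load` are hypothetical, and I keep the load fixed over frequency for simplicity, which a real matching network would not do. Wherever |Γ_in| exceeds 1, the input looks like a negative resistance, i.e. a reflection amplifier.

```python
import cmath
import math


def polar(mag, phase_deg):
    """Magnitude/angle (degrees) to complex."""
    return cmath.rect(mag, math.radians(phase_deg))


def gamma_in(s11, s12, s21, s22, gamma_load):
    """Input reflection coefficient of a two-port terminated in gamma_load."""
    return s11 + (s12 * s21 * gamma_load) / (1 - s22 * gamma_load)


# Hypothetical S-parameters (S11, S12, S21, S22) keyed by frequency in GHz.
# These are illustration values, not data for any real transistor.
device = {
    2.0: (polar(0.95, -40), polar(0.05, 60), polar(5.0, 140), polar(0.6, -30)),
    12.0: (polar(0.6, -150), polar(0.1, 10), polar(2.5, 20), polar(0.3, -120)),
}

# Hypothetical output termination, assumed constant over frequency.
gamma_load = polar(0.8, 45)

for f_ghz, (s11, s12, s21, s22) in sorted(device.items()):
    g = gamma_in(s11, s12, s21, s22, gamma_load)
    note = "negative resistance at the input" if abs(g) > 1 else "input looks passive"
    print(f"{f_ghz:5.1f} GHz: |Gamma_in| = {abs(g):.3f} ({note})")
```

The same termination can give a harmless input at one frequency and a negative resistance at another, which is why a match that looks fine at the design frequency still has to be checked out of band.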
In my case it was fairly simple. First, I designed an amplifier that is very stable theoretically: all unstable regions are far away from the center of the Smith chart. Then I could terminate the gate line with a resistor to make it very stable in the remaining unstable lower frequency band. Because my design was fairly narrowband and at high frequency, I just coupled the signal through tightly coupled quarter-wave lines where needed, or through a microstrip ring / split-ring resonator. With this approach, nothing was "beeping" at unwanted frequencies. But of course, only measurements can confirm that everything is OK.
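As a rough sketch of what "stable theoretically" can mean in numbers, the snippet below computes the Rollett K factor, |Δ| and the μ factor from a set of S-parameters. These are the standard textbook criteria (unconditional stability when K > 1 and |Δ| < 1, or equivalently μ > 1), not necessarily the exact check I used. The S-parameter values and the function name `stability` are assumptions for illustration only.

```python
import cmath
import math


def polar(mag, phase_deg):
    """Magnitude/angle (degrees) to complex."""
    return cmath.rect(mag, math.radians(phase_deg))


def stability(s11, s12, s21, s22):
    """Return (K, |delta|, mu) for a two-port.

    K     : Rollett stability factor
    delta : determinant of the S-matrix, S11*S22 - S12*S21
    mu    : distance from the Smith chart center to the nearest
            unstable load point (mu > 1 means unconditionally stable)
    """
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    mu = (1 - abs(s11) ** 2) / (abs(s22 - delta * s11.conjugate()) + abs(s12 * s21))
    return k, abs(delta), mu


# Hypothetical S-parameters keyed by frequency in GHz (illustration only).
device = {
    2.0: (polar(0.95, -40), polar(0.05, 60), polar(5.0, 140), polar(0.6, -30)),
    12.0: (polar(0.6, -150), polar(0.1, 10), polar(2.5, 20), polar(0.3, -120)),
}

for f_ghz, params in sorted(device.items()):
    k, abs_delta, mu = stability(*params)
    verdict = "unconditionally stable" if mu > 1 else "potentially unstable"
    print(f"{f_ghz:5.1f} GHz: K = {k:.3f}, |delta| = {abs_delta:.3f}, mu = {mu:.3f} ({verdict})")
```

A band where μ drops below 1 (typically the lower frequencies, where the gain is highest) is the kind of region where a resistive termination on the gate line helps. The value of μ also says how far the nearest unstable load is from the Smith chart center, which matches the "unstable regions far away from the center" criterion.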