The part that wasn't clicking for me before was the practical application to the iron.
What I'm connecting now is that the undershoot of the non-PID iron is almost always going to happen when you first touch the joint. This is because the non-PID iron essentially lives in the "over" area more than 90% of the time while the tip is not touching anything. The tip slowly cools, the heater comes on for just an instant, and the tip goes back to slowly cooling. Slowly cooling in the "over" area is the usual state.
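To put rough numbers on that idle picture, here is a minimal sketch of an on/off (thermostat-style) iron sitting in the stand. Everything in it is an assumption made up for illustration: the single lumped thermal mass, the 5 degree switching band above the setpoint, and all of the wattage and heat-capacity values. It is not a model of any particular iron.

```python
def simulate_idle(steps=6000, dt=0.01):
    """Simulate an idle on/off iron; returns (heater-off fraction, over-setpoint fraction).

    All constants below are illustrative assumptions, not measured values.
    """
    setpoint = 350.0       # deg C, heater switches on at or below this
    band = 5.0             # deg C, heater switches off at setpoint + band
    ambient = 25.0         # deg C
    heater_w = 50.0        # W delivered while the heater is on
    heat_capacity = 2.0    # J/K, tip and heater lumped together
    loss_w_per_k = 0.01    # W/K lost to still air while idle

    temp = setpoint + band
    heater_on = False
    off_steps = 0
    over_steps = 0

    for _ in range(steps):
        # Thermostat logic: no proportional action, only two thresholds.
        if temp <= setpoint:
            heater_on = True
        elif temp >= setpoint + band:
            heater_on = False

        power_in = heater_w if heater_on else 0.0
        power_out = loss_w_per_k * (temp - ambient)
        temp += (power_in - power_out) / heat_capacity * dt

        if not heater_on:
            off_steps += 1
        if temp > setpoint:
            over_steps += 1

    return off_steps / steps, over_steps / steps


off_fraction, over_fraction = simulate_idle()
```

With these made-up numbers the heater is off, and the tip is above the setpoint, for the large majority of the idle time, which is the "slowly cooling in the over area" state described above. Change `heater_w` or `loss_w_per_k` and the split shifts, so treat the fractions as a property of the assumed constants only.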
So when you touch the joint, the heater is most likely off and won't come back on until the tip has already fallen to the switch-on point, so the first dip/undershoot is going to be larger than it could be with PID. Of course, the blind squirrel could find the nut: the heater could be just about to come on at the moment you touch the joint, so that the oscillation is minimized by chance. But this is not going to be the norm.
So minimizing undershoot is one of the two key improvements in my current understanding (which is subject to change). The other is correcting for the temperature difference between tip and sensor, which amounts to a temperature droop that increases with thermal load, if I understand correctly. These are the two things I would focus on with an algorithm for a soldering iron. Overshoot I'm OK with. I'm going to focus on the quickest response and err on the side of more heat. Once the iron is in action, I want the tip to respond faster so it retains more heat, and then carry on to hot or hotter; either one is fine.
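For the tip/sensor droop, one way it could be handled (a sketch of my own, not something taken from any iron's firmware) is to raise the temperature the sensor is asked to hold in proportion to how hard the heater has recently been working. The class name `DroopCompensator`, the `droop_c_per_w` figure, and the use of smoothed heater power as a stand-in for thermal load are all assumptions for illustration.

```python
class DroopCompensator:
    """Raise the sensor target as load increases (hypothetical helper).

    Assumes the tip runs cooler than the sensor by roughly
    droop_c_per_w degrees for every watt flowing toward the tip, and that
    smoothed heater power is a usable proxy for that heat flow.
    """

    def __init__(self, droop_c_per_w=0.4, smoothing=0.1):
        self.droop_c_per_w = droop_c_per_w  # deg C per W, made-up value
        self.smoothing = smoothing          # 0..1, weight given to each new sample
        self.avg_power_w = 0.0

    def update(self, heater_power_w):
        """Feed in the latest heater power and return the smoothed value."""
        self.avg_power_w += self.smoothing * (heater_power_w - self.avg_power_w)
        return self.avg_power_w

    def sensor_target(self, tip_setpoint_c):
        """Temperature to hold at the sensor so the tip lands near the setpoint."""
        return tip_setpoint_c + self.droop_c_per_w * self.avg_power_w


comp = DroopCompensator()
idle_target = comp.sensor_target(350.0)    # no load yet, so no correction

for _ in range(200):                       # sustained 40 W load on a joint
    comp.update(40.0)
loaded_target = comp.sensor_target(350.0)  # target is raised under load
```

This only ever pushes the target upward under load, which fits the "err on the side of more heat" preference. The droop figure would have to be measured for a real tip, for example by comparing the sensor reading against a thermocouple on the tip while soldering.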