Well, it's been a long time (decades) since I had to do anything involving communication or signal theory, so it's a bit of a stretch to remember the math. However, where you see a fundamental difference, I see a fundamental similarity. The circuits that do the work are just implementing the math, and if there are two or more solutions, they are mathematically equivalent. The one you pick is whichever is most practical to implement with current technology (or at least with the technology you know how to use). But then, I tend to see similarities across disciplines in any case.

The ideal mixer building block is a multiplier. But the "LO" input to the mixer does not have to be a sine wave. It could be a square wave, which is commonly used. Once you allow a non-sinusoidal periodic waveform with infinite harmonic content, it becomes pretty arbitrary to allow square waves but not an impulse train. Once you go to an impulse train, you have created an ideal sampler. The discrete-time signal has to come from sampling a signal somewhere, and if that signal has frequency content above Fs/2, the images that sampling creates around multiples of Fs overlap the desired content. That overlap is the origin of aliasing error.
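
To put numbers on that last point, here is a rough sketch in plain Python. The 1000 Hz sample rate, the 700 Hz tone, and the helper names alias_frequency and sample_cosine are all made up for illustration. It treats sampling as taking the value of the tone at t = k/Fs, which is what multiplying by an impulse train amounts to, and checks that a tone above Fs/2 gives the same samples as its folded image below Fs/2.

```python
import math

def alias_frequency(f, fs):
    """Frequency in [0, fs/2] that a real tone at f folds to when sampled at fs."""
    r = f % fs
    return min(r, fs - r)

def sample_cosine(f, fs, n):
    """n samples of cos(2*pi*f*t) taken at t = k/fs (ideal sampling instants)."""
    return [math.cos(2 * math.pi * f * k / fs) for k in range(n)]

fs = 1000.0    # assumed sample rate in Hz (illustrative value)
f_in = 700.0   # assumed input tone in Hz, deliberately above fs/2

f_alias = alias_frequency(f_in, fs)

# Sample the real tone and its alias; the two sequences should match
# to within floating-point rounding.
above = sample_cosine(f_in, fs, 16)
below = sample_cosine(f_alias, fs, 16)
max_diff = max(abs(a - b) for a, b in zip(above, below))

print("alias frequency:", f_alias)
print("largest sample difference:", max_diff)
```

Once the samples are taken, nothing in them distinguishes the 700 Hz tone from a 300 Hz one, which is why the content above Fs/2 has to be filtered out before the sampler rather than after it.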

John