The paper does not mention any gain stage prior to the mixer. Is that common and/or reasonable?
Sort of. A passive mixer typically loses about 6dB in conversion, which means your SNR is about 6dB worse, straightaway. Your dynamic range is reduced by as much at the bottom end, and it is capped at the top by the maximum signal level the mixer itself can handle.
With a preamp, SNR rises: by magnifying the input (and adding only a little noise in the process), the mixer's 6dB loss looks like only, maybe, 1dB when referred back to the input. Add the amplifier's own noise figure of, say, 2dB, and the SNR is now about 3dB off ideal, a 3dB improvement. Dynamic range is reduced even further, however, because not only does the amplifier itself suffer IMD and compression, it also drives the mixer that much harder. If the RF amp is part of the AGC loop, this can be somewhat mitigated, but only with respect to the station being received, not over the whole band (e.g., a station at 110MHz and 0dBm might overdrive the front end while you are trying to receive 112.5MHz at -60dBm).
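To put rough numbers on that, here is a minimal sketch using the Friis cascade formula. The 2dB amplifier noise figure and 6dB mixer loss are the figures above; the 10dB preamp gain is my assumption, and I am treating the passive mixer's noise figure as equal to its loss. The function names are just for illustration.

```python
import math

def db_to_lin(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (db / 10)

def lin_to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(ratio)

def cascade_nf_db(stages):
    """Total noise figure in dB for stages given as (gain_db, nf_db), in signal order.

    Friis: F = F1 + (F2 - 1)/G1 + (F3 - 1)/(G1*G2) + ...
    """
    total_f = 1.0
    gain_so_far = 1.0  # linear gain of everything ahead of the current stage
    for gain_db, nf_db in stages:
        total_f += (db_to_lin(nf_db) - 1.0) / gain_so_far
        gain_so_far *= db_to_lin(gain_db)
    return lin_to_db(total_f)

# Passive mixer alone: loss of 6dB, noise figure assumed equal to the loss.
mixer_only_db = cascade_nf_db([(-6.0, 6.0)])

# Assumed 10dB preamp with 2dB noise figure, followed by the same mixer.
with_preamp_db = cascade_nf_db([(10.0, 2.0), (-6.0, 6.0)])

print(f"mixer only:  {mixer_only_db:.2f} dB")
print(f"with preamp: {with_preamp_db:.2f} dB")
```

With those assumed values the cascade comes out at roughly 2.7dB, which is in line with the "about 3dB off ideal" estimate. More preamp gain pushes the total closer to the amplifier's own 2dB, at the cost of driving the mixer harder.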
The same SNR versus dynamic range concerns apply to ADCs (the ENOB being a very obvious limit to both!). If the sample rate is high relative to the signal bandwidth, a good deal of filtering and averaging can be applied to recover more ENOB (effectively, filtering out the out-of-band quantization noise lowers the noise floor), so the usable SNR need not be exactly 72dB. It's all about the processing.
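As a rough illustration of how much filtering buys, here is a sketch of the usual oversampling arithmetic. I am assuming the 72dB figure comes from the 6dB-per-bit rule for a 12-bit converter, and the 2.4MS/s sample rate and 150kHz channel bandwidth are made-up example values, not anything from the paper. It also assumes ideal, white quantization noise, which a real ADC only approximates.

```python
import math

def ideal_snr_db(bits):
    """Ideal full-scale sine SNR of an N-bit quantizer: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

def processing_gain_db(sample_rate_hz, signal_bw_hz):
    """SNR gained by filtering down to signal_bw_hz, for real (non-IQ) sampling.

    Quantization noise is assumed white over 0..fs/2, so keeping only
    signal_bw_hz of it improves SNR by 10*log10((fs/2) / BW).
    """
    return 10 * math.log10(sample_rate_hz / (2 * signal_bw_hz))

def effective_bits(bits, sample_rate_hz, signal_bw_hz):
    """Equivalent resolution after filtering, under the same ideal assumptions."""
    snr_db = ideal_snr_db(bits) + processing_gain_db(sample_rate_hz, signal_bw_hz)
    return (snr_db - 1.76) / 6.02

# Assumed example: 12-bit ADC, 2.4 MS/s, filtered to a 150 kHz channel.
gain_db = processing_gain_db(2.4e6, 150e3)
bits_after = effective_bits(12, 2.4e6, 150e3)

print(f"processing gain: {gain_db:.1f} dB")
print(f"effective bits:  {bits_after:.2f}")
```

Every 4x of oversampling is worth about 6dB, or one extra bit, which is why keeping the sample rate high and filtering afterwards pays off.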
If you do wish to explore a heterodyne converter (it's not really a superhet unless the RF band is also being tuned with a tracking filter), I would suggest something like an SA602 with a varactor-tuned LO (driven from a DAC, assuming you have one handy) and a frequency counter (so you can keep adjusting the DAC output to hold the frequency offset correct). This should target a 10.7MHz IF, selected by a ceramic filter -- a standard component in any FM radio. The SDR can run at whatever sample rate is desired, but if you have the capability and processing, you might as well keep it at maximum: total quantization noise power is independent of sample rate, so a higher rate spreads it over a wider bandwidth, which means you can do more filtering to recover more SNR.
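The DAC-plus-counter idea amounts to a slow software frequency-locked loop. Here is a minimal sketch of it, run against a simulated oscillator rather than real hardware. Everything hardware-related is hypothetical: the linear tuning curve (100MHz at code 0, 10kHz per LSB), the 12-bit DAC range, the drift value, and the choice of high-side injection are all my assumptions for illustration. A real varactor curve is nonlinear, so the loop only needs a rough guess at the tuning slope.

```python
IF_HZ = 10.7e6  # ceramic filter centre frequency

def lo_target_hz(rf_hz, high_side=True):
    """LO frequency that places rf_hz at the IF (high-side injection assumed by default)."""
    return rf_hz + IF_HZ if high_side else rf_hz - IF_HZ

def simulated_vco_hz(dac_code, drift_hz=0.0):
    """Hypothetical stand-in for the varactor-tuned LO plus frequency counter."""
    return 100e6 + 10e3 * dac_code + drift_hz

def lock_lo(target_hz, dac_code=0, drift_hz=0.0, steps=50,
            hz_per_lsb_guess=12e3, max_code=4095):
    """Step the DAC code until the counted frequency is within about half an LSB.

    hz_per_lsb_guess is deliberately pessimistic (larger than the simulated
    slope) so each step under-corrects and the loop settles without overshoot.
    """
    for _ in range(steps):
        measured_hz = simulated_vco_hz(dac_code, drift_hz)  # read the counter
        error_hz = target_hz - measured_hz
        correction = round(error_hz / hz_per_lsb_guess)
        if correction == 0:
            break  # close enough; nothing left to correct at this resolution
        dac_code = min(max(dac_code + correction, 0), max_code)
    return dac_code, simulated_vco_hz(dac_code, drift_hz)

# Tune to the 112.5 MHz station from the example above, with some assumed LO drift.
target = lo_target_hz(112.5e6)
code, locked_hz = lock_lo(target, drift_hz=37e3)

print(f"target LO: {target / 1e6:.4f} MHz")
print(f"DAC code:  {code}, LO at {locked_hz / 1e6:.4f} MHz")
```

In practice you would call something like lock_lo periodically so the loop keeps tracking thermal drift, and the residual offset of a few kHz can be taken out digitally in the SDR.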
Tim