Heck, might as well do it with transistors...
Averse to discrete? Just don't know how?
It's quite simple! Here's an example from my Theremin (similar bandwidth I suppose, though at a somewhat higher center frequency):
The two collectors at the top carry the output signal currents. You can attach a transformer here, or a resonant circuit (handy for radio work, RF and IF), or ground* one side and use the other. (That last option is what's shown: this is the volume mixer, so a resonant tank detects the off-center difference, and when that difference lands on the tank's resonant frequency, a bias voltage is generated, which goes to the other mixer to control its volume level.) In your case, a single resistor would be fine, giving audio output. (Or use two resistors, one for each collector, followed by a differential amplifier, which removes the DC offset and amplifies the desired difference signal. That amplifier can double as the needed lowpass filter, so that's handy.)
*For AC signals, supplies count as ground. The collector has to draw current from a supply at a higher voltage than the emitters are at, so it's understood that DC is "above", while AC is "ground". Hence, a supply rail.
Of course, it's easier with more supply voltage, but signal levels can be ratcheted down a bit, and bias tightened up, without much cost to performance. This is certainly feasible at 3.3V.
If you're just detecting bats, that should do fine. (Heck, maybe you even want that resonant amplitude detector part, too!)
If you want to "listen", you might not want to do direct conversion. Reason being, if you're expecting any kind of pitch or harmonics*, half the output band will be reversed (the frequencies below the LO), and superimposed on the "correct" half (frequencies above LO). It might be valuable to up-convert fairly substantially, use an "SSB" style IF filter to eliminate the lower sideband (which will be at higher frequencies, this time), then convert this to baseband for listening (perhaps using a BFO and detector, which is traditional).
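To put numbers on the folding problem, here's a minimal sketch. It assumes an ideal multiplying mixer followed by a lowpass filter, so only the difference frequency survives; the 40 kHz LO and the two input tones are made-up values, and `direct_conversion` is just a name for illustration.

```python
def direct_conversion(f_in_hz, f_lo_hz):
    """Baseband frequency out of an ideal mixer + lowpass (difference term only)."""
    return abs(f_in_hz - f_lo_hz)

f_lo = 40_000.0  # hypothetical LO frequency, Hz

# One tone below the LO and one above it, both 2 kHz away (assumed values).
below = direct_conversion(38_000.0, f_lo)
above = direct_conversion(42_000.0, f_lo)

print("38 kHz in ->", below, "Hz out")
print("42 kHz in ->", above, "Hz out")
# Both land on the same audio frequency, so the two halves of the band
# are superimposed and can't be told apart after the mixer.
```

An upward sweep that crosses the LO would therefore come out as a falling tone followed by a rising one, which is the "reversed half" mentioned above.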
*Note that you'll only see harmonics one at a time (more or less), which isn't very harmonic at all. This is fundamentally why slightly-out-of-tune SSB communications sound so goofy or garbled or alien: your ear expects evenly spaced harmonics, and when they don't line up, it becomes unintelligible. You won't even be hearing sounds as such, if you're listening to the gaps between harmonics!
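A quick bit of arithmetic shows why. The numbers are assumptions for the sake of example (a 25 kHz fundamental, a 22 kHz LO, and roughly 10 kHz of audio bandwidth after the lowpass), and the mixer is idealized as a plain frequency subtraction.

```python
def heterodyne(freqs_hz, f_lo_hz):
    """Shift every input frequency by the LO (ideal mixer, difference term only)."""
    return [abs(f - f_lo_hz) for f in freqs_hz]

fundamental = 25_000.0                       # hypothetical call fundamental, Hz
harmonics = [n * fundamental for n in (1, 2, 3)]

shifted = heterodyne(harmonics, 22_000.0)    # hypothetical LO at 22 kHz
ratios = [f / shifted[0] for f in shifted]   # spacing relative to the lowest tone
audible = [f for f in shifted if f <= 10_000.0]  # assumed audio bandwidth

print("in :", harmonics)   # evenly spaced: 1x, 2x, 3x
print("out:", shifted)     # shifted by a constant, so no longer integer multiples
print("ratios:", ratios)
print("inside the audio band:", audible)
```

Mixing subtracts a constant from every frequency, where a harmonic series needs them all divided by the same factor. So the ratios come out wrong, and only one component fits in the audio band at a time.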
How can this be solved? (FYI, it can't be, through mixing methods. You need an altogether different kind of nonlinearity for that!) One way is to acquire a big wad of samples, and run some DSP on it to rescale the frequencies (a pitch bend effect). This takes a lot of processing, but is relatively simple when you have the libraries handy. It can possibly be done in real time on an average ARM core; otherwise a cheap DSP should do it no problem. The recipe: a Fourier transform, followed by resampling the frequency spectrum (a decimation or downsampling function), and an inverse FT.
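Here's a rough single-block sketch of that FT, spectrum decimation, inverse FT chain. To stay self-contained it uses a slow textbook DFT from the standard library rather than a real FFT library. The block length, the divide-by-8 factor and the test tone are all assumed values, and the helper names (`dft`, `idft_real`, `rescale_spectrum`, `peak_bin`) are made up for this example.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) discrete Fourier transform (fine for a small demo block)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, keeping the real part (spectrum is assumed Hermitian)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def rescale_spectrum(spec, factor):
    """Move the content of bin k*factor down to bin k (divides all frequencies)."""
    n = len(spec)
    out = [0j] * n
    for k in range(1, n // 2):
        src = k * factor
        if src < n // 2:
            out[k] = spec[src]
            out[n - k] = spec[src].conjugate()  # keep the output real-valued
    return out

def peak_bin(spec):
    """Index of the strongest bin in the positive-frequency half."""
    return max(range(len(spec) // 2), key=lambda k: abs(spec[k]))

n = 256        # block length (assumed)
factor = 8     # frequency division ratio (assumed)
x = [math.cos(2 * math.pi * 64 * t / n) for t in range(n)]  # test tone in bin 64

y = idft_real(rescale_spectrum(dft(x), factor))

in_bin = peak_bin(dft(x))
out_bin = peak_bin(dft(y))
print("input peak bin:", in_bin, "-> output peak bin:", out_bin)
```

Every component is divided by the same factor, so harmonic ratios survive, which is exactly what the mixer can't do. A continuous stream would need overlapping, windowed blocks on top of this; the sketch handles one block only.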
It can also be done in analog, under more limited conditions: if the signal is periodic, the fundamental can be detected, and samples taken at a fractional rate, reconstructing the waveform through equivalent time sampling. It's played back in real time, a step at a time.
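The timing behind equivalent time sampling is easy to show numerically. This is only a sketch of the idea in software, not the analog circuit. It assumes a perfectly periodic 40 kHz test signal with a known period and 19 sample steps per reconstructed cycle; `waveform` is a made-up test signal.

```python
import math

def waveform(t, f0):
    """Hypothetical periodic test signal: fundamental plus a second harmonic."""
    phase = 2 * math.pi * f0 * t
    return math.sin(phase) + 0.5 * math.sin(2 * phase)

f0 = 40_000.0          # detected fundamental, Hz (assumed)
period = 1.0 / f0
steps = 19             # samples per reconstructed cycle (assumed)

# Sample slightly slower than once per cycle, so each sample lands
# 1/steps of a cycle later in the waveform than the previous one.
sample_interval = period * (1 + 1 / steps)
samples = [waveform(n * sample_interval, f0) for n in range(2 * steps)]

# Holding each sample until the next one arrives plays back a copy of the
# waveform that takes steps * sample_interval per cycle.
f_out = 1.0 / (steps * sample_interval)
print("output fundamental:", round(f_out, 3), "Hz")
```

The whole wave shape is slowed by the same ratio, so the second harmonic still sits at twice the output fundamental. The catch is the one stated above: it only works while the signal stays periodic.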
Tim