The attached circuit is one I recently helped build at my local OSH group. It uses a walkie-talkie (WT) set to remotely trigger a motion sensor that unlocks a door. Normally there is no audio, but when the WT receives a ring tone, that signal is amplified and rectified and charges C1 until the voltage on C1 exceeds the reference set by the R6/R7 divider, at which point the output of U1B goes high. There is no filtering or tuning; it's just looking for any audio energy. But C1 takes a while to charge, so the circuit generally ignores brief noise spikes.
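To make the charge-and-threshold behavior concrete, here is a rough behavioral sketch in Python. It is not a circuit simulation. It treats the rectified audio as a steady DC source charging C1 through a series resistance, keeps a bleed resistor across C1 at all times, and marks the comparator output as high whenever C1 is above a fixed threshold. Every constant (R_CHARGE, R_DISCHARGE, C1, V_PEAK, V_THRESHOLD) is a hypothetical placeholder, not a value read off the schematic.

```python
# Crude behavioral model of the C1 envelope detector and the U1B comparator.
# All component values are hypothetical placeholders, not the schematic's.
R_CHARGE = 47e3      # ohms, assumed resistance in the C1 charging path
R_DISCHARGE = 1e6    # ohms, assumed bleed resistor across C1 (the R5 role)
C1 = 4.7e-6          # farads, assumed
V_PEAK = 3.0         # volts, assumed rectified audio level after the diode drop
V_THRESHOLD = 1.5    # volts, assumed R6/R7 divider voltage
DT = 1e-4            # seconds per simulation step


def comparator_trace(audio_on, duration):
    """Return one bool per time step: True where U1B's output would be high.

    audio_on(t) -> bool says whether audio is present at time t (seconds).
    Rectified audio is approximated as a steady V_PEAK source charging C1
    through R_CHARGE; C1 always bleeds through R_DISCHARGE.
    """
    v = 0.0
    trace = []
    for step in range(int(round(duration / DT))):
        t = step * DT
        current = -v / R_DISCHARGE                # bleed current out of C1
        if audio_on(t):
            current += (V_PEAK - v) / R_CHARGE    # charge current into C1
        v += current / C1 * DT                    # forward-Euler update
        trace.append(v > V_THRESHOLD)             # comparator decision
    return trace


def step_index(t):
    """Index into a trace for time t (seconds)."""
    return int(round(t / DT))


# A 5 ms noise spike vs. a 1 s ring tone, both starting at t = 0.
spike = comparator_trace(lambda t: t < 0.005, duration=1.0)
tone = comparator_trace(lambda t: t < 1.0, duration=6.0)
print("spike triggers output:", any(spike))
print("tone output high at t=2 s:", tone[step_index(2.0)])
print("tone output high at t=6 s:", tone[step_index(6.0) - 1])
```

With these placeholder values the 5 ms spike never gets C1 anywhere near the threshold, while the 1 s tone trips the output and keeps it high for a few seconds after the tone ends.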
It takes a noticeable fraction of a second for C1 to charge to the R6/R7 level, but it keeps charging toward the LM358 output voltage minus the diode drop for as long as audio is received. When the audio stops, C1 discharges through R5, which takes several seconds, so the output actually stays high longer than the audio input lasts.
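The attack and release times follow the usual RC exponentials. Assuming a steady source charging C1 through some series resistance while R5 bleeds it, C1 approaches V_final = V_peak * R5 / (R_charge + R5) with time constant tau_c = (R_charge || R5) * C1, so it reaches the threshold after tau_c * ln(V_final / (V_final - V_th)). Once the audio stops, it falls from V_start to V_th in R5 * C1 * ln(V_start / V_th). The function names and all component values below are hypothetical.

```python
import math


def attack_time(r_charge, r_discharge, c, v_peak, v_threshold):
    """Seconds for C1 to charge from 0 V to v_threshold while audio is present.

    Assumed model: C1 charges from a steady v_peak source through r_charge
    while bleeding through r_discharge, so it approaches
    v_final = v_peak * r_discharge / (r_charge + r_discharge)
    with time constant (r_charge || r_discharge) * c.
    """
    v_final = v_peak * r_discharge / (r_charge + r_discharge)
    if v_final <= v_threshold:
        return math.inf  # threshold never reached, output never goes high
    tau = (r_charge * r_discharge / (r_charge + r_discharge)) * c
    return tau * math.log(v_final / (v_final - v_threshold))


def release_time(r_discharge, c, v_start, v_threshold):
    """Seconds for C1 to discharge from v_start down to v_threshold."""
    if v_start <= v_threshold:
        return 0.0  # already at or below the threshold
    return r_discharge * c * math.log(v_start / v_threshold)


# Hypothetical values (not from the schematic).
R_CHARGE, R_DISCHARGE, C1 = 47e3, 1e6, 4.7e-6
V_PEAK, V_THRESHOLD = 3.0, 1.5

t_on = attack_time(R_CHARGE, R_DISCHARGE, C1, V_PEAK, V_THRESHOLD)
v_full = V_PEAK * R_DISCHARGE / (R_CHARGE + R_DISCHARGE)  # fully charged C1
t_off = release_time(R_DISCHARGE, C1, v_full, V_THRESHOLD)
print(f"attack ~{t_on:.2f} s, release ~{t_off:.2f} s")
```

With these placeholder values it reports an attack of about 0.16 s and a release of about 3.04 s, which matches the "fraction of a second on, several seconds off" behavior described above.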
The gain of U1A and the relationship between R4 and R5 determine how the circuit responds to audio: how long it takes to register that there is legitimate audio, and how long it takes to "release" when the audio stops. This might be tweakable to work for you if there is enough difference between your background static and the actual audio signal. If the noise is as loud as the audio, this circuit won't work as-is, although a little filtering to remove the high frequencies might be enough to make it work.
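One way to try the "little filtering" idea is a first-order RC low-pass ahead of the detector. The sketch below computes the corner frequency, 1/(2*pi*R*C), and how much of a given frequency gets through. The 10 kOhm / 4.7 nF values and the 1 kHz "tone" vs. 10 kHz "hiss" frequencies are assumptions for illustration; where such a filter would sit, and how its resistor interacts with U1A's input network, depends on the actual schematic.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rc_lowpass_gain(f_hz, r_ohms, c_farads):
    """Amplitude ratio passed by a first-order RC low-pass at f_hz (1.0 = no loss)."""
    ratio = f_hz / rc_cutoff_hz(r_ohms, c_farads)
    return 1.0 / math.sqrt(1.0 + ratio * ratio)


# Hypothetical filter: 10 kOhm in series, 4.7 nF to ground.
R, C = 10e3, 4.7e-9
print(f"corner ~{rc_cutoff_hz(R, C):.0f} Hz")
for f in (1_000, 10_000):  # assumed tone frequency vs. assumed hiss frequency
    print(f"{f} Hz passes at {rc_lowpass_gain(f, R, C):.2f} of its amplitude")
```

With these values the corner is about 3.4 kHz: a 1 kHz tone keeps about 96% of its amplitude while 10 kHz hiss is cut to about a third. A single pole only rolls off at 6 dB per octave, so it helps only when the noise sits well above the tone in frequency.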
Then you would have to rig up something that lets the output of U1B switch the speaker on and off.
I should add that the input assumes there is no DC offset, which is the case with the WT audio output. If there is an offset, you would need to AC-couple the signal through a series capacitor. Also, be careful that the input signal doesn't exceed the absolute maximum rating for the voltage applied to any input; the one that matters here is the limit of 0.3 V below the negative rail.
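For the AC-coupling case, the series capacitor and the resistance it drives form a high-pass filter with corner 1/(2*pi*R*C), so the capacitor needs to be large enough that the corner sits well below the ring tone. The sketch also checks the 0.3 V limit mentioned above: assuming a resistor to ground after the capacitor sets the DC level at 0 V on a single supply, the signal swings as far below ground as above it, so its peak must stay within 0.3 V. The 10 kOhm input resistance, 50 Hz corner, and function names are hypothetical.

```python
import math

INPUT_LIMIT_BELOW_NEG_RAIL = 0.3  # volts, the limit cited above for the LM358


def coupling_cap_farads(r_in_ohms, corner_hz):
    """Series capacitor giving a high-pass corner of corner_hz into r_in_ohms."""
    return 1.0 / (2.0 * math.pi * r_in_ohms * corner_hz)


def input_stays_in_limit(v_peak, v_center=0.0, v_neg_rail=0.0):
    """True if a signal swinging +/- v_peak around v_center never drops more
    than 0.3 V below the negative rail (ground on a single supply)."""
    lowest = v_center - v_peak
    return lowest >= v_neg_rail - INPUT_LIMIT_BELOW_NEG_RAIL


# Hypothetical: 10 kOhm seen by the capacitor, corner well below the tone.
print(f"coupling cap ~{coupling_cap_farads(10e3, 50) * 1e6:.2f} uF")
print("0.2 V peak OK:", input_stays_in_limit(0.2))
print("0.5 V peak OK:", input_stays_in_limit(0.5))
```

With 10 kOhm and a 50 Hz target this gives roughly 0.32 uF; rounding up to the next standard value (0.47 uF, say) only lowers the corner further, which is harmless here. A signal with peaks above 0.3 V would need to be attenuated first, for example with a resistive divider.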