One thing I can recommend: tune your audio amp filter and ADC sample rate to match something close to a phone line. Aim for an audio bandwidth that is close to flat from 400 Hz to 3 kHz, meaning the response should begin around 200 Hz and roll off by around 6 kHz, and sample at around 12 kHz. That should be readily understandable for MCU speech decoding; anything outside that band is just excess noise you would need to process around.
There are cheap tricks for getting a fast, decent-quality spectrum analysis of the source audio using integer math only, basically many multiply-adds in a pipeline. But I suspect that to get the quality you want, you will just need to go with a PIC that has the maximum RAM and program flash, so you aren't left squeezing code in at the last minute just to save $2-3 on a single PIC. That 1-bit-input Motorola MCU probably sampled at 1 kHz for 1.5 seconds (my guess from looking at Dave's video) and then sifted through that 1-bit pattern: 187 bytes of RAM, probably all they could muster on an MCU at the time. You'll be operating at, say, 8 bits; 2 seconds sampled at 8 kHz is 16 KB. If you do a cheap real-time FFT and retain 50 samples/second of 64 bands at 8 bits each, you are talking about roughly 6.4 KB for 2 seconds. Take a look at this guy; I chose it because it comes in a DIP package, which is easy for testing:
http://www.microchip.com/wwwproducts/en/PIC24EP512GP202
512 KB flash, 48 KB RAM.
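To sanity-check the buffer sizes against that 48 KB of RAM, here is a minimal sketch of the arithmetic in C. The helper name `buffer_bytes` and the constant `RAM_BYTES` are made up for illustration, and treating "48k" as 48 * 1024 bytes is an assumption.

```c
#include <stdint.h>

/* Assumption: "48k RAM" taken as 48 * 1024 bytes. */
#define RAM_BYTES (48UL * 1024UL)

/* Bytes needed to store `duration_ms` milliseconds of data arriving at
   `items_per_sec` items per second, `bits_per_item` bits each.
   Rounds down, and all the cases below fit in 32-bit math. */
static uint32_t buffer_bytes(uint32_t items_per_sec, uint32_t duration_ms,
                             uint32_t bits_per_item)
{
    return (items_per_sec * duration_ms * bits_per_item) / 8000UL;
}

/* The three cases from the post:
     buffer_bytes(1000, 1500, 1)     1-bit input, 1 kHz, 1.5 s
     buffer_bytes(8000, 2000, 8)     raw 8-bit audio, 8 kHz, 2 s
     buffer_bytes(50 * 64, 2000, 8)  64 bands x 50 frames/s, 8-bit, 2 s */
```

The three calls give 187, 16000 and 6400 bytes. Even keeping the raw audio and the spectrum frames side by side (22400 bytes) leaves more than half of the RAM free for everything else.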
The PIC24EP512GP202 also has two built-in op-amps, which may be good enough for your mic, plus they are wired to its internal comparators, so they can wake the MCU from sleep when there is sufficient noise. An all-in-one solution. Though if you enable the internal op-amps and comparators, I can't say how little power the device will consume during sleep.
Also, remember, you most likely need to keep the MIC continuously powered as well.
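On the integer-only spectrum analysis mentioned earlier: one well-known way to do it is a fixed-point Goertzel filter, which measures the energy in a single band with one multiply and two adds per sample. This is only a sketch of that idea, not necessarily the trick I had in mind above. The function name, the Q14 scaling and the 8-bit signed input are all assumptions, and the coefficient 2*cos(2*pi*f/fs) has to be precomputed offline and stored as an integer constant.

```c
#include <stddef.h>
#include <stdint.h>

#define GOERTZEL_Q 14   /* assumed fixed-point format: coefficient scaled by 2^14 */

/* Energy of one frequency band over a block of n signed 8-bit samples.
   coeff_q14 = round(2 * cos(2 * pi * f / fs) * 16384), computed offline.
   Example: f = 1 kHz at fs = 8 kHz gives 23170; f = 2 kHz gives 0.
   Note: right-shifting a negative value is implementation-defined in C;
   this assumes the usual arithmetic shift. */
static int64_t goertzel_power_q14(const int8_t *x, size_t n, int32_t coeff_q14)
{
    int32_t s1 = 0;   /* state one sample back  */
    int32_t s2 = 0;   /* state two samples back */

    for (size_t i = 0; i < n; i++) {
        int32_t s0 = (int32_t)x[i]
                   + (int32_t)(((int64_t)coeff_q14 * s1) >> GOERTZEL_Q)
                   - s2;
        s2 = s1;
        s1 = s0;
    }

    /* |X|^2 = s1^2 + s2^2 - coeff * s1 * s2 */
    return (int64_t)s1 * s1 + (int64_t)s2 * s2
         - (((int64_t)coeff_q14 * s1 * s2) >> GOERTZEL_Q);
}
```

Run one of these per band over each 20 ms block (160 samples at 8 kHz) and you get the 50 frames per second discussed above. With 64 bands a real FFT will likely be cheaper, but Goertzel is handy if you only end up caring about a handful of bands.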