A few years ago when I found out I was going to be an uncle, I 'toyed' (see what I did there?) with little sound generators.
Specifically, a small flash chip + microcontroller + opamp + speaker. I believe it was the ATtiny461, which has 'high speed' PWM (an on-board RC clock @ 64MHz that can drive the timer/counter). Because of the high PWM frequency, I could get away with a cheaper LPF after it. It played samples at 10-bit 16ksps with the cut-off at roughly 7kHz, and it sounded surprisingly good for what it was.
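To put rough numbers on why the fast PWM clock lets you use a cheaper filter, here is a quick back-of-envelope sketch. It assumes the timer runs at the full 64MHz with no prescaler, that the PWM is 10-bit, and that the filter behaves like an ideal Butterworth low-pass; the filter order wasn't stated above, so first and second order are both shown. The function names are made up for illustration.

```python
import math

def pwm_carrier_hz(timer_clock_hz, resolution_bits):
    """Fast-PWM carrier frequency when the counter wraps every 2**bits ticks."""
    return timer_clock_hz / (1 << resolution_bits)

def lowpass_attenuation_db(freq_hz, cutoff_hz, order=1):
    """Attenuation (in dB) of an ideal Butterworth low-pass of the given order."""
    return 10 * math.log10(1 + (freq_hz / cutoff_hz) ** (2 * order))

# Assumed setup: 64MHz timer clock, 10-bit PWM, 16ksps playback, 7kHz cut-off.
carrier = pwm_carrier_hz(64_000_000, 10)
print("PWM carrier: %.1f kHz" % (carrier / 1000))
print("PWM periods per audio sample: %.2f" % (carrier / 16_000))
for order in (1, 2):
    print("order %d filter, carrier attenuation: %.1f dB"
          % (order, lowpass_attenuation_db(carrier, 7_000, order)))
```

With those assumptions the carrier sits well above the audio band, so even a simple filter knocks it down a fair amount. A slower timer clock would push the carrier closer to the cut-off and need a steeper filter.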
For sound effects, or just samples where you only need to understand human speech, you can of course go lower. Telephone quality is 300Hz-3kHz anyway. The microcontroller part is relatively easy, as all it does is periodically grab samples from the flash (SPI) and update its PWM duty cycle. A few buttons just picked the start address in the external flash; the first two bytes at that location held the length of the clip in samples. I wrote a pretty basic (and awful) VB app that read in a *.wav file and sent it over the serial port to the micro, which then loaded the flash with it. The *.wav files were converted to 16-bit @ 16kHz using 'Windows Sound Recorder'. Only the upper 10 bits of each sample were used: a 10-bit sample takes two bytes to store anyway, and I couldn't find free software to convert audio clips to 10-bit. (I would probably do all of that in a C# app now.)
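The conversion step is simple enough to sketch. This is not the original VB app, just a minimal Python version of the same idea: read a 16-bit mono WAV, keep the upper 10 bits of each sample, and prefix the clip with a two-byte sample count. The little-endian byte order, the signed-to-unsigned offset (so the value can go straight into a PWM compare register), and the function names are my assumptions.

```python
import struct
import wave

def wav_to_clip(wav_file):
    """Convert a 16-bit mono WAV into: 2-byte sample count + 2 bytes per sample."""
    with wave.open(wav_file, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        frames = w.readframes(w.getnframes())
    count = len(frames) // 2
    if count > 0xFFFF:
        raise ValueError("clip too long for a two-byte length header")
    samples = struct.unpack("<%dh" % count, frames[:count * 2])
    # Shift signed 16-bit (-32768..32767) to unsigned, then keep the top 10 bits.
    levels = [(s + 32768) >> 6 for s in samples]
    return struct.pack("<H%dH" % count, count, *levels)

def read_clip(image, start=0):
    """Mimic the playback side: read the length at 'start', then the samples."""
    (count,) = struct.unpack_from("<H", image, start)
    return list(struct.unpack_from("<%dH" % count, image, start + 2))
```

Note that a two-byte count caps a clip at 65535 samples, which is about 4 seconds at 16kHz. `wav_to_clip` accepts either a filename or an open binary file object, since that is what `wave.open` takes.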
I revisited that project recently just to see how 'cheap' I could get it, and was reminded of this:
http://www.romanblack.com/BTc_alg.htm
That can help reduce the part count, but you'll probably still need external memory. 'Synthesizing' voice is extremely difficult, but just playing pre-recorded samples is very straightforward for hardware, and given how cheap memory is these days, an SPI flash chip won't add much cost. Some PICs have built-in opamps, like the PIC16F1704, as well as an 8-bit DAC. It also has a couple of PWM modules if you wish to use those. That would be a two-chip solution which I suspect could actually do fairly decent quality sound.
You might be able to get away with an 8-pin micro (using one ADC pin to read several buttons), an 8-pin flash chip ($0.50 for 4Mbit) and a single-transistor speaker driver. Commercial cheap devices generally use custom ICs and are extremely cheap, but this is the cheapest (and easiest) way I can think of to make a small 'sound module' that plays small sound clips.
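For the "one ADC pin, several buttons" trick, here is a rough sketch of the decoding logic. It assumes a pull-up resistor from Vcc to the ADC pin, with each button shorting the pin to ground through its own resistor, and a 10-bit ADC referenced to Vcc. The resistor values, tolerance and function names are hypothetical, picked only to give well-separated readings. On the real micro this would be a small lookup table in C, but the arithmetic is the same.

```python
def ladder_codes(pullup_ohms, button_ohms, adc_max=1023):
    """Expected ADC code for each button: a divider of pull-up vs. button resistor."""
    return [round(adc_max * r / (pullup_ohms + r)) for r in button_ohms]

def decode_button(reading, codes, tolerance=20):
    """Return the index of the closest button, or None if nothing is close enough."""
    best = min(range(len(codes)), key=lambda i: abs(codes[i] - reading))
    return best if abs(codes[best] - reading) <= tolerance else None

# Hypothetical values: 10k pull-up, four buttons with different resistors to ground.
codes = ladder_codes(10_000, [0, 3_300, 10_000, 33_000])
print(codes)
print(decode_button(260, codes))   # a slightly noisy reading near the second button
print(decode_button(1023, codes))  # pin pulled fully high: no button pressed
```

The tolerance window is what lets you ignore resistor spread and ADC noise. It also means two buttons pressed at once will usually decode as nothing, or as the wrong button, which is generally fine for a toy.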