Author Topic: Embedding data in audio for decoding with a microcontroller (Read 6373 times)

NiHaoMike · « **on:** May 29, 2013, 03:51:57 am »

I had this great idea to embed data in game sound effects so that when the effects play, a microcontroller (connected to the PC's audio output) can activate various devices. (e.g. light up a neon sign for a second after an enemy is killed or pulse a vibrator when hit.) Obviously, the game is going to introduce a lot of amplitude and phase distortion as well as mix in other audio. And of course, the added data should not noticeably affect the audio quality. It also needs to be low latency (less than 50ms or so) so the effects will operate in sync.

So I'm thinking of putting the data at about 22kHz or so (exactly half the sample rate of the game audio in question), modulating the carrier on and off to encode the data. At the microcontroller end (planning on using a dsPIC), some analog filtering is done to isolate the high frequency data, followed by some DSP to do the actual decoding.
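As a rough illustration of the encoding side described above, here is a minimal sketch of on-off keying a high-frequency carrier and mixing it into an existing effect. The sample rate, the 20 kHz carrier, the bit length, and the carrier level are all assumptions chosen for the example, not values from the thread. The carrier is placed a little below half the sample rate because a tone at exactly half the sample rate has a sampled amplitude that depends on its phase.

```python
import math

SAMPLE_RATE = 44100    # assumed WAV sample rate in Hz
CARRIER_HZ = 20000.0   # assumed carrier, a little below half the sample rate
BIT_SAMPLES = 88       # assumed bit length, roughly 2 ms per bit
CARRIER_AMPL = 0.05    # assumed carrier level relative to full scale (1.0)


def ook_modulate(bits, sample_rate=SAMPLE_RATE, carrier_hz=CARRIER_HZ,
                 bit_samples=BIT_SAMPLES, ampl=CARRIER_AMPL):
    """Return float samples: carrier on for 1 bits, silence for 0 bits."""
    out = []
    for i, bit in enumerate(bits):
        for n in range(bit_samples):
            t = (i * bit_samples + n) / sample_rate
            out.append(ampl * math.sin(2 * math.pi * carrier_hz * t) if bit else 0.0)
    return out


def mix(effect, data):
    """Add the data signal to a sound effect and clip the sum to [-1, 1]."""
    length = max(len(effect), len(data))
    mixed = []
    for n in range(length):
        a = effect[n] if n < len(effect) else 0.0
        b = data[n] if n < len(data) else 0.0
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed
```

The float samples would still need scaling to 16-bit integers before being written back into the WAV file.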

Is there any example code for something like that? (It does sound similar to what my friend Tiffany Yep does, but I want to use a cheap microcontroller instead of an expensive FPGA.)

David_AVD · « **Reply #1 on:** May 29, 2013, 04:17:26 am »

Out of curiosity, is this for wav or mp3 (or other) format?

NiHaoMike · « **Reply #2 on:** May 29, 2013, 04:52:16 am »

It's WAV.

flynnjs · « **Reply #3 on:** May 29, 2013, 06:26:24 am »

These are things you'll need to consider:

1) Bandwidth of the ADC in the MCU, or will you need an external one?
2) Ability to implement filtering before the ADC. Can this be a high-Q
bandpass or can you only afford a simple lowpass?
3) Ability to carry out some simple filtering inside the MCU, such as
an FIR.
4) Noise immunity of your AM scheme will be fairly low; a BPSK
scheme might be better.
5) FEC. For a block interleave you'll need adequate MCU memory,
but you may decide your noise sources are not long duration, so
you may not use block interleaving.

Another signalling scheme you might like to look at (depending
on signalling rate required) is CTCSS, used in analogue two-way
radio.

You may find doing the demodulation side of stuff easier by using
an external IC such as an infrared receiver (which puts the data
on a carrier but possibly higher than your proposed 22kHz) or
a dedicated CTCSS decoder.

Rasz · « **Reply #4 on:** May 29, 2013, 06:32:23 am »

Sounds like Macrovision / RDS, except both of those technologies carry data outside of the audio stream.

Can't you just use an external interface?
In the old days before force feedback we had rumble, and I seem to remember the first rumble packs just picked up bass in the audio stream.

Paul Price · « **Reply #5 on:** May 29, 2013, 01:48:58 pm »

Am I the only paranoid person here? This method would seem an ideal way to assist terrorists to send each other embedded messages while seemingly only sending some Itunes .mp3 files of the top-10 playing on the radio? Who are we helping here?

Psi · « **Reply #6 on:** May 29, 2013, 01:54:17 pm »

Quote from: Paul Price on May 29, 2013, 01:48:58 pm

Am I the only paranoid person here? This method would seem an ideal way to assist terrorists to send each other embedded messages while seemingly only sending some Itunes .mp3 files of the top-10 playing on the radio? Who are we helping here?

They can already do that with the many programs available to store a small text file inside image pixel data.
http://en.wikipedia.org/wiki/Steganography

Trying to stop people communicating anonymously/covertly on the internet is impossible.

AndyC_772 · « **Reply #7 on:** May 29, 2013, 02:17:05 pm »

Quote from: Paul Price on May 29, 2013, 01:48:58 pm

Am I the only paranoid person here?

Yes, I'm afraid I think you are. This technique of using out-of-band audio signals to carry small amounts of digital information is, with due respect to the OP, laughably simple to do and has been used by the world's telephone systems for years. It's how payphones know how many coins to require for a given call. No doubt dozens of other examples exist too.

There is information out there which, it could be argued, has no legitimate purpose and should perhaps be monitored or restricted... information like "if you mix chemicals A, B and C in these proportions then they'll go bang", perhaps. But techniques for things like communication, timing and security are so widespread, have such obvious and useful legitimate purposes, and are so readily derived by anyone with some basic electronics knowledge, that to try and control them would be utterly ridiculous.

To the OP: I'd start by synthesizing a swept sine wave tone and analysing what comes out of your sound card, to find the highest frequency it will produce at a usable amplitude before the output filter cuts in. That's the frequency to modulate to carry your signalling.
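A swept sine for this kind of sound card test could be generated along these lines. This is only a sketch: the function names, sweep range, duration, and amplitude are assumptions for illustration, and the output is a mono 16-bit WAV written with Python's standard `wave` module.

```python
import math
import struct
import wave


def linear_chirp(f_start, f_end, seconds, sample_rate=44100, ampl=0.5):
    """Return float samples of a sine whose frequency rises linearly."""
    n_total = int(seconds * sample_rate)
    k = (f_end - f_start) / seconds  # sweep rate in Hz per second
    samples = []
    for n in range(n_total):
        t = n / sample_rate
        # Phase is the integral of the instantaneous frequency f_start + k*t.
        phase = 2 * math.pi * (f_start * t + 0.5 * k * t * t)
        samples.append(ampl * math.sin(phase))
    return samples


def write_wav(path, samples, sample_rate=44100):
    """Write float samples in [-1, 1] as mono 16-bit PCM."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        w.writeframes(frames)
```

For example, `write_wav("sweep.wav", linear_chirp(10000, 22000, 5.0))` would give a five second sweep to play back while watching the output on a scope.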

Then I'd build a high pass (analogue) filter - maybe a 4th order VCVS type based on op-amps - to filter out everything below this frequency, and use a peak detector circuit (diode + capacitor) to measure the peak level of the resulting signal. Feed the output of this into your microcontroller, and all you need to do is pick a suitable threshold level to determine whether the signal you're after is present or absent. Then you have a serial bit stream which you can decode; you might even be able to use a UART.
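The peak detector and threshold stage can be modelled in software to get a feel for the numbers before building anything. The sketch below assumes the signal has already been high-pass filtered, uses full-wave rectification (a single diode would be half-wave), and models the capacitor leak as a fixed per-sample decay. The decay factor, threshold, bit length, and the `ook_carrier` test signal are all assumptions for illustration.

```python
import math


def ook_carrier(bits, bit_samples, carrier_hz=20000.0, sample_rate=44100, ampl=0.05):
    """Test signal: carrier present during 1 bits, silence during 0 bits."""
    return [ampl * math.sin(2 * math.pi * carrier_hz * n / sample_rate)
            if bits[n // bit_samples] else 0.0
            for n in range(len(bits) * bit_samples)]


def peak_detect(samples, decay=0.95):
    """Model a diode + capacitor: jump up to each peak, then leak away."""
    level = 0.0
    envelope = []
    for s in samples:
        rectified = abs(s)
        level = rectified if rectified > level else level * decay
        envelope.append(level)
    return envelope


def slice_bits(envelope, threshold, bit_samples):
    """Sample the envelope in the middle of each bit and compare to a threshold."""
    bits = []
    for start in range(0, len(envelope) - bit_samples + 1, bit_samples):
        bits.append(1 if envelope[start + bit_samples // 2] > threshold else 0)
    return bits


tx = [1, 0, 1, 1, 0]
rx = slice_bits(peak_detect(ook_carrier(tx, 88)), threshold=0.02, bit_samples=88)
```

Here the bit boundaries are known in advance; a real receiver would need a start bit or similar to find them, which is where the UART suggestion comes in.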

Paul Price · « **Reply #8 on:** May 29, 2013, 03:24:44 pm »

How about sending short bursts (just a few cycles) of high frequency sine waves (somewhere in the 5 to 20KHz spectrum) only in small amplitudes around the 0-crossing point of the transmitted signal. The encoder would vary the frequency of a few cycles of this burst signal to give a binary one or zero and also create a start bit signature frequency. At the receiving end, the decoder need only look at signals extracted from a small amplitude sampling window around the 0-point crossover to find this "burst" signal for Freq. Shift Keying decoding. To ensure reception, the same code sequence should repeat several times. The transmission scheme could even choose to only transmit during any brief "quiet" periods encountered in the "carrier" main music/voice.

Niklas · « **Reply #9 on:** May 29, 2013, 06:39:23 pm »

Quote from: Paul Price on May 29, 2013, 03:24:44 pm

How about sending short bursts (just a few cycles) of high frequency sine waves (somewhere in the 5 to 20KHz spectrum) only in small amplitudes around the 0-crossing point of the transmitted signal. The encoder would vary the frequency of a few cycles of this burst signal to give a binary one or zero and also create a start bit signature frequency. At the receiving end, the decoder need only look at signals extracted from a small amplitude sampling window around the 0-point crossover to find this "burst" signal for Freq. Shift Keying decoding. To ensure reception, the same code sequence should repeat several times. The transmission scheme could even choose to only transmit during any brief "quiet" periods encountered in the "carrier" main music/voice.

That kind of transmission scheme works for remotely controlling units over a fixed-frequency power distribution network. I doubt it will be of any use in this case because of the randomness of the frequency distribution in most songs. As the 5-20 kHz spectrum also contains, hmmm, music, how complex would the triggering code need to be to handle all cases of possible false triggering? If you can code an algorithm for a dsPIC that handles that with repeatable results, then I would be very impressed.

I think AndyC's proposed solution seems like a much better idea to start from.

Paul Price · « **Reply #10 on:** May 29, 2013, 09:06:59 pm »

If the frequency is high enough but below the Nyquist aliasing limit of the receiver, this high frequency encoding could be filtered out of the normal content. Not much music hits the highest notes (frequencies) usable in the receiver.

If the "Q" of the receiver's data highpass filter is high enough, or if the signal is fed to a bandpass filter, it could ring and act like a carrier regenerator to give an MCU more samples to determine the frequency.
The accuracy of the received code could be verified and bad packets rejected by checksum and/or by sending first the normal data string sequence and then its binary inversion as a check.
All this assumes a low baud rate, but consider this: high speed DSL internet works well over a POTS telephone line.
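The two integrity checks mentioned above (a checksum, and sending the data followed by its bitwise inversion) could be combined in a frame like the one sketched here. The frame layout and the simple 8-bit additive checksum are assumptions for illustration; the thread does not specify either.

```python
def build_packet(payload):
    """Frame = payload bytes, bitwise-inverted payload bytes, 8-bit checksum."""
    data = bytes(payload)
    inverted = bytes(b ^ 0xFF for b in data)
    checksum = sum(data) & 0xFF
    return data + inverted + bytes([checksum])


def parse_packet(frame):
    """Return the payload, or None if the length or either check is wrong."""
    if len(frame) < 3 or len(frame) % 2 == 0:
        return None
    n = (len(frame) - 1) // 2
    data, inverted, checksum = frame[:n], frame[n:2 * n], frame[-1]
    # Every byte XORed with its inverted copy must give 0xFF.
    if any((a ^ b) != 0xFF for a, b in zip(data, inverted)):
        return None
    if (sum(data) & 0xFF) != checksum:
        return None
    return bytes(data)
```

Sending the inverted copy doubles the frame length, so at a low bit rate it trades latency for robustness.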

C · « **Reply #11 on:** May 29, 2013, 10:11:26 pm »

Hi
Background
A long time ago I worked with some systems for receiving RTTY. Most transmitters would transmit one of two tones. Some systems did two or more channels and would use more tones, but only one tone at a time. Most receivers used a bandpass filter to get just the tones and then used an FM detector. There were a lot of systems with different tone spacings, so to do a great job you ended up with a lot of different bandpass filters and detectors. You still had problems with the filters not being linear across the BW. The receivers worked better when the transmitter did zero switching between tones. Now, my understanding of the analog world is that you cannot build a BW filter at 20 and a second at 21 and have all the properties match. As I heard it, one smart ass made the statement, "That noise you hear has a tone in it at one of the frequencies, so we just need to know which noise band has the highest energy." To do this correctly they needed a filter for each tone with all properties matching. They made a bunch of matched filters, all at the same frequency, and put a mixer in front. Changing the local oscillator frequency for that mixer picks the tone. The one with maximum energy should have the tone. It worked even on noise in which you could not hear a tone, used less hardware overall, and did a better job. I think later research found that loss of tone was quicker and easier to detect.

A DSP translation of this should be easier.
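One possible DSP reading of the mixer-plus-identical-filter idea is sketched below: multiply the block by a complex local oscillator at each candidate tone, average the result (a crude low-pass standing in for the matched filter), and pick the candidate with the most energy. The function names, tone frequencies, and block length are assumptions for the example.

```python
import cmath
import math


def tone_energy(samples, tone_hz, sample_rate):
    """Mix the block down by tone_hz, average it, and return the energy."""
    acc = 0j
    for n, s in enumerate(samples):
        # Complex local oscillator shifts tone_hz down to 0 Hz.
        acc += s * cmath.exp(-2j * math.pi * tone_hz * n / sample_rate)
    return abs(acc / len(samples)) ** 2


def pick_tone(samples, candidates_hz, sample_rate):
    """Return the candidate frequency whose band holds the most energy."""
    return max(candidates_hz, key=lambda f: tone_energy(samples, f, sample_rate))
```

The same filter (the averaging) is reused for every tone, and only the oscillator frequency changes, which is the property the analog version was after.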
----
My PC sound card may not work as well as yours, so while the higher the frequency used the better, you may need to lower the frequency for some cards. Also, some people hear better at high frequencies while others have no ears left. So you could have an adjustment on the PC side that would allow it to work on poor sound cards. And instead of a bunch of hardware changes, you could change an oscillator frequency on the receive side using the above idea.

I would suggest a packet data structure so that you could preset a bunch of things to happen and then send a "DO-IT" packet. And since, from what I see, it is easier to detect a loss of signal, I would have the "DO-IT" packet have a half or more cycles missing, so that you could sync to the spot of the missing cycles to cut latency.

On the software side you might look at a mod of the touch tone phone decoding software. That is 2 of 8 tone detection. And look at the tones you pick: the touch tone frequencies were chosen to prevent harmonic and other problems. You might want to pick ones that hit a bucket.
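Touch tone decoders are commonly built on the Goertzel algorithm, which evaluates a single DFT bin with one multiply per sample. The sketch below pairs it with a helper that nudges a desired tone onto the nearest bin centre for a given block length, which is one reading of "hit a bucket". The names, block length, and frequencies are assumptions for illustration.

```python
import math


def bin_centred_tone(target_hz, sample_rate, block_len):
    """Nearest frequency that completes a whole number of cycles per block."""
    k = round(target_hz * block_len / sample_rate)
    return k * sample_rate / block_len


def goertzel_power(samples, tone_hz, sample_rate):
    """Squared magnitude of one DFT bin via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * tone_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
```

With tones on bin centres, a tone in one bin contributes essentially nothing to the others, so comparing the powers is enough to decide which tone is present.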

The cheap STM32F4 Discovery board may be something you want to look at. It has a high speed ADC, and they add a sound chip.

for what it's worth
have fun.
C

Niklas · « **Reply #12 on:** May 29, 2013, 11:46:37 pm »

A quite narrow bandpass filter together with a coding scheme like the one for IR remotes could be an option. The filter's lower break frequency is near the upper frequency limit of the sound card and the upper break frequency is selected below possible switching noise from the computer or the sound card. The filtered pulses can trigger a monostable multivibrator that demodulates the signal before it is fed to the microcontroller.

IR remotes usually send a wider sync pulse followed by data pulses of two different lengths, depending on the transmitted data bit value. The transmitted pulse width is sampled in the middle of the average pulse time, i.e. at 2 ms if 1 ms is '0' and 3 ms is '1'. Long-term time base inaccuracies are compensated, as the rising edge of every pulse also serves as a clock signal.
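Decoding such a pulse-width scheme from measured pulse lengths might look like the sketch below. The 2 ms decision point comes from the example above; the 5 ms sync threshold, the 8-bit frame, and MSB-first bit order are assumptions for illustration.

```python
SYNC_MIN_MS = 5.0   # assumed: any pulse at least this long is a sync pulse
THRESHOLD_MS = 2.0  # midpoint between the 1 ms '0' and the 3 ms '1'


def decode_pulses(widths_ms, n_bits=8):
    """Return the value of the n_bits pulses after a sync pulse (MSB first).

    Returns None if no sync pulse is followed by a complete frame.
    """
    for i, w in enumerate(widths_ms):
        if w < SYNC_MIN_MS:
            continue
        data = widths_ms[i + 1:i + 1 + n_bits]
        # Skip this sync if the frame is cut short or interrupted by another sync.
        if len(data) < n_bits or any(d >= SYNC_MIN_MS for d in data):
            continue
        value = 0
        for d in data:
            value = (value << 1) | (1 if d > THRESHOLD_MS else 0)
        return value
    return None
```

On a microcontroller the widths would come from a timer captured on the rising and falling edges of the demodulated signal.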

flynnjs · « **Reply #13 on:** May 31, 2013, 06:34:48 am »

Quote from: Niklas on May 29, 2013, 11:46:37 pm

A quite narrow bandpass filter together with a coding scheme like the one for IR remotes could be an option. The filter's lower break frequency is near the upper frequency limit of the sound card and the upper break frequency is selected below possible switching noise from the computer or the sound card.

I pointed this out in my previous post, but it assumes that the circuit can be made to accommodate a filter or that there's enough grunt to do an FIR in the MCU.
IR carrier is usually 36kHz which needs a fairly fancy DAC/ADC (you'd need to go 96kHz sound card and many MCUs don't have internal 96kHz ADC)

Niklas · « **Reply #14 on:** May 31, 2013, 09:52:12 pm »

Quote from: flynnjs on May 31, 2013, 06:34:48 am

IR carrier is usually 36kHz which needs a fairly fancy DAC/ADC (you'd need to go 96kHz sound card and many MCUs don't have internal 96kHz ADC)

There is no need to go up to 36 kHz or even 48 kHz as some other IR diodes use. The idea was to use 20-something kHz as the carrier frequency, but to keep the signaling scheme from IR remotes. Same scheme, but with another modulation frequency. If the signal can be filtered in HW and then demodulated, the decoding is quite simple and only needs a timer and an edge-triggered interrupt with Schmitt trigger input levels.
The demodulation can be done with an RC link between Vdd and Vss together with a transistor in parallel with the capacitor. Each pulse will discharge the capacitor and reset the timing. The edge-triggered interrupt measures the voltage over the capacitor.
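One way to reason about the RC demodulator is as a retriggerable timeout: every carrier pulse resets the capacitor to 0 V, and the output only changes state once the capacitor has charged through R up to the input threshold. The sketch below computes that timeout from the standard RC charging equation and groups pulse timestamps into carrier bursts. It is a simplified model under stated assumptions (instant discharge, a single threshold rather than Schmitt hysteresis), and the function names and component values are hypothetical.

```python
import math


def rc_timeout(r_ohms, c_farads, v_threshold, vdd):
    """Time for a capacitor charging through R from 0 V to reach v_threshold.

    From V(t) = Vdd * (1 - exp(-t / RC)), solved for t.
    """
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / vdd)


def carrier_bursts(pulse_times, timeout):
    """Group pulse timestamps into (start, end) bursts.

    A gap longer than timeout ends a burst; the end is reported at the
    moment the capacitor would cross the threshold after the last pulse.
    """
    bursts = []
    for t in pulse_times:
        if bursts and t - bursts[-1][1] <= timeout:
            bursts[-1][1] = t
        else:
            bursts.append([t, t])
    return [(start, last + timeout) for start, last in bursts]
```

The timeout should be longer than one carrier period, so the capacitor never reaches the threshold during a burst, but well short of the shortest gap between data pulses.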

