Author Topic: Reverse engineering audio data  (Read 4036 times)

Offline kfitch42 (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 300
  • Country: us
Reverse engineering audio data
« on: October 17, 2013, 03:03:14 am »
First, a caveat: I am not sure if this is the right forum for this question, but it seems like the right kind of people are here who might be able to help.

I (well, my son, really) have an embedded electronics device (Nintendo DSi XL) running an app (Flipnote Studio) that can record audio and make fun little animations. It will export the animations as .gif files, but that obviously loses the audio. I would like to be able to extract the audio directly from the proprietary files (which can be saved to an SD card). Yes, I can just connect the headphone port to the line-in on my computer, but, as an engineer (well, a software engineer who works around a lot of electronics/embedded stuff), I want to see if I can do it better :)

There has already been some effort by others in this problem space, and they have identified WHERE in the proprietary file the audio data is, but properly decoding it isn't quite there:
 http://www.dsibrew.org/wiki/Flipnote_Files/PPM

After some experimentation I have found that treating the data as linear PCM with 1 channel, 8 bits per sample at 4096 Hz (yikes, worse than POTS!) gets the audio to the edge of recognizable. If I record a simple tone (e.g. a 440 Hz sine wave), the extracted audio has a clear peak in the spectrum at 440 Hz. Looking at the data, it looks vaguely sine-wave-ish. If I feed the headphone port on the DS into my line-in and record the output, it looks MUCH nicer. I have tried a similar approach with spoken language (e.g. the word "testing"). If you listen VERY carefully you can hear the word under a hiss of (white/pink?) noise. But a recording from the headphone jack is almost noise free.
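For anyone who wants to reproduce that first attempt, here is a minimal sketch of wrapping the raw bytes as a mono, 8-bit, 4096 Hz WAV using only the Python standard library. The function name `raw_to_wav` is made up for illustration, and the parameters are just the guesses described above, not a confirmed description of the format.

```python
import io
import wave

def raw_to_wav(raw, sample_rate=4096):
    """Wrap raw bytes as a mono, 8-bit linear PCM WAV file (returned as bytes).

    WAV stores 8-bit samples as unsigned values (128 is silence), so the
    input bytes are written unchanged. The 4096 Hz default is only the
    guess from the post, not a known property of the file format.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # 1 channel
        w.setsampwidth(1)            # 1 byte = 8 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(raw)
    return buf.getvalue()
```

To use it, read one of the .raw files with `open(path, "rb").read()`, pass the bytes to `raw_to_wav`, and write the result to a .wav file. Changing `sample_rate` is a cheap way to test other rate guesses.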

I have attached a zip file with some files. In the recorded directory are files that I recorded by playing back sounds with the headphone jack connected to the line-in on my computer. In the extracted directory are files I created by attempting to extract the audio directly from the proprietary file format. There are .wav files (my best attempt at 'decoding' the audio data) and .raw files. The .raw files are just the bytes from the audio section(s) of the proprietary file.

P.S. I have attempted to treat the data as Mu-law and A-law and the results were worse than treating it as linear. BUT audio is not my area of expertise and I may have just fouled things up.
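For reference, a minimal sketch of G.711 mu-law expansion in plain Python, so the companding experiment can be repeated without any audio library. The function names are illustrative. This only shows what "treating the data as mu-law" means; nothing here suggests the file actually uses it.

```python
def ulaw_to_linear(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit-range sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit position within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

def decode_ulaw(raw):
    """Decode a whole byte string, one output sample per input byte."""
    return [ulaw_to_linear(b) for b in raw]
```

If the data were really companded, the decoded tone should look like a clean sine; a noisier result than plain linear PCM, as reported above, points away from mu-law.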
 

Offline marshallh

  • Supporter
  • ****
  • Posts: 1462
  • Country: us
    • retroactive
Re: Reverse engineering audio data
« Reply #1 on: October 17, 2013, 03:21:23 am »
Seems to be ADPCM... 8000hz sounds about right and a common freq for voice
Verilog tips
BGA soldering intro

11:37 <@ktemkin> c4757p: marshall has transcended communications media
11:37 <@ktemkin> He speaks protocols directly.
 

Offline kfitch42 (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 300
  • Country: us
Re: Reverse engineering audio data
« Reply #2 on: October 17, 2013, 04:20:36 am »
Quote from: marshallh on October 17, 2013, 03:21:23 am
Seems to be ADPCM... 8000hz sounds about right and a common freq for voice

I think you are onto something. I used the built-in Python function for converting ADPCM to linear PCM ( http://docs.python.org/2/library/audioop.html#module-audioop ), and the result is interesting. When applied to a 440 Hz sine-wave tone I get what looks a bit like a 440 Hz triangle wave riding on a drifting DC bias. One sample I have is ~12 sec long and it spends several seconds entirely below 0, and then a few seconds entirely above... My guess is that it is some slight variant of ADPCM.
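A minimal pure-Python sketch of an IMA/DVI-style 4-bit ADPCM decoder, for experimenting with variants. It is not the `audioop` implementation (that module was removed from the standard library in Python 3.13), and whether Flipnote uses this exact step table, starting state, or nibble order is an assumption to be tested. The `low_nibble_first` switch is there because nibble order is one of the first things worth flipping when the output drifts.

```python
# Standard IMA ADPCM step-size table (89 entries) and index adjustments.
IMA_STEPS = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
IMA_INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

def ima_adpcm_decode(raw, low_nibble_first=True):
    """Decode 4-bit ADPCM bytes to a list of signed 16-bit samples.

    Starts from predictor 0 and step index 0 (an assumption; some
    containers store the initial state in a header).
    """
    predictor, index = 0, 0
    out = []
    for byte in raw:
        lo, hi = byte & 0x0F, byte >> 4
        for nibble in ((lo, hi) if low_nibble_first else (hi, lo)):
            step = IMA_STEPS[index]
            # Rebuild the difference from the 3 magnitude bits.
            diff = step >> 3
            if nibble & 4:
                diff += step
            if nibble & 2:
                diff += step >> 1
            if nibble & 1:
                diff += step >> 2
            # Bit 3 is the sign; the decoder integrates the differences.
            predictor += -diff if nibble & 8 else diff
            predictor = max(-32768, min(32767, predictor))
            index = max(0, min(88, index + IMA_INDEX_ADJUST[nibble & 7]))
            out.append(predictor)
    return out
```

Because the decoder is an integrator, any systematic mismatch (swapped nibbles, a different step table, a wrong starting state) accumulates, which is one plausible source of a slowly wandering DC level.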

I will play with this a bit more soon, hopefully this weekend.
 

Offline marshallh

  • Supporter
  • ****
  • Posts: 1462
  • Country: us
    • retroactive
Re: Reverse engineering audio data
« Reply #3 on: October 17, 2013, 04:40:19 am »
I was able to get discernible speech with Audacity raw import... VOX ADPCM... it could be one of numerous variations, or use a custom codebook specific to that game.
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8413
Re: Reverse engineering audio data
« Reply #4 on: October 17, 2013, 11:23:10 am »
Definitely some sort of ADPCM... VOX ADPCM @8192Hz is close and the pure tones have the frequency right on, but the speech sample has some very bizarre DC level shifts. Could you try signals of silence, 512Hz, 1024Hz, 2048Hz, and 4096Hz?
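A minimal sketch for generating those test signals as WAV files to play into the DSi microphone. The function names, the 44100 Hz playback rate, and the half-scale amplitude are arbitrary choices. Note that if the codec really samples at 8192 Hz, the 4096 Hz tone sits exactly at the Nyquist frequency.

```python
import math
import struct
import wave

TEST_FREQS_HZ = [0, 512, 1024, 2048, 4096]   # 0 Hz stands for silence

def tone(freq_hz, seconds=1.0, rate=44100, amplitude=0.5):
    """Return signed 16-bit samples of a sine tone (all zeros for 0 Hz)."""
    n = int(seconds * rate)
    peak = int(amplitude * 32767)
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / rate))
            for i in range(n)]

def write_wav(path, samples, rate=44100):
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

For example, `for f in TEST_FREQS_HZ: write_wav("tone_%d.wav" % f, tone(f, seconds=5))` would produce one file per requested signal.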

The recorded output looks like it only has 8 quantization levels - G.726 24kbit/s might be a candidate.
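The arithmetic behind that guess, as a small sketch: 8 levels need 3 bits per sample, and G.726 runs at 8000 samples/s, so 3 bits per sample gives 24 kbit/s. The helper names are illustrative.

```python
import math

def bits_per_sample(levels):
    """Smallest whole number of bits that can index `levels` values."""
    return math.ceil(math.log2(levels))

def bitrate_bps(levels, sample_rate_hz):
    """Raw bit rate of a fixed-width code with the given level count."""
    return bits_per_sample(levels) * sample_rate_hz

def distinct_levels(samples):
    """Count distinct sample values, e.g. in a decoded or recorded clip."""
    return len(set(samples))
```

`distinct_levels` is only meaningful on clean digital data; an analog recording through the line-in will smear the levels with noise.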
 

Offline madshaman

  • Frequent Contributor
  • **
  • Posts: 698
  • Country: ca
  • ego trans insani
Reverse engineering audio data
« Reply #5 on: October 17, 2013, 11:47:26 am »
Quick comment: relying on spectral data could be misleading, because periodicity survives almost any decoding, right or wrong.

What I mean is, as long as there is a pattern of data which has a consistent period, there will be spectral content at that period/frequency and all its harmonics.

So, for example, even if you're using the completely wrong decoding scheme, you're likely to see spectral content from your original encoding and you're likely to be able to discern the original content with your ears.

The good news is that this effect *probably* won't occur if the audio data was expressed in the frequency domain, so the encoding is unlikely to be DCT based (like MPEG-1 Layer III (MP3)).
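The periodicity argument can be checked numerically. In this minimal sketch, a random byte-to-byte lookup table stands in for "a completely wrong decoder" (an assumption made for illustration; a real wrong decoder with internal state behaves less cleanly). The scrambled signal keeps the period of the original, so its spectrum can only have energy at multiples of the original fundamental.

```python
import cmath
import math
import random

PERIOD = 16      # samples per cycle of the test tone
CYCLES = 4       # number of full cycles analysed

# One cycle of an 8-bit unsigned sine, repeated exactly.
cycle = [round(128 + 100 * math.sin(2 * math.pi * n / PERIOD))
         for n in range(PERIOD)]
original = cycle * CYCLES

# An arbitrary fixed mapping of byte values, standing in for a wrong decoder.
table = list(range(256))
random.Random(0).shuffle(table)
scrambled = [table[v] for v in original]

def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitudes; fine for a 64-sample demo."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x)))
            for k in range(n)]

# Bins holding non-negligible energy after the "wrong decode".
nonzero_bins = [k for k, m in enumerate(dft_magnitudes(scrambled)) if m > 1e-6]
```

With 4 cycles in the window the fundamental falls in bin 4, and every entry of `nonzero_bins` is a multiple of 4. So a spectral peak at 440 Hz shows the data is sample-by-sample in the time domain, not that the sample decoding is right.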

I am super busy at the moment so I don't have time to hack the files, but I would highly suggest trying other decoding schemes similar to mu-law audio, which encodes by compressing the dynamic range.

I'd also give both speex: http://www.speex.org/ and flac: https://xiph.org/flac/ a try.

I would also make sure you're not playing the result back (assuming you're converting to raw bits) with the wrong signed-unsigned sense (I doubt this is the trouble, but would make sure myself).
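Checking the signed/unsigned sense is a one-liner for 8-bit data: flipping the top bit converts between unsigned (128 is silence) and two's-complement signed (0 is silence). A minimal sketch, with an illustrative function name:

```python
def flip_signedness_8bit(raw):
    """Convert 8-bit samples between unsigned and signed by toggling bit 7."""
    return bytes(b ^ 0x80 for b in raw)
```

Applying it twice returns the original bytes, so it is safe to try both ways and compare by ear.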

One suggestion: make a program/script that iterates through all the conversion formats available to a program like vox and automates comparing the result to your original audio.
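A minimal sketch of such a harness: run every candidate decoder over the raw bytes and rank the results by normalized correlation against the reference recording. It assumes the reference is already time-aligned and at the same sample rate as the decoded output, which a real line-in recording will not be without extra work. The two decoders shown are toy placeholders, and all names are illustrative.

```python
import math

def normalized_correlation(a, b):
    """Pearson correlation of two sample lists over their common length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def rank_decoders(raw, reference, decoders):
    """Return (name, score) pairs, best absolute correlation first."""
    scores = {name: normalized_correlation(decode(raw), reference)
              for name, decode in decoders.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy candidate decoders; real ones (ADPCM variants etc.) plug in the same way.
DECODERS = {
    "unsigned 8-bit": lambda raw: [b - 128 for b in raw],
    "signed 8-bit": lambda raw: [b - 256 if b >= 128 else b for b in raw],
}
```

Ranking by absolute value means an inverted-polarity decode still scores well, which is usually what you want when hunting for the right format.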

Good luck, sounds like fun!
To be responsible, but never to let fear stop the imagination.
 

Offline madshaman

  • Frequent Contributor
  • **
  • Posts: 698
  • Country: ca
  • ego trans insani
Reverse engineering audio data
« Reply #6 on: October 17, 2013, 11:56:03 am »
P.S.

If you automate decode-testing, it might be worth it to add trying all "rot" combinations against the result, assuming that either the input data was rotated before encoding or the result was rotated after encoding.
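One reading of "rot" is a bitwise rotation of every byte; that interpretation is an assumption, and a ROT13-style additive shift is another possibility. A minimal sketch that produces all eight rotations of the raw data so each can be fed to the candidate decoders:

```python
def rotl8(byte, n):
    """Rotate an 8-bit value left by n bits."""
    n %= 8
    return ((byte << n) | (byte >> (8 - n))) & 0xFF

def rotated_variants(raw):
    """Map each rotation amount 0..7 to the raw bytes rotated by it."""
    return {n: bytes(rotl8(b, n) for b in raw) for n in range(8)}
```

Rotation by 4 swaps the two nibbles of each byte, so it doubles as a nibble-order test for 4-bit codecs.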

There are other gotchas:

- they use Huffman coding with their own private tables (you *might* be able to detect this as a tone being slightly frequency shifted after a failed decoding that still sounds right).

- they use private de-quantisation lookup tables; this isn't likely to cause a frequency shift in wrongly decoded data.
 

