How 2 Get A Table Lamp to Always Understand "On" and "OFF" and "Dim"

Posted by martys on 18 Jul, 2017 12:43
If they can do it on Star Trek, why can't I?

I have succeeded in making the ultimate kitchen timer for my girlfriend, now is no time to rest on my laurels! My next project is to take sound control of a table lamp.

I don't want to use any stink'n pre-fab speech recog or synth IC's. I will use a mike, op-amps, active filters made with discrete components, etc. but this is a software challenge.

(1)What would be the best choice for a minimal MCU able to handle this SW task written in C code?

(2)How do I approach the software challenge of writing code that can interpret these simple spoken commands?

(3)If the lamp can understand the command clearly, it should say, "Ok" and perform the requested action.

(4)No Raspberry Pi; there is no room for one. It has to be small and low-powered (green) enough to fit comfortably in the base of a small table lamp.

(5)This makes an idiot lamp a moron robot, but it doesn't have to respond immediately. It can take a few number-crunching seconds to produce a reply or else, when it is only barely sure of what it heard, say "Huh?"

(6)At the same time, it must clearly understand me, my girlfriend, or even her <=7 year old child and not falsely respond.

(7)I don't want wireless connectivity to Google AI or to any PC equip to augment operation. Should work in nowhereville without any internet connection or wireless link to a PC.

#1 Reply
Posted by CJay on 18 Jul, 2017 13:13
No Pi you say but Pi Zero W with Google AIY kit would do this in small physical size.

If you really don't want to go that way, there were/are dedicated single chip solutions for it.

Software speech recognition is not trivial and requires a good amount of CPU power. It might be possible with a Teensy or similar, but then you're upping the physical size again.

#2 Reply
Posted by martys on 18 Jul, 2017 13:28
http://www.raspberrypi-spy.co.uk/2017/05/the-magpi-issue-57-comes-with-google-voice-interaction-kit/

This is a bigger box that is too smart; I need something small for this to work at all.

Having trouble finding the exact power specs for the Pi Zero W.

#3 Reply
Posted by CJay on 18 Jul, 2017 14:00
You only need the microphone part of the AIY unless you want it to talk back at you and that board is tiny, much smaller than the Pi Zero W

#4 Reply
Posted by rstofer on 18 Jul, 2017 15:33
That MagPi board is an add-on board to a Raspberry Pi so the combined boards are probably too large for the base of a lamp. I would do it anyway... Even if I had to sit the cardboard box alongside the lamp.

There have been some Hackaday projects:
http://hackaday.com/2010/07/11/adding-speach-recognition-to-your-embedded-platform/

Speech recognition is hard! The fact that Google gets it nearly right is due to the enormous servers they have parsing the stuff.

#5 Reply
Posted by CJay on 18 Jul, 2017 15:48
Quote from: rstofer on 18 Jul, 2017 15:33
That MagPi board is an add-on board to a Raspberry Pi so the combined boards are probably too large for the base of a lamp. I would do it anyway... Even if I had to sit the cardboard box alongside the lamp.

There have been some Hackaday projects:
http://hackaday.com/2010/07/11/adding-speach-recognition-to-your-embedded-platform/

Speech recognition is hard! The fact that Google gets it nearly right is due to the enormous servers they have parsing the stuff.

It is, Apple and Amazon too have data centres to handle this stuff and it can create laughable mistakes.

I have anecdotal evidence from a colleague that the AIY kit works with a Raspberry Pi Zero W and I *think* you'd only need the microphone board, it talks I2S directly so I think it's 'just' a matter of getting the data to Google to parse and then processing the returned data.

It's a good reason for me to build the AIY kit that's been hiding under my table since two days after it was released.

#6 Reply
Posted by Richard Crowley on 18 Jul, 2017 16:09
There are dozens of enormous server farms which comprise "Alexa" and "Siri" and all those speech-recognition features. There is no way that kind of sophisticated speech recognition is implemented inside the little gadget at the user end of the transaction.

I heard an interesting news story last week on the radio. Some woman was being attacked by a thug who broke into her home and he shouted "You didn't call the sheriff, did you?" The little Amazon gadget in the kitchen picked up the thug's question and dutifully called the sheriff.

#7 Reply
Posted by nfmax on 18 Jul, 2017 16:39
OTOH the Echo Dot can recognise the word 'Alexa' (or one of the other magic words) all by itself, independent of the speaker, and with background noise such as music & speech, using only just over a watt. And it's small enough to fit into a lamp base.

What the OP is asking for is probably not impossible, just very difficult.

#8 Reply
Posted by Zero999 on 18 Jul, 2017 16:47
Quote from: nfmax on 18 Jul, 2017 16:39
OTOH the Echo Dot can recognise the word 'Alexa' (or one of the other magic words) all by itself, independent of the speaker, and with background noise such as music & speech, using only just over a watt. And it's small enough to fit into a lamp base.

What the OP is asking for is probably not impossible, just very difficult.
Probably blissfully unaware of how difficult it is.

Just for a start. Record several uncompressed .wav files of each of the voice commands, by several different people: male, female and with different accents. Now write a program to differentiate between them. Now hopefully you'll see how difficult it really is. You'll need to experiment a lot, just to get something which works most of the time.
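
One hedged way to picture the "program to differentiate between them" step is a nearest-template matcher: each command gets a small feature vector averaged from the recordings, and a new utterance is assigned to the closest one or rejected. Everything in this sketch is an assumption for illustration (the 8-value feature vector, the names template_t and nearest_template, and both thresholds); it is not taken from any poster's code.

```c
#include <stddef.h>
#include <stdint.h>

#define N_FEATURES 8      /* hypothetical: e.g. 8 band energies per word */
#define REJECT     (-1)   /* returned when no template is a convincing match */

typedef struct {
    const char *label;            /* "on", "off", "dim", ... */
    uint8_t     feat[N_FEATURES]; /* features averaged over the recordings */
} template_t;

/* Sum of absolute differences between two feature vectors. */
static uint32_t l1_distance(const uint8_t *a, const uint8_t *b)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < N_FEATURES; i++) {
        sum += (a[i] > b[i]) ? (uint32_t)(a[i] - b[i])
                             : (uint32_t)(b[i] - a[i]);
    }
    return sum;
}

/*
 * Return the index of the closest template, or REJECT when the best match
 * is too far away (max_dist) or not clearly better than the runner-up
 * (min_margin). Both limits would have to be tuned on real recordings.
 */
int nearest_template(const template_t *t, size_t count, const uint8_t *feat,
                     uint32_t max_dist, uint32_t min_margin)
{
    uint32_t best = UINT32_MAX, second = UINT32_MAX;
    int best_idx = REJECT;

    for (size_t i = 0; i < count; i++) {
        uint32_t d = l1_distance(t[i].feat, feat);
        if (d < best) {
            second = best;
            best = d;
            best_idx = (int)i;
        } else if (d < second) {
            second = d;
        }
    }
    if (best_idx == REJECT || best > max_dist) return REJECT;
    if (second - best < min_margin) return REJECT;
    return best_idx;
}
```

The rejection path is the part that maps onto the "don't falsely respond" requirement: an utterance that sits halfway between two templates returns REJECT rather than a guess. How well this separates different speakers depends entirely on the features chosen, which is the hard part described above.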

#9 Reply
Posted by rstofer on 18 Jul, 2017 18:54
Way back in '54, Popular Electronics had an article about voice control of a model railroad. I was 9 years old, I certainly didn't understand how it worked... It only recognized "Stop" and "Go" but I believe it did it with filters to detect high frequency components. Or maybe it recognized the time between "sssss" and "TOP" versus "GUH" and "oh".

So, I suppose a very limited vocabulary could be created with analog circuits.

Found a reference for Popular Electronics Volume 1 No. 3 - December 1954
http://www.smecc.org/popular_electronics.htm

And then I went on eBay and found:
http://www.ebay.com/itm/272729262971

And bought it... I have no idea why...

My long term memory works a lot better than my short term memory. I remember reading the volume while riding in our car. It looked like a cool project!

#10 Reply
Posted by martys on 18 Jul, 2017 18:58
I never said it wasn't hard to accomplish this project goal, but that I like the idea and I am willing to accept the challenge. I like to accomplish what is called difficult. There is some feeling of achievement to learn and develop, improve and maybe perfect something that is called difficult to do.

My goal is to get this idea to work well, even if it may never be perfected by me due to the multitudes of issues that could cause me to finally give up such as excess time spent on this project or having waning interest.

I would consider this idea successful even if it only worked in a reasonably quiet room (exactly the target environment) and only recognized one or two commands or one or two particular speakers.

Here's my first primitive software and hardware strategy to accomplish this.
(1)Pick up the sound and amplify and filter the sound pickup to filter out everything but a portion of the human voice spectrum.
(2)Sample this mike pickup and convert into digital data and store in RAM for later analysis.
(3)Try to identify the length and characteristics of vowels and consonants by some software means.
(4)Maybe with (3) using FFT to analyze the frequency spectrum of a sample.
(5)Help identify a spoken command by the length of the command. A spoken command would have to fit into a time window. If the sound stream is too long or too short it is ignored. Only spoken commands fitting into the time window would be analyzed.
(6)The time between accented consonants or length of vowels are also important clues to be weighed by the software.

For instance, "ON" has a soft consonant ending following a drawn-out "Ah" vowel sound, while "OFF" has a distinctive "FFFF" noise-like structure completing the sound sample and would also last slightly longer.

The software would weight comparisons to expected results and assign a figure of merit of match to help make a command decision.
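
A minimal sketch of steps (5) and the "ON" versus "OFF" clue, assuming the utterance has already been isolated into a buffer of signed samples. It gates on duration, then looks at the zero-crossing rate of the last quarter of the word, since a hissy "FFFF" ending crosses zero far more often than a voiced "N". The zero-crossing rate is a stand-in chosen for this sketch, not something the plan above specifies, and the sample rate, time window and threshold are all untested guesses.

```c
#include <stddef.h>
#include <stdint.h>

/* All numbers below are illustrative guesses, not measured values. */
#define SAMPLE_RATE_HZ   8000u
#define MIN_COMMAND_MS    150u  /* shorter bursts are treated as clicks  */
#define MAX_COMMAND_MS    900u  /* longer bursts are treated as chatter  */
#define TAIL_ZC_PER_1000  250u  /* above this, the ending counts as hiss */

typedef enum { CMD_NONE, CMD_ON, CMD_OFF } command_t;

/* Count sign changes in a block of signed samples. */
static uint32_t zero_crossings(const int16_t *x, size_t n)
{
    uint32_t count = 0;
    for (size_t i = 1; i < n; i++) {
        if ((x[i - 1] < 0) != (x[i] < 0)) count++;
    }
    return count;
}

/* Classify one already-isolated utterance of n samples. */
command_t classify_utterance(const int16_t *x, size_t n)
{
    if (x == NULL || n == 0) return CMD_NONE;

    /* Step (5): ignore anything outside the expected time window. */
    uint32_t ms = (uint32_t)n * 1000u / SAMPLE_RATE_HZ;
    if (ms < MIN_COMMAND_MS || ms > MAX_COMMAND_MS) return CMD_NONE;

    /* Inspect only the last quarter, where "N" and "FF" differ most. */
    size_t tail = n / 4;
    if (tail == 0) return CMD_NONE;
    uint32_t zc = zero_crossings(x + (n - tail), tail);
    uint32_t per_1000 = zc * 1000u / (uint32_t)tail;

    return (per_1000 > TAIL_ZC_PER_1000) ? CMD_OFF : CMD_ON;
}
```

A two-way decision like this answers something for every sound that fits the window, so in practice it would need a third "not sure" outcome fed by the figure of merit described above.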

As far as replying to commands with a voice response, this has already been easily accomplished by me playing back PCM recordings stored in a SEEPROM.

Whacha think?

#11 Reply
Posted by rstofer on 18 Jul, 2017 19:05
Quote from: nfmax on 18 Jul, 2017 16:39
OTOH the Echo Dot can recognise the word 'Alexa' (or one of the other magic words) all by itself, independent of the speaker, and with background noise such as music & speech, using only just over a watt. And it's small enough to fit into a lamp base.

What the OP is asking for is probably not impossible, just very difficult.

Does "Alexa" have to be pronounced in some particular way? Like with the emphasis on the second syllable, or can I pronounce it "AL" "exa"? I wonder how discriminating it really is. Does "LEX" "us" work? That kind of thing.

Do we know it works "standalone" or could it be sending "Alexa" to the big server farm?

We're not going to buy one but I will admit to being intrigued.

BTW, Microsoft Kinect does speech recognition and there is a development kit but I'm not sure this is the least bit helpful.

#12 Reply
Posted by rstofer on 18 Jul, 2017 19:13
Google might be your friend!

Search for 'speech recognition ARM' and there are several links like:
http://www.ti.com/tool/TIDEP0066

#13 Reply
Posted by BrianHG on 18 Jul, 2017 19:29
Since you want only a few words, as long as there is no background noise, with a little smart diligence, you could do it with a 16-bit PIC which has a built-in, fast enough ADC of 12 bits or more and some smart coding & tables. It has certainly been done with much slower MCUs and a 1-bit ADC in the past, as seen here in Dave's video.

You should be able to outperform this old 80's 1 bit ADC piece of junk easily today.
With the MIC amp, I would do some of the filtering already at that stage, all too easy with the OPAMPs you will be using to amplify the MIC anyways, to lower CPU processing.

#14 Reply
Posted by Richard Crowley on 18 Jul, 2017 20:41
Quote from: rstofer on 18 Jul, 2017 18:54
Way back in '54, Popular Electronics had an article about voice control of a model railroad. I was 9 years old, I certainly didn't understand how it worked... It only recognized "Stop" and "Go" but I believe it did it with filters to detect high frequency components. Or maybe it recognized the time between "sssss" and "TOP" versus "GUH" and "oh".
It worked by detecting the number of syllables. It was a pretty primitive peak detector controlling a relay.
Ref: http://www.americanradiohistory.com/Archive-Poptronics/50s/54/Pop-1954-12.pdf page 17
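
The same idea can be sketched in software, assuming digitized samples instead of the article's analog circuit: rectify the signal, follow its envelope with a fast attack and slow decay (roughly what a diode and RC network do), and count how many times the envelope rises through a threshold. The function name, the decay constant and the two levels are assumptions made for this sketch, not values from the 1954 design.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count envelope bursts ("syllables") in a block of samples.
 * on_level  : envelope must rise above this to start a burst
 * off_level : envelope must fall below this to end it (hysteresis,
 *             so one wobbly syllable is not counted twice)
 */
unsigned count_syllables(const int16_t *x, size_t n,
                         int32_t on_level, int32_t off_level)
{
    int32_t  env = 0;      /* peak-detector output */
    unsigned bursts = 0;
    int      active = 0;   /* nonzero while inside a burst */

    for (size_t i = 0; i < n; i++) {
        int32_t mag = x[i];
        if (mag < 0) mag = -mag;            /* full-wave rectify */

        if (mag >= env) {
            env = mag;                      /* fast attack */
        } else {
            env -= (env >> 6) + 1;          /* slow decay, about 1/64 per sample */
            if (env < mag) env = mag;
        }

        if (!active && env > on_level) {
            active = 1;
            bursts++;
        } else if (active && env < off_level) {
            active = 0;
        }
    }
    return bursts;
}
```

With one count per burst, a caller could pulse an output once per syllable, which is all the original box had to do. It says nothing about which word was spoken, only how many bursts it contained.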

#15 Reply
Posted by martys on 18 Jul, 2017 20:57
Thanks BrianHG, the idea of using a midrange 16-bit PIC looks like a first choice for a simple low-power solution with micro-power standby.

I also watched Dave's video on an amazingly simple but unreliable HW approach and I certainly think I can make something to work many times better.

Any other comments on my strategy?

#16 Reply
Posted by BrianHG on 19 Jul, 2017 20:11
One thing I can recommend: tune your audio amp filter and ADC sample rate to match something close to a phone line. Something like an audio bandwidth close to flat from 400 Hz to 3 kHz, meaning audio should begin around 200 Hz and roll off out around 6 kHz; sample at around 12 kHz. This should be readily understandable for MCU speech decoding, and anything else would just be excess noise you need to process around.

There do exist cheap tricks to achieve a fast, decent-quality spectrum analysis of the source audio using integer math only, basically many multiply-adds in a pipe, but I suspect that to get the quality you want, you will just need to go with a PIC which has the maximum RAM and program memory, so you aren't left squeezing code in at the last minute for trying to save $2-3 on a single PIC. That 1-bit input Motorola MCU probably sampled at 1 kHz for 1.5 seconds (my guess from looking at Dave's video), then sifted through that 1-bit pattern: 187 bytes of RAM, probably all they could muster on an MCU at the time. You'll be operating at, say, 8 bit; 2 seconds sampled at 8 kHz is 16 KB. If you do a real-time cheap FFT and retain 50 samples/second of 64 bands at 8 bit each, you are talking about 6.4 KB (call it 7 KB) for 2 seconds. Take a look at this guy; I chose it because it's in a DIP package, easy for testing:
http://www.microchip.com/wwwproducts/en/PIC24EP512GP202
512 KB flash, 48 KB RAM.
It also has 2x built-in op-amps which may be good enough for your MIC, plus they are wired to its internal comparators to wake up the MCU from sleep when there is sufficient noise. An all-in-one solution. Though, if you enable the internal op-amps and comparators, I can't say how little power the device will consume during sleep.
Also, remember, you most likely need to keep the MIC continuously powered as well.
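
As a hedged illustration of "integer math only, many multiply-adds", here is a sketch of the Goertzel algorithm, a well-known way to measure the energy at one frequency with a single multiply-add per sample. It is swapped in here as an example of the genre; the post does not say which trick was meant, and a full 64-band analysis would more likely use an integer FFT. The Q14 coefficients are 2*cos(2*pi*f/fs) precomputed for an assumed 8 kHz sample rate, and the ranges are sized for 8-bit samples in frames of about 64 samples.

```c
#include <stddef.h>
#include <stdint.h>

#define Q14_ONE 16384   /* fixed-point scale: 1.0 in Q14 */

/* 2*cos(2*pi*f/fs) in Q14 for fs = 8000 Hz, precomputed offline. */
#define COEFF_500HZ_Q14    30274
#define COEFF_1000HZ_Q14   23170
#define COEFF_2000HZ_Q14       0
#define COEFF_3000HZ_Q14 (-23170)

/* Feature storage from the estimate above: 50 frames/s, 64 bands, 2 s. */
enum {
    FRAMES_PER_SEC = 50,
    BANDS          = 64,
    SECONDS        = 2,
    FEATURE_BYTES  = FRAMES_PER_SEC * BANDS * SECONDS   /* 8 bits per band */
};

/*
 * Squared magnitude of one frequency bin over a frame of n samples.
 * One multiply and two adds per sample, no floating point. The division
 * by Q14_ONE is a power of two, so a compiler can reduce it to a shift.
 */
int32_t goertzel_power(const int8_t *x, size_t n, int32_t coeff_q14)
{
    int32_t s1 = 0, s2 = 0;

    for (size_t i = 0; i < n; i++) {
        int32_t s0 = (int32_t)x[i] + (coeff_q14 * s1) / Q14_ONE - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - ((coeff_q14 * s1) / Q14_ONE) * s2;
}
```

FEATURE_BYTES works out to 6400, in line with the roughly 7 KB quoted, so raw audio (16 KB) plus features would fit in the 48 KB of the suggested part with room left for working buffers. Longer frames or 12-bit samples would need wider accumulators than the 32-bit ones assumed here.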

#17 Reply
Posted by BrianHG on 19 Jul, 2017 21:02
Use this PIC if you want to use floating point processing:
http://www.microchip.com/wwwproducts/en/dsPIC33EP512GM706
It also has the hardware multiply-accumulate used in FFT decoding. But, judging that even a 7 MHz Amiga can do a realtime FFT in integer with a 68000, this 70 MIPS, single clock/multiply/add, floating point monster (by comparison) will leave you with MIPS to spare...

#18 Reply
Posted by Kjelt on 19 Jul, 2017 22:10
Yeah voice recognition
Youtube search on voice recognition fail
Cars with VR that fail, MS Cortana fail, those are systems with millions of R&D and they are mediocre at best.
Good luck.

https://youtu.be/NmWRhhvf60Y

#19 Reply
Posted by rstofer on 20 Jul, 2017 00:06
Quote from: Richard Crowley on 18 Jul, 2017 20:41
Quote from: rstofer on 18 Jul, 2017 18:54
Way back in '54, Popular Electronics had an article about voice control of a model railroad. I was 9 years old, I certainly didn't understand how it worked... It only recognized "Stop" and "Go" but I believe it did it with filters to detect high frequency components. Or maybe it recognized the time between "sssss" and "TOP" versus "GUH" and "oh".
It worked by detecting the number of syllables. It was a pretty primitive peak detector controlling a relay.
Ref: http://www.americanradiohistory.com/Archive-Poptronics/50s/54/Pop-1954-12.pdf page 17

And it relied on Lionel's remote control scheme which embedded a stepper relay in the locomotive. All the box is doing is recognizing a syllable and pulsing the track power which steps the stepper.

Did you notice how casually they treated 115V back in the day? I remember my father telling me he would rather I didn't work on line operated projects on his test bench. So, sure thing, the next day I was working on line voltage stuff.

Two prong power cord...

#20 Reply
Posted by dferyance on 20 Jul, 2017 16:09
You may want to consider getting speech recognition working on a PC first before running it on an MCU. This comes with some risk of difficulty porting your code over or performance tuning after the fact, but it also helps mitigate some of your risks. You will have better dev tools available on a PC and it lets you concentrate on tuning your algorithm or at least getting the basics working. Getting it all working on an MCU from the start combines the risk of your algorithm not working with the limitations of embedded tools and hardware all at once.

Quote
I don't want to use any stink'n pre-fab speech recog or synth IC's. I will use a mike, op-amps, active filters made with discrete components, etc. but this is a software challenge.

So it is harder to develop it on a PC if you want to be designing input filters as your connection to a PC will be different than on a MCU. However the "this is a software challenge" will be greatly helped by not having to deal with cross-compile, downloading to target, debugger and breakpoint limitations and such. Also if you manage to fail at getting speech recognition working reliably, at least you didn't waste extra time on these other problems that you didn't need to solve yet.

#21 Reply
Posted by Alex Eisenhut on 20 Jul, 2017 23:49
http://www.ebay.ca/itm/ASR-Speech-Recognition-LD3320-Professional-SP-Voice-Recognition-Voice-Module-/331748869120?hash=item4d3dc60400:g:VmEAAOSwcUBYVQhu

#22 Reply
Posted by rstofer on 21 Jul, 2017 14:53
Quote from: dferyance on 20 Jul, 2017 16:09

So it is harder to develop it on a PC if you want to be designing input filters as your connection to a PC will be different than on a MCU. However the "this is a software challenge" will be greatly helped by not having to deal with cross-compile, downloading to target, debugger and breakpoint limitations and such. Also if you manage to fail at getting speech recognition working reliably, at least you didn't waste extra time on these other problems that you didn't need to solve yet.

I wonder if a PC sound card plays into this. It is possible to model the filtering done by the card, an external microphone can be added and pre-filtering can be easily done. At some level, it should be pretty easy to capture the input. After all, sound card oscilloscopes are still around so code is out there somewhere.

Among others:
http://www.analog.com/en/analog-dialogue/articles/turning-pc-sound-card-into-sampling-oscilloscope.html

Then there is Matlab! Is there anything this tool can't do?
https://www.mathworks.com/company/newsletters/articles/developing-an-isolated-word-recognition-system-in-matlab.html

Sure, you can't bundle up Matlab in the base of a lamp but there must be something to learn. Not surprisingly, Matlab has a data acquisition toolbox that is aimed straight at this kind of thing.

Google 'Yule Walker' and read on... Many of the links head back to Matlab (or Mathworks) but there is a lot of information out there. Who knew that speech recognition is an exercise in truly ugly math? Looks like a good place to play with Fortran to me!

#23 Reply
Posted by martys on 22 Jul, 2017 01:10
Thanks again rstofer and also dferyance for your good advice.

I have been using a music editor program on my PC called "Cool Edit Pro" which records and displays the waveform of spoken words in great detail and has analytical tools, FFT and filters, effects, etc. and this helps me quite a bit to understand speech issues and with visual graphical feedback to get a sound understanding of the nuances and complexity of speech. This does help me a lot to know how to write the code to get a MCU to recognize a few words.

#24 Reply
Posted by martys on 22 Jul, 2017 01:24
Thanks again, BrianHG, your help is much appreciated and will help me organize my path to achieve my goals. I am carefully looking over the spec sheet for the PIC MCUs you recommend.

I can easily see how this project can stay green by allowing the CPU to wake up from ULP sleep a few times each second to take a listen.

I get the idea of adding a trigger word, said before a command, to wake the MCU (such as the word "lamp", which needn't be fully recognized but can cause the MCU to make a waking decision based just on the length window of the sound it hears).
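
The length-window wake decision could be sketched as a tiny state machine, assuming the MCU is polled (or woken by a comparator) at a fixed tick and told only whether sound is present. The 10 ms tick, the 250 to 600 ms window for a word like "lamp", and the names wake_detector_t and wake_detector_tick are all hypothetical values picked for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative timing, in ticks of an assumed 10 ms poll period. */
#define TRIGGER_MIN_TICKS 25   /* 250 ms: shorter is a click or cough   */
#define TRIGGER_MAX_TICKS 60   /* 600 ms: longer is conversation or TV  */
#define END_QUIET_TICKS   15   /* 150 ms of silence ends the burst      */

typedef struct {
    uint16_t loud_ticks;   /* loud ticks seen in the current burst   */
    uint8_t  quiet_ticks;  /* consecutive quiet ticks since the last */
} wake_detector_t;

/*
 * Call once per tick. Returns true exactly when a burst has just ended
 * and its length fell inside the trigger window, i.e. "wake up and
 * listen for a command". Short gaps inside a word do not end the burst.
 */
bool wake_detector_tick(wake_detector_t *d, bool sound_present)
{
    if (sound_present) {
        if (d->loud_ticks < UINT16_MAX) d->loud_ticks++;
        d->quiet_ticks = 0;
        return false;
    }
    if (d->loud_ticks == 0) return false;                  /* idle */
    if (++d->quiet_ticks < END_QUIET_TICKS) return false;  /* maybe a gap */

    bool in_window = d->loud_ticks >= TRIGGER_MIN_TICKS &&
                     d->loud_ticks <= TRIGGER_MAX_TICKS;
    d->loud_ticks = 0;
    d->quiet_ticks = 0;
    return in_window;
}
```

Because any sound of the right length passes, this only decides when to spend power on the real recognizer; it would not by itself prevent false responses.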
