I would like to add a speech recognition function to my C++ application, which I am developing in Xcode.
I did some hunting for speech recognition libraries, and these are the best candidates:
OpenEars
CMUSphinx
Voce
Nevertheless, none of these solutions seems satisfying, for several reasons (though I may be wrong about that).
My questions are:
Have you ever tried to use a speech recognition library in a C++ program with Xcode?
Do you have any advice about which library/framework to use?
If some work has already been done, would it be possible to get some basic sample code? (just to get started...)
Note: the speech recognition function I would like to create is very simple: 10 words (in English), each of which increments its own variable every time it is said and recognized. That's it.
Okay, after some searching I found out that Apple's Carbon API has a speech recognition framework (SpeechRecognition.h)!
The bad news is that it seems quite old, and the documentation and help available online are quite poor...
Does anyone have experience with this framework?
Thanks for your help!
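Whichever library ends up doing the recognition, the counting part described above is small. Here is a minimal sketch of it, assuming the recognizer hands you its best guess as a string. `KeywordCounter` and `onWordRecognized` are hypothetical names made up for this example and are not part of OpenEars, CMUSphinx, Voce, or the Carbon API.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>

// Keeps one counter per word of a small fixed vocabulary.
// The recognizer itself is out of scope here: onWordRecognized() is a
// hypothetical hook that you would call from whatever callback your
// chosen speech library provides.
class KeywordCounter {
public:
    static constexpr std::size_t kVocabularySize = 10;

    explicit KeywordCounter(const std::array<std::string, kVocabularySize>& words) {
        // Map each vocabulary word to the index of its counter.
        for (std::size_t i = 0; i < words.size(); ++i) {
            index_[words[i]] = i;
        }
    }

    // Increments the counter for `word`. Returns false (and changes nothing)
    // if the word is not in the vocabulary.
    bool onWordRecognized(const std::string& word) {
        auto it = index_.find(word);
        if (it == index_.end()) {
            return false;
        }
        ++counts_[it->second];
        return true;
    }

    // Returns how many times `word` has been recognized (0 if unknown).
    int count(const std::string& word) const {
        auto it = index_.find(word);
        return it == index_.end() ? 0 : counts_[it->second];
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::array<int, kVocabularySize> counts_{};  // value-initialized to zero
};
```

To use it, construct a `KeywordCounter` with your ten words, call `onWordRecognized(word)` from the library's result callback, and read the totals back with `count(word)`. Words outside the vocabulary are simply ignored.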
I have spent hours looking for this and can't figure it out. I have written a program that I would like to add voice recognition to. All it does is handle a few simple commands like time, date, things like that (it's just for fun). I know I have some form of SAPI on my computer, because I had to include sapi.h to get the voice synthesis to work (and that works fine, by the way), but I can't for the life of me figure out how to use the voice recognition.
It appears people have already asked about C++ voice recognition on here, so I apologize if this is just a duplicate, but none of the other questions seemed to answer mine. Perhaps I'm just missing something (I'm fairly new to C++, so it's very possible), but I could really use some help here.
Thanks a bunch!
----edit----
The code in the link has an issue on my computer: it can't find the file "atlbase.h", which of course causes all sorts of other problems (hopefully these will all be resolved once I fix the atlbase.h problem). I found this site, which offers an explanation that shows up on quite a few other sites and appears to work, but I don't know how to get to the file that everyone is changing.
https://answers.unrealengine.com/questions/12757/error-cannot-find-atlbaseh-when-compiling-in-vs201.html
Could someone please tell me where the file they're all changing is located?
I'm doing a project on text recognition. One of the main points is text-to-speech conversion after the recognition. Could you help me find a very simple, plain speech engine for a C++ Builder project? Everything I've found was not only very complicated, but also only suitable for MFC.
So, the problem is that I'd like to convert text to speech. No recognition, just simple conversion. Please share some information about this problem, or tell me where I should look it up.
Basically, I don't even know if I'm using the right term, so I'm sorry for any misunderstanding.
Microsoft's Speech API (SAPI) is implemented as a set of COM objects, and thus is usable in C++Builder projects with minimal effort.
Flite is a small text to speech engine written in portable C, with no specific OS dependencies and with a permissive open source license. You will need to create a C++ Builder project to build it, but other than that it should work just fine.
I am trying to get started with HTK. I grabbed a copy, compiled it, grabbed the book, and all went more or less fine: a little trouble here and there, but nothing serious.
Now, after reading the book and googling for quite a while, I do not see any documentation for the part that is essential for me: HTKLib. Everything is described in the smallest detail for all the HTK tool programs (scriptable command-line tools), but I cannot find a single example or tutorial on how to actually call the library.
Could anyone point me in the right direction?
The source code for the respective tools is included, but it would be rather cumbersome to have to extract the information for a reputable library by reading the source code... I would have expected a little more documentation, but maybe I simply overlooked it?
Any help is deeply appreciated,
Tom
edit:
I was trying to use HTK for computer vision purposes, not for NLP, and for that I required that I could link against it, and call it from within my code. Thanks for your replies.
Maybe ATK is more suitable for you. Here is the explanation from the ATK site:
"ATK is an API designed to facilitate building experimental applications for HTK. It consists of a C++ layer sitting on top of the standard HTK libraries."
In addition, Microsoft Research has another research tool here for training acoustic models. This includes a Visual Studio project for HTKLib and a set of C++ HTK wrappers, but it may only cover a subset of the HTK functionality and has licence restrictions.
I have not used it, but I do use the language modeling toolkit. I think the main intention is for you to use the command-line tools provided. I imagine they are very flexible tools that will enable you to build and test models. Why do you want to use the code?
Also what are you trying to do?
I'm a design student currently dabbling with Arduino code (based on C/C++) and Flash AS3. What I want to do is write a program with voice control input.
So, the program prompts the user to spell a word. The user spells out the word. The program recognizes whether this is right, adds one to a score if it's correct, and corrects the user if it's wrong. So I'm picturing a big list of words, each with an audio file of the word being read out, with the voice recognition part checking whether the reply matches the expected word.
Ideally I'd like to be able to interface this with an Arduino microcontroller, so that a physical output with a motor could be triggered in response as well.
The thing is, I'm not sure if I can make this program in Flash, in Processing (associated with Arduino), or if I need another C program-making program. I guess I need to download a good voice recognition program, but how can I interface it with anything else? Also, I'm on a Mac (not sure if this makes a difference).
I apologize for my cluelessness, any hints would be great!
-Susan
What you need is most likely not a speech recognition program. You are looking for a speech recognition library. You're probably not that familiar with programming yet, so the term may be unfamiliar. Basically, a library is an intermediate step between source code and a whole program: ready-made code that your own program calls.
In your case, you are really asking for a library that (1) does voice recognition and (2) works with Adobe Flash. Unfortunately, I can't find one with Google. What I did find are people who have tried, and their experiments (while short of what you need) are described by others as impressive. That suggests the technology isn't there yet.
It is probably easier to move the voice recognition to the Arduino. "Voice recognition Arduino" provides a lot of good hits in Google.
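Wherever the recognition ends up running, the scoring logic described in the question is independent of it. Here is a minimal sketch in C++ (the language Arduino code is based on), assuming the recognizer delivers the spelled letters as a plain string. `SpellingGame` and `submit` are hypothetical names made up for this example, and it uses the desktop C++ standard library, so it would need adapting to run on the Arduino itself.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lowercase copy of `text`, so that "CAT" and "cat" compare equal.
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Tracks the score of the spelling game.
// `spelled` is assumed to be the letters reported by the voice recognizer,
// already joined into one string (hypothetical input, not a real API).
struct SpellingGame {
    int score = 0;

    // Adds one to the score and returns true when the spelling matches the
    // target word (ignoring case). Returns false otherwise, in which case the
    // caller can show or play the correct spelling.
    bool submit(const std::string& target, const std::string& spelled) {
        if (toLower(target) == toLower(spelled)) {
            ++score;
            return true;
        }
        return false;
    }
};
```

A failed `submit` is also the point where you would trigger the correction, or drive the motor on the Arduino.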
I want to know about various techniques to do speech recognition and text to speech conversion.
Also, please let me know about any resources such as links, tutorials, ebooks, etc. on the subject.
Which is the most efficient technique to achieve it?
I'm going to answer the part about speech recognition (since I don't know much about text-to-speech):
This book, "Statistical Methods for Speech Recognition" is a classic that explains the mathematical foundations of statistical speech recognition, written by the founder of that area, Frederick Jelinek.
The most important concept you have to know is Hidden Markov Models. People have been using them in speech recognition for decades. A recent approach uses Conditional Random Fields, see the paper (PDF) and the associated software toolkit SCARF.
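To give a feel for what a Hidden Markov Model computes, here is a minimal sketch of Viterbi decoding, which finds the most likely sequence of hidden states for a sequence of observations. It is a toy illustration, not code from any of the toolkits mentioned. The `Hmm` struct and the `viterbi` function are names made up for this example, and it multiplies plain probabilities, whereas real recognizers work with log probabilities to avoid underflow on long inputs.

```cpp
#include <cstddef>
#include <vector>

// A discrete HMM. All names here are illustrative assumptions.
struct Hmm {
    std::vector<double> initial;                  // initial[s]       = P(first state is s)
    std::vector<std::vector<double>> transition;  // transition[i][j] = P(next is j | current is i)
    std::vector<std::vector<double>> emission;    // emission[s][o]   = P(observing o | state s)
};

// Returns the most likely hidden state sequence for `observations`.
std::vector<std::size_t> viterbi(const Hmm& hmm,
                                 const std::vector<std::size_t>& observations) {
    const std::size_t numStates = hmm.initial.size();
    const std::size_t length = observations.size();
    if (numStates == 0 || length == 0) {
        return {};
    }

    // best[t][s]: probability of the best path that ends in state s at time t.
    // back[t][s]: the state at time t-1 on that best path.
    std::vector<std::vector<double>> best(length, std::vector<double>(numStates, 0.0));
    std::vector<std::vector<std::size_t>> back(length, std::vector<std::size_t>(numStates, 0));

    for (std::size_t s = 0; s < numStates; ++s) {
        best[0][s] = hmm.initial[s] * hmm.emission[s][observations[0]];
    }

    for (std::size_t t = 1; t < length; ++t) {
        for (std::size_t j = 0; j < numStates; ++j) {
            double bestScore = -1.0;
            std::size_t bestPrev = 0;
            for (std::size_t i = 0; i < numStates; ++i) {
                const double score = best[t - 1][i] * hmm.transition[i][j];
                if (score > bestScore) {
                    bestScore = score;
                    bestPrev = i;
                }
            }
            best[t][j] = bestScore * hmm.emission[j][observations[t]];
            back[t][j] = bestPrev;
        }
    }

    // Pick the most likely final state, then follow the back pointers.
    std::size_t last = 0;
    for (std::size_t s = 1; s < numStates; ++s) {
        if (best[length - 1][s] > best[length - 1][last]) {
            last = s;
        }
    }
    std::vector<std::size_t> path(length);
    path[length - 1] = last;
    for (std::size_t t = length - 1; t > 0; --t) {
        path[t - 1] = back[t][path[t]];
    }
    return path;
}
```

In a speech recognizer the hidden states would roughly correspond to parts of phonemes and the observations to acoustic feature vectors, with continuous emission densities instead of the small lookup table used here.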
It is fairly hard to write your own speech recognizer. It's an active research area with several scientific conferences, e.g. ASRU, Interspeech, ICASSP.
Both are very wide areas.
About recognition: In this schema you will find how to build a basic automatic speech recognition system. It isn't by any means close to the state of the art, but it is achievable and it works. If you want to do something more advanced, read about cepstral coefficients and Hidden Markov Models. Have a look at HTK, a widely used toolkit for Hidden Markov Models.
About text to speech: I'd have a look at Festival.
There are multiple Sphinx versions. The main active ones are pocketsphinx and sphinx4.
Sphinx4 is written in Java. It is better for desktop and web applications.
Pocketsphinx is written in C. It is better for embedded devices. There are iPhone/Android apps that use it.
Sounds like you want pocketsphinx. Try out this tutorial:
http://www.speech.cs.cmu.edu/sphinx/tutorial.html
A better place to ask pocketsphinx/sphinx4 questions is on CMU's sourceforge forum.
Also, you should provide more information, such as what you intend to make.
As for books, the bible of speech recognition is "Spoken Language Processing".
Since you mentioned MS -
You should just look at the Microsoft Speech site. It contains many resources for dealing with speech, including TTS and speech recognition.
If you're looking for some actual code, check out Sphinx, an open source speech recognition project from CMU. It's not written in C++, but if you're interested in algorithms, it implements a bunch of stuff you can learn from. (I'd like to echo @dehmann's point, too: read up on Hidden Markov Models.)
If you are curious about what to do with your fancy speech recognition, you should read:
Voice Interaction Design by Randy Allen Harris
It provides some great advice about when to use Voice and how to use it in an application.