Google Dialogflow - Asking a customer for a unique number - google-cloud-platform

I am trying to ask a customer for a unique number. I have tested this using the test console, and it comes up with multiple variations without giving a value.
The numbers are a mix of 4/6/8 digits. I want a customer to be able to say 'my plan number is 12345678' and to be able to capture that value and work with it.
What parameters/system entities should I be using to get a result? Often it will miss a digit, insert a hyphen, etc.
P.S. This is using voice only, not text.

There's a feature called Auto speech adaptation that will help in this specific case. After enabling it, check point 5 in the Example speech recognition improvements. It explains how to use auto speech adaptation with Regexp entities to capture digit sequences, and it gives you a regular expression you can use. It also recommends using the @sys.number-sequence entity.
The enhanced speech models can also help with number recognition accuracy, but bear in mind that they are still a beta feature.
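As an illustration (not the exact expression from the docs), a regexp entity for 4-, 6-, or 8-digit plan numbers could alternate the three lengths; stripping hyphens and spaces from the transcript before matching also guards against the formatting artifacts mentioned in the question:

```python
import re

# Hypothetical pattern for 4-, 6-, or 8-digit plan numbers.
# Longest alternative first, anchored, so "123456" is not matched as "1234".
PLAN_NUMBER = re.compile(r"^(\d{8}|\d{6}|\d{4})$")

def extract_plan_number(transcript):
    """Normalize a spoken-number transcript and try to match a plan number."""
    digits = re.sub(r"[\s-]", "", transcript)  # drop hyphens and spaces
    m = PLAN_NUMBER.match(digits)
    return m.group(1) if m else None

print(extract_plan_number("1234-5678"))  # hyphen stripped -> "12345678"
print(extract_plan_number("123 456"))    # space stripped -> "123456"
print(extract_plan_number("12345"))      # 5 digits -> None
```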
For reference you can also check the article Improving speech recognition for contact centers in the Google Cloud Blog.

Related

Autocomplete Per-Character Usage

I had a few questions about the changes to Google Maps/Routes/Places that happened a couple of weeks ago.
My app calls places.PlacesService.GetAutocompletePredictions and, at the moment, I can see my "request" usage in the Google Cloud Platform. However, the billing switched to "per-character" and I could not find my usage by that metric. Any idea how to view it?
Is there a way to see my usage for the past few months?
How does this "per-character" billing work? Is it the number of characters of the prediction that I pick, or the number of characters predicted at the end of my pre-filled phrase? Google doesn't seem to want to make their pricing particularly clear.

How to disable auto correction for Google Cloud Speech to Text API

Is there a way to disable auto correction for the Google Cloud Speech-to-Text API? It is important for me to get an accurate transcript of the user's speech, with any errors they make, rather than a corrected version.
It is difficult to distinguish between mistakes made by the speaker (grammar/pronunciation errors) in the audio content and mistakes made by the Speech API. However, you can check different versions of the text output predicted by the model behind the scenes with the help of the maxAlternatives property of the API.
You have not provided an example of such a use case, but if you are already expecting unusual pronunciations or acronyms, you can provide a hint to the request using the phraseHint property.
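In the v1 REST API these two properties correspond to the `maxAlternatives` field and the phrase list under `speechContexts`; a sketch of a request body (the bucket URI and phrases are placeholders):

```python
import json

# Sketch of a Speech-to-Text v1 REST request body combining both properties.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        # Return up to 5 candidate transcripts instead of only the top one.
        "maxAlternatives": 5,
        # Phrase hints bias recognition toward expected unusual terms.
        "speechContexts": [{"phrases": ["NGINX", "kubectl", "SRE"]}],
    },
    "audio": {"uri": "gs://your-bucket/your-audio.flac"},  # placeholder URI
}

print(json.dumps(request_body, indent=2))
```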
Please provide further details if it doesn't answer your question.

Google Cloud Vision API - TEXT_DETECTION

When I try to recognize text in an image, such as the Italian word "Perchè", the Vision API returns "Perche" (it gives back "e" instead of the correct "è").
I don't want to use languageHints to try to obtain better results, because I have to do OCR recognition across different languages.
What is the problem here?
This is a known issue with the Cloud Vision API when you don't use language hints.
You can see the actual bug report here.
It is in the Accepted state, but there has been radio silence on it for the last few months. It may take some time to roll out a fix.

Getting the amplitude (or RMS voltage) of an audio signal captured in C++ with the waveIn lib?

I am working on a very basic robotics project and wish to implement voice recognition in it.
I know it's a complex thing, but I wish to do it for only 3 or 4 commands (or words).
I know that using waveIn I can record audio, but I wish to do real-time amplitude analysis on the audio signal. How can that be done? The wave will be input as 8-bit, mono.
I have thought of dividing the signal into sets of some specific duration, further dividing each into smaller subsets, getting the average RMS value over each subset, summing them up, and then seeing how different they are from the actual stored signal. If the error is below an accepted value for all (or most) of the sets, then print the word.
How can this be implemented?
If you can provide any other suggestion, that would be great too.
Thanks in advance.
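The per-frame RMS computation described in the question can be sketched as follows (8-bit mono PCM samples are unsigned, with silence at 128, so they are re-centered first; the frame size is an assumption):

```python
import math

def frame_rms(samples, frame_size=160):
    """Split 8-bit unsigned mono samples into frames and return each frame's RMS.

    8-bit PCM is unsigned with silence at 128, so samples are re-centered.
    frame_size=160 corresponds to 20 ms at an assumed 8 kHz sample rate.
    """
    rms_values = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [s - 128 for s in samples[start:start + frame_size]]
        rms_values.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return rms_values

print(frame_rms([128] * 160))     # constant mid-level signal -> [0.0]
print(frame_rms([178, 78] * 80))  # square wave at +/-50 -> [50.0]
```

Comparing these per-frame RMS profiles against stored templates is the matching step the question proposes.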
There is no simple way to recognize words, because they are basically sequences of phonemes which can vary in time and frequency.
Classical isolated-word recognition systems use the signal's MFCCs (mel-frequency cepstral coefficients) as input data and try to recognize patterns using HMMs (hidden Markov models) or DTW (dynamic time warping) algorithms.
You will also need a silence-detection module if you don't want a record button.
For instance, the Edinburgh University toolkit provides some of these tools (with good documentation).
If you don't want to build it from scratch, or want a source of inspiration, here is an (old but free) implementation of such a system (which uses its own toolkit), with a full explanation and practical examples of how it works.
That system is an LVCSR (large-vocabulary continuous speech recognition) system, and you only need a subset of it. If someone knows an open-source reduced-vocabulary system (like a simple IVR), it would be welcome.
If you want to build a basic system on your own, I recommend using MFCC and DTW:
For each target word to model:
record some instances of the word
compute delta-MFCCs (e.g. every 10 ms) through the word to get a model
When you want to recognize a signal:
compute the delta-MFCCs of this signal
use DTW to compare these delta-MFCCs to each modeled word's delta-MFCCs
output the word that fits best (use a threshold to drop garbage)
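The DTW comparison step above can be sketched as follows; this operates on any per-frame feature vectors (such as the delta-MFCCs), and the toy one-dimensional "models" and threshold are illustrative only:

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Each element of seq_a/seq_b is a list of floats (one frame's features).
    Returns the minimum cumulative Euclidean frame distance over all alignments.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def recognize(signal_feats, word_models, threshold):
    """Return the best-matching word, or None if nothing is close enough."""
    best = min(word_models, key=lambda w: dtw_distance(signal_feats, word_models[w]))
    return best if dtw_distance(signal_feats, word_models[best]) < threshold else None

# Toy 1-D feature sequences standing in for real delta-MFCC frames.
models = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]}
print(recognize([[0.1], [1.1], [2.1]], models, threshold=1.0))  # "yes"
```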
If you just want to recognize a few commands, there are many commercial and free products you can use. See Need text to speech and speech recognition tools for Linux, What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?, or Speech Recognition on iPhone. The answers to these questions link to many available products and tools. Speech recognition and understanding of a list of commands is a very common problem, solved commercially; many of the voice-automated phone systems you call use this type of technology, and the same technology is available to developers.
From watching these questions for a few months, I've seen most developer choices break down like this:
Windows folks - use the System.Speech features of .NET or Microsoft.Speech and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine, and others are downloadable for free. There is a C++ API to the same engines, known as SAPI. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/
Commercial products - Nuance, Loquendo, AT&T, others
Online services - Nuance, Yapme, others
Of course, this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software

In need of a SaaS solution for semantic thesaurus matching

I'm currently building a web application. In one of its key processes, the application needs to match short phrases to other similar ones available in the DB.
The application needs to be able to match the phrase:
Looking for a second hand car in good shape
To other phrases which basically have the same meaning but use different wording, such as:
2nd hand car in great condition needed
or
searching for a used car in optimal quality
The phrases are length-limited (say 250 chars), user-generated, and unstructured.
I need a service / company / some solution which can make, or help with, these connections for me.
Can anyone give any ideas?
Have you looked at SAS Text Miner? It may be suited to this kind of application. I have only seen a demo of it, but it was able to tokenize the data just fine. You may need some custom programming around the synonyms, though.
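As a hypothetical illustration of the kind of custom synonym handling mentioned above (the synonym map and scoring are placeholders, not part of any product; real semantic matching would need much more than this):

```python
# Map synonymous tokens to a canonical form, then score token-set overlap.
SYNONYMS = {
    "2nd": "second-hand", "used": "second-hand", "second": "second-hand",
    "searching": "looking", "needed": "looking", "great": "good",
    "optimal": "good", "condition": "shape", "quality": "shape",
}

def normalize(phrase):
    """Lowercase, tokenize, and canonicalize synonyms into a token set."""
    return {SYNONYMS.get(t, t) for t in phrase.lower().split()}

def jaccard(a, b):
    """Jaccard similarity of two phrases' normalized token sets (0.0-1.0)."""
    sa, sb = normalize(a), normalize(b)
    return len(sa & sb) / len(sa | sb)

q = "Looking for a second hand car in good shape"
print(jaccard(q, "2nd hand car in great condition needed"))      # high
print(jaccard(q, "searching for a used car in optimal quality")) # high
```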