Dialogflow cuts off number input at 3-4 digits - regex

I'm currently making a voice assistant and I'm running into the following issue:
The voice detection of Google Dialogflow severely butchers numeric inputs spoken into the assistant.
For example, when I clearly pronounce "one, nine, eight, seven, two, three", it turns it into 987 or 1987. It seems to just cut off listening and continue straight away once it thinks it has a full entity.
I have made a custom composite entity that is built up out of three different recognition patterns.
#sys.number-integer:number-integer
NumberRegex ^([1-9]{1}([0-9]){1,4}[0-9]{1})$
NumberCardinals #sys.cardinal:cardinal (repeating 3-6 times as composite entity)
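For reference, the NumberRegex pattern above can be exercised on its own; a minimal sketch in plain Python (the `{1}` quantifiers are redundant, so the pattern reduces to three to six digits without a leading zero):

```python
import re

# Equivalent to ^([1-9]{1}([0-9]){1,4}[0-9]{1})$ with the redundant
# {1} quantifiers dropped: a 3-6 digit number with no leading zero.
NUMBER_RE = re.compile(r"[1-9][0-9]{2,5}")

def is_valid_number(text: str) -> bool:
    """True for 3-6 digits not starting with 0."""
    return NUMBER_RE.fullmatch(text) is not None
```

Note this only validates the final transcript; it cannot by itself stop the speech endpointing from cutting off early.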
Basically, what I want to detect is a numeric input consisting of a minimum of 3 digits and a maximum of 6.
Typing works great: it detects all combinations flawlessly, whether it's cardinals or numbers...
But speech is just a huge problem, and it cuts off before the user has finished speaking.
Does anyone have suggestions on how to overcome this, and force Dialogflow to listen for the maximum amount of digits?


Algorithm to rank the simplicity of a random name

I have been looking for a name for a new project. I want the name to have available domains and social media handles. For months, all those I can think of are taken.
So I generated a list of names with at least a consonant and a vowel and checked if the domains are available (which is very fast). I have about a million possible names.
I would like to sort them by some rank of simplicity. "Aaazq" would be close to the bottom, "Cawel" would be close to the top. I thought of the CVC structure (Consonant-Vowel-Consonant) and wonder if some more sophisticated algorithm exists. I searched for "sonority" but it has a different meaning in linguistics.
How can I automatically rank the simplicity of a random name?
I assume you would judge simplicity as compared to a target language, say English. Something that is 'simple' in English might not be 'simple' in German or Korean, as these languages have very different phonological structures.
I would recommend the following:
Collect some data of the language you are using. Just get some novels from Project Gutenberg, for example, or newspaper articles. Whatever you can easily get hold of.
Now generate n-grams from this: all sequences of two (bigrams) or three (trigrams) letters. Turn this into a frequency list, so that common n-grams are at the top of the list with a high frequency.
Turn your suggested name into n-grams. Count how many times each n-gram occurs in your frequency list, and take the average or median of the result.
Your examples would do as follows:
"aa", "aa", "az", "zq": "aa" is rare ("aardvark"), "az" a bit more common ("glaze", "raze"), and "zq" would not exist. So, not a very high score.
ca aw we el: all of these are fairly common in English words, so a reasonably high score.
You could also add a dummy # at the beginning and the end, so in your first example you'd get "#a", which is fine, as many English words start with "a", but the final "q#" bombs out, as only words such as "Iraq" end in a "q".
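The steps above can be sketched roughly as follows; the tiny word list here stands in for a real corpus, so this illustrates the scoring rather than being a tuned implementation:

```python
from collections import Counter

def bigrams(word):
    """Bigrams of a word, padded with '#' at both ends."""
    padded = "#" + word + "#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def build_freqs(corpus):
    """Frequency table of bigrams over a corpus of words."""
    counts = Counter()
    for word in corpus:
        counts.update(bigrams(word.lower()))
    return counts

def simplicity(name, freqs):
    """Average corpus frequency of the name's bigrams: higher = simpler."""
    grams = bigrams(name.lower())
    return sum(freqs[g] for g in grams) / len(grams)
```

With any reasonably English-like corpus, "Cawel" should score well above "Aaazq", since "zq" and "aa" are rare or absent.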
You can obviously do the same for other languages.
Also, you can reverse the process in a way, and pick random n-grams from your frequency list to generate names: by picking higher-frequency n-grams you will make sure the name is a good match with the phonological structure of your target language.
Note for pedants: I use phonological structure, but it's really its representation in the spelling system that we're dealing with here.

Regex to identify Store Credit Card numbers

There are very detailed regex patterns to identify Visa, MasterCard, Discover and other popular credit card numbers.
However, there are tons of other credit cards, popularly termed Store Credit Cards (these are not the Visa- or Amex-powered cards). Examples of these cards are Amazon, GAP brands, Williams Sonoma, Macy's and so on. Most of these are Synchrony Bank credit cards.
Is there a regex to identify these different brand credit card numbers?
It's ludicrous to use a regex to identify the network; all it takes is, at most, a prefix match.
A card number typically has 16 digits. The first few identify the network and the issuing bank.
Some people would say that Visa starts with 4 and MasterCard starts with 5, but that's a broad approximation at best. You can have a look at your own card; it should be right most of the time.
It would be easy to figure out what a card is if one could get a registry of known prefixes, but to my knowledge there is no public registry. I highly doubt that any of the parties involved would want to publish that information.
The first eight digits (until recently, six) of an international card number are known as the Issuer Identification Number (IIN), and the registry that maintains this index is the American Bankers Association.
The list of IINs is updated monthly and spans tens of thousands of rows. Unfortunately, a fixed regex isn't going to stay accurate for any length of time.
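To illustrate the prefix-match point: a longest-prefix lookup plus a Luhn checksum is about all the structure a card number offers. The prefix table below is a made-up placeholder, not real IIN data (which must be obtained from the registry):

```python
# Hypothetical prefix table -- real IIN data comes from the ABA-maintained registry.
PREFIXES = {
    "4": "Visa-like",
    "51": "MasterCard-like",
    "6011": "Discover-like",
}

def network_for(number: str) -> str:
    """Longest-prefix match against the (made-up) table."""
    for length in range(len(number), 0, -1):
        brand = PREFIXES.get(number[:length])
        if brand:
            return brand
    return "unknown"

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

The Luhn check only catches typos; it says nothing about which brand issued the card.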

Skip gram in word2vec - what is the number of outputs

The following image is often presented to describe the word2vec model with skip-gram:
However, after reading this discussion on Stack Overflow, it seems that word2vec actually takes 1 word as input and 1 word as output. The output word is randomly sampled from the window. (And this is performed X times to generate X input/output pairs.)
It seems to me then that the above image is not correctly describing the network. My question is: is the 1 input/1 output standard (the Tensorflow word2vec tutorial takes this approach and calls it skip-gram) or do some networks actually take the structure of the above image?
It's not a great diagram.
In CBOW, those converging arrows are an averaging that happens all-at-once, to create one single 'training example' (desired prediction) that is (average(context1, context2, ..., contextN) -> target-word). (In practice averaging is more common than the 'SUM' shown in the diagram.)
In Skip-Gram, those diverging arrows are multiple training examples (desired predictions) made one-after-the-other.
And in both diagrams, while they look a bit like neural-net node-architectures, the actual hidden-layer and internal-connection weights are just implied inside the middle-column-to-right-column arrows.
Skip-gram is always 1 "input" context word used to predict 1 nearby (within the effective 'window') "output" target word.
Implementations tend to iterate through the whole effective window, so every (context -> target) pair gets used as a training-example. And in practice, it doesn't matter if you consider the central word the target-word and each word around it to be context-words, or the central word the context-word and each word around it to be target-words – both methods result in the exact same set of (word -> word) pairs being trained, just in a slightly different iteration order. (I believe the original Word2Vec paper described it one way, but then Google's released code did it the other way for reasons of slightly-better cache efficiency.)
In fact the effective window, for each central word considered, is chosen to be some random number from 1 to the configured maximum window value. This turns out to be a cheap way of essentially weighting nearer-words more: the immediate neighbors are always part of training-pairs, further words only sometimes. That is, pairs are not randomly sampled from the whole window - it's just a random window size. (There's another down-sampling where the most-frequent words will be randomly dropped so as not to overtrain them at the expense of less-frequent words, but that's a totally separate process not reflected in the above.)
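The pair generation described above (one random effective window per central word) can be sketched as follows; the function name is mine, not from any particular implementation:

```python
import random

def skipgram_pairs(tokens, max_window=5, rng=random):
    """Yield (center, context) training pairs with a random effective
    window per central word, as in the original word2vec code."""
    pairs = []
    for i, center in enumerate(tokens):
        window = rng.randint(1, max_window)  # random reduced window
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Because the effective window is always at least 1, immediate neighbors appear in every pass, while more distant words are only sometimes included, which is exactly the cheap distance-weighting effect.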
In CBOW, instead of up-to 2*window input-output pairs of the (context-word -> target-word) form, there's a single input-output pair of (context-words-average -> target-word). (In CBOW, a loop creates the average value for a single N:1 training-example for one central word, and then splits the backpropagated error across all contributing words. In skip-gram, a loop creates multiple alternate 1:1 training-examples for one central word.)

Speech Recognition for small vocabulary (about 20 words)

I am currently working on a project for my university. The task is to write a speech recognition system that is going to run on a phone in the background, waiting for a few commands (like "call 0 123 ...").
It's a 2-month project, so it does not have to be very accurate. The amount of acceptable noise can be small, and words will be separated by moments of silence.
I am currently at the point of loading a sample word encoded in RAW 16-bit PCM format, splitting it into chunks (about 50 per second) and running an FFT on each chunk in order to get the frequency spectrum.
Things to solve are:
1) going through the longer recording and splitting it into words,
2) finding the best match for a word.
1) I was thinking about just checking chunk after chunk, and if I encounter a few chunks that have higher amplitudes at human-voice frequencies, assuming that a word has started. Anyway, I am looking for resources that may help with this.
2) This one seems a little bit tougher. Is it necessary to use HMMs for a system like this, or are there simpler methods, given that the vocabulary is so small (20 words)?
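For concreteness, the chunk-energy idea in 1) could be sketched like this; the threshold and minimum-length values are made-up knobs that would need tuning on real recordings:

```python
def find_words(energies, threshold, min_chunks=2):
    """Return (start, end) chunk ranges where energy stays above threshold.

    `energies` holds one value per FFT chunk, e.g. the summed magnitude
    in the human-voice frequency band."""
    words, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i  # a word may be starting here
        else:
            if start is not None and i - start >= min_chunks:
                words.append((start, i))  # long enough run: record it
            start = None
    if start is not None and len(energies) - start >= min_chunks:
        words.append((start, len(energies)))  # word runs to the end
    return words
```

The `min_chunks` guard discards isolated noisy chunks so a single loud click is not mistaken for a word.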
Edit:
The point of the project is writing the system on my own so I cannot use ready libraries like Sphinx or HTK.
Regards,
Karol
If anybody has the same question in the future: look for 2 main keywords:
MFCC - Mel-frequency cepstral coefficients, to calculate a series of coefficients for each word template
DTW - dynamic time warping, to match the captured word against the templates
A good enough description of DTW can be found on Wikipedia.
This approach was good enough to reach around 80% accuracy on a 20-word dictionary and give a good demo during the class.
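A minimal DTW over per-chunk feature values looks like this; real use would compare MFCC vectors with a Euclidean cost, but scalars keep the sketch short:

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] and b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a frame of a
                                 D[i][j - 1],      # skip a frame of b
                                 D[i - 1][j - 1])  # match frames
    return D[n][m]
```

Classifying a captured word is then just picking the template with the smallest DTW distance, which tolerates words spoken faster or slower than the template.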
To recognize commands on the phone you can use Pocketsphinx. A tutorial covering speech recognition applications on Android is available on the CMUSphinx website.

calculating "levenshtein social network" *very* efficiently

I'm doing an online code challenge that involves finding the 'social network' of words related through their Levenshtein distances. My Levenshtein function is correct. I'm recursively adding to a global set, and I'm using a map from tuples to boolean values to cache whether any pair of words has a Levenshtein distance of 1. The code is supposed to terminate in 5 seconds, and I'm not sure how that is even close to possible. I'm sure there is some aha insight that makes it possible. Can anyone see it right off the bat?
Problem Statement:
Two words are friends if they have a Levenshtein distance of 1. That is, you can add, remove, or substitute exactly one letter in word X to create word Y. A word’s social network consists of all of its friends, plus all of their friends, and all of their friends’ friends, and so on. Write a program to tell us how big the social network for the word 'hello' is, using this word list
My pseudocode:
def get_network(word):
    if word not in network:
        network.add(word)
        friends = []
        # compare against every word in the word list, not just the network
        for other in word_list:
            pair = tuple(sorted((word, other)))
            if pair not in cache:
                cache[pair] = levenshtein(word, other) == 1
            if cache[pair]:
                friends.append(other)
        for friend in friends:
            get_network(friend)
To rephrase the question: what is the fundamental insight that makes an astronomical boost in efficiency possible?
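For what it's worth, one well-known trick (not necessarily *the* intended one) is to stop comparing pairs altogether: generate every string within edit distance 1 of the current word and intersect it with a hash set of the word list, then BFS outward. For a word of length L over a 26-letter alphabet that is O(26·L) candidates per word instead of O(n) Levenshtein calls. A sketch under those assumptions:

```python
from collections import deque
from string import ascii_lowercase

def neighbors(word, words):
    """All dictionary words at Levenshtein distance exactly 1 from `word`."""
    candidates = set()
    for i in range(len(word) + 1):
        for c in ascii_lowercase:
            candidates.add(word[:i] + c + word[i:])      # insertions
    for i in range(len(word)):
        candidates.add(word[:i] + word[i + 1:])          # deletions
        for c in ascii_lowercase:
            candidates.add(word[:i] + c + word[i + 1:])  # substitutions
    candidates.discard(word)
    return candidates & words

def network_size(start, words):
    """BFS over the friendship graph, counting reachable words."""
    seen, queue = {start}, deque([start])
    while queue:
        for friend in neighbors(queue.popleft(), words):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return len(seen)
```

The recursion and the pairwise cache both disappear: membership in the word set does the distance test implicitly.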