I want to record a word beforehand and when the same password is spoken into the python script, the program should run if the spoken password matches the previously recorded file. I do not want to use the speech recognition toolkits as the passwords might not be any proper word but could be complete gibberish. I started with saving the previously recorded file and the newly spoken sound as numpy arrays. Now I need a way to determine if the two arrays are 'close' to each other. Can someone point me in the right direction for this?
It is not practical to compare two speech samples at the sample level (in the time domain). Each part of the spoken word may vary in length, so the parts won't line up, and the level of each part will vary as well. Another problem is that the phase of the individual components that make up the sound signal can change too, so two signals that sound the same can look very different in the time domain. The best solution is therefore likely to move the signal into the frequency domain. One common way to do this is the Fast Fourier Transform (FFT). There is a lot of material about it on the net, and Python has good support for it.
Then you could proceed like this:
Divide the sound sample into small segments of a few milliseconds.
Find the principal FFT coefficients of each segment.
Compare the sequences of some selected principal coefficients.
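The three steps above could be sketched roughly as follows. This is only a minimal illustration, assuming NumPy and mono signals at the same sample rate. The function names, the 20 ms frame length, and the choice of "strongest FFT bin per frame" as the principal coefficient are assumptions made for the example, not something prescribed above.

```python
import numpy as np

def dominant_frequencies(signal, rate, frame_ms=20):
    """Return the strongest frequency (in Hz) of each short frame of the signal."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    # Step 1: cut the signal into equal frames, dropping the incomplete tail.
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    # Step 2: magnitude spectrum of each frame (Hann window reduces leakage).
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    peak_bins = np.argmax(spectra, axis=1)
    return peak_bins * rate / frame_len

def mean_frequency_gap(a, b):
    """Step 3: average difference in Hz between two per-frame sequences."""
    n = min(len(a), len(b))  # naive alignment: compare only the overlapping frames
    return float(np.mean(np.abs(a[:n] - b[:n])))

# Synthetic tones stand in for real recordings here.
rate = 8000
t = np.arange(rate) / rate
stored = np.sin(2 * np.pi * 400 * t)
spoken = np.sin(2 * np.pi * 400 * t + 1.0)  # same tone, different phase
other = np.sin(2 * np.pi * 800 * t)

gap_same = mean_frequency_gap(dominant_frequencies(stored, rate),
                              dominant_frequencies(spoken, rate))
gap_other = mean_frequency_gap(dominant_frequencies(stored, rate),
                               dominant_frequencies(other, rate))
```

The phase shift that would wreck a sample-by-sample comparison has no effect on the magnitude spectrum, so `gap_same` stays near zero while `gap_other` is large. Real speech also varies in timing, which this frame-by-frame comparison does not handle, so the acceptance threshold would have to be tuned on actual recordings.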
We have Google Natural AI integrated into our product for Sentiment Analysis (https://cloud.google.com/natural-language). One of the customers complained that when they write "BAD" then it shows a positive sentiment.
On further investigation, we found that when the Google Natural Language API's sentiment analysis is called with the input BAD or Bad (note the all caps or initial capital), it identifies the text as an entity (a location or consumer good) and returns a positive result, whereas when we write "bad" in all lowercase, it returns negative.
Has anyone faced a similar problem? How did you solve it?
One obvious fix is to convert the text to lowercase, but that may break other use cases (for example, entities might no longer be recognized in lowercase text). Another approach, which we are building, is to check our own dictionary of words with sentiments before calling the Google APIs, but that doesn't solve the underlying problem, which may occur with any other text.
Inputs will help us. Thank you!
The Natural Language API uses an underlying model that is neural in nature. Its knowledge comes from training on real-world text. It is normal to get different results for different capitalizations, as they can correspond to different uses of the same word, e.g. Mike (person), mike (microphone, slang), MIKE (military alphabet entry).
The second key aspect is that the model is tuned for, and meant to be used on, larger pieces of text rather than single words, so good results cannot be expected in this case.
Let me start by giving a quick background on myself (please forgive me). I have an intense interest in programming and computers/technical things in general. I took a year of C/C++ in college and a semester of assembly. I have messed around with Visual BASIC. So, almost all of my programming knowledge is limited to these three languages in order of proficiency:
C/C++
Assembly
Visual BASIC
I have a job at a small business that can't justify hiring a trained/"certified" programmer where I have tasked myself with automating a process that must be completed on a monthly basis. It involves:
Sending faxes that are to be filled out with numbers
Receiving those faxes that are returned (all incoming faxes go to network folder as PDF)
Collecting the numbers from received faxes and entering these numbers into Excel (some are Word format for some reason) and then into QuickBooks after calculations
Sending emails
Receiving replies to these emails that contain numbers
Manually entering these numbers into Excel and then QuickBooks after calculations
Collecting numbers from a website written in JavaScript. The numbers can be exported to a *.csv file.
Finally, printing invoices out from QuickBooks using the calculated numbers that have been entered.
My goal is to automate this entire process. As of now, everything is done manually. Emails and faxes are sent one at a time. Numbers from website are read and entered into Excel one at a time. Numbers are put into QB and invoices are printed one at a time.
So far I have added an email scheduling add-on to Outlook that automatically sends the emails every month. I am working on setting up faxes to be sent automatically (the only thing I can think of off the top of my head is manipulating Windows Scan/Fax with API library in either VB or VC++).
Also, I am automating the calculations that must be performed in order to prep the collected numbers for entry into QB using VBA/Excel and, potentially, Access.
Right now I'm brainstorming a way to automatically collect the numbers (along with the customer name) from the returned faxes. My idea was to create a new fax sheet that forces the customer to "bubble in" the numbers like a ScanTron sheet. This way I could write a program (perhaps in C++) that parses the PDF, looking for a certain colored pixel in a specific spot, in order to piece together the number. (I wonder if I could automatically OCR the PDFs and collect the customer name simply by extracting text from each PDF?) The number could then be sent to a database or perhaps directly to an Excel sheet (the Excel sheets have to stay so that hard copies of the data can be printed--though I suppose this could be accomplished without Excel).
And lastly, since some customers refuse to use any of those methods available to them, we have to manually call some of them. Once I am finished with all of the aforementioned work I would like to develop a way to allow customers to call a specific phone number and key in the information via voice prompt which would then deposit the information in database somewhere. This will be complicated and require special equipment so it will be last and lowest priority. Not worried about this right now.
Since my experience with programming is only moderate (though I'm sure my working knowledge will expand quickly once I get started since a lot of it is already in my brain somewhere) I wanted to give myself the best advantage and tools possible to tackle this project before I got so far into it that changing my methods would waste a lot of time/work. To sum up, I need to make an outline of exactly what I need to do/learn and what techniques/applications to use.
This is the site I always come to when searching for my programming questions and I have come to the conclusion that the people here are generally extremely knowledgeable, patient and helpful. I will appreciate any contribution of information, advice and/or insights no matter how small. I realize that in this situation I am the "beggar" and thus will be grateful for whatever I get.
Thanks in advance.
P.S. Before anyone says anything: I have "UTFSE" extensively and have assimilated lots of info from it. However, we all know that there's no equal to a human's problem solving capabilities--especially when proficient in the specific field.
Nice work! You are definitely on the right track. That was a lot of information so I apologize if I repeat anything you already know.
1) Faxes - Microsoft has an excellent resource for learning how to send faxes (they even provide the code). Check this out: http://msdn.microsoft.com/en-us/library/windows/desktop/ms693482(v=vs.85).aspx
2) You will have to OCR the PDFs (as you mentioned) and then you can extract the information. But (as you seem to understand), C++ cannot read or modify a PDF on its own; you would need a third-party library for that.
3) C++ does allow you to save (and open) a file in Excel format. However, it's a very complicated format and will probably cause some problems. One of them is that it will want to save all of your data to one cell. A way to get around this is to I/O to Excel with .csv files. A comma separates the columns and a new line the rows. For example,
A1, B1, C1
A2, B2, C2
A3, B3, C3
Excel will open and read these files correctly. However, you won't be able to format font, borders, etc... automatically.
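The CSV layout is the same whatever language writes it. Here is a minimal sketch in Python (shown only for brevity; the thread itself is about C++ and VB) using the standard `csv` module. The customer names and readings are made up for the example.

```python
import csv
import io

# Hypothetical rows: a header line followed by one row per customer.
rows = [
    ["Customer", "Reading"],
    ["Acme, Inc.", 1042],   # the comma in the name must not split the cell
    ["Smith", 877],
]

# Written to an in-memory buffer here; with a real file you would use
# open(path, "w", newline="") instead of io.StringIO().
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back recovers the original cells (as strings).
parsed = list(csv.reader(io.StringIO(csv_text)))
```

The point of using a CSV writer rather than joining strings by hand is the quoting: a value that itself contains a comma is wrapped in double quotes, so it still lands in a single cell when the file is opened.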
This is the extent of my knowledge, I have never worked with emails or Quickbooks. Hope it helps!
I'm looking for advice on a personal project.
I'm attempting to create software for customized voice commands. The goal is to let the user (me) record some audio data (2-3 seconds) to define commands/macros. Then, when the user speaks (records the same audio data), the command/macro is executed.
The software must be able to detect a command in less than 1 second of processing time in a low-cost computer (RaspberryPi, for example).
I have already looked in two directions:
- Speech Recognition (CMU-Sphinx, Julius, simon): There are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech recognition could consume too much power for a small feature.
- Audio Fingerprinting (Chromaprint -> http://acoustid.org/chromaprint): It seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they are likely to be identical. However, this kind of software/library seems to be designed for song identification (like the well-known smartphone apps). I'm trying to configure a good "comparator", but I think I'm going down the wrong path.
Do you know of any dedicated software or piece of code that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. A speech recognition software is too complicated for such a task. I used FFT implementation in C++ to extract Fourier components of the sampled voice, and then I created a histogram of major frequencies (frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
Comparing the similarities between histogram of the given voice command with those saved in the memory to identify the most probable command.
Using Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM and the results are considerably better than the first approach. However, one problem with SVM method is that you need a rather large data set for training. Another problem is that, when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong command detection). This can be avoided by the first approach where I had a threshold for similarity measure.
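The first approach, a histogram comparison with a rejection threshold, might look roughly like the sketch below. It is written in Python with NumPy rather than the C++ used above. The band count, the cosine similarity measure, the 0.8 threshold, and all function names are assumptions for illustration, not the original implementation.

```python
import numpy as np

def frequency_histogram(signal, n_bands=32):
    """Sum the FFT magnitudes into a fixed number of frequency bands, normalized to 1."""
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    hist = np.array([band.sum() for band in np.array_split(spectrum, n_bands)])
    total = hist.sum()
    return hist / total if total > 0 else hist

def best_command(signal, templates, threshold=0.8):
    """Return (name, score) of the closest stored command, or (None, score) if none is close enough."""
    probe = frequency_histogram(signal)
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        denom = np.linalg.norm(probe) * np.linalg.norm(template)
        score = float(probe @ template / denom) if denom > 0 else 0.0  # cosine similarity
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # unknown sound: reject instead of guessing
    return best_name, best_score

# Synthetic tones stand in for recorded commands.
rate = 8000
t = np.arange(rate) / rate
templates = {
    "low": frequency_histogram(np.sin(2 * np.pi * 300 * t)),
    "high": frequency_histogram(np.sin(2 * np.pi * 2000 * t)),
}
match, match_score = best_command(np.sin(2 * np.pi * 300 * t + 0.7), templates)
unknown, unknown_score = best_command(np.sin(2 * np.pi * 3500 * t), templates)
```

The threshold is what lets this approach say "no command" for an unfamiliar sound, which is the weakness noted for the SVM classifier. All recordings are assumed to share one sample rate so that the bands line up.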
I hope this helps you to implement your own voice activated software.
Song fingerprinting is not a good idea for that task because command timings can vary and a fingerprint expects an exact time match. However, it's very easy to implement matching with the DTW algorithm for time series, using features extracted with the CMUSphinx library Sphinxbase. See the Wikipedia entry about DTW for details.
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
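To give an idea of how little code DTW needs, here is a minimal pure-Python sketch of the textbook dynamic-programming formulation. It works on sequences of single numbers for simplicity. A real command matcher would feed it per-frame feature vectors (for example those extracted with Sphinxbase) and a vector distance instead of `abs`, which is not shown here.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (O(len(a) * len(b)))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] matched again (a is slower here)
                                    cost[i][j - 1],      # b[j-1] matched again (b is slower here)
                                    cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]   # same shape, spoken at half speed
different = [3, 2, 1, 0, 1, 2, 3]

same_cost = dtw_distance(template, slow)
diff_cost = dtw_distance(template, different)
```

The stretched copy aligns with the template at zero cost even though it is twice as long, while the differently shaped sequence does not. That tolerance to timing is exactly what a fingerprint comparison lacks.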
In an effort to see if it is possible to easily break very simple CAPTCHAs, I am attempting to write a program (as simple and small as possible). This program, which I hope to write in C++, should do the following:
Make a partial screenshot of a known area of the screen. (Assuming that the CAPTCHA is always in the exact same place - for instance: Pixels 500-600 x, pixels 300-400 y).
Automatically dissect the CAPTCHA into individual letters. (The CAPTCHAS I will create for testing will all have only a few white letters, always on a black background, spaced well apart, to make things easy on me.)
The program then compares the "cut" letters against an array of "known" images of letters (which look similar to the letters used in the CAPTCHA), which contains 26 elements, each holding an image of a single letter of the English alphabet.
The program takes the letter associated with the image that the comparison mapped to, and sends that key to the console (via std::cout)
My question is: Is there an easy-to-use library (I am only a beginner at programming) that can handle tasks 1-3 (task 4 is rather easy)? The third task in particular is something I have found almost nothing worthwhile on. Ideally this library would have a "score" function that uses a float to indicate how similar two images are. Then the one with the highest score is the best hit (e.g. 100.0 means the images are identical, 29.56 means they are very different, etc.).
A good library for this job is OpenCV. http://opencv.org
OpenCV has all the necessary low-level image processing tools to segment the different elements of the captcha. Then you can use its template matching module.
You could even try to detect letters directly without the preprocessing. It will be slower, but the captcha image is typically so small, that it should rarely matter. See:
http://docs.opencv.org/modules/imgproc/doc/object_detection.html#cv2.matchTemplate
For some tutorials to get into the library see:
http://docs.opencv.org/doc/tutorials/tutorials.html
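To make the "score" idea concrete, here is a small sketch in Python with NumPy of a normalized correlation between a cut-out letter and each known letter image, scaled to 0-100. It is similar in spirit to the normalized correlation modes that template matching offers, but it is a hand-written stand-in, not OpenCV's implementation. The tiny 5x5 letters and the function names are made up for the example, and all images are assumed to have the same size.

```python
import numpy as np

def similarity_score(glyph, template):
    """Return 0.0-100.0: normalized correlation of two equally sized grayscale images."""
    g = np.asarray(glyph, dtype=float)
    t = np.asarray(template, dtype=float)
    g = g - g.mean()
    t = t - t.mean()
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    if denom == 0:
        return 0.0  # a completely flat image carries no shape information
    return 100.0 * max(0.0, float(np.sum(g * t) / denom))

def classify(glyph, known):
    """Return the letter whose known image scores highest against the glyph."""
    return max(known, key=lambda letter: similarity_score(glyph, known[letter]))

# White (1) letters on a black (0) background, as in the test CAPTCHAs described.
known = {
    "I": np.array([[0, 0, 1, 0, 0]] * 5),
    "L": np.array([[1, 0, 0, 0, 0]] * 4 + [[1, 1, 1, 1, 1]]),
}
noisy_i = known["I"].copy()
noisy_i[0, 0] = 1  # one stray pixel

best_letter = classify(noisy_i, known)
perfect = similarity_score(known["I"], known["I"])
```

Identical images score 100, and the noisy "I" still scores far higher against "I" than against "L", so picking the highest score gives the best hit as described in the question.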
Because the open source geo-coders cannot begin to compare to Google's or even Yahoo's, I would like to start a project to create a good open source geo-coder. Just to clarify, a geo-coder takes some text (usually with some constraints) and returns one or more lat/lon pairs.
I realize that this is a difficult and gargantuan task, so I am wondering how you might get started. What would you read? What algorithms would you familiarize yourself with? What code would you review?
And also, assuming you were going to develop this very agilely, what would you want the first prototype to be able to do?
EDIT: Let's set aside the data question for now. I am going to use OpenStreetMap data, along with a database of waypoints that I have. I would later plan to include other data sets as well, and I realize the geo-coder would be inherently limited by the quality of the original data.
The first (and probably blocking) problem would be: where do you get your data from? (unless you are willing to pay thousands of dollars for proprietary sets).
You could build a geocoding-api on top of OpenStreetMap (they publish their data in dumps on a regular basis) I guess, but that one was still very incomplete last time I checked.
Algorithms are easy. Good mapping data, however, is expensive. Very expensive.
Google drove their cars all over the world, collecting this data among other things.
From a .NET point of view these articles might be interesting for you:
Writing Your Own GPS Applications: Part I
Writing Your Own GPS Applications: Part 2
Writing GIS and Mapping Software for .NET
I've only glanced at the articles but they've been on CodeProject's 'Most Popular' list for a long time.
And maybe this CodePlex project which the author of the articles above made available.
I would start at the absolute beginning by figuring out how you're going to get the data that matches a street address with a geocode. Either Google had people going around with GPS units, OR they got the information from some existing source. That existing source may have been... (all guesses)
The Postal Service
Some existing maps (printed)
A bunch of enthusiastic users that were early adopters of GPS technology who were more than willing to enter in street addresses and GPS coordinates
Some government entity (or entities)
Their own satellites
etc
I guess what I'm getting at is the information was either imported from somewhere or was input by someone via some interface. As my starting point I would look at how to get that information. In an open source situation, you may be able to get a bunch of enthusiastic people to enter information.
So for my first prototype, boring as it would be, I would create a form for entering information.
Then you need to know the math for figuring out the closest distance (as the crow flies). From there, try to figure out how to include roads. (My guess is you would have to have a data point for each and every curve, where you hold the geocode location of the curve and the angle of the road on a north/south and east/west vector. You'd probably need to take incline into account, too, to get accurate road measurements.)
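For the as-the-crow-flies part, the usual starting point is the haversine formula for great-circle distance. A minimal Python sketch follows, assuming a spherical Earth with a mean radius of 6371 km (an approximation; it ignores the Earth's flattening and any elevation).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a perfect sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# A quarter of the way around the equator.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Road distances would need a graph of road segments on top of this; the formula only gives the straight-line lower bound.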
That's just where I'd start.
But in all honesty, I wouldn't even start on this. Other programmers have done it already, and I'm more interested in what hasn't already been done.
get my free raw data from somewhere like http://ipinfodb.com/ip_database.php
load it into a database, denormalizing for fast lookups
design my API
build it out as a RESTful web service
return results in varying formats: JSON, XML, CSV, raw text
The first prototype should accept a ZIP code and return lat/lon in raw text.
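A first prototype along those lines could be as small as the sketch below, before any database or web service exists. The in-memory table, its two rows, and the function name are hypothetical; the coordinates are rounded, approximate values included only to make the example runnable and should not be taken as reference data.

```python
import json

# Hypothetical stand-in for the denormalized lookup table (approximate values).
ZIP_TABLE = {
    "10001": (40.75, -73.99),
    "60601": (41.89, -87.62),
}

def geocode_zip(zip_code, fmt="text"):
    """Return the lat/lon for a ZIP code as raw text, CSV or JSON, or None if unknown."""
    key = zip_code.strip()
    coords = ZIP_TABLE.get(key)
    if coords is None:
        return None
    lat, lon = coords
    if fmt == "json":
        return json.dumps({"zip": key, "lat": lat, "lon": lon})
    if fmt == "csv":
        return f"{lat},{lon}"
    return f"{lat} {lon}"  # raw text, the format suggested for the first prototype

text_result = geocode_zip("10001")
json_result = geocode_zip("10001", fmt="json")
missing = geocode_zip("99999")
```

Keeping the lookup separate from the output formatting makes it easy to wrap the same function in a RESTful endpoint later and to swap the dictionary for a real database.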