What is the way to implement neural networks in C/C++?

I want to use neural networks for pattern matching in C++. The scenario is like this:
The main goal is to identify a product by name when it is captured by a camera.
A rectangular product pack (say, for example, a toothpaste box) is cut open along its edges so that all of its sides lie in one plane. The camera takes a picture of the pack and compares its patterns to a database.
If the patterns are found in the search, display the name of the product.
Otherwise, store the patterns of the product in the database along with its name (say, the brand of the toothpaste).
By "pattern" I mean the distinct features of the product pack compared to other products.
I want to know the following, using C/C++ (Linux, Windows, or Mac OS, it doesn't matter):
Is there a library that makes this work easier?
If no library is available, what is the best algorithm you can suggest for pattern matching?

I think first you will need to do some preprocessing on the picture captured by the camera to normalize it (size, angle, ...). For that job, you can use OpenCV.
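To give an idea, a minimal normalization step with OpenCV might look like the sketch below (the target size and the exact steps are assumptions you would tune for your own images):

#include <opencv2/opencv.hpp>

// Normalize a captured frame: grayscale, fixed size, equalized lighting.
cv::Mat normalize_capture(const cv::Mat& capture) {
    cv::Mat gray, resized;
    cv::cvtColor(capture, gray, cv::COLOR_BGR2GRAY);  // drop color information
    cv::resize(gray, resized, cv::Size(128, 64));     // fixed input size for the network
    cv::equalizeHist(resized, resized);               // compensate for lighting differences
    return resized;
}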
Then, if you want to set up a neural network, you could give FANN (Fast Artificial Neural Network) a try: http://leenissen.dk/fann/wp/
The library works on Linux and Windows and is really easy to use!
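As a rough sketch of how FANN is used (the layer sizes, file names, and stopping criteria below are placeholders; this assumes you first reduce each normalized image to a fixed-length feature vector):

#include <fann.h>

// 3-layer network: 256 input features, 32 hidden neurons, 10 product classes.
struct fann *ann = fann_create_standard(3, 256, 32, 10);
fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);
fann_train_on_file(ann, "patterns.data", 1000, 10, 0.001f);  // max epochs, report interval, target error
fann_save(ann, "products.net");
fann_destroy(ann);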

Related

How to add attributes in combination with object detection using YOLO?

I'm new to computer vision and I'm wondering how to deal with the following problem.
I'm using YOLO for a real-time object detection task. However, I'm dealing with a dataset that also gives me a few attributes such as weather, temperature, etc.
(I'm obviously able to access that information in real time, so I can use it in a real deployment.)
My data differs significantly depending on the weather, temperature, etc., which is why it's useful to have access to that information.
So, is there any way to learn from an image dataset together with its associated context? I'm looking for something that is YOLO-compatible.
If such a thing isn't possible or doesn't exist, I guess I'll just train different versions of YOLO on specific datasets associated with different contexts. Each version would be activated only for specific weather and temperature conditions.
Thank you in advance for any kind of help/information.
You will need to build your own custom model that combines visual features with the tabular data. This could look something like:
import torch
import torch.nn as nn

vis_feats = nn.Linear(512, 32)  # projects visual features from your backbone (sizes are just examples)
tab_feats = nn.Linear(4, 8)     # projects tabular features (weather, temperature, ...)
x = torch.cat((vis_feats(vis), tab_feats(tab)), dim=1)  # vis: [B, 512], tab: [B, 4]; x goes into your prediction layer

Voice-activated password implementation in Python

I want to record a word beforehand, and when the same password is spoken into the Python script, the program should run if the spoken password matches the previously recorded file. I do not want to use speech recognition toolkits, as the password might not be a proper word and could be complete gibberish. I started by saving the previously recorded file and the newly spoken sound as numpy arrays. Now I need a way to determine whether the two arrays are 'close' to each other. Can someone point me in the right direction?
It is not possible to compare two speech samples at the sample level (i.e., in the time domain). Each part of the spoken word may vary in length, so the samples won't line up, and the level of each part will vary as well. Another problem is that the phase of the individual components the sound signal consists of can change too, so two signals that sound the same can look very different in the time domain. So the best solution is most likely to move the signal into the frequency domain. One common way to do this is the Fast Fourier Transform (FFT). There is a lot of material about it on the net, and good support for it in Python.
Then you could proceed like this:
Divide the sound sample into small segments of a few milliseconds.
Find the principal FFT coefficients of each segment.
Compare the sequences of selected principal coefficients.
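For illustration, here is a minimal sketch of steps 1 and 2 in C++ using FFTW (in Python, numpy.fft.rfft plus numpy.abs gives you the same magnitude spectrum); the segment length of a few hundred samples is an assumption to tune:

#include <fftw3.h>
#include <cmath>
#include <vector>

// Magnitude spectrum of one short segment (e.g. ~20 ms of mono samples).
std::vector<double> segment_spectrum(std::vector<double> segment) {
    const int n = static_cast<int>(segment.size());
    fftw_complex* out = fftw_alloc_complex(n / 2 + 1);
    fftw_plan plan = fftw_plan_dft_r2c_1d(n, segment.data(), out, FFTW_ESTIMATE);
    fftw_execute(plan);
    std::vector<double> mag(n / 2 + 1);
    for (int k = 0; k <= n / 2; ++k)
        mag[k] = std::sqrt(out[k][0] * out[k][0] + out[k][1] * out[k][1]);
    fftw_destroy_plan(plan);
    fftw_free(out);
    return mag;  // keep the few largest coefficients of each segment for comparison
}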

Creating custom voice commands (GNU/Linux)

I'm looking for advice for a personal project.
I'm attempting to create software for building customized voice commands. The goal is to allow the user (me) to record some audio data (2-3 seconds) to define commands/macros. Then, when the user speaks the same audio again, the command/macro will be executed.
The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example).
I have already searched in two directions:
- Speech recognition (CMU Sphinx, Julius, Simon): there are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech recognition could also consume too much power for such a small feature.
- Audio fingerprinting (Chromaprint -> http://acoustid.org/chromaprint): this seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they match. However, this kind of software/library seems to be designed for song identification (like the famous apps on smartphones): I'm trying to configure a good "comparator", but I think I'm going down the wrong path.
Do you know of any dedicated software or piece of code that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. Speech recognition software is too complicated for such a task. I used an FFT implementation in C++ to extract the Fourier components of the sampled voice, and then I created a histogram of the major frequencies (the frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
Comparing the similarity between the histogram of the given voice command and those saved in memory, to identify the most probable command.
Using a Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM, and the results were considerably better than with the first approach. However, one problem with the SVM method is that you need a rather large data set for training. Another problem is that when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong detection). This can be avoided with the first approach, where I had a threshold on the similarity measure.
I hope this helps you to implement your own voice-activated software.
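To illustrate the first approach: comparing two frequency histograms can be as simple as a cosine similarity with a rejection threshold (the threshold value below is a guess you would have to tune on your own recordings):

#include <cmath>
#include <vector>

// Cosine similarity between two equal-length frequency histograms;
// close to 1 means the histograms have nearly the same shape.
double cosine_similarity(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12);
}

// Accept the best-matching stored command only if it is similar enough,
// which is what lets this approach reject unknown voices.
bool is_match(double similarity) { return similarity > 0.8; }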
Song fingerprinting is not a good idea for this task, because command timings can vary and a fingerprint expects an exact time match. However, it is very easy to implement matching with the DTW (dynamic time warping) algorithm on time series of features extracted with Sphinxbase, the base library of CMUSphinx. See the Wikipedia entry on DTW for details.
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
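For reference, the core DTW recurrence is only a few lines; here is a sketch over sequences of per-frame feature vectors (MFCC frames extracted with Sphinxbase, for instance), using Euclidean frame distance:

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

using Frame = std::vector<double>;  // one feature vector, e.g. 13 MFCCs

double frame_dist(const Frame& a, const Frame& b) {
    double d = 0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}

// Classic O(n*m) dynamic time warping cost between two frame sequences;
// a lower cost means a better match despite timing differences.
double dtw(const std::vector<Frame>& s, const std::vector<Frame>& t) {
    const size_t n = s.size(), m = t.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> D(n + 1, std::vector<double>(m + 1, INF));
    D[0][0] = 0;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j)
            D[i][j] = frame_dist(s[i - 1], t[j - 1]) +
                      std::min({D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]});
    return D[n][m];
}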

How can I distinguish two images given the coordinates of objects?

I am using OpenCV with C++, and I have several images with located minutiae (endings/branches).
Each minutia has: coordinates (x, y), a type (ending/branch), and an angle.
How can I distinguish one image from another using this information?
I need a very simple algorithm, some code, or any idea!
Example with located points:
http://ifotos.pl/zobacz/minucjepn_xhaqnwh.png/
How can I distinguish images given those located points?
Have a look at geometric hashing.
#user1666649, this is not as simple as you think. If you look for scientific articles, there are many from the University of Bologna on this topic. If you need sample code, you can look at NBIS from NIST/FBI. However, if you are looking for a good algorithm, you will need to buy a commercial one from Veridis, Neurotechnology, Aware, etc.
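As a very rough baseline (nowhere near a commercial matcher, and not invariant to rotation or translation), you could greedily pair each minutia with the closest compatible one in the other image and use the number of pairs as a score; all thresholds below are guesses to illustrate the idea:

#include <cmath>
#include <vector>

struct Minutia { float x, y, angle; int type; };  // type: 0 = ending, 1 = branch

// Count minutiae in `a` that have a compatible partner in `b`
// (same type, nearby position, similar angle; angle wrap-around ignored for brevity).
int match_score(const std::vector<Minutia>& a, const std::vector<Minutia>& b) {
    int score = 0;
    for (const Minutia& p : a) {
        for (const Minutia& q : b) {
            float dx = p.x - q.x, dy = p.y - q.y;
            float da = std::fabs(p.angle - q.angle);
            if (p.type == q.type && dx * dx + dy * dy < 15.0f * 15.0f && da < 0.3f) {
                ++score;
                break;  // pair found, move to the next minutia
            }
        }
    }
    return score;  // compare against a threshold relative to a.size()
}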

Best approach for doing full-text search with list-of-integers documents

I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details):
I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be.
So, when I want to query the system, I just have to compute the list of integers representing the query image, perform a full-text search (or similar) and retrieve the X most similar images.
My question is: what's the best approach to perform such a search?
I've heard about Lucene, Lemur, and other indexing methods, but I don't know whether this kind of full-text search is the best way, given that the domain is reduced (only integers instead of words).
I'd like to know about the alternatives in terms of efficiency, accuracy or C++ friendliness.
Thanks!
It sounds to me like you have a vector-space model, so Lucene or a similar product may work well for you. In general, an inverted-index model will be a good fit if:
You don't know the number of classes in advance
There are a lot of classes relative to the number of images
If your problem doesn't fit these criteria, a normal relational DB might work better, as Thomas suggested. If it meets #1 but not #2, you could investigate one of the "column-oriented" non-relational databases. I'm not familiar enough with these to tell you how well they would work, but my intuition is that you would need to replicate a lot of the functionality of an IR toolkit yourself.
Lucene is written in Java and I don't know of any C++ ports. Solr exposes Lucene as a web service, so it's easy enough to access it that way from whatever language you choose.
I don't know much about Lemur, but it looks like it has a similar vector-space model, and it's written in C++, so it might be easier for you to use.
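To make the inverted-index idea concrete, here is a bare-bones sketch in C++ (matching the C++/Qt system described in the question); it maps each class id to the images containing it and ranks candidates by how many classes they share with the query:

#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// class id -> ids of the images that contain that class
using InvertedIndex = std::unordered_map<int, std::vector<int>>;

void add_image(InvertedIndex& index, int image_id, const std::vector<int>& classes) {
    for (int c : classes) index[c].push_back(image_id);
}

// Returns (image id, number of classes shared with the query), best first.
std::vector<std::pair<int, int>> query(const InvertedIndex& index,
                                       const std::vector<int>& classes) {
    std::unordered_map<int, int> scores;
    for (int c : classes) {
        auto it = index.find(c);
        if (it == index.end()) continue;
        for (int img : it->second) ++scores[img];  // one shared class = one point
    }
    std::vector<std::pair<int, int>> ranked(scores.begin(), scores.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                  return a.second > b.second;
              });
    return ranked;
}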
You can take a look at Lucene for image retrieval (LIRE) here: http://www.semanticmetadata.net/2006/05/19/lire-lucene-image-retrieval-04-released/
If I'm not mistaken, you are trying to implement typical bag-of-words image retrieval, am I correct? If so, you are probably trying to build an inverted file index. Lucene on its own is not suitable, as you have probably already realized, since it indexes text rather than numbers. Using its classes for querying the index would also be a problem, as it is not designed to "parse" an image (i.e., detect keypoints, extract descriptors, then vector-quantize them) into the query vector.
LIRE, on the other hand, has been modified to index feature vectors. However, it does not appear to work out of the box for the bag-of-words model. Also, I think I've read on the author's website that it currently uses brute-force matching rather than an inverted file index to retrieve the images, but I would expect it to be easier to extend than Lucene itself for your purposes.
Hope this helps.