How to combine contextual attributes with object detection using YOLO? - computer-vision

I'm new to computer vision and I'm wondering how to deal with the following problem.
I'm using YOLO for a real-time object detection task. However, my dataset also gives me a few attributes such as weather, temperature, etc.
(I can obviously access this information in real time, so it can be used in production.)
My data varies significantly depending on the weather, temperature, etc., which is why having access to this information is useful.
So is there any way to train on an image dataset together with its context? I'm looking for something that is YOLO compatible.
If such a thing isn't compatible/doesn't exist, I guess I'll just train different versions of YOLO on specific datasets associated with different contexts. Each specific version would be activated only for a specific weather and temperature.
Thank you in advance for any kind of help/information.

You will need to build your own custom model that combines visual features with tabular data. This could look something like:
vis_feats = nn.Linear(512, 1)  # projection for the visual features
tab_feats = nn.Linear(4, 1)    # projection for the tabular features (weather, temperature, ...)
x = torch.cat((vis_feats(vis), tab_feats(tab)), dim=1)  # x goes into your prediction layer
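
Fleshing that idea out, here is a minimal, self-contained PyTorch sketch of the same fusion pattern. All layer sizes, the class name, and the example context values are illustrative assumptions, not a fixed YOLO API:

import torch
import torch.nn as nn

class ContextFusionHead(nn.Module):
    """Illustrative head fusing visual features with tabular context."""
    def __init__(self, vis_dim=512, tab_dim=4, num_outputs=80):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, 128)       # compress visual features
        self.tab_proj = nn.Linear(tab_dim, 16)        # embed tabular context
        self.head = nn.Linear(128 + 16, num_outputs)  # joint prediction layer

    def forward(self, vis, tab):
        v = torch.relu(self.vis_proj(vis))
        t = torch.relu(self.tab_proj(tab))
        x = torch.cat((v, t), dim=1)                  # fuse the two modalities
        return self.head(x)

model = ContextFusionHead()
vis = torch.randn(8, 512)  # e.g. pooled backbone features for 8 images
tab = torch.randn(8, 4)    # e.g. [temperature, humidity, is_rain, is_day]
out = model(vis, tab)      # shape: (8, 80)

With YOLO specifically, a natural place to inject such context is between the backbone/neck and the detection head, broadcasting the tabular embedding over the spatial feature map before prediction.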

Related

Classification with machine learning and a small database

I want to create a valve detection and classification system like the one in this video: https://www.youtube.com/watch?v=VY92fqmSdfA
to detect the open, closed, and intermediate positions of the valve.
I have done some research and found some methods to solve this problem, but I have some conditions to respect:
Condition 1: Use machine learning in the application; I can't use simple methods like template matching, ...
Condition 2: Use a small database (minimum 10 images per class, maximum 40 images per class)
Condition 3: Detect the position of the valve even if the camera position changes, so I can't rely only on colors to detect the valve handle.
I want to use HOG (Histogram of Oriented Gradients) + SVM/ANN, but HOG-based pipelines need a lot of images to train an SVM/ANN.
I don't know whether I can solve this problem while respecting these conditions.
As we know, the most important thing ML approaches need in order to work properly is data. So, I'd say your 1st and 2nd conditions conflict with each other. In addition, your 3rd condition adds more complexity to the problem. You could address it by including more data from different angles and illumination conditions, but again, that conflicts with condition 2.
Even so, if you'd like to follow the ML path, I'd recommend using a pre-trained model, strong data augmentation and, maybe, an ensemble of models to help improve detection. As the problem is not that hard, it should work.
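
As a rough illustration of that recipe (pre-trained backbone, frozen weights, heavy augmentation over a 10-40 images/class dataset), here is a hedged PyTorch/torchvision sketch; the dataset path and the three valve classes are placeholders:

import torch.nn as nn
from torchvision import datasets, models, transforms

# Aggressive augmentation stretches a tiny dataset much further
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

data = datasets.ImageFolder("valves/train", transform=train_tf)  # hypothetical path

model = models.resnet18(pretrained=True)  # newer torchvision: weights="IMAGENET1K_V1"
for p in model.parameters():
    p.requires_grad = False               # freeze everything...
model.fc = nn.Linear(model.fc.in_features, 3)  # ...except a new open/closed/intermediate head

Training only the final layer keeps the number of learnable parameters small enough that 10-40 images per class can plausibly work; an ensemble would simply repeat this with a few different backbones and average the predictions.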

OpenCV: Rating Features in an Image

The OpenCV forum has been unavailable for a few days, so I am posting this question here. I want to implement a class in C++ that will analyze an image and determine how good that image is for feature tracking.
One approach has been explained by Vuforia.
https://developer.vuforia.com/library/articles/Solution/Natural-Features-and-Ratings
1) Number of features
Count the number of features returned; say we require a minimum of 30 features.
2) Local contrast
The variance can be used as a starting point to measure how much variation there is in the image. What sort of preprocessing would this require to get the most out of this metric?
How can we improve on this? With an FT or DFT transform, would it be possible to see if there is high contrast at lots of different image frequencies? How would that be achieved?
DFT -> variance (?)
3) Feature distribution
This can be done with clustering, with a suitable center and a mean + s.d. comparable to the image dimensions. Ideally, 95% of the features should fall within mean + 2 x s.d.
4) Avoid organic shapes
These yield no features, so this reduces to the same criterion as the number of features.
5) Avoid repetitive patterns
Match detected features against themselves and make sure there aren't too many duplicates. (A rough sketch of criteria 1-3 appears below.)
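
A rough OpenCV (Python) sketch of how criteria 1-3 might be scored; the detector choice and all thresholds here are illustrative guesses, not Vuforia's actual method:

import cv2
import numpy as np

img = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

# 1) Number of features: require a minimum keypoint count
kps = cv2.ORB_create(nfeatures=500).detect(img, None)
enough_features = len(kps) >= 30

# 2) Local contrast: global variance as a cheap starting point
contrast_score = float(np.var(img))

# 3) Feature distribution: keypoint spread relative to image size
if enough_features:
    pts = np.array([kp.pt for kp in kps])
    spread = pts.std(axis=0) / np.array(img.shape[::-1])  # per-axis, normalized
    well_spread = bool((spread > 0.15).all())             # illustrative threshold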
Vuforia does the same.
But if you want to write your own code to do the same, ARToolkit is an open-source SDK which provides the same feature for NFT markers. If you go through the ARToolkit source code, you will find something like "DisplayFeatureSet".
There is also a DisplayFeatureSet.exe file there, which shows the features (hotspots) of a selected image.
Somehow I managed to get the source code (.c) for this.
Here is my Google Drive link to download the source code; work on it and share your experience:
Source Code to Display Feature Set
Best of luck :)

Google Inceptionism: obtain images by class

In the famous Google Inceptionism article,
http://googleresearch.blogspot.jp/2015/06/inceptionism-going-deeper-into-neural.html
they show images obtained for each class, such as banana or ant. I want to do the same for other datasets.
The article does describe how the images were obtained, but I feel the explanation is insufficient.
There's related code here:
https://github.com/google/deepdream/blob/master/dream.ipynb
but what it does is produce a random dreamy image, rather than specifying a class and learning what that class looks like inside the network, as shown in the article above.
Could anyone give a more concrete overview, or code/tutorial, on how to generate images for a specific class? (preferably assuming the caffe framework)
I think this code is a good starting point to reproduce the images the Google team published. The procedure looks clear:
Start with a pure noise image and a class (say "cat")
Perform a forward pass and backpropagate the error w.r.t. the imposed class label
Update the initial image with the gradient computed at the data layer
There are some tricks involved that can be found in the original paper; the basic loop is sketched after the quote below.
It seems that the main difference is that the Google folks tried to get a more "realistic" image:
By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
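
A minimal PyTorch analogue of those three steps (the question asks for caffe, but the loop is framework-agnostic; the Gaussian blur is a crude stand-in for the natural-image prior mentioned in the quote above, and the model/class choices are arbitrary):

import torch
from torchvision import models, transforms

model = models.googlenet(pretrained=True).eval()        # stand-in for the Inception net
img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from pure noise
target_class = 281                                      # e.g. ImageNet "tabby cat"

blur = transforms.GaussianBlur(5)     # crude natural-image prior
opt = torch.optim.SGD([img], lr=1.0)  # optimizes the image, not the weights

for step in range(200):
    opt.zero_grad()
    score = model(img)[0, target_class]
    (-score).backward()               # ascend the target class score
    opt.step()
    with torch.no_grad():
        img.copy_(blur(img))          # push toward smoother, more natural statistics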

Why is my classification model in Weka predicting all instances as one class?

I have built a classification model using Weka. I have two classes, namely {spam, non-spam}. After applying the StringToWordVector filter, I get 10,000 attributes for 19,000 records. Then I use the LIBLINEAR library to build a model, which gives me the following F-scores:
spam: 94%
non-spam: 98%
When I use the same model to predict new instances, it predicts all of them as spam.
Also, when I use a test set identical to the training set, it predicts all of them as spam too. I am mentally exhausted trying to find the problem. Any help will be appreciated.
I get it wrong every so often too. Then I watch this video to remind myself how it's done: https://www.youtube.com/watch?v=Tggs3Bd3ojQ where Prof. Witten, one of the Weka developers/architects, shows how to use the FilteredClassifier (which in turn is configured to load the StringToWordVector filter) on the training set and the test set correctly.
This is shown for Weka 3.6; Weka 3.7 might be slightly different.
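
The underlying pitfall is framework-agnostic: the word-vector filter must be fit on the training data only and then reused unchanged on the test set. As an illustration outside Weka, here is the same discipline in scikit-learn terms (toy texts and labels are hypothetical stand-ins):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["win money now", "meeting at noon"]  # toy stand-ins
train_labels = ["spam", "non-spam"]

# The pipeline fits the vectorizer on training data only, then applies the
# SAME fitted vocabulary at prediction time -- the role Weka's
# FilteredClassifier plays for StringToWordVector.
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)
print(clf.predict(["free money"]))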
What does ZeroR give you? If it's close to 100%, you know that any classification algorithm shouldn't be too far off either.
Why do you optimize for F-measure? Just asking; I have never used it and don't know much about it. (I would optimize for the "Precision" metric, assuming you have much more spam than non-spam.)

Creating custom voice commands (GNU/Linux)

I'm looking for advice for a personal project.
I'm attempting to create software for building customized voice commands. The goal is to allow the user (me) to record some audio data (2-3 seconds) to define commands/macros. Then, when the user speaks the same phrase again, the command/macro will be executed.
The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example).
I have already searched in two directions:
- Speech recognition (CMU Sphinx, Julius, Simon): there are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech recognition could consume too much power for such a small feature.
- Audio fingerprinting (Chromaprint -> http://acoustid.org/chromaprint): this seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they are identical. However, this kind of software/library seems to be designed for song identification (like the well-known smartphone apps): I'm trying to configure a good "comparator", but I think I'm going down the wrong path.
Do you know of any dedicated software or piece of code that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. Speech recognition software is too complicated for such a task. I used an FFT implementation in C++ to extract the Fourier components of the sampled voice, and then created a histogram of the major frequencies (frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
Comparing the similarity between the histogram of the given voice command and those saved in memory, to identify the most probable command.
Using a Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM, and the results were considerably better than with the first approach. However, one problem with the SVM method is that you need a rather large dataset for training. Another problem is that, when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong detection). This can be avoided with the first approach, where I had a threshold on the similarity measure.
I hope this helps you implement your own voice-activated software. A rough sketch of the first approach is below.
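
A rough numpy sketch of that first approach (FFT, bin the spectral energy, compare histograms); the sample rate, bin count, and use of histogram intersection are all illustrative choices:

import numpy as np

def freq_histogram(samples, rate=16000, n_bins=64):
    """Normalized histogram of spectral energy over frequency bins."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    hist, _ = np.histogram(freqs, bins=n_bins, weights=spectrum)
    return hist / (hist.sum() + 1e-9)

def similarity(h1, h2):
    """Histogram intersection in [0, 1]; higher means more similar."""
    return float(np.minimum(h1, h2).sum())

command = np.random.randn(16000)  # stand-in for one second of recorded audio
print(similarity(freq_histogram(command), freq_histogram(command)))  # ~1.0

Matching would pick the stored template with the highest similarity, and a minimum-similarity threshold gives the rejection behavior that the SVM approach lacks.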
A song fingerprint is not a good idea for this task because command timings can vary, and fingerprinting expects an exact time match. However, it's quite easy to implement matching with the DTW algorithm on time series of features extracted with the CMUSphinx library Sphinxbase. See the Wikipedia entry on DTW for details:
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
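
For reference, DTW itself is only a few lines; here is a minimal pure-numpy sketch, assuming the feature frames (e.g. MFCCs from Sphinxbase or any other front end) are computed elsewhere:

import numpy as np

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors a (n, d), b (m, d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],           # insertion
                                 D[i, j - 1],           # deletion
                                 D[i - 1, j - 1])       # match
    return D[n, m]

a = np.random.randn(40, 13)  # e.g. 40 MFCC frames, 13 coefficients each
b = np.random.randn(55, 13)  # the same command spoken more slowly
print(dtw_distance(a, b))    # a command matches when this falls below a tuned threshold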