vehicle type identification with neural network - python-2.7

I was given a project on vehicle type identification with neural network and that is how I came to know the awesomeness of neural technology.
I am a beginner with this field, but I have sufficient materials to learn it. I just want to know some good places to start for this project specifically, as my biggest problem is that I don't have very much time. I would really appreciate any help. Most importantly, I want to learn how to match patterns with images (in my case, vehicles).
I'd also like to know if Python is a good language to start this in, as I'm most comfortable with it.
I have some images of cars as input, and I need to classify those cars by their model.
E.g.: Audi A4, Audi A6, Audi A8, etc.

You didn't say whether you can use an existing framework or need to implement the solution from scratch, but either way Python is an excellent language for coding neural networks.
If you can use a framework, check out Theano, which is written in Python and is the most complete neural network framework available in any language:
http://www.deeplearning.net/software/theano/
If you need to write your implementation from scratch, look at the book 'Machine Learning, An Algorithmic Perspective' by Stephen Marsland. It contains example Python code for implementing a basic multilayered neural network.
As for how to proceed, you'll want to convert your images into 1-D input vectors. Don't worry about losing the 2-D information; the network will learn 'receptive fields' on its own that extract 2-D features. Normalize the pixel intensities to a -1 to 1 range (or better yet, zero mean with a standard deviation of 1). If the images are already centered and normalized to roughly the same size, then a simple feed-forward network should be sufficient. If the cars vary wildly in angle or distance from the camera, you may need to use a convolutional neural network, but that's much more complex to implement (there are examples in the Theano documentation). For a basic feed-forward network, try using two hidden layers, each with anywhere from 0.5 to 1.5 times as many units as there are input pixels.
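As a minimal sketch of the flattening and normalization step, assuming each image is already a grayscale grid of pixel intensities: the helper names `flatten` and `standardize` are made up for illustration, and standardizing each image on its own is an assumption (computing the mean and standard deviation over the whole training set is another common choice).

```python
import math

def flatten(image):
    """Turn a 2-D image (a list of rows of pixel intensities) into a 1-D list."""
    return [float(pixel) for row in image for pixel in row]

def standardize(vector, eps=1e-8):
    """Rescale a vector to zero mean and (approximately) unit standard deviation."""
    n = len(vector)
    mean = sum(vector) / n
    variance = sum((v - mean) ** 2 for v in vector) / n
    std = math.sqrt(variance)
    # eps guards against division by zero for a constant (blank) image
    return [(v - mean) / (std + eps) for v in vector]

# A hypothetical 2x3 grayscale image with intensities in 0..255
image = [[0, 128, 255],
         [64, 192, 32]]
x = standardize(flatten(image))  # 6-element input vector for the network
```

With real data you would load the pixels through an image library and most likely hold them in NumPy arrays; plain lists are used here only to keep the sketch dependency-free.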
Break your dataset into separate training, validation, and testing sets (perhaps with a 0.6, 0.2, 0.2 ratio respectively) and make sure each image only appears in one set. Train ONLY on the training set, and don't use any regularization until you're getting close to 100% of the training instances correct. You can use the validation set to monitor progress on instances that you're not training on. Performance should be worse on the validation set than the training set. Stop training when the performance on the validation set stops improving. Once you've accomplished this you can try different regularization constants and choose the one that results in the best validation set performance. The test set will tell you how well your final result is performing (but don't change anything based on test set results, or you risk overfitting to that too!).
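The split and the stopping rule described above could be sketched roughly as follows. The function names, the fixed random seed, and the `patience` parameter are illustrative assumptions; they don't come from any particular framework.

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items and cut them into disjoint train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(round(ratios[0] * len(items)))
    n_val = int(round(ratios[1] * len(items)))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

def early_stop_epoch(val_errors, patience=3):
    """Return the index of the best epoch, giving up once the validation
    error has gone `patience` epochs without improving."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Hypothetical file names standing in for the car images
train, val, held_out = split_dataset(["img_%03d.png" % i for i in range(10)])
```

Because every item lands in exactly one slice of the shuffled list, no image can appear in more than one set. In a real training loop you would compute the validation error after each epoch and keep the weights from the epoch that `early_stop_epoch` identifies.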
If your car images are very complex and varied and you cannot get a basic feed-forward net to perform well, you might consider using 'deep learning'. That is, add more layers and pre-train them using unsupervised training. There's a detailed tutorial on how to do this here (though all the code examples are in MatLab/Octave):
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Again, that adds a lot of complexity. Try it with a basic feed-forward NN first.

Related

How can I use dlib for a neural network regression?

It seems that dlib needs a loss layer that dictates how the layers most distant from our input layer are treated. I cannot find any documentation on the loss layers, but it seems that there is no way to have just some summation layer.
Summing up all the values of the last layer would be exactly what I need for the regression, though (see also: https://deeplearning4j.org/linear-regression)
I was thinking along the lines of writing a custom loss layer but could not find information about this, either.
So, have I overlooked some corresponding layer here, or is there another way to get what I need?
The loss layers in dlib are listed in the menu on dlib's machine learning page. Look for the words "loss layers". There is lots of documentation.
The current released version of dlib doesn't include a regression loss. However, if you get the current code from github you can use the new loss_mean_squared layer to do regression. See: https://github.com/davisking/dlib/blob/master/dlib/dnn/loss_abstract.h

Time between training set images for individual facial recognition

Edit: I didn't make this clear: this is for the possible future development of an application.
I am looking into individual facial recognition for an application, but an essential part of this seems to be a fairly large training set of images for each individual to be recognized.
Is it important for the images to be taken at different times in different environments, or could several images captured over a few seconds with a handheld camera possibly provide the necessary variations for a good training set?
(This isn't for human facial recognition, by the way, so existing tools and databases won't really help too much. I'm aware that 2D image recognition can not necessarily be applied to all species; let's just assume that it does work in my use case.)
This paper may answer some of your questions:
http://uran.donetsk.ua/~masters/2011/frt/dyrul/library/article8.pdf
From the pattern classification point of view, a usual problem in face recognition is having a plethora of classes and only a few, possibly only one, training sample(s) per class. For this reason, more sophisticated classifiers are not needed but a nearest-neighbour classifier is used.
While I'm not an expert on the subject, it appears to be a common problem to have only one image per person as a training sample and one that has been solved with at least some level of accuracy in controlled lighting/positional situations.
To specifically answer your question, a training set that had multiple images of each person with little or no variation ("several images captured over a few seconds with a handheld camera"), would not be as valuable as one that had more variation (e.g. different facial expressions, lighting, backgrounds).
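To make the nearest-neighbour idea from the quoted passage concrete, here is a minimal sketch that assumes each individual is represented by a single feature vector. The labels and vectors are invented for illustration; how the features are extracted from an image is outside the sketch.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(query, gallery):
    """gallery maps a label to its single training vector;
    return the label whose vector is closest to the query."""
    return min(gallery, key=lambda label: euclidean(query, gallery[label]))

# One hypothetical training sample per individual
gallery = {
    "individual_a": [0.1, 0.9, 0.3],
    "individual_b": [0.8, 0.2, 0.5],
}
print(nearest_neighbour([0.2, 0.8, 0.4], gallery))
```

With several near-identical images per individual, the gallery vectors for that individual would sit almost on top of each other, which is one way to see why such a burst of frames adds little over a single image.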

Possible datasets for testing path finding algorithms

I'm doing some work on pathfinding.
So far I have tested my code on scenes composed of 2D cells.
I've also created a simple 3d scene to test my work on as well.
I'd like to test my work on some more 3d scenes, but it is time-consuming to create them.
Does anyone know of any scene datasets that I could use to test my pathfinding algorithms on?
To get a better answer, you really need to specify the dimensionality of the configuration spaces that you want to consider. You aren't going to be tackling protein folding and docking problems (200+ degrees of freedom) with discrete graph searches. Even relatively small planning problems (by academic standards), with about 6 degrees of freedom, can quickly become intractable.
Most of the best examples for planning tend to be published in research papers first, and then make their way into more general use. Some of the best work tends to be published in IEEE journals, or at the Intelligent Robots and Systems (IROS) and International Conference on Robotics and Automation (ICRA) conferences. It may also be worth using the bibliography of a well-known reference in the field, such as "Motion Planning" by LaValle, as a starting point for further research (available in bibtex here).
Mark Overmars's work in the computational geometry and planning communities has made some of the problems considered in his publications very recognizable. It is worth checking whether any of his current grad students and collaborators have data sets available at the moment.
If you're happy to still be doing some work in 2d, and to manually convert an image to geometric data, Kris Beevers's website has a number of worked examples for a range of planners in 2d workspaces.
The Motion Strategy Library contains a number of classical motion planning problems for use in 2d and 3d workspaces, with varying dimensionality of configuration space depending on the problem. It includes:
L sections into a birdcage
trailers
multiple trailers
mazes
kinematic chains
non-holonomic cars
A more recent implementation of an academic motion planning library is The Open Motion Planning Library developed by the Kavraki lab. Because of licensing, I haven't checked personally, but I assume that they ship some examples and tests with their project.
A number of significantly more complex kinodynamic motion planning examples are now publicly available as part of the OpenRAVE project. Their gallery is eye opening.
When I need big 3D datasets, I usually use attractors or other dynamical series. You simply have to iterate as many times as you want, and it will generate a nice set of 3D data.
Try this 'Peter de Jong Attractor':
Xn+1 = sin(a Yn) - cos(b Xn)
Yn+1 = sin(c Xn) - cos(d Yn)
Where (for example): a = 1.4, b = -2.3, c = 2.4, d = -2.1
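A minimal sketch of iterating this map, using the example parameters above. The function name `de_jong` and the starting point (0, 0) are assumptions. Note that the two equations as written yield (x, y) pairs; the answer does not say how a third coordinate is derived, so the sketch stops at 2-D points.

```python
import math

def de_jong(n, a=1.4, b=-2.3, c=2.4, d=-2.1, x=0.0, y=0.0):
    """Iterate the Peter de Jong map n times and return the visited points."""
    points = []
    for _ in range(n):
        # Both new coordinates are computed from the previous (x, y)
        x, y = (math.sin(a * y) - math.cos(b * x),
                math.sin(c * x) - math.cos(d * y))
        points.append((x, y))
    return points

points = de_jong(10000)  # every coordinate stays within [-2, 2]
```

The points can then be scaled and snapped to a grid, for example to mark occupied cells in a test scene.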

Design of virtual trial room

As a part of my master's project I proposed to build a virtual trial room application intended for retail clothing stores. Currently it's meant to be used directly in store, though it may be extended for online stores as well.
This application will show customers how a selected garment would look on them by showing it on their 3D replica on screen.
It involves 3 steps
Sizing up the customer
Building customer replica 3D humanoid model
Apply simulated cloth on the model
My question is about the feasibility of the project and choice of framework.
Can this be achieved in real time using a normal Desktop computer? If yes what would be appropriate framework ( hardware, software, programming language etc ) for this purpose?
Based on the work I have done so far, I was planning to achieve the above steps in the following ways:
for step 1 : option a) Two cameras for front and side views or
option b) 1 Kinect or 2 Kinect for complete 3D data
for step 2: either use makehuman (http://www.makehuman.org/) code to build a customised 3D model using above data or build everything from scratch, unsure about the framework.
for step 3: I just need a few cloth samples, so I thought of building simulated clothes in Blender.
Currently I have only a vague idea about the different pieces, and I am not sure how to develop the complete application.
Theoretically this can be achieved in real time. Many useful algorithms for video tracking, stereo vision and 3D reconstruction are available in the OpenCV library. But it's very difficult to build a robust solution. For example, you'll probably need to track a human body that moves from frame to frame and perform pose estimation (OpenCV contains the POSIT algorithm); however, it's not trivial to eliminate noise in the resulting object coordinates. For inspiration see a nice work on video tracking.
You might want to go another way: simplify some things, avoid the complicated parts, make things less dynamic, and estimate only the clothes size and the approximate location of the person. In that case you will most likely create something useful and interesting.
I've lost the link to one online fitting room where hand and body detection is implemented. Using Kinect solves many problems. But if for some reason you won't use it, then AR (augmented reality) can help (yet another fitting room).

Comparing SIFT features stored in a mysql database

I'm currently extending an image library used to categorize images, and I want to find duplicate images, transformed images, and images that contain or are contained in other images.
I have tested the SIFT implementation from OpenCV and it works very well, but it would be rather slow for multiple images. To speed it up I thought I could extract the features and save them in a database, as a lot of other image-related metadata is already being held there.
What would be the fastest way to compare the features of a new image to the features in the database?
Usually comparison is done calculating the euclidean distance using kd-trees, FLANN, or with the Pyramid Match Kernel that I found in another thread here on SO, but haven't looked much into yet.
Since I don't know of a way to save and search a kd-tree in a database efficiently, I'm currently only seeing three options:
* Let MySQL calculate the euclidean distance to every feature in the database, although I'm sure that that will take an unreasonable time for more than a few images.
* Load the entire dataset into memory at the beginning and build the kd-tree(s). This would probably be fast, but very memory intensive. Plus all the data would need to be transferred from the database.
* Save the generated trees in the database and load all of them. This would be the fastest method, but it would also generate a lot of traffic, because with each new image the kd-trees would have to be rebuilt and sent to the server.
I'm using the SIFT implementation of OpenCV, but I'm not dead set on it. If there is a feature extractor more suitable for this task (and roughly equally robust) I'm glad if someone could suggest one.
So I basically did something very similar to this a few years ago. The algorithm you want to look into was proposed a few years ago by David Nister, the paper is: "Scalable Recognition with a Vocabulary Tree". They pretty much have an exact solution to your problem that can scale to millions of images.
Here is a link to the abstract; you can find a download link by googling the title.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018
The basic idea is to build a tree with a hierarchical k-means algorithm to model the features and then leverage the sparse distribution of features in that tree to quickly find your nearest neighbors... or something like that; it's been a few years since I worked on it. You can find a PowerPoint presentation on the author's webpage here: http://www.vis.uky.edu/~dnister/Publications/publications.html
A few other notes:
I wouldn't bother with the pyramid match kernel, it's really more for improving object recognition than duplicate/transformed image detection.
I would not store any of this feature stuff in an SQL database. Depending on your application it is sometimes more effective to compute your features on the fly since their size can exceed the original image size when computed densely. Histograms of features or pointers to nodes in a vocabulary tree are much more efficient.
SQL databases are not designed for doing massive floating point vector calculations. You can store things in your database, but don't use it as a tool for computation. I tried this once with SQLite and it ended very badly.
If you decide to implement this, read the paper in detail and keep a copy handy while implementing it, as there are many minor details that are very important to making the algorithm work efficiently.
The key, I think, is that this isn't a SIFT question. It is a question about approximate nearest neighbor search. Like image matching, this too is an open research problem. You can try googling "approximate nearest neighbor search" and see what types of methods are available. If you need exact results, try "exact nearest neighbor search".
The performance of all these geometric data structures (such as kd-trees) degrades as the number of dimensions increases, so the key, I think, is that you may need to represent your SIFT descriptors in a lower number of dimensions (say 10-30 instead of 256-1024) to have really efficient nearest neighbor searches (use PCA, for example).
Once you have this I think it will become secondary if the data is stored in MySQL or not.
I think speed is not the main issue here. The main issue is how to use the features to get the results you want.
If you want to categorize the images (e. g. person, car, house, cat), then the Pyramid Match kernel is definitely worth looking at. It is actually a histogram of the local feature descriptors, so there is no need to compare individual features to each other. There is also a class of algorithms known as the "bag of words", which try to cluster the local features to form a "visual vocabulary". Again, in this case once you have your "visual words" you do not need to compute distances between all pairs of SIFT descriptors, but instead determine which cluster each feature belongs to. On the other hand, if you want to get point correspondences between pairs of images, such as to decide whether one image is contained in another, or to compute the transformation between the images, then you do need to find the exact nearest neighbors.
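As a rough sketch of the "bag of words" step, assume the visual vocabulary (the cluster centres) has already been computed, for example with k-means over training descriptors. The function names and the tiny 2-D "descriptors" below are invented for illustration; real SIFT descriptors are much longer vectors.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry (cluster centre) closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(descriptor, vocabulary[i]))

def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical vocabulary of two visual words and four local descriptors
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]]
histogram = bag_of_words(descriptors, vocabulary)
```

Each image is then summarized by one fixed-length histogram, so two images are compared with a single vector comparison rather than by matching every pair of descriptors.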
Also, there are local features other than SIFT. For example, SURF features are similar to SIFT, but they are faster to extract, and they have been shown to perform better for certain tasks.
If all you want to do is to find duplicates, you can speed up your search considerably by using a global image descriptor, such as a color histogram, to prune out images that are obviously different. Comparing two color histograms is orders of magnitude faster than comparing two sets each containing hundreds of SIFT features. You can create a short list of candidates using color histograms, and then refine your search using SIFT.
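The pruning step could be sketched like this, assuming each image is available as a sequence of (r, g, b) pixel tuples with values in 0..255. The bin count, the use of histogram intersection as the similarity measure, and the 0.8 threshold are all illustrative assumptions, not values from the answer.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 entries."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    count = 0
    for r, g, b in pixels:
        # Clamp so that bin counts which don't divide 256 evenly stay in range
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def shortlist(query_hist, database, threshold=0.8):
    """Names of stored images whose histogram is similar enough to the query."""
    return sorted(name for name, hist in database.items()
                  if intersection(query_hist, hist) >= threshold)

# Hypothetical database of precomputed histograms
database = {
    "near_red.png": color_histogram([(250, 5, 5)] * 10),
    "blue.png": color_histogram([(0, 0, 255)] * 10),
}
candidates = shortlist(color_histogram([(255, 0, 0)] * 10), database)
```

Only the images in `candidates` would then go through the slower SIFT comparison.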
I have some tools in Python you can play with here. Basically it's a package that takes SIFT descriptor vectors and computes a nearest-lattice hash of each 128-d SIFT vector. The hashing is the important part, as it is locality sensitive, meaning that vectors that are near each other in R^n are likely to collide on the same hash. The work I provide is an extension of Andoni's that provides a query-adaptive heuristic for pruning the LSH exact search lists, as well as an optimized CUDA implementation of the hashing function. I also have a small app that does image database search with nice visual feedback, all under BSD (the exception is SIFT, which has some additional restrictions).