Classification with machine learning and a small database - c++

I want to build a valve detector and classifier like the one in this video: https://www.youtube.com/watch?v=VY92fqmSdfA
It should detect whether the valve is open, closed, or in an intermediate position.
I have done some research and found several candidate methods, but my solution has to respect the following conditions:
Condition 1: Use machine learning in the application. I can't use simple methods like template matching.
Condition 2: Use a small database (minimum 10 images per class, maximum 40 images per class).
Condition 3: Detect the position of the valve even if the camera position changes, so I can't rely on color alone to detect the valve handle.
I want to use HOG (Histogram of Oriented Gradients) + SVM/ANN, but HOG needs a lot of images to train the SVM/ANN.
Is it possible to solve this problem while respecting these conditions?

The most important thing ML approaches need in order to work properly is data, so I'd say your 1st and 2nd conditions conflict with each other. Your 3rd condition adds further complexity to the problem. You could address it by including more data from different angles and illumination conditions, but again, that conflicts with condition 2.
Even so, if you'd like to follow the ML path, I'd recommend using a pre-trained model, strong data augmentation and, maybe, an ensemble of models to help improve detection. As the problem is not that hard, it should work.
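To illustrate the augmentation suggestion, here is a minimal Python sketch that turns one grayscale image into several perturbed copies. It is only an illustration, not a tested recipe for this valve problem: the function names (shift_brightness, translate, augment) and the default ranges are hypothetical, images are plain lists of pixel rows rather than OpenCV matrices, and only brightness shifts and small translations are used, on the assumption that flips or large rotations could change the apparent handle position and therefore the label.

```python
import random


def shift_brightness(image, delta):
    """Add delta to every pixel, clamping to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def augment(image, copies, rng, max_shift=2, max_delta=30):
    """Return `copies` randomly perturbed variants of one grayscale image."""
    variants = []
    for _ in range(copies):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        delta = rng.randint(-max_delta, max_delta)
        variants.append(shift_brightness(translate(image, dx, dy), delta))
    return variants
```

With 40 images per class and, say, 25 copies each, this would give 1,000 training samples per class, although the copies are strongly correlated and do not replace real variation in viewpoint.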

Related

Text Detection with YOLO on Challenging Images

I have images that look as follows:
My goal is to detect and recognize the number 31197394. I have already fine-tuned a deep neural network for text recognition. It successfully identifies the correct number if the input is provided in the following format:
The only task that remains is detecting the corresponding bounding box. For this purpose I have tried darknet. Unfortunately, it doesn't recognize anything. Does anyone know of a network that performs better on this kind of image? I know that Amazon Rekognition can solve this task, but I need a solution that works offline. So my hopes are still high that there are pre-trained networks that work. Thanks a lot for your help!
Don't say darknet doesn't work; it depends on how you labeled your dataset. It is true that the numbers you want to recognize are very small, so if you don't change the images during pre-processing, it will be hard for a neural network to recognize them well. Here is what you can do that will surely work:
1) Before labeling, scale all images up to twice their current size (for example 1000 * 1000).
2) Use this size (1000 * 1000) for darknet training instead of the default 416 * 416. You will have to change the configuration file accordingly.
3) Use the latest darknet version (YOLO v4).
4) In the configuration file, always keep subdivisions at 1.
Note that this method is very memory-hungry, so you need a machine with more than 16 GB of RAM. The advantage is that it works...
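As a small aid for the configuration change, here is a hedged Python sketch that prints the lines one might put in the [net] section of the .cfg file. It assumes, as far as I know from the darknet README, that the network width and height should be multiples of 32, so a requested 1000 is rounded to 992. The helper names network_size and cfg_lines are hypothetical and not part of darknet.

```python
def network_size(target, stride=32):
    """Round `target` to the nearest positive multiple of `stride`."""
    return max(stride, round(target / stride) * stride)


def cfg_lines(width, height, subdivisions=1):
    """Build the resolution-related lines for the [net] section of a darknet cfg."""
    return [
        f"width={network_size(width)}",
        f"height={network_size(height)}",
        f"subdivisions={subdivisions}",
    ]


if __name__ == "__main__":
    print("\n".join(cfg_lines(1000, 1000)))
```

The rest of the configuration file (classes, filters, anchors) still has to be adapted to the dataset and is not covered here.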
Thanks for your answers, guys! You were right, I had to fine-tune YOLO to make it work. So I created a dataset and fine-tuned yolov5. I am surprised how good the results are. Despite having only about 300 images in total, I get 97% accuracy in predicting the correct number. This is mainly due to strong augmentations. The memory requirements are indeed large, but I could train on a machine with 32 GB of RAM. I can really encourage anyone who faces similar problems to give YOLO a shot!
Maybe use an R-CNN to identify the region where the number is, and then pass that region to your fine-tuned neural network for the digit classification.
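The two-stage idea (detect the region, then hand the crop to the recognizer) can be sketched in Python as follows. This is only a skeleton under stated assumptions: detector and recognizer are hypothetical callables supplied by the caller (for example a YOLO or R-CNN wrapper and the fine-tuned text network), boxes are (left, top, right, bottom) in pixels, and images are plain lists of pixel rows.

```python
def crop(image, box, pad=0):
    """Cut box = (left, top, right, bottom) out of an image, with optional padding."""
    h, w = len(image), len(image[0])
    left, top, right, bottom = box
    left, top = max(0, left - pad), max(0, top - pad)
    right, bottom = min(w, right + pad), min(h, bottom + pad)
    return [row[left:right] for row in image[top:bottom]]


def detect_then_recognize(image, detector, recognizer, pad=2):
    """Run `recognizer` on every region that `detector` proposes for `image`."""
    return [recognizer(crop(image, box, pad)) for box in detector(image)]
```

A little padding around the detected box is often useful because recognizers tend to be trained on crops with some margin, but the right amount depends on the recognizer.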

Style transfer on large image. (in chunks?)

I am looking into various style transfer models and noticed that they all have limited resolution (when running on a Pixel 3, for example, I couldn't go beyond 1,024x1,024 without running out of memory).
I've noticed a few apps (e.g. this app) that appear to do style transfer on images of up to ~10 MP. These apps also show a progress bar, which I guess means they don't just make a single TensorFlow "run" call for the entire image, as otherwise they wouldn't know how much had been processed.
I would guess they use some sort of tiling, but naively splitting the image into 256x256 tiles produces inconsistent style (not just at the borders).
As this seems like an obvious problem, I tried to find publications about it, but couldn't find any. Am I missing something?
Thanks!
I would guess people split the model into multiple sub-models (for VGG this is easy to do manually, e.g. by layers) and then use the Keras model.summary() output (or benchmarks) to estimate the relative time each step takes and drive the progress bar accordingly. Such a split probably also saves memory, as TensorFlow Lite might not be clever enough to reuse the memory holding intermediate activations from lower layers once they are no longer needed.
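The staged-execution guess can be sketched in a few lines of Python. This is a framework-free illustration, not how any particular app works: each stage is a hypothetical callable standing in for one sub-model, and its cost is an estimate you would have to obtain yourself (for example from benchmarks).

```python
def run_in_stages(stages, data, on_progress):
    """Run (function, estimated_cost) stages in order and report fractional progress."""
    total = sum(cost for _, cost in stages)
    done = 0.0
    for fn, cost in stages:
        data = fn(data)  # the previous intermediate result can be released here
        done += cost
        on_progress(done / total)
    return data


if __name__ == "__main__":
    # Toy stages standing in for sub-models; costs are made-up relative timings.
    stages = [(lambda x: x + 1, 1.0), (lambda x: x * 2, 3.0)]
    print(run_in_stages(stages, 1, lambda p: print(f"{p:.0%}")))
```

Because only one intermediate result is held at a time, this structure also makes it explicit when earlier activations are no longer referenced.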

Opencv Rating Features in an Image

The OpenCV forum has been unavailable for a few days, so I am posting this question here. I want to implement a class in C++ that will analyze an image and determine how good that image is for feature tracking.
One approach has been explained by Vuforia.
https://developer.vuforia.com/library/articles/Solution/Natural-Features-and-Ratings
1) Number of Features
Count the number of features returned and require a minimum, say 30 features.
2) Local contrast
The variance can be used as a starting point to measure how much variation there is in the image. What sort of preprocessing would this require to get the most out of this metric?
How can we improve this? With a Fourier transform (DFT), would it be possible to see whether there is high contrast at many different image frequencies? How would that be achieved?
DFT -> Variance (?)
3) Feature distribution
This can be done with clustering, with a suitable center and a mean + s.d. comparable to the image dimensions. Ideally, 95% of the features should lie within mean + 2 x s.d.
4) Avoid organic shapes
These yield no features, so this is the same criterion as the number of features.
5) Avoid repetitive patterns
Match the detected features against themselves and make sure there aren't too many duplicates.
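As a starting point for criteria 2, 3 and 5, a rough sketch might look like the following. It is written in Python with the standard library only (not OpenCV or C++) to keep it self-contained, and all names and thresholds are hypothetical. Two simplifications are assumed: local contrast is measured as the mean variance over fixed blocks instead of a DFT, and feature distribution is measured as grid-cell coverage instead of clustering with mean and s.d.

```python
from statistics import pvariance


def local_contrast(gray, block=8):
    """Mean per-block variance of a grayscale image given as rows of ints."""
    h, w = len(gray), len(gray[0])
    scores = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            pixels = [gray[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            scores.append(pvariance(pixels))
    return sum(scores) / len(scores) if scores else 0.0


def grid_coverage(points, width, height, cells=4):
    """Fraction of cells in a cells x cells grid that contain at least one feature."""
    occupied = set()
    for x, y in points:
        cx = min(cells - 1, int(x * cells / width))
        cy = min(cells - 1, int(y * cells / height))
        occupied.add((cx, cy))
    return len(occupied) / (cells * cells)


def duplicate_ratio(descriptors, max_distance):
    """Fraction of descriptors whose nearest other descriptor is within max_distance."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    n = len(descriptors)
    if n < 2:
        return 0.0
    duplicates = 0
    for i, d in enumerate(descriptors):
        nearest = min(dist(d, o) for j, o in enumerate(descriptors) if j != i)
        if nearest <= max_distance:
            duplicates += 1
    return duplicates / n
```

In a real implementation the points and descriptors would come from a feature detector, and the brute-force self-matching (quadratic in the number of features) would likely be replaced by a proper matcher.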
Vuforia does the same.
But if you want to write your own code for this, ARToolkit is an open-source SDK that provides the same feature for NFT markers. If you go through the ARToolkit source code, you will find something like "DisplayFeatureSet".
There is also a DisplayfeatureSet.exe file that shows the features (hotspots) of a selected image like:
Somehow I managed to get the source code (.c) for this.
Here is my Google Drive link to download the source code. Work on it and share your experience:
Source Code to Display Feature Set
Good luck :)

Google Inceptionism: obtain images by class

In the famous Google Inceptionism article,
http://googleresearch.blogspot.jp/2015/06/inceptionism-going-deeper-into-neural.html
they show images obtained for each class, such as banana or ant. I want to do the same for other datasets.
The article does describe how it was obtained, but I feel that the explanation is insufficient.
There is related code:
https://github.com/google/deepdream/blob/master/dream.ipynb
but it produces a random dreamy image, rather than taking a specified class and learning what that class looks like in the network, as shown in the article above.
Could anyone give a more concrete overview, or code or a tutorial, on how to generate images for a specific class? (Preferably assuming the Caffe framework.)
I think this code is a good starting point to reproduce the images Google team published. The procedure looks clear:
1) Start with a pure noise image and a class (say "cat")
2) Perform a forward pass and backpropagate the error wrt the imposed class label
3) Update the initial image with the gradient computed at the data layer
There are some tricks involved, that can be found in the original paper.
It seems that the main difference is that Google folks tried to get a more "realistic" image:
By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
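To make the procedure concrete, here is a toy Python sketch of the same loop. It is not Caffe code and not the method Google used: the "network" is assumed to be a linear scorer (one weight vector per class), so the gradient of the class score with respect to the image is simply that weight vector, the image is a 1-D list of pixels, and the natural-image prior is replaced by a crude penalty on differences between neighboring pixels. All function names are hypothetical.

```python
import random


def class_score(weights, image, cls):
    """Score of class `cls` under a linear model: dot(weights[cls], image)."""
    return sum(w * p for w, p in zip(weights[cls], image))


def smoothness_gradient(image):
    """Gradient of sum_i (image[i+1] - image[i])**2 with respect to the image."""
    grad = [0.0] * len(image)
    for i in range(len(image) - 1):
        diff = image[i + 1] - image[i]
        grad[i] -= 2 * diff
        grad[i + 1] += 2 * diff
    return grad


def visualize_class(weights, cls, steps=100, lr=0.1, prior=0.1, seed=0):
    """Gradient ascent on the input, starting from noise, with a smoothness prior."""
    rng = random.Random(seed)
    image = [rng.uniform(-0.01, 0.01) for _ in weights[cls]]
    for _ in range(steps):
        score_grad = weights[cls]  # d(score)/d(image) for the linear toy model
        smooth_grad = smoothness_gradient(image)
        image = [x + lr * (g - prior * s)
                 for x, g, s in zip(image, score_grad, smooth_grad)]
    return image
```

With a real network, the line computing score_grad would be replaced by a forward pass followed by a backward pass down to the data layer, as described in the steps above.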

Reduce a Caffe network model

I'd like to use Caffe to extract image features. However, it takes too long to process an image, so I'm looking for ways to optimize for speed.
One thing I noticed is that the network definition I'm using has four extra layers on top of the one from which I'm reading the result (and there are no feedback signals, so they should be safe to delete).
I tried deleting them from the definition file, but it had no effect at all. I guess I might also need to remove the corresponding part of the file that contains the pre-trained weights. That, however, is a binary file (a protobuf), so editing it is not that easy.
Do you think removing the four layers might have a significant effect on the net's performance?
If so, how do I get familiar with the file contents so that I can edit it, and how do I know which parts to remove?
First, I don't think removing the binary weights will have any effect.
Second, you can do it easily using the Python interface: see this tutorial.
Last but not least, have you tried running caffe time to measure the performance of your net? This may help you identify the bottlenecks in your computations.
PS: You might find this thread relevant as well.
A caffemodel stores data as key-value pairs. Caffe only copies weights for those layers (in train.prototxt) that have exactly the same name as in the caffemodel. Hence I don't think removing the binary weights will work. If you want to change the network structure, just modify train.prototxt and deploy.prototxt.
If you insist on removing weights from the binary file, follow this Caffe example.
To make sure you delete the right part, this visualizing tool should help.
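The name-matching behavior described above can be pictured with a toy Python sketch. It uses plain dictionaries as stand-ins and does not call Caffe; copy_matching_weights and the layer names are hypothetical.

```python
def copy_matching_weights(net_layers, stored_weights):
    """Load weights only for layers whose names appear in both places.

    net_layers: layer names from the (possibly trimmed) network definition.
    stored_weights: dict mapping layer name to weights, standing in for a caffemodel.
    Returns (loaded, ignored): the weights that were copied and the stored
    layer names that the definition no longer mentions.
    """
    loaded = {name: stored_weights[name] for name in net_layers if name in stored_weights}
    ignored = [name for name in stored_weights if name not in net_layers]
    return loaded, ignored


if __name__ == "__main__":
    stored = {"conv1": [0.1], "fc6": [0.2], "fc7": [0.3], "fc8": [0.4]}
    loaded, ignored = copy_matching_weights(["conv1", "fc6"], stored)
    print(sorted(loaded), ignored)
```

Under this picture, weights for layers removed from the definition are simply never used, which is why editing the binary file should not be necessary.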
I would retrain on a smaller input size, change strides, etc. However, if you want to reduce file size, I'd suggest quantizing the weights (https://github.com/yuanyuanli85/CaffeModelCompression) and then using something like LZMA compression (xz on Unix). We do this so we can deploy to mobile devices. 8-bit weights compress nicely.
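For illustration, here is a minimal Python sketch of quantizing weights to 8 bits and compressing them with LZMA. It is not the scheme used by the linked repository: it assumes simple linear (min/max) quantization of a flat list of floats, a made-up 16-byte header holding the minimum and the step size, and hypothetical function names.

```python
import lzma
import struct


def quantize(weights):
    """Map floats linearly onto 0..255; return (codes, minimum, step)."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / 255 or 1.0  # avoid a zero step when all weights are equal
    codes = bytes(min(255, round((w - lo) / step)) for w in weights)
    return codes, lo, step


def dequantize(codes, lo, step):
    """Approximate the original floats from their 8-bit codes."""
    return [lo + c * step for c in codes]


def pack(weights):
    """Quantize, prepend a small header, and LZMA-compress."""
    codes, lo, step = quantize(weights)
    return lzma.compress(struct.pack("<dd", lo, step) + codes)


def unpack(blob):
    """Inverse of pack(): decompress, read the header, and dequantize."""
    raw = lzma.decompress(blob)
    lo, step = struct.unpack("<dd", raw[:16])
    return dequantize(raw[16:], lo, step)
```

The reconstruction error per weight is at most half a quantization step, so whether this is acceptable has to be checked against the accuracy of the network after dequantization.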