Text Detection with YOLO on Challenging Images - computer-vision

I have images that look as follows:
My goal is to detect and recognize the number 31197394. I have already fine-tuned a deep neural network on text recognition. It can successfully identify the correct number, if it is provided it in the following format:
The only task that remains is the detection of the corresponding bounding box. For this purpose, I have tried darknet. Unfortunately, it's not recognizing anything. Does anyone have an idea of a network that performs better on these kind of images? I know, that amazon recognition is able to solve this task. But I need a solution that works offline. So my hopes are still high that there exist pre-trained networks that work. Thank's a lot for your help!

Don't say darknet doesn't work. It depends on how you labeled your dataset. It is true that the numbers you want to recognize are too small so if you don't make any changes to the image during the pre-processing phase, it would be complicated for a neural network to recognize them well. So what you can do that will surely work is:
1---> Before labeling, increase the size of all images by 2 times its current size (like 1000*1000)
2---> used this size (1000 * 1000) for the darket trainer instead of the default size proposed by darknet which is 416 * 416. You would then have to change the configuration file
3---> use the latest darknet version (yolo v4)
4---> On the configuration file, always keep a number of subdivisions at 1.
I also specify that this method is too greedy in memory, it is therefore necessary to provide a machine with RAM > 16 GB. The advantage is that it works...

Thanks for your answers guys! You were right, I had to finetune yolo to make it work. So I created a dataset and fine-tuned yolov5. I am surprised how good the results are. Despite only having about 300 images in total, I get an accuracy of 97% to predict the correct number. This is mainly due to strong augmentations. And indeed the memory requirements are large, but I could train on a 32 GM RAM machine. I can really encourage anyone who faces similar problems to give yolo a shot!!

Maybe use an R-CNN to identify the region where the number is and then pass that region to your fine-tuned neural network for the digit classification

Related

Machine Vision - Hash An Image

I'm in the feasibility stage of a project and wanted to know whether the following was doable using Machine Vision:
If I wanted to see if two files were identical, I would use a hashing function of sorts (e.g. sha1 or md5) on the files and store the results in a database.
However, if I have two images where say image 1 is 90% quality and image 2 is 100% quality, then this will not work as they will have different hashes.
Using machine vision, is it possible to "look" at an image and create a signature from it, so that when another image is encountered, we can say "have we already got this image in the system", and if so, disregard the new image, and if not, save the image?
I know that you are able to perform Machine Vision comparison between two known images, e.g.:
https://www.pyimagesearch.com/2014/09/15/python-compare-two-images/
(there's a lot of code in there so I cannot simply paste in here for reference, unfortunately)
but an image by image comparison would be extremely expensive.
Thanks
python provide the module called : imagehash :
imagehash - encodes the image which is commend bellow.
from PIL import Image
import imagehash
hash = imagehash.average_hash(Image.open('./image_1.png'))
print(hash)
# d879f8f89b1bbf
otherhash = imagehash.average_hash(Image.open('./image_2.png'))
print(otherhash)
# ffff3720200ffff
print(hash == otherhash)
# False
print(hash)
above is the python code which will print "true" if images are identical and "false" if images are not identical.
Thanks.
I do not what you mean by 90% and 100%. Are they image compression quality using JPEG? Regardless of this, you can match images using many methods for example using image processing only approaches such as SIFT, SURF, BRISK, ORB, FREAK or machine learning approaches such as Siamese networks. However, they are heavy for simple computer to run (on my computer powered by core-i7 2670QM, from 100 to 2000 ms for a 2 mega pixel match), specially if you run them without parallelism ( programming without GPU, AVX, ...), specially the last one.
For hashing you may also use perceptual hash functions. They are widely used in finding cases of online copyright infringement as well as in digital forensics because of the ability to have a correlation between hashes so similar data can be found (for instance with a differing watermark) [1]. Also you can search copy move forgery and read papers around it and see how similar images could be found.

Caffe Batch processing no speedup

I would like to speedup the forward pass of classification of a CNN using caffe.
I have tried batch classification in Caffe using code provided in here:
Modifying the Caffe C++ prediction code for multiple inputs
This solution enables me to give a vector of Mat, but it does not speed up anything. Even though the input layer is modified.
I am processing pretty small images (3x64x64) on a powerful pc with two GTX1080, and there is no issue in terms of memory.
I tried also changing the deploy.prototxt, but I get the same result.
It seems that at one point the forward pass of the CNN becomes sequential.
I have seen someone pointing this out here also:
Batch processing mode in Caffe - no performance gains
Another similar thread, for python : batch size does not work for caffe with deploy.prototxt
I have seen some things about MemoryDataLayer, but I am not sure this will solve my problem.
So I am kind of lost on what to do exactly... does anyone have any information on how to speedup classification time.
Thanks for any help !

Opencv Rating Features in an Image

The OpenCV forum has been unavailable for a few days so i am posting this questions here. I want to implement a class in C++ that will analyze an image and determine how good that image is for feature tracking.
One approach has been explained by Vuforia.
https://developer.vuforia.com/library/articles/Solution/Natural-Features-and-Ratings
1) Number of Features
Count the number of features returned, let's say requires min 30 features.
2) Local contrast
The variance can be used as a starting point to measure how much variation there is in the image. What sort of preprocessing would this require to get the most out of this metric?
How can we improve this? With a FT or DFT transform, would it be possible to see if there is high contrast at lots of different image frequencies? How would that be achieved?
DFT -> Variance (?)
3) Feature distribution
This can be done with clustering, with a suitable center and mean+s.d. that is comparable to the image dimensions. 95% should be within mean + 2 x s.d. ideally.
4) Avoid organic shapes
This will yield no features, so is the same criteria as the number of features.
5) Avoid repetitive patterns
Match detected features against itself and make sure there aren't too many duplicates.
Vuforia do the same .
But if you want to write your own code to do the same then,
ARToolkit is open source SDK which provide same feature for NFT markers . if you go through the source code of ARToolkit then you
will find something like " DisplayFeatureSet"
There is DisplayfeatureSet.exe file also there which show the
feature(Hotspots) of selected image like:
Somehow I managed to get source code(.c) for this.
Here I providing My google Drive Link to download Source Code, Work on it and share your experience :
Source Code to Display Feature Set
Best Luck :)

Reduce a Caffe network model

I'd like to use Caffe to extract image features. However, it takes too long to process an image, so I'm looking for ways to optimize for speed.
One thing I noticed is that the network definition I'm using has four extra layers on top the one from which I'm reading a result (and there are no feedback signals, so they should be safe to delete).
I tried to delete them from the definition file but it had no effect at all. I guess I might need to remove the corresponding part of the file that contains pre-trained weights, too. That is, however, a binary file (a protobuffer) so editing it is not that easy.
Do you think that removing the four layers might have a profound effect of the net performance?
If so then how do I get familiar with the file contents so that I could edit it and how do I know which parts to remove?
first, I don't think removing the binary weights will have any effect.
Second, you can do it easily using the python interface: see this tutorial.
Last but not least, have you tried running caffe time to measure the performance of your net? this may help you identify the bottlenecks of your computations.
PS,
You might find this thread relevant as well.
Caffemodel stores data as key-value pair. Caffe only copies weight for those layers (in train.prototxt) having exactly same name as caffemodel. Hence I don't think removing binary weights will work. If you want to change network structure, just modify train.prototxt and deploy.txt.
If you insist to remove weights from binary file, follow this caffe example.
And to make sure you delete right part, this visualizing tool should help.
I would retrain on a smaller input size, change strides, etc. However if you want to reduce file size, I'd suggest quantizing the weights https://github.com/yuanyuanli85/CaffeModelCompression and then using something like lzma compression (xz for unix). We do this so we can deploy to mobile devices. 8 bit weights compress nicely.

Creating custom voice commands (GNU/Linux)

I'm looking for advices, for a personal project.
I'm attempting to create a software for creating customized voice commands. The goal is to allow user/me to record some audio data (2/3 secs) for defining commands/macros. Then, when the user will speak (record the same audio data), the command/macro will be executed.
The software must be able to detect a command in less than 1 second of processing time in a low-cost computer (RaspberryPi, for example).
I already searched in two ways :
- Speech Recognition (CMU-Sphinx, Julius, simon) : There is good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech Recognition could consume too much power for a small feature.
- Audio Fingerprinting (Chromaprint -> http://acoustid.org/chromaprint) : It seems to be almost what I'm looking for. The principle is to create fingerprint from raw audio data, then compare fingerprints to determine if they can be identical. However, this kind of software/library seems to be designed for song identification (like famous softwares on smartphones) : I'm trying to configure a good "comparator", but I think I'm going in a bad way.
Do you know some dedicated software or parcel of code doing something similar ?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. A speech recognition software is too complicated for such a task. I used FFT implementation in C++ to extract Fourier components of the sampled voice, and then I created a histogram of major frequencies (frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
Comparing the similarities between histogram of the given voice command with those saved in the memory to identify the most probable command.
Using Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM and the results are considerably better than the first approach. However, one problem with SVM method is that you need a rather large data set for training. Another problem is that, when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong command detection). This can be avoided by the first approach where I had a threshold for similarity measure.
I hope this helps you to implement your own voice activated software.
Song fingerprint is not a good idea for that task because command timings can vary and fingerprint expects exact time match. However its very easy to implement matching with DTW algorithm for time series and features extracted with CMUSphinx library Sphinxbase. See Wikipedia entry about DTW for details.
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download