I am making a small project (I hope it stays small), and, of course, I use functions I wrote myself. Is it better (more Pythonic) to move them to a separate file (or files) and import them, or just leave them in the same file? Details about my project:
It uses NumPy, PyBrain and PIL, and runs on my home computer. Basically it is just an experiment with neural networks recognising digits. The current algorithm:
Generate a set of pictures containing digits
Make them fit my requirements (normalising them)
Put them into an NN-friendly form
Train my NN on them
Take the user's input (a picture with a drawn digit)
Same as steps 2+3, but for the input received in 5
Feed it into the NN
Observe the results.
Steps 1, 2, 3 and 6 contain functions; in total there are about 4-5 functions.
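To make the trade-off concrete, here is a minimal sketch of what such a split could look like (the file and function names are hypothetical, not taken from the actual project):

# helpers.py -- hypothetical module collecting the reusable functions
import numpy as np

def normalise(img, size=(28, 28)):
    """Resize a PIL image and return it as a flat NumPy array in [0, 1]."""
    return np.asarray(img.convert('L').resize(size), dtype=float).ravel() / 255.0

# main.py -- the experiment script then only imports what it needs:
# from helpers import normalise

With only 4-5 functions either choice works; a separate module mainly pays off once the experiment script and the helpers start changing independently.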
I'm in the feasibility stage of a project and wanted to know whether the following was doable using Machine Vision:
If I wanted to see if two files were identical, I would use a hashing function of sorts (e.g. sha1 or md5) on the files and store the results in a database.
However, if I have two images where say image 1 is 90% quality and image 2 is 100% quality, then this will not work as they will have different hashes.
Using machine vision, is it possible to "look" at an image and create a signature from it, so that when another image is encountered, we can say "have we already got this image in the system", and if so, disregard the new image, and if not, save the image?
I know that you are able to perform Machine Vision comparison between two known images, e.g.:
https://www.pyimagesearch.com/2014/09/15/python-compare-two-images/
(there's a lot of code in there so I cannot simply paste in here for reference, unfortunately)
but an image-by-image comparison would be extremely expensive.
There is a Python module called imagehash that encodes an image into a perceptual hash, as shown below:
from PIL import Image
import imagehash
hash = imagehash.average_hash(Image.open('./image_1.png'))
print(hash)
# d879f8f89b1bbf
otherhash = imagehash.average_hash(Image.open('./image_2.png'))
print(otherhash)
# ffff3720200ffff
print(hash == otherhash)
# False
The code above prints False because the two hashes differ; it would print True if the images were perceptually identical.
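As a follow-up, imagehash hashes can also be compared for similarity rather than strict equality: subtracting two hashes gives the number of differing bits (the Hamming distance), so near-duplicates can be accepted below some threshold (the threshold of 5 below is just an assumption):

# Hamming distance between the two hashes: 0 = identical, larger = more different
distance = hash - otherhash
print(distance)
# Treat the images as duplicates if only a few bits differ (threshold is arbitrary)
print(distance <= 5)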
I do not know what you mean by 90% and 100%. Do you mean JPEG compression quality? Regardless of this, you can match images using many methods, for example image-processing-only approaches such as SIFT, SURF, BRISK, ORB or FREAK, or machine-learning approaches such as Siamese networks. However, they are heavy for a simple computer to run (on my computer with a Core i7-2670QM, from 100 to 2000 ms for a 2-megapixel match), especially if you run them without parallelism (no GPU, AVX, ...), and especially the last one.
For hashing you may also use perceptual hash functions. They are widely used in finding cases of online copyright infringement as well as in digital forensics, because of the ability to have a correlation between hashes so that similar data can be found (for instance with a differing watermark) [1]. You can also look into copy-move forgery detection and read the papers around it to see how similar images can be found.
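As a rough illustration of the feature-based route, here is a minimal OpenCV sketch using ORB with a brute-force Hamming matcher (the file names and the match-distance threshold are assumptions; SIFT, SURF, BRISK or FREAK follow the same pattern with a different detector and norm):

import cv2

# Load both images in grayscale (paths are placeholders)
img1 = cv2.imread('image_1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('image_2.png', cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary ORB descriptors
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (appropriate for ORB descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Many low-distance matches suggest the two images show the same content
good = [m for m in matches if m.distance < 40]
print(len(good))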
I have a project that makes use of the Google Vision API DOCUMENT_TEXT_DETECTION feature in order to extract text from document images.
Often the API has trouble recognizing single digits, as you can see in this image:
I suppose the problem could be related to some noise-removal algorithm that treats isolated single digits as noise. Is there a way to improve the Vision response in these situations (for example by managing a noise threshold or other parameters)?
At other times Vision confuses digits with letters:
But if I specify the parameter languageHints = 'en' or 'mt', these digits are ignored by the OCR. Is there a way to force the recognition of digits or Latin characters?
Unfortunately I think the Vision API is optimized for both ends of the spectrum -- dense text (DOCUMENT_TEXT_DETECTION) on one end, and arbitrary bits of text (TEXT_DETECTION) on the other. As you noted in the comments, the regular TEXT_DETECTION works better for these stray single digits while DOCUMENT_TEXT_DETECTION works better overall.
As far as I've heard, there are no current plans to try to cover both of these in a single way, but it's possible that this could improve in the future.
I think there have been other requests to do more fine-tuning and hinting on what you're looking to detect (e.g., here and here), but this doesn't seem to be available yet. Perhaps in the future you'll be able to provide more hints on the format of the text that you're looking to find in images (e.g., phone numbers, single digits, etc).
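For what it's worth, switching between the two features is just a different client call; here is a minimal sketch with the Python client, assuming a recent google-cloud-vision release and a placeholder file name:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open('page.png', 'rb') as f:          # placeholder file name
    image = vision.Image(content=f.read())

# TEXT_DETECTION tends to handle stray single characters better
sparse = client.text_detection(image=image)

# DOCUMENT_TEXT_DETECTION is tuned for dense document text; language hints optional
dense = client.document_text_detection(
    image=image,
    image_context={'language_hints': ['en']},
)

print(sparse.text_annotations[0].description if sparse.text_annotations else '')
print(dense.full_text_annotation.text)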
I want to record a word beforehand, and when the same password is spoken into the Python script, the program should run if the spoken password matches the previously recorded file. I do not want to use speech recognition toolkits, as the password might not be a proper word but could be complete gibberish. I started by saving the previously recorded file and the newly spoken sound as NumPy arrays. Now I need a way to determine whether the two arrays are 'close' to each other. Can someone point me in the right direction?
It is not possible to compare two speech samples at the sample level (i.e. in the time domain). Each part of the spoken words might vary in length, so they won't line up, and the levels of each part will also vary, and so on. Another problem is that the phase of the individual components that the sound signal consists of can change too, so two signals that sound the same can look very different in the time domain. So the best solution is likely to move the signal into the frequency domain. One common way to do this is the Fast Fourier Transform (FFT); there is a lot of material about it on the net, and good support for it in Python.
Then you could proceed like this:
Divide the sound sample into small segments of a few milliseconds.
Find the principal coefficients of the FFT of each segment.
Compare the sequences of some selected principal coefficients.
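A very rough sketch of that idea in Python/NumPy (the frame length, the number of coefficients kept, and the distance threshold are all assumptions, and a real system would also need some time alignment, e.g. DTW):

import numpy as np

def spectral_features(signal, frame_len=512, n_coeffs=16):
    """Split the signal into frames and keep the largest FFT magnitudes per frame."""
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        mags = np.abs(np.fft.rfft(frame))
        feats.append(np.sort(mags)[-n_coeffs:])   # principal (largest) coefficients
    return np.array(feats)

def similar(a, b, threshold=1000.0):
    """Compare two recordings; truncate to the shorter one and use mean distance."""
    fa, fb = spectral_features(a), spectral_features(b)
    n = min(len(fa), len(fb))
    return np.mean(np.abs(fa[:n] - fb[:n])) < threshold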
I'd like to use Caffe to extract image features. However, it takes too long to process an image, so I'm looking for ways to optimize for speed.
One thing I noticed is that the network definition I'm using has four extra layers on top of the one from which I'm reading the result (and there are no feedback signals, so they should be safe to delete).
I tried to delete them from the definition file but it had no effect at all. I guess I might need to remove the corresponding part of the file that contains pre-trained weights, too. That is, however, a binary file (a protobuffer) so editing it is not that easy.
Do you think that removing the four layers might have a profound effect on the net's performance?
If so then how do I get familiar with the file contents so that I could edit it and how do I know which parts to remove?
First, I don't think removing the weights from the binary file will have any effect.
Second, you can do it easily using the Python interface: see this tutorial.
Last but not least, have you tried running caffe time to measure the performance of your net? This may help you identify the bottlenecks in your computation.
PS,
You might find this thread relevant as well.
A caffemodel stores data as key-value pairs. Caffe only copies weights for those layers (in train.prototxt) whose names exactly match layer names in the caffemodel. Hence I don't think removing the binary weights will make a difference. If you want to change the network structure, just modify train.prototxt and deploy.prototxt.
If you insist on removing weights from the binary file, follow this Caffe example.
And to make sure you delete the right part, this visualizing tool should help.
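Since weights are copied by layer name, pruning the net for speed can be as simple as saving a copy of the deploy prototxt with the last four layers removed and loading the original caffemodel against it. A minimal sketch (the file and blob names below are placeholders):

import caffe
import numpy as np

# Layers present in the caffemodel but missing from the truncated prototxt are
# simply skipped; matching layers get their weights copied by name.
net = caffe.Net('deploy_truncated.prototxt', 'original.caffemodel', caffe.TEST)

# Feed an input batch (zeros here as a stand-in for a preprocessed image) and read
# the output of the last remaining layer; 'data' and 'fc6' are placeholder blob names.
net.blobs['data'].data[...] = np.zeros(net.blobs['data'].data.shape, dtype=np.float32)
net.forward()
features = net.blobs['fc6'].data.copy()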
I would retrain on a smaller input size, change strides, etc. However, if you want to reduce the file size, I'd suggest quantizing the weights (https://github.com/yuanyuanli85/CaffeModelCompression) and then using something like LZMA compression (xz on Unix). We do this so we can deploy to mobile devices. 8-bit weights compress nicely.
In an effort to see if it is possible to easily break very simple CAPTCHAs, I am attempting to write a program (as simple and small as possible). This program, which I hope to write in C++, should do the following:
Take a partial screenshot of a known area of the screen. (Assuming that the CAPTCHA is always in the exact same place - for instance, pixels 500-600 in x and pixels 300-400 in y.)
Automatically dissect the CAPTCHA into individual letters. (The CAPTCHAS I will create for testing will all have only a few white letters, always on a black background, spaced well apart, to make things easy on me.)
The program then compares the "cut" letters against an array of "known" images of letters (which look similar to the letters used in the CAPTCHA), which contains 26 elements, each holding an image of a single letter of the English alphabet.
The program takes the letter associated with the image that the comparison mapped to, and sends that key to the console (via std::cout).
My question is: is there an easy-to-use library (I am only a beginner at programming) which can handle tasks 1-3? (The fourth is rather easy.) The third point in particular is one I have found hardly anything worthwhile on. What would be ideal is if this library had a "score" function, using a float to indicate how similar the images are; then the one with the highest score is the best hit. (I.e. 100.0 means the images are identical, 29.56 means they are very different, etc.)
A good library for this job is OpenCV. http://opencv.org
OpenCV has all the necessary low-level image processing tools to segment the different elements of the captcha. Then you can use its template matching module.
You could even try to detect letters directly without the preprocessing. It will be slower, but the captcha image is typically so small that it should rarely matter. See:
http://docs.opencv.org/modules/imgproc/doc/object_detection.html#cv2.matchTemplate
For some tutorials to get into the library see:
http://docs.opencv.org/doc/tutorials/tutorials.html
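Even though the question targets C++, the same calls are available through the Python bindings; here is a minimal sketch of scoring a cut-out letter against 26 templates with matchTemplate (the file layout and names are assumptions, and the templates are assumed to be no larger than the cut-out letter; TM_CCOEFF_NORMED returns values in [-1, 1], with 1.0 for a perfect match):

import string
import cv2

# Load the unknown letter and one template per letter (paths are placeholders)
letter = cv2.imread('cut_letter.png', cv2.IMREAD_GRAYSCALE)
templates = {c: cv2.imread('templates/%s.png' % c, cv2.IMREAD_GRAYSCALE)
             for c in string.ascii_uppercase}

# Score each template against the cut-out letter
scores = {c: cv2.matchTemplate(letter, tmpl, cv2.TM_CCOEFF_NORMED).max()
          for c, tmpl in templates.items()}

# The template with the highest score is the best hit
best = max(scores, key=scores.get)
print(best, scores[best])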