Saving image twice using PIL - python-2.7

What happens if I save an image twice using PIL, with the same image quality?
from PIL import Image
quality = 85
# Open original image and save
img = Image.open('image.jpg')
img.save('test1.jpg', format='JPEG', quality=quality)
# Open the saved image and save again with same quality
img = Image.open('test1.jpg')
img.save('test2.jpg', format='JPEG', quality=quality)
There is almost no difference in the image size or the image quality.
Can I assume that saving an image multiple times with the same quality does not affect the actual image quality, and that it is safe to do so?
Also, if I save an image with 85% quality and then open and save it with 95% quality, the image size becomes much larger. Does that mean PIL decompresses the image and compresses it again?

In most cases your test1.jpg and test2.jpg images will be slightly different: some of the information stored in test1.jpg is lost when you open it (decompress) and save it again (compress) with lossy JPEG compression.
In some cases, however, opening and re-saving a JPEG image with the same software will not introduce any changes.
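A quick way to see which case applies to your files is to compare test1.jpg and test2.jpg directly. The sketch below (using the filenames from the question) hashes the bytes on disk and also compares the decoded pixels:
import hashlib
from PIL import Image, ImageChops

def file_md5(fn):
    with open(fn, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

# Identical hashes mean the re-save did not change the file at all
print(file_md5('test1.jpg') == file_md5('test2.jpg'))

# Even if the bytes differ, the decoded pixels may still be identical
diff = ImageChops.difference(Image.open('test1.jpg'), Image.open('test2.jpg'))
print(diff.getbbox())  # None means no pixel differs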
Take a look at this example:
from PIL import Image
import os
import hashlib
def md5sum(fn):
    hasher = hashlib.md5()
    with open(fn, 'rb') as f:
        hasher.update(f.read())
    return hasher.hexdigest()
TMP_FILENAME = 'tmp.jpg'
orig = Image.open(INPUT_IMAGE_FILENAME)
orig.save(TMP_FILENAME) # first JPG compression, standard quality
d = set()
for i in range(10000):
    # Compute file statistics
    file_size = os.stat(TMP_FILENAME).st_size
    md5 = md5sum(TMP_FILENAME)
    print('Step {}, file size = {}, md5sum = {}'.format(i, file_size, md5))
    if md5 in d:
        break
    d.add(md5)
    # Decompress / compress
    im = Image.open(TMP_FILENAME)
    im.save(TMP_FILENAME, quality=95)
It opens and saves a JPEG file repeatedly until a cycle is found (meaning the saved file has exactly the same bytes as one produced in an earlier step).
In my testing, it takes anywhere from 50 to 700 cycles to reach a steady state (at which point opening and saving the image no longer produces any loss). However, the final "steady" image is noticeably different from the original.
Image after first JPG compression:
Resulting "steady" image after 115 compress/decompress cycles:
Sample output:
Step 0, file size = 38103, md5sum = ea28705015fe6e12b927296c53b6d147
Step 1, file size = 71707, md5sum = f5366050780be7e9c52dd490e9e69316
...
Step 113, file size = 70050, md5sum = 966aabe454aa8ec4fd57875bab7733da
Step 114, file size = 70050, md5sum = 585ecdd66b138f76ffe58fe9db919ad7
Step 115, file size = 70050, md5sum = 585ecdd66b138f76ffe58fe9db919ad7
So even though I used a relatively high quality setting of 95, as you can see, repeated compression/decompression caused the image to lose its colors and sharpness. Even with a quality setting of 100 the result is very similar, despite a file size almost twice as large.
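To put a number on that degradation, you can compare the first compressed image against the final "steady" one pixel by pixel. This is only a sketch; the first.jpg filename is an assumption (the loop above keeps overwriting tmp.jpg, so you would need to keep a copy of the first save under that name):
import numpy as np
from PIL import Image

# first.jpg: a copy kept after the first save; tmp.jpg: the final "steady" image
first = np.asarray(Image.open('first.jpg'), dtype=np.float32)
final = np.asarray(Image.open('tmp.jpg'), dtype=np.float32)

# Mean absolute per-pixel difference across all channels
print('mean abs diff: {:.2f}'.format(np.abs(first - final).mean()))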

Related

Read, process and show the pixels in .EXR format images

I want to read EXR format images and see the pixel intensities at the corresponding locations. I also want to stack them together to feed into a neural network. How can I do normal image processing on this kind of format?
I have tried this code using the OpenEXR package but am unable to proceed further.
import OpenEXR
file = OpenEXR.InputFile('file_name.exr')
I expected to find the normal image processing tools, like
file.size()
file.show()
file.write('another format')
file.min()
file.extract_channels()
file.append('another exr file')
OpenEXR seems to lack fancy image processing features such as displaying images or saving them to a different format. For this I would suggest using OpenCV, which is full of image processing features.
What you may need to do is:
Read the exr file using OpenEXR only, then extract the channels and convert them to numpy arrays, e.g. rCh = np.asarray(rCh, dtype=np.uint8)
Create an RGB image from these numpy arrays with img_rgb = cv2.merge([b, g, r]).
Use OpenCV functions for your listed operations:
Size: img_rgb.shape
Show: cv2.imshow("image", img_rgb)
Write: cv2.imwrite("path/to/file.jpg", img_rgb)
Min: np.min(b), np.min(g), np.min(r)
Extract channels: b, g, r = cv2.split(img_rgb)
There is an example on the OpenEXR webpage:
import sys
import array
import OpenEXR
import Imath
if len(sys.argv) != 3:
    print "usage: exrnormalize.py exr-input-file exr-output-file"
    sys.exit(1)
# Open the input file
file = OpenEXR.InputFile(sys.argv[1])
# Compute the size
dw = file.header()['dataWindow']
sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)
# Read the three color channels as 32-bit floats
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
(R,G,B) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R", "G", "B") ]
After this, you should have three arrays of floating-point data, one per channel. You could easily convert these to numpy arrays and proceed with OpenCV as user #ZdaR suggests.
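For example, here is a minimal sketch of that conversion (it assumes the sz, R, G, B variables from the snippet above; the simple scaling to 8-bit is my own assumption, since EXR data is often high dynamic range and may need proper tone mapping instead):
import numpy as np
import cv2

# Reshape each flat channel into a (height, width) float array
r = np.array(R, dtype=np.float32).reshape(sz[1], sz[0])
g = np.array(G, dtype=np.float32).reshape(sz[1], sz[0])
b = np.array(B, dtype=np.float32).reshape(sz[1], sz[0])

# Naive 8-bit conversion; real HDR data usually needs tone mapping instead
to8 = lambda ch: np.clip(ch * 255.0, 0, 255).astype(np.uint8)
img = cv2.merge([to8(b), to8(g), to8(r)])  # OpenCV expects BGR channel order

print(img.shape)                    # size
cv2.imshow("exr", img)              # show
cv2.waitKey(0)
cv2.imwrite("converted.png", img)   # write to another format
print(img.min())                    # minimum intensity
b2, g2, r2 = cv2.split(img)         # extract channels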

How to capture images in OpenCV Python after a certain interval of time and simultaneously save a fixed number of those captured images?

Since I am very new to this language, I have written the code below with whatever little knowledge I have.
The loop executes three times, but the three images overwrite each other, and at the end there is just one image available instead of 3 different images (which is my goal).
import cv2
#helps in turning on the camera
cap = cv2.VideoCapture(0)
#camera clicks the images for 3 times
a = 0
while (a < 3):
    a = a+1
    #creating a frame
    check, frame = cap.read()
    print(check)
    print(frame)
    #conversion of image to grayscale
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    #shows the frame
    cv2.imshow("capturing",image)
    #Saving Of image
    status = cv2.imwrite('path of where the image is to be saved.jpg',image)
    print("Image written to file-system : ",status)
#turns off the camera
cap.release()
cv2.waitKey(0)
cv2.destroyAllWindows()
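A minimal sketch of one way to fix the overwriting (my own suggestion, not taken from this thread): build a distinct filename from the loop counter and, if you want a fixed interval between captures, sleep between iterations. The 2-second interval and the capture_*.jpg names are arbitrary choices:
import time
import cv2

cap = cv2.VideoCapture(0)
for i in range(3):
    check, frame = cap.read()
    if not check:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # A different filename per iteration, so nothing gets overwritten
    cv2.imwrite('capture_{}.jpg'.format(i), gray)
    time.sleep(2)  # wait 2 seconds between captures
cap.release()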

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of Tesserocr (a Python wrapper for Tesseract) and tried the following code:
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print 'Found {} textline image components.'.format(len(boxes))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
and again it was not able to read any characters in the image.
My question is, how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to solving it.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to somehow detect the coordinates of the truck and do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation step (and probably some additional image processing) before feeding it to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian-blur
Apply simple-thresholding
You can use index ranges to take the part of the image. For instance, select the
height range from int(h/4) + 40 to int(h/2) - 20
width range from int(w/2) to int((w*3)/4)
Result (intermediate images omitted: cropped part, Gaussian-blurred, thresholded). Pytesseract output:
CMA CGM
Code:
import cv2
import pytesseract
img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Crop to the region containing the text
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2)-20, int(w/2):int((w*3)/4)]

# Blur, then threshold the blurred image (Otsu selects the threshold value)
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)

Tensorflow return similar images

I want to use Google's Tensorflow to return similar images to an input image.
I have installed Tensorflow from http://www.tensorflow.org (using PIP installation - pip and python 2.7) on Ubuntu 14.04 in a virtual machine, CPU only.
I have downloaded the trained model Inception-V3 (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which is trained on the ImageNet Large Visual Recognition Challenge using the data from 2012, but I think it has both the neural network and the classifier inside it (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image, image.jpg, that I am using to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the output below (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images that are similar to the input image (image.jpg) out of a database of 60,000 images (JPEG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-V3 model and using the features of the input image to compute the cosine distance to the features of all 60,000 images; we can then return the images with the smallest distance (cos 0 = 1).
Please suggest the way forward for this problem and how to do this using the Python API.
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes mentioned below (statements with #ADDED written next to them):
def run_inference_on_image(image):
    """Runs inference on an image.

    Args:
        image: Image file name.

    Returns:
        Nothing
    """
    if not gfile.Exists(image):
        tf.logging.fatal('File does not exist %s', image)
    image_data = gfile.FastGFile(image, 'rb').read()

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:
        # Some useful tensors:
        # 'softmax:0': A tensor containing the normalized prediction across
        #     1000 labels.
        # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
        #     float description of the image.
        # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
        #     encoding of the image.
        # Runs the softmax tensor by feeding the image_data as input to the graph.
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        feature_tensor = sess.graph.get_tensor_by_name('pool_3:0') #ADDED
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
        feature_set = sess.run(feature_tensor,
                               {'DecodeJpeg/contents:0': image_data}) #ADDED
        feature_set = np.squeeze(feature_set) #ADDED
        print(feature_set) #ADDED
        # Creates node ID --> English string lookup.
        node_lookup = NodeLookup()
        top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = node_lookup.id_to_string(node_id)
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
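To go from the extracted features to actually ranking similar images, here is a rough sketch of the cosine-similarity step (my own addition, not part of classify_image.py; query_features and database_features are placeholder names for arrays you would build by running the extraction above over all images):
import numpy as np

def cosine_similarities(query_vec, feature_matrix):
    # query_vec: (2048,) feature vector of the query image
    # feature_matrix: (num_images, 2048) feature vectors of the database images
    q = query_vec / np.linalg.norm(query_vec)
    m = feature_matrix / np.linalg.norm(feature_matrix, axis=1)[:, np.newaxis]
    return m.dot(q)  # one cosine similarity per database image

sims = cosine_similarities(query_features, database_features)
top10 = np.argsort(sims)[::-1][:10]  # indices of the 10 most similar images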
Tensorflow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and send them to other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
(image omitted; source: deeplearning4j.org)
Your problem sounds similar to this visual search project

Why is my OpenCV Video Refusing to Write to Disk?

So I am starting to get very confused by the OpenCV library's ability to write video out to disk, because even the OpenCV documentation is not terribly clear about how the video actually gets written in this case. The code I have below seems to collect the data just fine, but the video file it tries to write has no data in it. All I want to do is take a video that I know I can read, change the data within it to a ramp between 0 and 255, and then write that data back out to disk. However, the final I/O step is not cooperating for reasons I don't understand. Can anyone help? Find the code below:
import numpy as np
import cv2
import cv2.cv as cv
cap = cv2.VideoCapture("/Users/Steve/Documents/TestVideo.avi") #The video
height = cap.get(cv.CV_CAP_PROP_FRAME_HEIGHT) #We get some properties of the video
width = cap.get(cv.CV_CAP_PROP_FRAME_WIDTH)
fps = cap.get(cv.CV_CAP_PROP_FPS)
fourcc = cv2.cv.CV_FOURCC(*'PDVC') #This is essential for testing
out = cv2.VideoWriter('output.avi',fourcc, int(fps), (int(width),int(height)))
xaxis = np.arange(width,dtype='int')
yaxis = np.arange(height,dtype='int')
xx,yy = np.meshgrid(xaxis,yaxis)
ramp=256*xx/int(width) #This is a horizontal ramp image that scales from 0-255 across the width of the image
i=0
while(cap.isOpened()):
    if i%100==0: print i
    i+=1
    ret, frame = cap.read() #Grab a frame
    if ret==True:
        # Change the frame data to the ramp instead of the original video
        frame[:,:,0]=ramp #The camera is B/W so the image is in B/W
        frame[:,:,1]=ramp
        frame[:,:,2]=ramp
        out.write(frame) #Write to disk?
        cv2.imshow('frame',frame) # I see the ramp as an imshow
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break
cap.release() #Clear windows
out.release()
cv2.destroyAllWindows()
Your code is generally correct, but is likely silently failing at some step along the way.
try adding some debug lines:
out = cv2.VideoWriter('output2.avi',fourcc, int(fps), (int(width),int(height)))
or
else:
    print "frame %d is false" % i
    break
When I was testing your code locally I found the fps was set to 0 for most .avi files I read. Manually setting it to 15 or 30 worked.
I also didn't have any luck getting your fourcc to work on my machine (osx), but this one worked fine.
fourcc = cv2.cv.CV_FOURCC('m', 'p', '4', 'v')
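Putting the two suggestions together, here is a minimal sketch of the writer setup with those checks (same Python 2 / OpenCV 2.x API as above; the fallback fps of 15 and the isOpened() check are my additions, not part of the original answer):
import cv2
import cv2.cv as cv

cap = cv2.VideoCapture("/Users/Steve/Documents/TestVideo.avi")
height = int(cap.get(cv.CV_CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv.CV_CAP_PROP_FRAME_WIDTH))
fps = cap.get(cv.CV_CAP_PROP_FPS)
if fps <= 0:  # many .avi files report 0 here
    fps = 15

fourcc = cv2.cv.CV_FOURCC('m', 'p', '4', 'v')
out = cv2.VideoWriter('output.avi', fourcc, int(fps), (width, height))
if not out.isOpened():  # otherwise it fails silently and writes an empty file
    print "VideoWriter failed to open - check the fourcc, fps and frame size"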