I want to read EXR-format images, inspect the pixel intensities at given locations, and also stack several of them together to feed into a neural network. How can I do normal image processing on this kind of format? Please help me with this!
I have tried this code using the OpenEXR library, but I am unable to proceed further.
import OpenEXR
file = OpenEXR.InputFile('file_name.exr')
I expected to find the usual image processing tools, such as:
file.size()
file.show()
file.write('another format')
file.min()
file.extract_channels()
file.append('another exr file')
OpenEXR seems to lack fancy image processing features such as displaying images or saving them to a different format. For this I would suggest using OpenCV, which is full of image processing features.
What you may need to do is:
Read the EXR file using OpenEXR only, then extract its channels and convert them to numpy arrays, as in rCh = np.asarray(rCh, dtype=np.uint8)
Create an RGB image from these numpy arrays, as in img_rgb = cv2.merge([b, g, r])
Use OpenCV functions for your listed operations (a combined sketch follows this list):
Size: img_rgb.shape
Show: cv2.imshow('image', img_rgb)
Write: cv2.imwrite("path/to/file.jpg", img_rgb)
Min: np.min(b), np.min(g), np.min(r)
Extract channels: b, g, r = cv2.split(img_rgb)
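Putting those steps together, here is a minimal sketch of the whole pipeline, assuming a three-channel EXR named file_name.exr; since EXR stores HDR floats, the sketch normalizes each channel to 0-255 before casting rather than casting directly:
import OpenEXR
import Imath
import numpy as np
import cv2

exr = OpenEXR.InputFile('file_name.exr')
dw = exr.header()['dataWindow']
w, h = dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1

# Read each channel as 32-bit floats and reshape to the image size
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
r, g, b = [np.frombuffer(exr.channel(c, FLOAT), dtype=np.float32).reshape(h, w)
           for c in ('R', 'G', 'B')]

# Normalize each channel to 0-255, then cast to uint8
def to_uint8(ch):
    rng = max(ch.max() - ch.min(), 1e-8)
    return (255 * (ch - ch.min()) / rng).astype(np.uint8)

img_rgb = cv2.merge([to_uint8(b), to_uint8(g), to_uint8(r)])  # OpenCV expects BGR order
print(img_rgb.shape)                   # size
cv2.imshow('image', img_rgb)           # show
cv2.waitKey(0)
cv2.imwrite('file_name.jpg', img_rgb)  # write to another format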
There is an example on the OpenEXR webpage:
import sys
import array
import OpenEXR
import Imath
if len(sys.argv) != 3:
    print("usage: exrnormalize.py exr-input-file exr-output-file")
    sys.exit(1)
# Open the input file
file = OpenEXR.InputFile(sys.argv[1])
# Compute the size
dw = file.header()['dataWindow']
sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)
# Read the three color channels as 32-bit floats
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
(R, G, B) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R", "G", "B")]
After this, you should have three lists of floating point data, one per channel. You can easily convert these to numpy arrays and proceed with OpenCV as user @ZdaR suggests above.
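As a minimal continuation of that example (reusing R, G, B and sz from the snippet above), you could stack the channels into a single array and batch several images for a network:
import numpy as np

# sz is (width, height), so reshape each channel list to (height, width)
rgb = np.stack([np.array(c, dtype=np.float32).reshape(sz[1], sz[0])
                for c in (R, G, B)], axis=-1)

# Stack several such images into a batch of shape (N, H, W, 3)
batch = np.stack([rgb], axis=0)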
I have a few annotations that are originally in YOLO format. I need to convert them into a YOLO CSV format in order to train my transformers model.
Sample .csv file I need:
Sample annotation file in CSV format
The CSV attributes include: image_id, width, height and the coordinates of the image's bounding box.
Any help would be appreciated!
First of all, I should say there is no direct way to convert those files into CSV; you have to read them and parse their data.
Step 1: import libraries
We need to read txt files (the YOLO labels) from a directory and save them into a CSV, so we need these libraries:
import os
import glob
import pandas as pd
import numpy as np
Step 2: get the list of your YOLO labels
Change into the directory that contains your YOLO labels using the os library, and collect the txt files with glob:
os.chdir(r'D:\karami\Labeled\train1\labels')
myFiles = glob.glob('*.txt')
My labels are in my labels folder, so I set the directory to that.
Step 3: read lines and split the data
Now you have your label list in the myFiles variable. Iterate over it, read the first line of each file, and split its data.
In the image you shared we have the absolute coordinates of the bounding boxes. According to the YOLO darknet documentation we have:
object-class x_center y_center width height
x_center, y_center, width, height - float values relative to the width and height of the image; they can be in the range (0.0, 1.0], for example: <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height>
You can see the full description here.
So we need to multiply our coordinates by the width and height of the image. A YOLO label file doesn't contain the image size, so you have to define it beforehand. After that, use your split string, then save your data frame. Code:
width = 1024
height = 1024
image_id = 0
final_df = []
for item in myFiles:
    row = []
    bbox_temp = []
    with open(item, 'rt') as fd:
        first_line = fd.readline()
        splitted = first_line.split()
        row.append(image_id)
        row.append(width)
        row.append(height)
        try:
            # Convert relative YOLO coordinates to absolute pixel values
            bbox_temp.append(float(splitted[1]) * width)
            bbox_temp.append(float(splitted[2]) * height)
            bbox_temp.append(float(splitted[3]) * width)
            bbox_temp.append(float(splitted[4]) * height)
            row.append(bbox_temp)
            final_df.append(row)
        except (IndexError, ValueError):
            print("file is not in YOLO format!")
    image_id += 1  # give each label file its own image_id
df = pd.DataFrame(final_df, columns=['image_id', 'width', 'height', 'bbox'])
df.to_csv("saved.csv", index=False)
Here is my resulting CSV file:
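As a side note: YOLO boxes are centre-based, so if your training pipeline expects corner coordinates instead, a small hypothetical helper (not part of the pipeline above) can convert the absolute bbox values produced by the loop:
# Convert a centre-based absolute bbox [x_center, y_center, w, h]
# to corner coordinates [x_min, y_min, x_max, y_max]
def center_to_corners(bbox):
    xc, yc, bw, bh = bbox
    return [xc - bw / 2, yc - bh / 2, xc + bw / 2, yc + bh / 2]

print(center_to_corners([512.0, 512.0, 100.0, 50.0]))
# [462.0, 487.0, 562.0, 537.0]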
I changed some images in the Pascal dataset with OpenCV, and then I needed to convert them to P mode.
I used img = Image.open(os.path.join(origin_path, name)).convert('P') to convert the RGB images to P mode, but the new image is a little strange. Why is the color of the new image not as smooth as the original's?
Is it harmful to my training? How can I deal with it?
Original Image
New Image
Oh, I see. You want to modify the palette. You can do that like this:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
# Load the source image
im = Image.open('original.png')
# Extract the palette into a Numpy array and reshape as 256 RGB triplets
palette = np.array(im.getpalette())
colorVectors = np.reshape(palette,(-1,3))
# Replace any palette entries consisting of [192,128,128] with yellow [255,255,0]
colorVectors[np.all(colorVectors==[192,128,128],axis=-1)] = [255,255,0]
# Put modified palette into image and save
im.putpalette(colorVectors.ravel().tolist())
im.save('result.png')
Original Image
Output Image
If you know you want to change a specific palette entry and you know that it is, say, entry 0, you can do:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
# Load the source image
im = Image.open('original.png')
# Extract the palette into a Numpy array and reshape as 256 RGB triplets
palette = np.array(im.getpalette())
colorVectors = np.reshape(palette,(-1,3))
# Make palette entry 0 into magenta
colorVectors[0]=[255,0,255]
im.putpalette(colorVectors.ravel().tolist())
im.save('result.png')
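As an aside on the smoothness issue in the question: a plain convert('P') quantises to Pillow's default web palette, which causes visible banding. A short sketch, using an adaptive palette instead, usually looks much smoother:
from PIL import Image

# An adaptive palette is computed from the image's own colours,
# so it usually bands far less than the default web palette
im = Image.open('original.png').convert('RGB')
smooth = im.convert('P', palette=Image.ADAPTIVE, colors=256)
smooth.save('adaptive.png')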
Keywords: Python, PIL, Pillow, palette, palettised, PNG, modify palette, change palette, alter palette, replace palette, image, image processing.
I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper of Tesseract) and tried the following code:
from tesserocr import PyTessBaseAPI, RIL

# image is a PIL Image loaded earlier
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print(("Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box))
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but an approach to solving it.)
Would I need to train tesseract with sample images or can I just write code using existing libraries to somehow detect the co-ordinates of the truck and try to do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation step (and probably some image processing) before feeding it to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian blur
Apply simple thresholding
You can use a range to crop out the part of the image. For instance, select the height range from int(h/4) + 40 to int(h/2) - 20 and the width range from int(w/2) to int((w*3)/4).
Result:
Take part: (image)
Gaussian blur: (image)
Threshold: (image)
Pytesseract output: CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]

# Crop to the region containing the text
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]

# Smooth, then binarize with Otsu thresholding
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)
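If the crop still comes back empty on other images, one extra thing worth trying (my suggestion, not part of the solution above) is hinting Tesseract with a page segmentation mode for a single text line:
# --psm 7 tells Tesseract to treat the image as a single text line
txt = pytesseract.image_to_string(thr, config='--psm 7')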
I have just started using OpenCV with Python on Windows (PyCharm IDE).
I tried to read a color image, but it was displayed in grayscale, so I tried to convert it as below:
import cv2
img = cv2.imread('C:\Ai.jpg', 0)
b,g,r = cv2.split(img)
rgb = cv2.merge([r,g,b])
cv2.imshow('image', img)
cv2.imshow('rgb image',rgb)
cv2.waitKey(0)
cv2.destroyAllWindows()
But I am getting an error:
b, g, r = cv2.split(img) ValueError: need more than 1 value to unpack
Can you guys please help me out?
Thanks in advance.
There is a problem in the second line of your code, img = cv2.imread('C:\Ai.jpg', 0): as per the documentation, the value 0 corresponds to cv2.IMREAD_GRAYSCALE, which is why you are getting a grayscale image. You may want to change it to 1 if you want to load it in color (note that OpenCV loads it in BGR order), or to -1 if you want to include any other channel, such as an alpha channel, encoded along with the image.
And b,g,r = cv2.split(img) was raising an error because img at that point was a grayscale image with only one channel, and it is impossible to split a 1-channel image into 3 channels.
Your final snippet may look like this:
import cv2
# Reading the image in color (BGR) mode
img = cv2.imread('C:\Ai.jpg', 1)
# No need of following lines:
# b,g,r = cv2.split(img)
# rgb = cv2.merge([r,g,b])
# cv2.imshow('rgb image',rgb)
# Displaying the image
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
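One more pitfall worth guarding against (my addition, not from the answer above): cv2.imread returns None on a bad path instead of raising, which later shows up as confusing errors such as the split failure. A short defensive sketch:
# imread returns None rather than raising when the path is wrong
img = cv2.imread(r'C:\Ai.jpg', cv2.IMREAD_COLOR)
if img is None:
    raise FileNotFoundError('image failed to load; check the path')
print(img.shape)  # (height, width, 3) for a color image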
Try this solution.
Read and convert the image into RGB format:
If you have a color image read with OpenCV, first convert it into RGB format:
image = cv2.imread('C:\Ai.jpg')  # cv2 reads the image in BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert it into RGB format
To display it, we can use cv2.imshow, matplotlib, or PIL as follows:
import matplotlib.pyplot as plt
%matplotlib inline
from PIL import Image
Now display it using matplotlib:
plt.imshow(image)
Or display it using PIL:
Image.fromarray(image)
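Putting those pieces together into one runnable sketch (the path is a placeholder for your own image):
import cv2
import matplotlib.pyplot as plt
from PIL import Image

image = cv2.imread(r'C:\Ai.jpg')                # OpenCV reads BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert to RGB

plt.imshow(image)   # display with matplotlib
plt.show()

Image.fromarray(image).show()  # display with PIL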
I am trying to save some pixels to a file using GdkPixbuf from Python on Windows. I am making use of the excellent PyGI AIO (3.14.0) binaries.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from gi.repository import Gtk, Gdk, GdkPixbuf
w, h, n = 4, 4, 4
data = bytearray(b'\x00\x00\x00\xff' * w * h)
#data = GLib.Bytes.new(b'\x00\x00\x00\xff' * w * h).get_data()
#import numpy as np
#data = np.zeros((w,h,n), np.uint8)
#data[:,:,3] = 255
#data = data.tostring()
options = {}
pixbuf = GdkPixbuf.Pixbuf.new_from_data(data, GdkPixbuf.Colorspace.RGB, True, 8, w, h, n*w, None, None)
pixbuf.savev('screenshot.bmp', 'bmp', options.keys(), options.values())
The zoomed-in result looks as follows:
Clearly, the first couple of pixels are corrupted. The amount of broken pixels seems to vary depending on the image dimensions, but some of the pixels manage to stay intact. There must be an error in my code, or the memory is getting corrupted somehow. It is possible to encode a larger image, and the error always appears in the first few pixels. Could this be a string encoding problem or something?
Edit: I have tested the program on OS X and the error is very similar. Therefore, it seems to be a general issue with the Python bindings to GdkPixbuf, potentially related to this. Here is a bigger PNG produced by a modified version of the script. The grid of red and green lines is the expected output, whereas the pixels in the upper half of the image are just noise.
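For what it's worth, a hedged workaround sketch (my suggestion, not from the original post), assuming the corruption comes from the pixel buffer being freed while the pixbuf still references it, a known fragility of new_from_data from Python: new_from_bytes keeps a reference to the GLib.Bytes, so the data cannot be collected underneath it:
#!/usr/bin/env python
import gi
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import GdkPixbuf, GLib

w, h, n = 4, 4, 4
# GLib.Bytes owns a copy of the data, so the pixbuf stays valid
data = GLib.Bytes.new(b'\x00\x00\x00\xff' * w * h)
pixbuf = GdkPixbuf.Pixbuf.new_from_bytes(
    data, GdkPixbuf.Colorspace.RGB, True, 8, w, h, n * w)
pixbuf.savev('screenshot.png', 'png', [], [])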