How to convert YOLO Darknet format annotations into a .csv file - computer-vision

I have a few annotations that are originally in YOLO format. I need to convert them into CSV format in order to train my transformer model.
Sample .csv file I need:
(screenshot: sample annotation file in CSV format)
The CSV attributes include: image_id, width, height, and the coordinates of the bounding box.
Any help would be appreciated!

First of all, there is no direct way to convert those formats to CSV; you have to read the files and parse their data.
Step 1: import libraries
We need to read txt files (YOLO labels) from a directory and save the parsed data to CSV, so we need these libraries:
import os
import glob
import pandas as pd
import numpy as np
Step 2: get the list of your YOLO labels
Open the directory where your YOLO labels live with the os library and read the txt files with glob:
os.chdir(r'D:\karami\Labeled\train1\labels')
myFiles = glob.glob('*.txt')
My labels are in the labels folder, so I set the working directory to it.
Step 3: read lines and split the data
You now have your list of label files in the myFiles variable. Iterate over it, read the first line of each file, and split its data.
In the image you shared, the bounding-box coordinates are absolute, but according to the YOLO Darknet documentation each label line is:
object-class x_center y_center width height
where x_center, y_center, width, height are float values relative to the width and height of the image; they can range in (0.0, 1.0], e.g. x_center = <absolute_x> / <image_width> or height = <absolute_height> / <image_height>.
You can see the full description here. So we need to multiply our normalized coordinates by the width and height of the image.
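As a quick sanity check, here is the arithmetic for a hypothetical label line 0 0.25 0.40 0.10 0.20 in a 1024x768 image:
x_center = 0.25 * 1024   # 256.0 px
y_center = 0.40 * 768    # 307.2 px
box_w    = 0.10 * 1024   # 102.4 px
box_h    = 0.20 * 768    # 153.6 px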
A YOLO label file doesn't contain the image size, so you have to define the width and height yourself.
After that, use the split values and save your data frame. Code:
width = 1024
height = 1024
image_id = 0
final_df = []

for item in myFiles:
    row = []
    bbox_temp = []
    with open(item, 'rt') as fd:
        first_line = fd.readline()  # only the first label line of each file is read
    parts = first_line.split()
    row.append(image_id)
    row.append(width)
    row.append(height)
    try:
        bbox_temp.append(float(parts[1]) * width)   # x_center in pixels
        bbox_temp.append(float(parts[2]) * height)  # y_center in pixels
        bbox_temp.append(float(parts[3]) * width)   # box width in pixels
        bbox_temp.append(float(parts[4]) * height)  # box height in pixels
        row.append(bbox_temp)
        final_df.append(row)
    except (IndexError, ValueError):
        print("file is not in YOLO format!")
    image_id += 1  # one id per label file

df = pd.DataFrame(final_df, columns=['image_id', 'width', 'height', 'bbox'])
df.to_csv("saved.csv", index=False)
Here is my resulting CSV file:
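A hypothetical first row, assuming image_id 0 and a label file containing the single line 0 0.5 0.5 0.25 0.25:
image_id,width,height,bbox
0,1024,1024,"[512.0, 512.0, 256.0, 256.0]"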

Related

Read, process and show the pixels in .EXR format images

I want to read .exr format images and see the pixel intensities at the corresponding locations. I also want to stack them together to feed them into a neural network. How can I do normal image processing on this kind of format? Please help me with this!
I have tried this code using the OpenEXR library but was unable to proceed further.
import OpenEXR
file = OpenEXR.InputFile('file_name.exr')
I expected to have the normal image-processing tools available, like
file.size()
file.show()
file.write('another format')
file.min()
file.extract_channels()
file.append('another exr file')
OpenEXR seems to be lacking fancy image-processing features such as displaying images or saving them to a different format. For this I would suggest using OpenCV, which is full of image-processing features.
What you may need to do is:
Read the exr using OpenEXR only, then extract the channels and convert them to numpy arrays, as in rCh = np.asarray(rCh, dtype=np.uint8)
Create a RGB image from these numpy arrays as img_rgb = cv2.merge([b, g, r]).
Use OpenCV functions for your listed operations:
Size: img_rgb.shape
Show: cv2.imshow("image", img_rgb) (imshow requires a window name as its first argument)
Write: cv2.imwrite("path/to/file.jpg", img_rgb)
Min: np.min(b), np.min(g), np.min(r)
Extract channels: b, g, r = cv2.split(img_rgb)
There is an example on the OpenEXR webpage:
import sys
import array
import OpenEXR
import Imath

if len(sys.argv) != 3:
    print("usage: exrnormalize.py exr-input-file exr-output-file")
    sys.exit(1)

# Open the input file
file = OpenEXR.InputFile(sys.argv[1])

# Compute the size
dw = file.header()['dataWindow']
sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)

# Read the three color channels as 32-bit floats
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
(R, G, B) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R", "G", "B")]
After this, you should have three arrays of floating-point data, one per channel. You can easily convert these to numpy arrays and proceed with OpenCV, as user #ZdaR suggests.
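A minimal sketch of that last step, continuing from the sz, R, G, B variables in the example above. The simple 0-255 scaling assumes the EXR values lie in [0, 1]; real HDR data may need proper tone mapping:
import numpy as np
import cv2

# reshape the flat channel lists into (height, width) arrays
w, h = sz
r = np.asarray(R, dtype=np.float32).reshape(h, w)
g = np.asarray(G, dtype=np.float32).reshape(h, w)
b = np.asarray(B, dtype=np.float32).reshape(h, w)

# naive [0, 1] -> [0, 255] conversion; clip in case of HDR values above 1
def to_u8(c):
    return np.clip(c * 255.0, 0, 255).astype(np.uint8)

img_rgb = cv2.merge([to_u8(b), to_u8(g), to_u8(r)])  # OpenCV expects BGR order

cv2.imwrite("converted.jpg", img_rgb)  # save to another format
cv2.imshow("exr as 8-bit", img_rgb)    # display
cv2.waitKey(0)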

mask and extract cell values from a vrt file?

I have raster data for built-up areas around the globe at 40 m resolution as a VRT file (download data from here), and I am trying to crop the data by a mask and then extract the color index value for each cell.
Note: two more files come with the data: vrt.clr and vrt.ovr
Here is a sample of data:
view of vrt data in arcmap.
My question: why am I getting empty cell values when I crop by mask?
I have tried the following:
extract by mask using arcmap toolbox
using gdal in python 2.7
import gdal
ds = gdal.Open('input.vrt')
ds = gdal.Translate('output.vrt', ds, projWin=[80.439, 5.341, 81.048, 4.686])
ds = None
I have also tried to save the data as tif.
Also, is there any way to read the color index value at given coordinates (x,y) after masking the data?
The data appears to be in the Pseudo-Mercator projection (EPSG 3857), so you should either specify the extent for projWin in that coordinate system, or add projWinSRS if you want to provide it in a different coordinate system.
Also, if you want gdal.Translate to output to a VRT file, you should add format='VRT', because your code snippet otherwise outputs to the default file format, which is GeoTIFF.
If I assume your coordinates are WGS84 (EPSG 4326), they define a small region over the ocean south of Sri Lanka; that doesn't make much sense given the nature of the data.
If you want to read the array for the region given by your coordinates, you could use:
invrt = 'GHS_BUILT_LDSMT_GLOBE_R2015B_3857_38_v1_0.vrt'
outfile = '/vsimem/tmpfile'  # in-memory output file
ds = gdal.Translate(outfile, invrt, projWin=[80.439, 5.341, 81.048, 4.686], projWinSRS='EPSG:4326')
data = ds.ReadAsArray()
ds = None
gdal.Unlink(outfile)  # clean up the in-memory file
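As for reading the color index at a given coordinate pair after masking: a sketch using the dataset's geotransform, assuming GDAL 2+ Python bindings, a single-band dataset, and x, y given in the dataset's own projection (EPSG 3857 for this data):
from osgeo import gdal

ds = gdal.Open(invrt)
gt = ds.GetGeoTransform()
inv_gt = gdal.InvGeoTransform(gt)              # map coordinates -> pixel indices
px, py = gdal.ApplyGeoTransform(inv_gt, x, y)  # x, y in the dataset's CRS
value = ds.ReadAsArray(int(px), int(py), 1, 1)[0, 0]  # color index of that cell
ds = None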
The plotted array looks like:

create dataframe by randomly sampling from multiple files

I have a folder with several tab-delimited files of 20 million records each. I would like to create a pandas dataframe where I randomly sample, say, 20 thousand records from each file and then append the samples together in one dataframe. Does anyone know how to do that?
You could read in all the text files in a particular folder, then make use of pandas DataFrame.sample (link to docs).
I've provided a fully reproducible example with two example .txt files created with 200 rows each. I then take a random sample of ten rows from each and concatenate the samples into a final dataframe.
import pandas as pd
import numpy as np
import glob

# Change the path for the directory
directory = r'C:\some\example\folder'

# I create two test .txt files for demonstration purposes with 200 rows each
df_test = pd.DataFrame(np.random.randn(200, 2), columns=list('AB'))
df_test.to_csv(directory + r'\test_1.txt', sep='\t', index=False)
df_test.to_csv(directory + r'\test_2.txt', sep='\t', index=False)

samples = []
for filename in glob.glob(directory + r'\*.txt'):
    df_full = pd.read_csv(filename, sep='\t')
    samples.append(df_full.sample(n=10))  # random sample of 10 rows per file

df = pd.concat(samples, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
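With files of 20 million records, reading each file in full just to keep 20 thousand rows can be slow and memory-hungry. A sketch of an alternative, assuming each file has exactly one header row: sample the row indices first and let read_csv skip everything else via a skiprows callable:
import random
import pandas as pd

def sample_rows(path, n=20_000, sep='\t'):
    # cheap first pass: count data rows (excluding the header)
    with open(path) as f:
        total = sum(1 for _ in f) - 1
    keep = set(random.sample(range(1, total + 1), n))  # 1-based; row 0 is the header
    # skip every data row whose index is not in the sample
    return pd.read_csv(path, sep=sep, skiprows=lambda i: i > 0 and i not in keep)
You can then pd.concat the per-file results, as in the loop above.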

How can I extract data from the first column of a data frame and insert the data into other columns?

I have trouble with a data frame. I have a csv file with ten columns, but all the data is stored in the first column. How can I automatically extract the data from the first column and put it into the other columns? Could you help me, please?
This is my code:
import pandas as pd
import numpy as np
df = pd.read_csv('test_dataset.csv')
df.head(3)
one_column = df.iloc[:,0]
one_column.head(3)
You can use the parameter quoting=3, i.e. no quoting, in read_csv:
df = pd.read_csv('test_dataset.csv', quoting=3)
quoting : int or csv.QUOTE_* instance, default 0
Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
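A minimal sketch of the effect, using hypothetical in-memory data where every row is wrapped in quotes; note that with QUOTE_NONE the literal quote characters remain in the edge fields and may need stripping afterwards:
import io
import pandas as pd

data = '"a,b,c"\n"1,2,3"\n'  # every line wrapped in quotes
print(pd.read_csv(io.StringIO(data)).shape)             # (1, 1) -> everything in one column
print(pd.read_csv(io.StringIO(data), quoting=3).shape)  # (1, 3) -> commas split into columns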

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
from tesserocr import PyTessBaseAPI, RIL

with PyTessBaseAPI() as api:
    api.SetImage(image)  # `image` is a PIL image loaded beforehand
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print("Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to the problem.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to somehow detect the coordinates of the truck and run OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation step (and then probably some image processing) before feeding it to Tesseract-OCR.
I have a three-step solution
Take the part of the image you want to recognize
Apply Gaussian-blur
Apply simple-thresholding
You can use index ranges to take the part of the image you want. For instance, in the code below I select:
height range: from int(h/4) + 40 to int(h/2) - 20
width range: from int(w/2) to int((w*3)/4)
Result (intermediate images omitted): cropped part, Gaussian blur, threshold.
pytesseract output: CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# crop the region containing the text
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]

# smooth, then binarize the blurred crop with Otsu thresholding
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)
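If the default page segmentation doesn't work well on your crop, pytesseract accepts the same page-segmentation switch the question used on the command line via its config parameter, e.g. (assuming the thr image from the snippet above):
txt = pytesseract.image_to_string(thr, config='--psm 6')  # treat the crop as a single uniform block of text
print(txt)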