I've labelled objects on images with the Google Cloud AutoML labelling tool. Then I exported a CSV file. Here is the output:
Formatted more readably, it looks like this:
I understand the first three columns.
I'll increase the image count with data augmentation, using OpenCV in Python. But for that I need the coordinates of the objects in the image.
How can I convert these decimals to pixel coordinates? Or is there a calculation for that?
These values are NormalizedVertex coordinates.
A vertex represents a 2D point in the image. The normalized vertex coordinates are between 0 to 1 fractions relative to the original plane (image, video). E.g. if the plane (e.g. whole image) would have size 10 x 20 then a point with normalized coordinates (0.1, 0.3) would be at the position (1, 6) on that plane.
To get a pixel coordinate, multiply each normalized x value by the image width and each normalized y value by the image height.
The full reference for the CSV format explains that the following fields (truncated here) make up each row, with one row per bounding box or per image:
TRAIN - Which set to assign the content in this row to
gs://optik-vcm/... - Google Cloud Storage URI
kenarcizgi - A label that identifies how the object is categorized
A bounding box for an object in the image:
x_relative_min, y_relative_min, x_relative_max, y_relative_min, x_relative_max, y_relative_max, x_relative_min, y_relative_max
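As a minimal sketch of that multiplication, assuming the eight values of a row are read in the order listed above and that the image size is already known (for example from the shape of the loaded image). The function name `vertices_to_pixels` is hypothetical, not part of any Google or OpenCV API:

```python
def vertices_to_pixels(normalized, width, height):
    """Convert a flat list [x0, y0, x1, y1, ...] of normalized
    coordinates (0..1) into integer pixel (x, y) tuples."""
    points = []
    for i in range(0, len(normalized), 2):
        x = int(round(normalized[i] * width))       # x scales with image width
        y = int(round(normalized[i + 1] * height))  # y scales with image height
        points.append((x, y))
    return points


# The example from the text: a 10 x 20 plane and the point (0.1, 0.3).
print(vertices_to_pixels([0.1, 0.3], width=10, height=20))  # [(1, 6)]
```

Rounding to the nearest pixel is a choice made here; truncating with `int()` alone would also be defensible depending on how the boxes are drawn later.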
I have a dataset of a single class (rectangular object) with a size of 130 images. My goal is to detect the object & draw a circle/dot/mark in the centre of the object.
Because the objects are rectangular, my idea is to get the dimensions of the predicted bounding box and take the circle/dot/mark as (width/2, height/2).
However, if I were to do transfer learning, would YOLO be a good choice to detect a single class of objects in a small dataset?
YOLO should be fine. However, the original version is old now; try YOLOv4 for better results.
People have tried transfer learning from Faster R-CNN to detect single objects with 300 images and it worked fine. (Link). However, 130 images is a bit small. Try augmenting the images (flipping, rotating, etc.) if you get inferior results.
Apply the same augmentation to the annotations when doing translation, rotation, or flip augmentations. For example, in PyTorch, for segmentation, I use:
if random.random() < 0.5:  # Horizontal flip
    image = T.functional.hflip(image)
    mask = T.functional.hflip(mask)
if random.random() < 0.25:  # Rotation
    rotation_angle = random.randrange(-10, 11)
    image = T.functional.rotate(image, angle=rotation_angle)
    mask = T.functional.rotate(mask, angle=rotation_angle)
For a bounding box you will have to transform the coordinates yourself; for a horizontal flip, x becomes width-x.
For augmentations where the object position does not change (e.g. a gamma intensity transformation), do not change the annotations.
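A small sketch of the bounding-box case, assuming boxes are stored as (x_min, y_min, x_max, y_max) in pixel coordinates. The helper names `hflip_box` and `box_centre` are made up for illustration. It also shows the centre point the question asks about, which in image coordinates is the box origin plus half the width and height:

```python
def hflip_box(box, image_width):
    """Mirror an axis-aligned box (x_min, y_min, x_max, y_max) horizontally."""
    x_min, y_min, x_max, y_max = box
    # Each x becomes width - x; min and max swap roles after mirroring.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def box_centre(box):
    """Centre of a box in image coordinates."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


box = (10, 20, 50, 60)              # hypothetical box in a 200-pixel-wide image
flipped = hflip_box(box, image_width=200)
print(flipped)                      # (150, 20, 190, 60)
print(box_centre(flipped))          # (170.0, 40.0)
```

This treats coordinates as continuous positions. If the boxes hold inclusive pixel indices instead, width - 1 - x may be the more appropriate mapping.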
How to implement connected component labeling in Python with OpenCV?
This is an image example:
I need connected component labeling to separate objects on a black and white image.
The OpenCV 3.0 docs for connectedComponents() don't mention Python, but it actually is implemented. See, e.g., this SO question. On OpenCV 3.4.0 and above, the docs do include the Python signatures, as can be seen on the current master docs.
The function call is simple: num_labels, labels_im = cv2.connectedComponents(img) and you can specify a parameter connectivity to check for 4- or 8-way (default) connectivity. The difference is that 4-way connectivity just checks the top, bottom, left, and right pixels and sees if they connect; 8-way checks if any of the eight neighboring pixels connect. If you have diagonal connections (like you do here) you should specify connectivity=8. Note that it just numbers each component and gives them increasing integer labels starting at 0. So all the zeros are connected, all the ones are connected, etc. If you want to visualize them, you can map those numbers to specific colors. I like to map them to different hues, combine them into an HSV image, and then convert to BGR to display. Here's an example with your image:
import cv2
import numpy as np
img = cv2.imread('eGaIy.jpg', 0)
img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1] # ensure binary
num_labels, labels_im = cv2.connectedComponents(img)
def imshow_components(labels):
    # Map component labels to hue val
    label_hue = np.uint8(179*labels/np.max(labels))
    blank_ch = 255*np.ones_like(label_hue)
    labeled_img = cv2.merge([label_hue, blank_ch, blank_ch])

    # cvt to BGR for display
    labeled_img = cv2.cvtColor(labeled_img, cv2.COLOR_HSV2BGR)

    # set bg label to black
    labeled_img[label_hue==0] = 0

    cv2.imshow('labeled.png', labeled_img)
    cv2.waitKey()  # keep the window open until a key is pressed

imshow_components(labels_im)
My adaptation of CCL in 2D is:
1) Convert the image into a 1/0 image, with 1 being the object pixels and 0 being the background pixels.
2) Make a two-pass CCL algorithm by implementing the Union-Find algorithm with path compression. You can see more here.
In the first pass of this CCL implementation, for each target pixel that is an object pixel you check its neighbor pixels and compare their labels so that you can generate equivalences between them. You assign to the target pixel the smallest label among those neighbors that are object pixels (label>0). In this way, you not only assign an object label (label>0) to the target pixel but also build a list of equivalences.
In the second pass, you go through all the pixels and replace each label with the label of its parent, by looking it up in the equivalence table stored in your Union-Find class.
3) I implemented an additional pass to make the labels follow a sequential order (1,2,3,4,...) instead of a random order (23,45,1,...). That just renames the labels for aesthetic purposes.
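The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the poster's actual code: it assumes 4-connectivity (only the upper and left neighbors are checked), takes the image as a list of rows of 0/1 values, and folds the sequential renumbering into the second pass rather than running it as a separate third pass. The names `UnionFind` and `label_components` are chosen for the example.

```python
class UnionFind:
    """Equivalence table for provisional labels (label 0 is background)."""

    def __init__(self):
        self.parent = [0]

    def new_label(self):
        self.parent.append(len(self.parent))
        return len(self.parent) - 1

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def label_components(binary):
    """Two-pass labeling with 4-connectivity on a list of rows of 0/1."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    uf = UnionFind()

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            neighbours = [n for n in (up, left) if n > 0]
            if neighbours:
                labels[y][x] = min(neighbours)
                for n in neighbours:
                    uf.union(labels[y][x], n)
            else:
                labels[y][x] = uf.new_label()

    # Second pass: replace each label by its root, renumbered 1, 2, 3, ...
    sequential = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = uf.find(labels[y][x])
                labels[y][x] = sequential.setdefault(root, len(sequential) + 1)
    return labels, len(sequential)


# A U-shaped object: its two arms get different provisional labels
# in the first pass and are merged through the equivalence table.
print(label_components([[1, 0, 1],
                        [1, 1, 1]]))   # ([[1, 0, 1], [1, 1, 1]], 1)
```

For 8-connectivity the first pass would also need to look at the upper-left and upper-right neighbors.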
I have a Kinect and I'm using OpenCV and point cloud library. I would like to project the IR Image onto a 2D plane for forklift pallet detection. How would I do that?
I'm trying to detect the pallet on the forklift; here is an image:
Where is the RGB data? You can use it to help with the detection. You do not need to project the image onto any plane to detect a pallet. There are basically 2 approaches used for detection:
1) non-deterministic, based on neural networks, fuzzy logic, machine learning, etc.
This approach needs a training dataset to recognize the object. Much experience is needed to select a proper training set and classifier architecture/topology. But other than that you do not need to program it... usually some readily available lib/tool is used; you just configure it and pass in the data.
2) deterministic, based on distance or correlation coefficients
I would start with detecting specific features like:
pallet has specific size
pallet has sharp edges and specific geometry shape in depth data
pallet has specific range of colors (yellowish wood +/- lighting and dirt)
Wood has specific texture patterns
So compute a coefficient for each feature that measures how close the object is to a real pallet. Then just threshold the distance of all coefficients combined (possibly weighted, as some features are more robust).
I do not use approach #1, so I would go for #2. So combine the RGB and depth data (they have to be matched exactly). Then segment the image (based on depth and color). After that, classify each found object as pallet or not ...
Your colored image does not correspond to the depth data. The aligned gray-scale image has poor quality and the depth data image is also very poor. Is the depth data processed somehow (losing precision)? If you look at your data from different sides:
You can see how poor it is so I doubt you can use depth data for detection at all...
PS. I used my answer "Align already captured rgb and depth images" for the visualization.
The only thing left is the colored image, so detect areas with matching color only. Then detect the features and classify. The color of your pallet in the image is almost white. Here HSV reduced the colors to 16 basic colors (too lazy to segment):
You should obtain the range of pallet colors possible with your setup to ease the detection. Then check those objects for features like size, shape, area, circumference...
So I would start with Image preprocessing:
convert to HSV
threshold only pixels close to pallet color
I chose (H=40,S=18,V>100) as the pallet color. My HSV ranges are <0,255> per channel, so the hue angle difference can only be <-180deg,+180deg> at most, which corresponds to <-128,+128> in my ranges.
remove too thin areas
Just scan all horizontal and vertical lines, count consecutive set pixels, and if the run is too short recolor it to black...
This is the result:
On the left is the original image (downsized so it fits this page), in the middle is the color threshold result, and last is the result of filtering out small areas. You can play with the thresholds and pallet color to change the behavior to suit your needs.
Here is the C++ code:
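int tr_d=10; // min size of pallet [pixels]
int h,s,v,x,y,xx;
color c;
pic2.resize(pic1.xs*3,pic1.ys); xx=0;
pic2.bmp->Canvas->Draw(xx,0,pic0.bmp); xx+=pic1.xs;
// [color selection]
for (y=0;y<pic1.ys;y++)
 for (x=0;x<pic1.xs;x++)
    {
    // get color from image
    // ... load pixel (x,y) of the input into c and convert it to HSV
    // distance to white-yellowish color in HSV (H=40,S=18,V>100)
    // ... set h,s to the pixel's H,S minus (40,18) and v to its V
    // hue is cyclic angular so use only shorter angle
    if (h<-128) h+=256;
    if (h>+128) h-=256;
    // abs value
    if (h< 0) h=-h;
    if (s< 0) s=-s;
    // threshold close colors
    pic1.p[y][x].dd=0;               // assumed: everything else stays black
    if (h<25)
     if (s<25)
      if (v>100)
       pic1.p[y][x].dd=0x00FFFFFF;   // assumed: pallet color marked white, as described in the text
    }
pic2.bmp->Canvas->Draw(xx,0,pic1.bmp); xx+=pic1.xs;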
// [remove too thin areas]
for (y=0;y<pic1.ys;y++)
 for (x=0;x<pic1.xs;)
    {
    for (   ;x<pic1.xs;x++) if ( pic1.p[y][x].dd) break; // find set pixel
    for (h=x;x<pic1.xs;x++) if (!pic1.p[y][x].dd) break; // find unset pixel
    if (x-h<tr_d) for (;h<x;h++) pic1.p[y][h].dd=0;      // if too small size recolor to zero
    }
for (x=0;x<pic1.xs;x++)
 for (y=0;y<pic1.ys;)
    {
    for (   ;y<pic1.ys;y++) if ( pic1.p[y][x].dd) break; // find set pixel
    for (h=y;y<pic1.ys;y++) if (!pic1.p[y][x].dd) break; // find unset pixel
    if (y-h<tr_d) for (;h<y;h++) pic1.p[h][x].dd=0;      // if too small size recolor to zero
    }
pic2.bmp->Canvas->Draw(xx,0,pic1.bmp); xx+=pic1.xs;
See how to extract the borders of an image (OCT/retinal scan image) for the description of picture and color, or look at any of my DIP/CV tagged answers. The code is commented and straightforward, but I just need to add:
You can ignore the pic2 stuff; it is just the image posted above, so I do not need to manually print screen and merge the subresults in Paint... To improve robustness, you should add dynamic range enhancement (so the thresholds have the same conditions for any input image). You should also compare against more than a single color (if more wood types of pallet are present).
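For readers working in Python, the cyclic hue comparison used in the color selection can be sketched like this. It is only an illustration of the idea, using the same target color and thresholds as above (H=40, S=18, V>100, tolerance 25) and the same <0,255> channel range; the function names are hypothetical. Note that OpenCV's 8-bit HSV stores hue in 0..179, so `full_circle` would be 180 there and the target values would need rescaling.

```python
def hue_distance(h, target, full_circle=256):
    """Shortest angular distance between two hues on a cyclic scale."""
    d = h - target
    half = full_circle // 2
    if d < -half:
        d += full_circle
    if d > half:
        d -= full_circle
    return abs(d)


def is_pallet_colour(h, s, v, target_h=40, target_s=18,
                     max_dh=25, max_ds=25, min_v=100):
    """True if an HSV pixel is close to the chosen pallet color."""
    return (hue_distance(h, target_h) < max_dh
            and abs(s - target_s) < max_ds
            and v > min_v)


print(hue_distance(250, 10))          # 16, because the hue wraps around at 256
print(is_pallet_colour(45, 20, 200))  # True
print(is_pallet_colour(45, 20, 50))   # False, too dark
```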
Now you should segment or label the areas:
loop through the entire image
find the first pixel set with the pallet color
flood fill the area with some distinct ID color different from the set pallet color
I use black 0x00000000 as space and white 0x00FFFFFF as the pallet pixel color, so use ID={1,2,3,4,5...}. Also remember the number of filled pixels (that is your area) so you do not need to compute it again. You can also compute the bounding box directly while filling.
compute and compare features
You need to experiment with more than one image to find out which properties are good for detection. I would go for the circumference length vs. area ratio and/or the bounding box size... The circumference can be extracted by simply selecting all pixels with the proper ID color that neighbor a black pixel.
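The labeling steps above can be sketched in Python as follows. This is an assumption-laden illustration rather than the answer's own implementation: the mask is a list of rows of 0/1 values, the flood fill is iterative with 4-connectivity, and the image border is treated as background when counting circumference pixels. The names `label_regions` and `perimeter` are invented for the example.

```python
def label_regions(mask):
    """Flood fill every set region of a 0/1 mask with its own ID.

    Returns (labels, regions), where regions[id] holds the pixel area
    and the bounding box (x0, y0, x1, y1) collected during the fill.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    regions = {}
    next_id = 1
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or labels[sy][sx]:
                continue
            # First pixel of an unlabeled region: fill it with next_id.
            area, x0, y0, x1, y1 = 0, sx, sy, sx, sy
            stack = [(sx, sy)]
            labels[sy][sx] = next_id
            while stack:
                x, y = stack.pop()
                area += 1
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_id
                        stack.append((nx, ny))
            regions[next_id] = {"area": area, "bbox": (x0, y0, x1, y1)}
            next_id += 1
    return labels, regions


def perimeter(labels, region_id):
    """Count region pixels that touch background or the image border."""
    height, width = len(labels), len(labels[0])
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != region_id:
                continue
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                outside = not (0 <= nx < width and 0 <= ny < height)
                if outside or labels[ny][nx] == 0:
                    count += 1
                    break
    return count


mask = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1]]
labels, regions = label_regions(mask)
print(regions)
print(perimeter(labels, 1))
```

The ratio of `perimeter` to `area` for each region is then one candidate feature to compare against known pallets.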
See also similar Fracture detection in hand using image proccessing
Good luck and have fun ...
I am starting a project on object detection.
My idea is to rank every pixel of an image (Mat).
Then I will be able to extract which colour is dominant.
The difficulty is that a colour is not unique. For example, green is rgb(0, 255, 0), but rgb(10, 240, 20) is almost green too.
The goal of my ranking is to extract pixels which are almost the same colour. Then, with a percentage, I think I can locate my object.
So, my question: is there a way to rank pixels by colour?
Thanks a lot in advance for your answers.
There isn't a direct method for ranking pixels by colour in the way you describe.
However, you can find an approximation to the most dominant one.
There are several ways in which you can do it:
You can calculate the histogram for each colour channel: split the image into R, G and B and compute a histogram for each. Then you can see where the peaks of the resulting graphs are.
You can k-means cluster the pixels of the image; in other words, represent each pixel as a 3D point with coordinates (R, G, B). Then you can segment the pixels into the k most frequently occurring colours.
If you resize the image to a 1x1 pixel image, you'll find the average of all pixel values. If there is a dominant colour, where the majority of the pixels are in close proximity, it will give a good approximation.
These, however, are all approximations. Your best choice would be to use k-means and find the cluster that either has the most elements or is the most dense.
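To make the k-means suggestion concrete, here is a small dependency-free sketch. It assumes the pixels have already been read into a list of (R, G, B) tuples, uses a fixed number of iterations, and picks the initial centroids by spreading them evenly through the list, which is an arbitrary choice made for the example. A real project would more likely use a library routine such as OpenCV's `cv2.kmeans`. The function names are hypothetical.

```python
def kmeans_colours(pixels, k, iterations=10):
    """Plain k-means on (R, G, B) tuples; returns (centroids, clusters)."""
    n = len(pixels)
    # Assumed initialisation: k pixels spread evenly through the list.
    start = [round(i * (n - 1) / max(1, k - 1)) for i in range(k)]
    centroids = [tuple(float(c) for c in pixels[i]) for i in start]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(ch) / len(members) for ch in zip(*members))
    return centroids, clusters


def dominant_colour(pixels, k=2):
    """Centroid of the largest cluster and its share of all pixels."""
    centroids, clusters = kmeans_colours(pixels, k)
    biggest = max(range(k), key=lambda i: len(clusters[i]))
    return centroids[biggest], len(clusters[biggest]) / len(pixels)


# Four "almost green" pixels and two "almost red" ones.
pixels = [(0, 255, 0), (10, 240, 20), (5, 250, 10), (0, 250, 0),
          (255, 0, 0), (250, 10, 5)]
print(dominant_colour(pixels, k=2))
```

The second return value is the percentage idea from the question: the fraction of pixels that fall into the dominant cluster.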
In case you are looking for way to locate an object with a specific colour, you can use a maximum likelihood estimation. Something like this, which was used to classify different objects, such as grass, cars, building and pavement from satellite images. You can use it with a single colour and get a heat-map of where the object is in terms of likelihood (the percentage of probability) of that pixel belonging to your object.
An ordinary image always involves a number of colors. The best way to average pixels carrying almost the same color is color quantization, which reduces the number of colors in an image using techniques like k-means clustering. This is best explained here with Python code:
After successful quantization, you can just try the following code to rank the colors based on their frequencies in the image.
import cv2

top_n_colors = []
n = 3
colors_count = {}
# _processed_image is the quantized BGR image from the previous step
(channel_b, channel_g, channel_r) = cv2.split(_processed_image)
# Flatten each 2D single-channel array to make it easier to iterate over
channel_b = channel_b.flatten()
channel_g = channel_g.flatten()
channel_r = channel_r.flatten()
for i in range(len(channel_b)):
    RGB = str(channel_r[i]) + " " + str(channel_g[i]) + " " + str(channel_b[i])
    if RGB in colors_count:
        colors_count[RGB] += 1
    else:
        colors_count[RGB] = 1
# take the top n colors from the dictionary
_top_colors = sorted(colors_count.items(), key=lambda x: x[1], reverse=True)[0:n]
for _color in _top_colors:
    _rgb = tuple([int(value) for value in _color[0].split()])
    top_n_colors.append(_rgb)
Could anyone please help me resolve my issue? I am working on an image-processing project and I am stuck. I got this image after some processing, and for further processing I need to crop or detect only the deer and remove the rest of the image.
This is my Initial image:
And my result should be something like this:
It would be even better if I could get only the single biggest blob in the image and save it as an image.
It looks like the deer in your image is pretty much connected and closed. What we can do is use regionprops to find all of the bounding boxes in your image. Once we do this, we can find the bounding box that gives the largest area, which will presumably be your deer. Once we find this bounding box, we can crop your image and focus on the deer entirely. As such, assuming your image is stored in im, do this:
im = im2bw(im); %// Just in case...
bound = regionprops(im, 'BoundingBox', 'Area');
%// Obtaining Bounding Box co-ordinates
bboxes = reshape([bound.BoundingBox], 4, []).';
%// Obtain the areas within each bounding box
areas = [bound.Area].';
%// Figure out which bounding box has the maximum area
[~,maxInd] = max(areas);
%// Obtain this bounding box
%// Ensure all floating point is removed
finalBB = floor(bboxes(maxInd,:));
%// Crop the image
out = im(finalBB(2):finalBB(2)+finalBB(4), finalBB(1):finalBB(1)+finalBB(3));
%// Show the images
figure;
subplot(1,2,1); imshow(im);
subplot(1,2,2); imshow(out);
Let's go through this code slowly. We first convert the image to binary just in case. Your image may be an RGB image with intensities of 0 or 255... I can't say for sure, so let's just do a binary conversion just in case. We then call regionprops with the BoundingBox property to find every bounding box of every unique object in the image. This bounding box is the minimum spanning bounding box to ensure that the object is contained within it. Each bounding box is a 4 element array that is structured like so:
[x y w h]
Each bounding box is delineated by its origin at the top left corner of the box, denoted as x and y, where x is the horizontal co-ordinate and y is the vertical co-ordinate. x increases from left to right, while y increases from top to bottom. w and h are the width and height of the bounding box. Because these values are stored in a structure array, I extract them into a single 1D vector, then reshape it into an M x 4 matrix. Bear in mind that this is the only way I know of to extract array values from each structure element efficiently without any for loops. This makes the search quicker. I have done the same for the Area property: for each bounding box in the image, we also have the total area of the object enclosed by that bounding box.
Thanks to @Shai for the spot: we can't simply use the bounding box co-ordinates to determine which object has the biggest area, as a thin diagonal line could produce a large bounding box. As such, we also need to rely on the total area that the object takes up within the bounding box. Simply put, it's the number of pixels contained within the object.
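A quick numeric illustration of that point, written in Python rather than MATLAB and using made-up objects: a thin diagonal line has a large bounding box but very few pixels, while a solid blob with a much smaller bounding box contains far more.

```python
def bbox_and_pixel_area(pixels):
    """pixels: list of (x, y) coordinates belonging to one object.

    Returns (bounding box area, number of object pixels).
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    bbox_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return bbox_area, len(pixels)


diagonal = [(i, i) for i in range(50)]                   # thin diagonal line
block = [(x, y) for x in range(20) for y in range(20)]   # solid 20 x 20 blob
print(bbox_and_pixel_area(diagonal))  # (2500, 50)
print(bbox_and_pixel_area(block))     # (400, 400)
```

Ranking by bounding box size would pick the line; ranking by pixel area picks the blob.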
Therefore, we search the area vector we created for the maximum area. This corresponds to your deer. Once we find this location, we extract the corresponding bounding box and use it to crop the image. Bear in mind that the bounding box values may be floating point numbers. As image co-ordinates are integer based, we need to remove the fractional parts before cropping; I decided to use floor. I then write code that displays the original image alongside the cropped result.
Bear in mind that this will only work if there is just one object in the image. If you want to find multiple objects, check bwboundaries in MATLAB. Otherwise, I believe this should get you started.
Just for completeness, we get the following result:
While object detection is a very general CV task, you can start with something simple if the assumptions are strong enough and you can guarantee that the input images will contain a single prominent white blob well described by a bounding box.
One very simple idea is to subdivide the picture in 3x3=9 patches, calculate the statistics for each patch and compute some objective function. In the most simple case you just do a grid search over various partitions and select that with the highest objective metric. Here's an illustration:
If every line is a parameter x_1, x_2, y_1 and y_2, then you want to optimize the target function F over these four parameters,
either by
grid search (try all x_i, y_i in some quantization steps)
genetic-algorithm-like random search
gradient descent (move every parameter in that direction that optimizes the target function)
The target function F can be defined over statistics of the patches, e.g. like this:
F(9 patches) {
    brightest_patch = max(patches)
    others = patches \ brightest_patch
    score = brightness(brightest_patch) - 1/8 * brightness(others)
    return score
}
or anything else that incorporates relevant statistics of the patches as well as their size. This also lets you incorporate prior knowledge: if you expect the blob to appear in the middle of the image, you can define a "regularization" term that penalizes F if the parameters x_i and y_i deviate too much from the expected position.
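The grid-search variant can be sketched in Python as follows. This is one possible reading of the pseudocode, not a definitive implementation: brightness is taken to be the mean pixel value of a patch, the second term is the sum of the other eight patch brightnesses divided by 8, the image is a list of rows of numbers, and every patch is required to be non-empty. No regularization term is included.

```python
from itertools import combinations


def patch_mean(img, x0, x1, y0, y1):
    """Mean pixel value of the patch [x0, x1) x [y0, y1)."""
    values = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def score(img, x1, x2, y1, y2):
    """F: brightness of the brightest patch minus the mean of the other 8."""
    height, width = len(img), len(img[0])
    xs = [0, x1, x2, width]
    ys = [0, y1, y2, height]
    means = [patch_mean(img, xs[i], xs[i + 1], ys[j], ys[j + 1])
             for j in range(3) for i in range(3)]
    brightest = max(means)
    return brightest - (sum(means) - brightest) / 8


def grid_search(img, step=1):
    """Try all line positions on a grid; return (best score, (x1, x2, y1, y2))."""
    height, width = len(img), len(img[0])
    best = None
    for x1, x2 in combinations(range(step, width, step), 2):
        for y1, y2 in combinations(range(step, height, step), 2):
            s = score(img, x1, x2, y1, y2)
            if best is None or s > best[0]:
                best = (s, (x1, x2, y1, y2))
    return best


# A 6 x 6 black image with a white blob in rows 2-3, columns 1-4.
img = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in range(1, 5):
        img[y][x] = 1
print(grid_search(img))  # (1.0, (1, 5, 2, 4))
```

The returned lines x_1=1, x_2=5, y_1=2, y_2=4 enclose exactly the blob. On real images a coarser `step` keeps the search affordable, since the number of candidate partitions grows with the fourth power of the grid resolution.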
Thanks to all who answered and commented on my question. With your help I got my exact solution. I am posting my final code and result for others.
img = im2bw(imread('deer.png'));
[L, num] = bwlabel(img, 4);
%%// Get biggest blob or object
count_pixels_per_obj = sum(bsxfun(@eq,L(:),1:num));
[~,ind] = max(count_pixels_per_obj);
biggest_blob = (L==ind);
%%// crop only deer
bound = regionprops(biggest_blob, 'BoundingBox');
%// Obtaining Bounding Box co-ordinates
bboxes = reshape([bound.BoundingBox], 4, []).';
%// Obtain this bounding box
%// Ensure all floating point is removed
finalBB = floor(bboxes);
out = biggest_blob(finalBB(2):finalBB(2)+finalBB(4),finalBB(1):finalBB(1)+finalBB(3));
%%// Show images
figure;
imshow(out);