I am using Yolact (https://github.com/dbolya/yolact), an instance segmentation algorithm which outputs the test image with a mask on the detected object. Since the input images are provided with the coordinates of polygons around the target classes in annotations.json, I want to get an output like this. But I can't figure out how to extract the coordinates of those contours/polygons.
As far as I understood from this script https://github.com/dbolya/yolact/blob/master/eval.py, the output is a list of tensors for the detected objects. It contains the classes, scores, boxes and masks for the evaluated image. The eval.py script returns the recognized image with all this information. The recognition result is saved in 'preds' in the evalimage function (line 595), and post-processing of the prediction result happens in prep_display (line 135).
Now how do I extract those polygon coordinates and save them in a .json file or any other format?
I also tried to look at these, but sadly couldn't figure it out:
https://github.com/dbolya/yolact/issues/286
and
https://github.com/dbolya/yolact/issues/256
You need to create a complete post-processing pipeline that is specific to your task. Here's a small pseudocode snippet that could be added to prep_display() in eval.py:
with timer.env('Copy'):
    if cfg.eval_mask_branch:
        # Add the line below to get all the predicted object masks as a list
        all_objects_mask = t[3][:args.top_k]
        # Convert each object mask to binary, then use OpenCV's findContours()
        # to extract the contour points for each object
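As a rough, untested sketch of that idea (the helper name, threshold, and output path below are made up for illustration), assuming all_objects_mask is a torch tensor of shape (num_objects, H, W) taken from t[3] as above, you could binarize each mask, extract contours with OpenCV, and dump the polygon points to JSON:

import json
import cv2

def save_mask_polygons(all_objects_mask, out_path='polygons.json', threshold=0.5):
    polygons = []
    for obj_mask in all_objects_mask:
        # Binarize the mask and move it to the CPU as uint8 for OpenCV
        binary = (obj_mask > threshold).byte().cpu().numpy()
        # Note: OpenCV 3.x returns (image, contours, hierarchy) instead
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Store each contour as a list of [x, y] pixel coordinates
        polygons.append([c.reshape(-1, 2).tolist() for c in contours])
    with open(out_path, 'w') as f:
        json.dump(polygons, f)
    return polygons

Each entry in the resulting JSON then corresponds to one detected object, with one list of points per contour.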
I'm using colmap. I succeeded in visualizing a 3D sparse reconstruction from a video.
Now I have some new images of the same scene and I want to (only) localize them, i.e. get the (x, y, z, angles) of the camera.
Following the documentation, I used the commands colmap feature_extractor and colmap vocab_tree_matcher.
Everything seemed to go well; the output is:
Indexing image [1/23] in 0,022s
...
Indexing image [23/23] in 0,077s
Matching image [1/16] in 0,078s
...
Matching image [16/16] in 0,003s
Elapsed time: 0,043 [minutes]
But now what?
How do I query the colmap database to get the (x, y, z, angle) of, say, image 12?
I want to get this information programmatically.
I have a dataset of a single class (a rectangular object) with 130 images. My goal is to detect the object and draw a circle/dot/mark at the centre of the object.
Because the objects are rectangular, my idea is to get the dimensions of the predicted bounding box and place the circle/dot/mark at (width/2, height/2).
However, if I were to do transfer learning, would YOLO be a good choice for detecting a single class of objects in a small dataset?
YOLO should be fine. However, it is old now; try YOLOv4 for better results.
People have tried transfer learning with Faster R-CNN to detect single objects with 300 images and it worked fine (Link). However, 130 images is on the small side. Try augmenting the images - flipping, rotating, etc. - if you get inferior results.
Apply the same augmentation to the annotations when doing translation, rotation, or flip augmentations. For example, in PyTorch, for segmentation, I use:
import random
import torchvision.transforms as T

if random.random() < 0.5:  # Horizontal flip
    image = T.functional.hflip(image)
    mask = T.functional.hflip(mask)
if random.random() < 0.25:  # Rotation
    rotation_angle = random.randrange(-10, 11)
    image = T.functional.rotate(image, angle=rotation_angle)
    mask = T.functional.rotate(mask, angle=rotation_angle)
For bounding boxes you will have to recompute the coordinates yourself: for a horizontal flip, x becomes width - x (see the sketch below).
Augmentations where the object position does not change (e.g. a gamma intensity transformation) do not require changing the annotations.
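As a rough illustration of the bounding-box case (the function and variable names are made up; boxes are assumed to be in pixel (x_min, y_min, x_max, y_max) format):

def hflip_box(box, image_width):
    # Mirror an (x_min, y_min, x_max, y_max) box around the vertical image axis
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)

# Example: a 20-pixel-wide box near the left edge of a 640-pixel-wide image
print(hflip_box((10, 50, 30, 120), 640))  # -> (610, 50, 630, 120)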
I've labelled objects in images with the Google Cloud AutoML labelling tool and then exported a CSV file. Here is the output:
TRAIN,gs://optik-vcm/optikic/80-2020-03-19T11:58:25.819Z.jpg,kenarcizgi,0.92590326,0.035908595,0.9589712,0.035908595,0.9589712,0.9020675,0.92590326,0.9020675
Formatted more readably, it looks like this:
TRAIN
gs://optik-vcm/optikic/80-2020-03-19T11:58:25.819Z.jpg
kenarcizgi
0.92590326
0.035908595
0.9589712
0.035908595
0.9589712
0.9020675
0.92590326
0.9020675
I know what the first three columns are.
I'll increase the image count by doing data augmentation, using OpenCV in Python for that. But I need the coordinates of the objects in the image.
How can I convert these decimals to pixel coordinates? Or is there a calculation for that?
These values are called NormalizedVertex coordinates.
A vertex represents a 2D point in the image. The normalized vertex coordinates are fractions between 0 and 1, relative to the original plane (image, video). E.g. if the plane (e.g. the whole image) has size 10 x 20, then a point with normalized coordinates (0.1, 0.3) would be at position (1, 6) on that plane.
To get a pixel coordinate, multiply the normalized value by the image width or height as appropriate.
The full reference for the CSV formatting explains that each row (one row per bounding box, or per image) is made up of the following (truncated here):
TRAIN - Which set to assign the content in this row to
gs://optik-vcm/... - Google Cloud Storage URI
kenarcizgi - A label that identifies how the object is categorized
A bounding box for an object in the image:
x_relative_min, y_relative_min, x_relative_max, y_relative_min, x_relative_max, y_relative_max, x_relative_min, y_relative_max
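A rough sketch of that conversion for the example row above (OpenCV is used here only to read the image size, and the local filename is an assumption):

import cv2

image = cv2.imread('80.jpg')  # your local copy of the image from this row
height, width = image.shape[:2]

# The last 8 columns of the row: 4 (x, y) corner vertices of the bounding box
normalized = [0.92590326, 0.035908595, 0.9589712, 0.035908595,
              0.9589712, 0.9020675, 0.92590326, 0.9020675]

# Even indices are x values (scale by width), odd indices are y values (scale by height)
pixels = [round(v * width) if i % 2 == 0 else round(v * height)
          for i, v in enumerate(normalized)]
corners = list(zip(pixels[0::2], pixels[1::2]))  # [(x, y), ...] four corners in pixels
print(corners)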
If a user has defined curves or faces within a STEP file with colors, I'm able to read in the colors from the STEP file and create a list with this snippet:
Handle_XCAFDoc_ColorTool colorList = XCAFDoc_DocumentTool::ColorTool(STEPDocument->Main());
// List colors in the STEP File
TDF_LabelSequence colors;
colorList->GetColors(colors);
I am having trouble extracting a shape, assembly, or component based on a given color. Ideally, I would like to extract a TopoDS_Shape from a method that takes a color, so that I can cycle through the list of colors and dump out a shape for each. Any thoughts? Any hints on classes to look at or strategies would be helpful.
Can anyone tell me the correct way to use the getOutputValue function in the following link? Also, how does the author get the 2nd and 3rd images from the code?
http://www.codeproject.com/Articles/385658/Multidimensional-Discrete-Wavelet-Transform
Thanks
Okay, usage:
I haven't tried it yet, but from what I can tell you simply call getOutputValue() to get one result. The parameter is a vector containing the "coordinates" (based on the number of dimensions in your input).
Images:
In this example, the author apparently used the image data as the discrete input values, e.g. a black pixel would be 0 and a white pixel 255, with all other shades of grey in between (a standard 8-bit grayscale image).
He then used the output signal/result to recreate an image (i.e. interpreted the values as pixels once again).
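The same idea sketched in Python with PyWavelets rather than the article's C++ class (none of the names below come from the article; this only illustrates treating pixels as the discrete signal and turning the output coefficients back into pixels):

import numpy as np
import pywt
from PIL import Image

# Treat 8-bit grayscale pixel values (0-255) as the discrete input signal
pixels = np.asarray(Image.open('input.png').convert('L'), dtype=float)

# One level of a 2D Haar wavelet transform
approx, (horizontal, vertical, diagonal) = pywt.dwt2(pixels, 'haar')

# Interpret the output coefficients as pixels again by rescaling them to 0-255
def to_image(coeffs):
    span = coeffs.max() - coeffs.min()
    scaled = 255 * (coeffs - coeffs.min()) / (span + 1e-9)
    return Image.fromarray(scaled.astype(np.uint8))

to_image(approx).save('approximation.png')
to_image(horizontal).save('horizontal_detail.png')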