How do I return absolute coordinates for bounding boxes with CustomVision API?

I'm wondering whether it is possible for the Azure Cognitive Services Custom Vision Prediction API to return absolute coordinates instead of percentage-based ones.
In the screenshot above, you can see the top, left, width and height properties of the prediction results.
Is there any way to make the API return absolute coordinates instead of what I assume are percentage-based coordinates?
Extra: does anyone have an idea why it returns coordinates in this form?

There is no way to get absolute values from the current API.
You just have to multiply those relative values by your image's width and height. I wrote an answer about that a moment ago; you can have a look at it here: How to use Azure custom vision service response boundingBox to plot shape
For your extra question: I would guess the result is relative because the processing scales/resizes the image to a specific size. As you can see in the sample code for consuming an exported Custom Vision model here, the image is rescaled to a 256x256 square.
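A minimal sketch of the conversion, assuming the prediction's boundingBox fields (left, top, width, height) are the normalized [0, 1] values the API returns; the struct names are made up for illustration:

```cpp
#include <cstdio>

// Normalized bounding box as returned by the prediction API:
// all four fields are fractions of the image size, in [0, 1].
struct NormalizedBox { double left, top, width, height; };

// Absolute bounding box in pixels.
struct PixelBox { int left, top, width, height; };

// Scale horizontal fields by the image width and vertical fields
// by the image height.
PixelBox toPixels(const NormalizedBox& b, int imageWidth, int imageHeight) {
    return {
        static_cast<int>(b.left   * imageWidth),
        static_cast<int>(b.top    * imageHeight),
        static_cast<int>(b.width  * imageWidth),
        static_cast<int>(b.height * imageHeight),
    };
}

int main() {
    NormalizedBox b{0.25, 0.10, 0.50, 0.30};  // example prediction values
    PixelBox p = toPixels(b, 1280, 720);      // for a 1280x720 source image
    std::printf("left=%d top=%d width=%d height=%d\n",
                p.left, p.top, p.width, p.height);
}
```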

Related

Map real world height to point cloud height

I have a point cloud that consists of a cube on a table. I'm currently trying to filter out everything except the cube in the point cloud. Given the table height in reality and the camera pose and intrinsics, is there a way to compute the coordinates of the table's bounding box so that I can filter out everything else? Thank you in advance!

Motion detection by eliminating constant movements

I am trying to implement motion detection in OpenCV C++. I have tried various methods like MOG and optical flow, which work fine, but is there a way to eliminate constant movements in the scene, like the constant motion of a fan? I have OpenCV's accumulateWeighted() in mind but I'm not sure it works for this. Is there a better way to do it?
I don't have a fully robust solution, and I don't have much experience with video processing, but here is the idea I have come up with so far (a code sketch of the first steps follows the list):

1. Take a few pairs of consecutive frames from the video and convert them to grayscale for a more robust comparison.
2. Compute the difference of each frame pair by comparing corresponding pixels.
3. The resulting difference image shows the pixel locations where something changed between the two frames of a pair. Cluster those pixel locations and put a bounding box over each cluster, so each box marks an object that is translating or rotating.
4. Apply this differencing over several pairs, so that each pair's difference image contains the bounding boxes of the objects moving at that moment.
5. Compare the central location of each bounding box in one difference image with those in the other difference images. If a bounding box's center varies only very slightly across all difference images, the object it contains has a stationary rotational motion, like a fan or leaves; the remaining bounding boxes represent the objects actually translating through the video.
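A rough sketch of steps 1-4 in OpenCV C++, assuming a video file as input; the threshold, dilation and minimum-area values are guesses that would need tuning, and the center comparison of step 5 is only indicated in a comment:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::VideoCapture cap("input.mp4");  // hypothetical input video
    cv::Mat prev, curr, prevGray, currGray, diff, mask;

    if (!cap.read(prev)) return 1;
    cv::cvtColor(prev, prevGray, cv::COLOR_BGR2GRAY);  // step 1: grayscale

    while (cap.read(curr)) {
        cv::cvtColor(curr, currGray, cv::COLOR_BGR2GRAY);

        cv::absdiff(prevGray, currGray, diff);               // step 2: pairwise difference
        cv::threshold(diff, mask, 25, 255, cv::THRESH_BINARY);
        cv::dilate(mask, mask, cv::Mat(), cv::Point(-1, -1), 2);  // merge nearby change pixels

        // Step 3: cluster the changed pixels and box each cluster.
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        for (const auto& c : contours) {
            if (cv::contourArea(c) < 100) continue;  // drop tiny noise blobs
            cv::Rect box = cv::boundingRect(c);
            cv::rectangle(curr, box, cv::Scalar(0, 255, 0), 2);
            // Step 5 would go here: store the box centers of each difference
            // image and flag boxes whose centers barely move across pairs
            // (fans, leaves) as constant motion to be ignored.
        }

        cv::imshow("motion", curr);
        if (cv::waitKey(30) == 27) break;  // Esc to quit
        currGray.copyTo(prevGray);         // step 4: slide on to the next pair
    }
    return 0;
}
```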

Finding regions of higher numbers in a matrix

I am working on a project to detect certain objects in an aerial image, and as part of this I am trying to utilize elevation data for the image. I am working with Digital Elevation Models (DEMs), basically a matrix of elevation values. When I am trying to detect trees, for example, I want to search for tree-shaped regions that are higher than their surrounding terrain. Here is an example of a tree in a DEM heatmap:
https://i.stack.imgur.com/pIvlv.png
I want to be able to find small regions like that that are higher than their surroundings.
I am using OpenCV and GDAL for my actual image processing. Does either of those already contain techniques for what I'm trying to accomplish? If not, can you point me in the right direction? One idea I've had is to go through each pixel and calculate its rate of change relative to its surrounding pixels, the hope being that pixels with high rates of change (steep slopes) would mark the edges of raised areas.
Note that the elevations will change from image to image, and this needs to work with any elevation. So the ground might be around 10 meters in one image but 20 meters in another.
Supposing you can put the DEM information into a 2D Mat where each "pixel" holds an elevation value, you can find local maxima by applying a dilation and then subtracting the result from the original image: the dilated value is the neighborhood maximum, so pixels where the difference is zero are local maxima.
There's a related post with code examples in: http://answers.opencv.org/question/28035/find-local-maximum-in-1d-2d-mat/
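A minimal sketch of the dilate-and-compare idea in OpenCV C++; the window size and minimum rise are made-up parameters to tune. Because the rise is measured against the local minimum rather than an absolute height, it should behave the same whether the ground sits at 10 or 20 meters:

```cpp
#include <opencv2/opencv.hpp>

// dem: a CV_32F Mat of elevation values (one "pixel" per DEM cell).
// Returns an 8-bit mask that is 255 where a pixel equals the maximum of
// its neighborhood AND stands at least minRise above the lowest
// elevation in that neighborhood.
cv::Mat findRaisedRegions(const cv::Mat& dem, int window = 15, float minRise = 2.0f) {
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(window, window));

    cv::Mat localMax, localMin;
    cv::dilate(dem, localMax, kernel);  // per-pixel neighborhood maximum
    cv::erode(dem, localMin, kernel);   // per-pixel neighborhood minimum

    cv::Mat peaks, prominent, result;
    // The dilated image is >= dem everywhere, so equality marks local maxima.
    cv::compare(dem, localMax, peaks, cv::CMP_GE);
    // Keep only maxima that rise enough above the surrounding terrain.
    cv::compare(dem - localMin, minRise, prominent, cv::CMP_GT);
    cv::bitwise_and(peaks, prominent, result);
    return result;
}
```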

how to detect a bin / box in pcl?

Hello, I am new to PCL (point cloud library) and my task is to detect 3D objects in a box/bin using surflet pairs, with a Kinect. I am able to segment and cluster some of the objects, but my algorithm also detects the box as a segment. How can I detect and remove only the box from the scene?
Should I use PCA or SIFT?
You could run a planar RANSAC and subtract all points that belong to sufficiently large planes. An additional restriction would be to subtract only planes whose normal vector is at nearly 90 degrees from unit-z. This lets you search for smaller planes without worrying about cutting into the objects in your box, since it makes your filter highly specific to vertical planes.
Another note: if your box doesn't move, you could just save an empty point cloud, i.e. the cloud when there are no objects in the box, and then, when you get a new cloud, use the saved one as a proximity filter to cut out all points that are sufficiently close to what was labeled as background.
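A sketch of the first suggestion using PCL's SACSegmentation with SACMODEL_PARALLEL_PLANE constrained to unit-z, so only near-vertical planes (the box walls) are removed; the angle, distance and size thresholds are assumptions to tune for your scene:

```cpp
#include <cmath>
#include <pcl/ModelCoefficients.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/segmentation/sac_segmentation.h>

using PointT = pcl::PointXYZ;
using CloudT = pcl::PointCloud<PointT>;

// Iteratively find and remove large planes parallel to the z axis
// (i.e. whose normal is near 90 degrees from unit-z), leaving the
// objects inside the box untouched.
CloudT::Ptr removeVerticalPlanes(const CloudT::Ptr& cloud) {
    CloudT::Ptr remaining(new CloudT(*cloud));

    pcl::SACSegmentation<PointT> seg;
    seg.setOptimizeCoefficients(true);
    seg.setModelType(pcl::SACMODEL_PARALLEL_PLANE);  // plane parallel to a given axis
    seg.setAxis(Eigen::Vector3f(0.0f, 0.0f, 1.0f));  // unit-z: accept only vertical planes
    seg.setEpsAngle(10.0 * M_PI / 180.0);            // allowed tilt away from vertical
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.01);                  // 1 cm plane "thickness"

    while (true) {
        pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
        pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
        seg.setInputCloud(remaining);
        seg.segment(*inliers, *coeffs);
        if (inliers->indices.size() < 500) break;    // "sufficiently large" cutoff

        pcl::ExtractIndices<PointT> extract;         // subtract the plane's points
        extract.setInputCloud(remaining);
        extract.setIndices(inliers);
        extract.setNegative(true);
        CloudT::Ptr filtered(new CloudT);
        extract.filter(*filtered);
        remaining.swap(filtered);
    }
    return remaining;
}
```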

matrix image processing in OpenGL ES

I'm trying to create an image filter in OpenGL ES. Currently I am creating a series of 4x4 matrices and multiplying them together, then using glColorMask and glColor4f to adjust the image accordingly. I've been able to integrate hue rotation, saturation, and brightness, but I am having trouble adding contrast. So far Google hasn't been too helpful: I've found a few matrices, but they don't seem to work. Do you have any ideas?
I have to say, I haven't heard of using a 4x4 matrix to do brightness or contrast. I think of these operations as being done on the histogram of the image, rather than on a local per-pixel basis.
Say, for instance, that your image has values from 0 to 200, and you want to make it brighter. You can then add values to the image, and what is shown on the screen will be brighter. If you want to enhance the contrast on that image, you would do a multiplication like:
(image_value - original_min)/(original_max - original_min) * (new_max - new_min) + new_min
If you want your new min to be 0 and your new max to be 255, that equation will stretch the contrast accordingly. The original_min and original_max do not have to be the actual min and max of the entire image; they could be the min and max of a subsection of the image, if you want to enhance a particular area and don't mind clipping values above or below your new_min/new_max.
I suppose if you already know your range and so forth, you could incorporate that formula into a 4x4 matrix to achieve your goal, but only after you've done a pass to find the min and max of the original image.
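For instance, a sketch of such a matrix, assuming color components in [0, 1] and colors treated as homogeneous vectors [r, g, b, 1] so the constant offset can sit in the translation column:

```cpp
#include <array>
#include <cstdio>

// Build a column-major 4x4 color matrix implementing the stretch
//   c' = (c - original_min) * scale + new_min
// where scale = (new_max - new_min) / (original_max - original_min).
// Applied to [r, g, b, 1], each channel becomes c * scale + offset.
std::array<float, 16> contrastMatrix(float origMin, float origMax,
                                     float newMin, float newMax) {
    float scale  = (newMax - newMin) / (origMax - origMin);
    float offset = newMin - origMin * scale;
    return {
        scale,  0.0f,   0.0f,   0.0f,   // column 0
        0.0f,   scale,  0.0f,   0.0f,   // column 1
        0.0f,   0.0f,   scale,  0.0f,   // column 2
        offset, offset, offset, 1.0f,   // column 3: constant offset
    };
}

int main() {
    // Stretch an image whose values span [0, 200/255] to the full [0, 1].
    auto m = contrastMatrix(0.0f, 200.0f / 255.0f, 0.0f, 1.0f);
    std::printf("scale = %.3f, offset = %.3f\n", m[0], m[12]);  // 1.275, 0.000
}
```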
I would also make sure to uncouple the display of your image from your image data; the above operations are destructive, in that you'll lose information, so you want to keep the original and display a copy.