I'm planning on creating an app that does something like this: http://www.zonetrigger.com/articles/Kinect-software/
That means, I want to be able to set up "Trigger Zones" using the Kinect and it's 3d Image. Now I know that Microsoft is stating that the Kinect can detect the skeleton of up to 6 People.
For me however, it would be enough to detect whether something is entering a trigger zone and where.
Does anyone know if the Kinect can be programmed to function as a simple Motion Sensor, so it can detect more than 6 entries?
It is well known that Kinect cannot detect more than 5 entries (just kidding). All you need to do is to get a depth map (z-map) from Kinect, and then convert it into a 3d map using these formulas,
X = (((cols - cap_width) * Z ) / focal_length_X);
Y = (((row - cap_height)* Z ) / focal_length_Y);
Z = Z;
Where row and col are calculated from the image center (not upper left corner!) and focal is a focal length of Kinect in pixels (~570). Now you can specify the exact locations in 3D where if the pixels appear, you can do whatever you want to do. Here are more pointers:
You can use openCV for the ease of visualization. To read a frame from Kinect after it was initialized you just need something like this:
Mat inputMat = Mat(h, w, CV_16U, (void*) depth_gen.GetData());
You can easily visualize depth maps using histogram equalization (it will optimally spread 10000 Kinect levels among your available 255 levels of grey)
It is sometimes desirable to do object segmentation grouping spatially close pixels with similar depth together. I did this several years ago, see this but had to delete the floor and/or common surface on which object stayed otherwise all the object were connected and extracted as a single large segment.
Related
I have a dataset of a single class (rectangular object) with a size of 130 images. My goal is to detect the object & draw a circle/dot/mark in the centre of the object.
Because the objects are rectangular, my idea is to get the dimensions of the predicted bounding box and take the circle/dot/mark as (width/2, height/2).
However, if I were to do transfer learning, would YOLO be a good choice to detect a single class of objects in a small dataset?
YOLO should be fine. However it is old now. Try YoloV4 for better results.
People have tried transfer learning from FasterRCNN to detect single objects with 300 images and it worked fine. (Link). However 130 images is a bit smaller. Try augmenting images - flipping, rotating etc if you get inferior results.
Use same augmentation for annotation as well while doing translation, rotation, flip augmentations. For example in pytorch, for segmentation, I use:
if random.random()<0.5: # Horizontal Flip
image = T.functional.hflip(image)
mask = T.functional.hflip(mask)
if random.random()<0.25: # Rotation
rotation_angle = random.randrange(-10,11)
image = T.functional.rotate(image,angle = rotation_angle)
mask = T.functional.rotate(mask ,angle = rotation_angle)
For bounding box you will have to create coordinates, x becomes width-x for horizontal flip.
Augmentations where object position is not changing: do not change annotations e.g.: gamma intensity transformation
What I need
I'm currently working on an augmented reality kinda game. The controller that the game uses (I'm talking about the physical input device here) is a mono colored, rectangluar pice of paper. I have to detect the position, rotation and size of that rectangle in the capture stream of the camera. The detection should be invariant on scale and invariant on rotation along the X and Y axes.
The scale invariance is needed in case that the user moves the paper away or towards the camera. I don't need to know the distance of the rectangle so scale invariance translates to size invariance.
The rotation invariance is needed in case the user tilts the rectangle along its local X and / or Y axis. Such a rotation changes the shape of the paper from rectangle to trapezoid. In this case, the object oriented bounding box can be used to measure the size of the paper.
What I've done
At the beginning there is a calibration step. A window shows the camera feed and the user has to click on the rectangle. On click, the color of the pixel the mouse is pointing at is taken as reference color. The frames are converted into HSV color space to improve color distinguishing. I have 6 sliders that adjust the upper and lower thresholds for each channel. These thresholds are used to binarize the image (using opencv's inRange function).
After that I'm eroding and dilating the binary image to remove noise and unite nerby chunks (using opencv's erode and dilate functions).
The next step is finding contours (using opencv's findContours function) in the binary image. These contours are used to detect the smallest oriented rectangles (using opencv's minAreaRect function). As final result I'm using the rectangle with the largest area.
A short conclusion of the procedure:
Grab a frame
Convert that frame to HSV
Binarize it (using the color that the user selected and the thresholds from the sliders)
Apply morph ops (erode and dilate)
Find contours
Get the smallest oriented bouding box of each contour
Take the largest of those bounding boxes as result
As you may noticed, I don't make an advantage of the knowledge about the actual shape of the paper, simply because I don't know how to use this information properly.
I've also thought about using the tracking algorithms of opencv. But there were three reasons that prevented me from using them:
Scale invariance: as far as I read about some of the algorithms, some don't support different scales of the object.
Movement prediction: some algorithms use movement prediction for better performance, but the object I'm tracking moves completely random and therefore unpredictable.
Simplicity: I'm just looking for a mono colored rectangle in an image, nothing fancy like car or person tracking.
Here is a - relatively - good catch (binary image after erode and dilate)
and here is a bad one
The Question
How can I improve the detection in general and especially to be more resistant against lighting changes?
Update
Here are some raw images for testing.
Can't you just use thicker material?
Yes I can and I already do (unfortunately I can't access these pieces at the moment). However, the problem still remains. Even if I use material like cartboard. It isn't bent as easy as paper, but one can still bend it.
How do you get the size, rotation and position of the rectangle?
The minAreaRect function of opencv returns a RotatedRect object. This object contains all the data I need.
Note
Because the rectangle is mono colored, there is no possibility to distinguish between top and bottom or left and right. This means that the rotation is always in range [0, 180] which is perfectly fine for my purposes. The ratio of the two sides of the rect is always w:h > 2:1. If the rectangle would be a square, the range of roation would change to [0, 90], but this can be considered irrelevant here.
As suggested in the comments I will try histogram equalization to reduce brightness issues and take a look at ORB, SURF and SIFT.
I will update on progress.
The H channel in the HSV space is the Hue, and it is not sensitive to the light changing. Red range in about [150,180].
Based on the mentioned information, I do the following works.
Change into the HSV space, split the H channel, threshold and normalize it.
Apply morph ops (open)
Find contours, filter by some properties( width, height, area, ratio and so on).
PS. I cannot fetch the image you upload on the dropbox because of the NETWORK. So, I just use crop the right side of your second image as the input.
imgname = "src.png"
img = cv2.imread(imgname)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
## Split the H channel in HSV, and get the red range
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h,s,v = cv2.split(hsv)
h[h<150]=0
h[h>180]=0
## normalize, do the open-morp-op
normed = cv2.normalize(h, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1)
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3,3))
opened = cv2.morphologyEx(normed, cv2.MORPH_OPEN, kernel)
res = np.hstack((h, normed, opened))
cv2.imwrite("tmp1.png", res)
Now, we get the result as this (h, normed, opened):
Then find contours and filter them.
contours = cv2.findContours(opened, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))[-2]
bboxes = []
rboxes = []
cnts = []
dst = img.copy()
for cnt in contours:
## Get the stright bounding rect
bbox = cv2.boundingRect(cnt)
x,y,w,h = bbox
if w<30 or h < 30 or w*h < 2000 or w > 500:
continue
## Draw rect
cv2.rectangle(dst, (x,y), (x+w,y+h), (255,0,0), 1, 16)
## Get the rotated rect
rbox = cv2.minAreaRect(cnt)
(cx,cy), (w,h), rot_angle = rbox
print("rot_angle:", rot_angle)
## backup
bboxes.append(bbox)
rboxes.append(rbox)
cnts.append(cnt)
The result is like this:
rot_angle: -2.4540319442749023
rot_angle: -1.8476102352142334
Because the blue rectangle tag in the source image, the card is splited into two sides. But a clean image will have no problem.
I know it's been a while since I asked the question. I recently continued on the topic and solved my problem (although not through rectangle detection).
Changes
Using wood to strengthen my controllers (the "rectangles") like below.
Placed 2 ArUco markers on each controller.
How it works
Convert the frame to grayscale,
downsample it (to increase performance during detection),
equalize the histogram using cv::equalizeHist,
find markers using cv::aruco::detectMarkers,
correlate markers (if multiple controllers),
analyze markers (position and rotation),
compute result and apply some error correction.
It turned out that the marker detection is very robust to lighting changes and different viewing angles which allows me to skip any calibration steps.
I placed 2 markers on each controller to increase the detection robustness even more. Both markers has to be detected only one time (to measure how they correlate). After that, it's sufficient to find only one marker per controller as the other can be extrapolated from the previously computed correlation.
Here is a detection result in a bright environment:
in a darker environment:
and when hiding one of the markers (the blue point indicates the extrapolated marker postition):
Failures
The initial shape detection that I implemented didn't perform well. It was very fragile to lighting changes. Furthermore, it required an initial calibration step.
After the shape detection approach I tried SIFT and ORB in combination with brute force and knn matcher to extract and locate features in the frames. It turned out that mono colored objects don't provide much keypoints (what a surprise). The performance of SIFT was terrible anyway (ca. 10 fps # 540p).
I drew some lines and other shapes on the controller which resulted in more keypoints beeing available. However, this didn't yield in huge improvements.
I need to detect this ball: and find its position and radius using opencv. I have downloaded many codes, but neither of them works. Any helps are highly appreciated.
I see you have quite a setup installed. As mentioned in the comments, please make sure that you have appropriate lighting to capture the ball, as well as making the ball distinguishable from it's surroundings by painting it a different colour.
Once your setup is optimized for detection, you may proceed via different ways to track your ball (stationary or not). A few ways may be:
Feature detection : Via Hough Circles, detect 2D circles (and their radius) that lie within a certain range of color, as explained below
There are many more ways to detect objects via feature detection, such as this clever blog points out.
Object Detection: Via SURF, SIFT and many other methods, you may detect your ball, calculate it's radius and even predict it's motion.
This code uses Hough Circles to compute the ball position, display it in real time and calculate it's radius in real time. I am using Qt 5.4 with OpenCV version 2.4.12
void Dialog::TrackMe() {
webcam.read(cim); /*call read method of webcam class to take in live feed from webcam and store each frame in an OpenCV Matrice 'cim'*/
if(cim.empty()==false) /*if there is something stored in cim, ie the webcam is running and there is some form of input*/ {
cv::inRange(cim,cv::Scalar(0,0,175),cv::Scalar(100,100,256),cproc);
/* if any part of cim lies between the RGB color ranges (0,0,175) and (100,100,175), store in OpenCV Matrice cproc */
cv::HoughCircles(cproc,veccircles,CV_HOUGH_GRADIENT,2,cproc.rows/4,100,50,10,100);
/* take cproc, process the output to matrice veccircles, use method [CV_HOUGH_GRADIENT][1] to process.*/
for(itcircles=veccircles.begin(); itcircles!=veccircles.end(); itcircles++)
{
cv::circle(cim,cv::Point((int)(*itcircles)[0],(int)(*itcircles)[1]), 3, cv::Scalar(0,255,0), CV_FILLED); //create center point
cv::circle(cim,cv::Point((int)(*itcircles)[0],(int)(*itcircles)[1]), (int)(*itcircles)[2], cv::Scalar(0,0,255),3); //create circle
}
QImage qimgprocess((uchar*)cproc.data,cproc.cols,cproc.rows,cproc.step,QImage::Format_Indexed8); //convert cv::Mat to Qimage
ui->output->setPixmap(QPixmap::fromImage(qimgprocess));
/*render QImage to screen*/
}
else
return; /*no input, return to calling function*/
}
How does the processing take place?
Once you start taking in live input of your ball, the frame captured should be able to show where the ball is. To do so, the frame captured is divided into buckets which are further divides into grids. Within each grid, an edge is detected (if it exists) and thus, a circle is detected. However, only those circles that pass through the grids that lie within the range mentioned above (in cv::Scalar) are considered. Thus, for every circle that passes through a grid that lies in the specified range, a number corresponding to that grid is incremented. This is known as voting.
Each grid then stores it's votes in an accumulator grid. Here, 2 is the accumulator ratio. This means that the accumulator matrix will store only half as many values as resolution of input image cproc. After voting, we can find local maxima in the accumulator matrix. The positions of the local maxima are corresponding to the circle centers in the original space.
cproc.rows/4 is the minimum distance between centers of the detected circles.
100 and 50 are respectively the higher and lower threshold passed to the canny edge function, which basically detects edges only between the mentioned thresholds
10 and 100 are the minimum and maximum radius to be detected. Anything above or below these values will not be detected.
Now, the for loop processes each frame captured and stored in veccircles. It create a circle and a point as detected in the frame.
For the above, you may visit this link
Am trying to use OPENCV to detect the shift in consecutive video frames when the camera is unstable and moving real time as shown in the picture.. To compensate the effect of shaking or changing in the angle I want to match some objects in the image as example the clock and from the center of the same object in the consecutive frames I can detect the shift value and compensate its effect. I don't know the way to do this real time or how many ways are available and accurate to do this.
Thank you in advance and I hope my question is clear.
This is a fairly standard operation, as it's actively used in MPEG-4 compression. It's called "motion estimation" and you don't do it on objects (too hard, requires image segmentation). In OpenCV, it's covered under Video Stabilization
If you want to try writing code yourself then one method is to first of all crop the frame to produce a sub image of your actual image slightly smaller than your actual image along each dimension. This will give you some room to move.
Next you want to be able to find and track shapes in OpenCV - an example of code is here - http://opencv-srf.blogspot.co.uk/2011/09/object-detection-tracking-using-contours.html - Play around until you get a few geometric primitive shapes coming up on each frame.
Next you want to build some vectors between the centres of each shape - these are what will determine the movement of the camera - if in the next frame most of the vectors are displaced but parallel that is a good indicator that the camera has moved.
The last step is to calculate the displacement, which should is matter of measuring the distance between detected parallel vectors. If this is smaller than your sub-image cropping then you can crop the original image to negate the displacement.
The pseudo code for each iteration would be something like -
//Variables
image wholeFrame1, wholeFrame2, subImage, shapesFrame1, shapesFrame2
vectorArray vectorsFrame1, vectorsFrame2; parallelVectorList
vector cameraDisplacement = [0,0]
//Display image
subImage = cropImage(wholeFrame1, cameraDisplacement)
display(subImage);
//Find shapes to track
shapesFrame1 = findShapes(wholeFrame1)
shapesFrame2 = findShapes(wholeFrame2)
//Store a list of parallel vectors
parallelVectorList = detectParallelVectors(shapesFrame1, shapesFrame2)
//Find the mean displacement of each pair of parallel vectors
cameraDisplacement = meanDisplacement(parallelVectorList)
//Crop the next image accounting for camera displacement
subImage = cropImage(wholeFrame1, cameraDisplacement)
There are better ways of doing it but this would be easy enough for someone doing their first attempt at image stabilisation with experience of OpenCV.
I have a Qt app where I have to find the HSV range of a couple of pixels around click coordinates, to track later on. This is how I do it:
cv::Mat temp;
cv::cvtColor(frame, temp, CV_BGR2HSV); //frame is pulled from a video or jpeg
cv::Vec3b hsv=temp.at<cv::Vec3b>(frameX,frameY); //sometimes SIGSEGV?
qDebug() << hsv.val[0]; //look up H
qDebug() << hsv.val[1]; //look up S
qDebug() << hsv.val[2]; //look up V
//just base values so far, will work on range later
emit hsvDownloaded(hsv.val[0], hsv.val[0]+5, hsv.val[1], 255, hsv.val[2], 255); //send to GUI which automaticly updates worker thread
Now, things are odd. Those are the results (red circle indicates the click location):
With red it's weird, upper half of the shape is detected correctly, lower half is not, despite it being a solid mass of the same colour.
And for an actual test
It detects HSV {95,196,248} which is frankly absurd (base values way too high). None of the pixels that were detected isn't even the one that was clicked. The best values to detect that ball 100% of the time are H:35-141 S:0-238 V:65-255. I've wanted to get a HSV range from a normalized histogram, but I can't even get the base values right. What's up? When OpenCV pulls a frame using kalibrowanyPlik.read(frame); , the default colour scheme is BGR, right?
Why would the colour detection work so randomly?
As berak has mentioned, your code looks like you've used the indices to access pixel in the wrong order.
That means your pixel locations are wrong, except for pixel that lie on the diagonal, so clicked objects that are around the diagonal will be detected correctly, while all the others won't.
To not get confused again and again, I want you to understand why OpenCV uses (row,col) ordering for indices:
OpenCV uses matrices to represent images. In mathematics, 2D matrices use (row,col) indexing, have a look at http://en.wikipedia.org/wiki/Index_notation#Two-dimensional_arrays and watch at the indices. So for matrices, it is typical to use the row index first, followed by the column index.
Unfortunately, images and pixel typically have a (x,y) indexing, which corresponds to x/y axis/direction in mathematical graphs and coordinate systems. So here the x position is used first, followed by the y position.
Luckily, OpenCV provides two different versions of .at method, one to access pixel-positions and one to access matrix elements (which are exactly the same elements in the end).
matrix.at<type>(row,column) // matrix indexing to access elements
// which equals
matrix.at<type>(y,x)
and
matrix.at<type>(cv::Point(x,y)) // pixel/position indexing to access elements
since the first version should be slightly more efficient it should be preferred if the positions aren't already given as cv::Point objects. So the best way often is to remember, that openCV uses matrices to represent images and it uses matric index notations to access elements.
btw, I've seen people wondering why matrix.at<type>(cv::Point(y,x)) doesn't work the way intended after they've learned that openCV images use the "wrong ordering". I hope this question doesn't come up after my explanation.
one more btw: in school I already wondered, why matrices index rows first, while graphs of functions index x axis first. I found it stupid to not use the "same" ordering for both but I still had to live with it :D (and at the end, both don't have much to do with the other)