How to find the x,y coordinates of an object detected using YOLOv3? - computer-vision

I need to find the x,y coordinates of an object detected using YOLOv3 in real time. For example, I'm doing real-time object detection with my computer's camera, taking the camera as the reference point, say (0,0). As I move the object, the coordinates need to change accordingly. So basically I want to get the x,y coordinates of the object with respect to my camera.
Any help would be much appreciated. Thanks in advance.

If you want to get the coordinates of a detection, you can use the -ext_output flag of the AlexeyAB/darknet code:
darknet detector test data/obj.data cfg/yolov4.cfg yolov4.weights -ext_output data/person.jpg
Or you can save the output directly to a text file:
darknet detector test data/obj.data cfg/yolov4.cfg yolov4.weights -ext_output data/person.jpg > output.txt
But if you have a large number of images, you can run detection on all of them at once and save the results to a JSON file:
darknet detector test data/obj.data cfg/yolov4.cfg yolov4.weights -ext_output -out train.json < data/train.txt
Here train.json is the JSON file where the results are saved.
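If you want the coordinates programmatically in real time from the webcam rather than from the command line, one option is to load the trained network with OpenCV's dnn module and read the box centres yourself. The sketch below is a minimal illustration, not the darknet tool itself; the file names yolov3.cfg and yolov3.weights are assumed placeholders for your own config and weights, and the origin of the printed coordinates is the top-left corner of the camera frame.

import cv2
import numpy as np

# Assumed file names; replace with your own config/weights
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    for output in net.forward(layer_names):
        for det in output:
            scores = det[5:]
            if scores[np.argmax(scores)] > 0.5:
                # YOLO outputs the box centre and size as fractions of the frame,
                # so multiply by the frame size to get pixel coordinates
                cx, cy = int(det[0] * w), int(det[1] * h)
                print("object centre (x, y) in pixels from the top-left of the frame:", cx, cy)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()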

Related

Mediapipe hands python result does not have multi_hand_world_landmarks

I'm trying to use MediaPipe in Python.
It works fine, but the result of hands.process() has multi_hand_landmarks and does not have multi_hand_world_landmarks, and I get
AttributeError: type object 'SolutionOutputs' has no attribute 'multi_hand_world_landmarks'
Why?
The comment in the source
https://github.com/google/mediapipe/blob/master/mediapipe/python/solutions/hands.py
says the result must have this property:
Returns:
A NamedTuple object with the following fields:
1) a "multi_hand_landmarks" field that contains the hand landmarks on
each detected hand.
2) a "multi_hand_world_landmarks" field that contains the hand landmarks
on each detected hand in real-world 3D coordinates that are in meters
with the origin at the hand's approximate geometric center.
3) a "multi_handedness" field that contains the handedness (left v.s.
right hand) of the detected hand.
Maybe it appears only in pictures with a special background?
Hello, I ran into the same problem and then checked hands.py in my local files. I found that there is no "multi_hand_world_landmarks" in the output definition. I'm trying to update to the latest mediapipe to solve it.
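For reference, here is a minimal sketch of how the field can be read once you are on a mediapipe release whose Hands solution actually exposes it (the input file hand.jpg is a hypothetical example):

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    image = cv2.imread("hand.jpg")  # hypothetical input image
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # Only present in versions of the solution that define this output;
    # it is None when no hand is detected
    if results.multi_hand_world_landmarks:
        for hand in results.multi_hand_world_landmarks:
            for lm in hand.landmark:
                print(lm.x, lm.y, lm.z)  # metres, origin near the hand's centre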

Get the polygon coordinates of predicted output mask in YOLACT/YOLACT++

I am using YOLACT (https://github.com/dbolya/yolact), an instance segmentation algorithm which outputs the test image with a mask on the detected object. Since the input images are given with the coordinates of polygons around the input classes in annotations.json, I want to get a similar output. But I can't figure out how to extract the coordinates of those contours/polygons.
As far as I understood from this script, https://github.com/dbolya/yolact/blob/master/eval.py, the output is a list of tensors for the detected objects. It contains the classes, scores, boxes and masks for the evaluated image. The eval.py script returns the recognized image with all this information. The recognition is saved in 'preds' in the evalimage function (line 595), and post-processing of the prediction result is in "def prep_display" (line 135).
Now how do I extract those polygon coordinates and save them in a .json file or some other format?
I also tried looking at these issues, but sadly couldn't figure it out:
https://github.com/dbolya/yolact/issues/286
and
https://github.com/dbolya/yolact/issues/256
You need to create a complete post-processing pipeline that is specific to your task. Here's a small snippet that could be added to prep_display() in eval.py:
with timer.env('Copy'):
    if cfg.eval_mask_branch:
        # Add the lines below to get all the predicted object masks as a tensor
        all_objects_mask = t[3][:args.top_k]
        # Convert each object mask to binary, then use OpenCV's findContours()
        # to extract the contour points for each object
        binary_masks = (all_objects_mask > 0.5).byte().cpu().numpy()
        for object_mask in binary_masks:
            contours, _ = cv2.findContours(object_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
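Once you have the per-object binary masks, dumping the polygon coordinates to JSON is straightforward. A rough sketch, assuming binary_masks is an iterable of HxW uint8 arrays like the ones built in the snippet above (the output file name polygons.json is just an example):

import json
import cv2

def save_mask_polygons(binary_masks, out_path="polygons.json"):
    """binary_masks: iterable of HxW uint8 arrays, one per detected object."""
    polygons = []
    for mask in binary_masks:
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Each contour is an (N, 1, 2) array of (x, y) points;
        # flatten it to a plain list of [x, y] pairs so it serialises cleanly
        polygons.append([c.reshape(-1, 2).tolist() for c in contours])
    with open(out_path, "w") as f:
        json.dump(polygons, f)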

How to improve accuracy of estimateAffine2D (or estimateRigidTransform) in OpenCV?

I have two sets of points, one from time t-1 and one from the current time t. The first set was generated using goodFeaturesToTrack(), and the latter using calcOpticalFlowPyrLK(). Using these two sets of points, I then estimate a transformation matrix via estimateAffinePartial2D() in order to keep track of its scale & rotation. A code snippet is listed below:
// Precompute image pyramids
maxLvl = cv::buildOpticalFlowPyramid(_imgPrev, imPyr1, _winSize, maxLvl, true);
maxLvl = cv::buildOpticalFlowPyramid(tmpImg, imPyr2, _winSize, maxLvl, true);
// Optical flow call for tracking pixels
cv::calcOpticalFlowPyrLK(imPyr1, imPyr2, _currentPoints, nextPts, status, err, _winSize, maxLvl, _terminationCriteria, 0, 0.000001);
// Get transformation matrix between the two data sets
cv::Mat H = cv::estimateAffinePartial2D(_currentPoints, nextPts, inlier_mask, cv::RANSAC, 10.0, 2000, 0.99);
Using H, I then map my masking points using perspectiveTransform(). The result seems accurate for the first few dozen frames, until I notice some drift (in terms of rotation) occurring when the object I am tracking continues to rotate (usually when the rotation becomes > M_PI). I'm honestly stumped about where the culprit is, but my main suspicion is that my window size for optical flow might be too small, or too big. However, tweaking the window size did not seem to help: the position of my object is still accurate, but the estimated rotation (and scale) got worse. Can anyone shed some light on this?
Warm regards and thanks.
EDIT: Images attached to show drift issue
Starting Frame
First few frames -- Rotation OK
Z-Rotation Drift occurs -- see anchor line has drifted towards the red rectangle.
The Lucas-Kanade tracker needs more features. My guess is that the tracking template you provided is not good enough.
(1) Try with other feature-rich real images, e.g. the OpenCV feature tracking template image.
(2) Fix the scale. Since you are doing a simulation, you can try to anchor the size first.
calcOpticalFlowPyrLK is widely used in visual-inertial state estimation studies, such as Semi-Direct Visual Odometry (SVO) or VINS-Mono. You can look at the code inside those projects to see how other people tune the features and parameters.
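One common pattern for keeping the tracker supplied with enough features is to re-detect corners whenever too many tracks are lost, so estimateAffinePartial2D always has a healthy set of correspondences. A minimal Python sketch of that idea; the input file name, window size, quality level, and re-detection threshold are illustrative values, not tuned for your data:

import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, points, None, winSize=(21, 21), maxLevel=3)
    good_old = points[status.flatten() == 1]
    good_new = next_pts[status.flatten() == 1]
    H, inliers = cv2.estimateAffinePartial2D(
        good_old, good_new, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    # Re-detect corners when too many tracks have been lost
    if len(good_new) < 100:
        points = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01, minDistance=7)
    else:
        points = good_new.reshape(-1, 1, 2)
    prev_gray = gray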

how to produce glare on an image with opencv

Is there a way to produce a glare on an image? Given an image with an object, I want to produce a glare on a portion of the image. If I have an image that is 256x256, I want to produce glare on the first 64x64 patch. Is there a function in OpenCV I can use for that? If not, what is a good way to go about this problem?
I think that this example does what you need. Each time it saves a face, it gives a flash in the part of the screen where the face was recognised, so the glare changes place and size every time.
You can find it here:
https://github.com/MasteringOpenCV/code/tree/master/Chapter8_FaceRecognition
Look for this part in main.cpp:
// Make a white flash on the face, so the user knows a photo has been taken.
Mat displayedFaceRegion = displayedFrame(faceRect);
displayedFaceRegion += CV_RGB(90,90,90);
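The same additive brightening can be applied to just the 64x64 patch of your 256x256 image, which is one simple way to fake a glare. A minimal Python sketch; the file names and the brightness offset of 90 are examples, not fixed values:

import cv2
import numpy as np

image = cv2.imread("input.png")  # hypothetical 256x256 input image

# Brighten the top-left 64x64 patch; cv2.add saturates at 255 instead of
# wrapping around, which gives the washed-out "glare" look
glare = np.full((64, 64, 3), 90, dtype=np.uint8)
image[0:64, 0:64] = cv2.add(image[0:64, 0:64], glare)

cv2.imwrite("glared.png", image)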

Where to get settings for camera calibration in OpenCV?

I am doing an AR project for school, and after some struggle I was able to build OpenCV with ArUco and detect markers. Now I need to calibrate the camera for pose estimation. I am using this tutorial.
Now it is stated that I have to "Read the Settings" from an XML file. Where do I find this file? Or do I have to make one myself, and if yes, how?
Also, I want to use a standard chess board for the calibration (I have no printer...). Is this possible, and do I have to input the size of the board anywhere?
This is the link to the input XML file that you have to create: https://github.com/Itseez/opencv/blob/master/samples/cpp/tutorial_code/calib3d/camera_calibration/in_VID5.xml. In this file you will notice you have to give the size of a square on line 9.
Additionally, you will also have to create VID5.xml, whose path you provide in the input XML on line 19. It should list the images that you use to calibrate:
<?xml version="1.0"?>
<opencv_storage>
<images>
images/CameraCalibration/VID5/xx1.jpg
images/CameraCalibration/VID5/xx2.jpg
images/CameraCalibration/VID5/xx3.jpg
images/CameraCalibration/VID5/xx4.jpg
images/CameraCalibration/VID5/xx5.jpg
images/CameraCalibration/VID5/xx6.jpg
images/CameraCalibration/VID5/xx7.jpg
images/CameraCalibration/VID5/xx8.jpg
</images>
</opencv_storage>
Try using a standard chessboard; make sure the surface is flat and has no irregularities. If the calibration error and re-projection error are between 0 and 1, you can use the intrinsic and extrinsic parameters for your project.
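If building the full C++ sample is more than you need, the same calibration can be done directly from a handful of chessboard photos with OpenCV's Python API. A rough sketch, assuming a 9x6 inner-corner board and a folder calib_images/ of photos (both the pattern size and the square size below are example values you must replace with your board's):

import glob
import cv2
import numpy as np

pattern = (9, 6)          # inner corners per row and column of the chessboard
square_size = 0.025       # e.g. 25 mm squares, expressed in metres

# 3D coordinates of the board corners in the board's own coordinate system
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.jpg"):   # hypothetical folder of board photos
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# rms is the re-projection error mentioned above (ideally below 1)
rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("re-projection error:", rms)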