Calculating screen position from latitude and longitude - c++

I have a video camera on a high building (e.g. 10 storeys high) looking at an area. The video is streamed to a display on my laptop.
If I know a building's latitude/longitude, how can I calculate the position (in pixels) at which to mark a cross on the displayed video to indicate that building?

What you are looking for is Augmented Reality.
You would need to know the focal length of the camera, which is a non-trivial task. I wrote such a system many years ago for a professional Sony DVCAM camera, using an error-diffusion technique pulled from the Numerical Recipes book. It took a long time to set up and calibrate, but once calibrated the focal length was fixed.
Modern systems use a half-cube target: the software knows the size of the cube and can therefore infer the focal length, field of view and other properties of the camera. Again not a trivial operation, but a lot quicker than my method. I read a paper by a Canadian researcher who had a virtual tank driving over objects in photos of Greece, IIRC. That was in 2001, so forgive me if the name of the paper has slipped my mind.
Another aspect you have to consider is the angle and orientation of the camera. While you know the camera's GPS position, you also need to know its orientation. You can get a cheap electronic compass that will give you a best-guess heading, but what you really need is an IMU. They range from a few quid for Arduino modules to many thousands for industrial units. Since the pull and rotation of the Earth are constants, you can infer the IMU's orientation from the forces measured on its axes.
The ideas that were difficult then are probably commonplace now. Your best bet would be to read through some papers on Augmented Reality.
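To give a rough idea of the final projection step once you do have all of that (camera GPS position, orientation and focal length in pixels), a minimal pinhole sketch in C++ might look like the code below. Everything in it is an assumption for illustration only: a small-area equirectangular lat/lon-to-metres conversion, zero roll, no lens distortion, and made-up names and parameters.

    #include <cmath>

    struct Pixel { double u, v; bool inFront; };

    // Minimal pinhole projection of a geo-referenced point into the image.
    // Assumptions (all illustrative): small area, so lat/lon offsets are
    // converted with an equirectangular approximation; yawRad is the compass
    // heading of the optical axis (0 = north, clockwise positive); pitchRad
    // is the downward tilt below horizontal; roll is zero; no distortion.
    Pixel projectGeoPoint(double camLat, double camLon, double camAltM,
                          double tgtLat, double tgtLon, double tgtAltM,
                          double yawRad, double pitchRad,
                          double focalPx, double cx, double cy)
    {
        const double deg2rad = 3.14159265358979323846 / 180.0;
        const double R = 6378137.0;                  // Earth radius in metres

        // Target position relative to the camera, in East/North/Up metres.
        double dN = (tgtLat - camLat) * deg2rad * R;
        double dE = (tgtLon - camLon) * deg2rad * R * std::cos(camLat * deg2rad);
        double dU = tgtAltM - camAltM;

        // Express that offset in camera axes: x = right, y = image-down,
        // z = optical axis.
        double horiz = dE * std::sin(yawRad) + dN * std::cos(yawRad);
        double xc = dE * std::cos(yawRad) - dN * std::sin(yawRad);
        double yc = -std::sin(pitchRad) * horiz - dU * std::cos(pitchRad);
        double zc =  std::cos(pitchRad) * horiz - dU * std::sin(pitchRad);

        Pixel p{0.0, 0.0, zc > 0.0};
        if (p.inFront) {                             // only points ahead of the camera
            p.u = cx + focalPx * xc / zc;
            p.v = cy + focalPx * yc / zc;
        }
        return p;
    }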

Estimating camera height, orientation and field of view from a single image

I am hoping someone might be able to point me in the right direction (or let me know if I'm on the right path).
I am trying to build an image editing application that uses computer vision to assist with virtual object insertion - basically AR, but with the constraint of a single, uncalibrated monocular image.
The virtual object insertion will only occur on the ground plane (eg think of a virtual rug on the floor). Because of this (much like AR) I will need to align a virtual camera with the physical camera and composite the rendered virtual scene with the physical image.
I have had success training a semantic segmentation deep CNN to predict the flooring of an indoor scene (which serves as a mask, so that the virtual object, eg rug, is only visible in this area), but I am running into difficulties determining the camera properties.
My intuition is that in order to build a virtual scene that can be composited, the camera calibration properties I care about are the height of the camera, the pitch, the roll and the field of view (or focal length). Now, because this is just for rendering purposes, the estimated values don't need to be super accurate, just close enough that a rendered object doesn't look distorted.
After researching the problem I have come across this paper, Single View Metrology in the Wild - it appears to provide an estimate for all of the calibration properties listed above. That said, with no training code available, it may end up taking quite a long time for something that may or may not work - although it is something I am willing to investigate if it's the only option.
Am I missing an obvious approach here? I've read some papers around more traditional CV approaches (eg vanishing points) and some more modern approaches (eg UprightNet), but they usually are missing one of the necessary camera calibration values listed above.
You can get the camera parameters by calibrating the camera according to the OpenCV docs. Have you tried that?
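For reference, a minimal sketch of that flow with cv::findChessboardCorners and cv::calibrateCamera; the board size and square size are placeholders for your actual pattern:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Detect chessboard corners in several grayscale views and solve for the
    // intrinsic matrix and distortion coefficients.
    cv::Mat calibrateFromViews(const std::vector<cv::Mat>& grayImages,
                               cv::Size boardSize = cv::Size(9, 6),   // inner corners (placeholder)
                               float squareSize = 0.025f)             // metres (placeholder)
    {
        std::vector<std::vector<cv::Point3f>> objectPoints;
        std::vector<std::vector<cv::Point2f>> imagePoints;

        // 3D corner positions on the planar board (z = 0).
        std::vector<cv::Point3f> board;
        for (int r = 0; r < boardSize.height; ++r)
            for (int c = 0; c < boardSize.width; ++c)
                board.emplace_back(c * squareSize, r * squareSize, 0.0f);

        for (const cv::Mat& gray : grayImages) {
            std::vector<cv::Point2f> corners;
            if (cv::findChessboardCorners(gray, boardSize, corners)) {
                cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),
                                 cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
                imagePoints.push_back(corners);
                objectPoints.push_back(board);
            }
        }

        cv::Mat K, distCoeffs;
        std::vector<cv::Mat> rvecs, tvecs;
        double rms = cv::calibrateCamera(objectPoints, imagePoints,
                                         grayImages.front().size(),
                                         K, distCoeffs, rvecs, tvecs);
        (void)rms;   // reprojection error; a quick sanity check of the result
        return K;    // fx, fy, cx, cy live in this 3x3 matrix
    }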
Eh, "more modern"... Canoma was released in 1998 IIRC, and was based in part on work done earlier by Paul Devevec at UCB. Both showed that realistic CG insertion on single image was doable with very little/easy user input.
The software calibrated the camera's focal length and pose by having the user trace with the mouse a few boxes or cylinders matching structures in view (e.g. buildings, towers). With a little practice, one or two boxes were all that was needed to get a good solution.

OpenCV triangulatePoints varying distance

I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I am finding that this function gives me a different distance to the same point depending on the angle of the camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. In the upper left corner info is displayed about the point that is being tracked. (Youtube dropped the quality, the video is normally much sharper. (2x1280) x 720)
In the video, the left camera is the origin of the 3D coordinate system and it's looking in the positive Z direction. The left camera is undergoing some translation, but not nearly as much as the triangulatePoints function leads me to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that insufficiently precise calibration can cause this behaviour. I have run three independent calibrations using a chessboard pattern. The resulting parameters vary too much for my taste (approx. ±10% for the focal length estimate).
As you can see, the video is not highly distorted. Straight lines appear pretty straight everywhere. So the optimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras have this effect? Or a wrong baseline length?
Of course, there is always the matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get subpixel precision, which might be off by... I don't know... ±0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames directly using OpenCV. Instead, I have to use the SDK that came with the camera. It has occurred to me that this SDK might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?
The SDK provided with the ZED camera performs undistortion and rectification of the images. The geometry model is the same as OpenCV's:
- intrinsic parameters and distortion parameters for both the left and right cameras,
- extrinsic parameters for the rotation/translation between right and left.
Through one of the ZED tools (the ZED Settings App), you can enter your own intrinsic matrix and distortion coefficients for left/right, plus the baseline/convergence.
To get a precise 3D triangulation, you may need to adjust those parameters since they have a high impact on the disparity you will estimate before converting to depth.
OpenCV provides a good module to calibrate stereo cameras. It does:
- mono calibration (cv::calibrateCamera()) for left and right, followed by a stereo calibration (cv::stereoCalibrate()). This outputs the intrinsic parameters (focal length, optical centre - very important) and the extrinsics (baseline = T[0], convergence = R[1] if R is a 3x1 vector). The RMS reprojection error (the return value of stereoCalibrate()) is a good way to check that the calibration was done correctly.
The important thing is that you need to do this calibration on raw images, not on the images provided by the ZED SDK. Since the ZED is a standard UVC camera, you can use OpenCV to grab the side-by-side raw images (cv::VideoCapture with the correct device number) and extract the native left and right images.
You can then enter those calibration parameters in the tool. The ZED SDK will perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided by getParameters(). You need to use those values when you triangulate, since the images are corrected as if they were taken by this "ideal" camera.
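For illustration, a rough sketch of that flow in OpenCV (grab the raw side-by-side UVC frames, split them, then run mono calibration followed by stereo calibration). The device handling, corner collection and flags are assumptions you would adapt:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/videoio.hpp>
    #include <vector>

    // Grab one raw side-by-side frame from the ZED (a standard UVC device)
    // and split it into the native left and right images.
    bool grabStereoPair(cv::VideoCapture& cap, cv::Mat& left, cv::Mat& right)
    {
        cv::Mat sideBySide;
        if (!cap.read(sideBySide)) return false;
        int w = sideBySide.cols / 2;
        left  = sideBySide(cv::Rect(0, 0, w, sideBySide.rows)).clone();
        right = sideBySide(cv::Rect(w, 0, w, sideBySide.rows)).clone();
        return true;
    }

    // After collecting chessboard corners from both views, run mono
    // calibration per camera, then cv::stereoCalibrate for R and T.
    double calibrateStereo(const std::vector<std::vector<cv::Point3f>>& objectPoints,
                           const std::vector<std::vector<cv::Point2f>>& leftPoints,
                           const std::vector<std::vector<cv::Point2f>>& rightPoints,
                           cv::Size imageSize,
                           cv::Mat& K1, cv::Mat& D1, cv::Mat& K2, cv::Mat& D2,
                           cv::Mat& R, cv::Mat& T)
    {
        std::vector<cv::Mat> rvecs, tvecs;
        cv::calibrateCamera(objectPoints, leftPoints,  imageSize, K1, D1, rvecs, tvecs);
        cv::calibrateCamera(objectPoints, rightPoints, imageSize, K2, D2, rvecs, tvecs);

        cv::Mat E, F;
        // CALIB_FIX_INTRINSIC keeps the mono results and only solves for R, T.
        double rms = cv::stereoCalibrate(objectPoints, leftPoints, rightPoints,
                                         K1, D1, K2, D2, imageSize, R, T, E, F,
                                         cv::CALIB_FIX_INTRINSIC);
        return rms;   // well under a pixel usually indicates a usable calibration
    }

You would open the camera with something like cv::VideoCapture cap(0); (the device number is a placeholder).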
hope this helps.
/OB/
There are three points I can think of that can probably help you.
Probably the least important point: from your description it sounds like you have calibrated the cameras separately and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters can compensate for other "less accurate" parameters.
If the accuracy of the reconstruction is important to you, you need a systematic approach to reducing the error. Building an uncertainty model is easy thanks to the mathematical model, and a few lines of code can build it for you. Say the 3D point is 2 metres away at a particular angle to the camera system and you have a specific uncertainty on its 2D projections; it is easy to back-project that uncertainty into the 3D space around your 3D point. By adding uncertainty to the other parameters of the system, you can then see which ones matter most and need to have lower uncertainty.
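As a crude illustration of that back-projection idea, you can simply jitter the two 2D observations and re-triangulate. The sketch below assumes you already have the two 3x4 projection matrices that you feed to cv::triangulatePoints; the names and the perturbation scheme are placeholders:

    #include <algorithm>
    #include <opencv2/calib3d.hpp>

    // Jitter the 2D observations by +/- sigmaPx and see how far the
    // re-triangulated depth spreads. P1, P2 are the 3x4 projection matrices.
    double depthSpread(const cv::Mat& P1, const cv::Mat& P2,
                       cv::Point2f ptL, cv::Point2f ptR, float sigmaPx = 0.2f)
    {
        double zMin = 1e12, zMax = -1e12;
        const float d[3] = {-sigmaPx, 0.f, sigmaPx};
        for (float dxL : d) for (float dxR : d) {
            cv::Mat ptsL = (cv::Mat_<float>(2, 1) << ptL.x + dxL, ptL.y);
            cv::Mat ptsR = (cv::Mat_<float>(2, 1) << ptR.x + dxR, ptR.y);
            cv::Mat X;                                  // 4x1 homogeneous result
            cv::triangulatePoints(P1, P2, ptsL, ptsR, X);
            double z = X.at<float>(2) / X.at<float>(3);
            zMin = std::min(zMin, z);
            zMax = std::max(zMax, z);
        }
        return zMax - zMin;   // rough depth uncertainty from the pixel error alone
    }

Running this for points near the image centre and near the edges (and at different depths) already shows which configurations are fragile; the same idea extends to perturbing the focal length, baseline, etc.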
This inaccuracy is inherent in the problem and the method you're using.
First, if you model the uncertainty you will see that reconstructed 3D points further away from the camera centres have a much higher uncertainty. The reason is that the angle <left-camera, 3D-point, right-camera> is narrower. I remember the MVG book has a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints you see that the pseudo-inverse method is implemented using SVD to construct the 3d point. That can lead to many issues, which you probably remember from linear algebra.
Update:
But I consistently get larger distances near the edges, several times the magnitude of the uncertainty caused by the angle.
That's the result of using the pseudo-inverse, a numerical method. You can replace it with a geometrical method. One easy approach is to back-project the 2D projections to get two rays in 3D space. Then you want to find where they intersect, which doesn't actually happen due to the inaccuracies. Instead, you find the point where the two rays are closest to each other. Without considering the uncertainty you will consistently favour one point from the set of feasible solutions; that's why with the pseudo-inverse you don't see random fluctuation but a gross error.
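A minimal sketch of that midpoint method, assuming you have already back-projected each observation into a camera centre plus a unit ray direction (e.g. d = R^T * K^-1 * [u v 1]^T, normalised); the names are placeholders:

    #include <opencv2/core.hpp>

    // Given camera centres c1, c2 and ray directions d1, d2, return the point
    // halfway along the shortest segment connecting the two rays.
    cv::Point3d midpointTriangulate(const cv::Point3d& c1, const cv::Point3d& d1,
                                    const cv::Point3d& c2, const cv::Point3d& d2)
    {
        cv::Point3d w = c1 - c2;
        double a = d1.dot(d1), b = d1.dot(d2), c = d2.dot(d2);
        double d = d1.dot(w),  e = d2.dot(w);
        double denom = a * c - b * b;          // ~0 when the rays are (near) parallel

        double s = (b * e - c * d) / denom;    // parameter along ray 1
        double t = (a * e - b * d) / denom;    // parameter along ray 2

        cv::Point3d p1 = c1 + s * d1;          // closest point on ray 1
        cv::Point3d p2 = c2 + t * d2;          // closest point on ray 2
        return 0.5 * (p1 + p2);
    }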
Regarding the general optimization, yes, you can run an iterative LM optimization on all the parameters. This is the method used in applications like SLAM for autonomous vehicles where accuracy is very important. You can find some papers by googling bundle adjustment slam.

Detect person in bed

Suppose I want to find out if there is a person in a bed or not using cameras and computer vision algorithms. One can assume that the camera provides RGB, infrared and depth data.
I don't really have a good idea how to solve this. So far I came up with this:
Estimate a plane of the bed using RANSAC. This plane should be further from the ground plane if there is a person in the bed. This seems very unstable though; it assumes that the normal height of the bed is known, and it can easily break if the bed has an adjustable head section (e.g. in a hospital).
Face detection. Try to detect a face in the bed. Probably also isn't very reliable since the face can be sideways to the camera and partly covered.
Use the infrared image. I am not sure how much you would see through the blanket, and what happens if the person has just left the bed and the bed is still warm?
Is there a good way to do this? Or, to be reliable, would you have to use pressure sensors in the bed?
Thanks!
I don't know about infrared images, but for camera-based video processing this kind of problem is widely studied.
If your problem is to detect a person in a bed which is normally empty, then I think the simplest algorithm would be to capture successive frames and calculate their difference.
The presence of a human in the frame would make it different from a frame capturing only the empty bed. Depending on which algorithm of this kind you use, you would get different reliability.
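A minimal sketch of that differencing idea with OpenCV, assuming you keep a reference frame of the empty bed; the threshold values are placeholders to tune:

    #include <opencv2/imgproc.hpp>

    // Compare the current frame against a reference image of the empty bed
    // and decide "occupied" when enough pixels have changed.
    bool bedLooksOccupied(const cv::Mat& emptyBedGray, const cv::Mat& currentGray,
                          double pixelThresh = 30.0, double areaFraction = 0.02)
    {
        cv::Mat diff, mask;
        cv::absdiff(emptyBedGray, currentGray, diff);
        cv::threshold(diff, mask, pixelThresh, 255, cv::THRESH_BINARY);
        double changed = static_cast<double>(cv::countNonZero(mask));
        return changed / mask.total() > areaFraction;
    }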
Otherwise you can go directly for human detection in video frames. One possible algorithm is described here.
Edit:
Your problem is harder than I thought. The following approach might cover the cases.
The main idea is to use a bunch of features at once to get higher accuracy and remove false positives.
Use a HOG person detector at the top level to detect a person's entry into the scene (see the sketch after this list). If the positions of possible entry doors are known, or detectable using edge lines in the scene, use them to increase accuracy. (At the moment of entry, the difference between successive frames will be located near the doors.)
Use edge lines to track the human, and use the bed edges to track the human's position relative to the bed. The edges of the human should be bounded by the edges of the bed.
If the difference is located within the bed, it implies the human is in the bed but moving.
If needed, include as a preprocessing step an analysis of texture and connected components to remove other possible moving objects in the room for higher accuracy (for example, movement of clothes caused by air).
Also use face detectors to increase accuracy.
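For the HOG detector mentioned in the first point, a minimal sketch using OpenCV's built-in people detector; the parameters are the library defaults and would need tuning for your camera placement:

    #include <vector>
    #include <opencv2/objdetect.hpp>

    // Off-the-shelf HOG + linear SVM person detector, used as the top-level
    // "someone entered the scene" cue.
    std::vector<cv::Rect> detectPeople(const cv::Mat& frame)
    {
        static cv::HOGDescriptor hog;
        static bool initialised = false;
        if (!initialised) {
            hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
            initialised = true;
        }
        std::vector<cv::Rect> people;
        hog.detectMultiScale(frame, people,
                             0.0,                 // hit threshold
                             cv::Size(8, 8),      // window stride
                             cv::Size(),          // padding
                             1.05);               // scale step
        return people;
    }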
The infrared that the camera uses has a different frequency than the infrared emitted by a warm object. Unless you are using military-grade IR scanners, you can forget about the IR-warmth connection. But IR is still useful if there is limited light, or if you use it for depth maps.
Go with depth (Kinect style) and estimate the bed as a segment of your image. It should have some features in depth (certain dimensions, flatness, etc.). The bed is usually surrounded by walls or floor that are easy to segment out. Your algorithm can also be tuned to the distance to the bed and cut it out based just on depth range.
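A minimal sketch of that depth-range cut, assuming a 16-bit depth map in millimetres; the range values are placeholders for the known camera-to-bed distance band:

    #include <opencv2/core.hpp>

    // 255 where the depth falls inside the expected bed distance band.
    cv::Mat bedMaskFromDepth(const cv::Mat& depthMm16U, int minMm, int maxMm)
    {
        cv::Mat mask;
        cv::inRange(depthMm16U, cv::Scalar(minMm), cv::Scalar(maxMm), mask);
        return mask;
    }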
As other people said, it would be useful to learn more about your particular goal or application. What is the background or environment around the bed? How does it look when there is no person in it? Can a person simulate his/her presence (as in a prison-escape scenario)? Etc.

Calibrating camera with a glass-covered checkerboard

I need to find the intrinsic calibration parameters of a single camera. To do this I take several images of a checkerboard pattern from different angles and then use calibration software.
To make the calibration pattern as flat as possible, I print it on paper and cover it with a 3 mm sheet of glass. Obviously, the image of the pattern is modified by the glass, because glass has a different refractive index than air.
The extrinsic parameters will be distorted by the glass, because the checkerboard is not in the place we see it in. However, if the thickness of the glass and the refractive indices of glass and air are known, it seems possible to recover the extrinsic parameters.
So, the questions are:
Can the extrinsic parameters be calculated, and if so, how? (This is not necessary right now, just an interesting theoretical question.)
Are the intrinsic calibration parameters obtained from these images equivalent to the ones obtained from the usual calibration procedure (without the cover glass)?
When using the glass, the calibration parameters as reported by the GML Camera Calibration Toolbox (based on OpenCV) become much more accurate. (Does that make any sense at all?) But this approach has a small drawback - unwanted reflections, especially from light sources.
I commend you on choosing a very flat support (which is what I recommend myself here). But, forgive me for asking the obvious question, why did you cover the pattern with the glass?
Since the point of the exercise is to ensure the target's planarity and nothing else, you might as well glue the side of the paper sheet opposite the pattern to the support and avoid all this trouble. Yes, in time the pattern will get dirty and worn and need replacement. So you just scrape it off and replace it: printing checkerboards is cheap.
If, for whatever reasons, you are stuck with the glass in the front, I recommend doing first a back-of-the-envelope calculation of the expected ray deflection due to the glass refraction, and check if it is actually measurable by your apparatus. Given the nominal focal length in mm of the lens you are using and the physical width and pixel density of the sensor, you can easily work it out at the image center, assuming an "extreme" angle of rotation of the target w.r.t the focal axis (say, 45 deg), and a nominal distance. To a first approximation, you may model the pattern as "painted" on the glass, and so ignore the first refraction and only consider the glass-to-air one.
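For what it's worth, a small sketch of that back-of-the-envelope check: the lateral shift of a ray crossing a flat plate (the standard slab formula from Snell's law), converted very roughly into pixels. All the numbers in main() are illustrative placeholders:

    #include <cmath>
    #include <cstdio>

    // Lateral shift of a ray crossing a plane-parallel plate of thickness t
    // (mm) and refractive index n, hitting it at incidence angle theta.
    double lateralShiftMm(double thicknessMm, double n, double thetaRad)
    {
        double s = std::sin(thetaRad);
        return thicknessMm * s * (1.0 - std::cos(thetaRad) / std::sqrt(n * n - s * s));
    }

    int main()
    {
        const double kDeg2Rad = 3.14159265358979323846 / 180.0;
        double t = 3.0;                   // glass thickness, mm
        double n = 1.52;                  // typical window glass
        double theta = 45.0 * kDeg2Rad;   // "extreme" target tilt
        double distMm = 1000.0;           // nominal camera-to-target distance
        double focalPx = 1000.0;          // focal length in pixels

        double shiftMm = lateralShiftMm(t, n, theta);    // shift at the target
        double shiftPx = shiftMm * focalPx / distMm;     // crude pixel estimate
        std::printf("shift: %.3f mm  ~ %.2f px\n", shiftMm, shiftPx);
        // If shiftPx is well below one pixel, the glass can probably be ignored.
        return 0;
    }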
If the above calculation suggests that the effect is measurable (deflection >= 1 pixel), you will need to add the glass to your scene model and solve for its parameters in the bundle adjustment phase, along with the intrinsics and extrinsics. To begin with, I'd use two parameters, thickness and refraction coefficient, and assume both faces are really planar and parallel. It will just make the computation of the corner projections in the cost function a little more complicated, as you'll have to take the ray deflection into account.
Given the extra complexity of the cost function, I'd definitely write the model's code to use Automatic Differentiation (AD).
If you really want to go through this exercise, I'd recommend writing the solver on top of Google Ceres bundle adjuster, which supports AD, among many nice things.
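As a starting point, here is the bare scaffolding of an auto-differentiated reprojection residual in Ceres. It is deliberately a plain pinhole model: the glass deflection computed above would be applied inside operator() before forming the residual, at which point the thickness and refraction index become one more parameter block. All names are illustrative:

    #include <ceres/ceres.h>
    #include <ceres/rotation.h>

    // One residual per observed chessboard corner: project the known board
    // corner through the per-view pose and the shared intrinsics, and compare
    // to the detected pixel.
    struct ReprojectionError {
        ReprojectionError(double obs_u, double obs_v, double X, double Y)
            : u(obs_u), v(obs_v), bx(X), by(Y) {}

        template <typename T>
        bool operator()(const T* const intrinsics,   // fx, fy, cx, cy
                        const T* const pose,         // angle-axis (3) + translation (3)
                        T* residuals) const {
            // Board corner (bx, by, 0) transformed into the camera frame.
            T pt[3] = {T(bx), T(by), T(0)};
            T pc[3];
            ceres::AngleAxisRotatePoint(pose, pt, pc);
            pc[0] += pose[3]; pc[1] += pose[4]; pc[2] += pose[5];

            // Pinhole projection (no distortion and no glass term yet).
            T up = intrinsics[0] * pc[0] / pc[2] + intrinsics[2];
            T vp = intrinsics[1] * pc[1] / pc[2] + intrinsics[3];

            residuals[0] = up - T(u);
            residuals[1] = vp - T(v);
            return true;
        }

        double u, v;    // observed pixel
        double bx, by;  // corner position on the planar board
    };

    // Usage sketch, one block per observed corner:
    //   problem.AddResidualBlock(
    //       new ceres::AutoDiffCostFunction<ReprojectionError, 2, 4, 6>(
    //           new ReprojectionError(obs_u, obs_v, corner_x, corner_y)),
    //       nullptr, intrinsics, pose);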

opengl modelling rocket flame and vapour trails with particles

Does anyone have any guidance for coding an approximation of the particle stream coming out of a jet engine (with afterburner) in OpenGL, drawing particles using vertex buffers / 4f color buffers?
I believe there are two aspects to this problem:
The colour of the light as particles exit the jet engine, as a function of temperature and some constants relating to the type of gas being burnt. This article leads me to believe I will need some sort of array for the temperature / color conversion curve (a rough sketch of such a lookup is at the end of this question). Apparently hydrogen burns at 2,660 C in oxygen and 2,045 C in air, whereas jet fuel burns at 287.5 C in air (yet the temperature of a jet fighter's afterburner can somehow reach 1,700 C).
The vapour trail behind the rocket / jet, which will be white with alpha for the water-based vapour trail if the rocket is in the atmosphere. Also, I believe my assumption is correct that this would not be necessary for a rocket burning fuel in space. The vapour trail will be simulated as tiny water droplets, which are much larger than the wavelength of visible light, so they scatter light achromatically. As water itself is colorless, the resulting color would be white?
Also, I am looking to model this from a bird's-eye perspective, so it does not need to be a full 3D model. The positions of the 10 or so pilot lights around the afterburner cone, for example, could just be approximated as maybe 5 points along a line.
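For the temperature / color curve mentioned in the first point, something like the following piecewise-linear lookup is what I have in mind; the sample colors are placeholder values I would tune, not calibrated black-body data:

    #include <algorithm>
    #include <array>

    struct TempColor { float tempC; float r, g, b; };

    // Rough flame color curve, hottest last. Values are illustrative only.
    static const std::array<TempColor, 5> kFlameCurve = {{
        {  300.f, 0.4f, 0.05f, 0.0f },   // dull red (kerosene-ish range)
        {  700.f, 0.8f, 0.25f, 0.0f },
        { 1100.f, 1.0f, 0.55f, 0.1f },   // orange
        { 1700.f, 1.0f, 0.85f, 0.5f },   // yellow-white (afterburner range)
        { 2600.f, 0.9f, 0.95f, 1.0f }    // blue-white (hydrogen/oxygen range)
    }};

    // Linear interpolation between the curve entries; fills an RGBA 4f color.
    void flameColor(float tempC, float rgba[4])
    {
        const auto& c = kFlameCurve;
        tempC = std::clamp(tempC, c.front().tempC, c.back().tempC);
        for (std::size_t i = 1; i < c.size(); ++i) {
            if (tempC <= c[i].tempC) {
                float t = (tempC - c[i - 1].tempC) / (c[i].tempC - c[i - 1].tempC);
                rgba[0] = c[i - 1].r + t * (c[i].r - c[i - 1].r);
                rgba[1] = c[i - 1].g + t * (c[i].g - c[i - 1].g);
                rgba[2] = c[i - 1].b + t * (c[i].b - c[i - 1].b);
                rgba[3] = 1.0f;
                return;
            }
        }
    }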
Depending on the level of detail you require, you may want to simply use a textured cone coming out of the jet engine. If you want to go for a full-blown particle system (which for a jet engine does not appear necessary to me), then you might want to give each particle a bunch of properties like speed (vec3), size, gas type and age.
Make a loop to process each particle each time your game loop goes around. For each tick, your simulation would then change the speed and size as the particle gets older. You should make a function that determines the look of the particle according to its age and gas type.
At its simplest, this could make colored particles fade, enlarge and slow down as they get older. Is this what you are looking for?
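Something along those lines, as a minimal CPU-side sketch; all the constants are placeholders to tune, and the gas type would index into whatever color curve you use before positions and colors are copied into the vertex / 4f color buffers:

    #include <vector>

    struct Particle {
        float pos[3];
        float vel[3];      // speed (vec3)
        float size;
        float age;         // seconds since emission
        float life;        // seconds until removal
        int   gasType;     // index into a per-gas color curve
    };

    // Called once per game-loop tick: age, slow down, enlarge, then fade.
    void updateParticles(std::vector<Particle>& particles, float dt)
    {
        for (std::size_t i = 0; i < particles.size(); ) {
            Particle& p = particles[i];
            p.age += dt;
            if (p.age >= p.life) {                  // recycle dead particles
                particles[i] = particles.back();
                particles.pop_back();
                continue;
            }
            for (int a = 0; a < 3; ++a) {
                p.pos[a] += p.vel[a] * dt;
                p.vel[a] *= 0.98f;                  // slow down with age
            }
            p.size += 2.0f * dt;                    // enlarge with age
            // Alpha for the 4f color buffer would be 1.0f - p.age / p.life;
            // the RGB part can come from the per-gas temperature/color lookup.
            ++i;
        }
    }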