Tile Overlay for Android maps v2: How to map latitude and longitude to a pixel location - android-maps-v2

I'm building an image for an Android maps v2 TileOverlay. Say, for example, that I want a pixel that will end up at a particular latitude and longitude in the tile, to be a particular color. How do I know what latitudes and longitudes in the tile will map to which pixels in my image?
What is the mapping from {latInTile,lngInTile} to {pixelLocationX, pixelLocationY} for a tile?

This is exactly what a "projection" is. Google uses the Mercator Projection, described here https://en.wikipedia.org/wiki/Mercator_projection.
The mapping is:
pixelLocationX = longitude
pixelLocationY = ln(tan(latitude/2 + Pi/4))
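where longitude and latitude are in radians. The results then have to be scaled and offset to whatever pixel units you need. Note that the formula gives a y value that increases northward, while image rows usually increase downward, so the y value has to be flipped.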
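As a rough illustration of the scaling step, here is a minimal Python sketch. It assumes 256-pixel square tiles, a world that is 256 * 2^zoom pixels wide at a given zoom level, and tile (0, 0) at the top-left corner; the function name and the TILE_SIZE constant are made up for this example, so check them against the tile scheme you actually use.

```python
import math

TILE_SIZE = 256  # assumed tile edge length in pixels


def latlng_to_tile_pixel(lat_deg, lng_deg, zoom):
    """Return ((tile_x, tile_y), (pixel_x, pixel_y)) for a lat/lng at a zoom level."""
    lat = math.radians(lat_deg)
    lng = math.radians(lng_deg)
    world_size = TILE_SIZE * (2 ** zoom)  # assumed world width/height in pixels

    # Mercator: x = longitude, y = ln(tan(latitude/2 + pi/4)), both in radians.
    merc_x = lng
    merc_y = math.log(math.tan(lat / 2 + math.pi / 4))

    # Scale [-pi, pi] to [0, world_size]; flip y because pixel rows grow downward.
    world_x = (merc_x + math.pi) / (2 * math.pi) * world_size
    world_y = (math.pi - merc_y) / (2 * math.pi) * world_size

    # Split the world pixel position into a tile index and an offset inside it.
    tile_x, pixel_x = divmod(world_x, TILE_SIZE)
    tile_y, pixel_y = divmod(world_y, TILE_SIZE)
    return (int(tile_x), int(tile_y)), (pixel_x, pixel_y)
```

Going the other way (from a pixel in a tile to a latitude and longitude) means inverting the same steps, which is what you need when filling in a tile image pixel by pixel.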


Building an object detector for a small dataset with a single class

I have a dataset of a single class (rectangular object) with a size of 130 images. My goal is to detect the object & draw a circle/dot/mark in the centre of the object.
Because the objects are rectangular, my idea is to get the dimensions of the predicted bounding box and place the circle/dot/mark at (width/2, height/2), measured from the box's top-left corner.
However, if I were to do transfer learning, would YOLO be a good choice to detect a single class of objects in a small dataset?
YOLO should be fine. However, the original version is old now, so try YOLOv4 for better results.
People have tried transfer learning from Faster R-CNN to detect a single object class with 300 images, and it worked fine. (Link). However, 130 images is somewhat fewer. If you get inferior results, try augmenting the images: flipping, rotating, etc.
When you apply translation, rotation, or flip augmentations, apply the same transformation to the annotations as well. For example, in PyTorch, for segmentation, I use:
import random
import torchvision.transforms as T

if random.random() < 0.5:  # horizontal flip, applied to image and mask alike
    image = T.functional.hflip(image)
    mask = T.functional.hflip(mask)
if random.random() < 0.25:  # small random rotation, same angle for both
    rotation_angle = random.randrange(-10, 11)
    image = T.functional.rotate(image, angle=rotation_angle)
    mask = T.functional.rotate(mask, angle=rotation_angle)
For bounding boxes you will have to transform the coordinates yourself: for a horizontal flip, x becomes width - x (so the old right edge becomes the new left edge).
For augmentations where the object position does not change, e.g. a gamma intensity transformation, do not change the annotations.
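To make the bounding-box case concrete, here is a small sketch in plain Python. It assumes boxes are stored as (x_min, y_min, x_max, y_max) in pixels, which may not match your annotation format; the helper names are made up for this example. It also shows the box centre that the question wants to mark.

```python
def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box for a horizontally flipped image."""
    x_min, y_min, x_max, y_max = box
    # x becomes width - x, so the old right edge becomes the new left edge.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def box_center(box):
    """Centre of the box in image coordinates: top-left corner plus half the size."""
    x_min, y_min, x_max, y_max = box
    return (x_min + (x_max - x_min) / 2, y_min + (y_max - y_min) / 2)


flipped = hflip_box((10, 20, 50, 60), image_width=100)
center = box_center(flipped)
```

Rotations need more care, because the rotated corners no longer form an axis-aligned box; one common approach is to rotate all four corners and take their min/max.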

Google Cloud AutoML Object Detection export CSV object positions

I've labelled objects on images with the Google Cloud AutoML labelling tool. Then I exported a CSV file. Here is the output:
Formatted more readably, it looks like this:
I understand the first three columns.
I'm going to increase the number of images through data augmentation, using OpenCV in Python. But for that I need the coordinates of the objects in the image.
How can I convert these decimals to pixel coordinates? Is there a calculation for that?
Each pair of these values is called a NormalizedVertex.
A vertex represents a 2D point in the image. The normalized vertex coordinates are between 0 to 1 fractions relative to the original plane (image, video). E.g. if the plane (e.g. whole image) would have size 10 x 20 then a point with normalized coordinates (0.1, 0.3) would be at the position (1, 6) on that plane.
To get a pixel coordinate, multiply each x value by the image width and each y value by the image height.
The full reference for the CSV format explains that each row (one row per bounding box or per image) is made up of the following (truncated here):
TRAIN - Which set to assign the content in this row to
gs://optik-vcm/... - Google Cloud Storage URI
kenarcizgi - A label that identifies how the object is categorized
A bounding box for an object in the image:
x_relative_min, y_relative_min, x_relative_max, y_relative_min, x_relative_max, y_relative_max, x_relative_min, y_relative_max
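As a sketch of that conversion, the function below turns the relative corners of one box into pixel coordinates. The function name is made up for this example, and rounding to the nearest pixel is an assumption; with OpenCV in Python you would typically take the height and width from the first two entries of image.shape.

```python
def normalized_box_to_pixels(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert relative box corners (0 to 1) to integer pixel coordinates."""
    return (
        round(x_min * image_width),   # x values scale with the image width
        round(y_min * image_height),  # y values scale with the image height
        round(x_max * image_width),
        round(y_max * image_height),
    )


# The example from the reference: (0.1, 0.3) on a 10 x 20 plane lands at (1, 6).
box = normalized_box_to_pixels(0.1, 0.3, 0.5, 0.9, image_width=10, image_height=20)
```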

Convert kinect depth intensity to distance in meter

I'm working on kinect v1 depth images.
How do I find the distance in meters for the corresponding depth intensity value of each pixel?
The intensity value ranges from 0 to 255 since it is a grayscale image, and I don't have the raw depth data.
I've tried various ways to get the distance, such as using the following formulas:
- 1.0 / (raw_depth * -0.0030711016 + 3.3309495161)
- 0.1236 * tan(rawDisparity / 2842.5 + 1.1863)
I've also tried to get the raw data by using:
raw = (255 - depthIntensity)/256*2047
How do I solve this problem?
The Kinect actually sends something akin to a disparity image over USB. Both OpenNI and libfreenect are capable of converting that to a depth image using parameters reported by the device (baseline, focal length, and distance to reference plane, IIRC), e.g. CV_CAP_PROP_OPENNI_BASELINE.
In math form, this is how depth is found from disparity:
Depth = Baseline * focal length / disparity. The depth corresponds to the Z axis of the current image frame.
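The formula can be written as a short Python function. This is only a sketch: the function name is made up, the baseline is assumed to be in metres, and focal length and disparity are assumed to be in pixels. The numbers in the usage line are illustrative placeholders, not calibrated Kinect values, and an 8-bit intensity or a raw sensor value is not necessarily a disparity in these units.

```python
def depth_from_disparity(baseline_m, focal_length_px, disparity_px):
    """Depth along the camera Z axis in metres: Z = baseline * focal length / disparity."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        return float("inf")
    return baseline_m * focal_length_px / disparity_px


# Placeholder values for illustration only.
z = depth_from_disparity(baseline_m=0.075, focal_length_px=580.0, disparity_px=43.5)
```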

Extracting 3D coordinates given 2D image points, depth map and camera calibration matrices

I have a set of 2D image keypoints that are outputted from the OpenCV FAST corner detection function. Using an Asus Xtion I also have a time-synchronised depth map with all camera calibration parameters known. Using this information I would like to extract a set of 3D coordinates (point cloud) in OpenCV.
Can anyone give me any pointers regarding how to do so? Thanks in advance!
Nicolas Burrus has created a great tutorial for Depth Sensors like Kinect.
I'll copy & paste the most important parts:
Mapping depth pixels with color pixels
The first step is to undistort rgb and depth images using the
estimated distortion coefficients. Then, using the depth camera
intrinsics, each pixel (x_d,y_d) of the depth camera can be projected
to metric 3D space using the following formula:
P3D.x = (x_d - cx_d) * depth(x_d,y_d) / fx_d
P3D.y = (y_d - cy_d) * depth(x_d,y_d) / fy_d
P3D.z = depth(x_d,y_d)
with fx_d, fy_d, cx_d and cy_d the intrinsics of the depth camera.
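Applied to a list of keypoints, the formula above might look like the following plain-Python sketch. It assumes the keypoints are given as (x, y) positions in the undistorted depth image, that the depth map is indexed as depth_map[row][column] in metres, and that a depth of 0 marks a missing measurement; the function names are made up for this example.

```python
def depth_pixel_to_3d(x_d, y_d, depth, fx_d, fy_d, cx_d, cy_d):
    """Back-project one depth pixel to metric 3D space (depth camera frame)."""
    x = (x_d - cx_d) * depth / fx_d
    y = (y_d - cy_d) * depth / fy_d
    return (x, y, depth)


def keypoints_to_point_cloud(keypoints, depth_map, fx_d, fy_d, cx_d, cy_d):
    """Return a 3D point for every keypoint that has a valid depth value."""
    points = []
    for x_d, y_d in keypoints:
        # Row index is y, column index is x.
        depth = depth_map[int(round(y_d))][int(round(x_d))]
        if depth > 0:  # assumption: 0 means "no measurement"
            points.append(depth_pixel_to_3d(x_d, y_d, depth, fx_d, fy_d, cx_d, cy_d))
    return points
```

If the keypoints were detected in the colour image rather than the depth image, they first have to be related to depth pixels, which is what the stereo mapping below is for.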
If you are further interested in stereo mapping (values for kinect):
We can then reproject each 3D point on the color image and get its color:
P3D' = R.P3D + T
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
with R and T the rotation and translation parameters estimated during
the stereo calibration.
The parameters I could estimate for my Kinect are:
fx_rgb 5.2921508098293293e+02
fy_rgb 5.2556393630057437e+02
cx_rgb 3.2894272028759258e+02
cy_rgb 2.6748068171871557e+02
k1_rgb 2.6451622333009589e-01
k2_rgb -8.3990749424620825e-01
p1_rgb -1.9922302173693159e-03
p2_rgb 1.4371995932897616e-03
k3_rgb 9.1192465078713847e-01
fx_d 5.9421434211923247e+02
fy_d 5.9104053696870778e+02
cx_d 3.3930780975300314e+02
cy_d 2.4273913761751615e+02
k1_d -2.6386489753128833e-01
k2_d 9.9966832163729757e-01
p1_d -7.6275862143610667e-04
p2_d 5.0350940090814270e-03
k3_d -1.3053628089976321e+00
Relative transform between the sensors (in meters)
R [ 9.9984628826577793e-01, 1.2635359098409581e-03, -1.7487233004436643e-02,
-1.4779096108364480e-03, 9.9992385683542895e-01, -1.2251380107679535e-02,
1.7470421412464927e-02, 1.2275341476520762e-02, 9.9977202419716948e-01 ]
T [ 1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02 ]
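The reprojection step can be sketched in plain Python using the values listed above. The function name is made up for this example, R is assumed to be stored row by row as listed, and lens distortion is ignored here; these parameters were estimated for one particular Kinect, so another device needs its own calibration.

```python
# Values copied from the listing above.
FX_RGB, FY_RGB = 5.2921508098293293e+02, 5.2556393630057437e+02
CX_RGB, CY_RGB = 3.2894272028759258e+02, 2.6748068171871557e+02
R = [
    [9.9984628826577793e-01, 1.2635359098409581e-03, -1.7487233004436643e-02],
    [-1.4779096108364480e-03, 9.9992385683542895e-01, -1.2251380107679535e-02],
    [1.7470421412464927e-02, 1.2275341476520762e-02, 9.9977202419716948e-01],
]
T = [1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02]


def reproject_to_rgb(p3d, rot, trans, fx, fy, cx, cy):
    """Map a 3D point from the depth camera frame to a pixel in the color image."""
    # P3D' = R.P3D + T
    x, y, z = (
        sum(rot[row][i] * p3d[i] for i in range(3)) + trans[row]
        for row in range(3)
    )
    # Perspective division followed by the color camera intrinsics.
    return (x * fx / z + cx, y * fy / z + cy)


# A point one metre straight ahead of the depth camera.
u, v = reproject_to_rgb((0.0, 0.0, 1.0), R, T, FX_RGB, FY_RGB, CX_RGB, CY_RGB)
```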

Panoramic Image Photogrametry: How to calculate range?

Assume that I took two panoramic images with a vertical offset of H, and that each image is presented in equirectangular projection with size Xm by Ym. To do this, I placed my panoramic camera at a position, say A, and took an image, then moved the camera H metres up and took another image.
I know that a point in image 1 with coordinates X1, Y1 is the same point in image 2 with coordinates X2, Y2 (assuming that X1 = X2, as we have only a vertical offset).
My question is: how can I calculate the range of a selected point (whose position is X1, Y1 in image 1 and X2, Y2 in image 2) from point A, where the camera was when image 1 was taken?
Yes, you can do it.
The key thing is that y is the focal length of your lens.
So, I think your question can be re-stated more simply by saying that if you move your camera (on the right in the diagram) up H metres, a point moves down p pixels in the image taken from the new location.
Picture it like this, looking from the side at you taking the picture.
If you know the micron spacing of the camera's CCD from its specification, you can convert p from pixels to metres to match the units of H.
Your range from the camera to the plane of the scene is given by x + y (both in red at the bottom), so your range is
R = x + y = H/tan(alpha) + p/tan(alpha)
alpha = tan inverse(p/y)
where y is the focal length of your lens. As y is likely to be something like 50mm, it is negligible, so, to a pretty reasonable approximation, your range is
R = H/tan(alpha)
alpha = tan inverse(p in metres/focal length)
Or, by similar triangles
Range = (H x focal length of lens) / ((Y2-Y1) x CCD photosite spacing)
being very careful to put everything in metres.
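The similar-triangles form is easy to put into code. The sketch below is plain Python with made-up parameter names, and it assumes every length is already in metres and that Y1 and Y2 are pixel rows in the two images; the numbers in the usage line are placeholders, not real camera specifications.

```python
def range_from_vertical_shift(h_m, focal_length_m, y1_px, y2_px, photosite_spacing_m):
    """Range = (H x focal length) / ((Y2 - Y1) x photosite spacing), all in metres."""
    # Convert the pixel shift p into metres on the sensor.
    p_m = abs(y2_px - y1_px) * photosite_spacing_m
    if p_m == 0:
        raise ValueError("no vertical shift between the images, range is undefined")
    return h_m * focal_length_m / p_m


# Placeholder values: 1 m camera lift, 50 mm lens, 100 px shift, 5 micron photosites.
r = range_from_vertical_shift(1.0, 0.05, 100, 200, 5e-6)
```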
Here is a shot in the dark. As I understand the problem, you want to do something similar to computer stereo vision, so I would point you to http://en.wikipedia.org/wiki/Computer_stereo_vision to start. I am not sure this is possible in the manner you are suggesting; it sounds like you may need some more physical constraints, but I do remember being able to correlate two 2D points in images after a strict translation. Think:
lambda [u, v, 1]^t = W [r1, tx; r2, ty; r3, tz] [x, y, z, 1]^t
where [u, v] is the 2D image point, lambda is a scale factor, W is a 3x3 matrix covering the intrinsic parameters of your camera, r1, r2, and r3 are the row vectors that make up the 3x3 rotation matrix (in your case you can assume the identity matrix, since you have only applied a translation), and tx, ty, tz are your translation components.
Since you are looking at two 2d points at the same 3d point [x,y,z] this 3d point is shared by both 2d points. I cannot say if you can rationalize the actual x,y, and z values particularly for your depth calculation but this is where I would start.
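As one way to explore this, the sketch below projects the same 3D point with an identity rotation before and after a purely vertical translation, then recovers z from the vertical pixel shift. It assumes a simple pinhole camera with intrinsics fx, fy, cx, cy, which is not the same thing as an equirectangular panorama; all names and numbers are made up for illustration.

```python
def project(point, t, fx, fy, cx, cy):
    """Pinhole projection with identity rotation: the camera sees point + t."""
    x, y, z = (point[i] + t[i] for i in range(3))
    # Dividing by z removes the scale factor lambda.
    return (fx * x / z + cx, fy * y / z + cy)


def depth_from_vertical_translation(v1, v2, fy, ty):
    """With identity rotation and t = (0, ty, 0), v2 - v1 = fy * ty / z."""
    if v2 == v1:
        raise ValueError("no vertical shift, depth is undefined")
    return fy * ty / (v2 - v1)


point = (0.2, 0.1, 4.0)  # made-up 3D point in metres
_, v1 = project(point, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0)
_, v2 = project(point, (0.0, 0.5, 0.0), 500.0, 500.0, 320.0, 240.0)
z = depth_from_vertical_translation(v1, v2, fy=500.0, ty=0.5)
```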