Document AI v1Beta3 API coordinate mapping to pdf coordinates - google-cloud-platform

I parsed a PDF file using API version v1beta3 and got the coordinates of a table as seen below.
"normalizedVertices": [
{
"x": 0.6894705,
"y": 0.016400337
},
{
"x": 0.87983346,
"y": 0.016400337
},
{
"x": 0.87983346,
"y": 0.026072329
},
{
"x": 0.6894705,
"y": 0.026072329
}
]
How do I convert these to PDF coordinates?

I tested a sample PDF form (gs://cloud-samples-data/documentai/loan_form.pdf) from the Document AI docs. I used both v1 and v1beta3, got the same results, and it works as expected.
The x and y values returned in normalizedVertices range from 0 to 1. Document AI measures both x and y from the origin at the top-left corner of the image; the bounding-box logic is explained in this document. online.sodapdf.com, on the other hand, measures x from the origin but measures y from the bottom of the page (the maximum point).
To convert the normalized values to actual x and y coordinates like those in online.sodapdf.com, use the following conversion:
x = x * width
y = height - (y * height)
To test this, I used the sample document (width = 612, height = 792) and selected an object whose coordinates I converted. The object returned from loan_form.pdf has the coordinates shown under the "NormalizedVertex" column; applying the formula above gives the converted coordinates. The calculated values may differ slightly from the actual ones, likely because the two tools use different object-detection algorithms. The screenshots below show the width and height of the tested document, the object detected in loan_form.pdf, and the same object detected in online.sodapdf.com.
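For reference, here is a minimal Python sketch of that conversion, applied to the normalizedVertices from the question and assuming the 612 x 792 page size used above (in general the page size comes from the PDF itself, e.g. its MediaBox):

# Convert Document AI normalized vertices (origin at the top-left, values in [0, 1])
# to PDF user-space coordinates (origin at the bottom-left).
page_width, page_height = 612, 792

normalized_vertices = [
    {"x": 0.6894705, "y": 0.016400337},
    {"x": 0.87983346, "y": 0.016400337},
    {"x": 0.87983346, "y": 0.026072329},
    {"x": 0.6894705, "y": 0.026072329},
]

pdf_vertices = [
    (v["x"] * page_width, page_height - v["y"] * page_height)
    for v in normalized_vertices
]

for x, y in pdf_vertices:
    # The first vertex comes out at roughly (421.96, 779.01)
    print(f"x = {x:.2f}, y = {y:.2f}")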

Related

Algorithm for 'pixelated circle' image recognition

Here are three sample images. In these images I want to find:
Coordinates of those small pixelated partial circles.
Rotation of these circles. These circles have a 'pointy' side, and I want to find its direction.
For example, the coordinates and the angle with the positive x-axis of the small partial circle in the
first image is (51 px, 63 px), 240 degrees, respectively.
second image is (50 px, 52 px), 300 degrees, respectively.
third image is (80 px, 29 px), 225 degrees, respectively.
I don't care about scale invariance.
Methods I have tried:
ORB feature detection
SIFT feature detection
Feature detection doesn't seem to work here.
Above is an example of the ORB feature detector finding similar features in the 1st and 2nd images.
It finds one correct match; the rest are wrong.
This is probably because these images are too low-resolution to yield any meaningful corners or blobs. The corners and blobs it does find are not much different from the other pixelated objects present.
I have seen people use erosion and dilation to remove noise, but my objects are too small for that to work.
Perhaps some other feature detector can help?
I am also thinking about the Generalized Hough transform, but I can't find a complete tutorial for implementing it with OpenCV (C++). I also want something fast, hopefully real-time.
Any help is appreciated.
If the small circles have constant size, then you might try a convolution.
This is a quick and dirty test I ran with ImageMagick for speed, and coefficients basically pulled out of thin air:
convert test1.png -define convolve:scale='!' -morphology Convolve \
"12x12: \
-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 \
-9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 \
-9,-2,-1,0,9,9,9,9,0,-1,-2,-9 \
-9,-1,0,9,7,7,7,7,9,0,-1,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,-1,0,9,7,7,7,7,9,0,-1,-9 \
-9,-2,0,0,9,9,9,9,0,0,-2,-9 \
-9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 \
-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9" \
test2.png
I then ran a simple level stretch plus contrast to bring out what already were visibly more luminous pixels, and a sharpen/reduction to shrink pixel groups to their barycenters (these last operations could be done by multiplying the matrix by the proper kernel), and got this.
The source image on the left is converted to the output on the right; pixels above a certain threshold mean "circle detected".
Once this is done, I imagine the "pointy" end can be refined with a modified quincunx: use a 3x3 square grid centered on the center pixel and count the total luminosity in each of the eight peripheral squares, and that ought to give you a good idea of where the "point" is. You might want to apply thresholding to offset a possible blurring of the border (the centermost circle in the example below, the one inside the large circle, could give you a false reading).
For example, if we know the coordinates of the center in the grayscale matrix M, and we imagine the circle having diameter of 7 pixels (this is more or less what the convolution above says), we would do
uint quic[3][3] = { { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 } };
for (int y = -3; y <= 3; y++) {
    for (int x = -3; x <= 3; x++) {
        if (matrix[cy+y][cx+x] > threshold) {
            // Bin the 7x7 neighbourhood into the 3x3 grid:
            // offsets -3..-2 -> 0, -1..1 -> 1, 2..3 -> 2.
            quic[(y+4)/3][(x+4)/3] += matrix[cy+y][cx+x];
        }
    }
}
// Now, we find which quadrant in quic holds the maximum:
// if it is, say, quic[2][0], the point is southeast.
// 0 1 2 x
// 0 NE N NW
// 1 E X W
// 2 SE S SW
// y
// Value X (1,1) is totally unlikely - the convolution would
// not have found the circle in the first place if it was so
For an accurate result you would have to use "sub-pixel" addressing, which is slightly more complicated. With the method above, one of the circles yields quincunx values that give a point to the southeast.
Needless to say, with this kind of resolution the use of a finer grid is pointless; you'd get an error of the same order of magnitude.
I've tried with some random doodles and the convolution matrix has a good rejection of non-signal shapes, but of course this is due to information about the target's size and shape - if that assumption fails, this approach will be a dead end.
It would help to know the image source: there are several tricks used in astronomy and medicine to detect specific shapes or features.
Python opencv2
The above can be implemented with Python:
#!/usr/bin/python3
import cv2
import numpy as np
# Scaling factor
d = 240
kernel1 = np.array([
[ -9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 ],
[ -9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 ],
[ -9,-2,-1,0,9,9,9,9,0,-1,-2,-9 ],
[ -9,-1,0,9,7,7,7,7,9,0,-1,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,-1,0,9,7,7,7,7,9,0,-1,-9 ],
[ -9,-2,0,0,9,9,9,9,0,0,-2,-9 ],
[ -9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 ],
[ -9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 ]
], dtype = np.single)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
image = cv2.imread('EuDpD.png')
# Scale the kernel by the normalization factor d
kernel1 = kernel1 / d
identify = cv2.filter2D(src=image, ddepth=-1, kernel=kernel1)
# Sharpen image
identify = cv2.filter2D(src=identify, ddepth=-1, kernel=sharpen)
# Cut at ~90% of maximum
ret,thresh = cv2.threshold(identify, 220, 255, cv2.THRESH_BINARY)
cv2.imwrite('identify.png', thresh)
The above, run on the grayscaled image (left), gives the following result (right). Better sharpening or adaptive thresholding could narrow each detection down to a single pixel.
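One way to reduce the thresholded blobs to a single coordinate each (just a sketch, assuming the bright blobs written to identify.png are the detections) is to take the centroid of each connected component:

import cv2

# Load the thresholded detection map produced above and reduce it to one point per blob.
detections = cv2.imread('identify.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(detections, 127, 255, cv2.THRESH_BINARY)

# connectedComponentsWithStats returns, for each blob, its bounding box, area and centroid.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

# Label 0 is the background; the remaining centroids are the detected circle centers.
for cx, cy in centroids[1:]:
    print(f"circle detected near ({cx:.1f}, {cy:.1f})")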

What value should Z actually be for perspective divide?

So I'm trying to understand the fundamentals of perspective projection for 3D graphics and I'm getting stuck. I'm trying to avoid matrices at the moment to try and make things easier for understanding. This is what I've come up with so far:
First I imagine I have a point coming in with screen (pixel) coordinates of x: 200, y: 600, z: 400. The z amount in this context represents the distance, in pixels, from the projection plane or monitor (this is just how I'm thinking of it). I also have a camera that I'm saying is 800 pixels from the projection plane/monitor (on the back side of the projection plane/monitor), so that acts as the focal length of the camera.
From my understanding, first I find the total z distance of the point 200, 600 by adding its z to the camera's focal length (400 + 800), which gives me a total z distance of 1200. Then, if I wanted to find the projected point of these coordinates I just need to multiply each coordinate (x & y) by (focal_length/z_distance) or 800/1200 which gives me the projected coordinates x: 133, y: 400.
Now, from what I understand, OpenGL expects me to send my point down in clip space (-1 to 1), so I shouldn't send my pixel values down as 200, 600. I would have to normalize my x and y coordinates to this -1 to 1 space first. So I normalize my x & y values like so:
xNorm = (x / (width/2)) - 1;
yNorm = (y / (height/2)) - 1;
This gives me normalized values of x: -.6875, y: -.0625. What I'm unsure of is what my Z would need to be if OpenGL is going to eventually divide these normalized values by it. I know the aspect ratio probably needs to factor into the equation, but I'm not sure how.
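For what it's worth, here is a small Python sketch of the arithmetic described in the question. The 1280 x 1280 viewport size is my own assumption (it is what makes the stated normalized values of -0.6875 and -0.0625 come out for x = 200, y = 600); it is not given in the question:

# Perspective projection as described above: scale x and y by focal_length / z_distance.
focal_length = 800.0
x, y, z = 200.0, 600.0, 400.0

z_distance = z + focal_length          # 1200
scale = focal_length / z_distance      # 800 / 1200
x_proj, y_proj = x * scale, y * scale  # roughly (133.3, 400.0)

# Normalization to [-1, 1], assuming a hypothetical 1280 x 1280 viewport.
width, height = 1280.0, 1280.0
x_norm = (x / (width / 2)) - 1         # -0.6875
y_norm = (y / (height / 2)) - 1        # -0.0625

print(x_proj, y_proj, x_norm, y_norm)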

How to find an Equivalent point in a Scaled down image?

I would like to calculate the corner points or contours of the star in a larger image. For that I'm scaling the image down to a smaller one, and I'm able to get these points clearly. Now, how do I map these points back to the original image? I'm using OpenCV with C++.
Consider a trivial example: the image size is reduced exactly by half.
So, the cartesian coordinate (x, y) in the original image becomes coordinate (x/2, y/2) in the reduced image, and coordinate (x', y') in the reduced image corresponds to coordinate (x'*2, y'*2) in the original image.
Of course, fractional coordinates get typically rounded off, in a reduced scale image, so the exact mapping is only possible for even-numbered coordinates in this example's original image.
Generalizing this, if the image is scaled by a factor of w horizontally and h vertically, coordinate (x, y) becomes coordinate (x*w, y*h), rounded off. In the example I gave, both w and h are 1/2, or 0.5.
You should be able to figure out the values of w and h yourself, and be able to map the coordinates trivially. Of course, due to rounding off, you will not be able to compute the exact coordinates in the original image.
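As a concrete sketch of that mapping (the image sizes below are made up for illustration): if the detection ran on a half-size copy, multiply the detected coordinates by the ratio of the original size to the reduced size to get back to the original image:

# Map a point detected in a scaled-down image back to the original image.
orig_w, orig_h = 1920, 1080      # hypothetical original size
small_w, small_h = 960, 540      # hypothetical reduced size

# Scale factors from reduced -> original (the inverse of the down-scaling).
sx = orig_w / small_w
sy = orig_h / small_h

x_small, y_small = 123, 77       # point found in the reduced image
x_orig = round(x_small * sx)
y_orig = round(y_small * sy)

print(x_orig, y_orig)            # (246, 154) for these numbers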
I realize this is an old question. I just wanted to add to Sam's answer above, to deal with "rounding off", in case other readers are wondering the same thing I faced.
This rounding off becomes obvious for an even number of pixels across a coordinate axis. For instance, along a 1-D axis, a point demarcating the 2nd quartile gets mapped to an inaccurate value:
axis_prev = [0, 1, 2, 3]
axis_new = [0, 1, 2, 3, 4, 5, 6, 7]
w_prev = len(axis_prev) # This is an axis of length 4
w_new = len(axis_new) # This is an axis of length 8
x_prev = 2
x_new = x_prev * w_new / w_prev
print(x_new)
>>> 4.0
### x_new should be 5
In Python, one strategy would be to linearly interpolate values from one axis resolution to another. Say, for the above, we wish to map a point from the smaller image to the corresponding point of the star in the larger image:
import numpy as np
from scipy.interpolate import interp1d

# Both arrays must have the same number of samples for interp1d:
# map positions 0..640 on the old axis to 0..768 on the new axis.
x_old = np.linspace(0, 640, 641)
x_new = np.linspace(0, 768, 641)
f = interp1d(x_old, x_new)
x = 35
x_prime = f(x)   # 42.0, i.e. 35 * 768 / 640

Google chart how to have two y axis

I want one chart showing temperature with one curve and humidity in percent with a second curve.
I did it.
However, the temperature curve looks nearly flat, since its maximum is around 22 while the humidity maximum is around 90.
Does Google Charts support having two scales on the y-axis?
Thanks
Rod
You need to add a second y-axis. You can do this by setting the vAxes option (which takes an object whose properties are objects with vAxis options), and use the series option to target each series to a specific axis. Here's an example:
vAxes: {
  0: {
    // options for left y-axis
    title: 'Temperature'
  },
  1: {
    // options for right y-axis
    title: 'Humidity'
  }
},
series: {
  0: {
    // options for first data series (I'm assuming this is temperature)
    targetAxisIndex: 0 // target left axis
  },
  1: {
    // options for second data series (I'm assuming this is humidity)
    targetAxisIndex: 1 // target right axis
  }
}

Horizontal line in Google scatter chart

I'm using a scatter chart to display data with the following range: x = [-1..1] y = [-1..1]. Is it possible to draw a horizontal line on e.g. y = 0.5?
I'm using the JavaScript charts (i.e. not the image charts).
We had the same problem at work. Unfortunately, for the moment Google Charts does not provide an easy way to display a line in the scatter chart, like in the bar chart.
Finally we found a "small trick" that works perfectly for us, as you can see here:
http://csgid.org/csgid/statistics/structures
The trick consists of creating a "Line chart" but setting the lineWidth property to 0 and pointSize to 5 for the point series, and lineWidth to 1 and pointSize to 0 for the line series.
It looks like:
interpolateNulls: true,
series: {
  0: { lineWidth: 0, pointSize: 5 },
  1: { lineWidth: 0, pointSize: 5 },
  2: { lineWidth: 0, pointSize: 5 },
  3: { lineWidth: 0, pointSize: 5 },
  4: { lineWidth: 1, pointSize: 0 }
}
Why did I set interpolateNulls to true? Because I had to change the way I was building the data array before converting it to JSON and passing it to Google Charts. In every row I had to provide a value for every series at that X value, so I had to set a series' value to null whenever that series didn't have a point for that X value. The same applies to the line series.
This would be one point of the first series (in JSON):
[2.6,0.184,null,null,null,null]
And this is one "point" of the line series (the last series):
[4,null,null,null,null,0.254]
Maybe it is not the most efficient way, but it works :)
I hope I have explained it clearly; let me know if you have more questions.