Output of MapReduce

The final output of a MapReduce job is the combination of the individual reducers' outputs.
For example, if we have 3 reducers giving different outputs, then the final output of the MapReduce job would be the combination of all 3 reducers' outputs.
Is that true?

To answer this question, I'll take the example of a MapReduce algorithm used to compute / estimate the numerical value of PI.
Assume there is a circle inscribed inside a square whose side is 1 unit.
A map/reduce program estimates the value of pi using a quasi-Monte Carlo method.
The work is split between the Mapper and the Reducer as follows:
Mapper:
Generate points in a unit square.
Count the points inside / outside of the inscribed circle of the square.
Reducer:
Accumulate points inside / outside results from the mapper.
Since the side of the square is 1 unit, the radius of the inscribed circle is 1/2, so the area of the circle and the area of the square are PI / 4 square units (using area = PI * r * r) and 1 square unit (using area = a * a) respectively.
Let numTotal = numInside (number of points inside the circle) + numOutside (number of points outside the circle).
Because the points cover the square evenly, (Area of the circle) / (Area of the square) ≈ numInside / numTotal.
Therefore, numInside / numTotal ≈ PI / 4
Hence, the value of PI ≈ 4 * numInside / numTotal
Since many reducers may be responsible for accumulating the inside / outside counts produced by the mappers, and their work is additive (the partial counts are simply summed), the output is a combination of the outputs of all the reducers involved.
Another example of this would be the word count in each file.
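To make the additive step concrete, here is a minimal single-process C++ sketch (not Hadoop code) of the mapper/reducer split described above. The names Counts, radical_inverse, map_task, reduce_task and estimate_pi are hypothetical, and the quasi-random points come from a 2-D Halton sequence (bases 2 and 3), which is one common quasi-Monte Carlo choice and an assumption about what such a program would use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Partial result of one map task: how many points fell inside / outside the circle.
struct Counts {
    std::uint64_t inside = 0;
    std::uint64_t outside = 0;
};

// Radical inverse of i in the given base (van der Corput sequence).
// Pairing bases 2 and 3 gives a 2-D Halton sequence, i.e. quasi-random points.
double radical_inverse(std::uint64_t i, std::uint64_t base) {
    double result = 0.0;
    double f = 1.0 / static_cast<double>(base);
    while (i > 0) {
        result += f * static_cast<double>(i % base);
        i /= base;
        f /= static_cast<double>(base);
    }
    return result;
}

// "Map": take points offset+1 .. offset+count of the Halton sequence in the unit square
// and count those inside the inscribed circle (centre (0.5, 0.5), radius 0.5).
Counts map_task(std::uint64_t offset, std::uint64_t count) {
    Counts c;
    for (std::uint64_t i = offset + 1; i <= offset + count; ++i) {
        const double x = radical_inverse(i, 2) - 0.5;
        const double y = radical_inverse(i, 3) - 0.5;
        if (x * x + y * y <= 0.25) ++c.inside; else ++c.outside;
    }
    return c;
}

// "Reduce": the partial counts are additive, so combining them is a plain sum.
Counts reduce_task(const std::vector<Counts>& partials) {
    Counts total;
    for (const Counts& p : partials) {
        total.inside += p.inside;
        total.outside += p.outside;
    }
    return total;
}

// PI ~= 4 * numInside / numTotal
double estimate_pi(const Counts& total) {
    const std::uint64_t num_total = total.inside + total.outside;
    assert(num_total > 0);
    return 4.0 * static_cast<double>(total.inside) / static_cast<double>(num_total);
}
```

Running map_task on disjoint index ranges and feeding the partial counts to reduce_task gives exactly the same totals as one big map task over the whole range, which is why the final answer can be assembled from several partial outputs.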

Related

Boost C++ area in square meters

I have a
boost::geometry::model::polygon<Point> Algorithm::poly
and I'm computing the area of the polygon with
area = bg::area(poly);
The result is
1.10434e+08
In the documentation, I can see
"The units are the square of the units used for the points defining the surface". I don't really understand what that means.
https://www.boost.org/doc/libs/1_65_0/libs/geometry/doc/html/geometry/reference/algorithms/area/area_1.html
Is there a way to convert the return value of bg::area into a result in m²?
With another tool (which I can't use in my code) I can see that the area of the polygon is 11043 m².
How can I get 11043 from 1.10434e+08?

The points in poly are Cartesian (x,y) coordinates. What is their unit? Are they in mm, cm, attoparsecs?
The resulting unit is the square of that. But we can work it out from the data given:
sqrt(1.10434e+08 / 11043) ≈ sqrt(10000) = 100
So it seems your points are in 0.01 m == centimeters, so the area returned is in cm²
(1 m² = 10000 cm²)
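A small C++ sketch of the arithmetic above. The function names are hypothetical, and unit_in_meters is something you have to know about your own data (0.01 if the coordinates are centimeters); the second function just repeats the sqrt(ratio) reasoning to infer that unit from one polygon whose area in m² is known.

```cpp
#include <cassert>
#include <cmath>

// bg::area returns (unit of the point coordinates)^2, so converting to m^2
// means multiplying by the square of that unit expressed in meters.
double area_to_square_meters(double area_in_units, double unit_in_meters) {
    assert(unit_in_meters > 0.0);
    return area_in_units * unit_in_meters * unit_in_meters;
}

// Infer the coordinate unit (in meters) from one polygon whose area in m^2
// is known from another source: unit = sqrt(known_m2 / raw_area).
double infer_unit_in_meters(double raw_area, double known_area_m2) {
    assert(raw_area > 0.0 && known_area_m2 > 0.0);
    return std::sqrt(known_area_m2 / raw_area);
}
```

With the numbers from the question, infer_unit_in_meters(1.10434e+08, 11043) comes out at about 0.01, and area_to_square_meters(1.10434e+08, 0.01) gives about 11043.4.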

In c++, how do I create point locations arbitrarily?

I'm trying to program a simulation. Originally I'd randomly create points like so...
for (int c = 0; c < number; c++) {
    for (int d = 0; d < 3; d++) {
        coordinate[c][d] = randomrange(low, high);
    }
}
Here randomrange() is an arbitrary range randomizer, number is the number of points to create, and d selects the x, y or z coordinate. It works, but I want to take things further. How would I define a known shape? Say I want 80 points on a circle's circumference, or 500 that form the edges of a cube. I can explain it well on paper, but I have trouble expressing the process in code. This doesn't pertain to the question, but I end up writing the points to a .txt file and then using MATLAB's scatter3 to plot them. Creating the "shape" points is my issue.

Both a circle and the set of a cube's edges are 1-dimensional sets, so you can represent them as real intervals. For a circle it's straightforward: use the interval [0, 2pi) and transform a random value phi from that interval into the point
(xcentre + R cos(phi), ycentre + R sin(phi))
For a cube you have 12 segments, so use the interval [0, 12) and split a random number from it into an integer part and a fractional part. Then use the integer part as the edge number and the fraction as the position along that edge.
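A minimal C++ sketch of this interval mapping, assuming a std::mt19937 generator, a circle in the z = 0 plane and a cube occupying [0, side]^3; the struct and function names are hypothetical. In this sketch edges 0-3 are parallel to x, 4-7 to y and 8-11 to z, and the two low bits of the edge number pick which of the four parallel edges is used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Point3 { double x, y, z; };

// Circle: map phi in [0, 2*pi) to (xc + R cos(phi), yc + R sin(phi), 0).
std::vector<Point3> random_points_on_circle(int n, double xc, double yc, double R, std::mt19937& rng) {
    assert(n >= 0 && R >= 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::uniform_real_distribution<double> phi_dist(0.0, two_pi);
    std::vector<Point3> pts;
    pts.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        const double phi = phi_dist(rng);
        pts.push_back({xc + R * std::cos(phi), yc + R * std::sin(phi), 0.0});
    }
    return pts;
}

// Cube edges: the integer part of u in [0, 12) selects the edge, the fractional
// part is the position along that edge.
Point3 cube_edge_point(double u, double side) {
    int edge = static_cast<int>(u);         // 0..11
    if (edge > 11) edge = 11;               // guard in case u rounds up to 12
    const double s = (u - edge) * side;     // position along the edge
    const int axis = edge / 4;              // 0: parallel to x, 1: to y, 2: to z
    const int corner = edge % 4;            // which of the 4 parallel edges
    const double a = (corner & 1) ? side : 0.0;
    const double b = (corner & 2) ? side : 0.0;
    if (axis == 0) return {s, a, b};
    if (axis == 1) return {a, s, b};
    return {a, b, s};
}

std::vector<Point3> random_points_on_cube_edges(int n, double side, std::mt19937& rng) {
    assert(n >= 0 && side > 0.0);
    std::uniform_real_distribution<double> u_dist(0.0, 12.0);
    std::vector<Point3> pts;
    pts.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) pts.push_back(cube_edge_point(u_dist(rng), side));
    return pts;
}
```

For example, random_points_on_circle(80, ...) and random_points_on_cube_edges(500, ...) produce the two shapes from the question; writing x, y, z to the text file is unchanged.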

Easy variant:
First work out the min/max x and y values (separately, to reduce the number of rejected values in the step below), generate coordinates within this range, and then check whether they satisfy the shape's condition, e.g. a^2+b^2 <= r^2 for a filled circle (for points on the border itself, check that a^2+b^2 is within a small tolerance of r^2, since random values will practically never hit it exactly).
If not, try again.
Better, but only possible for certain shapes:
Generate a radius between 0 and max and an angle between 0 and 360 degrees
(or just an angle if the points should be on the circle's border)
and use some math (sin/cos...) to transform them into x and y.
http://en.wikipedia.org/wiki/Polar_coordinate_system
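Both variants for a filled circle (disk), sketched in C++ with std::mt19937; the function names are hypothetical. One detail worth knowing for the polar variant: a radius drawn uniformly from [0, r] crowds the points near the centre, so this sketch uses r * sqrt(u) to keep them uniform over the area (if you only need the border, keep the radius fixed at r).

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>

// Easy variant (rejection sampling): draw x, y in the bounding box [-r, r]^2 and
// keep the pair only if x^2 + y^2 <= r^2; otherwise try again.
std::pair<double, double> random_point_in_disk_rejection(double r, std::mt19937& rng) {
    assert(r > 0.0);
    std::uniform_real_distribution<double> coord(-r, r);
    while (true) {
        const double x = coord(rng);
        const double y = coord(rng);
        if (x * x + y * y <= r * r) return {x, y};
    }
}

// Polar variant: draw an angle and a radius, then convert with cos/sin.
// radius = r * sqrt(u), u uniform in [0, 1), keeps the density uniform over the area.
std::pair<double, double> random_point_in_disk_polar(double r, std::mt19937& rng) {
    assert(r > 0.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const double angle = 2.0 * std::acos(-1.0) * unit(rng);
    const double radius = r * std::sqrt(unit(rng));
    return {radius * std::cos(angle), radius * std::sin(angle)};
}
```

The rejection variant accepts about pi/4 (roughly 79%) of the candidate pairs for a circle; for shapes that fill their bounding box poorly the polar (or another parametric) approach wastes far less work.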

C++ recognize shape from points

I'm trying to find an algorithm to recognize a circle in an array of points.
Let's say that I've got a points array in which a circle may or may not be stored (that also means the array doesn't have to store only the circle's points; there could be some "extra" points before or after the circle's data).
I've already tried some algorithms, but none of them work properly with those "extra" points. Do you have any ideas how to deal with this problem?
EDIT: I didn't mention this before: I want this algorithm to be used for circle gesture recognition. My idea was to keep the data in the array (for the last few seconds) and, by analysing this data in every tracking frame, be able to say whether or not there was a circle gesture.

First I calculate the geometric mean (not the arithmetic mean) of each of the X and Y components.
I chose the geometric mean because one of its features is that small values (with respect to the arithmetic mean) are much more influential than large values.
This leads me to the theoretical center of all points: circ_center
Then I calculate the standard deviation of the distances of the points from the center: stddev. This gives me an "indicator" to quantify the amount of variation. One property of a circle is that all points on its circumference are at the same distance from its center. With the standard deviation I try to test whether your points are equally distant (within a maximum variance threshold: max_dispersion).
Last, I calculate the average distance from the center of the points within the max_dispersion threshold; this gives me the radius of the circle: avg_dist.
Parameters:
max_dispersion represents the "circle precision". Smaller means more precise.
min_points_needed is the minimum number of valid points needed to be considered a circumference.
This is just an attempt; I have not tried it. Let me know.
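I would try something like this (sketched in C++; Point, distance() and recognize_circle() are placeholder names):
#include <cmath>
#include <cstddef>
#include <vector>
struct Point { double x, y; };
double distance(const Point& a, const Point& b) { return std::hypot(a.x - b.x, a.y - b.y); }
// Returns true if a circle is recognized; its center and radius are written to circ_center and avg_dist.
// The geometric mean is only defined for positive coordinates, so shift your data first if needed.
bool recognize_circle(const std::vector<Point>& all_points,
                      double max_dispersion,          // max stddev accepted, in geometric units (e.g. 20)
                      std::size_t min_points_needed,  // min number of points near the circumference (e.g. 5)
                      Point& circ_center,             // estimated center, using the geometric mean
                      double& avg_dist)               // average distance of the "ok points" from the center
{
    const std::size_t points_size = all_points.size();
    if (points_size == 0) return false;
    // Geometric mean of each component, computed as exp(mean(log(v))) so the running
    // product cannot overflow (and 1.0 / n, not the integer division 1 / n, is used).
    double log_x = 0.0, log_y = 0.0;
    for (const Point& p : all_points) {
        if (p.x <= 0.0 || p.y <= 0.0) return false;
        log_x += std::log(p.x);
        log_y += std::log(p.y);
    }
    circ_center = {std::exp(log_x / points_size), std::exp(log_y / points_size)};
    // Standard deviation of the distances of the points from the center.
    double mean_dist = 0.0;
    for (const Point& p : all_points) mean_dist += distance(p, circ_center);
    mean_dist /= points_size;
    double variance = 0.0;
    for (const Point& p : all_points) {
        const double d = distance(p, circ_center) - mean_dist;
        variance += d * d;
    }
    const double stddev = std::sqrt(variance / points_size);
    // "Ok points": those whose distance is within max_dispersion of the mean distance.
    std::size_t num_ok_points = 0;
    avg_dist = 0.0;
    for (const Point& p : all_points) {
        const double d = distance(p, circ_center);
        if (std::fabs(d - mean_dist) < max_dispersion) {
            ++num_ok_points;
            avg_dist += d;
        }
    }
    if (num_ok_points == 0) return false;
    avg_dist /= num_ok_points;
    return stddev <= max_dispersion && num_ok_points >= min_points_needed;
}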

Can we assume that most of the points are on or near the circumference of the circle?
A circle has a center and a radius. If you can determine the circle's center coordinates, via the intersection of the perpendicular bisectors of two chords, then all the true circle points should be equidistant (at distance r) from the center point.
The false points can then be eliminated because they are not within (+-) tolerance of r from the center point.
The weakness of this approach is how well you can determine the center and radius. You may want to try a least-squares approach to computing the center coordinates.
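One way to follow the least-squares suggestion is the algebraic (Kasa) circle fit, swapped in here as an illustration rather than something the answer specifies: write the circle as x^2 + y^2 + D*x + E*y + F = 0, solve the 3x3 normal equations for D, E and F, then keep only the points within a tolerance of the fitted radius. Pt, Circle, fit_circle_least_squares and keep_near_circle are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };
struct Circle { double cx, cy, r; };

// Determinant of a 3x3 matrix.
static double det3(const double m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Least-squares fit of x^2 + y^2 + D*x + E*y + F = 0.
// Returns false for fewer than 3 points or (nearly) collinear input.
bool fit_circle_least_squares(const std::vector<Pt>& pts, Circle& out) {
    if (pts.size() < 3) return false;
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, sxz = 0, syz = 0, sz = 0;
    for (const Pt& p : pts) {
        const double z = p.x * p.x + p.y * p.y;
        sx += p.x; sy += p.y;
        sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
        sxz += p.x * z; syz += p.y * z; sz += z;
    }
    const double n = static_cast<double>(pts.size());
    // Normal equations A * [D, E, F]^T = b, solved with Cramer's rule.
    const double A[3][3] = {{sxx, sxy, sx}, {sxy, syy, sy}, {sx, sy, n}};
    const double b[3] = {-sxz, -syz, -sz};
    const double d = det3(A);
    if (std::fabs(d) < 1e-12) return false;
    double sol[3];
    for (int k = 0; k < 3; ++k) {
        double M[3][3];
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) M[i][j] = (j == k) ? b[i] : A[i][j];
        sol[k] = det3(M) / d;
    }
    out.cx = -sol[0] / 2.0;
    out.cy = -sol[1] / 2.0;
    const double r2 = out.cx * out.cx + out.cy * out.cy - sol[2];
    if (r2 <= 0.0) return false;
    out.r = std::sqrt(r2);
    return true;
}

// Keep only the points whose distance from the centre is within tolerance of r.
std::vector<Pt> keep_near_circle(const std::vector<Pt>& pts, const Circle& c, double tolerance) {
    assert(tolerance >= 0.0);
    std::vector<Pt> kept;
    for (const Pt& p : pts)
        if (std::fabs(std::hypot(p.x - c.cx, p.y - c.cy) - c.r) <= tolerance) kept.push_back(p);
    return kept;
}
```

Outliers still pull a single fit, so in practice you might fit, filter with keep_near_circle, and fit again on the points that were kept.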

To answer the initially stated question, my approach would be to iterate through the points and derive the center of a circle from each consecutive set of three points. Then, take the longest contiguous subset of points that create circles with centers that fall within some absolute range. Then determine whether the points wind consistently around the average of those centers. You can always apply some basic heuristics to any discarded data to determine whether a circle is actually what the user wanted to make.
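A C++ sketch of the first part of this approach: the circumcentre of every three consecutive points, and the longest contiguous run whose centres stay within max_spread of the run's first centre. The names, the choice of the run's first centre as the reference, and the collinearity threshold are assumptions; the winding check and the heuristics on discarded data are not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// Centre of the circle through a, b and c (intersection of the perpendicular
// bisectors of its chords). Returns false when the points are (nearly) collinear.
bool circumcenter(const P2& a, const P2& b, const P2& c, P2& center) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    center.x = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    center.y = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    return true;
}

// Points [first, last] (inclusive) form the run; triples == 0 means none was found.
struct Run { std::size_t first = 0, last = 0, triples = 0; };

// Longest run of consecutive triples (pts[i], pts[i+1], pts[i+2]) whose centres
// all lie within max_spread of the first centre of the run.
Run longest_consistent_run(const std::vector<P2>& pts, double max_spread) {
    assert(max_spread >= 0.0);
    Run best, current;
    P2 ref{0.0, 0.0};
    for (std::size_t i = 0; i + 2 < pts.size(); ++i) {
        P2 c{};
        if (!circumcenter(pts[i], pts[i + 1], pts[i + 2], c)) {
            current.triples = 0;  // a collinear triple breaks the run
            continue;
        }
        if (current.triples > 0 && std::hypot(c.x - ref.x, c.y - ref.y) <= max_spread) {
            ++current.triples;    // centre agrees: extend the run
        } else {
            current.first = i;    // start a new run at this triple
            current.triples = 1;
            ref = c;
        }
        current.last = i + 2;
        if (current.triples > best.triples) best = current;
    }
    return best;
}
```

The points pts[first..last] are then the candidate circle; the "extra" points before and after the gesture fall outside the run.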
Now, since you say that you want to perform gesture recognition, I would suggest you think of a completely different method. Personally, I would first create a basic sort of language that can be used to describe gestures. It should be very simple; the only words I would consider having are:
Start - Denotes the start of a stroke
Angle - The starting angle of the stroke. This should be one of the eight compass directions (N, NW, W, SW, S, SE, E, NE), or Any for unaligned gestures. You could also add combining mechanisms, or options such as "Axis Aligned".
End - Denotes the end of a stroke
Travel - Denotes a straight path in the stroke
Distance - The percentage of the total length of the path that this particular operation will consume.
Turn - Denotes a turn in the stroke
Direction - The direction to turn in. Choices would be Left, Right, Any, Previous, or Opposite.
Angle - The angle of the turn. I would suggest you use just three angles (90 deg, 180 deg, 270 deg)
Tolerance - The maximum tolerance for deviation from the specified angle. This should have a default of somewhere around 45 degrees in either direction for a high chance of matching the angle in a signature.
Type - Hard or Radial. A radial turn follows an arc of the given radius; a hard turn is a turn about a point.
Radius - If the turn is radial, this is the radius of the turn (units are in percentage of total path length, with appropriate conversions of course)
Obviously you can make the angles much finer, but the coarser the ranges are, the more tolerant of input error the matching can be. Being too tolerant can lead to misinterpretation, though.
If you apply some fuzzy logic, it wouldn't be hard to break just about any gesture down into a language like this. You could then create a bunch of gesture "signatures" that describe various gestures that can be performed. For instance:
//Circle
Start Angle=Any
Turn Type=Radial Direction=Any Angle=180deg Radius=50%
Turn Type=Radial Direction=Previous Angle=180deg Radius=50%
End
//Box
Start Angle=AxisAligned
Travel Distance=25%
Turn Type=Hard Direction=Any Angle=90deg Tolerance=10deg
Travel Distance=25%
Turn Type=Hard Direction=Previous Angle=90deg Tolerance=10deg
Travel Distance=25%
Turn Type=Hard Direction=Previous Angle=90deg Tolerance=10deg
Travel Distance=25%
End
If you want, I could work on an algorithm that could take a point cloud and degenerate it into a series of commands like this so you can compare them with pre-generated signatures.
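To give a feel for the lowest level of such a language, here is a hypothetical C++ sketch of two primitives a recognizer might use: quantising a segment's direction into the eight compass directions (for Start Angle=...) and measuring the turn between two successive segments against an expected angle and tolerance. It assumes y grows upwards. A turn between two segments is always within +/-180 degrees, so the 180 deg radial turns in the signatures would have to be accumulated over several segments.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Quantise the direction of a segment (dx, dy) into one of 8 compass directions.
// Assumes y grows upwards; negate dy for screen coordinates.
std::string compass_direction(double dx, double dy) {
    assert(dx != 0.0 || dy != 0.0);
    static const char* const names[8] = {"E", "NE", "N", "NW", "W", "SW", "S", "SE"};
    double deg = std::atan2(dy, dx) * 180.0 / std::acos(-1.0);  // (-180, 180]
    if (deg < 0.0) deg += 360.0;                                  // [0, 360)
    const int sector = static_cast<int>(std::floor((deg + 22.5) / 45.0)) % 8;
    return names[sector];
}

// Signed turn angle in degrees from segment u to segment v
// (positive = left / counter-clockwise turn when y grows upwards).
double turn_angle_deg(double ux, double uy, double vx, double vy) {
    const double cross = ux * vy - uy * vx;
    const double dot = ux * vx + uy * vy;
    return std::atan2(cross, dot) * 180.0 / std::acos(-1.0);
}

// Does a turn match "Angle=expected Tolerance=tolerance" with Direction=Any?
bool turn_matches(double turn_deg, double expected_deg, double tolerance_deg) {
    return std::fabs(std::fabs(turn_deg) - expected_deg) <= tolerance_deg;
}
```

A Direction=Left/Right/Previous check would additionally look at the sign of turn_angle_deg.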

Fastest way to calculate cubic bezier curves?

Right now I calculate it like this:
double dx1 = a.RightHandle.x - a.UserPoint.x;
double dy1 = a.RightHandle.y - a.UserPoint.y;
double dx2 = b.LeftHandle.x - a.RightHandle.x;
double dy2 = b.LeftHandle.y - a.RightHandle.y;
double dx3 = b.UserPoint.x - b.LeftHandle.x;
double dy3 = b.UserPoint.y - b.LeftHandle.y;
float len = sqrt(dx1 * dx1 + dy1 * dy1) +
sqrt(dx2 * dx2 + dy2 * dy2) +
sqrt(dx3 * dx3 + dy3 * dy3);
int NUM_STEPS = int(len * 0.05);
if(NUM_STEPS > 55)
{
    NUM_STEPS = 55;
}
double subdiv_step = 1.0 / (NUM_STEPS + 1);
double subdiv_step2 = subdiv_step*subdiv_step;
double subdiv_step3 = subdiv_step*subdiv_step*subdiv_step;
double pre1 = 3.0 * subdiv_step;
double pre2 = 3.0 * subdiv_step2;
double pre4 = 6.0 * subdiv_step2;
double pre5 = 6.0 * subdiv_step3;
double tmp1x = a.UserPoint.x - a.RightHandle.x * 2.0 + b.LeftHandle.x;
double tmp1y = a.UserPoint.y - a.RightHandle.y * 2.0 + b.LeftHandle.y;
double tmp2x = (a.RightHandle.x - b.LeftHandle.x)*3.0 - a.UserPoint.x + b.UserPoint.x;
double tmp2y = (a.RightHandle.y - b.LeftHandle.y)*3.0 - a.UserPoint.y + b.UserPoint.y;
double fx = a.UserPoint.x;
double fy = a.UserPoint.y;
// Control points of the cubic: P0 = a.UserPoint, P1 = a.RightHandle,
// P2 = b.LeftHandle, P3 = b.UserPoint
double dfx = (a.RightHandle.x - a.UserPoint.x)*pre1 + tmp1x*pre2 + tmp2x*subdiv_step3;
double dfy = (a.RightHandle.y - a.UserPoint.y)*pre1 + tmp1y*pre2 + tmp2y*subdiv_step3;
double ddfx = tmp1x*pre4 + tmp2x*pre5;
double ddfy = tmp1y*pre4 + tmp2y*pre5;
double dddfx = tmp2x*pre5;
double dddfy = tmp2y*pre5;
int step = NUM_STEPS;
while(step--)
{
    fx += dfx;
    fy += dfy;
    dfx += ddfx;
    dfy += ddfy;
    ddfx += dddfx;
    ddfy += dddfy;
    temp[0] = fx;
    temp[1] = fy;
    Contour[currentcontour].DrawingPoints.push_back(temp);
}
temp[0] = (GLdouble)b.UserPoint.x;
temp[1] = (GLdouble)b.UserPoint.y;
Contour[currentcontour].DrawingPoints.push_back(temp);
I'm wondering whether there is a faster way to interpolate cubic Beziers.
Thanks

Look into forward differencing for a faster method. Care must be taken to deal with rounding errors.
The adaptive subdivision method, with some checks, can be fast and accurate.

Another very important point is that you are approximating your curve with a lot of straight-line segments taken at fixed steps. This is inefficient in areas where your curve is nearly straight, and can lead to a nasty angular polyline where the curve is very curvy. There is no simple compromise that works for both high and low curvatures.
To get around this, you can dynamically subdivide the curve (e.g. split it into two pieces at the half-way point and then check whether the two line segments are within a reasonable distance of the curve; if a segment is a good fit for the curve, stop there, and if it is not, subdivide it in the same way and repeat). You have to be careful to subdivide enough that you don't miss any localised (small) features when sampling the curve in this way.
This will not always draw your curve "faster", but it will ensure that it always looks good while using close to the minimum number of line segments necessary to achieve that quality.
Once you are drawing the curve "well", you can then look at how to make the necessary calculations "faster".
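A minimal C++ sketch of the adaptive subdivision described above, using de Casteljau splitting at t = 0.5 and a simple flatness test (both inner control points within tolerance of the chord). Vec2, flatten_cubic and the default depth limit of 16 are assumptions; with the question's names, p0 = a.UserPoint, p1 = a.RightHandle, p2 = b.LeftHandle and p3 = b.UserPoint. Measuring distance to the infinite line through the end points is a common shortcut, but it can accept a curve whose control points overshoot along the chord, so treat it as a starting point.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

static Vec2 midpoint(Vec2 a, Vec2 b) { return {(a.x + b.x) * 0.5, (a.y + b.y) * 0.5}; }

// Distance from p to the infinite line through a and b (to a itself if a == b).
static double distance_to_line(Vec2 p, Vec2 a, Vec2 b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs((p.x - a.x) * dy - (p.y - a.y) * dx) / len;
}

// Appends the points after p0 that approximate the cubic (p0, p1, p2, p3).
static void subdivide(Vec2 p0, Vec2 p1, Vec2 p2, Vec2 p3, double tolerance, int depth,
                      std::vector<Vec2>& out) {
    const bool flat = distance_to_line(p1, p0, p3) <= tolerance &&
                      distance_to_line(p2, p0, p3) <= tolerance;
    if (flat || depth == 0) {  // good enough (or recursion limit): one straight segment
        out.push_back(p3);
        return;
    }
    // de Casteljau split at t = 0.5
    const Vec2 p01 = midpoint(p0, p1), p12 = midpoint(p1, p2), p23 = midpoint(p2, p3);
    const Vec2 p012 = midpoint(p01, p12), p123 = midpoint(p12, p23);
    const Vec2 mid = midpoint(p012, p123);  // the point on the curve at t = 0.5
    subdivide(p0, p01, p012, mid, tolerance, depth - 1, out);
    subdivide(mid, p123, p23, p3, tolerance, depth - 1, out);
}

// Polyline approximating the cubic Bezier; tolerance is in the same units as the points.
std::vector<Vec2> flatten_cubic(Vec2 p0, Vec2 p1, Vec2 p2, Vec2 p3, double tolerance,
                                int max_depth = 16) {
    assert(tolerance > 0.0 && max_depth >= 0);
    std::vector<Vec2> out{p0};
    subdivide(p0, p1, p2, p3, tolerance, max_depth, out);
    return out;
}
```

The output starts with p0 and ends with p3, so when curves are chained into one contour you can skip the first point of every curve after the first.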

Actually, you should continue splitting until the two lines joining the points on the curve (the end nodes) to their farthest control points are "flat enough":
- either fully aligned, or
- their intersection is at a position whose "square distance" from both end nodes is below one half "square pixel" (note that you don't need to compute the actual distance, which would require a square root, which is slow).
When you reach this situation, ignore the control points and join the two end points with a straight segment.
This is faster, because you quickly get straight segments that can be drawn directly, as straight lines, using the classic Bresenham algorithm.
Note: you should take the fractional bits of the end points into account to properly set the initial value of the error variable that accumulates differences in the incremental Bresenham algorithm, in order to get better results (notably when the final segment to draw is very close to horizontal, vertical or one of the two diagonals); otherwise you'll get visible artefacts.
The classic Bresenham algorithm, which draws lines between points aligned on an integer grid, initializes this error variable to zero at the first end node. A minor modification of the Bresenham algorithm simply scales up the two distance variables and the error value by a constant power of two, while the 0/+1 increments of the x or y variable remain unscaled.
The high-order bits of the error variable also allow you to compute an alpha value that can be used to draw two stacked pixels with the correct alpha shading. In most cases your images will use at most 8-bit color planes, so you will not need more than 8 bits of extra precision for the error value, and the upscaling can be limited to a factor of 256: you can use it to draw "smooth" lines.
But you could limit yourself to a scaling factor of 16 (four bits): typical bitmap images you have to draw are not extremely wide, and their resolution is far below +/- 2 billion (the limit of a signed 32-bit integer). When you scale up the coordinates by a factor of 16, 28 bits remain to work with; you should already have clipped the geometry to the view area of the bitmap being rendered, and the error variable of the Bresenham algorithm will remain below 56 bits in all cases, so it will still fit in a 64-bit integer.
If your error variable is 32-bit, you must keep the scaled coordinates below 2^15 (not more than 15 bits) for the worst case (otherwise the sign test on the error variable used by Bresenham will fail because of integer overflow), and with the upscaling factor of 16 (4 bits) you'll be limited to drawing images no larger than 11 bits in width or height, i.e. 2048x2048 images.
But if your draw area really is below 2048x2048 pixels, there's no problem drawing lines smoothed with 16 alpha-shaded values of the draw color. (To draw alpha-shaded pixels, you need to read the original pixel value in the image before mixing in the alpha-shaded color, unless the computed shade is 0% for the first stacked pixel, which you don't need to draw, and 100% for the second stacked pixel, which you can overwrite directly with the plain draw color.)
If your computed image also includes an alpha channel, your draw color can also have its own alpha value, which you'll need to shade and combine with the alpha value of the pixels to draw. But you don't need any intermediate buffer just for the line, because you can draw directly into the target buffer.
With the error variable used by the Bresenham algorithm, rounding errors cause no problem at all, because they are taken into account by this variable. So set its initial value properly (the alternative, simply scaling up all coordinates by a factor of 16 before recursively subdividing the spline, makes the Bresenham algorithm itself 16 times slower).
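For reference, this is the classic integer Bresenham line that the answer builds on, as a C++ sketch (the function name is hypothetical). It uses the common all-octant formulation with the error term at its usual integer-grid starting value; the sub-pixel initialisation, the fixed-point scaling and the alpha-shaded stacked pixels described above are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Classic integer Bresenham line for all octants: the pixels from (x0, y0) to (x1, y1),
// end points included. err holds the (scaled) offset between the ideal line and the
// current pixel, so only integer additions are needed and no rounding error accumulates.
std::vector<std::pair<int, int>> bresenham_line(int x0, int y0, int x1, int y1) {
    const int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    std::vector<std::pair<int, int>> pixels;
    while (true) {
        pixels.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step along x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step along y
    }
    // One pixel per step along the major axis, plus the starting pixel.
    assert(pixels.size() == static_cast<std::size_t>(std::max(dx, -dy)) + 1);
    return pixels;
}
```

For example, bresenham_line(0, 0, 5, 2) yields (0,0), (1,0), (2,1), (3,1), (4,2), (5,2).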

Note how "flat enough" can be calculated. "Flatness" is a measure of the minimum absolute angle (between 0 and 180°) between two successive segments, but you don't need to compute the actual angle, because this flatness is equivalent to setting a minimum value for the cosine of their relative angle.
That cosine value does not need to be computed directly either: all you need is a product of the two vectors, compared with the product of their square lengths.
Note also that the square of the cosine is one minus the square of the sine, so a maximum square cosine value is also a minimum square sine value... Now you know which kind of product to use: the fastest and simplest to compute is the 2-D cross product (x1*y2 - y1*x2), whose square equals the square sine of the angle between the two vectors times the product of their square lengths.
So checking whether the curve is "flat enough" is simple: compute the ratio between the squared cross product and the product of the square lengths, and see whether this ratio is below the "flatness" constant (of the minimum square sine). There's no division by zero, because you first determine which of the two vectors is the longest: if its square length is below 1/4, your curve is already flat enough for the rendering resolution; otherwise check this ratio between the longest and the shortest vector (formed by the crossing diagonals of the convex hull containing the end points and control points):
with quadratic Beziers, the convex hull is a triangle and you choose two pairs
with cubic Beziers, the convex hull is a 4-sided convex polygon, and the diagonals may either join an end point with one of the two control points, or join the two end points together and the two control points together; there are six possibilities
Use the combination giving the maximum length for the first vector (between the 1st end point and one of the three other points), the second vector joining the two other points.
All you need is to determine the "minimum square length" of the segments going from each end point or control point to the next control point or end point in the sequence (in a quadratic Bezier you just compare two segments; in a cubic Bezier, you check 3 segments).
If this "minimum square length" is below 1/4 you can stop there: the curve is "flat enough".
Then determine the "maximum square length" of the segments going from one end point to any of the other end point or control points (with a quadratic Bezier you can safely use the same 2 segments as above; with a cubic Bezier you discard the one of the 3 segments above that joins the 2 control points, but you add the segment joining the two end nodes).
Then check whether the "minimum square length" is lower than the product of the "flatness" constant (minimum square sine) and the "maximum square length" (if so, the curve is "flat enough").
In both cases, when your curve is "flat enough", you just need to draw the segment joining the two end points. Otherwise you split the spline recursively.
You may include a limit to the recursion, but in practice it will never be reached unless the convex hull of the curve covers a very large area in a very large draw area: even with 32 levels of recursion, it will never explode in a rectangular draw area whose diagonal is shorter than 2^32 pixels. (The limit would be reached only if you were splitting a "virtual Bezier" in a virtually infinite space with floating-point coordinates that you don't intend to draw directly, because you won't have the 1/2-pixel limitation in such a space, and only if you had set an extreme value for the "flatness", such that your "minimum square sine" constant is 1/2^32 or lower.)
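A C++ sketch of the square-sine idea: it compares the squared 2-D cross product with the product of squared lengths, so neither a square root nor a division is needed. It is a simplified variant rather than the exact procedure above: it tests each end point's control leg against the chord, also rejects a control point lying behind its end point along the chord, and treats a curve whose vectors are all shorter than half a pixel (squared length below 1/4) as flat. V2, flat_enough and max_sin2 are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct V2 { double x, y; };

static V2 sub(V2 a, V2 b) { return {a.x - b.x, a.y - b.y}; }
static double dot(V2 a, V2 b) { return a.x * b.x + a.y * b.y; }
static double cross(V2 a, V2 b) { return a.x * b.y - a.y * b.x; }

// sin^2 of the angle between a and b is cross(a, b)^2 / (|a|^2 * |b|^2), so the test
// sin^2 <= max_sin2 becomes cross^2 <= max_sin2 * |a|^2 * |b|^2: no sqrt, no division.
// dot >= 0 additionally rejects vectors pointing in opposite directions.
static bool nearly_parallel(V2 a, V2 b, double max_sin2) {
    const double c = cross(a, b);
    return dot(a, b) >= 0.0 && c * c <= max_sin2 * dot(a, a) * dot(b, b);
}

// Can the cubic (p0, p1, p2, p3) be drawn as the single segment p0-p3?
// Coordinates are in pixels; max_sin2 is the largest accepted square sine.
bool flat_enough(V2 p0, V2 p1, V2 p2, V2 p3, double max_sin2) {
    assert(max_sin2 >= 0.0 && max_sin2 <= 1.0);
    const V2 chord = sub(p3, p0);  // end point to end point
    const V2 leg0 = sub(p1, p0);   // first end point to its control point
    const V2 leg1 = sub(p3, p2);   // second control point to the last end point
    const double chord2 = dot(chord, chord);
    const double longest2 = std::max({chord2, dot(leg0, leg0), dot(leg1, leg1)});
    if (longest2 < 0.25) return true;  // everything is shorter than half a pixel
    if (chord2 < 0.25) return false;   // end points (almost) coincide: keep splitting
    return nearly_parallel(leg0, chord, max_sin2) && nearly_parallel(leg1, chord, max_sin2);
}
```

When flat_enough returns false, the curve would be split (for example with de Casteljau at t = 0.5) and each half tested again.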

Calculate the gradient of a histogram in C++

I calculated the histogram (a simple 1D array) of a 3D grayscale image.
Now I would like to calculate the gradient of this histogram at each point. This would actually mean calculating the gradient of a 1D function at certain points. However, I do not have a function. So how can I calculate it from concrete x and y values?
For the sake of simplicity, could you explain this to me with an example histogram, for example with the following values (x is the intensity, and y the frequency of this intensity):
x1 = 1; y1 = 3
x2 = 2; y2 = 6
x3 = 3; y3 = 8
x4 = 4; y4 = 5
x5 = 5; y5 = 9
x6 = 6; y6 = 12
x7 = 7; y7 = 5
x8 = 8; y8 = 3
x9 = 9; y9 = 5
x10 = 10; y10 = 2
I know that this is also a math problem, but since I need to solve it in C++ I thought you could help me here.
Thank you for your advice
Marc

I think you can calculate your gradient using the same approach used in image edge detection (which is a gradient calculation). If your histogram is in a vector, you can calculate an approximation* of the gradient as:
for each point in the histogram except the last, compute
gradient[x] = (hist[x+1] - hist[x])
This is a very simple way to do it, but I'm not sure it is the most accurate.
* an approximation because you are working with discrete data instead of continuous data
Edited:
Other operators may emphasize small differences (small gradients become more emphasized). The Roberts operator derives from the definition of the derivative:
f'(x) = lim (delta -> 0) of (f(x + delta) - f(x)) / delta
delta tends to 0 but is never zero (which avoids a division by 0). With discrete data in a computer's memory this is impossible, so the smallest delta we can use is 1 (because 1 is the smallest distance between two points in an image (or histogram)).
Replacing
lim delta -> 0 with delta = 1
we get
(f(x + 1) - f(x)) / 1 = f(x + 1) - f(x) => hist[x+1] - hist[x]
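The forward difference applied to the example histogram from the question, as a small C++ sketch (the function name is hypothetical; the bin spacing is assumed to be 1, as in the example).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward difference: gradient[x] = hist[x+1] - hist[x], with a bin spacing of 1.
// The result has one value fewer than there are bins: the last bin has no right neighbour.
std::vector<double> forward_difference(const std::vector<double>& hist) {
    assert(hist.size() >= 2);
    std::vector<double> gradient(hist.size() - 1);
    for (std::size_t x = 0; x + 1 < hist.size(); ++x) gradient[x] = hist[x + 1] - hist[x];
    return gradient;
}
```

For the example values 3, 6, 8, 5, 9, 12, 5, 3, 5, 2 this gives 3, 2, -3, 4, 3, -7, -2, 2, -3.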

There are two general approaches here:
a discrete approximation to the derivative
take the real derivative of a fitted function
In the first case try:
g = (y_(i+1) - y_(i-1)) / (2*dx)
at all the points except the ends, and at the ends use
g_left-end = (y_(i+1) - y_i)/dx
g_right-end = (y_i - y_(i-1))/dx
where dx is the spacing between x points. (Unlike the equally correct definition Andres suggested, this one is symmetric. Whether that matters depends on your use case.)
In the second case, fit a spline to your data[*], and ask the spline library for the derivative at the point you want.
[*] Use a library! Do not implement this yourself unless this is a learning project. I'd use ROOT because I already have it on my machine, but it is a pretty heavy package just to get a spline...
Finally, if your data is noisy, you may want to smooth it before doing slope detection. That way you avoid chasing the noise, and only look at large-scale slopes.
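A C++ sketch of the first approach (central differences inside, one-sided differences at the two ends); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Central differences in the interior, one-sided differences at both ends:
//   g[i] = (y[i+1] - y[i-1]) / (2*dx),  g[0] = (y[1] - y[0]) / dx,
//   g[n-1] = (y[n-1] - y[n-2]) / dx
std::vector<double> gradient_central(const std::vector<double>& y, double dx) {
    assert(y.size() >= 2 && dx > 0.0);
    const std::size_t n = y.size();
    std::vector<double> g(n);
    g[0] = (y[1] - y[0]) / dx;
    for (std::size_t i = 1; i + 1 < n; ++i) g[i] = (y[i + 1] - y[i - 1]) / (2.0 * dx);
    g[n - 1] = (y[n - 1] - y[n - 2]) / dx;
    return g;
}
```

With the question's data and dx = 1 this gives 3, 2.5, -0.5, 0.5, 3.5, -2, -4.5, 0, -0.5, -3.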

Take some squared paper and draw on it your histogram. Draw also vertical and horizontal axes through the 0,0 point of your histogram.
Take a straight edge and, at each point you are interested in, rotate the straight edge until it matches your idea of what the gradient at that point is. It is most important that you do this: your definition of the gradient is the one you want.
Once the straight edge is at the angle you desire, draw a line at that angle.
Drop perpendiculars to the horizontal axis from any 2 points on the line you just drew. It will be easier to take the following step if the horizontal distance between the 2 points you choose is about 25% or more of the width of your histogram. From the same 2 points draw horizontal lines to intersect the vertical axis of your histogram.
Your lines now define an x-distance and a y-distance, i.e. the lengths along the horizontal and vertical axes (respectively) marked out by their intersections with the perpendiculars and the horizontal lines. The gradient you want is the y-distance divided by the x-distance.
Now, translating this into code is very straightforward, apart from the second step (choosing the angle of the straight edge). You have to define the criteria for determining the gradient at any point on the histogram. Simple choices include:
a) at each point, set down your straight edge to pass through the point and the next one to its right;
b) at each point, set down your straight edge to pass through the point and the next one to its left;
c) at each point, set down your straight edge to pass through the point to the left and the point to the right.
You may want to investigate more complex choices such as fitting a curve (such as a quadratic or higher-order polynomial) through a number of points on your histogram and using the derivative of that to represent the gradient.
Until you understand the question on paper avoid coding in C++ or anything else. Once you do understand it, coding should be trivial.
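As one concrete example of the "fit a curve" option, this hypothetical C++ sketch fits a least-squares quadratic through each window of 5 neighbouring bins and takes its slope at the centre bin. For equally spaced bins that slope reduces to the fixed weights (-2, -1, 0, 1, 2) / (10*dx), the 5-point Savitzky-Golay first-derivative filter; the window size is an assumption, and the two bins at each end are left undefined (NaN) here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Slope at bin i of the least-squares quadratic through bins i-2 .. i+2.
// For equally spaced x this is (-2*y[i-2] - y[i-1] + y[i+1] + 2*y[i+2]) / (10*dx).
// The two bins at each end have no full window and stay NaN.
std::vector<double> gradient_quadratic_fit5(const std::vector<double>& y, double dx) {
    assert(dx > 0.0);
    std::vector<double> g(y.size(), std::nan(""));
    for (std::size_t i = 2; i + 2 < y.size(); ++i)
        g[i] = (-2.0 * y[i - 2] - y[i - 1] + y[i + 1] + 2.0 * y[i + 2]) / (10.0 * dx);
    return g;
}
```

For the example histogram this gives 1.1 at x3 and -0.8 at x6; the wider window smooths the result compared with the two-point differences, in the spirit of the smoothing advice above.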
