How do you convert this math equation to a C++ statement?

I would like to know how I can convert this equation to C++.
This is what I've tried.
void calPoints(int arti, int eksi, int hak) {
int bir = 1, bin = 1000, on = 10, bes = 5;
int puan = on * arti + bes * eksi + bir / hak * bin - arti * eksi / arti + eksi;
}

There are two things to be wary of when writing C++ code for this equation: Operator precedence and integer math.
Operator precedence is easy enough to handle, and is only a concern for the last term. The more interesting issues come from integer division.
The expression 1 / hak will either be 0, 1, -1, or a divide-by-zero error, because the result of integer division is an integer. 1 / 2 will be 0. The usual fix is to multiply first, then divide (as long as this ordering won't result in overflow). In this case, we multiply by 1000 first, then divide.
Putting that all together gives:
int puan = on * arti + bes * eksi + bir * bin / hak - arti * eksi / (arti + eksi);
bir can be left out since it is 1. My personal inclination would be to write the constants directly in the equation, which makes it easier to verify:
int puan = 10 * arti + 5 * eksi + 1000 / hak - arti * eksi / (arti + eksi);
If integer based math is not desired, you'll need to change a few types to do things with floats or doubles.
double puan = 10.0 * arti + 5.0 * eksi + 1000.0 / hak - double(arti) * eksi / (arti + eksi);
In this form, all the constants are specified as double values, so each term is evaluated as a double, and one cast in the last term makes that term a double calculation as well (casting arti before the multiplication also keeps arti * eksi from overflowing int). The result can be cast to an int if necessary. The parameter types may need to be changed to double, depending on what values will be passed into the equation.
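A minimal sketch putting the integer and floating-point versions side by side, assuming the formula is the one reconstructed in this answer (the original equation image is not available). The function names `puan_int` and `puan_double` are hypothetical, and the asserts only guard the two divisors.

```cpp
#include <cassert>

// Integer version: 1000 / hak multiplies before dividing, so the
// quotient is not truncated to 0 for hak > 1.
int puan_int(int arti, int eksi, int hak) {
    assert(hak != 0 && arti + eksi != 0);  // both divisors must be non-zero
    return 10 * arti + 5 * eksi + 1000 / hak - arti * eksi / (arti + eksi);
}

// Floating-point version: one double operand per division is enough
// to make that division a floating-point one.
double puan_double(int arti, int eksi, int hak) {
    assert(hak != 0 && arti + eksi != 0);
    return 10.0 * arti + 5.0 * eksi + 1000.0 / hak
           - double(arti) * eksi / (arti + eksi);
}
```

For example, with arti = 3, eksi = 1, hak = 4 the integer version gives 285 while the double version gives 284.25; the difference is the 3 / 4 term, which integer division truncates to 0.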

- arti * eksi / arti + eksi;
That's where your problem is. Think about the order of operations: multiplication and division are done before addition and subtraction.
Steps the code is taking:
arti * eksi
the result above / arti
the result above + eksi
You need parentheses around arti + eksi.
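To see the difference concretely, here is a small sketch that evaluates the tail of the question's expression with and without the parentheses. The function names are hypothetical.

```cpp
#include <cassert>

// As written in the question: parsed as ((arti * eksi) / arti) + eksi.
int tail_as_written(int arti, int eksi) {
    assert(arti != 0);
    return arti * eksi / arti + eksi;
}

// With parentheses: arti * eksi divided by the sum arti + eksi.
int tail_grouped(int arti, int eksi) {
    assert(arti + eksi != 0);
    return arti * eksi / (arti + eksi);
}
```

With arti = 4 and eksi = 2, the first returns 4 (8 / 4 + 2) and the second returns 1 (8 / 6, truncated by integer division).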

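double puan = (on * arti) + (bes * eksi) + ((double)bir / hak) * bin - (double)(arti * eksi) / (arti + eksi);
puan is double because bir / hak and (arti * eksi) / (arti + eksi) may not be whole numbers. The cast has to be applied to an operand before the division; casting the result of an integer division, as in (double)(bir / hak), is too late because the fraction has already been truncated.
If you don't want it to be double, multiply before dividing so that bir / hak is not truncated to 0:
int puan = (on * arti) + (bes * eksi) + (bir * bin / hak) - ((arti * eksi) / (arti + eksi));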
This should work in your case. You should read about typecasting and precedence of arithmetic operators.
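Where the cast goes decides whether a division is done in integers. A minimal sketch (hypothetical function names) contrasting a cast applied to the quotient with a cast applied to an operand:

```cpp
#include <cassert>

// Cast applied after the division: bir / hak is still integer division,
// so 1 / 4 is truncated to 0 before the conversion to double.
double cast_after_division(int bir, int hak) {
    assert(hak != 0);
    return (double)(bir / hak);
}

// Cast applied to an operand: the other operand is converted too,
// so the division itself happens in double.
double cast_before_division(int bir, int hak) {
    assert(hak != 0);
    return (double)bir / hak;
}
```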

Related

Trigonometric equation only works with a specific input (5), doesn't work with other inputs

I am trying to write code that calculates the angles of a triangle from the lengths of its sides. The formula is
cos(a)=b^2+c^2-a^2/2bc.
angle1 = acosf((powf(length2,2) + powf(length3,2) - powf(length1,2)) / 2 * length2 * length3)* 180 / 3.14153;
angle2 = acosf((powf(length1,2) + powf(length3,2) - powf(length2,2)) / 2 * length1 * length3)* 180 / 3.14153;
angle3 = 180 - (angle2 + angle1);
Everything is float. When I enter the inputs 5, 4, 3, this is the outcome:
angle one is 90.0018
angle two is nan
angle three is nan
Changing the order doesn't matter; it only gives output for 5.
You are doing:
angle1 = acosf((powf(length2,2) + powf(length3,2) - powf(length1,2)) / 2 * length2 * length3)* 180 / 3.14153;
You should be doing:
angle1 = acosf((powf(length2,2) + powf(length3,2) - powf(length1,2)) / (2 * length2 * length3)) * 180 / 3.14159265f;
Explanation: The problem is caused by the following formula, which is in fact badly written:
cos(a)=b^2+c^2-a^2/2bc
// This, obviously, is wrong because
// you need to group the first three terms together.
// Next to that, everybody understands that the last "b" and "c" are divisors,
// yet it would be better to write it as:
cos(a)=(b^2+c^2-a^2)/(2bc)
The brackets I added in the code correspond to replacing /2bc with /(2bc). The same change applies to angle2. I also replaced 3.14153, which is a typo for π (3.14159...); that typo is likely why the right angle was printed as 90.0018 instead of 90.
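A self-contained sketch of the corrected formula that computes all three angles in degrees. It assumes, as in the question, that `length1`, `length2`, `length3` are the sides opposite the angles being computed. It uses double instead of float and `std::acos(-1.0)` for π instead of a hand-typed constant. The function name `triangle_angles` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Returns the three interior angles (in degrees) of a triangle with the
// given side lengths, using the law of cosines:
//   cos(A) = (b^2 + c^2 - a^2) / (2bc)
std::array<double, 3> triangle_angles(double length1, double length2, double length3) {
    // Strict triangle inequality; otherwise the acos argument can fall
    // outside [-1, 1] and the result is NaN.
    assert(length1 + length2 > length3 && length1 + length3 > length2 &&
           length2 + length3 > length1);
    const double pi = std::acos(-1.0);
    const double to_degrees = 180.0 / pi;

    // The whole denominator 2bc is parenthesised, as in the corrected formula.
    double angle1 = std::acos((length2 * length2 + length3 * length3 - length1 * length1)
                              / (2 * length2 * length3)) * to_degrees;
    double angle2 = std::acos((length1 * length1 + length3 * length3 - length2 * length2)
                              / (2 * length1 * length3)) * to_degrees;
    double angle3 = 180.0 - (angle1 + angle2);
    return {{angle1, angle2, angle3}};
}
```

For the question's input 5, 4, 3 this gives approximately 90, 53.1301 and 36.8699 degrees.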

What happens to the value of a floating point number when it's assigned to a long double?

Edit: I've realised that I'm working with the type long double and not just double, which does make a difference. I've also added an example from my program below that reproduces the error in question.
Note: I'm currently working in C++11 and using GCC to compile.
I'm dealing with a situation where the result varies between the below two calculations:
value1 = x * 6.0;
long double six = 6.0;
value2 = x * six;
value1 != value2
Where all variables above are of type long double.
Essentially, I wrote a line of code that gives me an incorrect answer when I use 6.0 in the actual calculation. Whereas, if I assign 6.0 to a variable of type long double first then use that variable in the calculation I receive the correct result.
I understand the basics of floating point arithmetic, and I guess it's obvious that something is happening to the bits of 6.0 when it is assigned to the long double type.
Sample from my actual program (I left the calculation as is to ensure the error is reproducible):
#include <iomanip>
#include <iostream>
#include <math.h>

int main() {
    long double six = 6.0;
    long double value1;
    long double value2;
    // identical expressions except for six (long double) vs 6.0 (double)
    value1 = (0.7854 * (pow(10, 5)) * six * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
    value2 = (0.7854 * (pow(10, 5)) * 6.0 * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
    std::cout << std::setprecision(25) << value1 << std::endl;
    std::cout << std::setprecision(25) << value2 << std::endl;
}
Where the output is:
7074.327896870849993415931
7074.327896870850054256152
Also, I understand that floating point calculations only hold precision up to a certain number of bits (so setting such a high output precision shouldn't affect the results; after 15-17 digits it shouldn't really matter if the numbers vary, but unfortunately this does affect my calculation).
Question: Why are the above two code segments producing (slightly) different results?
Note: I'm not simply comparing the two numbers with == and receiving false. I've just been printing them out using setprecision and checking each digit.
The problem here I believe is one of promotion.
long double six = 6.0;
long double value1;
long double value2;
value1 = (0.7854 * (pow(10, 5)) * six * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
value2 = (0.7854 * (pow(10, 5)) * 6.0 * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
Looking at the second calculation, we notice that every term in the expression is of type double. This means the whole expression is evaluated in double precision.
However, the first calculation contains the variable six, which is of type long double. Multiplication is evaluated left to right, so 0.7854 * pow(10, 5) is still computed in double, but from the point where six appears, each multiplication is carried out in long double (the pow calls and the parenthesised last factor are computed in double and then converted).
So this difference in the calculation's precision is likely the cause of the discrepancy: most of the first expression is evaluated in long double precision, while the second is evaluated only in double precision.
In fact a simple change to the code can prove this. If we change the type of the term 6.0 from double to long double by writing 6.0L we will get identical results because both expressions are now calculated to the same precision:
value1 = (0.7854 * (pow(10, 5)) * six * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
value2 = (0.7854 * (pow(10, 5)) * 6.0L * (pow(0.033, 2)) * 1.01325 * (1.27 * 11.652375 / 1.01325 - 1.0));
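A small sketch of the rule this answer relies on, the usual arithmetic conversions. If one operand of a binary operator is long double, the other is converted to long double, and the `L` suffix makes a literal long double. The sketch also checks that converting 6.0 to long double leaves its value unchanged, so nothing happens to the bits of 6.0 itself. The function names are hypothetical and the expression is a shortened version of the question's.

```cpp
#include <cassert>
#include <type_traits>

// The type of an expression is decided by its operand types, not by the
// variable the result is later assigned to.
static_assert(std::is_same<decltype(0.7854 * 6.0), double>::value,
              "double * double is evaluated as double");
static_assert(std::is_same<decltype(0.7854 * 6.0L), long double>::value,
              "double * long double is evaluated as long double");

// Shortened version of the question's expression with a long double variable.
long double with_variable(long double six) {
    return 0.7854 * 100000.0 * six * 1.01325;
}

// Same expression with a long double literal: same types, same result.
long double with_suffix() {
    return 0.7854 * 100000.0 * 6.0L * 1.01325;
}

// Same expression with a plain double literal: computed in double and
// converted to long double only on return.
long double with_double_literal() {
    return 0.7854 * 100000.0 * 6.0 * 1.01325;
}
```

On x86-64 with GCC or Clang, where long double has a 64-bit significand, `with_double_literal()` can differ from the other two in the last digits. Where long double is the same as double (for example with MSVC), all three agree.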

SIMD/SSE: short dot product and short max value

I'm trying to optimize a dot product of two C-style arrays of constant, small size and of type short.
I've read several pieces of documentation about SIMD intrinsics and many blog posts/articles about dot product optimization using these intrinsics.
However, I don't understand how a dot product on short arrays using these intrinsics can give the right result. When computing the dot product, the products can be (and, in my case, always are) greater than SHRT_MAX, and so is their sum. Hence, I store them in a variable of type double.
As I understand the dot product using SIMD intrinsics, we use __m128i variables, and the operations return __m128i. So what I don't understand is why it doesn't "overflow", and how the result can be transformed into a type that can hold it.
Depending on the range of your data values you might use an intrinsic such as _mm_madd_epi16, which performs multiply/add on 16 bit data and generates 32 bit terms. You would then need to periodically accumulate your 32 bit terms to 64 bits. How often you need to do this depends on the range of your input data, e.g. if it's 12 bit greyscale image data then you can do 64 iterations at 8 elements per iteration (i.e. 512 input points) before there is the potential for overflow. In the worst case however, if your input data uses the full 16 bit range, then you would need to do the additional 64 bit accumulate on every iteration (i.e. every 8 points).
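A minimal SSE2 sketch of the approach described above, assuming an x86/x86-64 target (`<emmintrin.h>`). The function name `dot_i16` and the `flush_every` parameter are illustrative, not from the answer. Choosing a safe `flush_every` is left to the caller, who knows the data range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Adds the four int32 lanes of v into a 64-bit total.
static std::int64_t sum_lanes_i64(__m128i v) {
    std::int32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), v);
    return std::int64_t(lanes[0]) + lanes[1] + lanes[2] + lanes[3];
}

// Dot product of two int16_t arrays of length n.
// _mm_madd_epi16 multiplies 8 pairs of int16 values and adds adjacent
// products, producing 4 int32 partial sums per call. These 32-bit partial
// sums are flushed into a 64-bit total every `flush_every` iterations.
// flush_every = 1 is safe for any input except the corner case where both
// products in a pair are (-32768) * (-32768), which wraps inside
// _mm_madd_epi16 itself. Larger values are only safe if the input range
// is known to be small enough (e.g. 64 for unsigned 12-bit data).
std::int64_t dot_i16(const std::int16_t* a, const std::int16_t* b,
                     std::size_t n, std::size_t flush_every = 1) {
    assert(flush_every >= 1);
    std::int64_t total = 0;
    __m128i acc = _mm_setzero_si128();
    std::size_t pending = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
        if (++pending == flush_every) {  // widen to 64 bits before int32 can overflow
            total += sum_lanes_i64(acc);
            acc = _mm_setzero_si128();
            pending = 0;
        }
    }
    total += sum_lanes_i64(acc);  // whatever is left in the 32-bit lanes
    for (; i < n; ++i)            // scalar tail when n is not a multiple of 8
        total += std::int32_t(a[i]) * b[i];
    return total;
}
```

For unsigned 12-bit data, `dot_i16(p, q, n, 64)` matches the answer's 64-iteration budget: 64 × 2 × 4095² = 2,146,435,200, just below INT32_MAX (2,147,483,647).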
For the record, here is how I compute a dot product of two int16 arrays of size 36:
#include <cstdint>
#include <immintrin.h>  // SSE2 (_mm_*) and AVX2 (_mm256_*) intrinsics

// Dot product of two int16_t arrays; assumes size == 36
// (32 elements handled with SIMD, the last 4 in scalar code).
// Note: the int32 partial sums can overflow if the inputs use the full 16-bit range.
double dotprod(const int16_t* source, const int16_t* target, const int size){
#if defined(USE_SSE)
    int res[4];
    const __m128i* src = (const __m128i *) source;
    const __m128i* t = (const __m128i *) target;
    // each _mm_madd_epi16 multiplies 8 int16 pairs and sums adjacent products into 4 int32 lanes
    __m128i s = _mm_madd_epi16(_mm_loadu_si128(src), _mm_loadu_si128(t));
    ++src;
    ++t;
    s = _mm_add_epi32(s, _mm_madd_epi16(_mm_loadu_si128(src), _mm_loadu_si128(t)));
    ++src;
    ++t;
    s = _mm_add_epi32(s, _mm_madd_epi16(_mm_loadu_si128(src), _mm_loadu_si128(t)));
    ++src;
    ++t;
    s = _mm_add_epi32(s, _mm_madd_epi16(_mm_loadu_si128(src), _mm_loadu_si128(t)));
    /* return the sum of the four 32-bit sub sums, plus the last 4 elements */
    _mm_storeu_si128((__m128i*)res, s);
    return res[0] + res[1] + res[2] + res[3] + source[32] * target[32] + source[33] * target[33] + source[34] * target[34] + source[35] * target[35];
#elif defined(USE_AVX)
    // _mm256_madd_epi16 and _mm256_add_epi32 require AVX2, not just AVX
    int res[8];
    const __m256i* src = (const __m256i *) source;
    const __m256i* t = (const __m256i *) target;
    __m256i s = _mm256_madd_epi16(_mm256_loadu_si256(src), _mm256_loadu_si256(t));
    ++src;
    ++t;
    s = _mm256_add_epi32(s, _mm256_madd_epi16(_mm256_loadu_si256(src), _mm256_loadu_si256(t)));
    /* return the sum of the 8 32-bit sub sums, plus the last 4 elements */
    _mm256_storeu_si256((__m256i*)res, s);
    return res[0] + res[1] + res[2] + res[3] + res[4] + res[5] + res[6] + res[7] + source[32] * target[32] + source[33] * target[33] + source[34] * target[34] + source[35] * target[35];
#else
    // scalar fallback: same result as the fully unrolled sum of 36 products
    int sum = 0;
    for (int i = 0; i < 36; ++i)
        sum += source[i] * target[i];
    return sum;
#endif
}

Strange multiplication result

In my C++ code I have these multiplications, with all variables of type double[]:
f1[0] = (f1_rot[0] * xu[0]) + (f1_rot[1] * yu[0]);
f1[1] = (f1_rot[0] * xu[1]) + (f1_rot[1] * yu[1]);
f1[2] = (f1_rot[0] * xu[2]) + (f1_rot[1] * yu[2]);
f2[0] = (f2_rot[0] * xu[0]) + (f2_rot[1] * yu[0]);
f2[1] = (f2_rot[0] * xu[1]) + (f2_rot[1] * yu[1]);
f2[2] = (f2_rot[0] * xu[2]) + (f2_rot[1] * yu[2]);
with these values:
Force Rot1 : -5.39155e-07, -3.66312e-07
Force Rot2 : 4.04383e-07, -1.51852e-08
xu: 0.786857, 0.561981, 0.255018
yu: 0.534605, -0.82715, 0.173264
F1: -6.2007e-07, -4.61782e-16, -2.00963e-07
F2: 3.10073e-07, 2.39816e-07, 1.00494e-07
This multiplication in particular produces a wrong value, -4.61782e-16 instead of 1.04745e-13:
f1[1] = (f1_rot[0] * xu[1]) + (f1_rot[1] * yu[1]);
I hand verified the other multiplications on a calculator and they all seem to produce the correct values.
This code is compiled with Open MPI and the above result is from a run on a single processor. Running on multiple processors gives different values; for example, with 40 processors the F1[1] multiplication gives 1.66967e-13.
Is this some kind of MPI bug, or a type precision problem? And why does it work for the other multiplications?
Your problem is a typical result of what is called catastrophic cancellation (adding two numbers of nearly equal magnitude and opposite sign):
As we know, a double-precision float can hold about 16 significant decimal digits.
f1[1] = (f1_rot[0] * xu[1]) + (f1_rot[1] * yu[1])
= -3.0299486605499998e-07 + 3.0299497080000003e-07
= 1.0474500005332475e-13
This is what we obtain with the numbers you have given in your example.
Notice that (-7) - (-13) = 6, which corresponds to the number of significant digits in the values you give in your example (e.g. -5.39155e-07 and -3.66312e-07: each mantissa has 6 digits of precision). In other words, the printed values only have roughly single-precision accuracy (6 digits is the default output precision of std::cout and of printf's %g), so the two terms agree in every digit that is shown and the 1.04745e-13 computed by hand is essentially noise.
In your actual calculation the numbers carry more digits than were printed, so the program's result depends on digits the hand calculation never saw.
Anyway, with 6-digit (single-precision-like) values you can't expect better precision; with double precision you get up to about 16 significant digits. You shouldn't trust a difference between two numbers unless its relative size is larger than the precision of the type:
Single-precision floats: |a - b| / |b| >= ~1e-7
Double-precision floats: |a - b| / |b| >= ~4e-16
For further information, see these examples ... or the table in this article ...
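A short sketch that redoes the answer's arithmetic with the printed 6-digit inputs and measures how much cancels. It also shows that shifting `xu[1]` by 4e-7, a change hidden by the 6-digit printout (0.5619814 still prints as 0.561981), flips the sign of `f1[1]`. The helper names `rotate_component` and `cancellation_ratio` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One component of the question's rotation: rot0 * u + rot1 * v.
double rotate_component(double rot0, double rot1, double u, double v) {
    return rot0 * u + rot1 * v;
}

// Size of the result relative to the larger of the two terms. A tiny
// ratio means the leading digits cancelled, so only the trailing (least
// reliable) digits of the inputs survive in the result.
double cancellation_ratio(double term1, double term2) {
    double scale = std::max(std::fabs(term1), std::fabs(term2));
    assert(scale > 0.0);  // both terms zero: ratio undefined
    return std::fabs(term1 + term2) / scale;
}
```

With the question's values, the ratio comes out around 3.5e-7: the two terms agree in roughly their first six or seven digits, which is about as many digits as the printed inputs have.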

What is the correct way to use C++ style casts to perform an expression at a desired precision?

Given the following:
int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
What is the correct way to update this using C++ style casts?
Should this really be rewritten as:
return static_cast<float>(a) / static_cast<float>(b) + ... ?
Start by correcting your code:
(float) a/x + b/y + c/z + d
produces 7.33333, while the correct result is 8.33333. Why? Because the b/y and c/z divisions are done in ints.
The reason the result is incorrect is that division takes precedence over addition: your program needs to divide b by y and c by z before adding them to the result of division of a by x, which is float.
You need to cast one operand of each division to get this to work correctly. A C-style cast works fine, but if you would rather use C++-style casts, here is how you can do it:
return static_cast<float>(a) / x + static_cast<float>(b) / y +
static_cast<float>(c) / z + d;
/ has higher precedence than +, so b/y will be performed in int, not in float.
The correct way to perform each division in float is to cast at least one operand to float:
static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d
This is clearer than the equivalent C expression:
(float) a/x + (float) b/y + (float) c/z + d
Here one requires knowledge of precedence to realise that the cast to float binds tighter than the division.
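A minimal sketch comparing the question's expression with the C++-cast version, using the question's values. The function names are hypothetical.

```cpp
#include <cassert>

// The question's expression: only a/x is a float division;
// b/y and c/z are integer divisions whose results are truncated.
float mixed_original(int a, int b, int c, int d, int x, int y, int z) {
    assert(x != 0 && y != 0 && z != 0);
    return (float) a/x + b/y + c/z + d;
}

// One static_cast per division makes every division a float division;
// x, y, z and d are then converted to float implicitly.
float mixed_cpp_casts(int a, int b, int c, int d, int x, int y, int z) {
    assert(x != 0 && y != 0 && z != 0);
    return static_cast<float>(a) / x + static_cast<float>(b) / y +
           static_cast<float>(c) / z + d;
}
```

With a = 10, b = 5, c = 3, d = 1, x = 3, y = 2, z = 2, the first returns about 7.33333 and the second about 8.33333.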
return (float) a/x + b/y + c/z + d;
is not correct if you want to return the float value of the sum of all divisions. In the above expression only a/x is a floating-point division; the rest are integer divisions (because / has higher precedence than +), which truncate their results. Better to stick with
return (double)a/x + (double)b/y + (double)c/z + d;
int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
No, it casts a to float and so a/x is performed as a floating point divide, but b/y and c/z are integer divides. Afterwards, the sums are computed after converting the integer division results to float.
This is because casts are simply another operator, and they have higher precedence than + and /. Dividing float by an int or adding a float to an int causes the ints to be automatically converted to floats.
If you want floating point division then you need to insert casts so that they are applied prior to the divisions, and then the other values get automatically promoted.
return (float) a/x + (float) b/y + (float) c/z + d;
Casting using C++ syntax is exactly the same, except the syntax won't let you get confused about what's actually being cast:
return static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d;
You can also use constructor syntax, which also has the benefit of clearly showing what's cast:
return float(a)/x + float(b)/y + float(c)/z + d;
Or you can simply use temporary variables:
float af = a, bf = b, cf = c;
return af/x + bf/y + cf/z + d;
The cast is only necessary for the division operations, and you can lighten the syntax this way:
return 1.0*a/x + 1.0*b/y + 1.0*c/z + d;
This computes the result as a double, which is implicitly converted to float if the function returns float.