Floats being rounded in C++ and I don't understand why

I am very confused about this... Here is an extract from my code..
float m = 0.0, c = 0.0;
printf("toprightx = %d bottomrightx = %d toprighty = %d bottomrighty = %d\n",
toprightx, bottomrightx, toprighty, bottomrighty);
// find m and c for symmetry line
if (toprightx == bottomrightx) {
m = (-toprighty + bottomrighty);
}
else {
m = (-toprighty + bottomrighty) / (toprightx - bottomrightx);
}
c = -toprighty - (m * toprightx);
printf("m = %f and c = %f\n", m, c);
And here is the output:
toprightx = 241 bottomrightx = 279 toprighty = 174 bottomrighty = 321
m = -3.000000 and c = 549.000000
Why is the output rounding m and c? I have declared them as floats so I don't understand why the code is returning integers. The correct value of m should be -3.8684.
(Note that toprightx, bottomrightx, toprighty, bottomrighty have been declared as integers further up in the code.)

Note that toprightx, bottomrightx, toprighty, bottomrighty have been
declared as integers further up in the code.
There's your answer. Calculations that involve only integers are performed in integer math, including divisions. It doesn't matter that the result is then assigned to a float.
To fix this, either declare at least one of the x/y values as float or cast it to float in the calculation.

You are performing integer division on this line:
(-toprighty + bottomrighty) / (toprightx - bottomrightx);
Since toprighty, bottomrighty, toprightx, and bottomrightx are all integers, the result of that expression will also be an integer. Only after the expression has produced an integer is it assigned to a float. It is equivalent to:
float m = -3;
You could do something like this instead:
(-toprighty + bottomrighty + 0.0) / (toprightx - bottomrightx);

Here's a hint for you:
m = (-toprighty + bottomrighty) / (toprightx - bottomrightx);
^int ^int ^int ^int
All of those operations are performed in integer arithmetic, so the division truncates the fractional part, and only the final result is converted to float. Try instead:
m = float(-toprighty + bottomrighty) / (toprightx - bottomrightx);

That's because you're using only ints in your calculations, so C++ uses integer arithmetic for them. Just cast one of your int variables to float and you'll be good.
Changing this statement m = (-toprighty + bottomrighty) / (toprightx - bottomrightx); to m = (-toprighty + bottomrighty) / (float)(toprightx - bottomrightx); will do that.

Declare toprightx, bottomrightx, toprighty, bottomrighty as floats, or cast them to float before the division, so that the arithmetic is done in floating point.

The integer division discards the fractional part before the result is ever converted to float, so the data that doesn't fit in an int is already gone by the time it is assigned.
Note that your data isn't being rounded either, it's being truncated.
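To make the truncation versus rounding distinction concrete, here is a small sketch. The helper names truncated_quotient and rounded_quotient are invented for this example.

```cpp
#include <cmath>

// Integer division discards the fractional part (truncation toward zero).
int truncated_quotient(int num, int den) {
    return num / den;
}

// Rounding to the nearest integer needs the quotient in floating point first.
long rounded_quotient(int num, int den) {
    return std::lround(static_cast<double>(num) / den);
}
```

With the question's numbers, 147 / -38 is about -3.87, so truncation gives -3 while rounding to nearest would give -4.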

Try casting the divisor to a floating point number, to force the division to use floating point arithmetic:
m = (-toprighty + bottomrighty) / (float)(toprightx - bottomrightx);

Related

How do I avoid getting -0 when dividing in c++

I have a script in which I want to find the chunk my player is in.
Simplified version:
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = int(x / chunkSize);
int player_chunk_y = int(y / chunkSize);
This gives the chunk the player is in, but when x or y is negative but not less than the chunkSize (-16), player_chunk_x or player_chunk_y is still 0 or '-0' when I need -1
Of course I can just do this:
if (x < 0) x--;
if (y < 0) y--;
But I was wondering if there is a better solution to my problem.
Thanks in advance.

Since C++20 it's impossible to get a signed negative zero in an integral type; before that it was only possible in the rare (but by no means extinct) situation where your platform had a 1's complement int. It's still possible in C (although rare), and adding 0 to the result will remove it.
It's possible though to have a floating point signed negative zero. For that, adding 0.0 will remove it.
Note that for an integral -0, subtracting 1 will yield -1.
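A short sketch of the floating-point case, assuming IEEE 754 arithmetic with the default rounding mode. The helper names is_negative_zero and drop_negative_zero are hypothetical.

```cpp
#include <cmath>

// Returns true if v is a floating-point zero with the sign bit set.
bool is_negative_zero(double v) {
    return v == 0.0 && std::signbit(v);
}

// Adding +0.0 turns -0.0 into +0.0 under the default rounding mode
// and leaves every other finite value unchanged.
double drop_negative_zero(double v) {
    return v + 0.0;
}
```

For example, std::trunc(-5.0 / 16) yields -0.0, and passing it through drop_negative_zero gives a plain 0.0.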

Your issue is that you are casting a floating point value to an integer value.
This rounds toward zero by default.
If you want to consistently round down, you first have to floor your value:
int player_chunk_x = int(std::floor(x / chunkSize));
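As a sketch, the floor approach can be wrapped in a helper. The name chunk_index is an assumption made for this example; it is not from the question.

```cpp
#include <cmath>

// Chunk index for a coordinate, rounding toward negative infinity so that
// coordinates in [-chunkSize, 0) map to chunk -1 rather than 0.
int chunk_index(float coord, int chunkSize) {
    return static_cast<int>(std::floor(coord / chunkSize));
}
```

With chunkSize = 16, chunk_index(-5, 16) and chunk_index(-15, 16) both give -1, while chunk_index(5, 16) gives 0.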

If you don't like negative numbers then don't use them:
int player_chunk_x = (x - min_x) / chunkSize;
int player_chunk_y = (y - min_y) / chunkSize;

If you want an integer result, in this case -1 for -5 / 16 or anything like it, you can get it with a math function.
One possible way, using floor (from <cmath>):
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = floor(x / chunkSize);
// gives -1 for x = -5
// 0 for x = 5
// 1 for any x from 16 up to (but not including) 32, and so on
int player_chunk_y = floor(y / chunkSize);

Explicit casting in C/C++

After reading several SO posts on the subject I am still confused, mainly concerning integer and boolean variables/expressions.
A. Integer expressions
Suppose I want to use modulo expression in a floating point computation, what, if any, is the most correct of the following? Is there any difference between C and C++? or should I just trust the compiler to make the correct conversion?
double sign;
int num = rand() % 100;
//want to map odd num to -1.0 and even num to 1.0
//A
sign = -2 * (num % 2) + 1;
//B
sign = -2.0 * (num % 2) + 1;
//C
sign = -2.0 * (num % 2) + 1.0;
//D
sign = -2 * (num % 2) + 1.0;
//E
sign = -2 * (double)(num % 2) + 1;
//F
sign = -2.0 * (double)(num % 2) + 1;
//G
sign = -2.0 * (double)(num % 2) + 1.0;
//H
sign = -2 * (double)(num % 2) + 1.0;
B. Boolean expressions
Can I use a boolean expression, safely, as an element in floating / integer computations without explicit casting? Is there a difference between C and C++?
double d_res = 1.0;
int i_res = 1;
int num = rand() % 10;
d_res = d_res + (num > 5);//or d_res = d_res + (double)(num > 5)?
i_res += (num > 5);//or i_res += (int)(num > 5)?

A. The initialization
double sign = -2 * (num % 2) + 1;
is perfectly well-defined. That's what I'd use; I don't think there's any need to complicate things with extra casts or anything.
C and C++ are well-defined and convenient in their implicit conversions between integer and floating-point types. Explicit conversions are usually not needed. In my experience there are only three things to worry about:
Code like double ratio = 1 / 3 doesn't do what you want; you need to force one of the operands to / to be floating-point. (This has nothing to do with your question, but it's an extremely easy mistake to make.)
Overflow, if one type or the other can't represent the value. (Also not a problem for your example.)
Overzealous compilers. Many compilers will "helpfully" warn you that you might lose precision when converting from double to float, or from a floating-point type to an integer. So you may need explicit casts to silence those warnings.
B. Asking for the numeric value of a Boolean is perfectly well-defined (is guaranteed to give you a nice, clean, 1 or 0), so your second fragment should be fine also. (I know this is true for C, and per a comment below, it's true for C++ also.)
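A small sketch to check this, assuming num is non-negative as it is when it comes from rand() % 100. The function names sign_a, sign_g and add_flag are made up here; sign_a and sign_g correspond to variants A and G in the question.

```cpp
// Variant A from the question: pure int arithmetic, converted to double on return.
double sign_a(int num) {
    return -2 * (num % 2) + 1;
}

// Variant G from the question: every operand explicitly double.
double sign_g(int num) {
    return -2.0 * (double)(num % 2) + 1.0;
}

// A comparison yields bool in C++ (int in C); used in arithmetic it
// contributes exactly 1 or 0.
double add_flag(double d_res, int num) {
    return d_res + (num > 5);
}
```

Both variants give 1.0 for even num and -1.0 for odd num, so the extra casts change nothing here.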

What is the correct way to use C++ style casts to perform an expression at a desired precision?

Given the following:
int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
What is the correct way to update this using C++ style casts?
Should this really be rewritten as:
return static_cast<float>(a) / static_cast<float>(b) + ... ?

Start by correcting your code:
(float) a/x + b/y + c/z + d
produces 7.33333, while the correct result is 8.33333. Why? Because the b/y and c/z divisions are done in ints.
The reason the result is incorrect is that division takes precedence over addition: your program needs to divide b by y and c by z before adding them to the result of division of a by x, which is float.
You need to cast one of the division operands to get this to work correctly. C cast works fine, but if you would rather use C++-style cast, here is how you can do it:
return static_cast<float>(a) / x + static_cast<float>(b) / y +
static_cast<float>(c) / z + d;

/ has higher precedence than +, so b/y will be performed in int, not in float.
The correct way to perform each division in float is to cast at least one operand to float:
static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d
This is clearer than the equivalent C expression:
(float) a/x + (float) b/y + (float) c/z + d
Here one requires knowledge of precedence to realise that the cast to float binds tighter than the division.

return (float) a/x + b/y + c/z + d;
is not correct if you want to return the floating-point sum of all the divisions. In the above expression only a/x is a float division and the rest are int divisions (because the / operator has higher precedence than +), which results in value truncation. Better to stick with
return (double)a/x + (double)b/y + (double)c/z + d;

int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
No, it casts a to float and so a/x is performed as a floating point divide, but b/y and c/z are integer divides. Afterwards, the sums are computed after converting the integer division results to float.
This is because casts are simply another operator, and they have higher precedence than + and /. Dividing float by an int or adding a float to an int causes the ints to be automatically converted to floats.
If you want floating point division then you need to insert casts so that they are applied prior to the divisions, and then the other values get automatically promoted.
return (float) a/x + (float) b/y + (float) c/z + d;
Casting using C++ syntax is exactly the same, except the syntax won't let you get confused about what's actually being cast:
return static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d;
You can also use constructor syntax, which also has the benefit of clearly showing what's cast:
return float(a)/x + float(b)/y + float(c)/z + d;
Or you can simply use temporary variables:
float af = a, bf = b, cf = c;
return af/x + bf/y + cf/z + d;

The cast is only necessary with division operation. And you can lighten syntax this way:
return 1.0*a/x + 1.0*b/y + 1.0*c/z + d;
This will compute the result as a double, which gets automatically converted to float if the function returns that type.

C++ How do I set the fractional part of a float?

I know how to get the fractional part of a float but I don't know how to set it. I have two integers returned by a function, one holds the integer and the other holds the fractional part.
For example:
int a = 12;
int b = 2; // This can never be 02, 03 etc
float c;
How do I get c to become 12.2? I know I could add something like (float)b / 10, but then what if b is >= 10? Then I would have to divide by 100, and so on. Is there a function or something where I can do setfractional(c, b)?
Thanks
edit: The more I think about this problem the more I realize how illogical it is. if b == 1 then it would be 12.1 but if b == 10 it would also be 12.1 so I don't know how I'm going to handle this. I'm guessing the function never returns a number >= 10 for fractional but I don't know.

Something like:
float IntFrac(int integer, int frac)
{
float integer2 = integer;
float frac2 = frac;
float log10 = log10f(frac2 + 1.0f);
float ceil = ceilf(log10);
float pow = powf(10.0f, -ceil);
float res = abs(integer);
res += frac2 * pow;
if (integer < 0)
{
res = -res;
}
return res;
}
Ideone: http://ideone.com/iwG8UO
It's like saying: log10(98 + 1) = log10(99) = 1.995, ceilf(1.995) = 2, powf(10, -2) = 0.01, 98 * 0.01 = 0.98, and then 12 + 0.98 = 12.98 and then we check for the sign.
And let's hope the vagaries of IEEE 754 float math won't hit too hard :-)
I'll add that it would probably be better to use double instead of float. Other than 3D graphics, there are very few fields where using float is a good idea nowadays.

The most trivial method would be counting the digits of b and then dividing accordingly:
int i = 10;
while(b >= i) // rather slow, there are faster ways; >= so that b == 10 scales to 100
i*= 10;
c = a + static_cast<float>(b)/i;
Note that due to the nature of float the result might not be what you expected. Also, if you want something like 3.004 you can modify the initial value of i to another power of ten.

Kindly try the code below after including math.h and stdlib.h:
int a=12;
int b=22;
int d=b;
int i=0;
float c;
while(d>0)
{
d/=10;
i++;
}
c=a+(float)b/pow(10,i);

How i can make matlab precision to be the same as in c++?

I have a problem with precision. I have to make my C++ code have the same precision as MATLAB. In MATLAB I have a script which does some work with numbers, and I have C++ code which does the same as that script. The output for the same input is different :( I found that in my script, when I try 104 >= 104, it returns false. I tried to use format long, but it did not help me find out why it's false. Both numbers are of type double. I thought that maybe MATLAB stores the real value of 104 somewhere and it's really something like 103.9999... So I increased my precision in C++. That also didn't help, because where MATLAB returns a value of 50.000, in C++ I get a value of 50.050 with high precision. Those two values come from a few calculations like + or *. Is there any way to make my C++ and MATLAB scripts have the same precision?
for i = 1:neighbors
y = spoints(i,1)+origy;
x = spoints(i,2)+origx;
% Calculate floors, ceils and rounds for the x and y.
fy = floor(y); cy = ceil(y); ry = round(y);
fx = floor(x); cx = ceil(x); rx = round(x);
% Check if interpolation is needed.
if (abs(x - rx) < 1e-6) && (abs(y - ry) < 1e-6)
% Interpolation is not needed, use original datatypes
N = image(ry:ry+dy,rx:rx+dx);
D = N >= C;
else
% Interpolation needed, use double type images
ty = y - fy;
tx = x - fx;
% Calculate the interpolation weights.
w1 = (1 - tx) * (1 - ty);
w2 = tx * (1 - ty);
w3 = (1 - tx) * ty ;
w4 = tx * ty ;
%Compute interpolated pixel values
N = w1*d_image(fy:fy+dy,fx:fx+dx) + w2*d_image(fy:fy+dy,cx:cx+dx) + ...
w3*d_image(cy:cy+dy,fx:fx+dx) + w4*d_image(cy:cy+dy,cx:cx+dx);
D = N >= d_C;
end
I have problems in the else branch, which is on line 12. tx and ty equal 0.707106781186547 or 1 - 0.707106781186547. Values from d_image are in the range 0 to 255. N is a value in 0..255 obtained by interpolating 4 pixels from the image. d_C is a value in 0..255. I still don't know why MATLAB shows this: when N holds values like x x x 140.0000 140.0000 and d_C holds x x x 140 x, D gives me 0 in the 4th position, so 140.0000 != 140. I debugged it with more precision, but it still says that it's 140.00000000000000 and it is still not 140.
int Codes::Interpolation( Point_<int> point, Point_<int> center , Mat *mat)
{
int x = center.x-point.x;
int y = center.y-point.y;
Point_<double> my;
if(x<0)
{
if(y<0)
{
my.x=center.x+LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x+LEN;
my.y=center.y-LEN;
}
}
else
{
if(y<0)
{
my.x=center.x-LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x-LEN;
my.y=center.y-LEN;
}
}
int a=my.x;
int b=my.y;
double tx = my.x - a;
double ty = my.y - b;
double wage[4];
wage[0] = (1 - tx) * (1 - ty);
wage[1] = tx * (1 - ty);
wage[2] = (1 - tx) * ty ;
wage[3] = tx * ty ;
int values[4];
// store the 4 pixels used for the interpolation in the array
for(int i=0;i<4;i++)
{
int val = mat->at<uchar>(Point_<int>(a+help[i].x,a+help[i].y));
values[i]=val;
}
double moze = (wage[0]) * (values[0]) + (wage[1]) * (values[1]) + (wage[2]) * (values[2]) + (wage[3]) * (values[3]);
return moze;
}
LEN = 0.707106781186547. The values in the array values are 100% the same as the MATLAB values.

Matlab uses double precision. You can use C++'s double type. That should make most things similar, but not 100%.
As someone else noted, this is probably not the source of your problem. Either there is a difference in the algorithms, or it might be something like a library function defined differently in Matlab and in C++. For example, Matlab's std() divides by (n-1) and your code may divide by n.

First, as a rule of thumb, it is never a good idea to compare floating-point variables directly. For example, instead of if (nr >= 104) you should use if (nr >= 104-e), where e is a small number, like 0.00001.
However, there must be some serious undersampling or rounding error somewhere in your script, because getting 50.050 instead of 50.000 is not within the limits of common floating-point imprecision. For example, MATLAB's doubles are accurate to about 15 significant digits!
I guess there are some casting problems in your code, for example
int i;
double d;
// ...
d = i/3 * d;
will give a very inaccurate result, because you have an integer division. d = (double)i/3 * d or d = i/3. * d would give a much more accurate result.
The above example would NOT cause any problems in Matlab, because there everything is already a floating-point number by default, so a similar problem might be behind the differences in the results of the c++ and Matlab code.
Seeing your calculations would help a lot in finding what went wrong.
EDIT:
In C and C++, if you compare a computed double with an integer it is supposed to equal, there is a good chance that they will not be equal. It's the same with two doubles, but you might get lucky if you perform the exact same computations on them. Even in MATLAB it's dangerous, and maybe you were just lucky that, as both are doubles, both got truncated the same way.
From your recent edit it seems that the problem is where you evaluate your array. You should never use == or != when comparing floats or doubles in C++ (or in any language when you use floating-point variables). The proper way to do a comparison is to check whether they are within a small distance of each other.
An example: using == or != to compare two doubles is like comparing the weight of two objects by counting the number of atoms in them, and deciding that they are not equal even if there is one single atom difference between them.
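A minimal sketch of such a tolerance-based comparison. The helper names and the default tolerance of 1e-9 are assumptions chosen for illustration; an absolute tolerance like this suits values of moderate size such as 0..255 pixel data, while very large or very small values usually call for a relative tolerance instead.

```cpp
#include <cmath>

// True if a and b differ by no more than eps.
bool nearly_equal(double a, double b, double eps = 1e-9) {
    return std::fabs(a - b) <= eps;
}

// "a >= b" that still holds when a falls short of b by at most eps.
bool ge_with_tolerance(double a, double b, double eps = 1e-9) {
    return a >= b - eps;
}
```

For instance, nearly_equal(0.1 + 0.2, 0.3) is true even though the two sides need not be bit-for-bit identical, and ge_with_tolerance(n, 140.0) still holds when n is a hair below 140.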

MATLAB uses double precision unless you say otherwise. Any differences you see with an identical implementation in C++ will be due to floating-point errors.
