Why does int/float multiplication lead to different results? - c++

If I multiply a float and an integer as below, why do the multiplications lead to different results? I expected a consistent result. I thought that in every case the int value is implicitly converted to a float before the multiplication, but there seems to be a difference. What is the reason for this different handling?
int multiply(float val, int multiplier)
{
return val * multiplier;
}
int multiply2(float val, int multiplier)
{
return float(val * multiplier);
}
float val = 1.3f;
int result0 = val * int(10); // 12
int result1 = 1.3f * int(10); // 13
int result3 = multiply(1.3f, 10); //12
int result4 = multiply2(1.3f, 10); // 13
Thank you
Thorsten

What likely happens for you is:
Assuming IEEE or similar floats, 1.3 cannot be represented exactly; the stored value is slightly smaller, something like 1.2999999. If the product with 10 is kept at more precision than a float (as an intermediate result may be), it is 12.999999..., which truncates to 12 when converted to int.
However, 1.3f * 10 can be evaluated at compile time, most likely producing a result that is rounded to exactly 13.
Depending on how your code is actually structured, which compiler is used, and which settings it is used with, either expression could evaluate to 12 or 13, depending on whether the evaluation happens at run time or at compile time.
For completeness, with the following code, I could reproduce it:
extern int result0;
extern int result1;
float val = 1.3f;
void foo( )
{
result0 = val * int(10); // 12
result1 = 1.3f * int(10); // 13
}
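The effect can be reproduced deterministically with a minimal sketch, assuming IEEE 754 binary32/binary64 arithmetic with round-to-nearest-even. Here a double stands in for whatever wider intermediate the compiler happened to use; the function names are hypothetical and not from the question.

```cpp
#include <cassert>

// Product kept in a wider type (double), then truncated toward zero.
int productWide(float val, int multiplier) {
    double wide = static_cast<double>(val) * multiplier;  // 12.99999952... for 1.3f * 10
    return static_cast<int>(wide);
}

// Product rounded back to float first (like multiply2 in the question), then truncated.
int productRounded(float val, int multiplier) {
    double wide = static_cast<double>(val) * multiplier;
    float narrowed = static_cast<float>(wide);  // the nearest float to 12.99999952... is 13.0f
    return static_cast<int>(narrowed);
}

// Expected results under the IEEE 754 assumptions stated above.
void demoProducts() {
    assert(productWide(1.3f, 10) == 12);
    assert(productRounded(1.3f, 10) == 13);
}
```

The only difference between the two functions is the rounding step to float before the conversion to int, which is also the only difference between multiply and multiply2.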

Related

Confusion with simple math in C++

I was playing around with C++ today and this is what I found after doing some tests:
#include <iostream>
using namespace std;
int main()
{
int myInt = 100;
int myInt2 = myInt + (0.5 * 17); // the math inside the parentheses should go first
int difference = myInt2 - myInt; // get the difference of the two numbers
cout << difference << endl; // difference is 8
}
the output is 8
#include <iostream>
using namespace std;
int main()
{
int myInt = 100;
int myInt2 = myInt - (0.5 * 17); // the math inside the parentheses should still go first
int difference = myInt - myInt2; // get the difference of the two numbers
cout << difference << endl; // difference is 9?
}
The output is 9?
So according to my first code sample, 0.5 * 17 = 8, but according to my second code sample, 0.5 * 17 = 9. I am aware that I would get the same results without the parentheses, but I am using them to help illustrate what I am doing in my code.
To help narrow down the problem, I tried this:
#include <iostream>
using namespace std;
int main()
{
int myInt = 100;
int myInt2 = 0.5 * 17; // use a variable instead of parentheses
int myInt3 = myInt + myInt2;
int difference = myInt3 - myInt; // get the difference of the two numbers
cout << difference << endl; // difference is 8
}
the output is 8
#include <iostream>
using namespace std;
int main()
{
int myInt = 100;
int myInt2 = 0.5 * 17; // continue using a variable
int myInt3 = myInt - myInt2;
int difference = myInt - myInt3; // get the difference of the two numbers
cout << difference << endl; // difference is 8 again!
}
the output is 8 again!
So my question is, if the math in the parentheses always comes first, then why am I getting different results? Shouldn't the first two trials have the same results as the second two trials? Another thing I should mention is that I have had these same results with other decimal numbers like 0.1 and 0.3.
The expression 0.5 * 17 results in a floating point value by virtue of the fact that one of its components, 0.5, is floating point.
However, when you assign a floating point number like 8.5 to an integer, it's truncated to 8. And, to be clear about this, it's truncated at the point of assignment.
So the first code segment calculates the value 100 + 8.5 = 108.5 and truncates that to 108 when you assign it to the integer(a). The subtraction of 100 from that then gives you 8.
In the second code segment, you calculate the value 100 - 8.5 = 91.5 and truncate that to 91 when assigning to the integer. The subtraction of that from 100 gives you 9.
The reason the final two code segments work is that the 8.5 is truncated earlier, in int myInt2 = 0.5 * 17, before being added to or subtracted from 100. In that case, 100 + 8 = 108 and 100 - 8 = 92, both of which differ from 100 by exactly 8 (sign notwithstanding).
(a) It may help to think of it graphically:
int myInt2 = myInt + (0.5 * 17);
^ int + double * int
| \ \__________/
| \ |
| \ double
| \_____________/
| |
+- truncate <- double
The 0.5 * 17 is calculated first to give the floating point 8.5.
This is then added to the integer myInt = 100 to give the floating point 108.5.
This is then truncated to 108 when being assigned to the integer myInt2.
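The difference between truncating late and truncating early can be sketched as follows. The function names are made up for illustration; the values are the ones from the question.

```cpp
#include <cassert>

// The whole expression is evaluated as double; the fraction is dropped only at the end.
int truncateLate(int base, double delta) {
    return static_cast<int>(base - delta);
}

// The fraction of delta is dropped first; the subtraction is then pure int arithmetic.
int truncateEarly(int base, double delta) {
    return base - static_cast<int>(delta);
}

void demoTruncation() {
    assert(truncateLate(100, 0.5 * 17) == 91);   // 100 - 8.5 = 91.5 -> 91, difference 9
    assert(truncateEarly(100, 0.5 * 17) == 92);  // 8.5 -> 8, 100 - 8 = 92, difference 8
}
```

Conversion from double to int truncates toward zero, so 91.5 becomes 91 rather than 92, which is where the extra 1 in the second program comes from.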
.5 * 17 is not 8, it is 8.5
int myInt = 100;
int myInt2 = myInt + (0.5 * 17);
This computes 100 + (0.5 * 17) or 108.5, which gets truncated to 108.
int difference = myInt2 - myInt;
This computes 108 - 100, or 8, which is the result you see.
In your second example:
int myInt = 100;
int myInt2 = myInt - (0.5 * 17);
This calculates 100 - 8.5, or 91.5, which gets truncated to 91.
int difference = myInt - myInt2;
This calculates 100 - 91, or 9.
In the same fashion you can work through the rest of your examples.
There's a very useful tool on your computer. It's called a "debugger". Using this tool you can step through any of your programs, one line at a time, and see for yourself the values of all variables at each step of the way.
When doing maths with integers any decimal parts are lost AT EACH STEP.
So .5 * 17 gives 8.5, but the result stored in an integer variable is 8 which is used in subsequent steps, including output.

Bitwise operations and shifts problems

I am testing the function fitsBits(int x, int n) on my own, and I found a case where it gives the wrong result. What is the problem?
/*
* fitsBits - return 1 if x can be represented as an
* n-bit, two's complement integer.
* 1 <= n <= 32
* Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 15
* Rating: 2
*/
int fitsBits(int x, int n) {
int r, c;
c = 33 + ~n;
r = !(((x << c)>>c)^x);
return r;
}
It seems like it gives the wrong answer in
fitsBits(0x80000000, 0x20);
It gives me 1, but actually it should be 0...
How could I fix it?
Thank you!
fitsBits(0x80000000, 0x20);
This function returns 1 because the first argument of your function is int, which is (in practice these days) a 32 bit signed integer. The largest value that a signed 32 bit integer can represent is 0x7FFFFFFF, which is less than the value you are passing in. Because of that, your value is converted on the way in and becomes -0x80000000, something that a 32 bit integer can represent. Therefore your function returns 1 (yes, my first argument is something that can be represented using 0x20 = 32 bits).
If you want your function to properly classify the number 0x80000000 as something that cannot be represented using 32 bits, you need to change the type of the first argument of your function. One option would have been an unsigned int, but from your problem definition it seems you need to handle negative numbers properly, so your remaining option is long long int, which can hold numbers between -0x8000000000000000 and 0x7FFFFFFFFFFFFFFF.
You will need to make a couple more adjustments: you need to explicitly specify that your constant is of type long long by using the LL suffix, and you now need to shift by 64 - n, not by 32 - n:
#include <stdio.h>
int fitsBits(long long x, int n) {
long long r;
int c;
c = 65 + ~n;
r = !(((x << c)>>c)^x);
return r;
}
int main() {
printf("%d\n", fitsBits(0x80000000LL, 0x20));
return 0;
}
Link to IDEONE: http://ideone.com/G8I3kZ
Left shifts that cause overflow are undefined for signed types. Hence the compiler may optimise (x<<c)>>c as simply x, and the entire function reduces down to return 1;.
Probably you want to use unsigned types.
A second cause of undefined behavior in your code is that c may be greater than or equal to the width of int if n falls outside the documented range (n = 0 gives c = 32). A shift by an amount greater than or equal to the width of the integer type is undefined behavior.
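For comparison, here is a sketch of the same check written as a plain range comparison, which involves no signed shift of x and therefore no undefined behavior. It deliberately ignores the puzzle's list of legal operators, and the name fitsBitsChecked is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns 1 if x is representable as an n-bit two's complement integer, else 0.
// Assumes 1 <= n <= 32, as in the original problem statement.
int fitsBitsChecked(int x, int n) {
    assert(n >= 1 && n <= 32);
    // The bounds are computed in 64 bits so that n == 32 cannot overflow.
    const std::int64_t hi = (std::int64_t{1} << (n - 1)) - 1;  // 2^(n-1) - 1
    const std::int64_t lo = -hi - 1;                           // -2^(n-1)
    return x >= lo && x <= hi;
}
```

With this version fitsBitsChecked(5, 3) is 0 and fitsBitsChecked(-4, 3) is 1, matching the examples in the problem comment. Any int passes the check for n = 32, which is consistent with the first answer: once 0x80000000 has been converted to int, it is a value that does fit in 32 bits.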
r = (((x << c)>>c)^x); //This will give you 0, meaning r = 0;
OR
r = !((x << c)>>c);
Your function can be simplified to
int fitsBits(int x) {
int r, c;
c = 33;
r = (((x << c)>>c)^x);
return r;
}
Note that when NOT (!) is applied, you are asking for the logical opposite of r.

Why is my double or int value is always 0 after division?

I'm fairly new to C++ and I'm experiencing some strange behaviour from a percentage increase method I am writing for some image editing software.
What I want to do is give the R G or B value of the current pixel and divide it by some modifier, then multiply it by the new value to return the percentage increase, fairly easy concept.
However, whenever I run my debugger, the return value is always 0, I thought this may be because I was trying to do operations which give negative numbers on an integer (or maybe a divide by zero could occur?), so I tried to use a double to store the output of the computation, however I've had no luck.
The code I'm struggling with is below:
int Sliders::getPercentageIncrease(int currPixel, int newValue, int modifier)
{
// calculate return value
double returnVal = (currPixel / modifier) * newValue;
// Check we are returning a positive integer
if(returnVal >= 0)
return (int)returnVal;
// Return a negative integer value
return (int)(0 - returnVal);
}
What am I doing wrong here?
NOTE: I have checked values, of inputs in my debugger and I get stuff like:
currPixel = 30
newValue = 119
modifier = 200
From this I would expect an output of 18 (I am not concerned with returning decimal figures)
Your current calculation only involves integers and so will be affected by integer division (which discards the fractional part, truncating the result toward zero).
(currPixel / modifier) * newValue
| |
---------------integer division e.g. 10/3 = 3, not 3.333
The result is then cast to double, but the accuracy is lost before this point.
Consider the following:
#include <iostream>
using namespace std;
int main() {
int val1 = 10;
int val2 = 7;
int val3 = 9;
double outval1 = (val1 / val2) * val3;
double outval2 = ((double)val1 / val2) * val3;
cout << "without cast: " << outval1 << "\nwith cast: "<< outval2 << std::endl;
return 0;
}
The output of this is:
without cast: 9
with cast: 12.8571
Note that the cast has to be applied in the right place:
(double)(val1 / val2) * val3 == 9.0 //casts result of (val1/val2) after integer division
(val1 / val2) * (double)val3 == 9.0 //promotes result of (val1/val2) after integer division
((double)val1 / val2) * val3 == 12.8571 //promotes val2 before division
(val1 / (double)val2) * val3 == 12.8571 //promotes val1 before division
Due to promotion of the other operands, if in doubt you can just cast everything and the resulting code will be the same:
((double)val1 / (double)val2) * (double)val3 == 12.8571
It is a little more verbose though.
Since all three parameters are integer the result of the calculation
double returnVal = (currPixel / modifier) * newValue;
will always be truncated. Add cast to (double) and the result should be fine. Simply:
double returnVal = ((double)currPixel / modifier) * newValue;
If you only set a cast before the bracket the result of the division stays an integer.
As long as all values are within a modest range, say greater than or equal to 0 and less than 1000, which is common for colour values, do something like
int returnVal = (currPixel * newValue) / modifier;
No need for doubles; it will even speed up the code.
Needless to say, modifier should not be zero.
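A minimal sketch of the integer-only approach, using the values from the question. The function names are hypothetical, and the rounding variant is an addition that goes beyond what the answer proposes; both assume non-negative inputs small enough that the product fits in an int.

```cpp
#include <cassert>

// Multiply first, divide last, so no fraction is thrown away before the multiplication.
int scaleTruncated(int currPixel, int newValue, int modifier) {
    assert(modifier > 0);
    return (currPixel * newValue) / modifier;
}

// Same calculation, rounded to the nearest integer by adding half the divisor first.
int scaleRounded(int currPixel, int newValue, int modifier) {
    assert(modifier > 0);
    return (currPixel * newValue + modifier / 2) / modifier;
}
```

For currPixel = 30, newValue = 119 and modifier = 200 the exact value is 17.85, so scaleTruncated gives 17 and scaleRounded gives the 18 the question expects, whereas (30 / 200) * 119 is 0.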
Do this:
// calculate return value
double returnVal = (static_cast<double>(currPixel) / modifier) * newValue;
Or this:
double returnVal = (currPixel / static_cast<double>(modifier)) * newValue;
As you know, operator / is evaluated first here, and then operator *. I have cast one of the operands of / to double, so the division is performed in double. The left operand of * is then a double (since / produced a double), so the multiplication is performed in double as well. For clarity and correctness, you may write:
double returnVal = (static_cast<double>(currPixel) / static_cast<double>(modifier)) * static_cast<double>(newValue);
Or simply:
double returnVal = (double(currPixel) / (double)modifier) * (double)newValue;
But, following is WRONG:
double returnVal = (double)(currPixel / modifier) * /*(double)*/ newValue;
Because the division would still be performed on ints only! It is like:
double x = 10/3;
Where you need (either):
double x = 10.0/3;
double x = 10/3.0;
double x = (double)10/3;
Casting to double should fix the error.
double returnVal = (double ) (currPixel) / (modifier) * newValue;
See the typecasting rules in C.

Modulo division returning negative number

I am carrying out the following modulo division operations from within a C program:
(5^6) mod 23 = 8
(5^15) mod 23 = 19
I am using the following function, for convenience:
int mod_func(int p, int g, int x) {
return ((int)pow((double)g, (double)x)) % p;
}
But the result of the operations when calling the function is incorrect:
mod_func(23, 5, 6) //returns 8
mod_func(23, 5, 15) //returns -6
Does the modulo operator have some limit on the size of the operands?
5 to the power 15 is 30,517,578,125
The largest value you can store in an int is 2,147,483,647
You could use 64-bit integers, but beware you'll have precision issues when converting from double eventually.
From memory, there is a rule from number theory about the calculation you are doing that means you don't need to compute the full power expansion in order to determine the modulo result. But I could be wrong. Been too many years since I learned that stuff.
Ahh, here it is: Modular Exponentiation
Read that, and stop using double and pow =)
int mod_func(int p, int g, int x)
{
    int r = 1 % p; // g^0; this is 0 when p == 1
    g %= p;        // reduce the base once so r * g stays small
    for( int i = 0; i < x; i++ ) {
        r = (r * g) % p;
    }
    return r;
}
The integral part of pow(5, 15) is not representable in an int (assuming the width of int is 32-bit). The conversion (from double to int in the cast expression) is undefined behavior in C and in C++.
To avoid undefined behavior, you should use the fmod function to perform the floating point remainder operation.
My guess is the problem is 5 ^ 15 = 30517578125 which is greater than INT_MAX (2147483647). You are currently casting it to an int, which is what's failing.
As has been said, your first problem in
int mod_func(int p, int g, int x) {
return ((int)pow((double)g, (double)x)) % p;
}
is that pow(g,x) often exceeds the int range, and then you have undefined behaviour converting that result to int, and whatever the resulting int is, there is no reason to believe it has anything to do with the desired modulus.
The next problem is that the result of pow(g,x) as a double may not be exact. Unless g is a power of 2, the mathematical result cannot be exactly represented as a double for large enough exponents even if it is in range, but it could also happen if the mathematical result is exactly representable (depends on the implementation of pow).
If you do number-theoretic computations - and computing the residue of a power modulo an integer is one - you should only use integer types.
For the case at hand, you can use exponentiation by repeated squaring, computing the residue of all intermediate results. If the modulus p is small enough that (p-1)*(p-1) never overflows,
int mod_func(int p, int g, int x) {
int aux = 1;
g %= p;
while(x > 0) {
if (x % 2 == 1) {
aux = (aux * g) % p;
}
g = (g * g) % p;
x /= 2;
}
return aux;
}
does it. If p can be larger, you need to use a wider type for the calculations.
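A sketch of the wider-type variant, assuming the modulus fits in 31 bits so that every intermediate product fits in a signed 64-bit integer. The name modPow and the choice of std::int64_t are illustrative; the parameter order follows mod_func.

```cpp
#include <cassert>
#include <cstdint>

// Computes (g^x) mod p by repeated squaring, reducing after every multiplication.
// Assumes 1 <= p <= INT32_MAX and x >= 0, so (p - 1) * (p - 1) fits in 64 bits.
std::int64_t modPow(std::int64_t p, std::int64_t g, std::int64_t x) {
    assert(p >= 1 && p <= INT32_MAX && x >= 0);
    std::int64_t result = 1 % p;  // 0 when p == 1
    g %= p;
    if (g < 0) g += p;            // normalise a negative base into [0, p)
    while (x > 0) {
        if (x & 1) result = (result * g) % p;
        g = (g * g) % p;
        x >>= 1;
    }
    return result;
}
```

With the values from the question, modPow(23, 5, 6) is 8 and modPow(23, 5, 15) is 19.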

Can I rely on this to judge a square number in C++?

Can I rely on
sqrt((float)a)*sqrt((float)a)==a
or
(int)sqrt((float)a)*(int)sqrt((float)a)==a
to check whether a number is a perfect square? Why or why not?
int a is the number to be judged. I'm using Visual Studio 2005.
Edit: Thanks for all these rapid answers. I see that I can't rely on float type comparison. (If I wrote as above, will the last a be cast to float implicitly?) If I do it like
(int)sqrt((float)a)*(int)sqrt((float)a) - a < e
How small should I take that e value?
Edit2: Hey, why don't we leave the comparison part aside, and decide whether the (int) is necessary? As I see, with it, the difference might be great for squares; but without it, the difference might be small for non-squares. Perhaps neither will do. :-(
Actually, this is not a C++, but a math question.
With floating point numbers, you should never rely on equality. Where you would test a == b, test fabs(a - b) < eps instead, where eps is a small number (e.g. 1E-6) that you treat as a good enough approximation.
If the number you are testing is an integer, you might be interested in the Wikipedia article about Integer square root
EDIT:
As Krugar said, the article I linked does not answer anything. Sure, there is no direct answer to your question there, phoenie. I just thought that the underlying problem you have is floating point precision and maybe you wanted some math background to your problem.
For the impatient, there is a link in the article to a lengthy discussion about implementing isqrt. It boils down to the code karx11erx posted in his answer.
If you have integers which do not fit into an unsigned long, you can modify the algorithm yourself.
If you don't want to rely on float precision then you can use the following code that uses integer math.
The Isqrt is taken from here and is O(log n)
// Finds the integer square root of a positive number
static int Isqrt(int num)
{
if (num <= 0) { return 0; } // Avoid zero divide (also for negative input, which has no real root)
int n = (num / 2) + 1; // Initial estimate, never low
int n1 = (n + (num / n)) / 2;
while (n1 < n)
{
n = n1;
n1 = (n + (num / n)) / 2;
} // end while
return n;
} // end Isqrt()
static bool IsPerfectSquare(int num)
{
return Isqrt(num) * Isqrt(num) == num;
}
Not to do the same calculation twice I would do it with a temporary number:
int b = (int)sqrt((float)a);
if((b*b) == a)
{
//perfect square
}
edit:
dav made a good point: instead of relying on the cast, you need to round the float first,
so it should be:
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a)
{
//perfect square
}
Your question has already been answered, but here is a working solution.
Your 'perfect squares' are implicitly integer values, so you could easily solve floating point format related accuracy problems by using some integer square root function to determine the integer square root of the value you want to test. That function will return the biggest number r for a value v where r * r <= v. Once you have r, you simply need to test whether r * r == v.
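#include <stdint.h>
// Bit-by-bit integer square root; returns floor(sqrt(a)).
// uint32_t rather than unsigned long: the shift by 30 assumes a 32-bit operand.
uint16_t isqrt (uint32_t a)
{
    uint32_t rem = 0;
    uint32_t root = 0; // holds twice the root found so far
    for (int i = 16; i; i--) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30)); // bring down the next two bits of a
        a <<= 2;
        if (root < rem) {
            rem -= ++root; // subtract 2r + 1
            ++root;        // and set the new bit of the root
        }
    }
    return (uint16_t) (root >> 1);
}
bool PerfectSquare (uint32_t a)
{
    uint32_t r = isqrt (a); // widened so that r * r cannot overflow int
    return r * r == a;
}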
I didn't follow the formula, I apologize.
But you can easily check if a floating point number is an integer by casting it to an integer type and compare the result against the floating point number. So,
bool isSquare(long val) {
double root = sqrt(val);
if (root == (long) root)
return true;
else return false;
}
Naturally this is only doable if you are working with values that you know will fit within the integer type range. But being that the case, you can solve the problem this way, saving you the inherent complexity of a mathematical formula.
As reinier says, you need to add 0.5 to make sure it rounds to the nearest integer, so you get
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a) /* perfect square */
For this to work, b has to be (exactly) equal to the square root of a if a is a perfect square. However, I don't think you can guarantee this. Suppose that int is 64 bits and float is 32 bits (I think that's allowed). Then a can be of the order 2^60, so its square root is of order 2^30. However, a float only stores 24 bits in the significand, so the rounding error is of order 2^(30-24) = 2^6. This is larger than 1, so b may contain the wrong integer. For instance, I think that the above code does not identify a = (2^30+1)^2 as a perfect square.
I would do.
// sqrt always returns positive value. So casting to int is equivalent to floor()
int down = static_cast<int>(sqrt(value));
int up = down+1; // This is the ceil(sqrt(value))
// Because of rounding problems I would test the floor() and ceil()
// of the value returned from sqrt().
if (((down*down) == value) || ((up*up) == value))
{
// We have a winner.
}
The more obvious, if slower -- O(sqrt(n)) -- way:
bool is_perfect_square(int i) {
int d = 1;
for (int x = 0; x <= i; x += d, d += 2) {
if (x == i) return true;
}
return false;
}
While others have noted that you should not test for equality with floats, I think you are missing out on chances to take advantage of the properties of perfect squares. First there is no point in re-squaring the calculated root. If a is a perfect square then sqrt(a) is an integer and you should check:
b = sqrt((float)a)
b - floor(b) < e
where e is set sufficiently small. There are also a number of integers that you can cross off as non-square before taking the square root. Checking Wikipedia you can see some necessary conditions for a to be square:
A square number can only end with digits 00, 1, 4, 6, 9, or 25 in base 10.
Another simple check would be to see that a % 4 == 1 or 0 before taking the root since:
Squares of even numbers are even, since (2n)^2 = 4n^2. Squares of odd numbers are odd, since (2n + 1)^2 = 4(n^2 + n) + 1.
These would essentially eliminate half of the integers before taking any roots.
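As a rough sketch, the two necessary conditions can be combined into a cheap pre-filter that runs before any square root is taken. The name mayBeSquare is hypothetical, and the last-digit test is simplified to a single digit (0, 1, 4, 5, 6 or 9), which is slightly weaker than the two-digit rule quoted above.

```cpp
#include <cassert>

// Returns false only when a is certainly not a perfect square.
// A true result still has to be confirmed with a real square-root check.
bool mayBeSquare(unsigned a) {
    const unsigned lastDigit = a % 10u;
    const bool digitOk = lastDigit == 0 || lastDigit == 1 || lastDigit == 4 ||
                         lastDigit == 5 || lastDigit == 6 || lastDigit == 9;
    return digitOk && (a % 4u) <= 1u;  // squares are congruent to 0 or 1 mod 4
}

void demoFilter() {
    assert(mayBeSquare(49));   // 7 * 7
    assert(!mayBeSquare(50));  // 50 % 4 == 2
    assert(!mayBeSquare(13));  // ends in 3
    assert(mayBeSquare(5));    // passes the filter although 5 is not a square
}
```

The conditions are necessary but not sufficient, as the last case shows, so the filter can only reject candidates early.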
The cleanest solution is to use an integer sqrt routine, then do:
bool isSquare( unsigned int a ) {
unsigned int s = isqrt( a );
return s * s == a;
}
This will work in the full int range and with perfect precision. A few cases:
a = 0, s = 0, s * s = 0 (add an exception if you don't want to treat 0 as square)
a = 1, s = 1, s * s = 1
a = 2, s = 1, s * s = 1
a = 3, s = 1, s * s = 1
a = 4, s = 2, s * s = 4
a = 5, s = 2, s * s = 4
Won't fail either as you approach the maximum value for your int size. E.g. for 32-bit ints:
a = 0x40000000, s = 0x00008000, s * s = 0x40000000
a = 0xFFFFFFFF, s = 0x0000FFFF, s * s = 0xFFFE0001
Using floats you run into a number of issues. You may find that sqrt( 4 ) = 1.999999..., and similar problems, although you can round-to-nearest instead of using floor().
Worse though, a float has only 24 significant bits, which means that not every int larger than 2^24 can be converted to a float without losing precision, which introduces false positives/negatives. Using doubles for testing 32-bit ints, you should be fine, though.
But remember to cast the result of the floating-point sqrt back to an int and compare the result to the original int. Comparisons between floats are never a good idea; even for square values of x in a limited range, there is no guarantee that sqrt( x ) * sqrt( x ) == x, or that sqrt( x * x) = x.
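The double-based variant for 32-bit values could look like the sketch below. It assumes an IEEE 754 double (53-bit significand, so every 32-bit integer converts exactly); the name isSquare32 is hypothetical. The final comparison is done on integers, as recommended above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Perfect-square test for 32-bit unsigned values via a double square root.
bool isSquare32(std::uint32_t a) {
    const double root = std::sqrt(static_cast<double>(a));
    // Round to nearest instead of truncating, then verify with exact integer arithmetic.
    const std::uint64_t s = static_cast<std::uint64_t>(std::llround(root));
    return s * s == a;  // computed in 64 bits, so 65536 * 65536 cannot wrap
}
```

The product is formed in 64 bits because the rounded root of 0xFFFFFFFF is 65536, whose square does not fit in 32 bits.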
basics first:
If you cast a number to (int) in a calculation, it will remove ALL post-comma data. Note that in a mixed calculation (+ - * /) C converts the int operand to the floating-point type of the other operand, so truncation happens only where you explicitly cast or assign to an int.
So in your case you want float on every number involved, otherwise you will lose data:
sqrt((float)a)*sqrt((float)a)==(float)a
is the way you want to go
Floating point math is inaccurate by nature.
So consider this code:
int a=35;
float conv = (float)a;
float sqrt_a = sqrt(conv);
if( sqrt_a*sqrt_a == conv )
printf("perfect square");
this is what will happen:
a = 35
conv = 35.000000
sqrt_a = 5.916079
sqrt_a*sqrt_a = 34.999990734
This makes it amply clear that sqrt_a^2 is not equal to a.