Multiplying float with double overflow in C++ - c++

I am a bit confused about the point of this warning:
Arithmetic overflow: Using operator '*' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '*' to avoid overflow.
#include <iostream>
using std::cin;
using std::cout;
using std::ios_base;
int main() {
cout.setf(ios_base::fixed, ios_base::floatfield);
double mints = 10.0 / 3.0;
const float c_MILLION = 1e6;
cout << "\n10 million mints: " << 10 * c_MILLION * mints;
cin.get();
}
According to my understanding, when we multiply a float value by a double value we are basically multiplying a 4-byte value by an 8-byte value, and we will hence lose some precision, according to the links that I have read:
Cannot implicitly convert type 'double' to 'float'
Multiply a float with a double
http://www.cplusplus.com/articles/DE18T05o/#:~:text=Overflow%20is%20a%20phenomenon%20where,be%20larger%20than%20the%20range
However, when I do output this, I get a double value
https://i.stack.imgur.com/EOQzm.png
If that is the case, why does it bother to warn me to cast c_MILLION to a double value if it is automatically changing it to a double result? It can't convert an 8-byte value to a 4-byte value anyway. So why does it bother to warn programmers when it is already saving us from this trouble? Or can it convert an 8-byte value to a 4-byte value as well? If so, how does it determine what type to print? This is a question that I cannot find the answer to in the links I read.
If it is automatically converting the result to an 8-byte value, what is the point of displaying this warning?
Here is the warning:
https://i.stack.imgur.com/L2szy.png
Severity Code Description Project File Line Suppression State
Warning C26451 Arithmetic overflow: Using operator '*' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '*' to avoid overflow (io.2)
The problem was that I was multiplying an int value with a double value. But still, the warning should not exist when the product of an int and a double is automatically converted to a double value.

The warning is because of this multiplication: 10 * c_MILLION. There can be some values of c_MILLION where some precision is lost that would not have been lost if c_MILLION was first converted to a double. Since the result of this multiplication is converted to double, a mistaken programmer might assume that no precision was lost beyond what might be expected if the operands were double in the first place. Hence the warning.
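A minimal sketch of the precision difference described above. The operand values (3 and 16777215) and the function names are chosen for illustration and are not from the question; the sketch assumes float is the 32-bit IEEE 754 type, as on most current platforms.

```cpp
// Multiply in float: the int factor is converted to float, and the product
// is rounded to float's 24-bit significand when it is stored.
float product_in_float(int factor, float x) {
    float result = factor * x;
    return result;
}

// Multiply in double: x is widened first, so the product keeps more digits.
double product_in_double(int factor, float x) {
    return factor * static_cast<double>(x);
}
```

With factor 3 and x = 16777215.0f, the exact product 50331645 is not representable in float, so the two functions return different values even though both results can later be stored in a double.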

Related

Warning about arithmetic overflow when multiplying numbers

I'm writing a program to calculate the result of numbers:
int main()
{
float a, b;
cin >> a >> b;
float result = b + a * a * 0.4;
cout << result;
}
but I have a warning at a * a, and it says: Warning C26451 Arithmetic overflow: Using operator '*' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '*' to avoid overflow (io.2). Sorry if this is a newbie question, can anyone help me with this? Thank you!
In the C language as described in the first edition of K&R, all floating-point operations were performed by converting operands to a common type (specifically double), performing operations with that type, and then if necessary converting the result to whatever type was needed. On many platforms, that was the most convenient and space-efficient way of handling floating-point math. While the Standard still allows implementations to behave that way, it also allows them to perform floating-point operations on smaller types using those types directly.
As written, the subexpression a * a * 0.5 would be evaluated by multiplying a by a using type float, then multiplying by the value 0.5, which is of type double. This latter multiplication would require converting the float result of a * a to double. If e.g. a had been equal to 2E19f, then performing the multiply using type float would yield a value too large to be represented using that type. Had the code instead performed the multiplication using type double, then the result 4E38 would be representable in that type, and the result of multiplying that by 0.5 (i.e. 2E38) would be within the range that is representable by float.
Note that in this particular situation, the use of float for the intermediate computations would only affect the result if a was within narrow ranges of very large or very small values. If instead of multiplying by 0.5 one had multiplied by other values, however, the choice of whether to use float or double for the first multiplication could affect the accuracy of rounding. Generally, using double for both multiplies would yield slightly more accurate results, but at the expense of increased execution time. Using float for both may yield better execution speed, but at the cost of reduced precision. If the floating-point constant had been something that isn't precisely representable in float, converting to double and multiplying by a double constant may yield slightly more accurate results than using float for everything, but in most cases where one would want that precision, one would also want the increased precision that would be achieved by using double for the first multiply as well.
Let's look at the error message.
Using operator '*' on a 4 byte value
It is describing this code:
a * a
Your float is 4 bytes. The result of the multiplication is 4 bytes. And the result of a multiplication may overflow.
and then casting the result to a 8 byte value.
It is describing this code:
(result) * 0.4;
Your result is 4 bytes. 0.4 is a double, which is 8 bytes. C++ will promote your float result to a double before performing this multiplication.
So...
The compiler is observing that you are doing float math that could overflow and then immediately converting the result to a double, making the potential overflow unnecessary.
Change the code to this to remove the float to double to float conversions.
float result = b + a * a * 0.4f;
I read the question as "how to change the code to remove the warning?".
If you take the advice in the warning's text literally:
float result = b + (double)a * a * 0.4;
But this is nonsense — if an overflow happens, your result will probably not fit into float result.
It looks like in your case overflow is not possible, and you feel perfectly fine doing all calculations with float. If so, just write 0.4f instead of 0.4 — then the constant will have float type (instead of double), and the result will also be float.
If you want to "fix" the problem with overflow:
double result = b + (double)a * a * 0.4;
But then you must also change the following code, which uses the result. And you don't remove the possibility of overflow, you just make it much less likely.

How do I end up with a negative value in this chain of transformations?

I am displaying pressure values in the range of 0 - 5, with two decimal places. However, recently I got a negative value of -0.05 shown.
Can someone explain how this transformation chain can end up with a negative value? The only idea I have right now is an unreasonably high input value, but maybe I am not seeing something.
(Copied together from various source files)
double value = <the input value in bar>;
value -= 1; //subtract atmospheric pressure
value *= 100; // preserve decimal places
unsigned int ui_value = static_cast<unsigned int>(value);
int intValue = ui_value; // uint value is input into a function taking int as argument
double v = ((double)intValue) / std::pow(10.0f, 2); // scale back
std::stringstream tmp;
tmp << std::fixed << std::setprecision(2) << v;
Somehow this ended up giving me -0.05. I have no idea what the input was, but if everything was functioning correctly, then it should have been a bar value around 2.5 - 3.5.
What happened here that could have resulted in a negative value? Note that there were 4 pressure sensors all giving the same wonky result, so I somewhat doubt it was a hardware issue, but at the same time I don't see how a "normal" input value could have resulted in a negative value here. Even with a negative input I would expect the conversion to uint to get rid of the sign and give a wrong but positive result.
In case this is somehow compiler dependent, I am using Embarcadero C++ Builder 10.1 Berlin.
Converting a double to unsigned is undefined behavior when unsigned cannot represent the value, for example when the double value is negative. From the cppreference page on implicit conversions (emphasis mine):
A prvalue of floating-point type can be converted to a prvalue of any integer type. The fractional part is truncated, that is, the fractional part is discarded. If the value cannot fit into the destination type, the behavior is undefined (even when the destination type is unsigned, modulo arithmetic does not apply).
To remove the sign, just use std::abs ("abs" is short for absolute value). Do:
double v = std::abs(value) / 100.0f;

conversion from long double to long long int

I have a long double sine and a long long int amp. I am using <cmath> and have code as follows:
sine = sin(point);
amp = round(sine * 2^31);
Here the variable point is incrementing in 0.009375 intervals. The first line here works fine but on the second I receive this error message:
error: invalid operands of types 'long double' and 'int' to binary 'operator^'
I'm unsure what this error means and the main request here is 'How can I get around this to get an output integer into the variable amp?'
In C++ the ^ operator means exclusive or, not exponentiation. You probably meant (1ULL << 31).
The reason for the error is that * is multiplication, and ^ is the bitwise xor operator which can only be applied to integral types.
Multiplication (*) has higher precedence than ^. So the compiler interprets amp = round(sine * 2^31); as amp = round( (sine *2)^31);. sine (presumably) has type long double, so the result of sine*2 is also of type long double. long double is not an integral type, so cannot be an operand of ^. Hence the error.
Your mistake is assuming that ^ represents exponentiation, which it does not.
You can fix the problem by either
amp = round (sine * pow(2.0, 31)); // all floating point
or
amp = round (sine * (1UL << 31));
The second computes 1 left-shifted by 31 bits as an unsigned long (which is guaranteed to be able to represent the result, unlike unsigned or int, for which there is no such guarantee). Then, in doing the multiplication, it converts that value to long double.
If you are doing predominantly floating point operations, the first is more understandable to people who will maintain such code. The second is probably rather cryptic to someone who writes numeric code but is not well acquainted with bit fiddling operations - as, ironically, you have demonstrated in your belief that ^ is exponentiation.
You would need to test to determine which option offers greater performance (given the need to convert unsigned long to long double in the second, and the potential for std::pow() in the first to be optimised for some special cases). In other words, there is potential for the compiler optimiser to get aggressive in both cases, or for the implementation of pow() to be lovingly hand-crafted, or both.
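Both fixes can be wrapped in small functions for comparison. This is a sketch with hypothetical function names; it uses std::llround rather than round so that the result is already a long long, which is a substitution and not what the original answers wrote.

```cpp
#include <cmath>

// Scale by 2^31 using a shift. 1UL << 31 fits in unsigned long, which is
// at least 32 bits wide; the value is converted to long double for the
// multiplication.
long long scale_with_shift(long double sine) {
    return std::llround(sine * (1UL << 31));
}

// Scale by 2^31 using floating point only.
long long scale_with_pow(long double sine) {
    return std::llround(sine * std::pow(2.0, 31));
}
```

For a sine of 1.0 the shift version yields 2147483648, which is why the destination needs to be wider than a 32-bit int.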

Confusion about float data type declaration in C++

a complete newbie here. For my school homework, I was given to write a program that displays -
s= 1 + 1/2 + 1/3 + 1/4 ..... + 1/n
Here's what I did -
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
int a;
float s=0, n;
cin>>a;
for(n=1;n<=a;n++)
{
s+=1/n;
}
cout<<s;
getch();
}
It perfectly displays what it should. However, in the past I have only written programs which uses int data type. To my understanding, int data type does not contain any decimal place whereas float does. So I don't know much about float yet. Later that night, I was watching some video on YouTube in which he was writing the exact same program but in a little different way. The video was in some foreign language so I couldn't understand it. What he did was declared 'n' as an integer.
int a, n;
float s=0;
instead of
int a;
float s=0, n;
But this was not displaying the desired result. So he went ahead and showed two ways to correct it. He made changes in the for loop body -
s+=1.0f/n;
and
s+=1/(float)n;
To my understanding, he declared 'n' a float data type later in the program(Am I right?). So, my question is, both display the same result but is there any difference between the two? As we are declaring 'n' a float, why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. And in the second method, why we can't write 1(float)/n instead of 1/(float)n? As in the first method we have added float suffix with 1. Also, is there a difference between 1.f and 1.0f?
I tried to google my question but couldn't find any answer. Also, another confusion that came to my mind after a few hours is - Why are we even declaring 'n' a float? As per the program, the sum should come out as a real number. So, shouldn't we declare only 's' a float. The more I think the more I confuse my brain. Please help!
Thank You.
The reason is that integer division behaves differently from floating point division.
4 / 3 gives you the integer 1. 10 / 3 gives you the integer 3.
However, 4.0f / 3 gives you the float 1.3333..., 10.0f / 3 gives you the float 3.3333...
So if you have:
float f = 4 / 3;
4 / 3 will give you the integer 1, which will then be stored into the float f as 1.0f.
You instead have to make sure either the divisor or the dividend is a float:
float f = 4.0f / 3;
float f = 4 / 3.0f;
If you have two integer variables, then you have to convert one of them to a float first:
int a = ..., b = ...;
float f = (float)a / b;
float f = a / (float)b;
The first is equivalent to something like:
float tmp = a;
float f = tmp / b;
Since n will only ever have an integer value, it makes sense to define it as an int. However, doing so means that this won't work as you might expect:
s+=1/n;
In the division operation both operands are integer types, so it performs integer division which means it takes the integer part of the result and throws away any fractional component. So 1/2 would evaluate to 0 because dividing 1 by 2 results in 0.5, and throwing away the fraction results in 0.
This is in contrast to floating point division, which keeps the fractional component. C will perform floating point division if either operand is a floating point type.
In the case of the above expression, we can force floating point division by performing a typecast on either operand:
s += (float)1/n
Or:
s += 1/(float)n
You can also specify the constant 1 as a floating point constant by giving a decimal component:
s += 1.0/n
Or appending the f suffix:
s += 1.0f/n
The f suffix (as well as the U, L, and LL suffixes) can only be applied to numerical constants, not variables.
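The difference between the two forms of division can be sketched with the sum from the question. The function names are hypothetical; n is an int in both versions, and only the form of the division changes.

```cpp
// Integer division: 1 / n is 0 for every n greater than 1, so only the
// first term contributes to the sum.
float harmonic_sum_int_division(int a) {
    float s = 0;
    for (int n = 1; n <= a; n++) {
        s += 1 / n;
    }
    return s;
}

// Floating point division: the float constant forces n to be converted
// to float before dividing, so the fractional part is kept.
float harmonic_sum_float_division(int a) {
    float s = 0;
    for (int n = 1; n <= a; n++) {
        s += 1.0f / n;
    }
    return s;
}
```

For a = 4 the first function returns 1, while the second returns approximately 2.0833.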
What he is doing is called casting. I'm sure your school will cover it in later lectures. Basically, n is declared as an integer for the entire program. But since integer and float are similar (both are numbers), the C/C++ language allows you to use one as the other as long as you tell the compiler what you want to use it as. You do this by putting the data type in parentheses before the value, i.e.
(float) n
he declared 'n' a float data type later in the program(Am I right?)
No, he defined (and thereby also declared) n as an int, and later explicitly converted (cast) it to a float. The two are very different.
both display the same result but is there any difference between the two?
Nope. They're the same in this context. When an arithmetic operator has int and float operands, the former is implicitly converted into the latter, and the result will also be a float. He's just shown you two ways to do it. When both operands are integers, you get an integer result, which may be incorrect when proper mathematical division would give a non-integer quotient. To avoid this, one of the operands is usually made into a floating-point number so that the actual result is closer to the expected result.
why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. [...] Also, is there a difference between 1.f and 1.0f?
This is because the language syntax is defined that way. When you write a floating-point literal, the suffix that makes it a float is f. So 5 would be an int while 5.0f or 5.f is a float; there's no difference when you omit the trailing 0s after the decimal point. However, n.f is an error: n is an identifier (a variable name), not a numeric literal, so the compiler reads n.f as an attempt to access a member named f.
And in the second method, why we can't write 1(float)/n instead of 1/(float)n?
(float)n is a valid, C-style casting of the int variable n, while 1(float) is just syntax error.
s+=1.0f/n;
and
s+=1/(float)n;
... So, my question is, both display the same result but is there any difference between the two?
Yes.
In both C and C++, when a calculation involves expressions of different types, one or more of those expressions will be "promoted" to the type with greater precision or range. So if you have an expression with signed and unsigned operands, the signed operand will be "promoted" to unsigned. If you have an expression with float and double operands, the float operand will be promoted to double.
Remember that division with two integer operands gives an integer result - 1/2 yields 0, not 0.5. To get a floating point result, at least one of the operands must have a floating point type.
In the case of 1.0f/n, the expression 1.0f has type float (see note 1 below), so the n will be "promoted" from type int to type float.
In the case of 1/(float) n, the expression n is being explicitly cast to type float, so the expression 1 is promoted from type int to float.
Nitpicks:
Unless your compiler documentation explicitly lists void main() as a legal signature for the main function, use int main() instead. From the online C++ standard:
3.6.1 Main function
...
2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a declared return type of type int, but otherwise its type is implementation-defined...
Secondly, please format your code - it makes it easier for others to read and debug. Whitespace and indentation are your friends - use them.
1. The constant expression 1.0 with no suffix has type double. The f suffix tells the compiler to treat it as float. 1.0/n would result in a value of type double.
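The result types discussed in this answer can be checked at compile time. This is a small sketch using static_assert and std::is_same from the standard <type_traits> header; the variable n stands in for the loop variable from the question.

```cpp
#include <type_traits>

// n is an int, as in the video's version of the program.
int n = 2;

// decltype does not evaluate its operand; it only reports the type.
static_assert(std::is_same<decltype(1 / n), int>::value,
              "int / int is integer division");
static_assert(std::is_same<decltype(1.0f / n), float>::value,
              "float / int converts the int to float");
static_assert(std::is_same<decltype(1 / (float)n), float>::value,
              "int / float converts the int to float");
static_assert(std::is_same<decltype(1.0 / n), double>::value,
              "an unsuffixed 1.0 is a double");
```

If any of these claims were wrong, the file would fail to compile with the corresponding message.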

Issues while printing float values

#include<stdio.h>
#include<math.h>
int main()
{
float i = 2.5;
printf("%d\n%d\n%d",i,i,i);
}
When I compile this using gcc and run it, I get this as the output:
0
1074003968
0
Why doesn't it print just
2
2
2
You're passing a float (which will be converted to a double) to printf, but telling printf to expect an int. The result is undefined behavior, so at least in theory, anything could happen.
What will typically happen is that printf will retrieve sizeof(int) bytes from the stack, and interpret whatever bit pattern they hold as an int, and print out whatever value that happens to represent.
What you almost certainly want is to cast the float to int before passing it to printf.
The "%d" format specifier is for decimal integers. Use "%f" instead.
And take a moment to read the printf() man page.
The "%d" specifier is for a decimal integer (an int, typically 32 bits), while the "%f" specifier is used for floating point values (printf receives a double; a float argument is promoted to double).
If you only want the integer part of the floating point number, you could specify the precision as 0. Note that this rounds to the nearest integer rather than truncating.
i.e.
float i = 2.5;
printf("%.0f\n%.0f\n%.0f",i,i,i);
You could also cast each value to an int, which truncates toward zero, so 2.5 prints as 2.
printf("%d\n%d\n%d",(int)i,(int)i,(int)i);
%d prints decimal (int)s, not (float)s. printf() cannot tell that you passed a (float) to it (C does not have native objects; you cannot ask a value what type it is); you need to use the appropriate format character for the type you passed.
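The options from the answers above can be compared side by side. This is a sketch in C++ using std::snprintf so that the formatted text can be inspected; the function names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// "%f" matches the double that printf actually receives for a float.
std::string format_as_float(float value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%f", value);
    return buffer;
}

// Precision 0 drops the decimals by rounding to the nearest integer.
std::string format_rounded(float value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.0f", value);
    return buffer;
}

// Casting to int first truncates toward zero and makes "%d" correct.
std::string format_truncated(float value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%d", static_cast<int>(value));
    return buffer;
}
```

The rounded and truncated forms differ for a value such as 2.75f, which prints as 3 with "%.0f" and as 2 after the cast.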