C++ double ptr to long ptr conversion - c++

Consider the below code fragment:
double * p = new double[16];
int value = 1;
void * q = p;
*((double *)q) = value;
int x = ((long *)p)[0];
cout << "Value of x for double * to long * = " << x << endl;
*((int *)q) = value ;
x = ((long *)p)[0];
cout << "Value of x for int * to long * = " << x << endl;
Here the outputs are 0 and 1 respectively. Can anyone explain to me why?
Also if I directly access the value at pointer...ie. p[0], the value is correctly shown as 1 in both case. Why?

Integers are stored in memory in straight binary, so you can convert between int and long with no problem. A double is stored in a binary floating-point format, where some of the bits describe the mantissa and the others describe the exponent (similar to scientific notation, i.e. 5e2 = 500).
If you try to use the data for a double as the data for a long, it will not convert correctly, because the two types encode the value in different ways.

Related

Exact double division

Consider the following function:
auto f(double a, double b) -> int
{
return std::floor(a/b);
}
So I want to compute the largest integer k such that k * b <= a in a mathematical sense.
As there could be rounding errors, I am unsure whether the above function really computes this k. I do not worry about the case that k could be out of range.
What is the proper way to determine this k for sure?
It depends on how strict you are. Take a double b and an integer n, and calculate a = b*n in double precision. The product will in general be rounded. If it is rounded down, then a is less than the mathematical value of n*b, and a/b is mathematically less than n. You will get a result of n instead of n-1.
On the other hand, a == b*n will be true. So the “correct” result could be surprising.
Your condition was that “kb <= a”. If we interpret this as “the result of multiplying kb using double precision is <= a”, then you’re fine. If we interpret it as “the mathematically exact product of k and b is <= a”, then you need to calculate k*b - a using the fma function and check the result. This will tell you the truth, but might return a result of 4 if a was calculated as 5.0 * b and was rounded down.
The problem is that float division is not exact.
a/b can give 1.9999 instead of 2, and std::floor can then give 1.
One simple solution is to add a small value before calling std::floor:
std::floor (a/b + 1.0e-10);
Result:
result = 10 while 11 was expected
With eps added, result = 11
Test code:
#include <iostream>
#include <cmath>
int main () {
double b = atan (1.0);
int x = 11;
double a = x * b;
int y = std::floor (a/b);
std::cout << "result = " << y << " while " << x << " was expected\n";
double eps = 1.0e-10;
int z = std::floor (a/b + eps);
std::cout << "With eps added, result = " << z << "\n";
return 0;
}

Why are the decimals absent in fractional number even after type casting from int to double?

From what I've gathered, assigning a fractional number to a double won't work properly unless either the numerator or the denominator is a floating point number, ( and by "not working properly", I mean that the decimals get cut off, I know that numbers can't be stored as fractions of course). However, I've tried type casting ints to doubles before assigning them to another double variable but it still doesn't work. It's not a big deal since I just had to do a minor work around, but why is this the case?
I added some coding I did while testing.
#include <iostream>
using namespace std;
double convert(int v) {
return v;
}
int main() {
int a = 5;
int b = 2;
double n;
n = convert(a) / convert(b);
cout << n << endl; // Decimals are stored
a = static_cast<double> (a);
b = static_cast<double> (b);
n = a / b;
cout << n << endl; // Decimals are cut off
a = (double) a;
b = (double) b;
n = a / b;
cout << n << endl; // Decimals are cut off
double c = a;
double d = b;
n = c / d;
cout << n << endl; // Decimals are stored
return 0;
}
Output:
2.5
2
2
2.5
Because
a / b;
is integer division (because both operands are int) i.e. the output is an integer, whether the output is then assigned to double or anything else is irrelevant in the calculation of the result.
Because of integer division.
n = a / b;
Here a and b are integers so the result is also an integer, this is a rule of C++, so 5/2 == 2. The integer 2 then gets converted to a double which then prints as 2.
int a = 5;
a = static_cast<double> (a);
The first line creates an int variable named a and puts the value 5 in it. The second line explicitly converts the value of a to a double, then stores that converted value in a. However, a has type int, so there is an implicit conversion to int. That is, the second line is functionally equivalent to:
a = static_cast<int> ( static_cast<double> (a) );
So by the time you get to the division, you are back to integer arithmetic. To get the conversion to floating point to "stick" through your division, you need to avoid throwing it away. You could either assign the converted value to a new variable, as in
double aa = static_cast<double> (a);
or do the conversion in the same expression as the division
n = static_cast<double>(a) / b;
n = a / static_cast<double>(b);
n = static_cast<double>(a) / static_cast<double>(b);
Any of these three alternatives will trigger floating-point division.

Puzzled by different result from "same" type cast, float to int

If I assign the result of a floating point computation to a variable first, then assign that to an unsigned int with an implicit conversion, I get one answer. But if I assign the same computation directly to the unsigned int, again with an implicit conversion, I get a different answer.
Below is sample code I compiled and ran to demonstrate:
#include <iostream>
int
main( int argc, char** argv )
{
float payloadInTons = 6550.3;
// Above, payloadInTons is given a value.
// Below, two different ways are used to type cast that same value,
// but the results do not match.
float tempVal = payloadInTons * 10.0;
unsigned int right = tempVal;
std::cout << " right = " << right << std::endl;
unsigned int rawPayloadN = payloadInTons * 10.0;
std::cout << " wrong = " << rawPayloadN << std::endl;
return 0;
}
Does anyone have insight into why "right" is right, and "wrong" is wrong?
By the way, I am using gcc 4.8.2 on Ubuntu 14.04 LTS, if it matters.
You are using double literals. With proper float literals, everything's fine.
int
main( int argc, char** argv )
{
float payloadInTons = 6550.3f;
float tempVal = payloadInTons * 10.0f;
unsigned int right = tempVal;
std::cout << " right = " << right << std::endl;
unsigned int rawPayloadN = payloadInTons * 10.0f;
std::cout << "also right = " << rawPayloadN << std::endl;
return 0;
}
Output :
right = 65503
also right = 65503
After the accepted answer
This is not a double vs. float issue. It is a binary floating-point and conversion to int/unsigned issue.
A typical float uses the binary32 representation, which cannot represent values like 6550.3 exactly.
float payloadInTons = 6550.3;
// payloadInTons has the exact value of `6550.2998046875`.
Multiplying by 10.0, below, ensures the calculation is done with at least double precision, with an exact result of 65502.998046875. The product is then converted back to float. The double value is not exactly representable in float, so it is rounded to the nearest float, which has the exact value 65503.0. tempVal then converts to right as desired, with a value of 65503.
float tempVal = payloadInTons * 10.0;
unsigned int right = tempVal;
Multiplying by 10.0, below, ensures the calculation is done with at least double precision, with an exact result of 65502.998046875, just as before. This time, the value is converted directly to unsigned rawPayloadN, giving the undesired value of 65502. This is because the value is truncated, not rounded.
unsigned int rawPayloadN = payloadInTons * 10.0;
The first “worked” because the conversion was double to float to unsigned. This involves 2 conversions, which is usually bad. In this case, 2 wrongs made a right.
Solution
Had the code tried float payloadInTons = 6550.29931640625; (the next smaller float value), both results would have been 65502.
The "right" way to convert a floating point value to some integer type is often to round the result and then perform the type conversion.
float tempVal = payloadInTons * 10.0;
unsigned int right = roundf(tempVal);
Note: This entire issue is complicated by the value of FLT_EVAL_METHOD. If the user's value is non-zero, floating point calculations may occur at higher precision than expected.
printf("FLT_EVAL_METHOD %d\n", (int) FLT_EVAL_METHOD);

Converting int to float?

Whenever I try to compile I get
24 [Warning] converting to int from float
83 [Warning] converting to int from float
int a , b = 8;
float c = 5.2;
float d = 8E3;
a = static_cast<float>(b) * c; // 24
cout << a << "\n";
cout << d << "\n";
int x, y, answer;
x = 7;
y = 9;
answer = 5;
answer *= (x + y);
cout << answer << "\n";
answer *= x + y;
cout << answer << "\n";
float m = 33.97;
answer += (x + y + m); // 83
cout << answer << "\n";
Any suggestions as to what I'm doing wrong?
a = static_cast<float>(b) * c;
a is an int, and the right-hand side of the assignment is the multiplication of two floats, which produces an intermediate float value. That value is then implicitly converted to an int, causing the warning you are seeing.
Also:
answer += (x + y + m);
answer is an int type, and so are x and y, but m is float, again causing the intermediate result of the right-hand side to be a float.
These conversions will cause truncation of the fractional values of the float results. You can get rid of the warnings by explicitly casting to an int:
a = static_cast<int>(static_cast<float>(b) * c);
answer += static_cast<int>(x + y + m);
Well you're just getting a warning since the compiler is changing a floating-point value to an integer, thus truncating the result.
int a;
float f = 3.2;
a = f; // a is 3, a truncated 3.2
Your question seems to be about conversion from float to int, not from int to float.
Basically you are doing nothing wrong. It is only a warning, as the value will be truncated (and maybe you don't expect that). To tell the compiler that you really want to get an int out of a float, you can make the cast explicit, like this:
a = static_cast<int>(static_cast<float>(b) * c);
Then it will not warn you anymore.

C++ floating point to integer type conversions

What are the different techniques used to convert float type of data to integer in C++?
#include <iostream>
using namespace std;
struct database {
int id, age;
float salary;
};
int main() {
struct database employee;
employee.id = 1;
employee.age = 23;
employee.salary = 45678.90;
/*
How can i print this value as an integer
(with out changing the salary data type in the declaration part) ?
*/
cout << endl << employee.id << endl << employee.
age << endl << employee.salary << endl;
return 0;
}
What you are looking for is 'type casting'. typecasting (putting the type you know you want in brackets) tells the compiler you know what you are doing and are cool with it. The old way that is inherited from C is as follows.
float var_a = 9.99;
int var_b = (int)var_a;
If you had only tried to write
int var_b = var_a;
You might get a compiler warning about the implicit (automatic) conversion from float to int, because the fractional part is lost.
This is referred to as the old way as C++ offers a superior alternative, 'static cast'; this provides a much safer way of converting from one type to another. The equivalent method would be (and the way you should do it)
float var_x = 9.99;
int var_y = static_cast<int>(var_x);
This method may look a bit more long winded, but it provides much better handling for situations such as accidentally requesting a 'static cast' on a type that cannot be converted. For more information on the why you should be using static cast, see this question.
Normal way is to:
float f = 3.4;
int n = static_cast<int>(f);
The range of a floating-point type can exceed the range of int.
This example shows a safe conversion of any float type to int using the int safeFloatToInt(const FloatType &num); function:
#include <iostream>
#include <limits>
using namespace std;
template <class FloatType>
int safeFloatToInt(const FloatType &num) {
// The range check applies to every floating-point type: even a float
// can hold values far outside the range of int.
if( (num < static_cast<FloatType>( numeric_limits<int>::max())) &&
(num > static_cast<FloatType>( numeric_limits<int>::min())) ) {
return static_cast<int>(num); //safe to cast
} else {
// NaN also ends up here, because every comparison with NaN is false
cerr << "Unsafe conversion of value:" << num << endl;
//NaN is not defined for int return the largest int value
return numeric_limits<int>::max();
}
}
int main(){
double a=2251799813685240.0;
float b=43.0;
double c=23333.0;
//unsafe cast
cout << safeFloatToInt(a) << endl;
cout << safeFloatToInt(b) << endl;
cout << safeFloatToInt(c) << endl;
return 0;
}
Result:
Unsafe conversion of value:2.2518e+15
2147483647
43
23333
For most cases (long for floats, long long for double and long double):
long a{ std::lround(1.5f) }; //2l
long long b{ std::llround(std::floor(1.5)) }; //1ll
Check out the boost NumericConversion library. It will allow to explicitly control how you want to deal with issues like overflow handling and truncation.
I believe you can do this using a cast:
float f_val = 3.6f;
int i_val = (int) f_val;
The easiest technique is to just assign the float to an int, for example:
int i;
float f;
f = 34.0098;
i = f;
This will truncate everything after the decimal point; alternatively, you can round the float first.
One thing I want to add: sometimes there can be precision loss. You may want to add some epsilon value before converting. Not sure why that works... but it works.
int someint = (somedouble+epsilon);
This is one way to convert an IEEE 754 float to a 32-bit integer if you can't use floating point operations. It also has a scaler parameter to include more digits in the result. Useful values for scaler are 1, 10 and 100.
#define EXPONENT_LENGTH 8
#define MANTISSA_LENGTH 23
// to convert float to int without floating point operations
int ownFloatToInt(int floatBits, int scaler) {
int sign = (floatBits >> (EXPONENT_LENGTH + MANTISSA_LENGTH)) & 1;
int exponent = (floatBits >> MANTISSA_LENGTH) & ((1 << EXPONENT_LENGTH) - 1);
int mantissa = (floatBits & ((1 << MANTISSA_LENGTH) - 1)) | (1 << MANTISSA_LENGTH);
int result = mantissa * scaler; // possible overflow
exponent -= ((1 << (EXPONENT_LENGTH - 1)) - 1); // exponent bias
exponent -= MANTISSA_LENGTH; // modify exponent for shifting the mantissa
if (exponent <= -(int)sizeof(result) * 8) {
return 0; // underflow
}
if (exponent > 0) {
result <<= exponent; // possible overflow
} else {
result >>= -exponent;
}
if (sign) result = -result; // handle sign
return result;
}