When I take sizeof(a), where a = 13.33 is a float variable, the size is 4 bytes.
But if I take sizeof(13.33) directly, the size is 8 bytes.
I do not understand what is happening. Can someone help?
Those are the rules of the language.
13.33 is a numeric literal, and an unsuffixed floating-point literal has type double. If you want 13.33 to be treated as a float literal, then write 13.33f.
13.33 is a double literal. If sizeof(float) == 4, then sizeof(13.33f) == 4 should also hold, because 13.33f is a float literal.
The literal 13.33 is treated as a double-precision floating-point value, 8 bytes wide.
The 13.33 literal is being treated as 'double', not 'float'.
Try 13.33f instead.
The type and size of your variable are fine. It's just that the compiler has default types for literals, those constant values hard-coded in your program.
If you request sizeof(1), you'll get sizeof(int). If you request sizeof(2.5), you'll get sizeof(double). Those values would clearly fit into a char and a float respectively, but the compiler has default types for literals and treats them as such; the conversion only happens on assignment.
You can override this default behaviour, though. For example:
2.5  // as you didn't specify anything, the compiler takes it as a double
2.5f // ah ha! you're specifying this literal to be a float
Cheers!
Because 13.33 is a double, which gets converted to a float if you assign it to one. And a double is 8 bytes. To create a real float literal, use 13.33f (note the f).
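A minimal sketch pulling the answers above together (the exact sizes are implementation-defined; 4 and 8 are merely typical):

#include <iostream>

int main()
{
    float a = 13.33f;                    // the variable itself has type float
    std::cout << sizeof(a) << '\n';      // typically 4
    std::cout << sizeof(13.33) << '\n';  // the literal is a double: typically 8
    std::cout << sizeof(13.33f) << '\n'; // the f suffix makes it a float: typically 4
}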
Related
I'd like to assign a value to a variable like this:
double var = 0xFFFFFFFF;
As a result var gets the value 65535.0 assigned. Since the compiler assumes a 64-bit target system, the number literal (i.e. all respective 32 bits) is interpreted as significand precision bits. However, since 0xFFFF FFFF is just a notation for a bit pattern, without any hint about the representation, it could be interpreted quite differently with respect to becoming a floating-point value. Thus, I was wondering if there is a way to manipulate this fixed interpretation of the value, in other words, to give a hint about the desired representation. (Maybe someone could also point me to the part of the standard where this implicit interpretation is defined.)
So far, the default precision interpretation on my system seems to be
(int)0xFFFFFFFF x 10^0.
Only the fraction field is getting filled.¹
So maybe (here: for 16 bit cross-compilation) I want it to be a different representation like:
(int)0xFFFFFF x 10^((int)0xFF)
(ignoring the sign bit for a moment).
Thus my question: How can I force a custom double interpretation of the hex literal notation?
¹ Even when my hex literal is 0xFFFF FFFF FFFF FFFF, the value is only interpreted as the fraction part, although clearly some bits should go into the exponent and sign fields. It seems the literal just gets cut off.
C++ doesn't specify the in-memory representation of double; moreover, it doesn't even specify the in-memory representation of integer types (and it can really differ between systems of different endianness). So if you want to interpret the bytes 0xFF, 0xFF as a double, you can do something like:
uint8_t bytes[sizeof(double)] = {0xFF, 0xFF}; // uint8_t is from <cstdint>; remaining bytes are zero-initialized
double var;
memcpy(&var, bytes, sizeof(double)); // memcpy is from <cstring>; copies the raw bytes into the double
Note that using unions or reinterpret_cast-ing pointers is, strictly speaking, undefined behavior, though in practice it also works.
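A complete, runnable version of that idea might look like this (a sketch assuming a 64-bit IEEE-754 double; the bit pattern chosen is just an illustration):

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::uint64_t bits = 0x000000000000FFFFull; // the desired raw bit pattern
    double var;
    static_assert(sizeof bits == sizeof var, "double must be 64 bits wide here");
    std::memcpy(&var, &bits, sizeof var);  // well-defined byte copy, unlike type punning
    std::cout << var << '\n';              // a tiny subnormal value on IEEE-754 systems
}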
"I was wondering if there is a way to manipulate this interpretation."
Yes, you can use a reinterpret_cast<double&> via address, to force type (re-)interpretation from a certain bit pattern in memory.
"Thus my question: How can I force double interpretation of the hex notation?"
You can also use a union, to make it clearer:
union uint64_2_double {
    uint64_t bits;
    double dValue;
};

uint64_2_double x;
x.bits = 0x000000000000FFFF;
std::cout << x.dValue << std::endl; // prints the double whose raw bits are 0xFFFF
There does not seem to be a direct way to initialize a double variable with a hexadecimal pattern: a C-style cast is equivalent to a C++ static_cast here, and reinterpret_cast will complain that it can't perform the conversion. I will give you two options: a simple solution that does not initialize the variable directly, and a complicated one. You can do the following:
double var;
*reinterpret_cast<long *>(&var) = 0xFFFF; // type-puns the double's storage (assumes a 64-bit long)
Note: I would expect you to want to initialize all 64 bits of the double; your constant 0xFFFF seems small - it gives 3.23786e-319.
A literal value that begins with 0x is a hexadecimal integer; it gets the first integer type in which its value fits, so a small constant like 0xFFFF is just an int. You should use the suffix ul to make it a literal of unsigned long, which on most architectures means a 64-bit unsigned integer; or #include <stdint.h> and write, for example, uint64_t(0xABCDFE13).
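Combining the two remarks above, the snippet could be written with a fixed-width type (still type punning, so the same strict-aliasing caveat applies; the all-ones pattern is just an example):

#include <cstdint>
#include <iostream>

int main()
{
    double var;
    *reinterpret_cast<std::uint64_t *>(&var) = 0xFFFFFFFFFFFFFFFFull; // fill all 64 bits
    std::cout << var << '\n'; // the all-ones pattern is a NaN on IEEE-754 systems
}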
Now for the complicated stuff: in old C++ you can write a function that converts the integral constant to a double, but it won't be constexpr.
In constexpr functions you can't use reinterpret_cast. So your only choice for a constexpr converter to double is to go through a union, for example:
struct longOrDouble {
    union {
        unsigned long asLong;
        double asDouble;
    };
    constexpr longOrDouble(unsigned long v) noexcept : asLong(v) {}
};

constexpr double toDouble(unsigned long v) { return longOrDouble(v).asDouble; }
This is a bit complicated, but this answers your question. Now, you can write:
double var = toDouble(0xFFFF);
And this will insert the given binary pattern into the double.
Using a union to write one member and read another is undefined behavior in C++; there is an excellent question with excellent answers on this right here:
Accessing inactive union member and undefined behavior?
I'm getting confused between double and float in C++. For example:
Q. For each constant, state its type:
a.) 1.0
b.) 2.8e-10
According to me, a.) is a float (as it's less precise) and b.) is a double. Or are both doubles?
I think precision is the main difference between the two:
float: ~7 digits (32-bit)
double: ~15-16 digits (64-bit)
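If you want to check those figures on your own implementation, std::numeric_limits reports them (a quick sketch; the printed values assume IEEE-754 types):

#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<float>::digits10 << '\n';  // typically 6 guaranteed decimal digits
    std::cout << std::numeric_limits<double>::digits10 << '\n'; // typically 15
}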
Your answer may depend on the language you are using, since precision is a critical factor. But I would say you can go with both being double. 1.0 can be a float as well, so without knowing your requirements or language it is difficult to answer.
Without any suffix, all floating-point literals are double in C++. If an f suffix is attached, the literal is a float, and if it is written with an L suffix, it is a long double. The type of a literal generally doesn't depend on its magnitude: integer literals like 1 or 2 are of type int although their values lie completely within char's range.
The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double
ISO C++ 2013 draft
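That rule can be checked at compile time (a small sketch using <type_traits>):

#include <type_traits>

static_assert(std::is_same<decltype(1.0), double>::value, "unsuffixed literal is double");
static_assert(std::is_same<decltype(2.8e-10), double>::value, "exponent form is still double");
static_assert(std::is_same<decltype(1.0f), float>::value, "f suffix gives float");
static_assert(std::is_same<decltype(1.0L), long double>::value, "L suffix gives long double");

int main() {}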
You might consider them both as double; in the end it is all about size.
1.0 is small in magnitude, so you could store it in a float too.
With regard to the definitions found in stdint.h, I wish to test a function for converting vectors of int8_t or vectors of int64_t to vectors of std::string.
Here are my tests:
TEST(TestAlgorithms, toStringForInt8)
{
    std::vector<int8_t> input = boost::assign::list_of(-128)(0)(127);
    Container container(input);
    EXPECT_TRUE(boost::apply_visitor(ToString(), container) == boost::assign::list_of("-128")("0")("127"));
}
TEST(TestAlgorithms, toStringForInt64)
{
    std::vector<int64_t> input = boost::assign::list_of(-9223372036854775808)(0)(9223372036854775807);
    Container container(input);
    EXPECT_TRUE(boost::apply_visitor(ToString(), container) == boost::assign::list_of("-9223372036854775808")("0")("9223372036854775807"));
}
However, I am getting a warning in Visual Studio for the line:
std::vector<int64_t> input = boost::assign::list_of(-9223372036854775808)(0)(9223372036854775807);
as follows:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
If I change -9223372036854775808 to -9223372036854775807, the warning disappears.
What is the issue here? With regard to my original code, the test is passing.
It's as if the compiler were saying: -9223372036854775808 is not a valid number, because the - and the digits are treated separately.
You could try -9223372036854775807 - 1 or use std::numeric_limits<int64_t>::min() instead.
The issue is that integer literals are not negative; so -42 is not a literal with a negative value, but rather the - operator applied to the literal 42.
In this case, 9223372036854775808 is out of the range of int64_t, so it will be given an unsigned type. Due to the magic of modular arithmetic, you can still negate it, assign it to int64_t, and end up with the result you expect; but the compiler will warn you about the unsigned negation (if you tell it to) since that can often be the result of an error.
You could avoid the warning (and make the code more obviously correct) by using std::numeric_limits<int64_t>::min() instead.
Change -9223372036854775808 to -9223372036854775807-1.
The issue is that -9223372036854775808 isn't parsed as a single token but rather as -(9223372036854775808), and 9223372036854775808 cannot fit into a signed 64-bit type (decimal integer constants are by default a signed type), so it becomes unsigned instead. Applying negation with - to an unsigned type is suspicious, hence the warning.
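Both fixes from the answers above, as a minimal sketch:

#include <cstdint>
#include <limits>

int main()
{
    std::int64_t a = -9223372036854775807LL - 1;          // 9223372036854775807 fits; subtracting 1 gives the minimum
    std::int64_t b = std::numeric_limits<std::int64_t>::min(); // the clearer spelling of the same value
    return a == b ? 0 : 1;                                // returns 0: the two are identical
}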
float b = 1.0f;
int i = b;
int& j = (int&)i;
cout << j << endl;

Output: 1
But for the following scenario
float b = 1.0f;
int i = b;
int& j = (int&)b;
cout << j << endl;

Output: 1065353216
Since both have the same value, they should show the same result. Can anyone please explain what is really happening when I change line 3?
In the first one, you are doing everything fine. The compiler is able to convert float b to int i, losing precision, but that's fine. Now, take a look at my debugger window during the execution of your second example:
(Sorry for my Russian IDE interface; the first column is the variable name, the second is the value, and the third is the type.)
As you can see, the float is now simply interpreted as an int: the float's bits are read as if they were an integer's bits, which leads to the result you are getting. Basically, you take the float's binary representation (usually sign bit, mantissa and exponent) and try to interpret it as an int.
In the first case you're initializing j correctly and the cast is superfluous. In the second case you're doing it wrong (i.e. casting an object of a different type), but the cast shuts the compiler up.
In this second case, what you get is probably the internal representation of 1.0f interpreted as an integer.
Integer 1 and floating-point 1.0f may be mathematically the same value, but in C++ they have different types, with different representations.
Casting an lvalue to a reference is equivalent to reinterpret_cast; it says "look at whatever is in this memory location, and interpret those bytes as an int".
In the first case, the memory contains an int, so interpreting those bytes as an int gives expected value.
In the second case, the memory contains a float, so you see the bytes (or perhaps just some of them, or perhaps some extra ones too, if sizeof(int) != sizeof(float)) that represent the floating-point number, reinterpreted as an integer.
Your computer probably uses 32-bit int and 32-bit IEEE float representations. The float value 1.0f has a sign bit of zero, an exponent of zero (represented by the 8-bit value 127, or 01111111 in binary), and a mantissa of 1 (represented by the 23-bit value zero), so the 32-bit pattern would look like:
00111111 10000000 00000000 00000000
When reinterpreted as an integer, this gives the hex value 0x3f800000, which is 1065353216 in decimal.
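You can verify that bit pattern with a well-defined byte copy (a sketch assuming 32-bit int and IEEE-754 float, as in the answer above):

#include <cstdio>
#include <cstring>

int main()
{
    float f = 1.0f;
    unsigned int u;
    static_assert(sizeof u == sizeof f, "this demo needs int and float the same size");
    std::memcpy(&u, &f, sizeof u); // copy the float's bytes into an integer
    std::printf("0x%08x\n", u);    // prints 0x3f800000, i.e. 1065353216
}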
A reference doesn't do any memory allocation; it just places an entry in the table of local names and their addresses. In the first case the name j refers to the memory previously allocated for an int (the variable i), while in the second case j refers to the memory allocated for a float (the variable b). When you use j, the compiler interprets the data at that address as if it were an int, but in fact a float is stored there; that's why you get a "strange" number instead of 1.
The first one converts b to an int before assigning it to i. This is the "proper" way: the compiler genuinely converts the value.
The second one does no conversion and reinterprets b's bits as an integer. If you read up on floating-point formats, you can see exactly why you're getting the value you're getting.
Under the covers, all your variables are just collections of bits. How you interpret those bits changes the perceived value they represent. In the first one, you're rearranging the bit pattern to preserve the "perceived" value (of 1). In the second one, you're not rearranging the bit pattern, and so the perceived value is not properly converted.
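The contrast in a hedged sketch (the pointer reinterpretation formally violates strict aliasing and is shown only to mirror the (int&)b cast from the question):

#include <iostream>

int main()
{
    float b = 1.0f;
    int converted = static_cast<int>(b);             // value conversion: prints 1
    int reinterpreted = *reinterpret_cast<int*>(&b); // bit reinterpretation: typically 1065353216
    std::cout << converted << ' ' << reinterpreted << '\n';
}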
#include <stdio.h>
#include <math.h>

int main()
{
    float i = 2.5;
    printf("%d\n%d\n%d", i, i, i);
}
When I compile this using gcc and run it, I get this as the output:
0
1074003968
0
Why doesn't it print just
2
2
2
You're passing a float (which will be converted to a double) to printf, but telling printf to expect an int. The result is undefined behavior, so at least in theory, anything could happen.
What will typically happen is that printf will retrieve sizeof(int) bytes from the stack, and interpret whatever bit pattern they hold as an int, and print out whatever value that happens to represent.
What you almost certainly want is to cast the float to int before passing it to printf.
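For example (a minimal sketch of that fix):

#include <stdio.h>

int main(void)
{
    float i = 2.5f;
    printf("%d\n", (int)i); // cast first, so the argument really is an int; prints 2
    printf("%f\n", i);      // or match the format to the type instead; prints 2.500000
    return 0;
}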
The "%d" format specifier is for decimal integers. Use "%f" instead.
And take a moment to read the printf() man page.
The "%d" is the specifier for a decimal integer (typically an 32-bit integer) while the "%f" specifier is used for decimal floating point. (typically a double or a float).
if you only want the non-decimal part of the floating point number you could specify the precision as 0.
i.e.
float i = 2.5;
printf("%.0f\n%.0f\n%.0f",i,i,i);
Note that you could also cast each value to int, which would give the same result:
printf("%d\n%d\n%d", (int)i, (int)i, (int)i);
%d prints decimal ints, not floats. printf() cannot tell that you passed a float to it (C does not have native objects; you cannot ask a value what type it is); you need to use the format character appropriate for the type you passed.