Consider the following code snippet:
#include <cstdint>
#include <limits>
#include <iostream>
int main(void)
{
    uint64_t a = UINT32_MAX;
    std::cout << "a: " << a << std::endl;
    ++a;
    std::cout << "a: " << a << std::endl;
    uint64_t b = (UINT32_MAX) + 1;
    std::cout << "b: " << b << std::endl;
    uint64_t c = std::numeric_limits<uint32_t>::max();
    std::cout << "c: " << c << std::endl;
    uint64_t d = std::numeric_limits<uint32_t>::max() + 1;
    std::cout << "d: " << d << std::endl;
    return 0;
}
Which gives the following output:
a: 4294967295
a: 4294967296
b: 0
c: 4294967295
d: 0
Why are b and d both 0? I cannot seem to find an explanation for this.
This behaviour is referred to as an overflow (for unsigned types it is, more precisely, well-defined wraparound). uint32_t takes up 4 bytes, or 32 bits, of memory. When you use UINT32_MAX you are setting each of those 32 bits to 1, which is the maximum value 4 bytes of memory can represent. 1 is an integer literal of type int, which typically takes up 4 bytes of memory too. So you're adding 1 to the maximum value a 32-bit type can represent. This is what the maximum value looks like in memory:
1111 1111 1111 1111 1111 1111 1111 1111
When you add one to this, there is no room to represent a value one greater than the maximum, so the result wraps around and all bits are set to 0, the minimum value.
Although you're assigning to a uint64_t, which has twice the capacity of uint32_t, the assignment happens only after the addition is complete.
The types of the left and right operands of the addition decide the type of the result. If at least one operand were of type uint64_t, the other operand would automatically be converted to uint64_t too, and the addition would be carried out in 64 bits.
If you do:
(UINT32_MAX) + (uint64_t)1;
or:
(uint64_t)(UINT32_MAX) + 1;
you'll get what you expect. In languages like C#, you can use a checked block to have the overflow raise an error instead of happening silently.
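For completeness, here is a minimal, self-contained sketch of that fix (the variable names wrapped and widened are just for illustration):
#include <cstdint>
#include <iostream>

int main()
{
    // On a typical platform where int is 32 bits, this addition is done in
    // 32-bit unsigned arithmetic and wraps around to 0 before the assignment.
    uint64_t wrapped = UINT32_MAX + 1;
    // Widening one operand first makes the whole addition happen in 64 bits.
    uint64_t widened = (uint64_t)UINT32_MAX + 1;
    std::cout << "wrapped: " << wrapped << std::endl; // 0
    std::cout << "widened: " << widened << std::endl; // 4294967296
    return 0;
}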
Related
Please, could somebody explain what's happening under the hood there?
The example runs on an Intel machine. Would the behavior be the same on other architectures?
Actually, I have a hardware counter which overruns every now and then, and I have to make sure that the intervals are always computed correctly. I thought that integer arithmetic would always do the trick, but when there is a sign change, binary subtraction yields an overflow bit which appears to be interpreted as the sign.
Do I really have to handle the sign by myself or is there a more elegant way to compute the interval regardless of the hardware or the implementation?
TIA
#include <climits>
#include <cstdint>
#include <iostream>
#include <string>

int main()
{
    std::cout << "\nTest integer arithmetics\n";
    int8_t iFirst = -2;
    int8_t iSecond = 2;
    int8_t iResult = iSecond - iFirst;
    std::cout << "\n" << std::to_string(iSecond) << " - " << std::to_string(iFirst) << " = " << std::to_string(iResult);
    iResult = iFirst - iSecond;
    std::cout << "\n" << std::to_string(iFirst) << " - " << std::to_string(iSecond) << " = " << std::to_string(iResult);
    iFirst = SCHAR_MIN + 1; iSecond = SCHAR_MAX - 2;
    iResult = iSecond - iFirst;
    std::cout << "\n" << std::to_string(iSecond) << " - " << std::to_string(iFirst) << " = " << std::to_string(iResult);
    iResult = iFirst - iSecond;
    std::cout << "\n" << std::to_string(iFirst) << " - " << std::to_string(iSecond) << " = " << std::to_string(iResult) << "\n\n";
    return 0;
}
And this is what I get:
Test integer arithmetics
2 - -2 = 4
-2 - 2 = -4
125 - -127 = -4
-127 - 125 = 4
What happens with iResult = iFirst - iSecond is that both iFirst and iSecond are first promoted to int due to the usual arithmetic conversions. The result is an int. That int result is then truncated to int8_t for the assignment (in effect, the top 24 bits of the 32-bit int are cut away).
The int result of -127 - 125 is -252. With two's complement representation that is 0xFFFFFF04. Truncation leaves only the 0x04 part, so iResult will be equal to 4.
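A minimal sketch of that promotion and truncation (variable names are illustrative; the narrowing conversion is implementation-defined before C++20, but yields 4 on common two's complement platforms):
#include <cstdint>
#include <iostream>

int main()
{
    int8_t iFirst = -127;
    int8_t iSecond = 125;
    // The subtraction is performed in int after integer promotion,
    // so the exact result -252 is available at this point.
    int wide = iFirst - iSecond;
    // Narrowing back to int8_t keeps only the low 8 bits:
    // -252 is 0xFFFFFF04 in two's complement, and 0x04 == 4.
    int8_t narrow = (int8_t)wide;
    std::cout << "wide: " << wide << std::endl;          // -252
    std::cout << "narrow: " << (int)narrow << std::endl; // 4
    return 0;
}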
The problem is that your variable is 8 bits wide. 8 bits can represent only 256 distinct values, so your variables can only hold numbers within the -128 to 127 range. Any result outside that range gives wrong output, and both of your last calculations produce results beyond it (252 and -252). There is no way to make the 8-bit type itself hold those values; you have to handle the overflow yourself.
PS. This is not a hardware problem. Any processor would give the same results.
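If you do want to catch those cases, one way is to do the arithmetic in int (where the exact result fits) and range-check before narrowing; a minimal sketch, with a hypothetical helper named checkedSub:
#include <cstdint>
#include <climits>
#include <iostream>

// Hypothetical helper: subtract two int8_t values and report whether
// the exact result fits back into int8_t.
bool checkedSub(int8_t a, int8_t b, int8_t& out)
{
    int wide = a - b; // exact result, thanks to promotion to int
    if (wide < SCHAR_MIN || wide > SCHAR_MAX)
        return false; // the true result would overflow int8_t
    out = (int8_t)wide;
    return true;
}

int main()
{
    int8_t r;
    if (!checkedSub(-127, 125, r))
        std::cout << "overflow detected" << std::endl;
    return 0;
}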
I read about float on Wikipedia and tried to print its bits.
I used std::bitset, but it returned bits different from what I expected (I know because I used the same number as the example in the link). Then I used memcpy() to copy the memory of the float into 4 one-byte parts and printed those, which worked, but I have 4 questions.
1) Why does using bitset on a float print only the integer part?
2) Why does bitset work with the byte array but not with the float?
3) Did memcpy() copy in the correct order?
The last question is because 0.15625f == 0b00111110001000000000000000000000.
So I think that the correct order is:
bb[0] == 0b00111110;
bb[1] == 0b00100000;
bb[2] == 0b00000000;
bb[3] == 0b00000000;
But the order returned is the reverse.
4) Why does this happen?
My code:
#include <cstring>
#include <cstdint>
#include <iostream>
#include <bitset>

int main(int argc, char** argv) {
    float f = 0.15625f;
    std::cout << std::bitset<32>(f) << std::endl;
    // print: 00000000000000000000000000000000
    // This prints only the integer part of the float. I tried with 5.2341 and others.
    uint8_t bb[4];
    memcpy(bb, &f, 4);
    std::cout << std::bitset<8>(bb[0]) << std::endl;
    // print: 00000000
    std::cout << std::bitset<8>(bb[1]) << std::endl;
    // print: 00000000
    std::cout << std::bitset<8>(bb[2]) << std::endl;
    // print: 00100000
    std::cout << std::bitset<8>(bb[3]) << std::endl;
    // print: 00111110
    return 0;
}
To construct a std::bitset from a float, one of the std::bitset constructors is used. The one that is relevant here is
constexpr bitset(unsigned long long val) noexcept;
Before this constructor is called, the float is converted into unsigned long long, and its fractional part is discarded. std::bitset has no constructors that take floating-point values.
The byte order of floating-point numbers is affected by machine endianness. On a little-endian machine the least significant byte is stored first, so the bytes appear in reverse order. If your machine uses the same endianness for floating-point numbers and for integers, you can simply write
float f = 0.15625f;
std::uint32_t b;
std::memcpy(&b, &f, 4);
std::cout << std::bitset<32>(b) << std::endl;
// Output: 00111110001000000000000000000000
to get bytes in the correct order automatically.
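If C++20 is available, std::bit_cast expresses the same reinterpretation without the intermediate memcpy (the same endianness caveat applies):
#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    float f = 0.15625f;
    // Reinterpret the float's object representation as a 32-bit integer.
    std::uint32_t b = std::bit_cast<std::uint32_t>(f);
    std::cout << std::bitset<32>(b) << std::endl;
    // Output: 00111110001000000000000000000000
    return 0;
}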
I have a question with this snippet of code:
uint32_t c = 1 << 31;
uint64_t d = 1 << 31;
cout << "c: " << std::bitset<64>(c) << endl;
cout << "d: " << std::bitset<64>(d) << endl;
cout << (c == d ? "equal" : "not equal") << endl;
The result is:
c: 0000000000000000000000000000000010000000000000000000000000000000
d: 1111111111111111111111111111111110000000000000000000000000000000
not equal
Yes, I know that the solution for 'd' is to use '1ULL'. But I cannot understand why this happens when the shift is of 31 bits. I read somewhere that it is safe to shift up to size-1 bits, so if I write the instruction without the 'ULL' and the literal '1' is 32 bits wide, then it should be safe to shift it by 31 bits, right?
What am I missing here?
Regards
YotKay
The problem is that the expression you shift left, namely the constant 1, is a signed int. On a 32-bit int, 1 << 31 produces a negative value (INT_MIN; before C++14 the standard even leaves this shift undefined). When that negative int is converted to uint64_t for the assignment to d, it is sign-extended, causing the result that you see.
Adding the suffix U to the 1 fixes the problem:
uint64_t d = 1U << 31;
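A small sketch contrasting the two conversions (the value of 1 << 31 is INT_MIN on common 32-bit-int platforms; before C++14 the standard leaves this shift undefined):
#include <cstdint>
#include <iostream>

int main()
{
    int s = 1 << 31;       // negative int (INT_MIN) on a 32-bit int
    uint32_t u = 1U << 31; // unsigned, value 2147483648

    // Converting the negative int to uint64_t takes the value modulo 2^64,
    // which reproduces the sign-extended bit pattern; the unsigned value
    // is zero-extended instead.
    uint64_t fromSigned = s;   // 18446744071562067968
    uint64_t fromUnsigned = u; // 2147483648

    std::cout << fromSigned << "\n" << fromUnsigned << std::endl;
    return 0;
}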
If I want to flip some bits, which way is better? Should I flip them using XOR with 0xffffffff or by using ~?
I'm afraid there will be cases where one of these pads bits onto the end and the other doesn't, which would make one of them safer to use. I'm wondering if there are times when it's better to use one over the other.
Here is some code that uses both on the same input value, and the output values are always the same.
#include <iostream>
#include <iomanip>
void flipBits(unsigned long value)
{
    const unsigned long ORIGINAL_VALUE = value;
    std::cout << "Original value:" << std::setw(19) << std::hex << value << std::endl;
    value ^= 0xffffffff;
    std::cout << "Value after XOR:" << std::setw(18) << std::hex << value << std::endl;
    value = ORIGINAL_VALUE;
    value = ~value;
    std::cout << "Value after bit negation: " << std::setw(8) << std::hex << value << std::endl << std::endl;
}

int main()
{
    flipBits(0x12345678);
    flipBits(0x11223344);
    flipBits(0xabcdef12);
    flipBits(15);
    flipBits(0xffffffff);
    flipBits(0x0);
    return 0;
}
Output:
Original value: 12345678
Value after XOR: edcba987
Value after bit negation: edcba987
Original value: 11223344
Value after XOR: eeddccbb
Value after bit negation: eeddccbb
Original value: abcdef12
Value after XOR: 543210ed
Value after bit negation: 543210ed
Original value: f
Value after XOR: fffffff0
Value after bit negation: fffffff0
Original value: ffffffff
Value after XOR: 0
Value after bit negation: 0
Original value: 0
Value after XOR: ffffffff
Value after bit negation: ffffffff
Use ~:
You won't be relying on any specific width of the type; for example, int is not 32 bits on all platforms.
It removes the risk of accidentally typing one f too few or too many.
It makes the intent clearer.
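A small sketch of the width issue behind the first point: on a platform where unsigned long is 64 bits, XORing with 0xffffffff flips only the low 32 bits, while ~ flips all of them:
#include <iostream>

int main()
{
    unsigned long value = 0x12345678UL; // assume 64-bit unsigned long here

    // Flips only the low 32 bits; the upper 32 bits stay zero.
    std::cout << std::hex << (value ^ 0xffffffffUL) << std::endl; // edcba987

    // Flips every bit of the type, however wide it happens to be.
    std::cout << std::hex << ~value << std::endl; // ffffffffedcba987
    return 0;
}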
As you're asking about C++ specifically, simply use std::bitset:
#include <iostream>
#include <iomanip>
#include <bitset>
#include <limits>
void flipBits(unsigned long value) {
    std::bitset<std::numeric_limits<unsigned long>::digits> bits(value);
    std::cout << "Original value : 0x" << std::hex << value;
    value = bits.flip().to_ulong();
    std::cout << ", Value after flip: 0x" << std::hex << value << std::endl;
}
As for your mentioned concern about just using the ~ operator on the unsigned long value and having more bits flipped than actually wanted:
since std::bitset<NumberOfBits> specifies the exact number of bits to operate on, it solves such problems correctly.
#include <iostream>
using namespace std;
struct test
{
    int i;
    double h;
    int j;
};

int main()
{
    test te;
    te.i = 5;
    te.h = 6.5;
    te.j = 10;
    cout << "size of an int: " << sizeof(int) << endl; // Should be 4
    cout << "size of a double: " << sizeof(double) << endl; // Should be 8
    cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
    // These two should be the same
    cout << "start address of the object: " << &te << endl;
    cout << "address of i member: " << &te.i << endl;
    // These two should be the same
    cout << "start address of the double field: " << &te.h << endl;
    cout << "calculate the offset of the double field: " << (&te + sizeof(double)) << endl; // NOT THE SAME
    return 0;
}
Output:
size of an int: 4
size of a double: 8
size of test: 24
start address of the object: 0x7fffb9fd44e0
address of i member: 0x7fffb9fd44e0
start address of the double field: 0x7fffb9fd44e8
calculate the offset of the double field: 0x7fffb9fd45a0
Why do the last two lines produce different values? Something I am doing wrong with pointer arithmetic?
(&te + sizeof(double))
This is the same as:
&((&te)[sizeof(double)])
You should cast to char* first, and use the member's real offset rather than assuming it equals sizeof(int), since the compiler may insert padding before h:
(char*)(&te) + offsetof(test, h)
You are correct: the problem is with pointer arithmetic.
When you add an integer to a pointer, the pointer advances by that many elements of the pointed-to type.
Therefore, &te + 1 will be 24 bytes after &te.
Your code &te + sizeof(double) adds sizeof(double) elements, i.e. 24 * sizeof(double) = 192 bytes.
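A minimal sketch of that scaling (the addresses printed will differ from run to run):
#include <iostream>

struct test
{
    int i;
    double h;
    int j;
};

int main()
{
    test te;
    test* p = &te;
    // Pointer arithmetic is scaled by the pointee's size: p + 1 points
    // one whole test object (sizeof(test) bytes) past p.
    std::cout << (void*)p << std::endl;
    std::cout << (void*)(p + 1) << std::endl;     // sizeof(test) bytes later
    // To move by raw bytes instead, convert to char* first.
    char* bytes = (char*)p;
    std::cout << (void*)(bytes + 1) << std::endl; // exactly 1 byte later
    return 0;
}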
Firstly, your code is wrong: you'd want to add the size of the fields before h (i.e. an int); there's no reason to use sizeof(double). Secondly, you need to normalise everything to char * first, because pointer arithmetic is done in units of the thing being pointed to.
More generally, you can't rely on code like this to work. The compiler is free to insert padding between fields to align things to word boundaries and so on. If you really want to know the offset of a particular field, there's an offsetof macro that you can use. It's defined in <stddef.h> in C and <cstddef> in C++.
Most compilers offer an option to remove all padding (e.g. GCC's __attribute__ ((packed))).
I believe it's only well-defined to use offsetof on POD types (standard-layout types, in C++11 terms).
struct test
{
    int i;
    int j;
    double h;
};
Since your largest data type is 8 bytes, the struct gets padding around your ints. Either put the largest data type first, or think about the padding yourself! Hope this helps!
&te + sizeof(double) is equivalent to &te + 8, which is equivalent to &((&te)[8]). That is, since &te has type test *, &te + 8 adds eight times the size of a test.
You can see what's going on more clearly using the offsetof() macro:
#include <iostream>
#include <cstddef>

using namespace std;

struct test
{
    int i;
    double h;
    int j;
};

int main()
{
    test te;
    te.i = 5;
    te.h = 6.5;
    te.j = 10;
    cout << "size of an int: " << sizeof(int) << endl; // Should be 4
    cout << "size of a double: " << sizeof(double) << endl; // Should be 8
    cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
    cout << "i: size = " << sizeof te.i << ", offset = " << offsetof(test, i) << endl;
    cout << "h: size = " << sizeof te.h << ", offset = " << offsetof(test, h) << endl;
    cout << "j: size = " << sizeof te.j << ", offset = " << offsetof(test, j) << endl;
    return 0;
}
On my system (x86), I get the following output:
size of an int: 4
size of a double: 8
size of test: 16
i: size = 4, offset = 0
h: size = 8, offset = 4
j: size = 4, offset = 12
On another system (SPARC), I get:
size of an int: 4
size of a double: 8
size of test: 24
i: size = 4, offset = 0
h: size = 8, offset = 8
j: size = 4, offset = 16
The compiler will insert padding bytes between struct members to ensure that each member is aligned properly. As you can see, alignment requirements vary from system to system; on one system (x86), double is 8 bytes but only requires 4-byte alignment, and on another system (SPARC), double is 8 bytes and requires 8-byte alignment.
Padding can also be added at the end of a struct to ensure that everything is aligned properly when you have an array of the struct type. On SPARC, for example, the compiler adds 4 bytes of padding at the end of the struct.
The language guarantees that the first declared member will be at an offset of 0, and that members are laid out in the order in which they're declared. (At least that's true for simple structs; C++ features such as virtual functions can complicate the layout.)
Compilers are free to space out structs however they want past the first member, and usually use padding to align to word boundaries for speed.
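A short sketch using alignof (C++11) to see the alignment requirements that drive the padding; the values in the comments are what a typical x86-64 system gives:
#include <iostream>

struct test
{
    int i;
    double h;
    int j;
};

int main()
{
    // On a typical x86-64 system alignof(double) == 8, so 4 padding bytes
    // follow i, and 4 more follow j so that arrays of test stay aligned.
    std::cout << "alignof(int):    " << alignof(int) << std::endl;    // 4
    std::cout << "alignof(double): " << alignof(double) << std::endl; // 8
    std::cout << "alignof(test):   " << alignof(test) << std::endl;   // 8
    std::cout << "sizeof(test):    " << sizeof(test) << std::endl;    // 24
    return 0;
}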
See these:
C struct sizes inconsistence
Struct varies in memory size?
et al.