I have looked over the guide given in this answer, but I still don't understand bit-shifting. In particular, I am confused about how the data types come into play.
The following:
unsigned int a = pow(2,31);
cout << (a << 1);
indeed produces 0 as I expect, because the int is 32 bits wide, so shifting the 1 left pushes it out of the value.
But the following
unsigned int a = 1;
unsigned char b = (unsigned char)a;
cout << (unsigned int)(b<<8);
produces 256. Why is that? My guess would have been that a char is 8 bit and so moving 1 left 8 places should give zero.
Is there a function/shift that does this? (i.e. evaluates 1<<8 to 0).
Integral types narrower than int are promoted to int (or unsigned int) before being used in arithmetic, so b << 8 is computed as an int and the shifted bit is not lost. It's called integral promotion.
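For example (a minimal sketch of the promotion at work, plus one way to get the truncating behaviour the asker expected, by casting the result back to the narrow type):
#include <iostream>

int main()
{
    unsigned char b = 1;
    // b is promoted to int before the shift, so the bit is not lost:
    std::cout << (b << 8) << '\n';  // prints 256
    // Truncating the result back to 8 bits discards the shifted-out bit:
    std::cout << static_cast<unsigned int>(static_cast<unsigned char>(b << 8)) << '\n';  // prints 0
    return 0;
}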
In my application, I receive two signed 32-bit ints and I have to store them. I have to create a sort of counter, and I don't know when it will be reset, but I'll receive big values frequently. Because of that, in order to store these values, I decided to use two unsigned 64-bit ints.
The following could be a simple version of the counter.
struct Counter
{
    unsigned int elementNr;
    unsigned __int64 totalLen1;
    unsigned __int64 totalLen2;

    void UpdateCounter(int len1, int len2)
    {
        if(len1 > 0 && len2 > 0)
        {
            ++elementNr;
            totalLen1 += len1;
            totalLen2 += len2;
        }
    }
};
I know that if a smaller type is converted to a bigger one (e.g. int to long) there should be no issues. However, going from a 32-bit representation to a 64-bit representation and from signed to unsigned at the same time is something new for me.
Reading around, I understood that len1 should be widened from 32 bits to 64 bits, with sign extension applied. Because unsigned int and signed int have the same rank (Section 4.13), the signed operand should be the one that gets converted.
If len1 stores a negative value, converting it from signed to unsigned will produce a wrong value, which is why I check for positivity at the beginning of the function. For positive values, however, I think there should be no issues.
For clarity, I could rewrite UpdateCounter(int len1, int len2) like this:
void UpdateCounter(int len1, int len2)
{
    if(len1 > 0 && len2 > 0)
    {
        ++elementNr;
        __int64 tmp = len1;
        totalLen1 += static_cast<unsigned __int64>(tmp);
        tmp = len2;
        totalLen2 += static_cast<unsigned __int64>(tmp);
    }
}
Might there be some side effects that I have not considered?
Is there another, better and safer way to do this?
A little background, just for reference: binary operators such as arithmetic addition work on operands of the same type (the specific CPU instruction to which the operation is translated depends on the number representation, which must be the same for both instruction operands).
When you write something like this (using fixed width integer types to be explicit):
int32_t a = <some value>;
uint64_t sum = 0;
sum += a;
As you already know, this involves an implicit conversion, more specifically an integral conversion governed by the integer conversion ranks.
So the expression sum += a; is equivalent to sum += static_cast<uint64_t>(a);: a has the lesser rank, so it is the operand that gets converted.
Let's see what happens in this example:
int32_t a = 60;
uint64_t sum = 100;
sum += static_cast<uint64_t>(a);
std::cout << "a=" << static_cast<uint64_t>(a) << " sum=" << sum << '\n';
The output is:
a=60 sum=160
So all is ok as expected. Let's see what happens when adding a negative number:
int32_t a = -60;
uint64_t sum = 100;
sum += static_cast<uint64_t>(a);
std::cout << "a=" << static_cast<uint64_t>(a) << " sum=" << sum << '\n';
The output is:
a=18446744073709551556 sum=40
The result is 40 as expected: this relies on the modular (wrap-around) behaviour of unsigned arithmetic, which matches the two's complement representation (note: unsigned integer overflow is not undefined behaviour), and all is ok, of course, as long as you ensure that the sum never goes negative.
Coming back to your question, you won't have any surprises if you always add positive numbers, or at least ensure that the sum never goes negative... until you reach the maximum representable value std::numeric_limits<uint64_t>::max() (2^64-1 = 18446744073709551615 ~ 1.8E19).
If you continue to add numbers indefinitely, sooner or later you'll reach that limit (this also holds for your counter elementNr).
You would overflow the 64-bit unsigned integer by adding 2^31-1 (2147483647) every millisecond for approximately three months, so in this case it may be advisable to check:
#include <limits>
//...
void UpdateCounter(const int32_t len1, const int32_t len2)
{
    if( len1>0 )
    {
        if( static_cast<decltype(totalLen1)>(len1) <= std::numeric_limits<decltype(totalLen1)>::max()-totalLen1 )
        {
            totalLen1 += len1;
        }
        else
        {// Would overflow!!
            // Do something
        }
    }
}
When I have to accumulate numbers and I don't have particular requirements about accuracy, I often use double, because the maximum representable value is incredibly high (std::numeric_limits<double>::max() is about 1.79769E+308) and to reach overflow I would need to add 2^32-1=4294967295 every picosecond for roughly 1E+279 years.
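As a rough, hedged sketch (not the asker's exact struct, just the idea), a double-based accumulator could look like this; note that once the total grows past 2^53 the additions start losing precision, which is the accuracy trade-off mentioned above:
#include <cstdint>

struct Counter
{
    std::uint64_t elementNr = 0;
    double totalLen1 = 0.0;  // enormous range, ~15-16 significant decimal digits
    double totalLen2 = 0.0;

    void UpdateCounter(std::int32_t len1, std::int32_t len2)
    {
        if (len1 > 0 && len2 > 0)
        {
            ++elementNr;
            totalLen1 += len1;  // int -> double is exact for any 32-bit value
            totalLen2 += len2;
        }
    }
};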
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    vector<int> v = {1, 2, 3, 4, 5, 6, 7};
    int i = -4;
    cout << i << endl;
    cout << v.size() << endl;
    cout << i % v.size() << endl;
    cout << -4 % 7 << endl;
}
The above code prints:
-4
7
5
-4
Can someone please explain why i % v.size() prints 5 instead of -4? I'm guessing it has something to do with vector.size(), but unsure what the underlying reasoning is. Thanks in advance.
The operands of % undergo the usual arithmetic conversions to bring them to a common type, before the division is performed. If the operands were int and size_t, then the int is converted to size_t.
If size_t is 32-bit then -4 would become 4294967292, and the result of the expression is 4294967292 % 7, which is actually 0.
If size_t is 64-bit then -4 would become 18,446,744,073,709,551,612, and the result of that % 7 is 5, which is what you saw.
So actually we can tell from this output that your system has 64-bit size_t.
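If you want to confirm that on your own machine, a small check (just an illustration) is:
#include <cstddef>
#include <iostream>

int main()
{
    std::cout << sizeof(std::size_t) * 8 << "-bit size_t\n";
    // 0 when size_t is 32-bit, 5 when size_t is 64-bit:
    std::cout << static_cast<std::size_t>(-4) % 7 << '\n';
    return 0;
}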
In C++ the modulus operator is defined so that the following is true for all integers except for b == 0:
(a/b)*b + a%b == a
So it is forced to be consistent with the integer division, which from C++ 11 onwards truncates to zero even for negative numbers. Hence everything is well defined even for negative numbers.
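With two plain int operands you can see both the truncating division and the matching remainder (a small illustration):
#include <iostream>

int main()
{
    int a = -4, b = 7;
    std::cout << a / b << '\n';               // 0  (truncated toward zero)
    std::cout << a % b << '\n';               // -4
    std::cout << (a / b) * b + a % b << '\n'; // -4, i.e. equal to a
    return 0;
}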
However, in your case you have a signed / unsigned division (because .size() returns an unsigned type) and the usual signed/unsigned rules apply. This means that in this case all operands are converted to unsigned before the operation is carried out (see also Ruslan's comment).
So -4 is converted to unsigned (and becomes a very large number) and then modulo is carried out.
You can also see that something went wrong because 5 is not a correct answer for -4 modulo 7 under any definition of integer division (3, or -4 with truncating division, would be).
Arithmetic rules with C and C++ are not intuitive.
Because v.size() returns size_t.
cout << -4 % size_t(7) << endl; // 5
Take a look at modulo-operator-with-negative-values
UPD: and signed-int-modulo-unsigned-int-produces-nonsense-results
This is due to the type of v.size(), which is an unsigned type. Because of the usual arithmetic conversions, the result will also be treated as unsigned, despite i being a signed type.
I am assuming you are compiling for a 64-bit target. This means that in addition to the conversion to unsigned, the result will also be a 64-bit unsigned type (written here as unsigned long long). Step by step:
unsigned long long _i = (unsigned long long)-4; // 0xFFFFFFFFFFFFFFFC!
unsigned long long result = _i % (unsigned long long)7; // 5
Since presumably you want to preserve the signedness of i, in this case it is enough to cast v.size() to a signed type to prevent the conversion to unsigned:
i % (int)v.size() will give -4.
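Applied to the original snippet (a minimal sketch; the cast is safe here because the vector's size easily fits in an int):
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5, 6, 7};
    int i = -4;
    std::cout << i % static_cast<int>(v.size()) << '\n';  // prints -4
    return 0;
}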
From cppreference on Usual arithmetic conversions and C++ standard
Otherwise (the signedness is different): If the unsigned type has conversion rank greater than or equal to the rank of the signed type, then the operand with the signed type is implicitly converted to the unsigned type.
-4 is signed and 7 is size_t which is an unsigned type, so -4 is converted to unsigned first and then modulus is carried out.
With that mind, if you break it down, you will immediately see what is happening:
size_t s = -4; // s = 18446744073709551612 on a 64 bit system
size_t m = 7;
std::cout << s % m << '\n'; //5
The results might be different for a 32-bit system.
cout << -4 % 7 << endl; still prints -4. Why? It's because the type of both -4 and 7 is int.
C++ standard §5.13.2.3 Type of an integer literal
The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented. An integer-literal is a prvalue.
Table 8: Types of integer-literals without suffix:
int
long int
long long int
So, -4 and 7 both are int in this case and hence the result of modulo is -4.
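You can watch the literal's type change the result by adding the u suffix, which makes the right operand unsigned int and drags the left operand along with it (assuming 32-bit int, just to illustrate):
#include <iostream>

int main()
{
    std::cout << -4 % 7  << '\n';  // -4 : both operands are int
    std::cout << -4 % 7u << '\n';  // 0  : -4 converts to 4294967292 first (32-bit unsigned int)
    return 0;
}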
I am trying to convert 4 bytes to an integer using C++.
This is my code:
int buffToInteger(char * buffer)
{
    int a = (int)(buffer[0] << 24 | buffer[1] << 16 | buffer[2] << 8 | buffer[3]);
    return a;
}
The code above works in almost all cases, for example:
When my buffer is: "[\x00, \x00, \x40, \x00]" the code will return 16384 as expected.
But when the buffer is filled with: "[\x00, \x00, \x3e, \xe3]", the code won't work as expected and will return "ffffffe3".
Does anyone know why this happens?
Your buffer contains signed characters. So, actually, buffer[3] == -29, which upon conversion to int gets sign-extended to 0xffffffe3, and in turn (0x3e << 8) | 0xffffffe3 == 0xffffffe3.
You need to ensure your individual buffer bytes are interpreted as unsigned, either by declaring buffer as unsigned char *, or by explicitly casting:
int a = int((unsigned char)(buffer[0]) << 24 |
(unsigned char)(buffer[1]) << 16 |
(unsigned char)(buffer[2]) << 8 |
(unsigned char)(buffer[3]));
In the expression buffer[0] << 24 the value 24 is an int, so buffer[0] will also be converted to an int before the shift is performed.
On your system a char is apparently signed, and will then be sign extended when converted to int.
There's an implicit promotion to a signed int in your shifts.
That's because char is (apparently) signed on your platform (the common case) and << promotes its operands to int implicitly. In fact, none of this would work otherwise, because << 8 (and higher) would shift all of your bits out!
If you're stuck with using a buffer of signed chars this will give you what you want:
#include <iostream>
#include <iomanip>

int buffToInteger(char * buffer)
{
    int a = static_cast<int>(static_cast<unsigned char>(buffer[0]) << 24 |
                             static_cast<unsigned char>(buffer[1]) << 16 |
                             static_cast<unsigned char>(buffer[2]) << 8 |
                             static_cast<unsigned char>(buffer[3]));
    return a;
}

int main(void) {
    char buff[4] = {0x0, 0x0, 0x3e, static_cast<char>(0xe3)};
    int a = buffToInteger(buff);
    std::cout << std::hex << a << std::endl;
    return 0;
}
Be careful about bit shifting signed values. Promotions don't just widen the value; for negative values they sign-extend it, which changes the bit pattern you end up shifting.
For example a gotcha here is that you can't use static_cast<unsigned int>(buffer[1]) (etc.) directly because that converts the signed char value to a signed int and then reinterprets that value as an unsigned.
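A tiny demonstration of that gotcha, assuming a platform where plain char is signed:
#include <iostream>

int main()
{
    char c = static_cast<char>(0xe3);  // -29 where plain char is signed
    std::cout << std::hex
              << static_cast<unsigned int>(c) << '\n'                               // ffffffe3 : sign-extended first
              << static_cast<unsigned int>(static_cast<unsigned char>(c)) << '\n';  // e3
    return 0;
}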
If you ask me, all implicit numeric conversions are bad. No program should have so many of them that writing the conversions out explicitly would become a chore. They're a softness in C++ inherited from C that causes all sorts of problems that far exceed their value.
It's even worse in C++ because they make the already confusing overloading rules even more confusing.
I think this could also be done with memcpy:
#include <cstring>

int buffToInteger(char* buffer)
{
    int a;
    std::memcpy(&a, buffer, sizeof(int));
    return a;
}
This can be faster than the example in the original post, because it just copies the bytes "as is" and there is no need for bit-shift operations. It also doesn't cause any signed/unsigned issues.
Note, however, that the result now depends on the machine's byte order: the shifting version always interprets the buffer as big-endian, while the memcpy version reads it in the host's native endianness.
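A small sketch of that byte-order dependence, using the bytes from the question (the function is renamed here only to keep it distinct from the shifting version):
#include <cstring>
#include <iostream>

int buffToIntegerMemcpy(const char* buffer)
{
    int a;
    std::memcpy(&a, buffer, sizeof(int));
    return a;
}

int main()
{
    char buff[4] = {0x00, 0x00, 0x3e, static_cast<char>(0xe3)};
    // On a little-endian machine this prints e33e0000, not 3ee3,
    // because buff[0] ends up as the least significant byte.
    std::cout << std::hex << buffToIntegerMemcpy(buff) << '\n';
    return 0;
}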
char buffer[4];
int a;
a = *(int*)&buffer;
This takes the buffer's address, casts it to an int pointer and then dereferences it. Note that, like the reinterpret_cast version below, this relies on the host byte order, assumes the buffer is suitably aligned, and formally violates the strict-aliasing rule.
int buffToInteger(char * buffer)
{
    return *reinterpret_cast<int*>(buffer);
}
This conversion is simple and fast: we just tell the compiler to treat the byte array in memory as a single integer. As noted above, the result depends on the host byte order, and the access carries the same alignment and strict-aliasing caveats.
The following are different programs/scenarios using unsigned int with their respective outputs. I don't know why some of them are not working as I intended.
Expected output: 2
Program 1:
#include <iostream>

int main()
{
    int value = -2;
    std::cout << (unsigned int)value;
    return 0;
}
// OUTPUT: 4294967294
Program 2:
#include <iostream>

int main()
{
    int value;
    value = -2;
    std::cout << (unsigned int)value;
    return 0;
}
// OUTPUT: 4294967294
Program 3:
#include <iostream>

int main()
{
    int value;
    std::cin >> value; // 2
    std::cout << (unsigned int)value;
    return 0;
}
// OUTPUT: 2
Can someone explain why Program 1 and Program 2 don't work? Sorry, I'm new at coding.
You are expecting the cast from int to unsigned int to simply change the sign of a negative value while maintaining its magnitude. But that isn't how it works in C or C++. When given out-of-range values, unsigned integers follow modular arithmetic, meaning that assigning or initializing them from negative values such as -1 or -2 wraps around to the largest and second-largest unsigned values, and so on. So, for example, these two are equivalent:
unsigned int n = -1;
unsigned int m = -2;
and
unsigned int n = std::numeric_limits<unsigned int>::max();
unsigned int m = std::numeric_limits<unsigned int>::max() - 1;
Also note that there is no substantial difference between programs 1 and 2. It is all down to the sign of the value used to initialize or assign to the unsigned integer.
Casting a value from signed to unsigned changes how the individual bits of the value are interpreted. Let's have a look at a simple example with an 8-bit value like char and unsigned char.
The values of a (signed) char range from -128 to 127. Including 0, these are 256 (2^8) values. The first (most significant) bit indicates whether the value is negative or positive, so only the remaining 7 bits are available for the magnitude.
An unsigned char can't take any negative values because there is no bit reserved to say whether the value is negative or positive. Therefore its values range from 0 to 255.
When all bits are set (1111 1111) the unsigned char has the value 255. A plain signed char, however, treats the first bit as the sign indicator; in two's complement this bit pattern is -1.
This is the reason the cast from int to unsigned int does not do what you expected it to do, but it does exactly what it's supposed to do.
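A two-line illustration of the same bit pattern read both ways (assuming plain char is signed, as on most common platforms):
#include <iostream>

int main()
{
    char c = -1;  // bit pattern 1111 1111 in two's complement
    std::cout << static_cast<int>(static_cast<unsigned char>(c)) << '\n';  // 255
    std::cout << static_cast<int>(c) << '\n';                              // -1
    return 0;
}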
EDIT
If you just want to turn negative values into positive ones, write yourself a simple function like this:
#include <cstdint>

uint32_t makeUnsigned(int32_t toCast)
{
    if (toCast < 0)
        toCast *= -1;  // note: undefined behaviour for INT32_MIN, which has no positive counterpart
    return static_cast<uint32_t>(toCast);
}
This way you will convert your incoming int to an unsigned int whose value is the magnitude of the input, at most 2^31 - 1 (with INT32_MIN needing separate handling).
I am scanning through the byte representation of an int variable and getting a somewhat unexpected result.
If I do
int a = 127;
cout << (unsigned int) *((char *)&a);
I get 127 as expected. If I do
int a = 256;
cout << (unsigned int) *((char *)&a + 1);
I get 1 as expected. But if I do
int a = 128;
cout << (unsigned int) *((char *)&a);
I have 4294967168 which is, well… quite fancy.
The question is: is there a way to get 128 when looking at first byte of an int variable which value is 128?
For the same reason that (unsigned int)(char)128 is 4294967168: char is signed by default on most commonly used systems. 128 cannot fit in a signed 8-bit quantity, so when you cast it to char, you get -128 (0x80 in hex).
Then, when you cast -128 to an unsigned int, you get 2^32 - 128, which is 4294967168.
If you want to get +128, then use an unsigned char instead of char.
char is signed here. In your second example, a = 256 = 0x00000100, so on a little-endian machine *((char *)&a + 1) reads the second byte, which is 0x01 = 1; converted to unsigned int that is 0b00000000000000000000000000000001, i.e. 1.
In your third example, *((char *)&a) reads the byte 0x80, which as a signed char is -128; converted to unsigned int it becomes 2^32 - 128 = 0b11111111111111111111111110000000, which is 4294967168.
As the comments have pointed out, it looks like what's happening here is that you are running into an oddity of two's complement. In your last cast, since you are not using an unsigned char, the highest-order bit of the byte is being used to indicate positive or negative values. You then only have 7 bits out of the full 8 to represent your magnitude, giving you a range of 0 to 127 for positive numbers (-128 to 127 overall).
If you exceed this range, then it wraps, and you get -128, which when cast back to an unsigned int will result in that abnormally large value.
int a = 128;
cout << (unsigned int) *((unsigned char *)&a);
Also, all of your code depends on running on a little-endian machine.
Here's how you should probably be doing these things:
int a = 127;
cout << (unsigned)(unsigned char)(0xFF & a);
int a = 256;
cout << (unsigned)(unsigned char)(0xFF & (a>>8));
int a = 128;
cout << (unsigned)(unsigned char)(0xFF & a);
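Wrapped into one runnable program, just to show the three snippets together with their outputs (a small sketch, not part of the original answer):
#include <iostream>

int main()
{
    {
        int a = 127;
        std::cout << (unsigned)(unsigned char)(0xFF & a) << '\n';        // 127
    }
    {
        int a = 256;
        std::cout << (unsigned)(unsigned char)(0xFF & (a >> 8)) << '\n'; // 1
    }
    {
        int a = 128;
        std::cout << (unsigned)(unsigned char)(0xFF & a) << '\n';        // 128
    }
    return 0;
}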