difference of two string size - c++

for example
string s1 = "";
string s2 = "a";
The print out of
cout << s1.size() - s2.size() << endl;
is 18446744073709551615?
What's happening here?

std::string::size returns a std::size_t - while the actual type used for this is implementation defined, it's guaranteed to be unsigned:
std::size_t is the unsigned integer type of the result of the sizeof operator
As a result, subtracting two std::size_t's (0 - 1, in this case) can result in integer underflow, causing the resulting value to wrap-around from a "negative" value to a very large positive value.
The value you got, 18446744073709551615, is the same as 2^64 - 1 (which is -1 after underflow on a 64-bit integer), which is in accordance with the above.
To "fix" this (ie. to get -1 instead of the underflowed value), you'll need to cast your values to a signed type before doing the subtraction. However as you can see from this post, there's no signed type that's guaranteed to be large enough. We can make do with long long:
long long diff = static_cast<long long>(s1.size()) - static_cast<long long>(s2.size());

The return type of std::string::size is size_t. size_t is an unsigned integer type and as such, an operation of size_t(0) - size_t(1) will result in integer underflow.
If you really need to do this operation, then you can cast the result of size to a signed integer type to do the calculation:
cout << static_cast<long int>(s1.size()) - static_cast<long int>(s2.size()) << endl;

string::size returns a size_t - i.e. size_t is an unsigned type. So it can store only positive numbers.
So -1 is converted to an unsigned number.
As per the standard, the negative of an unsigned quantity is computed by subtracting its value from 2n, where n is the number of bits in the promoted operand.
On your system, size_t is a 64 bit unsigned type, so n is 64.
So what's printed is
264 - 1 = 18446744073709551615

Related

Why are these two numbers comparing equal?

I'm wondering why it's the case that these two numbers are comparing equal. I had a (maybe false) realisation that I messed up my enums, because in my enums I often do:
enum class SomeFlags : unsigned long long { ALL = ~0, FLAG1 = 1, FLAG2 };
And I thought that 0 being an int and being assigned to an unsigned long long, that there was a mistake.
unsigned long long number1 = ~0;
/* I expect the 0 ( which is an int, to be flipped, which would make it
the max value of an unsigned int. It would then get promoted and be
assigned that value to an unsigned long long. So number1 should
have the value of max value of unsigned int*/
int main()
{
unsigned long long number2 = unsigned long long(0);
number2 = number2 - 1;
/* Here I expect number2 to have the max value of unsigned long long*/
if (number1 == number2)
{
std::cout << "Is same\n";
// They are the same, but how???
}
}
This is really confusing. To emphasise the point:
unsigned long long number1 = int(~0);
/* I feel like I'm saying:
- initialize an int with the value of zero with its bits flipped
- assign it to the unsigned long long
And this somehow ends up as the max value of unsigned long long???*/
cppreference: "The result of operator~ is the bitwise NOT (one's complement) value of the argument (after promotion."
No promotion (to int or unsigned int) is however needed here - but flipping all bits in a signed type with the value 0 will make it -1 (all bits set) which when converted to an unsigned type will be the largest value that type can hold.
To avoid this, make 0 unsigned. The result of ~0U will be an unsigned int with all bits set, which can be converted to any larger unsigned type perfectly.
unsigned long long number1 = ~0U;
You seems to be thinking that the conversion of int to long long is done just by prepending 0 in front of the int. While this is true for the conversion of positive integer, it's not true for negative integers.
unsigned long long number1 = ~0
is identical as
unsigned long long number1 = -1
So when the conversion takes place, it is not simply copying the bits into number1
Since C++11 unsigned arithmetic must use modulo wrapping.
Which means that if you have N bits then any operation OP works like this:
result = (a OP b) % (2^N)
Say you have 8 bits and substract as operation.
unsigned char result1 = (0 - 1) % 256 = -1 % 256 = 255 = 0b11111111
unsigned char result2 = ~(0b00000000) = 0b11111111 = 255
The same holds for larger N and thus your result
In C/C++ integer promotion is governed by the type of the SOURCE argument, not the DESTINATION.
So - the promotion from signed integer types will be sign extended - regardless the type of the destination, be it signed or unsigned.
The promotion from unsigned integer types will be zero extended - again, regardless the type of destination.
In your case 0 is signed integer and ~0 is also signed integer. This will be sign extended to unsigned long long.
The conversion from int to unsigned long long is specified based on the value of the int (aka "the source integer") not the bit pattern. The value is -1 in this case:
4.7 Integral conversions [conv.integral]
A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
For an integer with a value of -1 the "least unsigned integer congruent to the source integer" modulo 2^64 is (-1 + 2^64).
The simple-to-understand effect of this rule is that int is sign-extended when being converted to an unsigned type.

Is a negative integer summed with a greater unsigned integer promoted to unsigned int?

After getting advised to read "C++ Primer 5 ed by Stanley B. Lipman" I don't understand this:
Page 66. "Expressions Involving Unsigned Types"
unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264
He said:
In the second expression, the int value -42 is converted to unsigned before the addition is done. Converting a negative number to unsigned behaves exactly as if we had attempted to assign that negative value to an unsigned object. The value “wraps around” as described above.
But if I do something like this:
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
As you can see -10 is not converted to unsigned int. Does this mean a comparison occurs before promoting a signed integer to an unsigned integer?
-10 is being converted to a unsigned integer with a very large value, the reason you get a small number is that the addition wraps you back around. With 32 bit unsigned integers -10 is the same as 4294967286. When you add 42 to that you get 4294967328, but the max value is 4294967296, so we have to take 4294967328 modulo 4294967296 and we get 32.
Well, I guess this is an exception to "two wrongs don't make a right" :)
What's happening is that there are actually two wrap arounds (unsigned overflows) under the hood and the final result ends up being mathematically correct.
First, i is converted to unsigned and as per the wrap around behavior the value is std::numeric_limits<unsigned>::max() - 9.
When this value is summed with u the mathematical result would be std::numeric_limits<unsigned>::max() - 9 + 42 == std::numeric_limits<unsigned>::max() + 33 which is an overflow and we get another wrap around. So the final result is 32.
As a general rule in an arithmetic expression if you only have unsigned overflows (no matter how many) and if the final mathematical result is representable in the expression data type, then the value of the expression will be the mathematically correct one. This is a consequence of the fact that unsigned integers in C++ obey the laws of arithmetic modulo 2n (see bellow).
Important notice. According to C++ unsigned arithmetic does not overflow:
§6.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2n where n
is the number of bits in the value representation of that particular
size of integer 49
49) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
I will however leave "overflow" in my answer to express values that cannot be represented in regular arithmetic.
Also what we colloquially call "wrap around" is in fact just the arithmetic modulo nature of the unsigned integers. I will however use "wrap around" also because it is easier to understand.
i is in fact promoted to unsigned int.
Unsigned integers in C and C++ implement arithmetic in ℤ / 2nℤ, where n is the number of bits in the unsigned integer type. Thus we get
[42] + [-10] ≡ [42] + [2n - 10] ≡ [2n + 32] ≡ [32],
with [x] denoting the equivalence class of x in ℤ / 2nℤ.
Of course, the intermediate step of picking only non-negative representatives of each equivalence class, while it formally occurs, is not necessary to explain the result; the immediate
[42] + [-10] ≡ [32]
would also be correct.
"In the second expression, the int value -42 is converted to unsigned before the addition is done"
yes this is true
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
Supposing we are in 32 bits (that change nothing in 64b, this is just to explain) this is computed as 42u + ((unsigned) -10) so 42u + 4294967286u and the result is 4294967328u truncated in 32 bits so 32. All was done in unsigned
This is part of what is wonderful about 2's complement representation. The processor doesn't know or care if a number is signed or unsigned, the operations are the same. In both cases, the calculation is correct. It's only how the binary number is interpreted after the fact, when printing, that is actually matters (there may be other cases, as with comparison operators)
-10 in 32BIT binary is FFFFFFF6
42 IN 32bit BINARY is 0000002A
Adding them together, it doesn't matter to the processor if they are signed or unsigned, the result is: 100000020. In 32bit, the 1 at the start will be placed in the overflow register, and in c++ is just disappears. You get 0x20 as the result, which is 32.
In the first case, it is basically the same:
-42 in 32BIT binary is FFFFFFD6
10 IN 32bit binary is 0000000A
Add those together and get FFFFFFE0
FFFFFFE0 as a signed int is -32 (decimal). The calculation is correct! But, because it is being PRINTED as an unsigned, it shows up as 4294967264. It's about interpreting the result.

Why does the unsigned int give a negative value in c++?

I have two functions add and main as follows.
int add(unsigned int a, unsigned int b)
{
return a+b;
}
int main()
{
unsigned int a,b;
cout << "Enter a value for a: ";
cin >> a;
cout << "Enter a value for b: ";
cin >> b;
cout << "a: " << a << " b: "<<b <<endl;
cout << "Result is: " << add(a,b) <<endl;
return 0;
}
When I run this program I get the following results:
Enter a value for a: -1
Enter a value for b: -2
a: 4294967295 b: 4294967294
Result is: -3
Why is the result -3?
Because add returns an int (no unsigned int) which cannot represent 4294967295 + 4294967294 = 4294967293 (unsigned integer arithmetic is defined mod 2^n with n = 32 in this case) because the result is too big.
Thus, you have signed integer overflow (or, more precisely, an implicit conversion from a source integer that cannot be represented as int) which has an implementation defined result, i.e. any output (that is representable as int) would be "correct".
The reason for getting exactly -3 is that the result is 2^32 - 3 and that gets converted to -3 on your system. But still, note that any result would be equally legal.
int add(unsigned int a, unsigned int b)
{
return a+b;
}
The expression a+b adds two operands of type unsigned int, yielding an unsigned int result. Unsigned addition, strictly speaking, does not "overflow"; rather than result is reduced modulo MAX + 1, where MAX is the maximum value of the unsigned type. In this case, assuming 32-bit unsigned int, the result of adding 4294967295 + 4294967294 is well defined: it's 4294967293, or 232-3.
Since add is defined to return an int result, the unsigned value is implicitly converted from unsigned int to int. Unlike an arithmetic overflow, an unsigned-to-signed conversion that can't be represented in the target type yields an implementation-defined result. On a typical implementation, such a conversion (where the source and target have the same size) will reinterpret the representation, yielding -3. Other results are possible, depending on the implementation, but not particularly likely.
As for why a and b were set to those values in the first place, apparently that's how cin >> a behaves when a is an unsigned value and the input is negative. I'm not sure whether that behavior is defined by the language, implementation-defined, or undefined. In any case, once you have those values, the result returned by add follows as described above.
If you are intending to return an unsigned int then you need to add unsigned to your function declaration. If you changed your return type to an unsigned int
and you use the values -1 & -2 then this will be your output:
a: 4294967295 b: 4294967294
Result: 4294967293
unsigned int ranges from [0, 4294967295] provided an unsigned int is 4bytes in size on your local machine. When you input -1 for an unsigned int you have buffer overflow and what happens here is the compiler will set -1 to be the largest possible valid number in an unsigned int. When you pass -2 into your function the same thing happens but you are being index back to the second largest value. With unsigned int there is no "sign" for negatives stored. If you take the largest possible value of an unsigned as stated above by (a) and add 1 it will give you a value of 0. These values are passed into the function, and the function creates 2 stack variables of local scope within this function.
So now you have a copy of a & b on the stack and a has a value of unsigned int max size and b has a value of unsigned int max size - 1. Now you are performing an addition on these two values which exceeds the max value size of an unsigned int and wrapping occurs in the opposite direction. So if the index value starts at .....95 and you add 1 this gives you 0 for the last digit, so if you take the second value which is max value - 1 : .....94 subtract one from it because we already reached 0 now we are in the positive direction. This will give you a result of ......93 which the function is returning if your return type is unsigned int.
This same concept applies if your return type is int, however what happens here
is this: The addition of the two unsigned values are the same result giving you
.....93 then the compiler will implicitly cast this to an int. Well here the range value for a signed int is -2147483648 to 2147483647 - one bit is used to store the (signed value) but it also depends on if two's compliment is being used etc.
The compiler here is smart enough to know that the signed int has a range of these values but the wrapping still occurs. What happens when we store 4294967293 into a singed int? Well the max value of int in the positive is 2147483647 so if we subtract the two (4294967293 - 2147483647) we would get 2147483646 left over. At this point does this value fit in the range of a signed int max value? It does however because of the implicit casting being done from an unsigned to a signed value we have 2 bits to account for, the signed itself and the wrapping value meaning max_value + 1 = 0 to account for, except this doesn't happen with signed values when you add 1 to max_value you get the largest possible -max_value.
For the unsigned values:
- ...95 = -1
- ...94 = -2
- ...93 = -3
With signed values the negative is preserved in its own bit, or if twos compliment is used etc., pending on the definitions to a signed int within the compiler. Here the compiler recognizes this unsigned value as being negative even though negative values are not stored in an unsigned for the sign is not preserved. So when it explicitly casts from an unsigned to signed one bit is used to store the signed value then the calculations are done and the wrapping occurs from a buffer overflow. So as we subtracted the actual unsigned value that would represent -3 : 4294967293 with the max+ value for a signed int 2147483647 we got the value 2147483646 now with signed ints if you add 1 to the max value it does not give you 0 like an unsigned does, it will give you the largest -signed value which for a 4byte int is -2147483648, since this does fit in the max+ value and the compiler knows that this is supposed to be a negative value if we add our remainder by the -max_value for a signed int : 2147483646 + (-2147483648) this will give us -2, but because of the fact that the wrapping with int is different we have to subtract 1 from this -2 - 1 = -3.
When it converts from an unsigned int to an int, the number is over what a signed integer can hold. This causes it to overflow as a negative value.
If you change the return value to an unsigned integer, your problem should be solved.

Wrap around range of unsigned int in C++?

I am going through C++ Primer 5th Edition and am currently doing the signed/unsigned section. A quick question I have is when there is a wrap-around, say, in this block of code:
unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264
I thought that the max range was 4294967295 with the 0 being counted, so I was wondering why the wrap-around seems to be done from 4294967296 in this problem.
Unsigned arithmetic is modulo (maximum value of the type plus 1).
If maximum value of an unsigned is 4294967295 (2^32 - 1), the result will be mathematically equal to (10-42) modulo 4294967296 which equals 10-42+4294967296 i.e. 4294967264
When an out-of-range value is converted to an unsigned type, the result is the remainder of it modulo the number of values the target unsigned type can hold. For instance, the result of n converted to unsigned char is n % 256, because unsigned char can hold values 0 to 255.
It's similar in your example, the wrap-around is done using 4294967296, the number of values that a 32-bit unsigned integer can hold.
Given unsigned int that is 32 bits you're correct that the range is [0, 4294967295].
Therefore -1 is 4294967295. Which is logically equivalent to 4294967296 - 1 which should explain the behavior you're seeing.

c/c++ left shift unsigned vs signed

I have this code.
#include <iostream>
int main()
{
unsigned long int i = 1U << 31;
std::cout << i << std::endl;
unsigned long int uwantsum = 1 << 31;
std::cout << uwantsum << std::endl;
return 0;
}
It prints out.
2147483648
18446744071562067968
on Arch Linux 64 bit, gcc, ivy bridge architecture.
The first result makes sense, but I don't understand where the second number came from. 1 represented as a 4byte int signed or unsigned is
00000000000000000000000000000001
When you shift it 31 times to the left, you end up with
10000000000000000000000000000000
no? I know shifting left for positive numbers is essentially 2^k where k is how many times you shift it, assuming it still fits within bounds. Why is it I get such a bizarre number?
Presumably you're interested in why this: unsigned long int uwantsum = 1 << 31; produces a "strange" value.
The problem is pretty simple: 1 is a plain int, so the shift is done on a plain int, and only after it's complete is the result converted to unsigned long.
In this case, however, 1<<31 overflows the range of a 32-bit signed int, so the result is undefined1. After conversion to unsigned, the result remains undefined.
That said, in most typical cases, what's likely to happen is that 1<<31 will give a bit pattern of 10000000000000000000000000000000. When viewed as a signed 2's complement2 number, this is -2147483648. Since that's negative, when it's converted to a 64-bit type, it'll be sign extended, so the top 32 bits will be filled with copies of what's in bit 31. That gives: 1111111111111111111111111111111110000000000000000000000000000000 (33 1-bits followed by 31 0-bits).
If we then treat that as an unsigned 64-bit number, we get 18446744071562067968.
§5.8/2:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
In theory, the computer could use 1's complement or signed magnitude for signed numbers--but 2's complement is currently much more common than either of those. If it did use one of those, we'd expect a different final result.
The literal 1 with no U is a signed int, so when you shift << 31, you get integer overflow, generating a negative number (under the umbrella of undefined behavior).
Assigning this negative number to an unsigned long causes sign extension, because long has more bits than int, and it translates the negative number into a large positive number by taking its modulus with 264, which is the rule for signed-to-unsigned conversion.
It's not "bizarre".
Try printing the number in hex and see if it's any more recognizable:
std::cout << std::hex << i << std::endl;
And always remember to qualify your literals with "U", "L" and/or "LL" as appropriate:
http://en.cppreference.com/w/cpp/language/integer_literal
unsigned long long l1 = 18446744073709550592ull;
unsigned long long l2 = 18'446'744'073'709'550'592llu;
unsigned long long l3 = 1844'6744'0737'0955'0592uLL;
unsigned long long l4 = 184467'440737'0'95505'92LLU;
I think it is compiler dependent .
It gives same value
2147483648
2147483648
on my machiene (g++) .
Proof : http://ideone.com/cvYzxN
And if overflow is there , then because uwantsum is unsigned long int and unsigned values are ALWAYS positive , conversion is done from signed to unsigned by using (uwantsum)%2^64 .
Hope this helps !
Its in the way you printed it out.
using formar specifier %lu should represent a proper long int