When I was working on string::npos I noticed something and I couldn't find any explanation for it on the web.
(string::npos == ULONG_MAX)
and
(string::npos == -1)
are true.
So I tried this:
(18446744073709551615 == -1)
which is also true.
How can this be possible? Is it because of binary conversion?
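For reference, here is a minimal program reproducing all three comparisons (compilers may warn about the signed/unsigned mix; the ULL suffix is added so the large literal has a valid type):

#include <climits>
#include <iostream>
#include <string>

int main() {
    std::cout << std::boolalpha;
    std::cout << (std::string::npos == ULONG_MAX) << '\n';  // true where size_type is 64-bit
    std::cout << (std::string::npos == -1) << '\n';         // true
    std::cout << (18446744073709551615ULL == -1) << '\n';   // true: -1 converts to unsigned
}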
18,446,744,073,709,551,615
This number, 18,446,744,073,709,551,615, is actually 2^64 − 1. The key point is that the maximum value of an n-bit unsigned integer is 2^n − 1, not 2^n, because counting starts at 0: a type whose maximum value is 1 can hold two distinct values, 0 and 1.
Let's look at 2^64 - 1 in 64-bit binary; all the bits are on.
1111111111111111111111111111111111111111111111111111111111111111b
The -1
Let's look at +1 in 64-bit binary.
0000000000000000000000000000000000000000000000000000000000000001b
To make it negative in one's complement (OCP), we invert the bits.
1111111111111111111111111111111111111111111111111111111111111110b
Computers seldom use OCP; they use two's complement (TCP). To get TCP, you add one to the OCP value.
1111111111111111111111111111111111111111111111111111111111111110b (-1 in OCP)
+ 1b (1)
-----------------------------------------------------------------
1111111111111111111111111111111111111111111111111111111111111111b (-1 in TCP)
"But, wait" you ask, if in Twos Complement -1 is,
1111111111111111111111111111111111111111111111111111111111111111b
and if in binary 2^64 - 1 is
1111111111111111111111111111111111111111111111111111111111111111b
then they're equal!" And that's exactly what you're seeing: you're comparing a signed 64-bit integer to an unsigned 64-bit integer. In C++ that means converting the signed value to unsigned, which the compiler does for you.
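A quick way to see this in code (a minimal sketch; assumes fixed-width 64-bit types from <cstdint>):

#include <cstdint>
#include <iostream>

int main() {
    std::int64_t  s = -1;
    std::uint64_t u = s;  // the signed-to-unsigned conversion described above
    std::cout << u << '\n';              // 18446744073709551615
    std::cout << std::hex << u << '\n';  // ffffffffffffffff: all 64 bits set
}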
Update
For a technical correction, thanks to davmac in the comments: the conversion from the signed value -1 to an unsigned type of the same size is actually specified by the language, not a function of the architecture. That said, you may still find the answer above useful for understanding architectures and languages that use two's complement but lack a spec guaranteeing results you can depend on.
string::npos is defined as constexpr static std::string::size_type string::npos = -1; (or if it's defined inside the class definition that would be constexpr static size_type npos = -1; but that's really irrelevant).
The wraparound of negative numbers converted to unsigned types (std::string::size_type is basically std::size_t, which is unsigned) is perfectly well-defined by the Standard. -1 wraps to the largest representable value of the unsigned type, which in your case is 18446744073709551615. Note that the exact value is implementation-defined because the size of std::size_t is implementation-defined (but capable of holding the size of the largest possible array on the system in question).
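To see the wraparound in isolation (a minimal sketch; the printed value is the implementation-defined maximum mentioned above):

#include <iostream>
#include <string>

int main() {
    std::string::size_type n = -1;  // wraps to the largest value of size_type
    std::cout << n << '\n';                         // e.g. 18446744073709551615
    std::cout << (n == std::string::npos) << '\n';  // 1
}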
According to the C++ Standard (Document Number N3337 or N4296), std::string::npos is defined the following way:
static const size_type npos = -1;
where std::string::size_type is some unsigned integer type. So there is nothing surprising about std::string::npos being equal to -1: the initializer is converted to the type of std::string::npos.
As for this comparison being true:
(string::npos == ULONG_MAX)
it means that in the implementation used, std::string::size_type is unsigned long. That type usually corresponds to size_t.
In this comparison
(18446744073709551615 == -1)
the left operand is a literal with some unsigned integral type large enough to store such a big value. The right operand is therefore also converted to that unsigned type, and under the modulo rules for signed-to-unsigned conversion it becomes the maximum value of the type. As the left operand is exactly that maximum value, the two compare equal.
This comes down to the conversion rules and the fact that negative numbers are stored as two's complement: to negate a number, you invert all the bits and add one. So in an 8-bit comparison, 255 and -1 have the same binary value, 11111111. The same applies to bigger integers.
https://en.m.wikipedia.org/wiki/Two%27s_complement
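The 8-bit case is easy to check at compile time (using std::uint8_t from <cstdint>):

#include <cstdint>

static_assert(static_cast<std::uint8_t>(-1) == 255,
              "-1 and 255 share the 8-bit pattern 11111111");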
Related
In C or C++ it is said that the maximum number a size_t (an unsigned integer type) can hold is the same as casting -1 to that data type. For example, see Invalid Value for size_t.
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 is set to form a negative number). Then 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK, the binary representation of -1 as an int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting ints to unsigned? Or: how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C ((unsigned int)-1) == 0xFFFFFFFF and ((int)0xFFFFFFFF) == -1 are as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
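These guarantees can be checked at compile time (a minimal sketch using static_assert and the fixed-width types from <cstdint>):

#include <cstddef>
#include <cstdint>

static_assert(static_cast<std::size_t>(-1) == SIZE_MAX,
              "(size_t) -1 equals SIZE_MAX");
static_assert(static_cast<std::uint16_t>(-1) == 65535,
              "the identity holds for every unsigned type");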
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (as on most, if not all, 64-bit systems), the bit patterns at the memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are both -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
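Putting the three conversions together (a sketch; the value of narrowed is implementation-defined, and the comments assume 32-bit int and 64-bit long, as on most 64-bit Unix systems):

#include <climits>
#include <cstddef>
#include <iostream>

int main() {
    int i = -1;
    long j = i;                        // value-preserving conversion: j is -1
    std::size_t s = (std::size_t) -1;  // guaranteed: the maximum size_t value
    long big = LONG_MAX;
    int narrowed = big;                // implementation-defined if LONG_MAX > INT_MAX
    std::cout << j << ' ' << s << ' ' << narrowed << '\n';
}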
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
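The wraparound is easy to observe (a small sketch; the ffffffff output assumes a 32-bit unsigned int):

#include <iostream>

int main() {
    unsigned int x = 0;
    --x;  // 0 - 1 wraps modulo 2^32
    std::cout << std::hex << x << '\n';  // ffffffff
}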
Because the bit pattern for an int -1 is 0xFFFFFFFF in hexadecimal, or 11111111111111111111111111111111 in binary.
In an int, the first (most significant) bit signifies whether the number is negative.
In an unsigned int, that first bit is just an extra value bit, because an unsigned int cannot be negative. The extra bit is what lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary) or 0xFFFFFFFF (hexadecimal) is the biggest number it can store.
Some people recommend against unsigned ints because if a computation would go below zero, the value wraps around to the biggest numbers instead.
I could use a little help clarifying this strange comparison when dealing with vector.size() aka size_type
vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;

while( rebuildIndex >= rebuiltFaces.size() ) {
    cout << (rebuildIndex >= rebuiltFaces.size()) << " , "
         << rebuildIndex << " >= " << rebuiltFaces.size() << endl;
    --rebuildIndex;
}
And what I get out of the console is
rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0
If I had to guess, I would say the compiler is blindly casting rebuildIndex to unsigned, and the sign bit is causing things to behave oddly, but I'm really not sure. Does anyone know?
As others have pointed out, this is due to the somewhat
counter-intuitive rules C++ applies when comparing values with different
signedness; the standard requires the compiler to convert both values to
unsigned. For this reason, it's generally considered best practice to
avoid unsigned unless you're doing bit manipulations (where the actual
numeric value is irrelevant). Regretfully, the standard containers
don't follow this best practice.
If you somehow know that the size of the vector can never overflow
int, then you can just cast the results of std::vector<>::size() to
int and be done with it. This is not without danger, however; as Mark
Twain said: "It's not what you don't know that kills you, it's what you
know for sure that ain't true." If there are no validations when
inserting into the vector, then a safer test would be:
while ( rebuiltFaces.size() <= INT_MAX
        && rebuildIndex >= (int)rebuiltFaces.size() )
Or if you really don't expect the case, and are prepared to abort if it
occurs, design (or find) a checked_cast function, and use it.
On any modern computer that I can think of, signed integers are represented as two's complement. 32-bit int max is 0x7fffffff, and int min is 0x80000000, this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.
When the number is cast from a signed value to an unsigned value the bits stored in the register don't change. This makes a barely negative value like -1 into a huge unsigned number (unsigned max), and this would make that loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.
It's all perfectly logical, just not necessarily the logic you expected.
Example...
$ cat foo.c
#include <stdio.h>
int main (int a, char** v) {
unsigned int foo = 1;
int bar = -1;
if(foo < bar) printf("wat\n");
return 0;
}
$ gcc -o foo foo.c
$ ./foo
wat
$
In the C and C++ languages, when the unsigned type has the same or greater width than the signed type, mixed signed/unsigned comparisons are performed in the domain of the unsigned type: the signed value is implicitly converted to the unsigned type. There's nothing about the "compiler" doing anything "blindly" here; it has been like that in C and C++ since the beginning of time.
This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this
rebuildIndex >= rebuiltFaces.size()
is actually interpreted as
(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()
When a signed value is converted to an unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is the well-known fundamental principle behind unsigned arithmetic in C and C++.
Again, all of this is required by the language; it has absolutely nothing to do with how numbers are represented in the machine, which bits are stored where, and so on.
Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.
The reason is that unsigned 'overflow' behavior is strictly defined as converting the value to a number between 0 and the maximum value of the type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract one more than the maximum (that is, 2^n for an n-bit type) until it is in range. If the value is smaller than the smallest value (0), you repeatedly add 2^n until it is in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.
Roughly speaking, C++ defines promotion rules like this: any char or short type is first promoted to int, regardless of signedness. Then, in a comparison, the smaller type is converted up to the larger type. If the two types are the same size but one is signed and one is unsigned, the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t: 1 is converted to 1u, 0 to 0u, and -1 to -1u, which as an unsigned value is the largest value of type size_t.
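A few compile-time checks that illustrate these rules (a sketch; the last line assumes the usual two-operand arithmetic conversions):

#include <type_traits>

// char/short promote to int before any arithmetic or comparison:
static_assert(std::is_same<decltype(short(1) + short(1)), int>::value,
              "shorts are promoted to int");
// same size, mixed signedness: the signed operand converts to unsigned:
static_assert(std::is_same<decltype(1 + 1u), unsigned int>::value,
              "int converts to unsigned int");
static_assert(-1 + 1u == 0u, "-1 becomes UINT_MAX, then the sum wraps to 0");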
What's the meaning of ~0 in this code?
Can somebody analyze this code for me?
unsigned int Order(unsigned int maxPeriod = ~0) const
{
    Point r = *this;
    unsigned int n = 0;
    while( r.x_ != 0 && r.y_ != 0 )
    {
        ++n;
        r += *this;
        if ( n > maxPeriod ) break;
    }
    return n;
}
~0 is the bitwise complement of 0, which is a number with all bits filled. For an unsigned 32-bit int, that's 0xffffffff. The exact number of fs will depend on the size of the value that you assign ~0 to.
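For example (widths assume the common 32/64-bit sizes of the fixed-width types):

#include <cstdint>

std::uint32_t a = ~0;  // 0xffffffff: 8 fs
std::uint64_t b = ~0;  // 0xffffffffffffffff: 16 fs, because ~0 is int -1,
                       // which converts to all ones at the target width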
It's the one's complement, which inverts all bits.
~ 0101 => 1010
~ 0000 => 1111
~ 1111 => 0000
As others have mentioned, the ~ operator performs bitwise complement. However, the result of applying it to a signed value depends on the signed-number representation, so the standard does not pin it down.
In particular, the value of ~0 need not be -1, which is probably the value intended. Setting the default argument to
unsigned int maxPeriod = -1
would make maxPeriod contain the highest possible value (signed-to-unsigned conversion is defined as assignment modulo 2^n, where n is the number of bits in the representation of the given unsigned type).
Also note that default arguments are not valid in C.
It's a binary complement function.
Basically it means flip each bit.
It is the bitwise complement of 0 which would be, in this example, an int with all the bits set to 1. If sizeof(int) is 4, then the number is 0xffffffff.
Basically, it's saying that maxPeriod has a default value of UINT_MAX. Rather than writing it as UINT_MAX, the author used his knowledge of complements to calculate the value.
If you want to make the code a bit more readable in the future, include
#include <climits>
and change the declaration to read
unsigned int Order(unsigned int maxPeriod = UINT_MAX) const
Now to explain why ~0 is UINT_MAX: we are dealing with an int, in which 0 is represented with all zero bits (00000000). Adding one gives (00000001), adding one more gives (00000010), and one more gives (00000011). Finally one more addition gives (00000100), because the 1s carry.
For unsigned ints, if you repeat the process ad infinitum, you eventually have all one bits (11111111), and adding one more wraps around, setting all the bits back to zero. This means that all one bits in an unsigned number is the maximum value that data type (unsigned int in your case) can hold.
The "~" operation flips all bits from 0 to 1 or 1 to 0, flipping a zero integer (which has all zero bits) effectively gives you UINT_MAX. So he basically the previous coded opted to computer UINT_MAX instead of using the system defined copy located in #include <limits.h>
In the example it is probably an attempt to generate the UINT_MAX value. The technique is possibly flawed for reasons already stated.
The expression does, however, have a legitimate use: to generate a bit mask with all bits set using a literal constant that is type-width independent; but that is not how it is being used in your example.
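A sketch of that mask idiom (the all_ones helper is hypothetical, not from the original code):

#include <cstdint>

// Width-independent all-bits-set mask: ~0 adapts to whatever type T is.
template <typename T>
constexpr T all_ones() { return ~static_cast<T>(0); }

static_assert(all_ones<std::uint16_t>() == 0xFFFF, "16 bits set");
static_assert(all_ones<std::uint64_t>() == 0xFFFFFFFFFFFFFFFFull, "64 bits set");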
As others have said, ~ is the bitwise complement operator (sometimes also referred to as bitwise not). It's a unary operator which means that it takes a single input.
Bitwise operators treat the input as a bit pattern and perform their respective operations on each individual bit, then return the resulting pattern. Applying the ~ operator to a bit pattern will invert each bit (each zero becomes a one, each one becomes a zero).
In the example you gave, the bit representation of the integer 0 is all zeros. Thus, ~0 will produce a bit pattern of all ones. Even though 0 is an int, it is the bit pattern ~0 that is assigned to maxPeriod (not the int value that would be represented by said bit pattern). Since maxPeriod is an unsigned int, it is assigned the unsigned int value represented by ~0 (a pattern of all ones), which is in fact the highest value that an unsigned int can store before wrapping around back to 0.
I have this simple part of my code:
int pch = name.find("#");
if(pch == name.npos) continue;
When name.find doesn't find "#", pch is equal to -1. name.npos, on the other hand, prints as 4294967295. Why is it that, when pch is -1 and name.npos is 4294967295, the program enters the if condition?
string::npos denotes that the position is not found. It is usually represented by a constant value of -1.
Reference
This constant is defined with a value of -1, which, because size_t is an unsigned integral type, is the largest possible representable value for this type.
If find is unsuccessful, it returns npos. Stored into the int pch, that becomes -1.
So when pch is compared with name.npos, pch is converted back to the unsigned type, both sides are equal in your case, and the if is satisfied.
Now, to answer
name.npos instead, if I print it, is 4294967295
because string::npos is of type size_t, which is usually a typedef for an unsigned type. The -1 used to initialize an unsigned type is stored as, and printed as, the maximum possible unsigned value.
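A condensed version of what happens (a sketch; the -1 and 4294967295 values assume 32-bit int and 32-bit size_t, matching the question):

#include <iostream>
#include <string>

int main() {
    std::string name = "no hash here";
    int pch = name.find("#");    // npos narrowed into an int: -1 here
    std::cout << pch << '\n';                // -1
    std::cout << name.npos << '\n';          // 4294967295 with a 32-bit size_t
    if (pch == name.npos)        // pch converts back to unsigned, so this is true
        std::cout << "enters the if\n";
}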
Because of the internal representation of negative numbers. This is called the two's complement.