Why does 4294967295 (the highest number at 32bit) equal -1? - c++

I have this simple part of my code:
int pch = name.find("#");
if(pch == name.npos) continue;
When name.find doesn't find "#", pch is equal to -1. If I print name.npos instead, it is 4294967295. Why is it that in this case, when pch is -1 and name.npos is 4294967295, the program enters the if condition?

string::npos denotes that the position is not found. It is usually represented by a constant value of -1.
Reference
This constant is defined with a value of -1, which because size_t is an unsigned integral type, it is the largest possible representable value for this type.
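If find is unsuccessful, it returns string::npos. Storing that value in an int such as pch typically yields -1, and in the comparison pch == name.npos the int is converted back to the unsigned type, which gives npos again.
So both are equal in your case, and the if is satisfied.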
Now, to answer
name.npos instead, if I print it, is 4294967295
because string::npos is of type size_t, which is usually a typedef for an unsigned type. The -1 used to initialize an unsigned type is stored as, and printed as, the maximum possible unsigned value.

Because of the internal representation of negative numbers. This is called the two's complement.

Related

error when cout vector size minus a number bigger than the size ( vector.size()-n )

I have a vector a={1,2}, so a.size()=2;
I found that:
cout<<(a.size()-3);
will give me a weird number (e.g. 18446744073709551615).
why is it not -1?
if I do:
cout<<(a.size()-2);
or
int s = a.size()-3;
cout<<s;
everything is normal (-1).
size() returns a size_t, which is an unsigned type. When performing a calculation that's supposed to return a negative number, it underflows, and you get the huge value you observed. When you explicitly assign the result to a (signed) int, you treat the result as a signed number, and get the expected result of -1.

C++ can not calculate a formula with a vector's size in it?

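#include <cstdio>
#include <vector>
using std::vector;
int main() {
    vector<int> v;
    // v.size() is unsigned, so v.size() - 1 wraps around instead of giving -1
    if (0 < v.size() - 1) {
        printf("true");
    } else {
        printf("false");
    }
}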
It prints true which indicates 0 < -1
std::vector::size() returns an unsigned integer. If it is 0 and you subtract 1, it wraps around and becomes a huge value (specifically std::numeric_limits<std::vector<int>::size_type>::max()). The comparison works fine, but the subtraction produces a value you did not expect.
For more about unsigned underflow (and overflow), see: C++ underflow and overflow
The simplest fix for your code is probably if (1 < v.size()).
v.size() returns a result of size_t, which is an unsigned type. An unsigned value minus 1 is still unsigned. And all non-zero unsigned values are greater than zero.
std::vector<int>::size() returns type size_t which is an unsigned type whose rank is usually at least that of int.
When, in a math operation, you put together a signed type with an unsigned type and the unsigned type doesn't have a lower rank, the signed type will get converted to the unsigned type (see 6.3.1.8 Usual arithmetic conversions (I'm linking to the C standard, but rules for integer arithmetic are foundational and need to be common to both languages)).
In other words, assuming that size_t isn't unsigned char or unsigned short
(it's usually unsigned long and the C standard recommends it shouldn't be unsigned long long unless necessary)
(size_t)0 - 1
gets implicitly translated to
(size_t)0 - (size_t)1
which is a positive number equal to SIZE_MAX (-1 cannot be represented in an unsigned type so it gets converted by formally "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type" (6.3.1.3p)).
0 is always less than SIZE_MAX.

Why is (18446744073709551615 == -1) true?

When I was working on string::npos I noticed something and I couldn't find any explanation for it on the web.
(string::npos == ULONG_MAX)
and
(string::npos == -1)
are true.
So I tried this:
(18446744073709551615 == -1)
which is also true.
How can it be possible? Is it because of binary conversion?
18,446,744,073,709,551,615
This number mentioned, 18,446,744,073,709,551,615, is actually 2^64 − 1. The important thing here is that an unsigned integer starts counting at 0, not 1, so a 64-bit unsigned integer has 2^64 possible values, ranging from 0 to 2^64 − 1. In the same way, if the maximum value is 1, there are two possible values: 0 and 1.
Let's look at 2^64 - 1 in 64bit binary, all the bits are on.
1111111111111111111111111111111111111111111111111111111111111111b
The -1
Let's look at +1 in 64bit binary.
0000000000000000000000000000000000000000000000000000000000000001b
To make it negative in One's Complement (OCP) we invert the bits.
1111111111111111111111111111111111111111111111111111111111111110b
Computers seldom use OCP, they use Two's Complement (TCP). To get TCP, you add one to OCP.
1111111111111111111111111111111111111111111111111111111111111110b (-1 in OCP)
+ 1b (1)
-----------------------------------------------------------------
1111111111111111111111111111111111111111111111111111111111111111b (-1 in TCP)
"But, wait" you ask, if in Twos Complement -1 is,
1111111111111111111111111111111111111111111111111111111111111111b
And, if in binary 2^64 - 1 is
1111111111111111111111111111111111111111111111111111111111111111b
Then they're equal! And, that's what you're seeing. You're comparing a signed 64 bit integer to an unsigned 64bit integer. In C++ that means convert the signed value to unsigned, which the compiler does.
Update
For a technical correction thanks to davmac in the comments, the conversion from -1 which is signed to an unsigned type of the same size is actually specified in the language, and not a function of the architecture. That all said, you may find the answer above useful for understanding the arch/languages that support two's complement but lack the spec to ensure results you can depend on.
string::npos is defined as constexpr static std::string::size_type string::npos = -1; (or if it's defined inside the class definition that would be constexpr static size_type npos = -1; but that's really irrelevant).
The wraparound of negative numbers converted to unsigned types (std::string::size_type is basically std::size_t, which is unsigned) is perfectly well-defined by the Standard. -1 wraps to the largest representable value of the unsigned type, which in your case is 18446744073709551615. Note that the exact value is implementation-defined because the size of std::size_t is implementation-defined (but capable of holding the size of the largest possible array on the system in question).
According to the C++ Standard (Document Number: N3337 or Document Number: N4296) std::string::npos is defined the following way
static const size_type npos = -1;
where std::string::size_type is some unsigned integer type. So there is nothing surprising in std::string::npos being equal to -1. The initializer is converted to the type of std::string::npos.
As for this equation
(string::npos == ULONG_MAX) is true,
then it means that std::string::npos has type unsigned long in the implementation used. This type usually corresponds to the type size_t.
In this equation
(18446744073709551615 == -1)
The left literal has some unsigned integral type that is appropriate to store such a big literal. Thus the right operand is also converted to this unsigned type by propagating the sign bit. As the left operand is itself the maximum value of the type, they are equal.
This is all about signed-to-unsigned conversion and the fact that negative numbers are stored as two's complement. That means that to get the absolute value of a negative number, you invert all the bits and add one. So when doing an 8-bit comparison, 255 and -1 have the same binary value of 11111111. The same applies to bigger integers.
https://en.m.wikipedia.org/wiki/Two%27s_complement

C: why in K&R is written that EOF does not fit in char?

I just started studying The C programming Language and I need to ask you this question:
I know that the function getchar() gives you an integer that is always positive.
for example for \t the result is 9 and this value can be stored in char.
K&R's book says that EOF, which is -1, can't be stored in char (but in reality it can).
Anyway it doesn't work with unsigned char.
The explanation that I give to this is that char can store values from -127 to 127, so it can contain -1, but unsigned char can only go from 0 to 255, so it can't contain -1.
Am I right? And why K&R's book says that?
K&R's book says that EOF, which is -1, can't be stored in char (but in reality it can).
The standard does not specify whether char is signed or unsigned type. It's up to an implementation to decide whether char is signed or unsigned type. For an implementation that uses an unsigned type for char, you cannot hold the value -1 in a char. int is a signed type and it can hold the value -1. That's the reason for int being the return type of getchar, getc, and fgetc.
The getc function in C accesses a stream, and returns either non-negative byte value in the range 0 to UCHAR_MAX, or else a value equal to the EOF constant, which is negative. It is returned as a type int value.
These two data ranges cannot fit into the type char, whether it is signed or unsigned. The EOF value, if stored in a char, creates an ambiguity because it clashes with a valid byte value. The range 0 to UCHAR_MAX already claims every possible value in a character type.
Suppose we're in the now nearly ubiquitous 8 bit, two's complement world. A signed char has a value from -128 to 127. That range covers -1: the value -1 could occur in a stream of char-s. An unsigned char ranges from 0 to 255. The value -1 doesn't occur; but if -1 is converted to unsigned char, it will turn into 255. That is a valid byte value. (Note that EOF isn't necessarily -1, but similar reasoning applies to other negative values. ISO C only says that EOF is negative. It could be INT_MIN!)
If you do capture the return value of getc using a char, then you have to test ferror(stream) || feof(stream) every time you see a value which compares equal to EOF. If this test is false, then the EOF is actually a byte value and you must treat it accordingly.
(This must also be done on a platform where it happens that sizeof (int) == 1).

c++ illogical >= comparison when dealing with vector.size() most likely due to size_type being unsigned

I could use a little help clarifying this strange comparison when dealing with vector.size() aka size_type
vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;
while( rebuildIndex >= rebuiltFaces.size() ) {
cout << (rebuildIndex >= rebuiltFaces.size()) << " , " << rebuildIndex << " >= " << rebuiltFaces.size() << endl;
--rebuildIndex;
}
And what I get out of the console is
rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0
If I had to guess I would say the compiler is blindly casting rebuildIndex to unsigned and the +/- bit is causing things to behave oddly, but I'm really not sure. Does anyone know?
As others have pointed out, this is due to the somewhat
counter-intuitive rules C++ applies when comparing values with different
signedness; the standard requires the compiler to convert both values to
unsigned. For this reason, it's generally considered best practice to
avoid unsigned unless you're doing bit manipulations (where the actual
numeric value is irrelevant). Regretfully, the standard containers
don't follow this best practice.
If you somehow know that the size of the vector can never overflow
int, then you can just cast the results of std::vector<>::size() to
int and be done with it. This is not without danger, however; as Mark
Twain said: "It's not what you don't know that kills you, it's what you
know for sure that ain't true." If there are no validations when
inserting into the vector, then a safer test would be:
while ( rebuiltFaces.size() <= INT_MAX
&& rebuildIndex >= (int)rebuiltFaces.size() )
Or if you really don't expect the case, and are prepared to abort if it
occurs, design (or find) a checked_cast function, and use it.
On any modern computer that I can think of, signed integers are represented as two's complement. 32-bit int max is 0x7fffffff, and int min is 0x80000000, this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.
When the number is cast from a signed value to an unsigned value the bits stored in the register don't change. This makes a barely negative value like -1 into a huge unsigned number (unsigned max), and this would make that loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.
It's all perfectly logical, just not necessarily the logic you expected.
Example...
$ cat foo.c
#include <stdio.h>
int main (int a, char** v) {
unsigned int foo = 1;
int bar = -1;
if(foo < bar) printf("wat\n");
return 0;
}
$ gcc -o foo foo.c
$ ./foo
wat
$
In the C and C++ languages, when the unsigned type has the same or greater width than the signed type, mixed signed/unsigned comparisons are performed in the domain of the unsigned type. The signed value is implicitly converted to the unsigned type. There's nothing about the "compiler" doing anything "blindly" here. It has been like that in C and C++ since the beginning of time.
This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this
rebuildIndex >= rebuiltFaces.size()
is actually interpreted as
(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()
When signed values are converted to an unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is a well-known fundamental principle behind unsigned arithmetic in C and C++.
Again, all this is required by the language, it has absolutely nothing to do with how numbers are represented in the machine etc and which bits are stored where.
Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.
The reason is that unsigned 'overflow' behavior is strictly defined as converting the value to the number between 0 and the maximum value of that type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract one more than the maximum value until your value is in range. If your value is smaller than the smallest value (0), you repeatedly add one more than the maximum value until it's in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.
Roughly speaking, C++ defines promotion rules like this: any type of char or short is first promoted to int, regardless of signedness. Smaller types in a comparison are promoted up to the larger type in the comparison. If two types are the same size, but one is signed and one is unsigned, then the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t. 1 is converted to 1u, 0 is converted to 0u, and -1 is converted to -1u, which when cast to an unsigned type is the largest value of type size_t.