IF comparison test by value failure ( C++ ) - c++

My current project would be too lengthy to post here; however, this is the single line that produces really strange behavior, at least as I see it. I use the clip object to store relatively short strings (the maximum size in use is 35), but the condition fails when dealing with negative values in start.
I tried adding (const int) in front of clip.length(), but the output wouldn't change:
Any ideas what this means? I'm using G++ on Ubuntu 14.04.
void Cut ( const int start, const int stop )
{ if (start > clip.length() ) cout << "start: " << start << " > " << clip.length() << endl;
...
}

It is likely that length() returns an unsigned integer type, so the other operand, a signed int, gets converted to unsigned too, and only then does the comparison take place.
This is part of the so-called usual arithmetic conversions. See the standard:
Expressions [expr]
....
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.

The reason is this comparison:
if (start > clip.length()) {
You are comparing a signed and an unsigned here. I suggest changing both operands to have the same type, e.g.:
if (start > static_cast<int>(clip.length())) {
Additionally, the original code produces a nice compiler warning when warnings are turned on (and they should be turned on to avoid such issues):
test.cpp:8:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
With g++, try using -Wall and maybe even -Wextra.
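To make the conversion concrete, here is a minimal sketch, assuming clip is a std::string (the question does not say). naive_greater spells out what the original condition effectively computes; safe_greater is a hypothetical helper, not a standard function, that checks the sign first.

```cpp
#include <cstddef>
#include <string>

// What `start > clip.length()` effectively does: `start` is converted to the
// unsigned type returned by length() before the comparison.
bool naive_greater(int start, const std::string& clip)
{
    return static_cast<std::string::size_type>(start) > clip.length();
}

// Hypothetical helper: compares a signed value against an unsigned size
// without letting a negative value wrap around to a huge number.
bool safe_greater(int lhs, std::size_t rhs)
{
    if (lhs < 0) return false;                   // a negative value is never greater than a size
    return static_cast<std::size_t>(lhs) > rhs;  // lhs >= 0 here, so its value is preserved
}
```

With a three-character string, naive_greater(-1, clip) is true while safe_greater(-1, clip.length()) is false, which is the behavior the question expected.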

Related

Is there an operator precedence problem I'm missing? Compare of unsigned short with inverse fails

I can THINK of why this wouldn't work, but I don't understand why many of the workarounds I've tried don't work. Below is an example of the code I'm trying to make work. The intent should be obvious, but compiling with GCC 7.4.0 for Windows 32 bit, Visual C 32 bit and Visual C 64 bit as well as the same compilers in C++ modes, all of them result in the same answer, so I'm sure it's not just a compiler bug.
The code is:
unsigned short usAlgo = 0x0001;
unsigned short usNotAlgo = ~usAlgo;
if ( usAlgo == ~usNotAlgo )
printf("Pass\n");
else
printf("Fail\n");
On all the compilers I've tried, this code prints "Fail". By a slight rearrangement to:
unsigned short usCheck = ~usNotAlgo;
if ( usAlgo == usCheck )
It prints "Pass". I would have thought the usCheck would get optimized out anyway, so why is this different?
I have tried all kinds of workarounds that don't work, with bit masking, parentheses, making them signed values, and such like:
if ( usAlgo == (~usNotAlgo) & 0xffff )
or
if ( (unsigned int)(usAlgo) == ~(unsigned int)(usNotAlgo) )
I think I've discovered that the first of those two fails because '==' has a higher order of precedence than '&', but I can't for the life of me understand why the simple:
if ( usAlgo == ~usNotAlgo )
fails.
Looking at the compiler output doesn't REALLY help, other than I can see the "real" comparison ends up being:
if( 0x00000001 == 0xFFFF0001 )
implying that the unsigned short (0xFFFE) was first promoted to an unsigned int (0x0000FFFE) and THEN negated. (That's why we thought making them signed might sign extend to 0xFFFFFFFE.)
I obviously have the answer to how to fix this, but I need to understand WHY.
Any ideas?
As you have noticed, usNotAlgo was promoted to type int before the ~ operator was applied. Generally speaking, anytime a type smaller than int is used in an expression, it is first promoted to int.
This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or
unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
Section 6.5.3.3p4 regarding the ~ operator specifically says:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and
only if the corresponding bit in the converted operand is not set).
The integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
This can be fixed by casting the result back to unsigned short to mask off the additional bits:
if ( usAlgo == (unsigned short)~usNotAlgo )
The problem is that by writing ~usNotAlgo in the if statement, it got promoted to an int value, and then, because of the comparison, the usAlgo value got promoted to int as well. This is why you see if(0x00000001 == 0xFFFF0001) in the compiler output (instead of the expected if( 0x0001 == 0x0001 )).
In order to fix it, cast ~usNotAlgo to unsigned short:
if (usAlgo == (unsigned short)~usNotAlgo) {code...}
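A small sketch of the three cases discussed in this thread, assuming (as on the compilers in the question) that int is wider than 16 bits. The function names are made up for illustration.

```cpp
#include <cstdint>

// ~ is applied after integer promotion, so the result is an int whose
// upper bits are set as well.
int complement_promoted(std::uint16_t v)
{
    return ~v;
}

// Casting back to the 16-bit type discards the extra high bits.
std::uint16_t complement_truncated(std::uint16_t v)
{
    return static_cast<std::uint16_t>(~v);
}

// == binds tighter than &, so the mask must be parenthesized together with ~.
bool equals_masked_complement(std::uint16_t a, std::uint16_t b)
{
    return a == (~b & 0xFFFF);
}
```

For 0xFFFE, complement_promoted yields -65535 (bit pattern 0xFFFF0001 with a 32-bit int), while complement_truncated yields 1, which is why only the cast or a correctly parenthesized mask makes the comparison pass.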

Signed arithmetic

I'm running this piece of code, and I'm getting the output value as (converted to hex) 0xFFFFFF93 and 0xFFFFFF94.
#include <iostream>
using namespace std;
int main()
{
char x = 0x91;
char y = 0x02;
unsigned out;
out = x + y;
cout << out << endl;
out = x + y + 1;
cout << out << endl;
}
I'm confused about the arithmetic going on here. Is it because all the higher bits in out are taken to be 1 by default?
When I typecast out to an int, I get the answers as (in int) -109 and -108. Any idea why this is happening?
So there are a couple of things going on here. One, char can be either signed or unsigned; in your case it is signed. Two, assignment will convert the right-hand side to the type of the left-hand side. Using the right warning flags would help; clang with the -Wconversion flag warns:
warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
out = x + y;
~ ~~^~~
In this case to do this conversion it will basically add or subtract the unsigned max + 1 to the number to be converted.
We can see the same results using the limits header:
#include <limits>
//....
std::cout << std::hex << (std::numeric_limits<unsigned>::max() + 1) + (x+y) << std::endl ;
//...
and the result is:
ffffff93
For reference the draft C++ standard section 5.17 Assignment and compound assignment operators says:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 under 4.7 Integral conversions says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
which is equivalent to adding or subtracting UMAX + 1.
A plain char is often a signed type! For reasons of compatibility with C, its signedness isn't specified further and is implementation-defined. You can always get well-defined signedness by explicitly specifying the signed / unsigned keywords.
Try replacing your char definitions like this
unsigned char x = 0x91;
unsigned char y = 0x02;
to get the results you expect!
The negative numbers are represented internally as 2's complement and hence, their first bit is a 1. When you work in hex (and print in hex), the significant bits are displayed as 1's leading to numbers like you showed.
C++ doesn't specify whether char is signed or unsigned. Here they are signed, so when they are promoted to int, the negative value is used, which is then converted to unsigned. Use or cast to unsigned char.
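The two steps (promotion to int, then conversion to unsigned) can be sketched as below. The function names are hypothetical, signed char is used explicitly so the result does not depend on how plain char behaves, and the hex values in the note assume a 32-bit unsigned.

```cpp
#include <limits>

// Mirrors `out = x + y` from the question: both operands are promoted to
// int, and the int sum is then converted to unsigned.
unsigned add_as_unsigned(signed char x, signed char y)
{
    int sum = x + y;                    // arithmetic happens in int
    return static_cast<unsigned>(sum);  // reduced modulo (unsigned max + 1)
}

// The same conversion written out by hand for a negative value:
// (unsigned max + 1) + negative, computed without overflowing.
unsigned wrap_negative(int negative)
{
    return std::numeric_limits<unsigned>::max()
         - static_cast<unsigned>(-(negative + 1));
}
```

0x91 stored in a signed 8-bit char is -111, so add_as_unsigned(-111, 2) converts -109 and gives 0xFFFFFF93, matching the output in the question.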

c++ illogical >= comparison when dealing with vector.size() most likely due to size_type being unsigned

I could use a little help clarifying this strange comparison when dealing with vector.size() aka size_type
vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;
while( rebuildIndex >= rebuiltFaces.size() ) {
cout << (rebuildIndex >= rebuiltFaces.size()) << " , " << rebuildIndex << " >= " << rebuiltFaces.size() << endl;
--rebuildIndex;
}
And what I get out of the console is
rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0
If I had to guess I would say the compiler is blindly casting rebuildIndex to unsigned and the +- bit is causing things to behave oddly, but I'm really not sure. Does anyone know?
As others have pointed out, this is due to the somewhat
counter-intuitive rules C++ applies when comparing values with different
signedness; the standard requires the compiler to convert both values to
unsigned. For this reason, it's generally considered best practice to
avoid unsigned unless you're doing bit manipulations (where the actual
numeric value is irrelevant). Regretfully, the standard containers
don't follow this best practice.
If you somehow know that the size of the vector can never overflow
int, then you can just cast the results of std::vector<>::size() to
int and be done with it. This is not without danger, however; as Mark
Twain said: "It's not what you don't know that kills you, it's what you
know for sure that ain't true." If there are no validations when
inserting into the vector, then a safer test would be:
while ( rebuiltFaces.size() <= INT_MAX
&& rebuildIndex >= (int)rebuiltFaces.size() )
Or if you really don't expect the case, and are prepared to abort if it
occurs, design (or find) a checked_cast function, and use it.
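One possible shape for such a checked_cast, as a sketch only: the name comes from the answer above, but this signature and the choice to throw rather than abort are assumptions.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical checked_cast: converts a container size to int and throws
// instead of silently wrapping when the value does not fit.
int checked_cast(std::size_t value)
{
    if (value > static_cast<std::size_t>(INT_MAX))
        throw std::overflow_error("size does not fit in int");
    return static_cast<int>(value);
}

// The loop condition from the question, rewritten as a signed comparison.
bool index_at_or_past_end(int index, std::size_t size)
{
    return index >= checked_cast(size);
}
```

With an empty vector, index_at_or_past_end(-1, 0) is false, so the loop from the question would stop once rebuildIndex goes negative.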
On any modern computer that I can think of, signed integers are represented as two's complement. 32-bit int max is 0x7fffffff, and int min is 0x80000000, this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.
When the number is cast from a signed value to an unsigned value the bits stored in the register don't change. This makes a barely negative value like -1 into a huge unsigned number (unsigned max), and this would make that loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.
It's all perfectly logical, just not necessarily the logic you expected.
Example...
$ cat foo.c
#include <stdio.h>
int main (int a, char** v) {
unsigned int foo = 1;
int bar = -1;
if(foo < bar) printf("wat\n");
return 0;
}
$ gcc -o foo foo.c
$ ./foo
wat
$
In the C and C++ languages, when the unsigned type has the same or greater width than the signed type, mixed signed/unsigned comparisons are performed in the domain of the unsigned type. The signed value is implicitly converted to the unsigned type. There's nothing about the "compiler" doing anything "blindly" here. It has been like that in C and C++ since the beginning of time.
This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this
rebuildIndex >= rebuiltFaces.size()
is actually interpreted as
(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()
When a signed value is converted to an unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is a well-known fundamental principle behind unsigned arithmetic in C and C++.
Again, all this is required by the language, it has absolutely nothing to do with how numbers are represented in the machine etc and which bits are stored where.
Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.
The reason is that unsigned 'overflow' behavior is strictly defined as converting the value to the number between 0 and the maximum value of that type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract one more than the maximum value (2^N for an N-bit type) until your value is in range. If your value is smaller than the smallest value (0), you repeatedly add that same amount until it's in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.
Roughly speaking, C++ defines promotion rules like this: any type of char or short is first promoted to int, regardless of signedness. Smaller types in a comparison are promoted up to the larger type in the comparison. If two types are the same size, but one is signed and one is unsigned, then the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t. 1 is converted to 1u, 0 is converted to 0u, and -1 is converted to -1u, which when cast to an unsigned type is the largest value of type size_t.
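The modulo rule can be written out by hand and compared with what the language does. This is a sketch for a 32-bit unsigned target; to_u32_by_hand is a made-up name.

```cpp
#include <cstdint>

// Converts a signed value to a 32-bit unsigned value by the modulo rule:
// the result is the value in [0, 2^32 - 1] congruent to the input mod 2^32.
std::uint32_t to_u32_by_hand(std::int64_t value)
{
    const std::int64_t modulus = std::int64_t(1) << 32;  // 2^32
    std::int64_t r = value % modulus;  // remainder has the sign of `value`
    if (r < 0) r += modulus;           // bring a negative remainder into range
    return static_cast<std::uint32_t>(r);
}
```

to_u32_by_hand(-1) gives 4294967295 (2^32 - 1), the same value as static_cast<std::uint32_t>(-1), independent of how negative numbers are stored.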

When do we need to mention/specify the type of integer for number literals?

I came across a code like below:
#define SOME_VALUE 0xFEDCBA9876543210ULL
This SOME_VALUE is assigned to some unsigned long long later.
Questions:
Is there a need to have postfix like ULL in this case ?
What are the situations in which we need to specify the type of integer used ?
Do C and C++ behave differently in this case ?
In C, a hexadecimal literal gets the first type of int, unsigned int, long, unsigned long, long long or unsigned long long that can represent its value if it has no suffix. I wouldn't be surprised if C++ has the same rules.
You would need a suffix if you want to give a literal a larger type than it would have by default or if you want to force its signedness, consider for example
1 << 43;
Without suffix, that is (almost certainly) undefined behaviour, but 1LL << 43; for example would be fine.
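A short sketch of why the suffix matters for shifts, assuming a C++11 compiler. power_of_two is a hypothetical helper; the static_asserts only check the types involved and never evaluate an overflowing shift.

```cpp
#include <type_traits>

// The type of a shift expression is the (promoted) type of its left operand.
static_assert(std::is_same<decltype(1 << 3), int>::value,
              "plain 1 shifts as int");
static_assert(std::is_same<decltype(1LL << 3), long long>::value,
              "1LL shifts as long long");

// Returns 2^n. Starting from 1LL makes the shift happen in long long
// (at least 64 bits), so n up to 62 is safe.
long long power_of_two(unsigned n)
{
    return 1LL << n;
}
```

power_of_two(43) is 8796093022208, a value that a 32-bit int could not hold.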
I think not, but maybe that was required for that compiler.
For example, with printf("%ld", SOME_VALUE); the output might be wrong if SOME_VALUE's integer type is not specified.
A good example for the use of specifying a suffix in C++ is overloaded functions. Take the following for example:
#include <iostream>
void consumeInt(unsigned int x)
{
std::cout << "UINT" << std::endl;
}
void consumeInt(int x)
{
std::cout << "INT" << std::endl;
}
void consumeInt(unsigned long long x)
{
std::cout << "ULL" << std::endl;
}
int main(int argc, const char * argv[])
{
consumeInt(5);
consumeInt(5U);
consumeInt(5ULL);
return 0;
}
Results in:
INT
UINT
ULL
You do not need suffixes if your only intent is to get the right value of the number; C automatically chooses a type in which the value fits.
The suffixes are important if you want to force the type of the expression, e.g. for purposes of how it interacts in expressions. Making it long, or long long, may be needed when you're going to perform an arithmetic operation that would overflow a smaller type (for example, 1ULL<<n or x*10LL), and making it unsigned is useful when you want the expression as a whole to have unsigned semantics (for example, c-'0'<10U, or n%2U).
When you don't write a suffix, the compiler starts from int and gives the literal the first type in a fixed list that can hold its value. A suffix tells the compiler to start from a different type (for example a wider or an unsigned one) instead of int. That is what you do when you write 0xFEDCBA9876543210ULL.
You can also use a suffix when you write a floating-point number. 1.2 is a double, while 1.2f is a float.
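The literal types mentioned in these answers can be checked at compile time. A sketch assuming C++11; is_ull is a made-up helper for illustration.

```cpp
#include <type_traits>

// Suffixes select the literal's type directly.
static_assert(std::is_same<decltype(5), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(5U), unsigned int>::value, "U: unsigned int");
static_assert(std::is_same<decltype(5ULL), unsigned long long>::value, "ULL");
static_assert(std::is_same<decltype(1.2), double>::value, "no suffix: double");
static_assert(std::is_same<decltype(1.2f), float>::value, "f: float");

// Without a suffix, a hexadecimal literal too big for int still gets a type
// that can hold it; exactly which type depends on the platform.
static_assert(std::is_unsigned<decltype(0xFEDCBA9876543210)>::value,
              "value needs an unsigned type");
static_assert(sizeof(0xFEDCBA9876543210) >= 8, "value needs 64 bits");

// Hypothetical helper: true when the argument's type is unsigned long long.
template <typename T>
constexpr bool is_ull(T)
{
    return std::is_same<T, unsigned long long>::value;
}
```

So the ULL suffix in SOME_VALUE is not needed to preserve the value, but it does pin the type to unsigned long long on every platform.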

Implicit type casts in expressions with bit shifts

In the code below, why is the 1-byte anUChar automatically converted into 4 bytes to produce the desired result 0x300 (instead of 0x0 if anUChar remained 1 byte in size):
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
Under some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
Both are fine and portable. Though I prefer the first one since it's shorter. Casting has higher precedence than shift.
Conversion does happen. The problem is the result of the expression anUInt << 2 is an unsigned int because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 nor ((type)var) << 1 is more correct or portable because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier to understand for humans looking at the code casually. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< are of integral or enumeration type (See Standard 5.8/1). When a binary operator that expects operands of arithmetic or enumeration type is called, the compiler attempts to convert both operands to the same type, so that the expression may yield a common type. In this case, integral promotion is performed on both operands (5/9). When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accommodate all possible values of unsigned char in an int, else it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bits wide. (Between x86 and x64, long long int is typically 64 bits and int 32 bits; only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.
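The difference between shifting first and widening first can be sketched as follows, assuming a platform with a 32-bit int. Fixed-width unsigned types are used here instead of the question's long long, following the advice above to shift unsigned values; the function names are hypothetical.

```cpp
#include <cstdint>

// unsigned char is promoted to int before the shift, so no bits are lost.
int shift_uchar(unsigned char c)
{
    return c << 2;
}

// The shift is performed in 32-bit unsigned arithmetic: the top bits fall
// off before the result is widened for the return value.
std::uint64_t shift_then_widen(std::uint32_t v)
{
    return v << 2;
}

// Widen first, then shift: the shift happens in 64 bits and nothing is lost.
std::uint64_t widen_then_shift(std::uint32_t v)
{
    return static_cast<std::uint64_t>(v) << 2;
}
```

For 0xc0000000, shift_then_widen returns 0 and widen_then_shift returns 0x300000000, regardless of the type that receives the result.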