substituting positive signed variables to unsigned variables - c++

(In C/C++)
//1
int i = 1;
unsigned u = i;
//2
int i = 1;
unsigned u = (unsigned)i;
//3
unsigned u = 1;
//4
unsigned u = 1u;
The gcc (4.8) compiler makes no difference in the assembly code produced between 1, 2 and 3, 4 each. When writing actual code, (to me) it is often more convenient to use the form 1 and 3 unless it is outside the range of the positive signed. (such as 3,333,333,333 for 32 bit int)
With this function,
void mpz_set_ui (mpz_t rop, unsigned long int op)
I use it as,
mpz_set_ui(num, 3); //or an int variable in place of 3
, for example.
My understanding of the current C(++) standard is that it is unnecessary to explicitly state as unsigned in the above cases, but I am not sure whether in some cases there may be some additional tasks to convert from signed to unsigned, or is it always the exactly same executable when the substituted signed variable is within the range of the target unsigned variable.

There will be no actual difference in the conversion result, the implicit and explicit conversion do the same thing. The advantage of the explicit cast, preferable static_cast in C++, is clarity: If you write the explicit cast, no one needs to wonder if you changed signedness on accident. I would consider 3) and 4) equally good.

Related

Why does "unsigned int" + "unsigned int" return an "unsigned int"?

I believe that when you add two unsigned int values together, the returned value's data type will be an unsigned int.
But the addition of two unsigned int values may return a value that is larger than an unsigned int.
So why does unsigned int + unsigned int return an unsigned int and not some other larger data type?
This would have truly evil consequences:
Would you really want 1 + 1 to be a long type? And (1 + 1) + (1 + 1) would become a long long type? It would wreak havoc with the type system.
It's also possible, for example, that short, int, long, and long long are all the same size, and similarly for the unsigned versions.
So the implicit type conversion rules as they stand are probably the best solution.
You could always force the issue with something like
0UL + "unsigned int" + "unsigned int"
Let's imagine that we have a language where adding two integers results in a bigger type. So, adding two 32 bit numbers results in a 64 bit number. What would happen in expression the following expression?
auto x = a + b + c + d + e + f + g;
a + b is 64 bits. a + b + c is 128 bits. a + b + c + d is 256 bits... This becomes unmanageable very fast. Most processors don't support operations with so wide operands.
The type of a varaible does not only determine the range of values it can hold, but sloppy speaking, also how the operations are realized. If you add two unsigned values you get an unsigned result. If you want a different type as result (eg long unsigned) you could cast:
unsigned x = 42;
unsigned y = 42;
long unsigned z = static_cast<long unsigned>(x) + static_cast<long unsigned>(y);
Actually the real reason is: It is defined like that. In particular unsigned overflow is well defined in C++ to wrap around and using a wider type for the result of unsigned operators would break that behaviour.
As a contrived example, consider this loop:
for (unsigned i = i0; i != 0; ++i) {}
Note the condition! Lets assume i0 > 0, then it can only ever be false when incrementing the maximum value of unsigned results in 0. This code is obfuscated and should probably make you raise an eyebrow or two in a code-review, though it is perfectly legal. Making the result type adjust depending on the value of the result, or choosing the result type such that overflow cannot happen would break this behaviour.
Because a variable + a same type variable can be only equal to that type variable ,
(well in some cases it will but not in your case)
example:
int + int = int a int plus another int cannot be equal to a float because it dont have the properties of a float.
I hope this answers your question bye!

Is it safe to compare an unsigned int with a std::string::size_type

I am going trough the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chap 2. The code can be summarized as below, and is compiling without warning/error with g++:
#include <string>
using std::string;
int main()
{
const string greeting = "Hello, world!";
// OK
const int pad = 1;
// KO
// int pad = 1;
// OK
// unsigned int pad = 1;
const string::size_type cols = greeting.size() + 2 + pad * 2;
string::size_type c = 0;
if (c == 1 + pad)
{;}
return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ return the warning, but I am not sure about the three below points:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler point of view:
It is unsafe to compare signed and unsinged variables (non-constants).
It is safe to compare 2 unsinged variables of different sizes.
It is safe to compare an unsigned variable with a singed constant if the compiler can check that constant to be in the allowed range for the type of the signed variable (e.g. for 16-bit signed integer it is safe to use a constant in range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare unsigned int and std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem to use different unsigned types in comparison. Use unsinged int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative (it may well behave as a very large unsigned value, and thus a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and thus can say "well, this value is always 1, so it works fine here")
As long as the value you want to compare fits in unsigned int (on typical machines, a little over 4 billion) is fine.
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be a unsigned int, but I believe it is defined this way in most cases.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all it's occurrences with 2.
I would say that it is better to use size_type everywhere where you are to the size of something, since it is more verbose.
What the compiler warns about is the comparison of unsigned and signed integer types. This is because the signed integer can be negative and the meaning is counter intuitive. This is because the signed is converted to unsigned before comparison, which means the negative number will compare greater than the positive.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned and then the semantics is what's expected. If their range differs the narrower are converted to a wider type.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is because how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that at the point this warning is being considered the compiler nows that the signed integer is 1 and then it's safe to compare with a unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant the best solution would probably be to make it at least an unsigned integer type. However you should be aware that there is no guaranteed relation between normal integer types and sizes, for example unsigned int may be narrower, wider or equal to size_t and size_type (the latter may also differ).

Will implicit type conversion not happen if I divide by 1?

#include <stdio.h>
int main()
{
unsigned int count=1;
signed int val = -20;
signed int div=(val/count);
signed int div1=(val/(signed int)count);
printf("div %d div1 %d \n",div,div1);
return 0;
}
output :
div -20 div1 -20
But if count = 5 then the output :
div 858993455 div1 -4
In count=5 case signed int has been implicitly converted to unsigned int, why not for count=1
signed int div=(val/count);
If one of operands is int and another one is unsigned int, the int one is converted to unsigned int. So here val is converted to unsigned int, then divided by count, then the result is converted back to int.
That is, this expression is evaluated as
int div = (int)((unsigned int)val / count);
So when count == 1, the result remains the same, but when count == 5 the result of (unsigned int)val / count becomes less than INT_MAX, so when converted back to int it doesn't change its (big positive) value.
Note that strictly speaking, even if count == 1 the result doesn't have to be -20, because the result of conversion from (unsigned int)-20 to int is implementation-defined.
There is no such thing as "implicit typecast", typecast refers to explicitly changing the type of an operand. The proper term is implicit (type) conversion.
The fundamentals of C state that the compiler is free to order around or optimize your program as it pleases, as long as it doesn't change the outcome of the program.
So even if the compiler spots that a division by 1 doesn't make sense and can get optimized away, it must still take the potential side-effects caused by the implicit type conversion in account: it cannot optimize those away, because the programmer might have been intentionally relying on them.
In your specific case, signed int div=(val/count) would enforce val to get implicitly converted to unsigned type. But it doesn't really matter, as you show the results back into a signed type and anything divided by 1 will remain unchanged anyhow. So the compiler can therefore optimize the whole thing away as the result would have been the same no matter if unsigned or signed arithmetic was used.
If you divide by 5 though, the results turn very different between -20/5 = -4 and 0xFFFFFFEC/5 = 0xFFFFFFFC. So then the compiler is not allowed to optimize away the implict conversion, as it affects the result.
Therefore the programmer has to know the implicit type conversion rules to tell what will actually happen between the lines of their own source code.
This is the the usual arithmetic conversions. you can find rule there.
So actually, the first result is (unsigned)-4. use complement rule, it will be 858993455.
you can also reference Implicit conversions

Implicit type casts in expressions with bit shifts

In the code below, why 1-byte anUChar is automatically converted into 4 bytes to produce the desired result 0x300 (instead of 0x0 if anUChar would remain 1 byte in size):
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
Under some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var <<
1 or ((type)var) << 1?
Both are fine and portable. Though I prefer the first one since it's shorter. Casting has higher precedence than shift.
Conversion does happen. The problem is the result of the expression anUInt << 2 is an unsigned int because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 or ((type)var) << 1 is more correct or portable because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier to understand to humans looking at the code casually. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< are integral or enumeration type (See Standard 5.8/1). When a binary operator that expects operands of arithmetic or enumeration type is called, the compiler attempts to convert both operands to the same type, so that the expression may yield a common type. In this case, integral promotion is performed on both operands (5/9). When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accomodate all possible values of unsigned char in an int, else it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bit wide. (Between x86 and x64, long long int is typically always 64 and int 32 bits, only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int?

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?
I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.
// Create a character array and initialize it with init[]
// repeatedly. The size of this character array is specified by
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
char *buf;
int i;
if (w*h > 4096)
return (NULL);
buf = (char *)malloc(4096+1);
if (!buf)
return (NULL);
for (i=0; i<h; i++)
memcpy(&buf[i*w], init, w); // line 17
buf[4096] = '\0';
return buf;
}
Consider both w and h are very large unsigned integers. The multiplication at line 9 have a chance to pass the validation.
Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.
It's hard to write code to justify this: int is too large. Does anyone has ideas on this?
Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.
I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:
unsigned int x = 10;
int y = -10;
printf("%d\n", x * y); // print x * y in signed integer
printf("%u\n", x * y); // print x * y in unsigned integer
Therefore, it does not matter what type the multiplication returns. It matters that whether the consumer function takes int or unsigned.
The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.
do the w*h calculation in long long, check if bigger than MAX_UINT
EDIT : alternative : if overflown (w*h)/h != w (is this always the case ?! should be, right ?)
To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.
To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:
http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx
It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.
Ensure that w * h doesn't overflow by limiting w and h.
The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).
However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.
Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.
I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do
if (w>4096 || h>4096 || w*h > 4096)
return (NULL);
This leaves out the case when w or h are 0, you might want to check for it if needed.
In general, you could check for overflow like this:
if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
In C/C++ the p[n] notation is really a shortcut to writting *(p+n), and this pointer arithmetic takes into account the sign. So p[-1] is valid and refers to the value immediately before *p.
So the sign really matters here, the result of arithmetic operator with integer follow a set of rules defined by the standard, and this is called integer promotions.
Check out this page: INT02-C. Understand integer conversion rules
2 changes make it safer:
if (w >= 4096 || h >= 4096 || w*h > 4096) return NULL;
...
unsigned i;
Note also that it's not less a bad idea to write to or read from past the buffer end. So the question is not whether iw may become negative, but whether 0 <= ih +w <= 4096 holds.
So it's not the type that matters, but the result of h*i.
For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.
For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.
In your example the following rules apply:
C99:
Otherwise, if the type of the operand
with signed integer type can represent
all of the values of the type of the
operand with unsigned integer type,
then the operand with unsigned integer
type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted
to the unsigned integer type
corresponding to the type of the
operand with signed integer type.
ANSI C:
Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
Why not just declare i as unsigned int? Then the problem goes away.
In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.
memcpy(&buf[iw > -1 ? iw < 4097? iw : 0 : 0], init, w);
I don't think the triple calculation of iw does degrade the perfomance)
w*h could overflow if w and/or h are sufficiently large and the following validation could pass.
9. if (w*h > 4096)
10. return (NULL);
On int , unsigned int mixed operations, int is elevated to unsigned int, in which case, a negative value of 'i' would become a large positive value. In that case
&buf[i*w]
would be accessing a out of bound value.
Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).
Therefore, given large w and a suitable value of h, you can indeed get into trouble.
Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)
There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic coversions, N1494,
WG 14: C - Project status and milestones
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.
If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.
If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.
On your second question about the type expected by "indexer", the answer appears "integer type" which allows for any (signed) integer index.
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.
To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.