Does a C++ rvalue have a size? - c++

Let's assume the largest number an int variable can hold is 10. Consider the following situation:
int main()
{
int r1 = 10;
int r2 = 1;
int x = r1 + r2;
}
As far as I understand, the expression r1 + r2 creates a temporary to hold the result before that value is copied to x.
What I want to know is this: since the largest value x can hold is 10, I guess that printing x gives 10. But what about r1 + r2? Does the temporary that represents the result of r1 + r2 also hold 10?
In other words, does this temporary also have a largest value it can hold?
This is probably a noob question, and I apologise.
Please note:
I asked this question based on what I thought overflow was. I assumed that once a variable (say, an integer) reaches its maximum, adding to it leaves the value stuck at that maximum. Apparently that is not the case: for signed integer types the behaviour on overflow is undefined. See @bolov's answer.

Signed integers
Computing a value larger than the maximum value or smaller than the minimum value of a signed integer type is called "overflow" and is Undefined Behavior.
E.g.:
int a = std::numeric_limits<int>::max();
int b = 1;
a + b;
The above program has Undefined Behavior because the type of a + b is int and the value computed would overflow.
§ 8 Expressions [expr]
§ 8.1 Preamble [expr.pre]
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
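As a minimal sketch of what this means for the question: the type of int + int is int, so the temporary has the same range as x, and the only safe way to detect overflow is to test the operands before adding. The helper name checked_add is hypothetical, not something from the standard library.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// The sum of two ints is itself an int, so the temporary has int's range.
static_assert(std::is_same<decltype(1 + 1), int>::value, "int + int has type int");

// Hypothetical helper: stores a + b in result and returns true only when the
// sum is representable. Otherwise it returns false without doing the addition.
bool checked_add(int a, int b, int& result) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return false;  // a + b would exceed max
    if (b < 0 && a < min - b) return false;  // a + b would go below min
    result = a + b;                          // safe: no overflow possible here
    return true;
}
```

The range tests themselves never overflow, because max - b is only evaluated for positive b and min - b only for negative b.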
Unsigned integers
Unsigned integers do not overflow because they are always computed in modulo arithmetic.
unsigned a = std::numeric_limits<unsigned>::max();
a + 1; // guaranteed to be 0
unsigned b = 0;
b - 1; // guaranteed to be std::numeric_limits<unsigned>::max();
§6.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n
is the number of bits in the value representation of that particular
size of integer. (footnote 49)
49) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
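The two guarantees above can be checked with a short sketch. The function names wrapping_add and wrapping_sub are made up for illustration.

```cpp
#include <cassert>
#include <limits>

// Unsigned addition is reduced modulo 2^n, so it cannot overflow.
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}

// Unsigned subtraction wraps around in the same way.
unsigned wrapping_sub(unsigned a, unsigned b) {
    return a - b;
}
```

Calling wrapping_add with the maximum unsigned value and 1 yields 0, and wrapping_sub(0, 1) yields the maximum unsigned value.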

BTW, you cannot be sure that a new variable is created at all. It depends on the compiler, the compiler options, and so on. For instance, in some circumstances the compiler can compute the value of the rvalue itself (if it is known at that point) and simply put the computed value into the variable.
In your example it is obvious that r1 + r2 == 11, so x might be initialized directly with the value 11. Even that does not mean that x is guaranteed to be created at all.
I once debugged a program and saw that a variable I had declared (and defined), along with some calculations, was not created at all. That was because I did not use the variable in any meaningful way and had set optimization to the highest level.

Related

Why, in some C++ compilers, does `int x = 2147483647+1;` give only a warning but stores a negative value, while some compilers give a runtime error?

I want to check whether the reverse of a signed int value x lies between INT_MIN and INT_MAX. To do this, I reverse x twice and check whether the result equals the original x. If so, the reverse lies between INT_MIN and INT_MAX; otherwise it does not.
But online compilers are giving a runtime error, but my g++ compiler is working fine and giving the correct output. Can anybody tell me the reason?
int reverse(int x) {
int tx=x,rx=0,ans;
while(tx!=0){
rx = rx+rx+rx+rx+rx+rx+rx+rx+rx+rx+tx%10;
tx/=10;
}
ans = tx = rx;
rx=0;
while(tx!=0){
rx = rx*10 + tx%10;
tx/=10;
}
while(x%10==0&&x!=0)x/=10;
//triming trailing zeros
if(rx!=x){
return 0;
}else{
return ans;
}
}
ERROR:
Line 6: Char 23: runtime error: signed integer overflow: 1929264870 + 964632435 cannot be represented in type 'int' (solution.cpp)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior prog_joined.cpp:15:23
According to the cppreference:
Overflows
...
When signed integer arithmetic operation overflows (the result does not fit in the result type), the behavior is undefined: it may wrap around according to the rules of the representation (typically 2's complement), it may trap on some platforms or due to compiler options (e.g. -ftrapv in GCC and Clang), or may be completely optimized out by the compiler.
So the online compiler probably runs with stricter checking, while your local GCC happened to "wrap" on overflow.
If you want wrap-on-overflow behaviour for sure, you can widen the operands to 64 bits, perform the addition, and then convert the result back to 32 bits. According to cppreference, this conversion is well defined since C++20.
Numeric conversions
Unlike the promotions, numeric conversions may change the values, with potential loss of precision.
Integral conversions
...
if the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is { implementation-defined (until C++20) } / { the unique value of the destination type equal to the source value modulo 2^n, where n is the number of bits used to represent the destination type. (since C++20) }
Example code on godbolt
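A minimal sketch of the widen, add, narrow approach described above. The name wrapping_add_i32 is hypothetical. The narrowing conversion is only guaranteed to be modulo 2^32 since C++20; before that it is implementation-defined, although mainstream two's complement compilers behave the same way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: adds in 64 bits, where the sum of two 32-bit values
// always fits, then narrows the result back to 32 bits.
std::int32_t wrapping_add_i32(std::int32_t a, std::int32_t b) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
    // Modulo 2^32 since C++20, implementation-defined in earlier standards.
    return static_cast<std::int32_t>(wide);
}
```

With the operands from the sanitizer message, 1929264870 + 964632435 = 2893897305, which narrows to 2893897305 - 2^32 = -1401069991.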
I'm not sure what's wrong with your algorithm, but let me suggest an alternative that avoids doing math and the possibility of integer overflows.
If you want to find out whether, say, 2147483641 is a valid integer when reversed (e.g. 1463847412), you can do it entirely with string comparisons after converting the initial value to a string and reversing that string.
The basic algorithm for a non-negative value is this:
Convert the integer to a string (let's call this string s).
Convert INT_MAX to a string (let's call this string max_s).
Reverse s. Handle leading zeros and the minus sign appropriately. That is, 120 reversed is "21" and -456 reversed is "-654". The result of this conversion is a string called rev.
If rev.length() < max_s.length(), then rev is valid as an integer.
If rev.length() > max_s.length(), then rev is not valid as an integer.
If rev.length() == max_s.length(), then the reversed string is the same length as INT_MAX as a string, and a lexicographic comparison suffices to see whether it is valid. That is, isValid = (rev <= max_s) for std::string, or isValid = strcmp(rev, max_s) <= 0 for C strings.
The algorithm for a negative value is identical, except that INT_MAX is replaced with INT_MIN.
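The steps above could be sketched as follows. The function name reversed_fits_in_int is hypothetical, and this version compares digit strings without the sign (stripping the '-' from both the value and INT_MIN), which is a small variation on the description.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>

// Returns true if the decimal digits of x, reversed, still fit in an int.
// No integer arithmetic is done on the reversed value, so nothing can overflow.
bool reversed_fits_in_int(int x) {
    const bool negative = x < 0;
    std::string rev = std::to_string(x);
    if (negative) rev.erase(0, 1);                 // drop the '-'
    std::reverse(rev.begin(), rev.end());

    // Strip leading zeros produced by trailing zeros of x; keep one digit.
    const auto first = rev.find_first_not_of('0');
    rev = (first == std::string::npos) ? "0" : rev.substr(first);

    // The limit is INT_MAX for non-negative x and |INT_MIN| for negative x.
    std::string limit = std::to_string(negative ? INT_MIN : INT_MAX);
    if (negative) limit.erase(0, 1);

    if (rev.size() != limit.size()) return rev.size() < limit.size();
    return rev <= limit;                           // same length: compare digit by digit
}
```

For equal-length digit strings, lexicographic order matches numeric order, which is why the final comparison is enough.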

Confused in the output of the following programme

float b = 1.0f;
int i = b;
int& j = (int&)i;
cout<<j<<endl;
o/p = 1
But for the following scenario
float b = 1.0f;
int i = b;
int& j = (int&)b;
cout<<j<<endl;
O/P = 1065353216
Since both have the same value, they should show the same result. Can anyone please explain what is really happening when I change line 3?
In the first one, you are doing everything fine. The compiler is able to convert float b to int i, losing precision, but it's fine. Now, take a look at my debugger window during the execution of your second example:
Sorry for my Russian IDE interface, the first column is variable name, the second is value, and the third is type.
As you can see, now the float is simply interpreted as int. So the leading 1 bits are interpreted as the integer's bits, which leads to the result you are getting. So basically, you take the float's binary representation (usually it's represented as sign bit, mantissa and exponent), and try to interpret it as an int.
In the first case you're initializing j correctly and the cast is superfluous. In the second case you're doing it wrong (i.e. to an object of a different type) but the cast shuts the compiler up.
In this second case, what you get is probably the internal representation of 1.0 interpreted as an integer.
Integer 1 and floating-point 1.0f may be mathematically the same value, but in C++ they have different types, with different representations.
Casting an lvalue to a reference is equivalent to reinterpret_cast; it says "look at whatever is in this memory location, and interpret those bytes as an int".
In the first case, the memory contains an int, so interpreting those bytes as an int gives expected value.
In the second case, the memory contains a float, so you see the bytes (or perhaps just some of them, or perhaps some extra ones too, if sizeof(int) != sizeof(float)) that represent the floating-point number, reinterpreted as an integer.
Your computer probably uses 32-bit int and 32-bit IEEE float representations. The float value 1.0f has a sign bit of zero, an exponent of zero (represented by the 8-bit value 127, or 01111111 in binary), and a mantissa of 1 (represented by the 23-bit value zero), so the 32-bit pattern would look like:
00111111 10000000 00000000 00000000
When reinterpreted as an integer, this gives the hex value 0x3f800000, which is 1065353216 in decimal.
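To look at the bit pattern without the reference cast, one option is to copy the bytes with std::memcpy. This is a sketch that assumes a 32-bit IEEE 754 float, as the answer above does; float_bits is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Copies the object representation of f into an integer of the same size.
std::uint32_t float_bits(float f) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under those assumptions float_bits(1.0f) is 0x3f800000, i.e. 1065353216, while static_cast<int>(1.0f) performs a value conversion and gives 1.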
A reference doesn't allocate any memory; conceptually it just adds an entry to the table of local names and their addresses. In the first case the name 'j' refers to the memory previously allocated for an int (the variable 'i'), while in the second case 'j' refers to memory allocated for a float (the variable 'b'). When you use 'j', the compiler interprets the data at that address as if it were an int, but a float is actually stored there, which is why you get a "strange" number instead of 1.
The first one first casts b to an int before assigning it to i. This is the "proper" way, as the compiler will properly convert the value.
The second one does no conversion and reinterprets b's bits as an integer. If you read up on the floating-point format you can see exactly why you're getting the value you're getting.
Under the covers, all your variables are just collections of bits. How you interpret those bits changes the perceived value they represent. In the first one, you're rearranging the bit pattern to preserve the "perceived" value (of 1). In the second one, you're not rearranging the bit pattern, and so the perceived value is not properly converted.

Curious arithmetic error- 255x256x256x256=18446744073692774400

I encountered a strange thing when I was programming under c++. It's about a simple multiplication.
Code:
unsigned __int64 a1 = 255*256*256*256;
unsigned __int64 a2= 255 << 24; // same as the above
cerr()<<"a1 is:"<<a1;
cerr()<<"a2 is:"<<a2;
interestingly the result is:
a1 is: 18446744073692774400
a2 is: 18446744073692774400
whereas it should be:(using calculator confirms)
4278190080
Can anybody tell me how could it be possible?
255*256*256*256
All operands are int, so you are overflowing int. The overflow of a signed integer is undefined behavior in C and C++.
EDIT:
note that the expression 255 << 24 in your second declaration also invokes undefined behavior if your int type is 32-bit. 255 x (2^24) is 4278190080 which cannot be represented in a 32-bit int (the maximum value is usually 2147483647 on a 32-bit int in two's complement representation).
C and C++ both say for E1 << E2 that if E1 is of a signed type and positive and that E1 x (2^E2) cannot be represented in the type of E1, the program invokes undefined behavior. Here ^ is the mathematical power operator.
Your literals are int. This means that all the operations are actually performed on int, and promptly overflow. This overflowed value, when converted to an unsigned 64bit int, is the value you observe.
It is perhaps worth explaining what happened to produce the number 18446744073692774400. Technically speaking, the expressions you wrote trigger "undefined behavior" and so the compiler could have produced anything as the result; however, assuming int is a 32-bit type, which it almost always is nowadays, you'll get the same "wrong" answer if you write
uint64_t x = (int) (255u*256u*256u*256u);
and that expression does not trigger undefined behavior. (The conversion from unsigned int to int involves implementation-defined behavior, but as nobody has produced a ones-complement or sign-and-magnitude CPU in many years, all implementations you are likely to encounter define it exactly the same way.) I have written the cast in C style because everything I'm saying here applies equally to C and C++.
First off, let's look at the multiplication. I'm writing the right hand side in hex because it's easier to see what's going on that way.
255u * 256u = 0x0000FF00u
255u * 256u * 256u = 0x00FF0000u
255u * 256u * 256u * 256u = 0xFF000000u (= 4278190080)
That last result, 0xFF000000u, has the highest bit of a 32-bit number set. Casting that value to a signed 32-bit type therefore causes it to become negative as-if 2^32 had been subtracted from it (that's the implementation-defined operation I mentioned above).
(int) (255u*256u*256u*256u) = 0xFF000000 = -16777216
I write the hexadecimal number there, sans u suffix, to emphasize that the bit pattern of the value does not change when you convert it to a signed type; it is only reinterpreted.
Now, when you assign -16777216 to a uint64_t variable, it is back-converted to unsigned as-if by adding 2^64. (Unlike the unsigned-to-signed conversion, this semantic is prescribed by the standard.) This does change the bit pattern, setting all of the high 32 bits of the number to 1 instead of 0 as you had expected:
(uint64_t) (int) (255u*256u*256u*256u) = 0xFFFFFFFFFF000000u
And if you write 0xFFFFFFFFFF000000 in decimal, you get 18446744073692774400.
As a closing piece of advice, whenever you get an "impossible" integer from C or C++, try printing it out in hexadecimal; it's much easier to see oddities of twos-complement fixed-width arithmetic that way.
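A short sketch of both halves of this explanation, using only well-defined operations. The names product_64 and widen are hypothetical. Making the first operand 64 bits wide forces the whole product to be computed in 64 bits, and converting a negative 32-bit value to uint64_t shows where the unexpected number comes from.

```cpp
#include <cassert>
#include <cstdint>

// The first operand is 64 bits wide, so every multiplication is done in 64 bits.
std::uint64_t product_64() {
    return std::uint64_t{255} * 256 * 256 * 256;
}

// Converting a negative value to uint64_t adds 2^64 to it.
std::uint64_t widen(std::int32_t value) {
    return static_cast<std::uint64_t>(value);
}
```

Here product_64() gives 4278190080, while widen(-16777216) gives 18446744073692774400, which is 0xFFFFFFFFFF000000 in hexadecimal.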
The answer is simple -- overflowed.
The overflow occurs in int arithmetic, and when the overflowed value is assigned to an unsigned 64-bit integer it is converted to 18446744073692774400 instead of 4278190080.

Basic integer explanation in C++

This is a very basic question, but I need to ask it. I am adding two integers:
int main()
{
cout<<"Enter a string: ";
int a,b,c;
cout<<"Enter a";
cin>>a;
cout<<"\nEnter b";
cin>>b;
cout<<a<<"\n"<<b<<"\n";
c= a + b;
cout <<"\n"<<c ;
return 0;
}
If I give a = 2147483648 then
b automatically takes a value of 4046724. Note that cin will not be prompted
and the result c is 7433860
If int is 2^32 and if the first bit is MSB then it becomes 2^31
c= 2^31+2^31
c=2^(31+31)
is this correct?
So how to implement c= a+b for a= 2147483648 and b= 2147483648 and should c be an integer or a double integer?
When you perform any sort of input operation, you must always include an error check! For the stream operator, this could look like this:
int n;
if (!(std::cin >> n)) { std::cerr << "Error!\n"; std::exit(-1); }
// ... rest of program
If you do this, you'll see that your initial extraction of a already fails, so whatever values are read afterwards are not well defined.
The reason the extraction fails is that the literal token "2147483648" does not represent a value of type int on your platform (it is too large), no different from, say, "1z" or "Hello".
The real danger in programming is to assume silently that an input operation succeeds when often it doesn't. Fail as early and as noisily as possible.
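The same check can be tried without typing input by parsing from a string. This is a sketch with a hypothetical helper named parse_int; it assumes a 32-bit int, so "2147483648" is out of range.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parses text as an int and reports whether it worked.
// out is only modified on success.
bool parse_int(const std::string& text, int& out) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> value)) return false;  // not a number, or out of range for int
    out = value;
    return true;
}
```

With a 32-bit int, parse_int succeeds for "42" and fails for both "2147483648" and "Hello", because the stream sets its fail state in each of those cases.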
The int type is signed, and therefore its maximum value is 2^31-1 = 2147483648 - 1 = 2147483647.
Even if you used an unsigned integer, its maximum value is 2^32 - 1 = a + b - 1 for the values of a and b you give.
For the arithmetic you are doing, you had better use "long long", which is signed and has a maximum value of 2^63-1, or "unsigned long long", which is unsigned and has a maximum value of 2^64-1.
c= 2^31+2^31
c=2^(31+31)
is this correct?
No, but you're right that the result takes more than 31 bits. In this case the result takes 32 bits (whereas 2^(31+31) would take 62 bits). You're confusing multiplication with addition: 2^31 * 2^31 = 2^(31+31).
Anyway, the basic problem you're asking about dealing with is called overflow. There are a few options. You can detect it and report it as an error, detect it and redo the calculation in such a way as to get the answer, or just use data types that allow you to do the calculation correctly no matter what the input types are.
Signed overflow in C and C++ is technically undefined behavior, so detection consists of figuring out what input values will cause it (because if you do the operation and then look at the result to see if overflow occurred, you may have already triggered undefined behavior and you can't count on anything). Here's a question that goes into some detail on the issue: Detecting signed overflow in C/C++
Alternatively, you can just perform the operation using a data type that won't overflow for any of the input values. For example, if the inputs are ints then the correct result for any pair of ints can be stored in a wider type such as (depending on your implementation) long or long long.
int a, b;
...
long c = (long)a + (long)b;
If int is 32 bits then it can hold any value in the range [-2^31, 2^31-1]. So the smallest value obtainable would be -2^31 + -2^31 which is -2^32. And the largest value obtainable is 2^31 - 1 + 2^31 - 1 which is 2^32 - 2. So you need a type that can hold these values and every value in between. A single extra bit would be sufficient to hold any possible result of addition (a 33-bit integer would hold any integer from [-2^32,2^32-1]).
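A small sketch of the widening approach using long long, which is at least 64 bits wide. The static_assert states the assumption that long long is wider than int on the target platform; wide_add is a made-up name.

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(long long) > sizeof(int), "sketch assumes long long is wider than int");

// Both operands are converted before the addition, so the sum is computed
// in long long and cannot overflow for any pair of int values.
long long wide_add(int a, int b) {
    return static_cast<long long>(a) + static_cast<long long>(b);
}
```

Converting only the result, as in static_cast<long long>(a + b), would be too late, because the addition would already have been done in int.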
Or, since double can probably represent every integer you need (a 64-bit IEEE 754 floating point data type can represent integers up to 53 bits exactly) you could do the addition using doubles as well (though adding doubles may be slower than adding longs).
If you have a library that offers arbitrary precision arithmetic you could use that as well.

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int?

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?
I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.
// Create a character array and initialize it with init[]
// repeatedly. The size of this character array is specified by
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
char *buf;
int i;
if (w*h > 4096)
return (NULL);
buf = (char *)malloc(4096+1);
if (!buf)
return (NULL);
for (i=0; i<h; i++)
memcpy(&buf[i*w], init, w); // line 17
buf[4096] = '\0';
return buf;
}
Consider the case where both w and h are very large unsigned integers. The multiplication at line 9 can then wrap around and pass the validation.
Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.
It's hard to write code to verify this: int is too large. Does anyone have ideas on this?
Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.
I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:
unsigned int x = 10;
int y = -10;
printf("%d\n", x * y); // print x * y in signed integer
printf("%u\n", x * y); // print x * y in unsigned integer
Therefore, it does not matter what type the multiplication returns. What matters is whether the consumer function takes int or unsigned.
The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.
Do the w*h calculation in long long and check whether the result is bigger than UINT_MAX.
EDIT: alternatively, if the product has overflowed, then (w*h)/h != w (is this always the case? It should be, right?)
To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.
To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:
http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx
It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.
Ensure that w * h doesn't overflow by limiting w and h.
The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).
However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.
Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.
I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do
if (w>4096 || h>4096 || w*h > 4096)
return (NULL);
This leaves out the case when w or h are 0, you might want to check for it if needed.
In general, you could check for overflow like this (assuming w != 0, because the check divides by w):
if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
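An alternative sketch that never computes w*h at all, and therefore has nothing to wrap around, is to divide the limit instead. The helper name product_within is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: true when w * h <= limit, decided without computing
// w * h, so there is no product that could wrap around.
bool product_within(unsigned w, unsigned h, unsigned limit) {
    if (w == 0 || h == 0) return true;  // the product is 0
    return w <= limit / h;              // for h > 0 this is equivalent to w * h <= limit
}
```

Assuming a 32-bit unsigned int, w = h = 65536 makes w*h wrap to 0 and pass the original test, whereas product_within(65536, 65536, 4096) rejects it.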
In C/C++ the p[n] notation is really a shortcut for writing *(p+n), and this pointer arithmetic takes the sign into account. So p[-1] is valid and refers to the value immediately before *p.
So the sign really matters here. The result of an arithmetic operator applied to integers follows a set of rules defined by the standard: the integer promotions and the usual arithmetic conversions.
Check out this page: INT02-C. Understand integer conversion rules
2 changes make it safer:
if (w >= 4096 || h >= 4096 || w*h > 4096) return NULL;
...
unsigned i;
Note also that it is no less bad an idea to write to or read from past the end of the buffer. So the question is not whether i*w may become negative, but whether 0 <= i*h + w <= 4096 holds.
So it's not the type that matters, but the result of h*i.
For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.
For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.
In your example the following rules apply:
C99:
Otherwise, if the type of the operand
with signed integer type can represent
all of the values of the type of the
operand with unsigned integer type,
then the operand with unsigned integer
type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted
to the unsigned integer type
corresponding to the type of the
operand with signed integer type.
ANSI C:
Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
Why not just declare i as unsigned int? Then the problem goes away.
In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.
memcpy(&buf[i*w > -1 ? i*w < 4097? i*w : 0 : 0], init, w);
(I don't think computing i*w three times degrades the performance.)
w*h could overflow if w and/or h are sufficiently large and the following validation could pass.
9. if (w*h > 4096)
10. return (NULL);
In mixed int / unsigned int operations, the int is converted to unsigned int, in which case a negative value of 'i' would become a large positive value. In that case
&buf[i*w]
would access an out-of-bounds element.
Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).
Therefore, given large w and a suitable value of h, you can indeed get into trouble.
Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)
There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic conversions, N1494,
WG 14: C - Project status and milestones
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.
If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.
If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.
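The first of these cases can be confirmed at compile time. This is a small sketch; mixed_product is a made-up name, and a compiler may warn about the implicit sign conversion.

```cpp
#include <cassert>
#include <type_traits>

// unsigned int * int: the int operand is converted to unsigned int first,
// so the product has type unsigned int.
static_assert(std::is_same<decltype(1u * -1), unsigned int>::value,
              "unsigned int * int yields unsigned int");

// b is converted to unsigned before the multiplication; a negative b
// therefore becomes a large positive value and the product wraps modulo 2^n.
unsigned mixed_product(unsigned a, int b) {
    return a * b;
}
```

For example, mixed_product(10, -10) equals 0u - 100u, a very large unsigned value rather than -100.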
On your second question, about the type expected by the "indexer", the answer appears to be "integer type", which allows any integer index, signed or unsigned.
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
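A tiny sketch of that identity, showing that a negative subscript is just pointer arithmetic. The function name element_before is hypothetical, and the access is only valid when the pointer does not point at the first element of its array.

```cpp
#include <cassert>

// E1[E2] is *((E1) + (E2)), so a negative index steps backwards from p.
// Valid only if p points past the first element of an array.
int element_before(const int* p) {
    return p[-1];
}
```

Given int a[3] = {10, 20, 30}, element_before(a + 1) reads a[0], and a[2], *(a + 2) and 2[a] all name the same element.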
It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.
To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.