assignment operation confusion - c++

What is the output of the following code:
#include <cstdio>
int main() {
int k = (k = 2) + (k = 3) + (k = 5);
printf("%d", k);
}
It does not give any error, why? I think it should give error because the assignment operations are on the same line as the definition of k.
What I mean is that I expected int i = i; not to compile.
But it compiles. Why? What will be the output and why?

int i = i compiles because 3.3.1/1 (C++03) says
The point of declaration for a name is immediately after its complete declarator and before its initializer
So i is initialized with its own indeterminate value.
However the code invokes Undefined Behaviour because k is being modified more than once between two sequence points. Read this FAQ on Undefined Behaviour and Sequence Points
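If the intent was to add up the three assigned values, each assignment can go in its own full expression, so that k is modified only once between sequence points. This is a minimal sketch assuming that intent; the helper name sum_of_assignments is hypothetical and not part of the original question.

```cpp
#include <cassert>

// Hypothetical helper: performs the three assignments one at a time,
// so each modification of k is in a separate full expression.
int sum_of_assignments() {
    int k = 0;
    int total = 0;
    total += (k = 2);  // k is 2, total is 2
    total += (k = 3);  // k is 3, total is 5
    total += (k = 5);  // k is 5, total is 10
    assert(k == 5);    // the last assignment wins
    return total;
}
```

Printing the result with printf("%d", sum_of_assignments()) (from <cstdio>) gives 10 on any conforming compiler, unlike the original one-liner.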

int i = i; first declares the variable and then initializes it with its own (indeterminate) value. In C you can read from an uninitialized variable. It's never a good idea, and some compilers will issue a warning message, but it's possible.
And in C, assignments are also expressions. The output will be "10", or it would be if you had a 'k' there, instead of an 'a'.

Wow, I got 11 too. I think k is getting assigned to 3 twice and then once to 5 for the addition. Making it just int k = (k=2)+(k=3) yields 6, and int k = (k=2)+(k=4) yields 8, while int k = (k=2)+(k=4)+(k=5) gives 13. int k = (k=2)+(k=4)+(k=5)+(k=6) gives 19 (4+4+5+6).
My guess? The addition is done left to right. The first two (k=x) expressions are added, and the result is stored in a register or on the stack. However, since this first addition is effectively k+k, both values being added are whatever k currently is, which is the value from the second expression, because it is evaluated after the first one (overriding its assignment to k). After this initial addition, the result is stored elsewhere, so it is now safe from tampering (changing k will not affect it). Moving from left to right, each successive addition reassigns k (without affecting the running sum) and adds k to the running sum.


for loop execution within the loop condition, c++

I wanted to fill up an int array with 121 ints, from 0 to 120. What is the difference between:
for(int i = 0; i < 122; arr[i] = i, i++){} and
for(int i = 0; i < 122; i++){arr[i] = i;} ?
I checked it, and apart from the warning "iteration 121u invokes undefined behavior", which I think isn't related to my question, the code compiles fine and gives the expected results.
EDIT: thanks to all who noticed the readability problem, that's true of course, but I meant to see if there is a different interpretation for these 2 lines, so I compiled both versions to assembly and they look identical.
None, the result will be the same.
The first construction uses a comma operator; the left side of a comma operator is sequenced before the right side, so arr[i] = i, i++ is well-defined
The second one is easier to read, though, especially if one chooses to omit the {} completely:
for(int i = 0; i < 122; arr[i] = i, i++); //this ; is evil, don't write such code.
Also, if you want to fill up 121 elements (the values 0 to 120), you should use i < 121.
The end result from both of the lines will be the same. However, the second one is better as the first one sacrifices readability for no gain.
When people read through code, they expect for loops to be written the way your second line is. If I were stepping through code and encountered the first line, I would've stopped for a second to look at why an empty for loop was being run, and then would've realised that you are setting the array elements in the loop header itself using the comma operator. It breaks the flow of reading code, so I wouldn't recommend it.
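Both loop forms can be compared directly. This is a minimal sketch assuming the array holds exactly 121 elements (the constant name kSize is hypothetical); both loops stop at i < 121, so no element past the end is written and the iteration-121 warning no longer applies.

```cpp
#include <cassert>

const int kSize = 121;  // 121 ints: the values 0 to 120

// Variant 1: the store happens in the loop's increment expression.
// The comma operator sequences arr[i] = i before i++.
void fill_comma(int (&arr)[kSize]) {
    for (int i = 0; i < kSize; arr[i] = i, i++) {}
}

// Variant 2: the conventional form, easier to read.
void fill_body(int (&arr)[kSize]) {
    for (int i = 0; i < kSize; i++) {
        arr[i] = i;
    }
}
```

Filling two arrays, one with each function, produces identical contents.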

C/C++ intentional out of range indexing [duplicate]

This question already has answers here:
Access array beyond the limit in C and C++ [duplicate]
(7 answers)
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 9 years ago.
Say I have an array like so:
int val[10];
and I intentionally index it with everything from negative values to anything higher than 9, but WITHOUT using the resulting value in any way. This would be for performance reasons (perhaps it's more efficient to check the input index AFTER the array access has been made).
My questions are:
Is it safe to do so, or will I run into some sort of memory protection barriers, risk corrupting memory or similar for certain indices?
Is it perhaps not at all efficient if I access data out of range like this? (assuming the array has no built in range check).
Would it be considered bad practice? (assuming a comment is written to indicate we're aware of using out of range indices).
It is undefined behavior. By definition, undefined means "anything could happen." Your code could crash, it could work perfectly, it could bring about peace and harmony amongst all humans. I wouldn't bet on the second or the last.
It is Undefined Behavior, and you might actually run afoul of the optimizers.
Imagine this simple code example:
int select(int i) {
int values[10] = { .... };
int const result = values[i];
if (i < 0 or i > 9) throw std::out_of_range("out!");
return result;
}
And now look at it from an optimizer point of view:
int values[10] = { ... };: valid indexes are in [0, 9].
values[i]: i is an index, thus i is in [0, 9].
if (i < 0 or i > 9) throw std::out_of_range("out!");: i is in [0, 9], never taken
And thus the function rewritten by the optimizer:
int select(int i) {
int values[10] = { ... };
return values[i];
}
For more amusing stories about forward and backward propagation of assumptions based on the fact that the developer is not doing anything forbidden, see What every C programmer should know about Undefined Behavior: Part 2.
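The way to keep the range check meaningful is to perform it before the access, so no out-of-range read precedes it. This is a sketch only: the function name select_checked and the array contents are hypothetical stand-ins for the elided values in the example.

```cpp
#include <cassert>
#include <stdexcept>

// Check the index first, then read. Because no out-of-range access
// happens before the test, the optimizer cannot assume it is never taken.
int select_checked(int i) {
    static const int values[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
    if (i < 0 || i > 9) throw std::out_of_range("out!");
    return values[i];
}
```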
EDIT:
Possible work-around: if you know that you will access from -M to +N you can:
declare the array with appropriate buffer: int values[M + 10 + N]
offset any access: values[M + i]
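The padded-array work-around above might look like this. It is a sketch assuming hypothetical bounds M = 2 and N = 3, so that logical indices run from -2 to 12; the struct name PaddedArray is also hypothetical.

```cpp
#include <cassert>

// Assumed access range: indices from -M to 9 + N (hypothetical values).
const int M = 2;
const int N = 3;

struct PaddedArray {
    // One contiguous array, so every offset M + i stays inside it.
    int storage[M + 10 + N] = {};

    // Map a logical index in [-M, 9 + N] to a real index in [0, M + 9 + N].
    int& at(int i) {
        assert(i >= -M && i <= 9 + N);
        return storage[M + i];
    }
};
```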
As verbose said, this yields undefined behavior. A bit more precision follows.
5.2.1/1 says
[...] The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Hence, val[i] is equivalent to *((val)+(i)). Since val is an array, the array-to-pointer conversion (4.2/1) occurs before the addition is performed. Therefore, val[i] is equivalent to *(ptr + i) where ptr is an int* set to &val[0].
Then, 5.7/2 explains what ptr + i points to. It also says (emphasis mine):
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In the case of ptr + i, ptr is the pointer operand and the result is ptr + i. According to the quote above, both should point to an element of the array or to one past the last element. That is, in the OP's case ptr + i is a well defined expression for all i = 0, ..., 10. Finally, *(ptr + i) is well defined for 0 <= i < 10 but not for i = 10.
Edit:
I'm puzzled as to whether val[10] (or, equivalently, *(ptr + 10)) yields undefined behavior or not (I'm considering C++ not C). In some circumstances this is true (e.g. int x = val[10]; is undefined behavior) but in others this is not so clear. For instance,
int* p = &val[10];
As we have seen, this is equivalent to int* p = &*(ptr + 10); which could be undefined behavior (because it dereferences a pointer to one past the last element of val) or the same as int* p = ptr + 10; which is well defined.
I found these two references which show how fuzzy this question is:
May I take the address of the one-past-the-end element of an array?
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
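Whatever the verdict on &val[10], the one-past-the-end pointer can be formed without any subscript, which 5.7/2 clearly allows as long as it is never dereferenced. A minimal sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <iterator>

// Sums an array by walking a pointer up to one past the last element.
// val + 10 is formed but never dereferenced.
int sum_with_end_pointer(const int (&val)[10]) {
    const int* const end = val + 10;  // no subscript, so no *(ptr + 10)
    int total = 0;
    for (const int* p = val; p != end; ++p) total += *p;
    return total;
}
```

Since C++11, std::end(val) from <iterator> yields the same pointer.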
If you put it in a structure with some padding ints, it should be safe (since the pointer actually points to "known" destinations).
But it's better to avoid it.
struct SafeOutOfBoundsAccess
{
int paddingBefore[6];
int val[10];
int paddingAfter[6];
};
void foo()
{
SafeOutOfBoundsAccess a;
bool maybeTrue1 = a.val[-1] == a.paddingBefore[5];
bool maybeTrue2 = a.val[10] == a.paddingAfter[0];
}

C++ crashes in a 'for' loop with a negative expression

The following code crashes with a runtime error:
#include <string>
using namespace std;
int main() {
string s = "aa";
for (int i = 0; i < s.length() - 3; i++) {
}
}
While this code does not crash:
#include <string>
using namespace std;
int main() {
string s = "aa";
int len = s.length() - 3;
for (int i = 0; i < len; i++) {
}
}
I just don't have any idea how to explain it. What could be the reason for this behavior?
s.length() has an unsigned integer type. Subtracting 3 would make the result negative, but for an unsigned type the result wraps around to a very big value.
A workaround (valid as long as the string's length does not exceed INT_MAX) would be:
#include <string>
using namespace std;
int main() {
string s = "aa";
for (int i = 0; i < static_cast<int> (s.length() ) - 3; i++) {
}
}
This version would never enter the loop.
A very important detail is that you probably received a warning about "comparing signed and unsigned values". The problem is that if you ignore those warnings, you enter the very dangerous field of implicit "integer conversion"(*), which has defined behaviour but is difficult to follow: the best approach is to never ignore those compiler warnings.
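The effect of the cast can be checked by counting iterations instead of running the real loop body. A sketch; count_iterations is a hypothetical helper that reuses the loop condition from the workaround.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times the workaround's loop body would run.
int count_iterations(const std::string& s) {
    int count = 0;
    // The subtraction is done in int, so for "aa" the bound is 2 - 3 == -1.
    for (int i = 0; i < static_cast<int>(s.length()) - 3; i++) {
        ++count;
    }
    return count;
}
```

Without the cast, the same subtraction on "aa" produces the largest size_t value instead of -1.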
(*) You might also be interested to know about "integer promotion".
First of all: why does it crash? Let's step through your program like a debugger would.
Note: I'll assume that your loop body isn't empty, but accesses the string. If this isn't the case, the cause of the crash is undefined behaviour through integer overflow. See Richard Hansen's answer for that.
std::string s = "aa";//assign the two-character string "aa" to variable s of type std::string
for ( int i = 0; // create a variable i of type int with initial value 0
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 1!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 2!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 3!
.
.
We would expect the check i < s.length() - 3 to fail right away, since the length of s is two (we only ever gave it a length at the beginning and never changed it), and 2 - 3 is -1, so 0 < -1 is false. However, we do get an "OK" here.
This is because s.length() isn't 2. It's 2u. std::string::length() has return type size_t, which is an unsigned integer type. So, going back to the loop condition, we first get the value of s.length(), which is 2u, and then subtract 3. 3 is an integer literal, and the compiler interprets it as type int. So the compiler has to calculate 2u - 3, with two values of different types. Arithmetic on primitive types only works on operands of the same type, so one has to be converted into the other. The rules for this are strict; in this case, unsigned "wins", so 3 gets converted to 3u. In unsigned arithmetic, 2u - 3u can't be -1, as no such value exists (an unsigned type has no sign, of course!). Instead, every operation is calculated modulo 2^(n_bits), where n_bits is the number of bits in the type (usually 8, 16, 32 or 64). So instead of -1 we get 4294967295u (assuming 32 bits).
So now the compiler is done with s.length() - 3 (of course it's much much faster than me ;-) ), now let's go for the comparison: i < s.length() - 3. Putting in the values: 0 < 4294967295u. Again, different types, 0 becomes 0u, the comparison 0u < 4294967295u is obviously true, the loop condition is positively checked, we can now execute the loop body.
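The arithmetic and the comparison just traced can be reproduced in isolation. A sketch, with a plain unsigned variable standing in for s.length() (an assumption; the real return type is size_t):

```cpp
#include <cassert>
#include <limits>

// Unsigned arithmetic wraps modulo 2^bits, so 2u - 3u is the largest
// unsigned value rather than -1, whatever the width of unsigned int.
unsigned two_minus_three() {
    return 2u - 3u;
}

// Mirrors the loop condition with i == 0. The cast spells out the
// conversion the compiler performs implicitly: int 0 becomes 0u.
bool loop_condition_at_zero() {
    unsigned len = 2u;  // stands in for s.length()
    int i = 0;
    return static_cast<unsigned>(i) < len - 3;
}
```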
After incrementing, the only thing that changes in the above is the value of i. The value of i will again be converted into an unsigned int, as the comparison needs it.
So we have
(0u < 4294967295u) == true, let's do the loop body!
(1u < 4294967295u) == true, let's do the loop body!
(2u < 4294967295u) == true, let's do the loop body!
Here's the problem: what do you do in the loop body? Presumably you access the i-th character of your string, don't you? Even though it wasn't your intention, you access not only the zeroth and first characters, but also the ones after them. Your string only has two characters (the zeroth and first); reading s[2] merely returns the terminating null character, but from s[3] onwards you access memory you shouldn't, and the program does whatever it wants (undefined behaviour). Note that the program isn't required to crash immediately. It can seem to work fine for another half an hour, so these mistakes are hard to catch. But it's always dangerous to access memory beyond the bounds; this is where most crashes come from.
So, in summary, s.length() - 3 gives you a different value than you'd expect; this makes the loop condition true, which leads to repeated execution of the loop body, which in turn accesses memory it shouldn't.
Now let's see how to avoid that, i.e. how to tell the compiler what you actually meant in your loop condition.
Lengths of strings and sizes of containers are inherently unsigned so you should use an unsigned integer in for loops.
Since unsigned int is fairly long and therefore undesirable to write over and over again in loops, just use size_t. This is the type every container in the STL uses for storing its length or size. You may need to include <cstddef> to make sure size_t is declared on every platform.
#include <cstddef>
#include <string>
using namespace std;
int main() {
string s = "aa";
for ( size_t i = 0; i + 3 < s.length(); i++) {
// ^^^^^^ ^^^^
}
}
Since a < b - 3 is mathematically equivalent to a + 3 < b, we can interchange them. However, a + 3 < b prevents b - 3 from wrapping around to a huge value. Recall that s.length() returns an unsigned integer, and unsigned integers perform operations modulo 2^(bits), where bits is the number of bits in the type (usually 8, 16, 32 or 64). Therefore, with s.length() == 2, s.length() - 3 is not -1 but 2^(bits) - 1.
Alternatively, if you want to use i < s.length() - 3 for personal preference, you have to add a condition:
for ( size_t i = 0; (s.length() > 3) && (i < s.length() - 3); ++i )
// ^ ^ ^- your actual condition
// ^ ^- check if the string is long enough
// ^- still prefer unsigned types!
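Both conditions above can be checked side by side by counting iterations. A sketch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Form 1: move the 3 to the other side, so nothing is subtracted.
std::size_t count_added(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 3 < s.length(); i++) ++count;
    return count;
}

// Form 2: keep i < s.length() - 3, but only after checking the length.
std::size_t count_guarded(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; (s.length() > 3) && (i < s.length() - 3); ++i) ++count;
    return count;
}
```

For "aa" and "abc" both run zero times; for a six-character string both run three times.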
Actually, in the first version you loop for a very long time, as you compare i to an unsigned integer containing a very large number. The size of a string has (in effect) the type size_t, which is an unsigned integer type. When you subtract 3 from that value, it wraps around and becomes a big value.
In the second version of the code, you assign this unsigned value to a signed variable, and so you get the correct value.
And it's not actually the condition or the value that causes the crash, it's most likely that you index the string out of bounds, a case of undefined behavior.
Assuming you left out important code in the for loop
Most people here seem unable to reproduce the crash—myself included—and it looks like the other answers here are based on the assumption that you left out some important code in the body of the for loop, and that the missing code is what is causing your crash.
If you are using i to access memory (presumably characters in the string) in the body of the for loop, and you left that code out of your question in an attempt to provide a minimal example, then the crash is easily explained by the fact that s.length() - 3 has the value SIZE_MAX due to modular arithmetic on unsigned integer types. SIZE_MAX is a very big number, so i will keep getting bigger until it is used to access an address that triggers a segfault.
However, your code could theoretically crash as-is, even if the body of the for loop is empty. I am unaware of any implementations that would crash, but maybe your compiler and CPU are exotic.
The following explanation does not assume that you left out code in your question. It takes on faith that the code you posted in your question crashes as-is; that it isn't an abbreviated stand-in for some other code that crashes.
Why your first program crashes
Your first program crashes because that is its reaction to undefined behavior in your code. (When I try running your code, it terminates without crashing because that is my implementation's reaction to the undefined behavior.)
The undefined behavior comes from overflowing an int. The C++11 standard says (in [expr] clause 5 paragraph 4):
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
In your example program, s.length() returns a size_t with value 2. Subtracting 3 from that would yield negative 1, except size_t is an unsigned integer type. The C++11 standard says (in [basic.fundamental] clause 3.9.1 paragraph 4):
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.(46)
(46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
This means that the result of s.length() - 3 is a size_t with value SIZE_MAX. This is a very big number, bigger than INT_MAX (the largest value representable by int).
Because s.length() - 3 is so big, execution spins in the loop until i gets to INT_MAX. On the very next iteration, when it tries to increment i, the result would be INT_MAX + 1 but that is not in the range of representable values for int. Thus, the behavior is undefined. In your case, the behavior is to crash.
On my system, my implementation's behavior when i is incremented past INT_MAX is to wrap (set i to INT_MIN) and keep going. Once i reaches -1, the usual arithmetic conversions (C++ [expr] clause 5 paragraph 9) cause i to equal SIZE_MAX so the loop terminates.
Either reaction is appropriate. That is the problem with undefined behavior—it might work as you intend, it might crash, it might format your hard drive, or it might cancel Firefly. You never know.
How your second program avoids the crash
As with the first program, s.length() - 3 is a size_t type with value SIZE_MAX. However, this time the value is being assigned to an int. The C++11 standard says (in [conv.integral] clause 4.7 paragraph 3):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
The value SIZE_MAX is too big to be representable by an int, so len gets an implementation-defined value (probably -1, but maybe not). The condition i < len will eventually be false regardless of the value assigned to len, so your program will terminate without encountering any undefined behavior.
The type of s.length() is size_t with a value of 2, therefore s.length() - 3 also has the unsigned type size_t, and its value is SIZE_MAX, which is implementation-defined (18446744073709551615 if size_t is 64 bits wide). size_t is typically 32 bits wide (64 bits on 64-bit platforms), and such a large number means a practically infinite loop. To prevent this problem, you can simply cast s.length() to int:
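The second program's behaviour can be mirrored with a counter. A sketch with a hypothetical function name; note the assumption that the out-of-range conversion yields -1. Before C++20 that value is implementation-defined (GCC, for example, documents reduction modulo 2^N, which gives -1), and C++20 makes it -1 everywhere.

```cpp
#include <cassert>
#include <string>

// Mirrors the second program: the size_t result is stored in an int
// first, and the loop compares two ints.
int second_program_iterations(const std::string& s) {
    int len = static_cast<int>(s.length() - 3);  // -1 for "aa" (see above)
    int count = 0;
    for (int i = 0; i < len; i++) ++count;
    return count;
}
```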
for (int i = 0; i < (int)s.length() - 3; i++)
{
//..some code causing crash
}
In the second case len is -1 because it is a signed integer and it does not enter the loop.
When it comes to crashing, this "infinite" loop is not the direct cause of the crash. If you share the code within the loop you can get further explanation.
Since s.length() is an unsigned quantity, the mathematically negative result of s.length()-3 cannot be stored as such; it wraps around to a large positive value (unsigned arithmetic is modular), so the loop runs practically forever, and hence it crashes.
To make it work, you must cast s.length() to int:
static_cast < int > (s.length())
The problem you are having arises from the following statement:
i < s.length() - 3
The result of s.length() is of the unsigned size_t type.
If you imagine the binary representation of two:
0...010
If you then subtract three from this, you are effectively taking off 1 three times, that is:
0...001
0...000
But then you have a problem: subtracting the third 1 underflows, as it attempts to borrow a digit from the left:
1...111
This is what happens no matter whether you have an unsigned or a signed type; however, the difference is that the signed type uses the Most Significant Bit (or MSB) to represent whether the number is negative. When the underflow occurs, the result simply represents a negative number for the signed type.
On the other hand, size_t is unsigned. When it underflows, it represents the highest number size_t can possibly represent. Thus the loop is practically infinite (depending on your computer, as this affects the maximum of size_t).
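The all-ones pattern can be seen directly with an 8-bit unsigned type, which keeps the output short. A sketch with a hypothetical function name:

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Returns the 8-bit pattern left behind by 2 - 3 in an unsigned byte.
std::string low_byte_of_two_minus_three() {
    std::uint8_t wrapped = static_cast<std::uint8_t>(2u - 3u);  // 255
    return std::bitset<8>(wrapped).to_string();
}
```

Every bit is set, so the result reads 11111111, which is 255 as an unsigned byte.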
In order to fix this problem, you can manipulate the code you have in a few different ways:
int main() {
string s = "aa";
for (size_t i = 3; i < s.length(); i++) {
}
}
or
int main() {
string s = "aa";
for (size_t i = 0; i + 3 < s.length(); i++) {
}
}
or even:
int main() {
string s = "aa";
for(size_t i = s.length(); i > 3; --i) {
}
}
The important thing to note is that the subtraction has been removed, and addition (or a different starting value) has been used instead, with the same logical evaluation.
Both the first and last ones change the value of i that is available inside the for loop whereas the second will keep it the same.
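A quick way to confirm that the three loops above run the same number of times is to count their iterations. A sketch; the counting functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::size_t count_start_at_3(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 3; i < s.length(); i++) ++n;  // i runs 3 .. length-1
    return n;
}

std::size_t count_add_3(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 3 < s.length(); i++) ++n;  // i runs 0 .. length-4
    return n;
}

std::size_t count_down_to_4(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = s.length(); i > 3; --i) ++n;  // i runs length .. 4
    return n;
}
```

All three run zero times for "aa" and length - 3 times for longer strings; only the values i takes inside the body differ.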
I was tempted to provide this as an example of code:
int main() {
string s = "aa";
for(size_t i = s.length(); --i > 2;) {
}
}
After some thought I realised this was a bad idea. Readers' exercise is to work out why!
The reason is the same as
int a = 1000000000;
long long b = a * 100000000; would go wrong. When the compiler multiplies these numbers, it evaluates the product as an int, since a and the literal 100000000 are both ints, and since 10^17 is much larger than the upper bound of int, the multiplication overflows (undefined behaviour for signed integers, which compilers usually warn about).
In your case we have s.length() - 3. As s.length() is unsigned, the result can't be negative: s.length() - 3 is evaluated as an unsigned value, so instead of -1 it wraps around to a huge value, and that causes the problem here too.
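For the multiplication, the usual fix is to widen one operand before multiplying, so the whole product is computed in long long. A sketch with a hypothetical function name:

```cpp
#include <cassert>

// Converting one operand to long long first makes the multiplication
// happen in long long, so 10^9 * 10^8 = 10^17 fits without overflow.
long long widened_product() {
    int a = 1000000000;
    long long b = static_cast<long long>(a) * 100000000;
    return b;
}
```

The string case is analogous: doing the comparison as i + 3 < s.length(), or in a signed type, avoids computing the unrepresentable intermediate value.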

Side effects in C

I thought that my understanding of side effects in programming languages was OK.
I think this is a great definition from wikipedia:
"in addition to returning a value, it also modifies some state or has
an observable interaction with calling functions or the outside world."
However, I read this in the same article (yes, I know that is probably not the best place to look for examples):
"One common demonstration of side effect behavior is that of the
assignment operator in C++. For example, assignment returns the right
operand and has the side effect of assigning that value to a variable.
This allows for syntactically clean multiple assignment:"
int i, j;
i = j = 3;
Why do they consider that a side-effect? It is the same as two simple assignment statements to 2 local variables.
Thanks in advance.
You can use an assignment expression as a value:
double d = 3.5;
int x, y;
printf("%d", x = d); // Prints "3".
y = (x = d) * 5; // Sets y to 15.
double z = x = d; // Sets z to 3 (not 3.5).
The value produced by x = d, is its main effect. The changing of the value of x is a side effect.
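The same statements can be wrapped in a function so their results can be checked. A sketch; the struct AssignResults and the function name are hypothetical.

```cpp
#include <cassert>

// Hypothetical holder for the results of the example above.
struct AssignResults {
    int x;
    int y;
    double z;
};

AssignResults assignment_values() {
    double d = 3.5;
    int x, y;
    y = (x = d) * 5;   // x = d stores 3; its value, 3, is multiplied by 5
    double z = x = d;  // x = d yields the int 3, which is then stored in z
    return AssignResults{x, y, z};
}
```

In each case the value of the assignment expression is the value stored in x after conversion to int, not the original 3.5.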
If the state of the world, for example the value of a variable, is modified in a calculation, it's a side effect.
For example, j = 3 calculates 3, but it also modifies the value of j as a side effect.
A less trivial example: j += 3 calculates j + 3, but it also sets j to this new value.
The semantics of C muddle the waters: in C the main point of writing i = 1 is to get the side effect of the variable assignment; not calculating the value 1. The talk about assignments as side effects makes more sense with functional programming languages such as Haskell or Erlang, where variables can only be assigned once.
I would presume that to be because j = 3 has the intended effect of assigning the value 3 to j but also has the side effect of returning the value of j

C C++ array.... need help understanding code

Can you please explain this code? It seems a little confusing to me
Is "a" a two-dimensional array? I would think it's just an integer, but then in the cout statement it's used as a two-dimensional array. Also, the for loop condition says a<3[b]/3-3, which makes no sense to me; however, the code compiles and runs. I'm just having trouble understanding it, since it seems syntactically incorrect to me.
int a,b[]={3,6,5,24};
char c[]="This code is really easy?";
for(a=0;a<3[b]/3-3;a++)
{
cout<<a[b][c];
}
Array accessors are almost syntactic sugar for pointer arithmetic. a[b] is equivalent to b[a] is equivalent to *(a+b).
That said, using index[array] rather than array[index] is utterly horrible and you should never use it.
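The equivalence can be checked directly with the arrays from the question. A sketch; the function name is hypothetical.

```cpp
#include <cassert>

// Both subscript orders name the same element, because E1[E2] is
// defined as *((E1)+(E2)) and the addition is commutative.
bool subscript_forms_agree() {
    int b[] = {3, 6, 5, 24};
    int a = 3;
    return a[b] == b[a] && b[a] == *(b + a) && &a[b] == &b[a];
}
```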
Wow. This is really funky. This isn't really a two-dimensional array. It works because b and c are arrays, and there is an identity in the C language that treats this
b[3]
as the same as this
3[b]
so this code translates into a loop that increments a while a < (24/3-3) since 3[b] is the same as b[3] and b[3] is 24. Then it uses a[b] (which is the same as b[a]) as an index into the array c.
so, un-obfuscated this code is
int a;
int b[] = {3,6,5,24};
char c[] = "This code is really easy?";
for (a = 0; a < 5; a++)
{
cout << c[b[a]];
}
which is broken since b[4] doesn't exist, so the output should be the characters at indices 3, 6, 5 and 24 of c, or
soc?
followed by some random character or a crash.
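A fixed version, assuming the intent is to visit only the four elements of b, stops at a < 4 and never reads b[4]. It collects the characters into a string instead of printing them; the function name decoded is hypothetical.

```cpp
#include <cassert>
#include <string>

// De-obfuscated loop with the bound fixed to the 4 elements of b.
std::string decoded() {
    int b[] = {3, 6, 5, 24};
    char c[] = "This code is really easy?";
    std::string out;
    for (int a = 0; a < 4; a++) {
        out += c[b[a]];  // same element as a[b][c] in the original
    }
    return out;
}
```

Streaming decoded() to std::cout prints soc? with no trailing garbage.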
No, two variables are declared in the first statement: int a and int b[].
a[b][c] is just a tricky way of saying c[b[a]], that is because of the syntax for arrays: b[0] and 0[b] are equivalent.
int a,b[]={3,6,5,24};
Declares two variables, an int a and an array of ints b
char c[]="This code is really easy?";
Declares an array of char with the given string
for(a=0;a<3[b]/3-3;a++)
Iterates a through the range [0..4]:
3[b] is another way of saying b[3], which is 24.
24 / 3 = 8
8 - 3 = 5
cout << a[b][c];
This outputs the following result:
a[b] is equivalent to b[a], which will be b[0..4]
b[0..4][c] is another way of saying c[b[0..4]]
Well, there is a simple trick in the code: a[3] is exactly the same as 3[a] for a C compiler.
After knowing this your code can be transformed into more meaningful:
int a,b[]={3,6,5,24};
char c[]="This code is really easy?";
for(a=0;a<b[3]/3-3;a++)
{
cout<<c[b[a]];
}
a<3[b]/3-3 is the same as writing
a < b[3]/3-3
and a[b] is the same as b[a], since a is an integer
so b[a] is one of the items from {3,6,5,24}
which then means a[b][c] is b[a][c]
which is one of c[3], c[6], c[5] or c[24]
foo[bar] "expands" to "*(foo + bar)" in C. So a[b] is actually the same as b[a] (because addition is commutative), meaning the ath element of the array b. And a[b][c] is the same as c[b[a]] i.e. the ith char in c where i is the ath element in b.
Okay - first, let's tackle the for loop.
When you write b[3], this is equivalent to *(b+3). *(b+3) is also equivalent to *(3+b), which can be written as 3[b]. This basically can be rewritten, more understandably, as:
for(a=0; a < ((b[3]/3) - 3); a++)
Since b[3] is a constant value (24), you can see this as:
for(a=0; a < ((24/3) - 3); a++)
or
for(a=0; a < (8 - 3); a++)
and finally:
for(a=0; a < 5; a++)
In your case, this will make a iterate from 0-4. You then output a[b][c], which can be rewritten as c[b[a]].
However, I don't see how this compiles and runs correctly, since it's accessing c[b[4]] - and b only has 4 elements. This, as written, is buggy.
First: 'a' is not initialized at its declaration, but the for loop's a=0 sets it to 0 before it is used.
'3[b]/3-3' equals 5. The loop will go from 0 to 4 using 'a'. ('3[b]' is 'b[3]')
In the a==4 step, 'a[b]' (that is, 'b[a]') will be out of bounds (the valid indices of 'b' are 0..3), so it has undefined behavior. On my computer it sometimes gives 'Segmentation fault' and sometimes not. Until that point it outputs: "soc?"