Is this undefined behavior in C/C++ (Part 2) [duplicate] - c++

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Undefined behavior and sequence points
(5 answers)
Closed 5 years ago.
What does the rule about sequence points say about the following code?
int main(void) {
    int i = 5;
    printf("%d", ++i, i); /* Statement 1 */
}
There is just one %d. I am confused because I get 6 as output with GCC, Turbo C++ and Visual C++. Is the behavior well defined?
This is related to my last question.

It's undefined for several reasons:
i is both modified (++i) and read (i) without an intervening sequence point (the comma in an argument list is not the comma operator and does not introduce a sequence point).
You're calling a variadic function without a prototype in scope.
The number of arguments passed to printf() is not compatible with the format string.
The standard output stream is usually line buffered. Without a '\n' there is no guarantee the output will actually appear.
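For reference, a minimal rewrite that sidesteps every issue listed above might look something like this (a sketch, not the only possible fix):
#include <stdio.h>   /* prototype for printf, so the variadic call is valid */

int main(void) {
    int i = 5;
    ++i;                 /* modify i in its own statement: no sequencing problem */
    printf("%d\n", i);   /* one conversion specifier, one matching argument; the
                            '\n' ends the line so line-buffered output appears   */
    return 0;
}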

All arguments get evaluated when calling a function, even if they are not used, and since the order of evaluation of function arguments is unspecified, you have UB again.

I think it's well defined. The printf matches the first % placeholder to the first argument, which in this instance is a preincremented variable.

All arguments are evaluated. Order not defined. All implementations of C/C++ (that I know of) evaluate function arguments from right to left. Thus i is usually evaluated before ++i.
In printf, %d maps to the first argument. The rest are ignored.
So printing 6 is the correct behavior.
I believe that right-to-left evaluation order is very old (dating back to the first C compilers), certainly from well before C++ was invented, and most implementations of C++ kept the same evaluation order because early C++ implementations simply translated into C.
There are some technical reasons for evaluating function arguments right to left. On stack architectures, arguments are typically pushed onto the stack. In C, you can call a function with more arguments than it actually expects -- the extra arguments are simply ignored. If arguments are evaluated left to right and pushed left to right, then the stack slot right under the stack pointer holds the last argument, and there is no way for the function to know the offset of any particular argument (because the actual number of arguments pushed depends on the caller).
In a right-to-left push order, the stack slot right under the stack pointer will always hold the first argument, and the next slot holds the second argument etc. Argument offsets will always be deterministic for the function (which may be written and compiled elsewhere into a library, separately from where it is called).
Now, right-to-left push order does not mandate right-to-left evaluation order, but in early compilers memory was scarce. With right-to-left evaluation, the same stack can be used in place (essentially, after evaluating an argument -- which may be an expression or a function call! -- its value is already in the right position on the stack). With left-to-right evaluation, the argument values must be stored separately and then pushed onto the stack in reverse order.
Would be interested to know the true history behind right-to-left evaluation though.

According to this documentation, any additional arguments passed to a format string shall be ignored. It also mentions for fprintf that the argument will be evaluated then ignored. I'm not sure if this is the case with printf.

Related

Function argument evaluation order [duplicate]

This question already has answers here:
Order of evaluation in C++ function parameters
(6 answers)
Closed 7 years ago.
I'm confused about the order in which function arguments are evaluated when calling a C++ function. I have probably interpreted something wrong, so please explain if that is the case.
As an example, the legendary book "Programming Windows" by Charles Petzold contains code like this:
// hdc = handle to device context
// x, y = coordinates of where to output text
char szBuffer[64];
TextOut(hdc, x, y, szBuffer, snprintf(szBuffer, 64, "My text goes here"));
Now, the last argument is
snprintf(szBuffer, 64, "My text goes here")
which returns the number of characters written to the char[] szBuffer. It also writes the text "My text goes here" to the char[] szBuffer.
The fourth argument is szBuffer, which contains the text to be written. However, we can see that szBuffer is filled in by the fifth argument, which tells us that the expression
// argument 5
snprintf(szBuffer, 64, "My text goes here")
is evaluated before
// argument 4
szBuffer
Okay, fine. Is this always the case? Is evaluation always done from right to left? Looking at the default calling convention __cdecl:
The main characteristics of __cdecl calling convention are:
Arguments are passed from right to left, and placed on the stack.
Stack cleanup is performed by the caller.
Function name is decorated by prefixing it with an underscore character ('_').
(Source: Calling conventions demystified)
(Source: MSDN on __cdecl)
It says "Arguments are passed from right to left, and placed on the stack".
Does this mean that the rightmost/last argument in a function call is always evaluated first, then the next to last, and so on? The same goes for the calling convention __stdcall; it also specifies a right-to-left argument passing order.
At the same time, I came across posts like this:
How are arguments evaluated in a function call?
In that post the answers say (and they're quoting the standard) that the order is unspecified.
Finally, when Charles Petzold writes
TextOut(hdc, x, y, szBuffer, snprintf(szBuffer, 64, "My text goes here"));
maybe it doesn't matter? Because even if
szBuffer
is evaluated before
snprintf(szBuffer, 64, "My text goes here")
the function TextOut is called with a char* (pointing to the first character in szBuffer), and since all arguments are evaluated before the TextOut function proceeds, it doesn't matter in this particular case which gets evaluated first.
In this case it does not matter.
By passing szBuffer to a function that accepts a char * (or char const *) argument, the array decays to a pointer. The pointer value is independent of the actual data stored in the array, and the pointer value will be the same in both cases no matter whether the fourth or fifth argument to TextOut() gets fully evaluated first. Even if the fourth argument is fully evaluated first, it will evaluate as a pointer to data -- the pointed-to data is what gets changed, not the pointer itself.
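A small sketch of that point: the pointer the array decays to has the same value whether or not the buffer has been written yet.
#include <stdio.h>

int main(void) {
    char szBuffer[64] = "";
    const char* before = szBuffer;    /* the array decays to a pointer to its first element    */
    snprintf(szBuffer, sizeof szBuffer, "My text goes here");
    const char* after = szBuffer;     /* same pointer value, only the pointed-to data changed  */
    printf("%d\n", before == after);  /* prints 1 */
    return 0;
}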
To answer your posed question: the actual order of argument evaluation is unspecified. For example, in the statement f(g(), h()), a compliant compiler can execute g() and h() in any order. Further, in the statement f(g(h()), i()), the compiler can execute the three functions g, h, and i in any order with the constraint that h() gets executed before g() -- so it could execute h(), then i(), then g().
It just happens that in this specific case, evaluation order of arguments is wholly irrelevant.
(None of this behavior is dependent on calling convention, which only deals with how the arguments are communicated to the called function. The calling convention does not address in any way the order in which those arguments are evaluated.)
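A small sketch (with hypothetical helper names) that makes the point about f(g(), h()) observable on a particular compiler:
#include <stdio.h>

/* Hypothetical helpers that print when they are evaluated. */
int g(void) { printf("g "); return 1; }
int h(void) { printf("h "); return 2; }
void f(int a, int b) { printf("-> f(%d, %d)\n", a, b); }

int main(void) {
    /* A conforming compiler may print "g h" or "h g" before the arrow:
       the order in which the two arguments are evaluated is unspecified. */
    f(g(), h());
    return 0;
}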
I would agree that it depends on the calling convention, because the standard does not specify the order.
See also: Compilers and argument order of evaluation in C++
And I would also agree that it does not matter in this case, because the snprintf is always evaluated before TextOut is called - and the buffer gets filled.

How strict are C/C++ compilers about operator precedence/evaluation? [duplicate]

This question already has answers here:
order of evaluation of || and && in c
(5 answers)
Closed 9 years ago.
This question has been on my mind for a while, so it's time to let it out and see what you have to say about it.
In C/C++ operator precedence is defined by the language specification, but as with everything there may be back doors or little-known things that compilers employ in the name of 'optimization' and which will mess up your application in the end.
Take this simple example :
bool CheckStringPtr(const char* textData)
{
    return (!textData || textData[0] == (char)0);
}
In this case I test whether the pointer is null, then I check whether the first char is zero; essentially this is a test for a zero-length string. Logically the two operations are exchangeable, but if they were exchanged the code would crash in some cases, since it would be trying to read from a non-existent memory address.
So the question is: is there anything that enforces the order in which operators/functions are executed? I know the safest way is to use two ifs, one below the other, but this way should be equivalent, assuming that the evaluation order of the operators never changes.
So are compilers forced by the C/C++ specifications not to change the order of evaluation, or are they sometimes allowed to change it, for example depending on compiler parameters, especially optimizations?
First note that precedence and evaluation order are two different (largely unrelated) concepts.
So are compilers forced by the C/C++ specification to not change the order of evaluation?
The compiler must produce behaviour that is consistent with the behaviour guaranteed by the C language standard. It is free to change e.g. the order of evaluation so long as the overall observed behaviour is unchanged.
Logically the 2 operations are exchangeable but if that would happen in some cases it would crash
|| and && are defined to have short-circuit semantics; they may not be interchanged.
The C and C++ standards explicitly support short-circuit evaluation, and thus require the left-hand operand of the &&, ||, or ?: operator to be evaluated before the right-hand side.
Other "sequence points" include the comma operator (not to be confused with commas separating function arguments as in f(a, b)), the end of a statement (;), and between the evaluation of a function's arguments and the call to the function.
But for the most part, order of evaluation (not to be confused with precedence) is unspecified. So, for example, don't depend on f being called first in an expression like f(x) + g(y).
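To make the short-circuit guarantee concrete, here is the check from the question in a small test program; the null-pointer case never reads textData[0]:
#include <stdio.h>

/* Safe: || guarantees its left operand is evaluated first, and the right
   operand is skipped entirely when the left one is already true. */
bool CheckStringPtr(const char* textData)
{
    return (!textData || textData[0] == (char)0);
}

int main(void)
{
    printf("%d\n", CheckStringPtr(NULL));   /* 1: textData[0] is never read */
    printf("%d\n", CheckStringPtr(""));     /* 1: empty string              */
    printf("%d\n", CheckStringPtr("abc"));  /* 0: non-empty string          */
    return 0;
}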

Prefix Operator Strangeness in C++ and array indices

So I just ran into a huge headache with the prefix operator.
In my debug build in Visual C++ 2010,
someArray[++index]
would correctly increment the array index and then use it to index into the array.
In my release build it used the array index and then incremented it afterwards, causing some huge headaches.
The weird thing is that my debug build code was actually wrong for a while; I had it written as
someArray[index++]
This would use the index and then increment it, but the debug build was still incrementing it, and then using the value. I didn't even realize my mistake until this morning.
Here's a sample of the actual code.
for (unsigned int newPointIndex = 0; newPointIndex < newEdgeList.size() - 1;) {
    m_edges.push_back(Edge(newEdgeList[newPointIndex], newEdgeList[++newPointIndex]));
}
There is no increment in the for statement itself; it happens in the code inside the loop while I'm indexing into the array. I thought it was a clever little optimization, but it makes the code not work in the release build.
The second time I was indexing into the array, it was using the unincremented index in the release build, but was working in the debug build.
Your for loop body includes this:
Edge(newEdgeList[newPointIndex], newEdgeList[++newPointIndex])
That's undefined [1] behaviour, because the two arguments can be evaluated in either order (or even simultaneously), so it's not clear whether newPointIndex will have been incremented or not before the first use.
Debug and optimized builds are very likely to evaluate arguments in different orders.
I'd suggest putting the newPointIndex increment in the for statement itself, and writing in the body:
Edge(newEdgeList[newPointIndex], newEdgeList[newPointIndex + 1])
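A self-contained sketch of that suggestion, with placeholder Point and Edge types standing in for the ones in the question:
#include <vector>

// Placeholder types standing in for the ones in the question.
struct Point { int x, y; };
struct Edge  { Point a, b; };

int main() {
    std::vector<Point> newEdgeList = { {0, 0}, {1, 1}, {2, 2} };
    std::vector<Edge>  m_edges;

    // The increment lives in the for statement, so neither argument expression
    // has side effects and evaluation order can no longer matter.
    for (unsigned int newPointIndex = 0; newPointIndex + 1 < newEdgeList.size(); ++newPointIndex) {
        m_edges.push_back(Edge{newEdgeList[newPointIndex], newEdgeList[newPointIndex + 1]});
    }
    return 0;
}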
[1]: Read the comments for the discussion of un{specified, defined}. tl;dr: Holy leaping lizards, Batman!
The problem is actually here:
m_edges.push_back(Edge(newEdgeList[newPointIndex], newEdgeList[++newPointIndex]));
You cannot tell which of the two expressions newEdgeList[newPointIndex] and newEdgeList[++newPointIndex] is going to be executed first.
According to the C++ Standard, there is no guarantee that they will be executed left-to-right. See 5.2.2/8:
"The evaluations of the postfix expression and of the argument expressions are all unsequenced
relative to one another. All side effects of argument expression evaluations are sequenced before the function is entered"
Also relevant is 1.9/15:
"When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. —end note ]"
This means that an implementation is free not only to use a different order of execution for those two expressions in debug and release builds, but in theory also to change the order every time you execute that statement within the same program run, in no deterministic way.
The solution is to take the increment out of those sub-expressions (as pointed out in another answer).

order of evaluation of function parameters

What will be printed as the result of the operation below:
x = 5;
printf("%d,%d,%d\n", x, x << 2, x >> 2);
Answer: 5,20,1
I thought the order was undefined, yet I found the above as an interview question on many sites.
From the C++ standard:
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified.
However, your example would only have undefined behavior if the arguments were x>>=2 and x<<=2, such that x were being modified.
Bit shift operators don't modify the value of the variable... so order doesn't matter.
The order of evaluation is unspecified, but it doesn't matter because you're not modifying x at all.
So the program is well-defined, and the answer is as given.
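For completeness, here is the well-defined form as a runnable program:
#include <stdio.h>

int main(void) {
    int x = 5;
    /* x is only read, never modified, so the unspecified evaluation order
       of the three arguments cannot change the result. */
    printf("%d,%d,%d\n", x, x << 2, x >> 2);   /* prints 5,20,1 */
    return 0;
}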
The following would have undefined semantics:
printf("%d,%d,%d\n", x, x <<= 2, x >>= 2);
I found the answer in the C++ standard.
Paragraph 5.2.2/8:
"The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified."
In other words, it depends only on the compiler.
The order of evaluation is unspecified by the official C specification.
However, as a matter of practicality, parameters are usually evaluated right-to-left.
In your problem, the bit-shift operator doesn't change the value of X, so the order of evaluation is not important. You'd get 5,20,1, whether evaluated left-to-right, right-to-left, or middle-first.
In C, parameters are pushed onto the stack in right-to-left order, so that the 1st param (in this case, the char* "%d,%d,%d") is at the top of the stack. Parameters are usually (but not always) evaluated in the same order they are pushed.
A problem that better illustrates what you're talking about is:
int i=1;
printf("%d, %d, %d", i++, i++, i++);
The official answer is "undefined".
The practical answer, (in the several compilers/platforms I've tried), is "3, 2, 1".

Implementing a stack-based virtual machine for a subset of C

Hello everyone, I'm currently implementing a simple programming language as a learning experience, but I'm in need of some advice. Currently I'm designing my interpreter and I've run into a problem.
My language is a subset of C, and I'm having a problem with the stack-based interpreter implementation. In the language, the following will compile:
somefunc ()
{
    1 + 2;
}

main ()
{
    somefunc ();
}
Now this is alright, but when "1 + 2" is computed the result is pushed onto a stack; the function then returns, but there's still a number on the stack, and there shouldn't be. How can I get around this problem?
I've thought about saving a "state" of the stack before a function call and restoring that "state" after the function call. For example, save the number of elements on the stack, execute the function code, return, and then pop from the stack until we have the same number of elements as before (or maybe +1 if the function returned something).
Any ideas? Thanks for any tips!
Great question! One of my hobbies is writing compilers for toy languages, so kudos for your excellent programming taste.
An expression statement is one where the code in the statement is simply an expression. This means anything of the form <expression> ;, which includes things like assignments and function calls, but not ifs, whiles, or returns. Any expression statement will have a leftover value on the stack at the end, which you should discard.
1 + 2 is an expression statement, but so are these:
x = 5;
The assignment expression leaves the value 5 on the stack since the result of an assignment is the value of the left-hand operand. After the statement is finished you pop off the unused value 5.
printf("hello world!\n");
printf() returns the number of characters output. You will have this value left over on the stack, so pop it when the statement finishes.
Effectively every expression statement will leave a value on the stack unless the expression's type is void. In that case you either special-case void statements and don't pop anything afterwards, or push a pretend "void" value onto the stack so you can always pop a value.
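A minimal sketch of that rule in a tiny bytecode interpreter; the opcode names and instruction format here are invented for illustration:
#include <cstdio>
#include <vector>

// Hypothetical opcodes and instruction format, for illustration only.
enum Op { OP_PUSH_CONST, OP_ADD, OP_POP };
struct Instr { Op op; int operand; };

// Bytecode for the expression statement "1 + 2;".
// The expression leaves one value on the operand stack; the trailing POP
// discards it, so the stack is balanced when the statement ends.
static std::vector<Instr> compileOnePlusTwo() {
    return {
        {OP_PUSH_CONST, 1},
        {OP_PUSH_CONST, 2},
        {OP_ADD,        0},
        {OP_POP,        0},   // the statement's value is unused, drop it
    };
}

int main() {
    std::vector<int> stack;
    for (const Instr& i : compileOnePlusTwo()) {
        switch (i.op) {
            case OP_PUSH_CONST: stack.push_back(i.operand); break;
            case OP_ADD: {
                int b = stack.back(); stack.pop_back();
                int a = stack.back(); stack.pop_back();
                stack.push_back(a + b);
                break;
            }
            case OP_POP: stack.pop_back(); break;
        }
    }
    std::printf("stack depth after the statement: %zu\n", stack.size());   // prints 0
    return 0;
}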
You'll need a smarter parser. When you see an expression whose value isn't being used then you need to emit a POP.
This is an important opportunity to learn about optimization. You have a function that does nothing but integer math, and the result isn't even used in any way, shape, or form.
Having your compiler optimize the function away would avoid a lot of bytecode being generated and executed for nothing!