I am a beginner learning C++.
Recently, I read a paragraph describing the evaluation of an expression.
The original text is as below:
"...Evaluation of an expression may generate side-effects, e.g. std::printf("%d", 4) prints the character '4' on the standard output...."
My question is: why is the character '4' printed by std::printf("%d", 4) considered a side effect?
Can anyone give me a more thorough explanation, or more examples of side effects produced by evaluating expressions?
Thanks!
A side effect is any change in the system that is observable to the outside world.
Printing a number is clearly a visible change (also, internally you affect stdout state etc...)
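To make that concrete, here is a minimal sketch (the variable names are just for illustration) of a few everyday expressions whose evaluation both produces a value and has a side effect:
#include <cstdio>

int main() {
    int n = 0;
    int a = (n = 5);              // assignment: the expression's value is 5, the side effect is that n changes
    int b = n++;                  // post-increment: the value is 5, the side effect is that n becomes 6
    int c = std::printf("hi\n");  // the value is 3 (characters written), the side effect is output on stdout
    return a + b + c - 13;        // the values are used here; the side effects have already happened
}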
Another important notion that can be helpful is that of a pure function. It has two main characteristics:
A pure function is deterministic. This means that, given the same input, the function will always return the same output. ...
A pure function will not cause side effects. A side effect is any change in the system that is observable to the outside world.
Typical examples of functions violating these properties are:
static int n=0;

int foo_1(int m) // not deterministic, without side effect
{
    return m+n;
}

int foo_2(int m) // with side effect, but deterministic
{
    ++n;
    return m;
}

int foo_3(int m) // with side effect, not deterministic
{
    ++n;
    return m+n;
}

int foo_4(int m) // without side effect + deterministic = pure function
{
    return 2*m;
}
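To see why foo_2 above still isn't pure even though it is deterministic, here is a minimal self-contained sketch (foo_2 and n repeated from above for completeness):
#include <cstdio>

static int n = 0;
int foo_2(int m) { ++n; return m; }  // deterministic, but with a side effect

int main() {
    int r1 = foo_2(7);                     // returns 7
    int r2 = foo_2(7);                     // returns 7 again: same input, same output
    std::printf("%d %d %d\n", r1, r2, n);  // prints "7 7 2": the change to n is observable from outside
    return 0;
}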
I was messing around with some C++ code after learning that you cannot increment booleans in the standard. I thought incrementing booleans would be useful, because I like compacting functions when I can, and
unsigned char b = 0; //type doesn't really matter assuming no overflow
while (...) {
    if (b++) //do something
        ...
}
is sometimes useful to have, but it would be nice to not have to worry about integer overflow. Since booleans cannot take on any value besides 0 or 1, I thought that perhaps this would work. Of course, you could just have a boolean assigned to 1 after you do your operation but that takes an extra line of code.
Anyway, I thought about how I might achieve this a different way and I came up with this.
bool b = 0;
while (...)
if (b & (b=1)) ...
This turned out to always evaluate to true, even on the first pass through. I thought - sure, it's just applying the bitwise operator from right to left, except that when I swapped the order, it did the same thing.
So my question is, how is this expression being evaluated so that it simplifies to always being true?
As an aside, the way to do what I wanted is like this I guess:
bool b = 0;
while (...) if (b || !(b=1)) // code executes every time except the first iteration
There's a way to get it to only execute once, also.
Or just do the actually readable and obvious solution.
bool b = 0;
while (...) {
    if (b) { /* do something */ } else { b = 1; }
}
This is what std::exchange is for:
if (std::exchange(b, true)) {
// b was already true
}
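For completeness, a minimal sketch of how this might look in the shape of the original loop (keep_going() stands in for whatever the real loop condition is):
#include <utility>  // std::exchange, C++14

bool b = false;
while (keep_going()) {             // hypothetical loop condition
    if (std::exchange(b, true)) {  // yields the old value of b, then sets b to true
        // do something: this branch runs on every iteration except the first
    }
}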
As mentioned, sequencing rules mean that b & (b=1) is undefined behaviour in both C and C++. You can't modify a variable and read from it "without proper separation" (e.g., as separate statements).
There is no "strange behavior" here. The & operator in your case is a bitwise arithmetic operator; it is commutative and, unlike &&, does not short-circuit, so both operands must be evaluated before the resulting value is computed.
As mentioned earlier, this is considered undefined behavior, since the standard does not dictate whether the left operand is computed and loaded into a register before the right one, or the other way around.
If you are using C++14 or above, you could use std::exchange to achieve this, or re-implement it quite easily along the following lines:
bool exchange_bool(bool &var, bool val) {
    bool ret = var;  // remember the old value
    var = val;       // store the new one
    return ret;      // hand back what was there before
}
If you end up re-implementing it, consider using a template instead of hard-coded types.
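For example, a minimal sketch of a generic version (the name exchange_value is made up here; std::exchange in <utility> already provides exactly this):
#include <utility>  // std::move, std::forward

template <typename T, typename U = T>
T exchange_value(T &var, U &&new_value) {
    T old = std::move(var);             // keep the previous value
    var = std::forward<U>(new_value);   // store the new one
    return old;                         // hand back what was there before
}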
As a proper answer to your question:
So my question is, how is this expression being evaluated so that it simplifies to always being true?
It does not simplify if we strictly follow the standard, but I'd guess you are always using the same compiler, which ends up producing the same output.
Side point: you should not "compact" functions. Most of the time the price is reduced readability. This might be fine if you're the only dev working on a project, but on bigger projects, or when you come back to your code later, it can become an issue. Last but not least, "compact" functions are more error-prone precisely because of their reduced readability.
If you really want shorter functions, consider extracting sub-functions that handle parts of the computation. That lets you write short functions, and quickly fix or update behavior, without sacrificing clarity.
I don't see what meaning you can assign to a "bitwise increment" operator. If I understand correctly, what you achieve is just setting the low-order bit, which is done by b |= 1 (following the C conventions, this could have been defined as a unary b||, but || was assigned another purpose). But I don't feel that this is a useful bitwise operation, and it can "increment" only once. IMO the standard increment operator is more bitwise.
If your goal was in fact to "increment" a boolean variable, i.e. set it to true (prefix or postfix), b = b || true is a way. (There is no "logical OR assignment" b ||= true, and even less a "logical increment" b||||.)
Consider the following C++ code for a simple binary-tree DFS traversal:
#include <iostream>
#include <vector>
using namespace std;
int print_vector(vector<char> *vec){
    for (auto &it: *vec)
        cout<<it;
    cout<<'\n';
    return 0;
}

int btree_dfs_traversal(int max_depth, int cur_depth, vector<char> position){
    if (cur_depth>max_depth)
        return 0;
    print_vector(&position);
    vector<char> left = position;
    vector<char> right = position;
    left.push_back('l');
    right.push_back('r');
    return btree_dfs_traversal(max_depth, cur_depth+1, left)+btree_dfs_traversal(max_depth, cur_depth+1, right);
}

int main(int argc, const char * argv[]) {
    vector<char> pos;
    btree_dfs_traversal(4, 0, pos);
    return 0;
}
The function (a minimal example) visits a binary tree and prints the "position" of each node it visits. The only difference from a standard DFS, and this is the part that made the difference, is that most implementations visit the two children in separate statements, while mine returns the sum of the two recursive calls in a single return statement.
I expected the recursion to start from the left call, i.e. the output starts with l, ll, lll, ... And indeed on my system (OS X) it is like this, and ideone gives this output too.
However, on some friends' systems the output is different: the recursion starts from the right call, i.e. r, rr, ... Unfortunately, I do not have exact information about their compilers at present.
My question is: is summing the two recursive calls undefined behaviour, such that different compilers can produce different results? Or is it simply wrong to start from the right?
The "problem" is that in f(1) + f(2) it is unspecified which function call comes first. Different compilers will pick different order and the order can depend on any number of unrelated factors. From the C++ reference:
Order of evaluation
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
There are exceptions to this rule which are noted below.
Except where noted below, there is no concept of left-to-right or right-to-left evaluation in C++. This is not to be confused with left-to-right and right-to-left associativity of operators: the expression f1() + f2() + f3() is parsed as (f1() + f2()) + f3() due to left-to-right associativity of operator+, but the function call to f3 may be evaluated first, last, or between f1() or f2() at run time.
See http://en.cppreference.com/w/cpp/language/eval_order for exceptions and more info.
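If you need the traversal order to be deterministic, one way (a minimal sketch of the return statement from the question, rewritten) is to give each recursive call its own statement, since full statements are evaluated in order:
int left_count  = btree_dfs_traversal(max_depth, cur_depth + 1, left);   // always evaluated first
int right_count = btree_dfs_traversal(max_depth, cur_depth + 1, right);  // always evaluated second
return left_count + right_count;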
I was writing a console application that would try to "guess" a number by trial and error. It worked fine and all, but it left me wondering about a certain part that I wrote absentmindedly.
The code is:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int x,i,a,cc;
    for(;;){
        scanf("%d",&x);
        a=50;
        i=100/a;
        for(cc=0;;cc++)
        {
            if(x<a)
            {
                printf("%d was too big\n",a);
                a=a-((100/(i<<=1))?:1);
            }
            else if (x>a)
            {
                printf("%d was too small\n",a);
                a=a+((100/(i<<=1))?:1);
            }
            else
            {
                printf("%d was the right number\n-----------------%d---------------------\n",a,cc);
                break;
            }
        }
    }
    return 0;
}
More specifically, the part that confused me is
a=a+((100/(i<<=1))?:1);
//Code, code
a=a-((100/(i<<=1))?:1);
I used ((100/(i<<=1))?:1) to make sure that if 100/(i<<=1) returned 0 (or false), the whole expression would evaluate to 1, and I left the part of the conditional that would be used if it were true empty, i.e. ((100/(i<<=1))? :1). It seems to work correctly, but is there any risk in leaving that part of the conditional empty?
This is a GNU C extension (see the ?: Wikipedia entry), so for portability you should explicitly state the second operand.
In the 'true' case, the expression yields the value of the condition itself.
The following statements are almost equivalent:
a = x ?: y;
a = x ? x : y;
The only difference is that in the first statement, x is always evaluated exactly once, whereas in the second, x will be evaluated twice if it is true. So they only differ when evaluating x has side effects.
Either way, I'd consider this a subtle use of the syntax... and if you have any empathy for those maintaining your code, you should explicitly state the operand. :)
On the other hand, it's a nice little trick for a common use case.
This is a GCC extension to the C language. When nothing appears between ? and :, the value of the condition is used in the true case.
The middle operand in a conditional expression may be omitted. Then if the first operand is nonzero, its value is the value of the conditional expression.
Therefore, the expression
x ? : y
has the value of x if that is nonzero; otherwise, the value of y.
This example is perfectly equivalent to
x ? x : y
In this simple case, the ability to omit the middle operand is not especially useful. When it becomes useful is when the first operand does, or may (if it is a macro argument), contain a side effect. Then repeating the operand in the middle would perform the side effect twice. Omitting the middle operand uses the value already computed without the undesirable effects of recomputing it.
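To make the double-evaluation point concrete, here is a minimal sketch (the omitted-operand ?: is a GNU extension, so this needs gcc or clang; next_value() is a made-up helper with a side effect):
#include <cstdio>

static int counter = 0;
int next_value() { return ++counter; }  // side effect: bumps the counter

int main() {
    int a = next_value() ?: -1;                 // next_value() is evaluated once; its nonzero value (1) is reused
    int b = next_value() ? next_value() : -1;   // spelled out, the operand is evaluated twice when nonzero
    std::printf("%d %d %d\n", a, b, counter);   // prints "1 3 3"
    return 0;
}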
I recently got confused by the following C++ snippet:
#include <cstdio>

int lol(int *k){
    *k += 5;
    return *k;
}

int main(int argc, const char *argv[]){
    int k = 0;
    int w = k + lol(&k);
    printf("%d\n", w);
    return 0;
}
Take a look at line:
int w = k + lol(&k);
Until now I thought that this expression would be evaluated from left to right: take the current value of k (which, before the call to the lol function, is 0) and then add it to the result of the lol function. But the compiler proves me wrong: the value of w is 10. Even if I switch places to make it
int w = lol(&k) + k;
the result is still 10. What am I doing wrong?
Tomek
This is because the operands of an expression are not specified to be evaluated in any particular order.
The compiler is free to evaluate either operand, k or lol(&k), first. There are no sequence points in that expression. This means that the side effects of the operands can be executed in any order.
So in short, it's not specified whether the code prints 5 or 10. Both are valid outputs.
The exception to this is short-circuiting in boolean expressions, because && and || are sequence points.
This code yields either 5 or 10, depending on the order in which the function call is evaluated relative to the left side of +.
Its behavior is not undefined, because a function call is surrounded by two sequence points.
Plus is by definition commutative, so the order in your example is entirely unspecified.
Mysticial is right when mentioning sequence points. Citing the Wikipedia article (I don't have the C++ standard at hand):
A sequence point in imperative programming defines any point in a
computer program's execution at which it is guaranteed that all side
effects of previous evaluations will have been performed, and no side
effects from subsequent evaluations have yet been performed. They are
often mentioned in reference to C and C++, because the result of some
expressions can depend on the order of evaluation of their
subexpressions. Adding one or more sequence points is one method of
ensuring a consistent result, because this restricts the possible
orders of evaluation.
The article also has a list of sequence points in C++.
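If you want the result to be predictable on every compiler, here is a minimal sketch of the same computation with the read of k forced to happen first (separate statements are sequenced one after another):
int k = 0;
int k_before = k;             // read k first: 0
int w = k_before + lol(&k);   // lol() then updates k to 5; w is 0 + 5 == 5 everywhere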
I've recently (only on SO, actually) run into uses of the C/C++ comma operator. From what I can tell, it creates a sequence point between the left-hand and right-hand operands, so that you have a predictable (defined) order of evaluation.
I'm a little confused about why this would be provided in the language as it seems like a patch that can be applied to code that shouldn't work in the first place. I find it hard to imagine a place it could be used that wasn't overly complex (and in need of refactoring).
Can someone explain the purpose of this language feature and where it may be used in real code (within reason), if ever?
It can be useful in the condition of while() loops:
while (update_thing(&foo), foo != 0) {
/* ... */
}
This avoids having to duplicate the update_thing() line while still maintaining the exit condition within the while() controlling expression, where you expect to find it. It also plays nicely with continue;.
It's also useful in writing complex macros that evaluate to a value.
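As a minimal sketch of that macro idea (LOG_AND_GET is a made-up name), the comma operator lets the macro perform a side effect and still expand to a usable value:
#include <cstdio>

// Logs the value being fetched, then yields that same value as the macro's result.
// Note that the argument is evaluated twice here, so only pass simple expressions.
#define LOG_AND_GET(x) (std::fprintf(stderr, "getting %d\n", (x)), (x))

int v = LOG_AND_GET(42);  // writes "getting 42" to stderr; v == 42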
The comma operator just separates expressions, so you can do multiple things instead of just one where only a single expression is required. It lets you do things like
for (int i = 0, j = 0; ...; ++i, ++j)
Note that the first comma (in int i = 0, j = 0) is not the comma operator, it is part of the declaration syntax, whereas the second one (in ++i, ++j) is the comma operator.
You really don't have to think about it. It has some more arcane uses, but I don't believe they're ever absolutely necessary, so they're just curiosities.
Within for loop constructs it can make sense. Though I generally find them harder to read in this instance.
It's also really handy for angering your coworkers and people on SO.
bool guess() {
    return true, false;  // the comma operator discards true, so this always returns false
}
Playing Devil's Advocate, it might be reasonable to reverse the question:
Is it good practice to always use the semi-colon terminator?
Some points:
Replacing most semi-colons with commas would immediately make the structure of most C and C++ code clearer, and would eliminate some common errors.
This is more in the flavor of functional programming as opposed to imperative.
Javascript's 'automatic semicolon insertion' is one of its controversial syntactic features.
Whether this practice would increase 'common errors' is unknown, because nobody does this.
But of course if you did do this, you would likely annoy your fellow programmers, and become a pariah on SO.
Edit: See AndreyT's excellent 2009 answer to Uses of C comma operator. And Joel 2008 also talks a bit about the two parallel syntactic categories in C#/C/C++.
As a simple example, the structure of while (foo) a, b, c; is clear, but while (foo) a; b; c; is misleading in the absence of indentation or braces, or both.
Edit #2: As AndreyT states:
[The] C language (as well as C++) is historically a mix of two completely different programming styles, which one can refer to as "statement programming" and "expression programming".
But his assertion that "in practice statement programming produces much more readable code" [emphasis added] is patently false. Using his example, in your opinion, which of the following two lines is more readable?
a = rand(), ++a, b = rand(), c = a + b / 2, d = a < c - 5 ? a : b;
a = rand(); ++a; b = rand(); c = a + b / 2; if (a < c - 5) d = a; else d = b;
Answer: they are both unreadable. It is the white space that gives the readability (hurray for Python!). The first is shorter. But the semi-colon version does have more pixels of black space, or green space if you have a Hazeltine terminal, which may be the real issue here?
Everyone is saying that it is often used in a for loop, and that's true. However, I find it's more useful in the condition statement of the for loop. For example:
for (int x; x=get_x(), x!=sentinel; )
{
// use x
}
Rewriting this without the comma operator would require doing at least one of a few things that I'm not entirely comfortable with, such as declaring x outside the scope where it's used, or special casing the first call to get_x().
I'm also plotting ways I can utilize it with C++11 constexpr functions, since I believe their bodies can only consist of a single return statement.
I think the only common example is the for loop:
for (int i = 0, j = 3; i < 10 ; ++i, ++j)
As mentioned in the c-faq:
Once in a while, you find yourself in a situation in which C expects a
single expression, but you have two things you want to say. The most
common (and in fact the only common) example is in a for loop,
specifically the first and third controlling expressions.
The only reasonable use I can think of is in the for construct
for (int count=0, bit=1; count<10; count=count+1, bit=bit<<1)
{
...
}
as it allows incrementing multiple variables at the same time while still keeping the structure of the for construct (easy to read and understand for a trained eye).
In other cases I agree it's sort of a bad hack...
I also use the comma operator to glue together related operations:
void superclass::insert(item i) {
    add(i), numInQ++, numLeft--;
}
The comma operator is useful for putting a sequence of operations in places where you can't insert a block of code. As pointed out, this is handy for writing compact and readable loops. Additionally, it is useful in macro definitions. The following macro increments the number of warnings and, if a boolean variable is set, also shows the warning.
#define WARN if (++nwarnings, show_warnings) std::cerr
So that you may write:
if (warning_condition)
WARN << "some warning message.\n";
The comma operator is effectively a poor man's lambda function.
Though this question was posted a few months after C++11 was ratified, I don't see any answers here pertaining to constexpr functions. This answer to a not-entirely-related question references a discussion on the comma operator and its usefulness in constant expressions, where the new constexpr keyword was mentioned specifically.
While C++14 did relax some of the restrictions on constexpr functions, it's still useful to note that the comma operator can grant you predictably ordered operations within a constexpr function, such as (from the aforementioned discussion):
template<typename T>
constexpr T my_array<T>::at(size_type n)
{
    return (n < size() || throw "n too large"), (*this)[n];
}
Or even something like:
constexpr MyConstexprObject& operator+=(int value)
{
    return (m_value += value), *this;
}
Whether this is useful is entirely up to the implementation, but these are just two quick examples of how the comma operator might be applied in a constexpr function.