Should I change the value of a function parameter? [closed] - c++

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I know that this is allowed in C, but I am not used to updating the value of a variable passed by value.
In my "coding style", a parameter passed by value is never changed.
I mean that I prefer this:
void func(int var)
{
int locVar = var;
if(something)
{
locVar = locVar/2;
}
// [some stuff using locVar]
}
over this:
void func(int var)
{
if(something)
{
var = var/2;
}
// [some stuff using var]
}
I assume that the compiler will not produce different assembly if the register optimizations are enabled, but still, is there any good reason to prefer one of the two code snippets?

is there any good reason to prefer one of the two code snippets?
1) Compilers are not created equal. int locVar = var; may create faster code. (I was surprised to find this true in a given application.) This is a local or micro-optimization, useful only in select cases, and of course it may result in different performance when compiled with other options or on another machine.
2) Less is better. Introducing a synonym, as in int locVar = var;, means more code and more variables to understand and maintain. Usually this is less helpful.
3) Both code snippets generate valid code. So this is a style issue too. If your coding group has a coding guideline concerning this, better to follow that than be different for trivial reasons.
Select reasons to prefer one over the other: yes. Strong reasons: in general, no. When in doubt about which way to go, the easier-to-maintain one wins out (IMO).

No.
From the computer's point of view - no difference. Optimization will do the job.
From a personal point of view - a matter of taste.
From a meta view - do not set up pitfalls.
Very often the initial parameter values are needed multiple times in a function, so the more common style is not to overwrite parameters. It is very convenient to have the values available for debugging, for logging messages ("wrote n bytes"), in catch clauses, and so on. As this is more or less the common style, a maintainer could easily miss your itsy-bitsy premature optimization. Such optimizations were common in the age of non-optimizing C compilers; nowadays they are just 'because-I-can' stuff. Remember, we write code to be readable by humans. The compiler can optimize it anyway.

In general, the more variables are introduced, the less readable and more complicated the code becomes.
Sometimes it is even difficult to invent two names for one entity that are semantically identical.
For example, a function listing can occupy several screens. In that case, when you encounter a name like locVar, you need to scroll backward through the function to determine what it means.
Moreover, a function can have several parameters. Are you going to introduce a new alias for each parameter?
For readers of your code, it will not be clear that you introduced new local variables just to support your coding style. They might, for example, think that you changed the function and forgot to remove variables that are no longer necessary. :)
Consider for example a recursive function that calculates the sum of digits of a number.
unsigned int sum( unsigned int x )
{
const unsigned int Base = 10;
unsigned int digit = x % Base;
return digit + ( ( x /= Base ) == 0 ? 0 : sum( x ) );
^^^^^^^^^^^^^
}
or that can be written like
unsigned int sum( unsigned int x )
{
const unsigned int Base = 10;
return x % Base + ( x / Base == 0 ? 0 : sum( x / Base ) );
}
What is the purpose to introduce a new local variable as an alias of x in this function?
unsigned int sum( unsigned int x )
{
const unsigned int Base = 10;
unsigned int y = x;
unsigned int digit = y % Base;
return digit + ( ( y /= Base ) == 0 ? 0 : sum( y ) );
}
As for me, the first function implementation, without the intermediate variable y, is clearer. The recursive nature of the function is more visible when the same variable x is used.
If you want to point out that a parameter is not changed in a function, you can declare or define the function with a const-qualified parameter.
For example
unsigned int sum( const unsigned int x )
{
const unsigned int Base = 10;
return x % Base + ( x / Base == 0 ? 0 : sum( x / Base ) );
}
In C++, such a function can even be made constexpr, so its calls can appear in constant expressions:
#include <iostream>
constexpr unsigned int sum( const unsigned int x )
{
const unsigned int Base = 10;
return x % Base + ( x / Base == 0 ? 0 : sum( x / Base ) );
}
int main()
{
std::cout << sum( 123456789 ) << std::endl;
int a[sum( 123456789 )];
std::cout << sizeof( a ) << std::endl;
}
The program output is
45
180

Related

Composite function in C++

I am a beginner in C++ and want to do simple example of composite function.
For example, in MATLAB, I can write
a = @(x) 2*x
b = @(y) 3*y
a(b(1))
Answer is 6
I searched following questions.
function composition in C++ / C++11 and
Function Composition in C++
But they are built using advanced features, such as templates, with which I am not very familiar at this time. Is there a simpler and more direct way to achieve this? In the MATLAB code above, the user does not need to know the implementation of function handles; the user can just use the proper syntax to get the result. Is there such a way in C++?
Another Edit:
In the above code, I am putting in a value at the end. However, if I want to pass the result to a third function, MATLAB can still treat it as a function. How can I do this in C++?
For example, in addition to above code, consider this code:
c = @(p,q) a(p)*b(q) % This results in a function
c(1,2)
answer=12
d = @(r) a(b(r))
d(1)
answer=6
function [ output1 ] = f1( arg1 )
val = 2.0;
output1 = feval(arg1,val)
end
f1(d)
answer = 12
In this code, c takes two functions as input and d is a composite function. In the next example, the function f1 takes a function as an argument and uses the MATLAB builtin feval to evaluate that function at val.
How can I achieve this in C++?
How about:
#include <iostream>
int main(int, char**)
{
auto a = [](int x) { return 2 * x; };
auto b = [](int y) { return 3 * y; };
for (int i = 0; i < 5; ++i)
std::cout << i << " -> " << a(b(i)) << std::endl;
return 0;
}
Perhaps I'm misunderstanding your question, but it sounds easy:
int a(const int x) { return x * 2; }
int b(const int y) { return y * 3; }
std::cout << a(b(1)) << std::endl;
Regarding your latest edit, you can make a function return a result of another function:
int fun1(const int c) { return a(c); }
std::cout << fun1(1) << std::endl;
Note that this returns a number, the result of calling a, not the function a itself. Sure, you can return a pointer to that function, but then the syntax would be different: you'd have to write something like fun1()(1), which is rather ugly and complicated.
C++'s evaluation strategy for function arguments is always "eager" and usually "by value". The short version of what that means is, a composed function call sequence such as
x = a(b(c(1)));
is exactly the same as
{
auto t0 = c(1);
auto t1 = b(t0);
x = a(t1);
}
(auto t0 means "give t0 whatever type is most appropriate"; it is a relatively new feature and may not work in your C++ compiler. The curly braces indicate that the temporary variables t0 and t1 are destroyed after the assignment to x.)
I bring this up because you keep talking about functions "taking functions as input". There are programming languages, such as R, where writing a(b(1)) would pass the expression b(1) to a, and only actually call b when a asked for the expression to be evaluated. I thought MATLAB was not like that, but I could be wrong. Regardless, C++ is definitely not like that. In C++, a(b(1)) first evaluates b(1) and then passes the result of that evaluation to a; a has no way of finding out that the result came from a call to b. The only case in C++ that is correctly described as "a function taking another function as input" would correspond to your example using feval.
Now: The most direct translation of the MATLAB code you've shown is
#include <stdio.h>
static double a(double x) { return 2*x; }
static double b(double y) { return 3*y; }
static double c(double p, double q) { return a(p) * b(q); }
static double d(double r) { return a(b(r)); }
static double f1(double (*arg1)(double))
{ return arg1(2.0); }
int main()
{
printf("%g\n", a(b(1))); // prints 6
printf("%g\n", c(1,2)); // prints 12
printf("%g\n", d(1)); // prints 6
printf("%g\n", f1(d)); // prints 12
printf("%g\n", f1(a)); // prints 4
return 0;
}
(C++ has no need for explicit syntax like feval, because the typed parameter declaration, double (*arg1)(double) tells the compiler that arg1(2.0) is valid. In older code you may see (*arg1)(2.0) but that's not required, and I think it makes the code less readable.)
(I have used printf in this code, instead of C++'s iostreams, partly because I personally think printf is much more ergonomic than iostreams, and partly because that makes this program also a valid C program with the same semantics. That may be useful, for instance, if the reason you are learning C++ is because you want to write MATLAB extensions, which, the last time I checked, was actually easier if you stuck to plain C.)
There are significant differences; for instance, the MATLAB functions accept vectors, whereas these C++ functions only take single values; if I'd wanted b to call c I would have had to swap them or write a "forward declaration" of c above b; and in C++, (with a few exceptions that you don't need to worry about right now,) all your code has to be inside one function or another. Learning these differences is part of learning C++, but you don't need to confuse yourself with templates and lambdas and classes and so on just yet. Stick to free functions with fixed type signatures at first.
Finally, I would be remiss if I didn't mention that in
static double c(double p, double q) { return a(p) * b(q); }
the calls to a and b might happen in either order. There is talk of changing this but it has not happened yet.
int a(const int x){return x * 2;}
int b(const int x){return x * 3;}
int fun1(const int x){return a(x);}
std::cout << fun1(1) << std::endl; //returns 2
This is basic compile-time composition. If you wanted runtime composition, things get a tad more involved.

putting a '&' after the type [duplicate]

This question already has answers here:
ampersand (&) at the end of variable etc
(5 answers)
Closed 3 years ago.
I am fairly new to programming. I am just moving on to C++ from C in my college courses, and I encountered something that I haven't seen before in C. Sometimes after the type, either in a function declaration or passing a parameter, a & immediately follows the type. For example, we use a struct called Customer in one of our projects, and some of the functions pass Customer&. Why is the ampersand after the type, as opposed to in front? Thanks!
References in C++ simply allow for a cleaner way to execute the following code:
int x = 16;
int* y = &x;
cout << *y;
Which could be written instead as
int x = 16;
int& y = x;
cout << y;
When defining functions, a reference allows a function to change the value of its arguments without requiring the user of the function to put an ampersand before everything. E.g.
#include <iostream>
using namespace std;
void func( int& a )
{
    a = 5;
}
int main()
{
    int A = 10;
    func( A );
    cout << A; // Will output '5'
}
Be careful with this type of mutation, as a programmer using functions like this without checking the implementation might not realize that the function is changing the value of the parameters unless the intent is obvious. init_server(my_server) would be an example of a case where it's obvious, but to_json(my_struct) would clearly be an example where you should not be using a reference to change the struct in any way.
But, one of the most important uses of references, would be function like
int sum_vector( const vector<int>& a ) {
int sum = 0;
for( int i = 0; i < a.size(); i++ ) {
sum += a[i];
}
return sum;
}
If you tried to make sum_vector take a vector by value, and you passed in a vector with 100 million entries, it would have to copy them all, taking forever. You could take a pointer instead, but then the internals of the function would have to constantly dereference it, and it would have to be called as sum_vector(&myvec), which is more annoying than sum_vector(myvec). By using a const reference, you prevent the highly inefficient copying of the whole vector into the function body while keeping the syntax neat. Using const lets you reassure yourself that you're not going to change the vector you were given, and it also assures the user of your function that you won't change it. Similarly, void to_json(const some_struct&) would be a better function definition, as it ensures you won't change the user's data.

What is the equivalent of CPython string concatenation, in C++? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Simple string concatenation
Yesterday, as I'm writing this, someone asked on SO
if i have a string x='wow'
applying the function add in python :
x='wow'
x.add(x)
'wowwow'
how can i do that in C++?
With add (which is non-existent) corrected to __add__ (a standard method), this is a deep and interesting question, involving subtle low-level details, high-level algorithmic complexity considerations, and even threading, yet it's formulated in a very short and concise way.
I am reposting
the original question
as my own because I did not get a chance to provide a correct
answer before it was deleted, and my efforts at having the original question revived, so
that I could help to increase the general understanding of these issues, failed.
I have changed the original title “select python or C++” to …
What is the equivalent of CPython string concatenation, in C++?
thereby narrowing the question a little.
The general meaning of the code snippet.
The given code snippet
x = 'wow'
x.__add__( x )
has different meanings in Python 2.x and Python 3.x.
In Python 2.x, strings are by default narrow strings, with one byte per encoding unit, corresponding to C++ char-based strings.
In Python 3.x, strings are wide strings, guaranteed to represent Unicode, corresponding in practice to the usage of C++ wchar_t-based strings, and likewise with an unspecified 2 or 4 bytes per encoding unit.
Disregarding efficiency the __add__ method behaves the same in both main
Python versions, corresponding to the C++ + operator for std::basic_string
(i.e., for std::string and std::wstring), e.g. quoting the CPython 3k
documentation:
object.__add__(self, other)
… to evaluate the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called.
So as an example, the CPython 2.7 code
x = 'wow'
y = x.__add__( x )
print y
would normally be written as
x = 'wow'
y = x + x
print y
and corresponds to this C++ code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
auto const x = string( "wow" );
auto const y = x + x;
cout << y << endl;
}
A main difference from the many incorrect answers given for
the original question,
is that the C++ correspondence is an expression, and not an update.
It is perhaps natural to think that the method name __add__ signifies a change
of the string object’s value, an update, but with respect to observable
behavior Python strings are immutable strings. Their values never change, as
far as can be directly observed in Python code. This is the same as in Java and
C#, but very unlike C++’s mutable std::basic_string strings.
The quadratic to linear time optimization in CPython.
CPython 2.4 added
the following
optimization, for narrow strings only:
String concatenations in statements of the form s = s + "abc" and s += "abc"
are now performed more efficiently in certain circumstances. This optimization
won't be present in other Python implementations such as Jython, so you shouldn't
rely on it; using the join() method of strings is still recommended when you
want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.)
It may not sound like much, but where it's applicable this optimization
reduces a sequence of concatenations from quadratic time O(n²)
to linear time O(n), in the length n of the final result.
First of all the optimization replaces concatenations with updates, e.g. as if
x = x + a
x = x + b
x = x + c
or for that matter
x = x + a + b + c
was replaced with
x += a
x += b
x += c
In the general case there will be many references to the string object that x
refers to, and since Python string objects must appear to be immutable, the first
update assignment cannot change that string object. It therefore, generally, has
to create a completely new string object, and assign that (reference) to x.
At this point x holds the only reference to that object. Which means that the
object can be updated by the update assignment that appends b, because there are
no observers. And likewise for the append of c.
It’s a bit like quantum mechanics: you cannot observe this dirty thing
going on, and it’s never done when there is the possibility of anyone observing
the machinations, but you can infer that it must have been going on by the statistics that
you collect about performance, for linear time is quite different from quadratic time!
How is linear time achieved? Well, with the update the same strategy of buffer
doubling as in C++ std::basic_string can be done, which means that the
existing buffer contents only need to be copied at each buffer reallocation,
and not for each append operation. Which means that the
total cost of copying is at worst linear in the final string size, in the
same way as the sum (representing the costs of copying at each buffer doubling)
1 + 2 + 4 + 8 + … + N is less than 2*N.
Linear time string concatenation expressions in C++.
In order to faithfully reproduce the CPython code snippet in C++,
both the final result and the expression nature of the operation should be
captured, and so should its performance characteristics!
A direct translation of CPython __add__ to C++ std::basic_string + fails
to reliably capture the CPython linear time. The C++ + string concatenation
may be optimized by the compiler in the same way as the CPython
optimization. Or not – which then means that one has told a beginner that
the C++ equivalent of a Python linear time operation, is something with quadratic
time – hey, this is what you should use…
For the performance characteristics, C++ += is the basic answer, but this does
not capture the expression nature of the Python code.
The natural answer is a linear time C++ string builder class that translates
a concatenation expression to a series of += updates, so that the Python code
from __future__ import print_function
def foo( s ):
    print( s )
a = 'alpha'
b = 'beta'
c = 'charlie'
foo( a + b + c ) # Expr-like linear time string building.
corresponds roughly to
#include <string>
#include <sstream>
namespace my {
using std::string;
using std::ostringstream;
template< class Type >
string stringFrom( Type const& v )
{
ostringstream stream;
stream << v;
return stream.str();
}
class StringBuilder
{
private:
string s_;
template< class Type >
static string fastStringFrom( Type const& v )
{
return stringFrom( v );
}
static string const& fastStringFrom( string const& s )
{ return s; }
static char const* fastStringFrom( char const* const s )
{ return s; }
public:
template< class Type >
StringBuilder& operator<<( Type const& v )
{
s_ += fastStringFrom( v );
return *this;
}
string const& str() const { return s_; }
char const* cStr() const { return s_.c_str(); }
operator string const& () const { return str(); }
operator char const* () const { return cStr(); }
};
} // namespace my
#include <iostream>
using namespace std;
typedef my::StringBuilder S;
void foo( string const& s )
{
cout << s << endl;
}
int main()
{
string const a = "alpha";
string const b = "beta";
string const c = "charlie";
foo( S() << a << b << c ); // Expr-like linear time string building.
}

Where is the virtual function call overhead?

I'm trying to benchmark the difference between a function pointer call and a virtual function call. To do this, I have written two pieces of code that do the same mathematical computation over an array. One variant uses an array of pointers to functions and calls those in a loop. The other variant uses an array of pointers to a base class and calls its virtual function, which is overridden in the derived classes to do exactly the same thing as the functions in the first variant. Then I print the time elapsed and use a simple shell script to run the benchmark many times and compute the average run time.
Here is the code:
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
using namespace std;
long long timespecDiff(struct timespec *timeA_p, struct timespec *timeB_p)
{
return ((long long)timeA_p->tv_sec * 1000000000LL + timeA_p->tv_nsec) -
((long long)timeB_p->tv_sec * 1000000000LL + timeB_p->tv_nsec);
}
void function_not( double *d ) {
*d = sin(*d);
}
void function_and( double *d ) {
*d = cos(*d);
}
void function_or( double *d ) {
*d = tan(*d);
}
void function_xor( double *d ) {
*d = sqrt(*d);
}
void ( * const function_table[4] )( double* ) = { &function_not, &function_and, &function_or, &function_xor };
int main(void)
{
srand(time(0));
void ( * index_array[100000] )( double * );
double array[100000];
for ( long int i = 0; i < 100000; ++i ) {
index_array[i] = function_table[ rand() % 4 ];
array[i] = ( double )( rand() / 1000 );
}
struct timespec start, end;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
for ( long int i = 0; i < 100000; ++i ) {
index_array[i]( &array[i] );
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
unsigned long long time_elapsed = timespecDiff(&end, &start);
cout << time_elapsed / 1000000000.0 << endl;
}
and here is the virtual function variant:
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
using namespace std;
long long timespecDiff(struct timespec *timeA_p, struct timespec *timeB_p)
{
return ((long long)timeA_p->tv_sec * 1000000000LL + timeA_p->tv_nsec) -
((long long)timeB_p->tv_sec * 1000000000LL + timeB_p->tv_nsec);
}
class A {
public:
virtual void calculate( double *i ) = 0;
};
class A1 : public A {
public:
void calculate( double *i ) {
*i = sin(*i);
}
};
class A2 : public A {
public:
void calculate( double *i ) {
*i = cos(*i);
}
};
class A3 : public A {
public:
void calculate( double *i ) {
*i = tan(*i);
}
};
class A4 : public A {
public:
void calculate( double *i ) {
*i = sqrt(*i);
}
};
int main(void)
{
srand(time(0));
A *base[100000];
double array[100000];
for ( long int i = 0; i < 100000; ++i ) {
array[i] = ( double )( rand() / 1000 );
switch ( rand() % 4 ) {
case 0:
base[i] = new A1();
break;
case 1:
base[i] = new A2();
break;
case 2:
base[i] = new A3();
break;
case 3:
base[i] = new A4();
break;
}
}
struct timespec start, end;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
for ( int i = 0; i < 100000; ++i ) {
base[i]->calculate( &array[i] );
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
unsigned long long time_elapsed = timespecDiff(&end, &start);
cout << time_elapsed / 1000000000.0 << endl;
}
My system is Linux, Fedora 13, gcc 4.4.2. The code is compiled with g++ -O3. The first one is test1, the second is test2.
Now I see this in console:
[Ignat#localhost circuit_testing]$ ./test1 && ./test1
0.0153142
0.0153166
Well, more or less the same, I think. And then, this:
[Ignat#localhost circuit_testing]$ ./test2 && ./test2
0.01531
0.0152476
Where is the 25% difference that should be visible? How can the first executable be even slower than the second one?
I'm asking this because I'm doing a project which involves calling a lot of small functions in a row like this in order to compute the values of an array, and the code I've inherited does a very complex manipulation to avoid the virtual function call overhead. Now where is this famous call overhead?
In both cases you are calling functions indirectly. In one case through your table of function pointers, and in the other through the compiler's array of function pointers (the vtable). Not surprisingly, two similar operations give you similar timing results.
Virtual functions may be slower than regular functions, but that's mostly because direct calls can be inlined and indirect calls can't. If you call a function through a function table, those calls can't be inlined either, and the lookup time is pretty much the same. Looking up through your own lookup table is of course going to cost the same as looking up through the compiler's lookup table.
Edit: Or even slower, because the compiler knows a lot more than you about things like processor cache and such.
I think you're seeing the difference, but it's just the function call overhead. Branch misprediction, memory access and the trig functions are the same in both cases. Compared to those, it's just not that big a deal, though the function pointer case was definitely a bit quicker when I tried it.
If this is representative of your larger program, this is a good demonstration that this type of microoptimization is sometimes just a drop in the ocean, and at worst futile. But leaving that aside, for a clearer test, the functions should perform some simpler operation, that is different for each function:
void function_not( double *d ) {
*d = 1.0;
}
void function_and( double *d ) {
*d = 2.0;
}
And so on, and similarly for the virtual functions.
(Each function should do something different, so that they don't get elided and all end up with the same address; that would make the branch prediction work unrealistically well.)
With these changes, the results are a bit different. Best of 4 runs in each case. (Not very scientific, but the numbers are broadly similar for larger numbers of runs.) All timings are in cycles, running on my laptop. Code was compiled with VC++ (only changed the timing) but gcc implements virtual function calls in the same way so the relative timings should be broadly similar even with different OS/x86 CPU/compiler.
Function pointers: 2,052,770
Virtuals: 3,598,039
That difference seems a bit excessive! Sure enough, the two bits of code aren't quite the same in terms of their memory access behaviour. The second one should have a table of four A *s, used to fill in base, rather than newing up a fresh object for each entry. Both examples will then have similar behaviour (1 cache miss/N entries) when fetching the pointer to jump through. For example:
A *tbl[4] = { new A1, new A2, new A3, new A4 };
for ( long int i = 0; i < 100000; ++i ) {
array[i] = ( double )( rand() / 1000 );
base[i] = tbl[ rand() % 4 ];
}
With this in place, still using the simplified functions:
Virtuals (as suggested here): 2,487,699
So there's 20%, best case. Close enough?
So perhaps your colleague was right to at least consider this, but I suspect that in any realistic program the call overhead won't be enough of a bottleneck to be worth jumping through hoops over.
Nowadays, on most systems, memory access is the primary bottleneck, not the CPU. In many cases, there is little significant difference between virtual and non-virtual functions - they usually represent a very small portion of execution time. (Sorry, I don't have published figures to back this up, just empirical data.)
If you want the best performance, you will get more bang for your buck by looking into how to parallelize the computation to take advantage of multiple cores/processing units, rather than worrying about micro-details of virtual vs non-virtual functions.
Many people fall into the habit of doing things just because they are thought to be "faster". It's all relative.
If I'm going to take a 100-mile drive from my home, I have to start by driving around the block. I can drive around the block to the right, or to the left. One of those will be "faster". But will it matter? Of course not.
In this case, the functions that you call are in turn calling math functions.
If you pause the program under the IDE or GDB, I suspect you will find that nearly every time you pause it it will be in those math library routines (or it should be!), and dereferencing an additional pointer to get there (assuming it doesn't bust a cache) should be lost in the noise.
Added: Here is a favorite video: Harry Porter's relay computer. As that thing laboriously clacks away adding numbers and stepping its program counter, I find it helpful to be mindful that that's what all computers are doing, just on a different scale of time and complexity. In your case, think about an algorithm to do sin, cos, tan, or sqrt. Inside, it is clacking away doing those things, and only incidentally following addresses or messing with a really slow memory to get there.
And finally, the function pointer approach has turned out to be the fastest one. Which was what I'd expected from the very beginning.

Uses of C comma operator [duplicate]

This question already has answers here:
What does the comma operator , do?
(8 answers)
Closed 8 years ago.
You see it used in for loop statements, but it's legal syntax anywhere. What uses have you found for it elsewhere, if any?
C language (as well as C++) is historically a mix of two completely different programming styles, which one can refer to as "statement programming" and "expression programming". As you know, every procedural programming language normally supports such fundamental constructs as sequencing and branching (see Structured Programming). These fundamental constructs are present in C/C++ languages in two forms: one for statement programming, another for expression programming.
For example, when you write your program in terms of statements, you might use a sequence of statements separated by ;. When you want to do some branching, you use if statements. You can also use cycles and other kinds of control transfer statements.
In expression programming the same constructs are available to you as well. This is actually where the , operator comes into play. The , operator is nothing other than a separator of sequential expressions in C; i.e., in expression programming the , operator serves the same role that ; does in statement programming. Branching in expression programming is done through the ?: operator and, alternatively, through the short-circuit evaluation properties of the && and || operators. (Expression programming has no cycles, though. To replace them with recursion you'd have to fall back on statement programming.)
For example, the following code
a = rand();
++a;
b = rand();
c = a + b / 2;
if (a < c - 5)
d = a;
else
d = b;
which is an example of traditional statement programming, can be re-written in terms of expression programming as
a = rand(), ++a, b = rand(), c = a + b / 2, a < c - 5 ? (d = a) : (d = b);
or as
a = rand(), ++a, b = rand(), c = a + b / 2, d = a < c - 5 ? a : b;
or
d = (a = rand(), ++a, b = rand(), c = a + b / 2, a < c - 5 ? a : b);
or
a = rand(), ++a, b = rand(), c = a + b / 2, (a < c - 5 && (d = a, 1)) || (d = b);
Needless to say, in practice statement programming usually produces much more readable C/C++ code, so we normally use expression programming in very well measured and restricted amounts. But in many cases it comes handy. And the line between what is acceptable and what is not is to a large degree a matter of personal preference and the ability to recognize and read established idioms.
As an additional note: the very design of the language is obviously tailored towards statements. Statements can freely invoke expressions, but expressions can't invoke statements (aside from calling pre-defined functions). This situation is changed in a rather interesting way in GCC compiler, which supports so called "statement expressions" as an extension (symmetrical to "expression statements" in standard C). "Statement expressions" allow user to directly insert statement-based code into expressions, just like they can insert expression-based code into statements in standard C.
As another additional note: in C++ language functor-based programming plays an important role, which can be seen as another form of "expression programming". According to the current trends in C++ design, it might be considered preferable over traditional statement programming in many situations.
I think C's comma operator is generally not good style to use, simply because it's so very, very easy to miss - either by someone else trying to read/understand/fix your code, or by you yourself a month down the line. Outside of variable declarations and for loops, of course, where it is idiomatic.
You can use it, for example, to pack multiple expressions into a ternary operator (?:), a la:
int x = some_bool ? (printf("WTF"), 5) : (fprintf(stderr, "No, really, WTF"), 117);
(the parentheses are needed because the comma operator binds more loosely than = and ?:)
but my gods, why?!? (I've seen it used in this way in real code, but don't have access to it to show unfortunately)
Two killer comma operator features in C++:
a) Read from stream until specific string is encountered (helps to keep the code DRY):
while (cin >> str, str != "STOP") {
//process str
}
b) Write complex code in constructor initializers:
class X : public A {
X() : A( (global_function(), global_result) ) {};
};
I've seen it used in macros where the macro is pretending to be a function and wants to return a value but needs to do some other work first. It's always ugly and often looks like a dangerous hack though.
Simplified example:
#define SomeMacro(A) ( DoWork(A), Permute(A) )
Here B=SomeMacro(A) "returns" the result of Permute(A) and assigns it to "B".
The Boost Assignment library is a good example of overloading the comma operator in a useful, readable way. For example:
using namespace boost::assign;
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;
I had to use a comma while debugging mutex locks, to print a message before the lock starts to wait.
I could not put the log message in the body of the derived lock's constructor, so I had to put it in the arguments of the base class constructor using : baseclass( ( log( "message" ) , actual_arg )) in the initialization list. Note the extra parentheses.
Here is an extract of the classes :
class NamedMutex : public boost::timed_mutex
{
public:
...
private:
std::string name_ ;
};
void log( NamedMutex & ref__ , std::string const& name__ )
{
LOG( name__ << " waits for " << ref__.name_ );
}
class NamedUniqueLock : public boost::unique_lock< NamedMutex >
{
public:
NamedUniqueLock(
NamedMutex & ref__ ,
std::string const& name__ ,
size_t const& nbmilliseconds )
:
boost::unique_lock< NamedMutex >( ( log( ref__ , name__ ) , ref__ ) ,
boost::get_system_time() + boost::posix_time::milliseconds( nbmilliseconds ) ),
ref_( ref__ ),
name_( name__ )
{
}
....
};
From the C standard:
The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. (A comma operator does not yield an lvalue.) If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined.
In short, it lets you specify more than one expression where C expects only one. But in practice it's mostly used in for loops.
Note that:
int a, b, c;
is NOT the comma operator, it's a list of declarators.
It is sometimes used in macros, such as debug macros like this:
#define malloc(size) (printf("malloc(%d)\n", (int)(size)), malloc((size)))
(But look at this horrible failure, by yours truly, for what can happen when you overdo it.)
But unless you really need it, or you are sure that it makes the code more readable and maintainable, I would recommend against using the comma operator.
You can overload it (since this question has a "C++" tag). I have seen some code where an overloaded comma was used for generating matrices, or vectors, I don't remember exactly. Isn't it pretty (although a little confusing):
MyVector foo;
foo = 2, 3, 4, 5, 6;
(It has to be an assignment, not a declaration: in MyVector foo = 2, 3, 4; the commas would be parsed as declarator separators, so the overloaded operator never kicks in.)
Outside of a for loop, and even there it can have an aroma of code smell, the only place I've seen a good use for the comma operator is as part of a delete:
delete p, p = 0;
The only value over the two-line alternative is that you can't accidentally copy/paste only half of the operation.
I also like it because if you do it out of habit, you'll never forget the zero assignment. (Of course, why p isn't inside somekind of auto_ptr, smart_ptr, shared_ptr, etc wrapper is a different question.)
Given @Nicolas Goy's citation from the standard, it sounds like you could write one-liner for loops like:
int a, b, c;
for(a = 0, b = 10, c = 0; c += 2*a+b, a <= b; a++, b--);
printf("%d", c);
But good God, man, do you really want to make your C code more obscure in this way?
It's very useful in adding some commentary into ASSERT macros:
ASSERT(("This value must be true.", x));
Since most assert style macros will output the entire text of their argument, this adds an extra bit of useful information into the assertion.
In general I avoid using the comma operator because it just makes code less readable. In almost all cases, it would be simpler and clearer to just make two statements. Like:
foo=bar*2, plugh=hoo+7;
offers no clear advantage over:
foo=bar*2;
plugh=hoo+7;
The one place besides loops where I have used it is in if/else constructs, like:
if (a==1)
... do something ...
else if (function_with_side_effects_including_setting_b(), b==2)
... do something that relies on the side effects ...
You could call the function before the if, but if the function takes a long time to run, you might want to avoid calling it when it's not necessary, and if the function should not be called unless a != 1, then that's not an option. The alternative is to nest the ifs an extra level. That's actually what I usually do, because the code above is a little cryptic. But I've done it the comma way now and then, because nesting is cryptic too.
I often use it to run a static initializer function in some .cpp files, to avoid lazy initialization problems with classic singletons:
void* s_static_pointer = 0;
void init() {
configureLib();
s_static_pointer = calculateFancyStuff(x,y,z);
regptr(s_static_pointer);
}
bool s_init = (init(), true); // just run init() before anything else
Foo::Foo() {
s_static_pointer->doStuff(); // works properly
}
For me the one really useful case with commas in C is using them to perform something conditionally.
if (something) dothis(), dothat(), x++;
this is equivalent to
if (something) { dothis(); dothat(); x++; }
This is not about "typing less"; it just looks very clear sometimes.
Also loops are just like that:
while(true) x++, y += 5;
Of course, both are only useful when the conditional part or the executable part of the loop is quite small, say two or three operations.
The only time I have ever seen the , operator used outside a for loop was to perform an assignment in a ternary expression. It was a long time ago, so I cannot remember the exact statement, but it was something like:
int ans = isRunning() ? total += 10, newAnswer(total) : 0;
Obviously no sane person would write code like this, but the author was an evil genius who constructed C statements based on the assembler code they generated, not readability. For instance, he sometimes used loops instead of if statements because he preferred the assembler they generated.
His code was very fast but unmaintainable, I am glad I don't have to work with it any more.
I've used it for a macro to "assign a value of any type to an output buffer pointed to by a char*, and then increment the pointer by the required number of bytes", like this:
#define ASSIGN_INCR(p, val, type) ((*(type *)(p) = (val)), (p) += sizeof(type))
Using the comma operator means the macro can be used in expressions or as statements as desired:
if (need_to_output_short)
ASSIGN_INCR(ptr, short_value, short);
latest_pos = ASSIGN_INCR(ptr, int_value, int);
send_buff(outbuff, (int)(ASSIGN_INCR(ptr, last_value, int) - outbuff));
It reduced some repetitive typing but you do have to be careful it doesn't get too unreadable.
Please see my overly-long version of this answer here.
It can be handy for "code golf":
Code Golf: Playing Cubes
The , in if(i>0)t=i,i=0; saves two characters.
qemu has some code that uses the comma operator within the conditional portion of a for loop (see QTAILQ_FOREACH_SAFE in qemu-queue.h). What they did boils down to the following:
#include <stdio.h>
int main( int argc, char* argv[] ){
int x = 0, y = 0;
for( x = 0; x < 3 && (y = x+1,1); x = y ){
printf( "%d, %d\n", x, y );
}
printf( "\n%d, %d\n\n", x, y );
for( x = 0, y = x+1; x < 3; x = y, y = x+1 ){
printf( "%d, %d\n", x, y );
}
printf( "\n%d, %d\n", x, y );
return 0;
}
... with the following output:
0, 1
1, 2
2, 3
3, 3
0, 1
1, 2
2, 3
3, 4
The first version of this loop has the following effects:
It avoids doing two assignments, so the chances of the code getting out of sync are reduced
Since it uses &&, the assignment is not evaluated after the last iteration
Since the assignment isn't evaluated, it won't try to de-reference the next element in the queue when it's at the end (in qemu's code, not the code above).
Inside the loop, you have access to the current and next element
Found it in array initialization:
In C what exactly happens if i use () to initialize a double dimension array instead of the {}?
When I initialize an array a[][]:
int a[2][5]={(8,9,7,67,11),(7,8,9,199,89)};
and then display the array elements.
I get:
11 89 0 0 0
0 0 0 0 0