What is the equivalent of CPython string concatenation, in C++? [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Simple string concatenation
Yesterday, as I'm writing this, someone asked on SO
if i have a string x='wow'
applying the function add in python :
x='wow'
x.add(x)
'wowwow'
how can i do that in C++?
With add (which is non-existent) corrected to __add__ (a standard
method) this is a deep and interesting question, involving both subtle low
level details, high level algorithm complexity considerations, and
even threading!, and yet it’s formulated in a very short and concise way.
I am reposting
the original question
as my own because I did not get a chance to provide a correct
answer before it was deleted, and my efforts at having the original question revived, so
that I could help to increase the general understanding of these issues, failed.
I have changed the original title “select python or C++” to …
What is the equivalent of CPython string concatenation, in C++?
thereby narrowing the question a little.

The general meaning of the code snippet.
The given code snippet
x = 'wow'
x.__add__( x )
has different meanings in Python 2.x and Python 3.x.
In Python 2.x the strings are by default narrow strings, with one byte per encoding unit,
corresponding to C++ char based strings.
In Python 3.x the strings are wide strings, guaranteed to represent Unicode,
corresponding to the in practice usage of C++
wchar_t based strings, and likewise with an unspecified 2 or 4 bytes
per encoding unit.
Disregarding efficiency the __add__ method behaves the same in both main
Python versions, corresponding to the C++ + operator for std::basic_string
(i.e., for std::string and std::wstring), e.g. quoting the CPython 3k
documentation:
object.__add__(self, other)
… to evaluate the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called.
So as an example, the CPython 2.7 code
x = 'wow'
y = x.__add__( x )
print y
would normally be written as
x = 'wow'
y = x + x
print y
and corresponds to this C++ code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
auto const x = string( "wow" );
auto const y = x + x;
cout << y << endl;
}
A main difference from the many incorrect answers given for
the original question,
is that the C++ correspondence is an expression, and not an update.
It is perhaps natural to think that the method name __add__ signifies a change
of the string object’s value, an update, but with respect to observable
behavior Python strings are immutable strings. Their values never change, as
far as can be directly observed in Python code. This is the same as in Java and
C#, but very unlike C++’s mutable std::basic_string strings.
The quadratic to linear time optimization in CPython.
CPython 2.4 added
the following
optimization, for narrow strings only:
String concatenations in statements of the form s = s + "abc" and s += "abc"
are now performed more efficiently in certain circumstances. This optimization
won't be present in other Python implementations such as Jython, so you shouldn't
rely on it; using the join() method of strings is still recommended when you
want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.)
It may not sound like much, but where it’s applicable this optimization
reduces a sequence of concatenations from quadratic time O(n2)
to linear time O(n), in the length n of the final result.
First of all the optimization replaces concatenations with updates, e.g. as if
x = x + a
x = x + b
x = x + c
or for that matter
x = x + a + b + c
was replaced with
x += a
x += b
x += c
In the general case there will be many references to the string object that x
refers to, and since Python string objects must appear to be immutable, the first
update assignment cannot change that string object. It therefore, generally, has
to create a completely new string object, and assign that (reference) to x.
At this point x holds the only reference to that object. Which means that the
object can be updated by the update assignment that appends b, because there are
no observers. And likewise for the append of c.
It’s a bit like quantum mechanics: you cannot observe this dirty thing
going on, and it’s never done when there is the possibility of anyone observing
the machinations, but you can infer that it must have been going on by the statistics that
you collect about performance, for linear time is quite different from quadratic time!
How is linear time achieved? Well, with the update the same strategy of buffer
doubling as in C++ std::basic_string can be done, which means that the
existing buffer contents only need to be copied at each buffer reallocation,
and not for each append operation. Which means that the
total cost of copying is at worst linear in the final string size, in the
same way as the sum (representing the costs of copying at each buffer doubling)
1 + 2 + 4 + 8 + … + N is less than 2*N.
Linear time string concatenation expressions in C++.
In order to faithfully reproduce the CPython code snippet in C++,
the final result and expression-nature of the operation be should captured,
and also its performance characteristics should be captured!
A direct translation of CPython __add__ to C++ std::basic_string + fails
to reliably capture the CPython linear time. The C++ + string concatenation
may be optimized by the compiler in the same way as the CPython
optimization. Or not – which then means that one has told a beginner that
the C++ equivalent of a Python linear time operation, is something with quadratic
time – hey, this is what you should use…
For the performance characteristics C++ += is the basic answer, but, this does
not catch the expression nature of the Python code.
The natural answer is a linear time C++ string builder class that translates
a concatenation expression to a series of += updates, so that the Python code
from __future__ import print_function
def foo( s ):
print( s )
a = 'alpha'
b = 'beta'
c = 'charlie'
foo( a + b + c ) # Expr-like linear time string building.
correspond to roughly
#include <string>
#include <sstream>
namespace my {
using std::string;
using std::ostringstream;
template< class Type >
string stringFrom( Type const& v )
{
ostringstream stream;
stream << v;
return stream.str();
}
class StringBuilder
{
private:
string s_;
template< class Type >
static string fastStringFrom( Type const& v )
{
return stringFrom( v );
}
static string const& fastStringFrom( string const& s )
{ return s; }
static char const* fastStringFrom( char const* const s )
{ return s; }
public:
template< class Type >
StringBuilder& operator<<( Type const& v )
{
s_ += fastStringFrom( v );
return *this;
}
string const& str() const { return s_; }
char const* cStr() const { return s_.c_str(); }
operator string const& () const { return str(); }
operator char const* () const { return cStr(); }
};
} // namespace my
#include <iostream>
using namespace std;
typedef my::StringBuilder S;
void foo( string const& s )
{
cout << s << endl;
}
int main()
{
string const a = "alpha";
string const b = "beta";
string const c = "charlie";
foo( S() << a << b << c ); // Expr-like linear time string building.
}

Related

Can I use a std::string variable as an argument to an operator?

For example, if I have:
#include <iostream>
using namespace std;
int main()
{
int b = 1;
int c = 2;
string a = "(b + c)";
cout << (4 * a) << "\n";
return 0;
}
Is it possible to have string a interpreted literally as if the code had said cout << (4 * (b + c)) << "\n";?
No. C++ is not (designed to be) an interpreted language.
Although it's theoretically possible to override operators for different types, it's very non-recommended to override operators that don't involve your own types. Defining an operator overload for std::string from the standard library may break in future. It's just a bad idea.
But, let's say you used a custom type. You could write a program that interprets the text string as a arithmetic expression. The core of such program would be a parser. It would be certainly possible, but the standard library doesn't provide such parser for you.
Furthermore, such parser wouldn't be able to make connection between the "b" in the text, and the variable b. At the point when the program is running, it has no knowledge of variable names that had been used to compile the program. You would have to specify such information using some form of data structure.
P.S. You forgot to include the header that defines std::string.
#include<string>
///...
auto a=std::to_string(b+c);
std::cout<<4*std::stoi(a)<<std::endl;
You can just use std::string and std::stoi.

creating a literal for a Point - C++

I am trying to do a simple library where the object is a point on the xy-axis.
I want to be able to use literals like this:
Point a = (3,4);
where (3,4) is a point literal.
I read about user defined literals, but (as I understood) this seems to be impossible.
May be "(3,4)"_P is possible as I understand it.
However, I found on this page interesting use of user defined literals as follows:
#include <iostream>
#include <complex>
int main()
{
using namespace std::complex_literals;
std::complex<double> c = 1.0 + 1i;
std::cout << "abs" << c << " = " << abs(c) << '\n';
}
I can under stand the part 1i as a user defined literal, but not the whole thing 1.0 + 1i.
What I am missing, and what is the nearest possible way of getting a literal similar to (x,y) without using ".
As Some programmer dude shows, the best way is to use uniform initialization.
However, just for the fun of it, you can (sort of) do this with User Defined Literals. My idea is to to have 2 literals for each coordinate and overload operator+ between them to create the point.
Remember, this is just for fun, don't use this in a real code:
struct Px { int x; };
struct Py { int y; };
struct Point {
int x;
int y;
};
constexpr auto operator""_px(unsigned long long x) -> Px { return Px{(int)x}; }
constexpr auto operator""_py(unsigned long long y) -> Py { return Py{(int)y}; }
constexpr auto operator+(Px x, Py y) -> Point { return Point{x.x, y.y}; }
then you can have:
auto p = 3_px + 4_py; // p is deduced to type `Point`
Of course this is just a rough framework. Read this great article to learn more about UDLs. You would need to deal with the narrowing conversion in a better way and propper use namespaces to make it a better solution.
As a bonus, you could also use operator, to create a syntax more appropriate to what you had in mind. But, don't do this, as overloading operator, is just evil:
auto operator,(Px x, Py y) -> Point { return Point{x.x, y.y}; }
auto p = (2_px, 1_py); // p is deduced to type `Point`
You can't make up literals on your own, only create suffixes for literals. Like the shown 1i or the standard language f as in 1.0f. (See e.g. this user-defined literal reference for more information.)
What you can to is to use uniform initialization doing something like
Point a = { 3, 4 }; // note the use of curly-braces
Depending on what Point is you might need to add a suitable constructor to make it work.
You have 3 options
Point p = { 1,2 };
Point p2{ 1,2 };
Point p3(1,2);

Composite function in C++

I am a beginner in C++ and want to do simple example of composite function.
For example, in MATLAB, I can write
a = #(x) 2*x
b = #(y) 3*y
a(b(1))
Answer is 6
I searched following questions.
function composition in C++ / C++11 and
Function Composition in C++
But they are created using advanced features, such as templates, to which I am not much familiar at this time. Is there a simple and more direct way to achieve this? In above MATLAB code, user does not need to know implementation of function handles. User can just use proper syntax to get result. Is there such way in C++?
** Another Edit:**
In above code, I am putting a value at the end. However, if I want to pass the result to a third function, MATLAB can still consider it as a function. But, how to do this in C++?
For example, in addition to above code, consider this code:
c = #(p,q) a(p)* b(q) %This results a function
c(1,2)
answer=12
d = #(r) a(b(r))
d(1)
answer=6
function [ output1 ] = f1( arg1 )
val = 2.0;
output1 = feval(arg1,val)
end
f1(d)
answer = 12
In this code, c takes two functions as input and d is composite function. In the next example, function f1 takes a function as argument and use MATLAB builtin function feval to evaluate the function at val.
How can I achieve this in C++?
How about:
#include <iostream>
int main(int, char**)
{
auto a = [](int x) { return 2 * x; };
auto b = [](int y) { return 3 * y; };
for (int i = 0; i < 5; ++i)
std::cout << i << " -> " << a(b(i)) << std::endl;
return 0;
}
Perhaps I'm misunderstanding your question, but it sounds easy:
int a(const int x) { return x * 2; }
int b(const int y) { return y * 3; }
std::cout << a(b(1)) << std::endl;
Regarding your latest edit, you can make a function return a result of another function:
int fun1(const int c) { return a(c); }
std::cout << fun1(1) << std::endl;
Note that this returns a number, the result of calling a, not the function a itself. Sure, you can return a pointer to that function, but then the syntax would be different: you'd have to write something like fun1()(1), which is rather ugly and complicated.
C++'s evaluation strategy for function arguments is always "eager" and usually "by value". The short version of what that means is, a composed function call sequence such as
x = a(b(c(1)));
is exactly the same as
{
auto t0 = c(1);
auto t1 = b(t0);
x = a(t1);
}
(auto t0 means "give t0 whatever type is most appropriate"; it is a relatively new feature and may not work in your C++ compiler. The curly braces indicate that the temporary variables t0 and t1 are destroyed after the assignment to x.)
I bring this up because you keep talking about functions "taking functions as input". There are programming languages, such as R, where writing a(b(1)) would pass the expression b(1) to a, and only actually call b when a asked for the expression to be evaluated. I thought MATLAB was not like that, but I could be wrong. Regardless, C++ is definitely not like that. In C++, a(b(1)) first evaluates b(1) and then passes the result of that evaluation to a; a has no way of finding out that the result came from a call to b. The only case in C++ that is correctly described as "a function taking another function as input" would correspond to your example using feval.
Now: The most direct translation of the MATLAB code you've shown is
#include <stdio.h>
static double a(double x) { return 2*x; }
static double b(double y) { return 3*y; }
static double c(double p, double q) { return a(p) * b(q); }
static double d(double r) { return a(b(r)); }
static double f1(double (*arg1)(double))
{ return arg1(2.0); }
int main()
{
printf("%g\n", a(b(1))); // prints 6
printf("%g\n", c(1,2)); // prints 12
printf("%g\n", d(1)); // prints 6
printf("%g\n", f1(d)); // prints 12
printf("%g\n", f1(a)); // prints 4
return 0;
}
(C++ has no need for explicit syntax like feval, because the typed parameter declaration, double (*arg1)(double) tells the compiler that arg1(2.0) is valid. In older code you may see (*arg1)(2.0) but that's not required, and I think it makes the code less readable.)
(I have used printf in this code, instead of C++'s iostreams, partly because I personally think printf is much more ergonomic than iostreams, and partly because that makes this program also a valid C program with the same semantics. That may be useful, for instance, if the reason you are learning C++ is because you want to write MATLAB extensions, which, the last time I checked, was actually easier if you stuck to plain C.)
There are significant differences; for instance, the MATLAB functions accept vectors, whereas these C++ functions only take single values; if I'd wanted b to call c I would have had to swap them or write a "forward declaration" of c above b; and in C++, (with a few exceptions that you don't need to worry about right now,) all your code has to be inside one function or another. Learning these differences is part of learning C++, but you don't need to confuse yourself with templates and lambdas and classes and so on just yet. Stick to free functions with fixed type signatures at first.
Finally, I would be remiss if I didn't mention that in
static double c(double p, double q) { return a(p) * b(q); }
the calls to a and b might happen in either order. There is talk of changing this but it has not happened yet.
int a(const int x){return x * 2;}
int b(const int x){return x * 3;}
int fun1(const int x){return a(x);}
std::cout << fun1(1) << std::endl; //returns 2
This is basic compile-time composition. If you wanted runtime composition, things get a tad more involved.

Using CRC32 algorithm to hash string at compile-time

Basically I want in my code to be able to do this:
Engine.getById(WSID('some-id'));
Which should get transformed by
Engine.getById('1a61bc96');
just before being compiled into asm. So at compile-time.
This is my try
constexpr int WSID(const char* str) {
boost::crc_32_type result;
result.process_bytes(str,sizeof(str));
return result.checksum();
}
But I get this when trying to compile with MSVC 18 (CTP November 2013)
error C3249: illegal statement or sub-expression for 'constexpr' function
How can I get the WSID function, using this way or any, as long as it is done during compile time?
Tried this: Compile time string hashing
warning C4592: 'crc32': 'constexpr' call evaluation failed; function will be called at run-time
EDIT:
I first heard about this technique in Game Engine Architecture by Jason Gregory. I contacted the author who obligingly answer to me this :
What we do is to pass our source code through a custom little pre-processor that searches for text of the form SID('xxxxxx') and converts whatever is between the single quotes into its hashed equivalent as a hex literal (0xNNNNNNNN). [...]
You could conceivably do it via a macro and/or some template metaprogramming, too, although as you say it's tricky to get the compiler to do this kind of work for you. It's not impossible, but writing a custom tool is easier and much more flexible. [...]
Note also that we chose single quotes for SID('xxxx') literals. This was done so that we'd get some reasonable syntax highlighting in our code editors, yet if something went wrong and some un-preprocessed code ever made it thru to the compiler, it would throw a syntax error because single quotes are normally reserved for single-character literals.
Note also that it's crucial to have your little pre-processing tool cache the strings in a database of some sort, so that the original strings can be looked up given the hash code. When you are debugging your code and you inspect a StringId variable, the debugger will normally show you the rather unintelligible hash code. But with a SID database, you can write a plug-in that converts these hash codes back to their string equivalents. That way, you'll see SID('foo') in your watch window, not 0x75AE3080 [...]. Also, the game should be able to load this same database, so that it can print strings instead of hex hash codes on the screen for debugging purposes [...].
But while preprocess has some main advantages, it means that I have to prepare some kind of output system of modified files (those will be stored elsewhere, and then we need to tell MSVC). So it might complicate the compiling task. Is there a way to preprocess file with python for instance without headaches? But this is not the question, and I'm still interested about using compile-time function (about cache I could use an ID index)
Here is a solution that works entirely at compile time, but may also be used at runtime. It is a mix of constexpr, templates and macros. You may want to change some of the names or put them in a separate file since they are quite short.
Note that I reused code from this answer for the CRC table generation and I based myself off of code from this page for the implementation.
I have not tested it on MSVC since I don't currently have it installed in my Windows VM, but I believe it should work, or at least be made to work with trivial changes.
Here is the code, you may use the crc32 function directly, or the WSID function that more closely matches your question :
#include <cstring>
#include <cstdint>
#include <iostream>
// Generate CRC lookup table
template <unsigned c, int k = 8>
struct f : f<((c & 1) ? 0xedb88320 : 0) ^ (c >> 1), k - 1> {};
template <unsigned c> struct f<c, 0>{enum {value = c};};
#define A(x) B(x) B(x + 128)
#define B(x) C(x) C(x + 64)
#define C(x) D(x) D(x + 32)
#define D(x) E(x) E(x + 16)
#define E(x) F(x) F(x + 8)
#define F(x) G(x) G(x + 4)
#define G(x) H(x) H(x + 2)
#define H(x) I(x) I(x + 1)
#define I(x) f<x>::value ,
constexpr unsigned crc_table[] = { A(0) };
// Constexpr implementation and helpers
constexpr uint32_t crc32_impl(const uint8_t* p, size_t len, uint32_t crc) {
return len ?
crc32_impl(p+1,len-1,(crc>>8)^crc_table[(crc&0xFF)^*p])
: crc;
}
constexpr uint32_t crc32(const uint8_t* data, size_t length) {
return ~crc32_impl(data, length, ~0);
}
constexpr size_t strlen_c(const char* str) {
return *str ? 1+strlen_c(str+1) : 0;
}
constexpr int WSID(const char* str) {
return crc32((uint8_t*)str, strlen_c(str));
}
// Example usage
using namespace std;
int main() {
cout << "The CRC32 is: " << hex << WSID("some-id") << endl;
}
The first part takes care of generating the table of constants, while crc32_impl is a standard CRC32 implementation converted to a recursive style that works with a C++11 constexpr.
Then crc32 and WSID are just simple wrappers for convenience.
If anyone is interested, I coded up a CRC-32 table generator function and code generator function using C++14 style constexpr functions. The result is, in my opinion, much more maintainable code than many other attempts I have seen on the internet and it stays far, far away from the preprocessor.
Now, it does use a custom std::array 'clone' called cexp::array, because G++ seems to not have not added the constexpr keyword to their non-const reference index access/write operator.
However, it is quite light-weight, and hopefully the keyword will be added to std::array in the close future. But for now, the very simple array implementation is as follows:
namespace cexp
{
// Small implementation of std::array, needed until constexpr
// is added to the function 'reference operator[](size_type)'
template <typename T, std::size_t N>
struct array {
T m_data[N];
using value_type = T;
using reference = value_type &;
using const_reference = const value_type &;
using size_type = std::size_t;
// This is NOT constexpr in std::array until C++17
constexpr reference operator[](size_type i) noexcept {
return m_data[i];
}
constexpr const_reference operator[](size_type i) const noexcept {
return m_data[i];
}
constexpr size_type size() const noexcept {
return N;
}
};
}
Now, we need to generate the CRC-32 table. I based the algorithm off some Hacker's Delight code, and it can probably be extended to support the many other CRC algorithms out there. But alas, I only required the standard implementation, so here it is:
// Generates CRC-32 table, algorithm based from this link:
// http://www.hackersdelight.org/hdcodetxt/crc.c.txt
constexpr auto gen_crc32_table() {
constexpr auto num_bytes = 256;
constexpr auto num_iterations = 8;
constexpr auto polynomial = 0xEDB88320;
auto crc32_table = cexp::array<uint32_t, num_bytes>{};
for (auto byte = 0u; byte < num_bytes; ++byte) {
auto crc = byte;
for (auto i = 0; i < num_iterations; ++i) {
auto mask = -(crc & 1);
crc = (crc >> 1) ^ (polynomial & mask);
}
crc32_table[byte] = crc;
}
return crc32_table;
}
Next, we store the table in a global and perform rudimentary static checking on it. This checking could most likely be improved, and it is not necessary to store it in a global.
// Stores CRC-32 table and softly validates it.
static constexpr auto crc32_table = gen_crc32_table();
static_assert(
crc32_table.size() == 256 &&
crc32_table[1] == 0x77073096 &&
crc32_table[255] == 0x2D02EF8D,
"gen_crc32_table generated unexpected result."
);
Now that the table is generated, it's time to generate the CRC-32 codes. I again based the algorithm off the Hacker's Delight link, and at the moment it only supports input from a c-string.
// Generates CRC-32 code from null-terminated, c-string,
// algorithm based from this link:
// http://www.hackersdelight.org/hdcodetxt/crc.c.txt
constexpr auto crc32(const char *in) {
auto crc = 0xFFFFFFFFu;
for (auto i = 0u; auto c = in[i]; ++i) {
crc = crc32_table[(crc ^ c) & 0xFF] ^ (crc >> 8);
}
return ~crc;
}
For sake of completion, I generate one CRC-32 code below and statically check if it has the expected output, and then print it to the output stream.
int main() {
constexpr auto crc_code = crc32("some-id");
static_assert(crc_code == 0x1A61BC96, "crc32 generated unexpected result.");
std::cout << std::hex << crc_code << std::endl;
}
Hopefully this helps anyone else that was looking to achieve compile time generation of CRC-32, or even in general.
#tux3's answer is pretty slick! Hard to maintain, though, because you are basically writing your own implementation of CRC32 in preprocessor commands.
Another way to solve your question is to go back and understand the need for the requirement first. If I understand you right, the concern seems to be performance. In that case, there is a second point of time you can call your function without performance impact: at program load time. In that case, you would be accessing a global variable instead of passing a constant. Performance-wise, after initialization both should be identical (a const fetches 32 bits from your code, a global variable fetches 32 bits from a regular memory location).
You could do something like this:
static int myWSID = 0;
// don't call this directly
static int WSID(const char* str) {
boost::crc_32_type result;
result.process_bytes(str,sizeof(str));
return result.checksum();
}
// Put this early into your program into the
// initialization code.
...
myWSID = WSID('some-id');
Depending on your overall program, you may want to have an inline accessor to retrieve the value.
If a minor performance impact is acceptable, you would also write your function like this, basically using the singleton pattern.
// don't call this directly
int WSID(const char* str) {
boost::crc_32_type result;
result.process_bytes(str,sizeof(str));
return result.checksum();
}
// call this instead. Note the hard-coded ID string.
// Create one such function for each ID you need to
// have available.
static int myWSID() {
// Note: not thread safe!
static int computedId = 0;
if (computedId == 0)
computedId = WSID('some-id');
return computedId;
}
Of course, if the reason for asking for compile-time evaluation is something different (such as, not wanting some-id to appear in the compiled code), these techniques won't help.
The other option is to use Jason Gregory's suggestion of a custom preprocessor. It can be done fairly cleanly if you collect all the IDS into a separate file. This file doesn't need to have C syntax. I'd give it an extension such as .wsid. The custom preprocessor generates a .H file from it.
Here is how this could look:
idcollection.wsid (before custom preprocessor):
some_id1
some_id2
some_id3
Your preprocessor would generate the following idcollection.h:
#define WSID_some_id1 0xabcdef12
#define WSID_some_id2 0xbcdef123
#define WSID_some_id3 0xcdef1234
And in your code, you'd call
Engine.getById(WSID_some_id1);
A few notes about this:
This assumes that all the original IDs can be converted into valid identifiers. If they contain special characters, your preprocessor may need to do additional munging.
I notice a mismatch in your original question. Your function returns an int, but Engine.getById seems to take a string. My proposed code would always use int (easy to change if you want always string).

Uses of C comma operator [duplicate]

This question already has answers here:
What does the comma operator , do?
(8 answers)
Closed 8 years ago.
You see it used in for loop statements, but it's legal syntax anywhere. What uses have you found for it elsewhere, if any?
C language (as well as C++) is historically a mix of two completely different programming styles, which one can refer to as "statement programming" and "expression programming". As you know, every procedural programming language normally supports such fundamental constructs as sequencing and branching (see Structured Programming). These fundamental constructs are present in C/C++ languages in two forms: one for statement programming, another for expression programming.
For example, when you write your program in terms of statements, you might use a sequence of statements separated by ;. When you want to do some branching, you use if statements. You can also use cycles and other kinds of control transfer statements.
In expression programming the same constructs are available to you as well. This is actually where , operator comes into play. Operator , is nothing else than a separator of sequential expressions in C, i.e. operator , in expression programming serves the same role as ; does in statement programming. Branching in expression programming is done through ?: operator and, alternatively, through short-circuit evaluation properties of && and || operators. (Expression programming has no cycles though. And to replace them with recursion you'd have to apply statement programming.)
For example, the following code
a = rand();
++a;
b = rand();
c = a + b / 2;
if (a < c - 5)
d = a;
else
d = b;
which is an example of traditional statement programming, can be re-written in terms of expression programming as
a = rand(), ++a, b = rand(), c = a + b / 2, a < c - 5 ? d = a : d = b;
or as
a = rand(), ++a, b = rand(), c = a + b / 2, d = a < c - 5 ? a : b;
or
d = (a = rand(), ++a, b = rand(), c = a + b / 2, a < c - 5 ? a : b);
or
a = rand(), ++a, b = rand(), c = a + b / 2, (a < c - 5 && (d = a, 1)) || (d = b);
Needless to say, in practice statement programming usually produces much more readable C/C++ code, so we normally use expression programming in very well measured and restricted amounts. But in many cases it comes handy. And the line between what is acceptable and what is not is to a large degree a matter of personal preference and the ability to recognize and read established idioms.
As an additional note: the very design of the language is obviously tailored towards statements. Statements can freely invoke expressions, but expressions can't invoke statements (aside from calling pre-defined functions). This situation is changed in a rather interesting way in GCC compiler, which supports so called "statement expressions" as an extension (symmetrical to "expression statements" in standard C). "Statement expressions" allow user to directly insert statement-based code into expressions, just like they can insert expression-based code into statements in standard C.
As another additional note: in C++ language functor-based programming plays an important role, which can be seen as another form of "expression programming". According to the current trends in C++ design, it might be considered preferable over traditional statement programming in many situations.
I think generally C's comma is not a good style to use simply because it's so very very easy to miss - either by someone else trying to read/understand/fix your code, or you yourself a month down the line. Outside of variable declarations and for loops, of course, where it is idiomatic.
You can use it, for example, to pack multiple statements into a ternary operator (?:), ala:
int x = some_bool ? printf("WTF"), 5 : fprintf(stderr, "No, really, WTF"), 117;
but my gods, why?!? (I've seen it used in this way in real code, but don't have access to it to show unfortunately)
Two killer comma operator features in C++:
a) Read from stream until specific string is encountered (helps to keep the code DRY):
while (cin >> str, str != "STOP") {
//process str
}
b) Write complex code in constructor initializers:
class X : public A {
X() : A( (global_function(), global_result) ) {};
};
I've seen it used in macros where the macro is pretending to be a function and wants to return a value but needs to do some other work first. It's always ugly and often looks like a dangerous hack though.
Simplified example:
#define SomeMacro(A) ( DoWork(A), Permute(A) )
Here B=SomeMacro(A) "returns" the result of Permute(A) and assigns it to "B".
The Boost Assignment library is a good example of overloading the comma operator in a useful, readable way. For example:
using namespace boost::assign;
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;
I had to use a comma to debug mutex locks to put a message before the lock starts to wait.
I could not but the log message in the body of the derived lock constructor, so I had to put it in the arguments of the base class constructor using : baseclass( ( log( "message" ) , actual_arg )) in the initialization list. Note the extra parenthesis.
Here is an extract of the classes :
class NamedMutex : public boost::timed_mutex
{
public:
...
private:
std::string name_ ;
};
void log( NamedMutex & ref__ , std::string const& name__ )
{
LOG( name__ << " waits for " << ref__.name_ );
}
class NamedUniqueLock : public boost::unique_lock< NamedMutex >
{
public:
NamedUniqueLock::NamedUniqueLock(
NamedMutex & ref__ ,
std::string const& name__ ,
size_t const& nbmilliseconds )
:
boost::unique_lock< NamedMutex >( ( log( ref__ , name__ ) , ref__ ) ,
boost::get_system_time() + boost::posix_time::milliseconds( nbmilliseconds ) ),
ref_( ref__ ),
name_( name__ )
{
}
....
};
From the C standard:
The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. (A comma operator does not yield an lvalue.)) If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined.
In short it let you specify more than one expression where C expects only one. But in practice it's mostly used in for loops.
Note that:
int a, b, c;
is NOT the comma operator, it's a list of declarators.
It is sometimes used in macros, such as debug macros like this:
#define malloc(size) (printf("malloc(%d)\n", (int)(size)), malloc((size)))
(But look at this horrible failure, by yours truly, for what can happen when you overdo it.)
But unless you really need it, or you are sure that it makes the code more readable and maintainable, I would recommend against using the comma operator.
You can overload it (as long as this question has a "C++" tag). I have seen some code, where overloaded comma was used for generating matrices. Or vectors, I don't remember exactly. Isn't it pretty (although a little confusing):
MyVector foo = 2, 3, 4, 5, 6;
Outside of a for loop, and even there is has can have an aroma of code smell, the only place I've seen as a good use for the comma operator is as part of a delete:
delete p, p = 0;
The only value over the alternative is you can accidently copy/paste only half of this operation if it is on two lines.
I also like it because if you do it out of habit, you'll never forget the zero assignment. (Of course, why p isn't inside somekind of auto_ptr, smart_ptr, shared_ptr, etc wrapper is a different question.)
Given #Nicolas Goy's citation from the standard, then it sounds like you could write one-liner for loops like:
int a, b, c;
for(a = 0, b = 10; c += 2*a+b, a <= b; a++, b--);
printf("%d", c);
But good God, man, do you really want to make your C code more obscure in this way?
It's very useful in adding some commentary into ASSERT macros:
ASSERT(("This value must be true.", x));
Since most assert style macros will output the entire text of their argument, this adds an extra bit of useful information into the assertion.
In general I avoid using the comma operator because it just makes code less readable. In almost all cases, it would be simpler and clearer to just make two statements. Like:
foo=bar*2, plugh=hoo+7;
offers no clear advantage over:
foo=bar*2;
plugh=hoo+7;
The one place besides loops where I have used it it in if/else constructs, like:
if (a==1)
... do something ...
else if (function_with_side_effects_including_setting_b(), b==2)
... do something that relies on the side effects ...
You could put the function before the IF, but if the function takes a long time to run, you might want to avoid doing it if it's not necessary, and if the function should not be done unless a!=1, then that's not an option. The alternative is to nest the IF's an extra layer. That's actually what I usually do because the above code is a little cryptic. But I've done it the comma way now and then because nesting is also cryptic.
I often use it to run a static initializer function in some cpp files, to avoid lazy initalization problems with classic singletons:
void* s_static_pointer = 0;
void init() {
configureLib();
s_static_pointer = calculateFancyStuff(x,y,z);
regptr(s_static_pointer);
}
bool s_init = init(), true; // just run init() before anything else
Foo::Foo() {
s_static_pointer->doStuff(); // works properly
}
For me the one really useful case with commas in C is using them to perform something conditionally.
if (something) dothis(), dothat(), x++;
this is equivalent to
if (something) { dothis(); dothat(); x++; }
This is not about "typing less", it's just looks very clear sometimes.
Also loops are just like that:
while(true) x++, y += 5;
Of course both can only be useful when the conditional part or executable part of the loop is quite small, two-three operations.
The only time I have ever seen the , operator used outside a for loop was to perform an assingment in a ternary statement. It was a long time ago so I cannot remeber the exact statement but it was something like:
int ans = isRunning() ? total += 10, newAnswer(total) : 0;
Obviously no sane person would write code like this, but the author was an evil genius who construct c statements based on the assembler code they generated, not readability. For instance he sometimes used loops instead of if statements because he preferred the assembler it generated.
His code was very fast but unmaintainable, I am glad I don't have to work with it any more.
I've used it for a macro to "assign a value of any type to an output buffer pointed to by a char*, and then increment the pointer by the required number of bytes", like this:
#define ASSIGN_INCR(p, val, type) ((*((type) *)(p) = (val)), (p) += sizeof(type))
Using the comma operator means the macro can be used in expressions or as statements as desired:
if (need_to_output_short)
ASSIGN_INCR(ptr, short_value, short);
latest_pos = ASSIGN_INCR(ptr, int_value, int);
send_buff(outbuff, (int)(ASSIGN_INCR(ptr, last_value, int) - outbuff));
It reduced some repetitive typing but you do have to be careful it doesn't get too unreadable.
Please see my overly-long version of this answer here.
It can be handy for "code golf":
Code Golf: Playing Cubes
The , in if(i>0)t=i,i=0; saves two characters.
qemu has some code that uses the comma operator within the conditional portion of a for loop (see QTAILQ_FOREACH_SAFE in qemu-queue.h). What they did boils down to the following:
#include <stdio.h>
int main( int argc, char* argv[] ){
int x = 0, y = 0;
for( x = 0; x < 3 && (y = x+1,1); x = y ){
printf( "%d, %d\n", x, y );
}
printf( "\n%d, %d\n\n", x, y );
for( x = 0, y = x+1; x < 3; x = y, y = x+1 ){
printf( "%d, %d\n", x, y );
}
printf( "\n%d, %d\n", x, y );
return 0;
}
... with the following output:
0, 1
1, 2
2, 3
3, 3
0, 1
1, 2
2, 3
3, 4
The first version of this loop has the following effects:
It avoids doing two assignments, so the chances of the code getting out of sync is reduced
Since it uses &&, the assignment is not evaluated after the last iteration
Since the assignment isn't evaluated, it won't try to de-reference the next element in the queue when it's at the end (in qemu's code, not the code above).
Inside the loop, you have access to the current and next element
Found it in array initialization:
In C what exactly happens if i use () to initialize a double dimension array instead of the {}?
When I initialize an array a[][]:
int a[2][5]={(8,9,7,67,11),(7,8,9,199,89)};
and then display the array elements.
I get:
11 89 0 0 0
0 0 0 0 0