C++ domain-specific embedded language operators

In numerically oriented languages (Matlab, Fortran), the range operator and its semantics are very handy when working with multidimensional data.
For example:
A(i:j,k,:n) // represents two-dimensional slice B(i:j,0:n) of A at index k
Unfortunately, C++ does not have a range operator (:). Of course it can be emulated using a range/slice functor, but the semantics are less clean than Matlab's. I am prototyping a matrix/tensor domain language in C++ and am wondering if there are any options to reproduce the range operator.
I would still like to rely on the C++/preprocessor framework exclusively.
So far I have looked through Boost Wave, which might be a suitable option.
Is there any other means to introduce new non-native operators to a C++ DSL?
I know you cannot add new operators; I am specifically looking for a workaround.
One thing I came up with (a very ugly hack that I do not intend to use):
#define A(r) A[range(((1)?r), ((0)?r))] // assume A overloads []
A(i:j); // abuse ternary operator

A solution that I've used before is to write an external preprocessor that parses the source and replaces any uses of your custom operator with vanilla C++. For your purposes, a : b uses would be replaced with something like a.operator_range_(b), and operator:() declarations with declarations of range_ operator_range_(). In your makefile you then add a rule that preprocesses source files before compiling them. This can be done with relative ease in Perl.
However, having worked with a similar solution in the past, I do not recommend it. It has the potential to create maintainability and portability issues if you do not remain vigilant of how source is processed and generated.
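The external-preprocessor approach can be sketched as a single textual rewrite pass. This is a minimal illustration, not the answerer's tool: it assumes slices only appear inside [] without nested brackets, and it targets a hypothetical range(lo, hi) function instead of the a.operator_range_(b) form described above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// One pass of a hypothetical pre-build step: rewrites "[lo : hi]" into
// "[range(lo, hi)]". `range` is an assumed user-provided function or class.
// Nested brackets inside a subscript are not supported.
std::string rewrite_ranges(const std::string& source) {
    static const std::regex slice(R"(\[\s*([^\[\]:]+?)\s*:\s*([^\[\]:]+?)\s*\])");
    return std::regex_replace(source, slice, "[range($1, $2)]");
}
```

For example, B[ i + 1 : n ][k] becomes B[range(i + 1, n)][k]. It also shows the fragility mentioned above: a ternary inside brackets such as v[c ? 1 : 2] is rewritten to v[range(c ? 1, 2)], because a purely textual pass cannot tell the two uses of : apart.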

No, you can't define your own operators in C++. Bjarne Stroustrup details why.

As Billy said, you cannot define new operators. However, you can come very close to what you want with "regular" operator overloading (and maybe some template metaprogramming). It would be quite easy to allow for something like this:
#include <iostream>
class FakeNumber {
int n;
public:
FakeNumber(int nn) : n(nn) {}
operator int() const { return n; }
};
class Range {
int f, t;
public:
Range(const int& ff, const int& tt) : f(ff), t(tt) {};
int from() const { return f; }
int to() const { return t; }
};
Range operator-(const FakeNumber& a, const int b) {
return Range(a,b);
}
class Matrix {
public:
void operator()(const Range& a, const Range& b) {
std::cout << "(" << a.from() << ":" << a.to() << "," << b.from() << ":" << b.to() << ")" << std::endl;
}
};
int main() {
FakeNumber a=1,b=2,c=3,d=4;
Matrix m;
m(a-b,c-d);
return 0;
}
The downside is that this solution doesn't support all-literal expressions: at least one of from and to has to be a user-defined class, since we can't overload operator- for two primitive types.
You can also overload operator* to allow specifying stepping, like so:
m(a-b*3,c-d); // equivalent to m[a:b:3,c:d]
And overload both versions of operator-- to allow ignoring one of the bounds:
m(a--,--d); // equivalent to m[a:,:d]
Another option is to define two objects, named something like Matrix::start and Matrix::end, or whatever you like, and then instead of using operator--, you could use them, and then the other bound wouldn't have to be a variable and could be a literal:
m(start-15,38-end); // This clutters the syntax however
And you could of course use both ways.
I think it's pretty much the best you can get without resorting to bizarre solutions, such as custom prebuild tools or macro abuse (of the sort Matthieu presented, while advising against using it :)).
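The start/end alternative can be sketched as follows. The tag names, the inclusive bounds, and resolving an open bound against the matrix extent are assumptions made for this illustration; it only computes which block is selected rather than returning a real view.

```cpp
#include <cassert>

namespace slicing {

// Marks a bound left open; it is resolved against the matrix extent later.
constexpr int open_bound = -1;

// Inclusive index range [from, to]; either bound may be open_bound.
struct Range {
    int from, to;
};

// Tag objects playing the role of Matrix::start and Matrix::end.
struct StartTag {};
struct EndTag {};
constexpr StartTag start{};
constexpr EndTag end{};

// start - 15  ->  [:15]        38 - end  ->  [38:]
constexpr Range operator-(StartTag, int to) { return Range{open_bound, to}; }
constexpr Range operator-(int from, EndTag) { return Range{from, open_bound}; }

// The block selected by m(rows, cols), with open bounds filled in.
struct Slice {
    Range rows, cols;
};

class Matrix {
public:
    Matrix(int rows, int cols) : rows_(rows), cols_(cols) {}

    Slice operator()(Range r, Range c) const {
        return Slice{resolve(r, rows_), resolve(c, cols_)};
    }

private:
    // Replace open bounds by 0 / extent - 1 and check the result is in range.
    static Range resolve(Range r, int extent) {
        const Range out{r.from == open_bound ? 0 : r.from,
                        r.to == open_bound ? extent - 1 : r.to};
        assert(0 <= out.from && out.from <= out.to && out.to < extent);
        return out;
    }
    int rows_, cols_;
};

}  // namespace slicing
```

With a 50 x 60 matrix m, m(slicing::start - 15, 38 - slicing::end) selects rows 0 to 15 and columns 38 to 59, with plain literals on both sides of the operator.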

An alternative is to build a C++ variant dialect using a program transformation tool.
The DMS Software Reengineering Toolkit is a program transformation engine, with an industrial strength C++ Front End. DMS, using this front end, can parse full C++ (it even has a preprocessor and can retain most preprocessor directives unexpanded), automatically build ASTs and complete symbol tables.
The C++ front end comes in source, using a grammar derived directly from the standard. It is technically straightforward to add new grammar rules including those that would allow ":" syntax as array subscripts as you have described, and as Fortran90+ has implemented. One can then use the program transformation capability of DMS to transform the "new" syntax into "vanilla" C++ for use in conventional C++ compilers. (This scheme is a generalization of the Intentional Programming model of "add DSL concepts to your language").
We in fact did a concept demonstration of "Vector C++" using this approach.
We added a multidimensional Vector datatype, whose storage semantics are only that array elements are distinct. This is different from C++'s model of sequential locations, but you need this different semantic if you want the compiler/transformer to have freedom to lay out memory arbitrarily, and this is fundamental if you want to use SIMD machine instructions and/or efficient cache accesses along different axes.
We added Fortran-90 style scalar and subarray range accesses, added virtually all of F90's array-processing operations, added a good fraction of APL's matrix operations, all by adjusting the DMS C++ grammar.
Finally, we built two translators using DMS transformational capability: one mapping a significant part of this (remember, this was a concept demo) to vanilla C++ so you could compile and run Vector C++ applications on a typical workstation, and the other mapping C++ to a PowerPC C++ dialect with SIMD instruction extensions, and we generated SIMD code that was pretty reasonable we thought. Took us about 6 man-months to do all this.
The customer for this ultimately bailed out (his business model didn't include supporting a custom compiler in spite of his severe need for parallel/SIMD based operations), and it has been languishing on the shelf. We've chosen not to pursue this in the broader market because it isn't clear what the market really is. I'm pretty sure there are organizations for which this would be valuable.
Point is, you really can do this. It is almost impossible using ad hoc methods. It is technically quite straightforward with a strong enough program transformation system. It isn't a walk in the park.

The easiest solution is to use a method on matrix instead of an operator.
A.range(i, j, k, n);
Note that you typically do not use , in a subscript operator [], e.g. A[i][j] instead of A[i,j]; before C++23, operator[] could take only one argument. The second form could be emulated by overloading the comma operator, but then you force i and j to be objects, not numbers.
You could define a range class that could be used as a subscript for your matrix class.
class RealMatrix
{
public:
MatrixRowRangeProxy operator[] (int i) {
return operator[](range(i, 1));
}
MatrixRowRangeProxy operator[] (range r);
// ...
RealMatrix(const MatrixRangeProxy proxy);
};
// A generic view on a matrix
class MatrixProxy
{
protected:
RealMatrix * matrix;
};
// A view on a matrix of a range of rows
class MatrixRowRangeProxy : public MatrixProxy
{
public:
MatrixColRangeProxy operator[] (int i) {
return operator[](range(i, 1));
}
MatrixColRangeProxy operator[] (const range & r);
// ...
};
// A view on a matrix of a range of columns
class MatrixColRangeProxy : public MatrixProxy
{
public:
MatrixRangeProxy operator[] (int i) {
return operator[](range(i, 1));
}
MatrixRangeProxy operator[] (const range & r);
// ...
};
Then you can copy a range from one matrix into another.
RealMatrix A = ...
RealMatrix B = A[range(i,j)][range(k,n)];
Finally by creating a Matrix class that can hold either a RealMatrix or a MatrixProxy you can make a RealMatrix and a MatrixProxy appear the same from the outside.
Note that the operator[] overloads on the proxies are not, and cannot be, virtual.
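For reference, here is a compilable, cut-down version of the proxy idea above. It is a sketch under assumptions: two dimensions only, so a single row proxy replaces the separate row and column proxies; range(first, count) is taken to mean count indices starting at first, which matches the range(i, 1) calls above; and the proxies just hold a pointer back to the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// range(first, count): `count` consecutive indices starting at `first`.
struct range {
    std::size_t first, count;
    range(std::size_t f, std::size_t n) : first(f), count(n) {}
};

class RealMatrix;

// A view on a rectangular block of a matrix.
struct MatrixRangeProxy {
    const RealMatrix* matrix;
    range rows, cols;
};

// A view on a range of rows; the second [] selects the columns.
struct MatrixRowRangeProxy {
    const RealMatrix* matrix;
    range rows;
    MatrixRangeProxy operator[](range c) const { return MatrixRangeProxy{matrix, rows, c}; }
    MatrixRangeProxy operator[](std::size_t j) const { return (*this)[range(j, 1)]; }
};

class RealMatrix {
public:
    RealMatrix(std::size_t r, std::size_t c) : rows_(r), cols_(c), data_(r * c, 0.0) {}

    // Materialize a block; this is what makes "RealMatrix B = A[..][..]" work.
    RealMatrix(const MatrixRangeProxy& p) : RealMatrix(p.rows.count, p.cols.count) {
        assert(p.rows.first + p.rows.count <= p.matrix->rows());
        assert(p.cols.first + p.cols.count <= p.matrix->cols());
        for (std::size_t i = 0; i < rows_; ++i)
            for (std::size_t j = 0; j < cols_; ++j)
                (*this)(i, j) = (*p.matrix)(p.rows.first + i, p.cols.first + j);
    }

    MatrixRowRangeProxy operator[](range r) const { return MatrixRowRangeProxy{this, r}; }
    MatrixRowRangeProxy operator[](std::size_t i) const { return (*this)[range(i, 1)]; }

    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;  // row-major storage
};
```

Usage then matches the answer: RealMatrix B = A[range(1, 2)][range(2, 3)]; copies a 2 x 3 block, and A[3][4] selects a single element as a 1 x 1 block.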

If you want to have fun, you may check out IdOp.
If you are really working on a project, I don't suggest using this trick though. Maintenance will suffer from clever tricks.
Your best bet is thus to bite the bullet and use explicit notation. A short function called range which yields a custom defined object for which the operators are overloaded seems especially suitable.
Matrix<10,30,50> matrix = /**/;
MatrixView<5,6,7> view = matrix[range(0,5)][range(0,6)][range(0,7)];
Matrix<5,6,7> n = view;
Note that the operator[] only has 4 overloads (const/non-const + basic int / range) and yields a proxy object (until the last dimension). Once applied to the last dimension, it gives a view of the matrix. A normal matrix may be built from a view that has the same dimensions (non-explicit constructor).

Related

Can I iterate over class members as though they were an array in C++?

Suppose I have a class akin to the following:
struct Potato {
Eigen::Vector3d position;
double weight, size;
string name;
};
and a collection of Potatoes
std::vector<Potato> potato_farm = {potato1, potato2, ...};
This is pretty clearly an array-of-structures (AoS) layout because, let's say for most purposes, it makes sense to have all of a Potato's data lumped together. However, I might like to do a calculation of the most common name where a structure-of-arrays (SoA) design makes things agnostic to the type of thing with a name (an array of people all with names, an array of places, all with names, etc.) Does C++ have any tools or tricks that makes an AoS layout look like a SoA to do something like this, or is there a better design that accomplishes the same thing?
You can use lambdas to access a particular member in algorithms that work over a range:
double mean = std::accumulate( potato_farm.begin(), potato_farm.end(), 0.0, []( double val, const Potato &p ) { return val + p.weight; } ) / potato_farm.size();
If that is not enough: you cannot make it look like an array of data, as that requires the objects to be in contiguous memory, but you can make it look like a container. So you can implement custom iterators (for example, a random-access iterator whose value_type is double and which iterates over the weight member). How to implement custom iterators is described here. You can probably even make that generic, but it is not clear whether that would be worth the effort, as it is not very simple to implement properly.
Unfortunately, there is no language tool to generically change a struct into SoA. This is actually one of the big obstacles when you try to bring SIMD programming to a higher level.
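A generic version of such an iterator could look like the sketch below. member_iterator and project are hypothetical names, the Potato struct is simplified (the Eigen member is dropped), and only forward-iterator operations are provided; a random-access version would need the remaining arithmetic and comparison operators.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Simplified Potato for this sketch (no Eigen dependency).
struct Potato {
    double weight, size;
    std::string name;
};

// Walks a sequence of structs but yields one member of each, selected by a
// pointer-to-member, so algorithms see e.g. a run of doubles.
template <class It, class M>
class member_iterator {
public:
    using owner_type = typename std::iterator_traits<It>::value_type;
    using iterator_category = std::forward_iterator_tag;
    using value_type = M;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = const M*;
    using reference = const M&;

    member_iterator() = default;
    member_iterator(It it, M owner_type::*member) : it_(it), member_(member) {}

    reference operator*() const { return (*it_).*member_; }
    pointer operator->() const { return &**this; }
    member_iterator& operator++() { ++it_; return *this; }
    member_iterator operator++(int) { member_iterator old = *this; ++it_; return old; }

    // Two iterators are equal when they point at the same underlying element.
    friend bool operator==(const member_iterator& a, const member_iterator& b) { return a.it_ == b.it_; }
    friend bool operator!=(const member_iterator& a, const member_iterator& b) { return !(a == b); }

private:
    It it_{};
    M owner_type::*member_ = nullptr;
};

// Deduces the template arguments: project(v.begin(), &Potato::weight).
template <class It, class M, class C>
member_iterator<It, M> project(It it, M C::*member) {
    return member_iterator<It, M>(it, member);
}
```

With it, std::accumulate(project(farm.begin(), &Potato::weight), project(farm.end(), &Potato::weight), 0.0) sums the weights without a lambda.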
You will need to create a SoA manually. However, you can help yourself by creating a reference to SoA objects acting as if it was a regular Potato.
struct Potato {
float position;
double weight, size;
std::string name;
};
struct PotatoSoARef {
float& position;
double& weight;
double& size;
std::string& name;
};
class PotatoSoA {
private:
float* position;
double* weight;
double* size;
std::string* name;
public:
PotatoSoA(std::size_t size) { /* allocate the SoA */ }
PotatoSoARef operator[](std::size_t idx) {
return PotatoSoARef{position[idx], weight[idx], size[idx], name[idx]};
}
};
This way, regardless of whether you have an AoS or an SoA of Potatoes, you can access its fields as arr[idx].position etc. (both as r- and l-values). The compiler is likely to optimize the proxy away.
You might want to add other constructors and accessors as well.
You might also be interested in implementing a regular AoS with an operator[] returning a PotatoSoARef if you want functions to have a uniform interface for both AoS and SoA access patterns.
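A self-contained variant of the SoA container above, assuming std::vector columns are acceptable in place of the raw pointers (so allocation and cleanup come for free). The count() and names() accessors and the most_common helper are additions for illustration; most_common is the question's "most common name" calculation, which only needs a column of strings and therefore does not care what owns the names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Behaves like a Potato, but its members refer into the SoA columns.
struct PotatoSoARef {
    float& position;
    double& weight;
    double& size;
    std::string& name;
};

class PotatoSoA {
public:
    explicit PotatoSoA(std::size_t n) : position_(n), weight_(n), size_(n), name_(n) {}

    PotatoSoARef operator[](std::size_t idx) {
        return PotatoSoARef{position_[idx], weight_[idx], size_[idx], name_[idx]};
    }
    std::size_t count() const { return weight_.size(); }

    // Each column is contiguous, which is the point of the SoA layout.
    const std::vector<std::string>& names() const { return name_; }

private:
    std::vector<float> position_;
    std::vector<double> weight_, size_;
    std::vector<std::string> name_;
};

// Most frequent string; on a tie, the one that reached the top count first wins.
std::string most_common(const std::vector<std::string>& names) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string best;
    std::size_t best_count = 0;
    for (const std::string& n : names) {
        const std::size_t c = ++counts[n];
        if (c > best_count) { best_count = c; best = n; }
    }
    return best;
}
```

An assignment such as farm[1].weight = 2.5; writes through the proxy into the weight column, just like assigning to a member of a plain Potato.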
If you are willing to depart from C++, though, you might be interested in language extensions such as Sierra.
As Slava has said you aren't going to get SoA-like access out of AoS data without writing your own iterators, and I would think really hard about whether the use of STL algorithms is that important before doing that, especially if this isn't meant to be a generic solution. The primary benefit of SoA data is cache performance anyway, not the particular syntax of whatever containers you're using, and nothing besides actual SoA data is going to get you that.
With range-v3 (not in C++17 :-/), you may use a projection or a transform view:
ranges::accumulate(potato_farm, 0., ranges::v3::plus{}, &Potato::weight);
or
auto weightsView = potato_farm | ranges::view::transform([](auto& p) { return p.weight; });
ranges::accumulate(weightsView, 0.);

Comparing objects in C and C++

I'm working on a project where I should instrument a program (written in C and C++) by inserting a print statement before the statements that meet some criteria. Then, I should compare those values for different executions.
Since in C there are structures, while in C++ one can also define classes, I was wondering if there is a particular method that:
Allows printing primitives as well as complex data structures.
Allows comparing those values, for different executions, based on the format used by the print module (point 1.).
Just an example to clarify my question. Suppose that I have two different executions with this data structure:
struct Point {
int x, y;
};
int main() {
int k = random();
Point p = foo(k);
some_print(p); // Print the value of 'p' in a file
return 0;
}
and then, another module will compare the two values of the point 'p' generated with the two executions.
The pragmatic C++ way of printing an object is to overload operator<< as a free function (declared a friend if it needs access to private members):
std::ostream& operator<<(std::ostream& os, const Point& point) {
return os << "(" << point.x << "," << point.y << ")";
}
It's usually class-specific, so you need to implement it yourself; however, you might use some form of reflection. Particularly interesting is a CppCon talk by Antony Polukhin [1] which gives reflection for POD types (like Point above). Generic reflection without external tools is not available yet (as of 2016); there's a proposal for it. If you can't or don't want to wait, you can do multiple things:
Parse C++ code: ctags comes to mind.
Macros: It's relatively easy to write a FIELDS macro that defines a reflection class and the fields.
FIELDS(
(int)x,
(int)y
)
Tuples: Works only if you define all your fields on the same inheritance level. Inherit privately from a std::tuple<> which contains all your fields. Make const and optionally non-const getters for fields in terms of std::get<>. Then you can iterate over the types of your tuple.
(Would love to add more - pls. write comments if you have ideas.)
All the reflection methods also give you operator==() basically for free. Note that it's more pragmatic to add operator<() when possible: operator==() can be defined in terms of operator<() (albeit suboptimally: a == b iff !(a < b) && !(b < a)), and operator<() gives you std::set<> and std::map<>. Or you can do all the comparisons in terms of reflection.
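A lightweight cousin of the tuple approach: instead of inheriting from std::tuple<>, the struct exposes its fields through std::tie, and the comparison operators reuse std::tuple's lexicographic comparison. The fields() and to_text() names are assumptions; this is hand-written, not a reflection library. Writing to_text(p) to a file in each run then gives the comparison module plain text to diff.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

struct Point {
    int x, y;
    // The one place that lists the fields; the operators below reuse it.
    std::tuple<const int&, const int&> fields() const { return std::tie(x, y); }
};

inline bool operator==(const Point& a, const Point& b) { return a.fields() == b.fields(); }
inline bool operator!=(const Point& a, const Point& b) { return !(a == b); }
inline bool operator<(const Point& a, const Point& b) { return a.fields() < b.fields(); }

inline std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "(" << p.x << "," << p.y << ")";
}

// Text form to log in each run; two runs can then be compared line by line.
inline std::string to_text(const Point& p) {
    std::ostringstream os;
    os << p;
    return os.str();
}
```

Because operator< is a strict weak ordering here, Point can also be used directly as a std::set or std::map key.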
[1] https://www.youtube.com/watch?v=abdeAew3gmQ
What you could do in C++ is implement an equality check in your specific class.
For example, you could have a boolean equals() method that checks whether two objects are equal, so that
object1.equals(object2) returns either true or false. The idiomatic C++ spelling of the same idea is an overloaded operator==.
To give an example, take a look at the following (an example I found online):
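#include <string>
class Car
{
private:
std::string m_make;
std::string m_model;
public:
// A friend, so it can take both operands as a non-member
// while still reading the private members.
friend bool operator== (const Car &c1, const Car &c2)
{
return (c1.m_make == c2.m_make &&
c1.m_model == c2.m_model);
}
};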
something like this should be implemented in your own class.

C++ - Check if One Array of Strings Contains All Elements of Another

I've recently been porting a Python application to C++, but am now at a loss as to how I can port a specific function. Here's the corresponding Python code:
def foo(a, b): # Where `a' is a list of strings, as is `b'
for x in a:
if not x in b:
return False
return True
I wish to have a similar function:
bool
foo (char* a[], char* b[])
{
// ...
}
What's the easiest way to do this? I've tried working with the STL algorithms, but can't seem to get them to work. For example, I currently have this (using the glib types):
gboolean
foo (gchar* a[], gchar* b[])
{
gboolean result;
std::sort (a, (a + (sizeof (a) / sizeof (*a))); // The second argument corresponds to the size of the array.
std::sort (b, (b + (sizeof (b) / sizeof (*b)));
result = std::includes (b, (b + (sizeof (b) / sizeof (*b))),
a, (a + (sizeof (a) / sizeof (*a))));
return result;
}
I'm more than willing to use features of C++11.
I'm just going to add a few comments to what others have stressed and give a better algorithm for what you want.
Do not use pointers here. Using pointers doesn't make it C++, it makes it bad coding. If you have a book that taught you C++ this way, throw it out. Just because a language has a feature does not mean it is proper to use it anywhere you can. If you want to become a professional programmer, you need to learn to use the appropriate parts of your languages for any given action. When you need a data structure, use the one appropriate to your activity. Pointers aren't data structures, they are reference types used when you need an object with state lifetime, i.e. when an object is created on one asynchronous event and destroyed on another. If an object lives its lifetime without any asynchronous wait, it can be modeled as a stack object and should be. Pointers should never be exposed to application code without being wrapped in an object, because standard operations (like new) throw exceptions, and pointers do not clean themselves up. In other words, pointers should always be used only inside classes and only when necessary to respond with dynamically created objects to events external to the class (which may be asynchronous).
Do not use arrays here. Arrays are simple homogeneous collection data types of stack lifetime, of size known at compile time. They are not meant for iteration. If you need an object that allows iteration, there are types that have built-in facilities for this. To do it with an array, though, means you are keeping track of a size variable external to the array. It also means you are enforcing, external to the array, that the iteration will not extend past the last element, using a newly formed condition each iteration (note this is different from just managing size: it is managing an invariant, the reason you make classes in the first place). You do not get to reuse standard algorithms, are fighting decay-to-pointer, and generally are making brittle code. Arrays are (again) useful only if they are encapsulated and used where the only requirement is random access into a simple type, without iteration.
Do not sort a vector here. This one is just odd, because it is not a good translation from your original problem, and I'm not sure where it came from. Don't optimise early, but don't pessimise early by choosing a bad algorithm either. The requirement here is to look for each string inside another collection of strings. A sorted vector is an invariant (so, again, think something that needs to be encapsulated) - you can use existing classes from libraries like boost or roll your own. However, a little bit better on average is to use a hash table. With amortised O(N) lookup (with N the size of a lookup string - remember it's amortised O(1) number of hash-compares, and for strings this O(N)), a natural first way to translate "look up a string" is to make an unordered_set<string> be your b in the algorithm. This changes the complexity of the algorithm from O(NM log P) (with N now the average size of strings in a, M the size of collection a and P the size of collection b), to O(NM). If the collection b grows large, this can be quite a savings.
In other words
gboolean foo(vector<string> const& a, unordered_set<string> const& b)
Note that you can now pass const references to the function. If you build your collections with their use in mind, you often get extra savings down the line.
The point with this response is that you really should never get in the habit of writing code like that posted. It is a shame that there are a few really (really) bad books out there that teach coding with strings like this, and it is a real shame because there is no need to ever have code look that horrible. It fosters the idea that c++ is a tough language, when it has some really nice abstractions that do this easier and with better performance than many standard idioms in other languages. An example of a good book that teaches you how to use the power of the language up front, so you don't build bad habits, is "Accelerated C++" by Koenig and Moo.
But also, you should always think about the points made here, independent of the language you are using. You should never try to enforce invariants outside of encapsulation - that was the biggest source of savings of reuse found in Object Oriented Design. And you should always choose your data structures appropriate for their actual use. And whenever possible, use the power of the language you are using to your advantage, to keep you from having to reinvent the wheel. C++ already has string management and compare built in, it already has efficient lookup data structures. It has the power to make many tasks that you can describe simply coded simply, if you give the problem a little thought.
Your first problem is related to the way arrays are (not) handled in C++. Arrays live a kind of very fragile shadow existence where, if you as much as look at them in a funny way, they are converted into pointers. Your function doesn't take two pointers-to-arrays as you expect. It takes two pointers to pointers.
In other words, you lose all information about the size of the arrays. sizeof(a) doesn't give you the size of the array. It gives you the size of a pointer to a pointer.
So you have two options: the quick and dirty ad-hoc solution is to pass the array sizes explicitly:
gboolean foo (gchar** a, int a_size, gchar** b, int b_size)
Alternatively, and much nicer, you can use vectors instead of arrays:
gboolean foo (const std::vector<gchar*>& a, const std::vector<gchar*>& b)
Vectors are dynamically sized arrays, and as such, they know their size. a.size() will give you the number of elements in a vector. But they also have two convenient member functions, begin() and end(), designed to work with the standard library algorithms.
So, to sort a vector:
std::sort(a.begin(), a.end());
And likewise for std::includes.
Your second problem is that you don't operate on strings, but on char pointers. In other words, std::sort will sort by pointer address, rather than by string contents.
Again, you have two options:
If you insist on using char pointers instead of strings, you can specify a custom comparer for std::sort (using a lambda because you mentioned you were ok with them in a comment)
std::sort(a.begin(), a.end(), [](gchar* lhs, gchar* rhs) { return strcmp(lhs, rhs) < 0; });
Likewise, std::includes takes an optional fifth parameter used to compare elements. The same lambda could be used there.
Alternatively, you simply use std::string instead of your char pointers. Then the default comparer works:
gboolean
foo (std::vector<std::string> a, std::vector<std::string> b) // by value: std::sort has to modify the sequences
{
gboolean result;
std::sort (a.begin(), a.end());
std::sort (b.begin(), b.end());
result = std::includes (b.begin(), b.end(),
a.begin(), a.end());
return result;
}
Simpler, cleaner and safer.
The sort in the C++ version isn't working because it's sorting the pointer values (comparing them with std::less as it does with everything else). You can get around this by supplying a proper comparison functor. But why aren't you actually using std::string in the C++ code? The Python strings are real strings, so it makes sense to port them as real strings.
In your sample snippet your use of std::includes is pointless since it will use operator< to compare your elements. Unless you are storing the same pointers in both your arrays the operation will not yield the result you are looking for.
Comparing addresses is not the same thing as comparing the actual content of your C-style strings.
You'll also have to supply std::sort with the necessary comparator, preferably std::strcmp (wrapped in a functor).
It's currently suffering from the same problem as your use of std::includes, it's comparing addresses instead of the contents of your c-style-strings.
This whole "problem" could have been avoided by using std::strings and std::vectors.
Example snippet
#include <iostream>
#include <algorithm>
#include <cstring>
typedef char gchar;
gchar const * a1[5] = {
"hello", "world", "stack", "overflow", "internet"
};
gchar const * a2[] = {
"world", "internet", "hello"
};
...
int
main (int argc, char *argv[])
{
auto Sorter = [](gchar const* lhs, gchar const* rhs) {
return std::strcmp (lhs, rhs) < 0 ? true : false;
};
std::sort (a1, a1 + 5, Sorter);
std::sort (a2, a2 + 3, Sorter);
if (std::includes (a1, a1 + 5, a2, a2 + 3, Sorter)) {
std::cerr << "all elements in a2 was found in a1!\n";
} else {
std::cerr << "all elements in a2 wasn't found in a1!\n";
}
}
output
all elements in a2 was found in a1!
A naive transcription of the python version would be:
bool foo(std::vector<std::string> const &a,std::vector<std::string> const &b) {
for(auto &s : a)
if(end(b) == std::find(begin(b),end(b),s))
return false;
return true;
}
It turns out that sorting the input is very slow. (And wrong in the face of duplicate elements.) Even the naive function is generally much faster. Just goes to show again that premature optimization is the root of all evil.
Here's an unordered_set version that is usually somewhat faster than the naive version (or was for the values/usage patterns I tested):
bool foo(std::vector<std::string> const& a,std::unordered_set<std::string> const& b) {
for(auto &s:a)
if(b.count(s) < 1)
return false;
return true;
}
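A variant of the unordered_set version that keeps the Python signature (two vectors) and states the loop with std::all_of (C++11). It builds the hash set from b once per call, which costs O(|b|) extra time and memory; whether that pays off depends on the sizes and usage pattern, as measured above. contains_all is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// True if every string in `a` also occurs in `b`, like the Python foo(a, b).
bool contains_all(const std::vector<std::string>& a,
                  const std::vector<std::string>& b) {
    const std::unordered_set<std::string> lookup(b.begin(), b.end());
    return std::all_of(a.begin(), a.end(),
                       [&lookup](const std::string& s) { return lookup.count(s) != 0; });
}
```

Unlike the std::includes approach, duplicates in a are treated the same way as in the Python version: contains_all({"x", "x"}, {"x"}) is true.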
On the other hand, if the vectors are already sorted and b is relatively small ( less than around 200k for me ) then std::includes is very fast. So if you care about speed you just have to optimize for the data and usage pattern you're actually dealing with.

Is this use of the "," operator considered bad form?

I have made a list class as a means of replacing variadic functions in my program used for initializing objects that need to contain a changing list of elements. The list class has a usage syntax that I really like. However I haven't seen it used before, so I was wondering if I shouldn't use it just because of that fact? A basic implementation of the list class looks like this...
#include <list>
#include <iostream>
template<typename T>
struct list
{
std::list<T> items;
list(const list&ref):items(ref.items){}
list(){}
list(T var){items.push_back(var);}
list& operator,(list add_){
items.insert(items.end(),add_.items.begin(), add_.items.end());
return *this;
}
list& operator=(list add_){
items.clear();
items.insert(items.end(),add_.items.begin(), add_.items.end());
return *this;
}
list& operator+=(list add_){
items.insert(items.end(),add_.items.begin(), add_.items.end());
return *this;
}
};
This allows me to use it in code like so:
struct music{
//...
};
struct music_playlist{
list<music> queue;
//...
};
int main (int argc, const char * argv[])
{
music_playlist playlist;
music song1;
music song2;
music song3;
music song4;
playlist.queue = song1,song2; // The queue now contains song1 and song2
playlist.queue+= song1,song3,song4; //The queue now contains two song1s and song2-4
playlist.queue = song2; //the queue now only contains song2
return 0;
}
I really think that the syntax is much nicer than it would have been if I had just exposed a regular STL container, and even nicer (and type-safe) than variadic functions. However, since I have not seen this syntax used, I am curious whether I should avoid it, because above all the code should be easily understood by other programmers.
EDIT:
Along with this question, I have posted another question more targeted at solutions to the actual problem.
Why not overload the << operator as QList does? Then use it like:
playlist.queue << song1 << song2; // The queue now contains song1 and song2
playlist.queue << song1 << song3 << song4; //The queue now contains two song1s and song2-4
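The operator<< suggestion can be sketched as follows. song_queue is a hypothetical name (it also avoids reusing the name list), and strings stand in for the question's music type.

```cpp
#include <cassert>
#include <list>
#include <string>

// Queue that appends stream-style: q << a << b;
template <typename T>
class song_queue {
public:
    // Append one item and return *this so calls can be chained.
    song_queue& operator<<(const T& item) {
        items_.push_back(item);
        return *this;
    }
    void clear() { items_.clear(); }
    const std::list<T>& items() const { return items_; }

private:
    std::list<T> items_;
};
```

Each << is an ordinary left-associative call, so there is no parsing surprise like the one caused by = binding tighter than the comma.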
I agree that your syntax looks nice as you have written it.
My main difficulty with the code is that I would expect the following to be the same
playlist.queue = song1,song2;
playlist.queue = (song1,song2); //more of c-style, as #Iuser notes.
whereas in fact they are completely different.
This is dangerous because it's too easy to introduce usage bugs into the code.
If someone likes to use parenthesis to add extra emphasis to groupings (not uncommon) then the comma could become a real pain. For example,
//let's combine different playlists
new_playlist.queue = song1 //the first playlist
,(song3,song4) //the second playlist //oops, I didn't add song 3!
, song5; //the third
or
new_playlist.queue = (old_playlist.queue, song6); //oops, I edited my old playlist too!
Incidentally, have you come across Boost.Assign: http://www.boost.org/doc/libs/1_47_0/libs/assign/doc/index.html
Has the precedence changed recently?
playlist.queue = song1,song2;
This should parse as:
(playlist.queue = song1) , song2;
Your ',' and '+=' are the same!
It would be a better semantic match if your comma operator were to create a temporary list, insert the left and right items and return the temporary. Then you could write it like this;
playlist.queue = (song1,song2);
with explicit parens. That would give C-programmers a fighting chance at being able to read the code.
A bit of a problem is that if the compiler cannot choose your overloaded operator comma, it can fall back on using the built-in operator.
In contrast, with Boost.Assign mixing up types produces a compilation error.
#include <boost/assign.hpp>
int main()
{
int one = 1;
const char* two = "2";
list<int> li;
li = one, two;
using namespace boost::assign;
std::list<int> li2;
li2 += one, two;
}
This is probably something that belongs over on Programmers, but here's my two cents.
If you're talking about code that has a fairly narrow context, where users will use it in a couple of places and that's all, then overloading the , operator is probably OK. If you're building a domain-specific language that is used in a particular domain and nowhere else, it's probably fine.
The issue comes when you're overloading it for something that you expect the user to use with some frequency.
Overloading , means that the reader needs to completely reinterpret how they read your code. They can't just look at an expression and instantly know what it does. You're messing with some of the most basic assumptions that C++ programmers make when it comes to scanning code.
Do that at your own peril.
I am curious about whether I should avoid it, because above all the
code should be easily understood by other programmers
If the goal is to make your code easy for other C++ programmers to understand, overloading operators to give them a meaning that's very different from that of standard C++ is not a good start. Readers shouldn't have to a) understand how you've implemented your container and b) recalibrate their understanding of standard operators just to be able to make sense of your code.
I can appreciate the Boost precedent for this sort of thing. If you're pretty sure that most of the people who will read your code will also be familiar with Boost.Assign, your own overload of operator, might be pretty reasonable. Still, I'd suggest following #badzeppelin's suggestion to use operator<< instead, just as iostreams does. Every C++ developer can be counted on to have run into code like:
cout << "Hello world!";
and your append operation is very similar to writing to a stream.
It's bad on so many levels...
You're reusing the name list and shadowing std::list. A big no-no. If you want your own list class, give it a different name; don't shadow the standard library.
Using , in such a way is not readable. The built-in comma operator returns its right operand, so even if your code works, it won't be obvious to an external reader why, and that's a bad thing. Code should be readable, not nice.
There is nothing inherently bad about using the comma operator specifically. Any operator leaves a bad taste if it is exploited. In your code, I don't see any real problem. The only suggestion I would like to give is:
list& operator,(const list& add_) { // <--- pass by const reference to avoid copies
    *this += add_;                  // <--- reuse operator +=
    return *this;
}
This way, if you want to change the logic, you only ever have to edit operator +=. Note that my answer is from the perspective of readability and code maintenance in general; I won't comment on the business logic you use.

Best way to store constant data in C++

I have an array of constant data like the following:
enum Language {GERMAN=LANG_DE, ENGLISH=LANG_EN, /* ... */};
struct LanguageName {
    Language language;
    const char *name;
};
const LanguageName languages[] = {
    {GERMAN, "German"},
    {ENGLISH, "English"},
    /* ... */
};
I have a function that accesses the array and finds the entry for a given Language enum value. Should I write a loop to find the specific entry in the array, or are there better ways to do this?
I know I could add the LanguageName objects to a std::map, but wouldn't this be overkill for such a simple problem? I do not have an object in which to store the std::map, so the map would be constructed on every call of the function.
What way would you recommend?
Is it better to encapsulate this compile time constant array in a class which handles the lookup?
If the enum values are contiguous starting from 0, use an array with the enum as index.
If not, this is what I usually do:
#include <cstddef> // NULL
#include <map>

const char* find_language(Language lang)
{
    typedef std::map<Language,const char*> lang_map_type;
    typedef lang_map_type::value_type lang_map_entry_type;
    static const lang_map_entry_type lang_map_entries[] = { /*...*/ };
    static const lang_map_type lang_map( lang_map_entries
                                       , lang_map_entries + sizeof(lang_map_entries)
                                                          / sizeof(lang_map_entries[0]) );
    lang_map_type::const_iterator it = lang_map.find(lang);
    if( it == lang_map.end() ) return NULL;
    return it->second;
}
If you consider a map for constants, always also consider using a vector.
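Function-local statics are a nice way to get rid of a good part of the dependency problems of globals, but before C++11 their initialization was not guaranteed to be thread-safe, so they could be dangerous in a multi-threaded environment (since C++11 the initialization itself is thread-safe). If you're worried about that, you might rather want to use globals: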
typedef std::map<Language,const char*> lang_map_type;
typedef lang_map_type::value_type lang_map_entry_type;
const lang_map_entry_type lang_map_entries[] = { /*...*/ };
const lang_map_type lang_map( lang_map_entries
, lang_map_entries + sizeof(lang_map_entries)
/ sizeof(lang_map_entries[0]) );
const char* find_language(Language lang)
{
lang_map_type::const_iterator it = lang_map.find(lang);
if( it == lang_map.end() ) return NULL;
return it->second;
}
There are three basic approaches that I'd choose from. One is the switch statement, and it is a very good option under certain conditions. Remember - the compiler is probably going to compile that into an efficient table-lookup for you, though it will be looking up pointers to the case code blocks rather than data values.
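A minimal sketch of the switch approach. The question does not show the values of LANG_DE, LANG_EN and so on, so the numeric enum values below are hypothetical stand-ins:

```cpp
#include <cstddef> // NULL

// Hypothetical values standing in for LANG_DE, LANG_EN, ...
enum Language { GERMAN = 7, ENGLISH = 9, DUTCH = 19 };

// Returns the display name, or NULL for a value not listed in the switch.
const char* language_name(Language lang)
{
    switch (lang)
    {
    case GERMAN:  return "German";
    case ENGLISH: return "English";
    case DUTCH:   return "Dutch";
    }
    return NULL; // reached only for values without a case label
}
```

Leaving out a default label lets many compilers warn when an enumerator is added to Language but not handled in the switch.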
Options two and three involve static arrays of the type you are using. Option two is a simple linear search - which you are (I think) already doing - very appropriate if the number of items is small.
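For option two, a sketch of the linear search using std::find_if. It assumes the question's LanguageName struct (with the member type corrected to Language) and hypothetical enum values. A small function object is used rather than a lambda so the code also compiles as C++03:

```cpp
#include <algorithm>
#include <cstddef> // NULL

enum Language { GERMAN = 7, ENGLISH = 9, DUTCH = 19 }; // hypothetical values

struct LanguageName {
    Language language;
    const char* name;
};

const LanguageName languages[] = {
    {GERMAN, "German"},
    {ENGLISH, "English"},
    {DUTCH, "Dutch"},
};

// Predicate: true for the entry whose language matches the requested one.
struct HasLanguage {
    Language lang;
    explicit HasLanguage(Language l) : lang(l) {}
    bool operator()(const LanguageName& entry) const { return entry.language == lang; }
};

// Linear search over the whole table; fine for a handful of entries.
const char* find_language(Language lang)
{
    const LanguageName* first = languages;
    const LanguageName* last = languages + sizeof(languages) / sizeof(languages[0]);
    const LanguageName* it = std::find_if(first, last, HasLanguage(lang));
    return it == last ? NULL : it->name;
}
```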
Option three is a binary search. Static arrays can be used with standard library algorithms - just use the first and first+count pointers in the same way that you'd use begin and end iterators. You will need to ensure the data is sorted (using std::sort or std::stable_sort), and use std::lower_bound to do the binary search.
The complication in this case is that you'll need a comparison function object which acts like operator<, but which only looks at the key field of your struct. std::lower_bound calls it as comp(element, value), so it takes the struct as its first argument and the key being searched for as its second. The following is a rough template...
class cMyComparison
{
public:
    // std::lower_bound calls comp(element, value): the element is the struct,
    // the value is the key being searched for.
    bool operator() (const structtype& p_Struct, const fieldtype& p_Value) const
    {
        return (p_Struct.field < p_Value);
    }
};
This kind of thing should get simpler in the next C++ standard revision, when IIRC we'll get anonymous functions (lambdas) and closures.
If you can't put the sort in your app's initialisation, you might need an already-sorted boolean static variable to ensure you only sort once.
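With C++11 lambdas, the binary search can be sketched as follows. It assumes the question's LanguageName struct and hypothetical enum values, sorts the table once through a function-local static (the already-sorted flag just described), and then uses std::lower_bound, whose comparator is called as comp(element, key):

```cpp
#include <algorithm>
#include <iterator>

enum Language { SWAHILI = 3, GERMAN = 7, ENGLISH = 9, DUTCH = 19 }; // hypothetical

struct LanguageName {
    Language language;
    const char* name;
};

// Deliberately unsorted (and therefore non-const) to show the one-time sort.
LanguageName languages[] = {
    {GERMAN, "German"},
    {ENGLISH, "English"},
    {SWAHILI, "Swahili"},
    {DUTCH, "Dutch"},
};

const char* find_language(Language lang)
{
    // Runs exactly once; initialization of local statics is thread-safe since C++11.
    static const bool sorted = [] {
        std::sort(std::begin(languages), std::end(languages),
                  [](const LanguageName& a, const LanguageName& b) {
                      return a.language < b.language;
                  });
        return true;
    }();
    (void)sorted;

    // Binary search on the key field only.
    const LanguageName* it = std::lower_bound(
        std::begin(languages), std::end(languages), lang,
        [](const LanguageName& entry, Language key) { return entry.language < key; });
    if (it == std::end(languages) || it->language != lang) return nullptr;
    return it->name;
}
```

lower_bound only finds the first position not less than the key, so the it->language != lang check is still needed to reject missing values.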
Note - this is for information only - in your case, I think you should either stick with linear search or use a switch statement. The binary search is probably only a good idea when...
There are a lot of data items to search
Searches are done very frequently (many times per second)
The key enum values are sparse (lots of big gaps); otherwise, switch is better.
If the coding effort were trivial, it wouldn't be a big deal, but C++ currently makes this a bit harder than it should be.
One minor note: it may be a good idea to define an enumerator for the size of your array, and to ensure that your static array declaration uses that enumerator. That way, your compiler will complain if you add items to the table and forget to update the size enumerator, and searches bounded by that size will never go out of bounds. Be aware that it will not complain if you remove items: missing trailing initializers are silently zero-initialized, so a search could still run into an empty entry.
I think you have two questions here:
What is the best way to store a constant global variable (with possible multi-threaded access)?
How should you store your data (which container should you use)?
The solution described by sbi is elegant, but you should be aware of 2 potential problems:
In case of multi-threaded access, the initialization could go wrong.
You will potentially attempt to access this variable after its destruction.
Both issues concerning the lifetime of static objects are covered in another thread.
Let's begin with the constant global variable storage issue.
The solution proposed by sbi is therefore adequate if you are not concerned by problems 1 or 2; in any other case I would recommend the use of a Singleton, such as the ones provided by Loki. Read the associated documentation to understand the various lifetime policies; it is very valuable.
I think that the use of an array + a map seems wasteful and it hurts my eyes to read this. I personally prefer a slightly more elegant (imho) solution.
const char* find_language(Language lang)
{
    typedef std::map<Language, const char*> map_type;
    typedef map_type::value_type value_type;
    // I'll let you work out how 'my_stl_builder' works,
    // it makes for an interesting exercise and it's easy enough
    // Note that even if this is slightly slower (?), it is only executed ONCE!
    static const map_type lang_map = my_stl_builder<map_type>()
        << value_type(GERMAN, "German")
        << value_type(ENGLISH, "English")
        << value_type(DUTCH, "Dutch")
        // ...
        ;
    map_type::const_iterator it = lang_map.find(lang);
    if( it == lang_map.end() ) return NULL;
    return it->second;
}
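The answer leaves my_stl_builder as an exercise. One possible implementation, a sketch rather than the answerer's own code, collects elements through operator<< and converts implicitly to the target container at the end. The template name comes from the answer; everything else, including the hypothetical enum and the language_map() helper, is an assumption:

```cpp
#include <map>

// One possible my_stl_builder: accumulates elements, then converts
// implicitly to the target container.
template <typename Container>
class my_stl_builder {
public:
    // insert(end(), value) works for std::map (end() is just a hint)
    // as well as for sequence containers such as std::vector.
    my_stl_builder& operator<<(const typename Container::value_type& value) {
        container_.insert(container_.end(), value);
        return *this;
    }

    // Used when initializing a Container from the finished builder.
    operator Container() const { return container_; }

private:
    Container container_;
};

enum Language { GERMAN, ENGLISH, DUTCH }; // hypothetical
typedef std::map<Language, const char*> map_type;

// Hypothetical helper showing the builder in use; the map is built once.
const map_type& language_map()
{
    static const map_type lang_map = my_stl_builder<map_type>()
        << map_type::value_type(GERMAN, "German")
        << map_type::value_type(ENGLISH, "English")
        << map_type::value_type(DUTCH, "Dutch");
    return lang_map;
}
```

In C++11 and later, std::map can be initialized directly from a braced list such as {{GERMAN, "German"}, {ENGLISH, "English"}}, which removes the need for a builder altogether.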
And now on to the container type issue.
If you are concerned about performance, you should be aware that for small data collections, a sorted vector of pairs is normally more efficient for lookups than a map. Once again I would turn to Loki (and its AssocVector), but really I don't think that you should worry about performance.
I tend to choose my container depending on the interface I am likely to need first and here the map interface is really what you want.
Also: why do you use 'const char*' rather than a 'std::string'?
I have seen too many people using a 'const char*' like a std::string (like in forgetting that you have to use strcmp) to be bothered by the alleged loss of memory / performance...
It depends on the purpose of the array. If you plan on showing the values in a list (for a user selection, perhaps) the array would be the most efficient way of storing them. If you plan on frequently looking up values by their enum key, you should look into a more efficient data structure like a map.
There is no need to write a loop. You can use the enum value as index for the array.
I would make an enum with sequential language codes
enum { GERMAN=0, ENGLISH, SWAHILI, ENOUGH };
Then put them all into an array:
const char *langnames[] = {
"German", "English", "Swahili"
};
Then I would check that sizeof(langnames)==sizeof(*langnames)*ENOUGH in a debug build.
And pray that I have no duplicates or swapped languages ;-)
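With C++11, that debug-build size check can become a compile-time check via static_assert. A minimal sketch reusing the answer's enum and array; the bounds-checked language_name() helper is a hypothetical addition. Note that it catches a missing or extra name, but not duplicates or swapped entries:

```cpp
#include <cstddef> // std::size_t, NULL

enum { GERMAN = 0, ENGLISH, SWAHILI, ENOUGH };

// Indexed directly by the enum value, so no search loop is needed.
const char* const langnames[] = {
    "German", "English", "Swahili"
};

// Fails to compile if the enum and the array get out of step.
static_assert(sizeof(langnames) / sizeof(*langnames) == static_cast<std::size_t>(ENOUGH),
              "langnames must have exactly one entry per language");

// Hypothetical helper: bounds-checked lookup by enum value.
const char* language_name(int lang)
{
    return (lang >= 0 && lang < ENOUGH) ? langnames[lang] : NULL;
}
```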
If you want a fast and simple solution, you can try something like this:
#include <string>

enum ELanguage {GERMAN=0, ENGLISH=1};
static const std::string Ger="GERMAN";
static const std::string Eng="ENGLISH";
bool getLanguage(const ELanguage& aIndex, std::string& arName)
{
    switch(aIndex)
    {
    case GERMAN:
        arName=Ger;
        return true;
    case ENGLISH:
        arName=Eng;
        return true; // without this, ENGLISH falls through to the error branch
    default:
        // Log Error
        return false;
    }
}