Having seen the advantages of metaprogramming in Ruby and Python, but being bound to lower-level languages like C++ and C for actual work, I'm thinking of manners by which to combine the two. One instance comes in the simple problem for sorting lists of arbitrary structures/classes. For instance:
struct s{
int a;
int b;
};
vector<s> vec;
for(int x=0;x<10;x++){
s inst;
inst.a = x;
inst.b = x+10;
vec.push_back(inst);
}
Ultimately, I'd like to be able to sort inst arbitrarily with a minimal amount of boilerplate code. The easiest way I can see to do this is to make use of STL's sort:
sort(vec.begin(),vec.end());
Yet this requires me to write a method that can compare "struct s"s. What I'd rather do is:
sort(vec,a ASC,b DESC);
Which is very clearly not valid C++.
What is the best way to accomplish my dream? If I had some sort of typeful macro, that would reveal to me what the type of a vector's elements were, then it would be trivial to write C preprocessor macros to create the function required to do the sorting.
The alternative seems to be to write my own preprocessor. This works well, up until the point where I have to deduce the type of "vec" again. Is there an easy way to do this?
Context: Less code = less bugs, programming competitions.
For the above, you can use Boost.Lambda to write your comparison function inline, just like a Python lambda:
using namespace boost::lambda;
std::sort(vec.begin(), vec.end(), (_1 ->* &s::a) < (_2 ->* &s::a));
This of course assumes that you are sorting by a.
If the expressions you are looking for are far more complex, you are better off writing a separate function; even in languages like Python and Ruby with native support for closures, complex closures become quite unreadable anyway.
Warning: The code above is untested.
Hope this helps!
I would stick with writing a comparison operator for the struct. The bonus of having a comparison operator defined is that you don't end up with multiple lambda comparisons scattered all over the place. Chances are that you will need a comparison operator more than just once, so why not define it once in the logical place (along with the type)?
Personally, I prefer writing code once and keeping it some place that is particularly easy to find. I also favor writing code that is idiomatic with respect to the language that I am writing in. In C++, I expect constructors, destructors, less-than operators, and the like. You are better off writing a less-than operator and then letting std::sort(vec.begin(), vec.end()) do its proper job. If you really want to make your code clear, then do something like:
struct S {
int a, b;
bool less_than(S const& other) {...};
};
bool operator<(S const& left, S const& right) {
return left.less_than(right);
}
If you define a member function to do the comparison and then provide the operator at the namespace-level, life is much easier when you have to negate the comparison. For example:
void foo(std::vector<S>& svec) {
std::sort(svec.begin(), svec.end(), std::not1(&S::less_than));
}
This code is untested but you get the idea.
If you are using C++11, you can use Linq to sort it like this:
auto q = LINQ(from(x, vec) orderby(ascending x.a, descending x.b));
Or if you don't like the query syntax, you can use the extension methods as well:
auto q = vec | linq::order_by([](s x) { return x.a; })
| linq::then_by_descending([](s x) { return x.b; });
Both are functionally equivalent.
For c++ the standard library offers the algorithms header which contains many useful functions that work on various containers. An example for your purposes would be:
bool sCompare(const s & s1, const s & s2) {
return s1.a+s1.b/1000 < s2/a+s2.b/1000;
}
vector<s> vec;
...
std::sort(vec.begin(), vec.end(), sCompare);
sort has a prototype that looks something like:
template<class Iter, class Op>
void sort(Iter& start, Iter& stop, Op& op);
Most of these algorithms should work for any of the standard containers (some are specific to sorted containers, some associative, etc). I believe sort (and others) will even work with arrays (Iterators, the foundation of algorithms, are built to emulate pointers to array elements as closely as possible.)
In short, using modern c++ you will not need a special preprocessor to achieve what you're trying to do.
BTW, if you've declared that you're using std or std::sort, then sort(vec.begin(),vec.end()) is valid c++;
Related
I'm using the ranges library to help filer data in my classes, like this:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(v) {}
std::vector<int> getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
return std::vector<int>(evens.begin(), evens.end());
}
private:
std::vector<int> vec;
};
In this case, a new vector is constructed in the getEvents() function. To save on this overhead, I'm wondering if it is possible / advisable to return the range directly from the function?
class MyClass
{
public:
using RangeReturnType = ???;
MyClass(std::vector<int> v) : vec(v) {}
RangeReturnType getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
// ...
return evens;
}
private:
std::vector<int> vec;
};
If it is possible, are there any lifetime considerations that I need to take into account?
I am also interested to know if it is possible / advisable to pass a range in as an argument, or to store it as a member variable. Or is the ranges library more intended for use within the scope of a single function?
This was asked in op's comment section, but I think I will respond it in the answer section:
The Ranges library seems promising, but I'm a little apprehensive about this returning auto.
Remember that even with the addition of auto, C++ is a strongly typed language. In your case, since you are returning evens, then the return type will be the same type of evens. (technically it will be the value type of evens, but evens was a value type anyways)
In fact, you probably really don't want to type out the return type manually: std::ranges::filter_view<std::ranges::ref_view<const std::vector<int>>, MyClass::getEvens() const::<decltype([](int i) {return ! (i % 2);})>> (141 characters)
As mentioned by #Caleth in the comment, in fact, this wouldn't work either as evens was a lambda defined inside the function, and the type of two different lambdas will be different even if they were basically the same, so there's literally no way of getting the full return type here.
While there might be debates on whether to use auto or not in different cases, but I believe most people would just use auto here. Plus your evens was declared with auto too, typing the type out would just make it less readable here.
So what are my options if I want to access a subset (for instance even numbers)? Are there any other approaches I should be considering, with or without the Ranges library?
Depends on how you would access the returned data and the type of the data, you might consider returning std::vector<T*>.
views are really supposed to be viewed from start to end. While you could use views::drop and views::take to limit to a single element, it doesn't provide a subscript operator (yet).
There will also be computational differences. vector need to be computed beforehand, where views are computed while iterating. So when you do:
for(auto i : myObject.getEven())
{
std::cout << i;
}
Under the hood, it is basically doing:
for(auto i : myObject.vec)
{
if(!(i % 2)) std::cout << i;
}
Depends on the amount of data, and the complexity of computations, views might be a lot faster, or about the same as the vector method. Plus you can easily apply multiple filters on the same range without iterating through the data multiple times.
In the end, you can always store the view in a vector:
std::vector<int> vec2(evens.begin(), evens.end());
So my suggestions is, if you have the ranges library, then you should use it.
If not, then vector<T>, vector<T*>, vector<index> depending on the size and copiability of T.
There's no restrictions on the usage of components of the STL in the standard. Of course, there are best practices (eg, string_view instead of string const &).
In this case, I can foresee no problems with handling the view return type directly. That said, the best practices are yet to be decided on since the standard is so new and no compiler has a complete implementation yet.
You're fine to go with the following, in my opinion:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(std::move(v)) {}
auto getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
private:
std::vector<int> vec;
};
As you can see here, a range is just something on which you can call begin and end. Nothing more than that.
For instance, you can use the result of begin(range), which is an iterator, to traverse the range, using the ++ operator to advance it.
In general, looking back at the concept I linked above, you can use a range whenever the conext code only requires to be able to call begin and end on it.
Whether this is advisable or enough depends on what you need to do with it. Clearly, if your intention is to pass evens to a function which expects a std::vector (for instance it's a function you cannot change, and it calls .push_back on the entity we are talking about), you clearly have to make a std::vector out of filter's output, which I'd do via
auto evens = vec | ranges::views::filter(whatever) | ranges::to_vector;
but if all the function which you pass evens to does is to loop on it, then
return vec | ranges::views::filter(whatever);
is just fine.
As regards life time considerations, a view is to a range of values what a pointer is to the pointed-to entity: if the latter is destroied, the former will be dangling, and making improper use of it will be undefined behavior. This is an erroneous program:
#include <iostream>
#include <range/v3/view/filter.hpp>
#include <string>
using namespace ranges;
using namespace ranges::views;
auto f() {
// a local vector here
std::vector<std::string> vec{"zero","one","two","three","four","five"};
// return a view on the local vecotor
return vec | filter([](auto){ return true; });
} // vec is gone ---> the view returned is dangling
int main()
{
// the following throws std::bad_alloc for me
for (auto i : f()) {
std::cout << i << std::endl;
}
}
You can use ranges::any_view as a type erasure mechanism for any range or combination of ranges.
ranges::any_view<int> getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
I cannot see any equivalent of this in the STL ranges library; please edit the answer if you can.
EDIT: The problem with ranges::any_view is that it is very slow and inefficient. See https://github.com/ericniebler/range-v3/issues/714.
It is desirable to declare a function returning a range in a header and define it in a cpp file
for compilation firewalls (compilation speed)
stop the language server from going crazy
for better factoring of the code
However, there are complications that make it not advisable:
How to get type of a view?
If defining it in a header is fine, use auto
If performance is not a issue, I would recommend ranges::any_view
Otherwise I'd say it is not advisable.
Say I have the following struct in C++
struct Foo {
double a;
int b;
};
And say I have a parameter to some function declared as follows:
const std::initializer_list<Foo> &args;
Is there an concise way to extract just one field from the elements in args to get, for instance, just an std::vector containing each b field from the original args list?
Of course, I know I could do this by just explicitly writing it out as a loop:
std::vector<int> result;
for(auto &x:args) {
result.push_back(x.b);
}
... but given that I can copy an entire initializer_list of any type to a like-typed vector in a single line of C++, just using functions like std::copy and std::back_inserter, I am wondering if there is a more elegant way to do this as well, using stl or C++11 facilities that may already exist.
You could use std::transform and add elements to the vector via std::back_inserter:
std::transform(std::begin(args), std::end(args), std::back_inserter(result),
[] (const Foo & foo) { return foo.b; });
If you find the lambda too verbose you can use std::mem_fn instead (credit goes to #StoryTeller).
std::transform(std::begin(args), std::end(args), std::back_inserter(result), std::mem_fn(&Foo::b));
But then again, your approach isn't necessary bad since it's pretty readable and does the job just fine (might have some performance issues tho).
One solution can be using linq++ like the following:
shared_ptr<vector<Foo>> foo_list;
// suppose foo_list is being filled
shared_ptr<vector> bs = from(foo_list).select(&_1 ->* &Foo::b).get();
I am trying to find out if there is an actual computational benefit to using lambda expressions in C++, namely "this code compiles/runs faster/slower because we use lambda expressions" or is it just a neat development perk open for abuse by poor coders trying to look cool?
I understand this question may seem subjective, but I would much appreciate the opinion of the community on this matter.
The benefit is what's the most important thing in writing computer programs: easier to understand code. I'm not aware of any performance considerations.
C++ allows, to a certain extend, to do Functional Programming. Consider this:
std::for_each( begin, end, doer );
The problem with this is that the function (object) doer
specifies what's done in the loop
yet somewhat hides what's actually done (you have to look up the function object's operator()'s implementation)
must be defined in a different scope than the std::for_each call
contains a certain amount of boilerplate code
is often throw-away code that's not used for anything but this one loop construct
Lambdas considerably improve on all these (and maybe some more I forgot).
I don't think it's nearly as much about the computational performance as increasing the expressive power of the language.
There's no performance benefit per se, but the need for lambda came as a consequence of the wide adoption of the STL and its design ideas.
Specifically, the STL algorithms make frequent use of functors. Without lambda, these functors need to be previously declared to be used. Lambdas make it possible to have 'anonymous', in-place functors.
This is important because there are many situations in which you need to use a functor only once, and you don't want to give a name to it for two reasons: you don't want to pollute the namespace, and in those specific cases the name you give is either vague or extremely long.
I, for instance, use STL a lot, but without C++0x I use much more for() loops than the for_each() algorithm and its cousins. That's because if I were to use for_each() instead, I'd need to get the code from inside the loop and declare a functor for it. Also all the local variables before the loop wouldn't be accessible, so I'd need to write additional code to pass them as parameters to the functor constructor, or another thing equivalent. As a consequence, I tend not to use for_each() unless there's strong motivation, otherwise the code would be longer and more difficult to read.
That's bad, because it's well known that using for_each() and similar algorithms gives much more room to the compiler & the library for optimizations, including automatic parallelism. So, indirectly, lambda will favour more efficient code.
IMO, the most important thing about lambda's is it keeps related code close together. If you have this code:
std::for_each(begin, end, unknown_function);
You need to navigate over to unknown_function to understand what the code does. But with a lambda, the logic can be kept together.
Lambdas are syntactic sugar for functor classes, so no, there is no computational benefit. As far as the motivation, probably any of the other dozen or so popular languages which have lambdas in them?
One could argue it aids in the readability of code (having your functor declared inline where it is used).
Although I think other parts of C++0x are more important, lambdas are more than just "syntactic sugar" for C++98 style function objects, because they can capture contexts, and they do so by name and then they can take those contexts elsewhere and execute. This is something new, not something that "compiles faster/slower".
#include <iostream>
#include <vector>
#include <functional>
void something_else(std::function<void()> f)
{
f(); // A closure! I wonder if we can write in CPS now...
}
int main()
{
std::vector<int> v(10,0);
std::function<void ()> f = [&](){ std::cout << v.size() << std::endl; };
something_else(f);
}
Well, compare this:
int main () {
std::vector<int> x = {2, 3, 5, 7, 11, 13, 17, 19};
int center = 10;
std::sort(x.begin(), x.end(), [=](int x, int y) {
return abs(x - center) < abs(y - center);
});
std::for_each(x.begin(), x.end(), [](int v) {
printf("%d\n", v);
});
return 0;
}
with this:
// why enforce this to be defined nonlocally?
void printer(int v) {
printf("%d\n", v);
}
int main () {
std::vector<int> x = {2, 3, 5, 7, 11, 13, 17, 19};
// why enforce we to define a whole struct just need to maintain a state?
struct {
int center;
bool operator()(int x, int y) const {
return abs(x - center) < abs(y - center);
}
} comp = {10};
std::sort(x.begin(), x.end(), comp);
std::for_each(x.begin(), x.end(), printer);
return 0;
}
"a neat development perk open for abuse by poor coders trying to look cool?"...whatever you call it, it makes code a lot more readable and maintainable. It does not increase the performance.
Most often, a programmer iterates over a range of elements (searching for an element, accumulating elements, sorting elements etc). Using functional style, you immediatly see what the programmer intends to do, as different from using for loops, where everything "looks" the same.
Compare algorithms + lambda:
iterator longest_tree = std::max_element(forest.begin(), forest.end(), [height]{arg0.height>arg1.height});
iterator first_leaf_tree = std::find_if(forest.begin(), forest.end(), []{is_leaf(arg0)});
std::transform(forest.begin(), forest.end(), firewood.begin(), []{arg0.trans(...));
std::for_each(forest.begin(), forest.end(), {arg0.make_plywood()});
with oldschool for-loops;
Forest::iterator longest_tree = it.begin();
for (Forest::const_iterator it = forest.begin(); it != forest.end(); ++it{
if (*it.height() > *longest_tree.height()) {
longest_tree = it;
}
}
Forest::iterator leaf_tree = it.begin();
for (Forest::const_iterator it = forest.begin(); it != forest.end(); ++it{
if (it->type() == LEAF_TREE) {
leaf_tree = it;
break;
}
}
for (Forest::const_iterator it = forest.begin(), jt = firewood.begin();
it != forest.end();
it++, jt++) {
*jt = boost::transformtowood(*it);
}
for (Forest::const_iterator it = forest.begin(); it != forest.end(); ++it{
std::makeplywood(*it);
}
(I know this pieace of code contains syntactic errors.)
I have a sequence, e.g
std::vector< Foo > someVariable;
and I want a loop which iterates through everything in it.
I could do this:
for (int i=0;i<someVariable.size();i++) {
blah(someVariable[i].x,someVariable[i].y);
woop(someVariable[i].z);
}
or I could do this:
for (std::vector< Foo >::iterator i=someVariable.begin(); i!=someVariable.end(); i++) {
blah(i->x,i->y);
woop(i->z);
}
Both these seem to involve quite a bit of repetition / excessive typing. In an ideal language I'd like to be able to do something like this:
for (i in someVariable) {
blah(i->x,i->y);
woop(i->z);
}
It seems like iterating through everything in a sequence would be an incredibly common operation. Is there a way to do it in which the code isn't twice as long as it should have to be?
You could use for_each from the standard library. You could pass a functor or a function to it. The solution I like is BOOST_FOREACH, which is just like foreach in other languages. C+0x is gonna have one btw.
For example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
void print(int v)
{
std::cout << v << std::endl;
}
int main()
{
std::vector<int> array;
for(int i = 0; i < 100; ++i)
{
array.push_back(i);
}
std::for_each(array.begin(), array.end(), print); // using STL
foreach(int v, array) // using Boost
{
std::cout << v << std::endl;
}
}
Not counting BOOST_FOREACH which AraK already suggested, you have the following two options in C++ today:
void function(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
std::for_each(someVariable.begin(), someVariable.end(), function);
struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());
Both require you to specify the "body" of the loop elsewhere, either as a function or as a functor (a class which overloads operator()). That might be a good thing (if you need to do the same thing in multiple loops, you only have to define the function once), but it can be a bit tedious too. The function version may be a bit less efficient, because the compiler is generally unable to inline the function call. (A function pointer is passed as the third argument, and the compiler has to do some more detailed analysis to determine which function it points to)
The functor version is basically zero overhead. Because an object of type functor is passed to for_each, the compiler knows exactly which function to call: functor::operator(), and so it can be trivially inlined and will be just as efficient as your original loop.
C++0x will introduce lambda expressions which make a third form possible.
std::for_each(someVariable.begin(), someVariable.end(), [](Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
});
Finally, it will also introduce a range-based for loop:
for(Foo& arg : my_someVariable)
{
blah(arg.x, arg.y);
woop(arg.z);
}
So if you've got access to a compiler which supports subsets of C++0x, you might be able to use one or both of the last forms. Otherwise, the idiomatic solution (without using Boost) is to use for_eachlike in one of the two first examples.
By the way, MSVS 2008 has a "for each" C++ keyword. Look at How to: Iterate Over STL Collection with for each.
int main() {
int retval = 0;
vector<int> col(3);
col[0] = 10;
col[1] = 20;
col[2] = 30;
for each( const int& c in col )
retval += c;
cout << "retval: " << retval << endl;
}
Prefer algorithm calls to hand-written loops
There are three reasons:
1) Efficiency: Algorithms are often more efficient than the loops programmers produce
2) Correctness: Writing loops is more subject to errors than is calling algorithms.
3) Maintainability: Algorithm calls often yield code that is clearer and more
straightforward than the corresponding explicit loops.
Prefer almost every other algorithm to for_each()
There are two reasons:
for_each is extremely general, telling you nothing about what's really being done, just that you're doing something to all the items in a sequence.
A more specialized algorithm will often be simpler and more direct
Consider, an example from an earlier reply:
void print(int v)
{
std::cout << v << std::endl;
}
// ...
std::for_each(array.begin(), array.end(), print); // using STL
Using std::copy instead, that whole thing turns into:
std::copy(array.begin(), array.end(), std::ostream_iterator(std::cout, "\n"));
"struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());"
I think approaches like these are often needlessly baroque for a simple problem.
do i=1,N
call blah( X(i),Y(i) )
call woop( Z(i) )
end do
is perfectly clear, even if it's 40 years old (and not C++, obviously).
If the container is always a vector (STL name), I see nothing wrong with an index and nothing wrong with calling that index an integer.
In practice, often one needs to iterate over multiple containers of the same size simultaneously and peel off a datum from each, and do something with the lot of them. In that situation, especially, why not use the index?
As far as SSS's points #2 and #3 above, I'd say it could be so for complex cases, but often iterating 1...N is often as simple and clear as anything else.
If you had to explain the algorithm on the whiteboard, could you do it faster with, or without, using 'i'? I think if your meatspace explanation is clearer with the index, use it in codespace.
Save the heavy C++ firepower for the hard targets.
As most programmers I admire and try to follow the principles of Literate programming, but in C++ I routinely find myself using std::pair, for a gazillion common tasks. But std::pair is, IMHO, a vile enemy of literate programming...
My point is when I come back to code I've written a day or two ago, and I see manipulations of a std::pair (typically as an iterator) I wonder to myself "what did iter->first and iter->second mean???".
I'm guessing others have the same doubts when looking at their std::pair code, so I was wondering, has anyone come up with some good solutions to recover literacy when using std::pair?
std::pair is a good way to make a "local" and essentially anonymous type with essentially anonymous columns; if you're using a certain pair over so large a lexical space that you need to name the type and columns, I'd use a plain struct instead.
How about this:
struct MyPair : public std::pair < int, std::string >
{
const int& keyInt() { return first; }
void keyInt( const int& keyInt ) { first = keyInt; }
const std::string& valueString() { return second; }
void valueString( const std::string& valueString ) { second = valueString; }
};
It's a bit verbose, however using this in your code might make things a little easier to read, eg:
std::vector < MyPair > listPairs;
std::vector < MyPair >::iterator iterPair( listPairs.begin() );
if ( iterPair->keyInt() == 123 )
iterPair->valueString( "hello" );
Other than this, I can't see any silver bullet that's going to make things much clearer.
typedef std::pair<bool, int> IsPresent_Value;
typedef std::pair<double, int> Price_Quantity;
...you get the point.
You can create two pairs of getters (const and non) that will merely return a reference to first and second, but will be much more readable. For instance:
string& GetField(pair& p) { return p.first; }
int& GetValue(pair& p) { return p.second; }
Will let you get the field and value members from a given pair without having to remember which member holds what.
If you expect to use this a lot, you could also create a macro that will generate those getters for you, given the names and types: MAKE_PAIR_GETTERS(Field, string, Value, int) or so. Making the getters straightforward will probably allow the compiler to optimize them away, so they'll add no overhead at runtime; and using the macro will make it a snap to create those getters for whatever use you make of pairs.
You could use boost tuples, but they don't really alter the underlying issue: Do your really want to access each part of the pair/tuple with a small integral type, or do you want more 'literate' code. See this question I posted a while back.
However, boost::optional is a useful tool which I've found replaces quite a few of the cases where pairs/tuples are touted as ther answer.
Recently I've found myself using boost::tuple as a replacement for std::pair. You can define enumerators for each member and so it's obvious what each member is:
typedef boost::tuple<int, int> KeyValueTuple;
enum {
KEY
, VALUE
};
void foo (KeyValueTuple & p) {
p.get<KEY> () = 0;
p.get<VALUE> () = 0;
}
void bar (int key, int value)
{
foo (boost:tie (key, value));
}
BTW, comments welcome on if there is a hidden cost to using this approach.
EDIT: Remove names from global scope.
Just a quick comment regarding global namespace. In general I would use:
struct KeyValueTraits
{
typedef boost::tuple<int, int> Type;
enum {
KEY
, VALUE
};
};
void foo (KeyValueTuple::Type & p) {
p.get<KeyValueTuple::KEY> () = 0;
p.get<KeyValueTuple::VALUE> () = 0;
}
It does look to be the case that boost::fusion does tie the identity and value closer together.
As Alex mentioned, std::pair is very convenient but when it gets confusing create a structure and use it in the same way, have a look at std::pair code, it's not that complex.
I don't like std::pair as used in std::map either, map entries should have had members key and value.
I even used boost::MIC to avoid this. However, boost::MIC also comes with a cost.
Also, returning a std::pair results in less than readable code:
if (cntnr.insert(newEntry).second) { ... }
???
I also found that std::pair is commonly used by the lazy programmers who needed 2 values but didn't think why these values where needed together.