I have a complex algorithm. This uses many variables, calculates helper arrays at initialization and also calculates arrays along the way. Since the algorithm is complex, I break it down into several functions.
Now, I do not actually see how this would idiomatically be a class; I am simply used to having algorithms as functions. The usage would simply be:
Calculation calc(/* several parameters */);
calc.calculate();
// get the heterogenous results via getters
On the other hand, putting this into a class has the following advantages:
I do not have to pass all the variables to the other functions/methods
arrays initialized at the beginning of the algorithm are accessible throughout the class in each function
my code is shorter and (imo) clearer
A hybrid way would be to put the algorithm class into a source file and access it via a function that uses it. The user of the algorithm would not see the class.
Does anyone have valuable thoughts that might help me out?
Thank you very much in advance!
I have a complex algorithm. This uses many variables, calculates helper arrays at initialization and also calculates arrays along the way.[...]
Now, I do not actually see how this would idiomatically be a class
It is not, but many people do the same thing you do (so did I a few times).
Instead of creating a class for your algorithm, consider transforming your inputs and outputs into classes/structures.
That is, instead of:
Calculation calc(a, b, c, d, e, f, g);
calc.calculate();
// use getters on calc from here on
you could write:
CalcInputs inputs(a, b, c, d, e, f, g);
CalcResult output = calculate(inputs); // calculate is now free function
// use getters on output from here on
This doesn't create any problems and performs the same (actually better) grouping of data.
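For illustration, the input and output bundles might be plain structs along these lines (the field names are invented placeholders, not anything from the question):

#include <vector>

struct CalcInputs
{
    double a, b, c;                 // whatever the algorithm actually consumes
    std::vector<double> samples;    // e.g. a larger input array
};

struct CalcResult
{
    double primary;
    std::vector<double> derived;    // heterogeneous results live here
};

CalcResult calculate(const CalcInputs& inputs);  // free function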
I'd say it is very idiomatic to represent an algorithm (or perhaps better, a computation) as a class. One of the definitions of a class in OOP is "data and the functions that operate on that data." A complex algorithm with its inputs, outputs and intermediary data matches this definition perfectly.
I've done this myself several times, and it simplifies (human) code flow analysis significantly, making the whole thing easier to reason about, to debug and to test.
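A rough sketch of that shape, with invented member and step names:

class Calculation
{
public:
    Calculation(/* several parameters */);  // builds the helper arrays once
    void calculate();                       // drives part1(), part2(), ...
    double resultB() const;                 // getters for the heterogeneous results
private:
    void part1();                           // every step sees the shared members
    void part2();
    // helper arrays, intermediate data and results live here as members
};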
If the abstraction for the client code is an algorithm, you
probably want to keep a pure functional interface, and not
introduce additional types there. It's quite common, on the
other hand, for such a function to be implemented in a source
file which defines a common data structure or class for its
internal use, so you might have:
double calculation( /* input parameters */ )
{
SupportClass calc( /* input parameters */ );
calc.part1();
calc.part2();
// etc...
return calc.results();
}
Depending on how your code is organized, SupportClass will be
in an unnamed namespace in the source file (probably the most
common case), or in a "private" header, included only by the
sources involved in the algorithm.
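A rough sketch of that layout (part1/part2 are the same placeholders as above):

// calculation.cpp
namespace
{
    class SupportClass
    {
    public:
        SupportClass(/* input parameters */);  // sets up the helper arrays
        void part1();
        void part2();
        double results() const;
    private:
        // intermediate state shared by part1() and part2()
    };
}

// the public calculation() function shown above goes here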
It really depends on what kind of algorithm you want to encapsulate. Generally I agree with John Carmack: "Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function."
It really boils down to this: does the algorithm need access to parts of the class that are not supposed to be public? If the answer is yes (unless you are willing to refactor your class interface, depending on the specific case), you should go with a member function; if not, a free function is good enough.
Take for example the standard library. Most of the algorithms are provided as free functions because they only access the public interface of the class (with iterators for standard containers, for example).
Do you need to call the exact same functions in the exact same order each time? Then you shouldn't be requiring calling code to do this. Splitting your algorithm into multiple functions is fine, but I'd still have one call the next and then the next and so on, with a struct of results/parameters being passed along the way. A class doesn't feel right for a one-off invocation of some procedure.
The only way I'd do this with a class is if the class encapsulates all the input data itself, and you then call myClass.nameOfMyAlgorithm() on it, among other potential operations. Then you have data+manipulators. But just manipulators? Yeah, I'm not so sure.
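As a hedged sketch of the chaining idea above (all names invented): each step takes the working state, does its part and hands off to the next, and callers see a single free function.

struct WorkingState
{
    // inputs plus the intermediate arrays accumulated along the way
};

namespace
{
    void stepThree(WorkingState& s) { /* final results end up in s */ }
    void stepTwo(WorkingState& s)   { /* ... */ stepThree(s); }
}

void runAlgorithm(WorkingState& s)  // the only entry point callers see
{
    // initialize helper arrays in s
    stepTwo(s);
}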
In modern C++ the distinction has been eroded quite a bit. Even from the operator overloading of the pre-ANSI language, you could create a class whose instances are syntactically like functions:
struct Multiplier
{
int factor_;
Multiplier(int f) : factor_(f) { }
int operator()(int v) const
{
return v * factor_;
}
};
Multiplier doubler(2);
std::cout << doubler(3) << std::endl; // prints 6
Such a class/struct is called a functor, and can capture "contextual" values in its constructor. This allows you to effectively pass the parameters to a function in two stages: some in the constructor call, some later each time you call it for real. This is called partial function application.
To relate this to your example, your calculate member function could be turned into operator(), and then the Calculation instance would be a function! (or near enough.)
To unify these ideas, you can try thinking of a plain function as a functor of which there is only one instance (and hence no need for a constructor - although this is no guarantee that the function only depends on its formal parameters: it might depend on global variables...)
Rather than asking "Should I put this algorithm in a function or a class?", ask yourself "Would it be useful to be able to pass the parameters to this algorithm in two or more stages?" In your example, all the parameters go into the constructor and none into the later call to calculate, so it makes little sense to ask users of your class to make two calls.
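For illustration, here is roughly what turning calculate into operator(), as suggested above, would look like (Result is an invented placeholder type):

class Calculation
{
public:
    Calculation(/* several parameters */);  // stage one: capture the inputs
    Result operator()();                    // stage two: run the whole algorithm
private:
    // inputs and helper arrays set up in the constructor
};

Calculation calc(/* several parameters */);
Result r = calc();  // reads like a function call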
In C++11 the distinction breaks down further (and things get a lot more convenient), in recognition of the fluidity of these ideas:
auto doubler = [] (int val) { return val * 2; };
std::cout << doubler(3) << std::endl; // prints 6
Here, doubler is a lambda, which is essentially a nifty way to declare an instance of a compiler-generated class that implements the () operator.
Reproducing the original example more exactly, we would want a function-like thing called multiplier that accepts a factor, and returns another function-like thing that accepts a value v and returns v * factor.
auto multiplier = [] (int factor)
{
return [=] (int v) { return v * factor; };
};
auto doubler = multiplier(2);
std::cout << doubler(3) << std::endl; // prints 6
Note the pattern: ultimately we're multiplying two numbers, but we specify the numbers in two steps. The functor we get back from calling multiplier acts like a "package" containing the first number.
Although lambdas are relatively new, they are likely to become a very common part of C++ style (as they have in every other language they've been added to).
But sadly at this point we've reached the "cutting edge" as the above example works in GCC but not in MSVC 12 (I haven't tried it in MSVC 13). It does pass the intellisense checking of MSVC 12 though (they use two completely different compilers)! And you can fix it by wrapping the inner lambda with std::function<int(int)>( ... ).
Even so, you can use these ideas in old-school C++ when writing functors by hand.
Looking further ahead, resumable functions may make it into some future version of the language (Microsoft is pushing hard for them as they are practically identical to async/await in C#) and that is yet another blurring of the distinction between functions and classes (a resumable function acts like a constructor for a state machine class).
So I have this huge tree that is basically a big switch/case with string keys and different function calls on one common object depending on the key and one piece of metadata.
Every entry basically looks like this
} else if ( strcmp(key, "key_string") == 0) {
((class_name*)object)->do_something();
} else if ( ...
where do_something can have different invocations, so I can't just use function pointers. Also, some keys require object to be cast to a subclass.
Now, if I were to code this in a higher level language, I would use a dictionary of lambdas to simplify this.
It occurred to me that I could use macros to simplify this to something like
case_call("key_string", class_name, do_something());
case_call( /* ... */ )
where case_call would be a macro that would expand this code to the first code snippet.
However, I am very much on the fence whether that would be considered good style. I mean, it would reduce typing work and improve the DRYness of the code, but then it really seems to abuse the macro system somewhat.
Would you go down that road, or rather type out the whole thing? And what would be your reasoning for doing so?
Edit
Some clarification:
This code is used as a glue layer between a simplified scripting API which accesses several different aspects of a C++ API as simple key-value properties. The properties are implemented in different ways in C++, though: some have getter/setter methods, some are set in a special struct. Scripting actions reference C++ objects cast to a common base class. However, some actions are only available on certain subclasses and have to be cast down.
Further down the road, I may change the actual C++ API, but for the moment it has to be regarded as unchangeable. Also, this has to work on an embedded compiler, so Boost and C++11 are (sadly) not available.
I would suggest you slightly reverse the roles. You are saying that the object is already some class that knows how to handle a certain situation, so add a virtual void handle(const char * key) in your base class and let the object check in the implementation if it applies to it and do whatever is necessary.
This would not only eliminate the long if-else-if chain, but would also be more type safe and would give you more flexibility in handling those events.
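A rough sketch of that reversal, with hypothetical names; each subclass reacts only to the keys it understands, and the glue layer no longer needs casts:

#include <cstring>

class Base
{
public:
    virtual ~Base() {}
    // returns true if this object recognized and handled the key
    virtual bool handle(const char* key) = 0;
};

class Derived : public Base
{
public:
    virtual bool handle(const char* key)
    {
        if (std::strcmp(key, "key_string") == 0) { do_something(); return true; }
        return false;
    }
private:
    void do_something() { /* the real work */ }
};

// the glue layer shrinks to something like:
//     if (!object->handle(key)) { /* unknown key */ }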
That seems to me an appropriate use of macros. They are, after all, made for eliding syntactic repetition. However, when you have syntactic repetition, it is not always the fault of the language; there are probably better design choices that would let you avoid the repetition altogether.
The general wisdom is to use a table mapping keys to actions:
std::map<std::string, void(Class::*)()> table;
Then look up and invoke the action in one go:
(object->*table[key])();
Or use find to check for failure:
const auto i = table.find(key);
if (i != table.end())
(object->*(i->second))();
else
throw std::runtime_error(...);
But if as you say there is no common signature for the functions (i.e., you can’t use member function pointers) then what you actually should do depends on the particulars of your project, which I don’t know. It might be that a macro is the only way to elide the repetition you’re seeing, or it might be that there’s a better way of going about it.
Ask yourself: why do my functions take different arguments? Why am I using casts? If you’re dispatching on the type of an object, chances are you need to introduce a common interface.
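A hedged sketch of such a common interface (all names invented): give every scriptable object one virtual entry point and let each concrete class translate the key into its own getters, setters or struct fields, so the glue layer no longer casts at all.

#include <string>

class Scriptable
{
public:
    virtual ~Scriptable() {}
    // returns false if this object does not know the key
    virtual bool setProperty(const std::string& key, const std::string& value) = 0;
};

// the glue layer then becomes:
//     if (!object->setProperty(key, value)) { /* report unknown key */ }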
What's everybody's opinion on using lambdas to do nested functions in C++? For example, instead of this:
static void prepare_eggs()
{
...
}
static void prepare_ham()
{
...
}
static void prepare_cheese()
{
...
}
static void fry_ingredients()
{
...
}
void make_omlette()
{
prepare_eggs();
prepare_ham();
prepare_cheese();
fry_ingredients();
}
You do this:
void make_omlette()
{
auto prepare_eggs = [&]()
{
...
};
auto prepare_ham = [&]()
{
...
};
auto prepare_cheese = [&]()
{
...
};
auto fry_ingredients = [&]()
{
...
};
prepare_eggs();
prepare_ham();
prepare_cheese();
fry_ingredients();
}
Having come from the generation that learned how to code by using Pascal, nested functions make perfect sense to me. However, this usage seemed to confuse some of the less experienced developers in my group at work during a code review where I made use of lambdas in this way.
I don't see anything wrong with nested functions per se. I use lambdas for nested functions, but only if it meets some conditions:
It is called in more than one place (otherwise I just write the code directly, if it's not too long).
It is really an internal function, so calling it in any other context would not make sense.
It's short enough (maybe 10 lines max).
So in your example I would not use lambdas for reason number one.
Conceptually nested functions can be useful for the same reason why private methods in classes are useful. They enforce encapsulation and they make it easier to see the structure of the program. If a function is an implementation detail to some other function then why not make it explicitly so?
The biggest problem I see is with readability; it's more difficult to read code that has a lot of nesting and indenting. Also, people are not very comfortable with lambdas yet so resistance is expected.
For any given piece of code, make it as visible as necessary and as hidden as possible:
If the piece of code is used in only one place, write it there.
If it is used in multiple places inside the same function, emulate nested functions through lambdas.
If it is used by multiple functions, put it in a proper function.
You can already guess that you're doing something unorthodox from the comments you received. This is one of the reasons C++ has a bad reputation: people never stop abusing it. Lambdas are mainly used as inline function objects for standard library algorithms and in places that require some kind of callback mechanism. I think this covers 99% of use cases, and it should stay that way!
As Bjarne said in one of his lectures: "Not everything should be a template, and not everything should be an object."
And not everything should be a lambda :) there is nothing wrong with a free standing function.
It's a very limited use case. For starters, the functionality present in the local function must be needed at several spots inside the enclosing function such that the resulting local refactoring will be a win in readability. Otherwise I will write the functionality inline, perhaps putting it in a block if that helps.
But at the same time, the functionality must be local or specific enough that I don't have an incentive to refactor the functionality outside of the (not so) enclosing function, where I could perhaps reuse it entirely in another function at some point. It must also be short: otherwise I'm just going to move it out, perhaps putting it in an anonymous namespace (or namespace detail in a header) or some such. It doesn't take much for me to trade locality off in favour of compactness (long functions are a pain to review).
Note that the above is strictly language agnostic. I don't think C++ puts a particular spin on it. If there is one particular piece of C++ advice I have to give on the topic, however, it's that I would proscribe using a default by-reference capture ([&]). There would be no way to tell whether that particular lambda expression describes a closure or a local function without carefully reviewing the whole body. That wouldn't be so bad (it's not that closures are 'scary') if not for the fact that by-reference captures ([&], [&foo]) allow mutations even if the lambda is not marked mutable, and by-value captures ([=], [foo]) can make an undesirable copy, or even attempt an impossible copy for move-only types. All in all I'd rather not capture anything at all if possible (that's what parameters are for!), and use individual captures when needed.
To sum up:
// foo is expensive to copy, but ubiquitous enough
// that capturing it rather than passing it as a parameter
// is acceptable
auto const& foo_view = foo;
auto do_quux = [&foo_view](arg_type0 arg0, arg_type1 arg1) -> quux_type
{
auto b = baz(foo_view, arg0, arg1);
b.frobnicate();
return foo_view.quux(b);
};
// use do_quux several times later
In my opinion it's useless, but you can accomplish that using only C++03
void foo() {
struct {
void operator()() {}
} bar;
bar();
}
But once again, IMHO it's useless.
I started using stl containers because they came in very handy when I needed the functionality of a list, set and map and had nothing else available in my programming environment. I did not care much about the ideas behind it. STL documentation was interesting up to the point where it came to functions, etc. Then I skipped reading and just used the containers.
But yesterday, still being relaxed from my holidays, I just gave it a try and wanted to go a bit more the stl way. So I used the transform function (can I have a little bit of applause for me, thank you).
From an academic point of view it really looked interesting, and it worked. But the thing that bothers me is that if you intensify the use of those functions, you need thousands of helper classes for almost everything you want to do in your code. The whole logic of the program is sliced into tiny pieces. This slicing is not the result of good coding habits; it's just a technical need, and something that probably makes my life harder, not easier.
I learned the hard way that you should always choose the simplest approach that solves the problem at hand. I can't see what, for example, the for_each function is doing for me that justifies the use of a helper class over a few simple lines of code inside a normal loop, where everybody can see what is going on.
I would like to know, what you are thinking about my concerns? Did you see it like I do when you started working this way and have changed your mind when you got used to it? Are there benefits that I overlooked? Or do you just ignore this stuff as I did (and will go on doing it, probably).
Thanks.
PS: I know that there is a real for_each loop in boost. But I ignore it here since it is just a convenient way for my usual loops with iterators I guess.
The whole logic of the program is sliced into tiny pieces. This slicing is not the result of good coding habits; it's just a technical need, and something that probably makes my life harder, not easier.
You're right, to a certain extent. That's why the upcoming revision to the C++ standard will add lambda expressions, allowing you to do something like this:
std::for_each(vec.begin(), vec.end(), [&](int& val){ val++; });
but I also think the split-up that is currently required is often a good coding habit anyway. You're effectively separating the code describing the operation you want to perform from the act of applying it to a sequence of values. It is some extra boilerplate, and sometimes it's just annoying, but I think it also often leads to good, clean code.
Doing the above today would look like this:
void incr(int& val) { ++val; }
// and at the call-site
std::for_each(vec.begin(), vec.end(), incr);
Instead of bloating up the call site with a complete loop, we have a single line describing:
which operation is performed (if it is named appropriately)
which elements are affected
so it's shorter, and conveys the same information as the loop, but more concisely.
I think those are good things. The drawback is that we have to define the incr function elsewhere. And sometimes that's just not worth the effort, which is why lambdas are being added to the language.
I find it most useful when used along with boost::bind and boost::lambda so that I don't have to write my own functor. This is just a tiny example:
#include <vector>
#include <algorithm>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>

class A
{
public:
A() : m_n(0)
{
}
void set(int n)
{
m_n = n;
}
private:
int m_n;
};
int main(){
using namespace boost::lambda;
std::vector<A> a;
a.push_back(A());
a.push_back(A());
std::for_each(a.begin(), a.end(), bind(&A::set, _1, 5));
return 0;
}
You'll find disagreement among experts, but I'd say that for_each and transform are a bit of a distraction. The power of STL is in separating non-trivial algorithms from the data being operated on.
Boost's lambda library is definitely worth experimenting with to see how you get on with it. However, even if you find the syntax satisfactory, the awesome amount of machinery involved has disadvantages in terms of compile time and debug-ability.
My advice is use:
for (Range::const_iterator i = r.begin(), end = r.end(); i != end; ++i)
{
*out++ = .. // for transform
}
instead of for_each and transform, but more importantly get familiar with the algorithms that are very useful: sort, unique, rotate to pick three at random.
Incrementing a counter for each element of a sequence is not a good example for for_each.
If you look at better examples, you may find it makes the code much clearer to understand and use.
This is some code I wrote today:
// assume some SinkFactory class is defined
// and mapItr is an iterator of a std::map<int,std::vector<SinkFactory*> >
std::for_each(mapItr->second.begin(), mapItr->second.end(),
checked_delete<SinkFactory>);
checked_delete is part of boost, but the implementation is trivial and looks like this:
template<typename T>
void checked_delete(T* pointer)
{
delete pointer;
}
The alternative would have been to write this:
for(vector<SinkFactory*>::iterator pSinkFactory = mapItr->second.begin();
pSinkFactory != mapItr->second.end(); ++pSinkFactory)
delete (*pSinkFactory);
More than that, once you have that checked_delete written once (or if you already use boost), you can delete pointers in any sequence anywhere, with the same code, without caring what types you're iterating over (that is, you don't have to declare vector<SinkFactory*>::iterator pSinkFactory).
There is also a small performance improvement from the fact that with for_each, container.end() will only be called once, and potentially a great performance improvement depending on the for_each implementation (it could be implemented differently depending on the iterator tag received).
Also, if you combine boost::bind with stl sequence algorithms you can make all kinds of fun stuff (see here: http://www.boost.org/doc/libs/1_43_0/libs/bind/bind.html#with_algorithms).
I guess the C++ committee has the same concerns. The new C++0x standard, still awaiting ratification, introduces lambdas. This new feature will let you use the algorithms while writing simple helper functions directly in the algorithm's parameter list.
std::transform(in.begin(), in.end(), out.begin(), [](int a) { return ++a; });
Local classes are a great feature to solve this. For example:
void IncreaseVector(std::vector<int>& v)
{
class Increment
{
public:
int operator()(int& i)
{
return ++i;
}
};
std::for_each(v.begin(), v.end(), Increment());
}
IMO, this is way too much complexity for just an increment; it would be clearer to write it as a plain for loop. But when the operation you want to perform over a sequence becomes more complex, I find it useful to clearly separate the operation performed on each element from the actual loop statement. If your functor's name is properly chosen, the code gains an extra bit of description.
These are indeed real concerns, and they are being addressed in the next version of the C++ standard ("C++0x"), which should be published either at the end of this year or in 2011. That version of C++ introduces lambdas, which allow one to construct simple anonymous functions within another function; this makes it very easy to accomplish what you want without breaking your code into tiny little pieces. Lambdas are (experimentally?) supported in GCC as of GCC 4.5.
Libraries like the STL and Boost are complex partly because they need to cover every need and work on any platform.
As a user of these libraries -- you're not planning on remaking .NET are you? -- you can use their simplified goodies.
Here is possibly a simpler foreach from Boost I like to use:
BOOST_FOREACH(string& item, my_list)
{
...
}
It looks much neater and simpler than using .begin(), .end(), etc., and yet it works for pretty much any iterable collection (not just arrays/vectors).
I've found myself in a strange place, mentally. In a C++ project, I long for closures.
Background. There's a Document-type class with a public Render method which spawns a deep call tree. There's some transient state that only makes sense during rendering. Right now it resides in the class like regular member variables. However, this is not satisfactory on some levels - this data only makes sense during a Render call, why store it all the time? Passing it around in arguments would be ugly - there are around 15 variables there. Passing around a structure would add a lot of "RenderState->..." in the lower-level methods.
So what do I want? I want the world, like we all do. Specifically, a set of variables that are:
available to some methods in a class (not all of them)
accessible by name alone (no pState->... stuff - so that refactoring is easy)
not copied around on every method call
only live during a method call and up its call tree (assuming trees grow up)
live on a stack
I know I can have some of those properties with C++ - but not all of them. Tell me I'm not turning weird.
Heck, in Pascal, of all places, nested functions give you all that...
So what is a good workaround to emulate closures in C++, getting as many of the above benefits as possible?
Standard C++ since C++11 provides native lambda expressions, and several compilers (VC10 and later, GCC and Clang at least) implement them.
With GCC and Clang you can enable it with "-std=c++11" (or a later standard version if available). VC10 and later versions have it enabled without the need for flags.
By the way, boost::lambda (which is not perfect but works with C++03) also provides lambdas in C++.
You don't have nested functions, but you have local classes:
void Document::Render(Param)
{
class RenderState
{
public:
RenderState(Document&)
{
//...
}
void Go(Param);
private:
// "Nested" functions
// ....
// Data that nested functions operate on
// ...
};
RenderState s(*this);
s.Go(Param);
}
See this GotW article for more information
Personally, I'd go with the RenderState approach.
Alternatively, if there's a well-defined set of Render-only functions that all require access to the same data, I'd seriously investigate pulling those into their own DocumentRenderer class that contains both the appropriate methods and the appropriate member variables. (This is similar to Fowler's "method object" refactoring.)
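A hedged sketch of that method-object refactoring; the class and member names are invented, and the ~15 transient variables would become members of the renderer rather than of Document.

class DocumentRenderer
{
public:
    explicit DocumentRenderer(Document& doc) : doc_(doc) {}
    void Render();            // entry point for the deep call tree
private:
    void RenderHeader();      // lower-level steps see the state by plain name
    void RenderBody();
    Document& doc_;
    // the transient rendering state lives here, for this call only
};

void Document::Render()
{
    DocumentRenderer r(*this);
    r.Render();
}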
C++ doesn't have nested functions, but local classes can serve as an imperfect substitute. (Imperfect because a local class's methods cannot access the local variables of the enclosing function, and because in C++03 a local class cannot be used as a template argument.) A local class is simply a class that's declared, along with its methods, within the body of a function. Herb Sutter discusses local classes in more detail here.
Local classes are used to implement Boost's ScopeExit library. ScopeExit's reviewers noted that ScopeExit "suggests a method for creating a general closure mechanism as a library," so if you aren't happy with a RenderState or DocumentRenderer approach, ScopeExit's implementation may give you some ideas for closures in C++.
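For reference, a minimal Boost.ScopeExit usage sketch (assuming Boost is available); the macro builds a local functor that captures the listed variables by reference, which is the closure-like mechanism the reviewers were referring to:

#include <boost/scope_exit.hpp>

void f()
{
    bool committed = false;
    BOOST_SCOPE_EXIT( (&committed) )
    {
        if (!committed) { /* roll back whatever f() did so far */ }
    } BOOST_SCOPE_EXIT_END
    // ... work that may throw or return early ...
    committed = true;
}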
Currently there are no closures in C++ that would generate "ordinary" first-class functions (whether member or non-member). Moreover, there's no standard way to implement such closures.
Closure semantics are available for functors in template metaprogramming at compile time, but that's a completely different kind of beast. In order to obtain true run-time closure functionality for first-class functions, you have no other choice but to use a non-standard low-level implementation like this one, for example.
A functor is basically a closure.
Why the downvotes? Take Éric's comment, change void Go(Param); to void operator()(Param); and there you have it.
There is no way to keep the stack alive in a native application after the function has exited, but that would be necessary to make closures like the ones in JavaScript. And there is no way to reference a function's stack without doing something evil. A class that acts like a function (a functor) would have to get all the relevant information passed to it somehow, but this is as close as you get in C++. It has state, it has code, and you can pass it around.
Please explain, where am I wrong?
As long as the local variables you want to bind are in scope, you can try something like the following to bind them to your inner class. Though, as the GotW article posted above shows, it is a fragile solution.
#include <iostream>
using namespace std;
int main() {
int x = 1;
cout << x << endl; // 1
class Inner {
public:
Inner(int& x) : bound_x(x) {}
void do_sth() { ++bound_x; }
private:
int& bound_x;
};
Inner i(x);
i.do_sth();
cout << x << endl; // 2
x = 5;
i.do_sth();
cout << x << endl; // 6
return 0;
}