I am trying to understand the cases that require functors. Most of the answers on Stack Overflow and other websites emphasize being able to define different adders or multipliers when discussing the benefits of functors.
Can the use of functors go beyond them? What are some other uses of functors?
More often than not, functors are used with API calls that need some kind of function object. For example, sorting vectors of user-defined objects which don't have operator< (etc.) defined; the functor supplies the needed operator().
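For instance, here's a minimal sketch (with hypothetical Person and ByAge names) of passing a comparison functor to std::sort for a type that has no operator< of its own:

#include <algorithm>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

// Comparison functor: supplies the operator() that std::sort needs,
// since Person itself has no operator<.
struct ByAge {
    bool operator()(Person const& a, Person const& b) const { return a.age < b.age; }
};

int main() {
    std::vector<Person> people{{"Ada", 36}, {"Alan", 41}};
    std::sort(people.begin(), people.end(), ByAge{});
}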
There are some cases where a set of functors may prove useful. One such case arises when you have several algorithms which functionally do the same thing but achieve varying levels of accuracy. This happens a lot in numeric optimization: given the general form of a matrix, we might use a different technique to find the solution of a linear system (e.g., sparse vs. dense problem matrices can employ different algorithms to invert the matrix).
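As a rough, hypothetical sketch of that idea: the two solver functors below share a call signature so the surrounding algorithm is written once and handed whichever one fits the problem (the solver bodies are placeholders standing in for, e.g., a sparse vs. a dense solve):

#include <vector>

struct FastApproxSolver {
    std::vector<double> operator()(std::vector<double> const& rhs) const {
        return rhs;  // placeholder: cheap, approximate solve
    }
};

struct PreciseSolver {
    std::vector<double> operator()(std::vector<double> const& rhs) const {
        return rhs;  // placeholder: slower, more accurate solve
    }
};

// The surrounding algorithm is written once, against any solver functor.
template <typename Solver>
std::vector<double> refine(Solver solve, std::vector<double> const& b) {
    return solve(b);
}

int main() {
    std::vector<double> b{1.0, 2.0};
    auto x1 = refine(FastApproxSolver{}, b);
    auto x2 = refine(PreciseSolver{}, b);
    (void)x1; (void)x2;
}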
In particular, you should consider functors versus lambdas. In modern versions of C++, there really isn't a need to specify a functor unless you're implementing a function/method that needs a functor (or lambda) as an argument. There are some cases to consider: Do you need a unit-test? Is the functor itself a prototype of future functionality? etc.
ADDENDUM: The key thing to consider is that the choice of functor vs. lambda ultimately boils down to a design decision. As @t.niese noted in the comments, you could just use plain functions in combination with template arguments. In addition to the considerations above, think about whether you can make a compile-time or run-time assessment of the needed functionality.
Additionally, as you make design decisions, you may want to ask: "Is there a need for this function to be used outside of this specific context?" If the answer is no, that's a compelling argument for choosing a lambda over a free function. With regard to functors specifically, they were an important pattern before lambdas were added to the standard. Typically they're defined in a somewhat private context (frequently in implementation files, and thus, once compiled into a library, hidden from users of the API). With lambdas, you can simply define them within another function, or even directly as a function argument, instead of pre-defining them before they're needed.
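To illustrate where the code ends up in each style, here's a small sketch (illustrative names) contrasting a pre-defined functor with a lambda written at the call site:

#include <algorithm>
#include <vector>

// Pre-lambda style: a functor defined ahead of time, away from where it's used.
struct IsNegative {
    bool operator()(int x) const { return x < 0; }
};

int main() {
    std::vector<int> v{1, -2, 3, -4};

    auto n1 = std::count_if(v.begin(), v.end(), IsNegative{});

    // Lambda style: the same predicate written directly at the call site.
    auto n2 = std::count_if(v.begin(), v.end(), [](int x) { return x < 0; });
    (void)n1; (void)n2;
}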
Related
Why is std::ranges::sort (and other range-based algorithms) implemented in the ranges namespace? Why isn't it defined as an overload of std::sort taking a range?
It's to avoid disrupting existing code bases. Eric Niebler, Sean Parent and Andrew Sutton discussed different approaches in their design paper N4128.
3.3.6 Algorithm Return Types are Changed to Accommodate Sentinels
... In similar fashion, most algorithms get new return types when they
are generalized to support sentinels. This is a source-breaking change
in many cases. In some cases, like for_each, the change is unlikely to
be very disruptive. In other cases it may be more so. Merely accepting
the breakage is clearly not acceptable. We can imagine three ways to
mitigate the problem:
1. Only change the return type when the types of the iterator and the sentinel differ. This leads to a slightly more complicated interface that may confuse users. It also greatly complicates generic code, which would need metaprogramming logic just to use the result of calling some algorithms. For this reason, this possibility is not explored here.
2. Make the new return type of the algorithms implicitly convertible to the old return type. Consider copy, which currently returns the ending position of the output iterator. When changed to accommodate sentinels, the return type would be changed to something like pair<I, O>; that is, a pair of the input and output iterators. Instead of returning a pair, we could return a kind of pair that is implicitly convertible to its second argument. This avoids breakage in some, but not all, scenarios. This subterfuge is unlikely to go completely unnoticed.
3. Deliver the new standard library in a separate namespace that users must opt into. In that case, no code is broken until the user explicitly ports their code. The user would have to accommodate the changed return types then. An automated upgrade tool similar to clang modernize can greatly help here.
We, the authors, prefer (3).
Ultimately, the goal was to be the least disruptive to existing code bases that move to building with C++20-enabled compilers. It's the approach the authors themselves preferred, and it seems the rest is history.
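A small C++20 example of the changed return types that motivated the separate namespace: std::sort returns void while std::ranges::sort returns an iterator, and std::ranges::copy returns an (in, out) result instead of just the output iterator, as the quoted paper describes for copy:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{3, 1, 2};

    std::sort(v.begin(), v.end());          // returns void
    auto last = std::ranges::sort(v);       // returns an iterator to the end of the range
    (void)last;

    std::vector<int> dst(v.size());
    // std::copy returns only the output iterator; std::ranges::copy returns
    // an {in, out} pair-like result, one of the "changed return types" above.
    auto [in, out] = std::ranges::copy(v, dst.begin());
}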
tl;dr
Is the customization of boost::hana::transform for std::vector (via specializing boost::hana::transform_impl for the tag ext::std::vector) a form of the Adaptor pattern that wraps the STL's std::transform into Hana's interface function boost::hana::transform?
Why this question
I'm reading Dive into Design Patterns, and I'm a bit bothered by the fact that this resource, just like the others I've peeked into (including Head First: Design Patterns), doesn't have a single page, probably not even a paragraph, that doesn't contain the words inheritance/virtual/extend/etc., as if inheritance were the only way to understand and use design patterns.
In an attempt to get a less inheritance-biased understanding of what these infamous patterns are, I'm trying to look at them from other perspectives, by asking what they look like in functional programming languages like Haskell (e.g. this question), or, as in the present question, whether such patterns show up in libraries that make heavy use of template meta-programming, such as Boost.Hana.
The question
If you look at /usr/include/boost/hana/ext/std/vector.hpp, there's some (commented) code for making std::vector a functor in the way that Hana defines it, i.e. by specializing the transform_impl template for std::vector (well, actually for the tag ext::std::vector which is associated to any std::vector<T>).
The specialization obviously resorts to std::transform (and some template tricks/enable_if) to get the job done.
The net effect, however, is that if I uncomment that code, then I'm able to use boost::hana::transform on std::vector, which I couldn't do without uncommenting that code.
On the other hand, std::vector is conceptually a functor whether or not I have boost::hana::transform at my disposal. Indeed, std::transform exists in the STL, and the "only" drawback is that it's not usable in a functional way because it's iterator-based.
Therefore, it looks to me like customizing boost::hana::transform for std::vector (via specializing boost::hana::transform_impl for the tag ext::std::vector) is a bit like adapting the STL's interface to functors, std::transform, to Hana's interface, boost::hana::transform.
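To make the analogy concrete, here is a hypothetical, Hana-free sketch of the same kind of adaptation (fmap is an invented name, not Hana's code): a free function that wraps the iterator-based std::transform behind a value-oriented, "functor-like" interface, which is conceptually what the transform_impl specialization does:

#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

template <typename T, typename F>
auto fmap(std::vector<T> const& xs, F&& f) {
    std::vector<decltype(f(xs.front()))> ys;
    ys.reserve(xs.size());
    // Adapt: iterator-based STL call hidden behind a value-in, value-out interface.
    std::transform(xs.begin(), xs.end(), std::back_inserter(ys), std::forward<F>(f));
    return ys;
}

int main() {
    std::vector<int> v{1, 2, 3};
    auto doubled = fmap(v, [](int x) { return x * 2; });  // {2, 4, 6}
    (void)doubled;
}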
Is this a valid interpretation?
The code
Here is a use-case and solution inspired by the linked open source book.
Here is the same use-case where I've used (hopefully correctly) the Hana approach to solve it. Since it solves the same problem, I'm tempted to self-answer that it is indeed a case of the Adaptor pattern.
I am making a project that will involve heavy mathematical computation. I also want to be able to easily change the implementation of real numbers, say between float, double, my own implementation, and GMP float types.
So far I have thought of two ways:
I create a class "Number" which will interface with the rest of the program.
I typedef the arithmetic type and write global functions to interface with the rest of the program.
The first choice seems more elegant, but the second seems to have less overhead. Is there a third, better choice? I am also worried about the elementary mathematical functions such as sine, cosine, exp... I figured that to make the switching easy, I should implement them as templates, but my implementations are hopelessly slow.
I am generally new to programming in C++. I was brought up in the comfortable Matlab and Mathematica environments, where I did not have to worry about such things.
You'll want to use templates with specializations to avoid re-implementing things.
For instance, say you want to use sin in your program differently for float and double. You can overload based on type and create specialized templates.
#include <cmath>

// Primary template: falls back to a generic implementation for any type T.
// genericSin is the user's generic routine (e.g. a series expansion).
template<class T> T MySin(const T& f) {
    return genericSin(f);
}

// Full specializations must match the primary template's signature (const T&)
// and forward to the fastest standard routine for the built-in types.
template<> float MySin<float>(const float& f) {
    return std::sin(f);   // resolves to the float overload (what sinf provides)
}

template<> double MySin<double>(const double& d) {
    return std::sin(d);
}
That covers free functions. The syntax is similar when specializing a Math class template, if you want to go the OO route. This will enable you to call your routines with any type and have the most specialized, most efficient routine called.
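As a sketch of that OO route (hypothetical names; the generic fallback is only a placeholder series): a Math<T> class template whose specializations forward built-in types to the standard routines:

#include <cmath>

template <class T>
struct Math {
    static T Sin(const T& x) {
        // Placeholder generic fallback for user-defined number types:
        // sin x = x - x^3/3! + x^5/5! - ... (truncated Taylor series)
        T term = x, sum = x;
        for (int n = 1; n < 10; ++n) {
            term = term * (T(-1) * x * x) / T((2 * n) * (2 * n + 1));
            sum = sum + term;
        }
        return sum;
    }
};

template <>
struct Math<float> {
    static float Sin(const float& x) { return std::sin(x); }  // float overload
};

template <>
struct Math<double> {
    static double Sin(const double& x) { return std::sin(x); }
};

int main() {
    double d = Math<double>::Sin(1.0);   // picks the specialized, fast routine
    (void)d;
}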
Templates are the way I have done this. It makes it easy to specialize what must be specialized, and provides a good way to reuse implementations when they apply to multiple types.
The Number class approach can be made to work, but it's actually not simple to do right and it introduces some restrictions (compared to templates).
Supporting multiple number types any other way is just hopelessly complex if you want something even close to fast, accurate, and simple to maintain. You'd likely end up using templates to implement this correctly even if you went with a global typedef.
Templates provide all the power, control, and flexibility you would need, and they will be faster than the alternatives posted (technically, #2 could be as fast if you resorted to... templates).
A template class for real numbers should work for you, in that you can overload the required functions and, if required, use template specializations.
To improve efficiency, use STL algorithms instead of hand-written loops.
Good luck!
Both alternatives are equivalent in terms of encapsulation: There will be a single point in your program where you'll have to change the number type, and this one change will affect your whole program. If presented with those two alternatives, choose the typedef; it is less elegant (=> simpler, and simpler is better) and has the same power.
When you get more comfortable with C++, templating your functions will be a better fit, since the determination of the number type can be made locally instead of globally. With templates, you determine the number type at the instantiation point (most likely the call site), giving much greater flexibility. However, there are a number of pitfalls in templates, and I'd recommend that you get a little more experience with C++ first and only then start templating.
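For comparison, a minimal sketch of the two styles (names here are hypothetical): the typedef fixes the number type globally in one place, while the template leaves the choice to each call site:

#include <cmath>
#include <vector>

// Alternative 2, the typedef: one central alias, changed in a single place.
using real = double;                 // later: float, a GMP wrapper class, ...

real norm(const std::vector<real>& v) {
    real s = 0;
    for (real x : v) s += x * x;
    return std::sqrt(s);
}

// The templated style instead lets each call site pick the number type.
template <class Real>
Real norm_t(const std::vector<Real>& v) {
    Real s = 0;
    for (const Real& x : v) s += x * x;
    using std::sqrt;                 // assumes a usable sqrt for Real (found via ADL)
    return sqrt(s);
}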
Does anyone know a way to handle double dispatch correctly in C++ without using RTTI and dynamic_cast<>, and in which the class hierarchy remains extensible, i.e. the base class can be derived from further and its definition/implementation does not need to know about that?
I suspect there is no way, but I'd be glad to be proven wrong :)
The first thing to realize is that double (or higher order) dispatch doesn't scale. With single
dispatch, and n types, you need n functions; for double dispatch n^2, and so on. How you
handle this problem partially determines how you handle double dispatch. One obvious solution is to
limit the number of derived types, by creating a closed hierarchy; in that case, double dispatch can
be implemented easily using a variant of the visitor pattern. If you don't close the hierarchy,
then you have several possible approaches.
If you insist that every pair corresponds to a function, then you basically need a:
std::map<std::pair<std::type_index, std::type_index>, void (*)(Base const& lhs, Base const& rhs)>
dispatchMap;
(Adjust the function signature as necessary.) You also have to implement the n^2 functions, and
insert them into the dispatchMap. (I'm assuming here that you use free functions; there's no
logical reason to put them in one of the classes rather than the other.) After that, you call:
(*dispatchMap[std::make_pair( std::type_index( typeid( obj1 ) ), std::type_index( typeid( obj2 ) ) )])( obj1, obj2 );
(You'll obviously want to wrap that into a function; it's not the sort of thing you want scattered
all over the code.)
A minor variant would be to say that only certain combinations are legal. In this case, you can use
find on the dispatchMap, and generate an error if you don't find what you're looking for.
(Expect a lot of errors.) The same solution could be used if you can define some sort of default
behavior.
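A compact, hypothetical sketch of that dispatch-map approach, using find so that unregistered combinations produce an error rather than a crash (like the answer above, it still relies on typeid/type_index):

#include <map>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Base { virtual ~Base() = default; };
struct Circle : Base {};
struct Square : Base {};

using Handler = void (*)(Base const&, Base const&);

// The dispatch map keyed on the pair of dynamic types, as described above.
std::map<std::pair<std::type_index, std::type_index>, Handler> dispatchMap;

// Wrapper that hides the lookup; find() lets unknown combinations fail loudly.
void dispatch(Base const& a, Base const& b) {
    auto it = dispatchMap.find({std::type_index(typeid(a)), std::type_index(typeid(b))});
    if (it == dispatchMap.end())
        throw std::runtime_error("no handler for this combination");
    it->second(a, b);
}

void circleSquare(Base const&, Base const&) { /* handle Circle vs Square */ }

int main() {
    dispatchMap[{std::type_index(typeid(Circle)), std::type_index(typeid(Square))}] = &circleSquare;
    dispatch(Circle{}, Square{});
}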
If you want to do it 100% correctly, with some of the functions able to handle an intermediate class
and all of its derivatives, you then need some sort of more dynamic searching, and ordering to
control overload resolution. Consider for example:
Base
/ \
/ \
I1 I2
/ \ / \
/ \ / \
D1a D1b D2a D2b
If you have an f(I1, D2a) and an f(D1a, I2), which one should be chosen? The simplest solution
is just a linear search, selecting the first which can be called (as determined by dynamic_cast on
pointers to the objects), and manually managing the order of insertion to define the overload
resolution you wish. With n^2 functions, this could become slow fairly quickly, however. Since
there is an ordering, it should be possible to use std::map, but the ordering function is going to
be decidedly non-trivial to implement (and will still have to use dynamic_cast all over the
place).
All things considered, my suggestion would be to limit double dispatch to small, closed hierarchies,
and stick to some variant of the visitor pattern.
The "visitor pattern" in C++ is often equated with double dispatch. It uses no RTTI or dynamic_casts.
See also the answers to this question.
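A minimal sketch of visitor-based double dispatch for a small, closed hierarchy (illustrative names); the two virtual calls, accept and then visit, together select behaviour on the dynamic types of both the element and the visitor, without RTTI or dynamic_cast:

#include <iostream>

struct Circle;
struct Square;

struct Visitor {
    virtual void visit(Circle const&) = 0;
    virtual void visit(Square const&) = 0;
    virtual ~Visitor() = default;
};

struct Shape {
    virtual void accept(Visitor& v) const = 0;   // virtual call #1: on the shape
    virtual ~Shape() = default;
};

struct Circle : Shape {
    void accept(Visitor& v) const override { v.visit(*this); }  // virtual call #2: on the visitor
};
struct Square : Shape {
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct Printer : Visitor {
    void visit(Circle const&) override { std::cout << "circle\n"; }
    void visit(Square const&) override { std::cout << "square\n"; }
};

int main() {
    Circle c;
    Shape const& shape = c;   // only the static type Shape is visible here
    Printer p;
    Visitor& v = p;
    shape.accept(v);          // resolves to Printer::visit(Circle const&)
}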
The first problem is trivial. dynamic_cast involves two things: a run-time check and a type cast. The former requires RTTI, the latter does not. All you need to do to replace dynamic_cast with functionality that does the same without requiring RTTI is to have your own method of checking the type at run-time. To do this, all you need is a simple virtual function that returns some sort of identification of what type it is or what more-specific interface it complies to (that can be an enum, an integer ID, even a string). For the cast, you can safely do a static_cast once you have already done the run-time check yourself and you are sure that the type you are casting to is in the object's hierarchy. So, that solves the problem of emulating the "full" functionality of dynamic_cast without needing the built-in RTTI. Another, more involved solution is to create your own RTTI system (as is done in several projects, like the LLVM one that Matthieu mentioned).
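A minimal sketch of that hand-rolled check (hypothetical TypeId enum and names): the virtual function supplies the run-time check, and static_cast does the actual conversion:

enum class TypeId { Base, Derived };

struct Base {
    virtual TypeId typeId() const { return TypeId::Base; }
    virtual ~Base() = default;
};

struct Derived : Base {
    TypeId typeId() const override { return TypeId::Derived; }
    void derivedOnly() {}
};

Derived* my_dyn_cast(Base* b) {
    // run-time check done by hand, then a plain static_cast (no RTTI needed)
    return (b && b->typeId() == TypeId::Derived) ? static_cast<Derived*>(b) : nullptr;
}

int main() {
    Derived d;
    Base* b = &d;
    if (Derived* p = my_dyn_cast(b)) p->derivedOnly();
}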
The second problem is a big one. How to create a double dispatch mechanism that scales well with an extensible class hierarchy. That's hard. At compile-time (static polymorphism), this can be done quite nicely with function overloads (and/or template specializations). At run-time, this is much harder. As far as I know, the only solution, as mentioned by Konrad, is to keep a dispatch table of function pointers (or something of that nature). With some use of static polymorphism and splitting dispatch functions into categories (like function signatures and stuff), you can avoid having to violate type safety, in my opinion. But, before implementing this, you should think very hard about your design to see if this double dispatch is really necessary, if it really needs to be a run-time dispatch, and if it really needs to have a separate function for each combination of two classes involved (maybe you can come up with a reduced and fixed number of abstract classes that capture all the truly distinct methods you need to implement).
You may want to check how LLVM implements isa<>, dyn_cast<> and cast<> as a template system, since it's compiled without RTTI.
It is a bit cumbersome (requires tidbits of code in every class involved) but very lightweight.
LLVM Programmer's Manual has a nice example and a reference to the implementation.
(All 3 methods share the same tidbit of code)
You can fake the behaviour by implementing the compile-time logic of multiple dispatch yourself. However, this is extremely tedious. Bjarne Stroustrup has co-authored a paper describing how this could be implemented in a compiler.
The underlying mechanism – a dispatch table – could be dynamically generated. However, using this approach you would of course lose all syntactical support. You'd need to maintain a 2-dimensional matrix of method pointers and manually look up the correct method depending on the argument types. This would render a simple (hypothetical) call
collision(foo, bar);
at least as complicated as
DynamicDispatchTable::lookup(collision_signature, FooClass, BarClass)(foo, bar);
since you didn’t want to use RTTI. And this is assuming that all your methods take only two arguments. As soon as more arguments are required (even if those aren’t part of the multiple dispatch) this becomes more complicated still, and would require circumventing type safety.
For some time I've been designing my class interfaces to be minimal, preferring namespace-wrapped non-member functions over member functions, essentially following Scott Meyers' advice in the article How Non-Member Functions Improve Encapsulation.
I've been doing this with good effect in a few small scale projects, but I'm wondering how well it works on a larger scale. Are there any large, well regarded open-source C++ projects that I can take a look at and perhaps reference where this advice is strongly followed?
Update: Thanks for all the input, but I'm not really interested in opinion so much as finding out how well it works in practice on a larger scale. Nick's answer is closest in this regard, but I'd like to be able to see the code. Any sort of detailed description of practical experiences (positives, negatives, practical considerations, etc) would be acceptable as well.
I do this quite a bit on the projects I work on; the largest of these at my current company is around 2M lines, but it's not open source, so I can't provide it as a reference. However, I will say that I agree with the advice, generally speaking. The more you can separate functionality which is not strictly contained to just one object from that object, the better your design will be.
By way of an example, consider the classic polymorphism example: a Shape base class with subclasses, and a virtual Draw() function. In the real world, Draw() would need to take some drawing context, and potentially be aware of the state of other things being drawn, or the application in general. Once you put all that into each subclass implementation of Draw(), you're likely to have some code overlap, or most of your actual Draw() logic will be in the base class, or somewhere else. Then consider that if you want to re-use some of that code, you'll need to provide more entry points into the interface, and possibly pollute the functions with other code not related to drawing shapes (eg: multi-shape drawing correlation logic). Before long, it'll be a mess, and you'll wish you had a draw function which took a Shape (and context, and other data) instead, and Shape just had functions/data which were entirely encapsulated and not using or referencing external objects.
Anyway, that's my experience/advice, for what it's worth.
I'd argue that the benefit of non-member functions increases as the size of the project increases. The standard library containers, iterators, and algorithms library are proof of this.
If you can decouple algorithms from data structures (or, to phrase it another way, if you can decouple what you do with objects from how their internal state is manipulated), you can decrease coupling between your classes and take greater advantage of generic code.
Scott Meyers isn't the only author who has argued in favor of this principle; Herb Sutter has too, especially in Monoliths Unstrung, which ends with the guideline:
Where possible, prefer writing functions as nonmember nonfriends.
I think one of the best examples of an unnecessary member function from that article is std::basic_string::find; there is no reason for it to exist, really, as std::find provides exactly the same functionality.
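For example (illustrative only), the member and non-member lookups below find the same character, differing only in how they report the position:

#include <algorithm>
#include <string>

int main() {
    std::string s = "hello world";

    auto pos = s.find('w');                          // member: index into s, or npos
    auto it  = std::find(s.begin(), s.end(), 'w');   // non-member: iterator, or s.end()
    (void)pos; (void)it;
}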
The OpenCV library does this. It has a cv::Mat class that represents a matrix (or image), and all the other functions live in the cv namespace.
The OpenCV library is huge and widely regarded in its field.
One practical advantage of writing functions as nonmember nonfriends is that doing so can significantly reduce the time it takes to thoroughly test and verify the code.
Consider, for example, the sequence container member functions insert and push_back. There are at least two approaches to implementing push_back:
It can simply call insert (its behavior is defined in terms of insert anyway)
It can do all the work that insert would do (possibly calling private helper functions) without actually calling insert
Obviously, when implementing a sequence container, you probably want to use the first approach. push_back is just a special form of insert and (to the best of my knowledge) you can't really get any performance benefit by implementing push_back some other way (at least not for list, deque, or vector).
However, to thoroughly test such a container, you have to test push_back separately: since push_back is a member function, it can modify any and all of the internal state of the container. From a testing standpoint, you should (must?) assume that push_back is implemented using the second approach because it is possible that it could be implemented using the second approach. There is no guarantee that it is implemented in terms of insert.
If push_back is implemented as a nonmember nonfriend, it can't touch any of the internal state of the container; it must use the first approach. When you write tests for it, you know that it can't break the internal state of the container (assuming the actual container member functions are implemented correctly). You can use that knowledge to significantly reduce the number of tests that you need to write to fully exercise the code.
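A hedged sketch of what such a nonmember nonfriend push_back has to look like; by construction it can only go through the container's public interface, here insert:

#include <vector>

// A nonmember nonfriend push_back: the "implemented in terms of insert"
// property holds automatically, because the internals are inaccessible.
template <typename Container, typename Value>
void push_back(Container& c, Value const& v) {
    c.insert(c.end(), v);
}

int main() {
    std::vector<int> v;
    push_back(v, 42);   // exercises only guarantees that insert already provides
}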
(I don't have time to write this up nicely; the following is a 5-minute brain dump which can doubtless be ripped apart at various trivial levels, but please address the concepts and general thrust.)
I have considerable sympathy for the position taken by Jonathan Grynspan, but want to say a bit more about it than can reasonably be done in comments.
First - a "well said" to Alf Steinbach, who chipped in with "It's only over-simplified caricatures of their viewpoints that might seem to be in conflict. For what it's worth I don't agree with Scott Meyers on this matter; as I see it he's over-generalizing here, or he was."
Scott, Herb etc. were making these points when few people understood the trade-offs or alternatives, and they did so with disproportionate strength. Some nagging hassles people had during the evolution of code were analysed, and a new design approach addressing those issues was rationally derived. Let's return to the question of whether there were downsides later, but first it's worth saying that the pain in question was typically small and infrequent: non-member functions are just one small aspect of designing reusable code, and in the enterprise-scale systems I've worked on, simply writing the same kind of code you'd have put into a member function as a non-member is rarely enough to make the non-member reusable. It's pretty rare for such functions to even express algorithms that are both complex enough to be worth reusing and yet not tightly bound to the specifics of the class they were designed for, it being weird enough that it's practically inconceivable some other class will happen along supporting the same operations and semantics. Often, you'd also need to template the arguments, or introduce a base class to abstract the set of operations required. Both have significant implications in terms of performance, inline vs. out-of-line code, and client-code recompilation.
That said, there's often less code change and impact study required when changing an implementation if operations have been implemented in terms of a public interface, and being a non-friend non-member systematically enforces that. Occasionally, though, it makes the initial implementation more verbose or in some other way less desirable and maintainable.
But, as a litmus test: how many of these non-member functions sit in the same header as the only class for which they're currently applicable? How many would want to abstract their arguments via templates (which means inlining and compilation dependencies) or base classes (virtual function overheads) to allow reuse? Both discourage people from seeing them as reusable, but either way, the operations available on a class are delocalised, which can frustrate developers' perception of a system: the developer often has to work out for themselves the rather disappointing fact that "oh, that will only work for class X".
Bottom line: most member functions aren't potentially reusable. Much corporate code isn't broken into clean algorithm versus data with potential for reuse of the former. That kind of division just isn't required or useful or conceivably useful 20 years down the road. It's much the same as get/set methods - they're needed at certain API boundaries, but can constitute needless verbosity when ownership and use of the code is localised.
Personally, I don't have an all or nothing approach to this, but decide what to make a member function or non-member based on whether there's any likely benefit to either, potential reusability versus locality of interface.
I also do this a lot, where it seems to make sense, and it causes absolutely no problems with scaling. (Although my current project is only 40,000 LOC.) In fact, I think it makes the code more scalable: it slims down classes and reduces dependencies.
It sometimes requires you to refactor your functions to make them independent of members of the class, and thereby often creates a library of more general helper functions which you can easily reuse elsewhere. I'd also mention that one of the common problems with many large projects is the bloating of classes, and I think preferring non-member, non-friend functions helps here too.
Prefer non-member non-friend functions for encapsulation, UNLESS you want implicit conversions to work for non-member functions of class templates (in which case you'd better make them friend functions):
That is, if you have a class template type<T>:
template<class T>
struct type {
    friend void foo(type<T> a) {}
};
and a type implicitly convertible to type<T>, e.g.:
template<class T>
struct convertible_to_type {
    operator type<T>() { return {}; }
};
The following works as expected:
auto t = convertible_to_type<int>{};
foo(t); // t is converted to type<int>
However, if you make foo a non-friend function:
template<class T>
void foo(type<T> a) {}
then the following doesn't work:
auto t = convertible_to_type<int>{};
foo(t); // FAILS: cannot deduce type T for type
Since T cannot be deduced, the function foo is removed from the overload resolution set; that is, no viable function is found, which means the implicit conversion is never triggered.