Suppose I have a stream of Acme objects that I want to expose via an API. I have two choices: callbacks and iterators.
API #1: Callbacks
// API #1
// This function takes a user-defined callback
// and invokes it for each object in the stream.
template<typename CallbackFunctor>
void ProcessAcmeStream(CallbackFunctor &callback);
API #2: Iterators
// API #2
// Provides the iterator class AcmeStreamIterator.
AcmeStreamIterator my_stream_begin = AcmeStreamIterator::begin();
AcmeStreamIterator my_stream_end = AcmeStreamIterator::end();
API #1 takes the control flow of the program out of the user's hands and will not return until the entire stream is consumed (setting exceptions aside for the moment).
API #2 keeps the control flow in the user's hands, allowing the user to advance the stream at their own pace.
API #1 feels higher level, letting users jump to the business logic (the callback functor) right away. On the other hand, API #2 feels more flexible, giving users a lower level of control.
From a design perspective, which one should I go with? Are there more pros and cons that I have not seen yet? What are some support/maintenance issues down the road?
The iterator approach is more flexible, and the callback version is easily implemented in terms of the iterator one by means of existing algorithms:
std::for_each( MyStream::begin(), MyStream::end(), callback );
IMO, the second is clearly superior. While I can (sort of) understand your feeling that it's lower level, I think that's incorrect. The first defines its own specific idea of "higher level" -- but it's one that doesn't fit well with the rest of the C++ standard library, and ends up being relatively difficult to use. In particular, it requires that if the user wants something equivalent to a standard algorithm, it has to be re-implemented from the ground up rather than using existing code.
The second fits perfectly with the rest of the library (assuming you implement your iterators correctly) and gives the user an opportunity for dealing with your data at a much higher level via standard algorithms (and/or new, non-standard algorithms that follow the standard patterns).
One advantage of callbacks over iterators is that users of your API can't mess up iteration. It's easy to compare the wrong iterators, or use the wrong comparison operation or fail in some other way. The callback API prevents that.
Canceling enumeration is easily done using a callback, BTW: Just let the callback return a bool and continue only as long as it returns true.
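As a minimal sketch of that convention (with a stand-in integer array in place of the real Acme stream), the enumeration continues only while the callback returns true:

#include <iostream>

template<typename CallbackFunctor>
void ProcessAcmeStream(CallbackFunctor &callback)
{
    int stream[] = {1, 2, 3, 4, 5};     // stand-in for the real Acme stream
    for (int i = 0; i < 5; ++i)
        if (!callback(stream[i]))
            return;                     // the callback requested cancellation
}

struct StopAfterThree
{
    int seen;
    StopAfterThree() : seen(0) {}
    bool operator()(int value)
    {
        std::cout << value << '\n';
        return ++seen < 3;              // returns false after the third element
    }
};

int main()
{
    StopAfterThree cb;
    ProcessAcmeStream(cb);              // prints 1, 2, 3 and stops
}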
C++ standard library idiom is to provide iterators. If you provide iterators, then ProcessAcmeStream is a simple wrapper around std::for_each. Maybe worth the trouble of writing, maybe not, but it isn't exactly boosting your caller into a radical new world of usability, it's a new name for an application of a standard library function to your iterator pair.
In C++0x, if you also make the iterator pair available through std::begin and std::end, then the caller can use range-based for, which takes them into the business logic just as quickly as ProcessAcmeStream does, perhaps more quickly.
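For illustration, a minimal sketch of that usage, with a std::vector standing in for a hypothetical stream type that exposes begin()/end():

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> acme_stream;       // stand-in for a type with begin()/end()
    acme_stream.push_back(1);
    acme_stream.push_back(2);
    for (int obj : acme_stream)         // range-based for (C++0x/C++11)
        std::cout << obj << '\n';       // business logic goes here
}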
So I'd say, if it's possible to provide an iterator then provide it - the C++ standard does inversion of control for you if the caller wants to program that way. At least it does for a case where the control flow is as simple as this.
From a design perspective, I would say that the iterator method is better, simply because it's easier and also more flexible; it's really annoying to write callback functors without lambdas. (Now that C++0x will have lambda expressions, this may become less of a concern, but even then, the iterator method is more generic.)
Another issue with callbacks is cancellation. You can return a boolean value to indicate whether you'd like to cancel enumeration, but I always feel uneasy when the control is out of my hands, since you don't always know what might happen. Iterators don't have this issue.
And of course, there's always the fact that iterators can be random-access whereas callbacks aren't, so iterators are more extensible as well.
Related
I am working with the API of a C++ library that takes lots of std::weak_ptrs as input parameters of its methods. The library does not keep these pointers; it just does some processing on them. To me, this design says something like the following from the library's point of view:
Hi API User,
You have passed me some weak pointers as the input parameter(s) of a method to get a service from the library. But your pointers may be expired and no longer valid. OK, no problem, I will do the check and let you know about it.
BR,
Library API
Isn't it more reasonable for such an API to take all of these pointers as std::shared_ptr? In that case, if the API user is working with weak_ptrs, it is the user's responsibility to .lock() the weak_ptrs first and then pass them to the API (if the .lock() succeeds). Are there any cases where the API should take its parameters as std::weak_ptr rather than std::shared_ptr?
P.S. There is a similar question here on S.O., but it does not clearly answer my question in general.
If the API methods take a long time to execute, then the pointed-to objects would be kept alive for the duration of the execution if the API took std::shared_ptr instead of std::weak_ptr. Whether this is a concern or not is difficult to tell without knowing the API.
I don't see any real disadvantage to this approach. There will be a small cost in converting from shared_ptr to weak_ptr and back to shared_ptr again, and certainly a complexity cost in the implementation, though since it would probably have to check for null pointers anyway, that cost is presumably small.
The rule I tend to follow is this: if the callee isn't mucking with lifetime/ownership, do not pass it a smart pointer; rather, pass in a raw C++ reference (preferred) or raw pointer. I find it far cleaner and more flexible to separate the concern of ownership from usage.
[Note: this is also my answer to a different question: what's the point of std::unique_ptr::get?]
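A minimal sketch of that rule, with a hypothetical Widget type: the caller resolves the lifetime question by locking the weak_ptr, and the callee takes a plain reference, staying out of the ownership business entirely:

#include <iostream>
#include <memory>

struct Widget { int value; };

void ProcessWidget(const Widget &w)                // callee neither shares nor extends ownership
{
    std::cout << w.value << '\n';
}

void Caller(const std::weak_ptr<Widget> &weak)
{
    if (std::shared_ptr<Widget> sp = weak.lock())  // the caller resolves lifetime
        ProcessWidget(*sp);                        // the callee sees a plain reference
}

int main()
{
    std::shared_ptr<Widget> w = std::make_shared<Widget>();
    w->value = 42;
    Caller(w);                                     // prints 42
}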
I know this is an extremely repetitive question, and I apologize in advance to any readers who find it annoying.
Although I am a somewhat experienced programmer, I cannot justify the use of function pointers over direct calls. The scenarios where I am unable to see the difference are:
1) Callbacks - the same can be achieved with a direct call.
2) Asynchronous or synchronous event handling - the event has to be identified anyway, based on which element number in the function pointer array gets updated; but the same can also be done via a direct call.
3) In some posts I have seen people comment that function pointers should be used when it is not known which function to call. I did not find any proper justification for this.
I would really appreciate it if someone could explain the above scenarios with practical, really simple, realistic examples.
Some more things function pointers are often used for:
Runtime polymorphism: You can define a structure that encapsulates a function pointer, or a pointer to a function table. This enables you to call the specified function at runtime, even for a type of client object that did not exist when your library was written. You can use this to implement multiple dispatch or something like the visitor design pattern in C. This is also how C++ classes and their virtual member functions were originally implemented under the hood (see the sketch after this list).
Closures: These can be structures containing a function pointer and one or more of its arguments.
State Machines: Instead of a switch with a case for each state label, I’ve often found it convenient to give the handler for each state its own function. The current state is the function you’re in, the state transitions are tail-recursive calls, and the program variables are parameters. The state labels then become function pointers, which you might store in a table or return from a function.
Higher-Order Functions: Two examples from the C standard library are qsort() and bsearch(), which generalize over the type of the elements and the comparison function.
Low-Level Support: Shared-library loaders, for example, need this.
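A minimal sketch of the runtime-polymorphism bullet above, with a hypothetical Shape type dispatching through a hand-rolled function table, the way C libraries (and early C++ compilers) do it under the hood:

#include <cstdio>

struct Shape;

struct ShapeVTable
{
    double (*area)(const Shape*);      // one slot per "virtual function"
};

struct Shape
{
    const ShapeVTable *vtable;
};

struct Circle
{
    Shape base;                        // first member, so a Circle* can act as a Shape*
    double radius;
};

double CircleArea(const Shape *s)
{
    const Circle *c = (const Circle*)s;
    return 3.14159265358979 * c->radius * c->radius;
}

const ShapeVTable kCircleVTable = { &CircleArea };

double Area(const Shape *s)
{
    return s->vtable->area(s);         // runtime dispatch through the table
}

int main()
{
    Circle c;
    c.base.vtable = &kCircleVTable;
    c.radius = 2.0;
    std::printf("area = %f\n", Area(&c.base));
}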
1) Callbacks - the same can be achieved with a direct call.
Not true. For a direct call, the caller must know the function name and signature when the code is compiled, and can only ever call that one function. A callback is defined at runtime and can be changed dynamically, while the caller need only know the signature, not the name. Moreover each instance of an object may have a different callback, whereas with a direct call, all instances must call the same function.
2) Asynchronous or synchronous event handling - the event has to be identified anyway, based on which element number in the function pointer array gets updated; but the same can also be done via a direct call.
Not sure what you mean, but an event handler is simply a kind of callback. The event may be identified by the caller and different callback handlers called through pointers. Your point only stands if there is one event handler for all event types and the user is responsible for identification.
3) In some posts I have seen people comment that function pointers should be used when it is not known which function to call. I did not find any proper justification for this.
See (1) and (2) above. Often it is a means to hook platform independent third-party library code into a specific platform without having to deliver source-code or for system events that require user/application-defined handlers.
I would not sweat it however - if all your application requirements can be resolved without using a pointer to a function, then you don't need a pointer to a function. When you need one, you will probably know. You will most likely encounter it when you have to use an API that requires it before you ever implement an interface yourself that does. For example in the standard library the qsort() function requires a pointer to a function in order to define how two objects of arbitrary type are to be ordered - allowing qsort() to support any type of object - it is a way in C of making a function "polymorphic". C++ supports polymorphism directly, so there is often less need for explicit function-pointers in C++ - although internally polymorphism is implemented using function pointers in any case.
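A minimal sketch of the qsort() case: the comparison function supplies the ordering, while qsort() itself stays generic over the element type:

#include <cstdio>
#include <cstdlib>

int CompareInts(const void *a, const void *b)
{
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);   // negative/zero/positive, as qsort expects
}

int main()
{
    int values[] = { 3, 1, 2 };
    std::qsort(values, 3, sizeof values[0], &CompareInts);
    for (int i = 0; i < 3; ++i)
        std::printf("%d ", values[i]);  // prints: 1 2 3
}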
There is a concept in programming called DRY -- don't repeat yourself.
Suppose you have 121 buttons in your UI. Each one of them behaves much the same, except when you press the button, a different operation happens.
You can (A) use virtual inheritance to dispatch to the right operation (requiring a class per button), or (B) use a function pointer (or a std::function) stored in the class in order to call the right "on click" handler, or (C) have every single button be a distinct type.
A virtual function is implemented in every compiler I have examined as a complex table that, in the end, is a collection of function pointers.
So your choices are function pointers or generating 121 completely distinct buttons that happen to mostly behave the same.
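As a minimal sketch of option (B), assuming a hypothetical Button type that stores its click behavior in a std::function:

#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Button
{
    std::string label;
    std::function<void()> on_click;    // the per-button customization point
    void Click() const { if (on_click) on_click(); }
};

int main()
{
    std::vector<Button> buttons;
    buttons.push_back(Button{"Save", [] { std::cout << "saving\n"; }});
    buttons.push_back(Button{"Quit", [] { std::cout << "quitting\n"; }});
    for (const Button &b : buttons)    // one class, many behaviors
        b.Click();
}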
In any situation where you want to decouple the caller and the called, you must use something akin to a function pointer. There are a ridiculous number of cases, from work queues to threading off tasks to callbacks.
In tiny programs where everything is hard-coded, hard-coding every call can work. But hard-coded stuff like this doesn't scale. When you want to update those 121 hand-implemented buttons, knowing their points of customization is going to be ridiculously difficult. And they will fall out of sync.
And 121 is a modest number of buttons. What about an app with 10,000? And you want to update every button's behavior to handle touch-based input?
Even more, when you type-erase, you can reduce binary size significantly. 121 copies of a class implementing a button will take more executable space than 1 class whose instances each store a function pointer or two.
Function pointers are but one type of "type erasure". Type erasure reduces binary size, provides clearer contracts between provider and consumer, and makes it easier to refactor behavior around the type erased data.
Without function pointers, how would you implement a function which calculates the integral of any real-valued function?
typedef double (*Function)(double);
double Integral(Function f, double a, double b);
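A minimal sketch of one possible implementation (a midpoint Riemann sum with an arbitrary step count), showing that any function matching the signature can be passed in:

#include <cstdio>

typedef double (*Function)(double);

double Integral(Function f, double a, double b)
{
    const int kSteps = 1000;
    const double h = (b - a) / kSteps;
    double sum = 0.0;
    for (int i = 0; i < kSteps; ++i)
        sum += f(a + (i + 0.5) * h);   // sample the midpoint of each slice
    return sum * h;
}

double Square(double x) { return x * x; }

int main()
{
    std::printf("%f\n", Integral(&Square, 0.0, 1.0));  // ~0.333333
}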
1) Callbacks - the same can be achieved with a direct call.
Not in all cases, since the caller may not know at compile time which function must be called. For instance, this is typical in libraries, since they cannot know your code in advance.
However, it can also happen in your own code: whenever you want to partially re-use a function, you can either:
Create several versions of that function, each calling a different function. This duplicates code and is very bad for maintenance, but gives good performance unless you are hit by code bloat.
Pass a function pointer (or, in C++, a callable in general). Flexible, less code, though performance might suffer in some cases (see the sketch after this list).
Create a set of branches (an if/switch chain), if you know the set of possible functions to call in advance. Rigid, but might be faster than a function pointer for a small number of branches.
In C++, create a templated version. Same as the first case, but automated, so maintenance is good. Code bloat might be an issue.
Factor out the common code so that callers can call whatever they need piece by piece. Sometimes this isn't possible or easy - especially when parametrizing complex algorithms that you want to keep reusable (e.g. qsort()). In C++, see the STL (Standard Template Library).
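A minimal sketch of the function-pointer option, with hypothetical names: one traversal function re-used with different behaviors passed in by the caller:

#include <cstdio>

void ForEach(const int *values, int count, void (*visit)(int))
{
    for (int i = 0; i < count; ++i)
        visit(values[i]);              // the varying step, chosen by the caller
}

void PrintValue(int x)  { std::printf("%d\n", x); }
void PrintSquare(int x) { std::printf("%d\n", x * x); }

int main()
{
    int values[] = {1, 2, 3};
    ForEach(values, 3, &PrintValue);   // same traversal, different behavior
    ForEach(values, 3, &PrintSquare);
}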
2) Asynchronous or synchronous event handling - the event has to be identified anyway, based on which element number in the function pointer array gets updated; but the same can also be done via a direct call.
Some event systems are designed so that you simply configure which function(s) will be triggered when a given event happens. If this is an external library with a C interface, they have no choice but to use function pointers.
Some other systems let you create your own event loop and you fetch the events somehow and do whatever you want with them; so they avoid callbacks.
3) In some posts I have seen people comment that function pointers should be used when it is not known which function to call. I did not find any proper justification for this.
See the first case.
Thanks, all, for actively participating in this discussion, and for giving practical examples such as:
1) Implementing library functions
2) Looking at qsort()
3) Referring to the Linux kernel
4) A generic heap data structure in C
I feel the qsort() signature, void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void *)), is quite sufficient to clear up my points 1) and 3).
1) Callbacks - the same can be achieved with a direct call.
3) In some posts I have seen people comment that function pointers should be used when it is not known which function to call. I did not find any proper justification for this.
Mainly through callbacks: they provide a way for code to call a function whose body is not yet defined, with the expectation that the definition will be supplied later by the user. Compilation therefore isn't hindered by the missing definition. A practical example is the qsort() function considered above, where the user is responsible for providing the definition of the comparison function, like so:
int compare(const void *a, const void *b)
{
    // User-defined body based on the problem requirement, e.g. for ints:
    return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b);
}
Let's consider a practical scenario where multiple threads each need their own comparison function. With direct calls, every thread would need to implement its own sorting function, or a single common function would become much bulkier. With the callback method, all threads can use the same sorting function, since the sorting algorithm remains the same for all of them.
Considering a layered architecture, higher layers generally have an abstract view of the lower layers. Say we have a user-defined qsort()-style function implemented at the application layer, and underneath the application an ADC driver layer that captures samples and provides them to the application for sorting. The application does not need to understand the definition of the function responsible for collecting and providing the samples; it only focuses on obtaining them. So the main application won't know at compile time which function to call: the respective ADC driver will simply call back into the application's qsort() and provide the needed data.
Regarding point 2, I am still confused:
2) Asynchronous or synchronous event handling - the event has to be identified anyway, based on which element number in the function pointer array gets updated; but the same can also be done via a direct call.
From the above discussion I conclude that if event handlers point into library code, they need to be implemented via pointers to functions. Secondly, to keep code independent and maintainable, it helps to go through function pointers: say between the application and the driver we have an interfacing layer implemented using pointers to functions; then if either the application or the driver changes at any time, it won't affect the other, or will affect it only minimally. But consider the scenario below:
int (*fptr[10])(void) =
{
    function1,    // function for starting the LED
    function2,    // function for relay operation
    /* ... */
    function10    // function for motor control
};
Let's say GPIO0.0 through GPIO0.9 have been mapped to the function pointer array, i.e. GPIO0.0 to element 0 of fptr, ..., GPIO0.9 to element 9 of fptr.
These GPIO pins have been configured for level-triggered interrupts, and their respective ISRs update the array index (i = GPIO_Value); the scheduler then has a thread which calls through the function pointer array:
fptr[i]();
Is the use of function pointers justifiable here?
I'm looking to implement something similar to the deferred IEnumerable concept in C++, but without template implementations. (My question is very similar to this other question, except for the deal-breaking templates.)
In our code, we have many functions that receive a spec to query and will store the results in a passed std::vector&. Often these functions call lower level versions which do their own filtering and combining with other sets. Sometimes they will create local vectors (with a different temp allocator) to call the lower-level functions, then push_back to the caller's vector after filtering that. These are not generic functions, in either word or spirit.
I'm looking to eliminate the copies and allocs, and even the need for a results container. Plus while I'm doing this, I'd like to give it a short-circuit ability.
There's an obvious need for deferred operations here. In C#, this would be easy: everything yields into an IEnumerable, and you use the Where() etc. operators, all deferred.
I have an idea how to do this in C++, passing in callbacks rather than containers. We just need the "push_back" abstracted, plus a returned bool for "stop here". The callbacks would be chained very similarly to IEnumerable operators, though in operation it would be more of a push-pull, and maybe feel a bit like IObservable.
And on top of all this, I have to avoid templatizing these function implementations. I don't want to refactor a lot of code, and I don't want to run into unsupported compiler surprises on the 15 or so different platforms we're currently compiling to. So no promises, no (I think). And no C++11.
This seems like a solved problem, but I've poked around on Google and maybe haven't found the right way to ask it. Has this problem been solved already?
I re-read your question and I actually think this can be serviceable, but it's definitely not ideal without C++11.
You can treat boost::function<boost::optional<T>()> (yay, Boost) as a range. The semantics of this range are super simple: you invoke it, and when it's finished, it gives back an empty optional instead of a value.
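A minimal C++03 sketch of that idea, with a hypothetical Counter functor as the value source:

#include <boost/function.hpp>
#include <boost/optional.hpp>
#include <iostream>

typedef boost::function<boost::optional<int>()> IntRange;

struct Counter
{
    int current, last;
    Counter(int first, int last) : current(first), last(last) {}
    boost::optional<int> operator()()
    {
        if (current > last)
            return boost::none;        // the range is exhausted
        return current++;
    }
};

int main()
{
    IntRange range = Counter(1, 5);
    while (boost::optional<int> v = range())
        std::cout << *v << ' ';        // prints: 1 2 3 4 5
}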
For output operations like push_back, you can consider a boost::function<void(T)> that accepts the value and then does whatever you need with it. Generally, though, these aren't that useful, as most functions can be written as returning a modified range, and the caller can then simply use that range directly rather than having to supply an output range.
Not having lambdas here is a total bitch, but you can look into e.g. Boost.Phoenix/Lambda to help.
There are various ways of returning a collection of items from a method of a class in C++.
For example, consider the class MessageSpy that listens on all Messages sent over a connection. A client could access the messaging information in a number of ways.
const CollectionClass MessageSpy::getMessages()
Iterator MessageSpy::begin(), Iterator MessageSpy::end()
void MessageSpy::getMessages(OutputIterator)
void MessageSpy::eachMessage(Functor)
others...
Each approach has its trade-offs. For example, approach 1 requires copying the whole collection, which is expensive for large collections, while approach 2 makes the class look like a collection, which is inappropriate for a view...
Since I always struggle to choose the most appropriate approach, I wonder what you consider the trade-offs/costs when weighing these approaches?
I suggest an iterator based/callback based approach in cases where you demand the most lightweight solution possible.
The reason is that it decouples the supplier from the usage patterns by the consumer.
In particular, slamming the result into a collection¹ (even though the result may be "optimized" - likely via (N)RVO or moving instead of copying the objects) would still allocate a complete container for the full capacity.
Edit: ¹ an excellent addition to the "obligatory papers" (they're not obligatory; they're just incredibly helpful if you want to understand things): Want Speed? Pass by Value by Dave Abrahams.
Now, this is overkill if the consumer actually stops processing the data after the first few elements:
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
if (!doProcessing(*f))
break;
}
Also, this can be suboptimal even if the consumer eventually processes all elements: there might not be a need to have all elements copied at any particular moment, so the 'slot' for the 'current element' can be reused, reducing memory requirements and increasing cache locality. E.g.:
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
myElementType const& slot = *f; // making the temp explicit
doProcessing(slot);
}
Note that iterator interfaces are still simply superior if the consumer does want a collection containing all elements:
std::vector<myElementType> v(myType.begin(), myType.end());
// look: the client gets to _decide_ what container he wants!
std::set<myElementType, myComparer> s(myType.begin(), myType.end());
Try getting this flexibility otherwise.
Finally, there are some elements of style:
by nature it's easy to expose (const) references to the elements using iterators; this makes it much easier to avoid object slicing and to enable clients to use the elements polymorphically.
iterator-style interfaces can be overloaded to return non-const references on dereference; a returned container couldn't (directly) contain references
if you adhere to the requirements of range-based-for in C++11 you can have some syntactic sugar:
for (auto& slot : myType)
{
doProcessing(slot);
}
Finally, (as shown above), in the general sense iterators work nicely with the standard library.
The callback style (and similarly the output-iterator style) has many of the benefits of the iterator style (namely, you can use return values to abort iteration halfway, and you can do the processing without allocating copies of all elements up front), but it seems to me slightly less flexible in use. Of course, there may be situations where you want to encourage a particular usage pattern, and this might be a good way to go.
The first thing I would think about (which you somehow didn't mention at all) is
const CollectionClass& MessageSpy::getMessages()
Note the &. That returns a const reference, which can't be modified but can be freely read.
No copying, unless you really want to copy.
If that's not suitable, Qt, for example, uses "implicit data sharing" for plenty of classes.
I.e. your classes are "kinda" returned by value, BUT their internal data is shared until you attempt to perform a write operation on one of them. In that case, the object you're trying to write into performs a deep copy, and the data stops being shared. That means less data is moved around.
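A minimal sketch of the copy-on-write idea (not Qt's actual implementation, and not thread-safe): copies share the underlying vector until someone writes, at which point the writer detaches with a deep copy:

#include <iostream>
#include <memory>
#include <vector>

class SharedIntVector
{
public:
    SharedIntVector() : data_(std::make_shared<std::vector<int> >()) {}
    // The compiler-generated copy just copies the shared_ptr: cheap.
    std::size_t size() const { return data_->size(); }
    int at(std::size_t i) const { return data_->at(i); }
    void push_back(int v)
    {
        detach();                      // write access: stop sharing first
        data_->push_back(v);
    }
private:
    void detach()
    {
        if (data_.use_count() > 1)     // someone else still shares the data
            data_ = std::make_shared<std::vector<int> >(*data_);
    }
    std::shared_ptr<std::vector<int> > data_;
};

int main()
{
    SharedIntVector a;
    a.push_back(1);
    SharedIntVector b = a;             // shallow copy: the data is shared
    b.push_back(2);                    // b detaches; a is unaffected
    std::cout << a.size() << ' ' << b.size() << '\n';   // prints: 1 2
}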
And there's return value optimization, which some people on SO seem to love a bit too much. Basically, when you return something big by value, some compilers in certain situations can eliminate the extra copy and pass the value directly, bypassing an extra assignment, which may be faster than returning by reference. I wouldn't rely on it too much, but if you've profiled your code and figured out that RVO provides a good speed-up, then it's worth using.
I wouldn't recommend "iterators", because using them on a C++03 compiler without the auto keyword is a royal pain in the #&#: long names or many typedefs. I would return a const reference to the container itself instead.
A classic example is iterator invalidation:
std::string test("A");
auto it = test.insert(test.begin()+1,'B');
test.erase();
...
std::cout << *it;
Do you think having this kind of API is bad design, and will it be difficult for beginners to learn/use?
A solution that is costly performance- and memory-wise would be, in this type of case, to reassign the pointer/iterator to an empty string (or to nullptr, though that's not very helpful) whenever a clear method is used.
Some clarifications:
I'm thinking of this design for returning const char*s that can be modified internally (maybe they're stored in a std::vector that can be cleared). I don't want to return a std::string (binary compatibility), and I don't want a get(char*, std::size_t) method because of the size argument that would need to be fetched first (too slow). I also don't want to create a wrapper around std::string or my own string class.
I would recommend reading up on Stepanov's design philosophy (pages 9-11):
[This example] is written in a clear object-oriented style with getters and setters. The proponents of this style say that the advantage of having such functions is that it allows programmers later on to change the implementation. What they forget to mention is that sometimes it is awfully good to expose the implementation. Let us see what I mean. It is hard for me to imagine an evolution of a system that would let you keep the interface of get and set, but be able to change the implementation. I could imagine that the implementation outgrows int and you need to switch to long. But that is a different interface. I can imagine that you decide to switch from an array to a list but that also will force you to change the interface, since it is really not a very good idea to index into a linked list.
Now let us see why it is really good to expose the implementation. Let us assume that tomorrow you decide to sort your integers. How can you do it? Could you use the C library qsort? No, since it knows nothing about your getters and setters. Could you use the STL sort? The answer is the same. While you design your class to survive some hypothetical change in the implementation, you did not design it for the very common task of sorting. Of course, the proponents of getters and setters will suggest that you extend your interface with a member function sort. After you do that, you will discover that you need binary search and median, etc. Very soon your class will have 30 member functions but, of course, it will be hiding the implementation. And that could be done only if you are the owner of the class. Otherwise, you need to implement a decent sorting algorithm on top of the setter-getter interface from scratch and that is a far more difficult and dangerous activity than one can imagine. ...
Setters and getters make our daily programming hard but promise huge rewards in the future when we discover better ways to store arrays of integers in memory. But I do not know a single realistic scenario when hiding memory locations inside our data structure helps and exposure hurts; it is, therefore, my obligation to expose a much more convenient interface that also happens to be consistent with the familiar interface to the C arrays. When we program in C++ we should not be ashamed of its C heritage, but make full use of it. The only problems with C++, and even the only problems with C, arise when they themselves are not consistent with their own logic. ...
My remark about exposing the address locations of consecutive integers is not facetious. It took a major effort to convince the standard committee that such a requirement is an essential property of vectors; they would not, however, agree that vector iterators should be pointers and, therefore, on several major platforms – including the Microsoft one – it is faster to sort your vector by saying the unbelievably ugly
if (!v.empty()) {
sort(&*v.begin(), &*v.begin() + v.size());
}
than the intended
sort(v.begin(), v.end());
Attempts to impose pseudo-abstractness at the cost of efficiency can be defeated, but at a terrible cost.
Stepanov has a lot of other interesting documents available, especially in the "Class Notes" section.
Yes, there are several rules of thumb regarding OOP. No, I'm not convinced that they are really the best way to do things. When you're working with the STL it makes a lot of sense to do things the STL compatible way. And when your abstraction is low level (like std::vector, which is meant specifically to make working with dynamically allocated arrays easier; i.e., it should be usable almost like an array with some added features), then some of those OOP rules of thumb make no sense at all.
To answer the original question: even beginners will eventually need to learn about iterators, object lifetimes, and what I'll call an object's useful life (i.e., "the object hasn't fallen out of scope, but is no longer valid to use," like an invalidated iterator). I don't see any reason to try to hide those facts of life from the user, so I personally wouldn't rule out an iterator-based API on those grounds. The real question is what your API is meant to abstract and what it's meant to expose (similar to the fact that a vector is a nicer array and is meant to expose its array nature). If you answer that, you should have a better idea of whether an iterator-based API makes sense.
As Scott Meyers states in Effective C++: yes, it is indeed not good design to grant access to private/protected members via pointers, iterators, or references, because you never know what the client code will do with them.
As far as I can remember, this should be avoided, and it is sometimes better to return a copy of the data members to the caller.
It is a bad or faulty implementation rather than a design problem.
As for providing access to private or protected members through pointers: basically, it destroys abstraction, one of the basic OOP principles.
I am unsure, though, as to what the question is. Yes, of course it is bad to have an implementation that invalidates iterators. What is the real question here?