Existing methods to emulate C# concept of deferred IEnumerable<T> in C++? - c++

I'm looking to implement something similar to the deferred IEnumerable concept in C++, but without template implementations. (My question is very similar to this other question, except for the deal-breaking templates.)
In our code, we have many functions that receive a spec to query and will store the results in a passed std::vector&. Often these functions call lower level versions which do their own filtering and combining with other sets. Sometimes they will create local vectors (with a different temp allocator) to call the lower-level functions, then push_back to the caller's vector after filtering that. These are not generic functions, in either word or spirit.
I'm looking to eliminate the copies and allocs, and even the need for a results container. Plus while I'm doing this, I'd like to give it a short-circuit ability.
There's an obvious need for deferred operations here. In C#, this would be easy. Everything yields into an IEnumerable, and you use Where() and the other operators, all deferred.
I have an idea how to do this in C++, passing in callbacks rather than containers. We just need the "push_back" abstracted, plus a returned bool for "stop here". The callbacks would be chained very similarly to IEnumerable operators, though in operation it would be more of a push-pull, and maybe feel a bit like IObservable.
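A minimal sketch of that callback idea, assuming C++03 and int results only. The names IntSink, QueryRange, EvenFilter and TakeN are hypothetical, not from any existing library. Push() stands in for push_back and returns false to mean "stop here"; none of the function implementations are templates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sink interface: Push() replaces push_back and returns
// false to ask the producer to stop.
struct IntSink {
    virtual ~IntSink() {}
    virtual bool Push(int value) = 0;
};

// Low-level producer: pushes [first, last) until the sink says stop.
// Returns true if it ran to completion, false if it was cut short.
bool QueryRange(int first, int last, IntSink& out) {
    for (int v = first; v < last; ++v) {
        if (!out.Push(v)) return false;  // short-circuit
    }
    return true;
}

// Chained filter, comparable to Where(): forwards even values only.
class EvenFilter : public IntSink {
public:
    explicit EvenFilter(IntSink& next) : next_(next) {}
    virtual bool Push(int value) {
        // Odd values are dropped, and the producer keeps going.
        return (value % 2 != 0) || next_.Push(value);
    }
private:
    IntSink& next_;
};

// Terminal sink: stores values and requests a stop once `limit` is reached.
class TakeN : public IntSink {
public:
    TakeN(std::vector<int>& out, std::size_t limit) : out_(out), limit_(limit) {}
    virtual bool Push(int value) {
        out_.push_back(value);
        return out_.size() < limit_;
    }
private:
    std::vector<int>& out_;
    std::size_t limit_;
};
```

Chaining is done by construction: a TakeN wraps the caller's vector, an EvenFilter wraps the TakeN, and QueryRange receives the EvenFilter. No intermediate vector is allocated, and the producer stops as soon as the last sink returns false.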
And on top of all this, I have to avoid templatizing these function implementations. I don't want to refactor a lot of code, and I don't want to run into unsupported compiler surprises on the 15 or so different platforms we're currently compiling to. So no promises, no (I think). And no C++11.
This seems like a solved problem, but I've poked around on Google and maybe haven't found the right way to ask it. Has this problem been solved already?

I re-read your question and I actually think this can be serviceable, but it's definitely not ideal without C++11.
You can consider boost::function<boost::optional<T>()> (yay Boost) as a range. The semantics of this range are super simple- you invoke it, and when it's finished, it gives back an empty optional instead of a value.
For output operations like push_back, you can consider a boost::function<void(T)> that will accept the value and then do whatever you need with it. However generally, these aren't that useful, as most functions can be written as returning a modified range, then the caller can simply use that range directly rather than requiring to take an output range.
Not having lambdas here is a total bitch, but you can look into e.g. Boost.Phoenix/Lambda to help.
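To show the pull-style range without depending on Boost, here is a rough sketch that swaps boost::function<boost::optional<T>()> for a small virtual interface. Next() returning false plays the role of the empty optional. IntRange, Counter and MultiplesOf are hypothetical names, and the sketch assumes int elements only.

```cpp
// Hypothetical pull-style range of ints: Next() writes the next value to
// `out` and returns true, or returns false once the range is exhausted.
struct IntRange {
    virtual ~IntRange() {}
    virtual bool Next(int& out) = 0;
};

// Source range over [first, last).
class Counter : public IntRange {
public:
    Counter(int first, int last) : cur_(first), last_(last) {}
    virtual bool Next(int& out) {
        if (cur_ >= last_) return false;
        out = cur_++;
        return true;
    }
private:
    int cur_;
    int last_;
};

// Deferred filter: a function "returning a modified range". Nothing is
// pulled from `source` until the caller asks for a value.
class MultiplesOf : public IntRange {
public:
    MultiplesOf(IntRange& source, int divisor)
        : source_(source), divisor_(divisor) {}
    virtual bool Next(int& out) {
        int candidate = 0;
        while (source_.Next(candidate)) {
            if (candidate % divisor_ == 0) {
                out = candidate;
                return true;
            }
        }
        return false;
    }
private:
    IntRange& source_;
    int divisor_;
};
```

The caller short-circuits simply by not calling Next() again.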

Related

Why is there no clear() function for the inbuilt stack interface in C++?

I have to empty a stack before using it any further. I understand that it can be done like this:
while (!mystack.empty()) { mystack.pop(); }
Is there a specific reason for not having this function? Or is it just that, when it was first designed, no one felt it was required and it has simply been left out?
Also, the stack interface in Java does have a clear() function.
While it would possibly be more readable to have an explicit .clear(), even without it you can empty a stack like this:
mystack = {};
As molbdnilo mentioned within the comments, you have to distinguish between standard containers and container adapters. std::stack is a container adapter, not a container. There are several reasons, why these adapters have to reduce their assumptions about characteristics of used inner container types as far as possible. A relevant one is time complexity (theoretical, accidental) for instance, that might differ a lot between possible underlying containers here. A further relevant aspect can be the requirement to be consistent to several access schemes within parallel working environments (parallel reads and writes), although that might not be relevant here specifically to the clear-functionality.
And in general, it follows a simple software design rule: Do not inflate top-level interfaces with too many assumptions about possible inner implementations and possible usage scenarios that might occur for your data type but are not directly related to its core characteristics. Directly clearing an 'abstract' stack can introduce a lot of confusion and error-prone misuse of objects of this type, since a stack often represents more than just a simple partial ordering; it commonly represents a history. Semantically, a direct clear operation can be seen as a design attack for several stack-related scenarios here: "Forget what I've done and thought so far with and about the stack, let's try something totally different..." Re-assigning is therefore the superior approach here in terms of proportionality between the issues mentioned, since you explicitly introduce a totally new object (while the previous one might still live within a shared_ptr somewhere else, for instance, unaffected by the clearing if required).
Run a while loop until the stack is empty to pop all the elements
while(!stack.empty()){
stack.pop();
}
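If C++11's mystack = {}; is not available, the same re-assignment can be spelled out with a default-constructed temporary. A small sketch, where clear_stack is a hypothetical helper name and not part of the standard library:

```cpp
#include <stack>

// Hypothetical helper: empties a std::stack by assigning a fresh,
// default-constructed stack of the same type (works before C++11 too).
template <typename T, typename Container>
void clear_stack(std::stack<T, Container>& s) {
    s = std::stack<T, Container>();
}
```

This replaces the whole underlying container in one step instead of popping element by element.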

C++ - Feasibility and performance of passing NULL vs. allocating empty std::set to pass to method

I'm doing some performance tuning work. Basically, I have found a potential bottleneck in our code base and thinking of the best solution for it. I will try to keep this question as simple as possible. Basically, I have a method that will work with a set of double values (std::set). So the method signature looks something like:
void MyClass::CalculateStuff(const std::set<double> & mySet);
There are several places in the code that call this method. Only a few of these places need to work with this set while others don't care about the set. I suppose I could create another version of this method that includes this set and modify the existing one to use an empty set. However, this would create some overhead for the places that don't care about the set (because they would have to make additional method calls). So the other option I thought of was using a pointer to a set argument instead, like so:
void MyClass::CalculateStuff(const std::set<double> * pMySet);
The validity of the pointer would determine whether we want to use the set or not (i.e. passing NULL pointer for the set argument means that we do no work associated with the set). This would be faster but obviously not clean from an interface perspective. I suppose I could heavily comment the code.
What do you think should be done? This is probably not a huge deal but it got me thinking about how far you should go to make your code faster (if performance is very important in an application) vs. making sure the code is still clean and manageable. Where should the line be drawn in this case?
well i do games programming. and for me. i only worry about potential bottlenecks when the game starts lagging then i profile etc.
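passing in by pointer will be more memory efficient than copying, as a pointer to an object is only 4 bytes (8 on 64-bit platforms), though passing by const reference costs the same. but i do suggest when you pass it null. that you use nullptr (if you can use C++11) as that's the current standard way to define a null pointer rather than NULL or 0.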
im guessing you are passing in a set to calculate something with the class's set. in which case maybe overloading operators would be the best option for you?
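One way to keep the public interface clean while still using the pointer trick internally is a pair of thin overloads that forward to a single private implementation. This is only a sketch: CalculateStuffImpl, setWork_ and SetWork are made-up names, and the "work" is a placeholder for whatever the real method does with the set.

```cpp
#include <cstddef>
#include <set>

class MyClass {
public:
    MyClass() : setWork_(0) {}

    // Callers that do not care about the set.
    void CalculateStuff() { CalculateStuffImpl(NULL); }

    // Callers that do.
    void CalculateStuff(const std::set<double>& mySet) {
        CalculateStuffImpl(&mySet);
    }

    // Placeholder accessor so the effect of the set branch is observable.
    std::size_t SetWork() const { return setWork_; }

private:
    // The NULL convention stays private, so it needs no heavy commenting
    // at the call sites.
    void CalculateStuffImpl(const std::set<double>* pMySet) {
        if (pMySet) {
            setWork_ += pMySet->size();  // stand-in for the set-related work
        }
        // ... common work would go here ...
    }

    std::size_t setWork_;
};
```

The inline forwarding overloads are trivial for a compiler to inline, so the extra call is unlikely to be measurable, but that is worth confirming with a profiler.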

Long delegation chains in C++

This is definitely subjective, but I'd like to try to avoid it
becoming argumentative. I think it could be an interesting question if
people treat it appropriately.
In my several recent projects I used to implement architectures where long delegation chains are a common thing.
Dual delegation chains can be encountered very often:
bool Exists = Env->FileSystem->FileExists( "foo.txt" );
And triple delegation is not rare at all:
Env->Renderer->GetCanvas()->TextStr( ... );
Delegation chains of higher order exist but are really scarce.
In the above-mentioned examples no NULL run-time checks are performed since the objects used are always there, are vital to the functioning of the program and are explicitly constructed when execution starts. Basically I used to split a delegation chain in these cases:
1) I reuse the object obtained through a delegation chain:
{ // make C invisible to the parent scope
clCanvas* C = Env->Renderer->GetCanvas();
C->TextStr( ... );
C->TextStr( ... );
C->TextStr( ... );
}
2) An intermediate object somewhere in the middle of the delegation chain should be checked for NULL before usage. Eg.
clCanvas* C = Env->Renderer->GetCanvas();
if ( C ) C->TextStr( ... );
I used to fight the case (2) by providing proxy objects so that a method can be invoked on non-NULL object leading to an empty result.
My questions are:
Is either of cases (1) or (2) a pattern or an antipattern?
Is there a better way to deal with long delegation chains in C++?
Here are some pros and cons I considered while making my choice:
Pros:
it is very descriptive: it is clear from one line of code where the object came from
long delegation chains look nice
Cons:
interactive debugging is labored since it is hard to inspect more than one temporary object in the delegation chain
I would like to know other pros and cons of the long delegation chains. Please, present your reasoning and vote based on how well-argued opinion is and not how well you agree with it.
I wouldn't go so far to call either an anti-pattern. However, the first has the disadvantage that your variable C is visible even after it's logically relevant (too gratuitous scoping).
You can get around this by using this syntax:
if (clCanvas* C = Env->Renderer->GetCanvas()) {
C->TextStr( ... );
/* some more things with C */
}
This is allowed in C++ (while it's not in C) and allows you to keep proper scope (C is scoped as if it were inside the conditional's block) and check for NULL.
Asserting that something is not NULL is by all means better than getting killed by a SegFault. So I wouldn't recommend simply skipping these checks, unless you're a 100% sure that that pointer can never ever be NULL.
Additionally, you could encapsulate your checks in an extra free function, if you feel particularly dandy:
#include <cassert>
template <typename T>
T notNULL(T value) {
assert(value);
return value;
}
// e.g.
notNULL(notNULL(Env)->Renderer->GetCanvas())->TextStr();
In my experience, chains like that often contain getters that are less than trivial, leading to inefficiencies. I think that (1) is a reasonable approach. Using proxy objects seems like overkill. I would rather see a crash on a NULL pointer than use a proxy object.
Such long chains of delegation should not happen if you follow the Law of Demeter. I've often argued with some of its proponents that they were holding themselves to it too conscientiously, but if you come to the point of wondering how best to handle long delegation chains, you should probably be a little more compliant with its recommendations.
Interesting question, I think this is open to interpretation, but:
My Two Cents
Design patterns are just reusable solutions to common problems which are generic enough to be widely applied in object oriented (usually) programming. Many common patterns will start you out with interfaces, inheritance chains, and/or containment relationships that will result in you using chaining to call things to some extent. The patterns are not trying to solve a programming issue like this though - chaining is just a side effect of them solving the functional problems at hand. So, I wouldn't really consider it a pattern.
Equally, anti-patterns are approaches that (in my mind) counter-act the purpose of design patterns. For example, design patterns are all about structure and the adaptability of your code. People consider a singleton an anti-pattern because it (often, not always) results in spider-web like code due to the fact that it inherently creates a global, and when you have many, your design deteriorates fast.
So, again, your chaining problem doesn't necessarily indicate good or bad design - it's not related to the functional objectives of patterns or the drawbacks of anti-patterns. Some designs just have a lot of nested objects even when designed well.
What to do about it:
Long delegation chains can definitely be a pain in the butt after a while, and as long as your design dictates that the pointers in those chains won't be reassigned, I think saving a temporary pointer to the point in the chain you're interested in is completely fine (function scope or less preferably).
Personally though, I'm against saving a permanent pointer to a part of the chain as a class member as I've seen that end up in people having 30 pointers to sub objects permanently stored, and you lose all conception of how the objects are laid out in the pattern or architecture you're working with.
One other thought - I'm not sure if I like this or not, but I've seen some people create a private (for your sanity) function that navigates the chain so you can recall that and not deal with issues about whether or not your pointer changes under the covers, or whether or not you have nulls. It can be nice to wrap all that logic up once, put a nice comment at the top of the function stating which part of the chain it gets the pointer from, and then just use the function result directly in your code instead of using your delegation chain each time.
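A rough sketch of that private chain-navigating function. The three structs are minimal stand-ins for the question's Env, Renderer and canvas types, and clHud, Canvas() and the calls counter are hypothetical names added for illustration.

```cpp
// Minimal stand-ins for the types in the question (assumed shapes).
struct clCanvas {
    int calls;
    clCanvas() : calls(0) {}
    void TextStr(const char*) { ++calls; }
};
struct clRenderer {
    clCanvas* Canvas;
    clCanvas* GetCanvas() { return Canvas; }
};
struct clEnvironment {
    clRenderer* Renderer;
};

class clHud {
public:
    explicit clHud(clEnvironment* Env) : Env_(Env) {}

    // Returns false when no canvas is reachable.
    bool Draw() {
        clCanvas* C = Canvas();  // walk the chain once, reuse the result
        if (!C) return false;
        C->TextStr("line 1");
        C->TextStr("line 2");
        return true;
    }

private:
    // Navigates Env->Renderer->GetCanvas() with the NULL checks in one
    // place. Returns NULL if any link in the chain is missing.
    clCanvas* Canvas() const {
        if (!Env_ || !Env_->Renderer) return 0;
        return Env_->Renderer->GetCanvas();
    }

    clEnvironment* Env_;
};
```

Because the chain is re-walked on every Draw() call, a renderer or canvas that gets swapped out later is picked up automatically, at the cost of the extra dereferences.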
Performance
My last note would be that this wrap-in-function approach as well as your delegation chain approach both suffer from performance drawbacks. Saving a temporary pointer lets you avoid the extra two dereferences potentially many times if you're using these objects in a loop. Equally, storing the pointer from the function call will avoid the overhead of an extra function call every loop cycle.
For bool Exists = Env->FileSystem->FileExists( "foo.txt" ); I'd rather go for an even more detailed breakdown of your chain, so in my ideal world, there are the following lines of code:
Environment* env = GetEnv();
FileSystem* fs = env->FileSystem;
bool exists = fs->FileExists( "foo.txt" );
and why? Some reasons:
readability: my attention gets lost till I have to read to the end of the line in case of bool Exists = Env->FileSystem->FileExists( "foo.txt" ); It's just too long for me.
validity: regardless of your statement that the objects are always there, if your company hires a new programmer tomorrow and he starts writing code, the day after tomorrow the objects might not be there. These long lines are pretty unfriendly; new people might get scared of them and do something interesting such as optimising them... which will take a more experienced programmer extra time to fix.
debugging: if by any chance (and after you have hired the new programmer) the application throws a segmentation fault somewhere in a long chain, it is pretty difficult to find out which object was the guilty one. The more detailed the breakdown, the easier it is to find the location of the bug.
speed: if you need to do lots of calls for getting the same chain elements, it might be faster to "pull out" a local variable from the chain instead of calling a "proper" getter function for it. I don't know if your code is production or not, but it seems to miss the "proper" getter function, instead it seems to use only the attribute.
Long delegation chains are a bit of a design smell to me.
What a delegation chain tells me is that one piece of code has deep access to an unrelated piece of code, which makes me think of high coupling, which goes against the SOLID design principles.
The main problem I have with this is maintainability. If you're reaching two levels deep, that is two independent pieces of code that could evolve on their own and break under you. This quickly compounds when you have functions inside the chain, because they can contain chains of their own - for example, Renderer->GetCanvas() could be choosing the canvas based on information from another hierarchy of objects and it is difficult to enforce a code path that does not end up reaching deep into objects over the life time of the code base.
The better way would be to create an architecture that obeyed the SOLID principles and used techniques like Dependency Injection and Inversion Of Control to guarantee your objects always have access to what they need to perform their duties. Such an approach also lends itself well to automated and unit testing.
Just my 2 cents.
If it is possible I would use references instead of pointers. So delegates are guaranteed to return valid objects or throw exception.
clCanvas & C = Env.Renderer().GetCanvas();
For objects which can not exist i will provide additional methods such as has, is, etc.
if ( Env.HasRenderer() ) { clCanvas& C = Env.Renderer().GetCanvas(); /* use C here */ }
If you can guarantee that all the objects exist, I don't really see a problem in what you're doing. As others have mentioned, even if you think that NULL will never happen, it may just happen anyway.
This being said, I see that you use bare pointers everywhere. What I would suggest is that you start using smart pointers instead. When you use the -> operator, a checking smart pointer can assert or throw if the pointer is NULL (boost::shared_ptr asserts, for example, while std::shared_ptr performs no check), so you fail loudly instead of getting a SegFault. Not only that, if you use smart pointers, you can keep copies and the objects don't just disappear under your feet. You have to explicitly reset each smart pointer before the pointer goes to NULL.
This being said, it wouldn't prevent the -> operator from throwing once in a while.
Otherwise I would rather use the approach proposed by AProgrammer. If object A needs a pointer to object C pointed by object B, then the work that object A is doing is probably something that object B should actually be doing. So A can guarantee that it has a pointer to B at all time (because it holds a shared pointer to B and thus it cannot go NULL) and thus it can always call a function on B to do action Z on object C. In function Z, B knows whether it always has a pointer to C or not. That's part of its B's implementation.
Note that with C++11 you have std::shared_ptr<> (and std::unique_ptr<>), so use them!

Pros and Cons of Inversion of Control

Suppose I have a stream of [acme] objects that I want to expose via an API. I have two choices, callbacks and iterators.
API #1: Callbacks
// API #1
// This function takes a user-defined callback
// and invokes it for each object in the stream.
template<typename CallbackFunctor>
void ProcessAcmeStream(CallbackFunctor &callback);
API #2: Iterators
// API #2
// Provides the iterator class AcmeStreamIterator.
AcmeStreamIterator my_stream_begin = AcmeStreamIterator::begin();
AcmeStreamIterator my_stream_end = AcmeStreamIterator::end();
API #1 takes the control flow of the program from the user's hand and will not return until the entire stream is consumed (forgetting exceptions for the moment).
API #2 retains the control flow in the user's hand, allowing the user to move forward the stream on his own.
API #1 feels higher level, allowing the users to jump to the business logic (the callback functor) right away. On the other hand, API #2 feels more flexible, giving the users lower-level control.
From a design perspective, which one should I go with? Are there more pros and cons that I have not seen yet? What are some support/maintenance issues down the future?
The iterator approach is more flexible, with the callback version being easily implemented in terms of the first one by means of existing algorithms:
std::for_each( MyStream::begin(), MyStream::end(), callback );
IMO, the second is clearly superior. While I can (sort of) understand your feeling that it's lower level, I think that's incorrect. The first defines its own specific idea of "higher level" -- but it's one that doesn't fit well with the rest of the C++ standard library, and ends up being relatively difficult to use. In particular, it requires that if the user wants something equivalent to a standard algorithm, it has to be re-implemented from the ground up rather than using existing code.
The second fits perfectly with the rest of the library (assuming you implement your iterators correctly) and gives the user an opportunity for dealing with your data at a much higher level via standard algorithms (and/or new, non-standard algorithms that follow the standard patterns).
One advantage of callbacks over iterators is that users of your API can't mess up iteration. It's easy to compare the wrong iterators, or use the wrong comparison operation or fail in some other way. The callback API prevents that.
Canceling enumeration is easily done using a callback, BTW: Just let the callback return a bool and continue only as long as it returns true.
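The bool-returning callback can be sketched as follows. ForEachUntil and CollectUntilNegative are hypothetical names, and the enumeration is over a plain iterator pair rather than the question's Acme stream.

```cpp
#include <vector>

// Invokes `callback` for each element and stops as soon as it returns
// false. Returns true only if the whole range was consumed.
template <typename Iterator, typename Callback>
bool ForEachUntil(Iterator first, Iterator last, Callback& callback) {
    for (; first != last; ++first) {
        if (!callback(*first)) return false;  // caller asked to cancel
    }
    return true;
}

// Example callback: records values and cancels at the first negative one.
struct CollectUntilNegative {
    std::vector<int> seen;
    bool operator()(int value) {
        if (value < 0) return false;
        seen.push_back(value);
        return true;
    }
};
```

The callback is taken by reference so that any state it accumulates stays visible to the caller after the loop ends.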
C++ standard library idiom is to provide iterators. If you provide iterators, then ProcessAcmeStream is a simple wrapper around std::for_each. Maybe worth the trouble of writing, maybe not, but it isn't exactly boosting your caller into a radical new world of usability, it's a new name for an application of a standard library function to your iterator pair.
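For illustration, here is roughly what that wrapper could look like. This sketch assumes the stream is a std::vector<Acme> and passes the iterator pair explicitly, which differs from the question's signature; Acme, its id member and SumIds are made up for the example.

```cpp
#include <algorithm>
#include <vector>

// Stand-in element and iterator types (assumptions, not the real API).
struct Acme {
    int id;
};
typedef std::vector<Acme>::const_iterator AcmeStreamIterator;

// The callback API expressed as a thin wrapper over std::for_each.
// std::for_each takes the functor by value and returns the copy it used,
// so the result is assigned back to preserve any state it accumulated.
// This requires the functor type to be copy-assignable.
template <typename CallbackFunctor>
void ProcessAcmeStream(AcmeStreamIterator first, AcmeStreamIterator last,
                       CallbackFunctor& callback) {
    callback = std::for_each(first, last, callback);
}

// Example callback functor.
struct SumIds {
    int total;
    SumIds() : total(0) {}
    void operator()(const Acme& a) { total += a.id; }
};
```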
In C++0x, if you also make the iterator pair available through std::begin and std::end then caller can use range-based for, which takes them into the business logic just as quickly as ProcessAcmeStream does, perhaps quicker.
So I'd say, if it's possible to provide an iterator then provide it - the C++ standard does inversion of control for you if the caller wants to program that way. At least, for a case where the control is as simple as this it does.
From a design perspective, I would say that the iterator method is better, simply because it's easier and also more flexible; it's really annoying to write callback functions without lambdas. (Now that C++0x will have lambda expressions, though, this may become less of a concern, but even still, the iterator method is more generic.)
Another issue with callbacks is cancellation. You can return a boolean value to indicate whether you'd like to cancel enumeration, but I always feel uneasy when the control is out of my hands, since you don't always know what might happen. Iterators don't have this issue.
And of course, there's always the issue that iterators can be random-access whereas callbacks aren't, so they're more extensible as well.

How to avoid out parameters?

I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced of the reasons why to avoid them, but I find myself unsure if I'm running into cases where it is unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy //Defines some interface I can't change.
{
public:
ExpensiveCopy(); // default constructor, needed by the calling code below
ExpensiveCopy(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ }
ExpensiveCopy& operator=(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ return *this; }
void addToData(SomeData);
SomeData getData();
};
class B
{
public:
static void doWork(ExpensiveCopy& ec_out, int someParam);
//or
// Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
std::vector<ComplexType> theCollection;
for(std::size_t index = 0; index < theCollection.size(); ++index)
{
doWork(theCollection[index]);
}
void doWork(ComplexType& ct_out)
{
//Do work on the individual element.
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it then you've got a couple of options
Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out)
return a (smart) pointer from doWork instead of copying it. (which you don't want to do as you want to keep things on the stack)
Rely on RVO (which others have pointed out is supported by pretty much all modern compilers)
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
// ... no branched returns here
return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
Expensive e; // named object
// ... no branched returns here
return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
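One way to do that testing is to instrument the copy constructor. In this sketch Expensive is a stand-in class, and payload and the copies counter are hypothetical members added only to observe what the compiler does.

```cpp
// Stand-in for an expensive type, instrumented to count copies.
struct Expensive {
    static int copies;
    int payload;
    explicit Expensive(int p) : payload(p) {}
    Expensive(const Expensive& other) : payload(other.payload) { ++copies; }
};
int Expensive::copies = 0;

// RVO candidate: the return statement constructs an unnamed temporary.
// Since C++17 this copy is guaranteed to be elided.
Expensive makeUnnamed(int p) {
    return Expensive(p);
}

// NRVO candidate: a named local is modified, then returned.
// Eliding this copy is permitted but never required.
Expensive makeNamed(int p) {
    Expensive e(p);
    e.payload += 1;
    return e;
}
```

After calling both functions, inspecting Expensive::copies shows how many copies the compiler actually performed with the current settings. The number depends on the compiler, the language standard and the optimization flags, so no particular value is assumed here.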
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is so prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all comments here I'd mention that in C++0x you'd rarely use output parameter for optimization purpose -- because of Move Constructors (see here)
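A small sketch of the move-constructor point, assuming a C++11 compiler. Buffer, buildBuffer and the two counters are hypothetical and exist only to show that returning a local by value never falls back to the copy constructor when a move constructor is available.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a type that is expensive to copy but cheap to move.
struct Buffer {
    static int copies;
    static int moves;
    std::vector<int> data;

    explicit Buffer(std::size_t n) : data(n, 0) {}
    Buffer(const Buffer& other) : data(other.data) { ++copies; }
    Buffer(Buffer&& other) : data(std::move(other.data)) { ++moves; }
};
int Buffer::copies = 0;
int Buffer::moves = 0;

// Returning a local by value: the compiler either elides the transfer
// entirely (NRVO) or uses the move constructor, not the copy constructor.
Buffer buildBuffer(std::size_t n) {
    Buffer b(n);
    if (!b.data.empty()) {
        b.data[0] = 7;
    }
    return b;
}
```

The calling code then reads Buffer result = buildBuffer(1000); with no out parameter, and at worst pays for a move of the vector's internal pointers.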
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminate an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and values, in which a list of symbols is provided by the bind and a list of values is returned. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
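As a sketch of the std::pair alternative, strtol()'s out parameter can be hidden behind a function that returns both the success flag and the value. parseLong is a hypothetical name, and this version ignores overflow (it does not inspect errno).

```cpp
#include <cstdlib>
#include <utility>

// Returns (true, value) if `text` is entirely a base-10 integer,
// (false, unspecified) otherwise. Overflow is not checked in this sketch.
std::pair<bool, long> parseLong(const char* text) {
    char* end = 0;
    long value = std::strtol(text, &end, 10);
    // Success means at least one character was consumed and nothing
    // was left over after the number.
    bool ok = (end != text) && (*end == '\0');
    return std::make_pair(ok, value);
}
```

Unlike atoi(), this distinguishes "0" from "frog": both yield the value 0, but only the first reports success.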
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have some method that performs work on this expensive to copy object in the first place. Say you have a tree, would you send the tree off into some building method or else give the tree its own building method? Situations like this come up constantly when your design is a little bit off, but tend to fold into themselves when you have it down pat.
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).