Aggregating contributions from multiple donors - C++

As I try to modernize my C++ skills, I keep encountering this situation where "the STL way" isn't obvious to me.
I have an object that wants to gather contributions from multiple sources into a container (typically a std::vector). Each source is an object, and each of those objects provides a method get_contributions() that returns any number of contributions (from 0 to many). The gatherer will call get_contributions() on each contributor and aggregate the results into a single collection.
The question is, what's the best signature for get_contributions()?
Option 1: std::vector<contribution> get_contributions() const
This is the most straightforward, but it leads to lots of copying as the gatherer copies each set of results into the master collection. And yes, performance matters here. For example, if the contributors were geometric models and getting contributions amounted to tessellating them into triangles for rendering, then speed would count and the number of contributions could be enormous.
Option 2: template <typename container> void get_contributions(container &target) const
This allows each contributor to add its contributions directly to the master container by calling target.push_back(foo). The drawback here is that we're exposing the container to other types of inspection and manipulation. I'd prefer to keep the interface as narrow as possible.
Option 3: template <typename out_it> void get_contributions(out_it &it) const
In this solution, the aggregator would pass a std::back_insert_iterator for the master collection, and the individual contributors would do *it++ = foo; for each contribution. This is the best I've come up with so far, but I'm left with the feeling that there must be a more elegant way. The back_insert_iterator feels like a kludge.
Is Option 3 the best, or is there a better approach? Does this gathering pattern have a name?

There's a fourth option that would require you to define your own iterator ranges. Check out Alexandrescu's presentation "Iterators Must Go".

Option 3 is the most idiomatic way. Note that you don't have to use back_insert_iterator. If you know how many elements are going to be added, you can resize the vector, and then provide a regular vector iterator instead. It won't call push_back then (and can potentially save you some copying).
back_insert_iterator's main advantage is that it expands the vector as needed.
It's not a kludge though. It's designed for this exact purpose.
One minor adjustment would be to pass the iterator by value, and then return it when the function returns.
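To make that concrete, here's a minimal sketch of Option 3 with the iterator taken by value and returned (contribution and contributor are placeholder names, not from the original code):

#include <iterator>
#include <vector>

struct contribution { /* ... */ };   // placeholder payload type

class contributor {
public:
    // Option 3: take the output iterator by value, advance it for each
    // contribution, and return it so the caller can keep writing.
    template <typename OutIt>
    OutIt get_contributions(OutIt out) const {
        for (const contribution& c : contributions_)
            *out++ = c;
        return out;
    }
private:
    std::vector<contribution> contributions_;
};

// Gatherer side:
//   std::vector<contribution> master;
//   for (const contributor& c : contributors)
//       c.get_contributions(std::back_inserter(master));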

I would say there are two idiomatic STL ways: your Option 3 (taking an output iterator, which you'd pass by value, by the way) and taking a functor which you would call with each of the contributions.
Each of these is only appropriate if it is suitable to implement get_contributions as a template, of course.
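For the functor flavour, a rough sketch along the same lines (again using the placeholder types from the sketch above; any callable taking a contribution works, a lambda being the obvious choice):

#include <vector>

struct contribution { /* ... */ };   // placeholder, as above

class contributor {
public:
    // Functor variant: invoke the supplied callable once per contribution.
    template <typename Sink>
    void get_contributions(Sink sink) const {
        for (const contribution& c : contributions_)
            sink(c);
    }
private:
    std::vector<contribution> contributions_;
};

// Gatherer side:
//   std::vector<contribution> master;
//   source.get_contributions([&master](const contribution& c) { master.push_back(c); });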

Related

Searching data using different keys

I am no expert in C++ and STL.
I use a structure in a Map as data. Key is some class C1.
I would like to access the same data but using a different key C2 too (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?
You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().
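For instance, a minimal sketch of the two maps sharing a single allocation (C1, C2 and Data are stand-ins for the types in the question, assumed to be usable as map keys):

#include <map>
#include <memory>

struct Data { int payload; };
struct C1 { int id; bool operator<(const C1& o) const { return id < o.id; } };
struct C2 { int id; bool operator<(const C2& o) const { return id < o.id; } };

int main() {
    std::map<C1, std::shared_ptr<Data>> map1;
    std::map<C2, std::shared_ptr<Data>> map2;

    auto d = std::make_shared<Data>(Data{42});  // one allocation...
    map1[C1{1}] = d;                            // ...reachable through both maps
    map2[C2{7}] = d;

    // The Data object lives until the last shared_ptr referring to it is destroyed.
}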
Not in the Standard Library, but Boost offers boost::multi_index
Two keys of different types
I must admit I misread a bit and didn't notice that you want two keys of different types, not just different values. The solution for that will still build on what's below, though. Other answers have pretty much what's needed for that; I'd just add that you could make a universal lookup function (C++14-ish pseudocode):
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
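As a rough sketch of that encapsulation, using overloads of operator[] for the two key types (KeyA, KeyB and Data are hypothetical names; shared_ptr is used so both maps can refer to the same object):

#include <map>
#include <memory>

struct KeyA { int v; bool operator<(const KeyA& o) const { return v < o.v; } };
struct KeyB { int v; bool operator<(const KeyB& o) const { return v < o.v; } };
struct Data { int payload; };

class Index {
public:
    // Narrow interface: lookup by either key type, insertion handled in one place.
    std::shared_ptr<Data> operator[](const KeyA& k) { return by_a_[k]; }
    std::shared_ptr<Data> operator[](const KeyB& k) { return by_b_[k]; }

    void insert(const KeyA& a, const KeyB& b, std::shared_ptr<Data> d) {
        by_a_[a] = d;
        by_b_[b] = std::move(d);
    }

private:
    std::map<KeyA, std::shared_ptr<Data>> by_a_;
    std::map<KeyB, std::shared_ptr<Data>> by_b_;
};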
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds ("simplest" meaning exactly that: it should really be thought through). I'd also switch to unordered_map as the default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
Idea 2
Another solution would be to use pointers; then the only cost of duplicate is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
Object of Data will be alive as long as there is at least one key pointing to it.
What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since pointers could be invalidated by future growth of the vector, though, I would use vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.
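A small sketch of that arrangement, with insertion funnelled through one function so the vector and both index maps stay consistent (Key1, Key2 and Data are made-up placeholder types):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Data { int payload; };
using Key1 = std::string;
using Key2 = int;

class Store {
public:
    void insert(const Key1& k1, const Key2& k2, const Data& d) {
        myData_.push_back(d);
        const std::size_t idx = myData_.size() - 1;
        myMap1_[k1] = idx;
        myMap2_[k2] = idx;
    }

    const Data& by_key1(const Key1& k) const { return myData_[myMap1_.at(k)]; }
    const Data& by_key2(const Key2& k) const { return myData_[myMap2_.at(k)]; }

private:
    std::vector<Data> myData_;            // owns the data
    std::map<Key1, std::size_t> myMap1_;  // indexes stay valid as the vector grows
    std::map<Key2, std::size_t> myMap2_;
};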
Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile". Still, offering a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default, but that's not particularly reliable. That's a world away from recommending that every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.
You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees.
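A quick sketch of the layout (C1, C2 and Data are placeholders); list iterators remain valid until the element they point at is erased, so only insert/erase has to touch all three containers:

#include <iterator>
#include <list>
#include <map>
#include <string>

struct Data { int payload; };
using C1 = std::string;
using C2 = int;

int main() {
    std::list<Data> values;
    std::map<C1, std::list<Data>::iterator> byC1;
    std::map<C2, std::list<Data>::iterator> byC2;

    // Insert once into the list, then index the same element under both keys.
    values.push_back(Data{42});
    auto it = std::prev(values.end());
    byC1["answer"] = it;
    byC2[7] = it;

    int p = byC2[7]->payload;   // direct access through either map
    (void)p;
}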
I had the same problem. At first, holding two maps of shared pointers sounded very appealing, but you would still need to manage those two maps (inserting, removing, etc...).
Then I came up with another way of doing it.
My motivation was accessing data by either x-y or radius-angle. Think of each point as holding data, where the point can be described either as Cartesian x,y or as radius-angle.
So I wrote a struct like
struct MyPoint
{
    std::pair<int, int> cartesianPoint;
    std::pair<int, int> radianPoint;

    bool operator== (const MyPoint& rhs) const
    {
        if (cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint)
            return true;
        return false;
    }
};
// Note: using MyPoint as an unordered_map key also requires a std::hash<MyPoint> specialization.
After that I could use it as the key:
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure if your case is the same or adaptable to this scenario, but it can be an option.

Is using a map where value is std::shared_ptr a good design choice for having multi-indexed lists of classes?

The problem is simple:
We have a class that has members a, b, c, d...
We want to be able to quickly search (the key being the value of one member) and update the list of class instances with new values by providing the current value of a or b or c ...
I thought about having a bunch of
std::map<decltype(MyClass::a) /* b, c, d */, std::shared_ptr<MyClass>>.
1) Is that a good idea?
2) Is boost multi index superior to this handcrafted solution in every way?
PS SQL is out of the question for simplicity/perf reasons.
Boost MultiIndex may have a distinct disadvantage that it will attempt to keep all indices up to date after each mutation of the collection.
This may be a large performance penalty if you have a data load phase with many separate writes.
The usage patterns of Boost Multi Index may not fit with the coding style (and taste...) of the project (members). This should be a minor disadvantage, but I thought I'd mention it.
As ildjarn mentioned, Boost MI doesn't support move semantics as of yet.
Otherwise, I'd consider Boost MultiIndex superior in most occasions, since you'd be unlikely to reach the amount of testing it received.
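For comparison, a minimal Boost.MultiIndex sketch for the a/b case from the question (the member names and the choice of ordered indices are assumptions, not anything from the original post):

#include <string>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

struct MyClass { int a; std::string b; };

using namespace boost::multi_index;

typedef boost::multi_index_container<
    MyClass,
    indexed_by<
        ordered_unique<member<MyClass, int, &MyClass::a>>,            // index 0: by a
        ordered_non_unique<member<MyClass, std::string, &MyClass::b>> // index 1: by b
    >
> MyClassSet;

// Usage:
//   MyClassSet s;
//   s.insert(MyClass{1, "one"});
//   auto& by_b = s.get<1>();
//   auto it = by_b.find("one");   // lookup through the second index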
You may want to consider containing all of your maps in a single class, arbitrarily deciding on one of the containers as the one that stores the "real" objects, and then just using a std::map with a mapped type of raw pointers to elements of the first std::map.
This would be a little more difficult if you ever need to make copies of those maps, however.
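A rough sketch of that idea, with hypothetical member names: one map owns the objects, the second holds raw pointers into it (std::map nodes don't move, so the pointers stay valid until erase), and insertion goes through a single function:

#include <map>
#include <string>

struct MyClass { int a; std::string b; };   // placeholder

class MultiIndexedList {
public:
    void insert(const MyClass& obj) {
        MyClass& stored = by_a_[obj.a] = obj;   // the map keyed on 'a' owns the object
        by_b_[stored.b] = &stored;              // the other map is a non-owning view
    }

    MyClass* find_by_a(int a) {
        auto it = by_a_.find(a);
        return it == by_a_.end() ? nullptr : &it->second;
    }

    MyClass* find_by_b(const std::string& b) {
        auto it = by_b_.find(b);
        return it == by_b_.end() ? nullptr : it->second;
    }

private:
    std::map<int, MyClass> by_a_;           // the "real" objects
    std::map<std::string, MyClass*> by_b_;  // raw pointers into by_a_
};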

Mapping vectors of arbitrary type

I need to store a list of vectors of different types, each to be referenced by a string identifier. For now, I'm using std::map with std::string as the key and boost::any as its value (example implementation posted here).
I've come unstuck when trying to run a method on all the stored vectors, e.g.:
std::map<std::string, boost::any>::iterator it;
for (it = map_.begin(); it != map_.end(); ++it) {
it->second.reserve(100); // FAIL: refers to boost::any not std::vector
}
My questions:
Is it possible to cast boost::any to an arbitrary vector type so I can execute its methods?
Is there a better way to map vectors of arbitrary types and retrieve them later on with the correct type?
At present, I'm toying with an alternative implementation which replaces boost::any with a pointer to a base container class as suggested in this answer. This opens up a whole new can of worms with other issues I need to work out. I'm happy to go down this route if necessary but I'm still interested to know if I can make it work with boost::any, of if there are other better solutions.
P.S. I'm a C++ novice (and have been spoilt silly by Python's dynamic typing for far too long), so I may well be going about this the wrong way. Harsh criticism (ideally followed by suggestions) is very welcome.
The big picture:
As pointed out in comments, this may well be an XY problem so here's an overview of what I'm trying to achieve.
I'm writing a task scheduler for a simulation framework that manages the execution of tasks; each task is an elemental operation on a set of data vectors. For example, if task_A is defined in the model to be an operation on "x"(double), "y"(double), "scale"(int) then what we're effectively trying to emulate is the execution of task_A(double x[i], double y[i], int scale[i]) for all values of i.
Every task (function) operates on a different subset of the data, so these functions share a common function signature and only have access to data via specific APIs, e.g. get_int("scale") and set_double("x", 0.2).
In a previous incarnation of the framework (written in C), tasks were scheduled statically and the framework generated code based on a given model to run the simulation. The ordering of tasks is based on a dependency graph extracted from the model definition.
We're now attempting to create a common runtime for all models with a run-time scheduler that executes tasks as their dependencies are met. The move from generating model-specific code to a generic one has brought about all sorts of pain. Essentially, I need to be able to generically handle heterogenous vectors and access them by "name" (and perhaps type_info), hence the above question.
I'm open to suggestions. Any suggestion.
Looking through the added detail, my immediate reaction would be to separate the data out into a number of separate maps, with the type as a template parameter. For example, you'd replace get_int("scale") with get<int>("scale") and set_double("x", 0.2) with set<double>("x", 0.2);
Alternatively, using std::map, you could pretty easily change that (for one example) to something like doubles["x"] = 0.2; or int scale_factor = ints["scale"]; (though you may need to be a bit wary with the latter -- if you try to retrieve a nonexistent value, it'll create it with default initialization rather than signaling an error).
Either way, you end up with a number of separate collections, each of which is homogeneous, instead of trying to put a number of collections of different types together into one big collection.
If you really do need to put those together into a single overall collection, I'd think hard about just using a struct, so it would become something like vals.doubles["x"] = 0.2; or int scale_factor = vals.ints["scale"];
At least offhand, I don't see this losing much of anything, and by retaining static typing throughout, it certainly seems to fit better with how C++ is intended to work.
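As a very small sketch of that last suggestion (the member names and the get helper are illustrative, not the framework's actual API):

#include <map>
#include <string>

// One homogeneous map per element type, bundled together.
struct Values {
    std::map<std::string, double> doubles;
    std::map<std::string, int>    ints;
};

// Typed accessors dispatching to the right map; at() throws if the name is absent.
template <typename T> T get(const Values& v, const std::string& k);
template <> double get<double>(const Values& v, const std::string& k) { return v.doubles.at(k); }
template <> int    get<int>   (const Values& v, const std::string& k) { return v.ints.at(k); }

// Usage:
//   Values vals;
//   vals.doubles["x"] = 0.2;
//   int scale = get<int>(vals, "scale");   // throws std::out_of_range if missing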

Why is the use of tuples in C++ not more common?

Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple variables from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them. (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky-looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword, tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
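For what it's worth, here's roughly how auto and std::tie (C++11) take the edge off that verbosity; the function and its values are made up for illustration:

#include <string>
#include <tuple>

// Hypothetical function returning several values at once.
std::tuple<int, double, std::string> stats() {
    return std::make_tuple(42, 3.14, "ok");
}

int main() {
    auto t = stats();                 // no spelled-out tuple type at the call site
    int n; double x; std::string s;
    std::tie(n, x, s) = stats();      // unpack straight into existing variables
    (void)t; (void)n; (void)x; (void)s;
}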
For me, it's habit, hands down: Tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 etc values, eventually the n-tuple becomes less code than n-1 swaps. And with default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people pointed out, tuples are just not that useful as other features.
The swapping and rotating gimmicks are just gimmicks. They are utterly confusing to those who have not seen them before, and since it is pretty much everyone, these gimmicks are just poor software engineering practice.
Returning multiple values using tuples is much less self-documenting than the alternatives - returning named types or using named references. Without this self-documentation, it is easy to confuse the order of the returned values, if they are mutually convertible, and not be any wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python, the data structure is in the order and type of the returned values; it's just not explicit.
You rarely see them because well-designed code usually doesn't need them - there are not too many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts { int y, z; };
bool f(int x, ReturnInts& vals);
int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
..report error..
..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!std::get<2>(result)) {
... report error, error handling ...
}
Another significant downside is hidden in here: with "ReturnInts" I can alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data store, std::tuple has the worst characteristics of both a struct and an array: all access is nth-position based, but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array, and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable. (a.lastname is more explanatory than std::get<1>(a).)
This leaves the transformation mentioned by the OP as the only viable usecase for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.

What is the usefulness of project1st<Arg1, Arg2> in the STL?

I was browsing the SGI STL documentation and ran into project1st<Arg1, Arg2>. I understand its definition, but I am having a hard time imagining a practical usage.
Have you ever used project1st or can you imagine a scenario?
A variant of project1st (taking a std::pair, and returning .first) is quite useful. You can use it in combination with std::transform to copy the keys from a std::map<K,V> to a std::vector<K>. Similarly, a variant of project2nd can be used to copy the values from a map to a vector<V>.
As it happens, none of the standard algorithms really benefits from project1st. The closest is partial_sum(project1st), which would set all output elements to the first input element. It mainly exists because the STL is heavily founded in mathematical set theory, where operations like project1st are basic building blocks.
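A sketch of the keys-to-vector use mentioned above, with a lambda standing in for the project1st-style functor (the map contents are made up):

#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

int main() {
    std::map<std::string, int> m{{"a", 1}, {"b", 2}};

    std::vector<std::string> keys;
    keys.reserve(m.size());
    // The lambda plays the role of "project1st on a pair": keep .first, drop .second.
    std::transform(m.begin(), m.end(), std::back_inserter(keys),
                   [](const std::pair<const std::string, int>& kv) { return kv.first; });
}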
My guess is that if you were using the strategy pattern and had a situation where you needed to pass an identity object, this would be a good choice. For example, there might be a case where an algorithm takes several such objects, and perhaps it is possible that you want one of them to do nothing under some situation.
Parallel programming. Imagine a situation where two processes come up with two valid but different results for a given computation, and you need to force them to be the same. project1st/2nd provides a very convenient way to perform this operation on a whole container, using an appropriate parallel call that takes a functor as an argument.
I assume that someone had a practical use for it, or it wouldn't have been written, but I'm drawing a blank on what it might have been. Presumably its use-case is similar to the identity function that the description mentions, where there's no real need for processing but the syntax requires a functor anyway.
The example on that same page suggests using it with the two-container form of std::transform, but if I'm not mistaken, the way they're using it is functionally identical to std::copy, so I don't see the point.
It looks like a solution in search of a problem to me.