boost lambda collection size evaluation


I have a function of the form:
void DoSomething(const boost::function<bool ()>& condition, other stuff);
This function does some work and returns only when the condition is true. The condition has been expressed as a functor argument because I want to supply different conditions at different call sites.
Now, this is fairly straightforward to use directly, but it requires declaring lots of little throwaway functions or functor objects, which I'd like to avoid if possible. I've been looking at Boost's lambda library for possible ways to do away with these, but I think I'm missing something fundamental; I just can't get it to do what I want.
One case that's stumped me at the moment: I have a std::vector collection called data; the condition that I'm after is when the size() of that collection reaches a certain threshold. Essentially, then, I want my condition functor to return true when data.size() >= threshold and false otherwise. But I've been having trouble expressing that in lambda syntax.
The best that I've been able to come up with thus far (which at least compiles, though it doesn't work) is this:
boost::function<bool (size_t)> ge = boost::bind(std::greater_equal<size_t>(),
_1, threshold);
boost::function<size_t ()> size = boost::bind(&std::vector<std::string>::size,
data);
DoSomething(boost::lambda::bind(ge, boost::lambda::bind(size)), other stuff);
On entry to DoSomething, the size is 0 -- and even though the size increases during the course of running, the calls to condition() always seem to get a size of 0. Tracing it through (which is a bit tricky through Boost's internals), while it does appear to be calling greater_equal each time condition() is evaluated, it doesn't appear to be calling size().
So what fundamental thing have I completely messed up? Is there a simpler way of expressing this sort of thing (while still keeping the code as inline as possible)?
I'd ideally like to get it as close as possible to the C# equivalent code fluency:
DoSomething(delegate() { return data.size() >= threshold; }, other stuff);
DoSomething(() => (data.size() >= threshold), other stuff);

The problem is that the bound function object stores a copy of the data vector, not a reference. So size() is called on the copy, not on the original object that you are modifying. This can be solved by wrapping data with boost::ref, which stores a reference instead:
boost::function<size_t ()> size = boost::bind(&std::vector<std::string>::size,
boost::ref(data));
You can also use the normal >= operator instead of std::greater_equal<> in the definition of your lambda function and combine it all together:
boost::function<bool ()> cond =
(boost::bind(&std::vector<std::string>::size, boost::ref(data))
>= threshold);
DoSomething(cond, other stuff);
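If a C++11 compiler is available, a built-in lambda that captures data by reference comes close to the C# form and avoids the copy problem in the same way boost::ref does. The following is only a sketch under that assumption: PollUntil is a hypothetical stand-in for DoSomething (it grows the vector itself to simulate the work), and SizeAtLeast is an illustrative helper name, not part of the original code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for DoSomething: appends one item per step until
// the condition holds, and returns the number of steps taken.
int PollUntil(const std::function<bool()>& condition,
              std::vector<std::string>& data)
{
    int steps = 0;
    while (!condition()) {
        data.push_back("item");
        ++steps;
    }
    return steps;
}

// Builds the condition with a lambda. `data` is captured by reference, so
// size() is evaluated on the live vector each time the condition runs.
// The caller must keep `data` alive for as long as the condition is used.
std::function<bool()> SizeAtLeast(const std::vector<std::string>& data,
                                  std::size_t threshold)
{
    return [&data, threshold] { return data.size() >= threshold; };
}
```

At a call site the lambda can also be written inline, for example DoSomething([&] { return data.size() >= threshold; }, other stuff);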

Related

C++ - how to buffer calc results faster than using unordered_map

I read a lot about unordered_map not being very fast but I wonder what's the best alternative to do this:
I need to buffer calculation results for a function of an integer argument. I don't know ahead of time what range or interval will be requested. Storing in a vector with maximal resolution would cost way too much memory.
So I'm using
unordered_map<unsigned long, pair<T, long>>
Where the key is the argument of the function to be computed, the first of the pair the result of the computation of type T, and the second of the pair a version information for that computation.
Only if the unordered_map does not contain the element or it contains it but the version is outdated, the computation is carried out and then added to the unordered_map. The lookup function looks something like this:
template<typename T> class BufferClass{
long MyVersion;
unordered_map<unsigned long, pair<T,long>> Buffer;
public:
BufferClass(): MyVersion{1} {};
T* GetIfValid(unsigned long index)
{
if (!Buffer.count(index)) return nullptr;
pair <T,long> &x{Buffer.at(index)};
if (x.second!=MyVersion) return nullptr;
return &x.first;
}
/* ...Functions to set elements...*/
}
As you can see, I combined element validity check and retrieval in one function, so that I only need one lookup for both.
The profiler shows most of the computation time is used up in the hash function __constrain_hash related to unordered_map.
What would be the fastest way to store and retrieve values like that? The list of stored indices is expected to be non-continuous (there will be a lot of "holes") and first and last index are also mostly unknown.
T will generally be a "small" data type (like double or complex).
Thanks!
Martin

Your code can perform two hash lookups per query: one in count() and another in at(). That is redundant; use unordered_map::find instead.
Sample code:
const auto iter = Buffer.find(index);
if(iter != Buffer.end()) //Found something, so the return value is not end()
{
// iter->first is the key; the cached (value, version) pair is iter->second.
// The version check on iter->second.second still applies here.
return &(iter->second.first);
}
else return nullptr;
In my opinion, unordered_map is slow, but not that slow; for 99.9% of uses it is fast enough. You may want to check whether you are calling this function more often than necessary. Switching to another, faster implementation is not free: it can bloat your code base, hurt your application's portability across host systems, and so on. If std::unordered_map seems unreasonably slow, it is almost always because something went wrong in your own work (either your estimate or your implementation).
BTW, one more thing: you said T is a small data type, right? Then return its value instead of a pointer to it; that is faster and safer.

One thing that strikes me as odd about your implementation is the following two lines:
if (!Buffer.count(index)) return nullptr;
pair <T,long> &x{Buffer.at(index)};
This code is checking if the key exists, then throws away the result and searches for the same key again with bounds checking to boot. I think you'll find searching once with std::unordered_map<unsigned long, std::pair<T, long>>::find and reusing the result to be preferable:
auto it = Buffer.find(index);
if (it == Buffer.end()) return nullptr;
auto& x = it->second; // *it is the (key, value) pair; the mapped pair<T, long> is it->second
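Putting the single-lookup advice together with the version check from the question gives something like the sketch below. Set() and Invalidate() are assumed helpers standing in for the "functions to set elements" that the question elides; they are not the asker's actual code.

```cpp
#include <unordered_map>
#include <utility>

// Sketch of the questioner's cache with a single hash lookup per query.
// Set() and Invalidate() are assumed helpers, not part of the original.
template <typename T>
class BufferClass {
    long MyVersion = 1;
    std::unordered_map<unsigned long, std::pair<T, long>> Buffer;

public:
    // Returns a pointer to the cached value, or nullptr if the entry is
    // missing or was stored under an older version.
    T* GetIfValid(unsigned long index)
    {
        auto it = Buffer.find(index);        // the only lookup
        if (it == Buffer.end()) return nullptr;
        std::pair<T, long>& x = it->second;  // mapped (value, version) pair
        if (x.second != MyVersion) return nullptr;
        return &x.first;
    }

    // Stores a value under the current version.
    void Set(unsigned long index, const T& value)
    {
        Buffer[index] = std::make_pair(value, MyVersion);
    }

    // Marks every stored entry as outdated without erasing anything.
    void Invalidate() { ++MyVersion; }
};
```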

Using the Lambda Function with Objects and for_each loops

I'm trying to better understand a few fundamental concepts about working with the lambda functions with a vector of objects and std::for_each() loops.
I'm attempting to pass the const int contents of someObjectVector.end()->someObjectVectorMethod() into i, but just can't find a way to make it happen.
I also want to use those iterators to set the parameters of the std::for_each() loop. Is this just not possible, or am I approaching this the wrong way syntactically?
std::for_each(someObjectVector.begin()->someObjectVectorMethod(), (someObjectVector.end()->getSomeObjectVectorData(), [&](int i)
{
someObjectVector[0].setSomeObjectVectorDate() + i;
});

Your syntax is definitely wrong.
As input, std::for_each() expects
a range of elements denoted by 2 input iterators
a unary function object that each element in that range will be passed to as the sole input parameter.
In your code, someObjectVectorMethod() and getSomeObjectVectorData() are not iterators, and your lambda doesn't accept a vector element as input.
What are you TRYING to accomplish with your code? You probably need something more like this instead:
std::vector<YourObjectType> someObjectVector;
...
std::for_each(someObjectVector.begin(), someObjectVector.end(),
[](YourObjectType &obj) {
obj.setSomeObjectVectorData(...);
}
);
Or:
std::vector<YourObjectType*> someObjectVector;
...
std::for_each(someObjectVector.begin(), someObjectVector.end(),
[](YourObjectType *obj) {
obj->setSomeObjectVectorData(...);
}
);
Depending on how someObjectVector is actually declared in your code.
Tweak the above lambdas to suit your actual needs.
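As a concrete, compilable variant of the first form, here is a minimal sketch. Item, addToValue and AddToAll are hypothetical names standing in for YourObjectType and its setter; the value to add is captured by the lambda rather than passed as the lambda's parameter.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element type standing in for YourObjectType.
struct Item {
    int value;
    void addToValue(int delta) { value += delta; }
};

// Adds `delta` to every element. for_each passes each element (not an
// iterator or an index) to the lambda, which takes it by reference.
void AddToAll(std::vector<Item>& items, int delta)
{
    std::for_each(items.begin(), items.end(),
                  [delta](Item& item) { item.addToValue(delta); });
}
```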

How do you create an array of member function pointers with arguments?

I am trying to create a jump table for a fuzzy controller. Basically, I have a lot of functions that take in a string and return a float, and I want to be able to do something along the lines:
float Defuzzify(std::string varName, DefuzzificationMethod defuzz)
{
return functions[defuzz](varName);
}
where DefuzzificationMethod is an enum. The objective is to avoid a switch statement and have a O(1) operation.
What I have right now is:
float CenterOfGravity(std::string varName);
std::vector<std::function<float (std::string)>> defuzzifiers;
Then I try to initialize it in the constructor with:
defuzzifiers.reserve(NUMBER_OF_DEFUZZIFICATION_METHODS);
defuzzifiers[DEFUZZ_COG] = std::bind(&CenterOfGravity, std::placeholders::_1);
This is making the compiler throw about 100 errors about enable_if (which I don't use anywhere, so I assume std does). Is there a way to make this compile ? Moreover, is there a way to make this a static vector, since every fuzzy controller will essentially have the same vector ?
Thanks in advance

Reserve only makes sure there is enough capacity; it doesn't actually make the vector's size big enough, so indexing into it is still out of bounds. What you want to do is:
// construct a vector of the correct size
std::vector<std::function<float (std::string)>> defuzzifiers(NUMBER_OF_DEFUZZIFICATION_METHODS);
// now assign into it...
// if CenterOfGravity is a free function, just simple = works
defuzzifiers[DEFUZZ_COG] = CenterOfGravity;
// if it's a method
defuzzifiers[DEFUZZ_COG] = std::bind(&ThisType::CenterOfGravity, this, std::placeholders::_1);
Now this might leave you some holes which don't actually have a function defined, so maybe you want to provide a default function of sorts, which the vector constructor allows too
std::vector<std::function<float (std::string)>> defuzzifiers(
NUMBER_OF_DEFUZZIFICATION_METHODS,
[](std::string x) { return 0.0f; }
);
An unrelated note, you probably want your functions to take strings by const-ref and not by value, as copying strings is expensive.
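On the question of sharing one table between all controllers: if the defuzzifiers are free functions (or static member functions), one possible approach is a function-local static array of plain function pointers. This is a sketch under that assumption; DEFUZZ_MEAN_OF_MAX, MeanOfMax and the placeholder return values are hypothetical and only there to make the example compile.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Enum names follow the question; DEFUZZ_MEAN_OF_MAX is a hypothetical
// second method added for illustration.
enum DefuzzificationMethod {
    DEFUZZ_COG,
    DEFUZZ_MEAN_OF_MAX,
    NUMBER_OF_DEFUZZIFICATION_METHODS
};

// Placeholder bodies: the real computations are not shown in the question.
float CenterOfGravity(const std::string& /*varName*/) { return 1.0f; }
float MeanOfMax(const std::string& /*varName*/) { return 2.0f; }

using Defuzzifier = float (*)(const std::string&);

// One table shared by every caller, built once on first use.
const std::array<Defuzzifier, NUMBER_OF_DEFUZZIFICATION_METHODS>& Defuzzifiers()
{
    static const std::array<Defuzzifier, NUMBER_OF_DEFUZZIFICATION_METHODS>
        table = {{&CenterOfGravity, &MeanOfMax}};
    return table;
}

// Constant-time dispatch on the enum value, with no switch statement.
float Defuzzify(const std::string& varName, DefuzzificationMethod defuzz)
{
    return Defuzzifiers()[defuzz](varName);
}
```

If the functions are non-static members that need a this pointer, plain function pointers will not do; a table of std::function objects or of pointers to member functions would be needed instead.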

How do you use ranges in D?

Whenever I try to use ranges in D, I fail miserably.
What is the proper way to use ranges in D? (See inline comments for my confusion.)
void print(R)(/* ref? auto ref? neither? */ R r)
{
foreach (x; r)
{
writeln(x);
}
// Million $$$ question:
//
// Will I get back the same things as last time?
// Do I have to check for this every time?
foreach (x; r)
{
writeln(x);
}
}
void test2(alias F, R)(/* ref/auto ref? */ R items)
{
// Will it consume items?
// _Should_ it consume items?
// Will the caller be affected? How do I know?
// Am I supposed to?
F(items);
}

You should probably read this tutorial on ranges if you haven't.
When a range will and won't be consumed depends on its type. If it's an input range and not a forward range (e.g if it's an input stream of some kind - std.stdio.byLine would be one example of this), then iterating over it in any way shape or form will consume it.
//Will consume
auto result = find(inRange, needle);
//Will consume
foreach(e; inRange) {}
If it's a forward range and it's a reference type, then it will be consumed whenever you iterate over it, but you can call save to get a copy of it, and consuming the copy won't consume the original (nor will consuming the original consume the copy).
//Will consume
auto result = find(refRange, needle);
//Will consume
foreach(e; refRange) {}
//Won't consume
auto result = find(refRange.save, needle);
//Won't consume
foreach(e; refRange.save) {}
Where things get more interesting is forward ranges which are value types (or arrays). They act the same as any forward range with regards to save, but they differ in that simply passing them to a function or using them in a foreach implicitly saves them.
//Won't consume
auto result = find(valRange, needle);
//Won't consume
foreach(e; valRange) {}
//Won't consume
auto result = find(valRange.save, needle);
//Won't consume
foreach(e; valRange.save) {}
So, if you're dealing with an input range which isn't a forward range, it will be consumed regardless. And if you're dealing with a forward range, you need to call save if you want to guarantee that it isn't consumed; otherwise, whether it's consumed or not depends on its type.
With regards to ref, if you declare a range-based function to take its argument by ref, then it won't be copied, so it won't matter whether the range passed in is a reference type or not, but it does mean that you can't pass an rvalue, which would be really annoying, so you probably shouldn't use ref on a range parameter unless you actually need it to always mutate the original (e.g. std.range.popFrontN takes a ref because it explicitly mutates the original rather than potentially operating on a copy).
As for calling range-based functions with forward ranges, value type ranges are most likely to work properly, since far too often, code is written and tested with value type ranges and isn't always properly tested with reference types. Unfortunately, this includes Phobos' functions (though that will be fixed; it just hasn't been properly tested for in all cases yet - if you run into any cases where a Phobos function doesn't work properly with a reference type forward range, please report it). So, reference type forward ranges don't always work as they should.

Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:
interface Range {
void doForeach(void delegate() myDel);
}
And your function looked like this:
void myFunc(Range r) {
r.doForeach(() {
//blah
});
}
You wouldn't expect anything strange to happen when you reassigned r, nor would you expect to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality.
Does this help at all?
Edit (what we've been talking about in comments):
void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange!R) {
//do some stuff
}
Edit 2: Going by the std.range documentation, it looks like isForwardRange simply checks whether save is defined, and save is just a primitive that makes a sort of un-linked copy of the range. The docs specify that save is not defined for e.g. files and sockets.

The short of it: ranges are consumed. This is what you should expect and plan for.
The ref on the foreach plays no role in this; it only relates to the value returned by the range.
The long of it: ranges are consumed, but may get copied. You'll need to look at the documentation to decide what will happen. Value types get copied, and thus a range may not be modified when passed to a function, but you cannot rely on this just because the range is a struct, since the underlying data stream may be a reference, e.g. FILE. And of course a ref function parameter will add to the confusion.

Say your print function looks like this:
void print(R)(R r) {
foreach (x; r) {
writeln(x);
}
}
Here, r is passed into the function using reference semantics, using the generic type R: so you don't need ref here (and auto will give a compilation error). Otherwise, this will print the contents of r, item-by-item. (I seem to remember there being a way to constrain the generic type to that of a range, because ranges have certain properties, but I forget the details!)
Anyway:
auto myRange = [1, 2, 3];
print(myRange);
print(myRange);
...will output:
1
2
3
1
2
3
If you change your function to (presuming x++ makes sense for your range):
void print(R)(R r) {
foreach (x; r) {
x++;
writeln(x);
}
}
...then each element will be increased before being printed, but this is using copy semantics. That is, the original values in myRange won't be changed, so the output will be:
2
3
4
2
3
4
If, however, you change your function to:
void print(R)(R r) {
foreach (ref x; r) {
x++;
writeln(x);
}
}
...then the x is reverted to reference semantics, which refer to the original elements of myRange. Hence the output will now be:
2
3
4
3
4
5

Sort a vector on a value calculated on each element, without performing the calculation multiple times per element

can anyone recommend a nice and tidy way to achieve this:
float CalculateGoodness(const Thing& thing);
void SortThings(std::vector<Thing>& things)
{
// sort 'things' on value returned from CalculateGoodness, without calling CalculateGoodness more than 'things.size()' times
}
Clearly I could use std::sort with a comparison function that calls CalculateGoodness, but then that will get called several times per Thing as it is compared to other elements, which is no good if CalculateGoodness is expensive. I could create another std::vector just to store the ratings and std::sort that, and rearrange things in the same way, but I can't see a tidy way of doing that. Any ideas?
Edit: Apologies, I should have said without modifying Thing, else it's a fairly easy problem to solve :)

I can think of a simple transformation (well two) to get what you want. You could use std::transform with suitable predicates.
std::vector<Thing> to std::vector< std::pair<Result,Thing> >
sort the second vector (works because a pair is sorted by its first member)
reverse transformation
Tadaam :)
EDIT: Minimizing the number of copies
std::vector<Thing> to std::vector< std::pair<Result,Thing*> >
sort the second vector
transform back into a secondary vector (local)
swap the original and local vectors
This way you would only copy each Thing once. Remember that sort itself performs copies, so sorting the cheap (rating, pointer) pairs instead of the Things could be worth it.
And because I am feeling generous:
typedef std::pair<float, Thing*> cached_type;
typedef std::vector<cached_type> cached_vector;
struct Compute: std::unary_function< Thing, cached_type >
{
cached_type operator()(Thing& t) const
{
return cached_type(CalculateGoodness(t), &t);
}
};
struct Back: std::unary_function< cached_type, Thing >
{
Thing operator()(cached_type t) const { return *t.second; }
};
void SortThings(std::vector<Thing>& things)
{
// Reserve to only allocate once
cached_vector cache; cache.reserve(things.size());
// Compute Goodness once and for all
std::transform(things.begin(), things.end(),
std::back_inserter(cache), Compute());
// Sort
std::sort(cache.begin(), cache.end());
// We have references inside `things` so we can't modify it
// while dereferencing...
std::vector<Thing> local; local.reserve(things.size());
// Back transformation
std::transform(cache.begin(), cache.end(),
std::back_inserter(local), Back());
// Put result in `things`
swap(things, local);
}
Provided with the usual caveat emptor: off the top of my head, may kill kittens...

You can have a call to CalculateGoodness that you call for each element before sorting, and then CalculateGoodness simply updates an internal member variable. Then you can sort based on that member variable.
Another possibility if you can't modify your type, is storing some kind of std::map for your objects and their previously calculated values. Your sort function would use that map which acts as a cache.

I've upvoted Brian's answer because it clearly best answers what you're looking for. But another solution you should consider is just write it the easy way. Processors are getting more powerful every day. Make it correct and move on. You can profile it later to see if CalculateGoodness really is the bottleneck.

I'd create pairs of ratings and things, calling CalculateGoodness once per thing, and sort those on the rating. If applicable, you could also move this to a map from rating to thing.
The other option would be to cache the result of CalculateGoodness in the Thing itself, either as a simple field or by making CalculateGoodness a method of Thing (making sure the cache is mutable so const Things still work).

Perhaps a tidy way of doing the separate vector thing is to actually create a vector< pair<float, Thing*> >, where the second element points to the Thing object with the corresponding float value. If you sort this vector by the float values, you can iterate over it and read the Thing objects in the correct order, possibly playing them into another vector or list so they end up stored in order.
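The pair-based approach described in these answers can be sketched as follows. This variant pairs each rating with the element's index rather than a pointer, which is a swapped-in detail and not what the answers above literally propose. Thing, its fields, the placeholder body of CalculateGoodness and the g_goodnessCalls counter are all hypothetical, added only so the sketch is self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Thing and CalculateGoodness so the sketch compiles on its own.
struct Thing { int id; float weight; };

int g_goodnessCalls = 0;  // counts evaluations, for demonstration only

float CalculateGoodness(const Thing& thing)
{
    ++g_goodnessCalls;
    return thing.weight;  // placeholder for the expensive calculation
}

void SortThings(std::vector<Thing>& things)
{
    // Pair each rating with the index of its Thing; one evaluation each.
    std::vector<std::pair<float, std::size_t>> rated;
    rated.reserve(things.size());
    for (std::size_t i = 0; i < things.size(); ++i)
        rated.push_back(std::make_pair(CalculateGoodness(things[i]), i));

    // Pairs compare by rating first, then by index, so ties keep their
    // original relative order.
    std::sort(rated.begin(), rated.end());

    // Rebuild the vector in sorted order, then swap it into place.
    std::vector<Thing> sorted;
    sorted.reserve(things.size());
    for (const auto& entry : rated)
        sorted.push_back(things[entry.second]);
    things.swap(sorted);
}
```

Using the index as the second pair member means Thing itself never needs a comparison operator, and each Thing is copied once when the sorted vector is rebuilt.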
