I am trying to create a jump table for a fuzzy controller. Basically, I have a lot of functions that take in a string and return a float, and I want to be able to do something along the lines:
float Defuzzify(std::string varName, DefuzzificationMethod defuzz)
{
return functions[defuzz](varName);
}
where DefuzzificationMethod is an enum. The objective is to avoid a switch statement and have an O(1) operation.
What I have right now is:
float CenterOfGravity(std::string varName);
std::vector<std::function<float (std::string)>> defuzzifiers;
Then I try to initialize it in the constructor with:
defuzzifiers.reserve(NUMBER_OF_DEFUZZIFICATION_METHODS);
defuzzifiers[DEFUZZ_COG] = std::bind(&CenterOfGravity, std::placeholders::_1);
This makes the compiler throw about 100 errors about enable_if (which I don't use anywhere, so I assume std does). Is there a way to make this compile? Moreover, is there a way to make this a static vector, since every fuzzy controller will essentially have the same vector?
Thanks in advance
reserve() only makes sure there's enough capacity; it doesn't actually make the vector's size big enough, so indexing into it is out of bounds. What you want to do is:
// construct a vector of the correct size
std::vector<std::function<float (std::string)>> defuzzifiers(NUMBER_OF_DEFUZZIFICATION_METHODS);
// now assign into it...
// if CenterOfGravity is a free function, plain assignment works
defuzzifiers[DEFUZZ_COG] = CenterOfGravity;
// if it's a method
defuzzifiers[DEFUZZ_COG] = std::bind(&ThisType::CenterOfGravity, this, std::placeholders::_1);
Now this might leave you some holes which don't actually have a function defined, so maybe you want to provide a default function of sorts, which the vector constructor allows too
std::vector<std::function<float (std::string)>> defuzzifiers(
NUMBER_OF_DEFUZZIFICATION_METHODS,
[](std::string x) { return 0.0f; }
);
An unrelated note, you probably want your functions to take strings by const-ref and not by value, as copying strings is expensive.
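On the second part of the question (sharing one table between all controllers), here is a minimal sketch of one way it could look. It swaps std::vector and std::function for a std::array of plain function pointers, which is enough when every entry is a free function. The second enum value, MeanOfMax, DefuzzTable, and both placeholder bodies are assumptions made for illustration; they are not from the original post.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical enum standing in for the question's DefuzzificationMethod.
enum DefuzzificationMethod {
    DEFUZZ_COG,
    DEFUZZ_MEAN_OF_MAX,                 // assumed second method, illustration only
    NUMBER_OF_DEFUZZIFICATION_METHODS   // must stay last: it is the table size
};

// Placeholder bodies; real implementations would do the fuzzy math.
inline float CenterOfGravity(const std::string& varName) {
    return static_cast<float>(varName.size());
}
inline float MeanOfMax(const std::string&) {
    return -1.0f;
}

using Defuzzifier = float (*)(const std::string&);
using DefuzzifierTable = std::array<Defuzzifier, NUMBER_OF_DEFUZZIFICATION_METHODS>;

// One table shared by every controller, built once on first use.
inline const DefuzzifierTable& DefuzzTable() {
    static const DefuzzifierTable table = {
        &CenterOfGravity,   // index DEFUZZ_COG
        &MeanOfMax          // index DEFUZZ_MEAN_OF_MAX
    };
    return table;
}

// O(1) dispatch: index the table with the enum value and call the entry.
inline float Defuzzify(const std::string& varName, DefuzzificationMethod defuzz) {
    return DefuzzTable()[static_cast<std::size_t>(defuzz)](varName);
}
```

If the entries need to be member functions bound to a particular object, the table cannot be shared this way and the std::function version above is the more natural fit.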
I'm in the mood for some premature optimization and was wondering the following.
If one has a for-loop, and inside that loop there is a call to a function that returns a container, say a vector, of which the value is caught as an rvalue into a variable in the loop using move semantics, for instance:
std::vector<any_type> function(int i)
{
std::vector<any_type> output(3);
output[0] = i;
output[1] = i*2;
output[2] = i-3;
return(output);
}
int main()
{
for (int i = 0; i < 10; ++i)
{
// stuff
auto value = function(i);
// do stuff with value ...
// ... but in such a way that it can be discarded in the next iteration
}
}
How do compilers handle this memory-wise in the case that move semantics are applied (and that the function will not be inlined)? I would imagine that the most efficient thing to do is to allocate a single piece of memory for all the values, both inside the function and outside in the for-loop, that will get overwritten in each iteration.
I am mainly interested in this, because in my real-life application the vectors I'm creating are a lot larger than in the example given here. I am concerned that if I use functions like this, the allocation and destruction process will take up a lot of useless time, because I already know that I'm going to use that fixed amount of memory a lot of times. So, what I'm actually asking is whether there's some way that compilers would optimize to something of this form:
void function(int i, std::vector<any_type> &output)
{
// fill output
}
int main()
{
std::vector<any_type> dummy; // allocate memory only once
for (int i = 0; i < 10; ++i)
{
// stuff
function(i, dummy);
// do stuff with dummy
}
}
In particular I'm interested in the GCC implementation, but would also like to know what, say, the Intel compiler does.
Here, the most predictable optimization is RVO. When a function returns an object that is used to initialize a new variable, the compiler can elide the extra copy or move and construct the object directly in the destination (which means a program can contain two versions of the function, depending on the use case).
Even so, you will still pay for allocating and destroying the vector's buffer on each loop iteration. If that is unacceptable, you will have to rely on another solution, such as std::array, since your function seems to use a fixed size, or move the vector before the loop and reuse it.
"I would imagine that the most efficient thing to do is to allocate a single piece of memory for all the values, both inside the function and outside in the for-loop, that will get overwritten in each iteration."
I don't think that any of the current compilers can do that. (I would be stunned to see that.) If you want to get insights, watch Chandler Carruth's talk.
If you need this kind of optimization, you need to do it yourself: Allocate the vector outside the loop and pass it by non-const reference to function() as argument. Of course, don't forget to call clear() when you are done or call clear() first inside function().
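A minimal sketch of that manual approach, assuming int elements in place of the question's any_type. The names fill_values and sum_all are made up for this example.

```cpp
#include <vector>

// Fills `output` in place. clear() resets the size but keeps the capacity,
// so after the first call the same buffer is reused for the same element count.
inline void fill_values(int i, std::vector<int>& output) {
    output.clear();
    output.push_back(i);
    output.push_back(i * 2);
    output.push_back(i - 3);
}

// Hypothetical driver: sums every generated value over `iterations` rounds.
inline long sum_all(int iterations) {
    std::vector<int> buffer;   // created once, outside the loop
    buffer.reserve(3);         // single allocation up front
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        fill_values(i, buffer);
        for (int v : buffer) {
            total += v;
        }
    }
    return total;
}
```

The trade-off is a less convenient signature: the caller has to own the buffer and must not rely on its contents surviving the next call.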
All this has nothing to do with move semantics, nothing has changed with C++11 in this respect.
If your loop is a busy loop, then allocating a container in each iteration can cost you a lot. It's easier to find yourself in such a situation than you would probably expect. Andrei Alexandrescu presents an example in his talk Writing Quick Code in C++, Quickly. The surprising thing is that doing unnecessary heap allocations in a tight loop like the one in his example can be slower than the actual file IO. By the way, the container was std::string.
Dear all, I have implemented some functions and would like to ask some basic things, as I do not have a sound fundamental knowledge of C++. I hope you would be kind enough to tell me what a good approach would be, so that I can learn from you. (Please note, this is not homework, and I do not have any experts around me to ask.)
What I did is; I read the input x,y,z, point data (around 3GB data set) from a file and then compute one single value for each point and store inside a vector (result). Then, it will be used in next loop. And then, that vector will not be used anymore and I need to get that memory as it contains huge data set. I think I can do this in two ways.
(1) By just initializing a vector and later by erasing it (see code-1). (2) By allocating a dynamic memory and then later de-allocating it (see code-2). I heard this de-allocation is inefficient as de-allocation again cost memory or maybe I misunderstood.
Q1)
I would like to know what would be the optimized way in terms of memory and efficiency.
Q2)
Also, I would like to know whether function return by reference is a good way of giving output. (Please look at code-3)
code-1
int main(){
//read input data (my_data)
vector<double> result;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result.push_back(value);
}
// do some other stuff
//loop over result and use each value for some other stuff
for (int i=0; i<result.size(); i++){
//do some stuff
}
//result will not be used anymore and thus erase data
result.clear();
}
code-2
int main(){
//read input data
vector<double> *result = new vector<double>;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result->push_back(value);
}
// do some other stuff
//loop over result and use each value for some other stuff
for (int i=0; i<result->size(); i++){
//do some stuff
}
//de-allocate memory
delete result;
result = 0;
}
code-3
vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> *points_at_grid_cutting = new vector<Position3D>;
vector<Position3D>::iterator point;
for (point=begin(); point!=end(); point++) {
//do some stuff
}
return (*points_at_grid_cutting);
}
For such huge data sets I would avoid using std containers at all and make use of memory mapped files.
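If you prefer to go on with std::vector, note that vector::clear() alone only resets the size and keeps the capacity; to actually free the allocated memory, swap with an empty temporary: std::vector<double>().swap(result).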
erase (and likewise clear) will not free the memory used for the vector. It reduces the size but not the capacity, so the vector still holds enough memory for all those doubles.
The best way to make the memory available again is like your code-1, but let the vector go out of scope:
int main() {
{
vector<double> result;
// populate result
// use results for something
}
// do something else - the memory for the vector has been freed
}
Failing that, the idiomatic way to clear a vector and free the memory is:
vector<double>().swap(result);
This creates an empty temporary vector, then it exchanges the contents of that with result (so result is empty and has a small capacity, while the temporary has all the data and the large capacity). Finally, it destroys the temporary, taking the large buffer with it.
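The idiom can be wrapped in a small helper. This is only a sketch; release_memory is a name invented here, not a standard function.

```cpp
#include <vector>

// Releases the buffer owned by `v` by swapping it with an empty temporary.
// The temporary ends up holding the old buffer and is destroyed at the end
// of the statement, which frees the memory.
template <typename T>
void release_memory(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```

Calling release_memory(result) leaves result empty and usable. In C++11 and later, result.clear() followed by result.shrink_to_fit() expresses the same intent, but shrink_to_fit is only a non-binding request, so the swap form remains the more predictable one.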
Regarding code-3: it's not good style to return a dynamically-allocated object by reference, since it doesn't give the caller much of a reminder that they are responsible for freeing it. Often the best thing to do is return a local variable by value:
vector<Position3D> ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> points_at_grid_cutting;
// do whatever to populate the vector
return points_at_grid_cutting;
}
The reason is that provided the caller uses a call to this function as the initialization for their own vector, then something called "named return value optimization" kicks in, and ensures that although you're returning by value, no copy of the value is made.
A compiler that doesn't implement NRVO is a bad compiler, and will probably have all sorts of other surprising performance failures, but there are some cases where NRVO doesn't apply - most importantly when the value is assigned to a variable by the caller instead of used in initialization. There are three fixes for this:
1) C++11 introduces move semantics, which basically sort it out by ensuring that assignment from a temporary is cheap.
2) In C++03, the caller can play a trick called "swaptimization". Instead of:
vector<Position3D> foo;
// some other use of foo
foo = ReturnLabel();
write:
vector<Position3D> foo;
// some other use of foo
ReturnLabel().swap(foo);
3) You write a function with a more complicated signature, such as taking a vector by non-const reference and filling the values into that, or taking an OutputIterator as a template parameter. The latter also provides the caller with more flexibility, since they need not use a vector to store the results, they could use some other container, or even process them one at a time without storing the whole lot at once.
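As a rough illustration of the OutputIterator variant in option 3, the sketch below filters points by their z coordinate. The Position3D layout, the filter criterion, and the name copy_points_above are all assumptions for the example; the real ReturnLabel logic is not shown in the question.

```cpp
#include <iterator>
#include <vector>

// Hypothetical stand-in for the question's Position3D.
struct Position3D {
    double x, y, z;
};

// Writes every point whose z exceeds `cut` to `out` and returns the advanced
// iterator. The caller decides where the results go: a vector through
// std::back_inserter, another container, or a stream iterator.
template <typename OutputIterator>
OutputIterator copy_points_above(const std::vector<Position3D>& points,
                                 double cut, OutputIterator out) {
    for (const Position3D& p : points) {
        if (p.z > cut) {
            *out++ = p;
        }
    }
    return out;
}
```

A caller that wants a vector would write copy_points_above(points, 2.0, std::back_inserter(kept)); no vector is created or copied inside the function itself.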
Your code seems like the computed value from the first loop is only used context-insensitively in the second loop. In other words, once you have computed the double value in the first loop, you could act immediately on it, without any need to store all values at once.
If that's the case, you should implement it that way. No worries about large allocations, storage or anything. Better cache performance. Happiness.
vector<double> result;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result.push_back(value);
If the "result" vector will end up holding thousands of values, this will cause many reallocations. It is best to reserve a large enough capacity up front with the reserve function (constructing the vector with a size would instead create that many elements, and push_back would append after them):
vector<double> result;
result.reserve(someSuitableNumber);
This will reduce the number of reallocations and possibly speed up your code.
Also I would write : vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
Like this :
void vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment, vector<Position3D> & myVec_out) const //myVec_out is populated inside func
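Your aim of avoiding a copy is sound, but returning a reference to a vector allocated with new leaves the caller responsible for deleting it; filling an output parameter avoids both the copy and that ownership problem.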
Destructors in C++ must not fail, so deallocation does not allocate memory, because memory can't be allocated with the no-throw guarantee.
As an aside: instead of looping multiple times, it is probably better to do the operations in an integrated manner. That is, instead of loading the whole dataset and then reducing the whole dataset, just read in the points one by one and apply the reduction directly. Instead of
load_my_data()
for_each (p : my_data)
result.push_back(p)
for_each (p : result)
reduction.push_back (reduce (p))
Just do
file f ("file")
while (f)
Point p = read_point (f)
reduction.push_back (reduce (p))
If you don't need to store those reductions, simply output them sequentially
file f ("file")
while (f)
Point p = read_point (f)
cout << reduce (p)
code-1 will work fine and is almost the same as code-2, with no major advantages or disadvantages.
code-3: Somebody else should answer that, but I believe the difference between a pointer and a reference in this case would be marginal; I do prefer pointers, though.
That being said, I think you might be approaching the optimization from the wrong angle. Do you really need all points to compute the output of a point in your first loop? Or can you rewrite your algorithm to read only one point, compute the value as you would in your first loop, and then use it immediately the way you want to? Maybe not with single points, but with batches of points. That could potentially cut back on your memory requirements quite a bit with only a small increase in processing time.
I am writing a function for getting datasets from a file and putting them into vectors. The datasets are then used in a calculation. In the file, a user writes each dataset on a line under a heading like 'Dataset1'. The result is i vectors by the time the function finishes executing. The function works just fine.
The problem is that I don't know how to get the vectors out of the function! (1) I think I can only return one entity from a function. So I can't return i vectors. Also, (2) I can't write the vectors/datasets as function parameters and return them by reference because the number of vectors/datasets is different for each calculation. If there are other possibilities, I am unaware of them.
I'm sure this is a silly question, but am I missing something here? I would be very grateful for any suggestions. Until now, I have not put the vector/dataset extraction code into a function; I have kept it in my main file, where it has worked fine. I would now like to clean up my code by putting all data extraction code into its own function.
For each calculation, I DO know the number of vectors/datasets that the function will find in the file because I have that information written in the file and can extract it. Is there some way I could use this information?
If each vector is of the same type you can return a
std::vector<std::vector<datatype> >
This would look like:
std::vector<std::vector<datatype> > function(arguments) {
std::vector<std::vector<datatype> > return_vector;
for(int i =0; i < rows; ++i) {
// do processing
return_vector.push_back(resulting_vector);
}
return return_vector;
}
As has been mentioned, you may simply use a vector of vectors.
In addition, you may want to wrap it in a smart pointer, just to make sure you're not copying the contents of your vectors. That is already an optimization, though; first aim at something that works.
As for the information on the number of vectors, you may use it by reserving (or resizing) the outer vector to the appropriate size up front.
Your question is, in essence, "How do I return a pile of things from a function?" It happens that your things are vector<double>, but that's not really important. What is important is that you have a pile of them of unknown size.
You can refine your thinking by rephrasing your one question into two:
How do I represent a pile of things?
How do I return that representation from a function?
As to the first question, this is precisely what containers do. Containers, as you surely know because you are already using one, hold an arbitrary number of similar objects. Examples include std::vector<T> and std::list<T>, among others. Your choice of which container to use is dictated by circumstances you haven't mentioned -- for example, how expensive the items are to copy, whether you need to delete an item from the middle of the pile, etc.
In your specific case, knowing what little we know, it seems you should use std::vector<>. As you know, the template parameter is the type of the thing you want to store. In your case that happens to be (coincidentally), an std::vector<double>. (The fact that the container and its contained object happen to be similar types is of no consequence. If you need a pile of Blobs or Widgets, you say std::vector<Blob> or std::vector<Widget>. Since you need a pile of vector<double>s, you say vector<vector<double> >.) So you would declare it thus:
std::vector<std::vector<double > > myPile;
(Notice the space between > and >. That space is required in C++03; since C++11 the compiler also accepts >>.)
You build up that vector just as you did your vector<double> -- either using generic algorithms, or invoking push_back, or some other way. So, your code would look like this:
void function( /* args */ ) {
std::vector<std::vector<double> > myPile;
while( /* some condition */ ) {
std::vector<double> oneLineOfData;
/* code to read in one vector */
myPile.push_back(oneLineOfData);
}
}
In this manner, you collect all of the incoming data into one structure, myPile.
As to the second question, how to return the data. Well, that's simple -- use a return statement.
std::vector<std::vector<double> > function( /* args */ ) {
std::vector<std::vector<double> > myPile;
/* All of the useful code goes here*/
return myPile;
}
Of course, you could also return the information via a passed-in reference to your vector:
void function( /* args */, std::vector<std::vector<double> >& myPile)
{
/* code goes here. including: */
myPile.push_back(oneLineOfData);
}
Or via a passed-in pointer to your vector:
void function( /* args */, std::vector<std::vector<double> >* myPile)
{
/* code goes here. */
myPile->push_back(oneLineOfData);
}
In both of those cases, the caller must create the vector-of-vector-of-double before invoking your function. Prefer the first (return) way, but if your program design dictates, you can use the other ways.
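Putting the pieces together, here is a sketch of the return-by-value form that also uses the known dataset count to reserve space. The file layout (a heading line such as Dataset1 followed by one line of numbers) is inferred from the question, and read_datasets and its parameters are names invented for this example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout: a heading line (e.g. "Dataset1"), then one line of
// whitespace-separated numbers, repeated once per dataset.
// `expected` is the dataset count read from the file beforehand; it is only
// used to reserve space, so a wrong value costs performance, not correctness.
inline std::vector<std::vector<double> > read_datasets(std::istream& in,
                                                       std::size_t expected) {
    std::vector<std::vector<double> > datasets;
    datasets.reserve(expected);

    std::string heading;
    std::string line;
    while (std::getline(in, heading) && std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> values;
        double v;
        while (fields >> v) {
            values.push_back(v);
        }
        datasets.push_back(values);
    }
    return datasets;
}
```

The caller then writes something like std::vector<std::vector<double> > data = read_datasets(file, count); and accesses dataset i as data[i].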
can anyone recommend a nice and tidy way to achieve this:
float CalculateGoodness(const Thing& thing);
void SortThings(std::vector<Thing>& things)
{
// sort 'things' on value returned from CalculateGoodness, without calling CalculateGoodness more than 'things.size()' times
}
Clearly I could use std::sort with a comparison function that calls CalculateGoodness, but then that will get called several times per Thing as it is compared to other elements, which is no good if CalculateGoodness is expensive. I could create another std::vector just to store the ratings and std::sort that, and rearrange things in the same way, but I can't see a tidy way of doing that. Any ideas?
Edit: Apologies, I should have said without modifying Thing, else it's a fairly easy problem to solve :)
I can think of a simple transformation (well two) to get what you want. You could use std::transform with suitable predicates.
std::vector<Thing> to std::vector< std::pair<Result,Thing> >
sort the second vector (works because a pair is ordered by its first member first)
reverse transformation
Tadaam :)
EDIT: Minimizing the number of copies
std::vector<Thing> to std::vector< std::pair<Result,Thing*> >
sort the second vector
transform back into a secondary vector (local)
swap the original and local vectors
This way you would only copy each Thing once. Note that sort itself performs copies, so this could be worth using.
And because I am feeling grand:
typedef std::pair<float, Thing*> cached_type;
typedef std::vector<cached_type> cached_vector;
struct Compute: std::unary_function< Thing, cached_type >
{
cached_type operator()(Thing& t) const
{
return cached_type(CalculateGoodness(t), &t);
}
};
struct Back: std::unary_function< cached_type, Thing >
{
Thing operator()(cached_type t) const { return *t.second; }
};
void SortThings(std::vector<Thing>& things)
{
// Reserve to only allocate once
cached_vector cache; cache.reserve(things.size());
// Compute Goodness once and for all
std::transform(things.begin(), things.end(),
std::back_inserter(cache), Compute());
// Sort
std::sort(cache.begin(), cache.end());
// We have references inside `things` so we can't modify it
// while dereferencing...
std::vector<Thing> local; local.reserve(things.size());
// Back transformation
std::transform(cache.begin(), cache.end(),
std::back_inserter(local), Back());
// Put result in `things`
swap(things, local);
}
Provided with the usual caveat emptor: off the top of my head, may kill kittens...
You can have a call to CalculateGoodness that you call for each element before sorting, and then CalculateGoodness simply updates an internal member variable. Then you can sort based on that member variable.
Another possibility if you can't modify your type, is storing some kind of std::map for your objects and their previously calculated values. Your sort function would use that map which acts as a cache.
I've upvoted Brian's answer because it clearly best answers what you're looking for. But another solution you should consider is just write it the easy way. Processors are getting more powerful every day. Make it correct and move on. You can profile it later to see if CalculateGoodness really is the bottleneck.
I'd create pairs of ratings and things, calling CalculateGoodness once per thing, and sort those on the rating. If applicable, you could also move this to a map from rating to thing.
The other option would be to cache the result of CalculateGoodness in the Thing itself, either as a simple field or by making CalculateGoodness a method of Thing (making sure the cache is mutable so that const Things still work).
Perhaps a tidy way of doing the separate vector thing is to actually create a vector< pair<float, Thing*> >, where the second element points to the Thing object with the corresponding float value. If you sort this vector by the float values, you can iterate over it and read the Thing objects in the correct order, possibly playing them into another vector or list so they end up stored in order.
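A compact C++11 sketch of the separate-vector idea, using indices instead of pointers. Thing's members, the call counter, and the name SortThingsByKey are assumptions added so the example is self-contained; only the CalculateGoodness signature comes from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Thing; the real type is not shown in the question.
struct Thing {
    int id;
    float weight;
};

// Call counter, present only to show how often the expensive function runs.
inline int& goodness_calls() {
    static int n = 0;
    return n;
}

// Stand-in for the expensive rating function from the question.
inline float CalculateGoodness(const Thing& thing) {
    ++goodness_calls();
    return thing.weight;
}

// Sorts `things` by goodness, calling CalculateGoodness once per element.
inline void SortThingsByKey(std::vector<Thing>& things) {
    // Pair each cached rating with the element's original index.
    std::vector<std::pair<float, std::size_t> > keyed;
    keyed.reserve(things.size());
    for (std::size_t i = 0; i < things.size(); ++i) {
        keyed.push_back(std::make_pair(CalculateGoodness(things[i]), i));
    }

    // Pairs compare by rating first, then by index, so ties keep their order.
    std::sort(keyed.begin(), keyed.end());

    // Rebuild the vector in sorted order, then swap it into place.
    std::vector<Thing> sorted;
    sorted.reserve(things.size());
    for (const auto& k : keyed) {
        sorted.push_back(std::move(things[k.second]));
    }
    things.swap(sorted);
}
```

Sorting small (float, index) pairs also avoids moving whole Thing objects around during the sort, which may matter if Thing is large.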
In C++ I want to add two 50-digit numbers. I use an array to keep each. It means that I want to add two arrays. My problem is that I want to do this in a function named AddNum()
and pass the result to another function named WriteNum for printing and I don't know how to pass an array returned by one function to another function.
hope that my question was clear enough
thanx all
Don't use arrays. Look up the C++ std::vector class in your text book or help system and use that instead. It will make life much easier for you.
Don't return an array from the addition function - instead make it return void and pass the array for storing the result by reference or pointer. Then you will pass that array to the function for printing.
First of all, if you will be adding a lot of numbers, saving them in arrays is troublesome and takes both cpu power and memory. If you can, use GMP (optimized and fast library).
If you must use arrays, then use c++'s vectors instead of c's arrays which will minimize the chance for error and make it simpler.
To send a vector to a function, you do it normally as with int's, string's etc, namely
vector<int> number1;
vector<int> number2;
addnum(number1, number2);
where addnum is defined as :
void addnum(vector<int> a, vector<int> b)
This will copy the first and second vectors into the variables a and b. It is recommended that you pass references to addnum in order to avoid copying the vectors every time. This can be done by changing the addnum definition to:
void addnum(vector<int>& a, vector<int>& b)
and then performing normal operation on a and b as usual.
To have addnum return a vector you need to change the definition of addnum to
vector<int> addnum(vector<int>& a, vector<int>& b)
and of course have the return statement with the vector you want to return.
If you choose to pass the values by reference, the number1 and number2 vectors declared in your main function will also change if you change them in the addnum function. That means that if you save the result in variable a in the addnum function, you will have that same value in the number1 vector, so you don't need the function to return a new vector but can instead reuse the existing ones.
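For the original problem of adding two 50-digit numbers, a sketch along these lines might do. It stores digits least-significant first, which is a choice made here to keep the carry loop simple. to_digits is a helper invented for the example, and WriteNum returns the text to print instead of printing it directly, so the caller decides where the output goes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Converts a string of decimal digits into a vector, least-significant first.
// Assumes the input contains only the characters '0' to '9'.
inline std::vector<int> to_digits(const std::string& text) {
    std::vector<int> digits;
    digits.reserve(text.size());
    for (std::string::const_reverse_iterator it = text.rbegin(); it != text.rend(); ++it) {
        digits.push_back(*it - '0');
    }
    return digits;
}

// Schoolbook addition with carry. Inputs are taken by const reference, so
// nothing is copied on the way in and the caller's vectors are not modified.
inline std::vector<int> AddNum(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> sum;
    sum.reserve(std::max(a.size(), b.size()) + 1);
    int carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry != 0; ++i) {
        int digit = carry;
        if (i < a.size()) digit += a[i];
        if (i < b.size()) digit += b[i];
        sum.push_back(digit % 10);
        carry = digit / 10;
    }
    return sum;
}

// Builds the printable form, most-significant digit first.
inline std::string WriteNum(const std::vector<int>& number) {
    std::string text;
    text.reserve(number.size());
    for (std::vector<int>::const_reverse_iterator it = number.rbegin(); it != number.rend(); ++it) {
        text.push_back(static_cast<char>('0' + *it));
    }
    return text;
}
```

The result of one function feeds straight into the other, for example std::cout << WriteNum(AddNum(to_digits(first), to_digits(second))), which is the hand-off the question asks about.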
Just to clarify the answer of sharptooth: if you want to return a new array, you have to allocate it in the function and someone else will have to free it. It can be done but you will always have to remember to free the result after calling the function. As Neil Butterworth points out: std::vector helps you solve this.
if i get it right, and if this is what you wanna do, your functions should have signatures like that:
int* addNumbers(int* array1, int* array2)
{
....
}
and
void writeNumbers(int *array3)
{
....
}
and you can call it like:
//declare and init array1 and array2
//...
writeNumbers(addNumbers(array1, array2));
i hope i understood your question correctly.