Memory optimization for a huge data set - C++

Dear all, I have implemented some functions and would like to ask a basic question, as I do not have a sound fundamental knowledge of C++. I hope you will be kind enough to tell me what the good way is, so I can learn from you. (Please note this is not homework, and I do not have any experts around me to ask.)
What I do is: I read the input x,y,z point data (around a 3 GB data set) from a file, then compute a single value for each point and store it inside a vector (result). That vector is then used in a second loop. After that, the vector is not used anymore, and I need to get that memory back because it holds a huge data set. I think I can do this in two ways.
(1) By just declaring a vector and later clearing it (see code-1). (2) By allocating the vector dynamically and later de-allocating it (see code-2). I heard that this de-allocation is inefficient because de-allocation itself costs memory, or maybe I misunderstood.
Q1)
I would like to know what would be the optimized way in terms of memory and efficiency.
Q2)
Also, I would like to know whether function return by reference is a good way of giving output. (Please look at code-3)
code-1
int main(){
    //read input data (my_data)
    vector<double> result;
    for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
        // do some stuff and calculate a "double" value (say value)
        // using each point coordinate
        result.push_back(value);
        // do some other stuff
    }
    //loop over result and use each value for some other stuff
    for (int i=0; i<result.size(); i++){
        //do some stuff
    }
    //result will not be used anymore and thus erase data
    result.clear();
}
code-2
int main(){
    //read input data
    vector<double> *result = new vector<double>;
    for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
        // do some stuff and calculate a "double" value (say value)
        // using each point coordinate
        result->push_back(value);
        // do some other stuff
    }
    //loop over result and use each value for some other stuff
    for (int i=0; i<result->size(); i++){
        //do some stuff
    }
    //de-allocate memory
    delete result;
    result = 0;
}
code-3
vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> *points_at_grid_cutting = new vector<Position3D>;
vector<Position3D>::iterator point;
for (point=begin(); point!=end(); point++) {
//do some stuff
}
return (*points_at_grid_cutting);
}

For such huge data sets I would avoid using std containers at all and make use of memory-mapped files.
If you prefer to stay with std::vector, note that clear() alone does not release the allocated memory; swap with an empty temporary, e.g. std::vector<double>().swap(result), to actually free it.

clear() will not free the memory used by the vector. It reduces the size but not the capacity, so the vector still holds enough memory for all those doubles.
The best way to make the memory available again is like your code-1, but let the vector go out of scope:
int main() {
{
vector<double> result;
// populate result
// use results for something
}
// do something else - the memory for the vector has been freed
}
Failing that, the idiomatic way to clear a vector and free the memory is:
vector<double>().swap(result);
This creates an empty temporary vector, then it exchanges the contents of that with result (so result is empty and has a small capacity, while the temporary has all the data and the large capacity). Finally, it destroys the temporary, taking the large buffer with it.
Regarding code-3: it's not good style to return a dynamically-allocated object by reference, since it doesn't provide the caller with much of a reminder that they are responsible for freeing it. Often the best thing to do is return a local variable by value:
vector<Position3D> ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> points_at_grid_cutting;
// do whatever to populate the vector
return points_at_grid_cutting;
}
The reason is that provided the caller uses a call to this function as the initialization for their own vector, then something called "named return value optimization" kicks in, and ensures that although you're returning by value, no copy of the value is made.
A compiler that doesn't implement NRVO is a bad compiler, and will probably have all sorts of other surprising performance failures, but there are some cases where NRVO doesn't apply - most importantly when the value is assigned to a variable by the caller instead of used in initialization. There are three fixes for this:
1) C++11 introduces move semantics, which basically sort it out by ensuring that assignment from a temporary is cheap.
2) In C++03, the caller can play a trick called "swaptimization". Instead of:
vector<Position3D> foo;
// some other use of foo
foo = ReturnLabel();
write:
vector<Position3D> foo;
// some other use of foo
ReturnLabel().swap(foo);
3) You write a function with a more complicated signature, such as taking a vector by non-const reference and filling the values into that, or taking an OutputIterator as a template parameter. The latter also provides the caller with more flexibility, since they need not use a vector to store the results, they could use some other container, or even process them one at a time without storing the whole lot at once.
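As an illustration of the third option, here is a minimal sketch of the OutputIterator approach; the function name compute_values and the placeholder computation are mine, not from the question:
#include <iterator>
#include <vector>

// Writes one result per input element into 'out'; the caller decides where
// the results go (a vector, another container, or immediate processing).
template <typename InputIt, typename OutputIt>
OutputIt compute_values(InputIt first, InputIt last, OutputIt out)
{
    for (; first != last; ++first) {
        double value = 0.0;   // placeholder for the real per-point computation
        *out++ = value;
    }
    return out;
}

// Usage:
//   std::vector<double> results;
//   compute_values(my_data.begin(), my_data.end(), std::back_inserter(results));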

Your code suggests that each value computed in the first loop is used in the second loop independently of the other values. In other words, once you have computed the double value in the first loop, you could act on it immediately, without any need to store all the values at once.
If that's the case, you should implement it that way. No worries about large allocations or storage. Better cache performance. Happiness.
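A minimal sketch of what that fused version might look like, assuming a Position3D point type like the asker's (the actual computation is a placeholder):
#include <vector>

struct Position3D { double x, y, z; };   // stand-in for the asker's point type

void process(const std::vector<Position3D>& my_data)
{
    for (std::vector<Position3D>::const_iterator it = my_data.begin(); it != my_data.end(); ++it) {
        double value = it->x + it->y + it->z;   // placeholder for the real computation
        // ... use 'value' right here instead of pushing it into 'result' ...
    }
}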

vector<double> result;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result.push_back(value);
If the "result" vector will end up having thousands of values, this will result in many reallocations. It would be best if you initialize it with a large enough capacity to store, or use the reserve function :
vector<double) result (someSuitableNumber,0.0);
This will reduce the number of reallocation, and possible optimize your code further.
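Since one value is computed per input point, the element count is known up front; a small sketch of the reserve variant (assuming my_data from the question):
vector<double> result;
result.reserve(my_data.size());   // capacity only; no elements are constructed yet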
Also, instead of:
vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
I would write it like this:
void vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment, vector<Position3D> & myVec_out) const //myVec_out is populated inside func
Your idea of returning a reference is correct, since you want to avoid copying.

Destructors in C++ must not fail; therefore deallocation does not allocate memory, because memory can't be allocated with the no-throw guarantee.
Apart from that: instead of looping multiple times, it is probably better to do the operations in an integrated manner, i.e. instead of loading the whole dataset and then reducing the whole dataset, just read in the points one by one and apply the reduction directly. That is, instead of
load_my_data()
for_each (p : my_data)
result.push_back(p)
for_each (p : result)
reduction.push_back (reduce (p))
Just do
file f ("file")
while (f)
Point p = read_point (f)
reduction.push_back (reduce (p))
If you don't need to store those reductions, simply output them sequentially
file f ("file")
while (f)
Point p = read_point (f)
cout << reduce (p)
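In actual C++ the streaming version might look roughly like this; the file name, the whitespace-separated format, and the reduction are assumptions for the sake of the sketch:
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream f("points.txt");       // hypothetical input file
    double x, y, z;
    while (f >> x >> y >> z) {           // read one point at a time
        double value = x + y + z;        // placeholder for the real reduction
        std::cout << value << '\n';      // or push_back into a (reserved) vector
    }
}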

code-1 will work fine and is almost the same as code-2, with no major advantages or disadvantages.
code-3: Somebody else should answer that, but I believe the difference between a pointer and a reference would be marginal in this case; I do prefer pointers, though.
That being said, I think you might be approaching the optimization from the wrong angle. Do you really need all points to compute the output of a point in your first loop? Or can you rewrite your algorithm to read only one point, compute the value as you would in your first loop, and then use it immediately the way you want to? Maybe not with single points, but with batches of points (see the sketch below). That could potentially cut back on your memory requirements quite a bit, with only a small increase in processing time.
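A sketch of the batched variant; the file format, the batch size, and process_batch are all placeholders, not the asker's actual code:
#include <fstream>
#include <vector>

struct Position3D { double x, y, z; };

int main()
{
    std::ifstream f("points.txt");           // hypothetical input file
    const std::size_t batch_size = 100000;   // tune to your memory budget
    std::vector<Position3D> batch;
    batch.reserve(batch_size);

    Position3D p;
    while (f >> p.x >> p.y >> p.z) {
        batch.push_back(p);
        if (batch.size() == batch_size) {
            // process_batch(batch);          // compute and use the values here
            batch.clear();                    // keeps the capacity for the next chunk
        }
    }
    // if (!batch.empty()) process_batch(batch);   // leftover points
}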

Related

Is std::push_back relatively expensive to use?

I want to improve the performance of the following code. What aspect might affect the performance of the code when it's executed?
Also, considering that there is no limit to how many objects you can add to the container, what improvements could be made to “Object” or “addToContainer” to improve the performance of the program?
I was wondering if std::push_back in C++ affects performance of the code in any way? Especially if there is no limit to adding to list.
#include <string>
#include <vector>
using namespace std;

struct Object {
    string name;
    string description;
};

vector<Object> container;

void addToContainer(Object object) {
    container.push_back(object);
}

int main() {
    addToContainer({ "Fira", "+5 ATTACK" });
    addToContainer({ "Potion", "+10 HP" });
}
Before you do ANYTHING profile the code and get a benchmark. After you make a change profile the code and get a benchmark. Compare the benchmarks. If you do not do this, you're rolling dice. Is it faster? Who knows.
Profile profile profile.
With push_back you have two main concerns:
Resizing the vector when it fills up, and
Copying the object into the vector.
There are a number of improvements you can make to the resizing cost of push_back, depending on how items are being added.
Strategic use of reserve to minimize the amount of resizing, for example. If you know how many items are about to be added, you can check the capacity and size to see if it's worth your time to reserve to avoid multiple resizes. Note this requires knowledge of vector's expansion strategy and that is implementation-specific. An optimization for one vector implementation could be a terribly bad mistake on another.
You can use insert to add multiple items at a time. Of course this is close to useless if you need to add another container into the code in order to bulk-insert.
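A sketch of both ideas combined; addAllToContainer is a hypothetical helper, and the reserve check is only worthwhile under the caveats above:
// Add a known number of items, reserving only when it would actually help.
void addAllToContainer(const std::vector<Object>& incoming)
{
    if (container.size() + incoming.size() > container.capacity())
        container.reserve(container.size() + incoming.size());

    // Bulk insert instead of one push_back per element.
    container.insert(container.end(), incoming.begin(), incoming.end());
}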
If you have no idea how many items are incoming, you might as well let vector do its job and optimize HOW the items are added.
For example
void addToContainer(Object object) // pass by value. Possible copy
{
container.push_back(object); // copy
}
Those copies can be expensive. Get rid of them.
void addToContainer(Object && object) //no copy and can still handle temporaries
{
container.push_back(std::move(object)); // moves rather than copies
}
std::string is often very cheap to move.
This variant of addToContainer can be used with
addToContainer({ "Fira", "+5 ATTACK" });
addToContainer({ "Potion", "+10 HP" });
and might just migrate a pointer and a few book-keeping variables per string. They are temporaries, so no one cares if push_back rips their guts out and throws away the corpses.
As for existing Objects
Object o{"Pizza pop", "+5 food"};
addToContainer(std::move(o));
If they are expendable, they get moved as well. If they aren't expendable...
void addToContainer(const Object & object) // no copy
{
container.push_back(object); // copy
}
You have an overload that does it the hard way.
Tossing this one out there
If you already have a number of items you know are going to be in the list, rather than appending them all one at a time, use an initialization list:
vector<Object> container{
{"Vorpal Cheese Grater", "Many little pieces"},
{"Holy Hand Grenade", "OMG Damage"}
};
push_back can be extremely expensive, but as with everything, it depends on the context. Take for example this terrible code:
std::vector<float> slow_func(const float* ptr)
{
std::vector<float> v;
for(size_t i = 0; i < 256; ++i)
v.push_back(ptr[i]);
return v;
}
each call to push_back has to do the following:
Check to see if there is enough space in the vector
If not, allocate new memory, and copy the old values into the new vector
copy the new item to the end of the vector
increment end
Now there are two big problems here wrt performance. Firstly each push_back operation depends upon the previous operation (since the previous operation modified end, and possibly the entire contents of the array if it had to be resized). This pretty much destroys any vectorisation possibilities in the code. Take a look here:
https://godbolt.org/z/RU2tM0
The func that uses push_back does not make for very pretty asm. It's effectively hamstrung into being forced to copy a single float at a time. Now if you compare that to an alternative approach where you resize first, and then assign; the compiler just replaces the whole lot with a call to new, and a call to memcpy. This will be a few orders of magnitude faster than the previous method.
std::vector<float> fast_func(const float* ptr)
{
std::vector<float> v(256);
for(size_t i = 0; i < 256; ++i)
v[i] = ptr[i];
return v;
}
BUT, and it's a big but, the relative performance of push_back very much depends on whether the items in the array can be trivially copied (or moved). If, for example, you do something silly like:
struct Vec3 {
float x = 0;
float y = 0;
float z = 0;
};
Well, now when we do this:
std::vector<Vec3> v(256);
The compiler will allocate memory, but also be forced to set all the values to zero (which is pointless if you are about to overwrite them again!). The obvious way around this is to use a different constructor:
std::vector<Vec3> v(ptr, ptr + 256);
So really, only use push_back (well, really you should prefer emplace_back in most cases) when either:
additional elements are added to your vector occasionally
or, The objects you are adding are complex to construct (in which case, use emplace_back!)
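For the second case, a small self-contained illustration of emplace_back constructing elements in place; Item is a hypothetical stand-in for the question's Object, given a constructor so the arguments can be forwarded:
#include <string>
#include <vector>

struct Item {
    std::string name;
    std::string description;
    Item(std::string n, std::string d) : name(std::move(n)), description(std::move(d)) {}
};

int main()
{
    std::vector<Item> items;
    items.reserve(2);                          // avoid regrowth in this tiny example
    items.emplace_back("Fira", "+5 ATTACK");   // constructs the Item in place,
    items.emplace_back("Potion", "+10 HP");    // no temporary Item is created
}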
Without any other requirements, unfortunately this is the most efficient:
void addToContainer(Object) { }
To answer the rest of your question: in general, push_back will just add to the end of the allocated vector in O(1), but will need to grow the vector on occasion, which is O(N) but can be amortized out.
Also, it would likely be more efficient not to use string but to keep char*, although memory management might be tricky unless it is always a literal being added.

Copying vector elements to a vector pair

In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{
for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
{
pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
}
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector<pair<string, int>> pairVector;
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of the other two vectors to the vector of pairs. I doubt this is an efficient way, as this function gets called hundreds of times with different data, and it goes through the loop every time. This might cause a performance issue.
My goal is to copy both vectors to the vector of pairs in "one go", i.e. without using a loop. I am not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?
You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:
int n = strVector.size();
for (int i = 0; i < n; ++i )
This is really a stretch, but it might eke out a few extra quantums of execution time. And those are pretty much all the obvious optimizations here. Realistically, the most to be gained is by using reserve(). The other optimizations might help a little more, but it all boils down to moving a certain number of bytes from one area of memory to another, and there is no special way of doing that which is faster than the others. A consolidated sketch of the loop with these changes follows.
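Putting those pieces together, the loop might end up looking like this (a sketch; the names come from the question and the sizes are assumed equal):
void combineVectors(vector<string>& strVector, vector<int>& intVector,
                    vector<pair<string, int>>& pairVector)
{
    pairVector.reserve(pairVector.size() + strVector.size());    // grow once
    const size_t n = strVector.size();                           // both vectors have this size
    for (size_t i = 0; i < n; ++i)
        pairVector.emplace_back(strVector[i], intVector[i]);     // no temporaries, no bounds checks
}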
We can use std::generate() to achieve this:
#include <bits/stdc++.h>
using namespace std;
vector <string> strVector{ "hello", "world" };
vector <int> intVector{ 2, 3 };
pair<string, int> f()
{
static int i = -1;
++i;
return make_pair(strVector[i], intVector[i]);
}
int main() {
int min_Size = min(strVector.size(), intVector.size());
vector< pair<string,int> > pairVector(min_Size);
generate(pairVector.begin(), pairVector.end(), f);
for( int i = 0 ; i < 2 ; i++ )
cout << pairVector[i].first <<" " << pairVector[i].second << endl;
}
I'll try to summarize what you want, with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors containing two heterogeneous types, where you can access the two types as some sort of pair.
If you want to make this more efficient, you need to think about what you are using the new vector for? I can see three scenarios with what you are doing.
1) The new vector is a copy of your data, so you can do stuff with it without affecting the original vectors (i.e. you still need the original two vectors).
2) The new vector is now the storage mechanism for your data (i.e. you no longer need the original two vectors).
3) You are simply coupling the vectors together to make use and representation easier (i.e. where they are stored doesn't actually matter).
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You do something like Shakil's answer or here or some type of customized iterator.
2) Here you can make some optimisations where you do zero copying of the data, with the use of a wrapper class. Note: a wrapper class works if you don't need to use the actual std::vector<std::pair> class. You can make a class that you move the data into and create access operators for. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class StringIntContainer {
public:
    StringIntContainer(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
        : string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
    {
        assert(string_vec_.size() == int_vec_.size());
    }

    std::pair<std::string, int> operator[] (std::size_t _i) const
    {
        return std::make_pair(string_vec_[_i], int_vec_[_i]);
    }

    /* You may want methods that return references to the data so you can edit it */

    std::pair<std::vector<std::string>, std::vector<int>> Decompose()
    {
        return std::make_pair(std::move(string_vec_), std::move(int_vec_));
    }

private:
    std::vector<std::string> string_vec_;
    std::vector<int> int_vec_;
};
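A brief usage sketch of the wrapper above, reusing the question's GetStringVector/GetIntVector; note that both source vectors are left empty after the moves:
std::vector<std::string> strVector = GetStringVector();
std::vector<int> intVector = GetIntVector();

StringIntContainer combined(strVector, intVector);   // moves both vectors in, no element copies
std::pair<std::string, int> first = combined[0];     // pair built on demand from the two arrays

auto parts = combined.Decompose();                   // moves the vectors back out, again no copies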

Fastest way to allocate temporary elements (knowing maximum number) in a vector?

In a function I need to store some integers in a vector. The function is called a lot of times. I know that there are fewer than 10 of them, but the number varies for each call of the function. Which choice gives better performance?
For example, I found that this:
std::vector<int> list(10);
std::vector<int>::iterator it = list.begin();
unsigned int num_of_elements_stored(0);
for ( ... iterate on some structures ... ){
    if (... a specific condition ...){
        *it = integer from structures;
        it++;
        num_of_elements_stored++;
    }
}
is slower than:
std::vector<int> list;
unsigned int num_of_elements_stored(0);
for ( ... iterate on some structures ... ){
    if (... a specific condition ...){
        list.push_back( integer from structures );
    }
}
num_of_elements_stored = list.size();
I'm going to go down an extremely uncool route here. At the risk of being crucified, I would suggest that std::vector isn't so great here. An exception would be if you get lucky with the memory allocator and get that temporal locality through the allocator that creating and destroying a bunch of teeny vectors normally wouldn't provide.
Wait!
Before people kill me, I want to say that vector is awesome, generally speaking, as one of the most well-rounded data structures available. But when you're looking at a hotspot like this (hopefully with a profiler) as a result of creating a bunch of teeny vectors repeatedly in a tight loop, that's where this kind of straightforward usage of vector can bite you.
The trouble is that it's a heap-allocated structure (basically a dynamic array), and when we're dealing with a boatload of teeny arrays like this, we really want to use that often-cached memory at the top of the stack that's so cheap to allocate/free when we can.
One way to mitigate this is to reuse the same vector across repeated calls. Store it in the outside caller function's scope and pass it in by reference, clear it, do your push_backs, rinse and repeat. It's worth noting that clear doesn't free any memory in the vector, so it keeps that former capacity around (useful here when we want to reuse the same memory and play to temporal locality).
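A small self-contained sketch of that reuse pattern; the element type, the condition, and the call count are placeholders:
#include <vector>

// The reusable buffer lives in the caller's scope and is passed in by reference.
// 'input' and the condition stand in for the asker's structures.
void collect(const std::vector<int>& input, std::vector<int>& out)
{
    out.clear();                        // size becomes 0, the capacity is kept
    for (int value : input)
        if (value % 2 == 0)             // "a specific condition"
            out.push_back(value);       // usually no allocation after warm-up
}

int main()
{
    std::vector<int> buffer;            // allocated memory is reused across calls
    for (int call = 0; call < 1000; ++call) {
        collect({1, 2, 3, 4, 5}, buffer);
        // ... use buffer ...
    }
}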
But here we can play to that stack. As a simplified example (using C-style code that isn't very kosher in C++ or even bothers with exception-safety, but easier to illustrate):
#include <cstdlib>   // malloc / free
#include <cstring>   // memcpy

int stack_mem[32];
int num = 0;
int cap = 32;
int* ptr = stack_mem;
for ( ... iterate on some structures ... )
{
if (... a specific condition ...)
{
if (num == cap)
{
cap *= 2;
int* new_ptr = static_cast<int*>(malloc(cap * sizeof(int)));
memcpy(new_ptr, ptr, num * sizeof(int));
if (ptr != stack_mem)
free(ptr);
ptr = new_ptr;
}
ptr[num++] = your_int;
}
}
if (ptr != stack_mem)
free(ptr);
Of course if you use something like this, you should properly wrap it in a reusable class template that does bounds-checking, doesn't use memcpy, has exception-safety, a formal push_back method, emplace_back, copy ctor, move ctor, swap, possibly a fill ctor, range ctor, erase, range erase, insert, range insert, size, empty, iterators/begin/end, uses placement new to avoid requiring copy assignment or default ctor, etc.
The solution uses the stack when N <= 32 (can use a different number suited for your common-case needs) and then switches to heap when exceeded. This allows it to handle your common case scenarios efficiently but also not just go kablooey in those rare case scenarios when N might be huge in some pathological case. That makes it somewhat comparable to variable-length arrays in C (something I actually wish we had in C++, at least until std::dynarray is available) but without the stack overflow tendencies VLAs could have since it switches to heap in rare case scenarios.
I applied all these standard-compliant formalities with a structure based on this idea with a class template that accepts <T, FixedN>, and now use it almost as much as vector since I work with so many cases like this with teeny arrays being repeatedly created that should, in the vast majority of common cases, fit on the stack (but always with those ultra rare exceptional possibilities). It wiped off many profiler hotspots I was getting related to memory off the map.
... but applying this basic idea might give you quite a boost. You can put in the kind of effort described above, wrapping it into a safe container that preserves C++ object semantics, if it pays off in your measurements; and I think it should, quite a bit, in your case.
I would probably go with sort of a middle ground:
std::vector<int> list;
list.reserve(10);
...and the rest could be pretty much like your second version. To be honest, however, it's probably open to question whether this will really make a lot of difference.
If you use a static vector, it will be allocated only once.
The first example works slower because it allocates and destroys the vector on each call.

Memory allocation for return value of a function in a loop in C++11: how does it optimize?

I'm in the mood for some premature optimization and was wondering the following.
If one has a for-loop, and inside that loop there is a call to a function that returns a container, say a vector, of which the value is caught as an rvalue into a variable in the loop using move semantics, for instance:
std::vector<any_type> function(int i)
{
std::vector<any_type> output(3);
output[0] = i;
output[1] = i*2;
output[2] = i-3;
return(output);
}
int main()
{
for (int i = 0; i < 10; ++i)
{
// stuff
auto value = function(i);
// do stuff with value ...
// ... but in such a way that it can be discarded in the next iteration
}
}
How do compilers handle this memory-wise in the case that move semantics are applied (and that the function will not be inlined)? I would imagine that the most efficient thing to do is to allocate a single piece of memory for all the values, both inside the function and outside in the for-loop, that will get overwritten in each iteration.
I am mainly interested in this, because in my real-life application the vectors I'm creating are a lot larger than in the example given here. I am concerned that if I use functions like this, the allocation and destruction process will take up a lot of useless time, because I already know that I'm going to use that fixed amount of memory a lot of times. So, what I'm actually asking is whether there's some way that compilers would optimize to something of this form:
void function(int i, std::vector<any_type> &output)
{
// fill output
}
int main()
{
std::vector<any_type> dummy; // allocate memory only once
for (int i = 0; i < 10; ++i)
{
// stuff
function(i, dummy);
// do stuff with dummy
}
}
In particular I'm interested in the GCC implementation, but would also like to know what, say, the Intel compiler does.
Here, the most predictable optimization is RVO. When a function returns an object that is used to initialize a new variable, the compiler can elide the additional copy and move, and construct directly on the destination (it means that a program can contain two versions of the function, depending on the use case).
You will still pay for allocating and destroying a buffer inside the vector at each loop iteration. If that is unacceptable, you will have to rely on another solution, like std::array, since your function seems to use a fixed size (a sketch follows), or move the vector out of the loop and reuse it.
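A sketch of the std::array variant, assuming the result really is always three elements (any_type is replaced by double just for the illustration); std::array involves no dynamic allocation at all:
#include <array>

std::array<double, 3> function(int i)
{
    return { double(i), double(i * 2), double(i - 3) };
}

int main()
{
    for (int i = 0; i < 10; ++i) {
        auto value = function(i);   // lives on the stack, nothing to allocate or free
        // ... use value ...
    }
}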
"I would imagine that the most efficient thing to do is to allocate a single piece of memory for all the values, both inside the function and outside in the for-loop, that will get overwritten in each iteration."
I don't think that any of the current compilers can do that. (I would be stunned to see that.) If you want to get insights, watch Chandler Carruth's talk.
If you need this kind of optimization, you need to do it yourself: Allocate the vector outside the loop and pass it by non-const reference to function() as argument. Of course, don't forget to call clear() when you are done or call clear() first inside function().
All this has nothing to do with move semantics, nothing has changed with C++11 in this respect.
If your loop is a busy loop, then allocating a container in each iteration can cost you a lot. It's easier to find yourself in such a situation than you would probably expect. Andrei Alexandrescu presents an example in his talk Writing Quick Code in C++, Quickly. The surprising thing is that doing unnecessary heap allocations in a tight loop like the one in his example can be slower than the actual file IO. I was surprised to see that. By the way, the container was std::string.

Avoid recomputation when data is not changed

Imagine you have a pretty big array of double and a simple function avg(double*, size_t) that computes the average value (just a simple example: both the array and the function could be whatever data structure and algorithm). I would like that, if the function is called a second time and the array has not changed in the meantime, the return value comes directly from the previous call, without going through the unchanged data again.
Holding the previous value looks simple: I just need a static variable inside the function, right? But what about detecting changes in the array? Do I need to write an interface for accessing the array that sets a flag to be read by the function? Can something smarter and more portable be done?
As Kerrek SB so astutely put it, this is known as "memoization." I'll cover my personal favorite method at the end (both with the double* array and the much easier DoubleArray), so you can skip to there if you just want to see code. However, there are many ways to solve this problem, and I wanted to cover them all, including those suggested by others.
The first part is some theory and alternate approaches. There are fundamentally four parts to the problem:
Prove the function is idempotent (calling a function once is the same as calling it any number of times)
Cache results keyed to the inputs
Search cached results given a new set of inputs
Invalidating cached results which are no longer accurate/current
The first step is easy for you: average is idempotent. It has no side effects.
Caching the results is a fun step. You obviously are going to create some "key" for the inputs that you can compare against the cached "keys." In Kerrek SB's memoization example, the key is a tuple of all of the arguments, compared against other keys with ==. In your system, the equivalent solution would be to have the key be the contents of the entire array. This means each key comparison is O(n), which is expensive. If the function was more expensive to calculate than the average function is, this price may be acceptable. However in the case of averaging, this key is terribly expensive.
This leads one on the open-ended search for good keys. Dieter Lücking's answer was to key the array pointer. This is O(1), and wicked fast to boot. However, it also makes the assumption that once you've calculated the average for an array, that array's values never change, and that memory address is never re-used for another array. Solutions for this come later, in the invalidation portion of the task.
Another popular key is HotLick's (1) in the comments. You use a unique identifier for the array (pointer or, better yet, a unique integer idx that will never be used again) as your key. Each array then has a "dirty bit for avg" that they are expected to set to true whenever a value is changed. Caches first look for the dirty bit. If it is true, they ignore the cached value, calculate the new value, cache the new value, then clear the dirty bit indicating that the cached value is now valid. (this is really invalidation, but it fit well in this part of the answer)
This technique assumes that there are more calls to avg than updates to the data. If the array is constantly dirty, then avg still has to keep recalculating, but we still pay the price of setting the dirty bit on every write (slowing it down).
This technique also assumes that there is only one function, avg, which needs cached results. If you have many functions, it starts to get expensive to keep all of the dirty bits up to date. The solution is an "epoch" counter. Instead of a dirty bit, you have an integer, which starts at 0. Every write increments it. When you cache a result, you cache not only the identity of the array, but its epoch as well. When you check to see if you have a cached value, you also check to see if the epoch changed. If it did change, you can't prove your old results are current, and have to throw them out.
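A minimal sketch of the epoch idea; the TrackedArray wrapper, its field names, and the single-entry cache are illustrative, not from the question:
#include <cstddef>

// Hypothetical wrapper: every write bumps an epoch counter.
struct TrackedArray {
    double* data;
    std::size_t size;
    unsigned long epoch = 0;

    void set(std::size_t i, double v) { data[i] = v; ++epoch; }
};

double avg(const TrackedArray& a)
{
    static const TrackedArray* cached_array = nullptr;
    static unsigned long cached_epoch = 0;
    static double cached_value = 0.0;

    if (cached_array == &a && cached_epoch == a.epoch)
        return cached_value;                 // nothing changed since the last call

    double sum = 0.0;
    for (std::size_t i = 0; i < a.size; ++i)
        sum += a.data[i];

    cached_array = &a;
    cached_epoch = a.epoch;
    cached_value = (a.size ? sum / a.size : 0.0);
    return cached_value;
}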
Storing the results is an interesting task. It is very easy to write a storing algorithm which uses up gobs of memory by remembering hundreds of thousands of old results of avg. Generally speaking, there needs to be a way to let the caching code know that an array has been destroyed, or a way to slowly remove old unused cache results. In the former case, the deallocator of the double arrays needs to let the cache code know that that array is being deallocated. In the latter case, it is common to limit a cache to 10 or 100 entries and evict old cache results.
The last piece is invalidation of caches. I spoke earlier regarding the dirty bit. The general pattern for this is that a value inside a cache must be marked invalid if the key under which it was stored didn't change, but the values in the array did change. This can obviously never happen if the key is a copy of the array, but it can occur when the key is an identifying integer or a pointer.
Generally speaking, invalidation is a way to add a requirement to your caller: if you want to use avg with caching, here's the extra work you are required to do to help the caching code.
Recently I implemented a system with such caching invalidation scheme. It was very simple, and stemmed from one philosophy: the code which is calling avg is in a better position to determine if the array has changed than avg is itself.
There were two versions of the equivalent of avg: double avg(double* array, int n) and double avg(double* array, int n, CacheValidityObject& validity).
Calling the 2 argument version of avg never cached, because it had no guarantees that array had not changed.
Calling the 3 argument version of avg activated caching. The caller guarantees that, if it passes the same CacheValidityObject to avg without marking it dirty, then the arrays must be the same.
Putting the onus on the caller makes average trivial. CacheValidityObject is a very simple class that holds on to the results:
class CacheValidityObject
{
public:
CacheValidityObject(); // creates a new dirty CacheValidityObject
void invalidate(); // marks this object as dirty
// this function is used only by the `avg` algorithm. "friend" may
// be used here, but this example makes it public
boost::shared_ptr<void>& getData();
private:
boost::shared_ptr<void> mData;
};
inline void CacheValidityObject::invalidate()
{
mData.reset(); // blow away any cached data
}
double avg(double* array, int n); // defined as usual

double avg(double* array, int n, CacheValidityObject& validity)
{
    // this function assumes validity.mData is null or a shared_ptr to a double
    boost::shared_ptr<void>& data = validity.getData();
    if (data) {
        // The cached result, stored on the validity object, is still valid
        return *boost::static_pointer_cast<double>(data);
    } else {
        // There was no cached result, or it was invalidated
        double result = avg(array, n);
        data = boost::make_shared<double>(result); // cache the result
        return result;
    }
}
// usage
{
double data[100];
fillWithRandom(data, 100);
CacheValidityObject dataCacheValidity;
double a = avg(data, 100, dataCacheValidity); // caches the average
double b = avg(data, 100, dataCacheValidity); // cache hit... uses cached result
data[0] = 0;
dataCacheValidity.invalidate();
double c = avg(data, 100, dataCacheValidity); // dirty.. caches new result
double d = avg(data, 100, dataCacheValidity); // cache hit.. uses cached result
// CacheValidityObject::~CacheValidityObject() will destroy the shared_ptr,
// freeing the memory used to cache the result
}
Advantages
Nearly the fastest caching possible (within a few opcodes)
Trivial to implement
Doesn't leak memory, saving cached values only when the caller thinks it may want to use them again
Disadvantages
Requires the caller to handle caching, instead of doing it implicitly for them.
If you wrap the double* array in a class, you can minimize the disadvantage. Assign each algorithm an index (this can be done at run time), and have the DoubleArray class maintain a map of cached values. Each modification to DoubleArray invalidates the cached results. This is the easiest version to use, but it doesn't work with a naked array... you need a class to help you out.
class DoubleArray
{
public:
// all of the getters and setters and constructors.
// Special note: all setters MUST call invalidate()
CacheValidityObject getCache(int inIdx)
{
return mCaches[inIdx];
}
void setCache(int inIdx, const CacheValidityObject& inObj)
{
mCaches[inIdx] = inObj;
}
private:
void invalidate()
{
mCaches.clear();
}
std::map<int, CacheValidityObject> mCaches;
double* mArray;
int mSize;
};
inline int getNextAlgorithmIdx()
{
static int nextIdx = 1;
return nextIdx++;
}
static const int avgAlgorithmIdx = getNextAlgorithmIdx();
double avg(DoubleArray& inArray)
{
CacheValidityObject valid = inArray.getCache(avgAlgorithmIdx);
// use the 3 argument avg in the previous example
double result = avg(inArray.getArray(), inArray.getSize(), valid);
inArray.setCache(avgAlgorithmIdx, valid);
return result;
}
// usage
DoubleArray array(100);
fillRandom(array);
double a = avg(array); // calculates, and caches
double b = avg(array); // cache hit
array.set(0, 5); // invalidates caches
double c = avg(array); // calculates, and caches
double d = avg(array); // cache hit
#include <limits>
#include <map>
// Note: You have to manage cached results - release it with avg(p, 0)!
double avg(double* p, std::size_t n) {
    typedef std::map<double*, double> map;
    static map results;
    map::iterator pos = results.find(p);
    if(n) {
        // Calculate or get a cached value
        if(pos == results.end()) {
            // compute the actual average of p[0..n-1] here; 0.5 is only a placeholder
            pos = results.insert(map::value_type(p, 0.5)).first;
        }
        return pos->second;
    }
    // Erase a cached value (avg(p, 0) releases the entry)
    if(pos != results.end())
        results.erase(pos);
    return std::numeric_limits<double>::quiet_NaN();
}