How to lock-free update index to maximum over a range? - c++

The problem is best explained with some simple code.
struct foo
{
static constexpr auto N=8;
double data[N]; // initialised at construction
int max; // index to maximum: data[max] is largest value
// if value < data[index]:
// – update data[index] = value
// - update max
void update(int index, double value)
{
if(value >= data[index])
return;
data[index] = value;
if(index==max) // max unaffected if index!=max
for(index=0; index!=N; ++index)
if(data[index] > data[max])
max = index;
}
};
Now, I want to make foo::update() thread safe, i.e. allow concurrent calls from different threads, where participating threads cannot call with the same index. One way is to add a mutex or simple spinlock (the contention can be presumed low) to foo:
struct foo
{
static constexpr auto N=8;
std::atomic_flag lock = ATOMIC_FLAG_INIT;
double data[N];
int max;
// index is unique to each thread
// if value < data[index]:
// – update data[index] = value
// - update max
void update(int index, double value)
{
if(value >= data[index])
return;
while(lock.test_and_set(std::memory_order_acquire)); // acquire spinlock
data[index] = value;
if(index==max)
for(index=0; index!=N; ++index)
if(data[index] > data[max])
max = index;
lock.clear(std::memory_order_release); // release spinlock
}
};
However, how can I implement foo::update() lock-free (you may consider data and max to be atomic)?
NOTE: this is a simpler version of the original post, without relation to the tree structure.

So, IIUC, the array only gets new values if they are lower than what is already there (and I won't worry about how the initial values got there), and if the current max is lowered, find a new max.
Some of this is not too hard.
But some is... harder.
So the "if value < data[index] then write data" needs to be in a CAS-loop. Something like:
auto oldval = data[index].load(memory_order_relaxed);
do
if (value >= oldval) return; // only ever lower the stored value
while ( ! data[index].compare_exchange_weak(oldval, value) );
// (note that oldval is updated to data[index] each time comp-exch fails)
So now data[index] has the new lower value. Awesome. And relatively easy.
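For reference, here is the CAS loop as a self-contained function. This is only a sketch: the name store_if_lower, the free-function form and the bool return value are made up for illustration, and it assumes data is an array of std::atomic<double>.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;

// Hypothetical helper: store `value` into data[index] only if it is lower
// than the value currently there. Returns true if a store happened.
inline bool store_if_lower(std::atomic<double> (&data)[N],
                           std::size_t index, double value)
{
    assert(index < N);
    double oldval = data[index].load(std::memory_order_relaxed);
    do {
        if (value >= oldval)
            return false; // the stored value is already at least as low
        // On failure compare_exchange_weak reloads oldval, so the test
        // above is repeated against the fresh value.
    } while (!data[index].compare_exchange_weak(oldval, value));
    return true;
}
```

The return value tells the caller whether data[index] was actually lowered, which is the only case where max might need attention.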
Now about max.
First question - Is it OK for max to ever be wrong? Because it may currently be wrong (in our scenario, where we update data[index] before handling max).
Can it be wrong in some ways, not others? ie let's say our data is just two entries:
data[2] = { 3, 7 };
And we want to do update(1, 2) ie change the 7 to a 2. (And thus update max!)
Scenario A: set data first, then max:
data[1] = 2;
pause(); // ie scheduler pauses this thread
max = 0; // data[0]==3 is now max
If another thread comes in at pause(), then data[max] is wrong: 2 instead of 3 :-(
Scenario B: set max first:
max = 0; // it will be "shortly"?
pause();
data[1] = 2;
Now a thread could read data[max] as 3 while 7 is still in data. But 7 is going to become 2 "soon", so is that OK? Is it "less wrong" than scenario A? Depends on usage? (ie if the important thing is "which is max" we have that right. But if max was the only important thing, why store all the data at all?)
It seems odd to ask "is wrong OK", but in some lock-free situations it actually is a valid question. To me B has a chance to be OK for some uses, whereas A doesn't.
Also, and this is important:
data[max] is always wrong, even in the perfect algorithm
By this I mean you need to realize that data[max], as soon as you read it is already out of date - if you are living in a lockfree world. Because it may have changed as soon as you read it. (Also because data and max change independently. But even if you had a function getMaxValue() it would be out of date as soon as it returns.)
Is that OK? Because, if not, you obviously need a lock. But if it is OK, we can use it to our advantage - we might be able to return an answer which we know is somewhat incorrect / out-of-date, but no more incorrect than what you could tell from the outside.
If neither scenario is OK, then you must update max and data[index] at the same time. Which is hard since they don't fit into a lock-free sized chunk.
So instead you can add a layer of indirection:
struct DataAndMax { double data[N]; int max; };
DataAndMax * ptr;
Whenever you need to update max, you need to make a whole new DataAndMax struct (ie allocate a new one), somehow fill it all out nicely, and then atomically swap ptr to the new struct.
And if some other thread changed ptr while you were preparing the new data, then you would need to start over, since you need their new data in your data.
And if ptr has changed twice, then it may look like it hasn't changed, when it really has: Let's say ptr currently has value 0xA000 and a 2nd thread allocates a new DataAndMax at 0xB000, and sets ptr to 0xB000, and frees the old one at 0xA000. Now yet another thread (3rd) comes in, allocates yet another DataAndMax - and lo and behold the allocator gives you back 0xA000 (why not, it was just freed!). So this 3rd thread sets ptr to 0xA000.
And this all happens while you are trying to set ptr to 0xC000. All you see is that ptr was 0xA000, and later still is 0xA000, so you think it (and its data) hasn't changed. Yet it has - it went from 0xA000 to 0xB000 (when you weren't looking) back to 0xA000 - the address is the same, but the data is different. This is called the ABA problem.
Now, if you knew the max number of threads, you could pre-allocate:
DataAndMax dataBufs[NUM_THREADS];
DataAndMax * ptr; // current DataAndMax
And then never allocate/delete and never have ABA problems. Or there's other ways to avoid ABA.
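One of those "other ways" is to pair the thing being swapped with a version counter, so that A -> B -> A no longer compares equal. The sketch below assumes the pre-allocated buffers are addressed by index rather than by pointer, and packs index and version into one 64-bit word. All names (NUM_BUFS, pack, try_publish) are invented for the example. It only shows how a stale snapshot gets rejected; it does not decide when an old buffer is safe to reuse, and the 32-bit counter can in principle wrap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t NUM_BUFS = 4; // assumed number of pre-allocated buffers

// Hypothetical packing: low 32 bits = index of the current buffer,
// high 32 bits = version counter bumped on every successful publish.
inline std::uint64_t pack(std::uint32_t index, std::uint32_t version)
{
    return (static_cast<std::uint64_t>(version) << 32) | index;
}
inline std::uint32_t index_of(std::uint64_t word)
{
    return static_cast<std::uint32_t>(word);
}
inline std::uint32_t version_of(std::uint64_t word)
{
    return static_cast<std::uint32_t>(word >> 32);
}

// Try to make buffer `new_index` the current one, given the word we read
// earlier. Fails if anyone published in between, even if the index they
// left behind is the same one we saw.
inline bool try_publish(std::atomic<std::uint64_t>& current,
                        std::uint64_t expected, std::uint32_t new_index)
{
    assert(new_index < NUM_BUFS);
    const std::uint64_t desired = pack(new_index, version_of(expected) + 1);
    return current.compare_exchange_strong(expected, desired);
}
```

With a bare pointer or index, the last CAS in the ABA story would succeed. Here it fails because the version moved on, and the caller has to start over with fresh data.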
Let's go back, and think about how we are going to - no matter what - return a max value that is potentially out of date. Can we use that?
So you come in, and first check if the index you are about to write to is the important one or not:
if (index != max) {
// we are not touching max,
// so nothing fancy here!
data[index] = value;
return;
}
// else do it the hard way:
//...
But this is already wrong. After the if and before the set, max may have changed. Does every set need to update max!?!?
So, if N is small, you could just linear search for max. It may be wrong if someone makes an update while searching, but remember - it could also be wrong if someone makes an update right after searching or right after "insert magic here". So searching, other than possibly being slow, is as correct as any algorithm. You will find something that was, for a moment, max.
If N == 8, I would use searching. Definitely.
You can search 8 entries using memory_order_relaxed possibly, and that will be faster than trying to maintain anything using stronger atomic ops.
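A minimal sketch of that search, assuming data is an array of std::atomic<double> (find_max is a made-up name). The result is an index that held the largest value at the moment it was read, which is as good as any lock-free answer can be.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t N = 8;

// Returns an index whose value was the largest one seen during the scan.
// Concurrent updates may make the answer stale by the time it is returned.
inline std::size_t find_max(const std::atomic<double> (&data)[N])
{
    std::size_t best = 0;
    double best_value = data[0].load(std::memory_order_relaxed);
    for (std::size_t i = 1; i != N; ++i) {
        const double v = data[i].load(std::memory_order_relaxed);
        if (v > best_value) {
            best_value = v;
            best = i;
        }
    }
    return best;
}
```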
I have other ideas:
More bookkeeping? Store maxValue separately?
double data[N];
double maxValue;
int indexOfMax;
bool wasMax = false;
if (index == indexOfMax)
wasMax = true;
data[index] = value;
if (wasMax || index == indexOfMax)
findMax(&indexOfMax, &maxValue); // linear search
That would probably need a CAS-loop somewhere. Still linear search, but maybe less often?
Maybe you need extra data at each entry? Not sure yet.
Hmmmm.
This is not simple. Thus if there is a correct algorithm (and I think there is, within some constraints) it is unlikely to not have bugs. ie a correct algorithm might actually exist, but you don't find it - what you instead find is an algorithm that just looks correct.

Related

Is there a fast way to 'move' all vector values by 1 position?

I want to implement an algorithm that basically moves every value(besides the last one) one place to the left, as in the first element becomes the second element, and so on.
I have already implemented it like this:
for(int i = 0; i < vct.size() - 1; i++){
vct[i] = vct[i + 1];
}
which works, but I was just wondering if there is a faster, optionally shorter way to achieve the same result?
EDIT: I have made a mistake where I said that I wanted it to move to the right, where in reality I wanted it to go left, so sorry for the confusion and thanks to everyone for pointing that out. I also checked if the vector isn't empty beforehand, just didn't include it in the snippet.
As a comment (or more than one?) has pointed out, the obvious choice here would be to just use a std::deque.
Another possibility would be to use a circular buffer. In this case, you'll typically have an index (or pointer) to the first and last items in the collection. Removing an item from the beginning consists of incrementing that index/pointer (and wrapping it around to the beginning if you've reached the end of the buffer). That's quite fast, and constant time, regardless of collection size. There is a downside that every time you add an item, remove an item, or look at an item, you need to do a tiny bit of extra math to do it. But it's usually just one addition, so overhead is pretty minimal. Circular buffers work well, but they have a fair number of corner cases, so getting them just right is often kind of a pain. Worse, many obvious implementations waste the slot for one data item (though that often doesn't matter a lot).
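To make the idea concrete, here is a small fixed-capacity sketch. It is not a drop-in container: ring_buffer, first and count are names chosen for this example. It keeps an element count instead of a "last" index, which is one way to avoid wasting a slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer; `first` and `count` are this sketch's bookkeeping.
template <class T>
class ring_buffer {
    std::vector<T> slots;
    std::size_t first = 0; // index of the oldest element
    std::size_t count = 0; // number of stored elements
public:
    explicit ring_buffer(std::size_t capacity) : slots(capacity) {}

    std::size_t size() const { return count; }
    bool full() const { return count == slots.size(); }

    void push_back(const T& value) {
        assert(!full());
        slots[(first + count) % slots.size()] = value;
        ++count;
    }
    // O(1): no element moves, only the index of the first element does.
    void pop_front() {
        assert(count != 0);
        first = (first + 1) % slots.size();
        --count;
    }
    // i-th element in logical order, 0 being the oldest.
    T& operator[](std::size_t i) {
        assert(i < count);
        return slots[(first + i) % slots.size()];
    }
};
```

The "tiny bit of extra math" is the (first + i) % slots.size() in each accessor.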
A slightly simpler possibility would reduce overhead by some constant factor. To do this, use a little wrapper that keeps track of the first item in the collection, along with your vector of items. When you want to remove the first item, just increment the variable that keeps track of the first item. When you reach some preset limit, you erase the first N elements all at once. This reduces the time spent shifting items by a factor of N.
#include <cstddef>
#include <utility>
#include <vector>

template <class T>
class pseudo_queue {
    std::vector<T> data;
    std::size_t b = 0; // index of the current first item
    // adjust as you see fit:
    static const std::size_t max_slop = 20;
    // erase the b dead items at the front all at once
    void shift() {
        data.erase(data.begin(), data.begin() + b);
        b = 0;
    }
public:
    void push_back(T &&t) { data.push_back(std::move(t)); }
    void pop_back() { data.pop_back(); }
    T &back() { return data.back(); }
    T &front() { return data[b]; }
    void pop_front() {
        if (++b > max_slop) shift();
    }
    typename std::vector<T>::iterator begin() { return data.begin() + b; }
    typename std::vector<T>::iterator end() { return data.end(); }
    T &operator[](std::size_t index) { return data[index + b]; }
};
If you want to get really tricky, you can change this a bit, so you compute max_slop as a percentage of the size of data in the collection. In this case, you can change the computational complexity involved, rather than just leaving it linear but with a larger constant factor than you currently have. But I have no idea how much (if at all) you care about that--it's only likely to matter much if you deal with a wide range of sizes.
assuming you really meant moving data to the right, and that your code has a bug,
you have std::move_backward from <algorithm> and its sibling std::move to do that, but accessing data backwards may be inefficient.
std::move_backward(vct.begin(),vct.end()-1,vct.end());
if you actually meant move data to the left, you can use std::move
std::move(vct.begin()+1,vct.end(),vct.begin());
you can also use std::copy and std::copy_backward instead if your object is trivially copyable; the syntax is exactly the same, and for such types both variants typically compile down to a single memmove.
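Wrapped into two small functions, the calls look like this. The function names are made up, and the vector is assumed to hold int as in the question. Note that neither call changes the size of the vector: the slot that nothing is moved into keeps its old value.

```cpp
#include <algorithm>
#include <vector>

// Shift every element one place to the left; the last slot keeps its old value.
inline void shift_left(std::vector<int>& v)
{
    if (v.size() > 1)
        std::move(v.begin() + 1, v.end(), v.begin());
}

// Shift every element one place to the right; the first slot keeps its old value.
inline void shift_right(std::vector<int>& v)
{
    if (v.size() > 1)
        std::move_backward(v.begin(), v.end() - 1, v.end());
}
```

The size check also covers the empty-vector case mentioned in the question.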
you can just do a normal loop to the right, assuming vct.size() is bigger than 1.
int temp1 = vct[0]; // assume vct of int
int temp2; // assume vct of int
for(int i = 0;i<vct.size() - 1;i++){
temp2 = vct[i+1];
vct[i+1] = temp1;
temp1 = temp2;
}
and your version is what's to do if you are moving to the left.
also as noted in the comments, you should check that your list is not empty if you are doing the loop version.

Couple performance questions (one bigger vector vs smaller chunks vectors) and Is it worth to store iteration index for jump access of vector?

I am a bit curious about vector optimization and have a couple of questions about it. (I am still a beginner in programming)
example:
struct GameInfo{
EnumType InfoType;
// Other info...
};
int _lastPosition;
// _gameInfoV is sorted beforehand
std::vector<GameInfo> _gameInfoV;
// The tick function is called every game frame (in "perfect" condition it's every 1.0/60 second)
void BaseClass::tick()
{
for (unsigned int i = _lastPosition; i < _gameInfoV.size(); i++){
auto & info = _gameInfoV[i];
if( !info.bhasbeenAdded ){
if( DoWeNeedNow() ){
_lastPosition++;
info.bhasbeenAdded = true;
_otherPointer->DoSomething(info.InfoType);
// Do something more with "info"....
}
else return; //Break the cycle since we don't need now other "info"
}
}
}
The _gameInfoV vector size can be between 2000 and 5000.
My main 2 questions are:
Is it better to leave the way how it is or it's better to make smaller chunks of it, which is checked for every different GameInfo.InfoType
Is it worth the hassle of storing the last start position index of the vector instead of iterating from the beginning.
Note that if using smaller vectors there will be like 3 to 6 of them
The third thing is probably that I am not using vector iterators, but is it safe to use them like this?
std::vector<GameInfo>::iterator it = _gameInfoV.begin() + _lastPosition;
for (; it != _gameInfoV.end(); ++it){ // start from it, not from begin()
//Do something
}
Note: It will be used in smartphones, so every optimization will be appreciated, when targeting weaker phones.
-Thank you
Don't; except if you frequently move memory around
It is no hassle if you do it correctly:
std::vector<GameInfo>::iterator _lastPosition(_gameInfoV.begin());
// ...
for (std::vector<GameInfo>::iterator info=_lastPosition; info!=_gameInfoV.end(); ++info)
{
if (!info->bhasbeenAdded)
{
if (DoWeNeedNow())
{
++_lastPosition;
_otherPointer->DoSomething(info->InfoType);
// Do something more with "info"....
}
else return; //Break the cycle since we don't need now other "info"
}
}
Breaking one vector up into several smaller vectors in general doesn't improve performance. It could even slightly degrade performance because the compiler has to manage more variables, which take up more CPU registers etc.
I don't know about gaming so I don't understand the implication of GameInfo.InfoType. Your processing time and CPU resource requirements are going to increase if you do more total iterations through loops (where each loop iteration performs the same type of operation). So if separating the vectors causes you to avoid some loop iterations because you can skip entire vectors, that's going to increase performance of your app.
iterators are the most secure way to iterate through containers. But for a vector I often just use the index operator [] and my own indexer (a plain old unsigned integer).

Ring Allocator For Lockfree Update of Member Variable?

I have a class that stores the latest value of some incoming realtime data (around 150 million events/second).
Suppose it looks like this:
class DataState
{
Event latest_event;
public:
//pushes event atomically
void push_event(const Event __restrict__* e);
//pulls event atomically
Event pull_event();
};
I need to be able to push events atomically and pull them with strict ordering guarantees. Now, I know I can use a spinlock, but given the massive event rate (over 100 million/second) and high degree of concurrency I'd prefer to use lockfree operations.
The problem is that Event is 64 bytes in size. There is no CMPXCHG64B instruction on any current X86 CPU (as of August '16). So if I use std::atomic<Event> I'd have to link to libatomic which uses mutexes under the hood (too slow).
So my solution was to instead atomically swap pointers to the value. Problem is dynamic memory allocation becomes a bottleneck with these event rates. So... I define something I call a "ring allocator":
/// @brief Lockfree Static short-lived allocator used for a ringbuffer
/// Elements are guaranteed to persist only for "size" calls to get_next()
template<typename T> class RingAllocator {
T *arena;
std::atomic_size_t arena_idx;
const std::size_t arena_size;
public:
/// @brief Creates a new RingAllocator
/// @param size The number of elements in the underlying arena. Make this large enough to avoid overwriting fresh data
RingAllocator<T>(std::size_t size) : arena_size(size)
{
//allocate pool
arena = new T[size];
//zero out pool
std::memset(arena, 0, sizeof(T) * size);
arena_idx = 0;
}
~RingAllocator()
{
delete[] arena;
}
/// @brief Return next element's pointer. Thread-safe
/// @return pointer to next available element
T *get_next()
{
return &arena[arena_idx.exchange(arena_idx++ % arena_size)];
}
};
Then I could have my DataState class look like this:
class DataState
{
std::atomic<Event*> latest_event;
RingAllocator<Event> event_allocator;
public:
//pushes event atomically
void push_event(const Event __restrict__* e)
{
//store event
Event *new_ptr = event_allocator.get_next();
*new_ptr = *e;
//swap event pointers
latest_event.store(new_ptr, std::memory_order_release);
}
//pulls event atomically
Event pull_event()
{
return *(latest_event.load(std::memory_order_acquire));
}
};
As long as I size my ring allocator to the max # of threads that may concurrently call the functions, there's no risk of overwriting data that pull_event could return. Plus everything's super localized so indirection won't cause bad cache performance. Any possible pitfalls with this approach?
The DataState class:
I thought it was going to be a stack or queue, but it isn't, so push / pull don't seem like good names for methods. (Or else the implementation is totally bogus).
It's just a latch that lets you read the last event that any thread stored.
There's nothing to stop two writes in a row from overwriting an element that's never been read. There's also nothing to stop you reading the same element twice.
If you just need somewhere to copy small blocks of data, a ring buffer does seem like a decent approach. But if you don't want to lose events, I don't think you can use it this way. Instead, just get a ring buffer entry, then copy to it and use it there. So the only atomic operation should be incrementing the ring buffer position index.
The ring buffer
You can make get_next() much more efficient. This line does an atomic post-increment (fetch_add) and an atomic exchange:
return &arena[arena_idx.exchange(arena_idx++ % arena_size)];
I'm not even sure it's safe, because the xchg can maybe step on the fetch_add from another thread. Anyway, even if it's safe, it's not ideal.
You don't need that. Make sure the arena_size is always a power of 2, then you don't need to modulo the shared counter. You can just let it go, and have every thread modulo it for their own use. It will eventually wrap, but it's a binary integer so it will wrap at a power of 2, which is a multiple of your arena size.
I'd suggest storing an AND-mask instead of a size, so there's no risk of the % compiling to anything other than an and instruction, even if it's not a compile-time constant. This makes sure we avoid a 64-bit integer div instruction.
template<typename T> class RingAllocator {
T *arena;
std::atomic_size_t arena_idx;
const std::size_t size_mask; // maybe even make this a template parameter?
public:
RingAllocator<T>(std::size_t size)
: arena_idx(0), size_mask(size-1)
{
// verify that size is actually a power of two, so the mask is all-ones in the low bits, and all-zeros in the high bits.
// so that i % size == i & size_mask for all i
...
}
...
T *get_next() {
size_t idx = arena_idx.fetch_add(1, std::memory_order_relaxed); // still atomic, but we don't care which order different threads take blocks in
idx &= size_mask; // modulo our local copy of the idx
return &arena[idx];
}
};
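The power-of-two check left out of the constructor above could look something like this. is_power_of_two and wrap are names invented for this sketch. The first is the usual bit trick, and the second just spells out the masking so it can be compared against %.

```cpp
#include <cstddef>

// True if n is a power of two (1, 2, 4, 8, ...); false for 0.
constexpr bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// With size_mask == size - 1 and size a power of two,
// this gives the same result as i % size.
constexpr std::size_t wrap(std::size_t i, std::size_t size_mask)
{
    return i & size_mask;
}
```

How a failed check should be reported (an assert, an exception, or rounding the size up) is a design decision this sketch leaves open.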
Allocating the arena would be more efficient if you used calloc instead of new + memset. The OS already zeros pages before giving them to user-space processes (to prevent information leakage), so writing them all is just wasted work.
arena = new T[size];
std::memset(arena, 0, sizeof(T) * size);
// vs.
arena = (T*)calloc(size, sizeof(T));
Writing the pages yourself does fault them in, so they're all wired to real physical pages, instead of just copy-on-write mappings for a system-wide shared physical zero page (like they are after new/malloc/calloc). On a NUMA system, the physical page chosen might depend on which thread actually touched the page, rather than which thread did the allocation. But since you're reusing the pool, the first core to write a page might not be the one that ends up using it most.
Maybe something to look for in microbenchmarks / perf counters.
As long as I size my ring allocator to the max # of threads that may concurrently call the functions, there's no risk of overwriting data that pull_event could return. .... Any possible pitfalls with this approach?
The pitfall is, IIUC, that your statement is wrong.
If I have just 2 threads, and 10 elements in the ring buffer, the first thread could call pull_event once, and be "mid-pulling", and then the second thread could call push 10 times, overwriting what thread 1 is pulling.
Again, assuming I understand your code correctly.
Also, as mentioned above,
return &arena[arena_idx.exchange(arena_idx++ % arena_size)];
that arena_idx++ inside the exchange on the same variable, just looks wrong. And in fact is wrong. Two threads could increment it - ThreadA increments to 8 and threadB increments to 9, and then threadB exchanges it to 9, then threadA exchanges it to 8. whoops.
atomic(op1) # atomic(op2) != atomic(op1 # op2)
I worry about what else is wrong in code not shown. I don't mean that as an insult - lock-free is just not easy.
Have you looked at any of the C++ Disruptor (Java) ports that are available?
disruptor--
disruptor
Although they are not complete ports they may offer all that you need. I am currently working on a more fully featured port however it's not quite ready.

Avoid recomputation when data is not changed

Imagine you have a pretty big array of double and a simple function avg(double*,size_t) that computes the average value (just a simple example: both the array and the function could be whatever data structure and algorithm). I would like that if the function is called a second time and the array is not changed in the meanwhile, the return value comes directly from the previous one, without going through the unchanged data.
To hold the previous value looks simple, I just need a static variable inside the function, right? But what about detecting the changes in the array? Do I need to write an interface to access the array which sets a flag to be read by the function? Can something smarter and more portable be done?
As Kerrek SB so astutely put it, this is known as "memoization." I'll cover my personal favorite method at the end (both with double* array and the much easier DoubleArray), so you can skip to there if you just want to see code. However, there are many ways to solve this problem, and I wanted to cover them all, including those suggested by others. Skip to the horizontal rule if you just want to see code.
The first part is some theory and alternate approaches. There are fundamentally four parts to the problem:
Prove the function is pure (it has no side effects, and the same inputs always produce the same result)
Cache results keyed to the inputs
Search cached results given a new set of inputs
Invalidating cached results which are no longer accurate/current
The first step is easy for you: average is pure. It has no side effects, and its result depends only on the contents of the array.
Caching the results is a fun step. You obviously are going to create some "key" for the inputs that you can compare against the cached "keys." In Kerrek SB's memoization example, the key is a tuple of all of the arguments, compared against other keys with ==. In your system, the equivalent solution would be to have the key be the contents of the entire array. This means each key comparison is O(n), which is expensive. If the function was more expensive to calculate than the average function is, this price may be acceptable. However in the case of averaging, this key is terribly expensive.
This leads one on the open-ended search for good keys. Dieter Lücking's answer was to key the array pointer. This is O(1), and wicked fast to boot. However, it also makes the assumption that once you've calculated the average for an array, that array's values never change, and that memory address is never re-used for another array. Solutions for this come later, in the invalidation portion of the task.
Another popular key is HotLick's (1) in the comments. You use a unique identifier for the array (pointer or, better yet, a unique integer idx that will never be used again) as your key. Each array then has a "dirty bit for avg" that they are expected to set to true whenever a value is changed. Caches first look for the dirty bit. If it is true, they ignore the cached value, calculate the new value, cache the new value, then clear the dirty bit indicating that the cached value is now valid. (this is really invalidation, but it fit well in this part of the answer)
This technique assumes that there are more calls to avg than updates to the data. If the array is constantly dirty, then avg still has to keep recalculating, but we still pay the price of setting the dirty bit on every write (slowing it down).
This technique also assumes that there is only one function, avg, which needs cached results. If you have many functions, it starts to get expensive to keep all of the dirty bits up to date. The solution is an "epoch" counter. Instead of a dirty bit, you have an integer, which starts at 0. Every write increments it. When you cache a result, you cache not only the identity of the array, but its epoch as well. When you check to see if you have a cached value, you also check to see if the epoch changed. If it did change, you can't prove your old results are current, and have to throw them out.
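A rough sketch of the epoch idea, with hypothetical names throughout (EpochArray, AvgCache, current_epoch). The array bumps a counter on every write, and the cache remembers both which array and which epoch its result belongs to. The recomputations counter is only there to make the behaviour visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical array wrapper: every write increments `epoch`.
class EpochArray {
    std::vector<double> values;
    unsigned long epoch = 0;
public:
    explicit EpochArray(std::size_t n) : values(n, 0.0) {}
    void set(std::size_t i, double v) { values[i] = v; ++epoch; }
    double get(std::size_t i) const { return values[i]; }
    std::size_t size() const { return values.size(); }
    unsigned long current_epoch() const { return epoch; }
};

// One cache per function. It stores the identity of the array and the
// epoch at which the cached result was computed.
class AvgCache {
    const EpochArray* owner = nullptr;
    unsigned long cached_epoch = 0;
    double cached_avg = 0.0;
public:
    int recomputations = 0; // only here to make the caching observable

    double avg(const EpochArray& a) {
        if (owner == &a && cached_epoch == a.current_epoch())
            return cached_avg; // same array, epoch unchanged: reuse
        double sum = 0.0;
        for (std::size_t i = 0; i != a.size(); ++i)
            sum += a.get(i);
        cached_avg = a.size() ? sum / a.size() : 0.0;
        cached_epoch = a.current_epoch();
        owner = &a;
        ++recomputations;
        return cached_avg;
    }
};
```

Any number of such caches (avg, max, and so on) can share the one counter, which is the advantage over one dirty bit per function. Keying on the address still has the address-reuse caveat discussed earlier.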
Storing the results is an interesting task. It is very easy to write a storing algorithm which uses up gobs of memory by remembering hundreds of thousands of old results to avg. Generally speaking, there needs to be a way to let the caching code know that an array has been destroyed, or a way to slowly remove old unused cache results. In the former case, the deallocator of the double arrays needs to let the cache code know that that array is being deallocated. In the latter case, it is common to limit a cache to 10 or 100 entries, and have it evict old cache results.
The last piece is invalidation of caches. I spoke earlier regarding the dirty bit. The general pattern for this is that a value inside a cache must be marked invalid if the key it was stored in didn't change, but the values in the array did change. This can obviously never happen if the key is a copy of the array, but it can occur when the key is an identifying integer or a pointer.
Generally speaking, invalidation is a way to add a requirement to your caller: if you want to use avg with caching, here's the extra work you are required to do to help the caching code.
Recently I implemented a system with such caching invalidation scheme. It was very simple, and stemmed from one philosophy: the code which is calling avg is in a better position to determine if the array has changed than avg is itself.
There were two versions of the equivalent of avg: double avg(double* array, int n) and double avg(double* array, int n, CacheValidityObject& validity).
Calling the 2 argument version of avg never cached, because it had no guarantees that array had not changed.
Calling the 3 argument version of avg activated caching. The caller guarantees that, if it passes the same CacheValidityObject to avg without marking it dirty, then the arrays must be the same.
Putting the onus on the caller makes average trivial. CacheValidityObject is a very simple class to hold on to the results
class CacheValidityObject
{
public:
CacheValidityObject(); // creates a new dirty CacheValidityObject
void invalidate(); // marks this object as dirty
// this function is used only by the `avg` algorithm. "friend" may
// be used here, but this example makes it public
boost::shared_ptr<void>& getData();
private:
boost::shared_ptr<void> mData;
};
inline void CacheValidityObject::invalidate()
{
mData.reset(); // blow away any cached data
}
double avg(double* array, int n); // defined as usual
double avg(double* array, int n, CacheValidityObject& validity)
{
// this function assumes validity.mData is null or a shared_ptr to a double
boost::shared_ptr<void>& data = validity.getData();
if (data) {
// The cached result, stored on the validity object, is still valid
return *boost::static_pointer_cast<double>(data);
} else {
// There was no cached result, or it was invalidated
double result = avg(array, n);
data = boost::make_shared<double>(result); // cache the result
return result;
}
}
// usage
{
double data[100];
fillWithRandom(data, 100);
CacheValidityObject dataCacheValidity;
double a = avg(data, 100, dataCacheValidity); // caches the average
double b = avg(data, 100, dataCacheValidity); // cache hit... uses cached result
data[0] = 0;
dataCacheValidity.invalidate();
double c = avg(data, 100, dataCacheValidity); // dirty.. caches new result
double d = avg(data, 100, dataCacheValidity); // cache hit.. uses cached result
// CacheValidityObject::~CacheValidityObject() will destroy the shared_ptr,
// freeing the memory used to cache the result
}
Advantages
Nearly the fastest caching possible (within a few opcodes)
Trivial to implement
Doesn't leak memory, saving cached values only when the caller thinks it may want to use them again
Disadvantages
Requires the caller to handle caching, instead of doing it implicitly for them.
If you wrap the double* array in a class, you can minimize the disadvantage. Assign each algorithm an index (can be done at run time). Have the DoubleArray class maintain a map of cached values. Each modification to DoubleArray invalidates the cached results. This is the easiest version to use, but doesn't work with a naked array... you need a class to help you out.
class DoubleArray
{
public:
// all of the getters and setters and constructors.
// Special note: all setters MUST call invalidate()
CacheValidityObject getCache(int inIdx)
{
return mCaches[inIdx];
}
void setCache(int inIdx, const CacheValidityObject& inObj)
{
mCaches[inIdx] = inObj;
}
private:
void invalidate()
{
mCaches.clear();
}
std::map<int, CacheValidityObject> mCaches;
double* mArray;
int mSize;
};
inline int getNextAlgorithmIdx()
{
static int nextIdx = 1;
return nextIdx++;
}
static const int avgAlgorithmIdx = getNextAlgorithmIdx();
double avg(DoubleArray& inArray)
{
CacheValidityObject valid = inArray.getCache(avgAlgorithmIdx);
// use the 3 argument avg in the previous example
double result = avg(inArray.getArray(), inArray.getSize(), valid);
inArray.setCache(avgAlgorithmIdx, valid);
return result;
}
// usage
DoubleArray array(100);
fillRandom(array);
double a = avg(array); // calculates, and caches
double b = avg(array); // cache hit
array.set(0, 5); // invalidates caches
double c = avg(array); // calculates, and caches
double d = avg(array); // cache hit
#include <cstddef>
#include <limits>
#include <map>
// Note: You have to manage cached results - release it with avg(p, 0)!
double avg(double* p, std::size_t n) {
    typedef std::map<double*, double> map;
    static map results;
    map::iterator pos = results.find(p);
    if(n) {
        // Calculate or get a cached value
        if(pos == results.end()) {
            double sum = 0.0;
            for(std::size_t i = 0; i != n; ++i)
                sum += p[i];
            pos = results.insert(map::value_type(p, sum / n)).first; // calculate it
        }
        return pos->second;
    }
    // Erase a cached value, if there is one
    if(pos != results.end())
        results.erase(pos);
    return std::numeric_limits<double>::quiet_NaN();
}

Queue + Stack C++

How do you push items to the front of the array (like a stack) without starting at MAXSIZE-1? I've been trying to use the modulus operator to do so.
bool quack::pushFront(const int nPushFront)
{
if ( count == maxSize ) // indicates a full array
{
return false;
}
else if ( count == 0 )
{
++count;
items[0].n = nPushFront;
return true;
}
intBack = intFront;
items[++intBack] = items[intFront];
++count;
items[(top+(count)+maxSize)%maxSize].n = nPushFront;
/*
for ( int shift = count - 1; shift >= 0; --shift )
{
items[shift] = items[shift-1];
}
items[top+1].n = nPushFront; */
return true;
}
"quack" meaning a cross between a queue and a stack. I cannot simply shift my elements by 1 because it is terribly inefficient. I've been working on this for over a month now. I just need some guidence to push_front by using the modulus operator...I dont think a loop is even necessary.
Its funny because I will need to print the list randomly. So if I start adding values to the MAXSIZE-1 element of my integer array, and then need to print the array, I will have garbage values..
not actual code:
pushFront(2);
pushFront(4);
cout << q;
if we started adding from the back i would get several null values.
I cannot just simply shift the array elements down or up by one.
I cant use any stls, or boosts.
Not sure what your problem is. Are you trying to implement a queue (which also can work as a stack, no need for your quack) as a ring buffer?
In that case, you need to save both a front and a back index. The mechanics are described in the article linked above. Pay attention to the “Difficulties” section: in particular, you need to either have an extra variable or pay attention to leave one field empty – otherwise you won’t know how to differentiate between a completely empty and a completely full queue.
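A sketch of what that can look like for the pushFront case, without any STL containers. It uses the "extra variable" option: a count alongside the front index, so empty and full are easy to tell apart. The class layout and the at() accessor are assumptions made for this example, and the items are plain ints rather than the asker's structs.

```cpp
#include <cassert>

// Fixed-size double-ended buffer of ints without any STL containers.
// `front` is the index of the first element; `count` tells empty from full.
class quack {
    static const int maxSize = 8;
    int items[maxSize];
    int front;
    int count;
public:
    quack() : front(0), count(0) {}
    int size() const { return count; }

    bool pushFront(int n) {
        if (count == maxSize) return false;      // full
        front = (front + maxSize - 1) % maxSize; // step back, wrapping around
        items[front] = n;
        ++count;
        return true;
    }
    bool pushBack(int n) {
        if (count == maxSize) return false;      // full
        items[(front + count) % maxSize] = n;
        ++count;
        return true;
    }
    // i-th element in logical order, 0 being the front.
    int at(int i) const {
        assert(i >= 0 && i < count);
        return items[(front + i) % maxSize];
    }
};
```

Printing then walks at(0) through at(size() - 1), so it never touches the unused slots, no matter where in the underlying array the data physically sits.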
Well, it seems kind of silly to rule out the stl, since std::deque is exactly what you want. Amortized constant time random access. Amortized constant insert/removal time from both the front and the back.
This can be achieved with an array with extra space at the beginning and end. When you run out of space at either end, allocate a new array with twice the space and copy everything over, again with space at both the end and the beginning. You need to keep track of the beginning index and the end index in your class.
It seems to me that you have some conflicting requirements:
You have to push to the head of a C++ array primitive.
Without shifting all of the existing elements.
Maintain insertion order.
Short answer: You can't do it, as the above requirements are mutually exclusive.
One of these requirements has to be relaxed.
To help you without having to guess, we need more information about what you are trying to do.