Calculating moving average in C++

I am trying to calculate the moving average of a signal. The signal value (a double) is updated at random times.
I am looking for an efficient way to calculate its time-weighted average over a time window, in real time. I could do it myself, but it is more challenging than I thought.
Most of the resources I've found on the internet calculate the moving average of a periodic signal, but mine updates at random times.
Does anyone know of good resources for this?
Thanks

The trick is the following: you get updates at random times via void update(int time, float value). However, you also need to track when an update falls off the time window, so you set an "alarm", called at time + N, which removes that update from ever being considered again in the computation.
If this happens in real time, you can ask the operating system to call a method void drop_off_oldest_update(int time) at time + N.
If this is a simulation, you cannot get help from the operating system and you need to do it manually. In a simulation you would call methods with the time supplied as an argument (which does not correlate with real time). However, a reasonable assumption is that the calls are guaranteed to arrive with increasing time arguments. In that case you maintain a sorted list of alarm times, and on each update and read call you check whether the time argument is greater than the head of the alarm list. While it is greater, you do the alarm-related processing (drop off the oldest update), remove the head, and check again, until all alarms prior to the given time are processed. Then you do the update call.
I have so far assumed it is obvious what you would do for the actual computation, but I will elaborate just in case. I assume you have a method float read(int time) that you use to read the values. The goal is to make this call as efficient as possible. So you do not compute the moving average every time read is called. Instead you precompute the value as of the last update or the last alarm, and "tweak" this value with a couple of floating-point operations to account for the passage of time since the last update (i.e., a constant number of operations, except perhaps for processing a list of piled-up alarms).
Hopefully this is clear -- it should be a quite simple and quite efficient algorithm.
Further optimization: one of the remaining problems is when a large number of updates happen within the time window, then a long period passes with neither reads nor updates, and then a read or update comes along. In this case, the above algorithm incrementally updates the value for each update that falls off, which is inefficient. It is also unnecessary, because we only care about the last update before the time window, so if there is a way to efficiently drop off all older updates at once, it would help.
To do this, we can modify the algorithm to do a binary search over the updates to find the most recent update before the time window. If relatively few updates need to be "dropped", incrementally update the value for each dropped update. If many updates need to be dropped, recompute the value from scratch after dropping off the old ones.
Appendix on Incremental Computation: I should clarify what I mean by incremental computation above in the sentence "tweak this value by a couple of floating point operations to account for the passage of time since the last update". The initial, non-incremental computation:
start with
sum = 0;
updates_in_window = /* set of all updates within the window */;
prior_update' = /* most recent update prior to the window, with its timestamp tweaked to the window beginning */;
relevant_updates = /* union of prior_update' and updates_in_window */;
then iterate over relevant_updates in order of increasing time:
for each update EXCEPT the last {
    sum += update.value * time_to_next_update;
}
and finally
moving_average = (sum + last_update.value * time_since_last_update) / window_length;
Now if exactly one update falls off the window but no new updates arrive, adjust sum as:
sum -= prior_update'.value * time_to_next_update + first_update_in_last_window.value * time_from_first_update_to_new_window_beginning;
(note it is prior_update' which has its timestamp modified to the start of the previous window). And if exactly one update enters the window but no new updates fall off, adjust sum as:
sum += previously_most_recent_update.value * corresponding_time_to_next_update.
As should be obvious, this is a rough sketch, but hopefully it shows how you can maintain the average with O(1) operations per update on an amortized basis. Note also the further optimization in the previous paragraph, and the stability issues alluded to in an older answer: floating-point errors may accumulate over a large number of such incremental operations until the result diverges from a full recomputation by an amount that matters to the application.
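To make the idea concrete, here is a minimal C++ sketch of the above (the class and method names are my own, timestamps are doubles, and the "alarms" are processed lazily inside update()/read() rather than through OS callbacks):

#include <algorithm>
#include <deque>
#include <iostream>
#include <utility>

class WeightedAverager {
    double window;                                  // width N of the time window
    std::deque<std::pair<double, double>> updates;  // (time, value), oldest first
    double area = 0;                                // integral of the signal from the first
                                                    // to the last stored update
public:
    explicit WeightedAverager(double window) : window(window) {}

    void update(double t, double v) {
        dropExpired(t);
        if (!updates.empty())
            area += updates.back().second * (t - updates.back().first);
        updates.emplace_back(t, v);
    }

    double read(double t) {
        dropExpired(t);
        if (updates.empty()) return 0.0;
        double start = std::max(t - window, updates.front().first);
        double integral = area
            - updates.front().second * (start - updates.front().first) // trim the pre-window part
            + updates.back().second * (t - updates.back().first);      // the last value holds until "now"
        return (t > start) ? integral / (t - start) : updates.back().second;
    }

private:
    // Keep exactly one update at or before the window start; drop anything older.
    void dropExpired(double t) {
        while (updates.size() >= 2 && updates[1].first <= t - window) {
            area -= updates[0].second * (updates[1].first - updates[0].first);
            updates.pop_front();
        }
    }
};

int main() {
    WeightedAverager avg(10.0);
    avg.update(1, 1);
    avg.update(6, 3);
    std::cout << avg.read(11) << "\n"; // (1*5 + 3*5) / 10 = 2
}

The eviction here is the simple one-at-a-time kind; the binary-search optimization mentioned earlier would replace the while loop with a search plus a bulk recomputation when many updates expire at once.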

If an approximation is OK and there's a minimum time between samples, you could try super-sampling. Have an array that represents evenly spaced time intervals that are shorter than the minimum, and at each time period store the latest sample that was received. The shorter the interval, the closer the average will be to the true value. The period should be no greater than half the minimum or there is a chance of missing a sample.
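A rough sketch of that idea (class and member names are mine; times are in seconds and the ring covers one window of history):

#include <algorithm>
#include <numeric>
#include <vector>

class SuperSampler {
    std::vector<double> slots;  // evenly spaced slots covering the window
    double period;              // slot length; no more than half the minimum sample spacing
    long   lastSlot  = -1;
    double lastValue = 0;
public:
    SuperSampler(double windowLength, double period)
        : slots(static_cast<size_t>(windowLength / period), 0.0), period(period) {}

    void update(double time, double value) {
        long slot = static_cast<long>(time / period);
        if (lastSlot >= 0) {
            // Slots we skipped still held the previous value, so back-fill them
            // (never more than one full lap around the ring).
            long from = std::max(lastSlot + 1, slot - static_cast<long>(slots.size()));
            for (long s = from; s < slot; ++s)
                slots[s % slots.size()] = lastValue;
        }
        slots[slot % slots.size()] = value;
        lastSlot = slot;
        lastValue = value;
    }

    double average() const {
        return std::accumulate(slots.begin(), slots.end(), 0.0) / slots.size();
    }
};

The approximation error shrinks as the period shrinks, at the cost of a larger ring and more back-filling work per update.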

#include <map>
#include <iostream>

// Sample   - the type of a single sample
// Date     - the type of a time notation
// DateDiff - the type of the difference of two Dates
template <class Sample, class Date, class DateDiff = Date>
class TWMA {
private:
    typedef std::map<Date, Sample> qType;
    const DateDiff windowSize; // The time width of the sampling window
    qType samples;             // A set of sample/date pairs
    Sample average;            // The answer
public:
    // windowSize - The time width of the sampling window
    TWMA(const DateDiff& windowSize) : windowSize(windowSize), average(0) {}

    // Call this each time you receive a sample
    void Update(const Sample& sample, const Date& now) {
        // First throw away all old data
        Date then(now - windowSize);
        samples.erase(samples.begin(), samples.upper_bound(then));
        // Next add new data
        samples[now] = sample;
        // Compute average: note: this could move to Average(), depending upon
        // precise user requirements.
        Sample sum = Sample();
        for (typename qType::iterator it = samples.begin();
             it != samples.end();
             ++it) {
            DateDiff duration(it->first - then);
            sum += duration * it->second;
            then = it->first;
        }
        average = sum / windowSize;
    }

    // Call this when you need the answer.
    const Sample& Average() { return average; }
};

int main() {
    TWMA<double, int> samples(10);
    samples.Update(1, 1);
    std::cout << samples.Average() << "\n"; // 1
    samples.Update(1, 2);
    std::cout << samples.Average() << "\n"; // 1
    samples.Update(1, 3);
    std::cout << samples.Average() << "\n"; // 1
    samples.Update(10, 20);
    std::cout << samples.Average() << "\n"; // 10
    samples.Update(0, 25);
    std::cout << samples.Average() << "\n"; // 5
    samples.Update(0, 30);
    std::cout << samples.Average() << "\n"; // 0
}

Note: Apparently this is not the way to approach this. Leaving it here for reference on what is wrong with this approach. Check the comments.
UPDATED - based on Oli's comment... not sure about the instability he is talking about, though.
Use a sorted map of "arrival times" against values. Upon arrival of a value, add the arrival time to the sorted map along with its value and update the moving average.
warning this is pseudo-code:
SortedMapType<int, double> timeValueMap;

void onArrival(double value)
{
    timeValueMap.insert((int)time(NULL), value);
}

// For example this runs every 10 seconds and the moving window is 120 seconds long
void recalcRunningAverage()
{
    // you know that the oldest thing in the list is
    // going to be 129.9999 seconds old
    int expireTime = (int)time(NULL) - 120;
    MapIterType i;
    for (i = timeValueMap.begin();
         i != timeValueMap.end() && i->first < expireTime;
         ++i)
    {
        // walk forward to the first entry still inside the window
    }
    // NOW REMOVE PAIRS TO LEFT OF i
    // Below needs to apply your time-weighting to the remaining values
    runningTotal = calculateRunningTotal(timeValueMap);
    average = runningTotal / timeValueMap.size();
}
There... Not fully fleshed out but you get the idea.
Things to note:
As I said, the above is pseudo-code. You'll need to choose an appropriate map.
Don't remove the pairs as you iterate through, as you will invalidate the iterator and have to start again (see the sketch after this list for a one-call alternative).
See Oli's comment below also.
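For a std::map keyed by arrival time, the expired entries can be removed in a single call before any iteration, which sidesteps the invalidation issue entirely. A small sketch under the same assumptions as the pseudo-code above:

#include <ctime>
#include <map>

std::map<int, double> timeValueMap;

void pruneExpired()
{
    int expireTime = (int)time(NULL) - 120;  // 120-second window, as above
    // A ranged erase removes every entry older than expireTime in one call,
    // so no iterator is ever used after being invalidated.
    timeValueMap.erase(timeValueMap.begin(), timeValueMap.lower_bound(expireTime));
}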


Operator= slowing down simulation

I am running a Monte Carlo simulation of a polymer. The entire configuration of the current state of the system is given by the object called Grid. This is my definition of Grid:
class Grid{
public:
    std::vector<Polymer> PolymersInGrid; // all the polymers in the grid
    int x;          // length of x-edge of grid
    int y;          // length of y-edge of grid
    int z;          // length of z-edge of grid
    double kT;      // energy factor
    double Emm_n;   // monomer-solvent when Not aligned
    double Emm_a;   // monomer-solvent when Aligned
    double Ems;     // monomer-solvent interaction
    double Energy;  // energy of grid
    std::map<std::vector<int>, Particle> OccupancyMap; // a map that gives the particle given the location

    // Constructor of class
    Grid(int xlen, int ylen, int zlen, double kT_, double Emm_a_, double Emm_n_, double Ems_)
        : x(xlen), y(ylen), z(zlen), kT(kT_), Emm_n(Emm_n_), Emm_a(Emm_a_), Ems(Ems_)
    {
        // this->instantiateOccupancyMap();
    };

    // Destructor of class
    ~Grid(){
    };

    // assignment operator that allows for a correct transfer of properties. Important to functioning of program.
    Grid& operator=(Grid other){
        std::swap(PolymersInGrid, other.PolymersInGrid);
        std::swap(Energy, other.Energy);
        std::swap(OccupancyMap, other.OccupancyMap);
        return *this;
    }
    .
    .
    .
}
I can go into the details of the object Polymer and Particle, if required.
In my driver code, this is what I am doing:
Define the maximum number of iterations.
Define a complete Grid G.
Create a copy of G called G_.
Perturb the configuration of G_.
If the perturbation of G_ is accepted per the Metropolis criterion, assign G_ to G (G = G_).
Repeat the perturbation and acceptance steps until the maximum number of iterations is reached.
This is my driver code:
auto start = std::chrono::high_resolution_clock::now();
Grid G_ (G);
int acceptance_count = 0;
for (int i{1}; i < (Nmov+1); i++){
    // choose a move
    G_ = MoveChooser(G, v);
    if ( MetropolisAcceptance (G.Energy, G_.Energy, G.kT) ) {
        // accepted
        // replace old config with new config
        acceptance_count++;
        std::cout << "Number of acceptances is " << acceptance_count << std::endl;
        G = G_;
    }
    else {
        // continue;
    }
    if (i % dfreq == 0){
        G.dumpPositionsOfPolymers (i, dfile);
        G.dumpEnergyOfGrid(i, efile, call);
    }
    // G.PolymersInGrid.at(0).printChainCoords();
}
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds> (stop - start);
std::cout << "\n\nTime taken for simulation: " << duration.count() << " milliseconds" << std::endl;
This is the interesting part: if I run the simulation under conditions that do not produce many "acceptances" (low temperatures, bad solvent), the simulation runs pretty fast. However, if there is a large number of acceptances, the simulation gets incredibly slow. My hypothesis is that my assignment operator = is slowing down the simulation.
I ran some tests:
number of acceptances = 25365, wall-clock time = 717770 milliseconds (!)
number of acceptances = 2165, wall-clock time = 64412 milliseconds
number of acceptances = 3000, wall-clock time = 75550 milliseconds
And the trend continues.
Could anyone advise me on how to make this more efficient? Is there a way to bypass the slowdown that I think is due to the = operator?
I would really appreciate any advice you have for me!
One thing that you can certainly do to improve performance is to force moving G_ rather than copying it to G:
G = std::move(G_);
After all, at this stage you don't need G_ any more.
Side remark: the fact that you don't need to copy all member data in operator= indicates that your design of Grid is far from perfect, but, well, keep it if the program is small and you're sure you control everything. Anyway, rather than using operator=, you should define and use a member function with a meaningful name, like "fast_and_dirty_swap" etc. :-) Then you can define operator= the way suggested by #Jarod42, that is, using = default.
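For example (a sketch only; the member list is abbreviated and the function name is just illustrative):

#include <map>
#include <utility>
#include <vector>

struct Polymer { /* ... */ };
struct Particle { /* ... */ };

class Grid {
public:
    std::vector<Polymer> PolymersInGrid;
    double Energy = 0;
    std::map<std::vector<int>, Particle> OccupancyMap;
    // ... the remaining members from the question ...

    // Rule of five, all defaulted: member-wise copy and move, so
    // G = std::move(G_) avoids copying the vectors and maps entirely.
    Grid() = default;
    Grid(const Grid&) = default;
    Grid(Grid&&) = default;
    Grid& operator=(const Grid&) = default;
    Grid& operator=(Grid&&) = default;

    // The partial-swap trick gets an explicit, honest name instead of hiding behind operator=.
    void fast_and_dirty_swap(Grid& other) {
        std::swap(PolymersInGrid, other.PolymersInGrid);
        std::swap(Energy, other.Energy);
        std::swap(OccupancyMap, other.OccupancyMap);
    }
};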
An alternative approach that I used before C++11 is to operate on pointers. In this scenario one would have two Grids, one "real" and one treated as a buffer, or sandbox, and on acceptance one would simply swap the pointers, so that the "buffer" filled by MoveChooser becomes the real, current Grid.
In pseudocode (a C++ sketch follows below):
Create two buffers, previous and current, each capable of storing a simulation state.
Initialize current.
Create two pointers, p_prev = &previous, p_curr = &current.
For as many steps as you wish:
compute the next state from *p_curr and store it in *p_prev (e.g. monte_carlo_step(p_curr, p_prev));
swap the pointers: now the current system state is at p_curr and the previous one at p_prev;
analyze the results stored at *p_curr.
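A minimal sketch of that scheme (monte_carlo_step and metropolis_accept are placeholders standing in for the real MoveChooser/MetropolisAcceptance code):

#include <utility>

struct Grid { /* ... members as in the question ... */ };

// Placeholder stand-ins: monte_carlo_step() should build the proposed state
// directly inside *next, reusing its storage, and metropolis_accept() applies
// the Metropolis criterion to the two states.
void monte_carlo_step(const Grid* curr, Grid* next) { (void)curr; (void)next; /* MoveChooser logic */ }
bool metropolis_accept(const Grid& oldG, const Grid& newG) { (void)oldG; (void)newG; return true; }

void run(Grid& initial, int Nmov) {
    Grid previous = initial;   // two persistent buffers, allocated once
    Grid current  = initial;
    Grid* p_prev = &previous;
    Grid* p_curr = &current;

    for (int i = 1; i <= Nmov; ++i) {
        monte_carlo_step(p_curr, p_prev);     // the proposal is written into the other buffer
        if (metropolis_accept(*p_curr, *p_prev))
            std::swap(p_curr, p_prev);        // O(1): no Grid is copied on acceptance
        // analyze / dump the results stored at *p_curr
    }
    initial = *p_curr;                        // hand the final state back to the caller
}

The point is that acceptance costs only a pointer swap; the expensive copy of vectors and maps never happens inside the loop.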

How to improve this random number generator code in c++?

I am a C++ student and I am working on creating a random number generator. In fact, I should say my algorithm selects a number within a defined range.
I am writing this purely out of curiosity; I am not challenging the existing library functions. I always use library functions when writing applications based on randomness, but again, I just want to build this myself out of curiosity.
I would also like to know if there is something wrong with my algorithm or with my approach, because I googled how PRNGs work, and some sites said that there is a mathematical algorithm and a predefined series of numbers, and that the seed just sets the pointer to a different point in the series, after which the sequence eventually repeats itself.
My algorithm just moves to and fro in the array of possible values, and the seed breaks the loop at different values each time. I don't see why this approach is wrong. I got answers suggesting a different algorithm, but they didn't explain what's wrong with my current one.
Yes, there was a problem with my seed: it was not precise and made the results a little predictable, as here:
cout << rn(50, 100);
The results from running it four times are 74, 93, 56, 79.
See the pattern of "increasing order"? For large ranges the pattern can be seen easily. I got an answer on getting good seeds, but that too recommended a new algorithm (without saying why).
An alternative could be to shuffle my array, randomly generating a new sequence every time, so that the pattern of increasing order goes away. Any help with that rearranging would also be welcome. Here is the code below; if my function is not feasible, please let me know.
Thanking you in anticipation.
int rn(int lowerlt, int upperlt)
{
    /* Over short ranges, results are satisfactory.
     * I want to make it effective for big ranges.
     */
    const int size = upperlt - lowerlt + 1; // Number of possible values (the original was off by one).
    int ar[size];      // Array to store all possible values within the defined range
                       // (a variable-length array is a compiler extension; std::vector would be standard).
    int i, x, ret;     // Variables to control loops and the return value.
    long pointer = 0;  // Pointer variable. The one which breaks the main loop.

    // Loop to initialize the array with the possible values.
    for (i = 0, x = lowerlt; x <= upperlt; i++, x++)
        ar[i] = x;

    long seed = time(0); // Requires <ctime>.

    // Main loop. To find the random number.
    for (i = 0; pointer <= seed; i++, pointer++)
    {
        ret = ar[i];
        if (i == size - 1)
        {
            // Reverse loop.
            for (; i >= 0; i--)
            {
                ret = ar[i];
            }
        }
    }
    return ret;
}
Caveat: From your post, aside from your random generator algorithm, one of your problems is getting a good seed value, so I'll address that part of it.
You could use /dev/random to get a seed value. That would be a great place to start [and would be sufficient on its own], but might be considered "cheating" from some perspective.
So, here are some other sources of "entropy":
Use a higher-resolution time-of-day clock source: gettimeofday or clock_gettime(CLOCK_REALTIME, ...); call it "cur_time". Use only the microsecond or nanosecond portion, respectively; call it "cur_nano". Note that cur_nano is usually pretty random all by itself.
Do a getpid(2). This has a few unpredictable bits because between invocations other programs are starting and we don't know how many.
Create a new temp file and get the file's inode number [then delete it]. This varies slowly over time. It may be the same on each invocation [or not].
Get the high resolution value for the system's time of day clock when the system was booted, call it "sysboot".
Get the high resolution value for the start time of your "session": When your program's parent shell was started, call it "shell_start".
If you were using Linux, you could compute a checksum of /proc/interrupts as that's always changing. For other systems, get some hash of the number of interrupts of various types [should be available from some type of syscall].
Now, create some hash of all of the above (e.g.):
dev_random * cur_nano * (cur_time - sysboot) * (cur_time - shell_start) *
getpid * inode_number * interrupt_count
That's a simple equation. You could enhance it with some XOR and/or sum operations. Experiment until you get one that works for you.
Note: This only gives you the seed value for your PRNG. You'll have to create your PRNG from something else (e.g. earl's linear algorithm)
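A sketch of how a few of those sources might be folded together on a POSIX system (which sources you include and the exact mixing are up to you; this version just multiplies and XORs a handful of them):

#include <cstdint>
#include <ctime>
#include <sys/time.h>   // gettimeofday
#include <unistd.h>     // getpid

// Hypothetical seed mixer: combines several weak entropy sources into one value.
uint64_t make_seed()
{
    struct timeval tv;
    gettimeofday(&tv, nullptr);

    uint64_t cur_time = (uint64_t)tv.tv_sec;
    uint64_t cur_nano = (uint64_t)tv.tv_usec;  // sub-second part, fairly random on its own
    uint64_t pid      = (uint64_t)getpid();

    uint64_t seed = cur_time;
    seed ^= cur_nano * 2654435761u;            // multiplicative mix of the sub-second bits
    seed ^= pid << 16;
    seed ^= (uint64_t)clock();                 // CPU time used so far, another weak source
    return seed;
}

The more independent sources you fold in (boot time, shell start time, a /proc/interrupts checksum, /dev/random bytes), the harder the seed is to predict.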
unsigned int Random::next() {
    s = (1664525 * s + 1013904223);
    return s;
}
's' is growing with every call of that function.
Correct is
unsigned int Random::next() {
    s = (1664525 * s + 1013904223) % xxxxxx;
    return s;
}
Maybe use this function
long long Factor = 279470273LL, Divisor = 4294967291LL;
long long seed;

long long next()
{
    seed = (seed * Factor) % Divisor;
    return seed;
}
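Whichever recurrence you pick, its raw output still has to be mapped back into [lowerlt, upperlt]. A small sketch wiring the factor/divisor recurrence above into an rn() replacement (the struct name and the simple modulo mapping, which has a slight bias, are my own choices):

#include <ctime>
#include <iostream>

struct Rng {
    long long seed;
    explicit Rng(long long s) : seed(s % 4294967291LL) { if (seed <= 0) seed += 4294967290LL; }
    long long next() {
        seed = (seed * 279470273LL) % 4294967291LL;  // the recurrence from the answer above
        return seed;
    }
};

int rn(Rng& rng, int lowerlt, int upperlt)
{
    return lowerlt + (int)(rng.next() % (upperlt - lowerlt + 1));  // quick mapping, slight modulo bias
}

int main()
{
    Rng rng(time(0));
    for (int i = 0; i < 4; ++i)
        std::cout << rn(rng, 50, 100) << " ";
    std::cout << "\n";
}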

Determining if 5 seconds have passed

I'm trying to determine if five seconds have passed in a console application since the last time I checked. I think my logic is slightly off and I don't know how to resolve it.
My LastCheck variable is initially 0 when the program begins. It is responsible for holding the "old time".
LastCheck is updated by CheckSeconds(), which gives it a new "old time".
If LastCheck was equal to 1227323, and the now variable is currently equal to 1232323, then I would know 5000 milliseconds have passed (in reality the numbers are much greater than this).
Otherwise, I don't want anything to happen; I want to wait until those five seconds have actually passed.
BACKEND
inline std::vector<int> CheckSeconds(int previous, int timeinseconds)
{
    // Check whether a certain number of seconds has passed.
    int now = GetTickCount();
    int timepassed = 0;
    std::vector<int> trueandnewtime;
    // If the current time minus the old time exceeds timeinseconds * 1000 milliseconds
    // (5000 ms for 5 seconds), then the interval has passed and timepassed is set to true.
    if (now - previous > timeinseconds * 1000)
        timepassed = 1;
    trueandnewtime.push_back(timepassed);
    trueandnewtime.push_back(now);
    return trueandnewtime;
}
FRONTEND
storage = CheckSeconds(LastCheck, 5);
LastCheck = storage.at(1);
if (storage.at(0) == 1)
{
    ....blahblahblah.....
}
Anyone know what I'm doing wrong? I must have a logic error somewhere, or I'm being dumb.
Also worth noting, this code is in a while loop that runs constantly with a Sleep(60) between iterations. It's a console application at the moment.
Appreciate any assistance.
Fixed it by putting the LastCheck assignment inside the loop.
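As a sketch (not necessarily the asker's exact fix): if LastCheck is reset to "now" on every pass, the difference never reaches 5000 ms, so one arrangement that makes the event fire every five seconds is to advance LastCheck only when CheckSeconds() reports that the interval has elapsed:

#include <vector>
#include <windows.h>   // GetTickCount, Sleep
// CheckSeconds() as defined above

int main()
{
    int LastCheck = GetTickCount();          // start from "now" rather than 0
    while (true)
    {
        std::vector<int> storage = CheckSeconds(LastCheck, 5);
        if (storage.at(0) == 1)
        {
            LastCheck = storage.at(1);       // advance the reference point only when 5 s have passed
            // ....blahblahblah.....
        }
        Sleep(60);
    }
}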

c++ stack efficient for multicore application

I am trying to code a multicore Markov chain in C++, and while I am trying to take advantage of the many CPUs (up to 24) to run a different chain on each one, I have a problem picking the right container to gather the results of the numerical evaluations on each CPU. What I am trying to measure is basically the average value of an array of boolean variables. I have tried coding a wrapper around a std::vector object that looks like this:
#include <vector>
#include <fstream>
using namespace std;

struct densityStack {
    vector<int> density; // will store the sum of boolean variables
    int card;            // will store the number of elements we summed over, for normalizing at the end

    densityStack(int size) { // constructor taking as only parameter the size of the array, usually size = 30
        density = vector<int>(size, 0);
        card = 0;
    }

    void push_back(vector<int>& toBeAdded) { // method summing a new array (of measurements) into our stack
        for (auto valStack = density.begin(), newVal = toBeAdded.begin(); valStack != density.end(); ++valStack, ++newVal)
            *valStack += *newVal;
        card++;
    }

    void savef(const char* fname) { // method outputting into a file
        ofstream out(fname);
        out.precision(10);
        out << card << "\n"; // saving the cardinal in the first line
        for (auto val = density.begin(); val != density.end(); ++val)
            out << (double) *val / card << "\n";
        out.close();
    }
};
Then, in my code I use a single densityStack object, and every time a CPU core has data (which can be 100 times per second) it calls push_back to send the data back to the densityStack.
My issue is that this seems to be slower than the first, raw approach where each core stored each array of measurements in a file and I then used a Python script to average and clean up (I was unhappy with that because it stored too much information and put too much useless stress on the hard drives).
Do you see where I could be losing a lot of performance? I mean, is there an obvious source of overhead? Because to me, copying back the vector even at frequencies of 1000 Hz should not be too much.
How are you synchronizing your shared densityStack instance?
From the limited info here my guess is that the CPUs are blocked waiting to write data every time they have a tiny chunk of data. If that is the issue, a simple technique to improve performance would be to reduce the number of writes. Keep a buffer of data for each CPU and write to the densityStack less frequently.
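A sketch of that buffering idea (it assumes the shared densityStack is guarded by a std::mutex and gains a merge() method; all names here are illustrative):

#include <mutex>
#include <vector>

struct densityStack;  // as in the question, extended with a merge(sums, count) method

// Hypothetical per-thread helper: accumulates locally and only touches the
// shared stack (and its lock) once every flushEvery measurements.
class LocalAccumulator {
    std::vector<int> localSum;
    int localCard = 0;
    int flushEvery;
    densityStack& shared;
    std::mutex& sharedLock;
public:
    LocalAccumulator(densityStack& s, std::mutex& m, int size, int flushEvery)
        : localSum(size, 0), flushEvery(flushEvery), shared(s), sharedLock(m) {}

    void push_back(const std::vector<int>& measurement) {
        for (size_t i = 0; i < localSum.size(); ++i)
            localSum[i] += measurement[i];
        if (++localCard == flushEvery)
            flush();
    }

    void flush() {
        std::lock_guard<std::mutex> guard(sharedLock);
        // shared.merge(localSum, localCard);   // fold the local sums into the shared stack
        localSum.assign(localSum.size(), 0);
        localCard = 0;
    }
};

With, say, flushEvery = 100, each core takes the lock roughly once per second instead of a hundred times, which removes most of the contention.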

algorithm for calculating a running clicks/second value (ala a speedometer)

I'm trying to figure out how to produce a running calculation of clicks per second (e.g. an app with a window I click on and it gives me a speedometer-like value of the 'speed' of my clicks in clicks per second). For some reason the algorithm is eluding me.
It's easy if, at each second, I just report how many clicks happened in the last second. But where it gets tricky is if there was one click in second 1, then 0 clicks in seconds 2-9, and 1 click in second 10. Presumably that would be 0.2 clicks per second, although really only if it was kept up and averaged out to that over time. If that click in second 10 was followed by 0 clicks for 40 seconds, then it should be 0 clicks/second, not 0.04 clicks/second.
So clearly I need some kind of window within which I'm willing to presume the clicks are part of a pattern, or at least associated with the last ones. But it's just not making sense to me.
I'm using openFrameworks for this, so I have an update() function that is called more than once per second (say 30x/sec), and a mousePressed() function that lets me increment a variable to track the clicks. I can use difftime() and time() to track whether I just crossed into a new second, and then use fmod() to figure out whether I just crossed some larger interval.
Any suggestions are appreciated.
I think you want to calculate the running average of the clicks per second. You would use a circular buffer of counters of a length of say 30 for a 30 second window. The average clicks per second is the sum of the counters divided by 30.
An index points to the current counter, the index is incremented modulo 30 every second, and the counter at the new position is set to zero.
example:
#include <ctime>

const unsigned BUFFER_SIZE = 30;
unsigned counters[BUFFER_SIZE] = {0};
unsigned current = 0;
time_t last;

void init() {
    time(&last);
}

void update() {
    time_t now;
    time(&now);
    while (now - last >= 1) {
        ++last;
        current = (current + 1) % BUFFER_SIZE;
        counters[current] = 0;
    }
}

void mousePressed() {
    ++counters[current];
}

float average() {
    float sum = 0;
    for (unsigned i = 0; i < BUFFER_SIZE; ++i) {
        sum += counters[i];
    }
    return sum / BUFFER_SIZE;
}
This is pseudo code, but I think it will do what you are asking:
onUpdate() {
    if (currentTime() - lastClickTime > idleTimeout) {
        // reset the clickometer to zero
    } else {
        // calculate the speed
    }
}

onMouseClick() {
    lastClickTime = currentTime();
    // and whatever else needs to happen
}
Basically you are just tracking the time of the last click, and making sure it happened within the idleTimeout, which you obviously have to define for some span of time.
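Combining that idle timeout with the ring-buffer average from the earlier answer might look roughly like this (currentTime() here is a std::chrono stand-in for whatever clock the app already uses, and the 2-second idleTimeout is arbitrary):

#include <chrono>

// Returns elapsed seconds as a double; stands in for the app's own clock.
double currentTime() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}

double lastClickTime = 0;
const double idleTimeout = 2.0;   // clicks older than this mean the user has stopped
double clicksPerSecond = 0;

void onMouseClick() {
    lastClickTime = currentTime();
    // ...also increment the current counter of the circular buffer...
}

void onUpdate(double averageFromBuffer) {    // pass in the ring-buffer average
    if (currentTime() - lastClickTime > idleTimeout)
        clicksPerSecond = 0;                 // reset the clickometer to zero
    else
        clicksPerSecond = averageFromBuffer; // show the windowed clicks-per-second value
}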