Implementing Exponential Moving Average in C++ - c++

I am developing a small trading robot as an exercise. It receives stock prices day after day (represented as iterations).
Here's what my Trade class looks like:
class Trade
{
private:
    int capital_;
    int days_;                    // Total number of days of available stock prices
    int daysInTrading_;           // Increments as days go by.
    std::list<int> stockPrices_;  // Contains stock prices day after day.
    int currentStock_;            // Current stock we are dealing with.
    int lastStock_;               // Last stock dealt with
    int trend_;                   // Either {-1; 0; 1} depending on the trend.
    int numOfStocks_;             // Number of stocks in our possession
    int EMA_;                     // Exponential Moving Average
    int lastEMA_;                 // Last EMA
public:
    // functions
};
As you can see from my last two attributes, I wish to implement an Exponential Moving Average as part of a Trend Following Algorithm.
But I think I didn't quite understand how to implement it; here's my calcEMA function that simply calculates the EMA:
int Trade::calcEMA()
{
    return ((this->currentStock_ - this->lastEMA_
             * (2 / (this->daysInTrading_ + 1)))
            + this->lastEMA_);
}
But when my stock values (read from a file) are as follows:
1000, 1100, 1200, 1300, 1400, 1500, 1400, 1300, 1200, 1100, 1000
and I check whether my EMA makes sense... it does not!
Where did I go wrong in the operation?
Additionally, what value should I give lastEMA_ the first time I call calcEMA?

I believe that you are missing a parenthesis in the "calcEMA" function. How about breaking the expression up into smaller expressions, with temporary variables to hold the intermediate results, like this?
int Trade::calcEMA()
{
    auto mult = 2.0 / (timePeriod_ + 1);
    auto rslt = (currentStock_ - lastEMA_) * mult + lastEMA_;
    return rslt;
}
Also, as user PaulMcKenzie pointed out in a comment on your question, you are using integer arithmetic for what is really a floating-point calculation. Consider using float or double to avoid truncation.
Here are my suggestions:
  An EMA like yours is defined for a time period. While daysInTrading is less than or equal to timePeriod, lastEMA should be set to a plain average.
  Once daysInTrading is greater than your timePeriod, you can start calling your "calcEMA" function with the initialized lastEMA.
  Please remember to update lastEMA after each call to the "calcEMA" function.
Here is my code for you:
#include <vector>
#include <list>
#include <iostream>

// calculate a simple moving average
double calcMA (double previousAverage,
               unsigned int previousNumDays,
               double newStock) {
    auto rslt = previousNumDays * previousAverage + newStock;
    return rslt / (previousNumDays + 1.0);
}

// calculate an exponential moving average
double calcEMA (double previousAverage,
                int timePeriod,
                double newStock) {
    auto mult = 2.0 / (timePeriod + 1.0);
    auto rslt = (newStock - previousAverage) * mult + previousAverage;
    return rslt;
}

class Trade {
    unsigned int timePeriod_ = 5;
    double lastMA_ = 0.0;
    std::list<double> stockPrices_;
public:
    void addStock (double newStock) {
        stockPrices_.push_back(newStock);
        auto num_days = stockPrices_.size();
        if (num_days <= timePeriod_)
            lastMA_ = calcMA(lastMA_, num_days - 1, newStock);
        else
            lastMA_ = calcEMA(lastMA_, timePeriod_, newStock); // the EMA period stays fixed
    }
    double getAverage() const { return lastMA_; }
};

// ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
int main() {
    std::vector<double> stocks =
        {1000, 1100, 1200, 1300, 1400, 1500,
         1400, 1300, 1200, 1100, 1000};
    Trade trade;
    for (auto stock : stocks)
        trade.addStock(stock);
    std::cout << "Average: " << trade.getAverage() << std::endl;
    return 0;
}

The operation is wrong, as you noticed.
Disclaimer: I got this algorithm from Wikipedia, and as such it might not be accurate. Here (page 3) might be a better one, but I can't judge; I never used those algorithms, so I have no idea what I'm talking about :)
c(EMA) = y(EMA) + a * (c(price) - y(EMA))
where:
c(EMA) is the current EMA
y(EMA) is the previous EMA
a is some chosen value between 0 and 1 (a smoothing factor, not actually random)
c(price) is the current price
But you did almost the same thing:
c(EMA) = (c(price) - y(EMA) * b) + y(EMA)
I don't know why you used b = 2 / (daysInTrading_ + 1), but this will not always be a value between 0 and 1 (in fact, since these are all integers, it will be 0 most of the time).
Also, you put a parenthesis in the wrong place (after b, where it should be after y(EMA)).
With that fixed, and with a = 0.5 for example, the operation looks like this:
lastEMA_ + 0.5 * (currentStock_ - lastEMA_)
For the first lastEMA_, according to Wikipedia:
S1 is undefined. S1 may be initialized in a number of different ways, most commonly by setting S1 to Y1 (the first element in the list), though other techniques exist, such as setting S1 to an average of the first 4 or 5 observations.
The importance of the S1 initialization's effect on the resulting moving average depends on α; smaller α values make the choice of S1 relatively more important than larger α values, since a higher α discounts older observations faster.
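To make that initialization concrete, here is a minimal sketch (my own helper, not the asker's class) that seeds the EMA with the plain average of the first few observations and then applies the recurrence above:

#include <numeric>
#include <vector>

// Sketch: seed the EMA with the simple average of the first `period`
// observations, then apply EMA = EMA + a * (price - EMA).
// Assumes prices.size() >= period and period >= 1.
double emaSeries(const std::vector<double>& prices, int period) {
    double a = 2.0 / (period + 1.0);
    double ema = std::accumulate(prices.begin(), prices.begin() + period, 0.0) / period;
    for (std::size_t i = period; i < prices.size(); ++i)
        ema += a * (prices[i] - ema);
    return ema;
}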

There are generally two accepted forms of EMA.
The traditional:
m = 2/(1+n) // where n >= 1
EMA = m * currentPrice + (1-m) * previousEMA
or the Wilder:
m = 1/n // where n >= 1
EMA Wilder = m * currentPrice + (1-m) * previousEMA
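As a sketch (hypothetical helper name), both forms share the same update step and differ only in the multiplier m:

// One EMA update step; m is the smoothing multiplier.
double emaStep(double previousEMA, double currentPrice, double m) {
    return m * currentPrice + (1.0 - m) * previousEMA;
}
// traditional: emaStep(prev, price, 2.0 / (1 + n))
// Wilder:      emaStep(prev, price, 1.0 / n)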

Related

"Probability Based" Time Delay of Arrival Algorithm Producing Weird Results?

I've written the following algorithm designed to solve the Time Delay of Arrival problem through a "brute force" method. The problem is as follows: given the known locations of three receivers in a plane, and the propagation speed of some signal, determine the location of the signal source knowing only the times at which each receiver "saw" the signal.
The algorithm works by assuming the source to be within a 1000 x 1000 kilometre square area, and then iterating (with 1 km "resolution") over every possible location, calculating the travel time to each receiver and determining which location matches the known delays of arrival most closely. So, for each location [x,y], I calculate the time to arrive at receivers 1, 2, and 3, then determine how close "time to arrive at 1 - time to arrive at 2" is to the live data, and similarly for the combinations 1-3 and 2-3 (ignoring other possible combinations, for simplicity).
Here's the problem: it's highly unlikely that every signal event comes from the same direction. However, my code seems to suggest every event is at [0,0]. While it is technically possible that this is the case, it is far more likely that something is wrong with my code, so for the purposes of this question let's assume that to be the case. Perhaps I've made some obvious mistake here?
#define c 299792
#define statn1x 3.00000
#define statn1y 3.60000
#define statn2x 2.10000
#define statn2y 2.10000
#define statn3x 0.96000
#define statn3y 3.60000

void findProb(double alpha, double gamma, double beta){
    int x,y;
    double thld = DBL_MAX;
    for(int i = 0; i < 1000; ++i){
        for(int j = 0; j < 1000; ++j){
            double alphaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
            double betaEst  = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
            double gammaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
            std::cout << i << "," << j << "\n";
            if( std::max(std::max(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) -
                std::min(std::min(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) < thld){
                thld = std::max(std::max(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) -
                       std::min(std::min(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma));
                x = i;
                y = j;
            }
        }
    }
    //std::cout << x << "," << y << "\n";
}

void localize(){
    ROOT::RDataFrame tdoa("D","./coincidences.root");
    vector<double> alpha, beta, gamma;
    tdoa.Foreach([&](double delay){ alpha.push_back(delay); },{"x"});
    tdoa.Foreach([&](double delay){ beta.push_back(delay); },{"y"});
    tdoa.Foreach([&](double delay){ gamma.push_back(delay); },{"z"});
    int iter = std::min(std::min(alpha.size(), beta.size()), gamma.size());
    for(int i = 0; i < iter; ++i){
        findProb(alpha[i], beta[i], gamma[i]);
    }
}
Note the terminology used in this project:
alpha refers to time to arrive (ttoa) at station 1 - ttoa station 2.
gamma refers to ttoa station 1 - ttoa station 3
beta refers to ttoa station 2 - ttoa station 3
alphaEst refers to the calculated estimated travel time from some location [x,y] to station 1
betaEst refers to the calculated estimated travel time from some [x,y] to station 2
... and likewise for gammaEst.
Note that I'm also working now to produce a smaller, reproducible example (if possible). I'll add that as soon as I can.
alphaEst, betaEst and gammaEst are all assigned the same value.
betaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c
should probably be
betaEst = sqrt(pow(i-statn2x,2) + pow(j-statn2y,2)) / c
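So, presumably, each estimate should use its own station's coordinates. A sketch of the corrected three lines (assuming gammaEst belongs to station 3, per the question's terminology):

double alphaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
double betaEst  = sqrt(pow(i-statn2x,2) + pow(j-statn2y,2)) / c;
double gammaEst = sqrt(pow(i-statn3x,2) + pow(j-statn3y,2)) / c;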

How to divide a number into several, unequal, yet increasing numbers [ for sending a PlaceOrder( OP_BUY, lots ) contract XTO ]

I am trying to create an MQL4 script (MQL4 is an almost C++-like language) in which I want to divide a double value into 9 parts, where the fractions are unequal yet increasing.
My current code attempts to do it this way (pseudo-code) :
Lots1 = 0.1;
Lots2 = (Lots1 / 100) * 120;//120% of Lot1
Lots3 = (Lots2 / 100) * 130;//130% of Lot2
Lots4 = (Lots3 / 100) * 140;//140% of Lot3
Lots5 = (Lots4 / 100) * 140;//140% of Lot4
Lots6 = (Lots5 / 100) * 160;//160% of Lot5
Lots7 = (Lots6 / 100) * 170;//170% of Lot6
Lots8 = (Lots7 / 100) * 180;//180% of Lot7
Lots9 = (Lots8 / 100) * 190;//190% of Lot8
...
or better :
double Lots = 0.1; // a Lot Size
double lots = Lots;
...
/* Here is the array with percentages of lots' increments,
   in order */
int AllZoneLots[8] = { 120, 130, 140, 140, 160, 170, 180, 190 }; // 120%, 130%,...
/* Here, the lot sizes are used by looping over the array
   and increasing the lot size on each iteration */
for( int i = 0; i < ArraySize( AllZoneLots ); i++ ) {
     lots = AllZoneLots[i] * ( lots / 100 );
  // PlaceOrder( OP_BUY, lots );
}
But what I want is to have a fixed value of 6.7 split into 9 parts, as these codes do, yet with the parts increasing rather than staying the same...
e.g., 6.7 split into:
double lots[9] = { 0.10, 0.12, 0.16, 0.22, 0.31, 0.50, 0.85, 1.53, 2.91 };
/* This is just an example
   of how to divide a value of 6.7 into 9 growing parts */
This can be done so as to make equal steps in the values. If there are 9 steps, divide the value by 45 to get the first value and the equal step x. Why? Because the sum of 1..9 is 45.
x = 6.7 / 45
which is 0.148889
The first term is x, the second term is 2 * x, the third term is 3 * x, etc. They add up to 45 * x, which is 6.7, but it's better to divide last. So the second term, say, would be 6.7 * 2 / 45.
Here is code which shows how it can be done in C, since MQL4 works with C Syntax:
#include <stdio.h>

int main(void) {
    double val = 6.7;
    double term;
    double sum = 0;
    for(int i = 1; i <= 9; i++) {
        term = val * i / 45;
        printf("%.3f ", term);
        sum += term;
    }
    printf("\nsum = %.3f\n", sum);
}
Program output:
0.149 0.298 0.447 0.596 0.744 0.893 1.042 1.191 1.340
sum = 6.700
Not sure I understood right, but probably you need a total of 3.5 shared between all lots.
And I can see only 8 lots, not counting the initial one.
double totalPercentage = 0;
for(int i = 0; i < ArraySize(AllZoneLots); i++) {
    totalPercentage += AllZoneLots[i];
}
double totalValue = 3.5;
// total value is total percentage; Lots1 is 100%, so:
Lots1 = totalValue / totalPercentage * 100.00;
Then you continue with your code.
If you want to include Lots1, you just add 100 to the total and do the same.
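A minimal sketch of that scheme (my assumed interpretation: each AllZoneLots entry is read as a percentage of Lots1, with Lots1 itself counted as 100%):

#include <cstdio>

int main() {
    int allZoneLots[8] = { 120, 130, 140, 140, 160, 170, 180, 190 };
    double totalValue = 3.5;

    double totalPercentage = 100.0;           // include Lots1 itself as 100%
    for (int pct : allZoneLots)
        totalPercentage += pct;

    double lots1 = totalValue / totalPercentage * 100.0;
    std::printf("Lots1 = %.4f\n", lots1);
    for (int i = 0; i < 8; ++i)               // remaining lots as percentages of Lots1
        std::printf("Lots%d = %.4f\n", i + 2, lots1 * allZoneLots[i] / 100.0);
    return 0;
}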
Q : How to divide a number into several, unequal, yet increasing numbers [ for sending a PlaceOrder( OP_BUY, lots ) contract XTO ]?
A : The problem is not as free as it might look at first sight :
In the MetaTrader Terminal ecosystem, the problem formulation also has to obey the externally decided factors ( which are mandatory for any XTO with an ambition not to get rejected, as being principally incompatible with the XTO Terms & Conditions set, and to get filled ~ "placed" At Market )
These factors are reportable via a call to:
MarketInfo( <_a_SymbolToReportSTRING>, MODE_MINLOT ); // a minimum permitted size
MarketInfo( <_a_SymbolToReportSTRING>, MODE_LOTSTEP ); // a mandatory size-stepping
MarketInfo( <_a_SymbolToReportSTRING>, MODE_MAXLOT ); // a maximum permitted size
Additionally, any such lot-size has to be "normalised", prior to submitting an XTO, to a given number of decimal places, so as to be successfully placed / accepted by the Trading-Server on the Broker's side. A failure to do so results in remotely rejected XTO-s ( which obviously come at a remarkable blocking / immense code-execution latency penalty one would always want to prevent from ever happening in real trading )
Last, but not least, any such XTO sizing has to be covered by a safe amount of leveraged equity ( checking the free-margin availability first, before ever sending any such XTO for reasons just mentioned above ).
The code:
While the initial pseudo-code above does a progressive ( Martingale-alike ) lot-size scaling:
>>> aListOfFACTORs = [ 100, 120, 130, 140, 140, 160, 170, 180, 190 ]
>>> for endPoint in range( len( aListOfFACTORs ) ):
... product = 1.
... for item in aListOfFACTORs[:1+endPoint]:
... product *= item / 100.
... print( "Lots{0:} ~ ought be about {1:} times the amount of Lots1".format( 1 + endPoint, product ) )
...
Lots1 ~ ought be about 1.0 times the amount of Lots1
Lots2 ~ ought be about 1.2 times the amount of Lots1
Lots3 ~ ought be about 1.56 times the amount of Lots1
Lots4 ~ ought be about 2.184 times the amount of Lots1
Lots5 ~ ought be about 3.0576 times the amount of Lots1
Lots6 ~ ought be about 4.89216 times the amount of Lots1
Lots7 ~ ought be about 8.316672 times the amount of Lots1
Lots8 ~ ought be about 14.9700096 times the amount of Lots1
Lots9 ~ ought be about 28.44301824 times the amount of Lots1
the _MINLOT, _LOTSTEP and _MAXLOT put the game into a new light.
Any successful strategy is not free to choose the sizes. Given the said 9 steps and a fixed total amount of ~ 6.7 lots, the process can obey the stepping and the total, plus it must obey the MarketInfo()-reported sizing algebra.
Given 9-steps are mandatory,
each one has to be at least _MINLOT-sized:
double total_amount_to_split = aSizeToSPLIT;

total_amount_to_split = Min( aSizeToSPLIT,                  // a wished-to-have-sizing
                             FreeMargin/LotInBaseCurr*sFty  // a FreeMargin-covered size
                             );
int next = 0;
while ( total_amount_to_split >= _MINLOT )
{     total_amount_to_split -= _MINLOT;
      lot_size[next++]       = _MINLOT;
}
/*
###################################################################################
------------------------------------------------- HERE, WE HAVE 0:next lot_sizes
next NEED NOT == 9
If there is anything yet to split:
there is an integer amount of _LOTSTEP-s to distribute among 'em
HERE, and ONLY here, you have a freedom to decide about split/mapping
of the integer amount of _LOTSTEP-sized
additions to the _MINLOT "pre"-sets
in lot_size[]-s
YET, still no more than _MAXLOT is permissible for the above explained reasons
------------------------------------------------- CODE has to obey this, if XTO-s
are to
get a chance
###################################################################################
*/
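A sketch ( with hypothetical broker numbers ) of one possible split/mapping consistent with the comment block above: pre-fill every slot with _MINLOT, then hand out the leftover in whole _LOTSTEP units with growing weights so the sizes increase, capping at _MAXLOT:

#include <algorithm>
#include <cstdio>

int main() {
    const double MINLOT = 0.10, LOTSTEP = 0.01, MAXLOT = 100.0; // hypothetical values
    const int N = 9;
    double total = 6.7;
    double lot[N];

    for (int i = 0; i < N; ++i) {                // the mandatory _MINLOT pre-sets
        lot[i] = MINLOT;
        total -= MINLOT;
    }
    int steps = (int)( total / LOTSTEP );        // whole _LOTSTEP units left over
    const int weightSum = N * (N + 1) / 2;       // 1 + 2 + ... + 9 == 45
    for (int i = 0; i < N; ++i) {
        int share = steps * (i + 1) / weightSum; // bigger share for later lots
        lot[i] = std::min( lot[i] + share * LOTSTEP, MAXLOT );
    }
    // integer division leaves a small undistributed remainder, which is fine:
    // it keeps the total at or below the wished-for size
    for (int i = 0; i < N; ++i)
        std::printf( "Lots%d = %.2f\n", i + 1, lot[i] );
    return 0;
}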

Why does switching from Mersenne twister to other PRNGs in Gradient Noise Generator give bad results?

I've been trying to create a generalized Gradient Noise generator (which doesn't use the hash method to get gradients). The code is below:
class GradientNoise {
    std::uint64_t m_seed;
    std::uniform_int_distribution<std::uint8_t> distribution;
    const std::array<glm::vec2, 4> vector_choice = {glm::vec2(1.0, 1.0), glm::vec2(-1.0, 1.0), glm::vec2(1.0, -1.0),
                                                    glm::vec2(-1.0, -1.0)};
public:
    GradientNoise(uint64_t seed) {
        m_seed = seed;
        distribution = std::uniform_int_distribution<std::uint8_t>(0, 3);
    }

    // 0 -> 1
    // just passes the value through, originally was perlin noise activation
    double nonLinearActivationFunction(double value) {
        //return value * value * value * (value * (value * 6.0 - 15.0) + 10.0);
        return value;
    }

    // 0 -> 1
    // cosine interpolation
    double interpolate(double a, double b, double t) {
        double mu2 = (1 - cos(t * M_PI)) / 2;
        return (a * (1 - mu2) + b * mu2);
    }

    double noise(double x, double y) {
        std::mt19937_64 rng;
        // first get the bottom left corner associated
        // with these coordinates
        int corner_x = std::floor(x);
        int corner_y = std::floor(y);

        // then get the respective distance from that corner
        double dist_x = x - corner_x;
        double dist_y = y - corner_y;

        double corner_0_contrib; // bottom left
        double corner_1_contrib; // top left
        double corner_2_contrib; // top right
        double corner_3_contrib; // bottom right

        std::uint64_t s1 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y) + m_seed);
        std::uint64_t s2 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y + 1) + m_seed);
        std::uint64_t s3 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y + 1) + m_seed);
        std::uint64_t s4 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y) + m_seed);

        // each xy pair turns into a distance vector from the respective corner;
        // corner zero is our starting corner (bottom left)
        rng.seed(s1);
        corner_0_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y});
        rng.seed(s2);
        corner_1_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y - 1});
        rng.seed(s3);
        corner_2_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y - 1});
        rng.seed(s4);
        corner_3_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y});

        double u = nonLinearActivationFunction(dist_x);
        double v = nonLinearActivationFunction(dist_y);

        double x_bottom = interpolate(corner_0_contrib, corner_3_contrib, u);
        double x_top = interpolate(corner_1_contrib, corner_2_contrib, u);
        double total_xy = interpolate(x_bottom, x_top, v);
        return total_xy;
    }
};
I then generate an OpenGL texture to display with like this:
int width = 1024;
int height = 1024;
unsigned char *temp_texture = new unsigned char[width*height * 4];
double octaves[5] = {2,4,8,16,32};
for( int i = 0; i < height; i++){
    for(int j = 0; j < width; j++){
        double d_noise = 0;
        d_noise += temp_1.noise(j/octaves[0], i/octaves[0]);
        d_noise += temp_1.noise(j/octaves[1], i/octaves[1]);
        d_noise += temp_1.noise(j/octaves[2], i/octaves[2]);
        d_noise += temp_1.noise(j/octaves[3], i/octaves[3]);
        d_noise += temp_1.noise(j/octaves[4], i/octaves[4]);
        d_noise /= 5;
        uint8_t noise = static_cast<uint8_t>((d_noise * 128.0) + 128.0);
        temp_texture[j*4 + (i * width * 4) + 0] = (noise);
        temp_texture[j*4 + (i * width * 4) + 1] = (noise);
        temp_texture[j*4 + (i * width * 4) + 2] = (noise);
        temp_texture[j*4 + (i * width * 4) + 3] = (255);
    }
}
Which gives good results:
But gprof is telling me that the Mersenne Twister is taking up 62.4% of my time, and this grows with larger textures. Nothing else individually takes anywhere near as much time. While the Mersenne Twister is fast after initialization, the fact that I initialize it every time I use it seems to make it pretty slow.
This initialization is 100% required to make sure that the same x and y generate the same gradient at each integer point (so you need either a hash function or to seed the RNG each time).
I attempted to change the PRNG to both the linear congruential generator and Xorshiftplus, and while both ran orders of magnitude faster, they gave odd results (the original screenshots are omitted here): LCG seeded one time, then run 5 times before using; Xorshiftplus after one iteration; and after 10,000 iterations.
I've tried:
Running the generator several times before using the output; this results in slow execution or simply different artifacts.
Using the output of two consecutive runs after the initial seed to seed the PRNG again and using the value afterwards. No difference in result.
What is happening? What can I do to get faster results that are of the same quality as the Mersenne Twister?
OK BIG UPDATE:
I don't know why this works; I know it has something to do with the prime number utilized, but after messing around a bit, it appears that the following works:
Step 1: incorporate the x and y values as seeds separately (and incorporate some other offset value or additional seed value with them; this number should be a prime / non-trivial factor).
Step 2: use those two seed results to seed the generator again, back into the function (so, like geza said, the seeds made were bad).
Step 3: when getting the result, instead of taking it modulo the number of items (4) you are trying to choose from, or applying & 3, take the result modulo a prime number first and then apply & 3. I'm not sure whether it matters that the prime is a Mersenne prime.
Here is the result with prime = 257 and xorshiftplus being used! (Note I used 2048 by 2048 for this one; the others were 256 by 256.)
LCG is known to be inadequate for your purpose.
Xorshift128+'s results are bad because it needs good seeding, and providing good seeding defeats the whole purpose of using it. I don't recommend this.
However, I do recommend using an integer hash, for example one from Bob's page.
Here's a result of the first hash on that page. It looks OK to me, and it is fast (I think it is much faster than Mersenne Twister):
Here's the code I've written to generate this:
#include <cmath>
#include <stdio.h>

unsigned int hash(unsigned int a) {
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}

unsigned int ivalue(int x, int y) {
    return hash(y<<16|x)&0xff;
}

float smooth(float x) {
    return 6*x*x*x*x*x - 15*x*x*x*x + 10*x*x*x;
}

float value(float x, float y) {
    int ix = floor(x);
    int iy = floor(y);
    float fx = smooth(x-ix);
    float fy = smooth(y-iy);

    int v00 = ivalue(iy+0, ix+0);
    int v01 = ivalue(iy+0, ix+1);
    int v10 = ivalue(iy+1, ix+0);
    int v11 = ivalue(iy+1, ix+1);

    float v0 = v00*(1-fx) + v01*fx;
    float v1 = v10*(1-fx) + v11*fx;
    return v0*(1-fy) + v1*fy;
}

unsigned char pic[1024*1024];

int main() {
    for (int y=0; y<1024; y++) {
        for (int x=0; x<1024; x++) {
            float v = 0;
            for (int o=0; o<=9; o++) {
                v += value(x/64.0f*(1<<o), y/64.0f*(1<<o))/(1<<o);
            }
            int r = rint(v*0.5f);
            pic[y*1024+x] = r;
        }
    }
    FILE *f = fopen("x.pnm", "wb");
    fprintf(f, "P5\n1024 1024\n255\n");
    fwrite(pic, 1, 1024*1024, f);
    fclose(f);
}
If you want to understand how a hash function works (or better yet, which properties a good hash has), check out Bob's page, for example this one.
You (unknowingly?) implemented a visualization of PRNG non-random patterns. That looks very cool!
Except for the Mersenne Twister, none of your tested PRNGs seems fit for your purpose. As I have not done further tests myself, I can only suggest trying out and measuring further PRNGs.
The randomness of LCGs is known to be sensitive to the choice of their parameters. In particular, the period of an LCG is related to the m parameter: at most it will be m (your prime factor), and for many values it can be less.
Similarly, careful parameter selection is required to get a long period from Xorshift PRNGs.
You've noted that some PRNGs give good procedural generation results while others do not. In order to isolate the cause, I would factor out the proc-gen stuff and examine the PRNG output directly. An easy way to visualize the data is to build a grey-scale image where each pixel value is a (possibly scaled) random value. For image-based stuff, I find this an easy way to find things that may lead to visual artifacts. Any artifacts you see with this are likely to cause issues with your proc-gen output.
Another option is to try something like the Diehard tests. If the aforementioned image test fails to reveal any problems, I might use these just to be sure my PRNG techniques are trustworthy.
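A minimal sketch of that grey-scale test (size and file name arbitrary): dump raw PRNG bytes into a PGM image, in the same output format as the hash example above, and inspect it for visible patterns:

#include <cstdio>
#include <random>

int main() {
    const int W = 512, H = 512;
    static unsigned char pic[W * H];
    std::mt19937_64 rng(12345);                            // swap in any PRNG under test
    for (int i = 0; i < W * H; ++i)
        pic[i] = static_cast<unsigned char>(rng() & 0xff); // low byte as grey level
    FILE *f = std::fopen("prng.pnm", "wb");
    std::fprintf(f, "P5\n%d %d\n255\n", W, H);
    std::fwrite(pic, 1, W * H, f);
    std::fclose(f);
    return 0;
}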
Note that your code seeds the PRNG, then generates one pseudorandom number from the PRNG. The reason for the non-randomness in xorshift128+ that you discovered is that xorshift128+ simply adds the two halves of the seed (and uses the result mod 2^64 as the generated number) before changing its state (review its source code). This makes that PRNG considerably different from a hash function.
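For reference, here is a sketch of xorshift128+ (shift constants from one published version; check the original source for the exact variant). Note how the returned value is the sum of the two state words, computed before any mixing, which is why a poorly mixed two-word seed shows through directly in the first output:

#include <cstdint>

// xorshift128+ sketch; s[0], s[1] form the 128-bit state (must not be all zero).
struct Xorshift128Plus {
    std::uint64_t s[2];
    std::uint64_t next() {
        std::uint64_t x = s[0];
        const std::uint64_t y = s[1];
        const std::uint64_t result = x + y; // output taken from the raw state...
        s[0] = y;                           // ...before any mixing happens
        x ^= x << 23;
        s[1] = x ^ y ^ (x >> 18) ^ (y >> 5);
        return result;
    }
};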
What you see is a practical demonstration of PRNG quality. Mersenne Twister is one of the best PRNGs with good performance; it passes the DIEHARD tests. One should know that generating random numbers is not an easy computational task, so looking for better performance will inevitably result in poorer quality. LCG is known to be the simplest and worst PRNG ever designed, and it clearly shows two-dimensional correlation, as in your picture. The quality of Xorshift generators depends largely on bitness and parameters. They are definitely worse than Mersenne Twister, but some (xorshift128+) may work well enough to pass the BigCrush battery of TestU01 tests.
In other words, if you are running an important physical-modelling numerical experiment, you had better continue to use Mersenne Twister, which is known to be a good trade-off between speed and quality and comes in many standard libraries. In a less important case you may try the xorshift128+ generator. For ultimate results you would need a cryptographic-quality PRNG (none of those mentioned here may be used for cryptographic purposes).

How to do logarithmic binning on a histogram?

I'm looking for a technique to logarithmically bin some data sets. We've got data with values ranging from _min to _max (floats >= 0) and the user needs to be able to specify a varying number of bins _num_bins (some int n).
I've implemented a solution taken from this question, with some help on scaling from here, but my solution stops working when my data values lie below 1.0.
class Histogram {
    double _min, _max;
    int _num_bins;
    ......
};

double Histogram::logarithmicValueOfBin(double in) const {
    if (in == 0.0)
        return _min;
    double b = std::log(_max / _min) / (_max - _min);
    double a = _max / std::exp(b * _max);
    double in_unscaled = in * (_max - _min) / _num_bins + _min;
    return a * std::exp(b * in_unscaled);
}
When the data values are all greater than 1 I get nicely sized bins and can plot properly. When the values are less than 1 the bins come out more or less the same size and we get way too many of them.
I found a solution by reimplementing an open-source version of Matlab's logspace function.
Given a range and a number of bins, you need to create an evenly spaced numerical sequence:
module.exports = function linspace(a,b,n) {
    var every = (b-a)/(n-1),
        ranged = integers(a,b,every);
    return ranged.length == n ? ranged : ranged.concat(b);
}
After that you need to loop through each value and, with your base (e, 2 or 10 most likely), store the power, and you get your bin ranges.
module.exports.logspace = function logspace(a,b,n) {
    return linspace(a,b,n).map(function(x) { return Math.pow(10,x); });
}
I rewrote this in C++ and it's able to support ranges > 0.
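That C++ rewrite isn't shown here; a minimal sketch of what such a port could look like (assuming base 10):

#include <cmath>
#include <vector>

// Evenly spaced values from a to b inclusive (n >= 2).
std::vector<double> linspace(double a, double b, int n) {
    std::vector<double> result(n);
    double step = (b - a) / (n - 1);
    for (int i = 0; i < n; ++i)
        result[i] = a + i * step;
    return result;
}

// Log-spaced values: 10^a ... 10^b, n points.
std::vector<double> logspace(double a, double b, int n) {
    std::vector<double> result = linspace(a, b, n);
    for (double& x : result)
        x = std::pow(10.0, x);
    return result;
}

// For data in [_min, _max] with _min > 0, the bin edges would be
// logspace(std::log10(_min), std::log10(_max), _num_bins + 1).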
You can do something like the following:
// Create isolethargic binning
int T_MIN = 0;                       // The lower limit, i.e. 1.e0
int T_MAX = 8;                       // The upper limit, i.e. 1.e8
int ndec = T_MAX - T_MIN;            // Number of decades
int N_BPDEC = 1000;                  // Number of bins per decade
int nbins = ndec*N_BPDEC;            // Total number of bins
double step = (double) ndec / nbins; // The increment
std::vector<double> tbins(nbins+1);  // The array to store the bin edges
for(int i = 0; i <= nbins; ++i)
    tbins[i] = pow(10., step * (double) i + T_MIN);

Subsampling an array of numbers

I have a series of 100 integer values which I need to reduce/subsample to 77 values for the purpose of fitting into a predefined space on screen. This gives a fraction of 77/100 values-per-pixel - not very neat.
Assuming the 77 is fixed and cannot be changed, what are some typical techniques for subsampling 100 numbers down to 77. I get a sense that it will be a jagged mapping, by which I mean the first new value is the average of [0, 1] then the next value is [3], then average [4, 5] etc. But how do I approach getting the pattern for this mapping?
I am working in C++, although I'm more interested in the technique than implementation.
Thanks in advance.
Whether you downsample or oversample, you are trying to reconstruct a signal at non-sampled points in time... so you have to make some assumptions.
The sampling theorem tells you that if you sample a signal knowing that it has no frequency components above half the sampling frequency, you can continuously and completely recover the signal over the whole timing period. There's a way to reconstruct the signal using sinc() functions (that is, sin(x)/x).
sinc() (more precisely, sin(M_PI*x/Sampling_period)/(M_PI*x/Sampling_period)) is a function that has the following properties:
Its value is 1 for x == 0.0 and 0 for x == k*Sampling_period with k == +-1, +-2, ...
It has no frequency component above half of the sampling frequency derived from Sampling_period.
So if you take the sum of the functions F_k(x) = Y[k]*sinc(x/Sampling_period - k), each of which equals the sample value at position k and 0 at the other sampling points, and sum over all k in your sample set, you get the best continuous function: one that has no components at frequencies above half the sampling frequency and that takes the same values as your sample set.
Having said this, you can resample this function at whatever positions you like, getting the best way to resample your data.
This is, by far, a complicated way of resampling data (it also has the problem of not being causal, so it cannot be implemented in real time), and several methods have been used in the past to simplify the interpolation. You have to construct all the sinc functions, one for each sample point, and add them together. Then you resample the resulting function at the new sampling points and give that as a result.
Next is an example of the interpolation method just described. It accepts some input data (in_sz samples) and outputs interpolated data with the method described before. I assumed the extremes coincide, i.e. the first and last input samples map onto the first and last output samples; this explains the somewhat intricate calculation of (in_sz - 1)/(out_sz - 1) in the code (change it to in_sz/out_sz if you want a plain N samples -> M samples conversion):
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* normalized sinc function */
double sinc(double x)
{
    x *= M_PI;
    if (x == 0.0) return 1.0;
    return sin(x)/x;
} /* sinc */

/* interpolate a function made of in samples at point x */
double sinc_approx(double in[], size_t in_sz, double x)
{
    int i;
    double res = 0.0;
    for (i = 0; i < in_sz; i++)
        res += in[i] * sinc(x - i);
    return res;
} /* sinc_approx */

/* do the actual resampling. Change (in_sz - 1)/(out_sz - 1) if you
 * don't want the initial and final samples to coincide, as is done here.
 */
void resample_sinc(
    double in[],
    size_t in_sz,
    double out[],
    size_t out_sz)
{
    int i;
    double dx = (double) (in_sz-1) / (out_sz-1);
    for (i = 0; i < out_sz; i++)
        out[i] = sinc_approx(in, in_sz, i*dx);
}

/* test case */
int main()
{
    double in[] = {
        0.0, 1.0, 0.5, 0.2, 0.1, 0.0,
    };
    const size_t in_sz = sizeof in / sizeof in[0];
    const size_t out_sz = 5;
    double out[out_sz];
    int i;

    for (i = 0; i < in_sz; i++)
        printf("in[%d] = %.6f\n", i, in[i]);
    resample_sinc(in, in_sz, out, out_sz);
    for (i = 0; i < out_sz; i++)
        printf("out[%.6f] = %.6f\n", (double) i * (in_sz-1)/(out_sz-1), out[i]);
    return EXIT_SUCCESS;
} /* main */
There are different ways of interpolation (see wikipedia)
The linear one would be something like:
std::array<int, 77> sampling(const std::array<int, 100>& a)
{
    std::array<int, 77> res;
    for (int i = 0; i != 76; ++i) {
        int index = i * 99 / 76;
        int p = i * 99 % 76;
        res[i] = ((p * a[index + 1]) + ((76 - p) * a[index])) / 76;
    }
    res[76] = a[99]; // done outside of loop to avoid out of bound access (0 * a[100])
    return res;
}
Create 77 new pixels based on the weighted average of their positions.
As a toy example, think about the 3 pixel case which you want to subsample to 2.
Original (denote as multidimensional array original with RGB as [0, 1, 2]):
|----|----|----|
Subsample (denote as multidimensional array subsample with RGB as [0, 1, 2]):
|------|------|
Here, it is intuitive to see that the first subsample seems like 2/3 of the first original pixel and 1/3 of the next.
For the first subsample pixel, subsample[0], you make it the RGB average of the m original pixels that overlap, in this case original[0] and original[1]. But we do so in weighted fashion.
subsample[0][0] = original[0][0] * 2/3 + original[1][0] * 1/3 # for red
subsample[0][1] = original[0][1] * 2/3 + original[1][1] * 1/3 # for green
subsample[0][2] = original[0][2] * 2/3 + original[1][2] * 1/3 # for blue
In this example, original[1][2] is the blue component of the second original pixel.
Keep in mind for different subsampling you'll have to determine the set of original cells that contribute to the subsample, and then normalize to find the relative weights of each.
There are much more complex graphics techniques, but this one is simple and works.
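A minimal single-channel sketch of this box-overlap weighting (my own helper, generalizing the 3 -> 2 example to any n -> m; run it once per color channel for RGB):

#include <algorithm>
#include <vector>

// Each output sample covers n/m input samples; the weight of each input
// sample is its overlap with the output interval, normalized to sum to 1.
std::vector<double> subsample(const std::vector<double>& in, int m) {
    const int n = static_cast<int>(in.size());
    const double ratio = static_cast<double>(n) / m; // output width in input units
    std::vector<double> out(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double start = i * ratio, end = (i + 1) * ratio;
        for (int j = static_cast<int>(start); j < n && j < end; ++j) {
            double overlap = std::min(end, j + 1.0) - std::max(start, static_cast<double>(j));
            out[i] += in[j] * overlap;
        }
        out[i] /= ratio; // normalize the weights
    }
    return out;
}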
Everything depends on what you wish to do with the data - how you want to visualize it.
A very simple approach would be to render to a 100-wide image, and then smooth-scale the image down to a narrower size. Whatever graphics/development framework you're using will surely support such an operation.
Say, though, that your goal is to retain certain qualities of the data, such as minima and maxima. In that case, for each bin, you draw a line of darker color up to the minimum value, and then continue with a lighter color up to the maximum. Or, instead of just putting a pixel at the average value, you could draw a line from the minimum to the maximum.
Finally, you might wish to render as if you had only 77 values - then the goal is to somehow transform the 100 values down to 77. This implies some kind of interpolation. Linear or quadratic interpolation is easy but adds distortions to the signal. Ideally, you'd probably want to throw a sinc interpolator at the problem. A good list of them can be found here. For theoretical background, look here.