Generate random number with non-uniform density - c++

Does anyone know how to generate random numbers with a non-uniform density?

Easiest solution, if applicable: use the C++11 <random> facilities or the ones from Boost, which provide many non-uniform distributions out of the box.
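As a minimal sketch of the standard-library route, assuming a normal distribution is what you want: the helper name draw_normal and the fixed default seed are illustrative choices, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` samples from a normal distribution with the given mean and
// standard deviation. The function name and default seed are illustrative.
std::vector<double> draw_normal(std::size_t count, double mean, double stddev,
                                unsigned seed = 42) {
    assert(stddev > 0.0);
    std::mt19937 engine(seed);                            // uniform bit source
    std::normal_distribution<double> dist(mean, stddev);  // shapes the density
    std::vector<double> samples;
    samples.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        samples.push_back(dist(engine));
    }
    return samples;
}
```

Calling draw_normal(1000, 10.0, 2.0) returns 1000 values clustered around 10. Other distributions in <random> (for example std::exponential_distribution or std::poisson_distribution) can be swapped in the same way.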

Use a uniform density RNG, and pass its result through a mapping function to convert to your desired density distribution.
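A minimal sketch of such a mapping function, assuming the target is an exponential distribution with rate lambda (chosen only because its inverse CDF has a simple closed form). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Map a uniform value u in [0, 1) to an exponential variate with rate lambda
// by inverting the CDF F(x) = 1 - exp(-lambda * x).
double exponential_from_uniform(double u, double lambda) {
    assert(u >= 0.0 && u < 1.0);
    assert(lambda > 0.0);
    return -std::log(1.0 - u) / lambda;
}

// Illustrative helper: draw one exponential variate from any standard engine.
template <class Engine>
double draw_exponential(Engine& engine, double lambda) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // [0, 1)
    return exponential_from_uniform(uniform(engine), lambda);
}
```

The same pattern works for any density whose cumulative distribution function can be inverted: replace the body of exponential_from_uniform with the inverse CDF of the distribution you need.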

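You should state which distribution you need. Basically, you apply the inverse of the cumulative distribution function (CDF) of the distribution you want to a uniform random number. When the inverse CDF has no closed form, special-purpose methods exist; for example, a common way to get a normal distribution is the Box-Muller transform.
Here is code for the polar form of Box-Muller, just to give the idea (ranf() is assumed to return a uniform value in [0, 1)):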
float box_muller(float m, float s) /* normal random variate generator */
{ /* mean m, standard deviation s */
float x1, x2, w, y1;
static float y2;
static int use_last = 0;
if (use_last) /* use value from previous call */
{
y1 = y2;
use_last = 0;
}
else
{
do {
x1 = 2.0 * ranf() - 1.0;
x2 = 2.0 * ranf() - 1.0;
w = x1 * x1 + x2 * x2;
} while ( w >= 1.0 || w == 0.0 ); /* also reject w == 0 to avoid log(0) and division by zero */
w = sqrt( (-2.0 * log( w ) ) / w );
y1 = x1 * w;
y2 = x2 * w;
use_last = 1;
}
return( m + y1 * s );
}

This Java class takes a distribution as a matrix (each row is a pair: a number and its frequency) and generates random numbers according to those frequencies.
Look at the main method and run it.
public class RandomGenerator {
HashMap<Integer,Range> mappa = new HashMap<Integer,Range>();
Random random = new Random();
int max;
public static void main(String as[]){
int[][] matrice = new int[3][2];
//number 5 occurs 5 times
matrice[0][0] = 5 ;
matrice[0][1] = 5 ;
//number 18 occurs 18 times
matrice[1][0] = 18 ;
matrice[1][1] = 18 ;
//number 77 occurs 77 times
matrice[2][0] = 77 ;
matrice[2][1] = 77 ;
RandomGenerator randomGenerator = new RandomGenerator(matrice);
for (int i = 0; i < 100; i++) {
System.out.println( randomGenerator.getNext() );
}
}
public int getNext(){
int percentile = random.nextInt(max);
Range r =mappa.get(percentile);
return r.getValMax();
}
public HashMap<Integer, Range> getMappa() {
return mappa;
}
public void setMappa(HashMap<Integer, Range> mappa) {
this.mappa = mappa;
}
public RandomGenerator(int[][] distribuzioneOriginale ){
ArrayList<Range> listaRange = new ArrayList<Range>();
int previous = 0;
int totaleOccorrenze = 0;
for (int riga = 0; riga < distribuzioneOriginale.length; riga++) {
Range r = new Range();
r.setValMin(previous);
r.setValMax(distribuzioneOriginale[riga][0]);
r.setOccorrenze(distribuzioneOriginale[riga][1]);
totaleOccorrenze += distribuzioneOriginale[riga][1];
previous = distribuzioneOriginale[riga][0];
listaRange.add(r);
}
int indice = 0;
for (int iRange = 0; iRange < listaRange.size(); iRange++) {
Range r = listaRange.get(iRange);
int perc = (int) ( 1000* (r.getOccorrenze() / (double) totaleOccorrenze) ) ;
for (int i = 0; i < perc; i++) {
mappa.put( i + indice , r);
}
indice += perc;
}
max = indice;
}
class Range{
int valMin;
int valMax;
int occorrenze;
public int getValMin() {
return valMin;
}
public void setValMin(int valMin) {
this.valMin = valMin;
}
public int getValMax() {
return valMax;
}
public void setValMax(int valMax) {
this.valMax = valMax;
}
public int getOccorrenze() {
return occorrenze;
}
public void setOccorrenze(int occorrenze) {
this.occorrenze = occorrenze;
}
}
}
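Since the question is about C++, here is a rough C++ counterpart of the frequency-table idea, sketched with std::discrete_distribution rather than a hand-built lookup map. The class name WeightedPicker and the default seed are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Weighted picker: values[i] is returned with probability
// weights[i] / sum(weights). The class name is illustrative.
class WeightedPicker {
public:
    WeightedPicker(std::vector<int> values, const std::vector<double>& weights,
                   unsigned seed = 1234)
        : values_(std::move(values)),
          dist_(weights.begin(), weights.end()),
          engine_(seed) {
        assert(values_.size() == weights.size());
        assert(!values_.empty());
    }

    // Pick an index according to the weights and return the matching value.
    int next() { return values_[dist_(engine_)]; }

private:
    std::vector<int> values_;
    std::discrete_distribution<std::size_t> dist_;
    std::mt19937 engine_;
};
```

With values {5, 18, 77} and weights {5, 18, 77}, as in the Java main method, roughly 77% of the calls to next() should return 77.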

In your comment in the question:
1/sqrt(2*pi) * e^(-x^2)
The only variable is x. x itself will have a uniform density. So just pick a good random number, then stick it into that equation.

Related

Passing Arrays through Functions - "Error: Cannot Convert 'float**' to 'float*' for argument '1'"

Long time Reader, First time Asker.
So I'm working on a coding project of which the long term goal is to make a solar system simulator. The idea is that it whips up a randomized solar system with a few rules like 'at formation the first planet after the frostline has to be the largest gas giant' etc, and calculates the orbits to check for stability.
Obviously it's not done yet, I'm having some trouble with using the arrays in the subroutines. I know that you can't directly take arrays in and out of functions, but you can take pointers to said arrays in and out of functions if you do it right.
I apparently have not done it right below. I've tried to comment and make the code as readable as possible, here it is.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <tuple>
#include <vector>
#include <stdio.h>
#include <math.h>
#include <complex>
#include <stdint.h>
#include <time.h>
#include <string.h>
#include <algorithm>
//#include "mpi.h"
using namespace std;
double MyRandom(){
//////////////////////////
//Random Number Generator
//Returns number between 0-99
//////////////////////////
double y = 0;
unsigned seed = time(0);
srand(seed);
uint64_t x = rand();
x ^= x << 13;
x ^= x >> 7;
x ^= x << 17;
x = (1070739 * x) % 2199023255530;
y = x / 21990232555.31 ;
return y;
}
////////////////////////
///////////////////////
tuple< char& , float& , float& , float& , int& > Star(){
////////////////////////////
//Star will generate a Star
//Randomly or User Selected
//Class, Luminosity, Probability, Radius, Mass, Temperature
//Stars always take up 99% of the mass of the system.
///////////////////////////
char Class;
string Choice;
float L, R, M;
int T;
tuple< char& , float& , float& , float& , int& > star( Class = 'i', L = 1 , R = 1 , M = 1 , T = 3000) ;
cout << "Select Star Class (OBAFGKM) or Select R for Random: ";
cin >> Choice;
if ( Choice == "R" ) {
double y;
y = MyRandom();
if (y <= 0.003) Class = 'O';
if ((y > 0.003) && (y <= 0.133)) Class = 'B';
if ((y > 0.133) && (y <= 0.733)) Class = 'A';
if ((y > 0.733) && (y <= 3.733)) Class = 'F';
if ((y > 3.733) && (y <= 11.333)) Class = 'G';
if ((y > 11.333) && (y <= 23.433)) Class = 'K';
else Class = 'M';
}
if (Class == 'O') {
L = 30000;
R = 0.0307;
M = 16;
T = 30000;
}
if (Class == 'B') {
L = 15000;
R = 0.0195;
M = 9;
T = 20000;
}
if (Class == 'A') {
L = 15;
R = 0.00744;
M = 1.7;
T = 8700;
}
if (Class == 'F') {
L = 3.25;
R = 0.00488;
M = 1.2;
T = 6750;
}
if (Class == 'G') {
L = 1;
R = 0.00465;
M = 1;
T = 5700;
}
if (Class == 'K') {
L = 0.34;
R = 0.00356;
M = 0.62;
T = 4450;
}
if (Class == 'M') {
L = 0.08;
R = 0.00326;
M = 0.26;
T = 3000;
}
return star;
}
////////////
////////////
float* Planet( float &L, float &R, float &M, int &T, int &n){
///////////////////////////
//Planet generates the Planets
//Random 1 - 10, Random distribution 0.06 - 6 JAU unless specified by User
//Frost line Calculated, First Planet after Frost line is the Jupiter
//The Jupiter will have the most mass of all Jovian worlds
//Otherwise divided into Jovian and Terrestrial Worlds, Random Masses within groups
//Also calculates if a planet is in the Habitable Zone
////////////////////////////
float frostline, innerCHZ, outerCHZ;
float a = 0.06; // a - albedo
float m = M / 100; //Mass of the Jupiter always 1/100th mass of the Star.
float sys[n];
float* system[n][5] = {{0}};
for (int i = 0 ; i < n ; i++){
sys[i] = MyRandom()/10 * 3; //Distances in terms of Sol AU
}
sort(sys, sys + n );
for (int i = 0 ; i < n ; i++){
system[i][0] = &sys[i];
system[i][1] = 0; //system[i][0] is x, system[i][1] is y
}
frostline = (0.6 * T / 150) * (0.6 * T/150) * R / sqrt(1 - a);
innerCHZ = sqrt(L / 1.1);
outerCHZ = sqrt(L / 0.53);
for (int i = 0 ; i < n ; i++){
if (system[i][0] <= &frostline) {
float tmass = m * 0.0003 * MyRandom();
system[i][2] = &tmass ; //system[i][2] is mass, [3] is marker for the Jupter
system[i][3] = 0 ;
}
if ((system[i][0] >= &frostline) && (system[i-1][0] < &frostline)){
system[i][2] = &m ;
float J = 1;
system[i][3] = &J ;
}
if ((system[i][0] >= &frostline) && (system[i-1][0] >= &frostline)) {
float jmass = m * 0.01 * MyRandom();
system[i][2] = &jmass ;
system[i][3] = 0 ;
}
if ((system[i][0] >= &innerCHZ) && (system[i][0] <= &outerCHZ)){
float H = 1;
system[i][4] = &H;
}
else system[i][4] = 0; //[4] is habitable marker
}
return system[n][5];
}
////////////
////////////
float* Time( float *system , int n){
///////////////////////////
//Time advances the solar system.
//Plots the Orbits
//Uses MPI to spread it's calculations.
///////////////////////////
return system;
}
////////////
////////////
void FinalCheck( float system){
///////////////////////////
//Final Checks
//Reports if a Planet spent the whole Time in the Habitable Zone
///////////////////////////
/*for (int i = 0 ; i < row ; i++){
if (system[i][4] == 1.0) {
cout << "Planet " << i << " in this system is Habitable." ;
}
// The Habitable stat only means liquid water can exist on the surface
// Add pi if planet enters habitable zone, minus 1 if it leaves.
// If planet was habitable but left, assuming all life was destroyed
}
*/
}
////////////
int main(){
char Class;
int T;
float L, R, M;
tuple< char , float , float , float , int > star( Class , L , R , M , T );
star = Star();
int n = MyRandom()/10 + 1;
float * system[n][5] = {{0}};
float system1[n][5] = {{0}};
system[n][5] = Planet( L , R , M, T, n);
for (int i = 0 ; i < 100 ; i++) {
system1[n][5] = Time( *system, n );
system[n][5] = &system1[n][5];
}
FinalCheck( *system[n][5]);
///////////////////////////
//Report cleans everything up and gives the results
//Shows the plot, lists the Planets
//Reports the Positions and Masses of all Planets
//Reports which was the Jupiter and which if any were Habitable
//////////////////////////
return 0;
}
The problem is that when I run a compiler over this, line 227 gets flagged:
system1[n][5] = Time( *system, n );
With the following error:
error: cannot convert 'float**' to 'float*' for argument '1' to 'float* Time(float*, int)'
I get that this means that the compiler thinks I'm trying to equate a pointer-to-a-pointer with a pointer, but I'm not sure how it arrived at that conclusion or how to fix it. I'd appreciate help with this, especially the second part. I also would love to hear anything about passing arrays through subroutines as apparently I'm not doing it right, or at least not well.
Update 1 : - Got the short-term fix in and the compiler makes it through but gives a segmentation fault (core dumped) error when I try to run it. Looks like I have some reading and updates to do though with the namespace, the pointers, and possibly changing the arrays into vectors instead. Feels like if I concentrate on those first it might fix the segmentation error.
Your variable system is declared as
float * system[n][5] = {{0}};
which is a 2D array of float* elements, not a pointer to a 2D array (when passed to a function it decays to a pointer to its first row, float* (*)[5]).
Your Time function is declared as
float* Time( float *system , int n);
where the 1st argument needs to be a float*.
That means this call
system1[n][5] = Time( *system, n );
should actually be something like
system1[n][5] = Time( **system, n );
That being said, there are a number of issues in your code.
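To start off, don't do using namespace std;.
Also, this line float sys[n]; is not allowed when n is not a compile-time constant. Standard C++ has no variable-length arrays (some compilers accept them as an extension); use std::vector<float> instead.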
float* system[n][5]
system here is a 2D array of float*s, not floats.
So, in other words, system decays to float* (*)[5] (a pointer to a row of five float*), *system decays to float**, **system is a float*, and ***system is a float.
So, the compiler is correct. You're passing what decays to a float** to Time() which expects a float*.
You're going to have to reconfigure your code to pass the right thing, whatever that is.
Side note: please be advised that the way you're creating arrays isn't valid C++ and may cause issues later.
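One way to sidestep the pointer bookkeeping entirely is to store the planets in a std::vector of fixed-size rows and pass that by reference. The sketch below is only an illustration under that assumption: the names PlanetRow, PlanetTable, make_planets, advance and count_habitable are hypothetical, and the column layout simply mirrors the comments in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One row per planet: x, y, mass, "is the Jupiter" flag, "habitable" flag.
using PlanetRow = std::array<float, 5>;
using PlanetTable = std::vector<PlanetRow>;

// Build a table with n rows; it is returned by value, so no pointers to
// local variables escape the function.
PlanetTable make_planets(std::size_t n) {
    PlanetTable table(n, PlanetRow{});  // every field zero-initialised
    for (std::size_t i = 0; i < n; ++i) {
        table[i][0] = 0.5f * static_cast<float>(i + 1);  // placeholder distance
    }
    return table;
}

// Take the table by reference to modify it in place.
void advance(PlanetTable& table, float dx) {
    for (PlanetRow& row : table) {
        row[0] += dx;
    }
}

// Take the table by const reference for read-only access.
std::size_t count_habitable(const PlanetTable& table) {
    std::size_t count = 0;
    for (const PlanetRow& row : table) {
        if (row[4] == 1.0f) ++count;
    }
    return count;
}
```

Because the vector owns its storage and knows its size, there is no need to pass n separately, and the values stay valid after the function that filled them returns.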

C++ Memory Error

When I run my code, I repeatedly get the error
free(): invalid next size (fast)
Yet the code only goes so far as to create references. Specifically, commenting out a specific line seems to fix the error; however, it's a very important line.
void neuron::updateWeights(layer &prevLayer) {
for(unsigned i = 0; i < prevLayer.size(); i++) {
double oldDeltaWeight = prevLayer[i].m_connections[m_index].m_deltaWeight;
double newDeltaWeight = eta * prevLayer[i].m_output * m_gradient + alpha * oldDeltaWeight;
prevLayer[i].m_connections[m_index].m_deltaWeight = newDeltaWeight; // THIS LINE
prevLayer[i].m_connections[m_index].m_weight += newDeltaWeight;
}
}
Any help would be very appreciated!
EDIT:
Additional code
// Headers
#include "../../Include/neuralNet.h"
// Libraries
#include <vector>
#include <iostream>
#include <cmath>
// Namespace
using namespace std;
// Class constructor
neuron::neuron(unsigned index, unsigned outputs) {
m_index = index;
for(unsigned i = 0; i < outputs; i++) {
m_connections.push_back(connection());
}
// Set default neuron output
setOutput(1.0);
}
double neuron::eta = 0.15; // overall net learning rate, [0.0..1.0]
double neuron::alpha = 0.5; // momentum, multiplier of last deltaWeight, [0.0..1.0]
// Definition of transfer function method
double neuron::transferFunction(double x) const {
return tanh(x); // -1 -> 1
}
// Transfer function derivation method
double neuron::transferFunctionDerivative(double x) const {
return 1 - x*x; // Derivative of tanh
}
// Set output value
void neuron::setOutput(double value) {
m_output = value;
}
// Forward propagate
void neuron::recalculate(layer &previousLayer) {
double sum = 0.0;
for(unsigned i = 0; i < previousLayer.size(); i++) {
sum += previousLayer[i].m_output * previousLayer[i].m_connections[m_index].m_weight;
}
setOutput(transferFunction(sum));
}
// Change weights based on target
void neuron::updateWeights(layer &prevLayer) {
for(unsigned i = 0; i < prevLayer.size(); i++) {
double oldDeltaWeight = prevLayer[i].m_connections[m_index].m_deltaWeight;
double newDeltaWeight = eta * prevLayer[i].m_output * m_gradient + alpha * oldDeltaWeight;
prevLayer[i].m_connections[m_index].m_deltaWeight = newDeltaWeight;
prevLayer[i].m_connections[m_index].m_weight += newDeltaWeight;
}
}
// Complex math stuff
void neuron::calculateOutputGradients(double target) {
double delta = target - m_output;
m_gradient = delta * transferFunctionDerivative(m_output);
}
double neuron::sumDOW(const layer &nextLayer) {
double sum = 0.0;
for(unsigned i = 1; i < nextLayer.size(); i++) {
sum += m_connections[i].m_weight * nextLayer[i].m_gradient;
}
return sum;
}
void neuron::calculateHiddenGradients(const layer &nextLayer) {
double dow = sumDOW(nextLayer);
m_gradient = dow * neuron::transferFunctionDerivative(m_output);
}
Also the line is called here
// Update weights
for(unsigned layerIndex = m_layers.size() - 1; layerIndex > 0; layerIndex--) {
layer &currentLayer = m_layers[layerIndex];
layer &previousLayer = m_layers[layerIndex - 1];
for(unsigned i = 1; i < currentLayer.size(); i++) {
currentLayer[i].updateWeights(previousLayer);
}
}
Your constructor initializes m_connections with 'outputs' elements.
But you have a lot of places calling:
m_connections[m_index]
What happens if m_index >= outputs? Is this possible in your problem?
Try including an assert (http://www.cplusplus.com/reference/cassert/assert/) in the first line of the constructor:
assert(index < outputs);
You probably have an out-of-bounds access somewhere.
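Another way to locate such an access, sketched here under the assumption that the connections live in a std::vector, is to temporarily replace operator[] with at(), which throws instead of writing out of bounds. The Connection struct and the add_delta helper below are simplified stand-ins, not the types from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Simplified stand-in for the connection type in the question.
struct Connection {
    double weight = 0.0;
    double deltaWeight = 0.0;
};

// Bounds-checked update: at() throws std::out_of_range instead of silently
// writing past the end of the vector and corrupting the heap.
void add_delta(std::vector<Connection>& connections, std::size_t index,
               double delta) {
    Connection& c = connections.at(index);
    c.deltaWeight = delta;
    c.weight += delta;
}
```

If the exception fires, the index at that call site is the one that exceeds the vector's size, which is much easier to track down than a later free() failure.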

Trying to create a neural network in c++

I'm trying to implement a neural network in C++, but all I have to show for it are lots of unknown errors. I've already searched and found other posts such as (C++ class has no member named), but they have been no help to me. Can someone please help me figure out how to resolve all the errors I've been getting?
Here's the code
#include <iostream>
#include <vector>
#include <cstdlib>
#include <assert.h>
#include <math.h>
using namespace std;
struct Connection
{
double weight;
double deltaWeight;
};
class Neuron {};
typedef vector<Neuron> Layer;
// ************************* class Neuron *************************
class Neuron
{
public:
Neuron(unsigned numOutputs, unsigned myIndex);
void setOutputVal(double val)
{
m_outputVal = val;
};
double getOutputVal(void) const
{
return m_outputVal;
};
void feedForward(const Layer &prevLayer);
void calcOutputGradients(double targetVal);
void calcHiddenGradients(const Layer &nextLayer);
void updateInputWeights(Layer &prevLayer);
private:
static double eta; // [0.0..1.0] overall net training rate
static double alpha; // [0.0..n] multiplier of last weight change (momentum)
static double transferFunction(double x);
static double transferFunctionDerivative(double x);
static double randomWeight(void)
{
return rand() / double(RAND_MAX);
};
double sumDOW(const Layer &nextLayer) const;
double m_outputVal;
vector<Connection> m_outputWeights;
unsigned m_myIndex;
double m_gradient;
};
double Neuron::eta = 0.15; // overall net learning rate, [0.0..1.0]
double Neuron::alpha = 0.5; // momentum, multiplier of last deltaWeight [0.0..n]
void Neuron::updateInputWeights(Layer &prevLayer)
{
// The weight are updated in the Connection container
// in the neurons in the preceding layer
for (unsigned n = 0; n < prevLayer.size(); ++n)
{
Neuron &neuron = prevLayer[n];
double oldDeltaWeight = neuron.m_outputWeights[m_myIndex].deltaWeight;
double newDeltaWeight =
eta
* neuron_getOutputVal()
* m_gradient
+ alpha
* oldDeltaWeight;
neuron.m_outputWeights[m_myIndex].deltaWeight = newDeltaWeight;
neuron.m_outputWeights[m_myIndex].weight += newDeltaWeight;
}
}
double Neuron::sumDOW(const Layer &nextLayer) const
{
double sum = 0.0;
// Sum our contributions of the errors at the nodes we feed
for (unsigned n = 0; nextLayer.size() - 1; ++n)
{
sum += m_outputWeights[n].weight * nextLayer[n].m_gradient;
}
return sum;
}
void Neuron::calcHiddenGradients(const Layer &nextLayer)
{
double dow = sumDOW(nextLayer);
m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
}
void Neuron::calcOutputGradients(double targetVal)
{
double delta = targetVal - m_outputVal;
m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
}
double Neuron::transferFunction(double x)
{
// tanh - output range [-1.0..1.0]
return tanh(x);
}
double Neuron::transferFunctionDerivative(double x)
{
// tanh derivative
return 1.0 - x * x;
}
void Neuron::feedForward(const Layer &prevLayer)
{
double sum = 0.0;
// Sum the previous layer's outputs (which are our inputs)
// Include the bias node from the previous layer
for (unsigned n = 0; n < prevLayer.size(); ++n)
{
sum += prevLayer[n].getOutputVal() *
prevLayer[n].m_outputWeights[m_myIndex].weight;
}
m_outputVal = Neuron::transferFunction(sum);
}
Neuron::Neuron(unsigned numOutputs, unsigned myIndex)
{
for (unsigned c = 0; c < numOutputs; ++c)
{
m_outputWeights.push_back(Connection());
m_outputWeights.back().weight = randomWeight();
}
m_myIndex = myIndex;
}
// ************************* class Net *************************
class Net
{
public:
Net(const vector<unsigned> &topology);
void feedForward(const vector<double> &inputVals);
void backProp(const vector<double> &targetVals);
void getResults(vector<double> &resultVals) const;
private:
vector<Layer> m_layers; // m_layers[layerNum][neuronNum]
double m_error;
double m_recentAverageError;
double m_recentAverageSmoothingFactor;
};
void Net::getResults(vector<double> &resultVals) const
{
resultVals.clear();
for (unsigned n = 0; n < m_layers.back().size() - 1; ++n)
{
resultVals.push_back(m_layers.back()[n].getOutputVals());
}
}
void Net::backProp(const vector<double> &targetVals)
{
// Calculate overall net error (RMS of output errors)
Layer &outputLayer = m_layers.back();
m_error = 0.0;
for (unsigned n = 0; n < outputLayer.size() - 1; ++n)
{
double delta = targetVals[n] - outputLayer[n].getOutputVal();
m_error += delta * delta;
}
m_error /= outputLayer.size() - 1; // get average error squared
m_error = sqrt(m_error); // RMS
// Implement a recent average measurement:
m_recentAverageError =
(m_recentAverageError * m_recentAverageSmoothingFactor + m_error)
/ (m_recentAverageSmoothingFactor + 1.0);
// Calculate output layer gradients
for (unsigned n = 0; n < outputLayer.size() - 1; ++n)
{
outputLayer[n].calcOutputGradients(targetVals[n]);
}
// Calculate gradients on hidden layers
for (unsigned layerNum = m_layers.size() - 2; layerNum > 0; --layerNum)
{
Layer &hiddenLayer = m_layers[layerNum];
Layer &nextLayer = m_layers[layerNum + 1];
for (unsigned n = 0; n < hiddenLayer.size(); ++n)
{
hiddenLayer[n].calcHiddenGradients(nextLayer);
}
}
// For all layers from output to first hidden layer.
// update connection weights
for (unsigned layerNum = m_layers.size() - 1; layerNum > 0; --layerNum)
{
Layer &layer = m_layers[layerNum];
Layer &prevLayer = m_layers[layerNum - 1];
for (unsigned n = 0; n < layer.size() - 1; ++n)
{
layer[n].updateInputWeights(prevLayer);
}
}
}
void Net::feedForward(const vector<double> &inputVals)
{
assert(inputVals.size() == m_layers[0].size() - 1);
// Assign (latch) the values into the input neurons
for (unsigned i = 0; i < inputVals.size(); ++i)
{
m_layers[0][i].setOutputVal(inputVals[i]);
}
// Forward propagate
for (unsigned layerNum = 1; layerNum = m_layers.size(); ++layerNum)
{
Layer &prevLayer = m_layers[layerNum - 1];
for (unsigned n = 0; n < m_layers[layerNum].size() - 1; ++n)
{
m_layers[layerNum][n].feedForward(prevLayer);
}
}
}
Net::Net(const vector<unsigned> &topology)
{
unsigned numLayers = topology.size();
for (unsigned layerNum = 0; layerNum < numLayers; ++layerNum)
{
m_layers.push_back(Layer());
unsigned numOutputs = layerNum == topology.size() - 1 ? 0 : topology[layerNum + 1];
// We have made a new layer, now fill it with neurons, and
// add a bias neuron to the layer:
for (unsigned neuronNum = 0; neuronNum <= topology[layerNum]; ++neuronNum)
{
m_layers.back().push_back(Neuron(numOutputs, neuronNum));
cout << "Made a Neuron!" << endl;
}
}
}
int main()
{
// e.g.. { 3, 2, 1 }
// THIS IS FOR THE NUMBER OF NEURONS THAT YOU WANT!!
vector<unsigned> topology;
topology.push_back(3);
topology.push_back(2);
topology.push_back(1);
Net myNet(topology);
vector<double> inputVals;
myNet.feedForward(inputVals);
vector<double> targetVals;
myNet.backProp(targetVals);
vector<double> resultVals;
myNet.getResults(resultVals);
system("pause");
}
I've been getting errors such as:
Error: class "Neuron" has no member "feedForward"
Error: class "Neuron" has no member "setOutputVal"
'neuron_OutputVal': identifier not found
class Neuron {};
Here your file defined a class called Neuron. It's a class with no members, and no methods. A completely empty class.
A few lines later:
class Neuron
{
public:
// ...
Why, here's another class called Neuron. However, C++ allows a class to be defined only once per translation unit, so the compiler rejects this second definition as a redefinition. The members declared in it are then typically never seen, which is why the later errors claim that Neuron has no member feedForward or setOutputVal.
There are three issues in your code.
First, as others mentioned, fix your forward declaration:
class Neuron;
Note that this doesn't have {} as in your code. You don't need to move the typedef down, since your Neuron class uses the typedef 'Layer'.
Second, on line 70, write
neuron.getOutputVal()
instead of neuron_getOutputVal().
Third, on line 167, just drop the s from getOutputVals().
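To illustrate the forward-declaration point, here is a stripped-down sketch that keeps only the declaration order from the question. The sumOutputs member is hypothetical and exists only to show Layer being used inside Neuron.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Neuron;                        // forward declaration: no braces
typedef std::vector<Neuron> Layer;   // naming the type does not need the full class

class Neuron {
public:
    explicit Neuron(double outputVal) : m_outputVal(outputVal) {}
    double getOutputVal() const { return m_outputVal; }
    // Layer is only used by reference here, so the typedef above is enough.
    double sumOutputs(const Layer& layer) const;

private:
    double m_outputVal;
};

// By this point Neuron is complete, so the vector can be used freely.
double Neuron::sumOutputs(const Layer& layer) const {
    double sum = 0.0;
    for (std::size_t n = 0; n < layer.size(); ++n) {
        sum += layer[n].getOutputVal();
    }
    return sum;
}
```

The typedef can stay above the class because it only names the vector type; the vector's members are not needed until Neuron has been fully defined.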
class Neuron {}; is not a valid forward declaration. You can't use a forward declaration anyway, because declaration of Layer requires full knowledge of Neuron.
You'll have to remove the forward declaration class Neuron {}; entirely and move your declaration of typedef vector<Neuron> Layer; down. Put it right above your declaration of class Net.

Weighted Variance and Weighted Standard Deviation in C++

Hi I'm trying to calculate the weighted variance and weighted standard deviation of a series of ints or floats. I found these links:
http://math.tutorvista.com/statistics/standard-deviation.html#weighted-standard-deviation
http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf (warning pdf)
Here are my template functions so far. Variance and standard deviation work fine but for the life of me I can't get the weighted versions to match the test case at the bottom of the pdf:
template <class T>
inline float Mean( T samples[], int count )
{
float mean = 0.0f;
if( count >= 1 )
{
for( int i = 0; i < count; i++ )
mean += samples[i];
mean /= (float) count;
}
return mean;
}
template <class T>
inline float Variance( T samples[], int count )
{
float variance = 0.0f;
if( count > 1 )
{
float mean = 0.0f;
for( int i = 0; i < count; i++ )
mean += samples[i];
mean /= (float) count;
for( int i = 0; i < count; i++ )
{
float sum = (float) samples[i] - mean;
variance += sum*sum;
}
variance /= (float) count - 1.0f;
}
return variance;
}
template <class T>
inline float StdDev( T samples[], int count )
{
return sqrtf( Variance( samples, count ) );
}
template <class T>
inline float VarianceWeighted( T samples[], T weights[], int count )
{
float varianceWeighted = 0.0f;
if( count > 1 )
{
float sumWeights = 0.0f, meanWeighted = 0.0f;
int numNonzero = 0;
for( int i = 0; i < count; i++ )
{
meanWeighted += samples[i]*weights[i];
sumWeights += weights[i];
if( ((float) weights[i]) != 0.0f ) numNonzero++;
}
if( sumWeights != 0.0f && numNonzero > 1 )
{
meanWeighted /= sumWeights;
for( int i = 0; i < count; i++ )
{
float sum = samples[i] - meanWeighted;
varianceWeighted += weights[i]*sum*sum;
}
varianceWeighted *= ((float) numNonzero)/((float) count*(numNonzero - 1.0f)*sumWeights); // this should be right but isn't?!
}
}
return varianceWeighted;
}
template <class T>
inline float StdDevWeighted( T samples[], T weights[], int count )
{
return sqrtf( VarianceWeighted( samples, weights, count ) );
}
Test case:
int samples[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
printf( "%.2f\n", StdDev( samples, 9 ) );
int weights[] = { 1, 1, 0, 0, 4, 1, 2, 1, 0 };
printf( "%.2f\n", StdDevWeighted( samples, weights, 9 ) );
Result:
7.46
1.94
Should be:
7.46
5.82
I think the problem is that weighted variance has a few different interpretations and I don't know which one is standard. I found this variation:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Weighted_incremental_algorithm
template <class T>
inline float VarianceWeighted( T samples[], T weights[], int count )
{
float varianceWeighted = 0.0f;
if( count > 1 )
{
float sumWeights = 0.0f, meanWeighted = 0.0f, m2 = 0.0f;
for( int i = 0; i < count; i++ )
{
float temp = weights[i] + sumWeights,
delta = samples[i] - meanWeighted,
r = delta*weights[i]/temp;
meanWeighted += r;
m2 += sumWeights*delta*r; // Alternatively, m2 += weights[i] * delta * (samples[i] - meanWeighted)
sumWeights = temp;
}
varianceWeighted = (m2/sumWeights)*((float) count/(count - 1));
}
return varianceWeighted;
}
Result:
7.46
5.64
I also tried looking at boost and esutil but they didn't help much:
http://www.boost.org/doc/libs/1_48_0/boost/accumulators/statistics/weighted_variance.hpp
http://esutil.googlecode.com/svn-history/r269/trunk/esutil/stat/util.py
I don't need an entire statistics library, and more importantly, I want to understand the implementation.
Can someone please post functions to calculate these correctly?
Bonus points if your functions can do it in a single pass.
Also, does anyone know if weighted variance gives the same result as ordinary variance with repeated values? For example, would the variance of samples[] = { 1, 2, 3, 3 } be the same as weighted variance of samples[] = { 1, 2, 3 }, weights[] = { 1, 1, 2 }?
Update: here is a google docs spreadsheet I have set up to explore the problem. Unfortunately my answers are nowhere close to the NIST pdf. I think the problem is in the unbias step, but I can't see how to fix it.
https://docs.google.com/spreadsheet/ccc?key=0ApzPh5nRin0ldGNNYjhCUTlWTks2TGJrZW4wQUcyZnc&usp=sharing
The result is a weighted variance of 3.77, which is the square of the weighted standard deviation of 1.94 I got in my c++ code.
I am attempting to install octave on my Mac OS X setup so that I can run their var() function with weights, but it is taking forever to install it with brew. I am deeply into yak shaving now.
float mean(uint16_t* x, uint16_t n) {
uint16_t sum_xi = 0;
int i;
for (i = 0; i < n; i++) {
sum_xi += x[i];
}
return (float) sum_xi / n;
}
/**
* http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weigmean.pdf
*/
float weighted_mean(uint16_t* x, uint16_t* w, uint16_t n) {
int sum_wixi = 0;
int sum_wi = 0;
int i;
for (i = 0; i < n; i++) {
sum_wixi += w[i] * x[i];
sum_wi += w[i];
}
return (float) sum_wixi / (float) sum_wi;
}
float variance(uint16_t* x, uint16_t n) {
float mean_x = mean(x, n);
float dist, dist2;
float sum_dist2 = 0;
int i;
for (i = 0; i < n; i++) {
dist = x[i] - mean_x;
dist2 = dist * dist;
sum_dist2 += dist2;
}
return sum_dist2 / (n - 1);
}
/**
* http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weighvar.pdf
*/
float weighted_variance(uint16_t* x, uint16_t* w, uint16_t n) {
float xw = weighted_mean(x, w, n);
float dist, dist2;
float sum_wi_times_dist2 = 0;
int sum_wi = 0;
int n_prime = 0;
int i;
for (i = 0; i < n; i++) {
dist = x[i] - xw;
dist2 = dist * dist;
sum_wi_times_dist2 += w[i] * dist2;
sum_wi += w[i];
if (w[i] > 0)
n_prime++;
}
if (n_prime > 1) {
return sum_wi_times_dist2 / ((float) ((n_prime - 1) * sum_wi) / n_prime);
} else {
return 0.0f;
}
}
/**
* http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Weighted_incremental_algorithm
*/
float weighted_incremental_variance(uint16_t* x, uint16_t* w, uint16_t n) {
uint16_t sumweight = 0;
float mean = 0;
float M2 = 0;
int n_prime = 0;
uint16_t temp;
float delta;
float R;
int i;
for (i = 0; i < n; i++) {
if (w[i] == 0)
continue;
temp = w[i] + sumweight;
delta = x[i] - mean;
R = delta * w[i] / temp;
mean += R;
M2 += sumweight * delta * R;
sumweight = temp;
n_prime++;
}
if (n_prime > 1) {
float variance_n = M2 / sumweight;
return variance_n * n_prime / (n_prime - 1);
} else {
return 0.0f;
}
}
int main(void) {
uint16_t n = 9;
uint16_t x[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
uint16_t w[] = { 1, 1, 0, 0, 4, 1, 2, 1, 0 };
printf("%f\n", weighted_variance(x, w, n)); /* outputs: 33.900002 */
printf("%f\n", weighted_incremental_variance(x, w, n)); /* outputs: 33.900005 */
}
Solution
You accidentally added an extra term "count" in the denominator of the variance update term.
When using the correction below I get your expected answer of
5.82
FYI, one way to pick up on things like this when you are doing a code review is to do a "dimensional analysis". The "units" of the equation were wrong. You were effectively dividing by an order N squared term when it should be an order N term.
Before
template <class T>
inline float VarianceWeighted( T samples[], T weights[], int count )
{
...
varianceWeighted *= ((float) numNonzero)/((float) count*(numNonzero - 1.0f)*sumWeights); // this should be right but isn't?!
...
}
After
With "count" removed, the line becomes
template <class T>
inline float VarianceWeighted( T samples[], T weights[], int count )
{
...
varianceWeighted *= ((float) numNonzero)/((float) (numNonzero - 1.0f)*sumWeights); // removed count term
...
}
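For reference, the corrected computation can be written as a standalone function. This is a sketch of the same two-pass formula (weighted sum of squared deviations, scaled by N' / ((N' - 1) * sum of weights), where N' is the number of nonzero weights); the function names and the use of std::vector<double> are choices made here, not taken from the question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted sample variance:
//   sum(w_i * (x_i - mean_w)^2) * N' / ((N' - 1) * sum(w_i))
// where mean_w is the weighted mean and N' counts the nonzero weights.
// Returns 0 when fewer than two weights are nonzero.
double weighted_variance(const std::vector<double>& x,
                         const std::vector<double>& w) {
    assert(x.size() == w.size());
    double sumW = 0.0, sumWX = 0.0;
    std::size_t nonzero = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumW += w[i];
        sumWX += w[i] * x[i];
        if (w[i] != 0.0) ++nonzero;
    }
    if (nonzero < 2 || sumW == 0.0) return 0.0;

    const double mean = sumWX / sumW;
    double sumWD2 = 0.0;  // weighted sum of squared deviations
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - mean;
        sumWD2 += w[i] * d * d;
    }
    const double n = static_cast<double>(nonzero);
    return sumWD2 * n / ((n - 1.0) * sumW);
}

double weighted_stddev(const std::vector<double>& x,
                       const std::vector<double>& w) {
    return std::sqrt(weighted_variance(x, w));
}
```

For the test data in the question the weighted mean is 11.5 and the weighted sum of squared deviations is 282.5, so the variance is 282.5 * 6 / (5 * 10) = 33.9 and the standard deviation is about 5.82. Note that with this formula, weights {1, 1, 2} on samples {1, 2, 3} give 1.03125, whereas the ordinary sample variance of {1, 2, 3, 3} is about 0.917, because N' counts distinct weighted samples rather than repetitions.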
Here's a much shorter version, with a working demo:
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/weighted_variance.hpp>
#include <boost/accumulators/statistics/variance.hpp>
namespace ba = boost::accumulators;
int main() {
std::vector<double> numbers{2, 3, 5, 7, 11, 13, 17, 19, 23};
std::vector<double> weights{1, 1, 0, 0, 4, 1, 2, 1, 0 };
ba::accumulator_set<double, ba::stats<ba::tag::variance > > acc;
ba::accumulator_set<double, ba::stats<ba::tag::weighted_variance > , double > acc_weighted;
double n = numbers.size();
double N = n;
for(size_t i = 0 ; i<numbers.size() ; i++ ) {
acc ( numbers[i] );
acc_weighted( numbers[i] , ba::weight = weights[i] );
if(weights[i] == 0) {
n=n-1;
}
}
std::cout << "Sample Standard Deviation, s: " << std::sqrt(ba::variance(acc) *N/(N-1)) << std::endl;
std::cout << "Weighted Sample Standard Deviation, s: " << std::sqrt(ba::weighted_variance(acc_weighted)*n/(n-1)) << std::endl;
}
Note that n must reflect the number of samples with nonzero weights, hence the extra n=n-1; line.
Sample Standard Deviation, s: 7.45729
Weighted Sample Standard Deviation, s: 5.82237

'std::vector<double>::iterator' has no member named 'begin'

So I am trying to perform recursion (a very simple implementation of split-radix recursive butterflies) on a large C++ STL vector, and I am using iterators to make the recursive calls, but it isn't working: I keep getting errors.
#include <iostream>
#include <cmath>
#include <vector>
#include <string>
#include <algorithm>
using namespace std;
template <typename T>
class fft_data{
public:
vector<T> re;
vector<T> im;
};
void inline split_radix_rec(vector<double>::iterator r,vector<double>::iterator i, int sgn,int N) {
if (N == 1) {
return;
} else if (N == 2) {
for (int k = 0; k < N/2; k++) {
int index = 2*k;
int index1 = index+1;
double taur = *(r+index1);
double taui = *(i+index1);
*(r+index1) = *(r+index) - taur;
*(i+index1) = *(i+index) - taui;
*(r+index) = *(r+index) + taur;
*(i+index) = *(i+index) + taui;
}
N=N/2;
} else {
int m = N/2;
int p = N/4;
double PI2 = 6.28318530717958647692528676655900577;
double theta = -1.0 * sgn * PI2/N;
double S = sin(theta);
double C = cos(theta);
double PI6 = 3.0*6.28318530717958647692528676655900577;
double theta3 = -1.0 * sgn * PI6/N;
double S3 = sin(theta3);
double C3 = cos(theta3);
double wlr = 1.0;
double wli = 0.0;
//T wl2r = (T) 1.0;
//T wl2i = (T) 0.0;
double wl3r = 1.0;
double wl3i = 0.0;
double tau1r,tau1i,tau2r,tau2i;
double ur,ui,vr,vi;
for (int j = 0; j < p; j++) {
int index1 = j+m;
int index2 = index1+p;
int index3 = j+p;
tau1r = *(r+index1);
tau1i = *(i+index1);
tau2r = *(r+index2);
tau2i = *(i+index2);
ur = tau1r + tau2r;
ui = tau1i + tau2i;
vr = sgn* (tau2r - tau1r);
vi = sgn* (tau2i - tau1i);
*(r+index2) = *(r+index3) - vi;
*(i+index2) = *(i+index3) + vr;
*(r+index1) = *(r+j) - ur;
*(i+index1) = *(i+j) - ui;
*(r+index3) = *(r+index3) + vi;
*(i+index3) = *(i+index3) - vr;
*(r+j) = *(r+j) + ur;
*(i+j) = *(i+j) + ui;
}
split_radix_rec(r.begin(),i.begin(),sgn,m);
split_radix_rec(r.begin()+m,i.begin()+m,sgn,p);
split_radix_rec(r.begin()+m+p,i.begin()+m+p,sgn,p);
}
}
int main() {
vector<double> u,v;
for (int i = 0; i < 256; i++) {
u.push_back(i);
v.push_back(i);
}
int sgn = 1;
int N = 256;
split_radix_rec(u.begin(),v.begin(),sgn,N);
return 0;
}
Here are the errors I am getting
main.cpp:93:21: error: 'std::vector<double>::iterator' has no member named 'begin'
6 Identical errors on lines 93,94,95 (the three split_radix_rec() functions called from within the split_radix_rec function). This is part of a much larger code so I want it to work for STL vectors. What am I doing wrong?
As the error states, you are calling begin() on a std::vector<double>::iterator.
You should call that on a std::vector<double>, so that it could return you a std::vector<double>::iterator.
r and i are themselves iterators (they already point at the beginning of the range) in your code.
Try:
split_radix_rec(r,i,sgn,m);
split_radix_rec(r+m,i+m,sgn,p);
split_radix_rec(r+m+p,i+m+p,sgn,p);
There is way too much code to give you a concise answer, but the error clearly states that you are calling begin() on a vector iterator instead of a vector. And that happens at the split_radix_rec recursive call. You may have intended this instead:
split_radix_rec(r,i,sgn,m);
split_radix_rec(r+m,i+m,sgn,p);
split_radix_rec(r+m+p,i+m+p,sgn,p);
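The pattern in isolation, as a toy sketch: a recursive function receives an iterator plus a length and recurses on sub-ranges using iterator arithmetic only. The function negate_rec is a hypothetical stand-in for the butterfly recursion and just flips the sign of every element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recursively negate every element in the range that starts at `first` and
// holds `n` elements, splitting the range in half at each step.
void negate_rec(std::vector<double>::iterator first, std::ptrdiff_t n) {
    if (n <= 0) return;
    if (n == 1) {
        *first = -*first;
        return;
    }
    const std::ptrdiff_t half = n / 2;
    negate_rec(first, half);             // left half: same iterator
    negate_rec(first + half, n - half);  // right half: iterator arithmetic
}
```

Only the outermost caller needs the vector itself, for example negate_rec(v.begin(), static_cast<std::ptrdiff_t>(v.size())); every recursive call works purely with the iterators it was given.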