Is it possible to use boost accumulators with vectors? - c++

I wanted to use boost accumulators to calculate statistics of a variable that is a vector. Is there a simple way to do this. I think it's not possible to use the dumbest thing:
using namespace boost::accumulators;
//stuff...
accumulator_set<vector<double>, stats<tag::mean> > acc;
vector<double> some_vetor;
//stuff
some_vector = doStuff();
acc(some_vector);
maybe this is obvious, but I tried anyway. :P
What I wanted was to have an accumulator that would calculate a vector which is the mean of the components of many vectors. Is there an easy way out?
EDIT:
I don't know if I was thoroughly clear. I don't want this:
for_each(vec.begin(), vec.end(),acc);
This would calculate the mean of the entries of a given vector. What I need is different. I have a function that will spit vectors:
vector<double> doSomething();
// this is a monte carlo simulation;
And I need to run this many times and calculate the vectorial mean of those vectors:
for(int i = 0; i < numberOfMCSteps; i++){
vec = doSomething();
acc(vec);
}
cout << mean(acc);
And I want mean(acc) to be a vector itself, whose entry [i] would be the means of the entries [i] of the accumulated vectors.
Theres a hint about this in the docs of Boost, but nothing explicit. And I'm a bit dumb. :P

I've looked into your question a bit, and it seems to me that Boost.Accumulators already provides support for std::vector. Here is what I could find in a section of the user's guide :
Another example where the Numeric
Operators Sub-Library is useful is
when a type does not define the
operator overloads required to use it
for some statistical calculations.
For instance, std::vector<> does not overload any arithmetic operators, yet
it may be useful to use std::vector<>
as a sample or variate type. The
Numeric Operators Sub-Library defines
the necessary operator overloads in
the boost::numeric::operators
namespace, which is brought into scope
by the Accumulators Framework with a
using directive.
Indeed, after verification, the file boost/accumulators/numeric/functional/vector.hpp does contain the necessary operators for the 'naive' solution to work.
I believe you should try :
Including either
boost/accumulators/numeric/functional/vector.hpp before any other accumulators header
boost/accumulators/numeric/functional.hpp while defining BOOST_NUMERIC_FUNCTIONAL_STD_VECTOR_SUPPORT
Bringing the operators into scope with a using namespace boost::numeric::operators;.
There's only one last detail left : execution will break at runtime because the initial accumulated value is default-constructed, and an assertion will occur when trying to add a vector of size n to an empty vector. For this, it seems you should initialize the accumulator with (where n is the number of elements in your vector) :
accumulator_set<std::vector<double>, stats<tag::mean> > acc(std::vector<double>(n));
I tried the following code, mean gives me a std::vector of size 2 :
int main()
{
accumulator_set<std::vector<double>, stats<tag::mean> > acc(std::vector<double>(2));
const std::vector<double> v1 = boost::assign::list_of(1.)(2.);
const std::vector<double> v2 = boost::assign::list_of(2.)(3.);
const std::vector<double> v3 = boost::assign::list_of(3.)(4.);
acc(v1);
acc(v2);
acc(v3);
const std::vector<double> &meanVector = mean(acc);
}
I believe this is what you wanted ?

I don't have it set up to try right now, but if all boost::accumulators need is properly defined mathematical operators, then you might be able to get away with a different vector type: http://www.boost.org/doc/libs/1_37_0/libs/numeric/ublas/doc/vector.htm

And what about the documentation?
// The data for which we wish to calculate statistical properties:
std::vector< double > data( /* stuff */ );
// The accumulator set which will calculate the properties for us:
accumulator_set< double, features< tag::min, tag::mean > > acc;
// Use std::for_each to accumulate the statistical properties:
acc = std::for_each( data.begin(), data.end(), acc );

Related

Graph with std::vectors?

I thought that a cool way of using vectors could be to have one vector class template hold an two separate int variables for x/y-coordinates to graph.
example:
std::vector<int, int> *name*;
// First int. being the x-intercept on a graph
// Second int. being the y-intercept on a graph
(I also understand that I could just make every even/odd location or two separate vectors to classify each x/y-coordinate, but for me I would just like to see if this could work)
However, after making this vector type, I came across an issue with assigning which int within the vector will be written to or extracted from. Could anyone tell me how to best select and std::cout both x/y ints appropriately?
P.S. - My main goal, in using vectors this way, is to make a very basic graph output to Visual Studio terminal. While being able to change individual x/y-intercepts by 'selecting' and changing if needed. These coordinates will be outputted to the terminal via for/while loops.
Also, would anyone like to list out different ways to best make x/y-coordinates with different containers?
Your question rather broad, in other words it is asking for a bit too much. I will just try to give you some pointers from which you can work your way to what you like.
A) equidistant x
If your x values are equidistant, ie 0, 0.5, 1, 1.5 then there is no need to store them, simply use a
std::vector<int> y;
if the number of variables is not known at compile time, otherwise a
std::array<int,N> y;
B) arbitrary x
There are several options that depend on what you actually want to do. For simply storing (x,y)-pairs and printing them on the screen, they all work equally well.
map
std::map<int,int> map_x_to_y = { { 1,1}, {2,4}, {3,9}};
// print on screen
for (const auto& xy : map_x_to_y) {
std::cout << xy.first << ":" xy.second;
}
a vector of pairs
std::vector<std::pair<int,int>> vector_x_and_y = { { 1,1}, {2,4}, {3,9}};
Printing on screen is actually the same as with map. The advantage of the map is that it has its elements ordered, while this is not the case for the vector.
C) not using any container
For leightweight calculations you can consider to not store the (xy) pairs at all, but simply use a function:
int fun(int x) { return x*x; }
TL;DR / more focussed
A vector stores one type. You cannot have a std::vector<int,int>. If you look at the documentation of std::vector you will find that the second template parameter is an allocator (something you probably dont have to care about for some time). If you want to store two values as one element in a vector you either have to use std::vector<std::pair<double,double>> or a different container.
PS
I used std::pair in the examples above. However, I do consider it as good practice to name things whenever I can and leave std::pair for cases when I simply cannot give names better than first and second. In this spirit you can replace std::pair in the above examples with a
struct data_point {
int x;
int y;
};

Random normal distribution by Gaussian in C++

I have my function in Python for normal distribution. I need to convert it to C++ and i am not familiar with language.
Here is my Python:
def calculation(value):
sigma = 0.5
size = 10000
x = 200
x_distribution = np.random.normal(value, sigma, size)
for i in x_distribution:
x.append(i)
return x
And it works as expected. I am trying to re-write same thing in C++ and found only the Link and where the "std::normal_distribution<> d{5,2};
" has to make magic. But i could not figure it out how to implement.
Here what i have tried and it is failing.
# include frame.distribution
Frame DistributionModel(x_mu, x_sigma)
{
// Motion model;ignore it
model = std::normal_distribution<> d{x_mu,x_sigma};
return model;
}
Please, help me. Looking for any hints. Thanks.
Well, trouble without end...
# include frame.distribution
Syntax for inclusion is:
#include <name_of_header_file>
// or:
#include "name_of_header_file"
(The space in between # and include does not harm, but is absolutely uncommon...)
Frame DistributionModel(x_mu, x_sigma)
C++ is a strongly typed language, i. e. you cannot just give variables a name as in Python, but you need to give them a type!
Frame DistributionModel(double x_mu, double x_sigma)
Same for local variables; type must match what you actually assign to (unless using auto)
std::normal_distribution<double> nd(x_mu, x_sigma);
This is a bit special about C++: You define a local variable, e. g.
std::vector<int> v;
In case of a class, it gets already constructed using its default constructor. If you want to call a constructor with arguments, you just append the call to the variable name:
std::vector<int> v(10); // vector with 10 elements.
What you saw in the sample is a feature called "uniform initialisation", using braces instead of parentheses. I personally strongly oppose against its usage, though, so you won't ever see it in code I have written (see me constructing the std::normal_distribution above...).
std::normal_distribution is defined in header random, so you need to include it (before your function definition):
#include <random>
About the return value: You only can return Frame, if the data type is defined somewhere. Now before trying to define a new class, we just can use an existing one: std::vector (it's a template class, though). A vector is quite similar to a python list, it is a container class storing a number of objects in contiguous memory; other than python lists, though, the type of all elements stored must be the same. We can use such a vector to collect the results:
std::vector<double> result;
Such a vector can grow dynamically, however, this can result in necessity to re-allocate the internal storage memory. Costly. If you know the number of elements in advance, you can tell the vector to allocate sufficient memory in advance, too:
result.reserve(max);
The vector is what we are going to return, so we need to adjust the function signature (I allowed to give it a different name and added another parameter):
std::vector<double> getDistribution(double x_mu, double x_sigma, size_t numberOfValues)
It would be possible to let the compiler deduce the return type, using auto keyword for. While auto brings quite a lot of benefits, I do not recommend it for given purpose: With explicit return type, users of the function see right from the signature what kind of result to expect and do not have to look into the function body to know about.
std::normal_distribution now is a number generator; it does not deliver the entire sequence at once as the python equivalent does, you need to draw the values one by another explicitly:
while(numberOfValues-- > 0)
{
auto value = nd(gen);
result.push_back(value);
}
nd(gen): std::normal_distribution provides a function call operator operator(), so objects of can be called just like functions (such objects are called "functors" in C++ terminology). The function call, however, requires a random number generator as argument, so we need to provide it as in the example you saw. Putting all together:
#include <random>
#include <vector>
std::vector<double> getDistribution
(
double x_mu, double x_sigma, size_t numberOfValues
)
{
// shortened compared to your example:
std::mt19937 gen((std::random_device())());
// create temporary (anonymous) ^^
// instance and call it immediately ^^
// afterwards
std::normal_distribution<double> nd(x_mu, x_sigma);
std::vector<double> result;
result.reserve(numberOfValues);
while(numberOfValues-- > 0)
{
// shorter than above: using result of previous
// function (functor!) call directly as argument to next one
result.push_back(nd(gen));
}
// finally something familiar from python:
return result;
}
#include<iostream>
#include<random>
#include<chrono>
int main() {
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::normal_distribution<double> distribution(0.0, 3.0);
double number = abs(distribution(generator));
std::cout << number;
std::cin.get();
return 0;
}
This may help, create a random number using gaussian with mean=0.0 and std_dev= 3.0

How to calculate the standard deviation with iterators and lambda functions

After learning that one can calculate the mean of data, which is stored in a std::vector< std::vector<double> > data, can be done the following way:
void calculate_mean(std::vector<std::vector<double>>::iterator dataBegin,
std::vector<std::vector<double>>::iterator dataEnd,
std::vector<double>& rowmeans) {
auto Mean = [](std::vector<double> const& vec) {
return std::accumulate(vec.begin(), vec.end(), 0.0) / vec.size(); };
std::transform(dataBegin, dataEnd, rowmeans.begin(), Mean);
}
I made a function which takes the begin and the end of the iterator of the data vector to calculate the mean and std::vector<double> is where I store the result.
My first question is, how to handle the return value of function, when working with vectors. I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?
Second my main questions is, how to adapt this function so one can calculate the standard deviation of each row in a similar way. I tried really hard but it only gives a huge mess, where nothing is working properly. So if someone sees it right away how to do that, I would be glad, for a insight. Thank you.
Edit: Solution
So here is my solution for the problem. Given a std::vector< vector<double> > data (rows, std::vector<double>(columns)), where the data is stored in the rows. The following function calculates the sample standard deviation of each row simultaneously.
auto begin = data.begin();
auto end = data.end();
std::vector<double> std;
std.resize(data.size());
void calculate_std(std::vector<std::vector<double>>::iterator dataBegin,
std::vector<std::vector<double>>::iterator dataEnd,
std::vector<double>& rowstds){
auto test = [](std::vector<double> const& vec) {
double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
double mean = sum / vec.size();
double stdSum = 0.0;
auto Std = [&](const double x) { stdSum += (x - mean) * (x - mean); };
std::for_each(vec.begin(), vec.end(), Std);
return sqrt(stdSum / (vec.size() - 1));
};
std::transform(dataBegin, dataEnd, rowstds.begin(), test);
}
I tested it and it works just fine. So if anyone has some suggestions for improvement, please let me know. And is this piece of code good performance wise?
You will find relatively often the convention to write functions with input parameters first, followed by input / output parameters.
Output parameters (that you write to with the return values of your function) are often a pointer to the data, or a reference.
So your solution seems perfect, from that point of view.
Source:
Google's C++ coding conventions
I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?
No, you should use a local vector<double> variable and return by value. Any compiler worth using would optimize away the copying/moving, and any conforming C++11 compiler is required to perform a move if for whatever reason it cannot elide the copy/move altogether.
Your code as written imposes additional requirements on the caller that are not obvious. For instance, rowmeans must contain enough elements to store the means, or undefined behavior results.

Swap columns in c++

I have an std matrix defined as:
std::vector<std::vector<double> > Qe(6,std::vector<double>(6));
and a vector v that is:
v{0, 1, 3, 2, 4, 5};
I would like to swap the columns 3 and 2 of matrix Qe like indicated in vector v.
In Matlab this is as easy as writing Qe=Qe(:,v);
I wonder if there is an easy way other than a for loop to do this in c++.
Thanks in advance.
Given that you've implemented this as a vector of vectors, you can use a simple swap:
std::swap(Qe[2], Qe[3]);
This should have constant complexity. Of course, this will depend on whether you're treating your data as column-major or row-major. If you're going to be swapping columns often, however, you'll want to arrange the data to suit that (i.e., to allow the code above to work).
As far as doing the job without a for loop when you're using row-major ordering (the usual for C++), you can technically eliminate the for loop (at least from your source code) by using a standard algorithm instead:
std::for_each(Qe.begin(), Qe.end(), [](std::vector<double> &v) {std::swap(v[2], v[3]); });
This doesn't really change what's actually happening though--it just hides the for loop itself inside a standard algorithm. In this case, I'd probably prefer a range-based for loop:
for (auto &v : Qe)
std::swap(v[2], v[3]);
...but I've never been particularly fond of std::for_each, and when C++11 added range-based for loops, I think that was a superior alternative to the vast majority of cases where std::for_each might previously have been a reasonable possibility (IOW, I've never seen much use for std::for_each, and see almost none now).
Depends on how you implement your matrix.
If you have a vector of columns, you can swap the column references. O(1)
If you have a vector of rows, you need to swap the elements inside each row using a for loop. O(n)
std::vector<std::vector<double>> can be used as a matrix but you also need to define for yourself whether it is a vector of columns or vector of rows.
You can create a function for this so you don't write a for loop each time. For example, you can write a function which receives a matrix which is a vector of columns and a reordering vector (like v) and based on the reordering vector you create a new matrix.
//untested code and inefficient, just an example:
vector<vector<double>> ReorderColumns(vector<vector<double>> A, vector<int> order)
{
vector<vector<double>> B;
for (int i=0; i<order.size(); i++)
{
B[i] = A[order[i]];
}
return B;
}
Edit: If you want to do linear algebra there are libraries that can help you, you don't need to write everything yourself. There are math libraries for other purposes too.
If you are in a row scenario. The following would probably work:
// To be tested
std::vector<std::vector<double> >::iterator it;
for (it = Qe.begin(); it != Qe.end(); ++it)
{
std::swap((it->second)[2], (it->second)[3]);
}
In this scenario I don't see any other solution that would avoid doing a loop O(n).

Sorting a vector alongside another vector in C++

I am writing a function in C++ which will take in 2 vectors of doubles called xvalues and yvalues. My aim is to create an interpolation with these inputs. However, it would be really convenient if the (x,y) pairs were sorted so that the x-values were in increasing order and the y-values still corresponded to the correct x-value.
Does anyone know how I can do this efficiently?
I would probably create a vector of pairs and sort that by whatever means necessary.
It sounds like your data abstraction (2 separate collections for values that are actually "linked" is wrong).
As an alternative, you could write some kind of iterator adaptor that internally holds two iterators and increases/decreases/assigns them simultaneously. They dereference to a special type that on swap, swaps the two values in both vectors, but on compare only compare one. This might be some work (extra swap,op<, class ), but when done as a template, and you need this more often, could pay out.
Or you use a vector of pairs, which you then can sort easily with the stl sort algorithm, or you write your own sort method. Therefore you've several options.
Within your own sorting algorithm you can then take care of not only sorting your x-vector but also the y-vector respectively.
Here as an example using bubble sort for your two vectors (vec1 and vec2).
bool bDone = false;
while (!done) {
done = true;
for(unsigned int i=0; i<=vec1.size()-1; ++i) {
if ( vec1.at(i) > vec1.at(i+1) ) {
double tmp = vec1.at(i);
vec1.at(i) = vec1.at(i+1);
vec1.at(i+1) = tmp;
tmp = vec2.at(i);
vec2.at(i) = vec2.at(i+1);
vec2.at(i+1) = tmp;
done = false;
}
}
}
But again, as others pointed out here, you should defenitely use std::vector< std::pair<double, double> > and the just sort it.
The idea is easy: implement a sort algorithm (e.g. quicksort is easy, short an OK for most use cases - there are a lot implementations available: http://www.java-samples.com/showtutorial.php?tutorialid=445 ).
Do the compare on your x-vector and
do the swap on both vectors.
The sort method has to take both vectors a input, but that should be a minor issue.