I want to create an MxN array (M particles in N-dimensional space) filled with random numbers within an upper and lower boundary. I have working Python code that looks something like this:
# upper_bound/lower_bound are arrays of shape (dim,)
positions = np.random.rand(num_particle,dim)*(upper_bound-lower_bound)+lower_bound
Each row represents a particle, and each column represents a dimension in the problem space. So the upper_bound and lower_bound applies to each column. Now I want to translate the above code to c++, and I have something like this:
#include <iostream>
#include <vector>
#include <random>
#include <algorithm>
#include <ctime>
typedef std::vector<double> vect1d;
std::vector<vect1d> positions;
for (int i=0; i<num_particle; i++){
std::mt19937_64 generator(static_cast<std::mt19937::result_type>(time(0)));
std::uniform_real_distribution<double> distribution(0,1);
vect1d pos(dimension);
std::generate(pos.begin(),pos.end(),distribution(generator));
positions[i] = pos;
}
My problems:
It gives an error regarding the generator, so I'm not sure if I set it up properly. I'm also not sure how to use std::generate. I'm trying it because I've looked at other similar posts and it seems to let me generate more than one random number at a time, so I don't have to run it MxN times, once for each element. Is this true, and how do I use it correctly?
In Python I can just use vectorization and broadcasting to manipulate the NumPy array. What's the most 'vectorized' way to do it in C++?
The above (incorrect) code only creates random number between 0 and 1, but how to incorporate the lower_bound and upper_bound as in the python version? I understand that I can change the values inside distribution(0,1), but the problem is the limits can be different for each dimension (so each column can have different valid range), so what's the most efficient way to generate random number, taking into account the range for each dimension?
Thanks
First of all, you're doing more work than you need to in your Python version. Just use:
np.random.uniform(lower_bound, upper_bound, size=(num_particle, dim))
In your C++ attempt, the line
std::generate(pos.begin(),pos.end(),distribution(generator));
is incorrect, because the third argument must be a callable (a function or function object), not a value. A reasonable C++ equivalent would be:
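#include <algorithm>
#include <cstddef>
#include <ctime>
#include <iostream>
#include <random>
#include <vector>
using RandomVector = std::vector<double>;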
using RandomMatrix = std::vector<RandomVector>;
template <typename Generator=std::mt19937_64>
RandomMatrix&
fill_uniform(const double low, const double high, RandomMatrix& result)
{
Generator gen {static_cast<typename Generator::result_type>(time(0))};
std::uniform_real_distribution<double> dist {low, high};
for (auto& col : result) {
std::generate(std::begin(col), std::end(col), [&] () { return dist(gen); });
}
return result;
}
template <typename Generator=std::mt19937_64>
RandomMatrix
generate_uniform(const double low, const double high,
const std::size_t ncols, const std::size_t nrows)
{
RandomMatrix result(ncols, RandomVector(nrows));
return fill_uniform<Generator>(low, high, result);
}
int main()
{
auto m = generate_uniform(2, 11, 2, 3);
for (const auto& col : m) {
for (const auto& v : col) {
std::cout << v << " ";
}
std::cout << '\n';
}
}
You could generalise this to generate arbitrary dimension tensors (like the NumPy version) without too much work.
I'll address them in random order:
3. You have several options. One is to use one distribution per row, created like distribution(row_lower_limit, row_upper_limit); distributions are cheap enough to construct that this should not cause issues. (In your layout the limits belong to a column, i.e. a dimension, rather than a row; the idea is the same.) If you want to reuse a single U[0, 1] distribution, just do something like row_lower_limit + distribution(generator) * (row_upper_limit - row_lower_limit). In both cases the result is distributed as U[row_lower_limit, row_upper_limit].
2. The vectorization comes from the NumPy library, not from Python itself; at most it provides some nice UX. C++ has no single equivalent of NumPy (there are many libraries for this, just nothing as universal). You wouldn't be wrong to write two nested for loops. You'd perhaps be better served by declaring a single NxM array rather than a vector of vectors.
1. It's hard to help with this one since we don't know the error. The cplusplus.com reference has an example of how to seed the generator from a std::random_device.
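The first option in point 3 can be sketched as follows. This is only an illustration: the names random_positions, Matrix, lower and upper are made up here, and the layout (one row per particle, one column per dimension) is assumed from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: one row per particle, one column per dimension.
// lower[d] and upper[d] bound column d, mirroring the NumPy broadcast.
Matrix random_positions(std::size_t num_particle,
                        const std::vector<double>& lower,
                        const std::vector<double>& upper,
                        std::mt19937_64& gen)
{
    assert(lower.size() == upper.size());
    const std::size_t dim = lower.size();

    // One distribution per dimension, built once and reused for every row.
    std::vector<std::uniform_real_distribution<double>> dists;
    dists.reserve(dim);
    for (std::size_t d = 0; d < dim; ++d) {
        dists.emplace_back(lower[d], upper[d]);
    }

    Matrix positions(num_particle, std::vector<double>(dim));
    for (auto& row : positions) {
        for (std::size_t d = 0; d < dim; ++d) {
            row[d] = dists[d](gen);  // draw for column d within its own bounds
        }
    }
    return positions;
}
```

The generator is passed in by reference so that it is seeded once by the caller (for example from a std::random_device) instead of being re-created for every row.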
I am a mathematician by training and need to simulate a continuous-time Markov chain. I need to use a variant of the Gillespie algorithm which relies on fast reading and writing to a 13-dimensional array. At the same time, I need to set the size of each dimension based on user input (each will be roughly of order 10). Once these sizes are set by the user, they will not change throughout the runtime. The only thing which changes will be the data contained in them. What is the most efficient way of doing this?
My first try was to use the standard arrays but their sizes must be known at the compilation time, which is not my case. Is std::vector a good structure for this? If so, how shall I go about initializing a creature as:
vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<vector<int>>>>>>>>>>>>> Array;
Will the initialization take more time than dealing with an array? Or, is there a better data container to use, please?
Thank you for any help!
I would start by using a std::unordered_map to hold key-value pairs, with each key being a 13-dimensional std::array, and each value being an int (or whatever datatype is appropriate), like this:
#include <iostream>
#include <unordered_map>
#include <array>
typedef std::array<int, 13> MarkovAddress;
// Define a hasher that std::unordered_map can use
// to compute a hash value for a MarkovAddress
// borrowed from: https://codereview.stackexchange.com/a/172095/126857
template<class T, size_t N>
struct std::hash<std::array<T, N>> {
size_t operator() (const std::array<T, N>& key) const {
std::hash<T> hasher;
size_t result = 0;
for(size_t i = 0; i < N; ++i) {
result = result * 31 + hasher(key[i]); // ??
}
return result;
}
};
int main(int, char **)
{
std::unordered_map<MarkovAddress, int> map;
// Just for testing
const MarkovAddress a{{1,2,3,4,5,6,7,8,9,10,11,12,13}};
// Place a value into the map at the specified address
map[a] = 12345;
// Now let's see if the value is present in the map,
// and retrieve it if so
if (map.count(a) > 0)
{
std::cout << "Value in map is " << map[a] << std::endl;
}
else std::cout << "Value not found!?" << std::endl;
return 0;
}
That will give you fast (average-case O(1)) lookup and insert, which is likely your first priority. If you later run into trouble with that (e.g. too much RAM used, or you need a well-defined iteration order), you could replace it with something more elaborate.
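As a possible variation on the code above, the hasher can be passed to the map as a template argument instead of specializing std::hash, and a lookup can go through find() so that a missing key is never inserted. The names AddressHash, StateMap and value_or are invented for this sketch; the hash formula is the same one used above.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

using MarkovAddress = std::array<int, 13>;

// Hypothetical standalone hasher, supplied as the map's third template
// argument rather than as a specialization of std::hash.
struct AddressHash {
    std::size_t operator()(const MarkovAddress& key) const {
        std::size_t result = 0;
        for (int v : key) {
            result = result * 31 + std::hash<int>()(v);
        }
        return result;
    }
};

using StateMap = std::unordered_map<MarkovAddress, int, AddressHash>;

// Returns the stored value, or fallback if the address is absent.
// Uses a single lookup and, unlike operator[], never inserts.
int value_or(const StateMap& map, const MarkovAddress& key, int fallback) {
    auto it = map.find(key);
    return it != map.end() ? it->second : fallback;
}
```

Reading through find() matters in a simulation loop because map[key] silently creates a zero entry for every address that is merely queried.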
I'm working on learning C++ STL algorithms. I need help trying to find a function to create a vector of deltas from values in an existing vector. In other words:
delta0 = abs(original1 - original0)
delta1 = abs(original2 - original1)
and so on.
I'm looking for something concise, like R's "diff" function mentioned here:
computing a new vector which has deltas from an existing vector
I found the transform function but it seemed to operate on a single element at a time. It didn't seem to allow parameters of iterator in the function supplied to transform so I was limited to the current element only. I'm trying to learn STL algorithms so I don't really need any libraries that may have "diff" implemented. I would just like to see a way to use STL functions to solve this if there is a concise way I'm not aware of.
Here is an example with the section in question commented:
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<int> v = { 1, 2, 3, 4, 5 };
vector<int> delta;
//---------------------------------------
// way to do this with STL algorithms?
for (auto i = v.begin()+1; i != v.end(); i++) {
delta.push_back(abs(*i - *(i - 1)));
}
//---------------------------------------
for (int i : delta) {
cout << i << " ";
}
return 0;
}
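// delta starts out empty, so append through a back_inserter
// (needs <algorithm>, <cstdlib> and <iterator>; assumes v is non-empty)
std::transform(std::next(v.begin()), v.end(),
v.begin(), std::back_inserter(delta),
[](int a, int b){ return std::abs(a - b); });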
You simply missed the second version of transform, which takes two input ranges and a binary operation.
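Another standard algorithm that fits this problem is std::adjacent_difference from <numeric>. A minimal sketch follows; the wrapper name abs_deltas is made up here. Note that adjacent_difference copies the first element through unchanged, so it has to be dropped afterwards to match the loop in the question.

```cpp
#include <cstdlib>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: absolute differences between neighbouring elements.
std::vector<int> abs_deltas(const std::vector<int>& v)
{
    std::vector<int> delta;
    if (v.size() < 2) {
        return delta;  // no neighbouring pairs, so no deltas
    }
    delta.reserve(v.size());
    // Writes v[0] first, then op(current, previous) for each later element.
    std::adjacent_difference(v.begin(), v.end(), std::back_inserter(delta),
                             [](int cur, int prev) { return std::abs(cur - prev); });
    delta.erase(delta.begin());  // drop the copied first element
    return delta;
}
```

For v = {1, 2, 3, 4, 5} this yields four deltas, all equal to 1, the same as the hand-written loop.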
I want to know why std::accumulate (aka reduce) 3rd parameter is needed. For those who do not know what accumulate is, it's used like so:
vector<int> V{1,2,3};
int sum = accumulate(V.begin(), V.end(), 0);
// sum == 6
Call to accumulate is equivalent to:
sum = 0; // 0 - value of 3rd param
for (auto x : V) sum += x;
There is also an optional 4th parameter, which allows you to replace addition with any other operation.
The rationale that I've heard is that if you need, let's say, not to add up but to multiply the elements of a vector, you need a different (non-zero) initial value:
vector<int> V{1,2,3};
int product = accumulate(V.begin(), V.end(), 1, multiplies<int>());
But why not do as Python does: use the first element, *V.begin(), as the initial value, and use the range starting from V.begin()+1. Something like this:
int sum = accumulate(V.begin()+1, V.end(), *V.begin());
This will work for any op. Why is 3rd parameter needed at all?
You're making a mistaken assumption: that the type T is the same as the element type of the InputIterator.
But std::accumulate is generic, and allows all different kinds of creative accumulations and reductions.
Example #1: Accumulate salary across Employees
Here's a simple example: an Employee class, with many data fields.
class Employee {
/** All kinds of data: name, ID number, phone, email address... */
public:
int monthlyPay() const;
};
You can't meaningfully "accumulate" a set of employees. That makes no sense; it's undefined. But, you can define an accumulation regarding the employees. Let's say we want to sum up all the monthly pay of all employees. std::accumulate can do that:
/** Simple lambda defining how to add a single Employee's
* monthly pay to our existing tally */
auto accumulate_func = [](int accumulator, const Employee& emp) {
return accumulator + emp.monthlyPay();
};
// And here's how you call the actual calculation:
int TotalMonthlyPayrollCost(const vector<Employee>& V)
{
return std::accumulate(V.begin(), V.end(), 0, accumulate_func);
}
So in this example, we're accumulating an int value over a collection of Employee objects. Here, the accumulation sum isn't the same type of variable that we're actually summing over.
Example #2: Accumulating an average
You can use accumulate for more complex types of accumulations as well: maybe you want to append values to a vector; maybe you have some arcane statistic you're tracking across the input; etc. What you accumulate doesn't have to be just a number; it can be something more complex.
For instance, here's a simple example of using accumulate to calculate the average of a vector of ints:
// This time our accumulator isn't an int -- it's a structure that lets us
// accumulate an average.
struct average_accumulate_t
{
int sum;
size_t n;
double GetAverage() const { return ((double)sum)/n; }
};
// Here's HOW we add a value to the average:
auto func_accumulate_average =
[](average_accumulate_t accAverage, int value) {
return average_accumulate_t(
{accAverage.sum+value, // value is added to the total sum
accAverage.n+1}); // increment number of values seen
};
double CalculateAverage(const vector<int>& V)
{
average_accumulate_t res =
std::accumulate(V.begin(), V.end(), average_accumulate_t({0,0}), func_accumulate_average);
return res.GetAverage();
}
Example #3: Accumulate a running average
Another reason you need the initial value is because that value isn't always the default/neutral value for the calculation you're making.
Let's build on the average example we've already seen. But now, we want a class that can hold a running average -- that is, we can keep feeding in new values, and check the average so far, across multiple calls.
class RunningAverage
{
average_accumulate_t _avg;
public:
RunningAverage():_avg({0,0}){} // initialize to empty average
double AverageSoFar() const { return _avg.GetAverage(); }
void AddValues(const vector<int>& v)
{
_avg = std::accumulate(v.begin(), v.end(),
_avg, // NOT the default initial {0,0}!
func_accumulate_average);
}
};
int main()
{
RunningAverage r;
r.AddValues(vector<int>({1,1,1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 1.0
r.AddValues(vector<int>({-1,-1,-1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 0.0
}
This is a case where we absolutely rely on being able to set that initial value for std::accumulate - we need to be able to initialize the accumulation from different starting points.
In summary, std::accumulate is good for any time you're iterating over an input range, and building up one single result across that range. But the result doesn't need to be the same type as the range, and you can't make any assumptions about what initial value to use -- which is why you must have an initial instance to use as the accumulating result.
The way things are, it is annoying for code that knows for sure a range isn't empty and that wants to start accumulating from the first element of the range on. Depending on the operation that is used to accumulate with, it's not always obvious what the 'zero' value to use is.
If on the other hand you only provide a version that requires non-empty ranges, it's annoying for callers that don't know for sure that their ranges aren't empty. An additional burden is put on them.
One perspective is that the best of both worlds is of course to provide both. As an example, Haskell provides both foldl1 and foldr1 (which require non-empty lists) alongside foldl and foldr (which mirror std::accumulate).
Another perspective is that since the one can be implemented in terms of the other with a trivial transformation (as you've demonstrated: std::accumulate(std::next(b), e, *b, f) -- std::next is C++11 but the point still stands), it is preferable to make the interface as minimal as it can be with no real loss of expressive power.
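To make the "trivial transformation" concrete, here is a minimal sketch of a foldl1-style wrapper built on std::accumulate. The names accumulate1 and product_of are invented for illustration and are not part of the standard library; the non-empty range is a precondition that the caller has to guarantee.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper in the spirit of Haskell's foldl1: the first element
// seeds the fold. Precondition: the range [first, last) is not empty.
template <typename InputIt, typename BinaryOp>
typename std::iterator_traits<InputIt>::value_type
accumulate1(InputIt first, InputIt last, BinaryOp op)
{
    assert(first != last);
    typename std::iterator_traits<InputIt>::value_type init = *first;
    ++first;
    return std::accumulate(first, last, init, op);
}

// Example use: a product needs no hand-picked "1" as the starting value.
int product_of(const std::vector<int>& v)
{
    return accumulate1(v.begin(), v.end(), std::multiplies<int>());
}
```

The price is visible in the signature: the result type is forced to be the element type, which is exactly the restriction that the three-argument std::accumulate avoids.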
Because standard library algorithms are supposed to work for arbitrary ranges of (compatible) iterators. So the first argument to accumulate doesn't have to be begin(), it could be any iterator between begin() and one before end(). It could also be using reverse iterators.
The whole idea is to decouple algorithms from data. Your suggestion, if I understand it correctly, requires a certain structure in the data.
If you wanted accumulate(V.begin()+1, V.end(), *V.begin()) you could just write that. But what if v.begin() might be v.end() (i.e. v is empty)? What if v.begin() + 1 is not implemented (because v's iterators only implement ++, not generalized addition)? What if the type of the accumulator is not the type of the elements? E.g.
// 0L rather than 0, so that the accumulator type really is long
std::accumulate(v.begin(), v.end(), 0L, [](long count, char c){
return isalpha(c) ? count + 1 : count;
});
It's indeed not needed. Our codebase has 2 and 3-argument overloads which use a T{} value.
However, std::accumulate is pretty old; it comes from the original STL. Our codebase has fancy std::enable_if logic to distinguish between "2 iterators and initial value" and "2 iterators and reduction operator". That requires C++11. Our code also uses a trailing return type (auto accumulate(...) -> ...) to calculate the return type, another C++11 feature.
The program below (well, the lines after "from here") is a construct I have to use a lot. I was wondering whether it is possible (possibly using functions from the Eigen library) to vectorize it or otherwise make it run faster.
Essentially, given a vector of floats x, this construct has to recover the indexes of the sorted elements of x in an int vector SIndex. For example, if the first entry of SIndex is 10, it means that x[10] was the smallest element of x.
#include <algorithm>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
using std::vector;
using namespace std;
typedef pair<int, float> sortData;
bool sortDataLess(const sortData& left, const sortData& right){
return left.second<right.second;
}
int main(){
int n=20,i;
float LO=-1.0,HI=1.0;
srand (time(NULL));
vector<float> x(n);
vector<float> y(n);
vector<int> SIndex(n);
vector<sortData> foo(n);
for(i=0;i<n;i++) x[i]=LO+(float)rand()/((float)RAND_MAX/(HI-LO));
//from here:
for(i=0;i<n;i++) foo[i]=sortData(i,x[i]);
sort(foo.begin(),foo.end(),sortDataLess);
for(i=0;i<n;i++){
sortData bar=foo[i];
y[i]=x[bar.first];
SIndex[i]=bar.first;
}
for(i=0;i<n;i++) std::cout << SIndex[i] << std::endl;
return 0;
}
There's no getting around the fact that this is a sorting problem, and vectorization doesn't necessarily improve sorts very much. For example, the partition step of quicksort can do the comparison in parallel, but it then needs to select and store the 0–n values that passed the comparison. This can absolutely be done, but it starts throwing out the advantages you get from vectorization—you need to convert from a comparison mask to a shuffle mask, which is probably a lookup table (bad), and you need a variable-sized store, which means no alignment (bad, although maybe not that bad). Mergesort needs to merge two sorted lists, which in some cases could be improved by vectorization, but in the worst case (I think) needs the same number of steps as the scalar case.
And, of course, there's a decent chance that any major speed boost you get from vectorization will have already been done inside your standard library's std::sort implementation. To get it, though, you'd need to be sorting primitive types with the default comparison operator.
If you're worried about performance, you can easily avoid the last loop, though. Just sort a list of indices using your float array as a comparison:
struct IndirectLess {
template <typename T>
IndirectLess(T iter) : values(&*iter) {}
bool operator()(int left, int right) const
{
return values[left] < values[right];
}
float const* values;
};
int main() {
// ...
std::vector<int> SIndex;
SIndex.reserve(n);
for (int i = 0; i < n; ++i)
SIndex.push_back(i); // fill with the indices 0..n-1
std::sort(SIndex.begin(), SIndex.end(), IndirectLess(x.begin()));
// ...
}
Now you've only produced your list of sorted indices. You have the potential to lose some cache locality, so for really big lists it might be slower. At that point it might be possible to vectorize your last loop, depending on the architecture. It's just data manipulation, though—read four values, store 1st and 3rd in one place and 2nd and 4th in another—so I wouldn't expect Eigen to help much at that point.
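With C++11 the same index sort can be written more compactly using std::iota and a lambda in place of the hand-written functor. This is a sketch under that assumption; the function name argsort is borrowed from NumPy and is not a standard C++ name.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the (0-based) indices that put x in
// ascending order, i.e. x[result[0]] is the smallest element.
std::vector<int> argsort(const std::vector<float>& x)
{
    std::vector<int> idx(x.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(),
              [&x](int a, int b) { return x[a] < x[b]; });
    return idx;
}
```

If equal values must keep their original relative order, std::stable_sort can be substituted for std::sort.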
I have a problem with how to match up the elements of two static arrays, i.e.
string bufferNames[]={"apple","orange","banana","pomegranate","pear"};
string bufferPictures[] = {"apple.bmp","orange.bmp","banana.bmp","pomegranate.bmp","pear.bmp"};
Each item in bufferNames is a choice offered to the user when the corresponding picture from bufferPictures has been loaded onto the screen. So, if I for example get orange.bmp using rand() function that iterates through that list, how can I get the same one corresponding element orange and two other random not correct elements. Any help would be appreciated.
Thanks in advance.
P.S. If the problem needs to be broken down further, just say so.
This should do it. The code makes use of C++11 features. You will need to adapt it to pass it off as homework.
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <vector>
struct Picture {
std::string name, file;
bool operator==(const Picture& x) const { return this->name == x.name && this->file == x.file; }
bool operator!=(const Picture& x) const { return !(*this == x); }
};
int main()
{
std::vector< Picture > pics =
{
{"apple", "apple.bmp"},
{"orange", "orange.bmp"},
{"banana", "banana.bmp"},
{"pear", "pear.bmp"},
};
// determined by random choice
const Picture& choice = pics[0];
std::vector< Picture > woChoice;
std::copy_if(pics.begin(), pics.end(), std::back_inserter(woChoice),
[&choice](const Picture& x) {
return x != choice;
});
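// randomly shuffle the remainder and pick the first
// two. alternatively, and for more efficiency, use <random> to
// generate indices. note: std::random_shuffle was removed in C++17;
// std::shuffle with an explicit engine is its replacement
std::random_shuffle(woChoice.begin(), woChoice.end());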
std::cout << woChoice[0].name << std::endl;
std::cout << woChoice[1].name << std::endl;
return 0;
}
So, if I for example get orange.bmp using rand() function that iterates through that list, how can I get the same one corresponding element orange and two other random not correct elements.
If you use rand() to get a number (let's call it x) between 0 and 4 inclusive (based on there being 5 distinct values in the arrays), then you can use that number in both arrays to find the related word and image.
To get one other random incorrect element, you can call rand() in a loop until you get a value other than x. Let's call it y.
To get another random incorrect element, you can call rand() in a loop until you get a value other than x and y.
There are other ways to do this, but that's probably easiest to understand and implement.
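The steps above can be sketched like this. The function name pick_choices is made up, n stands for the number of elements in the arrays (5 in the question), and rand() % n is used only because the answer is phrased in terms of rand(); it is slightly biased and assumes the caller has seeded it with srand().

```cpp
#include <array>
#include <cstdlib>

// Hypothetical helper: returns {x, y, z}, where x is the index of the
// correct answer and y, z are two distinct wrong indices.
// Assumes n >= 3, otherwise the loops below could never finish.
std::array<int, 3> pick_choices(int n)
{
    const int x = std::rand() % n;

    int y = std::rand() % n;
    while (y == x) {              // retry until different from x
        y = std::rand() % n;
    }

    int z = std::rand() % n;
    while (z == x || z == y) {    // retry until different from both
        z = std::rand() % n;
    }

    return {{x, y, z}};
}
```

The picture to show is then bufferPictures[x], and the three names to offer are bufferNames[x], bufferNames[y] and bufferNames[z], presumably in shuffled order.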
The names in the arrays correspond to each other. So, if you need fruit number i, take bufferNames[i] and bufferPictures[i] in parallel. Make sure the arrays really ARE parallel, for example by simply building the second array's elements from the first array's elements.
As for a random index in the range 0..n-1 excluding elements number i and j (j>i), compute it like this, where random(n-3) must return an integer from 0 to n-3 inclusive, since n-2 valid indices remain:
temp=random(n-3);
k=(temp>=i?temp+1:temp);
k=(k>=j?k+1:k);
And again, take bufferNames[k] and bufferPictures[k].
It is not simple, it is VERY simple.
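Here is the same index-shifting trick as a small sketch, with the random draw separated from the mapping so that the mapping can be checked by hand. The names skip_two and pick_excluding are invented, and std::uniform_int_distribution stands in for the unspecified random() above.

```cpp
#include <cassert>
#include <random>

// Maps temp in 0..n-3 onto the indices 0..n-1 with i and j skipped.
// Requires i < j.
int skip_two(int temp, int i, int j)
{
    int k = (temp >= i) ? temp + 1 : temp;  // step over i
    k = (k >= j) ? k + 1 : k;               // then step over j
    return k;
}

// Hypothetical wrapper: a uniformly random index in 0..n-1 other than i, j.
int pick_excluding(int n, int i, int j, std::mt19937& gen)
{
    assert(n >= 3 && 0 <= i && i < j && j < n);
    std::uniform_int_distribution<int> dist(0, n - 3);  // bounds inclusive
    return skip_two(dist(gen), i, j);
}
```

For n = 5, i = 1, j = 3 the draws 0, 1, 2 map to the indices 0, 2, 4, so every allowed index is hit exactly once and no retry loop is needed.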