I want to create a simple struct that stores the RGB values of a color. r, g and b are supposed to be double values in [0, 1].
struct Color
{
Color(double x): r{x}, g{x}, b{x} {
if (r < 0.0) r = 0.0;
if (r > 1.0) r = 1.0;
if (g < 0.0) g = 0.0;
if (g > 1.0) g = 1.0;
if (b < 0.0) b = 0.0;
if (b > 1.0) b = 1.0;
}
double r, g, b;
};
Is there a better way than using those if statements?
Just write a function to clamp:
double clamp(double val, double left = 0.0, double right = 1.0) {
return std::min(std::max(val, left), right);
}
And use that in your constructor:
Color(double x)
: r{clamp(x)}
, g{clamp(x)}
, b{clamp(x)}
{ }
You can use min and max, ideally combining them into a clamp function:
template <class T>
T clamp(T val, T min, T max)
{
return std::min(max, std::max(min, val));
}
struct Color
{
Color(double x) : r{clamp(x, 0., 1.)}, g{clamp(x, 0., 1.)}, b{clamp(x, 0., 1.)}
{}
double r, g, b;
};
As a first pass, we have min/max functions we can and should use:
struct Color
{
explicit Color(double x): r{x}, g{x}, b{x}
{
r = std::max(r, 0.0);
r = std::min(r, 1.0);
g = std::max(g, 0.0);
g = std::min(g, 1.0);
b = std::max(b, 0.0);
b = std::min(b, 1.0);
}
double r, g, b;
};
I'd also suggest making that constructor explicit, as it's rather confusing for a scalar to implicitly convert to a Color.
The reason this is arguably an upgrade, even though it's roughly the same amount of code and only a modest readability improvement, is that while an optimizing compiler might already emit branchless code for the if version, min and max are more likely to guarantee an efficient implementation. You're also expressing what you're doing in a slightly more direct way.
There is some truth to the somewhat counter-intuitive idea that writing higher-level code helps you achieve efficiency, if only because the low-level logic used to implement the high-level function is more likely to be efficient than what people would otherwise write again and again in their more casual, day-to-day code. It also helps steer your codebase toward more central targets for optimization.
As a second pass, this may not improve things for your particular use case, but in general I've found it useful to represent color and vector components with an array so you can access them in loops. Once you start doing somewhat complex things with colors, like blending them together, the logic for each component is non-trivial but identical for all components, and you don't want to write that code three times every time or always be forced to push the per-component logic into a separate function.
So we might do this:
class Color
{
public:
explicit Color(double x)
{
for (int j=0; j < 3; ++j)
{
rgb[j] = x;
rgb[j] = std::max(rgb[j], 0.0);
rgb[j] = std::min(rgb[j], 1.0);
}
}
// Bounds-checking assertions in these would also be a nice idea.
double& operator[](int n) {return rgb[n];}
double operator[](int n) const {return rgb[n];}
double& red() {return rgb[0];}
double red() const {return rgb[0];}
double& green() {return rgb[1];}
double green() const {return rgb[1];}
double& blue() {return rgb[2];}
double blue() const {return rgb[2];}
// Somewhat excess fluff, but such methods can be useful when
// interacting with a low-level C-style API (OpenGL, e.g.) as
// opposed to using &color.red() or &color[0].
double* data() {return rgb;}
const double* data() const {return rgb;}
private:
double rgb[3];
};
Finally, as others have mentioned, this is where a function to clamp values to a range is useful, so as a final pass:
template <class T>
T clamp(T val, T low, T high)
{
assert(low <= high);
return std::max(std::min(val, high), low);
}
// New constructor using clamp:
explicit Color(double x)
{
for (int j=0; j < 3; ++j)
rgb[j] = clamp(x, 0.0, 1.0);
}
I am trying to implement Non Linear MPC for a 7-DOF manipulator in drake. To do this, in my constraints, I need to have dynamic parameters like the Mass matrix M(q) and the bias term C(q,q_dot)*q_dot, but those depend on the decision variables q, q_dot.
I tried the following
// finalize plant
// create builder, diagram, context, plant context
...
// formulate optimization problem
drake::solvers::MathematicalProgram prog;
// create decision variables
...
std::vector<drake::solvers::VectorXDecisionVariable> q_v;
std::vector<drake::solvers::VectorXDecisionVariable> q_ddot;
for (int i = 0; i < H; i++) {
q_v.push_back(prog.NewContinuousVariables<14>(state_var_name));
q_ddot.push_back(prog.NewContinuousVariables<7>(input_var_name));
}
// add cost
...
// add constraints
...
for (int i = 0; i < H; i++) {
plant.SetPositionsAndVelocities(*plant_context, q_v[i]);
plant.CalcMassMatrix(*plant_context, M);
plant.CalcBiasTerm(*plant_context, C_q_dot);
}
...
for (int i = 0; i < H; i++) {
prog.AddConstraint( M * q_ddot[i] + C_q_dot + G >= lb );
prog.AddConstraint( M * q_ddot[i] + C_q_dot + G <= ub );
}
// solve prog
...
The above code will not work, because plant.SetPositionsAndVelocities(.) doesn't accept symbolic variables.
Is there any way to integrate M, C into my OCP constraints?
I think you want to impose the following nonlinear, nonconvex constraint:
lb <= M * qddot + C(q, v) + g(q) <= ub
Since this constraint is non-convex, it has to be solved through nonlinear optimization, and the constraint is evaluated in every iteration of the nonlinear optimization. We can't do that evaluation symbolically (it would be horribly slow).
So you will need a constraint evaluator, something like this
// This constraint takes x = [q; v; vdot] and evaluates
// M * vdot + C(q, v) + g(q)
class MyConstraint : public solvers::Constraint {
public:
MyConstraint(const MultibodyPlant<AutoDiffXd>& plant, systems::Context<AutoDiffXd>* context, const Eigen::Ref<const Eigen::VectorXd>& lb, const Eigen::Ref<const Eigen::VectorXd>& ub) : solvers::Constraint(plant.num_velocities(), plant.num_positions() + 2 * plant.num_velocities(), lb, ub), plant_{plant}, context_{context} {
...
}
private:
void DoEval(const Eigen::Ref<const AutoDiffVecXd>& x, AutoDiffVecXd* y) const override {
...
}
const MultibodyPlant<AutoDiffXd>& plant_;
systems::Context<AutoDiffXd>* context_;
};
int main() {
...
// Construct the constraint and add it at every time instance
std::vector<std::unique_ptr<systems::Context<AutoDiffXd>>> plant_contexts;
for (int i = 0; i < H; ++i) {
plant_contexts.push_back(plant.CreateDefaultContext());
prog.AddConstraint(std::make_shared<MyConstraint>(plant, plant_contexts[i].get(), lb, ub), {q_v[i], q_ddot[i]});
}
}
You could refer to the class CentroidalMomentumConstraint on how to construct your own MyConstraint class.
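For concreteness, here is a rough sketch of what the AutoDiffXd DoEval overload could compute for x = [q; v; vdot]. The calls are from MultibodyPlant's API, but treat the exact signatures as assumptions to check against your Drake version, and note that solvers::Constraint also requires the double (and optionally symbolic) overloads, which are omitted here as in the outline above:
// Sketch only: assumes x stacks [q; v; vdot], that plant_/context_ are the
// AutoDiffXd plant and context stored by the constructor, and that the usual
// drake aliases (MatrixX, VectorX, AutoDiffVecXd) are in scope.
void DoEval(const Eigen::Ref<const AutoDiffVecXd>& x, AutoDiffVecXd* y) const override {
  const int nq = plant_.num_positions();
  const int nv = plant_.num_velocities();
  plant_.SetPositionsAndVelocities(context_, x.head(nq + nv));
  const auto vdot = x.tail(nv);
  MatrixX<AutoDiffXd> M(nv, nv);
  plant_.CalcMassMatrix(*context_, &M);
  VectorX<AutoDiffXd> Cv(nv);
  plant_.CalcBiasTerm(*context_, &Cv);
  // CalcGravityGeneralizedForces returns tau_g, so g(q) in the formula above is -tau_g.
  const VectorX<AutoDiffXd> tau_g = plant_.CalcGravityGeneralizedForces(*context_);
  *y = M * vdot + Cv - tau_g;
}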
I am trying to implement a numerical simulation of a state space model using Eigen and Odeint. My trouble is that I need to reference control data U (predefined before integration) in order to properly solve the Ax+Bu part of the state space model. I was trying to accomplish this by using a counter to keep track of the current time step, but for whatever reason, it is reset to zero every time the System Function is called by Odeint.
How would I get around this? Is my approach to modeling the state space system flawed?
My System
struct Eigen_SS_NLTIV_Model
{
Eigen_SS_NLTIV_Model(matrixXd &ssA, matrixXd &ssB, matrixXd &ssC,
matrixXd &ssD, matrixXd &ssU, matrixXd &ssY)
:A(ssA), B(ssB), C(ssC), D(ssD), U(ssU), Y(ssY)
{
Y.resizeLike(U);
Y.setZero();
observerStep = 0;
testPtr = &observerStep;
}
/* Observer Function:*/
void operator()(matrixXd &x, double t)
{
Y.col(observerStep) = C*x + D*U.col(observerStep);
observerStep += 1;
}
/* System Function:
* ONLY the mathematical description of the system dynamics may be placed
* here. Any data placed in here is destroyed after each iteration of the
* stepper.
*/
void operator()(matrixXd &x, matrixXd &dxdt, double t)
{
dxdt = A*x + B*U.col(*testPtr);
//Cannot reference the variable "observerStep" directly as it gets reset
//every time this is called. *testPtr doesn't work either.
}
int observerStep;
int *testPtr;
matrixXd &A, &B, &C, &D, &U, &Y; //Input Vectors
};
My ODE Solver Setup
const double t_end = 3.0;
const double dt = 0.5;
int steps = (int)std::ceil(t_end / dt) + 1;
matrixXd A(2, 2), B(2, 2), C(2, 2), D(2, 2), x(2, 1);
matrixXd U = matrixXd::Constant(2, steps, 1.0);
matrixXd Y;
A << -0.5572, -0.7814, 0.7814, 0.0000;
B << 1.0, -1.0, 0.0, 2.0;
C << 1.9691, 6.4493, 1.9691, 6.4493;
D << 0.0, 0.0, 0.0, 0.0;
x << 0, 0;
Eigen_SS_NLTIV_Model matrixTest(A, B, C, D, U, Y);
odeint::integrate_const(odeint::runge_kutta4<matrixXd, double, matrixXd, double,
odeint::vector_space_algebra>(),
matrixTest, x, 0.0, t_end, dt, matrixTest);
//Ignore these two functions. They are there mostly for debugging.
writeCSV<matrixXd>(Y, "Y_OUT.csv");
prettyPrint<matrixXd>(Y, "Out Full");
With classical Runge-Kutta you know that your ODE model function is called 4 times per step, at times t, t+h/2, t+h/2 and t+h. With other solvers that implement adaptive step size you cannot know in advance at what t the ODE model function is called.
You should implement U via some kind of interpolation function, in the simplest case as a step function that computes an index from t and returns the U value for that index. Something like
int i = (int)(t / U_step);
dxdt = A*x + B*U.col(i);
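A minimal sketch of how the system functor could look with that approach. U_step is a name introduced here for the sampling period of U (e.g. the dt used when U was built) and would be another member of the struct; std::min needs <algorithm>:
void operator()(matrixXd &x, matrixXd &dxdt, double t)
{
    // Zero-order hold: pick the control column active at time t,
    // clamped so that late evaluation times stay inside U.
    int i = std::min((int)(t / U_step), (int)U.cols() - 1);
    dxdt = A*x + B*U.col(i);
}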
I'm working on programming my own little game which should have a visibility effect as described here. My world consists of Polygons which each have a list of Edges (sorted CW). I now want (as described in the article) to cast Rays towards the Edges of the polygons, find the intersections and retrieve a Polygon that defines the visible area.
So I wrote classes for Vectors, Points, Edges and Polygons and adjusted the intersection algorithm so it works with my code.
I then tested it and everything worked fine, but when I ran the intersection algorithm in a for-loop to simulate a large number of processed Edges (starting with 100, up to 1000), the fps dropped drastically: with 100 Edges "only" 300 fps (3000 before), and with 300 it dropped below 60, I think. This seems like far too much of a drop, since I want to reuse this code for my light sources; I would then quickly end up processing far more than 300 Edges, and it should run fast on much less powerful processors (I have a Xeon E1230v3).
I figured out that the program runs many times faster if I only call the EdgeIntersection, but I definitely need to loop through the Edges of my polygons, so this is not an option.
My Source-Code:
Vector.h/.cpp: Basic Vector class with two floats (X, Y), getters & setters, rotating
Vertex.h/.cpp: Basic Point class with a position Vector, getters & setters and a boolean that indicates whether it is an intersection Vertex
Edge.h/.cpp: Basic Edge class with start/end vertices, getters & setters and a rotating function (uses Vector.rotate())
Polygon.h:
#pragma once
#include <vector>
#include "Edge.h"
namespace geo
{
class Polygon
{
private:
std::vector<Edge> edges;
public:
Polygon();
Polygon(std::vector<Edge> edges);
~Polygon();
std::vector<Edge> getEdges();
Edge getEdge(int index);
int getEdgeCount();
void setEdges(std::vector<Edge> edges);
void setEdge(Edge e, int index);
void addEdge(Edge e);
void removeEdge(int index);
};
}
Ray.h:
#pragma once
#include "Vertex.h"
class Ray
{
private:
geo::Vertex origin;
geo::Vector dir;
public:
Ray();
Ray(geo::Vertex origin, geo::Vector dir);
~Ray();
geo::Vertex getOrigin();
geo::Vector getDirection();
void setOrigin(geo::Vertex origin);
void setDirection(geo::Vector dir);
};
LightModule.h:
#pragma once
#include "Polygon.h"
#include "Ray.h"
class LightModule
{
private:
//List of blocking Polygons
std::vector<geo::Polygon>* blockingPolygons;
std::vector<Ray> rays;
geo::Polygon bounds;
geo::Polygon visible;
/*geo::Polygon blocked;*/
//HitDetection Class later
geo::Vertex getIntersection(Ray r, geo::Edge* e);
geo::Vertex getClosestIntersection(Ray r, geo::Polygon *p);
public:
LightModule();
LightModule(std::vector<geo::Polygon>* blockingPolygons);
~LightModule();
//Set the Blocking Polygons
void setBlockingPolygons(std::vector<geo::Polygon>* blockingPolygons);
geo::Vertex callCI(Ray r, geo::Polygon* p);
geo::Vertex callI(Ray r, geo::Edge* e);
//Cast Rays towards Vertices and store them in rays
void updateRays();
//Update Visibility Polygon
void updateVisible();
//Return Visibility Polygon
geo::Polygon* getVisible();
};
LightModule.cpp:
#include "LightModule.h"
LightModule::LightModule()
{
rays.clear();
}
LightModule::LightModule(std::vector<geo::Polygon>* blockingPolygons)
{
this->blockingPolygons = blockingPolygons;
rays.clear();
}
LightModule::~LightModule()
{
}
void LightModule::setBlockingPolygons(std::vector<geo::Polygon>* blockingPolygons)
{
this->blockingPolygons = blockingPolygons;
}
//Test-cast a Ray (will follow mouse in the Test)
void LightModule::updateRays()
{
Ray r(geo::Vertex(geo::Vector(200, 100)), geo::Vector(-100, 0));
rays.push_back(r);
}
void LightModule::updateVisible()
{
}
//Both for testing; will later be part of a separate class
geo::Vertex LightModule::callCI(Ray r, geo::Polygon *p)
{
return this->getClosestIntersection(r, p);
}
geo::Vertex LightModule::callI(Ray r, geo::Edge* e)
{
return this->getIntersection(r, e);
}
//TEST
geo::Vertex LightModule::getIntersection(Ray r, geo::Edge* e)
{
geo::Vertex v;
v.setIntersectVert(false);
float r_px = r.getOrigin().getPosition().getX();
float r_py = r.getOrigin().getPosition().getY();
float r_dx = r.getDirection().getX();
float r_dy = r.getDirection().getY();
float s_px = e->getOrigin().getPosition().getX();
float s_py = e->getOrigin().getPosition().getY();
float s_dx = e->getDirection().getX();
float s_dy = e->getDirection().getY();
float r_mag = sqrt(r_dx*r_dx + r_dy*r_dy);
float s_mag = sqrt(s_dx*s_dx + s_dy*s_dy);
if (r_dx / r_mag == s_dx / s_mag && r_dy / r_mag == s_dy / s_mag)
{
return v;
}
float T2 = (r_dx*(s_py - r_py) + r_dy*(r_px - s_px)) / (s_dx*r_dy - s_dy*r_dx);
float T1 = (s_px + s_dx*T2 - r_px) / r_dx;
if (T1 < 0 /*|| T1 > 1 For Lines*/)
{
return v;
}
if (T2 < 0 || T2 > 1)
{
return v;
}
v.setIntersectVert(true);
v.setPosition(geo::Vector(r_px + r_dx*T1, r_py + r_dy*T1));
return v;
}
geo::Vertex LightModule::getClosestIntersection(Ray r, geo::Polygon *p)
{
geo::Vertex v;
v.setIntersectVert(false);
geo::Vertex v_nearest(geo::Vector(0, 0));
v_nearest.setIntersectVert(false);
geo::Vector h1;
geo::Vector h2;
for (int i = 0; i < p->getEdges().size(); i++)
{
v = this->getIntersection(r, &p->getEdges().at(i));
h1.setX(v.getPosition().getX() - r.getOrigin().getPosition().getX());
h1.setY(v.getPosition().getY() - r.getOrigin().getPosition().getY());
h2.setX(v_nearest.getPosition().getX() - r.getOrigin().getPosition().getX());
h2.setY(v_nearest.getPosition().getY() - r.getOrigin().getPosition().getY());
if (i < 1)
v_nearest = v;
else if (v.isIntersectVert() == true && h1.getLength() < h2.getLength())
{
v_nearest = v;
}
}
return v_nearest;
}
For testing, I create a Polygon and a LightModule, call updateRays() and then call the helper function callCI().
I know my code gets pretty messy where I have to cascade my getters and setters; I'll have to fix that, but for the rest I hope everything is understandable, and if not, feel free to ask. Just to mention it: I test-draw my objects with vertex arrays, but I don't need graphical output of the intersection process, I just need the visible polygon.
Just to point out again: I need a faster way of finding the intersection point between a Ray and a Polygon, and since I didn't know whether I did something wrong in my code, I posted it all here so someone can maybe help me make my code more efficient or show me a different method to solve my problem.
Have a nice day and thank you for your answers :)
Paul
EDIT: Would it be meaningfully faster to first triangulate my polygons and then do a Ray-Triangle intersection Test?
I can't speak to the algorithm (which is possibly what you need), but here are some immediate thoughts on speeding up what you have.
First off you can define all your getters and setters inline (put them in the class in the header, not the separate source file) so the compiler can optimize the function calls away.
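For example (a sketch only; Vector.h isn't shown in the question, so the member names x and y are assumptions):
namespace geo
{
// Member functions defined inside the class body are implicitly inline,
// so the compiler can optimize these calls away entirely.
class Vector
{
public:
    Vector() = default;
    Vector(float x, float y) : x(x), y(y) {}
    float getX() const { return x; }
    float getY() const { return y; }
    void setX(float value) { x = value; }
    void setY(float value) { y = value; }
private:
    float x = 0.0f, y = 0.0f;
};
}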
Then these changes might buy you a few frames:
// make sure your getters and setters are inline so the compiler
// can optimize them away
geo::Vertex LightModule::getClosestIntersection(Ray r, geo::Polygon* p)
{
geo::Vertex v;
v.setIntersectVert(false);
geo::Vector h1;
geo::Vector h2;
// cache these
geo::Vector ray_position = r.getOrigin().getPosition();
geo::Vertex v_nearest(geo::Vector(0, 0));
v_nearest.setIntersectVert(false);
// cache the size (don't call getEdges().size() every iteration)
size_t size = p->getEdges().size();
// avoid access violation on an empty polygon
if(!size)
return v_nearest;
// preset item 0
v_nearest = this->getIntersection(r, &p->getEdges()[0]);
// start from 1 not 0
for(int i = 1; i < size; i++)
{
// don't use at(); it's slower because it bounds-checks
// v = this->getIntersection(r, &p->getEdges().at(i));
v = this->getIntersection(r, &p->getEdges()[i]);
// use the cached ray position rather than calling the getters each time
h1.setX(v.getPosition().getX() - ray_position.getX());
h1.setY(v.getPosition().getY() - ray_position.getY());
h2.setX(v_nearest.getPosition().getX() - ray_position.getX());
h2.setY(v_nearest.getPosition().getY() - ray_position.getY());
// this if is not needed because item 0 is preset above
//if(i < 1)
// v_nearest = v;
if(v.isIntersectVert() == true && h1.getLength() < h2.getLength())
{
v_nearest = v;
}
}
return v_nearest;
}
I removed one of the if statements by calculating item 0 before the loop and starting the loop from 1; the rest is just caching a much-used value and avoiding at(), which is slower because it does bounds-checking.
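In the same spirit, note that geo::Polygon::getEdges() as declared in the header returns the vector by value, so every p->getEdges() call above copies the entire edge list. Caching it once per call (or, better, changing the getter to return a const reference) avoids that; a sketch:
// Copy the edge list once per call instead of once per edge
// (better still: change getEdges() to return const std::vector<Edge>&).
std::vector<geo::Edge> edges = p->getEdges();
if (edges.empty())
    return v_nearest;
v_nearest = this->getIntersection(r, &edges[0]);
for (size_t i = 1; i < edges.size(); i++)
{
    v = this->getIntersection(r, &edges[i]);
    // ... same distance comparison as above ...
}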
I'm trying to create a basic value noise function. I've reached the point where it's outputting it but within the output there are unexpected artefacts popping up such as diagonal discontinuous lines and blurs. I just can't seem to find what's causing it. Could somebody please take a look at it to see if I'm going wrong somewhere.
First off, here are three images that it's outputting, with greater magnification in each one.
//data members
float m_amplitude, m_frequency;
int m_period; //controls the tile size of the noise
vector<vector<float>> m_points; //2D array to store the lattice
//The constructor generates the 2D square lattice and populates it.
Noise2D(int period, float frequency, float amplitude)
: m_period(period), m_frequency(frequency), m_amplitude(amplitude)
{
//initialize the lattice to the appropriate NxN size
m_points.resize(m_period);
for (int i = 0; i < m_period; ++i)
m_points[i].resize(m_period);
//populates the lattice with values between 0 and 1
int seed = 209;
srand(seed);
for(int i = 0; i < m_period; i++)
{
for(int j = 0; j < m_period; j++)
{
m_points[i][j] = abs(rand()/(float)RAND_MAX);
}
}
}
//Evaluates a position
float Evaluate(float x, float y)
{
x *= m_frequency;
y *= m_frequency;
//Gets the integer values from each component
int xFloor = (int) x;
int yFloor = (int) y;
//Gets the decimal data in the range of [0:1] for each of the components for interpolation
float tx = x - xFloor;
float ty = y - yFloor;
//Finds the appropriate boundary lattice array indices using the modulus technique to ensure periodic noise.
int xPeriodLower = xFloor % m_period;
int xPeriodUpper;
if(xPeriodLower == m_period - 1)
xPeriodUpper = 0;
else
xPeriodUpper = xPeriodLower + 1;
int yPeriodLower = yFloor % m_period;
int yPeriodUpper;
if(yPeriodLower == m_period - 1)
yPeriodUpper = 0;
else
yPeriodUpper = yPeriodLower + 1;
//The four random values at each boundary. The naming convention for these follow a single 2d coord system 00 for bottom left, 11 for top right
const float& random00 = m_points[xPeriodLower][yPeriodLower];
const float& random10 = m_points[xPeriodUpper][yPeriodLower];
const float& random01 = m_points[xPeriodLower][yPeriodUpper];
const float& random11 = m_points[xPeriodUpper][yPeriodUpper];
//Remap the weighting of each t dimension here if you wish to use an s-curve profile.
float remappedTx = tx;
float remappedTy = ty;
return MyMath::Bilinear<float>(remappedTx, remappedTy, random00, random10, random01, random11) * m_amplitude;
}
Here are the two interpolation functions that it relies on.
template <class T1>
static T1 Bilinear(const T1 &tx, const T1 &ty, const T1 &p00, const T1 &p10, const T1 &p01, const T1 &p11)
{
return Lerp( Lerp(p00,p10,tx),
Lerp(p01,p11,tx),
ty);
}
template <class T1> //linear interpolation aka Mix
static T1 Lerp(const T1 &a, const T1 &b, const T1 &t)
{
return a * (1 - t) + b * t;
}
Some of the artifacts are the result of linear interpolation. Using a higher order interpolation method would help, but it will only solve part of the problem. Crudely put, sharp transitions in the signal can lead to artifacts.
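For example, remapping tx and ty with a smoothstep curve at the spot the question's Evaluate() already reserves for an s-curve profile removes the grid-aligned creases that plain linear weights produce; a minimal sketch:
//Perlin-style quintic s-curve: 6t^5 - 15t^4 + 10t^3
static float SmoothStep5(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

//In Evaluate(), instead of passing tx/ty through unchanged:
float remappedTx = SmoothStep5(tx);
float remappedTy = SmoothStep5(ty);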
Additional artifacts result from distributing the starting noise values (i.e. the values you are interpolating among) at equal intervals - in this case, a grid. The highest and lowest values will only ever occur at these grid points, at least when using linear interpolation. Roughly speaking, patterns in the signal can lead to artifacts. Two potential ways I know of to address this part of the problem are using a nonlinear interpolation and/or randomly nudging the coordinates of the starting noise values to break up their regularity.
Libnoise has an explanation of generating coherent noise which covers these problems and solutions in greater depth, with some nice illustrations. You could also peek at the source if you need to see how it deals with these problems. And as richard-tingle already mentioned, simplex noise was designed to correct the artifact problems inherent in Perlin noise; it's a little tougher to get your head around, but it's a solid technique.
I'm currently benchmarking some data structures in C++ and I want to test them when working on Zipf-distributed numbers.
I'm using the generator provided on this site: http://www.cse.usf.edu/~christen/tools/toolpage.html
I adapted the implementation to use a Mersenne Twister generator.
It works well, but it is really slow. In my case, the range can be big (about a million) and the number of random numbers generated can be several million.
The alpha parameter does not change over time, it is fixed.
I tried precalculating all the sum_prob values. That's much faster, but still slow on big ranges.
Is there a faster way to generate Zipf-distributed numbers? Even something less precise would be welcome.
Thanks
The pre-calculation alone does not help that much. But since sum_prob is cumulative and in ascending order, we can use a binary search to find the zipf_value, which reduces the cost of generating a Zipf-distributed number from O(n) to O(log n), a big efficiency improvement. Here it is; just replace the zipf() function in genzipf.c with the following one:
int zipf(double alpha, int n)
{
static int first = TRUE; // Static first time flag
static double c = 0; // Normalization constant
static double *sum_probs; // Pre-calculated sum of probabilities
double z; // Uniform random number (0 < z < 1)
int zipf_value; // Computed exponential value to be returned
int i; // Loop counter
int low, high, mid; // Binary-search bounds
// Compute normalization constant on first call only
if (first == TRUE)
{
for (i=1; i<=n; i++)
c = c + (1.0 / pow((double) i, alpha));
c = 1.0 / c;
sum_probs = malloc((n+1)*sizeof(*sum_probs));
sum_probs[0] = 0;
for (i=1; i<=n; i++) {
sum_probs[i] = sum_probs[i-1] + c / pow((double) i, alpha);
}
first = FALSE;
}
// Pull a uniform random number (0 < z < 1)
do
{
z = rand_val(0);
}
while ((z == 0) || (z == 1));
// Map z to the value
low = 1, high = n;
do {
mid = floor((low+high)/2);
if (sum_probs[mid] >= z && sum_probs[mid-1] < z) {
zipf_value = mid;
break;
} else if (sum_probs[mid] >= z) {
high = mid-1;
} else {
low = mid+1;
}
} while (low <= high);
// Assert that zipf_value is between 1 and N
assert((zipf_value >=1) && (zipf_value <= n));
return(zipf_value);
}
The only C++11 Zipf random generator I could find calculated the probabilities explicitly and used std::discrete_distribution. This works fine for small ranges, but is not useful if you need to generate Zipf values with a very wide range (for database testing, in my case) since it will exhaust memory. So, I implemented the below-mentioned algorithm in C++.
I have not rigorously tested this code, and some optimizations are probably possible, but it only requires constant space and seems to work well.
#include <algorithm>
#include <cmath>
#include <random>
/** Zipf-like random distribution.
*
* "Rejection-inversion to generate variates from monotone discrete
* distributions", Wolfgang Hörmann and Gerhard Derflinger
* ACM TOMACS 6.3 (1996): 169-184
*/
template<class IntType = unsigned long, class RealType = double>
class zipf_distribution
{
public:
typedef RealType input_type;
typedef IntType result_type;
static_assert(std::numeric_limits<IntType>::is_integer, "");
static_assert(!std::numeric_limits<RealType>::is_integer, "");
zipf_distribution(const IntType n=std::numeric_limits<IntType>::max(),
const RealType q=1.0)
: n(n)
, q(q)
, H_x1(H(1.5) - 1.0)
, H_n(H(n + 0.5))
, dist(H_x1, H_n)
{}
IntType operator()(std::mt19937& rng)
{
while (true) {
const RealType u = dist(rng);
const RealType x = H_inv(u);
const IntType k = clamp<IntType>(std::round(x), 1, n);
if (u >= H(k + 0.5) - h(k)) {
return k;
}
}
}
private:
/** Clamp x to [min, max]. */
template<typename T>
static constexpr T clamp(const T x, const T min, const T max)
{
return std::max(min, std::min(max, x));
}
/** (exp(x) - 1) / x */
static double
expxm1bx(const double x)
{
return (std::abs(x) > epsilon)
? std::expm1(x) / x
: (1.0 + x/2.0 * (1.0 + x/3.0 * (1.0 + x/4.0)));
}
/** H(x) = log(x) if q == 1, (x^(1-q) - 1)/(1 - q) otherwise.
* H(x) is an integral of h(x).
*
* Note the numerator is one less than in the paper in order to work with all
* positive q.
*/
const RealType H(const RealType x)
{
const RealType log_x = std::log(x);
return expxm1bx((1.0 - q) * log_x) * log_x;
}
/** log(1 + x) / x */
static RealType
log1pxbx(const RealType x)
{
return (std::abs(x) > epsilon)
? std::log1p(x) / x
: 1.0 - x * ((1/2.0) - x * ((1/3.0) - x * (1/4.0)));
}
/** The inverse function of H(x) */
const RealType H_inv(const RealType x)
{
const RealType t = std::max(-1.0, x * (1.0 - q));
return std::exp(log1pxbx(t) * x);
}
/** The hat function h(x) = 1 / (x ^ q) */
const RealType h(const RealType x)
{
return std::exp(-q * std::log(x));
}
static constexpr RealType epsilon = 1e-8;
IntType n; ///< Number of elements
RealType q; ///< Exponent
RealType H_x1; ///< H(x_1)
RealType H_n; ///< H(n)
std::uniform_real_distribution<RealType> dist; ///< [H(x_1), H(n)]
};
The following line in your code is executed n times for each call to zipf():
sum_prob = sum_prob + c / pow((double) i, alpha);
It is regrettable that it is necessary to call the pow() function because, internally, this function sums not one but two Taylor series [considering that pow(x, alpha) == exp(alpha*log(x))]. If alpha is an integer, of course, then you can speed the code up a lot by replacing pow() with simple multiplication. If alpha is a rational number, then you may be able to speed the code up to a lesser degree by coding a Newton-Raphson iteration to take the place of the two Taylor series. If the last condition holds, please advise.
Fortunately, you have indicated that alpha does not change. Can you not speed the code up a lot by preparing a table of pow((double) i, alpha), then letting zipf() look numbers up in the table? That way, zipf() would not have to call pow() at all. I suspect that this would save significant time.
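A minimal sketch of that idea (in C++, to match the question; the table and helper names are illustrative):
#include <cmath>
#include <vector>

// Built once; alpha is fixed, so pow() never runs in the hot path.
std::vector<double> make_pow_table(int n, double alpha)
{
    std::vector<double> table(n + 1, 0.0);
    for (int i = 1; i <= n; ++i)
        table[i] = std::pow(static_cast<double>(i), alpha);
    return table;
}

// Inside zipf(), replace pow((double) i, alpha) with pow_table[i].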
Yet further improvements are possible. What if you factored a function sumprob() out of zipf()? Could you not prepare an even more aggressive look-up table for sumprob()'s use?
Maybe some of these ideas will move you in the right direction. See what you can do with them.
Update: I see that your question as now revised may not be able to use this answer. From the present point, your question may resolve into a question in complex variable theory. Such are often not easy questions, as you know. It may be that a sufficiently clever mathematician has discovered a relevant recurrence relation or some trick like the normal distribution's Box-Muller technique but, if so, I am not acquainted with the technique. Good luck. (It probably does not matter to you but, in case it does, the late N. N. Lebedev's excellent 1972 book Special Functions and Their Applications is available in English translation from the Russian in an inexpensive paperback edition. If you really, really wanted to crack this problem, you might read Lebedev next -- but, of course, that is a desperate measure, isn't it?)
As a complement to the very nice rejection-inversion implementation given above, here's a C++ class with the same API that is simpler and faster for a small number of bins only. On my machine, it's about 2.3x faster for N=300. It's faster because it performs a direct table lookup instead of computing logs and powers. The table eats cache, though... Making a guess based on the size of my CPU's d-cache, I imagine that the proper rejection-inversion algorithm given above will become faster for somewhere around N=35K, maybe. Also, initializing the table requires a call to std::pow() for each bin, so this wins on performance only if you are drawing more than N values out of it. Otherwise, rejection-inversion is faster. Choose wisely.
(I've set up the API so it looks a lot like what the std::c++ standards committee might come up with.)
/**
* Example usage:
*
* std::random_device rd;
* std::mt19937 gen(rd());
* zipf_table_distribution<> zipf(300);
*
* for (int i = 0; i < 100; i++)
* printf("draw %d %d\n", i, zipf(gen));
*/
template<class IntType = unsigned long, class RealType = double>
class zipf_table_distribution
{
public:
typedef IntType result_type;
static_assert(std::numeric_limits<IntType>::is_integer, "");
static_assert(!std::numeric_limits<RealType>::is_integer, "");
/// zipf_table_distribution(N, s)
/// Zipf distribution for `N` items, in the range `[1,N]` inclusive.
/// The distribution follows the power-law 1/n^s with exponent `s`.
/// This uses a table-lookup, and thus provides values more
/// quickly than zipf_distribution. However, the table can take
/// up a considerable amount of RAM, and initializing this table
/// can consume significant time.
zipf_table_distribution(const IntType n,
const RealType q=1.0) :
_n(init(n,q)),
_q(q),
_dist(_pdf.begin(), _pdf.end())
{}
void reset() {}
IntType operator()(std::mt19937& rng)
{
return _dist(rng);
}
/// Returns the parameter the distribution was constructed with.
RealType s() const { return _q; }
/// Returns the minimum value potentially generated by the distribution.
result_type min() const { return 1; }
/// Returns the maximum value potentially generated by the distribution.
result_type max() const { return _n; }
private:
std::vector<RealType> _pdf; ///< Prob. distribution
IntType _n; ///< Number of elements
RealType _q; ///< Exponent
std::discrete_distribution<IntType> _dist; ///< Draw generator
/** Initialize the probability mass function */
IntType init(const IntType n, const RealType q)
{
_pdf.reserve(n+1);
_pdf.emplace_back(0.0);
for (IntType i=1; i<=n; i++)
_pdf.emplace_back(std::pow((double) i, -q));
return n;
}
};
Here's a version that is 2x faster than drobilla's original post, plus it also supports non-zero deformation parameter q (aka Hurwicz q, q-series q or quantum group deformation q) and changes notation to conform to standard usage in number theory textbooks. Rigorously tested; see unit tests at https://github.com/opencog/cogutil/blob/master/tests/util/zipfUTest.cxxtest
Dual licensed under the MIT license or the GNU Affero license; please copy into the C++ standard as desired.
/**
* Zipf (Zeta) random distribution.
*
* Implementation taken from drobilla's May 24, 2017 answer to
* https://stackoverflow.com/questions/9983239/how-to-generate-zipf-distributed-numbers-efficiently
*
* That code is referenced with this:
* "Rejection-inversion to generate variates from monotone discrete
* distributions", Wolfgang Hörmann and Gerhard Derflinger
* ACM TOMACS 6.3 (1996): 169-184
*
* Note that the Hörmann & Derflinger paper, and the stackoverflow
* code base, incorrectly name the parameter `q` when they mean `s`.
* Their `q` has nothing to do with the q-series. The names in the code
* below conform to conventions.
*
* Example usage:
*
* std::random_device rd;
* std::mt19937 gen(rd());
* zipf_distribution<> zipf(300);
*
* for (int i = 0; i < 100; i++)
* printf("draw %d %d\n", i, zipf(gen));
*/
template<class IntType = unsigned long, class RealType = double>
class zipf_distribution
{
public:
typedef IntType result_type;
static_assert(std::numeric_limits<IntType>::is_integer, "");
static_assert(!std::numeric_limits<RealType>::is_integer, "");
/// zipf_distribution(N, s, q)
/// Zipf distribution for `N` items, in the range `[1,N]` inclusive.
/// The distribution follows the power-law 1/(n+q)^s with exponent
/// `s` and Hurwicz q-deformation `q`.
zipf_distribution(const IntType n=std::numeric_limits<IntType>::max(),
const RealType s=1.0,
const RealType q=0.0)
: n(n)
, _s(s)
, _q(q)
, oms(1.0-s)
, spole(abs(oms) < epsilon)
, rvs(spole ? 0.0 : 1.0/oms)
, H_x1(H(1.5) - h(1.0))
, H_n(H(n + 0.5))
, cut(1.0 - H_inv(H(1.5) - h(1.0)))
, dist(H_x1, H_n)
{
if (-0.5 >= q)
throw std::runtime_error("Range error: Parameter q must be greater than -0.5!");
}
void reset() {}
IntType operator()(std::mt19937& rng)
{
while (true)
{
const RealType u = dist(rng);
const RealType x = H_inv(u);
const IntType k = std::round(x);
if (k - x <= cut) return k;
if (u >= H(k + 0.5) - h(k))
return k;
}
}
/// Returns the parameter the distribution was constructed with.
RealType s() const { return _s; }
/// Returns the Hurwicz q-deformation parameter.
RealType q() const { return _q; }
/// Returns the minimum value potentially generated by the distribution.
result_type min() const { return 1; }
/// Returns the maximum value potentially generated by the distribution.
result_type max() const { return n; }
private:
IntType n; ///< Number of elements
RealType _s; ///< Exponent
RealType _q; ///< Deformation
RealType oms; ///< 1-s
bool spole; ///< true if s near 1.0
RealType rvs; ///< 1/(1-s)
RealType H_x1; ///< H(x_1)
RealType H_n; ///< H(n)
RealType cut; ///< rejection cut
std::uniform_real_distribution<RealType> dist; ///< [H(x_1), H(n)]
// This provides 16 decimal places of precision,
// i.e. good to (epsilon)^4 / 24 per the expansions of log, exp below.
static constexpr RealType epsilon = 2e-5;
/** (exp(x) - 1) / x */
static double
expxm1bx(const double x)
{
if (std::abs(x) > epsilon)
return std::expm1(x) / x;
return (1.0 + x/2.0 * (1.0 + x/3.0 * (1.0 + x/4.0)));
}
/** log(1 + x) / x */
static RealType
log1pxbx(const RealType x)
{
if (std::abs(x) > epsilon)
return std::log1p(x) / x;
return 1.0 - x * ((1/2.0) - x * ((1/3.0) - x * (1/4.0)));
}
/**
* The hat function h(x) = 1/(x+q)^s
*/
const RealType h(const RealType x)
{
return std::pow(x + _q, -_s);
}
/**
* H(x) is an integral of h(x).
* H(x) = [(x+q)^(1-s) - (1+q)^(1-s)] / (1-s)
* and if s==1 then
* H(x) = log(x+q) - log(1+q)
*
* Note that the numerator is one less than in the paper
* in order to work with all s. Unfortunately, the naive
* implementation of the above hits numerical underflow
* when q is larger than 10 or so, so we split into
* different regimes.
*
* When q != 0, we shift back to what the paper defined:
* H(x) = (x+q)^{1-s} / (1-s)
* and for q != 0 and also s==1, use
* H(x) = [exp{(1-s) log(x+q)} - 1] / (1-s)
*/
const RealType H(const RealType x)
{
if (not spole)
return std::pow(x + _q, oms) / oms;
const RealType log_xpq = std::log(x + _q);
return log_xpq * expxm1bx(oms * log_xpq);
}
/**
* The inverse function of H(x).
* H^{-1}(y) = [(1-s)y + (1+q)^{1-s}]^{1/(1-s)} - q
* Same convergence issues as above; two regimes.
*
* For s far away from 1.0 use the paper version
* H^{-1}(y) = -q + (y(1-s))^{1/(1-s)}
*/
const RealType H_inv(const RealType y)
{
if (not spole)
return std::pow(y * oms, rvs) - _q;
return std::exp(y * log1pxbx(oms * y)) - _q;
}
};
In the meantime, there is a faster way based on rejection-inversion sampling; see the code here.