Generate a random number from a log-normal distribution in C/C++ - c++

I am using MS Visual Studio 2010,
and now I would like to generate a random number in the range from 3 to 200 from a log-normal distribution.
I heard that the "central limit theorem" can convert the uniform distribution into a normal distribution, but it seems like too much work for me, because my range has 198 numbers:
a = random(MaxRange+1); // meaning I have to write this 198 times???!!!!
x = (a+.......)/198 ; // this will give a number with a normal distribution, right???
Then, may I just write
y = log (x); // and does this mean that y is log-normally distributed????
Thanks for answering my question.

Well, random will give you uniformly distributed random numbers, as you said correctly. In order to generate variables with a normal distribution you can use the Box-Muller transformation, which is simple to implement.
Next you need to generate your lognormal variable v by calculating v = exp(mu + sig * n), where n is your normally distributed random variable.
I don't quite understand what you mean by the range 3 to 200, as the lognormal distribution has support ]0, inf[.
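As a rough sketch of that approach (the function names and the placeholder parameters mu and sig are mine, not from the original post):

#include <cstdlib>
#include <cmath>

// Box-Muller: turn two uniform (0,1] variates into one standard normal variate.
double standard_normal()
{
    const double pi = 3.14159265358979323846;
    double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0); // shifted away from 0 to avoid log(0)
    double u2 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * pi * u2);
}

// Lognormal variate: v = exp(mu + sig * n), where n is standard normal.
double lognormal(double mu, double sig)
{
    return std::exp(mu + sig * standard_normal());
}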

You may want to look at the lognormal_distribution class inside the Boost random library. See here for an example of how to generate numbers from a given distribution (you have to instantiate a boost::variate_generator with a given random number generator plus an instance of the distribution).
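A minimal sketch of that setup, assuming the pre-1.47 Boost.Random headers (the seed and the parameters 1.0 and 0.5 are arbitrary placeholders):

#include <iostream>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/lognormal_distribution.hpp>
#include <boost/random/variate_generator.hpp>

int main()
{
    boost::mt19937 rng(42);                          // underlying random number generator
    boost::lognormal_distribution<> dist(1.0, 0.5);  // distribution instance
    boost::variate_generator<boost::mt19937&, boost::lognormal_distribution<> > gen(rng, dist);

    for (int i = 0; i < 5; ++i)
        std::cout << gen() << "\n";                  // lognormally distributed values
    return 0;
}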

Further to Azrael3000's answer,
If the lognormal variable lgn is generated as lgn = exp(mu + sig * stdn), where stdn is a standard normal variable, then note that the mu and sig for the equation above are given as:
mu = ln(m^2 / sqrt(v + m^2)) and sig = sqrt(ln(1 + v/m^2)), where m and v are the mean and variance of the non-logarithmized sample values.
Ref: wiki - Log-normal_distribution
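A tiny illustration of that conversion (m and v here are assumed to be the sample mean and variance you already have on the original scale):

#include <cmath>

// Parameters of the underlying normal, derived from the mean m and
// variance v of the non-logarithmized values.
double lognormal_mu(double m, double v)  { return std::log(m * m / std::sqrt(v + m * m)); }
double lognormal_sig(double m, double v) { return std::sqrt(std::log(1.0 + v / (m * m))); }

// lgn = exp(lognormal_mu(m, v) + lognormal_sig(m, v) * stdn) then has mean m and variance v.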

Related

How to randomly generate numbers from a normal distribution in Fortran given mean and standard deviation - fortran

Hi, I'm kind of new to using distributions in Fortran and not a very experienced coder. My question is how to generate "n" random numbers from a normal distribution, given the mean and standard deviation. Honestly, I have no idea how to get the normal distribution or how to use Box-Muller and all that stuff.
The expected output is a set of randomly generated numbers from the normal distribution.

Get an interval of possible values in regression using scikit-learn - machine learning

I am trying to use regression to predict a value. For a given set of independent variables, I get a fixed number as the expected value. However, is it possible to get a range of values, so as to say that the maximum possible value is, say, x and the minimum possible value is, say, y?
Using
from sklearn import linear_model    # scikit-learn's linear models
regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)          # X_train, Y_train: training data
pred = regr.predict([[a, b]])       # point prediction for one input vector
The value of pred comes out to be, say, 10, but I would rather want something like max = 12 and min = 8, simply put, a range of values.
UPDATE
Tried looking into GMM; not sure if that works for this.
Tried Gaussian processes, but that again gives a single value, something like 11.137631, which really doesn't help, as I am looking for a range of values rather than a single value.
Linear regression always gives the same result for a given input vector; however, running a random forest regressor over several iterations gives a different result on each iteration, and that spread can be used to get a minimum and maximum possible value for the forecast from a given input vector.

how can I get the lognormal_distribution from boost::random to act like boost::math::lognormal_distribution

I want to get a boost::variate_generator which gives me numbers distributed according to the lognormal distribution as described at http://en.wikipedia.org/wiki/Log-normal_distribution.
There is a distribution in boost::math that implements the formula from the Wikipedia entry, but it doesn't work with the variate_generator.
And the one from boost::random, which works with the variate_generator, is somewhat different from the one mentioned above:
http://www.boost.org/doc/libs/1_46_1/doc/html/boost/lognormal_distribution.html. Mu needs to be > 0, and mu and sigma are calculated instead of just being used as given.
Does someone have any idea how I can get it to work with the former formula?
EDIT
@Howard Hinnant:
There is this init() function which gets called in the constructor. So the formula is the same, but sigma and mean get calculated this way (why, I don't know):
_nmean = log(_mean*_mean/sqrt(_sigma*_sigma + _mean*_mean));
_nsigma = sqrt(log(_sigma*_sigma/_mean/_mean+result_type(1)));
I may be mistaken, but from inspecting the Boost code, it looks to me like the Boost lognormal_distribution is consistent with the Wikipedia description. The documentation would be consistent too if it dropped the N subscript from mu and sigma.
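If all you need is a generator that follows the Wikipedia parameterization, one possible workaround (a sketch, not the library's own API for this) is to let variate_generator produce normal variates with your mu and sigma and exponentiate them yourself:

#include <cmath>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/variate_generator.hpp>

double lognormal_sample(boost::mt19937& rng, double mu, double sigma)
{
    // mu and sigma are the mean and standard deviation of the *logarithm*,
    // i.e. the Wikipedia parameters.
    boost::normal_distribution<> norm(mu, sigma);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > gen(rng, norm);
    return std::exp(gen());
}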

Calculating a Random for C++

This is probably a super easy question, but I just wanted to make 10000% sure before I did it.
Basically I'm implementing a formula for a program; it takes certain values and does things with them, etc.
Anyway, let's say I have some values called:
N
Links_Retrieved
True_Links
True_Retrieved
I also have a percentage "scalar", I'll call it; for this example let's say the scalar is 10%.
Links_Retrieved is ALWAYS half of N, so that's easy to calculate.
BUT I want True_Links to be ANYWHERE from 1-10% of Links_Retrieved.
Then I want True_Retrieved to be anywhere from True_Links to 15% of Links_Retrieved.
How would I do this? Would it be something like
True_Link=(((rand()%(Scalar(10%)-1))+1)/100);
?
I would divide by 100 to get the "percent" value, i.e. 0.1, so it'd be anywhere from 0.01 to 0.1?
And to do True_Retrieved it'd be
True_Retrieved=(rand()%(.15-True_Link))+True_Link;
Am I doing this correctly, or am I WAYYYY off?
thanks
rand() is a very simple random number generator. The Boost libraries include Boost.Random. In addition to random number generators, Boost.Random provides a set of classes to generate specific distributions. It sounds like you would want a distribution that's uniform between 1% and 10%, i.e. 0.01 and 0.1. That's done with boost::random::uniform_real(0.01, 0.1).
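A small sketch of that suggestion, using a variate_generator the same way as in the lognormal answer above (the seed and variable names are just illustrative):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real.hpp>
#include <boost/random/variate_generator.hpp>

int main()
{
    boost::mt19937 rng(12345);
    boost::uniform_real<> pct(0.01, 0.10);   // uniform factor between 1% and 10%
    boost::variate_generator<boost::mt19937&, boost::uniform_real<> > draw(rng, pct);

    double fact1 = draw();                   // e.g. 0.0734
    return 0;
}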
Maybe it would be better to use a more advanced random number generator like the Mersenne Twister.
rand() produces integer values between 0 and RAND_MAX inclusive, so you have to scale that output (divide by RAND_MAX in floating point) to the interval you want. To get a value fact1 between 0.01 and 0.1 (1%-10%) you'd do:
perc1 = (rand()/(double)RAND_MAX)*9.0+1.0; //percentage 1-10 on the 0-100 scale
fact1 = perc1/100.0; //factor 0.01 - 0.1 on the 0-1 scale
To get a value between perc1 and 15 you'd do:
percrange = (15.0 - perc1);
perc2 = (rand()/(double)RAND_MAX)*percrange + perc1;
fact2 = perc2/100.0;
so your values become:
True_Links = fact1*Links_Retrieved;
True_Retrieved = fact2*Links_Retrieved;
This is sort-of-pseudocode. You should make sure perc1, perc2, fact1, fact2 and percrange are floating-point values, and that the final multiplications are done in floating point and rounded to integer numbers.
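Putting the corrected snippets together, a rough self-contained sketch (N = 1000 is a placeholder; the 10% and 15% bounds follow the question):

#include <cstdlib>
#include <ctime>

int main()
{
    std::srand(static_cast<unsigned>(std::time(0)));

    int N = 1000;                     // placeholder total
    int Links_Retrieved = N / 2;      // always half of N

    double perc1 = (std::rand() / (double)RAND_MAX) * 9.0 + 1.0;             // 1 - 10 (percent)
    double fact1 = perc1 / 100.0;                                            // 0.01 - 0.10
    double perc2 = (std::rand() / (double)RAND_MAX) * (15.0 - perc1) + perc1; // perc1 - 15
    double fact2 = perc2 / 100.0;                                            // fact1 - 0.15

    int True_Links     = static_cast<int>(fact1 * Links_Retrieved + 0.5);    // rounded
    int True_Retrieved = static_cast<int>(fact2 * Links_Retrieved + 0.5);

    return 0;
}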

Random number generator that produces a power-law distribution?

I'm writing some tests for a C++ command-line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get some numbers very frequently but most of them relatively infrequently.
Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.
Thanks!
This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).
The short answer (derivation at the above link):
x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))
where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
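A direct C++ translation of that formula (the function name and the use of rand() for the uniform variate are my choices; x0, x1 and n are whatever range and power you need):

#include <cstdlib>
#include <cmath>

// One power-law distributed value on [x0, x1] with power n,
// computed from a uniform variate y via the formula above.
double power_law_variate(double x0, double x1, double n)
{
    double y = std::rand() / (double)RAND_MAX;
    double a = std::pow(x1, n + 1.0);
    double b = std::pow(x0, n + 1.0);
    return std::pow((a - b) * y + b, 1.0 / (n + 1.0));
}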
If you know the distribution you want (called the probability density function (PDF)) and have it properly normalized, you can integrate it to get the cumulative distribution function (CDF), then invert the CDF (if possible) to get the transformation you need from the uniform [0,1] distribution to your desired distribution.
So you start by defining the distribution you want, the PDF:
P = F(x)
(for x in [0,1]), then integrate it to give the CDF:
C(y) = \int_0^y F(x) dx
If this can be inverted you get
y = C^{-1}(u)
So call rand(), scale the result to [0,1], plug it in as u in the last line, and use y.
This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.
Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare it to the PDF at the location indicated by your first throw. Reject if the second throw exceeds the PDF. This tends to be inefficient for PDFs with a lot of low-probability region, like those with long tails...
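A rough sketch of that rejection loop (the example pdf, proportional to (x+1)^-2.5, and the bound pdf_max are purely illustrative):

#include <cstdlib>
#include <cmath>

// An unnormalized pdf is fine for rejection sampling, as long as pdf_max
// bounds it from above on [lo, hi].
double pdf(double x) { return std::pow(x + 1.0, -2.5); }

double sample_by_rejection(double lo, double hi, double pdf_max)
{
    for (;;)
    {
        double x = lo + (hi - lo) * (std::rand() / (double)RAND_MAX); // first throw: location
        double u = pdf_max * (std::rand() / (double)RAND_MAX);        // second throw: height
        if (u <= pdf(x))   // accept when the height falls under the curve
            return x;
    }
}

// e.g. sample_by_rejection(0.0, 1.0, 1.0), since pdf(x) <= 1 on [0, 1].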
An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.
The real stinker here is that simple x^-n distributions are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...
I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although it is in R, the code is so simple as to be (pseudo)-pseudo-code.
One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is that the power-law exponent n (which is typically denoted alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, typically between -3 and -2.
x0 and x1 stand for the lower and upper limits of the distribution.
So here it is:
set.seed(0)
x1 = 5 # Maximum value
x0 = 0.1 # It can't be zero; otherwise x0 raised to a negative power is 1/0.
alpha = -2.5 # It has to be negative.
y = runif(1e7) # 1e7 uniform variates, one per sample
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="log density x", col=2)
or plotted in logarithmic scale:
plot(density(x), log="xy", ylab="log density x", col=2)
Here is the summary of the data:
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1000 0.1208 0.1584 0.2590 0.2511 4.9388
I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.
The included distribution presets are:
uniform_int
bernoulli_distribution
geometric_distribution
poisson_distribution
binomial_distribution
uniform_real
exponential_distribution
normal_distribution
gamma_distribution
When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
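As a tiny illustration of how the generator/engine/distribution pieces fit together (written with the C++11 <random> spellings, which are close to but not identical to the TR1 names), here is a uniform distribution on an engine feeding the inverse-CDF power-law transform from the accepted answer:

#include <random>
#include <cmath>
#include <iostream>

int main()
{
    std::mt19937 engine(42);                                  // engine
    std::uniform_real_distribution<double> uni(0.0, 1.0);     // distribution

    double x0 = 1.0, x1 = 100.0, n = -2.5;                    // example range and power
    for (int i = 0; i < 5; ++i)
    {
        double y = uni(engine);
        double x = std::pow((std::pow(x1, n + 1.0) - std::pow(x0, n + 1.0)) * y
                            + std::pow(x0, n + 1.0), 1.0 / (n + 1.0));
        std::cout << x << "\n";                               // power-law distributed value
    }
    return 0;
}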
Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student's t and Snedecor's F).