This is probably a super easy question, but I just wanted to make 10000% sure before I did it.
Basically I'm writing a formula for a program; it takes certain values and does things with them, etc.
Anyway, let's say I have some values called:
N
Links_Retrieved
True_Links
True_Retrieved
I also have a percentage "scalar", as I'll call it; for this example let's say the scalar is 10%.
Links_Retrieved is ALWAYS half of N, so that's easy to calculate.
BUT I want True_Links to be ANYWHERE from 1-10% of Links_Retrieved.
Then I want True_Retrieved to be anywhere from True_Links to 15% of Links_Retrieved.
How would I do this? Would it be something like
True_Link=(((rand()%(Scalar(10%)-1))+1)/100);
?
I would divide by 100 to get the "percent" value, i.e. 0.1, so it'd be anywhere from 0.01 to 0.1?
And to do the True_Retrieved it'd be
True_Retrieved=(rand()%(.15-True_Link))+True_Link;
Am I doing this correctly or am I WAYYYY off?
thanks
rand() is a very simple random number generator. The Boost libraries include Boost.Random. In addition to random number generators, Boost.Random provides a set of classes that generate specific distributions. It sounds like you want a value distributed uniformly between 1% and 10%, i.e. between 0.01 and 0.1. That's done with boost::random::uniform_real(0.01, 0.1).
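For example, something along these lines should work (a minimal sketch assuming a reasonably recent Boost, where the class is spelled boost::random::uniform_real_distribution; the variable names and the value of N come from the question):

#include <boost/random.hpp>

int main() {
    boost::random::mt19937 gen(42);   // Mersenne Twister engine, fixed seed for the example
    // Uniformly distributed factor between 1% and 10%.
    boost::random::uniform_real_distribution<double> percent(0.01, 0.1);

    int N = 1000;                     // example input
    int Links_Retrieved = N / 2;      // always half of N
    int True_Links = static_cast<int>(percent(gen) * Links_Retrieved);
    (void)True_Links;
    return 0;
}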
Maybe it would be better to use a more advanced random number generator, like the Mersenne Twister.
rand() actually produces integer values between 0 and RAND_MAX inclusive; dividing by RAND_MAX (in floating point) gives a value between 0.0 and 1.0, which you then scale to the interval you want. To get a value fact1 between 0.01 and 0.1 (1%-10%) you'd do:
perc1 = ((double)rand()/RAND_MAX)*9.0+1.0; // percentage 1-10 on the 0-100 scale
fact1 = perc1/100.0; // factor 0.01-0.1 on the 0-1 scale
to get a value between perc1 and 15 you'd do:
percrange = (15.0 - perc1);
perc2 = ((double)rand()/RAND_MAX)*percrange + perc1;
fact2 = perc2/100.0;
so your values become:
True_Links = fact1*Links_Retrieved;
True_Retrieved = fact2*Links_Retrieved;
This is sort-of pseudo-code. You should make sure perc1, perc2, fact1, fact2 and percrange are floating-point values, and that the final multiplications are done in floating point and then rounded to integers.
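Putting it together, here is a self-contained sketch of the above; the example value of N and the rounding at the end are my own choices:

#include <cmath>     // lround
#include <cstdlib>   // rand, srand, RAND_MAX
#include <ctime>     // time

int main() {
    std::srand(static_cast<unsigned>(std::time(0)));

    int N = 1000;                      // example input
    int Links_Retrieved = N / 2;       // always half of N

    // Cast to double so the division isn't integer division.
    double u1 = static_cast<double>(std::rand()) / RAND_MAX;  // 0.0 - 1.0
    double perc1 = u1 * 9.0 + 1.0;                            // 1 - 10 (percent)
    double fact1 = perc1 / 100.0;                             // 0.01 - 0.10

    double u2 = static_cast<double>(std::rand()) / RAND_MAX;
    double perc2 = u2 * (15.0 - perc1) + perc1;               // perc1 - 15 (percent)
    double fact2 = perc2 / 100.0;

    long True_Links = std::lround(fact1 * Links_Retrieved);
    long True_Retrieved = std::lround(fact2 * Links_Retrieved);
    (void)True_Links;
    (void)True_Retrieved;
    return 0;
}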
The drmModeModeInfo structure from DRM contains a uint32_t vrefresh; field, and the values there are actually good, i.e. I'm getting 24-75 Hz for different video modes. But refresh rates aren't integers, they're rational numbers; right now for my display the value is 59997/1000.
Is it possible to get the precise numbers on Linux? Or at least a floating point value?
The numerator is drmModeModeInfo::clock * 1000; the clock field is in kilohertz and we need Hz for the formula.
The denominator is the product of the drmModeModeInfo::htotal and drmModeModeInfo::vtotal fields.
For a better result, I simplify the rational by dividing both numerator and denominator by their greatest common divisor. For this part, I used an algorithm from Wikipedia.
I'm not sure whether current displays support signal frequencies above 2^32 Hz = 4.29 GHz, but even if they don't, future ones may, i.e. you'd better use 64-bit integer math there.
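A minimal sketch of that computation (assuming C++17, with std::gcd standing in for the hand-rolled Euclid, and the libdrm header that declares drmModeModeInfo):

#include <cstdint>
#include <numeric>        // std::gcd (C++17)
#include <xf86drmMode.h>  // drmModeModeInfo (libdrm)

// Exact refresh rate of `mode` as the reduced fraction num/den, in Hz.
void refresh_rate(const drmModeModeInfo &mode, uint64_t &num, uint64_t &den) {
    num = static_cast<uint64_t>(mode.clock) * 1000u;            // clock is in kHz
    den = static_cast<uint64_t>(mode.htotal) * mode.vtotal;
    if (const uint64_t g = std::gcd(num, den)) {
        num /= g;
        den /= g;
    }
}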
I simply use this:
drmModeModeInfoPtr mode = 0;
...
// clock is in kHz, so multiply by 1000 to get Hz.
double freq = mode->clock * 1000.0 / (mode->htotal * mode->vtotal);
// Round to three decimal places.
freq = round(freq * 1000.0) / 1000.0;
I have one number (203,400) for which I need to find the ceiling and the floor value in order to use it to create a weighted average. From this number I want 200,000 and 210,000. The code I was using, which doesn't work, is:
S1CovA_ceil = ceil(S1CovA,10000);
S1CovA_floor = floor(S1CovA,10000);
When I run this program, I get these errors:
ERROR 72-185: The CEIL function call has too many arguments.
ERROR 72-185: The FLOOR function call has too many arguments.
Does anybody know a way around this or different SAS code I could use?
CEIL and FLOOR only remove decimals, i.e. they round to an integer value. If you want the number rounded up or down to a multiple of 10,000, you have to do it a bit more indirectly:
S1CovA_ceil = ceil(s1covA/10000)*10000;
And the same for floor. Basically you divide by the desired rounding unit, round with ceil/floor, and then multiply back.
Unfortunately, as far as I'm aware, SAS doesn't allow rounding in a particular direction except for straight integer rounding.
You can also use the round() function...
%LET ROUNDTO = 10000 ;
data xyz ;
S1CovA_ceil = round(S1CovA+(&ROUNDTO / 2),&ROUNDTO) ;
S1CovA_floor = round(S1CovA-(&ROUNDTO / 2),&ROUNDTO) ;
run ;
Try
S1CovA_ceil = ceil(S1CovA/10000)*10000;
S1CovA_floor = floor(S1CovA/10000)*10000;
First let me explain the problem I'm trying to solve. I'm integrating my code with a 3rd-party library which does quite complicated financial predictions. For the purposes of this question, let's just say I have a black box which returns y when I pass in x.
Now, what I need to do is find input (x) for a given output (y). Since I know lowest and highest possible input values I wrote the following algorithm:
1. define the starting input range (minimum input value to maximum input value)
2. divide the range into two equal parts and find the output for the middle value
3. find which half the output falls into
4. repeat steps 2 and 3 until the range is too small to divide any further
This algorithm does the job nicely, I don't see any problems with it. However, is there a faster way to solve this problem?
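For reference, here is a minimal C++ sketch of that bisection loop; the toy blackbox function, the bounds and the tolerance are stand-ins, not from the real library:

#include <iostream>

// Stand-in for the third-party black box; assumed monotonically increasing.
double blackbox(double x) { return 2.0 * x + 1.0; }

// Bisection: repeatedly halve [lo, hi] until it is too small to split further.
double findInput(double lo, double hi, double target, double eps = 1e-9) {
    while (hi - lo > eps) {
        const double mid = 0.5 * (lo + hi);
        if (blackbox(mid) < target)
            lo = mid;   // the wanted input is in the upper half
        else
            hi = mid;   // the wanted input is in the lower half
    }
    return 0.5 * (lo + hi);
}

int main() {
    std::cout << findInput(0.0, 100.0, 21.0) << "\n";  // prints ~10
}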
It sounds like x and y are strongly correlated (i.e. as x increases, so does y), as otherwise your divide and conquer algorithm wouldn't work.
Assuming this is the case, and you could work out a correlation factor, then you might be able to multiply the midpoint by the correlation factor to potentially home in on the expected value quicker.
Please note that I've not tested this idea at all, but it's something to think about. Possible improvements would be to make the correlationFactor a moving average, or precompute it based on, say, the deciles between xLow and xHigh.
Also, this assumes that calling f(x) is relatively inexpensive. If it is expensive, then the increased number of calls to f(x) would dwarf any savings. In fact - I'm starting to think this is a stupid idea...
Hopefully the following pseudo-code illustrates what I mean:
DivideAndConquer(xLow, xHigh, correlationFactor, expectedValue)
    xMid = xLow + (xHigh - xLow) * correlationFactor
    // Add some range checks to make sure that xMid stays within xLow and xHigh!!
    y = f(xMid)
    if (y == expectedValue)
        return xMid
    elseif (y < expectedValue)
        // output too low, so the answer lies in the upper half
        correlationFactor = (xHigh - xMid) / (f(xHigh) - f(xMid))
        return DivideAndConquer(xMid, xHigh, correlationFactor, expectedValue)
    else
        // output too high, so the answer lies in the lower half
        correlationFactor = (xMid - xLow) / (f(xMid) - f(xLow))
        return DivideAndConquer(xLow, xMid, correlationFactor, expectedValue)
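As a hedged, runnable C++ take on the same idea: pick each probe point by linear interpolation between the bracket endpoints (essentially the false-position method) instead of the plain midpoint. The function f, its monotonicity and the tolerance are assumptions for illustration:

#include <algorithm>  // std::min, std::max
#include <iostream>

// Stand-in for the third-party black box; assumed monotonically increasing.
double f(double x) { return x * x * x + x; }

// Find x in [xLow, xHigh] with f(x) close to expectedValue.
double FindInput(double xLow, double xHigh, double expectedValue, double eps = 1e-9) {
    double yLow = f(xLow), yHigh = f(xHigh);
    while (xHigh - xLow > eps) {
        // Fraction of the range where a straight line between the endpoints
        // would hit the target; clamp it so the bracket always shrinks.
        double t = (expectedValue - yLow) / (yHigh - yLow);
        t = std::min(std::max(t, 0.01), 0.99);
        const double xMid = xLow + (xHigh - xLow) * t;
        const double y = f(xMid);
        if (y < expectedValue) { xLow = xMid; yLow = y; }
        else                   { xHigh = xMid; yHigh = y; }
    }
    return 0.5 * (xLow + xHigh);
}

int main() {
    std::cout << FindInput(0.0, 10.0, 130.0) << "\n";  // ~5 for this toy f
}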
I am using MS Visual Studio 2010, and I would like to generate a random number in the range from 3 to 200 with a log-normal distribution.
I heard that the "central limit theorem" can convert a uniform distribution to a normal distribution, but it seems like too much work for me, because my range has 198 numbers:
a = random(MaxRange+1); // meaning I have to write this 198 times???!!!!
x = (a+.......)/198; // this will give a number with a normal distribution, right???
Then may I just write
y = log(x); // and does this mean that y has a log-normal distribution????
Thanks for answering my question.
Well, random will give you uniformly distributed random numbers, as you correctly said. In order to generate variables with a normal distribution you can use the Box-Muller transformation, which is simple to implement.
Next you need to generate your lognormal variable v by calculating v = exp(mu + sig * n), where n is your normally distributed random variable.
I don't quite understand what you mean by the range 3 to 200, as the lognormal distribution has support ]0, inf[.
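A minimal sketch of that recipe (the use of rand() is just for illustration, and pi is written out because M_PI isn't guaranteed by the standard):

#include <cmath>
#include <cstdlib>

// Box-Muller: turn two uniform (0,1] variates into one standard normal variate.
double standard_normal() {
    const double pi = 3.14159265358979323846;
    const double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0);  // in (0,1], so log(u1) is finite
    const double u2 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * pi * u2);
}

// Lognormal variate: exp(mu + sig * n) with n standard normal.
double lognormal(double mu, double sig) {
    return std::exp(mu + sig * standard_normal());
}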
You may want to look at the lognormal_distribution class in the Boost.Random library. See here for an example of how to generate numbers from a given distribution (you have to instantiate a boost::variate_generator with a given random number generator plus an instance of the distribution).
Further to Azrael3000's answer: if the lognormal variable lgn is generated as lgn = exp(mu + sig * stdn), where stdn is a standard normal variable, note that the mu and sig in that equation are parameters of the underlying normal, not of the samples themselves. If m and v are the mean and variance of the non-logarithmized sample values, then
mu = ln(m^2 / sqrt(v + m^2))
sig = sqrt(ln(1 + v / m^2))
Ref: wiki - Log-normal_distribution
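As a sketch of how to plug those formulas in, using the C++11 <random> header (m and v below are arbitrary example values; note that std::lognormal_distribution takes the mu and sigma of the underlying normal):

#include <cmath>
#include <iostream>
#include <random>

int main() {
    // Desired mean and variance of the (non-logarithmized) samples - example values.
    const double m = 100.0, v = 2500.0;

    // Parameters of the underlying normal, per the formulas above.
    const double mu  = std::log(m * m / std::sqrt(v + m * m));
    const double sig = std::sqrt(std::log(1.0 + v / (m * m)));

    std::mt19937 gen(42);
    std::lognormal_distribution<double> dist(mu, sig);  // takes the normal's mu and sigma
    for (int i = 0; i < 5; ++i)
        std::cout << dist(gen) << "\n";
}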
I'm writing some tests for a C++ command-line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get some numbers very frequently but most of them relatively infrequently.
Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.
Thanks!
This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).
The short answer (derivation at the above link):
x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))
where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
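A hedged C++ translation of that formula (rand() is used only for simplicity; round or truncate the result if you need integers):

#include <cmath>
#include <cstdlib>

// Power-law distributed variate on [x0, x1] with exponent n (n != -1),
// using the inverse-transform formula quoted above.
double power_law(double x0, double x1, double n) {
    const double y = static_cast<double>(std::rand()) / RAND_MAX;  // uniform [0,1]
    const double a = std::pow(x0, n + 1.0);
    const double b = std::pow(x1, n + 1.0);
    return std::pow((b - a) * y + a, 1.0 / (n + 1.0));
}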
If you know the distribution you want (its probability density function, or PDF) and have it properly normalized, you can integrate it to get the cumulative distribution function (CDF), then invert the CDF (if possible) to get the transformation you need from the uniform [0,1] distribution to your desired one.
So you start by defining the distribution you want,
P = F(x)
(for x in [0,1]), then integrate it to get the CDF
C(y) = \int_0^y F(x) dx
If this can be inverted, you get
y = C^{-1}(u)
So call rand() to get a uniform variate u, plug it into the last line, and use the resulting y.
This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.
Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare it to the PDF at the location indicated by your first throw. Reject the first throw if the second one exceeds the PDF there. This tends to be inefficient for PDFs with a lot of low-probability region, like those with long tails.
An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.
The real stinker here is that simple x^-n distributions are not normalizable on the range [0,1] (the integral diverges at zero), so you can't use the sampling theorem directly. Try (x+1)^-n instead...
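A rough sketch of the lookup-table approach, using a PDF proportional to (x+1)^-n on [0,1]; the range, table size and use of rand() are illustrative choices:

#include <algorithm>  // std::lower_bound
#include <cmath>
#include <cstdlib>
#include <vector>

// Brute-force inverse-CDF sampler for a PDF proportional to (x+1)^-n on [0,1].
struct TabulatedSampler {
    std::vector<double> cdf;  // cdf[i] ~ P(X <= i/(N-1))
    explicit TabulatedSampler(double n, std::size_t N = 1000) : cdf(N) {
        double sum = 0.0;
        for (std::size_t i = 0; i < N; ++i) {
            const double x = static_cast<double>(i) / (N - 1);
            sum += std::pow(x + 1.0, -n);                      // unnormalized PDF
            cdf[i] = sum;
        }
        for (std::size_t i = 0; i < N; ++i) cdf[i] /= sum;     // normalize: cdf.back() == 1
    }
    // Reverse lookup: map a uniform variate through the tabulated CDF.
    double sample() const {
        const double u = static_cast<double>(std::rand()) / RAND_MAX;
        const std::size_t i = std::lower_bound(cdf.begin(), cdf.end(), u) - cdf.begin();
        return static_cast<double>(i) / (cdf.size() - 1);
    }
};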
I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although it's in R, the code is so simple as to be (pseudo)-pseudo-code.
One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is that the power-law exponent n (typically denoted alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, typically between -3 and -2.
x0 and x1 stand for the lower and upper limits of the distribution.
So here it is:
set.seed(0)
x1 = 5 # Maximum value
x0 = 0.1 # Must be greater than zero; with a negative exponent, x0 = 0 would mean dividing by zero.
alpha = -2.5 # It has to be negative.
y = runif(1e7) # Number of samples
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="log density x", col=2)
or plotted in logarithmic scale:
plot(density(x), log="xy", ylab="log density x", col=2)
Here is the summary of the data:
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1208  0.1584  0.2590  0.2511  4.9388
I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.
The included distribution presets are:
uniform_int
bernoulli_distribution
geometric_distribution
poisson_distribution
binomial_distribution
uniform_real
exponential_distribution
normal_distribution
gamma_distribution
When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
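For instance, a minimal sketch of the engine-plus-distribution pattern, written with the C++11 names (the TR1 versions live in the std::tr1 namespace):

#include <iostream>
#include <random>

int main() {
    std::mt19937 engine(12345);                       // Mersenne Twister engine
    std::normal_distribution<double> dist(0.0, 1.0);  // one of the preset distributions
    for (int i = 0; i < 5; ++i)
        std::cout << dist(engine) << "\n";            // the distribution draws from the engine
}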
Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)