First let me explain the problem I'm trying to solve. I'm integrating my code with 3rd party library which does quite complicated financial predictions. For the purposes of this question let's just say I have a blackbox which returns y when I pass in x.
Now, what I need to do is find input (x) for a given output (y). Since I know lowest and highest possible input values I wrote the following algorithm:
define starting input range (minimum input value to maximum input value)
divide the range into two equal parts and find output for a middle value
find which half output falls into
repeat steps 2 and 3 until range is too small to divide any further
This algorithm does the job nicely, I don't see any problems with it. However, is there a faster way to solve this problem?
It sounds like x and y are strongly correlated (i.e. as x increases, so does y), as otherwise your divide and conquer algorithm wouldn't work.
Assumuing this is the case, and you could work out a correlation factor, then you might be able to multiply the midpoint by the correlation factor to potentially hone in the expected value quicker.
Please note that I've not tested this idea at all, but it's something to think about. Possible improvements would be to make the correlationFactor a moving average, or precompute it based on, say, the deciles between xLow and xHigh.
Also, this assumes that calling f(x) is relatively inexpensive. If it is expensive, then the increased number of calls to f(x) would dwarf any savings. In fact - I'm starting to think this is a stupid idea...
Hopefully the following pseudo-code illustrates what I mean:
DivideAndConquer(xLow, xHigh, correlationFactor, expectedValue)
xMid = (xHigh - xLow) * correlationFactor
// Add some range checks to make sure that xMid is within xLow and xHigh!!
y = f(xMid)
if (y == expectedValue)
return expectedValue
elseif (y < expectedValue)
correlationFactor = (xMid - xLow) / (f(xMid) - f(xLow))
return DivideAndConquer(xLow, xMid, correlationFactor, expectedValue)
else
correlationFactor = (xHigh - xMid) / (f(xHigh) - f(xMid))
return DivideAndConquer(xMid, xHigh, correlationFactor, expectedValue)
Related
In a cell, is it possible to do if(x=y, z, x) without having to repeat x in the value_if_false argument? Whether there is a way of using if() to make this work or another function doesn't matter, and there isn't a specific formula I'm struggling with as I come across this blocker quite often (hence posting).
To help illustrate the need, if we take x as a complex or more advanced formula, such as
ARRAYFORMULA(IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active")))
and I wanted to put an if statement in there that changed the result only if a condition was true, the formula would more than double in size due to having to reference x twice:
=ARRAYFORMULA(IF(IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active"))) = 0, "No data", IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active"))))
This is just an example (the code is irrelevant), I'm trying to keep my formulas neat, tidy and efficient so that handing off to others is easier. Then I'm also mindful that it is calculating the same complex formula twice, which would probably slow the spreadsheet down especially when iterated throughout a spreadsheet.
Interested to hear the community thoughts and suggestions on this, hopefully I was clear in explaining it. :)
The only simple way to achieve this would be with the use of helper columns. They don't need to be in the same sheet as your main equation, but they do need to be within that same spreadsheet as a whole (ie you could have a sheet named "calc" that's specifically used to calculate intermediate steps and set "variables" by referencing those cells).
The only other option (which gets a bit complicated) is to create a custom function within Google Apps Script. For example, if you wanted to calculate (B1*A4)/C5 in multiple places, you could create a custom function like this:
/**
* Returns a calculation using cells A4, B1, and C5.
* #return A calculation using cells A4, B1, and C5.
* #customfunction
*/
function x() {
var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('MainSheet');
var val1 = ss.getRange('B1').getValue();
var val2 = ss.getRange('A4').getValue();
var val3 = ss.getRange('C5').getValue();
return (val1*val2)/val3;
}
Then in your sheet, you could use this within a formula like this:
=if(A1="yes", x(), "no")
This custom function could obviously be altered to fit one's needs (ex taking in arguments to define the cells that the calculations should be done on instead of hard coding them, etc).
Other than this, there is currently no way to define variables within a formula itself.
This is possible to a certain extent, using TEXT's Meta Instructions, if you're using numbers and simple math conditions.
x
y
z
output
10
10
5
10
=TEXT(A3,"[="&B3&"]0;"&C3&"")
x
y
z
output
11
10
5
5
As long as your complex formula returns a number for x(or the output can be coerced to a number), this should be possible and it avoids repetition.
I agree, I would love if there was like a DECODE or NVL type function you could use so that you didn't need to repeat the original statement multiple times.
However, in many cases, when I encounter this, I can often reference another cell. Not in the way that has been suggested already, where the formula exists in another cell, but rather that the decision to perform the formula is based on another cell.
For example, using your values, lets assume the formula ((if(x=y, z, x)) only gets calculated when column 'w' is populated. Maybe column 'w' is a key component of the formula. Then you can write the formula as: if(w="",z,x). It's not exactly the same as testing the answer to the equation first and doesn't work in all situations, but in many cases I can find another field that's of key relevance to the formula that lets me get around this.
A construction company has 6 projects, for each they need $d_i$ workers. The company has no workers at the beginning of project 1.
Each new worker must take a safety course that costs 300, and 50 more for each worker.
If there is no new worker there is no course.
Firing a worker does not cost any money, and a workers can't be rehired.
Given that the salary of a worker is 100 per project, formulate a linear programming problem that minimizes the workers costs.
What I tried:
Let $x_i$ be the number of new workers for project $i$.
Let $y_i$ be the number of old workers remaining from previous projects until project $i$ (all the workers hired - all the workers that were fired)
Let $z_i$ be an indicator such that $z_i =0 \iff x_i>0$
The function I'm trying to solve is:
$\min(\sum_{i=1}^6 150x_i + 300(1-z_i) + 100y_i)$
s.t:
\begin{align}
x_i,y_i,z_i &\ge 0 \\
z_i &\ge 1-x_i \\
y_i + x_i &\ge d_i \\
y_i &\ge y_{i-1} + x_i
\end{align}
Something feels not right to me. The main reason is that I tried to use matlab to solve this and it failed.
What did I do wrong? How can I solve this question?
When I see this correctly you have two small mistakes in your constraints.
The first appears when you use z_i >= 1-x_i. This allows z_i to take the value 1 all the time, which will never give you the extra cost of 300. You need to upper bound z_i such that z_i will not be 1 when you have x_i>0. For this constraint you need something called big M. For sufficiently large M you would then use z_i <= 1-x_i/M. This way when x_i=0 you can have z_i=1, otherwise the right hand side is smaller than 1 and due to integrality z_i has to be zero. Note that you usually want to choose M as tight as possible. So in your case d_i might be a good choice.
The second small mistake lays in y_i >= y_{i-1} + x_i. This way you can increase y_i over y_{i-1} without having to set any x_i. To force x_i to increase you need to flip the inequality. Additionally by the way you defined y_i this inequality should refer to x_{i-1}. Thus you should end up with y_i <= y_{i-1} + x_{i-1}. Additionally you need to take care of corner cases (i.e. y_1 = 0)
I think with these two changes it should work. Let me know whether it helped you. And if it still doesn't work I might have missed something.
My son is learning how to calculate the formula for a parabola using a directrix and focus point on his Khan Academy course. (a,b) is the focus point, k is the parameter for the directrix as y=k. I wanted to show him a simple way to check his results using Sympy; programming helps hugely in solidifying internal algorithms. Step 1 is clearly to set the equation out.
Parabola = Eq(sqrt((y-k)**2),sqrt((x-a)**2+(y-b)**2))
I first solved this for y, intending then to show how to substitute values and derive the equation, thus:
Y = solve(Parabola,y)
This was in a reasonable form, having collected the 1/(2b-2k) to the outside.
Next, I substituted the value of the focus and directrix into the equation, obtaining the equation y= 1/6*(x**2+16*x+49), which is correct.
He needed next to resolve this in a form (x+c1)(x+c2)+remainder. There does not seem to be a direct way to factor from the equation above into this form, at least not from an hour searching the docs.
Answer = Y[0].subs({a:-8,b:-1,k:-4})
factor(Answer,deep=True)
Of course I understand how to reduce to a square factorization plus remainder; my question is solely whether this is possible in sympy and, if so, how?
A second, perhaps trivial, question is why Sympy returns some factorizations as (constant - x) where (x -constant) is preferred: is there a way of specifying the form?
Thanks for any help, on behalf of my son, to whom I am showing the wonders of Sympy.
The process is usually called "completing the square". It is not implemented as a single SymPy method, but one can use the SymPy equation solver to find the coefficients of such a form of the polynomial:
>>> var('A B C')
>>> solve(Eq(Answer, A*(x-B)**2 + C), [A, B, C])
[(1/6, -8, -5/2)]
So the parabola vertex is at (8, -5/2), and the polynomial can be written as 1/6*(x+8)**2 - 5/2
This is probably a super easy question, but I just wanted to make 10000% sure before I did it.
Basically Im doing a formula for a program, it takes some certain values and does things when them.....etc..
Anyways Lets say I have some values called:
N
Links_Retrieved
True_Links
True_Retrieved.
I also have a % "scalar" ill call it, for this example lets say the % scalar is 10%.
Links Retrieved is ALWAYS half of N, so that's easy to calculate.
BUT I want True_Links to be ANYWHERE from 1-10% of Links_Retrieved.
Then I want True_Retrieved to be anywhere from The True_Links to 15% of Links_Retrieved.
How would I do this? would it be something like
True_Link=(((rand()%(Scalar(10%)-1))+1)/100);
?
I would divide by 100 to get the "percent" value IE .1 so it's be anywhere from .01 to .1?
and to do the True_retrieved it'd be
True_Retrieved=(rand()%(.15-True_Link))+True_Link;
am I doing this correct or am I WAYYYY off?
thanks
rand() is a very simple Random Number Generator. The Boost libraries include Boost.Random. In addition to random number generators, Boost.Random provides a set of classes to generate specific distirbutions. It sounds like you would want a distribution that's random between 1% and 10%, i.e. 0.01 and 0.1. That's done with boost::random::uniform_real(0.01, 0.1).
Maybe it would be better to use advanced random generator like Mersenne Twister.
rand() produces values between 0.0 and 1.0 inclusive, you have to scale that output to the interval you want. To get a value fact1 between 0.01 and 0.1 (1%-10%) you'd do:
perc1 = (rand()/RAND_MAX)*9.0+1.0; //percentage 1-10 on the 0-100 scale
fact1 = perc1/100.0; //factor 0.01 - 0.1 on the 0-1 scale
to get a value between perc1 and 0.15 you'd do:
percrange = (15.0 - perc1);
perc2 = (rand()/RAND_MAX)*percrange + perc1;
fact2 = perc2/100.0;
so your values become:
True_Links = fact1*Links_Retrieved;
True_Retrieved = fact2*Links_Retrieved;
This is sort-of-pseudocode. You should make sure parc1, perc2, fact1, fact2 and percrange are floating point values, and the final multiplications are done in floating point and rounded to integer numbers.
I'm writing some tests for a C++ command line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get a some numbers very frequently but most of them relatively infrequently.
Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.
Thanks!
This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).
The short answer (derivation at the above link):
x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))
where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
If you know the distribution you want (called the Probability Distribution Function (PDF)) and have it properly normalized, you can integrate it to get the Cumulative Distribution Function (CDF), then invert the CDF (if possible) to get the transformation you need from uniform [0,1] distribution to your desired.
So you start by defining the distribution you want.
P = F(x)
(for x in [0,1]) then integrated to give
C(y) = \int_0^y F(x) dx
If this can be inverted you get
y = F^{-1}(C)
So call rand() and plug the result in as C in the last line and use y.
This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.
Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare to the PDF at the location indeicated by your first throw. Reject if the second throw exceeds the PDF. Tends to be inefficient for PDFs with a lot of low probability region, like those with long tails...
An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.
The real stinker here is that simple x^-n distributions are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...
I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.
One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is the fact that the power law exponent n (which is typically denoted as alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, and typically between 2 and 3.
x0 and x1 stand for the lower and upper limits of the distribution.
So here it is:
set.seed(0)
x1 = 5 # Maximum value
x0 = 0.1 # It can't be zero; otherwise X^0^(neg) is 1/0.
alpha = -2.5 # It has to be negative.
y = runif(1e7) # Number of samples
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="log density x", col=2)
or plotted in logarithmic scale:
plot(density(x), log="xy", ylab="log density x", col=2)
Here is the summary of the data:
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1000 0.1208 0.1584 0.2590 0.2511 4.9388
I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.
The included distribution presets are:
uniform_int
bernoulli_distribution
geometric_distribution
poisson_distribution
binomial_distribution
uniform_real
exponential_distribution
normal_distribution
gamma_distribution
When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)