How best to handle a blank numerator in DIVIDE? - powerbi

DIVIDE handles the division-by-zero error by returning an alternate result. However, if the numerator is blank, it returns blank.
There are 3 ways to solve this:
COALESCE(DIVIDE(n,d),0)
Use IF with ISBLANK(n) to return 0 when the numerator is blank, and otherwise perform the division.
Add 0 to the numerator, for example: DIVIDE(n+0, d)
Which of the above is the most performant and clean approach?

The most performant and clean approach is to leave the blanks. Be careful when converting blanks to zero: by forcing the measure to always return a value, a table or chart will show values for groupings where it otherwise wouldn't.
If you are using the measure in a card, you can get away with any of the three, or with DIVIDE(n, d, 0), as the difference in performance between them should be negligible.
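For illustration, the two behaviours might look like this as measures (a sketch only; Sales[Amount] and Sales[Quantity] are placeholder column names):
Ratio = DIVIDE ( SUM ( Sales[Amount] ), SUM ( Sales[Quantity] ) )  -- blank-preserving, usually preferred
Ratio Zero = COALESCE ( [Ratio], 0 )  -- forces a value everywhere; fine on a card
A card will render either; in a table or matrix, only the blank-preserving version lets groupings without data drop out.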

Related

DoCPLEX Solving LP Problem Partially at a time

I am working on a linear programming problem with 800K constraints. The full problem takes 20 minutes to solve, but if I solve it for half the horizon it takes just 1 minute. Is there a way in DOcplex to solve for a partial horizon and then use that solution to solve the other half of the problem, without using a for-loop?
Three suggestions:
Export your problem as LP or SAV, load it into the CPLEX interactive optimizer, and run display problem stats. This might show (or rule out) precision issues (an ill-conditioned problem); it also reports the number of nonzeros.
Set the datacheck parameter to 2; this might detect numerical issues in the data.
Have you tried different LP algorithms? Using the lpmethod parameter you can try the primal, dual, or barrier algorithm to see whether one runs faster on your problem.
Reference:
https://www.ibm.com/support/knowledgecenter/SSSA5P_12.10.0/ilog.odms.cplex.help/CPLEX/Parameters/topics/LPMETHOD.html
In DOcplex:
model.parameters.datacheck = 2
model.parameters.lpmethod = 4 # for barrier
From your answers, I can think of the following:
If you are solving a pure LP (is that true?), I see no point in rounding numbers (it would help in a MIP, though: try rounding coefficients whose fractional part is, say, less than 1e-7, e.g. 4.0000001 -> 4; see the sketch after this list).
A condition number of 1e+14 indicates a serious modeling issue: a common source is mixing different objectives through weighted coefficients. Have you tried multi-objective optimization to avoid that?
Another source is big-M formulations, to which you should prefer indicator constraints. If neither applies, try renormalizing the data to keep the condition number in a smaller range.
Finally, you might try setting the Markowitz tolerance to 0.99 to add extra caution to the simplex factorizations, but behavior may vary from one dataset to another.
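A minimal Python sketch of the rounding and Markowitz points, in the spirit of the parameter lines above (the Markowitz parameter path mirrors the CPLEX parameter hierarchy and should be checked against your docplex version; round_near_integer is a made-up helper name):
def round_near_integer(x, tol=1e-7):
    # Round x to the nearest integer only when it is within tol of one,
    # e.g. 4.0000001 -> 4.0; otherwise leave it untouched.
    r = round(x)
    return float(r) if abs(x - r) < tol else x

model.parameters.simplex.tolerances.markowitz = 0.99  # assumed parameter path; adds caution to simplex factorizations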

stata: inequality constraint in xttobit

Is it possible to constrain parameters in Stata's xttobit to be non-negative? I read a paper where the authors said they did just that, and I am trying to work out how.
I know that you can constrain parameters to be strictly positive by exponentially transforming the variables (e.g. gen x1_e = exp(x1)) and then calling nlcom after estimation (e.g. nlcom exp(_b[x1:_y]), where y is the independent variable). That may not be exactly right, but I am pretty sure the general idea is correct. (There is a similar question on Statalist re: nlsur.)
But what would a non-negative constraint look like? I know that one way to proceed is by transforming the variables, for example squaring them. However, I tried this with the authors' data and still found negative estimates from xttobit. Sorry if this is a trivial question, but it has me a little confused.
(Note: this was first posted on CV by mistake. Mea culpa.)
Update: It seems I misunderstand what transformation means. Suppose we want to estimate the following random effects model:
y_{it} = a + b*x_{it} + v_i + e_{it}
where v_i is the individual random effect for i and e_{it} is the idiosyncratic error.
From the first answer, would, say, an exponential transformation to constrain all coefficients to be positive look like:
y_{it} = exp(a) + exp(b)*x_{it} + v_i + e_{it}
?
I think your understanding of constraining parameters by transforming the associated variable is incorrect. You don't transform the variable; rather, you fit your model after re-expressing it in terms of transformed parameters. For more details, see the FAQ at http://www.stata.com/support/faqs/statistics/regression-with-interval-constraints/, and be prepared to work harder on your problem than you might have expected: you will need to replace xttobit with mlexp and a transformed parameterization of the tobit log-likelihood function.
With regard to the difference between non-negative and strictly positive constraints, for continuous parameters all such constraints are effectively non-negative, because (for reasonable parameterization) a strictly positive constraint can be made arbitrarily close to zero.
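To make the re-expression concrete for the model in the question (this is the idea only, not mlexp syntax; theta is a new, unconstrained parameter introduced for illustration):
y_{it} = a + exp(theta)*x_{it} + v_i + e_{it},   where b = exp(theta)
The estimation is over theta, which is unrestricted, while the slope b = exp(theta) is guaranteed positive; because exp(theta) can be made arbitrarily close to zero, this behaves like a non-negative constraint in practice. Note the difference from the question's y_{it} = exp(a) + exp(b)*x_{it} + ...: the original coefficients are not themselves exponentiated in the fitted model; instead new parameters are estimated whose exponentials play the role of the constrained coefficients.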

When to use machine epsilon and when not to?

I'm reading a book about rendering 3d graphics and the author sometimes uses epsilon and sometimes doesn't.
Notice the if at the beginning that uses epsilon and the other ifs that don't.
What's the logic behind this? I can see he avoids any chance of division by zero, but when epsilon is not used in the function there's still a chance it will return a value that makes the outer code divide by zero.
Book is Real-Time Rendering 3rd Edition, by the way.
The first statement, if(|f| > ϵ), is just checking to make sure f is significantly different from 0. It's important to do that at that specific spot in the code because the next two statements divide by f.
The other statements don't need to do that, so they don't need to use ϵ.
For example,
if(t1 > t2) swap(t1, t2);
is a self-contained statement that compares two numbers to each other and swaps them if the wrong one is greater. Since it's not comparing to see if a value is close to 0, there's no need to use ϵ.
If the value that is returned from this block of code can make the calling code divide by zero, that should be handled in the calling code.
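A minimal C++ sketch of the two kinds of test being contrasted (the function and variable names are invented for illustration; the book's actual code differs):
#include <cmath>
#include <optional>
#include <utility>

// Returns the sorted pair (num1/f, num2/f), or nothing when f is effectively zero.
std::optional<std::pair<double, double>> solveInterval(double f, double num1, double num2)
{
    const double eps = 1e-9;            // tolerance; the right value depends on the data's scale
    if (std::abs(f) <= eps)
        return std::nullopt;            // f is effectively zero: refuse to divide by it
    double t1 = num1 / f;               // safe only because of the epsilon check above
    double t2 = num2 / f;
    if (t1 > t2) std::swap(t1, t2);     // plain comparison, no division, so no epsilon needed
    return std::make_pair(t1, t2);
}
The caller then checks whether a result was returned before using it, which is where a potential division by zero in the outer code should be handled.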

Another way to calculate with double type variables in C++?

Short version of the question: with the current settings I get overflow or timeouts when calculating with large int64_t and double values; is there any way to avoid these?
Test case:
If the only demand is 80,000,000,000, the problem is solved with the correct result. But if it's 800,000,000,000, an incorrect 0 is returned.
If the input has two or more demands (meaning more inequalities need to be calculated), smaller values also cause incorrect results; e.g., three equal demands of 20,000,000,000 trigger the problem.
I'm using the COIN-OR CLP linear programming solver to solve some network flow problems. I use int64_t to represent the link bandwidth, but CLP uses double most of the time and cannot easily be switched to other types.
When the values of the variables are not that large (typically smaller than 10,000,000,000) and the constraints (inequalities) are relatively few, it gives the solution I want. But if either factor increases, the tool stops and returns an all-zero solution. I think the reason is that the calculation exceeds some limit, so the program breaks at some point (it uses the LP simplex method).
The inequality is some kind of:
totalFlowSum <= usePercentage * demand
I changed it to
totalFlowSum - usePercentage * demand <= 0
Since totalFlowSum and demand are very large int64_t values and usePercentage is a double, the returned solution is wrong if there are too many constraints like this (several or more), or if the demand is larger than 100,000,000,000.
Is there any way to correct this, such as increasing some threshold or avoiding calculations of this magnitude?
Some loss of accuracy is acceptable. One possible solution is to make the inputs 1,000 times smaller and the outputs 1,000 times larger, but this is kind of naive and may require too much code modification in the program.
Update:
I have changed the formulation to
totalFlowSum / demand - usePercentage <= 0
but the problem still exists.
Update 2:
I divided usePercentage by 1000, changing its coefficient from 1 to 0.001, and it worked. But if I also divide totalFlowSum/demand by 1000 at the same time, there is still no result. I don't know why...
I changed the RHS of the constraints from 0 to 0.1, and the problem is then solved! Since the inputs are very large, a 0.1 offset won't impact the solution at all.
I think the reason is that the previous coefficients were badly scaled, so the solver failed to find an exact answer.
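A rough sense of the magnitudes involved (hedged, since the exact behaviour depends on CLP's settings): a double carries about 15-16 significant decimal digits, so 800,000,000,000 itself is represented exactly; the difficulty is the spread of coefficients within one row. In totalFlowSum - usePercentage * 800,000,000,000 <= 0 the coefficients range from 1 to 8e11, roughly twelve orders of magnitude, while simplex feasibility tolerances are typically around 1e-7. After the solver's internal scaling, a violation that is tiny relative to 8e11 can still be large in absolute terms, which is consistent with the badly-scaled-coefficients conclusion above; keeping all coefficients within a few orders of magnitude of each other (for example by expressing demands in larger units) is the usual remedy.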

How does this C++ function use memoization?

#include <cstddef>   // for size_t
#include <vector>

std::vector<long int> as;              // cache of previously computed terms, indexed by n

long int a(size_t n){
    if(n==1) return 1;
    if(n==2) return -2;
    if(as.size()<n+1)
        as.resize(n+1);                // grow the cache; new entries are value-initialized to 0
    if(as[n]<=0)                       // "not computed yet" test (this is the check asked about)
    {
        as[n]=-4*a(n-1)-4*a(n-2);      // compute and store the term
    }
    return mod(as[n], 65535);          // mod() is defined elsewhere in the asker's program
}
The above code sample uses memoization to calculate a recursive formula based on some input n. I know that this uses memoization, because I have written a purely recursive function that uses the same formula, but this one is much, much faster for much larger values of n. I've never used vectors before, but I've done some research and I understand the concept of them. I understand that memoization is supposed to store each calculated value, so that instead of performing the same calculations over again, it can simply retrieve ones that have already been calculated.
My question is: how is this memoization, and how does it work? I can't seem to see in the code at which point it checks to see if a value for n already exists. Also, I don't understand the purpose of the if(as[n]<=0). This formula can yield positive and negative values, so I'm not sure what this check is looking for.
Thank you, I think I'm close to understanding how this works; it's actually a bit simpler than I was thinking.
I do not think the values in the sequence can ever be 0, so this should work for me, as I think n has to start at 1.
However, if zero was a viable number in my sequence, what is another way I could solve it? For example, what if five could never appear? Would I just need to fill my vector with fives?
Edit: Wow, I got a lot of other responses while checking code and typing this one. Thanks for the help everyone, I think I understand it now.
if (as[n] <= 0) is the check. If valid values can be negative like you say, then you need a different sentinel to check against. Can valid values ever be zero? If not, then just make the test if (as[n] == 0). This makes your code easier to write, because by default vectors of ints are filled with zeroes.
The code appears to be incorrectly checking if (as[n] <= 0), and recalculates the negative values of the function (which appear to be approximately every other value). Even so, the work scales roughly linearly with n instead of 2^n as in the purely recursive solution, so it runs a lot faster.
Still, a better check would be to test if (as[n] == 0), which appears to run 3x faster on my system. Even if the function can return 0, a 0 value just means it will take slightly longer to compute (although if 0 is a frequent return value, you might want to consider a separate vector that flags whether the value has been computed or not instead of using a single vector to store the function's value and whether it has been computed)
If the formula can yield both positive and negative values then this function has a serious bug. The check if(as[n]<=0) is supposed to determine whether this value has already been cached. But if the formula can be negative, the function recalculates cached values a lot...
What it really probably wanted was a vector<pair<bool, unsigned> >, where the bool says if the value has been calculated or not.
The code, as posted, only memoizes about 40% of the time (precisely when the remembered value is positive). As Chris Jester-Young pointed out, a correct implementation would instead check if(as[n]==0). Alternatively, one can change the memoization code itself to read as[n]=mod(-4*a(n-1)-4*a(n-2),65535);
(Even the ==0 check would spend effort when the memoized value was 0. Luckily, in your case, this never happens!)
There's a bug in this code. It will continue to recalculate as[n] whenever as[n] <= 0; it only memoizes the values of a that turn out to be positive. It still works a lot faster than the code without memoization because there are enough positive values in as[] that the recursion terminates quickly. You could improve this by using a value greater than 65535 as a sentinel. The new elements of the vector are initialized to zero when the vector expands.
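Pulling the suggestions above together, here is one hedged sketch of a corrected version. It keeps a separate "computed" flag (the two-vector variant of the pair<bool, ...> idea), so zero or negative cached values no longer defeat the cache; mod() is not shown in the question, so a simple stand-in that wraps % is used here:
#include <cstddef>
#include <vector>

long int mod(long int x, long int m) { return x % m; }   // stand-in for the asker's mod()

std::vector<long int> as;          // cached raw recurrence values
std::vector<bool> computed;        // computed[n] is true once as[n] holds a valid value

long int a(std::size_t n){
    if(n==1) return 1;
    if(n==2) return -2;
    if(as.size()<n+1){
        as.resize(n+1);
        computed.resize(n+1, false);
    }
    if(!computed[n]){              // no sentinel value needed: works for any cached result
        as[n] = -4*a(n-1) - 4*a(n-2);
        computed[n] = true;
    }
    return mod(as[n], 65535);
}
With this change every term for n greater than 2 is computed exactly once, regardless of the sign of the result.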