Why confidence does not consider B in association rule mining

For the ARM rule A -> B, confidence can be calculated as support(A ∪ B) / support(A), which tells you, out of all transactions in which A is present, how many contain A and B together.
My feeling is that the confidence calculation gives more importance to A than to B, but B should be considered equally valuable in determining the importance of a rule.
Precisely, I want to say that there should be another term (maybe called the inverse of confidence) which equals support(A ∪ B) / support(B).
Any kind of explanation is welcome.

The equation wasn't chosen for fun or intuition.
There is a reason why the rule is A -> B, and not A <- B.
The mathematical reason to define it this way is conditional probability: confidence estimates P(B | A).
The rule A -> B is a conditional rule: if A, then B. Not the other way round. The formula you propose, support(A ∪ B) / support(B), is simply the confidence of the reverse rule B -> A.
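To make the asymmetry concrete, here is a small stdlib-only Python sketch (the transactions are made up for illustration) computing both confidence(A -> B) and the proposed "inverse", which is just confidence(B -> A):

```python
# Illustrative only: confidence of A -> B vs. B -> A on a toy transaction list.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def confidence(antecedent, consequent):
    # support(A ∪ B) / support(A): of the transactions containing the
    # antecedent, the fraction that also contain the consequent.
    has_antecedent = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_antecedent if consequent <= t]
    return len(has_both) / len(has_antecedent)

print(confidence({"bread"}, {"butter"}))  # 2/3: bread -> butter
print(confidence({"butter"}, {"bread"}))  # 1.0: butter -> bread
```

The two directions give different values because they condition on different itemsets; both are legitimate rules, just not the same rule.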


Pyomo - How to consider all scenarios in the objective function within the ReferenceModel?

I am facing a problem in defining the objective function shown in Figure: Objective Function.
Here, p_n = 1/#Scenarios and W_n is the final wealth in scenario n. The scenarios are equiprobable, so p_n is known in advance.
The problem is: how can I define the sum of the products p_n * W_n over EVERY scenario, if the deterministic ReferenceModel considers only one scenario? In other terms, given that the set of scenarios is added only after the objective function has been defined in the deterministic model, how can I consider all the scenarios?
Moreover, I have to count the number of times in which the Final Wealth is lower than the Target Wealth, in a given pre-specified stage. How can I set the counter, if the Deterministic ReferenceModel considers just one scenario?
I am a new Pyomo user and a quantitative finance student without much knowledge of Operations Research, so I apologize in advance if the questions are obvious or silly.
Thank you!
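Not PySP-specific, but for concreteness, here are the two quantities being asked about, computed on made-up scenario data in plain Python; this is the arithmetic that the scenario-wide objective and the counter must express:

```python
# Hypothetical scenario data, for illustration only.
final_wealth = {1: 105.0, 2: 98.0, 3: 110.0, 4: 92.0}  # W_n per scenario n
p = 1.0 / len(final_wealth)                             # equiprobable p_n
target = 100.0                                          # target wealth

# Objective: expected final wealth, sum over every scenario of p_n * W_n.
expected_wealth = sum(p * w for w in final_wealth.values())

# Counter: number of scenarios whose final wealth falls short of the target.
shortfalls = sum(1 for w in final_wealth.values() if w < target)
```

In a stochastic program, the same sum has to range over the scenario set of the scenario tree rather than over a plain dict, but the structure is identical.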

How to generate meaningful examples to test n-ary properties with property-based testing?

In a property-based test setting like Haskell's quickcheck for custom data structures, how do you generate test data for n-ary properties of relations, e.g., transitivity or symmetry? The implementation language does not matter, I think.
Here is a naive C++ example using rapidcheck (just because I have this tool at hand, right now):
rc::check("Double equality is symmetric.", [](double a, double b) {
    RC_ASSERT(!(a == b) || (b == a)); // a == b ==> b == a
});
In such a naive case it is quite unlikely that the tool will generate many examples where the premise (a == b) actually holds, so you end up wasting a lot of effort on meaningless tests. It gets even worse for 3-ary relations like transitivity.
Is there a general technique to tackle these issues? Do I need to generate equal pairs (for some constructive definition of "equals")? What about stuff like orderings?
What I do to raise the probability of value clashes is to restrict value generation to a smaller range and sometimes combine that with a more general generator.
Consider the following generator adapted from https://johanneslink.net/how-to-specify-it/#46-a-note-on-generation:
@Provide
Arbitrary<Integer> keys() {
    return Arbitraries.oneOf(
        Arbitraries.integers().between(-25, 25),
        Arbitraries.integers()
    );
}
Generation will first choose with equal probability between any integer and an integer between -25 and +25. Thus about every 100th value will be a duplicate.
In more difficult cases I might even have a generator that picks from a small set of predefined values.
UPDATE: The latest version of jqwik allows you to explicitly generate duplicates with a given probability: https://jqwik.net/docs/snapshot/user-guide.html#inject-duplicate-values
I don't know, though, if QuickCheck or any other PBT library has a similar feature.
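The effect of mixing a narrow generator with a general one can be simulated without any PBT library. A stdlib-only Python sketch (the ranges mirror the jqwik generator above; the 50/50 choice between sub-generators is an assumption):

```python
import random

def key(rng):
    # Mimic Arbitraries.oneOf: half the time a small range, half the time any int.
    if rng.random() < 0.5:
        return rng.randint(-25, 25)
    return rng.randint(-2**31, 2**31 - 1)

rng = random.Random(0)
pairs = [(key(rng), key(rng)) for _ in range(10_000)]
clashes = sum(1 for a, b in pairs if a == b)
# Both values land in the small range with probability 1/4 and then match with
# probability 1/51, so clashes occur at a small but useful rate; with a pure
# full-range generator, clashes would essentially never happen.
```

This is exactly why the premise of a symmetry or transitivity property gets exercised at all: without the narrow sub-generator, almost every generated case is vacuously true.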

If condition in Cplex objective function

I am new to Cplex. I'm solving an integer programming problem, but I have a problem with objective function.
The problem is that I have a project with a due date D, and if the project is tardy, then I incur a tardiness penalty b, so the penalty looks like b*(cn - D), where cn is the real completion time of the project, and it is a decision variable.
It must look like this
if (cn-D) <= 0 then b*(cn-D) == 0 (no penalty when the project finishes on time)
I tried to use an "if-then" constraint, but it seems it doesn't work with a decision variable.
I looked at questions similar to this one but could not find a solution. Please help me define the correct objective function.
The standard way to model this is:
min sum(i, penalty(i)*Tardy(i))
Tardy(i) >= CompletionTime(i) - DueDate(i)
Tardy(i) >= 0
Tardy(i) is a non-negative variable, so it can never become negative. The other quantities are:
penalty: a constant indicating the cost of job i being tardy by one unit of time.
CompletionTime: a variable that holds the completion time of job i
DueDate: a constant with the due date of job i.
The above measures the sum. Sometimes we also want to measure the count: the number of jobs that are tardy. This discourages solutions in which many jobs are tardy. In the most general case one would have both the sum and the count in the objective, with different weights or penalties.
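Because Tardy(i) is being minimized, the two inequalities pin it to exactly max(0, CompletionTime(i) - DueDate(i)), which is the conditional penalty the question asked for. A quick stdlib Python check of that equivalence (toy numbers, not CPLEX code):

```python
def min_feasible_tardy(completion, due):
    # Smallest value satisfying both Tardy >= completion - due and Tardy >= 0;
    # minimization pressure drives Tardy down to exactly this bound.
    return max(0.0, completion - due)

# On-time job: the tardiness term vanishes, so no penalty is incurred.
assert min_feasible_tardy(completion=8.0, due=10.0) == 0.0
# Late job: penalized by the overshoot beyond the due date only.
assert min_feasible_tardy(completion=13.0, due=10.0) == 3.0
```

This is the standard linearization of a max(0, .) term and needs no if-then constraint at all.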
There is a virtually unlimited number of papers showing MIP formulations on scheduling models that involve tardiness. Instead of reinventing the wheel, it may be useful to consult some of them and see what others have done to formulate this.

stata: inequality constraint in xttobit

Is it possible to constrain parameters in Stata's xttobit to be non-negative? I read a paper where the authors said they did just that, and I am trying to work out how.
I know that you can constrain parameters to be strictly positive by exponentially transforming the variables (e.g. gen x1_e = exp(x1)) and then calling nlcom after estimation (e.g. nlcom exp(_b[x1:_y]), where y is the independent variable). That may not be exactly right, but I am pretty sure the general idea is correct. Here is a similar question from Statalist re: nlsur.
But what would a non-negative constraint look like? I know that one way to proceed is by transforming the variables, for example squaring them. However, I tried this with the author's data and still found negative estimates from xttobit. Sorry if this is a trivial question, but it has me a little confused.
(Note: this was first posted on CV by mistake. Mea culpa.)
Update: It seems I misunderstand what transformation means. Suppose we want to estimate the following random effects model:
y_{it} = a + b*x_{it} + v_i + e_{it}
where v_i is the individual random effect for i and e_{it} is the idiosyncratic error.
From the first answer, would, say, an exponential transformation to constrain all coefficients to be positive look like:
y_{it} = exp(a) + exp(b)*x_{it} + v_i + e_{it}
?
I think your understanding of constraining parameters by transforming the associated variable is incorrect. You don't transform the variable; rather, you fit your model after reexpressing it in terms of transformed parameters. For more details, see the FAQ at http://www.stata.com/support/faqs/statistics/regression-with-interval-constraints/. Be prepared to work harder on your problem than you might have expected, since you will need to replace the use of xttobit with mlexp for the transformed parameterization of the tobit log-likelihood function.
With regard to the difference between non-negative and strictly positive constraints: for continuous parameters the distinction is effectively moot, because (for a reasonable parameterization) a strictly positive parameter can be made arbitrarily close to zero.

Linear form of function (a/b) for ampl/cplex

I am trying to solve a minimisation problem and I want to minimise an expression
a/b
where both a and b are variables. Hence this is not a linear problem.
How can I transform this expression into a linear one?
There is a detailed section on how to handle ratios in Linear Programming on the lpsolve site. It should be general enough to apply to AMPL and CPLEX as well.
There are several ways to do this, but the simplest to explain requires that you solve a series of linear programs. First, remove the objective and add a constraint
a <= c * b
where c is a known upper bound on the solution. Then do a binary search on c to narrow down a range [c_l, c_u] such that the problem is infeasible for
a <= c_l * b
but feasible for
a <= c_u * b
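On a toy instance the bisection can be sketched in plain Python. The feasibility oracle below is hypothetical and solved by inspection; in practice each check is an LP solve with the constraint a <= c * b added:

```python
def feasible(c):
    # Hypothetical oracle for the box instance 1 <= a <= 4, 2 <= b <= 3:
    # a <= c * b is satisfiable iff the smallest a fits under c times the largest b.
    return 1.0 <= c * 3.0

lo, hi = 0.0, 4.0  # lo is known infeasible, hi is known feasible
for _ in range(60):
    mid = (lo + hi) / 2.0
    if feasible(mid):
        hi = mid
    else:
        lo = mid
# hi converges to the optimum min a/b = 1/3 for this instance
```

Each bisection step halves the bracket [c_l, c_u], so the optimum ratio is located to machine precision in a few dozen LP solves.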
The general form of the objective is a linear-fractional function, f_0(x) = (c^T x + d) / (e^T x + f). For your case, x = (a, b), c = (1, 0), e = (0, 1), d = f = 0.
To solve this kind of optimization problem, a technique called linear-fractional programming can be used: it is the linearly constrained version of a linear-fractional objective, and the Charnes-Cooper transformation turns it into an LP. You can find the main idea on Wikipedia. Many OR books say more about this, e.g. pp. 53 and 165 of Boyd and Vandenberghe's "Convex Optimization" (free to download).
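For reference, the Charnes-Cooper substitution mentioned above, in its standard form for a feasible region {x : Ax <= b} on which the denominator is positive:

```latex
% Substitute  y = x / (e^T x + f),  t = 1 / (e^T x + f),  assuming  e^T x + f > 0.
% The fractional program  min (c^T x + d) / (e^T x + f)  s.t.  A x <= b
% becomes the linear program:
\min_{y,\,t}\ c^{T} y + d\,t
\quad\text{s.t.}\quad A y \le b\,t,\qquad e^{T} y + f\,t = 1,\qquad t \ge 0
```

Recovering x = y / t from an optimal (y, t) gives an optimal solution of the original fractional problem.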