The Test Cover problem can be defined as follows:
Suppose we have a set of n diseases and a set of m tests we can perform to check for symptoms. We are also given the following:
an n x m matrix A where A[i][j] is a binary value representing the result of running the jth test on a patient with the ith disease (1 indicates a positive result, 0 indicates negative);
the cost of running test j, c_j; and that
any patient will have exactly one disease.
The task is to find a set of tests that can uniquely identify each of the n diseases at minimal cost.
This problem can be formulated as an Integer Linear Program, where we want to minimize the objective function \sum_{j=1}^{m} c_j x_j, where x_j = 1 if we choose to include test j in our set, and 0 otherwise.
My question is:
What is the set of linear constraints for this problem?
Incidentally, I believe this problem is NP-hard (as is Integer Linear Programming in general).
Well, if I am correct, you just need to ensure
\sum_j x_j.A_ij >= 1 forall i
Let T be the matrix that results from deleting the jth column of A for all j such that x_j = 0.
Then choosing a set of tests that can uniquely distinguish any two diseases is equivalent to ensuring that every row of T is unique.
Observe that two rows k and l are identical if and only if (T[k][j] XOR T[l][j]) = 0 for all j.
So, the constraints we want are
\sum_{j=1}^{m} x_j(A[k][j] XOR A[l][j]) >= 1
for all 1 <= k <= n and 1 <= l <= n such that k != l (rows of A correspond to the n diseases).
Note that the constraints above are linear, since we can just pre-compute the coefficient (A[k][j] XOR A[l][j]).
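To make the pre-computation concrete, here is a minimal sketch of how the XOR coefficients could be built and how a candidate selection x could be checked against the constraints. The names Matrix, separationRow and satisfiesAllConstraints are made up for illustration, and the sketch only verifies a given x; it does not solve the ILP.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Coefficient row of the constraint separating diseases k and l:
// entry j is A[k][j] XOR A[l][j], i.e. 1 exactly when test j tells them apart.
std::vector<int> separationRow(const Matrix& A, std::size_t k, std::size_t l) {
    std::vector<int> row(A[k].size());
    for (std::size_t j = 0; j < row.size(); ++j) {
        row[j] = A[k][j] ^ A[l][j];
    }
    return row;
}

// True when the 0/1 vector x satisfies
//   sum_j x[j] * (A[k][j] XOR A[l][j]) >= 1
// for every pair of distinct diseases (checking k < l is enough by symmetry).
bool satisfiesAllConstraints(const Matrix& A, const std::vector<int>& x) {
    for (std::size_t k = 0; k < A.size(); ++k) {
        for (std::size_t l = k + 1; l < A.size(); ++l) {
            const std::vector<int> row = separationRow(A, k, l);
            int lhs = 0;
            for (std::size_t j = 0; j < row.size(); ++j) {
                lhs += x[j] * row[j];
            }
            if (lhs < 1) {
                return false;  // diseases k and l look identical under x
            }
        }
    }
    return true;
}
```

With three diseases whose rows are 100, 010 and 000, choosing the first two tests satisfies every constraint, while choosing only the first test leaves the second and third diseases indistinguishable.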
I have a question regarding the constraints in a paper. The paper says it used the big M method to turn a nonlinear programming model into an LP. I get that the big number M1 is a huge number, but I don't get what M1 really does in the constraints. Could you give me some insight on the use of big M in these constraints?
Below are the constraints with the big number M1.
The paper says these constraints apply when K[m][i] = p[i]*x[m][i]:
maximize sum(m in M, i in I) (K[m][i]-c[i]*x[m][i])
K[m][i]-M[1]*(1-x[m][i]) <= p[i]
K[m][i]+M[1]*(1-x[m][i]) >= p[i]
K[m][i]-M[1]*x[m][i] <= 0
It originally looked like this in the nonlinear program:
maximize sum(m in M, i in I)(p[i]-c[i])*x[m][i]
So, basically, converting nonlinear programming into linear programming led to a little change in some decision variables and 3 additional constraints with big number M.
Here is another constraint that includes big number M.
sum (j in J) b[i][j]*p[j]-p[i]<= M[1]*y[i]
which originally looked like
p[i]<sum (j in J) b[i][j]*p[j], if y[i]==1
Here is the last constraint with big number M
(r[m][j]-p[j])*b[i][j]*x[m][i] >= -y[i]*M[1]
which was
(r[m][j]-p[j])*b[i][j]*x[m][i](1-y[i])>=0
in nonlinear program.
I really want to know what big M does in the model.
It would be really appreciated if anyone could give me some insight.
Thank you.
As you said, the big-M is used to model the non-linear constraint
K[m][i] = p[i] * x[m][i]
in case x is a binary variable. The assumption is that M is an upper bound on K[m][i] and that K[m][i] is a non-negative variable, i.e. 0 <= K[m][i] <= M. Also p is assumed to be non-negative.
Since x[m][i] is binary, we can have two cases in a feasible solution:
x[m][i] = 0. In that case the product p[i] * x[m][i] is 0 and thus K[m][i] should be zero as well. This is enforced by constraint K[m][i] - M * x[m][i] <= 0 which in this case becomes just K[m][i] <= 0. The two other constraints involving M become redundant in this case. For example, the first constraint reduces to K[m][i] <= p[i] + M which is always true since M is an upper bound on K[m][i] and p is non-negative.
x[m][i] = 1. In that case the product p[i] * x[m][i] is just p[i] and the first two constraints involving M become K[m][i] <= p[i] and K[m][i] >= p[i] (which is equivalent to K[m][i] = p[i]). The last constraint involving M becomes K[m][i] <= M which is redundant since M is an upper bound on K[m][i].
So the role of M here is to "enable/disable" certain constraints depending on the value of x.
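The case analysis above can be checked mechanically on small integer data. The following is a minimal sketch, assuming integer values with 0 <= K <= M and p >= 0; the function names satisfiesBigM and bigMMatchesProduct are made up for illustration and are not from the paper.

```cpp
// True when K satisfies the three big-M constraints for the given p,
// binary x and constant M.
bool satisfiesBigM(long long K, long long p, int x, long long M) {
    return K - M * (1 - x) <= p   // forces K <= p when x == 1
        && K + M * (1 - x) >= p   // forces K >= p when x == 1
        && K - M * x <= 0;        // forces K <= 0 when x == 0
}

// Brute force over 0 <= K <= M and x in {0, 1}: the big-M constraints
// should accept exactly the pairs with K == p * x.
bool bigMMatchesProduct(long long p, long long M) {
    for (int x = 0; x <= 1; ++x) {
        for (long long K = 0; K <= M; ++K) {
            if (satisfiesBigM(K, p, x, M) != (K == p * x)) {
                return false;
            }
        }
    }
    return true;
}
```

For p = 7 the check passes with M = 100 but fails with M = 5, which shows that M has to be at least as large as p for the "disabled" constraints to be truly redundant.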
To model logical constraints, you may either use logical constraints directly or rely on big M:
https://www.ibm.com/support/pages/difference-between-using-indicator-constraints-and-big-m-formulation
I tend to suggest logical constraints as the default choice.
From https://www.linkedin.com/pulse/how-opl-alex-fleischer/ let me share the example
"How to multiply a decision variable by a boolean decision variable in CPLEX?":
// suppose we want b * x <= 7
dvar int x in 2..10;
dvar boolean b;
dvar int bx;
maximize x;
subject to
{
// Linearization
bx<=7;
2*b<=bx;
bx<=10*b;
bx<=x-2*(1-b);
bx>=x-10*(1-b);
// if we use CP we could write directly
// b*x<=7
// or rely on logical constraints within CPLEX
// (b==1) => (bx==x);
// (b==0) => (bx==0);
}
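As a sanity check of the four linking constraints in the OPL model (leaving aside bx<=7, which is the actual business constraint), here is a small brute-force sketch in C++. The function names linkedValues and linearizationIsExact are made up for illustration; the bounds 2 and 10 are taken from the declaration of x above.

```cpp
#include <vector>

// All integer values of bx that satisfy the four linking constraints
// for a given boolean b and a given x in 2..10.
std::vector<int> linkedValues(int b, int x) {
    std::vector<int> feasible;
    for (int bx = -20; bx <= 20; ++bx) {
        const bool ok = 2 * b <= bx
                     && bx <= 10 * b
                     && bx <= x - 2 * (1 - b)
                     && bx >= x - 10 * (1 - b);
        if (ok) {
            feasible.push_back(bx);
        }
    }
    return feasible;
}

// True when, for every b in {0, 1} and x in 2..10, the only feasible
// value of bx is the product b * x.
bool linearizationIsExact() {
    for (int b = 0; b <= 1; ++b) {
        for (int x = 2; x <= 10; ++x) {
            if (linkedValues(b, x) != std::vector<int>{b * x}) {
                return false;
            }
        }
    }
    return true;
}
```

The constants 2 and 10 play the role of big M here: they are the tightest bounds on x, which is usually preferable to an arbitrary huge number.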
Solving this problem on codechef:
After visiting a childhood friend, Chef wants to get back to his home.
Friend lives at the first street, and Chef himself lives at the N-th
(and the last) street. Their city is a bit special: you can move from
the X-th street to the Y-th street if and only if 1 <= Y - X <= K,
where K is the integer value that is given to you. Chef wants to get
to home in such a way that the product of all the visited streets'
special numbers is minimal (including the first and the N-th street).
Please, help him to find such a product.
Input
The first line of input consists of two integer numbers - N and K -
the number of streets and the value of K respectively. The second line
consist of N numbers - A1, A2, ..., AN respectively, where Ai equals
to the special number of the i-th street.
Output
Please output the value of the minimal possible product, modulo
1000000007.
Constraints
1 ≤ N ≤ 10^5
1 ≤ Ai ≤ 10^5
1 ≤ K ≤ N
Example
Input:
4 2
1 2 3 4
Output:
8
It could be solved using graphs based on this tutorial
I tried to solve it without using graphs and just using recursion and DP.
My approach:
Take an array and calculate the min product to reach every index and store it in the respective index.
This could be calculated using top down approach and recursively sending index (eligible) until starting index is reached.
Out of all calculated values store the minimum one.
If it is already calculated return it else calculate.
CODE:
#include<iostream>
#include<cstdio>
#define LI long int
#define MAX 100009
#define MOD 1000000007
using namespace std;
LI dp[MAX]={0};
LI ar[MAX],k,orig;
void cal(LI n)
{
if(n==0)
return;
if(dp[n]!=0)
return;
LI minn=MAX;
for(LI i=n-1;i>=0;i--)
{
if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
{
cal(i);
minn=(min(dp[i]*ar[n],minn))%MOD;
}
}
dp[n]=minn%MOD;
return;
}
int main()
{
LI n,i;
scanf("%ld %ld",&n,&k);
orig=n;
for(i=0;i<n;i++)
scanf("%ld",&ar[i]);
dp[0]=ar[0];
cal(n-1);
if(dp[n-1]==MAX)
printf("0");
else printf("%ld",dp[n-1]);
return 0;
}
It's been 2 days and I have checked every corner case and constraint, but it still gives Wrong Answer! What's wrong with the solution?
Need help.
Analysis
There are many problems. Here is what I found:
You restrict the product to a value less than 100009 without reason. The product can be way higher than that (this is indeed the reason why the problem only asks for the value modulo 1000000007).
You restrict your moves to streets whose difference in special number is at most K, whereas the problem statement says that you can move between any streets whose index difference is at most K.
In your dynamic programming function you compute the product and store the modulo of the product. This can lead to a problem because the modulo of a big number can be lower than the modulo of a smaller number. This may corrupt later computations.
The integral type you use, long int, is too short.
The complexity of your algorithm is too high.
Of all these problems, the last one is the most serious. I fixed it by changing the whole approach and using a better data structure.
1st Problem
In your main() function:
if(dp[n-1]==MAX)
printf("0");
In your cal() function:
LI minn=MAX;
You should replace this line with:
LI minn = std::numeric_limits<LI>::max();
Do not forget to:
#include <limits>
2nd Problem
for(LI i=n-1;i>=0;i--)
{
if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
{
. . .
}
}
You should replace the for loop condition so that it only visits the k previous streets (and never goes below index 0):
for(LI i=n-1;i>=0 && i>=n-k;i--)
And remove altogether the condition on the special numbers.
3rd Problem
You are looking for the path whose product of special numbers is the lowest. In your current setting, you compare the paths' products after having taken the modulo of each product. This is wrong, as the modulo of a higher number may become very low (for instance, a path whose product is 1000000008 will have a modulo of 1 and you will choose this path, even if there is a path whose product is only 2).
This means you should compare the real products, without taking their modulo. As these products can become very high you should take their logarithm. This will allow you to compare the products with a simple double. Remember that:
log(a*b) = log(a) + log(b)
4th Problem
Use unsigned long long.
5th Problem
I fixed all these issues and submitted on codechef CHRL4. I got all but one test case accepted. The testcase not accepted was because of a timeout. This is due to the fact that your algorithm has got a complexity of O(k*n).
You can achieve O(n) complexity using a bottom-up dynamic programming approach instead of top-down, together with a data structure that returns the minimum log value of the k previous streets. Look up the sliding window minimum algorithm to see how to do this.
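The points above can be combined into a minimal sketch: bottom-up DP, comparison through sums of logarithms, and a monotonic deque as the sliding window minimum. This is an illustration written for this answer, not the submitted solution; the function name minProductMod is made up, and comparing doubles could in principle misorder two paths whose products are extremely close.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Minimal product of visited special numbers from the first to the last
// street, modulo 1000000007, when a move may advance by 1..k streets.
std::uint64_t minProductMod(const std::vector<std::uint64_t>& a, std::size_t k) {
    assert(k >= 1);
    const std::uint64_t MOD = 1000000007ULL;
    const std::size_t n = a.size();
    if (n == 0) {
        return 0;
    }
    std::vector<double> logBest(n);          // log of the best real product
    std::vector<std::uint64_t> modBest(n);   // same product, reduced modulo MOD
    std::deque<std::size_t> window;          // candidate indices, logBest increasing

    logBest[0] = std::log(static_cast<double>(a[0]));
    modBest[0] = a[0] % MOD;
    window.push_back(0);

    for (std::size_t i = 1; i < n; ++i) {
        // Drop streets that are more than k positions behind street i.
        while (window.front() + k < i) {
            window.pop_front();
        }
        const std::size_t j = window.front();  // best predecessor by real product
        logBest[i] = logBest[j] + std::log(static_cast<double>(a[i]));
        modBest[i] = modBest[j] * (a[i] % MOD) % MOD;
        // Keep the deque sorted: worse-or-equal older candidates can never win again.
        while (!window.empty() && logBest[window.back()] >= logBest[i]) {
            window.pop_back();
        }
        window.push_back(i);
    }
    return modBest[n - 1];
}
```

Each index enters and leaves the deque at most once, so the whole loop runs in O(n). On the sample input (N=4, K=2, special numbers 1 2 3 4) the function returns 8.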
References
numeric_limits::max()
my own codechef CHRL4 solution: bottom-up dp + sliding window minimum
Let h(y) be the function defined as (a*y+b)mod m. So h(y) can take values from 0 to m-1.
Now we are given 7 integers: a, b, x, n, c, d, m. Our task is to count how many of h(x), h(x+1), h(x+2), ..., h(x+n) fall in the range [c,d], i.e. the number of i with 0<=i<=n such that c <= h(x+i) <= d.
Integer limits are:
1 ≤ m ≤ 10^15, c ≤ d < m, a,b < m, x+n ≤ 10^15, and a*(x+n) + b ≤ 10^15
For example, for the input set {1,0,0,8,0,8,9} the output should be 9. Please suggest an efficient algorithm. Thanks!
This isn't a particularly strong hash. The only hard part about this problem is the obtuse notation with single-letter variables and specifying the problem as a 7-tuple.
Each increment of x increases h(x) by a. Therefore the total distance along x to get from c to d is simply (d-c)/a. Add one for the fencepost problem, or specify the problem with a half-open range for the sake of sanity.
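Because h wraps around modulo m, any shortcut is worth checking against a direct count on small inputs. The following is a brute-force reference sketch only: it runs in O(n) time, so it is far too slow for n near 10^15 and is not the efficient algorithm the question asks for. The function name countInRange is made up for illustration, and the code relies on the stated limit a*(x+n) + b ≤ 10^15 to stay within 64 bits.

```cpp
#include <cstdint>

// Counts the i in [0, n] for which h(x + i) = (a * (x + i) + b) mod m
// lies in the closed range [c, d]. Brute force, for validation only.
std::uint64_t countInRange(std::uint64_t a, std::uint64_t b, std::uint64_t x,
                           std::uint64_t n, std::uint64_t c, std::uint64_t d,
                           std::uint64_t m) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i <= n; ++i) {
        const std::uint64_t h = (a * (x + i) + b) % m;
        if (c <= h && h <= d) {
            ++count;
        }
    }
    return count;
}
```

For the input set {1,0,0,8,0,8,9} from the question, countInRange(1, 0, 0, 8, 0, 8, 9) returns 9.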
I came across this question in a programming contest. I think it can be solved by DP but cannot think of a solution, so please help. Here's the question:
There are n stack of coins placed linearly, each labelled from 1 to n. You also have a sack of coins containing infinite coins with you. All the coins in the stacks and the sack are identical. All you have to do is to make the heights of coins non-decreasing.
You select two stacks i and j and place one coin on each of the stacks from stack i to stack j (inclusive). This complete operation is considered one move. You have to minimize the number of moves needed to make the heights non-decreasing.
No. of Test Cases < 50
1 <= n <= 10^5
0 <= hi <= 10^9
Input Specification :
There will be a number of test cases. Read till EOF. First line of each test case will contain a single integer n, second line contains n heights (h[i]) of stacks.
Output Specification :
Output single integer denoting the number of moves for each test case.
For example, for H={3,2,1} the answer is 2:
step1: i=2, j=3, H = {3,3,2}
step2: i=3, j=3, H = {3,3,3}
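One observation that matches the example, offered as a sketch rather than a verified contest solution: a move on stacks i..j raises the difference h[i] - h[i-1] by one and lowers h[j+1] - h[j] by one, so each move repairs at most one unit of one "drop". Choosing j = n never creates a new drop, which suggests that the answer is the sum of all drops h[i-1] - h[i] where h[i-1] > h[i]. The function name minMoves is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum of all drops between adjacent stacks. Under the reasoning above this
// is the minimum number of moves needed to make the heights non-decreasing.
std::uint64_t minMoves(const std::vector<std::uint64_t>& h) {
    std::uint64_t moves = 0;
    for (std::size_t i = 1; i < h.size(); ++i) {
        if (h[i - 1] > h[i]) {
            moves += h[i - 1] - h[i];  // each unit of this drop costs one move
        }
    }
    return moves;
}
```

For H={3,2,1} the two drops of size 1 give 2, matching the example. With n up to 10^5 and heights up to 10^9 the sum can reach about 10^14, hence the 64-bit type.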
I am given an array A[] of N elements, all positive integers. I have to find the number of sequences of lengths 1, 2, 3, ..., N that satisfy a particular property.
I have built an interval tree with O(n log n) complexity. Now I want to count the number of sequences that satisfy a certain property.
All the properties required for the problem are related to the sums of the sequences.
Note that an array has N*(N+1)/2 sequences. How can I iterate over all of them in O(n log n) or O(n)?
Let k be the moving left index, from 0 to N-1. For each k we run an algorithm that looks for the MIN R that satisfies the condition (let's call it I); every other subset with L = k is then also satisfied for R >= I (this is your short circuit). After you find I, simply return an output for (L=k, R>=I). This of course assumes that all numbers in your set are >= 0.
To find I, for every k, begin at element k + (N-k)/2. Figure out whether the subset defined by (L=k, R=k+(N-k)/2) satisfies your condition. If it does, then decrement R until your condition is NOT met; R+1 is then your MIN (you could choose to print these results as you go, but the results in these cases would essentially be printed backwards). If (L=k, R=k+(N-k)/2) does not satisfy your condition, then INCREMENT R until it does, and this becomes your MIN for that L=k. This reduces your search space for each L=k by a factor of 2. As k increases and approaches N, your search space continuously decreases.
// This declaration won't work unless N is either a constant or a macro defined above
unsigned int myVals[N];
int I; // smallest R that satisfies the condition for the current L = k (N if none)
for(int k = 0; k < N; k++){
    int mid = k + (N - k) / 2; // start the search in the middle of [k, N-1]
    if(TRUE == TESTVALS(myVals, k, mid)){ // It passes: walk down to the first failure
        for(I = mid; I > k; I--){
            if(FALSE == TESTVALS(myVals, k, I - 1)){
                break; // I - 1 fails, so I is the MIN R
            }
        }
    }else{ // It didn't pass: walk up to the first success
        for(I = mid + 1; I < N; I++){
            if(TRUE == TESTVALS(myVals, k, I)){
                break;
            }
        }
    }
    // PRINT ALL PAIRS from L=k, from R=I to R=N-1 (none if I == N)
} // END --> for(int k = 0; k < N; k++)
The complexity of the algorithm above is O(N^2). This is because for each k (i.e. N iterations) there are no more than N/2 values of R that need testing. Big O notation isn't concerned with the N/2, nor with the fact that the range truly gets smaller as k grows; it is concerned only with the gross magnitude. Thus it counts N tests for each of N values, hence O(N^2).
There is an alternative approach which would be FASTER. Whenever you move within the secondary (inner) for loops, you could move half the remaining distance instead of one step (a binary search). This would get you to O(n log n) steps. For each k (all of which have to be tested), you run this half-distance approach to find your MIN R value in log N time. As an example, let's say you have a 1000-element array. When k = 0, we essentially begin the search for MIN R at index 500. If the test passes, instead of linearly moving downward from 500 to 0, we test 250. Let's say the actual MIN R for k = 0 is 300. Then the tests to find MIN R would look as follows:
R=500
R=250
R=375
R=312
R=280
R=296
R=304
R=300
While this is oversimplified, you are most likely going to have to optimize, and test 301 as well as 299 to make sure you're in the sweet spot. Another note: be careful when dividing by 2 when you have to move in the same direction more than once in a row.
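Here is a minimal sketch of the half-distance idea, under two assumptions that are not in the question: the property is "sum of the subarray >= target" (a placeholder, since the actual property is not stated), and all values are non-negative so that the sum grows monotonically with R. Prefix sums make each test O(1), and std::lower_bound performs the binary search for MIN R. The function name countSubarraysWithSumAtLeast is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the contiguous subarrays whose sum is >= target.
// Assumes non-negative values, so prefix sums are non-decreasing.
std::uint64_t countSubarraysWithSumAtLeast(const std::vector<std::uint64_t>& values,
                                           std::uint64_t target) {
    const std::size_t n = values.size();
    std::vector<std::uint64_t> prefix(n + 1, 0);  // prefix[i] = sum of values[0..i-1]
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i + 1] = prefix[i] + values[i];
    }
    std::uint64_t count = 0;
    for (std::size_t left = 0; left < n; ++left) {
        // First end position e in (left, n] with prefix[e] - prefix[left] >= target.
        const auto first = std::lower_bound(
            prefix.begin() + static_cast<std::ptrdiff_t>(left) + 1,
            prefix.end(), prefix[left] + target);
        // Every later end position also qualifies (the short circuit).
        count += static_cast<std::uint64_t>(prefix.end() - first);
    }
    return count;
}
```

This does N binary searches of at most log N steps each, so it runs in O(n log n) without ever enumerating the N*(N+1)/2 subarrays individually.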
@user1907531: First of all, if you are participating in an online contest of such importance at the national level, you should refrain from using these cheap tricks to get ahead of other deserving participants. Second, a cheater like you is always a cheater, and all this undermines the hard work of those who wrote the questions and of the competitors who, unlike you, play fair. Thirdly, when @trumetlicks asks you why you haven't tagged the question as homework, you tell another lie there. And finally, I don't know how so many people could answer this question without knowing its origin/website/source. This surely can't have been given by a teacher as homework in any Indian school. To tell everyone: this cheater asked for the complete solution to a running collegiate contest in India 6 hours before the contest ended, and he has surely got a lot of direct help and, on top of that, invited hundreds of others to cheat from the answers given here. So, good luck to all these cheaters.