dramatic error in lp_solve? - linear-programming

I have a simple problem that I passed to lp_solve via the IDE (version 5.5.2.0):
/* Objective function */
max: +r1 +r2;
/* Constraints */
R1: +r1 +r2 <= 4;
R2: +r1 -2 b1 = 0;
R3: +r2 -3 b2 = 0;
/* Variable bounds */
b1 <= 1;
b2 <= 1;
/* Integer definitions */
int b1,b2;
The obvious solution to this problem is 3. SCIP as well as CBC give 3 as the answer, but not lp_solve: here I get 2. Is there a major bug in the solver?
Thanks in advance.

I contacted the developer group that maintains the lp_solve software. The error will be fixed in the next version of lp_solve.
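In the meantime, the model is small enough to cross-check with an independent solver. A minimal sketch using SciPy's MILP interface (assumes SciPy >= 1.9; the equality constraints r1 = 2*b1 and r2 = 3*b2 are substituted into the objective, leaving only the binaries):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Maximize r1 + r2 = 2*b1 + 3*b2 subject to 2*b1 + 3*b2 <= 4, b1, b2 binary.
c = np.array([-2.0, -3.0])                       # milp minimizes, so negate
cons = LinearConstraint([[2.0, 3.0]], -np.inf, 4.0)
res = milp(c, constraints=cons, integrality=[1, 1], bounds=Bounds(0, 1))
print(-res.fun)  # optimal objective: 3.0 (b1=0, b2=1)
```

Only (b1, b2) = (0, 1) among the feasible binary points reaches 3, which matches the SCIP/CBC result.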

When I tried it, I got 3 as the optimal value for the objective function.
Model name: 'LPSolver' - run #1
Objective: Maximize(R0)
SUBMITTED
Model size: 3 constraints, 4 variables, 6 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
Relaxed solution 4 after 4 iter is B&B base.
Feasible solution 2 after 6 iter, 3 nodes (gap 40.0%)
Optimal solution 2 after 7 iter, 4 nodes (gap 40.0%).
Excellent numeric accuracy ||*|| = 0
MEMO: lp_solve version 5.5.2.0 for 32 bit OS, with 64 bit REAL variables.
In the total iteration count 7, 1 (14.3%) were bound flips.
There were 2 refactorizations, 0 triggered by time and 0 by density.
... on average 3.0 major pivots per refactorization.
The largest [LUSOL v2.2.1.0] fact(B) had 8 NZ entries, 1.0x largest basis.
The maximum B&B level was 3, 0.8x MIP order, 3 at the optimal solution.
The constraint matrix inf-norm is 3, with a dynamic range of 3.
Time to load data was 0.001 seconds, presolve used 0.017 seconds,
... 0.007 seconds in simplex solver, in total 0.025 seconds.


Inconsistent rows allocation in scalapack

Consider the following simple Fortran program:
program test_vec_allocation
use mpi
implicit none
integer(kind=8) :: N
! =========================BLACS and MPI=======================
integer :: ierr, size, rank,dims(2)
! -------------------------------------------------------------
integer, parameter :: block_size = 100
integer :: context, nprow, npcol, local_nprow, local_npcol
integer :: numroc, indxl2g, descmat(9),descvec(9)
integer :: mloc_mat ,nloc_mat ,mloc_vec ,nloc_vec
call blacs_pinfo(rank,size)
dims=0
call MPI_Dims_create(size, 2, dims, ierr)
nprow = dims(1);npcol = dims(2)
call blacs_get(0,0,context)
call blacs_gridinit(context, 'R', nprow, npcol)
call blacs_gridinfo(context, nprow, npcol, local_nprow,local_npcol)
N = 700
mloc_vec = numroc(N,block_size,local_nprow,0, nprow)
nloc_vec = numroc(1,block_size,local_npcol,0, npcol)
print *,"Rank", rank, mloc_vec, nloc_vec
call blacs_gridexit(context)
call blacs_exit(0)
end program test_vec_allocation
When I run it with 11 MPI ranks I get:
Rank 0 100 1
Rank 4 100 1
Rank 2 100 1
Rank 1 100 1
Rank 3 100 1
Rank 10 0 1
Rank 6 100 1
Rank 5 100 1
Rank 9 0 1
Rank 8 0 1
Rank 7 0 1
which is how I would expect ScaLAPACK to divide this array. However, for an even number of ranks I get:
Rank 0 200 1
Rank 8 200 0
Rank 9 100 1
Rank 10 100 0
Rank 1 200 0
Rank 6 200 1
Rank 11 100 0
Rank 3 200 1
Rank 4 200 0
Rank 2 200 0
Rank 7 200 0
Rank 5 200 0
which makes no sense: why would rank 0 get 200 elements when the block size is 100 and ranks * block_size > N?
Because of this my program works for 1, 2, 3, 5, 7, and 11 MPI ranks, but fails for 4, 6, 8, 9, 10, 12, etc. (I don't know why it fails for 9 ranks!). Can anyone explain what is wrong in my approach?
GFortran version: 6.1.0
ScaLAPACK version: 2.1.0
MacOS version: 10.11
There are a number of things wrong with your code:
1) Firstly, don't use integer(8). As Vladimir put it, please unlearn this. Not only is it not portable and therefore very bad practice (please see many examples here, e.g. Fortran 90 kind parameter); here it is wrong, as numroc expects an integer of default kind as its first argument (see e.g. https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/scalapack-routines/scalapack-utility-functions-and-routines/numroc.html)
2) You call an MPI routine before you call MPI_Init. With a handful of exceptions (and this isn't one of them), this results in undefined behaviour. Note that the description at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_PINFO makes no reference to it actually calling MPI_Init; for the same reason I also prefer to call MPI_Finalize explicitly.
3) You have misunderstood MPI_Dims_create. You seem to assume you will get a one-dimensional distribution, but you actually ask it for a two-dimensional one. Quoting from the standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf:
The entries in the array dims are set to describe a Cartesian grid
with ndims dimensions and a total of nnodes nodes. The dimensions are
set to be as close to each other as possible, using an appropriate
divisibility algorithm. The caller may further constrain the
operation of this routine by specifying elements of array dims. If
dims[i] is set to a positive number, the routine will not modify the
number of nodes in dimension i; only those entries where dims[i] = 0
are modified by the call.
You set dims equal to zero, so the routine is free to set both dimensions. Thus for 11 processes you will get a 1x11 or 11x1 grid, which is what you seem to expect. However, for 12 processes, as "the dimensions are set to be as close to each other as possible", you will get either a 3x4 or a 4x3 grid, NOT 12x1. If it is 3x4, along each row you expect numroc to return 3 processes with 200 elements (2 blocks) and 1 with 100. As there are 3 rows, you therefore expect 3x3=9 processes returning 200 and 3x1=3 returning 100. This is what you see. Also try 15 processes: you will again see an odd number of processes that, according to you, "does not work", because (advanced maths alert) 15=3x5. Incidentally, on my machine 9 processes does NOT return a 3x3 grid - this looks like a bug in Open MPI to me.
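You can reproduce these counts directly. Below is a Python transcription of ScaLAPACK's NUMROC utility (following the algorithm in TOOLS/numroc.f), which computes how many rows or columns of a block-cyclically distributed dimension land on a given process:

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of elements of an n-long dimension, distributed in blocks
    of nb over nprocs processes, owned by process iproc (process
    isrcproc owns the first block)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                     # number of whole blocks
    count = (nblocks // nprocs) * nb      # whole blocks every process gets
    extrablks = nblocks % nprocs          # leftover whole blocks
    if mydist < extrablks:
        count += nb                       # one extra whole block
    elif mydist == extrablks:
        count += n % nb                   # the final partial block
    return count

# N=700, block size 100: an 11x1 grid vs. the row dimension of a 4x3 grid
print([numroc(700, 100, p, 0, 11) for p in range(11)])  # 7 x 100, 4 x 0
print([numroc(700, 100, p, 0, 4) for p in range(4)])    # 200, 200, 200, 100
```

With 4 process rows, rows 0-2 each own two blocks (200 elements) and row 3 owns one, which is exactly the 9-processes-with-200, 3-with-100 pattern in the question's 12-rank output.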

Need help writing estimates statements in proc genmod

I'm using proc genmod to predict an outcome measured at 4 time points. The outcome is a total score on a mood inventory, which can range from 0 to 82. A lot of participants have a score of 0, so the negative binomial distribution in proc genmod seemed like a good fit for the data.
Now, I'm struggling with how to write/interpret the estimates statements. The primary predictors are TBI status at baseline (0=no/1=yes), and visit (0=baseline, 1=second visit, 2=third visit, 4=fourth visit), and an interaction of TBI status and visit.
How do I write my estimates, such that I'm getting out:
1. the average difference in mood inventory score for person with TBI versus a person without, at baseline.
and
2. the average difference in mood inventory change score for a person with TBI versus a person without, over the 4 study visits?
Below is what I have thus far, but I'm not sure how to interpret the output (also below), or indeed whether my code is correct:
proc genmod data = analyze_long_3 ;
class id screen_tbi (param = ref ref = first) ;
model nsi_total = visit_cent screen_tbi screen_tbi*visit_cent /dist=negbin ;
output predicted = predstats;
repeated subject=id /type=cs;
estimate "tbi" intercept 1 visit_cent 0 0 0 0 screen_tbi 1 0 /exp;
estimate "no tbi" intercept 1 visit_cent 0 0 0 0 screen_tbi 0 1 /exp;
estimate 'longitudinal TBI' intercept 1
visit_cent -1 1 1 1
screen_tbi 1 0
screen_tbi*visit_cent 1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0 / exp;
estimate 'longitudinal no TBI ' intercept 1
visit_cent -1 1 1 1
screen_tbi 0 1
screen_tbi*visit_cent 0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1 / exp;
where sample = 1 ;
run;
The first research question asks for the average difference in score, at baseline, for a person with TBI versus a person without. It can be obtained in the following steps:
1) Get the estimated average log(score) when TBI = yes and Visit = baseline;
2) Get the estimated average log(score) when TBI = no and Visit = baseline;
3) Subtract 2) from 1) to get the difference in log(score);
4) Exponentiate 3) to express the difference as a ratio of scores.
To simplify, let T = TBI level and V = Visit level. One thing to clarify: in your post there are 4 visit points with the first as reference; therefore there should be 3 parameters for V, not four.
Taking the example of step 1), let's try to write the ESTIMATE statement. It is a bit tricky. At first it seems like it should be this (with T=0 and V=0 as reference):
ESTIMATE 'Overall average' INTERCEPT 1 T 1 V 0 0 0;
But it is wrong. In the above statement, all coefficients for V are set to 0. When all coefficients are 0, it is the same as leaving V out of the statement:
ESTIMATE 'Overall average' INTERCEPT 1 T 1;
This is not the estimate of the average for T=1 at the baseline level. Rather, it produces an average for T=1 regardless of visit point, i.e., an average over all visit levels.
The problem is that the reference is coded as V=0. In that case SAS cannot tell the difference between an estimate at the reference level and an estimate over all levels; indeed it always estimates the average over all levels. To solve this, the reference has to be coded as -1, i.e., T=-1 and V=-1 as reference, so that the statement looks like:
ESTIMATE 'Average of T=1, V=baseline' INTERCEPT 1 T 1 V -1 -1 -1;
Now SAS understands: fine! The job is to get the average at the baseline level, not over all levels.
To make the reference value -1 instead of 0, the option in the CLASS statement should be PARAM=EFFECT, not PARAM=REF. That brings another problem: once PARAM is not REF, SAS ignores the user-defined references. For example:
CLASS id T (ref=’…’) V (ref=’…’) / PARAM=EFFECT;
The (ref=’…’) is ignored when PARAM=EFFECT. So how do we make TBI=no and Visit=baseline the references? SAS automatically takes the last level as the reference. For example, if the variable T is sorted in ascending order, the value -1 comes first and the value 1 comes last, so 1 will be the reference. Conversely, if T is sorted in descending order, the value -1 comes last and will be used as the reference. This is achieved with the DESCENDING option in the CLASS statement.
CLASS id T V / PARAM=EFFECT DESCENDING;
That way, the parameters are ordered as:
T 1 (TBI =1)
T -1 (ref level of TBI, i.e., TBI=no)
V 1 0 0 (for visit =4)
V 0 1 0 (visit = 3)
V 0 0 1 (visit =2)
V -1 -1 -1 (this is the ref level, visit=baseline)
The above information is reported in the ODS table 'Class Level Information'. It is always good to check that table after running PROC GENMOD. Note that the level visit=4 comes before visit=3, and visit=3 before visit=2.
Now, let's talk a bit about the parameters and the model equation. As you may know, in SAS a multi-level V is broken down into dummy variables. If baseline is the reference level, the dummies are:
V4 = the fourth visit vs. baseline
V3 = the third visit vs. baseline
V2 = the second visit vs. baseline
Accordingly, the equation can be written as:
LOG(s) = b0 + b1*T + b2*V4 + b3*V3 + b4*V2
where:
s = the total score on a mood inventory
T = 1 for TBI status of yes, = -1 for TBI status of no
V4 = 1 for the fourth visit, = -1 for baseline
V3 = 1 for the third visit, =-1 for baseline
V2 = 1 for the second visit, = -1 for the baseline
b0 to b4 are beta estimates for the parameters
Of note, the order in the model is the same as the order defined in the CLASS statement, and the same as in the ODS table 'Class Level Information'. V4, V3, and V2 must appear in the model all or none: if the VISIT term is included, V4, V3, and V2 are all introduced into the model equation; if it is not, none of them are.
With interaction terms, 3 more dummy terms must be created:
T_V4 = T*V4
T_V3 = T*V3
T_V2 = T*V2
Hence the equation with interaction terms:
Log(s) = b0 + b1*T + b2*V4 + b3*V3 + b4*V2 + b5*T_V4 + b6* T_V3 + b7* T_V2
The SAS ESTIMATE statement corresponds directly to the model equation.
For example, to estimate an overall average for all parameters and all levels, the equation is:
[Log(S)] = b0 ;
where [LOG(S)] stands for the expected LOG(score). Accordingly, the statement is:
ESTIMATE 'overall (all levels of T and V)' INTERCEPT 1;
In the above statement, INTERCEPT corresponds to b0 in the equation.
To estimate the average log(score) for T=1 over all visit points, the equation is:
[LOG(S)] = b0 + b1*T = b0 + b1*1
And the statement is:
ESTIMATE 'T=Yes, V=all levels' INTERCEPT 1 T 1;
In the above case, 'T 1' in the statement corresponds to the "*1" part of the equation (i.e., setting T=1).
To estimate an average of log (score) for T =1, and for visit = baseline, the equation is:
[Log(s)] = b0 + b1*T + b2*V4 + b3*V3 + b4*V2
= b0 + b1*(1) + b2*(-1)+ b3*(-1) + b4*(-1)
The statement is:
ESTIMATE 'T=Yes, V=Baseline' INTERCEPT 1 T 1 V -1 -1 -1;
'V -1 -1 -1' in the statement corresponds to the values of V4, V3, and V2 in the equation. We mentioned above that the dummies V4, V3, and V2 must all be introduced into the model; that is why the V term always takes three numbers, such as 'V -1 -1 -1' or 'V 1 1 1'. SAS will issue a warning in the log if you write 'V -1 -1 -1 -1', because there are four '-1's, one more than required; the excess '-1' is ignored. On the contrary, 'V 1 1' is fine: it is the same as 'V 1 1 0'. But what does 'V 1 1 0' mean? To figure that out, you will have to read Allison's book (see reference).
For now, let’s carry on, and add the interaction terms. The equation:
[Log(s)] = b0 + b1*T + b2*V4 + b3*V3 + b4*V2 + b5*T_V4 + b6*T_V3 + b7*T_V2
As T_V4 = T*V4 = 1 * (-1) = -1, similarly T_V3 = -1, T_V2=-1, substitute into the equation:
[Log(s)] = b0 + b1*1 + b2*(-1)+ b3*(-1)+ b4*(-1)+ b5*(-1) + b6*(-1) + b7*(-1)
The statement is:
ESTIMATE '(1) T=Yes, V=Baseline, with interaction' INTERCEPT 1 T 1 V -1 -1 -1 T*V -1 -1 -1;
The 'T*V -1 -1 -1' corresponds to the values of T_V4, T_V3, and T_V2 in the equation.
And that is the statement for step 1)!
Step 2 follows the same logic: get the estimated average log(score) when TBI = no and Visit = baseline.
T = -1, V4=-1, V3=-1, V2=-1.
T_V4 = T * V4 = (-1) * (-1) = 1
T_V3 = T * V3 = (-1) * (-1) = 1
T_V2 = T * V2 = (-1) * (-1) = 1
Substituting the values in the equation:
[Log(s)] = b0 + b1*(-1) + b2*(-1) + b3*(-1) + b4*(-1) + b5*(1) + b6*(1) + b7*(1)
Note the numbers: for T: -1; for V: -1 -1 -1; for the interaction terms: 1 1 1.
And the SAS statement:
ESTIMATE '(2) T=No, V=Baseline, with interaction' INTERCEPT 1 T -1 V -1 -1 -1 T*V 1 1 1;
The estimate results can be found in the ODS table ‘Contrast Estimate Results’.
For step 3), subtract estimate (2) from estimate (1) to get the difference in log(score); for step 4), take the exponent of the difference from step 3).
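To see why these coefficients recover the baseline cell, here is a small numpy sketch (not SAS; the cell means are made-up numbers for illustration) that builds the effect-coded design matrix described above and applies the coefficients of estimate (1):

```python
import numpy as np

# Effect coding as in the 'Class Level Information' table:
# baseline visit -> (-1, -1, -1); visit 4 -> (1, 0, 0), etc.
v_codes = {"base": (-1, -1, -1), "v4": (1, 0, 0),
           "v3": (0, 1, 0), "v2": (0, 0, 1)}
t_codes = {"yes": 1, "no": -1}

# Hypothetical cell means on the log scale, keyed by (T, V)
cell_means = {("yes", "base"): 1.2, ("yes", "v2"): 1.5,
              ("yes", "v3"): 1.7, ("yes", "v4"): 2.0,
              ("no", "base"): 0.8, ("no", "v2"): 0.9,
              ("no", "v3"): 1.0, ("no", "v4"): 1.1}

# Design columns: intercept, T, V4, V3, V2, T*V4, T*V3, T*V2
rows, y = [], []
for (t, v), m in cell_means.items():
    tc, vc = t_codes[t], v_codes[v]
    rows.append([1, tc] + list(vc) + [tc * x for x in vc])
    y.append(m)
b = np.linalg.solve(np.array(rows, float), np.array(y))  # saturated: exact fit

# ESTIMATE '(1) ...' INTERCEPT 1 T 1 V -1 -1 -1 T*V -1 -1 -1
coeffs = np.array([1, 1, -1, -1, -1, -1, -1, -1], float)
est = coeffs @ b  # recovers the (T=yes, V=baseline) cell mean, 1.2
```

The coefficient vector is exactly the design row of the (T=yes, baseline) cell, which is why the contrast returns that cell's mean and not an average over visits.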
For the second research question:
The average difference in mood inventory change score for a person with TBI versus a person without, over the 4 study visits.
"Over the 4 study visits" means over all visit levels. By now you may have guessed that the statements are simpler:
ESTIMATE '(1) T=Yes, V=all levels' INTERCEPT 1 T 1;
ESTIMATE '(2) T=No, V=all levels' INTERCEPT 1 T -1;
Why are there no interaction terms? Because all visit levels are considered, and when all levels are considered you do not have to put any visit-related terms into the statement.
Finally, the above approach requires some manual calculation. It is possible to write a single ESTIMATE statement equivalent to the approach above, but the method discussed here is much easier to understand. For more sophisticated methods, please read Allison's book.
Reference:
1. Allison, Paul D. Logistic Regression Using SAS®: Theory and Application, Second Edition. Copyright © 2012, SAS Institute Inc.,Cary, North Carolina, USA.

SAS - Selecting optimal quantities

I'm trying to solve a problem in SAS where I have quantities of customers across a range of groups, and the quantities I select need to be as even across the different categories as possible. This will be easier to explain with a small table, which is a simplification of a much larger problem I'm trying to solve.
Here is the table:
Customer Category | Revenue Band | Churn Band | # Customers
A                 | 1            | 1          |        4895
A                 | 1            | 2          |         383
A                 | 1            | 3          |         222
A                 | 2            | 1          |          28
A                 | 2            | 2          |        2828
A                 | 2            | 3          |         232
B                 | 1            | 1          |        4454
B                 | 1            | 2          |         545
B                 | 1            | 3          |         454
B                 | 2            | 1          |        4534
B                 | 2            | 2          |         434
B                 | 2            | 3          |         454
Suppose I need to select 3000 customers from category A and 3000 from category B. Within each of A and B, I need to select an equal amount from revenue bands 1 and 2. If possible, I need to select a proportional amount across each of the churn bands 1, 2, and 3. Is there an elegant solution to this problem? I'm relatively new to SAS, and so far I've investigated OPTMODEL, but the examples are either too simple or too advanced to be of much use to me yet.
Edit: I've thought about using PROC SURVEYSELECT. I can use it to select equal sizes across the revenue bands. However, where I'm lacking customers in the individual churn bands, SURVEYSELECT may not select the maximum number of customers available where those numbers are low, and I'm back to manually selecting customers.
There are still some ambiguities in the problem statement, but I hope that the PROC OPTMODEL code below is a good start for you. I tried to add examples of many different features, so that you can toy around with the model and hopefully get closer to what you actually need.
Of the many things you could optimize, I am minimizing the maximum violation from your "If possible" goal, e.g.:
min MaxMismatch = MaxChurnMismatch;
I was able to model your constraints as a linear program, which means it should scale very well. You probably have other constraints you did not mention, but those would probably be beyond the scope of this site.
With the data you posted, you can see from the output of the print statements that the optimal penalty corresponds to choosing 1500 customers from A,1,1, where the ideal would be 1736. This is more expensive than ignoring the customers from several groups:
[1] ChooseByCat
A 3000
B 3000
[1] [2] [3] Choose IdealProportion
A 1 1 1500 1736.670
A 1 2 0 135.882
A 1 3 0 78.762
A 2 1 28 9.934
A 2 2 1240 1003.330
A 2 3 232 82.310
B 1 1 1500 1580.210
B 1 2 0 193.358
B 1 3 0 161.072
B 2 1 1500 1608.593
B 2 2 0 153.976
B 2 3 0 161.072
Proportion MaxChurnMisMatch
0.35478 236.67
That is probably not the ideal solution, but figuring out how to model your requirements exactly would not be as useful for this site. You can contact me offline if that is relevant.
I've added quotes from your problem statement as comments in the code below.
Have fun!
data custCounts;
input cat $ rev churn n;
datalines;
A 1 1 4895
A 1 2 383
A 1 3 222
A 2 1 28
A 2 2 2828
A 2 3 232
B 1 1 4454
B 1 2 545
B 1 3 454
B 2 1 4534
B 2 2 434
B 2 3 454
;
proc optmodel printlevel = 0;
set CATxREVxCHURN init {} inter {<'A',1,1>};
set CAT = setof{<c,r,ch> in CATxREVxCHURN} c;
num n{CATxREVxCHURN};
read data custCounts into CATxREVxCHURN=[cat rev churn] n;
put n[*]=;
var Choose{<c,r,ch> in CATxREVxCHURN} >= 0 <= n[c,r,ch]
, MaxChurnMisMatch >= 0, Proportion >= 0 <= 1
;
/* From OP:
Suppose I need to select 3000 customers from category A,
and 3000 customers from category B. */
num goal = 3000;
/* See "implicit slice" for the parenthesis notation, i.e. (c) below. */
impvar ChooseByCat{c in CAT} =
sum{<(c),r,ch> in CATxREVxCHURN} Choose[c,r,ch];
con MatchCatGoal{c in CAT}:
ChooseByCat[c] = goal;
/* From OP:
From the second category, within each A and B,
I need to select an equal amount from 1 and 2 */
con MatchRevenueGroupsWithinCat{c in CAT}:
sum{<(c),(1),ch> in CATxREVxCHURN} Choose[c,1,ch]
= sum{<(c),(2),ch> in CATxREVxCHURN} Choose[c,2,ch]
;
/* From OP:
If possible, I need to select a proportional amount
across each 1, 2, and 3 subcategories. */
con MatchBandProportion{<c,r,ch> in CATxREVxCHURN, sign in / 1 -1 /}:
MaxChurnMismatch >= sign * ( Choose[c,r,ch] - Proportion * n[c,r,ch] );
min MaxMismatch = MaxChurnMismatch;
solve;
print ChooseByCat;
impvar IdealProportion{<c,r,ch> in CATxREVxCHURN} = Proportion * n[c,r,ch];
print Choose IdealProportion;
print Proportion MaxChurnMismatch;
quit;
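For readers without SAS/OR, the same minimax LP can be sketched with SciPy's linprog. The variable ordering below is an assumption of this sketch: the 12 Choose values in the data order above, then MaxChurnMismatch, then Proportion.

```python
import numpy as np
from scipy.optimize import linprog

n = np.array([4895, 383, 222, 28, 2828, 232,     # category A (rev 1, then 2)
              4454, 545, 454, 4534, 434, 454])   # category B (rev 1, then 2)
nv = 14                                # 12 Choose + MaxMismatch + Proportion
c = np.zeros(nv); c[12] = 1.0          # minimize MaxMismatch

# Equalities: each category totals 3000; rev band 1 total = rev band 2 total
A_eq = np.zeros((4, nv)); b_eq = np.array([3000.0, 3000.0, 0.0, 0.0])
A_eq[0, 0:6] = 1;  A_eq[1, 6:12] = 1
A_eq[2, 0:3] = 1;  A_eq[2, 3:6] = -1
A_eq[3, 6:9] = 1;  A_eq[3, 9:12] = -1

# MaxMismatch >= +/-(Choose[i] - Proportion * n[i]) for every cell
A_ub, b_ub = [], []
for i in range(12):
    for s in (1.0, -1.0):
        row = np.zeros(nv)
        row[i] = s; row[13] = -s * n[i]; row[12] = -1.0
        A_ub.append(row); b_ub.append(0.0)

bounds = [(0, ni) for ni in n] + [(0, None), (0, 1)]
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(round(res.fun, 2))  # optimal penalty, matching the 236.67 printed above
```

The binding cells are A,1,1 (forced down to 1500) and A,2,2 (forced up to 1240 by the small caps on A,2,1 and A,2,3), which is exactly the trade-off visible in the OPTMODEL output.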

How to know if an index in a binary heap is on an odd level?

If I have a binary heap with the typical properties of the left child of position pos being 2*pos+1, the right child being 2*pos+2, and the parent node at (pos-1)/2, how can I efficiently determine whether a given index represents a node on an odd level (with the level of the root being level 0)?
(Disclaimer: This is a more complete answer based on Jarod42's comment.)
The formula you want is:
floor(log2(pos+1)) mod 2
To understand why, look at the levels of the first few nodes:
0 Level: 0
1 2 1
3 4 5 6 2
7 8 9 10 11 12 13 14 3
0 -> 0
1 -> 1
2 -> 1
3 -> 2
...
6 -> 2
7 -> 3
...
The first step is to find a function that maps node numbers to level numbers in this way. Adding one to the number and taking the base-2 logarithm gives you almost (but not quite) what you want:
log2 (0+1) = log2 1 = 0
log2 (1+1) = log2 2 = 1
log2 (2+1) = log2 3 = 1.6 (roughly)
log2 (3+1) = log2 4 = 2
....
log2 (6+1) = log2 7 = 2.8 (roughly)
log2 (7+1) = log2 8 = 3
You can see from this that rounding down to the nearest integer in each case will give you the level of each node, hence giving us floor(log2(pos+1)).
As Jarod42 said, it's then a case of looking at the parity of the level number, which just means taking it mod 2. This gives either 0 (the level is even) or 1 (the level is odd).
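As a sketch in Python, using bit_length to compute floor(log2(pos+1)) exactly in integer arithmetic (avoiding floating-point log, which can round the wrong way for very large indices):

```python
def level(pos):
    # floor(log2(pos + 1)) without floating point:
    # (pos + 1).bit_length() - 1 is the index of the highest set bit.
    return (pos + 1).bit_length() - 1

def is_odd_level(pos):
    return level(pos) % 2 == 1

print([level(p) for p in range(15)])              # 0, 1, 1, 2, 2, 2, 2, 3, ...
print([p for p in range(15) if is_odd_level(p)])  # indices 1, 2 and 7..14
```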

How to balance between two arrays such that the difference is minimized?

I have arrays A[] = {3,2,5,11,17} and B[] = {2,3,6}; the size of B is always less than the size of A. Now I have to map every element of B to a distinct element of A such that the total difference sum(abs(Bi - Aj)) is minimal (where Bi has been mapped to Aj). What type of algorithm is this?
For the example input, I could select 2->2 = 0, 3->3 = 0, and then 6->5 = 1, so the total cost is 0+0+1 = 1. I have been thinking of sorting both arrays and then taking the first size-of-B elements from A. Will this work?
This can be thought of as an unbalanced assignment problem.
The cost matrix is the absolute difference between the values of B[i] and A[j]. You can add dummy elements to B so that the problem becomes balanced, and give them very high associated costs.
Then the Hungarian algorithm can be applied to solve it.
For the example case A[]={3,2,5,11,17} and B[]={2,3,6} the cost matrix shall be:
. 3 2 5 11 17
2 1 0 3 9 15
3 0 1 2 8 14
6 3 4 1 5 11
d1 16 16 16 16 16
d2 16 16 16 16 16
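In practice the dummy rows can be skipped: SciPy's implementation of this algorithm accepts rectangular cost matrices directly and leaves the extra columns unassigned. A sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

A = [3, 2, 5, 11, 17]
B = [2, 3, 6]

cost = np.abs(np.subtract.outer(B, A))    # cost[i][j] = |B[i] - A[j]|
rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
print(cost[rows, cols].sum())             # minimum total difference: 1
```

For this input the optimal assignment is 2->2, 3->3, 6->5 with total cost 1, matching the hand-worked example in the question.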