How can I write an if condition for my variable in GLPK? - linear-programming

Here is my full problem:
Information:
*Max. total investment: $125
*Pay-off is the sum of the units bought x pay-off/unit
*Cost per investment: Buy-in cost + cost/unit x number of units if you buy at least one unit
*The cost is sum of the costs per investment
Constraints:
*You may not invest in both 2 and 5.
*You may invest in 1 only if you invest at least one of 2 and 3.
*You must invest at least two of 3,4,5.
*You may not invest more than max number of units.
Problem: Maximize profit : pay-off - cost
xi: # of units i ∈ {1,2,3,4,5}
yi=1 if xi>0 else yi=0
cost = sum{i in I} buyInCost_i * yi + cost-unit_i*xi
pay-off = sum{i in I} (pay-off/unit)_i*xi
profit = pay-off - cost
Maximize profit
Subject to
y2+y5 <= 1
y1<= y2+y3
y3+y4+y5 >= 2
x1<=5, x2<=4, x3<=5, x4<=7, x5<=3
cost<=125
Here is my question:
For example I have this binary variable y
yi=1 if xi>0 else yi=0 and i ∈ {1,2,3,4,5}
I declared i as a data set
set I;
data;
set I := 1 2 3 4 5;
I don't know how to add if else condition to y variable in glpk. Can you please help me out?
My modelling :
set I;
/*if x[i]>0 y[i]=1 else y[i]=0 ?????*/
var y{i in I}, binary;
param a{i in I};
/* buy-in cost of investment i */
param b{i in I};
/* cost per unit of investment i */
param c{i in I};
/* pay-off per unit of investment i */
param d{i in I};
/* max number of units of investment i */
var x{i in I} >=0;
/* Number of units that is bought of investment i */
var po := sum{i in I} c[i]*x[i];
var cost := sum{i in I} a[i]*y[i] + b[i]*x[i];
maximize profit: po-cost;
s.t. c1: y[2]+y[5]<=1;
s.t. c2: y[1]<y[2]+y[3];
s.t. c3: y[3]+y[4]+y[5]>=2;
s.t. c4: x[1]<=5
x[2]<=4
x[3]<=5
x[4]<=7
x[5]<=3;
s.t. c5: cost <=125;
s.t. c6{i in I}: M * y[i] > x[i]; // if condition of y[i]
set I := 1 2 3 4 5;
param a :=
1 25
2 35
3 28
4 20
5 40;
param b :=
1 5
2 7
3 6
4 4
5 8;
param c :=
1 15
2 25
3 17
4 13
5 18;
param d :=
1 5
2 4
3 5
4 7
5 3;
param M := 10000;
I am getting this syntax error:
problem.mod:21: syntax error in variable statement
Context: ...I } ; param d { i in I } ; var x { i in I } >= 0 ; var po :=
MathProg model processing error

You can't do that directly: there is no way to write an if condition as a constraint in an LP.
However, there are workarounds for this.
For example, you can write:
M * yi >= xi
where M is a constant at least as large as any value xi can take (the maximum number of units of investment i is the tightest choice). Note that the inequality must be non-strict: MathProg constraints only accept <=, >= and =, not < or >.
This way:
if xi > 0, the constraint forces yi > 0, that is yi == 1 since yi is binary.
if xi == 0, the constraint always holds, and the solver will choose yi = 0 whenever it can, since the buy-in cost attached to yi lowers the profit you are maximizing.
In both cases, the constraint is equivalent to the if test.
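To see the linking constraint at work outside the solver, here is a minimal Python sketch (not GLPK code). The function names and the values M = 5 and buy-in cost 25 are illustrative assumptions; the sketch simply enumerates which binary y satisfy M * y >= x and which one a cost-averse objective would pick.

```python
def feasible_y(x, big_m):
    """Return the binary values y that satisfy big_m * y >= x."""
    return [y for y in (0, 1) if big_m * y >= x]


def cheapest_y(x, big_m, buy_in_cost):
    """Among the feasible y, pick the one with the lowest buy-in cost.

    This mimics what a profit-maximising solver does when nothing else
    in the model forces y to 1.
    """
    return min(feasible_y(x, big_m), key=lambda y: buy_in_cost * y)


# Assumed illustration values: M = 5 (max units), buy-in cost = 25.
for x in (0, 1, 5):
    print(x, feasible_y(x, 5), cheapest_y(x, 5, 25))
```

With x = 0 both values of y are feasible and the cheaper one (0) wins; with any x > 0 only y = 1 remains feasible.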

Related

National competition in programming math problem

I encountered this problem practicing for an upcoming national competition. The problem goes as follows: You need to create a mixture of two ingredients being in relation to 1:1. You are given N different mixtures, each having its own weight Wi, and its relation in the mixture between the ingredients Mi, Ti (Each value, N, Wi, Mi, and Ti, will be less than 100). We need to find the biggest possible weight of the final mixture, keeping the relation to 1:1. We can take from each given mixture how much we want, we don't necessarily need to take the whole mixture, we can take some portion of it.
So with the given relation 1:1 in the final mixture, we know that we need to have an equal amount of weight from both ingredients possible. After that I need to know if I take K grams of some mixture, how much weight that is for ingredients A and B. So I came up with the following formula:
Let W be the weight in grams, and M and T be the relation between the ingredients respectively. If we want to take K (K <= W) grams we have the following:
Weight of ingredient A = M * (K / (M+T))
Weight of ingredient B = T * (K / (M+T))
#include <bits/stdc++.h>
using namespace std;
class state{
public:
    int weight;
    int A;
    int B;
};
int n;
vector<state> arr;
double ans= 0;
void f(double weight_A, double weight_B, int idx){
    if(weight_A == weight_B)
        ans = max(ans, weight_A + weight_B);
    if(idx >= n)
        return;
    int weight = arr[idx].weight, relA = arr[idx].A, relB = arr[idx].B;
    for(int K = 0; K <= weight; K++){
        f(weight_A + relA * (K * 1.0/(relA + relB)), weight_B + relB * (K * 1.0/(relA + relB)), idx+1);
    }
}
int main(){
    cin>>n;
    for(int i = 0; i < n; i++){
        state in;
        cin>>in.weight>>in.A>>in.B;
        arr.push_back(in);
    }
    f(0.0, 0.0, 0);
    cout<<fixed<<setprecision(8);
    cout<<ans<<endl;
}
The problem I encountered was that we don't necessarily need to take integer weights; sometimes, to achieve the maximum possible weight of the final product, we need to take decimal weights. Let's take a look at this example:
5
14 3 2
4 1 3
4 2 2
6 6 1
10 4 3
We have N = 5, and in each row are given 3 integers, Wi, Mi, and Ti. The weight of the ith mixture and its relation. My solution for this example gives 20.0000, and the correct solution for the above example is 20.85714286. Looking back my initial idea won't work because of the decimal numbers. I suppose there is some formula but I can't figure it out, can anyone help?
This is a Linear Programming problem, so you can solve it by writing the problem in standard form and then solving it with an optimization algorithm such as the simplex algorithm.
The objective is to maximize the quantity of medicine (from the original problem), that is the sum of quantities taken from each jar (I'll call the quantities x1, x2, ...).
The quantities are bounded to be lower than the weight Wi available in each jar.
The constraint is that the total amount of honey (first ingredient) is equal to the total amount of tahini (second ingredient). This would mean that:
sum(Mi/(Mi+Ti)*xi) = sum(Ti/(Mi+Ti)*xi)
You can take the second summation to the LHS and get:
sum((Mi-Ti)/(Mi+Ti)*xi) = 0
To get integer multipliers, just multiply everything by the least common multiple of the denominators, lcm(Mi+Ti), and then divide by the gcd of the coefficients.
Using your example, the constraint would be:
(3-2)/(3+2) x1 + (1-3)/(1+3) x2 + (2-2)/(2+2) x3 + (6-1)/(6+1) x4 + (4-3)/(4+3) x5 = 0
that is
1/5 x1 -2/4 x2 + 0/4 x3 + 5/7 x4 + 1/7 x5 = 0
Multiply by the lcm(5,4,4,7,7)=140:
28 x1 -70 x2 + 0 x3 + 100 x4 + 20 x5 = 0
divide by 2:
14 x1 -35 x2 +0 x3 + 50 x4 + 10 x5 = 0
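The scaling step can be reproduced with exact fractions. The following Python sketch is only an illustration; the function name integer_coefficients is made up for this example. Note that Fraction reduces -2/4 to -1/2, so the intermediate lcm is 70 rather than 140, but the final integer coefficients are the same.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_coefficients(ratios):
    """ratios: list of (M, T) pairs. Returns integer coefficients proportional
    to (M - T) / (M + T), with their common factor removed."""
    fracs = [Fraction(m - t, m + t) for m, t in ratios]
    # Least common multiple of all (reduced) denominators.
    lcm = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs), 1)
    scaled = [int(f * lcm) for f in fracs]
    # Greatest common divisor of the scaled coefficients (0 if all are zero).
    common = reduce(gcd, (abs(c) for c in scaled), 0)
    return [c // common for c in scaled] if common else scaled


print(integer_coefficients([(3, 2), (1, 3), (2, 2), (6, 1), (4, 3)]))
# [14, -35, 0, 50, 10]
```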
We are ready to solve the problem. Let's write it in CPLEX LP format:
maximize
quantity: x1 + x2 + x3 + x4 + x5
subject to
mix: 14 x1 -35 x2 +0 x3 + 50 x4 + 10 x5 = 0
bounds
x1 <= 14
x2 <= 4
x3 <= 4
x4 <= 6
x5 <= 10
end
Feed it to GLPK:
#include <stdio.h>
#include <stdlib.h>
#include <glpk.h>
int main(void)
{
    glp_prob *P;
    P = glp_create_prob();
    glp_read_lp(P, NULL, "problem.cplex");
    glp_adv_basis(P, 0);
    glp_simplex(P, NULL);
    glp_print_sol(P, "output.txt");
    glp_delete_prob(P);
    return 0;
}
And the output is:
Problem:
Rows: 1
Columns: 5
Non-zeros: 4
Status: OPTIMAL
Objective: quantity = 20.85714286 (MAXimum)
No. Row name St Activity Lower bound Upper bound Marginal
------ ------------ -- ------------- ------------- ------------- -------------
1 mix NS 0 0 = 0.0714286
No. Column name St Activity Lower bound Upper bound Marginal
------ ------------ -- ------------- ------------- ------------- -------------
1 x1 B 2.85714 0 14
2 x2 NU 4 0 4 3.5
3 x3 NU 4 0 4 1
4 x4 NL 0 0 6 -2.57143
5 x5 NU 10 0 10 0.285714
Karush-Kuhn-Tucker optimality conditions:
KKT.PE: max.abs.err = 0.00e+00 on row 0
max.rel.err = 0.00e+00 on row 0
High quality
KKT.PB: max.abs.err = 0.00e+00 on row 0
max.rel.err = 0.00e+00 on row 0
High quality
KKT.DE: max.abs.err = 0.00e+00 on column 0
max.rel.err = 0.00e+00 on column 0
High quality
KKT.DB: max.abs.err = 0.00e+00 on row 0
max.rel.err = 0.00e+00 on row 0
High quality
End of output
Of course given your input you should construct the problem in memory and feed it to the simplex algorithm without going through a file. Additionally, there's no need to get integer coefficients, it was just to allow a nicer problem formulation.
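Because this particular LP has a single equality constraint, it can also be solved without a solver. The Python sketch below is an alternative technique, not the simplex approach used above: it takes every 1:1 mixture in full, then cancels as much excess as the weaker side allows, using the least-skewed mixtures first (a fractional-knapsack argument). The function name is made up, and it assumes every mixture has M + T > 0.

```python
from fractions import Fraction


def max_balanced_weight(mixtures):
    """mixtures: list of (W, M, T). Returns the largest total weight whose
    two ingredients are in a 1:1 ratio, as an exact Fraction."""
    total = Fraction(0)
    surplus = {+1: [], -1: []}  # (skew per gram, available weight) by sign of M - T
    for w, m, t in mixtures:
        rate = Fraction(m - t, m + t)  # excess of the first ingredient per gram
        if rate == 0:
            total += w  # already 1:1, take all of it
        else:
            surplus[1 if rate > 0 else -1].append((abs(rate), Fraction(w)))
    # The excess that can be cancelled is limited by the weaker side.
    budget = min(sum(r * w for r, w in side) for side in surplus.values())
    for side in surplus.values():
        remaining = budget
        for rate, w in sorted(side):  # least-skewed mixtures give most weight per unit of excess
            take = min(w, remaining / rate)
            total += take
            remaining -= take * rate
    return total


example = [(14, 3, 2), (4, 1, 3), (4, 2, 2), (6, 6, 1), (10, 4, 3)]
print(float(max_balanced_weight(example)))  # 146/7, about 20.857
```

On the example from the question this reproduces the simplex result: x2, x3 and x5 are taken in full, x4 is skipped, and 20/7 grams are taken from the first mixture.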

how can we find the nth 3 word combination from a word corpus of 3000 words

I have a word corpus of say 3000 words such as [hello, who, this ..].
I want to find the nth 3 word combination from this corpus. I am fine with any order as long as the algorithm gives consistent output.
What would be the time complexity of the algorithm?
I have seen this answer but was looking for something simple.
(Note that I will be using 1-based indexes and ranks throughout this answer.)
To generate all combinations of 3 elements from a list of n elements, we'd take all elements from 1 to n-2 as the first element, then for each of these we'd take all elements after the first element up to n-1 as the second element, then for each of these we'd take all elements after the second element up to n as the third element. This gives us a fixed order, and a direct relation between the rank and a specific combination.
If we take element i as the first element, there are (n-i choose 2) possibilities for the second and third element, and thus (n-i choose 2) combinations with i as the first element. If we then take element j as the second element, there are (n-j choose 1) = n-j possibilities for the third element, and thus n-j combinations with i and j as the first two elements.
Linear search in tables of binomial coefficients
With tables of these binomial coefficients, we can quickly find a specific combination, given its rank. Let's look at a simplified example with a list of 10 elements; these are the number of combinations with element i as the first element:
i
1 C(9,2) = 36
2 C(8,2) = 28
3 C(7,2) = 21
4 C(6,2) = 15
5 C(5,2) = 10
6 C(4,2) = 6
7 C(3,2) = 3
8 C(2,2) = 1
---
120 = C(10,3)
And these are the number of combinations with element j as the second element:
j
2 C(8,1) = 8
3 C(7,1) = 7
4 C(6,1) = 6
5 C(5,1) = 5
6 C(4,1) = 4
7 C(3,1) = 3
8 C(2,1) = 2
9 C(1,1) = 1
So if we're looking for the combination with e.g. rank 96, we look at the number of combinations for each choice of first element i, until we find which group of combinations the combination ranked 96 is in:
i
1 36 96 > 36 96 - 36 = 60
2 28 60 > 28 60 - 28 = 32
3 21 32 > 21 32 - 21 = 11
4 15 11 <= 15
So we know that the first element i is 4, and that within the 15 combinations with i=4, we're looking for the eleventh combination. Now we look at the number of combinations for each choice of second element j, starting after 4:
j
5 5 11 > 5 11 - 5 = 6
6 4 6 > 4 6 - 4 = 2
7 3 2 <= 3
So we know that the second element j is 7, and that the third element is the second combination with j=7, which is k=9. So the combination with rank 96 contains the elements 4, 7 and 9.
Binary search in tables of running total of binomial coefficients
Instead of creating a table of the binomial coefficients and then performing a linear search, it is of course more efficient to create a table of the running totals of the binomial coefficients, and then perform a binary search on it. This will improve the time complexity from O(N) to O(logN); in the case of N=3000, each of the two look-ups can be done in about log2(3000) ≈ 12 steps.
So we'd store:
i
1 36
2 64
3 85
4 100
5 110
6 116
7 119
8 120
and:
j
2 8
3 15
4 21
5 26
6 30
7 33
8 35
9 36
Note that when finding j in the second table, you have to subtract the sum corresponding with i from the sums. Let's walk through the example of rank 96 and combination [4,7,9] again; we find the first value that is greater than or equal to the rank:
3 85 96 > 85
4 100 96 <= 100
So we know that i=4; we then subtract the previous sum next to i-1, to get:
96 - 85 = 11
Now we look at the table for j, but we start after j=4, and subtract the sum corresponding to 4, which is 21, from the sums. Then again, we find the first value that is greater than or equal to the rank we're looking for (which is now 11):
6 30 - 21 = 9 11 > 9
7 33 - 21 = 12 11 <= 12
So we know that j=7; we subtract the previous sum corresponding to j-1, to get:
11 - 9 = 2
So we know that the second element j is 7, and that the third element is the second combination with j=7, which is k=9. So the combination with rank 96 contains the elements 4, 7 and 9.
Hard-coding the look-up tables
It is of course unnecessary to generate these look-up tables again every time we want to perform a look-up. We only need to generate them once, and then hard-code them into the rank-to-combination algorithm; this should take only 2998 * 64-bit + 2998 * 32-bit = 35kB of space, and make the algorithm incredibly fast.
Inverse algorithm
The inverse algorithm, to find the rank given a combination of elements [i,j,k], then means:
1. Find the index of the elements in the list; if the list is sorted (e.g. words sorted alphabetically) this can be done with a binary search in O(logN).
2. Find the sum in the table for i that corresponds with i-1.
3. Add to that the sum in the table for j that corresponds with j-1, minus the sum that corresponds with i.
4. Add to that k-j.
Let's look again at the same example with the combination of elements [4,7,9]:
i=4 -> table_i[3] = 85
j=7 -> table_j[6] - table_j[4] = 30 - 21 = 9
k=9 -> k-j = 2
rank = 85 + 9 + 2 = 96
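For readers who prefer Python, here is a small sketch of both directions for an arbitrary list size n, using the same 1-based tables of running totals and the standard bisect module for the binary search. The function names are made up for this illustration, and the tables are built on the fly rather than hard-coded.

```python
from bisect import bisect_left
from itertools import accumulate


def build_tables(n):
    """Running totals for the first element i and the second element j.

    Both tables are 1-based; the leading zeros are padding so that
    i_table[i - 1] and j_table[i] are always defined.
    """
    i_table = [0] + list(accumulate((n - i) * (n - i - 1) // 2 for i in range(1, n - 1)))
    j_table = [0, 0] + list(accumulate(n - j for j in range(2, n)))
    return i_table, j_table


def rank_to_combination(rank, i_table, j_table):
    i = bisect_left(i_table, rank, 1)        # first i whose running total >= rank
    rank += j_table[i] - i_table[i - 1]      # rank inside group i, shifted into j_table
    j = bisect_left(j_table, rank, i + 1)    # first j > i whose running total >= rank
    return i, j, j + rank - j_table[j - 1]


def combination_to_rank(i, j, k, i_table, j_table):
    return i_table[i - 1] + j_table[j - 1] - j_table[i] + k - j


i_table, j_table = build_tables(10)
print(rank_to_combination(96, i_table, j_table))       # (4, 7, 9)
print(combination_to_rank(4, 7, 9, i_table, j_table))  # 96
```

For n = 10 the two tables are exactly the running totals listed above, and the walk-through for rank 96 gives the same combination [4,7,9].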
Look-up tables for N=3000
This snippet generates the look-up table with the running total of the binomial coefficients for i = 1 to 2998:
function C(n, k) { // binomial coefficient (Pascal's triangle)
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;
    if (! C.t) C.t = [[1]];
    while (C.t.length <= n) {
        C.t.push([1]);
        var l = C.t.length - 1;
        for (var i = 1; i < l / 2; i++)
            C.t[l].push(C.t[l - 1][i - 1] + C.t[l - 1][i]);
        if (l % 2 == 0)
            C.t[l].push(2 * C.t[l - 1][(l - 2) / 2]);
    }
    return C.t[n][k];
}
for (var total = 0, x = 2999; x > 1; x--) {
    total += C(x, 2);
    document.write(total + ", ");
}
This snippet generates the look-up table with the running total of the binomial coefficients for j = 2 to 2999:
for (var total = 0, x = 2998; x > 0; x--) {
total += x;
document.write(total + ", ");
}
Code example
Here's a quick code example, unfortunately without the full hardcoded look-up tables, because of the size restriction on answers on SO. Run the snippets above and paste the results into the arrays iTable and jTable (after the leading zeros) to get the faster version with hard-coded look-up tables.
function combinationToRank(i, j, k) {
    return iTable[i - 1] + jTable[j - 1] - jTable[i] + k - j;
}
function rankToCombination(rank) {
    var i = binarySearch(iTable, rank, 1);
    rank -= iTable[i - 1];
    rank += jTable[i];
    var j = binarySearch(jTable, rank, i + 1);
    rank -= jTable[j - 1];
    var k = j + rank;
    return [i, j, k];
    function binarySearch(array, value, first) {
        var last = array.length - 1;
        while (first < last - 1) {
            var middle = Math.floor((last + first) / 2);
            if (value > array[middle]) first = middle;
            else last = middle;
        }
        return (value <= array[first]) ? first : last;
    }
}
var iTable = [0]; // append look-up table values here
var jTable = [0, 0]; // and here
// remove this part when using hard-coded look-up tables
function C(n,k){if(k<0||k>n)return 0;if(k>n-k)k=n-k;if(!C.t)C.t=[[1]];while(C.t.length<=n){C.t.push([1]);var l=C.t.length-1;for(var i=1;i<l/2;i++)C.t[l].push(C.t[l-1][i-1]+C.t[l-1][i]);if(l%2==0)C.t[l].push(2*C.t[l-1][(l-2)/2])}return C.t[n][k]}
for (var iTotal = 0, jTotal = 0, x = 2999; x > 1; x--) {
iTable.push(iTotal += C(x, 2));
jTable.push(jTotal += x - 1);
}
document.write(combinationToRank(500, 1500, 2500) + "<br>");
document.write(rankToCombination(1893333750) + "<br>");

Need help writing estimates statements in proc genmod

I'm using proc genmod to predict an outcome measured at 4 time points. The outcome is a total score on a mood inventory, which can range from 0 to 82. A lot of participants have a score of 0, so the negative binomial distribution in proc genmod seemed like a good fit for the data.
Now, I'm struggling with how to write/interpret the estimates statements. The primary predictors are TBI status at baseline (0=no/1=yes), and visit (0=baseline, 1=second visit, 2=third visit, 4=fourth visit), and an interaction of TBI status and visit.
How do I write my estimates, such that I'm getting out:
1. the average difference in mood inventory score for person with TBI versus a person without, at baseline.
and
2. the average difference in mood inventory change score for a person with TBI versus a person without, over the 4 study visits?
Below is what I have thus far, but I'm not sure how to interpret the output, also below, if indeed my code is correct.:
proc genmod data = analyze_long_3 ;
class id screen_tbi (param = ref ref = first) ;
model nsi_total = visit_cent screen_tbi screen_tbi*visit_cent /dist=negbin ;
output predicted = predstats;
repeated subject=id /type=cs;
estimate "tbi" intercept 1 visit_cent 0 0 0 0 screen_tbi 1 0 /exp;
estimate "no tbi" intercept 1 visit_cent 0 0 0 0 screen_tbi 0 1 /exp;
estimate 'longitudinal TBI' intercept 1
visit_cent -1 1 1 1
screen_tbi 1 0
screen_tbi*visit_cent 1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0 / exp;
estimate 'longitudinal no TBI ' intercept 1
visit_cent -1 1 1 1
screen_tbi 0 1
screen_tbi*visit_cent 0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1 / exp;
where sample = 1 ;
run;
The first research question is to have the average difference score, at baseline, for person with TBI versus a person without. It can be achieved by the following steps:
1) Get the estimated average log (score) when TBI = yes, and Visit = baseline;
2) Get the estimated average log (score) when TBI = no, and Visit =baseline;
3) 1) – 2) to have the difference in log(score) values
4) Exp[3)], which turns the difference in logs into a ratio of expected scores (and hence a percentage change in scores)
To simplify, let T=TBI levels, and V = Visit Levels. One thing to clarify, in your post, there are 4 visit points, the first as reference; therefore there should be 3 parameters for V, not four.
Taking the example of step 1), let’s try to write the ESTIMATE statement. It is a bit tricky. At first it sounds like this (T=0 and V =0 as reference):
ESTIMATE 'Overall average' intercept 1 T 1 V 0 0 0;
But it is wrong. In the above statement, all coefficients for V are set to 0. When all coefficients are 0, it is the same as leaving V out of the statement:
ESTIMATE 'Overall average' intercept 1 T 1;
This is not the estimate of the average for T=1 at the baseline level. Rather, it produces an average for T=1 regardless of visit, that is, an average over all visit levels.
The problem is that the reference is coded as V=0. In that case, SAS cannot tell the difference between an estimate for the reference level and an estimate over all levels; it always estimates the average over all levels. To solve this, the reference has to be coded as -1, i.e., T=-1 and V=-1 as reference, so that the statement looks like:
ESTIMATE 'Average of T=1 V=baseline' intercept 1 T 1 V -1 -1 -1;
Now SAS understands that the job is to get the average at the baseline level, not over all levels.
To code the reference level as -1 instead of 0, the option in the CLASS statement should be PARAM=EFFECT, not PARAM=REF. That brings another problem: once PARAM is not set to REF, SAS will ignore the user-defined references. For example:
CLASS id T (ref='…') V (ref='…') / PARAM=EFFECT;
The (ref='…') is ignored when PARAM=EFFECT. So how do you make SAS use TBI=No and Visit=baseline as the references? SAS automatically takes the last level as the reference. For example, if the variable T is sorted in ascending order, the value -1 comes first and the value 1 comes last; therefore 1 will be the reference. Conversely, if T is sorted in descending order, the value -1 comes last and will be used as the reference. This is achieved with the DESCENDING option in the CLASS statement.
CLASS id T V / PARAM=EFFECT DESCENDING;
That way, the parameters are ordered as:
T 1 (TBI =1)
T -1 (ref level of TBI, i.e., TBI=no)
V 1 0 0 (for visit =4)
V 0 1 0 (visit = 3)
V 0 0 1 (visit =2)
V -1 -1 -1 (this is the ref level, visit=baseline)
The above information is reported in the ODS table 'Class Level Information'. It is always good to check this table each time after running PROC GENMOD. Note that the level (visit = 4) comes before the level (visit = 3), which in turn comes before (visit = 2).
Now, let’s talk a bit about the parameters and the model equation. As you might know, in SAS, the V for multi-levels is indeed broken down into dummy Vs. If baseline is set as ref level, the dummies will be like:
V4 = the fourth visit or baseline
V3= the third visit, or baseline
V2 = the second visit or baseline
Accordingly, the equation can be written as:
LOG(s) = b0 + b1*T + b2*V4 + b3*V3 + b4*V2
where:
s = the total score on a mood inventory
T = 1 for TBI status of yes, = -1 for TBI status of no
V4 = 1 for the fourth visit, = -1 for baseline
V3 = 1 for the third visit, =-1 for baseline
V2 = 1 for the second visit, = -1 for the baseline
b0 to b4 are beta estimates for the parameters
Of note, the order in the model is the same as the order defined in the statement CLASS, and the same as the order in the ODS table ‘Class Level Information’. The V4, V3, V2 have to appear in the model, all or none, i.e., if the VISIT term is to be included, V4 V3 V2 should be all introduced into the model equation. If the VISIT term is not included, none of V4, V3, and V2 should be in the equation.
With interaction terms, 3 more dummy terms must be created:
T_V4 = T*V4
T_V3 = T*V3
T_V2 = T*V2
Hence the equation with interaction terms:
Log(s) = b0 + b1*T + b2*V4 + b3*V3 + b4*V2 + b5*T_V4 + b6* T_V3 + b7* T_V2
The SAS ESTIMATE statement corresponds to the model equation.
For example, to estimate an overall average over all parameters and all levels, the equation is:
[Log(S)] = b0 ;
where [Log(S)] stands for the expected LOG(score). Accordingly, the statement is:
ESTIMATE 'overall (all levels of T and V)' INTERCEPT 1;
In the above statement, 'INTERCEPT' corresponds to 'b0' in the equation.
To estimate an average of log(score) for T=1, over all levels of visit, the equation is:
[LOG(S)] = b0 + b1 * T = b0 + b1 * 1
And the statement is:
ESTIMATE 'T=Yes, V=all levels' INTERCEPT 1 T 1;
In the above case, 'T 1' in the statement corresponds to the part "*1" in the equation (i.e., let T=1).
To estimate an average of log (score) for T =1, and for visit = baseline, the equation is:
[Log(s)] = b0 + b1*T + b2*V4 + b3*V3 + b4*V2
= b0 + b1*(1) + b2*(-1)+ b3*(-1) + b4*(-1)
The statement is:
ESTIMATE 'T=Yes, V=Baseline' INTERCEPT 1 T 1 V -1 -1 -1;
'V -1 -1 -1' in the statement corresponds to the values of V4, V3, and V2 in the equation. We mentioned above that the dummies V4, V3 and V2 must all be introduced into the model. That is why the V term always takes three numbers, such as 'V -1 -1 -1' or 'V 1 1 1'. SAS will give a warning in the log if you write 'V -1 -1 -1 -1', because there are four '-1's, one more than required; in that case, the extra '-1' is ignored. By contrast, 'V 1 1' is fine; it is the same as 'V 1 1 0'. But what does 'V 1 1 0' mean? To figure that out, you have to read Allison's book (see reference).
For now, let’s carry on, and add the interaction terms. The equation:
[Log(s)] = b0 + b1*T + b2*V4 + b3*V3 + b4*V2 + b5*T_V4 + b6*T_V3 + b7*T_V2
As T_V4 = T*V4 = 1 * (-1) = -1, similarly T_V3 = -1, T_V2=-1, substitute into the equation:
[Log(s)] = b0 + b1*1 + b2*(-1)+ b3*(-1)+ b4*(-1)+ b5*(-1) + b6*(-1) + b7*(-1)
The statement is:
ESTIMATE '(1) T=Yes, V=Baseline, with interaction' INTERCEPT 1 T 1 V -1 -1 -1 T*V -1 -1 -1;
The 'T*V -1 -1 -1' corresponds to the values of T_V4, T_V3 and T_V2 in the equation.
And that is the statement for step 1)!
Step 2) follows the same reasoning: get the estimated average log(score) when TBI = no and Visit = baseline.
T = -1, V4=-1, V3=-1, V2=-1.
T_V4 = T * V4 = (-1) * (-1) = 1
T_V3 = T * V3 = (-1) * (-1) = 1
T_V2 = T * V2 = (-1) * (-1) = 1
Substituting the values in the equation:
[Log(s)] = b0 + b1*(-1) + b2*(-1) + b3*(-1) + b4*(-1) + b5*(1) + b6*(1) + b7*(1)
Note the numbers: for T: -1; for V: -1 -1 -1; for the interaction terms: 1 1 1.
And the SAS statement:
ESTIMATE '(2) T=No, V=Baseline, with interaction' INTERCEPT 1 T -1 V -1 -1 -1 T*V 1 1 1;
The estimate results can be found in the ODS table 'Contrast Estimate Results'.
For step 3), subtract estimate (2) from estimate (1) to get the difference in log(score); for step 4), take the exponential of the difference from step 3).
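The arithmetic behind steps 1) to 4) can be checked with a short Python sketch. It is not SAS code: it only builds the effect-coded coefficient rows described above (in the order b0, T, V4, V3, V2, T_V4, T_V3, T_V2) and applies them to a set of beta values. The helper names are made up, visits are assumed to be numbered 1 (baseline) to 4, and the betas are hypothetical placeholders, not fitted estimates.

```python
import math

VISIT_DUMMIES = (4, 3, 2)  # order of V4, V3, V2 in the model equation


def effect_row(tbi, visit):
    """Coefficients for [b0, T, V4, V3, V2, T_V4, T_V3, T_V2] under effect coding."""
    t = 1 if tbi else -1
    if visit == 1:  # baseline is the reference level, coded -1 -1 -1
        v = [-1, -1, -1]
    else:
        v = [1 if visit == level else 0 for level in VISIT_DUMMIES]
    return [1, t] + v + [t * x for x in v]


def linear_predictor(betas, row):
    """Expected log(score) for one coefficient row."""
    return sum(b * c for b, c in zip(betas, row))


row_tbi = effect_row(True, 1)       # step 1: TBI = yes at baseline
row_no_tbi = effect_row(False, 1)   # step 2: TBI = no at baseline
contrast = [a - b for a, b in zip(row_tbi, row_no_tbi)]  # step 3 as a single row

# Hypothetical beta values for illustration only, NOT output from PROC GENMOD.
betas = [2.0, 0.3, -0.1, 0.05, 0.02, 0.04, -0.03, 0.01]
log_diff = linear_predictor(betas, row_tbi) - linear_predictor(betas, row_no_tbi)

print(row_tbi)             # [1, 1, -1, -1, -1, -1, -1, -1]
print(row_no_tbi)          # [1, -1, -1, -1, -1, 1, 1, 1]
print(contrast)            # [0, 2, 0, 0, 0, -2, -2, -2]
print(math.exp(log_diff))  # step 4: ratio of expected scores
```

The contrast row shows that the intercept and the V terms cancel in the subtraction, leaving only the T and T*V coefficients.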
For the second research question:
The average difference in mood inventory change score for a person with TBI versus a person without, over the 4 study visits.
"Over the 4 study visits" means over all visit levels. By now, you may have guessed that the statements are simpler:
ESTIMATE '(1) T=Yes, V=all levels' INTERCEPT 1 T 1;
ESTIMATE '(2) T=No, V=all levels' INTERCEPT 1 T -1;
Why are there no interaction terms? Because all visit levels are considered, and when all levels are considered, you do not have to put any visit-related terms into the statement.
Finally, the above approach requires some manual calculation. It is possible to write a single ESTIMATE statement that is equivalent to the approach above. However, the method discussed here is much easier to understand. For more sophisticated methods, please read Allison's book.
Reference:
1. Allison, Paul D. Logistic Regression Using SAS: Theory and Application, Second Edition. SAS Institute Inc., Cary, North Carolina, USA, 2012.

Histogram of the distribution of dice rolls

I saw a question on careercup, but I do not get the answer I want there. I wrote an answer myself and want your comment on my analysis of time complexity and comment on the algorithm and code. Or you could provide a better algorithm in terms of time. Thanks.
You are given d > 0 fair dice with n > 0 "sides"; write a function that returns a histogram of the frequency of the results of the dice rolls.
For example, for 2 dice, each with 3 sides, the results are:
(1, 1) -> 2
(1, 2) -> 3
(1, 3) -> 4
(2, 1) -> 3
(2, 2) -> 4
(2, 3) -> 5
(3, 1) -> 4
(3, 2) -> 5
(3, 3) -> 6
And the function should return:
2: 1
3: 2
4: 3
5: 2
6: 1
My solution: the time complexity of a brute-force depth-first search is O(n^d). However, you can use dynamic programming (DP) to solve this problem. For example, take d=3 and n=3. You can use the result for d==1 when computing d==2:
d==1
num  #
1    1
2    1
3    1
d==2
first roll      second roll is 1
num  #          num  #
1    1          2    1
2    1    ->    3    1
3    1          4    1
first roll      second roll is 2
num  #          num  #
1    1          3    1
2    1    ->    4    1
3    1          5    1
first roll      second roll is 3
num  #          num  #
1    1          4    1
2    1    ->    5    1
3    1          6    1
Therefore, after the second roll:
num  #
2    1
3    2
4    3
5    2
6    1
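The time complexity of this DP algorithm is
SUM_{i=1..d} {n*[n(i-1)-(i-1)+1]} ~ O(n^2*d^2)
where n(i-1)-(i-1)+1 is the number of possible sums of the previous i-1 dice, which range from i-1 to n(i-1) (e.g. with n=3 and two dice already rolled, the previous sums range from 2 to 6).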
The code is written in C++ as follows
vector<pair<int,long long>> diceHisto(int numSide, int numDice) {
    int n = numSide*numDice;
    vector<long long> cur(n+1,0), nxt(n+1,0);
    for(int i=1; i<=numSide; i++) cur[i]=1;
    for(int i=2; i<=numDice; i++) {
        int start = i-1, end = (i-1)*numSide; // range of previous sum of rolls
        //cout<<"start="<<start<<" end="<<end<<endl;
        for(int j=1; j<=numSide; j++) {
            for(int k=start; k<=end; k++)
                nxt[k+j] += cur[k];
        }
        swap(cur,nxt);
        for(int j=start; j<=end; j++) nxt[j]=0;
    }
    vector<pair<int,long long>> result;
    for(int i=numDice; i<=numSide*numDice; i++)
        result.push_back({i,cur[i]});
    return result;
}
You can do it in O(n*d^2). First, note that the generating function for an n-sided die is p(n) = x+x^2+x^3+...+x^n, and that the distribution for d throws has generating function p(n)^d. Representing the polynomials as arrays, you need O(nd) coefficients, and multiplying by p(n) can be done in a single pass in O(nd) time by keeping a rolling sum.
Here's some python code that implements this. It has one non-obvious optimisation: it throws out a factor x from each p(n) (or equivalently, it treats the dice as having faces 0,1,2,...,n-1 rather than 1,2,3,...,n) which is why d is added back in when showing the distribution.
def dice(n, d):
    r = [1] + [0] * (n-1) * d
    nr = [0] * len(r)
    for k in range(d):
        t = 0
        for i in range(len(r)):
            t += r[i]
            if i >= n:
                t -= r[i-n]
            nr[i] = t
        r, nr = nr, r
    return r
def show_dist(n, d):
    for i, k in enumerate(dice(n, d)):
        if k: print(i + d, k)
show_dist(6, 3)
The time and space complexity are easy to see: there are nested loops with d and (n-1)*d iterations, so the time complexity is O(n*d^2), and there are two arrays of size O(nd) and no other allocation, so the space complexity is O(nd).
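A quick way to gain confidence in the rolling-sum idea is to compare it with brute-force enumeration on small inputs. The sketch below is a self-contained Python 3 restatement of the technique, with function names chosen for this illustration; the brute-force version is only practical for small n and d, since it enumerates all n^d outcomes.

```python
from collections import Counter
from itertools import product


def histogram_rolling(n, d):
    """Histogram of the sum of d dice with n sides, using a rolling window sum."""
    counts = [1] + [0] * ((n - 1) * d)  # faces treated as 0..n-1; d is added back at the end
    for _ in range(d):
        window, nxt = 0, [0] * len(counts)
        for i, c in enumerate(counts):
            window += c
            if i >= n:
                window -= counts[i - n]  # drop the term that left the window of n faces
            nxt[i] = window
        counts = nxt
    return {i + d: c for i, c in enumerate(counts) if c}


def histogram_brute(n, d):
    """Same histogram by enumerating every possible roll (O(n^d))."""
    return dict(Counter(sum(roll) for roll in product(range(1, n + 1), repeat=d)))


print(histogram_rolling(3, 2))  # {2: 1, 3: 2, 4: 3, 5: 2, 6: 1}
print(histogram_rolling(6, 3) == histogram_brute(6, 3))  # True
```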
Just in case, here is a simple example in Python using the OpenTURNS library.
import openturns as ot
d = 2 # number of dice
n = 6 # number of sides per die
# possible values
dice_distribution = ot.UserDefined([[i] for i in range(1, n + 1)])
# create the sum distribution d times the sum
sum_distribution = sum([dice_distribution] * d)
That's it!
print(sum_distribution)
will show you all the possible values and their corresponding probabilities:
>>> UserDefined(
{x = [2], p = 0.0277778},
{x = [3], p = 0.0555556},
{x = [4], p = 0.0833333},
{x = [5], p = 0.111111},
{x = [6], p = 0.138889},
{x = [7], p = 0.166667},
{x = [8], p = 0.138889},
{x = [9], p = 0.111111},
{x = [10], p = 0.0833333},
{x = [11], p = 0.0555556},
{x = [12], p = 0.0277778}
)
You can also draw the probability distribution function:
sum_distribution.drawPDF()

need algorithm to find the nth palindromic number

consider that
0 -- is the first
1 -- is the second
2 -- is the third
.....
9 -- is the 10th
11 -- is the 11th
what is an efficient algorithm to find the nth palindromic number?
I'm assuming that 0110 is not a palindrome, as it is 110.
I could spend a lot of words on describing, but this table should be enough:
#Digits  #Pal.  Notes
0        1      "0" only
1        9      x with x = 1..9
2        9      xx with x = 1..9
3        90     xyx with xy = 10..99 (in other words: x = 1..9, y = 0..9)
4        90     xyyx with xy = 10..99
5        900    xyzyx with xyz = 100..999
6        900    and so on...
The (nonzero) palindromes with even number of digits start at p(11) = 11, p(110) = 1001, p(1100) = 100'001,.... They are constructed by taking the index n - 10^L, where L=floor(log10(n)), and append the reversal of this number: p(1101) = 101|101, p(1102) = 102|201, ..., p(1999) = 999|999, etc. This case must be considered for indices n >= 1.1*10^L but n < 2*10^L.
When n >= 2*10^L, we get the palindromes with odd number of digits, which start with p(2) = 1, p(20) = 101, p(200) = 10001 etc., and can be constructed the same way, using again n - 10^L with L=floor(log10(n)), and appending the reversal of that number, now without its last digit: p(21) = 11|1, p(22) = 12|1, ..., p(99) = 89|8, ....
When n < 1.1*10^L, subtract 1 from L to be in the correct setting with n >= 2*10^L for the case of an odd number of digits.
This yields the simple algorithm:
p(n) = { L = logint(n,10);
    P = 10^(L - [1 < n < 1.1*10^L]); /* avoid exponent -1 for n=1 */
    n -= P;
    RETURN( n * P + reverse( n \ 10^[n >= P] ))
}
where [...] is 1 if ... is true, 0 else, and \ is integer division.
(The expression n \ 10^[...] is equivalent to: if ... then n\10 else n.)
(I added the condition n > 1 in the exponent to avoid P = 10^(-1) for n=1. If you use integer types, you don't need this. Another choice is to put max(...,0) as the exponent in P, or to use if n=1 then return(0) right at the start. Also notice that you don't need L after assigning P, so you could use the same variable for both.)
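As a cross-check, here is a hedged Python transcription of the same idea, using 1-based indices so that the first palindrome is 0. The function name is made up; the shifted index is multiplied by P, which equals the power of ten needed to make room for the reversed tail in all three cases described above, and the condition n < 1.1*10^L is written in integers to avoid floating point.

```python
def nth_palindrome(n):
    """Return the n-th palindromic number, 1-based: 0, 1, ..., 9, 11, 22, ..."""
    L = len(str(n)) - 1                  # floor(log10(n)) for n >= 1
    if 1 < n and 10 * n < 11 * 10 ** L:  # n < 1.1 * 10^L, written with integers
        P = 10 ** (L - 1)
    else:
        P = 10 ** L
    n -= P
    tail = n // 10 if n >= P else n      # odd length: drop the last digit before reversing
    return n * P + int(str(tail)[::-1])


print([nth_palindrome(i) for i in range(1, 13)])
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 22]
print(nth_palindrome(20), nth_palindrome(110), nth_palindrome(1100))
# 101 1001 100001
```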