I'm trying to run this model in CPLEX/OPL to find the best path, given the distances between shops and the quantities and prices of different products in each shop.
The problem is that I'm getting results with disjoint paths - "islands".
What constraint do I need to add to get only one closed path that begins and ends at node 1?
This is the model:
dvar float+ x[products,stores];
dvar boolean y[stores,stores];
minimize ((sum(i in products, j in stores) prices[i,j]*x[i,j]) + (sum(j in stores, k in stores) gas*distance[j,k]*y[j,k]));
subject to {
  sum(k in stores) y[1,k] == 1;
  sum(j in stores) y[j,1] == 1;
  forall(j in stores)
    sum(i in products) x[i,j] <= M*sum(k in stores) y[j,k];
  forall(j in stores)
    sum(i in products) x[i,j] <= M*sum(k in stores) y[k,j];
  forall(i in products) sum(j in stores) x[i,j] == demand[i];
  forall(i in products, j in stores) x[i,j] <= quantity[i,j];
  forall(j in stores, k in stores) y[j,k] + y[k,j] <= 1;
  forall(j in stores) sum(k in stores) y[j,k] == sum(k in stores) y[k,j];
}
Thanks!
What you are solving is a variation of the traveling salesman problem. Specifically, what you are getting are called subtours: closed paths that involve fewer than all of the nodes (shops, in your case).
There are an exponential number of subtours for a given set of nodes, and a huge body of literature exists around subtour elimination constraints. Fortunately, for smallish problems in practice, we can get away with adding subtour elimination constraints on an as-needed basis.
Subtour Elimination
Here's the idea for how to eliminate a subtour S that involves s shops (s < num_stores):
In English: We start off by partitioning the set of nodes (shops) into two groups, S and T. Let S be the set of shops in the subtour, and let T be the set of shops outside of S, i.e. all the other shops. We want to break the closed loop that involves just the shops in S.
Pseudocode
Do while:
    Solve the current problem
    If you don't find any subtours, you are done. Exit.
    If there are subtours, add the subtour elimination constraints (see details below)
Continue
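As a rough sketch of that loop in Python (not OPL): solve_model() and add_subtour_cuts() below are hypothetical placeholders for however you invoke CPLEX and amend the model, and find_subtours() is shown under "How to Detect Subtours" further below.
# Sketch only: solve_model() and add_subtour_cuts() are hypothetical
# stand-ins for your CPLEX/OPL scripting layer.
while True:
    y_values = solve_model()              # solve the current formulation
    subtours = find_subtours(y_values)    # detection code is further below
    if not subtours:
        break                             # one tour through all shops: done
    for S in subtours:
        add_subtour_cuts(S)               # add the elimination constraints for S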
Implementing a set of constraints to eliminate the current subtour
For each subtour ("island") you first have to create a set shops_in_subtour. All the other nodes (shops) go into the other set (T), shops_not_in_subtour.
Add the following to your constraint set:
sum(s in shops_in_subtour, t in shops_not_in_subtour) y[s,t] >= 1; // at least one edge must leave S and go to T
sum(s in shops_in_subtour, t in shops_not_in_subtour) y[t,s] >= 1; // at least one edge must enter S from T
If your problem is small, you will see that adding a few of these sets of constraints will suffice. Hope that helps you move forward.
Update based on OP's follow-up question
How to Detect Subtours
You will check for the existence of subtours outside of CPLEX/the solver.
Idea in English: You start from the origin shop and traverse the path, keeping track of each visited shop. If you come back to the origin and there are still some nonzero binary y variables left, you have one or more subtours. (Technically, you can start from any shop, but starting from shop 1 is easier to understand.)
Initialize visited_shops = empty list
Find the y[1,k] that is 1 (the starting edge for the path)
Do while (there are more y variables equal to 1 remaining):
    Add the destination of the current edge to the list of visited_shops
    Check if you are done (are you back at shop 1?):
        If yes:
            If no more nonzero binary variables are left, you are done. Exit.
            If there are more nonzero binary variables, a subtour is detected.
        If no:
            Pick the next y variable whose origin is the current variable's destination
Continue
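Here is a minimal Python sketch of that detection step (the assumption that the solved y values arrive as a dict mapping (j, k) to 0 or 1 is mine):
# Sketch: assumes the solved y values are a dict {(j, k): 0 or 1}.
def find_subtours(y):
    # successor[j] = k for every chosen edge y[j, k] == 1
    successor = {j: k for (j, k), v in y.items() if v > 0.5}
    unvisited = set(successor)
    cycles = []
    while unvisited:
        node = min(unvisited)            # start a new cycle at any unvisited shop
        cycle = []
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    # a single cycle covering every shop means there are no subtours
    return cycles if len(cycles) > 1 else []

# Example: 1->2->1 and 3->4->3 are two "islands"
y = {(1, 2): 1, (2, 1): 1, (3, 4): 1, (4, 3): 1}
print(find_subtours(y))                  # [[1, 2], [3, 4]]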
Hi, I'm trying to implement tANS in a compute shader, but I am confused about the size of the state set. (Apologies, my account is too new to embed pictures of LaTeX-formatted equations.)
Imagine we have a symbol frame S consisting of symbols s₁ to sₙ:
S = {s₁, s₂, s₁, s₂, ..., sₙ}
|S| = 2ᵏ
and the probability of each symbol is
pₛₙ = frequency(sₙ) / |S|
pₛ₁ + pₛ₂ + ... + pₛₙ = 1
According to Jarek Duda's slides (which can be found here) the first step in constructing the encoding function is to calculate the number of states L:
L = |S|
so that we can create a set of states
𝕃 = {L, ..., 2L - 1}
from which we can construct the encoding table. In our example this is simply L = |S| = 2ᵏ. However, we don't want L to necessarily equal |S|, because |S| could be enormous, and constructing an encoding table of size |S| would be counterproductive to compression. Jarek's solution is to create a quantization function so that we can choose an
L : L < |S|
which approximates the symbol probabilities
Lₛ / L ≈ pₛₙ
However as L decreases, the quality of the compression decreases, so I have two questions:
How small can we make L while still achieving compression?
What is a "good" way of determining the size of L for a given |S|?
In Jarek's ANS toolkit he uses the depth of a Huffman tree created from S to get the size of L, but this seems like a lot of work when we already know the upper bound of L (namely |S|; as I understand it, when L = |S| we are at the Shannon entropy, so making L > |S| would not increase compression). Instead, it seems like it would be faster to choose an L that is both less than |S| and above some minimum L. A "good" size of L would therefore achieve some amount of compression but, more importantly, would be easy to calculate. However, we would need to determine the minimum L. Based on the pictures of sample ANS tables, it seems like the minimum size of L could be the frequency of the most probable symbol, but I don't know enough about ANS to confirm this.
After mulling it over for a while, both questions have very simple answers. The smallest L that still achieves lossless compression is L = |A|, where A is the alphabet of symbols to be encoded (I apologize, the lossless criterion should have been included in the original question). If L < |A| then we are pigeonholing symbols, thus losing information. When L = |A|, what we essentially have is a fixed-length code, where each symbol has an equal probability weighting in our encoding table.
The answer to the second question is even simpler now that we know the answer to the first: L can be pretty much whatever you want, so long as it's greater than the size of the alphabet to be encoded. Usually we want L to be a power of two for computational efficiency, and we want L greater than |A| for better compression, so a very common choice is twice the smallest power of two that is greater than or equal to the size of the alphabet. This can easily be found with something like this:
#include <math.h>

int alphabetSize = SizeOfAlphabet();              /* however you count |A| */
int L = (int) pow(2, ceil(log2(alphabetSize)) + 1);
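To connect this back to the quantization step Lₛ/L ≈ pₛ, here is a small Python sketch of distributing L states across symbols in proportion to their frequencies. The names and the largest-first fix-up heuristic are my own illustration, not Duda's Huffman-depth method, and it assumes L is at least the alphabet size:
# Sketch: allocate L_s states per symbol so they sum to L and L_s/L ~ p_s.
# Assumes L >= len(freqs), per the lossless argument above.
def quantize_counts(freqs, L):
    total = sum(freqs.values())
    # proportional allocation, at least one state per symbol
    Ls = {s: max(1, round(f * L / total)) for s, f in freqs.items()}
    diff = L - sum(Ls.values())
    # fix rounding drift, adjusting the most frequent symbols first
    while diff != 0:
        for s in sorted(freqs, key=freqs.get, reverse=True):
            if diff == 0:
                break
            step = 1 if diff > 0 else -1
            if Ls[s] + step >= 1:
                Ls[s] += step
                diff -= step
    return Ls

print(quantize_counts({'a': 70, 'b': 20, 'c': 10}, L=16))   # {'a': 11, 'b': 3, 'c': 2}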
I'm trying to find an efficient algorithm for determining whether two convex hulls intersect or not. The hulls consist of data points in N-dimensional space, where N is 3 up to 10 or so. One elegant algorithm was suggested here using linprog from scipy, but you have to loop over all points in one hull, and it turns out the algorithm is very slow for low dimensions (I tried it, and so did one of the respondents). It seems to me the algorithm could be generalized to answer the question I am posting here, and I found what I think is a solution here. The authors say that the general linear programming problem takes the form Ax + tp >= 1, where the A matrix contains the points of both hulls, t is some constant >= 0, and p = [1,1,1,...,1] (it's equivalent to finding a solution to Ax > 0 for some x). As I am new to linprog(), it isn't clear to me whether it can handle problems of this form. If A_ub is defined as on page 1 of the paper, then what is b_ub?
There is a nice explanation of how to do this problem, with an algorithm in R, on this website. My original post referred to the scipy.optimize.linprog library, but this proved to be insufficiently robust. I found that the SCS algorithm in the cvxpy library worked very nicely, and based on this I came up with the following python code:
import numpy as np
import cvxpy
# Determine feasibility of Ax <= b
# cloud1 and cloud2 should be numpy.ndarrays
def clouds_overlap(cloud1, cloud2):
# build the A matrix
cloud12 = np.vstack((-cloud1, cloud2))
vec_ones = np.r_[np.ones((len(cloud1),1)), -np.ones((len(cloud2),1))]
A = np.r_['1', cloud12, vec_ones]
# make b vector
ntot = len(cloud1) + len(cloud2)
b = -np.ones(ntot)
# define the x variable and the equation to be solved
x = cvxpy.Variable(A.shape[1])
constraints = [A @ x <= b]   # use @ for the matrix-vector product in cvxpy 1.x
# since we're only determining feasibility there is no minimization
# so just set the objective function to a constant
obj = cvxpy.Minimize(0)
# SCS was the most accurate/robust of the non-commercial solvers
# for my application
problem = cvxpy.Problem(obj, constraints)
problem.solve(solver=cvxpy.SCS)
# Any 'inaccurate' status indicates ambiguity, so you can
# return True or False as you please
if problem.status == 'infeasible' or problem.status.endswith('inaccurate'):
return True
else:
return False
cube = np.array([[1,1,1],[1,1,-1],[1,-1,1],[1,-1,-1],[-1,1,1],[-1,1,-1],[-1,-1,1],[-1,-1,-1]])
inside = np.array([[0.49,0.0,0.0]])
outside = np.array([[1.01,0,0]])
print("Clouds overlap?", clouds_overlap(cube, inside))
print("Clouds overlap?", clouds_overlap(cube, outside))
# Clouds overlap? True
# Clouds overlap? False
The area of numerical instability is when the two clouds just touch, or are arbitrarily close to touching such that it isn't possible to definitively say whether they overlap or not. That is one of the cases where you will see this algorithm report an 'inaccurate' status. In my code I chose to consider such cases overlapping, but since it is ambiguous you can decide for yourself what to do.
I have a Pyomo model connected to a Django-created website.
My decision variable has 4 indices, and I have a huge number of constraints on it.
Since Pyomo takes a ton of time to read in constraints over so many variables, I want to sparse out the index set so that it only contains variables that could actually be 1 (I have some conditions for that).
I saw this post:
Create a variable with sparse index in pyomo
and tried a for loop over all my conditions. I created a set "AllowedVariables" to later use inside my constraints.
But Django's server takes so long to create this set while performing the system check that it never finishes.
Currently I have this model:
model = AbstractModel()
model.x = Var(model.K, model.L, model.F, model.Z, domain=Boolean)
def ObjRule(model):
    # some rule, sense maximize
model.Obj = pyomo.environ.Objective(rule=ObjRule, sense=maximize)
def ARule(model, l):
    maxA = sum(model.x[k, l, f, z] for k in model.K for f in model.F
               for z in model.Z if (k, l, f, z) in model.AllowedVariables)
    return maxA <= 1
model.maxA = Constraint(model.L, rule=ARule)
The constraint is just an example; I have 15 more similar ones. I currently create "AllowedVariables" this way:
AllowedVariables = []
for k in model.K:
    for l in model.L:
        # ... check all sorts of conditions, break if not valid
        AllowedVariables.append((k, l, f, z))
model.AllowedVariables = Set(initialize=AllowedVariables)
Using this, the Django server starts checking... and never stops:
performing system checks...
Sadly, I somehow need some restriction on the variables, or else reading the model into the solver will take way too long, since the constraints contain so many unnecessary variables that have to be 0 anyway.
Any ideas on how I can sparsify my variable set?
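One direction to experiment with, along the lines of the linked post: initialize the sparse index set lazily from a generator and declare the variable over that set. This is only a sketch; is_valid() is a hypothetical stand-in for the conditions described above.
# Sketch: build the sparse 4-index set lazily; is_valid() is hypothetical.
def allowed_init(model):
    return ((k, l, f, z)
            for k in model.K for l in model.L
            for f in model.F for z in model.Z
            if is_valid(k, l, f, z))

model.AllowedVariables = Set(dimen=4, initialize=allowed_init)
# declare x only over the allowed tuples instead of the full product
model.x = Var(model.AllowedVariables, domain=Boolean)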
Problem =====>
Basically there are three .rrd files, generated for three departments.
From each we fetch three values (MIN, MAX, CURRENT) and print them in a 3x3 format; a Python script does that.
eg -
Dept1: Min=10 Max=20 Cur=15
Dept2: Min=0 Max=10 Cur=5
Dept3: Min=10 Max=30 Cur=25
Now I want to add the values together (Min, Max, Cur) and print them on one line.
eg -
Dept: Min=20 Max=60 Cur=45
Issue I am facing =====>
No matter what CDEF I write, I am breaking the graph. :(
This is the part I hate, as I do not get any error message.
As far as I understand (please correct me if I am wrong), I definitely cannot store the value anywhere in my program, as a graph is returned.
What would be a proper way to add the values in this situation?
Please let me know if my description of the problem lacks detail.
You can do this with a VDEF over a CDEF'd sum.
DEF:a=dept1.rrd:ds0:AVERAGE
DEF:b=dept2.rrd:ds0:AVERAGE
DEF:maxa=dept1.rrd:ds0:MAXIMUM
DEF:maxb=dept2.rrd:ds0:MAXIMUM
CDEF:maxall=maxa,maxb,+
CDEF:all=a,b,+
VDEF:maxalltime=maxall,MAXIMUM
VDEF:alltimeavg=all,AVERAGE
PRINT:maxalltime:Max=%lf
PRINT:alltimeavg:Avg=%lf
LINE:all#ff0000:AllDepartments
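Putting those directives together, a complete invocation might look like the following (the output filename is assumed, and dept3.rrd would be added the same way):
rrdtool graph alldepts.png \
    DEF:a=dept1.rrd:ds0:AVERAGE \
    DEF:b=dept2.rrd:ds0:AVERAGE \
    DEF:maxa=dept1.rrd:ds0:MAXIMUM \
    DEF:maxb=dept2.rrd:ds0:MAXIMUM \
    CDEF:all=a,b,+ \
    CDEF:maxall=maxa,maxb,+ \
    VDEF:maxalltime=maxall,MAXIMUM \
    VDEF:alltimeavg=all,AVERAGE \
    PRINT:maxalltime:Max=%lf \
    PRINT:alltimeavg:Avg=%lf \
    LINE:all#ff0000:AllDepartments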
However, you should note that, apart from at the highest granularity, the Min and Max totals will be wrong! This is because max(a+b) != max(a) + max(b). If you don't calculate the min/max aggregate at time of storage, the granularity will be gone at time of display.
For example, if a = (1, 2, 3) and b = (3, 2, 1), then max(a) + max(b) = 6; however, the maximum of a+b at any point in time is in fact 4. The same issue applies to using min(a) + min(b).
I am using the following function to calculate the Wilcoxon test statistic for data in a data frame (x):
wilcox.test.all.genes<-function(x,s1,s2) {
x1<-x[s1]
x2<-x[s2]
x1<-as.numeric(x1)
x2<-as.numeric(x2)
wilcox.out<-wilcox.test(x1,x2,exact=F,alternative="two.sided",correct=T)
out<-as.numeric(wilcox.out$statistic)
return(out)
}
I need to write a for loop that will iterate a specific number of times. For each iteration, the columns need to be shuffled, the above function applied, and the maximum test-statistic value saved to a list.
I know that I can use the sample() function to shuffle the columns of the data frame and the max() function to identify the maximum test-statistic value, but I can't figure out how to put them together into workable code.
You are trying to generate empirical p-values, corrected for the multiple comparisons you are making because of the multiple columns in your data. First, let's simulate an example data set:
# Simulate data
n.row = 100
n.col = 10
set.seed(12345)
group = factor(sample(2, n.row, replace=T))
data = data.frame(matrix(rnorm(n.row*n.col), nrow=n.row))
Next, calculate the Wilcoxon test statistic for each column, replicating this many times while permuting the class membership of the observations. This gives us an empirical null distribution of the test statistic.
# Re-calculate columnwise test statistics many times while permuting class labels
perms = replicate(500, apply(data[sample(nrow(data)), ], 2, function(x) wilcox.test(x[group==1], x[group==2], exact=F, alternative="two.sided", correct=T)$stat))
Calculate the null distribution of the maximum test statistic by collapsing across the multiple comparisons.
# For each permuted replication, calculate the max test statistic across the multiple comparisons
perms.max = apply(perms, 2, max)
By simply sorting the results, we can now determine the p = 0.05 critical value (with 500 replications, this is the 475th of the sorted maxima).
# Identify critical value
crit = sort(perms.max)[round((1-0.05)*length(perms.max))]
We can also plot our distribution along with the critical value.
# Plot
dev.new(width=4, height=4)
hist(perms.max)
abline(v=crit, col='red')
Finally, comparing a real test statistic to this null distribution gives you an empirical p-value, corrected for multiple comparisons by controlling the family-wise error rate at p < 0.05. For example, let's pretend a real test statistic was 1600. We could then calculate the p-value like this:
> length(which(perms.max>1600))/length(perms.max)
[1] 0.074