How to get the constraint matrix after presolving in PySCIPOpt

I have a pyscipopt.Model variable model, encoding an integer program with linear constraints.
I want to evaluate a solution on all rows of the constraint matrix. The only way I could think of is to use model.getSolVal() to get the solution values of the (possibly modified) variables, and to compute the dot product with the rows of the modified constraint matrix manually.
The following snippet extracts the nonzero coefficients of a constraint in the model:
constr = model.getConss()[0]
coeff_dict = model.getValsLinear(constr)
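For context, a minimal sketch of the manual row-activity evaluation described above (assuming the model has been solved and the constraint is still linear; the names here are illustrative):
    sol = model.getBestSol()
    vars_by_name = {v.name: v for v in model.getVars()}
    # row activity = sum of coefficient * solution value over the row's nonzeros
    activity = sum(coef * model.getSolVal(sol, vars_by_name[name])
                   for name, coef in coeff_dict.items())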
It runs fine before presolving, but after presolving (or just optimizing) I get the following warning:
Warning: 'coefficients not available for constraints of type ', 'logicor'.
My current solution is to disable presolving completely, in which case the variables aren't modified. Can I avoid that?

I am assuming you want to do something with the row activities and not just check whether your solution is feasible?
getValsLinear is only usable for linear constraints. During presolving, SCIP upgrades linear constraints to more specialized constraint types (in your case logicor).
There is a function in SCIP called SCIPgetConsVals that does what you want getValsLinear to do (it gets you the values for all constraints that have a linear representation). However, that function is not wrapped in PySCIPOpt yet.
You can easily wrap that function yourself (and even head over to https://github.com/scipopt/PySCIPOpt and open a pull request).
The other option is to read a settings file that forbids the linear constraints from being upgraded:
constraints/linear/upgrade/logicor = FALSE
constraints/linear/upgrade/indicator = FALSE
constraints/linear/upgrade/knapsack = FALSE
constraints/linear/upgrade/setppc = FALSE
constraints/linear/upgrade/xor = FALSE
constraints/linear/upgrade/varbound = FALSE
would be the settings you need. That way you still get presolving, just without the constraint upgrades.
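If you'd rather not ship a separate settings file, the same parameters can also be set directly from Python; a minimal sketch using the parameter names listed above:
    # disable all linear-constraint upgrades before (pre)solving
    for target in ("logicor", "indicator", "knapsack", "setppc", "xor", "varbound"):
        model.setBoolParam("constraints/linear/upgrade/" + target, False)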

Related

Must a PyOMO model's objective function return a PyOMO Var object?

I have an objective function in a model that returns a plain Python float, which must be minimized.
But when I try to solve it, I get the following warning and the solver doesn't search for a solution:
WARNING: Constant objective detected, replacing with a placeholder to prevent
solver failure.
I also get a second Warning:
WARNING: Empty constraint block written in LP format - solver may error
Do I have to add constraints to the model? I already set bounds on the parameters to be optimized.
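(For reference, a minimal sketch that reproduces this kind of warning; the model is hypothetical, not the asker's code. "Constant objective detected" typically means the objective rule evaluated the variables to floats instead of building a Pyomo expression:)
    import pyomo.environ as pyo

    model = pyo.ConcreteModel()
    model.x = pyo.Var(bounds=(0, 10), initialize=1.0)

    # Wrong: pyo.value() evaluates x right away, so the objective is a constant float.
    model.obj = pyo.Objective(expr=pyo.value(model.x) ** 2)

    # Right: build an expression in terms of the Var itself, e.g.
    # model.obj = pyo.Objective(expr=model.x ** 2)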

Gurobi shadow price of a variable without generating separate constraint

Gurobi 9.0.0 // C++
I am trying to get the shadow price of variables without explicitly generating a constraint for them.
I am generating variables the following way:
GRBModel* milp_model;
// C++ signature: addVar(double lb, double ub, double obj, char type, string name)
milp_model->addVar(lb, up, 0.0, type, name);
Now I would like to get the shadow price (dual) for these variables.
I found this article which says that for "a linear program with lower and upper bounds on a variable, i.e., l ≤ x ≤ u" [...] "Gurobi gives access to the reduced cost of x, which corresponds to sl+su".
To get the shadow price of a constraint one would use the GRB functions according to the following answer (python but same idea) using the Pi constraint attribute.
What would be the GRB function that returns the previously mentioned reduced cost of x / shadow price of a variable?
I tried gurobi_var.get(GRB_DoubleAttr_Pi), mirroring gurobi_constr.get(GRB_DoubleAttr_Pi), which works for constraints,
but for a variable it returns: Not right attribute. Error code = 10003
Can anyone help me with this?
I suppose you are referring to the reduced costs of the variables. You can get them via the variable attribute RC, as explained here. Then you need to figure out whether these dual values correspond to the upper or the lower bound, as discussed here.
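A minimal sketch of reading that attribute (shown in gurobipy, like the linked Python answer; in C++ the equivalent call is var.get(GRB_DoubleAttr_RC)):
    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model()
    x = m.addVar(lb=0.0, ub=10.0, obj=1.0, name="x")  # minimizes x by default
    m.optimize()

    if m.Status == GRB.OPTIMAL:
        print(x.RC)  # reduced cost of x (dual of its active bound)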

About autograd in PyTorch: when adding new user-defined layers, how should I make their parameters update?

Hi everyone!
My task is an optical-flow generation problem. I have two raw images and optical-flow data as ground truth; my algorithm generates optical flow from the raw images, and the Euclidean distance between the generated optical flow and the ground truth can be defined as a loss value, so backpropagation can update the parameters.
I treat it as a regression problem, and I have two ideas now:
I can set every parameter to requires_grad = True and compute a loss, then call loss.backward() to acquire the gradients, but I don't know how to add these parameters to an optimizer so they get updated.
I can write my algorithm as a model. If I design a "custom" model, I can initialize several layers such as nn.Conv2d() and nn.Linear() in __init__() and update their parameters via torch.optim.Adam(model.parameters()), but if I define new layers myself, how should I add these layers' parameters to the collection of updated parameters?
This problem has confused me for several days. Are there any good methods to update user-defined parameters? I would be very grateful if you could give me some advice!
Tensor values have their gradients calculated if they:
Have requires_grad == True
Are used to compute some value (usually a loss) on which you call .backward().
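A tiny sketch of those two conditions in action:
    import torch

    x = torch.tensor(2.0, requires_grad=True)  # condition 1
    loss = (x - 1.0) ** 2                      # x is used to compute loss
    loss.backward()                            # condition 2
    print(x.grad)                              # tensor(2.), since d/dx (x-1)^2 = 2(x-1)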
The gradients will then be accumulated in their .grad attribute. You can use them manually to perform arbitrary computation (including optimization). The predefined optimizers accept an iterable of parameters, and model.parameters() does just that: it returns an iterable of parameters. If you have some custom "free-floating" parameters, you can pass them as
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(my_params)
and you can also merge them with the other parameter iterables like below:
model_params = list(model.parameters())
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(model_params + my_params)
In practice, however, you can usually structure your code to avoid that. There's the nn.Parameter class, which wraps tensors. All subclasses of nn.Module have their __setattr__ overridden so that whenever you assign an instance of nn.Parameter as a property, it becomes part of that Module's .parameters() iterable. In other words
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.my_param_1 = nn.Parameter(torch.tensor(...))
        self.my_param_2 = nn.Parameter(torch.tensor(...))
will allow you to write
module = MyModule()
optim = torch.optim.Adam(module.parameters())
and have the optim update module.my_param_1 and module.my_param_2. This is the preferred way to go, since it helps keep your code more structured:
You won't have to manually include all your parameters when creating the optimizer
You can call module.zero_grad() and zero out the gradient on all its children nn.Parameters.
You can call methods such as module.cuda() or module.double() which, again, work on all children nn.Parameters instead of requiring you to iterate through them manually.
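Putting it together, a minimal end-to-end sketch of the preferred pattern (shapes and data here are made up for illustration):
    import torch
    import torch.nn as nn

    class MyModule(nn.Module):
        def __init__(self):
            super().__init__()
            self.scale = nn.Parameter(torch.ones(1))  # custom free parameter
            self.linear = nn.Linear(4, 1)             # predefined layer

        def forward(self, x):
            return self.scale * self.linear(x)

    module = MyModule()
    optim = torch.optim.Adam(module.parameters())     # picks up scale too

    x, target = torch.randn(8, 4), torch.randn(8, 1)
    loss = ((module(x) - target) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()  # updates both linear's weights and scale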

Ordering by sum of difference

I have a model that has one attribute with a list of floats:
values = ArrayField(models.FloatField(default=0), default=list, size=64, verbose_name=_('Values'))
Currently, I'm getting my entries and order them according to the sum of all diffs with another list:
def diff(l1, l2):
    return sum(abs(v1 - v2) for v1, v2 in zip(l1, l2))
list2 = [0.3, 0, 1, 0.5]
entries = Model.objects.all()
entries.sort(key=lambda t: diff(t.values, list2))
This works fast if my number of entries is small. But I'm afraid that with a large number of entries, comparing and sorting all of them will get slow, since they have to be loaded from the database. Is there a way to make this more efficient?
The best way is to write it yourself; right now you are iterating over the list about 4 times!
Although this approach looks pretty, it's not good.
One thing that you can do is:
have a variable called last_diff and set it to 0
iterate through all entries
for each entry, iterate through entry.values
from i = 0 to the end of the list, calculate abs(entry.values[i] - list2[i])
sum these values up in a variable called new_diff
if new_diff > last_diff, break out of the inner loop and push the entry into its right place (this is called insertion sort, check it out!); a sketch follows this list
This way, in the average case, the running time is much lower than what you are doing now!
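(The early-exit idea above is easiest to make precise when you only need the best N entries: keep the N smallest diffs seen so far in sorted order, and abandon an entry as soon as its partial sum exceeds the current N-th best. A rough sketch, with entries and list2 as in the question:)
    import bisect

    def bounded_diff(values, target, cutoff):
        # partial sum of |v1 - v2|, bailing out once it exceeds the cutoff
        total = 0.0
        for v1, v2 in zip(values, target):
            total += abs(v1 - v2)
            if total > cutoff:
                return None
        return total

    def best_n(entries, target, n):
        keys, best = [], []  # parallel lists, kept sorted by diff
        for entry in entries:
            cutoff = keys[-1] if len(keys) == n else float("inf")
            d = bounded_diff(entry.values, target, cutoff)
            if d is None:
                continue  # provably worse than the current top n
            i = bisect.bisect(keys, d)
            keys.insert(i, d)
            best.insert(i, entry)
            if len(keys) > n:
                keys.pop()
                best.pop()
        return best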
You may also need to get creative. I'll share some ideas; check them for yourself to make sure they are sound.
assuming that:
values list elements are always positive floats.
list2 is always the same for all entries.
then you may be able to say: the bigger the sum over the elements in values, the bigger the diff value is going to be, no matter what the elements in list2 are.
Then you might be able to forget about the whole diff function. (Test this!)
The only way to make this really go faster is to move as much work as possible to the database, i.e. the calculations and the sorting. It wasn't easy, but with the help of this answer I managed to write a query for that in almost pure Django:
class Unnest(models.Func):
    function = 'UNNEST'

class Abs(models.Func):
    function = 'ABS'

class SubquerySum(models.Subquery):
    template = '(SELECT sum(%(field)s) FROM (%(subquery)s) _sum)'

x = [0.3, 0, 1, 0.5]
pairdiffs = Model.objects.filter(pk=models.OuterRef('pk')).annotate(
    pairdiff=Abs(Unnest('values') - Unnest(models.Value(x, ArrayField(models.FloatField())))),
).values('pairdiff')
entries = Model.objects.all().annotate(
    diff=SubquerySum(pairdiffs, field='pairdiff')
).order_by('diff')
The unnest function turns each element of values into a row. Here it happens twice, but the two resulting columns are immediately subtracted and made positive. Still, there are as many rows per pk as there are values. These need to be summed, but that's not as easy as it sounds: the column can't simply be aggregated. This was by far the trickiest part; even after fiddling with it for so long, I still don't quite understand why Postgres needs this indirection. Of the few options there are to make it work, I believe a subquery is the only one expressible in Django (and only as of 1.11).
Note that the above behaves exactly the same as with zip, i.e. when one array is longer than the other, the remainder is ignored.
Further improvements
While it will already be a lot faster once you don't have to retrieve all rows and loop over them in Python, this still results in a full table scan: all rows have to be processed, every single time. You can do better, though. Have a look at the cube extension. Use it to calculate the L1 distance (at least, that seems to be what you're calculating) directly with the <#> operator. That will require the use of RawSQL or a custom Expression. Then add a GiST index on the SQL expression cube("values"), or directly on the field if you're able to change its type from float[] to cube. In the latter case you might have to implement your own CubeField too; I haven't found any package yet that provides it. In any case, with all that in place, top-N queries on the lowest distance will be fully indexed and hence blazing fast.
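A rough sketch of that RawSQL variant, assuming the cube extension is installed (CREATE EXTENSION cube) and values fits within cube's dimension limit:
    from django.db.models.expressions import RawSQL

    x = [0.3, 0, 1, 0.5]
    entries = Model.objects.annotate(
        # <#> is the cube extension's taxicab (L1) distance operator
        diff=RawSQL('cube("values") <#> cube(%s)', (x,)),
    ).order_by('diff')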

Weka Resample to balance instances in binary dataset

I've only been using Weka for a couple of weeks but I am absolutely blown away by how great it is!
But I have a question, I have a dataset with a target column which is either True or False.
6709 instances in my dataset are True
25318 instances are False.
I want to randomly add duplicates of my True instances to produce a new dataset with 25318 True and 25318 False.
The only filter I can find which does this is the supervised Resample filter however I am having trouble understanding what parameters I should use.
(there might be a better filter to do what I want)
I've had some success with these parameters:
biasToUniformClass = 1.0
invertSelection = False
noReplacement = False
randomSeed = 1
sampleSizePercent = 157.5 (a magic number I've arrived at by trial and error)
This produces 25277 True and 25165 False. Not exactly what I want, but quite close.
The problem is that I can't figure out how to arrive at the magic number, and I'm also not getting exactly the numbers of instances that I really want.
Is there a better filter for this purpose?
If not, is there a way to calculate the sampleSizePercent magic number?
Any help is greatly appreciated :)
A supplemental question: am I best off running NominalToBinary on my boolean columns to ensure they are binary? I'm using a NaiveBayes classifier (at the moment) and I don't have any missing instances.
Jason
I think the tricky part of this question is getting a perfect balance using the Resample filter. This is because, as stated in its description, it 'Produces a random sub-sample of a dataset using either sampling with replacement or without replacement'. Since the cases are drawn randomly, there is no guarantee that you will get an exactly equal split between the two classes.
As for the magic number, it corresponds to the total number of cases that you would like to have after the filter is applied. In your case, that is 50636 (2 x 25318) instead of the original 32027 (6709 + 25318). The magic number would then be 50636 / 32027 ≈ 1.581, i.e. a sampleSizePercent of about 158.1. However, as stated above, you may not get an exact match of true and false cases.
If you really need an exact figure, you could preprocess the data in your favourite spreadsheet. One possible method is to randomise the true cases (in a separate column), sort, and copy cases until their number matches the false ones. It's not an automated solution, and it lives outside of Weka, but I have used this method before and it does the job reasonably quickly.
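(The same manual balancing step, sketched in Python instead of a spreadsheet; rows is assumed here to be a list of (features, label) pairs:)
    import random

    true_rows = [r for r in rows if r[1] is True]
    false_rows = [r for r in rows if r[1] is False]

    # duplicate randomly chosen True rows until the classes match exactly
    extra = random.choices(true_rows, k=len(false_rows) - len(true_rows))
    balanced = true_rows + extra + false_rows
    random.shuffle(balanced)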
Hope this helps!