Is there a simple way to reduce the value of positive slack variables in MILP? - linear-programming

Recently, I have been learning optimization, and my optimization problem, (minimization), is encoded in a MILP solver which tells me it's infeasible for my model. Hence, I introduced a few positive/negative slack variables.
Now, I get a feasible solution, but the positive slack variables are way bigger than what I can accept.
So, I gave penalties/weights to those variables (multiplied by large numbers), hoping that the MILP solver would reduce the variables, but that didn't work (I got the same solution)
Is there any approach to follow, in general, when the slack is too large?
Is there a better way to pick the slack variables, in general?

A common pitfall for people new to mathematical programming/optimization is that variables are non-negative by default, that is, they always have an implied lower bound of 0. Your mathematical model may not specify this explicitly, so those variables might need to be declared as free (with a lower bound of -infinity).
In general, you should double-check your model (as LP file) and compare it to the mathematical formulation.

Add both to the objective with a penalty coefficient.
Or add some upper bounds to the slacks.

Related

Is there a way to quantify impact of independent variables with gradient boosting?

I've been asked to run a model using gradient boosting or random forest. So far so good, however, the only output that comes back in terms of variable importance is based on the number of times a variable was used as a branch rule. I've now been asked to basically get coefficients or somehow quantify the impact that the variables have on the target.
Is there a way to do this with a gradient boosting model? My other thoughts were to either use only the variables that were showed to be sued as branch rules in a regular decision tree or in a GLM or regular regression model.
Any help or ides would be appreciated!! Thanks so much!
Just to make certain there is not a misunderstanding: SAS implementation of decision tree/gradient boosting (at least in EM) uses Split-Based variable Importance.
Split-Based Importance does NOT count the number splits made.
It is the ratio of the reduction of sum-of-squares by one variable (specific the sum over all splits by this variable) in relation to the reduction of sum-of-squares achieved by all splits in the model.
If you are using surrogate rules, highly correlated variables will receive roughly the same value.

How can I choose the right numerical solution from NEQNF?

I'm using a function (NEQNF manual page here) which I call using
call neqnf(SYSTEM_OF_EQUATIONS, x, xguess=x_GUESS, itmax = 10000)
where SYSTEM_OF_EQUATIONS is the subroutine that contains equations
f(1)=...x(2)...x(1)...
f(2)=...x(1)...x(4)...
f(3)=...x(3)...x(4)...
f(4)=...x(1)...x(5)...
f(5)=...x(1)...x(5)...
from IMSL libraries on Fortran that lets me to solve a non-linear system with five unknowns in five equations. Because there exists more than one solution (couple of five numbers, real or complex, that solve my system), how can I choose which couple to "use" as solution?
I link an online solver with already entered a piece of my system (only two unknowns in two equations, other variables are constant in this example) which easily show you that there exists more than one solution.
example
To conclude my issue I can say that I have to choose the couple of variables which let other variables to be positive, so an easy check is the way to choose the couple.
I don't think the question has anything to do with programming, but I will show how I understand the problem.
You supply an initial guess. Then the method just converges to some solution by a modification of a Newton method.
You can choose the root by the placement of the initial guess. However, the convergence pattern can be very unpredictable (even fractal - https://en.wikipedia.org/wiki/Newton_fractal ) and it may be very difficult to choose the particular root using the initial guess.

About optimized math functions, ranges and intervals

I'm trying to wrap my head around how people that code math functions for games and rendering engines can use an optimized math function in an efficient way; let me explain that further.
There is an high need for fast trigonometric functions in those fields, at times you can optimize a sin, a cos or other functions by rewriting them in a different form that is valid only for a given interval, often times this means that your approximation of f(x) is just for the first quadrant, meaning 0 <= x <= pi/2 .
Now the input for your f(x) is still about all 4 quadrants, but the real formula only covers 1/4 of that interval, the straightforward solution is to detect the quadrant by analyzing the input and see in which range it belongs to, then you adjust the result of the formula accordingly if the input comes from a quadrant that is not the first quadrant .
This is good in theory but this also presents a couple of really bad problems, especially considering the fact that you are doing all this to steal a couple of cycles from your CPU ( you also get a consistent implementation, that is not platform dependent like an hardcoded fsin in Intel x86 asm that only works on x86 and has a certain error range, all of this may differ on other platforms with other asm instructions ), so you should keep things working at a concurrent and high performance level .
The reason I can't wrap my head around the "switch case" with quadrants solution is:
it just prevents possible optimizations, namely memoization, considering that you usually want to put that switch-case inside the same functions that actually computes the f(x), probably the situation can be improved by implementing the formula for f(x) outside said function, but this is will lead to a doubling in the number of functions to maintain for any given math library
increase probability of more branching with a concurrent execution
generally speaking doesn't lead to better, clean, dry code, and conditional statements are often times a potential source of bugs, I don't really like switch-cases and similar things .
Assuming that I can implement my cross-platform f(x) in C or C++, how the programmers in this field usually address the problem of translating and mapping the inputs, the quadrants to the result via the actual implementation ?
Note: In the below answer I am speaking very generally about code.
Assuming that I can implement my cross-platform f(x) in C or C++, how the programmers in this field usually address the problem of translating and mapping the inputs, the quadrants to the result via the actual implementation ?
The general answer to this is: In the most obvious and simplest way possible that achieves your purpose.
I'm not entirely sure I follow most of your arguments/questions but I have a feeling you are looking for problems where really none exist. Do you truly have the need to re-implement the trigonometric functions? Don't fall into the trap of NIH (Not Invented Here).
the straightforward solution is to detect the quadrant
Yes! I love straightforward code. Code that is perfectly obvious at a glance what it does. Now, sometimes, just sometimes, you have to do some crazy things to get it to do what you want: for performance, or avoiding bugs out of your control. But the first version should be most obvious and simple code that solves your problem. From there you do testing, profiling, and benchmarking and if (only if) you find performance or other issues, then you go into the crazy stuff.
This is good in theory but this also presents a couple of really bad problems,
I would say that this is good in theory and in practice for most cases and I definitely don't see any "bad" problems. Minor issues in specific corner cases or design requirements at most.
A few things on a few of your specific comments:
approximation of f(x) is just for the first quadrant: Yes, and there are multiple reasons for this. One simply is that most trigonometric functions have identities so you can easily use these to reduce range of input parameters. This is important as many numerical techniques only work over a specific range of inputs, or are more accurate/performant for small inputs. Next, for very large inputs you'll have to trim the range anyways for most numerical techniques to work or at least work in a decent amount of time and have sufficient accuracy. For example, look at the Taylor expansion for cos() and see how long it takes to converge sufficiently for large vs small inputs.
it just prevents possible optimizations: Chances are your c++ compiler these days is way better at optimizations than you are. Sometimes it isn't but the general procedure is to let the compiler do its optimization and only do manual optimizations where you have measured and proven that you need it. Theses days it is very non-intuitive to tell what code is faster by just looking at it (you can read all the questions on SO about performance issues and how crazy some of the root causes are).
namely memoization: I've never seen memoization in place for a double function. Just think how many doubles are there between 0 and 1. Now in reduced accuracy situations you can take advantage of it but this is easily implemented as a custom function tailored for that exact situation. Thinking about it, I'm not exactly sure how to implement memoization for a double function that actually means anything and doesn't loose accuracy or performance in the process.
increase probability of more branching with a concurrent execution: I'm not sure I'd implement trigonometric functions in a concurrent manner but I suppose its entirely possible to get some performance benefits. But again, the compiler is generally better at optimizations than you so let it do its jobs and then benchmark/profile to see if you really need to do better.
doesn't lead to better, clean, dry code: I'm not sure what exactly you mean here, or what "dry code" is for that matter. Yes, sometimes you can get into trouble by too many or too complex if/switch blocks but I can't see a simple check for 4 quadrants apply here...it's a pretty basic and simple case.
So for any platform I get the same y for the same values of x: My guess is that getting "exact" values for all 53 bits of double across multiple platforms and systems is not going to be possible. What is the result if you only have 52 bits correct? This would be a great area to do some tests in and see what you get.
I've used trigonometric functions in C for over 20 years and 99% of the time I just use whatever built-in function is supplied. In the rare case I need more performance (or accuracy) as proven by testing or benchmarking, only then do I actually roll my own custom implementation for that specific case. I don't rewrite the entire gamut of <math.h> functions in the hope that one day I might need them.
I would suggest try coding a few of these functions in as many ways as you can find and do some accuracy and benchmark tests. This will give you some practical knowledge and give you some hard data on whether you actually need to reimplement these functions or not. At the very least this should give you some practical experience with implementing these types of functions and chances are answer a lot of your questions in the process.

SCIP and Branch and Price

I have a general question about SCIP. I need to use the SCIP as a Branch and Price framework for my problem, I code in c++ so I used the VRP example as a template. On some of the instances, the code stops at the fractional solution and returns that as a optimal solution, I think something is wrong, do I have to set some parameters in order to tell SCIP look for integer solution or I made a mistake, I believe it should not stop and instead branch on the fractional solution until it reaches the integer solution (without any other negative reduced cost column). I also solve the subproblem optimally! any commenets?!
If you define your variables to be continous and just add a pricer, SCIP will solve the master problem to optimality (i.e., solve the restricted master, add improving columns, solve the updated restricted master, and so on, until no more improving columns were found).
There is no reason for SCIP to check if the solution is integral, because you explicitly said that you don't mind whether the values of the variables are integral or not (by defining them to be continuous). On the other hand, if you define the variables to be of integral (or binary) type, SCIP will do exactly as I described before, but at the end check whether all integral variables have an integral value and branch if this is not the case.
However, you should note that all branching rules in SCIP do branching on variables, i.e., they take an integer variable with fractional value and split its domain; a binary variable would be fixed to 0 and 1 in the two child nodes. This is typically a bad idea for branch-and-price: first of all, it's quite unbalanced. You have a huge number of variables out of which only few will have value 1 in the end, most will be 0. Fixing a variable to 1 therefore has a high impact, while fixing it to 0 has almost no impact. But more importantly, you need to take the branching decision into account in your pricing problem. If you fixed a variable to 0, you have to keep the pricer from generating a copy of the forbidden column (which would probably improve the LP solution, because it was part of the former optimal solution). In order to to this, you might need to look for the 2nd (or later k)-best solution. Since you are solving the pricing problems as a MIP with SCIP, you might just add a constraint forbidding this solution (logicor (linear) for binary variables or bounddisjunction (not linear) for general integer variables).
I would recommend to implement your own branching rule, which takes into account that you are doing branch-and-price and branches in a way that is more balanced and does not harm your pricing too much. For an example, check out the Ryan&Foster branching rule, which is the standard for binary problems with a set-partitioning master structure. This rule is implemented in Binpacking as well as the Coloring example shipped with SCIP.
Please also check out the SCIP FAQ, where there is a whole section about branch-and-price which also covers the topic branching (in particular, how branching decisions can be stored and enforced by a constraint handler, which is something you need to do for Ryan&Foster branching): http://scip.zib.de/doc/html/FAQ.php
There were also a lot of questions about branch-and-price on the SCIP mailing list
http://listserv.zib.de/mailman/listinfo/scip/. If you want to search it, you can use google and search for "site:listserv.zib.de scip search-string"
Finally, I would like to recommend to have a look at the GCG project: http://www.or.rwth-aachen.de/gcg/
It is an extension of SCIP to a generic branch-cut-and-price solver, i.e., you do not need to implement anything, you just put in an original formulation of your model, which is then reformulated by a Dantzig-Wolfe decomposition and solved via branch-cut-and-price. You can supply the structure for the reformulation, pricing problems are solved as a MIP (as you do it also), and there are also different branching rules. GCG is also part of the SCIP optimization suite and can be easily built within the suite.

How to check, without an object function, if the constraints are feasible?

My professor gave me a binary linear programming problem, but this problem is slightly different from optimization problems I used to solve(i.e. this is probably not maximizing or minimizing the object function.)
The problem is as follows,
Given a matrix M, for entries m_ij != 0, there are corresponding x_ijk variables.
Entries m_ij = 0 can be ignored.
x_ijk is either 0 or 1, and I want to try 5 x_ijk variables for each m_ij (that is, x_ij1, x_ij2, x_ij3, x_ij4, and x_ij5. One of them is 1 and the others are 0) are enough to satisfy some conditions(a set of inequalities).
More simply, this is to check if the set of constraints involving 5 x_ijk variables for each m_ij is a valid(or feasible) constraints.
I have solved some optimization problems, but I have never solved a problem without an objective function.
What should I set as my objective function here?
0? nothing?
I might be using lp_solve or CPLEX.
Thank you in advance for your advice!
That is correct, you can set an arbitrary constant value as an objective function.
Most of the solvers I have tried allow an empty objective function. Simply leave it out from your model.
Depending on the solver and the API you are using, it can happen that you have to set the coefficients of all variables in the objective to zero.
Don't worry, it has to work.
In response to your comment: Yes, constraint programming tools can provide better performance on feasibility problems than LP solvers (such as CPLEX). I have played with the IBM ILOG CPLEX CP Optimizer a few months ago, it is free for Academic users. Both the LP solver and the CP solver failed on my problems. Don't expect a miracle from constraint programming.
Keep in mind the that time needed to solve a constraint program grows exponentially with the size of the problem in the worse case. Sooner or later, your problems will most likely become unsolvable with either tool.
Just for your information: in the end, the constraint programming solver will call the LP solver (for example CPLEX).
My advice is: try the tool you already have / use the problem formulation that is more natural to you. Check whether the tool can solve your problem. Switch tool only if the tool fails and you cannot improve your model.