If we initialize a variable model.x to a specific value (e.g., model.x = 1) before solving the model, do we need to pass warmstart=True to Pyomo's solve() method in order to keep those initial values for the optimization?
Keep in mind that an initialized variable should not be forced to take the specified value; it only provides the variable with an initial starting value, which the solver will then change if needed.
For now, it depends on the solver interface.
If you are using a solver through the NL-file interface (e.g., AMPL solvers), then initial variable values are always supplied to the solver (when they are not None), and it is up to the solver whether it attempts to use those values as a warmstart (e.g., for a MIP) or as an initial iterate (e.g., for solvers using an optimization method that requires a starting point). For solvers that require a starting point, it is also up to the solver what value is used for any variable that was not given one. Often zero is used, but this may vary between solvers.
For all other Pyomo solver interfaces (e.g., LP, MPS, Python), which mainly correspond to MIP solvers, I believe the default behavior is not to supply a warmstart. You have to specify warmstart=True when you call solve (e.g., opt.solve(model, warmstart=True)) to have initial values communicated to the solver.
I do not find this consistent, mainly because the solve method for the NL-file interface does not even accept the warmstart keyword, so you need an if-statement when writing general code that works with multiple interfaces.
I think I'll save further discussion for the GitHub issue.
I'm using the Eigen::LevenbergMarquardt solver for a model fitting application. The functor I'm providing to it includes functions to compute the error vector and Jacobian. These functions contain a lot of similar code, and some costly calculations are duplicated.
The prototype of the () operator used to compute the error vector includes what appears to be an optional pointer to a Jacobian matrix. If the Eigen::LevenbergMarquardt solver can be set up to compute the error vector and Jacobian at the same time in relevant cases, it would really speed up my algorithm.
int operator()(const Eigen::VectorXf& z, Eigen::VectorXf& fvec, Eigen::MatrixXf* _j = 0)
I have not found any documentation describing this _j parameter or how it can be used. Checking its value while my code runs shows that it is always a NULL pointer.
Does anyone know what this parameter is used for and if it's possible to compute the error vector and Jacobian simultaneously when both are needed?
Not sure about this particular solver, but these kinds of parameters are commonly used only when the solver needs them. Maybe it is just an extension point for the future. Looking at the source code, the LM solver never calls the functor with that parameter.
I think a better approach in your case would be to cache the redundant parts of the computation within your functor. Maybe just keep a copy of the input vector and do a quick memcmp before redoing the computation. Not ideal, but since the interface has no way of telling you when the inputs change, that's probably the most robust option you have.
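For concreteness, here is a rough sketch of that caching pattern for a toy model y = a * exp(b * x) with parameters z = (a, b); the model, names, and data members are mine, not from your code, and the values()/inputs() boilerplate that Eigen::LevenbergMarquardt expects from its functor is omitted:

#include <Eigen/Core>

struct ExpFitFunctor
{
    Eigen::VectorXf x, y;    // data to fit

    Eigen::VectorXf cachedZ; // parameters the cache was built for
    Eigen::VectorXf expBx;   // cached exp(b * x), used by both computations

    void refreshCache(const Eigen::VectorXf& z)
    {
        // cheap comparison against the last input, as suggested above
        if (cachedZ.size() == z.size() && (cachedZ.array() == z.array()).all())
            return;
        cachedZ = z;
        expBx = (z(1) * x.array()).exp().matrix(); // the costly shared part
    }

    // residuals: fvec_i = a * exp(b * x_i) - y_i
    int operator()(const Eigen::VectorXf& z, Eigen::VectorXf& fvec)
    {
        refreshCache(z);
        fvec = z(0) * expBx - y;
        return 0;
    }

    // Jacobian: d fvec_i / da = exp(b * x_i), d fvec_i / db = a * x_i * exp(b * x_i)
    int df(const Eigen::VectorXf& z, Eigen::MatrixXf& fjac)
    {
        refreshCache(z);
        fjac.col(0) = expBx;
        fjac.col(1) = (z(0) * x.array() * expBx.array()).matrix();
        return 0;
    }
};

Since LM typically evaluates the residuals and the Jacobian at the same point back to back, the cache is hit on essentially every df() call.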
I have the following LP:
max  -x4 - x5 - x6 - x7
s.t. x0 + x1 + x4 = 1
     x2 + x3 + x5 = 1
     x0 + x2 + x6 = 1
     x1 + x3 + x7 = 1
Gurobi gives me the basis B=[1,0,2,10]; my model has 8 variables and rank(A)=4, but the basis contains the variable x10. My question is: why does Gurobi generate slack variables even with rank(A)=4? And how can I get an optimal basis that contains only the original variables (x0 through x7)?
The problem is degenerate. There are multiple optimal bases that give the same primal solution. In other words, some variables are basic at bound. This happens a lot in practice and you should not worry about that.
To make things more complicated: there are also multiple optimal primal solutions; so we say that we have both primal and dual degeneracy.
My question is, why Gurobi generates slack variables even with rank(A)=4?
The LP problem, for solvers like Gurobi (i.e., not a tableau solver), has n structural variables and m logical variables (a.k.a. slacks). The slack variables are implicit; they are not "generated" in the sense that the matrix A is physically augmented with an identity matrix. Again, this is not something to worry about.
And how to get an optimal base that contains only the original variables (variables from x0 to x7)?
Well, this is an optimal basis. So why would Gurobi spend time doing more pivots to try to make all slacks nonbasic? AFAIK no solver would do that. They treat structural and logical variables as equals.
It is not so easy to force variables to be in the basis. A free variable will most likely be in the (optimal) basis (but no 100% guarantee). You can also specify an advanced basis for Gurobi to start from. In the extreme: if this advanced basis is optimal (and feasible) Gurobi will not do any pivots.
I believe this particular problem has 83 optimal (and feasible) bases. Only one of them has all slacks nonbasic. I don't think it is easy to find this solution, even if you had access to a simplex code and could change it so that (after finding an optimal solution) it continues pivoting slacks out of the basis. I think you would need to enumerate the optimal bases (explicitly or implicitly).
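If you want to poke at this yourself, here is a small sketch using Gurobi's C++ API (assuming a working installation) that builds the LP above and prints which variables end up basic; since the problem is degenerate, the exact basis you get may vary by version and settings. Setting the same VBasis/CBasis attributes before optimize() is how you would supply an advanced starting basis.

#include "gurobi_c++.h"
#include <iostream>
#include <string>

int main()
{
    GRBEnv env;
    GRBModel model(env);

    // structural variables x0..x7, all >= 0
    GRBVar x[8];
    for (int i = 0; i < 8; ++i)
        x[i] = model.addVar(0.0, GRB_INFINITY, 0.0, GRB_CONTINUOUS,
                            "x" + std::to_string(i));

    model.setObjective(-1.0 * x[4] - x[5] - x[6] - x[7], GRB_MAXIMIZE);

    GRBConstr c[4];
    c[0] = model.addConstr(x[0] + x[1] + x[4] == 1);
    c[1] = model.addConstr(x[2] + x[3] + x[5] == 1);
    c[2] = model.addConstr(x[0] + x[2] + x[6] == 1);
    c[3] = model.addConstr(x[1] + x[3] + x[7] == 1);

    model.optimize();

    // A value of 0 (GRB_BASIC) marks a basic variable. VBasis covers the
    // structural variables; CBasis covers the implicit slack of each row.
    for (int i = 0; i < 8; ++i)
        std::cout << "x" << i << ": VBasis = "
                  << x[i].get(GRB_IntAttr_VBasis) << "\n";
    for (int i = 0; i < 4; ++i)
        std::cout << "row " << i << ": CBasis = "
                  << c[i].get(GRB_IntAttr_CBasis) << "\n";
    return 0;
}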
Slack variables are generated because they are needed to solve the dual linear programming model and the reduction bound; a complementary slackness variable Si would also be generated.
Moreover, x0 forms a branch for linear independence of the dual setup. x0 has the value 4; slack variables are generated from the transformation of the basis, where the rank is given, to the branch x0 for linear independence.
A reduction matrix is formed to find the value of x10, which is 5/2.
This helps to eliminate x10 in order to get an optimal basis from the reduction matrix.
I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression:
    tol = <a supplied tolerance, or defaulted to 1e-5>
    max_iter = <a supplied max iterations, or defaulted to 1000>

    fit(X, y):
        // model learns weights specific to this data set

    _gradient(X, y):
        // compute the gradient

    score(X, y):
        // model uses weights learned from fit to compute accuracy of
        // y_predicted vs. actual y
My question: when I use the fit, score, and _gradient methods, I don't actually need to pass the arrays (X and y) around or store them anywhere, so I want to use a reference or a pointer to those structures. My problem is that if a method accepts a pointer to a 2D array, I need to supply the second dimension's size ahead of time, or use templates. If I use templates, I end up with something like this for every method that accepts a 2D array:
template <std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double (&y)[rows]) {...}
It seems there is likely a better way. I want my regression class to work with any size input. How is this done in industry? I know in some situations the array is just flattened into row- or column-major format and a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
Contemporary C++ discourages working directly with heap-allocated data that you need to manually allocate or deallocate. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
Once you use such a class, you can easily get the dimension at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
You probably really don't want to use templates for the dimensions.
a. If you do so, you will need to recompile each time you get input of different dimensions, and your code will be duplicated (by the compiler) once for each distinct set of dimensions you use simultaneously. Lots of bad stuff, with little gain (in this case). There's no real drawback to getting the dimensions at runtime from the class.
b. Templates (in your setting) are fitting for the element type of the matrix (e.g., is it a matrix of doubles or floats), or possibly the number of dimensions (e.g., for specifying tensors).
Your regressor doesn't need to store the matrix and/or vector; pass them by const reference. Your interface looks like that of sklearn; if you like, check the source code there. Calling fit just causes the object to store the learned parameter vector β used for prediction. It doesn't copy or store the input matrix and/or vector.
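To tie these points together, here is a minimal sketch along those lines. It uses a plain std::vector-of-vectors matrix so all dimensions are known at runtime, and a fixed learning rate for the gradient step; both are simplifications, and a real implementation would likely use a matrix library.

#include <cstddef>
#include <vector>

class LinearRegression {
public:
    explicit LinearRegression(double tol = 1e-5, std::size_t max_iter = 1000)
        : tol_(tol), max_iter_(max_iter) {}

    // Gradient descent on squared error; X is n rows of d features,
    // passed by const reference and never stored.
    void fit(const std::vector<std::vector<double>>& X,
             const std::vector<double>& y)
    {
        const std::size_t d = X.front().size();  // dimension known at runtime
        w_.assign(d, 0.0);
        const double lr = 0.01;                  // fixed step size for the sketch

        for (std::size_t it = 0; it < max_iter_; ++it) {
            const std::vector<double> g = gradient(X, y);
            double norm2 = 0.0;
            for (std::size_t j = 0; j < d; ++j) {
                w_[j] -= lr * g[j];
                norm2 += g[j] * g[j];
            }
            if (norm2 < tol_ * tol_) break;      // gradient small: converged
        }
    }

    // Reports mean squared error of predictions vs. actual y.
    double score(const std::vector<std::vector<double>>& X,
                 const std::vector<double>& y) const
    {
        double mse = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double e = predict(X[i]) - y[i];
            mse += e * e;
        }
        return mse / X.size();
    }

private:
    double predict(const std::vector<double>& x) const
    {
        double p = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) p += w_[j] * x[j];
        return p;
    }

    // grad_j = (2/n) * sum_i (w.x_i - y_i) * x_ij
    std::vector<double> gradient(const std::vector<std::vector<double>>& X,
                                 const std::vector<double>& y) const
    {
        std::vector<double> g(X.front().size(), 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            for (std::size_t j = 0; j < g.size(); ++j)
                g[j] += 2.0 * r * X[i][j] / X.size();
        }
        return g;
    }

    double tol_;
    std::size_t max_iter_;
    std::vector<double> w_;  // the only state fit() leaves behind
};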
I have a dataset of custom abstract objects and a custom distance function. Are there any good SVM libraries that allow me to train on my custom objects (not 2D points) with my custom distance function?
I searched the answers in this similar Stack Overflow question, but none of them allows me to use custom objects and distance functions.
First things first.
SVM does not work on distance functions; it only accepts dot products. So your distance function (actually a similarity; usually 1 - distance is used as the similarity) has to:
be symmetric: s(a,b) = s(b,a)
be positive definite: s(a,a) >= 0, and s(a,a) = 0 <=> a = 0
be linear in the first argument: s(ka,b) = k s(a,b) and s(a+b,c) = s(a,c) + s(b,c)
This can be tricky to check, as you are actually asking "is there a mapping phi from my objects to some vector space such that s corresponds to a dot product there?" This leads to the definition of a so-called kernel, K(x,y) = s(phi(x), phi(y)). If your objects are themselves elements of a vector space, then sometimes it is enough to put phi(x) = x, so that K = s, but this is not true in general.
Once you have this kind of similarity, nearly any SVM library (for example libSVM) can work with a supplied Gram matrix, which is simply defined as
G_ij = K(x_i, x_j)
This requires O(N^2) memory and time. Consequently, it does not matter what your objects are, as SVM only works on their pairwise dot products, nothing more.
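As a minimal sketch (T and K are placeholders for your object type and a kernel assumed to be valid in the sense above), building the Gram matrix is just:

#include <cstddef>
#include <vector>

// Build the Gram matrix G_ij = K(x_i, x_j) for an arbitrary object type T.
// The kernel's symmetry lets us fill both triangles from one evaluation.
template <typename T, typename Kernel>
std::vector<std::vector<double>> gramMatrix(const std::vector<T>& xs, Kernel K)
{
    const std::size_t n = xs.size();
    std::vector<std::vector<double>> G(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j)
            G[i][j] = G[j][i] = K(xs[i], xs[j]);
    return G;
}

libSVM can then consume this matrix through its precomputed-kernel mode (kernel_type PRECOMPUTED) instead of raw feature vectors.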
If you lack the appropriate mathematical tools to show these properties, you can look into kernel learning from similarity. These methods are able to create a valid kernel that behaves similarly to your similarity function.
Check out the following:
MLPack: a lightweight library that provides lots of functionality.
DLib: a very popular toolkit that is used both in industry and academia.
Apart from these, you can also use Python packages and call them from C++.
I've been searching for a C/C++ library that does symbolic differentiation and integration of polynomials, but haven't found one that suits my needs.
I'm afraid that the problem is that I'm not using the correct terminology.
The problem is this: given a polynomial p, I would like to look at the function
f(p) = integral of (p')^2 from a to b
And generate partial derivatives for f with respect to p's coefficients.
Theoretically, there should be no problem here as we are dealing with polynomials, but I haven't found something that can keep the connection between the original coefficients and the modified polynomial.
Does anyone know if there are libraries that can do such things, or am I better off creating my own?
Have you tried FADBAD++ (http://www.fadbad.com/fadbad.html)? It's quite useful.
I would write my own derivative class. There are books available that document how to do this, but assuming you know the math rules, it is rather trivial.
Using such a derivative class you can then write a template function to generate your polynomial, its derivative, the square, and the integral, all while keeping track of the derivatives with respect to the coefficients. The problem is that you may carry around a lot of derivatives that are always zero; avoiding this is rather complicated.
A normal derivative class would contain a value and an array of derivative values.
There may be a constructor to create an independent variable from a value and an index -- initializing the value from the passed value, and all derivatives to zero except the one matching the index, which is set to 1.
Then you write operators and functions for everything you need -- which is not much assuming you're only dealing with polynomials.
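Incidentally, since everything here is polynomial, f has a closed form you can test an implementation against: with p(x) = sum_k c_k x^k,

f = sum over j,k >= 1 of (j*k/(j+k-1)) * c_j * c_k * (b^(j+k-1) - a^(j+k-1))

so df/dc_m = 2 * sum over k >= 1 of (m*k/(m+k-1)) * c_k * (b^(m+k-1) - a^(m+k-1)) for m >= 1, and df/dc_0 = 0.

And here is a minimal sketch of the derivative class described above (names are mine), with only the + and * operators that polynomial arithmetic needs:

#include <cstddef>
#include <vector>

// A value plus one partial derivative per independent variable
// (here: per polynomial coefficient). Operands are assumed to have
// been created with the same number of variables.
struct DVal {
    double v;               // the value
    std::vector<double> d;  // d[i] = partial derivative w.r.t. variable i

    // a constant: all derivatives are zero
    DVal(double value, std::size_t nvars) : v(value), d(nvars, 0.0) {}

    // independent variable number `index`: derivative 1 w.r.t. itself
    DVal(double value, std::size_t nvars, std::size_t index)
        : v(value), d(nvars, 0.0) { d[index] = 1.0; }
};

// sum rule: (a + b)' = a' + b'
DVal operator+(const DVal& a, const DVal& b) {
    DVal r(a.v + b.v, a.d.size());
    for (std::size_t i = 0; i < r.d.size(); ++i) r.d[i] = a.d[i] + b.d[i];
    return r;
}

// product rule: (a * b)' = a' * b + a * b'
DVal operator*(const DVal& a, const DVal& b) {
    DVal r(a.v * b.v, a.d.size());
    for (std::size_t i = 0; i < r.d.size(); ++i)
        r.d[i] = a.d[i] * b.v + a.v * b.d[i];
    return r;
}

Make each coefficient c_k an independent DVal, evaluate p' with this arithmetic, square it, and integrate (term by term, or numerically); the resulting DVal's d array then holds all the partial derivatives of f with respect to the coefficients.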