I am looking for a way to perform a (medium-scale*) multivariate linear regression (ordinary least squares, OLS) in C++. Say C++11 using the standard library, and if helpful also Boost; if easily installable, further libraries are fine too; using e.g. the std::vector format for inputs & outputs would be most convenient.
Performance is not top priority, though of course appreciated.
What does a reasonable algorithm look like? Are there any good tools readily available to help with it?
Various implementations of univariate regression algorithms can readily be found, e.g. here, but I have not yet found a multivariate version. Also, this question essentially asks for such a regression, though it has been closed due to lack of focus, even if its title would be the ideal one.
*Medium-scale: in my case, e.g., a few ten-thousands of observations, and up to a hundred explanatory variables/attributes (e.g. time dummies).
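To illustrate the shape of what I am after, a rough sketch using Eigen (assuming it counts as "easily installable") might look like the following; the function name and the vector-of-rows input format are just my own placeholders:

```cpp
#include <vector>
#include <Eigen/Dense>

// Minimal OLS sketch: fit y ~ X*beta, where X is built from per-observation
// attribute vectors and an intercept column of ones is prepended.
std::vector<double> ols_fit(const std::vector<std::vector<double>>& attributes,
                            const std::vector<double>& y)
{
    const int n = static_cast<int>(y.size());              // observations
    const int p = static_cast<int>(attributes[0].size());  // explanatories

    Eigen::MatrixXd X(n, p + 1);
    Eigen::VectorXd Y(n);
    for (int i = 0; i < n; ++i) {
        X(i, 0) = 1.0;                                      // intercept
        for (int j = 0; j < p; ++j)
            X(i, j + 1) = attributes[i][j];
        Y(i) = y[i];
    }

    // Column-pivoted QR is a reasonable default at this problem size;
    // X.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(Y) is a
    // slower but more robust alternative.
    Eigen::VectorXd beta = X.colPivHouseholderQr().solve(Y);
    return std::vector<double>(beta.data(), beta.data() + beta.size());
}
```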
Related
I work on palmprint recognition using the Feature2D interface of the OpenCV library, and I use algorithms such as SIFT, SURF, ORB... to detect keypoints and extract/match descriptors. My tests include (1 vs 1) palmprint matching and also (1 vs database) matching.
Once I get the results, I need to evaluate the algorithm, and for this I know that there are some rates or scores (like EER, rank-1 identification, recall and accuracy) which give an estimate of how successful the method was. Now I need to know whether any of those rates are implemented in OpenCV, and how to use them. If they aren't, what are the formulas used in the literature?
As far as I know, there is little of this implemented in OpenCV. A common way is to store the results (e.g. in JSON) and process them with other programs such as Matlab or Python. This also allows you to change the evaluation without having to recompute the algorithms.
There is no overall best method for presenting the results; it always depends on what you want to show. In my opinion an ROC curve is the best way to express your output, and it is also very widely used in research.
If you insist on doing it in C++, then you could use:
Roceasy or
DLIB
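If you end up computing the rates yourself instead, the usual formulas are simple to code up. Below is a rough sketch of an EER estimate from genuine/impostor score lists; the convention "higher score = more similar" is my assumption, and each (FAR, FRR) pair computed along the way is also a point you can plot for the ROC/DET curve:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Rough EER estimate from match scores, assuming higher score = more similar.
// FAR(t): fraction of impostor scores accepted at threshold t.
// FRR(t): fraction of genuine scores rejected at threshold t.
double estimate_eer(const std::vector<double>& genuine,
                    const std::vector<double>& impostor)
{
    std::vector<double> thresholds = genuine;
    thresholds.insert(thresholds.end(), impostor.begin(), impostor.end());
    std::sort(thresholds.begin(), thresholds.end());

    double best_gap = 2.0, eer = 1.0;
    for (double t : thresholds) {
        double far = std::count_if(impostor.begin(), impostor.end(),
                                   [t](double s) { return s >= t; }) /
                     static_cast<double>(impostor.size());
        double frr = std::count_if(genuine.begin(), genuine.end(),
                                   [t](double s) { return s < t; }) /
                     static_cast<double>(genuine.size());
        if (std::abs(far - frr) < best_gap) {
            best_gap = std::abs(far - frr);
            eer = 0.5 * (far + frr);   // EER is where FAR and FRR cross
        }
    }
    return eer;
}
```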
I'm working on a project where I need to minimize functions of several variables, like func(input_parameters, variable_parameters) -> min over variable_parameters.
I use optimizing functions from SciPy, so the minimization process is a grey box: I can see the code on GitHub and read about the algorithms used, but I'd like to assume it is correct and focus on testing my own project.
Though, particular libraries shouldn't matter in this question.
At the moment I use a few approaches:
Create simple examples, find the global/local minima by hand, and write a test that runs the optimization and compares its solution with the known one
If the method needs gradients, compare the analytically calculated gradients with their numerical approximations in tests (a minimal sketch of such a check is given after this list)
For iterative algorithms built upon the ones provided by SciPy, check in tests that the sequence of function values is monotonically nonincreasing
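The gradient comparison in the second point is just a central-difference check; a minimal sketch (in C++ here, but the idea is the same in any language; SciPy's check_grad does essentially this comparison):

```cpp
#include <cmath>
#include <functional>
#include <vector>

// Compare an analytical gradient against a central-difference approximation.
// f and grad stand in for whatever the project under test provides.
bool gradient_matches(const std::function<double(const std::vector<double>&)>& f,
                      const std::function<std::vector<double>(const std::vector<double>&)>& grad,
                      std::vector<double> x,
                      double h = 1e-6, double tol = 1e-4)
{
    const std::vector<double> g = grad(x);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xi = x[i];
        x[i] = xi + h; const double fp = f(x);
        x[i] = xi - h; const double fm = f(x);
        x[i] = xi;
        const double numeric = (fp - fm) / (2.0 * h);
        if (std::abs(numeric - g[i]) > tol * (1.0 + std::abs(numeric)))
            return false;   // analytical and numerical gradients disagree
    }
    return true;
}
```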
Is there a book or an article about testing of mathematical optimization procedures?
P.S. I'm not talking about test functions for optimization; I'm asking about approaches used to test an optimization procedure in order to find bugs faster.
I find the hypothesis library really useful for testing optimisation algorithms in development.
You can set it up to generate random test cases (functions, linear programs, etc.) according to some specification. The idea is that you pass these to your algorithm and test for known invariants. For example, you could have it throw random problems or subproblems at your algorithm and check that:
Gradient descent methods produce a series of nonincreasing objectives
Local search finds a solution with no better neighbours
Heuristics maintain feasibility
There's a useful PyCon talk here explaining the idea of property based testing. It focuses more on testing APIs than algorithms, but I think the ideas transfer. I've found this approach does a pretty good job finding cases of unexpected behaviour as I'm writing a new algorithm.
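Hypothesis itself is Python-only, but the same idea is easy to hand-roll. Here is a rough sketch (in C++) of the "nonincreasing objectives" invariant, where the optimize callback is a stand-in for whatever routine is under test, and the random least-squares problems are my own choice of test case:

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Property-based style check: throw random least-squares problems
// f(x) = 0.5*||B x - c||^2 at an optimizer and assert that the sequence
// of objective values it reports never increases.
void check_nonincreasing_objectives(
    const std::function<Vector(const Matrix& B, const Vector& c)>& optimize,
    int trials)
{
    std::mt19937 rng(42);
    std::normal_distribution<double> normal(0.0, 1.0);
    const int n = 5;

    for (int t = 0; t < trials; ++t) {
        Matrix B(n, Vector(n));
        Vector c(n);
        for (auto& row : B) for (double& v : row) v = normal(rng);
        for (double& v : c) v = normal(rng);

        const Vector objectives = optimize(B, c);   // objective value per iteration
        for (std::size_t i = 1; i < objectives.size(); ++i)
            assert(objectives[i] <= objectives[i - 1] + 1e-12);
    }
}
```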
I have a set of data z(0), z(1), z(2), ..., z(n) that I am currently fitting with a two-variable polynomial of the kind p(x,y) = a(1)*x^2 + a(2)*y^2 + a(3)*x*y + a(4). I have coordinates (x(i), y(i)), i = 1, ..., n, on which I impose p(x(i), y(i)) = z(i). In this way I get an overdetermined system that I can solve using Eigen's SVD. I am looking for a more sophisticated method that can take care of outliers, like a Least Median of Squares (LMedS) robust regression (as described here), but I haven't found a C++ implementation for two variables. I looked in GSL but it seems there is nothing for two-variable functions. The only other solution I can think of is using a TGraph2D in ROOT. Do you know any other solution? Numerical Recipes maybe? Since I am writing C++ code, I would prefer C or C++ implementations.
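(For reference, the plain Eigen SVD solve I mention above looks roughly like this; fit_poly2 is just my own name for it:)

```cpp
#include <vector>
#include <Eigen/Dense>

// Least-squares fit of p(x,y) = a1*x^2 + a2*y^2 + a3*x*y + a4 to data z(i)
// at points (x(i), y(i)), solving the overdetermined system with Eigen's SVD.
Eigen::Vector4d fit_poly2(const std::vector<double>& x,
                          const std::vector<double>& y,
                          const std::vector<double>& z)
{
    const int n = static_cast<int>(z.size());
    Eigen::MatrixXd A(n, 4);
    Eigen::VectorXd b(n);
    for (int i = 0; i < n; ++i) {
        A(i, 0) = x[i] * x[i];
        A(i, 1) = y[i] * y[i];
        A(i, 2) = x[i] * y[i];
        A(i, 3) = 1.0;
        b(i) = z[i];
    }
    return A.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(b);
}
```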
Since no answer has been given yet, and I am still working on this problem, I will share my progress here.
The class TLinearFitter has a fit method that allows you to select Robust fitting - Least Trimmed Squares regression (LTS):
https://root.cern.ch/root/html532/TLinearFitter.html
Another possible solution, maybe more time consuming but perhaps more efficient in the long run, is to write my own function to be minimized and then use:
https://projects.coin-or.org/Ipopt to minimize it. This approach involves a bigger "step", though: I don't know how to use the library and I haven't (yet?) found a nice tutorial to understand it.
Here: https://wis.kuleuven.be/stat/robust/software there is a Fortran implementation of the LMedS algorithm called PROGRESS. So another possible solution could be to port this software to C/C++ and make a library out of it.
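In the meantime, the core of LMedS is simple enough to sketch by hand: fit the model to many random minimal subsets and keep the coefficients with the smallest median squared residual. The sketch below reuses the fit_poly2 helper from my question above and is only an illustration of the idea, not the PROGRESS algorithm with all its refinements:

```cpp
#include <algorithm>
#include <limits>
#include <random>
#include <vector>
#include <Eigen/Dense>

// Defined in the earlier sketch: SVD least-squares fit of the 4-parameter model.
Eigen::Vector4d fit_poly2(const std::vector<double>& x,
                          const std::vector<double>& y,
                          const std::vector<double>& z);

// Rough Least Median of Squares loop: repeatedly fit the model to a random
// minimal subset of 4 points and keep the coefficients whose median squared
// residual over all points is smallest.
Eigen::Vector4d lmeds_poly2(const std::vector<double>& x,
                            const std::vector<double>& y,
                            const std::vector<double>& z,
                            int samples = 500)
{
    std::mt19937 rng(1234);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(z.size()) - 1);

    Eigen::Vector4d best = Eigen::Vector4d::Zero();
    double best_median = std::numeric_limits<double>::max();

    for (int s = 0; s < samples; ++s) {
        // Draw 4 points (not necessarily distinct, kept simple) and fit them.
        std::vector<double> xs, ys, zs;
        for (int k = 0; k < 4; ++k) {
            const int i = pick(rng);
            xs.push_back(x[i]); ys.push_back(y[i]); zs.push_back(z[i]);
        }
        const Eigen::Vector4d a = fit_poly2(xs, ys, zs);

        // Median of squared residuals over the whole data set.
        std::vector<double> r2;
        for (std::size_t i = 0; i < z.size(); ++i) {
            const double p = a[0]*x[i]*x[i] + a[1]*y[i]*y[i] + a[2]*x[i]*y[i] + a[3];
            r2.push_back((p - z[i]) * (p - z[i]));
        }
        std::nth_element(r2.begin(), r2.begin() + r2.size() / 2, r2.end());
        const double median = r2[r2.size() / 2];

        if (median < best_median) { best_median = median; best = a; }
    }
    return best;
}
```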
I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast implementation of GD. What is the recommended library/reference?
GSL is a great (and free) library that already implements common functions of mathematical and scientific interest.
You can peruse through the entire reference manual online. Poking around, this starts to look interesting, but I think we'd need to know more about the problem.
It sounds like you're fairly new to minimization methods. Whenever I need to learn a new set of numeric methods, I usually look in Numerical Recipes. It's a book that provides a nice overview of the most common methods in the field, their tradeoffs, and (importantly) where to look in the literature for more information. It's usually not where I stop, but it's often a helpful starting point.
For example, if your function is costly, then your goal is to minimize the number of evaluations needed to converge. If you have analytical expressions for the gradient, then a gradient-based method will probably work to your advantage, assuming that the function and its gradient are well-behaved (lack singularities) in the domain of interest.
If you don't have analytical gradients, then you're almost always better off using an approach like downhill simplex that only evaluates the function (not its gradients). Numerical gradients are expensive.
Also note that all of these approaches will converge to local minima, so they're fairly sensitive to the point at which you initially start the optimizer. Global optimization is a totally different beast.
As a final thought, almost all of the code you can find for minimization will be reasonably efficient. The real cost of minimization is in the cost function. You should spend time profiling and optimizing your cost function, and select an algorithm that will minimize the number of times you need to call it (methods like downhill simplex, conjugate gradient, and BFGS all shine on different kinds of problems).
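To make that point concrete: plain gradient descent itself is only a few lines, and nearly all of the runtime goes into the f and grad calls. The sketch below, with a simple backtracking line search (my own choice), is meant to illustrate that, not to replace the libraries mentioned here:

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Vector = std::vector<double>;

// Minimal gradient descent with backtracking line search. Each iteration
// costs one gradient evaluation plus a few function evaluations, which is
// why an expensive cost function dominates the total runtime.
Vector gradient_descent(const std::function<double(const Vector&)>& f,
                        const std::function<Vector(const Vector&)>& grad,
                        Vector x, int max_iters = 1000, double tol = 1e-8)
{
    for (int it = 0; it < max_iters; ++it) {
        const Vector g = grad(x);
        double gnorm2 = 0.0;
        for (double gi : g) gnorm2 += gi * gi;
        if (std::sqrt(gnorm2) < tol) break;            // gradient small: converged

        // Backtracking: shrink the step until the objective actually decreases.
        double step = 1.0;
        const double fx = f(x);
        Vector trial(x.size());
        while (true) {
            for (std::size_t i = 0; i < x.size(); ++i)
                trial[i] = x[i] - step * g[i];
            if (f(trial) < fx - 1e-4 * step * gnorm2 || step < 1e-12) break;
            step *= 0.5;
        }
        x = trial;
    }
    return x;
}
```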
In terms of actual code, you can find a lot of nice routines at NETLIB, in addition to the other libraries that have been mentioned. Most of the routines are in FORTRAN 77, but not all; to convert them to C, f2c is quite useful.
One of the best-respected libraries for this kind of optimization work is the NAG libraries. These are used all over the world in universities and industry. They're available for C / FORTRAN. They're very non-free, and contain a lot more than just minimisation functions; a lot of general numerical mathematics is covered.
Anyway I suspect this library is overkill for what you need. But here are the parts pertaining to minimisation: Local Minimisation and Global Minimization.
Try CPLEX, which is available for free to students.