were can I find reliable code for critical values for probability distributions? For instance, F critical values for the fisher test...
?
Thanks for any relevant reference.
The Boost Math Toolkit contains the F distribution. To get critical values, use the quantile non-member function. If you can't use any third party tools, then you are in for a lot of work since most distributions do not have a closed form analytical expression for critical values. E.g. the normal distribution requires numerical integration.
I hope they re-propose statistical special functions for C++ TR2. Also Boost has a similar proposal. I know that doesn't help now but I think this stuff should be canned for everyone to just pick up and use.
Related
I am looking for a way to perform a (medium-scale*) multivariate linear regression (ordinary least-squares, OLS) in C++. Say C++11 with using std library, and if helpful also boost; if easily installable also the use of further libraries is fine; use of e.g. std::vector format for inputs & outputs could be most convenient.
Performance is not top priority, though of course appreciated.
What does a reasonable algorithm look like? Are there any good tools readily available to help with it?
Various implementations for univariate regression algos can readily be found, e.g. here, but I have not yet found any multivariate version. Also, this question essentially asks for such a regression though it has been closed due to lack of focus, even if it's title would be the ideal one.
*Medium-scale: in my case e.g. say few ten-thousands of observations, and up to hundred explanatories/attributes (e.g. time-dummies).
I have a set of data z(0), z(1), z(2)...,z(n) that I am currently fitting with a 2 variables polynomial of the kind p(x,y) = a(1)*x^2+a(2)*y^2+a(3)*x*y+a(4). I have i=1,...,n (x(i),y(i)) coordinates that I impose to be p(x(i),y(i))=z(i). In this way I have a Overdetermined System that I can solve using Eigen SVD . I am looking for a more sophisticated method that can take care of outliers, like a Least Median of Squares robust regression (as described here) but I haven't found a C++ implementation for 2 variables. I looked in GSL but it seems there is nothing for 2 variable functions. The only other solution I can think of is using a TGraph2D in ROOT. Do you know any other solution? Numerical recipes maybe? Since I am writing C++ code I would prefer C or C++ implementations.
Since non answer has been given yet, but I am still working on this problem, I will share my progresses here.
The class TLinearFitter has a fit method that allows you to select Robust fitting - Least Trimmed Squares regression (LTS):
https://root.cern.ch/root/html532/TLinearFitter.html
Another possible solution, more time consuming maybe, but maybe more efficient on the long run is to write my own function to be minimized, and the use:
https://projects.coin-or.org/Ipopt to minimize it. Although in this approach there is a bigger "step". I don't know how to use the library and I haven't (yet?) found a nice tutorial to understand it.
here: https://wis.kuleuven.be/stat/robust/software there is a Fortran implementation of the LMedS algorithm called PROGRESS. So another possible solution could be to port this software to C/C++ and make a library out of it.
I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast implementation of GD. What is the recommended library/reference?
GSL is a great (and free) library that already implements common functions of mathematical and scientific interest.
You can peruse through the entire reference manual online. Poking around, this starts to look interesting, but I think we'd need to know more about the problem.
It sounds like you're fairly new to minimization methods. Whenever I need to learn a new set of numeric methods, I usually look in Numerical Recipes. It's a book that provides a nice overview of the most common methods in the field, their tradeoffs, and (importantly) where to look in the literature for more information. It's usually not where I stop, but it's often a helpful starting point.
For example, if your function is costly, then your goal is to minimization the number of evaluations to need to converge. If you have analytical expressions for the gradient, then a gradient-based method will probably work to your advantage, assuming that the function and its gradient are well-behaved (lack singularities) in the domain of interest.
If you don't have analytical gradients, then you're almost always better off using an approach like downhill simplex that only evaluates the function (not its gradients). Numerical gradients are expensive.
Also note that all of these approaches will converge to local minima, so they're fairly sensitive to the point at which you initially start the optimizer. Global optimization is a totally different beast.
As a final thought, almost all of the code you can find for minimization will be reasonably efficient. The real cost of minimization is in the cost function. You should spend time profiling and optimizing your cost function, and select an algorithm that will minimize the number of times you need to call it (methods like downhill simplex, conjugate gradient, and BFGS all shine on different kinds of problems).
In terms of actual code, you can find a lot of nice routines at NETLIB, in addition to the other libraries that have been mentioned. Most of the routines are in FORTRAN 77, but not all; to convert them to C, f2c is quite useful.
One of the best respected libraries for this kind of optimization work is the NAG libraries. These are used all over the world in universities and industry. They're available for C / FORTRAN. They're very non-free, and contain a lot more than just minimisation functions - A lot of general numerical mathematics is covered.
Anyway I suspect this library is overkill for what you need. But here are the parts pertaining to minimisation: Local Minimisation and Global Minimization.
Try CPLEX which is available for free for students.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I'm looking to add some genetic algorithms to an Operations research project I have been involved in. Currently we have a program that aids in optimizing some scheduling and we want to add in some heuristics in the form of genetic algorithms. Are there any good libraries for generic genetic programming/algorithms in c++? Or would you recommend I just code my own?
I should add that while I am not new to c++ I am fairly new to doing this sort of mathematical optimization work in c++ as the group I worked with previously had tended to use a proprietary optimization package.
We have a fitness function that is fairly computationally intensive to evaluate and we have a cluster to run this on so parallelized code is highly desirable.
So is c++ a good language for this? If not please recommend some other ones as I am willing to learn another language if it makes life easier.
thanks!
I would recommend rolling your own. 90% of the work in a GP is coding the genotype, how it gets operated on, and the fitness calculation. These are parts that change for every different problem/project. The actual evolutionary algorithm part is usually quite simple.
There are several GP libraries out there ( http://en.wikipedia.org/wiki/Symbolic_Regression#Implementations ). I would use these as examples and references though.
C++ is a good choice for GP because they tend to be very computationally intensive. Usually, the fitness function is the bottleneck, so it's worthwhile to at least make this part compiled/optimized.
I use GAUL
it's a C library with all you want.
( pthread/fork/openmp/mpi )
( various crossover / mutation function )
( non GA optimisation: Hill-Climbing, N-M Simplex, Simulated annealling, Tabu, ... )
Why build your own library when there is such powerful tools ???
I haven't used this personally yet, but the Age Layered Population Structure (ALPS) method has been used to generate human competitive results and has been shown to outperform several popular methods in finding optimal solutions in rough fitness landscapes. Additionally, the link contains source code in C++ FTW.
I have had similar problems. I used to have a complicated problem and defining a solution in terms of a fixed length vector was not desirable. Even a variable length vector does not look attractive. Most of the libraries focus on cases where the cost function is cheap to calculate which did not match my problem. Lack of parallelism is their another pitfall. Expecting the user to allocate memory for being used by the library is adding insult into injury. My cases were even more complicated because most of the libraries check the nonlinear conditions before evaluation. While, I needed to check the nonlinear condition during or after the evaluation based on the result of the evaluation. It is also undesirable when I needed to evaluate the solution to calculate its cost and then I had to recalculate the solution to present it. In most of the cases, I had to write the cost function two times. Once for GA and once for presentation.
Having all of these problems, I eventually, designed my own openGA library which is now mature.
This library is based on C++ and distributed with free Mozilla Public License 2.0. It guarantees that using this library does not limit your project and it can be used for commercial or none commercial purposes for free without asking for any permission. Not all libraries are transparent in this sense.
It supports three modes of single objective, multiple objective (NSGA-III) and Interactive Genetic Algorithm (IGA).
The solution is not mandated to be a vector. It can be any structure with any customized design containing any optional values with variable length. This feature makes this library suitable for Genetic Programming (GP) applications.
C++11 is used. Template feature allows flexibility of the solution structure design.
The standard library is enough to use this library. There is no dependency beyond that. The entire library is also a single header file for ease of use.
The library supports parallelism by default unless you turn it off. If you have an N-core CPU, the number of threads are set to N by default. You can change the settings. You can also set if the solution evaluations are distributed between threads equally or they are assigned to any thread which has finished its job and is currently idle.
The solution evaluation is separated from calculation of the final cost. It means that your evaluation function can simulate the system and keep a lot of information. Your cost function is called later and reports the cost based on the evaluation. While your evaluation results are kept to be used later by the user. You do not need to re-calculate it again.
You can reject a solution at any time during the evaluation. No waste of time. In fact, the evaluation and constraint check are integrated.
The GA assist feature help you to produce the C++ code base from the information you provide.
If these features match what you need, I recommend having a look at the user manual and the examples of openGA.
The number of the readers and citation of the related publication as well as its github favorite marks is increasing and its usage is keep growing.
I suggest you have a look into the matlab optimization toolkit - it comes with GAs out of the box, you only haver to code the fitness function (and a function to generate inital population eventually) and I believe matlab has some C++ interoperability so you could code you functions in C++. I am using it for my experiments and a very nice feature is that you get all sorts of charts out of the box as well.
Said so - if your aim is to learn about genetic algorithms you're better off coding it, but if you just want to run experiments matlab and C++ (or even just matlab) is a good option.
I would need some basic vector mathematics constructs in an application. Dot product, cross product. Finding the intersection of lines, that kind of stuff.
I can do this by myself (in fact, have already) but isn't there a "standard" to use so bugs and possible optimizations would not be on me?
Boost does not have it. Their mathematics part is about statistical functions, as far as I was able to see.
Addendum:
Boost 1.37 indeed seems to have this. They also gracefully introduce a number of other solutions at the field, and why they still went and did their own. I like that.
Re-check that ol'good friend of C++ programmers called Boost. It has a linear algebra package that may well suits your needs.
I've not tested it, but the C++ eigen library is becoming increasingly more popular these days. According to them, they are on par with the fastest libraries around there and their API looks quite neat to me.
Armadillo
Armadillo employs a delayed evaluation
approach to combine several operations
into one and reduce (or eliminate) the
need for temporaries. Where
applicable, the order of operations is
optimised. Delayed evaluation and
optimisation are achieved through
recursive templates and template
meta-programming.
While chained operations such as
addition, subtraction and
multiplication (matrix and
element-wise) are the primary targets
for speed-up opportunities, other
operations, such as manipulation of
submatrices, can also be optimised.
Care was taken to maintain efficiency
for both "small" and "big" matrices.
I would stay away from using NRC code for anything other than learning the concepts.
I think what you are looking for is Blitz++
Check www.netlib.org, which is maintained by Oak Ridge National Lab and the University of Tennessee. You can search for numerical packages there. There's also Numerical Recipes in C++, which has code that goes with it, but the C++ version of the book is somewhat expensive and I've heard the code described as "terrible." The C and FORTRAN versions are free, and the associated code is quite good.
There is a nice Vector library for 3d graphics in the prophecy SDK:
Check out http://www.twilight3d.com/downloads.html
For linear algebra: try JAMA/TNT . That would cover dot products. (+matrix factoring and other stuff) As far as vector cross products (really valid only for 3D, otherwise I think you get into tensors), I'm not sure.
For an extremely lightweight (single .h file) library, check out CImg. It's geared towards image processing, but has no problem handling vectors.