I'm currently trying to find good parameters for my program (about 16 parameters, and one execution of the program takes about a minute). Evolutionary algorithms seemed like a nice idea, and I wanted to see how they perform.
Unfortunately, I don't have a good fitness function because the variance of my objective function is very high (I cannot run it often enough without waiting until 2016). I can, however, determine which of two parameter sets is better (by testing the two configurations against each other). Do you know of evolutionary algorithms that use only this information? Are there other optimization techniques that are more suitable? For this project I'm using C++ and MATLAB.
// Update: Thank you very much for the answers. Both look promising but I will need a few days to evaluate them. Sorry for the delay.
If your pairwise test gives a proper total ordering (i.e. a >= b and b >= c implies a >= c, plus a few other conditions), then you may be able to construct a ranking objective on the fly and use CMA-ES to optimize it. CMA-ES is an evolutionary algorithm that is invariant to order-preserving transformations of the function value and to angle-preserving transformations (e.g. rotations) of the inputs; its selection only ever uses the ranking of candidate solutions, which is why a comparison-based ranking is enough. Furthermore, because it adapts a full covariance matrix (in effect learning second-order information about the problem), it converges very quickly compared to other derivative-free search heuristics, especially in higher-dimensional problems where random-search methods such as genetic algorithms take forever.
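A minimal sketch of the ranking idea, assuming only a hypothetical `better(a, b)` test that returns True when configuration `a` beats configuration `b`: sorting with that comparator produces the ranking, and a rank-based evolution strategy needs nothing else. This is deliberately a simplified (mu/mu_w, lambda)-ES with CMA-ES-style log-rank weights, not CMA-ES itself; it keeps an isotropic Gaussian and shrinks the step size on a fixed schedule instead of adapting a covariance matrix.

```python
import functools
import math
import random


def rank_population(pop, better):
    """Sort candidates best-first using only a pairwise test better(a, b)."""
    def cmp(a, b):
        # One call to `better` per comparison; ties count as "b wins".
        return -1 if better(a, b) else 1
    return sorted(pop, key=functools.cmp_to_key(cmp))


def rank_based_es(better, x0, sigma=0.5, lam=10, mu=5,
                  generations=100, decay=0.97, seed=0):
    """Simplified (mu/mu_w, lambda)-ES that never sees fitness values."""
    rng = random.Random(seed)
    mean = list(x0)
    # Log-rank recombination weights (same shape as the CMA-ES defaults).
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(raw)
    weights = [w / total for w in raw]
    for _ in range(generations):
        # Sample lam offspring from an isotropic Gaussian around the mean.
        offspring = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                     for _ in range(lam)]
        ranked = rank_population(offspring, better)
        # New mean: weighted average of the mu best-ranked offspring.
        mean = [sum(w * x[d] for w, x in zip(weights, ranked[:mu]))
                for d in range(len(mean))]
        sigma *= decay  # crude fixed schedule (CMA-ES adapts this instead)
    return mean


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    # Stand-in for "run both configurations and compare them".
    print(rank_based_es(lambda a, b: sphere(a) < sphere(b), [3.0, -2.0]))
```

Each generation costs O(lambda log lambda) calls to `better`, so with a one-minute test per comparison, lambda should stay small. If the noisy test is not transitive, `sorted` still returns an ordering, but it is only approximate.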
If you can compare solutions pairwise, then some sort of tournament selection approach might be good. The Wikipedia article describes it for a genetic algorithm, but it is easily applied to an evolutionary algorithm. You repeatedly select a small set of solutions from the population and hold a tournament among them. For simplicity, the tournament size could be a power of 2. If it is 8, pair those 8 up at random and compare them, selecting 4 winners. Pair those up and select 2 winners. In a final round, select the overall tournament winner. This solution can then be mutated one or more times to provide member(s) of the next generation.
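The knockout tournament described above can be sketched as follows. `better(a, b)` is again a hypothetical stand-in for running the pairwise test, and Gaussian mutation with a fixed `sigma` is an assumption suited to real-valued parameters.

```python
import random


def knockout_tournament(population, better, size, rng):
    """Single-elimination tournament among `size` random members."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of 2"
    entrants = rng.sample(population, size)
    while len(entrants) > 1:
        rng.shuffle(entrants)  # random pairing each round
        entrants = [a if better(a, b) else b
                    for a, b in zip(entrants[0::2], entrants[1::2])]
    return entrants[0]


def mutate(x, sigma, rng):
    """Gaussian mutation of a real-valued parameter vector."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def next_generation(population, better, size=8, sigma=0.1, seed=None):
    """Each new member is a mutated copy of one tournament winner."""
    rng = random.Random(seed)
    return [mutate(knockout_tournament(population, better, size, rng), sigma, rng)
            for _ in range(len(population))]
```

A size-8 tournament costs 7 pairwise comparisons, and the comparisons within one round are independent of each other, so they could be run in parallel.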
I have a linear problem and want to find all solutions that satisfy all constraints.
For example, my variables are [0.323, 0.123, 1.32, 6.3...]
Is it possible to get, for example, the top 100 solutions sorted by a fitness (maximization/minimization) function?
In a continuous LP, enumerating different solutions is a difficult concept. For example, consider max x, s.t. x <= 1. Obviously x=1 and x=0.99999 are solutions, and so are the infinitely many values in between. We could instead enumerate "corner solutions" (basic solutions). See here for an example. Such a scheme could be adapted to find the first 100 distinct corner points sorted by the objective. For models with discrete variables, many constraint programming solvers let you find multiple solutions.
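For very small problems, the corner-point idea can be illustrated by brute force: every vertex of {x : A x <= b} in n dimensions is the solution of some n linearly independent constraints taken as equalities, so one can solve every such square system, keep the feasible points, and sort them by the objective. This is a hypothetical sketch (`top_vertices` and `solve_square` are illustrative names, and the objective is maximized; negate `c` to minimize). Its cost grows combinatorially with the number of constraints, so a real model would need a proper vertex-enumeration method or the objective-cut approach instead.

```python
import itertools


def solve_square(A, b, eps=1e-12):
    """Gaussian elimination with partial pivoting; None if (near-)singular."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def top_vertices(c, A, b, k=100, tol=1e-9):
    """Vertices of {x : A x <= b}, sorted by c.x (largest first), at most k."""
    n = len(c)
    found = []
    for rows in itertools.combinations(range(len(A)), n):
        x = solve_square([A[i] for i in rows], [b[i] for i in rows])
        if x is None:
            continue
        feasible = all(sum(a * v for a, v in zip(A[i], x)) <= b[i] + tol
                       for i in range(len(A)))
        duplicate = any(all(abs(p - q) <= tol for p, q in zip(x, y))
                        for y in found)
        if feasible and not duplicate:
            found.append(x)
    found.sort(key=lambda x: -sum(ci * xi for ci, xi in zip(c, x)))
    return found[:k]
```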
If you can define a fitness function as you suggested, you might first solve the LP that maximizes this function. Afterwards, you can add an objective cutoff that forces your second solution to be slightly worse than the first. You can implement this by adding a cut whose left-hand side is your objective function and whose right-hand side is the optimal value minus epsilon.
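Written out, with objective c^T x and constraints A x <= b (notation assumed here, not taken from the question), the first solve and the follow-up solve with the cutoff are:

```latex
\begin{aligned}
z^{*} &= \max_{x}\ \{\, c^{\top}x \;:\; Ax \le b \,\} \\
x^{(2)} &\in \arg\max_{x}\ \{\, c^{\top}x \;:\; Ax \le b,\ \ c^{\top}x \le z^{*} - \varepsilon \,\}
\end{aligned}
```

Repeating the second solve with z^* replaced by the previous optimum walks down the objective in steps of at least epsilon.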
Of course, this will not give you all (basic) solutions, but you might discover which variables are always at the same value or how much variance there is between the different solutions.
While searching for an answer to this question, I found many links that propose a solution but do not explain exactly how it should be done. For example, I explored the following links:
Link 1
Link 2
Link 3
Link 4
etc.
Therefore, here is my understanding of how the Naive Bayes formula can be used with tf-idf:
Naive Bayes formula:
P(word|class) = (word_count_in_class + 1) / (total_words_in_class + total_unique_words_in_all_classes)
where total_unique_words_in_all_classes is basically the vocabulary of words in the entire training set.
tf-idf weighting can be incorporated into the above formula as follows:
word_count_in_class: sum of the tf-idf weights of the word over all documents belonging to that class (i.e. the raw counts are replaced by the tf-idf weights of the same word, calculated for every document in that class).
total_words_in_class: sum of the tf-idf weights of all words belonging to that class.
total_unique_words_in_all_classes: unchanged.
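A minimal stdlib-only sketch of the scheme described above, assuming one common tf-idf variant (term frequency normalized by document length, times log(N / document frequency)); the function names are hypothetical. Scoring is done in log space to avoid underflow, and words in a test document that never appeared in training are simply skipped.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """docs: list of non-empty token lists -> one {word: tf-idf} dict per doc."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return weights


def train_nb(docs, labels):
    vocab = {w for doc in docs for w in doc}  # total_unique_words_in_all_classes
    class_weights = defaultdict(Counter)
    for wts, label in zip(tfidf(docs), labels):
        class_weights[label].update(wts)  # word_count_in_class as tf-idf sums
    class_freq = Counter(labels)
    model = {}
    for label, counts in class_weights.items():
        total = sum(counts.values())  # total_words_in_class
        denom = total + len(vocab)
        log_p = {w: math.log((counts[w] + 1) / denom) for w in vocab}
        model[label] = (math.log(class_freq[label] / len(labels)), log_p)
    return model


def predict_nb(model, doc):
    """Class with the largest log prior + sum of log P(word|class)."""
    def score(label):
        log_prior, log_p = model[label]
        return log_prior + sum(log_p[w] for w in doc if w in log_p)
    return max(model, key=score)
```

Because every class spreads its probability over the same full vocabulary, each class's P(word|class) values sum to 1 whatever weighting is used for the counts.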
This question has been posted multiple times on Stack Overflow, but nothing substantial has been answered so far. I want to know whether the way I am thinking about the problem, i.e. the implementation shown above, is correct. I need to know this because I am implementing Naive Bayes myself, without any Python library that provides built-in functions for Naive Bayes and tf-idf. What I actually want is to improve the accuracy (currently 30%) of my model, which uses a trained Naive Bayes classifier. So if there are better ways to achieve good accuracy, suggestions are welcome.
Please advise; I am new to this domain.
It would be better if you gave us the exact features and classes you would like to use, or at least an example. Since neither has been given concretely, I'll assume the following is your problem:
You have a number of documents, each of which has a number of words.
You would like to classify documents into categories.
Your feature vector consists of all possible words in all documents, with values equal to the word counts in each document.
Your Solution
The tf-idf weighting you gave is the following:
word_count_in_class: sum of (tf-idf_weights of the word for all the documents belonging to that class) // basically replacing the counts with the tf-idf weights of the same word calculated for every document within that class.
total_words_in_class: sum of (tf-idf weights of all the words belonging to that class)
Your approach sounds reasonable. The conditional probabilities would still sum to 1 regardless of the tf-idf function, and the features would reflect the tf-idf values. I would say this is a solid way to incorporate tf-idf into NB.
Another potential Solution
It took me a while to wrap my head around this problem, mainly because I had to worry about keeping the probabilities normalized. Using Gaussian Naive Bayes sidesteps this issue entirely.
If you want to use this method:
Compute the mean and variance of the tf-idf values for each class.
Compute the likelihood of each tf-idf value using a Gaussian distribution with the above mean and variance.
Proceed as normal (multiply by the prior) and predict.
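A minimal sketch of these three steps, assuming each document is already a dense tf-idf feature vector. `var_floor` is an assumed small constant that keeps zero-variance features from causing a division by zero, and multiplying by the prior is done as adding log-probabilities.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Per-class log prior plus per-feature mean and variance of tf-idf values."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the largest log prior + summed Gaussian log-likelihoods."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)
```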
Hard-coding this shouldn't be too hard, since the Gaussian density is a one-line formula with numpy. I just prefer this kind of generic solution for these types of problems.
Additional methods to increase accuracy
Apart from the above, you could also use the following techniques to increase accuracy:
Preprocessing:
Feature reduction (usually NMF, PCA, or LDA)
Additional features
Algorithm:
Naive Bayes is fast, but it often performs worse than other algorithms. It may be better to perform feature reduction and then switch to a discriminative model such as an SVM or logistic regression.
Misc.
Bootstrapping, boosting, etc. Be careful not to overfit though...
Hopefully this was helpful. Leave a comment if anything was unclear.
P(word|class) = (word_count_in_class + 1) / (total_words_in_class + total_unique_words_in_all_classes)
(where total_unique_words_in_all_classes is basically the vocabulary of words in the entire training set)
How would this sum to 1? Using the above conditional probabilities, I assume the sum is
P(word1|class) + P(word2|class) + ... + P(wordn|class) =
(total_words_in_class + total_unique_words_in_class) / (total_words_in_class + total_unique_words_in_all_classes)
To correct this, I think P(word|class) should be
(word_count_in_class + 1) / (total_words_in_class + total_unique_words_in_class (the vocabulary of words in the class))
Please correct me if I am wrong.
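For comparison, here is the same sum taken over every word of the full training vocabulary V rather than only the words seen in the class. Write n_{w,c} for word_count_in_class of word w, N_c for total_words_in_class, and |V| for total_unique_words_in_all_classes; a word never seen in class c has n_{w,c} = 0 but still receives the +1:

```latex
\sum_{w \in V} P(w \mid c)
  = \sum_{w \in V} \frac{n_{w,c} + 1}{N_c + |V|}
  = \frac{N_c + |V|}{N_c + |V|}
  = 1
```

Restricting the sum to the words that occur in the class gives the partial sum computed above.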
I think there are two ways to do it:
Round the tf-idf values down to integers, then use the multinomial distribution for the conditional probabilities. See this paper: https://www.cs.waikato.ac.nz/ml/publications/2004/kibriya_et_al_cr.pdf.
Use the Dirichlet distribution, a continuous distribution closely related to the multinomial (it is the multinomial's conjugate prior), for the conditional probabilities.
I am not sure whether a Gaussian mixture would be better.
I need to perform some inferences on a Bayesian network, such as the example I have created below.
To solve an inference such as P(F | A = True, B = True), my initial approach was something like:
For every possible output of F
For every state of each observed variable (A,B)
For every unobserved variable (C, D, E, G)
// Calculate Probability
But I don't think this will work, because we actually need to go over many variables at once, not one at a time.
I have heard about Pearl's message-passing algorithm but have yet to find a reasonable description that isn't extremely dense. For added information, these Bayesian networks are constrained to have no more than 15-20 nodes, we have all the conditional probability tables, and the code doesn't have to be fast or efficient.
Basically I am looking for a way to do this, not necessarily the BEST way to do this.
Your Bayesian network (BN) does not seem particularly complex. You should easily get away with an exact inference method such as the junction tree algorithm. Of course, you can still do brute-force enumeration, but that would waste CPU resources, given that there are many good libraries that implement smarter exact and approximate inference in graphical models.
Since your tags mention C++, my recommendation would be libDAI. It is a well-written library that implements multiple exact and approximate inference methods on generic factor graphs. It has no unusual dependencies and is very easy to integrate into your project. It is particularly well suited to discrete cases such as yours, where you have the probability tables.
You may have noticed that I mentioned factor graphs. If you are not familiar with the concept, I refer you to the Wikipedia article on factor graphs as well as What are "Factor Graphs" and what are they useful for?. The principle is very simple: you represent your BN as a factor graph, and libDAI does the inference for you.
EDIT:
Since CPU resources do not seem to be a problem for you and simplicity is key, you can always go with brute-force enumeration. The idea is straightforward.
Your Bayesian Network represents a joint probability distribution, which you can write down in terms of an equation, e.g.
P(A,B,C) = P(A|B,C) * P(B|C) * P(C)
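Assuming you have tables for all your conditional probability distributions, i.e. P(A|B,C), P(B|C) and P(C), you can simply go over all possible values of the variables A, B, and C and calculate the output. To answer a query such as P(F | A = True, B = True), sum the joint probability over all values of the unobserved variables for each value of F, then normalize.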
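A minimal brute-force sketch for boolean variables, using the three-variable factorization above. The CPT numbers are made up for illustration, and each table stores P(var = True | parents) keyed by the tuple of parent values. The same `query` function would answer P(F | A = True, B = True) on the larger network once its CPTs are entered in this form.

```python
import itertools

# Hypothetical CPTs for P(A,B,C) = P(A|B,C) * P(B|C) * P(C):
# (variable, parents, {parent values: P(variable = True | parents)})
CPTS = [
    ("C", (), {(): 0.3}),
    ("B", ("C",), {(True,): 0.8, (False,): 0.1}),
    ("A", ("B", "C"), {(True, True): 0.9, (True, False): 0.6,
                       (False, True): 0.5, (False, False): 0.05}),
]
VARIABLES = ["A", "B", "C"]


def joint(assign, cpts):
    """Joint probability of one full assignment: product of CPT entries."""
    p = 1.0
    for var, parents, table in cpts:
        p_true = table[tuple(assign[q] for q in parents)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p


def query(target, evidence, variables, cpts):
    """P(target | evidence) by summing the joint over all hidden variables."""
    hidden = [v for v in variables if v != target and v not in evidence]
    totals = {}
    for t in (True, False):
        totals[t] = 0.0
        for values in itertools.product((True, False), repeat=len(hidden)):
            assign = dict(evidence)
            assign[target] = t
            assign.update(zip(hidden, values))
            totals[t] += joint(assign, cpts)
    z = totals[True] + totals[False]  # = P(evidence)
    return {t: p / z for t, p in totals.items()}


print(query("C", {"A": True}, VARIABLES, CPTS))  # P(C | A = True)
```

With k hidden boolean variables this visits 2^k assignments per value of the target, so 15-20 nodes means on the order of a million joint evaluations: slow in Python, but workable when speed is not a concern.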
I am looking for an iterative linear system solver to calculate a continuously changing field. For the simulation to work properly, I need to re-calculate the field (possibly several times) at every time step. Fortunately, I have a good initial guess for each time step, so I would like to feed it into an iterative solver. The coefficient matrix is very dense.
The problem is that I checked several iterative solvers online, such as Gmm++, IML++, ITL, DUNE/ISTL and so on. They are either for sparse systems or don't provide an interface for supplying an initial guess (I might be wrong, since I didn't have time to go through all the documentation).
So I have two questions:
1. Is there any such C++ solver available online?
2. Since the coefficient matrix can be as large as thousands by thousands, could a direct solver be faster than an iterative solver with a really good initial guess?
Many thanks!
He
If you check the header for Conjugate Gradient in IML++ (http://math.nist.gov/iml++/cg.h.txt), you'll see that you can very easily provide the initial guess for the solution in the very variable where you'd expect to get the solution.
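To illustrate the same warm-start convention outside IML++, here is a small dense conjugate gradient in Python (a sketch, not the IML++ code): `x` holds the initial guess on entry and the solution on exit, and the function returns the number of iterations used. Conjugate gradient assumes the matrix is symmetric positive definite; for a general dense matrix, GMRES or BiCGSTAB (both also in IML++) would be the analogous choice.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x, tol=1e-10, max_iter=None):
    """In place: x is the initial guess on entry and the solution on exit."""
    max_iter = max_iter or 10 * len(b)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual b - A x
    p = list(r)
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:
            return it  # converged after `it` iterations
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        for i in range(len(x)):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return max_iter
```

Starting from the previous time step's field typically cuts the iteration count, which is the point of supplying the guess. Each iteration costs one dense matrix-vector product (O(n^2)), compared with O(n^3) for a dense direct factorization.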
In our program we have used a genetic algorithm for years to solve problems with n variables, each having a fixed set of m possible values. This typically works well for ~1,000 variables and 10 possibilities.
Now I have a new task where only two possibilities (on/off) exist for each variable, but I'll probably need to solve systems with 10,000 or more variables. The existing GA does work, but the solution improves only very slowly.
All the EAs I find seem designed for continuous or integer/float problems. Which one is best suited for binary problems?
Well, the genetic algorithm in its canonical form is among the best-suited metaheuristics for binary decision problems. The default configuration I would try is a genetic algorithm with 1-elitism, roulette-wheel selection, single-point crossover (100% crossover rate) and bit-flip mutation (e.g. 5% mutation probability). I would suggest trying this combination with a modest population size (100-200). If this does not work well, I would suggest increasing the population size, but also switching to tournament selection (start with binary tournament selection and increase the tournament group size if you need even more selection pressure). The reason is that with a larger population, fitness-proportional selection might not exert the necessary amount of selection pressure to drive the search towards the optimal region.
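A minimal sketch of that default configuration (1-elitism, roulette-wheel selection, single-point crossover at 100%, bit-flip mutation). Two assumptions: fitness is non-negative and maximized, and the 5% mutation probability is applied per offspring, flipping one random bit (a 5% per-bit rate would flip about 500 bits of a 10,000-bit string).

```python
import random


def binary_ga(fitness, n_bits, pop_size=100, generations=200,
              crossover_rate=1.0, mutation_prob=0.05, seed=0):
    """Canonical GA on bitstrings; returns (best individual, best-per-generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        history.append(max(scores))
        total = sum(scores)

        def roulette():
            # Fitness-proportional pick; assumes non-negative fitness.
            if total <= 0:
                return rng.choice(pop)
            return rng.choices(pop, weights=scores, k=1)[0]

        children = [list(elite)]  # 1-elitism: best individual survives unchanged
        while len(children) < pop_size:
            a, b = roulette(), roulette()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # single-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = list(a)
            if rng.random() < mutation_prob:
                child[rng.randrange(n_bits)] ^= 1  # bit-flip mutation
            children.append(child)
        pop = children
    scores = [fitness(ind) for ind in pop]
    best = pop[max(range(pop_size), key=scores.__getitem__)]
    history.append(max(scores))
    return best, history
```

Passing `sum` as the fitness (count the ones, the classic OneMax toy problem) makes a quick smoke test. Replacing `roulette` with a tournament pick is the one-function change needed for the larger-population variant.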
As an alternative, we have developed an advanced version of the GA, which we termed the Offspring Selection Genetic Algorithm. You could also try solving this problem with a trajectory-based algorithm such as Tabu Search or Simulated Annealing, which uses only mutation to move from one solution to another by making small changes.
We have GUI-driven software (HeuristicLab) that lets you experiment with a number of metaheuristics on several problems. Unfortunately, your problem is not included, but HeuristicLab is GPL-licensed and you can implement your own problem in it (even through the GUI alone; there's a how-to for that).
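A minimal simulated-annealing sketch for bitstrings of the kind mentioned above: the only move is flipping one bit, worse moves are accepted with probability exp(delta / T), and T falls geometrically. The start and end temperatures and the step count are assumed values that would need tuning; for a real 10,000-variable problem, re-evaluating the full fitness after every flip may be too slow, so an incremental (delta) evaluation would be worth writing.

```python
import math
import random


def simulated_annealing_bits(fitness, n_bits, steps=20000, t0=2.0,
                             t_end=0.01, seed=0):
    """Maximize `fitness` over bitstrings using single-bit-flip moves."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    fx = fitness(x)
    best, best_f = list(x), fx
    for step in range(steps):
        t = t0 * (t_end / t0) ** (step / steps)  # geometric cooling
        i = rng.randrange(n_bits)
        x[i] ^= 1  # move: flip one bit
        fy = fitness(x)
        if fy >= fx or rng.random() < math.exp((fy - fx) / t):
            fx = fy  # accept the move
            if fx > best_f:
                best, best_f = list(x), fx
        else:
            x[i] ^= 1  # reject: undo the flip
    return best, best_f
```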
Like DonAndre said, canonical GA was pretty much designed for binary problems.
However...
No evolutionary algorithm is in itself a magic bullet (unless it has billions of years of runtime). What matters most is your representation, and how that interacts with your mutation and crossover operators: together, these define the 'intelligence' of what is essentially a heuristic search in disguise. The aim is for each operator to have a fair chance of producing offspring with similar fitness to the parents, so if you have domain-specific knowledge that allows you to do better than randomly flipping bits or splicing bitstrings, then use it.
Roulette and tournament selection and elitism are good ideas (maybe preserving more than 1; it's a black art, who can say...). You may also benefit from adaptive mutation. The old rule of thumb (Rechenberg's 1/5 success rule) is that 1/5 of offspring should be better than their parents: keep track of this quantity and vary the mutation rate accordingly. If offspring are coming out worse, mutate less; if offspring are consistently better, mutate more. But the mutation rate needs an inertia component so it doesn't adapt too rapidly, and as with any metaparameter, setting this is something of a black art. Good luck!
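One way to sketch adaptive mutation with inertia, using a (1+1) evolutionary algorithm on bitstrings for simplicity: the success ratio is tracked as an exponential moving average (the inertia), and the per-bit mutation rate is multiplied or divided by `factor` depending on whether that ratio is above or below 1/5. The values of `inertia`, `factor`, and the rate bounds are assumptions, exactly the kind of black-art metaparameters the rule of thumb leaves open.

```python
import random


def one_fifth_rule_ea(fitness, n_bits, generations=5000, inertia=0.9,
                      factor=1.2, seed=0):
    """(1+1)-EA on bitstrings with a per-bit mutation rate set by the 1/5 rule."""
    rng = random.Random(seed)
    min_rate, max_rate = 1.0 / n_bits, 0.5
    rate = min_rate
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    f_parent = fitness(parent)
    success = 0.2  # smoothed success ratio, starts at the target
    for _ in range(generations):
        child = [bit ^ (rng.random() < rate) for bit in parent]
        f_child = fitness(child)
        improved = f_child > f_parent
        if f_child >= f_parent:  # keep the child if it is not worse
            parent, f_parent = child, f_child
        # Inertia: exponential moving average of the success indicator.
        success = inertia * success + (1.0 - inertia) * improved
        if success > 0.2:    # many offspring better -> mutate more
            rate = min(max_rate, rate * factor)
        elif success < 0.2:  # few offspring better -> mutate less
            rate = max(min_rate, rate / factor)
    return parent, f_parent, rate
```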
Why not try a linear/integer program?