I wonder what the advantages and disadvantages of these two algorithms are. I want to write an AddEmUp solver in C++, but I'm not sure which algorithm (IDA* or DFID) I should use.
The best article I found is this one, but it seems too old - '93. Any newer?
I think IDA* would be better, but.. ? Any other ideas?
Any ideas and info would be helpful.
Thanks ! (:
EDIT: Is there a good article about IDA* with a good explanation of the algorithm?
EDIT 2: Or a good heuristic function for that game? I have no idea how to come up with one :/
The Russell & Norvig book is an excellent reference on these algorithms, and I'll give larsmans a virtual high-five for suggesting it; however, I disagree that IDA* is in any appreciable way harder to program than A*. I've done it for a project where I had to write an AI to solve a sliding-block puzzle - the familiar problem of an N x N grid of numbered tiles, where you use the single free space to slide tiles around until they are in ascending order.
Recall:
F(n) = g(n) + h(n).
TotalCost = PathCost + Heuristic.
g(n) = Path cost, the distance from the initial to the current state
h(n) = Heuristic, the estimation of cost from current state to end state. To be an admissible heuristic (and thus ensure A*'s optimality), you cannot in any case overestimate the cost. See this question for more info on the effects of overestimating/underestimating heuristics on A*.
Remember that Iterative Deepening A* is just A* with a limit on the F value of nodes you are allowed to traverse. This FLimit increases with each outer iteration; with each iteration you are deepening the search.
Here's my C++ code implementing both A* and IDA* to solve the aforementioned sliding block puzzle. You can see that I use a std::priority_queue with a custom Comparator to store Puzzle states in the queue prioritized by their F value. You will also note that the only difference between A* and IDA* is the addition of an FLimit check and an outer loop that increments this FLimit. I hope this helps shed some light on this subject.
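To make the structure concrete, here is a minimal, self-contained IDA* sketch for the 3x3 version of that sliding-block puzzle. This is not the answerer's code: the State type, the manhattan() heuristic and the move generation are illustrative assumptions, and it uses the classic depth-first formulation rather than a priority queue, but it shows the same two ingredients described above - a search bounded by an f-limit, and an outer loop that raises the limit each iteration.

#include <algorithm>
#include <array>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <utility>

using State = std::array<int, 9>;                 // 0 marks the blank tile
const State kGoal = {1, 2, 3, 4, 5, 6, 7, 8, 0};

// Admissible heuristic: sum of Manhattan distances of tiles to their goal cells.
int manhattan(const State& s) {
    int h = 0;
    for (int i = 0; i < 9; ++i) {
        if (s[i] == 0) continue;
        int goal = s[i] - 1;
        h += std::abs(i / 3 - goal / 3) + std::abs(i % 3 - goal % 3);
    }
    return h;
}

// Depth-first search bounded by fLimit. Returns -1 if the goal was found,
// otherwise the smallest f value that exceeded the limit (the next limit).
int search(State& s, int g, int fLimit, int prevBlank) {
    int f = g + manhattan(s);
    if (f > fLimit) return f;
    if (s == kGoal) return -1;

    int nextLimit = std::numeric_limits<int>::max();
    int blank = 0;
    while (s[blank] != 0) ++blank;
    const int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    for (int d = 0; d < 4; ++d) {
        int r = blank / 3 + dr[d], c = blank % 3 + dc[d];
        if (r < 0 || r > 2 || c < 0 || c > 2) continue;
        int swapPos = r * 3 + c;
        if (swapPos == prevBlank) continue;       // don't undo the previous move
        std::swap(s[blank], s[swapPos]);
        int t = search(s, g + 1, fLimit, blank);
        std::swap(s[blank], s[swapPos]);          // undo the move
        if (t == -1) return -1;
        nextLimit = std::min(nextLimit, t);
    }
    return nextLimit;
}

// Outer loop: deepen the f-limit until the goal is reached. With an admissible
// heuristic the goal is first reached in the iteration where fLimit equals the
// optimal solution cost, so fLimit is the answer.
int idaStar(State start) {
    int fLimit = manhattan(start);
    while (true) {
        int t = search(start, 0, fLimit, -1);
        if (t == -1) return fLimit;                               // solved
        if (t == std::numeric_limits<int>::max()) return -1;      // nothing left to explore
        fLimit = t;
    }
}

int main() {
    State start = {1, 2, 3, 4, 0, 6, 7, 5, 8};    // two moves from the goal
    std::cout << "Solution length: " << idaStar(start) << "\n";
}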
Check out Russell & Norvig, chapters 3 and 4, and realize that IDA* is hard to program correctly. You might want to try recursive best first search (RBFS), also described by R&N, or plain old A*. The latter can be implemented using an std::priority_queue.
IIRC, R&N described IDA* in the first edition, then replaced it with RBFS in the second. I haven't seen the third edition yet.
As regards your second edit, I haven't looked into the game, but a good procedure for deriving heuristics is that of relaxed problems. You take away the rules of the game until you derive a version for which the heuristic is easily expressed and implemented (and cheap to compute). Or, following a bottom-up approach, you check the main rules to see which one admits an easy heuristic, then try that and add in other rules as you need them.
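As a concrete illustration of the relaxed-problem procedure (my example, not part of the original answer): for the sliding puzzle discussed earlier, keeping the rule "one move shifts one tile by one cell" but dropping "only into the blank" yields the Manhattan-distance heuristic; dropping both rules ("a tile may jump straight to its goal cell") yields the weaker misplaced-tiles count. Both relaxations produce admissible heuristics for the real game.

// Heuristic from the strongest relaxation of the sliding puzzle: count the tiles
// not on their goal cell. Illustrative sketch only; State is assumed to be a flat
// 3x3 board with 0 as the blank.
#include <array>

using State = std::array<int, 9>;

int misplacedTiles(const State& s) {
    int h = 0;
    for (int i = 0; i < 9; ++i)
        if (s[i] != 0 && s[i] != i + 1) ++h;
    return h;
}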
DFID is just a special case of IDA* where the heuristic function is the constant 0; in other words, it has no provision for introducing heuristics. If the problem is not small enough that it can be solved without using heuristics, it seems you have no choice but to use IDA* (or some other member of the A* family). That said, IDA* is really not that hard: the implementation provided by the authors of AIMA is only about half a page of Lisp code; I imagine a C++ implementation shouldn't take more than twice that.
I've tried to figure out what the A* algorithm looks like when the heuristic function does not satisfy the monotonicity (consistency) condition
h(u) <= e(u,v) + h(v), for every u, v such that there is an edge between u and v,
where h is the heuristic function, u and v are vertices in the search graph, and the function e gives the edge cost between u and v (the search graph is undirected). However, Wikipedia (here) does not give the algorithm for this case, nor do other sources like Norvig's book on Artificial Intelligence.
Is there a good source to study this. Pseudo code would be great!
Also, I do not wish to solve this by converting the non-monotonic heuristic function to a monotonic one.
Assuming the heuristic function is still admissible, the A* algorithm will work fine.
However, for non-monotonic heuristic functions, you might need to update an already 'closed' node, and you should allow this behavior.
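A minimal sketch of that modification (my illustration, not the answerer's code): the only change relative to textbook graph-search A* is that a node whose g value improves is taken out of the closed set again. The tiny graph and heuristic values in main() are made up so that the re-opening actually happens.

#include <functional>
#include <iostream>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; double w; };

// A* that re-opens closed nodes, for admissible but possibly inconsistent h.
double aStarReopening(const std::vector<std::vector<Edge>>& adj,
                      const std::vector<double>& h, int start, int goal) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> g(adj.size(), INF);
    std::vector<char> closed(adj.size(), 0);
    using Item = std::pair<double, int>;                  // (f, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;

    g[start] = 0.0;
    open.push({h[start], start});
    while (!open.empty()) {
        int u = open.top().second;
        open.pop();
        if (closed[u]) continue;            // already expanded with its best g
        if (u == goal) return g[u];
        closed[u] = 1;
        for (const Edge& e : adj[u]) {
            double ng = g[u] + e.w;
            if (ng < g[e.to]) {
                g[e.to] = ng;
                closed[e.to] = 0;           // the key change: re-open the node
                open.push({ng + h[e.to], e.to});
            }
            // A version that instead skips successors already in the closed set
            // can return a suboptimal cost when h is inconsistent.
        }
    }
    return INF;
}

int main() {
    // 0 -> 1 costs 3, 0 -> 2 costs 1, 2 -> 1 costs 1, 1 -> 3 costs 3.
    std::vector<std::vector<Edge>> adj(4);
    adj[0] = {{1, 3.0}, {2, 1.0}};
    adj[2] = {{1, 1.0}};
    adj[1] = {{3, 3.0}};
    // Admissible but inconsistent: h[2] = 4 > cost(2,1) + h[1] = 1.
    std::vector<double> h = {0.0, 0.0, 4.0, 0.0};
    std::cout << aStarReopening(adj, h, 0, 3) << "\n";    // prints 5 (the optimum)
}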
For tree search, A* is optimal as long as the heuristic is admissible; it does not have to be consistent. For graph search (where already-expanded states are not revisited), A* is optimal only if the heuristic is both admissible and consistent.
In this picture you can see an example of an inconsistent heuristic: the A* algorithm doesn't find the right path.
Maybe you can modify the standard A* algorithm as @amit said, but in that case you need to re-examine already closed states, so the search is no longer optimally efficient: it may still find the optimal path, but it will expand more nodes than it would with a consistent heuristic.
I have a vector of pointers to a very simple Point class:
class Point {
public:
    float x;
    float y;
    float z;
};
How do I find the closest object to a reference point using the STL?
Do I need to sort the vector first, or is there a more efficient way?
Sorting takes O(N log N), so it's not very efficient. You can do it in O(N) by just iterating through the elements and remembering the best match so far.
Using for_each from <algorithm>, you can define a function object that keeps track of the closest element and completes in O(N).
Or, you can probably even use min_element, also from <algorithm>.
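For example, a min_element version might look like this (a sketch assuming the vector holds Point* as in the question and that "closest" means smallest Euclidean distance; the squared distance is compared to avoid the sqrt):

#include <algorithm>
#include <vector>

float squaredDistance(const Point& a, const Point& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Single O(n) pass over the vector; returns nullptr for an empty input.
Point* closestPoint(const std::vector<Point*>& points, const Point& ref) {
    if (points.empty()) return nullptr;
    auto it = std::min_element(points.begin(), points.end(),
        [&ref](const Point* a, const Point* b) {
            return squaredDistance(*a, ref) < squaredDistance(*b, ref);
        });
    return *it;
}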
The basic question here is how often you'll be doing queries against a single set of points.
If you're going to find one nearest point in the set a single time, then @Luchian is right: you might as well leave the points unsorted and do a linear search to find the right point.
If you'll do a relatively large number of queries against the same set of points, it's worthwhile to organize the point data to improve query speed. @izomorphius has already mentioned a k-d tree, and that's definitely a good suggestion. Another possibility (admittedly, quite similar) is an oct-tree. Between the two, I find an oct-tree quite a bit easier to understand. In theory, a k-d tree should be slightly more efficient (on average), but I've never seen much difference -- though perhaps with different data the difference would become significant.
Note, however, that building something like a k-d tree or oct-tree isn't terribly slow, so you don't need to do an awful lot of queries against a set of points to justify building one. One query clearly doesn't justify it, and two probably won't either -- but contrary to what Luchian implies, O(N log N) (just for example) isn't really very slow. Roughly speaking, log(N) is the number of digits in the number N, so O(N log N) isn't really a whole lot slower than O(N). That, in turn, means you don't need a particularly large number of queries to justify organizing the data to speed up each one.
You cannot go faster than a linear scan if the only thing you know is that the points are in a vector. However, if you have additional knowledge, a lot can be improved. For instance, if you know that all the points are ordered and lie on the same line, there is a logarithmic solution.
There are also better data structures to solve your problem, for instance a k-d tree. It is not part of the STL - you will have to implement it yourself - but it is THE data structure to use for the problem you have.
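To give an idea of the amount of work involved, here is a compact, purely illustrative 3-d k-d tree sketch (not a library API; it assumes a non-empty point set). Building is O(n log n) and a nearest-neighbour query is roughly O(log n) on average, which is what makes it pay off when many queries hit the same set of points.

#include <algorithm>
#include <limits>
#include <vector>

struct Pt { float c[3]; };               // c[0]=x, c[1]=y, c[2]=z

struct KdTree {
    std::vector<Pt> pts;                 // points rearranged into k-d tree order

    explicit KdTree(std::vector<Pt> input) : pts(std::move(input)) {
        build(0, pts.size(), 0);
    }

    // Recursively put the median (on the current axis) in the middle of [lo, hi).
    void build(std::size_t lo, std::size_t hi, int axis) {
        if (hi - lo <= 1) return;
        std::size_t mid = (lo + hi) / 2;
        std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
            [axis](const Pt& a, const Pt& b) { return a.c[axis] < b.c[axis]; });
        build(lo, mid, (axis + 1) % 3);
        build(mid + 1, hi, (axis + 1) % 3);
    }

    static float sqDist(const Pt& a, const Pt& b) {
        float s = 0;
        for (int i = 0; i < 3; ++i) { float d = a.c[i] - b.c[i]; s += d * d; }
        return s;
    }

    // Visit the half containing the query first; only visit the other half if the
    // splitting plane is closer than the best match found so far.
    void nearest(std::size_t lo, std::size_t hi, int axis, const Pt& q,
                 std::size_t& best, float& bestD) const {
        if (lo >= hi) return;
        std::size_t mid = (lo + hi) / 2;
        float d = sqDist(pts[mid], q);
        if (d < bestD) { bestD = d; best = mid; }
        float diff = q.c[axis] - pts[mid].c[axis];
        int next = (axis + 1) % 3;
        if (diff < 0) {
            nearest(lo, mid, next, q, best, bestD);
            if (diff * diff < bestD) nearest(mid + 1, hi, next, q, best, bestD);
        } else {
            nearest(mid + 1, hi, next, q, best, bestD);
            if (diff * diff < bestD) nearest(lo, mid, next, q, best, bestD);
        }
    }

    const Pt& nearest(const Pt& q) const {
        std::size_t best = 0;
        float bestD = std::numeric_limits<float>::max();
        nearest(0, pts.size(), 0, q, best, bestD);
        return pts[best];
    }
};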
You can try to use a quadtree
http://en.wikipedia.org/wiki/Quadtree
or something similar.
I am starting to work with the Boost Graph Library. I need a best-first search, which I could implement using astar_search with zero costs. (Please correct me if I'm wrong.)
However, I wonder if there is another possibility of doing this? If the costs are not considered, the algorithm should be slightly more efficient.
EDIT: Sorry for the unclear description. I am actually implementing a potential field search, so I don't have any costs/weights associated with the edges but rather need to do a steepest-descent-search (which can overcome local minima).
Thanks for any hints!
You could definitely use A* to tackle this; you'd need h(x) to be 0 though, not g(x). A* rates nodes based on F which is defined by
F(n) = g(n) + h(n).
TotalCost = PathCost + Heuristic.
g(n) = Path cost, the distance from the initial to the current state
h(n) = Heuristic, the estimation of cost from current state to end state.
From Wikipedia:
Dijkstra's algorithm, as another example of a best-first search algorithm, can be viewed as a special case of A* where h(x) = 0 for all x.
If you are comfortable with C++, I would suggest trying out YAGSBPL.
As suggested by Aphex's answer, you might want to use Dijkstra's algorithm; one way to set the edge weights is to set w(u, v) to potential(v) - potential(u), assuming that is nonnegative. Dijkstra's algorithm assumes that edge weights are nonnegative, so that distances only increase as you move away from the source node. If you are searching for the smallest potential, flip the sides of the subtraction; if you have potentials that go both up and down you might need to use something like Bellman-Ford, which is not best-first.
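To make that concrete, here is a small sketch (my illustration; the adjacency-list representation and the potential array are assumptions) of turning node potentials into edge weights, which could then be fed into any Dijkstra implementation such as BGL's dijkstra_shortest_paths:

#include <cassert>
#include <vector>

struct WeightedEdge { int to; double w; };

// Build weighted edges w(u, v) = potential(v) - potential(u), as suggested above.
std::vector<std::vector<WeightedEdge>> weightsFromPotentials(
    const std::vector<std::vector<int>>& adj,     // unweighted adjacency list
    const std::vector<double>& potential) {       // one potential per node
    std::vector<std::vector<WeightedEdge>> out(adj.size());
    for (std::size_t u = 0; u < adj.size(); ++u) {
        for (int v : adj[u]) {
            double w = potential[v] - potential[u];   // flip the subtraction if
                                                      // descending toward the
                                                      // smallest potential
            assert(w >= 0 && "Dijkstra needs non-negative edge weights");
            out[u].push_back({v, w});
        }
    }
    return out;
}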
As we know, trees are recursive data structures, so we use recursion when writing tree procedures, such as the delete method of a BST, etc.
The advantage of recursion is that the procedures become very small (for example, the code for an inorder traversal is only 4 or 5 lines), whereas a non-recursive procedure would be lengthy, though easier to understand. That is why I dislike recursion and prefer to write non-recursive procedures, which I have done for binary search trees and AVL trees.
Now please elaborate: is preferring non-recursive procedures over recursive ones a good or a bad thing?
Recursion is a tool like any other. You don't have to use every tool that's available but you should at least understand it.
Recursion makes a certain class of problems very easy and elegant to solve and your "hatred" of it is irrational at best. It's just a different way of doing things.
The "canonical" recursive function (factorial) is shown below in both recursive and iterative forms and, in my opinion, the recursive form more clearly reflects the mathematical definition of f(1) = 1, f(n) = n*f(n-1) for n>1.
Iterative:

    def fact(n):
        r = 1
        while n > 1:
            r = r * n
            n = n - 1
        return r

Recursive:

    def fact(n):
        if n == 1:
            return 1
        return n * fact(n-1)
Pretty much the only place I would prefer an iterative solution to a recursive one (for solutions that are really well suited for recursion) is when the growth in stack size may lead to problems (the above factorial function may well be one of those since stack growth depends on n but it may also be optimised to an iterative solution by the compiler). But this stack overflow rarely happens since:
Most stacks can be configured where necessary.
Recursion (especially tail-end recursion where the recursive call is the last thing that happens in the function) can usually be optimised to an iterative solution by an intelligent compiler.
Most algorithms I use in recursive situations (such as balanced trees and so on, as you mention) tend to be O(log N), and stack use doesn't grow that fast with increased data. For example, you can process a 16-way tree storing two billion entries with only eight levels of stack (16^8 ≈ 4.3 billion).
You should read about tail recursion. In general, if a compiler manages to apply tail-call optimization to a procedure, it is quite efficient; if not, then not so much.
Also, an important issue is the maximum recursion depth of your environment -- usually it's limited by the stack size. The downside here is that there's no graceful way to handle a stack overflow.
Recursion is elegant, but prone to stack overflow. Use tail-end recursion whenever possible to give the compiler the chance to convert it to an iterative solution.
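For illustration (my example, in C++), the accumulator trick that turns the naive factorial into a tail call looks like this. Plain n * fact(n - 1) is not a tail call because the multiplication still happens after the recursive call returns; here the recursive call is the last thing the function does, so an optimising compiler can turn it into a loop:

// Tail-recursive factorial: the accumulator carries the partial product, so the
// recursive call is in tail position and can be optimised to a loop.
unsigned long long factHelper(unsigned int n, unsigned long long acc) {
    if (n <= 1) return acc;
    return factHelper(n - 1, acc * n);    // tail call: nothing left to do after it
}

unsigned long long fact(unsigned int n) {
    return factHelper(n, 1);
}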
It's definitely your decision which tool you want to use - but keep in mind that most algorithms dealing with tree-like data structures are usually implemented recursively. As it's common practice, your code is easier to read and less surprising for others.
Recursion is a tool. Sometimes using the "tool of recursion" makes the code easier to read, although not necessarily easier to comprehend.
In general, recursive solutions tend to be good candidates where a "divide and conquer" approach to solving a specific problem is natural.
Typically, recursion is a good fit where you can look at a problem and say "aha, if I knew the answer for a simpler variant of this problem, I could use that solution to generate the answer I want" and "the simplest possible problem is P and its solution is S". Then, the code to solve the problem as a whole boils down to looking at the input data, simplifying it, recursively generating a (simpler) answer, and then going from the simpler answer to the answer for the whole problem.
If we consider the problem of counting the levels of a tree, the answer is that the height of the tree is 1 more than the height of the "tallest/deepest" of the children and the height of a leaf is 1. Something like the following code. The problem can be solved iteratively, but you'd, essentially, re-implement the call stack in your own data structures.
def tree_height(tree):
    if tree.is_leaf():
        return 1
    childmax = 0
    for child in tree.children():
        childmax = max(childmax, tree_height(child))
    return childmax + 1
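For comparison, an iterative version (sketched in C++ here, with an assumed Node type) has to carry an explicit stack of (node, depth) pairs, which is exactly the bookkeeping the call stack does for free in the recursive version:

#include <algorithm>
#include <stack>
#include <utility>
#include <vector>

struct Node {
    std::vector<Node*> children;         // a leaf simply has no children
};

int treeHeight(const Node* root) {
    if (root == nullptr) return 0;
    int height = 0;
    std::stack<std::pair<const Node*, int>> todo;   // node and its depth
    todo.push({root, 1});
    while (!todo.empty()) {
        auto [node, depth] = todo.top();
        todo.pop();
        height = std::max(height, depth);
        for (const Node* child : node->children)
            todo.push({child, depth + 1});
    }
    return height;
}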
It's also worth considering that tail-call optimization can make some recursive functions run in constant stack space.
I recently added a third version of Dijkstra's algorithm for single-source shortest paths to my project.
I realize that there are many different implementations, which vary strongly in performance and also vary in the quality of the result on large graphs. With my data set (> 100,000 vertices) the runtime varies from 20 minutes to a few seconds. The shortest paths also vary by 1-2%.
Which is the best implementation you know?
EDIT:
My data is a hydraulic network, with 1 to 5 edges per node. It's comparable to a street map. I made some modifications to an already accelerated algorithm (using a sorted list for all remaining nodes) and now get the same results in a fraction of the time. I had searched for such a thing for quite a while. I wonder if such an implementation already exists.
I cannot explain the slight differences in results. I know that Dijkstra is not a heuristic, and all the implementations seem to be correct. The faster solutions produce the shorter paths. I use double-precision math exclusively.
EDIT 2:
I found out that the differences in the resulting paths are indeed my fault. I had added special handling for some vertices (valid in one direction only) and forgot about that in the other implementation.
But I'm still more than surprised that Dijkstra can be accelerated dramatically by the following change:
In general a Dijkstra algorithm contains a loop like:
MyListType toDoList; // List sorted by smallest distance
InsertAllNodes(toDoList);
while(! toDoList.empty())
{
MyNodeType *node = *toDoList.first();
toDoList.erase(toDoList.first());
...
}
If you change this a little bit, it works the same, but performs better:
MyListType toDoList; // List sorted by smallest distance
toDoList.insert(startNode);
while(! toDoList.empty())
{
MyNodeType *node = *toDoList.first();
toDoList.erase(toDoList.first());
for(MyNeighborType *x = node->Neighbors; x != NULL; x++)
{
...
toDoList.insert(x->Node);
}
}
It seems that this modification reduces the runtime not merely by a constant factor but by an order of magnitude or more. It reduced my runtime from 30 seconds to less than 2. I cannot find this modification in any literature. It's also very clear that the reason lies in the sorted list: insert/erase performs much worse with 100,000 elements than with a handful.
ANSWER:
After a lot of googling, I found it myself. The answer is clearly:
the Boost Graph Library. Amazing - I had not found this for quite a while. If you think that there is no performance variation between Dijkstra implementations, see Wikipedia.
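For reference, the typical call to BGL's Dijkstra looks roughly like the sketch below. It is modelled on the library's documented example and not verified against any particular Boost version; the tiny graph is made up, and a real hydraulic network would of course be read from your data:

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <iostream>
#include <vector>

int main() {
    using namespace boost;
    typedef adjacency_list<vecS, vecS, directedS, no_property,
                           property<edge_weight_t, double> > Graph;
    typedef graph_traits<Graph>::vertex_descriptor Vertex;

    Graph g(5);                               // hypothetical 5-node network
    add_edge(0, 1, 2.0, g);
    add_edge(1, 2, 1.5, g);
    add_edge(0, 3, 4.0, g);
    add_edge(3, 4, 1.0, g);
    add_edge(2, 4, 3.0, g);

    std::vector<double> dist(num_vertices(g));
    std::vector<Vertex> pred(num_vertices(g));
    Vertex source = vertex(0, g);

    // BGL's heap-based Dijkstra; fills the distance and predecessor maps.
    dijkstra_shortest_paths(g, source,
        predecessor_map(&pred[0]).distance_map(&dist[0]));

    for (std::size_t i = 0; i < num_vertices(g); ++i)
        std::cout << "distance to node " << i << " = " << dist[i] << "\n";
}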
The best implementations known for road networks (> 1 million nodes) have query times expressed in microseconds. See the 9th DIMACS Implementation Challenge (2006) for more details. Note that these are not simply Dijkstra, of course, as the whole point was to get results faster.
Maybe I am not answering your question. My point is: why use Dijkstra when there are considerably more efficient algorithms for your problem? If your graph fulfills the triangle inequality (it is a Euclidean graph)
|ab| + |bc| >= |ac|
(the distance from node a to node b plus the distance from node b to node c is at least the distance from node a to node c), then you can apply the A* algorithm.
This algorithm is pretty efficient. Otherwise consider using heuristics.
The implementation is not the major issue. The algorithm to be used does matter.
Two points I'd like to make:
1) Dijkstra vs A*
Dijkstra's algorithm is a dynamic programming algorithm, not a heuristic one. A* is a heuristic search because it also uses a heuristic function (let's say h(x)) to "estimate" how close a point x is getting to the end point. This information is exploited in subsequent decisions about which nodes to explore next.
For cases such as a Euclidean graph, A* works well because the heuristic function is easy to define (one can simply use the Euclidean distance, for example). However, for non-Euclidean graphs it may be harder to define the heuristic function, and a wrong definition can lead to a non-optimal path.
Therefore, Dijkstra's advantage over A* is that it works for any general graph (with A* being faster in some cases). It could well be that certain implementations use these algorithms interchangeably, resulting in different results.
2) Dijkstra's algorithm (and others such as A*) uses a priority queue to obtain the next node to explore. A good implementation may use a heap instead of a plain queue, and an even better one may use a Fibonacci heap. This could explain the different run times; a sketch of the heap-based version follows.
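As an illustration of that point, a binary-heap ("lazy deletion") Dijkstra in plain C++ might look like the sketch below. Outdated queue entries are simply skipped when popped, and, as in the questioner's improved loop above, only reached nodes are ever inserted. A Fibonacci heap improves the theoretical bound to O(E + V log V) but rarely wins in practice on sparse graphs.

#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int to; double w; };

// O((V + E) log V) Dijkstra with std::priority_queue (a binary heap).
std::vector<double> dijkstra(const std::vector<std::vector<Arc>>& adj, int src) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(adj.size(), INF);
    using Item = std::pair<double, int>;                  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[src] = 0.0;
    pq.push({0.0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;                        // stale entry, skip it
        for (const Arc& a : adj[u]) {
            if (dist[u] + a.w < dist[a.to]) {
                dist[a.to] = dist[u] + a.w;
                pq.push({dist[a.to], a.to});              // insert only reached nodes
            }
        }
    }
    return dist;
}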
The last time I checked, Dijkstra's Algorithm returns an optimal solution.
All "true" implementations of Dijkstra's should return the same result each time.
Similarly, asymptotic analysis shows us that minor optimisations to particular implementations are not going to affect performance significantly as the input size increases.
It's going to depend on a lot of things. How much do you know about your input data? Is it dense, or sparse? That will change which versions of the algorithm are the fastest.
If it's dense, just use a matrix. If it's sparse, you might want to look at more efficient data structures for finding the next closest vertex. If you have more information about your data set than just the graph connectivity, then see whether a different algorithm, like A*, would work better.
Problem is, there isn't "one fastest" version of the algorithm.