As we know, trees are recursive data structures, so we use recursion when writing tree procedures, such as the delete method of a BST.
The advantage of recursion is that our procedures become very small (for example, the code for an inorder traversal is only 4 or 5 lines), whereas a non-recursive procedure would be lengthy, though not as hard to understand as the recursive one. That is why I hate recursion and prefer to write non-recursive procedures, which I have done for binary search trees and AVL trees.
Now please elaborate: is preferring non-recursive procedures over recursive ones a good or a bad thing?
Recursion is a tool like any other. You don't have to use every tool that's available but you should at least understand it.
Recursion makes a certain class of problems very easy and elegant to solve and your "hatred" of it is irrational at best. It's just a different way of doing things.
The "canonical" recursive function (factorial) is shown below in both recursive and iterative forms and, in my opinion, the recursive form more clearly reflects the mathematical definition of f(1) = 1, f(n) = n*f(n-1) for n>1.
Iterative:

def fact(n):
    r = n
    while n > 1:
        n = n - 1
        r = r * n
    return r

Recursive:

def fact(n):
    if n == 1:
        return 1
    return n * fact(n-1)
Pretty much the only place I would prefer an iterative solution to a recursive one (for problems that are really well suited to recursion) is when the growth in stack size may lead to problems (the above factorial function may well be one of those, since stack growth depends on n, but it may also be optimised to an iterative solution by the compiler). But such stack overflows rarely happen, since:
Most stacks can be configured where necessary.
Recursion (especially tail-end recursion where the recursive call is the last thing that happens in the function) can usually be optimised to an iterative solution by an intelligent compiler.
Most algorithms I use in recursive situations (such as balanced trees and so on, as you mention) tend to be O(log N), and stack use doesn't grow that fast with increased data. For example, you can process a 16-way tree storing two billion entries with only eight levels of stack (16^8 ≈ 4.3 billion).
You should read about Tail Recursion. In general, if a compiler manages to optimise the tail recursion in a procedure, it is quite effective; if not, then not so.
Also an important issue is the maximum recursion depth of your compiler -- usually it's limited by the stack size. The downside here is that there's no graceful way to handle a stack overflow.
Recursion is elegant, but prone to stack overflow. Use tail-end recursion whenever possible to give the compiler the chance to convert it to an iterative solution.
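For instance, a tail-recursive version of the factorial shown earlier threads an accumulator through the calls so that the recursive call is the very last operation. This is only an illustrative C++ sketch (the default-argument accumulator is just one way to write it):

    // Tail-recursive factorial: the recursive call is the last thing the
    // function does, so an optimising compiler can turn it into a loop.
    // (Sketch only; unsigned long long overflows for n > 20.)
    unsigned long long fact(unsigned long long n, unsigned long long acc = 1) {
        if (n <= 1) return acc;
        return fact(n - 1, acc * n);   // tail call: nothing left to do afterwards
    }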
It's definitely your decision which tool you want to use - but keep in mind that most algorithms dealing with tree-like data structures are usually implemented recursively. As it's common practice, your code will be easier to read and less surprising for others.
Recursion is a tool. Sometimes using the "tool of recursion" makes the code easier to read, although not necessarily easier to comprehend.
In general, recursive solutions tend to be good candidates where a "divide and conquer" approach to solving a specific problem is natural.
Typically, recursion is a good fit where you can look at a problem and say "aha, if I knew the answer for a simpler variant of this problem, I could use that solution to generate the answer I want" and "the simplest possible problem is P and its solution is S". Then, the code to solve the problem as a whole boils down to looking at the input data, simplifying it, recursively generating a (simpler) answer, and then going from the simpler answer to the answer as a whole.
If we consider the problem of counting the levels of a tree, the answer is that the height of the tree is 1 more than the height of the "tallest/deepest" of the children and the height of a leaf is 1. Something like the following code. The problem can be solved iteratively, but you'd, essentially, re-implement the call stack in your own data structures.
def tree_height(tree):
    if tree.is_leaf():
        return 1
    childmax = 0
    for child in tree.children():
        childmax = max(childmax, tree_height(child))
    return childmax + 1
It's also worth considering that Tail Call Optimization can make some recursive functions run in constant stack space.
Okay so I need some help with a homework problem I have in my data structures class. We are to code a recursive function for the following.
g(2x) = 2g(x) / (1 + g(x)^2), for -1 <= x <= 1. For |x| < E (epsilon), with E = 10^-6, use g(x) = x + x^3/6. Code it for dx = 10^-1.
I honestly don't know how to code this recursively. After we have running code, we are to write out the time complexity in big-O notation, but I'm still stuck on step one. Any help with explanations would be greatly appreciated, as I know a problem along these lines will be on the final.
You can use a while loop over the value of epsilon (or implement it through a recursive call). Once you get your approximation close enough (10^-6) to what you need to approximate, then you stop.
And the complexity of such a program is somewhat linked to its cyclomatic complexity (that is to say, the minimal number of nested loops or recursive calls you need to express your algorithm). In your case, the complexity is O(n) (if you need only one loop) or O(n^2) (if you need two nested loops).
Note that the dx is the step of your approximation function.
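As a concrete illustration of the recursive route, here is a minimal sketch that reads the recurrence as g(x) = 2*g(x/2) / (1 + g(x/2)^2) and uses the series g(x) ~ x + x^3/6 once |x| < epsilon; the function names and the output loop are my own assumptions, not part of the assignment:

    #include <cmath>
    #include <iostream>

    double g(double x, double epsilon = 1e-6) {
        if (std::fabs(x) < epsilon) {
            return x + (x * x * x) / 6.0;      // base case: series approximation
        }
        double half = g(x / 2.0, epsilon);     // one recursive call per halving of x
        return (2.0 * half) / (1.0 + half * half);
    }

    int main() {
        // Evaluate over -1 <= x <= 1 in steps of dx = 0.1.
        for (int i = -10; i <= 10; ++i) {
            double x = i / 10.0;
            std::cout << "g(" << x << ") = " << g(x) << "\n";
        }
    }

Each call halves x, so the recursion depth for a single evaluation is about log2(|x|/epsilon), roughly 20 levels for x = 1 and epsilon = 10^-6.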
In one of the applications I work on, it is necessary to have a function like this:
bool IsInList(int iTest)
{
// Return whether iTest appears in a set of numbers.
}
The number list is known at app load-up (but is not always the same between two instances of the same application) and will not change (or be added to) throughout the life of the program. The integers themselves may be large and have a large range, so it is not efficient to have a vector<bool>. Performance is an issue, as the function sits in a hot spot. I have heard about perfect hashing but could not find any good advice. Any pointers would be helpful. Thanks.
P.S. I'd ideally like the solution not to be a third-party library, because I can't use one here. Something simple enough to be understood and manually implemented would be great, if possible.
I would suggest using Bloom Filters in conjunction with a simple std::map.
Unfortunately the bloom filter is not part of the standard library, so you'll have to implement it yourself. However it turns out to be quite a simple structure!
A Bloom Filter is a data structure specialized in answering one question - is this element part of the set? - and it does so with an incredibly tight memory requirement, and quite fast too.
The slight catch is that the answer is... special: Is this element part of the set ?
No
Maybe (with a given probability depending on the properties of the Bloom Filter)
This looks strange until you look at the implementation, and it may require some tuning (there are several properties) to lower the probability but...
What is really interesting for you, is that for all the cases it answers No, you have the guarantee that it isn't part of the set.
As such a Bloom Filter is ideal as a doorman for a Binary Tree or a Hash Map. Carefully tuned it will only let very few false positive pass. For example, gcc uses one.
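A bare-bones sketch of the structure, just to show how little is involved (the bit-table size, the use of two hash functions, and the mixing constants below are arbitrary illustrative choices, not tuned values):

    #include <bitset>
    #include <cstdint>

    class BloomFilter {
        static const std::size_t M = 1u << 20;   // number of bits (~128 KiB)
        std::bitset<M> bits_;

        // Two cheap integer mixers standing in for independent hash functions.
        static std::size_t h1(std::uint64_t x) {
            x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
            return static_cast<std::size_t>(x % M);
        }
        static std::size_t h2(std::uint64_t x) {
            x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL; x ^= x >> 31;
            return static_cast<std::size_t>(x % M);
        }

    public:
        void insert(std::uint64_t x) { bits_.set(h1(x)); bits_.set(h2(x)); }

        // false -> definitely not in the set; true -> maybe in the set.
        bool maybeContains(std::uint64_t x) const {
            return bits_.test(h1(x)) && bits_.test(h2(x));
        }
    };

Usage would be: query the filter first, and only when it answers "maybe" fall through to the exact lookup in the map (or sorted vector).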
What comes to my mind is gperf. However, it is based on strings rather than numbers, though part of the calculation can be tweaked to use numbers as input for the hash generator.
integers, strings, doesn't matter
http://videolectures.net/mit6046jf05_leiserson_lec08/
After the intro, at 49:38, you'll learn how to do this. The Dot Product hash function is demonstrated since it has an elegant proof. Most hash functions are like voodoo black magic. Don't waste time here, find something that is FAST for your datatype and that offers some adjustable SEED for hashing. A good combo there is better than the alternative of growing the hash table.
At 54:30 the prof. draws a picture of a standard way of doing perfect hashing. The perfect minimal hash is beyond this lecture. (Good luck!)
It really all depends on what you mod by.
Keep in mind, the analysis he shows can be further optimized by knowing the hardware you are running on.
With std::map you get very good performance in 99.9% of scenarios. If your hot spot sees the same iTest value(s) multiple times, combine the map result with a temporary hash cache.
Int is one of the datatypes where it is possible to just do:
bool hash[UINT_MAX]; // stackoverflow ;)
And fill it up. If you don't care about negative numbers, then it's twice as easy.
A perfect hash function maps a set of inputs onto the integers with no collisions. Given that your input is a set of integers, the values themselves are a perfect hash function. That really has nothing to do with the problem at hand.
The most obvious and easy to implement solution for testing existence would be a sorted list or balanced binary tree. Then you could decide existence in log(N) time. I doubt it'll get much better than that.
For this problem I would use a binary search, assuming it's possible to keep the list of numbers sorted.
Wikipedia has example implementations that should be simple enough to translate to C++.
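A minimal sketch of that approach in C++ (InitList and g_numbers are illustrative names; only IsInList comes from the question):

    #include <algorithm>
    #include <utility>
    #include <vector>

    static std::vector<int> g_numbers;   // filled once at application load-up

    void InitList(std::vector<int> numbers) {
        std::sort(numbers.begin(), numbers.end());   // sort once
        g_numbers = std::move(numbers);
    }

    bool IsInList(int iTest) {
        // O(log N) membership test on the sorted data.
        return std::binary_search(g_numbers.begin(), g_numbers.end(), iTest);
    }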
It's not necessary or practical to aim for mapping N distinct, randomly dispersed integers to N contiguous buckets - i.e. a perfect minimal hash - the important thing is to identify an acceptable ratio. To do this at run time, you can start by configuring a worst-acceptable ratio (say 1 to 20) and a no-point-being-better-than-this ratio (say 1 to 4), then randomly vary (e.g. by changing the prime numbers used) a fast-to-calculate hash algorithm to see how easily you can meet increasingly difficult ratios. For the worst-acceptable ratio you don't time out; instead you fall back on something slower but reliable (a container, or displacement lists to resolve collisions). Then allow a second or ten (configurable) for each X% better until you can't succeed at that ratio or you reach the no-point-being-better ratio.
Just so everyone's clear, this works for inputs only known at run time with no useful patterns known beforehand, which is why different hash functions have to be trialled or actively derived at run time. It is not acceptable to simply say "integer inputs form a hash", because there are collisions when they're %-ed into any sane array size. But you don't need to aim for a perfectly packed array either. Remember, too, that you can have a sparse array of pointers to a packed array, so there's little memory wasted for large objects.
Original Question
After working with it for a while, I came up with a number of hash functions that seemed to work reasonably well on strings, resulting in a unique mapping - a perfect hash.
Let's say the values ranged from L to H in the array. This yields a Range R = H - L + 1.
Generally it was pretty big.
I then applied the modulus operator from H down to L + 1, looking for a mapping that keeps them unique, but has a smaller range.
In your case you are using integers. Technically, they are already hashed, but the range is large.
It may be that you can get what you want, simply by applying the modulus operator.
It may be that you need to put a hash function in front of it first.
It also may be that you can't find a perfect hash for it, in which case your container class should have a fallback position... binary search, or map, or something like that, so that you can guarantee that the container will work in all cases.
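A sketch of that modulus search, assuming the key set is fixed at load time (the function and variable names here are made up for illustration; a real implementation would cap the search and fall back as described above):

    #include <algorithm>
    #include <cstddef>
    #include <unordered_set>
    #include <vector>

    // Find the smallest modulus m under which all keys stay distinct; that m
    // then defines a collision-free (perfect) table of size m for this key set.
    std::size_t findPerfectModulus(const std::vector<int>& keys) {
        if (keys.empty()) return 1;
        long long lo = *std::min_element(keys.begin(), keys.end());
        for (std::size_t m = keys.size(); ; ++m) {
            std::unordered_set<std::size_t> seen;
            bool collision = false;
            for (int k : keys) {
                std::size_t slot = static_cast<std::size_t>(k - lo) % m;   // offset so keys are >= 0
                if (!seen.insert(slot).second) { collision = true; break; }
            }
            if (!collision) return m;   // terminates once m exceeds (max - min)
        }
    }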
A trie or perhaps a van Emde Boas tree might be a better bet for creating a space-efficient set of integers, with lookup time being constant with respect to the number of objects in the data structure, assuming that even std::bitset would be too large.
I've been using Haskell for quite a while now, and I've read most of Real World Haskell and Learn You a Haskell. What I want to know is whether there is a point to a language using lazy evaluation, in particular the "advantage" of having infinite lists: is there a task that infinite lists make very easy, or even a task that is only possible with infinite lists?
Here's an utterly trivial but actually day-to-day useful example of where infinite lists specifically come in handy: When you have a list of items that you want to use to initialize some key-value-style data structure, starting with consecutive keys. So, say you have a list of strings and you want to put them into an IntMap counting from 0. Without lazy infinite lists, you'd do something like walk down the input list, keeping a running "next index" counter and building up the IntMap as you go.
With infinite lazy lists, the list itself takes the role of the running counter; just use zip [0..] with your list of items to assign the indices, then IntMap.fromList to construct the final result.
Sure, it's essentially the same thing in both cases. But having lazy infinite lists lets you express the concept much more directly without having to worry about details like the length of the input list or keeping track of an extra counter.
An obvious example is chaining your data processing from input to whatever you want to do with it. E.g., reading a stream of characters into a lazy list, which is processed by a lexer, also producing a lazy list of tokens which are parsed into a lazy AST structure, then compiled and executed. It's like using Unix pipes.
I found it's often easier and cleaner to just define all of a sequence in one place, even if it's infinite, and have the code that uses it just grab what it wants.
take 10 mySequence
takeWhile (<100) mySequence
instead of having numerous similar but not quite the same functions that generate a subset
first10ofMySequence
elementsUnder100ofMySequence
The benefits are greater when different subsections of the same sequence are used in different areas.
Infinite data structures (including lists) give a huge boost to modularity and hence reusability, as explained & illustrated in John Hughes's classic paper Why Functional Programming Matters.
For instance, you can decompose complex code chunks into producer/filter/consumer pieces, each of which is potentially useful elsewhere.
So wherever you see real-world value in code reuse, you'll have an answer to your question.
Basically, lazy lists allow you to delay computation until you need it. This can prove useful when you don't know in advance when to stop, and what to precompute.
A standard example is a sequence u_n of numerical computations converging to some limit. You can ask for the first term such that |u_n - u_{n-1}| < epsilon, and the right number of terms is computed for you.
Now, you have two such sequences u_n and v_n, and you want to know the sum of the limits to epsilon accuracy. The algorithm is:
compute u_n until epsilon/2 accuracy
compute v_n until epsilon/2 accuracy
return u_n + v_n
All is done lazily; only the necessary u_n and v_n are computed. You may want less simple examples, e.g. computing f(u_n) where you know (i.e. know how to compute) f's modulus of continuity.
Sound synthesis - see this paper by Jerzy Karczmarczuk:
http://users.info.unicaen.fr/~karczma/arpap/cleasyn.pdf
Jerzy Karczmarczuk has a number of other papers using infinite lists to model mathematical objects like power series and derivatives.
I've translated the basic sound synthesis code to Haskell - enough for a sine wave unit generator and WAV file IO. The performance was just about adequate to run with GHCi on a 1.5 GHz Athlon - as I just wanted to test the concept, I never got round to optimizing it.
Infinite/lazy structures permit the idiom of "tying the knot": http://www.haskell.org/haskellwiki/Tying_the_Knot
The canonically simple example of this is the Fibonacci sequence, defined directly as a recurrence relation. (Yes, yes, hold the efficiency complaints/algorithms discussion -- the point is the idiom.): fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Here's another story. I had some code that only worked with finite streams -- it did some things to create them out to a point, then did a whole bunch of nonsense that involved acting on various bits of the stream dependent on the entire stream prior to that point, merging it with information from another stream, etc. It was pretty nice, but I realized it had a whole bunch of cruft necessary for dealing with boundary conditions, and basically what to do when one stream ran out of stuff. I then realized that conceptually, there was no reason it couldn't work on infinite streams. So I switched to a data type without a nil -- i.e. a genuine stream as opposed to a list, and all the cruft went away. Even though I know I'll never need the data past a certain point, being able to rely on it being there allowed me to safely remove lots of silly logic, and let the mathematical/algorithmic part of my code stand out more clearly.
One of my pragmatic favorites is cycle. cycle [False, True] generates the infinite list [False, True, False, True, False ...]. In particular, xs !! 0 = False, xs !! 1 = True, so this just says whether or not the index of the element is odd. Where does this show up? Lots of places, but here's one that any web developer ought to be familiar with: making tables that alternate shading from row to row.
The general pattern seen here is that if we want to do some operation on a finite list, rather than having to construct a specific finite list that will “do the thing we want,” we can use an infinite list that will work for all sizes of lists. camcann’s answer is in this vein.
I wonder what the advantages and disadvantages of these two algorithms are. I want to write an AddEmUp solver in C++, but I'm not sure which algorithm (IDA* or DFID) I should use.
The best article I found is this one, but it seems too old - '93. Any newer?
I think IDA* would be better, but.. ? Any other ideas?
Any ideas and info would be helpful.
Thanks ! (:
EDIT: Any good articles about IDA* with a good explanation of the algorithm?
EDIT2: Or a good heuristic function for that game? I have no idea how to come up with one :/
The Russel & Norvig book is an excellent reference on these algorithms, and I'll give larsmans a virtual high-five for suggesting it; however I disagree that IDA* is in any appreciable way harder to program than A*. I've done it for a project where I had to write an AI to solve a sliding-block puzzle - the familiar problem of having a N x N grid of numbered tiles, and using the single free space to slide tiles around until they are in ascending order.
Recall:
F(n) = g(n) + h(n).
TotalCost = PathCost + Heuristic.
g(n) = Path cost, the distance from the initial to the current state
h(n) = Heuristic, the estimation of cost from current state to end state. To be an admissible heuristic (and thus ensure A*'s optimality), you cannot in any case overestimate the cost. See this question for more info on the effects of overestimating/underestimating heuristics on A*.
Remember that Iterative Deepening A* is just A* with a limit on the F value of nodes you are allowed to traverse. This FLimit increases with each outer iteration; with each iteration you are deepening the search.
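In outline it looks something like the sketch below: a cost-bounded depth-first search wrapped in a loop that raises the bound. The State type, its heuristic and neighbour functions, and the unit step cost are placeholders for whatever the concrete puzzle supplies; this is a generic sketch, not the answerer's puzzle-specific code mentioned just below.

    #include <algorithm>
    #include <limits>
    #include <vector>

    struct State {
        std::vector<State> neighbours() const { return {}; } // expand this node
        int heuristic() const { return 0; }                  // admissible h(n)
        bool isGoal() const { return true; }
    };

    const int FOUND = -1;

    // Depth-first search bounded by fLimit; returns FOUND on success, or the
    // smallest f value that exceeded the limit (the next bound to try).
    int boundedSearch(const State& s, int g, int fLimit) {
        int f = g + s.heuristic();
        if (f > fLimit) return f;
        if (s.isGoal()) return FOUND;
        int nextLimit = std::numeric_limits<int>::max();
        for (const State& n : s.neighbours()) {
            int t = boundedSearch(n, g + 1, fLimit);   // unit step cost assumed
            if (t == FOUND) return FOUND;
            nextLimit = std::min(nextLimit, t);
        }
        return nextLimit;
    }

    bool idaStar(const State& start) {
        int fLimit = start.heuristic();
        while (true) {
            int t = boundedSearch(start, 0, fLimit);
            if (t == FOUND) return true;                             // goal reached
            if (t == std::numeric_limits<int>::max()) return false;  // search space exhausted
            fLimit = t;                                              // deepen the bound
        }
    }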
Here's my C++ code implementing both A* and IDA* to solve the aforementioned sliding block puzzle. You can see that I use a std::priority_queue with a custom Comparator to store Puzzle states in the queue prioritized by their F value. You will also note that the only difference between A* and IDA* is the addition of an FLimit check and an outer loop that increments this FLimit. I hope this helps shed some light on this subject.
Check out Russell & Norvig, chapters 3 and 4, and realize that IDA* is hard to program correctly. You might want to try recursive best first search (RBFS), also described by R&N, or plain old A*. The latter can be implemented using an std::priority_queue.
IIRC, R&N described IDA* in the first edition, then replaced it with RBFS in the second. I haven't seen the third edition yet.
As regards your second edit, I haven't looked into the game, but a good procedure for deriving heuristics is that of relaxed problems. You take away the rules of the game until you derive a version for which the heuristic is easily expressed and implemented (and cheap to compute). Or, following a bottom-up approach, you check the main rules to see which one admits an easy heuristic, then try that and add in other rules as you need them.
DFID is just a special case of IDA* where the heuristic function is the constant 0; in other words, it has no provision for introducing heuristics. If the problem is not small enough that it can be solved without using heuristics, it seems you have no choice but to use IDA* (or some other member of the A* family). That said, IDA* is really not that hard: the implementation provided by the authors of AIMA is only about half a page of Lisp code; I imagine a C++ implementation shouldn't take more than twice that.
I am given a tree like this:
http://www.seqan.de/dddoc/html/streePreorder.png
I can access each node with the next operator.
// postorder dfs
Iterator< Index<String<char> >, BottomUp<> >::Type myIterator(myIndex);
for (; !atEnd(myIterator); goNext(myIterator)) {
    // do something with myIterator
}
But I want to use a recursive algorithm on the tree.
Is there a way I can make the recursive algorithm (which excludes the biggest subtree at each node) iterative?
Or how can I access the elements non-recursively?
Edit:
The actual problem:
I am given a recursive algorithm that works on trees (recursive).
I also use a library where I can only access the items with an iterator (non-standard, iterative).
recursive <-> iterative.
How can I solve this?
You can convert that recursive function to an iterative function with the help of a stack.
// depth-first traversal pseudo-code (a stack gives depth-first order;
// use a queue instead for breadth-first)
push root onto a stack
while( stack isn't empty )
    pop element off stack
    perform action on current node
    push its children
Depending on how you want to traverse the nodes, the implementation will be different. All recursive functions can be transformed into iterative ones; general guidance on how requires information about the specific problem, but using stacks/queues and transforming into a loop are common methods that should cover most situations.
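In C++, the stack version of the pseudo-code above might look roughly like this (Node is an illustrative placeholder; the actual library only exposes its own iterator, so treat this purely as a sketch of the idea):

    #include <stack>
    #include <vector>

    struct Node {
        int value;
        std::vector<Node*> children;
    };

    // Depth-first traversal using an explicit stack instead of recursion.
    template <typename Action>
    void dfs(Node* root, Action action) {
        if (!root) return;
        std::stack<Node*> pending;
        pending.push(root);
        while (!pending.empty()) {
            Node* current = pending.top();
            pending.pop();
            action(current);   // visit the node
            // Push children in reverse so the leftmost child is visited first.
            for (auto it = current->children.rbegin();
                 it != current->children.rend(); ++it) {
                pending.push(*it);
            }
        }
    }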
You should also look into tail recursion and how to identify it, as these problems translate nicely into a for loop; many compilers even do this for you.
Some more mathematically oriented recursive calls can be solved with recurrence relations. It's unlikely you'll come across one that hasn't been solved yet, but it might interest you.
//edit, performance?
Really depends on your implementation and the size of the tree. If there is a lot of depth in your recursive call, then you will get a stack overflow, while an iterative version will perform fine. I would get a better grasp on recursion (how memory is used), and you should be able to decide which is better for your situation. Here is an example of this type of analysis with the fibonacci numbers.
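For a flavour of that analysis (a sketch, not the linked article's code): the naive recursive Fibonacci redoes the same work exponentially many times and its recursion depth grows with n, while the iterative version runs in linear time and constant space.

    #include <cstdint>

    // Naive recursion: exponential time, recursion depth proportional to n.
    std::uint64_t fibRecursive(unsigned n) {
        if (n < 2) return n;
        return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Iterative version: linear time, constant space.
    std::uint64_t fibIterative(unsigned n) {
        std::uint64_t a = 0, b = 1;
        for (unsigned i = 0; i < n; ++i) {
            std::uint64_t next = a + b;
            a = b;
            b = next;
        }
        return a;
    }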
If your iterator only supports forward (and possibly backward) traversal, but not following links on the tree or fast random access, you will have a very hard time adapting a tree algorithm to it. However, in the end any answer will depend on the interface presented by your custom iterators, which you have not provided.
For example, consider the easy algorithm of tree search. If the only operation given by your iterator is "start from the first element and move on one-by-one", you obviously cannot implement tree search efficiently. Nor can you implement binary search. So you must provide a list of exactly what operations are supported, and (critically) the complexity bounds for each.
Any recursive function can alternatively be implemented with an explicit stack, if that is the question you are asking.
Here is an article by Phil Haack on the subject.
Performance gains one way or the other are speculative; the compiler does things with our code behind the scenes that we can't always predict. Implement both and get some real numbers. If they are similar, use the one you find more readable.
Even with recursive traversal, you still end up visiting the nodes one by one.
What you need to know is: how can my iterator be told to go depth-first, and then: how will I be notified that one level has started/ended (i.e. the start/end of a recursion step).
That knowledge can be mapped onto the recursive algorithm.