How to know the memory size of a trie? - d

Trie: https://en.wikipedia.org/wiki/Trie
I want to implement different kinds of tries and compare their memory usage.
Is there a generic function like a.memoryUsage() in core or std? Or do I need to implement a separate function in every class?

Since a trie is a recursive structure, you can traverse it from the root and add the memory of every field in every node to a running sum.
Write a general traversal algorithm that takes the trie type as a template parameter plus a function pointer, and for each kind of trie write a function that calculates the memory of one node of that kind.
That is, of course, if you want the exact memory usage of a specific kind of trie.
The general memory usage can be described in big O notation under another name: space complexity.
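The traversal described above could look roughly like the following. It is a minimal sketch written in C++ (the language of the related questions below) rather than D, and every name in it (MapTrieNode, MapChildren, mapNodeBytes, trieMemory) is made up for illustration. The per-entry overhead constant is an assumption, not a measured value.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical trie kind used only for illustration: children are keyed by
// a single character and owned through unique_ptr.
struct MapTrieNode {
    std::map<char, std::unique_ptr<MapTrieNode>> children;
    bool isWord = false;
};

// Generic traversal. `nodeBytes(node)` returns the bytes owned by that one
// node (excluding descendants); `children(node, visit)` calls visit(child)
// for every direct child.
template <typename Node, typename NodeBytes, typename Children>
std::size_t trieMemory(const Node& node, NodeBytes nodeBytes, Children children) {
    std::size_t total = nodeBytes(node);
    children(node, [&](const Node& child) {
        total += trieMemory(child, nodeBytes, children);
    });
    return total;
}

// Per-kind size function for MapTrieNode, usable as a plain function pointer.
std::size_t mapNodeBytes(const MapTrieNode& n) {
    // Assumption: each map entry costs about three pointers of bookkeeping
    // on top of the stored pair; allocator overhead is ignored.
    const std::size_t perEntry =
        sizeof(std::pair<const char, std::unique_ptr<MapTrieNode>>) +
        3 * sizeof(void*);
    return sizeof(MapTrieNode) + n.children.size() * perEntry;
}

// Per-kind child enumeration for MapTrieNode.
struct MapChildren {
    template <typename Visit>
    void operator()(const MapTrieNode& n, Visit visit) const {
        for (const auto& entry : n.children) visit(*entry.second);
    }
};
```

A call such as trieMemory(root, mapNodeBytes, MapChildren{}) then gives the estimate for one trie kind. Supporting another kind only needs a new pair of per-kind helpers.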

Related

Calculate memory usage of a tree structure in C++

I have a tree structure
struct TrieNode {
    std::unordered_map<std::string, TrieNode> children;
    std::vector<std::string> terminals;
};
Some details about its usage:
The tree is not modified after it's been populated.
The keys in unordered map are short strings (do not exceed 5 characters).
This structure can grow very large. And I need to calculate its size in memory. This size does not need to be very precise.
Are there any existing approaches to do that?
If not, I was thinking of these options:
I can keep track of modifications to this structure separately.
Use a custom allocator for containers that keeps track of the space (is there a common implementation for that?).
Overload new operator for my structure to keep track of memory (not sure how to keep track of insertions into vector after that).
Calculate the size after the tree has been populated by traversing the entire tree (a last resort, since for a large tree it would take a really long time, but the result is more precise).
What would be the best approach?
The last one, for the following reasons:
It's the simplest of the four approaches.
Because the tree is fixed after it has been populated, lazily evaluating the size makes more sense:
When the size is never used, we save the time that would have been spent calculating it.
It does not cost asymptotically more. Tracking the size during construction is also O(n) overall, so the only extra cost of a traversal is the recursive function calls.
It avoids the need for a global variable.
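A traversal-based estimate with lazy caching might be sketched as follows. Several things here are assumptions rather than facts from the question: the children are held through unique_ptr (so the sketch does not depend on containers of incomplete types), the per-entry overhead of the hash table is guessed at two pointers, allocator overhead is ignored, and SizedTrie, estimateBytes and stringHeapBytes are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Variant of the question's TrieNode (children owned through unique_ptr).
struct TrieNode {
    std::unordered_map<std::string, std::unique_ptr<TrieNode>> children;
    std::vector<std::string> terminals;
};

// Heap bytes owned by a string: zero when the characters live inside the
// string object itself (small-string optimization), else capacity + 1.
std::size_t stringHeapBytes(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    std::less<const char*> before;
    const bool storedInline = !before(s.data(), begin) && before(s.data(), end);
    return storedInline ? 0 : s.capacity() + 1;
}

// Rough recursive estimate of the memory reachable from `node`.
std::size_t estimateBytes(const TrieNode& node) {
    std::size_t total = sizeof(TrieNode);
    // Vector buffer: one std::string object per reserved slot.
    total += node.terminals.capacity() * sizeof(std::string);
    for (const auto& t : node.terminals) total += stringHeapBytes(t);
    // Hash table: bucket array plus one entry node per element.
    total += node.children.bucket_count() * sizeof(void*);
    for (const auto& entry : node.children) {
        total += sizeof(entry) + 2 * sizeof(void*);  // assumed entry overhead
        total += stringHeapBytes(entry.first);
        total += estimateBytes(*entry.second);
    }
    return total;
}

// Lazy evaluation: the traversal runs on the first call only, which is
// safe because the tree is not modified after it has been populated.
class SizedTrie {
public:
    TrieNode root;
    std::size_t bytes() const {
        if (!cached_) {
            bytes_ = estimateBytes(root);
            cached_ = true;
        }
        return bytes_;
    }
private:
    mutable bool cached_ = false;
    mutable std::size_t bytes_ = 0;
};
```

If the size is never requested, no traversal happens at all. If it is requested repeatedly, the O(n) walk is paid once.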

Implement a heap not using an array

I'm prepping for a Google developer interview and have gotten stuck on a question about heaps. I need to implement a heap as a dynamic binary tree (not array) where each node has a pointer to the parent and two children and there is a global pointer to the root node. The book asks "why won't this be enough?"
How can the standard tree implementation be extended to support heap operations add() and deleteMin()? How can these operations be implemented in this data structure?
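Can you keep track of the total number of nodes? If so, it's easy to know where to add a new element, because the heap is an almost full (complete) tree.
As for deleteMin, it is less direct than in an array, because you can't index the last leaf directly (in an array of N elements it is simply the element at the end).
With the node count you can still reach the last node by following the path from the root that the count encodes, which costs O(log n). Without the count you would have to travel down all paths to the leaves and compare them, which would probably cost O(n).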
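The idea of keeping a node count can be sketched as below. The sketch assumes 1-based level-order numbering, so that the bits of a position below its leading 1 spell out the path from the root (0 means left, 1 means right). LinkedMinHeap, HeapNode and nodeAt are hypothetical names, and keys are swapped between nodes instead of relinking the nodes, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

struct HeapNode {
    int key;
    HeapNode* parent = nullptr;
    HeapNode* left = nullptr;
    HeapNode* right = nullptr;
    explicit HeapNode(int k) : key(k) {}
};

class LinkedMinHeap {
public:
    LinkedMinHeap() = default;
    LinkedMinHeap(const LinkedMinHeap&) = delete;
    LinkedMinHeap& operator=(const LinkedMinHeap&) = delete;
    ~LinkedMinHeap() { destroy(root_); }

    std::size_t size() const { return size_; }
    int min() const { assert(size_ > 0); return root_->key; }

    void add(int key) {
        HeapNode* node = new HeapNode(key);
        ++size_;
        if (size_ == 1) { root_ = node; return; }
        // The new node takes position size_; its parent is position size_/2.
        HeapNode* parent = nodeAt(size_ / 2);
        node->parent = parent;
        if (size_ % 2 == 0) parent->left = node; else parent->right = node;
        // Sift up by swapping keys with the parent while out of order.
        while (node->parent && node->key < node->parent->key) {
            std::swap(node->key, node->parent->key);
            node = node->parent;
        }
    }

    void deleteMin() {
        assert(size_ > 0);
        HeapNode* last = nodeAt(size_);
        if (last == root_) { delete root_; root_ = nullptr; size_ = 0; return; }
        root_->key = last->key;  // move the last key to the root
        if (last->parent->left == last) last->parent->left = nullptr;
        else last->parent->right = nullptr;
        delete last;
        --size_;
        // Sift down by swapping with the smaller child while out of order.
        HeapNode* node = root_;
        while (node->left) {
            HeapNode* smallest = node->left;
            if (node->right && node->right->key < smallest->key) smallest = node->right;
            if (node->key <= smallest->key) break;
            std::swap(node->key, smallest->key);
            node = smallest;
        }
    }

private:
    // Walk from the root to the node at 1-based level-order position index.
    HeapNode* nodeAt(std::size_t index) const {
        std::size_t bit = 1;
        while ((bit << 1) <= index) bit <<= 1;  // find the leading 1 bit
        HeapNode* node = root_;
        for (bit >>= 1; bit != 0; bit >>= 1)
            node = (index & bit) ? node->right : node->left;
        return node;
    }
    static void destroy(HeapNode* n) {
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
    HeapNode* root_ = nullptr;
    std::size_t size_ = 0;
};
```

In this sketch both add() and deleteMin() follow a single root-to-leaf path, so each takes O(log n) steps. The parent pointer is what makes sifting up possible without a stack.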

Store Tree Object Directly in Source to Avoid Growing at Run-time

I have 50 (large) decision trees that are currently serialized (in pre-order) as individual, long strings. All of the strings are directly stored in a .cpp declaration file in order to avoid having to read them from a file at run-time. So, at run-time, a function is called that deserializes each string and constructs its corresponding decision tree using a standard recursive process. Subsequently, a set of features (vector of doubles) is dropped down each decision tree and a class prediction is output. A la Random Forest, a majority vote is taken and final class is taken.
I've tried optimizing the code and have discovered that the re-construction of these large trees takes up the majority (~98%) of my run-time. Thus, I wanted to ask if there were some way to hardcode the entire tree object into the .cpp declaration file. So, instead of having to re-construct the trees at run-time, the tree objects are already available to be traversed at run-time.
If you have access to C++11, I think constexpr functions are your solution.
You could write functions to generate the data of the trees at compile-time, storing that data in arrays at compile time.
See this thread for a working usage example.
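As a rough illustration of storing tree data in arrays at compile time, the sketch below skips the constexpr-function step and writes a constexpr array literal directly; such a table would presumably be emitted by a generator script from the serialized strings. The node layout (FlatNode), the example tree and predict() are all made up for illustration and are not taken from the question.

```cpp
#include <cstddef>

// Hypothetical flattened decision-tree node. Children are array indices;
// a leaf is marked by feature == -1 and carries a class label.
struct FlatNode {
    int feature;       // index into the feature vector, or -1 for a leaf
    double threshold;  // go left when features[feature] <= threshold
    int left;          // index of the left child (unused at leaves)
    int right;         // index of the right child (unused at leaves)
    int label;         // class label, meaningful only at leaves
};

// Made-up example tree in pre-order. As constexpr data it is part of the
// binary, so nothing has to be parsed or allocated at run-time.
constexpr FlatNode kTree[] = {
    {0, 0.5, 1, 2, -1},    // root: split on feature 0
    {-1, 0.0, -1, -1, 0},  // leaf: class 0
    {1, 2.0, 3, 4, -1},    // split on feature 1
    {-1, 0.0, -1, -1, 1},  // leaf: class 1
    {-1, 0.0, -1, -1, 2},  // leaf: class 2
};

// Drop a feature vector down the tree and return the class at the leaf.
int predict(const FlatNode* tree, const double* features) {
    std::size_t i = 0;
    while (tree[i].feature >= 0) {
        const FlatNode& n = tree[i];
        i = static_cast<std::size_t>(
            features[n.feature] <= n.threshold ? n.left : n.right);
    }
    return tree[i].label;
}
```

With 50 such tables, the majority vote only needs to call predict() once per table.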

Heap in Dijkstra algorithm

Can someone explain the importance of HeapDesc in Shane Saunders' Dijkstra algorithm implementation and how it is used there?
In general I know how Dijkstra's algorithm works, but I didn't get the heap part of the implementation.
It's a lot of code, so I am posting a link in case you want to have a look at it.
Here it is: http://www.cosc.canterbury.ac.nz/research/RG/alg/dijkstra.cpp
In Dijkstra's algorithm you need an efficient data structure that gives you the not-yet-finalized vertex with the smallest tentative distance, i.e. the cheapest way found so far to reach another vertex.
A heap is exactly such a data structure: it stores the candidates keyed by cost and efficiently retrieves the one with minimum cost.
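To make the role of the heap concrete, here is a minimal sketch of Dijkstra's algorithm that uses std::priority_queue as the heap. It is not the linked implementation: it does not use the Heap classes from that code, it pushes duplicate entries and skips stale ones instead of decreasing keys, and the Edge struct and function name are assumptions.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    long long cost;  // assumed to be non-negative
};

// Returns the shortest distance from source to every vertex; unreachable
// vertices keep the maximum long long value.
std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& graph,
                                int source) {
    const long long kInf = std::numeric_limits<long long>::max();
    std::vector<long long> dist(graph.size(), kInf);

    using Item = std::pair<long long, int>;  // (tentative distance, vertex)
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    dist[source] = 0;
    heap.push({0, source});
    while (!heap.empty()) {
        const Item top = heap.top();  // cheapest candidate
        heap.pop();
        const long long d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale entry, a shorter path was found
        for (const Edge& e : graph[u]) {
            const long long candidate = d + e.cost;
            if (candidate < dist[e.to]) {
                dist[e.to] = candidate;
                heap.push({candidate, e.to});
            }
        }
    }
    return dist;
}
```

Every iteration asks the heap for the cheapest remaining candidate, which is the operation the heap exists to make fast.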
HeapDesc probably implements the factory design pattern to create different kinds of heaps. If you check the file http://www.cosc.canterbury.ac.nz/research/RG/alg/dijkstra.h, you will notice that the heap variable in the constructor is an object of type Heap.
Take a look at this article on the factory design pattern.
http://en.wikipedia.org/wiki/Factory_method_pattern
Dijkstra's algorithm involves a lot of "path of the lowest cost" lookups.
Min or max lookup is what a heap is optimized for (reading the top element is O(1), and removing it is O(log n) in a binary heap), and this is why it is used.
As for HeapDesc itself, it just appears to be a factory method, used to allocate a Heap object.
Heap *newInstance(int n) const { return new T(n); }; // from heap.h
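The factory-method idea behind that one-liner can be sketched as follows. Apart from the newInstance signature quoted above, everything here is an assumption: the class names HeapFactory, HeapFactoryOf, BinaryHeapStub and FibonacciHeapStub are invented, and the tiny Heap interface is a stand-in for the real one declared in the linked heap.h.

```cpp
#include <memory>
#include <string>

// Stand-in for the abstract heap interface (the real one is larger).
class Heap {
public:
    virtual ~Heap() = default;
    virtual std::string kind() const = 0;
};

// Two placeholder heap types, each constructible from a capacity n.
class BinaryHeapStub : public Heap {
public:
    explicit BinaryHeapStub(int) {}
    std::string kind() const override { return "binary"; }
};

class FibonacciHeapStub : public Heap {
public:
    explicit FibonacciHeapStub(int) {}
    std::string kind() const override { return "fibonacci"; }
};

// Abstract factory: callers ask for "a heap" without naming its type.
class HeapFactory {
public:
    virtual ~HeapFactory() = default;
    virtual Heap* newInstance(int n) const = 0;
};

// One concrete factory per heap type, generated by a template.
template <class T>
class HeapFactoryOf : public HeapFactory {
public:
    Heap* newInstance(int n) const override { return new T(n); }
};

// Code written against the factory only; it works with any heap type.
std::string heapKindUsedBy(const HeapFactory& factory, int n) {
    std::unique_ptr<Heap> heap(factory.newInstance(n));
    return heap->kind();
}
```

An algorithm that receives such a factory can be run with different heap implementations without being changed, which is presumably why the Dijkstra code takes one.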

Indexing: Implementing Tree data structures with Arrays/Vectors

I have been implementing a heap in C++ using a vector. Since I have to access the children of a node (2n, 2n+1) easily, I had to start at index 1. Is it the right way? As per my implementation, there is always a dummy element at zeroth location.
Your way works. Alternatively, you can put the root at index 0 and the children of node n at 2n+1 and 2n+2.
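The two indexing conventions can be written side by side. This is only a small sketch; the namespace and function names are made up.

```cpp
#include <cstddef>

// Root at index 1; index 0 is an unused dummy slot.
namespace one_based {
inline std::size_t left(std::size_t i) { return 2 * i; }
inline std::size_t right(std::size_t i) { return 2 * i + 1; }
inline std::size_t parent(std::size_t i) { return i / 2; }  // requires i > 1
}  // namespace one_based

// Root at index 0; no dummy slot is needed.
namespace zero_based {
inline std::size_t left(std::size_t i) { return 2 * i + 1; }
inline std::size_t right(std::size_t i) { return 2 * i + 2; }
inline std::size_t parent(std::size_t i) { return (i - 1) / 2; }  // requires i > 0
}  // namespace zero_based
```

Both are valid. The 1-based form has slightly simpler arithmetic at the cost of one wasted element.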
While this works well for heaps, you end up using a huge amount of redundant memory for other tree data structures that do not necessarily have a full and complete Binary tree. For example, this means that if you have a Binary search tree of 20 nodes with a depth of 5, you end up having to use an array of 2^5=32 instead of 20. Now imagine if you need a tree of 25 nodes with a depth of 22. You end up using a huge array of 4194304, whereas you could have used a linked representation to store just the 25 nodes.
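The numbers above can be checked with a short calculation. The sketch assumes the 1-based layout with a dummy slot at index 0 and counts depth as the number of levels; both function names are hypothetical.

```cpp
#include <cstdint>

// Slots the implicit layout needs for a tree with `levels` levels:
// indices 0 .. 2^levels - 1, including the dummy slot at index 0.
constexpr std::uint64_t implicitSlots(unsigned levels) {
    return std::uint64_t{1} << levels;
}

// Index of the deepest node when every node has only a right child,
// which is the worst case for the implicit layout.
std::uint64_t rightChainLastIndex(unsigned levels) {
    std::uint64_t index = 1;  // the root
    for (unsigned level = 1; level < levels; ++level) {
        index = 2 * index + 1;  // step to the right child
    }
    return index;
}
```

Here implicitSlots(5) gives 32 and implicitSlots(22) gives 4194304, matching the figures in the text, and a right-leaning chain really does reach the last of those slots.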
You can still use an array and not incur such a memory hit. Just allocate a large block of memory as an array and use array indices as pointers to the children.
Thus, where you had
node.left = (node.index*2)
node.right = (node.index*2+1)
You simply use
node.left = <index of left child>
node.right = <index of right child>
Or you can just use pointers/references instead of integer indices to an array if your language supports it.
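A minimal sketch of the index-as-pointer idea, using an unbalanced binary search tree as the example. IndexedBst, kNone and the member names are assumptions made for illustration. Nodes are appended to one vector and linked by their positions in it.

```cpp
#include <cstddef>
#include <vector>

constexpr int kNone = -1;  // marks "no child"

class IndexedBst {
public:
    struct Node {
        int key;
        int left;   // index of the left child, or kNone
        int right;  // index of the right child, or kNone
    };

    void insert(int key) {
        const int fresh = static_cast<int>(nodes_.size());
        if (nodes_.empty()) {
            nodes_.push_back({key, kNone, kNone});  // the root lives at index 0
            return;
        }
        int cur = 0;
        for (;;) {
            if (key == nodes_[cur].key) return;  // ignore duplicates
            int& link = key < nodes_[cur].key ? nodes_[cur].left
                                              : nodes_[cur].right;
            if (link == kNone) {
                link = fresh;  // set the link before the vector may reallocate
                nodes_.push_back({key, kNone, kNone});
                return;
            }
            cur = link;
        }
    }

    bool contains(int key) const {
        int cur = nodes_.empty() ? kNone : 0;
        while (cur != kNone) {
            if (key == nodes_[cur].key) return true;
            cur = key < nodes_[cur].key ? nodes_[cur].left : nodes_[cur].right;
        }
        return false;
    }

    // Number of array slots in use: exactly one per stored node.
    std::size_t slotsUsed() const { return nodes_.size(); }

private:
    std::vector<Node> nodes_;
};
```

Even a completely degenerate tree, such as keys inserted in ascending order, uses one slot per node here, whereas the implicit 2n/2n+1 layout would need a slot for every possible position down to that depth.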
Edit:
It might not be obvious to everyone that a complete binary search tree takes up O(2^d) memory. There are d levels, and every level has twice as many nodes as the level above it (because every node except those at the bottom has exactly two children, never one). A binary heap is a binary tree (but not a binary search tree) that is always complete by definition, so the array-based implementation outlined by the OP does not incur any real memory overhead. For a heap, that is the best way to implement it in code. OTOH, most other binary trees (especially binary search trees) are not guaranteed to be complete. So using this approach on them would need O(2^depth) memory, where depth can be as large as n, whereas a linked implementation needs only O(n) memory.
So my answer is: yes, this is the best way for a heap. Just don't try it for other binary trees (unless you're sure they will always be complete).