Heaps with changeable relationship - heap

A type of data structure very common and very used is the Heap. This data structure is characterized by a relationship imposed upon its nodes: if < is a relationship and node is a node, then node < node.child for all the children of the node.
But what happens when you need to change the relationship?
The brute-force way is simply to extract all elements from the Heap and add them to another Heap having the desired relationship: usually that takes O(n log(n)) time. Does it exist a Heap that allow to change its relationship in a faster way, possibly with an in-place algorithm?

Related

What is the most efficient data structure for designing a PRIM algorithm?

I am designing a Graph in c++ using a hash table for its elements. The hashtable is using open addressing and the Graph has no more than 50.000 edges. I also designed a PRIM algorithm to find the minimum spanning tree of the graph. My PRIM algorithm creates storage for the following data:
A table named Q to put there all the nodes in the beginning. In every loop, a node is visited and in the end of the loop, it's deleted from Q.
A table named Key, one for each node. The key is changed when necessary (at least one time per loop).
A table named Parent, one for each node. In each loop, a new element is inserted in this table.
A table named A. The program stores here the final edges of the minimum spanning tree. It's the table that is returned.
What would be the most efficient data structure to use for creating these tables, assuming the graph has 50.000 edges?
Can I use arrays?
I fear that the elements for every array will be way too many. I don't even consider using linked lists, of course, because the accessing of each element will take to much time. Could I use hash tables?
But again, the elements are way to many. My algorithm works well for Graphs consisting of a few nodes (10 or 20) but I am sceptical about the situation where the Graphs consist of 40.000 nodes. Any suggestion is much appreciated.
(Since comments were getting a bit long): The only part of the problem that seems to get ugly for very large size, is that every node not yet selected has a cost and you need to find the one with lowest cost at each step, but executing each step reduces the cost of a few effectively random nodes.
A priority queue is perfect when you want to keep track of lowest cost. It is efficient for removing the lowest cost node (which you do at each step). It is efficient for adding a few newly reachable nodes, as you might on any step. But in the basic design, it does not handle reducing the cost of a few nodes that were already reachable at high cost.
So (having frequent need for a more functional priority queue), I typically create a heap of pointers to objects and in each object have an index of its heap position. The heap methods all do a callback into the object to inform it whenever its index changes. The heap also has some external calls into methods that might normally be internal only, such as the one that is perfect for efficiently fixing the heap when an existing element has its cost reduced.
I just reviewed the documentation for the std one
http://en.cppreference.com/w/cpp/container/priority_queue
to see if the features I always want to add were there in some form I hadn't noticed before (or had been added in some recent C++ version). So far as I can tell, NO. Most real world uses of priority queue (certainly all of mine) need minor extra features that I have no clue how to tack onto the standard version. So I have needed to rewrite it from scratch including the extra features. But that isn't actually hard.
The method I use has been reinvented by many people (I was doing this in C in the 70's, and wasn't first). A quick google search found one of many places my approach is described in more detail than I have described it.
http://users.encs.concordia.ca/~chvatal/notes/pq.html#heap

Is this how I combine two min-heaps together?

I am currently creating a source code to combine two heaps that satisfy the min heap property with the shape invariant of a complete binary tree. However, I'm not sure if what I'm doing is the correct accepted method of merging two heaps satisfying the requirements I laid out.
Here is what I think:
Given two priority queues represented as min heaps, I insert the nodes of the second tree one by one into the first tree and fix the heap property. Then I continue this until all of the nodes in the second tree is in the first tree.
From what I see, this feels like a nlogn algorithm since I have to go through all the elements in the second tree and for every insert it takes about logn time because the height of a complete binary tree is at most logn.. But I think there is a faster way, however I'm not sure what other possible method there is.
I was thinking that I could just insert the entire tree in, but that break the shape invariant and order invariant..Is my method the only way?
In fact building a heap is possible in linear time and standard function std::make_heap guarantees linear time. The method is explained in Wikipedia article about binary heap.
This means that you can simply merge heaps by calling std::make_heap on range containing elements from both heaps. This is asymptotically optimal if heaps are of similar size. There might be a way to exploit preexisting structure to reduce constant factor, but I find it not likely.

How to do per node caching in a tree visitor

I have an application where a want to calculate different representations (mesh, voxelization, signed distance function, ...) of a tree of primitives (leaf nodes) that are combined via boolean operations (inner nodes).
My first approach to this was to write an abstract base class with a virtual getter function for each of the different representations and cached the intermediate results at the respective nodes as long as there was no change in their subtree (which would flush their cache).
However, I was unsatisfied with the ugly coupling of the tree structure with each of the different representations. To alleviate this I removed the abstract base classes and instead set up a visitor for each of the representations.
This neatly decoupled the tree from the representations but left me with the problem that I now need to cache the intermediate results somewhere else and this is where my problem starts.
TL;DR
How do I cache (arbitrary many differently typed) intermediate values at inner nodes of the tree without making the tree dependent on the value type?
My Approaches
The requirements offer two choices:
store the data in the tree but with type erasure
store the data outside the tree and somehow "connect" it to the node
The first one leaves me puzzled with some efficiency problem: I could easily add a container of boost::any (or something equivalent) in the nodes but then each visitor would have to search the whole container for it's own data.
The separation in the second one introduces the problem of keeping the cache up to date to the current tree. If there are changes in the tree (deletions, alterations of nodes) the cached values must at least be invalidated. My intuition was to use some hash function and an unordered_map but I hit some problems there as well:
I cannot use the treenodes themselves as key, so I need to introduce another class that just references tree nodes and represents them in the tree
referencing the values from the unordered_map's keys requires to erase all entries whose referencees are deleted or we have a dangling reference(/pointer) in the unordered_map which could get triggered on rehash
changes in the tree would require to reconstruct the unordered_map because keys might have changed
Am I missing some obvious solution to this?
Which approach would you favor (and why)?
I once had a similar problem and my solution was as follows:
Let each node have an unique identifier.
Let each node have a version number. Modifications that invalidate calculated values for the node just increase the version number.
Let each visitor have a caching map, where the ID pair is the key, mapped to a version/value pair.
When (re-)walking the tree, look for a node's entry in the map. If the version is correct, use the cached value. If it is outdated, calculate the new value and replace the old version/value pair.
At first, I used the node's address as IDs, but for memory reasons I had to reuse subtrees and picked the path to the node as ID. Such a path has the advantage that it can be calculated by each visitor and need not be stored at the node. In my case, each node could have at most two children, so a path was merely a set of left/right decisions, which can be stored in a simple unsigned int with some bit-shifting (my trees did never reach a depth of 32, so a 32 bit unsigned was more than enough as key).

How to convert a struct property to a pointer reference in C++?

I have a DAG in a JSON format, where each node is an entry: it has a name, and two arrays. One array is for other nodes with arrows coming into it, another array for nodes that this node is directed towards (outgoing arrows).
So, for example:
{
'id': 'A',
'connected_from' : ['B','C'],
'connects_to' : ['D','E']
}
And I have a collection of these nodes, that all together form a DAG.
I'd like to map the nodes to a struct to hold these nodes, where the id is simply a string, and I'd like the arrays to be vectors of pointers of this struct:
struct node {
string id;
vector<node*> connected_from;
vector<node*> connected_to;
}
How do I convert the node entries as 'id' in the arrays of the JSON to a pointer to the correct struct holding that node?
One obvious approach is to build a map of key-value pairs, where key = id, value = the pointer to the struct with that id, and do a lookup - but is there a better way?
no, given only the information that you've provided there isn't a better way: you need to build a map.
however, for single letter id's the map can possibly take the form of a simple array with e.g. 26 entries for the English alphabet.
There's going to be some container object holding all the nodes (otherwise you're going to leak them.) You could always scan over the container to find the nodes. But this will be inefficient - O(N^2) while a map lookup will be O(N log N ).
Though if you store the objects in sorted order in the container (or use a sorted container) you can reduce both cases to O(N log N).
The constants will be different though, so for a small graph the scan may be faster.
I think your suggestion is fine... Map from ID to node. It's simple, intuitive and fast enough for practical purposes. Considering the data is being parsed from JSON, your storage and lookups are not going to significantly impact performance. If you're really concerned, then implement a Dictionary to replace your map.
In general terms, I always advocate the simplest, cleanest approach that gets the job done. Too many people obsess about memory or performance hits in algorithms, when the actual bottleneck in their code lies elsewhere.

Hash table with two keys

I have a large amount of data the I want to be able to access in two different ways. I would like constant time look up based on either key, constant time insertion with one key, and constant time deletion with the other. Is there such a data structure and can I construct one using the data structures in tr1 and maybe boost?
Use two parallel hash-tables. Make sure that the keys are stored inside the element value, because you'll need all the keys during deletion.
Have you looked at Bloom Filters? They aren't O(1), but I think they perform better than hash tables in terms of both time and space required to do lookups.
Hard to find why you need to do this but as someone said try using 2 different hashtables.
Just pseudocode in here:
Hashtable inHash;
Hashtable outHash;
//Hello myObj example!!
myObj.inKey="one";
myObj.outKey=1;
myObj.data="blahblah...";
//adding stuff
inHash.store(myObj.inKey,myObj.outKey);
outHash.store(myObj.outKey,myObj);
//deleting stuff
inHash.del(myObj.inKey,myObj.outKey);
outHash.del(myObj.outKey,myObj);
//findin stuff
//straight
myObj=outHash.get(1);
//the other way; still constant time
key=inHash.get("one");
myObj=outHash.get(key);
Not sure, thats what you're looking for.
This is one of the limits of the design of standard containers: a container in a sense "own" the contained data and expects to be the only owner... containers are not merely "indexes".
For your case a simple, but not 100% effective, solution is to have two std::maps with "Node *" as value and storing both keys in the Node structure (so you have each key stored twice). With this approach you can update your data structure with reasonable overhead (you will do some extra map search but that should be fast enough).
A possibly "correct" solution however would IMO be something like
struct Node
{
Key key1;
Key key2;
Payload data;
Node *Collision1Prev, *Collision1Next;
Node *Collision2Prev, *Collision2Next;
};
basically having each node in two different hash tables at the same time.
Standard containers cannot be combined this way. Other examples I coded by hand in the past are for example an hash table where all nodes are also in a doubly-linked list, or a tree where all nodes are also in an array.
For very complex data structures (e.g. network of structures where each one is both the "owner" of several chains and part of several other chains simultaneously) I even resorted sometimes to code generation (i.e. scripts that generate correct pointer-handling code given a description of the data structure).