Odd behaviour from Array in BinaryHeap (D)

I have a tree-like structure using Node objects with references to other Node objects. Node is a class. Now, one of the routines I'm writing needs a minimum priority queue, which I'm implementing using std.container.BinaryHeap and std.container.Array. I'm instantiating it as follows:
Node[] r;
auto heap = BinaryHeap!(Array!(Node), "a > b")(Array!Node(r));
As part of the routine, I insert elements into heap using insert and remove elements from it using removeAny. Now, the routine works correctly, but afterwards, the tree-like structure breaks (my invariants for it fail), due to nodes being missing. What's going on here and why is this happening?

This could be http://d.puremagic.com/issues/show_bug.cgi?id=6998 ("std.container.Array destroys class instances").

Related

8 Puzzle: Sorting STL Heap/Priority Queue containing pointers to objects by member variable

I'm working on implementing a Best-First Search algorithm to solve an 8-Puzzle problem for an assignment. Based on the requirements, it must be implemented using a (min) Priority Queue or Heap located in the Standard Template Library (STL).
I understand that it would be useful to use either data structure to organise expanded puzzle states by best heuristic cost (i.e. the smallest cost).
Beginning with a 3x3 matrix (implemented using an array)
Puzzle *current=new Puzzle(initialState, goalState);
Each new puzzle state (an object) is created using:
Puzzle *next=(current->moveDown());
Puzzle *next=(current->moveRight());
Puzzle *next=(current->moveUp());
Puzzle *next=(current->moveLeft());
I'd like to .push(next) onto a (Min) Priority Queue or Heap (of Puzzle*), sorted according to next->FCost.
Generically, is there a way that I can use either of these STL data structures to contain pointers to objects - sorted by a member variable (FCost) specific to each object?
Yes, you can specify a custom compare function for priority_queue
auto cmp = [](Puzzle* a, Puzzle* b) {
    return a->FCost > b->FCost;
};
std::priority_queue<Puzzle*, std::vector<Puzzle*>, decltype(cmp)> queue(cmp);

Two priority queues for same pointers in c++

I have class:
class A {
    // fields, methods
};
I need an efficient data structure that lets me pick both the minimum and the maximum from a collection of pointers to objects of class A (it must work online, that is, queries will alternate with requests to add new pointers). This can be done by using two priority queues:
priority_queue<A*, vector<A*>, ComparatorForFindingLightestObjects>* qL;
priority_queue<A*, vector<A*>, ComparatorForFindingHardestObjects>* qH;
The problem is that if the object pointer is extracted from the first queue, then after a while the object is destroyed, but since a pointer to the object is still present in another queue there happens errors of reading data from the freed memory.
How can I solve this problem using standard STL containers, without writing my own data structures?
I believe you're looking for boost::multi_index, which is a single container accessible via multiple different "views": http://www.boost.org/doc/libs/1_59_0/libs/multi_index/doc/index.html
I think you can use std::set and delete the entry from the second set as soon as you extract the data from the first. Performance-wise, both give O(log(n)) lookup and insertion. I'm not sure if this is exactly what you want, but I'll try:
// Use std::set as your priority queue instead
set<A*, ComparatorForFindingLightestObjects> qL;
set<A*, ComparatorForFindingHardestObjects> qH;
auto it = qL.begin(); // the first (lightest) element
if (it != qL.end())
{
    A* curr = *it;
    qL.erase(curr); // delete it from this queue
    qH.erase(curr); // delete it from the other queue as well
}
Also, I think you can merge your two queues and just maintain one container: you can access the minimum and maximum elements with *containerName.begin() and *containerName.rbegin() respectively.

multimap representation in memory

I'm debugging my code and at one point I have a multimap which contains pairs of a long and a Note object created like this:
void Track::addNote(Note &note) {
long key = note.measureNumber * 1000000 + note.startTime;
this->noteList.insert(make_pair(key, note));
}
I wanted to check whether these values are actually inserted in the multimap, so I placed a breakpoint, and this is what the multimap looks like (in Xcode):
It seems like I can infinitely expand the elements (my actual multimap is the first element, called noteList). Any ideas whether this is normal, and why I can't read the actual pair values (the long and the Note)?
libstdc++ implements its maps and sets using a generic red-black tree. The nodes of the tree use a base class, _Rb_tree_node_base, which contains pointers to that same type for the parent/left/right nodes.
To access the data, it performs a static cast to the node type specific to the template arguments you provided. You won't be able to see the data in Xcode unless you can force that cast.
It does something similar with linked lists, with a linked list node base.
Edit: It does this to reduce the amount of duplicate code generated by the template. Rather than have an RbTree<Type1>, RbTree<Type2>, and so on, libstdc++ has a single set of operations that work on the base class, and those operations are the same regardless of the underlying type of the map. It only casts when it needs to examine the data, and the actual rotation/rebalance code is the same for all of the trees.
Seems like a bug in the component that renders the collection. About halfway down the list there is an entry that is 0x00000000, but the rendering continues below that, without any valid pointers though. Perhaps you need to add your own common-sense interpretation of the displayed data and treat a null value as the end of that part of the tree.

How to load/save C++ class instance (using STL containers) to disk

I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children.
Abstracted, it looks something like:
struct node {
public:
node_list_iterator parent; // iterator to a single parent node
double node_data_array[X];
map<int,node_list_iterator> children; // iterators to child nodes
};
class strategy {
private:
list<node> tree; // hierarchically linked list of nodes
struct some_other_data;
public:
void build(); // build the tree
void save(); // save the tree to disk
void load(); // load the tree from disk
void use(); // use the tree
};
I would like to implement load() and save() to disk, and they should be fairly fast; however, the obvious problems are:
I don't know the size in advance;
The data contains iterators, which
are volatile;
My ignorance of C++ is prodigious.
Could anyone suggest a pure C++ solution please?
It seems like you could save the data in the following syntax:
File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)
That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.
Reading isn't much harder. std::list<node> iterators are stable: once you've inserted the root node, its iterator will not change, not even when inserting its children. Hence, when you're reading each node you can already set the parent iterator.
This of course leaves you with the child iterators, but those are trivial: each node is a child of its parent. So, after you've read all the nodes, you fix up the child iterators. Start with the second node, the first child (the first node was the root), and iterate to the last child. Then, for each child C, get its parent and add C to its parent's collection. This means you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs into their respective parents, you can discard the vector.
You can use boost.serialization library. This would save entire state of your container, even the iterators.
boost.serialization is a solution, or IMHO, you can use SQLite + Visitor pattern to load and save these nodes, but it won't be easy as it sounds.
Boost Serialization has already been suggested, and it's certainly a reasonable possibility.
A great deal depends on how you're going to use the data -- the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can re-constitute the same tree when needed, or whether you want something like a database so you can load parts of the information into memory as needed, and update records as needed.
If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?
If you do want to be able to traverse the structure on disk, one obvious possibility would be as you write it out, substitute the file offset of the appropriate data in place of the iterator you're using in memory.
Alternatively, since it looks like (at least most of) the data in an individual node has a fixed size, you might create a database-like structure of fixed-size records, and have each record store the record numbers of its parent and children.
Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it was known in advance).
Actually, I think your best option is to move the entire data structure into database tables. That way you get the benefit of people much smarter than you (or me) having dealt with the issues of serialization. It will also keep you from having to worry about whether the structure fits into memory.
I've answered something like this on SO before, so I will summarize:
1. Use a database.
2. Substitute file offsets for links (pointers).
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to create the tree structure, using node names instead of links.
5. This would be so much easier if you used a database like SQLite or MySQL.
When you spend too much time on the "serialization" and less on the primary purpose of your project, you need to use a database.
If you're doing it for persistence, then there are several solutions you can find on the web (e.g. google "persist std::list"), or you can roll your own using mmap to create a file-backed memory area.

What's the simplest and most efficient data structure for building acyclic dependencies?

I'm trying to build a sequence that determines the order to destroy objects. We can assume there are no cycles. If an object A uses an object B during its (A's) construction, then object B should still be available during object A's destruction. Thus the desired order of destruction is A, B. If another object C uses object B during its (C's) construction as well, then the desired order is A, C, B. In general, as long as an object X is only destroyed after all other objects that used that object during their construction, the destruction is safe.
If our destruction order so far is AECDBF, and we now are given an X (we never know beforehand what order the construction will initially happen in; it's discovered on the fly) that uses C and F during its construction, then we can get a new safe order by putting X before whichever is currently earlier in the list, C or F (it happens to be C). So the new order would be AEXCDBF.
In the context of the X example, a linked list seems unsuitable because a lot of linear scanning would be involved to determine which is earlier, C or F. An array will mean slow insertions which is going to be one of the more common operations. A priority queue doesn't really have a suitable interface, there's no, "Insert this item before whichever one of these items is earliest" (we don't know the right priority before hand to make sure it's inserted before the lower priority element and without disturbing other entries).
All objects are constructed, desired order is computed, and the sequence will be iterated once and destructed in order. No other operations need to be done (in fact, after using whatever data structure to determine the order, it could be copied into a flat array and discarded).
Edit: Just to clarify, the first time an object is used is when it is constructed. So if A uses B, then E uses B, when E tries to use B it has already been created. This means a stack won't give the desired order. AB will become ABE when we want AEB.
Edit2: I'm trying to build the order 'as I go' to keep the algorithm in place. I would prefer to avoid building up a large intermediate structure and then converting that to a final structure.
Edit3: I made this too complicated ;p
Since dependencies are always initialised before the objects that depend on them, and remain available until after such objects are destroyed, it should always be safe to destroy objects in strictly reverse order of initialisation. So all you need is a linked list to which you prepend objects as they are initialised and walk on destruction, and for each object to request initialisation of all its dependencies that have not yet been initialised before it initialises itself.
So for initialisation of each object:
initialise self, initialising uninitialised dependencies as we go
add self to front of destruction list (or push self onto stack if you're using a stack)
and for destruction, just walk the linked list from the front forwards (or pop items off stack until empty), destroying as you go. The example in your first paragraph initialised in order B, A, C would thus be destroyed in order C, A, B - which is safe; the example in your edit would be initialised in order B, A, E (not A, B, E since A depends on B), and thus destroyed in order E, A, B, which is also safe.
Store it as a tree
have a node for each resource
have each resource keep a linked list of pointers to the resources that depend on that resource
have each resource keep a count of the number of resources that it depends on
keep a toplevel linked list of the resources that have no dependencies
To generate the order, go through your toplevel linked list
for each resourced processed, add it to the order
then decrement the counts of each resource that depends on it by one
if any count reaches zero, push that resource onto the toplevel list.
When the toplevel list is empty, then you've created a full order.
typedef struct _dependent Dependent;
typedef struct _resource_info ResourceInfo;

struct _dependent
{
    Dependent * next;
    ResourceInfo * rinfo;
};

struct _resource_info
{
    Resource * resource; // whatever user-defined type you're using
    size_t num_dependencies;
    Dependent * dependents;
};
//...
Resource ** generateOrdering( size_t const numResources, Dependent * freeableResources )
{
    Resource ** const ordering = malloc(numResources * sizeof(Resource *));
    Resource ** nextInOrder = ordering;
    if (ordering == NULL) return NULL;
    while (freeableResources != NULL)
    {
        Dependent * const current = freeableResources;
        Dependent * dependents = current->rinfo->dependents;
        // pop from the top of the list
        freeableResources = freeableResources->next;
        // record this as next in order
        *nextInOrder = current->rinfo->resource;
        nextInOrder++;
        free(current->rinfo);
        free(current);
        while (dependents != NULL)
        {
            Dependent * const later = dependents;
            // pop this from the list
            dependents = later->next;
            later->rinfo->num_dependencies--;
            if (later->rinfo->num_dependencies == 0)
            {
                // make eligible for freeing
                later->next = freeableResources;
                freeableResources = later;
            }
            else
            {
                free(later);
            }
        }
    }
    return ordering;
}
To help create the tree, you might also want to have a quick lookup table to map Resources to ResourceInfos.
It sounds like you should try to build a directed, acyclic graph with the pattern just as you described. An adjacency list representation (vector of linked lists, probably, seeing as you're getting new nodes on the fly) should do it.
One thing I'm not clear on: do you need the computation at random times, or only after you've gotten all of the information? I'm assuming the latter, that you can wait until your graph is complete. If that's the case, your question is exactly a topological sort, for which there is an implementation linear in the number of edges and vertices. It is a relatively simple algorithm. I'm a bit turned around by your description (eating lunch makes me slow and sleepy, sorry), but you may in fact need a "reverse" topological sort; the principles are identical. I won't try to explain exactly how the algorithm works (see: slow and sleepy), but I think the application should be clear. Unless I'm entirely wrong, in which case, never mind.
To summarize:
In a sense, you're building up the data structure, a graph, in about as efficient time as you can hope for (it depends on how you're inserting). The graph reflects which objects need to wait on which other objects. Then when you're done building it you run the topological sort, and that reflects their dependencies.
It sounds to me that you have a directed acyclic graph and a topological sort will give you the order of object destruction.
You will probably also need to special handle the case where the graph has cycles (circular dependencies).
Represent it like this: a graph with an edge from A to B if A's destructor must be run after B's. Inserting X now means adding two edges, and that's O(log n) if you keep a sorted index of nodes. To read the destruction order: pick any node, then follow the edges until you cannot any more. That node's destructor can be safely called. Then pick one of the remaining nodes (e.g. the previous node you traversed) and try again.
From what you say, insertions happen often but the sequence is only iterated once, for destruction: this data structure should then be suitable, since it has fast insertions at the cost of slower lookups. Maybe someone else can suggest a faster way to do lookups in this data structure.
This sounds like you're building a tree from the leaves up.
Are you more interested in destroying first-class C++ objects in the right order to avoid dependencies, or in modeling some external, real-world behavior where you're more interested in the algorithm and repeatability?
In the first case, you can use smart, reference-counting pointers (look for shared_ptr, available in Boost and the forthcoming C++ standard) to keep track of your objects, possibly with a factory function. When object A initializes and wants to use object B, it calls B's factory function and gets a smart pointer to B, increasing B's reference count. If C also references B, B's reference count increments again. A and C can be freed in any order, and B must be freed last. If you store shared_ptrs to all of your objects in an unordered data structure, then when you're done running you free the list of all objects, and shared_ptr will take care of the rest, in the right order.
(In this example, A and C are referenced only by the list of all objects, so their reference counts are both 1, while B is referenced by A, C, and the list of all objects, so its reference count is 3. When the list of all objects releases its references, A and C's reference counts go to 0, so they can be freed in any order. B's reference count doesn't go to 0 until A and C are each freed, so it lives until all references to it are gone.)
If you're more interested in the algorithm, you can model the reference counting in your own data structures, which may end up looking something like a directed acyclic graph when you're done.