C++ implementation of game tree - c++

I'm going to represent a chess game as C++ structure. I think, that the best option will be a tree structure(cause at each deep we have several possible moves).
Is it a good approach?
struct TreeElement{
SomeMoveType move;
TreeElement *parent;
std::vector<TreeElement*> children;
};
How to efficiency store this kind of data in a file? Is there a way to make sure, that the whole tree structure will be stored in the same part of memory, which will allow to use mmap function?

To store the data in the same section of memory, you probably want to supply an Allocator object for the std::vector<TreeElement *> you use, and have that allocate from your block.
To be able to serialize it, instead of storing actual pointers, you might consider storing offsets in the block. Then when you read the data back in, you can add the address of the start of the block to each offset to turn it back into an address.
Depending on the OS/compiler you're using, there may be some support for this already. For example, Microsoft's compiler supports __based pointers, that are pretty much what I've described: a base address, and each pointer based off that address is really just an offset, not a full pointer. The mention of mmap indicates that's probably not available to you directly, but it's possible that the compiler/OS you're using has something similar. Otherwise, you'll probably have to do the job on your own (e.g., with a based_pointer class).
The real question is why you're trying to serialize a move-tree at all. In a typical case, you're better off just saving the current board position (or, about equivalently, the move history) and re-generating the move tree from that when/if needed. That's enough smaller that it's really easy to store.

I think using a std::vector is not the best way to create the tree, singly linked lists style tree construction is probably simpler and best.

Related

C++ (semi) Reflection for file save/load? (Hack?)

I have a bunch of structs in C++. I'd like to save it to file and load them up again. Problem is a few of my structs are pointers to base classes(/structs). So i'd need a way to figure out the type and create it. They really are just POD, they all have public members and no constructors.
What is the easiest way to save and load them from file? I have a LOT of structs and the only types i use are ints, pointers or c strings. I am thinking i could do some macro hacks. But really i have no idea what i should do.
Have you tried the Boost serialization library?
Don't roll your own here - use something well-developed and tested. One idea is Protocol Buffers
The pointers pose a specific issue: I suppose that multiple struct may actually refer to the same pointer and that you'd like a single pointer to be recreated when deserializing...
The first idea, to avoid boiler-plate code, is to create a compile-time reflexion tool:
BOOST_FUSION_ADAPT_STRUCT
BOOST_FUSION_ADAPT_STRUCT_NAMED
Those 2 macros will generate some wicked information on your struct so that you can then use them with Fusion algorithms, which cross the gap between compile-time and run-time.
Now, you need something that will be able to serialize and deserialize your data. Deserialization is usually a bit more difficult, though here you have the advantage of no polymorphism (which always makes things difficult).
Normally, on a first pass you identify the graph of objects to serialize, assign them all an ID, and use this ID in lieu of the pointer when serializing. For deserializing, you use a 3-columns map:
the map is ID -> (pointer to allocated object, list of pointers that could not be set)
allocate all objects, filling the ID map with a pointer to the allocated object each time
when you need to deserialize an ID, look it up in the map, if absent put a pointer to your pointer in the corresponding list
when you put the pointer to the allocated object in the map, take the time to fill all 'not set' pointers (and remove the list at the same time)
Of course, it's better to have frameworks handling it for you. You may try out s11n, if I remember correctly it handles cycles of references.

Is it possible to write a truly generic disk-baked B+Tree implementation?

I wrote a generic in-memory B+Tree implementation in C++ few times ago, and I'm thinking about making it persistent on disk (which is why B+Tree have been designed for initially).
My first thought was to use mmap (I'm under Linux) to be able to manipulate the file as normal memory and just rewrite the new operator of my nodes classes so that it returns pointers in the mapped portion and create a smart pointer which can convert RAM adresses to file offset to link my nodes with others.
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types or aggregated types that do not contain pointers that's all good, but as soon as the object contains a pointer/reference to an heap allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My personnal searches on the topic end up unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and use whatever existing code already exists to get a string representation of what you want.
Pro: re-use loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: re-write so many output functions... writing overloads for new types by clients is a pain because they shouldn't write functions that fall in your library's namespace... unless you resort to Koenig (argument dependant) lookup!
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control on output and format that a function, may contain meta data and all.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: have the btree_traits<> detault to use operator<< unless someone overrides the traits?
You cannot write a truly generic and transparent version since if the pointer in a non-trivial item was allocated with malloc (or new and new[]), then it's already in the heap.
A non-transparent sollution may be serializing the class is an option, and this can be done relatively easy. Before you store the class you'd have to call the serialization function and before pulling it you'd call the deserialize. Boost has good serialization features that you could make work with your B+Tree.
Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The function you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data, and a method to load data. Any class that wants to be persisted in your B-tree then simply implements this interface.

Pointers to objects in a set or in a vector - does it matter?

just came a across a situation where I needs to store heap-allocated pointers (to a class B) in an STL container. The class that owns the privately held container (class A) also creates the instances of B. Class A will be able to return a const pointers to B instances for clients of A.
Now, does it matter if these pointer are stored in a set or a vector? I thought of having a set just to verify that no duplicates are stored but since addresses are stored, two B pointers with the same data can be stored (unless I provide a comparison class for data comparison I presume).
Any thoughts on this (quite vague) subject? What are the pros/cons for the alternatives? Are smart_pointers something to look into?
Please ask me if anything imperative is unclear, thank you!
There's nothing wrong with storing pointers in a standard container - be it a vector, set, map, or whatever. You just have to be aware of who owns that memory and make sure that it's released appropriately. When choosing a container, choose the container that makes the most sense for your needs. vector is great for random access and appending but not so great for inserting elsewhere in the container. list deals with insertions extremely well, but it doesn't have random access. Sets ensure that there are no duplicates in the container and it's sorted (though the sorting isn't very useful if the set holds pointers and you don't give a comparator function) whereas a map is a set of key-value pairs, so sorting and access is done by key. Etc. Etc. Every container has its pros and cons and which is best for a particular situation depends entirely on that situation.
As for pointers, again, having pointers in containers is fine. The issue that you need to worry about is who owns the memory and therefore must worry about freeing it. If there is a clear object that owns what a particular pointer points to, then it should probably be that object which frees it. If it's essentially the container which owns the memory, then you need to make sure that you delete all of the pointers in the container before the container is destroyed.
If you are concerned with there being multiple pointers to the same data floating around or there is no clear owner for a particular chunk of memory, then smart pointers are a good solution. Boost's shared_ptr would probably be a good one to use, and shared_ptr will be part of C++0x. Many would suggest that you should always use shared pointers, but there is some overhead involved and whether it's best for your particular application will depend entirely on your application.
Ultimately, you need to be aware of the strengths and weaknesses of the various container types and determine what the best container is for whatever you're doing. The same goes for how to deal with pointer management. You need to write your program in a way that it's clear who owns a particular chunk of memory and make sure that that owner frees it when appropriate. Shared pointers are just one solution for that (albeit an excellent one). What the best solution is depends on the particulars of your program.
Why would there be any duplicates in the first place? If class A is the sole entity responsible for creating the instances, and it holds the container privately, meaning there's no way for others to mutate it, it seems to me that there should be no cause for duplicates. Well, if there is, won't that be remediable by some checking prior to adding the pointer to the vector?
I don't know why it would matter if you store a pointer in what kind of container. Containers don't really manipulate their data, they only provide access to them in different ways. So, it's up to you :)
If you need to store pointers in stl containers, use shared_ptr.
Now, the set sounds completely wrong. What are you going to do with those?
If you need to add and remove, then list.
If you need to iterate over range, or all, then vector.
If you need to access specific, knowing a key, then map.
Take a look at others as well. One size doesn't fit all.
My answer is that any decisions that you make have be be with your goal in mind. If you need a 'no duplicates allowed' rule that a set enforces, then use a set. If not then you might want to use a vector or any container might do the trick.
As for smart_pointers yes, they are really really useful. Should they be used? I don't know, once again I don't know what your end goal is or the problem that you are trying to solve with them.
Basically it comes down this. If I said "I want to use a hammer. What do you think of that?" you would probably say "Well what for, I know that hammers are pretty good for nails and wood scenarios but they could also be used as a tool to hurt people or maybe as a book stand. Look, just wait a second, what is this for again?" The problem is that I have not, really said why I want to use a hammer. I have not said what goal I am trying to achieve.
So if you have sort of an overall goal then why not let us know, then it will be obvious if you are using the right tools for the job and we can help you more.
In my opinion, stick with vector unless you have a real reason not to. Sets come with some runtime overhead as well as quite large semantic overhead compared to a vector.

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo>> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data.
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. this way you will have full control over this rather than using the shared_ptr's one which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

Locating objects (structs) in memory - how to?

How would you locate an object in memory, lets say that you have a struct defined as:
struct POINT {
int x;
int y;
};
How would I scan the memory region of my app to find instances of this struct so that I can read them out?
Thanks R.
You can't without adding type information to the struct. In memory a struct like that is nothing else than 2 integers so you can't recognize them any better than you could recognize any other object.
You can't. Structs don't store any type information (unless they have virtual member functions), so you can't distinguish them from any other block of sizeof(POINT) bytes.
Why don't you store your points in a vector or something?
You can't. You have to know the layout to know what section of memory have to represent a variable. That's a kind of protocol and that's why we use text based languages instead raw values.
You don't - how would you distinguish two arbitrary integers from random noise?
( but given a Point p; in your source code, you can obtain its address using the address-of operator ... Point* pp = &p;).
Short answer: you can't. Any (appropriately aligned) sequence of 8 bytes could potentially represent a POINT. In fact, an array of ints will be indistinguishable from an array of POINTS. In some cases, you could take advantage of knowledge of the compiler implementation to do better. For instance, if the struct had virtual functions, you could look for the correct vtable pointer - but there could also be false positives.
If you want to keep track of objects, you need to register them in their constructor and unregister them in their destructor (and pay the performance penalty), or give them their own allocator.
There's no way to identify that struct. You need to put the struct somewhere it can be found, on the stack or on the heap.
Sometimes data structures are tagged with identifying information to assist with debugging or memory management. As a means of data organization, it is among the worst possible approaches.
You probably need to a lot of general reading on memory management.
There is no standard way of doing this. The platform may specify some APIs which allow you to access the stack and the free store. Further, even if you did, without any additional information how would you be sure that you are reading a POINT object and not a couple of ints? The compiler/linker can read this because it deals with (albeit virtual) addresses and has some more information (and control) than you do.
You can't. Something like that would probably be possible on some "tagged" architecture that also supported tagging objects of user-defined types. But on a traditional architecture it is absolutely impossible to say for sure what is stored in memory simply by looking at the raw memory content.
You can come closer to achieving what you want by introducing a unique signature into the type, like
struct POINT {
char signature[8];
int x;
int y;
};
and carefully setting it to some fixed and "unique" pattern in each object of POINT type, and then looking for that pattern in memory. If it is your application, you can be sure with good degree of certainty that each instance of the pattern is your POINT object. But in general, of course, there will never be any guarantee that the pattern you found belongs to your object, as opposed to being there purely accidentally.
What everyone else has said is true. In memory, your struct is just a few bytes, there's nothing in particular to distinguish it.
However, if you feel like a little hacking, you can look up the internals of your C library and figure out where memory is stored on the heap and how it appears. For example, this link shows how stuff gets allocated in one particular system.
Armed with this knowledge, you could scan your heap to find allocated blocks that were sizeof(POINT), which would narrow down the search considerably. If you look at the table you'll notice that the file name and line number of the malloc() call are being recorded - if you know where in your source code you're allocating POINTs, you could use this as a reference too.
However, if your struct was allocated on the stack, you're out of luck.