A design issue with a tree of information in C++

Sorry in advance for the lengthy explanation!
I have a C++ application that uses a hash_map to store a tree of information parsed from a text file. Each value in the map is either a child hash_map or a string.
I wanted to avoid passing the strings and maps by value (as copies) to the hash map's assignment function, so while parsing the file I allocated each value with new string() or new hash_map() and stored the resulting pointer in the map as "arbitrary" data (a void*).
However, this poses a pretty big problem at clean-up time: deleting through a void* can't call the right destructor (it is actually undefined behaviour, which makes sense). I looked for an easy solution by creating an Object class with child classes called StringObj and HashMap, which store their respective data; since the hash_map value type is now a pointer to an Object, the appropriate destructor is called (provided Object's destructor is virtual).
Is there an easier way to solve this? I looked into dynamic casting and thought it might work well, since I could catch the exception from the failed cast, and treat it appropriately, but I can't help but feel there might be a simpler solution, or that I'm over-complicating it a bit.
Suggestions?
Thanks in advance,
Jengerer

Use boost::variant (roughly a type-safe C++ union that also works for user-defined types), a plain C++ union (applicable in this case because you're only storing pointers, though you then need a separate tag to know which member is active), or boost::any (which can store any type) to store a pointer to either a hash_map or a string.
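A minimal sketch of the variant idea, assuming a C++17 compiler: it uses std::variant, the standard-library counterpart of boost::variant, and std::map in place of the non-standard hash_map. The names Node, Children and countLeaves are made up for illustration. Because children are owned through std::unique_ptr, destroying the root frees the whole tree without any void* deletes.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>

struct Node;  // forward declaration so the child map can refer to Node

// Children are held through unique_ptr, so a whole subtree is freed
// automatically when its parent goes away (no void* deletes needed).
using Children = std::map<std::string, std::unique_ptr<Node>>;

// Each node holds exactly one of: a leaf string, or a map of child nodes.
struct Node {
    std::variant<std::string, Children> value;
};

// Counts the string leaves in a tree; holds_alternative asks which type is active.
std::size_t countLeaves(const Node& n) {
    if (std::holds_alternative<std::string>(n.value))
        return 1;
    std::size_t total = 0;
    for (const auto& entry : std::get<Children>(n.value))
        total += countLeaves(*entry.second);
    return total;
}
```

Nodes can be built with Node{std::string("leaf")} or Node{Children{}} and read back with std::get or std::holds_alternative; a wrong std::get throws std::bad_variant_access instead of silently misreading memory.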

One other option is to store a std::pair<hash_map*, string*> for each entry in the hash map, setting the unused pointer in each pair to NULL so you can tell which one is in use.
It's debatable whether that's neater than your approach, although I would hazard that it's less code, since you don't need definitions of Object, StringObj and HashMap.

Related

Scenarios where we are forced to use pointers in C++

I was in an interview and was asked to give an example or scenario in C++ where we can't proceed without pointers, i.e. where we necessarily have to use a pointer.
I gave the example of a function returning an array whose size is not known: then we need to return a pointer, that is, the name of the array, which is actually a pointer. But the interviewer said that is internal to arrays and asked for some other example.
So please help me with some other scenarios.
If you are using a C library with a function that returns a pointer, you have to use pointers rather than references.
There are many other cases (explicitly managing memory, for instance), but these two came to my mind first:
linked data-structures
How: You need to reference parts of your structure from multiple places. You use pointers for that, because the standard containers (which also use pointers internally) do not cover all your data-structure needs. For example,
class BinTree {
    BinTree *left, *right;
public:
    // ...
};
Why there is no alternative: there are no generic tree implementations in the standard library (not counting the sorted associative containers such as std::map and std::set).
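A small sketch of such a linked structure, assuming a binary search tree of ints. The insert, contains and size members are illustrative additions; the answer only shows the two child pointers. Each node owns its children through raw pointers and deletes them in its destructor, which is also why copying is disabled.

```cpp
// Hypothetical minimal binary search tree of ints, linked by raw pointers.
class BinTree {
    int value;
    BinTree *left = nullptr, *right = nullptr;  // null means "no child"
public:
    explicit BinTree(int v) : value(v) {}
    ~BinTree() { delete left; delete right; }   // recursively frees the subtree
    BinTree(const BinTree&) = delete;            // a shallow copy would double-delete
    BinTree& operator=(const BinTree&) = delete;

    void insert(int v) {
        // Pick which child pointer to follow; binding a reference lets us fill it in place.
        BinTree*& slot = (v < value) ? left : right;
        if (slot) slot->insert(v);
        else slot = new BinTree(v);
    }

    bool contains(int v) const {
        if (v == value) return true;
        const BinTree* next = (v < value) ? left : right;
        return next != nullptr && next->contains(v);
    }

    int size() const {
        return 1 + (left ? left->size() : 0) + (right ? right->size() : 0);
    }
};
```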
pointer-to-implementation pattern (pimpl)
How: Your public .hpp file declares the methods but refers to internal state only through an opaque Whatever *; your internal implementation knows what that actually is and can access its fields. See:
Is the pImpl idiom really used in practice?
Why there is no alternative: if you provide your implementation in binary-only form, users of the header cannot access internals without decompiling/reverse engineering. It is a much stronger form of privacy.
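A single-file sketch of pimpl, with the header part and the implementation part marked by comments. Widget, Impl and the name member are hypothetical, and it uses std::unique_ptr rather than a bare Whatever * so the implementation is freed automatically. The destructor has to be defined where Impl is a complete type, which is why it is only declared in the "header" part.

```cpp
#include <memory>
#include <string>

// ---- widget.hpp: all that users of the library see ----
class Widget {
public:
    Widget();
    ~Widget();                      // defined below, where Impl is complete
    void setName(const std::string& n);
    std::string name() const;
private:
    struct Impl;                    // opaque: declared but not defined here
    std::unique_ptr<Impl> impl;
};

// ---- widget.cpp: can be shipped in binary-only form ----
struct Widget::Impl {
    std::string name;               // private state invisible to the header
};

Widget::Widget() : impl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
void Widget::setName(const std::string& n) { impl->name = n; }
std::string Widget::name() const { return impl->name; }
```

Changing the fields of Impl does not change the header, so client code does not even need to be recompiled.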
Anyplace you would want to use a reference but have to allow for null values.
This is common in libraries: if you pass a non-null pointer, the function sets the pointed-to value; if you pass null, that result is skipped.
It is also a convention to pass a function argument that will be changed as a pointer rather than a reference, to emphasize to the caller that the value can change.
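A sketch of that convention with a hypothetical parseDigits function: both outputs are pointers, and the caller passes nullptr for any result it does not need. It ignores overflow, so it is only meant for short inputs.

```cpp
#include <cstddef>
#include <string>

// Parses a string of decimal digits. Outputs are optional pointers:
// value receives the number, errorPos the index of the first bad character.
bool parseDigits(const std::string& s, int* value, std::size_t* errorPos = nullptr) {
    int result = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9') {
            if (errorPos) *errorPos = i;   // written only if the caller asked for it
            return false;
        }
        result = result * 10 + (s[i] - '0');
    }
    if (value) *value = result;            // null pointer: caller didn't want the value
    return true;
}
```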
Here are some cases:
Objects with a long lifetime. You create some object in a function and need that object (not a copy of it) afterwards.
But if you create it without pointers, on the stack, the object dies when the function returns. So you need to create it in dynamic memory and return a pointer to it.
Not enough stack space. You need an object that requires a lot of memory, and allocating it on the stack won't work, since the stack is usually much smaller than the heap. So again you create the object in dynamic memory and return a pointer to it.
You need reference semantics. You pass a structure to some function and want the function to modify it. In this case you need to pass a pointer to the structure; otherwise a copy of it is passed and you can't modify the original.
Note: in this last case a pointer is not strictly necessary, since you can use a reference instead.
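A sketch of the first two cases, assuming a hypothetical Grid type of about 8 MB (1000 x 1000 doubles). That is larger than many default stacks (commonly somewhere around 1 to 8 MB), so it is created on the heap; returning it through std::unique_ptr keeps the object alive after the function returns and frees it automatically when the caller is done with it.

```cpp
#include <memory>

// Hypothetical large object: 1000 * 1000 doubles, roughly 8 MB.
struct Grid {
    double cells[1000][1000];
};

// Allocates the Grid on the heap, so it outlives this function call.
// unique_ptr transfers ownership to the caller and deletes it later.
std::unique_ptr<Grid> makeGrid(double fill) {
    auto g = std::make_unique<Grid>();
    for (auto& row : g->cells)
        for (double& c : row)
            c = fill;
    return g;
}
```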
PS. You can browse here for more scenarios, and decide in which cases are pointer usages necessary.
Pointers are important for performance, and functions are an example of this. Normally, when you pass a value to a function, the argument's value is copied into the parameter,
but with a pointer you can access the original indirectly and do what you want with it.

C++: What is the recommended way to distinguish between a string that never had its value set and an empty string?

We have a Java messaging API which we are translating to C++. The messages typically have simple data types, like string, int, double, etc. When a message is constructed, we initialize all the member variables to a default value which the API recognizes as a "null" value (i.e. never set to any value), e.g. Integer.MAX_VALUE for int types. Any fields which are considered null are not serialized and sent.
In Java, strings automatically initialize to null so it's easy to differentiate between a string field which is null versus a string which is empty string (which is a legal value to send in the message).
I'm not sure of the best way to handle this in C++, since the strings automatically initialize to an empty string, and empty string is a legal value to send over the API. We could default strings to some control character (which would not be a legal value in our API), but I'm wondering if there is a more conventional or better way to do this.
We're all new here to C++, so we may have overlooked some obvious approach.
The recommended way is to make sure the object doesn't exist until it has a valid value. If a message with a null string isn't valid, why allow it?
You can't avoid it in Java, because a string can always be null.
But C++ gives you the tool to create a class which is guaranteed to always hold a string.
And it sounds like that's what you want.
For what you're asking for, the best approach is really to build into the class the invariant that objects of this class always have a string set. Instead of setting all the objects to some default value in the constructor, define the constructor to take the actual parameters and set the members to valid values.
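A sketch of that invariant with a hypothetical Message class holding one string and one int: there is no default constructor, so every Message that exists has had its name set (possibly to the empty string, which stays a legal value).

```cpp
#include <string>
#include <utility>

// Hypothetical message type: the only way to build one is to supply
// every field, so a Message can never exist with an "unset" name.
class Message {
public:
    Message(std::string name, int count)
        : name_(std::move(name)), count_(count) {}

    const std::string& name() const { return name_; }  // may be "", never unset
    int count() const { return count_; }

private:
    std::string name_;
    int count_;
};
```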
However, if you want to specify an "optional" value, there are a couple of broad approaches:
either use a pointer (preferably a smart pointer). A pointer to a string can be null, or it can point to a valid string (which, again, may or may not be empty)
alternatively, use something like boost::optional from the Boost libraries. This is a clever little utility template which lets you define, well, optional values (the object may contain a string, or it may be null)
or you could simply add a bool flag (something like has_string, which, when not set, indicates that no string has been set, and the string value should be disregarded).
Personally, I'd prefer the last two approaches, but all three are fairly commonly used, and will work just fine. But the best approach is the one in which you design the class so that the compiler can guarantee that it'll always be valid. If you don't want messages with a null string, let the compiler ensure that messages will never have a null string.
To replicate Java's "things can have values, or lack values", probably the most general way is to store a boost::optional<T>, or, since C++17, a std::optional<T>.
You do have to throw in some * and -> if you want to read their values, and be careful about optional<bool> because its default conversion to bool is "am I initialized or not?", not the bool that is stored. But operator= does pretty much what you want it to when writing to it, it is just reading from it that can do unexpected things in a bool context.
To tell if an optional<T> is initialized, just evaluate it in a bool context like you might a pointer. To extract its value after you have confirmed it is initialized, use the unary * operator.
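A sketch using std::optional (C++17), assuming a hypothetical Msg struct and a made-up key=value output format. Unset fields are skipped, an empty string is still sent, and the urgent field shows the optional<bool> trap: the condition tests whether it was set, while *m.urgent reads the stored value.

```cpp
#include <optional>
#include <string>

// Hypothetical message mirroring the Java API: an empty optional means
// "never set"; an engaged optional may still hold "" or false.
struct Msg {
    std::optional<std::string> text;
    std::optional<int> count;
    std::optional<bool> urgent;
};

// Emits only the fields that were set, in a made-up key=value; format.
std::string serialize(const Msg& m) {
    std::string out;
    if (m.text)  out += "text=" + *m.text + ";";       // bool context: is it set?
    if (m.count) out += "count=" + std::to_string(*m.count) + ";";
    // `if (m.urgent)` only asks whether urgent was set, not whether it is true.
    if (m.urgent) out += std::string("urgent=") + (*m.urgent ? "1" : "0") + ";";
    return out;
}
```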
boost is a relatively high-quality library, and a lot of its code migrates into the C++ standard within 5 to 10 years. It does contain some scary parts (like phoenix!), and in general you should make sure that whatever component you are using isn't already in the C++ standard library (having migrated there). boost::optional in particular is one of the header-only libraries, which are easier to use (you don't have to build boost to use them).

Dynamic type dereferencing?

While attempting to answer another question, I was struck by a bout of curiosity and wanted to find out whether an idea was possible.
Is it possible to dynamically dereference a void * pointer (assume it points to a valid, dynamically allocated object) or some other type at run time, so that it yields the correct type?
Is there some way to store a supplied type (as in, the class knows the void * points to an int), if so how?
Can said stored type (if possible) be used to dynamically dereference?
Can a type be passed on its own as an argument to a function?
Generally, the concept (no code available) is a doubly-linked list of void * pointers (or similar) to dynamically allocated space, where each element also keeps a record of the type it holds for later dereferencing.
1) Dynamic references:
No. Instead of having your variables hold just pointers, have them hold a struct containing both the actual pointer and a tag defining what type the pointer is pointing to
struct Ref {
    int tag;
    void *ref;
};
and then, when "dereferencing", first check the tag to find out what you want to do.
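A sketch of the tagged-pointer approach; the Tag enumerators and the toText function are hypothetical. The tag is checked first, and only then is the void* cast back to the type it really points to.

```cpp
#include <string>

// Hypothetical tag values; the answer's Ref uses a plain int tag.
enum Tag { TAG_INT = 0, TAG_DOUBLE = 1, TAG_STRING = 2 };

struct Ref {
    int tag;     // which type ref points to
    void* ref;   // the type-erased pointer
};

// "Dereference" by inspecting the tag, then casting back to the real type.
std::string toText(const Ref& r) {
    switch (r.tag) {
    case TAG_INT:    return std::to_string(*static_cast<int*>(r.ref));
    case TAG_DOUBLE: return std::to_string(*static_cast<double*>(r.ref));
    case TAG_STRING: return *static_cast<std::string*>(r.ref);
    default:         return "<unknown tag>";
    }
}
```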
2) Storing types in your variables, passing them to functions.
This doesn't really make sense, as types aren't values that can be stored and passed around. Perhaps what you want is to pass around a class or a constructor function, and that is certainly feasible.
In the end, C and C++ are bare-bones languages. While a variable assignment in a dynamic language looks a lot like a variable assignment in C (they are just a = after all) in reality the dynamic language is doing a lot of extra stuff behind the scenes (something it is allowed to do, since a new language is free to define its semantics)
Sorry, this is not really possible in C++ due to the lack of type reflection and dynamic binding; dynamic dereferencing in particular can't be done for these reasons.
You could try to emulate its behavior by storing types as enums or std::type_info* pointers, but these are far from practical. They require registration of types, and huge switch..case or if..else statements every time you want to do something with them. A common container class and several wrapper classes might help achieving them (I'm sure this is some design pattern, any idea of its name?)
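A sketch of the std::type_info route, using std::type_index (a hashable wrapper around std::type_info) as the key of a registry. AnyRef, makeRef, registerType and describe are invented names; as the answer warns, every type has to be registered before it can be handled, and unregistered types fall through to a default.

```cpp
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// A type-erased reference: the raw pointer plus a record of its real type.
struct AnyRef {
    const void* ptr;
    std::type_index type;
};

// Records T while the static type is still known at compile time.
template <typename T>
AnyRef makeRef(const T* p) {
    return AnyRef{p, std::type_index(typeid(T))};
}

// Registry: one "describe" function per registered type.
using Describer = std::function<std::string(const void*)>;

std::unordered_map<std::type_index, Describer>& registry() {
    static std::unordered_map<std::type_index, Describer> table;
    return table;
}

// Registration step: wrap f so it receives the void* cast back to const T&.
template <typename T, typename F>
void registerType(F f) {
    registry()[std::type_index(typeid(T))] = [f](const void* p) {
        return f(*static_cast<const T*>(p));
    };
}

// "Dynamic dereference": look up the stored type, then dispatch to its handler.
std::string describe(const AnyRef& r) {
    auto it = registry().find(r.type);
    if (it == registry().end())
        return "<unregistered>";
    return it->second(r.ptr);
}
```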
You could also use inheritance to solve your problem if it fits.
Or perhaps you need to reconsider your current design. What exactly do you need this for?

C++ (semi) Reflection for file save/load? (Hack?)

I have a bunch of structs in C++ that I'd like to save to a file and load again. The problem is that a few of my structs contain pointers to base classes (/structs), so I'd need a way to figure out the type and create it. They really are just POD: they all have public members and no constructors.
What is the easiest way to save and load them from a file? I have a LOT of structs, and the only types I use are ints, pointers and C strings. I am thinking I could do some macro hacks, but really I have no idea what I should do.
Have you tried the Boost serialization library?
Don't roll your own here; use something well developed and tested. One option is Protocol Buffers.
The pointers pose a specific issue: I suppose that multiple structs may actually point to the same object, and that you'd like that single object to be recreated only once when deserializing...
The first idea, to avoid boilerplate code, is to use a compile-time reflection tool:
BOOST_FUSION_ADAPT_STRUCT
BOOST_FUSION_ADAPT_STRUCT_NAMED
These two macros generate metadata about your struct so that you can then use it with Fusion algorithms, which bridge the gap between compile time and run time.
Now, you need something that will be able to serialize and deserialize your data. Deserialization is usually a bit more difficult, though here you have the advantage of no polymorphism (which always makes things difficult).
Normally, on a first pass you identify the graph of objects to serialize, assign each one an ID, and use that ID in place of the pointer when serializing. For deserializing, you use a three-column map:
the map is ID -> (pointer to allocated object, list of pointers that could not be set)
allocate all objects, filling the ID map with a pointer to the allocated object each time
when you need to deserialize an ID, look it up in the map, if absent put a pointer to your pointer in the corresponding list
when you put the pointer to the allocated object in the map, take the time to fill all 'not set' pointers (and remove the list at the same time)
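A sketch of that ID-to-pointer fix-up, done in a single pass over the records. Node, Record and Entry are hypothetical types with one pointer field, and an ID of -1 is assumed to mean a null pointer. Records may arrive in any order and may form cycles; the caller owns the returned objects.

```cpp
#include <map>
#include <vector>

// Hypothetical in-memory type: one outgoing pointer, possibly cyclic.
struct Node {
    int value;
    Node* next;
};

// Serialized form: the pointer is replaced by the target's ID (-1 = null).
struct Record {
    int id;
    int value;
    int nextId;
};

// ID -> (pointer to allocated object, pointers still waiting for that object).
struct Entry {
    Node* object = nullptr;
    std::vector<Node**> pending;
};

std::vector<Node*> deserialize(const std::vector<Record>& records) {
    std::map<int, Entry> table;   // std::map keeps references stable across inserts
    std::vector<Node*> result;
    for (const Record& r : records) {
        Node* n = new Node{r.value, nullptr};
        result.push_back(n);

        // Set this object's pointer now if the target exists, otherwise queue it.
        if (r.nextId >= 0) {
            Entry& target = table[r.nextId];
            if (target.object) n->next = target.object;
            else target.pending.push_back(&n->next);
        }

        // Register the new object and patch every pointer that was waiting for it.
        Entry& self = table[r.id];
        self.object = n;
        for (Node** slot : self.pending) *slot = n;
        self.pending.clear();
    }
    return result;
}
```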
Of course, it's better to have frameworks handling it for you. You may try out s11n, if I remember correctly it handles cycles of references.

C++ Storing objects in a file

I have a list of objects that I would like to store in a file as small as possible for later retrieval. I have been carefully reading this tutorial, and am beginning (I think) to understand, but have several questions. Here is the snippet I am working with:
static bool writeHistory(string fileName)
{
    fstream historyFile;
    historyFile.open(fileName.c_str(), ios::binary);
    if (historyFile.good())
    {
        list<Referral>::iterator i;
        for(i = AllReferrals.begin();
            i != AllReferrals.end();
            i++)
        {
            historyFile.write((char*)&(*i),sizeof(Referral));
        }
        return true;
    } else return false;
}
Now, this is adapted from the snippet
file.write((char*)&object,sizeof(className));
taken from the tutorial. Now what I believe it is doing is converting the object to a pointer, taking the value and size and writing that to the file. But if it is doing this, why bother doing the conversions at all? Why not take the value from the beginning? And why does it need the size? Furthermore, from my understanding then, why does
historyFile.write((char*)i,sizeof(Referral));
not compile? i is an iterator (and isn't an iterator a pointer?). Or simply
historyFile.write(i,sizeof(Referral));
Why do I need to be messing around with addresses anyway? Aren't I storing the data in the file? If the addresses/values persist on their own, why can't I just store the addresses delimited in plain text and then take their values later?
And should I still be using the .txt extension? (Edit: what should I use instead, then? I tried .dtb and was not able to create the file.) I actually can't even seem to get the file to open without errors with the ios::binary flag. I'm also having trouble passing the filename (as a std::string, converted with c_str(); it compiles but gives an error).
Sorry for so many little questions, but it all basically sums up to how to efficiently store objects in a file?
What you are trying to do is called serialization. Boost has a very good library for doing this.
What you are trying to do can work, in some cases, with some very important conditions. It will only work for POD types. It is only guaranteed to work for code compiled with the same version of the compiler, and with the same arguments.
(char*)&(*i)
says to take the iterator i, dereference it to get your object, take the address of it and treat it as an array of characters. This is the start of what is being written to the file. sizeof(Referral) is the number of bytes that will be written out.
And no, an iterator is not necessarily a pointer, although pointers meet all the requirements of an iterator.
Question #1 why does ... not compile?
Answer: Because i is not a Referral*, it's a list<Referral>::iterator; an iterator is an abstraction over a pointer, but it's not a pointer.
Question #2 should I still be using the .txt extension?
Answer: probably not. Many systems associate .txt with the MIME type text/plain.
Unasked Question: does this work?
Answer: if a Referral has any pointers in it, NO. When you read the Referrals back from the file, the pointers will point to the location in memory where something used to live, but there is no guarantee that anything valid is there anymore, least of all the thing the pointers originally pointed to. Be careful.
isn't an iterator a pointer?
An iterator is something that acts like a pointer from the outside. In most (perhaps all) cases, it is actually some form of object instead of a bare pointer. An iterator might contain a pointer as an internal member variable that it uses to perform its job, but it just as well might contain something else or additional variables if necessary.
Furthermore, even if an iterator has a simple pointer inside of it, it might not point directly at the object you're interested in. It might point to some kind of bookkeeping component used by the container class which it can then use to get the actual object of interest. Fortunately, we don't need to care what those internal details actually are.
So with that in mind, here's what's going on in (char*)&(*i).
*i returns a reference to the object stored in the list.
& takes the address of that object, thus yielding a pointer to the object.
(char*) casts that object pointer into a char pointer.
That snippet of code would be the short form of doing something like this:
Referral& r = *i;
Referral* pr = &r;
char* pc = (char*)pr;
Why do I need to be messing around with addresses anyway?
And why does it need the size?
fstream::write is designed to write a series of bytes to a file. It doesn't know anything about what those bytes mean. You give it an address so that it can write the bytes that exist starting wherever that address points to. You give it a size so that it knows how many bytes to write.
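A sketch of that behaviour with a hypothetical Point struct, using string streams instead of a file so nothing touches the disk: write copies exactly sizeof(Point) bytes starting at the object's address, and read copies them back. This only round-trips because Point has fixed-size fields and no pointers.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical POD record: only fixed-size fields, no pointers.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// write() copies sizeof(Point) raw bytes starting at &p; it knows nothing
// about x or y, only the start address and the byte count.
std::string toBytes(const Point& p) {
    std::ostringstream out(std::ios::binary);
    out.write(reinterpret_cast<const char*>(&p), sizeof p);
    return out.str();
}

// read() copies the same number of bytes back into a Point's storage.
Point fromBytes(const std::string& bytes) {
    Point p{};
    std::istringstream in(bytes, std::ios::binary);
    in.read(reinterpret_cast<char*>(&p), sizeof p);
    return p;
}
```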
So if I do:
MyClass ExampleObject;
file.write((char*)&ExampleObject, sizeof(ExampleObject));
Then it writes all the bytes that exist directly within ExampleObject to the file.
Note: As others have mentioned, if the object you want to write has members that dynamically allocate memory or otherwise make use of pointers, then the pointed to memory will not be written by a single simple fstream::write call.
will serialization give a significant boost in storage efficiency?
In theory, binary data can often be both smaller than plain-text and faster to read and write. In practice, unless you're dealing with very large amounts of data, you'll probably never notice the difference. Hard drives are large and processors are fast these days.
And efficiency isn't the only thing to consider:
Binary data is harder to examine, debug, and modify if necessary. At least without additional tools, but even then plain-text is still usually easier.
If your data files are going to persist between different versions of your program, what happens if you need to change the layout of your objects? It can be irritating to write code so that a version 2 program can read objects in a version 1 file. Furthermore, unless you take action ahead of time (like writing a version number into the file), a version 1 program reading a version 2 file is likely to have serious problems.
Will you ever need to validate the data, for instance against corruption or malicious changes? In a binary scheme like this, you'd need to write extra code, whereas with plain text the conversion routines can often help fill the role of validation.
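A sketch of the version-number idea from the list above, with a made-up 4-byte magic tag and version value. The version is written in the machine's native byte order, so, like the raw struct dump, the file is not portable across platforms with different endianness.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical file header: a magic tag, then the format version.
const char kMagic[4] = {'R', 'E', 'F', 'S'};
const std::uint32_t kCurrentVersion = 2;

void writeHeader(std::ostream& out) {
    out.write(kMagic, sizeof kMagic);
    out.write(reinterpret_cast<const char*>(&kCurrentVersion), sizeof kCurrentVersion);
}

// Returns the file's version, or 0 if the header is missing or malformed.
// A reader can then refuse (or convert) versions it doesn't understand.
std::uint32_t readHeader(std::istream& in) {
    char magic[4];
    std::uint32_t version = 0;
    if (!in.read(magic, sizeof magic) || std::string(magic, 4) != std::string(kMagic, 4))
        return 0;
    if (!in.read(reinterpret_cast<char*>(&version), sizeof version))
        return 0;
    return version;
}
```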
Of course, a good serialization library can help out with some of these issues. And so could a good plain-text format library (for instance, a library for XML). If you're still learning, then I'd suggest trying out both ways to get a feel for how they work and what might do best for your purposes.
What you are trying to do (reading and writing raw memory to/from file) will invoke undefined behaviour, will break for anything that isn't a plain-old-data type, and the files that are generated will be platform dependent, compiler dependent and probably even dependent on compiler settings.
C++ doesn't have any built-in way of serializing complex data. However, there are libraries that you might find useful. For example:
http://www.boost.org/doc/libs/1_40_0/libs/serialization/doc/index.html
Have you already had a look at boost::serialization? It is robust, well documented, supports versioning, and makes it easier to switch to an XML format instead of a binary one if you want to.
fstream::write simply writes raw data to a file. The first parameter is a pointer to the starting address of the data; the second is the length of the data in bytes, so write knows how many bytes to write.
file.write((char*)&object,sizeof(className));
This line converts the address of object to a char pointer.
historyFile.write((char*)i,sizeof(Referral));
This line tries to convert an object (the iterator i) into a char pointer, which is not valid.
historyFile.write(i,sizeof(Referral));
This line passes write an object, when it expects a char pointer.
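Putting these answers together, here is a corrected sketch of the question's writeHistory plus a matching readHistory. The Referral layout is invented (the question never shows it), and passing the list as a parameter instead of using the global AllReferrals is a change made here. It also adds ios::out: with std::fstream, ios::binary on its own is not a valid open mode, which may be why the file refused to open.

```cpp
#include <fstream>
#include <list>
#include <string>
#include <type_traits>

// Hypothetical Referral: fixed-size fields only. Dumping raw bytes is only
// reasonable for trivially copyable types without pointers or std::string
// members, read back by the same build of the program.
struct Referral {
    int id;
    double amount;
    char name[32];
};
static_assert(std::is_trivially_copyable<Referral>::value,
              "raw byte I/O needs a trivially copyable type");

bool writeHistory(const std::string& fileName, const std::list<Referral>& referrals) {
    // ofstream adds ios::out by itself; with std::fstream it must be given explicitly.
    std::ofstream file(fileName, std::ios::out | std::ios::binary);
    if (!file) return false;
    for (const Referral& r : referrals)        // r is a Referral&, not an iterator
        file.write(reinterpret_cast<const char*>(&r), sizeof(Referral));
    return static_cast<bool>(file);            // false if any write failed
}

bool readHistory(const std::string& fileName, std::list<Referral>& referrals) {
    std::ifstream file(fileName, std::ios::in | std::ios::binary);
    if (!file) return false;
    Referral r;
    while (file.read(reinterpret_cast<char*>(&r), sizeof(Referral)))
        referrals.push_back(r);
    return file.eof();   // stopped because the data ran out, not because of an error
}
```

Any extension works for such a file; the program never looks at it, only at the bytes.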