Internal storage of Classes and Structs - c++

I want to use pointer magikry to save a C++ class using the following method that writes byte data into a file:
result Osp::Io::File::Write (const void *buffer, int length);
Parameters:
buffer — A pointer to the user-supplied buffer that contains byte data to be written
length — The buffer length in bytes
Exceptions:
E_SUCCESS — The method is successful.
E_INVALID_STATE — The file has not been opened as yet.
E_ILLEGAL_ACCESS — The file is not opened for write operation, or access is denied due to insufficient permission.
E_INVALID_ARG — Either of the following conditions has occurred:
The specified buffer contains a null pointer.
The specified buffer length is equal or smaller than 0.
The file handle is invalid (either the file is closed by another method, or the memory is corrupted).
E_STORAGE_FULL — The disk space is full.
E_IO — An unexpected device failure has occurred as the media ejected suddenly or file corruption is detected.
I'd rather not assume that there will be any sort of buffering, although I am confident each byte won't occasion a whole block of flash to be rewritten but I was wondering if there is a niftier way to write all the data fields of a class (and nothing else, eg static fields) by, eg, a pointer to the object (*this)?

In C++ you don't write "raw" objects into files, but rather serialize them. There's no magic, you need to write your serialization code yourself (overloading operators << and >>, for convenience).
You can do it the old C-style by just dumping memory, but in addition to the problems this would generally cause with C (alignment, endian issues when transferring data between systems), you also get the problems introduced by C++ (internal class representation, possible "hidden" data members such as a v-table, etc).
If you want to ensure you read and write reliable data that can be transferred between different systems and/or different pieces of software - you better implement the serialization, and don't look for shortcuts.
You can use libraries like Boost.Serialization for that.

Related

Size of object and C++ standard

Looking around I found many places where the way to get the size of a certain object (class or struct) is explained. I read about the padding, about the fact that virtual function table influences the size and that "pure method" object has size of 1 byte. However I could not find whether these are facts about implementation or C++ standard (at least I was not able to find all them).
In particular I am in the following situation: I'm working with some data which are encoded in some objects. These objects do not hold pointers to other data. They do not inherit from any other class, but they have some methods (non virtual). I have to put these data in a buffer to send them via some socket. Now reading what I mentioned above, I simply copy my objects on the sender buffer, noticing that the data are "serialized" correctly, i.e. each member of the object is copied, and methods do not affect the byte structure.
I would like to know if what I get is just because of the implementation of the compiler or if it is prescribed by the standard.
The memory layouts of classes are not specified in the C++ standard precisely. Even the memory layout of scalar objects such as integers aren't specified. They are up to the language implementation to decide, and generally depend on the underlying hardware. The standard does specify restrictions that the implementation specific layout must satisfy.
If a type is trivially copyable, then it can be "serialised" by copying its memory into a buffer, and it can be de-it serialised back as you describe. However, such trivial serialisation only works when the process that de-serialises uses the same memory layout. This cannot generally be assumed to be the case since the other process may be running on entirely different hardware and may have been compiled with a different (version of) compiler.
You should use POD (plain-old-data). A structure is POD if it hasn't virtual table, some constructors, private methods and many other things.
There is garantee the pod data is placed in memory in declaration order.
There is alignment in pod data. And you should specify right alignment (it's your decision). See #pragma pack (push, ???).

What are opaque identifiers with regards to handles?

I have read this article and I encountered the following
A resource handle can be an opaque identifier, in which case it is
often an integer number (often an array index in an array or "table"
that is used to manage that type of resource), or it can be a pointer
that allows access to further information.
So a handle is either an opaque identifier or a pointer that allows access to further information. But from what I understand, these specific pointers are opaque pointers, so what exactly is the difference between these pointers ,which are opaque pointer, and opaque identifiers?
One of the literal meanings of "opaque" is "not transparent".
In computer science, an opaque identifier or a handle is one that doesn't expose its inner details. This means we can only access information from it by using some defined interface, and can't otherwise access information about its value (if any) or internal structure.
As an example, a FILE in the C standard library (and available in C++ through <cstdio>) is an opaque type. We don't know if it is a data structure, an integer, or anything else. All we know is that a set of functions, like fopen() return a pointer to one (i.e. a FILE *) and other functions (fclose(), fprintf(), ....) accept a FILE * as an argument. If we have a FILE *, we can't reliably do anything with it (e.g. actually write to a file) unless we use those functions.
The advantage of that is it allows different implementations to use different ways of representing a file. As long as our code uses the supplied functions, we don't have to worry about the internal workings of a FILE, or of I/O functions. Compiler vendors (or implementers of the standard library) worry about getting the internal details right. We simply use the opaque type FILE, and pointers to it, and stick to using standard functions, and our code works with all implementations (compilers, standard library versions, host systems)
An opaque identifier can be of any type. It can be an integer, a pointer, even a pointer to a pointer, or a data structure. Integers and pointers are common choices, but not the only ones. The key is only using a defined set of operations (i.e. a specific interface) to interact with those identifiers, without getting our hands dirty by playing with internal details.
A handle is said to be "opaque" when client code doesn't know how to see what it references. It's simply some identifier that can be used to identify something. Often it will be a pointer to an incomplete type that's only defined within a library and who's definition isn't visible to client code. Or it could just be an integer that references some element in some data structure. The important thing is that the client doesn't know or care what the handle is. The client only cares that it uniquely identifies some resource.
Consider the following interface:
widget_handle create_widget();
void do_a_thing(widget_handle);
void destroy_widget(widget_handle);
Here, it doesn't actually matter to the calling code what a widget_handle is, how the library actually stores widgets, or how the library actually uses a widget_handle to find a particular widget. It could be a pointer to a widget, or it could be an index into some global array of widgets. The caller doesn't care. All that matters is that it somehow identifies a widget.
One possible difference is that an integer handle can have "special" values, while pointer handle cannot.
For example, the file descriptors 0,1,2 are stdin, stdout, stderr. This would be harder to pull off if you have a pointer for a handle.
You really you shouldn't care. They could be everything.
Suppose you buy a ticket from person A for an event. You must give this ticket to person B to access the event.
The nature of the ticket is irrelevant to you, it could be:
a paper ticket,
an alphanumerical code,
a barcode,
a QR-Code,
a photo,
a badge,
whatever.
You don't care. Only A and B use the ticket for its nature, you are just carrying it around. Only B knows how to verify the validity and only A know how to issue a correct ticket.
An opaque pointer could be a memory location directly, while an integer could be an offset of a base memory address in a table but how is that relevant to you, client of the opaque handle?
In classic Mac OS memory management, handles were doubly indirected pointers. The handle pointed to a "master pointer" which was the address of the actual data. This allowed moving the actual storage of the object in memory. When a block was moved, its master pointer would be updated by the memory manager.
In order to use the data the handle ultimately referenced, the handle had to be locked, which would prevent it being moved. (There was little concurrency in the system so unless one was calling the operating system or libraries which might, one could also rely on memory not getting moved. Doing so was somewhat perilous however as code often evolved to call something that could move memory inside a place where that was not expected.)
In this design, the handle is a pointer but it is not an opaque type. A generic handle is a void ** in C, but often one had a typed handle. If you look here you'll find lots of handle types that are more concrete. E.g. StringHandle.

Reading and writing class objects to binary file

I would like to know what happens when I write:
object.write((char*)&class_object, sizeof(class_object));
// or
object.read((char*)&class_object, sizeof(class_object));
From what I read so far, the class_object is converted to a pointer. But I don't know how it manages to convert data carried by the object into binary. What does the binary actually represent?
I am a beginner.
EDIT
Could you please explain what really happens when we write the above piece of code? I mean, what actually happens when we write (char*)*S, say where S is the object of a class that I have declared?
Imagine it this way, the class instance is just some memory chunk resting in your RAM, if you convert your class to a char pointer:
SomeClass someClassInstance;
char* data = reinterpret_cast<char*>(&someClassInstance);
It will point to the same data in your memory but it will be treated as a byte array in your program.
If you convert it back:
SomeClass* instance = reinterpret_cast<SomeClass*>(data);
It will be treated as the class again.
So in order to write your class to a file and later reconstruct it, you can just write the data to some file which will be sizeof(SomeClass) in size and later read the file and convert the raw bytes to the class instance.
However, keep in mind that you can only do this if your class is POD (Plain Old Data)!
In practice, your code won't work and is likely to yield undefined behavior, at least when your class or struct is not a POD (plain old data) and contains pointers or virtual functions (so has some vtable).
The binary file would contain the bit representation of your object, and this is not portable to another computer, or even to another process running the same program (notably because of ASLR) unless your object is a POD.
See also this answer to a very similar question.
You probably want some serialization. Since disks and file accesses are a lot slower (many dozen of thousands slower) than the CPU, it is often wise to use some more portable data representation. Practically speaking, you should consider some textual representation like e.g. JSON, XML, YAML etc.... Libraries such as jsoncpp are really easy to use, and you'll need to code something to transform your object into some JSON, and to create some object from a JSON.
Remember also that data is often more costly and more precious than code. The point is that you often want some old data (written by a previous version of your program) to be read by a newer version of your program. And that might not be trivial (e.g. if you have added or changed the type of some field in your class).
You could also read about dynamic software updating. It is an interesting research subject. Be aware of databases.
Read also about parsing techniques, notably about recursive descent parsers. They are relevant.

C++ Storing objects in a file

I have a list of objects that I would like to store in a file as small as possible for later retrieval. I have been carefully reading this tutorial, and am beginning (I think) to understand, but have several questions. Here is the snippet I am working with:
static bool writeHistory(string fileName)
{
fstream historyFile;
historyFile.open(fileName.c_str(), ios::binary);
if (historyFile.good())
{
list<Referral>::iterator i;
for(i = AllReferrals.begin();
i != AllReferrals.end();
i++)
{
historyFile.write((char*)&(*i),sizeof(Referral));
}
return true;
} else return false;
}
Now, this is adapted from the snippet
file.write((char*)&object,sizeof(className));
taken from the tutorial. Now what I believe it is doing is converting the object to a pointer, taking the value and size and writing that to the file. But if it is doing this, why bother doing the conversions at all? Why not take the value from the beginning? And why does it need the size? Furthermore, from my understanding then, why does
historyFile.write((char*)i,sizeof(Referral));
not compile? i is an iterator (and isn't an iterator a pointer?). or simply
historyFile.write(i,sizeof(Referral));
Why do i need to be messing around with addresses anyway? Aren't I storing the data in the file? If the addresses/values are persisting on their own, why can't i just store the addresses deliminated in plain text and than take their values later?
And should I still be using the .txt extension? < edit> what should I use instead then? I tried .dtb and was not able to create the file. < /edit> I actually can't even seem to get file to open without errors with the ios::binary flag. I'm also having trouble passing the filename (as a string class string, converted back by c_str(), it compiles but gives an error).
Sorry for so many little questions, but it all basically sums up to how to efficiently store objects in a file?
What you are trying to do is called serialization. Boost has a very good library for doing this.
What you are trying to do can work, in some cases, with some very important conditions. It will only work for POD types. It is only guaranteed to work for code compiled with the same version of the compiler, and with the same arguments.
(char*)&(*i)
says to take the iterator i, dereference it to get your object, take the address of it and treat it as an array of characters. This is the start of what is being written to the file. sizeof(Referral) is the number of bytes that will be written out.
An no, an iterator is not necessarily a pointer, although pointers meet all the requirements for an iterator.
Question #1 why does ... not compile?
Answer: Because i is not a Referral* -- it's a list::iterator ;; an iterator is an abstraction over a pointer, but it's not a pointer.
Question #2 should I still be using the .txt extension?
Answer: probably not. .txt is associated by many systems to the MIME type text/plain.
Unasked Question: does this work?
Answer: if a Referral has any pointers on it, NO. When you try to read the Referrals from the file, the pointers will be pointing to the location on memory where something used to live, but there is no guarantee that there is anything valid there anymore, least of all the thing that the pointers were pointing to originally. Be careful.
isn't an iterator a pointer?
An iterator is something that acts like a pointer from the outside. In most (perhaps all) cases, it is actually some form of object instead of a bare pointer. An iterator might contain a pointer as an internal member variable that it uses to perform its job, but it just as well might contain something else or additional variables if necessary.
Furthermore, even if an iterator has a simple pointer inside of it, it might not point directly at the object you're interested in. It might point to some kind of bookkeeping component used by the container class which it can then use to get the actual object of interest. Fortunately, we don't need to care what those internal details actually are.
So with that in mind, here's what's going on in (char*)&(*i).
*i returns a reference to the object stored in the list.
& takes the address of that object, thus yielding a pointer to the object.
(char*) casts that object pointer into a char pointer.
That snippet of code would be the short form of doing something like this:
Referral& r = *i;
Referral* pr = &r;
char* pc = (char*)pr;
Why do i need to be messing around
with addresses anyway?
And why does it need the size?
fstream::write is designed to write a series of bytes to a file. It doesn't know anything about what those bytes mean. You give it an address so that it can write the bytes that exist starting wherever that address points to. You give it a size so that it knows how many bytes to write.
So if I do:
MyClass ExampleObject;
file.write((char*)ExampleObject, sizeof(ExampleObject));
Then it writes all the bytes that exist directly within ExampleObject to the file.
Note: As others have mentioned, if the object you want to write has members that dynamically allocate memory or otherwise make use of pointers, then the pointed to memory will not be written by a single simple fstream::write call.
will serialization give a significant boost in storage efficiency?
In theory, binary data can often be both smaller than plain-text and faster to read and write. In practice, unless you're dealing with very large amounts of data, you'll probably never notice the difference. Hard drives are large and processors are fast these days.
And efficiency isn't the only thing to consider:
Binary data is harder to examine, debug, and modify if necessary. At least without additional tools, but even then plain-text is still usually easier.
If your data files are going to persist between different versions of your program, then what happens if you need to change the layout of your objects? It can be irritating to write code so that a version 2 program can read objects in a version 1 file. Furthermore, unless you take action ahead of time (like by writing a version number in to the file) then a version 1 program reading a version 2 file is likely to have serious problems.
Will you ever need to validate the data? For instance, against corruption or against malicious changes. In a binary scheme like this, you'd need to write extra code. Whereas when using plain-text the conversion routines can often help fill the roll of validation.
Of course, a good serialization library can help out with some of these issues. And so could a good plain-text format library (for instance, a library for XML). If you're still learning, then I'd suggest trying out both ways to get a feel for how they work and what might do best for your purposes.
What you are trying to do (reading and writing raw memory to/from file) will invoke undefined behaviour, will break for anything that isn't a plain-old-data type, and the files that are generated will be platform dependent, compiler dependent and probably even dependent on compiler settings.
C++ doesn't have any built-in way of serializing complex data. However, there are libraries that you might find useful. For example:
http://www.boost.org/doc/libs/1_40_0/libs/serialization/doc/index.html
Did you have already a look at boost::serialization, it is robust, has a good documentation, supports versioning and if you want to switch to an XML format instead of a binary one, it'll be easier.
Fstream.write simply writes raw data to a file. The first parameter is a pointer to the starting address of the data. The second parameter is the length (in bytes) of the object, so write knows how many bytes to write.
file.write((char*)&object,sizeof(className));
^
This line is converting the address of object to a char pointer.
historyFile.write((char*)i,sizeof(Referral));
^
This line is trying to convert an object (i) into a char pointer (not valid)
historyFile.write(i,sizeof(Referral));
^
This line is passing write an object, when it expects a char pointer.

Named Pipe strategies with dynamic memory?

Hokay so I have an application where I need some IPC... I'm thinking named pipes are the way to go because they are so easy to use.
Anyways, I have a question about how to handle dynamic memory using named pipes.
Say I have a class such as this:
class MyTestClass {
public:
MyTestClass() { _data = new int(4); }
int GetData() { return *_data; }
int GetData2() { return _data2; }
private:
int* _data;
int _data2;
};
Now when I create a buffer full of MyTestClass objects then send them over the pipe, I'm obviously losing _data in the destination process and getting garbage. Is there some strategy to this that I should use? I can use value types for simple cases, but for many complex classes I need to use some sort of dynamic memory and I like pointers.
Or, should I just look at using shared memory instead? Thanks
Both named pipes and shared memory have similar issues: You need to serialize the contents of the structure into the on the sending side and deserialize the structure from the on the receiving side.
The serialization process is essentially identical whether you're using named pipes or shared memory. For embedded pointers (like _data and _data2) you need to serialize the contents of the pointer in a consistent fashion.
There are lots of serialization strategies that you could use, depending on how your structures are laid out in memory and how efficient your IPC has to be. Or you could use DCE RPC and let the RPC marshaling code handle the complexities for you.
To send the data over a named pipe, you must serialize (or marshal) the data on the sending end, and deserialize (or unmarshal) it on the receiving end.
It sounds suspiciously as if you are simply writing a copy of the bytes in the data structure. This is no good whatsoever. You aren't copying the allocated data (it isn't stored between the first and last bytes of the data structure, but somewhere else altogether) and you are copying a pointer (_data) from one machine (or process) to another, and the memory address in the local process has no guaranteed meaning in the other.
Define yourself a wire protocol (if desparate, look at ASN.1 - no, on second thoughts, don't get that desparate) that defines the layout of the data for transmission over the wire. Then implement sender and receiver (or seralizer and deserializer) functions. Or find someone else's code that does this already.
Also remember to deal with endian-ness - you must define which sequence the bytes are sent through the named pipe.
For example, you could define that the message sent consists of a 4-byte unsigned integer in network byte order defining how many structures follow, and each structure could be a sequence of 4 signed 4-byte integers for the array followed by a single signed 4-byte integer for _data2 (also sent in network byte order).
Note that the choice of named pipe as the IPC mechanism is largely immaterial; unless you are using shared memory (inherently on the same machine), then endian-ness has to be dealt with, and even with shared memory, you need to deal with serialization.