I'm exploring various ways to build mixed C++ / VB.NET projects. What I love about C++ is that you can load up a bunch of binary data, point a pointer of the appropriate type at it, and bam! You have yourself a POD object instance.
I'm looking for a way to do the same in VB.NET, but I'm rather fuzzy on how VB.NET handles its memory. Basically, is there a way to convert binary data into an object without too much overhead, in such a manner that the exact same data would also be usable in C++ with an equivalent class definition? Furthermore, could you do that to binary data in another process's memory, or perhaps in shared memory, without moving the actual data? (So that if object instances in both the C++ and VB.NET programs point to the same chunk of memory, and one changes the data, the other one is instantly up to date too.)
Related
I would like to know what happens when I write:
object.write((char*)&class_object, sizeof(class_object));
// or
object.read((char*)&class_object, sizeof(class_object));
From what I read so far, the class_object is converted to a pointer. But I don't know how it manages to convert data carried by the object into binary. What does the binary actually represent?
I am a beginner.
EDIT
Could you please explain what really happens when we write the above piece of code? I mean, what actually happens when we write (char*)&S, say where S is the object of a class that I have declared?
Imagine it this way: the class instance is just a chunk of memory sitting in your RAM. If you convert a pointer to your class instance to a char pointer:
SomeClass someClassInstance;
char* data = reinterpret_cast<char*>(&someClassInstance);
It will point to the same data in your memory but it will be treated as a byte array in your program.
If you convert it back:
SomeClass* instance = reinterpret_cast<SomeClass*>(data);
It will be treated as the class again.
So in order to write your class to a file and later reconstruct it, you can just write the data to some file which will be sizeof(SomeClass) in size and later read the file and convert the raw bytes to the class instance.
However, keep in mind that you can only do this if your class is POD (Plain Old Data)!
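A minimal sketch of that round trip, assuming a hypothetical POD type `Point3` and hypothetical helpers `write_raw` / `read_raw` (none of these names come from the question). It uses an in-memory stream so it is easy to try out; a `std::fstream` opened with `std::ios::binary` should behave the same way. The `static_assert` makes the POD requirement explicit instead of leaving it to discipline.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical POD type, used only for illustration.
struct Point3 {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;
};

// Writes the object's bytes exactly as they sit in memory.
template <typename T>
void write_raw(std::ostream& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps require a trivially copyable type");
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Reads sizeof(T) bytes back over an existing object; returns false on a short read.
template <typename T>
bool read_raw(std::istream& in, T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps require a trivially copyable type");
    in.read(reinterpret_cast<char*>(&value), sizeof(T));
    return in.gcount() == static_cast<std::streamsize>(sizeof(T));
}

// Round-trips one Point3 through an in-memory binary stream.
Point3 round_trip_demo() {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    const Point3 saved{1, -2, 3};
    write_raw(buffer, saved);

    Point3 loaded{};
    const bool ok = read_raw(buffer, loaded);
    assert(ok);
    (void)ok;  // silences the unused-variable warning when NDEBUG is defined
    return loaded;
}
```

The bytes written this way are only meaningful to a program built with the same layout, alignment and endianness.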
In practice, your code won't work and is likely to yield undefined behavior, at least when your class or struct is not a POD (plain old data) and contains pointers or virtual functions (so has some vtable).
The binary file would contain the bit representation of your object, and this is not portable to another computer, or even to another process running the same program (notably because of ASLR) unless your object is a POD.
See also this answer to a very similar question.
You probably want some serialization. Since disks and file accesses are a lot slower (tens of thousands of times slower) than the CPU, the cost of converting to a more portable data representation is usually negligible, so it is often wise to use one. Practically speaking, you should consider some textual representation such as JSON, XML or YAML. Libraries such as jsoncpp are really easy to use; you'll need to code something to transform your object into JSON, and to create an object from JSON.
Remember also that data is often more costly and more precious than code. The point is that you often want some old data (written by a previous version of your program) to be read by a newer version of your program. And that might not be trivial (e.g. if you have added or changed the type of some field in your class).
You could also read about dynamic software updating. It is an interesting research subject. Be aware of databases.
Read also about parsing techniques, notably about recursive descent parsers. They are relevant.
I read in the draft standard N4296, § 1.8, page 7:
An object is a region of storage. [ Note: A function is not an object,
regardless of whether or not it occupies storage in the way that
objects do. —end note ]
I spent some days on the net looking for a good reason for such exclusion, with no luck. Maybe because I do not fully understand objects. So:
Why is a function not an object? How does it differ?
And does this have any relation with the functors (function objects)?
A lot of the difference comes down to pointers and addressing. In C++¹ pointers to functions and pointers to objects are strictly separate kinds of things.
C++ requires that you can convert a pointer to any object type into a pointer to void, then convert it back to the original type, and the result will be equal to the pointer you started with². In other words, regardless of exactly how they do it, the implementation has to ensure that a conversion from pointer-to-object-type to pointer-to-void is lossless, so no matter what the original was, whatever information it contained can be recreated so you can get back the same pointer as you started with by conversion from T* to void * and back to T*.
That's not true with a pointer to a function though--if you take a pointer to a function, convert it to void *, and then convert it back to a pointer to a function, you may lose some information in the process. You might not get back the original pointer, and dereferencing what you do get back gives you undefined behavior (in short, don't do that).
For what it's worth, you can, however, convert a pointer to one function to a pointer to a different type of function, then convert that result back to the original type, and you're guaranteed that the result is the same as you started with.
Although it's not particularly relevant to the discussion at hand, there are a few other differences that may be worth noting. For example, you can copy most objects--but you can't copy any functions.
As far as relationship to function objects goes: well, there really isn't much of one beyond one point: a function object supports syntax that looks like a function call--but it's still an object, not a function. So, a pointer to a function object is still a pointer to an object. If, for example, you convert one to void *, then convert it back to the original type, you're still guaranteed that you get back the original pointer value (which wouldn't be true with a pointer to a function).
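The two guarantees described above can be sketched as follows. `Adder`, `twice` and the two helper functions are hypothetical names made up for this illustration. Note that the intermediate function pointer is never called through the wrong type; it is only converted back.

```cpp
#include <cassert>

// A function object: an ordinary object whose class defines operator().
struct Adder {
    int base;
    int operator()(int x) const { return base + x; }
};

// An ordinary function.
int twice(int x) { return 2 * x; }

using IntFn = int (*)(int);
using VoidFn = void (*)();

// Object pointer -> void* -> object pointer: guaranteed to give back the same pointer.
bool object_pointer_round_trips() {
    Adder add_five{5};
    void* erased = &add_five;                       // implicit conversion
    Adder* restored = static_cast<Adder*>(erased);  // back to the original type
    return restored == &add_five && (*restored)(1) == 6;
}

// Function pointer -> different function pointer type -> original type:
// also guaranteed to give back the original pointer.
bool function_pointer_round_trips() {
    IntFn original = &twice;
    VoidFn other = reinterpret_cast<VoidFn>(original);  // must not be called as VoidFn
    IntFn restored = reinterpret_cast<IntFn>(other);
    return restored == original && restored(21) == 42;
}
```

There is deliberately no example converting `&twice` to `void*`: that conversion is the one the answer warns about.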
As to why pointers to functions are (at least potentially) different from pointers to objects: part of it comes down to existing systems. For example, on MS-DOS (among others) there were four entirely separate memory models: small, medium, compact, and large. Small model used 16-bit addressing for both functions and data. Medium used 16-bit addresses for data, and 20-bit addresses for code. Compact reversed that (16-bit addresses for code, 20-bit addresses for data). Large used 20-bit addresses for both code and data. So, in either compact or medium model, converting between pointers to code and pointers to data really could and did lead to problems.
More recently, a fair number of DSPs have used entirely separate memory buses for code and for data and (as with the MS-DOS memory models) these were often different widths, so converting between the two could and did lose information.
¹ These particular rules came to C++ from C, so the same is true in C, for whatever that's worth.
² Although it's not directly required, with the way things work, pretty much the same works out to be true for a conversion from the original type to a pointer to char and back, for whatever that's worth.
Why a function is not an object? How does it differ?
To understand this, let's move from bottom to top in terms of abstractions involved. So, you have your address space through which you can define the state of the memory and we have to remember that fundamentally it's all about this state you operate on.
Okay, let's move a bit higher in terms of abstractions. I am not talking about any abstractions imposed by a programming language yet (like object, array, etc.); simply, as a layman, I want to keep a record of one portion of the memory, let's call it Ab1, and of another one called Ab2.
Both have a state fundamentally but I intend to manipulate/make use of the state differently.
Differently...Why and How?
Why ?
Because of my requirements (to perform addition of 2 numbers and store the result back, for example). I will be using Ab1 as a long-lived state and Ab2 as a relatively short-lived state. So, I will create a state for Ab1 (with the 2 numbers to add), then use this state to populate some of the state of Ab2 (copy them temporarily), perform further manipulation of Ab2 (add them) and save a portion of the resulting Ab2 to Ab1 (the added result). After that, Ab2 becomes useless and we reset its state.
How?
I am going to need some management of both portions to keep track of which words to pick from Ab1 and copy to Ab2, and so on. At this point I realize that I can make it work for some simple operations, but anything serious will require a laid-out specification for managing this memory.
So, I look for such a management specification, and it turns out there exists a variety of these specifications (some with a built-in memory model, others providing the flexibility to manage the memory yourself) with a better design. In fact, without even dictating how to manage the memory directly, they have successfully defined the encapsulation for this long-lived storage and the rules for how and when it can be created and destroyed.
The same goes for Ab2, but the way they present it makes me feel like it is much different from Ab1. And indeed, it turns out to be. They use a stack for state manipulation of Ab2 and reserve memory from the heap for Ab1. Ab2 dies after a while (once it has finished executing).
Also, what to do with Ab2 is defined through yet another storage portion called Ab2_Code, and the specification for Ab1 similarly involves Ab1_Code.
I would say, this is fantastic! I get so much convenience that allows me to solve so many problems.
Now, I am still looking from a layman's perspective, so I don't really feel surprised, having gone through the thought process of it all. But if you question things top-down, they can get a bit difficult to put into perspective. (I suspect that's what happened in your case.)
BTW, I forgot to mention that Ab1 is officially called an object and Ab2 a function stack, while Ab1_Code is the class definition and Ab2_Code is the function definition code.
And it is because of these differences imposed by the programming language that you find them so different. (Your question.)
Note: Don't take my representation of Ab1/Object as a long-lived storage abstraction as a rule or a concrete thing; it was from a layman's perspective. The programming language provides much more flexibility in terms of managing the lifecycle of an object. So, an object may be deployed like Ab1, but it can be much more.
And does this have any relation with the functors (function objects)?
Note that the first part of the answer is valid for many programming languages in general (including C++); this part has to do specifically with C++ (whose spec you quoted). You can have a pointer to a function, and you can have a pointer to an object too. It's just another programming construct that C++ defines. Notice that this is about having a pointer to Ab1 or Ab2 to manipulate them, rather than having another distinct abstraction to act upon.
You can read about its definition, usage here:
C++ Functors - and their uses
Let me answer the question in simpler language (terms).
What does a function contain?
It basically contains instructions to do something. While executing the instructions, the function can temporarily store and / or use some data - and might return some data.
Although the instructions are stored somewhere - those instructions themselves are not considered as objects.
Then, what are the objects?
Generally, objects are entities which contain data - which get manipulated / changed / updated by functions (the instructions).
Why the difference?
Because computers are designed in such a way that the instructions do not depend on the data.
To understand this, let's think about a calculator. We do different mathematical operations using a calculator. Say, if we want to add some numbers, we provide the numbers to the calculator. No matter what the numbers are, the calculator will add them in the same way following the same instructions (if the result exceeds the calculator's capacity to store, it will show an error - but that is because of calculator's limitation to store the result (the data), not because of its instructions for addition).
Computers are designed in a similar manner. That is why when you use a library function (for example qsort()) on some data which is compatible with the function, you get the result you expect, and the functionality of the function doesn't change if the data changes, because the instructions of the function remain unchanged.
Relation between function and functors
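Functions are sets of instructions, and while they are being executed they may need to store some temporary data. In other words, some objects might be temporarily created while executing the function. A functor is not one of these temporaries, though: it is an object of a class that defines operator(), so it holds data like any other object but can be called with function-call syntax.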
I'm designing a JNI interface that passes string parameters from Java to C++. I need high performance and have been able to use Direct ByteBuffer and String.getBytes() to do that fairly well, but the penalty for passing strings to C/C++ still remains fairly high. I recently read about OpenJDK's Unsafe class. This excellent page got me started, but I'm finding Unsafe to be woefully (if understandably) poorly documented.
I'm wondering, if I use the Unsafe class to obtain a pointer to a string and pass it to C++, is there a risk that the object has moved before the C++ code is entered? And even while C++ is executing? Or are these addresses provided by the Unsafe code somehow pinned? If they aren't pinned, how are these Unsafe pointers ever useful?
Unsafe is not meant to interoperate with JNI. An address obtained via Unsafe for an object in the Java heap could change at any time (even in parallel with your C++ code).
The JNI API can pin an object in memory so that native code can access array contents (in the HotSpot JVM this blocks the GC and so may have a negative effect on GC pause duration).
In particular, Get*ArrayElements pins the array until you explicitly call Release*ArrayElements. GetStringChars works in a similar way.
A Direct ByteBuffer holds a pointer to a memory buffer outside of the heap, hence this buffer does not move and you can access it from native code.
I've read the Java source for sun.misc.Unsafe and have a bit more insight.
Unsafe has at least two ways of dealing with memory.
(1) allocateMemory/reallocateMemory/freeMemory/etc. As far as I can tell, this memory is allocated outside the heap, so it faces no GC challenges. I have indirectly tested this, and it seems that the long returned is simply a pointer to the memory. It seems very likely that this type of memory is safe to pass through JNI to native code. And the application's Java code should be able to quickly modify/query it before and after JNI calls by using some of the other intrinsic Unsafe methods that support this style of memory pointer.
(2) object+offset. These methods accept a reference to an object and an "offset" token to indicate where in the object to fetch/modify the value. The objects presumably are always in the Java heap, but passing the object to these methods probably helps resolve GC complications. It does sound like the "offset" is sometimes a "cookie" rather than an actual offset, but it also sounds like, in the case of arrays, arrayBaseOffset() returns an "offset" that one can manipulate arithmetically. I don't know if this object+offset approach is safe for JNI code. I don't see a method to generate a pointer directly to the Java object in the heap that one could (dangerously) pass through JNI. One could pass an object and offset, but given the cost of passing Objects through JNI, this approach is not appealing anyway.
Like (1), the code associated with the page I referenced in my posting is probably pretty safe for JNI interactions. It takes the object+offset approach when dealing with String, but uses approach (1) when dealing with the direct ByteBuffer, which always resides outside the Java heap. Direct ByteBuffers are very JNI friendly and can often be used in ways that avoid the JNI Object passing costs I allude to in my comment to Tom above.
I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk, when I load it back, will everything fit nicely back.
Save
fout.write((char*) &game, sizeof(Game));
Load
fin.read((char*) &game, sizeof(Game));
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment on 32-bit vs 64-bit builds will vary. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach only in extremely limited situations, which typically apply to individual components of a system rather than to the entire state of the game.
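The "offsets instead of pointers" restriction can be sketched like this. `Node`, `kNull`, `append_node`, `load_node` and `sum_list` are hypothetical names invented for the example; the point is only that links stored as byte offsets survive when the whole block is copied or reloaded at a different address, whereas stored pointers would not. The other restrictions above (ABI, endianness, layout) still apply to the raw bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical node layout: the link is a byte offset into the blob, not an address.
struct Node {
    std::int32_t value;
    std::uint32_t next_offset;
};

constexpr std::uint32_t kNull = 0xFFFFFFFFu;  // marks "no next node"

// Appends a node to the blob and returns the offset it was stored at.
std::uint32_t append_node(std::vector<unsigned char>& blob,
                          std::int32_t value, std::uint32_t next) {
    const Node node{value, next};
    const std::uint32_t offset = static_cast<std::uint32_t>(blob.size());
    blob.resize(blob.size() + sizeof(Node));
    std::memcpy(blob.data() + offset, &node, sizeof(Node));
    return offset;
}

// Copies a node out of the blob (memcpy avoids alignment and aliasing issues).
Node load_node(const std::vector<unsigned char>& blob, std::uint32_t offset) {
    Node node;
    std::memcpy(&node, blob.data() + offset, sizeof(Node));
    return node;
}

// Walks the list by following offsets and sums the values.
std::int64_t sum_list(const std::vector<unsigned char>& blob, std::uint32_t head) {
    std::int64_t total = 0;
    for (std::uint32_t at = head; at != kNull;) {
        const Node node = load_node(blob, at);
        total += node.value;
        at = node.next_offset;
    }
    return total;
}

// Builds a three-node list, copies the bytes elsewhere, and walks the copy.
std::int64_t offset_demo() {
    std::vector<unsigned char> blob;
    const std::uint32_t tail = append_node(blob, 30, kNull);
    const std::uint32_t middle = append_node(blob, 20, tail);
    const std::uint32_t head = append_node(blob, 10, middle);

    const std::vector<unsigned char> relocated = blob;  // new address, same offsets
    return sum_list(relocated, head);
}
```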
Since this is tagged C++, "boost - Serialization" would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
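As a rough sketch of what such 'convert to file format' and 'convert from file format' functions could look like: the `SaveGame` fields, the tag value `kMagic`, the version number and the little-endian field order below are all assumptions chosen for illustration, not anything taken from the question. Every field is written byte by byte, so the result does not depend on padding, alignment or the host's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical game state, used only for illustration.
struct SaveGame {
    std::uint32_t level;
    std::int32_t player_x;
    std::int32_t player_y;
};

constexpr std::uint32_t kMagic = 0x47414D45u;  // arbitrary file tag (assumption)
constexpr std::uint32_t kVersion = 1;          // bump when the format changes

// Appends a 32-bit value as four bytes, least significant byte first.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Reads four little-endian bytes; returns false if the input is too short.
bool get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos, std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(in[pos++]) << shift;
    return true;
}

// Decodes a two's-complement value without an out-of-range signed cast.
std::int32_t to_i32(std::uint32_t u) {
    if (u <= 0x7FFFFFFFu) return static_cast<std::int32_t>(u);
    return static_cast<std::int32_t>(u - 0x80000000u) + INT32_MIN;
}

std::vector<std::uint8_t> to_file_format(const SaveGame& game) {
    std::vector<std::uint8_t> out;
    put_u32(out, kMagic);
    put_u32(out, kVersion);
    put_u32(out, game.level);
    put_u32(out, static_cast<std::uint32_t>(game.player_x));
    put_u32(out, static_cast<std::uint32_t>(game.player_y));
    return out;
}

// Returns false on a wrong tag, an unknown version, or truncated input.
bool from_file_format(const std::vector<std::uint8_t>& in, SaveGame& game) {
    std::size_t pos = 0;
    std::uint32_t magic = 0, version = 0, level = 0, x = 0, y = 0;
    if (!get_u32(in, pos, magic) || magic != kMagic) return false;
    if (!get_u32(in, pos, version) || version != kVersion) return false;
    if (!get_u32(in, pos, level) || !get_u32(in, pos, x) || !get_u32(in, pos, y))
        return false;
    game = SaveGame{level, to_i32(x), to_i32(y)};
    return true;
}
```

With a version field in the header, a newer program can branch on `version` and still read files written by an older one.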
Yes, classes and structures will have the same layout in memory every time your program runs, although I can't say if the standard enforces this. The machine code generated by C++ compilers uses "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler or change compiler options.
As long as the type is POD and without pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the above-mentioned concerns, this approach is quite inflexible with regard to versioning and interoperability.
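The "POD" half of that condition can be checked at compile time with the standard type traits; a small sketch follows, where `safe_to_dump` and the example structs are hypothetical names. The "without pointer fields" half cannot be checked this way, because C++ has no reflection over members: a struct holding a raw pointer still passes, as the last example shows, so that part remains the programmer's responsibility.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// True when T's bytes can be copied around and its layout is a plain, C-like one.
// This cannot see whether T contains raw pointer members.
template <typename T>
constexpr bool safe_to_dump() {
    return std::is_trivially_copyable<T>::value && std::is_standard_layout<T>::value;
}

// Hypothetical example types.
struct PlayerState {            // plain data: passes
    std::int32_t x;
    std::int32_t y;
    float health;
};

struct NamedPlayer {            // std::string owns heap memory: rejected
    std::string name;
    std::int32_t score;
};

struct Monster {                // virtual function means a hidden vtable pointer: rejected
    virtual ~Monster() = default;
    std::int32_t hp;
};

struct Inventory {              // raw pointer member: passes the check, still unsafe to dump
    std::int32_t* items;
    std::int32_t count;
};

static_assert(safe_to_dump<PlayerState>(), "PlayerState should be dumpable");
static_assert(!safe_to_dump<NamedPlayer>(), "NamedPlayer must not be dumped");
static_assert(!safe_to_dump<Monster>(), "Monster must not be dumped");
```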
[edit]
To respond to your own edit, do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will no longer exist in memory, or will be elsewhere, when you reload your file.
You might want to take a look at this.
Classes are not guaranteed to hold the same contents in memory on every run, as any pointers they contain can point to different locations in memory each time an instance is created.
However, without posting code it is difficult to say with certainty where the problem is.
I wrote a generic in-memory B+Tree implementation in C++ some time ago, and I'm thinking about making it persistent on disk (which is what B+Trees were originally designed for).
My first thought was to use mmap (I'm on Linux) so that I can manipulate the file as normal memory: override operator new for my node classes so that it returns pointers into the mapped region, and create a smart pointer that converts RAM addresses to file offsets to link my nodes with each other.
But I want my implementation to be generic, so that the user can store an int, a std::string, or whatever custom class they want in the B+Tree.
That's where the problem occurs: for primitive types, or aggregate types that do not contain pointers, that's all good, but as soon as the object contains a pointer/reference to a heap-allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My own searches on the topic have been unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and use whatever existing code already exists to get a string representation of what you want.
Pro: re-use loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: re-write so many output functions... writing overloads for new types is a pain for clients, because they shouldn't write functions that fall in your library's namespace... unless you resort to Koenig (argument-dependent) lookup!
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control over output and format than a function; may contain metadata and all.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: have the btree_traits<> default to using operator<< unless someone overrides the traits?
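A minimal sketch of what approach 3 might look like. Only the name `btree_traits<>` comes from the list above; the `store` / `load` members and the `Bytes` alias are assumptions. It also swaps in a different default than the one just suggested: instead of falling back to operator<<, the primary case copies raw bytes for trivially copyable types, and anything else (here `std::string`) needs a user-written specialization. Bounds checking is omitted to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Declared but not defined: types without a matching specialization fail to compile.
template <typename T, typename Enable = void>
struct btree_traits;

// Default for trivially copyable types: store the object's bytes as they are.
template <typename T>
struct btree_traits<T, typename std::enable_if<std::is_trivially_copyable<T>::value>::type> {
    static void store(const T& value, Bytes& out) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        out.insert(out.end(), p, p + sizeof(T));
    }
    static T load(const Bytes& in, std::size_t& pos) {
        T value;
        std::memcpy(&value, in.data() + pos, sizeof(T));
        pos += sizeof(T);
        return value;
    }
};

// User-supplied specialization: a string is stored as its length, then its characters,
// so the heap pointer inside std::string never reaches the file.
template <>
struct btree_traits<std::string> {
    static void store(const std::string& value, Bytes& out) {
        btree_traits<std::size_t>::store(value.size(), out);
        out.insert(out.end(), value.begin(), value.end());
    }
    static std::string load(const Bytes& in, std::size_t& pos) {
        const std::size_t length = btree_traits<std::size_t>::load(in, pos);
        std::string value(reinterpret_cast<const char*>(in.data() + pos), length);
        pos += length;
        return value;
    }
};

// Stores an int and a string in one page, then reads both back in order.
std::string traits_demo() {
    Bytes page;
    btree_traits<int>::store(42, page);
    btree_traits<std::string>::store("key", page);

    std::size_t pos = 0;
    const int number = btree_traits<int>::load(page, pos);
    const std::string text = btree_traits<std::string>::load(page, pos);
    return text + std::to_string(number);
}
```

The tree itself would call only `btree_traits<T>::store` and `btree_traits<T>::load`, so supporting a new key or value type means writing one specialization rather than touching the tree.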
You cannot write a truly generic and transparent version, since if the pointer in a non-trivial item was allocated with malloc (or new and new[]), then the data it points to already lives on the heap, outside your mapped file.
A non-transparent solution is to serialize the class, and this can be done relatively easily. Before you store an object you'd call its serialization function, and after pulling it back out you'd call the deserialization function. Boost has good serialization features that you could make work with your B+Tree.
Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The functionality you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data and a method to load data. Any class that wants to be persisted in your B+Tree then simply implements this interface.
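Such an interface could look roughly like the following. `Persistable`, `dump`, `load` and the `Customer` class are hypothetical names, and the length-prefixed text format is just one possible choice. The object decides how its own data, including anything it holds through pointers or in strings, is written and read.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical interface that the tree would require of stored types.
class Persistable {
public:
    virtual ~Persistable() = default;
    virtual void dump(std::ostream& out) const = 0;  // write own state
    virtual void load(std::istream& in) = 0;         // restore own state
};

// Example type with a heap-backed member (the std::string).
class Customer : public Persistable {
public:
    Customer() = default;
    Customer(int id, std::string name) : id_(id), name_(std::move(name)) {}

    // Format: "<id> <name length> <name bytes>"; the length lets names contain spaces.
    void dump(std::ostream& out) const override {
        out << id_ << ' ' << name_.size() << ' ' << name_;
    }

    void load(std::istream& in) override {
        std::size_t length = 0;
        in >> id_ >> length;
        in.get();  // skip the single separator before the name bytes
        name_.resize(length);
        if (length > 0) in.read(&name_[0], static_cast<std::streamsize>(length));
    }

    int id() const { return id_; }
    const std::string& name() const { return name_; }

private:
    int id_ = 0;
    std::string name_;
};

// Dumps one customer to a stream and loads it into a fresh object.
Customer persist_demo() {
    std::stringstream storage;
    const Customer original(7, "Ada King");
    original.dump(storage);

    Customer restored;
    restored.load(storage);
    return restored;
}
```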