C++ object in memory - c++

Is there a standard in storing a C++ objects in memory? I wish to set a char* pointer to a certain address in memory, so that I can read certain objects' variables directly from the memory byte by byte. When I am using Dev C++, the variables are stored one by one right in the memory address of an object in the order that they were defined. Now, can it be different while using a different compiler (like the variables being in a different order, or somewhere else)? Thank you in advance. :-)

The variables can't be in a different order, as far as I know. However, there may be varying amounts of padding between members. Also I think all bets are off with virtual classes and different implementations of user-defined types (such as std::string) may be completely different between libraries (or even build options).
It seems like a very suspicious thing to do. What do you need it for: to access private members?

I believe that the in-memory layout of objects is implementation defined - not the ordering, necessarily, but the amount of space. In particular, you will probably run into issues with byte-alignment and so-forth, especially across platforms.
Can you give us some details of what you're trying to do?

Implementations are free to do anything they want :P. However since C++ has to appeal to certain styles of programming, you will find a deterministic way of accessing your fields for your specific compiler/platform/cpu architecture.
If your byte ordering is varied on a different compiler, my first assumption would be byte packing issues. If you need the class to have a certain specific byte ordering first look up "#pragma pack" directives for your compiler... you can change the packing order into something less optimal but deterministic. Please note this piece of advice generally applies to POD data types.

The C++ compiler is not allowed to reorder variables within a visibility block (public, protected, etc). But it is allowed to reorder variables in separate visibility blocks. For example:
struct A {
int a;
short b;
char c;
};
struct B {
int a;
public:
short b;
protected:
char c;
};
In the above, the variables in A will always be laid out in the order a, b, c. The variables in B might be laid out in another order if the compiler chose. And, of course, there are alignment and packing requirements so there might be "spaces" between some of the variables if needed.

Keep in mind when working with multi-dimensional arrays that they are stored in Row Major Order.

The order of the variables should never change, but as others have said, the byte packing will vary. Another thing to consider is the endianness of the platform.
To get around the byte alignment/packing problem, most compilers offer some way to guide the process. In gcc you could use __attribute__((__packed__)) and in msvc #pragma pack.

I've worked with something that did this professionally, and as far as I could tell, it worked very specifically because it was decoding something another tool encoded, so we always knew exactly how it worked.
We did also use structs that we pointed at a memory address, then read out data via the struct's variables, but the structs notably included packing and we were on an embedded platform.
Basically, you can do this, so long as you know -exactly- how everything is constructed on a byte-by-byte level. (You might be able to get away with knowing when it's constructed the same way, which could save some time and learning)

It sounds like you want to marshall objects between machines over a TCP/IP connection. You can probably get away with this if the code was compiled with the same compiler on each end, otherwise, I'm not so sure. Keep in mind that if the platforms can be different, then you might need to take into account different processor endians!

Sounds like what you real want to ask is how to serialize your objects

http://dieharddeveloper.blogspot.in/2013/07/c-memory-layout-and-process-image.html
In the middle of the process's address space, there is a region is reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager (PM) will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code.

Related

Are classes guaranteed to have the same organization in memory between program runs?

I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk, when I load it back, will everything fit nicely back.
Save
fout.write((char*) &game,sizeof Game);
Load
fin.read((char*) &game, sizeof Game);
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment on 32 vs 64 will vary. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach in extremely limited situations -- that typically applies only to components of a system, rather than the entire state of the game.
Since this is tagged C++, "boost - Serialization" would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
Yes, classes and structures will have the same layout in memory every time your program runs., although I can't say if the standard enforces this. The machine code generated by C++ compilers use "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler or change compiler options.
As long as the type is POD and without pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the above-mentionned concerns, this approach is quite inflexible with regard to versionning and interoperability.
[edit]
To respond to your own edit, do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will not exist anymore in memory or will be elsewhere when you'll reload your file.
You might want to take a look at this.
Classes are not guaranteed to have the same structure in memory as pointers can point to different locations in memory each time a class is created.
However, without posting code it is difficult to say with certainty where the problem is.

Access data in shared memory C++ POSIX

I open a piece of shared memory and get a handle of it. I'm aware there are several vectors of data stored in the memory. I'd like to access those vectors of data and perform some actions on them. How can I achieve this? Is it appropriate to treat the shared memory as an object so that we can define those vectors as fields of the object and those needed actions as member functions of the object?
I've never dealt with shared memory before. To make things worse, I'm new to C++ and POSIX. Could someone please provide some guidance? Simple examples would be greatly appreciated.
int my_shmid = shmget(key,size,shmflgs);
...
void* address_of_my_shm1 = shat(my_shmid,0,shmflags);
Object* optr = static_cast<Object*>(address_of_my_shm1);
...or, in some other thread/process to which you arranged to pass the address_of_my_shm1
...by some other means
void* address_of_my_shm2 = shat(my_shmid,address_of_my_shm1,shmflags);
You may want to assert that address_of_shm1 == address_of_shm2. But note that I say "may" - you don't actually have to do this. Some types/structs/classes can be read equally well at different addresses.
If the object will appear in different address spaces, then pointers outside the shhm in process A may not point to the same thing as in process B. In general, pointers outside the shm are bad. (Virtual functions are pointers outside the object, and outside the shm. Bad, unless you have other reason to trust them.)
Pointers inside the shm are usable, if they appear at the same address.
Relative pointers can be quite usable, but, again, so long as they point only inside the shm. Relative pointers may be relative to the base of an object, i.e. they may be offsets. Or they may be relative to the pointer itself. You can define some nice classes/templates that do these calculations, with casting going on under the hood.
Sharing of objects through shmem is simplest if the data is just POD (Plain Old Data). Nothing fancy.
Because you are in different processes that are not sharing the whole address space, you may not be guaranteed that things like virtual functions will appear at the same address in all processes using the shm shared memory segment. So probably best to avoid virtual functions. (If you try hard and/or know linkage, you may in some circumstances be able to share virtual functions. But that is one of the first things I would disable if I had to debug.)
You should only do this if you are aware of your implementation's object memory model. And if advanced (for C++) optimizations like splitting structs into discontiguous hot and cold parts are disabled. Since such optimizations rae arguably not legal for C++, you are probably safe.
Obviously you are better off if you are casting to the same object type/class on all sides.
You can get away with non-virtual functions. However, note that it can be quite easy to have the same class, but different versions of the class - e.g. differing in size, e.g. adding a new field and changing the offsets of all of the other fields - so you need to be quite careful to ensure all sides are using the same definitions and declarations.

C++ Class Memory Model And Alignment

I have several questions to ask that pertains to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position or are they allocated to another location? Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I chose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.
Do classes have the same memory placement and memory alignment format
as structs?
The memory placement/alignment of objects is not contingent on whether its type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class have private members by default while a struct have public members by default.
More specifically, is data loaded into memory based on the order in
which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object however, the compiler is not allowed to rearrange variables. For example:
class Foo {
int a;
int b;
int c;
};
The variables c must be located after b and b must be located after a within a Foo object. They are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destructed in the reverse order when a Foo is destroyed.
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
Do functions affect memory alignment and data position or are they
allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position
dependent stuff like file headers and algorithmic data within a
struct. I'm just curious to know whether or not this is intrinsic to
classes as it is to structs and whether or not it will translate well
into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc) but within an application it shouldn't be a concern.
With respect with reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach since issues with padding and alignment may make your code work on one platform and not another. Instead you should read the raw bytes a couple at a time from the file and assign them into the structs with simple assignment.
Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically there is no difference between a class and a struct. The only difference is the default member access specification otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No. They do not affect alignment. Methods are compiled separately. The object does not contain any reference to methods (to those that say virtual tables do affect members the answer is yes and no but this is an implementation detail that does not affect the relative difference between members. The compiler is allowed to add implementation specific data to the object).
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class/Structs different name for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?
C++ classes simply translate into structs with all the instance variables as the data contained inside the structs, while all the functions are separated from the class and are treated like functions with accept those structs as an argument.
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.
C++ classes do not participate in "persistence", like binary-mode structures, and shouldn't have alignment attached to them. Keep the classes simple.
Putting alignment with classes may have negative performance benefits and may have side effects too.

Locating objects (structs) in memory - how to?

How would you locate an object in memory, lets say that you have a struct defined as:
struct POINT {
int x;
int y;
};
How would I scan the memory region of my app to find instances of this struct so that I can read them out?
Thanks R.
You can't without adding type information to the struct. In memory a struct like that is nothing else than 2 integers so you can't recognize them any better than you could recognize any other object.
You can't. Structs don't store any type information (unless they have virtual member functions), so you can't distinguish them from any other block of sizeof(POINT) bytes.
Why don't you store your points in a vector or something?
You can't. You have to know the layout to know what section of memory have to represent a variable. That's a kind of protocol and that's why we use text based languages instead raw values.
You don't - how would you distinguish two arbitrary integers from random noise?
( but given a Point p; in your source code, you can obtain its address using the address-of operator ... Point* pp = &p;).
Short answer: you can't. Any (appropriately aligned) sequence of 8 bytes could potentially represent a POINT. In fact, an array of ints will be indistinguishable from an array of POINTS. In some cases, you could take advantage of knowledge of the compiler implementation to do better. For instance, if the struct had virtual functions, you could look for the correct vtable pointer - but there could also be false positives.
If you want to keep track of objects, you need to register them in their constructor and unregister them in their destructor (and pay the performance penalty), or give them their own allocator.
There's no way to identify that struct. You need to put the struct somewhere it can be found, on the stack or on the heap.
Sometimes data structures are tagged with identifying information to assist with debugging or memory management. As a means of data organization, it is among the worst possible approaches.
You probably need to a lot of general reading on memory management.
There is no standard way of doing this. The platform may specify some APIs which allow you to access the stack and the free store. Further, even if you did, without any additional information how would you be sure that you are reading a POINT object and not a couple of ints? The compiler/linker can read this because it deals with (albeit virtual) addresses and has some more information (and control) than you do.
You can't. Something like that would probably be possible on some "tagged" architecture that also supported tagging objects of user-defined types. But on a traditional architecture it is absolutely impossible to say for sure what is stored in memory simply by looking at the raw memory content.
You can come closer to achieving what you want by introducing a unique signature into the type, like
struct POINT {
char signature[8];
int x;
int y;
};
and carefully setting it to some fixed and "unique" pattern in each object of POINT type, and then looking for that pattern in memory. If it is your application, you can be sure with good degree of certainty that each instance of the pattern is your POINT object. But in general, of course, there will never be any guarantee that the pattern you found belongs to your object, as opposed to being there purely accidentally.
What everyone else has said is true. In memory, your struct is just a few bytes, there's nothing in particular to distinguish it.
However, if you feel like a little hacking, you can look up the internals of your C library and figure out where memory is stored on the heap and how it appears. For example, this link shows how stuff gets allocated in one particular system.
Armed with this knowledge, you could scan your heap to find allocated blocks that were sizeof(POINT), which would narrow down the search considerably. If you look at the table you'll notice that the file name and line number of the malloc() call are being recorded - if you know where in your source code you're allocating POINTs, you could use this as a reference too.
However, if your struct was allocated on the stack, you're out of luck.

Aligning a class to a class it inherits from? Force all stack alignment? Change sizeof?

I want to have a base class which dictates the alignment of the objects which inherit from it. This works fine for the heap because I can control how that gets allocated, and how arrays of it get allocated in a custom array template. However, the actual size of the class as far as C++ is concerned doesn't change at all. Which is to say if I perform pointer arithmetic on a pointer of the derived class then it will skip to the wrong location. That's issue one.
Issue two is stack alignment. There doesn't seem to be a way to directly force the entire stack to be 16 byte aligned pointers.
The only thing I can see to affect these at all is the vc++ and g++ compiler specific settings, but the problem here is I don't want to have to manually fix the alignment all the time. That would be bound to be error prone, not to mention a pain.
I could also maybe make some kind of smart pointer but that would introduce more issues, too.
If I just align the base class, will the child classes be aligned, as well? If so, that would solve most of my problems but that seems doubtful to be the case (though I will try this).
If a base class has a particular alignment requirement, then any derived classes will have at least that alignment (they could get a stricter alignment requirement due to their own members). Otherwise the compiler couldn't guarantee that accessing the base members would meet the requirements they have.
However, there's really nothing you can portably do for stack alignment - that's handled entirely by the compiler and the compiler's platform ABI requirements. As you indicate, a compiler option or maybe a pragma might let you exert some control, but nothing portable.
Keeping the Michael's answer in mind with respect to portability.. I assume you are aiming at SSE vectorization and are starting to feel that pain and that really ought to be a standard way to do it with explicit hints and low-level control. Both compilers you mention do strive for 16 byte alignment on stack by default.
So while compiler support does vary, can be changed, is challenged etc, it can also be heavily underutilised on one of them; it depends on the optimizers and types you are containing and naturally the vectorizer choices for sanity. Then again, it's not clear what you are doing and what the use is.
Since your problem definition without an example is sketchy, perhaps you can give a shot to declspec(align(16)) for California (with LTCG), and __attribute((aligned (16))) for gcc. If that helps, it'd be good to let us know and what specifically worked or not in example, which compiler warnings or errors too. I would also avoid any smart pointer usage for various reasons.
sizeof being in your question indicates you are also facing challenges with regards to variable arrays and pointer types, in which case the former might be problematic. And even though you can live with losing the copy constructor and assignment facilities, it will do no harm to inspect the machine code. That is, in case of complication, what you need to do is look at the output and in case of VC++ switch to Intel for better hints in Windows, and test for/compilet and runtime assert/trap failures early or for the other one pick any free object file analysis tool. Then you just work around the clearly problematic constructs in a sanest way you can, something that is always possible.
We had a similar problem on PARISC HPUX where the only atomic construct was a 4 byte atomic clear that required 16 byte alignment.
We could fudge it by declaring our 4 byte quantity like so:
struct Stupid_HPUX
{
int 4byteLockWordInHere[4] ;
} ;
and picking the right one to use at runtime. For your problem, you could do something similar
union rawMemory
{
int blah[(sizeof(yourtype) + 16)/4] ;
char c[1] ;
} u ;
then use placement new with a runtime determined address to fix the location required.