C++ Class Memory Model And Alignment

I have several questions that pertain to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position or are they allocated to another location? Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I choose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.

Do classes have the same memory placement and memory alignment format as structs?
The memory placement/alignment of objects is not contingent on whether their type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class has private members by default while a struct has public members by default.
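For illustration, the following two definitions are laid out identically; only the default access differs:

class Foo {        // members are private by default
    int x;
};

struct Bar {       // members are public by default
    int x;
};

// Foo is equivalent to:  struct Foo { private: int x; };
// Bar is equivalent to:  class  Bar { public:  int x; };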
More specifically, is data loaded into memory based on the order in which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object however, the compiler is not allowed to rearrange variables. For example:
class Foo {
    int a;
    int b;
    int c;
};
The variable c must be located after b, and b must be located after a, within a Foo object. They are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destructed in the reverse order when a Foo is destroyed.
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
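A small sketch that makes the construction/destruction order visible (the Tracer type here is just for illustration):

#include <iostream>

struct Tracer {
    const char* name;
    Tracer(const char* n) : name(n) { std::cout << "construct " << name << '\n'; }
    ~Tracer()                       { std::cout << "destruct "  << name << '\n'; }
};

struct Foo {
    Tracer a{"a"};
    Tracer b{"b"};
    Tracer c{"c"};
};

int main() {
    Foo f;   // prints: construct a, construct b, construct c
}            // then:   destruct c, destruct b, destruct a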
Do functions affect memory alignment and data position or are they allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc) but within an application it shouldn't be a concern.
With respect to reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach, since issues with padding and alignment may make your code work on one platform and not another. Instead, you should read the raw bytes from the file a few at a time and assign them to the struct's members with simple assignment.
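A minimal sketch of that idea (the header fields and the little-endian file format here are assumptions for illustration); reading into a byte buffer first and then assembling each field explicitly avoids depending on the struct's padding:

#include <cstdint>
#include <istream>

struct Header {          // hypothetical file header
    std::uint32_t magic;
    std::uint16_t version;
};

Header readHeader(std::istream& in) {
    unsigned char raw[6];                 // 4 + 2 bytes as stored in the file
    in.read(reinterpret_cast<char*>(raw), sizeof raw);

    Header h;
    // assemble fields explicitly (little-endian file format assumed here)
    h.magic   = raw[0] | (raw[1] << 8) | (raw[2] << 16)
              | (static_cast<std::uint32_t>(raw[3]) << 24);
    h.version = static_cast<std::uint16_t>(raw[4] | (raw[5] << 8));
    return h;
}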

Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically, the only difference between a class and a struct is the default member access specification; otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No. They do not affect alignment. Methods are compiled separately, and the object does not contain any reference to them. (To those who say virtual tables do affect members: the answer is yes and no, but this is an implementation detail that does not change the relative layout of the members. The compiler is allowed to add implementation-specific data to the object.)
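For what it's worth, you can observe that implementation-specific overhead by comparing sizes; the exact numbers are compiler- and platform-dependent:

#include <iostream>

struct Plain   { int a; int b; };
struct Virtual { int a; int b; virtual void f() {} };

int main() {
    // On a typical 64-bit implementation this prints something like 8 and 16:
    // the virtual function adds a vtable pointer, but a and b keep their order.
    std::cout << sizeof(Plain) << ' ' << sizeof(Virtual) << '\n';
}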
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class/struct: different names for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?

C++ classes simply translate into structs with all the instance variables as the data contained inside the structs, while all the functions are separated from the class and are treated like free functions which accept those structs as an argument.
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.
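Roughly speaking (this is a mental model rather than what any particular compiler literally emits), a member function behaves like a free function that takes the object as a hidden argument:

// A class with a method...
class Counter {
    int value = 0;
public:
    void add(int n) { value += n; }
};

// ...behaves roughly like this struct plus a free function:
struct CounterData {
    int value;
};

void Counter_add(CounterData* self, int n) {  // 'self' plays the role of 'this'
    self->value += n;
}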

C++ classes do not participate in "persistence" the way binary-mode structures do, and shouldn't have alignment directives attached to them. Keep the classes simple.
Attaching alignment to classes may hurt performance and may have other side effects too.

Related

Size of object and C++ standard

Looking around, I found many places where the way to get the size of a certain object (class or struct) is explained. I read about padding, about the fact that the virtual function table influences the size, and that an object of a class with no data members (a "pure method" class) has a size of 1 byte. However, I could not find whether these are facts about the implementation or about the C++ standard (at least I was not able to find all of them).
In particular, I am in the following situation: I'm working with some data which are encoded in some objects. These objects do not hold pointers to other data. They do not inherit from any other class, but they have some methods (non-virtual). I have to put these data in a buffer to send them via some socket. Now, given what I mentioned above, I simply copy my objects into the sender buffer, noticing that the data are "serialized" correctly, i.e. each member of the object is copied, and the methods do not affect the byte structure.
I would like to know if what I get is just because of the implementation of the compiler or if it is prescribed by the standard.
The memory layout of classes is not specified precisely in the C++ standard. Even the memory layout of scalar objects such as integers isn't specified. It is up to the language implementation to decide, and generally depends on the underlying hardware. The standard does specify restrictions that the implementation-specific layout must satisfy.
If a type is trivially copyable, then it can be "serialised" by copying its memory into a buffer, and it can be de-serialised back as you describe. However, such trivial serialisation only works when the process that de-serialises it uses the same memory layout. This cannot generally be assumed to be the case, since the other process may be running on entirely different hardware and may have been compiled with a different (version of the) compiler.
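A sketch of what "trivial serialisation" means in code, guarded with std::is_trivially_copyable; the Sample struct is just an example payload:

#include <cstring>
#include <type_traits>
#include <vector>

struct Sample {                 // example payload: no pointers, no virtuals
    int    id;
    double value;
};
static_assert(std::is_trivially_copyable<Sample>::value,
              "memcpy-based serialisation requires a trivially copyable type");

std::vector<unsigned char> serialise(const Sample& s) {
    std::vector<unsigned char> buf(sizeof s);
    std::memcpy(buf.data(), &s, sizeof s);   // raw object representation
    return buf;
}

Sample deserialise(const unsigned char* bytes) {
    Sample s;
    std::memcpy(&s, bytes, sizeof s);        // only valid with the same layout
    return s;
}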
You should use POD (plain old data). A structure is POD if it has no virtual functions, no user-defined constructors, no private non-static data members, and satisfies several other restrictions.
There is a guarantee that POD data is placed in memory in declaration order.
POD data is still subject to alignment and padding, and you should specify the right alignment (it's your decision). See #pragma pack (push, ???).
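For example, packing changes the padding and therefore the size; whether 1-byte packing is appropriate is your decision, and it can cost performance on some targets. A rough sketch:

#include <iostream>

struct Unpacked {
    char c;
    int  n;        // typically preceded by 3 bytes of padding
};

#pragma pack(push, 1)
struct Packed {
    char c;
    int  n;        // no padding: members are packed back to back
};
#pragma pack(pop)

int main() {
    // On a typical platform this prints 8 and 5.
    std::cout << sizeof(Unpacked) << ' ' << sizeof(Packed) << '\n';
}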

How can class definitions not occupy memory?

So I have read this about whether class definitions occupy memory and this about whether functions occupy memory. This is what I do not get: how come class definitions do not occupy memory if functions, or their code, do? I mean, class definitions are also code, so shouldn't that occupy memory just like function code does?
It is not entirely correct to say that class definitions do not occupy memory: any class with member functions may place some code in memory, although the amount of code and its actual placement depends heavily on function inlining.
The Q&A at the first link talks about sizeof, which shows a per-instance memory requirement of the class, which excludes memory requirements for storing member functions, static members, inlined functions, dispatch tables, and so on. This is because all these elements are shared among all instances of the class.
You don't need to keep the class definition anywhere, because the details of how to create an instance of a class are encoded in its constructors.
(In a sense, the class definition is code, it's just not represented explicitly.)
All you need to know in order to create an object is
How big it is,
Which constructor to use for creating it, and
What its virtual functions are.
To create an instance of class A:
Reserve a piece of memory of size sizeof(A) (or be handed one),
Associate that piece of memory with the virtual functions of A, if any (usually held in a table in a predetermined location), and
Tell the relevant A constructor where the A should be created, and then let it do the actual work.
You don't need to know a thing about the types of member variables or anything like that, the constructors know what to do once they know where the object is to be created.
(Every member variable can be found at an offset from the beginning of the object, so the constructor knows where things must be.)
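A sketch of that idea: all the runtime needs is sizeof(A) worth of suitably aligned storage and a call to the constructor at that address (placement new); the member offsets are fixed at compile time. The class A here is just an example:

#include <cstddef>
#include <new>

struct A {
    int    x;
    double y;
    A() : x(1), y(2.0) {}
};

int main() {
    alignas(A) unsigned char storage[sizeof(A)];  // step 1: reserve memory
    A* a = new (storage) A;                       // step 2: construct A there

    // The constructor finds x and y at fixed offsets from the object's start.
    static_assert(offsetof(A, x) == 0, "x is at the beginning of A");

    a->~A();                                      // destroy explicitly; no delete
}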
To create a function, on the other hand, you would need to store its definition in some form and then generate the code at runtime. (This is usually called "Just-in-time" compilation.)
This requires a compiler, which means that you need to either
Include a compiler in every executable, or
Provide (or require everyone to install) a shared compiler for all executables (Java VMs usually contain at least one).
C++ compilers instead generate the functions in advance.
Abusing terminology a little, you could say that the functions are "instantiated" by the compilation process, with the source code as a blueprint.

Access data in shared memory C++ POSIX

I open a piece of shared memory and get a handle of it. I'm aware there are several vectors of data stored in the memory. I'd like to access those vectors of data and perform some actions on them. How can I achieve this? Is it appropriate to treat the shared memory as an object so that we can define those vectors as fields of the object and those needed actions as member functions of the object?
I've never dealt with shared memory before. To make things worse, I'm new to C++ and POSIX. Could someone please provide some guidance? Simple examples would be greatly appreciated.
int my_shmid = shmget(key, size, shmflgs);
...
void* address_of_my_shm1 = shmat(my_shmid, 0, shmflags);
Object* optr = static_cast<Object*>(address_of_my_shm1);
...or, in some other thread/process to which you arranged to pass address_of_my_shm1
...by some other means
void* address_of_my_shm2 = shmat(my_shmid, address_of_my_shm1, shmflags);
You may want to assert that address_of_my_shm1 == address_of_my_shm2. But note that I say "may" - you don't actually have to do this. Some types/structs/classes can be read equally well at different addresses.
If the object will appear in different address spaces, then pointers outside the shm in process A may not point to the same thing as in process B. In general, pointers outside the shm are bad. (Virtual functions are pointers outside the object, and outside the shm. Bad, unless you have other reason to trust them.)
Pointers inside the shm are usable, if they appear at the same address.
Relative pointers can be quite usable, but, again, so long as they point only inside the shm. Relative pointers may be relative to the base of an object, i.e. they may be offsets. Or they may be relative to the pointer itself. You can define some nice classes/templates that do these calculations, with casting going on under the hood.
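A very small sketch of a self-relative ("offset") pointer of the kind mentioned above; it stays valid as long as both the pointer and its target live inside the same mapped segment, even if that segment is mapped at different base addresses in different processes:

#include <cstddef>

// Stores the target as an offset from the pointer's own address.
template <typename T>
class offset_ptr {
    std::ptrdiff_t offset = 0;    // 0 means "null"
public:
    void set(T* p) {
        offset = p ? reinterpret_cast<char*>(p) - reinterpret_cast<char*>(this)
                   : 0;
    }
    T* get() {
        if (offset == 0) return nullptr;
        return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset);
    }
};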
Sharing of objects through shmem is simplest if the data is just POD (Plain Old Data). Nothing fancy.
Because you are in different processes that are not sharing the whole address space, you may not be guaranteed that things like virtual functions will appear at the same address in all processes using the shm shared memory segment. So probably best to avoid virtual functions. (If you try hard and/or know linkage, you may in some circumstances be able to share virtual functions. But that is one of the first things I would disable if I had to debug.)
You should only do this if you are aware of your implementation's object memory model, and if advanced (for C++) optimizations like splitting structs into discontiguous hot and cold parts are disabled. Since such optimizations are arguably not legal for C++, you are probably safe.
Obviously you are better off if you are casting to the same object type/class on all sides.
You can get away with non-virtual functions. However, note that it can be quite easy to have the same class, but different versions of the class - e.g. differing in size, e.g. adding a new field and changing the offsets of all of the other fields - so you need to be quite careful to ensure all sides are using the same definitions and declarations.
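Since the question asks for a simple example: here is a minimal POSIX sketch (shm_open + mmap) sharing a plain POD struct. The segment name and sizes are made up, and error handling is cut to the bare minimum; on some systems you may also need to link with -lrt.

#include <fcntl.h>      // shm_open, O_* flags
#include <sys/mman.h>   // mmap, PROT_*, MAP_SHARED
#include <unistd.h>     // ftruncate, close
#include <cstdio>

struct SharedData {      // POD only: no pointers, no virtual functions
    int    count;
    double values[16];
};

int main() {
    // Creator side (the reader side would omit O_CREAT and ftruncate).
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, sizeof(SharedData));

    void* addr = mmap(nullptr, sizeof(SharedData),
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    SharedData* data = static_cast<SharedData*>(addr);  // treat the segment as the struct
    data->count = 1;
    data->values[0] = 3.14;

    munmap(addr, sizeof(SharedData));
    close(fd);
    // shm_unlink("/demo_shm");  // when the segment is no longer needed
    return 0;
}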

Is Memory Layout for a class successive?

When we declare an object of a class, is its memory layout successive (one after the other)? If it is successive, does padding occur in it (like structure padding)? Please help me out with the concepts of memory layout for a class.
Thanks in advance.
When we declare object of a class is its memory allocation successive (One after the other)?
The Standard doesn't give any such guarantee. Object memory layout is implementation-defined.
Usually, the memory addresses of data members increase in the order they're defined in the class. But this order may be disrupted at any place where access specifiers (private, protected, public) are encountered. This has been discussed in great detail in Inside the C++ Object Model by Lippman.
An excerpt from C/C++ Users Journal,
The compiler isn't allowed to do this rearrangement itself, though. The standard requires that all data that's in the same public:, protected:, or private: block must be laid out in that order by the compiler. If you intersperse your data with access specifiers, though, the compiler is allowed to rearrange the access-specifier-delimited blocks of data to improve the layout, which is why some people like putting an access specifier in front of every data member.
Interesting, isn't it?

C++ object in memory

Is there a standard for storing C++ objects in memory? I wish to set a char* pointer to a certain address in memory, so that I can read certain objects' variables directly from the memory byte by byte. When I am using Dev C++, the variables are stored one by one right at the memory address of an object, in the order that they were defined. Now, can it be different while using a different compiler (like the variables being in a different order, or somewhere else)? Thank you in advance. :-)
The variables can't be in a different order, as far as I know. However, there may be varying amounts of padding between members. Also, I think all bets are off with virtual classes, and implementations of user-defined types (such as std::string) may be completely different between libraries (or even build options).
It seems like a very suspicious thing to do. What do you need it for: to access private members?
I believe that the in-memory layout of objects is implementation defined - not the ordering, necessarily, but the amount of space. In particular, you will probably run into issues with byte-alignment and so-forth, especially across platforms.
Can you give us some details of what you're trying to do?
Implementations are free to do anything they want :P. However since C++ has to appeal to certain styles of programming, you will find a deterministic way of accessing your fields for your specific compiler/platform/cpu architecture.
If your byte ordering varies on a different compiler, my first assumption would be byte-packing issues. If you need the class to have a specific byte ordering, first look up the "#pragma pack" directives for your compiler; you can change the packing order into something less optimal but deterministic. Please note that this piece of advice generally applies to POD data types.
The C++ compiler is not allowed to reorder variables within a visibility block (public, protected, etc). But it is allowed to reorder variables in separate visibility blocks. For example:
struct A {
    int a;
    short b;
    char c;
};

struct B {
    int a;
public:
    short b;
protected:
    char c;
};
In the above, the variables in A will always be laid out in the order a, b, c. The variables in B might be laid out in another order if the compiler chose. And, of course, there are alignment and packing requirements so there might be "spaces" between some of the variables if needed.
Keep in mind when working with multi-dimensional arrays that they are stored in row-major order.
The order of the variables should never change, but as others have said, the byte packing will vary. Another thing to consider is the endianness of the platform.
To get around the byte alignment/packing problem, most compilers offer some way to guide the process. In gcc you could use __attribute__((__packed__)) and in msvc #pragma pack.
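For instance, with gcc/clang the attribute looks like this (a rough sketch; MSVC would use #pragma pack instead, and the printed sizes depend on the platform):

#include <iostream>

struct Normal {
    char         tag;
    unsigned int length;   // the compiler may insert padding before this
};

struct __attribute__((__packed__)) Packed {
    char         tag;
    unsigned int length;   // no padding, but possibly slower unaligned access
};

int main() {
    // Typically prints 8 and 5 on common platforms.
    std::cout << sizeof(Normal) << ' ' << sizeof(Packed) << '\n';
}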
I've worked with something that did this professionally, and as far as I could tell, it worked very specifically because it was decoding something another tool encoded, so we always knew exactly how it worked.
We did also use structs that we pointed at a memory address, then read out data via the struct's variables, but the structs notably included packing and we were on an embedded platform.
Basically, you can do this, so long as you know -exactly- how everything is constructed on a byte-by-byte level. (You might be able to get away with knowing when it's constructed the same way, which could save some time and learning)
It sounds like you want to marshall objects between machines over a TCP/IP connection. You can probably get away with this if the code was compiled with the same compiler on each end; otherwise, I'm not so sure. Keep in mind that if the platforms can be different, you might need to take different processor endianness into account!
Sounds like what you really want to ask is how to serialize your objects.
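If the two ends may differ, the usual approach is to serialize field by field in a fixed byte order. A minimal sketch using the standard htonl/ntohl helpers; the Message struct is just an example:

#include <arpa/inet.h>   // htonl, ntohl
#include <cstdint>
#include <cstring>

struct Message {          // example message to send over the socket
    std::uint32_t id;
    std::uint32_t length;
};

// Encode into a caller-provided 8-byte buffer in network byte order.
void encode(const Message& m, unsigned char out[8]) {
    std::uint32_t id  = htonl(m.id);
    std::uint32_t len = htonl(m.length);
    std::memcpy(out,     &id,  4);
    std::memcpy(out + 4, &len, 4);
}

Message decode(const unsigned char in[8]) {
    std::uint32_t id, len;
    std::memcpy(&id,  in,     4);
    std::memcpy(&len, in + 4, 4);
    return Message{ ntohl(id), ntohl(len) };
}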
http://dieharddeveloper.blogspot.in/2013/07/c-memory-layout-and-process-image.html
In the middle of the process's address space, there is a region reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager (PM) will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code.