How are objects stored in memory in C++?
For a regular class such as
class Object
{
public:
    int i1;
    int i2;
    char i3;
    int i4;
private:
};
Can a pointer to an Object be used as an array to access i1, like the following?
((Object*)&myObject)[0] == i1?
Other questions on SO seem to suggest that casting a pointer to a struct yields a pointer to its first member, for POD types. How is this different, if at all, for classes with constructors?
Also in what way is it different for non-POD types?
Edit:
Would the above class therefore be laid out in memory like the following?
[i1 - 4bytes][i2 - 4bytes][i3 - 1byte][padding - 3bytes][i4 - 4bytes]
Almost. You cast to an Object* when what you want is an int*. Let's re-ask it as the following:
((int*)&myObject)[0] == i1
You have to be really careful with assumptions like this. As you've defined the structure, this should be true in any compiler you're likely to come across. But all sorts of other properties of the object (which you may have omitted from your example) would, as others said, make it non-POD and could (possibly in a compiler-dependent way) make the above statement untrue.
Note that I wouldn't be so quick to tell you it would work if you had asked about i3 -- in that case, even for plain POD, alignment or endianness could easily screw you up.
In any case, you should be avoiding this kind of thing, if possible. Even if it works fine now, if you (or anybody else who doesn't understand that you're doing this trick) ever changes the structure order or adds new fields, this trick will fail in all the places you've used it, which may be hard to find.
Answer to your edit: If that's your entire class definition, and you're using one of the mainstream compilers with default options, and running on an x86 processor, then yes, you've probably guessed the right memory layout. But choice of compiler, compiler options, and different CPU architecture could easily invalidate your assumptions.
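Rather than guessing, you can assert the assumed layout at compile time. Here's a minimal sketch; the 4-byte int and the exact offsets are assumptions about a typical x86/x86-64 ABI, not guarantees of the language:

#include <cstddef>     // offsetof
#include <type_traits> // std::is_standard_layout

class Object
{
public:
    int i1;
    int i2;
    char i3;
    int i4;
};

// Encode the guessed layout as compile-time checks; a compiler/ABI
// where they don't hold rejects the build instead of misbehaving.
static_assert(std::is_standard_layout<Object>::value, "Object must be standard-layout");
static_assert(offsetof(Object, i1) == 0, "i1 at the start");
static_assert(offsetof(Object, i2) == 4, "i2 right after i1");
static_assert(offsetof(Object, i3) == 8, "i3 right after i2");
static_assert(offsetof(Object, i4) == 12, "3 bytes of padding before i4");
static_assert(sizeof(Object) == 16, "total size including padding");

int main() {}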
Classes without virtual members and without inheritance are laid out in memory just like structs. But, when you start getting levels of inheritance things can get tricky and it can be hard to figure out what order things are in memory (particularly multiple inheritance).
When a class has virtual members, each object carries a hidden pointer to a "vtable", a per-class table of pointers to the virtual functions actually selected by the class's inheritance hierarchy.
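To see the vtable pointer's effect on object size, here is a small sketch; the sizes in the comments are typical 64-bit values, not guarantees:

#include <iostream>

struct Plain { int x; };
struct Virtual { int x; virtual ~Virtual() = default; };

int main()
{
    // Typically prints 4 and 16 on a 64-bit implementation: the
    // virtual destructor adds a hidden vtable pointer plus padding.
    std::cout << sizeof(Plain) << '\n';
    std::cout << sizeof(Virtual) << '\n';
}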
The bottom line is: don't access classes this way at all if you can avoid it (and also don't memset them or memcpy them). If you must do this (why?) then take care that you know exactly how your class objects are going to be in memory and be careful to avoid inheritance.
It's different in that this trick is only valid for POD types. That's really all there is to it. The standard specifies that this cast is valid for a POD type, but makes no guarantees about what happens with non-POD types.
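If you want a guard against accidentally applying the trick to a non-POD type, here is a sketch using the C++11 traits; standard-layout is the property that makes the first-member cast valid:

#include <cassert>
#include <type_traits>

struct Pod { int first; int second; };

static_assert(std::is_standard_layout<Pod>::value,
              "the first-member cast is only guaranteed for standard-layout types");

int main()
{
    Pod p = { 1, 2 };
    // For a standard-layout class, a pointer to the object is
    // interconvertible with a pointer to its first member.
    assert(*reinterpret_cast<int*>(&p) == p.first);
}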
It really depends on the compiler, or rather, it is left up to the compiler to determine the memory layout.
For instance, a mix of public, private, and protected member variables could be laid out such that each access type is contiguous. Or, derived classes could have member variables interleaved with unused space in the super class.
Things get worse with virtual inheritance, where the virtually inherited base classes can be laid out anywhere in the memory allocated for that particular instance.
POD is different because it needs to be compatible with C.
Usually what matters isn't whether the class has a constructor: what matters is whether the class has any virtual methods. For details, google for 'vtable' and 'vptr'.
Related
#include <cstring> // memcpy

struct Something
{
    int a;
    int b;
    Something(char* buffer)
    {
        memcpy(this, buffer, sizeof(Something));
    }
};
Is this legal? safe? to me it looks fine but I'm not sure if the C++ standard prohibits it somehow.
... from the fact that it's no longer a POD type after I added the constructor.
That's actually right: under the standard's definition (in both C++03 and C++11), adding a user-provided constructor makes the struct non-trivial, and therefore no longer a POD type. But what the memcpy relies on is only that the type is trivially copyable, and a constructor like yours does not change that.
You can easily check this with a static_assert (std::is_trivially_copyable lives in <type_traits>):
static_assert(std::is_trivially_copyable<Something>::value, "Something must be trivially copyable!");
Is this legal?
I'm not so sure. It depends on the context. In yours it would compile without errors or warnings and work as intended.
safe?
Certainly not.
It invokes undefined behavior in several ways.
The object pointed to by this may consist of more than just the data members. There may be things like a vtable pointer held there as well.
The compiler is allowed to change the memory layout of the member variables. So padding might occur.
In case the data is exchanged over a network, endianness comes into play and has to be considered during de-/serialization.
You should note that any kind of reinterpret_cast (which is what a C-style cast between unrelated pointer types amounts to) gives you undefined behavior to some degree.
You need to know 100% what you're doing, and I recommend checking the emitted assembly output and memory layout every time you use such constructs.
It is guaranteed to compile successfully, without error messages. So by that definition, it is 100% legal.
In this particular case, it would work as intended. But if a structure uses virtual functions, it would store a vptr, and this trick would not work. You might, for instance, later add a virtual function, and the constructor would silently stop working. So no, it is not safe.
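If you really do need to fill a struct from a raw buffer, here is a somewhat safer sketch (still assuming whoever wrote the buffer used the same layout and endianness): it copies into a concrete object from outside any constructor and asserts trivial copyability up front:

#include <cstring>
#include <type_traits>

struct Something
{
    int a;
    int b;
};

// Compile-time guard for the memcpy below.
static_assert(std::is_trivially_copyable<Something>::value,
              "memcpy-based deserialization needs a trivially copyable type");

Something fromBuffer(const char* buffer)
{
    Something s;
    std::memcpy(&s, buffer, sizeof s); // no vtable, no `this` tricks
    return s;
}

int main()
{
    char buffer[sizeof(Something)] = {};
    return fromBuffer(buffer).a; // 0
}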
I found the post below
C++ polymorphism without pointers
which explains that to get polymorphism in C++ you must use pointer or reference types.
I looked into some further resources; all of them say the same, but none explain the reason why.
Is there some technical difficulty in supporting polymorphism with values, or is it possible but the C++ designers decided not to provide that ability?
The problem with treating values polymorphically boils down to the object slicing issue: since derived objects could use more memory than their base class, declaring a value in the automatic storage (i.e. on the stack) leads to allocating memory only for the base, not for the derived object. Therefore, parts of the object that belong to the derived class may be sliced off. That is why C++ designers made a conscious decision to re-route virtual member-functions to the implementations in the base class, which cannot touch the data members of the derived class.
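A short sketch of the slicing issue described above (all names are illustrative):

#include <iostream>

struct Base
{
    virtual ~Base() = default;
    virtual int count() const { return 1; }
};

struct Derived : Base
{
    int extra[4] = {};                       // the derived part needs more storage
    int count() const override { return 5; }
};

int main()
{
    Derived d;
    Base sliced = d; // copies only the Base subobject: the derived part is sliced off
    Base& ref = d;   // no copy, no slicing

    std::cout << sliced.count() << '\n'; // prints 1
    std::cout << ref.count() << '\n';    // prints 5
}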
The difficulty comes from the fact that what you call objects are allocated in automatic memory (on the stack) and the size must be known at compile-time.
The size of a pointer is known at compile time regardless of what it points to, and references are implemented as pointers under the hood, so no worries there.
Consider objects though:
BaseObject obj = ObjectFactory::createDerived();
How much memory should be allocated for obj if createDerived() conditionally returns derived objects? To overcome this, the returned object is sliced and "converted" to a BaseObject, whose size is known.
This all stems from the "pay for what you use" mentality.
The short answer is because the standard specifies it. But are there any insurmountable technical barriers to allowing it?
C++ data structures have known size. Polymorphism typically requires that the data structures can vary in size. In general, you cannot store a different (larger) type within the storage of a smaller type, so storing a child class with extra variables (or other reasons to be larger) within storage for a parent class is not generally possible.
Now, we can get around this. We can create a buffer larger than what is required to store the parent class, and construct child classes within that buffer: but in this case, exposure to said instance will be via references, and you will carefully wrap the class.
This is similar to the technique known as "small object optimization" used by boost::any, boost::variant and many implementations of std::string, where we store (by value) objects in a buffer within a class and manage their lifetime manually.
There is also an issue where a Derived pointer to an instance can have a different value than a Base pointer to the same instance: most implementations presume that a value instance of an object lives exactly where its storage starts.
So in theory, C++ could allow polymorphic instances if we restricted it to derived classes that could be stored in the same memory footprint, with the same "pointer to" value for both Derived and Base. But this would be an extremely narrow corner case, and it could reduce the kinds of optimizations and assumptions compilers can make about value instances of a class in nearly every case (right now, for example, the compiler can assume that virtual methods called on a value instance of a class C are not overridden elsewhere). That is a non-trivial cost for an extremely marginal benefit.
What's more, we are capable of using the C++ language to emulate this corner case with existing language features (placement new, references, and manual destruction) if we really need it, without imposing the above cost.
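As an illustration, here is a minimal sketch of that emulation (entirely hypothetical names, and restricted to derived types that fit a fixed buffer), using placement new, references, and manual destruction:

#include <new>

struct Base
{
    virtual ~Base() = default;
    virtual int value() const { return 0; }
};

struct Derived : Base
{
    int v = 42;
    int value() const override { return v; }
};

class PolyValue
{
    // Storage sized and aligned for the largest type we intend to hold.
    alignas(Derived) unsigned char buffer_[sizeof(Derived)];
    Base* ptr_ = nullptr; // all access goes through a pointer into buffer_

public:
    template <typename T>
    void emplace()
    {
        static_assert(sizeof(T) <= sizeof(buffer_), "T does not fit the buffer");
        static_assert(alignof(T) <= alignof(Derived), "T's alignment is too strict");
        reset();
        ptr_ = ::new (buffer_) T(); // placement new into our own storage
    }
    void reset()
    {
        if (ptr_) { ptr_->~Base(); ptr_ = nullptr; } // manual destruction
    }
    Base& get() { return *ptr_; }
    ~PolyValue() { reset(); }
};

int main()
{
    PolyValue pv;
    pv.emplace<Derived>();
    return pv.get().value() == 42 ? 0 : 1; // virtual dispatch, value-like storage
}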
It is not immediately clear what you mean by "polymorphism with values". In C++ when you have an object of type A, it always behaves as an object of type A. This is perfectly normal and logical thing to expect. I don't see how it can possible behave in any other way. So, it is not clear what "ability" that someone decided "not to provide" you are talking about.
Polymorphism in C++ means one thing: virtual function calls made through an expression with polymorphic type are resolved in accordance with the dynamic type of that expression (as opposed to static type for non-virtual functions). That's all there is to it.
Polymorphism in C++ always works in accordance with the above rule. It works that way through pointers. It works that way through references. It works that way through immediate objects ("values", as you called them). So, it is not correct to say that polymorphism in C++ only works with pointers and references. It works with "values" as well. They all follow the same rule, as stated above.
However, for an immediate object (a "value") its dynamic type is always the same as its static type. So, even though polymorphism works for immediate values, it does not demonstrate anything truly "polymorphic". The behavior of an immediate object with polymorphism is the same as it would be without polymorphism. So, polymorphism of an immediate object is degenerate, trivial polymorphism. It exists only conceptually. This is, again, perfectly logical: an object of type A should behave as an object of type A. How else can it behave?
In order to observe the actual non-degenerate polymorphism, one needs an expression whose static type is different from its dynamic type. Non-trivial polymorphism is observed when an expression of static type A behaves (with regard to virtual function calls) as an object of different type B. For this an expression of static type A must actually refer to an object of type B. This is only possible with pointers or references. The only way to create that difference between static and dynamic type of an expression is through using pointers or references.
In other words, it's not correct to say that polymorphism in C++ only works through pointers or references. It is correct to say that with pointers or references polymorphism becomes observable and non-trivial.
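A tiny sketch making the distinction visible (names are illustrative):

#include <iostream>

struct A { virtual const char* name() const { return "A"; } virtual ~A() = default; };
struct B : A { const char* name() const override { return "B"; } };

int main()
{
    B b;
    A value = b; // immediate object: dynamic type == static type (the B part is sliced away)
    A& ref = b;  // reference: static type A, dynamic type B

    std::cout << value.name() << '\n'; // "A": trivial, degenerate polymorphism
    std::cout << ref.name() << '\n';   // "B": observable polymorphism
}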
When should I define a type as a struct or as a class?
I know that structs are value types while classes are reference types. So I wonder, for example, should I define a stack as a struct or a class?
Reason #1 to choose struct vs class: classes have inheritance, structs do not. If you need polymorphism, you must use classes.
Reason #2: structs are normally value types (though you can make them reference types if you work at it). Classes are always reference types. So, if you want a value type, choose a struct. If you want a reference type, it's easiest to go with a class.
Reason #3: If you have a type with a lot of data members, then you're probably going to want a reference type (to avoid expensive copying), in which case, you're probably going to choose a class.
Reason #4: If you want deterministic destruction of your type, then it's going to need to be a struct on the stack. Nothing on the GC heap has deterministic destruction, and the destructors/finalizers of stuff on the GC heap may never be run. If they're collected by the GC, then their finalizers will be run, but otherwise, they won't. So, if you want your type to automatically be destroyed when it leaves scope, you need to use a struct and put it on the stack.
As for your particular case, containers should normally be reference types (copying all of their elements every time that you pass one around would be insanely expensive), and a Stack is a container, so you're going to want to use a class unless you want to go to the trouble of making it a ref-counted struct, which is decidedly more work. It just has the advantage of guaranteeing that its destructor will run when it's not used anymore.
On a side note, if you create a container which is a class, you're probably going to want to make it final so that its various functions can be inlined (and won't be virtual if that class doesn't derive from anything other than Object and they're not functions that Object has), which can be important for something like a container where performance can definitely matter.
Read "D"iving Into the D Programming Language
In D you get structs and then you get classes. They share many amenities but have different charters: structs are value types, whereas classes are meant for dynamic polymorphism and are accessed solely by reference. That way confusions, slicing-related bugs, and comments à la // No! Do NOT inherit! do not exist. When you design a type, you decide upfront whether it'll be a monomorphic value or a polymorphic reference. C++ famously allows defining ambiguous-gender types, but their use is rare, error-prone, and objectionable enough to warrant simply avoiding them by design.
For your Stack type, you are probably best off defining an interface first and then implementations thereof (using class) so that you don't tie a particular implementation of your Stack type to its interface.
Is there any efficiency disadvantage associated with deep inheritance trees in C++, i.e., a large set of classes A, B, C, and so on, such that B extends A, C extends B, and so on? One efficiency implication that I can think of is that when we instantiate the bottom-most class, say C, the constructors of B and A are also called, which will have performance implications.
Let's enumerate the operations we should consider:
Construction/destruction
Each constructor/destructor will call its base class equivalents. However, as James McNellis pointed out, you were obviously going to do that work anyway. You didn't derive from A just because it was there. So the work is going to get done one way or another.
Yes, it will involve a few more function calls. But function call overhead will be nothing compared to the actual work any significantly deep class hierarchy will have to actually do. If you're at the point where function call overhead is actually important for performance, I would strongly suggest that calling constructors at all is probably not what you want to be doing in that code.
Object Size
In general, the overhead for a derived class is nothing. The overhead for virtual members is a v-table pointer per object, plus more for virtual inheritance.
Member Function Calls, Static
By this, I mean calling non-virtual member functions, or calling virtual member functions with class names (ClassName::FunctionName syntax). Both of these allow the compiler to know at compile time which function to call.
The performance of this is invariant with the size of the hierarchy, since it's compile-time determined.
Member Function Calls, Dynamic
This is calling virtual functions with the full and complete expectation of runtime calls.
Under most sane C++ implementations, this is invariant with the size of the object hierarchy. Most implementations use a v-table for each class. Each object has a v-table pointer as a member. For any particular dynamic call, the compiler accesses the v-table pointer, picks out the method, and calls it. Since the v-table is the same for each class, it won't be any slower for a class that has a deep hierarchy than one with a shallow one.
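A sketch contrasting the two call forms; however deep the hierarchy gets, the dynamic call is still a single v-table lookup:

#include <iostream>

struct A { virtual void f() { std::cout << "A::f\n"; } virtual ~A() = default; };
struct B : A { void f() override { std::cout << "B::f\n"; } };
struct C : B { void f() override { std::cout << "C::f\n"; } };

int main()
{
    C c;
    A& a = c;
    a.f();    // dynamic dispatch: one v-table lookup, prints C::f
    a.A::f(); // qualified call: resolved at compile time, prints A::f
}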
Virtual inheritance complicates this a bit.
Pointer Casts, Static
This refers to static_cast or any equivalent operation. This means the implicit cast from a derived class to a base class, the explicit use of static_cast or C-style casts, etc.
Note that this technically includes reference casting.
The performance of static casts between classes (up or down) is invariant with the size of the hierarchy. Any pointer offsets will be compile-time generated. This should be true for virtual inheritance as well as non-virtual inheritance, but I'm not 100% certain of that.
Pointer Casts, Dynamic
This obviously refers to the explicit use of dynamic_cast. This is typically used when casting from a base class to a derived one.
The performance of dynamic_cast will likely change for a large hierarchy. But sane implementations should only check the classes between the current class and the requested one. So it's simply linear in the number of classes between the two, not linear in the number of classes in the hierarchy.
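A small sketch; a sane implementation only inspects the path between A and C at run time:

#include <iostream>

struct A { virtual ~A() = default; };
struct B : A {};
struct C : B {};

int main()
{
    A* a = new C;
    // The run-time check walks the classes between A and C,
    // not the program's entire class hierarchy.
    if (dynamic_cast<C*>(a))
        std::cout << "downcast to C succeeded\n";
    delete a;
}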
Typeid
This means the use of the typeid operator to fetch the std::type_info object associated with an object.
The performance of this will be invariant with the size of the hierarchy. If the class is polymorphic (has virtual functions), the type_info is simply pulled out of the vtable. If it's not polymorphic, the result is determined at compile time.
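A sketch of both cases; the exact spelling printed by name() is implementation-defined:

#include <iostream>
#include <typeinfo>

struct A { virtual ~A() = default; }; // polymorphic: typeid consults the v-table
struct B : A {};

int main()
{
    B b;
    A& a = b;
    std::cout << typeid(a).name() << '\n';   // run-time lookup: names B
    std::cout << typeid(int).name() << '\n'; // non-polymorphic operand: compile-time
}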
Conclusion
In short, most operations are invariant with the size of the hierarchy. But even in the cases where it has an impact, it's not a problem.
I'd be more concerned with some design ethic where you felt the need to build such a hierarchy. In my experience, hierarchies like this come from two lines of design.
The Java/C# ideal of having everything derived from a common base class. This is a horrible idea in C++ and should never be used. Each object should derive from what it needs to, and only that. C++ was built on the "pay for what you use" principle, and deriving from a common base works against that. In general, anything you could do with such a common base class is either something you shouldn't be doing period, or something that could be done with function overloading (using operator<< to convert to strings, for example).
Misuse of inheritance. Using inheritance when you should be using containment. Inheritance creates an "is a" relationship between objects. More often than not, "has a" relationships (one object having another as a member) are far more useful and flexible. They make it easier to hide data, and you don't allow the user to pretend one class is another.
Make sure that your design does not fall afoul of one of these principles.
There will be, but not as bad as the programmer-performance implications.
As Nicol points out, constructing such an object may be doing a number of things.
If those are things that you require to be done, regardless of design, because they are all precisely necessary steps in getting the program from the call to main to exit in the fewest possible cycles, then your design is simply a matter of coding clarity (or maybe lack of it :).
In my experience performance tuning, as in this example, what I often see as a huge source of wasted time is over-design of data (i.e. class) structures.
Weirdly enough, the justification for those data structures is often (guess what?) performance!
In my experience, the thing to do with data structures is to keep them as simple and as normalized as possible. If the data is completely normalized, then any single change to it can't make it inconsistent. You can't always achieve complete normalization, in which case you have to deal with the possibility that the data can be temporarily inconsistent.
This is why people write notification handlers, and this is encouraged in OOP.
The idea is, if you change something in one place, that can trigger notifications that "automatically" propagate the change to other places, trying to maintain consistency.
The problem with notifications is they can run away. Simply changing some boolean property from true to false can cause a fire-storm of notifications ripping through the data structure in ways no one programmer understands, updating databases, painting windows, zipping files, etc. etc. I often find this is where most clock cycles go.
I think it is simpler and far more efficient to temporarily tolerate inconsistency, and periodically repair it with some kind of sweeping process.
Another way data structures lead to huge inefficiency is when the data is effectively being interpreted by some process to produce some output.
This is very common in graphics.
If the data changes at a very slow rate, it may make sense to "compile" it rather than "interpret" it.
In other words, translate it into a simpler instruction set, or source code which is compiled "on the fly", which can then execute far more quickly to produce the desired output.
Given a variable foo of type FooClass* and a member variable bar in that class, is the distance between foo and &(foo->bar) the same in every situation, given the following constraints?
FooClass is a non-POD type.
We know that foo will always point to an instance of FooClass, and not some subtype of it.
We only care about behaviour under a single compiler and a single compilation; that is, the value this may result in under gcc is never used in code compiled with MSVC, and it is never saved to be re-used between compilations. It is computed in the binary and used in the binary, and that is it.
We don't use a custom new, although some instances of the class may be stack-allocated and some heap-allocated.
There is no explicit ctor for FooClass; it relies upon the compiler-generated one (and each of the fields in FooClass is either POD or default-constructible).
I can't find a guarantee either way on this in the standard (nor did I expect to), but my rudimentary testing with gcc leads me to believe that it will always be the case there. I also know that this guarantee is made for POD-types, but let us assume this type can't be POD.
An update/clarification: this is just for a single compilation of a single binary; the calculated offsets will never leave that single execution. Basically, I want to be able to uniquely identify the fields of a class in a static map and then be able to lookup into that map for some macro/template/EVIL trickery. It is merely for my own amusement, and no life support machines will rely on this code.
After you have compiled your program, Yes*.
The offset will remain constant.
There is one very important restriction, however: foo must be pointing specifically to a FooClass object. Not a class derived from FooClass, or anything else for that matter.
The reason that C++ makes the POD distinction regarding member offsets is because both multiple inheritance and the location (or lack of) a vtable pointer can create situations where the address of an object is not the same as the address of that object's base.
Under a single compiler where the compiler settings are always the same and there is nothing added to or taken away from FooClass, then yes, the distance between the address stored at foo and &(foo->bar) will always be the same, or the compiler wouldn't be able to generate proper code that worked across compilation units.
However, once you add anything to the class, or change the compiler settings, all bets are off.
As far as I know, this should always be the case, POD class or not. At compile time, based on the compiler, architecture, settings, etc., the compiler determines the size of the class and the offsets of all its members. This is then fixed for all instances of the class in the compilation unit (and by extension the linked unit, if the one-definition rule is preserved).
Since the compiler treats type pointers literally, even if the underlying type is wrong (eg: the pointer has been c-style cast incorrectly), the computed distance between &foo and &(foo.bar) will be the same, since the offset is known statically at compile time.
Note: This has definitely been done before, effectively. See, for example, Microsoft's ATL data binding code using their 'offsetof' macro...
I'm no expert, but I'm going to try answering you anyway :)
FooClass is a non-POD type. This means it could have more than one section of private, public, or protected members. Within such a section, the order is that of the definition of the members, but across those sections the order is arbitrary and unspecified.
foo will always point to a FooClass. So we have a guarantee that no offset adjustment is done. Within one compilation, at least, the offsets will be the same (I don't have the backing Standard quote, but it couldn't work if they were different).
We only care about behavior on a single compiler. Well, since the order of members is unspecified across sections with different access modifiers, and the compiler is allowed to put padding between members, this won't buy us much.
We only care about objects on the stack (automatic storage duration). I don't see how that changes anything about the object layout.
So, after all, I don't think you have any guarantee that the offset will be constant across compilations. Within a single compilation, though, the offset just can't differ (a compiler whose generated ABI changed within one compilation couldn't produce working code). But even if you know the offset, you can't legitimately use it to access the member: your only sanctioned way to access a member is the member access operators -> and . (that's said in 9.2/9).
Why not use data member pointers? They allow accessing members safely. Here is an example (looking up members by name):
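A minimal sketch (hypothetical names throughout); the point is the int FooClass::* pointer-to-member type, which needs no layout assumptions at all:

#include <iostream>
#include <map>
#include <string>

struct FooClass
{
    int bar = 1;
    int baz = 2;
};

int main()
{
    // Map member names to pointers to data members.
    std::map<std::string, int FooClass::*> members = {
        { "bar", &FooClass::bar },
        { "baz", &FooClass::baz },
    };

    FooClass foo;
    // foo.*ptr dereferences a pointer-to-member safely, with no
    // assumptions about offsets, padding, or access sections.
    std::cout << foo.*members.at("bar") << '\n'; // prints 1
    std::cout << foo.*members.at("baz") << '\n'; // prints 2
}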
There are two things off the top of my head that will affect the internal layout of an object:
the size of member objects. Hopefully you'll recompile all affected modules if you change a member definition.
packing pragmas may change the padding added between members. Be sure that the packing is the same for all modules that use the class, or at least the section where the class is defined. If you don't, you'll have bigger problems than unpredictable offsets.
Bottom line: If this class contains anything other than PODs, then you can make absolutely no assumptions about the offsets. If the class is just a collection of public PODs, then you're safe.
Here is a link to a portion of a chapter in an excellent intermediate C++ book. I recommend this book to anyone who is serious about C++.
This particular excerpt addresses a portion the question presented here:
http://my.safaribooksonline.com/0321321928/ch11?portal=oreilly
For the rest of the details, check out the book. My "bottom line" above is a simplistic summary of this chapter.
Yes, the offset is determined at compile-time, so as long as you're not comparing offsets across compilations or compilers, it will always be constant.