Reinterpret_cast vs placement new

From reading this post, it is clear that placement new in C++ is used to call a class constructor on a pre-allocated memory location.
In the case that the memory is already initialized, is a placement new or a reinterpret_cast more appropriate?
For example, let's say I read a raw stream of bytes representing a framed message from a TCP socket. I put this stream into a framesync and retrieve a buffer of a known size that represents my class, which I'll call Message. I know of two ways to proceed.
1. Create a constructor that takes a flag telling the class not to initialize. Do a placement new on the buffer, passing the "don't initialize" flag.
Message::Message( bool initialize )
{
    //
    // Initialize if requested
    //
    if( initialize )
    {
        Reset( );
    }
}
void Message::Reset( void )
{
    m_member1 = 1;
    m_member2 = 2;
}
Message* message = new ( buffer ) Message( false );
2. Use a reinterpret_cast.
Message* message = reinterpret_cast< Message* > ( buffer );
I believe that both of these will produce an identical result. Is one preferred over the other as more correct, more OO, safer, easier to read, or better style?

The only meaningful rule is this:
If an instance of some type T has already been constructed at address a, then reinterpret_cast<T*>(a) to get a pointer to the object that already exists.
If an instance of some type T has not yet been constructed at address a, then use placement new to construct an instance of type T at address a.
They are completely different operations.
The question you need to ask is very, very simple: "does the object already exist?"
If yes, you can access it (via a cast). If no, then you need to construct it (via placement new)
The two operations have nothing to do with each other.
It's not a question of which one you should prefer, because they do different things. You should prefer the one which does what you want.
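The rule above can be sketched in a few lines. This is a minimal illustration rather than code from the question: `Message` here is a hypothetical two-member struct, the helper names are made up, and `std::launder` assumes C++17 or later.

```cpp
#include <cassert>
#include <new>      // placement new, std::launder

// Hypothetical stand-in for the Message class in the question.
struct Message {
    int m_member1;
    int m_member2;
    Message() : m_member1(1), m_member2(2) {}
};

// Case 1: no Message exists in the buffer yet, so construct one there.
inline Message* construct_in(void* buffer) {
    return new (buffer) Message();
}

// Case 2: a Message has already been constructed in the buffer,
// so a cast is enough to reach the existing object.
inline Message* access_existing(unsigned char* buffer) {
    return std::launder(reinterpret_cast<Message*>(buffer));
}
```

The buffer has to be suitably aligned, for example `alignas(Message) unsigned char buffer[sizeof(Message)];`. Calling `construct_in(buffer)` once and `access_existing(buffer)` afterwards yields pointers to the same object.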

I would say neither.
Using placement new and having a special method of construction seems like a hack. For one thing the standard says that, for example, an int class member that's not initialized has 'indeterminate value' and accessing it 'may' result in undefined behavior. It's not specified that the int will assume the value of the unmodified underlying bytes interpreted as an int. I don't think that there's anything that prevents a conforming implementation from zero initializing the memory before calling the constructor.
For this use of reinterpret_cast to be well defined you have to jump through some hoops, and even then using the resulting object will probably violate strict aliasing rules.
More practically, if you directly send the implementation-specified representation of a class across the network you'll be relying on the communicating systems having compatible layouts (compatible representations, alignment, etc.).
Instead you should do real serialization and deserialization, for example by using memcpy() and ntohl()/ntohs() to get the data from the buffer into the members of an existing object.
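#include <arpa/inet.h>  // ntohl, ntohs (POSIX; Winsock provides them on Windows)
#include <stdint.h>
#include <string.h>

struct Message {
    uint32_t m_member1;
    uint16_t m_member2;
};

// Fills m from a buffer that holds the fields in network byte order.
void deserialize(Message &m, const char *buffer) {
    memcpy(&m.m_member1, buffer, sizeof m.m_member1);
    m.m_member1 = ntohl(m.m_member1);
    buffer += sizeof m.m_member1;

    memcpy(&m.m_member2, buffer, sizeof m.m_member2);
    m.m_member2 = ntohs(m.m_member2);
    buffer += sizeof m.m_member2;
}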
If you don't just use a preexisting library you'll probably want to wrap this stuff up in a framework of your own.
This way you don't have to deal with alignment issues, the network representation is well defined and can be passed between differing implementations, and the program doesn't use technically undefined behavior.
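For illustration, here is a self-contained variant of the same idea that swaps ntohl()/ntohs() for explicit shifts, so it needs no platform header. It is a sketch under assumptions: the wire format is taken to be a big-endian 4-byte field followed by a big-endian 2-byte field, and `decode`, `encode` and `kWireSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Message {
    std::uint32_t m_member1;
    std::uint16_t m_member2;
};

// Size of one Message on the wire: 4 + 2 bytes, no padding (assumed format).
constexpr std::size_t kWireSize = 6;

// Decode big-endian (network order) fields byte by byte, so the result
// does not depend on host endianness, padding, or alignment.
inline Message decode(const unsigned char* buf, std::size_t len) {
    assert(len >= kWireSize);
    Message m{};
    m.m_member1 = (std::uint32_t{buf[0]} << 24) | (std::uint32_t{buf[1]} << 16) |
                  (std::uint32_t{buf[2]} << 8)  |  std::uint32_t{buf[3]};
    m.m_member2 = static_cast<std::uint16_t>((buf[4] << 8) | buf[5]);
    return m;
}

// Encode the same fields back into network order.
inline void encode(const Message& m, unsigned char* buf) {
    buf[0] = static_cast<unsigned char>(m.m_member1 >> 24);
    buf[1] = static_cast<unsigned char>(m.m_member1 >> 16);
    buf[2] = static_cast<unsigned char>(m.m_member1 >> 8);
    buf[3] = static_cast<unsigned char>(m.m_member1);
    buf[4] = static_cast<unsigned char>(m.m_member2 >> 8);
    buf[5] = static_cast<unsigned char>(m.m_member2);
}
```

Because the bytes are read individually, the receive buffer needs no particular alignment and the in-memory layout of `Message` never touches the wire.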

Related

Initializing an array of trivially_copyable but not default_constructible objects from bytes. Confusion in [intro.object]

We are initializing (large) arrays of trivially_copyable objects from secondary storage, and questions such as this or this leave us with little confidence in our implemented approach.
Below is a minimal example to try to illustrate the "worrying" parts in the code.
Please also find it on Godbolt.
Example
Let's have a trivially_copyable but not default_constructible user type:
struct Foo
{
Foo(double a, double b) :
alpha{a},
beta{b}
{}
double alpha;
double beta;
};
Trusting cppreference:
Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
Now, we want to read a binary file into a dynamic array of Foo. Since Foo is not default constructible, we cannot simply:
std::unique_ptr<Foo[]> invalid{new Foo[dynamicSize]}; // Error, no default ctor
Alternative (A)
Using uninitialized unsigned char array as storage.
std::unique_ptr<unsigned char[]> storage{
new unsigned char[dynamicSize * sizeof(Foo)] };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << reinterpret_cast<Foo *>(storage.get())[index].alpha << "\n";
Is there UB because objects of actual type Foo are never explicitly created in storage?
Alternative (B)
The storage is explicitly typed as an array of Foo.
std::unique_ptr<Foo[]> storage{
static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << storage[index].alpha << "\n";
This alternative was inspired by this post. Yet, is it better defined? It seems there is still no explicit creation of objects of type Foo.
It is notably getting rid of the reinterpret_cast when accessing the Foo data member (this cast might have violated the Type Aliasing rule).
Overall Questions
Are any of these alternatives defined by the standard? Are they actually different?
If not, is there a correct way to implement this (without first initializing all Foo instances to values that will be discarded immediately after)?
Is there any difference in undefined behaviours between versions of the C++ standard?
(In particular, please see this comment with regard to C++20)

What you're trying to do ultimately is create an array of some type T by memcpying bytes from elsewhere without default constructing the Ts in the array first.
Pre-C++20 cannot do this without provoking UB at some point.
The problem ultimately comes down to [intro.object]/1, which defines the ways objects get created:
An object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created ([conv.rval], [class.temporary]).
If you have a pointer of type T*, but no T object has been created in that address, you can't just pretend that the pointer points to an actual T. You have to cause that T to come into being, and that requires doing one of the above operations. And the only available one for your purposes is the new-expression, which requires that the T is default constructible.
If you want to memcpy into such objects, they must exist first. So you have to create them. And for arrays of such objects, that means they need to be default constructible.
So if it is at all possible, you need a (likely defaulted) default constructor.
In C++20, certain operations can implicitly create objects (provoking "implicit object creation" or IOC). IOC only works on implicit-lifetime types, which for classes means:
A class S is an implicit-lifetime class if it is an aggregate or has at least one trivial eligible constructor and a trivial, non-deleted destructor.
Your class qualifies, as it has a trivial copy constructor (which is "eligible") and a trivial destructor.
If you create an array of byte-wise types (unsigned char, std::byte, or char), this is said to "implicitly create objects" in that storage. This property also applies to the memory returned by malloc and operator new. This means that if you do certain kinds of undefined behavior to pointers to that storage, the system will automatically create objects (at the point where the array was created) that would make that behavior well-defined.
So if you allocate such storage, cast a pointer to it to a T*, and then start using it as though it pointed to a T, the system will automatically create Ts in that storage, so long as it was appropriately aligned.
Therefore, your alternative A works just fine:
When you apply [index] to your casted pointer, C++ will retroactively create an array of Foo in that storage. That is, because you used the memory like an array of Foo exists there, C++20 will make an array of Foo exist there, exactly as if you had created it back at the new unsigned char statement.
However, alternative B will not work as is. You did not use new[] Foo to create the array, so you cannot use delete[] Foo to delete it. You can still use unique_ptr, but you'll have to create a deleter that explicitly calls operator delete on the pointer:
struct mem_delete
{
    template<typename T>
    void operator()(T *ptr)
    {
        ::operator delete[](ptr);
    }
};
std::unique_ptr<Foo[], mem_delete> storage{
static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << storage[index].alpha << "\n";
Again, storage[index] creates an array of T as if it were created at the time the memory was allocated.
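A compact, self-contained sketch of this approach follows. It assumes C++20 semantics for the implicit object creation; the code compiles under earlier standards too, but there the accesses are formally undefined, as explained above. `RawDelete`, `FooArray` and `load_foos` are hypothetical names, and a memory block stands in for the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

struct Foo {
    Foo(double a, double b) : alpha{a}, beta{b} {}
    double alpha;
    double beta;
};

static_assert(std::is_trivially_copyable<Foo>::value, "bytes may be copied into Foo");
static_assert(std::is_trivially_destructible<Foo>::value, "no destructor calls needed");

// Releases memory obtained from ::operator new[] without running destructors.
struct RawDelete {
    void operator()(Foo* p) const { ::operator delete[](p); }
};

using FooArray = std::unique_ptr<Foo[], RawDelete>;

// Copies count * sizeof(Foo) bytes from src into freshly allocated storage.
// Under C++20, operator new[] implicitly creates the Foo array that the
// later accesses require; before C++20 this is formally undefined.
inline FooArray load_foos(const void* src, std::size_t count) {
    void* raw = ::operator new[](count * sizeof(Foo));
    std::memcpy(raw, src, count * sizeof(Foo));
    return FooArray{static_cast<Foo*>(raw)};
}
```

The source bytes must have been produced from the same `Foo` layout, for example by the same program on the same platform.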

My first question is: What are you trying to achieve?
Is there an issue with reading each entry individually?
Are you assuming that your code will speed up by reading an array?
Is latency really a factor?
Why can't you just add a default constructor to the class?
Why can't you enhance input.read() to read directly into an array? See std::extent_v<T>
Assuming the constraints you defined, I would start with writing it the simple way, reading one entry at a time, and benchmark it.
Having said that, that which you describe is a common paradigm and, yes, can break a lot of rules.
C++ is very (overly) cautious about things like alignment which can be issues on certain platforms and non-issues on others. This is only "undefined behaviour" because no cross-platform guarantees can be given by the C++ standard itself, even though many techniques work perfectly well in practice.
The textbook way to do this is to create an empty buffer and memcpy into a proper object, but as your input is serialised (potentially by another system), there isn't actually a guarantee that the padding and alignment will match the memory layout which the local compiler determined for the sequence so you would still have to do this one item at a time.
My advice is to write a unit-test to ensure that there are no issues and potentially embed that into the code as a static assertion. The technique you described breaks some C++ rules but that doesn't mean it's breaking, for example, x86 rules.
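As a sketch of that advice, the layout assumptions can be written down as static assertions and the records copied one at a time into real objects. The record format (two consecutive 8-byte doubles in the host's byte order and floating-point representation) is an assumption, and `parse_foos` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct Foo {
    Foo(double a, double b) : alpha{a}, beta{b} {}
    double alpha;
    double beta;
};

// Compile-time checks that the local layout matches the assumed file layout
// (two consecutive 8-byte doubles per record, no padding).
static_assert(std::is_trivially_copyable<Foo>::value, "Foo must be trivially copyable");
static_assert(std::is_standard_layout<Foo>::value, "offsetof needs standard layout");
static_assert(sizeof(double) == 8, "record format assumes 8-byte doubles");
static_assert(offsetof(Foo, alpha) == 0, "unexpected layout");
static_assert(offsetof(Foo, beta) == sizeof(double), "unexpected layout");
static_assert(sizeof(Foo) == 2 * sizeof(double), "unexpected padding");

// Read one record at a time: memcpy into real doubles, then construct a Foo.
inline std::vector<Foo> parse_foos(const unsigned char* data, std::size_t count) {
    std::vector<Foo> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        double a, b;
        std::memcpy(&a, data + i * 16, sizeof a);
        std::memcpy(&b, data + i * 16 + 8, sizeof b);
        out.emplace_back(a, b);
    }
    return out;
}
```

This version never treats raw bytes as a `Foo`, so it does not depend on the object-lifetime rules discussed in the other answers, at the cost of one constructor call per record.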

Alternative (A): Accessing a —non-static— member of an object before its lifetime begins.
The behavior of the program is undefined (See: [basic.life]).
Alternative (B): Implicit call to the implicitly deleted default constructor.
The program is ill-formed (See: [class.default.ctor]).
I'm not sure about the latter. If someone more knowledgeable knows if/why this is UB please correct me.

You can manage the memory yourself, and then return a unique_ptr which uses a custom deleter. Since you can't use new[], you can't use the plain version of unique_ptr<T[]> and you need to manually call the destructor and deleter using an allocator.
#include <cstddef>
#include <memory>

template <class Allocator = std::allocator<Foo>>
struct FooDeleter : private Allocator {
    using traits = std::allocator_traits<Allocator>;
    using pointer = typename traits::pointer;
    explicit FooDeleter(const Allocator &alloc, std::size_t len) : Allocator(alloc), len(len) {}
    void operator()(pointer p) {
        Allocator &alloc = *this;
        for (pointer i = p; i != p + len; ++i) {
            traits::destroy(alloc, i);  // allocators provide destroy, not destruct
        }
        traits::deallocate(alloc, p, len);
    }
    std::size_t len;
};
std::unique_ptr<Foo[], FooDeleter<>> create(std::size_t len) {
    using traits = std::allocator_traits<std::allocator<Foo>>;
    std::allocator<Foo> alloc;
    Foo *p = nullptr, *i = nullptr;
    try {
        p = traits::allocate(alloc, len);
        for (i = p; i != p + len; ++i) {
            traits::construct(alloc, i, 1.0, 2.0);
        }
    } catch (...) {
        while (i > p) {
            traits::destroy(alloc, --i);  // destroy only the elements already constructed
        }
        if (p)
            traits::deallocate(alloc, p, len);
        throw;
    }
    return std::unique_ptr<Foo[], FooDeleter<>>{p, FooDeleter<>(alloc, len)};
}

C++, Resetting class members without per-member-assignment

In plain C it's common to reset a struct after instantiation:
struct MyClass obj;
memset( &obj, 0, sizeof(struct MyClass) );
This is convenient - especially when using an object oriented paradigm, since all members are guaranteed to be reset to null etc. no matter how many members are added over time.
I'm looking for a way to do the same in C++. Obviously you can't simply reset the memory since the vtable is part of it. Also, in my particular case I can't use templates.
One solution I've seen is to declare a struct with all members, which you in turn can reset in a single blow:
class MyClass{
MyClass(){ memset(&m, 0, sizeof(m)); }
struct{
int member;
} m;
};
I'm however not very fond of this solution.
I guess "hacks" are available, and if you know one, please also say something about the risks of using it, e.g. if it can differ between compilers etc.
Thanks

If you want to assure that you allocated a memory block with zeros you can use a placement new operator:
size_t sz = sizeof(MyClass);
char *buf = new char[sz];
memset(buf, 0, sz);
MyClass* instance = new (buf) MyClass;

Not to be a spoilsport, but why don't you simply add an initializer list for the members that need a defined value for the object to be valid, and let the compiler figure out the ideal way to initialize the object.
Depending on the type of the object, this can save quite an amount of code size and/or time, plus it is safe even if the "empty" representation for a certain type is not all-zeros. For example, I once hacked a compiler so that the NULL pointer is 0x02000000, and converting between integers and pointers XORs with that value; a memset-based reset would then leave any pointer members with non-NULL values.
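One way to act on this suggestion without templates or memset is sketched below. It is not from the thread: the `Members` struct and `reset()` are hypothetical names, and default member initializers assume C++11 or later.

```cpp
#include <cassert>
#include <string>

class MyClass {
public:
    virtual ~MyClass() = default;   // virtual function: memset on *this would be unsafe

    // Restore every data member to its declared default in one statement.
    void reset() { m = Members{}; }

    int member() const { return m.member; }
    void set_member(int v) { m.member = v; }
    const std::string& name() const { return m.name; }
    void set_name(const std::string& n) { m.name = n; }

private:
    // All state lives in one nested struct with default member initializers,
    // so newly added members are reset automatically.
    struct Members {
        int member = 0;
        double ratio = 0.0;
        std::string name;
    };
    Members m;
};
```

Because the reset goes through ordinary assignment, it also works for members such as `std::string` that must never be overwritten with raw zero bytes.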

memset() or value initialization to zero out a struct?

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?

Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second initializes an object. Let me explain it with a bit of code:
Let's assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well
In this case writing POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference, as the only difference we have here is the padding bytes being set to zero when memset is used. Since you don't normally have access to those bytes, there's no difference for you.
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case using an expression like TestStruct t = {} is good, and using a memset on it will lead to a crash. Here's what happens if you use memset - an object of type TestStruct is created, thus creating an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b was located to a certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed, and when the turn comes to its member std::string b you'll see a crash, as all of that object's internal structures were ruined by the memset.
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote - use memset on objects only if it is required, and use value initialization x = {} in all other cases.

Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).
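For the case of re-zeroing an existing structure, C++11 and later also allow assigning a value-initialized temporary, which keeps the value-zero semantics described above. A minimal sketch, with `Settings` and `clear` as hypothetical names:

```cpp
#include <cassert>

// Hypothetical C-style struct standing in for a Win32-like STRUCT.
struct Settings {
    int size;
    double scale;
    const char* label;
};

// Re-zero an existing object by assigning a value-initialized temporary.
// Members become 0, 0.0 and a null pointer; padding bytes are not
// guaranteed to be zeroed, unlike with memset.
inline void clear(Settings& s) {
    s = Settings{};
}
```

If an API really requires all-bits-zero, including padding, memset remains the tool for the job.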

If your struct contains things like :
int a;
char b;
int c;
Then bytes of padding will be inserted between b and c. memset will zero those, the other way will not, so there will be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.
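The amount of padding can be measured rather than assumed. A small sketch using offsetof, where `Padded` and the two helper functions are hypothetical names; the exact numbers depend on the platform's int size and alignment:

```cpp
#include <cassert>
#include <cstddef>

struct Padded {
    int a;
    char b;
    int c;
};

// Number of padding bytes the compiler inserted between b and c.
constexpr std::size_t padding_after_b() {
    return offsetof(Padded, c) - (offsetof(Padded, b) + sizeof(char));
}

// Total padding in the struct: its size minus the sizes of its members.
constexpr std::size_t total_padding() {
    return sizeof(Padded) - (sizeof(int) + sizeof(char) + sizeof(int));
}
```

On a platform with 4-byte, 4-byte-aligned ints, `padding_after_b()` evaluates to 3, matching the figure given above.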

I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.

Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while memset is only guaranteed to do so where all-bits-zero represents 0.0 (true for IEEE 754, but not required by the standard).
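Whether memset produces zero floats depends on the floating-point representation rather than on memset itself. A small check, assuming a hypothetical `Sample` struct; on IEEE 754 platforms all-bits-zero is +0.0, but the C++ standard does not require that representation:

```cpp
#include <cassert>
#include <cstring>
#include <limits>

struct Sample {
    float f;
    double d;
};

// True if both floating-point types use the IEEE 754 (IEC 559) format.
constexpr bool kIeee754 = std::numeric_limits<float>::is_iec559 &&
                          std::numeric_limits<double>::is_iec559;

// Returns true if a memset-zeroed Sample compares equal to 0.0 in both
// members, i.e. if all-bits-zero is the representation of zero here.
inline bool memset_gives_zero_floats() {
    Sample s;
    std::memset(&s, 0, sizeof s);
    return s.f == 0.0f && s.d == 0.0;
}
```

Value initialization (`Sample s = {};`) gives 0.0 regardless of the representation, which is the portable guarantee.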

Value initialization is preferred because it can be done at compile time.
It also correctly zero-initializes all POD types.
The memset is done at runtime.
Using memset is also suspect if the struct is not POD.
It is not guaranteed to initialize non-integer types (such as pointers and floating-point members) to zero.

In some compilers STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup, so the compiler has library functions like memset/memcpy available to use.

If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls you can avoid random crashes from trying to dereference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

Managing C++ objects in a buffer, considering the alignment and memory layout assumptions

I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of the object.
If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
e.g. say I have the following class:
[int,int,int,int,char,padding*3bytes,unsigned short int*]
1)
if I know this class to be of size 24 and I know the address of where it starts in memory
whilst it is not safe to assume the memory layout is it acceptable to cast this to a pointer and call functions on this object which access these members?
(Does c++ know by some magic the correct position of a member?)
2)
If this is not safe/ok, is there any other way other than using a constructor which takes all of the arguments and pulling each argument out of the buffer one at a time?
Edit: Changed title to make it more appropriate to what I am asking.

You can create a constructor that takes all the members and assigns them, then use placement new.
class Foo
{
int a;int b;int c;int d;char e;unsigned short int*f;
public:
Foo(int A,int B,int C,int D,char E,unsigned short int*F) : a(A), b(B), c(C), d(D), e(E), f(F) {}
};
...
char *buf = new char[sizeof(Foo)]; //pre-allocated buffer
Foo *f = new (buf) Foo(a,b,c,d,e,f);
This has the advantage that even the v-table will be generated correctly. Note, however, if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to use some sort of method to convert pointers into offsets and then back again.
Individual methods on a this pointer are statically linked and are simply a direct call to the function with this being the first parameter before the explicit parameters.
Member variables are referenced using an offset from the this pointer. If an object is laid out like this:
0: vtable
4: a
8: b
12: c
etc...
a will be accessed by dereferencing this + 4 bytes.

Basically what you are proposing is reading in a bunch of (hopefully not random) bytes, casting them to a known object, and then calling a class method on that object. It might actually work, because those bytes are going to end up in the "this" pointer in that class method. But you're taking a real chance on things not being where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch these sorts of problems, so at best you'll get a core dump, and at worst you'll get corrupted memory.
It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there to do that.

Non-virtual function calls are linked directly just like a C function. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.

It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.
If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer to a pointer to your type (or possibly cast it, but beware, there are some platform-specific gotchas with casts to pointers of different types).
If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.
You can, however, initialize a non-POD with data from a POD.
As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:
class Foo{
int a;
int b;
public:
void DoSomething(int x);
};
void Foo::DoSomething(int x){a = x * 2; b = x + a;}
int main(){
Foo f;
f.DoSomething(42);
return 0;
}
the compiler generates code that does something like this:
function main:
1. allocate 8 bytes on stack for object "f"
2. call default initializer for class "Foo" (does nothing in this case)
3. push argument value 42 onto stack
4. push pointer to object "f" onto stack
5. make call to function Foo_i_DoSomething#4 (actual name is usually more complex)
6. load return value 0 into accumulator register
7. return to caller
function Foo_i_DoSomething#4 (located elsewhere in the code segment):
1. load "x" value from stack (pushed on by caller)
2. multiply by 2
3. load "this" pointer from stack (pushed on by caller)
4. calculate offset of field "a" within a Foo object
5. add calculated offset to this pointer, loaded in step 3
6. store product, calculated in step 2, to offset calculated in step 5
7. load "x" value from stack, again
8. load "this" pointer from stack, again
9. calculate offset of field "a" within a Foo object, again
10. add calculated offset to this pointer, loaded in step 8
11. load "a" value stored at offset
12. add "a" value, loaded in step 11, to "x" value loaded in step 7
13. load "this" pointer from stack, again
14. calculate offset of field "b" within a Foo object
15. add calculated offset to this pointer, loaded in step 13
16. store sum, calculated in step 12, to offset calculated in step 15
17. return to caller
In other words, it would be more or less the same code as if you had written this (specifics, such as name of DoSomething function and method of passing this pointer are up to the compiler):
class Foo{
int a;
int b;
friend void Foo_DoSomething(Foo *f, int x);
};
void Foo_DoSomething(Foo *f, int x){
f->a = x * 2;
f->b = x + f->a;
}
int main(){
Foo f;
Foo_DoSomething(&f, 42);
return 0;
}

An object having POD type, in this case, is already created (whether or not you call new; allocating the required storage already suffices), and you can access its members, including calling a function on that object. But that will only work if you precisely know the required alignment of T, the size of T (the buffer may not be smaller than it), and the alignment of all the members of T. Even for a POD type, the compiler is allowed to put padding bytes between members, if it wants. For non-POD types, you can have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and the same applies to the base and all its non-static members too.
For all other types, all bets are off. You have to read values out first with a POD, and then initialize a non-POD type with that data.
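The two-step approach in the last sentence might look like the following sketch. The names are hypothetical (`RecordData` for the POD that mirrors the buffer, `Record` for the non-POD type), and the buffer is assumed to hold the bytes of a `RecordData` written by the same program on the same platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Plain data exactly as it is stored in the buffer (hypothetical layout).
struct RecordData {
    int id;
    int count;
    char tag;
};
static_assert(std::is_trivially_copyable<RecordData>::value, "memcpy requires this");

// Non-POD type: has a constructor, a virtual function and a std::string member.
class Record {
public:
    explicit Record(const RecordData& d)
        : id_(d.id), count_(d.count), tag_(1, d.tag) {}
    virtual ~Record() = default;
    int id() const { return id_; }
    int count() const { return count_; }
    const std::string& tag() const { return tag_; }
private:
    int id_;
    int count_;
    std::string tag_;
};

// Step 1: copy the bytes into a real RecordData (no alignment requirement on
// the buffer). Step 2: build the non-POD object from it.
inline Record read_record(const unsigned char* buffer, std::size_t len) {
    assert(len >= sizeof(RecordData));
    RecordData d;
    std::memcpy(&d, buffer, sizeof d);
    return Record(d);
}
```

The vtable pointer and the string's internals are set up by the constructor, so nothing in the buffer can corrupt them.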

I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
This is acceptable to the extent that using casts is acceptable:
#include <iostream>
namespace {
    class A {
        int i;
        int j;
    public:
        int value()
        {
            return i + j;
        }
    };
}
int main()
{
    // An int array has the size and alignment that A (two ints) needs.
    int buffer[] = { 1, 2 };
    std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}
Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointer to member functions.
say I have the following class: ...
if I know this class to be of size 24 and I know the address of where it starts in memory ...
This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes) plus any padding plus any function pointers or implementation-dependent information, minus anything saved from certain size optimizations (empty base class optimization). If the resulting number is 0 bytes, then the object is required to take at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.
If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).
Oh, and when copying objects you need to make sure you're properly handling pointers.
You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.
Yes these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:
In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. (An object consisting of only a 32-bit int and a 64-bit double -- 96 bits of data -- will require) two words for the object header, one word for the int, two words for the double. That's 5 words: 160 bits. Because of the alignment, this object will occupy 192 bits of memory.
This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive location).
Another tactic for memory alignment can reclaim some of that space.

If the class contains no virtual functions (and therefore class instances have no vptr), and if you make correct assumptions about the way in which the class' member data is laid out in memory, then doing what you're suggesting might work (but might not be portable).
Yes, another way (more idiomatic but not much safer ... you still need to know how the class lays out its data) would be to use the so-called "placement operator new" and a default constructor.

That depends upon what you mean by "safe". Any time you cast a memory address into a pointer in this way you are bypassing the type safety features provided by the compiler, and taking the responsibility on yourself. If, as Chris implies, you make an incorrect assumption about the memory layout, or compiler implementation details, then you will get unexpected results and lose portability.
Since you are concerned about the "safety" of this programming style it is likely worth your while to investigate portable and type-safe methods such as pre-existing libraries, or writing a constructor or assignment operator for the purpose.

What is the Performance, Safety, and Alignment of a Data member hidden in an embedded char array in a C++ Class?

I have seen a codebase recently that I fear is violating alignment constraints. I've scrubbed it to produce a minimal example, given below. Briefly, the players are:
Pool. This is a class which allocates memory efficiently, for some definition of 'efficient'. Pool is guaranteed to return a chunk of memory that is aligned for the requested size.
Obj_list. This class stores homogeneous collections of objects. Once the number of objects exceeds a certain threshold, it changes its internal representation from a list to a tree. The size of Obj_list is one pointer (8 bytes on a 64-bit platform). Its populated store will of course exceed that.
Aggregate. This class represents a very common object in the system. Its history goes back to the early 32-bit workstation era, and it was 'optimized' (in that same 32-bit era) to use as little space as possible as a result. Aggregates can be empty, or manage an arbitrary number of objects.
In this example, Aggregate items are always allocated from Pools, so they are always aligned. The only occurrences of Obj_list in this example are the 'hidden' members in Aggregate objects, and therefore they are always allocated using placement new. Here are the support classes:
class Pool
{
public:
Pool();
virtual ~Pool();
void *allocate(size_t size);
static Pool *default_pool(); // returns a global pool
};
class Obj_list
{
public:
inline void *operator new(size_t s, void * p) { return p; }
Obj_list(const Args *args);
// when constructed, Obj_list will allocate representation_p, which
// can take up much more space.
~Obj_list();
private:
Obj_list_store *representation_p;
};
And here is Aggregate. Note the declaration of the member member_list_store_d:
// Aggregate is derived from Lesser, which is twelve bytes in size
class Aggregate : public Lesser
{
public:
inline void *operator new(size_t s) {
return Pool::default_pool()->allocate(s); // default_pool is a static function, so it must be called
}
inline void *operator new(size_t s, Pool *h) {
return h->allocate(s);
}
public:
Aggregate(const Args *args = NULL);
virtual ~Aggregate() {}
inline const Obj_list *member_list_store_p() const;
protected:
char member_list_store_d[sizeof(Obj_list)];
};
It is that data member that I'm most concerned about. Here is the pseudocode for initialization and access:
Aggregate::Aggregate(const Args *args)
{
if (args) {
new (static_cast<void *>(member_list_store_d)) Obj_list(args);
}
else {
zero_out(member_list_store_d);
}
}
inline const Obj_list *Aggregate::member_list_store_p() const
{
return initialized(member_list_store_d) ? (Obj_list *) &member_list_store_d : 0;
}
You may be tempted to suggest that we replace the char array with a pointer to the Obj_list type, initialized to NULL or an instance of the class. This gives the proper semantics, but just shifts the memory cost around. If memory were still at a premium (and it might be, this is an EDA database representation), replacing the char array with a pointer to an Obj_list would cost one more pointer in the case when Aggregate objects do have members.
Besides that, I don't really want to get distracted from the main question here, which is alignment. I think the above construct is problematic, but can't really find more in the standard than some vague discussion of the alignment behavior of the 'system/library' new.
So, does the above construct do anything more than cause an occasional pipe stall?
Edit: I realize that there are ways to replace the approach using the embedded char array. So did the original architects. They discarded them because memory was at a premium. Now, if I have a reason to touch that code, I'll probably change it.
However, my question, about the alignment issues inherent in this approach, is what I hope people will address. Thanks!

Ok - had a chance to read it properly. You have an alignment problem, and invoke undefined behaviour when you access the char array as an Obj_list. Most likely your platform will do one of three things: let you get away with it, let you get away with it at a runtime penalty or occasionally crash with a bus error.
Your portable options to fix this are:
allocate the storage with malloc or a global allocation function, but you think this is too expensive.
as Arkadiy says, make your buffer an Obj_list member:
Obj_list list;
but you now don't want to pay the cost of construction. You could mitigate this by providing an inline do-nothing constructor to be used only to create this instance - as posted the default constructor would do. If you follow this route, strongly consider invoking the dtor
list.~Obj_list();
before doing a placement new into this storage.
Otherwise, I think you are left with non-portable options: either rely on your platform's tolerance of misaligned accesses, or else use any non-portable alignment controls your compiler gives you.
Disclaimer: It's entirely possible I'm missing a trick with unions or some such. It's an unusual problem.
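For readers on C++11 or later, the alignas specifier is a standard way to request suitable alignment for the embedded buffer without adding a pointer. The following is a minimal sketch under stated assumptions, not the original codebase: Obj_list is reduced to a stand-in with a single pointer member, Lesser and Pool are omitted, and the "initialized" test is assumed to mean "any byte is non-zero", which in turn assumes a null pointer is stored as all-zero bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Simplified stand-in for the question's Obj_list: one pointer-sized member.
class Obj_list {
public:
    explicit Obj_list(int* store) : representation_p(store) {}
    int* store() const { return representation_p; }
private:
    int* representation_p;
};

class Aggregate {
public:
    explicit Aggregate(int* store = nullptr) {
        if (store) {
            // Construct the hidden member in the aligned buffer.
            new (static_cast<void*>(member_list_store_d)) Obj_list(store);
        } else {
            std::memset(member_list_store_d, 0, sizeof member_list_store_d);
        }
    }
    ~Aggregate() {
        // Destroy the hidden member only if one was constructed.
        if (const Obj_list* p = member_list_store_p()) {
            const_cast<Obj_list*>(p)->~Obj_list();
        }
    }
    // Returns the hidden member, or nullptr if the buffer is still zeroed.
    // Strict C++17 code would wrap the cast result in std::launder.
    const Obj_list* member_list_store_p() const {
        for (char c : member_list_store_d) {
            if (c != 0) {
                return reinterpret_cast<const Obj_list*>(member_list_store_d);
            }
        }
        return nullptr;
    }
private:
    // alignas raises the buffer's alignment to that of Obj_list.
    alignas(Obj_list) char member_list_store_d[sizeof(Obj_list)];
};

static_assert(alignof(Aggregate) >= alignof(Obj_list),
              "Aggregate must be at least as aligned as its hidden member");
```

Raising the buffer's alignment may add padding before it inside Aggregate, so it is worth comparing sizeof(Aggregate) before and after the change if memory is still at a premium.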

The alignment will be picked by the compiler according to its defaults; this will probably end up as four bytes under GCC / MSVC.
This should only be a problem if there is code (SIMD/DMA) that requires a specific alignment. In this case you should be able to use compiler directives to ensure that member_list_store_d is aligned, or increase the size by (alignment-1) and use an appropriate offset.
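The "increase the size by (alignment-1) and use an appropriate offset" idea can be sketched roughly as follows. The names align_up and Padded are hypothetical, double stands in for whatever type the buffer is meant to hold, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the first address at or after p that is a multiple of alignment.
// alignment must be a non-zero power of two.
inline void* align_up(void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment - 1);
    const std::uintptr_t aligned = (addr + mask) & ~mask;
    // Advance the original pointer by the computed offset.
    return static_cast<char*>(p) + (aligned - addr);
}

// A buffer with (alignment - 1) spare bytes, so that an aligned region of
// kSize bytes always fits somewhere inside it.
struct Padded {
    static constexpr std::size_t kAlign = alignof(double);
    static constexpr std::size_t kSize = sizeof(double);
    char raw[kSize + kAlign - 1];

    void* aligned_storage() { return align_up(raw, kAlign); }
};
```

The pointer returned by aligned_storage() is what would be passed to placement new. Note that this spends up to alignment-1 extra bytes per object, which works against the memory savings the embedded array was meant to provide.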

Can you simply have an instance of Obj_list inside Aggregate? IOW, something along the lines of
class Aggregate : public Lesser
{
...
protected:
Obj_list list;
};
I must be missing something, but I can't figure why this is bad.
As to your question - it's entirely compiler-dependent. Most compilers, though, will align every member at a word boundary by default, even if the member's type does not need to be aligned that way for correct access.

If you want to ensure alignment of your structures, just do a
// MSVC
#pragma pack(push,1)
// structure definitions
#pragma pack(pop)
// *nix
struct YourStruct
{
....
} __attribute__((packed));
This gives 1-byte alignment for the char array in Aggregate.

Allocate the char array member_list_store_d with malloc or global operator new[], either of which will give storage aligned for any type.
Edit: Just read the OP again - you don't want to pay for another pointer. Will read again in the morning.
