Forwarding to in-place constructor - c++

I have a message class which was previously a bit of a pain to work with: you had to construct the message, tell it to allocate space for your object, and then populate that space either by construction or memberwise.
I want to make it possible to construct the message object with an immediate, inline new of the resulting object, but to do so with a simple syntax at the call site while ensuring copy elision.
#include <cstdint>
typedef uint8_t id_t;
enum class MessageID { WorldPeace };
class Message
{
uint8_t* m_data; // current memory
uint8_t m_localData[64]; // up to 64 bytes.
id_t m_messageId;
size_t m_size; // amount of data used
size_t m_capacity; // amount of space available
// ...
public:
Message(size_t requestSize, id_t messageId)
: m_data(m_localData)
, m_messageId(messageId)
, m_size(0), m_capacity(sizeof(m_localData))
{
grow(requestSize);
}
void grow(size_t newSize)
{
if (newSize > m_capacity)
{
m_data = (uint8_t*)realloc((m_data == m_localData) ? nullptr : m_data, newSize); // realloc returns void*, which C++ will not convert implicitly
assert(m_data != nullptr); // my system uses less brutal mem mgmt
m_size = newSize;
}
}
template<typename T>
T* allocatePtr()
{
size_t offset = m_size;
grow(offset + sizeof(T));
return (T*)(m_data + offset);
}
#ifdef USE_CPP11
template<typename T, typename... Args>
Message(id_t messageId, Args&&... args)
: Message(sizeof(T), messageId)
{
// we know m_data points to a large enough buffer
new ((T*)m_data) T (std::forward<Args>(args)...);
}
#endif
};
Pre-C++11 I had a nasty macro, CONSTRUCT_IN_PLACE, which did:
#define CONSTRUCT_IN_PLACE(Message, Typename, ...) \
new ((Message).allocatePtr<Typename>()) Typename (__VA_ARGS__)
And you would say:
Message outgoing(sizeof(MyStruct), MessageID::WorldPeace);
CONSTRUCT_IN_PLACE(outgoing, MyStruct, wpArg1, wpArg2, wpArg3);
With C++11, you would use
Message outgoing<MyStruct>(MessageID::WorldPeace, wpArg1, wpArg2, wpArg3);
But I find this to be messy. What I want to implement is:
template<typename T>
Message(id_t messageId, T&& src)
: Message(sizeof(T), messageId)
{
// we know m_data points to a large enough buffer
new ((T*)m_data) T (src);
}
So that the user uses
Message outgoing(MessageID::WorldPeace, MyStruct(wpArg1, wpArg2, wpArg3));
But it seems that this first constructs a temporary MyStruct on the stack, turning the in-place new into a call to the move constructor of T.
Many of these messages are simple, often POD, and they are often in marshalling functions like this:
void dispatchWorldPeace(int wpArg1, int wpArg2, int wpArg3)
{
Message outgoing(MessageID::WorldPeace, MyStruct(wpArg1, wpArg2, wpArg3));
outgoing.send(g_listener);
}
So I want to avoid creating an intermediate temporary that is going to require a subsequent move/copy.
It seems like the compiler should be able to eliminate the temporary and the move and forward the construction all the way down to the in-place new.
What am I doing that is causing it not to? (GCC 4.8.1, Clang 3.5, MSVC 2013)

You won't be able to elide the copy/move in the placement new: copy elision is entirely based on the idea that the compiler knows at construction time where the object will eventually end up. Also, since copy elision actually changes the behavior of the program (after all, it won't call the respective constructor and destructor even if they have side effects), copy elision is limited to a few very specific cases (listed in 12.8 [class.copy] paragraph 31: essentially when returning a local variable by name, when throwing a local variable by name, when catching an exception of the correct type by value, and when copying/moving a temporary variable; see the clause for exact details). Since [placement] new is none of the contexts where the copy can be elided, and the argument to the constructor is clearly not a temporary (it is named), the copy/move will never be elided. Even adding the missing std::forward<T>(...) to your constructor will not cause the copy/move to be elided:
template<typename T>
Message(id_t messageId, T&& src)
: Message(sizeof(T), messageId)
{
// placement new take a void* anyway, i.e., no need to cast
new (m_data) T (std::forward<T>(src));
}
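To see the difference in numbers, here is a minimal sketch that counts move constructions for both call shapes. Tracker, build_from_object, build_in_place and count_moves are made-up names for illustration, not part of the Message class in the question:

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical payload that counts how its instances come into being.
struct Tracker {
    static int direct;  // built straight from constructor arguments
    static int moved;   // built by the move constructor
    int a, b;
    Tracker(int a_, int b_) : a(a_), b(b_) { ++direct; }
    Tracker(Tracker&& other) noexcept : a(other.a), b(other.b) { ++moved; }
};
int Tracker::direct = 0;
int Tracker::moved = 0;

// Same shape as Message(id, T&& src): the payload object already exists
// when this runs, so placement new can only move (or copy) from it.
template <typename T, typename Payload = typename std::decay<T>::type>
Payload* build_from_object(void* storage, T&& src) {
    return new (storage) Payload(std::forward<T>(src));
}

// Same shape as the variadic constructor: only the arguments travel,
// and the payload is constructed exactly once, inside the buffer.
template <typename Payload, typename... Args>
Payload* build_in_place(void* storage, Args&&... args) {
    return new (storage) Payload(std::forward<Args>(args)...);
}

// Returns {moves when passing a temporary, moves when forwarding arguments}.
std::pair<int, int> count_moves() {
    alignas(Tracker) unsigned char buffer[sizeof(Tracker)];

    Tracker::moved = 0;
    build_from_object(buffer, Tracker(1, 2));
    const int with_temporary = Tracker::moved;

    Tracker::moved = 0;
    build_in_place<Tracker>(buffer, 3, 4);
    const int with_arguments = Tracker::moved;

    return {with_temporary, with_arguments};
}
```

Calling count_moves() should report one move for the temporary-based call and none for the argument-forwarding call, which is the behaviour described above.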
I don't think you can explicitly specify a template parameter when calling a constructor. Thus, I think the closest you could probably get without constructing the object ahead of time and getting it copied/moved is something like this:
template <typename>
struct Tag {};
template <typename T, typename... A>
Message::Message(Tag<T>, id_t messageId, A&&... args)
: Message(sizeof(T), messageId) {
new(this->m_data) T(std::forward<A>(args)...);
}
One approach which might make things a bit nicer is using the id_t to map to the relevant type assuming that there is a mapping from message Ids to the relevant type:
typedef uint8_t id_t;
template <typename T, id_t id> struct Tag {};
struct MessageId {
static constexpr Tag<MyStruct, 1> WorldPeace{}; // a constexpr member needs an initializer
// ...
};
template <typename T, id_t id, typename... A>
Message::Message(Tag<T, id>, A&&... args)
: Message(sizeof(T), id) {
new(this->m_data) T(std::forward<A>(args)...);
}
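Put together, the tag idea could look roughly like the sketch below. It is a simplified stand-in, not the Message class from the question: msg_id, MyStruct and the fixed 64-byte buffer are assumptions, there is no heap fallback, and the payload is assumed to be trivially destructible. The tags live in a namespace rather than a struct only to avoid needing an out-of-class definition before C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>

using msg_id = std::uint8_t;  // hypothetical stand-in for the question's id_t

// A tag carries both the payload type and the numeric id at compile time.
template <typename T, msg_id Id>
struct Tag {};

struct MyStruct {  // hypothetical payload
    int a, b, c;
    MyStruct(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {}
};

namespace MessageId {
constexpr Tag<MyStruct, 1> WorldPeace{};
}

class Message {
    alignas(std::max_align_t) unsigned char m_local[64];
    msg_id m_id;
    std::size_t m_size;

public:
    // T and Id are deduced from the tag; the remaining arguments are
    // forwarded straight to T's constructor inside the buffer.
    template <typename T, msg_id Id, typename... A>
    explicit Message(Tag<T, Id>, A&&... args) : m_id(Id), m_size(sizeof(T)) {
        static_assert(sizeof(T) <= sizeof(m_local), "payload must fit the local buffer");
        static_assert(std::is_trivially_destructible<T>::value,
                      "this sketch never runs the payload destructor");
        new (m_local) T(std::forward<A>(args)...);
    }

    msg_id id() const { return m_id; }
    std::size_t size() const { return m_size; }

    // The caller is trusted to ask for the type that was stored.
    template <typename T>
    const T& as() const { return *reinterpret_cast<const T*>(m_local); }
};
```

The call site then reads Message outgoing(MessageId::WorldPeace, 1, 2, 3); and no MyStruct temporary is created along the way.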

Foreword
The conceptual barrier that even C++2049 cannot cross is that you require all the bits that compose your message to be aligned in a contiguous memory block.
The only way C++ can give you that is through the use of the placement new operator. Otherwise, objects will simply be constructed according to their storage class (on the stack or through whatever you define as a new operator).
It means any object you pass to your payload constructor will be first constructed (on the stack) and then used by the constructor (that will most likely copy-construct it).
Avoiding this copy completely is impossible. You may have a forward constructor doing the minimal amount of copy, but still the scalar parameters passed to the initializer will likely be copied, as will any data that the constructor of the initializer deemed necessary to memorize and/or produce.
If you want to be able to pass parameters freely to each of the constructors needed to build the complete message without them being first stored in the parameter objects, it will require
the use of a placement new operator for each of the sub-objects that compose the message,
the memorization of each single scalar parameter passed to the various sub-constructors,
specific code for each object to feed the placement new operator with the proper address and call the constructor of the sub-object.
You will end up with a toplevel message constructor taking all possible initial parameters and dispatching them to the various sub-objects constructors.
I don't even know if this is feasible, but the result would be very fragile and error-prone at any rate.
Is that what you want, just for the benefit of a bit of syntactic sugar?
If you're offering an API, you cannot cover all cases. The best approach is to make something that degrades nicely, IMHO.
The simple solution would be to limit payload constructor parameters to scalar values or implement "in-place sub-construction" for a limited set of message payloads that you can control. At your level you cannot do more than that to make sure the message construction proceeds with no extra copies.
Now the application software will be free to define constructors that take objects as parameters, and then the price to pay will be these extra copies.
Besides, this might be the most efficient approach, if the parameter is something costly to construct (i.e. the construction time is greater than the copy time, so it is more efficient to create a static object and modify it slightly between each message) or if it has a greater lifetime than your function for any reason.
a working, ugly solution
First, let's start with a vintage, template-less solution that does in-place construction.
The idea is to have the message pre-allocate the right kind of memory (local buffer or dynamic) depending on the size of the object.
The proper base address is then passed to a placement new to construct the message contents in place.
#include <cstdint>
#include <cstdio>
#include <new>
typedef uint8_t id_t;
enum class MessageID { WorldPeace, Armaggedon };
#define SMALL_BUF_SIZE 64
class Message {
id_t m_messageId;
uint8_t* m_data;
uint8_t m_localData[SMALL_BUF_SIZE];
public:
// choose the proper location for contents
Message (MessageID messageId, size_t size)
{
m_messageId = (id_t)messageId;
m_data = size <= SMALL_BUF_SIZE ? m_localData : new uint8_t[size];
}
// dispose of the contents if need be
~Message ()
{
if (m_data != m_localData) delete[] m_data; // allocated with new[]
}
// let placement new know about the contents location
void * location (void)
{
return m_data;
}
};
// a macro to do the in-place construction
#define BuildMessage(msg, id, obj, ... ) \
Message msg(MessageID::id, sizeof(obj)); \
new (msg.location()) obj (__VA_ARGS__); \
// example uses
struct small {
int a, b, c;
small (int a, int b, int c) :a(a),b(b),c(c) {}
};
struct big {
int lump[1000];
};
int main(void)
{
BuildMessage(msg1, WorldPeace, small, 1, 2, 3)
BuildMessage(msg2, Armaggedon, big)
}
This is just a trimmed down version of your initial code, with no templates at all.
I find it relatively clean and easy to use, but to each his own.
The only inefficiency I see here is the static allocation of 64 bytes that will be useless if the message is too big.
And of course all type information is lost once the messages are constructed, so accessing their contents afterward would be awkward.
About forwarding and construction in place
Basically, the new && qualifier does no magic. To do in-place construction, the compiler needs to know the address that will be used for object storage before calling the constructor.
Once you've invoked an object creation, the memory has been allocated and the && thing will only allow you to use that address to pass ownership of the said memory to another object without resorting to useless copies.
You can use templates to recognize a call to the Message constructor involving a given class passed as message contents, but that will be too late: the object will have been constructed before your constructor can do anything about its memory location.
I can't see a way to create a template on top of the Message class that would defer an object construction until you have decided at which location you want to construct it.
However, you could work on the classes defining the object contents to have some in-place construction automated.
This will not solve the general problem of passing objects to the constructor of the object that will be built in place.
To do that, you would need the sub-objects themselves to be constructed through a placement new, which would mean implementing a specific template interface for each of the initializers, and have each object provide the address of construction to each of its sub-objects.
Now for syntactic sugar.
To make the ugly templating worth the while, you can specialize your message classes to handle big and small messages differently.
The idea is to have a single lump of memory to pass to your sending function. So in case of small messages, the message header and contents are defined as local message properties, and for big ones, extra memory is allocated to include the message header.
Thus the magic DMA used to propel your messages through the system will have a clean data block to work with either way.
Dynamic allocations will still occur once per big message, and never for small ones.
#include <cstdint>
#include <new>
// ==========================================================================
// Common definitions
// ==========================================================================
// message header
enum class MessageID : uint8_t { WorldPeace, Armaggedon };
struct MessageHeader {
MessageID id;
uint8_t __padding; // one free byte here
uint16_t size;
};
// small buffer size
#define SMALL_BUF_SIZE 64
// dummy send function
int some_DMA_trick(int destination, void * data, uint16_t size);
// ==========================================================================
// Macro solution
// ==========================================================================
// -----------------------------------------
// Message class
// -----------------------------------------
class mMessage {
// local storage defined even for big messages
MessageHeader m_header;
uint8_t m_localData[SMALL_BUF_SIZE];
// pointer to the actual message
MessageHeader * m_head;
public:
// choose the proper location for contents
mMessage (MessageID messageId, uint16_t size)
{
m_head = size <= SMALL_BUF_SIZE
? &m_header
: (MessageHeader *) new uint8_t[size + sizeof (m_header)];
m_head->id = messageId;
m_head->size = size;
}
// dispose of the contents if need be
~mMessage ()
{
if (m_head != &m_header) delete[] (uint8_t *)m_head; // allocated as new uint8_t[]
}
// let placement new know about the contents location
void * location (void)
{
return m_head+1;
}
// send a message
int send(int destination)
{
return some_DMA_trick (destination, m_head, (uint16_t)(m_head->size + sizeof (*m_head))); // size of the header, not of the pointer
}
};
// -----------------------------------------
// macro to do the in-place construction
// -----------------------------------------
#define BuildMessage(msg, obj, id, ... ) \
mMessage msg (MessageID::id, sizeof(obj)); \
new (msg.location()) obj (__VA_ARGS__); \
// ==========================================================================
// Template solution
// ==========================================================================
#include <utility>
// -----------------------------------------
// template to check storage capacity
// -----------------------------------------
template<typename T>
struct storage
{
enum { local = sizeof(T)<=SMALL_BUF_SIZE };
};
// -----------------------------------------
// base message class
// -----------------------------------------
class tMessage {
protected:
MessageHeader * m_head;
tMessage(MessageHeader * head, MessageID id, uint16_t size)
: m_head(head)
{
m_head->id = id;
m_head->size = size;
}
public:
int send(int destination)
{
return some_DMA_trick (destination, m_head, (uint16_t)(m_head->size + sizeof (*m_head)));
}
};
// -----------------------------------------
// general message template
// -----------------------------------------
template<bool local_storage, typename message_contents>
class aMessage {};
// -----------------------------------------
// specialization for big messages
// -----------------------------------------
template<typename T>
class aMessage<false, T> : public tMessage
{
public:
// in-place constructor
template<class... Args>
aMessage(MessageID id, Args&&... args)
: tMessage(
(MessageHeader *)new uint8_t[sizeof(T)+sizeof(*m_head)], // dynamic allocation
id, sizeof(T))
{
new (m_head+1) T(std::forward<Args>(args)...);
}
// destructor
~aMessage ()
{
delete[] (uint8_t *)m_head; // allocated as new uint8_t[]
}
// syntactic sugar to access contents
T& contents(void) { return *(T*)(m_head+1); }
};
// -----------------------------------------
// specialization for small messages
// -----------------------------------------
template<typename T>
class aMessage<true, T> : public tMessage
{
// message body defined locally
MessageHeader m_header;
uint8_t m_data[sizeof(T)]; // no need for 64 bytes here
public:
// in-place constructor
template<class... Args>
aMessage(MessageID id, Args&&... args)
: tMessage(
&m_header, // local storage
id, sizeof(T))
{
new (m_head+1) T(std::forward<Args>(args)...);
}
// syntactic sugar to access contents
T& contents(void) { return *(T*)(m_head+1); }
};
// -----------------------------------------
// helper macro to hide template ugliness
// -----------------------------------------
#define Message(T) aMessage<storage<T>::local, T>
// something like typedef aMessage<storage<T>::local, T> Message<T>
// ==========================================================================
// Example
// ==========================================================================
#include <cstdio>
#include <cstring>
// message sending
int some_DMA_trick(int destination, void * data, uint16_t size)
{
printf("sending %d bytes #%p to %08X\n", size, data, destination);
return 1;
}
// some dynamic contents
struct gizmo {
char * s;
gizmo(void) { s = nullptr; };
gizmo (const gizmo& g) = delete;
gizmo (const char * msg)
{
s = new char[strlen(msg) + 8]; // room for '#', a few '*' move markers and the terminator
strcpy(s, msg);
strcat(s, "#");
}
gizmo (gizmo&& g)
{
s = g.s;
g.s = nullptr;
strcat(s, "*");
}
~gizmo()
{
delete[] s;
}
gizmo& operator=(gizmo g)
{
std::swap(s, g.s);
return *this;
}
bool operator!=(gizmo& g)
{
return strcmp (s, g.s) != 0;
}
};
// some small contents
struct small {
int a, b, c;
gizmo g;
small (gizmo g, int a, int b, int c)
: a(a), b(b), c(c), g(std::move(g))
{
}
void trace(void)
{
printf("small: %d %d %d %s\n", a, b, c, g.s);
}
};
// some big contents
struct big {
gizmo lump[1000];
big(const char * msg = "?")
{
for (size_t i = 0; i != sizeof(lump) / sizeof(lump[0]); i++)
lump[i] = gizmo (msg);
}
void trace(void)
{
printf("big: set to ");
gizmo& first = lump[0];
for (size_t i = 1; i != sizeof(lump) / sizeof(lump[0]); i++)
if (lump[i] != first) { printf(" Erm... mostly "); break; }
printf("%s\n", first.s);
}
};
int main(void)
{
// macros
BuildMessage(mmsg1, small, WorldPeace, gizmo("Hi"), 1, 2, 3);
BuildMessage(mmsg2, big , Armaggedon, "Doom");
((small *)mmsg1.location())->trace();
((big *)mmsg2.location())->trace();
mmsg1.send(0x1000);
mmsg2.send(0x2000);
// templates
Message (small) tmsg1(MessageID::WorldPeace, gizmo("Hello"), 4, 5, 6);
Message (big ) tmsg2(MessageID::Armaggedon, "Damnation");
tmsg1.contents().trace();
tmsg2.contents().trace();
tmsg1.send(0x3000);
tmsg2.send(0x4000);
}
output:
small: 1 2 3 Hi#*
big: set to Doom#
sending 20 bytes #0xbf81be20 to 00001000
sending 4004 bytes #0x9e58018 to 00002000
small: 4 5 6 Hello#**
big: set to Damnation#
sending 20 bytes #0xbf81be0c to 00003000
sending 4004 bytes #0x9e5ce50 to 00004000
Arguments forwarding
I see little point in doing constructor parameters forwarding here.
Any bit of dynamic data referenced by the message contents would have to be either static or copied into the message body, otherwise the referenced data would vanish as soon as the message creator would go out of scope.
If the users of this wonderfully efficient library start passing around magic pointers and other global data inside messages, I wonder how the global system performance will like that. But that's none of my business, after all.
Macros
I resorted to a macro to hide the template ugliness in type definition.
If someone has an idea to get rid of it, I'm interested.
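One candidate for replacing the macro is a C++11 alias template, which is essentially the "something like typedef" hinted at in the code comment. The sketch below uses stripped-down stand-ins for the aMessage specializations; kSmallBufSize, uses_heap, small_payload and big_payload are invented for the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kSmallBufSize = 64;  // mirrors SMALL_BUF_SIZE

// Same compile-time capacity check as the storage<T> template above.
template <typename T>
struct storage {
    static constexpr bool local = sizeof(T) <= kSmallBufSize;
};

// Stripped-down stand-ins for the two aMessage specializations.
template <bool LocalStorage, typename T>
struct aMessage;

template <typename T>
struct aMessage<true, T> {
    static constexpr bool uses_heap = false;  // small payload, local buffer
};

template <typename T>
struct aMessage<false, T> {
    static constexpr bool uses_heap = true;   // big payload, dynamic buffer
};

// The alias template picks the specialization from the payload size,
// so the call site writes Message<T> instead of Message(T).
template <typename T>
using Message = aMessage<storage<T>::local, T>;

struct small_payload { int a, b, c; };
struct big_payload { int lump[1000]; };

static_assert(!Message<small_payload>::uses_heap, "small payloads stay local");
static_assert(Message<big_payload>::uses_heap, "big payloads go to the heap");
```

With this in place a declaration would read Message<small> tmsg1(MessageID::WorldPeace, ...); provided the Message(T) macro is removed so the names do not clash.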
Efficiency
The template variation requires an extra forwarding of the contents parameters to reach the constructor. I can't see how that could be avoided.
The macro version wastes 68 bytes of memory for big messages, and some memory for small ones (64 - sizeof (contents object)).
Performance-wise, this extra bit of memory is the only gain the templates offer. Since all these objects are supposedly constructed on the stack and live for a handful of microseconds, it is pretty negligible.
Compared to your initial version, this one should handle message sending more efficiently for big messages. Here again, if these messages are rare and only offered for convenience, the difference is not terribly useful.
The template version maintains a single pointer to the message payload, that could be spared for small messages if you implemented a specialized version of the send function.
Hardly worth the code duplication, IMHO.
A last word
I think I know pretty well how an operating system works and what the performance concerns might be. I wrote quite a few real-time applications, plus some drivers and a couple of BSPs in my time.
I also saw more than once a very efficient system layer ruined by too permissive an interface that allowed application software programmers to do the silliest things without even knowing.
That is what triggered my initial reaction.
If I had my say in global system design, I would forbid all these magic pointers and other under-the-hood mingling with object references, to limit non-specialist users to an innocuous use of system layers, instead of allowing them to inadvertently spread cockroaches through the system.
Unless the users of this interface are template and real-time savvies, they will not understand a bit what is going on beneath the syntactic sugar crust, and might very soon shoot themselves (and their co-workers and the application software) in the foot.
Suppose a poor application software programmer adds a puny field to one of his structs and unknowingly crosses the 64-byte barrier. All of a sudden the system performance will crumble, and you will need Mr template & real time expert to explain to the poor guy that what he did killed a lot of kittens.
Even worse, the system degradation might be progressive or unnoticeable at first, so one day you might wake up with thousands of lines of code that did dynamic allocations for years without anybody noticing, and the global overhaul to correct the problem might be huge.
If, on the other hand, all people in your company are munching at templates and mutexes for breakfast, syntactic sugar is not even required in the first place.
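If silent crossings of the small-buffer limit are the worry, one possible guard is a compile-time check next to each payload definition. This is only a sketch of the idea; fits_small_buffer, kSmallBufSize, WorldPeacePayload and Oversized are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kSmallBufSize = 64;  // mirrors SMALL_BUF_SIZE

// True when a payload of type T still fits the local buffer.
template <typename T>
struct fits_small_buffer
    : std::integral_constant<bool, (sizeof(T) <= kSmallBufSize)> {};

// Hypothetical payload that is expected to stay on the fast path.
struct WorldPeacePayload {
    int a, b, c;
};

// Adding a field that pushes the struct past the limit now breaks the
// build instead of quietly switching to dynamic allocation.
static_assert(fits_small_buffer<WorldPeacePayload>::value,
              "WorldPeacePayload no longer fits the small message buffer");

// A deliberately oversized type, only here to show the negative case.
struct Oversized {
    char bytes[kSmallBufSize + 1];
};
```

Payloads that are meant to be big simply omit the static_assert, so the dynamic path remains available but has to be chosen knowingly.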

Related

How to represent existing data as std::vector

I have to pass existing data (unsigned char memory area with known size) to the library function expecting const std::vector<std::byte>& . Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?
I have data from the old legacy as a pointer and size, not as a std::vector. Legacy C code allocates memory by malloc() and provides pointer and size. Please do not suggest touching the legacy code - by the end of the phrase I'll cease to be an employee of the company.
I don't want to create a temporary vector and copy the data because memory throughput is huge (> 5GB/sec).
Placement new creates vector - but with the first bytes used for the vector data itself. I cannot use few bytes before the memory area - legacy code didn't expect that (see above - memory area is allocated by malloc()).
Changing the third-party library is out of the question. It expects const std::vector<std::byte>&, not span, iterators, etc.
It looks that I have no way but to go with temporary vector but maybe there are other ideas... I wouldn't care but it is about intensive video processing and there will be a lot of data to copy for nothing.
Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?
No.
The potential options are:
1. Put the data in a vector in the first place.
2. Or change the function expecting a vector to not expect a vector.
3. Or create a vector and copy the data.
If 1. and 2. are not valid options for you, that leaves you with 3. whether you want it or not.
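If it does come down to option 3, the allocation cost at least can be amortized by reusing one vector across calls, so that each call pays only for the copy. A minimal sketch, assuming C++17 for std::byte; library_function and forward_legacy_buffer are placeholder names, not the real library API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the third-party function.
std::size_t library_function(const std::vector<std::byte>& data) {
    return data.size();
}

// Copies the legacy buffer into a caller-owned scratch vector and passes
// that on. Once the scratch vector has grown to the largest size seen,
// later calls reuse its storage and pay only for the copy itself.
std::size_t forward_legacy_buffer(const unsigned char* ptr, std::size_t size,
                                  std::vector<std::byte>& scratch) {
    const std::byte* first = reinterpret_cast<const std::byte*>(ptr);
    scratch.assign(first, first + size);
    return library_function(scratch);
}
```

The caller keeps the scratch vector alive between frames, for example as a member of whatever object drives the processing loop.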
As the top answer mentions, this is impossible to do in standard C++. And you should not try to do it.
If you can tolerate only using libstdc++ and getting potentially stuck with a specific standard library version, it looks like you can do it. Again, you should not do this. I'm only writing this answer as it seems to be possible without UB in this specific circumstance.
It appears that the current version of libstdc++ exposes their vectors' important members as protected: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h#L422
All you need to do is inherit from std::vector (it's not forbidden), write your own constructor for setting these protected members, and write a destructor to reset the members so that the actual vector destructor does not delete your memory.
#include <vector>
#include <cstddef>
template <class T>
struct dont_use_me_in_prod : std::vector<T>
{
dont_use_me_in_prod(T* data, size_t n) {
this->_M_impl._M_start = data;
this->_M_impl._M_finish = data + n;
this->_M_impl._M_end_of_storage = this->_M_impl._M_finish;
}
~dont_use_me_in_prod() {
this->_M_impl._M_start = nullptr;
this->_M_impl._M_finish = nullptr;
this->_M_impl._M_end_of_storage = nullptr;
}
};
void innocent_function(const std::vector<int>& v);
void please_dont_do_this_in_prod(int* vals, int n) {
dont_use_me_in_prod evil_vector(vals, n);
innocent_function(evil_vector);
}
Note that this is not compiler, but standard library dependent, meaning that it'll work with clang as well as long as you use libstdc++ with it. But this is not conforming, so you gotta fix innocent_function somehow soon:
https://godbolt.org/z/Tfcn7rdKq
The problem is std::vector is not a reference class like std::string_view or std::span. std::vector owns the managed memory. It allocates the memory and releases the owned memory. It is not designed to acquire the external buffer and release the managed buffer.
What you can do is a very dirty hack. You can create new structure with exactly the same layout as a std::vector, assign the data and size fields with what you get from external lib, and then pass this struct as a std::vector const& using reinterpret_cast. It can work as your library does not modify the vector (I assume they do not perform const_cast on std::vector const&).
The drawback is that code is unmaintainable. The next STL update can cause application crash, if the layout of the std::vector is changed.
Following is a pseudo code
struct FakeVector
{
std::byte* Data;
std::size_t Size;
std::size_t Capacity;
};
void onNewData(std::byte* ptr, size_t size)
{
auto vectorRef = FakeVector{ptr, size, size};
doSomething(*reinterpret_cast<std::vector<std::byte>*>(&vectorRef));
}
Well, I've found a way that works for me. I must admit that it is not fully standard compliant, because casting the vector results in undefined behavior, but for the foreseeable future I wouldn't expect this to fail. The idea is to use my own allocator for the vector, one that accepts the buffer from the legacy code and works on it. The problem is that std::vector<std::byte> value-initializes its elements on resize(), which zeroes the buffer. If there were a way to disable that, it would be a perfect solution, but I have not found one. So here comes the ugly cast: from std::vector<InnerType>, where InnerType is nothing but std::byte with initialization in the default constructor disabled, to the std::vector<std::byte> that the library expects. Working code is shown at https://godbolt.org/z/7jME79EE9 and also here:
#include <cstdlib>
#include <iostream>
#include <vector>
#include <cstddef>
struct InnerType {
std::byte value;
InnerType() {}
InnerType(std::byte v) : value(v) {}
};
static_assert(sizeof(InnerType) == sizeof(std::byte));
template <class T> class AllocatorExternalBufferT {
T* const _buffer;
const size_t _size;
public:
typedef T value_type;
constexpr AllocatorExternalBufferT() = delete;
constexpr AllocatorExternalBufferT(T* buf, size_t size) : _buffer(buf), _size(size) {}
[[nodiscard]] T* allocate(std::size_t n) {
if (n > _size / sizeof(T)) {
throw std::bad_array_new_length();
}
return _buffer;
}
void deallocate(T*, std::size_t) noexcept {}
};
template <class T, class U> bool operator==(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return true; }
template <class T, class U> bool operator!=(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return false; }
typedef std::vector<InnerType, AllocatorExternalBufferT<InnerType>> BufferDataVector;
typedef std::vector<std::byte, AllocatorExternalBufferT<std::byte>> InterfaceVector;
static void report(const InterfaceVector& vec) {
std::cout << "size=" << vec.size() << " capacity=" << vec.capacity() << " ";
for(const auto& el : vec) {
std::cout << static_cast<int>(el) << " ";
}
std::cout << "\n";
}
int main() {
InnerType buffer4allocator[16] ;
BufferDataVector v((AllocatorExternalBufferT<InnerType>(buffer4allocator, sizeof(buffer4allocator)))); // double parenthesis here for "most vexing parse" nonsense
v.resize(sizeof(buffer4allocator));
std::cout << "memory area kept intact after resizing vector:\n";
report(*reinterpret_cast<InterfaceVector*>(&v));
}
Yes you can do this. Not in a nice safe way but it's certainly possible.
All you need to do is create a fake std::vector that has the same ABI (memory layout) as std::vector. Then set its internal pointer to point to your data and reinterpret_cast your fake vector back to a std::vector.
I wouldn't recommend it unless you really need to do it because any time your compiler changes its std::vector ABI (field layout basically) it will break. Though to be fair that is very unlikely to happen these days.

Easy way of managing the recycling of C++ STL vectors of POD types

My application consists of calling dozens of functions millions of times. In each of those functions, one or a few temporary std::vector containers of POD (plain old data) types are initialized, used, and then destructed. By profiling my code, I find the allocations and deallocations lead to a huge overhead.
A lazy solution is to rewrite all the functions as functors containing those temporary buffer containers as class members. However this would blow up the memory consumption as the functions are many and the buffer sizes are not trivial.
A better way is to analyze the code, gather all the buffers, premeditate how to maximally reuse them, and feed a minimal set of shared buffer containers to the functions as arguments. But this can be too much work.
I want to solve this problem once for all my future development during which temporary POD buffers become necessary, without having to have much premeditation. My idea is to implement a container port, and take the reference to it as an argument for every function that may need temporary buffers. Inside those functions, one should be able to fetch containers of any POD type from the port, and the port should also auto-recall the containers before the functions return.
// Port of vectors of POD types.
struct PODvectorPort
{
std::size_t Nlent; // Number of dispatched containers.
std::vector<std::vector<std::size_t> > X; // Container pool.
PODvectorPort() { Nlent = 0; }
};
// Functor that manages the port.
struct PODvectorPortOffice
{
std::size_t initialNlent; // Number of already-dispatched containers
// when the office is set up.
PODvectorPort *p; // Pointer to the port.
PODvectorPortOffice(PODvectorPort &port)
{
p = &port;
initialNlent = p->Nlent;
}
template<typename X, typename Y>
std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
{
// return *((std::vector<X>*)(&y)); // UB although works
std::vector<Y> *src = &y;
std::vector<X> *rst = nullptr;
std::memcpy(&rst, &src, sizeof(rst)); // copy the pointer value, not the vector's own bytes
return *rst; // guess it makes no difference. Should still be UB.
}
template<typename T>
std::vector<T> & lend()
{
++p->Nlent;
// Ensure sufficient container pool size:
while (p->X.size() < p->Nlent) p->X.push_back( std::vector<size_t>(0) );
return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
}
void recall() { p->Nlent = initialNlent; }
~PODvectorPortOffice() { recall(); }
};
struct ArbitraryPODstruct
{
char a[11]; short b[7]; int c[5]; float d[3]; double e[2];
};
// Example f1():
// f2(), f3(), ..., f50() are similarly defined.
// All functions are called a few million times in certain
// order in main().
// port is defined in main().
void f1(other arguments..., PODvectorPort &port)
{
PODvectorPortOffice portOffice(port);
// Oh, I need a buffer of chars:
std::vector<char> &tmpchar = portOffice.lend<char>();
tmpchar.resize(789); // Trivial if container already has sufficient capacity.
// ... do things
// Oh, I need a buffer of shorts:
std::vector<short> &tmpshort = portOffice.lend<short>();
tmpshort.resize(456); // Trivial if container already has sufficient capacity.
// ... do things.
// Oh, I need a buffer of ArbitraryPODstruct:
std::vector<ArbitraryPODstruct> &tmpArb = portOffice.lend<ArbitraryPODstruct>();
tmpArb.resize(123); // Trivial if container already has sufficient capacity.
// ... do things.
// Oh, I need a buffer of integers, but also tmpArb is no longer
// needed. Why waste it? Cache hot.
std::vector<int> &tmpint = portOffice.repaint<int>(tmpArb);
tmpint.resize(300); // Trivial.
// ... do things.
}
Although the code is compilable by both gcc-8.3 and MSVS 2019 with -O2 to -Ofast, and passes extensive tests for all options, I expect criticism due to the hacky nature of PODvectorPortOffice::repaint(), which "casts" the vector type in-place.
A set of sufficient but not necessary conditions for the correctness and efficiency of the above code are:
std::vector<T> stores 3 pointers to the underlying buffer's &[0], &[0] + .size(), &[0] + .capacity().
std::vector<T>'s allocator calls malloc().
malloc() returns an 8-byte (or sizeof(std::size_t)) aligned address.
So, if this is unacceptable to you, what would be the modern, proper way of addressing my need? Is there a way of writing a manager that achieves what my code does, only without violating the Standard?
Thanks!
Edits: A little more context on my problem. Those functions mainly compute some simple statistics of the inputs. The inputs are data streams of financial parameters of different types and sizes. To compute the statistics, the data need to be altered and rearranged first, hence the buffers for temporary copies. Computing the statistics is cheap, so the allocations and deallocations can become relatively expensive. Why do I want a manager for arbitrary POD types? Because 2 weeks from now I may start receiving a data stream of a different type, which can be a bunch of primitive types zipped in a struct, or a struct of the composite types encountered so far. I would, of course, like the upstream to just send separate flows of primitive types, but I have no control over that aspect.
More edits: after tons of reading and code experimenting regarding the strict aliasing rule, the answer should be: don't try anything I put up there. It works, for now, but don't do it. Instead, I'll be diligent and stick to my previous code-as-you-go style: just add a vector<vector<myNewType> > to the port once a new type comes up, and manage it in a similar way. The accepted answer also offers a nice alternative.
Even more edits: conceived a stronger class that has better chance to thwart potential optimizations under the strict aliasing rule. DO NOT USE IT WITHOUT TESTING AND THOROUGH UNDERSTANDING OF THE STRICT ALIASING RULE.
// -std=c++17
#include <cstring>
#include <cstddef>
#include <cstdint> // uint64_t
#include <iostream>
#include <type_traits> // std::is_same
#include <vector>
#include <chrono>
// POD: plain old data.
// Idea: design a class that can let you maximally reuse temporary
// containers during a program.
// Port of vectors of POD types.
template <std::size_t portsize = 42>
class PODvectorPort
{
static constexpr std::size_t Xsize = portsize;
std::size_t signature;
std::size_t Nlent; // Number of dispatched containers.
std::vector<std::size_t> X[portsize]; // Container pool.
PODvectorPort(const PODvectorPort &);
PODvectorPort & operator=( const PODvectorPort& );
public:
std::size_t Ndispatched() { return Nlent; }
std::size_t showSignature() { return signature; }
PODvectorPort() // Permuted random number generator.
{
std::size_t state = std::chrono::high_resolution_clock::now().time_since_epoch().count();
state ^= (uint64_t)(&std::memmove);
signature = ((state >> 18) ^ state) >> 27;
std::size_t rot = state >> 59;
signature = (signature >> rot) | (state << ((-rot) & 31));
Nlent = 0;
}
template<typename podvecport>
friend class PODvectorPortOffice;
};
// Functor that manages the port.
template<typename podvecport>
class PODvectorPortOffice
{
// Number of already-dispatched containers when the office is set up.
std::size_t initialNlent;
podvecport *p; // Pointer to the port.
PODvectorPortOffice( const PODvectorPortOffice& ); // non construction-copyable
PODvectorPortOffice& operator=( const PODvectorPortOffice& ); // non copyable
constexpr void check()
{
while (__cplusplus < 201703)
{
std::cerr << "PODvectorPortOffice: C++ < 17, Stall." << std::endl;
}
// Check if allocation will be 8-byte (or more) aligned.
// Intend it not to work on machine < 64-bit.
constexpr std::size_t aln = alignof(std::max_align_t);
while (aln < 8)
{
std::cerr << "PODvectorPortOffice: Allocation is not at least 8-byte aligned, Stall." <<
std::endl;
}
while ((aln & (aln - 1)) != 0)
{
std::cerr << "PODvectorPortOffice: Alignment is not a power of 2 bytes. Stall." << std::endl;
}
// Random checks to see if sizeof(vector<S>) != sizeof(vector<T>).
if(true)
{
std::size_t vecHeadSize[16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
vecHeadSize[0] = sizeof(std::vector<char>(0));
vecHeadSize[1] = sizeof(std::vector<short>(1));
vecHeadSize[2] = sizeof(std::vector<int>(2));
vecHeadSize[3] = sizeof(std::vector<long>(3));
vecHeadSize[4] = sizeof(std::vector<std::size_t>(5));
vecHeadSize[5] = sizeof(std::vector<float>(7));
vecHeadSize[6] = sizeof(std::vector<double>(11));
vecHeadSize[7] = sizeof(std::vector<std::vector<char> >(13));
vecHeadSize[8] = sizeof(std::vector<std::vector<int> >(17));
vecHeadSize[9] = sizeof(std::vector<std::vector<double> >(19));
struct tmpclass1 { char a; short b; };
struct tmpclass2 { char a; float b; };
struct tmpclass3 { char a; double b; };
struct tmpclass4 { int a; char b; };
struct tmpclass5 { double a; char b; };
struct tmpclass6 { double a[5]; char b[3]; short c[3]; };
vecHeadSize[10] = sizeof(std::vector<tmpclass1>(23));
vecHeadSize[11] = sizeof(std::vector<tmpclass2>(29));
vecHeadSize[12] = sizeof(std::vector<tmpclass3>(31));
vecHeadSize[13] = sizeof(std::vector<tmpclass4>(37));
vecHeadSize[14] = sizeof(std::vector<tmpclass5>(41));
vecHeadSize[15] = sizeof(std::vector<tmpclass6>(43));
std::size_t notSame = 0;
for(int i = 0; i < 16; ++i)
notSame += vecHeadSize[i] != sizeof(std::size_t) * 3;
while (notSame)
{
std::cerr << "sizeof(std::vector<S>) != sizeof(std::vector<T>), \
PODvectorPortOffice cannot handle. Stall." << std::endl;
}
}
}
void recall() { p->Nlent = initialNlent; }
public:
PODvectorPortOffice(podvecport &port)
{
check();
p = &port;
initialNlent = p->Nlent;
}
template<typename X, typename Y>
std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
// AFTER A VECTOR IS REPAINTED, DO NOT USE THE OLD VECTOR AGAIN !!
{
while (std::is_same<bool, X>::value)
{
std::cerr << "PODvectorPortOffice: Cannot repaint the vector to \
std::vector<bool>. Stall." << std::endl;
}
std::vector<X> *x;
std::vector<Y> *yp = &y;
std::memcpy(&x, &yp, sizeof(x));
return *x; // Not compliant with strict aliasing rule.
}
template<typename T>
std::vector<T> & lend()
{
while (p->Nlent >= p->Xsize)
{
std::cerr << "PODvectorPortOffice: No more containers. Stall." << std::endl;
}
++p->Nlent;
return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
}
~PODvectorPortOffice()
{
// Because p->signature can only be known at runtime, an aggressive,
// compliant compiler (ACC) will never remove this
// branch. Volatile might do, but trustworthiness?
if(p->signature == 0)
{
constexpr std::size_t sizeofvec = sizeof(std::vector<std::size_t>);
char dummy[sizeofvec * podvecport::Xsize]; // static member named via the type, so the bound is a constant expression
std::memcpy(dummy, p->X, p->Nlent * sizeofvec);
std::size_t ticketNum = 0;
char *xp = (char*)(p->X);
for(int i = 0, iend = p->Nlent * sizeofvec; i < iend; ++i)
{
xp[i] &= xp[iend - i - 1] * 5;
ticketNum += xp[i] ^ ticketNum;
}
std::cerr << "Congratulations! After the port office was decommissioned, \
you found a winning lottery ticket. The odds is less than 2.33e-10. Your \
ticket number is " << ticketNum << std::endl;
std::memcpy(p->X, dummy, p->Nlent * sizeofvec);
// According to the strict aliasing rule, a char* can point to any memory
// block pointed by another pointer of any type T*. Thus given an ACC,
// the writes to that block via the char* must be fully acknowledged in
// time by T*, namely, for reading contents from T*, a reload instruction
// will be kept in the assembly code to achieve a sort of
// "register-cache-memory coherence" (RCMC).
// We also do not care about the renters' (who received the reference via
// .lend()) RCMC, because PODvectorPortOffice never accesses the contents
// of those containers.
}
recall();
}
};
Any adversarial test case that breaks it, especially on GCC >= 8.3 or MSVS >= 2019, is welcome!
Let me frame this by saying I don't think there's an "authoritative" answer to this question. That said, you've provided enough constraints that a suggested path is at least worthwhile. Let's review the requirements:
Solution must use std::vector. This is in my opinion the most unfortunate requirement for reasons I won't get into here.
Solution must be standards compliant and not resort to rule violations, like the strict aliasing rule.
Solution must either reduce the number of allocations performed, or reduce the overhead of allocations to the point of being negligible.
In my opinion this is definitely a job for a custom allocator. There are a couple of off-the-shelf options that come close to doing what you want, for example the Boost Pool Allocators. The one you're most interested in is boost::pool_allocator. This allocator will create a singleton "pool" for each distinct object size (note: not object type), which grows as needed, but never shrinks until you explicitly purge it.
The main difference between this and your solution is that you'll have distinct pools of memory for objects of different sizes, which means it will use more memory than your posted solution, but in my opinion this is a reasonable trade-off. To be maximally efficient, you could simply start a batch of operations by creating vectors of each needed type with an appropriate size. All subsequent vector operations which use these allocators will do trivial O(1) allocations and deallocations. Roughly in pseudo-code:
// be careful with this, probably want [[nodiscard]]; this code
// is just rough guidance:
void force_pool_sizes(void)
{
std::vector<int, boost::pool_allocator<int>> size_int_vect;
std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> size_16_vect;
...
size_int_vect.resize(100); // probably makes malloc calls
size_16_vect.resize(200); // probably makes malloc calls
...
// on return, objects go out of scope, but singleton pools
// with allocated blocks of memory remain for future use
// until explicitly purged.
}
void expensive_long_running(void)
{
force_pool_sizes();
std::vector<int, boost::pool_allocator<int>> data1;
... do stuff, malloc/free will never be called...
std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> data2;
... do stuff, malloc/free will never be called...
// free everything:
boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}
If you want to take this a step further on being memory efficient, if you know for a fact that certain pool sizes are mutually exclusive, you could modify the boost pool_allocator to use a slightly different singleton backing store which allows you to move a memory block from one block size to another. This is probably out of scope for now, but the boost code itself is straightforward enough, if memory efficiency is critical, it's probably worthwhile.
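For what it's worth, C++17's `<memory_resource>` can express a similar idea without Boost. The following is only a sketch under assumptions, not a drop-in replacement for the code in the question: `f1_scratch` and the 64 KiB arena size are made up for illustration, and it swaps `std::vector<T>` for `std::pmr::vector<T>`, which is a different type. All temporaries of any element type share one buffer, and nothing in it relies on aliasing tricks.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

struct ArbitraryPODstruct
{
    char a[11]; short b[7]; int c[5]; float d[3]; double e[2];
};

// Hypothetical scratch arena shared by all calls; 64 KiB is an arbitrary size.
alignas(std::max_align_t) static std::byte arena[64 * 1024];

// Hypothetical stand-in for f1(): every temporary vector draws from `arena`.
std::size_t f1_scratch()
{
    // Bump allocator over `arena`. With null_memory_resource() as upstream,
    // running out of space throws std::bad_alloc instead of touching the heap.
    std::pmr::monotonic_buffer_resource pool(
        arena, sizeof(arena), std::pmr::null_memory_resource());

    std::pmr::vector<char> tmpchar(789, &pool);
    std::pmr::vector<short> tmpshort(456, &pool);
    std::pmr::vector<ArbitraryPODstruct> tmpArb(123, &pool);
    // ... do things ...
    return tmpchar.size() + tmpshort.size() + tmpArb.size();
    // The vectors and then `pool` are destroyed here, so the next call
    // starts again at the beginning of the same (cache-hot) arena.
}
```

Calling `f1_scratch()` repeatedly reuses the same bytes each time. A monotonic resource never recycles memory within one call, so a vector that grows repeatedly will consume the arena faster than its final size suggests.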
It's worth pointing out that there's probably some confusion about the strict aliasing rule, especially when it comes to implementing your own memory allocators. There are lots and lots of SO questions about strict aliasing and what it does and doesn't mean. This one is a good place to start.
The key takeaway is that it's perfectly ordinary and acceptable in low level C++ code to take an array of memory and cast it to some object type. If this were not the case, std::allocator wouldn't exist. You also wouldn't have much use for things like std::aligned_storage. Look at the example use case for std::aligned_storage on cppreference. An STL-like static_vector class is created which keeps an array of aligned_storage objects that get recast to a concrete type. Nothing about this is "unacceptable" or "illegal", but it does require some additional knowledge and care in handling.
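To make that concrete, here is a cut-down sketch in the spirit of the cppreference `static_vector` example. It is not a copy of it: the `Slot` helper, the `bool` return from `emplace_back` and `static_vector_demo` are choices made for this illustration, and it assumes C++17 for `std::launder`. Objects are created in raw bytes with placement new and only ever accessed as the type that was constructed there.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <utility>

template <class T, std::size_t N>
class static_vector
{
    // Raw, suitably aligned bytes for one T; no T exists here until emplace_back.
    struct alignas(T) Slot { unsigned char bytes[sizeof(T)]; };
    Slot slots_[N];
    std::size_t size_ = 0;
public:
    static_vector() = default;
    static_vector(const static_vector&) = delete;
    static_vector& operator=(const static_vector&) = delete;
    // Destroy the constructed elements in reverse order.
    ~static_vector() { while (size_ > 0) (*this)[--size_].~T(); }

    template <class... Args>
    bool emplace_back(Args&&... args)
    {
        if (size_ == N) return false; // full: nothing is constructed
        ::new (static_cast<void*>(slots_[size_].bytes)) T(std::forward<Args>(args)...);
        ++size_;
        return true;
    }
    // Precondition: i < size(). launder yields a pointer to the T living in the bytes.
    T& operator[](std::size_t i)
    {
        return *std::launder(reinterpret_cast<T*>(slots_[i].bytes));
    }
    std::size_t size() const { return size_; }
};

// Hypothetical demo: two strings fit, the third is rejected.
bool static_vector_demo()
{
    static_vector<std::string, 2> v;
    bool ok = v.emplace_back("hi") && v.emplace_back(3, 'x');
    ok = ok && !v.emplace_back("no room");
    return ok && v.size() == 2 && v[0] == "hi" && v[1] == "xxx";
}
```

The difference from the question's `repaint()` is that the bytes are never read through a pointer to a type other than the one constructed in them.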
The reason your solution is especially going to enrage the code lawyers is that you're taking pointers of one non-char object type and casting them to different non-char object types. This is a particularly offensive violation of the strict aliasing rule, but also not really necessary given some of your other options.
Also keep in mind that it's not an error to alias memory, it's a warning. I'm not saying go crazy with aliasing, but I am saying that as with all things C and C++, there are justifiable cases to break rules, when you have very thorough knowledge and understanding of both your compiler and the machine you're running on. Just be prepared for some very long and painful debug sessions if it turns out you didn't in fact know those two things as well as you thought you did.

Avoid memory allocation with std::function and member function

This code is just for illustrating the question.
#include <functional>
struct MyCallBack {
void Fire() {
}
};
int main()
{
MyCallBack cb;
std::function<void(void)> func = std::bind(&MyCallBack::Fire, &cb);
}
Experiments with valgrind show that the line assigning to func dynamically allocates about 24 bytes with gcc 7.1.1 on Linux.
In the real code, I have a few handfuls of different structs all with a void(void) member function that gets stored in ~10 million std::function<void(void)>.
Is there any way I can avoid memory being dynamically allocated when doing std::function<void(void)> func = std::bind(&MyCallBack::Fire, &cb); ? (Or otherwise assigning these member function to a std::function)
Unfortunately, allocator support for std::function was dropped in C++17.
Now the accepted solution to avoid dynamic allocations inside std::function is to use lambdas instead of std::bind. That does work, at least in GCC - it has enough static space to store the lambda in your case, but not enough space to store the binder object.
std::function<void()> func = [&cb]{ cb.Fire(); };
// sizeof lambda is sizeof(MyCallBack*), which is small enough
As a general rule, with most implementations, a lambda which captures only a single pointer (or a reference) will avoid dynamic allocations inside std::function with this technique (it is also generally the better approach, as the other answer suggests).
Keep in mind that for this to work you need to guarantee that the captured object (cb here) outlives the std::function. Obviously, that is not always possible, and sometimes you have to capture state by (large) copy. If that happens, there is currently no way to eliminate dynamic allocations in std::function, other than tinkering with the STL yourself (obviously not recommended in the general case, but it could be done in some specific cases).
As an addendum to the already existent and correct answer, consider the following:
MyCallBack cb;
std::cerr << sizeof(std::bind(&MyCallBack::Fire, &cb)) << "\n";
auto a = [&] { cb.Fire(); };
std::cerr << sizeof(a);
This program prints 24 and 8 for me, with both gcc and clang. I don't exactly know what bind is doing here (my understanding is that it's a fantastically complicated beast), but as you can see, it's almost absurdly inefficient here compared to a lambda.
As it happens, std::function is guaranteed to not allocate if constructed from a function pointer, which is also one word in size. So constructing a std::function from this kind of lambda, which only needs to capture a pointer to an object and should also be one word, should in practice never allocate.
Run this little hack and it probably will print the amount of bytes you can capture without allocating memory:
#include <iostream>
#include <functional>
#include <cstring>
void h(std::function<void(void*)>&& f, void* g)
{
f(g);
}
template<size_t number_of_size_t>
void do_test()
{
size_t a[number_of_size_t];
std::memset(a, 0, sizeof(a));
a[0] = sizeof(a);
std::function<void(void*)> g = [a](void* ptr) {
if (&a != ptr)
std::cout << "malloc was called when capturing " << a[0] << " bytes." << std::endl;
else
std::cout << "No allocation took place when capturing " << a[0] << " bytes." << std::endl;
};
h(std::move(g), &g);
}
int main()
{
do_test<1>();
do_test<2>();
do_test<3>();
do_test<4>();
}
With gcc version 8.3.0 this prints
No allocation took place when capturing 8 bytes.
No allocation took place when capturing 16 bytes.
malloc was called when capturing 24 bytes.
malloc was called when capturing 32 bytes.
Many std::function implementations will avoid allocations and use space inside the function class itself rather than allocating if the callback it wraps is "small enough" and has trivial copying. However, the standard does not require this, only suggests it.
On g++, a non-trivial copy constructor on a function object, or data exceeding 16 bytes, is enough to cause it to allocate. But if your function object has no data and uses the builtin copy constructor, then std::function won't allocate.
Also, if you use a function pointer or a member function pointer, it won't allocate.
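If you want the compiler to flag callables that are likely to spill to the heap, a compile-time check along these lines is one option. This is a sketch only: `likely_stored_inline`, `assumed_inline_bytes`, `FireCaller` and `BigCaller` are hypothetical names, and the 16-byte limit is just the g++ figure mentioned above, not anything the standard promises.

```cpp
#include <cstddef>
#include <type_traits>

// Assumption: 16 bytes is the g++ figure quoted above, not a standard guarantee.
constexpr std::size_t assumed_inline_bytes = 16;

// True when F is small and trivially copyable, i.e. the kind of callable
// that g++'s std::function is described above as storing without allocating.
template <class F>
constexpr bool likely_stored_inline()
{
    return sizeof(F) <= assumed_inline_bytes
        && std::is_trivially_copyable<F>::value;
}

struct MyCallBack { void Fire() {} };

// Hand-written equivalent of the lambda [&cb]{ cb.Fire(); }.
struct FireCaller
{
    MyCallBack* cb;
    void operator()() const { cb->Fire(); }
};

// Carries more state than the assumed 16-byte buffer can hold.
struct BigCaller
{
    MyCallBack* cb;
    char extra[24];
    void operator()() const { cb->Fire(); }
};

static_assert(likely_stored_inline<FireCaller>(), "one pointer should fit");
static_assert(!likely_stored_inline<BigCaller>(), "more than 16 bytes should not fit");
```

A `static_assert(likely_stored_inline<decltype(f)>(), "...")` next to each `std::function` construction would then catch an accidental large capture at compile time, at least for the toolchain the limit was measured on.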
While not directly part of your question, it is part of your example.
Do not use std::bind. In virtually every case, a lambda is better: smaller, better inlining, can avoid allocations, better error messages, faster compiles, the list goes on. If you want to avoid allocations, you must also avoid bind.
I propose a custom class for your specific usage.
While it's true that you shouldn't try to re-implement existing library functionality, because the library versions will be much better tested and optimized, that advice applies to the general case. If you have a particular situation like the one in your example and the standard implementation doesn't suit your needs, you can explore implementing a version tailored to your specific use case, which you can measure and tweak as necessary.
So I have created a class akin to std::function<void (void)> that works only for methods and has all the storage in place (no dynamic allocations).
I have lovingly called it Trigger (inspired by your Fire method name). Please do give it a more suited name if you want to.
#include <new> // placement new
#include <type_traits> // std::aligned_storage_t
// helper alias for method
// can be used in user code
template <class T>
using Trigger_method = auto (T::*)() -> void;
namespace detail
{
// Polymorphic classes needed for type erasure
struct Trigger_base
{
virtual ~Trigger_base() noexcept = default;
virtual auto placement_clone(void* buffer) const noexcept -> Trigger_base* = 0;
virtual auto call() -> void = 0;
};
template <class T>
struct Trigger_actual : Trigger_base
{
T& obj;
Trigger_method<T> method;
Trigger_actual(T& obj, Trigger_method<T> method) noexcept : obj{obj}, method{method}
{
}
auto placement_clone(void* buffer) const noexcept -> Trigger_base* override
{
return new (buffer) Trigger_actual{obj, method};
}
auto call() -> void override
{
return (obj.*method)();
}
};
// in Trigger (below) we need to allocate enough storage
// for any Trigger_actual template instantiation
// since all templates basically contain 2 pointers
// we assume (and test it with static_asserts)
// that all will have the same size
// we will use Trigger_actual<Trigger_test_size>
// to determine the size of all Trigger_actual templates
struct Trigger_test_size {};
}
struct Trigger
{
std::aligned_storage_t<sizeof(detail::Trigger_actual<detail::Trigger_test_size>)>
trigger_actual_storage_;
// vital. We cannot just cast `&trigger_actual_storage_` to `Trigger_base*`
// because there is no guarantee by the standard that
// the base pointer will point to the start of the derived object
// so we need to store separately the base pointer
detail::Trigger_base* base_ptr = nullptr;
template <class X>
Trigger(X& x, Trigger_method<X> method) noexcept
{
static_assert(sizeof(trigger_actual_storage_) >=
sizeof(detail::Trigger_actual<X>));
static_assert(alignof(decltype(trigger_actual_storage_)) %
alignof(detail::Trigger_actual<X>) == 0);
base_ptr = new (&trigger_actual_storage_) detail::Trigger_actual<X>{x, method};
}
Trigger(const Trigger& other) noexcept
{
if (other.base_ptr)
{
base_ptr = other.base_ptr->placement_clone(&trigger_actual_storage_);
}
}
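auto operator=(const Trigger& other) noexcept -> Trigger&
{
// Guard against self-assignment: destroying our own callable first
// would leave nothing to clone from.
if (this == &other)
{
return *this;
}
destroy_actual();
if (other.base_ptr)
{
base_ptr = other.base_ptr->placement_clone(&trigger_actual_storage_);
}
return *this;
}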
~Trigger() noexcept
{
destroy_actual();
}
auto destroy_actual() noexcept -> void
{
if (base_ptr)
{
base_ptr->~Trigger_base();
base_ptr = nullptr;
}
}
auto operator()() const -> void
{
if (!base_ptr)
{
// Empty Trigger: nothing to call. Report an error here instead
// if that suits your use case better.
return;
}
base_ptr->call();
}
};
Usage:
struct X
{
auto foo() -> void;
};
auto test()
{
X x;
Trigger f{x, &X::foo};
f();
}
Warning: only tested for compilation errors.
You need to thoroughly test it for correctness.
You also need to profile it to see whether it performs better than the other solutions. The advantage of a home-grown class is that you can tweak the implementation to improve performance in your specific scenarios.
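If the member function is always known at compile time, a smaller variation is possible: store just an object pointer and a plain function pointer to a generated thunk, with no virtual calls and no placement new. This is a sketch of that different technique, not a refinement of Trigger itself; `Delegate`, `bind`, `Counter` and `fire_twice` are hypothetical names. Like a lambda capturing by reference, it does not own the object, so the object must outlive the delegate.

```cpp
#include <cassert>

class Delegate
{
    void* obj_ = nullptr;             // non-owning pointer to the target object
    void (*thunk_)(void*) = nullptr;  // restores the type and calls the method
public:
    Delegate() = default;

    // The member function is a template argument, so it costs no storage.
    template <class T, void (T::*Method)()>
    static Delegate bind(T& obj)
    {
        Delegate d;
        d.obj_ = &obj;
        // Captureless lambda: converts to an ordinary function pointer.
        d.thunk_ = [](void* p) { (static_cast<T*>(p)->*Method)(); };
        return d;
    }

    explicit operator bool() const { return thunk_ != nullptr; }

    void operator()() const
    {
        assert(thunk_ && "calling an empty Delegate");
        thunk_(obj_);
    }
};

struct Counter
{
    int fired = 0;
    void Fire() { ++fired; }
};

// Hypothetical demo: fires the same callback twice and reports the count.
int fire_twice()
{
    Counter c;
    Delegate d = Delegate::bind<Counter, &Counter::Fire>(c);
    d();
    d();
    return c.fired;
}
```

Because the class holds only two pointers and is trivially copyable, storing millions of them involves no dynamic allocation at all.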
As @Quuxplusone mentioned in their answer-as-a-comment, you can use inplace_function here. Include the header in your project, and then use it like this:
#include "inplace_function.h"
struct big { char foo[20]; };
static stdext::inplace_function<void(), 8> inplacefunc;
static std::function<void()> stdfunc;
int main() {
static_assert(sizeof(inplacefunc) == 16);
static_assert(sizeof(stdfunc) == 32);
inplacefunc = []() {};
// fine
struct big a;
inplacefunc = [a]() {};
// test.cpp:15:24: required from here
// inplace_function.h:237:33: error: static assertion failed: inplace_function cannot be constructed from object with this (large) size
// 237 | static_assert(sizeof(C) <= Capacity,
// | ~~~~~~~~~~^~~~~~~~~~~
// inplace_function.h:237:33: note: the comparison reduces to ‘(20 <= 8)’
}

Out-parameters and move semantics

Consider a case of a lock-free concurrent data structure where a pop() operation needs to return an item or false if the container is empty (rather than blocking or throwing). The data structure is templated on a user type T, which can potentially be large (but also could be lightweight, and I want things to be efficient in either case). T has to be at least movable, but I don't want it to have to be copyable.
I was thinking that the function signature would be bool DS<T>::pop(T &item) so the item is extracted as an out-parameter rather than return value (which instead is used to indicate success or failure). However, how do I actually pass it out? Assume there's an underlying buffer. Would I do item = std::move(_buff[_tail])—does it make sense to move into a reference out-parameter? A downside is that the user would have to pass in a default-constructed T, which goes a bit against effective RAII because the result is an object that hasn't actually initialized its resources if the function fails.
Another option is returning an std::pair<bool, T> rather than using an out-parameter, but there again needs to be a default-constructible T which holds no resource in the case of failure, for the return std::make_pair(false, T).
A third option would be returning the item as std::unique_ptr<T>, but this incurs useless overhead in the case of T being a pointer or another lightweight type. While I could store just pointers in the data structure with the actual items stored externally, that incurs not just the penalty of additional dereferences and cache misses, but also removes the natural padding that items stored directly in the buffer add and help minimize producer and consumer threads from hitting the same cache lines.
#include <boost/optional.hpp>
#include <string>
#include <utility> // std::move
template<class T>
struct atomic_queue
{
using value_type = T;
auto pop() -> boost::optional<T>
{
boost::optional<T> result;
/*
* insert atomic ops here, optionally filling result
*/
return result;
}
auto push(T&& arg) -> bool
{
/*
* insert atomic ops here, optionally stealing arg
*/
return true;
}
static auto make_empty_result() {
return boost::optional<T>();
}
};
struct difficult {
difficult(std::string);
difficult() = delete;
difficult(difficult const&) = delete;
difficult& operator=(difficult const&) = delete;
difficult(difficult &&) = default;
difficult& operator=(difficult &&) = default;
};
extern void spin();
int main()
{
atomic_queue<difficult> q;
auto d = difficult("arg");
while(not q.push(std::move(d)))
spin();
auto popped = q.make_empty_result();
while(not (popped = q.pop()))
spin();
auto& val = popped.get();
}
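The same interface can be written with C++17's std::optional instead of Boost. The sketch below fills in the elided parts with a plain single-threaded ring buffer purely to show how a move-only item travels in and out; it is not lock-free, and `bounded_queue` is a hypothetical name. Moving out of the slot into the returned optional is the counterpart of `item = std::move(_buff[_tail])` from the question, without requiring a default-constructed T.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Single-threaded stand-in for the queue; the atomics are deliberately omitted.
template <class T>
class bounded_queue
{
    std::vector<std::optional<T>> slots_; // empty optional == free slot
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
public:
    explicit bounded_queue(std::size_t capacity) : slots_(capacity) {}

    // Returns false (leaving item untouched) when the queue is full.
    bool push(T&& item)
    {
        if (count_ == slots_.size()) return false;
        slots_[head_].emplace(std::move(item));
        head_ = (head_ + 1) % slots_.size();
        ++count_;
        return true;
    }

    // Returns an empty optional when there is nothing to pop.
    std::optional<T> pop()
    {
        if (count_ == 0) return std::nullopt;
        std::optional<T> out(std::move(*slots_[tail_])); // move, never copy
        slots_[tail_].reset();                           // destroy the moved-from T
        tail_ = (tail_ + 1) % slots_.size();
        --count_;
        return out;
    }
};
```

With `bounded_queue<std::unique_ptr<int>> q(2);`, `q.pop()` on the empty queue yields an empty optional, and after `q.push(std::make_unique<int>(7))` the next `q.pop()` yields an optional holding that pointer. T only needs to be move-constructible.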

On what platforms will this crash, and how can I improve it?

I've written the rudiments of a class for creating dynamic structures in C++. Dynamic structure members are stored contiguously with (as far as my tests indicate) the same padding that the compiler would insert in the equivalent static structure. Dynamic structures can thus be implicitly converted to static structures for interoperability with existing APIs.
Foremost, I don't trust myself to be able to write Boost-quality code that can compile and work on more or less any platform. What parts of this code are dangerously in need of modification?
I have one other design-related question: Is a templated get accessor the only way of providing the compiler with the requisite static type information for type-safe code? As it is, the user of dynamic_struct must specify the type of the member they are accessing, whenever they access it. If that type should change, all of the accesses become invalid, and will either cause spectacular crashes—or worse, fail silently. And it can't be caught at compile time. That's a huge risk, and one I'd like to remedy.
Example of usage:
struct Test {
char a, b, c;
int i;
Foo object;
};
void bar(const Test&);
int main(int argc, char** argv) {
dynamic_struct<std::string> ds(sizeof(Test));
ds.append<char>("a") = 'A';
ds.append<char>("b") = '2';
ds.append<char>("c") = 'D';
ds.append<int>("i") = 123;
ds.append<Foo>("object");
bar(ds);
}
And the code follows:
//
// dynamic_struct.h
//
// Much omitted for brevity.
//
/**
* For any type, determines the alignment imposed by the compiler.
*/
template<class T>
class alignment_of {
private:
struct alignment {
char a;
T b;
}; // struct alignment
public:
enum { value = sizeof(alignment) - sizeof(T) };
}; // class alignment_of
/**
* A dynamically-created structure, whose fields are indexed by keys of
* some type K, which can be substituted at runtime for any structure
* with identical members and packing.
*/
template<class K>
class dynamic_struct {
public:
// Default maximum structure size.
static const int DEFAULT_SIZE = 32;
/**
* Create a structure with normal inter-element padding.
*/
dynamic_struct(int size = DEFAULT_SIZE) : max(size) {
data.reserve(max);
} // dynamic_struct()
/**
* Copy structure from another structure with the same key type.
*/
dynamic_struct(const dynamic_struct& structure) :
members(structure.members), max(structure.max) {
data.reserve(max);
for (iterator i = members.begin(); i != members.end(); ++i)
i->second.copy(&data[0] + i->second.offset,
&structure.data[0] + i->second.offset);
} // dynamic_struct()
/**
* Destroy all members of the structure.
*/
~dynamic_struct() {
for (iterator i = members.begin(); i != members.end(); ++i)
i->second.destroy(&data[0] + i->second.offset);
} // ~dynamic_struct()
/**
* Get a value from the structure by its key.
*/
template<class T>
T& get(const K& key) {
iterator i = members.find(key);
if (i == members.end()) {
std::ostringstream message;
message << "Read of nonexistent member \"" << key << "\".";
throw dynamic_struct_access_error(message.str());
} // if
return *reinterpret_cast<T*>(&data[0] + i->second.offset);
} // get()
/**
* Append a member to the structure.
*/
template<class T>
T& append(const K& key, int alignment = alignment_of<T>::value) {
iterator i = members.find(key);
if (i != members.end()) {
std::ostringstream message;
message << "Add of already existing member \"" << key << "\".";
throw dynamic_struct_access_error(message.str());
} // if
const int modulus = data.size() % alignment;
const int delta = modulus == 0 ? 0 : alignment - modulus; // padding up to the next aligned offset
if (data.size() + delta + sizeof(T) > max) {
std::ostringstream message;
message << "Attempt to add " << delta + sizeof(T)
<< " bytes to struct, exceeding maximum size of "
<< max << ".";
throw dynamic_struct_size_error(message.str());
} // if
data.resize(data.size() + delta + sizeof(T));
new (static_cast<void*>(&data[0] + data.size() - sizeof(T))) T;
std::pair<iterator, bool> j = members.insert
({key, member(data.size() - sizeof(T), destroy<T>, copy<T>)});
if (j.second) {
return *reinterpret_cast<T*>(&data[0] + j.first->second.offset);
} else {
std::ostringstream message;
message << "Unable to add member \"" << key << "\".";
throw dynamic_struct_access_error(message.str());
} // if
} // append()
/**
* Implicit checked conversion operator.
*/
template<class T>
operator T&() { return as<T>(); }
/**
* Convert from structure to real structure.
*/
template<class T>
T& as() {
// This naturally fails more frequently if changed to "!=".
if (sizeof(T) < data.size()) {
std::ostringstream message;
message << "Attempt to cast dynamic struct of size "
<< data.size() << " to type of size " << sizeof(T) << ".";
throw dynamic_struct_size_error(message.str());
} // if
return *reinterpret_cast<T*>(&data[0]);
} // as()
private:
// Map from keys to member offsets.
map_type members;
// Data buffer.
std::vector<unsigned char> data;
// Maximum allowed size.
const unsigned int max;
}; // class dynamic_struct
There's nothing inherently wrong with this kind of code. Delaying type-checking until runtime is perfectly valid, although you will have to work hard to defeat the compile-time type system. I wrote a heterogeneous stack class, where you could insert any type, which functioned in a similar fashion.
However, you have to ask yourself: what are you actually going to be using this for? I wrote a heterogeneous stack to replace the C++ stack for an interpreted language, which is a pretty tall order for any particular class. If you're not doing something drastic, this probably isn't the right thing to do.
In short, you can do it, and it's not illegal or bad or undefined and you can make it work, but you only should if you have a very desperate need to do things outside the normal language scope. Also, your code will die horrendously when C++0x becomes Standard and you need to support move semantics and all the rest of it.
The easiest way to think of your code is as a managed heap of a miniature size. You place various types of object on it, they're stored contiguously, etc.
Edit: Wait, you didn't manage to enforce type safety at runtime either? You just blew compile-time type safety but didn't replace it? Let me post some far superior code (that is somewhat slower, probably).
Edit: Oh wait. You want to convert your dynamic_struct, as the whole thing, to arbitrary unknown other structs, at runtime? Oh. Oh, man. Oh, seriously. What. Just no. Just don't. Really, really, don't. That's so wrong, it's unbelievable. If you had reflection, you could make this work, but C++ doesn't offer that. You can enforce type safety at runtime per each individual member using dynamic_cast and type erasure with inheritance. Not for the whole struct, because given a type T you can't tell what the types or binary layout is.
I think the type-checking could be improved. Right now it will reinterpret_cast itself to any type with the same size.
Maybe create an interface to register client structures at program startup, so they may be verified member-by-member — or even rearranged on the fly, or constructed more intelligently in the first place.
#define REGISTER_DYNAMIC_STRUCT_CLIENT( STRUCT, MEMBER ) \
do dynamic_struct::registry< STRUCT >() // one registry obj per client type \
.add( # MEMBER, &STRUCT::MEMBER, offsetof( STRUCT, MEMBER ) ) while(0)
// ^ name as str ^ ptr to memb ^ check against dynamic offset
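The offset check such a registry would perform can be sketched at compile time. This is an illustration under assumptions, not part of any existing dynamic_struct API: `align_up`, `next_offset`, `Client` and `layout_matches` are hypothetical names, and `Client` uses only built-in members because `Foo` is not shown in the question. It computes the offsets an append-style layout would produce and compares them with what the compiler chose.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (alignment > 0).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Offset at which a member of type T lands when appended after byte `end`.
template <class T>
constexpr std::size_t next_offset(std::size_t end)
{
    return align_up(end, alignof(T));
}

// Hypothetical client structure to verify against.
struct Client { char a, b, c; int i; short s; };

// Offsets the dynamic, append-in-order layout would assign.
constexpr std::size_t off_a = next_offset<char>(0);
constexpr std::size_t off_b = next_offset<char>(off_a + sizeof(char));
constexpr std::size_t off_c = next_offset<char>(off_b + sizeof(char));
constexpr std::size_t off_i = next_offset<int>(off_c + sizeof(char));
constexpr std::size_t off_s = next_offset<short>(off_i + sizeof(int));

// True when every computed offset agrees with the compiler's own layout.
constexpr bool layout_matches()
{
    return off_a == offsetof(Client, a)
        && off_b == offsetof(Client, b)
        && off_c == offsetof(Client, c)
        && off_i == offsetof(Client, i)
        && off_s == offsetof(Client, s);
}
```

On mainstream ABIs `layout_matches()` holds for this struct, but struct layout is ultimately implementation-defined, which is exactly why checking against `offsetof` is safer than trusting the arithmetic. Note that `offsetof` is only unconditionally supported for standard-layout types.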
I have one question: what do you get out of it?
I mean it's a clever piece of code but:
you're fiddling with memory, so the chances of a blow-up are huge
it's quite complicated too; I didn't get everything and I would certainly have to ponder it longer...
What I am really wondering is what you actually want...
For example, using Boost.Fusion
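#include <boost/fusion/include/map.hpp>
#include <boost/fusion/include/at_key.hpp>
struct a_key { typedef char type; };
struct object_key { typedef Foo type; };
typedef boost::fusion::map<
boost::fusion::pair<a_key, a_key::type>,
boost::fusion::pair<object_key, object_key::type>
> data_type;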
int main(int argc, char* argv[])
{
data_type data;
boost::fusion::at_key<a_key>(data) = 'a'; // compile time checked
}
Using Boost.Fusion you get compile-time reflection as well as correct packing.
I don't really see the need for "runtime" selection here (using a value as key instead of a type) when you need to pass the right type to the assignment anyway (char vs Foo).
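If pulling in Boost is not an option, the type-as-key idea can be approximated with plain inheritance. The following is a rough sketch of that substitute technique, not of how Boost.Fusion is implemented; `field`, `record`, `i_key`, `name_key` and `record_demo` are hypothetical names, and only `a_key` and `at_key` mirror the example above.

```cpp
#include <string>

// One base class per member: the key type selects it, the value type stores it.
template <class Key, class Value>
struct field { Value value{}; };

// A record is simply the sum of its fields.
template <class... Fields>
struct record : Fields... {};

// Key is given explicitly; Value is deduced from the one matching base class.
template <class Key, class Value>
Value& at_key(field<Key, Value>& f) { return f.value; }

struct a_key {};
struct i_key {};
struct name_key {};

using data_type = record<field<a_key, char>,
                         field<i_key, int>,
                         field<name_key, std::string>>;

// Hypothetical demo: every access is resolved and type-checked at compile time.
int record_demo()
{
    data_type data;
    at_key<a_key>(data) = 'A';
    at_key<i_key>(data) = 123;
    at_key<name_key>(data) = "object";
    // at_key<a_key>(data) = std::string("x"); // would not compile: member is a char
    return at_key<i_key>(data) + at_key<a_key>(data);
}
```

Asking for a key that is not part of `data_type` fails to compile, which is the compile-time checking that the string-keyed `get<T>("name")` cannot offer.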
Finally, note that this can be automated, thanks to preprocessor programming:
DECLARE_ATTRIBUTES(
mData,
(char, a)
(char, b)
(char, c)
(int, i)
(Foo, object)
)
Not much wordier than a typical declaration, though a, b, etc. will be inner types rather than attribute names.
This has several advantages over your solution:
compile-time checking
perfect compliance with default generated constructors / copy constructors / etc...
much more compact representation
no runtime lookup of the "right" member