Initialising a virtual class from a byte array using memcpy - c++

My app receives a byte[] via the network which contains the memberwise representation of a fixed-size struct which is out of my control. Let's call it Data:
struct Data {
int id;
int count;
};
This worked fine:
char buffer[sizeof(Data)]; // filled with bytes from the network...
Data data;
memcpy(&data, buffer, sizeof(Data));
Now, I want to make the Data type inherit from a class with a (pure) virtual function:
struct Data : public SomethingVirtual {
int id;
int count;
};
And the above code no longer works. For a virtual class, the first N bytes of the instance appear to contain a pointer to the vtable, and so the data is offset.
I could increase the pointer for the call to memcpy, but I'm now wondering if this approach is misguided. I would like to avoid having too many layers of indirection or memory copying if possible. Also the approach should work across architectures and compilers ideally, although I'm primarily targeting i686 using g++.
What is a good solution to this problem?

Once you derive from something that has virtual members, it is no longer POD. Your best solution is to serialize into your data structure, then build your virtual class on top of the trivial data.
struct Data
{...};
struct AddedValue : public SomethingVirtual
{
AddedValue(Data);
private:
Data MyData;
};

You may not use memcpy on non-POD types. A possible solution would be to use aggregation instead of inheritance: have Data as a member value, which would still be a POD type and thus can be used with memcpy.
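A minimal sketch of the aggregation approach both answers describe, assuming a hypothetical SomethingVirtual base with one pure virtual function (the question does not show the real one). The name DataView and its members are made up for illustration. The memcpy targets only the plain Data member, so the vtable pointer of the wrapper is never overwritten.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Wire-format struct: plain data only, so it stays trivially copyable.
struct Data {
    int id;
    int count;
};
static_assert(std::is_trivially_copyable<Data>::value,
              "Data must stay trivially copyable to be a memcpy target");

// Hypothetical stand-in for the asker's polymorphic base class.
struct SomethingVirtual {
    virtual ~SomethingVirtual() {}
    virtual int total() const = 0;
};

// The polymorphic wrapper holds Data as a member instead of inheriting from it.
class DataView : public SomethingVirtual {
public:
    // Copies sizeof(Data) bytes from `bytes` into the plain member only.
    explicit DataView(const char* bytes) {
        std::memcpy(&data_, bytes, sizeof(Data));
    }
    int total() const override { return data_.id + data_.count; }
    const Data& data() const { return data_; }

private:
    Data data_;
};
```

This costs one copy of sizeof(Data) bytes and no extra indirection. It still assumes the sender uses the same padding and byte order as the receiver.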

Related

Union vs inheritance in structures

Assume I have some structures, which basically have a 'general' field, and some unique data, like:
struct A
{
char type;
int data;
};
struct B
{
char type;
int data[2];
};
And so on (I have a lot of them). So I can just make a base structure with the same fields, and inherit the others from it. I thought that I could do the same thing using a union, e.g.:
union AnyClass
{
struct A _a;
struct B _b;
...
};
I am receiving some data (which exactly fits the biggest member in union), so would prefer to use following syntax:
// to read it from buffer (I am receiving data from another PC, which stores data the same way (in the same union) as I do
char buf[sizeof(AnyClass)];
char type = buf[0]; // detect type
AnyClass inst;
memcpy(&inst, buf, sizeof(inst));
switch(type)
{
... // handle data according to its type
}
// if I want to create a new object, and send it, I can use
AnyClass myObj;
new (&myObj._b) B();
... // do whatever I want
NOTE: I am aware that I have to align data somehow, so both machines (receiver/sender) should interpret buf correctly.
Can it run faster than same problem solution using BaseStructure and inherited others (so, I have to cast them right away), or it will be compiled to nearly the same code?
Is it OK to use, or it is just a poor design?
If there is another solution, can you explain it shortly?
The performance difference between the mentioned approaches will be minor. There is a good chance that you will not notice it at all.
I would shape your classes like that:
class AnyClass
{
char type;
union
{
struct
{
int data1;
};
struct
{
int data2[2];
};
};
};
Note using anonymous structs and unions.
Why do you need the character buffer at all? Always allocate the typed structure and better define it without ctors and dtors. I do not like this line:
char type = buf[0]; // detect type
Here you directly assume the physical offset. The fewer assumptions you make about the layout of the structures, the better the result will be.
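As a rough sketch of that advice: copy the received bytes into a typed object first and read the tag through its member, rather than through buf[0]. The names AnyMessage and sum_payload and the tag values 'A' and 'B' are made up for illustration. It uses a standard anonymous union rather than the anonymous structs shown above, which are a compiler extension.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical message type: a tag followed by a union of per-type payloads.
struct AnyMessage {
    char type;          // 'A' or 'B' in this sketch
    union {
        int a_data;     // payload when type == 'A'
        int b_data[2];  // payload when type == 'B'
    };
};
static_assert(std::is_trivially_copyable<AnyMessage>::value,
              "AnyMessage must be trivially copyable to be filled by memcpy");

// Copies the bytes into a typed object, then reads the tag through the member
// instead of assuming it sits at offset 0 of the raw buffer.
inline int sum_payload(const char* buf) {
    AnyMessage msg;
    std::memcpy(&msg, buf, sizeof msg);
    switch (msg.type) {
        case 'A': return msg.a_data;
        case 'B': return msg.b_data[0] + msg.b_data[1];
        default:  return -1;  // unknown tag
    }
}
```

The sender and receiver must still agree on padding and byte order, as the question's note already says.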

Variable class/struct structure? (Not template & not union?)

I have tried union...
struct foo
{
union
{
struct // 2 bytes
{
char var0_1;
};
struct // 5 bytes
{
char var1_1;
int var1_2;
};
};
};
Problem: Unions do what I want, except they will always take the size of the biggest datatype. In my case I need struct foo to have some initialization that allows me to tell it which structure to choose of the two (if that is even legal) as shown below.
So after that, I tried class template overloading...
template <bool B>
class foo { };
template <>
class foo<true>
{
char var1;
};
template <>
class foo<false>
{
char var0;
int var1;
};
Problem: I was really happy with templates and the fact that I could use the same variable name on the char and int, but the problem was the syntax. Because the classes are created at compile time, the template boolean variable needs to be a hardcoded constant, but in my case the boolean needs to be user-defined at run time.
So I need something of the two "worlds." How can I achieve what I'm trying to do?
!!NOTE: The foo class/struct will later be inherited, therefore as already mentioned, size of foo is of utmost importance.
EDIT#1::
Application:
Basically this will be used to read/write (using a pointer as an interface) a specific data buffer and also allow me to create (new instance of the class/struct) the same data buffer. The variables you see above specify the length. If it's a smaller data buffer, the length is written in a char/byte. If it's a bigger data buffer, the first char/byte is null as a flag, and the int specifies the length instead. After the length it's obvious that the actual data follows, hence why the inheritance. Size of class is of the utmost importance. I need to have my cake and eat it too.
A layer of abstraction.
struct my_buffer_view{
std::size_t size()const{
if (!m_ptr)return 0;
if (*m_ptr)return *m_ptr;
std::uint32_t len;
std::memcpy(&len, m_ptr+1, sizeof len); // needs <cstring>; avoids a misaligned read at m_ptr+1
return len;
}
std::uint8_t const* data() const{
if(!m_ptr)return nullptr;
if(*m_ptr)return m_ptr+1;
return m_ptr+5;
}
std::uint8_t const* begin()const{return data();}
std::uint8_t const* end()const{return data()+size();}
my_buffer_view(std::uint8_t const*ptr=nullptr):m_ptr(ptr){}
my_buffer_view(my_buffer_view const&)=default;
my_buffer_view& operator=(my_buffer_view const&)=default;
private:
std::uint8_t const* m_ptr=0;
};
No variable-sized data anywhere. I could have used a union for size etc.:
struct header{
std::uint8_t short_len;
union {
struct{
std::uint32_t long_len;
std::uint8_t long_buf[1];
};
struct {
std::uint8_t short_buf[1];
};
} body;
};
but I just did pointer arithmetic instead.
Writing such a buffer to a bytestream is another problem entirely.
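To make the decoding rule explicit, here is a small free-function sketch of the same idea, assuming the 32-bit length is stored in host byte order right after the zero flag byte. The names Record and decode_record are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Decoded view of one length-prefixed record (hypothetical helper type).
struct Record {
    const std::uint8_t* data;  // first payload byte
    std::size_t size;          // payload length in bytes
};

// Short form: [len != 0][payload...]
// Long form:  [0][32-bit len, host byte order][payload...]
inline Record decode_record(const std::uint8_t* p) {
    if (p[0] != 0) {
        return Record{p + 1, p[0]};
    }
    std::uint32_t len;
    std::memcpy(&len, p + 1, sizeof len);  // memcpy avoids a misaligned read
    return Record{p + 1 + sizeof len, len};
}
```

Nothing here is variable-sized: the view is always one pointer plus one size, whichever header form the buffer uses.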
Your solution does not make sense. Think about it: you could define two independent classes, fooTrue and fooFalse, with the corresponding members and get exactly the same result.
You are probably looking for a different solution, such as inheritance. For example, your fooTrue is baseFoo and your fooFalse is derivedFoo, which has the former as its base and extends it with another int member.
In this case, you have polymorphism as the mechanism that works at run time.
You can't have your cake and eat it too.
The point of templates is that the specialisation happens at compile time. At run time, the size of the class is fixed (albeit, in an implementation-defined manner).
If you want the choice to be made at run time, then you can't use a mechanism that determines size at compile-time. You will need a mechanism that accommodates both possible needs. Practically, that means your base class will need to be large enough to contain all required members - which is essentially what is happening with your union based solution.
In reference to your "!!NOTE". What you are doing qualifies as premature optimisation. You are trying to optimise size of a base class without any evidence (e.g. measurement of memory usage) that the size difference is actually significant for your application (e.g. that it causes your application to exhaust available memory). The fact that something will be a base for a number of other classes is not sufficient, on its own, to worry about its size.

C++ Class Representing Network Packet

I program mainly in C for the embedded world and recently I have been experimenting around with C++ and I have an idea. This question pertains to data transferred over a network.
Currently in C I do something like this contrived example (disregarding packing):
typedef struct {
time_t date;
float value;
} Message1;
typedef union {
char raw[sizeof(Message1)];
Message1 msg;
} Overlay;
int my_func(Message1* ptr)
{
/* do stuff with stuff */
}
Data is placed into Overlay.raw and inspected through msg (regarding endianness of course). Can I do something similar in C++ without using a struct?
class Message1 {
public:
time_t date;
float value;
int my_func() { /* do stuff with stuff */ };
};
typedef union {
char raw[sizeof(Message1)];
Message1 msg;
} Overlay;
I've done some experiments and from what I can tell it seems to be working so far. However I want to know more details about how C++ aligns stuff in the class. Like, will it break if I put a private section after the public section? What if I use inheritance? Is this a Dumb(tm) thing to do?
You generally want to keep unions simple. None of the construct, copy, assign, or move semantics apply to them; even if members have the functions defined. It's generally not a good idea to use them with complex data types though, since you need to worry about vtables, placement of access modified members, etc... However, POD classes are basically the same as C structs (C++ structs are also essentially the same as classes).
As I understand it, memory layout isn't part of the C++ standard, aside from the order of member variables for POD types. Public, protected, and private variables can be placed in separate memory regions. I think inherited member layouts are also implementation defined. So any code that depends on layout would be platform/compiler specific. Members are generally laid out in sequential order, but again it's generally not a good idea to depend on layout (multiple inheritance, for example). Obviously alignments are still platform/compiler defined as well, but you can control alignment using alignas(T) (C++11).
Also, it's probably just a style preference, but it might be better to use the union as an explicit type instead of a typedef.
union pkt {
char raw[sizeof(Message)];
Message msg;
};
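One way to check the questions about private sections and inheritance is to let the compiler answer them with the C++11 type traits. The three classes below are hypothetical variations on Message1. This is only a sketch of what the traits report, not a guarantee about any particular wire format.

```cpp
#include <cassert>
#include <ctime>
#include <type_traits>

// Same members as the question's Message1: all public, nothing virtual.
class PlainMessage {
public:
    std::time_t date;
    float value;
    int my_func() const { return 0; }  // non-virtual functions do not affect layout
};

// Data members with different access control: no longer standard-layout.
class MixedAccess {
public:
    std::time_t date;
    float get() const { return value; }
private:
    float value;
};

// A virtual function adds hidden per-object data (typically a vtable pointer).
class WithVirtual {
public:
    virtual ~WithVirtual() {}
    std::time_t date;
    float value;
};

static_assert(std::is_standard_layout<PlainMessage>::value, "");
static_assert(std::is_trivially_copyable<PlainMessage>::value, "");
static_assert(!std::is_standard_layout<MixedAccess>::value, "");
static_assert(!std::is_trivially_copyable<WithVirtual>::value, "");
```

Putting a static_assert like these next to the overlay union makes the build fail as soon as someone adds a virtual function or a private data member.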
I can't see a good reason to use unions here, at all.
You get no benefit of using a union with a byte array over a cast of the struct pointer to a (char*).
If you want to send a packet you don't need a union to access the data.
typedef struct {
time_t date;
float value;
} Message1;
void sendData(uint8_t *pData, int size) /* uint8_t comes from <stdint.h> */
{
while (size--)
sendByte(*pData++);
}
int main()
{
Message1 myMessage;
sendData( (uint8_t *)&myMessage, sizeof(myMessage) );
}
By the way, sending data directly from a structure over a network regularly results in problems with padding and/or endianness between different platforms.
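One common way around the padding and endianness problems is to write each field out byte by byte in a fixed order. The sketch below assumes a hypothetical Sample message with fixed-width fields and big-endian wire order. The question's time_t and float fields would each need their own agreed encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wire message with fixed-width fields.
struct Sample {
    std::uint32_t timestamp;
    std::uint16_t value;
};

const std::size_t kSampleWireSize = 6;  // 4 + 2 bytes, no padding on the wire

// Writes the fields one by one in big-endian order, independent of host layout.
inline void encodeSample(const Sample& s, std::uint8_t* out) {
    out[0] = static_cast<std::uint8_t>(s.timestamp >> 24);
    out[1] = static_cast<std::uint8_t>(s.timestamp >> 16);
    out[2] = static_cast<std::uint8_t>(s.timestamp >> 8);
    out[3] = static_cast<std::uint8_t>(s.timestamp);
    out[4] = static_cast<std::uint8_t>(s.value >> 8);
    out[5] = static_cast<std::uint8_t>(s.value);
}

// Rebuilds the struct from the six wire bytes.
inline Sample decodeSample(const std::uint8_t* in) {
    Sample s;
    s.timestamp = (static_cast<std::uint32_t>(in[0]) << 24) |
                  (static_cast<std::uint32_t>(in[1]) << 16) |
                  (static_cast<std::uint32_t>(in[2]) << 8) |
                   static_cast<std::uint32_t>(in[3]);
    s.value = static_cast<std::uint16_t>((in[4] << 8) | in[5]);
    return s;
}
```

The in-memory struct can then be laid out however the compiler likes, because only the encode and decode functions know the wire format.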

Alternative schemes for implementing vptr?

This question is not about the C++ language itself(ie not about the Standard) but about how to call a compiler to implement alternative schemes for virtual function.
The general scheme for implementing virtual functions is using a pointer to a table of pointers.
class Base {
private:
int m;
public:
virtual void metha();
};
equivalently in say C would be something like
struct Base {
void (**vtable)();
int m;
};
The first member is usually a pointer to a list of virtual functions (an area of memory which the application has no control over). In most cases this costs the size of a pointer before considering the members, so around 4 bytes in a 32-bit addressing scheme. If you created a list of 40k polymorphic objects in your application, this is around 40k x 4 bytes = 160k bytes before any member variables. I also know this happens to be the fastest and most common implementation among C++ compilers.
I know this is complicated by multiple inheritance (especially with virtual classes in them, ie diamond struct, etc).
An alternative way to do the same is to have the first variable as a index id to a table of vptrs(equivalently in C as below)
struct Base {
char classid; // the classid here is an index into an array of vtables
int m;
};
If the total number of classes in an application is less than 255(including all possible template instantiations, etc), then a char is good enough to hold an index thereby reducing the size of all polymorphic classes in the application(I am excluding alignment issues, etc).
My question is: is there any switch in GNU C++, LLVM, or any other compiler to do this, or to reduce the size of polymorphic objects?
Edit: I understand about the alignment issues pointed out. Also a further point, if this was on a 64bit system(assuming 64bit vptr) with each polymorphic object members costing around 8 bytes, then the cost of vptr is 50% of the memory. This mostly relates to small polymorphics created in mass, so I am wondering if this scheme is possible for at least specific virtual objects if not the whole application.
Your suggestion is interesting, but it won't work if the executable is made of several modules, passing objects among them. Given they are compiled separately (say DLLs), if one module creates an object and passes it to another, and the other invokes a virtual method, how would it know which table the classid refers to? You won't be able to add another moduleid because the two modules might not know about each other when they are compiled. So unless you use pointers, I think it's a dead end...
A couple of observations:
Yes, a smaller value could be used to represent the class, but some processors require data to be aligned, so that saving in space may be lost by the requirement to align data values to e.g. 4 byte boundaries. Further, the class-id must be in a well defined place for all members of a polymorphic inheritance tree, so it is likely to be ahead of other data, so alignment problems can't be avoided.
The cost of storing the pointer has been moved to the code, where every use of a polymorphic function requires code to translate the class-id to either a vtable pointer, or some equivalent data structure. So it isn't free. Clearly the cost trade-off depends on the volume of code vs the number of objects.
If objects are allocated from the heap, there is usually space wasted in order to ensure objects are aligned to the worst boundary, so even if there is a small amount of code, and a large number of polymorphic objects, the memory management overhead might be significantly bigger than the difference between a pointer and a char.
In order to allow programs to be independently compiled, the number of classes in the whole program, and hence the size of the class-id must be known at compile time, otherwise code can't be compiled to access it. This would be a significant overhead. It is simpler to fix it for the worst case, and simplify compilation and linking.
Please don't let me stop you trying, but there are quite a lot more issues to resolve using any technique which may use a variable size id to derive the function address.
I would strongly encourage you to look at Ian Piumarta's Cola (see also the Wikipedia article on Cola).
It actually takes a different approach, and uses the pointer in a much more flexible way, to build inheritance, or prototype-based, or any other mechanism the developer requires.
No, there is no such switch.
The LLVM/Clang codebase avoids virtual tables in classes that are allocated by the tens of thousands. This works well in a closed hierarchy, because a single enum can enumerate all possible classes, and each class is then linked to a value of the enum. The hierarchy is closed precisely because of the enum.
Then, virtuality is implemented by a switch on the enum, and appropriate casting before calling the method. Once again, closed. The switch has to be modified for each new class.
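A toy sketch of that enum-plus-switch technique follows. It is not LLVM's actual code. The names Kind, Node, CounterNode and DoublerNode are invented for illustration.

```cpp
#include <cassert>

// Closed set of classes: adding a class means adding an enumerator here
// and a case to every dispatching switch.
enum class Kind : unsigned char { Counter, Doubler };

struct Node {
    Kind kind;  // one byte of type information instead of a vtable pointer
    int count;
protected:
    explicit Node(Kind k) : kind(k), count(0) {}
};

struct CounterNode : Node {
    CounterNode() : Node(Kind::Counter) {}
    void update(int i) { count = i; }
};

struct DoublerNode : Node {
    DoublerNode() : Node(Kind::Doubler) {}
    void update(int i) { count = 2 * i; }
};

// "Virtual" call: switch on the tag, cast to the real type, call the method.
inline void updateCount(Node& n, int i) {
    switch (n.kind) {
        case Kind::Counter: static_cast<CounterNode&>(n).update(i); break;
        case Kind::Doubler: static_cast<DoublerNode&>(n).update(i); break;
    }
}
```

Because of alignment, the one-byte tag only saves space when other small members can share the same word.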
A first alternative: external vpointer.
If you find yourself in a situation where the vpointer tax is paid far too often, that is, when most of the objects are of known type, then you can externalize it.
class Interface {
public:
virtual ~Interface() {}
virtual Interface* clone() const = 0; // might be worth it
virtual void updateCount(int) = 0;
protected:
Interface(Interface const&) {}
Interface& operator=(Interface const&) { return *this; }
};
template <typename T>
class InterfaceBridge: public Interface {
public:
InterfaceBridge(T& t): t(t) {}
virtual InterfaceBridge* clone() const { return new InterfaceBridge(*this); }
virtual void updateCount(int i) { t.updateCount(i); }
private:
T& t; // value or reference ? Choose...
};
template <typename T>
InterfaceBridge<T> interface(T& t) { return InterfaceBridge<T>(t); }
Then, imagining a simple class:
class Counter {
public:
int getCount() const { return c; }
void updateCount(int i) { c = i; }
private:
int c;
};
You can store the objects in an array:
static Counter array[5];
assert(sizeof(array) == sizeof(int)*5); // no v-pointer
And still use them with polymorphic functions:
void five(Interface& i) { i.updateCount(5); }
InterfaceBridge<Counter> ib(array[3]); // create *one* v-pointer
five(ib);
assert(array[3].getCount() == 5);
The value vs reference is actually a design tension. In general, if you need to clone you need to store by value, and you need to clone when you store by base class (boost::ptr_vector for example). It is possible to actually provide both interfaces (and bridges):
Interface <--- ClonableInterface
| |
InterfaceB ClonableInterfaceB
It's just extra typing.
Another solution, much more involved.
A switch is implementable by a jump table. Such a table could perfectly be created at runtime, in a std::vector for example:
class Base {
public:
~Base() { VTables()[vpointer].dispose(*this); }
void updateCount(int i) {
VTables()[vpointer].updateCount(*this, i);
}
protected:
struct VTable {
typedef void (*Dispose)(Base&);
typedef void (*UpdateCount)(Base&, int);
Dispose dispose;
UpdateCount updateCount;
};
static void NoDispose(Base&) {}
static unsigned RegisterTable(VTable t) {
std::vector<VTable>& v = VTables();
v.push_back(t);
return v.size() - 1;
}
explicit Base(unsigned id): vpointer(id) {
assert(id < VTables().size());
}
private:
// Implement in .cpp or pay the cost of weak symbols.
static std::vector<VTable>& VTables() { static std::vector<VTable> VT; return VT; }
unsigned vpointer;
};
And then, a Derived class:
class Derived: public Base {
public:
Derived(): Base(GetID()) {}
private:
static void UpdateCount(Base& b, int i) {
static_cast<Derived&>(b).count = i;
}
static unsigned GetID() {
static unsigned ID = RegisterTable(VTable({&NoDispose, &UpdateCount}));
return ID;
}
unsigned count;
};
Well, now you'll realize how great it is that the compiler does it for you, even at the cost of some overhead.
Oh, and because of alignment, as soon as a Derived class introduces a pointer, there is a risk that 4 bytes of padding are used between Base and the next attribute. You can use them by carefully selecting the first few attributes in Derived to avoid padding...
The short answer is that no, I don't know of any switch to do this with any common C++ compiler.
The longer answer is that to do this, you'd just about have to build most of the intelligence into the linker, so it could coordinate distributing the IDs across all the object files getting linked together.
I'd also point out that it wouldn't generally do a whole lot of good. At least in a typical case, you want each element in a struct/class at a "natural" boundary, meaning its starting address is a multiple of its size. Using your example of a class containing a single int, the compiler would allocate one byte for the vtable index, followed immediately by three bytes of padding so the next int would land at an address that was a multiple of four. The end result would be that objects of the class would occupy precisely the same amount of storage as if we used a pointer.
I'd add that this is not a far-fetched exception either. For years, standard advice to minimize padding inserted into structs/classes has been to put the items expected to be largest at the beginning, and progress toward the smallest. That means in most code, you'd end up with those same three bytes of padding before the first explicitly defined member of the struct.
To get any good from this, you'd have to be aware of it, and have a struct with (for example) three bytes of data you could move where you wanted. Then you'd move those to be the first items explicitly defined in the struct. Unfortunately, that would also mean that if you turned this switch off so you have a vtable pointer, you'd end up with the compiler inserting padding that might otherwise be unnecessary.
To summarize: it's not implemented, and if it were, it wouldn't usually accomplish much.
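The padding argument can be checked with two small structs. This sketch assumes a typical ABI where int is 4 bytes with 4-byte alignment. The struct names and the flags member are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// One-byte class index followed directly by an int: the compiler pads
// after classid so that m lands on an int-aligned address.
struct IndexThenInt {
    unsigned char classid;  // hypothetical one-byte vtable index
    int m;
};

// Same layout, but three small data bytes occupy what was padding
// (assuming alignof(int) == 4).
struct BytesAfterIndex {
    unsigned char classid;
    unsigned char flags[3];
    int m;
};
```

On such an ABI, offsetof(IndexThenInt, m) is 4 and both structs are 8 bytes, the same as a 32-bit vtable pointer plus an int. The one-byte index only pays off when data like flags is available to fill the gap.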

Better way to extend data members of C++ class / struct

I have this problem again and again... and still do not have a satisfactory answer...
Especially when I put the class into a container, later on I need to record more information on every element in the container during a specific processing, but after processing I do not need the extra information anymore....
I often find that libraries try to solve the above situation by defining a void* in their data structure to provide a user-defined data structure extension, just as described in this Q&A.
But it produces memory / resource handling problems... and other problems, so I feel this approach is error-prone.
In the modern day of object-oriented programming, I am thinking of
using inheritance & polymorphism. Use base class's pointer in the container, but then I have to add derived class's accessor into the base class. It is kind of strange...
Are there any better ways to extend a class's properties while maintaining container compatibility in C++?
The best way to store extra data about an object without actually compromising the integrity of the object itself is to store a pair of data in the container instead.
struct User { ... };
struct ExtraData { ... };
typedef std::pair<User, ExtraData> UserAndExtraData;
Now I can create a container type in C++ which stores both pieces of information together without compromising the independence of either type.
std::vector<UserAndExtraData> vector;
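A small sketch of how the pair approach might be used for one processing pass. The fields of User and ExtraData and the helper names attachExtra and stripExtra are made up here, since the original elides them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct User { std::string name; };              // hypothetical field
struct ExtraData { std::size_t name_length; };  // hypothetical per-pass data
typedef std::pair<User, ExtraData> UserAndExtraData;

// Pairs every user with freshly computed extra data for one processing pass.
inline std::vector<UserAndExtraData> attachExtra(const std::vector<User>& users) {
    std::vector<UserAndExtraData> out;
    out.reserve(users.size());
    for (std::size_t i = 0; i != users.size(); ++i) {
        ExtraData extra = { users[i].name.size() };
        out.push_back(std::make_pair(users[i], extra));
    }
    return out;
}

// Drops the extra data again once processing is finished.
inline std::vector<User> stripExtra(const std::vector<UserAndExtraData>& tagged) {
    std::vector<User> out;
    out.reserve(tagged.size());
    for (std::size_t i = 0; i != tagged.size(); ++i) {
        out.push_back(tagged[i].first);
    }
    return out;
}
```

This copies every User twice. If that matters, the pair could hold a pointer or index to the original element instead.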
I would look into the Decorator pattern. You can decorate your objects while processing them, then throw the decorated objects away. If there is a lot of shared data you can also look into the Flyweight pattern.
"User" could be extended by template parameters. for example,
template <typename... Extra>
struct User : Extra...
{
...
};
struct ExtraData {...};
struct ExtraExtraData {...};
using ExtraUser = User<ExtraData>;
using MoreExtraUser = User<ExtraData, ExtraExtraData>;
In the modern day of object-oriented programming, I am thinking of
using inheritance & polymorphism. Use base class's pointer in the
container, but then I have to add derived class's accessor into the
base class. It is kind of strange...
You don't need to put a pointer to your derived class in your base class when using inheritance. You just need to cast to the derived class. The problem is getting your data into the derived objects when it's stored in the base objects: you can only cast them if they were created as the derived type, even if your collection holds them as the base type. (If they are created as the derived type, then just cast!)
So if you have a collection of BaseC, you can create a new class DerivedC that has a copy constructor that takes a BaseC. You can copy your BaseC object into it, perform your processing on the DerivedC objects and then copy these back into a BaseC object for storage. This uses the Flyweight pattern. Note that if you have a collection of BaseC objects, you cannot just pretend they are DerivedC classes as they will not have the storage to hold all the data members, you need to create new DerivedC objects.
Alternatively, create a new class just for processing that contains a (smart pointer) reference to your base class objects, copy the reference in, perform the processing, delete the processing objects when you're done.
If your objects are in a vector, then a simple approach is to make a parallel vector:
void doSomething(const vector<MyObject>& my_objects)
{
vector<ExtraData> extra_data;
int n_objects = my_objects.size();
extra_data.reserve(n_objects);
for (int i=0; i!=n_objects; ++i) {
extra_data.push_back(calcExtraData(my_objects[i]));
}
// now use my_objects[i] and extra_data[i] together.
// extra data goes away when the function returns.
}
You don't have to modify your original objects, and it is very efficient.
If you have some other container type, you can use a map:
void doSomething(const set<MyObject>& my_objects)
{
map<const MyObject*,ExtraData> extra_data;
set<MyObject>::const_iterator i=my_objects.begin(), end=my_objects.end();
for (;i!=end;++i) {
extra_data[&*i] = calcExtraData(*i);
}
// now use extra_data[&obj] to access the extra data for obj.
// extra data goes away when the function returns.
}
This isn't as efficient as with vectors, but you still don't have to modify your original classes.
However, it becomes more difficult to maintain the parallel structures if the underlying container can change during the processing.
One simple option is to add a type parameter representing the "extra data"...
template<class ExtraDataType>
struct MyExtensibleContainer
{
...
ExtraDataType extra;
};
Perhaps if you indicate why this solution isn't sufficient, the true requirements will come through.
Example for int and void*:
struct IntOrVoid
{
};
struct IntOrVoid1 : IntOrVoid
{
int x;
};
struct IntOrVoid2 : IntOrVoid
{
void* x;
};
typedef shared_ptr<IntOrVoid> PIntOrVoid;
then use MyExtensibleContainer<PIntOrVoid>
or alternatively:
union IntOrVoid
{
int x_int;
void* x_voidp;
};
then use MyExtensibleContainer<IntOrVoid>
The problem you are describing has nothing to do with adding an "extra" data type. The problem you are describing has to do with holding a variant type that can have one of many heterogeneous types. There are many ways to do this; it is a much more general problem.