Assume I have some structures, which basically have a 'general' field, and some unique data, like:
struct A
{
char type;
int data;
};
struct B
{
char type;
int data[2];
};
And so on (I have a lot of them). I could just make a base structure with the common fields and have the others inherit from it. I thought I could do the same thing using a union, e.g.:
union AnyClass
{
struct A _a;
struct B _b;
...
};
I am receiving some data (which exactly fits the biggest member of the union), so I would prefer to use the following syntax:
// to read it from the buffer (I am receiving data from another PC, which stores data the same way, in the same union, as I do)
char buf[sizeof(AnyClass)];
char type = buf[0]; // detect type
AnyClass inst;
memcpy(&inst, buf, sizeof(inst));
switch(type)
{
... // handle data according to its type
}
// if I want to create a new object, and send it, I can use
AnyClass myObj;
new (&myObj._b) B();
... // do whatever I want
NOTE: I am aware that I have to align the data somehow, so that both machines (receiver/sender) interpret buf correctly.
Can it run faster than the equivalent solution with a BaseStructure that the others inherit from (where I have to cast right away), or will it compile to nearly the same code?
Is it OK to use, or is it just poor design?
If there is another solution, can you explain it briefly?
The performance difference between the mentioned approaches will be minor. There is a good chance that you will not notice it at all.
I would shape your classes like this:
class AnyClass
{
public:
    char type;
    union
    {
        struct
        {
            int data1;
        };
        struct
        {
            int data2[2];
        };
    };
};
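Note the use of anonymous structs and unions. Anonymous unions are standard C++; anonymous structs are a compiler extension, although GCC, Clang, and MSVC all support it.
Why do you need the character buffer at all? Always allocate the typed structure, and preferably define it without constructors and destructors. I do not like this line: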
char type = buf[0]; // detect type
Here you directly assume the physical offset. The fewer assumptions you make about the layout of the structures, the better the result will be.
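For the "another solution" part of the question: if C++17 is available, std::variant is a type-safe alternative to a hand-written tagged union. This is a swapped-in technique, not what the answer above proposes. A minimal sketch reusing the A and B structs from the question; sumData is a hypothetical handler that only shows dispatch on the active type. The variant is meant for in-memory handling: its own layout is implementation-defined, so received bytes still have to be decoded into an A or a B first.

```cpp
#include <cassert>
#include <type_traits>
#include <variant>

struct A { char type; int data; };
struct B { char type; int data[2]; };

// std::variant keeps its own record of which alternative is active,
// so there is no separate tag byte to keep in sync by hand.
using AnyClass = std::variant<A, B>;

// Hypothetical handler: dispatches on the active alternative.
int sumData(const AnyClass& v)
{
    return std::visit([](const auto& m) -> int {
        using T = std::decay_t<decltype(m)>;
        if constexpr (std::is_same_v<T, A>)
            return m.data;                   // A holds a single int
        else
            return m.data[0] + m.data[1];    // B holds two ints
    }, v);
}
```

When a full visitor is overkill, std::holds_alternative and std::get_if give checked access to one specific alternative.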
Related
Assuming I've got the following struct A definition from a subcontractor:
struct A {
int var0;
int var1;
int var2;
};
I cannot change anything about it, but I have to use this struct for calls to the subcontractor's API. In my calculation components I'll use a more generic version called struct B:
struct B {
int var[3];
int other_vars[3];
// [...]
};
Given these, I am looking for a simple way to map the array var from my more generic struct B to the explicit variable declarations of struct A.
The current implementation is as simple as it gets:
a.var0 = b.var[0];
a.var1 = b.var[1];
a.var2 = b.var[2];
which produces a very large mapping file and possibly failures in the future if struct A gets updated.
Possible Solutions:
I thought about something like memcpy, but I think that is very unsafe:
#define MAP(from, to, var) \
std::memcpy(&to.var##0, &from.var, sizeof(from.var));
MAP(b, a, var);
Sidenotes:
The structs are much bigger than shown here. There are a bunch of different variables defined that way with much higher indexes.
Because the code for struct A is generated by the subcontractor (we get the generated .h file), I cannot guarantee that the variables are in the right order and not interleaved with other variables. That's why I think my possible solution is not good enough.
As long as the members of struct A map to struct B::var in the same order, memcpy is the best way to do this. Because both structs are standard-layout types, they are guaranteed to occupy contiguous bytes of storage.
From the C++ standard:
An object of trivially copyable or standard-layout type shall occupy contiguous bytes of storage.
To deal with any potential padding issues that could make this process unsafe, static assertions can be used to ensure that everything lines up as expected.
#include <type_traits>
struct A {
int var0;
int var1;
int var2;
// int var3; // Adding this variable will cause static_assert to fail.
};
struct B {
int var[3];
int other_vars[3];
// [...]
};
// Note: for B to be trivially-copyable or standard-layout, all the members must also be.
static_assert( std::is_trivially_copyable<A>::value);
static_assert( std::is_trivially_copyable<B>::value);
static_assert( std::is_standard_layout<A>::value);
static_assert( std::is_standard_layout<B>::value);
static_assert(sizeof(A) == sizeof(B::var), "Incompatible mapping to subcontractor.");
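Again, this only works as long as struct A doesn't change in a way that would make A::var0 map to B::var[1], or to some non-contiguous order. Note that the size check cannot detect a reordering of A's members, since reordering does not change the size.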
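If the order of A's members cannot be trusted, one option (a sketch, not taken from the answer above) is to map by name instead of by position, using a table of pointers-to-member. kVarMap and mapBtoA are hypothetical names. The table still has to be updated by hand when the subcontractor adds a member; what it buys is that reordering or interleaving A's members no longer breaks the mapping, and a renamed or removed member becomes a compile error.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

struct A { int var0; int var1; int var2; };
struct B { int var[3]; int other_vars[3]; };

// Element i of B::var is copied to the A member named in kVarMap[i].
// The mapping is by member name, so A's physical layout does not matter.
constexpr std::array<int A::*, 3> kVarMap = { &A::var0, &A::var1, &A::var2 };

// The table must cover every element of B::var.
static_assert(kVarMap.size() == std::extent<decltype(B::var)>::value,
              "kVarMap does not cover every element of B::var");

void mapBtoA(const B& b, A& a)
{
    for (std::size_t i = 0; i < kVarMap.size(); ++i)
        a.*kVarMap[i] = b.var[i];
}
```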
I have tried union...
struct foo
{
union
{
struct // 1 byte
{
char var0_1;
};
struct // 5 bytes
{
char var1_1;
int var1_2;
};
};
};
Problem: Unions do what I want, except that they always take the size of the biggest member. In my case I need some kind of initialization of struct foo that lets me tell it which of the two structures to choose (if that is even legal), as shown below.
So after that, I tried class template specialization...
template <bool B>
class foo { };
template <>
class foo<true>
{
    char var1;
};
template <>
class foo<false>
{
    char var0;
    int var1;
};
Problem: I was really happy with templates and the fact that I could use the same variable name for the char and the int, but the problem was the syntax. Because the classes are created at compile time, the template boolean had to be a compile-time constant, but in my case the boolean has to be chosen by the user at run time.
So I need something from both worlds. How can I achieve what I'm trying to do?
!!NOTE: The foo class/struct will later be inherited from; therefore, as already mentioned, the size of foo is of the utmost importance.
EDIT#1::
Application:
Basically this will be used to read/write a specific data buffer (using a pointer as an interface), and also to let me create the same data buffer (as a new instance of the class/struct). The variables you see above specify the length. If it's a smaller data buffer, the length is written in a single char/byte. If it's a bigger data buffer, the first char/byte is zero as a flag, and the int specifies the length instead. The actual data follows the length, hence the inheritance. The size of the class is of the utmost importance. I need to have my cake and eat it too.
A layer of abstraction.
#include <cstddef>
#include <cstdint>
#include <cstring>

struct my_buffer_view{
    std::size_t size()const{
        if (!m_ptr)return 0;
        if (*m_ptr)return *m_ptr; // short form: the first byte is the length
        std::uint32_t len;        // long form: 0 flag, then a 4-byte length
        std::memcpy(&len, m_ptr+1, sizeof len); // memcpy avoids a misaligned/aliasing read
        return len;
    }
    std::uint8_t const* data() const{
        if(!m_ptr)return nullptr;
        if(*m_ptr)return m_ptr+1;
        return m_ptr+5;
    }
    std::uint8_t const* begin()const{return data();}
    std::uint8_t const* end()const{return data()+size();}
    my_buffer_view(std::uint8_t const*ptr=nullptr):m_ptr(ptr){}
    my_buffer_view(my_buffer_view const&)=default;
    my_buffer_view& operator=(my_buffer_view const&)=default;
private:
    std::uint8_t const* m_ptr=nullptr;
};
There is no variable-sized data anywhere. I could have used a union for the size, etc.:
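struct header{
    std::uint8_t short_len; // 0 means "long form"
    union {
        struct{
            std::uint32_t long_len; // without packing this is 4-byte aligned, so it does not directly follow short_len
            std::uint8_t long_buf[1];
        };
        struct {
            std::uint8_t short_buf[1];
        };
    } body;
};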
but I just did pointer arithmetic instead.
Writing such a buffer to a bytestream is another problem entirely.
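The answer leaves writing open; as a rough sketch under assumptions of our own: encode is a hypothetical helper, the long-form length is written in the host's native byte order (which is how my_buffer_view above reads it back, but is not portable between machines of different endianness), and a length of 0 uses the long form because a 0 first byte is the flag.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical encoder for the length-prefixed format from the question:
// lengths 1..255 are stored in one byte; anything else gets a 0 flag byte
// followed by a 4-byte length in native byte order, then the payload.
std::vector<std::uint8_t> encode(const std::uint8_t* data, std::uint32_t len)
{
    assert(data != nullptr || len == 0);
    std::vector<std::uint8_t> out;
    if (len >= 1 && len <= 255) {
        out.push_back(static_cast<std::uint8_t>(len));
    } else {
        out.push_back(0); // flag: long form
        std::uint8_t bytes[sizeof len];
        std::memcpy(bytes, &len, sizeof len);
        out.insert(out.end(), bytes, bytes + sizeof len);
    }
    if (len != 0)
        out.insert(out.end(), data, data + len);
    return out;
}
```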
Your solution does not make sense. Think about it: you could define two independent classes, fooTrue and fooFalse, with the corresponding members and get exactly the same result.
Probably you are looking for a different solution, such as inheritance. For example, your fooTrue becomes baseFoo, and your fooFalse becomes derivedFoo, which uses baseFoo as its base and extends it with another int member.
In this case, polymorphism is the mechanism that lets you choose at run time.
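A minimal sketch of that suggestion, under assumptions: the member names follow the question's foo (var0 as the short length, var1 as the long one), and the virtual length() accessor and the makeFoo factory are hypothetical. Keep in mind that the virtual function adds a vtable pointer to every object, which works against the size goal stated in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct baseFoo {
    std::uint8_t var0 = 0;                 // short length, or 0 as the "long" flag
    virtual ~baseFoo() = default;
    virtual std::uint32_t length() const { return var0; }
};

struct derivedFoo : baseFoo {
    std::uint32_t var1 = 0;                // long length, used when var0 == 0
    std::uint32_t length() const override { return var1; }
};

// Hypothetical factory: the layout is chosen at run time from the length.
std::unique_ptr<baseFoo> makeFoo(std::uint32_t len)
{
    if (len >= 1 && len <= 255) {
        auto p = std::make_unique<baseFoo>();
        p->var0 = static_cast<std::uint8_t>(len);
        return p;
    }
    auto p = std::make_unique<derivedFoo>();
    p->var1 = len;
    return p;
}
```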
You can't have your cake and eat it too.
The point of templates is that the specialisation happens at compile time. At run time, the size of the class is fixed (albeit, in an implementation-defined manner).
If you want the choice to be made at run time, then you can't use a mechanism that determines size at compile-time. You will need a mechanism that accommodates both possible needs. Practically, that means your base class will need to be large enough to contain all required members - which is essentially what is happening with your union based solution.
In reference to your "!!NOTE". What you are doing qualifies as premature optimisation. You are trying to optimise size of a base class without any evidence (e.g. measurement of memory usage) that the size difference is actually significant for your application (e.g. that it causes your application to exhaust available memory). The fact that something will be a base for a number of other classes is not sufficient, on its own, to worry about its size.
I program mainly in C for the embedded world, and recently I have been experimenting with C++ and have an idea. This question pertains to data transferred over a network.
Currently in C I do something like this contrived example (disregarding packing):
typedef struct {
time_t date;
float value;
} Message1;
typedef union {
char raw[sizeof(Message1)];
Message1 msg;
} Overlay;
int my_func(Message1* ptr)
{
/* do stuff with stuff */
}
Data is placed into Overlay.raw and inspected through msg (taking endianness into account, of course). Can I do something similar in C++ without using a struct, i.e. with a class?
class Message1 {
public:
    time_t date;
    float value;
    int my_func() { /* do stuff with stuff */ return 0; }
};
typedef union {
    char raw[sizeof(Message1)];
    Message1 msg;
} Overlay;
I've done some experiments, and from what I can tell it seems to be working so far. However, I want to know more details about how C++ lays out and aligns members in a class. For example, will it break if I put a private section after the public section? What if I use inheritance? Is this a Dumb(tm) thing to do?
You generally want to keep unions simple. A union does not run its members' constructors, copy/move operations, or destructors for you, even if the members define them; since C++11, if any member has a non-trivial special member function, the corresponding special member function of the union is implicitly deleted. It's generally not a good idea to use unions with complex data types, since you then need to worry about vtables, the placement of members with different access specifiers, etc. However, POD classes are basically the same as C structs (and C++ structs are essentially the same as classes).
As I understand it, the C++ standard specifies relatively little about memory layout: members with the same access control are allocated at increasing addresses, but padding is implementation-defined, and before C++23 the relative order of members with different access control (public, protected, private) was unspecified. I think the layout of inherited members is also implementation-defined. So any code that depends on layout is platform/compiler specific. Members are generally laid out in declaration order, but again, it's generally not a good idea to depend on layout (multiple inheritance, for example). Alignment is also platform/compiler defined, but you can control it using alignas (C++11).
Also, it's probably just a style preference, but it might be better to declare the union as a named type instead of a typedef:
union pkt {
    char raw[sizeof(Message1)];
    Message1 msg;
};
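To turn the layout assumptions discussed above into build errors rather than silent breakage, here is a sketch of compile-time checks on a class version of Message1. The my_func body and the RawBuffer type are hypothetical. A non-virtual member function does not stop a class from being standard-layout; adding a virtual function, or data members with mixed access control, would make the first check fail.

```cpp
#include <cstddef>
#include <ctime>
#include <type_traits>

class Message1 {
public:
    std::time_t date;
    float value;
    // Hypothetical body; a non-virtual member function adds no storage.
    int my_func() const { return value > 0.0f ? 1 : 0; }
};

// Fail the build if the overlay assumptions stop holding.
static_assert(std::is_standard_layout<Message1>::value, "Message1 must be standard-layout");
static_assert(std::is_trivially_copyable<Message1>::value, "Message1 must be trivially copyable");
static_assert(offsetof(Message1, date) == 0, "date must come first");

// alignas gives a raw receive buffer the alignment of the message type,
// so its bytes can be copied into a Message1 with memcpy.
struct alignas(Message1) RawBuffer {
    unsigned char bytes[sizeof(Message1)];
};
```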
I can't see a good reason to use unions here, at all.
You get no benefit of using a union with a byte array over a cast of the struct pointer to a (char*).
If you want to send a packet you don't need a union to access the data.
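#include <cstdint>
#include <ctime>

typedef struct {
    std::time_t date;
    float value;
} Message1;

void sendByte(std::uint8_t b); // assumed to be provided by the transport layer

void sendData(const std::uint8_t *pData, int size)
{
    while (size--)
        sendByte(*pData++);
}

int main()
{
    Message1 myMessage{};
    sendData(reinterpret_cast<const std::uint8_t*>(&myMessage), sizeof(myMessage));
}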
By the way, sending data directly from a structure over a network regularly causes problems with padding and/or endianness between different platforms.
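One way around those padding and endianness problems is to serialize each field explicitly into a fixed wire format instead of sending the struct's raw bytes. A sketch, assuming a hypothetical format of an 8-byte little-endian date followed by the 4-byte little-endian bit pattern of a 32-bit IEEE 754 float; serialize, deserialize, putLE, getLE, and kWireSize are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ctime>
#include <limits>

struct Message1 {
    std::time_t date;
    float value;
};

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this wire format assumes 32-bit IEEE 754 floats");

constexpr std::size_t kWireSize = 12; // 8 bytes date + 4 bytes value, no padding

// Write/read the low nbytes of v least-significant byte first.
void putLE(std::uint8_t* out, std::uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; ++i)
        out[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

std::uint64_t getLE(const std::uint8_t* in, int nbytes)
{
    std::uint64_t v = 0;
    for (int i = 0; i < nbytes; ++i)
        v |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    return v;
}

void serialize(const Message1& m, std::uint8_t (&out)[kWireSize])
{
    putLE(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(m.date)), 8);
    std::uint32_t bits;
    std::memcpy(&bits, &m.value, sizeof bits); // float bit pattern, no aliasing issue
    putLE(out + 8, bits, 4);
}

Message1 deserialize(const std::uint8_t (&in)[kWireSize])
{
    Message1 m{};
    m.date = static_cast<std::time_t>(static_cast<std::int64_t>(getLE(in, 8)));
    std::uint32_t bits = static_cast<std::uint32_t>(getLE(in + 8, 4));
    std::memcpy(&m.value, &bits, sizeof bits);
    return m;
}
```

Because every byte position is fixed by the format, both ends agree regardless of their compilers' padding or native byte order.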
Possible Duplicate:
What are the differences between struct and class in C++
This question has been asked and answered a lot but every once in a while I come across something confusing.
To sum up, the difference between C++ structs and classes is the famous default public vs. private access. Apart from that, a C++ compiler treats a struct the same way it would treat a class. Structs can have constructors, copy constructors, virtual functions, etc. And the memory layout of a struct is the same as that of a class. And the reason C++ has structs is backward compatibility with C.
Since people are confused about which one to use, the rule of thumb is: if you have just plain old data, use a struct; otherwise, use a class. I have also read that structs are good for serialization, but I don't know where this comes from.
Then the other day I came across this article: http://www.codeproject.com/Articles/468882/Introduction-to-a-Cplusplus-low-level-object-model
It says that if we have (directly quoting):
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
then this:
void SomeFunction()
{
SomeStruct someStructVariable;
// usage of someStructVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same.
It says the machine code generated is the same whether we have a struct or just write down the variables inside the function. Of course, this only applies if your struct is a POD.
This is where I get confused. In Effective C++, Scott Meyers says that there is no such thing as an empty class.
If we have:
class EmptyClass { };
It is actually laid out by the compiler for example as:
class EmptyClass
{
EmptyClass() {}
~EmptyClass() {}
...
};
So you would not have an empty class.
Now if we change the above struct to a class:
class SomeClass
{
int field1;
char field2;
double field3;
bool field4;
};
does it mean that:
void SomeFunction()
{
SomeClass someClassVariable;
// usage of someClassVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same in terms of machine instructions? That there is no call to someClass constructor? Or that the memory allocated is the same as instantiating a class or defining the variables individually? And what about padding? structs and classes do padding. Would padding be the same in these cases?
I'd really appreciate it if somebody could shed some light on this.
I believe the author of that article is mistaken. Although there is probably no difference between the struct and the non-member variable layout version of the two functions, I don't think this is guaranteed. The only things I can think of that are guaranteed here is that since it's a POD, the address of the struct and the first member are the same...and each member follows in memory after that at some point.
In neither case will the data be initialized, since it's a POD (and classes can be PODs too; don't make THAT mistake).
I would recommend not making such an assumption anyway. If you wrote code that leveraged it (and I can't imagine why you'd want to), most other developers would find it baffling. Only break out the legal books if you HAVE to; otherwise, prefer to code in ways that people are used to. The only important thing to keep in mind here is that POD objects are not initialized unless you do so explicitly.
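A small sketch of that last point, using a struct that mirrors SomeStruct from the question: value-initialization with {} zeroes every member of a POD, while default-initialization leaves the members indeterminate, so they must be assigned before they are read. The two helper functions are hypothetical.

```cpp
#include <cassert>

struct SomeStruct {
    int field1;
    char field2;
    double field3;
    bool field4;
};

// Value-initialization ({}) zero-initializes every member of a POD.
SomeStruct makeZeroed()
{
    SomeStruct s{};
    return s;
}

// Default-initialization (no initializer) leaves the members indeterminate;
// reading them before assigning would be undefined behavior, so every
// field is written before the object is used.
SomeStruct makeAssigned()
{
    SomeStruct s;
    s.field1 = 1;
    s.field2 = 'x';
    s.field3 = 2.5;
    s.field4 = true;
    return s;
}
```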
The only difference is that the members of structs are public by default, while the members of classes are private by default (when I say by default, I mean "unless specified otherwise"). Check out this code:
#include <iostream>
using namespace std;
struct A {
int x;
int y;
};
class A obj1;
int main() {
obj1.x = 0;
obj1.y = 1;
cout << obj1.x << " " << obj1.y << endl;
return 0;
}
The code compiles and runs just fine.
There is no difference between structs and classes besides the default access (note that the default access for base classes differs too). Books and my own 20+ years of experience confirm this.
Regarding the default empty ctor/dtor: the standard does not require the compiler to emit them. Some compilers may nevertheless generate such an empty pair, but any reasonable optimizer would immediately throw them away. If a function that does nothing is called somewhere, how could you even detect it? How could it affect anything besides consuming CPU cycles?
MSVC is not generating useless functions. It is reasonable to think that every good compiler will do the same.
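These claims can be checked at compile time with standard type traits. A sketch using the EmptyClass from the question: the implicitly declared special members exist but are trivial, so there is nothing for the compiler to call, and a complete object of an empty class still has a non-zero size.

```cpp
#include <type_traits>

class EmptyClass { };

static_assert(std::is_trivially_default_constructible<EmptyClass>::value,
              "implicit default constructor is trivial");
static_assert(std::is_trivially_destructible<EmptyClass>::value,
              "implicit destructor is trivial");
static_assert(std::is_trivially_copyable<EmptyClass>::value,
              "implicit copy/move operations are trivial");
static_assert(sizeof(EmptyClass) >= 1,
              "a complete object of an empty class still occupies at least one byte");
```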
Regarding the examples:
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
...
}
The padding rules, order in memory, etc. may be, and most likely will be, completely different. The optimizer may easily throw away an unused local variable. It is much less likely (if possible at all) that the optimizer will remove a data field from a struct; for that to happen, the struct would have to be defined in a .cpp file, certain flags would have to be set, etc.
I am not sure you will find any documentation about the padding of local variables on the stack. AFAIK, this layout is 100% up to the compiler. By contrast, the layout of structs/classes is described, and there are #pragma directives and command-line switches that control it, etc.
are the same in terms of machine instructions?
There is no reason for them not to be, but there is no guarantee from the standard.
That there is no call to someClass constructor?
Conceptually, yes, there is a call to the constructor. But the constructor is trivial and does no work: all the members are POD, and declaring SomeClass someClassVariable; performs default initialization, which leaves POD members uninitialized. Since there is no work to be done, there is no need to emit any instructions.
Or that the memory allocated is the same as instantiating a class or defining the variables individually?
The class may contain padding that declaring the variables individually does not.
Also I am sure that the compiler will have an easier time optimizing away individual variables.
And what about padding?
Yes there is a possibility of padding in the structure (struct/class).
structs and classes do padding. Would padding be the same in these cases?
Yes. Just make sure you compare apples to apples, i.e.:
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
class SomeClass
{
public: /// Make sure you add this line. Now they are identical.
int field1;
char field2;
double field3;
bool field4;
};
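A sketch that checks this equivalence at compile time and also backs up the earlier point that the constructor does no work: SomeClass, with public: added, has a trivial default constructor. These checks document what compilers actually do for two classes with identical members; treat them as a safety net rather than as proof of a standard guarantee about identical sizes.

```cpp
#include <cstddef>
#include <type_traits>

struct SomeStruct { int field1; char field2; double field3; bool field4; };

class SomeClass {
public:
    int field1;
    char field2;
    double field3;
    bool field4;
};

static_assert(sizeof(SomeStruct) == sizeof(SomeClass), "same size");
static_assert(alignof(SomeStruct) == alignof(SomeClass), "same alignment");
static_assert(offsetof(SomeStruct, field3) == offsetof(SomeClass, field3),
              "same padding before field3");
static_assert(std::is_trivially_default_constructible<SomeClass>::value,
              "declaring a SomeClass runs no constructor code");
```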
return *reinterpret_cast<UInt32*>((reinterpret_cast<char*>(this) + 2));
The struct is packed with #pragma pack(1) and contains a bunch of uint, char, and short fields...
Since it's UInt32, should it first be reinterpret_cast to unsigned char* instead or does it even matter?
Also, speed is critical here and I believe reinterpret_cast is the fastest of the casts as opposed to static_cast.
EDIT: The struct is actually composed of two single-byte fields followed by a union of about 16 other structs, 15 of which have the UInt32 as their first field. I do a quick check that it's not the one without it and then do the reinterpret_cast to the 2-byte offset.
Can't you just access the member directly? This is undefined behavior and won't work at all on systems that enforce word alignment (which is probably not a problem given you're doing it but needs to be mentioned).
reinterpret_cast wouldn't be any faster than static_cast because they just tell the compiler how to use memory at compile time. However dynamic_cast would be slower.
There's no legal way to just treat your struct + offset as a non-char type.
reinterpret_cast and static_cast should have the same runtime cost: next to zero unless a numerical conversion needs to be performed. You should choose which cast to use based on correctness, not on "speed". If you were talking about dynamic_cast you might have a case, but both reinterpret_cast and static_cast usually lead to (at worst) a register copy (e.g. from an integer register into a floating-point register). (This assumes no user-defined conversion operators get into the picture; if they do, it's a function call with all its attendant costs.)
There is no safe way to do what you're doing. That breaks the strict aliasing rule. If you wanted to do something like this your struct would need to be in some form of union where you access the UInt32 through the union.
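Besides the union route, the other common aliasing-safe option is to copy the bytes into a local 32-bit variable with memcpy, which also sidesteps misaligned loads. A sketch, assuming a hypothetical Packet struct laid out like the one described in the question (two single-byte fields, then a 32-bit field at offset 2 under #pragma pack(1), which GCC, Clang, and MSVC support); std::uint32_t stands in for UInt32. Compilers typically turn a fixed-size memcpy like this into a single load, so it should not cost speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct Packet {            // hypothetical stand-in for the struct in the question
    unsigned char kind;
    unsigned char flags;
    std::uint32_t value;   // lands at byte offset 2 because of pack(1)
};
#pragma pack(pop)

static_assert(offsetof(Packet, value) == 2, "value expected at offset 2");

// Copies the four bytes at offset 2 into a properly aligned local.
// memcpy may read any object's bytes, so there is no aliasing violation
// and no misaligned 32-bit load.
std::uint32_t readValue(const Packet& p)
{
    std::uint32_t v;
    std::memcpy(&v, reinterpret_cast<const unsigned char*>(&p) + offsetof(Packet, value),
                sizeof v);
    return v;
}
```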
Finally, as already mentioned, that example will fail on any platform with alignment restrictions. You'll be fine on x86 and x86-64, which tolerate unaligned loads, but not on targets that trap on misaligned access, such as SPARC or older ARM cores.
You forgot to mention that you are using a pointer to a struct, not the struct itself. In any case, I find it unnecessary to use pointer arithmetic to reach a particular field of a struct. The compiler won't generate any faster code for pointer arithmetic, and it would make your code more complex unnecessarily:
#include <iostream>
using namespace std;

struct AnyInfoStruct {
    char Name[65];
    char Address[65];
    short Whatever;
    unsigned int Years;
    union AExtraData {
        int A;
        char B;
        double C;
    } ExtraData;
};
// receives a generic pointer, hiding the struct fields:
void showMsg(void* AnyPtr)
{
    AnyInfoStruct* MyAnyInfo = static_cast<AnyInfoStruct*>(AnyPtr);
    cout << "Years: " << MyAnyInfo->Years << "\n";
    cout << "ExtraData.A: " << MyAnyInfo->ExtraData.A << "\n";
}
int main()
{
    AnyInfoStruct MyAnyInfo{}; // value-initialized, so every field is zero
    // hide the struct behind a generic pointer
    void* AnyPtr = &MyAnyInfo;
    showMsg(AnyPtr);
}
Cheers.
UPDATE1: Added "union" to example.
Since you say that the struct contains ints and shorts, I'm going to go out on a limb and answer on the assumption that this union is POD. If so then you benefit from 9.5/1:
one special guarantee is made in order to simplify the use of unions: If a POD-union contains several POD-structs that share a common initial sequence (9.2), and if an object of this POD-union type contains one of the POD-structs, it is permitted to inspect the common initial sequence of any of POD-struct members
So, assuming your structure looks like this:
struct Foo1 { UInt32 a; other stuff; };
struct Foo2 { UInt32 b; other stuff; };
...
struct Foo15 { UInt32 o; other stuff; };
struct Bar { UInt16 p; other stuff; };
// some kind of packing pragma
struct Baz {
char is_it_Foo;
char something_else;
union {
Foo1 f1;
Foo2 f2;
...
Foo15 f15;
Bar b;
} u;
};
Then you can do this:
Baz *baz = whatever;
if (baz->is_it_Foo) {
UInt32 n = baz->u.f1.a;
}
If the members of the union aren't POD, then your reinterpret_cast is broken anyway, since there is no longer any guarantee that the first data member of the struct is located at offset 0 from the start of the struct.
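A compilable sketch of the common-initial-sequence rule with two small standard-layout structs (since C++11 the rule is phrased in terms of standard-layout rather than POD types). Foo1, Foo2, U, and leadingValue are illustrative names, with std::uint32_t standing in for UInt32: a value written through one union member can be read back through the matching leading member of another.

```cpp
#include <cassert>
#include <cstdint>

// Two standard-layout structs whose common initial sequence is one uint32_t.
struct Foo1 { std::uint32_t a; std::uint16_t x; };
struct Foo2 { std::uint32_t b; double y; };

union U {
    Foo1 f1;
    Foo2 f2;
};

// Writes through f2, then reads the shared leading member through f1.
// This is permitted because a and b belong to the common initial sequence.
std::uint32_t leadingValue(std::uint32_t v)
{
    U u;
    u.f2 = Foo2{v, 1.0};
    return u.f1.a;
}
```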