I have a very big class with a bunch of members, and I want to initialize them with a given specific value.The code below is the most naive implementation, but I don't like it since it's inelegant and hard to maintain because I have to list all the members in the constructor.
struct I_Dont_Like_This_Approach {
int foo;
long bar;
unsigned baz;
int a;
int b;
int c;
int d;
SomeStruct and_so_on;
/*...*/
public:
explicit I_Dont_Like_This_Approach(int i) : foo(i), bar(i), baz(i), a(i), b(i), c(i), d(i), and_so_on(i) /*...*/ {}
};
I thought of an alternative implementation using templates.
template <int N>
struct MyBigClass {
int foo{N};
long bar{N};
unsigned baz{N};
int a{N};
int b{N};
int c{N};
int d{N};
SomeStruct and_so_on{N};
/*...*/
};
but I'm not sure if the code below is safe.
MyBigClass<1> all_one;
MyBigClass<2> all_two;
/* Is the following reinterpret_cast safe? */
all_one = reinterpret_cast<decltype(all_one) &>(all_two);
Does the C++ specification have any guarantees about the data layout compatibility of such templated structs? Or is there a more reasonable implementation? (in standard C++, and don't use macros)
I would argue that the first one is much more maintainable, with the right warnings enabled (and a modern compiler), you will see if your initializer list gets out of sync with the class fields at compile time.
As to your alternative.. you're using templates as compiler arguments, which is not what they're meant to be. That brings a whole slew of issues:
instantiated templates get copied in memory, making your executable larger. Though in this case, I'm hoping your compiler is smart enough to see that the field structure is the same and treat it as one type.
your code now works only with constant literal integers, no more run-time variables.
there is indeed no guarantee that the memory structure of those two classes is the same. You can disable optimizations in most compilers (like pack, alignment, etc), but that comes at the cost of disabling optimizations, which isn't actually necessary except to support your specific code.
And related to the last one, if you ever need to consider whether this is ever going to break, you're heading down a very dark road. I mean any sane person can tell you it will "probably work", but the fact that you have no guarantees in the language that pretty much popularized memory corruption and buffer overflows should terrify you. Write constructors.
I have tried union...
struct foo
{
union
{
struct // 2 bytes
{
char var0_1;
};
struct // 5 bytes
{
char var1_1;
int var1_2;
};
};
};
Problem: Unions do what I want, except they will always take the size of the biggest datatype. In my case I need struct foo to have some initialization that allows me to tell it which structure to chose of the two (if that is even legal) as shown below.
So after that, I tried class template overloading...
template <bool B>
class foo { }
template <>
class foo<true>
{
char var1;
}
template <>
class foo<false>
{
char var0;
int var1;
}
Problem: I was really happy with templates and the fact that I could use the same variable name on the char and int, but the problem was the syntax. Because the classes are created on compile-time, the template boolean variable needed to be a hardcoded constant, but in my case the boolean needs to be user-defined on runtime.
So I need something of the two "worlds." How can I achieve what I'm trying to do?
!!NOTE: The foo class/struct will later be inherited, therefore as already mentioned, size of foo is of utmost importance.
EDIT#1::
Application:
Basically this will be used to read/write (using a pointer as an interface) a specific data buffer and also allow me to create (new instance of the class/struct) the same data buffer. The variables you see above specify the length. If it's a smaller data buffer, the length is written in a char/byte. If it's a bigger data buffer, the first char/byte is null as a flag, and the int specifies the length instead. After the length it's obvious that the actual data follows, hence why the inheritance. Size of class is of the utmost importance. I need to have my cake and eat it too.
A layer of abstraction.
struct my_buffer_view{
std::size_t size()const{
if (!m_ptr)return 0;
if (*m_ptr)return *m_ptr;
return *reinterpret_cast<std::uint32_t const*>(m_ptr+1);
}
std::uint8_t const* data() const{
if(!m_ptr)return nullptr;
if(*m_ptr)return m_ptr+1;
return m_ptr+5;
}
std::uint8_t const* begin()const{return data();}
std::uint8_t const* end()const{return data()+size();}
my_buffer_view(std::uint_t const*ptr=nullptr):m_ptr(ptr){}
my_buffer_view(my_buffer_view const&)=default;
my_buffer_view& operator=(my_buffer_view const&)=default;
private:
std::uint8_t const* m_ptr=0;
};
No variable sized data anywhere. I coukd have used a union for size etx:
struct header{
std::uint8_t short_len;
union {
struct{
std::uint32_t long_len;
std::uint8_t long_buf[1];
}
struct {
std::short_buf[1];
}
} body;
};
but I just did pointer arithmetic instead.
Writing such a buffer to a bytestream is another problem entirely.
Your solution does not make sense. Think about your solution: you could define two independents classes: fooTrue and fooFalse with corresponding members exactly with the same result.
Probably, you are looking for a different solution as inheritance. For example, your fooTrue is baseFoo and your fooFalse is derivedFoo with as the previous one as base and extends it with another int member.
In this case, you have the polymorphism as the method to work in runtime.
You can't have your cake and eat it too.
The point of templates is that the specialisation happens at compile time. At run time, the size of the class is fixed (albeit, in an implementation-defined manner).
If you want the choice to be made at run time, then you can't use a mechanism that determines size at compile-time. You will need a mechanism that accommodates both possible needs. Practically, that means your base class will need to be large enough to contain all required members - which is essentially what is happening with your union based solution.
In reference to your "!!NOTE". What you are doing qualifies as premature optimisation. You are trying to optimise size of a base class without any evidence (e.g. measurement of memory usage) that the size difference is actually significant for your application (e.g. that it causes your application to exhaust available memory). The fact that something will be a base for a number of other classes is not sufficient, on its own, to worry about its size.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What are the differences between struct and class in C++
This question has been asked and answered a lot but every once in a while I come across something confusing.
To sum up, the difference between C++ structs and classes is the famous default public vs. private access. Apart from that, a C++ compiler treats a struct the same way it would treat a class. Structs could have constructors, copy constructors, virtual functions. etc. And the memory layout of a struct is same ad that of a class. And the reason C++ has structs is for backwward compatibility with C.
Now since people confuse as to which one to use, struct or class, the rule of thumb is if you have just plain old data, use a struct. Otherwise use a class. And I have read that structs are good in serialization but don't where this comes from.
Then the other day I came across this article: http://www.codeproject.com/Articles/468882/Introduction-to-a-Cplusplus-low-level-object-model
It says that if we have (directly quoting):
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
then this:
void SomeFunction()
{
SomeStruct someStructVariable;
// usage of someStructVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same.
It says the machine code generated is the same if we have a struct or just write down the variables inside the function. Now of course this only applies if your struct if a POD.
This is where I get confused. In Effective C++ Scott Meyers says that there no such thing as an empty class.
If we have:
class EmptyClass { };
It is actually laid out by the compiler for example as:
class EmptyClass
{
EmptyClass() {}
~EmptyClass() {}
...
};
So you would not have an empty class.
Now if we change the above struct to a class:
class SomeClass
{
int field1;
char field2
double field3;
bool field4;
};
does it mean that:
void SomeFunction()
{
someClass someClassVariable;
// usage of someClassVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2
double field3;
bool field4;
// usage of 4 variables
...
}
are the same in terms of machine instructions? That there is no call to someClass constructor? Or that the memory allocated is the same as instantiating a class or defining the variables individually? And what about padding? structs and classes do padding. Would padding be the same in these cases?
I'd really appreciate if somebody can shed some light on to this.
I believe the author of that article is mistaken. Although there is probably no difference between the struct and the non-member variable layout version of the two functions, I don't think this is guaranteed. The only things I can think of that are guaranteed here is that since it's a POD, the address of the struct and the first member are the same...and each member follows in memory after that at some point.
In neither case, since it's a POD (and classes can be too, don't make THAT mistake) will the data be initialized.
I would recommend not making such an assumption anyway. If you wrote code that leveraged it, and I can't imagine why you'd want to, most other developers would find it baffling anyway. Only break out the legal books if you HAVE to. Otherwise prefer to code in manners that people are used to. The only important part of all this that you really should keep in mind that POD objects are not initialized unless you do so explicitly.
The only difference is that the members of structs are public by default, while the members of classes are private by default (when I say by default, I mean "unless specified otherwise"). Check out this code:
#include <iostream>
using namespace std;
struct A {
int x;
int y;
};
class A obj1;
int main() {
obj1.x = 0;
obj1.y = 1;
cout << obj1.x << " " << obj1.y << endl;
return 0;
}
The code compiles and runs just fine.
There is no difference between structs and classes besides the default for protection (note that default protection type for base classes is different also). Books and my own 20+ years experience tells this.
Regarding default empty ctor/dector. Standard is not asking for this. Nevertheless some compiler may generate this empty pair of ctor/dector. Every reasonable optimizer would immediately throw them away. If at some place a function that is doing nothing is called, how can you detect this? How this can affect anything besides consuming CPU cycles?
MSVC is not generating useless functions. It is reasonable to think that every good compiler will do the same.
Regarding the examples
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
...
}
The padding rules, order in memory, etc may be and most likely will be completely different. Optimizer may easily throw away unused local variable. It is much less likely (if possible at all) that optimizer will remove a data field from the struct. For this to happen the struct should be in defined in cpp file, certain flags should be set, etc.
I am not sure you will find any docs about padding of local vars on the stack. AFAIK, this is 100% up to compiler for making this layout. On the contrary, layout of the structs/classes are described, there are #pargma and command line keys that control this, etc.
are the same in terms of machine instructions?
There is no reason not to be. But there is no gurantee from the standard.
That there is no call to someClass constructor?
Yes there is a call to the constructor. But the constructor does no work (as all the members are POD and the way you declare someClass someClassVariable; causes value initialization which does nothing for POD members). So since there is no work to be done there is no need to plant any instructions.
Or that the memory allocated is the same as instantiating a class or defining the variables individually?
The class may contain padding that declaring the variables individually does not.
Also I am sure that the compiler will have an easier time optimizing away individual variables.
And what about padding?
Yes there is a possibility of padding in the structure (struct/class).
structs and classes do padding. Would padding be the same in these cases?
Yes. Just make sure you compare apples to apples (ie)
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
class SomeStruct
{
public: /// Make sure you add this line. Now they are identical.
int field1;
char field2;
double field3;
bool field4;
};
I am trying to create something like a list. However, different instances of the list may have a different number of entries, and the type of entry is based on input given by the user. For example, the user states that they want the structure of each entry in the list to contain an int id, a std::string name, a double metricA, and a long metricB. Based on this input, the following is created:
struct some_struct {
int id;
std::string name;
double metricA;
long metricB;
}
list<some_struct> some_list;
The user input may be read from a file, input on the screen, etc. Additionally, their are a variable number of entries in some_struct. In other words, it may have the entries listed above, it may have just 2 of them, or it may have 10 completely different ones. Is there someway to create a struct like this?
Additionally, being able to apply comparison operators to each member of some_struct is a must. I could use boost::any to store the data, but that creates issues with comparison operators, and also incurs more overhead than is ideal.
C++ is a strongly-typed language, meaning you have to declare your data structure types. To that end you cannot declare a struct with arbitrary number or type of members, they have to be known upfront.
Now there are ways, of course, to deal with such issues in C++. To name a few:
Use a map (either std::map or std::unordered_map) to create a "table" instead of a structure. Map strings to strings, i.e. names to string representation of the values, and interpret them to your heart.
Use pre-canned variant type like boost::any.
Use polymorphism - store pointers to base in the list, and have the virtual mechanism dispatch operations invoked on the values.
Create a type system for your input language. Then have table of values per type, and point into appropriate table from the list.
There probably as many other ways to do this as there are C++ programmers.
There are many ways to solve the problem of data structures with varying members and which is best depends a lot on how exactly it is going to be used.
The most obvious is to use inheritance. You derive all your possibilities from a base class:
struct base_struct {
int id;
std::string name;
};
list<base_struct*> some_list;
struct some_struct : public base_struct {
double metricA;
};
struct some_other_struct : public base_struct {
int metricB;
};
base_struct *s1 = new some_struct;
s1->id = 1;
// etc
base_struct *s2 = new some__other_struct;
s2->id = 2;
// etc
some_list.push_back(s1);
some_list.push_back(s2);
The tricky bit is that you'll have to make sure that when you get elements back out, you case appropriately. dynamic_cast can do this in a type-safe manner:
some_struct* ss = dynamic_cast<some_struct*>(some_list.front());
You can query the name before casting using type_info:
typeid(*some_list.front()).name();
Note that both these require building with RTTI, which is usually OK, but not always as RTTI has a performance cost and can bloat your memory footprint, especially if templates are used extensively.
In a previous project, we dealt with something similar using boost any. The advantage of any is that it allows you to mix types that aren't derived from one another. In retrospect, I'm not sure I'd do that again because it made the code a bit too apt to fail at runtime because type checking is being deferred until then. (This is true of the dynamic_cast approach as well.
In the bad old C days, we solved this same problem with a union:
struct base_struct {
int id;
std::string name;
union { // metricA and metricB share memory and only one is ever valid
double metricA;
int metricB;
};
};
Again, you have the problem that you have to deal with ensuring that it is the right type yourself.
In the era before the STL, many container systems were written to take a void*, again requiring the user to know when to cast. In theory, you could still do that by saying list<void*> but you'd have no way to query the type.
Edit: Never, ever use the void* method!
I ended up using a list with a boost::variant. The performance was far better than using boost::any. It went something like this:
#include <boost/variant/variant.hpp>
#include <list>
typedef boost::variant< short, int, long, long long, double, string > flex;
typedef pair<string, flex> flex_pair;
typedef list< flex_pair > row_entry;
list< row_entry > all_records;
Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zero'ed out. This is guaranteed
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additionnal members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specifiy a value for each.
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order by which the compiler lays out an object (all bases classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, I compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
: c1 (10);
{
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32 bit system, due to alighments etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct has a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sought. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.
Frank Kruegers tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
void* new(size_t size)
{
// dangerous
return memset(malloc(size),0,size);
// better
if (void *p = malloc(size))
{
return (memset(p, 0, size));
}
else
{
throw bad_alloc();
}
}
void delete(void *p, size_t size)
{
free(p);
}
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.