Can "sizeof" a class or object ever be zero? - c++

We all know that sizeof an empty class, or of an object of an empty class, will be 1 byte.
I came across a case where sizeof a class and of its object comes out as 0. The program compiles and runs without errors. Is this undefined behavior? Does this use case make sense, and is it valid? Is it a serious mistake to omit the size of an array member in a class? The code snippet is below:
#include<iostream>
using namespace std;
class A
{
char a[];
};
int main()
{
A b;
cout<<sizeof(A)<<endl;
cout<<sizeof(b)<<endl;
return 0;
}
output:
0
0
The size of an empty class is one byte (non-zero, basically), and the usual explanation is "to make sure that different objects have different addresses".
What happens in this case, then, when sizeof the class comes out as zero?
Note: I observed the same behavior for int a[] as well.

It's called a "flexible array member", and it's a C99 feature. It's not valid C++; you get no warnings or errors probably because the compiler supports it as an extension.
Compiling with -Wall -Wextra -pedantic -std=c++NN (98, 03, 11, 14, ...) should generate a warning (-std=c++NN selects strict ISO mode instead of the GNU dialect, and -pedantic makes the compiler diagnose the extensions it still accepts).
You can see some information in this related question: Is using flexible array members in C bad practice?
For example, here's what GCC says about this:
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
...
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
(source: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html).
This explains the 0 size of char a[] but not the 0 for the class; as I already mentioned, though, it's a C feature and not valid C++.

If you compile with the -pedantic flag:
$ g++ -W -Wall -pedantic prog.cpp
prog.cpp:5:11: warning: ISO C++ forbids zero-size array ‘a’ [-pedantic]
Standard C++ supports neither flexible array members nor VLAs, so your class declaration is not legal and falls outside the rules of standard C++.

Your code is not standard C++, so there is little point in reasoning about its output.
If you use the -pedantic flag, you should get this:
gsamaras@pythagoras:~$ g++ -pedantic file.cpp
file.cpp:5:11: warning: ISO C++ forbids zero-size array ‘a’ [-Wpedantic]
char a[];
^
Try changing your class to
class A {
char a[5];
};
then you should get an output of
5
5
as you would expect.
However, you could argue that without the flag your code does compile and outputs zeroes. As a counterexample, the same happens with this class:
class A {
char a[0];
};
I am pretty sure you know that zero-sized arrays are not allowed, yet this still compiles and outputs zeroes.
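For contrast, here is a minimal sketch of what standard C++ does guarantee. The names Empty, Fixed and the helper functions are made up for illustration and do not appear in the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
struct Empty { };

struct Fixed {
    char a[5];  // explicit bound: valid ISO C++
};

// A complete object type never has size zero in standard C++.
static_assert(sizeof(Empty) >= 1, "empty class must have nonzero size");

// The array member alone needs 5 bytes; padding could only add to that.
static_assert(sizeof(Fixed) >= 5, "array member occupies at least 5 bytes");

std::size_t size_of_empty() { return sizeof(Empty); }
std::size_t size_of_fixed() { return sizeof(Fixed); }

// Two distinct objects of the same empty type get distinct addresses,
// which is why their size cannot be zero.
bool distinct_addresses() {
    Empty x, y;
    return &x != &y;
}

void check_sizes() {
    assert(size_of_empty() >= 1);
    assert(size_of_fixed() >= 5);
    assert(distinct_addresses());
}
```

If the array really needs a size chosen at run time, a std::vector<char> member is the usual standard replacement for a flexible array member.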

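Empty base classes can be optimized to take up zero bytes inside a derived object (the empty base optimization). That does not make sizeof(base) zero: in standard C++, sizeof a complete class type is always at least 1.
The "1 byte" thing comes from the rule that distinct objects of the same type need to have distinct addresses.
So:
struct base { };
struct derived : base { };
Both sizeof(base) and sizeof(derived) are typically 1. The base subobject adds nothing to derived, because it is allowed to share its address with the derived object that contains it.
With two empty bases:
struct base1 { };
struct base2 { };
struct derived : base1, base2 { };
Here base1 and base2 are different types, so both base subobjects may sit at the same address; GCC and Clang still report sizeof(derived) as 1. The distinct-address rule only forces extra space when two subobjects have the same type, so this is not guaranteed to hold:
derived d;
assert(static_cast<void *>(static_cast<base1 *>(&d)) != static_cast<void *>(static_cast<base2 *>(&d)));
Members are different:
struct type1 { };
struct type2 { };
struct combined { type1 obj1; type2 obj2; };
requires that
combined c;
assert(static_cast<void *>(&c.obj1) != static_cast<void *>(&c.obj2));
because an ordinary data member always occupies at least one byte of its own.
Since a complete object can never have size zero, compilers simply make empty classes take up one byte.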
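The difference between an empty base and an empty member can be checked directly. This is a small sketch; Empty, WithBase, WithMember and the helper functions are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Empty { };                 // no data members

struct WithBase : Empty {         // Empty is a base-class subobject
    int value;
};

struct WithMember {               // Empty is an ordinary data member
    Empty e;
    int value;
};

// sizeof of a complete class is never zero, even for an empty class.
static_assert(sizeof(Empty) >= 1, "empty class has nonzero size");

// A data member always takes up storage of its own, so this struct is
// larger than a bare int (padding included).
static_assert(sizeof(WithMember) > sizeof(int), "member Empty costs space");

// With the empty base optimization the base adds nothing to the size.
bool base_is_free() { return sizeof(WithBase) == sizeof(int); }

void check_empty_base() {
    assert(base_is_free());
}
```

WithBase is a standard-layout class, so its first member has to live at the object's own address, which is why the empty base cannot cost anything in this simple case.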

With compiler extensions, the size of a class can be 0. Consider the following piece of code:
#include <iostream>
using namespace std;
class A
{
public:
int a[0];
void getA(){
cout<<"Hello World";
}
};
class B
{
};
int main()
{
cout<<"The size of A is "<<sizeof(A)<<endl; // prints 0
A w;
cout<<"The size of object of A is "<<sizeof(w)<<endl; //prints 0
cout<<"The size of the array a in A is "<<sizeof(w.a)<<endl; // prints 0
cout<<"The value from function of class A is "<<w.getA()<<endl; // compilation error: getA() returns void, which cannot be streamed
cout<<"The size of B is "<<sizeof(B)<<endl; //prints 1
}
Output:
The size of A is 0
The size of object of A is 0
The size of the array a in A is 0
The size of B is 1
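The compilation error on the getA() line has nothing to do with the size being 0: getA() returns void, and a void expression cannot be inserted into a stream. Calling w.getA(); as a statement of its own compiles and runs. The zero sizes themselves depend on the zero-length array extension, which is not standard C++.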

Related

POD-ness with nested structs/classes

I have a question concerning POD-ness. I expected that if B is non-POD and B is a member of A, then A would be non-POD as well.
However, the following code example outputs "10", so B is correctly considered non-POD but A is considered POD.
struct A
{
int i;
struct B
{
std::string s;
};
};
std::cout << std::is_pod<A>::value;
std::cout << std::is_pod<A::B>::value;
Is this a bug in GCC? I'm using "c++ (GCC) 7.3.1 20180312".
I don't see the sense in this behaviour. Let's say I wanted to optimize buffer allocations and use the POD check to determine whether I have to use new or can use malloc/realloc for a specific type. I would be totally wrong to use malloc to allocate storage for A.
Best regards
A has a type A::B declared in it.
Instances of A contain no instance of A::B. There is only a definition of the type, but no object of it.
Add B b; to A and your anomaly goes away.
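A short sketch of that difference follows. A2 and is_pod_like are hypothetical names added for the example; is_pod_like spells out POD as "trivial and standard-layout", since std::is_pod is deprecated from C++20 on.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// POD means "trivial and standard-layout".
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

struct A {            // as in the question: B is only declared inside A
    int i;
    struct B {
        std::string s;
    };
};

struct A2 {           // hypothetical variant that actually stores a B
    int i;
    struct B {
        std::string s;
    };
    B b;
};

static_assert(is_pod_like<A>(), "A holds only an int");
static_assert(!is_pod_like<A::B>(), "B holds a std::string");
static_assert(!is_pod_like<A2>(), "A2 stores a B, so it is not POD either");

void check_pod() {
    assert(is_pod_like<A>());
    assert(!is_pod_like<A2>());
}
```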

Struct member order causing "non-trivial designated initializers not supported" error

Previously seen here and here
I have the following structure:
struct myStruct {
long int mem0;
int mem1;
int mem2;
// -- Place 1 --
short int sh;
// -- Place 2 --
char array[5];
// -- Place 3 --
};
I try to initialize it as follows:
struct myStruct ms1 = {
mem0 : 124,
mem1 : 120,
mem2 : 99,
mem3 : 12,
}; // Line 36
If I place any of the following lines in place 2 or place 3
char mem3;
int mem3;
I get the following error:
Azulejo-Main-Engine-1v2-4% g++ test2.cpp -o test2
test2.cpp: In function ‘int main()’:
test2.cpp:36:5: sorry, unimplemented: non-trivial designated initializers not supported
};
^
Azulejo-Main-Engine-1v2-4%
However, if I place it in place 1, my program compiles (and executes as expected).
Can you please explain why this is the case?
I'm trying to port C code to C++. How can I prevent this kind of error? I don't have any control over the structure declarations used by the code.
Azulejo-Main-Engine-1v2-4% g++ --version
g++ (GCC) 5.2.0
"sorry, unimplemented: <...>" always means that the compiler simply hasn't been updated to allow this yet. That's regardless of whether it will be allowed in the future, whether it is allowed by the standard, or whether it even makes sense.
As mentioned in the comments, this is not valid C++, this is a compiler extension. You can avoid problems of this kind by limiting yourself to valid C++. GCC will diagnose this and many other extensions if you pass it the -pedantic flag. It will treat such extensions as a hard error if you pass it the -pedantic-errors flag. If you then see that you are writing non-portable C++, update your code to make it portable, either by:
struct myStruct ms1 = {
124,
120,
99,
12
};
which requires you to place mem3 after mem2, or by
struct myStruct ms1 {};
ms1.mem0 = 124;
ms1.mem1 = 120;
ms1.mem2 = 99;
ms1.mem3 = 12;
which does not require any specific placement of mem3, or by adding a constructor to your myStruct taking mem0...mem3 as parameters.
Some extra details about the complications of this extension:
The initialisation of structures in C++ normally happens in whatever order the fields are declared. This makes handling exceptions fairly easy: the fields can be destroyed in the reverse order. If not all fields had been constructed yet, then start the destruction at the last field that had been constructed.
If you allow fields to be initialised in arbitrary order, then the destruction gets complicated. Given struct myStruct ms1 { mem1: f(), mem0: g() };, if g throws an exception, then either mem1 was already initialised and needs to be destructed, without mem0 also getting destructed, or the compiler rearranges the initialisers, meaning g() gets called before f(). The former is hard to get right in the compiler, the latter is very unintuitive.
A special exception could be made for trivially destructible fields, where no user code needs to be run when the fields are destroyed, but it hasn't been implemented yet.
If you place mem3 right after mem2, then the problem is avoided: the order of initialisation matches the field order exactly.
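The two portable alternatives described above can be sketched as follows. The struct layout here assumes mem3 is an int declared right after mem2, and the function names are invented for the example.

```cpp
#include <cassert>

// Assumed layout: mem3 placed directly after mem2.
struct myStruct {
    long int mem0;
    int mem1;
    int mem2;
    int mem3;
    short int sh;
    char array[5];
};

// Positional aggregate initialization: values are taken in declaration
// order; sh and array are not listed and are therefore zero-initialized.
myStruct make_positional() {
    myStruct ms = { 124, 120, 99, 12 };
    return ms;
}

// Member-by-member assignment: independent of where mem3 is declared.
myStruct make_by_assignment() {
    myStruct ms = {};   // zero everything first
    ms.mem0 = 124;
    ms.mem1 = 120;
    ms.mem2 = 99;
    ms.mem3 = 12;
    return ms;
}

void check_init() {
    assert(make_positional().mem3 == 12);
    assert(make_by_assignment().mem3 == 12);
}
```

The assignment form is the safer choice when the structure declarations are outside your control, because it keeps working if the member order changes.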

Empty array declaration - strange compiler behavior

I've found a strange looking piece of code in a project I have to maintain. There's an empty array member of a class which doesn't lead to a compiler error. I've tested some variations of such code with MSVC 10.0:
template<class T> struct A {
int i[];
}; // warning C4200: nonstandard extension used : zero-sized array in struct/union
template<class T> struct B { static int i[]; };
template<class T> int B<T>::i[];
struct C {
int i[];
}; //warning C4200: nonstandard extension used : zero-sized array in struct/union
template<class T> struct D { static int i[]; };
template<class T> int D<T>::i[4];
template<> int D<int>::i[] = { 1 };
int main()
{
A<void> a;
B<void> b;
C c;
D<void> d0;
D<int> d1;
a.i[0] = 0; // warning C4739: reference to variable 'a' exceeds its storage space
b.i[0] = 0; // warning C4789: destination of memory copy is too small
c.i[0] = 0; // warning C4739: reference to variable 'c' exceeds its storage space
int i[]; // error C2133: 'i' : unknown size
d0.i[0] = 0; // ok
d0.i[1] = 0; // ok
return 0;
}
The error message at int i[] is absolutely sensible to me. The code shown with class D is well-formed standard C++. But what about the classes A, B and C? What kind of types are the member variables int i[] in these classes?
EDIT:
Your doubt is explained by the definition of the language extension, which allows zero-sized arrays at the end of structs/unions. I have not tried it, but if you declare another member after the zero-sized array, it should fail.
So, if you allocate a variable on the stack, you have to know its size; the exception to the rule is an array at the end of a struct/union, where some C-typical trickery is possible.
In C++ this raises a warning because the default copy constructor and assignment operator will probably not work.
PREVIOUS ANSWER:
The compiler warns you about the fact that you are trying to define an array with zero size. This is not allowed in standard C/C++.
Let's see the differences class by class.
In class D:
template<class T> struct D { static int i[]; };
it works because you are just declaring the type of a static member variable. For this to link, you also need to define the actual array, in a definition statement like the one you wrote:
template<> int D<int>::i[] = { 1 };
here you also specify the size of the array through the initializer.
With class B, you are doing something similar, but the definition is:
template<class T> int B<T>::i[];
i.e., you don't specify the size and get the warning.
With class A, more of the same, you are defining a member variable of type array without the size.
Good one. Just to be certain, you are wondering why the compiler isn't flagging it as an error right? In that case, I think this problem is unpredictable across compilers but I'm aware of this happening on MSVC all the time.
http://support.microsoft.com/kb/98409
Let me see if I can explain it like they did. If I were to declare a struct with an empty array like this,
struct a
{
int x;
char empty[];
};
the compiler allocates 4 bytes for x (assuming a 4-byte int) and no storage at all for empty. The name empty simply refers to the address 4 bytes past the start of struct a.
Since it is a character array of no length, reading or writing through it touches memory that does not belong to the struct, unless extra storage was provided.
Where the compiler supports it, I can provide that storage by initializing the struct with an actual string:
struct a myStruct = { 1, "hello world"}; // empty now holds "hello world"
Since a struct is basically a class, it turns out you can do the same thing with a class if you make sure it's an aggregate.
So there you go. MSVC accepts an array with no fixed size as the last member of a struct/class and treats it as a zero-length placeholder for storage that follows the object, not as a pointer. Remember that a class definition by itself allocates nothing; space is set aside only when you create an instance. When you think about it, it sort of makes sense: the compiler cannot know how much storage you plan to provide later. That becomes a run-time matter, but the compiler is still smart enough to warn you about the problem.

Virtual tables on anonymous classes

I have something similar to this in my code:
#include <iostream>
#include <cstdio>
#include <cstdlib>
struct Base
{
virtual int Virtual() = 0;
};
struct Child
{
struct : public Base
{
virtual int Virtual() { return 1; }
} First;
struct : public Base
{
virtual int Virtual() { return 2; }
} Second;
};
int main()
{
Child child;
printf("ble: %i\n", ((Base*)&child.First)->Virtual());
printf("ble: %i\n", ((Base*)&child.Second)->Virtual());
system("PAUSE");
return 0;
}
I'd expect this to give this output:
ble: 1
ble: 2
and it does so, when compiled under GCC (3.4.5 I believe).
Compiling and running this under Visual Studio 2008 however, gives this:
ble: 2
ble: 2
What is interesting, is that if I give the Base-derived structs names (struct s1 : public Base), it works correctly.
Which behavior, if any, is correct? Is VS just being prissy, or is it adhering to the standard? Am I missing something vital here?
It appears this is a bug in VS 2008, possibly because it overwrites or ignores the vtable for the first unnamed class in favor of the vtable for the second since the internal names are identical. (When you name one explicitly, the internal names for the vtables are no longer identical.)
As far as I can tell from the standard, this should work as you expect and gcc is right.
It is visible how MSVC is getting it wrong from the debugging symbols. It generates temporary names for the anonymous structs, respectively Child::<unnamed-type-First> and Child::<unnamed-type-Second>. There is however only one vtable, it is named Child::<unnamed-tag>::'vftable' and both constructors use it. The different name for the vtable surely is part of the bug.
There are several bugs reported at connection.microsoft.com that are related to anonymous types, none of which ever made it to "must-fix" status. Not the one you found though, afaict. Maybe the workaround is just too simple.
I can confirm this is a known bug in the VC compiler (and it repros in VC10); the two anonymous classes are incorrectly sharing a vtable.
Anonymous structs are not part of the C++ standard.
Edit: Anonymous structs are kind of an ambiguous term. It can mean two things:
class outer
{
public:
struct {
int a;
int b;
} m_a; // 1
struct {
int c;
}; // 2
union {
int d;
int e;
}; // 3
};
1 is what is going on here, a better name than anonymous struct would be "unnamed struct". The struct type itself doesn't have a name, but the object does (m_a).
2 is also known as an anonymous struct, and isn't legal C++. There is no object name, and the idea is you could access the field 'c' directly on objects of type outer. This compiles only because of a compiler extension in Visual Studio (will fail under /Za)
3 Anonymous unions, by contrast, are legal C++.
I confused the two, because here we're calling #1 an "anonymous struct", and wires in my brain crossed with #2.
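The workaround mentioned in the question (giving the nested structs names) can be sketched like this. FirstImpl, SecondImpl and the two helper functions are hypothetical names; the virtual destructor is an addition that the original code does not have.

```cpp
#include <cassert>

struct Base {
    virtual int Virtual() = 0;
    virtual ~Base() {}
};

struct Child {
    // Naming the nested types gives each one a distinctly named vtable,
    // which sidesteps the sharing described above.
    struct FirstImpl : public Base {
        virtual int Virtual() { return 1; }
    } First;

    struct SecondImpl : public Base {
        virtual int Virtual() { return 2; }
    } Second;
};

// Call through a Base pointer, as the question does.
int first_result() {
    Child child;
    Base *b = &child.First;
    return b->Virtual();
}

int second_result() {
    Child child;
    Base *b = &child.Second;
    return b->Virtual();
}

void check_dispatch() {
    assert(first_result() == 1);
    assert(second_result() == 2);
}
```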

Is this C++ structure initialization trick safe?

Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zeroed out. This is guaranteed:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
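A minimal check of the value-initialization approach; the helper both_zero is an illustrative name, not part of the answer.

```cpp
#include <cassert>

struct MY_STRUCT {
    int n1;
    int n2;
};

class CMyStruct : public MY_STRUCT {
public:
    // MY_STRUCT() value-initializes the base subobject; because MY_STRUCT
    // has no user-declared constructor, that zero-initializes n1 and n2.
    CMyStruct() : MY_STRUCT() { }
};

// Build an object and report whether both inherited members are zero.
bool both_zero() {
    CMyStruct s;
    return s.n1 == 0 && s.n2 == 0;
}

void check_value_init() {
    assert(both_zero());
}
```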
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (e.g. std::string) inside will break, because the true object will be initialized before the memset, and then overwritten by zeroes.
Using the initialization list doesn't work for g++, because n1 and n2 are members of the base class, and a constructor's initializer list can name only its own class's members and its base classes. Initialize them instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additional members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specify a value for each.
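A small sketch of that behavior, using a made-up struct Sample with members of different kinds:

```cpp
#include <cassert>

// Hypothetical struct for illustration.
struct Sample {
    int    count;
    double ratio;
    char  *name;
};

// Only the first member gets an explicit initializer; the remaining
// members are value-initialized (0.0 and a null pointer here).
Sample make_zeroed() {
    Sample s = { 0 };
    return s;
}

void check_zeroed() {
    Sample s = make_zeroed();
    assert(s.count == 0);
    assert(s.ratio == 0.0);
    assert(s.name == 0);
}
```

Unlike memset, this does not depend on null pointers or floating point zero being represented as all-bits-zero.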
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order by which the compiler lays out an object (all bases classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, a compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
{
c1 = 10; // c1 belongs to base A, so it is assigned here rather than in the initializer list
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32 bit system, due to alignment etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct as a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sort. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid? On Windows and Unix, the null pointer is all bits zero, but it need not be. Floating point 0 has all bits zero in IEEE 754, which is quite common and is what x86 uses.
Frank Krueger's tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
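Issue (1) can be enforced at compile time. The sketch below assumes C++11 type traits; zero_pod and zeroes_my_struct are hypothetical helpers, not something the answers propose. It does nothing about issue (2).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Zero an object with memset, but refuse to compile for types that are
// not trivial and standard-layout (the C++11 spelling of "POD").
template <typename T>
void zero_pod(T &obj) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_pod requires a POD type");
    std::memset(&obj, 0, sizeof(T));
}

struct MY_STRUCT {
    int n1;
    int n2;
};

// Start from deliberately non-zero values so the effect is visible.
bool zeroes_my_struct() {
    MY_STRUCT s = { 7, 9 };
    zero_pod(s);
    return s.n1 == 0 && s.n2 == 0;
}

void check_zero_pod() {
    assert(zeroes_my_struct());
}
```

Calling zero_pod on a type with a std::string member or a virtual function fails the static_assert instead of silently corrupting the object.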
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
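// Needs <cstdlib>, <cstring> and <new>.
void* operator new(size_t size)
{
// dangerous: memset(malloc(size), 0, size) would write through a null pointer if malloc fails
// better: check the allocation first
if (void *p = malloc(size))
{
return memset(p, 0, size);
}
else
{
throw std::bad_alloc();
}
}
void operator delete(void *p)
{
free(p);
}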
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.