Size of empty class and empty function? - c++

What will be solution in the following code
Class A{}
void func(){}
printf("%d,%d",sizeof(A),sizeof(func));

Size of an empty class is non zero(most probably 1), so as to have two objects of the class at different addresses.
http://www2.research.att.com/~bs/bs_faq2.html#sizeof-empty explains it better
class A{};
void func(){}
std::cout<<sizeof(A)<<std::endl<<sizeof(&func));// prints 1 and 4 on my 32 bit system

You are taking the size of a function which cannot be done. You need to preceede the name with a & to take the size of a pointer to that function. You also need to cast the sizeof value to be of type int, which is what printf expects for the d specifier
printf("%d,%d", (int)sizeof(A), (int)sizeof(&func));
As for the concrete values ---- It's not known what they are beyond that are greater or equal to 1. It depends on the compiler.

By sizeof(func), you probably mean sizeof(&func), which is the size of a function pointer in bytes.
As far as what the size of an A object is, the standard only says:
Complete objects and member subobjects of class type shall have nonzero size.
which is qualified by footnote 94:
Base class subobjects are not so constrained.
The reason for the footnote is to allow for a compiler optimization known as empty member optimization, which is commonly used in template libraries to access routines of a class type (e.g. allocator routines of an allocator class type) without needing a member object of that class type.

Related

Is converting a reinterpret_cast'd derived class pointer to base class pointer undefined behavior?

Have a look at is simple example:
struct Base { /* some virtual functions here */ };
struct A: Base { /* members, overridden virtual functions */ };
struct B: Base { /* members, overridden virtual functions */ };
void fn() {
A a;
Base *base = &a;
B *b = reinterpret_cast<B *>(base);
Base *x = b;
// use x here, call virtual functions on it
}
Does this little snippet have Undefined Behavior?
The reinterpret_cast is well defined, it returns an unchanged value of base, just with the type of B *.
But I'm not sure about the Base *x = b; line. It uses b, which has a type of B *, but it actually points to an A object. And I'm not sure, whether x is a "proper" Base pointer, whether virtual functions can be called with it.
static_cast (or an implicit derived-to-base-pointer conversion, which does exactly the same thing) is substantially different from reinterpret_cast. There is no guarantee that that the base subobject starts at the same address as the complete object.
Most implementations place the first base subobject at the same address as the complete object, but of course even such implementations cannot place two different non-empty base subobjects at the same address. (An object with virtual functions is not empty). When the base subobject is not at the same address as the complete object, static_cast is not a no-op, it involves pointer adjustment.
There are implementations that never place even the first base subobject at the same address as the complete object. It is allowed to place the base subobject after all members of derived, for example. IIRC the Sun C++ compiler used to layout classes this way (don't know if it's still doing that). On such an implementation, this code is almost guaranteed to fail.
Similar code with B having more than one base will fail on many implementations. Example.
The reinterpret_cast is valid (the result can be dereferenced) if the two classes are layout-compatible; that is
they both have standard layout,
they both have the same non-static data members
But the classes do not have standard layout because one of the requirements of StandardLayoutType it that the class has no virtual functions or virtual base classes.
Regarding the validity of pointers derived from conversions, the standard has this to say in the section on "Safely-derived pointers":
6.7.4.3 Safely-derived pointers
4. An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value. Alternatively, an implementation may have strict pointer safety, in which case a pointer value referring to an object with dynamic storage duration that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object has previously been declared reachable. [ Note: The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined, see 6.7.4.2. This is true even if the unsafely-derived pointer value might compare equal to some safely-derived pointer value. —end note ] It is implementation-defined whether an implementation has relaxed or strict pointer safety.
Yes, It does have undefined behavior. The layout about suboject of Base in A and B is undefined. x may be not a real Base oject.
If A and B are a verbatim copy of each other (except for their names) and are declared in the same context (same namespace, same #defines, no __LINE__ usage), then common C++ compilers (gcc, clang) will produce two binary representations which are fully interchangeable.
If A and B use the same method signatures but the bodies of corresponding methods differ, it is unsafe to cast A* to B* because the optimization pass in the compiler could for example partially inline the body of void B::method() at the call site b->method() while the programmer's assumption could be that b->method() will call A::method(). Therefore, as soon as the programmer uses an optimizing compiler the behavior of accessing A through type B* becomes undefined.
Problem: All compilers are always at least to some extent "optimizing" the source code passed to them, even at -O0. In cases of behavior not mandated by the C++ standard (that is: undefined behavior), the compiler's implicit assumptions - when all optimizations are turned off - might differ from programmer's assumptions. The implicit assumptions have been made by the developers of the compiler.
Conclusion: If the programmer is able to avoid using an optimizing compiler then it is safe to access A via B*. The only issue such a programmer needs to tackle with is that non-optimizing compilers do not exist.
A managed C++ implementation might abort the program when A* is casted to B* via reinterpret_cast, when b->field is accessed, or when b->method() is called. Some other managed C++ implementation might try harder to avoid a program crash and so it will resort to temporary duck typing when it sees the program accessing A via B*.
Some questions are:
Can the programmer guess what the managed C++ implementation will do in cases of behavior not mandated by the C++ standard?
What if the programmer sends the code to another programmer who will pass it to a different managed C++ implementation?
If a case isn't covered by the C++ standard, does it mean that a C++ implementation can choose to do anything it considers appropriate in order to cope with the case?

Odd usage of special pointer values

I am using a C++ implementation of an algorithm which makes odd usage of special pointer values, and I would like to known how safe and portable is this.
First, there is some structure containing a pointer field. It initializes an array of such structures by zeroing the array with memset(). Later on, the code relies on the pointer fields initialized that way to compare equal to NULL; wouldn't that fail on a machine whose internal representation of the NULL pointer is not all-bits-zero?
Subsequently, the code sets some pointers to, and laters compares some pointers being equal to, specific pointer values, namely ((type*) 1) and ((type*) 2). Clearly, these pointers are meant to be some flags, not supposed to be dereferenced. But can I be sure that some genuine valid pointer would not compare equal to one of these? Is there any better (safe, portable) way to do that (i.e. use specific pointer values that can be taken by pointer variables only through explicit assignment, in order to flag specific situations)?
Any comment is welcome.
To sum up the comments I received, both issues raised in the question are indeed expected to work on "usual" setup, but comes with no guarantee.
Now if I want absolute guarantees, it seems my best option is, for the NULL pointers, set them either manually or with a proper constructor, and for the special pointer values, to create manually sentinel pointer values.
For the latter, in a C++ class I guess the most elegant solution is to use static members
class The_class
{
static const type reserved;
static const type* const sentinel;
};
provided that they can be initialized somewhere:
const type The_class::reserved = foo; // 'foo' is a constant expression of type 'type'
const type* const The_class::sentinel = &The_class::reserved;
If type is templated, either the above initialization must be instantiated for each type intended, or one must resort to non-static (less elegant but still usefull) "reserved" and "sentinel" members.
template <typename type>
class The_class
{
type reserved; // cannot be static anymore, nor const for complicated 'type' without adapted constructor
const type* const sentinel;
public:
The_class() : sentinel(&reserved);
};

Struct alignment and type reinterpretation

Lets say I have two types A and B.
Then I make this type
struct Pair{
A a;
B b;
};
Now I have a function such as this.
void function(Pair& pair);
And lets assume that function will only ever use the a part of the pair.
Then is it undefined behavior to use and call the function in this way?
A a;
function(reinterpret_cast<Pair&>(a));
I know that a compiler may insert padding bytes after a member but can it also do it before the first member?
I think it's defined behavior, assuming Pair is standard-layout. Otherwise, it's undefined behavior.
First, a standard layout class and its first member share an address. The new wording in [basic.compound] (which clarifies earlier rules) reads:
Two objects a and b are pointer-interconvertible if:
* [...]
* one is a standard-layout class object and the other is the first non-static data member of that object,
or, [...]
* [...]
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a
pointer to one from a pointer to the other via a reinterpret_cast (5.2.10).
Also from [class.mem]:
If a standard-layout class object has any non-static data members, its address is the same as the address
of its first non-static data member. Otherwise, its address is the same as the address of its first base class
subobject (if any).
So the reinterpret_cast from A to Pair is fine. If then function only ever access the a object, then that access well-defined, as the offset of A is 0, so the behavior is equivalent to having function take an A& directly. Any access to the b would be undefined, obviously.
However, while I believe the code is defined behavior, it's a bad idea. It's defined behavior NOW, but somebody someday might change function to refer to pair.b and then you're in a world of pain. It'd be a lot easier to simply write:
void function(A& a) { ... }
void function(Pair& p) { function(p.a); }
and just call function directly with your a.
Yes, it's undefined behaviour.
In a struct pair, there can be padding between a and b. An assignment to a member of a struct can modify any padding in the struct. So an assignment to pair.a can modify memory where it thinks there is padding in the struct, where in reality there is just random memory following the memory occupied by your a.

Array is not being default initialized

In class Foo, I have a member variable of type Bar. Bar is constructed in my Foo constructor by passing in a pointer to an array like this:
template<typename> T
class Foo{
Bar myBar;
size_t mySize;
size_t top;
Foo(size):myBar(new T[size]),mySize(size),top(0){}; //creates a Bar element of certain size
}
template<typename> T
class Bar{
T* myListPtr;
Bar::Bar(T* tPtr):myListPtr(tPtr){} //stores a pointer to a T array
}
Now to my knowledge, if T has a default constructor, C++ should be calling this on all elements in the array. However when I output the values that are being pointed to, I got an array like this(with size=8 and T type 'double')
9.3218e-306
0
0
0
0
3.81522e-270
nan
nan
How do I make sure that all the values are default initialized and not just some. This array might hold char, double, any type, but the type will always have have a default constructor. What could be causing this problem?
Now to my knowledge, if T has a default constructor, C++ should be calling this on all elements in the array. However when I output the values that are being pointed to, I got an array like this(with size=8 and T type 'double')
You are correct in that the default constructor will be used if one exists. However primitive types like double (and int, float, char, etc) do not have constructors, default or otherwise. Although the language lets you work with primitive types and class types with the same syntax in many cases, this is a fundamental difference.
So given a class MyClass then
MyClass myInstance;
will call its default constructor.
But given a double, then
double myDouble;
will not initialize it in any way.
As a consequence, creating an array of doubles as you do in your example will not initialize them.
Now the next logical question would be how to initialize them. This stackoverflow question provides an answer and also gives the worthwhile advice that using std::vector can help avoid some of the troubles that raw arrays often have.

What is the syntax for an array type?

Is it type[]? For example, could I have
T<int[]>;
for some template T.
The type of an "array of type T" is T [dimension], which is what you could pass as template parameters. E.g.:
someTemplate<int [10]> t; // array type as template parameter
int a[5]; // array of 5 ints named 'a'
Arrays need to have a dimension which must be greater than 0. This means that e.g. U u[]; is illegal.
There are cases that might seem like exceptions, the first being parameters:
void f(T[]);
This is a special rule for parameters and f() is actually equivalent to the following:
void f(T*);
Then there is direct inialization of arrays:
int a[] = { 1, 2, 3, 4 };
Here the array size is implicitly given through the number of elements in the initializer, thus the type of a is int[4].
There are also incomplete array types without specificied bounds, but you can't directly create instances of these (see Johannes answer for more):
template<class T> struct X { typedef T type; };
X<int[]>::type a = { 1, 2, 3 };
If you are looking for dynamic arrays, prefer standard containers like std::vector<T> instead.
There are two syntaxes to denote array types. The first is the type-id syntax and is used everywhere where the language expects a compile time type, which looks like:
T[constant-expression]
T[]
This specifies an array type that, in the first form, has a number of elements given by an integer constant expression (means it has to be known at compile time). In the second form, it specifies an array type with an unknown number of elements. Similar to class types that you declare without a body, such an array type is said to be incomplete, and you cannot create arrays of that type
// not valid: what size would it have?
int a[];
You can, however, specify that type. For example you may typedef it
typedef int unknown_int_array[];
In the same manner, you may specify it as a template type argument, so the answer to your question is yes you can pass such a type specifier to a template. Notice that i talk about specifiers here, because the form you use here is not the type itself.
The second way is using the new-type-id syntax which allows denoting runtime types by having non-constant bounds
T[expression]
This allows passing variables as element count, and also allows passing a zero. In such a case, a zero element array is created. That syntax is only usable with the new operator for supporting dynamic arrays.
If possible, you might consider instead using dynamic arrays, and passing in a pointer as the templated type. Such as...
T<int*> myVar;
This started as a comment to Georg's answer, but it ran a bit long...
It seems that you may be missing some key abstraction in your mental model of arrays (at least C-style ones). Local arrays are allocated on the stack with a hard-coded size. If you have an array inside a class or struct, the space for the array is part of the object itself (whether on the stack or heap). Global arrays may even be represented directly in the size of the executable.
This means that any time you want to use an array, you must specify its size to the compiler. The only reason you can leave the brackets empty in a parameter list is because functions treat array parameters as pointers. The function would hardly be useful if it could only operate on one size of array.
Templates are no exception. If you want the size of the templated array to vary, you can add an extra template parameter. You still have to specify the size at compile time for any given instance, though.
The syntax for declaring arrays is
<type> <variable>[<size>];
When using a template the declaration is, in example
template <class T>
T var[4];