Why is Box smallBox = scoped!Box(); a BUG in D?

In "Programming in D" book "destroy and scoped" chapter the author writes that one should be careful when using scoped as it introduces a bug IF one specifies the actual class type on the left hand side.
Box c = scoped!Box();
In that definition, c is not the proxy object; rather, as defined by the programmer, a class variable referencing the encapsulated object. Unfortunately, the proxy object that is constructed on the right-hand side gets terminated at the end of the expression that constructs it. As a result, using c in the program would be an error, likely causing a runtime error.
So we have:
Box smallBox = scoped!Box(); // BUG!
auto smallBox = scoped!Box(); // works fine
const smallBox = scoped!Box(); // works fine
The explanation given is a little too high-level for me: how is auto smallBox different from Box smallBox, other than that the type will be inferred by the compiler? What is so different between explicit type specification and letting the D compiler infer the type that it allows the scoped proxy struct to destroy the object ahead of time?

Generally speaking, in a declaration the compiler will try to convert the type on the right to match the type on the left. Both sides already have a specific type - the left-hand side does not influence what type the right-hand expression has; the result simply gets converted. The compiler will do as little conversion as possible to match the left side, and will issue an error if no conversion is possible.
auto a = x;
Here, auto has no restrictions, so the type of a is identical to the type of x, thus no conversion is necessary. This is your most basic case of type inference.
const a = x;
Here, the left-hand side is const, but otherwise is unrestricted. Thus, the compiler will try to convert x's type to const without further changing it. This is slightly more complex type inference, but still pretty simple.
Box a = x;
But here, the left-hand side is specifically typed Box. So whatever x is, the compiler will try to convert it specifically to Box. This might call upon various user-defined conversions, like alias this, inside the right-hand type, or might do implicit conversions too.
Let's get concrete:
byte a;
auto x = a; // x is `byte`, just like a
const y = a; // y is now `const(byte)`, applying const to a's type
int z = a; // z is always an int; a's value is converted to it.
In the case of z there, a got implicitly converted to int. This is allowed, so no error, but z and a are now different things. You see similar things with base classes and interfaces:
class A {}
class B : A {}
auto one = new B(); // one is type B
A two = new B(); // two is now of type A, the B object got converted to the base class A
With byte, int, and with classes, this basically works as expected. Bytes and ints are basically the same thing, and classes keep a runtime type tag to remember who they really are.
But with structs, it can lead to some information being lost.
struct A {}
struct B { A a; alias a this; }
That struct B is now implicitly convertible to type A... but the conversion works by returning just that individual member (a) while leaving the rest of B behind.
B b;
A a = b; // the same as `A a = b.a;`, it only keeps that one member in `a`
This is what scoped does on the inside. It looks something like this:
struct Scoped(Class) {
    Class c;
    alias c this;
    ~this() { destroy(c); } // destructor also destroys child object
}
That alias this magic line allows it to convert back to the Class type... but it does so by only returning that one reference member, while abandoning the rest of Scoped, which - by the definition of Scoped's purpose - means it is gone and destroys the Class' memory in the process. Thus you are left with a reference to a destroyed object, which is the bug they are warning about.
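For comparison, here is a rough C++ analogue of the same failure mode (a sketch with invented names, not D's actual implementation): a proxy that owns an object, implicitly converts to a raw pointer to it, and destroys it in its own destructor.

#include <iostream>

struct Box { int value = 42; };

struct ScopedLike {
    Box* b = new Box;             // the proxy owns the object
    operator Box*() { return b; } // plays the role of `alias this`: hands out the inner reference
    ~ScopedLike() { delete b; }   // destroying the proxy destroys the object
};

int main() {
    Box* dangling = ScopedLike(); // BUG: the temporary proxy dies at the end of this statement
    // `dangling` now refers to a destroyed object, just like `Box c = scoped!Box();`

    ScopedLike proxy;             // like `auto c = scoped!Box();`: the variable IS the proxy
    Box* fine = proxy;
    std::cout << fine->value << "\n"; // OK: the proxy is still alive here
    return 0;
}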


Templated function can't convert 'int' to nullptr_t

TL;DR:
Why can't templated functions access the same conversions that non-templated functions can?
#include <cstddef> // std::nullptr_t
#include <utility> // std::forward

struct A {
    A(std::nullptr_t) {}
};

template <typename T>
A makeA(T&& arg) {
    return A(std::forward<T>(arg));
}

void foo() {
    A a1(nullptr);   // This works, of course
    A a2(0);         // This works
    A a3 = makeA(0); // This does not
}
Background
I'm trying to write some templated wrapper classes to use around existing types, with the goal of being drop-in replacements with minimal need to rewrite existing code that uses the now-wrapped values.
One particular case I can't get my head around is as follows: we have a class which can be constructed from std::nullptr_t (here called A), and as such, there are plenty of places in the code base where someone has assigned zero to an instance.
However, the wrapper cannot be assigned a zero, despite forwarding the constructors. I have made a very similar example that reproduces the issue without using an actual wrapper class - a simple templated function is sufficient to show the issue.
I would like that syntax of assigning zero to continue to be allowed - it isn't my favourite, but minimising friction when moving to newer code is often a necessity just to get people on board with using them.
I also don't want to add a constructor that takes any int other than zero because that's very much absurd, was never allowed before, and it should continue to be caught at compile time.
If such a thing is not possible, it would satisfy me to find an explanation, because with as much as I know so far, it makes no sense to me.
This example has the same behaviour in VC++ (Intellisense seems to be OK with it though...), Clang, and GCC. Ideally a solution will also work in all 3 (4 with intellisense) compilers.
A more directly applicable example follows:
#include <cstddef>
#include <utility>

struct A {
    A() {}
    A(std::nullptr_t) {}
};

template <typename T>
struct Wrapper {
    A a;
    Wrapper(const A& a) : a(a) {}
    template <typename U> // renamed from T: redeclaring the outer template parameter is ill-formed
    Wrapper(U&& t) : a(std::forward<U>(t)) {}
    Wrapper() {}
};

void foo2() {
    A a1;
    a1 = 0; // This works
    Wrapper<A> a2;
    a2 = 0; // This does not
}
Why has the compiler decided to treat the zero as an int?
Because it is an integer.
The literal 0 is a literal, and literals get to do funny things. String literals can be converted into const char* (their actual type is const char[N], where N is the length of the string plus the NUL terminator). The literal 0 gets to do funny things too: it can be used to initialize a pointer to a null pointer value, it can be used to initialize an object of type std::nullptr_t, and of course it can be used to create an integer.
But once it gets passed as a parameter, it can't be a magical compiler construct anymore. It becomes an actual C++ object with a concrete type. And when it comes to template argument deduction, it gets the most obvious type: int.
Once it becomes an int, it stops being a literal 0 and behaves exactly like any other int - unless it is used in a constant-expression context, where the compiler can figure out that it is indeed a literal 0 and let it keep its magical properties. Function parameters are never constexpr, and thus they cannot partake in this.
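As a quick illustration (my addition, based on the example above): pass the keyword nullptr rather than 0, and deduction produces std::nullptr_t, so the original makeA works unchanged.

A a4 = makeA(nullptr); // OK: T is deduced as std::nullptr_t and forwards to A(std::nullptr_t)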
See [conv.ptr]/1:
A null pointer constant is an integer literal with value zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type [...]
So the integer literal 0 can be converted to a null pointer. But if you attempt to convert some other integer value that is not a literal to a pointer type, the above quote does not apply. In fact, there is no other implicit conversion from integer to pointer (no such conversion is listed in [conv.ptr]), so your code fails.
Note: Explicit conversion is covered by [expr.reinterpret.cast]/5.
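If changing the call sites is acceptable, one possible workaround (my sketch, not from the answer above) is to give the wrapper a dedicated std::nullptr_t constructor, so that w = nullptr compiles while w = 0 remains an error:

#include <cstddef>
#include <utility>

struct A {
    A() {}
    A(std::nullptr_t) {}
};

template <typename T>
struct Wrapper {
    T value;
    Wrapper() : value() {}
    Wrapper(std::nullptr_t) : value(nullptr) {} // the "null" case, spelled out explicitly
    template <typename U>
    Wrapper(U&& u) : value(std::forward<U>(u)) {}
};

void demo() {
    Wrapper<A> w;
    w = nullptr; // OK: converts via Wrapper(std::nullptr_t)
    // w = 0;    // still an error: 0 is deduced as int, exactly as explained above
}

This doesn't let the literal 0 itself through - per the answer above, that information is unrecoverable once deduction has happened - but it keeps a spellable null-assignment syntax.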

Difference between f and &f [duplicate]

It's interesting that using the function name as a function pointer is equivalent to applying the address-of operator to the function name!
Here's the example.
typedef bool (*FunType)(int);
bool f(int);

int main() {
    FunType a = f;
    FunType b = &a; // Sure, here's an error.
    FunType c = &f; // This is not an error, though.
                    // It's equivalent to the statement without "&".
                    // So we have c equals a.
    return 0;
}
Using the bare name this way is something we already know from arrays. But you can't write something like
int a[2];
int * b = &a; // Error!
This seems inconsistent with other parts of the language. What's the rationale for this design?
This question explains the semantics of such behavior and why it works. But I'm interested in why the language was designed this way.
What's more interesting is that a function type is implicitly converted to a pointer to itself when used as a parameter, but is not converted to a pointer to itself when used as a return type!
Example:
typedef bool FunctionType(int);
void g(FunctionType); // Implicitly converted to void g(FunctionType *).
FunctionType h(); // Error!
FunctionType * j(); // Return a function pointer to a function
// that has the type of bool(int).
Since you specifically ask for the rationale of this behavior, here's the closest thing I can find (from the ANSI C90 Rationale document - http://www.lysator.liu.se/c/rat/c3.html#3-3-2-2):
3.3.2.2 Function calls
Pointers to functions may be used either as (*pf)() or as pf(). The latter construct, not sanctioned in the Base Document, appears in some present versions of C, is unambiguous, invalidates no old code, and can be an important shorthand. The shorthand is useful for packages that present only one external name, which designates a structure full of pointers to objects and functions: member functions can be called as graphics.open(file) instead of (*graphics.open)(file). The treatment of function designators can lead to some curious, but valid, syntactic forms. Given the declarations:
int f(), (*pf)();
then all of the following expressions are valid function calls:
(&f)(); f(); (*f)(); (**f)(); (***f)();
pf(); (*pf)(); (**pf)(); (***pf)();
The first expression on each line was discussed in the previous paragraph. The second is conventional usage. All subsequent expressions take advantage of the implicit conversion of a function designator to a pointer value, in nearly all expression contexts. The Committee saw no real harm in allowing these forms; outlawing forms like (*f)(), while still permitting *a (for int a[]), simply seemed more trouble than it was worth.
Basically, the equivalence between function designators and function pointers was added to make using function pointers a little more convenient.
It's a feature inherited from C.
In C, it's allowed primarily because there's not much else the name of a function, by itself, could mean. All you can do with an actual function is call it. If you're not calling it, the only thing you can do is take the address. Since there's no ambiguity, any time a function name isn't followed by a ( to signify a call to the function, the name evaluates to the address of the function.
That actually is somewhat similar to one other part of the language -- the name of an array evaluates to the address of the first element of the array except in some fairly limited circumstances (being used as the operand of & or sizeof).
Since C allowed it, C++ does as well, mostly because the same remains true: the only things you can do with a function are call it or take its address, so if the name isn't followed by a ( to signify a function call, then the name evaluates to the address with no ambiguity.
For arrays, there is no pointer decay when the address-of operator is used:
int a[2];
int * p1 = a; // No address-of operator, so type is int*
int (*p2)[2] = &a; // Address-of operator used, so type is int (*)[2]
This makes sense because arrays and pointers are different types, and it is possible, for example, to return references to arrays or to pass references to arrays to functions.
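For instance, the distinct array type is exactly what lets you pass an array by reference (a small sketch):

void fill(int (&arr)[2]) { arr[0] = arr[1] = 0; } // needs the real array type, not a decayed pointer

int main() {
    int a[2];
    fill(a); // the reference binds to the array type itself; impossible if a were just an int*
}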
However, with functions, what other type could be possible?
void foo(){}
&foo; // #1
foo; // #2
Let's imagine that only #2 gave the type void(*)(): what type would &foo then have? There is no other possibility.
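A short sanity check (a sketch) showing that both spellings yield the same pointer value:

#include <cassert>

void foo() {}

int main() {
    void (*p)() = foo;  // implicit function-to-pointer conversion
    void (*q)() = &foo; // explicit address-of
    assert(p == q);     // same address either way
}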

Let the compiler make the final choice which type to use

This question requires knowledge of C++ template metaprogramming, as expression templates are (indirectly) involved. I say indirectly because it is not directly a question about expression templates, but it does involve C++ type computations. If you don't know what that is, please don't answer this question.
To avoid putting out a question without enough background information let me elaborate a bit on the general problem I am trying to solve and then go to the more specific parts.
Suppose you have a library that provides Integers that the user can do calculations with just like with ints.
Furthermore, it is possible to construct an Integer from an int, like so:
Integer<int> i(2);
Internally my Integer class is a class template:
template<class T>
class Integer {
// cut out
};
So I can define it on whatever integer type I like.
Now, without changing the API, I would like to change the library so that if an Integer was constructed from an int, it is represented internally by a different type, say IntegerLit. The reason is that I can speed up some calculations by knowing that an instance of Integer was created from an int (I can pass it as an int argument to a function instead of as a general object described by a base pointer plus separate data - this just as a comment).
It is essential that the type is different when constructing from an int because I need the compiler to take up different code paths depending on whether constructed from an int or not. I cannot do this with a runtime data flag. (The reason in short: The compiler generates a function that takes either an int or the above mentioned more general type of object depending on the type.)
Having said this, I run into a problem when the user does something like this:
Integer<int> a,b(2);
a = b + b;
Here a should be the general Integer and b the specialized IntegerLit. However, my problem is how to express this in C++ as the user is free to use the very same type Integer to define her variables.
Making the types polymorphic, i.e. deriving IntegerLit from Integer, won't work. It looks fine for a moment, but since the user creates instances of Integer (the base class), it is the base class that the compiler sticks into the expression tree (this is why expression templates are involved in the question). So again, no distinction is possible between the two cases. Doing an RTTI check a la dynamic_cast is really not what I want at that point.
More promising seems to be adding a template parameter bool lit to the type, saying whether or not it was constructed from an int. The point is to define the conversion rule (the int constructor) only for the literal case, not for the non-literal one.
However, I can't get that to work. The following code (GCC 4.7, C++11) only compiles when not constructing from an int. Otherwise it fails, because Integer is not specified with true as the value for lit, so the compiler looks at the default implementation, which doesn't have the conversion rule. Changing the API and requiring Integer<int,true> to be written when constructing from an int is not an option.
template<class T,bool lit=false>
class Integer
{
public:
Integer() {
std::cout << __PRETTY_FUNCTION__ << "\n";
}
T F;
};
template<>
template<class T>
class Integer<T,true>
{
public:
Integer(int i) {
std::cout << __PRETTY_FUNCTION__ << "\n";
}
T F;
};
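To make the failure concrete (a sketch of the call sites, given the code above): the literal specialization is only selected when the second template argument is spelled out, which is exactly the API change that is not an option.

Integer<int, true> lit(2); // compiles: picks the Integer<T,true> partial specialization
Integer<int> nonlit(2);    // error: lit defaults to false, and the primary template has no int constructor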
I am starting to wonder whether something like this is even possible in C++.
Is there maybe a new feature of C++11 that can help here?
No, this isn't how C++ works. If you define b as Integer b, then it's an Integer. This applies regardless of the expression which is subsequently used to initialize b.
Also, consider the following: extern Integer b;. There's an expression somewhere else that initializes b, yet the compiler still has to figure out here what type b has. (Not "will have", but "has").
You can't do that, exactly, anyway. But using auto you can come close.
// The default case
Integer<int> MakeInt()
{
    return Integer<int>();
}

// The generic case
template <class T>
Integer<T> MakeInt(T n)
{
    return Integer<T>(n);
}

// And specific to "int"
Integer<IntegerLit> MakeInt(int n)
{
    return Integer<IntegerLit>(n);
}

auto a = MakeInt();
auto b = MakeInt(2);
auto c = MakeInt('c');
auto d = MakeInt(2.5);
You can't do that. Once you've said a variable is Integer<int> that is the variable's type. What you can do is make the underlying rep of the Integer vary depending on which constructor is used, something like this:
template<class T>
class Integer {
// cut out
Integer() : rep_(IntRep()) {}
explicit Integer(int val) : rep_(IntLitRep(val)) {}
private:
boost::variant<IntRep, IntLitRep> rep_;
};
Then you can easily determine which variant version is active and utilize different code paths when needed.
EDIT: In this case even though the type of the Integer is the same, you can easily use template functions to make it appear that it's behaving as two separate types (since the rep changes effective type).
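For illustration, here is a rough sketch of that idea using C++17 std::variant in place of boost::variant; IntRep and IntLitRep are assumed placeholder types:

#include <iostream>
#include <type_traits>
#include <variant>

struct IntRep {};            // general representation (placeholder)
struct IntLitRep { int v; }; // literal-backed representation (placeholder)

template<class T>
class Integer {
public:
    Integer() : rep_(IntRep{}) {}
    explicit Integer(int val) : rep_(IntLitRep{val}) {}

    void describe() const {
        // Dispatch on whichever alternative is currently active.
        std::visit([](const auto& r) {
            if constexpr (std::is_same_v<std::decay_t<decltype(r)>, IntLitRep>)
                std::cout << "literal-backed: " << r.v << "\n"; // the fast path could go here
            else
                std::cout << "general representation\n";
        }, rep_);
    }

private:
    std::variant<IntRep, IntLitRep> rep_;
};

int main() {
    Integer<int> a;    // general
    Integer<int> b(2); // literal-backed, yet the same static type
    a.describe();      // prints "general representation"
    b.describe();      // prints "literal-backed: 2"
}

Note that the dispatch happens at run time; the static type of a and b is identical, which is exactly this answer's point.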

Container covariance in C++

I know that C++ doesn't support covariance for container elements, as Java or C# do. So the following code probably has undefined behavior:
#include <vector>
struct A {};
struct B : A {};
std::vector<B*> test;
std::vector<A*>* foo = reinterpret_cast<std::vector<A*>*>(&test);
Not surprisingly, I received downvotes when suggesting this as a solution to another question.
But what part of the C++ standard exactly tells me that this will result in undefined behavior? It's guaranteed that both std::vector<A*> and std::vector<B*> store their pointers in a contiguous block of memory. It's also guaranteed that sizeof(A*) == sizeof(B*). Finally, A* a = new B is perfectly legal.
So what bad spirits in the standard did I conjure (except style)?
The rule violated here is documented in C++03 3.10/15 [basic.lval], which specifies what is informally referred to as the "strict aliasing rule":
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
In short, given an object, you are only allowed to access that object via an expression that has one of the types in the list. For a class-type object that has no base classes, like std::vector<T>, basically you are limited to the types named in the first, second, and last bullets.
std::vector<Base*> and std::vector<Derived*> are entirely unrelated types and you can't use an object of type std::vector<Base*> as if it were a std::vector<Derived*>. The compiler could do all sorts of things if you violate this rule, including:
perform different optimizations on one than on the other, or
lay out the internal members of one differently, or
perform optimizations assuming that a std::vector<Base*>* can never refer to the same object as a std::vector<Derived*>*, or
use runtime checks to ensure that you aren't violating the strict aliasing rule
[It might also do none of these things and it might "work," but there's no guarantee that it will "work" and if you change compilers or compiler versions or compilation settings, it might all stop "working." I use the scare-quotes for a reason here. :-)]
Even if you just had a Base*[N], you could not use that array as if it were a Derived*[N] (though in that case the use would probably be safer, where "safer" means "still undefined but less likely to get you into trouble").
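If the goal is simply to use the B pointers as A pointers, one well-defined (if O(n)) alternative is to build a new vector and let each element convert individually:

std::vector<A*> as(test.begin(), test.end()); // each B* implicitly converts to A*; no aliasing is violated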
You are invoking the bad spirit of reinterpret_cast<>.
Unless you really know what you are doing (I mean that neither proudly nor pedantically), reinterpret_cast is one of the gates of evil.
The only safe use I know of is managing classes and structures between C++ and C function calls.
There may be some others, however.
The general problem with covariance in containers is the following:
Let's say your cast would work and be legal (it isn't but let's assume it is for the following example):
#include <vector>
struct A {};
struct B : A { public: int Method(int x, int z); };
struct C : A { public: bool Method(char y); };
std::vector<B*> test;
std::vector<A*>* foo = reinterpret_cast<std::vector<A*>*>(&test);
foo->push_back(new C);
test[0]->Method(7, 99); // What should happen here???
So you have also reinterpret-casted a C* to a B*...
Actually, Java and .NET arrays are covariant and handle this with a runtime check: storing a C through what is statically an array of B throws an exception (ArrayStoreException in Java), while their generic collections simply forbid the covariant conversion in the first place.
I think it'll be easier to show than tell:
#include <cassert>

struct A { int a; };
struct Stranger { int a; };
struct B : Stranger, A {};

int main(int argc, char* argv[])
{
    B someObject;
    B* b = &someObject;

    A* correct = b;                         // implicit conversion: the compiler adjusts the pointer
    A* incorrect = reinterpret_cast<A*>(b); // reinterpret_cast: no adjustment

    assert(correct != incorrect); // troubling, isn't it ?
    return 0;
}
The (specific) issue shown here is that when doing a "proper" conversion, the compiler adds a pointer adjustment depending on the memory layout of the objects. With a reinterpret_cast, no adjustment is performed.
I suppose you'll understand why the use of reinterpret_cast should normally be banned from the code...

reinterpret_cast for almost pod data (is layout-compatibility enough)

I am trying to learn about static_cast and reinterpret_cast.
If I am correct, the standard (9.2/18) says that reinterpret_cast for POD data is safe:
A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. - end note ]
My question is how strictly to interpret this. Is, for example, layout-compatibility enough? And if not, why not?
To me, the following shows an example where a strict 'only POD is valid' interpretation seems to be wrong.
class complex_base // a POD-class (I believe)
{
public:
    double m_data[2];
};

class complex : public complex_base
{ // Not a POD-class (due to the constructor and inheritance)
public:
    complex(const double real, const double imag);
};

double* d = new double[4];

// I believe the following are valid because complex_base is POD
complex_base& cb1 = reinterpret_cast<complex_base&>(d[0]);
complex_base& cb2 = reinterpret_cast<complex_base&>(d[2]);

// Does the following complete a valid cast to complex, even though complex is NOT POD?
complex& c1 = static_cast<complex&>(cb1);
complex& c2 = static_cast<complex&>(cb2);
Also, what can possibly break if complex_base::m_data is protected (meaning that complex_base is not POD)? [EDIT: and how do I protect myself/detect such breakages?]
It seems to me that layout-compatibility should be enough - but this does not seem to be what the standard says.
EDIT:
Thanks for the answers. They also helped me find this,
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm
I believe the following are valid because complex_base is POD
You are wrong. d[0] does not refer to the first member of a complex_base object. Its alignment may therefore not be good enough for a complex_base object, and therefore such a cast is not safe (and not allowed by the text you quote).
Does the following complete a valid cast to complex even though complex is NOT POD?
cb1 and cb2 do not point to subobjects of an object of type complex, therefore the static_cast produces undefined behavior. Refer to 5.2.9p5 of C++03:
If the lvalue of type "cv1 B" is actually a sub-object of an object of type D, the lvalue refers to the enclosing object of type D. Otherwise, the result of the cast is undefined.
It's not enough if merely the types involved fit together. The text talks about a pointer pointing to a POD-struct object and about an lvalue referring to a certain subobject.
Is POD-ness requirement too strict?
This is a different question, not regarding your example code. Yes, requiring POD-ness is too strict. In C++0x this was recognized, and a new, looser requirement, "standard-layout", was introduced. I do think that both complex and complex_base are standard-layout classes by the C++0x definition. The C++0x spec says, instead of the text you quoted:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
I interpret that as allowing a pointer to a double that actually points to a complex member (a member by inheritance) to be cast to a complex*. A standard-layout class is one that either has no base classes containing non-static data, or has only one base class containing non-static data; thus there is a unique "initial member".
What can break is that non-POD class instances may have vtable pointers in order to implement virtual dispatch, if they have any virtual functions, including the virtual dtor. The vtbl pointer will typically be the first member of the non-POD class.
(Technically, virtual dispatch doesn't have to be implemented this way; practically, it is. That's why the Standard has to be so strict about what qualifies as a POD type.)
I'm honestly not sure why merely having a ctor (8.5.1/1: "An aggregate is an array or class (clause 9) with no user-declared constructors (12.1)") disqualifies something from being a POD. But it does.
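A small sketch of that C++03-vs-C++0x distinction using type traits (the names here are mine):

#include <type_traits>

struct Base { double d[2]; };           // POD and standard-layout
struct WithCtor : Base { WithCtor(); }; // the user-declared ctor disqualifies POD-ness...

static_assert(!std::is_pod<WithCtor>::value,
              "not POD, because of the constructor");
static_assert(std::is_standard_layout<WithCtor>::value,
              "...but still standard-layout, so the C++0x cast guarantee applies");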
In the particular case you have here, of course, there's no need for a reinterpret_cast. Instead, just add a conversion operator to the base class:
class complex_base // a POD-class (I believe)
{
public:
    double m_data[2];

    operator double*() {
        return m_data;
    }
};
complex_base b; // create a complex_base
double* p = b;
Since a complex_base isn't a double*, the C++ compiler will apply one (and only one) user-defined conversion in order to assign b to p. That means p = b invokes the conversion operator, yielding p = b.operator double*() (and note that that's actually legal syntax - you can directly call conversion operators, not that you should), which does whatever it does; in this case, it returns m_data.
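For what it's worth, that explicit spelling really does compile:

double* q = b.operator double*(); // directly invoking the conversion operator - legal, if unusual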
Note that this is questionable, as we now have direct access to b's internals. In practice, we might return a const double*, or a copy, or a smart copy-on-write "pointer" or ....
Of course, in this case, m_data is public anyway, so we're not worse off than if we just wrote:
double* p = b.m_data;
We are a bit better off, actually, because clients of complex_base don't need to know how to convert it to a double*.