I added the language-lawyer tag, although I have the feeling that this is on the wrong side of the standard boundary. I haven't seen a conversation on exactly this point, but I had one at work, so I would like to have some certainty about this.
The issue is accessing (potentially) private fields of virtual base classes. Say I compute the offset of a private field of a class, and then use this offset outside the class to access (read/write) the member variable at this location.
I saw that GCC and clang support offsetof for such classes as an extension (its use on non-standard-layout types is conditionally-supported in C++17; what does that mean?), and using it is equivalent to some pointer arithmetic like this:
#include <cstddef>
#include <iostream>
class A
{
int a{};
public:
int aa{};
static ptrdiff_t getAOffset()
{
A instance;
// member address minus object address, so that pointer + offset reaches the member
return reinterpret_cast<ptrdiff_t>(static_cast<const void*>(&(instance.a))) - reinterpret_cast<ptrdiff_t>(static_cast<const void*>(&instance));
//return offsetof(A, a); // "same" as this call to offset
}
int get() const
{
return a;
}
};
class B: public virtual A
{
};
void update_field(char* pointer, ptrdiff_t offset, int value)
{
int* field = reinterpret_cast<int*>(pointer + offset);
*field = value;
}
void modify_a(B& instance)
{
update_field(reinterpret_cast<char*>(dynamic_cast<A*>(&instance)), A::getAOffset(), 1);
}
int main()
{
B instance;
std::cout << instance.get() << std::endl;
modify_a(instance);
std::cout << instance.get() << std::endl;
}
I also made a coliru (pedantic) that doesn't complain, but still...
https://coliru.stacked-crooked.com/a/faecd0b248eff651
Is there something in the standard that authorizes this, or is this in undefined behavior land? I would also be happy to learn whether there is a difference between the standards.
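One way to see which side of the boundary a class falls on is to ask the compiler whether it is standard-layout, since offsetof is only unconditionally defined for standard-layout classes. The following is a minimal sketch, not an answer to the aliasing part of the question; Plain, Mixed and the helper names are hypothetical and not from the original code.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type: all data members share the same access control.
struct Plain {
    int a;
    int aa;
};

// Mirrors the question's class A: private `a`, public `aa`.
class Mixed {
    int a{};
public:
    int aa{};
    int get() const { return a; }
};

// Mixed access control means Mixed is not standard-layout.
constexpr bool plain_is_standard_layout = std::is_standard_layout<Plain>::value;
constexpr bool mixed_is_standard_layout = std::is_standard_layout<Mixed>::value;

// offsetof is well-defined here because Plain is standard-layout.
std::size_t offset_of_plain_aa() {
    return offsetof(Plain, aa);
}
```

For a class like Mixed, compilers that accept offsetof typically do so as a conditionally-supported feature, often with a warning.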
Here is a sample C++ question to find out the outcome.
#include <iostream>
#include <vector>
class A
{
public:
A(int n = 0) : m_n(n) { }
public:
virtual int f() const { return m_n; }
virtual ~A() { }
protected:
int m_n;
};
class B
: public A
{
public:
B(int n = 0) : A(n) { }
public:
virtual int f() const { return m_n + 1; }
};
int main()
{
const A a(1);
const B b(3);
const A *x[2] = { &a, &b };
typedef std::vector<A> V;
V y({ a, b });
V::const_iterator i = y.begin();
std::cout << x[0]->f() << x[1]->f()
<< i->f() << (i + 1)->f() << std::endl;
return 0;
}
The output I expected was "1414", but the actual output is "1413".
From above,
x[0]->f()
i.e., x[0] is nothing but a pointer to an object of type A and calling f() returns 1.
x[1]->f()
i.e., x[1] is nothing but a pointer to an object of type A (base class pointer pointing to derived class object) and calls derived class f() that returns (3 + 1) = 4
I am not sure how this behaves when we add the objects a and b to a vector and iterate over them through a const_iterator, given the inheritance.
i->f()
I can understand this as i is just a pointer to the first element i.e., object a.
But what will happen here?
(i + 1)->f()
My understanding is that it points to the next element in the sequence, i.e., object b, so calling f() through it should call the derived class's member function rather than the base class's?
The vector y contains two objects of type A. Not type B. When it is constructed, it makes copies of a and b, slicing b as it does so. So (i + 1)->f() calls A::f() on that copy of the A portion of b, giving 3.
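The difference between storing by value and storing pointers can be sketched as follows. This is a minimal illustration with hypothetical names (Base, Derived and the two helper functions are not from the question); it assumes C++11.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// A vector<Base> holds Base objects: pushing a Derived copies only
// its Base part (slicing), so the dynamic type of the element is Base.
std::string name_via_value_vector() {
    Derived d;
    std::vector<Base> v;
    v.push_back(d);
    return v[0].name();
}

// A vector of pointers keeps the original dynamic type, so the
// virtual call reaches Derived::name().
std::string name_via_pointer_vector() {
    std::vector<std::unique_ptr<Base>> v;
    v.push_back(std::unique_ptr<Base>(new Derived));
    return v[0]->name();
}
```

If polymorphic behaviour is wanted from a container, storing (smart) pointers to the base class is the usual choice.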
I have two classes
class A { C* c; }
class B { D* d; }
and find I need to construct a std::vector whose elements are either A or B (with the sequence decided at run time). So I constructed a polymorphic
class Poly {
int oType;
void* oPtr;
}
as well as constructor
Poly::Poly(int type)
{
if (type == 1) oPtr = new (A*) oPtr();
if (type == 2) oPtr = new (B*) oPtr();
oType = type;
}
along with a similarly structured destructor. Now
std::vector<Poly*> test;
works. However, I am having trouble accessing the subobjects.
I tried
if (test->oType == 1) test->oPtr->a;
if (test->oType == 1) test->(A*)oPtr->a;
if (test->oType == 1) (A*)(test->oPtr)->a;
all giving me the compiler error:
'void*' is not a pointer-to-object type
How do I convince the compiler that it's OK to reference a, if I know that the type of oPtr is A*?
How do I convince the compiler that it's OK to reference a, if I know
that the type of oPtr is A*?
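Strictly, I think the answer to that is ((A*)(test->oPtr))->a. The extra parentheses matter: -> binds more tightly than a cast, so (A*)(test->oPtr)->a first applies ->a to the void* and only then casts the result. The better way to do this in C++ uses a named cast: static_cast<A*>(test->oPtr)->a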
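A minimal sketch of both spellings, assuming hypothetical stand-ins for the question's types (here A simply has an int member named a, and read_a/write_a are illustrative helpers):

```cpp
// Hypothetical stand-ins for the question's types.
struct A {
    int a;
};

struct Poly {
    int oType;   // 1 means oPtr points to an A
    void* oPtr;
};

int read_a(const Poly& p) {
    // static_cast<A*>(...) is a complete postfix expression,
    // so ->a is applied to the A*, not to the void*.
    return static_cast<A*>(p.oPtr)->a;
}

void write_a(Poly& p, int value) {
    // C-style cast: the outer parentheses force the cast to happen
    // before the member access.
    ((A*)(p.oPtr))->a = value;
}
```

Both functions are only valid when oPtr really points to an A; nothing in the type system checks that.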
However, this is not typically how this problem is addressed in C++, so I have provided a more usual approach that you may find useful:
#include <cstddef>
#include <iostream>
#include <vector>
class Poly
{
public:
virtual ~Poly() {}
virtual void do_something() = 0; // each sub-type has its own version of this
};
class A: public Poly
{
public:
void do_something() /* override */ // c++11 only
{
std::cout << "Doing something A specific\n";
}
};
class B: public Poly
{
public:
void do_something() /* override */ // c++11 only
{
std::cout << "Doing something B specific\n";
}
};
int main()
{
std::vector<Poly*> polys;
// create data structure
polys.push_back(new A);
polys.push_back(new A);
polys.push_back(new B);
polys.push_back(new A);
// use objects polymorphically
for(size_t i = 0; i < polys.size(); ++i)
polys[i]->do_something();
// clean up memory (consider using 'smart pointers')
for(size_t i = 0; i < polys.size(); ++i)
delete polys[i];
}
As others mentioned, the polymorphic way is to use virtual functions.
Here is an implementation using smart pointers. The creator class is responsible for creating the Poly object we are asking for. This isolates the creation to one class.
Note that there are more sophisticated ways of doing this. The goal here is to show, more or less, how it would be done using C++.
#include <vector>
#include <memory>
#include <iostream>
class Poly
{
public:
virtual ~Poly() {} // required: derived objects are destroyed through unique_ptr<Poly>
virtual void Test() = 0;
};
typedef std::unique_ptr<Poly> PolyPtr;
class A : public Poly
{
public:
void Test() { std::cout << "Test for A" << "\n"; }
};
class B : public Poly
{
public:
void Test() { std::cout << "Test for B" << "\n"; }
};
class PolyCreator
{
public:
PolyPtr CreatePolyObject(int oType)
{
switch( oType )
{
case 1:
return PolyPtr(new A());
case 2:
return PolyPtr(new B());
}
throw "Could not find type in list";
}
};
int main()
{
PolyCreator pCreator;
std::vector<PolyPtr> PolyPtrVect;
// create objects
PolyPtrVect.push_back(pCreator.CreatePolyObject(1));
PolyPtrVect.push_back(pCreator.CreatePolyObject(2));
// call Test functions for each
std::vector<PolyPtr>::iterator it = PolyPtrVect.begin();
while ( it != PolyPtrVect.end())
{
(*it)->Test();
++it;
}
}
Output:
Test for A
Test for B
Note
There is only one switch statement, and it is isolated in the PolyCreator class.
There are no memory leaks due to usage of std::unique_ptr.
Poly is an abstract class. All derived classes must implement the Test function.
Consider the following setup.
Base class:
class Thing {
public:
int f1;
int f2;
Thing(NO_INIT) {}
Thing(int n1 = 0, int n2 = 0): f1(n1),f2(n2) {}
virtual ~Thing() {}
virtual void doAction1() {}
virtual const char* type_name() { return "Thing"; }
};
And derived classes that are different only by implementation of methods above:
class Summator : public Thing {
public:
Summator(NO_INIT):Thing(NO_INIT) {}
virtual void doAction1() override { f1 += f2; }
virtual const char* type_name() override { return "Summator"; }
};
class Substractor : public Thing {
public:
Substractor(NO_INIT):Thing(NO_INIT) {}
virtual void doAction1() override { f1 -= f2; }
virtual const char* type_name() override { return "Substractor"; }
};
The task I have requires the ability to change the class (the VTBL in this case) of existing objects on the fly. This is known as dynamic subclassing, if I am not mistaken.
So I came up with the following function:
// marker used in inplace CTORs
struct NO_INIT {};
template <typename TO_T>
inline TO_T* turn_thing_to(Thing* p)
{
return ::new(p) TO_T(NO_INIT());
}
that does just that: it uses placement new to construct one object in place of another. Effectively, this just changes the vtbl pointer in the object. So this code works as expected:
Thing* thing = new Thing();
cout << thing->type_name() << endl; // "Thing"
turn_thing_to<Summator>(thing);
cout << thing->type_name() << endl; // "Summator"
turn_thing_to<Substractor>(thing);
cout << thing->type_name() << endl; // "Substractor"
The only major problems I have with this approach are that
a) each derived class must have a special constructor like Thing(NO_INIT) {} that does precisely nothing, and b) if I want to add members like std::string to Thing, they will not work: only types that themselves have NO_INIT constructors are allowed as members of Thing.
Question: is there a better solution for such dynamic subclassing that solves problems 'a' and 'b'? I have a feeling that std::move semantics may help to solve 'b' somehow, but I am not sure.
Here is the ideone of the code.
(Already answered at RSDN http://rsdn.ru/forum/cpp/5437990.1)
There is a tricky way:
#include <cassert>
#include <cstdio>
#include <new>
#include <utility>
struct Base
{
int x, y, z;
Base(int i) : x(i), y(i+i), z(i*i) {}
virtual void whoami() { printf("%p base %d %d %d\n", this, x, y, z); }
};
struct Derived : Base
{
Derived(Base&& b) : Base(b) {}
virtual void whoami() { printf("%p derived %d %d %d\n", this, x, y, z); }
};
int main()
{
Base b(3);
Base* p = &b;
b.whoami();
p->whoami();
assert(sizeof(Base)==sizeof(Derived));
Base t(std::move(b));
Derived* d = new(&b)Derived(std::move(t));
printf("-----\n");
b.whoami(); // the compiler still believes it is Base, and calls Base::whoami
p->whoami(); // here it calls virtual function, that is, Derived::whoami
d->whoami();
};
Of course, it's UB.
For your code, I'm not 100% sure it's valid according to the standard.
I think that using a placement new which doesn't initialize any member variables, in order to preserve the previous object's state, is undefined behavior in C++. Imagine a debug placement new that fills every uninitialized member variable with 0xCC.
A union is a better solution in this case. However, it does seem that you are implementing the strategy pattern. If so, please use the strategy pattern, which will make the code a lot easier to understand and maintain.
Note: the virtual functions should be removed when using a union.
As Mehrdad mentioned, keeping them is ill-formed, because a class with virtual functions is not standard-layout.
Example:
#include <iostream>
#include <string>
using namespace std;
class Thing {
int a;
public:
Thing(int v = 0): a (v) {}
const char * type_name(){ return "Thing"; }
int value() { return a; }
};
class OtherThing : public Thing {
public:
OtherThing(int v): Thing(v) {}
const char * type_name() { return "Other Thing"; }
};
union Something {
Something(int v) : t(v) {}
Thing t;
OtherThing ot;
};
int main() {
Something sth{42};
std::cout << sth.t.type_name() << "\n";
std::cout << sth.t.value() << "\n";
std::cout << sth.ot.type_name() << "\n";
std::cout << sth.ot.value() << "\n";
return 0;
}
As mentioned in the standard:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; see 9.2. — end note ]
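The strategy pattern suggested in this answer could look roughly like the sketch below. It is only an illustration under assumptions: Action, set_action and first are hypothetical names, and the data members f1 and f2 are taken from the question's Thing. The state stays in one Thing object, and only the behaviour object is swapped.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical strategy interface: the behaviour that varies.
struct Action {
    virtual ~Action() = default;
    virtual void apply(int& f1, int f2) const = 0;
    virtual std::string type_name() const = 0;
};

struct Summator : Action {
    void apply(int& f1, int f2) const override { f1 += f2; }
    std::string type_name() const override { return "Summator"; }
};

struct Substractor : Action {
    void apply(int& f1, int f2) const override { f1 -= f2; }
    std::string type_name() const override { return "Substractor"; }
};

// Thing owns its data and delegates the varying part to the strategy.
class Thing {
public:
    Thing(int n1, int n2, std::unique_ptr<Action> a)
        : f1(n1), f2(n2), action(std::move(a)) {}

    // Changing the "class" is just replacing the strategy object.
    void set_action(std::unique_ptr<Action> a) { action = std::move(a); }

    void doAction1() { action->apply(f1, f2); }
    std::string type_name() const { return action->type_name(); }
    int first() const { return f1; }

private:
    int f1;
    int f2;
    std::unique_ptr<Action> action;
};
```

With this layout, Thing can hold members such as std::string without any special constructors, because the Thing object itself is never reconstructed.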
Question: is there a better solution for such dynamic subclassing that solves 'a' and 'b' problems ?
If you have a fixed set of sub-classes, then you may consider using an algebraic data type like boost::variant. Store the shared data separately and place all the varying parts into the variant.
Properties of this approach:
It naturally works with a fixed set of "sub-classes" (though some kind of type-erased class can be placed into the variant, and the set would then become open).
Dispatch is done via a switch on a small integral tag. The size of the tag can be minimized to one char. If your "sub-classes" are empty, there will be a small additional overhead (depending on alignment), because boost::variant does not perform the empty-base optimization.
"Sub-classes" can have arbitrary internal data. Such data from different "sub-classes" will be placed in one aligned_storage.
You can perform a batch of operations on a "sub-class" using only one dispatch per batch, while in the general case with virtual or indirect calls, dispatch happens per call. Also, calling a method from inside a "sub-class" involves no indirection, while with virtual calls you have to play with the final keyword to try to achieve this.
The shared base data (self) must be passed explicitly.
Ok, here is proof-of-concept:
struct ThingData
{
int f1;
int f2;
};
struct Summator
{
void doAction1(ThingData &self) { self.f1 += self.f2; }
const char* type_name() { return "Summator"; }
};
struct Substractor
{
void doAction1(ThingData &self) { self.f1 -= self.f2; }
const char* type_name() { return "Substractor"; }
};
using Thing = SubVariant<ThingData, Summator, Substractor>;
int main()
{
auto test = [](auto &self, auto &sub)
{
sub.doAction1(self);
cout << sub.type_name() << " " << self.f1 << " " << self.f2 << endl;
};
Thing x = {{5, 7}, Summator{}};
apply(test, x);
x.sub = Substractor{};
apply(test, x);
cout << "size: " << sizeof(x.sub) << endl;
}
Output is:
Summator 12 7
Substractor 5 7
size: 2
LIVE DEMO on Coliru
Full Code (it uses some C++14 features, but can be mechanically converted into C++11):
#define BOOST_VARIANT_MINIMIZE_SIZE
#include <boost/variant.hpp>
#include <type_traits>
#include <functional>
#include <iostream>
#include <utility>
using namespace std;
/****************************************************************/
// Boost.Variant requires result_type:
template<typename T, typename F>
struct ResultType
{
mutable F f;
using result_type = T;
template<typename ...Args> T operator()(Args&& ...args) const
{
return f(forward<Args>(args)...);
}
};
template<typename T, typename F>
auto make_result_type(F &&f)
{
return ResultType<T, typename decay<F>::type>{forward<F>(f)};
}
/****************************************************************/
// Proof-of-Concept
template<typename Base, typename ...Ts>
struct SubVariant
{
Base shared_data;
boost::variant<Ts...> sub;
template<typename Visitor>
friend auto apply(Visitor visitor, SubVariant &operand)
{
using result_type = typename common_type
<
decltype( visitor(shared_data, declval<Ts&>()) )...
>::type;
return boost::apply_visitor(make_result_type<result_type>([&](auto &x)
{
return visitor(operand.shared_data, x);
}), operand.sub);
}
};
/****************************************************************/
// Demo:
struct ThingData
{
int f1;
int f2;
};
struct Summator
{
void doAction1(ThingData &self) { self.f1 += self.f2; }
const char* type_name() { return "Summator"; }
};
struct Substractor
{
void doAction1(ThingData &self) { self.f1 -= self.f2; }
const char* type_name() { return "Substractor"; }
};
using Thing = SubVariant<ThingData, Summator, Substractor>;
int main()
{
auto test = [](auto &self, auto &sub)
{
sub.doAction1(self);
cout << sub.type_name() << " " << self.f1 << " " << self.f2 << endl;
};
Thing x = {{5, 7}, Summator{}};
apply(test, x);
x.sub = Substractor{};
apply(test, x);
cout << "size: " << sizeof(x.sub) << endl;
}
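For readers with a C++17 compiler, roughly the same idea can be sketched with std::variant and std::visit instead of boost::variant. This swaps in the standard library type and is not the code above; the act function and the plain Thing struct are hypothetical, and the size characteristics discussed for BOOST_VARIANT_MINIMIZE_SIZE do not necessarily carry over.

```cpp
#include <string>
#include <variant>

struct ThingData {
    int f1;
    int f2;
};

struct Summator {
    void doAction1(ThingData& self) const { self.f1 += self.f2; }
    const char* type_name() const { return "Summator"; }
};

struct Substractor {
    void doAction1(ThingData& self) const { self.f1 -= self.f2; }
    const char* type_name() const { return "Substractor"; }
};

// Shared data lives next to the variant that holds the varying part.
struct Thing {
    ThingData shared_data;
    std::variant<Summator, Substractor> sub;
};

// One dispatch: run the action of the active alternative on the
// shared data and report which alternative was active.
std::string act(Thing& t) {
    return std::visit(
        [&t](auto& sub) {
            sub.doAction1(t.shared_data);
            return std::string(sub.type_name());
        },
        t.sub);
}
```

Assigning a different alternative to sub changes the behaviour while shared_data is left untouched.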
use return new(p) static_cast<TO_T&&>(*p);
Here is a good resource regarding move semantics: What are move semantics?
You simply can't legally "change" the class of an object in C++.
However if you mention why you need this, we might be able to suggest alternatives. I can think of these:
Do v-tables "manually". In other words, each object of a given class should have a pointer to a table of function pointers that describes the behavior of the class. To modify the behavior of this class of objects, you modify the function pointers. Pretty painful, but that's the whole point of v-tables: to abstract this away from you.
Use discriminated unions (variant, etc.) to nest objects of potentially different types inside the same kind of object. I'm not sure if this is the right approach for you though.
Do something implementation-specific. You can probably find the v-table formats online for whatever implementation you're using, but you're stepping into the realm of undefined behavior here so you're playing with fire. And it most likely won't work on another compiler.
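The first option, a hand-written v-table, might be sketched like this. It is an illustration under assumptions: ThingOps, the free functions and the two table objects are hypothetical names, and f1/f2 are borrowed from the question.

```cpp
struct Thing;  // forward declaration so the table can mention it

// Hand-written "v-table": one function pointer per behaviour.
struct ThingOps {
    void (*doAction1)(Thing&);
    const char* (*type_name)();
};

struct Thing {
    int f1;
    int f2;
    const ThingOps* ops;  // re-pointed at run time to change behaviour

    void doAction1() { ops->doAction1(*this); }
    const char* type_name() const { return ops->type_name(); }
};

// Behaviour of the "Summator" flavour.
inline void summator_action(Thing& t) { t.f1 += t.f2; }
inline const char* summator_name() { return "Summator"; }

// Behaviour of the "Substractor" flavour.
inline void substractor_action(Thing& t) { t.f1 -= t.f2; }
inline const char* substractor_name() { return "Substractor"; }

// One table per flavour; objects only store a pointer to a table.
const ThingOps summator_ops = { &summator_action, &summator_name };
const ThingOps substractor_ops = { &substractor_action, &substractor_name };
```

Changing the "class" of an object is then an ordinary pointer assignment such as thing.ops = &substractor_ops, which involves no lifetime tricks.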
You should be able to reuse data by separating it from your Thing class. Something like this:
template <class TData, class TBehaviourBase>
class StateStorageable {
struct StateStorage {
typedef typename std::aligned_storage<sizeof(TData), alignof(TData)>::type DataStorage;
DataStorage data_storage;
typedef typename std::aligned_storage<sizeof(TBehaviourBase), alignof(TBehaviourBase)>::type BehaviourStorage;
BehaviourStorage behaviour_storage;
static constexpr TData *data(TBehaviourBase * behaviour) {
return reinterpret_cast<TData *>(
reinterpret_cast<char *>(behaviour) -
(offsetof(StateStorage, behaviour_storage) -
offsetof(StateStorage, data_storage)));
}
};
public:
template <class ...Args>
static TBehaviourBase * create(Args&&... args) {
auto storage = ::new StateStorage;
::new(&storage->data_storage) TData(std::forward<Args>(args)...);
return ::new(&storage->behaviour_storage) TBehaviourBase;
}
static void destroy(TBehaviourBase * behaviour) {
auto storage = reinterpret_cast<StateStorage *>(
reinterpret_cast<char *>(behaviour) -
offsetof(StateStorage, behaviour_storage));
::delete storage;
}
protected:
StateStorageable() = default;
inline TData *data() {
return StateStorage::data(static_cast<TBehaviourBase *>(this));
}
};
struct Data {
int a;
};
class Thing : public StateStorageable<Data, Thing> {
public:
virtual const char * type_name(){ return "Thing"; }
virtual int value() { return data()->a; }
};
Data is guaranteed to be left intact when you change Thing to another type, and the offsets should be calculated at compile time, so performance shouldn't be affected.
With a proper set of static_asserts you should be able to ensure that all offsets are correct and that there is enough storage to hold your types. Now you only need to change the way you create and destroy your Things.
int main() {
Thing * thing = Thing::create(Data{42});
std::cout << thing->type_name() << "\n";
std::cout << thing->value() << "\n";
turn_thing_to<OtherThing>(thing);
std::cout << thing->type_name() << "\n";
std::cout << thing->value() << "\n";
Thing::destroy(thing);
return 0;
}
There is still UB because thing is not reassigned, which can be fixed by using the result of turn_thing_to:
int main() {
...
thing = turn_thing_to<OtherThing>(thing);
...
}
Here is one more solution.
While it is slightly less optimal (it uses intermediate storage and CPU cycles to invoke move constructors), it does not change the semantics of the original task.
#include <iostream>
#include <string>
#include <memory>
using namespace std;
struct A
{
int x;
std::string y;
A(int x, std::string y) : x(x), y(y) {}
A(A&& a) : x(std::move(a.x)), y(std::move(a.y)) {}
virtual const char* who() const { return "A"; }
void show() const { std::cout << (void const*)this << " " << who() << " " << x << " [" << y << "]" << std::endl; }
};
struct B : A
{
virtual const char* who() const { return "B"; }
B(A&& a) : A(std::move(a)) {}
};
template<class TO_T>
inline TO_T* turn_A_to(A* a) {
A temp(std::move(*a));
a->~A();
return new(a) TO_T(std::move(temp)); // construct the requested type, not a hard-coded B
}
int main()
{
A* pa = new A(123, "text");
pa->show(); // 0xbfbefa58 A 123 [text]
turn_A_to<B>(pa);
pa->show(); // 0xbfbefa58 B 123 [text]
}
and its ideone.
The solution is derived from an idea expressed by Nickolay Merkin below.
But he suspects UB somewhere in turn_A_to<>().
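A more cautious variant of the same idea is sketched below. It assumes the object lives in raw storage that the caller manages, adds static_asserts for size and alignment, and always continues through the pointer returned by placement new. The demo function and the parameter names are hypothetical; whether the original new/delete-based version satisfies every lifetime rule is a question this sketch does not settle.

```cpp
#include <new>
#include <string>
#include <utility>

struct A {
    int x;
    std::string y;
    A(int x_, std::string y_) : x(x_), y(std::move(y_)) {}
    A(A&& a) : x(a.x), y(std::move(a.y)) {}
    virtual ~A() {}
    virtual const char* who() const { return "A"; }
};

struct B : A {
    B(A&& a) : A(std::move(a)) {}
    const char* who() const override { return "B"; }
};

template <class To>
To* turn_A_to(A* a) {
    static_assert(sizeof(To) == sizeof(A), "replacement must fit the storage");
    static_assert(alignof(To) <= alignof(A), "replacement must not need stricter alignment");
    A temp(std::move(*a));  // save the state
    a->~A();                // end the lifetime of the old object
    return ::new (static_cast<void*>(a)) To(std::move(temp));
}

// Hypothetical demo: the object lives in a local buffer, so no
// delete-expression is ever applied to the converted object.
std::string demo() {
    alignas(A) unsigned char storage[sizeof(A)];
    A* pa = ::new (static_cast<void*>(storage)) A(123, "text");
    B* pb = turn_A_to<B>(pa);  // use pb from here on, not pa
    std::string result = std::string(pb->who()) + " " +
                         std::to_string(pb->x) + " [" + pb->y + "]";
    pb->~B();                  // explicit destruction of the new object
    return result;
}
```

The extra move through temp is the cost mentioned above; in exchange, members such as std::string keep working without any NO_INIT constructors.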
I have the same problem, and while I'm not using it, one solution I thought of is to have a single class and make its methods switch on an "item type" number stored in the class. Changing the type is then as easy as changing the type number.
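enum ItemType { ClarkKent, Superman };
class OneClass {
public:
int iType;
const char* Wears() {
switch ( iType ) {
case ClarkKent:
return "glasses";
case Superman:
return "cape";
}
return "unknown"; // iType holds no known item type
}
};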
:
:
OneClass person;
person.iType = ClarkKent;
printf( "now wearing %s\n", person.Wears() );
person.iType = Superman;
printf( "now wearing %s\n", person.Wears() );
In the comments to this answer, Koushik raised a very valid point.
Take the following:
union U
{
int x;
const T y;
};
(I choose T such that there is no common initial sequence of layout compatibility here, meaning only one member may be active at any given time per [C++11: 9.5/1].)
Since only one member may be "active" at any one time (made active by writing to it), and y cannot be written to after initialisation, isn't this rather pointless? I mean, y can only be read from until the first time x is written to, and at that only if y was the initialised member.
Is there some use case I'm missing? Or is this indeed a pretty pointless confluence of language features?
(This has been mentioned before)
Here's a contrived example of a reference-semantics type to which you'd only want to grant const access. The union is used in a variant-like data type returned from a "type-erasing" function.
#include <memory>
template<class T>
struct reference_semantics
{
public:
reference_semantics(T* p ) : m(p) {}
T observe() const { return *m; } // return T, not int, so a double is not truncated
void change(T p) { *m = p; }
private:
T* m;
};
struct variant
{
enum T { INT, DOUBLE } type;
union U
{
reference_semantics<int> const i;
reference_semantics<double> const d;
U(int* p) : i(p) {}
U(double* p) : d(p) {}
} u;
};
#include <iostream>
std::ostream& operator<<(std::ostream& o, variant const& v)
{
switch(v.type)
{
case variant::INT:
return o << "INT: "<<v.u.i.observe();
case variant::DOUBLE:
return o << "DOUBLE: "<<v.u.d.observe();
}
return o; // not reached for valid enumerators; avoids flowing off the end
}
#include <string>
variant type_erased_access(std::string name)
{
// imagine accesses to a map or so
static double dval = 42.21;
static int ival = 1729;
if(name == "Lightness") return { variant::DOUBLE, &dval };
else return { variant::INT, &ival };
}
int main()
{
variant v0( type_erased_access("Lightness") );
std::cout << v0 << "\n";
variant v1( type_erased_access("Darkness") );
std::cout << v1 << "\n";
}
Imagine now that instead of int and double, much larger data types are used, and that the reference_semantics data type actually provides more functionality than just returning the value.
It might even be possible that you want to return a reference_semantics<some_type> const for some arguments, but a plain int for others. In that case, your union might even have const and non-const members.
It does have uses:
1) For offering a const_cast-like technique. In a sense, x = const_cast<...>(y).
2) When dealing with templates, sometimes you need a const version of a data type so you match other parameter types.
(I've seen (1) used when programming against legacy interfaces).
I don't use unions a lot, but this might be a scenario:
#include <iostream>
class Accessor;
union Union
{
private:
friend class Accessor;
int write;
public:
const int read;
Union() : read(0) {}
};
class Accessor {
public:
static void apply(Union& u, int i) { u.write = i; }
};
int main() {
Union u;
// error: ‘int Union::write’ is private
// u.write = 1;
std::cout << u.read << '\n';
Accessor::apply(u, 1);
std::cout << u.read << '\n';
}
Note: From 9.5 Unions:
[ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; see 9.2. — end note ]
If the union represents part of a result of some method/algorithm, then it could make sense. But in that case, I'd make both values const:
union T
{
const int x;
const int y;
};