Given a class instance and a pointer to a member field, we can obtain a regular pointer to that field of the instance, as in the last assignment in the following code:
class A
{
public:
int i, j;
};
int main(){
A a;
int A::*p = &A::i;
int* r = &(a.*p); // r now points to a.i;
}
Is it possible to invert this conversion? That is, given a class instance A a; and an int* r, obtain an int A::* p (or a null pointer if r does not point into the given instance), as in this code:
class A
{
public:
int i, j;
};
int main(){
A a;
int A::*p = &A::i;
int* r = &(a.*p); // r now points to a.i;
int A::*s = // a--->r -how to extract r back to member pointer?
}
The only way I can think of doing it would be to write a function that takes every known field of A, calculates its address for the given instance, and compares it with the given address. However, this requires writing custom code for every class and might get difficult to manage. It also has suboptimal performance.
I can imagine the compiler doing such a conversion in a few operations on every implementation I know. A pointer to member is usually just an offset into the structure, so the conversion would be just a subtraction plus a range check to see whether the given pointer actually lies within this object's storage. Virtual base classes add a bit of complexity, but nothing a compiler couldn't handle, I think. However, it seems that since the standard does not require it (does it?), no compiler vendor cares.
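To make the "subtraction plus range check" concrete, here is a minimal user-level sketch for the class A above (C++17). It only recovers a byte offset. Standard C++ has no portable way to turn that offset back into an int A::*, and that missing step is exactly what the question asks about. The helper name byte_offset_in is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>

class A {
public:
    int i, j;
};

// Returns the byte offset of r inside a, or std::nullopt if r does not
// point into a's storage. std::less gives a total order even for pointers
// into unrelated objects, so the range check itself is well defined.
std::optional<std::ptrdiff_t> byte_offset_in(const A& a, const int* r) {
    const char* begin = reinterpret_cast<const char*>(&a);
    const char* end = begin + sizeof(A);
    const char* p = reinterpret_cast<const char*>(r);
    std::less<const char*> less;
    if (less(p, begin) || !less(p, end)) {
        return std::nullopt;  // r lies outside this object
    }
    return p - begin;  // both pointers refer to a's object representation
}
```

Because A is standard-layout, the offset returned for &a.j matches offsetof(A, j).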
Or am I wrong, and there is some fundamental problem with such conversion?
EDIT:
I see that there is a little misunderstanding about what I am asking. In short, I am asking whether:
There is already some implementation of it (at the compiler level, I mean), but since hardly anybody uses it, almost nobody knows about it.
The standard does not mention it and no compiler vendor has thought of it, but in principle it is possible to implement (once again: by the compiler, not by compiled code).
There is some deep-reaching problem with such an operation that I missed.
My question was: which of those is true? And in the case of the last one, what is the underlying problem?
I am not asking for workarounds.
There is no cross-platform way to do this. Pointer-to-member values are commonly implemented as offsets from the start of the object. Leveraging that fact, I made this (it works in Visual Studio; I haven't tried anything else):
#include <cstddef>
#include <iostream>
class A
{
public:
int i, j;
};
int main()
{
A a;
int A::*p = &A::i;
int* r = &(a.*p); // r now points to a.i;
union
{
std::ptrdiff_t offset;
int A::*s;
};
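// Non-portable: relies on the compiler storing a data member pointer as a plain byte offset.
// Subtract char pointers to get a byte offset; subtracting int* values would give 1 for a.j, not its byte offset.
offset = reinterpret_cast<char*>(r) - reinterpret_cast<char*>(&a);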
a.*s = 7;
std::cout << a.i << '\n';
}
AFAIK, C++ does not provide full reflection, which you would need for this.
One solution is to provide reflection yourself. The way you describe is one way; it may not be the best one, but it would work.
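Here is a minimal sketch of the hand-written reflection the question describes. It assumes A's only int members are i and j. The table kIntMembers and the function member_from_pointer are hypothetical names. You must keep the table in sync with the class by hand or generate it.

```cpp
#include <cassert>

class A {
public:
    int i, j;
};

// Hand-maintained list of A's int members: the "reflection" that would have
// to be written (or generated) for every class.
constexpr int A::* kIntMembers[] = {&A::i, &A::j};

// Returns the member pointer whose field in `a` lives at address r,
// or nullptr if r is not one of a's int members.
int A::* member_from_pointer(A& a, const int* r) {
    for (int A::* m : kIntMembers) {
        if (&(a.*m) == r) {
            return m;
        }
    }
    return nullptr;
}
```

Usage: int A::* s = member_from_pointer(a, r); then a.*s = 7; writes through the recovered member pointer.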
A totally non-portable solution would be to locate the executable and use whatever debug information it contains. This is obviously non-portable, and it requires the debug information to be there to begin with.
There's a decent description of the problem of reflection and different possible approaches to it in the introduction section of http://www.garret.ru/cppreflection/docs/reflect.html
Edit:
As I wrote above, there's no portable and general solution. But there may be a very non-portable approach. I'm not giving you an implementation here, as I do not have a C++ compiler at the moment to test it, but I'll describe the idea.
The basis is what Dave did in his answer: exploit the fact that a pointer to member is often just an offset. The problem is with base classes (especially virtual and multiple inheritance ones). You can approach it with templates: use dynamic casting to get a pointer to the base class, then diff that pointer with the original to find out the offset of the base.
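The base-offset part of that idea can be sketched as follows. It assumes a simple non-virtual multiple-inheritance hierarchy, and Base1, Base2, Derived, and base2_offset are hypothetical names. For an upcast, a plain implicit conversion performs the same pointer adjustment dynamic_cast would. The result is implementation-specific and typically equals sizeof(Base1).

```cpp
#include <cassert>
#include <cstddef>

struct Base1 { int a = 0; };
struct Base2 { int b = 0; };
struct Derived : Base1, Base2 { int c = 0; };

// Byte distance from the start of a Derived object to its Base2 subobject.
// The implicit derived-to-base conversion applies whatever pointer
// adjustment the compiler uses; subtracting the two addresses exposes it.
std::ptrdiff_t base2_offset(const Derived& d) {
    const Base2* base = &d;  // implicit upcast, no dynamic_cast needed
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&d);
}
```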
Recently, for fun, I decided to build a toy programming language, compiler, and VM. While starting to implement the virtual machine, I got stuck. I implemented the stack, which holds the variables and structs, as a separate array for each type. The problem is that when I have a reference to a struct, its elements are not laid out contiguously: int struct.x might be at address 2 and float struct.y at address 56. Accessing the struct through a reference is therefore impossible, because the indexes are not linear. How could I solve this?
edit:
First of all, by "each type" I mean each primitive type. Second, I know I could implement it with unions, but I want to learn how it is really implemented in Java, C++, or C#. That's kind of the point of making a toy language: to better understand what you are programming.
In this case, you have no real choice but to use a single data type such as uint32_t/uint64_t and simply have the compiler break values down into integers:
int sp = 0;
uint32_t stack[MAX_STACK_SIZE];
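Here is a minimal sketch of the single-type approach, assuming 32-bit int and float and an arbitrary capacity. The push_/load_ helpers are hypothetical. Each value is stored as its raw bit pattern, copied with std::memcpy so no numeric conversion happens. A struct such as { int x; float y; } simply occupies consecutive slots, so a reference to it is the index of its first slot.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int MAX_STACK_SIZE = 256;  // assumed capacity for the sketch

// Every slot is 32 bits; the compiler decides what each slot means.
std::uint32_t stack[MAX_STACK_SIZE];
int sp = 0;

void push_int(std::int32_t v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    stack[sp++] = bits;
}

void push_float(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // copy the bit pattern, no conversion
    stack[sp++] = bits;
}

std::int32_t load_int(int slot) {
    std::int32_t v;
    std::memcpy(&v, &stack[slot], sizeof v);
    return v;
}

float load_float(int slot) {
    float v;
    std::memcpy(&v, &stack[slot], sizeof v);
    return v;
}
```

For a pushed { int x; float y; } whose first slot is base, x is load_int(base) and y is load_float(base + 1).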
OR
like the others have said, create a stack that is an array of unions, possibly using a tagged union. One implementation could be...
union values {
int i;
float f;
};
struct Type {
int tag;
union values val;
};
Type stack[MAX_STACK_SIZE];
It's up to you to decide on this but this is usually how it's done.
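The same idea applies to the tagged union above. In this hedged sketch, the tag constants, capacity, and helper functions are assumptions. A struct's fields are pushed one after another, so the reference is the index of the first slot and field k sits at ref + k regardless of its primitive type. That is what makes struct access linear again.

```cpp
#include <cassert>

constexpr int MAX_STACK_SIZE = 256;  // assumed capacity

enum { TAG_INT, TAG_FLOAT };  // hypothetical tag values

union values {
    int i;
    float f;
};

struct Type {
    int tag;
    union values val;
};

Type stack[MAX_STACK_SIZE];
int sp = 0;

void push_int(int v) { stack[sp].tag = TAG_INT; stack[sp].val.i = v; ++sp; }
void push_float(float v) { stack[sp].tag = TAG_FLOAT; stack[sp].val.f = v; ++sp; }

// A struct value such as { int x; float y; } is pushed field by field, so it
// occupies consecutive slots. A reference to it is the index of its first
// slot, and field k always lives at ref + k, whatever its primitive type.
int push_point(int x, float y) {
    int ref = sp;
    push_int(x);
    push_float(y);
    return ref;
}

int load_int_field(int ref, int field) {
    const Type& slot = stack[ref + field];
    assert(slot.tag == TAG_INT);  // tags let the VM catch type confusion
    return slot.val.i;
}

float load_float_field(int ref, int field) {
    const Type& slot = stack[ref + field];
    assert(slot.tag == TAG_FLOAT);
    return slot.val.f;
}
```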
#include <cstddef>
#include <cstdlib>
#include <iostream>
class C
{
public:
C() : m_x(0) { }
virtual ~C() { }
public:
static std::ptrdiff_t member_offset(const C &c)
{
const char *p = reinterpret_cast<const char*>(&c);
const char *q = reinterpret_cast<const char*>(&c.m_x);
return q - p;
}
private:
int m_x;
};
int main(void)
{
C c;
std::cout << ((C::member_offset(c) == 0) ? 0 : 1);
std::cout << std::endl;
std::system("pause");
return 0;
}
The program above outputs 1. All it does is compare the address of the object c with the address of its field m_x. It prints 1, which means the addresses are not equal. My guess is that because the destructor is virtual, the compiler has to create a vtable for the class and put a vpointer in the class's objects. If I'm already wrong, please correct me.
Apparently, it puts the vpointer at the beginning of the object, pushing the m_x field further along and thus giving it a different address. Is that the case? If so, does the standard specify the vpointer's position in the object? According to Wikipedia, it's implementation-dependent, and its position may change the output of the program.
So can you always predict the output of this program without specifying the target platform?
In reality, it is NEARLY ALWAYS laid out this way. However, the C++ standard allows whatever works to be used, and I can imagine several solutions that don't REQUIRE the above to be true, although they would perhaps not work well as real solutions.
Note however that you can have more than one vptr/vtable for an object if you have multiple inheritance.
There are no "vpointers" in C++. The implementation of polymorphism and dynamic dispatch is left to the compiler, and the resulting class layout is not in any way specified. Certainly an object of polymorphic type will have to carry some extra state in order to identify the concrete type when given only a view of a base subobject.
Implementations with vtables and vptrs are common and popular, and putting the vptr at the beginning of the class means that you don't need any pointer adjustments for single inheritance up and downcasts.
Many C++ compilers follow (parts of) the Itanium ABI for C++, which specifies class layout decisions like this. This popular article may also provide some insights.
Yes, it is implementation dependent and no, you can't predict the program's output without knowing the target platform/compiler.
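One thing the standard does pin down can be checked portably with <type_traits>. A class with virtual functions is never standard-layout, so nothing is guaranteed about where m_x lives. For a standard-layout class, the first member shares the object's address. In this minimal sketch, Plain is a hypothetical non-polymorphic counterpart of C, and m_x is made public for brevity.

```cpp
#include <cassert>
#include <type_traits>

class Plain {
public:
    int m_x = 0;
};

class C {
public:
    virtual ~C() {}
    int m_x = 0;
};

// A class with a virtual function is never standard-layout, so the standard
// promises nothing about where m_x sits relative to the start of the object.
static_assert(std::is_polymorphic<C>::value, "C has virtual functions");
static_assert(!std::is_standard_layout<C>::value, "so its layout is unspecified");

// For a standard-layout class, the first member is guaranteed to share
// the object's address.
static_assert(std::is_standard_layout<Plain>::value, "no virtuals, one access level");

bool first_member_at_start(const Plain& p) {
    return static_cast<const void*>(&p) == static_cast<const void*>(&p.m_x);
}
```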
This question is not about the C++ language itself (i.e., not about the Standard) but about how to make a compiler implement alternative schemes for virtual functions.
The general scheme for implementing virtual functions is using a pointer to a table of pointers.
class Base {
private:
int m;
public:
virtual void metha();
};
The equivalent in C would be something like:
struct Base {
void (**vtable)();
int m;
};
The first member is usually a pointer to a table of virtual functions (a piece of memory the application has no control over). In most cases this costs the size of a pointer before any members are considered, so around 4 bytes with 32-bit addressing. If you created a list of 40k polymorphic objects in your application, that is around 40k x 4 bytes = 160k bytes before any member variables. I also know this happens to be the fastest and most common implementation among C++ compilers.
I know this is complicated by multiple inheritance (especially with virtual base classes, i.e., diamond structures).
An alternative way to do the same thing is to make the first member an index into a table of vtables (the C equivalent is below):
struct Base {
char classid; // the classid here is an index into an array of vtables
int m;
};
If the total number of classes in an application is less than 255(including all possible template instantiations, etc), then a char is good enough to hold an index thereby reducing the size of all polymorphic classes in the application(I am excluding alignment issues, etc).
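Here is a minimal sketch of the proposed classid scheme written by hand. It assumes a single virtual function and two classes, and all names are hypothetical. On common targets the int member is still aligned to 4 bytes, so sizeof(Base) is typically 8 despite the one-byte id. That is the alignment cost the answers below discuss.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: a one-byte class id instead of a vptr.
struct Base {
    std::uint8_t classid;  // index into g_vtables
    int m;
};

// One "vtable" per class: here just a single function pointer.
struct VTable {
    int (*metha)(const Base&);
};

int base_metha(const Base& b) { return b.m; }
int derived_metha(const Base& b) { return b.m * 2; }

enum : std::uint8_t { kBaseId = 0, kDerivedId = 1 };

const VTable g_vtables[] = {
    {&base_metha},     // kBaseId
    {&derived_metha},  // kDerivedId
};

// A "virtual" call becomes: look up the table by id, then call through it.
int call_metha(const Base& b) { return g_vtables[b.classid].metha(b); }
```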
My question is: is there any switch in GNU C++, LLVM, or any other compiler to do this, or otherwise to reduce the size of polymorphic objects?
Edit: I understand the alignment issues that were pointed out. A further point: on a 64-bit system (assuming a 64-bit vptr), if each polymorphic object's members cost around 8 bytes, then the vptr accounts for 50% of the memory. This mostly matters for small polymorphic objects created in bulk, so I am wondering whether this scheme is possible at least for specific virtual classes, if not for the whole application.
Your suggestion is interesting, but it won't work if the executable is made of several modules passing objects among them. Suppose they are compiled separately (say, as DLLs). If one module creates an object and passes it to another, and the other invokes a virtual method, how would it know which table the classid refers to? You won't be able to add a module id either, because the two modules might not know about each other when they are compiled. So unless you use pointers, I think it's a dead end...
A couple of observations:
Yes, a smaller value could be used to represent the class, but some processors require data to be aligned, so the space saving may be lost to the requirement to align data values to, e.g., 4-byte boundaries. Further, the class id must be in a well-defined place for all members of a polymorphic inheritance tree, so it is likely to come ahead of other data, and alignment problems can't be avoided.
The cost of storing the pointer has been moved into the code: every use of a polymorphic function requires code to translate the class id into either a vtable pointer or some equivalent data structure. So it isn't free. Clearly, the trade-off depends on the volume of code vs. the number of objects.
If objects are allocated from the heap, space is usually wasted to ensure objects are aligned to the worst-case boundary. So even with a small amount of code and a large number of polymorphic objects, the memory-management overhead might be significantly bigger than the difference between a pointer and a char.
In order to allow programs to be independently compiled, the number of classes in the whole program, and hence the size of the class-id must be known at compile time, otherwise code can't be compiled to access it. This would be a significant overhead. It is simpler to fix it for the worst case, and simplify compilation and linking.
Please don't let me stop you trying, but there are quite a lot more issues to resolve using any technique which may use a variable size id to derive the function address.
I would strongly encourage you to look at Ian Piumarta's Cola (see also Cola on Wikipedia).
It actually takes a different approach and uses the pointer in a much more flexible way to build inheritance, prototype-based objects, or any other mechanism the developer requires.
No, there is no such switch.
The LLVM/Clang codebase avoids virtual tables in classes that are allocated by the tens of thousands. This works well in a closed hierarchy, because a single enum can enumerate all possible classes, and each class is then linked to one value of the enum. The hierarchy is closed precisely because of the enum.
Then, virtuality is implemented by a switch on the enum, and appropriate casting before calling the method. Once again, closed. The switch has to be modified for each new class.
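Here is a minimal sketch of that closed-hierarchy pattern. It is loosely in the spirit of the LLVM/Clang approach described above but not taken from its code, and Shape, Square, and Rectangle are hypothetical. Each object stores only an enum value, and the "virtual" call is a switch followed by a static_cast.

```cpp
#include <cassert>

// Closed hierarchy: every concrete class has an entry in Kind.
class Shape {
public:
    enum Kind { SK_Square, SK_Rectangle };
    Kind getKind() const { return kind_; }
    double area() const;  // dispatched by a switch, not through a vptr
protected:
    explicit Shape(Kind k) : kind_(k) {}
private:
    Kind kind_;
};

class Square : public Shape {
public:
    explicit Square(double side) : Shape(SK_Square), side_(side) {}
    double area() const { return side_ * side_; }
private:
    double side_;
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : Shape(SK_Rectangle), w_(w), h_(h) {}
    double area() const { return w_ * h_; }
private:
    double w_, h_;
};

// Adding a class means adding an enum value and a case here.
double Shape::area() const {
    switch (kind_) {
    case SK_Square:
        return static_cast<const Square*>(this)->area();
    case SK_Rectangle:
        return static_cast<const Rectangle*>(this)->area();
    }
    return 0.0;  // not reached while every Kind has a case
}
```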
A first alternative: an external vpointer.
If you find that the vpointer tax is paid far too often, that is, most of the objects are of a known type, then you can externalize it.
class Interface {
public:
virtual ~Interface() {}
virtual Interface* clone() const = 0; // might be worth it
virtual void updateCount(int) = 0;
protected:
Interface() {} // needed: declaring the copy constructor suppresses the implicit default one
Interface(Interface const&) {}
Interface& operator=(Interface const&) { return *this; }
};
template <typename T>
class InterfaceBridge: public Interface {
public:
InterfaceBridge(T& t): t(t) {}
virtual InterfaceBridge* clone() const { return new InterfaceBridge(*this); }
virtual void updateCount(int i) { t.updateCount(i); }
private:
T& t; // value or reference ? Choose...
};
template <typename T>
InterfaceBridge<T> interface(T& t) { return InterfaceBridge<T>(t); }
Then, imagining a simple class:
class Counter {
public:
int getCount() const { return c; }
void updateCount(int i) { c = i; }
private:
int c;
};
You can store the objects in an array:
static Counter array[5];
assert(sizeof(array) == sizeof(int)*5); // no v-pointer
And still use them with polymorphic functions:
void five(Interface& i) { i.updateCount(5); }
InterfaceBridge<Counter> ib(array[3]); // create *one* v-pointer
five(ib);
assert(array[3].getCount() == 5);
The value vs reference is actually a design tension. In general, if you need to clone you need to store by value, and you need to clone when you store by base class (boost::ptr_vector for example). It is possible to actually provide both interfaces (and bridges):
Interface    <---    ClonableInterface
    |                        |
InterfaceB           ClonableInterfaceB
It's just extra typing.
Another solution, much more involved.
A switch is implementable by a jump table. Such a table could perfectly be created at runtime, in a std::vector for example:
#include <cassert>
#include <vector>
class Base {
public:
~Base() { VTables()[vpointer].dispose(*this); }
void updateCount(int i) {
VTables()[vpointer].updateCount(*this, i);
}
protected:
struct VTable {
typedef void (*Dispose)(Base&);
typedef void (*UpdateCount)(Base&, int);
Dispose dispose;
UpdateCount updateCount;
};
static void NoDispose(Base&) {}
static unsigned RegisterTable(VTable t) {
std::vector<VTable>& v = VTables();
v.push_back(t);
return v.size() - 1;
}
explicit Base(unsigned id): vpointer(id) {
assert(id < VTables().size());
}
private:
// Implement in .cpp or pay the cost of weak symbols.
static std::vector<VTable>& VTables() { static std::vector<VTable> VT; return VT; } // by reference, so registrations persist
unsigned vpointer;
};
And then, a Derived class:
class Derived: public Base {
public:
Derived(): Base(GetID()) {}
private:
static void UpdateCount(Base& b, int i) {
static_cast<Derived&>(b).count = i;
}
static unsigned GetID() {
static unsigned ID = RegisterTable(VTable({&NoDispose, &UpdateCount}));
return ID;
}
unsigned count;
};
Well, now you'll realize how great it is that the compiler does it for you, even at the cost of some overhead.
Oh, and because of alignment, as soon as a Derived class introduces a pointer, there is a risk that 4 bytes of padding end up between Base and the next attribute. You can use them by carefully selecting the first few attributes in Derived to avoid padding...
The short answer is that no, I don't know of any switch to do this with any common C++ compiler.
The longer answer is that to do this, you'd just about have to build most of the intelligence into the linker, so it could coordinate distributing the IDs across all the object files getting linked together.
I'd also point out that it wouldn't generally do a whole lot of good. At least in a typical case, you want each element in a struct/class at a "natural" boundary, meaning its starting address is a multiple of its size. Using your example of a class containing a single int, the compiler would allocate one byte for the vtable index, followed immediately by three bytes of padding so the next int would land at an address that is a multiple of four. The end result would be that objects of the class occupy precisely the same amount of storage as if we used a (32-bit) pointer.
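You can check the padding argument with sizeof and offsetof. This sketch uses two hypothetical standard-layout structs that mimic the two layouts. On a typical 32-bit target both print a size of 8. On a typical 64-bit target the pointer version grows to 16, which is the case raised in the question's edit. The exact numbers are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// The proposed layout: a one-byte class index followed by an int.
struct WithIndex {
    unsigned char classid;
    int value;
};

// The usual layout: a vtable pointer followed by an int.
struct WithPointer {
    void* vptr;
    int value;
};

void print_layouts() {
    std::printf("WithIndex:   size %zu, value at offset %zu\n",
                sizeof(WithIndex), offsetof(WithIndex, value));
    std::printf("WithPointer: size %zu, value at offset %zu\n",
                sizeof(WithPointer), offsetof(WithPointer, value));
}
```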
I'd add that this is not a far-fetched exception either. For years, standard advice to minimize padding inserted into structs/classes has been to put the items expected to be largest at the beginning, and progress toward the smallest. That means in most code, you'd end up with those same three bytes of padding before the first explicitly defined member of the struct.
To get any good from this, you'd have to be aware of it, and have a struct with (for example) three bytes of data you could move where you wanted. Then you'd move those to be the first items explicitly defined in the struct. Unfortunately, that would also mean that if you turned this switch off so you have a vtable pointer, you'd end up with the compiler inserting padding that might otherwise be unnecessary.
To summarize: it's not implemented, and if it were, it usually wouldn't accomplish much.
My program needs to handle different kinds of "notes": NoteShort, NoteLong... Different kinds of notes should be displayed in the GUI in different ways. I defined a base class of these notes, called NoteBase.
I store these notes in XML, and I have a class that reads the XML file and stores the notes' data in a vector<NoteBase *> list. Then I found that I cannot get their concrete types back, because they have already been converted to NoteBase *!
Though if(dynamic_cast<NoteLong *>(ptr) != NULL) {...} may work, it's really too ugly. Implementing functions that take NoteShort * or NoteLong * as parameters doesn't work. So, is there a good way to deal with this problem?
UPDATE: Thank you for the replies. I don't think it should have happened either, but it did. I implemented it another way, and it's now working. However, as far as I remember, I did declare the (pure) virtual function in NoteBase, but forgot to declare it again in the headers of the derived classes. I guess that's what caused the issue.
UPDATE 2 (IMPORTANT):
I found this quotation from C++ Primer, which may be helpful to others:
What is sometimes a bit more surprising is that the restriction on converting from base to derived exists even when a base pointer or reference is actually bound to a derived object:
Bulk_item bulk;
Item_base *itemP = &bulk; // ok: dynamic type is Bulk_item
Bulk_item *bulkP = itemP; // error: can't convert base to derived
The compiler has no way to know at compile time that a specific conversion will actually be safe at run time. The compiler looks only at the static types of the pointer or reference to determine whether a conversion is legal. In those cases when we know that the conversion from base to derived is safe, we can use a static_cast (Section 5.12.4, p. 183) to override the compiler. Alternatively, we could request a conversion that is checked at run time by using a dynamic_cast, which is covered in Section 18.2.1 (p. 773).
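Here is a minimal, runnable version of the Primer example. It assumes Item_base has a virtual destructor, because dynamic_cast needs a polymorphic base. The helper names are hypothetical. They show the unchecked static_cast next to the run-time-checked dynamic_cast, which returns nullptr when the object is not really a Bulk_item.

```cpp
#include <cassert>

class Item_base {
public:
    virtual ~Item_base() {}  // polymorphic, so dynamic_cast is allowed
};

class Bulk_item : public Item_base {};

// Checked at run time: returns nullptr when itemP is not really a Bulk_item.
Bulk_item* as_bulk_checked(Item_base* itemP) {
    return dynamic_cast<Bulk_item*>(itemP);
}

// Unchecked: only correct if the caller already knows the dynamic type.
Bulk_item* as_bulk_unchecked(Item_base* itemP) {
    return static_cast<Bulk_item*>(itemP);
}
```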
There are two significant trains of thought and code here, so shortest first:
You may not need to cast back down to the derived type at all. If all Notes provide a uniform action (say, Chime), then you can simply have:
class INote
{
public:
virtual ~INote() {}
virtual void Chime() = 0;
};
...
for (INote * note : m_Notes)
{
note->Chime();
}
and each Note will Chime as it should, using internal information (duration and pitch, for example).
This is clean, simple, and requires minimal code. It does mean the types all have to provide and inherit from a particular known interface/class, however.
Now, the longer and far more involved methods come in when you do need to know the type and cast back down to it. There are two major methods, and a variant (#2) that may be used alone or combined with #3:
#1, RTTI: The compiler can do this with RTTI (runtime type information), which lets it dynamic_cast safely with full knowledge of what is allowed. However, this only works within a single compiler and perhaps a single module (DLL/SO/etc.). If your compiler supports it and RTTI has no significant downsides for you, it is by far the easiest approach and takes the least work on your end. It does not, however, let the type identify itself (although the typeid operator can report the dynamic type).
This is done as you have:
NewType * obj = dynamic_cast<NewType*>(obj_oldType);
#2, self-identification: To make it entirely independent, add a virtual method to the base class/interface (for example, Uuid GetType() const;) that lets the object identify itself at any time. This has a benefit over the third (true-to-COM) method, and a disadvantage. It allows the user of the object to make intelligent and perhaps faster decisions about what to do, but a) the user must cast (which may necessitate an unsafe reinterpret_cast or C-style cast), and b) the type cannot do any internal conversion or checking.
ClassID id = obj->GetType();
if (id == ID_Note_Long)
{
NoteLong * note = (NoteLong*)obj;
...
}
#3, COM-style casting: The option COM uses is to provide a method of the form RESULT /* success */ GetAs(const Uuid & type, void ** ppDestination);. This allows the type to a) check the safety of the cast internally, b) perform the cast internally at its own discretion (there are rules on what can be done), and c) report an error if the cast is impossible or fails. However, it a) prevents the user from optimizing well and b) may require multiple calls to find a successful type.
NoteLong * note = nullptr;
if (obj->GetAs(ID_Note_Long, reinterpret_cast<void**>(&note)))
...
Combining the latter two methods in some fashion (for example, if a 00-00-00-0000 Uuid and a nullptr destination are passed, fill the Uuid with the type's own Uuid) may be the best way to both identify and safely convert types. Both of the latter methods, alone or combined, are compiler- and API-independent, and with care may even achieve language independence (as COM does, in a qualified manner).
ClassID id = ClassID::Null;
obj->GetAs(id, nullptr);
if (id == ID_Note_Long)
{
NoteLong * note = nullptr;
obj->GetAs(ID_Note_Long, reinterpret_cast<void**>(&note));
...
}
The latter two are particularly useful when the type is almost entirely unknown: the source library, compiler, and even language are not known ahead of time, and the only available information is that a given interface is provided. When you are working with so little data and cannot use highly compiler-specific features such as RTTI, the object must provide basic information about itself. The user can then ask the object to cast itself as needed, and the object has full discretion over how that's handled. This is typically used with heavily virtual classes or even interfaces (pure virtual), as that may be all the knowledge the user code has.
This method is probably not useful for you, in your scope, but may be of interest and is certainly important as to how types can identify themselves and be cast back "up" from a base class or interface.
Use polymorphism to access a different implementation for each of the derived classes, as in the following example.
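#include <iostream>
#include <string>
#include <vector>
class NoteBase
{
public:
virtual ~NoteBase() {} // needed because the notes are deleted through NoteBase*
virtual std::string read() = 0;
};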
class NoteLong : public NoteBase
{
public:
std::string read() override { return "note long"; }
};
class NoteShort : public NoteBase
{
public:
std::string read() override { return "note short"; }
};
int main()
{
std::vector< NoteBase* > notes;
for( int i=0; i<10; ++i )
{
if( i%2 )
notes.push_back(new NoteLong() );
else
notes.push_back( new NoteShort() );
}
std::vector< NoteBase* >::iterator it;
std::vector< NoteBase* >::iterator end = notes.end();
for( it=notes.begin(); it != end; ++it )
std::cout << (*it)->read() << std::endl;
for( it=notes.begin(); it != end; ++it )
delete *it; // the vector owns the notes, so free them
return 0;
}
As others have pointed out, you should try to design the base-class in a way that lets you do all the stuff you require without casting. If that is not possible (that is, if you need information specific to the subclasses), you can either use casting like you have done, or you can use double-dispatch.
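Here is a minimal double-dispatch (visitor) sketch for the notes example. NoteVisitor, accept, NoteDrawer, and draw_all are hypothetical names. The first virtual call picks the concrete note's accept(). Inside it, *this has the concrete static type, so overload resolution selects the matching visit(), and the second virtual call reaches the concrete visitor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class NoteShort;
class NoteLong;

// The GUI side: one overload per concrete note type.
class NoteVisitor {
public:
    virtual ~NoteVisitor() = default;
    virtual void visit(NoteShort& note) = 0;
    virtual void visit(NoteLong& note) = 0;
};

class NoteBase {
public:
    virtual ~NoteBase() = default;
    // First dispatch: picks the concrete note's accept().
    virtual void accept(NoteVisitor& v) = 0;
};

class NoteShort : public NoteBase {
public:
    // Second dispatch: *this has static type NoteShort& here.
    void accept(NoteVisitor& v) override { v.visit(*this); }
};

class NoteLong : public NoteBase {
public:
    void accept(NoteVisitor& v) override { v.visit(*this); }
};

// Hypothetical GUI visitor that records what it would draw.
class NoteDrawer : public NoteVisitor {
public:
    std::vector<std::string> drawn;
    void visit(NoteShort&) override { drawn.push_back("short"); }
    void visit(NoteLong&) override { drawn.push_back("long"); }
};

std::vector<std::string> draw_all(const std::vector<std::unique_ptr<NoteBase>>& notes) {
    NoteDrawer drawer;
    for (const auto& note : notes) {
        note->accept(drawer);
    }
    return drawer.drawn;
}
```

Adding a new note type means adding one visit() overload to every visitor, which is the usual trade-off of this pattern.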
Say I wanted to have one variable in a class always be in some relation to another without changing the "linked" variable explicitly.
For example: int foo is always 10 less than int bar.
Making it so that if I changed bar, foo would be changed as well. Is there a way to do this? (Integer overflow isn't really possible so don't worry about it.)
Example: (Obviously doesn't work, but general code for an understanding)
class A
{
int x;
int y = x - 10; // Whenever x is changed, y will become 10 less than x
};
No, you can't do that. Your best option for doing this is to use accessor and mutator member functions:
int getFoo()
{
return foo_;
}
void setFoo(int newFoo)
{
foo_ = newFoo;
}
int getBar()
{
return foo_ + 10;
}
void setBar(int newBar)
{
foo_ = newBar - 10;
}
This is called an invariant. It is a relationship that must hold but cannot be enforced by the means the programming language provides. Invariants should only be introduced when they are really necessary. In a way, they are a relatively "bad" thing, since they can be inadvertently broken. So the first question to ask yourself is whether you really have to introduce that invariant. Maybe you can do without two variables in this case and just generate the second value from the first on the fly, as James suggested in his answer.
But if you really need two variables (and very often there's no way around it), you'll end up with an invariant. Of course, it is possible to manually implement something in C++ that would effectively link the variables together and change one when the other changes, but most of the time it is not worth the effort. The best thing you can do, if you really need two variables, again, is to be careful to keep the required relationship manually and use lots of assertions that would verify the invariant whenever it can break (and sometimes even when it can't), like
assert(y == x - 10);
in your case.
Also, I'd expect some advanced third-party C++ libraries (Boost, for example) to provide high-level assertion tools that can be custom-programmed to watch over invariants in the code (I can't suggest any, though). That is, you can make the language work for you here, but it has to be a library solution; the core language won't help you.
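Here is a minimal sketch of the "two variables plus assertions" approach for the question's x/y example. The member and function names are hypothetical. Every write goes through setX, which updates both values and then asserts the invariant.

```cpp
#include <cassert>

// Keeps the invariant y == x - 10 by funnelling every write through one place.
class A {
public:
    explicit A(int x = 0) : x_(x), y_(x - 10) { checkInvariant(); }

    int x() const { return x_; }
    int y() const { return y_; }

    void setX(int x) {
        x_ = x;
        y_ = x - 10;  // update the linked value together with x
        checkInvariant();
    }

private:
    void checkInvariant() const { assert(y_ == x_ - 10); }

    int x_;
    int y_;
};
```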
You could create a new structure which contains both variables and overload the operators you wish to use. Similar to James McNellis' answer above, but allowing you to have it "automatically" happen whenever you operate on the variable in question.
class DualStateDouble
{
public:
DualStateDouble(double &pv1,double &pv2) : m_pv1(&pv1),m_pv2(&pv2) {}
// overload all operators needed to maintain the relationship
// operations on this double automatically affect both values
private:
double *m_pv1;
double *m_pv2;
};
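Here is a compilable take on the same idea, as a hedged sketch. LinkedDouble is a hypothetical name, and only assignment, +=, and -= are overloaded. The link is expressed as linked = value + offset, so an offset of -10 gives the question's "10 less" relation. Any operator you forget to overload is a hole in the invariant.

```cpp
#include <cassert>

// Behaves like a double; every assignment or compound update also rewrites
// the linked value as (value + offset).
class LinkedDouble {
public:
    LinkedDouble(double& value, double& linked, double offset)
        : m_pv1(&value), m_pv2(&linked), m_offset(offset) {
        sync();
    }

    LinkedDouble& operator=(double v) { *m_pv1 = v; sync(); return *this; }
    // Assigning from another LinkedDouble copies the value, not the pointers.
    LinkedDouble& operator=(const LinkedDouble& o) { return *this = static_cast<double>(o); }
    LinkedDouble& operator+=(double d) { *m_pv1 += d; sync(); return *this; }
    LinkedDouble& operator-=(double d) { *m_pv1 -= d; sync(); return *this; }

    operator double() const { return *m_pv1; }  // read access

private:
    void sync() { *m_pv2 = *m_pv1 + m_offset; }

    double* m_pv1;
    double* m_pv2;
    double m_offset;
};
```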