This question already has an answer here: I can't understand this line - dereferencing an address of private member variable or what? (1 answer)
Closed 6 years ago.
I stumbled across this question, which had an answer that used an odd construct:
typedef std::queue<int> Q;
typedef Q::container_type C;
C & get (Q &q)
{
struct hack : private Q {
static C & get (Q &q) {
return q.*&hack::c;
}
};
return hack::get(q);
}
I generally follow that q has access to its own c member that is being referenced by the get function, but I am at a loss to explain it clearly. What exactly is happening with the .*&, and why is it allowed?
typedef std::queue<int> Q;
Q is a std::queue<int>, a container adaptor.
typedef Q::container_type C;
C is the underlying container of the Q -- which is a deque<int>.
C & get (Q &q) {
get takes a queue and returns a deque. In fact it returns the deque that the queue wraps: by conventional means, this is not possible.
struct hack : private Q {
hack is a type local to the function. It inherits from Q and has only one static member function. From its name, you may suspect it is a hack. You are right.
No hack is ever instantiated.
static C & get (Q &q) {
hack::get has the same signature as get itself. In fact we delegate all of the work of get to this method.
return q.*&hack::c;
This line needs to be broken down, so here it is spread over several lines:
using mem_ptr_t = C Q::*; // aka typedef C Q::*mem_ptr_t;
mem_ptr_t c_mem_ptr = &hack::c;
C& ret = q.*c_mem_ptr;
return ret;
The first line defines the type of a member pointer to a field of type C within a Q. Both the C++11 and C++03 ways of naming this type are ugly.
The second line gets a member pointer to the field c in Q. It does this through the hole in the type system of C++. &hack::c is logically of type C hack::* -- a pointer to a member of type C within a class of type hack. In fact, that is why we can access it in a static member of hack. But the c in question is actually in Q, so the actual type of the expression in C++ is C Q::*: a pointer to a member variable of Q.
You cannot directly get this member pointer within hack -- &Q::c is illegal, but &hack::c is not.
You can think of member pointers as 'typed offsets' into another type: &hack::c is the "offset" of c within Q together with knowing it is of type C. Now this isn't really true -- it is some opaque value that tells the compiler how to get c from Q -- but it helps to think about it that way (and it may be implemented that way in simple cases).
We then use this member pointer together with a Q& to get the c out of the Q. Getting a member pointer is constrained by protected; using it is not! We do this with operator .*, the pointer-to-member operator, which takes a class instance on the left and a pointer to a member (data member or member function) on the right.
instance .* member_ptr is an expression that finds the member "pointed to" by member_ptr within the instance. In the original code, everything was done on one line:
instance .* &class_name::member_name
so it looked like there was an operator .*&.
}
};
and then we close up the static method and hack class, and:
return hack::get(q);
}
call it. This technique gives access to protected state: without it, a derived class can only access a protected member through an object of its own type (or of a type derived from it). Using this, we can access protected members of any instance, without violating any bit of the standard.
It's a hack, as the nomenclature indicates.
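For anyone who wants to try the trick, the sketch below is the code from the question plus the headers it needs and some comments; the usage check then reads and writes the queue's underlying container through the returned reference. It assumes std::queue<int> uses its default container, std::deque<int>.

```cpp
#include <cassert>
#include <deque>
#include <queue>

typedef std::queue<int> Q;
typedef Q::container_type C;   // std::deque<int>, the default container

// The code from the question, unchanged apart from comments.
C& get(Q& q) {
    struct hack : private Q {
        static C& get(Q& q) {
            // &hack::c names the protected member c through the derived
            // class, which is allowed here; its type is C Q::*, so it can
            // then be applied to any Q, not just to a hack.
            return q.*&hack::c;
        }
    };
    return hack::get(q);
}
```

Usage: push a few elements, then index into the deque directly, something std::queue's own interface does not allow.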
.* takes an object on the left side and a member pointer on the right side, and resolves the pointed-to member of the given object. & is, of course, the address-of operator; &Class::Member yields a member pointer, which cannot by itself be dereferenced but which can be used with the .* and ->* operators (the latter being the wackiest of all C++ operators). So obj .* &Class::Member has exactly the same effect as obj.Member.
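To make the .* and ->* description concrete, here is a small self-contained sketch. The Point type and the helper function names are made up for illustration and are not from the question.

```cpp
#include <cassert>

// Hypothetical example type, used only to illustrate pointers to members.
struct Point {
    int x = 0;
    int y = 0;
    int sum() const { return x + y; }
};

// A pointer to a data member: it names Point::x without naming any object.
int Point::* const px = &Point::x;

// A pointer to a const member function of Point.
int (Point::* const psum)() const = &Point::sum;

int read_x(const Point& p) { return p.*px; }          // same as p.x
int read_x_ptr(const Point* p) { return p->*px; }     // ->* takes a pointer on the left
int call_sum(const Point& p) { return (p.*psum)(); }  // parentheses required: p.*psum() would call psum first
int read_x_inline(const Point& p) { return p.*&Point::x; }  // the ".*&" spelling
```

The last function shows why the question's code looks like it uses an operator .*&: unary & binds tighter than .*, so p.*&Point::x parses as p.*(&Point::x).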
The reason this more complicated version is being used comes down to a loophole in protection semantics; basically, it allows access to protected members of a base class object, even if the object is not of the same type as the class doing this dirty hack.
Personally, I think the trick is too clever by half. I'd ordinarily* write such code as:
struct hack : private Q {
static C & get (Q &q) {
return static_cast<hack &>(q).c;
}
};
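This is technically undefined behavior, since q does not actually refer to a hack object and a static_cast down to a derived type is only valid when the object really is one, but it doesn't obscure what's going on.
* Well, ordinarily I'd avoid writing such a thing at all. But I literally did this earlier today, so I can't really throw stones.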
Related
In Python, any bound call to a member function or method gets converted to an unbound call, i.e. obj.method() is equivalent to method(obj). That's why the first parameter of every member function is the object itself (self).
Is there a similar concept in C++ that explains why member functions are accessed with the dot operator?
In C++, the dot ('.') is an operator that lets you access a member given an object. Another operator, the arrow ('->'), lets you access a member of an object given a pointer to that object. Each of these works for both member variables and member functions.
Inside each (non-static) member function, the code has access to a pointer to the object ('this'), which can be used as needed. The members of that object can also be accessed there directly by name.
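A small illustration of the two access operators and the implicit this pointer; the Counter type and the demo function are hypothetical, made up for this sketch.

```cpp
#include <cassert>

// Hypothetical example class.
struct Counter {
    int value = 0;

    void add(int n) {
        this->value += n;   // explicit use of the implicit `this` pointer
        // value += n;      // equivalent: members are in scope unqualified
    }
    Counter* self() { return this; }   // `this` is the object's address
};

int demo() {
    Counter c;
    c.add(2);         // '.' on an object
    Counter* p = &c;
    p->add(3);        // '->' on a pointer to an object
    (*p).add(4);      // p->add(4) is shorthand for this
    return c.self() == &c ? c.value : -1;
}
```

Here demo() returns 9: all three calls act on the same object, and self() confirms that this is simply &c.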
As to why dots are used? It's just a design choice that Bjarne Stroustrup (PBUH) made a few decades ago. It mimics C's access to a member of a struct.
There is no such concept in C++ that makes obj.method() and method(obj) equivalent. There has been a proposal for Uniform Function Call Syntax that would make them call the same code, but as far as I can tell it does not seem likely to be adopted any time soon.
Early C++ was a superset of the C language (it was originally called "C with Classes"), so the use of the . comes from Kernighan and Ritchie's 1973 introduction of struct to the C language, meaning "member of instance".
In order for the compiler to determine which function you are calling, it needs to know which object you are acting on, and so the simple decision was made to re-use the existing member-access syntax (object.member) and pass the object address as an implicit argument.
Why a pointer? Because C didn't have references.
The original C++ compiler, CFront, translated "C with classes" and later C++ into C code before compiling to assembly. Backwards and binary compatibility with C was critical, and in order for a member function to modify the object it is invoked on, the object needs to be passed by pointer or reference. Since pointers were supported by C as well as C++, they chose pointers. So this is a pointer, not a reference.
Roughly the same thing happens in Python; the this pointer (self) just has to be explicit:
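As a rough sketch of that lowering (not CFront's actual output, which also mangled names), a member function can be pictured as a free function that receives the object's address as an explicit first parameter. The MyClassNameHere_c struct and the MyClassNameHere_setA / MyClassNameHere_getA names are hypothetical.

```cpp
#include <cassert>

// A C++ class with member functions...
struct MyClassNameHere {
    int a_ = 0;
    void setA(int a) { a_ = a; }
    int getA() const { return a_; }
};

// ...and, roughly, a C-style equivalent: a plain struct plus free functions
// that take the object's address as an explicit first parameter.
struct MyClassNameHere_c {
    int a_;
};

void MyClassNameHere_setA(MyClassNameHere_c* self, int a) { self->a_ = a; }

// A const member function becomes a function taking a pointer to const.
int MyClassNameHere_getA(const MyClassNameHere_c* self) { return self->a_; }
```

So obj.setA(5) corresponds to MyClassNameHere_setA(&obj, 5) in the translated code.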
# Python
class MyClassNameHere(object):
    _a = -1
    def __init__(self):
        self._a = 0
    def setA(self, a):
        self._a = a
    def getA(self):
        return self._a
// C++
struct MyClassNameHere {
int a_;
MyClassNameHere() : a_(0) {}
void setA(int a) { a_ = a; }
int getA() const { return a_; }
};
Because C++ is strongly typed, having to specify the this parameter would be tediously verbose, and you'd have to be mindful of your consts:
struct MyClassNameHere {
int a_ = -1;
MyClassNameHere(MyClassNameHere* this) : a_(0) {}
void setA(MyClassNameHere* this, int a) { a_ = a; }
int getA(const MyClassNameHere* this) const { return a_; }
// ^^^^^
};
I'm not sure if this is a thing (to be honest I want to say that it is not), but I was wondering if there is a way to write a C++ function so that it can choose which type of object to return.
For example, I have a base class (A) that has 3 child classes (Aa, Ab, Ac). In a factory(F) class I have a std::map<UINT, A*> that holds a number of the child classes based on a UINT id. My goal is to write a function that can build and return the correct object when I pass in an id value.
I'll probably end up returning pointers and cloning the data that they point to, but I was just curious as to whether or not the aforementioned was actually possible.
C++ being statically typed, the return type of a function must be known at compile time. From here arises the question:
do I know the expected return type statically on each call site of F (== it only depends on constant expression values)
or does it depend on some runtime variable.
For case #1, a function template for F would be a good approach.
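A minimal sketch of case #1, assuming a hypothetical A / Aa / Ab / Ac hierarchy with a virtual name() function (the question does not show the classes' members), and returning std::unique_ptr instead of a raw pointer:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical hierarchy mirroring the question's A, Aa, Ab, Ac.
struct A {
    virtual ~A() = default;
    virtual std::string name() const = 0;
};
struct Aa : A { std::string name() const override { return "Aa"; } };
struct Ab : A { std::string name() const override { return "Ab"; } };
struct Ac : A { std::string name() const override { return "Ac"; } };

// Case #1: the wanted type is known at compile time at each call site, so it
// becomes a template argument and the return type is exact.
template <class T>
std::unique_ptr<T> create() {
    return std::unique_ptr<T>(new T());
}
```

A call such as create<Ab>() yields a std::unique_ptr<Ab> directly, so no cast is needed; it still converts to std::unique_ptr<A> if the caller only wants the base interface.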
But in your case, it seems you are facing #2 (because you want to return a type depending on ID that we can assume is not a constant expression).
Because of the static typing, if you are to write a function (assuming you do not overload it, because it seems your input parameters are always the same), it will have a single and well-defined return type. Basically, there is no syntax to say that your factory F will return either an Aa, an Ab, or an Ac (and that is a very good thing, with regard to static typing and all the compiler verifications it enables ; )
C++ solution: Type erasure
With that being said, there are a few type-erasure approaches that let you return an instance of one of several types hidden behind a single common type.
The obvious one is the pointer-to-derived to pointer-to-base conversion. It is particularly useful if you plan to use the returned object mainly through its A interface (i.e., you will call the virtual functions defined on A).
A* F(ID aId)
This A* could point to any type deriving from A. From here, you could call every function of A's public interface on the returned pointer. Of course, if you wanted to call an operation that is only available on a subclass, you would need to know the exact type at the call site, and then cast the pointer to a pointer-to-derived before being able to call the operation.
A possible alternative, if you'd rather avoid dynamic memory, could be boost::variant, at the cost of having to explicitly list all the possible types the function could return.
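A sketch of case #2 under similar assumptions: a hypothetical hierarchy with a virtual name(), an unsigned id standing in for the question's ID type, and an arbitrary mapping of ids 1, 2, 3 to Aa, Ab, Ac. Returning std::unique_ptr<A> rather than a raw A* is a design choice made here so that ownership of the new object is explicit:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical hierarchy mirroring the question's A, Aa, Ab, Ac.
struct A { virtual ~A() = default; virtual std::string name() const = 0; };
struct Aa : A { std::string name() const override { return "Aa"; } };
struct Ab : A { std::string name() const override { return "Ab"; } };
struct Ac : A { std::string name() const override { return "Ac"; } };

// Case #2: the id is only known at run time, so F returns a pointer to the
// common base. The id values are arbitrary choices for this sketch.
std::unique_ptr<A> F(unsigned id) {
    switch (id) {
        case 1: return std::unique_ptr<A>(new Aa());
        case 2: return std::unique_ptr<A>(new Ab());
        case 3: return std::unique_ptr<A>(new Ac());
        default: return nullptr;  // unknown id
    }
}
```

Callers use the A interface directly (virtual calls), and only reach for dynamic_cast when they need something specific to a subclass; dynamic_cast yields nullptr if the guess is wrong.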
boost::variant<Aa, Ab, Ac> F(ID aId);
You can take a look at the tutorial for a quick introduction to the syntax and features.
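The same idea expressed with std::variant, the C++17 standard-library counterpart of boost::variant, swapped in here so the sketch needs no Boost (it does need a C++17 compiler). The types are hypothetical value types with a name() member, and the id mapping is arbitrary:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical value types: no common base class is required.
struct Aa { std::string name() const { return "Aa"; } };
struct Ab { std::string name() const { return "Ab"; } };
struct Ac { std::string name() const { return "Ac"; } };

// Every possible return type has to be listed here.
using AnyA = std::variant<Aa, Ab, Ac>;

AnyA F(unsigned id) {
    switch (id) {
        case 1: return Aa{};
        case 2: return Ab{};
        default: return Ac{};
    }
}

// std::visit dispatches on whichever alternative is currently stored.
std::string name_of(const AnyA& v) {
    return std::visit([](const auto& x) { return x.name(); }, v);
}
```

Unlike the pointer version, the returned object lives inside the variant itself, so there is no heap allocation; std::holds_alternative<Ab>(v) tells the caller which type it got.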
Sure, something like this:
class MyMapClass
{
public:
template< class ExactType > ExactType * getValue(UINT key)
{
return dynamic_cast<ExactType*>(_myMap.at(key));
}
BaseType * at(UINT key)
{
return _myMap.at(key);
}
private:
std::map<UINT, BaseType*> _myMap;
};
However, since you are storing the pointers to base types, you can as well return them as is, and rely on the caller to make a specific cast, if that goes well with your application's architecture.
Unfortunately, you can not do it fully automatically. Sooner or later you will have to determine the exact class that hides behind the base class pointer, and make a cast. With the template solution it is done "sooner":
MyDerivedType * value = myMapClassInstance.getValue<MyDerivedType>(1);
If you prefer to return the base pointer, it is done "later":
BaseType * value = myMapClassInstance.at(1);
MyDerivedType * exactValue = dynamic_cast<MyDerivedType*>(value);
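A self-contained version of the class above, assuming BaseType is polymorphic (dynamic_cast needs at least one virtual function) and that the map should own its objects, hence std::unique_ptr. The put() member, the OtherDerived type, and the member extra are hypothetical additions for the sketch:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical types standing in for BaseType / MyDerivedType.
struct BaseType { virtual ~BaseType() = default; };
struct MyDerivedType : BaseType { int extra = 42; };
struct OtherDerived : BaseType {};

class MyMapClass {
public:
    void put(unsigned key, std::unique_ptr<BaseType> value) {
        _myMap[key] = std::move(value);
    }
    // "Sooner": the caller names the exact type; nullptr if it does not match.
    // map::at throws std::out_of_range for a missing key.
    template <class ExactType>
    ExactType* getValue(unsigned key) const {
        return dynamic_cast<ExactType*>(_myMap.at(key).get());
    }
    // "Later": hand back the base pointer and let the caller cast.
    BaseType* at(unsigned key) const { return _myMap.at(key).get(); }

private:
    std::map<unsigned, std::unique_ptr<BaseType>> _myMap;  // owns the objects
};
```

Either way the cast happens somewhere; the template only moves it into the class and makes a wrong guess show up as a null pointer.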
This question already has answers here: typedef struct vs struct definitions [duplicate] (12 answers)
Closed 8 years ago.
In a few places in the code I have a struct definition that looks like this:
typedef struct tagNameOfStruct{
//some methods and members
}T_NameOfStruct, * PT_NameOfStruct, FAR LPT_NameOfStruct;
typedef struct tagNameOfStructLists{
//some methods and members
}T_NameOfStructLists, * PT_NameOfStructLists, FAR LPT_NameOfStructLists;
typedef pair<T_NameOfStruct, T_NameOfStructLists> CStructPair;
typedef map<T_NameOfStruct, T_NameOfStructLists> CStructMap;
Then I see the following lines of code inside a method, within a loop:
T_NameOfStruct instance;
T_NameOfStructLists listInstance;
m_MapInstance.insert(CStructPair(instance, listInstance));
//m_MapInstance is a member of the method's class, of type CStructMap
That instance is being inserted into a data structure which is used outside the scope of the function. It is passed to the data structure by reference.
Shouldn't instance die when we leave the function scope?
What does the last line of the struct definition mean?
Edit: why this is not a duplicate (please reopen):
The * PT_NameOfStruct, FAR LPT_NameOfStruct parts of the struct definition are different from the question you linked to. There is also the issue of passing the instance by reference while it is defined on the method's stack. The strange thing is that the code works, so I'm wondering what I'm missing here. Why don't I get an exception when trying to access destroyed objects in a different function that iterates over the data structure?
Still think it's a duplicate?
Shouldn't instance die when we leave the function scope?
Yes. If the code you haven't shown us actually uses a reference to it after that, then it's wrong. But since we can't see it, we can only guess whether or not it does that.
UPDATE: Now you've shown us what's actually happening, it's being passed to map::insert, which stores a copy of its argument (even though it takes the argument by reference). There's no problem when instance itself is destroyed, assuming it has valid copy semantics.
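A minimal sketch of that point, using hypothetical Key and List types in place of T_NameOfStruct and T_NameOfStructLists (Key needs an operator< to be usable as a std::map key): the locals go out of scope, but the map keeps its own copies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for T_NameOfStruct / T_NameOfStructLists.
struct Key  { int id; bool operator<(const Key& o) const { return id < o.id; } };
struct List { std::string items; };

// Mirrors the loop body in the question.
void add_entry(std::map<Key, List>& m, int id) {
    Key instance{id};
    List listInstance{"local"};
    m.insert(std::make_pair(instance, listInstance));  // the map copies both
}   // instance and listInstance are destroyed here; the map's copies are not
```

Calling add_entry twice leaves two valid entries in the map, even though every local it created is gone.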
What does the ending of the struct definition mean?
It declares:
a class called tagNameOfStruct.
an alias for the class type, called T_NameOfStruct. In C++ this is completely pointless; in C, some people have a bad habit of doing that to avoid typing struct when specifying the type.
an alias for a pointer to the type, called PT_NameOfStruct. This is an even worse idea than the first typedef, since it hides the important information that something is a pointer.
an alias for an obsolete "far" pointer to the type, called LPT_NameOfStruct. That's a hangover from 16-bit platforms; on a modern platform it will be the same as the regular pointer type.
Don't use this as a model for how to declare class types. It's better to keep it simple:
class NameOfStruct { // or "struct", if you want members to be public by default
// members
};
NameOfStruct instance; // No need for a typedef
NameOfStruct * pointer; // Make it clear that it's a pointer
struct tagNameOfStruct{
In C++, struct tags have equivalent status to typedefs. Not so in C.
T_NameOfStruct
This is another and better name for tagNameOfStruct.
*PT_NameOfStruct
This is a typedef for a pointer to the struct.
FAR LPT_NameOfStruct;
This was presumably meant to be an obsolete name for a FAR pointer to the struct, given that near and far pointers have meant the same thing since Windows 95 as far as I can see. Note, however, that as written there is no * before LPT_NameOfStruct: the * in *PT_NameOfStruct applies only to that one declarator and does not carry over, so LPT_NameOfStruct actually names the struct type itself.
The only things you need to worry about here are T_NameOfStruct, which is a typedef for the struct itself, and PT_NameOfStruct, which is a pointer to it. The rest is fluff.
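What each alias in the question's declaration actually names can be checked with static_assert. This sketch assumes FAR expands to nothing, as it does on 32- and 64-bit Windows, and gives the struct a placeholder member:

```cpp
#include <cassert>
#include <type_traits>

// Assumption: FAR is an empty macro, as on 32/64-bit Windows.
#ifndef FAR
#define FAR
#endif

typedef struct tagNameOfStruct {
    int member;  // placeholder member for the sketch
} T_NameOfStruct, *PT_NameOfStruct, FAR LPT_NameOfStruct;

// Each declarator in the list gets its own '*'; it does not carry over.
static_assert(std::is_same<T_NameOfStruct, tagNameOfStruct>::value, "alias of the struct");
static_assert(std::is_same<PT_NameOfStruct, tagNameOfStruct*>::value, "pointer to the struct");
static_assert(std::is_same<LPT_NameOfStruct, tagNameOfStruct>::value,
              "as written, LPT_NameOfStruct is not a pointer");
```

If LPT_NameOfStruct was meant to be a pointer, the declaration would need its own *, as in FAR *LPT_NameOfStruct.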
That instance is being inserted to data-structure which is used outside the scope of the function. It is passed to the data-structure by reference.
Shouldn't instance die when we leave the function scope?
Yes. You have found a bug.
What does the ending of the struct definition mean?
'The ending of the struct definition' means that this is the end of the struct definition. What else would it mean? Did you mean the ending of the declaration? Did you mean everything I've stated above?
In addition to the above answers,
struct s
{
};
With the above struct definition:
In C, to declare an instance you must write
struct s instance1;
In C++ you can write either
struct s instance1; (or)
s instance1;
Both are valid in C++.
In C, instead of writing struct s everywhere, you can use a typedef for simplification.
After seeing this question a few minutes ago, I wondered why the language designers allow it, since it allows indirect modification of private data. As an example:
class TestClass {
private:
int cc;
public:
TestClass(int i) : cc(i) {};
};
TestClass cc(5);
int* pp = (int*)&cc;
*pp = 70; // private member has been modified
I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen, or is this just an oversight in the language? It seems to directly undermine the use of private data members.
Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.
In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.
When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.
Edit: as to the question @Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.
Let's step back to the beginning and consider a struct like:
struct X {
int a;
int b;
};
C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).
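Those two C rules can be checked with offsetof and a cast. A minimal sketch using the struct X above (the helper name read_a_through_cast is made up):

```cpp
#include <cassert>
#include <cstddef>

struct X {
    int a;
    int b;
};

// The first member sits at offset 0, and later members follow in
// declaration order (possibly after padding).
static_assert(offsetof(X, a) == 0, "first member is at the start");
static_assert(offsetof(X, b) > offsetof(X, a), "declaration order is kept");

int read_a_through_cast(X& x) {
    // A pointer to the struct, converted to a pointer to its first member's
    // type, points at that first member.
    int* p = reinterpret_cast<int*>(&x);
    return *p;
}
```

For X x{7, 8}, read_a_through_cast(x) reads x.a.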
For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).
Therefore, given something like:
struct Y {
int a;
int b;
private:
int c;
int d;
public:
int e;
// code to use `c` and `d` goes here.
};
The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the layout would be more like:
struct Z {
int a;
int b;
int e;
private:
int c;
int d;
// code to use `c` and `d` goes here.
};
Then, when it's enforcing things at run time, it can basically do something like if (offset >= 3 * sizeof(int)) access_violation();
To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.
To enforce both of those, C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, and the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).
That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:
struct A {
int a;
public:
int b;
public:
int c;
public:
int d;
};
...you ended up with a little bit of self-contradiction. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply, but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.
To cure that, they reworded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, they opted for the smallest change they could make that would still cure the problem.
Because of backwards compatibility with C, where you can do the same thing.
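The C++11 form of the rule can be checked with std::is_standard_layout. In this sketch the class names are hypothetical; RedundantSpecifiers mirrors struct A above, and Mixed mirrors the mixed public/private data of Y:

```cpp
#include <cassert>
#include <type_traits>

struct AllPublic { int a; int b; };

// Like struct A above: redundant specifiers, but every member is public.
struct RedundantSpecifiers {
    int a;
public:
    int b;
public:
    int c;
};

class AllPrivate {          // one access level for all data members
public:
    int sum() const { return c + d; }
private:
    int c = 1;
    int d = 2;
};

struct Mixed {              // like Y above: public and private data members
    int a = 0;
    int get_c() const { return c; }
private:
    int c = 0;
};

// Under the C++11 wording, mixing access levels among the non-static data
// members is enough to keep a class from being standard-layout.
static_assert(std::is_standard_layout<AllPublic>::value, "");
static_assert(std::is_standard_layout<RedundantSpecifiers>::value, "");
static_assert(std::is_standard_layout<AllPrivate>::value, "");
static_assert(!std::is_standard_layout<Mixed>::value, "");
```

Note that the redundant specifiers no longer matter; only whether the data members share one access level does.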
For all people wondering, here's why this is not UB and is actually allowed by the standard:
First, TestClass is a standard-layout class (§9 [class] p7):
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference, // OK: non-static data member is of type 'int'
has no virtual functions (10.3) and no virtual base classes (10.1), // OK
has the same access control (Clause 11) for all non-static data members, // OK, all non-static data members (1) are 'private'
has no non-standard-layout base classes, // OK, no base classes
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and // OK, no base classes again
has no base classes of the same type as the first non-static data member. // OK, no base classes again
And with that, you are allowed to reinterpret_cast the class to the type of its first member (§9.2 [class.mem] p20):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).
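The argument above can be checked mechanically. This sketch uses the question's TestClass plus a value() accessor, added only so the result can be observed, and a made-up helper first_member:

```cpp
#include <cassert>
#include <type_traits>

class TestClass {
private:
    int cc;
public:
    TestClass(int i) : cc(i) {}
    int value() const { return cc; }  // added for the check only
};

// All the conditions quoted above hold, so the class is standard-layout...
static_assert(std::is_standard_layout<TestClass>::value, "");

// ...and a pointer to it, converted with reinterpret_cast, points to its
// first non-static data member, even though that member is private.
int* first_member(TestClass& t) {
    return reinterpret_cast<int*>(&t);
}
```

Writing through first_member(t) changes what value() reports, which is exactly the behavior the question observed.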
A good reason is to allow compatibility with C but extra access safety on the C++ layer.
Consider:
struct S {
#ifdef __cplusplus
private:
#endif // __cplusplus
int i, j;
#ifdef __cplusplus
public:
int get_i() const { return i; }
int get_j() const { return j; }
#endif // __cplusplus
};
By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.
As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.
The whole purpose of reinterpret_cast (and a C style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.
The compiler would have given you an error if you had tried int *pp = &cc.cc; it would have told you that you cannot access a private member.
In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C-style way; the C++-style way would have been int* pp = reinterpret_cast<int*>(&cc);. A reinterpret_cast is always a warning that you are doing a cast between two pointers that are not related. In such a case you must make sure that you are doing it right, and you must know the underlying memory layout. The compiler does not prevent you from doing so, because this is often needed.
When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.
This is because you are manipulating the memory where your object is located. In your case the private member just happens to be stored at this memory location, so you change it. It is not a very good idea, because you do not know how the object will be stored in memory.
Say I have some data allocated somewhere in my program, like:
some_type a;
and I want to wrap this data in a class for access. Is it valid to say,
class Foo {
private:
some_type _val;
public:
inline void doSomething() { c_doSomething(&_val); }
};
Foo *x = reinterpret_cast<Foo *>(&a);
x->doSomething();
The class has no virtual functions, and only includes a single data item of the type I'm trying to wrap. Does the C++ standard specify that this reinterpret_cast is safe and valid? sizeof(Foo) == sizeof(some_type), no address alignment issues, or anything? (In my case, I'd be ensuring that some_type is either a primitive type like int, or a POD structure, but I'm curious what happens if we don't enforce that restriction, too - for example, a derived class of a UIWidget like a UIMenuItem, or something.)
Is it valid to say...
No, this is not valid. There are only a small number of types through which a may be accessed; the complete list can be found in an answer I gave to another question.
Does the C++ standard specify that this reinterpret_cast is safe and valid?
The C++ Standard says very little about reinterpret_cast. Its behavior is almost entirely implementation-defined, so use of it is usually non-portable.
The correct way to do this would be to either
have a Foo constructor that takes a some_type argument and makes a copy of it or stores a reference or pointer to it, or
implement your "wrapper" interface as a set of non-member functions that take a some_type object by reference as an argument.
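A minimal sketch of both suggestions; some_type and c_doSomething are hypothetical stand-ins for the question's types, since their real definitions are not shown:

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's some_type and C function.
struct some_type { int counter; };
void c_doSomething(some_type* p) { ++p->counter; }

// Option 1: a wrapper that refers to the existing object instead of
// pretending to *be* it. No reinterpret_cast is involved.
class Foo {
public:
    explicit Foo(some_type& val) : _val(&val) {}
    void doSomething() { c_doSomething(_val); }
private:
    some_type* _val;  // non-owning: `a` must outlive the Foo
};

// Option 2: the "wrapper" interface as non-member functions.
void doSomething(some_type& val) { c_doSomething(&val); }
```

Both operate on the original a, so Foo x(a); x.doSomething(); followed by doSomething(a); increments the same counter twice.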
14882:1998, 9.2/17:
"A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]"
So it would be valid if your wrapper were strictly a POD itself. However, access specifiers mean that it is not strictly a POD. That said, I would be interested in knowing whether any current implementation changes object layout due to access specifiers. I think that for all practical purposes, you are good to go.
And for the case when the element is not a POD, it follows that the container is not a POD, and hence all bets are off.
Since your Foo object is already only valid as long as the existing a is valid:
struct Foo {
some_type &base;
Foo(some_type &base) : base (base) {}
void doSomething() { c_doSomething(&base); }
};
//...
Foo x = a;
x.doSomething();
You want to look up the rules governing POD (plain old data) types. If the C++ class is a POD type, then yes, you can cast it.
The details of what actually happens and how aliasing is handled are implementation defined, but are usually reasonable and should match what would happen with a similar C type or struct.
I happen to use this a lot in a project of mine that implements B+ trees in shared memory maps. It has worked in GCC across multiple types of Linux and BSDs including Mac OS X. It also works fine in Windows with MSVC.
Yes, this is valid as long as the wrapper type you are creating (Foo in your example) is a POD-type.