overload "->" (member access) recursively - c++

I am learning how to overload "->" and the documentation says that:
"operator-> is called again on the value that it returns, recursively, until the operator-> is reached that returns a plain pointer. After that, builtin semantics are applied to that pointer."
While it is clear what the documentation says, essentially that an overloaded "->" of a class could use itself a "special pointer" having itself an overloaded "->" that could give a "special pointer" etc etc until a "plain pointer" is found, I cannot find an example of a real use of it ( unless it is used to find a linked list last element ).
Could somebody explain the rationale behind the scenes (as that possibility isn't provided with "plain pointers", so I don't see any reason to provide it with "special pointers")?
An example of real world use could help too, as probably I am missing a model where to apply the behaviour.
On the opposite side there could be the need to avoid that behaviour, how could it be done ?

Well, the -> operator works under rather special circumstances.
One can call it a pseudo-binary operator. According to its natural syntax pointer->member it takes two operands: a normal run-time operand on the left-hand side and a rather "strange" member name operand on the right-hand side. The "strangeness" of the second operand is rooted in the fact that C++ language has no user-accessible concept for representing such operands. There's nothing in the language that would express a member name as an operand. There's no way to "pass" a member name through the code to the user-defined implementation. The member name is a compile-time entity, remotely similar to constant expressions in that regard, but no constant expression in C++ can specify members. (There are expressions for pointers-to-members, but not for members themselves).
This creates rather obvious difficulties in specifying the behavior of overloaded -> operator: how do we connect what was specified on the right-hand side of -> (i.e the member name) to the code written by the user? It is not possible to do it directly. The only way out of this situation is to do it indirectly: force the user to channel the user-defined functionality of the overloaded -> operator into the functionality of some existing built-in operator. The built-in operator can handle member names naturally, through its core language capabilities.
In this particular case we have only two candidates to channel the functionality of the overloaded -> to: the built-in -> and the built-in . operators. It is only logical that the built-in -> was chosen for that role. This created an interesting side-effect: the possibility to write "chained" (recursive) sequences of overloaded -> operators (unwrapped implicitly by the compiler) and even infinitely recursive sequences (which are ill-formed).
Informally speaking, every time you use a smart pointer you make a real-world use of these "recursive" properties of overloaded -> operator. If you have a smart pointer sptr that points to a class object with member member, the member access syntax remains perfectly natural, e.g. sptr->member. You don't have to do it as sptr->->member or sptr->.member specifically because of the implicit "recursive" properties of overloaded ->.
Note that this recursive behavior is only applied when you use operator syntax for invoking the overloaded -> operator, i.e. the object->member syntax. However, you can also use the regular member function call syntax to call your overloaded ->, e.g. object.operator ->(). In this case the call is carried out as an ordinary function call and no recursive application of -> takes place. This is the only way to avoid the recursive behavior. If you implement overloaded -> operator whose return type does not support further applications of -> operator (for example, you can define an overloaded -> that returns int), then the object.operator ->() will be the only way to invoke your overloaded implementation. Any attempts to use the object->member syntax will be ill-formed.
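The chaining and the explicit-call escape hatch described above can be sketched as follows. This is only a minimal illustration: the names Data, Inner, Outer and Odd are made up for the example and do not come from any library.

```cpp
#include <cassert>

struct Data {
    int value;
};

// Innermost wrapper: its operator-> returns a plain pointer, which ends the chain.
struct Inner {
    Data* p;
    Data* operator->() const { return p; }
};

// Outer wrapper: its operator-> returns another class type, not a pointer.
struct Outer {
    Inner inner;
    Inner operator->() const { return inner; }
};

// A type whose operator-> returns int: usable only with function-call syntax.
struct Odd {
    int operator->() const { return 42; }
};

int chained_access() {
    Data d{7};
    Outer o{Inner{&d}};
    // Operator syntax: Outer::operator->, then Inner::operator->, then built-in ->.
    return o->value;
}

int explicit_call() {
    Data d{7};
    Outer o{Inner{&d}};
    // Function-call syntax: one ordinary call, no further unwrapping by the compiler.
    Inner one_step = o.operator->();
    assert(one_step.p == &d);
    return one_step.p->value;
}

int odd_call() {
    Odd odd;
    // Writing odd->something would not compile; the explicit call is the only way in.
    return odd.operator->();
}
```

Both chained_access() and explicit_call() read the same member; the difference is only in who performs the second step, the compiler or the caller.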

I cannot find an example of a real use of it ( unless it is used to find a linked list last element ).
I think you're misunderstanding what it does. It isn't used to dereference a list element and keep dereferencing the next element. Each time you call operator-> you would get back a different type; the point is that if that second type also has an operator-> it will be called, which might return a different type again. Imagine it being like x->->->i, not x->next->next->next, if that helps.
An example of real world use could help too, as probably I am missing a model where to apply the behaviour.
It can be useful for the Execute Around Pointer pattern.
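A rough sketch of that pattern, under my own assumptions: ExecuteAround, Proxy, Widget and g_events are hypothetical names invented for this example, and the "before"/"after" hooks just record strings. The outer operator-> returns a temporary proxy whose own operator-> yields the real pointer, so the proxy's constructor and destructor bracket the member call. It relies on the returned proxy not being copied (guaranteed since C++17, and what compilers do in practice before that).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of events so it can be inspected afterwards.
std::vector<std::string> g_events;

struct Widget {
    void poke() { g_events.push_back("poke"); }
};

template <class T>
class ExecuteAround {
public:
    explicit ExecuteAround(T* target) : target_(target) {}

    // Temporary returned by the outer operator->; lives until the end of
    // the full expression that contains the member call.
    class Proxy {
    public:
        explicit Proxy(T* target) : target_(target) { g_events.push_back("before"); }
        ~Proxy() { g_events.push_back("after"); }
        T* operator->() const { return target_; }  // plain pointer: chain ends here
    private:
        T* target_;
    };

    Proxy operator->() const { return Proxy(target_); }

private:
    T* target_;
};

std::vector<std::string> poke_with_hooks() {
    g_events.clear();
    Widget w;
    ExecuteAround<Widget> guarded(&w);
    guarded->poke();  // Proxy constructed, poke() called, Proxy destroyed
    assert(g_events.size() == 3);
    return g_events;
}
```

In real code the hooks would typically lock and unlock a mutex or write log entries rather than push strings.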
On the opposite side there could be the need to avoid that behaviour, how could it be done ?
Call the operator explicitly:
auto x = p.operator->();

Related

C++ arrow operator overloading. How to get name of accessed method?

I want to overload array operator in this way.
A b;
b->c;
and b-> c should expand to
boverloadarrayfunction("c");
Is it possible?
Edit: Disclaimer I know it is bad thing.
Why: I want to add level of indirection to QueryInterface.
operator-> is not the array operator. None of the C++ operators is officially called that way, but the one that fits that name best would be the indexing operator[].
What you want is not possible. In b->c, however that might be implemented, c is a symbol, i.e. the name of some variable or function. "c" on the other hand is a string literal, and the first cannot be converted to the latter (except by some black preprocessor magic, which does not fit in the expression b->c).
You might want to look up how overloading operator-> works, because it is special in the sense that it has to return either a pointer or another object that has operator-> overloaded. The compiler will call the -> on any returned object until it truly dereferences a pointer.
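For what the "preprocessor magic" could look like, here is a small sketch. Dynamic and DYN_GET are hypothetical names made up for this example; the point is only that the stringizing operator # can turn the token c into "c", and that the result has to be spelled as a macro call, not as b->c.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical class that looks values up by name at run time.
class Dynamic {
public:
    void set(const std::string& name, int v) { values_[name] = v; }
    int get(const std::string& name) const {
        std::map<std::string, int>::const_iterator it = values_.find(name);
        return it == values_.end() ? 0 : it->second;  // 0 when the name is unknown
    }
private:
    std::map<std::string, int> values_;
};

// #name turns the token c into the string literal "c".
#define DYN_GET(obj, name) ((obj).get(#name))

int demo_lookup() {
    Dynamic b;
    b.set("c", 5);
    assert(b.get("missing") == 0);
    return DYN_GET(b, c);  // expands to ((b).get("c"))
}
```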

How to dis-ambiguate operator definitions between objects/classes in a programming language?

I'm designing my own programming language (called Lima, if you care it's on www.btetrud.com), and I'm trying to wrap my head around how to implement operator overloading. I'm deciding to bind operators on specific objects (it's a prototype-based language). (It's also a dynamic language, where 'var' is like 'var' in JavaScript: a variable that can hold any type of value.)
For example, this would be an object with a redefined + operator:
x =
{ int member
operator +
self int[b]:
ret b+self
int[a] self:
ret member+a
}
I hope it's fairly obvious what that does. The operator is defined for the cases where x is the right operand and where x is the left operand (using self to denote this).
The problem is what to do when you have two objects that define an operator in an open-ended way like this. For example, what do you do in this scenario:
A =
{ int x
operator +
self var[b]:
ret x+b
}
B =
{ int x
operator +
var[a] self:
ret x+a
}
a+b ;; is a's or b's + operator used?
So an easy answer to this question is "well duh, don't make ambiguous definitions", but it's not that simple. What if you include a module that has an A type of object, and then define a B type of object?
How do you create a language that guards against other objects hijacking what you want to do with your operators?
C++ has operator overloading defined as "members" of classes. How does C++ deal with ambiguity like this?
Most languages will give precedence to the class on the left. In C++, an operator defined as a member function only applies when that class is the left operand; handling the type on the right-hand side requires a separate non-member operator function. When you define operator+ as a member, you are defining addition for when this type is on the left, for anything on the right.
In fact, it would not make sense if you allowed your operator + to work for when the type is on the right-hand side. It works for +, but consider -. If type A defines operator - in a certain way, and I do int x - A y, I don't want A's operator - to be called, because it will compute the subtraction in reverse!
In Python, which has more extensive operator overloading rules, there is a separate method for the reverse direction. For example, there is a __sub__ method which overloads the - operator when this type is on the left, and a __rsub__ which overloads the - operator when this type is on the right. This is similar to the capability, in your language, to allow the "self" to appear on the left or on the right, but it introduces ambiguity.
Python gives precedence to the thing on the left -- this works better in a dynamic language. If Python encounters x - y, it first calls x.__sub__(y) to see if x knows how to subtract y. This can either produce a result, or return a special value NotImplemented. If Python finds that NotImplemented was returned, it then tries the other way. It calls y.__rsub__(x), which would have been programmed knowing that y was on the right hand side. If that also returns NotImplemented, then a TypeError is raised, because the types were incompatible for that operation.
I think this is the ideal operator overloading strategy for dynamic languages.
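Since the other code in this thread is C++, here is the same "ask the left operand first, then the right" protocol sketched in C++ rather than Python. Everything in it is an assumption made for illustration: Value, Number, Celsius, sub, rsub and subtract are invented names, a false return value stands in for NotImplemented, and the exception stands in for TypeError.

```cpp
#include <cassert>
#include <stdexcept>

struct Value {
    virtual ~Value() = default;
    // Analogue of __sub__ (this - rhs); returning false means "not implemented".
    virtual bool sub(const Value& rhs, double& out) const { (void)rhs; (void)out; return false; }
    // Analogue of __rsub__ (lhs - this); returning false means "not implemented".
    virtual bool rsub(const Value& lhs, double& out) const { (void)lhs; (void)out; return false; }
};

struct Number : Value {
    double n;
    explicit Number(double v) : n(v) {}
    bool sub(const Value& rhs, double& out) const override {
        if (const Number* r = dynamic_cast<const Number*>(&rhs)) { out = n - r->n; return true; }
        return false;  // Number does not know how to subtract other types
    }
};

struct Celsius : Value {
    double deg;
    explicit Celsius(double d) : deg(d) {}
    bool rsub(const Value& lhs, double& out) const override {
        if (const Number* l = dynamic_cast<const Number*>(&lhs)) { out = l->n - deg; return true; }
        return false;
    }
};

// x - y: ask the left operand first, then the right one, then give up.
double subtract(const Value& x, const Value& y) {
    double result = 0.0;
    if (x.sub(y, result)) return result;
    if (y.rsub(x, result)) return result;
    throw std::invalid_argument("unsupported operand types for -");
}
```

Here subtract(Number(10.0), Celsius(4.0)) falls through to Celsius::rsub because Number::sub declines, which is the deferral step described above.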
Edit: To give a bit of a summary, you have an ambiguous situation, so you really only have three choices:
Give precedence to one side or the other (usually the one on the left). This prevents a class with a right-side overload from hijacking a class with a left-side overload, but not the other way around. (This works best in dynamic languages, as the methods can decide whether they can handle it, and dynamically defer to the other one.)
Make it an error (as #dave is suggesting in his answer). If there is ever more than one viable choice, it is a compiler error. (This works best in static languages, where you can catch this thing in advance.)
Only allow the left-most class to define operator overloads, as with member operators in C++. (Then your class B would be illegal.)
The only other option is to introduce a complex system of precedence to the operator overloads, but then you said you want to reduce the cognitive overhead.
I'm going to answer this question by saying "duh, don't make ambiguous definitions".
If I recreate your example in C++ (using a function f instead of the + operator and int/float instead of A/B, but there really isn't much difference)...
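#include <iostream>

template<class t>
void f(int a, t b)      // matches f(1, 1.0f) with t = float
{
    std::cout << "me! me! me!";
}
template<class t>
void f(t a, float b)    // matches f(1, 1.0f) with t = int
{
    std::cout << "no, me!";
}
int main()
{
    f(1, 1.0f);         // intentionally ambiguous: both templates match equally well
    return 0;
}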
...the compiler will tell me precisely that: error C2668: 'f' : ambiguous call to overloaded function
If you create a language powerful enough, it's always going to be possible to create things in it that don't make sense. When this happens, it's probably ok to just throw up your hands and say "this doesn't make sense".
In C++, for a member operator, a op b means a.op(b), so it's unambiguous; the order settles it. If, in C++, you want to define an operator whose left operand is a built-in type, then the operator has to be a global function with two arguments, not a member; again, though, the order of the operands determines which function to call. It is illegal to define an operator where both operands are of built-in types.
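A minimal sketch of the two forms, using a made-up Meters type (not from any library) so that the operand order visibly matters:

```cpp
#include <cassert>

struct Meters {
    double v;
    // Member form: Meters is always the left operand (a - b means a.operator-(b)).
    Meters operator-(const Meters& rhs) const { return Meters{v - rhs.v}; }
};

// Non-member form: required when the left operand is a built-in type.
Meters operator-(double lhs, const Meters& rhs) { return Meters{lhs - rhs.v}; }

void demo_operand_order() {
    Meters a{5.0};
    Meters b{2.0};
    assert((a - b).v == 3.0);     // calls the member operator
    assert((10.0 - b).v == 8.0);  // calls the non-member operator
}
```

An expression such as b - 10.0 would not compile with only these two overloads, because neither of them accepts a Meters on the left and a double on the right.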
I would suggest that given X + Y, the compiler should look for both X.op_plus(Y) and Y.op_added_to(X); each implementation should include an attribute indicating whether it should be a 'preferred', 'normal', or 'fallback' implementation, and optionally also indicating that it is "common". If both implementations are defined, and the implementations are of different priorities (e.g. "preferred" and "normal"), use the type to select a preference. If both are defined to be of the same priority, and both are "common", favor the X.op_plus(Y) form. If both are defined with the same priority and they are not both "common", flag an error.
I would suggest that the ability to prioritize overloads and conversions would IMHO be a very important feature for a language to have. It is not helpful for languages to squawk about ambiguous overloads in cases where both candidates would do the same thing, but languages should squawk in cases where two possible overloads would have different meanings, each of which would be useful in certain contexts. For example, given someFloat==someDouble or someDouble==someLong, a compiler should squawk, since there can be usefulness to knowing whether the numerical quantities represented by two values match, and there can also be usefulness in knowing whether the left-hand operand holds the best possible representation (for its type) of the value in the right-hand operand. Java and C# do not flag ambiguity in either case, opting instead to use the first meaning for the first expression and the second for the second, even though either meaning might be useful in either case. I would suggest that it would be better to reject such comparisons than to have them implement inconsistent semantics.
Overall, I'd suggest as a philosophy that a good language design should let a programmer indicate what's important and what isn't. If a programmer knows that certain "ambiguities" aren't problems, but other ones are, it should be easy to have the compiler flag the latter but not the former.
Addendum
I looked briefly through your proposal; it seems you're expecting bindings to be fully dynamic. I've worked with a language like that (HyperTalk, circa 1988) and it was "interesting". Consider, for example, that "2X" < "3" < 4 < 10 < "11" < "2X". Double dispatch can sometimes be useful, but only in cases where operator overloads with different semantics (e.g. string and numeric comparisons) are limited to operating on disjoint sets of things. Forbidding ambiguous operations at compile time is a good thing, since the programmer will be in a position to specify what's intended. Having such ambiguity trigger a run-time error is a bad thing, because the programmer may be long gone by the time an error surfaces. Consequently, I really can't offer any advice for how to do run-time double dispatch for operators except to say "don't", unless at compile time you restrict the operands to combinations where any possible overload would always have the same semantics.
For example, if you had an abstract "immutable list of numbers" type, with a member to report the length or return the number at a particular index, you could specify that two instances are equal if they have the same length and for every index they return the same number. While it would be possible to compare any two instances for equality by examining every item, that could be inefficient if e.g. one instance was a "BunchOfZeroes" type which simply held an integer N=1000000 and didn't actually store any items, and the other was an "NCopiesOfArray" which held N=500000 and {0,0} as the array to be copied. If many instances of those types are going to be compared, efficiency could be improved by having such comparisons invoke a method which, after checking overall array length, checks whether the "template" array contains any non-zero elements. If it doesn't, then it can be reported as equal to the bunch-of-zeroes array without having to perform 1,000,000 element comparisons. Note that the invocation of such a method by double dispatch would not alter the program's behavior; it would merely allow it to execute more quickly.

subscript operator postfix

The C++ standard defines the expression using subscripts as a postfix expression. AFAIK, this operator always takes two arguments (the first is the pointer to T and the other is the enum or integral type). Hence it should qualify as a binary operator.
However, MSDN and IBM do not list it as a binary operator.
So the question is, what is the subscript operator? Is it unary or binary? For sure, it is not unary, as it is not mentioned in $5.3 (at least not straight away).
What does it mean when the Standard mentions its usage in the context of a postfix expression?
I'd tend to agree with you in that operator[] is a binary operator in the strictest sense, since it does take two arguments: a (possibly implicit) reference to an object, and a value of some other type (not necessarily enumerated or integral). However, since it is a bracketing operator, you might say that the sequence of tokens [x], where x might be any valid subscript-expression, qualifies as a postfix unary operator in an abstract sense; think currying.
Also, you cannot overload a global operator[](const C&, size_t), for example. The compiler complains that operator[] must be a nonstatic member function.
You are correct that operator[] is a binary operator but it is special in that it must also be a member function.
Similar to operator()
You can read up on postfix expressions here
I just found an interesting article about operator[] and postfix expression, here
I think it's the context that [] is used in that counts. In section 5.2.1 the symbol [] is used in the context of a postfix expression that 'is identical (by definition) to *((E1)+(E2))'. In this context, [] isn't an operator. In section 13.5.5 it's used to mean the subscripting operator. In this case it's an operator that takes one argument. For example, if I wrote:
x = a[2];
It's not necessarily the case that the above statement evaluates to:
x = *(a + 2);
because 'a' might be an object. If a is of an object type then in this context, [] is used as a subscript operator.
Anyway that's the best explanation I can derive from the standard that resolves apparent contradictions.
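The two contexts can be put side by side in a short sketch. Squares is a hypothetical class invented for the example; the array case uses only built-in semantics.

```cpp
#include <cassert>
#include <cstddef>

// User-defined subscripting: operator[] must be a non-static member function.
class Squares {
public:
    int operator[](std::size_t i) const { return static_cast<int>(i * i); }
};

int builtin_subscript() {
    int a[4] = {10, 20, 30, 40};
    // For arrays and pointers, a[2] is by definition *(a + 2).
    assert(a[2] == *(a + 2));
    return a[2];
}

int overloaded_subscript() {
    Squares s;
    // For class types, s[3] is a call to s.operator[](3); no pointer arithmetic happens.
    assert(s[3] == s.operator[](3));
    return s[3];
}
```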
If you take a close look at http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B, it explains that standard C++ recognizes operator[] as a binary operator, as you said.
operator[] is, generally speaking, binary; although it may be possible to make it unary, it should always be used as a binary operator inside a class, not least because it makes no sense outside a class.
It is well explained in the link I provided.
Notice that programmers sometimes overload operators without thinking too much about what they are doing, and sometimes do so incorrectly; the compiler is lenient about this and accepts it, but it was probably not the correct way to overload that operator.
Following guides like the one I linked is a good way to do things correctly.
So always beware of examples where operators are overloaded without following good practice (outside the standard); refer first to the standard methods, and use examples that comply with them.

Are . and -> in C and C++ really operators?

You have probably been taught, or have taught yourselves, that . and -> are operators which retrieve members of a struct (C) or class (C++).
However, I doubt they are operators, because if they are operators, what are their input types? Furthermore, the identifiers on both sides are interdependent, a feature which, for example, the + operator lacks.
If this is correct, in what sense are these still labeled as operators in practice, and what is their formal definition with regard to language theory?
You assume that the only types which can be passed as arguments to an operator are types that can be defined within the language.
I would argue that any type which can be recognised by the compiler may be passed as an argument, including internal types such as "identifier". The operator will have two arguments in its AST representation, which is enough to allow you to define semantics.
Another argument is that language theory may provide one set of definitions for your vocabulary, but it isn't the only one.
For example, an operator may be a man who works a machine. That definition has no relevance to programming theory, but it won't stop me using it for keywords in a domain-specific language expressing something to do with machine operating. Similarly, the term "operator" has a wider definition in mathematics than that which is specific to programming theory, and that definition isn't invalidated simply by working with a programming language.
To put it another way - if you didn't call it an operator, what would you call it?
EDIT
To clarify, my first argument is referring to the syntax for using the operator (the call). These operators have right arguments which are identifiers - member names - which the C++ language cannot express using a data type. The C++ language does have member pointers, but they aren't the same thing as the members - just as a variable isn't the same as a pointer to that variable.
I assume that is what the question referred to. The right parameter of those operators has a type which cannot be expressed or manipulated normally in the language.
What happens when that syntax is mapped to an overloaded operator-> function is a different thing. The function isn't the operator - it's only how the operator gets implemented.
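To make the "member pointers aren't the members themselves" point concrete, here is a small sketch with made-up names (S, read_member). A pointer-to-member is an ordinary value that can be passed around and applied later with .*, whereas the name x in s.x can only ever appear literally in the source.

```cpp
#include <cassert>

struct S {
    int x;
    int y;
};

// 'which' is a run-time value selecting a member; it is not the member name itself.
int read_member(const S& s, int S::* which) {
    return s.*which;
}

void demo_member_pointer() {
    S s{1, 2};
    int S::* pm = &S::x;      // pointer-to-member: expressible as a C++ value
    assert(s.*pm == s.x);     // same object reached, but by different mechanisms
    assert(read_member(s, &S::y) == 2);
}
```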
I think the fact that you can overload the -> operator using the "operator" keyword should be a dead giveaway.
Smart pointers do it pretty often:
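template<class T>
struct myPtr {
    explicit myPtr(T *ptr) : m_ptr(ptr) {}   // added so that m_ptr is initialized
    T *operator ->() { return m_ptr; }       // built-in -> is then applied to the result
private:
    T *m_ptr;
};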
The . is not overloadable, but is also an operator by definition.
Hmmm...sizeof is an operator, what is its input type? I don't think the question is useful for distinguishing operators from non-operators in this context.
And that would be because what "operator" means in the context of a programming language is exactly what the author of the language says it means. Shades of Lewis Carroll here.
This reference says they're both operators in C++:
http://www.cplusplus.com/doc/tutorial/operators/
Is that not authoritative enough?
You can overload the -> operator: Wikipedia. That page also states that you can't overload dot. There's an example of -> overloading here:
class String // this is handle
{
...
Stringrep *operator -> () const { return b_; }
private:
Stringrep *b_;
};
The arrow works on the value to the left of the arrow and returns whatever the left hand side is "holding inside". Think of a smart pointer.
The C++03 standard refers to both as operators.
Example:
...after the . operator applied to an expression of the type of its class...
If you are not comfortable with that terminology you can use the term punctuator for the . token.
Online C standard (n1256):
6.5.2.3 Structure and union members
Constraints
1 The first operand of the . operator shall have a qualified or unqualified structure or union type, and the second operand shall name a member of that type.
2 The first operand of the -> operator shall have type ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of the type pointed to.
They are operators, and their input types are specified by the standard.
Haha, I know people have already said this in a roundabout way, but just to say it directly: in C terms, label->member is actually shorthand for (*label).member. That being said, . is the operator which references elements in a struct. Therefore, -> references an element through a pointer to a struct.
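The equivalence for plain pointers is easy to check. Point and the two helper functions below are names made up for this sketch; the code is valid as both C-style and C++ usage of the built-in operators.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;
};

// The built-in arrow applied to a plain pointer...
int via_arrow(const Point* p) { return p->x; }

// ...is the same as dereferencing first and then using the dot.
int via_deref_dot(const Point* p) { return (*p).x; }

void demo_arrow_equivalence() {
    Point pt = {3, 4};
    assert(via_arrow(&pt) == via_deref_dot(&pt));
}
```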

c++ shorthand operator-> operator()

Suppose I have:
Foo foo;
is there a shorthand for this?
foo.operator->().operator()(1, 2);
Well... Yes. The shorter form would look as
foo.operator->()(1, 2)
As for eliminating the operator -> part... From the information you supplied so far it is impossible to say, but if it is implemented the way I can guess it is implemented (judging from your expression), then you can't eliminate it.
In C++ the use of overloaded -> operator in an expression is interpreted as a chain of repetitive overloaded -> calls, which eventually ends in a built-in -> invocation. This means that at some point the overloaded -> must return a pointer. Your overloaded -> obviously doesn't return a pointer. So, in order to use it you have no other choice but to spell it out explicitly as operator ->().
Assuming you actually meant foo.operator->().operator()(1, 2), and that you have control over the class Foo, a simpler form would be (*foo)(1, 2). It requires operator* to be defined, though, but since we usually expect foo->bar to be equivalent to (*foo).bar, it seems reasonable.
If your Foo is a smart pointer class of some sort, which points to an object which defines an operator(), this would be the most concise way of calling the object's operator().
But without more detail (and without you providing an expression that's actually valid C++ -- there's no way in which operator(1, 2) as you wrote it can be valid), it's impossible to answer your question. I'm just guessing at what you're trying to do.
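As a sketch of that guess only: assume Foo is a small smart-pointer-like class whose operator-> returns a plain pointer to a callable object. Adder and this particular Foo are hypothetical stand-ins, since the question never showed the real class.

```cpp
#include <cassert>

// Hypothetical callable that Foo points to.
struct Adder {
    int operator()(int a, int b) const { return a + b; }
};

// Hypothetical Foo: a minimal smart-pointer-like wrapper around an Adder.
class Foo {
public:
    explicit Foo(Adder* p) : p_(p) {}
    Adder* operator->() const { return p_; }
    Adder& operator*() const { return *p_; }
private:
    Adder* p_;
};

int call_forms() {
    Adder adder;
    Foo foo(&adder);
    int long_form = foo->operator()(1, 2);  // arrow, then an explicit operator() call
    int short_form = (*foo)(1, 2);          // dereference, then an ordinary call
    assert(long_form == short_form);
    return short_form;
}
```

If the real operator-> returns a class object rather than a pointer, as the original expression suggests, the explicit foo.operator->()(1, 2) spelling is still needed.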
Well, no, but, assuming you have write permissions to the class, you could define another member function that calls operator(), and then you'd have something like:
foo->myfunc(1,2);
That you find yourself in this position is a sign that you (or the person who wrote this class) are being a bit too cute with operator overloading.