I was looking at the C++ standard regarding member reference operators (the unary * dereferencing operator, the -> member accessor operator) as well as many other related questions:
C++ - Difference between (*). and ->?
ptr->hello(); /* VERSUS */ (*ptr).hello();
C++ pointers difference between * and ->
I saw that most answers stated that p->m is syntactic sugar for (*p).m as defined by the C++ Standard (5.2.5, paragraph 2):
The expression E1->E2 is converted to the equivalent form (*(E1)).E2
Many comments also noted that because operator* and operator-> are overloadable in classes, they should be overloaded uniformly to ensure consistent behavior.
These statements seem to contradict each other: if (as per the standard) E1->E2 is converted to the equivalent form (*(E1)).E2, then what would be the purpose of overloading operator-> (as is permitted by the standard)?
Put more simply, are these two parts of the Standard in conflict, or am I misunderstanding the Standard?
Does the E1->E2 equivalence transformation to (*(E1)).E2 apply to all complete types or only to built-in ones?
The conversion from E1->E2 to (*(E1)).E2 applies only to raw pointer types. For class types, E1->E2 evaluates to (E1).operator->()->E2, which can expand recursively into further operator-> calls if the return type of operator-> is not itself a pointer type. You can see this by creating a type that supports operator* but not operator-> and trying to use the arrow operator on it; you'll get an error that operator-> is undefined.
As a follow-up, it's common to implement operator -> in terms of operator * in a way that makes the semantics of -> match the semantics for pointers. You often see things like this:
PointerType ClassType::operator-> () const {
    return &**this;
}
This expression is interpreted as
&(*(*this)),
meaning "take this object (*this), dereference it (*(*this)), and get the address of what you find (&(*(*this)))." Now, if you use the rule that E1->E2 should be equivalent to (*(E1)).E2, you can see that you end up with something equivalent.
I was reading Effective C++: 55 Specific Ways to Improve Your Programs and Designs by Scott Meyers and he stated:
Having a function return a constant value is generally inappropriate, but sometimes doing so can reduce the incidence of client errors without giving up safety or efficiency. For example, consider the declaration of the operator* function:
class Rational { ... };
const Rational operator*(const Rational& lhs, const Rational& rhs);
According to Meyers, doing this prevents "atrocities" like the following, which would be illegal if a and b were primitive types:
Rational a, b, c;
...
(a * b) = c;
This confused me, and while trying to understand why the above assignment is illegal for primitive types but not for user-defined types, I came across rvalues and lvalues.
I still feel I don't have a strong grasp of what rvalues and lvalues are after looking through some SO questions, but here's my basic understanding: an lvalue references a location in memory and thus can be assigned to (it can appear on either side of the = operator); an rvalue, however, cannot be assigned to because it does not reference a memory location (e.g. temporary values from function returns, and literals).
My question is: why is assigning to a product of two numbers/objects legal for user-defined types (even though it does not make sense) but not for primitives? Does it have to do with return types? Does the overloaded * operator return an assignable value or a temporary value?
[expr.call]/14: A function call is an lvalue if the result type is an lvalue reference type or an rvalue reference to function type, an xvalue if the result type is an rvalue reference to object type, and a prvalue otherwise.
This makes sense, since the result doesn't "have a name". If you returned a reference, the implication would be that it is a reference to some object somewhere that does "have a name" (which is, generally but not always, true).
Then there's this:
[expr.ass]/1: The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
This is saying that an assignment requires an lvalue on the left hand side. So far so good; you've covered this yourself.
How come a non-const function call result works then?
By a special rule!
[over.oper]/8: [..] Some predefined operators, such as +=, require an operand to be an lvalue when applied to basic types; this is not required by operator functions.
… and = applied to an object of class type invokes an operator function.
I can't readily answer the "why": on the surface of it, it made sense to relax this restriction when dealing with classes, and the original (inherited) restriction on built-ins always seemed a little excessive (in my opinion) but would have had to be kept for compatibility reasons.
But then you have people like Meyers pointing out that it now becomes useful (sort of) to return const values to effectively "undo" this change.
Ultimately I wouldn't try too hard to find a strong rationale either way.
Why is
int main()
{
    int i = 0;
    ++++i;
}
valid C++ but not valid C?
C and C++ say different things about the result of prefix ++. In C++:
[expr.pre.incr]
The operand of prefix ++ is modified by adding 1. The operand shall be
a modifiable lvalue. The type of the operand shall be an arithmetic
type other than cv bool, or a pointer to a completely-defined object
type. The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field. The expression ++x is
equivalent to x+=1.
So ++ can be applied on the result again, because the result is basically just the object being incremented and is an lvalue. In C however:
6.5.3 Unary operators
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
The value of the operand of the prefix ++ operator is incremented. The
result is the new value of the operand after incrementation.
The result is not an lvalue; it's just the pure value of the incrementation. So you can't apply any operator that requires an lvalue on it, including ++.
If you are ever told that C and C++ are a superset or subset of each other, know that this is not the case. There are many differences that make that assertion false.
In C, it's always been that way, possibly because pre-increment ++ can be optimised to a single machine-code instruction on many CPUs, including ones from the 1970s, which is when the ++ concept was developed.
In C++ though there's the symmetry with operator overloading to consider. To match C, the canonical pre-increment ++ would need to return const &, unless you had different behaviour for user-defined and built-in types (which would be a smell). Restricting the return to const & is a contrivance. So the return of ++ gets relaxed from the C rules, at the expense of increased compiler complexity in order to exploit any CPU optimisations for built-in types.
I assume you understand why it's fine in C++ so I'm not going to elaborate on that.
For whatever it's worth, here's my test result:
t.c:6:2: error: lvalue required as increment operand
++ ++c;
^
Regarding CppReference:
Non-lvalue object expressions
Colloquially known as rvalues, non-lvalue object expressions are the expressions of object types that do not designate objects, but rather values that have no object identity or storage location. The address of a non-lvalue object expression cannot be taken.
The following expressions are non-lvalue object expressions:
all operators not specified to return lvalues, including
increment and decrement operators (note: pre- forms are lvalues in C++)
And Section 6.5.3.1 from n1570:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation.
So in C, the result of the prefix increment and prefix decrement operators is not required to be an lvalue, and thus cannot be incremented again. In fact, such wording can be understood as "required to be an rvalue".
The other answers explain the way that the standards diverge in what they require. This answer provides a motivating example in the area of difference.
In C++, you can have a function like int& foo(int&);, which has no analog in C. It is useful (and not onerous) for C++ to have the option of foo(foo(x));.
Imagine for a moment that operations on basic types were defined somewhere, e.g. int& operator++(int&);. ++++x itself is not a motivating example, but it fits the pattern of foo above.
Since dereferencing nullptr (NULL) is undefined behavior in both C and C++, I am wondering whether the expression &(*ptr) is valid if ptr is nullptr (NULL).
If it is also undefined behavior, how does the OFFSETOF macro in the linked answer work?
I always thought that ptr->field is shorthand for (*ptr).field.
I think the answer to my question is similar in C and C++.
TL;DR &(*(char*)0) is well defined.
The C++ standard doesn't say that indirection through a null pointer has UB by itself. From the current standard draft, [expr.unary.op]:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”. [snip]
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. [snip]
There is no UB unless the lvalue of the indirection expression is converted to an rvalue.
The C standard is much more explicit. C11 standard draft §6.5.3.2
The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.
If it is also an undefined behavior, how does offsetof work?
Prefer using the standard offsetof macro. Home-grown versions result in compiler warnings. Moreover:
offsetof is required to work as specified above, even if unary operator& is overloaded for any of the types involved. This cannot be implemented in standard C++ and requires compiler support.
In GCC, offsetof is implemented with a compiler built-in (__builtin_offsetof).
In C++, you can double up the indirection operator:
vector<unique_ptr<string>> arr;
arr.push_back(make_unique<string>("Test"));
cout << **arr.begin() << endl;
But you can't double up the arrow operator:
cout << arr.begin()->->c_str() << endl;
Instead, you have to settle with this (IMO) less-legible alternative:
cout << (*arr.begin())->c_str() << endl;
operator-> is a unary operator that returns a pointer type, so it seems natural to be able to chain them. Is there any good reason for this limitation? Is there some parsing difficulty I'm not seeing?
Edit
In 5.2.5/3, the C++ standard specifies:
If E1 has the type “pointer to class X,” then the expression E1->E2 is
converted to the equivalent form (*(E1)).E2
I just wish it was specified as:
If E1 has the type “pointer to class X,” then the expression E1-> is
converted to the equivalent form (*(E1)).
It actually seems contrary for this definition to include E1 and E2, since an overloaded operator-> isn't a binary operator.
Here is a not too technical explanation.
-> is shorthand for the * and the . in (*someptr).memberfunc(). Therefore this can be expressed as someptr->memberfunc().
Two -> would be, in your example, the same as (*(*arr.begin()).).c_str(). Notice the extra dot. This doesn't make sense and it doesn't compile, since . is a binary operator, and * is a unary operator. Therefore, you would have an "extra" dot. You really want two *'s and only one .. Use one -> and one * as you have done.
-> means "dereference and get a member." You want to dereference twice, and get a member once, so double -> is not what you want.
Note that:
(*a).b
a->b
are the same thing, so
a->->b
(*a)->b
(*(*a)).b
would all mean the same thing. So ->-> would be workable as an operator, but it isn't in the spirit of ->, which is to access members of the structure being pointed to. I'd rather type a->b than (*a).b.
So while there is no technical reason against it, (*a)->b tells you "a is a pointer to a pointer to a structure with a member b", and a->b->c is totally different from a->->b even though they look similar.
If I understand this correctly, you can get the behaviour you are looking for with a better design.
The dereference operator doesn't just retrieve a value from a pointer; it offers slightly more complex behaviour.
The business logic of the -> operator is clearly illustrated here, it is also being expanded here, here and you can find a bird's-eye view of pointer-related operators here.
As you can easily guess, if you have T->t you can use the drill-down behaviour to your advantage, assuming that you have properly designed and defined T and its own -> operator.
This is a solution that can easily bring some polymorphic behaviour to your application.
In C++, the pre-increment operator yields an lvalue because the incremented object itself is returned, not a copy.
But in C, it yields an rvalue. Why?
C doesn't have references. In C++, ++i returns a reference to i (an lvalue), whereas in C it returns a copy (the incremented value).
C99 6.5.3.1/2
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).
‘‘value of an expression’’ <=> rvalue
As for the historical reason, I think "references not being part of C" is a plausible one.
C99 says in the footnote (of section §6.3.2.1),
The name ‘‘lvalue’’ comes originally from the assignment expression E1 = E2, in which the left operand E1 is required to be a (modifiable) lvalue. It is perhaps better considered as representing an object ‘‘locator value’’. What is sometimes called ‘‘rvalue’’ is in this International Standard described as the ‘‘value of an expression’’.
Hope that explains why ++i in C returns an rvalue.
As for C++, I would say it depends on the object being incremented. If the object's type is a user-defined type, then it may always return an lvalue. That means you can always write i++++++++ or ++++++i if the type of i is Index as defined here:
Undefined behavior and sequence points reloaded
Off the top of my head, I can't imagine any useful statements that could result from using a pre-incremented variable as an lvalue. In C++, due to the existence of operator overloading, I can. Do you have a specific example of something that you're prevented from doing in C, due to this restriction?