Operator overloading with memory allocation? - c++

The sentence below is from The Positive Legacy of C++ and Java by Bruce Eckel, about operator overloading in C++:
C++ has both stack allocation and heap
allocation and you must overload your
operators to handle all situations and
not cause memory leaks. Difficult
indeed.
I do not understand how operator overloading has anything to do with memory allocation. Can anyone please explain how they are correlated?

I can imagine a couple possible interpretations:
First, in C++ new and delete are both actually operators; if you choose to provide custom allocation behavior for an object by overloading these operators, you must be very careful in doing so to ensure you don't introduce leaks.
Second, some types of objects require that you overload operator= to avoid memory management bugs. For example, if you have a reference counting smart pointer object (like the Boost shared_ptr), you must implement operator=, and you must be sure to do so correctly. Consider this broken example:
template <class T>
class RefCountedPtr {
public:
    RefCountedPtr(T *data) : mData(data) { mData->incrRefCount(); }
    ~RefCountedPtr() { mData->decrRefCount(); }
    RefCountedPtr<T>& operator=(const RefCountedPtr<T>& other) {
        mData = other.mData;
        return *this;
    }
    ...
protected:
    T *mData;
};
The operator= implementation here is broken because it doesn't manage the reference counts on mData and other.mData: it does not decrement the reference count on mData, leading to a leak; and it does not increment the reference count on other.mData, leading to a possible memory fault down the road because the object being pointed to could be deleted before all the actual references are gone.
Note that if you do not explicitly declare your own operator= for your classes, the compiler will provide a default implementation which has behavior identical to the implementation shown here -- that is, completely broken for this particular case.
So as the article says -- in some cases you must overload operators, and you must be careful to handle all situations correctly.
EDIT: Sorry, I didn't realize that the reference was an online article, rather than a book. Even after reading the full article it's not clear what was intended, but I think Eckel was probably referring to situations like the second one I described above.

new and delete are actually operators in C++ which you can override to provide your own customized memory management. Take a look at the example here.

Operators are functions. Just because they add syntactic sugar does not mean you need not be careful with memory. You must manage memory as you would with any other member/global/friend function.
This is particularly important when you implement a wrapper pointer class.
Then there is string concatenation, done by overloading operator+ or operator+=. Take a look at the basic_string template for more information.

If you are comparing operator overloading between Java and C++, you wouldn't be talking about new and delete - Java doesn't expose enough memory management detail for new, and doesn't need delete.
You can't overload the other operators for pointer types - at least one argument must be a class or enumerated type, so he can't be talking about providing different operators for pointers.
So operators in C++ operate on values or const references to values.
It would be very unusual for operators which operate on values or const references to values to return anything other than a value.
Apart from obvious errors common to all C++ functions - returning a reference to a stack-allocated object (which is the opposite of a memory leak), or returning a reference to an object created with new rather than a value (a mistake usually made no more than once in a career before the lesson is learnt) - it would be hard to come up with a scenario where the common operators have memory issues.
So there isn't any need to create multiple versions depending on whether the operands are stack or heap allocated based on normal patterns of use.
The arguments to an operator are objects passed as either values or references. There is no portable mechanism in C++ to test whether an object was allocated on the heap or on the stack. If the object was passed by value, it will always be on the stack. So if there were a requirement to change the behaviour of the operators for the two cases, it would not be possible to do so portably in C++. (On many OSes you could test whether the pointer to the object lies in the address range normally used for the stack or the range normally used for the heap, but that is neither portable nor entirely reliable. Also, even if you could have operators which took two pointers as arguments, there is no reason to believe the objects are heap-allocated just because they are pointers; that information simply does not exist in C++.)
The only duplication you get is for cases such as operator[] where the same operator is used as both an accessor and a mutator. Then it is normal to have a const and a non-const version, so you can set the value if the receiver is not const. That is a good thing - not being able to mutate (the publicly accessible state of) objects which have been marked constant.

Related

How can deleting a void pointer do anything other than invoke the global delete operator?

The C++ standard very clearly and explicitly states that using delete or delete[] on a void-pointer is undefined behavior, as quoted in this answer:
This implies that an object cannot be deleted using a pointer of type void* because there are no objects of type void.
However, as I understand it, delete and delete[] do just two things:
Call the appropriate destructor(s)
Invoke the appropriate operator delete function, typically the global one
There is a single-argument operator delete (as well as operator delete[]), and that single argument is void* ptr.
So, when the compiler encounters a delete-expression with a void* operand, it of course could maliciously do some completely unrelated operation, or simply output no code for that expression. Better yet, it could emit a diagnostic message and refuse to compile, though the versions of MSVS, Clang, and GCC I've tested don't do this. (The latter two emit a warning with -Wall; MSVS with /W3 does not.)
But there's really only one sensible way to deal with each of the above steps in the delete operation:
void* specifies no destructor, so no destructors are invoked.
void is not a type and therefore cannot have a specific corresponding operator delete, so the global operator delete (or the [] version) must be invoked. Since the parameter of that function is void*, no type conversion is necessary, and the operator function must behave correctly.
So, can common compiler implementations (which, presumably, are not malicious, or else we could not even trust them to adhere to the standard anyway) be relied on to follow the above steps (freeing memory without invoking destructors) when encountering such delete expressions? If not, why not? If so, is it safe to use delete this way when the actual type of the data has no destructors (e.g. it's an array of primitives, like long[64])?
Can the global delete operator, void operator delete(void* ptr) (and the corresponding array version), be safely invoked directly for void* data (assuming, again, that no destructors ought to be called)?
A void* is a pointer to an object of unknown type. If you do not know the type of something, you cannot possibly know how that something is to be destroyed. So I would argue that, no, there is not "really only one sensible way to deal with such a delete operation". The only sensible way to deal with such a delete operation, is to not deal with it. Because there is simply no way you could possibly deal with it correctly.
Therefore, as the original answer you linked to said: deleting a void* is undefined behavior ([expr.delete] §2). The footnote mentioned in that answer remains essentially unchanged to this day. I'm honestly a bit astonished that this is simply specified as undefined behavior rather than making it ill-formed, since I cannot think of any situation in which this could not be detected at compile time.
Note that, starting with C++14, a new expression does not necessarily imply a call to an allocation function. And neither does a delete expression necessarily imply a call to a deallocation function. The compiler may call an allocation function to obtain storage for an object created with a new expression. In some cases, the compiler is allowed to omit such a call and use storage allocated in other ways. This, e.g., enables the compiler to sometimes pack multiple objects created with new into one allocation.
Is it safe to call the global deallocation function on a void* instead of using a delete expression? Only if the storage was allocated with the corresponding global allocation function. In general, you can't know that for sure unless you called the allocation function yourself. If you got your pointer from a new expression, you generally don't know if that pointer would even be a valid argument to a deallocation function, since it may not even point to storage obtained from calling an allocation function. Note that knowing which allocation function must've been used by a new expression is basically equivalent to knowing the dynamic type of whatever your void* points to. And if you knew that, you could also just static_cast<> to the actual type and delete it…
Is it safe to deallocate the storage of an object with trivial destructor without explicitly calling the destructor first? Based on, [basic.life] §1.4, I would say yes. Note that, if that object is an array, you might still have to call the destructors of any array elements first. Unless they are also trivial.
Can you rely on common compiler implementations to produce the behavior you deem reasonable? No. Having a formal definition of what exactly you can rely on is literally the whole point of having a standard in the first place. Assuming you have a standard-conforming implementation, you can rely on the guarantees the standard gives you. You can also rely on any additional guarantees the documentation of a particular compiler may give you, so long as you use that particular version of that particular compiler to compile your code. Beyond that, all bets are off…
If you want to invoke the deallocation function, then just call the deallocation function.
This is good:
void* p = ::operator new(size);
::operator delete(p); // only requires that p was returned by ::operator new()
This is not:
void* p = new long(42);
delete p; // forbidden: static and dynamic type of *p do not match, and static type is not polymorphic
But note, this also is not safe:
void* p = new long[42];
::operator delete(p); // p was not obtained from allocator ::operator new()
While the Standard would allow an implementation to use the type passed to delete to decide how to clean up the object in question, it does not require that implementations do so. The Standard would also allow an alternative (and arguably superior) approach based on having the memory-allocating new store cleanup information in the space immediately preceding the returned address, and having delete implemented as a call to something like:
typedef void (*__cleanup_function)(void*);
void __delete(void *p)
{
    // Fetch the cleanup function stored immediately before *p and invoke it.
    ((__cleanup_function*)p)[-1](p);
}
In most cases, the cost of implementing new/delete in such fashion would be relatively trivial, and the approach would offer some semantic benefit. The only significant downside is that implementations which document the inner workings of their new/delete, and which cannot support a type-agnostic delete, would have to break any code that relies upon their documented inner workings.
Note that if passing a void* to delete were a constraint violation, that would forbid implementations from providing a type-agnostic delete even if they would be easily capable of doing so, and even if some code written for them relies upon such ability. The fact that code relies upon such an ability would make it portable only to implementations that can provide it, of course, but allowing implementations to support such abilities if they choose to do so is more useful than making it a constraint violation.
Personally, I would have liked to see the Standard offer implementations two specific choices:
Allow passing a void* to delete and delete the object using whatever type had been passed to new, and define a macro indicating support for such a construct.
Issue a diagnostic if a void* is passed to delete, and define a macro indicating it does not support such a construct.
Programmers whose implementations supported type-agnostic delete could then decide whether the benefit they could receive from such feature would justify the portability limitations imposed by using it, and implementers could decide whether the benefits of supporting a wider range of programs would be sufficient to justify the small cost of supporting the feature.
void* specifies no destructor, so no destructors are invoked.
That is most likely one of the reasons it's not permitted. Deallocating the memory that backs a class instance without calling the destructor for said class is just all around a really really bad idea.
Suppose, for example, the class contains a std::map that has a few hundred thousand elements in it. That represents a significant amount of memory. Doing what you're proposing would leak all of that memory.
A void doesn't have a size, so the compiler has no way of knowing how much memory to deallocate.
How should the compiler handle the following?
struct s
{
    int arr[100];
};
void* p1 = new int;
void* p2 = new s;
delete p1;
delete p2;

Purpose of assignment operator overloading in C++

I'm trying to understand the purpose of overloading some operators in C++.
Conceptually, an assignment statement can be easily implemented via:
Destruction of the old object followed by copy construction of the new object
Copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object
In fact, often, the copy-and-swap implementation is the implementation of assignment in real code.
Why, then, does C++ allow the programmer to overload the assignment operator, instead of just performing the above?
Was it intended to allow a scenario in which assignment is faster than destruction + construction?
If so, when does that happen? And if not, then what use case was it intended to support?
1) Reference counting
Suppose you have a resource that is ref-counted and it is wrapped in objects.
void operator=(const MyObject& v) {
    this->resource = refCount(v.resource);
}
// Here the internal resource is copied, not the whole object:
MyObject A;
A = B;
2) Or you just want to copy the fields without fancy semantics.
void operator=(const MyObject& v) {
    this->someField = v.someField;
}
// So this statement generates just one function call, not a fancy
// collection of temporaries that require construction and destruction:
MyObject A;
A = B;
In both cases the code can run much faster than destroying and rebuilding the object.
3) Also what about types...
Use the operator to handle assigning other types to your type.
void operator=(const MyObject& v) {
    this->someField = v.someField;
}
void operator=(int x) {
    this->value = x;
}
void operator=(float y) {
    this->floatValue = y;
}
First, note that "destruction of the old object followed by copy construction of the new object" is not exception safe.
But re "copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object": that's the swap idiom for implementing an assignment operator, and it's exception safe if done correctly.
In some cases a custom assignment operator can be faster than the swap idiom. For example, direct arrays of POD type can't really be swapped except by way of lower-level assignments, so there, with the swap idiom, you can expect an overhead proportional to the array size.
However, historically there wasn't much focus on swapping and exception safety.
Bjarne wanted exceptions originally (if I recall correctly), but they didn't get into the language until 1989 or thereabouts. So the original C++ way of programming was more focused on assignments, to the degree that a failing constructor signalled its failure by assigning 0 to this. I think that in those days your question would not have made sense; it was just assignments all over.
Type-wise, some objects have identity, and others have value. It makes sense to assign to value objects, but for identity objects one typically wants to limit the ways the object can be modified. While this doesn't require the ability to customize copy assignment (only to make it unavailable), with that ability one doesn't need any other language support.
And I think likewise for any other specific reason one can think of: probably no such reason really requires the general ability, but the general ability is sufficient to cover them all, so it lowers the overall language complexity.
A good source for a more definitive answer than my hunches, recollections and gut feelings is Bjarne's "The Design and Evolution of C++"; the question probably has a definitive answer there.
Destruction of the old object, followed by copy construction of the new, will not usually work. And the swap idiom is guaranteed not to work unless the class provides a special swap function: std::swap uses assignment in its unspecialized implementation, and using it directly in the assignment operator will lead to endless recursion.
And of course, the user may want to do something special, e.g. make the assignment operator private.
And finally, what is almost certainly an overruling reason: the default assignment operator has to be compatible with C.
Actually, after seeing juanchopanza's answer (which was deleted), I think I ended up figuring it out myself.
Copy-assignment operators allow classes like basic_string to avoid allocating resources unnecessarily when they can re-use them (in this case, memory).
So when you assign to a basic_string, an overloaded copy assignment operator would avoid allocating memory, and would just copy the string data directly to the existing buffer.
If the object had to be destroyed and constructed again, the buffer would have to be reallocated, which would be potentially much more costly for a small string.
(Note that vector could benefit from this too, but only if it knew that the elements' copy constructors would never throw exceptions. Otherwise it would need to maintain its exception safety and actually perform a copy-and-swap.)
It allows you to use assignment of other types as well.
You could have a class Person with an assignment operator that assigns an ID.
But besides that, you don't always want to copy all the members as they are.
The default assignment only does a shallow copy.
For example, if the class contains pointers, or locks, you don't always want to copy them from the other object.
Usually when you have pointers you want a deep copy, perhaps creating a copy of the object that the pointers point to.
And if you have locks, you want them to be specific to the object, and you don't want to copy their state from the other object.
It is actually a common practice to provide your own copy constructor and assignment operator if your class holds pointers as members.
I have used it often as a conversion constructor, but with already existing objects; i.e. assigning a member variable type, etc., to an object.

What are the limitations of overloading, overriding and replacing new/delete?

I understand that there are 3 general ways to modify the behaviour of new and delete in C++:
Replacing the default new/delete and new[]/delete[]
Overriding or overloading the placement versions (overriding the one with a memory location passed to it, overloading when creating versions which pass other types or numbers of arguments)
Overloading class specific versions.
What are the restrictions for performing these modifications to the behaviour of new/delete?
In particular are there limitations on the signatures that new and delete can be used with?
It makes sense if any replacement versions must have the same signature (otherwise they wouldn't be replacement or would break other code, like the STL for example), but is it permissible to have global placement or class specific versions return smart pointers or some custom handle for example?
First off, don't confuse the new/delete expression with the operator new() function.
The expression is a language construct that performs construction and destruction. The operator is an ordinary function that performs memory (de)allocation.
Only the default operators (operator new(size_t) and operator delete(void *)) can be used with the default new and delete expressions. All other forms are summarily called "placement" forms, and for those you can only use new; you have to destroy such objects manually by invoking the destructor. Placement forms are of rather limited and specialised use. By far the most useful placement form is global placement-new, ::new (addr) T, but its behavior cannot even be changed (which is presumably why it's the only popular one).
All new operators must return void *. These allocation functions are far more low-level than you might appreciate, so basically you "will know when you need to mess with them".
To repeat: C++ separates the notions of object construction and memory allocation. All you can do is provide alternative implementations for the latter.
When you overload new and delete within a class you are effectively modifying the way memory is allocated and released for instances of that class, asking the language to give you this control.
This may be done when a class wants to use some kind of pool to allocate its instances, either for optimisation or for tracking purposes.
The restrictions, as with pretty much any operator overload, are the parameter list you may pass and the behaviour the overload is expected to adhere to.

references in C++

Once I read in a statement that
The language feature that "sealed the
deal" to include references is
operator overloading.
Why are references needed to effectively support operator overloading?? Any good explanation?
Here's what Stroustrup said in "The Design and Evolution of C++" (3.7 "references"):
References were introduced primarily to support operator overloading. ...
C passes every function argument by value, and where passing an object by value would be inefficient or inappropriate the user can pass a pointer. This strategy doesn't work where operator overloading is used. In that case, notational convenience is essential because users cannot be expected to insert address-of operators if the objects are large. For example:
a = b - c;
is acceptable (that is, conventional) notation, but
a = &b - &c;
is not. Anyway, &b - &c already has a meaning in C, and I didn't want to change that.
An obvious example would be the typical overload of ">>" as a stream extraction operator. To work as designed, this has to be able to modify both its left- and right-hand arguments. The right has to be modified, because the primary purpose is to read a new value into that variable. The left has to be modified to do things like indicating the current status of the stream.
In theory, you could pass a pointer as the right-hand argument, but to do the same for the left argument would be problematic (e.g. when you chain operators together).
Edit: it becomes more problematic for the left side partly because the basic syntax of overloading is that x#y (where "#" stands for any overloaded operator) means x.operator#(y). Now, if you change the rules so you somehow turn that x into a pointer, you quickly run into another problem: for a pointer, a lot of those operators already have a valid meaning separate from the overload. E.g., if I translate x+2 as somehow magically working with a pointer to x, then I've produced an expression that already has a meaning completely separate from the overload. To work around that, you could (for example) decide that for this purpose you'd produce a special kind of pointer that didn't support pointer arithmetic. Then you'd have to deal with x=y - so the special pointer becomes one that you can't modify directly either, and any attempt at assigning to it ends up assigning to whatever it points at instead.
We've only restricted them enough to support two operator overloads, but our "special pointer" is already about 90% of the way to being a reference with a different name...
References constitute a standard means of specifying that the compiler should handle the addresses of objects as though they were objects themselves. This is well-suited to operator overloading because operations usually need to be chained in expressions; to do this with a uniform interface, i.e., entirely by reference, you often would need to take the address of a temporary variable, which is illegal in C++ because pointers make no guarantee about the lifetimes of their referents. References, on the other hand, do. Using references tells the compiler to work a certain kind of very specific magic (with const references at least) that preserves the lifetime of the referent.
Typically when you're implementing an operator you want to operate directly on the operand -- not a copy of it -- but passing a pointer risks that you could delete the memory inside the operator. (Yes, it would be stupid, but it would be a significant danger nonetheless.) References allow for a convenient way of allowing pointer-like access without the "assignment of responsibility" that passing pointers incurs.

Why does c++ have its separate syntax for new & delete?

Why can't it just be regular function calls? New is essentially:
malloc(sizeof(Foo));
Foo::Foo();
While delete is
Foo::~Foo();
free(...);
So why does new/delete end up having its own syntax rather than being regular functions?
Here's a stab at it:
The new operator calls the operator new() function. Similarly, the delete operator calls the operator delete() function (and similarly for the array versions).
So why is this? Because the user is allowed to override operator new() but not the new operator (which is a keyword). You override operator new() (and delete) to define your own allocator; however, you are not responsible (or allowed, for that matter) for calling the appropriate constructors and destructors. These functions are called automatically by the compiler when it sees the new keyword.
Without this dichotomy, a user could override the operator new() function, but the compiler would still have to treat this as a special function and call the appropriate constructor(s) for the object(s) being created.
You can overload operator new and operator delete to provide your own allocation semantics. This is useful when you want to bypass the default heap allocator's behavior. For example if you allocate and deallocate a lot of instances of a small, fixed-size object, you may want to use a pool allocator for its memory management.
Having new and delete as explicit operators like other operators makes this flexibility easier to express using C++'s operator overloading mechanism.
For auto objects on the stack, allocation/constructor call and deallocation/destructor calls basically are transparent as you request. :)
'Cause there is no way to provide compile-time type safety with a function (malloc() returns void*, remember). Additionally, C++ tries to eliminate even the slightest chance of allocated-but-uninitialized objects floating around. And there are objects out there without a default constructor - for these, how would you feed constructor arguments to a function? A function like this would require too much special-case handling; easier to promote it to a language feature. Thus operator new.
'new/delete' are keywords in the C++ language (like 'for' and 'while'), whereas malloc/calloc are function calls in the standard C library (like 'printf' and 'sleep'). Very different beasts, more than their similar syntax may let on.
The primary difference is that 'new' and 'delete' trigger additional user code - specifically, constructors and destructors. All malloc does is set aside some memory for you to use. When setting aside memory for a simple plain old data (floats or ints, for example), 'new' and 'malloc' behave very similarly. But when you ask for space for a class, the 'new' keyword sets aside memory and then calls a constructor to initialize that class. Big difference.
Why does C++ have separate syntax for greater-than? Why can't it just be a regular function call?
greaterThan(foo, bar);