How can overloaded 'operator new' cause infinite loops? - c++

I was reading a book named
"Hands-On System Programming with C++". It says on page 320 that overloading the new operator can cause infinite loops, so it should be avoided.
These overloads affect all allocations, including those used by the C++ library, so care should be taken when leveraging these overloads as infinite cyclic recursions could occur if an allocation is performed inside these functions. For example, data structures such as std::vector and std::list, or debugging functions such as std::cout and std::cerr cannot be used as these facilities use the new() and delete() operators to allocate memory.
So, how can this piece of code cause an infinite loop, and why should I not use cout and vector with it? This is the piece of code from the book. I tried using vector, cout, and push_back inside the new operator, but I can't replicate the situation. So, when exactly can this happen?
void* operator new(size_t size) {
    if (size > 1000)
        page_counter++;
    return malloc(size);
}

Simply telling a std::vector to allocate some memory in operator new should do it:
void *operator new(std::size_t size) {
    // std::vector<int>::reserve calls std::allocator<int>::allocate, which calls (this) operator new, which calls ...
    std::vector<int>().reserve(999);
    return std::malloc(size);
}
int main() {
    int *p = new int(42);
    // The recursion above never terminates; in practice the stack overflows and the
    // program crashes (formally undefined behavior), so we never reach this line.
    std::cout << "Shouldn't reach this!\n";
}
Godbolt shows it crashing
Note that:
a) It's not enough to just construct a std::vector, because that might not allocate. std::vector usually only allocates when you somehow tell it to: it will expand when you try to add things to it, or you can say "be at least this big" with reserve.
b) You have to call operator new from somewhere to trigger the loop (here it's the new int(42) in main).
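If you really do need to do nontrivial work inside a replacement operator new, one common way to keep it from re-entering itself is to stick to malloc-only facilities and guard the body with a reentrancy flag. The following is only a sketch of that idea (the in_new flag and the fprintf logging are my own illustration, not from the book):

#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (!p)
        throw std::bad_alloc();
    // thread_local flag: if anything inside the guarded region allocates,
    // the nested call skips the region instead of recursing without bound.
    thread_local bool in_new = false;
    if (!in_new) {
        in_new = true;
        std::fprintf(stderr, "allocated %zu bytes at %p\n", size, p); // malloc-only logging, no iostreams
        in_new = false;
    }
    return p;
}

Anything C++-library-based that you place inside the guarded region can then nest at most one level deep instead of recursing forever.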

Related

How do I delete newly allocated char that is also the function return value?

What will happen with array1 after I return it? Will it delete itself or will the space be inaccessible? How do I delete[] it?
char* ToCharArray() {
    stringstream COUT;
    COUT << *day << "." << *month << "." << *year << " " << *hours << *minutes;
    string temp = COUT.str();
    int vel = temp.size() + 1;
    char *array1 = new char[vel];
    strcpy_s(array1, vel, temp.c_str());
    return array1;
}
You delete it as you usually delete arrays in C++: with delete[]:
const char* p = ToCharArray();
// ...
delete[] p;
However, note that the second part of ToCharArray() is pointless. You could return a std::string instead and avoid possible memory leaks:
std::string ToCharArray() {
    stringstream COUT;
    COUT << *day << "." << *month << "." << *year << " " << *hours << *minutes;
    return COUT.str();
}
In modern C++, it's best to avoid raw new/new[] and delete/delete[] as much as possible in favor of standard container classes and smart pointers, and if you do need manual memory management you should try to encapsulate it as much as possible in a container class. You usually shouldn't even need raw owning pointers; if you have a pointer that you don't own, you can annotate that fact with std::experimental::observer_ptr<T> (from the Library Fundamentals TS; it was never merged into the standard proper). In particular, smart pointer types can be used to annotate lifetime expectations: if a function returns a pointer to something it expects the caller to delete when it's done with it, it should return a std::unique_ptr<T>. Similarly, if a function expects to take unique ownership of an object being passed in by pointer and to free the memory when it's done, then it should take that argument as a std::unique_ptr<T>. The given function violates this design principle.
However, there will certainly be times when you'll want to use a third-party library which doesn't follow these design principles, and which you don't want to (or can't) edit the source code to. In those cases, what I like to do is to create smart pointers as soon as possible after getting return values from such API functions; this way, I get the maximum benefit out of following the modern C++ idioms outlined above for my code. In this case, that would look like:
std::unique_ptr<char[]> res { ToCharArray() };
Then when this object goes out of scope, the compiler will generate the code to free the memory for you. (Even if something between this line and the end of the containing block ends up throwing an exception which is not caught in that block.)
Similarly, in the case of a function expecting to take ownership of a pointer, I keep it in the unique_ptr<T> as long as possible, then pass p.release() as that argument of the function.
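For instance, with the ToCharArray() from above, that ownership-transfer case might look like the sketch below; legacy_consume is a hypothetical sink function (not from the question) that takes a raw pointer and is responsible for delete[]-ing it:

#include <memory>

char* ToCharArray();                 // the function from the question
void legacy_consume(char* buffer);   // hypothetical legacy API that takes ownership

void example() {
    std::unique_ptr<char[]> res { ToCharArray() };
    // ... work with res.get(); if anything here throws, the buffer is still freed ...
    legacy_consume(res.release());   // hand ownership over only at the last moment
}

Keeping the pointer inside the unique_ptr until the release() call means the buffer is cleaned up on every early-exit path except the one where the legacy API actually takes it.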
If this is a function which you tend to use often, it might help to create a modern C++ wrapper around the legacy C++ API:
inline std::unique_ptr<char[]> wrap_ToCharArray() {
return std::unique_ptr<char[]> { ToCharArray() };
}
This has benefits such as better exception safety for an expression like f(wrap_ToCharArray(), wrap_ToCharArray()). (Look up the rationale for std::make_unique if you are interested in the details of why this has better exception safety than f(std::unique_ptr<char[]> { ToCharArray() }, std::unique_ptr<char[]> { ToCharArray() }).)

C++ Operator overloading error check without exceptions

I have a class similar to vector that is primarily a dynamically sized array. I am writing it for a resource-limited platform so I am required to not use exceptions.
It has become clear that, to use operator overloading to simplify the interface for this class, dynamic allocation would have to be performed in some of the operator overload functions. The assignment operator (=) is one example.
Without exceptions, though, it becomes rather challenging to inform the caller of a bad allocation error in a sensible way while still retaining strong error safety. I could have an error property of the class which the caller must check after every call that involves dynamic allocation, but this seems like a not-so-optimal solution.
EDIT:
This is the best idea I have got at the moment (highlighted as a not-so-optimal solution in the paragraph above), any improvements would be greatly appreciated:
dyn_arr & dyn_arr::operator=(dyn_arr const & rhs) {
    if (reallocate(rhs.length))          // this does not destroy data on a bad alloc
        error |= bad_alloc;              // set a flag indicating the allocation failed
    else {
        size_t i;
        for (i = 0; i < rhs.length; ++i) // copy the array
            arr[i] = rhs.arr[i];         // assume this won't throw and won't fail
    }
    return *this;
}
then to call:
dyn_arr a = b;
if (a.error)
// handle it...
I haven't compiled this so there might be typos, but hopefully you get the idea.
There are two separate issues going on here.
The first is related to operator overloading. As CashCow mentions, overloaded operators in C++ are just syntactic sugar for function calls. In particular, operators are not required to return *this. That is merely a programming convention, created to facilitate operator chaining.
Now, chaining assignment operators (a = b = c = ...) is quite a corner case in C++ applications. So it's possible that you're better off explicitly forbidding the users of your dyn_arr class from ever chaining assignment operators. That would give you the freedom to instead return an error code from the operator, just like from a regular function:
error_t operator = (dyn_arr const & rhs) {
    void *mem = realloc(...);
    if (mem == NULL) {
        return ERR_BAD_ALLOC; // memory allocation failed
    }
    ...
    return ERR_SUCCESS;       // all ok
}
And then in caller code:
dyn_arr a, b;
if ((a = b) != ERR_SUCCESS) {
// handle error
}
The second issue is related to the actual example you're giving:
dyn_arr a = b;
This example will NOT call the overloaded assignment operator! Instead, it means "construct dyn_arr object a with b as argument to the constructor". So this line actually calls the copy constructor of dyn_arr. If you're interested in understanding why, think in terms of efficiency. If the semantics of that line included calling the assignment operator, the runtime system would have to do two things as a result of this line: construct a with some default state, and then immediately destroy that state by assigning to a the state of b. Instead, doing just one thing - calling the copy constructor - is sufficient. (And it leads to the same semantics, assuming any sane implementations of the copy constructor and the assignment operator.)
Unfortunately, you're right to recognize that this issue is hard to deal with. There does not seem to be a really elegant way of handling failure in constructor, other than throwing an exception. If you cannot do that, either:
set a flag in the constructor and require/suggest the user to check for it afterwards, or
require that a pointer to an already allocated memory area is passed as an argument to the constructor.
For more details, see How to handle failure in constructor in C++?
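One concrete shape the first option can take is a static factory function that performs the allocation before any object reaches the caller, so the constructor itself can never fail. This is only a sketch of mine (the create signature and the int element type are assumptions, not from the question):

#include <cstdlib>

class dyn_arr {
public:
    // Attempts the allocation up front; returns false and leaves 'out' untouched on failure.
    static bool create(std::size_t length, dyn_arr& out) {
        int* p = static_cast<int*>(std::malloc(length * sizeof(int)));
        if (!p)
            return false;              // no exception: the caller checks the return value
        std::free(out.arr);            // release whatever 'out' held before
        out.arr = p;
        out.length = length;
        return true;
    }

    dyn_arr() : arr(NULL), length(0) {}   // never allocates, so it cannot fail
    ~dyn_arr() { std::free(arr); }

private:
    dyn_arr(const dyn_arr&);              // copying disabled in this sketch
    dyn_arr& operator=(const dyn_arr&);

    int* arr;
    std::size_t length;
};

The caller then writes dyn_arr a; if (!dyn_arr::create(100, a)) { /* handle it */ }, which keeps the error check explicit without relying on a post-construction flag.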
Operator overloading has nothing to do with exceptions; it simply allows a "function" to be invoked by means of operators.
e.g. if you were writing your own vector you could implement + to concatenate two vectors, or to add a single item to a vector (as an alias for push_back()).
Of course any operation that requires allocating more memory could run out of it (and you would get bad_alloc and have to manage that, if you cannot throw it, by setting some kind of error state).
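As a rough sketch of that idea (the my_vector name, the int element type, and the error flag are my own, not from the answer), an overloaded operator can simply forward to an ordinary member function and record an error state instead of throwing:

#include <cstddef>
#include <cstdlib>

class my_vector {
public:
    // operator+= is just another spelling of push_back(); on allocation
    // failure it records a flag instead of throwing bad_alloc.
    my_vector& operator+=(int value) {
        if (!push_back(value))
            failed_ = true;
        return *this;
    }
    bool failed() const { return failed_; }

    bool push_back(int value) {
        if (size_ == capacity_) {
            std::size_t new_cap = capacity_ ? capacity_ * 2 : 4;
            int* p = static_cast<int*>(std::realloc(data_, new_cap * sizeof(int)));
            if (!p)
                return false;          // report failure to the caller
            data_ = p;
            capacity_ = new_cap;
        }
        data_[size_++] = value;
        return true;
    }

    ~my_vector() { std::free(data_); }

private:
    int* data_ = nullptr;
    std::size_t size_ = 0, capacity_ = 0;
    bool failed_ = false;
};

The same scheme extends to an operator+ for concatenation, as long as callers remember to check failed() afterwards.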

Access std::map within operator new called by static constructor

1) I have some static objects in my project that allocate memory in their constructors.
class StaticClass
{
public:
    char *var;

    StaticClass()
    {
        var = new char[100];
    }
};
static StaticClass staticClass;
2) I have overridden the new and delete operators and made them keep track of all current allocations in a std::unordered_map
unordered_map<void*, size_t> allocations;

void* operator new[](size_t size)
{
    void *p = malloc(size);
    if (p == 0)                 // did malloc succeed?
        throw std::bad_alloc(); // ANSI/ISO compliant behavior
    allocations[p] = size;
    return p;
}
When my program starts, staticClass's constructor is called before allocations' constructor is, so operator new() tries to insert size into allocations before it has been initialized, which errors.
Previously, when I ran into problems with the order of static construction, I simply changed the std::map into a pointer initialized to NULL, and then initialized it the first time it was used, ensuring it would be valid the first time I inserted into it:
unordered_map<void*, size_t> *allocations = NULL;

// in code called by a static constructor:
if (allocations == NULL)
    allocations = new unordered_map<void*, size_t>();
// now safe to insert into allocations
However, this will no longer work since I would be calling new within operator new(), creating an infinite recursive loop.
I am aware that I could probably solve this by making another special version of operator new that takes some token argument to differentiate it, and just use that to initialize allocations. However, in a more general (learning) sense, I would prefer to somehow either
a) force allocations to initialize before StaticClass does (best)
b) have some way to call the default operator new instead of my overridden one (which I don't think is possible, but...)
c) some other more general solution?
A simple way to avoid initialization order issues is to wrap your static object inside a function:
unordered_map<void*, size_t> &allocations()
{
    static unordered_map<void*, size_t> static_map;
    return static_map;
}
Then use it like this:
void* operator new[](size_t size)
{
    void *p = malloc(size);
    if (p == 0)                 // did malloc succeed?
        throw std::bad_alloc(); // ANSI/ISO compliant behavior
    allocations()[p] = size;
    return p;
}
However, you still run the risk of std::unordered_map using your operator new internally.
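One way around that risk is to give the tracking map an allocator that goes straight to malloc/free, so inserting into it never routes back through the replaced operators. This is a sketch of mine (the malloc_allocator and tracking_map names are not from the answer):

#include <cstdlib>
#include <new>
#include <unordered_map>

// Minimal allocator that bypasses operator new entirely.
template <class T>
struct malloc_allocator {
    using value_type = T;
    malloc_allocator() = default;
    template <class U> malloc_allocator(const malloc_allocator<U>&) {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};
template <class T, class U>
bool operator==(const malloc_allocator<T>&, const malloc_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const malloc_allocator<T>&, const malloc_allocator<U>&) { return false; }

using tracking_map = std::unordered_map<
    void*, std::size_t,
    std::hash<void*>, std::equal_to<void*>,
    malloc_allocator<std::pair<void* const, std::size_t>>>;

tracking_map& allocations()
{
    static tracking_map static_map;   // constructed on first use, allocates via malloc only
    return static_map;
}

With this in place, the map's buckets and nodes come from malloc, so the replaced operator new[] can insert into it without any chance of recursing into itself.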

How do I free pointers inside structures?

I'm new to structures so please bear with me. I wrote a structure called gnt containing an integer pointer, an integer, and a boolean:
struct gnt
{
    unsigned int* num;
    unsigned int size;
    bool negative;
};
Because I am allocating arbitrary-length int arrays to various gnt variables (i.e. k.num = new unsigned int[j] for some value j), I need to free them somehow. I am not sure how to do that. Do I simply use delete[] k.num; (where k is a gnt)? Do I have to worry about the structure itself?
Also, as a side question, I wrote a recursive function to multiply out items in a list:
char* chain_multi(char** list, unsigned int start, unsigned int end)
{
    /***************************************************************
    This function recursively multiplies every item in a list and
    returns the product.
    ***************************************************************/
    unsigned int size = start - end + 1;
    if (size == 1)
        return copy_char(list[end]);
    if (size == 2)
        return multiplication(list[start], list[end]);
    int rs = start - size / 2;
    char* right = chain_multi(list, rs, end);
    char* left = chain_multi(list, start, rs + 1);
    char* product = multiplication(left, right);
    delete[] left;
    delete[] right;
    return product;
}
Will this give any advantage over doing it without recursion? I tested with various-sized lists (between 10 and 10000 entries) and there doesn't seem to be any advantage time-wise... The recursive code is shorter than its counterpart, though.
Thanks for any input.
Since you're using C++, you can put a destructor in the struct to do that automatically for you. There are other ways, but this is the most practical:
struct gnt
{
    unsigned int* num;
    unsigned int size;
    bool negative;

    ~gnt() { delete[] num; }
};
I'd also suggest adding a constructor to make sure that num is null until it's initialized, so the destructor works safely even before that:
struct gnt
{
    unsigned int* num;
    unsigned int size;
    bool negative;

    gnt() : num(NULL) {}
    ~gnt() { delete[] num; }
};
To get safe behavior when instances are assigned, or initialized from one another when created, you need the copy constructor and assignment operator. They should copy the values of all the non-dynamic members, and create a duplicate of num with the same size and contents. In that case, it'd also be recommended to initialize all the members in the constructor, because size should also always have a valid value for that to work. If you don't want to complicate things too much now, just declare them private; this will cause the compiler to bark if you try to do an (unsupported) object assignment or copy:
struct gnt
{
    unsigned int* num;
    unsigned int size;
    bool negative;

    gnt() : num(NULL) {}
    ~gnt() { delete[] num; }

private:
    gnt(const gnt&);
    gnt& operator=(gnt&);
};
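If you do eventually want copying to work, a rough sketch of those two members (my own illustration, not part of the answer) would deep-copy the array:

#include <cstddef>
#include <utility>

struct gnt
{
    unsigned int* num;
    unsigned int size;
    bool negative;

    gnt() : num(NULL), size(0), negative(false) {}
    ~gnt() { delete[] num; }

    // Copy constructor: allocate a fresh array and copy the contents.
    gnt(const gnt& other)
        : num(NULL), size(other.size), negative(other.negative)
    {
        if (other.num) {
            num = new unsigned int[size];
            for (unsigned int i = 0; i < size; ++i)
                num[i] = other.num[i];
        }
    }

    // Assignment via copy-and-swap keeps it simple and exception-safe.
    gnt& operator=(const gnt& other)
    {
        gnt tmp(other);
        std::swap(num, tmp.num);
        std::swap(size, tmp.size);
        std::swap(negative, tmp.negative);
        return *this;
    }
};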
As others suggested, one alternative is to use std::vector instead of a raw pointer. That way, you don't need to worry about deallocations:
struct gnt
{
    std::vector<unsigned int> num;
    unsigned int size;
    bool negative;
};
About the question "do I have to worry about the structure itself?", that depends on how you created its instances. If it was with operator new, yes. If not, they'll be deallocated when going out of scope like any other variable.
Finally, about the recursion: IMO the choice is rarely about code efficiency. You should use recursion only if the code becomes simpler/cleaner AND there is no danger of adverse effects (like stack overflow). If that's not the case, I'd always go for the iterative version.
Follow the Rule:
You should pass the same address to delete[] that you received from new[].
If you allocated only a member on the free store, then you need to deallocate only that.
You allocated the member k.num using new[], so yes, you should call delete[] on it, and only on it.
Also, you can use std::vector instead of doing the memory management yourself (unless this is some crappy assignment which restricts you from doing so).
For Standardese fans:
C++03 Standard, §3.7.4.2/3:
If a deallocation function terminates by throwing an exception, the behavior is undefined. The value of the first argument supplied to a deallocation function may be a null pointer value; if so, and if the deallocation function is one supplied in the standard library, the call has no effect. Otherwise, the value supplied to operator delete(void*) in the standard library shall be one of the values returned by a previous invocation of either operator new(std::size_t) or operator new(std::size_t, const std::nothrow_t&) in the standard library, and the value supplied to operator delete[](void*) in the standard library shall be one of the values returned by a previous invocation of either operator new[](std::size_t) or operator new[](std::size_t, const std::nothrow_t&) in the standard library.
The usual advantage of recursion is simplicity and clarity (and possibly a different approach to a problem), not normally speed. In fact, rather the opposite used to be true: recursive implementations tended to be noticeably slower than iterative ones. Modern hardware has eliminated or drastically reduced that speed differential, but it would still be fairly unusual for a recursive implementation to be faster than an iterative counterpart.
free() is fine on any malloc'ed pointer, regardless of whether it is a structure member or not. If you were writing C code it would be better to use malloc() and free().
Whether to use recursion or not depends on the context. Generally speaking, recursion is OK. Recursion is slightly slower because of the function-call and parameter-passing overhead. The problem with recursion is that if you go very deep (maybe 1000 nested calls) you could end up filling the stack, which would cause the program to crash.
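For comparison, an iterative version of the question's chain_multi might look like this. It is a sketch under the same assumptions as the original: multiplication() and copy_char() are the asker's helpers and return new[]-allocated strings that the caller must delete[], and start >= end as in the recursive version:

char* chain_multi_iterative(char** list, unsigned int start, unsigned int end)
{
    // Multiply list[end], list[end+1], ..., list[start] left to right.
    char* product = copy_char(list[end]);
    for (unsigned int i = end + 1; i <= start; ++i) {
        char* next = multiplication(product, list[i]);
        delete[] product;   // free the previous intermediate result
        product = next;
    }
    return product;
}

Note that the recursive version splits the range in half, which keeps the two operands of each multiplication() similar in size; if these strings represent big numbers, that balancing can matter for performance even though both versions compute the same product.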

Issues with C++ 'new' operator?

I've recently come across this rant.
I don't quite understand a few of the points mentioned in the article:
The author mentions the small annoyance of delete vs delete[], but seems to argue that it is actually necessary (for the compiler), without ever offering a solution. Did I miss something?
In the section 'Specialized allocators', in function f(), it seems the problems can be solved with replacing the allocations with: (omitting alignment)
// if you're going to the trouble to implement an entire Arena for memory,
// making an arena_ptr won't be much work. basically the same as an auto_ptr,
// except that it knows which arena to deallocate from when destructed.
arena_ptr<char> string(a); string.allocate(80);
// or: arena_ptr<char> string; string.allocate(a, 80);
arena_ptr<int> intp(a); intp.allocate();
// or: arena_ptr<int> intp; intp.allocate(a);
arena_ptr<foo> fp(a); fp.allocate();
// or: arena_ptr<foo> fp; fp.allocate(a);
// use templates in 'arena.allocate(...)' to determine that foo has
// a constructor which needs to be called. do something similar
// for destructors in '~arena_ptr()'.
In 'Dangers of overloading ::operator new[]', the author tries to do a new(p) obj[10]. Why not this instead (far less ambiguous):
obj *p = (obj *)special_malloc(sizeof(obj[10]));
for(int i = 0; i < 10; ++i, ++p)
new(p) obj;
'Debugging memory allocation in C++'. Can't argue here.
The entire article seems to revolve around classes with significant constructors and destructors located in a custom memory management scheme. While that could be useful, and I can't argue with it, it's pretty limited in commonality.
Basically, we have placement new and per-class allocators -- what problems can't be solved with these approaches?
Also, in case I'm just thick-skulled and crazy, in your ideal C++, what would replace operator new? Invent syntax as necessary -- what would be ideal, simply to help me understand these problems better.
Well, the ideal would probably be to not need delete of any kind. Have a garbage-collected environment, let the programmer avoid the whole problem.
The complaints in the rant seem to come down to
"I liked the way malloc does it"
"I don't like being forced to explicitly create objects of a known type"
He's right about the annoying fact that you have to implement both new and new[], but you're forced into that by Stroustrup's desire to maintain the core of C's semantics. Since you can't tell a pointer from an array, you have to tell the compiler yourself. You could fix that, but doing so would mean changing the semantics of the C part of the language radically; you could no longer make use of the identity
*(a+i) == a[i]
which would break a very large subset of all C code.
So, you could have a language which
implements a more complicated notion of an array, and eliminates the wonders of pointer arithmetic, implementing arrays with dope vectors or something similar.
is garbage collected, so you don't need your own delete discipline.
Which is to say, you could download Java. You could then extend that by changing the language so it
isn't strongly typed, so type checking the void * upcast is eliminated,
...but that means that you can write code that transforms a Foo into a Bar without the compiler seeing it. This would also enable duck typing, if you want it.
The thing is, once you've done those things, you've got Python or Ruby with a C-ish syntax.
I've been writing C++ since Stroustrup sent out tapes of cfront 1.0; a lot of the history involved in C++ as it is now comes out of the desire to have an OO language that could fit into the C world. There were plenty of other, more satisfying, languages that came out around the same time, like Eiffel. C++ seems to have won. I suspect that it won because it could fit into the C world.
The rant, IMHO, is very misleading and it seems to me that the author does understand the finer details, it's just that he appears to want to mislead. IMHO, the key point that shows the flaw in argument is the following:
void* operator new(std::size_t size, void* ptr) throw();
The standard defines that the above function has the following properties:
Returns: ptr.
Notes: Intentionally performs no other action.
To restate that - this function intentionally performs no other action. This is very important, as it is the key to what placement new does: It is used to call the constructor for the object, and that's all it does. Notice explicitly that the size parameter is not even mentioned.
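To make that concrete, here is a minimal sketch (mine, not the author's) of what a placement new-expression does with that no-op allocation function:

#include <cstdio>
#include <new>

struct Widget {
    int id;
    Widget(int i) : id(i) { std::printf("constructing Widget %d\n", id); }
    ~Widget()             { std::printf("destroying Widget %d\n", id); }
};

void placement_demo()
{
    // Caller-provided storage; alignas makes it suitable for a Widget.
    alignas(Widget) unsigned char buffer[sizeof(Widget)];

    // The placement operator new just returns the buffer's address; the
    // new-expression then runs Widget's constructor in that storage.
    Widget* w = new (buffer) Widget(42);

    w->~Widget();   // explicit destructor call; no delete, the storage wasn't heap-allocated
}

The allocation function contributes nothing but the address; constructing and destroying the object are separate, explicit steps, which is exactly the separation the rest of this answer relies on.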
For those without time, to summarise my point: everything that malloc does in C can be done in C++ using ::operator new. The only difference is that if you have non-aggregate types, i.e. types that need to have their destructors and constructors called, then you need to call those constructors and destructors. Such types do not explicitly exist in C, and so the argument that "malloc does it better" is not valid. If you have a struct in C that has a special "initializeMe" function which must be called with a corresponding "destroyMe", then all points made by the author apply equally to that struct as they do to a non-aggregate C++ struct.
Taking some of his points explicitly:
To implement multiple inheritance, the compiler must actually change the values of pointers during some casts. It can't know which value you eventually want when converting to a void * ... Thus, no ordinary function can perform the role of malloc in C++--there is no suitable return type.
This is not correct, again ::operator new performs the role of malloc:
class A1 { };
class A2 { };
class B : public A1, public A2 { };

void foo () {
    void * v = ::operator new (sizeof (B));
    B * b = new (v) B();      // Placement new calls the constructor for B.
    b->~B();                  // non-trivial types need their destructor called explicitly
    ::operator delete (v);    // matches ::operator new, like free matches malloc

    v = ::operator new (sizeof(int));
    int * i = reinterpret_cast <int*> (v);
    ::operator delete (v);
}
As I mention above, we need placement new to call the constructor for B. In the case of 'i' we can cast from void* to int* without a problem, although again using placement new would improve type checking.
Another point he makes is about alignment requirements:
Memory returned by new char[...] will not necessarily meet the alignment requirements of a struct intlist.
The standard under 3.7.3.1/2 says:
The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).
That to me appears pretty clear.
Under specialized allocators the author describes potential problems that you might have, e.g. you need to pass the allocator as an argument to any types which allocate memory themselves, and the constructed objects will need to have their destructors called explicitly. Again, how is this different from passing the allocator object through to an "initializeMe" call for a C struct?
Regarding calling the destructor, in C++ you can easily create a special kind of smart pointer, let's call it "placement_pointer" which we can define to call the destructor explicitly when it goes out of scope. As a result we could have:
template <typename T>
class placement_pointer {
    // ...
    ~placement_pointer() {
        if (*count == 0) {
            m_b->~T();
        }
    }
    // ...
    T * m_b;
};
void
f ()
{
    arena a;
    // ...
    foo *fp = new (a) foo; // must be destroyed manually
    // ...
    fp->~foo ();

    placement_pointer<foo> pfp = new (a) foo; // destructed automatically
    // ...
}
The last point I want to comment on is the following:
g++ comes with a "placement" operator new[] defined as follows:
inline void *
operator new[](size_t, void *place)
{
return place;
}
As noted above, it's not just implemented this way - it is required to be so by the standard.
Let obj be a class with a destructor. Suppose you have sizeof (obj[10]) bytes of memory somewhere and would like to construct 10 objects of type obj at that location. (C++ defines sizeof (obj[10]) to be 10 * sizeof (obj).) Can you do so with this placement operator new[]? For example, the following code would seem to do so:
obj *
f ()
{
    void *p = special_malloc (sizeof (obj[10]));
    return new (p) obj[10]; // Serious trouble...
}
Unfortunately, this code is incorrect. In general, there is no guarantee that the size_t argument passed to operator new[] really corresponds to the size of the array being allocated.
But as he highlights by supplying the definition, the size argument is not used in the allocation function. The allocation function does nothing - and so the only effect of the above placement expression is to call the constructor for the 10 array elements, as you would expect.
There are other issues with this code, but not the one the author listed.
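For reference, the usual way to sidestep the placement operator new[] question entirely is the one the question itself suggests: construct and destroy the elements one at a time with the single-object form. A sketch, reusing the rant's obj and special_malloc and assuming a matching special_free (hypothetical):

#include <new>

obj* make_ten()
{
    // Allocate raw storage, then run each constructor with single-object placement new.
    void* raw = special_malloc(sizeof(obj[10]));
    obj* arr = static_cast<obj*>(raw);
    for (int i = 0; i < 10; ++i)
        new (arr + i) obj;
    return arr;
}

void destroy_ten(obj* arr)
{
    // Destroy in reverse order, then hand the storage back to the special allocator.
    for (int i = 10; i-- > 0; )
        arr[i].~obj();
    special_free(arr);   // hypothetical counterpart to special_malloc
}

A production version would also need to destroy any already-constructed elements if one of the constructors throws part-way through the loop, but the point stands: the single-object placement form has none of the size-argument ambiguity of placement operator new[].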