Why does this implementation of the C++ 'new' operator work? - c++

I've found out that the C++ compiler for AVR uCs doesn't support the new and delete operators, but also that there is a quick fix:
#include <stdlib.h> // malloc, free, size_t

void * operator new(size_t size)
{
    return malloc(size);
}
void operator delete(void * ptr)
{
    free(ptr);
}
I'm assuming that it would now be possible to call new ClassName(args);.
However, I am not really sure how this works. For example, what actually returns a size_t here? I thought that constructors don't return anything...
Could it be that new is now supposed to be used differently (in conjunction with sizeof())?

new T(args); is roughly equivalent to the following.
void* storage = operator new(sizeof(T)); // obtain raw storage
call_constructor<T>(storage, args); // make an object in it
(Here call_constructor is supposed to call the constructor† of T making storage be the this pointer within that constructor.)
The operator new part obtains the requested amount of raw storage, and the constructor call is the one that actually makes an object, by invoking the constructor.
The code in the question only replaces the operator new part, i.e. the retrieval of storage. Both the sizeof part and the constructor invocation are done automatically by the compiler when you use new T(args).
† The language has a way to express this direct constructor invocation called "placement new", but I omitted it for clarity.
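To make the two steps concrete, here is a minimal sketch that writes out by hand roughly what the compiler does for new T(args) and delete p. Widget, make_by_hand and destroy_by_hand are hypothetical names chosen for illustration; the constructor call is spelled with placement new, as the footnote mentions.

```cpp
#include <cstddef>
#include <new>

// Hypothetical type used only for illustration.
struct Widget {
    int value;
    static int live;  // number of currently constructed Widgets
    explicit Widget(int v) : value(v) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Roughly what `new Widget(v)` does. (A real new-expression also
// releases the storage again if the constructor throws.)
Widget* make_by_hand(int v) {
    void* storage = ::operator new(sizeof(Widget));  // step 1: obtain raw storage
    return ::new (storage) Widget(v);                // step 2: run the constructor in it
}

// Roughly what `delete p` does.
void destroy_by_hand(Widget* p) {
    if (p == nullptr) return;
    p->~Widget();          // step 1: run the destructor
    ::operator delete(p);  // step 2: release the raw storage
}
```

Replacing operator new and operator delete, as in the question, only changes where step 1 of make_by_hand and step 2 of destroy_by_hand get their memory.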

From the compiler name (uC), I presume it's for an embedded controller. This would make sense, as you rarely require dynamic memory management on embedded devices but might benefit from 'C with classes'. Hopefully it supports 'placement new' so you can actually use C++.
If your compiler doesn't support new & delete, it's not really much of a C++ compiler, is it!
I think the keyword 'new' effectively gets converted to:
Object* pointer = (Object *)operator new(sizeof(Object));
pointer->Object_Constructor(args);

Related

Compiler or Standard C++ Library - new and delete

I am developing a kernel in C++ without any library. I am confused about the new and delete operators. I have implemented KMalloc() and KFree(). Now, I want to know whether the following code will work without any Standard C++ Library.
void *mem = KMalloc(sizeof(__type__));
Object *obj = new (mem) ();
If this will not work, how do I set up the vtables, or whatever object structure there is, in preallocated space without any standard library?
You should first define which C++ standard you are targeting. I guess it is at least C++11.
Then, if you code in C++ for some operating system kernel, beware and study carefully the relevant ABI specifications (the details depend even on the version of your C++ compiler, and gory details like exception handling and stack unwinding matter a lot).
Notice that the Linux kernel ABI is not C++ friendly (it is not the same as Linux user-land ABI for x86-64). So coding in C++ for the Linux kernel is not reasonable.
You probably want
void *mem = KMalloc(sizeof(Object));
Object *obj = new (mem) Object();
The second statement uses the placement new feature of C++, which runs the constructor on the (more or less "uninitialized") memory zone passed as the placement argument.
(Notice that a bitwise copy of C++ objects, e.g. with memcpy, is undefined behavior in general (except for PODs); you need to use constructors and assignment operators.)
There is no "placement delete" expression, but you can explicitly run the destructor: obj->~Object(), after which any use of the object pointed to by obj is undefined behavior.
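As a minimal sketch of that sequence, the following uses a class with a virtual function to show that the constructor run by placement new is what makes virtual dispatch work. KMalloc and KFree are hosted stand-ins that just wrap malloc/free, and Shape/Triangle are hypothetical types. The placement form is declared in <new>; in an environment with no library headers at all you would presumably have to declare an equivalent operator new(std::size_t, void*) yourself.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for the kernel allocator; here they wrap malloc/free.
void* KMalloc(std::size_t n) { return std::malloc(n); }
void  KFree(void* p)         { std::free(p); }

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};
struct Triangle : Shape {
    int sides() const override { return 3; }
};

int placement_demo() {
    void* mem = KMalloc(sizeof(Triangle));
    if (mem == nullptr) return -1;
    Shape* s = new (mem) Triangle();  // constructor runs here and sets up the vtable pointer
    int n = s->sides();               // virtual dispatch reaches Triangle::sides
    s->~Shape();                      // explicit (virtual) destructor call
    KFree(mem);                       // the storage is released separately
    return n;
}
```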
Now, I want to know if that code will work without any Standard C++ Library.
It might be much harder than what you believe. You need to understand all the details of the ABI targeted by your compiler, and that is hard.
Notice that running constructors (and destructors) properly, in a good enough order, is of paramount importance in C++; practically speaking, they notably initialize the (implicit) vtable field[s], without which your program can crash (as soon as any virtual member function or destructor gets called).
Read also about the rule of five (for C++11).
Coding your own kernel in C++ practically requires understanding a lot of details about your C++ implementation (and ABI).
NB: practically speaking, a bitwise copy with memcpy of smart pointers, of standard streams, of std::mutex-es, of std::thread-s (and perhaps even of standard containers, std::string-s, etc.) is very likely to end in disaster. If you dare to do such bad things, you really need to look into the details of your particular implementation.
In addition to what other answers have already said, you might want to overload operator new and operator delete so that you don't need to do the KMalloc() plus placement new trick all the time.
// In the global namespace.
void* operator new
(
size_t size
)
{
/* You might also check for `KMalloc()`'s return value and throw
* an exception like the standard `operator new`. This, however,
* requires kernel-mode exception support, which is not that easy
* to get up and running.
*/
return KMalloc( size );
}
void* operator new[]
(
size_t size
)
{
return KMalloc( size );
}
void operator delete
(
void* what
)
{
KFree( what );
}
void operator delete[]
(
void* what
)
{
KFree( what );
}
Then, code like the following will work by calling your KMalloc() and KFree() routines when necessary, along with all necessary constructors like placement new would do.
template<typename Type>
class dumb_smart_pointer
{
public:
dumb_smart_pointer()
: pointer( nullptr )
{}
explicit dumb_smart_pointer
(
Type* pointer
)
: pointer( pointer )
{}
~dumb_smart_pointer()
{
if( this->pointer != nullptr )
{
delete this->pointer;
}
}
Type& operator*()
{
return *this->pointer;
}
Type* operator->()
{
return this->pointer;
}
private:
Type* pointer;
};
// The constructor is explicit, so direct-initialization is required here.
dumb_smart_pointer<int> my_pointer( new int( 123 ) );
*my_pointer += 42;
KConsoleOutput << *my_pointer << '\n';
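To try the replacement operators on a hosted system, something like the following sketch can be used. KMalloc and KFree here are hypothetical stand-ins that count calls and forward to malloc/free, and the null check with std::bad_alloc assumes exception support, which the comment above notes a kernel may not have.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical hosted stand-ins for the kernel routines, with call counters.
static int g_allocs = 0;
static int g_frees  = 0;
void* KMalloc(std::size_t n) { ++g_allocs; return std::malloc(n ? n : 1); }
void  KFree(void* p)         { if (p) { ++g_frees; std::free(p); } }

void* operator new(std::size_t size) {
    void* p = KMalloc(size);
    if (p == nullptr) throw std::bad_alloc();  // assumes exceptions are available
    return p;
}
void* operator new[](std::size_t size) {
    void* p = KMalloc(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete(void* p) noexcept   { KFree(p); }
void operator delete[](void* p) noexcept { KFree(p); }

// Volatile sink so the optimizer cannot remove the allocations below.
static void* volatile g_sink;

// Returns how many KMalloc calls one scalar new and one array new caused.
int count_allocations() {
    int before = g_allocs;
    int* a = new int(123);
    g_sink = a;
    int* b = new int[4]();
    g_sink = b;
    delete a;
    delete[] b;
    return g_allocs - before;
}
```

Each new-expression should reach KMalloc exactly once and each delete should reach KFree exactly once; the constructors and destructors are still run by the compiler around those calls.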

Can the global new operator be overridden based on allocated object's type traits?

I'm experimenting with upgrading our pooled fixed-block memory allocator to take advantage of C++11 type traits.
Currently it is possible to force any allocation of any object anywhere to be dispatched to the correct pool by overriding the global new operator in the traditional way, eg
void* operator new (std::size_t size)
{ // if-cascade just for simplest possible example
if ( size <= 64 ) { return g_BlockPool64.Allocate(); }
else if ( size <= 256 ) { return g_BlockPool256.Allocate(); }
// etc .. else assume arguendo that we know the following will work properly
else return malloc(size);
}
In many cases we could improve performance further if objects could be dispatched to different pools depending on type traits such as is_trivially_destructible. Is it possible to make a templatized global new operator that is aware of the allocated type, not just a requested size? Something equivalent to
template<class T>
void *operator new( size_t size)
{
if ( size < 64 )
{ return std::is_trivially_destructible<T>::value ?
g_BlockPool64_A.Allocate() :
g_BlockPool64_B.Allocate(); } // etc
}
Overriding the member new operator in every class won't work here; we really need this to automatically work for any allocation anywhere. Placement new won't work either: requiring every alloc to look like
Foo *p = new (mempool<Foo>) Foo();
is too cumbersome and people will forget to use it.
The short answer is no. The allocation/deallocation functions have the following signatures:
void* operator new(std::size_t);
void* operator new[](std::size_t);
void operator delete(void*);
void operator delete[](void*);
Most deviations from these signatures will result in your function not being used at all. In a typical implementation you're basically replacing the default implementations at the linker level -- i.e., the existing function has some particular mangled name. If you provide a function with a name that mangles to an identical result, it'll get linked instead. If your function doesn't mangle to the same name, it won't get linked.
A template like the one you've suggested might get used in some cases, but if so, it would lead to undefined behavior. Depending on how you arranged headers (for example) you could end up with code mixing the use of your template with the default functions, at which point about the best you could hope for would be that it crashes quickly and cleanly.
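If type-aware dispatch is the goal, one alternative (not part of the answer above, and with the same drawback the question mentions, namely that callers must remember to use it) is a factory function template that knows T and picks the pool from a trait. This is only a sketch: CountingPool, pool_for, pool_new and pool_delete are hypothetical names, and the pools simply forward to malloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical pool: counts allocations and forwards to malloc.
struct CountingPool {
    int allocations = 0;
    void* Allocate(std::size_t n) { ++allocations; return std::malloc(n); }
    void  Free(void* p)           { std::free(p); }
};

CountingPool g_trivialPool;     // for trivially destructible types
CountingPool g_nonTrivialPool;  // for everything else

// Picks a pool from a type trait; the condition is a compile-time constant.
template <class T>
CountingPool& pool_for() {
    return std::is_trivially_destructible<T>::value ? g_trivialPool
                                                    : g_nonTrivialPool;
}

template <class T, class... Args>
T* pool_new(Args&&... args) {
    void* mem = pool_for<T>().Allocate(sizeof(T));
    if (mem == nullptr) throw std::bad_alloc();
    try {
        return ::new (mem) T(std::forward<Args>(args)...);
    } catch (...) {
        pool_for<T>().Free(mem);  // do not leak if the constructor throws
        throw;
    }
}

template <class T>
void pool_delete(T* p) {
    if (p == nullptr) return;
    p->~T();
    pool_for<T>().Free(p);
}
```

Usage would look like pool_new<Foo>(args) and pool_delete(p) instead of new Foo(args) and delete p.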

How do I properly overload new?

My code is below. However, before main() runs, something as simple as a static std::string globalvar; will call new, before MyPool mypool is initialized.
MyPool mypool;
void* operator new(size_t s) { return mypool.donew(s); }
Is there any way I can force mypool to be initialized first? I have no idea how overloading new is supposed to work if there is no way to initialize its values, so I am sure there is a solution to this.
I am using both Visual Studio 2010 and gcc (cross-platform).
Make mypool a static variable of your operator new function:
void* operator new(size_t s) {
static MyPool mypool;
return mypool.donew(s);
}
It will be initialized upon first call of the function (i.e. the operator new).
EDIT: As the commenters pointed out, declaring the variable as static in the operator new functions limits its scope and makes it inaccessible in the operator delete. To fix that, you should make an accessor function for your pool object:
MyPool& GetMyPool() {
static MyPool mypool;
return mypool;
}
and invoke it in both operator new and operator delete:
void* operator new(size_t s) {
return GetMyPool().donew(s);
}
// similarly for delete
As before, declaring it as a static local variable guarantees initialization upon the first invocation of the GetMyPool function. Additionally, it will be the same pool object in both operators, which is likely what you want.
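Put together, the accessor approach might look like the sketch below. The real MyPool is not shown in the question, so the class here is a hypothetical stand-in that forwards to malloc/free and counts outstanding blocks; dodelete and outstanding are assumed names. EarlyUser plays the role of a global that allocates before main() runs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical pool; the MyPool from the question is not shown.
class MyPool {
public:
    void* donew(std::size_t n) { ++outstanding_; return std::malloc(n ? n : 1); }
    void  dodelete(void* p)    { if (p) { --outstanding_; std::free(p); } }
    long  outstanding() const  { return outstanding_; }
private:
    long outstanding_ = 0;
};

MyPool& GetMyPool() {
    static MyPool mypool;  // constructed on the first call, even before main()
    return mypool;
}

void* operator new(std::size_t s) {
    void* p = GetMyPool().donew(s);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete(void* p) noexcept { GetMyPool().dodelete(p); }

// A global whose dynamic initializer allocates before main() runs.
struct EarlyUser {
    int* p;
    EarlyUser() : p(new int(1)) {}
    ~EarlyUser() { delete p; }
};
EarlyUser g_early;
```

Because the pool finishes construction during g_early's constructor, it is also destroyed after g_early, so the delete in ~EarlyUser still sees a valid pool.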
Properly? Best don't. Try Boost.Pool, and just use their allocation mechanics. Or if you insist on using your pool, make a new allocation function. I've seen horrible things done to the operator new, and I'm feeling sorry for it. :(
IMHO, the only time you should overload new is when implementing a memory manager for observation of the allocs / deallocs. Otherwise, just write your own functions and use them instead. Or for most containers, you can give them allocators.
Global initialization occurs in three steps: zero initialization, static
initialization and dynamic initialization, in that order. If your
operator new uses non-local variables, these variables must depend on
only zero or static initialization; as you said, you cannot guarantee
that your operator new won't be called before the dynamic
initialization of any particular variable has occurred.
If you need objects with dynamic initialization (often the case), there
are two ways of handling this:
declare a pointer to the object, rather than the object itself, and
in operator new, check if the pointer is null, and initialize it
there, or
call a function which returns a reference to a local instance.
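Neither of these solutions is thread safe on compilers that predate
C++11 (such as Visual Studio 2010), but that's likely not a problem.
(Since C++11, initialization of a local static is guaranteed to be
thread safe; the null-pointer check is not.) They are thread safe once
the first call returns, so if there is any invocation of new before
threading starts, you're OK.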
(It's something to keep in mind, however. If unsure, you can always
allocate and delete an object manually before the first thread is
started—perhaps in the initialization of a static object.)
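The first option listed above (a pointer that is checked and filled in on demand) could be sketched as follows. MyPool is again a hypothetical stand-in, and pool() and pool_is_used() are names made up for this example. Note that the pool itself is created with malloc plus placement new, since a plain new MyPool inside operator new would call operator new again.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical pool that forwards to malloc/free and counts live blocks.
class MyPool {
public:
    void* donew(std::size_t n) { ++outstanding_; return std::malloc(n ? n : 1); }
    void  dodelete(void* p)    { if (p) { --outstanding_; std::free(p); } }
    long  outstanding() const  { return outstanding_; }
private:
    long outstanding_ = 0;
};

// A pointer initialized with a constant needs no dynamic initialization,
// so it is safe to test it inside operator new at any time.
static MyPool* g_pool = nullptr;

static MyPool& pool() {
    if (g_pool == nullptr) {  // not thread safe, as noted above
        void* mem = std::malloc(sizeof(MyPool));
        if (mem == nullptr) std::abort();
        g_pool = ::new (mem) MyPool();  // intentionally never destroyed
    }
    return *g_pool;
}

void* operator new(std::size_t s) {
    void* p = pool().donew(s);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete(void* p) noexcept { pool().dodelete(p); }

// Volatile sink so the optimizer cannot remove the allocation below.
static int* volatile g_keep;

bool pool_is_used() {
    g_keep = new int(5);
    bool ok = (g_pool != nullptr) && (g_pool->outstanding() >= 1);
    delete g_keep;
    return ok;
}
```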

C++: general thoughts about overloading operator new

Have you overloaded operator new in C++?
If yes, why?
An interview question, for which I humbly request some of your thoughts.
We had an embedded system where new was only rarely allowed, and the memory could never be deleted, as we had to prove a maximum heap usage for reliability reasons.
We had a third party library developer who didn't like those rules, so they overloaded new and delete to work against a chunk of memory we allocated just for them.
Yes.
Overloading operator new gives you a chance to control where an object lives in memory. I did this because I knew some details about the lifetime of my objects, and wanted to avoid fragmentation on a platform which didn't have virtual memory.
You would overload new if you're using your own allocator, doing something fancy with reference counting, instrumenting for garbage collection, debugging object lifetimes or something else entirely; you're replacing the allocator for objects. I've personally had to do it to ensure certain objects get allocated on specific mmap'ed pages of memory.
Yes, for two reasons: Custom allocator, and custom allocation tracking.
Overloading the new operator may look like a good idea at first glance if you want to do custom allocation for some reason (e.g. avoiding the memory fragmentation intrinsic to the C runtime allocator and/or avoiding locks on memory management calls in multithreaded programs). But when you get to the implementation you may realize that in most cases you want to pass some additional context to this call, for example a thread-specific heap for objects of a given size. Overloading new/delete simply doesn't work here. So eventually you may want to create your own facade for your custom memory management subsystem.
I found it very handy to overload operator new when writing Python extension code in C++. I wrapped the Python C-API code for allocation and deallocation in operator new and operator delete overloads, respectively – this allows for PyObject*-compatible structures that can be created with new MyType() and managed with predictable heap-allocation semantics.
It also allows for a separation of the allocation code (normally in the Python __new__ method) and the initialization code (in Python’s __init__) into, respectively, the operator new overloads and any constructors one sees fit to define.
Here’s a sample:
struct ModelObject {
static PyTypeObject* type_ptr() { return &ModelObject_Type; }
/// operator new performs the role of tp_alloc / __new__
/// Not using the 'new size' value here
void* operator new(std::size_t) {
PyTypeObject* type = type_ptr();
ModelObject* self = reinterpret_cast<ModelObject*>(
type->tp_alloc(type, 0));
if (self != NULL) {
self->weakrefs = NULL;
self->internal = std::make_unique<buffer_t>(nullptr);
}
return reinterpret_cast<void*>(self);
}
/// operator delete acts as our tp_dealloc
void operator delete(void* voidself) {
ModelObject* self = reinterpret_cast<ModelObject*>(voidself);
PyObject* pyself = reinterpret_cast<PyObject*>(voidself);
if (self->weakrefs != NULL) { PyObject_ClearWeakRefs(pyself); }
self->cleanup();
type_ptr()->tp_free(pyself);
}
/// Data members
PyObject_HEAD
PyObject* weakrefs = nullptr;
bool clean = false;
std::unique_ptr<buffer_t> internal;
/// Constructors fill in data members, analogous to __init__
ModelObject()
:internal(std::make_unique<buffer_t>())
,accessor{}
{}
explicit ModelObject(buffer_t* buffer)
:clean(true)
,internal(std::make_unique<buffer_t>(buffer))
{}
ModelObject(ModelObject const& other)
:internal(im::buffer::heapcopy(other.internal.get()))
{}
/// No virtual methods are defined to keep the struct as a POD
/// ... instead of using destructors I defined a 'cleanup()' method:
void cleanup(bool force = false) {
if (clean && !force) {
internal.release();
} else {
internal.reset(nullptr);
clean = !force;
}
}
/* … */
};

Issues with C++ 'new' operator?

I've recently come across this rant.
I don't quite understand a few of the points mentioned in the article:
The author mentions the small annoyance of delete vs delete[], but seems to argue that it is actually necessary (for the compiler), without ever offering a solution. Did I miss something?
In the section 'Specialized allocators', in function f(), it seems the problems can be solved with replacing the allocations with: (omitting alignment)
// if you're going to the trouble to implement an entire Arena for memory,
// making an arena_ptr won't be much work. basically the same as an auto_ptr,
// except that it knows which arena to deallocate from when destructed.
arena_ptr<char> string(a); string.allocate(80);
// or: arena_ptr<char> string; string.allocate(a, 80);
arena_ptr<int> intp(a); intp.allocate();
// or: arena_ptr<int> intp; intp.allocate(a);
arena_ptr<foo> fp(a); fp.allocate();
// or: arena_ptr<foo> fp; fp.allocate(a);
// use templates in 'arena.allocate(...)' to determine that foo has
// a constructor which needs to be called. do something similar
// for destructors in '~arena_ptr()'.
In 'Dangers of overloading ::operator new[]', the author tries to do a new(p) obj[10]. Why not this instead (far less ambiguous):
obj *p = (obj *)special_malloc(sizeof(obj[10]));
for(int i = 0; i < 10; ++i, ++p)
new(p) obj;
'Debugging memory allocation in C++'. Can't argue here.
The entire article seems to revolve around classes with significant constructors and destructors located in a custom memory management scheme. While that could be useful, and I can't argue with it, it's pretty limited in commonality.
Basically, we have placement new and per-class allocators -- what problems can't be solved with these approaches?
Also, in case I'm just thick-skulled and crazy, in your ideal C++, what would replace operator new? Invent syntax as necessary -- what would be ideal, simply to help me understand these problems better.
Well, the ideal would probably be to not need delete of any kind. Have a garbage-collected environment, let the programmer avoid the whole problem.
The complaints in the rant seem to come down to
"I liked the way malloc does it"
"I don't like being forced to explicitly create objects of a known type"
He's right about the annoying fact that you have to implement both new and new[], but you're forced into that by Stroustrup's desire to maintain the core of C's semantics. Since you can't tell a pointer from an array, you have to tell the compiler yourself. You could fix that, but doing so would mean changing the semantics of the C part of the language radically; you could no longer make use of the identity
*(a+i) == a[i]
which would break a very large subset of all C code.
So, you could have a language which
implements a more complicated notion of an array, and eliminates the wonders of pointer arithmetic, implementing arrays with dope vectors or something similar.
is garbage collected, so you don't need your own delete discipline.
Which is to say, you could download Java. You could then extend that by changing the language so it
isn't strongly typed, so type checking the void * upcast is eliminated,
...but that means that you can write code that transforms a Foo into a Bar without the compiler seeing it. This would also enable ducktyping, if you want it.
The thing is, once you've done those things, you've got Python or Ruby with a C-ish syntax.
I've been writing C++ since Stroustrup sent out tapes of cfront 1.0; a lot of the history involved in C++ as it is now comes out of the desire to have an OO language that could fit into the C world. There were plenty of other, more satisfying, languages that came out around the same time, like Eiffel. C++ seems to have won. I suspect that it won because it could fit into the C world.
The rant, IMHO, is very misleading. It seems to me that the author does understand the finer details; he just appears to want to mislead. IMHO, the key point that shows the flaw in the argument is the following:
void* operator new(std::size_t size, void* ptr) throw();
The standard defines that the above function has the following properties:
Returns: ptr.
Notes: Intentionally performs no other action.
To restate that - this function intentionally performs no other action. This is very important, as it is the key to what placement new does: It is used to call the constructor for the object, and that's all it does. Notice explicitly that the size parameter is not even mentioned.
For those without time, to summarise my point: everything that 'malloc' does in C can be done in C++ using "::operator new". The only difference is that if you have non-aggregate types, i.e. types that need to have their destructors and constructors called, then you need to call those constructors and destructors. Such types do not explicitly exist in C, and so using the argument that "malloc does it better" is not valid. If you have a struct in 'C' that has a special "initializeMe" function which must be called with a corresponding "destroyMe" then all points made by the author apply equally to that struct as they do to a non-aggregate C++ struct.
Taking some of his points explicitly:
To implement multiple inheritance, the compiler must actually change the values of pointers during some casts. It can't know which value you eventually want when converting to a void * ... Thus, no ordinary function can perform the role of malloc in C++--there is no suitable return type.
This is not correct, again ::operator new performs the role of malloc:
class A1 { };
class A2 { };
class B : public A1, public A2 { };
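void foo () {
    void * v = ::operator new (sizeof (B));
    B * b = new (v) B(); // Placement new calls the constructor for B.
    b->~B();               // Call the destructor explicitly...
    ::operator delete (v); // ...then release the raw storage (delete on a void* is not valid).
    v = ::operator new (sizeof(int));
    int * i = reinterpret_cast <int*> (v);
    ::operator delete (v);
}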
As I mention above, we need placement new to call the constructor for B. In the case of 'i' we can cast from void* to int* without a problem, although again using placement new would improve type checking.
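As a small, hedged illustration of the multiple inheritance point: the sketch below obtains raw storage with ::operator new, constructs a B in it, converts to the second base and back, and then tears everything down. The base classes are given data members here (unlike the empty ones above) so that a pointer adjustment is plausible; round_trip is a hypothetical name.

```cpp
#include <new>

struct A1 { int a1 = 1; };
struct A2 { int a2 = 2; };
struct B : A1, A2 { int b = 3; };

// Returns true if the object survives a round trip through its second base.
bool round_trip() {
    void* v  = ::operator new(sizeof(B));  // plays the role of malloc
    B*    b  = ::new (v) B();              // placement new constructs the B
    A2*   a2 = b;                          // the compiler may adjust the address here
    bool ok = (static_cast<B*>(a2) == b) && (a2->a2 == 2) && (b->b == 3);
    b->~B();                               // explicit destructor call
    ::operator delete(v);                  // release via the pointer operator new returned
    return ok;
}
```

The void* is only ever converted back to the exact type that was constructed in it, which is the same discipline malloc requires in C.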
Another point he makes is about alignment requirements:
Memory returned by new char[...] will not necessarily meet the alignment requirements of a struct intlist.
The standard under 3.7.3.1/2 says:
The pointer returned shall be suitably aligned so that it can be converted to a
pointer of any complete object type and then used to access the object or array in the storage allocated (until
the storage is explicitly deallocated by a call to a corresponding deallocation function).
That to me appears pretty clear.
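A quick way to check this on a given implementation is sketched below. The intlist struct is a guess at the kind of type the article means (its definition is not quoted here), and the helper names are hypothetical. The check covers both ::operator new directly and new char[...]; as far as I know, the standard gives arrays of char a similar alignment guarantee in its description of new-expressions.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed shape of the article's struct; the original definition is not quoted.
struct intlist {
    int      value;
    intlist* next;
};

bool is_aligned_for_intlist(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(intlist) == 0;
}

bool check_alignment() {
    char* raw = new char[sizeof(intlist)];        // the case the article complains about
    void* mem = ::operator new(sizeof(intlist));  // the case the quoted paragraph covers
    bool ok = is_aligned_for_intlist(raw) && is_aligned_for_intlist(mem);
    ::operator delete(mem);
    delete[] raw;
    return ok;
}
```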
Under specialized allocators the author describes potential problems that you might have, e.g. you need to pass the allocator as an argument to any types which allocate memory themselves, and the constructed objects will need to have their destructors called explicitly. Again, how is this different from passing the allocator object through to an "initializeMe" call for a C struct?
Regarding calling the destructor, in C++ you can easily create a special kind of smart pointer, let's call it "placement_pointer" which we can define to call the destructor explicitly when it goes out of scope. As a result we could have:
template <typename T>
class placement_pointer {
// ...
~placement_pointer() {
if (*count == 0) {
m_b->~T();
}
}
// ...
T * m_b;
};
void
f ()
{
arena a;
// ...
foo *fp = new (a) foo; // must be destroyed
// ...
fp->~foo ();
placement_pointer<foo> pfp = new (a) foo; // automatically !!destructed!!
// ...
}
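A complete, simplified version of such a pointer might look like the sketch below. It is a single-owner variant without the reference count hinted at above, the arena is replaced by a local aligned buffer, and foo with its alive counter is a hypothetical type used only to observe the destructor call.

```cpp
#include <new>

// Hypothetical single-owner variant: destroys the object, never frees storage.
template <typename T>
class placement_pointer {
public:
    explicit placement_pointer(T* p) : m_b(p) {}
    ~placement_pointer() { if (m_b) m_b->~T(); }
    placement_pointer(const placement_pointer&) = delete;
    placement_pointer& operator=(const placement_pointer&) = delete;
    T& operator*() const  { return *m_b; }
    T* operator->() const { return m_b; }
private:
    T* m_b;
};

struct foo {
    static int alive;  // number of currently constructed foo objects
    foo()  { ++alive; }
    ~foo() { --alive; }
};
int foo::alive = 0;

int scoped_demo() {
    alignas(foo) unsigned char storage[sizeof(foo)];  // stands in for the arena
    int during = 0;
    {
        placement_pointer<foo> pfp(::new (storage) foo);
        during = foo::alive;  // one object alive while pfp is in scope
    }                         // ~placement_pointer runs ~foo here
    return during * 10 + foo::alive;
}
```

The storage itself stays with whoever owns the arena; the wrapper only guarantees that the destructor is not forgotten.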
The last point I want to comment on is the following:
g++ comes with a "placement" operator new[] defined as follows:
inline void *
operator new[](size_t, void *place)
{
return place;
}
As noted above, this is not just how it happens to be implemented; the standard requires it to be so.
Let obj be a class with a destructor. Suppose you have sizeof (obj[10]) bytes of memory somewhere and would like to construct 10 objects of type obj at that location. (C++ defines sizeof (obj[10]) to be 10 * sizeof (obj).) Can you do so with this placement operator new[]? For example, the following code would seem to do so:
obj *
f ()
{
void *p = special_malloc (sizeof (obj[10]));
return new (p) obj[10]; // Serious trouble...
}
Unfortunately, this code is incorrect. In general, there is no guarantee that the size_t argument passed to operator new[] really corresponds to the size of the array being allocated.
But as he highlights by supplying the definition, the size argument is not used in the allocation function. The allocation function does nothing, and so the only effect of the above placement expression is to call the constructor for the 10 array elements, as you would expect.
There are other issues with this code, but not the one the author listed.
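One way to sidestep any doubt about array placement new (older revisions of the standard permitted an unspecified size overhead for array new-expressions, so sizeof (obj[10]) bytes might not have been enough) is to construct the elements one at a time. The following is a sketch under assumptions: special_malloc and special_free are stand-ins that wrap malloc/free, obj is a hypothetical class with a counter, and constructor exceptions are ignored for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct obj {
    static int alive;  // number of currently constructed objects
    obj()  { ++alive; }
    ~obj() { --alive; }
};
int obj::alive = 0;

// Hypothetical stand-ins for the article's special allocator.
void* special_malloc(std::size_t n) { return std::malloc(n); }
void  special_free(void* p)         { std::free(p); }

// Constructs `count` objects in one block, one placement new per element.
obj* make_objs(std::size_t count) {
    void* p = special_malloc(count * sizeof(obj));
    if (p == nullptr) return nullptr;
    obj* first = static_cast<obj*>(p);
    for (std::size_t i = 0; i < count; ++i)
        ::new (static_cast<void*>(first + i)) obj;  // no array bookkeeping involved
    return first;
}

// Destroys the elements in reverse order, then releases the block.
void destroy_objs(obj* first, std::size_t count) {
    if (first == nullptr) return;
    for (std::size_t i = count; i > 0; --i)
        first[i - 1].~obj();
    special_free(first);
}
```

The caller has to remember the element count, which is exactly the bookkeeping that new[] and delete[] otherwise do behind the scenes.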