How to modernize memalign in a C++ codebase?

I am modernizing some old code that was originally written in C but is now used in a C++ codebase (no need to stay backwards compatible). A bunch of this code is memory-optimized with memalign, which I am very inexperienced with. My question is how one would update this code (or whether to just leave it as it is), and whether there's even still any point to having it there at all:
The declaration:
float *table_pf;
And how it's initialized in the constructor:
table_pf = (float*)memalign(32, sizeof(float) * TABLE_SIZE);
I was unable to find any equivalent for modern C++ but I may also have just missed it. Usually I would simply convert the pointer to a std::vector or std::array but this does not work when using memalign.

If std::array is an option for you, it's easy to align (same applies to bare arrays):
alignas(32) std::array<float, TABLE_SIZE> table;
The standard function for dynamically allocating over-aligned memory, inherited from C, is std::aligned_alloc (C++17). It's nearly identical to the non-standard memalign; the only difference is that it's stricter in requiring the size to be a multiple of the alignment. A pure C++ option is operator new with a std::align_val_t argument; the default aligned operator new typically forwards to an aligned allocation function such as std::aligned_alloc (the exact call is implementation-defined).
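For example, a minimal sketch of both options (C++17; note that MSVC's standard library does not provide std::aligned_alloc and offers _aligned_malloc instead):
#include <cstdlib>   // std::aligned_alloc, std::free
#include <new>       // std::align_val_t

// std::aligned_alloc requires size to be a multiple of the alignment, so
// TABLE_SIZE must make sizeof(float) * TABLE_SIZE a multiple of 32 here:
float* p1 = static_cast<float*>(std::aligned_alloc(32, sizeof(float) * TABLE_SIZE));
// ... use p1 ...
std::free(p1);

// Pure C++ alternative: over-aligned operator new, paired with the matching delete:
float* p2 = static_cast<float*>(::operator new(sizeof(float) * TABLE_SIZE, std::align_val_t{32}));
// ... use p2 ...
::operator delete(p2, std::align_val_t{32});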
It's not a good idea to use the bare pointer returned by the allocation function, though: you should use RAII. One option is std::vector with an allocator that uses an over-aligned allocation function; the standard library doesn't provide such an allocator, so a custom one is needed. A more straightforward but less flexible option is std::unique_ptr with a deleter that calls std::free (or operator delete with the matching std::align_val_t, in case you had used operator new).
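A minimal sketch of the std::unique_ptr option (assuming the C++17 std::aligned_alloc shown above; the custom deleter is the only extra piece):
#include <cstdlib>
#include <memory>

struct FreeDeleter {
    void operator()(float* p) const { std::free(p); }  // aligned_alloc memory is freed with std::free
};

std::unique_ptr<float[], FreeDeleter> table{
    static_cast<float*>(std::aligned_alloc(32, sizeof(float) * TABLE_SIZE))};
// table[i] works as usual; std::free is called automatically on destruction.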

The code allocates an array of TABLE_SIZE floats whose address is aligned to 32 bytes.
malloc would have used the default alignment, alignof(std::max_align_t), which is typically 8 or 16 bytes (but can be larger; it's implementation-dependent).
For an array of floats, normal allocation and alignment is sufficient.
You would use memalign, the C++ std::aligned_alloc, or posix_memalign to request a stricter-than-default alignment.
For instance, SSE or other SIMD extensions may require larger memory alignment than the default one.
My advice is to read the code, see if memory alignment is indeed needed, and either lose it for a normal allocation or move to std::aligned_alloc if it is really required.
The use of the bare literal 32 is suspicious in my opinion (though 32 bytes is the AVX vector width, which fits the SIMD guess).


What's the most suitable c++ replacement of calloc?

The title says it.
I have tried:
new char[nSize];
but it returns uninitialized memory,
whereas calloc ensures zero-initialization.
I could call memset, etc. - but isn't there a more direct way?
What's the most suitable c++ replacement of calloc?
For most purposes, std::vector. Or std::string if you intend to represent a character string. It will automatically delete whatever memory it allocates.
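For instance, either of these gives you nSize zero-initialized bytes with automatic cleanup:
#include <string>
#include <vector>

std::vector<char> buf(nSize);   // every element value-initialized to '\0', freed automatically
std::string s(nSize, '\0');     // an equivalent zero-filled buffer, if a string fits better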
For data structures that contain many arrays that are not mutually contiguous, you might want to avoid the slightly-larger-than-pointer size of std::vector, and instead opt for a unique pointer:
auto ptr = std::make_unique<char[]>(nSize);
You can use value initialisation with a new expression as well. This is what std::make_unique does internally:
new char[nSize]();
But I would not recommend allocations without a RAII container.
As mentioned by geza, calloc may be optimised (on some systems) such that it may elide setting the memory to zero when allocating a large block. If such optimisation applies to your case, and is measurably significant, then there may be an argument for using std::calloc in C++.

How can I emulate a stack frame in C++?

I am writing a container that uses alloca internally to allocate data on the stack. Risks of using alloca aside, assume that I must use it for the domain I am in (it's partly a learning exercise around alloca and partly to investigate possible implementations of dynamically-sized stack-allocated containers).
According to the man page for alloca (emphasis mine):
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
Using implementation-specific features, I have managed to force inlining in such a way that the caller's stack is used for this function-level "scoping".
However, that means that the following code will allocate a huge amount of memory on the stack (compiler optimisations aside):
for (int iteration = 0; iteration < 10000; ++iteration) {
// the ctor parameter is the number of
// instances of T to allocate on the stack,
// it's not normally known at compile-time
my_container<T> instance(32);
}
Without knowing the implementation details of this container, one might expect any memory it allocates to be freed when instance goes out of scope. This is not the case and can result in a stack overflow / high memory usage for the duration of the enclosing function.
One approach that came to mind was to explicitly free the memory in the destructor. Short of reverse engineering the resulting assembly, I haven't found a way of doing that yet (also see this).
The only other approach I have thought of is to have a maximum size specified at compile-time, use that to allocate a fixed-size buffer, have the real size specified at runtime and use the fixed-size buffer internally. The issue with this is that it's potentially very wasteful (suppose your maximum were 256 bytes per container, but you only needed 32 most of the time).
Hence this question; I want to find a way to provide these scope semantics to the users of this container. Non-portable is fine, so long as it's reliable on the platform it's targeting (for example, some documented compiler extension that only works for x86_64 is fine).
I appreciate this could be an XY problem, so let me restate my goals clearly:
I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).
The size of the container is not known at compile-time.
I would like to maintain the semantics of the memory as if it were held by an std::unique_ptr inside of the container.
Whilst the container must have a C++ API, using compiler extensions from C is fine.
The code need only work on x86_64 for now.
The target operating system can be Linux-based or Windows, it doesn't need to work on both.
I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).
The normal implementation of C VLAs in most compilers is on the stack. Of course ISO C++ doesn't say anything about how automatic storage is implemented under the hood, but it's (nearly?) universal for C implementations on normal machines (that do have a call+data stack) to use that for all automatic storage including VLAs.
If your VLA is too large, you get a stack overflow rather than a fallback to malloc / free.
Neither C nor C++ specify alloca; it's only available on implementations that have a stack like "normal" machines, i.e. the same machines where you can expect VLAs to do what you want.
All of these conditions hold for all the major compilers on x86-64 (except that MSVC doesn't support VLAs).
If you have a C++ compiler that supports C99 VLAs (like GNU C++), smart compilers may reuse the same stack memory for a VLA with loop scope.
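As a sketch (GNU extension in C++; consume is a hypothetical user of the buffer):
#include <cstddef>

void consume(float* p, std::size_t n);   // hypothetical

void f(std::size_t n) {                  // n known only at run time
    for (int i = 0; i < 10000; ++i) {
        float buf[n];                    // C99-style VLA: automatic storage
        consume(buf, n);
    }                                    // buf's stack space is released (and can be reused) each iteration
}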
have a maximum size specified at compile-time, use that to allocate a fixed-size buffer ... wasteful
For a special case like you mention, you could maybe have a fixed-size buffer as part of the object (size as a template param), and use that if it's big enough. If not, dynamically allocate. Maybe use a pointer member to point to either the internal or external buffer, and a flag to remember whether to delete it or not in the destructor. (You need to avoid delete on an array that's part of the object, of course.)
// Optionally, if you do anything that's easier with a power-of-2 size:
// static_assert((internalsize & (internalsize - 1)) == 0, "internalsize not a power of 2");
template <typename T, size_t internalsize>
class my_container {
    T *data;                 // points at internaldata or at heap storage
    T internaldata[internalsize];
    unsigned used_size;
    int allocated_size;      // intended for small containers: use int instead of size_t
                             // negative allocated_size means internal (replaces a bool needs_delete)
};
The allocated_size only needs to be checked when the container grows, so I made it a signed int: its sign can carry the internal/external flag instead of needing an extra boolean member.
Normally a container uses 3 pointers rather than a pointer + 2 integers, but if you don't grow/shrink often then this saves space (on x86-64, where int is 32 bits and pointers are 64 bits) and allows the sign trick.
A container that grows large enough to need dynamic allocation and then shrinks should keep using the dynamic space: it's cheaper to grow again, and it avoids copying back into the internal storage. (Unless the caller uses a function to release unused excess storage; then copy back.)
A move constructor should probably keep allocation as-is, but a copy constructor should copy into the internal buffer if possible instead of allocating new dynamic storage.
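A minimal sketch of how construction and destruction could use that convention, written inside the class above (growth, copying and exception safety omitted; assumes n fits in an int):
my_container(size_t n) : used_size(0) {
    if (n <= internalsize) {
        data = internaldata;
        allocated_size = -static_cast<int>(n);   // negative marks the internal buffer
    } else {
        data = new T[n];
        allocated_size = static_cast<int>(n);    // positive marks heap storage
    }
}

~my_container() {
    if (allocated_size > 0)
        delete[] data;                           // never delete[] the internal array
}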

Malloc vs New for Primitives

I understand the benefits of using new over malloc in C++. But for specific cases such as primitive data types (non-array) - int, float etc., is it faster to use malloc than new?
It is always advisable to use new even for primitives if we are allocating an array, so that we can use delete[].
But for non-array allocation, I think there wouldn't be any constructor call for int? new allocates memory, checks that the allocation succeeded, and then calls the constructor. So, just for primitive non-array heap allocation, is it better to use malloc than new?
Please advise.
Never use malloc in C++. Never use new unless you are implementing a low-level memory management primitive.
The recommendation is:
Ask yourself: "do I need dynamic memory allocation?". A lot of times you might not need it - prefer values to pointers and try to use the stack.
If you do need dynamic memory allocation, ask yourself "who will own the allocated memory/object?".
If you only need a single owner (which is very likely), you should use std::unique_ptr. It is a zero-cost abstraction over new/delete. (A different deallocator can be specified.)
If you need shared ownership, you should use std::shared_ptr. This is not a zero cost abstraction, as it uses atomic operations and an extra "control block" to keep track of all the owners.
If you are dealing with arrays in particular, the Standard Library provides two powerful and safe abstractions that do not require any manual memory management:
std::array<T, N>: a fixed array of N elements of type T.
std::vector<T>: a resizable array of elements of type T.
std::array and std::vector should cover 99% of your "array needs".
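A quick sketch of those recommendations (Widget is a placeholder type):
#include <memory>
#include <vector>

struct Widget { int x = 0; };

void demo() {
    Widget stack_w;                             // 1. often no dynamic allocation is needed at all
    auto owner  = std::make_unique<Widget>();   // 2. single owner, freed automatically
    auto shared = std::make_shared<Widget>();   // 3. shared ownership, reference-counted
    std::vector<Widget> widgets(10);            // 4. resizable array, no manual memory management
}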
One more important thing: the Standard Library provides std::make_unique and std::make_shared, which should always be used to create smart pointer instances. There are a few good reasons:
Shorter - no need to repeat the T (e.g. std::unique_ptr<T>{new T}), no need to use new.
More exception safe. They prevent a potential memory leak caused by the lack of a well-defined order of evaluation in function calls. E.g.
f(std::shared_ptr<int>(new int(42)), g())
Could be evaluated in this order:
new int(42)
g()
std::shared_ptr<int> constructor
If g() throws, the int is leaked. (C++17 no longer allows this interleaving of argument evaluations, but make_shared remains the clearer choice.)
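With make_shared the allocation and the ownership transfer happen in a single call, so there is no window for a leak:
f(std::make_shared<int>(42), g());   // no leak even if g() throws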
More efficient (in terms of run-time speed). This only applies to std::make_shared - using it instead of std::shared_ptr directly allows the implementation to perform a single allocation both for the object and for the control block.
You can find more information in this question.
It can still be necessary to use malloc and free in C++ when you are interacting with APIs specified using plain C, because it is not guaranteed to be safe to use free to deallocate memory allocated with operator new (which is ultimately what all of the managed memory classes use), nor to use operator delete to deallocate memory allocated with malloc.
A typical example is POSIX getline (not to be confused with std::getline): it takes a pointer to a char * variable; that variable must point to a block of memory allocated with malloc (or it can be NULL, in which case getline will call malloc for you); when you are done calling getline you are expected to call free on that variable.
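A minimal sketch of that contract on a POSIX system (error handling trimmed):
#include <cstdio>
#include <cstdlib>

void read_lines(std::FILE* f) {
    char*  line = nullptr;   // getline will malloc (and realloc) this buffer
    size_t cap  = 0;
    while (getline(&line, &cap, f) != -1) {
        // ... use line ...
    }
    std::free(line);         // must be free, not delete[]
}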
Similarly, if you are writing a library, it can make sense to use C++ internally but define an extern "C" API for your external callers, because that gives you better binary interface stability and cross-language interoperability. And if you return heap-allocated POD objects to your callers, you might want to let them deallocate those objects with free; they can't necessarily use delete, and making them call YourLibraryFree when there are no destructor-type operations needed is unergonomic.
It can also still be necessary to use malloc when implementing resizable container objects, because there is no equivalent of realloc for operator new.
But as the other answers say, when you don't have this kind of interface constraint tying your hands, use one of the managed memory classes instead.
It's always better to use new. If you use malloc you still have to check manually whether the allocation succeeded.
In modern C++ you can use smart pointers. With make_unique and make_shared you never call new explicitly. std::unique_ptr is no bigger than the underlying raw pointer, and the overhead of using it is minimal.
The answer to "should I use new or malloc" is the single responsibility principle:
Resource management should be done by a type that has that as its sole purpose.
Those classes already exist, such as unique_ptr, vector, etc.
Directly using either malloc or new is a cardinal sin.
zwol's answer already gives the right answer on the correctness front: use malloc()/free() only when interacting with C interfaces.
I'm not going to repeat those details, I'm going to answer the performance question.
The truth is that the performance of malloc() and new can and does differ. When you perform an allocation with new, the memory will generally be allocated via a call to the global operator new() function, which is distinct from malloc(). It is trivial to implement operator new() by calling through to malloc(), but this is not necessarily done.
As a matter of fact, I've seen a system where an operator new() that calls through to malloc() would outperform the standard implementation of operator new() by roughly 100 CPU cycles per call. That's definitely a measurable difference, and a clear indication that the standard implementation does something very different from malloc().
So, if you are worried about performance, there are three things to do:
Measure your performance.
Write replacement implementations for the global operator new() function and its friends.
Measure your performance and compare.
The gains/losses may or may not be significant.
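For the "write a replacement" step, a minimal sketch of a malloc-backed global operator new (replacing these affects the whole program; a fully conforming version would also loop on std::get_new_handler()):
#include <cstdlib>
#include <new>

void* operator new(std::size_t size) {
    if (size == 0)
        size = 1;                      // every allocation must return a distinct pointer
    if (void* p = std::malloc(size))
        return p;
    throw std::bad_alloc{};            // simplified: no new_handler retry loop
}

void operator delete(void* p) noexcept {
    std::free(p);
}

void operator delete(void* p, std::size_t) noexcept {
    std::free(p);                      // sized delete, forwarded to the same path
}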

Why is ::operator new[] necessary when ::operator new is enough?

As we know, the C++ standard defines two forms of global allocation functions:
void* operator new(size_t);
void* operator new[](size_t);
And also, the draft C++ standard (18.6.1.2 n3797) says:
227) It is not the direct responsibility of operator new or operator delete to note the repetition count or element size of the array. Those operations are performed elsewhere in the array new and delete expressions. The array new expression may, however, increase the size argument to operator new to obtain space to store supplemental information.
What makes me confused is:
What if we remove void* operator new[](size_t); from the standard, and just use void* operator new(size_t) instead? What's the rationale to define a redundant global allocation function?
I think ::operator new[] may have been useful for fairly specialized systems where "big but few" arrays might be allocated by a different allocator than "small but numerous" objects. However, it's currently something of a relic.
operator new can reasonably expect that an object will be constructed at the exact address returned, but operator new[] cannot. The first bytes of the allocation block might be used for a size "cookie", the array might be sparsely initialized, etc. The distinction becomes more meaningful for member operator new, which may be specialized for its particular class.
In any case, ::operator new[] cannot be very essential, because std::vector (via std::allocator), which is currently the most popular way to obtain dynamic arrays, ignores it.
In modern C++, custom allocators are generally a better choice than customized operator new. Actually, new expressions should be avoided entirely in favor of container (or smart-pointer, etc) classes, which provide more exception safety.
::operator new[] and ::operator delete[] facilitate memory usage debugging, being a central point to audit allocation and deallocation; you can then ensure the array form is used for both or neither.
There are also lots of plausible if highly unusual/crude tuning uses:
allocate arrays from a separate pool, perhaps because doing so crucially improves average cache hits for small single-object dynamic allocations,
different memory access hints (à la madvise) for array vs. non-array data
All that's a bit weird and outside the day-to-day concerns of 99.999% of programmers, but why prevent it being possible?
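As a concrete illustration of the auditing idea above, a crude logging hook might look like this (a sketch only; real code would also need the sized, aligned, and nothrow overloads):
#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new[](std::size_t n) {
    void* p = std::malloc(n ? n : 1);
    if (!p)
        throw std::bad_alloc{};
    std::fprintf(stderr, "new[]    %zu bytes -> %p\n", n, p);
    return p;
}

void operator delete[](void* p) noexcept {
    std::fprintf(stderr, "delete[] %p\n", p);
    std::free(p);
}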
The standard (n3936) makes it clear that these two operators serve different but related purposes.
A new expression calls the function void* operator new(std::size_t). Its argument is exactly the size of the object being created, and it returns a block of storage suitably aligned, which may be somewhat larger than required.
A new[] expression calls the function void* operator new[](std::size_t). Its argument may be larger than the size of the array itself, to provide extra storage space if required for array bookkeeping. The default implementation of both is to simply call malloc().
The purpose of operator new[] is to support specialised array indexing, if available. It has nothing to do with memory pools or anything else. In a conforming implementation that made use of this feature, the implementation would set up specialised tables in the extra space and the compiler would generate code for instructions or calls to library support routines that made use of those tables. C++ code using arrays and failing to use new[] would fail on those platforms.
I am not personally aware of any such implementation, but it resembles the kind of features required for the support of certain mainframes (CDC, IBM, etc) which have an architecture quite unlike the Intel or RISC chips we know and love.
In my opinion, the accepted answer is incorrect.
Just for completeness, the standard (n3936 mostly in S5.3.4) contains the following.
A distinction between allocating an 'array object' or a 'non-array object'
References to 'array allocation overhead', with the implication that extra storage might be needed and it might (somehow) be used for a repetition count or element size.
There is no reference to memory pools or any hint that this might be a consideration.
I'm sure there are proper use-cases out there that require separate new[] and new, but I haven't encountered one yet that is uniquely possible with this separation and nothing else.
However, I see it like this: since the user calls different versions of operator new, the C++ standard would have been guilty of wantonly and deliberately losing information if they'd defined just one operator new and had both new and new[] forward there. There is (literally) one bit of information here, that might be useful to somebody, and I don't think people on the committee could have thrown it out in good conscience!
Besides, having to implement the extra new[] is a very very minor inconvenience to the rest of us, if at all, so the trade off of preserving a single bit of information wins against having to implement a single simple function in a small fraction of our programs.
The C++ Programming Language: Special Edition p 423 says
The operator new() and operator delete() functions allow a user to take over allocation and deallocation of individual objects; operator new[]() and operator delete[]() serve exactly the same role for the allocation and deallocation of arrays.
Thanks Tony D for correcting my misunderstanding of this nuance.
Wow, it's not often I'm caught out on something in C++ I'm so certain about - I must have been spending too much time in Objective-C!
original wrong answer
It's simple - the new[] form invokes the constructor on every element of a classic C array.
So it first allocates the space for all the objects, then iterates calling the constructor for each slot.

Malloc vs new -- different padding

I'm reviewing someone else's C++ code for our project that uses MPI for high-performance computing (10^5 - 10^6 cores). The code is intended to allow for communications between (potentially) different machines on different architectures. He's written a comment that says something along the lines of:
We'd normally use new and delete, but here I'm using malloc and free. This is necessary because some compilers will pad the data differently when new is used, leading to errors in transferring data between different platforms. This doesn't happen with malloc.
This does not fit with anything I know from standard new vs malloc questions.
What is the difference between new/delete and malloc/free? hints at the idea that the compiler could calculate the size of an object differently (but then why does that differ from using sizeof?).
malloc & placement new vs. new is a fairly popular question but only talks about new using constructors where malloc doesn't, which isn't relevant to this.
how does malloc understand alignment? says that memory is guaranteed to be properly aligned with either new or malloc which is what I'd previously thought.
My guess is that he's misdiagnosed his own bug some time in the past and deduced that new and malloc give different amounts of padding, which I think probably isn't true. But I can't find the answer with Google or in any previous question.
Help me, StackOverflow, you're my only hope!
IIRC there's one picky point. malloc is guaranteed to return an address aligned for any standard type. ::operator new(n) is only guaranteed to return an address aligned for any standard type no larger than n, and if T isn't a character type then new T[n] is only required to return an address aligned for T.
But this is only relevant when you're playing implementation-specific tricks like using the bottom few bits of a pointer to store flags, or otherwise relying on the address to have more alignment than it strictly needs.
It doesn't affect padding within the object, which necessarily has exactly the same layout regardless of how you allocated the memory it occupies. So it's hard to see how the difference could result in errors transferring data.
Is there any sign what the author of that comment thinks about objects on the stack or in globals, whether in his opinion they're "padded like malloc" or "padded like new"? That might give clues to where the idea came from.
Maybe he's confused, but maybe the code he's talking about is more than a straight difference between malloc(sizeof(Foo) * n) vs new Foo[n]. Maybe it's more like:
malloc((sizeof(int) + sizeof(char)) * n);
vs.
struct Foo { int a; char b; };
new Foo[n];
That is, maybe he's saying "I use malloc", but means "I manually pack the data into unaligned locations instead of using a struct". Actually malloc is not needed in order to manually pack the data, but failing to realize that is a lesser degree of confusion. It is necessary to define the data layout sent over the wire manually, because different implementations will pad a struct differently.
Your colleague may have had new[]/delete[]'s magic cookie in mind (the bookkeeping information the implementation uses when deleting an array). However, this would not have been a problem as long as the transfer used the address returned by the new[] expression (as opposed to the address returned by the underlying allocator).
Packing seems more probable. Variations in ABIs could, for example, result in a different number of trailing padding bytes at the end of a structure (this is influenced by alignment; also consider arrays). With malloc, the position of each field can be specified manually and is thus more easily made portable to a foreign ABI. These variations are normally prevented by specifying the alignment and packing of transfer structures.
The layout of an object can't depend on whether it was allocated using malloc or new. They both return the same kind of pointer, and when you pass this pointer to other functions they won't know how the object was allocated. sizeof *ptr is just dependent on the declaration of ptr, not how it was assigned.
I think you are right. Padding is done by the compiler, not by new or malloc. Padding considerations would apply even if you declared an array or struct without using new or malloc at all. In any case, while I can see how different implementations of new and malloc could cause problems when porting code between platforms, I completely fail to see how they could cause problems transferring data between platforms.
When I want to control the layout of my plain-old-data structure, with MS Visual compilers I use #pragma pack(1). I suppose such a preprocessor directive is supported by most compilers, like for example gcc.
This has the consequence of aligning all fields of the structures one behind the other, without empty spaces.
If the platform on the other end does the same (i.e. compiles its data-exchange structure with a padding of 1), then the data retrieved on both sides just fits.
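For example (MSVC, gcc and clang all accept the push/pop form; WireSample is a hypothetical transfer struct, and the static_assert documents the wire size):
#include <cstdint>

#pragma pack(push, 1)
struct WireSample {
    std::uint32_t id;      // 4 bytes
    std::uint8_t  flags;   // 1 byte; no trailing padding under pack(1)
};
#pragma pack(pop)

static_assert(sizeof(WireSample) == 5, "unexpected padding in wire struct");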
Thus I have never had to play with malloc in C++.
At worst I would have considered overloading the new operator so as it performs some tricky things, rather than using malloc directly in C++.
This is my wild guess of where this thing is coming from. As you mentioned, the problem is with data transmission over MPI.
Personally, for complicated data structures that I want to send/receive over MPI, I always implement serialization/deserialization methods that pack/unpack the whole thing into/from an array of chars. Now, due to padding, the size of the structure can be larger than the sum of its members' sizes, so one also needs to calculate the unpadded size of the data structure to know how many bytes are being sent/received.
For instance, if you want to send/receive std::vector<Foo> A over MPI with the said technique, it is wrong to assume the size of the resulting array of chars is A.size()*sizeof(Foo) in general. In other words, each class that implements serialize/deserialize methods should also implement a method that reports the size of the packed array (or better yet, store the array in a container). This might be the reason behind such a bug. One way or another, however, it has nothing to do with new vs malloc, as pointed out in this thread.
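A sketch of that packing idea (Foo is a stand-in type; the packed size is computed per field, never with sizeof(Foo)):
#include <cstring>
#include <vector>

struct Foo {
    double x;     // 8 bytes on the wire
    char   tag;   // 1 byte on the wire; sizeof(Foo) is typically 16 due to padding
};

// packed size per element: 9 bytes, not sizeof(Foo)
constexpr size_t packed_size = sizeof(double) + sizeof(char);

std::vector<char> serialize(const std::vector<Foo>& a) {
    std::vector<char> buf(a.size() * packed_size);
    char* out = buf.data();
    for (const Foo& f : a) {
        std::memcpy(out, &f.x,   sizeof f.x);   out += sizeof f.x;
        std::memcpy(out, &f.tag, sizeof f.tag); out += sizeof f.tag;
    }
    return buf;   // buf.size() is what you send, e.g. over MPI
}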
In C++, the new keyword is used to allocate memory for an object of a particular type. For example, you have defined some class or struct and you want to allocate memory for an object of it.
myclass *my = new myclass();
or
int *i = new int(2);
But in all cases you need a defined data type (class, struct, union, int, char, etc.), and only as many bytes as are required for an object/variable of that type will be allocated (i.e. a multiple of that type's size).
But with malloc(), you can allocate any number of bytes, and you don't always need to specify the data type. You can observe this in a few possibilities of malloc():
void *v = malloc(23);
or
void *x = malloc(sizeof(int) * 23);
or
char *c = (char*)malloc(sizeof(char)*35);
malloc is a function,
while new is an operator in C++.
In C++, if we use malloc we must cast the returned void* to the target pointer type, otherwise the compiler gives an error;
if we use new to allocate the memory, no cast is needed.