This is more of a general question than a specific coding problem. Does C++ prevent multiple threads from trying to allocate the same memory addresses (and if so, how)?
For example:
#include <vector>
#include <thread>

int main() {
    std::vector<int> x, y;
    std::thread do_work([&x]() {
        /* push_back a lot of ints to x */
    });
    /* push_back a lot of ints to y */
    do_work.join();
    /* do more stuff */
}
When the two vectors allocate memory because they reached their capacity, is it possible that both vectors try to allocate the same piece of memory since the heap is shared among threads? Is code like this unsafe because it potentially creates races?
Memory allocation (via malloc/new/HeapAlloc and such) is thread-safe by default, as long as you've compiled your application against thread-safe runtimes (which you will have, unless you explicitly change that).
Each vector will get its own slice of memory whenever they resize, but once a slice is freed any (other) thread could end up getting it the next time an allocation occurs.
You could, however, mess things up if you replace your allocators, e.g. if you overload operator new and are no longer getting memory from a thread-safe source. https://en.cppreference.com/w/cpp/memory/new/operator_new
You could also opt into a non-thread-safe version of malloc by replacing it via some sort of library preload on Linux (overriding malloc using the LD_PRELOAD mechanism).
But assuming you're not doing any of that, the default implementation of user-space allocators (new/malloc) are thread safe, and OS level allocators (VirtualAlloc/mmap) are always thread safe.
I have this code:
#include <cstdlib>

int main()
{
    int x = 1000;
    //~600kb memory at this point
    auto m = (void**)malloc(x * sizeof(void*));
    for (int i = 0; i < x; i++)
        m[i] = malloc(x * sizeof(void*));
    for (int i = 0; i < x; i++)
        free(m[i]);
    free(m);
    //~1700kb memory at this point
    return 0;
}
When program starts memory consumption is about ~600kb, and when it ends ~1700kb. Is it memory leak or what?
malloc() acquires memory from the system using a variety of platform-specific methods. But free() does not necessarily always give memory back to the system.
One reason the behavior you're seeing might exist is that when you first call malloc() it will ask the system for a "chunk" of memory, say 1 MB as a hypothetical example. Subsequent calls will be fulfilled from that same 1 MB chunk until it is exhausted, then another chunk will be allocated.
There is little reason to immediately return all allocated memory to the system just because the application is no longer using it. Indeed, the most likely possibilities are that (a) the application requests more memory, in which case the recently freed pieces can be doled out again, or (b) the application terminates, in which case the system can efficiently clean up all its allocated memory in a single operation.
Is it memory leak or what?
No. Each malloc is matched by a free, so unless I'm missing something, you have no memory leak.
Note that what you see in the process manager is the memory assigned to the process. This is not necessarily equal to the memory actually in use by the process. The OS does not immediately reclaim it when you call free.
If you have more complex code, you can use a tool like valgrind to inspect it for leaks. Better still, avoid manual memory management entirely and use standard containers instead (e.g. std::vector for dynamic arrays).
free() and malloc() are part of your Standard Library and their behavior is implementation dependent. The standard does not require free() to release once acquired memory back to your Operating System.
How the memory is reserved is platform-specific. On POSIX systems the mmap() and munmap() system calls can be used.
Note that most operating systems implement paging and allocate memory to processes in chunks anyway, so releasing every single byte individually would only add performance overhead.
When you are running your application under the management of an operating system, memory allocation in languages like C/C++ is two-fold. First, there is the business of actually requesting memory to be mapped into the process by the operating system. This is done through an OS-specific call, and the application need not bother with it. On Linux this call is either sbrk or mmap, depending on several factors. However, this OS-requested memory is not directly accessible to C++ code; it has no control over it. Instead, the C++ runtime manages it.
The second thing is actually providing a usable pointer to C/C++ code. This is also done by runtime when C++ code asks for it.
When C/C++ code calls malloc (or new), the C++ runtime first determines whether it has enough contiguous memory under its management to hand a pointer to the application code. (This process is further complicated by efficiency optimizations inside malloc/new, which usually maintain several allocation arenas used depending on object sizes, but we won't go into that here.) If it does, the memory region is marked as used by the application and a pointer is returned to the code.
If there is no memory available, a chunk of memory is requested from the OS. The size of that chunk will normally be way bigger than what was asked for, because requesting memory from the OS is expensive!
When the memory is deleted/freed, the C++ runtime reclaims it. It might decide to return the memory to the OS (for example, if there is pressure on total memory consumption on the machine), or it can just keep it for future use. Often, memory is never returned to the OS until the process exits, so any OS-level tool will show the process consuming memory even though the code has delete'd/freed it all!
It is also worth noting that an application can usually request memory from the OS while bypassing the C++ runtime. For example, on Linux this could be done through the mmap call or, in bizarre cases, through sbrk (though in the latter case you won't be able to use runtime memory management at all afterwards). If you use those mechanisms, you will immediately see process memory consumption grow or shrink with every memory request or return.
Consider a large set of heap allocated objects used to transmit computation result from other threads. Destroying these can be an expensive task.
I am in a scenario where these objects are no longer needed and I just want them to be garbage-collected without blocking the main thread. I considered using std::async the following way to destroy them asynchronously:
#include <memory>
#include <future>
#include <vector>

struct Foo
{
    // Complex structure containing heap allocated data.
};

void f() {
    std::vector<std::unique_ptr<Foo>> graveyard;
    // Process some foo and add them in graveyard.
    std::async([fooMoved = std::move(graveyard)]{});
}
In the specific program I am working on, using Visual Studio, profiling seems to confirm that this is faster – with respect to the main thread at least – than just letting graveyard destroy its content at the end of f scope.
Is this kind of improvement a glitch specific to the conditions of my test or is it a reliable way to deallocate large set of heap data? Do you see any potential drawbacks? Is there a better way to perform a similar task?
In one of my projects, I identified a call to std::deque<T>::clear() as a major bottleneck.
I therefore decided to move this operation in a dedicated, low-priority thread:
template <class T>
void SomeClass::parallelClear(T& c)
{
    if (!c.empty())
    {
        T* temp = new T;
        c.swap(*temp); // swap contents (fast)
        // deallocate on separate thread
        boost::thread deleteThread([=]() { delete temp; });
        // Windows specific: lower the thread's priority
        SetThreadPriority(deleteThread.native_handle(), THREAD_PRIORITY_BELOW_NORMAL);
    }
}

void SomeClass::clear(std::deque<ComplexObject>& hugeDeque)
{
    parallelClear(hugeDeque);
}
This seems to work fine (VisualC++ 2010), but I wonder if I overlooked any major flaw. I would welcome your comments about the above code.
Additional information:
SomeClass::clear() is called from a GUI thread, and the user interface is unresponsive until the call returns. The hugeDeque, on the other hand, is unlikely to be accessed by that thread for several seconds after clearing.
This is only valid if you guarantee that access to the heap is serialized. Windows does serialize access to the primary heap by default, but it's possible to turn this behaviour off, and there's no guarantee that it holds across platforms or libraries. As such, I'd be careful depending on it: make sure it's explicitly documented that this depends on the heap being shared between threads and being thread-safe to access.
I personally would suggest that using a custom allocator matched to the allocation/deallocation pattern would be the best performance improvement here; remember that creating threads has non-trivial overhead.
Edit: If you are using GUI/Worker thread style threading design, then really, you should create, manage and destroy the deque on the worker thread.
Be aware that it is not certain this will improve the overall performance of your application. The Windows standard heap (including the low-fragmentation heap) is not designed for frequently transferring allocation ownership from one thread to another. It will work, but it may incur quite an overhead in processing.
The documentation of the Hoard memory allocator might be a starting point to dig deeper into that:
http://www.cs.umass.edu/~emery/hoard/hoard-documentation.html
Your approach will improve responsiveness, though.
In addition to the things mentioned by the other posters, you should consider whether the objects contained in the collection have thread affinity; for example, COM objects in a single-threaded apartment may not be amenable to this kind of trick.
I'm having some issues using pooled memory allocators for std::list objects in a multi-threaded application.
The part of the code I'm concerned with runs each thread function in isolation (i.e. there is no communication or synchronization between threads) and therefore I'd like to setup separate memory pools for each thread, where each pool is not thread-safe (and hence fast).
I've tried using a shared thread-safe singleton memory pool and found the performance to be poor, as expected.
This is a heavily simplified version of the type of thing I'm trying to do. A lot has been included in a pseudo-code kind of way, sorry if it's confusing.
/* The thread functor - one instance of make_quadtree created for each thread */
class make_quadtree
{
private:
    /* A non-thread-safe memory pool for int linked-list items; let's say it's
     * something along the lines of boost::object_pool
     */
    pooled_allocator<int> item_pool;

    /* The problem! - a local class that would be constructed within each std::list
     * as the allocator but really just delegates to item_pool
     */
    class local_alloc
    {
    public:
        //!! I understand that I can't access item_pool from within a nested class
        //!! like this; that's really my question - can I get something along these
        //!! lines to work??
        pointer allocate(size_t n) { return item_pool.allocate(n); }
    };

public:
    make_quadtree(): item_pool() // only construct one instance of item_pool per
                                 // make_quadtree object
    {
        /* The kind of data structures - vectors of linked lists.
         * The idea is that all of the linked lists should share a local pooled allocator.
         */
        std::vector<std::list<int, local_alloc>> lists;

        /* The actual operations - too complicated to show, but in general:
         *
         * - The vector lists is grown as a quadtree is built; its size is the
         *   number of quadtree "boxes"
         *
         * - Each element of lists (each linked list) represents the IDs of items
         *   contained within each quadtree box (say they're xy points); as the
         *   quadtree is grown, a lot of ID pop/push-ing between lists occurs,
         *   hence the memory pool is important for performance
         */
    }
};
So really my problem is that I'd like to have one memory pool instance per thread functor instance, but within each thread functor share the pool between multiple std::list objects.
Why not just construct an instance of local_alloc with a reference to make_quadtree?
A thread specific allocator is quite the challenge.
I spent some time looking for a thread-specific allocator "off the shelf". The best I found was Hoard (hoard.org). This provided a significant performance improvement; however, Hoard has some serious drawbacks:
- I experienced some crashes during testing
- Commercial licensing is expensive
- It 'hooks' system calls to malloc, a technique I consider dodgy
So I decided to roll my own thread-specific memory allocator, based on boost::pool and boost::thread_specific_ptr. This required a small amount of, IMHO, seriously advanced C++ code, but it now seems to be working well.
It is now quite a few months since I looked at the details of this, but perhaps I can take a look at it once more.
You commented that you are looking for a thread-specific, but not thread-safe, allocator. This makes sense: if the allocator is thread-specific, it does not need to be thread-safe. However, in my experience, the extra burden of being thread-safe is trivial, so long as contention does NOT occur.
However, all that theory is fun, but I think we should move on to practicalities. I believe that we need a small, instrumented stand-alone program that demonstrates the problem you need to solve. I had what sounded like a very similar problem with std::multiset allocation and wrote the program you can see here: Parallel reads from STL containers
If you can write something similar, showing your problem, then I can check if my thread specific memory allocator can be applied with advantage in your case.
Every process can use heap memory to store and share data within the process. We have a rule in programming: whenever we take some space in heap memory, we need to release it once the job is done, or else it leads to memory leaks.
int *pIntPtr = new int;
.
.
.
delete pIntPtr;
My question: Is heap memory per-process?
If YES,
then a memory leak is possible only while a process is running.
If NO,
then it means the OS is able to retain the data in memory somewhere. If so, is there a way for another process to access this memory? That could also become a means of inter-process communication.
I suppose the answer to my question is YES. Please provide your valuable feedback.
On almost every system currently in use, heap memory is per-process. On older systems without protected memory, heap memory was system-wide. (In a nutshell, that's what protected memory does: it makes your heap and stack private to your process.)
So in your example code on any modern system, if the process terminates before delete pIntPtr is called, the memory pointed to by pIntPtr will still be freed (though the destructor, not that an int has one, would not be called).
Note that protected memory is an implementation detail, not a feature of the C++ or C standards. A system is free to share memory between processes (modern systems just don't because it's a good way to get your butt handed to you by an attacker.)
In most modern operating systems each process has its own heap that is accessible by that process only and is reclaimed once the process terminates - that "private" heap is usually used by new. Also there might be a global heap (look at Win32 GlobalAlloc() family functions for example) which is shared between processes, persists for the system runtime and indeed can be used for interprocess communications.
Generally the allocation of memory to a process happens at a lower level than heap management.
In other words, the heap is built within the process virtual address space given to the process by the operating system and is private to that process. When the process exits, this memory is reclaimed by the operating system.
Note that C++ does not mandate this, this is part of the execution environment in which C++ runs, so the ISO standards do not dictate this behaviour. What I'm discussing is common implementation.
In UNIX, the brk and sbrk system calls were used to allocate more memory from the operating system to expand the heap. Then, once the process finished, all this memory was given back to the OS.
The normal way to get memory which can outlive a process is with shared memory (under UNIX-type operating systems, not sure about Windows). This can result in a leak but more of system resources rather than process resources.
There are some special purpose operating systems that will not reclaim memory on process exit. If you're targeting such an OS you likely know.
Most systems will not allow you to access the memory of another process, but again...there are some unique situations where this is not true.
The C++ standard deals with this situation by not making any claim about what will happen if you fail to release memory and then exit, nor what will happen if you attempt to access memory that isn't explicitly yours to access. This is the very essence of what "undefined behavior" means and is the core of what it means for a pointer to be "invalid". There are more issues than just these two, but these two play a part.
Normally the O/S will reclaim any leaked memory when the process terminates.
For that reason I reckon it's OK for C++ programmers to never explicitly free memory that is needed right up until the process exits; for example, 'singletons' within a process are often not explicitly freed.
This behaviour may be O/S-specific, though (although it's true for e.g. both Windows and Linux): not theoretically part of the C++ standard.
For practical purposes, the answer to your question is yes. Modern operating systems will generally release memory allocated by a process when that process is shut down. However, to depend on this behavior is a very shoddy practice. Even if we can be assured that operating systems will always function this way, the code is fragile. If some function that fails to free memory suddenly gets reused for another purpose, it might translate to an application-level memory leak.
Nevertheless, the nature of this question and the example posted compel me, ethically, to point you and your team toward RAII.
int *pIntPtr = new int;
...
delete pIntPtr;
This code reeks of memory leaks. If anything in [...] throws, you have a memory leak. There are several solutions:
int *pIntPtr = 0;
try
{
    pIntPtr = new int;
    ...
}
catch (...)
{
    delete pIntPtr;
    throw;
}
delete pIntPtr;
Second solution using nothrow (not necessarily much better than the first, but it allows sensible initialization of pIntPtr at the point where it is defined):
int *pIntPtr = new (std::nothrow) int;
if (pIntPtr)
{
    try
    {
        ...
    }
    catch (...)
    {
        delete pIntPtr;
        throw;
    }
    delete pIntPtr;
}
And the easy way:
scoped_ptr<int> pIntPtr(new int);
...
In this last and finest example, there is no need to call delete on pIntPtr, as this is done automatically regardless of how we exit the block (hurray for RAII and smart pointers).