I am using malloc_stats() to print malloc-related statistics. For some programs the output shows only "Arena 0", while for others it shows both "Arena 0" and "Arena 1".
What do these arenas represent?
The heap code resides inside the glibc component, and is packaged in the libc.so.x shared library. The current implementation of the heap uses multiple independent sub-heaps called arenas. Each arena has its own mutex for concurrency protection. Thus, if there are sufficient arenas within a process' heap, and a mechanism to distribute the threads' heap accesses evenly between them, then the potential for contention for the mutexes should be minimal. It turns out that this works well for allocations. In malloc(), a test is made to see if the mutex for the current target arena for the current thread is free (trylock). If so, the arena is locked and the allocation proceeds. If the mutex is busy, then each remaining arena is tried in turn and used if its mutex is not busy. In the event that no arena can be locked without blocking, a fresh new arena is created. This arena by definition is not already locked, so the allocation can now proceed without blocking. Lastly, the ID of the arena last used by a thread is retained in thread-local storage, and subsequently used as the first arena to try when malloc() is next called by that thread. Therefore all calls to malloc() will proceed without blocking.
It looks like the heap is a collection of arenas ("sub-heaps") that spreads memory allocation across several threads, thus reducing contention.
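The selection procedure described in the quoted text can be sketched as follows. This is a minimal illustration of the described steps, not glibc's actual code: `Arena`, `ArenaList`, `acquire`, `release`, and `demo_arena_count` are hypothetical names, and the caller is assumed to keep `last_used` in thread-local storage.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a sub-heap; only the lock matters here.
struct Arena {
    std::mutex lock;
    Arena* next = nullptr;  // never changes once the arena is published
};

class ArenaList {
public:
    ~ArenaList() {
        for (Arena* a = head_.load(); a != nullptr;) {
            Arena* next = a->next;
            delete a;
            a = next;
        }
    }

    // Returns an arena whose mutex is held by the calling thread.
    // `last_used` is the arena this thread locked on its previous call.
    Arena* acquire(Arena*& last_used) {
        // Step 1: try the arena this thread used last time.
        if (last_used != nullptr && last_used->lock.try_lock()) return last_used;
        // Step 2: try each remaining arena in turn, without blocking.
        for (Arena* a = head_.load(); a != nullptr; a = a->next) {
            if (a != last_used && a->lock.try_lock()) {
                last_used = a;
                return a;
            }
        }
        // Step 3: every arena was busy, so create one nobody else can hold.
        Arena* fresh = new Arena;
        fresh->lock.lock();
        fresh->next = head_.load();
        while (!head_.compare_exchange_weak(fresh->next, fresh)) {
            // On failure fresh->next is reloaded with the current head; retry.
        }
        last_used = fresh;
        return fresh;
    }

    void release(Arena* a) { a->lock.unlock(); }

    std::size_t count() const {
        std::size_t n = 0;
        for (Arena* a = head_.load(); a != nullptr; a = a->next) ++n;
        return n;
    }

private:
    std::atomic<Arena*> head_{nullptr};
};

// While the main thread holds one arena, a second thread cannot lock it
// and therefore ends up creating a second arena.
std::size_t demo_arena_count() {
    ArenaList arenas;
    Arena* main_last = nullptr;
    Arena* a = arenas.acquire(main_last);
    Arena* b = nullptr;
    std::thread worker([&] {
        Arena* worker_last = nullptr;
        b = arenas.acquire(worker_last);
        arenas.release(b);
    });
    worker.join();
    arenas.release(a);
    assert(a != b);
    return arenas.count();
}
```

In this sketch a single-threaded program never gets past the first arena, while a program whose threads allocate at the same time ends up with more than one, which is one plausible reason for seeing "Arena 1" in some programs only.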
In certain malloc implementations, an "arena" is a pool of memory from which individual allocations are made. The algorithms to determine which arena is used will differ between implementations, so it's not possible for us to explain why you see a difference. One common factor is allocation size.
Everything is there: http://www.gnu.org/software/libc/manual/html_node/Statistics-of-Malloc.html
int arena
This is the total size of memory allocated with sbrk by malloc, in bytes.
Related
Imagine that I use C++11 threads. The thread runs a function that calls malloc. After that, I call join without freeing the memory, so the thread is gone. Is the memory expected to be freed automatically?
No, it is not. The memory is freed only after the whole application is terminated. The whole benefit of using multiple threads (as opposed to processes) is that they share the same memory, so they collectively own all the memory allocated in one of them.
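A small sketch of this behaviour, using hypothetical helper names (`allocate_in_worker`, `demo_memory_outlives_thread`): the worker allocates with malloc and exits, and the block stays valid until some other thread frees it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <thread>

// Hypothetical helper: a worker thread allocates a buffer, fills it,
// and exits without freeing it. The caller now owns the memory.
char* allocate_in_worker() {
    char* buffer = nullptr;
    std::thread worker([&buffer] {
        buffer = static_cast<char*>(std::malloc(16));
        if (buffer != nullptr) std::strcpy(buffer, "still here");
    });
    worker.join();  // the thread has finished, the allocation has not gone away
    return buffer;
}

// The joining thread can still read the block and must free it itself.
bool demo_memory_outlives_thread() {
    char* p = allocate_in_worker();
    assert(p != nullptr);
    bool intact = std::strcmp(p, "still here") == 0;
    std::free(p);  // without this call the block leaks until process exit
    return intact;
}
```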
Heap (HeapAlloc) Corruption in release mode only
I can only guess, but why don't you check the result of HeapFree? According to the documentation, that could be the cause. Try using the HEAP_NO_SERIALIZE flag when you allocate the heap.
You should not refer in any way to memory that has been freed by HeapFree. After that memory is freed, any information that may have been in it is gone forever. If you require information, do not free memory containing the information. Function calls that return information about memory (such as HeapSize) may not be used with freed memory, as they may return bogus data. Calling HeapFree twice with the same pointer can cause heap corruption, resulting in subsequent calls to HeapAlloc returning the same pointer twice.
Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. There is a small performance cost to serialization, but it must be used whenever multiple threads allocate and free memory from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap. The HEAP_NO_SERIALIZE value can, therefore, be safely used only in the following situations:
The process has only one thread.
The process has multiple threads, but only one thread calls the heap functions for a specific heap.
The process has multiple threads, and the application provides its own mechanism for mutual exclusion to a specific heap.
How can I free the virtual memory that is left over after calling TerminateThread? Can it be done via VirtualFree, and if so, how? I fully understand the "dangers" of TerminateThread.
In an unmanaged process, there's no realistic way to tidy up memory from the outside.
Memory can be allocated in many different ways. Ultimately it all starts with calls to VirtualAlloc, VirtualAllocEx etc. But in practice runtime libraries invariably use sub-allocating heap managers. These heap allocators get memory by calling VirtualAlloc, and then hand out sub-blocks. Heap managers are also generally shared between the threads in a process. So from the outside you have no way of knowing how to free those sub-blocks.
And even if we did not have sub-allocators, how could you know which blocks handed out by VirtualAlloc you were allowed to destroy? A thread may allocate memory with a call to VirtualAlloc and require that the memory outlives the allocating thread and is destroyed by another thread.
But if you are happy to let all of that go, and just want the stack to be destroyed (as per your comments), then this article shows you how to do so with RtlFreeUserThreadStack: http://www.nicklowe.org/2012/01/thread-termination-dont-leak-the-stack/
I am planning to write a C++ networked application where:
I use a single thread to accept TCP connections and also to read data from them. I am planning to use epoll/select to do this. The data is written into buffers that are allocated using some arena allocator say jemalloc.
Once there is enough data from a single TCP client to form a protocol message, the data is published on a ring buffer. The ring buffer structures contain the fd for the connection and a pointer to the buffer containing the relevant data.
A worker thread processes entries from the ring buffers and sends some result data to the client. After processing each event, the worker thread frees the actual data buffer to return it to the arena allocator for reuse.
I am leaving out details on how the publisher makes data written by it visible to the worker thread.
So my question is: Are there any allocators which optimize for this kind of behavior i.e. allocating objects on one thread and freeing on another?
I am worried specifically about having to use locks to return memory to an arena that is not the freeing thread's own arena. I am also worried about false sharing, since the producer thread and the worker thread will both write to the same region. It seems that neither jemalloc nor tcmalloc optimizes for this.
Before you go down the path of implementing a highly optimized allocator for your multi-threaded application, you should first just use the standard new and delete operators for your implementation. After you have a correct implementation of your application, you can move to address bottlenecks that are discovered through profiling it.
If you get to the stage where it is obvious that the standard new and delete allocators are a bottleneck to the application, the following is the approach I have used:
Assumption: The number of threads is fixed, and the threads are statically created.
Each thread has its own arena.
Each object taken from an arena has a reference back to the arena it came from.
Each arena has a separate garbage list for each thread.
When a thread frees an object, it goes back to the arena it came from, but is placed in the thread-specific garbage list.
The thread that actually owns the arena treats its garbage list as the real free list.
Periodically, the thread that owns an arena performs a garbage collection pass to fold objects from the other threads' garbage lists into the real free list.
The "periodical" garbage collection pass doesn't necessarily have to be time based. A subset of the garbage could be reaped on every allocation and free, for example.
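The scheme above could look roughly like this. It is a sketch under several assumptions that are not part of the original answer: fixed-size objects, thread ids passed in explicitly, a collection pass that runs only when the free list is empty, and lock-free garbage lists that the owner drains with a single atomic exchange. All names (`Node`, `Arena`, `release`, `demo_cross_thread_free`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Arena;

struct Node {
    Arena* owner = nullptr;  // back-reference to the arena this object came from
    Node* next = nullptr;    // intrusive link for the free and garbage lists
    char payload[64];        // assumed fixed object size
};

struct Arena {
    Arena(std::size_t num_threads, std::size_t capacity)
        : garbage(num_threads), storage(capacity) {
        for (auto& g : garbage) g.store(nullptr);
        for (auto& n : storage) {
            n.owner = this;
            n.next = free_list;
            free_list = &n;
        }
    }

    // Called only by the thread that owns this arena.
    Node* allocate() {
        if (free_list == nullptr) collect();
        Node* n = free_list;
        if (n != nullptr) free_list = n->next;
        return n;
    }

    // Folds every thread's garbage list into the real free list.
    void collect() {
        for (auto& g : garbage) {
            Node* list = g.exchange(nullptr);  // take the whole list at once
            while (list != nullptr) {
                Node* next = list->next;
                list->next = free_list;
                free_list = list;
                list = next;
            }
        }
    }

    std::vector<std::atomic<Node*>> garbage;  // one garbage list per thread
    std::vector<Node> storage;                // backing memory for the objects
    Node* free_list = nullptr;                // touched by the owner only
};

// Called by any thread: push the object onto the caller's own garbage list
// inside the arena the object came from.
void release(Node* n, std::size_t thread_id) {
    std::atomic<Node*>& head = n->owner->garbage[thread_id];
    Node* old_head = head.load();
    do {
        n->next = old_head;
    } while (!head.compare_exchange_weak(old_head, n));
}

// Thread 0 owns the arena and allocates; thread 1 frees.
bool demo_cross_thread_free() {
    Arena arena(2, 2);  // 2 threads, room for 2 objects
    Node* a = arena.allocate();
    Node* b = arena.allocate();
    assert(a != nullptr && b != nullptr);
    assert(arena.allocate() == nullptr);  // exhausted, garbage lists are empty
    std::thread consumer([&] {
        release(a, 1);
        release(b, 1);
    });
    consumer.join();
    return arena.allocate() != nullptr;  // the collection pass reclaims the garbage
}
```

Because each garbage list has exactly one writer besides the owner, the only shared write is the list head, so contention is limited to one freeing thread against the owning thread.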
The best way to deal with memory allocation and deallocation issues is to not deal with it.
You mention a ring buffer. Those are usually a fixed size. If you can come up with a fixed maximum size for your protocol messages you can allocate all the memory you will ever need at program start. When deallocating, keep the memory but reset it to a fresh state.
Now, your program may need to allocate and deallocate memory while dealing with each message but that will be done in each thread and cross-thread issues will not come into play.
This can work even if your message maximum size is too large to preallocate if you can allocate the amount of memory that most messages will use and have handlers for allocating more when necessary.
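A minimal sketch of the preallocation idea, assuming a hypothetical maximum message size of 1024 bytes. The names `MessageBuffer`, `BufferPool`, and `demo_pool_reuse` are illustrative only. The pool itself is not thread-safe; in the design discussed here the hand-off between threads would go through the ring buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxMessageSize = 1024;  // assumed protocol maximum

struct MessageBuffer {
    std::array<char, kMaxMessageSize> data{};
    std::size_t length = 0;
    void reset() { length = 0; }  // return the buffer to a fresh state
};

class BufferPool {
public:
    // All memory is allocated once, at construction time.
    explicit BufferPool(std::size_t count) : buffers_(count) {
        free_.reserve(count);
        for (auto& b : buffers_) free_.push_back(&b);
    }

    // Returns nullptr when every buffer is in use.
    MessageBuffer* acquire() {
        if (free_.empty()) return nullptr;
        MessageBuffer* b = free_.back();
        free_.pop_back();
        return b;
    }

    // "Deallocation" keeps the memory and only resets its state.
    void release(MessageBuffer* b) {
        b->reset();
        free_.push_back(b);
    }

private:
    std::vector<MessageBuffer> buffers_;
    std::vector<MessageBuffer*> free_;
};

bool demo_pool_reuse() {
    BufferPool pool(2);
    MessageBuffer* a = pool.acquire();
    MessageBuffer* b = pool.acquire();
    assert(a != nullptr && b != nullptr && a != b);
    assert(pool.acquire() == nullptr);  // pool exhausted, nothing is allocated
    a->length = 10;
    pool.release(a);
    MessageBuffer* c = pool.acquire();
    return c == a && c->length == 0;  // same memory, fresh state
}
```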
Let's say I have allocated some memory in my background thread, that is, the thread's stack is holding a pointer to that memory. Now I want to terminate the background thread by calling pthread_cancel on it. Will that memory be released or not? (My platform is iOS, compiler is gcc 4.2)
Each thread by necessity requires its own stack; however there is typically only one heap per process. When a thread is destroyed, there is no automatic mechanism to free the memory allocated on the heap. All you end up with is a memory leak.
As a general rule, avoid using pthread_cancel, since it is hard to ensure that it will run safely. Rather, build in some mechanism by which you can pass a message to the thread telling it to destroy itself (after freeing any resources that it owns).
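One way such a mechanism might look, sketched with standard C++ threads rather than raw pthreads: the "message" is a hypothetical atomic stop flag that the worker polls, and the worker frees its own buffer before returning. `WorkerState`, `worker`, and `run_and_stop_worker` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <thread>

struct WorkerState {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> buffer_freed{false};
};

void worker(WorkerState& state) {
    void* buffer = std::malloc(4096);  // resource owned by this thread
    while (!state.stop_requested.load()) {
        // Real work would go here; the sketch just waits for the request.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::free(buffer);  // the thread cleans up after itself before exiting
    state.buffer_freed.store(true);
}

bool run_and_stop_worker() {
    WorkerState state;
    std::thread t(worker, std::ref(state));
    state.stop_requested.store(true);  // ask the thread to finish, do not kill it
    t.join();
    return state.buffer_freed.load();
}
```

The thread only exits at a point of its own choosing, so it never dies while holding a lock or owning memory that nobody else knows about.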
The thread stack is removed once the thread exits, but no process or code will look through that stack and release the heap objects it references. Also, a thread stack does not typically hold long-lived references to memory. It is an independent space given to the thread for use as a generic program stack, and a reference stays on it only for as long as you are inside the function that pushed it there, typically because a local variable refers to it.
By default, no. See the other answers, which address your question more specifically. There is, however, such a thing as a thread-specific allocator; if you were using one, you'd know.
No, it won't be deleted or freed automatically. If you're very lucky, it might be garbage collected sometime if you're running a collector. File handles, shared memory ids, mutexes etc. won't be released/deallocated either. Async cancellation is safe for e.g. pure maths calculations on data still owned by another thread, but very risky in general - that's why some threading APIs have experimented with and removed the function completely.