How does GLIBC decide segment for malloc - gdb

I'm looking at a Linux system with glibc 2.25, and I see that when code calls malloc, sometimes the buffer is allocated in the heap segment and sometimes in an anonymous segment. It doesn't seem to be related to size; I can see all the segments in /proc/PID/maps.
I thought the heap segment corresponded to malloc and the anonymous segments to mmap. So why, for the same size, does glibc sometimes use the heap and sometimes use mmap?
I also noticed that sometimes when I call malloc in one thread the memory is allocated in the heap segment, but when I switch to another thread (using GDB) the memory is allocated in an anonymous segment.

glibc's malloc implementation will sometimes use brk or sbrk (what you're calling the heap; it shows up as [heap] in /proc/PID/maps) and sometimes use mmap. Which one it picks depends on some trade-offs, but generally:
if a process only needs a small amount of heap space, brk/sbrk is better;
if a process needs a lot of heap space and/or very large blocks, mmap is better.
So glibc's malloc implementation has a bunch of heuristics to decide what is 'small' and what is 'large', and looks at what calls have been made so far to malloc/free in order to decide which method to use to get more memory from the system when it needs it. The per-thread behavior you saw also fits this picture: glibc creates additional arenas for threads, and those secondary arenas are carved out of mmap'd regions, so allocations made from other threads show up in anonymous mappings rather than in [heap].
There's a function mallopt you can call that affects this tuning -- there's a bunch of info on the man page about it.


What Alloc API may call VirtualAlloc/reserve memory internally?

I am debugging a potential memory leak problem in a debug DLL.
The process runs a sub-test that loads and unloads a DLL dynamically. During the test a lot of memory is reserved and committed (1.3 GB). After the test finishes and the DLL is unloaded, a massive amount of memory still remains reserved (1.2 GB).
The reason I say this reserved memory is allocated by the DLL is that if I use a release DLL (nothing else changed, same test), the reserved memory is ~300 MB, so all the additional reserved memory must be allocated in the debug DLL.
It looks like a lot of memory is committed during the test but only decommitted (not released to the free state) afterwards. So I want to track what reserves and decommits that much memory. But there is no VirtualAlloc call in the source code, so my questions are:
Is VirtualAlloc the only way to reserve memory?
If not, what other APIs can do that? If it is, what other APIs internally call VirtualAlloc? Quite a few people online say that HeapAlloc internally calls VirtualAlloc. How does that work?
[Parts of this are purely implementation detail, and not things that your application should rely on, so take them only for informational purposes, not as official documentation or contract of any kind. That said, there is some value in understanding how things are implemented under the hood, if only for debugging purposes.]
Yes, the VirtualAlloc() function is the workhorse function for memory allocation in Windows. It is a low-level function, one that the operating system makes available to you if you need its features, but also one that the system uses internally. (To be precise, it probably doesn't call VirtualAlloc() directly, but rather an even lower level function that VirtualAlloc() also calls down to, like NtAllocateVirtualMemory(), but that's just semantics and doesn't change the observable behavior.)
Therefore, HeapAlloc() is built on top of VirtualAlloc(), as are GlobalAlloc() and LocalAlloc() (although the latter two became obsolete in 32-bit Windows and should basically never be used by applications—prefer explicitly calling HeapAlloc()).
Of course, HeapAlloc() is not just a simple wrapper around VirtualAlloc(). It adds some logic of its own. VirtualAlloc() always allocates memory in large chunks, defined by the system's allocation granularity, which is hardware-specific (retrievable by calling GetSystemInfo() and reading the value of SYSTEM_INFO.dwAllocationGranularity). HeapAlloc() allows you to allocate smaller chunks of memory at whatever granularity you need, which is much more suitable for typical application programming. Internally, HeapAlloc() handles calling VirtualAlloc() to obtain a large chunk, and then divvying it up as needed. This not only presents a simpler API, but is also more efficient.
Note that the memory allocation functions provided by the C runtime library (CRT)—namely, C's malloc() and C++'s new operator—are a higher level yet. These are built on top of HeapAlloc() (at least in Microsoft's implementation of the CRT). Internally, they allocate a sizable chunk of memory that basically serves as a "master" block of memory for your application, and then divvy it up into smaller blocks upon request. As you free/delete those individual blocks, they are returned to the pool. Once again, this extra layer provides a simplified interface (and in particular, the ability to write platform-independent code), as well as increased efficiency in the general case.
Memory-mapped files and other functionality provided by various OS APIs is also built upon the virtual memory subsystem, and therefore internally calls VirtualAlloc() (or a lower-level equivalent).
So yes, fundamentally, the lowest level memory allocation routine for a normal Windows application is VirtualAlloc(). But that doesn't mean it is the workhorse function that you should generally use for memory allocation. Only call VirtualAlloc() if you actually need its additional features. Otherwise, either use your standard library's memory allocation routines, or if you have some compelling reason to avoid them (like not linking to the CRT or creating your own custom memory pool), call HeapAlloc().
Note also that you must always free/release memory using the corresponding mechanism to the one you used to allocate the memory. Just because all memory allocation functions ultimately call VirtualAlloc() does not mean that you can free that memory by calling VirtualFree(). As discussed above, these other functions implement additional logic on top of VirtualAlloc(), and thus require that you call their own routines to free the memory. Only call VirtualFree() if you allocated the memory yourself via a call to VirtualAlloc(). If the memory was allocated with HeapAlloc(), call HeapFree(). For malloc(), call free(); for new, call delete.
As for the specific scenario described in your question, it is unclear to me why you are worrying about this. It is important to keep in mind the distinction between reserved memory and committed memory. Reserved simply means that this particular block in the address space has been reserved for use by the process. Reserved blocks cannot be used. In order to use a block of memory, it must be committed, which refers to the process of allocating a backing store for the memory, either in the page file or in physical memory. This is also sometimes known as mapping. Reserving and committing can be done as two separate steps, or they can be done at the same time. For example, you might want to reserve a contiguous address space for future use, but you don't actually need it yet, so you don't commit it. Memory that has been reserved but not committed is not actually allocated.
In fact, all of this reserved memory may not be a leak at all. A rather common strategy used in debugging is to reserve a specific range of memory addresses, without committing them, to trap attempts to access memory within this range with an "access violation" exception. The fact that your DLL is not making these large reservations when compiled in Release mode suggests that, indeed, this may be a debugging strategy. And it also suggests a better way of determining the source: rather than scanning through your code looking for all of the memory-allocation routines, scan your code looking for the conditional code that depends upon the build configuration. If you're doing something different when DEBUG or _DEBUG is defined, then that is probably where the magic is happening.
Another possible explanation is the CRT's implementation of malloc() or new. When you allocate a small chunk of memory (say, a few KB), the CRT will actually reserve a much larger block but only commit a chunk of the requested size. When you subsequently free/delete that small chunk of memory, it will be decommitted, but the larger block will not be released back to the OS. The reason for this is to allow future calls to malloc/new to re-use that reserved block of memory. If a subsequent request is for a larger block than can be satisfied by the currently reserved address space, it will reserve additional address space. If, in debugging builds, you are repeatedly allocating and freeing increasingly large chunks of memory, what you're seeing may be the result of memory fragmentation. But this is really not a problem, aside from a minor performance hit, which is really not worth worrying about in debugging builds.

Memory stability of a C++ application in Linux

I want to verify the memory stability of a C++ application I wrote and compiled for Linux.
It is a network application that responds to remote clients connecting at a rate of 10-20 connections per second.
Over a long run, memory was rising to 50 MB, even though the app was making calls to delete...
Investigation showed that Linux does not immediately free memory. So here are my questions:
How can I force Linux to free the memory I actually freed? At least I want to do this once, to verify memory stability.
Otherwise, is there any reliable memory indicator that can report the memory my app is actually holding?
What you are seeing is most likely not a memory leak at all. Operating systems and malloc/new heaps both do very complex accounting of memory these days. This is, in general, a very good thing. Chances are any attempt on your part to force the OS to free the memory will only hurt both your application performance and overall system performance.
To illustrate:
The Heap reserves several areas of virtual memory for use. None of it is actually committed (backed by physical memory) until malloc'd.
You allocate memory. The Heap grows accordingly. You see this in task manager.
You allocate more memory on the Heap. It grows more.
You free memory allocated in Step 2. The Heap cannot shrink, however, because the memory in #3 is still allocated, and Heaps are unable to compact memory (it would invalidate your pointers).
You malloc/new more stuff. This may get tacked on after memory allocated in step #3, because it cannot fit in the area left open by free'ing #2, or because it would be inefficient for the Heap manager to scour the heap for the block left open by #2. (depends on the Heap implementation and the chunk size of memory being allocated/free'd)
So is that memory at step #2 now dead to the world? Not necessarily. For one thing, it will probably get reused eventually, once it becomes efficient to do so. In cases where it isn't reused, the operating system itself may be able to use the CPU's virtual-memory hardware (the page tables) to "remap" the unused memory right out from under your application and assign it to another application -- on the fly. The Heap is aware of this and usually manages things in a way that helps improve the OS's ability to remap pages.
These are valuable memory management techniques that have the unmitigated side effect of rendering fine-grained memory-leak detection via Process Explorer mostly useless. If you want to detect small memory leaks in the heap, then you'll need to use runtime heap leak-detection tools. Since you mentioned that you're able to build on Windows as well, I will note that Microsoft's CRT has adequate leak-checking tools built-in. Instructions for use found here:
http://msdn.microsoft.com/en-us/library/974tc9t1(v=vs.100).aspx
There are also open-source replacements for malloc available for use with GCC/Clang toolchains, though I have no direct experience with them. I think on Linux Valgrind is the preferred and more reliable method for leak-detection anyway. (and in my experience easier to use than MSVCRT Debug).
I would suggest using Valgrind with the Memcheck tool, or any other profiling tool, to find memory leaks.
From Valgrind's page:
Memcheck
detects memory-management problems, and is aimed primarily at
C and C++ programs. When a program is run under Memcheck's
supervision, all reads and writes of memory are checked, and calls to
malloc/new/free/delete are intercepted. As a result, Memcheck can
detect if your program:
Accesses memory it shouldn't (areas not yet allocated, areas that have been freed, areas past the end of heap blocks, inaccessible areas
of the stack).
Uses uninitialised values in dangerous ways.
Leaks memory.
Does bad frees of heap blocks (double frees, mismatched frees).
Passes overlapping source and destination memory blocks to memcpy() and related functions.
Memcheck reports these errors as soon as they occur, giving the source
line number at which it occurred, and also a stack trace of the
functions called to reach that line. Memcheck tracks addressability at
the byte-level, and initialisation of values at the bit-level. As a
result, it can detect the use of single uninitialised bits, and does
not report spurious errors on bitfield operations. Memcheck runs
programs about 10--30x slower than normal.
Massif
Massif is a heap profiler. It performs detailed heap profiling by
taking regular snapshots of a program's heap. It produces a graph
showing heap usage over time, including information about which parts
of the program are responsible for the most memory allocations. The
graph is supplemented by a text or HTML file that includes more
information for determining where the most memory is being allocated.
Massif runs programs about 20x slower than normal.
Using valgrind is as simple as running your application with the desired switches and giving it as input to valgrind:
valgrind --tool=memcheck ./myapplication -f foo -b bar
I very much doubt that anything beyond wrapping malloc and free (or new and delete) with another function can actually get you anything other than very rough estimates.
One of the problems is that freed memory can only be returned to the OS if there is a large contiguous chunk of it. What typically happens is that "little bits" of memory are in use all over the heap, and you can't find a large chunk that can be freed.
It's highly unlikely that you will be able to fix this in any simple way.
And by the way, your application is probably going to need those 50MB later on when you have more load again, so it's just wasted effort to free it.
(If the memory you are not using is needed for something else, it will get swapped out; pages that aren't touched for a long time are prime candidates. So if the system runs low on memory for other tasks, it will still reuse the RAM in your machine for that space. It's not sitting there wasted -- it's just that you can't use 'ps' or the like to figure out how much RAM your program actually uses!)
As suggested in a comment: you can also write your own memory allocator, using mmap() to create a "chunk" to dole out portions from. If you have a section of code that does a lot of memory allocations, all of which will definitely be freed later, you can allocate them from a separate lump of memory. When they have all been freed, you put the mmap'd region back on a "free mmap list", and when that list grows sufficiently large, you release some of the mmap allocations [this is an attempt to avoid calling mmap LOTS of times and then munmap again a few milliseconds later]. However, if you EVER let one of those allocations "escape" from your fenced-in area, your application will probably crash (or worse, not crash, but use memory belonging to some other part of the application, and you get a very strange result somewhere, such as one user seeing network content that was meant for another user!)
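A minimal sketch of that idea, assuming POSIX mmap/munmap; the bump-pointer Arena here is illustrative, not production code:

```cpp
#include <sys/mman.h>
#include <cstddef>

// A throwaway arena: one mmap'd chunk, bump-pointer allocation,
// and everything released at once with munmap.
struct Arena {
    char*       base = nullptr;
    std::size_t size = 0;
    std::size_t used = 0;

    bool init(std::size_t bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return false;
        base = static_cast<char*>(p);
        size = bytes;
        used = 0;
        return true;
    }

    void* alloc(std::size_t n) {
        n = (n + 15) & ~std::size_t(15);       // keep 16-byte alignment
        if (used + n > size) return nullptr;   // arena exhausted
        void* p = base + used;
        used += n;
        return p;
    }

    void destroy() {                           // frees EVERYTHING at once
        if (base) munmap(base, size);
        base = nullptr;
    }
};
```

Any pointer handed out by alloc() becomes invalid after destroy(); letting one escape is exactly the failure mode described above.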
Use valgrind to find memory leaks : valgrind ./your_application
It will list where you allocated memory and did not free it.
I don't think it's a Linux problem, but rather one in your application. If you monitor the memory usage with «top» you won't get very precise numbers. Try using massif (a Valgrind tool): valgrind --tool=massif ./your_application to see the real memory usage.
As a more general rule to avoid leaks in C++ : use smart pointers instead of normal pointers.
Also in many situations, you can use RAII (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) instead of allocating memory with "new".
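For instance, with std::unique_ptr the delete happens automatically when the pointer goes out of scope, even on early returns or exceptions (the function name here is just an example):

```cpp
#include <memory>
#include <vector>
#include <cstddef>

// RAII: the buffer is released when 'data' goes out of scope --
// no explicit delete, and no leak on early return or exception.
std::size_t process_connection(std::size_t n) {
    auto data = std::make_unique<std::vector<char>>(n, 'x');
    return data->size();   // vector (and its heap block) freed automatically
}
```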
It is not typical for an OS to release memory when you call free or delete. This memory goes back to the heap manager in the runtime library.
If you want to actually release memory, you can use brk. But that opens up a very large can of memory-management worms. If you directly call brk, you had better not call malloc. For C++, you can override new to use brk directly.
Not an easy task.
The latest dlmalloc() has a concept called an mspace (others call it a region). You can call malloc() and free() against an mspace. Or you can delete the mspace to free all memory allocated from the mspace at once. Deleting an mspace will free memory from the process.
If you create an mspace with a connection, allocate all memory for the connection from that mspace, and delete the mspace when the connection closes, you would have no process growth.
If you have a pointer in one mspace pointing to memory in another mspace, and you delete the second mspace, then as the language lawyers say "the results are undefined".

how to get Heap size of a program

How do I find the heap memory size of a C++ program on Linux? I need the heap memory usage before the use of new or malloc and also after. Can anyone help?
#include <malloc.h>
#include <iostream>

int main()
{
    // here need heap memory space
    unsigned char* I2C_Read_Data = new unsigned char[250];
    // get heap memory space after the usage of new
    delete[] I2C_Read_Data;
    return 0;
}
You can also add heap tracking to your own programs by overloading the new and delete operators. In a game engine I am working on, I have all memory allocation going through special functions, which attach each allocation to a particular heap tracker object. This way, at any given moment, I can pull up a report and see how much memory is being taken up by entities, actors, Lua scripts, etc.
It's not as thorough as using an external profiler (particularly when outside libraries handle their own memory management), but it is very nice for seeing exactly what memory you were responsible for.
Use valgrind's heap profiler: Massif
On Linux you can read /proc/[pid]/statm to get memory usage information.
Provides information about memory usage, measured in pages. The
columns are:
size total program size
(same as VmSize in /proc/[pid]/status)
resident resident set size
(same as VmRSS in /proc/[pid]/status)
share shared pages (from shared mappings)
text text (code)
lib library (unused in Linux 2.6)
data data + stack
dt dirty pages (unused in Linux 2.6)
See the man page for more details.
The answer by Adam Zalcman to this question describes some interesting details of heap allocation.
You can use the getrlimit function call and pass the RLIMIT_DATA for the resource. That should give you the size of the data segment for your program.
Apart from external inspection, you can also instrument your implementation of malloc to let you inspect those statistics. jemalloc and tcmalloc are implementations that, on top of performing better for multithreaded code than typical libc implementations, add some utility functions of that sort.
To dig deeper, you should learn a bit more about how heap allocation works. Ultimately, the OS is the one assigning memory to processes as they ask for it; however, requests to the OS (syscalls) are slower than regular calls, so in general an implementation of malloc will request large chunks from the OS (4 KB or 8 KB blocks are common) and then subdivide them to serve its callers.
You need to identify whether you are interested in the total memory consumed by the process (which includes the code itself), the memory the process requested from the OS within a particular procedure call, the memory actually in use by the malloc implementation (which adds its own book-keeping overhead, however small) or the memory you requested.
Also, fragmentation can be a pain for the latter two, and may somewhat blur the difference between memory really in use and memory merely assigned.
You can try mallinfo and malloc_info. They might work. mallinfo has issues when you allocate more than 2 GB. malloc_info is OS-specific and, notably, very weird. I agree -- very often it's nice to do this stuff without third-party tools.
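For example, glibc's mallinfo (deprecated in favour of mallinfo2 on newer glibc, but widely available) reports uordblks, the total allocated space, which you can sample before and after an allocation. A sketch assuming glibc; heap_growth_for is an illustrative helper, not a standard function:

```cpp
#include <malloc.h>   // glibc-specific: mallinfo, struct mallinfo
#include <cstdio>
#include <cstdlib>

// Returns how much mallinfo's "total allocated space" (uordblks) grew
// across a malloc of 'n' bytes. Fields are plain ints, so this is only
// reliable for heaps well under 2 GB.
int heap_growth_for(std::size_t n) {
    struct mallinfo before = mallinfo();
    void* p = std::malloc(n);
    struct mallinfo after = mallinfo();
    int growth = after.uordblks - before.uordblks;
    std::printf("allocated %zu bytes, uordblks grew by %d\n", n, growth);
    std::free(p);
    return growth;
}
```

Keep the request below the mmap threshold (128 KB by default), otherwise the block is served by mmap and shows up in hblkhd instead of uordblks.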

Is memory allocation a system call?

Is memory allocation a system call? For example, malloc and new. Is the heap shared by different processes and managed by the OS? What about a private heap? If memory allocation in the heap is managed by the OS, how expensive is this?
I would also like to have some link to places where I can read more about this topic.
In general, malloc and new do not perform a system call at each invocation. However, they use a lower-level mechanism to allocate large pages of memory. On Windows, the lower mechanism is VirtualAlloc(). I believe on POSIX systems, this is somewhat equivalent to mmap(). Both of these perform a system call to allocate memory to the process at the OS level. Subsequent allocations will use smaller parts of those large pages without incurring a system call.
The heap is normally private to a process and is not shared between processes. If you need sharing, most OSes have an API for allocating shared memory. A portable wrapper for these APIs is available in the Boost.Interprocess library.
If you would like to learn more about memory allocation and relationship with the OS, you should take a look at a good book on operating systems. I always suggest Modern Operating Systems by Andrew S. Tanenbaum as it is very easy to read.
(Assuming an operating system with memory protection. Might not be the case e.g. in embedded devices.)
Is memory allocation a system call?
Not necessarily each allocation. The process needs to call the kernel if its heap is not large enough for the requested allocation already, but C libraries usually request larger chunks when they do so, with the aim to reduce the number of system calls.
Is the heap shared by different processes and managed by the OS. What about private heap?
The heap is not shared between processes. It's shared between threads though.
How expensive kernel memory allocation system calls are depends entirely on the OS. Since that's a very common thing, you can expect it to be efficient under normal circumstances. Things get complicated in low RAM situations.
See the layered memory management in Win32.
Memory is obtained from the OS through system calls, but it is handed out in pages: if there is space available in already-committed pages, the memory manager will satisfy an allocation request without switching into kernel mode. The nice thing about HeapAlloc is that it provides fine-grained control over the allocation, whereas VirtualAlloc rounds the allocation up to a whole page; that can result in excessive memory usage.
Basically the default heap and private heaps are treated the same, except that the default heap's size is specified at link time. The default heap size is 1 MB and grows as required.
See this article for more details
You can also find more information in this thread
Memory-allocation functions and language constructs like malloc/free and new/delete are not system calls. malloc/free are part of the C/C++ library, and new/delete are part of the C++ runtime system. Calls to either can occasionally lead to system calls. Memory allocation is implemented in a similar way in other languages.
In general, memory management can't be implemented without involving the OS at all, because memory is one of the main system resources, and so global memory management is done by the OS kernel. But because system calls are relatively expensive, people try to design languages and memory-allocation libraries in such a way as to minimize the number of system calls.
As far as I know, the heap is an intra-process entity. That means all memory allocation/deallocation requests are managed entirely by the process itself. The operating system knows only the heap's location and size, and services two types of requests from the in-process memory management system:
add memory page at virtual address X
release memory page from virtual address X
The local memory-management system requests these services when it decides that it doesn't have enough memory in the heap's pool, and when it decides that it has too much.
Even though heap allocation is usually designed to minimize the number of system calls, it is still about an order of magnitude more expensive than allocation on the stack. This is because the heap's allocation/deallocation algorithms are much more complex and expensive than the stack's.
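You can observe the amortization directly: with a glibc-style allocator that serves small blocks from brk, the program break moves in large steps, not once per malloc. A sketch assuming glibc on Linux; break_growth is an illustrative helper:

```cpp
#include <unistd.h>   // sbrk
#include <cstdlib>
#include <cstddef>

// Returns how far the program break moved across 'count' small mallocs.
// glibc extends the break in big chunks and carves the small blocks
// out of them, so the growth is far less than count system calls' worth.
long break_growth(int count, std::size_t block) {
    char* before = static_cast<char*>(sbrk(0));
    for (int i = 0; i < count; ++i)
        std::malloc(block);   // deliberately never freed: measurement only
    char* after = static_cast<char*>(sbrk(0));
    return static_cast<long>(after - before);
}
```

Running this under strace shows only a handful of brk calls for the whole loop, which is the amortization described above.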

How to swap out a memory block?

How can I make a block of memory allocated by malloc() or new:
immediately swapped out,
or lazily initialized?
In fact, I'm trying to reserve an address space. How can I accomplish this?
P.S. How can I verify, from user space, whether a memory block is swapped out?
malloc is often implemented using mmap, so if you were to use malloc, you'd get the behavior you're after anyway. After all, why should allocating memory force other pages out of cache when there's no guarantee that the new pages will be initialized immediately? I know that OpenBSD implements malloc this way, and GNU's C library uses mmap if your allocation is larger than some limit. I think it's just a couple of pages.
I don't know about how Windows goes about all of this, but check the VirtualAlloc docs to see if it is specific about its purpose. If it documents that Windows' malloc caches its pages, then you have your answer and you should use VirtualAlloc.
To reserve a chunk of address space:
On unix, sbrk() or mmap().
On Windows, VirtualAlloc().
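On POSIX systems the reserve-then-commit split can be sketched with mmap(PROT_NONE) followed by mprotect; this mirrors Windows' MEM_RESERVE/MEM_COMMIT pattern. The two helper names are made up for the example:

```cpp
#include <sys/mman.h>
#include <cstddef>

// Reserve address space without committing it: PROT_NONE pages cannot
// be touched, and on Linux they consume no physical memory.
void* reserve_region(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// "Commit" a prefix of the reservation by making it accessible;
// physical pages are still only allocated lazily, on first write.
bool commit_region(void* p, std::size_t bytes) {
    return mprotect(p, bytes, PROT_READ | PROT_WRITE) == 0;
}
```

Accessing a still-PROT_NONE page raises SIGSEGV, which is exactly the trap-on-touch behavior a reservation gives you.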
On Windows, you can do this with the VirtualAlloc function.
I don't know of any way to do it on Linux or OS X.
On Linux, BSD, or OS X, use malloc. I think the popular "jemalloc" implementation on FreeBSD uses a dedicated mmap for every region 1 MiB or larger. The smaller regions are still backed by mmap, so they still give most of the same behavior, but when you free the smaller regions you won't automatically unmap them. I think. The glibc "dlmalloc" implementation, which is used on Linux, also uses a dedicated mmap for allocations at least 1 MiB, but I think it uses sbrk for smaller regions. Mac OS X's malloc also uses mmap but I am not sure about the particular parameters.
A pointer that you get from a large malloc will point to a shared page in RAM filled with zero bytes. As soon as you write to a page in that region, a new page in physical RAM will be allocated and filled with zero bytes. So you see, the default behavior of malloc is already lazy. It's not that the pages are swapped out to start with, it's that they aren't even there to begin with.
If you are done with the data in a region, you can use madvise with MADV_FREE. This tells the kernel that it can free the related pages instead of swapping them out. The pages remain valid, and as soon as you write to them they'll turn back into normal pages. This is kind of like calling free and then malloc.
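A sketch of that madvise pattern, assuming Linux 4.5+ for MADV_FREE (the fallback to MADV_DONTNEED below has slightly different semantics: it discards the contents immediately):

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Tell the kernel it may reclaim these pages lazily instead of swapping
// them out. The mapping stays valid; writing to a reclaimed page simply
// produces a fresh zero page.
bool discard_pages(void* p, std::size_t bytes) {
#ifdef MADV_FREE
    return madvise(p, bytes, MADV_FREE) == 0;
#else
    return madvise(p, bytes, MADV_DONTNEED) == 0;  // older-kernel fallback
#endif
}
```

The pointer must be page-aligned (anything returned by mmap is), and after the call the region behaves like a freshly malloc'd block: valid, but its old contents may be gone.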
Summary: Just use malloc. It does what you want.