I would like to measure stack, heap and static memory separately because I have some constraints for each one.
To measure the heap memory I'm using valgrind->massif tool.
Massif should also be possible to measure heap AND stack memory but it shows strange resultats :
Last snapshot without --stacks=yes provides total(B)=0, useful-heap(B)=0, extra-heap(B)=0 (so all is fine)
Last snapshot with --stacks=yes provides total(B)= 2,256, useful-heap(B)=1,040, extra-heap(B)=0, stacks(B)=1,208 (which shows memory leak even if it is the same command and same binary tested... dunno why ...)
So finally I need a tool to measure stack and static memory used by a c++ binary, some help would be welcome :)
Thanks for your help !
----------- EDIT --------------
Further to the Basile Starynkevitch comment, to explain what I mean with static, stack and heap memory I took it from the Dmalloc library documentation :
Static data is the information whose storage space is compiled into the program.
/* global variables are allocated as static data */
int numbers[10];
main()
{
…
}
Stack data is data allocated at runtime to hold information used inside of functions. This data is managed by the system in the space called stack space.
void foo()
{
/* if they are, the parameters of the function are stored in the stack */
/* this local variable is stored on the stack */
float total;
…
}
main()
{
foo();
}
Heap data is also allocated at runtime and provides a programmer with dynamic memory capabilities.
main()
{
/* the address is stored on the stack */
char * string;
…
/*
* Allocate a string of 10 bytes on the heap. Store the
* address in string which is on the stack.
*/
string = (char *)malloc(10);
…
/* de-allocate the heap memory now that we're done with it */
(void)free(string);
…
}
I would like to measure stack, heap and static memory separately because I have some constraints for each one.
I cannot imagine why you have separate constraints for each one. They all sit in virtual memory! BTW you might use setrlimit(2) to set limits (perhaps from the invoking shell process, e.g. with bash ulimit builtin).
Your definitions are naive, if you consider the actual virtual address space of your process.
BTW, proc(5) enables you to query that space, e.g. using /proc/self/maps like here from inside your program (or /proc/1234/maps to query the process of pid 1234, perhaps from a terminal). You could also use /proc/self/status and /proc/self/statm (BTW try cat /proc/self/maps and cat /proc/$$/maps in a terminal). On Linux, you might also use mallinfo(3) and malloc_stats(3) to get information about memory allocation statistics.
Static data may be in the data segment (or BSS segment) of your program. But what about thread local space? And these data segments also contain data internal to various libraries, notably the C standard library libc.so (do that count?). Of course the stack segment is often bigger (since page aligned) than the actual used stack (from "bottom" to current %esp register). And a multi-threaded process has several stacks (and stack segments), one per thread.
Stack data is of course in the call stack, which contains a lot of other things (return addresses, gap, spilled registers) than just automatic variables (some of them sitting only in registers or being optimized by the compiler, without consuming any stack slot). Do they count? Also, startup code from crt0 (which calls your main) is likely to use a little bit of stack space (does it count?)...
Heap allocated data (it could be allocated from various libraries, or even from the dynamic linker) contains not only what your program gets from malloc (and friends) but also the necessary overhead. Do that count? And what about memory-mapped files? How should they count?
I would recommend querying the actual virtual address space (e.g. by reading /proc/self/maps or using pmap(1)...) but they you get something different than what you ask.
Just BTW I found it before your answer :
To measure the heap memory use valgrind -> massif
To measure the static memory use the bash function size on the binary
To measure the stack it is possible to use stackusage
It gave me all the stats I wanted
Related
I am reading mlockall()'s manpage: http://man7.org/linux/man-pages/man2/mlock.2.html
It mentions
Real-time processes that are using mlockall() to prevent delays on page
faults should reserve enough locked stack pages before entering the time-
critical section, so that no page fault can be caused by function calls. This
can be achieved by calling a function that allocates a sufficiently large
automatic variable (an array) and writes to the memory occupied by this array in
order to touch these stack pages. This way, enough pages will be mapped for the
stack and can be locked into RAM. The dummy writes ensure that not even copy-
on-write page faults can occur in the critical section.
I am a bit confused by this statement:
This can be achieved by calling a function that allocates a sufficiently large
automatic variable (an array) and writes to the memory occupied by this array in
order to touch these stack pages.
All the automatic variables (variables on stack) are created "on the fly" on the stack when the function is called. So how can I achieve what the last statement says?
For example, let's say I have this function:
void foo() {
char a;
uint16_t b;
std::deque<int64_t> c;
// do something with those variables
}
Or does it mean before I call any function, I should call a function like this in main():
void reserveStackPages() {
int64_t stackPage[4096/8 * 1024 * 1024];
memset(stackPage, 0, sizeof(stackPage));
}
If yes, does it make a difference if I first allocate the stackPage variable on heap, write and then free? Probably yes, because heap and stack are 2 different region in the RAM?
std::deque exists above is just to bring up another related question -- what if I want to reserve memory for things using both stack pages and heap pages. Will calling "heap" version of reserveStackPages() help?
The goal is to minimize all the jitters in the application (yes, I know there are many other things to look at such as TLB miss, etc; just trying to deal with one kind of jitter at once, and slowly progressing into all).
Thanks in advance.
P.S. This is for a low latency trading application if it matters.
You generally don't need to use mlockall, unless you code (more or less hard) real-time applications (I actually never used it).
If you do need it, you'll better code in C (not in genuine C++) the most real-time parts of your code, because you surely want to understand the details of memory allocation. Notice that unless you dive into std::deque implementation, you don't exactly know where it is sitting (probably most of the data is heap allocated, even if your c is an automatic variable).
You should first understand in details the virtual address space of your process. For that, proc(5) is useful: from inside your process, you'll read /proc/self/maps (see this), from outside (e.g. some terminal) you'll do cat /proc/1234/maps for a process of pid 1234. Or use pmap(1).
because heap and stack are 2 different regions in the RAM?
In fact, your process' address space contains many segments (listed in /proc/1234/maps), much more that two. Typically every dynamically linked shared library (such as libc.so) brings a few segments.
Try cat /proc/self/maps and cat /proc/$$/maps in your terminal to get a better intuition about virtual address spaces. On my machine, the first gives 19 segments of the cat process -each displayed as a line- and the second 97 segments of the zsh (my shell) process.
To ensure that your stack has enough space, you indeed could call a function allocating a large enough automatic variable, like your reserveStackPages. Beware that call stacks are practically of limited size (a few megabytes usually, see also setrlimit(2)).
If you really need mlockall (which is unlikely) you might consider linking statically your program (to have less segments in your virtual address space).
Look also into madvise(2) (and perhaps mincore(2)). It is generally much more useful than mlockall. BTW, in practice, most of your virtual memory is in RAM (unless your system experiments thrashing, and then you'll see it immediately).
Read also Operating Systems: Three Easy Pieces to understand the role of paging.
PS. Nano-second sensitive applications does not make much sense (because of cache misses that the software does not control).
The following code is generating a stack overflow error for me
int main(int argc, char* argv[])
{
int sieve[2000000];
return 0;
}
How do I get around this? I am using Turbo C++ but would like to keep my code in C
EDIT:
Thanks for the advice. The code above was only for example, I actually declare the array in a function and not in sub main. Also, I needed the array to be initialized to zeros, so when I googled malloc, I discovered that calloc was perfect for my purposes.
Malloc/calloc also has the advantage over allocating on the stack of allowing me to declare the size using a variable.
Your array is way too big to fit into the stack, consider using the heap:
int *sieve = malloc(2000000 * sizeof(*sieve));
If you really want to change the stack size, take a look at this document.
Tip: - Don't forget to free your dynamically allocated memory when it's no-longer needed.
There are 3 ways:
Allocate array on heap - use malloc(), as other posters suggested. Do not forget to free() it (although for main() it is not that important - OS will clean up memory for you on program termination).
Declare the array on unit level - it will be allocated in data segment and visible for everybody (adding static to declaration will limit the visibility to unit).
Declare your array as static - in this case it will be allocated in data segment, but visible only in main().
That's about 7MB of stack space. In visual studio you would use /STACK:###,### to reflect the size you want. If you truely want a huge stack (could be a good reason, using LISP or something :), even the heap is limited to small'sh allocations before forcing you to use VirtualAlloc), you may also want to set your PE to build with /LARGEADDRESSAAWARE (Visual Studio's linker again), but this configure's your PE header to allow your compiled binary to address the full 4GB of 32'bit address space (if running in a WOW64). If building truely massive binaries, you would also typically need to configure /bigobj as an additional linker paramerter.
And if you still need more space, you can radically violate convention by using something simular to (again MSVC's link) /merge:, which will allow you to pack one section into another, so you can use every single byte for a single shared code/data section. Naturally you would also need to configure the SECTIONS permissions in a def file or with #pgrama.
Use malloc. All check the return type is not null, if it is null then your system simply doesn't have enought memory to fit that many values.
You would be better off allocating it on the heap, not the stack. something like
int main(int argc, char* argv[])
{
int * sieve;
sieve = malloc(20000);
return 0;
}
Your array is huge.
It's possible that your machine or OS don't have or want to allocate so much memory.
If you absolutely need an enormous array, you can try to allocate it dynamically (using malloc(...)), but then you're at risk of leaking memory. Don't forget to free the memory.
The advantage of malloc is that it tries to allocate memory on the heap instead of the stack (therefore you won't get a stack overflow).
You can check the value that malloc returns to see if the allocation succeeded or failed.
If it fails, just try to malloc a smaller array.
Another option would be to use a different data structure that can be resized on the fly (like a linked list). Wether this option is good depends on what you are going to do with the data.
Yet another option would be to store things in a file, streaming data on the fly. This approach is the slowest.
If you go for storage on the hard drive, you might as well use an existing library (for databases)
As Turbo C/C++ is 16 bit compiler int datatype consumes about 2 bytes.
2bytes*2000000=40,00,000 bytes=3.8147MB space.
The auto variables of a function is stored in stack and it caused the overflow of the stack memory. Instead use the data memory [using static or global variable] or the dynamic heap memory [using the malloc/calloc] for creating the required memory as per the availability of the processor memory mapping.
Is there some reason why you can't use alloca() to allocate the space you need on the stack frame based on how big the object really needs to be?
If you do that, and still bust the stack, put it in allocated heap. I highly recommend NOT declaring it as static in main() and putting it in the data segment.
If it really has to be that big and your program can't allocate it on the heap, your program really has no business running on that type of machine to begin with.
What (exactly) are you trying to accomplish?
I am writing a simple C++ program on Mac OS. I have just
int main()
{
int *n = new int[50000000];
}
I launch this program in lldb, and put a breakpoint at the line where n is allocated. Then I launch top in another tab, I see that memory usage is 336K pre-allocation. When I do n inside lldb, so that the allocation for n happens, I expect to my memory usage to go up. However, top shows me the same amount of memory used by my program. What could be the reason for this? I am trying to understand how memory allocation happens in C++, which is why I am doing this.
I have not exited the scope of main. When I check top again, I am sitting at closing curly brace for main.
The top command shows the process stats as viewed by the operating system. It shows how much memory was allocated to the process, but not how much of this memory is effectively in use. It's not accurate for monitoring memory allocation.
Memory allocation with heap and free store is implementation dependent in C++. But tt's usually not mapped one to one with OS allocation calls. For performance reasons (calls to the OS are slower than calls inside your userland code), the memory is received from OS in larger chunks:
when the c++ runtime starts, it usually allocates some memory from the OS, in order to allocate memory it needs for standard library objects, and to initalize the free store to quickly satisfy allocation request.
Only if this initial memory is exhausted will the standard library allocate more memory from the operating system.
And allocation is done again in larger chunks, so that not every new would raise an OS call.
From your observations, I guess that this initial allocation is larger than 50 MB. Try with a much larger value to see the difference.
If you want to track memory consumption more precisely, you need some profiling tools, for example valgrind or heap command
int A[10000000]; //This gives a segmentation fault
int *A = (int*)malloc(10000000*sizeof(int));//goes without any set fault.
Now my question is, just out of curiosity, that if ultimately we are able to allocate higher space for our data structures, say for example, BSTs and linked lists created using the pointers approach in C have no as such memory limit(unless the total size exceeds the size of RAM for our machine) and for example, in the second statement above of declaring a pointer type, why is that we can't have an array declared of higher size(until it reaches the memory limit!!)...Is this because the space allocated is contiguous in a static sized array?.But then from where do we get the guarantee that in the next 1000000 words in RAM no other piece of code would be running...??
PS: I may be wrong in some of the statements i made..please correct in that case.
Firstly, in a typical modern OS with virtual memory (Linux, Windows etc.) the amount of RAM makes no difference whatsoever. Your program is working with virtual memory, not with RAM. RAM is just a cache for virtual memory access. The absolute limiting factor for maximum array size is not RAM, it is the size of the available address space. Address space is the resource you have to worry about in OSes with virtual memory. In 32-bit OSes you have 4 gigabytes of address space, part of which is taken up for various household needs and the rest is available to you. In 64-bit OSes you theoretically have 16 exabytes of address space (less than that in practical implementations, since CPUs usually use less than 64 bits to represent the address), which can be perceived as practically unlimited.
Secondly, the amount of available address space in a typical C/C++ implementation depends on the memory type. There's static memory, there's automatic memory, there's dynamic memory. The address space limits for each memory type are pre-set in advance by the compiler. Which raises the question: where are you declaring your large array? Which memory type? Automatic? Static? You provided no information, but this is absolutely necessary. If you are attempting to declare it as a local variable (automatic memory), then no wonder it doesn't work, since automatic memory (aka "stack memory") has very limited address space assigned to it. Your array simply does not fit. Meanwhile, malloc allocates dynamic memory, which normally has the largest amount of address space available.
Thirdly, many compilers provide you with options that control the initial distribution of address space between different kinds of memory. You can request a much larger stack size for your program by manipulating such options. Quite possibly you can request a stack so large, than your local array will fit in it without any problems. But in practice, for obvious reasons, it makes very little sense to declare huge arrays as local variables.
Assuming local variables, this is because on modern implementations automatic variables will be allocated on the stack which is very limited in space. This link gives some of the common stack sizes:
platform default size
=====================================
SunOS/Solaris 8172K bytes
Linux 8172K bytes
Windows 1024K bytes
cygwin 2048K bytes
The linked article also notes that the stack size can be changed for example in Linux, one possible way from the shell before running your process would be:
ulimit -s 32768 # sets the stack size to 32M bytes
While malloc on modern implementations will come from the heap, which is only limited to the memory you have available to the process and in many cases you can even allocate more than is available due to overcommit.
I THINK you're missing the difference between total memory, and your programs memory space. Your program runs in an environment created by your operating system. It grants it a specific memory range to the program, and the program has to try to deal with that.
The catch: Your compiler can't 100% know the size of this range.
That means your compiler will successfully build, and it will REQUEST that much room in memory when the time comes to make the call to malloc (or move the stack pointer when the function is called). When the function is called (creating a stack frame) you'll get a segmentation fault, caused by the stack overflow. When the malloc is called, you won't get a segfault unless you try USING the memory. (If you look at the manpage for malloc() you'll see it returns NULL when there's not enough memory.)
To explain the two failures, your program is granted two memory spaces. The stack, and the heap. Memory allocated using malloc() is done using a system call, and is created on the heap of your program. This dynamically accepts or rejects the request and returns either the start address, or NULL, depending on a success or fail. The stack is used when you call a new function. Room for all the local variables is made on the stack, this is done by program instructions. Calling a function can't just FAIL, as that would break program flow completely. That causes the system to say "You're now overstepping" and segfault, stopping the execution.
I am working in a C++ program in Linux. Now I want to check how the memory is allocated in my program. Since the Library I am using is complex, I cannot estimate manually.
I googled on online. Somebody suggests valgrind. I used it, but it crashes my program. Also somebody use getrusage (http://linux.die.net/man/2/getrusage), but I found many negative comments on that.
Anybody has suggestions on that?
SIGAR has been recommended before
It seems to give mostly total memory/cpu usage during runtime but may be useful as it has bindings for many languages and works on many platforms.
As for more detailed per-process information, you can get resident, shared, virtual memory totals as well as i/o and page faults.
If your memory is allocated by malloc, then from gdb (or your code):
(gdb) call malloc_stats()
http://www.gnu.org/software/libc/manual/html_node/Statistics-of-Malloc.html
3.2.2.11 Statistics for Memory Allocation with malloc
You can get information about dynamic memory allocation by calling the mallinfo function. This function and its associated data type are declared in malloc.h; they are an extension of the standard SVID/XPG version.
— Data Type: struct mallinfo
This structure type is used to return information about the dynamic memory allocator. It contains the following members:
int arena
This is the total size of memory allocated with sbrk by malloc, in bytes.
int ordblks
This is the number of chunks not in use. (The memory allocator internally gets chunks of memory from the operating system, and then carves them up to satisfy individual malloc requests; see Efficiency and Malloc.)
int smblks
This field is unused.
int hblks
This is the total number of chunks allocated with mmap.
int hblkhd
This is the total size of memory allocated with mmap, in bytes.
int usmblks
This field is unused.
int fsmblks
This field is unused.
int uordblks
This is the total size of memory occupied by chunks handed out by malloc.
int fordblks
This is the total size of memory occupied by free (not in use) chunks.
int keepcost
This is the size of the top-most releasable chunk that normally borders the end of the heap (i.e., the high end of the virtual address space's data segment).
— Function: struct mallinfo mallinfo (void)
This function returns information about the current dynamic memory usage in a structure of type struct mallinfo.