Memory usage doesn't increase when allocating memory - c++

I would like to find out the number of bytes used by a process from within a C++ program by inspecting the operating system's memory information. The reason is that I want to measure the possible overhead of a memory allocation (due to memory control blocks, nodes in free lists, etc.). Currently I am on a Mac and am using this code:
#include <mach/mach.h>
#include <iostream>

int getResidentMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.resident_size;
    }
    return -1;
}

int getVirtualMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.virtual_size;
    }
    return -1;
}

int main(void) {
    int virtualMemoryBefore = getVirtualMemoryUsage();
    int residentMemoryBefore = getResidentMemoryUsage();

    int* a = new int(5);

    int virtualMemoryAfter = getVirtualMemoryUsage();
    int residentMemoryAfter = getResidentMemoryUsage();

    std::cout << virtualMemoryBefore << " " << virtualMemoryAfter << std::endl;
    std::cout << residentMemoryBefore << " " << residentMemoryAfter << std::endl;
    return 0;
}
When running this code I expected to see that the memory usage had increased after allocating an int. However, when I run the above code I get the following output:
75190272 75190272
819200 819200
I have several questions because this output does not make any sense.
Why has neither the virtual nor the resident memory changed after an integer has been allocated?
How come the operating system allocates such large amounts of memory to a running process?
When I run the code and check Activity Monitor, I find that 304 KB of memory is used, but that number differs from the virtual/resident memory usage obtained programmatically.
My end goal is to be able to find the memory overhead when allocating data, so is there a way to do this? (Determining the bytes used by the OS and comparing them with the bytes allocated to find the difference is what I am currently thinking of.)
Thank you for reading

The C++ runtime typically allocates a block of memory when a program starts up, and then parcels this out to your code when you use things like new, and adds it back to the block when you call delete. Hence, the operating system doesn't know anything about individual new or delete calls. This is also true for malloc and free in C (or C++).
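If the goal is to measure per-allocation overhead rather than OS-level counters, it can be more direct to ask the allocator itself. The following is a minimal sketch, macOS-specific, using malloc_size() from <malloc/malloc.h> (Linux has the similar malloc_usable_size()); it assumes the default operator new forwards to malloc, which holds for the standard allocator:

#include <malloc/malloc.h>
#include <iostream>

int main() {
    // We ask for sizeof(int) == 4 bytes; malloc_size() reports how large a block
    // the allocator actually set aside for this pointer (often 16 bytes on macOS).
    int* a = new int(5);
    std::cout << "requested: " << sizeof(int)
              << " bytes, block size: " << malloc_size(a) << " bytes\n";
    delete a;
    return 0;
}

The difference between the two numbers is the rounding/bookkeeping overhead for that single allocation, which is much closer to what the question is after than the page-granular task_info() counters.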

First, you are measuring pages, not the memory that was actually allocated. Second, the runtime pre-allocates a few pages at startup. If you want to observe something, allocate more than a single int: try allocating several thousand and you will see the numbers change.
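To make that visible with the question's own approach, here is a minimal sketch that allocates tens of megabytes instead of a single int; the 10-million-element count is an arbitrary choice, and the resident-size helper is the same task_info() call as in the question:

#include <mach/mach.h>
#include <cstddef>
#include <iostream>
#include <vector>

// Resident set size in bytes, or 0 on failure (same query as in the question).
static std::size_t residentBytes() {
    task_basic_info info;
    mach_msg_type_number_t count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&info), &count) != KERN_SUCCESS)
        return 0;
    return info.resident_size;
}

int main() {
    std::size_t before = residentBytes();
    std::vector<int> big(10000000, 5);   // roughly 40 MB, and every element is written
    std::size_t after = residentBytes();
    std::cout << "resident before: " << before << ", after: " << after
              << ", delta: " << (after - before) << " bytes\n";
    return 0;
}

Because the vector writes every element, the pages really are touched, so the delta should be in the tens of megabytes rather than zero.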


Max number of elements in a vector

I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below, the program crashed with a bad_alloc at i=90811045, i.e. when trying to add the 90811045th element. My question is: why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
note: I know I can fix this by using vector.reserve() or other methods, I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>

int main() {
    std::vector<long long> myLongs;
    std::cout << "Max size expected : " << myLongs.max_size() << std::endl;

    for (int i = 0; i < 160000000; i++) {
        myLongs.push_back(i);
        if (i % 10000 == 0) {
            std::cout << "Still going! : " << i << " \r";
        }
    }
    return 0;
}
extra info:
I am currently using 64-bit Windows with 16 GB of RAM.
Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 elements were added successfully. The vector implementation (typically) has a deterministic strategy for growing its internal buffer: it multiplies the previous capacity by a constant factor greater than 1. Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but 90811044 * sizeof(long long) * some_factor + other_usage is consistently too much.
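One way to check that reasoning is to watch the capacity grow as elements are pushed. The sketch below is illustrative only; the growth factor is implementation-defined (libstdc++ and libc++ roughly double, MSVC grows by about 1.5x), so the exact numbers will differ between compilers:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

int main() {
    std::vector<long long> v;
    std::size_t lastCapacity = 0;
    try {
        for (long long i = 0; i < 200000000; ++i) {
            v.push_back(i);
            if (v.capacity() != lastCapacity) {
                lastCapacity = v.capacity();
                std::cout << "size " << v.size() << " -> capacity " << lastCapacity << "\n";
            }
        }
    } catch (const std::bad_alloc&) {
        // The failed request was roughly lastCapacity * growth_factor elements,
        // i.e. lastCapacity * growth_factor * sizeof(long long) bytes.
        std::cout << "bad_alloc at size " << v.size()
                  << ", capacity " << v.capacity() << "\n";
    }
    return 0;
}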

Memory error at lower limit than expected when implementing Sieve of Eratosthenes in C++

My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory error if I set the target value too high. As suggested in that question, I am able to fix the problem by using a std::vector<bool> instead of a plain array.
However, I am hitting the memory error at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that a plain C++ boolean array uses a byte for each entry, so with 2 GB of RAM I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using std::vector<bool>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;

int main() {
    // Count and sum of primes below target
    const int target = 100000;

    // Code I want to use:
    bool is_idx_prime[target];
    for (unsigned int i = 0; i < target; i++) {
        // initialize by assuming prime
        is_idx_prime[i] = true;
    }

    // But doesn't work for target larger than ~1200000
    // Have to use this instead
    // vector <bool> is_idx_prime(target, true);

    for (unsigned int i = 2; i < sqrt(target); i++) {
        // All multiples of i * i are nonprime
        // If i itself is nonprime, no need to check
        if (is_idx_prime[i]) {
            for (int j = i; i * j < target; j++) {
                is_idx_prime[i * j] = 0;
            }
        }
    }

    // 0 and 1 are nonprime by definition
    is_idx_prime[0] = 0; is_idx_prime[1] = 0;

    unsigned long long int total = 0;
    unsigned int count = 0;
    for (int i = 0; i < target; i++) {
        // cout << "\n" << i << ": " << is_idx_prime[i];
        if (is_idx_prime[i]) {
            total += i;
            count++;
        }
    }

    cout << "\nCount: " << count;
    cout << "\nTotal: " << total;
    return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, changing n = 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio compiler on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special section of memory to represent the call stack of your program. Each function call pushes a new stack frame onto the stack; when the function returns, the stack frame is removed from the stack. The stack frame includes the memory for the parameters of your function and its local variables. The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. A limited amount of memory is reserved for the stack; when it gets full (e.g. due to too many nested function calls or due to too large local objects), you get a stack overflow. For this reason, large objects should be allocated on the heap.
General references on stack/heap: Link, Link
To allocate memory on the heap in C++, you can do any of the following (a short sketch of all three appears after this list):
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. In a long-running program, such as a system service, repeated memory leaks will eventually fill up the entire memory (and, speaking from personal experience, this absolutely does happen in practice). But if you did want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and later deallocate it again with delete [] is_idx_prime.
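Here is the promised sketch of all three options side by side, using the size that broke the stack in the question (each one puts the array on the heap, so none of them is constrained by the stack limit):

#include <cstddef>
#include <memory>
#include <vector>

int main() {
    const std::size_t target = 1200000;   // the size that overflowed the stack

    // Option 1: std::vector<bool> -- heap-backed, bit-packed, frees itself.
    std::vector<bool> sieve_vec(target, true);

    // Option 2: smart pointer to a heap array -- one byte per entry, frees itself.
    auto sieve_unique = std::make_unique<bool[]>(target);

    // Option 3: manual new[]/delete[] -- shown for completeness only.
    bool* sieve_raw = new bool[target];
    delete[] sieve_raw;

    return 0;
}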

What is the largest amount of memory I can allocate on my MacBook Pro? [duplicate]

I'm trying to figure out how much memory I can allocate before the allocation will fail.
This simple C++ code allocates a buffer (of size 1024 bytes), assigns to the last five characters of the buffer, reports, and then deletes the buffer. It then doubles the size of the buffer and repeats until it fails.
Unless I'm missing something, the code is able to allocate up to 65 terabytes of memory before it fails on my MacBook Pro. Is this even possible? How can it allocate so much more memory than I have on the machine? I must be missing something simple.
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    long long size = 1024;
    long cnt = 0;
    while (true)
    {
        char *buffer = new char[size];
        // Assume the alloc succeeded. We are looking for the failure after all.

        // Try to write to the allocated memory, may fail
        buffer[size-5] = 'T';
        buffer[size-4] = 'e';
        buffer[size-3] = 's';
        buffer[size-2] = 't';
        buffer[size-1] = '\0';

        // report
        if (cnt<10)
            cout << "size[" << cnt << "]: " << (size/1024.) << "Kb ";
        else if (cnt<20)
            cout << "size[" << cnt << "]: " << (size/1024./1024.) << "Mb ";
        else
            cout << "size[" << cnt << "]: " << (size/1024./1024./1024.) << "Gi ";
        cout << "addr: 0x" << (long)buffer << " ";
        cout << "str: " << &buffer[size-5] << "\n";

        // cleanup
        delete [] buffer;

        // double size and continue
        size *= 2;
        cnt++;
    }
    return 0;
}
When you ask for memory, an operating system reserves the right not to actually give you that memory until you actually use it.
That's what's happening here: you're only ever using 5 bytes. My ZX81 from the 1980s could handle that.
MacOS X, like almost every modern operating system, uses "delayed allocation" for memory. When you call new, the OS doesn't actually allocate any memory. It simply makes a note that your program wants a certain amount of memory, and that memory area you want starts at a certain address. Memory is only actually allocated when your program tries to use it.
Further, memory is allocated in units called "pages". I believe MacOS X uses 4 KB pages, so when your program writes to the end of the buffer, the OS gives you 4096 bytes there, while retaining the rest of the buffer as simply a "your program wants this memory" note.
As for why you're hitting the limit at 64 terabytes, it's because current x86-64 processors use 48-bit addressing. This gives 256 TB of address space, which is split evenly between the operating system and your program. Doubling the 64 TB allocation would exactly fit in your program's 128 TB half of the address space, except that the program is already taking up a little bit of it.
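A quick check of that arithmetic (this is just the powers of two involved, not anything specific to the question's program):

#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t tib   = 1ULL << 40;   // bytes per TiB
    const std::uint64_t total = 1ULL << 48;   // 48-bit virtual address space
    std::cout << "total address space:  " << total / tib << " TiB\n";        // 256
    std::cout << "program's half of it: " << (total / 2) / tib << " TiB\n";  // 128
    // Doubling a 64 TiB allocation asks for 128 TiB -- exactly the program's half --
    // which cannot fit because the program already occupies a little of that half.
    return 0;
}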
Virtual memory is the key to allocating more address space than you have physical RAM+swap space.
malloc uses the mmap(MAP_ANONYMOUS) system call to get pages from the OS. (Assuming OS X works like Linux, since they're both POSIX OSes.) These pages are all copy-on-write mapped to a single physical zero page, i.e. they all read as zero with only a TLB miss (no page fault and no allocation of physical RAM). An x86 page is 4 KiB. (I'm not mentioning hugepages because they're not relevant here.)
Writing to any of those pages triggers a soft page fault for the kernel to handle the copy-on-write. The kernel allocates a zeroed page of physical memory and re-wires that virtual page to be backed by the physical page. On return from the page fault, the store is re-executed and succeeds this time.
So after allocating 64TiB and storing 5 bytes to the end of it, you've used one extra page of physical memory. (And added an entry to malloc's bookkeeping data, but that was probably already allocated and in a dirty page. In a similar question about multiple tiny allocations, malloc's bookkeeping data was what eventually used up all the space).
If you actually dirtied more pages than the system had RAM + swap, the kernel would have a problem because it's too late for malloc to return NULL. This is called "overcommit", and some OSes enable it by default while others don't. In Linux, it's configurable.
As Mark explains, you run out of steam at 64TiB because current x86-64 implementations only support 48-bit virtual addresses. The upper 16 bits need to be copies of bit 47. (i.e. an address is only canonical if the 64-bit value is the sign-extension of the low 48 bits).
This requirement stops programs from doing anything "clever" with the high bits, and then breaking on future hardware that does support even larger virtual address spaces.
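A small sketch of that behaviour on a POSIX system: reserve far more anonymous memory than the machine has RAM, then dirty only one page. The 64 GiB figure is arbitrary, a strict overcommit policy may still refuse the mapping, and on older macOS SDKs the flag is spelled MAP_ANON rather than MAP_ANONYMOUS:

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t size = 1ULL << 36;   // 64 GiB of address space -- arbitrary
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }
    char* p = static_cast<char*>(mem);
    p[size - 1] = 'T';   // first write: soft page fault, one physical page gets wired in
    std::printf("reserved %zu bytes, dirtied a single page\n", size);
    munmap(mem, size);
    return 0;
}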

How to Change the Maximum Size of a malloc() Allocation in C++

As far as I can tell, calling malloc() basically means the program is asking the OS for a chunk of memory. I'm writing a program to interface with a camera, in which I need to allocate chunks of memory large enough to store hundreds of images at a time (it's a fast camera).
When I allocate space for about 1.9 GB worth of images, everything works just fine. The allocation calculation is pretty simple:
int allocateBurst( int numImages )
{
    int streamSize = ZIMAGESIZE * numImages;
    data.images = new unsigned short [streamSize];
    return 0;
}
But as soon as I go over the 2 GB limit, I get runtime errors like this:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
It seems like 2 GB might be the maximum size that I can allocate at once. I have 32 GB of RAM, and would like to simply be able to allocate larger pieces of memory in one allocation. Is this possible?
I'm running Ubuntu 12.10.
There may be an underlying issue that the OS can't grant your large memory allocation because it is using memory for other applications. Check with your OS to see what the limits are.
Also know that some OSes will "page" memory to the hard disk. When your program asks for more memory than is resident, the OS will swap pages with the hard disk. Knowing this, I recommend a classic technique of "double buffering" or "multiple buffering".
You will need at least two threads: reading and writing. One thread is responsible for reading data from the camera and placing it into a buffer. When it fills up a buffer, it starts on another buffer. Meanwhile the writing thread starts at the first buffer and writes it to disk (block file writes). When the writing thread finishes a buffer, it starts on the next one. The buffers should be arranged in a circular sequence so they can be reused.
The magic is to have enough buffers so that the reader never catches up to the writer.
Since you are using a couple of small buffers, you should not get any errors from the OS.
There are methods to optimize this, such as obtaining static buffers from the OS.
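As a concrete illustration of the scheme described above, here is a minimal double-buffering sketch with one capture thread and one writer thread. The buffer size, buffer count and frame count are arbitrary placeholders, and the "disk write" is replaced by a byte counter; a real implementation would overlap capture and writing rather than hold the lock while copying:

#include <array>
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kFrameBytes = 1 << 20;   // 1 MiB per "image" -- placeholder
constexpr int kBufferCount = 2;                // classic double buffering
constexpr int kTotalFrames = 32;               // number of frames to move -- placeholder

struct Buffer {
    std::vector<unsigned char> data = std::vector<unsigned char>(kFrameBytes);
    bool full = false;
};

std::array<Buffer, kBufferCount> buffers;
std::mutex m;
std::condition_variable cv;

void reader() {                                // camera side: fills buffers in order
    for (int frame = 0; frame < kTotalFrames; ++frame) {
        Buffer& b = buffers[frame % kBufferCount];
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !b.full; });   // wait until the writer has freed it
        b.data.assign(kFrameBytes, static_cast<unsigned char>(frame));  // "capture"
        b.full = true;
        cv.notify_all();
    }
}

void writer() {                                // disk side: drains buffers in the same order
    std::size_t bytesWritten = 0;
    for (int frame = 0; frame < kTotalFrames; ++frame) {
        Buffer& b = buffers[frame % kBufferCount];
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return b.full; });    // wait for the reader to fill it
        bytesWritten += b.data.size();            // stand-in for a block write to disk
        b.full = false;
        cv.notify_all();
    }
    std::cout << "wrote " << bytesWritten << " bytes\n";
}

int main() {
    std::thread r(reader), w(writer);
    r.join();
    w.join();
    return 0;
}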
The problem is you're using a signed 32-bit variable to describe an unsigned 64-bit number.
Use "size_t" instead of "int" for holding the storage count. This has nothing to do with what you intend to store, just how large a count of them you need.
#include <iostream>

int main(int /*argc*/, const char** /*argv*/)
{
    int units = 2;
    // 32-bit signed, i.e. 31-bit numbers.
    int intSize = units * 1024 * 1024 * 1024;
    // 64-bit values (ULL suffix)
    size_t sizetSize = units * 1024ULL * 1024ULL * 1024ULL;
    std::cout << "intSize = " << intSize << ", sizetSize = " << sizetSize << std::endl;

    try {
        unsigned short* intAlloc = new unsigned short[intSize];
        std::cout << "intAlloc = " << intAlloc << std::endl;
        delete [] intAlloc;
    } catch (std::bad_alloc) {
        std::cout << "intAlloc failed (std::bad_alloc)" << std::endl;
    }

    try {
        unsigned short* sizetAlloc = new unsigned short[sizetSize];
        std::cout << "sizetAlloc = " << sizetAlloc << std::endl;
        delete [] sizetAlloc;
    } catch (std::bad_alloc) {
        std::cout << "sizetAlloc failed (std::bad_alloc)" << std::endl;
    }

    return 0;
}
Output (g++ -m64 -o test test.cpp under Mint 15 64 bit with g++ 4.7.3 on a virtual machine with 4Gb of memory)
intSize = -2147483648, sizetSize = 2147483648
intAlloc failed
sizetAlloc = 0x7f55affff010
int allocateBurst( int numImages )
{
    // change that from int to long, and cast before multiplying
    // so the multiplication itself is done in 64 bits
    long streamSize = static_cast<long>(ZIMAGESIZE) * numImages;
    data.images = new unsigned short [streamSize];
    return 0;
}
Try using long, or cast the size computed in the allocateBurst function to uint64_t and change the function's return type to uint64_t as well. With int you get a 32-bit size, while long or uint64_t gives you a 64-bit size, which can represent a much larger allocation.
Hope that helps

Maximum memory for stack, static and heap memory in C++

I am trying to find the maximum memory that I could allocate on stack, global and heap memory in C++. I am trying this program on a Linux system with 32 GB of memory, and on my Mac with 2 GB of RAM.
/* test to determine the maximum memory that could be allocated for static, heap and stack memory */

#include <iostream>
using namespace std;

//static/global
long double a[200000000];

int main()
{
    //stack
    long double b[999999999];

    //heap
    long double *c = new long double[3999999999];

    cout << "Sizeof(long double) = " << sizeof(long double) << " bytes\n";
    cout << "Allocated Global (Static) size of a = " << (double)((sizeof(a))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Stack size of b = " << (double)((sizeof(b))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Heap Size of c = " << (double)((3999999999 * sizeof(long double))/(double)(1024*1024*1024)) << " Gbytes \n";

    delete[] c;
    return 0;
}
Results (on both):
Sizeof(long double) = 16 bytes
Allocated Global (Static) size of a = 2.98023 Gbytes
Allocated Stack size of b = 14.9012 Gbytes
Allocated Heap Size of c = 59.6046 Gbytes
I am using GCC 4.2.1. My question is:
Why is my program running? I expected that, since the stack had been exhausted (the limit is 16 MB on Linux and 8 MB on Mac), the program would throw an error. I saw some of the many questions asked on this topic, but I couldn't solve my problem from the answers given there.
On some systems you can allocate any amount of memory that fits in the address space. The problems begin when you start actually using that memory.
What happens is that the OS reserves a virtual address range for the process, without mapping it to anything physical, or even checking that there's enough physical memory (including swap) to back that address range up. The mapping only happens in a page-by-page fashion, when the process tries to access newly allocated pages. This is called memory overcommitment.
Try accessing every sysconf(_SC_PAGESIZE)th byte of your huge arrays and see what happens.
Linux overcommits, meaning that it can allow a process more memory than is available on the system, but it is not until that memory is actually used by the process that actual memory (physical main memory or swap space on disk) is allocated for the process. My guess would be that Mac OS X works in a similar way.
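Following the suggestion from the first answer, here is a minimal sketch that makes this visible: allocate a large block, pause, then write one byte to every page and pause again, watching the process's resident size in top (or Activity Monitor) at each pause. The 1 GiB size is an arbitrary placeholder:

#include <unistd.h>    // sysconf(_SC_PAGESIZE)
#include <cstddef>
#include <iostream>

int main() {
    const std::size_t bytes = 1ULL << 30;   // 1 GiB -- arbitrary
    const std::size_t page  = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

    char* block = new char[bytes];          // so far this is mostly just address space
    std::cout << "allocated " << bytes << " bytes; check resident size, then press Enter\n";
    std::cin.get();

    for (std::size_t i = 0; i < bytes; i += page)
        block[i] = 1;                       // one write per page forces it to be backed
    std::cout << "touched every page; resident size should now be about 1 GiB\n";
    std::cin.get();

    delete[] block;
    return 0;
}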