I decided to use C++ 2D vectors for an application. While testing some code I encountered a bad_alloc error. I read that this may be caused by memory shortage, so I investigated my program's memory use by calling the malloc_stats function at different steps in gdb.
While I am not sure I completely understand the output of the function, it seems to indicate memory leaks.
I tried to sum up the problem in the short code below:
#include <vector>
#include <iostream>

using namespace std;

vector<vector<double>> get_predictions(){
    vector<vector<double>> predictions;
    for(int i = 0; i < 10; i++){
        vector<double> new_state;
        new_state.push_back(600);
        new_state.push_back(450);
        new_state.push_back(100);
        new_state.push_back(200);
        predictions.push_back(new_state);
    }
    return predictions;
}

int main()
{
    cout << "start" << endl;
    // time loop
    for(int i = 0; i < 10; i++){
        auto predictions = get_predictions();
        // code that uses the latest predictions
    }
    cout << "end" << endl;
    return 0;
}
Now if I call malloc_stats() at the "start" line, the output is similar to this:
Arena 0:
system bytes = 135168
in use bytes = 74352
Total (incl. mmap):
system bytes = 135168
in use bytes = 74352
max mmap regions = 0
max mmap bytes = 0
At the "end" step, the function gives:
Arena 0:
system bytes = 135168
in use bytes = 75568
Total (incl. mmap):
system bytes = 135168
in use bytes = 75568
max mmap regions = 0
max mmap bytes = 0
The "In use bytes" field clearly increased.
Does it really mean that more allocated memory is held ?
If so, why ? Shouldn't the allocated content be freed once the different vectors go out of scope ?
Then, how to avoid such memory issues ?
Thanks a lot
Following the suggestions in the comments, I abandoned malloc_stats, which does not seem to behave as I expected, and tried out Valgrind on my code.
Valgrind confirms that there is no leak in the toy example above.
Concerning my main application, it is a program running on ARM. For reasons unknown to me, the stock Valgrind build obtained via apt install valgrind did not print the source lines involved in the memory leaks; I had to rebuild it from the sources available on the Valgrind website. The difference presumably comes down to architecture-specific quirks?
As a side note, my application also uses CUDA code, which produces many false positives in Valgrind's output.
Related
My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory overflow if I set the target value too high. As suggested in that question, I am able to fix the problem by using a vector<bool> instead of a normal array.
However, I am hitting the memory overflow at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that the normal C++ boolean array uses a byte for each entry, so with 2 GB of RAM, I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using <vector>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>

using namespace std;

int main() {
    // Count and sum of primes below target
    const int target = 100000;

    // Code I want to use:
    bool is_idx_prime[target];
    for (unsigned int i = 0; i < target; i++) {
        // initialize by assuming prime
        is_idx_prime[i] = true;
    }

    // But doesn't work for target larger than ~1200000
    // Have to use this instead
    // vector <bool> is_idx_prime(target, true);

    for (unsigned int i = 2; i < sqrt(target); i++) {
        // All multiples of i * i are nonprime
        // If i itself is nonprime, no need to check
        if (is_idx_prime[i]) {
            for (int j = i; i * j < target; j++) {
                is_idx_prime[i * j] = 0;
            }
        }
    }

    // 0 and 1 are nonprime by definition
    is_idx_prime[0] = 0; is_idx_prime[1] = 0;

    unsigned long long int total = 0;
    unsigned int count = 0;
    for (int i = 0; i < target; i++) {
        // cout << "\n" << i << ": " << is_idx_prime[i];
        if (is_idx_prime[i]) {
            total += i;
            count++;
        }
    }

    cout << "\nCount: " << count;
    cout << "\nTotal: " << total;
    return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, setting n = 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio compiler on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special section of memory to represent the call stack of your program. Each function call pushes a new stack frame onto the stack; when the function returns, its frame is removed. The stack frame includes the memory for the parameters of your function and its local variables. The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. A limited amount of memory is reserved for the stack; when it gets full (e.g. due to too many nested function calls or due to too large local objects), you get a stack overflow. For this reason, large objects should be allocated on the heap.
General references on stack/heap: Link, Link
To allocate memory on the heap in C++, you can do any of the following (a combined sketch follows the list):
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. When you have a long-running program, such as a system service, creating repeated memory leaks will eventually fill up the entire memory (and speaking from personal experience, this absolutely does happen in practice). But in theory, if you would want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and then later deallocate again with delete [] is_idx_prime.
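Putting those three options side by side, a minimal sketch (target stands in for the sieve size from the question; requires C++14 for make_unique):

#include <memory>
#include <vector>

int main() {
    const int target = 100000;

    // Option 1: std::vector owns the heap allocation and frees it
    // automatically when it goes out of scope.
    std::vector<bool> is_idx_prime_vec(target, true);

    // Option 2: a smart pointer; also freed automatically.
    // Note: make_unique value-initializes, so every element starts as false here.
    auto is_idx_prime_ptr = std::make_unique<bool[]>(target);

    // Option 3: manual new/delete (educational purposes only).
    bool* is_idx_prime_raw = new bool[target];
    // ... use the array ...
    delete[] is_idx_prime_raw;   // forgetting this line is exactly how leaks happen

    return 0;
}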
I have been testing some code of my own to see how much allocated memory it takes to exhaust the heap, or free store. However, unless my test code is wrong, I am getting completely different results in terms of how much memory can be put on the heap.
I am testing two different programs. The first program creates vector objects on the heap. The second program creates integer objects on the heap.
Here is my code:
#include <vector>
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        std::vector<int>* pt1 = new std::vector<int>(100000, 10);
        bytes += sizeof(*pt1);
        bytes += pt1->size() * sizeof(pt1->at(0));
        megabytes = bytes / 1000000;
        if (i >= 1000 && i % 1000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 2000 megabytes on the heap"
In the second program:
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        int* pt1 = new int(10);
        bytes += sizeof(*pt1);
        megabytes = bytes / 1000000;
        if (i >= 100000 && i % 100000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 511 megabytes on the heap"
The final output in both programs is vastly different. Am I misunderstanding something about the free store? I thought that both results would be about the same.
It is very likely that pointers returned by new on your platform are 16-byte aligned.
If int is 4 bytes, this means that for every new int(10) you're getting four bytes and making 12 bytes unusable.
This alone would explain the difference between getting 500MB of usable space from small allocations and 2000MB from large ones.
On top of that, there's overhead of keeping track of allocated blocks (at a minimum, of their size and whether they're free or in use). That is very much specific to your system's memory allocator but also incurs per-allocation overhead. See "What is a Chunk" in https://sourceware.org/glibc/wiki/MallocInternals for an explanation of glibc's allocator.
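If you are on glibc, you can observe this rounding directly with malloc_usable_size; a small, platform-specific sketch:

#include <cstdlib>
#include <iostream>
#include <malloc.h>   // malloc_usable_size (glibc-specific)

int main() {
    // Request 4 bytes; the allocator typically hands back a larger chunk,
    // and keeps its own bookkeeping header next to it.
    void* p = std::malloc(sizeof(int));
    std::cout << "requested " << sizeof(int) << " bytes, usable size "
              << malloc_usable_size(p) << " bytes" << std::endl;
    std::free(p);
    return 0;
}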
First of all, you have to understand that the operating system assigns memory to a process in fairly large chunks called pages (this is a hardware property); typical page sizes are 4-16 kB.
The standard library then tries to use this memory efficiently, so it has to chop pages into smaller pieces and manage them. To do that, some extra information about the heap structure has to be maintained.
There is a good Andrei Alexandrescu CppCon talk on more or less how this works (it omits the details of page management).
So when you allocate lots of small objects, the bookkeeping information about the heap structure becomes quite large. On the other hand, allocating a smaller number of larger objects is more efficient: less memory is wasted on tracking the heap structure.
Note also that, depending on the heap strategy, it is sometimes more efficient (when a small piece of memory is requested) to waste some memory and return a larger block than was requested.
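For what it's worth, you can ask the OS for its page size; a tiny POSIX-only sketch (on Windows you would query GetSystemInfo instead):

#include <unistd.h>   // sysconf, _SC_PAGESIZE (POSIX)
#include <iostream>

int main() {
    // The granularity at which the OS hands memory to the process.
    long page_size = sysconf(_SC_PAGESIZE);
    std::cout << "Page size: " << page_size << " bytes" << std::endl;
    return 0;
}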
I would like to find out the number of bytes used by a process from within a C++ program by inspecting the operating system's memory information. The reason I would like to do this is to find the possible overhead of memory allocation (due to memory control blocks/nodes in free lists, etc.). Currently I am on macOS and am using this code:
#include <mach/mach.h>
#include <iostream>

int getResidentMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.resident_size;
    }
    return -1;
}

int getVirtualMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.virtual_size;
    }
    return -1;
}

int main(void) {
    int virtualMemoryBefore = getVirtualMemoryUsage();
    int residentMemoryBefore = getResidentMemoryUsage();

    int* a = new int(5);

    int virtualMemoryAfter = getVirtualMemoryUsage();
    int residentMemoryAfter = getResidentMemoryUsage();

    std::cout << virtualMemoryBefore << " " << virtualMemoryAfter << std::endl;
    std::cout << residentMemoryBefore << " " << residentMemoryAfter << std::endl;
    return 0;
}
When running this code I would have expected to see that the memory usage has increased after allocating an int. However when I run the above code I get the following output:
75190272 75190272
819200 819200
I have several questions because this output does not make sense to me.
Why hasn't the virtual or resident memory changed after an integer was allocated?
How come the operating system allocates such large amounts of memory to a running process?
When I run the code and check Activity Monitor, I find that 304 kB of memory is used, but that number differs from the virtual/resident memory usage obtained programmatically.
My end goal is to find the memory overhead when allocating data. Is there a way to do this? What I currently have in mind is determining the bytes used according to the OS and comparing them with the bytes allocated to find the difference.
Thank you for reading
The C++ runtime typically allocates a block of memory when a program starts up, and then parcels this out to your code when you use things like new, and adds it back to the block when you call delete. Hence, the operating system doesn't know anything about individual new or delete calls. This is also true for malloc and free in C (or C++).
First, what you are measuring is counted in whole pages, not in the exact bytes allocated. Second, the runtime pre-allocates a few pages at startup. If you want to observe a change, allocate more than a single int; try allocating several thousand and you will see the numbers move.
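For example, reusing getResidentMemoryUsage() from the question, a sketch along these lines should show a visible difference (the exact numbers depend on the allocator and on page granularity):

// Assumes the includes and the getResidentMemoryUsage() helper from the question.
int main() {
    int residentBefore = getResidentMemoryUsage();

    // Allocate and touch ~16 MB so the pages actually become resident.
    const int count = 4 * 1024 * 1024;
    int* big = new int[count];
    for (int i = 0; i < count; i++) {
        big[i] = i;   // writing forces the OS to back the pages with real memory
    }

    int residentAfter = getResidentMemoryUsage();
    std::cout << "resident before: " << residentBefore
              << ", after: " << residentAfter << std::endl;

    delete[] big;
    return 0;
}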
So I'm taking an assembly course and have been tasked with making a benchmark program for my computer; needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function that reads from 5 * 10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements. So what exactly should I be doing? (For the record, I'm trying to code this in C++.)
//Benchmark Program in C++
#include <iostream>
#include <time.h>

using namespace std;

int main() {
    clock_t t1, t2;
    int readTemp;
    int* arr = new int[5*100000000];

    t1 = clock();
    cout << "Memory Test" << endl;
    for (long long int j = 0; j < 500000000; j += 1)   // valid indices run 0 .. 499999999
    {
        readTemp = arr[j];
    }
    t2 = clock();

    float diff((float)t2 - (float)t1);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Time Taken: " << seconds << " seconds" << endl;
}
Your code tries to allocate 2 billion bytes (1907 MiB), while the default user-mode address space of a 32-bit process on Windows is 2 gigabytes (2048 MiB). These numbers are very close, and it's likely your process has already used the remaining 141 MiB for other things. Even though your code is very small, the OS is pretty liberal in its use of the 2048 MiB address space, spending large chunks on, for example, the following:
C++ runtime (standard library and other libraries)
Stack: OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Padding between virtual memory pages
Padding used just to make specific sections of data appear at specific addresses (e.g. Windows places the lowest code address at 0x00400000, or something like that)
Padding used to randomize the values of pointers
There's a Windows application that shows a memory map of a running process. You can use it by adding a delay (e.g. getchar()) before the allocation, then looking at the largest contiguous free block of memory at that point and at which allocations prevent it from being larger.
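If you just want a rough number for the largest contiguous block without extra tooling, a crude probe like the one below can help (a sketch only; on systems that overcommit memory the result can be optimistic, and the search is capped at 2 GiB to stay meaningful for a 32-bit process):

#include <cstddef>
#include <iostream>
#include <new>

int main() {
    // Binary-search the largest single allocation that currently succeeds.
    std::size_t lo = 0;
    std::size_t hi = std::size_t(1) << 31;        // 2 GiB upper bound
    while (hi - lo > (1 << 20)) {                 // stop at 1 MiB resolution
        std::size_t mid = lo + (hi - lo) / 2;
        char* p = new (std::nothrow) char[mid];
        if (p) {
            delete[] p;
            lo = mid;                             // mid bytes fit; try larger
        } else {
            hi = mid;                             // mid bytes failed; try smaller
        }
    }
    std::cout << "Largest single allocation: ~" << (lo >> 20) << " MiB" << std::endl;
    return 0;
}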
The size is possible:
5 * 10^8 * 4 bytes = ~1.9 GB.
First you will need to allocate the array (dynamically only! No stack is that large).
For your task, the 4 bytes are the size of an integer, so you can do

int* arr = new int[5*100000000];

Alternatively, if you want to be explicit about the bytes, you can allocate it as chars:

char* arr = new char[5*4*100000000];

Next, you need to make the memory dirty (meaning: write something into it):

memset(arr, 0, 5*100000000*sizeof(int));   // requires <cstring>

Now you can benchmark cache misses (I'm guessing that's what is intended with such a huge array):

int randomIndex = GetRandomNumberBetween(0, 5*100000000 - 1); // make your own random implementation
int bytes = arr[randomIndex];                                 // access 4 bytes through an integer read

If you want 5 * 10^8 random accesses, you can do a Knuth shuffle inside your GetRandomNumberBetween instead of using purely random numbers.
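Putting the pieces together, here is a sketch of the whole benchmark; rng and pick below stand in for the GetRandomNumberBetween helper mentioned above, which is not shown here:

#include <chrono>
#include <cstring>    // memset
#include <iostream>
#include <random>

int main() {
    const long long N = 5LL * 100000000;   // 5 * 10^8 ints, ~1.9 GB
    int* arr = new int[N];
    std::memset(arr, 0, N * sizeof(int));  // touch the memory ("make it dirty")

    // Stand-in for GetRandomNumberBetween(0, N - 1).
    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<long long> pick(0, N - 1);

    volatile int sink = 0;                 // keep the reads from being optimized away
    auto t1 = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; i++) {
        sink = arr[pick(rng)];             // one random 4-byte read per iteration
    }
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> seconds = t2 - t1;
    std::cout << "Time taken: " << seconds.count() << " seconds" << std::endl;

    delete[] arr;
    return 0;
}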
I have a simple question. I have a few files; one file is around 20,000 lines.
Each line has 5 fields. I also have some other ADTs (vectors and lists), but those do not cause a segfault.
The map itself stores roughly one key-value pair per line.
When I added a map to my code, I instantly got a segfault. I copied 5,000 of the 20,000 lines and still received a segfault; with 1,000 lines it worked.
In Java there is a way to increase the amount of virtual memory available to the program; is there a way to do so in C++? I have even deleted elements once they are no longer used, and I can get to around 2,000 lines, but not more.
Here is gdb:
(gdb) exec-file readin
(gdb) run
Starting program: /x/x/x/readin readin
Program exited normally.
valgrind:
HEAP SUMMARY:
==7948== in use at exit: 0 bytes in 0 blocks
==7948== total heap usage: 20,206 allocs, 20,206 frees, 2,661,509 bytes allocated
==7948==
==7948== All heap blocks were freed -- no leaks are possible
code:
....
Flow flw = endQueue.top();
stringstream str1;
stringstream str2;

if (flw.getSrc() < flw.getDest()){
    str1 << flw.getSrc();
    str2 << flw.getDest();
    flw_src_dest = str1.str() + "-" + str2.str();
} else {
    str1 << flw.getSrc();
    str2 << flw.getDest();
    flw_src_dest = str2.str() + "-" + str1.str();
}

while (int_start > flw.getEnd()){
    if(flw.getFlow() == 1){
        ava_bw[flw_src_dest] += 5.5;
    } else {
        ava_bw[flw_src_dest] += 2.5;
    }
    endQueue.pop();
}
A segmentation fault doesn't necessarily indicate that you're out of memory. In fact, with C++, it's highly unlikely: you would usually get a bad_alloc or somesuch in this case (unless you're dumping everything in objects with automatic storage duration?!).
More likely, you have a memory corruption bug in your code, that just so happens to only be noticeable when you have more than a certain number of objects.
At any rate, the solution to memory faults is not to blindly throw more memory at the program.
Run your code through valgrind and through a debugger, and see what the real problem is.
Be careful erasing elements from a container while you are iterating over the container.
for (pos = ava_bw.begin(); pos != ava_bw.end(); ++pos) {
    if (pos->second == INIT){
        ava_bw.erase(pos);
    }
}
erase(pos) invalidates pos, so the subsequent ++pos increments an invalid iterator, which is undefined behaviour; in particular, if the erased element was the last one, the ++pos will fail outright.
The same problem exists with a vector: erasing through pos invalidates it.
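For reference, the usual safe pattern looks like this (a sketch, assuming C++11, where map::erase returns the iterator following the erased element):

for (auto pos = ava_bw.begin(); pos != ava_bw.end(); ) {
    if (pos->second == INIT) {
        pos = ava_bw.erase(pos);   // erase returns the next valid iterator
    } else {
        ++pos;                     // only advance when nothing was erased
    }
}
// Pre-C++11 equivalent for associative containers: ava_bw.erase(pos++);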
Edit
In the while loop you do
while (int_start > flw.getEnd()){
if(flw.getFlow() == 1){
ava_bw[flw_src_dest] += 5.5;
} else {
ava_bw[flw_src_dest] += 2.5;
}
endQueue.pop();
}
You need to do flw = endQueue.top() again.
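In other words, something along these lines (a sketch of the fix only; re-reading top() after each pop is the key change, and you probably also want to recompute flw_src_dest for each flow, which is left out here):

while (!endQueue.empty()) {
    Flow flw = endQueue.top();          // re-read the current top each pass
    if (int_start <= flw.getEnd()) {
        break;                          // no more expired flows to process
    }
    if (flw.getFlow() == 1) {
        ava_bw[flw_src_dest] += 5.5;
    } else {
        ava_bw[flw_src_dest] += 2.5;
    }
    endQueue.pop();                     // remove it, then look at the new top
}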
Generally speaking, in C/C++ the maximum amount of available heap isn't fixed when the program starts: you can always allocate more memory, either via direct use of new/malloc or by using STL containers such as std::list, which do it by themselves.
I don't think the problem is memory, as C++ gets as much memory as it asks for, even to the point of hogging all the available memory on your PC. Check whether you delete something that you access later on.
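A minimal example of the kind of bug meant here, which Valgrind reports as an invalid read:

#include <iostream>

int main() {
    int* data = new int[10];
    delete[] data;              // memory handed back to the allocator
    std::cout << data[3];       // use-after-free: undefined behaviour, may segfault
    return 0;
}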