Why this app doesn't consume as much memory as expected

Why this app doesn't consume as much memory as expected - c++

I wrote a simple application to test memory consumption. In this test application, I created four processes to continually consume memory, those processes won't release the memory unless the process exits.
I expected this test application to consume the most memory of RAM and cause the other application to slow down or crash. But the result is not the same as expected. Below is the code:
#include <stdio.h>
#include <unistd.h>
#include <list>
#include <vector>
using namespace std;
unsigned short calcrc(unsigned char *ptr, int count)
{
unsigned short crc;
unsigned char i;
//high cpu-consumption code
//implements the CRC algorithm
//CRC is Cyclic Redundancy Code
}
void* ForkChild(void* param){
vector<unsigned char*> MemoryVector;
pid_t PID = fork();
if (PID > 0){
const int TEN_MEGA = 10 * 10 * 1024 * 1024;
unsigned char* buffer = NULL;
while(1){
buffer = NULL;
buffer = new unsigned char [TEN_MEGA];
if (buffer){
try{
calcrc(buffer, TEN_MEGA);
MemoryVector.push_back(buffer);
} catch(...){
printf("An error was throwed, but caught by our app!\n");
delete [] buffer;
buffer = NULL;
}
}
else{
printf("no memory to allocate!\n");
try{
if (MemoryVector.size()){
buffer = MemoryVector[0];
calcrc(buffer, TEN_MEGA);
buffer = NULL;
} else {
printf("no memory ever allocated for this Process!\n");
continue;
}
} catch(...){
printf("An error was throwed -- branch 2,"
"but caught by our app!\n");
buffer = NULL;
}
}
} //while(1)
} else if (PID == 0){
} else {
perror("fork error");
}
return NULL;
}
int main(){
int children = 4;
while(--children >= 0){
ForkChild(NULL);
};
while(1) sleep(1);
printf("exiting main process\n");
return 0;
}
TOP command
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2775 steve 20 0 1503m 508 312 R 99.5 0.0 1:00.46 test
2777 steve 20 0 1503m 508 312 R 96.9 0.0 1:00.54 test
2774 steve 20 0 1503m 904 708 R 96.6 0.0 0:59.92 test
2776 steve 20 0 1503m 508 312 R 96.2 0.0 1:00.57 test
Though CPU is high, but memory percent remains 0.0. How can it be possible??
Free command
free shared buffers cached
Mem: 3083796 0 55996 428296
Free memory is more than 3G out of 4G RAM.
Does there anybody know why this test app just doesn't work as expected?

Linux uses optimistic memory allocation: it will not physically allocate a page of memory until that page is actually written to. For that reason, you can allocate much more memory than what is available, without increasing memory consumption by the system.
If you want to force the system to allocate (commit) a physical page , then you have to write to it.
The following line does not issue any write, as it is default-initialization of unsigned char, which is a no-op:
buffer = new unsigned char [TEN_MEGA];
If you want to force a commit, use zero-initialization:
buffer = new unsigned char [TEN_MEGA]();

To make the comments into an answer:
Linux will not allocate memory pages for a process until it writes to them (copy-on-write).
Additionally, you are not writing to your buffer anywhere, as the default constructor for unsigned char does not perform any initializations, and new[] default-initializes all items.

fork() returns the PID in the parent, and 0 in the child. Your ForkChild as written will execute all the work in the parent, not the child.
And the standard new operator will never return null; it will throw if it fails to allocate memory (but due to overcommit it won't actually do that either in Linux). This means your test of buffer after the allocation is meaningless: it will always either take the first branch or never reach the test. If you want a null return, you need to write new (std::nothrow) .... Include <new> for that to work.

But your program is infact doing what you expected it to do. As an answer has pointed out (# Michael Foukarakis's answer), memory not used is not allocated. In your output of the top program, I noticed that the column virt had a large amount of memory on it for each process running your program. A little googling later, I saw what this was:
VIRT -- Virtual Memory Size (KiB). The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used.
So as you can see, your program does in fact generate memory for itself, but in the form of pages and stored as virtual memory. And I think that is a smart thing to do
A snippet from this wiki page
A page, memory page, or virtual page -- a fixed-length contiguous block of virtual memory, and it is the smallest unit of data for the following:
memory allocation performed by the operating system for a program; and
transfer between main memory and any other auxiliary store, such as a hard disk drive.
...Thus a program can address more (virtual) RAM than physically exists in the computer. Virtual memory is a scheme that gives users the illusion of working with a large block of contiguous memory space (perhaps even larger than real memory), when in actuality most of their work is on auxiliary storage (disk). Fixed-size blocks (pages) or variable-size blocks of the job are read into main memory as needed.
Sources:
http://www.computerhope.com/unix/top.htm
https://stackoverflow.com/a/18917909/2089675
http://en.wikipedia.org/wiki/Page_(computer_memory)

If you want to gobble up a lot of memory:
int mb = 0;
char* buffer;
while (1) {
buffer = malloc(1024*1024);
memset(buffer, 0, 1024*1024);
mb++;
}
I used something like this to make sure the file buffer cache was empty when taking some file I/O timing measurements.
As other answers have already mentioned, your code doesn't ever write to the buffer after allocating it. Here memset is used to write to the buffer.

Related

32-bit malloc() return NULL when opening many threads?

I have a sample C++ program as below:
#include <windows.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
void * pointerArr[20000];
int i = 0, j;
for (i = 0; i < 20000; i++) {
void * pointer = malloc(131125);
if (pointer == NULL) {
printf("i = %d, out of memory!\n", i);
getchar();
break;
}
pointerArr[i] = pointer;
}
for (j = 0; j < i; j++) {
free(pointerArr[j]);
}
getchar();
return 0;
}
When I run it with Visual Studio 32-bit Debug, it will run with following result:
The program can use nearly 2Gb of memory before out of memory.
This is normal behavior.
However, when I adding the code to start Thread inside the for loop as below:
#include <windows.h>
#include <stdio.h>
DWORD WINAPI thread_func(VOID* pInArgs)
{
Sleep(100000);
return 0;
}
int main(int argc, char* argv[])
{
void * pointerArr[20000];
int i = 0, j;
for (i = 0; i < 20000; i++) {
CreateThread(NULL, 0, thread_func, NULL, 0, NULL);
void * pointer = malloc(131125);
if (pointer == NULL) {
printf("i = %d, out of memory!\n", i);
getchar();
break;
}
pointerArr[i] = pointer;
}
for (j = 0; j < i; j++) {
free(pointerArr[j]);
}
getchar();
return 0;
}
The result is as below:
The memory is still just around 200Mb but function malloc will return NULL.
Could anyone help explain why the program cannot use the memory up to 2Gb before out of memory?
Is it mean creating many threads like above will cause memory leak?
In my real application, this error occur when I create about 800 threads, the RAM memory at the time "out of memory" is around 300Mb.

As noted in a comment by #macroland, the main thing happening here is that each thread is consuming 1 MiB for its stack (see MSDN CreateThread and Thread Stack Size). You say malloc returns NULL once the total you have directly allocated reaches 200 MB. Since you are allocating 131125 bytes at a time, that is 200 MB / 131125 B = 1525 threads. Their cumulative stack space will be around 1.5 GB. Adding the 200 MB of malloc memory is 1.7 GB, and miscellaneous overhead likely accounts for the rest.
So, why does Task Manager not show this? Because the full 1 MiB of thread stack space is not actually allocated (also called committed), rather it is reserved. See VirtualAlloc and the MEM_RESERVE flag. The address space has been reserved for expansion up to 1 MiB, but initially only 64 KiB are allocated, and Task Manager only counts the latter. But reserved memory will not be unilaterally repurposed by malloc until the reservation is lifted, so once it runs out of available address space, it has to return NULL.
What tool can show this? I don't know of anything off the shelf (even Process Explorer does not seem show a count of reserved memory). What I have done in the past is write my own little routine that uses VirtualQuery to enumerate the entire address space, including reserved ranges. I recommend you do the same; it's not much code to write, and very handy when coding for 32-bit Windows because the 2 GiB address space gets cramped very easily (DLLs are an obvious reason, but the default malloc also will leave unexpected reservations behind in response to certain allocation patterns even if you free everything).
In any case, if you want to create thousands of threads in a 32-bit Windows process, be sure to pass a non-zero value as the dwStackSize parameter to CreateThread, and also pass STACK_SIZE_PARAM_IS_A_RESERVATION as dwCreationFlags. The minimum is 64 KiB, which will be plenty if you avoid recursive algorithms in the threads.
Addendum: In a comment, #iinspectable cautions against using thousands of threads, citing Raymond Chen's 2005 blog post Does Windows have a limit of 2000 threads per process?. I agree that doing so is questionable for a variety of reasons; it is not my intent to endorse the practice, rather I'm just explaining one necessary element.

How to tell a C++ program to get the system memory? [duplicate]

I want to allocate my buffers according to memory available. Such that, when I do processing and memory usage goes up, but still remains in available memory limits. Is there a way to get available memory (I don't know will virtual or physical memory status will make any difference ?). Method has to be platform Independent as its going to be used on Windows, OS X, Linux and AIX. (And if possible then I would also like to allocate some of available memory for my application, someone it doesn't change during the execution).
Edit: I did it with configurable memory allocation.
I understand it is not good idea, as most OS manage memory for us, but my application was an ETL framework (intended to be used on server, but was also being used on desktop as a plugin for Adobe indesign). So, I was running in to issue of because instead of using swap, windows would return bad alloc and other applications start to fail. And as I was taught to avoid crashes and so, was just trying to degrade gracefully.

On UNIX-like operating systems, there is sysconf.
#include <unistd.h>
unsigned long long getTotalSystemMemory()
{
long pages = sysconf(_SC_PHYS_PAGES);
long page_size = sysconf(_SC_PAGE_SIZE);
return pages * page_size;
}
On Windows, there is GlobalMemoryStatusEx:
#include <windows.h>
unsigned long long getTotalSystemMemory()
{
MEMORYSTATUSEX status;
status.dwLength = sizeof(status);
GlobalMemoryStatusEx(&status);
return status.ullTotalPhys;
}
So just do some fancy #ifdefs and you'll be good to go.

There are reasons to do want to do this in HPC for scientific software. (Not game, web, business or embedded software). Scientific software routinely go through terabytes of data to get through one computation (or run) (and run for hours or weeks) -- all of which cannot be stored in memory (and if one day you tell me a terabyte is standard for any PC or tablet or phone it will be the case that the scientific software will be expected to handle petabytes or more). The amount of memory can also dictate the kind of method/algorithm that makes sense. The user does not always want to decide the memory and method - he/she has other things to worry about. So the programmer should have a good idea of what is available (4Gb or 8Gb or 64Gb or thereabouts these days) to decide whether a method will automatically work or a more laborious method is to be chosen. Disk is used but memory is preferable. And users of such software are not encouraged to be doing too many things on their computer when running such software -- in fact, they often use dedicated machines/servers.

There is no platform independent way to do this, different operating systems use different memory management strategies.
These other stack overflow questions will help:
How to get memory usage at run time in c++?
C/C++ memory usage API in Linux/Windows
You should watch out though: It is notoriously difficult to get a "real" value for available memory in linux. What the operating system displays as used by a process is no guarantee of what is actually allocated for the process.
This is a common issue when developing embedded linux systems such as routers, where you want to buffer as much as the hardware allows. Here is a link to an example showing how to get this information in a linux (in C):
http://www.unix.com/programming/25035-determining-free-available-memory-mv-linux.html

Having read through these answers I'm astonished that so many take the stance that OP's computer memory belongs to others. It's his computer and his memory to do with as he sees fit, even if it breaks other systems taking a claim it. It's an interesting question. On a more primitive system I had memavail() which would tell me this. Why shouldn't the OP take as much memory as he wants without upsetting other systems?
Here's a solution that allocates less than half the memory available, just to be kind. Output was:
Required FFFFFFFF
Required 7FFFFFFF
Required 3FFFFFFF
Memory size allocated = 1FFFFFFF
#include <stdio.h>
#include <stdlib.h>
#define MINREQ 0xFFF // arbitrary minimum
int main(void)
{
unsigned int required = (unsigned int)-1; // adapt to native uint
char *mem = NULL;
while (mem == NULL) {
printf ("Required %X\n", required);
mem = malloc (required);
if ((required >>= 1) < MINREQ) {
if (mem) free (mem);
printf ("Cannot allocate enough memory\n");
return (1);
}
}
free (mem);
mem = malloc (required);
if (mem == NULL) {
printf ("Cannot enough allocate memory\n");
return (1);
}
printf ("Memory size allocated = %X\n", required);
free (mem);
return 0;
}

Mac OS X example using sysctl (man 3 sysctl):
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
int main(void)
{
int mib[2] = { CTL_HW, HW_MEMSIZE };
u_int namelen = sizeof(mib) / sizeof(mib[0]);
uint64_t size;
size_t len = sizeof(size);
if (sysctl(mib, namelen, &size, &len, NULL, 0) < 0)
{
perror("sysctl");
}
else
{
printf("HW.HW_MEMSIZE = %llu bytes\n", size);
}
return 0;
}
(may also work on other BSD-like operating systems ?)

The code below gives the total and free memory in Megabytes. Works for FreeBSD, but you should be able to use same/similar sysctl tunables on your platform and do to the same thing (Linux & OS X have sysctl at least)
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/vmmeter.h>
int main(){
int rc;
u_int page_size;
struct vmtotal vmt;
size_t vmt_size, uint_size;
vmt_size = sizeof(vmt);
uint_size = sizeof(page_size);
rc = sysctlbyname("vm.vmtotal", &vmt, &vmt_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
rc = sysctlbyname("vm.stats.vm.v_page_size", &page_size, &uint_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
printf("Free memory : %ld\n", vmt.t_free * (u_int64_t)page_size);
printf("Available memory : %ld\n", vmt.t_avm * (u_int64_t)page_size);
return 0;
}
Below is the output of the program, compared with the vmstat(8) output on my system.
~/code/memstats % cc memstats.c
~/code/memstats % ./a.out
Free memory : 5481914368
Available memory : 8473378816
~/code/memstats % vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs us sy id
0 0 0 8093M 5228M 287 0 1 0 304 133 0 0 112 9597 1652 2 1 97

Linux currently free memory: sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages()
The total RAM was covered at https://stackoverflow.com/a/2513561/895245 with sysconf(_SC_PHYS_PAGES);.
Both sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages() are glibc extensions to POSIX that give instead the total currently available RAM pages.
You then just have to multiply them by sysconf(_SC_PAGE_SIZE) to obtain the current free RAM.
Minimal runnable example at: C - Check available free RAM?

The "official" function for this is was std::get_temporary_buffer(). However, you might want to test whether your platform has a decent implemenation. I understand that not all platforms behave as desired.

Instead of trying to guess, have you considered letting the user configure how much memory to use for buffers, as well as assuming somewhat conservative defaults? This way you can still run (possibly slightly slower) with no override, but if the user know there is X memory available for the app they can improve performance by configuring that amount.

Here is a proposal to get available memory on Linux platform:
/// Provides the available RAM memory in kibibytes (1 KiB = 1024 B) on Linux platform (Available memory in /proc/meminfo)
/// For more info about /proc/meminfo : https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo
long long getAvailableMemory()
{
long long memAvailable = -1;
std::ifstream meminfo("/proc/meminfo");
std::string line;
while (std::getline(meminfo, line))
{
if (line.find("MemAvailable:") != std::string::npos)
{
const std::size_t firstWhiteSpacePos = line.find_first_of(' ');
const std::size_t firstNonWhiteSpaceChar = line.find_first_not_of(' ', firstWhiteSpacePos);
const std::size_t nextWhiteSpace = line.find_first_of(' ', firstNonWhiteSpaceChar);
const std::size_t numChars = nextWhiteSpace - firstNonWhiteSpaceChar;
const std::string memAvailableStr = line.substr(firstNonWhiteSpaceChar, numChars);
memAvailable = std::stoll(memAvailableStr);
break;
}
}
return memAvailable;
}

How does memory on the heap get exhausted?

I have been testing out some of my own code to see how much allocated memory it takes to exhaust the memory on the heap or free store. However, unless my code is wrong in the testing of it, I am getting completely different results in terms of how much memory can be put on the heap.
I am testing two different programs. The first program creates vector objects on the heap. The second program creates integer objects on the heap.
Here is my code:
#include <vector>
#include <stdio.h>
int main()
{
long long unsigned bytes = 0;
unsigned megabytes = 0;
for (long long unsigned i = 0; ; i++) {
std::vector<int>* pt1 = new std::vector<int>(100000,10);
bytes += sizeof(*pt1);
bytes += pt1->size() * sizeof(pt1->at(0));
megabytes = bytes / 1000000;
if (i >= 1000 && i % 1000 == 0) {
printf("There are %d megabytes on the heap\n", megabytes);
}
}
}
The final output of this code before getting a bad_alloc error is: "There are 2000 megabytes on the heap"
In the second program:
#include <stdio.h>
int main()
{
long long unsigned bytes = 0;
unsigned megabytes = 0;
for (long long unsigned i = 0; ; i++) {
int* pt1 = new int(10);
bytes += sizeof(*pt1);
megabytes = bytes / 1000000;
if (i >= 100000 && i % 100000 == 0) {
printf("There are %d megabytes on the heap\n", megabytes);
}
}
}
The final output of this code before getting a bad_alloc error is: "There are 511 megabytes on the heap"
The final output in both programs is vastly different. Am I misunderstanding something about the free store? I thought that both results would be about the same.

It is very likely that pointers returned by new on your platform are 16-byte aligned.
If int is 4 bytes, this means that for every new int(10) you're getting four bytes and making 12 bytes unusable.
This alone would explain the difference between getting 500MB of usable space from small allocations and 2000MB from large ones.
On top of that, there's overhead of keeping track of allocated blocks (at a minimum, of their size and whether they're free or in use). That is very much specific to your system's memory allocator but also incurs per-allocation overhead. See "What is a Chunk" in https://sourceware.org/glibc/wiki/MallocInternals for an explanation of glibc's allocator.

First of all you have to understand that operating system assign memory to process in quite large chunks of memory called pages (it is a hardware property). Page size is about 4 -16 kB.
Now standard library try use memory in efficient way. So it have to find a way to chop pages to smaller pieces and manage them. To do that some extra information about heap structure have to be maintained.
Here is cool Andrei Alexandrescu cppcon talk more or less how it works (it omits information about pages management).
So when you allocating lots of small objects information about heap structure is quite large. On other hand if you allocating smaller number of larger objects is more efficient - less memory is waisted on tracking memory structure.
Note also that depending on heap strategy sometimes (when small piece of memory is requested) it is more efficient to waste some memory and return larger size of memory then it was requested.

What could cause a mutex to misbehave?

I've been busy the last couple of months debugging a rare crash caused somewhere within a very large proprietary C++ image processing library, compiled with GCC 4.7.2 for an ARM Cortex-A9 Linux target. Since a common symptom was glibc complaining about heap corruption, the first step was to employ a heap corruption checker to catch oob memory writes. I used the technique described in https://stackoverflow.com/a/17850402/3779334 to divert all calls to free/malloc to my own function, padding every allocated chunk of memory with some amount of known data to catch out-of-bounds writes - but found nothing, even when padding with as much as 1 KB before and after every single allocated block (there are hundreds of thousands of allocated blocks due to intensive use of STL containers, so I can't enlarge the padding further, plus I assume any write more than 1KB out of bounds would eventually trigger a segfault anyway). This bounds checker has found other problems in the past so I don't doubt its functionality.
(Before anyone says 'Valgrind', yes, I have tried that too with no results either.)
Now, my memory bounds checker also has a feature where it prepends every allocated block with a data struct. These structs are all linked in one long linked list, to allow me to occasionally go over all allocations and test memory integrity. For some reason, even though all manipulations of this list are mutex protected, the list was getting corrupted. When investigating the issue, it began to seem like the mutex itself was occasionally failing to do its job. Here is the pseudocode:
pthread_mutex_t alloc_mutex;
static bool boolmutex; // set to false during init. volatile has no effect.
void malloc_wrapper() {
// ...
pthread_mutex_lock(&alloc_mutex);
if (boolmutex) {
printf("mutex misbehaving\n");
__THROW_ERROR__; // this happens!
}
boolmutex = true;
// manipulate linked list here
boolmutex = false;
pthread_mutex_unlock(&alloc_mutex);
// ...
}
The code commented with "this happens!" is occasionally reached, even though this seems impossible. My first theory was that the mutex data structure was being overwritten. I placed the mutex within a struct, with large arrays before and after it, but when this problem occurred the arrays were untouched so nothing seems to be overwritten.
So.. What kind of corruption could possibly cause this to happen, and how would I find and fix the cause?
A few more notes. The test program uses 3-4 threads for processing. Running with less threads seems to make the corruptions less common, but not disappear. The test runs for about 20 seconds each time and completes successfully in the vast majority of cases (I can have 10 units repeating the test, with the first failure occurring after 5 minutes to several hours). When the problem occurs it is quite late in the test (say, 15 seconds in), so this isn't a bad initialization issue. The memory bounds checker never catches actual out of bounds writes but glibc still occasionally fails with a corrupted heap error (Can such an error be caused by something other than an oob write?). Each failure generates a core dump with plenty of trace information; there is no pattern I can see in these dumps, no particular section of code that shows up more than others. This problem seems very specific to a particular family of algorithms and does not happen in other algorithms, so I'm quite certain this isn't a sporadic hardware or memory error. I have done many more tests to check for oob heap accesses which I don't want to list to keep this post from getting any longer.
Thanks in advance for any help!

Thanks to all commenters. I've tried nearly all suggestions with no results, when I finally decided to write a simple memory allocation stress test - one that would run a thread on each of the CPU cores (my unit is a Freescale i.MX6 quad core SoC), each allocating and freeing memory in random order at high speed. The test crashed with a glibc memory corruption error within minutes or a few hours at most.
Updating the kernel from 3.0.35 to 3.0.101 solved the problem; both the stress test and the image processing algorithm now run overnight without failing. The problem does not reproduce on Intel machines with the same kernel version, so the problem is specific either to ARM in general or perhaps to some patch Freescale included with the specific BSP version that included kernel 3.0.35.
For those curious, attached is the stress test source code. Set NUM_THREADS to the number of CPU cores and build with:
<cross-compiler-prefix>g++ -O3 test_heap.cpp -lpthread -o test_heap
I hope this information helps someone. Cheers :)
// Multithreaded heap stress test. By Itay Chamiel 20151012.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <sys/time.h>
#define NUM_THREADS 4 // set to number of CPU cores
#define ALIVE_INDICATOR NUM_THREADS
// Each thread constantly allocates and frees memory. In each iteration of the infinite loop, decide at random whether to
// allocate or free a block of memory. A list of 500-1000 allocated blocks is maintained by each thread. When memory is allocated
// it is added to this list; when freeing, a random block is selected from this list, freed and removed from the list.
void* thr(void* arg) {
int* alive_flag = (int*)arg;
int thread_id = *alive_flag; // this is a number between 0 and (NUM_THREADS-1) given by main()
int cnt = 0;
timeval t_pre, t_post;
gettimeofday(&t_pre, NULL);
const int ALLOCATE=1, FREE=0;
const unsigned int MINSIZE=500, MAXSIZE=1000;
const int MAX_ALLOC=10000;
char* membufs[MAXSIZE];
unsigned int membufs_size = 0;
int num_allocs = 0, num_frees = 0;
while(1)
{
int action;
// Decide whether to allocate or free a memory block.
// if we have less than MINSIZE buffers, allocate.
if (membufs_size < MINSIZE) action = ALLOCATE;
// if we have MAXSIZE, free.
else if (membufs_size >= MAXSIZE) action = FREE;
// else, decide randomly.
else {
action = ((rand() & 0x1)? ALLOCATE : FREE);
}
if (action == ALLOCATE) {
// choose size to allocate, from 1 to MAX_ALLOC bytes
size_t size = (rand() % MAX_ALLOC) + 1;
// allocate and fill memory
char* buf = (char*)malloc(size);
memset(buf, 0x77, size);
// add buffer to list
membufs[membufs_size] = buf;
membufs_size++;
assert(membufs_size <= MAXSIZE);
num_allocs++;
}
else { // action == FREE
// choose a random buffer to free
size_t pos = rand() % membufs_size;
assert (pos < membufs_size);
// free and remove from list by replacing entry with last member
free(membufs[pos]);
membufs[pos] = membufs[membufs_size-1];
membufs_size--;
assert(membufs_size >= 0);
num_frees++;
}
// once in 10 seconds print a status update
gettimeofday(&t_post, NULL);
if (t_post.tv_sec - t_pre.tv_sec >= 10) {
printf("Thread %d [%d] - %d allocs %d frees. Alloced blocks %u.\n", thread_id, cnt++, num_allocs, num_frees, membufs_size);
gettimeofday(&t_pre, NULL);
}
// indicate alive to main thread
*alive_flag = ALIVE_INDICATOR;
}
return NULL;
}
int main()
{
int alive_flag[NUM_THREADS];
printf("Memory allocation stress test running on %d threads.\n", NUM_THREADS);
// start a thread for each core
for (int i=0; i<NUM_THREADS; i++) {
alive_flag[i] = i; // tell each thread its ID.
pthread_t th;
int ret = pthread_create(&th, NULL, thr, &alive_flag[i]);
assert(ret == 0);
}
while(1) {
sleep(10);
// check that all threads are alive
bool ok = true;
for (int i=0; i<NUM_THREADS; i++) {
if (alive_flag[i] != ALIVE_INDICATOR)
{
printf("Thread %d is not responding\n", i);
ok = false;
}
}
assert(ok);
for (int i=0; i<NUM_THREADS; i++)
alive_flag[i] = 0;
}
return 0;
}

Memory allocation failed even when there is still enough memory

I am working on Linux (ubuntu 13.04 exactly) and currently I have a question: Why memory allocation will fail even when there is still enough memory?
I wrote a simple test application today and I encountered this issue when running this test app. Below is the code snippet I used to have the test:
#include <stdio.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <strings.h>
using namespace std;
unsigned short calcrc(unsigned char *ptr, int count)
{
unsigned short crc;
unsigned char i;
//high cpu-consumption code
//implements CRC algorithm: Cylic
//Redundancy code
}
void* CreateChild(void* param){
vector<unsigned char*> MemoryVector;
pid_t PID = fork();
if (PID == 0){
const int MEMORY_TO_ALLOC = 1024 * 1024;
unsigned char* buffer = NULL;
while(1){
buffer = NULL;
try{
buffer = new unsigned char [MEMORY_TO_ALLOC]();
calcrc(buffer, MEMORY_TO_ALLOC );
MemoryVector.push_back(buffer);
} catch(...){
printf("an exception was thrown!\n");
continue;
} //try ... catch
} //while
} // if pid == 0
return NULL;
}
int main(){
int children = 4;
while(--children >= 0){
CreateChild(NULL);
};
while(1) sleep(3600);
return 0;
}
During my test, the above code starts throwing exception when there is around 220M RAM available. And from the moment on, it looks like the application is not able to get more memory any more
because the free memory shown by TOP command remains to be above 210M. So why would this happen?
UPDATE
1. Software && Hardware Information
The RAM is 4G and swap is around 9G bytes. Running "uname -a" gives: Linux steve-ThinkPad-T410 3.8.0-30-generic #44-Ubuntu SMP Thu Aug 22 20:54:42 UTC 2013 i686 i686 i686 GNU/Linux
2. Statistic Data during the Test
Right after Test App Starts Throwing Exception
steve#steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3763292 226048 0 2548 79728
-/+ buffers/cache: 3681016 308324
Swap: 9760764 9432896 327868
10 minutes after Test App Starts Throwing Exception
steve#steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3770808 218532 0 3420 80632
-/+ buffers/cache: 3686756 302584
Swap: 9760764 9436168 324596
20 minutes after Test App Starts Throwing Exception
steve#steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3770960 218380 0 4376 104716
-/+ buffers/cache: 3661868 327472
Swap: 9760764 9535700 225064
40 minutes after Test App Starts Throwing Exception
steve#steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3739168 250172 0 2272 139108
-/+ buffers/cache: 3597788 391552
Swap: 9760764 9556292 204472

May be you have no more 1MB sequential memory pages in your address space. you have free space fragmentation.

During my test, the above code starts throwing exception when there is around 220M memory. And from the moment on, it looks like the application is not able to get more memory any more because the free memory shown by TOP command remains to be above 210M. So why would this happen?
The output of top is updated every N seconds (configured), and doesn't really show the current status.
On the other hand, memory allocation is super fast.
What happens is your program eats memory, and at certain point (when top shows 200 Mb free) it starts failing.

You are running on x86-32 so your processes are 32-bit. Even with more memory + swap they will be limited by their address space, run:
grep “HIGHMEM” /boot/config-`uname -r`
grep “VMSPLIT” /boot/config-`uname -r`
to see how your kernel is configured.
Maybe your 4 child processes are limited to 3G and are using 12G + other processes ~700M to reach the numbers you are seeing.
EDIT:
So your kernel is configured to give each user-space process 3G of address-space, of which some will be taken up with the program, libraries and initial runtime memory (which will be shared due to the fork).
Therefore you have 4 children using ~3G each - ~12G + ~780M of other programs. That leaves about 220M free once the children start reporting errors.
You could just run another child process, or you could reinstall with AND64/x86-64 version of Ubuntu, where each process will be able to allocate much more memory.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js