How to alloc largest available memory on different computer? - c++

If I need a program to read/write more than 1 TB of data randomly, the simplest approach is to put all the data in memory. On a PC with 2 GB of memory we can still get by, at the cost of a lot of I/O. Different computers have different amounts of memory, so how can one program allocate a suitable amount of memory on machines ranging from 2 GB to 2 TB?
I thought of reading /proc/meminfo and allocating MemFree, but I suspect there is a better way.
note:
Linux is the target, but answers for other OSes are welcome
avoid being OOM-killed as much as possible (without root)
as little disk I/O as possible
multiprocessing is used
c or c++ answer is fine

You can use the GNU extension get_avphys_pages() from glibc:
The get_avphys_pages function returns the number of available pages of physical memory the system has. To get the amount of memory, this number has to be multiplied by the page size.
Sample code:
#include <unistd.h>
#include <sys/sysinfo.h>
#include <stdio.h>

int main() {
    long int pagesize = getpagesize();
    long int avail_pages = get_avphys_pages();
    long int avail_bytes = avail_pages * pagesize;
    printf( "Page size:%ld Pages:%ld Bytes:%ld\n",
            pagesize, avail_pages, avail_bytes );
    return 0;
}
Result Godbolt
Program returned: 0
Page size:4096 Pages:39321 Bytes:161058816
This is an estimate of the PHYSICAL memory currently free in your box, so:
The truly usable memory can be much higher, since the process can page in/out.
The physical memory is an upper bound, as other processes are using memory too.
So treat the result as a rough upper bound on the available RAM.
If you plan to allocate large chunks of memory, use mmap() directly, as malloc() would be too high-level for this usage.
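For example, a minimal sketch (my own illustration, not part of the original answer) that sizes an anonymous mmap() from get_avphys_pages(); the 80% safety margin is an arbitrary choice:

#include <sys/mman.h>
#include <sys/sysinfo.h>
#include <unistd.h>
#include <cstdio>

int main() {
    long pagesize = getpagesize();
    long avail_pages = get_avphys_pages();
    // The 80% margin below is an arbitrary choice, not a recommendation from the answer.
    size_t want = (size_t)avail_pages * pagesize / 10 * 8;

    void *p = mmap(nullptr, want, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("Reserved %zu bytes at %p\n", want, p);
    munmap(p, want);
    return 0;
}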

Related

Is it possible to allocate large amount of virtual memory in linux?

It would be efficient for some purposes to allocate a huge amount of virtual space, and page in only pages that are accessed. Allocating a large amount of memory is instantaneous and does not actually grab pages:
char* p = new char[1024*1024*1024*256];
OK, the above was wrong, as pointed out, because 1024*1024*1024*256 overflows a 32-bit int.
I expect that new calls malloc, which calls sbrk, and that when I access a location 4 GB beyond the start, it tries to extend the task's memory by that much?
Here is the full program:
#include <cstdint>

int main() {
    constexpr uint64_t GB = 1ULL << 30;
    char* p = new char[256*GB];   // allocate a large block of virtual space
    p[0] = 1;
    p[1000000000] = 1;
    p[2000000000] = 1;
}
Now, I get bad_alloc when attempting to allocate the huge amount, so obviously malloc won't work.
I was under the impression that mmap would map to files, but since this is suggested I am looking into it.
Ok, so mmap seems to support allocation of big areas of virtual memory, but it requires a file descriptor. Creating huge in-memory data structures could be a win but not if they have to be backed by a file:
The following code uses mmap, even though I don't like the idea of attaching to a file. I did not know what address to request in virtual memory, and picked 0x800000000. mmap returns -1, so obviously I'm doing something wrong:
#include <cstdint>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main() {
    constexpr uint64_t GB = 1ULL << 30;
    void *addr = (void*)0x8000000000ULL;
    int fd = creat("garbagefile.dat", 0660);
    char* p = (char*)mmap(addr, 256*GB, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
    p[0] = 1;
    p[1000000000] = 1;
    p[2000000000] = 1;
    close(fd);
}
Is there any way to allocate a big chunk of virtual memory and access pages sparsely, or is this not doable?
Is it possible to allocate large amount of virtual memory in linux?
Possibly. But you may need to configure it to be allowed:
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a seriously
wild allocation fails while allowing overcommit to reduce swap
usage. root is allowed to allocate slightly more memory in this
mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
Classic example is code using sparse arrays and just relying on the
virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system
is not permitted to exceed swap + a configurable amount (default is
50%) of physical RAM. Depending on the amount you use, in most
situations this means a process will not be killed while accessing
pages but will receive errors on memory allocation as appropriate.
Useful for applications that want to guarantee their memory
allocations will be available in the future without having to
initialize every page.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
So, if you want to allocate more virtual memory than you have physical memory, then you'd want:
# in shell
sysctl -w vm.overcommit_memory=1
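If you want the program itself to detect which mode is in effect before attempting a huge allocation, a minimal sketch (my own, assuming /proc is mounted) is to read the value back:

#include <cstdio>

// Returns the current vm.overcommit_memory mode (0, 1 or 2), or -1 on failure.
int overcommit_mode() {
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (!f) return -1;                  // not Linux, or /proc not mounted
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1) mode = -1;
    fclose(f);
    return mode;
}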
RLIMIT_AS The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
So, you'd want:
struct rlimit unlimited = {
    .rlim_cur = RLIM_INFINITY,
    .rlim_max = RLIM_INFINITY,
};
setrlimit(RLIMIT_AS, &unlimited);
Or, if you cannot give the process permission to do this, then you can configure this persistently in /etc/security/limits.conf which will affect all processes (of a user/group).
Ok, so mmap seems to support ... but it requires a file descriptor. ... could be a win but not if they have to be backed by a file ... I don't like the idea of attaching to a file
You don't need to use a file backed mmap. There's MAP_ANONYMOUS for that.
I did not know what number to put in to request
Then use null. Example:
mmap(nullptr, 256*GB, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
That said, if you've configured the system as described, then new should work just as well as mmap. It'll probably use malloc which will probably use mmap for large allocations like this.
Bonus hint: You may benefit from using HugeTLB pages.
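A sketch of what that might look like (my own illustration; MAP_HUGETLB is Linux-specific and requires huge pages to be reserved via /proc/sys/vm/nr_hugepages, so the code falls back to normal pages):

#include <sys/mman.h>
#include <cstddef>

// Try to back a large allocation with huge pages, falling back to normal pages.
void *alloc_big(size_t bytes) {
    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)                // no huge pages reserved: use normal pages
        p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}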
The value of 256*GB does not fit into the range of a 32-bit integer type. Try uint64_t as the type of GB:
constexpr uint64_t GB = 1024*1024*1024;
or, alternatively, force 64-bit multiplication:
char* p = new char[256ULL * GB];
OT: I would prefer this definition of GB:
constexpr uint64_t GB = 1ULL << 30;
As for the virtual memory limit, see this answer.

How to query amount of allocated memory on Linux (and OSX)?

While this might look like a duplicate of other questions, let me explain why it's not.
I am looking to have a specific part of my application degrade gracefully when a certain memory limit has been reached. I could have used a criterion based on the remaining available physical memory, but this wouldn't be safe, because the OS could start paging out memory used by my application before that criterion is reached; my application would think there is still some physical memory left and keep allocating. For the same reason, I can't use the amount of physical memory currently used by the process: as soon as the OS started swapping me out, that number would stop growing even as I kept allocating.
For this reason, I chose a criteria based on the amount of memory allocated by my application, i.e. very close to virtual memory size.
This question (How to determine CPU and memory consumption from inside a process?) provides great ways of querying the amount of virtual memory used by the current process, which I THOUGHT was what I needed.
On Windows, I'm using GetProcessMemoryInfo() and the PrivateUsage field, which works great.
On Linux, I tried several things (listed below) that did not work. The reason virtual memory usage does not work for me is something that happens with OpenCL context creation on NVidia hardware on Linux. The driver reserves a region of the virtual memory space big enough to hold all RAM, all swap and all video memory. My guess is it does so for unified address space and everything. But it also means that the process reports using enormous amounts of memory. On my system, for instance, top reports 23.3 GB in the VIRT column (12 GB of RAM, 6 GB of swap, 2 GB of video memory, which gives 20 GB reserved by the NVidia driver).
On OSX, by using task_info() and the virtual_size field, I also get a bigger than expected number (a few GB for an app that takes not even close to 1 GB on Windows), but not as big as on Linux.
So here is the big question: how can I get the amount of memory allocated by my application? I know that this is a somewhat vague question (what does "allocated memory" mean?), but I'm flexible:
I would prefer to include the application static data, code section and everything, but I can live without.
I would prefer to include the memory allocated for stacks, but I can live without.
I would prefer to include the memory used by shared libraries, but I can live without.
I don't really care for mmap stuff, I can do with or without at that point.
Etc.
What is really important is that the number grows with dynamic allocation (new, malloc, anything) and shrinks when the memory is released (which I know can be implementation-dependent).
Things I have tried
Here are a couple of solutions I have tried and/or thought of but that would not work for me.
Read from /proc/self/status
This is the approach suggested by how-to-determine-cpu-and-memory-consumption-from-inside-a-process. However, as stated above, this returns the amount of virtual memory, which does not work for me.
Read from /proc/self/statm
Very slightly worse: according to http://kernelnewbies.kernelnewbies.narkive.com/iG9xCmwB/proc-pid-statm-doesnt-match-with-status, which refers to Linux kernel code, the only difference between those two values is that the second one does not subtract reserved_vm from the amount of virtual memory. I would have HOPED that reserved_vm would include the memory reserved by the OpenCL driver, but it does not.
Use mallinfo() and the uordblks field
This does not seem to include all the allocations (I'm guessing allocations made with new are missing), since for a +2 GB growth in virtual memory space (after doing some memory-heavy work and still holding the memory), I'm only seeing about 0.1 GB growth in the number returned by mallinfo().
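For reference, the query itself is roughly this (a sketch; note that the classic mallinfo() fields are plain int and wrap above 2 GB, and newer glibc also provides mallinfo2() with size_t fields):

#include <malloc.h>
#include <cstdio>

void print_heap_in_use() {
    struct mallinfo mi = mallinfo();
    printf("uordblks (bytes in use by the allocator): %d\n", mi.uordblks);
}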
Read the [heap] section size from /proc/self/smaps
This value started at around 336,760 kB and peaked at 1,019,496 kB for work that grew the virtual memory space by +2 GB, and then it never goes back down, so I'm not sure I can really rely on this number...
Monitor all memory allocations in my application
Yes, in an ideal world, I would have control over everybody who allocates memory. However, this is a legacy application, using tons of different allocators, some mallocs, some news, some OS-specific routines, etc. There are some plug-ins that could do whatever they want, they could be compiled with a different compiler, etc. So while this would be great to really control memory, this does not work in my context.
Read the virtual memory size before and after the OpenCL context initialization
While this could be a "hacky" way to solve the problem (and I might have to fallback to it), I would really wish for a more reliable way to query memory, because OpenCL context could be initialized somewhere out of my control, and other similar but non-OpenCL specific issues could creep in and I wouldn't know about it.
So that's pretty much all I've got. There is one more thing I have not tried yet, because it only works on OSX, but it is to use the approach described in Why does mstats and malloc_zone_statistics not show recovered memory after free?, i.e. use malloc_get_all_zones() and malloc_zone_statistics(), but I think this might be the same problem as mallinfo(), i.e. not take all allocations into account.
So, can anyone suggest a way to query memory usage (as vague of a term as this is, see above for precision) of a given process in Linux (and also OSX even if it's a different method)?
You can try and use information returned by getrusage():
#include <sys/time.h>
#include <sys/resource.h>
int getrusage(int who, struct rusage *usage);
struct rusage {
struct timeval ru_utime; /* user CPU time used */
struct timeval ru_stime; /* system CPU time used */
long ru_maxrss; /* maximum resident set size */
long ru_ixrss; /* integral shared memory size */
long ru_idrss; /* integral unshared data size */
long ru_isrss; /* integral unshared stack size */
long ru_minflt; /* page reclaims (soft page faults) */
long ru_majflt; /* page faults (hard page faults) */
long ru_nswap; /* swaps */
long ru_inblock; /* block input operations */
long ru_oublock; /* block output operations */
long ru_msgsnd; /* IPC messages sent */
long ru_msgrcv; /* IPC messages received */
long ru_nsignals; /* signals received */
long ru_nvcsw; /* voluntary context switches */
long ru_nivcsw; /* involuntary context switches */
};
If the memory information does not fit your purpose, observing the page fault counts can help monitor memory stress, which is what you intend to detect.
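For example, a small sketch (my own, not from the man page) that polls the peak RSS and the hard page fault count; a climbing ru_majflt is a sign the process is being paged in from disk:

#include <sys/resource.h>
#include <cstdio>

void report_memory_pressure() {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        // On Linux, ru_maxrss is reported in kilobytes.
        printf("peak RSS: %ld kB, hard page faults: %ld\n",
               ru.ru_maxrss, ru.ru_majflt);
    }
}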
Have you tried a shared library interposer for Linux for section (5) above? So long as your application is not statically linking the malloc functions, you can interpose a new function between your program and the kernel malloc. I've used this tactic many times to collect stats on memory usage.
It does require setting LD_PRELOAD before running the program, but no source or binary changes. It is an ideal answer in many cases.
Here is an example of a malloc interposer:
http://www.drdobbs.com/building-library-interposers-for-fun-and/184404926
You will probably also want to interpose calloc and free. Calls to new generally end up as a call to malloc, so C++ is covered as well.
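A minimal sketch of such an interposer (my own illustration, not the code from the linked article; file and function names are placeholders, and a production version must also cope with dlsym itself allocating):

// Build: g++ -shared -fPIC -o libmtrace.so mtrace.cpp -ldl
// Run:   LD_PRELOAD=./libmtrace.so ./your_program
#include <dlfcn.h>
#include <atomic>
#include <cstddef>

static std::atomic<size_t> total_allocated{0};
static void *(*real_malloc)(size_t) = nullptr;

// glibc declares malloc as noexcept, so the replacement must match.
extern "C" void *malloc(size_t size) noexcept {
    if (!real_malloc)   // caveat: not recursion- or race-safe; sketch only
        real_malloc = reinterpret_cast<void *(*)(size_t)>(dlsym(RTLD_NEXT, "malloc"));
    total_allocated.fetch_add(size, std::memory_order_relaxed);
    return real_malloc(size);
}

// Expose the running total so the application can query it.
extern "C" size_t interposed_total_allocated() { return total_allocated.load(); }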
OS X seems to have similar capabilities but I have not tried it.
http://tlrobinson.net/blog/2007/12/overriding-library-functions-in-mac-os-x-the-easy-way-dyld_insert_libraries/
--Matt
Here is what I ended up using. I scan /proc/self/maps and sum the size of all the address ranges meeting my criteria, which is:
Only include ranges from inode 0 (i.e. no devices, no mapped file, etc.)
Only include ranges that are at least one of readable, writable or executable
Only include private memory
In my experiments I did not see instances of shared memory from inode 0. Maybe with inter-process shared memory...?
Here is the code for my solution:
#include <stdio.h>
#include <stddef.h>
#include <assert.h>

size_t getValue()
{
    FILE* file = fopen("/proc/self/maps", "r");
    if (!file)
    {
        assert(0);
        return 0;
    }

    size_t value = 0;
    char line[1024];
    while (fgets(line, 1024, file) != NULL)
    {
        ptrdiff_t start_address, end_address;
        char perms[4];
        ptrdiff_t offset;
        unsigned int dev_major, dev_minor;
        unsigned long int inode;
        const int nb_scanned = sscanf(
            line, "%16tx-%16tx %c%c%c%c %16tx %02x:%02x %lu",
            &start_address, &end_address,
            &perms[0], &perms[1], &perms[2], &perms[3],
            &offset, &dev_major, &dev_minor, &inode
        );
        if (10 != nb_scanned)
        {
            assert(0);
            continue;
        }

        /* Count only private, accessible ranges from inode 0 (anonymous memory). */
        if ((inode == 0) &&
            (perms[0] != '-' || perms[1] != '-' || perms[2] != '-') &&
            (perms[3] == 'p'))
        {
            assert(dev_major == 0);
            assert(dev_minor == 0);
            value += (end_address - start_address);
        }
    }

    fclose(file);
    return value;
}
Since this is looping through all the lines in /proc/self/maps, querying memory that way is significantly slower than using "Virtual Memory currently used by current process" from How to determine CPU and memory consumption from inside a process?.
However, it provides an answer much closer to what I need.
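A hypothetical usage example (the size is mine; it relies on getValue() as defined above) that samples the value around a large allocation:

#include <stdio.h>
#include <stdlib.h>

int main() {
    size_t before = getValue();                     /* defined above */
    char *p = (char *)malloc(512UL * 1024 * 1024);  /* 512 MB */
    size_t after = getValue();
    printf("growth seen by getValue(): %zu bytes\n", after - before);
    free(p);
    return 0;
}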

Allocate several GBs of memory for std::vector

I need to acquire several GB of data from a sensor. When I tried to allocate a big array with malloc (10 GB or more; my system has 32 GB) it returned NULL. So I thought the problem could be solved with a linked list of iterators to vectors.
However, I don't know how to set this up. I tried declaring "list< vector::iterator >", but I can't allocate the memory for each vector (each should have 1000~2000 elements). Do you know any way to do this, or maybe a better solution for this big memory allocation?
If you are using a 64-bit operating system, then malloc should be able to allocate the large size with no problem.
For example, this code runs on my windows machine (64-bit windows) and allocates 10GB of ram flawlessly:
#include <stdio.h>
#include <malloc.h>

int main(int argc, char *argv[]) {
    /* Use long long: on 64-bit Windows a plain long is still only 32 bits wide. */
    long long int size = 10LL * 1024 * 1024 * 1024;
    printf("size = %lld\n", size);
    char *x = (char *)malloc(size);
    printf("x = %p\n", (void *)x);
    long long int i;
    for (i = 0; i < size; i += 1024*1024) {
        x[i] = 'h';   /* touch one byte per megabyte so the pages get committed */
    }
    printf("Done1\n");
}
However, if you have a 32-bit operating system, you'll be in trouble, and can't allocate over some limit (maybe 3 GB, but probably system dependent)
In that case, you'll need to write your data to a file instead.
However, if you're using a FAT filesystem, then you can't write a file that big either. In that case, you'd have to split the data among many files under 2 GB in size.
You'd want to actually check the malloc result for NULL to make sure the malloc works and memory could be grabbed.
You will need to allocate this space under a 64-bit Windows OS. You will ALSO have to set the "large address space aware" flag, otherwise you can only get 2 GB of RAM due to how the virtual memory system works on Windows.
You may want to look into using a memory mapped file, as suggested by sehe in his answer, if you do not absolutely have to have one large 10 GB chunk of contiguous memory. If you have to build your application for 32-bit Windows, then this will be the only answer, as 32-bit Windows normally only allows 2 GB of memory per process, unless the "large address space aware" flag is set, at which point it will allow 3 GB of memory usage.
When you have to deal with large blocks of memory, you are better off skipping malloc altogether and going directly to the operating system calls for memory allocation.
I usually move to memory mapped files or shared memory maps for this kind of data volumes.
This way, you're not bound to the amount of physical (process) memory available at all. You can let the OS page in and out as required. Fragmentation becomes much less of an issue (unless you actually fragment the logical address space, which is quite hard to achieve on 64 bit architectures).
More information
I have quite a number of answers on SO that show examples of storing vectors and all manner of more complicated data structures in shared memory/mapped files. You might want to look for mapped_file_device (from Boost Iostreams) or managed_shared_memory and managed_mapped_file (from Boost Interprocess)
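A plain-POSIX sketch of the same idea (my own; not sehe's Boost examples, and the file name is a placeholder): back a large array with a file and let the OS page it in and out as needed:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t bytes = 10ULL * 1024 * 1024 * 1024;      // 10 GB backing store
    int fd = open("sensor.dat", O_RDWR | O_CREAT, 0660);  // hypothetical file name
    if (fd < 0 || ftruncate(fd, bytes) != 0) { perror("backing file"); return 1; }

    double *data = (double *)mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    data[0] = 1.0;                            // pages are faulted in only as touched
    data[bytes / sizeof(double) - 1] = 2.0;

    munmap(data, bytes);
    close(fd);
    return 0;
}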

Growing memory during assignment of large array

When assigning values to a large array the used memory keeps increasing even though no new memory is allocated. I am checking the used memory simply by the task manager (windows) or system monitor (Ubuntu).
The problem is the same on both OSes. I am using gcc 4.7 and 4.6 respectively.
This is my code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int i, j;
    int n = 40000000;   // array size
    int s = 100;
    double *array;

    array = malloc(n * sizeof(double));   // allocate array
    if (array == NULL) {
        return -1;
    }

    for (i = 0; i < n; i++) {     // loop over the array; memory use increases during this loop
        for (j = 0; j < s; j++) { // inner loop only slows the program down
            array[i] = 3.0;
        }
    }
    return 0;
}
I do not see any logical problem, but to my knowledge I do not exceed any system limits either. So my questions are:
can the problem be reproduced by others?
what is the reason for the growing memory?
how do I solve this issue?
When modern systems 'allocate' memory, the pages are not actually allocated within physical RAM. You will get a virtual memory allocation. As you write to those pages, a physical page will be taken. So the virtual RAM taken will be increased when you do the malloc(), but only when you write the value in will the physical RAM be taken (on a page by page basis).
You should see the virtual memory used increase immediately. After that the RSS, or real memory used will increment as you write into the newly allocated memory. More information at How to measure actual memory usage of an application or process?
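A small sketch (mine, not from the linked question) that prints VmSize (virtual) and VmRSS (resident) from /proc/self/status; calling it before and inside the assignment loop shows the resident number climbing while the virtual size stays flat:

#include <cstdio>
#include <cstring>

void print_vm_stats() {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return;
    char line[256];
    while (fgets(line, sizeof line, f)) {
        // Keep only the two lines of interest.
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(f);
}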
This is because memory allocated in Linux and on many other operating systems, isn't actually given to your program until you use it.
So you could malloc 1 GB on a 256 MB machine, and not run out of memory until you actually tried to use all 1 GB.
In Linux there is a group of overcommit settings which changes this behavior. See Cent OS: How do I turn off or reduce memory overcommitment, and is it safe to do it?

Allocating more memory than there exists using malloc

This code snippet will allocate 2 GB every time it reads the letter 'u' from stdin, and will initialize all the allocated chars once it reads 'a'.
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>

#define bytes 2147483648

using namespace std;

int main()
{
    char input[8] = {0};
    vector<char *> activate;

    while (input[0] != 'q')
    {
        fgets(input, sizeof input, stdin);
        if (input[0] == 'u')
        {
            char *m = (char*)malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else cout << "ok" << endl;
            activate.push_back(m);
        }
        else if (input[0] == 'a')
        {
            for (size_t x = 0; x < activate.size(); x++)
            {
                char *m = activate[x];
                for (unsigned y = 0; y < bytes; y++)
                {
                    m[y] = 'a';   // touching each byte forces the pages to be committed
                }
            }
        }
    }
    return 0;
}
I am running this code on a linux virtual machine that has 3Gb of ram. While monitoring the system resource usage using the htop tool, I have realized that the malloc operation is not reflected on the resources.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM and swap memory) and malloc would allow it (i.e. NULL is not returned by malloc). But when I try to initialize the allocated memory, I can see the memory and swap memory filling up till the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3) and mmap(2) and proc(5)
NB: echo 0 instead of echo 2 often -but not always- works also. Read the docs (in particular proc man page that I just linked to).
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you just want to allocate too much, it "lies" to you, when you want to use the allocated memory, it will try to find enough memory for you and it might crash if it can't find enough memory.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: overcommitted memory. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the ram (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside, but only a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles (which is nifty).
-n
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also you could use calloc that not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1.Is this a kernel bug?
No.
2.Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate the need to know the eventual memory requirement - it's often convenient for an application to be able to allocate an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or a realloc() of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries is very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space to spare, being able to have a sparse array and use direct indexing, or to allocate a hash table with a generous capacity() to size() ratio, can lead to a very high performance system (see the sketch after this list). Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that, much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that the limit is exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. In summary, with this memory overcommit behaviour enabled, simply checking that malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
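A sketch of the sparse-data point above (my own illustration; the 1 TB size and the indices are arbitrary, and on a small machine the mapping may be refused unless vm.overcommit_memory=1 as discussed earlier):

#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    const size_t slots = (1ULL << 40) / sizeof(uint64_t);   // ~1 TB of virtual space
    uint64_t *table = (uint64_t *)mmap(nullptr, slots * sizeof(uint64_t),
                                       PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }

    table[42] = 1;          // only the touched pages consume physical RAM
    table[slots - 1] = 2;   // a sparse index at the far end of the table

    munmap(table, slots * sizeof(uint64_t));
    return 0;
}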