Is there any way to easily limit a C/C++ application to a specified amount of memory (30 MB or so)? E.g.: if my application tries to load a 50 MB file entirely into memory, it should die/print a message and quit/etc.
Admittedly I could just constantly check the application's memory usage, but it would be a bit easier if it would just die with an error once it went above the limit.
Any ideas?
Platform isn't a big issue, windows/linux/whatever compiler.
Read the manual page for ulimit on Unix systems. There is a shell builtin you can invoke before launching your executable, or (in section 3 of the manual) an API call of the same name.
On Windows, you can't set a quota for the memory usage of a process directly. You can, however, create a Windows job object, set the quota on the job object, and then assign the process to that job object.
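A minimal sketch of that sequence (the 30 MB figure and the error handling are illustrative only; note that on older Windows versions a process already inside a job cannot be assigned to another one):

#include <windows.h>
#include <iostream>
#include <new>

int main() {
    // Create an anonymous job object and cap per-process committed memory.
    HANDLE hJob = CreateJobObject(nullptr, nullptr);
    if (!hJob) { std::cerr << "CreateJobObject failed\n"; return 1; }

    JOBOBJECT_EXTENDED_LIMIT_INFORMATION jeli = {};
    jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
    jeli.ProcessMemoryLimit = 30 * 1024 * 1024;  // ~30 MB quota

    if (!SetInformationJobObject(hJob, JobObjectExtendedLimitInformation,
                                 &jeli, sizeof(jeli)) ||
        !AssignProcessToJobObject(hJob, GetCurrentProcess())) {
        std::cerr << "failed to apply job object limit\n";
        return 1;
    }

    // Allocations that would push the process past the quota now fail:
    // malloc returns NULL and operator new throws std::bad_alloc.
    try {
        char* big = new char[50 * 1024 * 1024];  // exceeds the quota
        delete[] big;
    } catch (const std::bad_alloc&) {
        std::cerr << "allocation refused by the job object quota\n";
    }
    return 0;
}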
Override all the malloc APIs, and provide handlers for new/delete, so that you can bookkeep memory usage and throw exceptions when needed.
I'm not sure whether this is actually easier or less effort than just monitoring memory through OS-provided APIs.
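A minimal sketch of the bookkeeping approach, assuming a fixed 30 MB budget; each block's size is stashed in a small header so that deletes can be accounted for (the array forms new[]/delete[] and the sized/aligned overloads would need the same treatment):

#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::atomic<std::size_t> g_used{0};
static const std::size_t kLimit = 30 * 1024 * 1024;  // hypothetical 30 MB budget

void* operator new(std::size_t size) {
    // Reserve room for a size header; max_align_t-sized so the payload stays aligned.
    const std::size_t total = size + alignof(std::max_align_t);
    if (g_used.fetch_add(total) + total > kLimit) {
        g_used.fetch_sub(total);
        throw std::bad_alloc();  // budget exceeded
    }
    void* raw = std::malloc(total);
    if (!raw) {
        g_used.fetch_sub(total);
        throw std::bad_alloc();
    }
    *static_cast<std::size_t*>(raw) = total;  // stash the block size
    return static_cast<char*>(raw) + alignof(std::max_align_t);
}

void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    void* raw = static_cast<char*>(ptr) - alignof(std::max_align_t);
    g_used.fetch_sub(*static_cast<std::size_t*>(raw));
    std::free(raw);
}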
In bash, use the ulimit builtin:
bash$ ulimit -v 30000
bash$ ./my_program
The -v value is in 1K blocks.
Update:
If you want to set this from within your app, use setrlimit. Note that the man page for ulimit(3) explicitly says that it is obsolete.
You can limit the size of the virtual memory of your process using the system limits. If your process exceeds this amount, it will be killed with a signal (SIGBUS, I think).
You can use something like:
#include <sys/resource.h>
#include <iostream>

using namespace std;

// Small wrapper around getrlimit/setrlimit for a single resource
// (RLIMIT_DATA, RLIMIT_AS, ...).
class RLimit {
public:
    RLimit(int cmd) : mCmd(cmd) {
    }

    void set(rlim_t value) {
        clog << "Setting " << mCmd << " to " << value << endl;
        struct rlimit rlim;
        rlim.rlim_cur = value;  // soft limit
        rlim.rlim_max = value;  // hard limit
        int ret = setrlimit(mCmd, &rlim);
        if (ret) {
            clog << "Error setting rlimit" << endl;
        }
    }

    rlim_t getCurrent() {
        struct rlimit rlim = {0, 0};
        if (getrlimit(mCmd, &rlim)) {
            clog << "Error in getrlimit" << endl;
            return 0;
        }
        return rlim.rlim_cur;
    }

    rlim_t getMax() {
        struct rlimit rlim = {0, 0};
        if (getrlimit(mCmd, &rlim)) {
            clog << "Error in getrlimit" << endl;
            return 0;
        }
        return rlim.rlim_max;
    }

private:
    int mCmd;
};
And then use it like this:
RLimit dataLimit(RLIMIT_DATA);
dataLimit.set(128 * 1024);  // the value is in bytes, i.e. 128 kB
clog << "soft: " << dataLimit.getCurrent() << " hard: " << dataLimit.getMax() << endl;
This implementation seems a bit verbose but it lets you easily and cleanly set different limits (see ulimit -a).
This question is for:
kernel 3.10.0-1062.4.3.el7.x86_64
non-transparent hugepages allocated via boot parameters, which may or may not be mapped to a file (e.g. mounted hugetlbfs)
x86_64
According to this kernel source, move_pages() will call do_pages_move() to move a page, but I don't see how it indirectly calls migrate_huge_page().
So my questions are:
Can move_pages() move hugepages? If yes, should the page boundary be 4 KB or 2 MB when passing an array of page addresses? It seems like there was a patch for supporting moving hugepages 5 years ago.
If move_pages() cannot move hugepages, how can I move them?
After moving hugepages, can I query the NUMA IDs of hugepages the same way I query regular pages, as in this answer?
According to the code below, it seems like I can move hugepages via move_pages() with page size = 2 MB, but is that the correct way?
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <limits>

int main(int argc, char** argv) {
    const int32_t dst_node = strtoul(argv[1], nullptr, 10);
    constexpr uint64_t size = 4lu * 1024 * 1024;
    constexpr uint64_t pageSize = 2lu * 1024 * 1024;
    constexpr uint32_t nPages = size / pageSize;
    int32_t status[nPages];
    std::fill_n(status, nPages, std::numeric_limits<int32_t>::min());
    void* pages[nPages];
    int32_t dst_nodes[nPages];

    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, -1, 0);
    if (ptr == MAP_FAILED) {
        throw "failed to map hugepages";
    }
    memset(ptr, 0x41, nPages * pageSize);
    for (uint32_t i = 0; i < nPages; i++) {
        pages[i] = &((char*)ptr)[i * pageSize];
        dst_nodes[i] = dst_node;
    }

    std::cout << "Before moving" << std::endl;
    if (0 != move_pages(0, nPages, pages, nullptr, status, 0)) {
        std::cout << "failed to query pages because " << strerror(errno) << std::endl;
    }
    else {
        for (uint32_t i = 0; i < nPages; i++) {
            std::cout << "page # " << i << " is on numa node " << status[i] << std::endl;
        }
    }

    // real move
    if (0 != move_pages(0, nPages, pages, dst_nodes, status, MPOL_MF_MOVE_ALL)) {
        std::cout << "failed to move pages because " << strerror(errno) << std::endl;
        exit(-1);
    }

    constexpr uint64_t smallPageSize = 4lu * 1024;
    constexpr uint32_t nSmallPages = size / smallPageSize;
    void* smallPages[nSmallPages];
    int32_t smallStatus[nSmallPages];
    std::fill_n(smallStatus, nSmallPages, std::numeric_limits<int32_t>::min());
    for (uint32_t i = 0; i < nSmallPages; i++) {
        smallPages[i] = &((char*)ptr)[i * smallPageSize];
    }

    std::cout << "after moving" << std::endl;
    if (0 != move_pages(0, nSmallPages, smallPages, nullptr, smallStatus, 0)) {
        std::cout << "failed to query pages because " << strerror(errno) << std::endl;
    }
    else {
        for (uint32_t i = 0; i < nSmallPages; i++) {
            std::cout << "page # " << i << " is on numa node " << smallStatus[i] << std::endl;
        }
    }
}
And should I query the NUMA IDs based on a 4 KB page size (as in the code above), or 2 MB?
For the original 3.10 Linux kernel (not the Red Hat patched one; I have no LXR for RHEL kernels), the move_pages syscall will force splitting of huge pages (2 MB; both THP and hugetlbfs styles) into small pages (4 KB). move_pages works in rather short chunks (around 0.5 MB, if I calculated correctly), and the function graph is:
move_pages .. -> migrate_pages -> unmap_and_move ->

static int unmap_and_move(new_page_t get_new_page, unsigned long private,
                          struct page *page, int force, enum migrate_mode mode)
{
    struct page *newpage = get_new_page(page, private, &result);
    ....
    if (unlikely(PageTransHuge(page)))
        if (unlikely(split_huge_page(page)))
            goto out;
PageTransHuge returns true for both kinds of hugepages (THP and hugetlbfs):
https://elixir.bootlin.com/linux/v3.10/source/include/linux/page-flags.h#L411
PageTransHuge() returns true for both transparent huge and hugetlbfs pages, but not normal pages.
And split_huge_page will call split_huge_page_to_list, which:
Split a hugepage into normal pages. This doesn't change the position of head page.
A split will also emit a vm_event counter increment of kind THP_SPLIT. The counters are exported in /proc/vmstat ("this file displays various virtual memory statistics"). You can check this counter with the (UUOC) command cat /proc/vmstat | grep thp_split before and after your test.
There was some code for hugepage migration in 3.10, the unmap_and_move_huge_page function, but it is not called from move_pages. Its only user in 3.10 was migrate_huge_page, which is called only from the memory-failure handler soft_offline_huge_page (__soft_offline_page) (added in 2010):
Soft offline a page, by migration or invalidation,
without killing anything. This is for the case when
a page is not corrupted yet (so it's still valid to access),
but has had a number of corrected errors and is better taken
out.
Answers:
Can move_pages() move hugepages? If yes, should the page boundary be 4 KB or 2 MB when passing an array of page addresses? It seems like there was a patch for supporting moving hugepages 5 years ago.
The standard 3.10 kernel has a move_pages that accepts an array "pages" of 4 KB page pointers; it will break (split) every huge page into 512 small pages and then migrate the small pages. The chances of them being merged back by THP are very low, because move_pages makes separate requests for physical memory pages and they will almost always be non-contiguous.
Don't pass pointers with a 2 MB stride: every huge page mentioned will still be split, but only the first 4 KB small page of each will be migrated.
The 2013 patch was not merged into the original 3.10 kernel:
v2 https://lwn.net/Articles/544044/ "extend hugepage migration" (3.9);
v3 https://lwn.net/Articles/559575/ (3.11)
v4 https://lore.kernel.org/patchwork/cover/395020/ (click on Related to get access to individual patches, for example move_pages patch)
The patch series seems to have been accepted in September 2013: https://github.com/torvalds/linux/search?q=+extend+hugepage+migration&type=Commits
If move_pages() cannot move hugepages, how can I move them?
move_pages will move the data out of hugepages as small pages. You can either allocate a huge page manually on the correct NUMA node and copy your data (copying twice if you want to keep the virtual address), or update the kernel to a version with the patch and use the methods and tests of the patch author, Naoya Horiguchi (JP). There is a copy of his tests here: https://github.com/srikanth007m/test_hugepage_migration_extension
(https://github.com/Naoya-Horiguchi/test_core is required)
https://github.com/srikanth007m/test_hugepage_migration_extension/blob/master/test_move_pages.c
Now I'm not sure how to start the test or how to check that it works correctly. For ./test_move_pages -v -m private -h 2048 runs with a recent kernel it does not increment the THP_SPLIT counter.
His test looks very similar to our tests: mmap, memset to fault the pages in, filling a pages array with pointers to small pages, numa_move_pages.
After moving hugepages, can I query the NUMA IDs of hugepages the same way I query regular pages, as in this answer?
You can query the status of any memory by passing a correct "pages" array to the move_pages syscall in query mode (with null nodes). The array should list every small page of the memory region you want to check.
If you know a reliable method to check whether the memory is mapped to a huge page or not, you can query any small page of the huge page. I think there could be a probabilistic method if you can export the physical address out of the kernel to user space (using some LKM module, for example): for a huge page, virtual and physical addresses always share their 21 low bits, while for a small page only 12 bits are guaranteed to match and the remaining 9 coincide only by chance (about 1 test in 512). Or just write an LKM to export the PMD directory.
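For the query itself, a minimal sketch (assuming libnuma's numaif.h is available; link with -lnuma) using get_mempolicy in per-address mode, which is the same mechanism the linked answer uses for regular pages:

#include <numaif.h>

// Returns the NUMA node backing the page that contains addr, or -1 on error.
// Works on any small page, including the 4 KB pieces of a split huge page.
int node_of(void* addr) {
    int node = -1;
    if (get_mempolicy(&node, nullptr, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) != 0)
        return -1;
    return node;
}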
I would like to find out the number of bytes used by a process from within a C++ program by inspecting the operating system's memory information. The reason I would like to do this is to find the possible overhead of memory allocation (due to memory control blocks, nodes in free lists, etc.). Currently I am on a Mac and am using this code:
#include <mach/mach.h>
#include <iostream>

int getResidentMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.resident_size;
    }
    return -1;
}

int getVirtualMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.virtual_size;
    }
    return -1;
}

int main(void) {
    int virtualMemoryBefore = getVirtualMemoryUsage();
    int residentMemoryBefore = getResidentMemoryUsage();

    int* a = new int(5);

    int virtualMemoryAfter = getVirtualMemoryUsage();
    int residentMemoryAfter = getResidentMemoryUsage();

    std::cout << virtualMemoryBefore << " " << virtualMemoryAfter << std::endl;
    std::cout << residentMemoryBefore << " " << residentMemoryAfter << std::endl;
    return 0;
}
When running this code I would have expected to see that the memory usage had increased after allocating an int. However, when I run the above code I get the following output:
75190272 75190272
819200 819200
I have several questions because this output does not make sense to me.
Why hasn't either the virtual or resident memory changed after an integer has been allocated?
How come the operating system allocates such large amounts of memory to a running process?
When I run the code and check Activity Monitor, I find that 304 kB of memory is used, but that number differs from the virtual/resident memory usage obtained programmatically.
My end goal is to be able to find the memory overhead when allocating data, so is there a way to do this (i.e. determine the bytes used by the OS and compare them with the bytes allocated to find the difference)?
Thank you for reading
The C++ runtime typically allocates a block of memory when a program starts up, and then parcels this out to your code when you use things like new, adding it back to the block when you call delete. Hence, the operating system doesn't know anything about individual new or delete calls. This is also true for malloc and free in C (or C++).
First, you are measuring the number of pages, not the memory actually allocated. Second, the runtime pre-allocates a few pages at startup. If you want to observe anything, allocate more than a single int: try allocating several thousand and you will see the numbers change, as in the sketch below.
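A minimal sketch of that experiment, assuming the getResidentMemoryUsage() helper from the question is in scope:

#include <iostream>
#include <vector>

int main() {
    int before = getResidentMemoryUsage();

    // 100,000 separate heap allocations instead of one int.
    std::vector<int*> blocks;
    blocks.reserve(100000);
    for (int i = 0; i < 100000; ++i)
        blocks.push_back(new int(i));

    int after = getResidentMemoryUsage();
    std::cout << "resident before: " << before
              << ", after: " << after
              << ", delta: " << (after - before) << " bytes" << std::endl;

    for (int* p : blocks)
        delete p;
    return 0;
}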
I'm a student, and my Operating Systems class project has hit a little snag, which is admittedly a bit superfluous to the assignment specification itself:
While I can push 1 million deques into my deque of deques, I cannot push ~10 million or more.
Now, in the actual program there is a lot going on, and the only thing already asked on Stack Overflow with even the slightest relevance had exactly that: only slight relevance. https://stackoverflow.com/a/11308962/3407808
Since that answer focused on "other functions corrupting the heap", I isolated the code into a new project and ran it separately, and found everything to fail in exactly the same way.
Here's the code itself, stripped down and renamed for the sake of space.
#include <iostream>
#include <string>
#include <sstream>
#include <deque>

using namespace std;

class cat
{
    cat();
};

bool number_range(int lower, int upper, double value)
{
    if (value >= lower && value <= upper)
    {
        return true;
    }
    else
    {
        cin.clear();
        cerr << "Value not between " << lower << " and " << upper << ".\n";
        return false;
    }
}

double get_double(const char *message, int lower, int upper)
{
    double out;
    string in;
    while (true) {
        cout << message << " ";
        getline(cin, in);
        stringstream ss(in); // convert input to stream for conversion to double
        if (ss >> out && !(ss >> in))
        {
            if (number_range(lower, upper, out))
            {
                return out;
            }
        }
        //(ss >> out) checks for valid conversion to double
        //!(ss >> in) checks for unconverted input and rejects it
        cin.clear();
        cerr << "Value not between " << lower << " and " << upper << ".\n";
    }
}

int main()
{
    int dq_amount = 0;
    deque<deque<cat> > dq_array;
    deque<cat> dq;
    do {
        dq_amount = get_double("INPUT # OF DEQUES: ", 0, 99999999);
        for (int i = 0; i < dq_amount; i++)
        {
            dq_array.push_back(dq);
        }
    } while (!number_range(0, 99999999, dq_amount));
}
In case that's a little obfuscated: the design (just in case it's related to the error) is that my program asks you to input an integer value. It takes your input, verifies that it can be read as an integer, and then further checks that it is within certain numerical bounds. Once it's found to be within bounds, I push deques of myClass into a deque of deques of myClass, as many times as the user's input specifies.
This code had been working for the past few weeks that I've been making this project, but my upper bound had always been 9999, and I decided to standardize it with most of the other inputs in my program, to an appreciably large 99,999,999. Running this code with 9999 as the user input works fine, even with 99999999 as the upper bound. The issue is a runtime error that happens if the user input is 9999999+.
Is there any particular, clear reason why this doesn't work?
Oh, right, the error message itself from Code::Blocks 13.12:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Process returned 3 (0x3) execution time : 12.559 s
Press any key to continue.
I had screenshots, but I need to be 10+ reputation in order to put images into my questions.
This looks like address space exhaustion.
If you are compiling for a 32-bit target, you will generally be limited to 2 GiB of user-mode accessible address space per process, or maybe 3 GiB on some platforms. (The remainder is reserved for kernel-mode mappings shared between processes)
If you are running on a 64-bit platform and build a 64-bit binary, you should be able to do substantially more new/malloc() calls, but be advised that you may start hitting swap.
Alternatively, you might be hitting a resource quota even if you are building a 64-bit binary. On Linux you can check ulimit -d to see if you have a per-process memory limit.
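For a back-of-the-envelope sense of why ~10 million deques can exhaust a 32-bit address space, here is a sketch; the 512-byte initial buffer node is a libstdc++ detail and the exact numbers vary by implementation:

#include <cstddef>
#include <deque>
#include <iostream>

int main() {
    // An empty std::deque is not free: besides the object itself, libstdc++
    // allocates one initial buffer node (typically 512 bytes) on construction.
    std::cout << "sizeof(std::deque<char>) = " << sizeof(std::deque<char>) << '\n';

    const std::size_t perDeque = sizeof(std::deque<char>) + 512;  // rough guess
    const unsigned long long total = 10000000ULL * perDeque;
    std::cout << "~10 million empty deques: " << total / (1024 * 1024) << " MiB\n";
    return 0;
}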
As far as I can tell, calling malloc() basically means the program is asking the OS for a hunk of memory. I'm writing a program to interface with a camera, in which I need to allocate chunks of memory large enough to store hundreds of images at a time (it's a fast camera).
When I allocate space for about 1.9 GB worth of images, everything works just fine. The allocation calculation is pretty simple:
int allocateBurst( int numImages )
{
    int streamSize = ZIMAGESIZE * numImages;
    data.images = new unsigned short [streamSize];
    return 0;
}
But as soon as I go over the 2 GB limit, I get runtime errors like this:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
It seems like 2 GB might be the maximum size that I can allocate at once. I have 32 GB of RAM and would like to simply be able to allocate larger pieces of memory in one allocation. Is this possible?
I'm running Ubuntu 12.10.
There may be an underlying issue: the OS can't grant your large memory allocation because it is using memory for other applications. Check with your OS to see what the limits are.
Also know that some OS's will "page" memory to the hard disk. When your program asks for memory outside the page, the OS will swap pages with the hard disk. Knowing this, I recommend a classic technique of "Double Buffering" or "Multiple Buffering".
You will need at least two threads: a reader and a writer. One thread is responsible for reading data from the camera and placing it into a buffer; when it fills up a buffer, it starts on another one. Meanwhile the writing thread starts on the first buffer and writes it to disk (block file writes); when it finishes a buffer, it starts on the next one. The buffers should form a circular sequence so they are reused, as sketched below.
The magic is to have enough buffers so that the reader never catches up to the writer.
Since you are then using a couple of small buffers, you should not get any errors from the OS.
There are methods to optimize this, such as obtaining static buffers from the OS.
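A minimal sketch of such a buffer ring with C++11 threads; grabFrame() and writeToDisk() are placeholders standing in for the camera SDK call and the actual file I/O:

#include <array>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kBuffers = 4;                 // ring size; enough so the reader never laps the writer
constexpr int kFrames  = 32;                // frames to process in this demo
constexpr std::size_t kBufSize = 1 << 20;   // 1 MiB per buffer (placeholder)

std::array<std::vector<unsigned char>, kBuffers> ring;
std::mutex m;
std::condition_variable cv;
int filled = 0;                             // buffers ready for the writer
int head = 0, tail = 0;

void grabFrame(std::vector<unsigned char>& buf) { /* camera SDK call here */ }
void writeToDisk(const std::vector<unsigned char>& buf) { /* block file write here */ }

void reader() {                             // fills buffers from the camera
    for (int i = 0; i < kFrames; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return filled < kBuffers; });  // wait for a free buffer
        lk.unlock();
        grabFrame(ring[head]);              // only the reader touches head
        head = (head + 1) % kBuffers;
        lk.lock();
        ++filled;
        cv.notify_all();
    }
}

void writer() {                             // drains buffers to disk
    for (int i = 0; i < kFrames; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return filled > 0; });         // wait for a full buffer
        lk.unlock();
        writeToDisk(ring[tail]);            // only the writer touches tail
        tail = (tail + 1) % kBuffers;
        lk.lock();
        --filled;
        cv.notify_all();
    }
}

int main() {
    for (auto& b : ring) b.resize(kBufSize);
    std::thread r(reader), w(writer);
    r.join();
    w.join();
    return 0;
}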
The problem is that you're using a signed 32-bit variable to describe an unsigned 64-bit quantity.
Use "size_t" instead of "int" for holding the storage count. This has nothing to do with what you intend to store, just how large a count of them you need.
#include <iostream>
#include <new>

int main(int /*argc*/, const char** /*argv*/)
{
    int units = 2;

    // 32-bit signed, i.e. 31-bit numbers: this overflows.
    int intSize = units * 1024 * 1024 * 1024;

    // 64-bit values (ULL suffix)
    size_t sizetSize = units * 1024ULL * 1024ULL * 1024ULL;

    std::cout << "intSize = " << intSize << ", sizetSize = " << sizetSize << std::endl;

    try {
        unsigned short* intAlloc = new unsigned short[intSize];
        std::cout << "intAlloc = " << intAlloc << std::endl;
        delete [] intAlloc;
    } catch (const std::bad_alloc&) {
        std::cout << "intAlloc failed (std::bad_alloc)" << std::endl;
    }

    try {
        unsigned short* sizetAlloc = new unsigned short[sizetSize];
        std::cout << "sizetAlloc = " << sizetAlloc << std::endl;
        delete [] sizetAlloc;
    } catch (const std::bad_alloc&) {
        std::cout << "sizetAlloc failed (std::bad_alloc)" << std::endl;
    }

    return 0;
}
Output (g++ -m64 -o test test.cpp under Mint 15 64-bit, with g++ 4.7.3, on a virtual machine with 4 GB of memory):
intSize = -2147483648, sizetSize = 2147483648
intAlloc failed
sizetAlloc = 0x7f55affff010
int allocateBurst( int numImages )
{
    // change the count from int to long, and widen before multiplying
    // so the product itself doesn't overflow in int arithmetic
    long streamSize = (long)ZIMAGESIZE * numImages;
    data.images = new unsigned short [streamSize];
    return 0;
}
Try using long, or cast the count used inside allocateBurst (and the function's return type) to uint64_t.
With int you get a 32-bit count, while long (on 64-bit Linux) or uint64_t gives you a 64-bit count, which allows a larger allocation size.
Hope that helps
Maybe someone on here can shed some light on why this is the case, or suggest other calls to use for walking the heap for debugging purposes in an application. If it turns out that there is no way to get Page Heap and/or Application Verifier to work with an application that uses HeapWalk(), and there is no replacement, at least I can stop looking for one :)
So confirmation from people with more experience on the matter that these will never play together would also be appreciated.
Nothing in the documentation on HeapWalk() that I could find mentions problems with Page Heap / Application Verifier or suggests any replacement for this use case.
Consider the small sample code below:
If the code is executed as-is, it works as expected. If I turn on either Page Heap or Application Verifier (or both) for it, the call to HeapWalk() fails. The error code returned by GetLastError() in this case is always 0x00000001, which according to the Microsoft documentation is ERROR_INVALID_FUNCTION.
Underlying motivation for the question:
Track down heap corruption and potential memory leaks in a legacy application with custom heap management. Running with Page Heap and / or Application Verifier was intended to help with that.
Thanks for reading; any comments that could shed some light on the issue are welcome.
#include <Windows.h>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>

size_t DummyHeapWalk(void* heapHandle)
{
    size_t commitSize = 0;
    PROCESS_HEAP_ENTRY processHeapEntry;
    processHeapEntry.lpData = 0;  // NULL lpData starts the walk at the first entry

    HeapLock(heapHandle);
    if (HeapWalk(heapHandle, &processHeapEntry))
    {
        commitSize = processHeapEntry.Region.dwCommittedSize;
    }
    else
    {
        DWORD lastError = GetLastError();
        std::ostringstream errorMsg;
        errorMsg << "DummyHeapWalk failed on heapHandle [0x" << std::setfill('0')
                 << std::setw(16) << std::hex << (size_t)(heapHandle)
                 << "] last error: " << lastError << std::endl;
        OutputDebugString(errorMsg.str().c_str());
        DebugBreak();
    }
    HeapUnlock(heapHandle);
    return commitSize;
}
int main(void)
{
    HANDLE myProcess = GetProcessHeap();
    std::cout << "Process Heap Commit size: " << DummyHeapWalk(myProcess) << std::endl;
    return 0;
}