I am trying to make an unsigned char array in C++ that is ~4 gigabytes in size.
The code I am using to malloc the space for the array is below:
unsigned char *myArray = (unsigned char*)malloc(sizeof(char) * 3774873600);
if (myArray == NULL)
{
    cout << "Error! memory could not be allocated. \n";
} else {
    cout << "You allocated memory for myArray \n";
}
When I run the program I get the success message saying that the memory was allocated. Then when I run:
myArray[0] = 20;
cout << myArray[0];
I get 20 as the answer.
However if I run:
myArray[3774873599] = 20;
cout << myArray[3774873599];
The program crashes.
I was thinking it is probably because the malloc call is asking for too much contiguous memory in one call (~4 GB).
Perhaps it would be better to split the malloc call into two parts, and then join them together as one contiguous array. Would that be possible?
Also, in case you are wondering, my computer has 64 GB of memory on a 64-bit OS, and the program is compiled as 64-bit, so I don't think it's at its limits or anything.
Any help would be much appreciated!
Related
I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below, the program crashed with a bad_alloc at i = 90811045, i.e. when trying to add the 90811045th element. My question is: why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
note: I know I can fix this by using vector.reserve() or other methods, I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>
int main() {
    std::vector<long long> myLongs;
    std::cout << "Max size expected : " << myLongs.max_size() << std::endl;
    for (int i = 0; i < 160000000; i++) {
        myLongs.push_back(i);
        if (i % 10000 == 0) {
            std::cout << "Still going! : " << i << " \r";
        }
    }
    return 0;
}
extra info:
I am currently using 64 bit windows with 16 GB of ram.
Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 elements were added successfully. The vector implementation (typically) has a deterministic strategy for allocating a larger internal buffer. Typically, it multiplies the previous capacity by a constant factor (greater than 1). Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but (90811044 * sizeof(long long)) * some_factor + other_usage is consistently too much.
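For illustration only (not the asker's exact numbers; the growth factor is implementation-defined, 2 in libstdc++ and libc++ and roughly 1.5 in MSVC), a small sketch that prints capacity() whenever it changes makes that reallocation pattern visible:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<long long> v;
    std::size_t lastCapacity = 0;
    for (long long i = 0; i < 100000000; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCapacity) {
            lastCapacity = v.capacity();
            // Each jump is roughly the previous capacity times a constant factor.
            std::cout << "capacity is now " << lastCapacity << " elements (~"
                      << (lastCapacity * sizeof(long long)) / (1024 * 1024) << " MiB)\n";
        }
    }
    return 0;
}

On a machine with less free memory, the last capacity printed before the bad_alloc tells you which growth step no longer fit.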
I would like to find out the number of bytes used by a process from within a C++ program by inspecting the operating system's memory information. The reason I would like to do this is to find the possible overhead of memory allocation (due to memory control blocks/nodes in free lists, etc.). Currently I am on a Mac and am using this code:
#include <mach/mach.h>
#include <iostream>
int getResidentMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.resident_size;
    }
    return -1;
}

int getVirtualMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.virtual_size;
    }
    return -1;
}

int main(void) {
    int virtualMemoryBefore = getVirtualMemoryUsage();
    int residentMemoryBefore = getResidentMemoryUsage();

    int* a = new int(5);

    int virtualMemoryAfter = getVirtualMemoryUsage();
    int residentMemoryAfter = getResidentMemoryUsage();

    std::cout << virtualMemoryBefore << " " << virtualMemoryAfter << std::endl;
    std::cout << residentMemoryBefore << " " << residentMemoryAfter << std::endl;
    return 0;
}
When running this code I would have expected to see that the memory usage has increased after allocating an int. However when I run the above code I get the following output:
75190272 75190272
819200 819200
I have several questions because this output does not make sense to me.
Why hasn't the virtual or resident memory changed after an integer has been allocated?
How come the operating system allocates such large amounts of memory to a running process?
When I run the code and check Activity Monitor, I find that 304 KB of memory is used, but that number differs from the virtual/resident memory usage obtained programmatically.
My end goal is to be able to find the memory overhead when assigning data, so is there a way to do this? (Determining the bytes used by the OS and comparing them with the bytes allocated to find the difference is what I am currently thinking of.)
Thank you for reading
The C++ runtime typically allocates a block of memory when a program starts up, and then parcels this out to your code when you use things like new, and adds it back to the block when you call delete. Hence, the operating system doesn't know anything about individual new or delete calls. This is also true for malloc and free in C (or C++).
First, you measure the number of pages, not really the memory allocated. Second, the runtime pre-allocates a few pages at startup. If you want to observe something, allocate more than a single int. Try allocating several thousand and you will observe some change.
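As a rough check along those lines (a sketch that assumes the two helper functions from the question are compiled into the same file), allocating and writing to a few million ints should move the resident figure noticeably:

#include <iostream>

int main() {
    int residentBefore = getResidentMemoryUsage();

    // ~40 MB; writing to every element makes sure the pages are actually faulted in.
    const long count = 10000000;
    int* big = new int[count];
    for (long i = 0; i < count; ++i)
        big[i] = static_cast<int>(i);

    int residentAfter = getResidentMemoryUsage();
    std::cout << "resident before: " << residentBefore
              << ", after: " << residentAfter << std::endl;
    delete[] big;
    return 0;
}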
I'm trying to figure out how much memory I can allocate before the allocation will fail.
This simple C++ code allocates a buffer (of size 1024 bytes), assigns to the last five characters of the buffer, reports, and then deletes the buffer. It then doubles the size of the buffer and repeats until it fails.
Unless I'm missing something, the code is able to allocate up to 65 terabytes of memory before it fails on my MacBook Pro. Is this even possible? How can it allocate so much more memory than I have on the machine? I must be missing something simple.
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    long long size = 1024;
    long cnt = 0;
    while (true)
    {
        char *buffer = new char[size];
        // Assume the alloc succeeded. We are looking for the failure after all.

        // Try to write to the allocated memory, may fail
        buffer[size-5] = 'T';
        buffer[size-4] = 'e';
        buffer[size-3] = 's';
        buffer[size-2] = 't';
        buffer[size-1] = '\0';

        // report
        if (cnt < 10)
            cout << "size[" << cnt << "]: " << (size/1024.) << "Kb ";
        else if (cnt < 20)
            cout << "size[" << cnt << "]: " << (size/1024./1024.) << "Mb ";
        else
            cout << "size[" << cnt << "]: " << (size/1024./1024./1024.) << "Gi ";
        cout << "addr: " << (void*)buffer << " ";   // stream the pointer itself (prints in hex)
        cout << "str: " << &buffer[size-5] << "\n";

        // cleanup
        delete [] buffer;

        // double size and continue
        size *= 2;
        cnt++;
    }
    return 0;
}
When you ask for memory, an operating system reserves the right not to actually give you that memory until you actually use it.
That's what's happening here: you're only ever using 5 bytes. My ZX81 from the 1980s could handle that.
MacOS X, like almost every modern operating system, uses "delayed allocation" for memory. When you call new, the OS doesn't actually allocate any memory. It simply makes a note that your program wants a certain amount of memory, and that the memory area you want starts at a certain address. Memory is only actually allocated when your program tries to use it.
Further, memory is allocated in units called "pages". I believe MacOS X uses 4 KB pages, so when your program writes to the end of the buffer, the OS gives you 4096 bytes there, while retaining the rest of the buffer as simply a "your program wants this memory" note.
As for why you're hitting the limit at 64 terabytes, it's because current x86-64 processors use 48-bit addressing. This gives 256 TB of address space, which is split evenly between the operating system and your program. Doubling the 64 TB allocation would exactly fit in your program's 128 TB half of the address space, except that the program is already taking up a little bit of it.
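Just to sanity-check those numbers with plain arithmetic (nothing platform-specific here):

#include <iostream>

int main() {
    unsigned long long space = 1ULL << 48;                                            // 48-bit virtual addresses
    std::cout << "total address space:  " << space / (1ULL << 40) << " TiB\n";        // 256
    std::cout << "half for the program: " << (space / 2) / (1ULL << 40) << " TiB\n";  // 128
    // A 64 TiB request fits in the program's half; doubling it to 128 TiB would need
    // the entire half, which is already partly occupied, so that is where the loop stops.
    return 0;
}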
Virtual memory is the key to allocating more address space than you have physical RAM+swap space.
malloc uses the mmap(MAP_ANONYMOUS) system call to get pages from the OS. (Assuming OS X works like Linux, since they're both POSIX OSes). These pages are all copy-on-write mapped to a single physical zero page. i.e. they all read as zero with only a TLB miss (no page fault and no allocation of physical RAM). An x86 page is 4 KiB. (I'm not mentioning hugepages because they're not relevant here).
Writing to any of those pages triggers a soft page fault for the kernel to handle the copy-on-write. The kernel allocates a zeroed page of physical memory and re-wires that virtual page to be backed by the physical page. On return from the page fault, the store is re-executed and succeeds this time.
So after allocating 64 TiB and storing 5 bytes to the end of it, you've used one extra page of physical memory. (And added an entry to malloc's bookkeeping data, but that was probably already allocated and in a dirty page. In a similar question about multiple tiny allocations, malloc's bookkeeping data was what eventually used up all the space).
If you actually dirtied more pages than the system had RAM + swap, the kernel would have a problem because it's too late for malloc to return NULL. This is called "overcommit", and some OSes enable it by default while others don't. In Linux, it's configurable.
As Mark explains, you run out of steam at 64TiB because current x86-64 implementations only support 48-bit virtual addresses. The upper 16 bits need to be copies of bit 47. (i.e. an address is only canonical if the 64-bit value is the sign-extension of the low 48 bits).
This requirement stops programs from doing anything "clever" with the high bits, and then breaking on future hardware that does support even larger virtual address spaces.
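A hedged sketch of that behaviour (POSIX-only because of sysconf): reserve a large block, pause, then dirty one byte per page and watch the resident size climb in top or Activity Monitor:

#include <cstdio>
#include <cstdlib>
#include <unistd.h>    // sysconf

int main() {
    const std::size_t total = 1ULL << 30;                                      // 1 GiB of virtual address space
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));  // typically 4096
    char* buf = static_cast<char*>(std::malloc(total));
    if (!buf) return 1;

    std::puts("allocated 1 GiB; resident size is still tiny - press Enter");
    std::getchar();

    for (std::size_t i = 0; i < total; i += page)
        buf[i] = 1;                                  // one store per page -> one soft page fault each

    std::puts("touched every page; resident size is now about 1 GiB - press Enter");
    std::getchar();
    std::free(buf);
    return 0;
}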
So I'm taking an assembly course and have been tasked with making a benchmark program for my computer - needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function to read from 5 * 10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements? So what exactly should I be doing? (For the record, I'm trying to code this in C++)
//Benchmark Program in C++
#include <iostream>
#include <time.h>
using namespace std;
int main() {
    clock_t t1, t2;
    int readTemp;
    int* arr = new int[5*100000000];

    t1 = clock();
    cout << "Memory Test" << endl;
    for (long long int j = 0; j < 500000000; j += 1)   // < rather than <=: the last valid index is 499999999
    {
        readTemp = arr[j];
    }
    t2 = clock();

    float diff((float)t2 - (float)t1);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Time Taken: " << seconds << " seconds" << endl;
}
Your system tries to allocate 2 billion bytes (1907 MiB), while the maximum address space available to a 32-bit process on Windows is 2 gigabytes (2048 MiB). These numbers are very close. It's likely your system has allocated the remaining 141 MiB for other stuff. Even though your code is very small, the OS is pretty liberal in how it uses the 2048 MiB address space, spending large chunks on e.g. the following:
C++ runtime (standard library and other libraries)
Stack: the OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Paddings between virtual memory pages
Padding used to make specific sections of data appear at specific addresses (e.g. 0x00400000 as the lowest code address on Windows)
Padding used to randomize the values of pointers (address space layout randomization)
There are Windows tools (e.g. Sysinternals VMMap) that show a memory map of a running process. You can use one by adding a delay (e.g. getchar()) before the allocation and looking at the largest contiguous free block of memory at that point, and at which allocations prevent it from being larger.
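A minimal sketch of that inspection trick (the tool name is just an example; any process memory-map viewer works):

#include <cstdio>
#include <new>

int main() {
    // Pause so the process can be inspected (e.g. with Sysinternals VMMap)
    // before the large allocation is attempted.
    std::puts("attach the memory-map viewer now, then press Enter");
    std::getchar();

    int* arr = new (std::nothrow) int[500000000];   // ~1.9 GB, the size from the question
    std::printf("allocation %s\n", arr ? "succeeded" : "failed");

    std::getchar();   // inspect again to see where (or whether) the block was placed
    delete[] arr;
    return 0;
}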
The size is possible:
5 * 10^8 * 4 = ~1.9 GB.
First you will need to allocate your array (dynamically only! There isn't nearly enough stack memory for that).
For your task the 4 bytes are the size of an integer, so you can do
int* arr = new int[5*100000000];
Alternatively, if you want to be more precise, you can allocate it as bytes:
char* arr = new char[5*4*100000000];
Next, you need to make the memory dirty (meaning write something into it):
memset(arr, 0, 5*100000000*sizeof(int));
Now you can benchmark cache misses (I'm guessing that's what's intended with such a huge array):
int randomIndex = GetRandomNumberBetween(0, 5*100000000-1); // make your own random implementation
int bytes = arr[randomIndex]; // access 4 bytes through an integer
If you want 5 * 10^8 random accesses, you can do a Knuth shuffle inside your GetRandomNumberBetween instead of using pure random numbers.
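Putting those pieces together, a minimal sketch of the whole benchmark might look like this (the <random> generator here is a stand-in for the GetRandomNumberBetween placeholder above, and the volatile sink keeps the compiler from optimising the reads away):

#include <cstring>     // memset
#include <ctime>       // clock
#include <iostream>
#include <random>

int main() {
    const std::size_t count = 500000000;            // 5 * 10^8 ints, ~1.9 GB
    int* arr = new int[count];
    std::memset(arr, 0, count * sizeof(int));       // make the memory dirty first

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<std::size_t> pick(0, count - 1);

    volatile int sink = 0;
    std::clock_t t1 = std::clock();
    for (std::size_t i = 0; i < count; ++i)
        sink = arr[pick(rng)];                      // one random 4-byte read per iteration
    std::clock_t t2 = std::clock();

    std::cout << "Time taken: " << double(t2 - t1) / CLOCKS_PER_SEC << " seconds\n";
    (void)sink;
    delete[] arr;
    return 0;
}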
I am trying to find the maximum memory that I could allocate on stack, global and heap memory in C++. I am trying this program on a Linux system with 32 GB of memory, and on my Mac with 2 GB of RAM.
/* test to determine the maximum memory that could be allocated for static, heap and stack memory */
#include <iostream>
using namespace std;
//static/global
long double a[200000000];
int main()
{
    //stack
    long double b[999999999];

    //heap
    long double *c = new long double[3999999999];

    cout << "Sizeof(long double) = " << sizeof(long double) << " bytes\n";
    cout << "Allocated Global (Static) size of a = " << (double)((sizeof(a))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Stack size of b = " << (double)((sizeof(b))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Heap Size of c = " << (double)((3999999999 * sizeof(long double))/(double)(1024*1024*1024)) << " Gbytes \n";

    delete[] c;
    return 0;
}
Results (on both):
Sizeof(long double) = 16 bytes
Allocated Global (Static) size of a = 2.98023 Gbytes
Allocated Stack size of b = 14.9012 Gbytes
Allocated Heap Size of c = 59.6046 Gbytes
I am using GCC 4.2.1. My question is:
Why is my program running? I expected that, since the stack would be exhausted (16 MB on Linux, and 8 MB on Mac), the program would throw an error. I saw some of the many questions asked on this topic, but I couldn't solve my problem from the answers given there.
On some systems you can allocate any amount of memory that fits in the address space. The problems begin when you start actually using that memory.
What happens is that the OS reserves a virtual address range for the process, without mapping it to anything physical, or even checking that there's enough physical memory (including swap) to back that address range up. The mapping only happens in a page-by-page fashion, when the process tries to access newly allocated pages. This is called memory overcommitment.
Try accessing every sysconf(_SC_PAGESIZE)th byte of your huge arrays and see what happens.
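For example, a sketch along those lines for the heap array (on an overcommitting system this is where you would see heavy swapping or the OOM killer step in, rather than a clean failure):

#include <cstddef>
#include <iostream>
#include <new>
#include <unistd.h>    // sysconf

int main() {
    const std::size_t count = 3999999999ULL;                 // same size as the array 'c' above
    long double* c = new (std::nothrow) long double[count];
    if (!c) { std::cout << "even the allocation itself failed\n"; return 1; }

    const std::size_t step = static_cast<std::size_t>(sysconf(_SC_PAGESIZE)) / sizeof(long double);
    for (std::size_t i = 0; i < count; i += step) {
        c[i] = 0.0L;                                          // dirty roughly one long double per page
        if (i % (step * 100000) == 0)
            std::cout << "touched element " << i << "\r" << std::flush;
    }
    std::cout << "\ndone\n";
    delete[] c;
    return 0;
}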
Linux overcommits, meaning that it can allow a process more memory than is available on the system, but it is not until that memory is actually used by the process that actual memory (physical main memory or swap space on disk) is allocated for the process. My guess would be that Mac OS X works in a similar way.