I made a C++ variable and printed its address and it came out to be very large: 0x7ffdf584da2c.
My code is as follows:
#include <iostream>
using namespace std;

int main()
{
    int var = 10;
    cout << "value: " << var << " address: " << &var << endl;
    return 0;
}
value: 10 address: 0x7ffec6f111c4
A hexadecimal memory address like 0x7ffdf584da2c looks impossible, since its decimal value (140728722577964) is far larger than the amount of memory in my laptop.
I have a dual-boot laptop with Windows 10 and Ubuntu, with memory of around 500 GB. The code was run on Ubuntu.
It's fine.
Computers in 2020 are very complicated. Your process gets a virtual address space whose maximum "slot" will almost certainly exceed the size of the actual RAM on your system (and page file).
You are seeing the stack, which grows from the top of memory (not physical memory, but the virtual address space the operating system manages) down toward the bottom. So it normally begins at a high address, limited by the OS and the architecture's address width (32-bit, 64-bit, ...).
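For example, here is a minimal sketch (the regions and their relative positions are typical for Linux x86-64; the exact addresses will differ on every run because of address-space layout randomization) that prints where a global, a heap allocation, and a stack variable end up in the virtual address space:

#include <iostream>

int global_var = 1;                  // data segment, typically a low address

int main()
{
    int stack_var = 2;               // stack, typically a high address like 0x7ffd...
    int *heap_var = new int(3);      // heap, typically somewhere in between

    std::cout << "global: " << &global_var << "\n"
              << "heap:   " << heap_var << "\n"
              << "stack:  " << &stack_var << "\n";

    delete heap_var;
    return 0;
}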
Related
I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below the program crashed with a bad alloc at i=90811045, aka when trying to add the 90811045th element. My question is: Why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
note: I know I can fix this by using vector.reserve() or other methods, I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>

int main() {
    std::vector<long long> myLongs;
    std::cout << "Max size expected : " << myLongs.max_size() << std::endl;
    for (int i = 0; i < 160000000; i++) {
        myLongs.push_back(i);
        if (i % 10000 == 0) {
            std::cout << "Still going! : " << i << " \r";
        }
    }
    return 0;
}
extra info:
I am currently using 64 bit windows with 16 GB of ram.
Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 elements were added successfully. The vector implementation (typically) has a deterministic strategy for allocating a larger internal buffer: typically it multiplies the previous capacity by a constant factor greater than 1. Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but (90811044 * sizeof(long long)) * some_factor + other_usage is consistently too much. (Note that during a reallocation the old buffer must remain alive while its elements are copied into the new one, so both buffers count toward memory use at that moment.)
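To observe that growth strategy directly, here is a small sketch (illustrative only; the exact capacities are implementation-specific, e.g. libstdc++ roughly doubles while MSVC grows by about 1.5x) that logs every capacity change as elements are pushed:

#include <iostream>
#include <vector>

int main() {
    std::vector<long long> v;
    std::size_t lastCapacity = v.capacity();

    for (int i = 0; i < 1000; i++) {
        v.push_back(i);
        if (v.capacity() != lastCapacity) {
            // Each reallocation multiplies the previous capacity by a constant factor.
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << "\n";
            lastCapacity = v.capacity();
        }
    }
    return 0;
}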
I'm using Visual Studio 2019, and I noticed that in debug builds, the variables are allocated so far apart from one another. I looked at Project Properties and tried searching online but could not find anything. I ran the following code below in both Debug and Release mode and here are the respective outputs.
#include <iostream>

int main() {
    int a = 3;
    int b = 5;
    int c = 8;
    int d[5] = { 10,10,10,10,10 };
    int e = 14;
    std::cout << "a: " << &a
              << "\nb: " << &b
              << "\nc: " << &c
              << "\nd_start: " << &d[0]
              << "\nd_end: " << &d[4] + 1
              << "\ne: " << &e
              << std::endl;
}
As you can see below, variables are allocated as you would expect (one after the other) with no wasted memory in between. Even the last variable, e, is optimized to slot between c and d.
// Release_x64 Build Output
a: 0000003893EFFC40
b: 0000003893EFFC44
c: 0000003893EFFC48
d_start: 0000003893EFFC50
d_end: 0000003893EFFC64
e: 0000003893EFFC4C // e is optimized in between c and d
Below is the output that confuses me. Here you can see that a and b are allocated 32 bytes apart! So there are 28 bytes of wasted/uninitialized memory between them. The same thing happens for the other variables, except for the int d[5]: d has 32 uninitialized bytes after c but only 24 uninitialized bytes before e.
// Debug_x64 Build Output
a: 00000086D7EFF3F4
b: 00000086D7EFF414
c: 00000086D7EFF434
d_start: 00000086D7EFF458
d_end: 00000086D7EFF46C
e: 00000086D7EFF484
My question is: why is this happening? Why does MSVC place these variables so far apart from one another, and what determines the amount of separation, given that it is different for arrays?
The debug version of the allocator allocates storage differently than the release version. In particular, the debug version allocates some space at the beginning and end of each block of storage, so its allocation patterns are somewhat different.
The debug allocator also checks the storage at the start and end of the block it allocated to see if it has been damaged in any way.
Storage is allocated in quantized chunks, where the quantum is unspecified but is something like 16 or 32 bytes. Thus, if you allocate a DWORD array of six elements (size = 6 * sizeof(DWORD) bytes = 24 bytes), the allocator will actually deliver 32 bytes (one 32-byte quantum or two 16-byte quanta). So if you write element [6] (the seventh element), you overwrite some of the "dead space" and the error is not detected. But in the release version, the quantum might be 8 bytes, and three 8-byte quanta would be allocated, and writing the [6] element of the array would overwrite part of the storage allocator's data structure that belongs to the next chunk. After that it is all downhill. The error might not even show up until the program exits!
You can construct similar "boundary condition" situations for any quantum size. Because the quantum size is the same for both versions of the allocator, but the debug version of the allocator adds hidden space for its own purposes, you will get different storage allocation patterns in debug and release mode.
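As an aside, you can observe this kind of rounding up directly with some allocators. The sketch below uses malloc_usable_size, which is a glibc extension rather than part of the MSVC CRT, so it only illustrates the general quantization idea, not MSVC's exact quanta:

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <malloc.h>   // malloc_usable_size (glibc extension)

int main() {
    for (std::size_t requested : {1, 8, 24, 25, 100}) {
        void *p = std::malloc(requested);
        // The usable size is the requested size rounded up to the allocator's quantum.
        std::printf("requested %zu bytes, usable %zu bytes\n",
                    requested, malloc_usable_size(p));
        std::free(p);
    }
    return 0;
}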
I would like to find out the number of bytes used by a process from within a C++ program by inspecting the operating system's memory information. The reason I would like to do this is to find the possible overhead incurred when allocating memory (due to memory control blocks/nodes in free lists, etc.). Currently I am on a Mac and am using this code:
#include <mach/mach.h>
#include <iostream>

int getResidentMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.resident_size;
    }
    return -1;
}

int getVirtualMemoryUsage() {
    task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    if (task_info(mach_task_self(), TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&t_info),
                  &t_info_count) == KERN_SUCCESS) {
        return t_info.virtual_size;
    }
    return -1;
}

int main(void) {
    int virtualMemoryBefore = getVirtualMemoryUsage();
    int residentMemoryBefore = getResidentMemoryUsage();

    int* a = new int(5);

    int virtualMemoryAfter = getVirtualMemoryUsage();
    int residentMemoryAfter = getResidentMemoryUsage();

    std::cout << virtualMemoryBefore << " " << virtualMemoryAfter << std::endl;
    std::cout << residentMemoryBefore << " " << residentMemoryAfter << std::endl;
    return 0;
}
When running this code, I expected to see that memory usage increased after allocating an int. However, when I run the code I get the following output:
75190272 75190272
819200 819200
I have several questions because this output does not make any sense.
Why hasn't the virtual or resident memory changed after an integer has been allocated?
How come the operating system allocates such large amounts of memory to a running process?
When I run the code and check Activity Monitor, I find that 304 KB of memory is used, but that number differs from the virtual/resident memory usage obtained programmatically.
My end goal is to find the memory overhead when allocating data, so is there a way to do this (i.e. determine the bytes used as seen by the OS and compare them with the bytes allocated, to find the difference)?
Thank you for reading
The C++ runtime typically allocates a block of memory when a program starts up, and then parcels this out to your code when you use things like new, and adds it back to the block when you call delete. Hence, the operating system doesn't know anything about individual new or delete calls. This is also true for malloc and free in C (or C++).
First, you are measuring whole pages, not really the memory allocated. Second, the runtime pre-allocates a few pages at startup. If you want to observe something, allocate more than a single int; try allocating several thousand and you will observe some changes.
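For instance, a minimal sketch of that experiment (it assumes the getResidentMemoryUsage() helper from the question is in scope; the exact delta depends on the allocator and the page size):

#include <iostream>
#include <vector>

// Assumes getResidentMemoryUsage() from the question above is defined in this file.
int main() {
    int residentBefore = getResidentMemoryUsage();

    // Allocate and touch several megabytes so the pages actually get committed.
    std::vector<char> big(8 * 1024 * 1024, 'x');

    int residentAfter = getResidentMemoryUsage();
    std::cout << "resident before: " << residentBefore
              << ", after: " << residentAfter
              << ", delta: " << (residentAfter - residentBefore) << " bytes\n";
    return 0;
}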
I'm trying to figure out how much memory I can allocate before the allocation will fail.
This simple C++ code allocates a buffer (of size 1024 bytes), assigns to the last five characters of the buffer, reports, and then deletes the buffer. It then doubles the size of the buffer and repeats until it fails.
Unless I'm missing something, the code is able to allocate up to 65 terabytes of memory before it fails on my MacBook Pro. Is this even possible? How can it allocate so much more memory than I have on the machine? I must be missing something simple.
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    long long size = 1024;
    long cnt = 0;
    while (true)
    {
        char *buffer = new char[size];
        // Assume the alloc succeeded. We are looking for the failure after all.

        // Try to write to the allocated memory, may fail
        buffer[size-5] = 'T';
        buffer[size-4] = 'e';
        buffer[size-3] = 's';
        buffer[size-2] = 't';
        buffer[size-1] = '\0';

        // report
        if (cnt < 10)
            cout << "size[" << cnt << "]: " << (size/1024.) << "Kb ";
        else if (cnt < 20)
            cout << "size[" << cnt << "]: " << (size/1024./1024.) << "Mb ";
        else
            cout << "size[" << cnt << "]: " << (size/1024./1024./1024.) << "Gi ";
        cout << "addr: 0x" << (long)buffer << " ";
        cout << "str: " << &buffer[size-5] << "\n";

        // cleanup
        delete [] buffer;

        // double size and continue
        size *= 2;
        cnt++;
    }
    return 0;
}
When you ask for memory, the operating system reserves the right not to actually give it to you until you use it.
That's what's happening here: you're only ever using 5 bytes. My ZX81 from the 1980s could handle that.
MacOS X, like almost every modern operating system, uses "delayed allocation" for memory. When you call new, the OS doesn't actually allocate any memory. It simply makes a note that your program wants a certain amount of memory, and that memory area you want starts at a certain address. Memory is only actually allocated when your program tries to use it.
Further, memory is allocated in units called "pages". I believe MacOS X uses 4kb pages, so when your program writes to the end of the buffer, the OS gives you 4096 bytes there, while retaining the rest of the buffer as simply a "your program wants this memory" note.
As for why you're hitting the limit at 64 terabytes, it's because current x86-64 processors use 48-bit addressing. This gives 256 TB of address space, which is split evenly between the operating system and your program. Doubling the 64 TB allocation would exactly fit in your program's 128 TB half of the address space, except that the program is already taking up a little bit of it.
Virtual memory is the key to allocating more address space than you have physical RAM+swap space.
malloc uses the mmap(MAP_ANONYMOUS) system call to get pages from the OS. (Assuming OS X works like Linux, since they're both POSIX OSes). These pages are all copy-on-write mapped to a single physical zero page. i.e. they all read as zero with only a TLB miss (no page fault and no allocation of physical RAM). An x86 page is 4kiB. (I'm not mentioning hugepages because they're not relevant here).
Writing to any of those pages triggers a soft page fault for the kernel to handle the copy-on-write. The kernel allocates a zeroed page of physical memory and re-wires that virtual page to be backed by the physical page. On return from the page fault, the store is re-executed and succeeds this time.
So after allocating 64TiB and storing 5 bytes to the end of it, you've used one extra page of physical memory. (And added an entry to malloc's bookkeeping data, but that was probably already allocated and in a dirty page. In a similar question about multiple tiny allocations, malloc's bookkeeping data was what eventually used up all the space).
If you actually dirtied more pages than the system had RAM + swap, the kernel would have a problem because it's too late for malloc to return NULL. This is called "overcommit", and some OSes enable it by default while others don't. In Linux, it's configurable.
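For example (a Linux-specific sketch), the current overcommit policy can be read from /proc/sys/vm/overcommit_memory, where 0 means heuristic overcommit, 1 means always overcommit, and 2 means strict accounting:

#include <fstream>
#include <iostream>

int main() {
    // Linux-only: 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting.
    std::ifstream in("/proc/sys/vm/overcommit_memory");
    int mode = -1;
    if (in >> mode)
        std::cout << "vm.overcommit_memory = " << mode << "\n";
    else
        std::cout << "couldn't read overcommit setting (not Linux?)\n";
    return 0;
}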
As Mark explains, you run out of steam at 64TiB because current x86-64 implementations only support 48-bit virtual addresses. The upper 16 bits need to be copies of bit 47. (i.e. an address is only canonical if the 64-bit value is the sign-extension of the low 48 bits).
This requirement stops programs from doing anything "clever" with the high bits, and then breaking on future hardware that does support even larger virtual address spaces.
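A quick sketch of that canonical-form rule (purely illustrative; is_canonical is just a helper name invented for this example):

#include <cstdint>
#include <iostream>

// A 64-bit value is a canonical x86-64 address if bits 63..48 are copies of
// bit 47, i.e. the value is the sign-extension of its low 48 bits.
bool is_canonical(std::uint64_t addr) {
    std::uint64_t top17 = addr >> 47;          // bit 47 plus the upper 16 bits
    return top17 == 0 || top17 == 0x1FFFF;     // all zeros or all ones
}

int main() {
    int x = 0;
    std::cout << std::boolalpha
              << is_canonical(reinterpret_cast<std::uint64_t>(&x)) << "\n"   // true
              << is_canonical(0x00007FFFFFFFFFFFULL) << "\n"                 // true  (top of the lower half)
              << is_canonical(0x0000800000000000ULL) << "\n";                // false (non-canonical)
    return 0;
}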
I am trying to find the maximum memory that I can allocate in stack, global, and heap memory in C++. I am trying this program on a Linux system with 32 GB of memory, and on my Mac with 2 GB of RAM.
/* test to determine the maximum memory that could be allocated for static, heap and stack memory */
#include <iostream>
using namespace std;

// static/global
long double a[200000000];

int main()
{
    // stack
    long double b[999999999];

    // heap
    long double *c = new long double[3999999999];

    cout << "Sizeof(long double) = " << sizeof(long double) << " bytes\n";
    cout << "Allocated Global (Static) size of a = " << (double)((sizeof(a))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Stack size of b = " << (double)((sizeof(b))/(double)(1024*1024*1024)) << " Gbytes \n";
    cout << "Allocated Heap Size of c = " << (double)((3999999999 * sizeof(long double))/(double)(1024*1024*1024)) << " Gbytes \n";

    delete[] c;
    return 0;
}
Results (on both):
Sizeof(long double) = 16 bytes
Allocated Global (Static) size of a = 2.98023 Gbytes
Allocated Stack size of b = 14.9012 Gbytes
Allocated Heap Size of c = 59.6046 Gbytes
I am using GCC 4.2.1. My question is:
Why does my program run? I expected that, since the stack got depleted (the limit is 16 MB on Linux and 8 MB on the Mac), the program would throw an error. I have seen some of the many questions asked on this topic, but I couldn't solve my problem from the answers given there.
On some systems you can allocate any amount of memory that fits in the address space. The problems begin when you start actually using that memory.
What happens is that the OS reserves a virtual address range for the process, without mapping it to anything physical, or even checking that there's enough physical memory (including swap) to back that address range up. The mapping only happens in a page-by-page fashion, when the process tries to access newly allocated pages. This is called memory overcommitment.
Try accessing every sysconf(_SC_PAGESIZE)th byte of your huge arrays and see what happens.
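For instance, a sketch of that experiment (the element count here is arbitrary; on a machine without enough RAM and swap, it is this touching loop, not the allocation itself, that eventually gets the process into trouble):

#include <cstddef>
#include <cstdio>
#include <unistd.h>   // sysconf

int main() {
    const long page = sysconf(_SC_PAGESIZE);          // typically 4096 bytes
    const std::size_t count = 1000000000;             // ~16 GB worth of long double on many systems
    long double *c = new long double[count];          // the allocation itself will likely succeed

    // Touch one byte per page so each page actually has to be backed by physical memory.
    char *bytes = reinterpret_cast<char *>(c);
    const std::size_t total = count * sizeof(long double);
    for (std::size_t i = 0; i < total; i += static_cast<std::size_t>(page)) {
        bytes[i] = 1;
        if (i % (1024UL * 1024 * 1024) == 0)          // progress report once per GiB
            std::printf("touched %zu GiB\n", i / (1024UL * 1024 * 1024));
    }

    delete[] c;
    return 0;
}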
Linux overcommits, meaning that it can allow a process more memory than is available on the system, but it is not until that memory is actually used by the process that actual memory (physical main memory or swap space on disk) is allocated for the process. My guess would be that Mac OS X works in a similar way.