I have a source file:
#include <cstdlib>
#include <iostream>
int main() {
    void *p = std::malloc(8192);
    std::cout << std::hex << (size_t)p << std::endl;
    std::free(p);
}
I compiled it on two platforms: (a) on macOS, with clang++ 4.2.1, and (b) on Linux, with g++ 7.3.0.
On macOS, the printout is 7fa432001000, and on Linux it is 257ec20.
The macOS output is not what I expected. I thought malloc() should allocate memory on the heap, and if it allocates memory using mmap() under the hood, that's fine too. But 7fa432001000 looks like an address in the stack region, because the upper limit of user-space virtual memory on x86_64 is just below 7fffffffffff (at least that's the case on current Linux; maybe I'm wrong).
My question is: why (or how) does the malloc() on macOS return such a high address? Is this because of the way Clang's libc++ was implemented?
This is due to modern systems mapping virtual addresses to actual physical addresses.
According to Wikipedia:
The computer's operating system, using a combination of hardware and
software, maps memory addresses used by a program, called virtual
addresses, into physical addresses in computer memory. Main storage,
as seen by a process or task, appears as a contiguous address space or
collection of contiguous segments. The operating system manages
virtual address spaces and the assignment of real memory to virtual
memory. Address translation hardware in the CPU, often referred to as
a memory management unit or MMU, automatically translates virtual
addresses to physical addresses. Software within the operating system
may extend these capabilities to provide a virtual address space that
can exceed the capacity of real memory and thus reference more memory
than is physically present in the computer.
As to why it's different on two different platforms: different computer hardware handles memory in different ways; BIOS and other firmware can use different algorithms or techniques; different OSes or OS versions do things differently (Linux has different options when you build your kernel); and the C++ runtime could be implemented differently.
I'm using Microsoft Visual Studio 2008
When I create a pointer to an object, it will receive a memory address which in my case is an 8 digit hexadecimal number. E.g.: 0x02e97fc0
With 8 hexadecimal digits a computer can address 4GB of memory. I've got 8GB of memory in my computer:
Does that mean that my IDE is not using more than 4GBs out of my memory?
Is the IDE able to address only the first 4GB of my memory or any 4GB out of the 8GBs not used?
The question is not only about the size of the memory used. It is also about the location of the memory used. The latter hasn't been detailed here: The maximum amount of memory any single process on Windows can address
Where does C++ create stack and heap in memory?
Well, C++ does not really handle memory; it asks the operating system to do so. When a binary object (.exe, .dll, .so ...) is loaded into memory, it is the OS that allocates memory for the stack. When you dynamically allocate memory with new, you're asking the OS for some space in the heap.
1) Does that mean that my IDE is not using more than 4GBs out of my memory?
No, not really. In fact, modern OSes like Windows use what is called a virtual address space. Each program sees an apparently contiguous range of addresses (say 0x1000 to 0xffff) that belongs to it alone, and the OS maps that range onto physical memory; you have absolutely no guarantee about where your objects really lie in memory. When an address is dereferenced, the OS and the hardware translate it so that your program accesses the right physical address.
Having 32-bit addresses means a single instance of your program can't use more than 4GB of memory. Two instances of the same program can, since the OS can place two different segments of physical memory behind the apparently same range of virtual addresses (0x00000000 to 0xffffffff). And Windows will allocate yet more overlapping address spaces for its own processes.
2) Is the IDE able to address only the first 4GB of my memory or any 4GB out of the 8GBs not used?
Any. Even non-contiguous memory, even disk memory ... no one can tell.
Found some Microsoft source in the comments about it: https://msdn.microsoft.com/en-us/library/aa366778.aspx
The code below calls foo and then uses while(1) so that I can watch the memory usage. As far as I know, after 'finished' is printed, the variable d has been deallocated and the STL container will have freed its data space (heap) by itself.
#include <vector>
#include <string>
#include <iostream>
void foo() {
    std::vector<std::string> d(100000000);
    for(int i = 0; i < 100000000; ++i) d[i] = "1,1,3";
    d.resize(0);
    d.shrink_to_fit();
}
int main(int argc, char *argv[])
{
    foo();
    std::cout << "finished" << std::endl;
    while(1) {;}
    return 0;
}
But what I observed (using htop) is that the memory is not freed back to the operating system. This is just a benchmark; the real code is related to MESOS, which has a memory limit for each process.
I have tried several compiler versions, such as g++-4.7.2, g++-4.8.1, and clang++, on a Linux server with glibc 2.15. I also tried tcmalloc instead of the default malloc, but it still does not work (on a Mac the problem does not happen).
What's the problem? How can I make sure the memory is given back to the OS?
Thank you.
How can I make sure the memory is given back to the OS?
You can terminate your process.
What's the problem?
There probably isn't one. It's normal for programs not to return memory (though Linux does return memory early for some particularly large allocations). They normally use sbrk or equivalent to grow the virtual address space available to them, but it's not normally worth the effort of trying to return deallocated memory. This may be counter-intuitive, but it's also proven workable for millions of programs over many decades, so you shouldn't bother yourself with it unless you have a specific tangible problem. It shouldn't cause problems for you as the deallocated memory will be reused when the application performs further allocations, so the "MESOS memory limitation for each process" you mention still affects the "high watermark" of maximum instantaneous memory usage the same way.
Note that OSes with virtual memory support may swap long unused deallocated pages to disk so the backing RAM can be reused by the kernel or other apps.
It's also possible to take manual control of this using e.g. memory-mapped files, but writing such allocators and using them from Standard containers is a non-trivial undertaking... there are lots of other SO questions on how to approach that problem.
Allocating memory from the OS has two downsides:
High overhead. A system call involves a switch into kernel mode, which takes much longer than a simple function call, and then the memory management for the OS itself is probably quite complex.
Coarse granularity. The OS probably has a minimum allocation size like 4K. That's a lot of overhead for a 6 byte string.
For these reasons the C++ memory allocator will only ask the OS for large blocks, then parcel out pieces of it when asked via new or malloc.
When those pieces of memory are released, they're put back into a pool to be handed out again on the next request. Now it's quite possible that all of the pieces of a larger block end up being freed, but how often does that happen in real life? Chances are that there will be at least one allocation per block that sticks around for a long time, preventing the block from being returned to the OS. And if it is returned, what do you think are the chances that the program will turn around and request it back again a short time later? As a practical matter it usually doesn't pay to return blocks to the OS. Your test program is a highly artificial case that isn't worth optimizing for.
In most modern systems the operating system manages memory in pages. Application memory is managed in pools (heaps) by library functions. When your application allocates memory, the library functions attempt to find an available block of the size you requested. If the memory is not in the pool, the library calls the system to add more pages to the process to incorporate into the pool(heap). When you free memory it goes back into the pool. The allocated pages in the pool do not return to the operating system.
In debug mode I saw that the pointers have addresses like 0x01210040,
but as I realized, 0x means hexadecimal, right? And there are 8 hex digits, i.e. in total 128 bits are addressed?? So does that mean that for a 32-bit system the first two digits are always 0, and for a 64-bit system the first digit is 0?
Also, may I ask: for a 32-bit program, would I be able to allocate as much as 3GB of memory as long as I stay in the heap and use only malloc()? Or is there some limitation that Windows imposes on a single thread? (The IDE I'm using is VS2012.)
I ask because I was actually running a 32-bit program on a 64-bit system, but the program crashed with a memory leak when it had only allocated about 1.5GB of memory... and I can't seem to figure out why.
(Oooops...sorry guys I think I made a simple mistake with the first question...indeed one hex digit is 4 bits, and 8 makes 32bits. However here is another question...how is address represented in a 64-bit program?)
For 32-bit Windows, the limit is actually 2GB usable per process, with virtual addresses from 0x00000000 (or simply 0x0) through 0x7FFFFFFF. The rest of the 4GB address space (0x80000000 through 0xFFFFFFFF) is reserved for use by Windows itself. Note that these have nothing to do with the actual physical memory addresses.
If your program is large address space aware, this limit is increased to 3GB on 32bit systems and 4GB for 32bit programs running on 64bit Windows.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).aspx
And for the higher limits for large address space aware programs (IMAGE_FILE_LARGE_ADDRESS_AWARE), see here:
http://msdn.microsoft.com/en-us/library/aa366778.aspx
You might also want to take a look at the Virtual Memory article on Wikipedia to better understand how the mapping between virtual addresses and physical addresses works. The first MSDN link above also has a short explanation:
The virtual address space for a process is the set of virtual memory
addresses that it can use. The address space for each process is
private and cannot be accessed by other processes unless it is shared.
A virtual address does not represent the actual physical location of
an object in memory; instead, the system maintains a page table for
each process, which is an internal data structure used to translate
virtual addresses into their corresponding physical addresses. Each
time a thread references an address, the system translates the virtual
address to a physical address. The virtual address space for 32-bit
Windows is 4 gigabytes (GB) in size and divided into two partitions:
one for use by the process and the other reserved for use by the
system. For more information about the virtual address space in 64-bit
Windows, see Virtual Address Space in 64-bit Windows.
EDIT: As user3344003 points out, these values are not the amount of memory you can allocate using malloc or otherwise use for storing values, they just represent the size of the virtual address space.
There are a number of limits that would restrict the size of your malloc allocation.
1) The number of bits restricts the size of the address space. For 32 bits, that is 4GB.
2) Systems then subdivide that for the various processor modes. These days, usually 2GB goes to the user and 2GB to the kernel.
3) The address space may be limited by the size of the page tables.
4) The total virtual memory may be limited by the size of the page file.
5) Before you start malloc'ing, there is already stuff in the virtual address space (e.g., code, stack, reserved area, data). Your malloc needs to return a contiguous block of memory. The largest theoretical block it could return has to fit within unallocated areas of virtual memory.
6) Your memory management heap may restrict the size that can be allocated.
There are probably other limitations that I have omitted.
-=-=-=-=-
If your program crashed after allocating 1.5GB through malloc, did you check the return value from malloc to see if it was not null?
-=-=-=-=-=
The best way to allocate huge blocks of memory is through operating system services that map pages into the virtual address space, not through malloc.
In reference to the following article
For a 32-bit application launched in a 32-bit Windows, the total size of all the mentioned data types must not exceed 2 Gbytes.
The same 32-bit program launched in a 64-bit system can allocate about 4 Gbytes (actually about 3.5 Gbytes)
The practical amount of data you are looking at is around 1.7 GB, due to the space occupied by Windows.
By any chance, how did you find out how much memory it had allocated when it crashed?
I have a C++ application that I am trying to iron the memory leaks out of and I realized I don't fully understand the difference between virtual and physical memory.
Results from top (so 16.8g = virtual, 111m = physical):
4406 um 20 0 16.8g 111m 4928 S 64.7 22.8 36:53.65 client
My process holds 500 connections, one for each user, and at these numbers it means there is about 30 MB of virtual overhead for each user. Without going into the details of my application, the only way this could sound remotely realistic, adding together all the vectors, structs, threads, functions on the stack, etc., is if I have no idea what virtual memory actually means. No -O optimization flags, btw.
So my questions are:
what operations in C++ would inflate virtual memory so much?
Is it a problem if my task is using gigs of virtual memory?
The stack and heap function variables, vectors, etc. - do those necessarily increase the use of physical memory?
Would removing a memory leak (via delete or free() or such) necessarily reduce both physical and virtual memory usage?
Virtual memory is what your program deals with. It consists of all of the addresses returned by malloc, new, et al. Each process has its own virtual-address space. Virtual address usage is theoretically limited by the address size of your program: 32-bit programs have 4GB of address space; 64-bit programs have vastly more. Practically speaking, the amount of virtual memory that a process can allocate is less than those limits.
Physical memory is the chips soldered to your motherboard or installed in your memory slots. The amount of physical memory in use at any given time is limited to the amount of physical memory in your computer.
The virtual-memory subsystem maps virtual addresses that your program uses to physical addresses that the CPU sends to the RAM chips. At any particular moment, most of your allocated virtual addresses are unmapped; thus physical memory use is lower than virtual memory use. If you access a virtual address that is allocated but not mapped, the operating system invisibly allocates physical memory and maps it in. When you don't access a virtual address, the operating system might unmap the physical memory.
To take your questions in turn:
what operations in C++ would inflate virtual memory so much?
new, malloc, static allocation of large arrays. Generally anything that requires memory in your program.
Is it a problem if my task is using gigs of virtual memory?
It depends upon the usage pattern of your program. If you allocate vast tracts of memory that you never, ever touch, and if your program is a 64-bit program, it may be okay that you are using gigs of virtual memory.
Also, if your memory use grows without bound, you will eventually run out of some resource.
The stack and heap function variables, vectors, etc. - do those necessarily increase the use of physical memory?
Not necessarily, but likely. The act of touching a variable ensures that, at least momentarily, it (and all of the memory "near" it) is in physical memory. (Aside: containers like std::vector may be allocated on either stack or heap, but the contained objects are allocated on the heap.)
Would removing a memory leak (via delete or free() or such) necessarily reduce both physical and virtual memory usage?
Physical: probably. Virtual: yes.
Virtual memory is the address space used by the process. Each process has a full view of the 64-bit (or 32-bit, depending on the architecture) range of addresses a pointer can hold, but not every byte maps to something real. The operating system manages the table that maps virtual addresses to real physical memory pages, or to whatever that address really is (even though it looks like memory to your application). For instance, an address in your application may point to some function that has not yet been loaded from disk; when you call it, this generates a page fault, which the kernel handles by loading the appropriate section from the executable and mapping it into the address space of your application so it can be executed.
From a Linux perspective (and I believe most modern OS's):
Allocating memory inflates virtual memory. Actually using the allocated memory inflates physical memory usage. Do it too much and it will be swapped to disk, and eventually your process will be killed.
mmap-ing files increases only virtual memory usage; this includes the size of the executables: the larger they are, the more virtual memory is used.
The only problem with using up virtual memory is that you may deplete it. This is mainly an issue on 32-bit systems, where you only have 4GB of it (and 1GB is reserved for the kernel, so application data only has 3GB).
Function calls that allocate variables on the stack may increase physical memory usage, but you (usually) won't leak this memory.
Heap-allocated variables take up virtual memory, but only get physical memory once you read or write them.
Freeing or deleting variables does not necessarily reduce virtual/physical memory consumption; it depends on the allocator internals, but it usually does.
You can set the following environment variables to control malloc's internal memory allocation. Setting them addresses all four questions. For other options, please refer to:
http://man7.org/linux/man-pages/man3/mallopt.3.html
export MALLOC_MMAP_THRESHOLD_=8192
export MALLOC_ARENA_MAX=4
#include <iostream>
using namespace std;
int main(void)
{
    int *ptr = new int;
    cout << "Memory address of ptr:" << ptr << endl;
    cin.get();
    delete ptr;
    return 0;
}
Every time I run this program, I get the same memory address for ptr. Why?
[Note: my answer assumes you're working with a modern OS that uses a virtual memory system.]
Due to virtual memory, each process operates in its own unique address space, which is independent of and unaffected by any other process. The address you get from new is a virtual address, and is generated by whatever your compiler's implementation of new chooses to do.* There's no reason this couldn't be deterministic.
On the other hand, the physical address associated with your virtual memory address will most likely be different every time, and will be affected by all sorts of things. This mapping is controlled by the OS.
* new is probably implemented in terms of malloc.
I'd say it's mostly coincidence, as the memory allocator/OS can give you whatever address it wants.
The addresses you get are obviously not uniformly random (and are highly dependent on other OS factors), so it's common to get the same (virtual) address several times in a row.
So for example, on my machine (Windows 7, compiled with VS2010), I get different addresses with different runs:
00134C40
00124C40
00214C40
00034C40
00144C40
001B4C40
This is an artifact of your environment. The cin.get() suggests to me that you are compiling and executing in Visual Studio, which provides an unusually predictable runtime environment. When I compile and run that code on my Linux machine, two executions gave two different addresses.
ETA:
In comments you expressed an expectation that different processes could obtain the same memory address and that this address would be inaccessible to your program. In any modern operating system this is not the case, because the operating system is providing each process with virtual memory address spaces.
Only the operating system sees the true hardware addresses, and maintains virtual memory maps for each program, redirecting virtual addresses to physical addresses. Therefore, an arbitrary number of different processes can hold data in the same virtual address, while the operating system maps that address to a separate physical address for each process.
This guarantees that process A cannot read or write to memory in use by process B without a special provision enabling such access (such as by instructing the OS to map certain virtual memory in certain processes to the same physical memory). It allows the operating system to make different kinds of memory hardware transparent to programs.
It also allows the OS to move a program's data around behind its back to optimize system performance.
Classical example: Moving data that hasn't been used for some time to a special file on the hard disk. This is sometimes called the page file.
Memory maps are typically broken up into pages: Blocks of contiguous memory of a certain size (the page size). Data held within a page of virtual address space is usually also contiguous in physical memory, but if data runs over a page boundary, information that appears contiguous in virtual memory could easily be separated. If a C/C++ program enters undefined behavior, it may attempt to access memory in a page that the OS has not mapped to physical memory. This will cause the OS to generate an error.