I work on Peano-Hilbert data ordering (c++ 4.9, 64-bit Linux) to coalesce dynamically allocated memory. As a check, I am trying to visualize the actual data distribution in memory. For this I convert pointers to my data to integers as follows:
unsigned long int address = *(unsigned long int*)(&pointer);
and then plot them as a 2D map. It works fine in most cases, but sometimes I get values that far exceed the available memory, e.g. 140170747903888, which corresponds to an offset of ~127 TB whereas I have only 16 GB of RAM. What the hell?
The memory management system does not handle memory in a linear way. It is free to tell a process that some memory block is at address 0x1234123412345678, even if you only had 128MB of memory. This is called paging. The data might not even be in physical memory, but paged out to disk.
This means that you have no way of knowing from the pointer value where in physical memory anything is, since the location might change at any time (or the data might not even be in memory). You only know the virtual address the OS happened to give you, and how it gives them out is totally implementation dependent.
AMD 64 bit uses 48 bits for virtual memory addresses, which corresponds to 256TB. Virtual address space is distinct from physical RAM: the addresses are looked up in a table on the CPU and actual RAM faulted in when the pages in question are first accessed.
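To make the point concrete, here is a minimal sketch of how the address from the question can be checked against the 48-bit virtual address width mentioned above. The helper names `to_integer` and `fits_in_bits` are made up for illustration; `std::uintptr_t` is the standard integer type intended for holding pointer values, which avoids the type-punning cast used in the question.

```cpp
#include <cstdint>

// Convert a pointer to an integer. std::uintptr_t is wide enough to hold
// any data pointer on platforms that provide it.
inline std::uintptr_t to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// True if `address` can be represented in the lowest `bits` bits.
// The 48-bit width used in the usage note is taken from the answer above
// and is an assumption about the target CPU.
inline bool fits_in_bits(std::uint64_t address, unsigned bits) {
    return bits >= 64 || (address >> bits) == 0;
}
```

Calling `fits_in_bits(140170747903888ULL, 48)` yields true, while the same address does not fit in 34 bits (16 GB): the value is a perfectly valid virtual address even though it is far beyond the installed RAM.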
Let's say I have 8 Gigabytes of RAM and 16 Gigabytes of swap memory. Can I allocate a 20 Gigabyte array there in C? If yes, how is it possible? What would that memory layout look like?
[linux] Can I create an array exceeding RAM, if I have enough swap memory?
Yes, you can. Note that accessing swap is veerry slooww.
how is it possible
Allocate dynamic memory. The operating system handles the rest.
How would that memory layout look like?
On an amd64 system, you can have 256 TiB of address space. You can easily fit a contiguous block of 8 GiB in that space. The operating system divides the virtual memory into pages and copies the pages between physical memory and swap space as needed.
Modern operating systems use virtual memory. In Linux and most other OSes each process has its own address space, sized according to the abilities of the architecture. You can check the size of the virtual address space in /proc/cpuinfo. For example you may see:
address sizes : 43 bits physical, 48 bits virtual
This means that virtual addresses use 48 bits. Half of that range is reserved for the kernel, so you can only use 47 bits, or 128 TiB. Any memory you allocate will be placed somewhere in those 128 TiB of address space as if you actually had that much memory.
Linux uses demand page loading and by default overcommits memory. When you say
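The arithmetic above can be sketched in a few lines. This is only an illustration: `AddressSizes`, `parse_address_sizes` and `tib_for_bits` are hypothetical names, and the parser assumes the line has exactly the shape shown in the example.

```cpp
#include <cstdio>

struct AddressSizes {
    unsigned physical_bits;
    unsigned virtual_bits;
};

// Parse a line such as "address sizes : 43 bits physical, 48 bits virtual".
// Whitespace in the format matches any run of spaces or tabs.
// Returns true if both numbers were read.
inline bool parse_address_sizes(const char* line, AddressSizes& out) {
    return std::sscanf(line, "address sizes : %u bits physical, %u bits virtual",
                       &out.physical_bits, &out.virtual_bits) == 2;
}

// Size in TiB of an address space with the given number of bits.
// Only valid for 40 <= bits <= 63, since 1 TiB is 2^40 bytes.
constexpr unsigned long long tib_for_bits(unsigned bits) {
    return 1ULL << (bits - 40);
}
```

For the example line, `tib_for_bits(48)` gives 256 and `tib_for_bits(47)` gives 128, matching the figures quoted above.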
char *mem = (char*)malloc(1'000'000'000'000);
what happens is that Linux picks a suitable address and just records that you have allocated 1'000'000'000'000 (rounded up to the nearest page) of memory starting at that point. (It does some sanity check that the amount isn't totally bonkers depending on the amount of physical memory that is free, the amount of swap that is free and the overcommit setting. Per default you can allocate a lot more than you have memory and swap.)
Note that at this point no physical memory and no swap space is connected to your allocated block at all. This changes when you first write to the memory:
mem[4096] = 0;
At this point the program will page fault. Linux checks that the address is actually something your program is allowed to write to, finds a physical page and maps it to &mem[4096]. Then it lets the program retry the write and everything continues.
If Linux can't find a physical page it will try to swap something out to make a physical page available for your program. If that also fails, the kernel's out-of-memory killer steps in and terminates a process, quite possibly yours, with SIGKILL.
As a result you can allocate basically unlimited amounts of memory as long as you never write to more than the physical memory and swap can support. On the other hand, if you initialize the memory (explicitly or implicitly using calloc()) the system will quickly notice if you try to use more than is available.
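The difference between allocating and actually using memory can be sketched as follows. This is a rough illustration, not a benchmark: the helper names are invented, and the page size of 4096 bytes is an assumption (on POSIX systems the real value comes from `sysconf(_SC_PAGESIZE)`).

```cpp
#include <cstddef>
#include <cstdlib>

// Write one byte into every page of [mem, mem + bytes) so the kernel has to
// back each page with physical memory or swap. Returns the pages touched.
// page_size defaults to 4096, which is an assumption about the platform.
inline std::size_t touch_pages(char* mem, std::size_t bytes,
                               std::size_t page_size = 4096) {
    volatile char* p = mem;  // volatile keeps the compiler from dropping the writes
    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < bytes; offset += page_size) {
        p[offset] = 0;       // the first write to a page triggers the page fault
        ++touched;
    }
    return touched;
}

// Allocate `bytes` but touch only the first `used` bytes.
// Returns the number of pages touched, or 0 if malloc refused the request.
inline std::size_t allocate_and_touch(std::size_t bytes, std::size_t used) {
    char* mem = static_cast<char*>(std::malloc(bytes));
    if (mem == nullptr) return 0;
    const std::size_t pages = touch_pages(mem, used < bytes ? used : bytes);
    std::free(mem);
    return pages;
}
```

Only the pages that `touch_pages` writes to need backing; the rest of the allocation stays a bookkeeping entry in the kernel.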
You can, but not with a simple malloc. It's platform-dependent.
It requires an OS call to allocate swappable memory (it's VirtualAlloc on Windows, for example; on Linux it should be mmap and related functions).
Once it's done, the allocated memory is divided into pages, contiguous blocks of fixed size. You can lock a page, so that it is loaded in RAM and you can read and modify it freely. For old dinosaurs like me, it's exactly how EMS memory worked under DOS... You address your swappable memory with a kind of segment:offset method: first, you divide your linear address by the page size to find which page is needed, then you use the remainder to get the offset within this page.
Once unlocked, the page remains in memory until the OS needs memory: then, an unlocked page will be flushed to disk, in swap, and discarded from RAM... Until you lock (and load...) it again, but this operation may require freeing RAM, so another process may have its unlocked pages swapped out BEFORE your own page is loaded again. And this is damn SLOOOOOOW... Even on an SSD!
So, it's not always a good thing to use swap. A better way is to use memory mapped files - perfect for reading very big files mostly sequentially, with few random accesses - if that suits your needs.
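The segment:offset style lookup described above boils down to one division and one remainder. A minimal sketch, with `PageLocation` and `locate` as made-up names:

```cpp
#include <cstddef>

struct PageLocation {
    std::size_t page;    // index of the page that contains the address
    std::size_t offset;  // byte offset inside that page
};

// Split a linear address into (page, offset) for a given page size.
// page_size must be non-zero.
inline PageLocation locate(std::size_t linear_address, std::size_t page_size) {
    return PageLocation{linear_address / page_size, linear_address % page_size};
}
```

With a page size of 4096, linear address 10000 lands in page 2 at offset 1808.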
In debug mode I saw that the pointers have addresses like 0x01210040,
but as I realized, 0x means hexadecimal, right? And there are 8 hex digits, i.e. in total there are 128 bits that are addressed?? So does that mean that for a 32-bit system the first two digits are always 0, and for a 64-bit system the first digit is 0?
Also, may I ask that, for a 32-bit program, would I be able to allocate as much as 3GB of memory as long as I remain in the heap and use only malloc()? Or is there some limitations the Windows system poses on a single thread? (the IDE I'm using is VS2012)
Since actually I was running a 32-bit program in a 64-bit system, but the program crashed with a memory leak when it only allocated about 1.5GB of memory...and I can't seem to figure out why.
(Oooops...sorry guys I think I made a simple mistake with the first question...indeed one hex digit is 4 bits, and 8 makes 32bits. However here is another question...how is address represented in a 64-bit program?)
For 32-bit Windows, the limit is actually 2GB usable per process, with virtual addresses from 0x00000000 (or simply 0x0) through 0x7FFFFFFF. The rest of the 4GB address space (0x80000000 through 0xFFFFFFFF) is reserved for use by Windows itself. Note that these have nothing to do with the actual physical memory addresses.
If your program is large address space aware, this limit is increased to 3GB on 32bit systems and 4GB for 32bit programs running on 64bit Windows.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).aspx
And for the higher limits for large address space aware programs (IMAGE_FILE_LARGE_ADDRESS_AWARE), see here:
http://msdn.microsoft.com/en-us/library/aa366778.aspx
You might also want to take a look at the Virtual Memory article on Wikipedia to better understand how the mapping between virtual addresses and physical addresses works. The first MSDN link above also has a short explanation:
The virtual address space for a process is the set of virtual memory
addresses that it can use. The address space for each process is
private and cannot be accessed by other processes unless it is shared.
A virtual address does not represent the actual physical location of
an object in memory; instead, the system maintains a page table for
each process, which is an internal data structure used to translate
virtual addresses into their corresponding physical addresses. Each
time a thread references an address, the system translates the virtual
address to a physical address. The virtual address space for 32-bit
Windows is 4 gigabytes (GB) in size and divided into two partitions:
one for use by the process and the other reserved for use by the
system. For more information about the virtual address space in 64-bit
Windows, see Virtual Address Space in 64-bit Windows.
EDIT: As user3344003 points out, these values are not the amount of memory you can allocate using malloc or otherwise use for storing values, they just represent the size of the virtual address space.
There are a number of limits that would restrict the size of your malloc allocation.
1) The number of bits restricts the size of the address space. For 32 bits, that is 4GB.
2) Systems then subdivide that for the various processor modes. These days, usually 2GB goes to the user and 2GB to the kernel.
3) The address space may be limited by the size of the page tables.
4) The total virtual memory may be limited by the size of the page file.
5) Before you start malloc'ing, there is already stuff in the virtual address space (e.g., code, stack, reserved area, data). Your malloc needs to return a contiguous block of memory. The largest theoretical block it could return has to fit within the unallocated areas of virtual memory.
6) Your memory management heap may restrict the size that can be allocated.
There are probably other limitations that I have omitted.
-=-=-=-=-
If your program crashed after allocating 1.5GB through malloc, did you check the return value from malloc to see if it was not null?
-=-=-=-=-=
The best way to allocate huge blocks of memory is through operating system services that map pages into the virtual address space, not through malloc.
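On the earlier point about checking the return value from malloc: a minimal sketch of what that check can look like. `checked_malloc` is a hypothetical helper, not part of any library, and the `%zu` format assumes a C99/C++11-conforming runtime.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: allocate `bytes` and report a failure on stderr
// instead of letting a later null-pointer dereference crash the program.
// The caller still has to test the result before using it.
inline void* checked_malloc(std::size_t bytes) {
    void* block = std::malloc(bytes);
    if (block == nullptr) {
        std::fprintf(stderr, "malloc(%zu) failed\n", bytes);
    }
    return block;
}
```

If the 1.5GB request had gone through something like this, a failed allocation would show up as a message rather than as a crash somewhere later in the program.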
In reference to the following article
For a 32-bit application launched in a 32-bit Windows, the total size of all the mentioned data types must not exceed 2 Gbytes.
The same 32-bit program launched in a 64-bit system can allocate about 4 Gbytes (actually about 3.5 Gbytes)
The practical data you are looking at is around 1.7 GB due to space occupied by windows.
By any chance, how did you find out how much memory it had allocated when it crashed?
If we compile and execute the code below:
int *p;
printf("%d\n", (int)sizeof(p));
it seems that the size of a pointer, whatever the pointed-to type, is 4 bytes, which means 32 bits, so 2^32 addresses can be stored in a pointer. Since every address is associated with 1 byte, 2^32 bytes give 4 GB.
So, how can a pointer point to the address after 4 GB of memory? And how can a program use more than 4 GB of memory?
By principle, if you can't represent an address which goes over 2^X-1 then you can't address more than 2^X bytes of memory.
This is true for x86, even though workarounds such as PAE have been implemented and used to allow more physical memory, with limits that come from these being more hacks than real solutions to the problem.
With a 64 bit architecture the standard size of a pointer is doubled, so you don't have to worry anymore.
Mind that, in any case, virtual memory translates addresses from the process space to the physical space so it's easy to see that a hardware could support more memory even if the maximum addressable memory from the process point of view is still limited by the size of a pointer.
"How can a pointer point to the address after 4GB of memory?"
There is a difference between the physical memory available to the processor and the "virtual memory" seen by the process. A 32 bit process (which has a pointer of size 4 bytes) is limited to 4GB however the processor maintains a mapping (controlled by the OS) that lets each process have its own memory space, up to 4GB each.
That way 8GB of memory could be used on a 32 bit system, if there were two processes each using 4GB.
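The relationship between pointer size and addressable memory can be written down directly. A small sketch with invented helper names; it reports the pointer width of whatever platform it is compiled for:

```cpp
#include <climits>
#include <cstddef>

// Number of bits in a data pointer on the platform this is compiled for.
constexpr unsigned pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// Number of distinct byte addresses representable in `bits` bits,
// expressed in GiB. Only valid for 30 <= bits <= 63.
constexpr unsigned long long addressable_gib(unsigned bits) {
    return 1ULL << (bits - 30);
}
```

`addressable_gib(32)` is 4, which is the per-process ceiling discussed above; in a 64-bit build `pointer_bits()` returns 64 and the ceiling is no longer a practical concern.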
To access >4GB of address space you can do one of the following:
Compile in x86_64 (64 bit) on a 64 bit OS. This is the easiest.
Use AWE memory. AWE allows mapping a window of memory which (usually) resides above 4GB. The window address can be mapped and remapped again and again. Was used in large database applications and RAM drives in the 32 bit era.
Note that a memory address where the MSB is 1 is reserved for the kernel. Under certain conditions Windows allows a process to use up to 3GB; the top 1GB is always for the kernel.
By default a 32 bit process has 2GB of user mode address space. It's possible to get 3GB via a special linker flag (in VS: /LARGEADDRESSAWARE).
I implemented a bloom filter (bit table) using a three-dimensional char array. It works well until it reaches a point where it can no longer allocate memory and gives a bad_alloc message. It gives me this error on the next expand request after allocating 600MB.
The bloom filter (the array) is expected to grow as big as 8 to 10GB.
Here is the code I used to allocate (expand) the bit table.
//ROWS, COLUMNS and chunk_delete_bit_table() are defined elsewhere in the program
unsigned char ***bit_table_=0;
unsigned int ROWS_old=5;
unsigned int EXPND_SIZE=5;
void expand_bit_table()
{
    FILE *temp;
    temp=fopen("chunk_temp","w+b");
    //copy old content
    for(int i=0;i<ROWS_old;++i)
        for(int j=0;j<ROWS;++j)
            fwrite(bit_table_[i][j],COLUMNS,1,temp);
    fclose(temp);
    //delete old table
    chunk_delete_bit_table();
    //create expanded bit table ==> add EXPND_SIZE more rows
    bit_table_=new unsigned char**[ROWS_old+EXPND_SIZE];
    for(int i=0;i<ROWS_old+EXPND_SIZE;++i)
    {
        bit_table_[i]=new unsigned char*[ROWS];
        for(int k=0;k<ROWS;++k)
            bit_table_[i][k]=new unsigned char[COLUMNS];
    }
    //copy back old content, one row buffer at a time to mirror the fwrite loop;
    //reading COLUMNS*ROWS bytes into bit_table_[i] would overwrite the row pointers
    temp=fopen("chunk_temp","r+b");
    for(int i=0;i<ROWS_old;++i)
        for(int j=0;j<ROWS;++j)
            fread(bit_table_[i][j],COLUMNS,1,temp);
    fclose(temp);
    //set remaining content of bit_table_ to 0
    for(int i=ROWS_old;i<ROWS_old+EXPND_SIZE;++i)
        for(int j=0;j<ROWS;++j)
            for(int k=0;k<COLUMNS;++k)
                bit_table_[i][j][k]=0;
    ROWS_old+=EXPND_SIZE;
}
What is the maximum allowable size for an array, and if this is not the issue, what can I do about it?
EDIT:
It is developed using a 32 bit platform.
It is run on 64 bit platform(server) with 8GB RAM.
A 32-bit program must allocate memory from the virtual memory address space, which also stores chunks of code and data; memory is allocated from the holes between them. Yes, the maximum you can hope for is around 650 megabytes, the largest available hole. That goes rapidly down from there. You can solve it by making your data structure smarter, like a tree or list instead of one giant array.
You can get more insight in the virtual memory map of your process with the SysInternals' VMMap utility. You might be able to change the base address of a DLL so it doesn't sit plumb in the middle of an otherwise empty region of the address space. Odds that you'll get much beyond 650 MB are however poor.
There's a lot more breathing room on a 64-bit operating system, a 32-bit process has a 4 gigabyte address space since the operating system components run in 64-bit mode. You have to use the /LARGEADDRESSAWARE linker option to allow the process to use it all. Still, that only works on a 64-bit OS, your program is still likely to bomb on a 32-bit OS. When you really need that much VM, the simplest approach is to just make a 64-bit OS a prerequisite and build your program targeting x64.
A 32-bit machine gives you a 4GB address space.
The OS reserves some of this (half of it by default on Windows, giving you 2GB to yourself. I'm not sure about Linux, but I believe it reserves 1GB).
This means you have 2-3 GB to your own process.
Into this space, several things need to fit:
your executable (as well as all dynamically linked libraries) are memory-mapped into it
each thread needs a stack
the heap
and quite a few other nitty gritty bits.
The point is that it doesn't really matter how much memory you end up actually using. But a lot of different pieces have to fit into this memory space. And since they're not packed tightly into one end of it, they fragment the memory space. Imagine, for simplicity, that your executable is mapped into the middle of this memory space. That splits your 3GB into two 1.5GB chunks. Now say you load two dynamic libraries, and they subdivide those two chunks into four 750MB ones. Then you have a couple of threads, each needing further chunks of memory, splitting up the remaining areas further. Of course, in reality each of these won't be placed at the exact center of each contiguous block (that'd be a pretty stupid allocation strategy), but nevertheless, all these chunks of memory subdivide the available memory space, cutting it up into many smaller pieces.
You might have 600MB memory free, but you very likely won't have 600MB of contiguous memory available. So where a single 600MB allocation would almost certainly fail, six 100MB allocations may succeed.
There's no fixed limit on how big a chunk of memory you can allocate. The answer is "it depends". It depends on the precise layout of your process' memory space. But on a 32-bit machine, you're unlikely to be able to allocate 500MB or more in a single allocation.
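One way to act on the "several smaller allocations" observation is to store the bit table as independent chunks, so that growing never needs one large contiguous block and never copies old data. The following is a sketch only; `ChunkedBitTable` is an invented class, not the asker's code, and it still cannot exceed the total address space of a 32-bit process.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Bit table stored as many fixed-size chunks instead of one big array.
class ChunkedBitTable {
public:
    explicit ChunkedBitTable(std::size_t chunk_bytes) : chunk_bytes_(chunk_bytes) {}

    // Append `count` zero-filled chunks. Existing chunks are left in place.
    // Throws std::bad_alloc if a chunk cannot be allocated.
    void grow(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            // The trailing () value-initialises the array, so all bits start at 0.
            std::unique_ptr<unsigned char[]> chunk(new unsigned char[chunk_bytes_]());
            chunks_.push_back(std::move(chunk));
        }
    }

    std::size_t size_bits() const { return chunks_.size() * chunk_bytes_ * 8; }

    // `bit` must be smaller than size_bits(); no bounds check is done here.
    void set(std::size_t bit) {
        const std::size_t byte = bit / 8;
        chunks_[byte / chunk_bytes_][byte % chunk_bytes_] |= mask(bit);
    }

    bool test(std::size_t bit) const {
        const std::size_t byte = bit / 8;
        return (chunks_[byte / chunk_bytes_][byte % chunk_bytes_] & mask(bit)) != 0;
    }

private:
    static unsigned char mask(std::size_t bit) {
        return static_cast<unsigned char>(1u << (bit % 8));
    }

    std::size_t chunk_bytes_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

A chunk size in the range of a few megabytes keeps each request small enough to fit into the holes of a fragmented address space, while the vector of pointers itself stays tiny.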
The maximum in-memory data a 32-bit process can access is 4GB in theory (in practice it will be somewhat smaller). So you cannot have 10GB data in memory at once (even with the OS supporting more). Also, even though you are allocating the memory dynamically, the free store available is further limited by the stack size.
The actual memory available to the process depends on the compiler settings that generates the executable.
If you really do need that much, consider persisting (parts of) the data in the file system.
Recently I have been working in C++ and I have to create an array[60.000][60.000]. However, I cannot create this array because it's too large. I tried float **array and even static float array, but nothing works. Does anyone have any ideas?
Thanks for your help!
A matrix of size 60,000 x 60,000 has 3,600,000,000 elements.
You're using type float so it becomes:
60,000 x 60,000 * 4 bytes = 14,400,000,000 bytes ~= 13.4 GB
Do you even have that much memory in your machine?
Note that the issue of stack vs heap doesn't even matter unless you have enough memory to begin with.
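The size calculation above, written out as a small sketch. The function names are made up, and the element size of 4 bytes for float is an assumption that holds on common platforms.

```cpp
#include <cstdint>

// Bytes needed for a dense rows x cols matrix with the given element size.
// 64-bit arithmetic is used so the product does not wrap on 32-bit targets.
constexpr std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols,
                                     std::uint64_t element_size) {
    return rows * cols * element_size;
}

// Convert a byte count to GiB (2^30 bytes).
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

`matrix_bytes(60000, 60000, 4)` is 14,400,000,000, which is more than 2^32, so the matrix cannot even be addressed by a 32-bit process regardless of where it is declared.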
Here's a list of possible problems:
You don't have enough memory.
If the matrix is declared globally, you'll exceed the maximum size of the binary.
If the matrix is declared as a local array, then you will blow your stack.
If you're compiling for 32-bit, you have far exceeded the 2GB/4GB addressing limit.
Does "60.000" actually mean "60000"? If so, the size of the required memory is 60000 * 60000 * sizeof(float), which is roughly 13.4 GB. A typical 32-bit process is limited to only 2 GB, so it is clear why it doesn't fit.
On the other hand, I don't see why you shouldn't be able to fit that into a 64-bit process, assuming your machine has enough RAM.
Allocate the memory at runtime -- consider using a memory mapped file as the backing. Like everyone says, 14 gigs is a lot of memory. But it's not unreasonable to find a computer with 14GB of memory, nor is it unreasonable to page the memory as necessary.
With a matrix of this size, you will likely become very curious about memory access performance. Remember to consider the cache grain of your target architecture and if your target has a TLB you may be able to use larger pages to relieve some TLB pressure. Then again, if you don't have enough memory you'll likely care only about how fast your storage I/O is.
If it's not already obvious, you'll need an architecture that supports a 64-bit address space in order to access this memory directly/conveniently.
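A memory mapped file as backing store could look roughly like the sketch below. It is POSIX-only (open, ftruncate, mmap), the class name `MappedMatrix` and the file path are hypothetical, and error handling is reduced to an `ok()` flag. For the full 60000 x 60000 matrix it would need a 64-bit build and about 14.4 GB of free disk space.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// rows x cols float matrix whose storage is a file mapped into memory.
// The OS pages parts of the file in and out as they are accessed.
class MappedMatrix {
public:
    MappedMatrix(const char* path, std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), bytes_(rows * cols * sizeof(float)) {
        const int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) return;
        // Extend the file to the full size; newly added bytes read as zero.
        if (ftruncate(fd, static_cast<off_t>(bytes_)) == 0) {
            void* p = mmap(nullptr, bytes_, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) data_ = static_cast<float*>(p);
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedMatrix() {
        if (data_ != nullptr) munmap(data_, bytes_);
    }

    MappedMatrix(const MappedMatrix&) = delete;
    MappedMatrix& operator=(const MappedMatrix&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Row-major element access; no bounds checking.
    float& at(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::size_t bytes_;
    float* data_ = nullptr;
};
```

Row-major access keeps neighbouring elements of a row on the same page, which matters once the matrix no longer fits in RAM and every page miss turns into disk I/O.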
To initialise the 2D array of floats that you want, you will need:
60000 * 60000 * 4 bytes = 14400000000 bytes
Which is approximately 14GB of memory. That's a LOT of memory. To even hold that theoretically, you will need to be running a 64bit machine, not to mention one with quite a bit of RAM installed.
Furthermore, allocating this much memory is almost never necessary in most situations, are you sure no optimisations could be made here?
EDIT:
In light of new information from your comments on other answers: You only have 4GB memory (RAM). Your operating system is hence going to have to page at least 9GB on the Hard Drive, in reality probably more. But you also only have 20GB of Hard Drive space. This is barely enough to page all that data, especially if the disk is fragmented. Finally, (I could be wrong because you haven't stated explicitly) it is quite possible that you're running a 32bit machine. This isn't really capable of handling more than 4GB of memory at a time.
I had this problem too. I did a workaround where I chopped the array into sections (my biggest allowed array was float A_sub_matrix_20[62944560]). When I declared just one of these in main(), it seems to be put in RAM as I got a runtime exception as soon as main() starts. I was able to declare 20 buffers of that size as global variables which works (looks like in global form they are stored on the HDD - when I added A_sub_matrix_20[n] to the watch list in VisualStudio it gave a message "reading from file").