Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
//size_t nBufSize = nBandBytes*m_nBandCount;
pTempBuffer2[0] = new byte[nBandBytes];
pTempBuffer2[1] = new byte[nBandBytes];
pTempBuffer2[2] = new byte[nBandBytes];
}
catch (std::bad_alloc)
{
// If we didn't get the memory just don't buffer and we will get data one
// piece at a time.
return;
}
I was hoping that I would be able to allocate memory until the app reached the 4 gigabyte limit of 32 bit addressing. However, when nBandBytes is 466,560,000 the new throws std::bad_alloc on the second try. At this stage, the working set (memory) value for the process is 665,232 K So, it I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?
As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you compile with the LargeAddressAware flag, 64-bit Windows will let you use the full 4GB address space, which should help a bit, but in general,
you shouldn't assume that the available memory is contiguous. You should be able to work with multiple smaller allocations rather than a few big ones, and
You should compile it as a 64-bit application if you need a lot of memory.
on windows 32 bit, the normal process can take 2 GB at maximum, but with /3GB switch it can reach to 3 GB (for windows 2003).
but in your case I think you are allocating contiguous memory, and so the exception occured.
You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all - it's just a convenient place to store the data that in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, providing you don't actually use it). Your address space in the 32 bit world is 4GB. Of those, 2 GB are allocated to the kernel (or 1GB if you use the /3BG switch). Of the 2GB that are left, some is going to be used up by your stack, some by the program you are currently running, (and all the dlls, etc..). It's going to get fragmented, and you are only going to be able to get so much contiguous space - this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory you have allocated for you, it's possible to allocate much more memory, and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
Under 32-bit Windows, the maximum allocatable is 16TB and 256TB in 64 bit Windows.
And if you're really into how memory management works in Windows, read this article.
During the ElephantsDream project the Blender Foundation with Blender 3D had similar problems (though on Mac). Can't include the link but google: blender3d memory allocation problem and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB. A 32-bit app typically only has access to 2 GB of memory (more if you boot with /3GB and the executable is marked as large address space aware). You may be hard pressed to find that many blocks of contiguous address space for your large chunks of memory.
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.
You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.
Even if you allocate in smaller chunks, you couldn't get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the heap space on a 32-bit process caps at around 1.2GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.
Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.
Related
I am debugging a program which crashed because no contiguous memory can be used for my vector which needs to be reallocated. So I have a question how come the virtual memory isnot used? In which way can virtual memory be used? Thanks.
Virtual memory is used automatically by OS. You don't need to care about this.
In your case, it's most likely that you run a 32-bit application. User address space for a 32-bit process in Windows is limited to 2 GB (well, 3 GB if Windows is booted with a specific key). If your vector requires more than several hundred megabytes of contiguous address space, this may become a problem (due to address space fragmentation).
Of course, any process can run out of memory (even while using virtual memory and swap file and whatever else). Take a look at memory usage of your program in Task Manager.
Virtual memory is the only memory you ever get as a program running on a modern OS (Linux, Unix, Windows, MacOS, Symbian, etc).
It sounds like your problem is that there isn't one contiguous virtual address range that is large enough for your vector [1]. I suspect what is happening is that you need, say, more than 1.5GB in a 32-bit process, which can only use 2GB at once, so there isn't much "room" on either end to stuff other bits into before the "middle" is smaller than 1.5GB - in particular, if you have a vector that is growing, you will need two copies of the vector, one at it's current size, and one at double the size to copy into.
A simple solution, assuming you know how big the vector needs to be is to set it's size, e.g. vector<int> vec(some_size);
If you don't know, there are some more solutions:
If you have a 64-bit OS, you could try setting the LARGEADDRESSAWARE flag for the executable (assuming it's Windows). That should give you a fair bit more memory, since the 64-bit OS doesn't have to reserve a large chunk of memory space for the OS itself (that lives well outside the 32-bit address range. In a 32-bit OS, you need to boot the OS with /3GB, and set the above flag.
Or compile the code as 64-bit (after upgrading to a 64-bit OS, if needed).
[1] Unless of course, you are writing a driver and trying to allocate many megabytes of physical memory as a buffer to use for DMA - but I think you would have said so.
The problem has nothing to do with memory, or even with virtual memory. An array needs a contiguous range of addresses. The address space (normally 2 GB in a Win32 program) is fragmented so that there is not a large enough space available.
If you could get the addresses Windows would automatically provide the virtual memory to go with them.
It is time to move your app up to 64 bits.
I want to run this huge C++ project that uses up to 8.3 GB in memory. Can I run this program under certain circumstances or is it impossible ?
It's fine. You just need to be on a 64-bit architecture, and ensure that there's sufficient swap space + physical memory available
It really depends. If the program needs to have all the 8.3 GB in memory all the time (working size), you may need to have a similar amount of memory installed in your computer.
Let's now assume you have 4 GB of RAM. In such a case you will be most probably able to execute the program thanks to the use of swap (hard disk area where memory is swapped in and out with the intention of enlarging the virtual memory size). But, even if it may actually work, it could run really slow (up to the point that is not really usable) because of trashing.
On the other hand, if your program processes 8.3 GB of data, but it is processed in smaller chunks, that will mean that all the data is not in memory all the time. Then, you will not need to have installed such a big amount of RAM in your computer.
As Oli Charlesworth was mentioning you will need a 64-bit system (both the hardware and OS) or, at least, a system with PAE capabilities if you want to install more than 4 GB of RAM in your system.
Yes it is possible. You need to be in a 64-bit environment and, of course, have the RAM available. You may still be unable to allocate more than 4gb of contiguous address space at a time. It's possible that you'll have to allocate it in smaller chunks, though.
I'v run into an odd problem, my process cannot allocate more than what seems to be slightly below 1 GiB. Windows Task Manager "Mem Usage" column shows values close to 1 GiB when my software gives a bad_alloc exception. Yes, i'v checked that the value passed to memory allocation is sensible. ( no race condition / corruption exists that would make this fail ). Yes, I need all this memory and there is no way around it. ( It's a buffer for images, which cannot be compressed any further )
I'm not trying to allocate the whole 1 GiB memory in one go, there a few allocations around 300 MiB each. Would this cause problems? ( I'll try to see if making more smaller allocations works any better ). Is there some compiler switch or something else that I must set in order to get past 1 GiB? I've seen others complaining about the 2 GiB limit, which would be fine for me.. I just need little bit more :). I'm using VS 2005 with SP1 and i'm running it on a 32 bit XP and it's in C++.
On a 32-bit OS, a process has a 4GB address space in total.
On Windows, half of this is off-limits, so your process has 2GB.
This is 2GB of contiguous memory. But it gets fragmented. Your executable is loaded in at one address, each DLL is loaded at another address, then there's the stack, and heap allocations and so on. So while your process probably has enough free address space, there are no contiguous blocks large enough to fulfill your requests for memory. So making smaller allocations will probably solve it.
If your application is compiled with the LARGEADDRESSAWARE flag, it will be allowed to use as much of the remaining 2GB as Windows can spare. (And the value of that depends on your platform and environment.
for 32-bit code running on a 64-bit OS, you'll get a full 4-GB address space
for 32-bit code running on a 32-bit OS without the /3GB boot switch, the flag means nothing at all
for 32-bit code running on a 32-bit OS with the /3GB boot switch, you'll get 3GB of address space.
So really, setting the flag is always a good idea if your application can handle it (it's basically a capability flag. It tells Windows that we can handle more memory, so if Windows can too, it should just go ahead and give us as large an address space as possible), but you probably can't rely on it having an effect. Unless you're on a 64-bit OS, it's unlikely to buy you much. (The /3GB boot switch is necessary, and it has been known to cause problems with drivers, especially video drivers)
Allocating big chunks of continuous memory is always a problem.
It is very likely to get more memory in smaller chunks
You should redesign your memory structures.
You are right to suspect the larger 300MB allocations. Your process will be able to get close to 2GB (3 if you use the /3GB boot.ini switch and LARGEADDRESSAWARE link flag), but not as a large contiguous block.
Typical solutions for this are to break up the requests into tiles or strips of fixed size (say 256x256x4 bytes) and write an intermediate class to hide this representation detail.
You can quickly verify this by writing a small allocation loop that allocate blocks of different sizes.
You could also check this function from MSDN. 1GB rings a bell from here:
This parameter must be greater than or equal to 13 pages (for example,
53,248 on systems with a 4K page size), and less than the system-wide
maximum (number of available pages minus 512 pages). The default size
is 345 pages (for example, this is 1,413,120 bytes on systems with a
4K page size).
Here they mentioned that default maximum number of pages allowed for a process is 345 pages which is slightly more than 1GB.
When I have a few big allocs like that to do, I use the Windows function VirtualAlloc, to avoid stressing the default allocator.
Another way forward might be to use nedmalloc in your project.
My C++ program caches lots of objects, and in beginning of each major API call, I want to ensure that there is at least 500 MB available for the API call. I may either be running out of RAM+swap space (consider system with 1 GB RAM + 1 GB SWAP file), or I may be running out of Virtual Address in my process.(I may already be using 3.7 GB out of total 4GB address space). It's not easy for me to approximate how much data I have cached, but I can purge some of it if it is becoming an issue, and do so iteratively till I have 500 MB available in system or address space (whichever is becoming bottleneck). So my requirements are to find in C++ on 32 bit Linux:
A) Find how much RAM + SWAP space is free.
B) How much user space address space is available to my process.
C) How much Virtual Memory the process is already using. Consider it similar to 'Commit Size' or 'Working Set Size' of a process on Windows.
Any answers would be greatly appreciated.
Look at /proc/vmstat there is a lot of information about the system wide memory.
The /proc//maps will give you a lot of information about your process memory layout.
Note that if you check the memory before running a long job, another process may eat all the available memory and your program may crash anyway !
I do not know anything about your cached classes but if these objects are quite small you probably have overridden the new/delete operators. By this it is quite easy to keep track of the memory consumption (at least by counting objects)
Why not change your cache policy ? And flush old unused object.
Another ugly way is to try to allocate several chunk of memory and see the program can allocate it, and release it after that. On 32 bits it may fail because the heap may be fragmented, but if it works you sure that you have enough memory at this time.
Take a look at the source for the vmstat : here. Then search for domem() function, which gather all information about the memory (occupied and free).
I'm using an image manipulation library that throws an exception when I attempt to load images > 4GB in size. It claims to be 64bit, but wouldn't a 64bit library allow loading images larger than that? I think they recompiled their C libraries using a 64 bit memory model/compiler but still used unsigned integers and failed upgrade to use 64 bit types.
Is that a reasonable conclusion?
Edit - As an after-thought can OS memory become so fragemented that allocation of large chunks is no longer possible? (It doesn't work right after a reboot either, but just wondering.) What about under .NET? Can the .NET managed memory become so fragmented that allocation of large chunks fails?
It's a reasonable suggestion, however the exact cause could be a number of things - for example what OS are you running, how much RAM / swap do you have? The application/OS may not over-commit virtual memory so you'll need 4GB (or more) of free RAM to open the image.
Out of interest does it seem to be a definite stop at the 4GB boundary - i.e. does a 3.99GB image succeed, but a 4GB one fail - you say it does which would suggest a definite use of a 32bit size in the libraries data structures.
Update
With regards your second question - not really. Pretty much all modern OS's use virtual memory, so each process gets it's own contiguous address space. A single contiguous region in a processes' address space doesn't need to be backed by contiguous physical RAM, it can be made up of a number of separate physical areas of RAM made to look like they are contiguous; so the OS doesn't need to have a single 4GB chunk of RAM free to give your application a 4GB chunk.
It's possible that an application could fragment it's virtual address space such that there isn't room for a contiguous 4GB region, but considering the size of a 64-bit address space it's probably highly unlikely in your scenario.
Yes, unless perhaps the binary file format itself limits the size of images.