Memory allocation in MSVC++ 2019 - c++

I have a question regarding the memory allocation, particularly when using MSVC2019.
I have a C++ program compiled to x64.
While debugging, I saw that allocated variables get very high pointer addresses, pointing to locations above the first 4 GB of the address space (the 32-bit range). If I check the program in the Task Manager, I see it is using only around 30-50 MB of memory.
What is the reason that the variables are not allocated in the lower part of the virtual address space when practically the whole address space under 4 GB is unused?
I would expect allocation to start from low addresses, with no need to allocate above 4 GB until the first 4 GB are used up.
Why is this interesting for me:
I have a large piece of software containing C++ code more than 15 years old that was not prepared for 64-bit everywhere: in many places it casts pointers to 32-bit types, and this damages the pointers. Most probably the original authors assumed that pointers are 32 bits wide. That should practically hold even when compiled for 64-bit, since the program does not use much memory and its memory usage never grows beyond 4 GB. It also seems that this problem does not appear when compiling with compilers from 2010; probably memory allocations back then returned addresses in the first 4 GB block even when compiled for x64.
My question is:
Can this allocation strategy be influenced somehow in MSVC++ 2019? E.g. by instructing the compiler/linker/memory manager to prefer allocating in the first 32-bit range until more is needed? Or by setting a size limit for the virtual address space offered by the memory manager, e.g. by setting it to 2 GB I could ensure that no pointer ever points to an allocated block above 4 GB. That way, the old code would survive the cast operations that assume a pointer is 32 bits.
I already tried setting the linker option for high memory awareness to NO and checked the heap parameters, but neither helped.
Thank you!

If your program assumes pointers will be 32-bit, you will just have to compile for 32-bit until you get proper declarations in place using ifdef to check what you are compiling for.
Just pick the x86 instead of x64 from the dropdown as a work around until you modernize your legacy code.
There's more you can do with a big address space, and since the OS maps these addresses to portions of physical memory anyway, the OS and toolchain are free to reap the benefits by keeping different portions of the address space apart for different purposes.
There are ways to create custom heaps and to allocate things at specific addresses if that address range is available; however, working these into the code would likely take just as long as, and be a step backwards compared to, properly using correctly sized types.
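To make the failure mode concrete, here is a minimal sketch of what a cast to a 32-bit type does to an address. The function name and the sample values are made up for illustration, and addresses are modeled as std::uint64_t so the example behaves the same on any platform.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if an address is unchanged after being
// squeezed through a 32-bit integer, which is what the legacy casts do.
bool survives_32bit_cast(std::uint64_t address) {
    // The conversion keeps only the low 32 bits of the address.
    const std::uint32_t truncated = static_cast<std::uint32_t>(address);
    return static_cast<std::uint64_t>(truncated) == address;
}
```

Any address below 4 GB passes this check, which is why the old code appeared to work for as long as allocations happened to land there. Anything above 4 GB comes back as a different, usually invalid, address.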

Welcome to the world of virtual memory! To allocate memory dynamically, the standard library kindly asks the kernel to provide it, and only the kernel is responsible for the virtual addresses given to the program. As each process has its own virtual address translation, multiple processes can be given the same virtual addresses.
As a programmer, you should never worry about that. Use the memory addresses that the kernel has given you and carry on. If you have to use legacy code that assumes a pointer cannot exceed 32 bits, you should simply not compile it in 64-bit mode but only in 32-bit mode.

Related

Is it true that 32Bit program will be out of memory, if other programs use too much, in 64bit windows?

I am developing a 32 bit application and got out of memory error.
And I noticed that my Visual Studio and a plugin (other apps too) used too much memory which is around 4 or 5 GB.
So I suspected that these programs use up all the memory addresses where my program would be able to find free memory.
I suppose that 32 bit can only use the first 4 GB, other memory it can not use at all.
I don't know if I am correct about this; otherwise I will look for other explanations, like a bug in my code.
Your statement of
"I suppose that 32 bit can only use the first 4 GB, other memory it can not use at all."
is definitely incorrect. On a 64-bit OS, all applications can use all of the memory, regardless of their bitness, thanks to the translation table from virtual to physical memory being 64-bit.
Some really ancient hardware may not allow DMA to addresses above 4GB, but I really hope most of that is in the junk-yard by now.
If the system as a whole is running low on memory, it will affect all applications more or less equally.
However, a 32-bit application can only, by default, use the lower 2GB of the virtual address range (although these 2GB can be placed anywhere in the physical memory, as described above by means of a 64-bit translation table). You can extend this to nearly 4GB (3GB in a 32-bit OS, and subject to the /3GB boot flag in this case) by using /LARGEADDRESSAWARE in your linking command - this simply tells the OS that your application will "understand" that addresses can be negative, and thus will operate correctly with addresses over 2GB.
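As a small illustration of what "addresses can be negative" means, the sketch below reinterprets 32-bit addresses as signed values. The helper names are hypothetical, and the "legacy" comparison is an assumed example of code that would misbehave with addresses above 2GB.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit address as a signed value without changing its bits.
std::int32_t as_signed(std::uint32_t address) {
    std::int32_t result;
    std::memcpy(&result, &address, sizeof result);
    return result;
}

// Hypothetical legacy comparison that treats addresses as signed 32-bit ints.
bool legacy_address_less(std::uint32_t a, std::uint32_t b) {
    return as_signed(a) < as_signed(b);
}
```

With this comparison, an address at 0x80000000 sorts before one at 0x7FFFFFFF. Code that does this kind of thing is not safe to link with /LARGEADDRESSAWARE.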
Any system can be brought down by a too heavy load.
But in normal use in Windows and any other virtual memory OS, the memory consumption of other programs does not much affect any given program execution.
Getting an out of memory error is unusual, but it can happen if you make a large allocation or if you declare a large local automatic variable. It can also happen if you fail to properly deallocate memory that's no longer used, i.e. if the program is leaking memory. For a 32-bit program on a 64-bit machine it's then not memory itself that's used up, but available address space within the program.

when will virtual memory be used (windows)?

I am debugging a program which crashed because no contiguous memory could be found for my vector when it needed to be reallocated. So I have a question: how come virtual memory is not used? In which way can virtual memory be used? Thanks.
Virtual memory is used automatically by the OS. You don't need to care about this.
In your case, it's most likely that you run a 32-bit application. User address space for a 32-bit process in Windows is limited to 2 GB (well, 3 GB if Windows is booted with a specific key). If your vector requires more than several hundred megabytes of contiguous address space, this may become a problem (due to address space fragmentation).
Of course, any process can run out of memory (even while using virtual memory and swap file and whatever else). Take a look at memory usage of your program in Task Manager.
Virtual memory is the only memory you ever get as a program running on a modern OS (Linux, Unix, Windows, MacOS, Symbian, etc).
It sounds like your problem is that there isn't one contiguous virtual address range that is large enough for your vector [1]. I suspect what is happening is that you need, say, more than 1.5GB in a 32-bit process, which can only use 2GB at once, so there isn't much "room" on either end to stuff other bits into before the "middle" is smaller than 1.5GB - in particular, if you have a vector that is growing, you will need two copies of the vector, one at its current size, and one at double the size to copy into.
A simple solution, assuming you know how big the vector needs to be, is to set its size up front, e.g. vector<int> vec(some_size);
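A rough sketch of the difference this makes, using reserve() rather than the sizing constructor so the vector starts empty. The function name is made up for illustration. It counts how often the vector's capacity changes while elements are appended, and every such change means a second, larger block had to exist alongside the old one.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector had to grow while appending elements.
std::size_t count_reallocations(std::size_t element_count, bool reserve_first) {
    std::vector<int> values;
    if (reserve_first) {
        values.reserve(element_count);  // one allocation of the final size
    }
    std::size_t reallocations = 0;
    std::size_t previous_capacity = values.capacity();
    for (std::size_t i = 0; i < element_count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != previous_capacity) {
            ++reallocations;
            previous_capacity = values.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the count is zero. Without it, the vector reallocates repeatedly, and the growth factor is up to the implementation.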
If you don't know, there are some more solutions:
If you have a 64-bit OS, you could try setting the LARGEADDRESSAWARE flag for the executable (assuming it's Windows). That should give you a fair bit more memory, since the 64-bit OS doesn't have to reserve a large chunk of memory space for the OS itself (that lives well outside the 32-bit address range). In a 32-bit OS, you need to boot the OS with /3GB, and set the above flag.
Or compile the code as 64-bit (after upgrading to a 64-bit OS, if needed).
[1] Unless of course, you are writing a driver and trying to allocate many megabytes of physical memory as a buffer to use for DMA - but I think you would have said so.
The problem has nothing to do with memory, or even with virtual memory. An array needs a contiguous range of addresses. The address space (normally 2 GB in a Win32 program) is fragmented so that there is not a large enough space available.
If you could get the addresses, Windows would automatically provide the virtual memory to go with them.
It is time to move your app up to 64 bits.

How to get the amount of virtual memory available in C++?

I would like to map a file into memory using the mmap function and would like to know whether the amount of virtual memory on the current platform is sufficient to map a huge file. On a 32-bit system I cannot map a file larger than 4 GB.
Would std::numeric_limits<size_t>::max() give me the amount of addressable memory or is there any other type that I should test (off_t or something else)?
As Lie Ryan has pointed out in his comment, the term "virtual memory" is misused here. The question, however, holds: there is a type associated with a pointer, and it has a maximum value that defines the upper limit of what you can possibly address on your system. What is this type? Is it size_t or perhaps ptrdiff_t?
size_t is only required to be big enough to store the biggest possible single contiguous object. That may not be the same as the size of the address space (on systems with a segmented memory model, for example)
However, on common platforms with a flat memory space, the two are equal, and so you can get away with using size_t in practice if you know the target CPU.
Anyway, this doesn't really tell you anything useful. Sure, a 32-bit CPU has a 4GB memory space, and so size_t is a 32-bit unsigned integer. But that says nothing about how much you can allocate. Some part of the memory space is used by the OS. And some parts are already used by your own application: for mapping the executable into memory (as well as any dynamic libraries it may use), for each thread's stack, allocated memory on the heap and so on.
So no, tricks such as taking the size of size_t tell you a little bit about the address space you're running in, but nothing very usable. You can ask the OS how much memory is in use by your process and other metrics, but again, that doesn't really help you much. It is possible for a process to use just a couple of megabytes, but have that spread out over so many small allocations that it's impossible to find a contiguous block of memory larger than 100MB, say. And so, on a 32-bit machine, with a process that uses nearly no memory, you'd be unlikely to make such an allocation. (And even if the OS had a magical WhatIsTheLargestPossibleMemoryAllocationICanMake() API, that still wouldn't help you. It would tell you what was possible a moment ago. You have no guarantee that the answer would still be valid by the time you tried to map the file.)
So really, the best you can do is try to map the file, and see if it fails.
Hi, you can use GlobalMemoryStatusEx and VirtualQueryEx if you are coding for Win32.
Thing is, the size of a pointer tells you nothing about how much of that "address space" is actually available to you, i.e. can be mapped as a single contiguous chunk.
It's limited by:
the operating system. It may choose to only make a subset of the theoretically-possible address range available to you, because mappable memory is needed for OS-own purposes (like, say, making the graphics card framebuffer visible, and of course for use by the OS itself).
configurable limits. On Linux / UNIX, the "ulimit" command or the setrlimit() system call allows restricting the maximum size of an application's address space in various ways, and Windows has similar options through registry parameters.
the history of the application. If the application uses memory mapping extensively, the address space can fragment limiting the maximum size of "available" contiguous virtual addresses.
the hardware platform. Some CPUs have address spaces with "holes"; an example of that is 64-bit x86, where pointers are only valid if they're between 0x0 and 0x00007fffffffffff or between 0xffff800000000000 and 0xffffffffffffffff. I.e. you have 2x128TB instead of the full 16EB. Think of it as 48-bit "signed" pointers ...
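The "hole" in the x86-64 case can be written down as a simple range check. This is only a sketch under the assumption of 48-bit virtual addresses, where the lower half ends at 0x00007fffffffffff and the upper half starts at 0xffff800000000000. The function name is made up for illustration.

```cpp
#include <cstdint>

// True if the address lies in one of the two valid halves of a 48-bit
// x86-64 address space, i.e. bits 47..63 are all zero or all one.
constexpr bool is_canonical_48(std::uint64_t address) {
    return address <= 0x00007FFFFFFFFFFFull || address >= 0xFFFF800000000000ull;
}
```

Each half covers 2^47 bytes, which is the 128TB mentioned above.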
Finally, don't confuse "available memory" and "available address space". There's a difference between doing a malloc(someBigSize) and a mmap(..., someBigSize, ...) because the former might require availability of physical memory to accommodate the request while the latter usually only requires availability of a large-enough free address range.
For UNIX platforms, part of the answer is to use getrlimit(RLIMIT_AS) as this gives the upper bound for the current invocation of your application - as said, the user and/or admin can configure this. You're guaranteed that any attempt to mmap areas larger than that will fail.
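A minimal sketch of that check, for POSIX systems only. The function name is hypothetical. It only rules out mappings that cannot possibly succeed; a length below the limit can still fail because of fragmentation or memory already in use.

```cpp
#include <sys/resource.h>
#include <cstdint>

// Returns true if a mapping of this length is certain to fail because it is
// larger than the soft RLIMIT_AS limit of the current process.
bool exceeds_address_space_limit(std::uint64_t length) {
    struct rlimit limit{};
    if (getrlimit(RLIMIT_AS, &limit) != 0) {
        return false;  // limit unknown, so nothing can be ruled out
    }
    if (limit.rlim_cur == RLIM_INFINITY) {
        return false;  // no limit configured
    }
    return length > static_cast<std::uint64_t>(limit.rlim_cur);
}
```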
Re your rephrased question: "upper limit of what you can possibly address on your system" is somewhat misleading; it's hardware-architecture specific. There are 64bit architectures out there (x64, sparc) whose MMU happily allows (uintptr_t)(-1) as a valid address, i.e. you can map something into the last page of a 64bit address space. Whether the operating system allows an application to do so or not is again an entirely different question ...
For user applications, the "high mark" isn't (always) fixed a-priori. It's tunable on e.g. Solaris or Linux. That's where getrlimit(RLIMIT_AS) comes in.
Note that again, by specification, there'd be nothing to prevent a (weird) operating system design to choose e.g. putting application stacks and heaps at "low" addresses while putting code at "high" addresses, on a platform with address space holes. You'd need full 64bit pointers there, can't make them any smaller, but there could be an arbitrary number of "inaccessible / invalid" ranges which never are made available to your app.
You can try sizeof(int*). This will give you the size (in bytes) of a pointer on the target platform. From that, you can work out an upper bound on how big the addressable space is.

How to reserve bottom 4GB VM in an x64 C++ app

Working on porting a 32bit Windows C++ app to 64 bit. Unfortunately, the code uses frequent casting in both directions between DWORD and pointer values.
One of the ideas is to reserve the first 4GB of virtual process space as early as possible during process startup so that all subsequent calls to reserve memory will return virtual addresses greater than 4 GB. This would cause an access violation on any unsafe cast from pointer to DWORD and then back to pointer, and would help catch errors early.
When I look at the memory map of a very simple one-line C++ program, there are many libraries loaded within the bottom 4GB. Is there a way to make sure that all libraries, etc. get loaded only above 4GB?
Thanks
Compile your project with /Wp64 switch (Detect 64-bit Portability Issues) and fix all warnings.
See also: As a programmer, what do I need to worry about when moving to 64-bit Windows?
You could insert calls to VirtualAlloc() as early as possible in your application, to allocate memory in the lower 4GB. If you use the MEM_RESERVE parameter, then only virtual memory space is allocated and so this will only use a very small amount of actual RAM.
However, this will only help you for memory allocated from the heap - any static data in your program will have already been allocated before WinMain(), and so you won't be able to change its location.
(As an aside, even if you could reserve memory before your main binary was loaded, I think that the main binary needs to be loaded at a specific address - unless it is built as a position-independent executable.)
Bruce Dawson posted code for a technique to reserve the bottom 4 GB of VM:
https://randomascii.wordpress.com/2012/02/14/64-bit-made-easy/
It reserves most of the address space (not actual memory) using VirtualAlloc, then goes after the process heap with HeapAlloc, and finishes off the CRT heap with malloc. It is straightforward, fast, and works great. On my machine it does about 3.8 GB of virtual allocations and only 1 MB of actual allocations.
The first time I tried it, I immediately found a longstanding bug in the project I was working on. Highly recommended.
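A simplified sketch of the first stage only (the VirtualAlloc part), written from the description above rather than copied from the linked post. The function name is made up, the HeapAlloc and malloc stages are left out, and it assumes that VirtualAlloc hands out low addresses first, which is not guaranteed. Outside a 64-bit Windows build it compiles to a no-op.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef _WIN64
#include <windows.h>
#endif

// Reserves (without committing) address space below 4 GB so that later
// allocations land above it. Returns the number of bytes reserved.
std::uint64_t reserve_low_address_space() {
    std::uint64_t reserved = 0;
#ifdef _WIN64
    const std::uint64_t four_gb = 0x100000000ull;
    const std::size_t one_mb = 1024 * 1024;
    // Start with large blocks, then shrink to fill the remaining gaps.
    for (std::size_t size = 256 * one_mb; size >= one_mb; size /= 2) {
        for (;;) {
            void* block = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
            if (block == nullptr) {
                break;  // no free range of this size is left
            }
            if (reinterpret_cast<std::uintptr_t>(block) >= four_gb) {
                VirtualFree(block, 0, MEM_RELEASE);  // too high, give it back
                break;
            }
            reserved += size;  // intentionally never released
        }
    }
#endif
    return reserved;
}
```

Call it as early as possible in a debug or test build. A truncated pointer then points into a reserved, inaccessible range, and dereferencing it faults immediately.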
The best solution is to fix these casts ...
You may get away with truncating the pointer regardless (the same as casting to a POINTER_32), because I believe Windows favours the lower 4GB for your application anyway. This is in no way guaranteed, though. You really are best off fixing these problems.
Search the code for "(DWORD)" and fix any you find. There is no better solution ...
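As a rough idea of what such a fix looks like, the sketch below stores a pointer in std::uintptr_t instead of a 32-bit integer. Widget and the function names are hypothetical. On Windows, DWORD_PTR or UINT_PTR play the same role as std::uintptr_t.

```cpp
#include <cstdint>

struct Widget { int id; };  // hypothetical type, for illustration only

static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "std::uintptr_t must be able to hold a pointer");

// Pointer-sized integer instead of a 32-bit DWORD.
std::uintptr_t to_cookie(Widget* widget) {
    return reinterpret_cast<std::uintptr_t>(widget);
}

Widget* from_cookie(std::uintptr_t cookie) {
    return reinterpret_cast<Widget*>(cookie);
}

// Small self-check: the pointer is intact after the round trip.
bool cookie_round_trip_works() {
    Widget widget{42};
    return from_cookie(to_cookie(&widget)) == &widget;
}
```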
What you are asking for is, essentially, to run 64-bit code in a 32-bit memory mode with AWE enabled (i.e. lose all the real advantages of 64-bit). I don't think Microsoft could be bothered providing this for so little gain ... and who can blame them?

How much memory should you be able to allocate?

Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
    //size_t nBufSize = nBandBytes*m_nBandCount;
    pTempBuffer2[0] = new byte[nBandBytes];
    pTempBuffer2[1] = new byte[nBandBytes];
    pTempBuffer2[2] = new byte[nBandBytes];
}
catch (const std::bad_alloc&)
{
    // If we didn't get the memory just don't buffer and we will get data one
    // piece at a time.
    return;
}
I was hoping that I would be able to allocate memory until the app reached the 4 gigabyte limit of 32-bit addressing. However, when nBandBytes is 466,560,000, new throws std::bad_alloc on the second allocation. At this stage, the working set (memory) value for the process is 665,232 K. So I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?
As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you compile with the LargeAddressAware flag, 64-bit Windows will let you use the full 4GB address space, which should help a bit, but in general,
you shouldn't assume that the available memory is contiguous. You should be able to work with multiple smaller allocations rather than a few big ones, and
You should compile it as a 64-bit application if you need a lot of memory.
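A minimal sketch of the "multiple smaller allocations" idea. ChunkedBuffer and the chunk size are made up for illustration; each chunk only needs its own small contiguous range, so fragmentation hurts far less than with one huge block.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A byte buffer stored as several independently allocated chunks.
class ChunkedBuffer {
public:
    // chunk_bytes must be greater than zero. Throws std::bad_alloc if a
    // chunk cannot be allocated; chunks allocated so far are freed.
    ChunkedBuffer(std::size_t total_bytes, std::size_t chunk_bytes)
        : chunk_bytes_(chunk_bytes), total_bytes_(total_bytes) {
        const std::size_t count = (total_bytes + chunk_bytes - 1) / chunk_bytes;
        chunks_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            // make_unique value-initializes, so every byte starts at zero.
            chunks_.push_back(std::make_unique<unsigned char[]>(chunk_bytes));
        }
    }

    unsigned char& operator[](std::size_t index) {
        return chunks_[index / chunk_bytes_][index % chunk_bytes_];
    }

    std::size_t size() const { return total_bytes_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_bytes_;
    std::size_t total_bytes_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

The price is one division per element access and the loss of a single contiguous pointer, so code that passes the raw buffer to other APIs has to work chunk by chunk.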
On 32-bit Windows, a normal process can take 2 GB at most, but with the /3GB switch it can reach 3 GB (for Windows 2003).
But in your case I think you are allocating contiguous memory, and that is why the exception occurred.
You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all - it's just a convenient place to store the data that is in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, providing you don't actually use it). Your address space in the 32-bit world is 4GB. Of those, 2GB are allocated to the kernel (or 1GB if you use the /3GB switch). Of the 2GB that are left, some is going to be used up by your stack, some by the program you are currently running (and all the DLLs, etc.). It's going to get fragmented, and you are only going to be able to get so much contiguous space - this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory you have allocated, it's possible to allocate much more memory and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
Under 32-bit Windows, the maximum allocatable is 16TB and 256TB in 64 bit Windows.
And if you're really into how memory management works in Windows, read this article.
During the ElephantsDream project the Blender Foundation with Blender 3D had similar problems (though on Mac). Can't include the link but google: blender3d memory allocation problem and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB. A 32-bit app typically only has access to 2 GB of memory (more if you boot with /3GB and the executable is marked as large address space aware). You may be hard pressed to find that many blocks of contiguous address space for your large chunks of memory.
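The arithmetic behind that figure, spelled out as compile-time constants. The constant names are made up; the numbers are the ones from the question plus the default 2 GB user address space.

```cpp
#include <cstdint>

constexpr std::uint64_t kBandBytes = 466560000ull;   // nBandBytes from the question
constexpr std::uint64_t kBufferCount = 3;            // three buffers are requested
constexpr std::uint64_t kTotalBytes = kBandBytes * kBufferCount;
constexpr std::uint64_t kDefaultUserSpace = 2ull * 1024 * 1024 * 1024;  // 2 GB

// Share of the default user address space the three buffers would occupy.
constexpr std::uint64_t kPercentOfUserSpace = kTotalBytes * 100 / kDefaultUserSpace;

static_assert(kTotalBytes == 1399680000ull, "three buffers total about 1.4 GB");
static_assert(kTotalBytes < kDefaultUserSpace, "fits only if nothing else is in the way");
```

So the three buffers need roughly 65% of the default address space, in three contiguous pieces of about 445 MB each, next to the executable, its DLLs, stacks and heaps.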
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.
You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.
Even if you allocate in smaller chunks, you may not get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the heap space on a 32-bit process caps at around 1.2GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.
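A very small sketch of such a wrapper, to show the shape of the idea rather than a tuned implementation. DiskBackedBytes and its members are made-up names. It keeps one window of the data in memory and swaps windows through a temporary file created with std::tmpfile. It assumes the platform allows seeking past the end of the file, and because std::fseek takes a long offset, total size is limited to 2 GB where long is 32 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

// A byte array that keeps only one window in memory; the rest lives on disk.
class DiskBackedBytes {
public:
    // window_bytes must be greater than zero.
    DiskBackedBytes(std::size_t total_bytes, std::size_t window_bytes)
        : file_(std::tmpfile()), window_(window_bytes), total_bytes_(total_bytes) {
        if (file_ == nullptr) throw std::runtime_error("cannot create temporary file");
    }
    ~DiskBackedBytes() { std::fclose(file_); }  // the temporary file is discarded
    DiskBackedBytes(const DiskBackedBytes&) = delete;
    DiskBackedBytes& operator=(const DiskBackedBytes&) = delete;

    unsigned char get(std::size_t index) {
        load_window_for(index);
        return window_[index % window_.size()];
    }
    void set(std::size_t index, unsigned char value) {
        load_window_for(index);
        window_[index % window_.size()] = value;
        dirty_ = true;
    }

private:
    void seek_to_window(std::size_t window_index) {
        const long offset = static_cast<long>(window_index * window_.size());
        if (std::fseek(file_, offset, SEEK_SET) != 0) throw std::runtime_error("seek failed");
    }
    void load_window_for(std::size_t index) {
        if (index >= total_bytes_) throw std::out_of_range("index out of range");
        const std::size_t wanted = index / window_.size();
        if (wanted == current_window_) return;
        if (dirty_) {  // write the modified window back before replacing it
            seek_to_window(current_window_);
            if (std::fwrite(window_.data(), 1, window_.size(), file_) != window_.size())
                throw std::runtime_error("write failed");
            dirty_ = false;
        }
        std::fill(window_.begin(), window_.end(), static_cast<unsigned char>(0));
        seek_to_window(wanted);
        // A short read is fine: parts never written to disk stay zero.
        const std::size_t bytes_read = std::fread(window_.data(), 1, window_.size(), file_);
        (void)bytes_read;
        current_window_ = wanted;
    }

    std::FILE* file_;
    std::vector<unsigned char> window_;
    std::size_t total_bytes_;
    std::size_t current_window_ = 0;  // window 0 starts out as all zeros
    bool dirty_ = false;
};
```

Sequential access touches the disk once per window. Random access across many windows would hit the disk on almost every call, so the window size and the access pattern matter a lot.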
Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.