64-bit memory allocation - C++

I've been asked to create a Delphi-compatible DLL in C++ to do simple 64-bit memory management.
The background is that a system in Delphi needs to allocate a lot of chunks of memory that together would go well outside the 32-bit addressable space. The Delphi developer explained to me that he cannot allocate that memory with the Delphi commands available to him. He says he can hold a 64-bit address, so he just wants to call a function I provide to allocate the memory and return a 64-bit pointer to him, and then another function to free up the memory later.
Now, I only have VS 2008 at my disposal, so firstly I'm not even sure I can create a Delphi-compatible DLL in the first place.
Any Delphi experts care to help me out? Maybe there is a way to achieve what he requires without reinventing the wheel. Other developers must have come across this before in Delphi.
All comments appreciated.

Only a 64-bit process can address 64-bit memory. A 64-bit process can only load 64-bit DLLs, and a 32-bit process can only load 32-bit DLLs. Delphi's compiler can only produce 32-bit binaries.
So a 32-bit Delphi exe cannot load your 64-bit C++ DLL. It could load a 32-bit C++ DLL, but then that DLL wouldn't be able to address the 64-bit memory space either. You are kind of stuck here.
Delphi could, with the right compiler options and Windows switches, address 3GB of memory without problems. Even more memory can be accessed by a 32-bit process if it uses Physical Address Extension; it then needs to swap memory pages in and out of the 32-bit address space through Address Windowing Extensions.

Delphi pointers are 32-bit. Period. Your Delphi developer may be able to 'store' the 64-bit values you want to return to him, but he can't access the memory that they point to, so it's pretty futile.
Previously, I'd written:
"A 64-bit version of Delphi is on Codegear/Embarcadero's road map for 'middle of 2009'. Product quality seems to be (at last!) taking precedence over hitting ship dates exactly, so don't hold your breath..."
But, in August 2010, Embarcadero published a new roadmap here. This doesn't give specific dates, but mentions a 64-bit Compiler Preview, with Projected Availability, 1st Half of 2011.

You might take a look at Free Pascal, as it includes a 64-bit compiler and is mostly Delphi-compatible in syntax.

In order to allocate memory shared by multiple processes, you should use a memory-mapped file.
The code available at http://www.delphifaq.com/faq/delphi_windows_API/f348.shtml can be used to communicate between a 32-bit and a 64-bit process.
Here are the steps:
Create a memory-mapped file, either on disk or in memory;
Create a mutex to notify of data changes;
One end writes some data to the memory-mapped file;
Then it signals the mutex;
The other end receives the mutex notification;
Then it reads the data from the memory-mapped file.
It's up to you to create a custom binary layout in the memory-mapped file, in order to share any kind of data.
By design, memory-mapped files are fast (the feature is implemented at kernel level, on top of the x86 CPU's paging), and they can handle huge amounts of memory (up to 1GB for a 32-bit process, from my experiments).
This kind of communication is used by http://cc.embarcadero.com/Author/802978 to call any 64-bit DLL from a 32-bit Delphi program.
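As a rough illustration, the writer end of such a scheme might look like the sketch below (Win32 C++). The mapping and event names are invented for the example, and a named event stands in for the notification step; this is not the code from the linked FAQ or the Embarcadero component.

    // Writer end of a memory-mapped-file channel (illustrative sketch).
    #include <windows.h>
    #include <cstring>

    int main() {
        const char* kMapName   = "Local\\DemoSharedBlock"; // hypothetical name
        const char* kEventName = "Local\\DemoDataReady";   // hypothetical name
        const DWORD kSize = 64 * 1024;                     // 64 KB shared region

        // Backed by the page file (INVALID_HANDLE_VALUE) rather than a disk file.
        HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                         PAGE_READWRITE, 0, kSize, kMapName);
        if (!hMap) return 1;

        void* view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, kSize);
        if (!view) { CloseHandle(hMap); return 1; }

        // Named auto-reset event: the reader waits on the same name.
        HANDLE hEvent = CreateEventA(nullptr, FALSE, FALSE, kEventName);

        std::strcpy(static_cast<char*>(view), "hello from the writer");
        SetEvent(hEvent); // tell the other end that data is in place

        UnmapViewOfFile(view);
        CloseHandle(hEvent);
        CloseHandle(hMap);
        return 0;
    }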

You might also want to add a way to pin and unpin that 64-bit pointer to a 32-bit memory address. Since this is Delphi, I'm pretty sure it's Windows-specific, so you might as well use Address Windowing Extensions. That way, you can support allocating, freeing, and pinning and unpinning memory in a 32-bit address range and still take advantage of a 64-bit memory allocation space, assuming the user only commits as much memory at once as fits in the 32-bit virtual address space.
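For reference, the AWE calls involved might look like the following minimal sketch. This is an illustration only, not the DLL the question asks for: the buffer size is arbitrary, error handling is minimal, and AllocateUserPhysicalPages typically fails unless the account holds the "Lock pages in memory" privilege.

    // Illustrative AWE sketch: allocate physical pages, then pin/unpin them
    // into a 32-bit-addressable virtual window.
    #include <windows.h>

    int main() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        ULONG_PTR pageCount = (64u << 20) / si.dwPageSize; // 64 MB of pages

        ULONG_PTR* pfns = new ULONG_PTR[pageCount];
        if (!AllocateUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns))
            return 1; // usually means the privilege is missing

        // Reserve a virtual window and pin ("map") the physical pages into it.
        void* window = VirtualAlloc(nullptr, pageCount * si.dwPageSize,
                                    MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
        if (window && MapUserPhysicalPages(window, pageCount, pfns)) {
            // ... use the memory through `window` ...
            MapUserPhysicalPages(window, pageCount, nullptr); // unpin
        }
        FreeUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns);
        delete[] pfns;
        return 0;
    }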


Memory allocation in MSVC++ 2019

I have a question regarding memory allocation, particularly when using MSVC 2019.
I have a C++ program compiled for x64.
While debugging I saw that allocated variables get very high pointer addresses, pointing to locations beyond the first 4GB (32-bit part) of the address space. If I check the program in the Task Manager, I see it using only around 30-50MB of memory.
What is the reason that the variables are not allocated in the lower part of the virtual memory space, when practically the whole address space under 4GB is unused?
I would expect allocation to start from low addresses, with no need to hand out addresses above 4GB until the first 4GB is used.
Why this is interesting for me:
I have a big piece of software containing C++ code more than 15 years old that was not everywhere prepared for 64-bit: in many places it casts pointers to 32-bit types, which truncates them. Most probably the original authors assumed pointers are 32-bit, which would in practice still hold when compiled for 64-bit, since the program does not use much memory and its usage never grows past 4GB. And it seems that when compiled with the compilers from 2010 this problem does not appear; probably at that time memory allocations produced addresses within the first 4GB block even when compiled for x64.
My question is:
Can this allocation strategy be influenced somehow in MSVC 2019, e.g. by instructing the compiler/linker/memory manager to prefer allocations in the first 4GB until that space runs out? Or can a size limit be set on the virtual address space offered by the memory manager? By setting it to 2GB, I could guarantee there will never be a pointer to an allocated block above 4GB, and the old code would survive the cast operations that assume a pointer is 32-bit.
I already tried setting NO for high memory awareness in the linker options, and checked the heap parameters, but none of them helped.
Thank you!
If your program assumes pointers are 32-bit, you will just have to compile for 32-bit until you get proper declarations in place, using #ifdef to check what you are compiling for.
Just pick x86 instead of x64 from the dropdown as a workaround until you modernize your legacy code.
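In case it helps, a minimal sketch of such a guard (the messages and variable names are just for illustration):

    // Detecting pointer width at compile time, so 32-bit assumptions
    // are guarded explicitly instead of silently truncating on x64.
    #include <cstdint>
    #include <cstdio>

    int main() {
    #if defined(_WIN64) || UINTPTR_MAX > UINT32_MAX
        std::puts("64-bit build: pointers will NOT survive a cast to a 32-bit type");
    #else
        std::puts("32-bit build: the legacy pointer-to-32-bit casts happen to work");
    #endif
        // For any pointer round-trip through an integer, uintptr_t is the
        // right type; DWORD/unsigned int silently truncates on x64.
        int x = 0;
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&x);
        std::printf("address of x: 0x%llx\n",
                    static_cast<unsigned long long>(bits));
        return 0;
    }
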
There's more you can do with a big address space, and since the OS maps virtual addresses to portions of physical memory anyway, the compiler simply chose to reap the benefits of keeping different portions of the address space apart for different purposes.
There are ways to create custom heaps and to allocate things at a specific address if that space is available, but working these into the code would likely take just as long, and would be going backwards, compared to properly allocating with correct sizes.
Welcome to the world of virtual memory! To dynamically allocate memory, the standard library asks the kernel to provide it, and only the kernel is responsible for the virtual addresses given to the program. Since each process has its own virtual-to-physical address translation, multiple processes can even be given the same virtual addresses.
As a programmer, you should normally never worry about this. Use the memory addresses the kernel has given you and carry on. If you have to use legacy code that assumes a pointer cannot exceed 32 bits, you should simply not compile it in 64-bit mode, only in 32-bit mode.

Can we assume the last 2 bits of a memory address are 00 and reuse those bits? A Windows 7 page-fault blue screen

My friend is programming in C++ on a 64-bit Windows 7 PC and came up with a crazy idea to save a little bit of memory: he observed that the last 2 bits of his memory addresses always seemed to be 00, so he figured he could use those bits for other things, and then whenever a memory address is actually needed, simply use a bit mask to set the last 2 bits back to 0, whether reading or writing memory. The reason he's only using the last 2 bits is that the scheme needs to work on 32-bit systems too. Anyway, on his Windows 7 64-bit system he got the following blue-screen error when running his program:
PAGE_FAULT_IN_NON_PAGED_AREA
Could his crazy memory-savings idea be causing this? I.e., can it sometimes happen that the last 2 bits of a memory address are NOT 00, so that he ends up accessing memory that's partly on one of his memory pages and partly off it? In any event, he needs this to work on ALL popular operating systems, so the question applies to other operating systems as well.
If (on Windows 7 64-bit, at least) his scheme IS guaranteed to work when coded properly, then what else could be causing the unusual blue-screen crash?
Your friend is relying on a technique known as tagged pointers. On Windows, Raymond Chen has a warning about it on his blog:
There is no /8TB flag on 64-bit Windows
A customer reported that their 64-bit application was crashing on Windows 8.1. They traced the problem back to the fact that the user-mode address space for 64-bit applications on Windows 8.1 is 128TB, whereas it was only 8TB on earlier versions of Windows for x64. ...
...
As for how they ended up having a dependency on the address space being at most 8TB, they didn't say, but I have a guess: They are using the unused bits for tagging.
If you are going to use tagged pointers, you need to put your tag bits in the least significant bits, since those are bits you control. For example, if you align all your objects on 16-byte boundaries, then you have four available bits for tagging. If you're going to use upper bits for tagging, at least verify that those upper bits are available.
Something more important to watch out for: a memory pointer allocated by the OS might be aligned in a way that allows tagging, but if an intermediate memory manager sits between the user's code and the OS (which is usually the case), that manager allocates OS memory internally and divides it up for the app to use, so the pointers it hands out to the app might not be aligned in a way that allows tagging. You cannot tag an arbitrary memory pointer without knowing where it came from or how it is aligned.
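To make the safe variant concrete, here is a minimal sketch of low-bit tagging in C++. It is not your friend's code: it assumes the pointee has at least 4-byte alignment, which holds for addresses straight out of malloc/new but, as noted above, not for arbitrary pointers.

    // Low-bit pointer tagging (illustrative sketch).
    #include <cassert>
    #include <cstdint>

    constexpr std::uintptr_t kTagMask = 0x3; // the two low bits

    template <typename T>
    std::uintptr_t tag_ptr(T* p, unsigned tag) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & kTagMask) == 0 && tag <= kTagMask); // alignment required
        return bits | tag;
    }

    template <typename T>
    T* untag_ptr(std::uintptr_t tagged) {
        return reinterpret_cast<T*>(tagged & ~kTagMask); // strip tag before use
    }

    int main() {
        int value = 42; // int is 4-byte aligned, so two low bits are free
        std::uintptr_t t = tag_ptr(&value, 1);
        assert((t & kTagMask) == 1);
        assert(*untag_ptr<int>(t) == 42);
        return 0;
    }
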
What you're seeing is pointer alignment. On modern computers, objects of size 2/4/8 get addresses divisible by 2/4/8; in binary, those addresses end in at least 1, 2, or 3 zero bits respectively.
Failing to adhere to this rule could crash your application, but not your OS (unless you're writing drivers, but then you already know this stuff).
On x86 Windows, however, a merely misaligned access is generally fixed up for you.
There could be another issue, though. By modifying the address like that, you shift the object in memory: if you only allocated 4 bytes and then access a 4-byte object 3 bytes away (by adding 00000011 to the address without masking it off), you'll miss 3 bytes of your allocation and also touch 3 bytes that weren't allocated to your program. Again: program crash, not OS crash. However, Windows won't fix this one.
Just because malloc/operator new tends to return addresses that are 0 mod 4 or 0 mod 8 does not mean that plenty of other programming circumstances won't end up with pointers that aren't. This plan can't work.

Is it true that a 32-bit program can run out of memory if other programs use too much, on 64-bit Windows?

I am developing a 32-bit application and got an out-of-memory error.
I noticed that my Visual Studio and a plugin (other apps too) were using a lot of memory, around 4 or 5GB in total.
So I suspected that these programs use up all the memory addresses where my program would be able to find free memory.
I suppose that a 32-bit process can only use the first 4GB, and cannot use other memory at all.
I don't know if I am correct about this; otherwise I will look for other answers, like a bug in my code.
Your statement of
"I suppose that a 32-bit process can only use the first 4GB, and cannot use other memory at all."
is definitely incorrect. On a 64-bit OS, all applications can use all of the memory, regardless of their bitness, thanks to the virtual-to-physical translation tables being 64-bit.
Some really ancient hardware may not allow DMA to addresses above 4GB, but I really hope most of that is in the junkyard by now.
If the system as a whole is running low on memory, it will affect all applications more or less equally.
However, a 32-bit application can by default only use the lower 2GB of its virtual address range (although those 2GB can be placed anywhere in physical memory, as described above, by means of the 64-bit translation tables). You can extend this to nearly 4GB (3GB on a 32-bit OS, subject to the /3GB boot flag in that case) by using /LARGEADDRESSAWARE in your link command; this simply tells the OS that your application "understands" that addresses can be negative, and thus will operate correctly with addresses above 2GB.
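If you want to see the effect empirically, a small probe like this sketch (illustrative only) reserves address space until it runs out and reports the total; a 32-bit build with and without /LARGEADDRESSAWARE should report roughly 2GB versus close to 4GB on 64-bit Windows.

    // Probe how much address space this process can reserve.
    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<void*> blocks;
        unsigned long long total = 0;
        // Reserve (not commit) 64 MB chunks until the address space is gone.
        while (void* p = VirtualAlloc(nullptr, 64u << 20,
                                      MEM_RESERVE, PAGE_NOACCESS)) {
            blocks.push_back(p);
            total += 64u << 20;
        }
        std::printf("reserved %llu MB of address space\n", total >> 20);
        for (void* p : blocks) VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }
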
Any system can be brought down by too heavy a load.
But in normal use on Windows, or any other virtual-memory OS, the memory consumption of other programs does not much affect any given program's execution.
Getting an out-of-memory error is unusual, but it can happen if you make a large allocation or declare a large automatic (local) variable. It can also happen if you fail to deallocate memory that's no longer used, i.e. if the program is leaking memory. For a 32-bit program on a 64-bit machine, it's then not memory itself that's used up, but the available address space within the program.

Limited allocation size in C++

I use Visual Studio 2008.
I have dynamically allocated the array big_massive:
    unsigned int *big_massive = new unsigned int[1073741824];
But when I tried to debug this program, I got the following error: Invalid allocation size: 4294967295 bytes.
I hope there is some way to avoid such an error? Thank you!
That allocation is simply not possible on 32-bit x86 systems with sizeof(int)==4: you are requesting 4GB. A process's total address space is limited to 4GB, and the process itself is usually limited to less than that (2GB or 3GB on 32-bit Windows, depending on boot.ini settings and the Windows edition; I'm not sure which limit applies to 32-bit processes on 64-bit Windows, but 4GB is simply not possible).
In the 64-bit case, you'd need 4GB of virtual memory available to back that allocation for it to succeed.
Amount of virtual memory per process on a 32-bit Windows system, or for a 32-bit program on 64-bit Windows (WOW64): 2,147,483,648 bytes.
Amount of memory needed to hold an array of 1,073,741,824 4-byte unsigned integers: 4,294,967,296 bytes.
That can't possibly fit in the amount of address space available, so it's an invalid allocation.
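To turn that into an explicit run-time check rather than a debug-heap error, one can test the size arithmetic before allocating. A sketch (using nothrow new; the numbers are the question's own):

    // Guarding a huge array allocation against size_t overflow.
    #include <cstddef>
    #include <cstdio>
    #include <limits>
    #include <new>

    int main() {
        const std::size_t count = 1073741824; // 2^30 elements, as in the question

        // On a 32-bit build, count * sizeof(unsigned int) wraps around
        // before operator new even sees it.
        if (count > std::numeric_limits<std::size_t>::max() / sizeof(unsigned int)) {
            std::puts("request overflows size_t on this platform");
            return 1;
        }
        unsigned int* big = new (std::nothrow) unsigned int[count];
        if (!big) {
            std::puts("allocation failed: not enough virtual memory");
            return 1;
        }
        delete[] big;
        return 0;
    }
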
A 32-bit process cannot access more than 4GB of memory. However, allocating 3GB can be fine on an OS that supports lazy allocation and overcommitting, even if you only use the first 10kB and your maximum swap+RAM is 1GB. But keep in mind that relying on this is unwise in the first place.
Before trying to use that much memory, check whether you can represent your data in a more compact form. If your array has holes, or values are repeated, or you don't use the full 32-bit range of your ints, or you don't need the values in a specific order, don't use a plain array.
Remember that RAM is for temporary data. If your data needs to be written to disk, why not use disk space in the first place? You might even use memory-mapped files (you select a part of your file and access it like memory). You might also like the (easier or not) alternative of a database management system.

Can you allocate a very large single chunk of memory ( > 4GB ) in c or c++?

With the very large amounts of RAM available these days, I was wondering: is it possible to allocate a single chunk of memory that is larger than 4GB? Or would I need to allocate a bunch of smaller chunks and handle switching between them?
Why???
I'm working on processing some OpenStreetMap XML data, and these files are huge. I'm currently streaming them in since I can't load them in one chunk, but I got curious about the upper limits of malloc and new.
Short answer: not likely.
In order for this to work, you would absolutely have to use a 64-bit processor.
Secondly, it would depend on the operating system's support for allocating more than 4GB of RAM to a single process.
In theory it would be possible, but you would have to read the documentation for the memory allocator. You would also be more susceptible to memory fragmentation issues.
There is good information on Windows memory management.
A Primer on physical and virtual memory layouts
You would need a 64-bit CPU and O/S build, and almost certainly enough memory to avoid thrashing your working set. A bit of background:
A 32-bit machine (by and large) has registers that can store one of 2^32 (4,294,967,296) unique values. This means a 32-bit pointer can address any one of 2^32 unique memory locations, which is where the magic 4GB limit comes from.
Some 32-bit systems, such as SPARCv8 or Xeon, have MMUs that pull a trick to allow more physical memory. This lets multiple processes take up memory totalling more than 4GB in aggregate, but each process is still limited to its own 32-bit virtual address space. For a single process looking at a virtual address space, only 2^32 distinct physical locations can be mapped by a 32-bit pointer.
I won't go into the details, but this presentation (warning: PowerPoint) describes how it works. Some operating systems have facilities (such as those described here, thanks to FP above) to manipulate the MMU and swap different physical locations into the virtual address space under user-level control.
The operating system and memory-mapped I/O will take up some of the virtual address space, so not all of that 4GB is necessarily available to the process. As an example, Windows defaults to taking 2GB of it, but can be set to take only 1GB if the /3GB switch is used at boot. This means that a single process on a 32-bit architecture of this sort can only build a contiguous data structure of somewhat less than 4GB in memory.
This means you would have to explicitly use the PAE facilities on Windows, or equivalent facilities on Linux, to manually swap in the overlays. This is not necessarily that hard, but it will take some time to get working.
Alternatively, you can get a 64-bit box with lots of memory and these problems more or less go away. A 64-bit architecture with 64-bit pointers can build a contiguous data structure with as many as 2^64 (18,446,744,073,709,551,616) unique addresses, at least in theory. This allows larger contiguous data structures to be built and managed.
The advantage of memory-mapped files is that you can open a file much bigger than 4GB (almost unlimited on NTFS!) and have multiple <4GB memory windows into it.
It's much more efficient than opening a file and reading it into memory; on most operating systems it uses the built-in paging support.
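A sliding window over such a file might look like this sketch (Win32 C++; the file name is hypothetical, and view offsets must be multiples of the system allocation granularity):

    // Map a 64 MB window starting ~5 GB into a file larger than 4 GB.
    #include <windows.h>

    int main() {
        HANDLE file = CreateFileA("huge.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING,
                                  FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Maximum size (0, 0) means "use the file's current size".
        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY,
                                            0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        SYSTEM_INFO si;
        GetSystemInfo(&si);
        ULONGLONG offset = 5ull << 30;                 // 5 GB into the file
        offset -= offset % si.dwAllocationGranularity; // align the offset
        const SIZE_T window = 64u << 20;               // 64 MB view, well under 4 GB

        void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                   (DWORD)(offset >> 32),
                                   (DWORD)(offset & 0xFFFFFFFFu), window);
        // ... read the file's bytes through `view` ...
        if (view) UnmapViewOfFile(view);
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }
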
This shouldn't be a problem with a 64-bit OS (and a machine that has that much memory).
If malloc can't cope, then the OS will certainly provide APIs that allow you to allocate memory directly. Under Windows you can use the VirtualAlloc API.
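For example, a direct request to the OS for a block bigger than 4GB could look like this sketch (a 64-bit Windows build is assumed; the 5GB figure is arbitrary):

    // Ask the OS directly for >4 GB, bypassing malloc/new.
    #include <windows.h>
    #include <cstdio>

    int main() {
        const SIZE_T fiveGB = 5ull << 30; // truncates on a 32-bit build!
        void* block = VirtualAlloc(nullptr, fiveGB,
                                   MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (!block) {
            std::printf("VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }
        // ... use block ...
        VirtualFree(block, 0, MEM_RELEASE);
        return 0;
    }
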
It depends on which C compiler you're using, and on what platform (of course), but there's no fundamental reason why you cannot allocate the largest chunk of contiguously available memory, which may be less than you need. And of course you may have to be using a 64-bit system to address that much RAM...
See malloc for history and details.
Call HeapMax in alloc.h to get the largest available block size.
Have you considered using memory mapped files? Since you are loading in really huge files, it would seem that this might be the best way to go.
It depends on whether the OS will give you virtual address space that allows addressing memory above 4GB, and whether the compiler supports allocating it using new/malloc.
For 32-bit Windows, you won't be able to get a single chunk bigger than 4GB, as the pointer size is 32-bit, which limits your virtual address space to 4GB. (You could use Physical Address Extension to get access to more than 4GB of physical memory; however, I believe you would have to map that memory into the 4GB virtual address space yourself.)
For 64-bit Windows, the VC++ compiler supports 64-bit pointers, with a theoretical virtual address space limit of 8TB.
I suspect the same applies for Linux/gcc: 32-bit does not allow it, whereas 64-bit does.
As Rob pointed out, VirtualAlloc on Windows is a good option for this, as is an anonymous file mapping. However, specifically with respect to your question of whether "C or C++" can do the allocation, the answer is NO, THIS IS NOT SUPPORTED EVEN ON WIN7 RC 64.
In the PE/COFF specification for exe files, the fields that specify the heap reserve and heap commit are 32-bit quantities. This is in line with the physical size limitations of the current heap implementation in the Windows CRT, which tops out just short of 4GB. So there is no way to allocate more than 4GB from C/C++ (technically, the OS support facilities of CreateFileMapping and VirtualAlloc/VirtualAllocExNuma etc. are not C or C++).
Also, be aware that there are underlying x86 and amd64 ABI constructs known as page tables. These will in effect do what you are concerned about, splitting your larger request into smaller chunks; and even though this happens in kernel memory, it has an effect on the overall system, as these tables are finite.
If you are allocating memory in such grandiose proportions, you would be well advised to allocate based on the allocation granularity (which VirtualAlloc enforces) and also to identify optional flags or methods that enable larger pages, as sketched below.
4kB pages were the initial page size for the 386; subsequently the Pentium added 4MB pages. Today, AMD64 (see the Software Optimization Guide for AMD Family 10h Processors) has a maximum page table entry size of 1GB. That means that in your case here, say you allocated 4GB, it would require only 4 unique entries in the kernel's page directory to locate, assign, and set permissions on your process's memory.
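For illustration, a large-page request on Windows might look like this sketch (it needs the "Lock pages in memory" privilege, and the size must be a multiple of GetLargePageMinimum()):

    // Requesting large pages via VirtualAlloc (illustrative sketch).
    #include <windows.h>
    #include <cstdio>

    int main() {
        SIZE_T large = GetLargePageMinimum(); // 0 if unsupported
        if (large == 0) return 1;

        SIZE_T size = 16 * large; // often 16 * 2 MB = 32 MB
        void* p = VirtualAlloc(nullptr, size,
                               MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                               PAGE_READWRITE);
        if (!p) {
            // Commonly fails without the privilege enabled for the account.
            std::printf("large-page allocation failed: %lu\n", GetLastError());
            return 1;
        }
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }
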
Microsoft has also released a manual that articulates some of the finer points of application memory and its use, for the Vista/2008 platform and newer.
Contents (page numbers omitted): Introduction; About the Memory Manager; Virtual Address Space; Dynamic Allocation of Kernel Virtual Address Space; Details for x86 Architectures; Details for 64-bit Architectures; Kernel-Mode Stack Jumping in x86 Architectures; Use of Excess Pool Memory; Security: Address Space Layout Randomization (Effect of ASLR on Image Load Addresses, Benefits of ASLR, How to Create Dynamically Based Images); I/O Bandwidth; Microsoft SuperFetch; Page-File Writes; Coordination of Memory Manager and Cache Manager; Prefetch-Style Clustering; Large File Management; Hibernate and Standby; Advanced Video Model; NUMA Support (Resource Allocation, Default Node and Affinity, Interrupt Affinity, NUMA-Aware System Functions for Applications and for Drivers, Paging); Scalability (Efficiency and Parallelism, Page-Frame Number and PFN Database, Large Pages, Cache-Aligned Pool Allocation, Virtual Machines, Load Balancing, Additional Optimizations); System Integrity (Diagnosis of Hardware Errors, Code Integrity and Driver Signing, Data Preservation during Bug Checks); What You Should Do (For Hardware Manufacturers, Driver Developers, Application Developers, System Administrators); Resources.
If size_t is greater than 32 bits on your system, you've cleared the first hurdle. But the C and C++ standards aren't responsible for determining whether any particular call to new or malloc succeeds (except malloc with a 0 size). That depends entirely on the OS and the current state of the heap.
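That first hurdle can be checked at compile time; a minimal sketch:

    // Fail the build if size_t cannot even express a >4 GB request.
    #include <cstdint>

    static_assert(SIZE_MAX > UINT32_MAX,
                  "size_t too small to request a single >4GB block");

    int main() { return 0; }
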
Like everyone else said, getting a 64-bit machine is the way to go. But even on a 32-bit Intel machine, you can address areas of memory bigger than 4GB if your OS and your CPU support PAE. Unfortunately, 32-bit WinXP does not do this (does 32-bit Vista?). Linux lets you do this by default, but you will still be limited to 4GB areas, even with mmap(), since pointers are still 32-bit.
What you should do, though, is let the operating system take care of the memory management for you. Get into an environment that can handle that much RAM, then read the XML file(s) into (a) data structure(s), and let it allocate the space for you. Then operate on the data structure in memory instead of on the XML file itself.
Even on 64-bit systems, though, you're not going to have a lot of control over which portions of your program actually sit in RAM, in cache, or are paged to disk, at least in most instances, since the OS and the MMU handle this themselves.