glMapBuffer() and glBuffers, how does the access with a (void*) work with hardware?

Reading through the OpenGL Programming Guide, 8th Edition.
This is really a hardware question, actually...
I come to a section on OpenGL buffers, and as far as I understand they are memory spaces allocated in graphics card memory, is this correct?
If so, how are we able to get a pointer to read or modify that memory using glMapBuffer() ? As far as I was aware, all possible memory addresses (eg on a 64bit system there are uint64_t num = 0x0; num = ~num; possible addresses) were used for system memory as in RAM / CPU side Memory.
glMapBuffer() returns a void* to some memory. How can that pointer point to memory inside the graphics card? Particularly if I had a 32-bit system with more than 4GB of RAM, and a graphics card with, say, 2GB or 4GB of memory. Surely there aren't enough addresses?!

This is really a hardware question, actually...
No it's not. You'll see why in a moment.
I come to a section on OpenGL buffers, and as far as I understand they are memory spaces allocated in graphics card memory, is this correct?
Not quite. You must understand that while OpenGL gets you really close to the actual hardware, you're still very far from touching it directly. What glMapBuffer does is set up a virtual address range mapping. On modern computer systems, software doesn't operate on physical addresses. Instead a virtual address space (of some size) is used. This virtual address space looks like one large contiguous block of memory to the software, while in fact it's backed by a patchwork of physical pages. Those pages can be implemented in any way: they can be actual physical memory, they can be I/O memory, they can even be created in situ by another program. The mechanism for that is provided by the CPU's Memory Management Unit in collaboration with the OS.
So for each process the OS manages a table of which part of the process's virtual address space maps to which page handler. If you're running Linux, have a look at /proc/$PID/maps. If you have a program that uses glMapBuffer, read /proc/self/maps from within the program itself (don't call system()) before and after mapping the buffer and look for the differences.
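A minimal sketch of that experiment, assuming Linux. To stay self-contained it uses an anonymous mmap() as a stand-in for glMapBuffer (no GL context needed); the helper names map_anonymous_page and address_is_mapped are made up for this illustration.
```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/mman.h>

// Stand-in for glMapBuffer: asks the OS for one fresh 4 KiB mapping.
// Returns nullptr on failure.
void* map_anonymous_page() {
    void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Returns true if `addr` lies inside one of the ranges listed in
// /proc/self/maps (Linux only).
bool address_is_mapped(const void* addr) {
    const auto target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line begins with "start-end" in hexadecimal.
        std::uintptr_t start = 0, end = 0;
        char dash = 0;
        std::istringstream fields(line);
        if (fields >> std::hex >> start >> dash >> end && dash == '-') {
            if (target >= start && target < end) return true;
        }
    }
    return false;
}
```
Calling address_is_mapped() on the pointer returned by map_anonymous_page() should report true; with a real glMapBuffer pointer the same check would show the driver's mapping as just another entry in the table.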
As far as I was aware, all possible memory addresses (eg on a 64bit system there are uint64_t num = 0x0; num = ~num; possible addresses) were used for system memory as in RAM / CPU side Memory.
What makes you think that? Whoever told you that (if somebody told you that) should be slapped in the face… hard.
What you have is a virtual address space. And this address space is completely different from the physical address space on the hardware side. In fact the size of the virtual address space and the size of the physical address space can differ greatly. For example, for a long time there were 32-bit CPUs and 32-bit operating systems around. But already then it was desirable to have more than 4 GiB of system memory. So while the CPU would support only 32 bits of address space for a process (the maximum size of a pointer), it may have provided 36 bits of physical address lines to memory, to support some 64 GiB of system RAM; it would then be the OS's job to manually switch those extra bits, so that while each process sees only some 3 GiB of system RAM (max.), all processes taken together could be spread across the whole physical memory. A technique like that has become known as Physical Address Extension (PAE).
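The sizes mentioned above follow directly from the address widths. A small sketch of the arithmetic (the helper name addressable_bytes is made up for this illustration):
```cpp
#include <cstdint>

// Number of distinct byte addresses an address of `bits` width can express.
// Only valid for widths below 64, so the shift cannot overflow.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// 32-bit pointers: 4 GiB of virtual address space per process.
static_assert(addressable_bytes(32) == 4 * GiB, "2^32 bytes is 4 GiB");
// 36 physical address lines: 64 GiB of addressable RAM.
static_assert(addressable_bytes(36) == 64 * GiB, "2^36 bytes is 64 GiB");
```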
Furthermore, not all of the address space in a process is backed by RAM. Like I already explained, address space mappings can be backed by anything. Often the memory page fault handler will also implement swapping, i.e. if there's not enough free RAM around it will use HDD storage (in fact on Linux all userspace requests for memory are backed by the Disk I/O Cache handler). Also, since the address space mappings are per process, some part of the address space is mapped kernel memory, which is (physically) the same for all processes and also resides at the same place in all processes. From user space this address space mapping is not accessible, but as soon as a syscall makes a transition into kernel space it becomes accessible; yes, the OS kernel uses virtual memory internally, too. It just can't choose as broadly from the available backings (for example, it would be very difficult for a network driver to operate if its memory was backed by the network itself).
Anyway: on modern 64-bit systems you have a 64-bit pointer size, and current hardware has 48 bits of physical RAM address lines. That leaves plenty of space, namely 2^16 - 1 times a 48-bit address space, for virtual mappings where there's no RAM around. And because there's so much to go around, each and every PCI card gets its very own address space, which behaves a little bit like RAM to the CPU (remember the PAE I mentioned earlier? In the good old 32-bit times something like that already had to be done to talk to extension cards).
Now here comes the OpenGL driver. It simply provides a new address mapping handler, which usually just builds on top of the PCI address space handler and maps a portion of the virtual address space of a process. Whatever happens in that address space will be reflected by that mapping handler into a buffer ultimately accessed by the GPU. However, the GPU itself may be accessing CPU memory directly. And AMD's plan is for the GPU and CPU to live on the same die and access the same memory, so there's no longer a physical distinction there.

glMapBuffer() returns a pointer in the virtual memory space of the application; that's why the pointer can point to something above 4GB on a 64-bit system.
The memory you manipulate through the mapped pointer could be a CPU-side copy (shadow) of the texture (or buffer) allocated on the GPU, or it could be the actual texture moved to system memory. It's often the operating system that decides whether a texture resides in system memory or GPU memory. The operating system can move the texture from one location to another and can make a copy of it (shadow).

Computers have multiple layers of memory mapping. First all physically addressable components (including RAM, PCI memory windows, other device registers, etc) are all assigned a physical address. The size of this address varies. Even on 32-bit Intel devices the physical space might be a 36-bit address (to the CPU) while it is a 64-bit address in the PCI memory space.
On top of that mapping is a virtual mapping. There are many different virtual mappings, including one for each process. Within that space (which on a 64-bit system is as huge as you say) any combination (including repeats!) of physical space can be mapped as long as it fits. On a 64-bit system the available virtual space is so large that every possible physical device could easily be mapped.

Related

Virtual Memory or Physical Memory

Suppose we write a program in C and print the address of one of the variables declared in the program, is the address that gets printed on the screen the virtual address or the physical address of the variable?
If it is the virtual address, why is it that it still has the same range as a bit range of physical memory? Eg. for a 32 bit machine if it returns 0x833CA23E.

The address is going to be a virtual address in virtual memory, because the application has no knowledge of physical memory. That is hidden by the kernel and the MMU.
I am not sure what you mean by the same "bit range". If you have a 32-bit address space it will range across the entire 32-bit space regardless of what amount of physical memory you have. Likewise for 64-bit.

In most typical cases (Windows, Linux, etc.) it'll be a virtual address.
On typical 32-bit systems running Linux or Windows, both virtual addresses and physical addresses are normally 32 bits, so having numbers in the same range becomes inevitable. It is possible to allocate more than 4 gigabytes of memory, and when/if you do so, you end up with addresses larger than 32 bits, but unless you take special steps to do that, a 32-bit address is what you'll get by default.
When you do use more than 4 GB of memory under a 32-bit OS, you're normally doing so via some special API, such as Windows' Address Windowing Extensions. Using these, you get access to more than 4 GB of RAM, but it's not what's going to happen by default with code that's even close to portable.
Some (versions of some) operating systems also use Intel's Physical Address Extensions (PAE) to give the system as a whole access to more than 4 GB of RAM, but even when these are in use, any single process running on the system is still limited to addressing 4 GB (i.e., with PAE, you can have a limit of 4 GB per process, whereas older systems had a limit of 4 GB total, divided as needed between the processes).

It will be a 32 bit virtual address in most cases.
If your OS does paging, then it will be the virtual address. It could have been mapped to the same physical address using paging. Linux and Windows do paging.
Another thing that matters is the architecture. On a 32-bit Intel x86 system it will be a 32-bit address. The top 10 bits of the address are used to select a page table from the page directory. The next 10 bits are used to select a page from that page table. And the last 12 bits give the offset within that page, which together with the page's physical base yields the actual physical address.
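The 10/10/12 split described above can be sketched as plain bit arithmetic, assuming classic 32-bit x86 paging with 4 KiB pages (no PAE, no large pages). The struct and function names are made up for this illustration.
```cpp
#include <cstdint>

struct X86PageIndices {
    std::uint32_t directory_index;  // top 10 bits: entry in the page directory
    std::uint32_t table_index;      // middle 10 bits: entry in the selected page table
    std::uint32_t page_offset;      // low 12 bits: byte offset inside the 4 KiB page
};

// Splits a 32-bit virtual address into the three fields the MMU uses.
X86PageIndices split_virtual_address(std::uint32_t address) {
    return X86PageIndices{
        address >> 22,
        (address >> 12) & 0x3FFu,
        address & 0xFFFu
    };
}
```
For the address 0x833CA23E from the question, this gives directory index 0x20C, table index 0x3CA and page offset 0x23E.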
I hope it answers your question.

when will virtual memory be used (windows)?

I am debugging a program which crashed because no contiguous memory was available when my vector needed to be reallocated. So my question is: why is virtual memory not used here? In which way can virtual memory be used? Thanks.

Virtual memory is used automatically by OS. You don't need to care about this.
In your case, it's most likely that you run a 32-bit application. User address space for a 32-bit process in Windows is limited to 2 GB (well, 3 GB if Windows is booted with a specific key). If your vector requires more than several hundred megabytes of contiguous address space, this may become a problem (due to address space fragmentation).
Of course, any process can run out of memory (even while using virtual memory and swap file and whatever else). Take a look at memory usage of your program in Task Manager.

Virtual memory is the only memory you ever get as a program running on a modern OS (Linux, Unix, Windows, MacOS, Symbian, etc).
It sounds like your problem is that there isn't one contiguous virtual address range that is large enough for your vector [1]. I suspect what is happening is that you need, say, more than 1.5GB in a 32-bit process, which can only use 2GB at once, so there isn't much "room" on either end to stuff other bits into before the "middle" is smaller than 1.5GB. In particular, if you have a vector that is growing, you will need two copies of the vector: one at its current size, and one at double the size to copy into.
A simple solution, assuming you know how big the vector needs to be, is to set its size up front, e.g. vector<int> vec(some_size);
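To illustrate why sizing up front helps, here is a small sketch that counts how often a growing vector has to move to a new buffer. It uses reserve() rather than the sized constructor shown above, which is a closely related alternative when you still want to push_back; the function name is made up for this illustration, and the exact growth factor is implementation-defined.
```cpp
#include <cstddef>
#include <vector>

// Appends `count` ints and returns how many times the vector's capacity
// changed along the way, i.e. how many times it moved to a larger buffer.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<int> values;
    if (reserve_first) values.reserve(count);  // one allocation, done up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = values.capacity();
        }
    }
    return reallocations;
}
```
With reserve_first set, the loop never reallocates; without it, every reallocation briefly needs both the old and the new buffer to fit in the address space at once.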
If you don't know, there are some more solutions:
If you have a 64-bit OS, you could try setting the LARGEADDRESSAWARE flag for the executable (assuming it's Windows). That should give you a fair bit more memory, since the 64-bit OS doesn't have to reserve a large chunk of memory space for the OS itself (that lives well outside the 32-bit address range). In a 32-bit OS, you need to boot the OS with /3GB and set the above flag.
Or compile the code as 64-bit (after upgrading to a 64-bit OS, if needed).
[1] Unless of course, you are writing a driver and trying to allocate many megabytes of physical memory as a buffer to use for DMA - but I think you would have said so.

The problem has nothing to do with memory, or even with virtual memory. An array needs a contiguous range of addresses. The address space (normally 2 GB in a Win32 program) is fragmented so that there is not a large enough space available.
If you could get the addresses Windows would automatically provide the virtual memory to go with them.
It is time to move your app up to 64 bits.

Allocate extra space to process

Can I provide extra space to a process beyond what the operating system provides?
Can extra detachable memory be used for such purposes?

can I provide extra space to the process other than provided by the operating system.
No, you can't. For every piece of memory you have to ask your OS. malloc(), new, and other memory-allocating functions and operators ultimately resolve to system calls that request memory from the OS on behalf of the program.

Every process has a definite maximum memory space allocated to it, that depends on the machine architecture. On a 32-bit machine, the maximum addressable space is 2^32 bytes ~= 4GB. Hence a process should be able to address 4 GB of memory typically. But this space is divided into two parts, 1. Kernel Space and 2. Process Space. Kernel space is used for OS drivers etc while Process space is the space where your data can be allocated. Hence the memory available to you is just the Process space.
On a typical Windows XP machine, it is divided equally, i.e. 2 GB for process space (however, there are ways to modify this, for example the /3GB option). Any allocation beyond 2 GB gives an out-of-memory error. This process space becomes larger when you move from a 32-bit application to a 64-bit application. This is one of the major incentives for moving to 64-bit applications.
So to answer your question, there is a maximum memory available to a process beyond which the OS denies memory allocations to the process.

There are some obscure ways. E.g. if you attached a Windows CE device to a Windows PC, the memory of that device could be accessed via the "RAPI" interface. The Windows OS wouldn't be aware of this device memory; this was handled via the ActiveSync service. It wasn't very quick memory, though.

How much memory should you be able to allocate?

Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
    //size_t nBufSize = nBandBytes*m_nBandCount;
    pTempBuffer2[0] = new byte[nBandBytes];
    pTempBuffer2[1] = new byte[nBandBytes];
    pTempBuffer2[2] = new byte[nBandBytes];
}
catch (const std::bad_alloc&)
{
    // If we didn't get the memory just don't buffer and we will get data one
    // piece at a time.
    return;
}
I was hoping that I would be able to allocate memory until the app reached the 4 gigabyte limit of 32-bit addressing. However, when nBandBytes is 466,560,000, new throws std::bad_alloc on the second allocation. At this stage, the working set (memory) value for the process is 665,232 K. So I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?

As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you compile with the LargeAddressAware flag, 64-bit Windows will let you use the full 4GB address space, which should help a bit. In general, though, you shouldn't assume that the available memory is contiguous: you should be able to work with multiple smaller allocations rather than a few big ones.
If you need a lot of memory, you should compile it as a 64-bit application.
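As a rough illustration of working with multiple smaller allocations, the sketch below presents one logical byte buffer backed by several independently allocated chunks, so no single huge contiguous range is needed. ChunkedBuffer and its members are hypothetical names, not part of any library, and chunk_bytes is assumed to be greater than zero.
```cpp
#include <cstddef>
#include <vector>

// One logical buffer of `total_bytes`, stored as separate chunks of at most
// `chunk_bytes` each. Assumes chunk_bytes > 0.
class ChunkedBuffer {
public:
    ChunkedBuffer(std::size_t total_bytes, std::size_t chunk_bytes)
        : total_(total_bytes), chunk_bytes_(chunk_bytes) {
        for (std::size_t done = 0; done < total_bytes; done += chunk_bytes) {
            const std::size_t remaining = total_bytes - done;
            // Each chunk is its own allocation; the last one may be shorter.
            chunks_.emplace_back(remaining < chunk_bytes ? remaining : chunk_bytes);
        }
    }

    // Translates a logical index into (chunk, offset within chunk).
    unsigned char& at(std::size_t index) {
        return chunks_[index / chunk_bytes_][index % chunk_bytes_];
    }

    std::size_t size() const { return total_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t total_;
    std::size_t chunk_bytes_;
    std::vector<std::vector<unsigned char>> chunks_;
};
```
For example, ChunkedBuffer buf(10000, 4096) holds three chunks, and buf.at(9999) addresses the last byte. The trade-off is an extra division per access and no single pointer to the whole data.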

On 32-bit Windows, a normal process can use 2 GB at most, but with the /3GB switch it can reach 3 GB (for Windows 2003).
In your case, though, I think you are allocating contiguous memory, and that is why the exception occurred.

You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all; it's just a convenient place to store the data that is in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, provided you don't actually use it). Your address space in the 32-bit world is 4GB. Of those, 2GB are allocated to the kernel (or 1GB if you use the /3GB switch). Of the 2GB that are left, some is going to be used up by your stack, some by the program you are currently running (and all the DLLs, etc.). It's going to get fragmented, and you are only going to be able to get so much contiguous space; this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory you have allocated, it's possible to allocate much more memory and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
Under 32-bit Windows, the maximum allocatable is 16TB and 256TB in 64 bit Windows.
And if you're really into how memory management works in Windows, read this article.

During the ElephantsDream project the Blender Foundation with Blender 3D had similar problems (though on Mac). Can't include the link but google: blender3d memory allocation problem and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx

With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB. A 32-bit app typically only has access to 2 GB of memory (more if you boot with /3GB and the executable is marked as large address space aware). You may be hard pressed to find that many blocks of contiguous address space for your large chunks of memory.
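The 1.4 GB figure is simply the three buffers from the question added together. A quick sketch of the arithmetic (the function name is made up for this illustration):
```cpp
#include <cstdint>

// Total bytes requested when allocating `buffer_count` buffers of equal size.
constexpr std::uint64_t total_request_bytes(std::uint64_t bytes_per_buffer,
                                            unsigned buffer_count) {
    return bytes_per_buffer * buffer_count;
}

// Three buffers of 466,560,000 bytes each: 1,399,680,000 bytes in total.
static_assert(total_request_bytes(466560000ULL, 3) == 1399680000ULL,
              "three band buffers add up to about 1.4 GB");
```
Each individual buffer is roughly 445 MiB and has to fit into a single free gap in the address space, which is why the second allocation can already fail well below the 2 GB limit.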
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.

You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.

Even if you allocate in smaller chunks, you couldn't get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the heap space on a 32-bit process caps at around 1.2GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.

Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.

Can you allocate a very large single chunk of memory ( > 4GB ) in c or c++?

With very large amounts of ram these days I was wondering, it is possible to allocate a single chunk of memory that is larger than 4GB? Or would I need to allocate a bunch of smaller chunks and handle switching between them?
Why???
I'm working on processing some openstreetmap xml data and these files are huge. I'm currently streaming them in since I can't load them all in one chunk but I just got curious about the upper limits on malloc or new.

Short answer: Not likely
In order for this to work, you absolutely would have to use a 64-bit processor.
Secondly, it would depend on the Operating System support for allocating more than 4G of RAM to a single process.
In theory, it would be possible, but you would have to read the documentation for the memory allocator. You would also be more susceptible to memory fragmentation issues.
There is good information on Windows memory management.

A primer on physical and virtual memory layouts
You would need a 64-bit CPU and O/S build and almost certainly enough memory to avoid thrashing your working set. A bit of background:
A 32 bit machine (by and large) has registers that can store one of 2^32 (4,294,967,296) unique values. This means that a 32-bit pointer can address any one of 2^32 unique memory locations, which is where the magic 4GB limit comes from.
Some 32 bit systems such as the SPARCV8 or Xeon have MMU's that pull a trick to allow more physical memory. This allows multiple processes to take up memory totalling more than 4GB in aggregate, but each process is limited to its own 32 bit virtual address space. For a single process looking at a virtual address space, only 2^32 distinct physical locations can be mapped by a 32 bit pointer.
I won't go into the details but This presentation (warning: powerpoint) describes how this works. Some operating systems have facilities (such as those described Here - thanks to FP above) to manipulate the MMU and swap different physical locations into the virtual address space under user level control.
The operating system and memory-mapped I/O will take up some of the virtual address space, so not all of that 4GB is necessarily available to the process. As an example, Windows defaults to taking 2GB of this, but can be set to take only 1GB if the /3GB switch is used at boot. This means that a single process on a 32-bit architecture of this sort can only build a contiguous data structure of somewhat less than 4GB in memory.
This means you would have to explicitly use the PAE facilities on Windows, or equivalent facilities on Linux, to manually swap in the overlays. This is not necessarily that hard, but it will take some time to get working.
Alternatively you can get a 64-bit box with lots of memory and these problems more or less go away. A 64 bit architecture with 64 bit pointers can build a contiguous data structure with as many as 2^64 (18,446,744,073,709,551,616) unique addresses, at least in theory. This allows larger contiguous data structures to be built and managed.

The advantage of memory-mapped files is that you can open a file much bigger than 4GB (almost infinite on NTFS!) and have multiple <4GB memory windows into it.
It's much more efficient than opening a file and reading it into memory; on most operating systems it uses the built-in paging support.

This shouldn't be a problem with a 64-bit OS (and a machine that has that much memory).
If malloc can't cope then the OS will certainly provide APIs that allow you to allocate memory directly. Under Windows you can use the VirtualAlloc API.

It depends on which C compiler you're using, and on what platform (of course), but there's no fundamental reason why you cannot allocate the largest chunk of contiguously available memory, which may be less than you need. And of course you may have to be using a 64-bit system to address that much RAM...
see Malloc for history and details
call HeapMax in alloc.h to get the largest available block size

Have you considered using memory mapped files? Since you are loading in really huge files, it would seem that this might be the best way to go.

It depends on whether the OS will give you virtual address space that allows addressing memory above 4GB and whether the compiler supports allocating it using new/malloc.
For 32-bit Windows you won't be able to get a single chunk bigger than 4GB, as the pointer size is 32 bits, thus limiting your virtual address space to 4GB. (You could use Physical Address Extension to get more than 4GB of memory; however, I believe you have to map that memory into the 4GB virtual address space yourself.)
For 64-bit Windows, the VC++ compiler supports 64-bit pointers with theoretical limit of the virtual address space to 8TB.
I suspect the same applies for Linux/gcc - 32-bit does not allow you, whereas 64-bit allows you.

As Rob pointed out, VirtualAlloc for Windows is a good option for this, as is an anonymous file mapping. However, specifically with respect to your question of whether C or C++ can allocate such a chunk, the answer is NO, THIS IS NOT SUPPORTED EVEN ON WIN7 RC 64.
In the PE/COFF specification for exe files, the fields that specify the heap reserve and heap commit sizes are 32-bit quantities. This is in line with the physical size limitations of the current heap implementation in the Windows CRT, which is just short of 4GB. So there is no way to allocate more than 4GB from C/C++ (technically, the OS support facilities such as CreateFileMapping and VirtualAlloc/VirtualAllocExNuma are not C or C++).
Also, BE AWARE that there are underlying x86 or amd64 ABI constructs known as page tables. These WILL in effect do what you are concerned about, allocating smaller chunks for your larger request. Even though this happens in kernel memory, there is an effect on the overall system, because these tables are finite.
If you are allocating memory in such grandiose proportions, you would be well advised to allocate based on the allocation granularity (which VirtualAlloc enforces) and also to identify optional flags or methods to enable larger pages.
4KB pages were the initial page size for the 386; subsequently the Pentium added 4MB pages. Today, AMD64 (see the Software Optimization Guide for AMD Family 10h Processors) has a maximum page table entry size of 1GB. This means that for your case here, let's say you just did 4GB, it would require only 4 unique entries in the kernel's directory to locate, assign, and set permissions on your process's memory.
Microsoft has also released this manual that articulates some of the finer points of application memory and its use for the Vista/2008 platform and newer.
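The effect of page size on the number of entries can be sketched with a little arithmetic (the function and constant names are made up for this illustration):
```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// Number of pages of `page_bytes` needed to cover `region_bytes`, rounding up.
constexpr std::uint64_t pages_needed(std::uint64_t region_bytes,
                                     std::uint64_t page_bytes) {
    return (region_bytes + page_bytes - 1) / page_bytes;
}

// Covering a 4 GiB region:
static_assert(pages_needed(4 * GiB, 4 * KiB) == 1048576, "4 KiB pages");
static_assert(pages_needed(4 * GiB, 4 * MiB) == 1024, "4 MiB pages");
static_assert(pages_needed(4 * GiB, 1 * GiB) == 4, "1 GiB pages");
```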
Contents
Introduction
About the Memory Manager
Virtual Address Space
Dynamic Allocation of Kernel Virtual Address Space
Details for x86 Architectures
Details for 64-bit Architectures
Kernel-Mode Stack Jumping in x86 Architectures
Use of Excess Pool Memory
Security: Address Space Layout Randomization
Effect of ASLR on Image Load Addresses
Benefits of ASLR
How to Create Dynamically Based Images
I/O Bandwidth
Microsoft SuperFetch
Page-File Writes
Coordination of Memory Manager and Cache Manager
Prefetch-Style Clustering
Large File Management
Hibernate and Standby
Advanced Video Model
NUMA Support
Resource Allocation
Default Node and Affinity
Interrupt Affinity
NUMA-Aware System Functions for Applications
NUMA-Aware System Functions for Drivers
Paging
Scalability
Efficiency and Parallelism
Page-Frame Number and PFN Database
Large Pages
Cache-Aligned Pool Allocation
Virtual Machines
Load Balancing
Additional Optimizations
System Integrity
Diagnosis of Hardware Errors
Code Integrity and Driver Signing
Data Preservation during Bug Checks
What You Should Do
For Hardware Manufacturers
For Driver Developers
For Application Developers
For System Administrators
Resources

If size_t is greater than 32 bits on your system, you've cleared the first hurdle. But the C and C++ standards aren't responsible for determining whether any particular call to new or malloc succeeds (except malloc with a 0 size). That depends entirely on the OS and the current state of the heap.
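Both points can be checked from code. The sketch below tests the size_t width and then simply asks the allocator, since that is the only way to find out whether a particular request succeeds; the function names are made up for this illustration.
```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>

// True when size_t can express a byte count above 4 GiB (the first hurdle).
constexpr bool size_t_can_exceed_4gib() {
    return std::numeric_limits<std::size_t>::digits > 32;
}

// Asks the allocator for `bytes` and reports whether the request succeeded.
// The block is released again immediately.
bool allocation_succeeds(std::size_t bytes) {
    void* block = std::malloc(bytes);
    if (block == nullptr) return false;
    std::free(block);
    return true;
}
```
Note that on systems that overcommit memory, a successful malloc only means the address range was granted, not that that much RAM or swap is actually available.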

Like everyone else said, getting a 64-bit machine is the way to go. But even on a 32-bit Intel machine, you can address more than 4GB of memory if your OS and your CPU support PAE. Unfortunately, 32-bit WinXP does not do this (does 32-bit Vista?). Linux lets you do this by default, but you will be limited to 4GB areas, even with mmap(), since pointers are still 32-bit.
What you should do though, is let the operating system take care of the memory management for you. Get in an environment that can handle that much RAM, then read the XML file(s) into (a) data structure(s), and let it allocate the space for you. Then operate on the data structure in memory, instead of operating on the XML file itself.
Even in 64bit systems though, you're not going to have a lot of control over what portions of your program actually sit in RAM, in Cache, or are paged to disk, at least in most instances, since the OS and the MMU handle this themselves.
