Should volatile be used when mapping GPU memory? - c++

Both OpenGL and Vulkan let you obtain a pointer to part of the GPU's memory by calling glMapBuffer and vkMapMemory respectively. Both give you a void* to the mapped memory. To interpret its contents as data, the pointer has to be cast to an appropriate type. The simplest example is a cast to float* to interpret the memory as an array of floats, vectors or similar.
It seems that any kind of memory mapping is undefined behaviour in C++, as the language has no notion of memory mapping. That isn't really an issue, because the topic is outside the scope of the C++ Standard. There is still the question of volatile, though.
In the linked question the pointer is additionally marked as volatile because the contents of the memory it points at can be modified in a way that the compiler cannot anticipate during compilation. This seems reasonable though I rarely see people use volatile in this context (more broadly, this keyword seems to be barely used at all nowadays).
At the same time, in this question the answer seems to be that using volatile is unnecessary. This is because the memory they speak of is mapped using mmap and later passed to msync, which can be treated as modifying the memory, much like explicitly flushing it in Vulkan or OpenGL. I'm afraid, though, that this doesn't apply to either OpenGL or Vulkan.
If the memory is mapped without GL_MAP_FLUSH_EXPLICIT_BIT, or is allocated with VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, then no flushing is needed at all and the memory contents update automagically. Even if the memory is flushed by hand with vkFlushMappedMemoryRanges or glFlushMappedBufferRange, neither of these functions actually takes the mapped pointer as a parameter, so the compiler couldn't possibly know that they modify the mapped memory's contents.
As such, is it necessary to mark pointers to mapped GPU memory as volatile? I know that technically this is all undefined behaviour, but I am asking what is required in practice on real hardware.
By the way, neither the Vulkan Specification nor the OpenGL Specification mentions the volatile qualifier at all.
EDIT: Would marking the memory as volatile incur a performance overhead?

OK, let's say that we have a compiler that is omniscient about everything that happens in your code. This means that the compiler can follow any pointer, even through the runtime execution of your code perfectly and correctly every time, no matter how you try to hide it. So even if you read a byte at one end of your program, the compiler will somehow remember the exact bytes you've read and anytime you try to read them again, it can choose to not execute that read and just give you the previous value, unless the compiler is aware of something that can change it.
But let's also say that our omniscient compiler is completely oblivious to everything that happens in OpenGL/Vulkan. To this compiler, the graphics API is a black box. Here, there be dragons.
So you get a pointer from the API, read from it, the GPU writes to it, and then you want to read the new data the GPU just wrote. Why would a compiler believe that the data behind that pointer has been altered? After all, the alterations came from outside of the system, from a source that the C++ standard does not recognize.
That's what volatile is for, right?
Well, here's the thing. In both OpenGL and Vulkan, to ensure that you can actually read that data, you need to do something. Even if you map the memory coherently, you have to make an API call to ensure that the GPU process that wrote to the memory has actually executed. For Vulkan, you're waiting on a fence or an event. For OpenGL, you're waiting on a fence or executing a full finish.
Either way, before executing the read from the memory, the omniscient compiler encounters a function call into a black box which as established earlier the compiler knows nothing about. Since the mapped pointer itself came from the same black box, the compiler cannot assume that the black box doesn't have a pointer to that memory. So as far as the compiler is concerned, calling those functions could have written data to that memory.
And therefore, our omniscient-yet-oblivious compiler cannot optimize away such memory accesses. Once we get control back from those functions, the compiler must assume that any memory from any pointer reachable through that address could have been altered.
And if the compiler were able to peer into the graphics API itself, to read and understand what those functions are doing, then it would definitely see things that would tell it, "oh, I should not make assumptions about the state of memory retrieved through these pointers."
This is why you don't need volatile.
Also, note that the same applies to writing data. If you write to persistent, coherent mapped memory, you still have to perform some synchronization action with the graphics API so that you don't write to the memory while the GPU is still reading it. So that's where the compiler knows that it can no longer rely on its knowledge of previously written data.
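The pattern the answer describes can be sketched without any graphics API at all. In the minimal sketch below, everything in the fake_api namespace is a hypothetical stand-in: map() plays the role of vkMapMemory or glMapBufferRange, and wait_for_gpu() plays the role of a fence wait. In a real program those functions live in a driver library the compiler cannot see into; here they are ordinary functions, so the sketch only illustrates the order of operations, not any particular optimizer's behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the graphics API (not real OpenGL/Vulkan calls).
namespace fake_api {
    float storage[4] = {0.0f, 0.0f, 0.0f, 0.0f};  // pretend GPU-visible memory

    // Plays the role of the map call: hands out a void* to the storage.
    void* map() { return storage; }

    // Plays the role of a fence wait: by the time it returns, the "GPU"
    // has written its results into the mapped memory.
    void wait_for_gpu() {
        for (std::size_t i = 0; i < 4; ++i)
            storage[i] = static_cast<float>(i) + 1.0f;
    }
}

// Reads element 0 through a plain, non-volatile pointer before and after the
// synchronization call and returns how much it changed.
float read_around_sync() {
    const float* data = static_cast<const float*>(fake_api::map());
    float before = data[0];      // value before the "GPU" has run
    fake_api::wait_for_gpu();    // opaque call in a real program: memory may change
    float after = data[0];       // has to be loaded again after the call
    assert(after != before);     // the second read observes the new data
    return after - before;
}
```

The point of the sketch is only that the second read sits after a call that may have touched the memory, which is the same position a real fence wait occupies.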

Related

Mutable vs Immutable storage

A few closely-related questions regarding buffer objects in OpenGL.
Besides persistent mapping, is there any other reason to allocate an immutable buffer? Even if the user allocates memory for the buffer only once, with mutable buffers he always has the ability to do it again if he needs to. Plus, with mutable buffers you can explicitly specify a usage hint.
How do people usually change data through a mapped pointer? The way I see it, you can either make changes to a single element, or multiple. For single-element changes all I could think of is an operator[] on a mapped pointer as if it was a C-style array. For multi-element changes, only thing I could think of is a memcpy, but in that case isn't it just better to use glBufferSubData?
Speaking of glBufferSubData, is there truly any difference between calling it and just doing a memcpy on a mapped pointer? I've heard the former does more than 1 memcpy, is it true?
Is there a known reason why you can't specify a usage hint for an immutable buffer?
I know these questions are mostly about performance, and thus can be answered with a simple "just do some profiling and see", but at the time of me asking this it's not so much about performance as it is about design - i.e., I want to know the good practices of choosing between a mutable buffer vs an immutable one, and how should I be modifying their contents.
Even if the user allocates memory for the buffer only once, with mutable buffers he always has the ability to do it again if he needs to.
And that's precisely why you shouldn't use them. Reallocating a buffer object's storage (outside of invalidation) is not a useful thing. Drivers have to do a lot of work to make it feasible.
So having an API that takes away tools you shouldn't use is a good thing.
How do people usually change data through a mapped pointer?
You generally use whatever tool is most appropriate to the circumstance. The point of having a mapped pointer is to access the storage directly, so writing your data elsewhere and copying it in manually is kind of working against that purpose.
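As a rough illustration of the difference, here is a minimal sketch that fills a buffer in two ways. The Vertex type and both function names are made up for the example, and the "mapped" pointer is assumed to come from a successful map call; in the sketch any sufficiently large array of Vertex will do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical element type

// Writes each element straight into the mapped storage.
void fill_in_place(Vertex* mapped, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        mapped[i] = Vertex{static_cast<float>(i), 0.0f, 0.0f};
}

// Builds the data in a temporary first, then copies it across.
// Same result, but with an extra allocation and an extra pass over the data.
void fill_via_staging(Vertex* mapped, std::size_t count) {
    std::vector<Vertex> staging(count);
    for (std::size_t i = 0; i < count; ++i)
        staging[i] = Vertex{static_cast<float>(i), 0.0f, 0.0f};
    std::memcpy(mapped, staging.data(), count * sizeof(Vertex));
}
```

Both functions leave the same bytes in the buffer; the first simply skips the intermediate copy, which is the purpose of having the pointer in the first place.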
Is there a known reason why you can't specify a usage hint for an immutable buffer?
Because the immutable buffer API was written by people who didn't want to have terrible, useless, and pointless parameters. The usage hint on mutable buffers is completely ignored by several implementations because users were so consistently confused about what those hints mean that people used them in strange scenarios.
Immutable buffers instead make you state how you intend to use the buffer, and then hold you to it. If you ask for a static buffer whose contents you will never modify, then you cannot modify it, period. This is prevented at the API level, unlike usage hints, where you could use any buffer in any particular way regardless of the hint.
Hints were a bad idea and needed to die.

When should glInvalidateBufferData be used?

OpenGL provides the functions glInvalidateBufferData and glInvalidateBufferSubData for invalidating buffers.
What these functions do according to the OpenGL wiki:
When a buffer or region thereof is invalidated, it means that the contents of that buffer are now undefined.
and
The idea is that, by invalidating a buffer or range thereof, the implementation will simply grab a new piece of memory to use for any later operations. Thus, while previously issued GL commands can still read the buffer's original data, you can fill the invalidated buffer with new values (via mapping or glBufferSubData) without causing implicit synchronization.
I'm having a hard time understanding when this function should be called and when it shouldn't. In theory, if the contents of the buffer are going to be overwritten it makes absolutely no difference if the previous contents were trashed or not. Does this mean that this function should be called before every write? Or only in certain situations?
In theory, if the contents of the buffer are going to be overwritten it makes absolutely no difference if the previous contents were trashed or not.
You're absolutely right.
In essence, whether you call glBufferData or glInvalidateBufferData, you're achieving the same end goal. The difference is that with glBufferData you're saying "I need this much memory now", which in turn means that the old memory is discarded, whereas with glInvalidateBufferData you're saying "I don't need this memory anymore". The same goes for glInvalidateBufferSubData() vs glBufferSubData(), as well as all the other glInvalidate*() functions.
The key case is when you have a buffer that you don't currently need, but whose handle you're keeping for later use. Then you can call glInvalidateBufferData to tell the driver that the memory can be released.
The same goes for glInvalidateBufferSubData(). If you suddenly don't need the last half of the memory assigned to the buffer, your driver now knows that this chunk of memory can be reassigned.
When should glInvalidateBufferData be used?
So, to answer your question: when you have a buffer lying around and you don't need the memory anymore.

program terminates with dynamic memory on the heap

I will scope this question to C/C++, as it mostly pertains to those languages, and that is where I have seen it have the most impact.
This has concerned me for some time. I understand that some of this problem can be avoided (and I would like to skip the lectures on how to avoid it and focus on the aftermath in case it does happen), but I would still have the underlying question.
Initial thoughts:
A pointer simply serves as an address to an object somewhere else in memory (this can be because the number of things of that type needs to change, as with int[], or because the nature of the thing can change throughout its lifespan, as with polymorphism).
Any time the keyword new is used, it should have a corresponding delete (possibly several, depending on exception handling and multiple exit points).
When a dynamically allocated chunk of memory is acted upon by delete, the destructor is called (and its actions are performed, if any), the memory is returned to the system store to be made available for other things, and (depending on compiler, macros, or programmer) the pointer is set to NULL to avoid illegal memory accesses.
Situation:
Say I am writing a program that uses dynamic memory (a combination of pointers, new, and delete), and something happens that makes the program terminate unexpectedly (unhandled exception, memory access error, illegal operation, etc.). The system should attempt to reclaim all memory that the program is using, but pointers are not always cleared. This may vary between operating systems and compilers (depending on how program termination is performed), but the things that were pointed to may still exist in memory, because all that was deleted was the pointer and not the thing that was pointed to. Granted, this can be quite a small loss (less than a MB for a small program), but for, say, stress testing a data store or processing large files it can be quite large, possibly even in the GB range.
The direct question is: what steps can be taken to get that memory back? The only thing I have found that works is to restart the system (this is with g++ and VS2008/2010 on a Windows system).
If the program terminates, then all memory it was using is returned to the system. At least under Windows which you say you are using. If you think this is not happening, then perhaps your program is not actually terminating at all.
The heap is bound to the allocator, and the allocator is bound to the process. When the process exits, the heap comes undone. Only system-shared resources aren't deallocated.

Why allocate memory? (C++)

I am trying to understand memory allocation in C++.
A question that comes to my mind is why is it so necessary to allocate memory? And what happens if we use memory without allocating it?
Also, I was shocked to see how careless C++ is about memory allocation. It gives free access to memory through arrays with no bounds checking.
#include <iostream>
using namespace std;

int main()
{
    int *p = new int[5];
    p[1] = 3;
    p[11118] = 9;               // far outside the 5-element array
    cout<<p[11118]<<'\n';
}
The above code works, outputs 9.
In what cases would assigning a value to a non allocated memory location be dangerous? What are the potential ill-effects? Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
The above code is Undefined Behaviour. It can work, work incorrectly, not work at all, crash, or order pizza through Microsoft Skype. Thou shalt not rely on undefined behavior.
Why is it necessary to allocate memory?
Because that way, you mark the memory as yours. Nobody else can use it. It also verifies that there is in fact memory available. If your system only has 1000 bytes of memory, just picking byte 1500 to store some data at is a bad idea.
What happens if we use memory without allocating it?
Nobody knows. The address you write to might not exist. A different process might have already started using it, so you overwrite their data. The memory could be protected; in the former case, for instance, the operating system may notice that you are accessing memory another process has laid claim to, and stop you. You might own that region of memory, but a different part of the program is using it for some reason, and you've overwritten your own data.
Free access to memory through arrays with no bounds checking.
That code does not work... it functions as expected, at the moment, but that is not the same thing. Formally, that is undefined behavior, so the compiler can emit code to do anything it wants.
In what cases would assigning a value to a non allocated memory location be dangerous?
I gave some examples above. It is also possible to break your stack. When you call a function, the address the function should return to is stored. If you overwrite that value through careless memory access, then when you leave that function, who knows where you'll end up? Maybe the person exploiting your program... a common exploit is to load executable code into some part of memory, then use a bug in an existing program to run it. Once, on an embedded device I was working on, I had a fencepost error that resulted in my function returning into the middle of another instruction elsewhere. That should have crashed my chip, but as luck would have it the second half of that instruction was itself a valid instruction. The sequence of code that ended up running caused the device to gain sentience, and eventually finished the project we were working on itself. Now, it just plays WoW in my basement. Thus is the horror of undefined behavior.
Many good answers, but I feel that there's something missing regarding "why we need to allocate memory". I think it is important to know how the control flow of a computer program works at the lowest level, since C and C++ are relatively thin layers of abstraction over the hardware.
While it is possible to write a program in one huge global scope with ifs and gotos alone, most real-world programs are split into functions, which are separate, movable modules which can call each other at will. To keep track of all the data (arguments, return value, local variables), all this data is put on a one-dimensional, contiguous area of memory called the stack. Calling a function puts stuff on the stack, and returning from a function pops the data back off, and the same area of memory is overwritten by the next function call.
That way, all function code can be stored abstractly by just remembering offsets to local data relative to the start of its stack frame, and the same function can be called from many different contexts; the function's local variables may be at different absolute addresses, but they're always at the same position relative to the start of the frame.
The fact that the stack memory is constantly overwritten as functions get called and return means that you cannot place any persistent data on the stack, i.e. in a local variable, because the memory for the local variables is not kept intact after the function returns. If your function needs to store persistent data somewhere, it must store that data somewhere else. This other location is the so-called heap, on which you manually (also called "dynamically") request persistent storage via malloc or new. That area of memory lies elsewhere and will not be recycled or overwritten by anyone, and you may safely pass a pointer to that memory around for as long as you like. The only downside is that unless you manually tell the system that you're done, it won't be able to use the memory for anything else, which is why you must manually clean up this dynamically allocated memory. But the need for functions to store persistent information is the reason we need to allocate memory.
(Just to complete the picture: local variables on the stack are said to be "automatically allocated". There is also "static allocation", which happens at compile time and is where global variables live. If you have a global char[30000], you may happily read from and write to that from anywhere in your program.)
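The three kinds of storage described above can be put side by side in a short sketch. The names call_count, square and make_squares are invented for the example, and std::unique_ptr is used here instead of a bare new/delete pair so that the clean-up step cannot be forgotten; that choice is an assumption of the sketch, not something the answer prescribes.

```cpp
#include <cassert>
#include <memory>

int call_count = 0;  // static storage: exists for the whole run of the program

int square(int x) {
    int result = x * x;  // automatic storage: lives on the stack during this call only
    ++call_count;
    return result;       // the value is copied out; the variable itself is gone afterwards
}

// Dynamic storage: the array outlives the function that created it.
std::unique_ptr<int[]> make_squares(int n) {
    auto squares = std::make_unique<int[]>(n);  // requested from the heap at run time
    for (int i = 0; i < n; ++i)
        squares[i] = square(i);
    return squares;  // ownership moves to the caller; freed when the unique_ptr is destroyed
}
```

A caller can keep the returned array for as long as it likes, which is exactly what a local variable inside make_squares could not offer.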
Allocating memory on the heap allows dynamic allocation of a dynamic amount of memory with a dynamic lifetime.
If you want bounds-checking, you can get it through std::vector::at().
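For instance, a minimal sketch of the question's two writes using std::vector::at(), which throws std::out_of_range instead of touching memory outside the container. The helper name checked_write is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Tries to store `value` at `index`. Returns true on success and false if
// the index is outside the vector, in which case nothing is written.
bool checked_write(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // at() checks the index before giving access
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a five-element vector, checked_write(p, 1, 3) succeeds while checked_write(p, 11118, 9) reports failure rather than invoking undefined behaviour.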
In what cases would assigning a value to a non allocated memory location be dangerous?
All cases.
What are the potential ill-effects?
Unexpected behavior.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
Depends on the operating system.
This seems like two questions:
Why doesn't C++ do bounds-checking?
Why do we need dynamic memory allocation?
My answers:
Because then it'd be slower. You can always write an accessor function that checks bounds, like std::vector::at().
Because not being able to resize memory at runtime can be very inconvenient (see early FORTRAN).
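To illustrate the second point, here is a small sketch in which the amount of storage needed is only known at run time. The function evens_below is hypothetical; std::vector does the dynamic allocation and resizing behind the scenes.

```cpp
#include <cassert>
#include <vector>

// Collects the even numbers in [0, limit). How many there are depends on a
// value that is only known when the program runs.
std::vector<int> evens_below(int limit) {
    std::vector<int> result;
    for (int i = 0; i < limit; i += 2)
        result.push_back(i);  // the vector grows its heap storage on demand
    return result;
}
```

A fixed-size array would force a guess at the maximum size up front; with dynamic allocation the container simply takes as much as the input requires.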
In most operating systems, there is a distinct separation between the physical memory available in the host computer, and the logical memory footprint that application code can see. This is mediated, in most cases, by a part of the CPU called the Memory Management Unit (or MMU), and it serves a number of useful goals.
The most obvious is that it allows you to assign more memory to an application (or multiple applications) than is actually present on the machine. When the application asks for some data from memory, the MMU translates the address, and if the page is not resident it raises a fault so that the operating system can figure out where that memory really is, either in core or on disk if it has been paged out.
Another use for this is to set aside some addresses for purposes other than application use. For instance, the GPUs in most computers are controlled through a region of memory that is visible to the CPU as core memory, and the CPU can read or write to that area of memory very efficiently. The MMU provides a way for the OS to use that memory, but make it inaccessible to normal applications.
Because of this segmenting, and for other reasons, the full range of addresses is not normally available to applications until they ask the OS for some memory for a particular purpose. For instance, on Linux, applications ask for more core memory by calling brk or sbrk, and they ask for memory mapped IO by calling mmap. Until an address is returned through one of those calls, the address is unmapped, and accessing it will cause a segfault, normally terminating the offending program.
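As a rough sketch of asking the OS for memory directly, the following uses mmap to request an anonymous (not file-backed) mapping. It assumes a Linux or similar POSIX system where MAP_ANONYMOUS is available; the wrapper names map_anonymous and unmap are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Asks the OS to map `length` bytes of zero-filled memory into this process.
// Returns nullptr if the request fails.
unsigned char* map_anonymous(std::size_t length) {
    void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}

// Hands the range back to the OS. Returns true on success. After this call
// the addresses are unmapped again and must not be accessed.
bool unmap(unsigned char* p, std::size_t length) {
    return munmap(p, length) == 0;
}
```

Between the two calls the range behaves like ordinary memory; before the first and after the second, touching those addresses is the kind of access that ends in a segfault.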
Some platforms only expose memory to the application that it knows has been mapped, but C++ errs on the side of performance. It never does bounds checking automatically, because that would require some extra instructions to be executed, and on some platforms those instructions could be very costly. On the other hand, C++ does provide for bounds checking, if you want it, through the standard library.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
No, modern OSs are designed just to avoid that (for security reasons).
And you have to allocate memory because, although every process has its own 4GB space (provided by Windows), they all share the same xxGB the user has on his machine. Allocating memory helps the operating system know which applications need more memory and give it only to those that need it.
Why would my "hello world" need the same RAM Crysis 2 needs? :P
EDIT:
Ok, someone misunderstood what I meant. I didn't say it's ok and everyone can do it and nothing will happen. I just said doing this won't harm any external process. It still is undefined behavior because no one knows what's at p + 11118, but UB doesn't mean "it can order a pizza through skype" nor other "exciting things"; at most an access violation, nothing more.

Is there any heap compaction in C++?

I have a notion that C++ runtime doesn't do any heap compaction which means that the address of an object created on heap never changes. I want to confirm if this is true and also if it is true for every platform (Win32, Mac, ...)?
The C++ standard says nothing about a heap, nor about compaction. However, it does require that if you take the address of an object, that address stays the same throughout the object's lifetime.
A C++ implementation could do some kind of heap compaction and move objects around behind the scenes. But then the "addresses" it returns to you when you use the address-of operator are not actually memory addresses but some other kind of mapping.
In other words, yes, it is safe to assume that addresses in C++ stay the same while the object you're taking the address of lives.
What happens behind the scenes is unknown. It is possible that the physical memory addresses change (although common C++ compilers wouldn't do this, it might be relevant for compilers targeting various forms of bytecode, such as Flash), but the addresses that your program sees are going to behave nicely.
The standard does not specify it, but then the standard does not specify a heap. This is entirely dependent on your implementation. However, there is nothing stopping an implementation compacting unused memory while maintaining the same addresses for objects in use.
You are right, it does not change. Pages can be moved around in physical memory, but the virtual memory system (the MMU and its page tables, with recent translations cached in the Translation Lookaside Buffer) hides all that from you.
I'm unaware of any C++ implementation that will move allocated objects around. I suppose it might be technically permitted by the standard (though I'm not 100% sure about that), but remember that the standard must allow a pointer to be cast to a large enough integral type and back again and still be a valid pointer. So an implementation that could move dynamically allocated objects around would have to be able to deal with the admittedly unlikely series of events where:
a pointer is cast to an intptr_t
that value is transformed somehow (xor'ed with some value), so the runtime can't detect that it's a pointer to a particular object
the object gets moved due to compaction
the intptr_t gets transformed back into its original value, and
cast back to a pointer to the object type
The implementation would need to ensure that the pointer from that last step points to the moved object's new location.
I suppose using double indirection for pointers might allow an implementation to deal with this, but I'm unaware of any implementation that does anything like this.
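The sequence of steps listed above can be sketched as follows. The function names disguise and recover are made up, and the sketch uses std::uintptr_t (the unsigned counterpart of intptr_t) because XOR is more naturally done on unsigned values; both types are optional in the standard but present on common platforms.

```cpp
#include <cassert>
#include <cstdint>

// Turns the pointer into an integer and XORs it with a key, so the stored
// value no longer looks like a pointer to any particular object.
std::uintptr_t disguise(int* p, std::uintptr_t key) {
    return reinterpret_cast<std::uintptr_t>(p) ^ key;
}

// Undoes the XOR and converts the original integer value back to a pointer.
int* recover(std::uintptr_t hidden, std::uintptr_t key) {
    return reinterpret_cast<int*>(hidden ^ key);
}
```

A runtime that moved the object between the two calls would have no way to find and patch the disguised value, yet the recovered pointer is still expected to refer to the object.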
Under normal circumstances when you're using the system compiler's default runtimes, you can safely assume that pointers will not be invalidated by the runtime.
If you are not using the default memory managers, but a 3rd-party memory manager instead, it completely depends on the runtime and memory manager you are using. While C++ objects do not generally get moved around in memory by the memory manager, you can write a memory manager that compacts free space and you could conceivably write one that would move allocated objects around to maximise free space as well.