A few closely-related questions regarding buffer objects in OpenGL.
Besides persistent mapping, is there any other reason to allocate an immutable buffer? Even if the user allocates memory for the buffer only once, with mutable buffers he always has the ability to do it again if he needs to. Plus, with mutable buffers you can explicitly specify a usage hint.
How do people usually change data through a mapped pointer? The way I see it, you can change either a single element or several at once. For single-element changes, all I could think of is using operator[] on the mapped pointer as if it were a C-style array. For multi-element changes, the only thing I could think of is a memcpy, but in that case isn't it just better to use glBufferSubData?
Speaking of glBufferSubData, is there truly any difference between calling it and just doing a memcpy on a mapped pointer? I've heard the former does more than one memcpy; is that true?
Is there a known reason why you can't specify a usage hint for an immutable buffer?
I know these questions are mostly about performance, and thus could be answered with a simple "just do some profiling and see", but at the time of asking this it's not so much about performance as it is about design - i.e., I want to know the good practices for choosing between a mutable and an immutable buffer, and how I should modify their contents.
Even if the user allocates memory for the buffer only once, with mutable buffers he always has the ability to do it again if he needs to.
And that's precisely why you shouldn't use them. Reallocating a buffer object's storage (outside of invalidation) is not a useful thing. Drivers have to do a lot of work to make it feasible.
So having an API that takes away tools you shouldn't use is a good thing.
How do people usually change data through a mapped pointer?
You generally use whatever tool is most appropriate to the circumstance. The point of having a mapped pointer is to access the storage directly, so writing your data elsewhere and copying it in manually is kind of working against that purpose.
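To make the comparison concrete, here's a rough sketch of both approaches; the function names and the float layout are just for illustration, and it assumes a loader such as GLEW and a buffer whose storage has already been allocated:

#include <GL/glew.h>
#include <cstring>
#include <vector>

// Write new data through a mapped pointer. Mapping only the range we
// intend to overwrite and invalidating it lets the driver avoid a stall.
void update_via_mapping(GLuint vbo, const std::vector<float>& data)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0,
                                 data.size() * sizeof(float),
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
    if (ptr)
    {
        // Could just as well write element-by-element through a float*.
        std::memcpy(ptr, data.data(), data.size() * sizeof(float));
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}

// Same update via glBufferSubData: no mapping, the driver schedules the copy.
void update_via_subdata(GLuint vbo, const std::vector<float>& data)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0,
                    data.size() * sizeof(float), data.data());
}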
Is there a known reason why you can't specify a usage hint for an immutable buffer?
Because the immutable buffer API was written by people who didn't want terrible, useless, and pointless parameters. The usage hint on mutable buffers is completely ignored by several implementations, because users were so consistently confused about what those hints mean that the hints ended up being used in scenarios they were never meant for.
Immutable buffers instead make you state how you intend to use the buffer, and then hold you to it. If you ask for a static buffer whose contents you will never modify, then you cannot modify it, period. This is prevented at the API level, unlike usage hints, where you could use any buffer in any particular way regardless of the hint.
Hints were a bad idea and needed to die.
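As a rough sketch of the difference (requires GL 4.4 or ARB_buffer_storage; the helper function here is purely illustrative):

#include <GL/glew.h>
#include <vector>

GLuint make_static_buffer(const std::vector<float>& data)
{
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);

    // Mutable storage: the last argument is only a hint, and nothing stops
    // you from later using the buffer in a way that contradicts it.
    //   glBufferData(GL_ARRAY_BUFFER, data.size() * sizeof(float),
    //                data.data(), GL_STATIC_DRAW);

    // Immutable storage: the flags are a contract. With no flags set, the
    // contents can never be changed from the client side; glBufferSubData
    // and write-mapping will simply generate an error.
    glBufferStorage(GL_ARRAY_BUFFER, data.size() * sizeof(float),
                    data.data(), 0);
    return buf;
}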
I am currently writing a library that parses some structured binary data into a set of objects. These objects are expected to outlive any user code, and would normally be freed at the end of, or after, the main function.
I am using shared (and weak) pointers to manage the memory of each object, but it is causing a lot of added complexity to the program, and raises structural issues that I will not get into in this particular question.
Considering that:
traversing the entirety of the binary data is expensive and I cannot afford to do it more than once,
each visited entry is used to build an object, which then gets registered (i.e. added to the set),
entries in the binary data may rely on other entries that appear later; such an entry gets parsed immediately, and registered when it is visited again,
duplicate entries may appear at any moment, but I need to merge those duplicates into one instance (and update any pointer referencing those duplicates to the new merged entry) before registration,
every single one of those objects is guaranteed to be of one of many POD types deriving from a common class, so nothing except memory needs to be cleaned up,
the resulting program will run on a modern OS (or, in any case, one that reclaims memory from dead processes),
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
What would be the best course of action?
If you're writing reusable code, you need to at least provide the option of cleaning up. What if some program uses your library for one operation, and then continues running? It's not safe to assume that the process exits immediately after your library's task is complete.
The other answers cover the general and standard approach: in an ideal world, yes, you'd clean up your memory, because it makes the code more generic and more reusable and helps with tooling. As others have said, std::unique_ptr for owning pointers and raw pointers for non-owning pointers should work well.
There are a couple of more specialized approaches that may or may not be useful:
Use a pool allocator (such as Boost.Pool, or roll your own) to allocate a bunch of memory up front, then dole out pieces of it for your objects. You can then free every object at once by deleting the pool (a minimal sketch follows this list).
Intentionally not freeing memory is occasionally a valid technique. See, e.g., "Increasing Compiler Performance by Over 75%", by Walter Bright. Of course, a compiler is a specialized problem domain, and Walter Bright is probably one of the top compiler developers alive, so techniques that work for his problem domain shouldn't be blindly applied elsewhere.
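For the pool-allocator suggestion above, a minimal arena might look like the following; it is only a sketch, the sizes and names are arbitrary, and it deliberately never destroys individual objects (fine here, since the objects are PODs with nothing but memory to release):

#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Grabs large chunks, hands out aligned slices, and frees everything at
// once when the arena is destroyed. Assumes size <= kChunkSize.
class Arena
{
public:
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t))
    {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + size > kChunkSize)
        {
            chunks_.emplace_back(new unsigned char[kChunkSize]);
            offset = 0;
        }
        used_ = offset + size;
        return chunks_.back().get() + offset;
    }

    template <typename T, typename... Args>
    T* create(Args&&... args)
    {
        // Placement-new into arena memory; destructors are never run.
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

private:
    static constexpr std::size_t kChunkSize = 1 << 20;   // 1 MiB per chunk
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t used_ = 0;
};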
the resulting program will run on a modern OS (or in this case, that collects memory from dead processes)
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
If you take this approach, then anyone who uses your library and then uses valgrind to try to detect memory leaks in their program will report massive leaks coming from your library and complain to you about it, so if I were you I definitely would not do this.
If you are writing a library then you should provide a cleanup function that frees all memory that you allocated.
A practical example of why this is useful is if a Windows DLL uses your library. When the library is loaded, static data is initialized. When the library is unloaded, static data is cleared. If your library has some global pointers to memory that is never freed, then load-unload cycles of the DLL will leak memory.
If the objects are all of the same type, then rather than allocating each one independently, you could just put them all into a vector and have them refer to each other by index number instead of using pointers. The vector's built-in memory management takes care of allocating space as needed, and when you're done with the objects, you can just destroy the vector to deallocate them all at once. (Note that vector::clear() doesn't actually free the memory, though it does make it available to store a new set of objects in the vector.)
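A tiny sketch of that idea, with a made-up Entry type; entries refer to each other by index, and destroying the vector releases everything at once:

#include <cstdint>
#include <string>
#include <vector>

struct Entry
{
    std::string name;
    std::uint32_t parent;   // index of another Entry; UINT32_MAX means "none"
};

int main()
{
    std::vector<Entry> entries;
    entries.push_back({"root", UINT32_MAX});
    entries.push_back({"child", 0});          // refers to entries[0] by index

    // Indices stay valid even when the vector reallocates, unlike pointers.
    const Entry& child = entries[1];
    const Entry& parent = entries[child.parent];
    (void)parent;
}   // all entries are deallocated here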
If your objects aren't all the same type, you'll want to look into the more general concept of region-based memory management. As above, the idea is that you can allocate all your objects in a relatively small number of memory chunks (possibly just one), which can be freed later without having to track all the individual objects allocated within.
If your ownership and lifetimes are clear, I suggest you use unique_ptr for the owning pointers and raw pointers for the non-owning pointers. It should be less complex than shared_ptr and weak_ptr whilst still managing memory automatically.
I don't think not managing memory at all is an option. But using smart pointers to express ownership is not just about good memory management; it also makes the code easier to reason about.
Try to think of future maintenance work. Suppose your code needs to be broken up, or more work needs to happen after it runs. In that case you're opening yourself up to leaks, or to being a resource hog, later down the line.
Cleaning up (or being able to do so) is good. It may seem obvious now that an application should work with a single structured binary dataset throughout its entire lifetime, but you'll start kicking yourself once you realize you need to write an application that resets half-way through and starts over with another dataset.
(a related thing that's easy to overlook is that an application may need to work with two completely independent datasets at the same time, so try not to design your library to exclude that use case!)
That said, I think you may be focusing too much on the extremes. Code that shouldn't participate in memory management can use raw pointers, and this is reasonable when there is no risk of these pointers outliving your structured dataset in memory.
However, that doesn't mean that code that does participate in memory management needs to use raw pointers too. You can use smart pointers to manage your data structures even if you are passing raw pointers out to the user.
That aside, keep in mind that, in my experience, pointers are usually the wrong semantics: most use cases are more natural with reference or value semantics. That means passing around raw references, or passing around lightweight wrapper classes that have reference or value semantics but are implemented as containing a pointer to the actual data (or even a copy of the actual data, if appropriate).
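As a sketch of what such a lightweight wrapper might look like (the types here are invented for illustration):

#include <string>

// The library-internal node; users never touch this type directly.
struct EntryData
{
    std::string name;
    int value;
};

// A handle with reference semantics: cheap to copy and pass by value,
// but every copy refers to the same underlying entry.
class EntryRef
{
public:
    explicit EntryRef(EntryData* data) : data_(data) {}

    const std::string& name() const { return data_->name; }
    int value() const               { return data_->value; }
    void set_value(int v)           { data_->value = v; }

private:
    EntryData* data_;   // non-owning; the dataset owns the storage
};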
I am currently using Boehm Garbage Collector for a large application in C++. While it works, it seems to me that the GC is overkill for my purpose (I do not like having this as a dependency and I have to continually make allowances and think about the GC in everything I do so as to not step on its toes). I would like to find a better solution that is more suited to my needs, rather than a blanket solution that happens to cover it.
In my situation I have one specific class (and everything that inherits from that class) that I want to "collect". I do not need general garbage collection, in all situations except for this particular class I can easily manage my own memory.
Before I started using the GC, I used reference counting, but reference cycles and the frequent updates made this a less than ideal solution.
Is there a better way for me to keep track of this class? One that does not involve additional library dependencies like Boost.
Edit:
It is probably best if I give a rundown on the potential lifespan of my object(s).
A function creates a new instance of my class and may (or may not) use it. Regardless, it passes this new instance back to the caller as a return value. The caller may (or may not) use it as well, and again it passes it back up the stack, eventually getting to the top level function which just lets the pointer fade into oblivion.
I cannot just delete the pointer in the top level, because part of the "possible use", involves passing the pointer to other functions which may (or may not) store the pointer for use somewhere else, at some future time.
I hope this better illustrates the problem that I am trying to solve. I currently solve it with the Boehm Garbage Collector, but would like a simpler solution that doesn't involve extra dependencies, if possible.
In the embedded systems world, and in programs that are real-time or event-critical, garbage collection is frowned upon; often the use of dynamic memory at all is considered bad.
With dynamic memory allocation, fragmentation occurs. A garbage collector is used to periodically rearrange memory to reduce that fragmentation, for example by combining adjacent freed blocks. The primary issue is deciding when to perform this defragmentation, i.e. when to run the GC.
Some suggested alternatives:
Redesign your system to avoid dynamic memory allocation.
Allocate static buffers and use them. For example in an RTOS system, preallocate space for messages, rather than dynamically allocating them.
Use the Stack, not the Heap.
Use the stack instead of dynamic allocation where possible. This will not work if a variable needs a lifetime beyond the function's execution.
Place limits on variable sized data.
Along with static buffers, place limits on variable-length data or incoming data of unknown size. This may mean pausing the incoming data, or using multiple buffers when the input cannot be stopped.
Create your own memory allocator.
Create several memory pools that allocate different-sized blocks. This will reduce fragmentation. For example, for small blocks, a bitset could be used to determine which bytes are in use and which are available. Maybe another pool for 64-byte blocks is necessary. It all depends on your system's needs.
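A minimal sketch of that last idea, with arbitrary block sizes and counts, using a plain bool array as the "bitset":

#include <cstddef>

// A statically sized pool of fixed-size blocks tracked with a simple free
// map: no heap, and no fragmentation across different block sizes.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool
{
public:
    void* allocate()
    {
        for (std::size_t i = 0; i < BlockCount; ++i)
        {
            if (!used_[i])
            {
                used_[i] = true;
                return storage_ + i * BlockSize;
            }
        }
        return nullptr;   // pool exhausted
    }

    void release(void* p)
    {
        std::size_t i =
            static_cast<std::size_t>(static_cast<unsigned char*>(p) - storage_) / BlockSize;
        used_[i] = false;
    }

private:
    alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
    bool used_[BlockCount] = {};
};

// e.g. one pool for small messages and another for 64-byte blocks
BlockPool<16, 128> small_pool;
BlockPool<64, 64>  large_pool;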
If you really just need special handling for the memory allocations associated with a single class, then you should look at overloading the new operator for that class.
class MyClass
{
public:
    void *operator new(size_t);
    void operator delete(void *);
};
You can implement these operators to do whatever you need to track the memory: allocate it from a special pool, place references on a linked list for tracking, etc.
void* MyClass::operator new(size_t size)
{
void *p = my_allocator(size); // e.g., instead of malloc()
// place p on a linked list, etc.
return p;
}
void MyClass::operator delete(void *p)
{
// remove p from list...
my_free(p);
}
You can then write external code that can walk through the list you are keeping to inspect every currently-allocated instance of MyClass, GC'ing instances as appropriate for your situation.
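For example, the "linked list for tracking" could be something as simple as a static registry of live instances; my_allocator/my_free are replaced here with plain malloc/free just to keep the sketch self-contained:

#include <cstddef>
#include <cstdlib>
#include <set>

// Registry of every live MyClass allocation.
static std::set<void*>& live_instances()
{
    static std::set<void*> instances;
    return instances;
}

void *MyClass::operator new(std::size_t size)
{
    void *p = std::malloc(size);        // or a pool allocator
    live_instances().insert(p);         // register for later inspection
    return p;
}

void MyClass::operator delete(void *p)
{
    live_instances().erase(p);
    std::free(p);
}

// External code can iterate live_instances() to inspect (or collect)
// every currently-allocated instance of MyClass.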
With memory, you should always try to have clear ownership and knowledge of lifetime. Lifetime determines where you take the memory from (along with other factors), i.e. the stack for scope-limited data, a pool for reused data, etc. Ownership tells you when, and whether, to free memory. In your case, the GC has the ownership and decides when to free. With reference counting, the wrapper class carries that logic. Unclear ownership leads to hard-to-maintain code if manual memory management is used: you must avoid use-after-free, double frees, and memory leaks.
To solve your problem, figure out who should keep ownership; this will dictate the algorithm to use. GC and reference counting are popular choices, but there are many others. If ownership is unclear, give it to a third party whose job it is to keep track of it. If ownership is shared, make sure all parties are aware of it, perhaps by enforcing it via specialized classes. It can also be enforced by simple convention, e.g. objects of type foo never keep pointers to type bar internally, because they do not own them; and if they do hold such pointers, they cannot assume those pointers are always valid and may have to check for validity first.
If you find this hard to determine, it could be a sign that the code is very complex. Could it be made simpler?
Understanding how your memory is used and accessed is key to writing clean code for maintenance and performance optimizations. This is true regardless of language used.
Best of luck.
I've been working on a bunch of image processing programs... nothing fancy, mostly quick-and-dirty experimenting. The image data is stored in vectors which are declared on the stack (I try to avoid using pointers when I don't need to pass data around). I've noticed that some of my functions have been behaving very strangely despite countless rounds of debugging and stepping. Sometimes the debugger would give me an error that it cannot evaluate a certain variable, among other things. Things generally just do not make sense, and past experience tells me that when this happens there is usually some kind of overflow or memory corruption going on. The first thing that came to mind was that it was probably due to me storing lots of image data in vectors.
However, I was under the impression that vectors store their actual data in the heap, and so I thought it wouldn't hurt to have a few of these large vectors on the stack. Am I wrong in thinking this? Should I be allocating my vectors and storing them in the heap rather than the stack?
Thanks,
[...]vectors store their actual data in the heap
vector, like all other containers, uses an allocator object for memory management. Typically, if you don't specify anything as the template's second parameter, the default allocator -- std::allocator from <memory> -- is used. It is the allocator's responsibility to reserve memory. It is free to do so either from the free-store or on the stack.
Most implementations store a pointer (or a small set of pointers) within the vector object itself, which points to the actual memory on the free-store.
I've noticed that some of my functions have been behaving very strangely despite countless amounts of debugging and stepping
You may want to check that you are using your vectors properly. Look up in the standard what guarantees you get with each member function, what conditions must be satisfied by the contained types, and when your iterators get invalidated. That should be a good start.
std::vector does not store its memory within itself. It allocates memory from the heap (or wherever your allocator gets it from). So whether the vector itself is on the stack is irrelevant.
I would be willing to say that 99.9% of vector implementations store all of their data in the heap. Maybe somebody out there made a stack-based implementation, but you're probably not dealing with that. If random, intermittent failures are occurring, an unchecked corner case in your pointer arithmetic is the more likely culprit. Either way, we can't know without you posting code.
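A quick way to convince yourself of this: the vector object on the stack is only a few pointers in size, regardless of how much data it holds.

#include <iostream>
#include <vector>

int main()
{
    // The vector object lives on the stack, but its elements do not.
    std::vector<unsigned char> image(1920 * 1080 * 4);   // ~8 MB of pixel data

    // Typically 24 bytes on a 64-bit platform, no matter the element count.
    std::cout << "sizeof(vector): " << sizeof(image) << '\n'
              << "elements held:  " << image.size() << '\n';
}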
I want to customize the std::vector class in order to use an OpenGL buffer object as storage.
Is it possible to do so without relying on a specific implementation of the STL, by writing a custom allocator and/or subclassing the container?
My problem is how to create wrapper methods for glMapBuffer/glUnmapBuffer in order to use the buffer object for rendering while leaving the container in a consistent state.
Is it possible to do so without relying on a specific implementation of the STL, by writing a custom allocator and/or subclassing the container?
You can, but that doesn't make it a good idea. And even whether you can depends on your compiler/standard library.
Before C++11, allocators cannot have state. They cannot have useful members, because containers are not required to actually use the allocator instance you pass them; they are allowed to create their own. So you can set the type of the allocator, but you cannot give a container a specific allocator instance and expect that instance to always be used.
So your allocator cannot just create a buffer object and store the object internally. It would have to use a global (or private static or whatever) buffer. And even then, multiple instances would be using the same buffer.
You could get around this by having the allocator store (in private static variables) a series of buffer objects and mapped pointers. This would allow you to allocate a buffer object of a particular size, and get a mapped pointer back. The deallocator would use the pointer to figure out which buffer object it came from and do the appropriate cleanup for it.
Of course, this would be utterly useless for actually doing anything with those buffers. You can't use a buffer that is currently mapped. And if your allocator deletes the buffer once the vector is done with the memory, then you can never actually use that buffer object for anything.
Also, don't forget: unmapping a buffer can fail for unspecified reasons. If it does fail, you have no way of knowing that it did, because the unmap call is wrapped up in the allocator. And destructors shouldn't throw exceptions.
C++11 does allow allocators to have state, which makes this more or less possible. You can have the allocator survive the std::vector that built the data, and therefore you can query the allocator for the buffer object post-mapping. You can also store whether the unmap failed.
That still doesn't make it a good idea. It'll be much easier overall to just use a regular old std::vector and use glBufferSubData to upload it. After all, mapping a buffer with READ_WRITE almost guarantees that you're going to get regular memory rather than a GPU address, which means that unmapping is just going to perform a DMA transfer, which is what glBufferSubData does anyway. You won't gain much performance by mapping.
Reallocation with buffer objects is going to be much more painful. Since the std::vector object is the one that decides how much extra memory to reserve, you can't play games like allocating a large buffer object and then just expanding the amount of memory that the container uses. Every time the std::vector decides that it needs more memory, you're going to have to create a new buffer object name, and the std::vector will do an element-wise copy from mapped memory to mapped memory.
Not very fast.
Really, what you want is to just make your own container class. It isn't that hard. And it'll be much easier to control when it is mapped and when it is not.
I want to customize the std::vector class in order to use an OpenGL buffer object as storage.
While this certainly is possible, I strongly discourage doing so. Mapped buffer objects must be unmapped before they can be used by OpenGL as data input. Thus such a derived container, let's call it glbuffervector, would have to map/unmap the buffer object for each and every access. Also, taking the address of a dereferenced element would not work, since after dereferencing, the buffer object would be unmapped again.
Instead of trying to make a vector that stores its data in a buffer object, I'd implement a referencing container that can be created from an existing buffer object, together with a layout, so that iterators can be obtained. Following an RAII scheme, the buffer object would be mapped when an instance is created, and unmapped when the instance is destroyed.
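A rough sketch of such a referencing, RAII-style container; the class name and the fixed GL_ARRAY_BUFFER target are just for illustration:

#include <GL/glew.h>
#include <cstddef>

// Maps an existing buffer object on construction and unmaps it on
// destruction, exposing the mapping as a typed range.
template <typename T>
class MappedBuffer
{
public:
    MappedBuffer(GLuint buffer, std::size_t count, GLbitfield access)
        : buffer_(buffer), count_(count)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buffer_);
        data_ = static_cast<T*>(glMapBufferRange(
            GL_ARRAY_BUFFER, 0, count_ * sizeof(T), access));
    }

    ~MappedBuffer()
    {
        glBindBuffer(GL_ARRAY_BUFFER, buffer_);
        glUnmapBuffer(GL_ARRAY_BUFFER);   // note: unmapping can fail
    }

    MappedBuffer(const MappedBuffer&) = delete;
    MappedBuffer& operator=(const MappedBuffer&) = delete;

    T* begin()                   { return data_; }
    T* end()                     { return data_ + count_; }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    GLuint buffer_;
    std::size_t count_;
    T* data_;
};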
Is it possible to do so without relying on a specific implementation of the STL, by writing a custom allocator and/or subclassing the container?
If you are using Microsoft Visual C++, there is a blog post describing how to define a custom STL allocator: "The Mallocator".
I think writing custom allocators is STL-implementation-specific.
I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to them. I've been considering two options: either a pointer-to-pointer in the smart pointer type itself, or having the collector maintain a list of the pointers that point to each object so that they can be updated in place. The latter removes the need for a double dereference when accessing the pointer, but adds some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straightforward about grafting an extra GC onto C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a GC in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly, and the situation is that it is unlikely to ever exist because it has limited utility.
What you actually need is a "re-address-me" operator. What happens is that you do not actually move objects around; you just use mmap to change the object's address. This is much faster, and, in effect, it uses the VM's features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do efficiently in C++: you'd have to move to a temporary first. In C it is much easier: you can use memmove. At some stage, all the pointers to or into the moved objects have to be adjusted.
Using handles does not solve this problem; it just reduces the problem from arbitrarily sized objects to constant-sized ones: these are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handles from the array at random... you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
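For illustration only (this is not Felix's actual structure), a shape descriptor along those lines might carry something like:

#include <cstddef>

// Per-type information a precise, compacting collector needs: where the
// GC pointers live, how to finalise the object, and whether it may move.
struct Shape
{
    std::size_t         object_size;      // bytes occupied by one object
    std::size_t         pointer_count;    // number of GC pointers inside it
    const std::size_t*  pointer_offsets;  // byte offsets of those pointers
    void              (*finalise)(void*); // by default, calls the destructor
    bool                movable;          // safe to relocate with memmove?
};

// Allocation would then look something like:
//   void* operator new(std::size_t n, const Shape& shape, Collector& gc);
//   T* p = new (shape_of_T, gc) T(args...);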
Now, the key point is: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see a value that looks like a pointer but happens to be an integer: some object won't be collected, but this is usually no big deal. For compaction, however, you have to adjust the pointers, and you'd better not change that integer: so you have to know for sure when something is a pointer, which means your collector has to be precise: the shape must be known.
In Ocaml this is relatively simple: everything is either a pointer or integer and the low bit is used at run time to tell. Objects pointed at have a code telling the type, and there are only a few types: either a scalar (don't scan it) or an aggregate (scan it, it only contains integers or pointers).
This is a pretty straight-forward question so here's a straight-forward answer:
Mark-and-sweep (and occasionally mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (it avoids double dereferences). It's also very easy to implement. Since you're not worried about collection performance impact (mark-and-sweep tends to freeze up the process nondeterministically), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
You could implement pointer updates without using double indirection by using techniques like a shadow stack, but this will be slow and very error-prone if you're writing the C++ code by hand.
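As a final illustration, here is roughly what "just a pointer bump" means for a nursery; this is only a sketch, with no collection logic:

#include <cstddef>
#include <cstdint>

// Reserve a contiguous region, advance a cursor per allocation, and trigger
// a minor collection when the cursor reaches the end.
class Nursery
{
public:
    explicit Nursery(std::size_t size)
        : begin_(new std::uint8_t[size]), cursor_(begin_), end_(begin_ + size) {}

    ~Nursery() { delete[] begin_; }

    void* allocate(std::size_t n)
    {
        n = (n + sizeof(void*) - 1) & ~(sizeof(void*) - 1);   // align
        if (cursor_ + n > end_)
            return nullptr;          // caller runs a minor GC and retries
        void* p = cursor_;
        cursor_ += n;
        return p;
    }

private:
    std::uint8_t* begin_;
    std::uint8_t* cursor_;
    std::uint8_t* end_;
};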