What are the internal differences between choosing std::vector and a dynamically allocated array? I mean not only performance differences, as in this similarly titled question.
I mean, I am trying to design a library. So I want to offer a wrapper over a StackArray, which is just a C-style array with some member methods, containing T array[N] as a member. No indirections, and the new operator deleted to force the implementor to always have the array stored on the stack.
Now I want to offer the dynamic variant. With a little effort, I can just declare something like:
template <typename T>
class DynArray
{
    T* array;
    size_t size;
    size_t capacity;
};
But... this seems pretty similar to a basic implementation of a C++ vector.
Also, an array stored on the heap can be resized by copying the elements to a new memory location (is this true?). That's pretty much what a vector does when a push_back() operation exceeds its allocated capacity, for example, right?
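For illustration, I imagine the grow step looking roughly like this (a minimal sketch, ignoring exception safety and non-trivial element types):

    #include <algorithm>
    #include <cstddef>

    // Hypothetical grow step: allocate a bigger block, copy, free the old one.
    template <typename T>
    void grow(T*& array, std::size_t size, std::size_t& capacity)
    {
        std::size_t newCapacity = (capacity == 0) ? 1 : capacity * 2;
        T* newArray = new T[newCapacity];           // new, larger block
        std::copy(array, array + size, newArray);   // copy the elements over
        delete[] array;                             // release the old block
        array = newArray;
        capacity = newCapacity;
    }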
Should I offer both APIs if there are some notable differences? Or am I overcomplicating the design of the library, and should I just have my StackArray, with the vector serving as the safe abstraction over a dynamically allocated array?
First, there is a (usually controversial) tension between using the modern tools the standard provides and the legacy ones.
You should usually be studying and asking about modern C++ features rather than comparing them with the old ones. But for learning purposes, I have to admit it's quite interesting to dive deep into these topics sometimes.
With that in mind, std::vector is a collection that does much more than just care about the bytes stored in it. There is a really important constraint, that the data must lie in contiguous memory, and std::vector ensures this in its internal implementation. It also has a well-known, well-tested implementation of the RAII pattern, with correct usage of the new[] and delete[] operators. You can reserve storage and emplace_back() elements in a convenient and performant way, which makes this collection really unique... there are a lot of reasons why std::vector is really different from a dynamically allocated array.
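For instance, a small sketch of the reserve + emplace_back pattern:

    #include <string>
    #include <vector>

    int main()
    {
        std::vector<std::string> names;
        names.reserve(3);              // one allocation up front
        names.emplace_back("Ada");     // constructs each string in place;
        names.emplace_back("Grace");   // no reallocation happens here
        names.emplace_back("Mary");
    }   // RAII: the vector releases its storage automatically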
It's not only about manual memory management, which is almost always undesirable in modern C++ (embedded systems, or operating systems themselves, are a good place to dispute that last sentence). It's about having a tool, std::vector<T>, that makes your life as a developer easier, especially in an error-prone language like C++.
Note: I say error-prone because C++ is a really hard language to master, one that needs a lot of study and training. You can build almost anything in the world with it, and it has an incredible number of features that aren't beginner friendly. Also, the backward-compatibility constraint makes it much bigger, with literally thousands of things you must care about. So, with great power always comes great responsibility.
Related
I am writing a small toy language/compiler for (fun and) scientific applications. The core design principles are simplicity and efficiency (some kind of "modern" Fortran, if you will). The language would have built-in arrays that would look something like this:
let x: Real[5] = {1.0, 2.0, 3.0, 4.0, 5.0}
let n = get_runtime_value()
let y: Integer[100,n] = ...
In the above statement, the user does not explicitly state whether the array should be allocated on the stack or on the heap. If at all possible, I'd rather not expose that to the users (my reasoning is that most engineers don't know the difference and should not have to care; they have other problems to worry about).
Technically, I could write something like:
if (some input parameter cannot be known at compile time)
    allocate on the heap
else  # candidate for the stack
    if (the array is not returned by the function && the allocated size is smaller than some threshold)
        allocate on the stack
    else
        allocate on the heap
However, this design scares me for a few reasons:
Added complexity, longer compilation times?
In C++, the compiler can perform RVO and return a value on the stack directly. I guess I could make the algorithm more complex to detect such cases, but this will make the whole thing more complex/buggy/slow to compile.
A slight change in array size could cause the switch from stack to heap. That could be confusing for the user. Defining this threshold would also require some care.
I need to check that some reference to that array is not being returned either (as well as references of references, etc.). I imagine that could be expensive to track down.
Note that I do not want to expose pointers or references in my language. Arrays will always be passed by reference under the hood.
Is there a neat way in the literature to solve this problem? Has it been done before in an existing language? All the languages I know require the user to specify where they want their data: Fortran has ::allocatable, C++ has std::vector and std::array, etc. I could also do something like llvm's SmallVector and always allocate a few elements on the stack before moving to the heap. Does my approach make any sense at all? I am using this project to learn more about compilers and language design. Is there something I should be watchful for?
The choice is really up to you, but if I were you, I would let the user choose whether to allocate on the heap or the stack. If not, it will most likely get very confusing for you and for the user. If you still want to implement the feature, I have some tips.
Instead of checking what cannot be known at compile time, check what can be known at compile time; it will simplify things.
Decide whether everything lives on the heap or the stack by default; this will make it easier to handle situations where you need to switch from stack to heap (or vice versa). For example, if the default is the stack, you can switch an array to the heap when append is called, as in the sketch after these tips.
In conclusion, I recommend you let the user be explicit about declaring an array on the stack or heap.
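To illustrate that stack-to-heap switch, here is a rough C++ sketch (hypothetical, simplified to int elements) of the SmallVector-style behaviour your runtime could emit: the array lives in an inline buffer until an append exceeds its capacity, then moves to the heap:

    #include <cstddef>
    #include <cstring>

    struct SmallIntArray {
        int inlineBuf[8];            // inline (stack) storage
        int* data = inlineBuf;       // points at inlineBuf or at a heap block
        std::size_t size = 0;
        std::size_t capacity = 8;

        void append(int value) {
            if (size == capacity) {                  // inline buffer exhausted:
                std::size_t newCap = capacity * 2;
                int* heap = new int[newCap];         // switch to the heap
                std::memcpy(heap, data, size * sizeof(int));
                if (data != inlineBuf)
                    delete[] data;                   // free a previous heap block
                data = heap;
                capacity = newCap;
            }
            data[size++] = value;
        }

        ~SmallIntArray() {
            if (data != inlineBuf)
                delete[] data;
        }
    };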
What (if anything) can be done more efficiently with an array rather than a container?
I recently learned about C++ standard container classes. They have clear advantages and solve common problems with C-style arrays. The FAQs list on "why arrays are evil" can be summarized loosely like this:
1. subscripts are not checked
2. often it is required to allocate memory from the heap
3. not easy to insert elements in the middle
4. always passed as reference
I guess there are many cases where one can live with these disadvantages. However, I am a bit puzzled about the question: what is it that can be done more efficiently / more easily with arrays rather than with containers? Or is there actually nothing like that, and should I really not care about arrays anymore?
"However, I am a bit puzzled about the question, what is it that can be done more efficient / easier with arrays rather than with containers?"
Well, if you're referring to C-style arrays, then with the current C++ standard there's IMHO nothing left of the disadvantages of the classic standard C++ container classes (like e.g. std::vector). The one dependency is that they need dynamically allocatable memory (via new), and that could well be a restriction in your current (OS / bare-metal) environment if it isn't available out of the box.
The current standard provides std::array, which works without any need for dynamic memory allocation but still addresses all of the claims:
"1. subscripts are not checked"
std::array provides checked subscripting via its at() member (operator[] stays unchecked, just like a C-style array)
"2. often it is required to allocate memory from the heap"
With std::array it's the client's choice where the array is actually allocated. And std::vector covers the heap-allocated case anyway.
"3. not easy to insert elements in the middle"
Well, that's one point std::array doesn't support well out of the box. But again, std::vector and other container classes support it well (as long as your environment supports dynamic memory allocation)
"4. always passed as reference"
std::array supports passing by value as well as by reference, something that can't really be achieved with C-style arrays (which decay to pointers).
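For instance (a small sketch):

    #include <array>
    #include <iostream>
    #include <stdexcept>

    // The full size is part of the type, so nothing decays to a pointer here.
    void print(const std::array<int, 3>& a)
    {
        for (int x : a) std::cout << x << ' ';
        std::cout << '\n';
    }

    int main()
    {
        std::array<int, 3> a{1, 2, 3};
        print(a);
        try {
            a.at(5) = 0;               // checked access: throws
        } catch (const std::out_of_range& e) {
            std::cout << e.what() << '\n';
        }
    }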
There may be specialized cases, though, e.g. reusable object instance pools or flyweight object instances, that you might want to solve using the placement new operator. Implementations of such solutions usually involve operating on raw C-style arrays.
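A minimal sketch of such a pool (simplified: fixed capacity, no reuse of freed slots, the caller must pair create/destroy):

    #include <cstddef>
    #include <new>
    #include <utility>

    template <typename T, std::size_t N>
    class Pool {
        alignas(T) unsigned char storage[N * sizeof(T)];  // raw c-style array
        std::size_t used = 0;
    public:
        template <typename... Args>
        T* create(Args&&... args) {
            if (used == N) return nullptr;                // pool exhausted
            void* slot = storage + used++ * sizeof(T);
            return new (slot) T(std::forward<Args>(args)...);  // placement new
        }
        void destroy(T* p) { p->~T(); }                   // explicit destructor call
    };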
Built-in arrays are a low-level tool with a somewhat awkward interface. They are needed to implement easier-to-use classes, though: one of the nice things about C++ is that it exposes many of the low-level tools needed to create efficient implementations of higher-level abstractions. To me, the primary uses of built-in arrays are these:
As the mechanism to implement higher-level abstractions like std::vector<T> or std::array<T, N> (well, std::vector<...> and family don't really use built-in arrays but deal directly with raw memory internally).
When I need a stack-allocated array initialized with a sequence of values, I'd use built-in arrays (std::array<...> can't deduce the number of arguments, and anything using std::initializer_list<T> to get initialized won't have a fixed size).
Even though std::array<T, N> is really just a rewrite of [some of] the functionality of built-in arrays, it has the nice feature that a debug implementation can assert the assumptions that are made.
BTW, your list doesn't include one of the bigger issues: if you have a variable-sized array, you'll need to give your element type a default constructor. With C++11 the default constructor can, at least, be defaulted (which matters if your class needs another constructor), and that can avoid initializing objects which are about to be overwritten anyway. Either way, the various container classes take a lot of the complexity out of the picture.
Arrays on the stack can be more efficient than a vector, since the vector will always do a separate memory allocation. You're unlikely to notice the difference unless it's something performed many times in a large tight loop.
Or is there actually nothing like that and I really should not care about arrays anymore?
Consider that C++ dates to 1983, and it has seen a number of big changes since then. The container classes that are available now were designed to avoid the pitfalls that you listed, so it's not surprising that they're better in those respects. There is one thing that you can do with C-style arrays that you can't with modern container classes, however: you can compile your code with very old compilers. If you have a pressing need to maintain compatibility with compilers from the mid-1980's, you should stick with C-style arrays. Otherwise, use the newer, better approach.
C-style arrays have several advantages over STL containers (specifically std::array). Of course, the same is true the other way around.
For one, with C-style arrays you have control over the memory layout, which is extremely useful when interpreting network packets or any similar data source. It allows you to treat a block of memory as a struct, saving copy/assignment operations, which is necessary in some performance-sensitive applications.
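For example, something along these lines (note that the direct reinterpret_cast often seen in such code technically violates the strict-aliasing rules; memcpy expresses the same intent, and compilers optimize it to a plain load):

    #include <cstdint>
    #include <cstring>

    // Layout chosen to match a hypothetical wire format
    // (real code must also handle padding and byte order).
    struct PacketHeader {
        std::uint16_t type;
        std::uint16_t length;
        std::uint32_t sequence;
    };

    PacketHeader parseHeader(const unsigned char* buffer)
    {
        PacketHeader h;
        std::memcpy(&h, buffer, sizeof h);  // "reinterpret" the raw bytes
        return h;
    }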
Another reason is simplicity - in a case where you don't need any of the benefits std containers offer, why use them?
And there's compatibility: different STL implementations vary between compilers. Using arrays instead of STL containers in the interfaces of shared libraries (so/dll) lets users write against the shared library with almost any compiler. This is explicitly not true for STL containers.
Finally, there are low-level optimizations. There are situations where arrays can be faster than their STL equivalent std::array, although these situations are somewhat rare.
Currently, I have all my objects managing their own memory, allocating with new in their constructors typically, and using delete in their destructors. This works for now, but the number of classes I have that use arbitrary amounts of memory is growing. The fact that new is essentially a "request" also bothers me, since these objects have no code within them to handle being told "no", and I don't want to rely on Exception Handling if I do not need to.
Is it beneficial in terms of performance to completely shield all memory-allocating calls behind a single class that handles every memory allocation on the heap, probably allocating large chunks at a time and using placement new to deal out references?
Is the use of memory allocation in smaller classes a big enough concern to even bother with this?
Can I still use STL containers and force them to use the heap I provide?
Thank you in advance!
Can I still use STL containers and force them to use the heap I provide?
STL containers accept custom allocators:
http://en.wikipedia.org/wiki/Allocator_(C%2B%2B)#Custom_allocators
Here is a thread with links to samples:
Compelling examples of custom C++ allocators?
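As a minimal sketch (C++11-style; a real one would draw from your own heap instead of malloc, which I use here only as a stand-in):

    #include <cstdlib>
    #include <new>
    #include <vector>

    template <typename T>
    struct MyAlloc {
        using value_type = T;
        MyAlloc() = default;
        template <typename U> MyAlloc(const MyAlloc<U>&) {}

        T* allocate(std::size_t n) {
            if (void* p = std::malloc(n * sizeof(T)))
                return static_cast<T*>(p);
            throw std::bad_alloc();
        }
        void deallocate(T* p, std::size_t) { std::free(p); }
    };
    template <typename T, typename U>
    bool operator==(const MyAlloc<T>&, const MyAlloc<U>&) { return true; }
    template <typename T, typename U>
    bool operator!=(const MyAlloc<T>&, const MyAlloc<U>&) { return false; }

    std::vector<int, MyAlloc<int>> v;  // this vector now allocates via MyAlloc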
Is it beneficial in terms of performance ... ?
You can only find out by writing your application, coming up with a set of reproducible test scenarios, and running your code in a profiler. If you find the memory allocation to be a significant portion of the running time, then you might benefit from a better allocation strategy.
If you can break your program up at a feature level and can come up with realistic scenarios for each case, you don't have to have your whole program working to do this. But remember that time spent optimizing is time that could be spent testing or coding new features :) Do what is necessary and no more...
Is the use of memory allocation in smaller classes a big enough concern to even bother with this?
Depending on your program, how sensitive you are to big allocation hitches, how often you allocate in loops, etc, it is possible. Profile :)
While developing your app, you can still be sensitive to allocations - create automatic storage (stack local) variables when you can, and allocate dynamically only when you must.
I'm not really sure I understand your problem here. That said, using STL containers in C++03 with your custom heap will be challenging, since allocators are considered stateless. Also, why don't you want to rely on exception handling? Are you aware that there is a no_throw version of new?
Edit: The no-throw version of new is invoked like this: new (std::nothrow) Type[size];. If the allocation fails, it will return a null pointer (0) instead of throwing std::bad_alloc.
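For example:

    #include <new>

    int* buffer = new (std::nothrow) int[1024];
    if (buffer == nullptr) {
        // allocation failed: handle the "no" here, no exception is thrown
    } else {
        // ... use buffer ...
        delete[] buffer;
    }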
Can I still use STL containers and force them to use the heap I provide?
Yes, look up STL allocators.
Is it beneficial in terms of performance to completely shield all calls that allocate memory, to a single class that handles every memory allocation on the heap, probably allocating large chunks at a time and using placement new to deal out references?
That's basically what a good implementation of malloc or new already does. I'd advise against doing it yourself unless the situation is very performance-critical and everything else has already been optimized. Why? Because good, well-thought-out memory management is very hard to get even bug-free, let alone working and optimized.
Is the use of memory allocation in smaller classes a big enough concern to even bother with this?
It depends. If you're programming a coffee machine or a gaming device with 16k of memory, perhaps; on a regular desktop computer or laptop, probably not. Also remember that the stack is very fast (for both allocation and access) while the heap is a lot worse for allocation and slightly worse (not so sure about that, TBH) for usage, so in day-to-day situations you want to prefer the stack.
You say you're calling new in your constructors... Is that really necessary? I mean, instead of this...
#include <vector>

class A{
    std::vector<int>* v;
public:
    A( int vectorSize ){
        v = new std::vector<int>( vectorSize, 0 );
    }
    ~A(){
        delete v;
    }
};
...it's always preferable to do this:
class A{
    std::vector<int> v;
public:
    A( int vectorSize ):
        v( vectorSize, 0 ){
    }
};
This way you avoid using the heap. You should only use the heap when you have no other choice.
Aside from that, as the others said, writing your own custom allocator is a very complex task and should only be done in a performance-critical scenario.
I've done this before, and here's my experience:
It can get very messy very quickly. Especially since you now take memory allocation into your hands and you have to deal with stuff like fragmentation and re-entrancy.
However, I've seen performance boosts upwards of 20% due to being able to bypass the OS overheads.
For your last question, I think there is a way to make them use a custom allocator, but I've never done it before. I do most of my coding in C.
EDIT:
Based on your comment, here's an update. You don't really have to deal with building an allocator. You can probably get away with just pointing all memory allocations to your custom class. Then your custom class will call malloc() or new and catch whatever NULL or exception is returned.
(Though it will take some work replacing every single new with your own malloc().)
According to C++ Primer 4th edition, page 755, there is a note saying:
Modern C++ programs ordinarily ought to use the allocator class to allocate memory. It is safer and more flexible.
I don't quite understand this statement.
So far all the materials I read teach using new to allocate memory in C++.
An example of how vector class utilize allocator is shown in the book.
However, I cannot think of other scenarios.
Can anyone help to clarify this statement? and give me more examples?
When should I use an allocator and when should I use new? Thanks!
For general programming, yes you should use new and delete.
However, if you are writing a library, you should not!
I don't have your textbook, but I imagine it is discussing allocators in the context of writing library code.
Users of a library may want control over exactly what gets allocated from where. If all of the library's allocations went through new and delete, the user would have no way to have that fine-grained level of control.
All STL containers take an optional allocator template argument. The container will then use that allocator for its internal memory needs. By default, if you omit the allocator, it will use std::allocator which uses new and delete (specifically, ::operator new(size_t) and ::operator delete(void*)).
This way, the user of that container can control where memory gets allocated from if they desire.
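For completeness, the std::allocator interface itself also separates allocation from construction, which is where the extra safety and flexibility the book mentions come from. A minimal sketch (C++11 style; construct/destroy were later deprecated in favor of std::allocator_traits):

    #include <memory>
    #include <string>

    int main()
    {
        std::allocator<std::string> alloc;
        std::string* p = alloc.allocate(3);  // raw memory, no constructors run

        alloc.construct(p, "one");           // construct only what you need
        alloc.construct(p + 1, "two");

        alloc.destroy(p + 1);                // destroy before deallocating
        alloc.destroy(p);
        alloc.deallocate(p, 3);              // size must match the allocate call
    }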
Example of implementing a custom allocator for use with STL, and explanation: Improving Performance with Custom Pool Allocators for STL
Side Note: The STL approach to allocators is non-optimal in several ways. I recommend reading Towards a Better Allocator Model for a discussion of some of those issues.
Edit in 2019: The situation in C++ has improved since this answer was written. Stateful allocators are supported in C++11, and that support was improved in C++17. Some of the people involved in the "Towards a Better Allocator Model" were involved in those changes (eg: N2387), so that's nice (:
The two are not contradictory. Allocators are a PolicyPattern or StrategyPattern used by the STL containers to allocate chunks of memory for use with objects.
These allocators frequently optimize memory allocation by allowing
* ranges of elements to be allocated at once, and then initialized using a placement new
* items to be selected from secondary, specialized heaps depending on blocksize
One way or another, the end result will (almost always) be that the objects are allocated with new (placement or default).
Another vivid example is how e.g. the boost library implements smart pointers. Because smart pointers are very small (with little overhead), the allocation overhead might become a burden. It would make sense for the implementation to define a specialized allocator for those allocations, so one can have an efficient std::set<> of smart pointers, std::map<..., smartpointer>, etc.
(Now, I'm almost sure boost actually optimizes storage for most smart pointers by avoiding any virtuals, and therefore the vtable, making the class a POD structure with only the raw pointer as storage; some of this example will not apply. But then again, extrapolate to other kinds of smart pointers: refcounting smart pointers, pointers to member functions, pointers to member functions with an instance reference, and so on.)
I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to them. I've been thinking about either a pointer-to-pointer in the pointer type itself, or having the collector maintain a list of the pointers that point to each object so they can be modified. The latter removes the need for a double dereference when accessing the pointer, but adds some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straightforward about grafting extra GC onto C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a GC in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly, and the situation is that it is unlikely to ever exist because it has limited utility.
What you actually need is a "re-address-me" operator. What happens is that you do not actually move objects around; you just use mmap to change the object's address. This is much faster and, in effect, uses the VM features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do efficiently in C++: you'd have to move to a temporary first. In C it is much easier: you can use memmove. At some stage, all the pointers to or into the moved objects have to be adjusted.
Using handles does not solve this problem; it just reduces the problem from arbitrarily sized objects to constant-sized ones. These are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handles from the array randomly... you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
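A rough C++ sketch of what such a shape descriptor might contain (hypothetical names, not Felix's actual definitions):

    #include <cstddef>

    // Enough metadata for a precise, compacting collector to scan,
    // finalise, and (possibly) relocate an object.
    struct Shape {
        std::size_t size;                   // object size in bytes
        std::size_t numPointers;            // how many GC pointers it holds
        const std::size_t* pointerOffsets;  // byte offsets of those pointers
        void (*finalize)(void*);            // by default, calls the destructor
        bool movable;                       // safe to relocate with memmove?
    };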
Now, the key point: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see a value that looks like a pointer but happens to be an integer: some object won't be collected, but that's usually no big deal. But for compaction you have to adjust the pointers, and you'd better not change that integer: so you have to know for sure when something is a pointer. Your collector has to be precise: the shape must be known.
In OCaml this is relatively simple: everything is either a pointer or an integer, and the low bit is used at run time to tell them apart. Objects pointed at have a code giving the type, and there are only a few kinds: either a scalar (don't scan it) or an aggregate (scan it; it contains only integers or pointers).
This is a pretty straight-forward question so here's a straight-forward answer:
Mark-and-sweep (occasionally combined with mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (it avoids double dereferences). It's also very easy to implement. Since you're not worried about the performance impact of collection (mark-and-sweep tends to freeze up the process nondeterministically), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
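A minimal sketch of that pointer bump (alignment handling omitted):

    #include <cstddef>

    struct Nursery {
        char* next;   // next free byte
        char* limit;  // one past the end of the nursery

        void* allocate(std::size_t bytes) {
            if (static_cast<std::size_t>(limit - next) < bytes)
                return nullptr;   // nursery full: trigger a minor collection
            void* p = next;
            next += bytes;        // the "bump"
            return p;
        }
    };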
You could implement pointer updates without double indirection by using techniques like a shadow stack, but this will be slow and very error-prone if you're writing the C++ code by hand.