How can I decide between standard containers (std::vector or std::array) and smart pointers pointing to arrays?
I know containers are objects for memory management. They are exception safe, they will not leak memory, and they provide a variety of functions for managing their contents (push_back etc.). Smart pointers are pointers that also do not leak memory, because they delete themselves when they are no longer needed (e.g. a unique_ptr when it goes out of scope). Presumably containers incur some overhead every time they are created.
My question is: how can I decide which method to use, and why?
std::vector<unsigned char> myArray(3 * outputImageHeight * outputImageWidth);
std::unique_ptr<unsigned char[]> myArray(new unsigned char[3 * outputImageHeight * outputImageWidth]);
I would use the vector. Your pointer version offers basically no improvement over the vector, and you lose a lot of useful functionality. You're most likely going to need to query the size and iterate over your array at some point; with a vector you get this for free, whereas you'd need to implement it yourself for the pointer version, at which point you may as well have used the vector to begin with.
There may be a performance cost instantiating the vector, but I doubt that it would be a bottleneck for most applications. If you're creating so many vectors that instantiating them is costing you time, you can probably be smarter about managing them (pooling your memory, custom vector allocators, etc). If in doubt, measure.
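To make that concrete, here's a minimal sketch of the two versions side by side (the function and parameter names are just placeholders): with the vector, the size and iteration come built in; with the pointer, you carry the size around yourself.

#include <cstddef>
#include <memory>
#include <vector>

void example(std::size_t height, std::size_t width) {
    const std::size_t n = 3 * height * width;

    // With a vector, the size travels with the data and iteration is built in.
    std::vector<unsigned char> v(n);
    for (unsigned char& byte : v) byte = 0xFF;

    // With a unique_ptr, you carry the size separately and iterate by hand.
    std::unique_ptr<unsigned char[]> p(new unsigned char[n]);
    for (std::size_t i = 0; i != n; ++i) p[i] = 0xFF;
}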
One example where you might need to use the unique_ptr<> version might be if you're working with a library written in C where you lose ownership of the array. For example:
std::unique_ptr<unsigned char[]> myArray(
    new unsigned char[3 * outputImageHeight * outputImageWidth]);

my_c_lib_data_t cLibData;
int result = my_c_lib_set_image(cLibData, myArray.get());
if (MYLIB_SUCCESS == result) {
    // mylib successfully took ownership of the char array, so release the pointer.
    myArray.release();
}
If you have the choice though, prefer to use C++ style containers where you can.
std::vector, primarily because it best represents a "sequence of items in contiguous memory"; it is the default representation for that, and it enables a wide range of common operations.
vector has move semantics, so the benefit of std::unique_ptr is limited.
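As a minimal sketch of what move semantics buy you (the names here are illustrative): returning or handing off a vector transfers the heap buffer in constant time, so you rarely need a unique_ptr just to avoid copies.

#include <cstddef>
#include <utility>
#include <vector>

std::vector<unsigned char> makeImage(std::size_t n) {
    std::vector<unsigned char> buf(n);
    // ... fill buf ...
    return buf;  // moved (or elided), not copied
}

void consume(std::vector<unsigned char> img) { /* use img */ }

void demo() {
    std::vector<unsigned char> img = makeImage(1024);
    consume(std::move(img));  // O(1): transfers the heap buffer, no element copies
}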
If you are lucky, your STL implementation provides a small vector optimization, skipping the memory allocation for small sizes.
-- edit: I wasn't aware that SBO is apparently prohibited by the standard (swap may not invalidate iterators), sorry for getting your hopes up. Thanks @KarlNicholl
If pointer semantics are required, a unique_ptr<vector<T>> or shared_ptr<vector<T>> is a valid choice with little overhead.
Boost did introduce shared_array etc., which represent your second option better, but I haven't seen them get much traction.
Always use STL containers, except in situations where you have a good reason to use pointers. The reasons are reliability and readability, IMO.
Related
Let's say I am making my own Matrix class/container which itself needs to manage some sort of array of doubles and a row/column dimension.
At the risk of sounding hand-wavy, what is considered "best practice" for storing this data if speed is of importance? The options I can see are:
Raw pointer to single dynamic C array
Unique pointer to a C array, which is similar/leads us to...
std::vector of doubles
What are the speed implications of these different options? Obviously it depends on circumstance, but in the general case? Also, the size of a std::vector on MSVC and GCC for me is 24 bytes, indicating three pointers: to the begin iterator, the end iterator, and the end of the memory allocation. Since I need to store the size myself to know the Matrix dimensions, the end iterator is somewhat useless to me, except for use with algorithms.
What are the thoughts on best practices of this? Is using a raw pointer acceptable since the container is somewhat "low-level"?
Thanks!
I would use std::vector because it solves memory allocation, deallocation, indexing, copying, etc.. Unless you will be using "millions" of matrices at the same time, the extra member (capacity) is probably not relevant.
In any case, optimizing the library for speed is the last thing you want to do -- wait until you can measure the actual speed of your initial implementation. Then you can decide whether it is worth spending the time to effectively duplicate std::vector's functionality with your own implementation.
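For illustration, here is a minimal sketch of what a vector-backed Matrix could look like (row-major layout; all names are hypothetical, not a definitive design):

#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major element access; unchecked, like operator[] on vector.
    double& operator()(std::size_t r, std::size_t c)       { return data_[r * cols_ + c]; }
    double  operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;  // handles allocation, copy, move, destruction
};

The std::vector member means the compiler-generated copy, move, and destruction all do the right thing without any further code.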
Is there any big difference in allocation, deallocation, and access time between std::vector<> and new[], when both have a fixed and identical length?
Depends on the types and how you call it. std::vector<int> v(1000000); has to zero a million ints, whereas new int[1000000]; doesn't, so I would expect a difference in speed. This is one place in std::vector where you might pay through the nose for something you don't use, if for some reason you don't care about the initial values of the elements.
If you compare std::vector<int> v(1000000); with new int[1000000](); then I doubt you'll see much difference. The significant question is whether one of them somehow has a more optimized loop setting the zeros, than the other one does. If so, then the implementation of the other one has missed a trick (or more specifically the optimizer has).
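To make the three initialization behaviours concrete (the size is arbitrary):

#include <vector>

void demo() {
    int* a = new int[1000000];    // default-initialized: values are indeterminate
    int* b = new int[1000000]();  // value-initialized: all zeros
    std::vector<int> v(1000000);  // elements value-initialized: all zeros

    delete[] a;
    delete[] b;
}   // v frees its storage automatically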
new is a bad thing, because it violates the idiom of single responsibility by assuming two responsibilities: Storage allocation and object construction. Complexity is the enemy of sanity, and you fight complexity by separating concerns and isolating responsibilities.
The standard library containers allow you to do just that and think only about objects. Moreover, std::vector additionally allows you to think about storage, but separately, via the reserve/capacity interfaces.
So for the sake of keeping a clear mind about your program logic, you should always prefer a container such as std::vector:
std::vector<Foo> v;
// make some storage available
v.reserve(100);
// work with objects - no allocation is required
v.push_back(x);
v.push_back(f(1, 2));
v.emplace_back(true, 'a', 10);
In a comparison of performance between Java and C++, our professor claimed that there's no benefit in choosing a native array (a dynamic array like int* a = new int[NUM]) over std::vector. She also claimed that the optimized vector can be faster, but didn't tell us why. What is the magic behind the vector?
PLEASE just answer on performance, this is not a general poll!
Any super optimized code with lower level stuff like raw arrays can beat or tie the performance of an std::vector. But the benefits of having a vector far outweigh any small performance gains you get from low level code.
vector manages its own memory; with an array you have to remember to manage the memory yourself
vector lets you use the standard library more easily (with dynamic arrays you have to keep track of the end pointer yourself), and that is all very well-written, well-tested code
sometimes vector can even be faster, as with qsort vs. std::sort (read this article); keep in mind the reason this can happen is that writing higher-level code often lets you reason better about the problem at hand -- see the sketch below
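Here's a rough sketch of that qsort vs std::sort point: qsort goes through a function pointer for every comparison, while std::sort knows the element type and can inline the comparison.

#include <algorithm>
#include <cstdlib>
#include <vector>

// qsort calls this through a function pointer for every comparison.
int compareInts(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

void sortBoth(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compareInts);

    // std::sort knows the element type, so the comparison can be inlined.
    std::sort(v.begin(), v.end());
}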
Since the std containers are good to use for everything else that dynamic arrays don't cover, keeping code consistent in style makes it more readable and less error-prone.
One thing I would keep in mind is that you shouldn't compare a dynamic array to a std::vector; they are two different things. It should be compared to something like std::dynarray, which unfortunately probably won't make it into C++14 (Boost probably has one, and I'm sure there are reference implementations lying around). A std::dynarray implementation is unlikely to have any performance difference from a native array.
There is a benefit to using vector instead of an array when the number of elements needs to change. But no, an optimized vector cannot be faster than an array.
The only performance optimization a std::vector can offer over a plain array is that it can request more memory than currently needed. std::vector::reserve does just that. Adding elements up to its capacity() will not involve any more allocations.
This can be implemented with plain arrays as well, though.
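For example (sizes are arbitrary), reserve gives you the one-allocation behaviour directly, while the plain-array version makes you track the capacity by hand:

#include <cstddef>
#include <vector>

void vectorWay() {
    std::vector<int> v;
    v.reserve(1000);          // one allocation up front
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);       // no further allocations while size <= capacity
}

void plainArrayWay() {
    int* data = new int[1000];  // the "capacity", tracked by hand
    std::size_t size = 0;
    for (int i = 0; i < 1000; ++i)
        data[size++] = i;       // nothing stops you from overrunning the capacity
    delete[] data;              // and you must remember this
}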
I know that manual dynamic memory allocation is a bad idea in general, but is it sometimes a better solution than using, say, std::vector?
To give a crude example, if I had to store an array of n integers, where n <= 16, say. I could implement it using
int* data = new int[n]; // assuming n is set beforehand
or using a vector:
std::vector<int> data;
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
It is always better to use std::vector/std::array, at least until you can conclusively prove (through profiling) that the T* a = new T[100]; solution is considerably faster in your specific situation. This is unlikely to happen: vector/array is an extremely thin layer around a plain old array. There is some overhead to bounds checking with vector::at, but you can circumvent that by using operator[].
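The bounds-checking difference in a couple of lines (a trivial sketch):

#include <cstddef>
#include <vector>

void demo(const std::vector<int>& v, std::size_t i) {
    int checked   = v.at(i);  // throws std::out_of_range on a bad index
    int unchecked = v[i];     // no check: same cost as indexing a plain array
    (void)checked;
    (void)unchecked;
}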
I can't think of any case where dynamically allocating a C style array makes sense. (I've been working in C++ for over 25 years, and I've yet to use new[].) Usually, if I know the size up front, I'll use something like:
std::vector<int> data( n );
to get an already sized vector, rather than using push_back. Of course, if n is very small and is known at compile time, I'll use std::array (if I have access to C++11), or even a C style array, and just create the object on the stack, with no dynamic allocation. (Such cases seem to be rare in the code I work on; small fixed size arrays tend to be members of classes, where I do occasionally use a C style array.)
If you know the size in advance (especially at compile time), and don't need the dynamic re-sizing abilities of std::vector, then using something simpler is fine.
However, that something should preferably be std::array if you have C++11, or something like boost::scoped_array otherwise.
I doubt there'll be much efficiency gain unless it significantly reduces code size or something, but it's more expressive, which is worthwhile anyway.
You should try to avoid C-style arrays in C++ whenever possible. The STL provides containers which usually suffice for every need. Just imagine reallocating an array, or deleting elements from its middle. The container shields you from handling this, while you would have to take care of it yourself, and unless you have done it a hundred times it is quite error-prone.
An exception, of course, is if you are addressing low-level issues which might not be able to cope with STL containers.
There has already been some discussion about this topic. See here on SO.
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
Call me a simpleton, but 99.9999...% of the time I would just use a standard container. The default choice should be std::vector, but std::deque<> can also be a reasonable option sometimes. If the size is known at compile time, opt for std::array<>, which is a lightweight, safe wrapper around C-style arrays that introduces zero overhead.
Standard containers expose member functions to specify the initial amount of memory to reserve, so you won't have trouble with reallocations, and you won't have to remember to delete[] your array. I honestly do not see why one would use manual memory management here.
Efficiency shouldn't be an issue, since you have throwing and non-throwing member functions to access the contained elements, so you have a choice whether to favor safety or performance.
std::vector can be constructed with a size_type parameter that instantiates the vector with the specified number of elements and performs a single dynamic allocation (same as your array); you can also use reserve to decrease the number of reallocations over the lifetime of the vector.
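Both forms in code (n is a runtime value; the function is just for illustration):

#include <cstddef>
#include <vector>

void demo(std::size_t n) {
    std::vector<int> a(n);   // one allocation, n value-initialized elements

    std::vector<int> b;
    b.reserve(n);            // one allocation, no elements constructed yet
    for (std::size_t i = 0; i != n; ++i)
        b.push_back(static_cast<int>(i));  // no reallocation until size exceeds n
}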
If n is known at compile time, then you should choose std::array, as:
std::array<int, n> data; //n is compile-time constant
and if n is not known at compile-time, OR the array might grow at runtime, then go for std::vector:
std::vector<int> data(n); //n may be known at runtime
Or in some cases you may prefer std::deque, which is faster than std::vector in some scenarios. See these:
C++ benchmark – std::vector VS std::list VS std::deque
Using Vector and Deque by Herb Sutter
Hope that helps.
From the perspective of someone who often works with low-level code in C++, std vectors are really just helper methods with a safety net around a classic C style array. The only overheads you'd realistically experience are the memory allocations and the safety checks on boundaries. If you're writing a program which needs performance and is going to use vectors as regular arrays, I'd recommend just using C style arrays instead of vectors. You should realistically be vetting the data that comes into the application and checking the boundaries yourself, to avoid checks on every memory access to the array; a sketch of that follows.
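What "vetting the data yourself" might look like in practice, as a sketch (the function is hypothetical): validate the range once at the boundary, then use unchecked operator[] in the hot loop.

#include <cstddef>
#include <stdexcept>
#include <vector>

// Validate once at the API boundary...
long long sumRange(const std::vector<int>& v, std::size_t first, std::size_t last) {
    if (first > last || last > v.size())
        throw std::out_of_range("sumRange: bad range");

    // ...then use unchecked operator[] in the hot loop.
    long long sum = 0;
    for (std::size_t i = first; i != last; ++i)
        sum += v[i];
    return sum;
}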
It's good to see that others are checking the differences between the C ways and the C++ ways. More often than not, C++ standard methods have significantly worse performance and uglier syntax than their C counterparts, which is generally the reason people call C++ bloated. I think C++ focuses more on safety and on making the language more like JavaScript/C#, even though it fundamentally lacks the foundation to be such a language.
So, to deal with large blobs of memory, either for an image or similar, there are clearly lots of options.
Since I'm a fan of smart pointers and RAII, I'm wondering whether it's smarter to go with:
a shared_ptr to a std::vector
or
to go with a shared_array pointing to a dynamically allocated array.
What are the conceptual, practical, and performance implications of choosing one vs the other?
It's the same as comparing std::vector vs. C array.
Think about shared_array as a RAII C array. What you get is just automatic memory deallocation. Useful in cases when you deal with 3rd-party code that returns arrays.
Theoretically it's faster than std::vector in some edge cases, but much less flexible and less secure.
std::vector is probably the better choice.
shared_ptr to std::vector
+ allows amortized constant time push_back
- introduces an extra level of indirection over std::vector
shared_array
+ does not introduce an extra level of indirection
- does not allow amortized constant time append, unless you implement it yourself, which again would take an extra level of indirection.
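Declared side by side, the two options look roughly like this (a sketch; the buffer size is a placeholder, and Option 2 assumes Boost's shared_array header):

#include <boost/shared_array.hpp>
#include <cstddef>
#include <memory>
#include <vector>

void demo(std::size_t n) {
    // Option 1: shared ownership of a growable buffer (one extra indirection).
    auto blob = std::make_shared<std::vector<unsigned char>>(n);
    blob->push_back(0xFF);  // amortized constant time append

    // Option 2: shared ownership of a fixed-size buffer (no extra indirection).
    boost::shared_array<unsigned char> raw(new unsigned char[n]);
    raw[0] = 0xFF;          // fixed size; growing means reallocating by hand
}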