How to append raw bytes to std::vector? - c++

I want to append raw bytes to a vector, like this:
vector.reserve(current_size + append_data_size);
memcpy(vector.data() + current_size, append_data, append_data_size);
vector.resize(current_size + append_data_size); // Expect this to only set the size to current_size + append_data_size.
Is the version below slower? I think the vector first initialises the new elements to their default value and only then copies the data in, which is wasted work.
vector.resize(current_size + append_data_size);
memcpy(vector.data() + current_size, append_data, append_data_size);

Modifying vector storage beyond its size is undefined behavior, and a subsequent resize will initialize the new elements at the end of the storage.
However, you could use insert instead:
vector.insert(vector.end(), bytes, bytes + size);
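For instance, to append the raw bytes of any trivially-copyable value (append_bytes is an illustrative helper, not a standard function):

```cpp
#include <type_traits>
#include <vector>

// Illustrative helper: append the raw bytes of a trivially-copyable
// value to a byte vector using insert(), which grows the vector and
// copies in one step: no reserve/memcpy/resize dance, no UB.
template <typename T>
void append_bytes(std::vector<unsigned char>& buffer, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw bytes only make sense for trivially-copyable types");
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    buffer.insert(buffer.end(), bytes, bytes + sizeof(T));
}
```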

Even if you call reserve, you still must call resize on the vector if you want to access the new elements, otherwise the behaviour of your code is undefined. What reserve can do is make push_back and other such operations more efficient.
Personally I wouldn't concern yourself with any such optimisations unless you can prove they have an effect with an appropriate profiling tool. More often than not, fiddling with the capacity of a std::vector is pointless.
Also, using memcpy is hazardous. (Copy constructors will not be called, for example, and relying on the exact behaviour of copying padding in structures with memcpy is a sure way of increasing your reputation on this site!) Use insert instead and trust the compiler to optimise as appropriate.
Without an explicit additional parameter, std::vector::resize value-initialises any additional members. Informally that means the elements of a std::vector of T say are set to values in the same way as the t in static T t; would be.
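A quick sketch of the difference between reserve and resize (a minimal example of my own, not from the original answer):

```cpp
#include <vector>

// After reserve() the new slots exist only as raw capacity; after
// resize() they are real, value-initialised elements (zero for int).
int last_after_resize() {
    std::vector<int> v;
    v.reserve(8);  // capacity >= 8, size still 0: elements must not be accessed yet
    v.resize(8);   // size == 8, the eight ints are value-initialised
    return v.back();
}
```

Calling last_after_resize() returns 0, matching the static T t; analogy above.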

Related

How to deallocate excess memory allocated to a object (vector) [duplicate]

Is there a way to reduce the capacity of a vector ?
My code inserts values into a vector (not knowing their number beforehand), and
when this finishes, the vectors are used only for read operations.
I guess I could create a new vector, do a .reseve() with the size and copy
the items, but I don't really like the extra copy operation.
PS: I don't care for a portable solution, as long as it works for gcc.
std::vector<T>(v).swap(v);
Swapping the contents with another vector swaps the capacity.
std::vector<T>(v).swap(v); ==> is equivalent to
std::vector<T> tmp(v); // copy elements into a temporary vector
v.swap(tmp); // swap internal vector data
Swap() would only change the internal data structure.
With C++11, you can call the member function shrink_to_fit(). The draft standard, section 23.2.6.2, says:
shrink_to_fit is a non-binding request to reduce capacity() to size(). [Note: The request is non-binding to allow latitude for implementation-specific optimizations. —end note]
Go look at Scott Meyers' Effective STL, Item 17.
Basically you can't directly reduce the storage size of a std::vector: resize() and reserve() will never reduce the actual memory footprint of a container. The "trick" is to create a new container of the right size, copy the data, and swap that with the current container. If we would like to clear a container out, this is simply:
std::vector<T>().swap(v);
If we have to copy the data over then we need to do the copy:
std::vector<T>(v).swap(v);
What this does is create a new vector with the data from the old one, performing the copy that would be required by any operation with the effect you need. Calling swap() then just exchanges the internal buffers between the two objects. At the end of the statement, the temporary vector is destroyed, taking the old vector's oversized buffer with it, while the old vector keeps the fresh copy's buffer, which is exactly the size we need.
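As a sketch, the idiom can be wrapped in a small helper (the name shrink_via_swap is mine, not from the answer):

```cpp
#include <vector>

// Copy-and-swap shrink: the temporary is built with a buffer sized to
// v's contents, then swap() hands that tight buffer to v and lets the
// temporary carry the oversized buffer away to be destroyed.
template <typename T>
void shrink_via_swap(std::vector<T>& v) {
    std::vector<T>(v).swap(v);
}
```

Typical implementations copy-construct with a capacity close to the size, so after the call capacity() usually drops, though the standard does not strictly guarantee it.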
The idiomatic solution is to swap with a newly constructed vector.
vector<int>().swap(v);
Edit: I misread the question. The code above will clear the vector. OP wants to keep the elements untouched, only shrink capacity() to size().
It is difficult to say if aJ's code will do that. I doubt there's portable solution. For gcc, you'll have to take a look at their particular implementation of vector.
edit: So I've peeked at libstdc++ implementation. It seems that aJ's solution will indeed work.
vector<int>(v).swap(v);
See the source, line 232.
No, you cannot reduce the capacity of a vector without copying. However, you can control how much the allocation grows by checking capacity() and calling reserve() every time you insert something. The default behavior of std::vector is to grow its capacity by a factor of 2 every time new capacity is needed. You can grow it by your own magic ratio:
template <typename T>
void myPushBack(std::vector<T>& vec, const T& val) {
    if (vec.size() == vec.capacity()) {
        // my_magic_ratio is your chosen growth factor, e.g. 1.5
        vec.reserve(static_cast<std::size_t>(vec.size() * my_magic_ratio) + 1);
    }
    vec.push_back(val);
}
If you're into a bit hacky techniques, you can always pass in your own allocator and do whatever you need to do to reclaim the unused capacity.
I'm not saying that GCC couldn't have some method for doing what you want without a copy, but it would be tricky to implement (I think) because vectors need to use an Allocator object to allocate and deallocate memory, and the interface for an Allocator doesn't include a reallocate() method. I don't think it would be impossible to do, but it might be tricky.
If you're worried about the overhead of your vector then maybe you should look at using another type of data structure. You mentioned that once your code is done initializing the vector it becomes a read-only process. I would suggest going with a plain array, which lets the program decide its capacity at compile time. Or perhaps a linked list would be more suitable to your needs.
Lemme know if I completely misunderstood what you were getting at.
Old thread, I know, but in case anyone is viewing this in the future.. there's shrink_to_fit() in C++11 but since it is a non-binding request, the behaviour will depend on its implementation.
See: http://en.cppreference.com/w/cpp/container/vector/shrink_to_fit
I'm not an expert in C++, but it seems this solution works (at least compiling it with g++ does):
std::vector<int> some_vector(20); // initial size (and capacity) 20
// first you have to resize the vector
some_vector.resize(10);
// then you can shrink to fit
some_vector.shrink_to_fit();
// new capacity is 10 (where the implementation honours the request)
This also works:
v = std::vector<T>(v); // if we need to keep same data
v = std::vector<T>(); // if we need to clear
It calls the && overload of operator= (move assignment), which takes over the temporary's internal buffer; swap() exchanges buffers in a similar way.
Get the "Effective STL" book by Scott Meyers. It has a complete item just on reducing a vector's capacity.

Faster alternative to push_back(size is known)

I have a float vector. As I process certain data, I push it back. I always know what the size will be while declaring the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?
If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).
The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.
Notice that allocating a contiguous block for 172,490,752 floats (about 690 MB) might easily fail and requires quite a lot of page shuffling. ::boost::container::stable_vector is essentially a list of smaller vectors or arrays of reasonable size. You may also want to populate it in parallel.
You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for (size_t i = 0; i != N; ++i)
    vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
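The linked answer's uninitialised_allocator is not shown here; one possible sketch (my own, and only appropriate for trivial element types) overrides construct() so that the no-argument form performs default-initialisation, which for float is a no-op:

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Sketch: an allocator whose no-argument construct() default-initialises
// instead of value-initialising, so vector<float, ...>(N) skips zeroing.
template <typename T, typename A = std::allocator<T>>
class uninitialised_allocator : public A {
public:
    using A::A;

    template <typename U>
    struct rebind {
        using other = uninitialised_allocator<
            U, typename std::allocator_traits<A>::template rebind_alloc<U>>;
    };

    // default-initialisation: note the missing "()" after U
    template <typename U>
    void construct(U* p) { ::new (static_cast<void*>(p)) U; }

    // constructions with arguments behave as usual
    template <typename U, typename... Arg>
    void construct(U* p, Arg&&... arg) {
        ::new (static_cast<void*>(p)) U(std::forward<Arg>(arg)...);
    }
};
```

std::allocator_traits routes element construction through this construct() when it is present, which is what makes the vector's fill constructor skip the zeroing.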
I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) itself carries a performance penalty: on every call it has to check whether the vector needs to be reallocated. If you already know the number of elements you will insert, you can avoid this penalty by setting the elements directly using the access operator [].
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
values[i] = some_float;
++i;
The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = std::unique_ptr<float[]>{new float[size]};
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.
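A minimal sketch of that approach (make_buffer is an illustrative name):

```cpp
#include <cstddef>
#include <memory>

// Heap-allocate a float buffer WITHOUT value-initialisation; the memory
// is freed automatically when the unique_ptr goes out of scope.
std::unique_ptr<float[]> make_buffer(std::size_t size) {
    // note: new float[size]() (with parentheses) would zero the buffer instead
    return std::unique_ptr<float[]>{new float[size]};
}
```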

Are C++ vector constructors efficient?

If I make a vector like this:
vector<int>(50000000, 0);
What happens internally? Does it make a default vector and then continually add values, resizing as necessary? Note: 50,000,000 is not known at compile time.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Please tell me the constructor knows to avoid unnecessary reallocations given the two parameters.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Yes, it definitely makes a difference. Using push_back() to fill in the default values may turn out a lot less efficient.
To get the same result as with the constructor vector<int>(50000000, 0);, use std::vector<int>::resize():
vector<int> gVec;
gVec.resize(50000000,0);
You will greatly enhance what you learn from this question by stepping through the two options in the debugger - seeing what the std::vector source code does should be instructive if you can mentally filter out a lot of the initially-confusing template and memory allocation abstractions. Demystify this for yourself - the STL is just someone else's code, and most of my work time is spent looking through that.
std::vector guarantees contiguous storage so only one memory block is ever allocated for the elements. The vector control structure will require a second allocation, if it is heap-based and not RAII (stack-based).
vector<int>(N, 0);
creates a vector of capacity >= N and size N, with N values each set to 0.
Step by step:
gVec = vector<int>();
creates an empty vector, typically with a nominal 'best-guess' capacity.
gVec.reserve(N);
updates the vector's capacity - ensures the vector has room for at least N elements. Typically this involves a reallocation from the 'best guess' default capacity, which is unlikely to be large enough for the value of N proposed in this question.
// push_back default values
Each iteration here increases the vector's size by one and sets the new back() element of the vector to 0. The vector's capacity will not change until the number of values pushed exceeds N plus whatever pad the vector implementation might have applied (typically none).
reserve solely allocates storage. No initialization is performed. Applied on an empty vector it should result in one call to the allocate member function of the allocator used.
The constructor shown allocates the storage required and initializes every element to zero: It's semantically equivalent to a reserve and a row of push_back's.
In both cases no reallocations are done.
I suppose in theory the constructor could start by allocating a small block of memory and expanding several times before returning, at least for types that didn't have side-effects in their copy constructor. This would be allowed only because there were no observable side effects of doing so though, not because the standard does anything to allow it directly.
At least in my opinion, it's not worth spending any time or effort worrying about such a possibility though. Chances of anybody doing it seem remote, to say the least. It's only "allowed" to the degree that it's essentially impossible to truly prohibit it.

Efficient Array Reallocation in C++

How would I efficiently resize an array allocated using some standards-conforming C++ allocator? I know that no facilities for reallocation are provided in the C++ allocator interface, but did the C++11 revision enable us to work with them more easily? Suppose that I have a class foo with a copy-assignment operator foo& operator=(const foo& x) defined. If x.size() > this->size(), I'm forced to
Call allocator.destroy() on all elements in the internal storage of foo.
Call allocator.deallocate() on the internal storage of foo.
Reallocate a new buffer with enough room for x.size() elements.
Use std::uninitialized_copy to populate the storage.
Is there some way that I more easily reallocate the internal storage of foo without having to go through all of this? I could provide an actual code sample if you think that it would be useful, but I feel that it would be unnecessary here.
Based on a previous question, the approach that I took for handling large arrays that could grow and shrink with reasonable efficiency was to write a container similar to a deque that broke the array down into multiple pages of smaller arrays. So for example, say we have an array of n elements, we select a page size p, and create 1 + n/p arrays (pages) of p elements. When we want to re-allocate and grow, we simply leave the existing pages where they are, and allocate the new pages. When we want to shrink, we free the totally empty pages.
The downside is that array access is slightly slower: given an index i, you need the page (i / p) and the offset into the page (i % p) to get the element. I find this is still very fast, however, and it provides a good solution. Theoretically, std::deque should do something very similar, but for the cases I tried with large arrays it was very slow. See comments and notes on the linked question for more details.
There is also a memory inefficiency in that given n elements, we are always holding p - n % p elements in reserve. i.e. we only ever allocate or deallocate complete pages. This was the best solution I could come up with in the context of large arrays with the requirement for re-sizing and fast access, while I don't doubt there are better solutions I'd love to see them.
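A minimal sketch of the scheme described above (illustrative names; a production version would add iterators, copy control, and so on):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Paged array: storage is split into fixed-size pages, so growing
// allocates new pages without moving any existing element, and
// shrinking frees only whole pages from the tail.
template <typename T, std::size_t PageSize = 4096>
class paged_array {
    std::vector<std::unique_ptr<T[]>> pages_;
    std::size_t size_ = 0;

public:
    std::size_t size() const { return size_; }

    void resize(std::size_t n) {
        const std::size_t pages_needed = (n + PageSize - 1) / PageSize;
        while (pages_.size() < pages_needed)          // grow: append fresh pages
            pages_.emplace_back(new T[PageSize]());
        pages_.resize(pages_needed);                  // shrink: drop empty tail pages
        size_ = n;
    }

    // element access: page = i / p, offset within the page = i % p
    T&       operator[](std::size_t i)       { return pages_[i / PageSize][i % PageSize]; }
    const T& operator[](std::size_t i) const { return pages_[i / PageSize][i % PageSize]; }
};
```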
A similar problem also arises if x.size() > this->size() in foo& operator=(foo&& x).
No, it doesn't. You just swap.
There is no function that will resize in place or return 0 on failure (to resize). I don't know of any operating system that supports that kind of functionality beyond telling you how big a particular allocation actually is.
All operating systems do however have support for implementing realloc, however, that does a copy if it cannot resize in place.
So, you can't have it because the C++ language would not be implementable on most current operating systems if you had to add a standard function to do it.
There are the C++11 rvalue reference and move constructors.
There's a great video talk on them.
Even if reallocation existed, in a copy constructor you could actually only avoid step 2 of your question. In the case of growing an internal buffer, however, reallocation could save all four operations.
Is the internal buffer of your array contiguous? If so, see the answer to your linked question;
if not, a hashed array tree or an array list may be your choice to avoid reallocation.
Interestingly, the default allocator for g++ is smart enough to use the same address for consecutive deallocations and allocations of larger sizes, as long as there is enough unused space after the end of the initially-allocated buffer. While I haven't tested what I'm about to claim, I doubt that there is much of a time difference between malloc/realloc and allocate/deallocate/allocate.
This leads to a potentially very dangerous, nonstandard shortcut that may work if you know that there is enough room after the current buffer that a reallocation would not result in a new address: (1) deallocate the current buffer without calling allocator.destroy(); (2) allocate a new, larger buffer and check the returned address; (3) if the new address equals the old address, proceed happily; otherwise, you have lost your data; (4) call allocator.construct() for the elements in the newly-allocated space.
I wouldn't advocate using this for anything other than satisfying your own curiosity, but it does work on g++ 4.6.

vector and dumping

From what I know, a vector is guaranteed to be contiguous, and I can write a chunk of memory to it and then fwrite or send it. All I need to do is make sure I call .resize() to force it to be the minimum length I need; then I can use it as a normal char array. Would this code be correct?
v.resize(numOfElements);
v.clear(); // so I won't get numOfElements + len when I push back
vector<char> v2;
v2.resize(numOfElements * SizeOfType);
while(...)
{
    ...
    v.push_back(x);
}
compress(&v2[0], len, &v[0], len);
fwrite(&v2[0], ....)
Noting that I never push back or pop v2; I only resize it once and use it as a char array. Would this be safe? And if I also dumped v, would that also be safe? (I do push back and clear; I may dump it for testing.)
v.resize(numOfElements);
v.clear(); // so I won't get numOfElements + len when I push back
Well, the snippet above is in effect allocating and creating elements, just to destroy them again. It has the same effect as:
v.reserve(numOfElements);
Just that this code is way faster. So, v.size() == 0 in both cases and v.capacity() might be the same as numOfElements in both cases too (although this is not guaranteed). In the second case, however, the capacity is at least numOfElements, which means the internal buffer will not be reallocated until you have push_back'ed that many elements to your vector. Note that in both cases it is invalid if you try accessing any elements - because there are zero elements actually contained.
Apart from that, I haven't found a problem in your code. It's safe, and I would encourage you to use it instead of a raw new or malloc because of the added safety it provides. I'm however not sure what you mean by "dump v".
Indeed, std::vector is guaranteed to be contiguous, in order to be layout-compatible with a C array. However, you must be aware that many operations of the vector invalidate all pointers pointing to its elements, so you'd better stick to one type of use: avoid mixing pointer arithmetic and method calls on the vector.
Apart from that it is perfectly correct, except for the first line: what you want is
v.reserve(numOfElements);
which will allocate enough place to store numOfElements into the vector, whereas
v.resize(numOfElements);
will do the following:
// pseudo-code
if (v.size() < numOfElements)
    insert (numOfElements - size) default-constructed
    elements at the end of the vector
if (v.size() > numOfElements)
    erase the last elements so that size == numOfElements
To sum up: after a reserve you are sure that the vector's capacity is greater than or equal to numOfElements, and after a resize you are sure that the vector's size is equal to numOfElements.
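The difference is easy to check directly (a minimal sketch):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns {size after reserve(n), size after resize(n)}: reserve()
// changes only the capacity, resize() actually creates the elements.
std::pair<std::size_t, std::size_t> reserve_vs_resize(std::size_t n) {
    std::vector<char> a, b;
    a.reserve(n);  // capacity >= n, size still 0: a[0] would be undefined
    b.resize(n);   // size == n, elements value-initialised to '\0'
    return {a.size(), b.size()};
}
```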
For something like this I would personally use a class like STLSoft's auto_buffer<>:
http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1auto__buffer.html
As a disclaimer - I don't use the actual STLSoft library version, I've adapted my own template that is quite similar - I started from the Matthew Wilson's (the STLSoft author's) book "Imperfect C++".
I find it useful when I really just want a plain-old C array, but the size must be dynamic at runtime. auto_buffer<> is safer than a plain old array, but once you've constructed it you have no worries about how many elements are there or not - it's always whatever you constructed it with, just like an array (so it's a bit less complex than vector<> - which is appropriate at times).
The major downside to auto_buffer<> is that it's not standard and it's not in Boost, so you either have to incorporate some of STLSoft into your project or roll your own version.
Yes, you can use a vector of char as a buffer for reading raw input.
// dynamically allocates a buffer of 10,000 char as buffer
std::vector<char> data(10000);
fread(&data[0], sizeof(char),data.size(),fp);
I would not use it for reading any non-POD data type directly into a vector, though.
You could potentially use a vector as a source for a write,
but I would be very careful about how you read that back in (it may be easier to serialize it).
fwrite(&data[0], sizeof(char),data.size(),fp);
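Putting the two snippets together, a round trip through a temporary file looks like this (error handling omitted for brevity):

```cpp
#include <cstdio>
#include <vector>

// Write a char vector out with fwrite and read it back with fread,
// using the vector's contiguous buffer directly on both sides.
std::vector<char> roundtrip(const std::vector<char>& out) {
    std::FILE* fp = std::tmpfile();  // anonymous temporary file (error handling omitted)
    std::fwrite(out.data(), sizeof(char), out.size(), fp);
    std::rewind(fp);

    std::vector<char> in(out.size());                     // size the buffer up front...
    std::fread(in.data(), sizeof(char), in.size(), fp);   // ...then read straight into it
    std::fclose(fp);
    return in;
}
```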
Since you're replacing reserve() with resize(), you may as well replace
vector<char> v2
with
vector<Type> v2
This should simplify the code a tiny bit.
To be frank, it's the oddest use of vectors I've ever seen, but it probably will work. Are you sure you don't want to go with new char[size] and some sort of auto pointer from boost?