std::copy and std::vector problem - c++

I understand why this causes a segfault:
#include <algorithm>
#include <vector>
using namespace std;
int main()
{
    vector<int> v;
    int iArr[5] = {1, 2, 3, 4, 5};
    int *p = iArr;
    copy(p, p + 5, v.begin());
    return 0;
}
But why does this not cause a segfault?
#include <algorithm>
#include <vector>
using namespace std;
int main()
{
    vector<int> v;
    int iArr[5] = {1, 2, 3, 4, 5};
    int *p = iArr;
    v.reserve(1);
    copy(p, p + 5, v.begin());
    return 0;
}

Both are wrong, as you are copying into an empty vector and copy requires that the destination already has room for the elements; it does not resize the container by itself. What you probably need here is back_insert_iterator and back_inserter:
copy(p, p+5, back_inserter(v));

This is undefined behavior: reserve() allocates a buffer for at least one element, and the buffer's contents are left uninitialized.
Either the buffer happens to be big enough, so you can technically write beyond the first element, or it is not big enough and you merely happen not to observe any problems.
The bottom line is: don't do it. Only access elements that are legally stored in the vector instance.

But why does this not cause a segfault?
Because the stars aligned. Or you were running a debug build and the compiler did something to "help" you. The bottom line is that you're doing the wrong thing and have crossed over into the dark, nondeterministic world of Undefined Behavior. You reserve one spot in the vector and then try to cram 5 elements into the reserved space. Bad.
You have 3 options. In my personal order of preference:
1) Use a back_insert_iterator, which is designed for just this purpose. It is provided by #include <iterator>. The syntax is a bit funky, but fortunately a nice sugar-coated shortcut, back_inserter, is also provided:
#include <iterator>
// ...
copy( p, p+5, back_inserter(v) );
2) assign the elements to the vector. I prefer this method slightly less, simply because assign is a member of vector, and that strikes me as slightly less generic than using something from algorithm.
v.assign(p, p+5);
3) resize the vector to the right number of elements, then copy over them. I consider this a last-ditch effort in case everything else fails for whatever reason. It default-constructs elements only to immediately overwrite them, and it just feels like a back-door method of getting the data into the vector.
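A minimal sketch of option 3; the key point is that the vector must be resized (not merely reserved) so the destination elements legally exist before copy writes over them:
#include <algorithm>
#include <vector>

int main()
{
    int iArr[5] = {1, 2, 3, 4, 5};
    int *p = iArr;

    std::vector<int> v;
    v.resize(5);                    // v now logically contains 5 elements
    std::copy(p, p + 5, v.begin()); // OK: overwriting existing elements
    return 0;
}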

This is wrong! It is undefined behavior to access memory you don't own, even if it happens to work in an example. The reason it appears to work, I think, is that std::vector reserves more than one element.

Because you were unlucky. Accessing memory not allocated is UB.

Most likely because an empty vector doesn't have any memory allocated at all, so you are trying to write through a null pointer, which normally leads to an instant crash. In the second case it has at least some memory allocated, so you are most likely writing past the end of the allocation, which may or may not lead to a crash in C++.
Both are wrong.

It would be wrong, by the way, even to copy 1 element to the vector that way (or to reserve 5 then copy that way).
The reason it most likely does not segfault is that the implementor felt it would be inefficient to allocate memory for exactly 1 element in case you wanted to grow the vector later, so perhaps they allocated enough for 16 or 32 elements.
Doing reserve(5) first and then writing into 5 elements directly is still not valid, and in any case incorrect: the vector would not yet have a logical size of 5, so the copy would be "wasted", with the vector still claiming a size of 0.
What would be valid behaviour is: reserve(5), insert an element, store its iterator somewhere, insert 4 more elements, and then look at the element through the first iterator. reserve() guarantees that iterators are not invalidated until the vector's size exceeds the reserved capacity, or until a call such as erase(), clear(), resize() or another reserve() is made.
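A small demonstration of that guarantee, using the numbers from above; the iterator obtained after the first insertion remains valid through the next four:
#include <cassert>
#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(5);

    v.push_back(10);
    auto it = v.begin();  // iterator to the first element

    v.push_back(20);      // four more insertions: no reallocation occurs,
    v.push_back(30);      // because the size never exceeds the reserved
    v.push_back(40);      // capacity of 5
    v.push_back(50);

    assert(*it == 10);    // 'it' is still valid and still refers to the first element
    return 0;
}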

Related

Performance impact when resizing vector within capacity

I have the following synthesized example of my code:
#include <vector>
#include <array>
#include <cstdlib>

#define CAPACITY 10000

void process(std::vector<std::vector<int>> &);    // defined elsewhere
void process(std::vector<std::array<int, 2>> &);  // defined elsewhere

int main() {
    std::vector<std::vector<int>> a;
    std::vector<std::array<int, 2>> b;
    a.resize(CAPACITY, std::vector<int> {0, 0});
    b.resize(CAPACITY, std::array<int, 2> {0, 0});
    for (;;) {
        std::size_t new_rand_size = (std::rand() % CAPACITY);
        a.resize(new_rand_size);
        b.resize(new_rand_size);
        for (std::size_t i = 0; i < new_rand_size; ++i) {
            a[i][0] = std::rand();
            a[i][1] = std::rand();
            b[i][0] = std::rand();
            b[i][1] = std::rand();
        }
        process(a); // respectively process(b)
    }
}
so obviously, the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?). It just gets reinitialized when up-resizing again within capacity.
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
so obviously,
The word "obviously" is typically used to mean "I really, really want the following to be true, so I'm going to skip the part where I determine if it is true." ;) (Admittedly, you did better than most since you did bring up some reasons for your conclusion.)
the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?).
The truth of this depends on the implementation, but there is some validity here. I would go with a less micro-managementy approach and say that the array version is preferable because the final size is fixed. Using a tool designed for your specialized situation (a fixed-size array) tends to incur less overhead than using a tool for a more general situation. Not always less, though.
Another factor to consider is the cost of default-initializing the elements. When a std::array is constructed, all of its elements are constructed as well. With a std::vector, you can defer constructing elements until you have the parameters for construction. For objects that are expensive to default-construct, you might be able to measure a performance gain using a vector instead of an array. (If you cannot measure a difference, don't worry about it.)
When you do a comparison, make sure the vector is given a fair chance by using it well. Since the size is known in advance, reserve the required space right away. Also, use emplace_back to avoid a needless copy.
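For instance, a sketch of what a fair comparison might look like for the array-of-2 case, assuming the final size n is known up front (make_pairs is an invented name; push_back with a braced list is used because std::array is an aggregate, while emplace_back as suggested above suits class types with constructors):
#include <array>
#include <cstddef>
#include <vector>

std::vector<std::array<int, 2>> make_pairs(std::size_t n) {
    std::vector<std::array<int, 2>> b;
    b.reserve(n);  // one allocation up front, no elements constructed yet
    for (std::size_t i = 0; i < n; ++i)
        b.push_back({static_cast<int>(i), static_cast<int>(i)});  // append; no default-construct-then-overwrite
    return b;
}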
Final note: "contiguous" is a bit more accurate/descriptive than "continuous".
It just gets reinitialized when up-resizing again within capacity.
This is a factor that affects both approaches. In fact, this causes your code to exhibit undefined behavior. For example, let's suppose that your first iteration resizes the outer vector to 1, while the second resizes it to 5. Compare what your code does to the following:
std::vector<std::vector<int>> a;
a.resize(CAPACITY, std::vector<int> {0, 0});
a.resize(1);
a.resize(5);
std::cout << "Size " << a[1].size() << ".\n";
The output indicates that the size is zero at this point, yet your code would assign a value to a[1][0]. If you want each element of a to default to a vector of 2 elements, you need to specify that default each time you resize a, not just initially.
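In the question's loop (using its own names), that means re-supplying the default value on every up-resize, e.g.:
a.resize(new_rand_size, std::vector<int> {0, 0});  // each newly created element is again {0, 0}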
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
Yes, you can skip the initialization. In fact, it is advisable to do so. Use the tool designed for the task at hand. Your initialization serves to increase the capacity of your vectors. So use the method whose sole purpose is to increase the capacity of a vector: vector::reserve.
Another option, depending on the exact situation, might be to not resize at all. Start with an array of arrays, and track the last usable element in the outer array. This is sort of a step backwards in that you now have a separate variable for tracking the size, but if your real code has enough iterations, the savings from not calling destructors when the size decreases might make this approach worth it. (For cleaner code, write a class that wraps the array of arrays and tracks the usable size.)
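A rough sketch of such a wrapper, with invented names and assuming the fixed inner size of 2 from the question:
#include <array>
#include <cstddef>

template <std::size_t Capacity>
class FixedRows {
public:
    std::array<int, 2> &operator[](std::size_t i) { return rows_[i]; }
    std::size_t size() const { return used_; }
    void set_size(std::size_t n) { used_ = n; }  // caller guarantees n <= Capacity
private:
    std::array<std::array<int, 2>, Capacity> rows_{};  // storage never moves or shrinks
    std::size_t used_ = 0;                             // logical size tracked separately
};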
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization
Yes: Don't resize. Instead, reserve the capacity and push (or emplace) the new elements.

Any way to prevent the breakdown between a reference and its object when resizing a vector?

I was debugging an issue and realized that when a vector reallocates, references to its elements no longer work. To illustrate the point, below is a minimal example. The output is 0 instead of 1. Is there any way to prevent this from happening, other than reserving a large space for x?
#include <iostream>
#include <vector>
using namespace std;
vector<int> x{};
int main(){
    x.reserve(1);
    x.push_back(0);
    int &y = x[0];
    x.resize(10);
    y = 1;
    cout << x[0] << endl;
    return 0;
}
This is called invalidation, and the only way to prevent it is to make sure that the vector's capacity does not change:
x.reserve(10);   // capacity already covers the later resize(10)
x.push_back(0);
int &y = x[0];
x.resize(10);    // no reallocation, so y stays valid
The only way I can think of is to use std::deque instead of std::vector.
The reason for suggesting std::deque is this (from cppreference):
The storage of a deque is automatically expanded and contracted as
needed. Expansion of a deque is cheaper than the expansion of a
std::vector because it does not involve copying of the existing
elements to a new memory location.
That line about not copying is really the answer to your question. It means that the objects remain where you placed them (in memory) as long as the deque is alive.
However, on the very next line it says:
On the other hand, deques typically have large minimal memory cost; a
deque holding just one element has to allocate its full internal array
(e.g. 8 times the object size on 64-bit libstdc++; 16 times the object
size or 4096 bytes, whichever is larger, on 64-bit libc++).
It's now up to you to decide which is better: the higher initial memory cost, or changing your program's logic so it doesn't need to reference items in the vector like that. You might also want to consider std::set or std::unordered_set for quickly finding an object within the container.
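To see the difference concretely, here is the question's snippet rewritten with std::deque; because growth at the ends of a deque leaves existing elements in place, the reference stays valid and the program prints 1:
#include <deque>
#include <iostream>

int main() {
    std::deque<int> x;
    x.push_back(0);
    int &y = x[0];   // reference into the deque

    x.resize(10);    // grows at the end; existing elements do not move
    y = 1;

    std::cout << x[0] << std::endl;  // prints 1, unlike the vector version
    return 0;
}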
There are several choices:
Don't use a vector.
Don't keep a reference.
Create a "smart reference" class that tracks the vector and the index, so it can obtain the appropriate object even if the vector's storage moves (see the sketch below).
You can also make it a vector of std::shared_ptr<> and keep those instead of references or iterators.
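A minimal sketch of that "smart reference" idea; the class name and interface are invented for illustration. It stores the vector and an index and re-derives the element on every access, so it survives reallocation:
#include <cstddef>
#include <vector>

template <typename T>
class VectorRef {
public:
    VectorRef(std::vector<T> &v, std::size_t i) : v_(&v), i_(i) {}
    T &get() const { return (*v_)[i_]; }  // always resolves through the current buffer
private:
    std::vector<T> *v_;
    std::size_t i_;
};

// Usage with the question's example:
//   VectorRef<int> y(x, 0);
//   x.resize(10);   // may reallocate; y is unaffected
//   y.get() = 1;    // x[0] is now 1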

Avoid reallocation of a vector when its size has to be incremented

I have a
vector<pair<vector<double>, int>> samples;
This vector will contain a number of elements. For efficiency reasons I initialize it this way:
vector<pair<vector<double>, int>> samples(1000000);
I know the size in advance (though not at compile time); I get it from another container. Sometimes I have to shrink the vector by one element. That case isn't a problem, because resizing to a smaller size than the current one does no reallocation. I can do
samples.resize(999999);
The problem is that in some cases, rather than shrinking by one element, I have to grow by one. If I do
samples.resize(1000001);
there is the risk of a reallocation, which I want to avoid for efficiency reasons.
I ask whether this is a possible solution to my problem:
vector<pair<vector<double>, int>> samples;
samples.reserve(1000001);
samples.resize(1000000);
.
. Elaboration that fills samples
.
samples.resize(1000001); // here I don't want reallocation
or are there better solutions?
Thanks in advance!
(I'm using a C++11 compiler.)
I just wrote a sample program to demonstrate that resize is not going to reallocate if the capacity of the vector is already sufficient:
#include <iostream>
#include <vector>
#include <utility>
#include <cassert>
using namespace std;
int main()
{
    vector<pair<vector<double>, int>> samples;
    samples.reserve(10001);

    auto data = samples.data();
    auto cap = samples.capacity();
    assert(cap >= 10001);           // reserve guarantees at least this much

    samples.resize(10000);
    assert(cap == samples.capacity());
    assert(data == samples.data());

    samples.resize(10001);          // here I don't want reallocation
    assert(cap == samples.capacity());
    assert(data == samples.data());
}
This demo is based on the assumption that std::vector guarantees contiguous memory and that, if the data pointer does not change, no reallocation took place. This is also evident from the capacity() result remaining unchanged after every call to resize().
cppreference on vectors:
The storage of the vector is handled automatically, being expanded and contracted as needed. Vectors usually occupy more space than static arrays, because more memory is allocated to handle future growth. This way a vector does not need to reallocate each time an element is inserted, but only when the additional memory is exhausted. The total amount of allocated memory can be queried using capacity() function.
cppreference on reserve:
Correctly using reserve() can prevent unnecessary reallocations, but inappropriate uses of reserve() (for instance, calling it before every push_back() call) may actually increase the number of reallocations (by causing the capacity to grow linearly rather than exponentially) and result in increased computational complexity and decreased performance.
cppreference also states about resize:
Complexity
Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count
I ask whether this is a possible solution to my problem:
samples.reserve(1000001);
samples.resize(1000000);
Yes, this is the solution.
or are there better solutions?
Not that I know of.
As I recall, when you resize to less than the capacity, there will be no reallocation.
So your code will work without reallocation.
cppreference.com
Vector capacity is never reduced when resizing to smaller size because that would invalidate all iterators, rather than only the ones that would be invalidated by the equivalent sequence of pop_back() calls.
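A quick way to observe that guarantee:
#include <cassert>
#include <vector>

int main() {
    std::vector<int> v(1000);  // size 1000, capacity >= 1000
    auto cap = v.capacity();

    v.resize(10);              // shrink: the size changes, the capacity does not
    assert(v.size() == 10);
    assert(v.capacity() == cap);
    return 0;
}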

Implementation defined to use a reserved vector without resizing it?

Is it implementation defined to use a reserved vector without resizing it?
By that I mean:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
    std::vector<unsigned int> foo;
    foo.reserve(1024);
    foo[0] = 10;
    std::cout << foo[0];
    return 0;
}
In the above, I reserve a good amount of space and assign a value to one of the indices in that space. However, I did not call push_back, which "resizes" the vector and gives each new element a default value (which I'm trying to avoid). So here foo.size() is 0 while foo.capacity() is 1024.
So is this valid code, or is it implementation defined, seeing as I'm assigning to a vector of size 0? It works, but I'm not sure it's a good idea.
The reason I'm trying to avoid the default value is that for large allocations I don't need it zeroing out every element, as I will decide when to write to each one. I'd use a raw pointer, but the lodepng API accepts only a vector for decoding from a file.
std::vector::reserve just reserves memory, so the next push_back does not have to allocate memory. It does not change the size of the vector.
If you want a vector with an initial size of 1024 elements, you can use the constructor to do that:
std::vector<unsigned int> foo(1024);
Note that if you create a vector with an initial size of e.g. 1024 elements and then do a push_back, you add an element, so the size of the vector grows to 1025.
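The difference in a nutshell (a and b are just illustration names):
#include <cassert>
#include <vector>

int main() {
    std::vector<unsigned int> a;
    a.reserve(1024);  // capacity >= 1024, but size is still 0, so a[0] would be UB
    assert(a.size() == 0);

    std::vector<unsigned int> b(1024);  // 1024 value-initialized (zeroed) elements
    assert(b.size() == 1024);
    b[0] = 10;  // fine: element 0 actually exists
    return 0;
}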
It is illegal, regardless of the type of item in the container or what seems to happen on a particular compiler. From 23.1.1/12 (Table 68) we learn that operator[] behaves like *(a.begin() + n). Since you haven't added any items to the container, this is the same as dereferencing an iterator past end(), which is undefined.

Stumped at a simple segmentation fault. C++

Could somebody be kind to explain why in the world this gives me a segmentation fault error?
#include <vector>
#include <iostream>
using namespace std;
vector<double> freqnote;

int main(){
    freqnote[0] = 16.35;
    cout << freqnote[0];
    return 0;
}
I had other vectors in the code and this is the only vector that seems to be giving me trouble.
I changed it to vector<int> freqnote; and changed the value to 16, and I STILL get the segmentation fault. What is going on?
I have other vector ints and they give me correct results.
Replace
freqnote[0] = 16.35;
with
freqnote.push_back(16.35);
and you'll be fine.
The error is due to the index being out of range. At the time you access the first element via [0], the vector has a size of 0. push_back(), on the other hand, grows the vector (expanding capacity if necessary).
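Putting that together, a corrected version of the program:
#include <iostream>
#include <vector>
using namespace std;

vector<double> freqnote;

int main() {
    freqnote.push_back(16.35);  // grows the vector to size 1
    cout << freqnote[0];        // index 0 is now valid
    return 0;
}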
You can't initialise an element in a vector like that.
You have to go:
freqnote.push_back(16.35);
then access it as you would an array.
You're accessing the vector out of bounds. First you need to initialize the vector, specifying its size:
int main() {
    vector<int> v(10);
    v[0] = 10;
}
As has been said, it's an issue of writing to an out-of-range index in the vector.
A vector is a dynamically sized array: it begins with a size of 0, and you can then extend or shrink it to your heart's content.
There are 2 ways of accessing a vector element by index:
vector::operator[](size_t) (Experts only)
vector::at(size_t)
(I dispensed with the const overloads)
Both have the same semantics; however, the second is "secured" in the sense that it performs bounds checking and throws a std::out_of_range exception in case you go out of bounds.
I would warmly recommend performing ALL accesses using at.
The performance penalty can be shrugged off for most use cases. The operator[] should only be used by experts, after they have profiled the code and this spot proved to be a bottleneck.
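For example, with at() the question's bug becomes a catchable exception instead of a possible crash:
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<double> freqnote;
    try {
        freqnote.at(0) = 16.35;  // out of bounds: throws instead of scribbling on memory
    } catch (const std::out_of_range &e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    return 0;
}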
Now, for inserting new elements in the vector you have several alternatives:
push_back will append an element
insert will insert the element before the one pointed to by the iterator
Depending on the semantics you wish for, both are to be considered. And of course, both will make the vector grow appropriately.
Finally, you can also define the size explicitly:
vector(size_t n, T const& t = T()) is an overload of the constructor which lets you specify the size
resize(size_t n, T const& t = T()) allows you to resize the vector, appending new elements if it gets bigger than it was
Both methods allow you to supply an element to be copied (an exemplar), and default to copying a default-constructed object (0 if T is int) if you don't supply the exemplar explicitly.
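For instance:
#include <vector>

int main() {
    std::vector<int> a(5);      // five elements, each default-constructed to 0
    std::vector<int> b(5, 42);  // five elements, each a copy of the exemplar 42

    b.resize(8, 7);             // grows to 8; the three new elements are copies of 7
    return 0;
}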
Besides using push_back() to store new elements, you can also call resize() once before you start using the vector to specify the number of elements it contains. This is very similar to allocating an array.