STL algorithm and (end of) arrays - c++

I am new to C++. I was trying to use the accumulate algorithm from the numeric library. Consider the following code segment.
int a[3] = {1, 2, 3};
int b = accumulate(a, a + 3, 0);
It turns out that the code segment works and b gets assigned 1+2+3+0=6.
If a was a vector<int> instead of an array, I would call accumulate(a.begin(), a.end(), 0). a.end() would point to one past the end of a. In the code segment above, a + 3 is analogous to a.end() in that a + 3 also points to one past the end of a. But with the primitive array type, how does the program know that a + 3 is pointing at one past the end of some array?

The end iterator is used only for comparison, as a stop marker; it does not matter what it points to, because it is never dereferenced. The accumulate function iterates over all elements in the half-open range [a, a+3) and stops as soon as it reaches the stop marker.
In the C-style array case, the stop marker is a pointer to one past the last element of the array. The program does not know where the array ends; it simply trusts the range you pass in. This is the same in the std::vector case.
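To make the stop-marker idea concrete, here is a minimal sketch of the kind of loop an accumulate-style function runs. The name sum_range is hypothetical and this is not the library's actual implementation; it only illustrates that the end iterator is compared against and never dereferenced.

```cpp
// Hypothetical stand-in for std::accumulate, for illustration only.
// `last` is only ever compared with `first`; it is never dereferenced,
// so it may legally point one past the last element.
template <typename Iterator, typename T>
T sum_range(Iterator first, Iterator last, T init) {
    for (; first != last; ++first) {
        init = init + *first;  // only elements in [first, last) are read
    }
    return init;
}
```

With int a[3] = {1, 2, 3}, calling sum_range(a, a + 3, 0) reads a[0], a[1] and a[2] and then stops. Nothing checks that a + 3 really is the end of the array; passing a + 4 would compile and would be undefined behavior.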

Related

Variables inside and outside a loop with/without asterisk

I am not proficient in C++ but I am converting a short script to PHP
for(auto it = First; it != Last; ++it)
{
Result += *it;
}
From this snippet, I can speculate this simply means
Result = Result + it
where * is a reference to the pointer of the loop.
That said I see this symbol used outside of loops and in some cases I see variables without this symbol both in and outside of loops which puts holes in my theory.
Again I am trying to RTFM but I am unsure what I am searching for.
Both First and Last are iterator objects, representing a generalization of pointers in C++ Standard Library. Additionally, the two iterators reference the same collection, and Last can be reached from First by incrementing the iterator*.
Result is some sort of accumulator. If it is of numeric type, += means Result = Result + *it, where *it is whatever the iterator is pointing to. In other words, Result accumulates the total of elements of the collection between First, inclusive, and Last, exclusive. If First points to the beginning of an array and Last points to one-past-the-end of an array of numeric type, your code would be equivalent to calling PHP array_sum() on the array.
However, Result is not required to be numeric. For example, it could be a std::string, in which case += represents appending the value to the string.
* In terms of pointers and arrays this would be "pointing to the same array," and "Last points to a higher index of the array than First."
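As a rough illustration of the two cases above, the loop from the question can be wrapped in a small template. The names fold_range and join_words are hypothetical; the sketch assumes only that Result supports += with the element type.

```cpp
#include <string>
#include <vector>

// Hypothetical helper wrapping the loop from the question.
template <typename Iterator, typename T>
T fold_range(Iterator First, Iterator Last, T Result) {
    for (auto it = First; it != Last; ++it) {
        Result += *it;  // *it is the element the iterator refers to
    }
    return Result;
}

// With a std::string accumulator, += appends instead of adding.
std::string join_words(const std::vector<std::string>& words) {
    return fold_range(words.begin(), words.end(), std::string());
}
```

Called on an int array with a starting value of 0, fold_range behaves like PHP's array_sum(); called through join_words, it concatenates the strings in order.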
I believe your speculation is incorrect.
it, first and last are either iterators or pointers. Iterators are C++ objects that can be used to iterate over containers. For basic usage, they behave much like pointers, and can be dereferenced the same way.
For example:
std::vector<int> myList;
...
// Search for the number 10 in the list.
std::vector<int>::iterator it = std::find(myList.begin(), myList.end(), 10);
// If the number 10 was found in the list, change the value to 11.
if (it != myList.end())
*it = 11; //< Similar to pointer syntax.
In your specific example, the Result variable has a value added to it. To get that value, your code uses the * operator to get the value from the iterator.
The same concept applies to pointers. Although iterators and pointers are very different concepts, accessing their values is very similar.

Iterator returned by set_union()

I have the following C++ code using set_union() from algorithm stl:
9 int first[] = {5, 10, 15, 20, 25};
10 int second[] = {50, 40, 30, 20, 10};
11 vector<int> v(10);
12 vector<int>::iterator it;
13
14 sort(first, first+5);
15 sort(second, second+5);
16
17 it = set_union(first, first + 5, second, second + 5, v.begin());
18
19 cout << int(it - v.begin()) << endl;
I read through the document of set_union from http://www.cplusplus.com/reference/algorithm/set_union/ . I have two questions:
Line 17. I understand set_union() is returning an OutputIterator. I
thought iterators are like an object returned from a container object
(e.g. instantiated vector class, and calling blah.begin()
returns the iterator object). I am trying to understand what does
the "it" returned from set_union point to, which object?
Line 19. What does "it - v.begin()" equate to. I am guessing from the output value of "8", the size of union, but how?
Would really appreciate if someone can shed some light.
Thank you,
Ahmed.
The documentation for set_union states that the returned iterator points past the end of constructed range, in your case to one past the last element in v that was written to by set_union.
This is the reason it - v.begin() results in the length of the set union also. Note that you are able to simply subtract the two only because a vector<T>::iterator must satisfy the RandomAccessIterator concept. Ideally, you should use std::distance to figure out the interval between two iterators.
Your code snippet can be written more idiomatically as follows:
int first[] = {5, 10, 15, 20, 25};
int second[] = {50, 40, 30, 20, 10};
std::vector<int> v;
v.reserve(10); // reserve instead of setting an initial size
sort(std::begin(first), std::end(first));
sort(std::begin(second), std::end(second));
// use std::begin/end instead of hard coding length
auto it = set_union(std::begin(first), std::end(first),
std::begin(second), std::end(second),
std::back_inserter(v));
// using back_inserter ensures the code works even if the vector is not
// initially set to the right size
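std::cout << std::distance(v.begin(), v.end()) << std::endl;
std::cout << v.size() << std::endl;
// these lines will output the same result unlike your example
// (the iterator returned here is a std::back_insert_iterator, so it cannot be
// measured against v.begin(); use v.end() instead)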
In response to your comment below
What is the use of creating a vector of size 10 or reserving size 10
In your original example, creating a vector having initial size of at least 8 is necessary to prevent undefined behavior because set_union is going to write 8 elements to the output range. The purpose of reserving 10 elements is an optimization to prevent possibility of multiple reallocations of the vector. This is typically not needed, or feasible since you won't know the size of the result in advance.
I tried with size 1, works fine
A size of 1 definitely does NOT work fine with your code; it is undefined behavior. set_union will write past the end of the vector. You get a seg fault with size 0 for the same reason. There's no point in speculating why the same thing doesn't happen in the first case; that's just the nature of undefined behavior.
Does set_union trim the size of the vector, from 10 to 8. Why or is that how set_union() works
You're only passing an iterator to set_union, it knows nothing about the underlying container. So there's no way it could possibly trim excess elements, or make room for more if needed. It simply keeps writing to the output iterator and increments the iterator after each write. This is why I suggested using back_inserter, that is an iterator adaptor that will call vector::push_back() whenever the iterator is written to. This guarantees that set_union will never write beyond the bounds of the vector.
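If you prefer to keep a pre-sized vector rather than back_inserter, the trimming has to be done by the caller. The following is a minimal sketch under that assumption; sorted_union is a hypothetical helper name, and both inputs are assumed to be already sorted.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: union of two sorted vectors, written into a
// pre-sized output vector that is trimmed afterwards.
std::vector<int> sorted_union(const std::vector<int>& a,
                              const std::vector<int>& b) {
    // a.size() + b.size() is an upper bound on the size of the union.
    std::vector<int> out(a.size() + b.size());
    std::vector<int>::iterator it =
        std::set_union(a.begin(), a.end(), b.begin(), b.end(), out.begin());
    // set_union cannot shrink the container, so the caller erases the
    // unused tail between the returned iterator and out.end().
    out.erase(it, out.end());
    return out;
}
```

For the two arrays in the question (after sorting), the vector starts with 10 elements and ends up with 8.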
first: "it" is an iterator to the end of the constructed range (i.e. equivalent to v.end())
second: it - v.begin() equals 8 because vector iterators are random-access iterators, so subtracting two of them yields the number of elements between them, much like pointer arithmetic. In general, it is better to use the distance algorithm than to rely on raw subtraction
cout << distance(v.begin(), it) << endl;

How to shuffle an array so that all elements change their place

I need to shuffle an array so that all array elements should change their location.
Given an array [0,1,2,3] it would be ok to get [1,0,3,2] or [3,2,0,1] but not [3,1,2,0] (because 2 left unchanged).
I suppose algorithm would not be language-specific, but just in case, I need it in C++ program (and I cannot use std::random_shuffle due to the additional requirement).
What about this?
1. Allocate an array which contains numbers from 0 to arrayLength-1
2. Shuffle the array
3. If there is no element in array whose index equals its value, continue to step 4; otherwise repeat from step 2.
4. Use shuffled array values as indexes for your array.
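The steps above can be sketched roughly as follows. The function name random_derangement is hypothetical, and std::shuffle with std::mt19937 is an assumption (the answer does not say which shuffle to use). The loop retries until no index maps to itself, so it must not be called with exactly one element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns a random permutation p of 0..n-1 with p[i] != i for every i.
// Precondition: n != 1, because a single element cannot change place.
std::vector<std::size_t> random_derangement(std::size_t n, std::mt19937& rng) {
    assert(n != 1);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t(0));  // step 1: 0..n-1
    bool has_fixed_point = n > 0;
    while (has_fixed_point) {
        std::shuffle(idx.begin(), idx.end(), rng);      // step 2
        has_fixed_point = false;
        for (std::size_t i = 0; i < n; ++i) {           // step 3
            if (idx[i] == i) {
                has_fixed_point = true;
                break;
            }
        }
    }
    return idx;  // step 4: use idx[i] as the source index for position i
}
```

The result would then be applied as shuffled[i] = original[idx[i]]. Because whole shuffles are rejected, every valid arrangement remains possible, including [1,0,3,2].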
For each element e
If there is an element to the left of e
Select a random element r to the left of e
swap r and e
This guarantees that each value isn't in the position that it started, but doesn't guarantee that each value changes if there are duplicates.
BeeOnRope notes that though simple, this is flawed. Given the list [0,1,2,3], this algorithm cannot produce the output [1,0,3,2].
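For reference, the swap-to-the-left pseudocode can be written as the short sketch below. The name cyclic_shuffle and the use of std::mt19937 are assumptions. The approach appears to be a variant of Sattolo's algorithm, which only ever produces single-cycle permutations; that would explain why [1,0,3,2], which consists of two separate swaps, is unreachable.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: swaps each element with a randomly chosen element
// strictly to its left. No element ends up at its original index
// (for vectors of two or more elements).
template <typename T>
void cyclic_shuffle(std::vector<T>& v, std::mt19937& rng) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        // pick an index in [0, i-1]; never i itself
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(v[i], v[pick(rng)]);
    }
}
```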
It's not going to be very random, but you can rotate all the elements at least one position:
std::rotate(v.begin(), v.begin() + (rand() % (v.size() - 1)) + 1, v.end());
If v was {1,2,3,4,5,6,7,8,9} at the beginning, then after rotation it will be, for example: {2,3,4,5,6,7,8,9,1}, or {3,4,5,6,7,8,9,1,2}, etc.
All elements of the array will change position.
I have an idea in mind; I hope it fits your application. Use one more container,
a "map(int,vector(int))". The key is an index, and the vector holds the values already used at that index.
For example, for the first element you use the rand function to pick which element of the array to use. Then you check the map to see whether that element of the array has already been used for this index.

Why is erase() function so expensive?

Consider a 2d vector vector < vector <int> > N and let's say its contents are as follows:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
So the size of N here is 4 i.e. N.size() = 4
Now, consider the following code :
int i = 0;
while(N != empty()){
N.erase(i);
++i;
}
I calculated the time just for this piece of code alone with various sizes for N and following are the results:
The size of N is 1000
Execution Time: 0.230000s
The size of N is 10000
Execution Time: 22.900000s
The size of N is 20000
Execution Time: 91.760000s
The size of N is 30000
Execution Time: 206.620000s
The size of N is 47895
Execution Time: 526.540000s
My question is why is this function so expensive ? If it is so then conditional erase statements in many programs could take forever just because of this function. It is the same case when I use erase function in std::map too. Is there any alternative for this function. Does other libraries like Boost offer any?
Please do not say I could do N.erase() as a whole because I'm just trying to analyze this function.
Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)
Because your algorithm is O(n^2). Each call to erase forces the vector to move all elements after the erased element back. So in your loop with the 4 element vector, the first iteration causes 3 elements to be shifted, the second iteration causes 1 element to be shifted, and after that you have undefined behavior.
If you had 8 elements, the first iteration would move 7 elements, the next would move 5 elements, the next would move 3 elements, and the final iteration would move 1 element. (And again you have undefined behavior.)
When you encounter situations like this, generally you should use the standard algorithms (i.e. std::remove, std::remove_if) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information see Scott Meyers' "Effective STL" Item 43: Prefer Algorithm Calls to Explicit Loops.
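A minimal sketch of that approach, often called the erase-remove idiom, is shown below. The helper name erase_even and the "is even" predicate are only illustrative assumptions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: removes every even value in one pass.
void erase_even(std::vector<int>& v) {
    // remove_if compacts the kept elements toward the front in O(n) and
    // returns the new logical end; a single erase then drops the tail.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
}
```

Each surviving element is moved at most once, instead of once per erased element before it.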
A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom operator= that does a lot of work!
If you need erase() to be fast, you should use a std::list - this uses a doubly linked list structure that allows fast erasure from the middle (however, other operations get somewhat slower). If you just need to remove from the start of the list quickly, use std::deque - this is typically implemented as a sequence of fixed-size arrays, and offers most of the speed advantages of std::vector while allowing fast erasures from the beginning or end only.
Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the std::list yourself, using the iterator variant of erase(). Or if you use a vector, you'll find it can be faster to copy into a new vector.
As for std::map (and std::set) - this isn't a problem at all. std::map is capable of both removing elements at random, as well as searching for elements at random, with O(lg n) time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is with std::list and friends.
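The one-pass manual iteration mentioned above might look like the following sketch. The helper name erase_negative and the "negative value" condition are assumptions for illustration; the loop relies on the C++11 behavior that map::erase returns the iterator following the erased element.

```cpp
#include <map>

// Hypothetical helper: erases every entry whose value is negative,
// visiting each element exactly once.
void erase_negative(std::map<int, int>& m) {
    for (std::map<int, int>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second < 0) {
            it = m.erase(it);  // C++11: returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```

Erasing from a map invalidates only the iterator to the erased element, so the other iterators stay usable during the loop.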
vector.erase will advance all elements after i forward by 1. This is an O(n) operation.
Additionally, you're passing vectors by value rather than by reference.
Your code also doesn't erase the entire vector.
For example:
i = 0
erase N[0]
N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1
erase N[1]
N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2
erase N[2] nothing happens because the maximum index is N[1]
Lastly, I don't think that's the correct syntax for vector.erase(). You need to pass in an iterator to the begin location to erase the element you want.
Try this:
vector<vector<int> > vectors; // still passing by value so it'll be slow, but at least erases everything
for(int i = 0; i < 1000; ++i)
{
vector<int> temp;
for(int j = 0; j < 1000; ++j)
{
temp.push_back(i);
}
vectors.push_back(temp);
}
// erase starting from the beginning
while(!vectors.empty())
{
vectors.erase(vectors.begin());
}
You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):
// just replace the while-loop at the end
while(!vectors.empty())
{
vectors.erase(vectors.end()-1);
}
A vector is an array that grows automatically as you add elements to it. As such, elements in a vector are contiguous in memory. This allows constant time access to an element. Because they grow from the end, they also take amortized constant time to add or remove to/from the end.
Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.
If you want to do lots of insertion/removal in the middle, use a linked list such as std::list. (std::deque only makes insertion/removal cheap at its two ends.)
As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.
This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.

Weird behaviour with vector::erase and std::remove_if with end range different from vector.end()

I need to remove elements from the middle of a std::vector.
So I tried:
struct IsEven {
bool operator()(int ele)
{
return ele % 2 == 0;
}
};
int elements[] = {1, 2, 3, 4, 5, 6};
std::vector<int> ints(elements, elements+6);
std::vector<int>::iterator it = std::remove_if(ints.begin() + 2, ints.begin() + 4, IsEven());
ints.erase(it, ints.end());
After this I would expect that the ints vector have: [1, 2, 3, 5, 6].
In the debugger of Visual studio 2008, after the std::remove_if line, the elements of ints are modified, I'm guessing I'm into some sort of undefined behaviour here.
So, how do I remove elements from a Range of a vector?
Edit: Sorry, the original version of this was incorrect. Fixed.
Here's what's going on. Your input to remove_if is:
1 2 3 4 5 6
    ^   ^
  begin end
And the remove_if algorithm looks at all numbers between begin and end (including begin, but excluding end), and removes all elements between that match your predicate. So after remove_if runs, your vector looks like this
1 2 3 ? 5 6
    ^ ^
begin new_end
Where ? is a value that I don't think is deterministic, although if it's guaranteed to be anything it would be 4. And new_end, which points to the new end of the input sequence you gave it, with the matching elements now removed, is what is returned by std::remove_if. Note that std::remove_if doesn't touch anything beyond the subsequence that you gave it. This might make more sense with a more extended example.
Say that this is your input:
1 2 3 4 5 6 7 8 9 10
    ^         ^
  begin      end
After std::remove_if, you get:
1 2 3 5 7 ? ? 8 9 10
    ^     ^
  begin new_end
Think about this for a moment. What it has done is remove the 4 and the 6 from the subsequence, and then shift everything within the subsequence down to fill in the removed elements, and then moved the end iterator to the new end of the same subsequence. The goal is to satisfy the requirement that the [begin, new_end) sequence that it produces is the same as the [begin, end) subsequence that you passed in, but with certain elements removed. Anything at or beyond the end that you passed in is left untouched.
What you want to get rid of, then, is everything between the end iterator that was returned, and the original end iterator that you gave it. These are the ? "garbage" values. So your erase call should actually be:
ints.erase(it, ints.begin()+4);
The call to erase that you have just erases everything beyond the end of the subsequence that you performed the removal on, which isn't what you want here.
What makes this complicated is that the remove_if algorithm doesn't actually call erase() on the vector, or change the size of the vector at any point. It just shifts elements around and leaves some "garbage" elements after the end of the subsequence that you asked it to process. This seems silly, but the whole reason that the STL does it this way is to avoid the problem with invalidated iterators that doublep brought up (and to be able to run on things that aren't STL containers, like raw arrays).
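Putting the pieces together, a subrange removal could be sketched as below. The helper name erase_even_in_subrange and its index parameters are hypothetical; the key point is that the second argument to erase is the end of the subrange that was passed to remove_if, not ints.end().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: removes even values found at positions [from, to)
// only, leaving everything outside that subrange untouched.
void erase_even_in_subrange(std::vector<int>& v,
                            std::ptrdiff_t from, std::ptrdiff_t to) {
    std::vector<int>::iterator sub_begin = v.begin() + from;
    std::vector<int>::iterator sub_end = v.begin() + to;
    std::vector<int>::iterator new_end =
        std::remove_if(sub_begin, sub_end,
                       [](int x) { return x % 2 == 0; });
    // Erase only the leftover "garbage" slots inside the subrange.
    v.erase(new_end, sub_end);
}
```

For the vector in the question, erase_even_in_subrange(ints, 2, 4) leaves [1, 2, 3, 5, 6].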
Erasing elements in std::vector invalidates iterators past the removed element, so you cannot use "foreign" functions that accept ranges. You need to do that in a different way.
EDIT:
In general, you can use the fact that erasing one element "shifts" all elements at further positions one back. Something like this:
for (size_t scan = 2, end = 4; scan != end; )
{
if (/* some predicate on ints[scan] */)
{
ints.erase (ints.begin () + scan);
--end;
}
else
++scan;
}
Note that std::vector isn't suited for erasing elements in the middle. You should consider something else (std::list?) if you do that often.
EDIT 2:
As clarified by comments, first paragraph is not true. In such case std::remove_if should be more efficient than what I suggested in the first edit, so disregard this answer. (Keeping it for the comments.)
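The behavior isn't weird - you're erasing the wrong range. std::remove_if shifts the elements it keeps toward the front of the input range and returns the new end of that range; the leftover slots sit between that new end and the end of the range you passed in. In this case, what you're looking for would be to do: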
ints.erase(it, ints.begin() + 4 /* your end of range */);
From C++ in a Nutshell:
The remove_if function template
"removes" items for which pred returns
false from the range [first, last).
The return value is one past the new
end of the range. The relative order
of items that are not removed is
stable.
Nothing is actually erased from the
underlying container; instead, items
to the right are assigned to new
positions so they overwrite the
elements for which pred returns false.
See Figure 13-13 (under remove_copy)
for an example of the removal process.