Is it safe to traverse a container during std::remove_if execution? - c++

Suppose I want to remove the unique elements from an std::vector (not get rid of the duplicates, but retain only the elements that occur at least twice), and I want to achieve that in a fairly inefficient way: by calling std::count from inside the predicate of std::remove_if. Consider the following code:
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};
    auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec](int n) {
        return std::count(vec.begin(), vec.end(), n) == 1;
    });
    vec.erase(to_remove, vec.end());
    for (int i : vec) std::cout << i << ' ';
}
From the reference on std::remove_if we know that the elements from to_remove onward have unspecified values, but I wonder how unspecified they can really be.
To explain my concern a little further: we can see that the elements that should be removed are 1, 3, 5 and 7 - the only unique values. std::remove_if will move the 1 towards the end, but there is no guarantee that a value of 1 will actually sit at the end after that operation. Could it happen (due to that value being unspecified) that it turns into a 3 and makes the std::count call return a count of, for example, 2 for the 3 encountered later?
Essentially my question is - is this guaranteed to work, and by work I mean to inefficiently erase unique elements from an std::vector?
I am interested both in a language-lawyer answer (which could be "the standard says this situation is possible, you should avoid it") and an in-practice answer (which could be "the standard says this situation is possible, but realistically there is no way of this value ending up as a completely different one, for example 3").

After the predicate returns true the first time, there will be one unspecified value in the range. That means any subsequent call of the predicate will be counting over a range that contains an unspecified value. The count is therefore potentially incorrect, and you may either leave values in place that you intended to discard, or discard values that should have been retained.
You could modify the predicate so that it keeps a count of how many times it has returned true, and reduce the counted range accordingly. For example:
std::size_t count = 0;
auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec, &count](int n)
{
    bool once = (std::count(vec.begin(), vec.end() - count, n) == 1);
    if (once) ++count;
    return once;
});
Subtracting an integral value from a vector's end iterator is safe, but that isn't necessarily true for other containers.
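For a container without random-access iterators (std::list, for example), the same shrinking of the counted range can be expressed with std::prev instead of iterator arithmetic. A minimal sketch of that adjustment, my own illustration rather than part of the original answer, under the same assumptions as the snippet above:

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>

int main() {
    std::list<int> lst = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};
    std::size_t count = 0;
    auto to_remove = std::remove_if(lst.begin(), lst.end(), [&lst, &count](int n) {
        // std::prev walks back 'count' steps from end(); this works for
        // bidirectional iterators, whereas lst.end() - count would not compile.
        bool once = std::count(lst.begin(),
                               std::prev(lst.end(), static_cast<std::ptrdiff_t>(count)),
                               n) == 1;
        if (once) ++count;
        return once;
    });
    lst.erase(to_remove, lst.end());
}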

You misunderstood how std::remove_if works. The to-be-removed values are not necessarily shifted to the end. See:
Removing is done by shifting (by means of move assignment) the elements in the range in such a way that the elements that are not to be removed appear in the beginning of the range. cppreference
This is the only guarantee about the state of the range. As far as I know, an implementation is not forbidden to shuffle all the values around and it would still satisfy the complexity requirements. So some implementation might shift the unwanted values to the end, but that would just be extra, unnecessary work.
An example of a possible implementation removing the odd numbers from 1 2 3 4 8 5, where X denotes a moved-from (unspecified) value, r is the read position and w the write position, both starting at the front:
1 2 3 4 8 5   1 is odd:  ++r                      (r=1, w=0)
2 X 3 4 8 5   2 is even: *w = move(*r), ++both    (r=2, w=1)
2 X 3 4 8 5   3 is odd:  ++r                      (r=3, w=1)
2 4 3 X 8 5   4 is even: *w = move(*r), ++both    (r=4, w=2)
2 4 8 X X 5   8 is even: *w = move(*r), ++both    (r=5, w=3)
2 4 8 X X 5   5 is odd:  ++r                      (r=6, w=3)
The write position w now marks the new logical end returned by std::remove_if.
So, in general, you cannot rely on count returning any meaningful values. In the case that move == copy (as it is for ints) the resulting array is 2 4 8 | 4 8 5, which gives an incorrect count both for the odd and for the even numbers. In the case of std::unique_ptr the X == nullptr, and thus the count for nullptr and for the removed values might be wrong. The other remaining values should not linger in the tail of the array, as no copies were made.
Note that the values are not unspecified in the sense that you cannot know them at all. They are exactly the results of the move assignments, and a move may leave the source in an unspecified state. If the moved-from state were specified (as it is for std::unique_ptr), the values would be known. E.g. if move == swap, then the range would merely end up permuted.
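A minimal sketch of that last point (my own illustration, not taken from the answer): with a vector<std::string>, the tail left behind by std::remove_if is a mix of untouched originals and moved-from strings, all valid objects but with unspecified values:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v = {"a", "bb", "c", "dd"};
    // Remove one-character strings; "bb" and "dd" get move-assigned towards the front.
    auto tail = std::remove_if(v.begin(), v.end(),
                               [](const std::string& s) { return s.size() == 1; });
    // Elements in [tail, v.end()) are valid std::string objects, but their values
    // are unspecified: some are moved-from, some may simply be untouched originals.
    for (auto it = tail; it != v.end(); ++it)
        std::cout << '[' << *it << "] ";
    std::cout << '\n';
}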

I added some outputs:
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};
    auto to_remove = std::remove_if(vec.begin(), vec.end(), [&vec](int n) {
        std::cout << "number " << n << ": ";
        for (auto i : vec) std::cout << i << ' ';
        auto c = std::count(vec.begin(), vec.end(), n);
        std::cout << ", count: " << c << std::endl;
        return c == 1;
    });
    vec.erase(to_remove, vec.end());
    for (int i : vec) std::cout << i << ' ';
}
and got
number 1: 1 2 6 3 6 2 7 4 4 5 6 , count: 1
number 2: 1 2 6 3 6 2 7 4 4 5 6 , count: 2
number 6: 2 2 6 3 6 2 7 4 4 5 6 , count: 3
number 3: 2 6 6 3 6 2 7 4 4 5 6 , count: 1
number 6: 2 6 6 3 6 2 7 4 4 5 6 , count: 4
number 2: 2 6 6 3 6 2 7 4 4 5 6 , count: 2
number 7: 2 6 6 2 6 2 7 4 4 5 6 , count: 1
number 4: 2 6 6 2 6 2 7 4 4 5 6 , count: 2
number 4: 2 6 6 2 4 2 7 4 4 5 6 , count: 3
number 5: 2 6 6 2 4 4 7 4 4 5 6 , count: 1
number 6: 2 6 6 2 4 4 7 4 4 5 6 , count: 3
2 6 6 2 4 4 6
As you can see, the counts can be wrong. I wasn't able to construct an example for your exact case, but as a rule you have to expect wrong results.
The first 4 is counted twice, and in the very next step the second 4 is counted three times, even though the original vector contains only two 4s. The counts are wrong and you can't rely on them.
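One way to sidestep the problem entirely (my own sketch, not from the answers above) is to take the counts before calling std::remove_if, so the predicate never reads the vector while it is being shifted:

#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> vec = {1, 2, 6, 3, 6, 2, 7, 4, 4, 5, 6};
    std::unordered_map<int, int> freq;            // counts taken before any shifting
    for (int n : vec) ++freq[n];
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&freq](int n) { return freq[n] == 1; }),
              vec.end());
    for (int i : vec) std::cout << i << ' ';      // prints: 2 6 6 2 4 4 6
}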

Related

How to construct a tree given its depth and postorder traversal, then print its preorder traversal

I need to construct a tree given its depth and postorder traversal, and then I need to generate the corresponding preorder traversal. Example:
Depth: 2 1 3 3 3 2 2 1 1 0
Postorder: 5 2 8 9 10 6 7 3 4 1
Preorder(output): 1 2 5 3 6 8 9 10 7 4
I've defined two arrays that contain the postorder sequence and depth. After that, I couldn't come up with an algorithm to solve it.
Here's my code:
int postorder[1000];
int depth[1000];

string postorder_nums;
getline(cin, postorder_nums);
istringstream token1(postorder_nums);
string tokenString1;
int idx1 = 0;
while (token1 >> tokenString1) {
    postorder[idx1] = stoi(tokenString1);
    idx1++;
}

string depth_nums;
getline(cin, depth_nums);
istringstream token2(depth_nums);
string tokenString2;
int idx2 = 0;
while (token2 >> tokenString2) {
    depth[idx2] = stoi(tokenString2);
    idx2++;
}

Tree tree(1);
You can actually do this without constructing a tree.
First note that if you reverse the postorder sequence, you get a kind of preorder sequence, but with the children visited in opposite order. So we'll use this fact and iterate over the given arrays from back to front, and we will also store values in the output from back to front. This way at least the order of siblings will come out right.
The first value we get from the input will thus always be the root value. Obviously we cannot store this value at the end of the output array, as it really should come first. But we will put this value on a stack until all other values have been processed. The same will happen for any value that is followed by a "deeper" value (again: we are processing the input in reversed order). But as soon as we find a value that is not deeper, we flush a part of the stack into the output array (also filling it up from back to front).
When all values have been processed, we just need to flush the remaining values from the stack into the output array.
Now, we can optimise our space usage here: as we fill the output array from the back, we have free space at its front to use as the stack space for this algorithm. This has the nice consequence that when we arrive at the end we don't need to flush the stack anymore, because it is already there in the output, with every value where it should be.
Here is the code for this algorithm where I did not include the input collection, which apparently you already have working:
// Input example
int depth[] = {2, 1, 3, 3, 3, 2, 2, 1, 1, 0};
int postorder[] = {5, 2, 8, 9, 10, 6, 7, 3, 4, 1};
// Number of values in the input
int n = sizeof(depth) / sizeof(int);

int preorder[n];    // This will contain the output
int j = n;          // index where the last value was stored in preorder
int stackSize = 0;  // how many entries are used as stack space in preorder

for (int i = n - 1; i >= 0; i--) {
    while (depth[i] < stackSize) {
        preorder[--j] = preorder[--stackSize]; // flush it
    }
    preorder[stackSize++] = postorder[i]; // stack it
}

// Output the result:
for (int i = 0; i < n; i++) {
    std::cout << preorder[i] << " ";
}
std::cout << "\n";
This algorithm has an auxiliary space complexity of O(1) -- so not counting the memory needed for the input and the output -- and has a time complexity of O(n).
I won't give you the code, but here are some hints on how to solve the problem.
First, in a postorder traversal you first visit the children, then process (print) the value of the node. So the parent of a tree or subtree is the last thing processed in its (sub)tree. I replace 10 with 0 for better alignment:
2 1 3 3 3 2 2 1 1 0
-------------------
5 2 8 9 0 6 7 3 4 1
As explained above, the node of depth 0, i.e. the root, is the last one. Let's lower all other nodes 1 level down:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
5 2 8 9 0 6 7 3 4
Now identify all nodes of depth 1, and lower all that is not of depth 0 or 1:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
  2           3 4
5   8 9 0 6 7
As you can see, (5,2) is in a subtree, (8,9,10,6,7,3) in another subtree, (4) is a single-node subtree. In other words, all that is to the left of 2 is its subtree, all to the right of 2 and to the left of 3 is in the subtree of 3, all between 3 and 4 is in the subtree of 4 (here: empty).
Now let's deal with depth 3 in a similar way:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
  2           3 4
5         6 7
    8 9 0
2 is the parent of 5;
6 is the parent of 8, 9, 10;
3 is the parent of 6, 7;
or very explicitly:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
   /           / /
  2           3 4
 /         / /
5         6 7
     / / /
    8 9 0
This is how you can construct a tree from the data you have.
EDIT
Clearly, this problem can be solved easily by recursion. In each step you find the node with the lowest depth, print it, and call the same function recursively for each of its subtrees, where a subtree is delimited by the positions with depth current_depth + 1. If the depth is passed as an extra argument, that saves you from recomputing the lowest depth each time.
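A rough sketch of that recursive idea (my own reading of the hint, not the original poster's code): in postorder the root of a subtree is the last element of its range, and every position whose depth equals current_depth + 1 closes one child subtree.

#include <iostream>
#include <vector>

// Print the preorder of the subtree stored in postorder[lo..hi],
// whose root has the given depth d.
void printPreorder(const std::vector<int>& depth,
                   const std::vector<int>& postorder,
                   int lo, int hi, int d)
{
    if (lo > hi) return;
    std::cout << postorder[hi] << ' ';     // the root comes last in postorder
    int start = lo;
    for (int i = lo; i < hi; ++i) {
        if (depth[i] == d + 1) {           // position i closes one child subtree
            printPreorder(depth, postorder, start, i, d + 1);
            start = i + 1;
        }
    }
}

int main() {
    std::vector<int> depth     = {2, 1, 3, 3, 3, 2, 2, 1, 1, 0};
    std::vector<int> postorder = {5, 2, 8, 9, 10, 6, 7, 3, 4, 1};
    printPreorder(depth, postorder, 0, (int)depth.size() - 1, 0);
    std::cout << '\n';                     // expected: 1 2 5 3 6 8 9 10 7 4
}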

Number of first element insertions to sort array in O(nlogn)?

If you are only allowed to move the first element of an array, how many insertions does it take to fully sort the array?
In the output, give the number of insertions necessary as well as how many positions each element moves back.
For example:
Input:
6
1 4 2 5 3 6
Output:
4
3 4 2 4
Explanation:
This is the order of insertions:
4 2 5 1 3 6
2 5 1 3 4 6
5 1 2 3 4 6
1 2 3 4 5 6
I can do this in O(n^2), since the problem simplifies to finding the position where the first element belongs in the increasing suffix of the array.
How can I solve this in O(n log n)?

What does this vector array code do? (C++)

I'm having difficulty finding an explanation for this.
What does this code do? I understand that it creates an array of vectors, but that's about it.
How can I print the vector array and access elements to experiment with it?
#define MAXN 300009
vector<int> dv[MAXN];

int main()
{
    for (int i = 1; i < MAXN; i++)
        for (int j = i; j < MAXN; j += i)
            dv[j].push_back(i);
}
The code is easy enough to instrument. What it ends up producing is essentially a very simple (and very inefficient) Sieve of Eratosthenes. If you understand that algorithm, you'll see how this code produces something of that ilk.
Edit: It is also a factor-table generator. See the edit below.
Instrumenting the code, dumping the output afterward, and reducing the number of loops for simplification, we have something like the following code. We use range-based for loops to enumerate over each vector in the array of vectors:
#include <iostream>
#include <vector>

#define MAXN 20

std::vector<int> dv[MAXN];

int main()
{
    for (int i = 1; i < MAXN; i++)
    {
        for (int j = i; j < MAXN; j += i)
            dv[j].push_back(i);
    }

    for (auto const& v : dv)
    {
        for (auto x : v)
            std::cout << x << ' ';
        std::cout << '\n';
    }
}
The resulting output is:
1
1 2
1 3
1 2 4
1 5
1 2 3 6
1 7
1 2 4 8
1 3 9
1 2 5 10
1 11
1 2 3 4 6 12
1 13
1 2 7 14
1 3 5 15
1 2 4 8 16
1 17
1 2 3 6 9 18
1 19
Now, note each vector that only has two elements (1 and an additional number). That second number is prime. In our test case those two-element vectors are:
1 2
1 3
1 5
1 7
1 11
1 13
1 17
1 19
In short, this is a very simple and incredibly inefficient way of finding prime numbers. A slight change to the output loop, so that it prints only the second element of vectors of length exactly two, will therefore generate all the primes below MAXN. Using:
for (auto const& v : dv)
{
    if (v.size() == 2)
        std::cout << v[1] << '\n';
}
We will get all primes from [2...MAXN)
Edit: Factor Table Generation
If it wasn't obvious, each vector ends with the number itself (which, not coincidentally, also matches the subscript into the outer array), and together with the preceding elements it lists all the positive factors of that number. For example:
1 2 5 10
is the dv[10] vector, and tells you 10 has factors 1,2,5,10. Likewise,
1 2 3 6 9 18
is the dv[18] vector, and tells you 18 has factors 1,2,3,6,9,18.
In short, if someone wanted to know all the factors of some number N that is < MAXN, this would be a way of putting all that info into tabular form.
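As a small illustration of using the table that way (my own sketch, assuming the dv array, its includes, and the filling loop from the code above), looking up all the factors of a number is just an index into the array:

// Hypothetical helper: print all factors of n recorded in dv[n].
// Assumes 0 < n < MAXN and that the filling loop has already run.
void print_factors(int n)
{
    for (int f : dv[n])
        std::cout << f << ' ';
    std::cout << '\n';
}

// e.g. print_factors(18) prints: 1 2 3 6 9 18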

Difference between the two versions of partition used in quicksort

The first one is straightforward: just walk inwards from both sides until finding a pair that is on the wrong sides of the pivot.
/* C++ version, [first, last); the range is half-open, so you need --last to reach the last element */
/* returns the middle of the partitioning result */
int* partition( int *first, int *last, int pivot ) {
    while (true) {
        while (*first < pivot) ++first;
        --last;                            // don't remove this; it is intentional
        while (pivot < *last) --last;
        if (!(first < last)) return first;
        swap(*first, *last);
        ++first;
    }
}
The second one (shown in "Introduction to algorithms") is:
int* partition( int a[], int n, int pivot ) {
    int bound = 0;
    for ( int i = 1; i != n; ++i )
        if ( a[i] < pivot )
            swap( a[i], a[++bound] );
    swap( a[0], a[bound] );
    return a + bound;
}
The invariant of the second one is "all elements before bound are less than the pivot".
Q: What are the advantages and disadvantages of the two versions?
I'll give one first: the second one only requires the ++ operation on the iterator (pointer), so it can also be applied to ForwardIterators, like the iterator of a singly linked list. Any other tips?
As far as the basic idea of the two algorithms goes, both are correct. They will do the same number of comparisons, but the second one will do more swaps than the first.
You can see this by stepping through the algorithms as they partition the array 1 9 2 8 3 7 4 6 5 using 5 as the pivot. When the first algorithm swaps two numbers, it never touches either of them again. The second algorithm first swaps 9 and 2, then 9 and 3, and so on, taking multiple swaps to move 9 to its final position.
There are other differences too. If I haven't made any mistakes, this is how the first algorithm partitions the array:
1 9 2 8 3 7 4 6 5
f                 l
1 9 2 8 3 7 4 6 5   # swap 9,5
  f             l
1 5 2 8 3 7 4 6 9   # swap 8,4
      f     l
1 5 2 4 3 7 8 6 9   # return f = 5
        l f
This is how the second algorithm partitions the array:
1 9 2 8 3 7 4 6 5   # 1<5, swap 1,1
bi
1 9 2 8 3 7 4 6 5   # 9>5, no swap
b i
1 9 2 8 3 7 4 6 5   # 2<5, swap 9,2
  b i
1 2 9 8 3 7 4 6 5   # 8>5, no swap
  b   i
1 2 9 8 3 7 4 6 5   # 3<5, swap 9,3
    b   i
1 2 3 8 9 7 4 6 5   # 7>5, no swap
    b     i
1 2 3 8 9 7 4 6 5   # 4<5, swap 8,4
      b     i
1 2 3 4 9 7 8 6 5   # 6>5, no swap
      b       i
1 2 3 4 9 7 8 6 5   # 5=5, exit loop, swap 9,5
      b         i
1 2 3 4 5 7 8 6 9   # return b = 4
        b       i
Notice how it makes 5 swaps, compared to just 2 for the other algorithm. It also moves the last item of the array to the middle of the array. In this case the last item happens to be the pivot, so it's the pivot that's moved to the middle, but that's not the general case.
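For completeness, here is a small self-contained program (my own instrumentation, not part of the answer) that reproduces the swap counts from the walkthroughs above; it treats the second version the way the trace does, with the pivot being the last element and one final swap to put it in place:

#include <iostream>
#include <utility>

static int swaps = 0;
static void counted_swap(int& a, int& b) { std::swap(a, b); ++swaps; }

// First version (Hoare-style), as in the question.
static int* partition1(int* first, int* last, int pivot) {
    while (true) {
        while (*first < pivot) ++first;
        --last;
        while (pivot < *last) --last;
        if (!(first < last)) return first;
        counted_swap(*first, *last);
        ++first;
    }
}

// Second version (Lomuto-style) as traced above: the pivot is the last
// element and is swapped into its final position at the end.
static int* partition2(int a[], int n) {
    int pivot = a[n - 1];
    int bound = -1;
    for (int i = 0; i != n - 1; ++i)
        if (a[i] < pivot)
            counted_swap(a[i], a[++bound]);
    counted_swap(a[bound + 1], a[n - 1]);
    return a + bound + 1;
}

int main() {
    int a[] = {1, 9, 2, 8, 3, 7, 4, 6, 5};
    int b[] = {1, 9, 2, 8, 3, 7, 4, 6, 5};
    swaps = 0; partition1(a, a + 9, 5);
    std::cout << "first version:  " << swaps << " swaps\n";   // prints 2
    swaps = 0; partition2(b, 9);
    std::cout << "second version: " << swaps << " swaps\n";   // prints 5
}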

Permutations with some fixed numbers

How can I efficiently generate permutations of a number (or of the characters in a word) if I need a particular char/digit to stay at a specified place?
e.g. generate all numbers with the digit 3 in the second place from the beginning and the digit 1 in the second place from the end of the number. Each digit in the number has to be unique, and you can choose only from the digits 1-5.
4 3 2 1 5
4 3 5 1 2
2 3 4 1 5
2 3 5 1 4
5 3 2 1 4
5 3 4 1 2
I know there's a next_permutation function, so I can prepare an array with the numbers {4, 2, 5} and feed it to this function in a loop, but how do I handle the fixed positions?
Generate all permutations of 2 4 5 and insert 3 and 1 in your output routine. Just remember the positions where they have to be:
int perm[3] = {2, 4, 5};
const int N = sizeof(perm) / sizeof(int);

std::map<int, int> fixed; // note: zero-indexed
fixed[1] = 3;
fixed[3] = 1;

do {
    for (int i = 0, j = 0; i < 5; i++)
        if (fixed.find(i) != fixed.end())
            std::cout << " " << fixed[i];
        else
            std::cout << " " << perm[j++];
    std::cout << std::endl;
} while (std::next_permutation(perm, perm + N));
outputs
2 3 4 1 5
2 3 5 1 4
4 3 2 1 5
4 3 5 1 2
5 3 2 1 4
5 3 4 1 2
I've read the other answers and I believe they are better than mine for your specific problem. However I'm answering in case someone needs a generalized solution to your problem.
I recently needed to generate all permutations of the 3 separate continuous ranges [first1, last1) + [first2, last2) + [first3, last3). This corresponds to your case with all three ranges being of length 1 and separated by only 1 element. In my case the only restriction is that distance(first3, last3) >= distance(first1, last1) + distance(first2, last2) (which I'm sure could be relaxed with more computational expense).
My application was to generate each unique permutation but not its reverse. The code is here:
http://howardhinnant.github.io/combinations.html
And the specific applicable function is combine_discontinuous3 (which creates combinations), and its use in reversible_permutation::operator() which creates the permutations.
This isn't a ready-made packaged solution to your problem. But it is a tool set that could be used to solve generalizations of your problem. Again, for your exact simple problem, I recommend the simpler solutions others have already offered.
Remember at which places you want your fixed numbers. Remove them from the array.
Generate permutations as usual. After every permutation, insert your fixed numbers to the spots where they should appear, and output.
If you have the set of digits {4, 3, 2, 1, 5} and you know that 3 and 1 will not be permuted, then you can take them out of the set and just generate all permutations of {4, 2, 5}. All you have to do after that is insert 1 and 3 at their respective positions for each generated permutation.
I posted a similar question and in there you can see the code for a powerset.