Unexpected behavior using `std::count` on `std::vector` of pairs - c++

My goal is to completely remove all elements in a std::vector<std::pair<int, int>> that occur more than once.
The idea was to utilize std::remove_if with std::count as part of the predicate. My approach looks something like this:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
using std::cout;
using std::endl;
using i_pair = std::pair<int, int>;
int main()
{
std::vector<i_pair> vec;
vec.push_back(i_pair(0,0)); // Expected to stay
vec.push_back(i_pair(0,1)); // Expected to go
vec.push_back(i_pair(1,1)); // Expected to stay
vec.push_back(i_pair(0,1)); // Expected to go
auto predicate = [&](i_pair& p)
{
return std::count(vec.begin(), vec.end(), p) > 1;
};
auto it = std::remove_if(vec.begin(), vec.end(), predicate);
cout << "Reordered vector:" << endl;
for(auto& e : vec)
{
cout << e.first << " " << e.second << endl;
}
cout << endl;
cout << "Number of elements that would be erased: " << (vec.end() - it) << endl;
return 0;
}
The vector gets reordered with both of the (0,1) elements pushed to the end, but the iterator returned by std::remove_if points at the last element. This means that a subsequent erase operation would only get rid of one (0,1) element.
Why is this behavior occurring and how can I delete all elements that occur more than once?

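Your biggest problem is that std::remove_if gives very few guarantees about the contents of the vector while it is running.
It only guarantees that, once it has finished, the range from begin() to the returned iterator contains the elements that were not removed, in their original relative order; the elements from there until end() are left in a valid but unspecified state.
Meanwhile, your predicate iterates over the whole container in the middle of this operation, so std::count sees elements that have already been shifted. With a typical implementation, by the time the second (0,1) is tested, the first one has already been overwritten by (1,1), so the count is only 1 and that element is kept.
It is more likely that std::partition would work, as it guarantees (when done) that the elements you are "removing" are actually stored at the end.
An even safer approach would be to build a std::unordered_map<std::pair<int,int>, std::size_t>, count the elements in one pass, and then in a second pass remove everything whose count is at least 2. This is also O(n) on average instead of your algorithm's O(n^2), so it should be faster.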
std::unordered_map<i_pair,std::size_t, pair_hasher> counts;
counts.reserve(vec.size()); // no more than this
for (auto&& elem:vec) {
++counts[elem];
}
vec.erase(std::remove_if(begin(vec), end(vec), [&](auto&&elem){return counts[elem]>1;}), end(vec));
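You have to write your own pair_hasher, because the standard library does not provide a std::hash specialization for std::pair. If you are willing to accept O(n log n) performance, you could use std::map instead: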
std::map<i_pair,std::size_t> counts;
for (auto&& elem:vec) {
++counts[elem];
}
vec.erase(std::remove_if(begin(vec), end(vec), [&](auto&&elem){return counts[elem]>1;}), end(vec));
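The answer above leaves pair_hasher unspecified. As a minimal sketch, assuming a simple hash-combine formula is good enough for your data, the counting approach could be put together like this. The body of pair_hasher and the helper name erase_repeated are illustrative and not taken from the question or the standard library:

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using i_pair = std::pair<int, int>;

// Hypothetical hasher: mixes the hashes of both members.
// Any reasonable combination works; this one is just an example.
struct pair_hasher {
    std::size_t operator()(const i_pair& p) const noexcept {
        const std::size_t h1 = std::hash<int>{}(p.first);
        const std::size_t h2 = std::hash<int>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

// Hypothetical helper: erases every value that occurs more than once.
void erase_repeated(std::vector<i_pair>& vec) {
    // First pass: count occurrences before the vector is modified.
    std::unordered_map<i_pair, std::size_t, pair_hasher> counts;
    counts.reserve(vec.size());
    for (const auto& elem : vec) {
        ++counts[elem];
    }
    // Second pass: the predicate reads only from counts, never from vec.
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&counts](const i_pair& elem) {
                                 return counts.at(elem) > 1;
                             }),
              vec.end());
}
```

With the vector from the question, erase_repeated(vec) leaves only (0,0) and (1,1), so assert(vec.size() == 2) holds afterwards. The predicate depends only on counts, which was built before std::remove_if started, so the intermediate state of the vector no longer matters.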

What is the best way to traverse an unordered_map starting from a random element in C++?

I have an unordered_map of n elements, some of which are eligible. I want to write a function that picks a random eligible element each time it is called.
Can this be achieved in the following time complexity?
Best case: O(1)
Avg case: O(1)
Worst case: O(n)
Based on the question "retrieve random key element for std::map in c++", I have come up with the following solution.
#include <iostream>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>
using namespace std;
void select_random_best(const std::unordered_map<std::string, int>& umap, const int random_start)
{
cout << "Selected random number " << random_start << endl;
auto it = umap.begin();
std::advance(it, random_start);
for(int i = 0; i < umap.size(); i++, it++) {
if(it == umap.end())
it = umap.begin();
// Check if the selected element satisfies the eligibility criteria.
// For the sake of simplicity, I am taking the following example.
if(it->second % 3 == 0) {
cout << it->first << ", " <<
it->second << endl;
return;
}
// Element not found continue searching
}
}
int main()
{
unordered_map<string, int> umap;
// inserting values by using [] operator
umap["a"] = 6;
umap["b"] = 3;
umap["f"] = 9;
umap["c"] = 2;
umap["d"] = 1;
umap["e"] = 3;
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> distrib(0, umap.size() - 1);
const int random_start = distrib(gen);
select_random_best(umap, random_start);
// another iteration
select_random_best(umap, distrib(gen));
cout << "Full list :" << endl;
// Traversing an unordered map
for (auto x : umap)
cout << x.first << ", " <<
x.second << "\t";
}
Can someone tell me whether using std::advance() here gives an average-case time complexity of O(1)? Or is there a better way of doing this?
std::unordered_map has forward iterators, which do not allow random access, so std::advance() has to step through the elements one by one. See the iterator section of the container's documentation page.
Assuming all elements are eligible, std::advance() will go through size/2 elements on average. Because you only accept eligible elements, you will go through more than that. If you know the probability of eligibility, you can estimate the average number of elements searched.
To achieve O(1) in the std::advance() step, you must use a container with random access iterators, such as std::vector. However, the next step does not have constant complexity. In the worst case, you will go through all ineligible elements (not considering the possibility of an infinite loop if there are no eligible ones). So this approach is still O(n) as a whole.
For the best performance, you need two containers: a std::vector holding only the eligible elements, used for picking a random element, and a std::unordered_map for everything else.
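The two-container idea could be sketched roughly as follows. This is only an illustration under some assumptions: EligiblePool and its members are made-up names, the eligibility test is the "divisible by 3" example from the question, and each key is inserted once (keeping the vector in sync when values change or are erased is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical wrapper: the map holds all data, the vector only eligible keys.
class EligiblePool {
public:
    // Example criterion taken from the question.
    static bool is_eligible(int value) { return value % 3 == 0; }

    void insert(const std::string& key, int value) {
        // Record the key as eligible only if it was newly inserted.
        if (values_.emplace(key, value).second && is_eligible(value)) {
            eligible_keys_.push_back(key);
        }
    }

    bool has_eligible() const { return !eligible_keys_.empty(); }

    // Constant time: one random index and one vector access.
    // Precondition: has_eligible().
    const std::string& pick(std::mt19937& gen) const {
        assert(has_eligible());
        std::uniform_int_distribution<std::size_t> dist(0, eligible_keys_.size() - 1);
        return eligible_keys_[dist(gen)];
    }

    int value_of(const std::string& key) const { return values_.at(key); }

private:
    std::unordered_map<std::string, int> values_;
    std::vector<std::string> eligible_keys_;
};
```

The trade-off is extra memory and bookkeeping on insertion in exchange for a pick that never has to skip over ineligible elements.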

Error when using the size() function with vectors

So I've started learning vectors for the first time and wrote a simple program which goes like this:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> g1;
int n;
cout<<"enter values"<<endl;
do
{
cin>>n;
g1.push_back(n);
} while (n);
cout<<"Vector values are: "<<endl;
for(auto i=g1.begin(); i<g1.size();i++)
cout<<*i<<endl;
}
When I try executing it, an error shows up saying "type mismatch" at the g1.size() part. Why exactly does this happen? I used the auto keyword for the iterator involved and assumed there wouldn't be any problem?
That is the downside of using auto. If you don't know what type auto deduces, you can't tell why it turns out to be something totally different from what you expect!
std::vector::begin returns a std::vector::iterator, and you can't compare that against a size_type value, which is what std::vector::size returns. This type is typically std::size_t.
You have to compare against another iterator which is the representation of the end of the vector like:
for(auto i = g1.begin(); i != g1.end(); i++)
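To see why the comparison in the question does not compile, it can help to spell out the two types involved. The following is a small sketch; IntVec, BeginType, SizeType and sum_elements are names made up for illustration:

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <type_traits>
#include <utility>
#include <vector>

using IntVec = std::vector<int>;

// The types that auto would deduce from begin() and that size() returns.
using BeginType = decltype(std::declval<IntVec&>().begin());
using SizeType = decltype(std::declval<IntVec&>().size());

// begin() yields an iterator, size() yields an unsigned count:
// two unrelated types with no operator< between them.
static_assert(std::is_same<BeginType, IntVec::iterator>::value,
              "begin() returns an iterator");
static_assert(std::is_same<SizeType, IntVec::size_type>::value,
              "size() returns size_type");

// Hypothetical helper that compares like with like: iterator against iterator.
int sum_elements(const IntVec& v) {
    int total = 0;
    for (auto it = v.begin(); it != v.end(); ++it) {
        total += *it;
    }
    return total;
}
```

For example, assert(sum_elements({1, 2, 3}) == 6) holds, and the static_assert lines are checked at compile time.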
There are at least three ways to iterate through the contents of a vector.
You can use an index:
for (std::size_t i = 0; i < vec.size(); ++i)
std::cout << vec[i] << '\n';
You can use iterators:
for (auto it = vec.begin(); it != vec.end(); ++it)
std::cout << *it << '\n';
You can use a range-based for loop:
for (auto val : vec)
std::cout << val << '\n';
The latter two can be used with any container.
g1.begin() returns an iterator to the first element, whereas g1.size() returns the number of elements. You can't compare an iterator to a size, which is why you are getting the error. It has nothing to do with your use of auto; it has to do with you comparing two different things that are unrelated to each other.
You need to change your loop to compare your i iterator to the vector's end() iterator, e.g.:
for(auto i = g1.begin(); i != g1.end(); ++i)
cout << *i << endl;
Or, simply use a range-based for loop instead, which uses iterators internally:
for(auto i : g1)
cout << i << endl;
Otherwise, if you want to use size(), then use indexes with the vector's operator[] instead of iterators, e.g.:
for(size_t i = 0; i < g1.size(); ++i)
cout << g1[i] << endl;

Duplicating std::list with std::copy and removal with std::list::erase

In the example code below, after filling an example list with numbers, I'm trying to duplicate the container with std::copy, but at runtime it says "cannot dereference end list iterator".
My question is: how do I duplicate the list so that the duplicated range is inserted at the end of the list?
It has to go at the end because I later need to be able to remove the duplicated range, which is why I save the beginning of the new range in an iterator.
#include <iostream>
#include <list>
#include <algorithm>
void print(std::list<int>& ref)
{
for (auto& num : ref)
{
std::cout << num << std::endl;
}
}
int main()
{
std::list<int> mylist{ 1, 2, 3, 4 };
std::list<int>::iterator iter = mylist.end();
std::cout << "INITIAL LIST NUMBERS" << std::endl;
print(mylist);
// duplicate list, will cause runtime error
iter = std::copy(mylist.begin(), mylist.end(), --mylist.end());
std::cout << "COPIED LIST IS NOW CONTAINS DUPLICATE NUMBERS" << std::endl;
print(mylist);
// remove previous duplication
mylist.erase(iter, mylist.end());
std::cout << "AFTER REMOVAL OF COPIED LIST SHOULD BE SAME AS INITIAL LIST" << std::endl;
print(mylist);
std::cin.get();
return 0;
}
You can use std::copy_n. This avoids the problem with std::copy, which would loop forever when given std::back_inserter(mylist) as the destination: every insertion adds another element in front of mylist.end(), so the source iterator never reaches it.
const std::size_t n = mylist.size();
std::copy_n(mylist.cbegin(), n, std::back_inserter(mylist));
Removing the duplicated range then works with
mylist.erase(std::next(mylist.begin(), n), mylist.end());
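Alternatively, to keep using the iter variable from the question (which starts out as mylist.end()), step it back to the last original element before copying and forward again afterwards, so that it ends up pointing at the first copied element:
if (!mylist.empty()) --iter; // now at the last original element
std::copy_n(mylist.begin(), mylist.size(), std::back_inserter(mylist));
if (!mylist.empty()) ++iter; // now at the first copied element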
Unfortunately, we can't pass the end iterator to std::copy(), since that might lead to an infinite loop: new elements keep being added between the current iterator and the end.
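Putting the pieces from the answers together, a minimal sketch could look like the following. The helper name append_copy is hypothetical; it returns an iterator to the first appended element so that the copy can be erased again later:

```cpp
#include <algorithm>
#include <cassert>   // only needed for the checks in the usage note
#include <iterator>
#include <list>

// Hypothetical helper: appends a copy of the list to itself and returns
// an iterator to the first appended element (end() for an empty list).
std::list<int>::iterator append_copy(std::list<int>& values) {
    if (values.empty()) {
        return values.end();
    }
    // Remember the last original element; list iterators stay valid
    // when elements are appended.
    const auto last_original = std::prev(values.end());
    // The element count is fixed before copying starts, so this terminates.
    std::copy_n(values.begin(), values.size(), std::back_inserter(values));
    return std::next(last_original);
}
```

For a list holding 1, 2, 3, 4, calling auto it = append_copy(mylist) gives a list of eight elements with *it == 1, and mylist.erase(it, mylist.end()) restores the original four.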

Which C++ STL container provides `extract_max()`, `find(element_value)` and `modify(element)` functionality?

I want to use a C++ STL container to implement Prim's algorithm. I need extract_max, find(element) and modify(element_value) functionality, but std::priority_queue only provides extract_max. Is there some other container that I can use? Obviously I want all of these to be as fast as possible.
Edit: The container should also provide functionality to modify the value of its element.
Put your elements in a std::set<T, std::greater<T>>, which keeps them sorted in descending order (it is typically implemented as a balanced binary search tree, not a heap).
Call *set::begin() to get the max element in O(1); begin() is a constant-time operation for all standard containers.
Use set::find to perform a search in O(log(n)).
To modify an element, you must unfortunately remove it from the set and then insert the modified version. (This also applies to make_heap and friends.) There could be a container where this is not necessary, but (A) you'd have to be paranoid about which members are used for comparison vs. equality, and (B) the difference in speed is very small. So there is no common container that works that way.
If elements are not unique under the ordering, use std::multiset instead, which is otherwise identical.
Example:
#include <iostream>
#include <set>
int main()
{
std::set<int, std::greater<int>> v { 3, 1, 4, 1, 5, 9 };
std::cout << "initially, v: ";
for (auto i : v) std::cout << i << ' ';
std::cout << '\n';
auto largest = *v.begin();
v.erase(v.begin());
std::cout << "largest element: " << largest << '\n';
std::cout << "after removing the largest element, v: ";
for (auto i : v) std::cout << i << ' ';
std::cout << '\n';
}
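For the "modify" part, the erase-and-reinsert idea described above could be sketched as follows. This assumes the elements are (key, vertex) pairs and that the caller keeps track of each vertex's current key elsewhere (for example in an array), since std::set::find needs the full old value. Entry, MaxQueue, extract_max and change_key are hypothetical names:

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// (key, vertex) pairs, ordered with the largest key first.
using Entry = std::pair<int, int>;
using MaxQueue = std::set<Entry, std::greater<Entry>>;

// Removes and returns the entry with the largest key.
// Precondition: the queue is not empty.
Entry extract_max(MaxQueue& queue) {
    assert(!queue.empty());
    const Entry top = *queue.begin();
    queue.erase(queue.begin());
    return top;
}

// "Modifies" an entry by erasing the old value and inserting the new one.
// Returns false if (old_key, vertex) is not in the queue.
bool change_key(MaxQueue& queue, int vertex, int old_key, int new_key) {
    const auto it = queue.find(Entry{old_key, vertex});
    if (it == queue.end()) {
        return false;
    }
    queue.erase(it);
    queue.insert(Entry{new_key, vertex});
    return true;
}
```

Both the lookup and the reinsertion are O(log(n)), so a key change costs O(log(n)) overall.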

Does g++'s std::list::sort invalidate iterators?

According to SGI, cplusplus.com, and every other source I've got, the sort() member function of the std::list should not invalidate iterators. However, that doesn't seem to be the case when I run this code (c++11):
#include <list>
#include <chrono>
#include <functional> // std::bind
#include <random>
#include <iostream>
#include "print.hpp"
unsigned int seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::uniform_int_distribution<unsigned int> distribution(1, 1000000000);
auto rng = std::bind(distribution, generator);
// C++11 RNG stuff. Basically, rng() now gives some unsigned int [1, 1000000000]
int main() {
unsigned int values(0);
std::cin >> values; // Determine the size of the list
std::list<unsigned int> c;
for (unsigned int n(0); n < values; ++n) {
c.push_front(rng());
}
auto c0(c);
auto it(c.begin()), it0(c0.begin());
for (unsigned int n(0); n < 7; ++n) {
++it; // Offset these iterators so I can print 7 values
++it0;
}
std::cout << "With seed: " << seed << "\n";
std::cout << "Unsorted list: \n";
print(c.begin(), c.end()) << "\n";
print(c.begin(), it) << "\n\n";
auto t0 = std::chrono::steady_clock::now();
c0.sort();
auto d0 = std::chrono::steady_clock::now() - t0;
std::cout << "Sorted list: \n";
print(c0.begin(), c0.end()) << "\n";
print(c0.begin(), it0) << "\n"; // My own print function, given further below
std::cout << "Seconds: " << std::chrono::duration<double>(d0).count() << std::endl;
return 0;
}
In print.hpp:
#include <iostream>
template<class InputIterator>
std::ostream& print(InputIterator begin, const InputIterator& end,
std::ostream& out = std::cout) {
bool first(true);
out << "{";
for (; begin != end; ++begin) {
if (first) {
out << (*begin);
first = false;
} else {
out << ", " << (*begin);
}
}
out << "}";
return out;
}
Sample input/output:
11
With seed: 3454921017
Unsorted list:
{625860546, 672762972, 319409064, 8707580, 317964049, 762505303, 756270868, 249266563, 224065083, 843444019, 523600743}
{625860546, 672762972, 319409064, 8707580, 317964049, 762505303, 756270868}
Sorted list:
{8707580, 224065083, 249266563, 317964049, 319409064, 523600743, 625860546, 672762972, 756270868, 762505303, 843444019}
{8707580, 224065083}
Seconds: 2.7e-05
Everything works as expected, except for the printing. It is supposed to show 7 elements, but instead the actual number is fairly haphazard, provided "values" is set to more than 7. Sometimes it gives none, sometimes it gives 1, sometimes 10, sometimes 7, etc.
So, is there something observably wrong with my code, or does this indicate that g++'s std::list (and std::forward_list) is not standards conforming?
Thanks in advance!
The iterators remain valid and still refer to the same elements of the list, which have been re-ordered.
So I don't think your code does what you think it does. It prints the list from the beginning, to wherever the 7th element ended up after the list was sorted. The number of elements it prints therefore depends on the values in the list, of course.
Consider the following code:
#include <list>
#include <iostream>
int main() {
std::list<int> l;
l.push_back(1);
l.push_back(0);
std::cout << (void*)(&*l.begin()) << "\n";
l.sort();
std::cout << (void*)(&*l.begin()) << "\n";
}
The two addresses printed differ, showing that (unlike std::sort), std::list::sort has sorted by changing the links between the elements, not by assigning new values to the elements.
I've always assumed that this is mandated (likewise for reverse()). I can't actually find explicit text to say so, but if you look at the description of merge, and consider that the reason for list::sort to exist is presumably because mergesort works nicely with lists, then I think it's "obviously" intended. merge says, "Pointers and references to the moved elements of x now refer to those same elements but as members of *this" (23.3.5.5/23), and the start of the section that includes merge and sort says, "Since lists allow fast insertion and erasing from the middle of a list, certain operations are provided specifically for them" (23.3.5.5/1).
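The answer's first point, that an iterator keeps referring to the same element while that element changes position, can be checked with a small sketch. Tracked and track_through_sort are names invented for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical record of what an iterator refers to before and after sorting.
struct Tracked {
    int value_before;
    int value_after;
    std::size_t index_before;
    std::size_t index_after;
};

// Takes the list by value so the caller's list is left untouched.
// Precondition: offset < values.size().
Tracked track_through_sort(std::list<int> values, std::size_t offset) {
    assert(offset < values.size());
    const auto it = std::next(
        values.begin(), static_cast<std::list<int>::difference_type>(offset));
    Tracked t{};
    t.value_before = *it;
    t.index_before = static_cast<std::size_t>(std::distance(values.begin(), it));
    values.sort();  // relinks the nodes; it still refers to the same element
    t.value_after = *it;
    t.index_after = static_cast<std::size_t>(std::distance(values.begin(), it));
    return t;
}
```

For the list 5, 1, 4, 2, 3 and offset 2, the iterator refers to the value 4 both before and after the sort, while its distance from begin() changes from 2 to 3. This matches the question's output, where the number of printed elements depends on where the tracked element ends up.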