How to sort only subset in std::vector?

I have a vector of pairs:
std::vector<std::pair<std::string, Cell::Ptr>> mCells;
I want to sort only a subset of the elements, by the string in first. Cell has a method GetSorted() which indicates whether an element is part of this subset or not.
This is what I had initially:
std::sort(mCells.begin(), mCells.end(),
[](std::pair<std::string, Cell::Ptr> const &a,
std::pair<std::string, Cell::Ptr> const &b)
{
// Only compare when both cells need to be sorted; otherwise return false
// to indicate that they are already in correct order. This keeps the
// non-marked cells at their original positions.
if (a.second->GetSorted() && b.second->GetSorted())
{
return a.first < b.first;
}
else
{
return false;
}
});
But it does not work, because sort, of course, does not compare all combinations. Sometimes the return a.first < b.first line is not even executed once.
To define the required sort function, here's an example. Suppose the elements are:
G* F C* A* D B E*
Only the *-ones need to be sorted. But, the sort should only be applied to adjacent to-be-sorted elements. (That's why I had a.second->GetSorted() && b.second->GetSorted().) The result should then be:
G* F A* C* D B E*
So, only A and C are adjacent, and are sorted. Is there an easy solution to this problem?
Alternatively a solution that results in:
A* F C* E* D B G*
would also be usable for me at the moment. So, sorting all * elements, while leaving the others where they are. This appears to be easier to do.

You need to separate finding the ranges to be sorted and sorting them:
using namespace std;
auto isSorted = [](std::pair<std::string, Cell::Ptr> const &a) {
return a.second->GetSorted();
};
auto it = begin(mCells);
const auto itEnd = end(mCells);
while (it != itEnd) {
auto rangeStart = find_if(it, itEnd, isSorted);
if (rangeStart == itEnd)
break;
auto rangeEnd = find_if_not(rangeStart, itEnd, isSorted);
if (distance(rangeStart, rangeEnd) > 1) {
// pair comparison should do the trick here
sort(rangeStart, rangeEnd);
}
it = rangeEnd;
}
Just saw your edit: you can achieve the alternate solution by defining a custom iterator class that skips non-sorted elements, then using a single sort() call on the whole "range". Note that std::sort requires random-access iterators, so a plain input iterator is not enough.
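A simpler route to the alternate result (sort all marked elements, leave the others where they are) is to pull the marked elements out, sort them, and write them back into the same slots. The following is only a minimal sketch: the Cell struct here is a hypothetical stand-in for the question's class, and Entry and sortMarkedInPlace are names made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Cell class.
struct Cell {
    using Ptr = std::shared_ptr<Cell>;
    explicit Cell(bool sorted) : sorted_(sorted) {}
    bool GetSorted() const { return sorted_; }
private:
    bool sorted_;
};

using Entry = std::pair<std::string, Cell::Ptr>;

// Sort only the marked entries by name; unmarked entries keep their positions.
void sortMarkedInPlace(std::vector<Entry>& cells) {
    // 1. Remember which slots hold marked entries.
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < cells.size(); ++i)
        if (cells[i].second->GetSorted()) slots.push_back(i);

    // 2. Move the marked entries out and sort them by the string key.
    std::vector<Entry> marked;
    marked.reserve(slots.size());
    for (std::size_t i : slots) marked.push_back(std::move(cells[i]));
    std::sort(marked.begin(), marked.end(),
              [](Entry const& a, Entry const& b) { return a.first < b.first; });

    // 3. Write them back into the same slots, now in sorted order.
    for (std::size_t k = 0; k < slots.size(); ++k)
        cells[slots[k]] = std::move(marked[k]);
}
```

This costs one temporary vector the size of the marked subset, but it avoids writing a custom iterator.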

Related

map comparator for sorting map elements

I wanted the elements of the map to be arranged in a specific sequence(shortest first).
So i wrote a simple comparator which compares the length of the elements being inserted in the map with the previous element(s).
struct cmpByStringLength {
bool operator()(const std::string& a, const std::string& b) const {
return a.length() < b.length();
}
};
int main()
{
map<string,string,cmpByStringLength> obj1;
obj1.insert(make_pair("Anurag","Last"));
obj1.insert(make_pair("Second","Last"));
for(map<string,string>::iterator it=obj1.begin();it!= obj1.end();++it)
{
cout<<it->first;
cout<<endl;
}
return 0;
}
But the above won't insert the element with key Second, since the comparator compares the keys Second and Anurag, finds that they have equal length, and so treats them as the same key. However, the following works fine:
obj1.insert(make_pair("abc","Last"));
obj1.insert(make_pair("abcdefg","Last"));
obj1.insert(make_pair("abcd","Last"));
It turns out my understanding of custom comparators for a map was wrong: the comparator seems to be used to decide whether to insert an element, not to place it according to the sort logic I provided.
In other words, is it correct to say that a custom comparator only decides whether an element is inserted into the map at all, and not where the element is placed?

C++ documentation on map (and really anything using a std::less-style comparator) makes it quite clear that two elements a, b are equivalent iff !comp(a, b) && !comp(b, a) (see, for example, https://en.cppreference.com/w/cpp/container/map). This means that, yes, your comparator gets used for ordering and equivalence testing.
The way you'd usually fix this is to implement a two-level comparison, e.g.
return (a.length() == b.length()) ? (a < b) : (a.length() < b.length());
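As a minimal sketch of that two-level comparison in context (the names ByLengthThenAlpha and makeExample are made up for illustration), keys of equal length are no longer equivalent unless they are the same string:

```cpp
#include <map>
#include <string>
#include <utility>

// Order keys by length first, then alphabetically among keys of equal length.
struct ByLengthThenAlpha {
    bool operator()(const std::string& a, const std::string& b) const {
        return (a.length() == b.length()) ? (a < b) : (a.length() < b.length());
    }
};

// Hypothetical helper that builds a map like the one in the question.
std::map<std::string, std::string, ByLengthThenAlpha> makeExample() {
    std::map<std::string, std::string, ByLengthThenAlpha> m;
    m.insert(std::make_pair("Second", "Last"));
    m.insert(std::make_pair("Anurag", "Last"));  // same length as "Second", still inserted
    m.insert(std::make_pair("abc", "Last"));
    return m;
}
```

Iterating the result visits abc, Anurag, Second: shortest first, and alphabetical within the same length.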

Using std::multimap can solve your problem. But I don't know the size of your data, so I can't comment on how it will affect the performance of your application.
#include <iostream>
#include <map>
using namespace std;
struct cmpByStringLength {
bool operator()(const std::string &a, const std::string &b) const {
return a.length() < b.length();
}
};
int main() {
multimap<string, string, cmpByStringLength> obj1;
obj1.insert(make_pair("Second1", "Last"));
obj1.insert(make_pair("Anurag", "Last"));
obj1.insert(make_pair("Second", "Last"));
obj1.insert(make_pair("Secon", "Last"));
for (auto it = obj1.begin(); it != obj1.end(); ++it) {
cout << it->first;
cout << endl;
}
return 0;
}
Output
Secon
Anurag
Second
Second1

Comparator needed for an associative container

I need a std::set<std::pair<std::string, int>, Compare> that compares two pairs according to their int values (in reverse order), if their int values are the same then according to their string values (in same order), but if their strings are equal then the two pairs are considered equal (regardless of their int values). So the Compare class I came up with is:
struct Compare {
bool operator()(const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) const {
if (a.first == b.first)
return false;
if (a.second > b.second)
return true;
if (a.second < b.second)
return false;
return a.first < b.first;
}
};
The test
std::set<std::pair<std::string, int>, Compare> s;
s.insert({"Apple", 3});
s.insert({"Apple", 5});
works fine (inserting only the first pair). But
int main() {
std::set<std::pair<std::string, int>, Compare> s;
s.insert({"Ai", 14});
s.insert({"Am", 14});
s.insert({"F", 5});
s.insert({"Apple", 3});
s.insert({"Apple", 5});
}
shows both {"Apple", 3} and {"Apple", 5} being inserted, and I can't figure out why. What is the logical error in my Compare class? What is it supposed to be instead? I considered using std::map<std::string, int, Compare> but in this case the comparator could only use the key type std::string, which won't suffice for my specs.
I also tried:
bool operator()(const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) const {
if (a.first < b.first || a.first > b.first) {
if (a.second > b.second)
return true;
if (a.second < b.second)
return false;
return a.first < b.first;
}
return false;
}
and it still does not give the results I want.

After examining your requirements, I came to the conclusion that your criteria for comparing objects do not meet the requirements of a strict weak ordering.
Say you insert the following objects to the set:
std::pair<std::string, int> obj1 = {"F", 5};
std::pair<std::string, int> obj2 = {"Apple", 3};
std::pair<std::string, int> obj3 = {"Apple", 5};
s.insert(obj1);
s.insert(obj2);
s.insert(obj3);
obj1 gets added since there is nothing else in the set to compare with. obj2 also gets added since it compares unequal to obj1. However, since obj1.second > obj2.second, the order of the objects in the set is:
obj1
obj2
Now, we come to insert obj3. obj3 < obj1 evaluates to true. Hence, it gets inserted before obj1. The logic for inserting an item into the set is such that obj3 never gets compared with obj2. Consequently, you end up with:
obj3
obj1
obj2

This is not how Compare in a std::set works. It is meant to provide an order from smallest to biggest. With your set you are trying to make two different kinds of comparisons.
You can order by int value first and by string value second. No problem.
But two elements in a set are considered equal if neither of them compares smaller than the other.
In your first example the two elements happen to be next to each other, so the comparison function is used on them, your a.first == b.first case triggers, neither of them appears smaller than the other, and they are considered equal.
In your second attempt, by the time you insert "Apple", 5 your set looks like this:
Ai 14
Am 14
F 5
Apple 3
Here Apple 5 sorts after Am 14 and before F 5, so it is never compared with Apple 3 at all; it is simply inserted between those two elements. Since the std::set is expected to be in sorted order already, the elements beyond them are irrelevant as far as the Compare is concerned.
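One way to get both behaviours is to keep them apart: a valid ordering for the set, and a separate lookup that enforces uniqueness by name. The following is only a sketch under that assumption; RankedSet, Item, and ByCountDescThenName are hypothetical names, not anything from the question.

```cpp
#include <set>
#include <string>
#include <utility>

using Item = std::pair<std::string, int>;

// A valid strict weak ordering: larger int first, then name ascending.
struct ByCountDescThenName {
    bool operator()(const Item& a, const Item& b) const {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    }
};

// Hypothetical wrapper: names_ enforces uniqueness, ordered_ keeps the order.
class RankedSet {
public:
    // Returns false (and inserts nothing) if the name is already present.
    bool insert(const Item& item) {
        if (!names_.insert(item.first).second) return false;
        ordered_.insert(item);
        return true;
    }
    const std::set<Item, ByCountDescThenName>& items() const { return ordered_; }
private:
    std::set<std::string> names_;
    std::set<Item, ByCountDescThenName> ordered_;
};
```

With the five inserts from the question, {"Apple", 5} is rejected because the name Apple is already present, regardless of where {"Apple", 3} sits in the ordering.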

The problem is the first comparison, it's just wrong. Remove it and it works.

Sort when only equality is available

Suppose we have a vector of pairs:
std::vector<std::pair<A,B>> v;
where for type A only equality is defined:
bool operator==(A const & lhs, A const & rhs) { ... }
How would you sort it that all pairs with the same first element will end up close? To be clear, the output I hope to achieve should be the same as does something like this:
std::unordered_multimap<A,B> m(v.begin(),v.end());
std::copy(m.begin(),m.end(),v.begin());
However I would like, if possible, to:
Do the sorting in place.
Avoid the need to define a hash function for equality.
Edit: additional concrete information.
In my case the number of elements isn't particularly big (I expect N = 10~1000), though I have to repeat this sorting many times ( ~400) as part of a bigger algorithm, and the datatype known as A is pretty big (it contains among other things an unordered_map with ~20 std::pair<uint32_t,uint32_t> in it, which is the structure preventing me to invent an ordering, and making it hard to build a hash function)

First option: cluster() and sort_within()
The handwritten double loop by @MadScienceDreams can be written as a cluster() algorithm of O(N * K) complexity with N elements and K clusters. It repeatedly calls std::partition (using C++14 style with generic lambdas, easily adaptable to C++11, or even C++98 style by writing your own function objects):
template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
{
for (auto it = first; it != last; /* increment inside loop */)
it = std::partition(it, last, [=](auto const& elem){
return eq(elem, *it);
});
}
which you call on your input vector<std::pair> as
cluster(begin(v), end(v), [](auto const& L, auto const& R){
return L.first == R.first;
});
The next algorithm to write is sort_within which takes two predicates: an equality and a comparison function object, and repeatedly calls std::find_if_not to find the end of the current range, followed by std::sort to sort within that range:
template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
{
for (auto it = first; it != last; /* increment inside loop */) {
auto next = std::find_if_not(it, last, [=](auto const& elem){
return eq(elem, *it);
});
std::sort(it, next, cmp);
it = next;
}
}
On an already clustered input, you can call it as:
sort_within(begin(v), end(v),
[](auto const& L, auto const& R){ return L.first == R.first; },
[](auto const& L, auto const& R){ return L.second < R.second; }
);
Live Example that shows it for some real data using std::pair<int, int>.
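For reference, the two pieces can be put together roughly as follows. This is a self-contained sketch that repeats the two templates from above so it compiles on its own (C++14 or later); clusterAndSort is a hypothetical driver name, and the relative order of the clusters depends on the std::partition implementation.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{}) {
    for (auto it = first; it != last; /* advanced by partition */)
        it = std::partition(it, last, [=](auto const& elem) { return eq(elem, *it); });
}

template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{}) {
    for (auto it = first; it != last; /* advanced below */) {
        auto next = std::find_if_not(it, last, [=](auto const& elem) { return eq(elem, *it); });
        std::sort(it, next, cmp);
        it = next;
    }
}

// Hypothetical driver: group by .first, then order each group by .second.
std::vector<std::pair<int, int>> clusterAndSort(std::vector<std::pair<int, int>> v) {
    auto sameKey  = [](auto const& L, auto const& R) { return L.first == R.first; };
    auto bySecond = [](auto const& L, auto const& R) { return L.second < R.second; };
    cluster(v.begin(), v.end(), sameKey);
    sort_within(v.begin(), v.end(), sameKey, bySecond);
    return v;
}
```

The cluster containing the first input element always ends up at the front, because std::partition leaves elements that satisfy the predicate at the start of the range.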
Second option: user-defined comparison
Even if there is no operator< defined on A, you might define it yourself. Here, there are two broad options. First, if A is hashable, you can define
bool operator<(A const& L, A const& R)
{
return std::hash<A>()(L) < std::hash<A>()(R);
}
and write std::sort(begin(v), end(v)) directly. You will have O(N log N) calls to std::hash if you don't want to cache all the unique hash values in a separate storage.
Second, if A is not hashable, but does have data member getters x(), y() and z(), that uniquely determine equality on A: you can do
bool operator<(A const& L, A const& R)
{
return std::tie(L.x(), L.y(), L.z()) < std::tie(R.x(), R.y(), R.z());
}
Again you can write std::sort(begin(v), end(v)) directly.

If you can come up with a function that assigns a unique number to each unique element, then you can build a secondary array of these numbers and sort it, applying the same moves to the primary array, for example with merge sort.
In this case you need a function that assigns a unique number to each unique element, i.e. a hash function without collisions. I think this should not be a problem.
As for the asymptotics: if the hash function is O(1), then building the secondary array is O(N) and sorting it together with the primary one is O(N log N), for a total of O(N + N log N) = O(N log N).
The downside of this solution is that it requires double the memory.
In short, the point of this solution is to quickly translate your elements into values that you can compare quickly.
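If no collision-free hash is available, the unique numbers can also be assigned using nothing but operator==, at the price of O(N * K) comparisons for K distinct keys. The sketch below illustrates that variant, not the O(N log N) scheme described above; groupByFirst is a hypothetical name, and the group number is simply the order in which each distinct key first appears.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Group pairs by .first using only operator== on the key type.
template<class A, class B>
void groupByFirst(std::vector<std::pair<A, B>>& v) {
    std::vector<std::size_t> reps;  // index of the first occurrence of each distinct key
    std::vector<std::pair<std::size_t, std::size_t>> tagged;  // (group number, original index)
    tagged.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Linear search over the distinct keys seen so far.
        auto it = std::find_if(reps.begin(), reps.end(),
                               [&](std::size_t r) { return v[r].first == v[i].first; });
        if (it == reps.end()) it = reps.insert(reps.end(), i);
        tagged.emplace_back(static_cast<std::size_t>(it - reps.begin()), i);
    }
    // Sorting by (group number, original index) keeps the original order within a group.
    std::sort(tagged.begin(), tagged.end());

    // Rebuild the primary array in the new order (this is the extra memory).
    std::vector<std::pair<A, B>> out;
    out.reserve(v.size());
    for (auto const& t : tagged) out.push_back(std::move(v[t.second]));
    v = std::move(out);
}
```

Only the small (group number, index) pairs are compared and swapped during the sort; each large element is moved once at the end.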

An in place algorithm is
for (int i = 0; i < n-2; i++)
{
for (int j = i+2; j < n; j++)
{
if (v[j].first == v[i].first)
{
std::swap(v[j],v[i+1]);
i++;
}
}
}
There is probably a more elegant way to write the loop, but this is O(n*m), where n is the number of elements and m is the number of keys. So if m is much smaller than n (with a best case being that all the keys are the same), this can be approximated by O(n). Worst case, the number of key ~= n, so this is O(n^2). I have no idea what you expect for the number of keys, so I can't really do the average case, but it is most likely O(n^2) for the average case as well.
For a small number of keys, this may work faster than unordered multimap, but you'll have to measure to find out.
Note: the order of clusters is completely random.
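Edit: (much more efficient in the partially-clustered case, doesn't change complexity; it also covers the case where v[i+1] already equals v[i], which the inner loop above never looks at)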
for (int i = 0; i < n-2; i++)
{
for(;i<n-2 && v[i+1].first==v[i].first; i++){}
for (int j = i+2; j < n; j++)
{
if (v[j].first == v[i].first)
{
std::swap(v[j],v[i+1]);
i++;
}
}
}
Edit 2: At /u/MrPisarik's comment, removed redundant i check in inner loop.

I'm surprised no one has suggested the use of std::partition yet. It makes the solution nice, elegant, and generic:
template<typename BidirIt, typename BinaryPredicate>
void equivalence_partition(BidirIt first, BidirIt last, BinaryPredicate p) {
using element_type = typename std::decay<decltype(*first)>::type;
if(first == last) {
return;
}
auto new_first = std::partition
(first, last, [=](element_type const &rhs) { return p(*first, rhs); });
equivalence_partition(new_first, last, p);
}
template<typename BidirIt>
void equivalence_partition(BidirIt first, BidirIt last) {
using element_type = typename std::decay<decltype(*first)>::type;
equivalence_partition(first, last, std::equal_to<element_type>());
}
Example here.

find the difference between two sets of pointers to the same object

How can i find the difference between two sets of pointers to the same object?
Is there an efficient way without iterating through all the objects of both sets.
i have two of these sets:
std::set<Object*>
If an object private member(name) is the same as the other objects name that means that the object is the same.

STL's algorithm library is awesome, extensible, and underused.
This will give you the set difference as a vector (I suppose you could convert that to a set, but there's no need, at least for what you asked, and a vector is faster since the sets are already sorted).
template<typename T>
std::vector<T> set_diff(std::set<T> const &a, std::set<T> const &b) {
std::vector<T> v;
// v starts empty, so append through std::back_inserter (needs <iterator>).
std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(v));
return v;
}
Optionally, put after the constructor
v.reserve(a.size() + b.size());
and before the return (C++11)
v.shrink_to_fit();
Note: This yields the items in a that are not in b. To find all items in one of the two but not the other, use std::set_symmetric_difference instead.
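Since the question defines sameness by the object's name rather than by pointer value, the sets would have to be ordered by name for std::set_difference to compare what matters. A minimal sketch, assuming a hypothetical Object with a name() getter and sets declared with a ByName comparator (neither is given in the question):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Object class.
class Object {
public:
    explicit Object(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Orders pointers by the pointed-to name instead of by address.
struct ByName {
    bool operator()(const Object* a, const Object* b) const {
        return a->name() < b->name();
    }
};

using ObjectSet = std::set<Object*, ByName>;

// Objects in a whose names do not occur in b.
std::vector<Object*> differenceByName(const ObjectSet& a, const ObjectSet& b) {
    std::vector<Object*> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out), ByName{});
    return out;
}
```

The same comparator must be used for both sets and for the algorithm; with plain std::set<Object*> the elements are ordered by address and two distinct objects with the same name would not be matched.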

I think what you mean by difference is finding the pointer elements that appear in only one set. The most efficient way is to iterate over the two sets in lockstep, which costs only O(n+m) time, where n and m denote the sizes of the two sets; in the general case this is the lower bound for the problem.
Luckily, std::set is typically implemented as a balanced binary search tree, so we can iterate over all elements in order in linear time, and O(n+m) can be achieved.
template<typename T>
std::vector<T> set_diff(std::set<T> const &a, std::set<T> const &b) {
std::vector<T> v;
auto ita = a.begin();
auto itb = b.begin();
while (ita != a.end() && itb != b.end()) {
if (*ita == *itb) {
++ita, ++itb;
} else if (*ita < *itb) {
v.push_back(*ita);
++ita;
} else {
v.push_back(*itb);
++itb;
}
}
for (; ita != a.end(); v.push_back(*ita), ++ita);
for (; itb != b.end(); v.push_back(*itb), ++itb);
return v;
}

What could be reason it crashes when I use vector::erase?

I am trying to do some operations on a vector, calling erase only in some cases.
Here is my code:
while(myQueue.size() != 1)
{
vector<pair<int,int>>::iterator itr = myQueue.begin();
while(itr != myQueue.end())
{
if(itr->first%2 != 0)
myQueue.erase(itr);
else
{
itr->second = itr->second/2;
itr++;
}
}
}
I am getting a crash in the 2nd iteration, with the message "vector iterator incompatible".
What could be the reason of crash?

If erase() is called the iterator is invalidated and that iterator is then accessed on the next iteration of the loop. std::vector::erase() returns the next iterator after the erased iterator:
itr = myQueue.erase(itr);

Given an iterator range [b, e), where b is the beginning and e is one past the end of the range, an erase operation on a vector at an iterator i somewhere in the range invalidates all iterators from i up to e. This is why you need to be very careful when calling erase. The erase member returns a new iterator which you can use for subsequent operations, and you ought to use it:
itr = myQueue.erase( itr );
Another way would be to swap the element at i with the last element and then erase the last. This is more efficient since no elements beyond i have to be shifted, but it does not preserve the order of the remaining elements.
std::swap( *i, myQueue.back() );
myQueue.pop_back();
Also, from the looks of it, why are you using vector? If you need a queue you might as well use std::queue.
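The swap-and-pop idea can be wrapped in a small helper. This is a sketch only; unorderedErase is a made-up name, and pos is assumed to point at a valid element of v.

```cpp
#include <utility>
#include <vector>

// Removes the element at pos in O(1) by overwriting it with the last element.
// The relative order of the remaining elements is not preserved.
template<class T>
void unorderedErase(std::vector<T>& v, typename std::vector<T>::iterator pos) {
    if (pos != v.end() - 1)          // avoid self-move when pos is already the last element
        *pos = std::move(v.back());
    v.pop_back();
}
```

When this is used inside a loop, the position that was just overwritten holds a new element and has to be examined again before advancing.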

That is undefined behavior. In particular, once you erase an iterator, it becomes invalid and you can no longer use it for anything. The idiomatic way of writing the loop would be something like:
for ( auto it = v.begin(); it != v.end(); ) {
if ( it->first % 2 != 0 )
it = v.erase(it);
else {
it->second /= 2;
++it;
}
}
But then again, it will be more efficient and idiomatic not to roll your own loop and rather use the algorithms:
v.erase( std::remove_if( v.begin(),
v.end(),
[]( std::pair<int,int> const & p ) {
return p.first % 2 != 0;
}),
v.end() );
std::transform( v.begin(), v.end(), v.begin(),
[]( std::pair<int,int> const & p ) {
return std::make_pair(p.first, p.second/2);
} );
The advantage of this approach is that there is a lesser number of copies of the elements while erasing (each valid element left in the range will have been copied no more than once), and it is harder to get it wrong (i.e. misuse an invalidated iterator...) The disadvantage is that there is no remove_if_and_transform so this is a two pass algorithm, which might be less efficient if there is a large number of elements.

Modifying a container while iterating over it is generally tricky.
Therefore, there is a specific C++ idiom usable with non-associative sequences: the erase-remove idiom.
It combines the use of the remove_if algorithm with the range overload of the erase method:
myQueue.erase(
std::remove_if(myQueue.begin(), myQueue.end(), /* predicate */),
myQueue.end());
where the predicate is expressed either as a typical functor object or using the new C++11 lambda syntax.
// Functor
struct OddKey {
bool operator()(std::pair<int, int> const& p) const {
return p.first % 2 != 0;
}
};
/* predicate */ = OddKey()
// Lambda
/* predicate */ = [](std::pair<int, int> const& p) { return p.first % 2 != 0; }
The lambda form is more concise but may be less self-documenting (no name) and is only available in C++11 and later. Depending on your tastes and constraints, pick the one that suits you most.
It is possible to elevate your way of writing code: use Boost.Range.
typedef std::vector< std::pair<int, int> > PairVector;
void pass(PairVector& pv) {
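// filtered() keeps the elements for which the predicate is true, so test for even keys.
auto const keep = [](std::pair<int, int> const& p) {
return p.first % 2 == 0;
};
auto const transformer = [](std::pair<int, int> const& p) {
return std::make_pair(p.first, p.second / 2);
};
// Write the halved survivors over the front of pv, then erase the leftover tail.
pv.erase(
boost::transform(pv | boost::adaptors::filtered( keep ),
pv.begin(),
transformer),
pv.end()
);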
}
You can find transform and the filtered adaptor in the documentation, along with many others.
