Standard library partition algorithm - c++

I wrote this partition function:
template <class I, class P> I partition(I beg, I end, P p)
{
I first = beg;
while(beg != end) {
if(!p(*beg))
beg++;
else {
// if(beg != first) - EDIT: add conditional to prevent swapping identical elements
std::swap(*beg, *first);
first++;
beg++;
}
}
return first;
}
I've tested it with a few inputs and I haven't found anything wrong with it.
The standard library partition function is equivalent to:
template <class BidirectionalIterator, class UnaryPredicate>
BidirectionalIterator partition (BidirectionalIterator first,
BidirectionalIterator last, UnaryPredicate pred)
{
while (first!=last) {
while (pred(*first)) {
++first;
if (first==last) return first;
}
do {
--last;
if (first==last) return first;
} while (!pred(*last));
swap (*first,*last);
++first;
}
return first;
}
The latter seems much more complicated and has nested loops. Is there something wrong with my version? If not, why the more complicated version?
Here is some output using the following predicate:
bool greaterthantwo(double val)
{
return val > 2;
}
MAIN
std::vector<double> test{1,2,3,4,2,5,6,7,4,8,2,4,10};
std::vector<double>::iterator part = ::partition(test.begin(), test.end(), greaterthantwo);
for(const auto &ref:test)
std::cout << ref << " ";
std::cout << std::endl;
for(auto it = part; it != test.end(); it++)
std::cout << *it << " ";
std::cout << std::endl;
OUTPUT
3 4 5 6 7 4 8 4 10 2 2 2 1
2 2 2 1

Your version is close to Nico Lomuto's partition. Such a partition works on ForwardIterators and is semi-stable (the first part is stable, which can be useful in some circumstances).
The version from the standard library implementation that you quoted is close to the partition described by C. A. R. Hoare in his paper "Quicksort". It works on BidirectionalIterators and does not imply any stability.
Let's compare them on the following case:
FTTTT
Forward partition will proceed like this:
FTTTT
TFTTT
TTFTT
TTTFT
TTTTF
resulting in a swap on each iteration except the first, while the Bidirectional partition will go through the following permutations:
FTTTT
TTTTF
resulting in only one swap for all iterations.
Moreover, in the general case the Bidirectional version will do at most N/2 swaps, while the Forward version can do up to ~N swaps.
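The swap counts above can be checked with a small instrumented sketch. The function names below are hypothetical and not part of any library. The forward version follows the question's loop (with the guard against self-swaps), and the bidirectional version follows the quoted standard-library-style loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Forward (Lomuto-style) partition, instrumented to count swaps.
template <class I, class P>
std::size_t forward_partition_swaps(I first, I last, P pred) {
    std::size_t swaps = 0;
    I boundary = first;                 // end of the "true" prefix built so far
    for (; first != last; ++first) {
        if (pred(*first)) {
            if (first != boundary) {    // skip swapping an element with itself
                std::iter_swap(first, boundary);
                ++swaps;
            }
            ++boundary;
        }
    }
    return swaps;
}

// Bidirectional (Hoare-style) partition, instrumented the same way.
template <class I, class P>
std::size_t bidirectional_partition_swaps(I first, I last, P pred) {
    std::size_t swaps = 0;
    while (first != last) {
        while (pred(*first)) {          // skip elements already in place on the left
            if (++first == last) return swaps;
        }
        do {                            // skip elements already in place on the right
            if (first == --last) return swaps;
        } while (!pred(*last));
        std::iter_swap(first, last);
        ++swaps;
        ++first;
    }
    return swaps;
}

// Convenience wrappers (hypothetical) that work on a copy; nonzero means "T".
inline std::size_t forward_swaps(std::vector<int> v) {
    return forward_partition_swaps(v.begin(), v.end(),
                                   [](int x) { return x != 0; });
}
inline std::size_t bidirectional_swaps(std::vector<int> v) {
    return bidirectional_partition_swaps(v.begin(), v.end(),
                                         [](int x) { return x != 0; });
}
```

With the FTTTT case encoded as {0, 1, 1, 1, 1}, forward_swaps reports 4 swaps and bidirectional_swaps reports 1, which matches the two traces.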
std::partition in C++98/03 works on BidirectionalIterators, but C++11 relaxed the requirement to ForwardIterators (though it doesn't have to be semi-stable). The complexity requirements are:
Complexity: If ForwardIterator meets the requirements for a BidirectionalIterator, at most (last - first) / 2 swaps are done; otherwise at most last - first swaps are done. Exactly last - first applications of the predicate are done.
As you can see, standard library implementations will most likely use Lomuto's partition for ForwardIterators and Hoare's partition for BidirectionalIterators.
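One way such a choice could be made is tag dispatch on the iterator category. The following is only a sketch under that assumption, not how any particular standard library does it; the namespace sketch and the helper partition_impl are hypothetical names.

```cpp
#include <algorithm>
#include <forward_list>
#include <iterator>
#include <list>

namespace sketch {

// Lomuto-style: needs only forward iteration.
template <class I, class P>
I partition_impl(I first, I last, P pred, std::forward_iterator_tag) {
    I boundary = first;
    for (; first != last; ++first) {
        if (pred(*first)) {
            if (first != boundary) std::iter_swap(first, boundary);
            ++boundary;
        }
    }
    return boundary;
}

// Hoare-style: needs --last, so it is selected for bidirectional
// (and, through tag inheritance, random access) iterators.
template <class I, class P>
I partition_impl(I first, I last, P pred, std::bidirectional_iterator_tag) {
    while (first != last) {
        while (pred(*first)) {
            if (++first == last) return first;
        }
        do {
            if (first == --last) return first;
        } while (!pred(*last));
        std::iter_swap(first, last);
        ++first;
    }
    return first;
}

// Entry point: picks the overload from the iterator's category tag.
template <class I, class P>
I partition(I first, I last, P pred) {
    using category = typename std::iterator_traits<I>::iterator_category;
    return partition_impl(first, last, pred, category{});
}

} // namespace sketch
```

Calling sketch::partition on a std::forward_list selects the forward overload, while a std::list or std::vector selects the bidirectional one, because the more derived tag converts to the closest base.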
Alexander Stepanov discusses the partition problem in his Notes on Programming and in Elements of Programming, co-authored with Paul McJones.
Demo that counts copy constructions and assignments for both iterator categories:
#include <initializer_list>
#include <forward_list>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
using namespace std;
int counter = 0;
struct T
{
int value;
T(int x = 0) : value(x) {}
T(const T &x)
{
++counter;
value = x.value;
}
T &operator=(const T &x)
{
++counter;
value = x.value;
return *this;
}
};
auto pred = [](const T &x){return x.value;};
template<typename Container>
void test()
{
Container l = {0, 1, 1, 1, 1};
counter = 0;
partition(begin(l), end(l), pred);
cout << "Moves count: " << counter << endl;
}
int main()
{
test<forward_list<T>>();
test<list<T>>();
}
Output is:
Moves count: 12
Moves count: 3
(swap is 3 moves)

Your function has a serious defect. It swaps each element that satisfies the predicate with itself if initial elements of the sequence satisfy the predicate.

From STL partition description
Complexity
Linear in the distance between first and last: Applies pred to each element, and possibly swaps some of them (if the iterator type is a bidirectional, at most half that many swaps, otherwise at most that many).
In your implementation you swap more.

Related

Construct chains of pairs of numbers with one common member

I need to construct a chain of pair of numbers where:
In each pair, the first one is smaller than the second
In order to form a chain between two consecutive nodes, they must have one number in common. In other words, the link (a,b) -- (c,d) can be made if and only if either a==c, b==c, a==d or b==d
A pair cannot be made of the same number. In other words, if (a,b) exists, then a!=b
This may look like a Longest increasing subsequence but I actually want to chain consecutive pairs that have one equal member.
Example:
Initial list (unordered):
(0,1)
(2,3)
(1,6)
(4,6)
(8,9)
(2,8)
Result:
----- chain #1
(0,1)
(1,6)
(4,6)
----- chain #2
(2,3)
(2,8)
(8,9)
I could do an algorithm that will iterate over the entire list for each cell (O(n^2)), but I want to make it faster and I have the flexibility of ordering my initial array in any way I want (std::set, std::map, std::unordered_map, etc.). My list is made of tens of thousands of pairs so I need an efficient solution in terms of processing time.
You can solve it in O(N log N) comparisons by maintaining two lists, one sorted with respect to first and the other sorted with respect to second.
The code has some duplication that I didn't bother to clean up yet.
#include <iostream>
#include <iterator>
#include <list>
#include <vector>
#include <algorithm>
#include <tuple>
#include <any>
struct pair_and_iter {
int first;
int second;
std::any other_iter;
};
struct compare_first {
bool operator()(int x,pair_and_iter p){ return x < p.first; }
bool operator()(pair_and_iter p, int x){ return p.first < x; }
};
struct compare_second {
bool operator()(int x,pair_and_iter p){ return x < p.second; }
bool operator()(pair_and_iter p, int x){ return p.second < x; }
};
template <typename Iter,typename Comp>
Iter my_find(Iter first,Iter last,int x, Comp comp) {
auto it = std::lower_bound(first,last,x,comp);
if (it != last && (!comp(x,*it) && !comp(*it,x))){
return it;
} else {
return last;
}
}
int main() {
std::list<pair_and_iter> a {{0,1},{2,3},{1,6},{4,6},{8,9},{2,8}};
std::list<pair_and_iter> b;
for (auto it = a.begin(); it != a.end(); ++it){
b.push_back({it->first,it->second,it});
it->other_iter = std::prev(b.end());
}
a.sort([](const auto& x,const auto& y){
return std::tie(x.first,x.second) < std::tie(y.first,y.second); });
b.sort([](const auto& x,const auto& y){
return std::tie(x.second,x.first) < std::tie(y.second,y.first); });
std::vector<std::vector<pair_and_iter>> result;
std::vector<pair_and_iter> current_result;
current_result.push_back(a.front());
auto current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
while (a.size() && b.size()) {
// look for an element with same first
auto it = my_find(a.begin(),a.end(),current->first,compare_first{});
if (it == a.end()) {
// look for element where current->second == elem.first
it = my_find(a.begin(),a.end(),current->second,compare_first{});
}
if (it != a.end()){
current_result.push_back(*it);
current = std::prev(current_result.end());
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
a.erase(it);
continue;
}
// look for element with current->first == elem.second
it = my_find(b.begin(),b.end(),current->first,compare_second{});
if (it == b.end()) {
// look for element with same second
it = my_find(b.begin(),b.end(),current->second,compare_second{});
}
if (it != b.end()) {
current_result.push_back(*it);
current = std::prev(current_result.end());
a.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
b.erase(it);
continue;
}
// no matching element found
result.push_back(current_result);
current_result.clear();
current_result.push_back(a.front());
current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
}
result.push_back(current_result);
for (const auto& chain : result){
for (const auto& elem : chain){
std::cout << elem.first << " " << elem.second << "\n";
}
std::cout << "\n";
}
}
Output:
0 1
1 6
4 6

2 3
2 8
8 9
I used std::list because it has stable iterators and constant-time erase, and std::any for type erasure because each list contains iterators into the other list. a is sorted with respect to first and b is sorted with respect to second, hence std::lower_bound can be used to find a match in O(log N) comparisons. (Note that std::list iterators are not random access, so std::lower_bound still advances the iterator a linear number of times; only the number of comparisons is logarithmic.) A single linear search is traded for two binary searches that look for current->first or current->second among the first members in a, and two binary searches that look for them among the second members in b. In total it is O(N log N) for sorting plus O(log(N) + log(N-1) + log(N-2) + ... + log(1)) comparisons for the searches, which equals O(log(N!)) and is therefore also O(N log N).
PS: You didn't mention that you are looking for a longest chain, and this algorithm is not finding the longest chain. It just picks the first element of the remaining ones and uses the next element it finds to continue the chain.
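As an alternative to the two sorted lists, the same greedy chaining can be sketched with a single index from each number to the pairs that contain it. This is a different technique from the code above, not a cleanup of it. The names Pair, build_chains and take are hypothetical. Index entries are erased lazily as they are visited, so each entry is erased at most once.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Greedily groups pairs into chains in which consecutive pairs share a member.
// Like the answer above, this does NOT look for the longest possible chain.
std::vector<std::vector<Pair>> build_chains(const std::vector<Pair>& pairs) {
    const std::size_t none = pairs.size();   // sentinel for "no pair found"

    // number -> indices of the pairs that contain that number
    std::multimap<int, std::size_t> index;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        index.emplace(pairs[i].first, i);
        index.emplace(pairs[i].second, i);
    }
    std::vector<bool> used(pairs.size(), false);

    // Returns the index of an unused pair containing key, or none.
    // Every entry it walks over is erased, including stale ones.
    auto take = [&](int key) -> std::size_t {
        auto range = index.equal_range(key);
        for (auto it = range.first; it != range.second;) {
            const std::size_t i = it->second;
            it = index.erase(it);
            if (!used[i]) return i;
        }
        return none;
    };

    std::vector<std::vector<Pair>> chains;
    for (std::size_t start = 0; start < pairs.size(); ++start) {
        if (used[start]) continue;           // already part of an earlier chain
        std::vector<Pair> chain;
        for (std::size_t cur = start; cur != none;) {
            used[cur] = true;
            chain.push_back(pairs[cur]);
            // Prefer a neighbour sharing first, then one sharing second.
            std::size_t next = take(pairs[cur].first);
            if (next == none) next = take(pairs[cur].second);
            cur = next;
        }
        chains.push_back(std::move(chain));
    }
    return chains;
}
```

For the six pairs from the question this produces the same two chains as the example. Building the index costs O(N log N), and the lookups add O(N log N) in total since every index entry is visited and erased once.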

How do iterators map/know their current position or element

Consider the following code example:
#include <vector>
#include <numeric>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <functional>
int main()
{
std::vector<int> v(10, 2);
std::partial_sum(v.cbegin(), v.cend(), v.begin());
std::cout << "Among the numbers: ";
std::copy(v.cbegin(), v.cend(), std::ostream_iterator<int>(std::cout, " "));
std::cout << '\n';
if (std::all_of(v.cbegin(), v.cend(), [](int i){ return i % 2 == 0; })) {
std::cout << "All numbers are even\n";
}
if (std::none_of(v.cbegin(), v.cend(), std::bind(std::modulus<int>(),
std::placeholders::_1, 2))) {
std::cout << "None of them are odd\n";
}
struct DivisibleBy
{
const int d;
DivisibleBy(int n) : d(n) {}
bool operator()(int n) const { return n % d == 0; }
};
if (std::any_of(v.cbegin(), v.cend(), DivisibleBy(7))) {
std::cout << "At least one number is divisible by 7\n";
}
}
If we look at this part of the code:
if (std::all_of(v.cbegin(), v.cend(), [](int i){ return i % 2 == 0; })) {
std::cout << "All numbers are even\n";
}
which is fairly easy to understand. It iterates over the vector elements and checks i % 2 == 0 for each one, that is, whether it is evenly divisible by 2, and hence determines whether they are all even.
Its for-loop counterpart could be something like this:
bool areEven = true; // just for readability
for (std::size_t i = 0; i < v.size(); ++i) {
if (v[i] % 2 != 0) { areEven = false; break; } // one odd element is enough
}
In this for-loop example, it is quite clear that the current element we're processing is v[i], since we access it through the index i. But how does the iterator version of the same code map to i, or know which element it is currently accessing?
How does [](int i){ return i % 2 == 0; } ensure/know that i is the current element which the iterator is pointing to?
I'm not able to make out how the iterating is done without any use of something like v.currently_i_am_at_this_posiition(). I know what iterators are, but I'm having a hard time grasping them. Thanks :)
Iterators are modeled after pointers, and that's it really. How they work internally is of no interest, but a possible implementation is to actually have a pointer inside which points to the current element.
Iterating is done by using an iterator object and advancing it through the set of elements. To quote a definition:
"An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators (with at least the increment (++) and dereference (*) operators).
The most obvious form of iterator is a pointer: A pointer can point to elements in an array, and can iterate through them using the increment operator (++)."
The std::all_of function in your code is roughly equivalent to the following code:
template< class InputIt, class UnaryPredicate >
bool c_all_of(InputIt first, InputIt last, UnaryPredicate p)
{
for (; first != last; ++first) {
if (!p(*first)) {
return false; // Found an odd element!
}
}
return true; // All elements are even
}
An iterator, when incremented, keeps track of the currently pointed element, and when dereferenced it returns the value of the currently pointed element.
For teaching's and clarity's sake, you might also think of the operation as follows (don't try this at home)
bool c_all_of(int* firstElement, size_t numberOfElements, std::function<bool(int)> evenTest)
{
for (size_t i = 0; i < numberOfElements; ++i)
if (!evenTest(*(firstElement + i)))
return false;
return true;
}
Notice that iterators are a powerful abstraction since they allow consistent element access across different containers (e.g. std::map).
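To make the "pointer inside" idea concrete, here is a minimal sketch of a hand-written iterator over an int array. The class IntArrayIter and the helper all_even are hypothetical names for illustration; real container iterators are more elaborate. The point is that the iterator object itself stores the current position, ++ moves it, and * reads the element there, which is all std::all_of needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// A minimal read-only iterator that simply wraps a pointer.
class IntArrayIter {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    IntArrayIter() = default;
    explicit IntArrayIter(const int* p) : current_(p) {}

    // Dereference: hand out the element at the stored position.
    reference operator*() const { return *current_; }
    pointer operator->() const { return current_; }

    // Increment: move the stored position to the next element.
    IntArrayIter& operator++() { ++current_; return *this; }
    IntArrayIter operator++(int) { IntArrayIter old = *this; ++current_; return old; }

    friend bool operator==(const IntArrayIter& a, const IntArrayIter& b) {
        return a.current_ == b.current_;
    }
    friend bool operator!=(const IntArrayIter& a, const IntArrayIter& b) {
        return !(a == b);
    }

private:
    const int* current_ = nullptr;   // this is the "current position"
};

// The lambda never sees the iterator, only the value obtained from *it.
inline bool all_even(const int* data, std::size_t n) {
    return std::all_of(IntArrayIter(data), IntArrayIter(data + n),
                       [](int i) { return i % 2 == 0; });
}
```

Inside std::all_of the algorithm calls the predicate with *first, so the lambda's parameter i receives whatever element the iterator currently refers to.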

What's a good implementation of applying a unary function to some elements of a vector?

I'd like to apply a function UnaryFunction f to some elements of a std container, given a predicate UnaryPredicate p - sort of what you would get if you combine std::partition and then apply std::for_each to one of the partitions.
I'm quite new to C++, so forgive my ignorance. I have, however, looked for a suitable implementation in <algorithm>, yet I can't seem to find the desired function.
Based on the possible implementations over at cppreference.com, I've come up with the following:
template<class InputIt, class UnaryPredicate, class UnaryFunction>
UnaryFunction for_each_if(InputIt first, InputIt last, UnaryPredicate p, UnaryFunction f)
{
for (; first != last; ++first) {
if (p(*first))
{
f(*first);
}
}
return f;
}
The return value is modeled as per std::for_each, although an OutputIter might have been a better choice. This would require a more convoluted implementation though, and so I've chosen brevity over finesse this time around. The alternative implementation is left as an exercise to the reader.
My question is: is there already an established way to do this in the std library? If not, would this be a reasonable implementation of such a function template?
STL does not support composition of algorithms very well. As you said, you could first call partition, and then call for_each on one of the partitions if you don't care about the order of elements.
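A minimal sketch of the partition-then-for_each combination mentioned above, assuming reordering the container is acceptable. The name partition_then_for_each is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Moves the elements satisfying pred to the front, then applies f to them only.
// Note: this reorders the container, unlike a plain filtering loop.
template <class Container, class P, class F>
F partition_then_for_each(Container& c, P pred, F f) {
    auto middle = std::partition(c.begin(), c.end(), pred);
    return std::for_each(c.begin(), middle, f);
}
```

For example, calling it on a std::vector<int> with the predicate n > 5 and a function that increments its argument bumps only the elements greater than 5, but leaves the vector in partitioned rather than original order.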
For a new project, or one where you can introduce libraries, I would strongly recommend having a look at a range library, e.g. Boost.Range or Eric Niebler's range-v3.
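With Boost.Range, it can be done like this:
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/algorithm/for_each.hpp>
template<typename R, typename P, typename F>
F for_each_if(R& rng, P pred, F f)
{
// filter the range lazily, then apply f to the elements that pass pred
return boost::for_each(rng | boost::adaptors::filtered(pred), f);
}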
As far as the comments go, there seems to be no implementation of this in the std library. However, as user2672165 points out, the predicate may be easily included in the function. To illustrate this, see the following modified version of the for_each example over at cppreference.com:
#include <vector>
#include <algorithm>
#include <iostream>
struct Sum {
Sum() { sum = 0; }
void operator()(int n) { sum += n; }
int sum;
};
int main()
{
std::vector<int> nums{3, 4, 2, 9, 15, 267};
std::cout << "before: ";
for (auto n : nums) {
std::cout << n << " ";
}
std::cout << '\n';
std::for_each(nums.begin(), nums.end(), [](int &n){ if (n > 5) n++; });
// Calls Sum::operator() for each number
Sum s = std::for_each(nums.begin(), nums.end(), Sum());
std::cout << "after: ";
for (auto n : nums) {
std::cout << n << " ";
}
std::cout << '\n';
std::cout << "sum: " << s.sum << '\n';
}
Here, the predicate is added to the function, so that [](int &n){ n++; } now becomes [](int &n){ if (n > 5) n++; } to only apply the function to integer elements greater than 5.
Expected output is
before: 3 4 2 9 15 267
after: 3 4 2 10 16 268
sum: 303
Hope this helps someone else out there.

Output over unique elements of `std::multiset` and their frequency using std:: algorithm in C++ (no loops)

I have the following multiset in C++:
template<class T>
class CompareWords {
public:
bool operator()(T s1, T s2)
{
if (s1.length() == s2.length())
{
return ( s1 < s2 );
}
else return ( s1.length() < s2.length() );
}
};
typedef multiset<string, CompareWords<string>> mySet;
typedef std::multiset<string,CompareWords<string>>::iterator mySetItr;
mySet mWords;
I want to print each unique element of type std::string in the set once, and next to the element I want to print how many times it appears in the set (its frequency). As you can see, the functor CompareWords keeps the set sorted.
A solution is proposed here, but it's not what I need, because I am looking for a solution that does not use loops (while, for, do while).
I know that I can use this:
// gives a pair of iterators bounding the range of elements equal to "word"
auto p = mWords.equal_range(word);
// compute the distance between the iterators that bound the range AKA frequency
int count = static_cast<int>(std::distance(p.first, p.second));
but I can't quite come up with a solution without loops.
Unlike the other solutions, this iterates over the list exactly once. This is important, as iterating over a structure like std::multimap is reasonably high overhead (the nodes are distinct allocations).
There are no explicit loops, but the tail-end recursion will be optimized down to a loop, and I call an algorithm that will run a loop.
template<class Iterator, class Clumps, class Compare>
void produce_clumps( Iterator begin, Iterator end, Clumps&& clumps, Compare&& compare) {
if (begin==end) return; // do nothing for nothing
typedef decltype(*begin) value_type_ref;
// We know runs are at least 1 long, so don't bother comparing the first time.
// Generally, advancing will have a cost similar to comparing. If comparing is much
// more expensive than advancing, then this is sub optimal:
std::size_t count = 1;
Iterator run_end = std::find_if(
std::next(begin), end,
[&]( value_type_ref v ){
if (!compare(*begin, v)) {
++count;
return false;
}
return true;
}
);
// call our clumps callback:
clumps( begin, run_end, count );
// tail end recurse:
return produce_clumps( std::move(run_end), std::move(end), std::forward<Clumps>(clumps), std::forward<Compare>(compare) );
}
The above is a relatively generic algorithm. Here is its use:
int main() {
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords { "A", "A", "B" };
produce_clumps( mWords.begin(), mWords.end(),
[]( mySetItr run_start, mySetItr /* run_end -- unused */, std::size_t count )
{
std::cout << "Word [" << *run_start << "] occurs " << count << " times\n";
},
CompareWords<std::string>{}
);
}
The iterators must refer to a sorted sequence (with regards to the Comparator), then the clumps will be passed to the 3rd argument together with their length.
Every element in the multiset will be visited exactly once with the above algorithm (as a right-hand side argument to your comparison function). Every start of a clump will be visited (length of clump) additional times as a left-hand side argument (including clumps of length 1). There will be exactly N iterator increments performed, and no more than N+C+1 iterator comparisons (N=number of elements, C=number of clumps).
#include <iostream>
#include <algorithm>
#include <set>
#include <iterator>
#include <string>
int main()
{
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords;
mWords.insert("A");
mWords.insert("A");
mWords.insert("B");
mySetItr it = std::begin(mWords), itend = std::end(mWords);
std::for_each<mySetItr&>(it, itend, [&mWords, &it] (const std::string& word)
{
auto p = mWords.equal_range(word);
int count = static_cast<int>(std::distance(p.first, p.second));
std::cout << word << " " << count << std::endl;
std::advance(it, count - 1);
});
}
Outputs:
A 2
B 1
The following does the job without an explicit loop, using recursion:
void print_rec(const mySet& set, mySetItr it)
{
if (it == set.end()) {
return;
}
const auto& word = *it;
auto next = std::find_if(it, set.end(),
[&word](const std::string& s) {
return s != word;
});
std::cout << word << " appears " << std::distance(it, next) << std::endl;
print_rec(set, next);
}
void print(const mySet& set)
{
print_rec(set, set.begin());
}
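Another loop-free option, sketched here as an assumption rather than taken from the answers above, is to collect the distinct words with std::unique_copy and then pair each one with the multiset's count member. The function name word_frequencies is hypothetical. It relies on equal strings being adjacent in the multiset, which holds for any comparator under which only equal strings are equivalent (such as CompareWords or the default).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns (word, frequency) for each distinct word, in the multiset's order.
template <class Compare>
std::vector<std::pair<std::string, std::size_t>>
word_frequencies(const std::multiset<std::string, Compare>& words) {
    // Step 1: drop adjacent duplicates to get each word once.
    std::vector<std::string> distinct;
    std::unique_copy(words.begin(), words.end(), std::back_inserter(distinct));

    // Step 2: look up how often each distinct word occurs.
    std::vector<std::pair<std::string, std::size_t>> result;
    result.reserve(distinct.size());
    std::transform(distinct.begin(), distinct.end(), std::back_inserter(result),
                   [&words](const std::string& w) {
                       return std::pair<std::string, std::size_t>(w, words.count(w));
                   });
    return result;
}
```

Printing can then be done with std::for_each over the returned vector. Unlike the single-pass versions above, this walks the multiset once and additionally performs one count lookup per distinct word.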

How do I efficiently remove_if only a single element from a forward_list?

Well I think the question pretty much sums it up. I have a forward_list of unique items, and want to remove a single item from it:
std::forward_list<T> mylist;
// fill with stuff
mylist.remove_if([](T const& value)
{
return value == condition;
});
I mean, this method works fine but it's inefficient because it continues to search once the item is found and deleted. Is there a better way or do I need to do it manually?
If you only want to remove the first match, you can use std::adjacent_find followed by the member function erase_after:
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <iostream>
#include <ios>
#include <iterator>
// returns an iterator before first element equal to value, or last if no such element is present
// pre-condition: before_first is incrementable and not equal to last
template<class FwdIt, class T>
FwdIt find_before(FwdIt before_first, FwdIt last, T const& value)
{
assert(before_first != last);
auto first = std::next(before_first);
if (first == last) return last;
if (*first == value) return before_first;
return std::adjacent_find(first, last, [&](auto const&, auto const& R) {
return R == value;
});
}
int main()
{
auto e = std::forward_list<int>{};
std::cout << std::boolalpha << (++e.before_begin() == end(e)) << "\n";
std::cout << (find_before(e.before_begin(), end(e), 0) == end(e)) << "\n";
auto s = std::forward_list<int>{ 0 };
std::cout << (find_before(s.before_begin(), end(s), 0) == s.before_begin()) << "\n";
auto d = std::forward_list<int>{ 0, 1 };
std::cout << (find_before(d.before_begin(), end(d), 0) == d.before_begin()) << "\n";
std::cout << (find_before(d.before_begin(), end(d), 1) == begin(d)) << "\n";
std::cout << (find_before(d.before_begin(), end(d), 2) == end(d)) << "\n";
// erase after
auto m = std::forward_list<int>{ 1, 2, 3, 4, 1, 3, 5 };
auto it = find_before(m.before_begin(), end(m), 3);
if (it != end(m))
m.erase_after(it);
std::copy(begin(m), end(m), std::ostream_iterator<int>(std::cout, ","));
}
This will stop as soon as a match is found. Note that the adjacent_find takes a binary predicate, and by comparing only the second argument, we get an iterator before the element we want to remove, so that erase_after can actually remove it. Complexity is O(N) so you won't get it more efficient than this.
FWIW, here's another short version
template< typename T, class Allocator, class Predicate >
bool remove_first_if( std::forward_list< T, Allocator >& list, Predicate pred )
{
auto oit = list.before_begin(), it = std::next( oit );
while( it != list.end() ) {
if( pred( *it ) ) { list.erase_after( oit ); return true; }
oit = it++;
}
return false;
}
Going to have to roll your own...
template <typename Container, typename Predicate>
void remove_first_of(Container& container, Predicate p)
{
auto it = container.before_begin();
for (auto nit = std::next(it); ; it = nit, nit = std::next(it))
{
if (nit == container.end())
return;
if (p(*nit))
{
container.erase_after(it);
return;
}
}
}
There is nothing in the standard library which would be directly applicable. Actually, there is. See @TemplateRex's answer for that.
You can also write this yourself (especially if you want to combine the search with the erasure), something like this:
template <class T, class Allocator, class Predicate>
bool remove_first_if(std::forward_list<T, Allocator> &list, Predicate pred)
{
auto itErase = list.before_begin();
auto itFind = list.begin();
const auto itEnd = list.end();
while (itFind != itEnd) {
if (pred(*itFind)) {
list.erase_after(itErase);
return true;
} else {
++itErase;
++itFind;
}
}
return false;
}
This kind of stuff used to be a standard exercise when I learned programming way back in the early '80s. It might be interesting to recall the solution and compare it with what one can do in C++. Actually that was in Algol 68, but I won't impose that on you and will give the translation into C instead. Given
typedef ... T;
typedef struct node *link;
struct node { link next; T data; };
one could write the following, realising that one needs to pass the address of the list head pointer if it is to be possible to unlink the first node:
void search_and_destroy(link *p_addr, T y)
{
while (*p_addr!=NULL && (*p_addr)->data!=y)
p_addr = &(*p_addr)->next;
if (*p_addr!=NULL)
{
link old = *p_addr;
*p_addr = old->next; /* unlink node */
free(old); /* and free memory */
}
}
There are a lot of occurrences of *p_addr there; it is the last one, where it is the LHS of an assignment, that is the reason one needs the address of a pointer here in the first place. Note that in spite of the apparent complication, the statement p_addr = &(*p_addr)->next; is just replacing a pointer by the value it points to, and then adding an offset (which is 0 here).
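For anyone who wants to try the pointer-to-pointer technique, here is a self-contained C++ rendering of the same idea with int data. The helpers push_front, length and destroy are hypothetical scaffolding added for the sketch, and new/delete stand in for malloc/free.

```cpp
#include <cstddef>

struct Node {
    Node* next;
    int data;
};

// Unlinks and deletes the first node whose data equals y.
// p_addr is the address of the pointer that leads to the current node:
// initially the head pointer, later some node's next member.
bool search_and_destroy(Node** p_addr, int y) {
    while (*p_addr != nullptr && (*p_addr)->data != y)
        p_addr = &(*p_addr)->next;
    if (*p_addr == nullptr)
        return false;              // no match
    Node* old = *p_addr;
    *p_addr = old->next;           // unlink: works for the head node too
    delete old;
    return true;
}

// Scaffolding for experiments (not part of the original answer).
inline Node* push_front(Node* head, int value) { return new Node{head, value}; }

inline std::size_t length(const Node* head) {
    std::size_t n = 0;
    for (; head != nullptr; head = head->next) ++n;
    return n;
}

inline void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

Because the function writes through p_addr, removing the first node updates the caller's head pointer without any special case.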
One could introduce an auxiliary pointer value to lighten the code up a bit, as follows:
void search_and_destroy(link *p_addr, T y)
{
link p=*p_addr;
while (p!=NULL && p->data!=y)
p=*(p_addr = &p->next);
if (p!=NULL)
{
*p_addr = p->next;
free(p);
}
}
but that is fundamentally the same code: any decent compiler should realise that the pointer value *p_addr is used multiple times in succession in the first example, and keep it in a register.
Now with std::forward_list<T>, we are not allowed access to the pointers that link the nodes, and get those awkward "iterators pointing one node before the real action" instead. Our solution becomes
void search_and_destroy(std::forward_list<T>& list, T y) // by reference, so the caller's list is modified
{
std::forward_list<T>::iterator it = list.before_begin();
const std::forward_list<T>::iterator NIL = list.end();
while (std::next(it)!=NIL && *std::next(it)!=y)
++it;
if (std::next(it)!=NIL)
list.erase_after(it);
}
Again we could keep a second iterator variable to hold std::next(it) without having to spell it out each time (not forgetting to refresh its value when we increment it) and arrive at essentially the answer by Daniel Frey. (We could instead try to make that variable a pointer of type T* equal to &*std::next(it), which suffices for the use we make of it, but it would actually be a bit of a hassle to ensure it becomes the null pointer when std::next(it)==NIL, as the standard will not let us take &*NIL.)
I cannot help feeling that since the old days the solution to this problem has not become more elegant.