STL algorithm for merge with addition


I was using std::merge to combine two sorted collections into one.
But my object has a natural key and a defined addition semantic, so what I am after is a merge_and_sum that would not just merge the two collections into a single collection of length N+M, but, whenever operator== returned true for two objects, would combine them with operator+.
I have implemented it thus:
#include <algorithm> // std::copy

template<class _InIt1, class _InIt2, class _OutIt>
_OutIt merge_and_sum(_InIt1 _First1, _InIt1 _Last1, _InIt2 _First2, _InIt2 _Last2, _OutIt _Dest)
{ // merge two sorted ranges using operator<; elements that compare equal (operator==) are combined with operator+
    for (; _First1 != _Last1 && _First2 != _Last2; ++_Dest)
    {
        if (*_First2 < *_First1)
            *_Dest = *_First2, ++_First2;
        else if (*_First2 == *_First1)
            *_Dest = *_First2 + *_First1, ++_First1, ++_First2;
        else
            *_Dest = *_First1, ++_First1;
    }
    _Dest = std::copy(_First1, _Last1, _Dest); // copy any tail
    return std::copy(_First2, _Last2, _Dest);
}
But I was wondering whether I have reinvented something that could be composed from the other algorithms.

It sounds like your collections are like multisets with duplicates collapsed by your + operator (maybe just summing the multiplicities instead of keeping redundant copies). I assume so, because you're not changing the sorting order when you apply +, so + isn't affecting your key.
You should use your implementation. There's nothing in the STL that will do it as efficiently. The closest semantic I can think of is a standard merge followed by unique_copy. You could almost get unique_copy to work with a side-effecting comparison predicate, but that would be extremely ill-advised, as the implementation makes no promise about whether it compares elements directly or via a value-copied temporary (or even about how many times it compares them).
Your type and variable names are unpleasantly long ;)

You could use std::merge with an output iterator of your own creation, which does the following in operator=. I think this ends up making more calls to operator== than your version, though, so unless it works out as less code it's probably not worth it.
if (!mylist.empty() && newvalue == mylist.back()) {
    mylist.back() += newvalue;
} else {
    mylist.push_back(newvalue);
}
(Actually, writing a proper output iterator might be more fiddly than that; I can't remember. But I hope you get the general idea.)
mylist is a reference to the collection you're merging into. If the target doesn't have back(), then you'll have to buffer one value in the output iterator, and only write it once you see a non-equal value. Then define a flush function on the output iterator to write the last value, and call it at the end. I'm pretty sure that in this case it is too much mess to beat what you've already done.

Well, your other option would be to use set_symmetric_difference to get the elements that differ, then use set_intersection twice (once in each direction) to get each set's copy of the elements that are the same. Then add those pairs together and insert the sums into the result.
typedef std::set<MyType, MyComp> SetType;
SetType merge_and_add(const SetType& s1, const SetType& s2)
{
    SetType diff;
    // the set algorithms must use the same ordering as the sets themselves
    std::set_symmetric_difference(s1.begin(), s1.end(), s2.begin(), s2.end(),
                                  std::inserter(diff, diff.end()), s1.key_comp());
    std::vector<SetType::value_type> same1, same2;
    std::set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(),
                          std::back_inserter(same1), s1.key_comp());
    std::set_intersection(s2.begin(), s2.end(), s1.begin(), s1.end(),
                          std::back_inserter(same2), s1.key_comp());
    std::transform(same1.begin(), same1.end(), same2.begin(),
                   std::inserter(diff, diff.begin()), std::plus<SetType::value_type>());
    return diff;
}
Side note! You should stick to either using operator==, in which case you should use an unordered_set, or using operator<, in which case you should use a regular set. A set requires a strict weak ordering, which means two entries are deemed equivalent if !(a < b) && !(b < a). So even if two of your objects are unequal by operator==, the set will consider them duplicates if they satisfy this condition. So for the function you supplied above I highly recommend refraining from using an == comparison.

Related

mid point using reverse iterators

palindromes implementation using reverse iterators
The error in my code is that operator/ is not defined for iterators.
bool isPalindrome( std::string & s)
{
bool check = ( s == std::string{ s.rbegin(), s.rend() } );
return check; // works fine
}
The above performs n comparisons (n = s.length()). What I tried instead:
s == string{ s.rbegin(), s.rbegin() + (s.rend()/2) }
/* error: operator/ not defined */
I'm looking for one or two lines of code for a palindrome check with floor(n/2) comparisons.
Is there an elegant way to write this? Am I missing something about reverse iterators?
An input of std::string{"cac"} should return true and should require only 1 comparison.
How do I get the midpoint in O(1) time using reverse iterators?

Dividing an iterator by a number does not really make sense. What you can do is obtain an iterator that has been advanced halfway along the container, like this:
string{s.rbegin(), std::next(s.rbegin(), s.size() / 2)}
std::next obtains the iterator after incrementing it the supplied number of times.
This is only going to be O(1) for random-access iterators, such as those of std::vector, std::array and std::string; for other containers std::next has to step through the elements one by one.

You are confusing something that references an element with the index of that element. Just think about what s.rend()/2 is supposed to mean. What you actually want is some difference between two indices divided by 2.
Given two iterators you can get their distance via std::distance.

Purpose of having std::less (or a similar function) when it just calls operator<

Why are std::less (and the equivalent function objects) needed, when std::less just calls operator< and we can overload operators anyway?
A possible answer is in this question:
Why is std::less better than "<"?
However, I am not totally convinced (especially about weak ordering). Can someone explain a bit more?

The purpose of std::less and friends is that they allow you to generalize your code. Let's say we are writing a sorting function. We start with
void sort(int * begin, int * end) { /* sort here using < */ }
So now we can sort any container we can get int*s to. Now let's make it a template so it will work with all types:
template<typename Iterator>
void sort(Iterator begin, Iterator end) { /* sort here using < */ }
Now we can sort any type, and we are using an "Iterator" as our way of saying we need something that points to the element. This is all well and good, but it means we require any type passed to provide an operator< for it to work. It also doesn't let us change the sort order.
Now we could use a function pointer, but that won't work for built-in types, as there is no function you can point to. If we instead add another template parameter, let's call it Cmp, then we can add another function parameter of type Cmp. This will be the comparison function. We would like to provide a default value for it, and std::less makes that very easy and gives us a good "default" behavior.
So with something like
template<typename Iterator,
         typename Cmp = std::less<typename std::iterator_traits<Iterator>::value_type>>
void sort(Iterator begin, Iterator end, Cmp c = Cmp())
{ /* sort here using c */ }
It allows you to sort all built-in types and any type that has an operator<, and lets you specify any other way you want to compare the elements in order to sort them.
This is why we need std::less and friends. They let us make the code generic and flexible without having to write a lot of boilerplate.
Using a function object also gives us some performance benefits. It is easier for the compiler to inline a call to a function object's call operator than a call through a function pointer. It also allows the comparator to have state, like a counter for the number of times it was called. For a more in-depth look at this, see C++ Functors - and their uses.

std::less is just a default policy that takes the natural sorting order of an object (i.e., its comparison operators).
The good thing about using std::less as a default template parameter is that you can tune your sorting algorithm (or your ordered data structure): you can decide whether to use the natural sorting order (e.g. ascending order for natural numbers) or a different sorting order for your particular problem (e.g. first the odd and then the even numbers) without modifying the object's operators or the algorithm itself.
#include <algorithm>
#include <iostream>
#include <vector>

struct natural {
    unsigned value;
    natural( unsigned v ) :
        value(v)
    {
    }
    bool operator< ( natural other ) const {
        return value < other.value;
    }
};

struct first_the_odds {
    bool operator()( natural left, natural right ) const {
        bool left_odd = left.value % 2 != 0;
        bool right_odd = right.value % 2 != 0;
        if( left_odd == right_odd ) {
            return left < right;   // same parity: natural order
        } else {
            return left_odd;       // odd numbers come first
        }
    }
};

int main() {
    // Sort me some numbers
    std::vector<natural> numbers = { 0, 1, 2, 3, 4 };
    std::sort( numbers.begin(), numbers.end(), first_the_odds() );
    for( natural n : numbers )
        std::cout << n.value << ",";
}
Output:
1,3,0,2,4,

The first problem with < is that under the C++ standard, < on pointers can be utter nonsense on anything that doesn't point within the same "parent" object or array.
This is because of the existence of segmented memory models, like the 8086's. It is much faster to compare within segments by ignoring the segment number, and objects cannot span over segments; so < can just compare offsets within segments and ignore segment number.
There are going to be other cases like this on equally strange hardware; imagine hardware where const (ROM) and non-const (RAM) data exist in a separate memory space.
std::less<Pointer> meanwhile guarantees a strict weak ordering despite any quirks in the memory architecture. It will pay the price on every comparison.
The second reason we need std::less is to be able to pass the concept of "less than" around easily. Look at std::sort's three-argument overload:
void sort( Iterator, Iterator, Comparator )
here we can pass how we want to sort in the 3rd parameter. If we pass std::greater<Foo>{} we get one sort, and if we pass std::less<Foo>{} we get the opposite.
By default the two-argument version sorts like std::less, but once you have greater, greater_equal and less_equal, adding less just makes sense.
And once you have defined less, using it to describe the behavior of the default std::sort, std::map and the like is easier than repeating all of the wording about how they use <, except on pointers, where they generate a strict weak ordering that agrees with < wherever < has fully specified behavior in the standard.

Is there a standard way to compare two ranges using a predicate?

Given...
string a; // = something.
string b; // = something else. The two strings are of equal length.
string::size_type score = 0;
...what I would like to do is something like...
compare(a.cbegin(), a.cend(), b.cbegin(), b.cend(), [&score](const char c1, const char c2) -> void {
if (c1 == c2) { // actually a bit more complicated in real life
score++;
}
});
...but as far as I can tell there doesn't seem to be a std::compare. The nearest seems to be std::lexicographical_compare but that doesn't quite match. Ditto for std::equal. Is there really nothing appropriate in the standard library? I suppose I could write my own (or use a plain old C style loop which is what I did but how boring :-) but I would think what I'm doing is rather common so that would be a strange omission IMO. So my question is am I missing something?

Is there a standard algorithm to compare two ranges using a predicate? Yes, std::equal, or std::lexicographical_compare.
Is there a standard algorithm to do what your code is doing? std::inner_product can be made to do it:
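#include <numeric>
#include <string>

std::string a = "something";
std::string b = "samething";
// init is 0; the first lambda accumulates, the second compares corresponding characters
auto score = std::inner_product(
    a.begin(), a.end(), b.begin(), 0,
    [](int acc, bool same) { return acc + same; },
    [](char c1, char c2) { return c1 == c2; });
// score == 8: only the second character differs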
but I would think what I'm doing is rather common
No, not really. If you just want to run a general function over corresponding elements in two ranges, the appropriate algorithm would be for_each with a zip iterator. If anything's missing from the standard, it's the zip iterator. We don't need a special algorithm for this purpose.

It looks a bit as if you are looking for std::mismatch() which yields the iterators where the first difference is found (or the end, of course). It doesn't compute the difference, however, because there isn't a subtraction defined for all types. Like the other algorithms std::mismatch() comes in a form with a predicate and one without a predicate.

Thank you to all who answered. What I was trying to do (more for my own edification than anything else, really) was to replace this...
for (string::const_iterator c1 = a.begin(), c2 = b.begin(); c1 != a.end(); ++c1, ++c2) {
if (*c1 == *c2) {
score++;
}
}
...with snazzy new C++11 stuff :-) I looked at equal, lexicographical_compare etc., but I guess what tripped me up was that they take a boolean predicate, and when it returns false processing stops, whereas I needed to process the entire ranges each time. Then, after reading your answers, I had the epiphany that just because there is a return value doesn't mean I can't throw it away if I don't need it. By simply always returning true in my lambda I can use std::equal or std::mismatch and they will run to the end of the range (std::lexicographical_compare is the exception: a comparator that returns true means "less than", so it would stop at the first element).
The only thing is that, since I would be using the algorithms in a different way than their names suggest, it might cause maintenance problems in the future, so I will just stick to my boring old loop for now. But I learned something new, so thanks once again.

C++ equivalent of Python difference_update?

s1 and s2 are sets (Python set or C++ std::set)
To add the elements of s2 to s1 (set union), you can do
Python: s1.update(s2)
C++: s1.insert(s2.begin(), s2.end());
To remove the elements of s2 from s1 (set difference), you can do
Python: s1.difference_update(s2)
What is the C++ equivalent of this? The code
s1.erase(s2.begin(), s2.end());
does not work, because s1.erase() requires iterators from s1. The code
std::set<T> s3;
std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), std::inserter(s3, s3.end()));
s1.swap(s3);
works, but seems overly complex, at least compared with Python.
Is there a simpler way?

Using std::set_difference is the idiomatic way to do this in C++. You have stumbled across one of the primary differences (pun intended) between C++/STL and many other languages. STL does not bundle operations directly with the data structures. This is why std::set does not implement a difference routine.
Basically, algorithms such as std::set_difference write the result of the operation to another object. It is interesting to note that the algorithm does not require that either or both of the operands are actually std::set. The definition of the algorithm is:
Effects: Copies the elements of the range [first1, last1) which are not present in the range [first2, last2) to the range beginning at result. The elements in the constructed range are sorted.
Requires: The resulting range shall not overlap with either of the original ranges. Input ranges are required to be ordered by the same operator<.
Returns: The end of the constructed range.
Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons
The interesting difference is that the C++ version is applicable to any two sorted ranges. In most languages, you are forced to coerce or translate the calling object (left-hand operand) into a set before you have access to the set difference algorithm.
This is not really pertinent to your question, but this is the reason that the various set algorithms are modeled as free-standing algorithms instead of member methods.

You should iterate through the second set:
for( set< T >::iterator iter = s2.begin(); iter != s2.end(); ++iter )
{
s1.erase( *iter );
}
This could be cheaper than using std::set_difference: set_difference copies the unique objects into a new container, though it takes linear time, while .erase does not copy anything, but each erase costs O(log n), so the loop is O(m * log(n)), where m is the size of s2 and n the size of s1.
In other words, depending on the containers, you can choose whichever approach is faster for your case.
Thanks David Rodríguez - dribeas for the remark! (:
EDIT: Doh! I thought of BOOST_FOREACH at the very beginning, but wrongly concluded that it could not be used; you don't need the iterator, just the value, as user763305 pointed out.

In C++ there is no difference member function on std::set. std::set_difference looks much more awkward because it is more generic than a difference applied to two sets. Of course, you can implement your own in-place difference on sets:
template <typename T, typename Compare, typename Allocator>
void my_set_difference( std::set<T,Compare,Allocator>& lhs, std::set<T,Compare,Allocator> const & rhs )
{
    typedef std::set<T,Compare,Allocator> set_t;
    typedef typename set_t::iterator iterator;
    typedef typename set_t::const_iterator const_iterator;
    const_iterator rit = rhs.begin(), rend = rhs.end();
    iterator it = lhs.begin(), end = lhs.end();
    typename set_t::key_compare comp = lhs.key_comp(); // key_comp() returns the comparator
    while ( it != end && rit != rend )
    {
        if ( comp( *it, *rit ) ) {
            ++it;
        } else if ( comp( *rit, *it ) ) {
            ++rit;
        } else {
            ++rit;
            lhs.erase( it++ ); // erase invalidates only the erased iterator
        }
    }
}
The performance of this algorithm will be linear in the size of the arguments, and require no extra copies as it modifies the first argument in place.

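You can also do it by writing your own functor for testing existence in a set, e.g. ExistIn(s2), and erasing the elements of s1 that it matches. Note that
std::remove_if(s1.begin(), s1.end(), ExistIn(s2));
will not compile for a std::set, because set elements are const; in C++20 you can write std::erase_if(s1, ExistIn(s2)) instead.
I suppose that set_difference is more efficient, though, as it probably scans both sets only once.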

Python set is unordered, and is more of an equivalent of C++ std::unordered_set than std::set, which is ordered.
David Rodríguez's algorithm relies on the fact that std::set is ordered, so the lhs and rhs sets can be traversed in the way exhibited in the algorithm.
For a more general solution that works for both ordered and unordered sets, Kiril Kirov's algorithm should be the safe one to adopt if you are enforcing/preserving the "unorderedness" nature of Python set.

Tell `string::operator==` to start comparing at the back of the string

Is it possible/(relatively) easy/std™ to start comparing at the back of a string or should I write my own function for that? It would be relatively straightforward of course, but still, I would trust a standard library implementation over mine any day.
The end of the string is almost unique, and the front is quite common, that's the only reason I need this "optimization".
Thanks!

Best I can think of so far is str1.size() == str2.size() && std::equal(str1.rbegin(), str1.rend(), str2.rbegin())

You could use std::equal in combination with std::basic_string::reverse_iterator (rbegin, rend).
However, this is only relevant if the strings have the same length (so you need to check the sizes first), and only for testing equality, not ordering (since the most significant difference would be the last one compared while iterating).
Example:
bool isEqual = s1.size() == s2.size() && std::equal( s1.rbegin(), s1.rend(), s2.rbegin());

Depending on how long the strings are (and your compiler), you may be better off sticking with operator==. On Visual C++ v10, that reduces to a memcmp call via char_traits::compare, which is (when optimized) going to compare the target byte ranges in chunks, probably as many bytes at a time as will fit in a register (4/8 for 32/64-bit).
static int __CLRCALL_OR_CDECL compare(const _Elem *_First1, const _Elem *_First2,
size_t _Count)
{ // compare [_First1, _First1 + _Count) with [_First2, ...)
return (_CSTD memcmp(_First1, _First2, _Count));
}
Meanwhile, std::equal (the nicest alternative) does a byte by byte comparison. Does anybody know if this will get optimized in the same way since they are reverse iterators? At best, alignment handling is more complex since the start of the range is not guaranteed well-aligned.
template<class _InIt1,
class _InIt2> inline
bool _Equal(_InIt1 _First1, _InIt1 _Last1, _InIt2 _First2)
{ // compare [_First1, _Last1) to [First2, ...)
for (; _First1 != _Last1; ++_First1, ++_First2)
if (!(*_First1 == *_First2))
return (false);
return (true);
}
See #greyfade's answer here for some color on GCC.

If you want to reverse the strings first, I would suggest std::reverse() from <algorithm>, then compare using string.compare() or your own algorithm. However, reversing takes time and processor work of its own, so I suggest your own function to handle this one. Start a loop with i equal to string.length(), count down with --i, and compare the characters at index i - 1.
bool stringCompFromBack(const std::string& str1, const std::string& str2)
{
    if (str1.length() != str2.length())
        {return false;}
    // i runs from length() down to 1, so str[i - 1] covers every index without going out of range
    for (std::size_t i = str1.length(); i > 0; --i)
    {
        if (str1[i - 1] != str2[i - 1])
            {return false;}
    }
    return true;
}
std::string str1 = "James Madison";
std::string str2 = "James Ford";
bool same = stringCompFromBack(str1, str2);

You should write your own function for that. You could reverse as Lost says, but that wouldn't be an optimization unless you kept the reversed string around and were comparing it multiple times. Even then, it wouldn't be an improvement over writing your own function that simply iterates over the strings in reverse.

I see two options:
Write your own comparison function and call that.
Write a wrapper class around std::string, and implement operator== for that class to have the behavior you want.
The second is probably overkill.
