Composability of STL algorithms - c++

The STL algorithms are a pretty useful thing in C++. But one thing that kind of irks me is that they seem to lack composability.
For example, let's say I have a vector<pair<int, int>> and want to transform that to a vector<int> containing only the second member of the pair. That's simple enough:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::transform(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
Or maybe I want to filter the vector for only those pairs whose first member is even. Also pretty simple:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
But what if I want to do both? There is no transform_if algorithm, and using both transform and copy_if seems to require allocating a temporary vector to hold the intermediate result:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> temp;
std::vector<int> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(temp),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
std::transform(temp.begin(), temp.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
This seems rather wasteful to me. The only way I can think of to avoid the temporary vector is to abandon transform and copy_if and simply use for_each (or a regular for loop, whichever suits your fancy):
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::for_each(values.begin(), values.end(),
[&result] (std::pair<int, int> p)
{ if( (p.first % 2) == 0 ) result.push_back(p.second); });
Am I missing something here? Is there a good way to compose two existing STL algorithms into a new one without needing temporary storage?

You're right. You can use Boost.Range adaptors to achieve composition.

I think the problem is unfortunately structural:
C++ uses two iterators to represent a sequence, and
C++ functions are single-valued,
so you cannot chain them because a function cannot return "a sequence".
An option would have been to use single-object sequences instead (like the range approach from boost). This way you could have combined the result of one processing as the input of another... (one object -> one object).
In the standard C++ library instead the processing is (two objects -> one object) and it's clear that this cannot be chained without naming the temporary object.
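For the specific filter-then-project case from the question, one way around the chaining problem is to fuse the two steps by hand. What follows is only a minimal sketch: transform_if and seconds_of_even_firsts are hypothetical names introduced for illustration, and neither is part of the standard library.

```cpp
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper (not in the standard library): writes op(x) to `out`
// for every x in [first, last) that satisfies pred, in a single pass.
template <typename InputIt, typename OutputIt, typename Pred, typename UnaryOp>
OutputIt transform_if(InputIt first, InputIt last, OutputIt out,
                      Pred pred, UnaryOp op) {
    for (; first != last; ++first) {
        if (pred(*first)) {
            *out++ = op(*first);
        }
    }
    return out;
}

// The question's example: keep pairs whose first member is even and
// collect their second members, without a temporary vector.
std::vector<int> seconds_of_even_firsts(
        const std::vector<std::pair<int, int>>& values) {
    std::vector<int> result;
    transform_if(values.begin(), values.end(), std::back_inserter(result),
                 [](const std::pair<int, int>& p) { return p.first % 2 == 0; },
                 [](const std::pair<int, int>& p) { return p.second; });
    return result;
}
```

This avoids the temporary, but it does not really compose: every new combination of steps needs its own fused helper, which is the structural limitation described above.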

Back in 2000, the problem was already noted. Gary Powell and Martin Weiser came up with a "view" concept, and coined the name "View Template Library". It didn't take off then but the idea makes sense. A "view" adaptor essentially applies an on-the-fly transform. For instance, it can adapt the value_type.
The concept probably should be readdressed now we have C++0x. We've made quite some progress in generic programming since 2000.
For example, let's use the vector<pair<int, int>> to vector<int> example. That could be quite simple:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, [](std::pair<int, int> p) { return p.first; });
std::vector<int> result(v.begin(), v.end());
Or, using the boost::bind techniques, even simpler:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, &std::pair<int, int>::first);
std::vector<int> result(v.begin(), v.end());

Since C++20 you can use std::ranges::copy together with the range adaptors std::views::filter and std::views::values from the Ranges library as follows:
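#include <algorithm>
#include <iostream>
#include <iterator>
#include <ranges>
#include <utility>
#include <vector>
int main() {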
std::vector<std::pair<int, int>> values = { {1,2}, {4,5}, {6,7}, {9,10} };
std::vector<int> result;
auto even = [](const auto& p) { return (p.first % 2) == 0; };
std::ranges::copy(values | std::views::filter(even) | std::views::values,
std::back_inserter(result));
for (int i : result)
std::cout << i << std::endl;
return 0;
}
Output:
5
7
In the solution above, no temporary vector is created for an intermediate result, because the view adaptors create ranges that don't contain elements. These ranges are just views over the input vector, but with a customized iteration behavior.

Not sure if this is still active, but...
A new lightweight, header-only library does what you describe. Its documentation talks about lazy evaluation and composable generators.
Doc snippet:
Read in up to 10 integers from a file "test.txt",
filter for the even numbers, square them and sum their values.
int total = lz::read<int>(ifstream("test.txt")) | lz::limit(10) |
lz::filter([](int i) { return i % 2 == 0; }) |
lz::map([](int i) { return i * i; }) | lz::sum();
You can split that line up into multiple expressions.
auto numbers = lz::read<int>(ifstream("test.txt")) | lz::limit(10);
auto evenFilter = numbers | lz::filter([](int i) { return i % 2 == 0; });
auto squares = evenFilter | lz::map([](int i) { return i * i; });
int total = squares | lz::sum();
Even though this expression is split over multiple variable assignments, it is not any less efficient. Each intermediate variable simply describes a unit of code to be executed, and all of them are held on the stack.
https://github.com/SaadAttieh/lazyCode

Related

The best practice for (unordered) map keys and values modification

The map of the form map<long long, vector<long long>> is given. One has to take all keys and values modulo some integer N. Some keys can merge and corresponding values must join accordingly. For example, the map {{1,{2,6,4}}, {5,{8,4,9}}, {10,{5,1,7}}} should be equal to {{1,{2,1,4}}, {0,{0,1,2,3,4}}} after reduction modulo 5.
My approach uses a new map, but I think there should be a better way.
Code added:
vector<long long> tmp;
//integer N, for example N = 5
int N = 5;
unordered_map<long long, vector<long long>> map;
//temporary map
unordered_map<long long, vector<long long>> map_tmp;
for (auto & x : map)
{
tmp.clear();
for (auto & y : x.second) tmp.push_back(y % N);
long long ind = x.first % N;
map_tmp[ind].insert(map_tmp[ind].end(), tmp.begin(), tmp.end());
sort(map_tmp[ind].begin(), map_tmp[ind].end());
map_tmp[ind].erase(unique(map_tmp[ind].begin(), map_tmp[ind].end()), map_tmp[ind].end());
}
map = map_tmp;
Since the values stored for each key are apparently unique, and must stay unique after the modulo operation is applied, you should use a different data structure. For example:
using Map = std::unordered_map<int, std::set<int>>;
std::set will handle uniqueness and order of items for given key.
Now the whole trick is to inspect API of std::unordered_map and std::set and how item can be inserted there. See:
std::unordered_map::insert
std::set::insert
Note the return value: std::pair<iterator,bool>, which gives you an iterator to the inserted or existing item in the map/set.
Knowing this, writing code that meets your requirements is quite simple:
using Map = std::unordered_map<int, std::set<int>>;
Map moduloMap(const Map& in, int mod)
{
Map out;
for (const auto& [k, s] : in) {
if (s.empty())
continue;
auto& destSet = out.insert({ k % mod, {} }).first->second;
for (auto x : s) {
destSet.insert(x % mod);
}
}
return out;
}
Sometimes a for loop can be the easiest, clearest way to do something.
map<long long, vector<long long>> result;
for (const auto& [key, vec] : input) {
process (result[key%5], vec);
}
and process takes the vector by (non-const) reference and appends the reduced values from the second (const) argument.
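A minimal sketch of what such a process function could look like. The constant kModulus and the wrapper reduce_modulo are hypothetical names, the modulus 5 is taken from the key%5 in the loop above, and the values are assumed to be non-negative (in C++, % on a negative operand gives a negative result).

```cpp
#include <map>
#include <vector>

// Assumed modulus, matching the key%5 in the loop above.
const long long kModulus = 5;

// Appends every value of `src`, reduced modulo kModulus, to `dest`.
// Duplicates are not removed here.
void process(std::vector<long long>& dest, const std::vector<long long>& src) {
    for (long long v : src) {
        dest.push_back(v % kModulus);
    }
}

// The loop from above, wrapped in a function for convenience.
std::map<long long, std::vector<long long>> reduce_modulo(
        const std::map<long long, std::vector<long long>>& input) {
    std::map<long long, std::vector<long long>> result;
    for (const auto& kv : input) {
        // operator[] creates an empty vector the first time a key is seen.
        process(result[kv.first % kModulus], kv.second);
    }
    return result;
}
```

Since this version only appends, a key that receives values from several source keys can end up with duplicates; removing them would be a separate step, or the vector could be swapped for a std::set.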
update
After seeing the code you posted, I have several suggestions:
Use a set instead. You are spending multiple steps to append the new values, sort the whole thing together, then remove duplicates. Just use a set which maintains a single copy of each value automatically.
Use structured binding in your loop. Instead of x.second and x.first you can just name them key and vec as in my earlier post.
Assuming you still need tmp, declare it where you are calling .clear() now, instead of declaring it way up at the top of your code. You don't need to clear it each time through the loop; it will be empty each time through the loop naturally.

Can I insert into a set, all the elements of a vector that matches a condition, in a single line of code

I have a vector of elements. I want to populate a set using the elements of this vector that match a certain condition. Can I do this using one line, or in any way that is more concise than the below?
// given vector<int> v
set<int> s;
for (const int& i : v)
{
if (/* some condition on i*/)
s.insert(i);
}
For example, something along the lines of:
// given vector<int> v
set<int> s;
s.insert(v.filter(/* lambda here*/));
It goes without saying that the v.filter method should return an iterator, not a separate populated vector, for performance reasons.
You can use std::copy_if with a lambda and std::inserter to insert the values into the set. That looks like
std::copy_if(v.begin(), v.end(), std::inserter(s, s.begin()), [](auto val) { return val == some_condition; });
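For reference, a complete version of that one-liner might look like the sketch below. The even-number test merely stands in for "some condition", and even_elements is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Copies the elements of v that satisfy the condition into a set.
// "Is even" is only a placeholder for the real condition.
std::set<int> even_elements(const std::vector<int>& v) {
    std::set<int> s;
    std::copy_if(v.begin(), v.end(), std::inserter(s, s.end()),
                 [](int x) { return x % 2 == 0; });
    return s;
}
```

No intermediate container is created: each matching element goes straight from the vector into the set, and the set discards duplicates.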
With range-v3, it would be
set<int> s = v | ranges::view::filter([](int e){ return cond(e); });
or simply (if cond already exists)
set<int> s = v | ranges::view::filter(cond);
+1 for the std::copy_if() solution that, IMHO, is the natural solution for this problem.
Just for fun, I propose a different solution based on std::for_each()
std::set<int> s;
std::for_each(v.cbegin(), v.cend(),
[&](int i) { if ( /* some condition */ ) s.insert(i); });

Getting all the keys of a map of the form <pair<int,int>, int*> in C++

In my C++ code I am using a map like this:
std::map<std::pair<int,int>,int*> patterns;
The problem is that I cannot figure out how I get all the keys of that map which are of the form
pair<int,int>
I have seen a few questions related to it, but in all the cases keys are single integers.
If you wanted to just iterate through all the keys:
C++03
for (std::map<std::pair<int,int>,int*>::iterator I = patterns.begin(); I != patterns.end(); I++) {
// I->first is a const reference to a std::pair<int,int> stored in the map
}
C++11
for (auto& kv : patterns) {
// kv.first is a const reference to a std::pair<int,int> stored in the map
}
If you wanted to copy the keys into a new container:
C++03
std::vector<std::pair<int,int> > V;
std::set<std::pair<int,int> > S;
for (std::map<std::pair<int,int>,int*>::iterator I = patterns.begin(); I != patterns.end(); I++) {
V.push_back(I->first);
S.insert(I->first);
}
C++11
std::vector<std::pair<int,int>> V;
std::set<std::pair<int,int>> S;
for (auto& kv : patterns) {
V.push_back(kv.first);
S.insert(kv.first);
}
Because I'm bored, here are a few additional solutions:
You could also do it with standard algorithms and a lambda function, but I don't think this is really better than just writing the loop yourself:
std::vector<std::pair<int,int>> V(patterns.size());
std::transform(patterns.begin(), patterns.end(), V.begin(),
[](decltype(patterns)::value_type& p){ return p.first; });
std::set<std::pair<int,int>> S;
std::for_each(patterns.begin(), patterns.end(),
[&S](decltype(patterns)::value_type& p){ S.insert(p.first); });
You could also use a Boost transform iterator to wrap iterators from the map, such that when the wrapped iterator is dereferenced, it gives you just the key from the map. Then you could call std::vector::insert or std::set::insert directly on a range of transform iterators.
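If Boost is not available, the same idea can be sketched by hand. key_iterator and make_key_iterator below are hypothetical names; this is a simplified stand-in for the concept, not Boost's actual API.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Wraps a map iterator so that dereferencing yields only the key.
template <typename MapIt>
class key_iterator {
    MapIt it_{};

public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = typename std::remove_const<
        typename std::iterator_traits<MapIt>::value_type::first_type>::type;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = const value_type&;

    key_iterator() = default;
    explicit key_iterator(MapIt it) : it_(it) {}

    reference operator*() const { return it_->first; }
    pointer operator->() const { return &it_->first; }
    key_iterator& operator++() { ++it_; return *this; }
    key_iterator operator++(int) { key_iterator tmp(*this); ++it_; return tmp; }

    friend bool operator==(const key_iterator& a, const key_iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const key_iterator& a, const key_iterator& b) {
        return !(a == b);
    }
};

template <typename MapIt>
key_iterator<MapIt> make_key_iterator(MapIt it) {
    return key_iterator<MapIt>(it);
}
```

With this in place, the range constructors and range inserts of the containers can be used directly, for example std::vector<std::pair<int,int>> V(make_key_iterator(patterns.begin()), make_key_iterator(patterns.end())); or S.insert(make_key_iterator(patterns.begin()), make_key_iterator(patterns.end()));.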

C++ Copy a vector of pair<int,int> to a vector<int>

I have a vector of pairs that I need to copy linearly into a vector of ints. The following code works well, but I'm not sure it's safe considering struct padding issues in C++.
std::vector < std::pair<int, int> > test_vector;
for (int i=0;i<5;i++) {
test_vector.push_back(std::make_pair(i,i*5));
}
std::vector<int> int_vec(test_vector.size() * 2);
std::copy(reinterpret_cast<int*>(&(*test_vector.begin())),reinterpret_cast<int*>(&(*test_vector.end())),int_vec.begin());
Now, my question is - Is the above code safe? If not, is there an elegant way to do it without writing a loop?
How about std::transform and a lambda function ?
std::vector<int> v;
std::transform(test_vector.begin(), test_vector.end(), std::back_inserter(v),
[&v](const std::pair<int, int> &p)
{ v.push_back( p.first);
return p.second ;});
If you can't use C++11, and probably "hate" doing linear copying using loops, you can use a functor like:
struct X{
X(std::vector<int> &x) :v(x){}
int operator () (const std::pair<int, int> &p)
{
v.push_back(p.first);
return p.second;
}
std::vector<int> &v;
};
std::vector<int> v; //Final vector
std::transform(test_vector.begin(),
test_vector.end(),
std::back_inserter(v),
X(v));
std::vector<int> ::iterator it;
for(it=v.begin() ; it!=v.end() ;++it)
std::cout<<*it<<" ";
You don't need any fanciness for this problem. A simple for loop will do, especially if you can't use C++11
std::vector < std::pair<int, int> > test_vector;
std::vector<int> int_vec; int_vec.reserve(test_vector.size() * 2);
for (std::vector < std::pair<int, int> >::const_iterator it = test_vector.begin(), end_it = test_vector.end(); it != end_it; ++it)
{
int_vec.push_back(it->first);
int_vec.push_back(it->second);
}
You're right to be concerned about structure padding issues, but I think you haven't really faced the central assumption that your code is making:
Can I treat a std::pair<int, int> as an array of two integers, with the .first being the first element in the array and .second being the second element?
From a "correctness" point of view, I'd say "no". You've identified padding issues, but there's also the ordering of the fields. There's really no guarantee that .first has a lower memory address than .second.
From a "practical" point of view, I'd be quite surprised if your code did not work. [ Edit: Neil has pointed out a concrete example where there are padding issues; so color me surprised. Besides being "bad form", I now consider the code broken in practice. ]
As for a solution, you can use for_each with a custom action that pushes both elements of the pair (untested code)
struct action {
action ( vector<int> & target ) : t_(target) {}
void operator () ( const pair<int, int> &p ) const
{ t_.push_back(p.first); t_.push_back(p.second); }
private:
vector<int> &t_;
};
for_each ( test_vector.begin(), test_vector.end(), action(v));
A reinterpret_cast is usually bad news. Wouldn't you be better off reserve()ing enough space in the destination vector and then calling std::for_each on the source vector of pairs and then have the function/lambda push_back both first and second into the destination vector ?
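A minimal sketch of that suggestion, assuming C++11 lambdas are available; flatten is a hypothetical name.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Copies first and second of every pair, in order, into one flat vector.
// No assumptions are made about the memory layout of std::pair.
std::vector<int> flatten(const std::vector<std::pair<int, int>>& pairs) {
    std::vector<int> out;
    out.reserve(pairs.size() * 2);  // a single allocation up front
    std::for_each(pairs.begin(), pairs.end(),
                  [&out](const std::pair<int, int>& p) {
                      out.push_back(p.first);
                      out.push_back(p.second);
                  });
    return out;
}
```

Because each member is copied by name, the result does not depend on padding or on the order of the fields in memory.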

Elegant and efficient algorithm for increasing values of a "vector<pair>"

I need to find an element in a vector<pair<int, float>> and increase the second value.
I tried an approach.
template <typename K, typename V>
struct match_first {
const K _k; match_first(const K& k) : _k(k) {}
bool operator()(const pair<K, V>& el) const {
return _k == el.first;
}
};
Example usage:
vector< pair<int, float> > vec;
vec.push_back(make_pair(2, 3.0));
vec.push_back(make_pair(3, 5.0));
vec.push_back(make_pair(1, 1.0));
vector< pair<int, float> >::iterator it = find_if(vec.begin(), vec.end(), match_first<int, float>(3));
if (it != vec.end()) {
it->second += 9;
}
Is there a more efficient way of accomplishing this task?
A map seems more natural:
#include <map>
int main()
{
std::map<int, float> m;
m.insert(std::make_pair(2, 3.0));
m.insert(std::make_pair(3, 5.0));
m.insert(std::make_pair(1, 1.0));
auto it = m.find(3);
if (it != m.end()) {
it->second += 9;
}
}
It will also be faster because lookup is O(log(n))
You can reach the same complexity with a vector of sorted pairs by using std::lower_bound (or std::equal_range if keys can be repeated)
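A sketch of the sorted-vector variant. It assumes the vector is kept sorted by the first member and that keys are unique; add_to_key is a hypothetical name.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Adds `delta` to the value stored under `key`.
// Precondition: `sorted` is ordered by .first.
// Returns false if the key is not present.
bool add_to_key(std::vector<std::pair<int, float>>& sorted,
                int key, float delta) {
    // Binary search for the first element whose key is not less than `key`.
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const std::pair<int, float>& el, int k) { return el.first < k; });
    if (it == sorted.end() || it->first != key) {
        return false;
    }
    it->second += delta;
    return true;
}
```

The lookup is O(log n) like the map, but the precondition matters: on an unsorted vector, std::lower_bound gives meaningless results.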
It depends on your constraints. If you have a unique key (the first element), you can use std::map<K,V> to hold your objects. Then increasing it is easy. If V has a default constructor initializing it to zero, you can even skip adding new elements and just increment (this also works for built-in types such as int, because operator[] value-initializes a missing element to zero).
std::map<K,V> data;
data[key] = data[key] + 1;
The [] operator, used with a non-existent key, will create the object for you using its default constructor. To just access data, use the at or find methods.
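A small sketch of the difference between the inserting operator[] and the non-inserting find; increment and contains_key are hypothetical helper names.

```cpp
#include <map>

// operator[] inserts a value-initialized element (0 for int) when the key
// is missing, so the increment works even for keys never seen before.
int increment(std::map<int, int>& data, int key) {
    return ++data[key];
}

// find() never inserts, so it is the safe choice for a pure lookup.
bool contains_key(const std::map<int, int>& data, int key) {
    return data.find(key) != data.end();
}
```

Calling increment twice on an empty map with the same key leaves a single entry with the value 2, whereas contains_key leaves the map untouched.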
Extending sehe's answer: you can use std::multimap in the same way if you may have duplicate keys. This container also keeps the <K,V> pairs sorted by key, so a binary-search-style lookup speeds things up.
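With duplicate keys, the increment would presumably apply to every matching element. A sketch using equal_range, with add_to_all as a hypothetical name:

```cpp
#include <cstddef>
#include <map>

// Adds `delta` to every value stored under `key` and returns how many
// elements were updated.
std::size_t add_to_all(std::multimap<int, float>& m, int key, float delta) {
    auto range = m.equal_range(key);  // [first match, one past last match)
    std::size_t updated = 0;
    for (auto it = range.first; it != range.second; ++it) {
        it->second += delta;
        ++updated;
    }
    return updated;
}
```

equal_range locates the block of matching keys in logarithmic time; the loop then touches only those elements.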
There is no exact answer to your question: it depends.
My first answer is: use std::find_if (available in <algorithm>, part of the C++ Standard Library), then profile your code. If the search turns out to be a bottleneck worthy of concern, then try another approach.
Beware of using a std::map, as it will sort the pairs by their first component (that is, the insertion order will be lost). In addition, it will not allow you to store two pairs with the same first component.
As others have mentioned, you can work around these caveats (if they are indeed caveats to your problem), but, like I mentioned before, it would only be worth your while if you demonstrate first that the search turned out to be a bottleneck after using the standard algorithms.