Coming back to C++ after years of C#, I was wondering what the modern (read: C++11) way of filtering an array would be, i.e. how can we achieve something similar to this LINQ query:
var filteredElements = elements.Where(elm => elm.filterProperty == true);
In order to filter a vector of elements (strings for the sake of this question)?
I sincerely hope the old STL style algorithms (or even extensions like boost::filter_iterator) requiring explicit methods to be defined are superseded by now?
See the example from cplusplus.com for std::copy_if:
std::vector<int> foo = {25,15,5,-5,-15};
std::vector<int> bar;
// copy only positive numbers:
std::copy_if (foo.begin(), foo.end(), std::back_inserter(bar), [](int i){return i>=0;} );
std::copy_if evaluates the lambda expression for every element in foo here and if it returns true it copies the value to bar.
The std::back_inserter allows us to actually insert new elements at the end of bar (using push_back()) with an iterator without having to resize it to the required size first.
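To tie this back to the LINQ query in the question, here is a minimal sketch of the same copy_if pattern applied to objects with a boolean property. The Element type and the where_flagged name are made up for illustration; they are not from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical element type, standing in for the C# class in the question.
struct Element {
    std::string name;
    bool filterProperty;
};

// Returns copies of the elements whose filterProperty is set,
// much like elements.Where(elm => elm.filterProperty) in LINQ.
std::vector<Element> where_flagged(const std::vector<Element>& elements) {
    std::vector<Element> result;
    std::copy_if(elements.begin(), elements.end(), std::back_inserter(result),
                 [](const Element& elm) { return elm.filterProperty; });
    return result;
}
```

Unlike LINQ's Where, this is eager: the whole result vector is built as soon as where_flagged is called.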
In C++20, use filter view from the ranges library: (requires #include <ranges>)
// namespace views = std::ranges::views;
vec | views::filter([](int a){ return a % 2 == 0; })
lazily returns the even elements in vec.
(See [range.adaptor.object]/4 and [range.filter])
This is already supported by GCC 10. For Clang and older versions of GCC, the original range-v3 library can be used too, with #include <range/v3/view/filter.hpp> (or #include <range/v3/all.hpp>) and the ranges::views namespace instead of std::ranges::views.
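A more efficient approach, if you don't actually need a new copy of the list, is std::remove_if followed by the container's erase (the erase-remove idiom), which removes the elements from the original container. Note that std::remove_if on its own only moves the kept elements to the front and returns the new logical end; it does not change the container's size.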
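A minimal sketch of the in-place approach, assuming a std::vector<int> and a predicate that drops negative numbers (the drop_negatives name is only illustrative):

```cpp
#include <algorithm>
#include <vector>

// Removes every negative value from v in place (no second container).
void drop_negatives(std::vector<int>& v) {
    // std::remove_if moves the kept elements to the front and returns an
    // iterator to the new logical end; erase() then shrinks the vector.
    v.erase(std::remove_if(v.begin(), v.end(), [](int i) { return i < 0; }),
            v.end());
}
```

Calling it on {25, 15, 5, -5, -15} leaves {25, 15, 5} in the original vector, with the relative order of the kept elements preserved.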
I think Boost.Range deserves a mention too. The resulting code is pretty close to the original:
#include <boost/range/adaptors.hpp>
// ...
using boost::adaptors::filtered;
auto filteredElements = elements | filtered([](decltype(elements)::value_type const& elm)
{ return elm.filterProperty == true; });
The only downside is having to explicitly declare the lambda's parameter type. I used decltype(elements)::value_type because it avoids having to spell out the exact type, and also adds a grain of genericity. Alternatively, with C++14's polymorphic lambdas, the type could be simply specified as auto:
auto filteredElements = elements | filtered([](auto const& elm)
{ return elm.filterProperty == true; });
filteredElements would be a range, suitable for traversal, but it's basically a view of the original container. If what you need is another container filled with copies of the elements satisfying the criteria (so that it's independent from the lifetime of the original container), it could look like:
using std::back_inserter; using boost::copy; using boost::adaptors::filtered;
decltype(elements) filteredElements;
copy(elements | filtered([](decltype(elements)::value_type const& elm)
{ return elm.filterProperty == true; }), back_inserter(filteredElements));
Improved pjm code following underscore-d suggestions:
template <typename Cont, typename Pred>
Cont filter(const Cont &container, Pred predicate) {
Cont result;
std::copy_if(container.begin(), container.end(), std::back_inserter(result), predicate);
return result;
}
Usage:
std::vector<int> myVec = {1,4,7,8,9,0};
auto filteredVec = filter(myVec, [](int a) { return a > 5; });
My suggestion for C++ equivalent of C#
var filteredElements = elements.Where(elm => elm.filterProperty == true);
Define a template function to which you pass a lambda predicate to do the filtering. The template function returns the filtered result. eg:
template<typename T>
vector<T> select_T(const vector<T>& inVec, function<bool(const T&)> predicate)
{
vector<T> result;
copy_if(inVec.begin(), inVec.end(), back_inserter(result), predicate);
return result;
}
To use it, here are some trivial examples:
std::vector<int> mVec = {1,4,7,8,9,0};
// keep only values > 5
auto gtFive = select_T<int>(mVec, [](auto a) {return (a > 5); });
// or > target
int target = 5;
auto gt = select_T<int>(mVec, [target](auto a) {return (a > target); });
Related
I have a third-party function with this signature:
std::vector<T> f(T t);
I also have an existing potentially infinite range (of the range-v3 sort) of T named src. I want to create a pipeline that maps f to all elements of that range and flattens all the vectors into a single range with all their elements.
Instinctively, I would write the following.
auto rng = src | view::transform(f) | view::join;
However, this didn't use to work, because we couldn't create views of temporary containers.
UPDATE: This issue has been patched by this commit.
How does range-v3 support such a range pipeline?
It looks like there are now test cases in the range-v3 library that show how to do this correctly. It is necessary to add the views::cache1 operator into the pipeline:
auto rng = views::iota(0,4)
| views::transform([](int i) {return std::string(i, char('a'+i));})
| views::cache1
| views::join('-');
check_equal(rng, {'-','b','-','c','c','-','d','d','d'});
CPP_assert(input_range<decltype(rng)>);
CPP_assert(!range<const decltype(rng)>);
CPP_assert(!forward_range<decltype(rng)>);
CPP_assert(!common_range<decltype(rng)>);
so the solutions for the OP's question would be to write
auto rng = src | views::transform(f) | views::cache1 | views::join;
range-v3 forbids views over temporary containers to help us avoid the creation of dangling iterators. Your example demonstrates exactly why this rule is necessary in view compositions:
auto rng = src | view::transform(f) | view::join;
If view::join were to store the begin and end iterators of the temporary vector returned by f, they would be invalidated before ever being used.
"That's all great, Casey, but why don't range-v3 views store temporary ranges like this internally?"
Because performance. Much like how the performance of the STL algorithms is predicated on the requirement that iterator operations are O(1), the performance of view compositions is predicated on the requirement that view operations are O(1). If views were to store temporary ranges in internal containers "behind your back" then the complexity of view operations - and hence compositions - would become unpredictable.
"Ok, fine. Given that I understand all of this wonderful design, how do I MAKE THIS WORK?!??"
Since the view composition won't store the temporary ranges for you, you need to dump them into some kind of storage yourself, e.g.:
#include <iostream>
#include <vector>
#include <range/v3/range_for.hpp>
#include <range/v3/utility/functional.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/join.hpp>
#include <range/v3/view/transform.hpp>
using T = int;
std::vector<T> f(T t) { return std::vector<T>(2, t); }
int main() {
std::vector<T> buffer;
auto store = [&buffer](std::vector<T> data) -> std::vector<T>& {
return buffer = std::move(data);
};
auto rng = ranges::view::ints
| ranges::view::transform(ranges::compose(store, f))
| ranges::view::join;
unsigned count = 0;
RANGES_FOR(auto&& i, rng) {
if (count) std::cout << ' ';
else std::cout << '\n';
count = (count + 1) % 8;
std::cout << i << ',';
}
}
Note that the correctness of this approach depends on the fact that view::join is an input range and therefore single-pass.
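The same idea can be sketched without range-v3 at all. The class below is a hypothetical, library-free illustration (FlatMapReader and its members are invented names): the consumer owns a buffer holding the vector most recently returned by f, and the traversal is strictly single-pass, which is exactly the property the note above relies on.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-pass "flat map" over an unbounded source.
// It owns the vector most recently returned by f, so nothing dangles.
class FlatMapReader {
public:
    FlatMapReader(std::function<int()> next_source,
                  std::function<std::vector<int>(int)> f)
        : next_source_(std::move(next_source)), f_(std::move(f)) {}

    // Returns the next flattened element, refilling the buffer on demand.
    int next() {
        while (pos_ == buffer_.size()) {   // buffer exhausted (or still empty)
            buffer_ = f_(next_source_());  // store the temporary ourselves
            pos_ = 0;
        }
        return buffer_[pos_++];
    }

private:
    std::function<int()> next_source_;
    std::function<std::vector<int>(int)> f_;
    std::vector<int> buffer_;
    std::size_t pos_ = 0;
};
```

f is only called when the previous vector has been fully consumed, so an infinite source is fine. One caveat of this sketch: if f kept returning empty vectors forever, next() would never return.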
"This isn't novice-friendly. Heck, it isn't expert-friendly. Why isn't there some kind of support for 'temporary storage materialization™' in range-v3?"
Because we haven't gotten around to it - patches welcome ;)
I suspect it just can't. None of the views have any machinery to store temporaries anywhere - that's explicitly against the concept of view from the docs:
A view is a lightweight wrapper that presents a view of an underlying sequence of elements in some custom way without mutating or copying it. Views are cheap to create and copy, and have non-owning reference semantics.
So in order for that join to work and outlive the expression, something somewhere has to hold onto those temporaries. That something could be an action. This would work:
auto rng = src | view::transform(f) | action::join;
except obviously not for src being infinite, and even for finite src probably adds too much overhead for you to want to use anyway.
You would probably have to copy/rewrite view::join to instead use some subtly modified version of view::all (required here) that instead of requiring an lvalue container (and returning an iterator pair into it), allowed for an rvalue container that it would store internally (and returning an iterator pair into that stored version). But that's several hundred lines' worth of copying code, so seems pretty unsatisfactory, even if that works.
Edited
Apparently, the code below violates the rule that views cannot own data they refer to. (However, I don't know if it's strictly forbidden to write something like this.)
I use ranges::view_facade to create a custom view. It holds a vector returned by f (one at a time), changing it to a range. This makes it possible to use view::join on a range of such ranges. Certainly, we can't have a random or bidirectional access to elements (but view::join itself degrades a range to an Input range), nor can we assign to them.
I copied struct MyRange from Eric Niebler's repository modifying it slightly.
#include <iostream>
#include <range/v3/all.hpp>
using namespace ranges;
std::vector<int> f(int i) {
return std::vector<int>(static_cast<size_t>(i), i);
}
template<typename T>
struct MyRange: ranges::view_facade<MyRange<T>> {
private:
friend struct ranges::range_access;
std::vector<T> data;
struct cursor {
private:
typename std::vector<T>::const_iterator iter;
public:
cursor() = default;
cursor(typename std::vector<T>::const_iterator it) : iter(it) {}
T const & get() const { return *iter; }
bool equal(cursor const &that) const { return iter == that.iter; }
void next() { ++iter; }
// Don't need those for an InputRange:
// void prev() { --iter; }
// std::ptrdiff_t distance_to(cursor const &that) const { return that.iter - iter; }
// void advance(std::ptrdiff_t n) { iter += n; }
};
cursor begin_cursor() const { return {data.begin()}; }
cursor end_cursor() const { return {data.end()}; }
public:
MyRange() = default;
explicit MyRange(const std::vector<T>& v) : data(v) {}
explicit MyRange(std::vector<T>&& v) noexcept : data (std::move(v)) {}
};
template <typename T>
MyRange<T> to_MyRange(std::vector<T> && v) {
return MyRange<T>(std::forward<std::vector<T>>(v));
}
int main() {
auto src = view::ints(1); // infinite list
auto rng = src | view::transform(f) | view::transform(to_MyRange<int>) | view::join;
for_each(rng | view::take(42), [](int i) {
std::cout << i << ' ';
});
}
// Output:
// 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9
Compiled with gcc 5.3.0.
The problem here of course is the whole idea of a view - a non-storing layered lazy evaluator. To keep up with this contract, views have to pass around references to range elements, and in general they can handle both rvalue and lvalue references.
Unfortunately in this specific case view::transform can only provide an rvalue reference as your function f(T t) returns a container by value, and view::join expects an lvalue as it tries to bind views (view::all) to inner containers.
Possible solutions will all introduce some kind of temporary storage somewhere into the pipeline. Here are the options I came up with:
1. Create a version of view::all that can internally store a container passed by an rvalue reference (as suggested by Barry). From my point of view, this violates the "non-storing view" concept and also requires some painful template coding, so I would suggest against this option.
2. Use a temporary container for the whole intermediate state after the view::transform step. This can be done either by hand:
auto rng1 = src | view::transform(f);
vector<vector<T>> temp = rng1;
auto rng = temp | view::join;
Or using action::join. This would result in "premature evaluation", will not work with infinite src, will waste some memory, and overall has a completely different semantics from your original intention, so that is hardly a solution at all, but at least it complies with view class contracts.
3. Wrap temporary storage around the function you pass into view::transform. The simplest example is
const std::vector<T>& f_store(const T& t)
{
static std::vector<T> temp;
temp = f(t);
return temp;
}
and then pass f_store to the view::transform. As f_store returns an lvalue reference, view::join will not complain now.
This of course is somewhat of a hack and will only work if you then streamline the whole range into some sink, like an output container. I believe it will withstand some straightforward transformations, like view::replace or more view::transforms, but anything more complex can try to access this temp storage in non-straightforward order.
In that case other types of storage can be used, e.g. std::map will fix that problem and will still allow infinite src and lazy evaluation at the expense of some memory:
const std::vector<T>& fc(const T& t)
{
static std::map<T, std::vector<T>> smap;
smap[t] = f(t);
return smap[t];
}
If your f function is stateless, this std::map can also be used to potentially save some calls. This approach can possibly be improved further if there is a way to guarantee that an element will no longer be required and remove it from the std::map to conserve memory. This however depends on further steps of the pipeline and the evaluation.
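A minimal sketch of the "save some calls" variant, assuming T is int and that f is a pure function. The f defined here is only a stand-in for the third-party function, and f_cached is an invented name. Unlike fc above, it calls f at most once per distinct argument.

```cpp
#include <map>
#include <vector>

// Stand-in for the third-party function from the question (assumption).
std::vector<int> f(int t) { return std::vector<int>(2, t); }

// Calls f at most once per distinct argument and returns a stable reference.
// References into a std::map stay valid when other keys are inserted.
const std::vector<int>& f_cached(int t) {
    static std::map<int, std::vector<int>> cache;
    auto it = cache.find(t);
    if (it == cache.end())
        it = cache.emplace(t, f(t)).first;  // first request for this key
    return it->second;
}
```

The cache grows without bound and the function-local static is shared by every caller, so this is only reasonable when the set of distinct inputs is small enough to keep in memory.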
As these 3 solutions pretty much cover all the places to introduce temporary storage between view::transform and view::join, I think these are all the options you have. I would suggest going with #3 as it will allow you to keep the overall semantics intact and it is quite simple to implement.
UPDATE
range-v3 now has views::cache1, a view that caches the most recent element in the view object itself and returns a reference to that object. That is how this problem is cleanly and efficiently solved today, as pointed out by user @bradgonesurfing in his answer.
Old, out-of-date answer below, preserved for historical curiosity.
This is another solution that doesn't require much fancy hacking. It comes at the cost of a call to std::make_shared at each call to f. But you're allocating and populating a container in f anyway, so maybe this is an acceptable cost.
#include <range/v3/core.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/transform.hpp>
#include <range/v3/view/join.hpp>
#include <vector>
#include <iostream>
#include <memory>
std::vector<int> f(int i) {
return std::vector<int>(3u, i);
}
template <class Container>
struct shared_view : ranges::view_interface<shared_view<Container>> {
private:
std::shared_ptr<Container const> ptr_;
public:
shared_view() = default;
explicit shared_view(Container &&c)
: ptr_(std::make_shared<Container const>(std::move(c)))
{}
ranges::range_iterator_t<Container const> begin() const {
return ranges::begin(*ptr_);
}
ranges::range_iterator_t<Container const> end() const {
return ranges::end(*ptr_);
}
};
struct make_shared_view_fn {
template <class Container,
CONCEPT_REQUIRES_(ranges::BoundedRange<Container>())>
shared_view<std::decay_t<Container>> operator()(Container &&c) const {
return shared_view<std::decay_t<Container>>{std::forward<Container>(c)};
}
};
constexpr make_shared_view_fn make_shared_view{};
int main() {
using namespace ranges;
auto rng = view::ints | view::transform(compose(make_shared_view, f)) | view::join;
RANGES_FOR( int i, rng ) {
std::cout << i << '\n';
}
}
I am trying the code below using ranges, but it doesn't work.
// Code
std::map<int, std::string> m{ {1,"foo"},{42,"bar"},{7,"baz"} };
std::vector<int> keys;
// without using ranges
std::transform(begin(m), end(m), std::back_inserter(keys), [](auto val)
{
return val.first;
});
This works fine. But
// with using ranges
ranges::transform(m,std::back_inserter(keys), [](auto val)
{
return val.first;
});
this does not work with ranges. Why?
I am using MSVC 2017 15.9.14
range-v3 doesn't support std::back_insert_iterator here because it doesn't satisfy the library's Iterator concept; see this issue. As pointed out in the discussion, this is supposed to be fixed with C++20.
You can fix this by either
keys.resize(m.size());
ranges::transform(m, keys.begin(), [](auto val) { return val.first; });
or, in my opinion preferable (as you can make keys const):
const std::vector<int> keys = m |
ranges::view::transform([](auto val){ return val.first; });
As a side note, consider passing the lambda parameter as a const-qualified reference to avoid unnecessary copies.
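A small sketch of that side note, using plain std::transform (which accepts std::back_inserter) rather than range-v3. The keys_of name is only illustrative.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Collects the keys of m without copying each (key, string) pair.
std::vector<int> keys_of(const std::map<int, std::string>& m) {
    std::vector<int> keys;
    keys.reserve(m.size());
    std::transform(m.begin(), m.end(), std::back_inserter(keys),
                   // value_type is std::pair<const int, std::string>
                   [](const std::map<int, std::string>::value_type& kv) {
                       return kv.first;
                   });
    return keys;
}
```

With [](auto val) each call copies the pair, including its std::string. Taking the element by const reference avoids that copy, and the keys come out in the map's sorted order.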
I have some C++11 code like
std::vector<std::string> names;
std::map<std::string, std::string> first_to_last_name_map;
std::transform(names.begin(), names.end(), std::inserter(first_to_last_name_map, first_to_last_name_map.begin()), [](const std::string& i){
if (i == "bad")
return std::pair<std::string, std::string>("bad", "bad"); // Don't Want This
else
return std::pair<std::string, std::string>(i.substr(0,5), i.substr(5,5));
});
where I'm transforming a vector to a map using std::transform with a lambda function. My problem is that sometimes, as shown, I don't want to return anything from my lambda function, i.e. I basically want to skip that i and go to the next one (without adding anything to the map).
Is there any way to achieve what I'm thinking about? I can use boost if it helps. I want to avoid a solution where I have to do a pre-process or post-process on my vector to filter out the "bad" items; I should only need to look at each item once. Also, my actual logic is a bit more complicated than the if/else as written, so I think it would be nice to keep things encapsulated in this std::transform/lambda model if possible (though maybe what I'm trying to achieve isn't possible with this model).
EDIT: Just to emphasize, I'm looking to perform this operation (selectively processing vector elements and inserting them into a map) in the most efficient way possible, even if it means a less elegant solution or a big rewrite. I could even use a different map data type depending on what is most efficient.
template<class Src, class Sink, class F>
void transform_if(Src&& src, Sink&& sink, F&& f){
for(auto&& x:std::forward<Src>(src))
if(auto&& e=f(decltype(x)(x)))
*sink++ = *decltype(e)(e);
}
Now simply get a boost, std, or std::experimental optional. Have your f return an optional<blah>.
auto sink = std::inserter(first_to_last_name_map, first_to_last_name_map.begin());
using pair_type = decltype(first_to_last_name_map)::value_type;
transform_if(names, sink,
[](const std::string& i)->std::optional<pair_type>{
if (i == "bad")
return {}; // Don't Want This
else
return std::make_pair(i.substr(0,5), i.substr(5,5));
}
);
My personal preferred optional actually has begin and end defined. And we get this algorithm:
template<class Src, class Sink, class F>
void polymap(Src&& src, Sink&& sink, F&& f){
for(auto&& x:std::forward<Src>(src))
for(auto&& e:f(decltype(x)(x)))
*sink++ = decltype(e)(e);
}
which now lets the f return a range, where optional is a model of a zero or one element range.
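std::optional has no begin/end, so as a rough sketch the example below uses a std::vector with zero or one element as the returned range instead. NamePair and build_name_map are invented names, and the per-element vector costs an allocation that an iterable optional would avoid.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// The polymap algorithm from above: f returns a range of zero or more results.
template <class Src, class Sink, class F>
void polymap(Src&& src, Sink&& sink, F&& f) {
    for (auto&& x : std::forward<Src>(src))
        for (auto&& e : f(decltype(x)(x)))
            *sink++ = decltype(e)(e);
}

using NamePair = std::pair<std::string, std::string>;

// Splits each name into two 5-character halves and skips the "bad" entries.
std::map<std::string, std::string> build_name_map(
        const std::vector<std::string>& names) {
    std::map<std::string, std::string> result;
    polymap(names, std::inserter(result, result.begin()),
            [](const std::string& i) -> std::vector<NamePair> {
                if (i == "bad") return {};  // empty range: nothing is inserted
                return {NamePair(i.substr(0, 5), i.substr(5, 5))};
            });
    return result;
}
```

Each input string is visited exactly once, and an entry is only ever created for the names that pass the check.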
You can simply have a first/last pass with std::remove_if. E.g.
std::vector<std::string> names;
std::map<std::string, std::string> first_to_last_name_map;
std::transform(names.begin(),
std::remove_if(names.begin(),
names.end(),
[](const std::string &str){
return str=="bad";
}),
std::inserter(first_to_last_name_map,
first_to_last_name_map.begin()),
[](const std::string& i){
return std::pair<std::string, std::string>(i.substr(0,5), i.substr(5,5));
});
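Note that remove_if does not erase anything: it moves the kept items to the front of names and returns the new logical end, leaving the elements past that iterator in a valid but unspecified state. This approach therefore modifies names.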
You can use boost::adaptors::filtered to first filter the vector of the elements you don't want, before passing it to transform.
using boost::adaptors::filtered;
boost::transform(names | filtered([](std::string const& s) { return s != "bad"; }),
std::inserter(first_to_last_name_map, first_to_last_name_map.begin()),
[](std::string const& i) { return std::make_pair(i.substr(0,5), i.substr(5,5)); });
Suppose I have the following two data structures:
std::vector<int> all_items;
std::set<int> bad_items;
The all_items vector contains all known items and the bad_items set contains the bad items. These two data structures are populated entirely independently of one another.
What's the proper way to write a method that will return a std::vector<int> containing all elements of all_items not in bad_items?
Currently, I have a clunky solution that I think can be done more concisely. My understanding of STL function adapters is lacking. Hence the question. My current solution is:
struct is_item_bad {
    std::set<int> const* bad_items;
    bool operator() (int const i) const {
        // bad_items is a pointer, so use -> to call count
        return bad_items->count(i) > 0;
    }
};
std::vector<int> items() const {
    is_item_bad iib = { &bad_items };
    std::vector<int> good_items(all_items.size());
    // remove_copy_if returns the end of the copied range; trim the unused tail
    good_items.erase(std::remove_copy_if(all_items.begin(), all_items.end(),
                                         good_items.begin(), iib),
                     good_items.end());
    return good_items;
}
Assume all_items, bad_items, is_item_bad and items() are all a part of some containing class. Is there a way to write the items() getter such that:
It doesn't need temporary variables in the method?
It doesn't need the custom functor, struct is_item_bad?
I had hoped to just use the count method on std::set as a functor, but I haven't been able to divine the right way to express that with the remove_copy_if algorithm.
EDIT: Fixed the logic error in items(). The actual code didn't have the problem, it was a transcription error.
EDIT: I have accepted a solution that doesn't use std::set_difference since it is more general and will work even if the std::vector isn't sorted. I chose to use the C++0x lambda expression syntax in my code. My final items() method looks like this:
std::vector<int> items() const {
    std::vector<int> good_items;
    good_items.reserve(all_items.size());
    // bad_items is a member, so the lambda reaches it by capturing this
    std::remove_copy_if(all_items.begin(), all_items.end(),
                        std::back_inserter(good_items),
                        [this] (int const i) {
                            return bad_items.count(i) == 1;
                        });
    return good_items;
}
On a vector of about 8 million items the above method runs in 3.1s. I benchmarked the std::set_difference approach and it ran in approximately 2.1s. Thanks to everyone who supplied great answers.
As jeffamaphone suggested, if you can sort any input vectors, you can use std::set_difference which is efficient and less code:
#include <algorithm>
#include <iterator> // std::back_inserter
#include <set>
#include <vector>
std::vector<int>
get_good_items( std::vector<int> const & all_items,
std::set<int> const & bad_items )
{
std::vector<int> good_items;
// Assumes all_items is sorted.
std::set_difference( all_items.begin(),
all_items.end(),
bad_items.begin(),
bad_items.end(),
std::back_inserter( good_items ) );
return good_items;
}
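One detail worth checking before switching to std::set_difference: besides needing sorted input, it treats the inputs as multisets, so a value that occurs more often in all_items than in bad_items is only partly removed. The sketch below contrasts the two approaches; both function names are hypothetical, and the first one sorts a copy so the caller's vector is left alone.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: sorts a copy of all_items, then applies set_difference.
// A value appearing m times in all_items and once in bad_items is still
// copied m - 1 times.
std::vector<int> good_items_by_set_difference(std::vector<int> all_items,
                                              const std::set<int>& bad_items) {
    std::sort(all_items.begin(), all_items.end());
    std::vector<int> good_items;
    std::set_difference(all_items.begin(), all_items.end(),
                        bad_items.begin(), bad_items.end(),
                        std::back_inserter(good_items));
    return good_items;
}

// Hypothetical helper: predicate-based filter. Keeps the original order and
// drops every occurrence of a bad value.
std::vector<int> good_items_by_predicate(const std::vector<int>& all_items,
                                         const std::set<int>& bad_items) {
    std::vector<int> good_items;
    std::remove_copy_if(all_items.begin(), all_items.end(),
                        std::back_inserter(good_items),
                        [&bad_items](int i) { return bad_items.count(i) != 0; });
    return good_items;
}
```

If all_items never contains duplicates the two give the same set of values (in sorted versus original order); with duplicates they can differ, so which one is "right" depends on what the caller expects.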
Since your function is going to return a vector, you will have to make a new vector (i.e. copy elements) in any case. In which case, std::remove_copy_if is fine, but you should use it correctly:
#include <iostream>
#include <vector>
#include <set>
#include <iterator>
#include <algorithm>
#include <functional>
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
std::vector<int> result;
remove_copy_if(all.begin(), all.end(), back_inserter(result),
[&bad](int i){return bad.count(i)==1;});
return result;
}
int main()
{
std::vector<int> all_items = {4,5,2,3,4,8,7,56,4,2,2,2,3};
std::set<int> bad_items = {2,8,4};
std::vector<int> filtered_items = filter(all_items, bad_items);
copy(filtered_items.begin(), filtered_items.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
To do this in C++98, I guess you could use mem_fun_ref and bind1st to turn set::count into a functor in-line, but there are issues with that (which resulted in deprecation of bind1st in C++0x) which means depending on your compiler, you might end up using std::tr1::bind anyway:
remove_copy_if(all.begin(), all.end(), back_inserter(result),
std::tr1::bind(&std::set<int>::count, bad, std::tr1::placeholders::_1)); // or std::bind and std::placeholders in C++0x
and in any case, an explicit function object would be more readable, I think:
struct IsMemberOf {
const std::set<int>& bad;
IsMemberOf(const std::set<int>& b) : bad(b) {}
bool operator()(int i) const { return bad.count(i)==1;}
};
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
std::vector<int> result;
remove_copy_if(all.begin(), all.end(), back_inserter(result), IsMemberOf(bad));
return result;
}
At the risk of appearing archaic:
std::set<int> badItems;
std::vector<int> items;
std::vector<int> goodItems;
for ( std::vector<int>::iterator iter = items.begin();
iter != items.end();
++iter)
{
int& item = *iter;
if ( badItems.find(item) == badItems.end() )
{
goodItems.push_back(item);
}
}
std::remove_copy_if returns an iterator to the target collection. In this case, it would return good_items.end() (or something similar). good_items goes out of scope at the end of the method, so this would cause some memory errors. You should return good_items or pass in a new vector<int> by reference and then clear, resize, and populate it. This would get rid of the temporary variable.
I believe you have to define the custom functor, because the predicate depends on the bad_items object, and AFAIK you can't supply that without the code getting hacky.