How to make this matching algorithm run faster? - c++

I have two lists of pointers to a data structure X. The algorithm is very simple:
it loops over the first list A and tries to find the first matching element in list B. The requirement is to handle at least 50k elements in each list:
#include <iostream>
#include <memory>
#include <chrono>
#include <vector>
#include <algorithm>
#include <string>
struct X {
std::string field_1;
std::string field_2;
std::string field_3;
std::string field_4;
X(std::string f1, std::string f2, std::string f3, std::string f4)
: field_1(f1)
, field_2(f2)
, field_3(f3)
, field_4(f4)
{};
bool equal(const std::shared_ptr<X>& x) {
return (x->field_1 == field_1) &&
(x->field_2 == field_2) &&
(x->field_3 == field_3) &&
(x->field_4 == field_4);
};
X *match = nullptr;
};
typedef std::shared_ptr<X> X_ptr;
class Timer
{
public:
Timer(std::string name) : beg_(clock_::now()), name_(name) {}
~Timer() {
std::cout << "Elapsed(" << name_ << "): " << elapsed() << std::endl;
}
void reset() { beg_ = clock_::now(); }
double elapsed() const {
return std::chrono::duration_cast<second_>
(clock_::now() - beg_).count();
}
private:
typedef std::chrono::high_resolution_clock clock_;
typedef std::chrono::duration<double, std::ratio<1> > second_;
std::chrono::time_point<clock_> beg_;
std::string name_;
};
std::string random_string(size_t length)
{
auto randchar = []() -> char
{
const char charset[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const size_t max_index = (sizeof(charset) - 1);
return charset[rand() % max_index];
};
std::string str(length, 0);
std::generate_n(str.begin(), length, randchar);
return str;
}
int main()
{
Timer t("main");
std::vector <X_ptr> list_A;
std::vector <X_ptr> list_B;
const int MAX_ELEM = 50000;
list_A.reserve(MAX_ELEM);
list_B.reserve(MAX_ELEM);
{
Timer t("insert");
for (int i = 0; i < MAX_ELEM; i++) {
list_A.push_back(X_ptr(new X{ random_string(2), random_string(2), random_string(2), random_string(2) }));
list_B.push_back(X_ptr(new X{ random_string(2), random_string(2), random_string(2), random_string(2) }));
}
}
{
Timer t("match");
std::for_each(list_A.begin(), list_A.end(), [list_B](X_ptr& a) {
auto found_b = std::find_if(list_B.begin(), list_B.end(), [a](const X_ptr& b) {
return a->equal(b);
});
if (found_b != list_B.end()) {
a->match = found_b->get();
std::cout << "match OK \n";
}
});
}
}
On my machine the program runs extremely slowly:
Elapsed(insert): 0.05566
Elapsed(match): 98.3739
Elapsed(main): 98.452
Would appreciate it if you can think of any other way to optimize it to run faster.

You are using vectors, so each lookup into list_B takes O(n), where n is the number of elements in list_B. The whole algorithm is therefore O(m*n), where m is the number of elements in list_A. If m and n are similar in size, you have an O(n^2) algorithm, which is too slow for any large n. To fix this, convert list_B into an unordered_map (you can do this as part of the algorithm, since the conversion is O(n)) in which each key is an element from list_B and the value is anything, say 0. You can then perform lookups with find() on the map in O(1) time on average. Thus your algorithm becomes O(n), way better than O(n^2).
For example
// Note: the default hash for X_ptr hashes the pointer itself, not the fields of X.
std::unordered_map< X_ptr, int > value_map;
{
Timer t("match");
std::for_each(list_B.begin(), list_B.end(), [&](X_ptr& b) {
value_map[b] = 0;
});
// Capture the map by reference to avoid copying it into the lambda.
std::for_each(list_A.begin(), list_A.end(), [&value_map](X_ptr& a) {
auto found_b = value_map.find( a );
if ( found_b != value_map.end() )
{
a->match = found_b->first.get();
std::cout << "match OK \n";
}
});
}
Your Version:
Elapsed(insert): 0.0758608
Elapsed(match): 182.899
Elapsed(main): 182.991
New Version:
Elapsed(insert): 0.0719907
Elapsed(match): 0.0388562
Elapsed(main): 0.130884
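One caveat worth spelling out: std::hash<std::shared_ptr<X>> hashes the stored pointer, so a container keyed directly by X_ptr only "finds" an element of list_A if the very same object is also in list_B. Below is a minimal sketch of a lookup by content instead, assuming the four string fields of X from the question. XHash, XEqual and match_all are hypothetical names, and the hash-combining formula is just one common choice, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <memory>
#include <string>
#include <tuple>
#include <unordered_set>
#include <vector>

struct X {
    std::string field_1, field_2, field_3, field_4;
    X* match = nullptr;
};
using X_ptr = std::shared_ptr<X>;

// Hash the four fields of X rather than the pointer value (hypothetical helper).
struct XHash {
    std::size_t operator()(const X_ptr& p) const {
        assert(p != nullptr);
        std::size_t h = 0;
        for (const std::string* f : {&p->field_1, &p->field_2, &p->field_3, &p->field_4}) {
            h ^= std::hash<std::string>{}(*f) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Two pointers are equal keys when all four fields compare equal (hypothetical helper).
struct XEqual {
    bool operator()(const X_ptr& a, const X_ptr& b) const {
        return std::tie(a->field_1, a->field_2, a->field_3, a->field_4) ==
               std::tie(b->field_1, b->field_2, b->field_3, b->field_4);
    }
};

// Sets a->match for every element of list_A that has an equal element in list_B
// and returns the number of matches found.
std::size_t match_all(const std::vector<X_ptr>& list_A, const std::vector<X_ptr>& list_B) {
    std::unordered_set<X_ptr, XHash, XEqual> index;
    index.reserve(list_B.size());
    for (const X_ptr& b : list_B) {
        index.insert(b);  // insert() keeps the first of several equal elements
    }
    std::size_t matches = 0;
    for (const X_ptr& a : list_A) {
        auto it = index.find(a);
        if (it != index.end()) {
            a->match = it->get();
            ++matches;
        }
    }
    return matches;
}
```

Because insert() does not overwrite an existing key, the element recorded for a group of equal entries is the one that appears first in list_B, which matches the "first matching element" requirement of the question.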

You may use something like the following:
std::sort(list_B.begin(), list_B.end(), deref_less<X>);
{
Timer t("match");
for (const auto& a : list_A) {
auto it = std::lower_bound(list_B.begin(), list_B.end(), a, deref_less<X>);
if (it != list_B.end() && **it == *a) {
a->match = it->get();
std::cout << "match OK \n";
}
}
}
Live example.
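The snippet relies on a deref_less comparator and on operator== for X, neither of which is shown in the text. Here is a sketch of what they might look like, assuming a lexicographic ordering over the four fields built with std::tie. The definitions of fields, deref_less and match_sorted are my assumptions, not the answer's original code. I use std::stable_sort rather than std::sort so that, among equal elements, the one that came first in list_B is the one lower_bound returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <vector>

struct X {
    std::string field_1, field_2, field_3, field_4;
    X* match = nullptr;
};
using X_ptr = std::shared_ptr<X>;

// Tuple of references to the compared fields (hypothetical helper).
inline auto fields(const X& x) {
    return std::tie(x.field_1, x.field_2, x.field_3, x.field_4);
}
inline bool operator<(const X& a, const X& b) { return fields(a) < fields(b); }
inline bool operator==(const X& a, const X& b) { return fields(a) == fields(b); }

// Compares the pointed-to objects instead of the pointers (assumed definition).
template <typename T>
bool deref_less(const std::shared_ptr<T>& lhs, const std::shared_ptr<T>& rhs) {
    assert(lhs && rhs);
    return *lhs < *rhs;
}

// list_B is taken by value so the caller's order is left untouched.
std::size_t match_sorted(const std::vector<X_ptr>& list_A, std::vector<X_ptr> list_B) {
    std::stable_sort(list_B.begin(), list_B.end(), deref_less<X>);
    std::size_t matches = 0;
    for (const auto& a : list_A) {
        auto it = std::lower_bound(list_B.begin(), list_B.end(), a, deref_less<X>);
        if (it != list_B.end() && **it == *a) {
            a->match = it->get();
            ++matches;
        }
    }
    return matches;
}
```

The sort costs O(n log n) and each lookup O(log n), so the whole match is O((m + n) log n) without needing a hash function.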

Related

Is there any equivalent of Python range() in C++?

I want to use std::for_each to iterate over vector indexes in range [a, b) in parallel, calculate the value of the Weierstrass function and write it to the std::vector:
std::vector<std::array<float, 2>> values(1000);
auto range = /** equivalent of Python range(0, values.size()) **/;
std::for_each(std::execution::par, range.begin(), range.end(), [&](auto &&i) {
values[i][0] = static_cast<float>(i) / resolution;
values[i][1] = weierstrass(a, b, static_cast<float>(i) / resolution);
});
// a, b, and resolution are some constants defined before
// weierstrass() is the Weierstrass function
I have found some solutions on the internet, but all of them require including third-party libraries or creating my own range class. Is there any standard solution for this?
You can use std::views::iota(), which is similar (though not identical) to Python's range(), together with std::ranges::for_each(). Both are available in C++20.
#include <algorithm>
#include <ranges>
#include <iostream>
int main() {
std::ranges::for_each(std::views::iota(1, 10), [](int i) {
std::cout << i << ' ';
});
}
Output:
1 2 3 4 5 6 7 8 9
As noted by @Afshin, std::ranges::for_each() in the code above doesn't support std::execution::par for multi-threaded execution.
To overcome this issue you may use iota with the regular std::for_each() as follows:
#include <algorithm>
#include <ranges>
#include <iostream>
#include <execution>
int main() {
auto range = std::views::iota(1, 10);
std::for_each(std::execution::par, range.begin(), range.end(),
[](int i) {
std::cout << i << ' ';
});
}
Output (the order may vary, since the elements are processed in parallel):
1 2 3 4 5 6 7 8 9
I decided to implement a Range class plus iterator from scratch, modelled on how Python's range() works.
As in Python, you can use it in three ways: Range(stop), Range(start, stop), Range(start, stop, step). All three accept negative values.
To test the correctness of the implementation I filled two unordered sets, one containing all generated values and another containing all used thread ids (to show that it actually ran on multiple CPU cores).
Although I marked my iterator as random access, it is still missing some methods such as the -= and -- operators; these are left for further improvement. It has enough methods for use with std::for_each().
If I made mistakes in the implementation, please comment on my answer with an explanation.
#include <limits>
#include <execution>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <thread>
#include <unordered_set>
#include <string>
#include <sstream>
#include <mutex>
class Range {
public:
Range(ptrdiff_t start_stop, ptrdiff_t stop =
std::numeric_limits<ptrdiff_t>::max(), ptrdiff_t step = 1)
: step_(step) {
if (stop == std::numeric_limits<ptrdiff_t>::max()) {
start_ = 0;
stop_ = start_stop;
} else {
start_ = start_stop;
stop_ = stop;
}
if (step_ >= 0)
stop_ = std::max(start_, stop_);
else
stop_ = std::min(start_, stop_);
if (step_ >= 0)
stop_ = start_ + (stop_ - start_ + step_ - 1) / step_ * step_;
else
// step_ is negative here, so the ceiling division must add |step_| - 1, i.e. -step_ - 1
stop_ = start_ - (start_ - stop_ - step_ - 1) / (-step_) * (-step_);
}
class RangeIter {
public:
using iterator_category = std::random_access_iterator_tag;
using value_type = ptrdiff_t;
using difference_type = ptrdiff_t;
using pointer = ptrdiff_t const *;
using reference = ptrdiff_t const &;
RangeIter() {}
RangeIter(ptrdiff_t start, ptrdiff_t stop, ptrdiff_t step)
: cur_(start), stop_(stop), step_(step) {}
RangeIter & operator += (ptrdiff_t steps) {
cur_ += step_ * steps;
if (step_ >= 0)
cur_ = std::min(cur_, stop_);
else
cur_ = std::max(cur_, stop_);
return *this;
}
RangeIter operator + (ptrdiff_t steps) const {
auto it = *this;
it += steps;
return it;
}
ptrdiff_t operator [] (ptrdiff_t steps) const {
auto it = *this;
it += steps;
return *it;
}
ptrdiff_t operator - (RangeIter const & other) const {
return (cur_ - other.cur_) / step_;
}
RangeIter & operator ++ () {
*this += 1;
return *this;
}
ptrdiff_t const & operator * () const {
return cur_;
}
bool operator == (RangeIter const & other) const {
return cur_ == other.cur_;
}
bool operator != (RangeIter const & other) const {
return !(*this == other);
}
ptrdiff_t cur_ = 0, stop_ = 0, step_ = 0;
};
auto begin() const { return RangeIter(start_, stop_, step_); }
auto end() const { return RangeIter(stop_, stop_, step_); }
private:
ptrdiff_t start_ = 0, stop_ = 0, step_ = 0;
};
int main() {
ptrdiff_t start = 1, stop = 1000000, step = 2;
std::mutex mutex;
std::unordered_set<std::string> threads;
std::unordered_set<ptrdiff_t> values;
auto range = Range(start, stop, step);
std::for_each(std::execution::par, range.begin(), range.end(),
[&](ptrdiff_t i) {
std::unique_lock<std::mutex> lock(mutex);
std::ostringstream ss;
ss << std::this_thread::get_id();
threads.insert(ss.str());
values.insert(i);
});
std::cout << "Threads:" << std::endl;
for (auto const & s: threads)
std::cout << s << std::endl;
{
bool correct = true;
size_t cnt = 0;
for (ptrdiff_t i = start; i < stop; i += step) {
++cnt;
if (!values.count(i)) {
correct = false;
std::cout << "No value: " << i << std::endl;
break;
}
}
if (values.size() != cnt)
std::cout << "Expected amount of values: " << cnt
<< ", actual " << values.size() << std::endl;
std::cout << "Correct values: " << std::boolalpha
<< (correct && (values.size() == cnt)) << std::endl;
}
}
Output:
Threads:
1628
9628
5408
2136
2168
8636
2880
6492
1100
Correct values: true
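The constructor above rounds stop_ to the first value that can be reached from start_ in whole steps, so that comparing against the end iterator with == is exact. The same ceiling-division arithmetic gives the number of elements, like Python's len(range(...)). A small sketch of that arithmetic on its own, assuming a non-zero step; range_length and rounded_stop are hypothetical helper names, not part of the class above:

```cpp
#include <cassert>
#include <cstddef>

// Number of values produced by range(start, stop, step), step != 0.
std::ptrdiff_t range_length(std::ptrdiff_t start, std::ptrdiff_t stop, std::ptrdiff_t step) {
    assert(step != 0);
    if (step > 0) {
        // ceil((stop - start) / step) for an ascending range
        return start < stop ? (stop - start + step - 1) / step : 0;
    }
    // ceil((start - stop) / |step|) for a descending range
    return start > stop ? (start - stop - step - 1) / (-step) : 0;
}

// First value at or past stop that is reachable from start in whole steps.
std::ptrdiff_t rounded_stop(std::ptrdiff_t start, std::ptrdiff_t stop, std::ptrdiff_t step) {
    return start + range_length(start, stop, step) * step;
}
```

For example, range(10, 1, -2) yields 10, 8, 6, 4, 2, so its length is 5 and the rounded stop is 0.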
If the problem is creating a range similar to Python's range(), you can look at https://en.cppreference.com/w/cpp/iterator/iterator and use its example (note that std::iterator itself is deprecated since C++17):
#include <iostream>
#include <algorithm>
template<long FROM, long TO>
class Range {
public:
// member typedefs provided through inheriting from std::iterator
class iterator: public std::iterator<
std::input_iterator_tag, // iterator_category
long, // value_type
long, // difference_type
const long*, // pointer
long // reference
>{
long num = FROM;
public:
explicit iterator(long _num = 0) : num(_num) {}
iterator& operator++() {num = TO >= FROM ? num + 1: num - 1; return *this;}
iterator operator++(int) {iterator retval = *this; ++(*this); return retval;}
bool operator==(iterator other) const {return num == other.num;}
bool operator!=(iterator other) const {return !(*this == other);}
reference operator*() const {return num;}
};
iterator begin() {return iterator(FROM);}
iterator end() {return iterator(TO >= FROM? TO+1 : TO-1);}
};
int main() {
// std::find requires an input iterator
auto range = Range<15, 25>();
auto itr = std::find(range.begin(), range.end(), 18);
std::cout << *itr << '\n'; // 18
// Range::iterator also satisfies range-based for requirements
for(long l : Range<3, 5>()) {
std::cout << l << ' '; // 3 4 5
}
std::cout << '\n';
}
Just as an alternative, you could make each work package carry the necessary information by adding the index you need.
Example:
std::vector<std::pair<size_t, std::array<float, 2>>> values(1000);
for(size_t i = 0; i < values.size(); ++i) values[i].first = i;
std::for_each(std::execution::par, values.begin(), values.end(),
[resolution](auto& p) {
p.second[0] = static_cast<float>(p.first) / resolution;
p.second[1] = weierstrass(a, b, static_cast<float>(p.first) / resolution);
});
Not using indexing on values inside the threaded part like above may prevent false sharing and improve performance. You could also make each work package aligned to prevent false sharing to see if that has an effect on performance.
#include <new>
struct alignas(std::hardware_destructive_interference_size) workpackage {
size_t index;
std::array<float, 2> arr;
};
std::vector<workpackage> values(1000);
for(size_t i = 0; i < values.size(); ++i) values[i].index = i;
std::for_each(std::execution::par, values.begin(), values.end(),
[resolution](auto& wp) {
wp.arr[0] = static_cast<float>(wp.index) / resolution;
wp.arr[1] = weierstrass(a, b, static_cast<float>(wp.index) / resolution);
});
You can write your code in another way and drop any need for range at all like this:
std::vector<std::array<float, 2>> values(1000);
std::for_each(std::execution::par, values.begin(), values.end(), [&](std::array<float, 2>& val) {
auto i = std::distance(&values[0], &val);
val[0] = static_cast<float>(i) / resolution;
val[1] = weierstrass(a, b, static_cast<float>(i) / resolution);
});
I should say that this code is valid if and only if you are using std::for_each, because it is stated that:
Unlike the rest of the parallel algorithms, std::for_each is not allowed to make copies of the elements in the sequence even if they are trivially copyable.
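To make the index-from-address trick easier to try out, here is a sequential sketch of it. It deliberately omits the execution policy so that it builds without a parallel backend, and it takes the function to evaluate as a parameter. fill_values and f are hypothetical names standing in for the question's loop and weierstrass().

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Fills values[i] = { i / resolution, f(i / resolution) } without a separate index range.
template <typename F>
void fill_values(std::vector<std::array<float, 2>>& values, float resolution, F f) {
    assert(resolution != 0.0f);
    const std::array<float, 2>* first = values.data();
    std::for_each(values.begin(), values.end(), [&](std::array<float, 2>& val) {
        // val is a reference into the vector, so its address encodes its index.
        const auto i = static_cast<std::size_t>(&val - first);
        const float x = static_cast<float>(i) / resolution;
        val[0] = x;
        val[1] = f(x);
    });
}
```

This only works because the lambda receives a reference to the element stored in the vector. If the algorithm handed it a copy, the pointer subtraction would be meaningless.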

Get sub-map from std::map by number of elements instead of key using iterator

I have a std::map<std::string, std::vector<std::string>> and I need to perform a threaded task on this map by dividing the map into sub-maps and passing each sub-map to a thread.
With a std::vector<T> I would be able to get a sub-vector pretty easily, by doing this:
#include <cstdint>
#include <vector>
int main(void)
{
size_t off = 0;
size_t num_elms = 100; // Made up value
std::vector<uint8_t> full; // Assume filled with stuff
std::vector<uint8_t> sub(std::begin(full) + off, std::begin(full) + off + num_elms);
off = off + num_elms;
}
However, doing the same with std::map<T1, T2> gives a compilation error.
#include <vector>
#include <map>
#include <string>
int main(void)
{
size_t off = 0;
size_t num_elms = 100;
std::map<std::string, std::vector<std::string>> full;
std::map<std::string, std::vector<std::string>> sub(std::begin(full) + off,
std::begin(full) + off + num_elms);
off = off + num_elms;
}
It is the same with other std::map "types". Which, from what I have gathered, is down to the iterator.
What is possible is to extract the keys and do something similar to this solution:
#include <map>
#include <vector>
#include <string>
#include <iostream>
void print_map(const std::map<std::string, std::vector<std::string>>& _map)
{
for (const auto& [key, value] : _map)
{
std::cout << "key: " << key << "\nvalues\n";
for (const auto& elm : value)
{
std::cout << "\t" << elm << "\n";
}
}
}
void print_keys(const std::vector<std::string>& keys)
{
std::cout << "keys: \n";
for(const auto& key : keys)
{
std::cout << key << "\n";
}
}
int main(void)
{
std::map<std::string, std::vector<std::string>> full;
full["aa"] = {"aa", "aaaa", "aabb"};
full["bb"] = {"bb", "bbbbb", "bbaa"};
full["cc"] = {"cc", "cccc", "ccbb"};
full["dd"] = {"dd", "dd", "ddcc"};
print_map(full);
std::vector<std::string> keys;
for (const auto& [key, value] : full)
{
(void) value;
keys.emplace_back(key);
}
print_keys(keys);
size_t off = 0;
size_t num_elms = 2;
std::map<std::string, std::vector<std::string>> sub1 (full.find(keys.at(off)), full.find(keys.at(off + num_elms)));
off = off + num_elms;
std::map<std::string, std::vector<std::string>> sub2 (full.find(keys.at(off)), full.find(keys.at(off + num_elms -1)));
std::cout << "sub1:\n";
print_map(sub1);
std::cout << "sub2:\n";
print_map(sub2);
}
However, this has the potential to be extremely inefficient, as the map can be really big (10k+ elements).
So, is there a better way to replicate the std::vector approach with std::map?
A slightly different approach would be to use one of the execution policies added in C++17, like std::execution::parallel_policy. In the example below, the instance std::execution::par is used:
#include <execution>
// ...
std::for_each(std::execution::par, full.begin(), full.end(), [](auto& p) {
// Here you are likely using a thread from a built-in thread pool
auto& vec = p.second;
// do work with "vec"
});
With a slight adaptation, you can reasonably easily pass ranges to print_map, and divide up your map by calling std::next on an iterator.
// Minimal range-for support
template <typename Iter>
struct Range {
Range (Iter b, Iter e) : b(b), e(e) {}
Iter b;
Iter e;
Iter begin() const { return b; }
Iter end() const { return e; }
};
// some shorter aliases
using Map = std::map<std::string, std::vector<std::string>>;
using MapView = Range<Map::const_iterator>;
// not necessarily the whole map
void print_map(MapView map) {
for (const auto& [key, value] : map)
{
std::cout << "key: " << key << "\nvalues\n";
for (const auto& elm : value)
{
std::cout << "\t" << elm << "\n";
}
}
}
int main(void)
{
Map full;
full["aa"] = {"aa", "aaaa", "aabb"};
full["bb"] = {"bb", "bbbbb", "bbaa"};
full["cc"] = {"cc", "cccc", "ccbb"};
full["dd"] = {"dd", "dd", "ddcc"};
// can still print the whole map
print_map({ full.begin(), full.end() });
size_t num_elms = 2;
size_t num_full_views = full.size() / num_elms;
std::vector<MapView> views;
auto it = full.begin();
for (size_t i = 0; i < num_full_views; ++i) {
auto next = std::next(it, num_elms);
views.emplace_back(it, next);
it = next;
}
if (it != full.end()) {
views.emplace_back(it, full.end());
}
for (auto view : views) {
print_map(view);
}
}
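The splitting loop above can also be pulled out into a reusable function. A sketch under the same assumptions as the answer (a std::map<std::string, std::vector<std::string>> and a fixed chunk size); split_map and IterPair are hypothetical names, and plain iterator pairs stand in for the Range helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Map = std::map<std::string, std::vector<std::string>>;
using IterPair = std::pair<Map::const_iterator, Map::const_iterator>;

// Splits m into consecutive [begin, end) iterator pairs of at most `chunk` elements.
// No keys or values are copied; the pairs stay valid as long as m is not modified.
std::vector<IterPair> split_map(const Map& m, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<IterPair> parts;
    auto it = m.begin();
    std::size_t remaining = m.size();
    while (remaining > 0) {
        const std::size_t n = std::min(chunk, remaining);
        auto next = std::next(it, static_cast<std::ptrdiff_t>(n));
        parts.emplace_back(it, next);
        it = next;
        remaining -= n;
    }
    return parts;
}
```

std::next on a map iterator walks the tree one node at a time, so producing all parts costs O(size) in total, which should be cheap next to the per-element work each thread does.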
In C++20 (or with another ranges library), this can be simplified with std::ranges::drop_view / std::ranges::take_view.
using MapView = decltype(std::declval<Map&>() | std::ranges::views::drop(0) | std::ranges::views::take(0));
for (size_t i = 0; i < full.size(); i += num_elms) {
views.push_back(full | std::ranges::views::drop(i) | std::ranges::views::take(num_elms));
}

Find element in boost multi_index_container

In my code I need to iterate over all elements and also check, as quickly as possible, whether some element already exists, so I chose a Boost multi_index container, which lets me use a vector-like and an unordered_set-like interface for my class Animal at the same time. The problem is that I can no longer find an element through the unordered_set interface since I changed the key from std::string to std::array<char, 50> and adjusted the code, and I don't know what I am doing wrong.
code:
https://wandbox.org/permlink/dnCaEzYVdXkTFBGo
#include <array>
#include <algorithm>
#include <iostream>
#include <chrono>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <set>
#include <unordered_map>
#include <unordered_set>
#include <memory>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/identity.hpp>
int constexpr elements_size{ 1'000'000 };
struct Animal
{
Animal(std::string name, std::string description, int leg, int age, double maxSpeed) noexcept :
description_{std::move(description)}, leg_{leg}, age_{age}, maxSpeed_{maxSpeed}
{
std::copy(name.begin(), name.end(), name_.data());
}
Animal(std::string const& name, std::string const& description) noexcept :
description_{description}
{
std::copy(name.begin(), name.end(), name_.data());
}
Animal(Animal&& animal) noexcept
{
name_ = name_;
description_ = std::move(animal).description_;
leg_ = animal.leg_;
age_ = animal.age_;
maxSpeed_ = animal.maxSpeed_;
}
Animal(Animal const& animal) noexcept
{
name_ = animal.name_;
description_ = animal.description_;
leg_ = animal.leg_;
age_ = animal.age_;
maxSpeed_ = animal.maxSpeed_;
}
Animal& operator=(Animal&& animal) noexcept
{
name_ = name_;
description_ = std::move(animal).description_;
leg_ = animal.leg_;
age_ = animal.age_;
maxSpeed_ = animal.maxSpeed_;
return *this;
}
Animal& operator=(Animal const& animal) noexcept
{
name_ = animal.name_;
description_ = animal.description_;
leg_ = animal.leg_;
age_ = animal.age_;
maxSpeed_ = animal.maxSpeed_;
return *this;
}
std::array<char, 50> name_;
std::string description_;
int leg_{0};
int age_{0};
double maxSpeed_{0.0};
};
struct Hasher
{
bool print_;
Hasher(bool print = false): print_{print} {}
std::size_t operator()(std::array<char, 50> const& name) const
{
if (print_)
std::cout << "array hash" << std::hash<std::string_view>{}(name.data()) << std::endl;
return std::hash<std::string_view>{}(name.data());
}
std::size_t operator()(std::string const& name) const
{
if (print_)
std::cout << "string hash" << std::hash<std::string_view>{}(name.c_str()) << std::endl;
return std::hash<std::string_view>{}(name.c_str());
}
std::size_t operator()(const char* name) const
{
if (print_)
std::cout << "char hash" << std::hash<std::string_view>{}(name) << std::endl;
return std::hash<std::string_view>{}(name);
}
};
struct KeysComparator
{
bool operator()(std::array<char, 50> const& a1, std::array<char, 50> const& a2) const {return a1 == a2; }
template <typename T>
bool operator()(std::string const& n1, T const& t) const
{
std::cout << "### value.name_" << t.value.name_.data() << ", n1: " << n1 << std::endl;
return n1 == t.value.name_.data();
}
};
template<typename TimePoint>
std::string getElapsedTime(TimePoint const& start, TimePoint const& end)
{
auto micro = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
auto milli = std::chrono::duration_cast<std::chrono::milliseconds>(micro);
auto sec = std::chrono::duration_cast<std::chrono::seconds>(milli);
return {std::to_string(micro.count()) + " µs, " + std::to_string(milli.count()) + " ms, " + std::to_string(sec.count()) + " s"};
}
template<typename TimePoint>
void printStatistics(TimePoint const& emplace_start, TimePoint const& emplace_end, TimePoint const& iterate_start, TimePoint const& iterate_end,
TimePoint const& find_start, TimePoint const& find_end, intmax_t const sum, std::string target)
{
std::cout << "Elapsed time emplace: " << getElapsedTime(emplace_start, emplace_end)
<< " | iterate: " << getElapsedTime(iterate_start, iterate_end)
<< " | find: " << getElapsedTime(find_start, find_end)
<< ", sum:" << sum << " , calculation for " << target << std::endl;
}
void test()
{
using namespace boost::multi_index;
using Animal_multi = multi_index_container<Animal, indexed_by<
random_access<>,
hashed_unique<
composite_key<Animal, member<Animal, std::array<char, 50>, &Animal::name_>>,
composite_key_hash<Hasher>,
composite_key_equal_to<KeysComparator>>
>>;
Animal_multi container;
auto emplace_start = std::chrono::steady_clock::now();
for (auto i = 0; i < elements_size; ++i)
container.emplace_back("the really long name of some animal 12345678910_" + std::to_string(i),
"bla bla bla bla bla bla bla bla bla bla bla bla bla", 4, i, i + 2);
auto emplace_end = std::chrono::steady_clock::now();
intmax_t sum{0};
auto iterate_start = std::chrono::steady_clock::now();
for (auto const& e : container)
sum += e.age_;
auto iterate_end = std::chrono::steady_clock::now();
KeysComparator key_comparator;
Hasher hasher{true};
auto find_start = std::chrono::steady_clock::now();
auto &container_interface = container.get<1>();
auto isSucceeded = container_interface.count("the really long name of some animal 12345678910_" + std::to_string(elements_size-1),
hasher, key_comparator);
if (not isSucceeded)
std::cout << "WARN: Element has not been found." << std::endl;
auto find_end = std::chrono::steady_clock::now();
printStatistics(emplace_start, emplace_end, iterate_start, iterate_end, find_start, find_end, sum, "Animal_multi (boost multi_index)");
}
int main()
{
test();
return 0;
}
There are a number of bugs, for example in the move constructor:
name_ = name_; // oops this does nothing at all
Just follow Rule Of Zero. This will also inform you that std::string copy/assignment are not noexcept.
The name copy should probably be length-limited:
std::copy_n(name.begin(), std::min(name.size(), name_.size()), name_.data());
At this point I notice something that might explain your trouble: you don't NUL-terminate, nor make sure that the array is 0-initialized.
BINGO
Indeed, just a few lines down I spot:
return std::hash<std::string_view>{}(name.data());
That's... UB! Your string_view might contain indeterminate data, but what's worse, you would NEVER have copied the terminating NUL character. So, std::string_view will model a string with indeterminate length which WILL likely exceed 50.
Read here about Nasal Demons (UB)
Such are the perils of skipping standard library types for the old C craft.
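If you want to keep the raw std::array for now, the view has to be built with an explicit, bounded length instead of relying on a terminator that may not be there. A minimal sketch, assuming C++17; bounded_view is a hypothetical helper, not something from the question's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Builds a view over buf that stops at the first NUL, or at N if there is none,
// so it never reads past the end of the array.
template <std::size_t N>
std::string_view bounded_view(const std::array<char, N>& buf) {
    const void* nul = std::memchr(buf.data(), '\0', N);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - buf.data()) : N;
    assert(len <= N);
    return std::string_view(buf.data(), len);
}
```

Hashing and comparing through such a view gives consistent results only if the unused tail of the array was zero-initialized, which is why the constructors below do that first.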
First Dig
So, here's the entirety of the class with equal/better characteristics:
using Name = std::array<char, 50>;
struct Animal {
Animal(std::string_view name, std::string description,
int leg = 0, int age = 0, double maxSpeed = 0) noexcept
: name_{0}, // zero initialize!
description_{std::move(description)},
leg_{leg},
age_{age},
maxSpeed_{maxSpeed}
{
constexpr auto Capacity = std::tuple_size<Name>::value;
constexpr auto MaxLen = Capacity - 1; // reserve NUL char
assert(name.length() < MaxLen);
std::copy_n(name.data(), std::min(name.length(), MaxLen), name_.data());
}
//Animal ( Animal&& animal ) noexcept = default;
//Animal ( Animal const& animal ) = default;
//Animal& operator= ( Animal&& animal ) noexcept = default;
//Animal& operator= ( Animal const& animal ) = default;
Name name_;
std::string description_;
int leg_{0};
int age_{0};
double maxSpeed_{0.0};
};
Improving: FixedString
This just screams for a better Name type. How about, FixedString:
template <size_t N> struct FixedString {
static_assert(N > 1); // require space for NUL char
FixedString(std::string_view s) : data_{0} {
if (s.length() >= N)
throw std::length_error("FixedString");
std::copy_n(s.data(), std::min(s.length(), N - 1), data());
}
std::string_view str() const { return { data(), size() }; }
operator std::string_view() const { return str(); }
auto data() const { return data_.data(); }
auto data() { return data_.data(); }
auto c_str() const { return data_.data(); }
auto c_str() { return data_.data(); }
auto begin() const { return data_.begin(); }
auto end() const { return data_.end(); }
auto begin() { return data_.begin(); }
auto end() { return data_.end(); }
size_t size() const {
auto terminator = std::memchr(data(), 0, data_.max_size());
return terminator
? static_cast<char const*>(terminator) - data()
: data_.max_size();
};
bool operator<(FixedString const& rhs) const { return str() < rhs.str(); }
bool operator==(FixedString const& rhs) const { return str() == rhs.str(); }
bool operator!=(FixedString const& rhs) const { return str() != rhs.str(); }
// optimizations:
bool operator<(std::string_view const& rhs) const { return str() < rhs.substr(0, N-1); }
bool operator==(std::string_view const& rhs) const { return str() == rhs.substr(0, N-1); }
bool operator!=(std::string_view const& rhs) const { return str() != rhs.substr(0, N-1); }
private:
std::array<char, N> data_;
};
Now you can simply
using Name = FixedString<50>;
And all your Names will magically (and safely) convert to and from string views.
using Name = FixedString<50>;
struct Animal {
Animal(std::string_view name, std::string description,
int leg = 0, int age = 0, double maxSpeed = 0) noexcept
: name_{name}, description_{std::move(description)},
leg_{leg}, age_{age}, maxSpeed_{maxSpeed}
{ }
Name name_;
std::string description_;
int leg_{0};
int age_{0};
double maxSpeed_{0.0};
};
Everything Simplifies With The Right Abstraction
This is the most important lesson I think I learned in my programming career: choosing the right abstraction leads to simplicity. Here, we evaporate two messy helpers:
using Hasher = std::hash<std::string_view>;
using KeysComparator = std::equal_to<Name>;
Boom. They do everything you had, but better.
Now, The Missing Element
After simplifying the whole thing to this it should become pretty obvious that a std::array<char, 50> can never correctly contain names longer than 50 characters. Indeed, checking the insertions:
auto emplace_start = Now();
size_t duplicates = 0;
for (auto i = 0; i < elements_size; ++i) {
auto [_, ok] = container.emplace_back(
make_name(i), "bla bla bla bla bla bla bla bla bla bla bla bla bla",
4, i, i + 2);
if (!ok) ++duplicates;
}
if (duplicates) {
std::cerr << "Oops, " << duplicates << " duplicate keys not inserted\n";
}
auto emplace_end = Now();
Reveals that:
Oops, 999990 duplicate keys not inserted
Elapsed time emplace: 116.491ms iterate: 0.000145ms find: 0.000597ms, sum:45 , calculation for Animal_multi (boost multi_index)
At least, now you replaced Undefined Behaviour with constraint checks.
Of course, just increasing the name capacity fixes it (https://wandbox.org/permlink/6AamJfXe76nYALfR):
using Name = FixedString<60>;
Prints:
Elapsed time emplace: 594.475ms iterate: 18.6076ms find: 0.003138ms, sum:499999500000 , calculation for Animal_multi (boost multi_index)
Alternatively you can throw on Name construction with an overly long name:
FixedString(std::string_view s) : data_{0} {
if (s.length() >= N)
throw std::length_error("FixedString");
std::copy_n(s.data(), std::min(s.length(), N - 1), data());
}
Which duly prints
terminate called after throwing an instance of 'std::length_error'
what(): FixedString
Full Listing
This demo uses FixedString<60> to avoid the key errors:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index/member.hpp>
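#include <algorithm>
#include <array>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>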
using namespace std::chrono_literals;
int constexpr elements_size{ 1'000'000 };
template <size_t N> struct FixedString {
static_assert(N > 1); // require space for NUL char
FixedString(std::string_view s) : data_{0} {
if (s.length() >= N)
throw std::length_error("FixedString");
std::copy_n(s.data(), std::min(s.length(), N - 1), data());
}
std::string_view str() const { return { data(), size() }; }
operator std::string_view() const { return str(); }
auto data() const { return data_.data(); }
auto data() { return data_.data(); }
auto c_str() const { return data_.data(); }
auto c_str() { return data_.data(); }
auto begin() const { return data_.begin(); }
auto end() const { return data_.end(); }
auto begin() { return data_.begin(); }
auto end() { return data_.end(); }
size_t size() const {
auto terminator = std::memchr(data(), 0, data_.max_size());
return terminator
? static_cast<char const*>(terminator) - data()
: data_.max_size();
};
bool operator<(std::string_view const& rhs) const { return str() < rhs.substr(0, N-1); }
bool operator==(std::string_view const& rhs) const { return str() == rhs.substr(0, N-1); }
bool operator!=(std::string_view const& rhs) const { return str() != rhs.substr(0, N-1); }
bool operator<(FixedString const& rhs) const { return str() < rhs.str(); }
bool operator==(FixedString const& rhs) const { return str() == rhs.str(); }
bool operator!=(FixedString const& rhs) const { return str() != rhs.str(); }
private:
std::array<char, N> data_;
};
using Name = FixedString<60>;
struct Animal {
Animal(std::string_view name, std::string description,
int leg = 0, int age = 0, double maxSpeed = 0) noexcept
: name_{name}, description_{std::move(description)},
leg_{leg}, age_{age}, maxSpeed_{maxSpeed}
{ }
Name name_;
std::string description_;
int leg_{0};
int age_{0};
double maxSpeed_{0.0};
};
using Hasher = std::hash<std::string_view>;
using KeysComparator = std::equal_to<Name>;
using Clock = std::chrono::steady_clock;
using Duration = Clock::duration;
static auto Now = Clock::now;
void printStatistics(Duration emplace, Duration iterate, Duration find,
intmax_t const sum, std::string target)
{
std::cout << "Elapsed time"
<< " emplace: " << (emplace/1.0ms) << "ms"
<< " iterate: " << (iterate/1.0ms) << "ms"
<< " find: " << (find/1.0ms) << "ms"
<< ", sum:" << sum
<< " , calculation for " << target
<< std::endl;
}
void test() {
namespace bmi = boost::multi_index;
using Animal_multi = bmi::multi_index_container<Animal,
bmi::indexed_by<
bmi::random_access<>,
bmi::hashed_unique<
bmi::tag<struct by_name>,
bmi::member<Animal, Name, &Animal::name_>, Hasher, KeysComparator>
>
>;
Animal_multi container;
auto make_name = [](size_t id) {
return "the really long name of some animal 12345678910_" + std::to_string(id);
};
auto emplace_start = Now();
size_t duplicates = 0;
for (auto i = 0; i < elements_size; ++i) {
auto [_, ok] = container.emplace_back(
make_name(i), "bla bla bla bla bla bla bla bla bla bla bla bla bla",
4, i, i + 2);
if (!ok) ++duplicates;
}
if (duplicates) {
std::cerr << "Oops, " << duplicates << " duplicate keys not inserted\n";
}
auto emplace_end = Now();
intmax_t sum{ 0 };
auto iterate_start = Now();
for (auto const& e : container) {
sum += e.age_;
}
auto iterate_end = Now();
auto find_start = Now();
{
auto& name_idx = container.get<by_name>();
auto last_key = make_name(elements_size - 1);
if (name_idx.count(std::string_view(last_key)) == 0u) {
std::cout << "WARN: Element has not been found." << std::endl;
}
}
auto find_end = Now();
printStatistics(
emplace_end - emplace_start,
iterate_end - iterate_start,
find_end - find_start, sum,
"Animal_multi (boost multi_index)");
}
int main() { test(); }

How do I move items from a boost::variant to a multimap?

I'd like to improve the performance of PickPotatoes in the code below by moving instead of copying, but I can't figure out how to do that with insert and a boost::variant. In my actual use case, parsing the data takes about 75% of the time and the real version of PickPotatoes takes about 25%, due to some slow copies. Improving PickPotatoes should bring that down. Is it possible to move something out of a boost::variant and improve PickPotatoes?
#include <cstdlib>
#include <map>
#include "boost/variant.hpp"
#include <string>
#include <utility>
#include <vector>
#include <functional>
struct tuber
{
int z;
std::vector<double> r;
};
int getZ(const tuber& t)
{
return t.z;
}
boost::variant<std::string, tuber> GrowPotato()
{
int z = std::rand() / (RAND_MAX / 10);
if (z < 2)
{
return "BAD POTATO";
}
else
{
tuber ret;
ret.z = z;
ret.r.resize(10000);
for (int i = 0;i < 10000;++i)
{
ret.r[i] = std::rand() / (RAND_MAX / 50);
}
return ret;
}
}
std::vector<boost::variant<std::string,tuber>> GrowPotatoes(int n)
{
std::vector<boost::variant<std::string, tuber>> ret;
ret.resize(n);
for (int i = 0; i < n; ++i)
{
ret[i] = GrowPotato();
}
return ret;
}
//could make this more efficient.
std::pair<std::vector<std::string>,std::multimap<int, tuber>> PickPotatoes(std::vector <boost::variant<std::string, tuber>> result)
{
std::pair<std::vector<std::string>,std::multimap<int,tuber>> ret;
int numTypTwo = 0;
for (const auto& item : result)
{
numTypTwo += item.which();
}
ret.first.resize(result.size() - numTypTwo);
int fstSpot = 0;
for (int i = 0; i < result.size();++i)
{
if (result[i].which())
{
ret.second.insert(std::pair<int, tuber>(getZ(boost::get<tuber>(result[i])), boost::get<tuber>(result[i])));
}
else
{
ret.first[fstSpot++] = std::move(boost::get<std::string>(result[i]));
}
}
return ret;
}
int main()
{
std::srand(0);
std::vector<boost::variant<std::string, tuber>> q= GrowPotatoes(5000);
std::pair<std::vector<std::string>, std::multimap<int, tuber>> z = PickPotatoes(q);
return 0;
}
The simplest win would be to move the parameter value:
std::pair<std::vector<std::string>, std::multimap<int, tuber>> z = PickPotatoes(std::move(q));
Indeed, in my benchmarks that alone gains roughly 14%. Beyond that, it depends heavily on what the data means and how it is going to be used.
Focus on reducing allocations: use a non-node-based container if you can (e.g. boost::container::flat_multimap), sort explicitly, use string_view, and parse directly into the desired data structure instead of an intermediate one.
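As an illustration of the "sort explicitly" idea, here is a minimal sketch that assumes all tubers are collected before any lookup happens. It replaces the multimap with one contiguous vector sorted by z. The names SortedTubers, sort_by_z and tubers_with_z are made up for this example and are not part of the original code; whether this is actually faster would need measuring on the real data.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct tuber {
    int z;
    std::vector<double> r;
};

// Hypothetical replacement for std::multimap<int, tuber>:
// one contiguous buffer, kept sorted by z.
using SortedTubers = std::vector<tuber>;

// Takes ownership of the tubers and sorts them once by z.
// stable_sort keeps equal keys in insertion order, as multimap does.
SortedTubers sort_by_z(std::vector<tuber>&& tubers) {
    SortedTubers out = std::move(tubers);
    std::stable_sort(out.begin(), out.end(),
                     [](const tuber& a, const tuber& b) { return a.z < b.z; });
    return out;
}

// Rough equivalent of multimap::equal_range: all tubers with the given z.
std::pair<SortedTubers::const_iterator, SortedTubers::const_iterator>
tubers_with_z(const SortedTubers& sorted, int z) {
    // Heterogeneous comparator: the algorithm compares a tuber against an int key.
    struct ByZ {
        bool operator()(const tuber& t, int key) const { return t.z < key; }
        bool operator()(int key, const tuber& t) const { return key < t.z; }
    };
    return std::equal_range(sorted.begin(), sorted.end(), z, ByZ{});
}
```

Sorting moves tuber objects around, but each move only transfers ownership of the inner vector's buffer, so the 10000 doubles are never copied.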
BONUS
I was able to shave off about 30% using:
std::pair<std::vector<std::string>, std::multimap<int, tuber> >
PickPotatoes(std::vector<boost::variant<std::string, tuber> >&& result) {
std::pair<std::vector<std::string>, std::multimap<int, tuber> > ret;
ret.first.reserve(result.size());
struct Vis {
using result_type = void;
void operator()(std::string& s) const {
first.emplace_back(std::move(s));
}
void operator()(tuber& tbr) const {
second.emplace(tbr.z, std::move(tbr));
}
std::vector<std::string>& first;
std::multimap<int, tuber>& second;
} visitor { ret.first, ret.second };
for (auto& element : result) {
boost::apply_visitor(visitor, element);
}
return ret;
}
This uses emplace, avoids the repeated get<> calls, and avoids the extra loop that computed the size of the first vector.
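For readers without Boost, roughly the same visitor can be sketched with C++17 std::variant and std::visit. This is a stand-in, not the boost::variant code above: the Potato and Picked aliases are hypothetical, and the behaviour is only assumed to match for this simple case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

struct tuber {
    int z;
    std::vector<double> r;
};

using Potato = std::variant<std::string, tuber>;  // stand-in for boost::variant
using Picked = std::pair<std::vector<std::string>, std::multimap<int, tuber>>;

// Consumes the input: every alternative is moved out of its variant.
Picked PickPotatoes(std::vector<Potato>&& result) {
    Picked ret;
    ret.first.reserve(result.size());
    for (auto& element : result) {
        std::visit(
            [&ret](auto& value) {
                using T = std::decay_t<decltype(value)>;
                if constexpr (std::is_same_v<T, std::string>) {
                    ret.first.emplace_back(std::move(value));
                } else {
                    const int key = value.z;  // read the key before moving
                    ret.second.emplace(key, std::move(value));
                }
            },
            element);
    }
    return ret;
}
```

One way to check that nothing was copied is to compare the address of a tuber's r buffer before and after the call: a moved std::vector hands over its existing allocation.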

crash when using upper_bound in C++

I have the following program, which crashes at the upper_bound call, and I don't understand why. What could be causing the crash? Thanks for your help and time.
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
enum quality { good = 0, bad, uncertain };
struct sValue {
int time;
int value;
int qual;
};
struct CompareLowerBoundValueAndTime {
bool operator()( const sValue& v, int time ) const {
return v.time < time;
}
bool operator()( const sValue& v1, const sValue& v2 ) const {
return v1.time < v2.time;
}
bool operator()( int time1, int time2 ) const {
return time1 < time2;
}
bool operator()( int time, const sValue& v ) const {
return time < v.time;
}
};
struct CompareUpperBoundValueAndTime {
bool operator()( const sValue& v, int time ) const {
return v.time > time;
}
bool operator()( const sValue& v1, const sValue& v2 ) const {
return v1.time > v2.time;
}
bool operator()( int time1, int time2 ) const {
return time1 > time2;
}
bool operator()( int time, const sValue& v ) const {
return time > v.time;
}
};
class MyClass {
public:
MyClass() {
InsertValues();
}
void InsertValues();
int GetLocationForTime(int time);
void PrintValueContainer();
private:
vector<sValue> valueContainer;
};
void MyClass::InsertValues() {
for(int num = 0; num < 5; num++) {
sValue temp;
temp.time = num;
temp.value = num+1;
temp.qual = num % 2;
valueContainer.push_back(temp);
}
}
void MyClass::PrintValueContainer()
{
for(int i = 0; i < valueContainer.size(); i++) {
std::cout << i << ". " << valueContainer[i].time << std::endl;
}
}
int MyClass::GetLocationForTime(int time)
{
std::vector< sValue >::iterator lower, upper;
lower = std::lower_bound(valueContainer.begin(), valueContainer.end(), time, CompareLowerBoundValueAndTime() );
upper = std::upper_bound(valueContainer.begin(), valueContainer.end(), time, CompareUpperBoundValueAndTime() ); // Crashing here.
std::cout << "Lower bound: " << lower - valueContainer.begin() << std::endl;
std::cout << "Upper bound: " << upper - valueContainer.begin() << std::endl;
return lower - valueContainer.begin();
}
int main()
{
MyClass a;
a.PrintValueContainer();
std::cout << "Location received for 2: " << a.GetLocationForTime(2) << std::endl;
return 0;
}
lower_bound and upper_bound work on a sorted sequence. The sequence has to be sorted using the same comparing function that you pass to both functions.
When you insert the elements in InsertValues you insert them in ascending order, so your CompareLowerBoundValueAndTime is a correct way to compare them.
But for upper_bound you're passing a different compare function. Pass CompareLowerBoundValueAndTime() and it should work.
Note that CompareLowerBoundValueAndTime is a misleading name. It should be something along the lines of CompareValueAndTimeAscending.
You should use the same comparer for both upper_bound and lower_bound. The difference is in the algorithm, not in the comparison.
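A minimal sketch of that fix, using the renamed comparator suggested above. BoundsForTime is a hypothetical free function introduced only for this example; it returns indices rather than printing them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct sValue {
    int time;
    int value;
    int qual;
};

// One ascending comparator serves both algorithms. lower_bound calls
// comp(element, key); upper_bound calls comp(key, element).
struct CompareValueAndTimeAscending {
    bool operator()(const sValue& v, int time) const { return v.time < time; }
    bool operator()(int time, const sValue& v) const { return time < v.time; }
    bool operator()(const sValue& a, const sValue& b) const { return a.time < b.time; }
};

// Returns {lower, upper} as indices into `values`, which must already be
// sorted by ascending time.
std::pair<std::size_t, std::size_t>
BoundsForTime(const std::vector<sValue>& values, int time) {
    CompareValueAndTimeAscending comp;
    assert(std::is_sorted(values.begin(), values.end(), comp));
    auto lower = std::lower_bound(values.begin(), values.end(), time, comp);
    auto upper = std::upper_bound(values.begin(), values.end(), time, comp);
    return {static_cast<std::size_t>(lower - values.begin()),
            static_cast<std::size_t>(upper - values.begin())};
}
```

For the container from the question (times 0 through 4), searching for 2 gives a lower index of 2 and an upper index of 3.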
Your compiler is giving you the answer. Check your code here: http://ideone.com/x6RE9
This gives you an error saying:
prog.cpp: In member function ‘int MyClass::GetLocationForTime(int)’:
prog.cpp:94: error: no match for ‘operator*’ in ‘*upper.__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = sValue*, _Container = std::vector<sValue, std::allocator<sValue> >]()’
You don't have to dereference upper twice, it doesn't make any sense.
I think you are getting an assertion error in upper_bound because it is finding that your sequence is not correctly sorted.
You seem to misunderstand what upper_bound does. It is the same as lower_bound except that the item pointed to by the iterator will be strictly greater than the search value, not greater-or-equal. If there are no such values, it will point to the end of the sequence.
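The difference is easiest to see with duplicates. A small sketch on plain ints, using the default operator<; CountEqual and UpperIndex are names invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements equal to `key` in an ascending-sorted vector.
// lower_bound -> first element >= key; upper_bound -> first element > key.
std::size_t CountEqual(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    auto lower = std::lower_bound(sorted.begin(), sorted.end(), key);
    auto upper = std::upper_bound(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(upper - lower);
}

// Index of the first element strictly greater than `key`;
// equals sorted.size() when no such element exists.
std::size_t UpperIndex(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    auto upper = std::upper_bound(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(upper - sorted.begin());
}
```

With the input {1, 2, 2, 2, 5}, CountEqual for 2 is 3, UpperIndex for 2 is 4, and UpperIndex for 5 is 5, which is the end of the sequence.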
When using a predicate (Pred), the sequence needs to be sorted such that
Pred( *iter2, *iter1 )
returns false whenever iter2 appears later than iter1 in the sequence.
That is not the case with your sequence and predicate combination, therefore you are getting an assertion error.
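That condition can be checked up front with std::is_sorted. The sketch below uses plain ints and the standard std::less / std::greater function objects; SortedFor and UpperIndexDescending are hypothetical helpers written for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// True when `values` is ordered consistently with `comp`, i.e.
// comp(later, earlier) is false for every adjacent pair.
template <typename Compare>
bool SortedFor(const std::vector<int>& values, Compare comp) {
    return std::is_sorted(values.begin(), values.end(), comp);
}

// A "greater" comparator is fine when the data really is descending:
// upper_bound then returns the first element strictly less than key.
std::size_t UpperIndexDescending(const std::vector<int>& descending, int key) {
    assert(SortedFor(descending, std::greater<int>()));
    auto it = std::upper_bound(descending.begin(), descending.end(), key,
                               std::greater<int>());
    return static_cast<std::size_t>(it - descending.begin());
}
```

An ascending sequence such as {0, 1, 2, 3, 4} passes the check for std::less but fails it for std::greater, which mirrors the mismatch in the question.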