Merging two lists efficiently with limited bound - c++

I am trying to merge two arrays/lists where each element of the array has to be compared. If there is an identical element in both of them I increase their total occurrence by one. The arrays are both 2D, where each element has a counter for its occurrence. I know both of these arrays can be compared with a double for loop in O(n^2), however I am limited by a bound of O(nlogn). The final array will have all of the elements from both lists with their increased counters if there are more than one occurrence
Array A[][] = [[8,1],[5,1]]
Array B[][] = [[2,1],[8,1]]
After the merge is complete I should get an array like so
Array C[][] = [[2,1],[8,2],[8,2],[5,1]]
The arrangement of the elements does not have to be necessary.
From readings, Mergesort takes O(nlogn) to merge two lists however I am currently at a roadblock with my bound problem. Any pseudo code visual would be appreciated.

I quite like Stepanov's Efficient Programming although they are rather slow. In sessions 6 and 7 (if I recall correctly) he discusses the algorithms add_to_counter() and reduce_counter(). Both algorithms are entirely trivial, of course, but can be used to implement a non-recursive merge-sort without too much effort. The only possibly non-obvious insight is that the combining operation can reduce the two elements into a sequence rather than just one element. To do the operations in-place you'd actually store iterators (i.e., pointers in case of arrays) using a suitable class to represent a partial view of an array.
I haven't watched the sessions beyond session 7 (and actually not even the complete session 7, yet) but I would fully expect that he actually presents how to use the counter produced in session 7 to implement, e.g., merge-sort. Of course, the run-time complexity of merge-sort is O(n ln n) and, when using the counter approach it will use O(ln n) auxiliary space.

A simple algorithm that requires twice as much memory would be to order both inputs (O(n log n)) and then sequentially pick the elements from the head of both lists and do the merge (O(n)). The overall cost would be O(n log n) with O(n) extra memory (additional size of the smallest of both inputs)

Here's my algorithm based on bucket counting
time complexity: O(n)
memory complexity: O(max), where max is the maximum element in the arrays
Output:
[8,2][5,1][2,1][8,2]
Code:
#include <iostream>
#include <vector>
#include <iterator>
int &refreshCount(std::vector<int> &counters, int in) {
if((counters.size() - 1) < in) {
counters.resize(in + 1);
}
return ++counters[in];
}
void copyWithCounts(std::vector<std::pair<int, int> >::iterator it,
std::vector<std::pair<int, int> >::iterator end,
std::vector<int> &counters,
std::vector<std::pair<int, int&> > &result
) {
while(it != end) {
int &count = refreshCount(counters, (*it).first);
std::pair<int, int&> element((*it).first, count);
result.push_back(element);
++it;
}
}
void countingMerge(std::vector<std::pair<int, int> > &array1,
std::vector<std::pair<int, int> > &array2,
std::vector<std::pair<int, int&> > &result) {
auto array1It = array1.begin();
auto array1End = array1.end();
auto array2It = array2.begin();
auto array2End = array2.end();
std::vector<int> counters = {0};
copyWithCounts(array1It, array1End, counters, result);
copyWithCounts(array2It, array2End, counters, result);
}
int main()
{
std::vector<std::pair<int, int> > array1 = {{8, 1}, {5, 1}};
std::vector<std::pair<int, int> > array2 = {{2, 1}, {8, 1}};
std::vector<std::pair<int, int&> > result;
countingMerge(array1, array2, result);
for(auto it = result.begin(); it != result.end(); ++it) {
std::cout << "[" << (*it).first << "," << (*it).second << "] ";
}
return 0;
}
Short explanation:
because you mentioned, that final arrangement is not necessary, I did simple merge (without sort, who asked sort?) with counting, where result contains reference to counters, so no need to walk through the array to update the counters.

You could write an algorithm to merge them by walking both sequences sequentially in order, inserting where appropriate.
I've chosen a (seemingly more apt) datastructure here: std::map<Value, Occurence>:
#include <map>
using namespace std;
using Value = int;
using Occurence = unsigned;
using Histo = map<Value, Occurence>;
If you insist on contiguous storage, boost::flat_map<> should be your friend here (and a drop-in replacement).
The algorithm (tested with your inputs, read comments for explanation):
void MergeInto(Histo& target, Histo const& other)
{
auto left_it = begin(target), left_end = end(target);
auto right_it = begin(other), right_end = end(other);
auto const& cmp = target.value_comp();
while (right_it != right_end)
{
if ((left_it == left_end) || cmp(*right_it, *left_it))
{
// insert at left_it
target.insert(left_it, *right_it);
++right_it; // and carry on
} else if (cmp(*left_it, *right_it))
{
++left_it; // keep left_it first, so increment it
} else
{
// keys match!
left_it->second += right_it->second;
++left_it;
++right_it;
}
}
}
It's really quite straight-forward!
A test program: See it Live On Coliru
#include <iostream>
// for debug output
static inline std::ostream& operator<<(std::ostream& os, Histo::value_type const& v) { return os << "{" << v.first << "," << v.second << "}"; }
static inline std::ostream& operator<<(std::ostream& os, Histo const& v) { for (auto& el : v) os << el << " "; return os; }
//
int main(int argc, char *argv[])
{
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
std::cout << "A: " << A << "\n";
std::cout << "B: " << B << "\n";
MergeInto(A, B);
std::cout << "merged: " << A << "\n";
}
Printing:
A: {5,1} {8,1}
B: {2,1} {8,1}
merged: {2,1} {5,1} {8,2}
You could shuffle the interface a tiny bit in case you really wanted to merge into a new object (C):
// convenience
Histo Merge(Histo const& left, Histo const& right)
{
auto copy(left);
MergeInto(copy, right);
return copy;
}
Now you can just write
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
auto C = Merge(A, B);
See that Live on Coliru, too

Related

Prevent memory allocation in recursive combination generation

(Sorry about the title, it's not the best descriptive)
I am playing with graph theory, and generating all possible combinations of a given set of input numbers. Given the input set {2,3,4}, my possible combinations (of which there are 3!), are:
The following recursive solution works, but I don't like the fact that I have to "copy" the input vector in order to "remove" the element that represents the node I am following in order to prevent including it for output again. Elements I am going to output are stored in vecValues whereas the elements I can currently choose from are stored in vecInput:
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues)
{
// When hit 0 input size, output.
if (vecInput.size() == 0)
{
for (int i : vecValues) cout << i << " ";
cout << endl;
}
size_t nSize = vecInput.size();
for (vector<int>::iterator iter = begin(vecInput); iter != end(vecInput); ++iter)
{
auto vecCopy = vecInput;
vecCopy.erase(find(begin(vecCopy), end(vecCopy), *iter));
vecValues.push_back(*iter);
OutputCombos(vecCopy, vecValues);
vecValues.pop_back();
}
}
void OutputCombos(vector<int>& vecInput)
{
vector<int> vecValues;
OutputCombos(vecInput, vecValues);
}
int main()
{
vector<int> vecInput{ 2,3,4 };
OutputCombos(vecInput);
return 0;
}
As expected from my state space tree, the output is
2 3 4
2 4 3
3 2 4
3 4 2
4 2 3
4 3 2
How can I get around this without having to make a copy of the vector for each recursive call please?
You could always just use std::next_permutation from <algorithm>
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> input {2, 3, 4};
do {
for (auto i : input) std::cout << i << " ";
std::cout << std::endl;
} while(std::next_permutation(input.begin(), input.end()));
return 0;
}
This gives you the same output. You might want to check out a possible implementation of next_permutation, which involves swaps within the vector rather than copying the vector several times.
I think this might be closer to what you're looking for. A version without std::next_permutation that doesn't involve copying any vectors, and allows the input to remain const. However, it does this at the cost of checking the output in each iteration to make sure it doesn't add the same number twice.
#include<vector>
#include<iostream>
#include<algorithm>
template<typename T>
void OutputCombinations(
const std::vector<T>& input,
std::vector<typename std::vector<T>::const_iterator >& output)
{
for(auto it = input.begin(); it != input.end(); ++it)
{
if (std::find(output.begin(), output.end(), it) == output.end())
{
output.push_back(it);
if (output.size() == input.size())
{
for(auto node : output) std::cout << *node << " ";
std::cout << std::endl;
}
else OutputCombinations(input, output);
output.pop_back();
}
}
}
int main()
{
std::vector<int> nodes{ 2, 3, 4, 2 };
std::vector<std::vector<int>::const_iterator> result{};
OutputCombinations(nodes, result);
return 0;
}
After much studying I found inspiration in this article which gave me the ultimate solution. The idea is that we keep a vector of Boolean values which indicates whether or not a particular value has been used in the combination; that way we don't need to remove the element that we have already used hence there is no memory allocation overhead.
So, when building the branch {2,4,3}, if we get to {2,4}, vecTaken will be {true, false, true} and nNumBoolsSet will be 2. So when we loop, we will only "use" the element at index 1 of vecInput since that is the only element that has not been used as dictated by vecTaken.
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues, vector<bool>& vecTaken, int& nNumBoolsSet)
{
size_t nSize = vecInput.size();
if (nNumBoolsSet == nSize)
{
for (int i : vecValues) cout << i << " ";
cout << endl;
return;
}
for (vector<int>::size_type i = 0; i < nSize; ++i)
{
if (vecTaken[i] == false)
{
vecValues.push_back(vecInput[i]);
vecTaken[i] = true;
++nNumBoolsSet;
OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
vecTaken[i] = false;
vecValues.pop_back();
--nNumBoolsSet;
}
}
}
void OutputCombos(vector<int>& vecInput)
{
vector<int> vecValues;
vector<bool> vecTaken(vecInput.size(), false);
int nNumBoolsSet = 0;
OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
}
int main()
{
vector<int> vecInput{ 2,3,4 };
OutputCombos(vecInput);
}

std::map: return vectors consisting of keys that have equal values

I have a std::map object. Keys are entity IDs (integers) and values their 2D positions (vectors). The aim is to identify, which entities are in the
same position.
ID Position
1 {2,3}
5 {6,2}
12 {2,3}
54 {4,4}
92 {6,2}
I need to get a vector of vectors consisting of keys, which have equal values.
Output for the example input data above: {1,12}, {5,92}
I know I can copy the 2D positions to vector to vectors and loop the first level vector to find indexes of equal second level vectors. Then work back to find keys by selecting the vectors by index and looping again to find the corresponding keys.
Please suggest a cleaner approach for this.
The point of an std::map is to provide an efficient key to value mapping. What you need is an additional value to key mapping - that can be achieved in multiple ways:
Have with an extra std::map that goes from Position to std::vector<ID>.
Use some sort of spatial partitioning data structure (e.g. quadtree, spatial hash, grid) that makes it efficient to find entities depending on their position.
Use a bidirectional multi-map like boost::bimap. This will allow you to have a bidirectional mapping over collection of values without having to use multiple data structures.
"How do I choose?"
It depends on your priorities. If you want maximum performance, you should try all the approaches (maybe using some sort of templatized wrapper) and profile. If you want elegance/cleanliness, boost::bimap seems to be the most appropriate solution.
You could put your data from the map into a std::mutlimap, with the Position as key and ID as value.
As a side note I wonder if a std::pair might be better than a vector for 2d points.
This answer seems to be best, but I'll offer my code anyway.
Given
#include <iostream>
#include <map>
#include <vector>
// Some definiton of Vector2D
struct Vector2D { int x; int y; };
// and some definition of operator< on Vector2D
bool operator<(Vector2D const & a, Vector2D const & b) noexcept {
if (a.x < b.x) return true;
if (a.x > b.x) return false;
return a.y < b.y;
}
How about:
template <typename M>
auto calculate(M const & inputMap) -> std::vector<std::vector<typename M::key_type> > {
std::map<typename M::mapped_type,
std::vector<typename M::key_type> > resultMap;
for (auto const & vp : inputMap)
resultMap[vp.second].push_back(vp.first);
std::vector<std::vector<typename M::key_type> > result;
for (auto & vp: resultMap)
if (vp.second.size() > 1)
result.emplace_back(std::move(vp.second));
return result;
}
Here's how to test:
int main() {
std::map<int, Vector2D> input{
{1, Vector2D{2,3}},
{5, Vector2D{6,2}},
{13, Vector2D{2,3}},
{54, Vector2D{4,4}},
{92, Vector2D{6,2}}
};
auto const result = calculate(input);
// Ugly print
std::cout << '{';
static auto const maybePrintComma =
[](bool & print) {
if (print) {
std::cout << ", ";
} else {
print = true;
}
};
bool comma = false;
for (auto const & v: result) {
maybePrintComma(comma);
std::cout << '{';
bool comma2 = false;
for (auto const & v2: v) {
maybePrintComma(comma2);
std::cout << v2;
}
std::cout << '}';
}
std::cout << '}' << std::endl;
}
You need to provide a reverse mapping. There are a number of ways to do this, including multimap, but a simple approach if your mapping isn't modified after creation is to iterate over the map and build up the reverse mapping. In the reverse mapping, you map value -> list of keys.
The code below uses std::unordered_map to map std::pair<int, int> (the value in the original map) to std::vector<int> (list of keys in the original map). The building of the reverse map is simple and concise:
std::unordered_map<Point, std::vector<int>, hash> r;
for (const auto& item : m) {
r[item.second].push_back(item.first);
}
(See the full example for the definition of hash).
There's no need to worry about whether the key exists; it will be created (and the vector of ids will be initialised as an empty vector) when you attempt to access that key using the r[key] notation.
This solution targets simplicity; it's a workable solution if you need to do this and don't care about performance, memory usage or using third-party libraries like Boost.
If you do care about any of those things, or you're modifying the map while doing lookups in both directions, you should probably explore other options.
Live example
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>
// Define a point type. Use pair<int, int> for simplicity.
using Point = std::pair<int, int>;
// Define a hash function for our point type:
struct hash {
std::size_t operator()(const Point& p) const
{
std::size_t h1 = std::hash<int>{}(p.first);
std::size_t h2 = std::hash<int>{}(p.second);
return h1 ^ (h2 << 1);
}
};
int main() {
// The original forward mapping:
std::map<int, Point> m = {
{1, {2, 3}},
{5, {6, 2}},
{12, {2, 3}},
{54, {4, 4}},
{92, {6, 2}}
};
// Build reverse mapping:
std::unordered_map<Point, std::vector<int>, hash> r;
for (const auto& item : m) {
r[item.second].push_back(item.first);
}
// DEMO: Show all indices for {6, 2}:
Point val1 = {6, 2};
for (const auto& id : r[val1]) {
std::cout << id << " ";
}
std::cout << "\n";
// DEMO: Show all indices for {2, 3}:
Point val2 = {2, 3};
for (const auto& id : r[val2]) {
std::cout << id << " ";
}
std::cout << "\n";
}

What container to store unique values?

I've got the following problem. I have a game which runs on average 60 frames per second. Each frame I need to store values in a container and there must be no duplicates.
It probably has to store less than 100 items per frame, but the number of insert-calls will be alot more (and many rejected due to it has to be unique). Only at the end of the frame do I need to traverse the container. So about 60 iterations of the container per frame, but alot more insertions.
Keep in mind the items to store are simple integer.
There are a bunch of containers I can use for this but I cannot make up my mind what to pick. Performance is the key issue for this.
Some pros/cons that I've gathered:
vector
(PRO): Contigous memory, a huge factor.
(PRO): Memory can be reserved first, very few allocations/deallocations afterwards
(CON): No alternative than to traverse the container (std::find) each insert() to find unique keys? The comparison is simple though (integers) and the whole container can probably fit the cache
set
(PRO): Simple, clearly meant for this
(CON): Not constant insert-time
(CON): Alot of allocations/deallocations per frame
(CON): Not contigous memory. Traversing a set of hundreds of objects means jumping around alot in memory.
unordered_set
(PRO): Simple, clearly meant for this
(PRO): Average case constant time insert
(CON): Seeing as I store integers, hash operation is probably alot more expensive than anything else
(CON): Alot of allocations/deallocations per frame
(CON): Not contigous memory. Traversing a set of hundreds of objects means jumping around alot in memory.
I'm leaning on going the vector-route because of memory access patterns, even though set is clearly meant for this issue. The big issue that is unclear to me is whether traversing the vector for each insert is more costly than the allocations/deallocations (especially considering how often this must be done) and the memory lookups of set.
I know ultimately it all comes down to profiling each case, but if nothing else than as a headstart or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might've missed aswell?
EDIT: As I didnt mention, the container is cleared() at the end of each frame
I did timing with a few different methods that I thought were likely candidates. Using std::unordered_set was the winner.
Here are my results:
Using UnorderedSet: 0.078s
Using UnsortedVector: 0.193s
Using OrderedSet: 0.278s
Using SortedVector: 0.282s
Timing is based on the median of five runs for each case.
compiler: gcc version 4.9.1
flags: -std=c++11 -O2
OS: ubuntu 4.9.1
CPU: Intel(R) Core(TM) i5-4690K CPU # 3.50GHz
Code:
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>
using std::cerr;
static const size_t n_distinct = 100;
template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
auto distribution = std::uniform_int_distribution<int>(0,n_distinct);
auto generator = [&]{return distribution(engine);};
auto vec = std::vector<int>();
std::generate_n(std::back_inserter(vec),n,generator);
return vec;
}
struct UnsortedVectorSmallSet {
std::vector<int> values;
static const char *name() { return "UnsortedVector"; }
UnsortedVectorSmallSet() { values.reserve(n_distinct); }
void insert(int new_value)
{
auto iter = std::find(values.begin(),values.end(),new_value);
if (iter!=values.end()) return;
values.push_back(new_value);
}
};
struct SortedVectorSmallSet {
std::vector<int> values;
static const char *name() { return "SortedVector"; }
SortedVectorSmallSet() { values.reserve(n_distinct); }
void insert(int new_value)
{
auto iter = std::lower_bound(values.begin(),values.end(),new_value);
if (iter==values.end()) {
values.push_back(new_value);
return;
}
if (*iter==new_value) return;
values.insert(iter,new_value);
}
};
struct OrderedSetSmallSet {
std::set<int> values;
static const char *name() { return "OrderedSet"; }
void insert(int new_value) { values.insert(new_value); }
};
struct UnorderedSetSmallSet {
std::unordered_set<int> values;
static const char *name() { return "UnorderedSet"; }
void insert(int new_value) { values.insert(new_value); }
};
int main()
{
//using SmallSet = UnsortedVectorSmallSet;
//using SmallSet = SortedVectorSmallSet;
//using SmallSet = OrderedSetSmallSet;
using SmallSet = UnorderedSetSmallSet;
auto engine = std::default_random_engine();
std::vector<int> values_to_insert = randomInts(engine,10000000);
SmallSet small_set;
namespace chrono = std::chrono;
using chrono::system_clock;
auto start_time = system_clock::now();
for (auto value : values_to_insert) {
small_set.insert(value);
}
auto end_time = system_clock::now();
auto& result = small_set.values;
auto sum = std::accumulate(result.begin(),result.end(),0u);
auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();
cerr << "Using " << SmallSet::name() << ":\n";
cerr << " sum=" << sum << "\n";
cerr << " elapsed: " << elapsed_seconds << "s\n";
}
I'm going to put my neck on the block here and suggest that the vector route is probably most efficient when the size is 100 and the objects being stored are integral values. The simple reason for this is that set and unordered_set allocate memory for each insert whereas the vector needn't more than once.
You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in log2N time.
The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (average) 50 contiguous memory words is an almost instantaneous operation.
Final word:
Write the correct logic now. Worry about performance when the users are complaining.
EDIT:
Because I couldn't help myself, here's a reasonably complete implementation:
template<typename T>
struct vector_set
{
using vec_type = std::vector<T>;
using const_iterator = typename vec_type::const_iterator;
using iterator = typename vec_type::iterator;
vector_set(size_t max_size)
: _max_size { max_size }
{
_v.reserve(_max_size);
}
/// #returns: pair of iterator, bool
/// If the value has been inserted, the bool will be true
/// the iterator will point to the value, or end if it wasn't
/// inserted due to space exhaustion
auto insert(const T& elem)
-> std::pair<iterator, bool>
{
if (_v.size() < _max_size) {
auto it = std::lower_bound(_v.begin(), _v.end(), elem);
if (_v.end() == it || *it != elem) {
return make_pair(_v.insert(it, elem), true);
}
return make_pair(it, false);
}
else {
return make_pair(_v.end(), false);
}
}
auto find(const T& elem) const
-> const_iterator
{
auto vend = _v.end();
auto it = std::lower_bound(_v.begin(), vend, elem);
if (it != vend && *it != elem)
it = vend;
return it;
}
bool contains(const T& elem) const {
return find(elem) != _v.end();
}
const_iterator begin() const {
return _v.begin();
}
const_iterator end() const {
return _v.end();
}
private:
vec_type _v;
size_t _max_size;
};
using namespace std;
BOOST_AUTO_TEST_CASE(play_unique_vector)
{
vector_set<int> v(100);
for (size_t i = 0 ; i < 1000000 ; ++i) {
v.insert(int(random() % 200));
}
cout << "unique integers:" << endl;
copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
cout << endl;
cout << "contains 100: " << v.contains(100) << endl;
cout << "contains 101: " << v.contains(101) << endl;
cout << "contains 102: " << v.contains(102) << endl;
cout << "contains 103: " << v.contains(103) << endl;
}
As you said you have many insertions and only one traversal, I’d suggest to use a vector and push the elements in regardless of whether they are unique in the vector. This is done in O(1).
Just when you need to go through the vector, then sort it and remove the duplicate elements. I believe this can be done in O(n) as they are bounded integers.
EDIT: Sorting in linear time through counting sort presented in this video. If not feasible, then you are back to O(n lg(n)).
You will have very little cache miss because of the contiguity of the vector in memory, and very few allocations (especially if you reserve enough memory in the vector).

Output over unique elements of `std::multiset` and their frequency using std:: algorithm in C++ (no loops)

I have the following multiset in C++:
template<class T>
class CompareWords {
public:
bool operator()(T s1, T s2)
{
if (s1.length() == s2.length())
{
return ( s1 < s2 );
}
else return ( s1.length() < s2.length() );
}
};
typedef multiset<string, CompareWords<string>> mySet;
typedef std::multiset<string,CompareWords<string>>::iterator mySetItr;
mySet mWords;
I want to print each unique element of type std::string in the set once and next to the element I want to print how many time it appears in the list (frequency), as you can see the functor "CompareWord" keeps the set sorted.
A solution is proposed here, but its not what I need, because I am looking for a solution without using (while,for,do while).
I know that I can use this:
//gives a pointer to the first and last range or repeated element "word"
auto p = mWords.equal_range(word);
// compute the distance between the iterators that bound the range AKA frequency
int count = static_cast<int>(std::distance(p.first, p.second));
but I can't quite come up with a solution without loops?
Unlike the other solutions, this iterates over the list exactly once. This is important, as iterating over a structure like std::multimap is reasonably high overhead (the nodes are distinct allocations).
There are no explicit loops, but the tail-end recursion will be optimized down to a loop, and I call an algorithm that will run a loop.
template<class Iterator, class Clumps, class Compare>
void produce_clumps( Iterator begin, Iterator end, Clumps&& clumps, Compare&& compare) {
if (begin==end) return; // do nothing for nothing
typedef decltype(*begin) value_type_ref;
// We know runs are at least 1 long, so don't bother comparing the first time.
// Generally, advancing will have a cost similar to comparing. If comparing is much
// more expensive than advancing, then this is sub optimal:
std::size_t count = 1;
Iterator run_end = std::find_if(
std::next(begin), end,
[&]( value_type_ref v ){
if (!compare(*begin, v)) {
++count;
return false;
}
return true;
}
);
// call our clumps callback:
clumps( begin, run_end, count );
// tail end recurse:
return produce_clumps( std::move(run_end), std::move(end), std::forward<Clumps>(clumps), std::forward<Compare>(compare) );
}
The above is a relatively generic algorithm. Here is its use:
int main() {
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords { "A", "A", "B" };
produce_clumps( mWords.begin(), mWords.end(),
[]( mySetItr run_start, mySetItr /* run_end -- unused */, std::size_t count )
{
std::cout << "Word [" << *run_start << "] occurs " << count << " times\n";
},
CompareWords<std::string>{}
);
}
live example
The iterators must refer to a sorted sequence (with regards to the Comparator), then the clumps will be passed to the 3rd argument together with their length.
Every element in the multiset will be visited exactly once with the above algorithm (as a right-hand side argument to your comparison function). Every start of a clump will be visited (length of clump) additional times as a left-hand side argument (including clumps of length 1). There will be exactly N iterator increments performed, and no more than N+C+1 iterator comparisons (N=number of elements, C=number of clumps).
#include <iostream>
#include <algorithm>
#include <set>
#include <iterator>
#include <string>
int main()
{
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords;
mWords.insert("A");
mWords.insert("A");
mWords.insert("B");
mySetItr it = std::begin(mWords), itend = std::end(mWords);
std::for_each<mySetItr&>(it, itend, [&mWords, &it] (const std::string& word)
{
auto p = mWords.equal_range(word);
int count = static_cast<int>(std::distance(p.first, p.second));
std::cout << word << " " << count << std::endl;
std::advance(it, count - 1);
});
}
Outputs:
A 2
B 1
Live demo link.
Following does the job without explicit loop using recursion:
void print_rec(const mySet& set, mySetItr it)
{
if (it == set.end()) {
return;
}
const auto& word = *it;
auto next = std::find_if(it, set.end(),
[&word](const std::string& s) {
return s != word;
});
std::cout << word << " appears " << std::distance(it, next) << std::endl;
print_rec(set, next);
}
void print(const mySet& set)
{
print_rec(set, set.begin());
}
Demo

How to find the index of current object in range-based for loop?

Assume I have the following code:
vector<int> list;
for(auto& elem:list) {
int i = elem;
}
Can I find the position of elem in the vector without maintaining a separate iterator?
Yes you can, it just take some massaging ;)
The trick is to use composition: instead of iterating over the container directly, you "zip" it with an index along the way.
Specialized zipper code:
template <typename T>
struct iterator_extractor { typedef typename T::iterator type; };
template <typename T>
struct iterator_extractor<T const> { typedef typename T::const_iterator type; };
template <typename T>
class Indexer {
public:
class iterator {
typedef typename iterator_extractor<T>::type inner_iterator;
typedef typename std::iterator_traits<inner_iterator>::reference inner_reference;
public:
typedef std::pair<size_t, inner_reference> reference;
iterator(inner_iterator it): _pos(0), _it(it) {}
reference operator*() const { return reference(_pos, *_it); }
iterator& operator++() { ++_pos; ++_it; return *this; }
iterator operator++(int) { iterator tmp(*this); ++*this; return tmp; }
bool operator==(iterator const& it) const { return _it == it._it; }
bool operator!=(iterator const& it) const { return !(*this == it); }
private:
size_t _pos;
inner_iterator _it;
};
Indexer(T& t): _container(t) {}
iterator begin() const { return iterator(_container.begin()); }
iterator end() const { return iterator(_container.end()); }
private:
T& _container;
}; // class Indexer
template <typename T>
Indexer<T> index(T& t) { return Indexer<T>(t); }
And using it:
#include <iostream>
#include <iterator>
#include <limits>
#include <vector>
// Zipper code here
int main() {
std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8, 9};
for (auto p: index(v)) {
std::cout << p.first << ": " << p.second << "\n";
}
}
You can see it at ideone, though it lacks the for-range loop support so it's less pretty.
EDIT:
Just remembered that I should check Boost.Range more often. Unfortunately no zip range, but I did found a pearl: boost::adaptors::indexed. However it requires access to the iterator to pull of the index. Shame :x
Otherwise with the counting_range and a generic zip I am sure it could be possible to do something interesting...
In the ideal world I would imagine:
int main() {
std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8, 9};
for (auto tuple: zip(iota(0), v)) {
std::cout << tuple.at<0>() << ": " << tuple.at<1>() << "\n";
}
}
With zip automatically creating a view as a range of tuples of references and iota(0) simply creating a "false" range that starts from 0 and just counts toward infinity (or well, the maximum of its type...).
jrok is right : range-based for loops are not designed for that purpose.
However, in your case it is possible to compute it using pointer arithmetic since vector stores its elements contiguously (*)
vector<int> list;
for(auto& elem:list) {
int i = elem;
int pos = &elem-&list[0]; // pos contains the position in the vector
// also a &-operator overload proof alternative (thanks to ildjarn) :
// int pos = addressof(elem)-addressof(list[0]);
}
But this is clearly a bad practice since it obfuscates the code & makes it more fragile (it easily breaks if someone changes the container type, overload the & operator or replace 'auto&' by 'auto'. good luck to debug that!)
NOTE: Contiguity is guaranteed for vector in C++03, and array and string in C++11 standard.
No, you can't (at least, not without effort). If you need the position of an element, you shouldn't use range-based for. Remember that it's just a convenience tool for the most common case: execute some code for each element. In the less-common circumstances where you need the position of the element, you have to use the less-convenient regular for loop.
Based on the answer from #Matthieu there is a very elegant solution using the mentioned boost::adaptors::indexed:
std::vector<std::string> strings{10, "Hello"};
int main(){
strings[5] = "World";
for(auto const& el: strings| boost::adaptors::indexed(0))
std::cout << el.index() << ": " << el.value() << std::endl;
}
You can try it
This works pretty much like the "ideal world solution" mentioned, has pretty syntax and is concise. Note that the type of el in this case is something like boost::foobar<const std::string&, int>, so it handles the reference there and no copying is performed. It is even incredibly efficient: https://godbolt.org/g/e4LMnJ (The code is equivalent to keeping an own counter variable which is as good as it gets)
For completeness the alternatives:
size_t i = 0;
for(auto const& el: strings) {
std::cout << i << ": " << el << std::endl;
++i;
}
Or using the contiguous property of a vector:
for(auto const& el: strings) {
size_t i = &el - &strings.front();
std::cout << i << ": " << el << std::endl;
}
The first generates the same code as the boost adapter version (optimal) and the last is 1 instruction longer: https://godbolt.org/g/nEG8f9
Note: If you only want to know, if you have the last element you can use:
for(auto const& el: strings) {
bool isLast = &el == &strings.back();
std::cout << isLast << ": " << el << std::endl;
}
This works for every standard container but auto&/auto const& must be used (same as above) but that is recommended anyway. Depending on the input this might also be pretty fast (especially when the compiler knows the size of your vector)
Replace the &foo by std::addressof(foo) to be on the safe side for generic code.
If you have a compiler with C++14 support you can do it in a functional style:
#include <iostream>
#include <string>
#include <vector>
#include <functional>
template<typename T>
void for_enum(T& container, std::function<void(int, typename T::value_type&)> op)
{
int idx = 0;
for(auto& value : container)
op(idx++, value);
}
int main()
{
std::vector<std::string> sv {"hi", "there"};
for_enum(sv, [](auto i, auto v) {
std::cout << i << " " << v << std::endl;
});
}
Works with clang 3.4 and gcc 4.9 (not with 4.8); for both need to set -std=c++1y. The reason you need c++14 is because of the auto parameters in the lambda function.
If you insist on using range based for, and to know index, it is pretty trivial to maintain index as shown below.
I do not think there is a cleaner / simpler solution for range based for loops. But really why not use a standard for(;;)? That probably would make your intent and code the clearest.
vector<int> list;
int idx = 0;
for(auto& elem:list) {
int i = elem;
//TODO whatever made you want the idx
++idx;
}
There is a surprisingly simple way to do this
vector<int> list;
for(auto& elem:list) {
int i = (&elem-&*(list.begin()));
}
where i will be your required index.
This takes advantage of the fact that C++ vectors are always contiguous.
Here's a quite beautiful solution using c++20:
#include <array>
#include <iostream>
#include <ranges>
template<typename T>
struct EnumeratedElement {
std::size_t index;
T& element;
};
auto enumerate(std::ranges::range auto& range)
-> std::ranges::view auto
{
return range | std::views::transform(
[i = std::size_t{}](auto& element) mutable {
return EnumeratedElement{i++, element};
}
);
}
auto main() -> int {
auto const elements = std::array{3, 1, 4, 1, 5, 9, 2};
for (auto const [index, element] : enumerate(elements)) {
std::cout << "Element " << index << ": " << element << '\n';
}
}
The major features used here are c++20 ranges, c++20 concepts, c++11 mutable lambdas, c++14 lambda capture initializers, and c++17 structured bindings. Refer to cppreference.com for information on any of these topics.
Note that element in the structured binding is in fact a reference and not a copy of the element (not that it matters here). This is because any qualifiers around the auto only affect a temporary object that the fields are extracted from, and not the fields themselves.
The generated code is identical to the code generated by this (at least by gcc 10.2):
#include <array>
#include <iostream>
#include <ranges>
auto main() -> int {
auto const elements = std::array{3, 1, 4, 1, 5, 9, 2};
for (auto index = std::size_t{}; auto& element : elements) {
std::cout << "Element " << index << ": " << element << '\n';
index++;
}
}
Proof: https://godbolt.org/z/a5bfxz
I read from your comments that one reason you want to know the index is to know if the element is the first/last in the sequence. If so, you can do
for(auto& elem:list) {
// loop code ...
if(&elem == &*std::begin(list)){ ... special code for first element ... }
if(&elem == &*std::prev(std::end(list))){ ... special code for last element ... }
// if(&elem == &*std::rbegin(list)){... (C++14 only) special code for last element ...}
// loop code ...
}
EDIT: For example, this prints a container skipping a separator in the last element. Works for most containers I can imagine (including arrays), (online demo http://coliru.stacked-crooked.com/a/9bdce059abd87f91):
#include <iostream>
#include <vector>
#include <list>
#include <set>
using namespace std;
template<class Container>
void print(Container const& c){
for(auto& x:c){
std::cout << x;
if(&x != &*std::prev(std::end(c))) std::cout << ", "; // special code for last element
}
std::cout << std::endl;
}
int main() {
std::vector<double> v{1.,2.,3.};
print(v); // prints 1,2,3
std::list<double> l{1.,2.,3.};
print(l); // prints 1,2,3
std::initializer_list<double> i{1.,2.,3.};
print(i); // prints 1,2,3
std::set<double> s{1.,2.,3.};
print(s); // print 1,2,3
double a[3] = {1.,2.,3.}; // works for C-arrays as well
print(a); // print 1,2,3
}
Tobias Widlund wrote a nice MIT licensed Python style header only enumerate (C++17 though):
GitHub
Blog Post
Really nice to use:
std::vector<int> my_vector {1,3,3,7};
for(auto [i, my_element] : en::enumerate(my_vector))
{
// do stuff
}
If you want to avoid having to write an auxiliary function while having
the index variable local to the loop, you can use a lambda with a mutable variable.:
int main() {
std::vector<char> values = {'a', 'b', 'c'};
std::for_each(begin(values), end(values), [i = size_t{}] (auto x) mutable {
std::cout << i << ' ' << x << '\n';
++i;
});
}
Here's a macro-based solution that probably beats most others on simplicity, compile time, and code generation quality:
#include <iostream>
#define fori(i, ...) if(size_t i = -1) for(__VA_ARGS__) if(i++, true)
int main() {
fori(i, auto const & x : {"hello", "world", "!"}) {
std::cout << i << " " << x << std::endl;
}
}
Result:
$ g++ -o enumerate enumerate.cpp -std=c++11 && ./enumerate
0 hello
1 world
2 !