C++ equivalent of Scala's Seq.grouped?

I would like to use an idiomatic (std::algorithm or similar) version of Scala's .grouped in C++. This breaks a sequence into groups of size N where the last group may be smaller. Any ideas?
Reference: https://www.scala-lang.org/api/current/scala/collection/Seq.html#grouped(size:Int):Iterator%5BC%5D
I've successfully used a loop with std::min but I would like something built in. This is my solution for grouping into chunks of 7 (found here on SO in another answer):
std::vector<std::vector<uint64_t>> chunked;
std::vector<uint64_t> flat;
// group into chunks of 7
for (size_t i = 0; i < flat.size(); i += 7) {
    auto last = std::min(flat.size(), i + 7);
    // construct each chunk in place instead of copying a temporary vector
    chunked.emplace_back(flat.begin() + i, flat.begin() + last);
}

std::views::chunk (also spelled std::ranges::views::chunk) does the job, but it is only available since C++23.
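For compilers without C++23 support, the loop from the question can be wrapped in a small reusable function. The following is only a sketch: grouped is a hypothetical helper name, not a standard library function, and it copies the elements into new vectors rather than producing lazy views the way std::views::chunk does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the standard library): split `flat` into
// consecutive chunks of `size` elements; the last chunk may be shorter.
template <typename T>
std::vector<std::vector<T>> grouped(const std::vector<T>& flat, std::size_t size) {
    assert(size > 0 && "chunk size must be positive");
    std::vector<std::vector<T>> chunks;
    chunks.reserve((flat.size() + size - 1) / size);  // number of chunks, rounded up
    for (std::size_t i = 0; i < flat.size(); i += size) {
        const std::size_t last = std::min(flat.size(), i + size);
        // Build each chunk in place from an iterator pair.
        chunks.emplace_back(flat.begin() + i, flat.begin() + last);
    }
    return chunks;
}
```

Calling grouped(flat, 7) reproduces the chunking from the question.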

Related

Best way to remove elements from a container using its ranges version

I face a common problem in my code where I would like to remove only a single element from a reversed std::vector after it satisfies a predicate. I understand there are a number of ways to do this with range-v3, but each way I come up with seems a bit convoluted.
Here's an example of the target vector v:
std::vector v = { 1, 2, 3, 2, 4 };
The result needs to be vector r:
std::vector r = { 1, 2, 3, 4 };
Which will be done by removing the first 2 (via a lambda predicate "is_two") that is found when reverse traversing the vector v.
Here's what it could look like in a vanilla C++ raw loop:
auto is_two = [](int a) { return a == 2; };
for (int i = v.size(); --i >= 0;) {
if (is_two(v[i])) {
v.erase(v.begin() + i);
break;
}
}
Here's my bad range-v3 version:
namespace rs = ranges;
namespace rv = ranges::view;
namespace ra = ranges::action;
rs::for_each(v | rv::enumerate
| rv::reverse
| rv::filter([](auto i_e) { return i_e.second == 2; })
| rv::take(1),
[&](auto& i_e) { v.erase(v.begin() + i_e.first); });
Ideally I'm wondering if there's some solution that could look something like this:
ra::remove_if(v | rv::reverse, is_two);
To generalize, I'd like to know how one can take a container, pipe it through some ranges::view operations, then remove the elements in the resulting range from the original container.
Since no one seems to have come up with a better approach, I would like to mention the possibility of resorting to good old reverse_iterators.
vec.erase(std::prev(ranges::find_if(vec.rbegin(), vec.rend(), is_two).base()));
Admittedly, this is not very ranges-ish, but at least it works. Note that it assumes a matching element exists: if nothing matches, base() equals vec.begin() and calling std::prev on it is undefined behaviour.
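A minimal sketch of the same reverse-iterator idea using only the standard library, with a guard for the case where nothing matches. The name erase_last_if is a hypothetical helper introduced here for illustration; it is not part of range-v3 or the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: erase the last element that satisfies `pred`.
// Returns true if an element was erased, false if nothing matched.
template <typename T, typename Pred>
bool erase_last_if(std::vector<T>& vec, Pred pred) {
    auto rit = std::find_if(vec.rbegin(), vec.rend(), pred);
    if (rit == vec.rend()) {
        return false;  // no match: leave the vector untouched
    }
    // rit.base() points one past the matched element, so step back once.
    vec.erase(std::prev(rit.base()));
    return true;
}
```

With v = { 1, 2, 3, 2, 4 } and the is_two predicate from the question, erase_last_if(v, is_two) leaves { 1, 2, 3, 4 }.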
The main purpose of range-based for loops is consistency. Having the same operation executed for each element.
Deleting an element breaks this consistency. So the best solution is a normal for loop where you can use iterators and breaks.
When you have a hammer everything looks like a nail. Don't hammer the ranged loop into a normal one.

Implementing the Backward Nondeterministic Dawg Matching algorithm

I'm trying to implement the BNDM algorithm in my code, in order to perform a fast pattern search.
I found some code online and tried to adjust it for my use case:
I think that I did something wrong while changing the values, since the algorithm takes a few minutes to finish (I was expecting it to be faster).
Using std::search takes me 30 seconds (with wildcards).
This takes me around 4-5 minutes (without wildcards).
The reason I'm casting everything to (unsigned char) is because the program crashes otherwise, since both my data and pattern hold hex values.
What I'd like to know is, where did I go wrong with this implementation (why is it running so slow)? and how can I include the ability to search for a pattern that contains wildcards?
EDIT*
The issue with speed has been solved by switching build from debug to release.
Also changing the size of the B array to 256 made it even faster.
The only issue I currently have now is how to implement a way to use wildcards using this algorithm.
Current code:
vector<unsigned int> get_matches(const vector<char> & data, const string & pattern) {
vector<unsigned int> matches;
//vector<char>::const_iterator walk = data.begin();
std::array<std::uint32_t, 256> B{ 0 };
int m = pattern.size();
int n = data.size();
int i, j, s, d, last;
//if (m > WORD_SIZE)
// error("BNDM");
// Pre processing
//memset(B, 0, ASIZE * sizeof(int));
s = 1;
for (i = m - 1; i >= 0; i--) {
B[(unsigned char)pattern[i]] |= s;
s <<= 1;
}
// Searching phase
j = 0;
while (j <= n - m) {
i = m - 1; last = m;
d = ~0;
while (i >= 0 && d != 0) {
d &= B[(unsigned char)data[j + i]];
i--;
if (d != 0) {
if (i >= 0)
last = i + 1;
else
matches.emplace_back(j);
}
d <<= 1;
}
j += last;
}
return matches;
}
B is not big enough -- it is indexed by the bytes in the pattern so it must have 256 elements (assuming an 8-bit byte architecture.) But you define it as having pattern.size() elements, which is a much smaller number.
As a consequence, you are using memory outside of B's allocation, which is Undefined Behaviour.
I suggest you use std::array<std::uint32_t, 256>, since you don't ever need to resize B. (Or even better, std::array<std::uint32_t, std::numeric_limits<unsigned char>::max()+1>).
I'm not an expert on this particular search algorithm, but the preprocessing step appears to set bit p in element c of B if the character c matches pattern element p. Since a wildcard pattern element can match any character, it seems reasonable that every element of B should have the bits corresponding to wildcard characters set. In other words, instead of initialising every element of B to 0, initialise them to the mask of wildcard positions in the pattern.
I don't know if that is sufficient to get the algorithm to work with wildcards, but it could be worth a try.
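To make the suggestion concrete, here is a sketch of BNDM in which every entry of B starts out as the mask of wildcard positions. Several things here are assumptions rather than part of the question: the function name bndm_wildcard, the choice of '?' as the wildcard byte, and the limit of 32 pattern characters (one bit per position in a 32-bit word). It also tests the top bit for prefix detection, as in the textbook formulation, rather than d != 0 as in the posted code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: BNDM search where `wildcard` in the pattern matches any byte.
// Assumes 1 <= pattern.size() <= 32.
std::vector<std::size_t> bndm_wildcard(const std::vector<char>& data,
                                       const std::string& pattern,
                                       char wildcard = '?') {
    std::vector<std::size_t> matches;
    const std::size_t m = pattern.size();
    const std::size_t n = data.size();
    assert(m <= 32 && "pattern must fit into a 32-bit mask");
    if (m == 0 || m > 32 || n < m) return matches;

    // Bit (m - 1 - i) represents pattern position i.
    std::uint32_t wild_mask = 0;
    for (std::size_t i = 0; i < m; ++i)
        if (pattern[i] == wildcard) wild_mask |= std::uint32_t{1} << (m - 1 - i);

    // Every byte value matches the wildcard positions.
    std::array<std::uint32_t, 256> B;
    B.fill(wild_mask);
    for (std::size_t i = 0; i < m; ++i)
        if (pattern[i] != wildcard)
            B[static_cast<unsigned char>(pattern[i])] |= std::uint32_t{1} << (m - 1 - i);

    const std::uint32_t full = (m == 32) ? ~std::uint32_t{0}
                                         : ((std::uint32_t{1} << m) - 1);
    const std::uint32_t top = std::uint32_t{1} << (m - 1);

    std::size_t j = 0;
    while (j + m <= n) {
        std::size_t i = m;     // window characters not yet read
        std::size_t last = m;  // shift to apply after this window
        std::uint32_t d = full;
        while (i > 0 && d != 0) {
            d &= B[static_cast<unsigned char>(data[j + i - 1])];
            --i;
            if (d & top) {     // the suffix read so far is a prefix of the pattern
                if (i > 0) last = i;
                else matches.push_back(j);
            }
            d = (d << 1) & full;
        }
        j += last;
    }
    return matches;
}
```

For the text "xabcxaxcabdabc", the pattern "a?c" is reported at offsets 1, 5 and 11, and the plain pattern "abc" at offsets 1 and 11.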

Is there some STL function to get cartesian product of two C++ vectors?

Suppose
b = ["good ", "bad "]
a = ["apple","mango"]
then output = ["good apple","good mango","bad apple","bad mango"]
I know this can be done with nested for loops but is there some elegant one liner for doing this using C++ STL?
Here is a one-liner (copied from Jonathan Mee's answer posted here):
for(size_t i = 0, s = a.size(); i < output.size(); ++i) output[i] = b[i/s] + ' ' + a[i%s];
Given vector<string> a and vector<string> b you can use for_each:
vector<string> output(size(a) * size(b));
for_each(begin(output), end(output), [&, it = 0U](auto& i) mutable {
i = a[it / size(b)] + ' ' + b[it % size(b)];
++it;
});
EDIT:
We've initialized output with enough room to contain every combination of a and b. Then we'll step through each element of output and assign it.
We want to use the 1st element of a for the first size(b) elements of output, the 2nd element of a for the next size(b) elements, and so on, so we index a with it / size(b). Each of those is then combined with successive elements of b.
it advances by one for each element of output, but the index into b needs to wrap around, or it would be out of bounds once it == size(b); to do that we use it % size(b).
EDIT2:
In this question through benchmarking I'd discovered the phenomenon that modulo and division are expensive operations for iteration. I've done the same test here. For the purpose of isolating the algorithms I'm just doing the Cartesian summation on a vector<int> not vector<string>.
First off, we can see that the two algorithms result in differing assembly. My algorithm as written above requires 585 lines of assembly; 588 lines were required by my interpretation of MSalters' code:
vector<string> output(size(a) * size(b));
auto i = begin(output);
std::for_each(cbegin(a), cend(a), [&](const auto& A) { std::for_each(cbegin(b), cend(b), [&](const auto& B) { *i++ = A + ' ' + B; }); });
I have placed a pretty solid benchmarking test here: http://ideone.com/1YpzIO In the test I've only got it set to do 100 tests yet MSalters' algorithm always wins. Locally using Visual Studio 2015 in release with 10,000,000 tests MSalters algorithm finishes in about 2/3 the time it takes mine.
Clearly modulo isn't a great method of indexing :(
There's no direct solution; I checked the whole of <algorithm>. None of the functions produce an output of length M*N.
What you can do is call std::for_each on the first range, using a lambda which calls std::for_each on the second range (!)
std::vector<std::string> a, b;
std::for_each(a.begin(), a.end(),
[&](std::string A) { std::for_each(b.begin(), b.end(),
[A](std::string B) { std::cout << A << '/' << B << '\n'; }
);});
But that's just a nested loop in STL.
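Since the standard algorithms offer nothing shorter, the nested loop can simply be wrapped in a function that returns the combined strings. This is a sketch; cartesian_concat is a hypothetical name chosen for illustration, and it concatenates the elements directly because the first vector in the question already carries a trailing space.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: concatenate every element of `first` with every
// element of `second`, ordered by `first`, then by `second`.
std::vector<std::string> cartesian_concat(const std::vector<std::string>& first,
                                          const std::vector<std::string>& second) {
    std::vector<std::string> output;
    output.reserve(first.size() * second.size());  // M * N results
    for (const auto& x : first) {
        for (const auto& y : second) {
            output.push_back(x + y);
        }
    }
    return output;
}
```

With b = {"good ", "bad "} and a = {"apple", "mango"}, cartesian_concat(b, a) yields the output listed in the question.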

C++ Optimizing this Algorithm

After watching some Terence Tao videos, I wanted to try implementing algorithms into c++ code to find all the prime numbers up to a number n. In my first version, where I simply had every integer from 2 to n tested to see if they were divisible by anything from 2 to sqrt(n), I got the program to find the primes between 1-10,000,000 in ~52 seconds.
Attempting to optimize the program, and implementing what I now know to be the Sieve of Eratosthenes, I assumed the task would be done much faster than 51 seconds, but sadly, that wasn't the case. Even going up to 1,000,000 took a considerable amount of time (didn't time it, though)
#include <iostream>
#include <vector>
using namespace std;
void main()
{
vector<int> tosieve = {};
for (int i = 2; i < 1000001; i++)
{
tosieve.push_back(i);
}
for (int j = 0; j < tosieve.size(); j++)
{
for (int k = j + 1; k < tosieve.size(); k++)
{
if (tosieve[k] % tosieve[j] == 0)
{
tosieve.erase(tosieve.begin() + k);
}
}
}
//for (int f = 0; f < tosieve.size(); f++)
//{
// cout << (tosieve[f]) << endl;
//}
cout << (tosieve.size()) << endl;
system("pause");
}
Is it the repeated referencing of the vectors or something? Why is this so slow? Even if I'm completely overlooking something (could be, complete beginner at this :I) I would think that finding the primes between 2 and 1,000,000 with this horrible inefficient method would be faster than my original way of finding them from 2 to 10,000,000.
Hope someone has a clear answer to this - hopefully I can use whatever knowledge is gleaned in the future when optimizing programs using a lot of recursion.
The problem is that 'erase' moves every element in the vector down one, meaning it is an O(n) operation.
There are three alternative choices:
1) Just mark deleted elements as 'empty' (make them 0, for example). This will mean future passes have to pass over those empty positions, but that isn't that expensive.
2) Make a new vector, and push_back new values into there.
3) Use std::remove_if: This will move the elements down, but do it in a single pass so will be more efficient. If you use std::remove_if, then you will have to remember it doesn't resize the vector itself.
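As an illustration of option 3, the sketch below removes all proper multiples of one value in a single pass and then shrinks the vector explicitly. The function name drop_multiples is a hypothetical helper introduced here, not something from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: remove every multiple of p except p itself.
// std::remove_if compacts the kept elements to the front in one pass and
// returns the new logical end; erase() then actually shrinks the vector.
void drop_multiples(std::vector<int>& values, int p) {
    auto new_end = std::remove_if(values.begin(), values.end(),
                                  [p](int x) { return x != p && x % p == 0; });
    values.erase(new_end, values.end());
}
```

Each call costs one linear pass, instead of one linear shift per erased element.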
erase() on a vector has O(n) linear time complexity, because every element after the erased one has to be shifted down.
Since you have two nested loops of size 10^6 and an erase() on a vector of size 10^6 inside them, your algorithm executes up to 10^18 operations.
Cubic algorithms take a huge amount of time for such a big N.
N = 10^6 is already too big even for quadratic algorithms.
Please read carefully about the Sieve of Eratosthenes. The fact that both your full search and your Sieve of Eratosthenes took about the same time means that you have implemented the second one incorrectly.
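For comparison, a textbook Sieve of Eratosthenes never divides and never erases: it only crosses out multiples in a flag array. The sketch below is one straightforward way to write it; primes_up_to is a hypothetical name, and the use of a std::vector<char> flag array is an implementation choice made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return all primes p with 2 <= p <= n.
std::vector<int> primes_up_to(int n) {
    std::vector<int> primes;
    if (n < 2) return primes;
    // is_composite[k] becomes 1 once k is known to have a smaller prime factor.
    std::vector<char> is_composite(static_cast<std::size_t>(n) + 1, 0);
    for (int p = 2; p <= n; ++p) {
        if (is_composite[static_cast<std::size_t>(p)]) continue;
        primes.push_back(p);
        // Cross out multiples starting at p * p; smaller multiples were
        // already crossed out by smaller primes. long long avoids overflow.
        for (long long q = static_cast<long long>(p) * p; q <= n; q += p) {
            is_composite[static_cast<std::size_t>(q)] = 1;
        }
    }
    return primes;
}
```

The inner loop only touches the multiples of primes, which is why this runs in roughly O(n log log n) time rather than quadratic time.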
I see two performance issues here:
First of all, push_back() will have to reallocate the dynamic memory block once in a while. Use reserve():
vector<int> tosieve = {};
tosieve.reserve(1000001);
for (int i = 2; i < 1000001; i++)
{
    tosieve.push_back(i);
}
Second, erase() has to move all elements behind the one you remove. Set the elements to 0 instead and make one pass over the vector at the end (untested code):
for (auto& x : tosieve) {
    for (auto y = tosieve.begin(); *y < x; ++y) // this check works only in
                                                // the case of an ordered vector
        if (*y != 0 && x % *y == 0) x = 0;      // dereference y: it is an iterator
}
{ // this block makes sure that sieved is released afterwards
    auto sieved = vector<int>{};
    for (auto x : tosieve)
        if (x != 0) sieved.push_back(x);        // keep only the unmarked elements
    swap(tosieve, sieved);
} // the large memory block is released now; only the sieved elements remain.
Consider using standard algorithms instead of hand-written loops. They help you state your intent. In this case I see std::transform() for the outer loop of the sieve, std::any_of() for the inner loop, std::generate_n() for filling tosieve at the beginning and std::copy_if() for filling sieved (untested code):
vector<int> tosieve = {};
tosieve.reserve(1000001);
// fill with the values 2..1000000
generate_n(back_inserter(tosieve), 999999, [i = 2]() mutable -> int {
    return i++;
});
// the value i is stored at index i - 2, so [begin, begin + i - 2) holds the smaller values
transform(begin(tosieve), end(tosieve), begin(tosieve), [&tosieve](int i) -> int {
    return any_of(begin(tosieve), begin(tosieve) + i - 2,
        [i](int j) -> bool {
            return j != 0 && i % j == 0;
        }) ? 0 : i;
});
// build the filtered vector and move it into tosieve
tosieve = [&tosieve]() -> vector<int> {
    auto sieved = vector<int>{};
    copy_if(begin(tosieve), end(tosieve), back_inserter(sieved),
        [](int i) -> bool { return i != 0; });
    return sieved;
}();
EDIT:
Yet another way to get that done:
vector<int> tosieve = {};
tosieve.reserve(1000001);
// fill with the values 2..1000000
generate_n(back_inserter(tosieve), 999999, [i = 2]() mutable -> int {
    return i++;
});
// build the filtered vector and move it into tosieve
tosieve = [&tosieve]() -> vector<int> {
    auto sieved = vector<int>{};
    copy_if(begin(tosieve), end(tosieve), back_inserter(sieved),
        [&tosieve](int i) -> bool {
            // the value i is stored at index i - 2
            return !any_of(begin(tosieve), begin(tosieve) + i - 2,
                [i](int j) -> bool {
                    return i % j == 0;
                });
        });
    return sieved;
}();
Now, instead of marking the elements we don't want and filtering them out afterwards, we directly copy only the elements we want to keep. This is not only faster than the above suggestion, but also states the intent better.
You have a very interesting task. Thanks!
I enjoyed implementing my own solutions from scratch.
I created 3 separate (independent) functions, all based on the Sieve of Eratosthenes. The 3 versions differ in complexity and speed.
Just a quick note: my simplest (slowest) version finds all primes below your desired limit of 10'000'000 within just 0.025 sec (i.e. 25 milliseconds).
I also tested all 3 versions on finding the primes below 2^32 (4'294'967'296), which the "Simple" version solves within 47 seconds, the "Intermediate" version within 30 seconds, and the "Advanced" version within 12 seconds. So within just 12 seconds it finds all primes below 4 billion (there are 203'280'221 primes below 2^32, see the OEIS sequence)!
For simplicity I will describe in detail only the Simple version of the 3. Here's the code:
#include <cstddef>
#include <cstdint>
#include <vector>

using u8 = std::uint8_t; // byte type for the bit array (alias assumed; it is not defined in this excerpt)

template <typename T>
std::vector<T> GenPrimes_SieveOfEratosthenes(size_t end) {
    // https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
    if (end <= 2)
        return {};
    size_t const cnt = end >> 1; // number of odd candidates below end
    std::vector<u8> composites((cnt + 7) / 8);
    auto Get = [&](size_t i){ return bool((composites[i / 8] >> (i % 8)) & 1); };
    auto Set = [&](size_t i){ composites[i / 8] |= u8(1) << (i % 8); };
    std::vector<T> primes = {2};
    size_t i = 0;
    for (i = 1; i < cnt; ++i) {
        if (Get(i))
            continue;
        // bit i stands for the odd number 2 * i + 1
        size_t const p = 2 * i + 1, start = (p * p) >> 1;
        primes.push_back(p);
        if (start >= cnt)
            break;
        for (size_t j = start; j < cnt; j += p)
            Set(j);
    }
    // everything still unmarked after the sieving phase is prime
    for (i = i + 1; i < cnt; ++i)
        if (!Get(i))
            primes.push_back(2 * i + 1);
    return primes;
}
This code implements the simplest but still fast algorithm for finding primes, the Sieve of Eratosthenes. As a small optimization of speed and memory, I search only over odd numbers. This odd-numbers optimization lets me use half as much memory and do half as many steps, hence it improves both speed and memory consumption by a factor of 2.
The algorithm is simple: we allocate an array of bits, where the bit at position K is 1 if the corresponding number is composite, or 0 if it is probably prime. At the end, all 0 bits in the array signify definite primes. Also, due to the odd-numbers optimization this bit array stores only odd numbers, so the K-th bit actually stands for the number 2 * K + 1.
Then we go over this array of bits from left to right, and if we meet a 0 bit at position K it means we have found a prime number P = 2 * K + 1; starting from position (P * P) / 2 we then set every P-th bit to 1. This marks all odd multiples of P from P * P upward as composite, because they are divisible by P.
We do this procedure only until P * P becomes greater than or equal to our limit End (we're finding all primes < End). This limit guarantees that after reaching it ALL zero bits in the array signify prime numbers.
The second version makes a single optimization to the Simple version: it is multi-threaded (multi-core). But this one optimization makes the code much bigger and more complex. Basically, it splits the whole range of bits across all cores, so that they write bits to memory in parallel.
Beyond that, I'll explain only my third, Advanced version, which is the most complex of the 3. It does not only the multi-threaded optimization, but also a so-called primorial optimization.
What is a primorial? It is the product of the first few primes; for example, I take the primorial 2 * 3 * 5 * 7 = 210.
We can see that any primorial splits the infinite range of integers into wheels by the modulus of this primorial. For example, primorial 210 splits it into the ranges [0; 210), [210; 2 * 210), [2 * 210; 3 * 210), etc.
Now it is easy to prove mathematically that inside every such range we can mark the same positions as composite: namely, all numbers that are multiples of 2, 3, 5 or 7.
We can see that out of the 210 remainders there are 162 remainders that are for sure composite, and only 48 remainders are probably prime.
Hence it is enough for us to check the primality of only 48/210 = 22.9% of the whole search space. This reduction of the search space makes the task more than 4x faster and uses more than 4x less memory.
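The count of 48 candidate remainders can be checked with a few lines of code. This is a sketch independent of the linked source; wheel_residues is a hypothetical name, and it simply enumerates the remainders modulo the primorial that are not divisible by any of the wheel primes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: list the remainders modulo the product of
// `wheel_primes` that are divisible by none of those primes.
std::vector<int> wheel_residues(const std::vector<int>& wheel_primes) {
    int primorial = 1;
    for (int p : wheel_primes) primorial *= p;

    std::vector<int> residues;
    for (int r = 1; r < primorial; ++r) {
        const bool coprime = std::none_of(wheel_primes.begin(), wheel_primes.end(),
                                          [r](int p) { return r % p == 0; });
        if (coprime) residues.push_back(r);
    }
    return residues;
}
```

For the wheel primes {2, 3, 5, 7} this yields 48 remainders out of 210, and for {2} it yields only the remainder 1, which corresponds to the odd-numbers-only sieve.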
One can see that my first, Simple version, due to its odd-only optimization, was in fact already using the primorial optimization with primorial 2. Yes, if we take primorial 2 instead of primorial 210, we get exactly the first (Simple) algorithm.
All 3 of my versions are tested for correctness and speed, although some tiny bugs may still remain. Note: it is recommended not to use my code in production straight away unless it is tested thoroughly.
All 3 versions are tested for correctness by cross-checking each other's answers. I thoroughly test correctness by feeding in all limits (end values) from 0 to 2^18. It takes some time to do this.
See main() function to figure out how to use my functions.
Try it online!
SOURCE CODE GOES HERE. Due to StackOverflow limit of 30K symbols per post, I can't inline source code here, as it is almost 30K in size and together with English post above it takes more than 30K. So I'm providing source code on separate Github Gist server, link below. Note that Try it online! link above also contains full source code, but I reduced search limit of 2^32 to smaller one due to GodBolt limit of running time to 3 seconds.
Github Gist code
Output:
10M time 'Simple' 0.024 sec
Time 2^32 'Simple' 46.924 sec, number of primes 203280221
Time 2^32 'Intermediate' 30.999 sec
Time 2^32 'Advanced' 11.359 sec
All checked till 0
All checked till 5000
All checked till 10000
All checked till 15000
All checked till 20000
All checked till 25000

C++: How to pick out last quarter of elements in a vector?

What is the best way to pick out the last quarter of the elements in a vector containing N elements?
size_t n = src.size();
std::vector<int> dest(src.begin() + (3*n)/4, src.end());
dest contains the last quarter elements from the source vector src.
You can also use std::copy from the <algorithm> header:
std::vector<int> dest_copy;
std::copy(src.begin() + (3*n)/4, src.end(), std::back_inserter(dest_copy));
See the online demo at ideone : http://ideone.com/qrVod
I think you may want to work more on the expression (3*n)/4. Integer division rounds the start index down, so the slice contains n/4 elements rounded up: when n is 5 it picks 2 elements, although you may want only 1, and when n is 7 it also picks 2, where you may or may not prefer 1. So this decision is up to you. My solution just shows how to copy the elements once you have decided exactly how many.
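The rounding choice can be made explicit by computing the element count first and deriving the start index from it. This is a sketch; last_quarter and its round_up parameter are hypothetical names introduced here. With round_up set to true it matches the (3*n)/4 start index used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the last quarter of `src`.
// round_up == true  -> ceil(n / 4) elements (same slice as starting at (3*n)/4)
// round_up == false -> floor(n / 4) elements
template <typename T>
std::vector<T> last_quarter(const std::vector<T>& src, bool round_up) {
    const std::size_t n = src.size();
    const std::size_t count = round_up ? (n + 3) / 4 : n / 4;
    // n - count is the index of the first element to keep.
    return std::vector<T>(src.end() - static_cast<std::ptrdiff_t>(count), src.end());
}
```

For a 5-element vector this returns 2 elements when rounding up and 1 element when rounding down.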
Something like this, I guess:
size_t lastQuarter = myVector.size() * 3 / 4;
for (size_t i = lastQuarter; i < myVector.size(); i++)
{
doSomething(myVector.at(i));
}