Better way to implement count_permutations? - c++

I need a function count_permutations() that returns the number of permutations of a given range. Assuming that the range is allowed to be modified, and starts at the first permutation, I could naively implement this as repeated calls to next_permutation() as below:
template<class Ret, class Iter>
Ret count_permutations(Iter first, Iter last)
{
Ret ret = 0;
do {
++ret;
} while (next_permutation(first, last));
return ret;
}
Is there a faster way that doesn't require iterating through all the permutations to find the answer? It could still assume that the input can be modified and starts at the first permutation, but obviously if it is possible to implement without those assumptions it'd be great too.

The number of permutations for a range where all the elements are unique is n! where n is the length of the range.
If there are duplicate elements, you can use n!/(n_0! * ... * n_m!), where n_0...n_m are the numbers of occurrences of each distinct value.
So for example [1,2,3] has 3! = 6 permutations while [1,2,2] has 3!/2! = 3 permutations.
EDIT: A better example is [1,2,2,3,3,3] which has 6!/(2! * 3!) = 720/12 = 60 permutations.

In math, the factorial function n! represents the number of permutations of n distinct elements.
As Can Berg and Greg suggested, if there are repeated elements in a set, to take them into account, we must divide the factorial by the number of permutations of each indistinguishable group (groups composed of identical elements).
The following implementation counts the number of permutations of the elements in the range [first, end). The range is not required to be sorted.
// generic factorial implementation...
int factorial(int number) {
int temp;
if(number <= 1) return 1;
temp = number * factorial(number - 1);
return temp;
}
template<class Ret, class Iter>
Ret count_permutations(Iter first, Iter end)
{
std::map<typename Iter::value_type, int> counter;
Iter it = first;
for( ; it != end; ++it) {
counter[*it]++;
}
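int n = 1; // product of the factorials of the group sizes
typename std::map<typename Iter::value_type, int>::iterator mi = counter.begin();
for(; mi != counter.end() ; mi++)
if ( mi->second > 1 )
n *= factorial(mi->second); // divide by each group's permutations, so multiply the divisors
return factorial(std::distance(first,end))/n;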
}
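The factorial-based version overflows int quickly (13! already exceeds 32 bits). A minimal sketch of one way around that, assuming the final count fits in std::uint64_t: build the multinomial as a product of binomial coefficients, so intermediate values stay close to the size of the result. The name count_permutations_multinomial is hypothetical, and std::iterator_traits is used here so that plain pointers work as iterators too.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Counts the distinct permutations of [first, last) without computing n!.
// Assumption: the final count fits in std::uint64_t (larger inputs still overflow).
template <class Iter>
std::uint64_t count_permutations_multinomial(Iter first, Iter last)
{
    typedef typename std::iterator_traits<Iter>::value_type value_type;

    // Tally how many times each distinct value occurs.
    std::map<value_type, std::uint64_t> counts;
    for (; first != last; ++first)
        ++counts[*first];

    std::uint64_t result = 1;
    std::uint64_t placed = 0; // elements accounted for so far
    for (const auto& entry : counts) {
        // Multiply by C(placed + k, k) for a group of size k, one factor at a
        // time. Each step is exact because C(n, i) = C(n - 1, i - 1) * n / i.
        for (std::uint64_t i = 1; i <= entry.second; ++i) {
            ++placed;
            result = result * placed / i;
        }
    }
    return result;
}
```

The range does not need to be sorted, and it is not modified.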

Related

Getting the nth value of any STL container and manipulating iterators

I'm trying to write a function that can do a binary search for any given STL container (e.g. lists, vectors, etc.). My approach was to use two iterators, one for the beginning and one for the end of the sequence.
template <class ln, class T> int binarysearch_list(ln first, ln last, T search_value) {
int low = 0;
int high = 0;
int midpoint = 0;
int start = 0;
while(first != last) {
first++;
start++;
}
high = start;
while(low <= high) {
ln current = next(first, midpoint);
if(search_value == *current) { // if we already find search_value at the midpoint we can simply exit
return midpoint;
} else if(search_value > midpoint) {
low = midpoint + 1; // change the low to the next element after midpoint
} else if(search_value < midpoint) {
high = midpoint - 1;
}
}
return -1;
}
The first while loop is used to get the size; the second one is the actual binary search. I get an error at ln current = next(first, midpoint), and I'm not sure how to increment the iterator. Ideally, where you see "if(search_value == *current)" it will act like vector[midpoint] or list(midpoint). I need a way to read the nth element of any given sequence using a begin and end iterator, and I need a way to flexibly increment an iterator by n. I've tried advance(first, amount), (first+amount), and many others but I'm stumped!
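One possible sketch of what is being asked for, assuming C++11 and a sorted input range: std::distance measures the range without moving first (the iterator is passed by value), and std::next(first, n) returns a copy advanced by n positions. The name binary_search_index is hypothetical. In the code above, the first loop advances first all the way to last, so there is nothing left to step from, and midpoint is never computed.

```cpp
#include <iterator>

// Returns the index of search_value in the sorted range [first, last),
// or -1 if it is absent. Works with any forward iterator; for std::list,
// std::next walks node by node, so each probe costs O(n) iterator steps.
template <class Iter, class T>
long binary_search_index(Iter first, Iter last, const T& search_value)
{
    long low = 0;
    long high = static_cast<long>(std::distance(first, last)) - 1;
    while (low <= high) {
        long midpoint = low + (high - low) / 2;
        Iter current = std::next(first, midpoint); // first itself never moves
        if (*current == search_value) {
            return midpoint;
        } else if (*current < search_value) {
            low = midpoint + 1;  // continue in the upper half
        } else {
            high = midpoint - 1; // continue in the lower half
        }
    }
    return -1;
}
```

The comparisons are made against the element *current, not against the index midpoint.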

Find First Missing Element in a vector

This question has been asked before but I cannot find it for C++.
If I have a vector and I have a starting number, does the <algorithm> header provide me a way to find the next highest missing number?
I can obviously write this in a nested loop, I just can't shake the feeling that I'm reinventing the wheel.
For example, given: vector<int> foo{13,8,3,6,10,1,7,0};
The starting number 0 should find 2.
The starting number 6 should find 9.
The starting number -2 should find -1.
EDIT:
Thus far all the solutions require sorting. This may in fact be required, but a temporary sorted vector would have to be created to accommodate this, as foo must remain unchanged.
At least as far as I know, there's no standard algorithm that directly implements exactly what you're asking for.
If you wanted to do it with something like O(N log N) complexity, you could start by sorting the input. Then use std::upper_bound to find the position just past the (last instance of the) number you've asked for, if present. From there you'd scan for a difference greater than 1 between consecutive numbers in the collection.
One way to do this in real code would be something like this:
#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <iterator>
int find_missing(std::vector<int> x, int number) {
std::sort(x.begin(), x.end());
auto pos = std::upper_bound(x.begin(), x.end(), number);
if (pos == x.end() || *pos - number > 1) // nothing larger than number, or a gap right after it
return number + 1;
else {
std::vector<int> diffs;
std::adjacent_difference(pos, x.end(), std::back_inserter(diffs));
auto pos2 = std::find_if(diffs.begin() + 1, diffs.end(), [](int x) { return x > 1; });
return *(pos + (pos2 - diffs.begin() - 1)) + 1;
}
}
int main() {
std::vector<int> x{ 13, 8, 3, 6, 10, 1,7, 0};
std::cout << find_missing(x, 0) << "\n";
std::cout << find_missing(x, 6) << "\n";
}
This does more work than is strictly optimal, in order to present the external appearance of a vector that remains unsorted (and unmodified in any way). I've done that by creating a copy of the vector and sorting the copy inside the find_missing function, so the original vector remains unmodified. The disadvantage is obvious: if the vector is large, copying it can be expensive. Furthermore, this sorts the vector for every query instead of sorting once and then carrying out as many queries as desired on it.
So I thought I'd post an answer. I don't know of anything in <algorithm> that accomplishes this directly, but in combination with vector<bool> you can do this in O(N) time (two linear passes) with O(N) extra space.
template <typename T>
T find_missing(const vector<T>& v, T elem){
vector<bool> range(v.size());
elem++;
for_each(v.begin(), v.end(), [&](const T& i){ if (i >= elem && static_cast<size_t>(i - elem) < range.size()) range[i - elem] = true; });
auto result = distance(range.begin(), find(range.begin(), range.end(), false));
return result + elem;
}
First you need to sort the vector. Use std::sort for that.
std::lower_bound finds the first element that is greater than or equal to a given element. (The range has to be sorted, or at least partitioned with respect to that element.)
From there you iterate while you have consecutive elements.
Dealing with duplicates: One way is the way I went: consider consecutive and equal elements when iterating. Another approach is to add a prerequisite that the vector / range contains unique elements. I chose the former because it avoids erasing elements.
Here is how you eliminate duplicates from a sorted vector:
v.erase(std::unique(v.begin(), v.end()), v.end());
My implementation:
// finds the first missing element in the vector v
// prerequisite: v must be sorted
auto firstMissing(std::vector<int> const &v, int elem) -> int {
auto low = std::lower_bound(std::begin(v), std::end(v), elem);
if (low == std::end(v) || *low != elem) {
return elem;
}
while (low + 1 != std::end(v) &&
(*low == *(low + 1) || *low + 1 == *(low + 1))) {
++low;
}
return *low + 1;
}
And a generalized version:
// finds the first missing element in the range [first, last)
// prerequisite: the range must be sorted
template <class It, class T = decltype(*std::declval<It>())>
auto firstMissing(It first, It last, T elem) -> T {
auto low = std::lower_bound(first, last, elem);
if (low == last || *low != elem) {
return elem;
}
while (std::next(low) != last &&
(*low == *std::next(low) || *low + 1 == *std::next(low))) {
std::advance(low, 1);
}
return *low + 1;
}
Test case:
int main() {
auto v = std::vector<int>{13, 8, 3, 6, 10, 1, 7, 7, 7, 0};
std::sort(v.begin(), v.end());
for (auto n : {-2, 0, 5, 6, 20}) {
cout << n << ": " << firstMissing(v, n) << endl;
}
return 0;
}
Result:
-2: -2
0: 2
5: 5
6: 9
20: 20
A note about sorting: From the OP's comments he was searching for a solution that wouldn't modify the vector.
You have to sort the vector for an efficient solution. If modifying the vector is not an option you could create a copy and work on it.
If you are hell-bent on not sorting, there is a brute force solution (very very inefficient - O(n^2)):
auto max = std::max_element(std::begin(v), std::end(v));
if (elem > *max) {
return elem;
}
auto i = elem;
while (std::find(std::begin(v), std::end(v), i) != std::end(v)) {
++i;
}
return i;
First solution:
Sort the vector. Find the starting number and see what number is next.
This will take O(NlogN) where N is the size of vector.
Second solution:
If the range of numbers is small e.g. (0,M) you can create boolean vector of size M. For each number of initial vector make the boolean of that index true. Later you can see next missing number by checking the boolean vector. This will take O(N) time and O(M) auxiliary memory.

Time complexity in terms of big O for a reverse vector

template <typename T>
void reverseVector(vector<T> &vec, int start, int end) {
if(start < end) {
T temp = vec[start];
vec[start] = vec[end];
vec[end] = temp;
reverseVector(vec, start + 1, end - 1);
}
}
Assuming N = vec.size() what would be the time complexity of this method?
Assuming I am correct, getting and setting an element of a vector takes O(1) time. Thus, the first 3 lines in the if statement are O(1) each. Then, the method recursively calls itself and each time the range becomes smaller, iterating n(n-1)(n-2)... times. So my answer would be O(n!) for this method. Am I correct?
edit: similar syntax, but with linked list
template <typename T>
void reverseLinkedList(list<T> &lst, int start, int end) {
if(start < end) {
T temp = lst[start];
lst[start] = lst[end];
lst[end] = temp;
reverseLinkedList(lst, start + 1, end - 1);
}
}
That is O(n).
You swap elements n/2 times: (0 with n-1), (1 with n-2), ... (n/2 - 1 with n/2 + 1)
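To make the count concrete: each recursive call does O(1) work and shrinks the range by two elements, so T(n) = T(n-2) + O(1), which is O(n). The work per call is added up, not multiplied, which is why n! does not apply. A small sketch of an iterative equivalent that also reports how many swaps it performed (reverse_counting_swaps is a hypothetical name, not from the question):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Iterative equivalent of the recursive reversal above.
// Returns the number of swaps performed, which is floor(n / 2).
template <typename T>
std::size_t reverse_counting_swaps(std::vector<T>& vec)
{
    std::size_t swaps = 0;
    if (vec.empty()) return swaps;
    std::size_t start = 0;
    std::size_t end = vec.size() - 1;
    while (start < end) {
        std::swap(vec[start], vec[end]); // O(1) per step
        ++start;
        --end;
        ++swaps;
    }
    return swaps;
}
```

For the std::list variant, indexing would not be O(1) even if it compiled (std::list has no operator[]), so reaching both ends by index would cost O(n) per step and O(n^2) overall.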

What is the fastest way to find longest 'consecutive numbers' streak in vector ?

I have a sorted std::vector<int> and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4 and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?
No, you can do this in one pass through the vector, storing only the start point and length of the longest streak found so far. You also need far fewer than 'N' comparisons. [*]
hint: If you already have, say, a match of length 4 ending at the 5th position (value 6), which position do you have to check next?
[*] left as an exercise to the reader to work out the likely O( ) complexity ;-)
It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j] in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
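As a minimal sketch, the criterion can be written as a helper like the one below. The name is_consecutive is hypothetical, and the function assumes exactly what the paragraph above states: the vector is sorted ascending, contains no repeated values, and i <= j < v.size().

```cpp
#include <cstddef>
#include <vector>

// True when v[i..j] (inclusive) is a run of consecutive integers.
// Assumes v is sorted ascending with no duplicates and i <= j < v.size().
inline bool is_consecutive(const std::vector<int>& v, std::size_t i, std::size_t j)
{
    // Widen both sides so the subtraction cannot overflow and the
    // comparison does not mix signed and unsigned operands.
    return static_cast<long long>(v[j]) - v[i] == static_cast<long long>(j - i);
}
```

With the example data 1 3 4 5 6 8 9, the range [1, 4] (values 3 to 6) satisfies the criterion, while [0, 4] does not.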
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
Input of recursive step is the range of elements [i, j]. Initially it is [0, n-1] - the whole array.
Apply the above criterion to range [i, j]. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
Otherwise (if the range is not consecutive), divide it into two equal parts [i, m] and [m+1, j].
Recursively invoke the algorithm on the lower part ([i, m]) and then on the upper part ([m+1, j]).
The above algorithm will perform binary partition of the array and recursive descent of the partition tree using the left-first approach. This means that this algorithm will find adjacent subranges with consecutive elements in left-to-right order. All you need to do is to join the adjacent subranges together. When you receive a subrange [i, j] that was "sent to output" at step 2, you have to concatenate it with previously received subranges, if they are indeed consecutive. Or you have to start a new range, if they are not consecutive. All the while you have to keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, its worst-case performance (if there are no consecutive subranges at all) is still O(n). In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
#include <cassert>
#include <utility>
#include <vector>
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}
I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < vec.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
You can't solve this problem in less than O(N) time. Imagine your list is the first N-1 even numbers, plus a single odd number (chosen from among the first N-1 odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.
Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) (sizeof(x) / sizeof((x)[0]))
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}

categorize a double into arbitrary bins

I'm looking for a class that categorizes floating point numbers into arbitrary bins. The desired syntax would be something like:
std::vector<double> bin_vector;
// ..... fill the vector with 1, 1.4, 5, etc not evenly spaced values
Binner bins(bin_vector);
for (std::vector<double>::const_iterator d_itr = some_vector.begin();
d_itr != some_vector.end(); d_itr++) {
int bin = bins.categorize(*d_itr);
// bin would be 0 for x < 1, 1 for 1 < x < 1.4, etc
// do something with bin
}
Unfortunately, due to portability requirements I'm limited to boost and stl. I've rolled my own O(log n) solutions using maps and overloading < for a custom range object, but that solution seemed bug prone and ugly at best.
Is there some simple stl or boost object solution to this?
Use a std::map, mapping interval boundaries to bin numbers. Then use .upper_bound() to find the bin.
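A minimal sketch of that suggestion, using the Binner name and categorize() call from the question's desired syntax. The class layout and the private member bins_ are assumptions made for illustration. Each boundary maps to the index of the bin just below it, so upper_bound(x), which finds the first boundary strictly greater than x, yields the bin directly.

```cpp
#include <cstddef>
#include <map>
#include <vector>

class Binner {
public:
    // Boundaries need not be sorted: std::map orders them, and duplicates collapse.
    explicit Binner(const std::vector<double>& boundaries)
    {
        for (std::size_t i = 0; i < boundaries.size(); ++i)
            bins_[boundaries[i]] = 0;
        // Number the boundaries in ascending order: boundary k closes bin k.
        int index = 0;
        for (std::map<double, int>::iterator it = bins_.begin(); it != bins_.end(); ++it)
            it->second = index++;
    }

    // Returns 0 for x below the first boundary, and the number of
    // boundaries for x at or above the last one. O(log n) per lookup.
    int categorize(double x) const
    {
        std::map<double, int>::const_iterator it = bins_.upper_bound(x);
        return it == bins_.end() ? static_cast<int>(bins_.size()) : it->second;
    }

private:
    std::map<double, int> bins_;
};
```

With boundaries 1, 1.4 and 5, a value exactly on a boundary lands in the upper bin, so categorize(1.0) gives 1. A sorted std::vector with std::upper_bound would give the same result with less memory overhead.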
Here is an untested generic algorithm that takes an input vector of arbitrary length M and a sorted vector of N-1 bin boundaries, and that returns a vector of N bin counts. Bin i counts the values in the interval [breaks[i-1], breaks[i]). The types T1 and T2 should be mutually comparable. Complexity is equal to O(M * log (N)).
#include<algorithm> // std::is_sorted, std::upper_bound
#include<cassert> // assert
#include<cstddef> // std::size_t
#include<iterator> // std::distance
#include<vector> // std::vector
template<typename T1, typename T2>
std::vector<std::size_t> bin_count(const std::vector<T1>& input, const std::vector<T2>& breaks)
{
// breaks is a sorted vector -INF < a0 < a1 < ... < aN-2 < +INF
assert(std::is_sorted(breaks.begin(), breaks.end()));
auto N = breaks.size() + 1;
std::vector<std::size_t> output(N, 0);
if (N == 1) {
// everything is inside [-INF, INF)
output[0] = input.size();
return output;
}
for(auto it = input.begin(); it != input.end(); ++it) {
if (*it < breaks.front()) {
// first bin counts values in [-INF, a0)
++output[0];
continue; // move on to the next input value
}
if (*it >= breaks.back()) {
// last bin counts values in [aN-2, +INF)
++output[N-1];
continue; // move on to the next input value
}
// the first break greater than *it is the upper edge of its bin,
// so that break's index is the bin index
const auto break_it = std::upper_bound(breaks.begin(), breaks.end(), *it);
const auto bin_index = std::distance(breaks.begin(), break_it);
++output[bin_index];
}
return output;
}