C++ Binomial Coefficient is too slow

I've tried to compute the binomial coefficient with a recursion based on Pascal's triangle. It works great for small numbers, but from 20 upwards it is either really slow or doesn't work at all.
I've tried to look up some optimization techniques, such as "caching", but they don't really seem to be well integrated in C++.
Here's the code, if that helps you.
int binom(const int n, const int k)
{
    double sum;
    if(n == 0 || k == 0){
        sum = 1;
    }
    else{
        sum = binom(n-1, k-1) + binom(n-1, k);
    }
    if((n == 1 && k == 0) || (n == 1 && k == 1))
    {
        sum = 1;
    }
    if(k > n)
    {
        sum = 0;
    }
    return sum;
}
int main()
{
    int n;
    int k;
    int sum;
    cout << "Enter a n: ";
    cin >> n;
    cout << "Enter a k: ";
    cin >> k;
    sum = binom(n, k);
    cout << endl << endl << "Number of possible combinations: " << sum << endl;
}
My guess is that the program wastes a lot of time calculating results it has already calculated. It somehow needs to memorize past results.

My guess is that the program wastes a lot of time calculating results it has already calculated.
That's definitely true.
On this topic, I'd suggest you have a look at dynamic programming.
There is a class of problems that naively require exponential runtime but can be solved with dynamic programming techniques.
That reduces the runtime complexity to polynomial (most of the time at the expense of increased space complexity).
The common approaches for dynamic programming are:
Top-Down (exploiting memoization and recursion; a minimal sketch follows this list).
Bottom-Up (iterative).
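For the top-down flavor, here is a minimal memoized sketch (not part of the original answer; the function and table names are illustrative):

#include <map>
#include <utility>

// Top-down: the same recursion as Pascal's rule, but each (n, k) pair is
// computed at most once and afterwards served from the table.
long long binomTopDown(int n, int k)
{
    static std::map<std::pair<int, int>, long long> memo;
    if (k < 0 || k > n) return 0;
    if (k == 0 || k == n) return 1;
    const auto it = memo.find({n, k});
    if (it != memo.end()) return it->second;
    const long long result = binomTopDown(n - 1, k - 1) + binomTopDown(n - 1, k);
    memo[{n, k}] = result;
    return result;
}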
Below is my bottom-up solution (fast and compact):
int BinomialCoefficient(const int n, const int k) {
    if (k == 0) return 1; // guard: avoids indexing an empty vector
    std::vector<int> aSolutions(k);
    aSolutions[0] = n - k + 1;
    for (int i = 1; i < k; ++i) {
        aSolutions[i] = aSolutions[i - 1] * (n - k + 1 + i) / (i + 1);
    }
    return aSolutions[k - 1];
}
This algorithm has O(k) runtime complexity and O(k) space complexity.
Indeed, it is linear in k.
Moreover, this solution is simpler and faster than the recursive approach, and it is very CPU cache-friendly.
Note also that the runtime does not depend on n.
I achieved this result by exploiting simple math operations, starting from the following identity:
C(n, k) = C(n - 1, k - 1) * n / k
which follows from n! / (k! (n - k)!) = (n / k) * (n - 1)! / ((k - 1)! (n - k)!).
Some math references on the binomial coefficient.
Note
The algorithm does not really need O(k) space.
Indeed, the solution at the i-th step depends only on the (i-1)-th.
Therefore, there is no need to store all intermediate solutions, just the one from the previous step; that would make the algorithm O(1) in terms of space complexity.
However, I prefer keeping all intermediate solutions in the solution code, to better show the principle behind the dynamic programming methodology.
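For illustration, a sketch of the O(1)-space variant described in this note (same recurrence, a single running value; the sketch is mine, not from the answer's repository):

int BinomialCoefficientConstantSpace(const int n, const int k) {
    int result = 1;
    // After iteration i, result holds C(n - k + i, i); after i = k it is C(n, k).
    for (int i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i; // division is exact at every step
    }
    return result;
}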
Here is my repository with the optimized algorithm.

I would cache the results of each calculation in a map. You can't index a map with two separate values directly, but you could combine them into a string key (or use a std::pair as the key).
string key = to_string(n) + "," + to_string(k);
Then have a global map:
map<string, double> cachedValues;
You can then do a lookup with the key and, if found, return immediately; otherwise, store the result in the map before returning.
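As a sketch of that lookup-then-store pattern, using the global map above (the wrapper name is mine):

double binom_cached(int n, int k)
{
    string key = to_string(n) + "," + to_string(k);
    auto it = cachedValues.find(key);
    if (it != cachedValues.end())
        return it->second;           // found: return immediately
    double result = binom(n, k);     // not found: compute...
    cachedValues[key] = result;      // ...and store before returning
    return result;
}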
I began mapping out what would happen with a call to 4,5. It gets messy, with a LOT of calculations; each level of recursion roughly doubles the number of calls.
I don't know if your basic algorithm is correct, but if so, then I'd move this code to the top of the method:
if(k > n)
{
    return 0;
}
As it appears that if k > n, you always return 0, even for something like 6,100. I don't know whether that's correct, however.

You're computing some binomial values multiple times. A quick solution is memoization.
Untested:
#include <map>
#include <optional>
#include <utility>

int binom(int n, int k);

int binom_mem(int n, int k)
{
    static std::map<std::pair<int, int>, std::optional<int>> lookup_table;
    auto const input = std::pair{n, k};
    if (lookup_table[input].has_value() == false) {
        lookup_table[input] = binom(n, k);
    }
    return lookup_table[input].value(); // unwrap the optional
}

int binom(int n, int k)
{
    double sum;
    if (n == 0 || k == 0){
        sum = 1;
    } else {
        sum = binom_mem(n-1, k-1) + binom_mem(n-1, k);
    }
    if ((n == 1 && k == 0) || (n == 1 && k == 1))
    {
        sum = 1;
    }
    if (k > n)
    {
        sum = 0;
    }
    return sum;
}
A better solution would be to make the recursion tail-recursive (not easy with double recursion) or, better yet, not use recursion at all ;)

I found this very simple (perhaps a bit slow) way of writing the binomial coefficient, which works even for non-integer upper arguments, based on this proof (written by me):
double binomial_coefficient(float k, int a) {
    double b = 1;
    for (int p = 1; p <= a; p++) {
        b = b * (k + 1 - p) / p;
    }
    return b;
}
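As a quick sanity check of the non-integer case (my example, not the author's): C(1/2, 3) is the x^3 coefficient in the binomial series of sqrt(1 + x).

// C(0.5, 3) = (0.5)(-0.5)(-1.5)/3! = 0.0625
double c = binomial_coefficient(0.5f, 3); // ≈ 0.0625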

If you can tolerate spending some compile-time memory, you can pre-compute a Pascal triangle at compile time. With a simple lookup mechanism, this gives you maximum speed.
The downside is that the values only fit in 64 bits up to row 67: C(68, 34) already overflows an unsigned long long.
So, we simply use a constexpr function and calculate the values of the Pascal triangle in a 2-dimensional compile-time constexpr std::array.
The nCr function then simply indexes into that array (into Pascal's triangle).
Please see the following example code:
#include <iostream>
#include <utility>
#include <array>
#include <iomanip>
#include <cmath>

// Number of rows that fit in a 64-bit variable: rows 0..67 (C(68, 34) would overflow)
constexpr size_t MaxN = 68u;

// If we store the Pascal triangle in a 2-dimensional array, this will be its size
constexpr size_t ArraySize = MaxN;

// This function generates Pascal's triangle, stored in a 2-dimensional std::array
constexpr auto calculatePascalTriangle() {
    // Result of the function: Pascal's triangle as a 2-dimensional array
    std::array<std::array<unsigned long long, ArraySize>, ArraySize> pascalTriangle{};
    // Go through all rows and columns of Pascal's triangle
    for (size_t row{}; row < MaxN; ++row) {
        for (size_t col{}; col <= row; ++col) {
            // Border values are always one
            unsigned long long result{ 1 };
            if (col != 0 && col != row) {
                // Calculate the new value for the current row
                result = pascalTriangle[row - 1][col - 1] + pascalTriangle[row - 1][col];
            }
            // Store the new value
            pascalTriangle[row][col] = result;
        }
    }
    // Return the array as the function result
    return pascalTriangle;
}
// This is a constexpr std::array<std::array<unsigned long long, ArraySize>, ArraySize> named PPP, containing all nCr results
constexpr auto PPP = calculatePascalTriangle();

// To calculate nCr, we simply look the value up in the array
constexpr unsigned long long nCr(size_t n, size_t r) {
    return PPP[n][r];
}
// Some debug test driver code. Print a Pascal triangle
int main() {
    constexpr size_t RowsToPrint = 16u;
    const size_t digits = static_cast<size_t>(std::ceil(std::log10(nCr(RowsToPrint, RowsToPrint / 2))));
    for (size_t row{}; row < RowsToPrint; ++row) {
        std::cout << std::string((RowsToPrint - row) * ((digits + 1) / 2), ' ');
        for (size_t col{}; col <= row; ++col)
            std::cout << std::setw(digits) << nCr(row, col) << ' ';
        std::cout << '\n';
    }
    return 0;
}
We can also store Pascal's triangle in a 1-dimensional constexpr std::array. Then we additionally need the triangle numbers to find the start index of each row, but this too can be done completely at compile time.
The solution then looks like this:
#include <iostream>
#include <utility>
#include <array>
#include <iomanip>
#include <cmath>

// Number of rows that fit in a 64-bit variable; the largest stored value is C(67, 33) = 14226520737620288370
constexpr size_t MaxN = 68u;

// If we store the Pascal triangle in a 1-dimensional array, this will be its size
constexpr size_t ArraySize = (MaxN + 1) * MaxN / 2;

// Offset of a row of a Pascal triangle stored in a 1-dimensional array
constexpr size_t getTriangleNumber(size_t row) {
    size_t sum{};
    for (size_t i = 1; i <= row; i++) sum += i;
    return sum;
}

// Generate a std::array with n elements of a given type and a generator function
template <typename DataType, DataType(*generator)(size_t), size_t... ManyIndices>
constexpr auto generateArray(std::integer_sequence<size_t, ManyIndices...>) {
    return std::array<DataType, sizeof...(ManyIndices)>{ { generator(ManyIndices)... } };
}

// This is a std::array<size_t, MaxN> named TriangleNumber, containing the triangle numbers up to MaxN
constexpr auto TriangleNumber = generateArray<size_t, getTriangleNumber>(std::make_integer_sequence<size_t, MaxN>());

// This function generates Pascal's triangle, stored in a 1-dimensional std::array
constexpr auto calculatePascalTriangle() {
    // Result of the function: Pascal's triangle as a 1-dimensional array
    std::array<unsigned long long, ArraySize> pascalTriangle{};
    size_t index{}; // Running index for storing values in the array
    // Go through all rows and columns of Pascal's triangle
    for (size_t row{}; row < MaxN; ++row) {
        for (size_t col{}; col <= row; ++col) {
            // Border values are always one
            unsigned long long result{ 1 };
            if (col != 0 && col != row) {
                // We are not at the border: get the start index of the two values above
                const size_t offsetOfRowAbove = TriangleNumber[row - 1] + col;
                // Calculate the new value for the current row
                result = pascalTriangle[offsetOfRowAbove] + pascalTriangle[offsetOfRowAbove - 1];
            }
            // Store the new value
            pascalTriangle[index++] = result;
        }
    }
    // Return the array as the function result
    return pascalTriangle;
}

// This is a constexpr std::array<unsigned long long, ArraySize> named PPP, containing all nCr results
constexpr auto PPP = calculatePascalTriangle();

// To calculate nCr, we simply look the value up in the array
constexpr unsigned long long nCr(size_t n, size_t r) {
    return PPP[TriangleNumber[n] + r];
}

// Some debug test driver code. Print a Pascal triangle
int main() {
    constexpr size_t RowsToPrint = 16; // MaxN - 1;
    const size_t digits = static_cast<size_t>(std::ceil(std::log10(nCr(RowsToPrint, RowsToPrint / 2))));
    for (size_t row{}; row < RowsToPrint; ++row) {
        std::cout << std::string((RowsToPrint - row + 1) * ((digits + 1) / 2), ' ');
        for (size_t col{}; col <= row; ++col)
            std::cout << std::setw(digits) << nCr(row, col) << ' ';
        std::cout << '\n';
    }
    return 0;
}

Related

How can I approach this CP task?

The task (from a Bulgarian judge; click on "Език" to change the language to English):
I am given the size of the first of N corals (S1 = A). The size of every subsequent coral (Si, for i > 1) is calculated by the formula Si = (B*Si-1 + C) % D, where A, B, C and D are given constants. I am told that Nemo is near the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of that Kth coral?
I will have T tests, and for each of them I will be given N, K, A, B, C and D and must output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 10^7
0 ≤ A < D ≤ 10^18
1 ≤ C, B*D ≤ 10^18
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst-case scenario I would need 10^7 * 8 B, which is about 76 MB.
If the memory available were at least 80 MB, the solution would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j;
    biggie a, b, c, d;
    std::vector<biggie>::iterator it_ans;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        std::vector<biggie> lut{ a };
        lut.reserve(n);
        for (j = 1; j != n; ++j) {
            lut.emplace_back((b * lut.back() + c) % d);
        }
        it_ans = std::next(lut.begin(), k - 1);
        std::nth_element(lut.begin(), it_ans, lut.end());
        std::cout << *it_ans << '\n';
    }
    return 0;
}
Question 1: How can I approach this task given the requirements listed above?
Question 2: Is it somehow possible to use std::nth_element to solve it, since I am not able to store all N elements? I mean using std::nth_element in a sliding-window fashion (if that is possible).
Christian Sloper's answer:
#include <iostream>
#include <queue>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j, j_lim;
    biggie a, b, c, d, prev, curr;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        if (k < n - k + 1) {
            // K is in the lower half: keep the K smallest values in a max-heap
            std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
            q.push(a);
            prev = a;
            for (j = 1; j != k; ++j) {
                curr = (b * prev + c) % d;
                q.push(curr);
                prev = curr;
            }
            for (; j != n; ++j) {
                curr = (b * prev + c) % d;
                if (curr < q.top()) {
                    q.pop();
                    q.push(curr);
                }
                prev = curr;
            }
            std::cout << q.top() << '\n';
        }
        else {
            // K is in the upper half: keep the N-K+1 largest values in a min-heap
            std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
            q.push(a);
            prev = a;
            for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
                curr = (b * prev + c) % d;
                q.push(curr);
                prev = curr;
            }
            for (; j != n; ++j) {
                curr = (b * prev + c) % d;
                if (curr > q.top()) {
                    q.pop();
                    q.push(curr);
                }
                prev = curr;
            }
            std::cout << q.top() << '\n';
        }
    }
    return 0;
}
This gets accepted. (It succeeds on all 40 tests; the largest time is 1.4 seconds, for a test with T=3 and D ≤ 10^9. The largest time for a test with larger D, and thus T=1, is 0.7 seconds.)
#include <iostream>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j;
    biggie a, b, c, d;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        biggie prefix = 0;
        for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
            biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
            int count[1 << 20] = {0};
            biggie s = a;
            int rank = 0;
            for (j = 0; j != n; ++j) {
                biggie s_vs_prefix = s & prefix_mask;
                if (s_vs_prefix < prefix)
                    ++rank;
                else if (s_vs_prefix == prefix)
                    ++count[(s >> shift) & ((1 << 20) - 1)];
                s = (b * s + c) % d;
            }
            int i = -1;
            while (rank < k)
                rank += count[++i];
            prefix |= biggie(i) << shift;
        }
        std::cout << prefix << '\n';
    }
    return 0;
}
The result is a 60-bit number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in a third.
For the high 20 bits, generate all the numbers and count how often each high-20-bits pattern occurs. After that, add up the counts until you reach K. The pattern at which you reach K covers the Kth smallest number; in other words, those are the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by-then-known prefix (the high 20 bits, or the high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits; that got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6, so I can use bit operations instead of divisions and think in bit patterns instead of ranges.
For the memory constraint you hit: in the recurrence
Si = (B * Si-1 + C) % D
each value requires only the value directly before it. So you can store only every second element and compute the in-between ones in flight, using half of the memory you would otherwise need: index the even positions directly, and recompute each odd value with one extra iteration from its even neighbor. Modern CPUs are fast enough to absorb extra calculations like these.
std::vector<biggie> lut{ a_0, a_2, a_4, /* ... every second value ... */ };
// a_3 = computeOddFromEven(lut[1]); // hypothetical helper: one recurrence step from a_2
You can use a longer stride, like 4 or 8, too. If the dataset is large, RAM latency is significant, so this is like having checkpoints throughout the search space to balance memory against CPU usage. Checkpoints 1000 apart would spend a lot of CPU cycles on re-calculation, but then the array would fit in the CPU's L2/L1 cache, which is not bad. When selecting, the maximum number of re-calculated iterations per element would then be 1000; O(1000 × size) is a big constant factor, but the compiler may be able to optimize some of it if the constants are truly compile-time constants.
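A minimal sketch of the checkpoint idea (the stride and helper names are illustrative, not from the original answer):

// Store every STRIDE-th value of the sequence; recompute the rest on demand.
const int STRIDE = 16;
std::vector<biggie> checkpoints; // checkpoints[i] holds S(i * STRIDE)
biggie s = a;
for (int j = 0; j != n; ++j) {
    if (j % STRIDE == 0) checkpoints.push_back(s);
    s = (b * s + c) % d;
}

// Recover S(idx) from the nearest checkpoint with at most STRIDE - 1 extra steps.
auto valueAt = [&](int idx) {
    biggie v = checkpoints[idx / STRIDE];
    for (int step = 0; step != idx % STRIDE; ++step)
        v = (b * v + c) % d;
    return v;
};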
If CPU performance becomes a problem again:
write a function that emits your source code as a string, with all the "constants" given by the user baked in;
compile that code from the command line (assuming the target computer has a compiler, such as g++, accessible from the command line of the main program);
run it and collect the results.
The compiler can enable more speed/memory optimizations when those values are true compile-time constants rather than read from std::cin.
If you really need a hard limit on RAM usage, then implement a simple cache whose backing store is your heavy computation, with brute force O(N^2) (or O(L × N) with checkpoints every L elements, as in the first method, where L = 2 or 4, or more).
Here's a sample direct-mapped cache with an 8M-entry long-long value space:
int main()
{
    std::vector<long long> checkpoints = {
        a_0, a_16, a_32 /* , ... every 16th value of the sequence */
    };
    auto cacheReadMissFunction = [&](int key){
        // your pure computational algorithm goes here; this helper is only
        // meant to show the variables involved
        long long result = checkpoints[key / 16];
        for (int step = 0; step != key % 16; ++step) // walk from the checkpoint to the key
            result = iterate(result);
        return result;
    };
    auto cacheWriteMissFunction = [&](int key, long long value){
        /* not useful for your algorithm, as it doesn't change behavior per element */
        // backing_store[key] = value;
    };
    // due to special optimizations, the size has to be a power of two
    int cacheSize = 1024 * 1024 * 8;
    DirectMappedCache<int, long long> cache(cacheSize, cacheReadMissFunction, cacheWriteMissFunction);
    std::cout << cache.get(20) << std::endl;
    return 0;
}
If you use a cache-friendly sorting algorithm, direct cache access gives a lot of reuse for nearly all the elements in the comparisons, if you fill the output buffer/terminal one element at a time following something like a bitonic sort path (which is known at compile time). If that doesn't work, you can try using files as the cache's backing store to sort the whole array at once. Is the file system prohibited? Then the online-compiling method above won't work either.
Implementation of a direct-mapped cache (don't forget to call flush() after your algorithm finishes if you use the cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_

#include <vector>
#include <functional>
#include <mutex>
#include <iostream>

/* Direct-mapped cache implementation.
 * Only usable for integer-type keys in range [0, maxPositive - 1].
 *
 * CacheKey: type of key (only integers: int, char, size_t)
 * CacheValue: type of value that is bound to key (same as above)
 */
template<typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
    // Allocates buffers for numElements cache slots/lanes.
    // readMiss: cache miss for read operations. The user provides this function
    //           so the cache can automatically get data from the backing store.
    //           Example: [&](MyClass key){ return redis.get(key); }
    //           Takes a CacheKey as key, returns a CacheValue as value.
    // writeMiss: cache miss for write operations. The user provides this function
    //           so the cache can automatically write data to the backing store.
    //           Example: [&](MyClass key, MyAnotherClass value){ redis.set(key, value); }
    //           Takes a CacheKey as key and a CacheValue as value.
    // numElements: has to be an integer power of 2 (e.g. 2, 4, 8, 16, ...)
    DirectMappedCache(CacheKey numElements,
                      const std::function<CacheValue(CacheKey)> & readMiss,
                      const std::function<void(CacheKey, CacheValue)> & writeMiss)
        : size(numElements), sizeM1(numElements - 1), loadData(readMiss), saveData(writeMiss)
    {
        // initialize buffers
        for (size_t i = 0; i < numElements; i++)
        {
            valueBuffer.push_back(CacheValue());
            isEditedBuffer.push_back(0);
            keyBuffer.push_back(CacheKey() - 1); // mapping of 0+ allowed
        }
    }

    // Get an element from the cache.
    // If the cache doesn't find it in its buffers,
    // it gets the data from the backing store,
    // returns the result to the user,
    // and the value is served from RAM on the next get/set access with the same key.
    inline
    const CacheValue get(const CacheKey & key) noexcept
    {
        return accessDirect(key, nullptr);
    }

    // only a syntactic difference
    inline
    const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
    {
        const int n = key.size();
        std::vector<CacheValue> result(n);
        for (int i = 0; i < n; i++)
        {
            result[i] = accessDirect(key[i], nullptr);
        }
        return result;
    }

    // thread-safe but slower version of get()
    inline
    const CacheValue getThreadSafe(const CacheKey & key) noexcept
    {
        std::lock_guard<std::mutex> lg(mut);
        return accessDirect(key, nullptr);
    }

    // Set an element in the cache.
    // If the cache doesn't find it in its buffers,
    // it sets the data in the cache only.
    // Writing to the backing store happens only when
    // another access evicts the cache slot containing this key/value,
    // or when the cache is flushed with the flush() method.
    // The value is then served from RAM on the next get/set access with the same key.
    inline
    void set(const CacheKey & key, const CacheValue & val) noexcept
    {
        accessDirect(key, &val, 1);
    }

    // thread-safe but slower version of set()
    inline
    void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
    {
        std::lock_guard<std::mutex> lg(mut);
        accessDirect(key, &val, 1);
    }

    // use this before closing the backing store, to write out the latest bits of data
    void flush()
    {
        try
        {
            std::lock_guard<std::mutex> lg(mut);
            for (size_t i = 0; i < size; i++)
            {
                if (isEditedBuffer[i] == 1)
                {
                    isEditedBuffer[i] = 0;
                    auto oldKey = keyBuffer[i];
                    auto oldValue = valueBuffer[i];
                    saveData(oldKey, oldValue);
                }
            }
        } catch (std::exception & ex) { std::cout << ex.what() << std::endl; }
    }

    // direct-mapped access
    // opType = 0: get
    // opType = 1: set
    CacheValue const accessDirect(const CacheKey & key, const CacheValue * value, const bool opType = 0)
    {
        // find the tag mapped to the key
        CacheKey tag = key & sizeM1;

        // compare keys
        if (keyBuffer[tag] == key)
        {
            // cache hit

            // "set"
            if (opType == 1)
            {
                isEditedBuffer[tag] = 1;
                valueBuffer[tag] = *value;
            }

            // cache-hit value
            return valueBuffer[tag];
        }
        else // cache miss
        {
            CacheValue oldValue = valueBuffer[tag];
            CacheKey oldKey = keyBuffer[tag];

            // eviction algorithm start
            if (isEditedBuffer[tag] == 1)
            {
                // if it is "get"
                if (opType == 0)
                {
                    isEditedBuffer[tag] = 0;
                }

                saveData(oldKey, oldValue);

                // "get"
                if (opType == 0)
                {
                    const CacheValue && loadedData = loadData(key);
                    valueBuffer[tag] = loadedData;
                    keyBuffer[tag] = key;
                    return loadedData;
                }
                else /* "set" */
                {
                    valueBuffer[tag] = *value;
                    keyBuffer[tag] = key;
                    return *value;
                }
            }
            else // not edited
            {
                // "set"
                if (opType == 1)
                {
                    isEditedBuffer[tag] = 1;
                }

                // "get"
                if (opType == 0)
                {
                    const CacheValue && loadedData = loadData(key);
                    valueBuffer[tag] = loadedData;
                    keyBuffer[tag] = key;
                    return loadedData;
                }
                else // "set"
                {
                    valueBuffer[tag] = *value;
                    keyBuffer[tag] = key;
                    return *value;
                }
            }
        }
    }

private:
    const CacheKey size;
    const CacheKey sizeM1;
    std::mutex mut;

    std::vector<CacheValue> valueBuffer;
    std::vector<unsigned char> isEditedBuffer;
    std::vector<CacheKey> keyBuffer;
    const std::function<CacheValue(CacheKey)> loadData;
    const std::function<void(CacheKey, CacheValue)> saveData;
};

#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a max-heap.
Insert the first K elements into the max-heap. The largest of these K elements will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap.
After all elements have been processed, the Kth smallest element is at the root.
This method uses O(K) space and O(N log K) time.
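A minimal sketch of this with std::priority_queue (my wrapper name and signature, built around the coral recurrence; a fuller two-sided variant appears in Christian Sloper's code above):

#include <queue>

// Returns the kth smallest (1-based) element of the sequence s, (b*s+c)%d, ...
long long kth_smallest(int n, int k, long long s, long long b, long long c, long long d) {
    std::priority_queue<long long> heap; // max-heap holding the k smallest values so far
    for (int i = 0; i < n; ++i) {
        if (static_cast<int>(heap.size()) < k)
            heap.push(s);              // still filling the heap
        else if (s < heap.top()) {
            heap.pop();                // evict the largest of the k smallest
            heap.push(s);
        }
        s = (b * s + c) % d;           // next element of the sequence
    }
    return heap.top();
}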
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make just two passes. In the first pass, we save a random sample of size n, much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness we have to be careful: save the values a and b at positions (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (call it L), and select the value with index K − L if it's in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>

namespace {

class LazySelector {
public:
    static constexpr std::int32_t kTargetSampleSize = 1000;

    explicit LazySelector() { sample_.reserve(1000000); }

    void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
        sample_.clear();
        mask_ = n / kTargetSampleSize;
        mask_ |= mask_ >> 1;
        mask_ |= mask_ >> 2;
        mask_ |= mask_ >> 4;
        mask_ |= mask_ >> 8;
        mask_ |= mask_ >> 16;
    }

    void FirstPass(const std::int64_t value) {
        if ((gen_() & mask_) == 0) {
            sample_.push_back(value);
        }
    }

    void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
        sample_.push_back(std::numeric_limits<std::int64_t>::min());
        sample_.push_back(std::numeric_limits<std::int64_t>::max());
        const double p = static_cast<double>(sample_.size()) / n;
        const double radius = 2 * std::sqrt(sample_.size());
        const auto lower =
            sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
                                                       0, sample_.size() - 1);
        const auto upper =
            sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius),
                                                       0, sample_.size() - 1);
        std::nth_element(sample_.begin(), upper, sample_.end());
        std::nth_element(sample_.begin(), lower, upper);
        lower_ = *lower;
        upper_ = *upper;
        sample_.clear();
        less_than_lower_ = 0;
        equal_to_lower_ = 0;
        equal_to_upper_ = 0;
    }

    void SecondPass(const std::int64_t value) {
        if (value < lower_) {
            ++less_than_lower_;
        } else if (upper_ < value) {
        } else if (value == lower_) {
            ++equal_to_lower_;
        } else if (value == upper_) {
            ++equal_to_upper_;
        } else {
            sample_.push_back(value);
        }
    }

    std::optional<std::int64_t> Select(std::int32_t k) {
        if (k < less_than_lower_) {
            return std::nullopt;
        }
        k -= less_than_lower_;
        if (k < equal_to_lower_) {
            return lower_;
        }
        k -= equal_to_lower_;
        if (k < sample_.size()) {
            const auto kth = sample_.begin() + k;
            std::nth_element(sample_.begin(), kth, sample_.end());
            return *kth;
        }
        k -= sample_.size();
        if (k < equal_to_upper_) {
            return upper_;
        }
        return std::nullopt;
    }

private:
    std::default_random_engine gen_;
    std::vector<std::int64_t> sample_ = {};
    std::int32_t mask_ = 0;
    std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
    std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
    std::int32_t less_than_lower_ = 0;
    std::int32_t equal_to_lower_ = 0;
    std::int32_t equal_to_upper_ = 0;
};

}  // namespace

int main() {
    int t;
    std::cin >> t;
    for (int i = t; i > 0; --i) {
        std::int32_t n;
        std::int32_t k;
        std::int64_t a;
        std::int64_t b;
        std::int64_t c;
        std::int64_t d;
        std::cin >> n >> k >> a >> b >> c >> d;
        std::optional<std::int64_t> ans = std::nullopt;
        LazySelector selector;
        do {
            {
                selector.BeginFirstPass(n, k);
                std::int64_t s = a;
                for (std::int32_t j = n; j > 0; --j) {
                    selector.FirstPass(s);
                    s = (b * s + c) % d;
                }
            }
            {
                selector.BeginSecondPass(n, k);
                std::int64_t s = a;
                for (std::int32_t j = n; j > 0; --j) {
                    selector.SecondPass(s);
                    s = (b * s + c) % d;
                }
            }
            ans = selector.Select(k - 1);
        } while (!ans);
        std::cout << *ans << '\n';
    }
}

Time limit exceeded issue in C++ UVa problem for university homework

I have a time limit exceeded issue with problem 100 from UVa.
The question is here:
https://onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=24&page=show_problem&problem=36
Here is my code. Please help me find a solution. How can I avoid such problems?
I don't know whether the problem is with cin and cout or with the while loops; the program works well in my terminal when I run it.
#include <iostream>
using namespace std;

int main()
{
    int i, j, temp, n;
    while (cin >> i >> j) // read input pairs
    {
        int x, y;
        x = i;
        y = j;
        if (i > j) // swap i and j to fix the order of the numbers
        {
            temp = j;
            j = i;
            i = temp;
        }
        int answer = 0;
        int counter;
        while (i <= j)
        {
            n = i;
            counter = 1; // start the counter at 1; it increases until n reaches 1
            while (1)
            {
                if (n == 1) { // if n = 1 then stop
                    break;
                } else if (n % 2 == 0) // n is even
                {
                    n = (3 % n) + 1;
                } else { // n is odd
                    n = n / 2;
                }
                counter++; // increase by one for every number that is not 1
            }
            if (counter > answer)
            {
                answer = counter;
            }
            i++;
        }
        cout << x << " " << y << " " << answer << "\n";
    }
    return 0;
}
Thanks in advance
In my humble opinion, this problem is not about calculating the resulting values with the given algorithm; because of its simplicity, that is just noise. So maybe we are talking about an XY problem here.
Maybe I am wrong, but the main lever here seems to be memoization.
Values may need to be calculated over and over again because the input ranges overlap, and this is not necessary.
So, we can memorize already-calculated values, for example in a std::unordered_map (or std::map), like below:
unsigned int getSteps(size_t index) noexcept {
    unsigned counter{};
    while (index != 1) {
        if (index % 2) index = index * 3 + 1;
        else index /= 2;
        ++counter;
    }
    return counter + 1;
}

unsigned int getStepsMemo(size_t index) {
    // Here we will memorize whatever we calculated before
    static std::unordered_map<unsigned int, unsigned int> memo{};
    // Resulting value
    unsigned int result{};
    // Check whether we already calculated the value in the past
    auto iter = memo.find(index);
    if (iter != memo.end())
        // If yes, then reuse the old value
        result = iter->second;
    else {
        // If no, then calculate anew and memorize it
        result = getSteps(index);
        memo[index] = result;
    }
    return result;
}
This helps with many given input pairs: it avoids recalculating step counts for values we have seen before.
But having thought in this direction, we can also calculate all values at compile time and store them in a constexpr std::array. Then no calculation is done at runtime. All step counts for numbers up to 10000 are precalculated, so the algorithm is never called at runtime.
It should be clear that this is the fastest possible approach, because we do nothing but fetch the value from a lookup table.
And if we want to make things nice, we pack everything into a class and let the class encapsulate the problem. Even the input and output operators are overloaded and used for our own purposes.
In the end, we have an ultra-fast one-liner in our function main. Please see:
#include <iostream>
#include <utility>
#include <sstream>
#include <array>
#include <algorithm>
#include <iterator>
#include <unordered_map>

// All done during compile time -------------------------------------------------------------------
constexpr unsigned int getSteps(size_t index) noexcept {
    unsigned counter{};
    while (index != 1) {
        if (index % 2) index = index * 3 + 1;
        else index /= 2;
        ++counter;
    }
    return counter + 1;
}

// Some helper to create a constexpr std::array initialized by a generator function
template <typename Generator, size_t ... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
    return std::array<decltype(std::declval<Generator>()(size_t{})), sizeof...(Indices)> { generator(Indices + 1)... };
}

template <size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
    return generateArrayHelper(generator, std::make_index_sequence<Size>());
}

constexpr size_t MaxIndex = 10000;

// This is the definition of a std::array<unsigned int, 10000> with all step counts
// (steps[i] holds the step count for the number i + 1)
constexpr auto steps = generateArray<MaxIndex>(getSteps);
// End of: All done during compile time -----------------------------------------------------------

// Some very simple helper class for easier handling of the functionality
struct StepsForPair {
    // A pair with special functionality
    unsigned int first{};
    unsigned int second{};

    // Simple extraction operator. Read 2 values
    friend std::istream& operator >> (std::istream& is, StepsForPair& sfp) {
        return is >> sfp.first >> sfp.second;
    }

    // Simple inserter. Sort first and second value and show the result
    friend std::ostream& operator << (std::ostream& os, const StepsForPair& sfp) {
        unsigned int f{ sfp.first }, s{ sfp.second };
        if (f > s) std::swap(f, s);
        // steps[f - 1] is the entry for the number f, so this range covers f..s inclusive
        return os << sfp.first << ' ' << sfp.second << ' ' << *std::max_element(&steps[f - 1], &steps[s]);
    }
};

// Some test data. I will not use std::cin, but read from this std::istringstream here
std::istringstream testData{ R"(1 10
100 200
201 210
900 1000
22 22)" };

int main() {
    // Read all input data and generate output
    std::copy(std::istream_iterator<StepsForPair>(testData), {}, std::ostream_iterator<StepsForPair>(std::cout, "\n"));
}
Please note: since I do not have std::cin here on SO, I read the test values from a std::istringstream. Because of the overloaded extraction operator, this is easily possible.
If you want to read from std::cin, then in the std::copy statement in main, replace "testData" with "std::cin".
If you want to read from a file, put a file-stream variable there.
In the line n = (3 % n) + 1;, (3 % n) takes the remainder of 3 divided by n, which is probably not what you want; change it to 3 * n. Note also that the even/odd branches appear swapped: the 3n+1 step should be applied when n is odd, and n should be halved when n is even.
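With both fixes applied, the inner loop would look like this (a sketch; the rest of the original code stays unchanged):

while (1)
{
    if (n == 1) { // stop when n reaches 1
        break;
    } else if (n % 2 == 0) { // n is even: halve it
        n = n / 2;
    } else { // n is odd: apply 3n + 1
        n = 3 * n + 1;
    }
    counter++;
}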

Minimum cuts on a rectangle to make into squares

I'm trying to solve this problem:
Given an a×b rectangle, your task is to cut it into squares. On each move you can select a rectangle and cut it into two rectangles in such a way that all side lengths remain integers. What is the minimum possible number of moves?
My logic is that the minimum number of cuts means the minimum number of squares; I don't know if that's the correct approach.
I check which side is smaller. Then I know I need bigSide/smallSide cuts to get squares with sides of length smallSide, after which I am left with smallSide and bigSide % smallSide. I repeat until either side is 0 or both are equal.
#include <iostream>

int main() {
    int a, b; std::cin >> a >> b; // sides of the rectangle
    int res = 0;
    while (a != 0 && b != 0) {
        if (a > b) {
            if (a % b == 0)
                res += a / b - 1;
            else
                res += a / b;
            a = a % b;
        } else if (b > a) {
            if (b % a == 0)
                res += b / a - 1;
            else
                res += b / a;
            b = b % a;
        } else {
            break;
        }
    }
    std::cout << res;
    return 0;
}
When the input is 404 288, my code gives 18, but the right answer is actually 10.
What am I doing wrong?
It seems clear to me that the problem defines each move as cutting a rectangle into two rectangles along the integer lines, and then asks for the minimum number of such cuts. As you can see, there is a clear recursive structure in this problem: once you cut a rectangle into two parts, you can recurse, cut each part into squares with minimum moves, and then sum up the answers. The problem is that the recursion can lead to exponential time complexity, which leads us directly to dynamic programming: you have to use memoization to solve it efficiently (worst-case time O(a*b*(a+b))). Here is what I'd suggest doing:
#include <iostream>
#include <vector>

using std::vector;

int min_cuts(int a, int b, vector<vector<int>> &mem) {
    int min = mem[a][b];
    // if already computed, just return the value
    if (min > 0)
        return min;
    // if one side is divisible by the other,
    // store min-cuts in 'min'
    if (a % b == 0)
        min = a / b - 1;
    else if (b % a == 0)
        min = b / a - 1;
    // if there's no obvious solution, recurse
    else {
        // recurse on height (include the middle cut, hence <=)
        for (int i = 1; i <= a / 2; i++) {
            int m = min_cuts(i, b, mem);
            int n = min_cuts(a - i, b, mem);
            if (min < 0 or m + n + 1 < min)
                min = m + n + 1;
        }
        // recurse on width (include the middle cut, hence <=)
        for (int j = 1; j <= b / 2; j++) {
            int m = min_cuts(a, j, mem);
            int n = min_cuts(a, b - j, mem);
            if (min < 0 or m + n + 1 < min)
                min = m + n + 1;
        }
    }
    mem[a][b] = min;
    return min;
}

int main() {
    int a, b; std::cin >> a >> b; // sides of the rectangle
    // -1 means the subproblem is not solved yet
    vector<vector<int>> mem(a + 1, vector<int>(b + 1, -1));
    int res = min_cuts(a, b, mem);
    std::cout << res << std::endl;
    return 0;
}
The reason the for loops only go up to a/2 and b/2 is that cutting a piece of paper is symmetric: cutting along the vertical line i is the same as cutting along the line a-i if you flip the paper over. This little hack roughly halves the work in each loop.
Another little hack: since the result is invariant under transposing the paper, meaning min_cuts(a,b) = min_cuts(b,a), you can potentially halve the computations again. But any major further improvement, say a greedy algorithm, would take more thought (if one exists at all).
The current answer is a good start, especially the suggestion to use memoization or dynamic programming, and is potentially efficient enough.
However, it uses the former with a sub-par data structure: a vector of vectors has considerable space and performance overhead, while a (strict) lower-triangular matrix stored in a flat array is much more efficient.
Using the maximum value as a sentinel (easier with unsigned) also reduces complexity.
Finally, let's move to dynamic programming instead of memoization, to simplify the code and get even more efficiency:
#include <algorithm>
#include <memory>
#include <utility>

constexpr unsigned min_cuts(unsigned a, unsigned b) {
    if (a < b)
        std::swap(a, b);
    if (a == b || !b)
        return 0;
    const auto triangle = [](std::size_t n) { return n * (n - 1) / 2; };
    const auto p = std::make_unique_for_overwrite<unsigned[]>(triangle(a));
    /* const! */ unsigned zero = 0;
    const auto f = [&](auto a, auto b) -> auto& {
        if (a < b)
            std::swap(a, b);
        return a == b ? zero : p[triangle(a - 1) + b - 1];
    };
    for (auto i = 1u; i <= a; ++i) {
        for (auto j = 1u; j < i; ++j) {
            auto r = -1u;
            for (auto k = i / 2; k; --k)
                r = std::min(r, f(k, j) + f(i - k, j));
            for (auto k = j / 2; k; --k)
                r = std::min(r, f(k, i) + f(j - k, i));
            f(i, j) = ++r;
        }
    }
    return f(a, b);
}

Finding the median value of a vector using C++

I'm a programming student, and for a project I'm working on, one of the things I have to do is compute the median of a vector of values, passing the vector through functions. The vector is initially generated randomly using the C++ random generator mt19937, which I have already written into my code. I'm to do this using the sort function and vector member functions such as .begin(), .end(), and .size().
I'm supposed to find the median value of the vector and then output it.
And I'm stuck. Below I have included my attempt, so where am I going wrong? I would appreciate it if you would be willing to give me some pointers or resources to get going in the right direction.
Code:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <random>

using namespace std;

double find_median(vector<double>);

double find_median(vector<double> len)
{
    int i;
    double temp;
    int n = len.size();
    int mid;
    double median;
    bool swap;
    do
    {
        swap = false;
        for (i = 0; i < len.size() - 1; i++)
        {
            if (len[i] > len[i + 1])
            {
                temp = len[i];
                len[i] = len[i + 1];
                len[i + 1] = temp;
                swap = true;
            }
        }
    }
    while (swap);
    for (i = 0; i < len.size(); i++)
    {
        if (len[i] > len[i + 1])
        {
            temp = len[i];
            len[i] = len[i + 1];
            len[i + 1] = temp;
        }
        mid = len.size() / 2;
        if (mid % 2 == 0)
        {
            median = len[i] + len[i + 1];
        }
        else
        {
            median = (len[i] + 0.5);
        }
    }
    return median;
}

int main()
{
    int n, i;
    cout << "Input the vector size: " << endl;
    cin >> n;
    vector<double> foo(n);
    mt19937 rand_generator;
    rand_generator.seed(time(0));
    uniform_real_distribution<double> rand_distribution(0, 0.8);
    cout << "original vector: " << " ";
    for (i = 0; i < n; i++)
    {
        double rand_num = rand_distribution(rand_generator);
        foo[i] = rand_num;
        cout << foo[i] << " ";
    }
    double median;
    median = find_median(foo);
    cout << endl;
    cout << "The median of the vector is: " << " ";
    cout << median << endl;
}
int main()
{
int n,i;
cout<<"Input the vector size: "<<endl;
cin>>n;
vector <double> foo(n);
mt19937 rand_generator;
rand_generator.seed(time(0));
uniform_real_distribution<double> rand_distribution(0,0.8);
cout<<"original vector: "<<" ";
for (i=0; i<n; i++)
{
double rand_num=rand_distribution(rand_generator);
foo[i]=rand_num;
cout<<foo[i]<<" ";
}
double median;
median=find_median(foo);
cout<<endl;
cout<<"The median of the vector is: "<<" ";
cout<<median<<endl;
}
The median is given by
const auto median_it = len.begin() + len.size() / 2;
std::nth_element(len.begin(), median_it, len.end());
auto median = *median_it;
For an even number of elements (vector size) you need to be a bit more precise. E.g., you can use:
assert(!len.empty());
if (len.size() % 2 == 0) {
    const auto median_it1 = len.begin() + len.size() / 2 - 1;
    const auto median_it2 = len.begin() + len.size() / 2;

    std::nth_element(len.begin(), median_it1, len.end());
    const auto e1 = *median_it1;

    std::nth_element(len.begin(), median_it2, len.end());
    const auto e2 = *median_it2;

    return (e1 + e2) / 2;
} else {
    const auto median_it = len.begin() + len.size() / 2;
    std::nth_element(len.begin(), median_it, len.end());
    return *median_it;
}
There are of course many different ways to get element e1; we could also use max_element on the lower partition, or whatever we want. But this line is important, because nth_element only places the nth element correctly; the remaining elements are ordered before or after this element depending on whether they are smaller or larger, but within those ranges they are unsorted.
This code is guaranteed to have linear complexity on average, i.e. O(N), so it is asymptotically better than sort, which is O(N log N).
Regarding your code:
for (i = 0; i < len.size(); i++) {
    if (len[i] > len[i+1])
This will not work, as you access len[len.size()] in the last iteration, which does not exist.
std::sort(len.begin(), len.end());
double median = len[len.size() / 2];
will do it. You might need to take the average of the middle two elements if size() is even, depending on your requirements:
0.5 * (len[len.size() / 2 - 1] + len[len.size() / 2]);
Instead of trying to do everything at once, you should start with simple test cases and work upwards:
#include <vector>

double find_median(std::vector<double> len);

// Return the number of failures - the shell interprets 0 as 'success',
// which suits us perfectly.
int main()
{
    return find_median({0, 1, 1, 2}) != 1;
}
This already fails with your code (even after fixing i to be an unsigned type), so you can start debugging (even 'dry' debugging, where you trace the code through on paper; that's probably enough here).
I note that with a smaller test case, such as {0, 1, 2}, I get a crash rather than merely a failed test, so there's something that really needs to be fixed.
Let's replace the implementation with one based on overseas's answer:
#include <algorithm>
#include <limits>
#include <vector>

double find_median(std::vector<double> len)
{
    if (len.size() < 1)
        return std::numeric_limits<double>::signaling_NaN();

    const auto alpha = len.begin();
    const auto omega = len.end();

    // Find the two middle positions (they will be the same if size is odd)
    const auto i1 = alpha + (len.size() - 1) / 2;
    const auto i2 = alpha + len.size() / 2;

    // Partial sort to place the correct elements at those indexes (it's okay to modify the vector,
    // as we've been given a copy; otherwise, we could use std::partial_sort_copy to populate a
    // temporary vector).
    std::nth_element(alpha, i1, omega);
    std::nth_element(i1, i2, omega);

    return 0.5 * (*i1 + *i2);
}
Now, our test passes. We can write a helper method to allow us to create more tests:
#include <iostream>

bool test_median(const std::vector<double>& v, double expected)
{
    auto actual = find_median(v);
    if (abs(expected - actual) > 0.01) {
        std::cerr << actual << " - expected " << expected << std::endl;
        return true;
    } else {
        std::cout << actual << std::endl;
        return false;
    }
}

int main()
{
    return test_median({0, 1, 1, 2}, 1)
         + test_median({5}, 5)
         + test_median({5, 5, 5, 0, 0, 0, 1, 2}, 1.5);
}
Once you have the simple test cases working, you can manage more complex ones. Only then is it time to create a large array of random values to see how well it scales:
#include <ctime>
#include <functional>
#include <random>

int main(int argc, char **argv)
{
    std::vector<double> foo;
    const int n = argc > 1 ? std::stoi(argv[1]) : 10;
    foo.reserve(n);

    std::mt19937 rand_generator(std::time(0));
    std::uniform_real_distribution<double> rand_distribution(0, 0.8);
    std::generate_n(std::back_inserter(foo), n, std::bind(rand_distribution, rand_generator));

    std::cout << "Vector:";
    for (auto v: foo)
        std::cout << ' ' << v;
    std::cout << "\nMedian = " << find_median(foo) << std::endl;
}
(I've taken the number of elements as a command-line argument; that's more convenient in my build than reading it from cin.) Notice that instead of allocating n doubles in the vector, we simply reserve capacity for them and don't create any until needed.
For fun and kicks, we can now make find_median() generic. I'll leave that as an exercise; I suggest you start with:
template<class Iterator>
auto find_median(Iterator alpha, Iterator omega)
{
    using value_type = typename Iterator::value_type;
    if (alpha == omega)
        return std::numeric_limits<value_type>::signaling_NaN();
}

Using pow() for large numbers

I am trying to solve a problem, part of which requires me to calculate (2^n) % 1000000007, where n ≤ 10^9. But my following code gives the output "0" even for an input like n = 99.
Is there any way other than having a loop that multiplies the output by 2 and takes the modulo every time? (That is not what I am looking for, as it will be very slow for large n.)
#include <stdio.h>
#include <math.h>
#include <iostream>
using namespace std;

int main()
{
    unsigned long long gaps, total;
    while (1)
    {
        cin >> gaps;
        total = (unsigned long long)powf(2, gaps) % 1000000007;
        cout << total << endl;
    }
}
You need a "big num" library, it is not clear what platform you are on, but start here:
http://gmplib.org/
this is not what I am looking for, as this will be very slow for large numbers
Using a bigint library will be considerably slower than pretty much any other solution.
Don't take the modulo on every pass through the loop; rather, take it only when the output grows bigger than the modulus, as follows:
#include <iostream>

int main() {
    int modulus = 1000000007;
    int n = 88888888;

    long res = 1;
    for (long i = 0; i < n; ++i) {
        res *= 2;
        if (res > modulus)
            res %= modulus;
    }
    std::cout << res << std::endl;
}
This is actually pretty quick:
$ time ./t
./t 1.19s user 0.00s system 99% cpu 1.197 total
I should mention that the reason this works is that if a and b are equivalent mod m (that is, a % m = b % m), then the equality also holds after multiplying both by any k: (a*k) % m = (b*k) % m.
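That same property also justifies exponentiation by squaring, which reduces the work to O(log n) multiplications; a minimal sketch (my addition, not part of the original answer):

#include <iostream>

// Computes (base^exp) % mod in O(log exp) multiplications.
unsigned long long pow_mod(unsigned long long base, unsigned long long exp, unsigned long long mod) {
    unsigned long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)                  // current lowest bit set: multiply this power in
            result = result * base % mod;
        base = base * base % mod;     // square for the next bit
        exp >>= 1;
    }
    return result;
}

int main() {
    std::cout << pow_mod(2, 88888888, 1000000007) << '\n';
}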
Chris proposed GMP, but if you need just this and you want to do things The C++ Way, not The C Way, and without unnecessary complexity, you may just want to check this out - it generates a few warnings when compiling, but is quite simple and Just Works™.
You can split your 2^n into chunks of 2^m. You need to find:
2^m * 2^m * ... * 2^(n mod m)
The number m should be 31 for a 32-bit CPU. Then your answer is:
chunk1 % k * chunk2 % k * ... where k = 1000000007
You are still O(N), but you can then exploit the fact that all the chunks mod k are equal except the last one, which lets you reduce the work much further.
I wrote this function. It is very inefficient, but it works with very large numbers. It uses my self-made algorithm for storing big numbers in arrays using a decimal-like system.
mpfr2.cpp

#include "mpfr2.h"

void mpfr2::mpfr::setNumber(std::string a) {
    for (int i = a.length() - 1, j = 0; i >= 0; ++j, --i) {
        _a[j] = a[i] - '0';
    }
    res_size = a.length();
}

int mpfr2::mpfr::multiply(mpfr& a, mpfr b)
{
    mpfr ans = mpfr();
    // Multiply each digit of b with each digit of a, accumulating into ans
    int i = 0;
    for (i = 0; i < b.res_size; ++i)
    {
        for (int j = 0; j < a.res_size; ++j) {
            ans._a[i + j] += b._a[i] * a._a[j];
        }
    }
    // Normalize: propagate the carries so every digit is in 0..9
    for (i = 0; i < a.res_size + b.res_size; i++)
    {
        int tmp = ans._a[i] / 10;
        ans._a[i] = ans._a[i] % 10;
        ans._a[i + 1] = ans._a[i + 1] + tmp;
    }
    // Find the highest non-zero digit to set the result size
    for (i = a.res_size + b.res_size; i >= 0; i--)
    {
        if (ans._a[i] > 0) break;
    }
    ans.res_size = i + 1;
    a = ans;
    return a.res_size;
}

mpfr2::mpfr mpfr2::mpfr::pow(mpfr a, mpfr b) {
    mpfr t = a;
    std::string bStr = "";
    for (int i = b.res_size - 1; i >= 0; --i) {
        bStr += std::to_string(b._a[i]);
    }
    int i = 1;
    while (true) {
        if (bStr == std::to_string(i)) break;
        a.res_size = multiply(a, t);
        // Debugging
        std::cout << "\npow() iteration " << i << std::endl;
        ++i;
    }
    return a;
}
mpfr2.h

#pragma once
//#ifndef MPFR2_H
//#define MPFR2_H

// C standard includes
#include <iostream>
#include <string>

#define MAX 0x7fffffff/32/4 // 0x7fffffff = 2147483647; MAX = 16777215 digits

namespace mpfr2 {
    class mpfr
    {
    public:
        int _a[MAX];
        int res_size;

        void setNumber(std::string);
        static int multiply(mpfr&, mpfr);
        static mpfr pow(mpfr, mpfr);
    };
}
//#endif
main.cpp

#include <iostream>
#include <fstream>

// Local headers
#include "mpfr2.h" // Defines the local mpfr algorithm library

// Namespaces
namespace m = mpfr2; // Reduce the typing a bit later...

m::mpfr tetration(m::mpfr, int);

int main() {
    // Hardcoded tests
    int x = 7;
    std::ofstream f("out.txt");
    m::mpfr t;
    for (int b = 1; b < x; b++) {
        std::cout << "2^^" << b << std::endl; // Hardcoded message
        t.setNumber("2");
        m::mpfr res = tetration(t, b);
        for (int i = res.res_size - 1; i >= 0; i--) {
            std::cout << res._a[i];
            f << res._a[i];
        }
        f << std::endl << std::endl;
        std::cout << std::endl << std::endl;
    }
    char c; std::cin.ignore(); std::cin >> c;
    return 0;
}

m::mpfr tetration(m::mpfr a, int b)
{
    m::mpfr tmp = a;
    if (b <= 0) return m::mpfr();
    for (; b > 1; b--) tmp = m::mpfr::pow(a, tmp);
    return tmp;
}
I created this for tetration and, eventually, hyperoperations. When the numbers get really big, it can take ages to calculate and uses a lot of memory. The #define MAX 0x7fffffff/32/4 is the number of decimal digits one number can have. I might write another algorithm later to combine multiple of these arrays into one number. On my system the maximum array length is 0x7fffffff, a.k.a. 2147483647, a.k.a. 2^31 - 1, a.k.a. INT32_MAX (which is usually the standard int size), so I had to divide INT32_MAX by 32 to make creating this array possible. I also divided it by 4 to reduce memory usage in the multiply() function.
- Jubiman