The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S1 = A) of N corals. The size of every subsequent coral (Si, where i > 1) is calculated using the formula (B*Si-1 + C)%D, where A, B, C and D are some constants. I am told that Nemo is nearby the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 107
0 ≤ A < D ≤ 1018
1 ≤ C, B*D ≤ 1018
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst case scenario I will need 107*8B which is 76 MB.
The solution If the memory available was at least 80 MB would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
# Christian Sloper
#include <iostream>
#include <queue>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (Succeeds all 40 tests. Largest time 1.4 seconds, for a test with T=3 and D≤10^9. Largest time for a test with larger D (and thus T=1) is 0.7 seconds.).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60 bits number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high 20 bits pattern occurrs. After that, add up the counts until you reach K. The pattern where you reach K, that pattern covers the K-th largest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
(B*Si-1 + C)%D
requires only the value (Si-2) before itself. So you can compute them in pairs, to use only 1/2 of total you need. This only needs indexing even values and iterating once for odd values. So you can just use half-length LUT and compute the odd value in-flight. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can make a longer stride like 4,8 too. If dataset is large, RAM latency is big. So it's like having checkpoints in whole data search space to balance between memory and core usage. 1000-distance checkpoints would put a lot of cpu cycles into re-calculations but then the array would fit CPU's L2/L1 cache which is not bad. When sorting, the maximum re-calc iteration per element would be n=1000 now. O(1000 x size) maybe it's a big constant but maybe somehow optimizable by compiler if some constants really const?
If CPU performance becomes problem again:
write a compiling function that writes your source code with all the "constant" given by user to a string
compile the code using command-line (assuming target computer has some accessible from command line like g++ from main program)
run it and get results
Compiler should enable more speed/memory optimizations when those are really constant in compile-time rather than depending on std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for(key - key%16 times)
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(n log n) time.
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
question: Dean of MAIT is going to visit Hostels of MAIT. As you know that he is a very busy person so he decided to visit only the first "K" nearest Hostels. Hostels are situated on a 2D plane. You are given the coordinates of hostels and you have to answer the Rocket distance of Kth nearest hostel from the origin ( Dean's place )
Input Format
The first line of input contains Q Total no. of queries and K There are two types of queries:
first type: 1 x y For query of 1st type, you came to know about the coordinates ( x, y ) of the newly constructed hostel. second type: 2 For query of 2nd type, you have to output the Rocket distance of Kth nearest hostel till now.
//The Dean will always stay at his place ( origin ). It is guaranteed that there will be at least k queries of type 1 before the first query of type 2.
Rocket distance between two points ( x2 , y2 ) and ( x1 , y1 ) is defined as (x2 - x1)2 + (y2 - y1)2
Constraints
1 < = k < = Q < = 10^5 -10^6 < = x , y < = 10^6
Output Format
For each query of type 2 output the Rocket distance of Kth nearest hostel from Origin.//
This is my code:
#include <iostream>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;
class roomno
{
public:
int x;
int y;
roomno(int x,int y)
{
this->x=x;
this->y=y;
}
void print()
{
cout<<"location"<<"("<<x<<","<<y<<")"<<endl;
}
int distance ()
{
return (x*x+y*y);
}
};
class roomcompare
{
public:
bool operator() (roomno r1,roomno r2)
{
return r1.distance()>r2.distance();
}
};
int main()
{
int x[1000]},y[1000];
int l,k=0;
priority_queue<roomno,vector<roomno>,roomcompare> pq;
int n,i,j;
cin>>n>>l;
//cin>>n;
cin.ignore();
for( i=0;i<n;i++)
{
cin>>x[i];
}
cin.ignore();
for( j=0;j<n;j++)
{
cin>>y[j];
}
cin.ignore();
for(i=0;i<n;i++ )
{
roomno r1(x[i],y[i]);
pq.push(r1);
}
while(!pq.empty()&&k!=l)
{ k++;
roomno r2=pq.top();
r2.print();
pq.pop();
}
return 0;
}
Original link to code: https://codeshare.io/2j1bkA
What's wrong with my code?
Please consider adding what problem you are facing in your post. I've seen your code but I couldn't help with your code as I even saw a syntax error in your code and didn't know if it was the problem.
If I didn't misunderstand your question, std::set will fit better than priority queue as it is not possible to remove the largest item in a priority queue(min heap). The following code should work:
#include <iostream>
#include <set>
using namespace std;
using ll = long long;
multiset<ll> distances;
int main() {
ll n, k;
cin >> n >> k;
for(ll i = 0; i < n; ++i) {
ll query;
cin >> query;
if (query == 1) {
ll x, y;
cin >> x >> y;
distances.insert(x * x + y * y);
if (distances.size() > k) {
distances.erase(--distances.end());
}
} else {
cout << *distances.rbegin() << '\n';
}
}
return 0;
}
You have an array A of length N consisting of positive integers. Let's consider N vertical segments (parallel to y axis) on the plane, the i-th segment connects the points (i, 0) and (i, Ai).
You have Q queries of following two types:
+ i X : Ai += X
? i L R: Let's consider R - L + 1 rays which are parallel to x axis. The j-th ray eminates at point (i - 0.5, L+ j - 1.5) and extends to the right infinitely long parallel to x axis. You have to print the number of the segments that will be shot by some ray. If a ray shoots a segment then it stops at that point and does't extend further.
Note that all the queries of the second type are independent of each other.
Code I tried produces right result but not much efficient:
int main()
{
int T;
int k;
cin>>T;
long int N,Q;
for(k=0;k<T;k++)
{
cin>>N>>Q;
vector<long int> v;
v.push_back(-1);
long int i=0,p;
for(i=0;i<N;i++)
{
cin>>p;
v.push_back(p);
}
for(long int j=0;j<Q;j++)
{
char c1;
cin>>c1;
if(c1=='+')
{
long int i;
int X;
cin>>i>>X;
v[i] = v[i] + X;
}
else
{
long int L,R,i;
cin>>i>>L>>R;
long int count=0,max=-100000;
for(long int l=i;l<=N;l++)
{
if(v[l]>=L && v[l]>max)
{
max=v[l];
count++;
if(max>=R)
break;
}
}
cout<<count<<endl;
}
}
}
}
T is here no of test cases.N is total no of elements in the array. Q is the no of queries on that array. So what I need is a better algorithm to do the same job that I did.
Suppose we are given a weighted tree and some set of vertices of that tree. The problem is to isolate those vertices(e.g. there should not be paths between them) by removing edges from the tree, such that the sum of the weights of removed edges is minimal.
I have been trying to solve this problem for about two hours, then I found solution in C++, but there was no explanation. As I understood it uses Dynamic Programming technique.
My question is what is the basic idea of solving this problem?
Thanks!
Here is the solution.
#include <iostream>
#include <cstring>
#include <vector>
using namespace std;
#define PB push_back
#define MP make_pair
typedef long long LL;
const int MAXN = 100005;
const LL inf = 1LL << 55;
int N, K;
bool bad[MAXN];
LL dp[MAXN][2];
vector<pair<int, int> > adj[MAXN];
void dfs(int u, int fa) {
dp[u][1] = 0;
dp[u][0] = bad[u] ? inf : 0;
for (int i = 0; i < adj[u].size(); i++) {
int v = adj[u][i].first, w = adj[u][i].second;
if (v != fa) {
dfs(v, u);
dp[u][1] += min(dp[v][0], dp[v][1] + w);
dp[u][1] = min(dp[u][1], dp[u][0] + dp[v][1]);
if (!bad[u])
dp[u][0] += min(dp[v][0], dp[v][1] + w);
}
}
}
int main() {
cin >> N >> K;
for (int i = 1; i < N; i++) {
int a, b, c;
cin >> a >> b >> c;
adj[a].PB(MP(b, c)); adj[b].PB(MP(a, c));
}
memset(bad, false, sizeof(bad));
for (int i = 0; i < K; i++) {
int x;
cin >> x;
bad[x] = true;
}
dfs(0, -1);
LL ret = min(dp[0][0], dp[0][1]);
cout << ret << endl;
return 0;
}
Formally, the problem is, given an acyclic undirected graph with weighted edges, together with a set of terminal vertices, find the minimum set of edges whose removal means that, for all pairs of distinct terminals, those terminals are no longer connected.
Root the graph arbitrarily and treat it as a general tree. We run a dynamic program that operates bottom-up on the vertices. Each vertex u has two labels: dp[u][0] is the minimum weight of edges to be removed in the subtree rooted at u so that no terminal in the subtree is connected to another terminal or to u, and dp[u][1] is the minimum removal weight that leaves exactly one subtree terminal connected to u. We have a recurrence
{ infinity if v is a terminal
dp[u][0] = { sum dp[v][0] otherwise
{ children v of u
dp[u][1] = min dp[v][1] + sum min(dp[w][0], dp[w][1] + weight(uw)),
children v of u other children w of v
taking an empty min to be infinity.
I am trying to solve a problem, a part of which requires me to calculate (2^n)%1000000007 , where n<=10^9. But my following code gives me output "0" even for input like n=99.
Is there anyway other than having a loop which multilplies the output by 2 every time and finding the modulo every time (this is not I am looking for as this will be very slow for large numbers).
#include<stdio.h>
#include<math.h>
#include<iostream>
using namespace std;
int main()
{
unsigned long long gaps,total;
while(1)
{
cin>>gaps;
total=(unsigned long long)powf(2,gaps)%1000000007;
cout<<total<<endl;
}
}
You need a "big num" library, it is not clear what platform you are on, but start here:
http://gmplib.org/
this is not I am looking for as this will be very slow for large numbers
Using a bigint library will be considerably slower pretty much any other solution.
Don't take the modulo every pass through the loop: rather, only take it when the output grows bigger than the modulus, as follows:
#include <iostream>
int main() {
int modulus = 1000000007;
int n = 88888888;
long res = 1;
for(long i=0; i < n; ++i) {
res *= 2;
if(res > modulus)
res %= modulus;
}
std::cout << res << std::endl;
}
This is actually pretty quick:
$ time ./t
./t 1.19s user 0.00s system 99% cpu 1.197 total
I should mention that the reason this works is that if a and b are equivalent mod m (that is, a % m = b % m), then this equality holds multiple k of a and b (that is, the foregoing equality implies (a*k)%m = (b*k)%m).
Chris proposed GMP, but if you need just that and want to do things The C++ Way, not The C Way, and without unnecessary complexity, you may just want to check this out - it generates few warnings when compiling, but is quite simple and Just Works™.
You can split your 2^n into chunks of 2^m. You need to find: `
2^m * 2^m * ... 2^(less than m)
Number m should be 31 is for 32-bit CPU. Then your answer is:
chunk1 % k * chunk2 * k ... where k=1000000007
You are still O(N). But then you can utilize the fact that all chunk % k are equal except last one and you can make it O(1)
I wrote this function. It is very inefficient but it works with very large numbers. It uses my self-made algorithm to store big numbers in arrays using a decimal like system.
mpfr2.cpp
#include "mpfr2.h"
void mpfr2::mpfr::setNumber(std::string a) {
for (int i = a.length() - 1, j = 0; i >= 0; ++j, --i) {
_a[j] = a[i] - '0';
}
res_size = a.length();
}
int mpfr2::mpfr::multiply(mpfr& a, mpfr b)
{
mpfr ans = mpfr();
// One by one multiply n with individual digits of res[]
int i = 0;
for (i = 0; i < b.res_size; ++i)
{
for (int j = 0; j < a.res_size; ++j) {
ans._a[i + j] += b._a[i] * a._a[j];
}
}
for (i = 0; i < a.res_size + b.res_size; i++)
{
int tmp = ans._a[i] / 10;
ans._a[i] = ans._a[i] % 10;
ans._a[i + 1] = ans._a[i + 1] + tmp;
}
for (i = a.res_size + b.res_size; i >= 0; i--)
{
if (ans._a[i] > 0) break;
}
ans.res_size = i+1;
a = ans;
return a.res_size;
}
mpfr2::mpfr mpfr2::mpfr::pow(mpfr a, mpfr b) {
mpfr t = a;
std::string bStr = "";
for (int i = b.res_size - 1; i >= 0; --i) {
bStr += std::to_string(b._a[i]);
}
int i = 1;
while (!0) {
if (bStr == std::to_string(i)) break;
a.res_size = multiply(a, t);
// Debugging
std::cout << "\npow() iteration " << i << std::endl;
++i;
}
return a;
}
mpfr2.h
#pragma once
//#infdef MPFR2_H
//#define MPFR2_H
// C standard includes
#include <iostream>
#include <string>
#define MAX 0x7fffffff/32/4 // 2147483647
namespace mpfr2 {
class mpfr
{
public:
int _a[MAX];
int res_size;
void setNumber(std::string);
static int multiply(mpfr&, mpfr);
static mpfr pow(mpfr, mpfr);
};
}
//#endif
main.cpp
#include <iostream>
#include <fstream>
// Local headers
#include "mpfr2.h" // Defines local mpfr algorithm library
// Namespaces
namespace m = mpfr2; // Reduce the typing a bit later...
m::mpfr tetration(m::mpfr, int);
int main() {
// Hardcoded tests
int x = 7;
std::ofstream f("out.txt");
m::mpfr t;
for(int b=1; b<x;b++) {
std::cout << "2^^" << b << std::endl; // Hardcoded message
t.setNumber("2");
m::mpfr res = tetration(t, b);
for (int i = res.res_size - 1; i >= 0; i--) {
std::cout << res._a[i];
f << res._a[i];
}
f << std::endl << std::endl;
std::cout << std::endl << std::endl;
}
char c; std::cin.ignore(); std::cin >> c;
return 0;
}
m::mpfr tetration(m::mpfr a, int b)
{
m::mpfr tmp = a;
if (b <= 0) return m::mpfr();
for (; b > 1; b--) tmp = m::mpfr::pow(a, tmp);
return tmp;
}
I created this for tetration and eventually hyperoperations. When the numbers get really big it can take ages to calculate and a lot of memory. The #define MAX 0x7fffffff/32/4 is the number of decimals one number can have. I might make another algorithm later to combine multiple of these arrays into one number. On my system the max array length is 0x7fffffff aka 2147486347 aka 2^31-1 aka int32_max (which is usually the standard int size) so I had to divide int32_max by 32 to make the creation of this array possible. I also divided it by 4 to reduce memory usage in the multiply() function.
- Jubiman