Mo's algorithm to compute "power" of array - c++

Recently, I learned Mo's algorithm for the square-root decomposition of queries in order to speed up solutions to certain problems.
In order to practice implementation, I have been trying to solve D. Powerful array (a past contest problem on Codeforces) using this idea. The problem is as follows:
You have an array of n positive integers a_1, a_2, ..., a_n.
Consider an arbitrary subarray a_l, a_{l+1}, ..., a_r. Define K_s to be the number of occurrences of the integer s in this subarray. The power of a subarray is defined as the sum of K_s * K_s * s over all integers s (note that there are only finitely many terms for which this is not zero).
Answer q queries. In each query, given two integers l and r, compute the power of a_l, a_{l+1}, ..., a_r.
It holds: 1 <= n, q <= 200000 and 1 <= a_i <= 10^6.
Using Mo's algorithm, I have written code that solves this problem offline in O((n + q) * sqrt(n)) time. I am certain that this problem can be solved using this algorithm and time complexity, as I have inspected the accepted code of others and they also use a similar algorithm.
My code, however, gets a time limit exceeded verdict.
Below is the code I have written:
#include <ios>
#include <iostream>
#include <cmath>
#include <algorithm>
#include <vector>
#include <utility>
#include <map>
int sqt;
long long int ans = 0;
long long int arr[200005] = {};
long long int cnt[1000005] = {};
long long int tans[200005] = {};
struct el
{
int l, r, in;
};
bool cmp(const el &x, const el &y)
{
if (x.l/sqt != y.l/sqt)
return x.l/sqt < y.l/sqt;
return x.r < y.r;
}
el qr[200005];
int main()
{
std::ios_base::sync_with_stdio(false);
std::cin.tie(NULL);
std::cout.tie(NULL);
int n, q, a, b;
std::cin >> n >> q;
sqt = sqrt((double)(n))+27;
for (int i = 0; i < n; i++)
std::cin >> arr[i];
for (int i = 0; i < q; i++)
{
std::cin >> a >> b;
a--; b--;
qr[i].l = a;
qr[i].r = b;
qr[i].in = i;
}
std::sort(qr, qr+q, cmp);
int li = 0; //left iterator
int ri = 0; //right iterator
ans = arr[0];
cnt[arr[0]]++;
for (int i = 0; i < q; i++)
{
while (li < qr[i].l)
{
ans -= cnt[arr[li]]*cnt[arr[li]]*arr[li];
cnt[arr[li]]--;
ans += cnt[arr[li]]*cnt[arr[li]]*arr[li];
li++;
}
while (li > qr[i].l)
{
li--;
ans -= cnt[arr[li]]*cnt[arr[li]]*arr[li];
cnt[arr[li]]++;
ans += cnt[arr[li]]*cnt[arr[li]]*arr[li];
}
while (ri < qr[i].r)
{
ri++;
ans -= cnt[arr[ri]]*cnt[arr[ri]]*arr[ri];
cnt[arr[ri]]++;
ans += cnt[arr[ri]]*cnt[arr[ri]]*arr[ri];
}
while (ri > qr[i].r)
{
ans -= cnt[arr[ri]]*cnt[arr[ri]]*arr[ri];
cnt[arr[ri]]--;
ans += cnt[arr[ri]]*cnt[arr[ri]]*arr[ri];
ri--;
}
tans[qr[i].in] = ans;
}
for (int i = 0; i < q; i++)
std::cout << tans[i] << '\n';
}
Can you suggest any non-asymptotic (or possibly even an asymptotic) improvement that can speed up the program enough to pass the time limit?
I have already tried the following things, to no avail:
Using a vector instead of an array.
Using two nested pairs instead of struct.
Using only one pair, and then using a map to try to recover the correct order of answers.
Adding some various constants to sqt (such as in the code above).
Overloading the < comparison operator within the struct el itself.
I feel like I'm missing something important, since the other codes I have inspected seem to pass the time limit with quite a bit of leeway (around half a second). Yet, they seem to be using the same algorithm as my code.
Any help would be highly appreciated!

You could strength-reduce
while (li < qr[i].l)
{
ans -= cnt[arr[li]]*cnt[arr[li]]*arr[li];
cnt[arr[li]]--;
ans += cnt[arr[li]]*cnt[arr[li]]*arr[li];
li++;
}
to
while (li < qr[i].l)
{
ans -= (2*cnt[arr[li]]-1)*arr[li];
cnt[arr[li]]--;
li++;
}
and likewise for the others.
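To see why the shorter update is equivalent, here is a short derivation. The symbols c (the value of cnt[arr[li]] before the update) and s (the value arr[li]) are introduced here only for illustration.

```latex
% Removing one occurrence of s when it currently occurs c times:
\begin{align*}
\Delta_{\text{remove}} &= (c-1)^2 s - c^2 s = -(2c - 1)\,s \\
% Adding one occurrence of s when it currently occurs c times:
\Delta_{\text{add}}    &= (c+1)^2 s - c^2 s = +(2c + 1)\,s
\end{align*}
```

So a removal subtracts (2*cnt - 1) * value before decrementing cnt, and an insertion adds (2*cnt + 1) * value before incrementing cnt.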

You can modify cmp, the comparator function that Mo's algorithm uses to sort the queries.
Your version:
bool cmp(const el &x, const el &y)
{
if (x.l/sqt != y.l/sqt)
return x.l/sqt < y.l/sqt;
return x.r < y.r;
}
Optimisation:
If the index of the block is even, sort by R in descending order; if it is odd, sort by R in ascending order. This considerably reduces the movement of the R pointer when moving from one block to the next, because R no longer has to jump back to the start at every block boundary.
My code:
bool cmp(const el &x, const el &y)
{
if (x.l/sqt != y.l/sqt)
return x.l/sqt < y.l/sqt;
return (x.l/sqt & 1) ? x.r < y.r : x.r > y.r; // avoids TLE
}
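Putting the two suggestions from this thread together (the strength-reduced update and the alternating sort order), a minimal sketch could look like the following. The names Query and powerful_array are hypothetical, the indices are assumed to be 0-based and inclusive, and the values are assumed to be non-negative; this is not the code of any accepted submission.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical query type: 0-based inclusive range plus original position.
struct Query { int l, r, idx; };

// Sketch of Mo's algorithm for the "power" of subarrays.
// Assumes all values in a are non-negative and all queries are in range.
std::vector<long long> powerful_array(const std::vector<int>& a, std::vector<Query> qs) {
    std::vector<long long> out(qs.size(), 0);
    if (a.empty() || qs.empty()) return out;

    const int block = std::max(1, static_cast<int>(std::sqrt(static_cast<double>(a.size()))));
    std::sort(qs.begin(), qs.end(), [block](const Query& x, const Query& y) {
        const int bx = x.l / block, by = y.l / block;
        if (bx != by) return bx < by;
        // Odd blocks: r ascending, even blocks: r descending.
        return (bx & 1) ? x.r < y.r : x.r > y.r;
    });

    const std::size_t max_value = static_cast<std::size_t>(*std::max_element(a.begin(), a.end()));
    std::vector<long long> cnt(max_value + 1, 0LL);
    long long power = 0;
    int li = 0, ri = -1;  // current window [li, ri], initially empty

    // (c+1)^2 - c^2 = 2c + 1 and c^2 - (c-1)^2 = 2c - 1
    auto add = [&](int i) { power += (2 * cnt[a[i]] + 1) * a[i]; ++cnt[a[i]]; };
    auto rem = [&](int i) { power -= (2 * cnt[a[i]] - 1) * a[i]; --cnt[a[i]]; };

    for (const Query& q : qs) {
        // Grow first, then shrink, so the window never becomes invalid.
        while (li > q.l) add(--li);
        while (ri < q.r) add(++ri);
        while (li < q.l) rem(li++);
        while (ri > q.r) rem(ri--);
        out[q.idx] = power;
    }
    return out;
}
```

The answers come back in the original query order because each query carries its own index, so no extra map is needed to restore the order.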


Limit on array size

I wrote the following code in C++, which is supposed to print and count all the prime numbers up to n.
The code works perfectly for n<=10000 but does not work for n>=100000.
#include "iostream"
#include "vector"
using namespace std;
int main(){
int n,ans=0;
cin>>n;
vector <bool> v(n+1,true);
for(int i=2;i<=n;i++){
if(v[i]){
cout<<i<<endl;
ans++;
for(int j=i*i;j<=n;j+=i)
v[j]=false;
}
}
cout<<endl<<ans;
return 0;
}
Kindly state the reason.
Thank you.
In your inner loop, j=i*i will overflow a 32-bit signed integer once i reaches 46341. The easiest fix is to use a long long for j and to cast i to long long before the multiplication.
So, change the inner loop to
for (auto j = static_cast<long long>(i) * i; j <= n; j += i)
And that should be it.
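For reference, a minimal sketch of the whole sieve with that one-line fix applied could look like this. The function name primes_up_to is hypothetical, and it returns the primes instead of printing them so the result is easy to check.

```cpp
#include <cstddef>
#include <vector>

// Sketch: sieve of Eratosthenes with the inner loop index widened to long long,
// so that i * i cannot overflow a 32-bit int.
std::vector<int> primes_up_to(int n) {
    std::vector<int> primes;
    if (n < 2) return primes;
    std::vector<bool> composite(static_cast<std::size_t>(n) + 1, false);
    for (int i = 2; i <= n; ++i) {
        if (composite[static_cast<std::size_t>(i)]) continue;
        primes.push_back(i);
        // Cast before multiplying: the product is computed in 64 bits.
        for (long long j = static_cast<long long>(i) * i; j <= n; j += i)
            composite[static_cast<std::size_t>(j)] = true;
    }
    return primes;
}
```

Calling primes_up_to(100000) returns 9592 primes, which matches the known count of primes below 10^5.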
As a side note, please don't #include standard headers with double quotes; prefer <vector> and <iostream>.
UPDATE: As a more robust way of doing the same thing (and still using roughly the same code,) I'd suggest the following:
#include <iostream>
#include <vector>
using namespace std;
int main(){
size_t n, ans = 0; // A) Switch to size_t, which can accommodate the largest
// size of a vector; if your number is larger, then we
// can't make an array for it...
cin >> n;
vector<bool> v (n, true); // B) Remove the chance of overflow; indices are
// now shifted by one
for(size_t i = 2; i <= n; i++) {
if (v[i - 1]) {
cout << i << endl;
ans++;
if (n / i >= i) { // C) Check for possibility of overflow in `i*i`
// D) Switch j to correct type
// E) Check if `j+=i` overflowed
for (size_t j = i * i; j <= n && j > i; j += i)
v[j - 1] = false;
}
}
}
cout << endl << ans;
return 0;
}
The really important changes are A, C, and D. The other two, B and E, are needed for complete correctness, but they only matter when you have enough actual memory in your system to almost overflow size_t (otherwise, the constructor for vector would throw). This is entirely possible on 32-bit systems, but impossible on 64-bit builds for now.
Also note that A, B, and D are "essentially" free (perf-wise,) but C and E do have some impact on running time (although probably tiny; E is the more onerous.)

How to reduce memory usage? Problem from code forces

I solved this problem from Codeforces: https://codeforces.com/problemset/problem/1471/B. But when I submit it, the verdict is "memory limit exceeded". How can I reduce the memory usage? I used C++ for the problem. The problem is the following: "You have given an array a of length n and an integer x to a brand new robot. What the robot does is the following: it iterates over the elements of the array, let the current element be q. If q is divisible by x, the robot adds x copies of the integer q/x to the end of the array, and moves on to the next element. Note that the newly added elements could be processed by the robot later. Otherwise, if q is not divisible by x, the robot shuts down.
Please determine the sum of all values of the array at the end of the process".
This is the code:
#include <iostream>
#include <cstdlib>
#include <vector>
using namespace std;
int main()
{
vector<int> vec;
vector<int> ans;
int temp;
int t;
cin >> t;
int a = 0;
int n, x;
for(int i=0; i<t; i++){
cin >> n >> x;
while(a<n){
cin >> temp;
a++;
vec.push_back(temp);
}
int q = 0;
while(true){
if(vec[q]%x == 0){
for(int copies=0; copies<x; copies++){
vec.push_back(vec[q]/x);
}
}
else{
break;
}
q++;
}
int sum = 0;
for(int z: vec){
sum += z;
}
ans.push_back(sum);
vec.clear();
a = 0;
}
for(int y: ans){
cout << y << endl;
}
return 0;
}
Thanks.
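You don't need to build the array as specified to compute the sum.
You might do:
// Integer power x^n, computed in 64 bits so it cannot overflow for the values that occur here.
long long ipow(long long x, int n)
{
    long long res = 1;
    for (int i = 0; i != n; ++i) {
        res *= x;
    }
    return res;
}
// Pass i over the original array stands for the copies created at depth i:
// the x^i copies of e / x^i add up to e again, as long as e is divisible by x^i.
// The sum can exceed the range of int, so it is accumulated in a long long.
long long compute(const std::vector<int>& vec, int x)
{
    long long res = 0;
    int i = 0;
    while (true) {
        const auto r = ipow(x, i);
        for (auto e : vec) {
            if (e % r != 0) {
                return res;
            }
            res += e;
        }
        ++i;
    }
}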
Consider:
If you find an indivisible number in the original array, you're going to stop before you reach the numbers you have added (so they don't affect the result).
If you add q/x to the array but q/x isn't divisible by x, you're going to stop there when you reach it, if you haven't already stopped earlier. (On the other hand, if q/x is divisible by x, the sum of x copies of q/x is q, so adding them is equivalent to adding q.)
So you don't need to expand the array, you just need to sum the elements and - on the side - keep the sum of all the numbers you would have expanded with until you find one that is not a multiple of x.
Then you either add that to the sum of the array or not, depending on whether you reached the end of the array.
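The reasoning above can be sketched as follows. The function name robot_sum is hypothetical, and the sketch assumes x >= 2 and that every element is at least 1 (as in the problem statement), since otherwise the loop would not terminate.

```cpp
#include <cstddef>
#include <vector>

// Sketch: simulate the robot without ever growing the array.
// cur[i] is the value of the copies derived from a[i] at the current depth.
// Each time those copies are processed successfully, they append copies whose
// total is again a[i], so the sum grows by a[i].
// Assumes x >= 2 and a[i] >= 1.
long long robot_sum(const std::vector<long long>& a, long long x) {
    long long total = 0;
    for (long long v : a) total += v;  // the original elements

    std::vector<long long> cur = a;
    while (true) {
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (cur[i] % x != 0) return total;  // the robot shuts down here
            cur[i] /= x;
            total += a[i];  // all appended copies of cur[i] together sum to a[i]
        }
    }
}
```

The memory use stays proportional to the input size, and the number of passes is at most logarithmic in the largest element because each pass divides every value by x.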

Making a square() function without x*x in C++

I am self-studying C++ with the book "Programming: Principles and Practice Using C++" by Bjarne Stroustrup. One of the "Try This" exercises asks this:
Implement square() without using the multiplication operator; that is, do the x*x by repeated addition (start a
variable result at 0 and add x to it x times). Then run some version of “the first program” using that square().
Basically, I need to write a square(int x) function that returns the square of x without using the multiplication operator. So far I have this:
int square(int x)
{
int i = 0;
for(int counter = 0; counter < x; ++counter)
{
i = i + x;
}
return i;
}
But I was wondering if there was a better way to do this. The above function works, but I am highly sure it is not the best way to do it. Any help?
Mats Petersson stole the idea out of my head even before I thought to think it.
#include <iostream>
template <typename T>
T square(T x) {
if(x < 0) x = T(0)-x;
T sum{0}, s{x};
while(s) {
if(s & 1) sum += x;
x <<= 1;
s >>= 1;
}
return sum;
}
int main() {
auto sq = square(80);
std::cout << sq << "\n";
}
int square(int x) {
int result = { 0 };
int *ptr = &result;
for (int i = 0; i < x; i++) {
*ptr = *ptr + x;
}
return *ptr;
}
I am reading that book atm. Here is my solution.
int square(int x)
{
int result = 0;
for (int counter = 0; counter < x; ++counter) result += x;
return result;
}
int square(int n)
{
// handle negative input
if (n<0) n = -n;
// Initialize result
int res = n;
// Add n to res n-1 times
for (int i=1; i<n; i++)
res += n;
return res;
}
//Josef.L
//Without using multiplication operators.
#include <iostream>
using namespace std;
int square (int a){
    // The square of -a equals the square of a; this also keeps the loop below finite.
    if (a < 0) a = -a;
    int c = 0;
    // b counts how many times a has been added to c so far.
    // When b reaches a, c holds a + a + ... + a (a times), which is the same as a * a.
    for(int b = 0; b != a; b++){
        c = c + a;
    }
    return c;
}
int main(void){
    int a;
    cin >> a;
    cout << "Square of: " << a << " is " << square(a) << endl;
    return 0;
}
//intricate.
In terms of running time, your implementation is clear and simple enough: it runs in T(n) = Θ(n) for input n. Of course you could also use a divide-and-conquer method, splitting the n additions into two halves of n/2, computing each half recursively and then summing the two parts. That running time would be
T(n) = 2T(n/2) + Θ(n) = Θ(n lg n), so its running time complexity is worse than that of your implementation.
You can include <math.h> or <cmath> and use its pow() function:
#include <iostream>
#include <math.h>
int square(int);
int main()
{
int no;
std::cin >> no;
std::cout << square(no);
return 0;
}
int square(int no)
{
return pow(no, 2);
}

How to optimise the O(m.n) solution for longest common subsequence?

Given two strings string X of length x1 and string Y of length y1, find the longest sequence of characters that appear left to right (but not necessarily in contiguous block) in both strings.
e.g if X = ABCBDAB and Y = BDCABA, the LCS(X,Y) = {"BCBA","BDAB","BCAB"} and LCSlength is 4.
I used the standard solution for this problem:
if (X[i] == Y[j]): 1 + LCS(i+1, j+1)
if (X[i] != Y[j]): LCS(i, j+1) or LCS(i+1, j), whichever is greater
and then I used memoization, making it a standard DP problem.
#include<iostream>
#include<string>
using namespace std;
int LCS[1024][1024];
int LCSlen(string &x, int x1, string &y, int y1){
for(int i = 0; i <= x1; i++)
LCS[i][y1] = 0;
for(int j = 0; j <= y1; j++)
LCS[x1][j] = 0;
for(int i = x1 - 1; i >= 0; i--){
for(int j = y1 - 1; j >= 0; j--){
LCS[i][j] = LCS[i+1][j+1];
if(x[i] == y[j])
LCS[i][j]++;
if(LCS[i][j+1] > LCS[i][j])
LCS[i][j] = LCS[i][j+1];
if(LCS[i+1][j] > LCS[i][j])
LCS[i][j] = LCS[i+1][j];
}
}
return LCS[0][0];
}
int main()
{
string x;
string y;
cin >> x >> y;
int x1 = x.length() , y1 = y.length();
int ans = LCSlen( x, x1, y, y1);
cout << ans << endl;
return 0;
}
I submitted this solution on SPOJ and got a time limit exceeded and/or runtime error.
Only 14 user solutions have been accepted so far. Is there a smarter trick to decrease the time complexity of this question?
LCS is a classical, well-studied computer science problem, and for the general case with two sequences no algorithm substantially faster than O(n·m) is known.
Furthermore, your implementation has no obvious efficiency bugs, so it should run close to as fast as possible. It may still be beneficial to use a dynamically sized 2D matrix rather than an oversized one: the fixed 1024×1024 table of int takes up 4 MiB of memory and will cause frequent cache misses, which are costly because each one triggers a transfer from main memory to the processor cache, and main memory is orders of magnitude slower than cached memory access.
In terms of algorithm, in order to lower the theoretical bound you need to exploit specifics of your input structure: for instance, if you are searching one of the strings repeatedly, it may pay to build a search index which takes some processing time, but will make the actual search much faster. Two classical variants of that are the suffix array and the suffix tree.
If it is known that at least one of your strings is very short (< 64 characters) you can use Myers’ bit vector algorithm, which performs much faster. Unfortunately the algorithm is far from trivial to implement. There exists an implementation in the SeqAn library, but using the library itself has a steep learning curve.
(As a matter of interest, this algorithm finds frequent application in bioinformatics, and has been used during the sequence assembly in the Human Genome Project.)
Although I still didn't get an AC because of the time limit, I was able to implement the linear-space algorithm. In case anyone wants to see it, here is a C++ implementation of Hirschberg's algorithm.
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <string>
#include <cstdio>
using namespace std;
int* compute_help_table(const string & A,const string & B);
string lcs(const string & A, const string & B);
string simple_solution(const string & A, const string & B);
int main(void) {
string A,B;
cin>>A>>B;
cout << lcs(A, B).size() << endl;
return 0;
}
string lcs(const string &A, const string &B) {
int m = A.size();
int n = B.size();
if (m == 0 || n == 0) {
return "";
}
else if(m == 1) {
return simple_solution(A, B);
}
else if(n == 1) {
return simple_solution(B, A);
}
else {
int i = m / 2;
string Asubstr = A.substr(i, m - i);
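// The backward pass needs both strings reversed, so that L2[n-j] is the LCS length of the two suffixes.
reverse(Asubstr.begin(), Asubstr.end());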
string Brev = B;
reverse(Brev.begin(), Brev.end());
int* L1 = compute_help_table(A.substr(0, i), B);
int* L2 = compute_help_table(Asubstr, Brev);
int k;
int M = -1;
for(int j = 0; j <= n; j++) {
if(M < L1[j] + L2[n-j]) {
M = L1[j] + L2[n-j];
k = j;
}
}
delete [] L1;
delete [] L2;
return lcs(A.substr(0, i), B.substr(0, k)) + lcs(A.substr(i, m - i), B.substr(k, n - k));
}
}
int* compute_help_table(const string &A, const string &B) {
int m = A.size();
int n = B.size();
int* first = new int[n+1];
int* second = new int[n+1];
for(int i = 0; i <= n; i++) {
second[i] = 0;
}
for(int i = 0; i < m; i++) {
for(int k = 0; k <= n; k++) {
first[k] = second[k];
}
for(int j = 0; j < n; j++) {
if(j == 0) {
if (A[i] == B[j])
second[1] = 1;
}
else {
if(A[i] == B[j]) {
second[j+1] = first[j] + 1;
}
else {
second[j+1] = max(second[j], first[j+1]);
}
}
}
}
delete [] first;
return second;
}
string simple_solution(const string & A, const string & B) {
int i = 0;
for(; i < B.size(); i++) {
if(B.at(i) == A.at(0))
return A;
}
return "";
}
If the two strings share a common prefix (e.g. "ABCD" and "ABXY" share "AB") then that will be part of the LCS. Same for common suffixes. So for some pairs of strings you can gain some speed by skipping over the longest common prefix and longest common suffix before starting the DP algorithm; this doesn't change the worst-case bounds, but it changes the best case complexity to linear time and constant space.
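A minimal sketch of that trimming step, combined with a two-row DP for the remaining middle part, might look like this. The function name lcs_length is hypothetical, and the two-row table is a choice made here to keep memory small rather than something the answers above prescribe.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: strip the common prefix and suffix, then run the usual DP
// on what is left, keeping only two rows of the table.
int lcs_length(const std::string& a, const std::string& b) {
    // Length of the common prefix.
    std::size_t lo = 0;
    while (lo < a.size() && lo < b.size() && a[lo] == b[lo]) ++lo;

    // Common suffix, not overlapping the prefix already counted.
    std::size_t ea = a.size(), eb = b.size();
    while (ea > lo && eb > lo && a[ea - 1] == b[eb - 1]) { --ea; --eb; }

    const std::size_t n = ea - lo, m = eb - lo;
    std::vector<int> prev(m + 1, 0), cur(m + 1, 0);
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[lo + i - 1] == b[lo + j - 1])
                cur[j] = prev[j - 1] + 1;
            else
                cur[j] = std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);
    }
    // Trimmed characters all belong to some LCS, so they are added back.
    return static_cast<int>(lo + (a.size() - ea)) + prev[m];
}
```

For identical strings the DP part is skipped entirely, which is the linear best case mentioned above; for strings with nothing in common at either end the cost is the same as the plain DP.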

Codechef practice question help needed - find trailing zeros in a factorial

I have been working on this for 24 hours now, trying to optimize it. The question is how to find the number of trailing zeroes in the factorial of a number in the range of 10000000, for 10 million test cases, in about 8 seconds.
The code is as follows:
#include<iostream>
using namespace std;
int count5(int a){
int b=0;
for(int i=a;i>0;i=i/5){
if(i%15625==0){
b=b+6;
i=i/15625;
}
if(i%3125==0){
b=b+5;
i=i/3125;
}
if(i%625==0){
b=b+4;
i=i/625;
}
if(i%125==0){
b=b+3;
i=i/125;
}
if(i%25==0){
b=b+2;
i=i/25;
}
if(i%5==0){
b++;
}
else
break;
}
return b;
}
int main(){
int l;
int n=0;
cin>>l; //no of test cases taken as input
int *T = new int[l];
for(int i=0;i<l;i++)
cin>>T[i]; //nos taken as input for the same no of test cases
for(int i=0;i<l;i++){
n=0;
for(int j=5;j<=T[i];j=j+5){
n+=count5(j); //no of trailing zeroes calculted
}
cout<<n<<endl; //no for each trialing zero printed
}
delete []T;
}
Please help me by suggesting a new approach, or suggesting some modifications to this one.
Use the following theorem:
If p is a prime, then the exponent of the highest power of p which divides n! (n factorial) is [n/p] + [n/p^2] + [n/p^3] + ... + [n/p^k], where p^k is the largest power of p <= n, and [x] is the integral part of x.
Reference: PlanetMath
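As a small illustration of the theorem, here is a sketch for a general prime p. The function name legendre is hypothetical and the code assumes p >= 2; the number of trailing zeroes of n! is the result for p = 5, because factors of 2 are always more plentiful than factors of 5.

```cpp
#include <cstdint>

// Sketch: exponent of the prime p in n!, i.e. [n/p] + [n/p^2] + [n/p^3] + ...
// Dividing n repeatedly avoids computing the powers of p explicitly.
// Assumes p >= 2.
std::uint64_t legendre(std::uint64_t n, std::uint64_t p) {
    std::uint64_t exponent = 0;
    while (n >= p) {
        n /= p;         // n now holds [original n / p^i]
        exponent += n;
    }
    return exponent;
}
```

For example, legendre(100, 5) gives 20 + 4 = 24, so 100! ends in 24 zeroes.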
The optimal solution runs in O(log N) time, where N is the number you want to find the zeroes for. Use this formula:
Zeroes(N!) = N / 5 + N / 25 + N / 125 + ... + N / 5^k, using integer division and stopping when a division yields 0. You can read more on Wikipedia.
So for example, in C this would be:
int Zeroes(int N)
{
int ret = 0;
while ( N )
{
ret += N / 5;
N /= 5;
}
return ret;
}
This will run in 8 secs on a sufficiently fast computer. You can probably speed it up by using lookup tables, although I'm not sure how much memory you have available.
Here's another suggestion: don't store the numbers, you don't need them! Calculate the number of zeroes for each number when you read it.
If this is for an online judge, in my experience online judges exaggerate time limits on problems, so you will have to resort to ugly hacks even if you have the right algorithm. One such ugly hack is to not use functions such as cin and scanf, but instead use fread to read a bunch of data at once in a char array, then parse that data (DON'T use sscanf or stringstreams though) and get the numbers out of it. Ugly, but a lot faster usually.
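A rough sketch of that input hack is shown below. The helper names slurp and next_uint are hypothetical, and the sketch assumes the input consists only of non-negative decimal integers that fit in an unsigned int, separated by arbitrary non-digit characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Sketch: read the whole stream into memory using large fread calls.
std::string slurp(std::FILE* f) {
    std::string data;
    char chunk[1 << 16];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        data.append(chunk, got);
    return data;
}

// Sketch: parse the next non-negative integer in [p, end) by hand.
// Skips any non-digit characters first; advances p past the number.
// Returns false when no further number exists.
bool next_uint(const char*& p, const char* end, unsigned& out) {
    while (p != end && (*p < '0' || *p > '9')) ++p;
    if (p == end) return false;
    unsigned value = 0;
    while (p != end && *p >= '0' && *p <= '9')
        value = value * 10 + static_cast<unsigned>(*p++ - '0');
    out = value;
    return true;
}
```

A program would call slurp(stdin) once, then walk the buffer with next_uint to get the number of test cases and each input value. Building the output in one string and writing it with a single fwrite follows the same idea on the output side.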
This question is from codechef.
http://www.codechef.com/problems/FCTRL
How about this solution:
#include <stdio.h>
int a[] = {5, 25, 125, 625, 3125, 15625, 78125, 390625, 1953125, 9765625, 48828125, 244140625};
int main()
{
int i, j, l, n, ret = 0, z;
scanf("%d", &z);
for(i = 0; i < z; i++)
{
ret = 0;
scanf("%d", &n);
for(j = 0; j < 12; j++)
{
l = n / a[j];
if(l <= 0)
break;
ret += l;
}
printf("%d\n", ret);
}
return 0;
}
Any optimizations???
I know this is over 2 years old, but here's my code for future reference:
#include <cmath>
#include <cstdio>
inline int read()
{
char temp;
int x=0;
temp=getchar_unlocked();
while(temp<48)temp=getchar_unlocked();
x+=(temp-'0');
temp=getchar_unlocked();
while(temp>=48)
{
x=x*10;
x+=(temp-'0');
temp=getchar_unlocked();
}
return x;
}
int main()
{
int T,x,z;
int pows[]={5,25,125,625,3125,15625,78125,390625,1953125,9765625,48828125,244140625};
T=read();
for(int i=0;i<T;i++)
{
x=read();
z=0;
for(int j=0;j<12 && pows[j]<=x;j++)
z+=x/pows[j];
printf("%d\n",z);
}
return 0;
}
It ran in 0.13s.
Here is my accepted solution. Its score is 1.51s, 2.6M. Not the best, but maybe it can help you.
#include <iostream>
using namespace std;
void calculateTrailingZerosOfFactoriel(int testNumber)
{
int numberOfZeros = 0;
while (true)
{
testNumber = testNumber / 5;
if (testNumber > 0)
numberOfZeros += testNumber;
else
break;
}
cout << numberOfZeros << endl;
}
int main()
{
//cout << "Enter number of tests: " << endl;
int t;
cin >> t;
for (int i = 0; i < t; i++)
{
int testNumber;
cin >> testNumber;
calculateTrailingZerosOfFactoriel(testNumber);
}
return 0;
}
#include <cstdio>
int main(void) {
long long int t, n, s, i, j;
scanf("%lld", &t);
while (t--) {
i=1; s=0; j=5;
scanf("%lld", &n);
while (i != 0) {
i = n / j;
s = s + i * (2*j + (i-1) * j) / 2;
j = j * 5;
}
printf("%lld\n", s);
}
return 0;
}
You clearly already know the correct algorithm. The bottleneck in your code is the use of cin/cout. When dealing with very large input, cin is extremely slow compared to scanf.
scanf is also slower than direct methods of reading input such as fread, but using scanf is sufficient for almost all problems on online judges.
This is detailed in the Codechef FAQ, which is probably worth reading first ;)