Improving optimization of nested loop - c++

I'm writing a simple program that counts the pairs in an array whose sum is divisible by 3. The array length and values are supplied by the user.
My code works, but I'd like to know whether there is a faster way to compute the result, so that it takes less running time.
When the array length is 10^4 or less, the program takes under 100 ms. When it grows to 10^5, the time jumps to about 1000 ms. Why is that, and how can I improve the speed?
#include <iostream>
#include <vector>
using namespace std;
int main()
{
int N, i, b;
b = 0;
cin >> N;
unsigned int j = 0;
std::vector<unsigned int> a(N);
for (j = 0; j < N; j++) {
cin >> a[j];
if (j == 0) {
}
else {
for (i = j - 1; i >= 0; i = i - 1) {
if ((a[j] + a[i]) % 3 == 0) {
b++;
}
}
}
}
cout << b;
return 0;
}

Your algorithm has O(N^2) complexity. There is a faster way.
(a[i] + a[j]) % 3 == ((a[i] % 3) + (a[j] % 3)) % 3
Thus you do not need the exact numbers, only their remainders modulo 3. The sum has remainder zero in two cases: both numbers have remainder 0 (0 + 0), or one has remainder 1 and the other has remainder 2 (1 + 2).
The result is r[1]*r[2] + r[0]*(r[0]-1)/2, where r[i] is the count of numbers with remainder i.
long long r[3] = {}; // long long: for N = 10^5 the pair count can exceed the range of int
for (unsigned int x : a) {
r[x % 3]++;
}
std::cout << r[1]*r[2] + (r[0]*(r[0]-1)) / 2;
The complexity of this algorithm is O(N).
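To make the counting idea concrete, here is a minimal sketch of it as a standalone function. The name count_pairs_div3 is only illustrative (it is not from the question), and the 64-bit return type is an assumption made so that large inputs do not overflow.

```cpp
#include <cstdint>
#include <vector>

// Counts pairs (i < j) with (a[i] + a[j]) % 3 == 0 in O(N) time.
// count_pairs_div3 is a hypothetical name chosen for this sketch.
std::int64_t count_pairs_div3(const std::vector<unsigned int>& a) {
    std::int64_t r[3] = {0, 0, 0};  // r[k] = how many values leave remainder k
    for (unsigned int x : a) {
        r[x % 3]++;
    }
    // 0+0 pairs: choose 2 out of r[0]; 1+2 pairs: any of r[1] with any of r[2].
    return r[1] * r[2] + r[0] * (r[0] - 1) / 2;
}
```

For the values 1 2 3 4 5 6 each remainder occurs twice, so the function returns 2*2 + 2*1/2 = 5, which matches the pairs (1,2), (1,5), (2,4), (4,5) and (3,6).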

I've encountered this problem before, and while I can't find my own solution, you could improve the running time by hashing.
The code would look something like this:
// A C++ program to check if arr[0..n-1] can be divided
// in pairs such that every pair is divisible by k.
#include <bits/stdc++.h>
using namespace std;
// Returns true if arr[0..n-1] can be divided into pairs
// with sum divisible by k.
bool canPairs(int arr[], int n, int k)
{
// An odd length array cannot be divided into pairs
if (n & 1)
return false;
// Create a frequency array to count occurrences
// of all remainders when divided by k.
map<int, int> freq;
// Count occurrences of all remainders
for (int i = 0; i < n; i++)
freq[arr[i] % k]++;
// Traverse input array and use freq[] to decide
// if given array can be divided in pairs
for (int i = 0; i < n; i++)
{
// Remainder of current element
int rem = arr[i] % k;
// If remainder with current element divides
// k into two halves.
if (2*rem == k)
{
// Then there must be even occurrences of
// such remainder
if (freq[rem] % 2 != 0)
return false;
}
// If remainder is 0, then there must be two
// elements with 0 remainder
else if (rem == 0)
{
if (freq[rem] & 1)
return false;
}
// Else number of occurrences of remainder
// must be equal to number of occurrences of
// k - remainder
else if (freq[rem] != freq[k - rem])
return false;
}
return true;
}
/* Driver program to test above function */
int main()
{
int arr[] = {92, 75, 65, 48, 45, 35};
int k = 10;
int n = sizeof(arr)/sizeof(arr[0]);
canPairs(arr, n, k)? cout << "True": cout << "False";
return 0;
}
That works for any k (in your case 3). Note that it checks whether the array can be split into such pairs; it does not count them.
That said, this is not my code; it comes from the following link, which has a proper explanation. I didn't paste just the link, since I think that's bad practice.

Related

Minimum Cost to reduce the size of array to 1

Given an array of N numbers (not necessarily sorted). We can merge any two numbers into one and the cost of merging the two numbers is equal to the sum of the two values. The task is to find the total minimum cost of merging all the numbers.
Example:
Let the array A = [1,2,3,4]
Then, we can remove 1 and 2, add both of them and keep the sum back in array. Cost of this step would be (1+2) = 3.
Now, A = [3,3,4], Cost = 3
In the second step, we can remove 3 and 3, add them, and put the sum back in the array. The cost of this step is (3+3) = 6.
Now, A = [4,6], Cost = 6
In the third step, we can remove both remaining elements and put their sum back in the array. The cost of this step is (4+6) = 10.
Now, A = [10], Cost = 10
So, total cost turns out to be 19 (10+6+3).
We will have to pick the 2 smallest elements to minimize our total cost. A simple way to do this is using a min heap structure. We will be able to get the minimum element in O(1) and insertion will be O(log n).
The time complexity of this approach is O(n log n).
But I tried another approach and wasn't able to find a case where it fails. The basic idea is that the sum of the two smallest elements chosen at any step is never smaller than the sum of the pair chosen before it. So the "temp" array is always sorted, and we can access the minimum elements in O(1).
As I am sorting the input array and then simply traversing the array, the complexity of my approach is O(n log n).
int minCost(vector<int>& arr) {
sort(arr.begin(), arr.end());
// temp array will contain the sum of all the pairs of minimum elements
vector<int> temp;
// index for arr
int i = 0;
// index for temp
int j = 0;
int cost = 0;
// while we have more than 1 element combined in both the input and temp array
while(arr.size() - i + temp.size() - j > 1) {
int num1, num2;
// selecting num1 (minimum element)
if(i < arr.size() && j < temp.size()) {
if(arr[i] <= temp[j])
num1 = arr[i++];
else
num1 = temp[j++];
}
else if(i < arr.size())
num1 = arr[i++];
else if(j < temp.size())
num1 = temp[j++];
// selecting num2 (second minimum element)
if(i < arr.size() && j < temp.size()) {
if(arr[i] <= temp[j])
num2 = arr[i++];
else
num2 = temp[j++];
}
else if(i < arr.size())
num2 = arr[i++];
else if(j < temp.size())
num2 = temp[j++];
// appending the sum of the minimum elements in the temp array
int sum = num1 + num2;
temp.push_back(sum);
cost += sum;
}
return cost;
}
Is this approach correct? If not, please let me know what I am missing, and the test cases in which this algorithm fails.
SPOJ Link for the same problem
The logic seems very solid to me... all the computed sums will never be decreasing and therefore you only need to add up either oldest two computed sums, next two elements or oldest sum and next element.
I would just simplify the code:
#include <vector>
#include <algorithm>
#include <stdio.h>
int hsum(std::vector<int> arr) {
int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
std::sort(arr.begin(), arr.end());
std::vector<int> temp;
auto get = [&]()->int {
if (j == nj || (i < ni && arr[i] < temp[j])) return arr[i++];
return temp[j++];
};
while ((ni-i)+(nj-j)>1) {
int a = get(), b = get();
res += a+b;
temp.push_back(a + b); nj++;
}
return res;
}
int main() {
fprintf(stderr, "%i\n", hsum(std::vector<int>{1,4,2,3}));
return 0;
}
Very nice idea!
Another improvement is noting that the cumulative length of the two arrays being processed (the original one and the temporary one holding the sums) will decrease at every step.
Since the first step will use two input elements, the fact that the temporary array grows one element at each step will still not be enough for a "walking queue" allocated in the array itself to reach the reading pointer.
This means that there is no need of a temporary array and the space for the sums can be found in the array itself...
int hsum(std::vector<int> arr) {
int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
std::sort(arr.begin(), arr.end());
auto get = [&]()->int {
if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
return arr[j++];
};
while ((ni-i)+(nj-j)>1) {
int a = get(), b = get();
res += a+b;
arr[nj++] = a + b;
}
return res;
}
About the error on SPOJ... I searched briefly for the problem but didn't find it. However, I generated random arrays of random lengths and compared this solution against a brute-force one implemented directly from the problem statement, and I'm reasonably confident that the algorithm is correct.
I know at least one programming arena (Topcoder) where sometimes the problems are carefully crafted so that the computation gives correct results if using unsigned but not if using int (or if using unsigned long long but not if using long long) because of integer overflow.
I don't know if SPOJ also does this kind of nonsense(1)... maybe that is the reason some hidden test case fails...
EDIT
Checking with SPOJ the algorithm passes if using long long values... this is the entry I used:
#include <stdio.h>
#include <algorithm>
#include <vector>
int main(int argc, const char *argv[]) {
int n;
scanf("%i", &n);
for (int testcase=0; testcase<n; testcase++) {
int sz; scanf("%i", &sz);
std::vector<long long> arr(sz);
for (int i=0; i<sz; i++) scanf("%lli", &arr[i]);
int ni = arr.size(), nj = 0, i = 0, j = 0;
long long res = 0;
std::sort(arr.begin(), arr.end());
auto get = [&]() -> long long {
if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
return arr[j++];
};
while ((ni-i)+(nj-j)>1) {
long long a = get(), b = get();
res += a+b;
arr[nj++] = a + b;
}
printf("%lli\n", res);
}
return 0;
}
PS: This very kind of computation is also what is needed to build a Huffman tree for entropy coding from a symbol frequency table, so it's not a mere random exercise; it has practical applications.
(1) I'm saying "nonsense" because in Topcoder they never give problems that require 65 bits; thus it's not a genuine care about overflows, but just setting traps for novices.
Another that I think is a bad practice I saw on TC is that some problems are carefully designed so that the correct algorithm if using C++ will barely fit in the timeout limit: just use another language (and get e.g. a 2× slowdown) and you cannot solve the problem.
First of all, think simple!
When using a priority queue, the problem is easy!
In the first test case :
1 6 3 20
// after pushing to Q
1 3 6 20
// and sum two top items and pop and push!
(1 + 3) 6 20 cost = 4
(4 + 6) 20 cost = 10 + 4
(10 + 20) cost = 30 + 14
30 cost = 44
#include<iostream>
#include<queue>
#include<vector>
#include<functional>
using namespace std;
int main()
{
int t;
cin >> t;
while (t--) {
int n;
cin >> n;
priority_queue<long long int, vector<long long int>, greater<long long int>> q;
for (int i = 0; i < n; ++i) {
int k;
cin >> k;
q.push(k);
}
long long int sum = 0;
while (q.size() > 1) {
long long int a = q.top();
q.pop();
long long int b = q.top();
q.pop();
q.push(a + b);
sum += a + b;
}
cout << sum << "\n";
}
}
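Another idea is to sort the list in descending order and weight each element by its 1-based position, like this. Note that this does not reproduce the minimum merge cost in general: for A = [1, 2, 3, 4] it gives 20, while always merging the two smallest values costs 19.
def weighted_cost(A):
    A.sort(reverse=True)
    cost = 0
    for i in range(len(A)):
        cost += A[i] * (i+1)
    return cost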

How to reduce the time in this program?

I have a program like this: given a sequence of integers, find the biggest prime and its positions.
Example:
input:
9 // how many numbers
19 7 81 33 17 4 19 21 13
output:
19 // the biggest prime
1 7 // and its positions
So first I read the input and store it in an array. Then I copy that array and sort the copy (because I use a variable to keep track of the highest prime, and it breaks if the data is unsorted), test every number of the sorted array for primality, and finally loop through the original array again to find the positions and print the result.
But it is too slow. Can I improve it?
My code:
#include <iostream>
#include <cmath>
#include <algorithm>
using namespace std;
int main()
{
int n;
cin >> n;
int numbersNotSorted[n];
int maxNum{0};
for (int i = 0; i < n; i++)
{
cin >> numbersNotSorted[i];
}
int numbersSorted[n];
for (int i = 0; i < n; i++)
{
numbersSorted[i] = numbersNotSorted[i];
}
sort(numbersSorted, numbersSorted + n);
for (int number = 0; number < n; number++)
{
int countNum{0};
for (int i = 2; i <= sqrt(numbersSorted[number]); i++)
{
if (numbersSorted[number] % i == 0)
countNum++;
}
if (countNum == 0)
{
maxNum = numbersSorted[number];
}
}
cout << maxNum << '\n';
for (int i = 0; i < n; i++)
{
if (numbersNotSorted[i] == maxNum)
cout << i + 1 << ' ';
}
}
If you need the biggest prime, sorting the array brings you no benefit, you'll need to check all the values stored in the array anyway.
Even with a fast sorting algorithm, the best you can hope for on average is O(N + k), so sorting the array is actually more costly than looking for the largest prime in the unsorted array.
The process is pretty straightforward: check whether the next value is larger than the current largest prime, and if so check whether it is also prime. If it is, store its position and/or value; if not, move on to the next value. Repeat until the end of the array.
Θ(N) time complexity is the best possible under these conditions.
Start with a basic "for each number entered" loop:
#include <iostream>
#include <cmath>
#include <algorithm>
using namespace std;
int main() {
int n;
int newNumber;
cin >> n;
for (int i = 0; i < n; i++) {
cin >> newNumber;
}
}
If the new number is smaller than the current largest prime, then it can be ignored.
int main() {
int n;
int newNumber;
int highestPrime = 0; // no prime seen yet
cin >> n;
for (int i = 0; i < n; i++) {
cin >> newNumber;
if(newNumber >= highestPrime) {
}
}
}
If the new number is equal to the highest prime, then you just need to store its position somewhere. I'm lazy, so:
int main() {
int n;
int newNumber;
int highestPrime = 0; // no prime seen yet
const int maxPositions = 1234;
int positionList[maxPositions];
int nextPosition = 0; // number of positions stored so far
int currentPosition = 0;
cin >> n;
for (int i = 0; i < n; i++) {
cin >> newNumber;
currentPosition++;
if(newNumber >= highestPrime) {
if(newNumber == highestPrime) {
if(nextPosition+1 >= maxPositions) {
// List of positions is too small (should've used malloc/realloc instead of being lazy)!
} else {
positionList[nextPosition++] = currentPosition;
}
}
}
}
}
If the new number is larger than the current largest prime, then you need to figure out if it is a prime number, and if it is you need to reset the list and store its position, etc:
int main() {
int n;
int newNumber;
int highestPrime = 0;
const int maxPositions = 1234;
int positionList[maxPositions];
int nextPosition = 0; // number of positions stored so far
int currentPosition = 0;
cin >> n;
for (int i = 0; i < n; i++) {
cin >> newNumber;
currentPosition++;
if(newNumber >= highestPrime) {
if(newNumber == highestPrime) {
if(nextPosition+1 >= maxPositions) {
// List of positions is too small (should've used malloc/realloc instead of being lazy)!
} else {
positionList[nextPosition++] = currentPosition;
}
} else { // newNumber > highestPrime
if(isPrime(newNumber)) {
nextPosition = 0; // Reset the list
highestPrime = newNumber;
positionList[nextPosition++] = currentPosition;
}
}
}
}
}
You'll also want something to display the results:
if(highestPrime > 0) {
cout << highestPrime << '\n';
// nextPosition is the number of positions stored so far
for(int k = 0; k < nextPosition; k++) {
cout << positionList[k] << ' ';
}
}
Now; the only thing you're missing is an isPrime(int n) function. The fastest way to do that is to pre-calculate a "is/isn't prime" bitfield. It might look something like:
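// Assumes primeNumberBitfield holds one bit per odd number: bit (n >> 1) is set when n is prime.
bool isPrime(int n) {
if(n == 2) {
return true; // the only even prime is not stored in the odd-only bitfield
}
if(n > 2 && (n & 1) != 0) {
n >>= 1;
if( (primeNumberBitfield[n / 32] & (1u << (n % 32))) != 0) {
return true;
}
}
return false;
}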
The problem here is that (for positive values in a 32-bit signed integer) you'll need 1 billion bits (or 128 MiB).
To avoid that you can use a much smaller bitfield for numbers up to sqrt(1 << 31) (which is only about 4 KiB); then if the number is too large for the bitfield you can use the bitfield to find prime numbers and check (with modulo) if they divide the original number evenly.
Note that Sieve of Eratosthenes ( https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes ) is an efficient way to generate that smaller bitfield (but is not efficient to use for a sparse population of larger numbers).
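The smaller-table idea could be sketched roughly as follows. For simplicity this version stores the small primes in a vector instead of a bitfield, and the names small_primes and is_prime_by_table are made up for the sketch. It assumes the table covers every prime up to sqrt(n), so a limit of 46341 is enough for any positive 32-bit int.

```cpp
#include <vector>

// Builds the list of primes up to `limit` (assumed >= 1) with a plain sieve of Eratosthenes.
std::vector<int> small_primes(int limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<int> primes;
    for (int i = 2; i <= limit; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        // Mark multiples of i, starting at i*i (smaller ones are already marked).
        for (long long j = 1LL * i * i; j <= limit; j += i) composite[j] = true;
    }
    return primes;
}

// Tests n by dividing only by the tabulated primes p with p * p <= n.
// Correct only if `primes` contains every prime up to sqrt(n).
bool is_prime_by_table(int n, const std::vector<int>& primes) {
    if (n < 2) return false;
    for (int p : primes) {
        if (p > n / p) break;          // p * p > n: no divisor left to try
        if (n % p == 0) return false;  // found a prime divisor
    }
    return true;
}
```

The table would be built once, for example with small_primes(46341), and then reused for every number that is entered.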
If you do it right, you'll probably create the illusion that it's instantaneous because almost all of the work will be done while a human is slowly typing the numbers in (and not left until after all of the numbers have been entered). For a very fast typist you'll have ~2 milliseconds between numbers, and (after the last number is entered) humans can't notice delays smaller than about 10 milliseconds.
But the time is too slow, can I improve it?
The loop below has two problems:
(1) Why check the smallest values first? It makes more sense to check the largest values first to find the largest prime, and to exit the for (... number..) loop early once a prime is found. This takes advantage of the work done by sort().
(2) Once a candidate value is known not to be prime, quit testing it for primality.
// (1) Start from the other end rather than as below
for (int number = 0; number < n; number++) {
int countNum {0};
for (int i = 2; i <= sqrt(numbersSorted[number]); i++) {
if (numbersSorted[number] % i == 0)
// (2) No point in continuing prime testing, Value is composite.
countNum++;
}
if (countNum == 0) {
maxNum = numbersSorted[number];
}
}
Corrections left for OP to implement.
Advanced: Prime testing is a deep subject, and many optimizations (trivial and complex) exist that are better than OP's approach. Yet I suspect the above 2 improvements will suffice for OP.
Brittleness: The code does not handle well the case of no primes in the list or n <= 0.
i <= sqrt(numbersSorted[number]) is prone to floating-point issues that can lead to incorrect results. Recommend i <= numbersSorted[number]/i.
Sorting is O(n * log n). Prime testing, as done here, is O(n * sqrt(max value)). Sorting does not increase the O() of the overall code when log of n is less than the square root of the max value. Sorting is worth doing if the result of the sort is used well.
The code fails if the largest value is 1, as the prime test incorrectly identifies 1 as a prime.
The code fails if numbersSorted[number] < 0 due to sqrt().
Simple full-range int prime test:
bool isprime(int num) {
if (num % 2 == 0) return num == 2;
for (int divisor = 3; divisor <= num / divisor; divisor += 2) {
if (num % divisor == 0) return false;
}
return num > 1;
}
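Putting improvements (1) and (2) together might look like the sketch below. It is one possible reading of the suggestions, not OP's code: largest_prime is a hypothetical helper that returns 0 when the input holds no prime, and the prime test is repeated only so the block compiles on its own.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Same full-range test as above, repeated so this sketch is self-contained.
// It already implements (2): it returns as soon as a divisor is found.
bool isprime(int num) {
    if (num % 2 == 0) return num == 2;
    for (int divisor = 3; divisor <= num / divisor; divisor += 2) {
        if (num % divisor == 0) return false;
    }
    return num > 1;
}

// Returns the largest prime in `values`, or 0 when there is none.
int largest_prime(std::vector<int> values) {
    // (1) Sort so that the largest candidates are examined first.
    std::sort(values.begin(), values.end(), std::greater<int>());
    for (int v : values) {
        if (v < 2) break;          // everything from here on is too small to be prime
        if (isprime(v)) return v;  // first prime found is the largest: stop early
    }
    return 0;
}
```

With the question's sample input (19 7 81 33 17 4 19 21 13) only 81, 33 and 21 are tested before 19 ends the search.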
If you want to find the prime, don't go for sorting. You'll have to check for all the numbers present in the array then.
You can try this approach to do the same thing, but all within a lesser amount of time:
Step-1: Create a global function for detecting a prime number. Here's how you can approach this-
bool prime(int n)
{
if(n<2) //0, 1 and negative numbers are not prime
return false;
for(int i=2;i<=n/i;i++) //iterate only up to the square root of n, to cut down on the computational time
{
if(n%i==0)
return false;
}
return true;
}
Step-2: Now your main function starts. You take input from the user:
int main()
{
int n, i, MAX;
cout<<"Enter the number of elements: ";
cin>>n;
int arr[n];
cout<<"Enter the array elements: ";
for(i=0;i<n;i++)
cin>>arr[i];
Step-3: Note that I've declared a variable MAX to hold the largest prime found so far. I initialize it to 0, which is smaller than any prime, so that a non-prime first element can never be reported as the answer: MAX=0;
Step-4: Now the loop for iterating the array. What I did was, I iterated through the array and at each element, I checked if the value is greater than or equal to the previous MAX. This will ensure, that the program does not check the values which are less than MAX, thus eliminating a part of the array and cutting down the time. I then nested another if statement, to check if the value is a prime or not. If both of these are satisfied, I set the value of MAX to the current value of the array:
for(i=0;i<n;i++)
{
if(arr[i]>=MAX) //this will check if the number is greater than the previous MAX number or not
{
if(prime(arr[i])) //if the previous condition satisfies, then only this block of code will run and check if it's a prime or not
MAX=arr[i];
}
}
What happens is this: after each iteration, MAX holds the largest prime seen so far.
Step-5: After the loop has traversed the whole array, MAX holds the largest prime number in the array. Print this value of MAX. To get the positions where MAX occurs, iterate over the whole array again, check for the values that match MAX, and print their positions:
for(i=0;i<n;i++)
{
if(arr[i]==MAX)
cout<<i+1<<" ";
}
I ran this code in Dev C++ 5.11 and the compilation time was 0.72s.

using vector as function output

I'm trying to write code that shows all the numbers with the following characteristics:
the number itself is a prime number.
for each digit removed from the right the remaining number should still be a prime number.
Considering the number 293 for example: 293 itself is a prime number if we delete the digit on the right we have 29 which is still a prime number, and if we delete the right digit again we have 2 which is still prime.
I'm trying to write a code that gets the integer n<=8 from the user and shows all the n-digit numbers that have the characteristics stated above. My algorithm is to write a recursive function (show) that returns the vector v.
If n=1 then it just shows the numbers 2-3-5-7... if n!=1 it should call show(n-1) and multiply all the generated numbers by 10 and add them up with odd numbers... then it should check if the new number is prime. If so it should be added to the vector.
My problem is the code only works for n=1. Here is my code:
#include <iostream>
#include <cmath>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
bool isPrime(int a)
{
int i, p = 0;
if (a == 1)
return false;
else
{
for (i = a - 1; i > sqrt(a); i--)
if (a % i == 0)
p++;
if (p != 0)
return false;
else
return true;
}
}
vector<int> show(int n)
{
vector<int> v;
int i, j;
if (n == 1)
{
v.push_back(2);
v.push_back(3);
v.push_back(5);
v.push_back(7);
}
else
{
show(n - 1);
if (n != 1)
for (i = 0; i < v.size(); i++)
{
for (j = 1; j <= 9; j += 2)
if (isPrime((v.at(i) * 10) + j))
v.at(i) = (v.at(i) * 10) + j;
}
}
return v;
}
int main()
{
int n, s = 0, i;
cin >> n;
show(n);
for (i = 0; i < show(n).size(); i++)
cout << show(n).at(i) << endl;
system("pause");
return 0;
}
In addition to the comments on the question, take a look at the show(n - 1) line. Shouldn't you be saving the return value: v = show(n - 1)?
Also, it would be better to pass the vector by reference; this way you avoid copying the vector's contents (in your case it doesn't have much impact, since the vector will not grow much, but imagine using large values of n).
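For illustration, a corrected version could look like the sketch below. Besides keeping the result of the recursive call, it appends each new prime to a fresh vector instead of overwriting v.at(i), and it uses a simpler divisor loop that also rejects squares of primes such as 9 and 25. It assumes n >= 1; is_prime is a stand-in for the question's isPrime.

```cpp
#include <vector>

// Trial division up to sqrt(a); stand-in for the question's isPrime.
bool is_prime(int a) {
    if (a < 2) return false;
    for (int d = 2; d <= a / d; ++d) {
        if (a % d == 0) return false;
    }
    return true;
}

// Returns all n-digit primes that stay prime as digits are removed from the right.
// Assumes n >= 1 (and n <= 8 so the values fit in an int).
std::vector<int> show(int n) {
    if (n == 1) return {2, 3, 5, 7};
    std::vector<int> shorter = show(n - 1);  // keep the result of the recursive call
    std::vector<int> v;
    for (int base : shorter) {
        for (int digit = 1; digit <= 9; digit += 2) {
            int candidate = base * 10 + digit;
            if (is_prime(candidate)) v.push_back(candidate);  // append, do not overwrite
        }
    }
    return v;
}
```

In main it is enough to call show(n) once, store the returned vector, and loop over that copy rather than calling show(n) again in every iteration.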

For a given number N, how do I find x, S.T product of (x and no. of factors to x) = N?

To find the factors of a number, I am using the function void primeFactors(int n):
# include <stdio.h>
# include <math.h>
# include <iostream>
# include <map>
using namespace std;
// A function to print all prime factors of a given number n
map<int,int> m;
void primeFactors(int n)
{
// Print the number of 2s that divide n
while (n%2 == 0)
{
printf("%d ", 2);
m[2] += 1;
n = n/2;
}
// n must be odd at this point. So we can skip one element (Note i = i +2)
for (int i = 3; i <= sqrt(n); i = i+2)
{
// While i divides n, print i and divide n
while (n%i == 0)
{
int k = i;
printf("%d ", i);
m[k] += 1;
n = n/i;
}
}
// This condition is to handle the case when n is a prime number
// greater than 2
if (n > 2)
{
m[n] += 1;
printf ("%d ", n);
}
cout << endl;
}
/* Driver program to test above function */
int main()
{
int n = 72;
primeFactors(n);
map<int,int>::iterator it;
int to = 1;
for(it = m.begin(); it != m.end(); ++it){
cout << it->first << " appeared " << it->second << " times "<< endl;
to *= (it->second+1);
}
cout << to << " total facts" << endl;
return 0;
}
You can check it here. Test case n = 72.
http://ideone.com/kaabO0
How do I solve the above problem using the above algorithm? (Can it be optimized further?) I have to consider large numbers as well.
What I want to do:
Take N = 864 as an example: we find X = 72, since 72 * 12 (no. of factors) = 864.
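One straightforward way to search for X, sketched here under the observation that X must be a divisor of N, is to enumerate the divisors of N and test each one. The names divisor_count and find_x are hypothetical, and plain trial division is used, so this is only practical for moderately large N; it also assumes x * divisor_count(x) fits in 64 bits.

```cpp
#include <cstdint>
#include <vector>

// Number of divisors of x, from its prime factorisation by trial division.
std::uint64_t divisor_count(std::uint64_t x) {
    std::uint64_t count = 1;
    for (std::uint64_t p = 2; p <= x / p; ++p) {
        std::uint64_t exponent = 0;
        while (x % p == 0) { x /= p; ++exponent; }
        count *= exponent + 1;
    }
    if (x > 1) count *= 2;  // one prime factor above the square root is left
    return count;
}

// Returns every x with x * divisor_count(x) == N.
std::vector<std::uint64_t> find_x(std::uint64_t N) {
    std::vector<std::uint64_t> result;
    for (std::uint64_t d = 1; d <= N / d; ++d) {
        if (N % d != 0) continue;
        std::uint64_t candidates[2] = {d, N / d};  // both divisors of the pair
        for (int k = 0; k < 2; ++k) {
            std::uint64_t x = candidates[k];
            if (k == 1 && x == d) continue;  // perfect square: do not test twice
            if (x * divisor_count(x) == N) result.push_back(x);
        }
    }
    return result;
}
```

For N = 864 the divisor pair (12, 72) is reached at d = 12, and 72 passes the check because 72 = 2^3 * 3^2 has (3+1)*(2+1) = 12 divisors.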
There are prime-factorization algorithms for big numbers, but they are not often used in programming contests.
I will explain 3 methods, and you can implement your solution using any of them.
Once you have implemented one, I suggest you try to solve this problem.
Note: In this answer, I use integer Q for the number of queries.
O(sqrt(N)) solution per query
Your algorithm's time complexity is O(n^0.5) per query, so O(Q * sqrt(N)) in total.
You are implementing it with int (32-bit), though; for larger numbers you can use long long integers.
Here's my implementation: http://ideone.com/gkGkkP
O(sqrt(maxn) * log(log(maxn)) + Q * sqrt(maxn) / log(maxn)) algorithm
You can reduce the number of loop iterations, because composite values of i are not necessary.
So you can loop over prime numbers only.
Algorithm:
Calculate all prime numbers <= sqrt(maxn) with the sieve of Eratosthenes. The time complexity is O(sqrt(maxn) * log(log(maxn))).
In a query, loop over i (i <= sqrt(n) and i is a prime number). The number of valid integers i is about sqrt(n) / log(n) by the prime number theorem, so the time complexity is O(sqrt(n) / log(n)) per query.
More efficient algorithm
There are more efficient algorithms, but they are not often used in programming contests.
If you look up "integer factorization algorithm" on the internet or Wikipedia, you can find algorithms like Pollard's rho or the general number field sieve.
Well,I will show you the code.
# include <stdio.h>
# include <iostream>
# include <map>
using namespace std;
const long MAX_NUM = 2000000;
long prime[MAX_NUM] = {0}, primeCount = 0;
bool isNotPrime[MAX_NUM] = {1, 1}; // yes. can be improve, but it is useless when sieveOfEratosthenes is end
void sieveOfEratosthenes() {
//#see https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
for (long i = 2; i < MAX_NUM; i++) { // it must be i++
if (!isNotPrime[i]) //if it is prime,put it into prime[]
prime[primeCount++] = i;
for (long j = 0; j < primeCount && i * prime[j] < MAX_NUM; j++) { /*foreach prime[]*/
// if(i * prime[j] >= MAX_NUM){ // if large than MAX_NUM break
// break;
// }
isNotPrime[i * prime[j]] = 1; // set i * prime[j] not a prime.as you see, i * prime[j]
if (!(i % prime[j])) //if this prime the min factor of i,than break.
// and it is the answer why not i+=( (i & 1) ? 2 : 1).
// hint : when we judge 2,prime[]={2},we set 2*2=4 not prime
// when we judge 3,prime[]={2,3},we set 3*2=6 3*3=9 not prime
// when we judge 4,prime[]={2,3},we set 4*2=8 not prime (why not set 4*3=12?)
// when we judge 5,prime[]={2,3,5},we set 5*2=10 5*3=15 5*5=25 not prime
// when we judge 6,prime[]={2,3,5},we set 6*2=12 not prime,than we can stop
// why not put 6*3=18 6*5=30 not prime? 18=9*2 30=15*2.
// this code can make each num be set only once,I hope it can help you to understand
// this is difficult to understand but very useful.
break;
}
}
}
void primeFactors(long n)
{
map<int,int> m;
map<int,int>::iterator it;
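for (int i = 0; i < primeCount && prime[i] <= n / prime[i]; i++) // test only primes up to sqrt(n), like 2 3 5 7... it must be i++
{
while (n%prime[i] == 0)
{
cout<<prime[i]<<" ";
m[prime[i]] += 1;
n = n/prime[i];
}
}
if (n > 1) // what is left is a single prime factor larger than the primes tested
{
cout<<n<<" ";
m[n] += 1;
}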
cout<<endl;
int to = 1;
for(it = m.begin(); it != m.end(); ++it){
cout << it->first << " appeared " << it->second << " times "<< endl;
to *= (it->second+1);
}
cout << to << " total facts" << endl;
}
int main()
{
//first init for calculate all prime numbers,for example we define MAX_NUM = 2000000
// the result of prime[] should be stored, you primeFactors will use it
sieveOfEratosthenes();
//second loop for i (i*i <= n and i is a prime number). n<=MAX_NUM
int n = 72;
primeFactors(n);
n = 864;
primeFactors(n);
return 0;
}
My best shot at performance without going overboard with special algorithms.
The sieve of Eratosthenes: the complexity of the code below is O(N*log(log(N))). The inner j loop starts from i*i instead of i, which skips the multiples already marked by smaller primes.
#include <vector>
using std::vector;
void erathostenes_sieve(size_t upToN, vector<size_t>& primes) {
primes.clear();
vector<bool> bitset(upToN+1, true); // if the bitset[i] is true, the i is prime
bitset[0]=bitset[1]=0;
// if i is 2, will jump to 3, otherwise will jump on odd numbers only
for(size_t i=2; i<=upToN; i+=( (i&1) ? 2 : 1)) {
if(bitset[i]) { // i is prime
primes.push_back(i);
// it is enough to start the next cycle from i*i, because all the
// other primality tests below it are already performed:
// e.g:
// - i*(i-1) was surely marked non-prime when we considered multiples of 2
// - i*(i-2) was tested at (i-2) if (i-2) was prime or earlier (if non-prime)
for(size_t j=i*i; j<=upToN; j+=i) { // <= so that upToN itself is also marked when composite
bitset[j]=false; // all multiples of the prime with value of i
// are marked non-prime, using **addition only**
}
}
}
}
Now factoring based on the primes (set in a sorted vector). Before this, let's examine the myth of sqrt being expensive but a large bunch of multiplications is not.
First of all, let us note that sqrt is not that expensive anymore: on older CPU-es (x86/32b) it used to be twice as expensive as a division (and a modulo operation is division), on newer architectures the CPU costs are equal. Since factorisation is all about % operations again and again, one may still consider sqrt now and then (e.g. if and when using it saves CPU time).
For example, consider the following code for N=65537 (which is the 6543rd prime), assuming primes has 10000 entries:
size_t limit=std::sqrt(N);
size_t largestPrimeGoodForN=std::distance(
primes.begin(),
std::upper_bound(primes.begin(), primes.end(), limit) // binary search
);
// go descendingly from limit!!!
for(int i=largestPrimeGoodForN; i>=0; i--) {
// factorisation loop
}
We have:
1 sqrt (equal 1 modulo),
1 search in 10000 entries - at max 14 steps, each involving 1 comparison, 1 right-shift division-by-2 and 1 increment/decrement - so let's say a cost equal with 14-20 multiplications (if ever)
1 difference because of std::distance.
So, maximal cost - 1 div and 20 muls? I'm generous.
On the other side:
for(int i=0; primes[i]*primes[i]<N; i++) {
// factorisation code
}
Looks much simpler, but as N=65537 is prime, we'll go through the whole cycle up to i=54 (where primes[54]=257 is the first prime whose square exceeds N, which ends the cycle): a total of 55 multiplications.
Try this with a higher prime number and I guarantee you that 1 sqrt + 1 binary search is a better use of CPU cycles than all the multiplications along the way in the simpler form of the cycle, which is touted as the better-performing solution.
So, back to factorisation code:
#include <algorithm>
#include <cmath>
#include <unordered_map>
void factor(size_t N, std::unordered_map<size_t, size_t>& factorsWithMultiplicity) {
factorsWithMultiplicity.clear();
while( !(N & 1) ) { // while N is even, cheaper test than a '% 2'
factorsWithMultiplicity[2]++;
N = N >> 1; // div by 2 of an unsigned number, cheaper than the actual /2
}
// now that we know N is odd, we start using the primes from the sieve
size_t limit=std::sqrt(N); // sqrt is no longer *that* expensive,
vector<size_t> primes;
// fill the primes up to the limit. Let's be generous, add 1 to it
erathostenes_sieve(limit+1, primes);
// we know that the largest prime worth checking is
// the last element of the primes.
for(
size_t largestPrimeIndexGoodForN=primes.size()-1;
largestPrimeIndexGoodForN<primes.size(); // size_t is unsigned, so after zero will underflow
// we'll handle the cycle index inside
) {
bool wasFactor=false;
size_t factorToTest=primes[largestPrimeIndexGoodForN];
while( !( N % factorToTest) ) {
wasFactor=true;// found one
factorsWithMultiplicity[factorToTest]++;
N /= factorToTest;
}
if(1==N) { // done
break;
}
if(wasFactor) { // time to resynchronize the index
limit=std::sqrt(N);
largestPrimeIndexGoodForN=std::distance(
primes.begin(),
std::upper_bound(primes.begin(), primes.end(), limit)
);
}
else { // no luck this time
largestPrimeIndexGoodForN--;
}
} // done the factoring cycle
if(N>1) { // N was prime to begin with
factorsWithMultiplicity[N]++;
}
}

Count subarrays divisible by K

Given a sequence of n positive integers we need to count consecutive sub-sequences whose sum is divisible by k.
Constraints : N is up to 10^6 and each element up to 10^9 and K is up to 100
EXAMPLE : Let N=5 and K=3 and array be 1 2 3 4 1
Here answer is 4
Explanation : there exists, 4 sub-sequences whose sum is divisible by 3, they are
3
1 2
1 2 3
2 3 4
My Attempt :
long long int count=0;
for(int i=0;i<n;i++){
long long int sum=0;
for(int j=i;j<n;j++)
{
sum=sum+arr[j];
if(sum%k==0)
{
count++;
}
}
}
But obviously it's a poor approach. Can there be a better approach for this question? Please help.
Complete Question: https://www.hackerrank.com/contests/w6/challenges/consecutive-subsequences
Here is a fast O(n + k) solution:
1) Let's compute the prefix sums pref[i] (for 0 <= i < n).
2) Now we can compute count[i], the number of prefixes with sum i modulo k (0 <= i < k).
This can be done by iterating over all the prefixes and doing count[pref[i] % k]++.
Initially, count[0] = 1 (the empty prefix has sum 0) and count[i] = 0 for i != 0.
3) The answer is the sum of count[i] * (count[i] - 1) / 2 over all i.
4) It is better to compute the prefix sums modulo k to avoid overflow.
Why does it work? Let's take a closer look at a subarray divisible by k. Say it starts at position L and ends at position R. It is divisible by k if and only if pref[L - 1] == pref[R] (modulo k), because their difference is zero modulo k (by the definition of divisibility). So for each fixed remainder, we can pick any two prefixes with this prefix sum modulo k (and there are exactly count[i] * (count[i] - 1) / 2 ways to do it).
Here is my code:
long long get_count(const vector<int>& vec, int k) {
//Initialize count array.
vector<int> cnt_mod(k, 0);
cnt_mod[0] = 1;
int pref_sum = 0;
//Iterate over the input sequence.
for (int elem : vec) {
pref_sum += elem;
pref_sum %= k;
cnt_mod[pref_sum]++;
}
//Compute the answer.
long long res = 0;
for (int mod = 0; mod < k; mod++)
res += (long long)cnt_mod[mod] * (cnt_mod[mod] - 1) / 2;
return res;
}
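As a sanity check, the prefix idea can be compared against the O(n^2) loop from the question on small inputs. The sketch below is a self-contained variant with made-up names (count_brute, count_by_prefix); it adds up the matching earlier prefixes on the fly, which gives the same total as the count[i] * (count[i] - 1) / 2 formula. It assumes positive elements and k >= 1.

```cpp
#include <cstddef>
#include <vector>

// Reference answer: tries every (start, end) pair, O(n^2).
long long count_brute(const std::vector<int>& v, int k) {
    long long count = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        long long sum = 0;
        for (std::size_t j = i; j < v.size(); ++j) {
            sum += v[j];
            if (sum % k == 0) ++count;
        }
    }
    return count;
}

// Prefix-remainder counting, O(n + k).
long long count_by_prefix(const std::vector<int>& v, int k) {
    std::vector<long long> seen(k, 0);  // seen[r] = prefixes so far with remainder r
    seen[0] = 1;                        // the empty prefix
    int prefix = 0;
    long long result = 0;
    for (int x : v) {
        prefix = (prefix + x % k) % k;
        result += seen[prefix];  // each earlier equal remainder closes one subarray
        ++seen[prefix];
    }
    return result;
}
```

For the example 1 2 3 4 1 with K = 3 the prefix remainders are 0 1 0 0 1 2, so remainder 0 contributes 3 pairs and remainder 1 contributes 1, giving 4 from both functions.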
This should make your calculations easier:
//Now we will move all numbers to [0..K-1]
long long int count=0;
for(int i=0;i<n;i++){
arr[i] = arr[i]%K;
}
//Now we will calculate the count of all shortest subsequences.
long long int sum=0;
int first(0);
std::vector<int> beg;
std::vector<int> end;
for(int i=0;i<n;i++){
if (arr[i] == 0)
{
count++;
continue;
}
sum += arr[i];
if (sum == K)
{
beg.push_back(first);
end.push_back(i);
count++;
}
else
{
while (sum > K)
{
sum -= arr[first];
first++;
}
if (sum == K)
{
beg.push_back(first);
end.push_back(i);
count++;
}
}
}
//This way we found all short subsequences. Now we need to count all subsequences that consist of several short subsequences.
int party(0);
for (size_t i = 0; i + 1 < beg.size(); ++i) // i + 1 < size avoids unsigned underflow when beg is empty
{
if (end[i] == beg[i+1])
{
count += party + 1;
party++;
}
else
{
party = 0;
}
}
So, with a maximum array size of 10^6 and a maximum remainder of 99, you will not get an overflow even if you need to sum all the numbers in a plain int32.
And the time spent will be around O(n+n).