I'm trying to optimize my code to calculate the nth power of a matrix.
Before, I would just call multiplySquare n times, but that was far too slow. The problem is that it builds fine, but when I run it, it fails with exit value 1. I believe my algorithm is right, so what's causing this?
[EDIT] Added recursion termination condition but still, I get the same error.
[EDIT AGAIN] I rewrote the recursion part and now it seems to work, but only for certain values of n. I'll have to play around with it more. Any help would be appreciated.
void multiplySquare(long long A[2][2], long long B[2][2]){
    long long result[2][2];
    for (int i = 0; i < 2; i++){
        for (int j = 0; j < 2; j++){
            result[i][j] = 0;
            for (int k = 0; k < 2; k++){
                result[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    for (int i=0; i<2; i++){
        for (int j=0; j<2; j++){
            A[i][j] = result[i][j];
        }
    }
}
void power(long long A[2][2], long long B[2][2], long long n){
    if(n/2 != 0){
        power(A, B, n/2);
    }
    if(n%2 != 0){
        multiplySquare(A, B);
    }
}
The algorithm to compute the N-th power of a number x efficiently is:
If N is zero, return 1.
If N is 1, return x.
Compute (N/2)-th power. y = x^(N/2)
If N is even, return y*y
If N is odd, return x*y*y
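As a minimal sketch (the name ipow is hypothetical and overflow is not checked), the five steps above translate almost line for line into a recursive C++ function for a scalar x:

```cpp
// Exponentiation by squaring for a scalar, following the steps above.
// Uses O(log N) multiplications; overflow is NOT checked.
long long ipow(long long x, unsigned int n) {
    if (n == 0) return 1;           // x^0 = 1
    if (n == 1) return x;           // x^1 = x
    long long y = ipow(x, n / 2);   // y = x^(N/2), integer division
    return (n % 2 == 0) ? y * y     // N even: y*y
                        : x * y * y; // N odd: x*y*y
}
```

For example, ipow(3, 5) computes ipow(3, 2) = 9 once and returns 3 * 9 * 9 = 243.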
If you translate that logic to your case, you will need something along the lines of:
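// Assuming that the result is returned in B.
// makeIdentity and assign are small helpers you still need to write.
void power(long long A[2][2], long long B[2][2], long long n)
{
    if ( n == 0 )
    {
        makeIdentity(B); // B = identity matrix
        return;
    }
    if ( n == 1 )
    {
        assign(A, B); // Make B same as A.
        return;
    }
    power(A, B, n/2);      // B = A^(n/2)
    multiplySquare(B, B);  // B = B * B
    if(n % 2 != 0)
    {
        multiplySquare(B, A); // one extra factor of A for odd n
    }
}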
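To try the answer's version end to end, here is a self-contained sketch. makeIdentity and assign are the helper names the answer leaves undefined, so their bodies below are assumptions; multiplySquare is the question's function (compacted), and power follows the answer's logic (result in B, A unchanged).

```cpp
// A = A * B for 2x2 matrices; safe when A and B are the same array,
// because the product is built in a local buffer before copying back.
void multiplySquare(long long A[2][2], long long B[2][2]) {
    long long result[2][2];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            result[i][j] = 0;
            for (int k = 0; k < 2; k++) result[i][j] += A[i][k] * B[k][j];
        }
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) A[i][j] = result[i][j];
}

// Hypothetical helper: M = identity.
void makeIdentity(long long M[2][2]) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) M[i][j] = (i == j) ? 1 : 0;
}

// Hypothetical helper: dst = src (argument order as in the answer).
void assign(long long src[2][2], long long dst[2][2]) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) dst[i][j] = src[i][j];
}

// B = A^n by exponentiation by squaring; A is left unchanged.
void power(long long A[2][2], long long B[2][2], long long n) {
    if (n == 0) { makeIdentity(B); return; }
    if (n == 1) { assign(A, B); return; }
    power(A, B, n / 2);                   // B = A^(n/2)
    multiplySquare(B, B);                 // B = A^(2*(n/2))
    if (n % 2 != 0) multiplySquare(B, A); // odd n: one more factor of A
}
```

With the Fibonacci matrix F = {{1, 1}, {1, 0}}, power(F, R, 10) should leave {{89, 55}, {55, 34}} in R, since F^n = {{F(n+1), F(n)}, {F(n), F(n-1)}}.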
I'm trying to optimize my code to calculate the nth power of a matrix.
Since your goal is optimization, it may be worth noting that the n-th power of a diagonal matrix is trivial: it is the diagonal matrix whose diagonal entries are the n-th powers of the original ones.
So, if your matrix is diagonalizable, you can first diagonalize it. One way to do this is to find the eigenvalues and eigenvectors of your initial matrix A and use the following relationship (in general with floating-point arithmetic, since the eigenvalues need not be integers):
$A = P D P^{-1}$
where $P$ is a matrix whose columns are the eigenvectors of $A$, $P^{-1}$ is its inverse, and $D$ is a diagonal matrix containing the eigenvalues.
Then: $A^n = P D^n P^{-1}$
The above equation:
Takes A to a place where raising to the n-th power is trivial.
Calculates the n-th power.
Returns A back to the original place.
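A small worked example (a matrix chosen here for illustration, not taken from the question) shows the three steps on a symmetric 2x2 matrix with eigenvalues 3 and 1, eigenvectors (1, 1) and (1, -1):

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},\quad
P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad
D = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix},\quad
P^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

A^n = P D^n P^{-1}
    = \frac{1}{2}\begin{pmatrix} 3^n + 1 & 3^n - 1 \\ 3^n - 1 & 3^n + 1 \end{pmatrix}
```

For n = 2 this gives {{5, 4}, {4, 5}}, which matches A * A computed directly.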
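Your snippet does not seem to do what you intend. I suspect you mean something like this (A must start as the identity matrix and B holds the base; on return A = B^n):
void power(long long A[2][2], long long B[2][2], long long n){
    if (n == 0) {
        return; // B^0 is the identity, which A already holds
    }
    if (n == 1) {
        multiplySquare(A, B); // A = I * B = B
    }
    else if(n % 2 == 0) {
        power(A, B, n / 2);   // A = B^(n/2)
        multiplySquare(A, A); // A = B^n
    }
    else {
        power(A, B, (n - 1) / 2); // A = B^((n-1)/2)
        multiplySquare(A, A);     // A = B^(n-1)
        multiplySquare(A, B);     // A = B^n
    }
}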
Problem statement: Given a row of n coins of some denominations (possibly repeating, in random order) and a number k, a single player plays the following game: the player may pick 0 to k contiguous coins but must then leave the next coin unpicked, continuing in this manner along the row. Find the highest sum of coins he/she can collect. (In the input below, k is called x.)
Input:
First line contains 2 space-separated integers n and x respectively, which denote
n - Size of the array
x - Window size
Output:
A single integer denoting the max sum the player can obtain.
Working solution link: Ideone
// arr[] (input coins, 1-indexed) and dp[] are declared outside this function.
long long solve(int n, int x) {
    if (n == 0) return 0;
    long long total = accumulate(arr + 1, arr + n + 1, 0ll);
    if (x >= n) return total;
    multiset<long long> dp_x;
    for (int i = 1; i <= x + 1; i++) {
        dp[i] = arr[i];
        dp_x.insert(dp[i]);
    }
    for (int i = x + 2; i <= n; i++) {
        dp[i] = arr[i] + *dp_x.begin();
        dp_x.erase(dp_x.find(dp[i - x - 1]));
        dp_x.insert(dp[i]);
    }
    long long ans = total;
    for (int i = n - x; i <= n; i++) {
        ans = min(ans, dp[i]);
    }
    return total - ans;
}
Can someone explain how this code works, i.e., how lines 12-26 in the Ideone solution produce the correct answer? I have dry-run the code with pen and paper and found that it gives the correct answer, but I couldn't figure out the algorithm used (if any). Is there a known technique or algorithm at work here?
I am new to DP, so if someone can point me to a tutorial (YouTube video, etc.) on this kind of problem, that would be great too. Thank you.
The idea seems to be to transform the problem: choose the coins to leave behind so that every run of x+1 consecutive coins contains at least one left-behind coin, and minimize the total value left behind. The answer to the original problem is then [sum of all values] - [answer to the new problem].
Now we can talk about dynamic programming. Define the recurrence f(i) as "the minimum total value left behind among the 1st to i-th coins, given that the i-th coin is left behind".
f(i) = a(i) : if (i<=x+1)
f(i) = a(i) + min(f(i-1),f(i-2),...,f(i-x-1)) : otherwise
where a(i) is the i-th coin value
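To make the recurrence concrete, here is a direct O(n * x) transcription as a sketch. It is 0-indexed and takes the coins as a std::vector instead of the global arr[]; the name maxCoinsNaive is hypothetical. It should return the same value as solve(n, x), only more slowly:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// f[i] = minimum value left behind among coins 0..i, given coin i is left.
long long maxCoinsNaive(const std::vector<long long>& a, int x) {
    const int n = static_cast<int>(a.size());
    const long long total = std::accumulate(a.begin(), a.end(), 0LL);
    if (x >= n) return total;                 // every coin can be taken
    std::vector<long long> f(n);
    for (int i = 0; i < n; ++i) {
        if (i <= x) { f[i] = a[i]; continue; } // f(i) = a(i)
        long long best = f[i - 1];             // min over f(i-x-1) .. f(i-1)
        for (int j = i - x - 1; j < i - 1; ++j) best = std::min(best, f[j]);
        f[i] = a[i] + best;
    }
    // The last left-behind coin must be one of the final x + 1 coins.
    const long long left = *std::min_element(f.begin() + (n - x - 1), f.end());
    return total - left;
}
```

For coins {1, 2, 3, 4, 5} and x = 2, leaving only the coin worth 3 is optimal, so the result is 15 - 3 = 12.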
I added some comments line by line.
// NOTE f() is dp[] and a() is arr[]
long long solve(int n, int x) {
    if (n == 0) return 0;
    long long total = accumulate(arr + 1, arr + n + 1, 0ll); // get the sum
    if (x >= n) return total;
    multiset<long long> dp_x; // Used as a min-heap that can also erase any value
    for (int i = 1; i <= x + 1; i++) { // For 1 to (x+1)th,
        dp[i] = arr[i]; // f(i) = a(i)
        dp_x.insert(dp[i]); // Push the value to the heap
    }
    for (int i = x + 2; i <= n; i++) { // For the rest,
        dp[i] = arr[i] + *dp_x.begin(); // f(i) = a(i) + min(...)
        dp_x.erase(dp_x.find(dp[i - x - 1])); // Erase the oldest one from the heap
        dp_x.insert(dp[i]); // Push the value to the heap, so it keeps the latest x+1 elements
    }
    long long ans = total;
    for (int i = n - x; i <= n; i++) { // Minimum of dp[] over the last x+1 coins (candidate answers)
        ans = min(ans, dp[i]);
    }
    return total - ans;
}
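Note that the multiset is used as a min-heap whose smallest element is *begin(). Unlike std::priority_queue, however, it can also erase an arbitrary value (the one sliding out of the window), and since it never holds more than x+1 elements, find, erase and insert each take O(log x) time. So the overall time complexity is O(n log x).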
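If O(n log x) is still too slow, the multiset can be swapped for a monotonic deque (a different technique from the one in the posted solution), which yields the sliding-window minimum in amortized O(1) per step and O(n) overall. A sketch under the same assumptions as before (0-indexed std::vector input, hypothetical name maxCoins):

```cpp
#include <algorithm>
#include <deque>
#include <numeric>
#include <vector>

long long maxCoins(const std::vector<long long>& a, int x) {
    const int n = static_cast<int>(a.size());
    const long long total = std::accumulate(a.begin(), a.end(), 0LL);
    if (x >= n) return total;
    std::vector<long long> f(n);
    // Indices into f whose f values increase from front to back;
    // the front is always the minimum of the current window.
    std::deque<int> window;
    for (int i = 0; i < n; ++i) {
        // Drop indices that slid out of the window [i - x - 1, i - 1].
        while (!window.empty() && window.front() < i - x - 1) window.pop_front();
        f[i] = (i <= x) ? a[i] : a[i] + f[window.front()];
        // Entries not smaller than f[i] can never be a window minimum again.
        while (!window.empty() && f[window.back()] >= f[i]) window.pop_back();
        window.push_back(i);
    }
    const long long left = *std::min_element(f.begin() + (n - x - 1), f.end());
    return total - left;
}
```

Each index is pushed and popped at most once, which is where the linear bound comes from.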
I'm trying to solve a programming problem where I have to output the number of positive integer solutions of the inequality x² + y² < n, where n is given by the user. I've already written code that seems to work, but not as fast as I'd like. Is there any way to speed it up?
My current code:
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    long long n, i, r, k, p, a;
    cin >> k;
    while (k--)
    {
        r = 0;
        cin >> n;
        p = sqrt(n);
        for (i = 1; i <= p; i++)
        {
            a = sqrt(n - (i * i));
            r += a;
            if ((((i * i) + (a * a)) == n) && (a > 0))
            {
                r--;
            }
        }
        cout << r << "\n";
    }
    return 0;
}
Edit:
This is a solution for this task.
The task in English:
Find the number of natural solutions (x≥1, y≥1) of the inequality x²+y² < n, where 0 < n < 2147483647. For example, for n=10 there are 4 solutions: (1,1), (1,2), (2,1), (2,2).
Input
The first line of input contains the number of test cases k. Each of the next k lines contains one value of n.
Output
For each test case, output the number of natural solutions of the inequality on a separate line.
Example
Input:
2
10
11
Output:
4
6
Your solution already seems fast. The main way to reduce the running time is to remove the call to sqrt inside the loop. This is possible because a = sqrt(n - (i * i)) changes little from one iteration to the next and never increases, so it can be decremented step by step instead of recomputed.
Here is the code:
r = 0;
p = sqrt(n);
if ((p*p) == n) p--;          // now p*p < n
a = p;
for (long long i = 1; i <= p; i++)
{
    while ((n-i*i) <= a*a) {  // shrink a until a*a < n - i*i
        --a;
    }
    r += a;                   // y = 1..a all satisfy i*i + y*y < n
}
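Wrapped into a self-contained function, the same idea can be checked against the sample values. The name countSolutions is hypothetical, and the two while loops that adjust p are an extra guard against floating-point rounding in sqrt, not part of the answer:

```cpp
#include <cmath>

// Counts pairs (x, y) with x >= 1, y >= 1 and x*x + y*y < n.
long long countSolutions(long long n) {
    long long p = static_cast<long long>(std::sqrt(static_cast<double>(n)));
    // Make p the largest value with p*p < n, whatever sqrt rounded to.
    while (p > 0 && p * p >= n) --p;
    while ((p + 1) * (p + 1) < n) ++p;
    long long a = p;  // a only decreases, so the inner loop is amortized O(sqrt(n))
    long long r = 0;
    for (long long i = 1; i <= p; ++i) {
        while (n - i * i <= a * a) --a;  // afterwards a*a < n - i*i
        r += a;                          // y = 1..a are valid for this x = i
    }
    return r;
}
```

For n = 10 this counts 2 + 2 + 0 = 4, and for n = 11 it counts 3 + 2 + 1 = 6, matching the expected output.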
I use two methods to calculate the binomial coefficient.
One is:
int fac(int n) {
    if ( n < 2 ) return 1; // return 1 when n=0,1
    int ret = 1;
    for(int i=2; i <= n; ++i)
        ret *= i; // calculate factorial
    return ret;
}
int choose_fac(int n, int k) {
    return fac(n)/fac(k)/fac(n-k);
}
The other one is:
int choose_dp(int n, int k) {
    int C[n+1][k+1]; // variable-length array: a GCC/Clang extension, not standard C++
    int i, j;
    for (i = 0; i <= n; i++) {
        for (j = 0; j <= min(i, k); j++) {
            if (j == 0 || j == i) C[i][j] = 1;
            else C[i][j] = C[i-1][j-1] + C[i-1][j];
        }
    }
    return C[n][k];
}
When I run both on (15, 5), the second one gives the right answer while the first one gives 4. I know that in choose_fac the int goes out of range when calculating 15!, but if that is the reason, why doesn't choose_dp also return a wrong answer, given that both functions use int?
The first one overflows an int.
When you call fac(15) to compute choose_fac(15, 5), the value the function should compute, 1,307,674,368,000, greatly exceeds the range of a 32-bit int (at most 2,147,483,647). The final result, 15 choose 5 = 3003, would fit in an int because the large factorials cancel in the division, but the overflow in the intermediate result (undefined behaviour for signed int in C++) spoils the computation before it gets there.
The second function, which uses dynamic programming, is free from this problem because it never computes factorials: every value it stores is itself a binomial coefficient, none larger than C(15, 5) = 3003 in this case. This method of computing binomial coefficients is based on Pascal's triangle, i.e. C(n, k) = C(n-1, k-1) + C(n-1, k).
The expression fac(n)/fac(k)/fac(n-k) groups left to right, as ((fac(n)) / fac(k)) / fac(n-k), so fac(n) has to be computed in full before anything is divided, and fac(15) already overflows.
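If you want a direct formula without factorials, one option (a different technique from both functions in the question) is the multiplicative formula: after step i the running value equals C(n - k + i, i), so each division is exact. A sketch, with the hypothetical name choose_mul:

```cpp
#include <algorithm>

// C(n, k) via the multiplicative formula; no factorial is ever formed.
long long choose_mul(int n, int k) {
    if (k < 0 || k > n) return 0;
    k = std::min(k, n - k);  // C(n, k) == C(n, n - k); fewer steps
    long long result = 1;
    for (int i = 1; i <= k; ++i) {
        // result == C(n - k + i - 1, i - 1) here; the division below is exact
        result = result * (n - k + i) / i;
    }
    return result;
}
```

For (15, 5) the running values are 11, 66, 286, 1001, 3003. It can still overflow for large n, because result * (n - k + i) is formed before dividing, but with long long it handles far larger inputs than the factorial version.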
I am writing a C++ program in which I need to find numbers with common factors in an array. I am currently doing it the naive way:
int commonFactors(int p, int q){
    int count = 0;
    if(q > p){
        for(int i = 2;i < q;i++){
            if((q%i==0)&&(p%i==0)){
                count++;
                break;
            }
        }
    }
    else if(p > q){
        for(int i = 2;i < p;i++){
            if((p%i==0)&&(q%i==0)){
                count++;
                break;
            }
        }
    }
    else{
        count = 1;
    }
    return count;
}
My code times out for larger inputs. Every element of the array is in the range 1 to 1000000. Any idea how to compute this efficiently?
I thought about checking only prime factors, but I am not sure what range to check.
If the sole question is "do these two have a common factor (other than one)", then one option would simply be to compute their greatest common divisor, and check if it is one. The GCD can be computed fairly efficiently (definitely faster than just counting all the way up to your numbers) using the Euclidean algorithm:
gcd(a, 0) = a
gcd(a, b) = gcd(b, a % b)
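A minimal iterative sketch of that recurrence, together with the common-factor test (both function names are illustrative; since C++17 the standard library also offers std::gcd in &lt;numeric&gt;). Note that for p == q == 1 it reports no common factor, whereas the question's commonFactors returns 1 for any p == q:

```cpp
// Euclid's algorithm: iterative form of gcd(a, 0) = a, gcd(a, b) = gcd(b, a % b).
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// True if p and q share a factor greater than 1.
bool haveCommonFactor(unsigned p, unsigned q) {
    return gcd(p, q) > 1;
}
```

The number of loop iterations grows only logarithmically with the inputs, so values up to 1000000 take a few dozen steps at most.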
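You can do it more efficiently by running the for loop only up to sqrt(m), where m is the smaller of p and q, as long as you also test the complementary divisor m / i and m itself; otherwise a common factor larger than sqrt(m), such as 7 for p = 7, q = 14, is missed.
That should already speed things up.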
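A sketch of that loop with the complementary-divisor check included (the name shareFactorSqrt is hypothetical). It is still O(sqrt(m)) per pair, so a GCD-based test remains faster:

```cpp
#include <algorithm>

// True if p and q (both >= 1) share a factor greater than 1.
// Only divisors i <= sqrt(m) of the smaller number m are enumerated;
// the partner divisor m / i covers the factors above sqrt(m).
bool shareFactorSqrt(int p, int q) {
    const int m = std::min(p, q), other = std::max(p, q);
    if (m > 1 && other % m == 0) return true;  // m itself is a common factor
    for (int i = 2; i * i <= m; ++i) {
        if (m % i == 0 && (other % i == 0 || other % (m / i) == 0)) return true;
    }
    return false;
}
```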
Consider two numbers: 9240 and 16170. Each number can be written as a product of a (few) prime numbers:
9240 = 2*2*2*3*5*7*11
16170 = 2*3*5*7*7*11
From the example above it should be clear that the possible common factors are exactly the numbers you can form from the primes the two factorizations share. In this case the shared primes are 2, 3, 5, 7 and 11 (each once), which produce 2^5 - 1 = 31 combinations, i.e. 31 common factors greater than 1.
So your code could do the following steps (I'm not going to write the C++ code for you as you should be able to do so easily yourself):
Split each number into its prime factors using integer factorization
Find the primes that are present in both lists (don't forget that a prime may appear more than once in both lists; it then counts as many times as it appears in both, e.g. twice)
Find all the distinct numbers you can create by combining the resulting primes
For the last part of this you can see Dynamic programming for ideas on how to improve its performance significantly compared to a naïve approach.
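A sketch of these steps, with one shortcut: instead of enumerating the products in the last step, it counts them with the divisor-count formula (each shared prime with multiplicity e contributes a factor 1 + e). factorize and countCommonFactors are illustrative names, and the result excludes the trivial factor 1:

```cpp
#include <algorithm>
#include <map>

// Trial-division factorization of m >= 1: prime -> exponent.
std::map<int, int> factorize(int m) {
    std::map<int, int> f;
    for (int d = 2; d * d <= m; ++d)
        while (m % d == 0) { ++f[d]; m /= d; }
    if (m > 1) ++f[m];  // leftover prime larger than sqrt of the original m
    return f;
}

// Number of common factors greater than 1 of p and q.
long long countCommonFactors(int p, int q) {
    const std::map<int, int> fp = factorize(p), fq = factorize(q);
    long long count = 1;
    for (std::map<int, int>::const_iterator it = fp.begin(); it != fp.end(); ++it) {
        std::map<int, int>::const_iterator jt = fq.find(it->first);
        if (jt != fq.end())  // shared prime: use the smaller multiplicity
            count *= 1 + std::min(it->second, jt->second);
    }
    return count - 1;  // exclude the trivial factor 1
}
```

For 9240 and 16170 the shared primes 2, 3, 5, 7 and 11 each have multiplicity 1, giving 2^5 - 1 = 31.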
First some mathematics: say A and B are two positive integers, and let C = gcd(A, B) be the greatest common divisor of A and B. Then if M divides both A and B, M divides C.
So if you only want to know whether A and B have a common divisor greater than 1, you just have to check whether C is greater than 1; if you want to know all common divisors (or their number), you have to find all divisors of C.
Euclid's algorithm for the GCD of two numbers is based on the following property: say B < A and A = B * Q + R is the Euclidean division of A by B; then if R = 0, GCD(A, B) = B, else GCD(A, B) = GCD(B, R) (ref. Wikipedia).
Now some code:
/* Euclidean algorithm to find the Greatest Common Divisor
   Constraint (not checked here): p > 0 and q > 0 */
int gcd(int p, int q) {
    // ensures q <= p
    if (p < q) {
        int temp = p;
        p = q;
        q = temp;
    }
    int r = p % q;
    // if q divides p, gcd is q, else gcd(p, q) is gcd(q, r)
    return (r == 0) ? q : gcd(q, r);
}
bool sharedivisors(int p, int q) {
    int d = gcd(p, q);
    return d > 1;
}
// Counts all divisors of gcd(p, q), including 1 and the gcd itself.
int divisors(int p, int q) {
    int d = gcd(p, q);
    if (d == 1) {
        return 1;
    }
    int count = 0;
    // Divisors other than 1 and d come in pairs (i, d / i) with i <= sqrt(d).
    for (int i = 2; i * i <= d; i++) {
        if (d % i == 0) {
            int j = d / i;
            count += (j == i) ? 1 : 2; // a square root is counted once
        }
    }
    return count + 2; // and 1 and d
}
Counting candidate factors from 2 up to the larger input is brute force and takes long as soon as one of the inputs is large.
The number of common divisors can be obtained from the exponents of the prime factorization. It is easier to calculate their greatest common divisor first:
int g = gcd( p0, q0 );
/* .. */
int gcd( int p0, int q0 )
{
    while( q0 )
    {
        int swp = q0;
        q0 = p0 % q0;
        p0 = swp;
    }
    return p0;
}
and then count its divisors, either:
in the naive way (as in the question),
by repeatedly dividing the gcd by each divisor found, or
by prime factorization: if
$\gcd = p_0^{x_0} \cdot p_1^{x_1} \cdots p_N^{x_N}$
then
$\text{count} = (1 + x_0)(1 + x_1) \cdots (1 + x_N)$
Prime factorization requires a list of primes up to sqrt(gcd).
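Since every input is at most 1000000, one way to get the prime factors quickly (an assumption about how you might organize it, not part of the answer) is to precompute a smallest-prime-factor table once and reuse it for every pair. buildSpf, gcdInt and countCommonDivisors are hypothetical names; the count includes 1:

```cpp
#include <vector>

// spf[v] = smallest prime factor of v, for 2 <= v <= limit.
std::vector<int> buildSpf(int limit) {
    std::vector<int> spf(limit + 1, 0);
    for (int i = 2; i <= limit; ++i)
        if (spf[i] == 0)                   // i is prime
            for (int j = i; j <= limit; j += i)
                if (spf[j] == 0) spf[j] = i;
    return spf;
}

int gcdInt(int a, int b) {
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

// Common divisors of p and q (including 1) = (1 + x0)(1 + x1)...(1 + xN)
// for gcd(p, q) = p0^x0 * ... * pN^xN. Requires gcd(p, q) <= spf.size() - 1.
long long countCommonDivisors(int p, int q, const std::vector<int>& spf) {
    int g = gcdInt(p, q);
    long long count = 1;
    while (g > 1) {
        const int prime = spf[g];
        int x = 0;
        while (g % prime == 0) { g /= prime; ++x; }  // exponent of this prime
        count *= 1 + x;
    }
    return count;
}
```

Building the table is done once; after that each pair costs one GCD plus O(log g) table lookups.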
Suppose you have a linear equation in n variables. The goal is either to determine that no integer solution exists, or to find the integer solution vector with the smallest sum of entries.
In other words, let a·x = b, where x is the vector you want to find, a is a vector of coefficients, and b is a scalar constant. Find x such that the sum x_1 + ... + x_n is minimized and all x_i are non-negative integers, or determine that no such x exists. From now on, I will write |x| for the sum of the x_i.
What is an efficient way to solve this? I feel like this is similar to the Knapsack problem, but I'm not entirely sure.
My Solution
The way I tried to solve this was doing a Breadth-First Search on the space of vectors, where the breadth would be the sum of the vector entries.
At first I did this naively, starting from |x| = 0, but when n is even moderately large, and the solution is non-trivial, the number of vectors generated is enormous (n ^ |x| for each |x| you go through). Even worse, I was generating many duplicates. Even when I found a way to generate almost no duplicates, this way is too slow.
Next, I tried starting from a higher |x| right away, by putting a lower bound on the optimal |x|. I sorted a in decreasing order, then removed all a_i > b. Then a lower bound on |x| is b / a[0]. However, from this point I had difficulty quickly generating all the vectors with a given |x|. From here on, my code is mostly hacky.
In the code, b = distance, the coefficient vector a = clubs, and n = numClubs; the candidate solution vectors x are held in the valarrays a and b.
Here is what it looks like:
short getNumStrokes (unsigned short distance, unsigned short numClubs, vector<unsigned short> clubs) {
    if (distance == 0)
        return 0;
    numClubs = pruneClubs(distance, &clubs, numClubs);
    //printClubs (clubs, numClubs);
    valarray<unsigned short> a(numClubs), b(numClubs);
    queue<valarray<unsigned short> > Q;
    unsigned short floor = distance / clubs[0];
    if (numClubs > 1) {
        for (int i = 0; i < numClubs; i++) {
            a[i] = floor / numClubs;
        }
        Q.push (a);
    }
    // starter vectors
    for (int i = 0; i < numClubs; i++) {
        for (int j = 0; j < numClubs; j++) {
            if (i == j)
                a[j] = distance / clubs[0];
            else
                a[j] = 0;
        }
        if (dot_product (a, clubs) == distance)
            return count_strokes(a);
        // add N starter values
        Q.push (a);
    }
    bool sawZero = false;
    while (! Q.empty ()) {
        a = Q.front(); // take first element from Q
        Q.pop(); // apparently need to do this in 2 operations >_<
        sawZero = false;
        for (unsigned int i = 0; i < numClubs; i++) {
            // only add numbers past right-most non-zero digit
            //if (sawZero || (a[i] != 0 && (i + 1 == numClubs || a[i + 1] == 0))) {
            //    sawZero = true;
            b = a; // deep copy
            b[i] += 1;
            if (dot_product (b, clubs) == distance) {
                return count_strokes(b);
            } else if (dot_product (b, clubs) < distance) {
                //printValArray (b, clubs, numClubs);
                Q.push (b);
            }
            //}
        }
    }
    return -1;
}
EDIT: I'm using valarray because my compiler isn't C++11 compliant, so I can't use std::array. Other code suggestions are much appreciated.
Your problem is an equality constrained integer knapsack problem:
$\min\ |x| = \sum_i x_i$
$\text{s.t. } a \cdot x = b$
$x \text{ integer}$
If you have access to them, CPLEX or Gurobi can generally solve such problems quite easily.
Otherwise, consider some reductions of the constraint set
(e.g., http://www.optimization-online.org/DB_FILE/2002/11/561.ps)
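Because distance is an unsigned short (at most 65535), another option is the classic minimum-coin-change dynamic program, treating the club distances as coin values; this is a different technique from both the BFS in the question and the solvers named above. A sketch, written without C++11 features since the asker's compiler predates C++11. minStrokes is a hypothetical name, and it assumes positive club distances (non-positive values are skipped):

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Fewest terms x_1 + ... + x_n (x_i >= 0) with sum of clubs[j] * x_j == distance,
// or -1 if no such combination exists. O(numClubs * distance) time.
int minStrokes(int distance, const std::vector<int>& clubs) {
    const int INF = INT_MAX;
    // best[s] = fewest shots covering exactly s, or INF if s is unreachable
    std::vector<int> best(distance + 1, INF);
    best[0] = 0;
    for (int s = 1; s <= distance; ++s) {
        for (std::size_t j = 0; j < clubs.size(); ++j) {
            const int c = clubs[j];
            if (c > 0 && c <= s && best[s - c] != INF && best[s - c] + 1 < best[s])
                best[s] = best[s - c] + 1;  // last shot uses club j
        }
    }
    return best[distance] == INF ? -1 : best[distance];
}
```

Unlike the BFS, every partial distance is solved exactly once, so no duplicate vectors are generated.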