I have a number n that I have to split into k numbers such that all k numbers are distinct, the sum of the k numbers equals n, and k is as large as possible. For example, if n is 9 then the answer should be 1, 2, 6. If n is 15 then the answer should be 1, 2, 3, 4, 5.
This is what I've tried -
void findNum(int l, int k, vector<int>& s)
{
    if (k <= 2 * l) {
        s.push_back(k);
        return;
    }
    else if (l == 1) {
        s.push_back(l);
        findNum(l + 1, k - 1, s);
    }
    else if (l == 2) {
        s.push_back(l);
        findNum(l + 2, k - 2, s);
    }
    else {
        s.push_back(l);
        findNum(l + 1, k - l, s);
    }
}
Initially k = n and l = 1. The resulting numbers are stored in s. This solution does return the number n as a sum of distinct numbers, but it is not the optimal solution (k is not maximal). For example, the output for n = 15 is 1, 2, 4, 8. What changes should be made to get the correct result?
A greedy algorithm works for this problem. Just start summing up 1, 2, ..., m for as long as sum(1...m) <= n. As soon as the sum exceeds n, add the excess n - sum(1...m-1) to the last term m-1. The numbers from 1 up to m (if the sum hits n exactly) or up to m-1 with the excess added to the last term will be the answer.
e.g.
18
1+2+3+4+5 < 18
+6 = 21 > 18
So, answer: 1+2+3+4+(5+6-(21-18))
28
1+2+3+4+5+6+7 = 28
So, answer: 1+2+3+4+5+6+7
Pseudocode (constant time, complexity O(1)):
Find the smallest m such that m * (m + 1) > 2 * n
Number of terms = m - 1
Terms: 1, 2, 3, ..., m-2, (m-1 + m - (sum(1...m) - n))
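For concreteness, here is a small Python sketch of this greedy approach (the function and variable names are mine, not from the answer); it reproduces the examples above:

def max_distinct_split(n):
    # Take 1, 2, 3, ... while they still fit into n.
    terms, total, m = [], 0, 1
    while total + m <= n:
        terms.append(m)
        total += m
        m += 1
    # Fold the leftover into the largest term; only the largest term
    # is enlarged, so all terms stay distinct.
    terms[-1] += n - total
    return terms

print(max_distinct_split(9))   # [1, 2, 6]
print(max_distinct_split(15))  # [1, 2, 3, 4, 5]
print(max_distinct_split(18))  # [1, 2, 3, 4, 8]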
sum can be partitioned into k distinct terms from {1, ..., m} if min(k) <= sum <= max(k, m), with
min(k) = 1 + 2 + .. + k = (k*(k+1))/2
max(k,m) = m + (m-1) + .. + (m-k+1) = k*m - (k*(k-1))/2
So, you can use the following pseudo-code:
fn solve(n, k, sum) -> set or error
    s = new_set()
    for m from n down to 1:
        # will the problem still be solvable if we add m to s?
        if min(k-1) <= sum - m <= max(k-1, m-1) then
            s.add(m), sum -= m, k -= 1
    if sum = 0 and k = 0 then s else error()
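A direct Python transcription of this pseudo-code might look like the sketch below (min_sum/max_sum mirror the min/max formulas above, and returning None plays the role of error()):

def solve(n, k, total):
    def min_sum(k):        # 1 + 2 + ... + k
        return k * (k + 1) // 2
    def max_sum(k, m):     # m + (m-1) + ... + (m-k+1)
        return k * m - k * (k - 1) // 2
    s = set()
    for m in range(n, 0, -1):
        # will the problem still be solvable if we add m to s?
        if k > 0 and min_sum(k - 1) <= total - m <= max_sum(k - 1, m - 1):
            s.add(m)
            total -= m
            k -= 1
    return s if total == 0 and k == 0 else None

print(sorted(solve(10, 3, 15)))  # [1, 4, 10]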
I am reading about shell sort in Algorithms in C++ by Robert Sedgewick.
Here the outer loop that changes the increments leads to this compact shellsort implementation, which uses the increment sequence 1 4 13 40 121 364 1093 3280 9841 ...
template <class Item>
void shellsort(Item a[], int l, int r)
{
    int h;
    for (h = 1; h <= (r - l) / 9; h = 3 * h + 1);
    for (; h > 0; h = h / 3)
    {
        for (int i = l + h; i <= r; i++)
        {
            int j = i;
            Item v = a[i];
            while (j >= l + h && v < a[j - h])
            {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
    }
}
My question: on what basis is the author checking the condition h <= (r - l) / 9, and why is he dividing by 9?
The loop:
for (h = 1; h <= (r - l) / 9; h = 3 * h + 1);
calculates the initial value of h. This value must be smaller than the range it will be used in:
h <= (r - l)
Every time this condition passes, h gets updated to 3 * h + 1, which means that even though h is smaller than (r - l), the updated value might be larger. To prevent this, we could check whether the next value of h would surpass the largest index:
(h * 3) + 1 <= (r - l)
This will make sure h is smaller than the range of the array.
For example: say we have an array of size 42, which means indices go from 0 to 41. Using the condition as described above:
h = 1, is (3 * 1 + 1) <= (41 - 0) ? yes! -> update h to 4
h = 4, is (3 * 4 + 1) <= (41 - 0) ? yes! -> update h to 13
h = 13, is (3 * 13 + 1) <= (41 - 0) ? yes! -> update h to 40
h = 40, is (3 * 40 + 1) <= (41 - 0) ? no! => h will begin at 40
This means our initial h is 40. Because h is only marginally smaller than the range of the array, very little work will be done; the algorithm will only check the following:
Does array[40] need to be swapped with array[0]?
Does array[41] need to be swapped with array[1]?
This is a bit useless: the first iteration only performs two checks. A smaller initial value of h means more work will get done in the first iteration.
Using:
h <= (r - l) / 9
ensures that the initial value of h is sufficiently small to allow the first iteration to do useful work. As an extra advantage, it also looks cleaner than the previous condition.
You could replace 9 by any value greater than 3. Why greater than 3? To ensure (h * 3) + 1 <= (r - l) is still true!
But do remember to not make the initial h too small: Shell Sort is based on Insertion Sort, which only performs well on small or nearly sorted arrays. Personally, I would not exceed h <= (r - l) / 15.
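To see the effect of the divisor, here is a quick Python sketch of just the increment-selection loop (the helper name and the divisor parameter are my own, for illustration):

def initial_increment(length, divisor):
    # Mirrors Sedgewick's loop: for (h = 1; h <= (r - l) / divisor; h = 3 * h + 1);
    h = 1
    while h <= (length - 1) // divisor:
        h = 3 * h + 1
    return h

for d in (1, 9, 15):
    print("divisor %2d -> initial h = %d" % (d, initial_increment(42, d)))
# divisor  1 -> initial h = 121  (overshoots the index range 0..41 entirely)
# divisor  9 -> initial h = 13
# divisor 15 -> initial h = 4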
We are given an N*N grid. Now we need to find a good path of maximum length, where a good path is defined as follows:
A good path always starts from a cell marked 0.
We are only allowed to move left, right, up, or down.
If the value of the i-th cell on the path is A, then the value of the next cell in the path must be A+1.
Given these conditions, I need to find out the length of the maximum path that can be made, and also count how many paths have that maximum length.
Example: let N=3 and take the following 3*3 matrix:
0 3 2
3 0 1
2 1 0
Then the maximum good path length here is 3 (counting moves, i.e. the values along the path go 0 → 1 → 2 → 3) and the count of such good paths is 4.
The four maximum-length paths, written as (row, column) coordinates:
(1,1) → (1,2) → (0,2) → (0,1)
(1,1) → (2,1) → (2,0) → (1,0)
(2,2) → (1,2) → (0,2) → (0,1)
(2,2) → (2,1) → (2,0) → (1,0)
This problem is a variation of the Longest Path Problem; however, your restrictions make this problem much easier, since the graph is actually a Directed Acyclic Graph (DAG), and thus the problem is solvable efficiently.
Define the directed graph G=(V,E) as following:
V = { all cells in the matrix} (sanity check: |V| = N^2)
E = { (u,v) | u is adjacent to v AND value(u) + 1 = value(v) }
Note that the resulting graph from the above definition is a DAG: there can be no cycles, since a cycle would have to contain some edge e = (u,v) with value(u) > value(v), which contradicts the edge definition.
Now, you only need to find longest path in a DAG from any starting point. This is done by topological sort on the graph, and then using Dynamic Programming:
init:
    for every node v in the DAG:
        D(v) = 0          if value(v) = 0
               -infinity  otherwise
step:
    for each node v from first to last (according to the topological sort):
        D(v) = max{ D(u) + 1 | for each edge (u,v) }
When you are done, find the node v with maximal value D(v); this is the length of the longest "good path".
Finding the path itself is done by retracing the above: walk your steps back from the maximal D(v) until you reach the initial node with value 0.
The complexity of this approach is O(V+E) = O(n^2).
Since you are looking for the number of longest paths, you can modify this solution a bit to count the number of paths reaching each node, as follows:
Topological sort the nodes; let the sorted array be arr (1)
for each node v from start to end of arr:
    if value(v) = 0:
        D(v) = 1
    else:
        sum = 0
        for each u such that (u,v) is an edge: (2)
            sum = sum + D(u)
        D(v) = sum
The above will find, for each node v, the number D(v) of "good paths" that reach it. All you have to do now is find the maximal value x such that some node v has value(v) = x and D(v) > 0, and sum up D(v) over all nodes with that value:
max = 0
numPaths = 0
for each node v:
    if value(v) == max:
        numPaths = numPaths + D(v)
    else if value(v) > max AND D(v) > 0:
        numPaths = D(v)
        max = value(v)
return numPaths
Notes:
(1) A "regular" sort (ordering the cells by value) works here too, but it will take O(n^2 log n) time, while a topological sort takes O(n^2) time.
(2) Reminder: (u,v) is an edge if (1) u and v are adjacent, and (2) value(u) + 1 = value(v).
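Putting both steps together, here is a compact Python sketch of this counting DP (sorting the cells by value is a valid topological order, because every edge goes from a value-k cell to a value-(k+1) cell; the names are mine, not from the answer):

def count_longest_good_paths(matrix):
    n = len(matrix)
    D = [[0] * n for _ in range(n)]  # D[i][j] = number of good paths ending at (i, j)
    cells = sorted((matrix[i][j], i, j) for i in range(n) for j in range(n))
    for v, i, j in cells:
        if v == 0:
            D[i][j] = 1
        else:
            # sum D over adjacent cells holding value v - 1 (the incoming edges)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ui, uj = i + di, j + dj
                if 0 <= ui < n and 0 <= uj < n and matrix[ui][uj] == v - 1:
                    D[i][j] += D[ui][uj]
    best, num = 0, 0
    for v, i, j in cells:
        if D[i][j] > 0:
            if v > best:
                best, num = v, D[i][j]
            elif v == best:
                num += D[i][j]
    return best, num  # length counts moves, so it equals the final cell's value

m = [[0, 3, 2],
     [3, 0, 1],
     [2, 1, 0]]
print(count_longest_good_paths(m))  # (3, 4)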
You can do this with a simple Breadth-First Search.
First find all cells marked 0. (This is O(N^2).) On each such cell put a walker. Each walker carries a number 'p' initialized to 1.
Now iterate:
All walkers stand on cells with the same number k. Each walker looks for neighboring cells (left, right, up or down) marked with k+1.
If no walker sees such a cell, the search is over. The length of the longest path is k, and the number of such paths is the sum of the p's of all the walkers.
If some walkers see such numbers, kill any walkers that don't.
Each walker moves into a good neighboring cell. If a walker sees more than one good cell, it divides into as many walkers as there are good cells, and one goes into each. (Each "child" has the same p value its "parent" had.) If two or more walkers meet in the same cell (i.e. if more than one path led to that cell) then they combine into a single walker, whose 'p' value is the sum of their 'p' values.
This algorithm is O(N^2), since no cell can be visited more than once, and the number of walkers cannot exceed the number of cells.
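A minimal Python sketch of this walker scheme (a dict maps each occupied cell to its walker's 'p' value, so splitting and merging walkers is just summing into the same key; the names are mine):

from collections import defaultdict

def walker_search(matrix):
    n = len(matrix)
    # one walker with p = 1 on every cell marked 0
    walkers = {(i, j): 1 for i in range(n) for j in range(n) if matrix[i][j] == 0}
    k = 0
    while True:
        advanced = defaultdict(int)
        for (i, j), p in walkers.items():
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n and matrix[ni][nj] == k + 1:
                    advanced[(ni, nj)] += p  # split and merge by summing p
        if not advanced:  # no walker can move: the longest length is k
            return k, sum(walkers.values())
        walkers, k = advanced, k + 1

m = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
print(walker_search(m))  # (3, 4)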
I did it using ActionScript; I hope it's readable. I think it is working correctly, but I may have missed something.
const N:int = 9; // field size
const MIN_VALUE:int = 0; // start value
var field:Array = [];
// create field - not relevant to the task
var probabilities:Array = [0,1,2,3,4,5];
for (var i:int = 0; i < N * N; i++)
    field.push(probabilities[int(Math.random() * probabilities.length)]);
print_field();
// initial chain fill. We will find any chains of adjacent 0-1 elements.
var chain_list:Array = [];
for (var offset:int = 0; offset < N * N - 1; offset++) {
    if (offset < N * N - N) { // y coordinate is not the lowest
        var chain:Array = find_chain(offset, offset + N, MIN_VALUE);
        if (chain) chain_list.push(chain);
    }
    if ((offset % N) < N - 1) { // x coordinate is not the rightmost
        chain = find_chain(offset, offset + 1, MIN_VALUE);
        if (chain) chain_list.push(chain);
    }
}
var merged_chain_list:Array = chain_list;
var current_value:int = MIN_VALUE + 1;
// for each found chain, scan its higher end for more attached chains
// and merge them into a new chain if found
while (chain_list.length) {
    chain_list = [];
    for (i = 0; i < merged_chain_list.length; i++) {
        chain = merged_chain_list[i];
        offset = chain[chain.length - 1];
        if (offset < N * N - N) { // y coordinate is not the lowest
            var tmp:Array = find_chain(offset, offset + N, current_value);
            if (tmp) chain_list.push(merge_chains(chain, tmp));
        }
        if (offset >= N) { // y coordinate is not the topmost
            tmp = find_chain(offset, offset - N, current_value);
            if (tmp) chain_list.push(merge_chains(chain, tmp));
        }
        if ((offset % N) < N - 1) { // x coordinate is not the rightmost
            tmp = find_chain(offset, offset + 1, current_value);
            if (tmp) chain_list.push(merge_chains(chain, tmp));
        }
        if (offset % N) { // x coordinate is not the leftmost
            tmp = find_chain(offset, offset - 1, current_value);
            if (tmp) chain_list.push(merge_chains(chain, tmp));
        }
    }
    // save the last merged result if any and try the next value
    if (chain_list.length) {
        merged_chain_list = chain_list;
        current_value++;
    }
}
// the final merged list is a list of chains of the same maximum length
print_chains(merged_chain_list);
function find_chain(offset1, offset2, current_value):Array {
    // the returned chain is always sorted from min to max
    var v1:int = field[offset1];
    var v2:int = field[offset2];
    if (v1 == current_value && v2 == current_value + 1) return [offset1, offset2];
    if (v2 == current_value && v1 == current_value + 1) return [offset2, offset1];
    return null;
}

function merge_chains(chain1:Array, chain2:Array):Array {
    var tmp:Array = [];
    for (var i:int = 0; i < chain1.length; i++) tmp.push(chain1[i]);
    tmp.push(chain2[1]);
    return tmp;
}
function print_field():void {
    for (var pos_y:int = 0; pos_y < N; pos_y++) {
        var offset:int = pos_y * N;
        var s:String = "";
        for (var pos_x:int = 0; pos_x < N; pos_x++) {
            var v:int = field[offset++];
            if (v == 0) s += "[0]"; else s += " " + v + " ";
        }
        trace(s);
    }
}

function print_chains(chain_list):void {
    var cl:int = chain_list.length;
    trace("\nchains found: " + cl);
    if (cl) trace("chain length: " + chain_list[0].length);
    for (var i:int = 0; i < cl; i++) {
        var chain:Array = chain_list[i];
        var s:String = "";
        for (var j:int = 0; j < chain.length; j++) s += chain[j] + ":" + field[chain[j]] + " ";
        trace(s);
    }
}
Sample output:
1 2 1 3 2 2 3 2 4
4 3 1 2 2 2 [0][0] 1
[0][0] 1 2 4 [0] 3 3 1
[0][0] 5 4 1 1 [0][0] 1
2 2 3 4 3 2 [0] 1 5
4 [0] 3 [0] 3 1 4 3 1
1 2 2 3 5 3 3 3 2
3 4 2 1 2 4 4 4 5
4 2 1 2 2 3 4 5 [0]
chains found: 2
chain length: 5
23:0 32:1 41:2 40:3 39:4
33:0 32:1 41:2 40:3 39:4
I implemented it in my own Lisp dialect, so the source code is not going to help you that much :-) ...
EDIT: Added a Python version too.
Anyway, the idea is:
write a function paths(i, j) --> (maxlen, number) that returns the maximal length of the paths starting from (i, j) and how many of them there are
this function is recursive: looking at the neighbors of (i, j) with value M[i][j]+1, it calls paths(ni, nj) to get the result for each valid neighbor
if the maximal length for a neighbor is bigger than the current maximal length, you set a new current maximal length and reset the counter
if the maximal length is the same as the current one, add that neighbor's counter to the total
if the maximal length is smaller, just ignore that neighbor's result
cache the result of the computation for the cell (this is very important!). In my version the code is split into two mutually recursive functions: paths, which checks the cache first and calls compute-paths otherwise; compute-paths calls paths when processing neighbors. Caching a recursive call is roughly equivalent to an explicit Dynamic Programming approach, but sometimes easier to implement.
To compute the final result you basically do the same computation, but adding up the results for all 0 cells instead of considering neighbors.
Note that the number of different paths can become huge; that's why enumerating all of them is not a viable option and caching/DP is a must: for example, for an N=20 matrix with values M[i][j] = i+j there are 35,345,263,800 maximal paths of length 38.
This algorithm is O(N^2) in time (each cell is visited at most once) and requires O(N^2) space for the cache and for the recursion. Of course you cannot expect to get anything better than this, given that the input itself consists of N^2 numbers and you need at least to read them to compute an answer.
(defun good-paths (matrix)
  (let** ((N (length matrix))
          (cache (make-array (list N N)))
          (#'compute-paths (i j)
            (let ((res (list 0 1))
                  (count (1+ (aref matrix i j))))
              (dolist ((ii jj) (list (list (1+ i) j) (list (1- i) j)
                                     (list i (1+ j)) (list i (1- j))))
                (when (and (< -1 ii N) (< -1 jj N)
                           (= (aref matrix ii jj) count))
                  (let (((maxlen num) (paths ii jj)))
                    (incf maxlen)
                    (cond
                      ((< (first res) maxlen)
                        (setf res (list maxlen num)))
                      ((= (first res) maxlen)
                        (incf (second res) num))))))
              res))
          (#'paths (i j)
            (first (or (aref cache i j)
                       (setf (aref cache i j)
                             (list (compute-paths i j))))))
          (res (list 0 0)))
    (dotimes (i N)
      (dotimes (j N)
        (when (= (aref matrix i j) 0)
          (let (((maxlen num) (paths i j)))
            (cond
              ((< (first res) maxlen)
                (setf res (list maxlen num)))
              ((= (first res) maxlen)
                (incf (second res) num)))))))
    res))
Edit
The following is a transliteration of the above into Python, which should be much easier to understand if you have never seen Lisp before...
def good_paths(matrix):
    N = len(matrix)
    cache = [[None]*N for i in xrange(N)]  # an NxN matrix of None
    def compute_paths(i, j):
        maxlen, num = 0, 1
        count = 1 + matrix[i][j]
        for (ii, jj) in ((i+1, j), (i-1, j), (i, j-1), (i, j+1)):
            if 0 <= ii < N and 0 <= jj < N and matrix[ii][jj] == count:
                nh_maxlen, nh_num = paths(ii, jj)
                nh_maxlen += 1
                if maxlen < nh_maxlen:
                    maxlen = nh_maxlen
                    num = nh_num
                elif maxlen == nh_maxlen:
                    num += nh_num
        return maxlen, num
    def paths(i, j):
        res = cache[i][j]
        if res is None:
            res = cache[i][j] = compute_paths(i, j)
        return res
    maxlen, num = 0, 0
    for i in xrange(N):
        for j in xrange(N):
            if matrix[i][j] == 0:
                c_maxlen, c_num = paths(i, j)
                if maxlen < c_maxlen:
                    maxlen = c_maxlen
                    num = c_num
                elif maxlen == c_maxlen:
                    num += c_num
    return maxlen, num
Consider that
0 -- is the first
1 -- is the second
2 -- is the third
.....
9 -- is the 10th
11 -- is the 11th
what is an efficient algorithm to find the nth palindromic number?
I'm assuming that 0110 is not a palindrome, as it is 110.
I could spend a lot of words on describing, but this table should be enough:
#Digits  #Pal.  Notes
      0      1  "0" only
      1      9  x      with x = 1..9
      2      9  xx     with x = 1..9
      3     90  xyx    with xy = 10..99 (in other words: x = 1..9, y = 0..9)
      4     90  xyyx   with xy = 10..99
      5    900  xyzyx  with xyz = 100..999
      6    900  and so on...
The (nonzero) palindromes with even number of digits start at p(11) = 11, p(110) = 1001, p(1100) = 100'001,.... They are constructed by taking the index n - 10^L, where L=floor(log10(n)), and append the reversal of this number: p(1101) = 101|101, p(1102) = 102|201, ..., p(1999) = 999|999, etc. This case must be considered for indices n >= 1.1*10^L but n < 2*10^L.
When n >= 2*10^L, we get the palindromes with odd number of digits, which start with p(2) = 1, p(20) = 101, p(200) = 10001 etc., and can be constructed the same way, using again n - 10^L with L=floor(log10(n)), and appending the reversal of that number, now without its last digit: p(21) = 11|1, p(22) = 12|1, ..., p(99) = 89|8, ....
When n < 1.1*10^L, subtract 1 from L to be in the correct setting with n >= 2*10^L for the case of an odd number of digits.
This yields the simple algorithm:
p(n) = { L = logint(n,10);
         P = 10^(L - [1 < n < 1.1*10^L]); /* avoid exponent -1 for n=1 */
         n -= P;
         RETURN( n * P + reverse( n \ 10^[n >= P] ) )
       }
where [...] is 1 if ... is true, 0 else, and \ is integer division.
(The expression n \ 10^[...] is equivalent to: if ... then n\10 else n.)
(I added the condition 1 < n in the exponent to avoid P = 10^(-1) for n = 1. If you use integer types, you don't need this. Another choice is to put max(..., 0) as the exponent in P, or to use if n = 1 then return(0) right at the start. Also notice that you don't need L after assigning P, so you could use the same variable for both.)
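Here is a Python sketch of this construction (1-indexed as above, using integer arithmetic so that the 1.1*10^L comparison stays exact; the function name is mine):

def nth_palindrome(n):
    if n == 1:
        return 0
    L = len(str(n)) - 1               # L = logint(n, 10)
    if 10 * n < 11 * 10**L:           # n < 1.1 * 10^L: drop to the odd-length setting
        L -= 1
    P = 10**L
    n -= P
    half = n // 10 if n >= P else n   # odd digit count: mirror without the last digit
    return n * P + int(str(half)[::-1])

print([nth_palindrome(i) for i in range(1, 13)])
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 22]
print(nth_palindrome(110), nth_palindrome(1100))  # 1001 100001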
I am trying to write a function that determines if a number n is prime or composite using the Lucas pseudoprime test; at the moment, I am working with the standard test, but once I get that working I will then write the strong test. I am reading the paper by Baillie and Wagstaff, and following the implementation by Thomas Nicely in the trn.c file.
I understand that the full test involves several steps: trial division by small primes, checking that n is not a square, performing a strong pseudoprimality test to base 2, then finally the Lucas pseudoprime test. I can handle all the other pieces, but I am having trouble with the Lucas pseudoprime test. Here is my implementation, in Python:
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def jacobi(a, m):
    a = a % m; t = 1
    while a != 0:
        while a % 2 == 0:
            a = a / 2
            if m % 8 == 3 or m % 8 == 5:
                t = -1 * t
        a, m = m, a  # swap a and m
        if a % 4 == 3 and m % 4 == 3:
            t = -1 * t
        a = a % m
    if m == 1:
        return t
    return 0
def isLucasPrime(n):
    dAbs, sign, d = 5, 1, 5
    while 1:
        if 1 < gcd(d, n) > n:
            return False
        if jacobi(d, n) == -1:
            break
        dAbs, sign = dAbs + 2, sign * -1
        d = dAbs * sign
    p, q = 1, (1 - d) / 4
    print "p, q, d =", p, q, d
    u, v, u2, v2, q, q2 = 0, 2, 1, p, q, 2 * q
    bits = []
    t = (n + 1) / 2
    while t > 0:
        bits.append(t % 2)
        t = t // 2
    h = -1
    while -1 * len(bits) <= h:
        print "u, u2, v, v2, q, q2, bits, bits[h] = ",\
            u, u2, v, v2, q, q2, bits, bits[h]
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - q2) % n
        if bits[h] == 1:
            u = u2 * v + u * v2
            u = u if u % 2 == 0 else u + n
            u = (u / 2) % n
            v = (v2 * v) + (u2 * u * d)
            v = v if v % 2 == 0 else v + n
            v = (v / 2) % n
        if -1 * len(bits) < h:
            q = (q * q) % n
            q2 = q + q
        h = h - 1
    return u == 0
When I run this, isLucasPrime returns False for primes such as 83 and 89, which is incorrect. It also returns False for the composite 111, which is correct. But it returns False for the composite 323, which I know is a Lucas pseudoprime for which isLucasPrime should return True. In fact, isLucasPrime returns False for every n on which I have tested it.
I have several questions:
1) I'm not an expert with C/GMP, but it seems to me that Nicely runs through the bits of (n+1)/2 from right to left (least significant to most significant), where other authors run through the bits left to right. My code shown above runs through the bits left to right, but I have also tried running through the bits right to left, with the same result. Which order is correct?
2) It looks odd to me that Nicely only updates the u and v variables for a 1-bit. Is this correct? I expected to update all four of the Lucas-chain variables each time through the loop, since the indexes of the chain increase at each step.
3) What have I done wrong?
1) I'm not an expert with C/GMP, but it seems to me that Nicely runs through the bits of (n+1)/2 from right to left (least significant to most significant), where other authors run through the bits left to right. My code shown above runs through the bits left to right, but I have also tried running through the bits right to left, with the same result. Which order is correct?
Indeed, Nicely goes from the least significant to the most significant bit. He computes U(2^k) and V(2^k) (and Q^(2^k); all modulo N, of course) in the mpzU2m and mpzV2m variables, and has U((N+1) % 2^k) resp. V((N+1) % 2^k) stored in mpzU and mpzV. When a 1-bit is encountered, the remainder (N+1) % 2^k changes, and mpzU and mpzV are updated accordingly.
The other way is to compute U(p), U(p+1), V(p) and (optionally) V(p+1) for a prefix p of N+1 and combine those to compute U(2*p+1) and either U(2*p) or U(2*p+2) [ditto for V] depending on whether the next bit after the prefix p is 0 or 1.
Both methods are correct, just as you can compute the power x^N going from left to right, having x^p and x^(p+1) as state, or from right to left, having x^(2^k) and x^(N % 2^k) as state [and computing U(n) and U(n+1) is basically computing ζ^n where ζ = (1 + sqrt(D))/2].
I - and others, apparently - find the left-to-right order simpler. I haven't done or read an analysis; it might be that right-to-left is computationally less expensive on average, and Nicely chose right-to-left because of that.
2) It looks odd to me that Nicely only updates the u and v variables for a 1-bit. Is this correct? I expected to update all four of the Lucas-chain variables each time through the loop, since the indexes of the chain increase at each step.
Yes, that is correct, because the remainder (N+1) % 2^k == (N+1) % 2^(k-1) if the 2^k bit is 0.
3) What have I done wrong?
A small typo first:
if 1 < gcd(d, n) > n:
should be
if 1 < gcd(d, n) < n:
of course.
More substantially, you use the updates for Nicely's traversal order (right-to-left), but traverse in the other direction. That of course produces wrong results.
Further, when updating v
if bits[h] == 1:
    u = u2 * v + u * v2
    u = u if u % 2 == 0 else u + n
    u = (u / 2) % n
    v = (v2 * v) + (u2 * u * d)
    v = v if v % 2 == 0 else v + n
    v = (v / 2) % n
you use the new value of u, but you ought to use the old value.
def isLucasPrime(n):
    dAbs, sign, d = 5, 1, 5
    while 1:
        if 1 < gcd(d, n) < n:
            return False
        if jacobi(d, n) == -1:
            break
        dAbs, sign = dAbs + 2, sign * -1
        d = dAbs * sign
    p, q = 1, (1 - d) // 4
    u, v, u2, v2, q, q2 = 0, 2, 1, p, q, 2 * q
    bits = []
    t = (n + 1) // 2
    while t > 0:
        bits.append(t % 2)
        t = t // 2
    h = 0
    while h < len(bits):
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - q2) % n
        if bits[h] == 1:
            uold = u
            u = u2 * v + u * v2
            u = u if u % 2 == 0 else u + n
            u = (u // 2) % n
            v = (v2 * v) + (u2 * uold * d)
            v = v if v % 2 == 0 else v + n
            v = (v // 2) % n
        if h < len(bits) - 1:
            q = (q * q) % n
            q2 = q + q
        h = h + 1
    return u == 0
works (no guarantees, but I think it is correct, and have done some tests, all of which it passed).
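A quick sanity check against the cases mentioned in the question (assuming the gcd and jacobi helpers above are in scope):

for n in (83, 89, 111, 323):
    print n, isLucasPrime(n)
# expected: 83 True, 89 True, 111 False, 323 True (323 being a Lucas pseudoprime)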
I have a big matrix as input, and I am given the size of a smaller matrix. I have to compute the element-wise sum of all possible smaller matrices which can be formed out of the bigger matrix.
Example.
Input matrix size: 4 × 4
Matrix:
1 2 3 4
5 6 7 8
9 9 0 0
0 0 9 9
Input smaller matrix size: 3 × 3 (not necessarily a square)
Smaller matrices possible:
1 2 3
5 6 7
9 9 0
5 6 7
9 9 0
0 0 9
2 3 4
6 7 8
9 0 0
6 7 8
9 0 0
0 9 9
Their sum, the final output:
14 18 22
29 22 15
18 18 18
I did this:
int** matrix_sum(int **M, int n, int r, int c)
{
    int **res = new int*[r];
    for (int i = 0; i < r; i++) {
        res[i] = new int[c];
        memset(res[i], 0, sizeof(int) * c);
    }
    for (int i = 0; i <= n - r; i++)
        for (int j = 0; j <= n - c; j++)
            for (int k = i; k < i + r; k++)
                for (int l = j; l < j + c; l++)
                    res[k - i][l - j] += M[k][l];
    return res;
}
I guess this is too slow; can anyone please suggest a faster way?
Your current algorithm is O((m - p) * (n - q) * p * q). The worst case is when p = m / 2 and q = n / 2.
The algorithm I'm going to describe will be O(m * n + p * q), which will be O(m * n) regardless of p and q.
The algorithm consists of 2 steps.
Let the input matrix A's size be m x n and the size of the window matrix being p x q.
First, you will create a precomputed matrix B of the same size as the input matrix. Each element of the precomputed matrix B contains the sum of all the elements in the sub-matrix, whose top-left element is at coordinate (1, 1) of the original matrix, and the bottom-right element is at the same coordinate as the element that we are computing.
B[i, j] = Sum[k = 1..i, l = 1..j]( A[k, l] ) for all 1 <= i <= m, 1 <= j <= n
This can be done in O(m * n), by using this relation to compute each element in O(1):
B[i, j] = B[i - 1, j] + Sum[k = 1..j-1]( A[i, k] ) + A[i, j] for all 2 <= i <= m, 1 <= j <= n
B[i - 1, j], which is everything of the sub-matrix we are computing except the current row, has been computed previously. You keep a prefix sum of the current row, so that you can use it to quickly compute the sum of the current row.
This is another way to compute B[i, j] in O(1), using the property of the 2D prefix sum:
B[i, j] = B[i - 1, j] + B[i, j - 1] - B[i - 1, j - 1] + A[i, j] for all 1 <= i <= m, 1 <= j <= n, where any entry with index 0 is treated as 0
Then, the second step is to compute the result matrix S, whose size is p x q. If you make some observations, S[i, j] is the sum of all elements in the sub-matrix of size (m - p + 1) x (n - q + 1) whose top-left coordinate is (i, j) and whose bottom-right coordinate is (i + m - p, j + n - q).
Using the precomputed matrix B, you can compute the sum of any sub-matrix in O(1). Apply this to compute the result matrix S:
SubMatrixSum(top-left = (x1, y1), bottom-right = (x2, y2))
= B[x2, y2] - B[x1 - 1, y2] - B[x2, y1 - 1] + B[x1 - 1, y1 - 1]
Therefore, the complexity of the second step will be O(p * q).
The final complexity is as mentioned above, O(m * n), since p <= m and q <= n.
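Here is a Python sketch of both steps (using 0-based indices rather than the 1-based math above; the names are mine). rect() is the SubMatrixSum query, and the final comprehension is the second step:

def matrix_window_sum(A, p, q):
    m, n = len(A), len(A[0])
    # B[i][j] = sum of A[0..i-1][0..j-1]; row/column 0 of B is the zero padding
    B = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            B[i][j] = B[i-1][j] + B[i][j-1] - B[i-1][j-1] + A[i-1][j-1]

    def rect(x1, y1, x2, y2):  # sum of A[x1..x2][y1..y2], inclusive, in O(1)
        return B[x2+1][y2+1] - B[x1][y2+1] - B[x2+1][y1] + B[x1][y1]

    # S[i][j] sums A over rows i..i+m-p and columns j..j+n-q
    return [[rect(i, j, i + m - p, j + n - q) for j in range(q)] for i in range(p)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 9, 0, 0],
     [0, 0, 9, 9]]
for row in matrix_window_sum(A, 3, 3):
    print(row)
# [14, 18, 22]
# [29, 22, 15]
# [18, 18, 18]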