Programming task: sum of submatrices - c++

I have a problem with my programming task. Actually, I solved it, but my code doesn't pass some of the tests (time limit exceeded).
The text of the task is the following:
We have a matrix of size N*N. The first line of the input contains two integers: N and K, where K is the number of lines that define submatrices.
The next N lines contain the elements of the main matrix (whitespace as the delimiter between elements, \n as the delimiter between lines). After that we have K lines that define submatrices.
A definition looks like this:
y_l x_l y_r x_r, where (x_l, y_l) are the column and row of the top-left corner of the submatrix in the main matrix and (x_r, y_r) are the column and row of the bottom-right corner. We have to calculate the sum of every submatrix and divide the submatrices into equivalence classes (submatrices belong to one class if their sums are equal).
The output of the program should be the following:
three integers (separated by whitespace), where the first one is the number of equivalence classes, the second one is the number of equivalence classes that have the maximum number of elements, and the third one is the average of the sums of all submatrices.
From the tests I figured out that the problem is in the calculation of the sums:
while(true){
    for(int i = x_l; i <= x_r; i++)
        sum += *diff++;           // add one row segment of the submatrix
    if(diff == d_end) break;
    d_start = d_start + size;     // advance to the start of the next matrix row
    diff = d_start;
}
But I have no idea how to optimize it. Maybe someone can give me an algorithm or some ideas on how to calculate those sums faster.
Thanks.
UPDATE: Answer
After a few days of searching I finally got a working version of my program. Thanks to Yakk, who gave some very useful advice.
Here's the final code.
A very useful link that I strangely couldn't find before until I asked a very specific question (based on the information that Yakk gave me): link.
I hope that my code might be helpful for somebody in the future.

Build a sum matrix.
At location (a,b) of the sum matrix, store the sum of all elements of the original matrix that are at or above and to the left of (a,b), including (a,b) itself.
Now calculating the sum of a submatrix takes 4 lookups, one addition and two subtractions. Draw a 4x4 matrix and express the bottom-right 2x2 block using such sums to see how.
If you double the stored data you can halve the lookups. But I would not bother.
Building the sum matrix requires only a modest amount of work if you do it carefully.
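For reference, here is a minimal sketch of that idea (my own naming; the sum matrix is padded with an extra zero row and column so the query needs no boundary checks):

#include <vector>

// S[i][j] = sum of m[r][c] over all r < i and c < j (0-based)
std::vector<std::vector<long long>> buildSumMatrix(const std::vector<std::vector<long long>>& m) {
    int n = (int)m.size();
    std::vector<std::vector<long long>> S(n + 1, std::vector<long long>(n + 1, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            S[i + 1][j + 1] = m[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j];
    return S;
}

// Sum of the submatrix with top-left (y_l, x_l) and bottom-right (y_r, x_r), inclusive, 0-based.
long long subSum(const std::vector<std::vector<long long>>& S,
                 int y_l, int x_l, int y_r, int x_r) {
    return S[y_r + 1][x_r + 1] - S[y_l][x_r + 1] - S[y_r + 1][x_l] + S[y_l][x_l];
}

Building S is O(N^2) once, and each of the K queries is then answered with four lookups in O(1).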

Related

Tiling a 2xM array with 2x1 tiles to maximise the differences - INOI 2008, P2

(As I am new and may not be aware of the code of conduct, feel free to edit this post to make this better and more helpful to other people.)
Greetings everybody!
This problem is related to this: Problem Link
The problem in brief:
Given a 2xM array, we want to tile it with 2x1 tiles such that the sum of the absolute differences of the two values "covered" by each individual tile is maximized. We want to report this maximum sum.
The problem in detail:
In Domino Solitaire, you have a grid with two rows and many columns. Each square in the grid contains an integer. You are given a supply of rectangular 2×1 tiles, each of which exactly covers two adjacent squares of the grid. You have to place tiles to cover all the squares in the grid such that each tile covers two squares and no pair of tiles overlap. The score for a tile is the difference between the bigger and the smaller number that are covered by the tile. The aim of the game is to maximize the sum of the scores of all the tiles.
Below is my code for it. Basically, I've done a sort of recursive thing, because there are two cases: (1) one vertical 2x1 tile at the start, and (2) two horizontal 2x1 tiles laid together to cover two columns.
#include <bits/stdc++.h>
using namespace std;

int maxScore(int array[][2], int N, int i);

int main(){
    ios::sync_with_stdio(0);
    cin.tie(0);
    int N; cin >> N;
    int array[N][2];
    for(int i=0;i<N;i++) cin >> array[i][0] >> array[i][1];
    cout << maxScore(array, N, 0);
    return 0;
}

int maxScore(int array[][2], int N, int i){
    int score1 = abs(array[i][0] - array[i][1]) + maxScore(array, N, i+1);
    int score2 = abs(array[i][0] - array[i+1][0]) + abs(array[i][1] - array[i+1][1]) + maxScore(array, N, i+2);
    return max(score1, score2);
}
However, this seems to be a very inefficient solution and I can't really understand how to cover the base cases (otherwise this would go on forever).
Any help would be really appreciated. Thank You! (BTW I want to create a new tag - Competitive Programming, can anybody help me do so?)
Maintain an array of best solutions, where the value in column i of the array is the best solution considering only the first i columns of the input matrix. Then arr[i] = the maximum achievable by adding either one tile to the arr[i-1] solution, or two to the arr[i-2] solution. Treat arr[-1] as 0 and set arr[0] to the value of one vertical domino.
This is intentionally not a complete solution, but should help you find a much faster implementation.
Since you need to cover every square of a 2xM grid, there is no way you have dominoes placed like this:
. . .[#|#]. .
. .[#|#]. . .
So essentially, for every sub-block, the rightmost domino is vertical, or there are two horizontal ones stacked on top of each other.
If you start from the left, you only need to remember your best result for the first n and n-1 columns, then try placing a vertical domino to the right of the n-solution, or two horizontal dominoes to the right of the (n-1)-solution. The better of the two is the best (n+1)-solution. You can compute this in a simple for loop; as a first step, store all partial solutions in a std::vector.
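A minimal sketch of that left-to-right DP, assuming the same input format as the code in the question (N, then N rows of two numbers); the names are my own:

#include <algorithm>
#include <array>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    int N;
    std::cin >> N;
    std::vector<std::array<int, 2>> col(N);
    for (auto& c : col) std::cin >> c[0] >> c[1];

    std::vector<long long> dp(N + 1, 0); // dp[i]: best score using only the first i columns
    for (int i = 1; i <= N; ++i) {
        // case 1: one vertical tile on column i-1
        dp[i] = dp[i - 1] + std::abs(col[i - 1][0] - col[i - 1][1]);
        // case 2: two horizontal tiles covering columns i-2 and i-1
        if (i >= 2)
            dp[i] = std::max(dp[i], dp[i - 2] + std::abs(col[i - 1][0] - col[i - 2][0])
                                              + std::abs(col[i - 1][1] - col[i - 2][1]));
    }
    std::cout << dp[N] << '\n';
    return 0;
}

Since only dp[i-1] and dp[i-2] are ever read, the vector could even be replaced by two variables.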

Pick a matrix cell according to its probability

I have a 2D matrix of positive real values, stored as follows:
vector<vector<double>> matrix;
Each cell can have a value greater than or equal to 0, and this value represents the probability of the cell being chosen. In particular, for example, a cell with a value equal to 3 has three times the probability of being chosen compared to a cell with value 1.
I need to select N cells of the matrix (0 <= N <= total number of cells) randomly, but according to their probability of being selected.
How can I do that?
The algorithm should be as fast as possible.
I describe two methods, A and B.
A works in time approximately N * number of cells, and uses space O(log number of cells). It is good when N is small.
B works in time approximately (number of cells + N) * O(log number of cells), and uses space O(number of cells). So, it is good when N is large (or even, 'medium') but uses a lot more memory, in practice it might be slower in some regimes for that reason.
Method A:
The first thing you need to do is normalize the entries. (It's not clear to me whether you assume they are normalized or not.) That means: sum all the entries and divide each one by the sum. (This part is potentially slow, so it's better if you can assume or require that it has already been done.)
Then you sample like this:
Choose a random [i,j] entry of the matrix (by choosing i and j each uniformly at random from the range of integers 0 to n-1).
Choose a uniformly random real number p in the range [0, 1].
Check if matrix[i][j] > p. If so, return the pair [i,j]. If not, go back to step 1.
Why does this work? The probability that we end at step 3 with any particular output is equal to the probability that [i,j] was selected (which is the same for each entry), times the probability that the number p was small enough. This is proportional to the value matrix[i][j], so the sampling chooses each entry with the correct proportions. It's also possible that at step 3 we go back to the start -- does that bias things? Basically, no. The reason is: suppose we arbitrarily choose a number k and then consider the distribution of the algorithm conditioned on stopping after exactly k rounds. No matter what value k we choose, this conditional distribution has to be exactly right by the above argument, since once we eliminate the case that p is too large, the remaining possibilities all have the correct proportions. And since the distribution is perfect for each value of k that we might condition on, and the overall distribution (not conditioned on k) is an average of the distributions for each value of k, the overall distribution is perfect as well.
If you want to analyze the number of rounds typically needed in a rigorous way, you can do it by analyzing the probability that we actually stop at step 3 in any particular round. Since the rounds are independent, this probability is the same in every round, which means that the number of rounds is geometrically distributed. Its tail decays exponentially, so it is concentrated around its mean, and we can determine the mean from that stopping probability.
The probability that we stop at step 3 can be determined by considering the conditional probability of stopping at step 3 given that we chose a particular entry [i,j]. By the law of total probability, you get
Pr[ stop at step 3 ] = sum_{i,j} ( 1/(n^2) * Matrix[i,j] )
Since we assumed the matrix is normalized, this sum reduces to just 1/n^2. So the expected number of rounds is about n^2 (that is, n^2 up to a constant factor) no matter what the entries of the matrix are. I don't think you can hope to do much better than that -- it's about the same amount of time it takes just to read all the entries of the matrix, and it's hard to sample from a distribution that you cannot even fully read.
Note: What I described is a way to correctly sample a single element -- to get N elements from one matrix, you can just repeat it N times.
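A minimal sketch of method A, assuming a square n x n matrix whose entries are already normalized to sum to 1; the function name and the use of <random> are my own choices:

#include <random>
#include <utility>
#include <vector>

std::pair<int, int> sampleCell(const std::vector<std::vector<double>>& matrix,
                               std::mt19937& rng) {
    const int n = (int)matrix.size();
    std::uniform_int_distribution<int> cell(0, n - 1);
    std::uniform_real_distribution<double> real(0.0, 1.0);
    while (true) {
        int i = cell(rng);                    // step 1: uniform entry
        int j = cell(rng);
        double p = real(rng);                 // step 2: uniform p in [0, 1]
        if (matrix[i][j] > p) return {i, j};  // step 3: accept, otherwise retry
    }
}

Calling sampleCell N times yields the N requested cells.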
Method B:
Basically you just want to compute a cumulative histogram and sample from its inverse, so that you know you get exactly the right distribution. Computing the histogram is expensive, but once you have it, getting samples is cheap and easy.
In C++ it might look like this:
// Make the cumulative histogram (needs <map>, <random>, <utility>, <vector>)
typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;
histogram_type histogram;
double cumulative = 0.0;
for (uint i = 0; i < Matrix.size(); ++i) {
    for (uint j = 0; j < Matrix[i].size(); ++j) {
        cumulative += Matrix[i][j];
        histogram[cumulative] = std::make_pair(i, j);
    }
}
std::vector<upair> result;
std::mt19937 gen{std::random_device{}()};
std::uniform_real_distribution<double> dist(0.0, cumulative); // uniform real in [0, cumulative)
for (uint k = 0; k < N; ++k) {
    // Do a sample (this should never repeat... if it does not find a lower bound you could
    // also assert(false), since it would mean something is wrong with the RNG)
    while (true) {
        double p = dist(gen);
        histogram_type::iterator it = histogram.lower_bound(p);
        if (it != histogram.end()) {
            result.push_back(it->second);
            break;
        }
    }
}
return result;
Here the time to build the histogram is about (number of cells) * O(log number of cells), since inserting into the map takes O(log n) time. You need an ordered data structure in order to get cheap lookups, N * O(log number of cells), later when you do the repeated sampling. Possibly you could choose a more specialized data structure to go faster, but I think there's only limited room for improvement.
Edit: As #Bob__ points out in the comments, in method (B) as written there is potentially going to be some error due to floating point round-off if the matrices are quite large, even using type double, at this line:
cumulative += Matrix[i][j];
The problem is that if cumulative becomes much larger than Matrix[i][j], beyond what the floating point precision can handle, then each time this statement is executed you may pick up a small error, and these errors accumulate into a significant inaccuracy.
As he suggests, if that happens, the most straightforward way to fix it is to sort the values Matrix[i][j] first. You could even do this in the general implementation to be safe -- sorting them isn't going to take more time asymptotically than you already spend anyway.
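A sketch of that fix, building the histogram from the cell values sorted in ascending order so that small values are accumulated before large ones (it reuses the typedefs from the snippet above; the helper name is my own):

#include <algorithm>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;

histogram_type make_histogram_sorted(const std::vector<std::vector<double>>& Matrix) {
    std::vector<std::tuple<double, uint, uint>> cells; // (value, i, j)
    for (uint i = 0; i < Matrix.size(); ++i)
        for (uint j = 0; j < Matrix[i].size(); ++j)
            cells.emplace_back(Matrix[i][j], i, j);
    std::sort(cells.begin(), cells.end());             // ascending by value

    histogram_type histogram;
    double cumulative = 0.0;
    for (const auto& c : cells) {
        cumulative += std::get<0>(c);
        histogram[cumulative] = std::make_pair(std::get<1>(c), std::get<2>(c));
    }
    return histogram;
}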

Reaching from first index to last with minimum product without using Graphs?

Solving this problem on codechef:
After visiting a childhood friend, Chef wants to get back to his home.
Friend lives at the first street, and Chef himself lives at the N-th
(and the last) street. Their city is a bit special: you can move from
the X-th street to the Y-th street if and only if 1 <= Y - X <= K,
where K is the integer value that is given to you. Chef wants to get
to home in such a way that the product of all the visited streets'
special numbers is minimal (including the first and the N-th street).
Please, help him to find such a product.
Input
The first line of input consists of two integer numbers - N and K - the number of streets and the value of K respectively. The second line consists of N numbers - A1, A2, ..., AN respectively, where Ai equals the special number of the i-th street.
Output
Please output the value of the minimal possible product, modulo 1000000007.
Constraints
1 ≤ N ≤ 10^5
1 ≤ Ai ≤ 10^5
1 ≤ K ≤ N
Example
Input:
4 2
1 2 3 4
Output:
8
It can be solved using graphs, based on this tutorial.
I tried to solve it without using graphs, just with recursion and DP.
My approach:
Take an array, calculate the minimal product needed to reach every index, and store it at the respective index.
This can be calculated with a top-down approach, recursively descending through the (eligible) indices until the starting index is reached.
Out of all the calculated values, store the minimum one.
If a value is already calculated, return it; otherwise calculate it.
CODE:
#include<iostream>
#include<cstdio>
#define LI long int
#define MAX 100009
#define MOD 1000000007
using namespace std;
LI dp[MAX]={0};
LI ar[MAX],k,orig;
void cal(LI n)
{
    if(n==0)
        return;
    if(dp[n]!=0)
        return;
    LI minn=MAX;
    for(LI i=n-1;i>=0;i--)
    {
        if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
        {
            cal(i);
            minn=(min(dp[i]*ar[n],minn))%MOD;
        }
    }
    dp[n]=minn%MOD;
    return;
}
int main()
{
    LI n,i;
    scanf("%ld %ld",&n,&k);
    orig=n;
    for(i=0;i<n;i++)
        scanf("%ld",&ar[i]);
    dp[0]=ar[0];
    cal(n-1);
    if(dp[n-1]==MAX)
        printf("0");
    else printf("%ld",dp[n-1]);
    return 0;
}
It's been 2 days and I have checked every corner case and constraint, but it still gives Wrong Answer! What's wrong with the solution?
Need help.
Analysis
There are many problems. Here is what I found:
You restrict the product to a value below 100009 for no reason. The product can be way higher than that (this is indeed why the problem only asks for the value modulo 1000000007).
You restrict your moves based on the difference of the streets' special numbers, whereas the problem statement says that you can move between streets whose index difference is between 1 and K.
In your dynamic programming function you compute the product but store its modulo. This leads to a problem, because the modulo of a big number can be lower than the modulo of a smaller number, which may corrupt later computations.
The integral type you use, long int, is too short.
The complexity of your algorithm is too high.
Of all these problems, the last one is the most serious. I fixed it by changing the whole approach and using a better data structure.
1st Problem
In your main() function:
if(dp[n-1]==MAX)
printf("0");
In your cal() function:
LI minn=MAX;
You should replace this line with:
LI minn = std::numeric_limits<LI>::max();
Do not forget to:
#include <limits>
2nd Problem
for(LI i=n-1;i>=0;i--)
{
if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
{
. . .
}
}
You should replace the for loop condition:
for(LI i=n-1;i>=n-k && i>=0;i--)
And remove altogether the condition on the special numbers.
3rd Problem
You are looking for the path whose product of special numbers is the lowest. In your current code, you compare the paths' products after taking the modulo of the product. This is wrong, as the modulo of a larger number may become very small (for instance a path whose product is 1000000008 has a modulo of 1, and you would choose this path even if there is a path whose product is only 2).
This means you should compare the real products, without taking their modulo. As these products can become very large, you should compare their logarithms instead. This lets you compare products using a simple double. Remember that:
log(a*b) = log(a) + log(b)
4th Problem
Use unsigned long long.
5th Problem
I fixed all these issues and submitted on codechef CHRL4. I got all but one test case accepted; the remaining test case failed because of a timeout. This is due to the fact that your algorithm has a complexity of O(k*n).
You can achieve O(n) complexity by using a bottom-up dynamic programming approach instead of a top-down one, together with a data structure that returns the minimum log value among the k previous streets. Look up the sliding window minimum algorithm to see how.
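To make this concrete, here is a minimal sketch (my own variable names) that combines the logarithm comparison from the 3rd problem with a sliding window minimum, carrying the product modulo 1000000007 along the chosen path:

#include <cmath>
#include <cstdio>
#include <deque>
#include <vector>

int main() {
    long long n, k;
    const long long MOD = 1000000007;
    std::scanf("%lld %lld", &n, &k);
    std::vector<long long> a(n);
    for (auto& x : a) std::scanf("%lld", &x);

    std::vector<double> lg(n);     // lg[i]: log of the minimal product ending at street i
    std::vector<long long> mod(n); // mod[i]: that product modulo MOD
    std::deque<long long> win;     // candidate indices, lg values increasing from front to back
    lg[0] = std::log((double)a[0]);
    mod[0] = a[0] % MOD;
    win.push_back(0);
    for (long long i = 1; i < n; ++i) {
        while (!win.empty() && win.front() < i - k) win.pop_front(); // drop streets out of reach
        long long best = win.front();                                // minimal lg among the last k streets
        lg[i] = lg[best] + std::log((double)a[i]);
        mod[i] = (mod[best] * (a[i] % MOD)) % MOD;
        while (!win.empty() && lg[win.back()] >= lg[i]) win.pop_back();
        win.push_back(i);
    }
    std::printf("%lld\n", mod[n - 1]);
    return 0;
}

Each street is pushed and popped at most once, so the whole loop is O(n).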
References
numeric_limits::max()
my own codechef CHRL4 solution: bottom-up dp + sliding window minimum

compute sum of value in an rectangle area of array

I have a very big array with many values, stored in a row-major 1D array.
ex:
1 2 3
4 5 6
will be stored as int array[] = {1,2,3,4,5,6};
What I have to do is: given row1, row2, column1 and column2, print out that area's sum; and it will be asked to calculate different areas many times.
What I have thought about is to first use a nested loop to traverse the array and store each row's sum in sum_row, each column's sum in sum_column, and the sum of all elements in totalSum.
Then take totalSum, subtract the rows and columns that surround the area, and add back the elements that have been subtracted twice.
But it doesn't seem fast enough; is there any algorithm that can do it faster, or some coding style tips that can lower the constant factor?
Thx in advance.
It seems to me that you have replaced one double iteration with another. The problem is in subtracting "the elements that have been subtracted twice"; unless I'm mistaken, this involves iterating over those elements to sum them.
Instead, just iterate over the rectangular area that you need to sum. I doubt it will be any slower.
A more efficient algorithm can be obtained by generating the matrix of summed upper-left matrices. (See the Wikipedia article on summed area table.) You can then compute any submatrix sum by looking up four area sums.
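A minimal sketch of that approach for a row-major 1D array (my own naming; query bounds are 0-based and inclusive):

#include <vector>

// S[i * (cols+1) + j] = sum of a over the first i rows and first j columns
std::vector<long long> buildPrefix(const int* a, int rows, int cols) {
    std::vector<long long> S((rows + 1) * (cols + 1), 0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            S[(i + 1) * (cols + 1) + (j + 1)] = a[i * cols + j]
                + S[i * (cols + 1) + (j + 1)]
                + S[(i + 1) * (cols + 1) + j]
                - S[i * (cols + 1) + j];
    return S;
}

long long areaSum(const std::vector<long long>& S, int cols,
                  int row1, int col1, int row2, int col2) {
    int w = cols + 1;
    return S[(row2 + 1) * w + (col2 + 1)] - S[row1 * w + (col2 + 1)]
         - S[(row2 + 1) * w + col1] + S[row1 * w + col1];
}

The table is built once in O(rows * cols), and every subsequent area query is four lookups.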

USACO: Subsets (Inefficient)

I am trying to solve subsets from the USACO training gateway...
Problem Statement
For many sets of consecutive integers from 1 through N (1 <= N <= 39), one can partition the set into two sets whose sums are identical.
For example, if N=3, one can partition the set {1, 2, 3} in one way so that the sums of both subsets are identical:
{3} and {1,2}
This counts as a single partitioning (i.e., reversing the order counts as the same partitioning and thus does not increase the count of partitions).
If N=7, there are four ways to partition the set {1, 2, 3, ... 7} so that each partition has the same sum:
{1,6,7} and {2,3,4,5}
{2,5,7} and {1,3,4,6}
{3,4,7} and {1,2,5,6}
{1,2,4,7} and {3,5,6}
Given N, your program should print the number of ways a set containing the integers from 1 through N can be partitioned into two sets whose sums are identical. Print 0 if there are no such ways.
Your program must calculate the answer, not look it up from a table.
End
Before, I was running an O(N*2^N) solution that simply enumerated the subsets and computed their sums.
Finding out how horribly inefficient that was, I moved on to mapping the sum sequences...
http://en.wikipedia.org/wiki/Composition_(number_theory)
After many coding problems trying to weed out the repetitions, it was still too slow, so I am back to square one :(.
Now that I look more closely at the problem, it looks like I should try to find a way not to enumerate the sums, but to go directly to the count via some kind of formula.
If anyone can give me pointers on how to solve this problem, I'm all ears. I program in java, C++ and python.
Actually, there is a better and simpler solution: you should use dynamic programming instead. In your code, you would have an array of integers (of size sum+1), where the value at index i represents the number of subsets of the integers whose sum is i (so the entry at sum/2 counts every valid partition twice). Here is what your code could look like in C++:
#include <vector>

long long solve(int N){
    long long sum = (long long)N * (N + 1) / 2; // sum of the consecutive integers 1..N
    if (sum % 2 == 1)
        return 0;                               // an odd total cannot be split into two equal halves
    // dp[j] = number of subsets of the values seen so far whose sum is j
    // (long long, since for N = 39 the counts exceed the range of a signed 32-bit int)
    std::vector<long long> dp(sum + 1, 0);
    dp[0] = 1;
    for (int val = 1; val <= N; ++val) {        // the consecutive integers
        for (long long j = sum - val; j >= 0; --j)
            dp[j + val] += dp[j];
    }
    return dp[sum / 2] / 2;                     // each partition is counted twice
}
This gives you an O(N^3) solution, which is by far fast enough for this problem.
I haven't tested this code, so there might be a syntax error or something, but you get the point. Let me know if you have any more questions.
This is the same thing as finding the coefficient of the x^0 term in the polynomial (x^1+1/x^1)(x^2+1/x^2)...(x^n+1/x^n), which should also take roughly O(n^3).