I am trying to find the recurrence relation for this problem on Codechef:
http://www.codechef.com/problems/BWALL
I know that once I find it, I can easily solve it using matrix exponentiation. But I'm having trouble understanding how the recurrence gives the right answer. There is a solution here, but I'd like someone to explain it more clearly.
Is there a simple rule of thumb for finding recurrences, or something like that? Thanks!
The "general rule" for finding a recurrence is to understand how the solution of a problem is related to the solutions of smaller problems. Beyond that, I don't think there is a general procedure for finding the recurrence.
For this particular example, here is how you can find the recurrence.
Suppose that you have a big wall of size N. Now, just look at the end of the wall. More precisely, from the end of the wall, find the first place with a "vertical separation", i.e. the first place where you can split the wall into two smaller walls without L-shape.
Example:
(A) Here is the wall:
X###X#XXX#X
XX#XX#XXXXX
The splitting with the end gives you:
X###X#XXX #X
XX#XX#XXX XX
(B) Another wall
X###X#XXX
XX#XX#XXX
The splitting with the end gives you:
X###X# XXX
XX#XX# XXX
What is the size of the small piece of wall that you can get between the split and the end of the wall? Well, it can be 1, 2 or 3, but not more (otherwise, you could find a smaller split).
The possibilities for the small piece are actually the ones given in your question (yes, the 7 small blocks).
So, to build a wall with size N, you must:
build a wall of size N-1 and add the size-1 small block to the end
or build a wall of size N-2 and add one of the four size-2 blocks to the end
or build a wall of size N-3 and add one of the two size-3 blocks to the end.
So, the number T(N) of possible walls with size N is
the number of walls with size N-1 (with size-1 block in the end) -> T(N-1)
plus the number of walls with size N-2 with 4 possible end blocks (with size 2) -> 4 T(N-2)
plus the number of walls with size N-3 with 2 possible end blocks (with size 3) -> 2 T(N-3)
And there you get your recurrence: T(N) = T(N-1) + 4 T(N-2) + 2 T(N-3).
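Since the question mentions matrix exponentiation, here is a minimal sketch of how the recurrence could be evaluated that way. The base cases T(0) = 1 (the empty wall), T(1) = 1 and T(2) = 5 follow from the block counts above. The modulus kMod and the names multiply and countWalls are assumptions made for illustration, they are not taken from the problem statement.

```cpp
#include <array>
#include <cstdint>

using Matrix = std::array<std::array<std::uint64_t, 3>, 3>;

// Assumed modulus (hypothetical): use whatever the problem statement requires.
constexpr std::uint64_t kMod = 1000000007ULL;

// Multiplies two 3x3 matrices modulo kMod.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c{};  // zero-initialised
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % kMod;
    return c;
}

// T(n) = T(n-1) + 4 T(n-2) + 2 T(n-3), with T(0) = 1, T(1) = 1, T(2) = 5.
std::uint64_t countWalls(std::uint64_t n) {
    if (n <= 1) return 1;
    if (n == 2) return 5;
    Matrix result = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};  // identity
    Matrix base = {{{1, 4, 2}, {1, 0, 0}, {0, 1, 0}}};    // transition matrix
    std::uint64_t power = n - 2;
    while (power > 0) {  // exponentiation by squaring
        if (power & 1) result = multiply(result, base);
        base = multiply(base, base);
        power >>= 1;
    }
    // [T(n), T(n-1), T(n-2)] = base^(n-2) * [T(2), T(1), T(0)]
    return (result[0][0] * 5 + result[0][1] * 1 + result[0][2] * 1) % kMod;
}
```

Each step of the loop costs a constant number of multiplications, so the whole computation takes O(log N) matrix products.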
Hope it helps!
This is an interview problem.
You are given a string. Example S = "abbba".
If I choose a substring, say "bbba", the selection gets reduced to "ba" (each run of consecutive repeated characters is collapsed to a single character).
You need to find the number of odd-length and even-length substring selections that reduce to a palindrome.
I could solve it with a 2D DP if it weren't for the reduction step, which complicates the problem.
First, reduce your entire string and, for each character of the reduced string, save how many times it is repeated at that position in the original string (this can be done in O(n)). Let the reduced string be x1x2...xk and the respective quantities be q1, q2, ..., qk.
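The reduction step described here is a run-length encoding. A minimal sketch, where the name reduceRuns and the pair-based representation are assumptions chosen for illustration:

```cpp
#include <string>
#include <utility>
#include <vector>

// Collapses each run of equal characters into one (character, run length) pair,
// so "abbba" becomes (a,1), (b,3), (a,1). Runs in O(n).
std::vector<std::pair<char, int>> reduceRuns(const std::string& s) {
    std::vector<std::pair<char, int>> runs;
    for (char c : s) {
        if (!runs.empty() && runs.back().first == c) {
            ++runs.back().second;  // same character: extend the current run
        } else {
            runs.emplace_back(c, 1);  // new character: start a new run
        }
    }
    return runs;
}
```

The first components form the reduced string x1x2...xk and the second components are the quantities q1, q2, ..., qk.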
Calculate the 2D dp you mentioned but for the reduced string (takes O(k^2)).
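Now it becomes a combinatorics problem, which can be solved with simple principles such as the additive and the multiplicative principle. A substring of the original string reduces to xi...xj (with i < j) exactly when it starts inside run i and ends inside run j, so there are qi * qj such substrings. A substring that stays inside a single run i always reduces to the one-character palindrome xi, and there are qi * (qi + 1) / 2 of those. With dp[i][j] = 1 when xi...xj is a palindrome and 0 otherwise, the total number of substrings that become palindromes after reduction is:
q1 * (q1 + 1) / 2 + ... + qk * (qk + 1) / 2 + the sum over all i < j of qi * qj * dp[i][j]
For S = "abbba" this gives 1 + 6 + 1 + 1 * 1 = 9.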
It takes O(k^2) to compute this sum.
Now, how many of those have odd length? How many have even length?
To find out, you need a few more simple observations about odd and even numbers and some careful case-by-case analysis. I will let you try it. Let me know if you run into a problem.
The problem statement is ->
We want to scan N documents using two scanners. If S1 and S2 are the time taken by the scanner 1 and scanner 2 to scan a single document, find the minimum time required to scan all the N documents.
Example:-
S1 = 2, S2 = 4, N = 2
Output: 4
Explanation: Here we have two possibilities.
Either scan both documents in scanner 1 or
scan one document in each scanner.
In both the cases time required is 4.
I came up with a solution where we find all the combinations and insert them into the set. The minimum value will be the first element in the set.
The problem is the solution will have time complexity of O(n), but the accepted time complexity is O(logn).
I am thinking on the lines of binary search but can't come up with a solution.
What should be the approach?
If you could scan a fraction of a document, the fastest way would be to scan N*S2/(S1+S2) documents in scanner 1, and N*S1/(S1+S2) documents in scanner 2. And if these are not integers, you must round one of them up and one of them down, which gives you just two possibilities to check. This is O(1), not O(log n).
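A minimal sketch of this constant-time idea, assuming S1 and S2 are positive integers. The name minScanTimeDirect is hypothetical. It takes the rounded-down share for scanner 1 and also tries the rounded-up one, then keeps the better finishing time.

```cpp
#include <algorithm>
#include <cstdint>

// Tries the two integer splits around the fractional optimum N*S2/(S1+S2).
// Assumes s1 > 0, s2 > 0 and n >= 0.
std::int64_t minScanTimeDirect(std::int64_t s1, std::int64_t s2, std::int64_t n) {
    const std::int64_t a = n * s2 / (s1 + s2);  // documents for scanner 1, rounded down
    // Both scanners run in parallel, so the finishing time is the slower of the two.
    std::int64_t best = std::max(a * s1, (n - a) * s2);
    if (a + 1 <= n) {  // rounded-up alternative
        best = std::min(best, std::max((a + 1) * s1, (n - a - 1) * s2));
    }
    return best;
}
```

For the example in the question (S1 = 2, S2 = 4, N = 2) both splits finish at time 4.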
Here is the O(log n) approach. We can find the optimal time with a binary search on the answer, i.e. on the time.
For binary search, we need a lower bound and an upper bound. Take 0 as the lower bound. Now we need the upper bound.
What is the minimum time required if we scan all the documents on a single scanner? It is min(S1, S2) * N, right? Note that here we are not using the other scanner, which could scan documents while the first one is busy. So min(S1, S2) * N is our upper bound.
We've got our bounds:
Lower bound = 0
Upper bound = min(S1, S2) * N
Now binary search on the time: take a mid and check how many documents scanner 1 and scanner 2 can scan together within mid time, which is floor(mid / S1) + floor(mid / S2). Whenever that total is >= N, mid could be the answer, so keep searching the lower half; otherwise search the upper half.
You can check BS from here - https://www.hackerearth.com/practice/algorithms/searching/binary-search/tutorial/
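The binary search on the time could be sketched as follows, assuming S1 and S2 are positive integers. The function name minScanTimeBinarySearch is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Finds the smallest time t such that floor(t/s1) + floor(t/s2) >= n.
// Assumes s1 > 0, s2 > 0 and n >= 0.
std::int64_t minScanTimeBinarySearch(std::int64_t s1, std::int64_t s2, std::int64_t n) {
    std::int64_t lo = 0;
    std::int64_t hi = std::min(s1, s2) * n;  // everything on the faster scanner
    while (lo < hi) {
        const std::int64_t mid = lo + (hi - lo) / 2;
        if (mid / s1 + mid / s2 >= n) {
            hi = mid;      // mid is enough time, try to shrink it
        } else {
            lo = mid + 1;  // mid is too short
        }
    }
    return lo;
}
```

The number of documents that can be scanned never decreases as the time grows, which is what makes the binary search valid.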
I came across an interview question:
"Given life times of different elephants. Find the period when maximum number of elephants were alive." For example:
Input: [5, 10], [6, 15], [2, 7]
Output: [6,7] (3 elephants)
I wonder if this problem can be related to the Longest substring problem for 'n' number of strings, such that each string represents the continuous range of a time period.
For e.g:
[5,10] <=> 5 6 7 8 9 10
If not, what can be a good solution to this problem ? I want to code it in C++.
Any help will be appreciated.
For each elephant, create two events: elephant born, elephant died. Sort the events by date. Now walk through the events and just keep a running count of how many elephants are alive; each time you reach a new maximum, record the starting date, and each time you go down from the maximum record the ending date.
This solution doesn't depend on the dates being integers.
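Here is a rough sketch of that event sweep. It uses integer dates only to match the example, and it assumes lifetimes are inclusive, so a birth and a death on the same date count as overlapping. The names BusiestPeriod and busiestPeriod are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct BusiestPeriod {
    int start;
    int end;
    int count;
};

// Sweeps over birth/death events and returns the first period with the most elephants alive.
BusiestPeriod busiestPeriod(const std::vector<std::pair<int, int>>& lifetimes) {
    // (date, type) with type 0 = birth and 1 = death, so births sort first on equal dates.
    std::vector<std::pair<int, int>> events;
    for (const auto& life : lifetimes) {
        events.emplace_back(life.first, 0);
        events.emplace_back(life.second, 1);
    }
    std::sort(events.begin(), events.end());

    BusiestPeriod best{0, 0, 0};
    int alive = 0;
    bool atMax = false;  // true while the current maximum period is still open
    for (const auto& event : events) {
        if (event.second == 0) {
            ++alive;
            if (alive > best.count) {  // new maximum: record the starting date
                best.count = alive;
                best.start = event.first;
                atMax = true;
            }
        } else {
            if (atMax && alive == best.count) {  // leaving the maximum: record the ending date
                best.end = event.first;
                atMax = false;
            }
            --alive;
        }
    }
    return best;
}
```

With the input from the question, [5, 10], [6, 15], [2, 7], this yields the period [6, 7] with 3 elephants. Sorting dominates the cost, so the sweep is O(n log n).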
If I were you at the interview, I would create a std::array sized to the maximum age of an elephant and then increment the elements covered by each elephant's lifetime, like:
[5,10] << increment all elements from index 5 to 10 in the array.
Then I would sort and find the biggest number.
It is also possible to use a std::map such as map<int,int> (first: period, second: number of elephants). It is sorted by key by default.
I'm wondering if you know any better solution?
This is similar to a program that checks whether parentheses are missing. It is also related to date range overlap. This subject has been beaten to death on StackOverflow and elsewhere. Here it is:
Determine Whether Two Date Ranges Overlap
I have implemented this by placing all of the start/end points in one vector of structs (or classes) and then sorting them. Then you can run through the vector and detect transitions in the number of elephants alive. (Number of elephants -- funny way of stating the problem!)
From your input I see that all the time periods overlap; in that case the solution is simple.
We are given each range as [start, end],
so the answer runs from the maximum of all starts to the minimum of all ends.
Just traverse the time periods and find the maximum of all starts and the minimum of all ends.
Note: this solution only applies when all the time periods overlap.
In your example:
Maximum of all starts = 6
Minimum of all ends = 7
I would just make two arrays, one for the times elephants are born and one for the times elephants die, and sort both arrays.
Now keep a counter (initially zero). Traverse both arrays together, always taking the smallest remaining element from either array. If the element comes from the births array, increment the counter; otherwise decrement it. The maximum value of the counter, and the time at which it is reached, are easy to track this way.
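A small sketch of this two-array merge, returning only the maximum count. It assumes every elephant dies no earlier than it is born and treats lifetimes as inclusive, so on equal dates the birth is processed first. The name maxAlive is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the largest number of elephants alive at the same time.
int maxAlive(std::vector<int> births, std::vector<int> deaths) {
    std::sort(births.begin(), births.end());
    std::sort(deaths.begin(), deaths.end());
    int alive = 0;
    int best = 0;
    std::size_t i = 0;  // next birth to process
    std::size_t j = 0;  // next death to process
    // Once all births are processed the counter can only go down, so we can stop.
    while (i < births.size()) {
        if (j >= deaths.size() || births[i] <= deaths[j]) {
            ++alive;  // smallest remaining event is a birth
            best = std::max(best, alive);
            ++i;
        } else {
            --alive;  // smallest remaining event is a death
            ++j;
        }
    }
    return best;
}
```

Recording births[i] whenever best increases would also give the start of the busiest period.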
Buttons
Each cell of an N x N grid is either a 0 or a 1. You are given two such N x N grids, the initial grid and the final grid. There is a button against each row and each column of the initial N x N grid. Pressing a row-button toggles the values of all the cells in that row, and pressing a column-button toggles the values of all the cells in that column.
You are required to find the minimum number of button presses required to transform the grid from the initial configuration to the final configuration, and the buttons that must be pressed in order to make this transformation.
When the initial and the final configurations are the same, print "0".
Input
The first line contains t, the number of test cases (about 10). Then t test cases follow.
Each test case has the following form:
The first line contains n, the size of the board (1 ≤ n ≤ 1000).
n lines follow. The ith line contains n space separated integers representing the ith row of the initial grid. Each integer is either a 0 or a 1.
n lines follow, representing the final grid, in the same format as above.
Output
For each test case, output the number of row-button presses, followed by the row buttons that must be pressed. Print the number of column-button presses next, followed by 0-indexed indices of the column buttons that must be pressed. The total number of button presses must be minimized.
Output "-1" if it is impossible to achieve the final configuration from the initial configuration. If there is more than one solution, print any one of them.
Input:
1
3
0 0 0
1 1 0
1 1 0
1 1 0
1 1 1
1 1 1
Output:
1
0
1
2
Though it works absolutely fine on my machine, Codechef does not accept my solution and gives me a wrong answer. Can anyone guide me on what to do, please?
Code has been written in C++ and compiled using g++ compiler.
In the code posted, I would revise the code after you calculate "matrixc". I find it very difficult to follow beyond that point, so I'm going to stop looking at the code and talk about the problem. For those without the code, matrix C = initial matrix - final matrix. The matrices are over the binary field.
In problems like these, look at the symmetries in the solution. There are three symmetries. One is the order of the buttons does not matter. If you take a valid solution and rearrange it, you get another valid solution. Another symmetry is that pressing a button twice is the same as not pressing it at all. The last symmetry is that if you take the complement of a valid solution, you get another valid solution. For example, in a 3x3 grid, if S = { row1, row3, col1 } is a solution, then S' = { row2, col2, col3 } is also a solution.
So all you need to do is find one solution, then exploit the symmetry. Since you only need to find one, just do the easiest thing you can think of. I would just look at column 1 and row 1 to construct the solution, then check the solution against the whole matrix. If this solution gives you more than N buttons to press for an NxN grid, then take the solution's complement and you'll end up with a smaller one.
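To make the idea concrete, here is a minimal sketch, not the asker's code. It assumes row 0 is not pressed, reads the column presses off row 0 and the row presses off column 0 of the difference matrix (computed with XOR, since the matrices are over the binary field), verifies the candidate against the whole grid, and takes the complement when more than N buttons are used. The names Grid, ButtonPresses and solveButtons are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

struct ButtonPresses {
    bool possible = false;
    std::vector<int> rows;  // 0-indexed row buttons to press
    std::vector<int> cols;  // 0-indexed column buttons to press
};

ButtonPresses solveButtons(const Grid& initial, const Grid& target) {
    const std::size_t n = initial.size();
    if (n == 0) return ButtonPresses{};
    std::vector<int> rowFlip(n, 0), colFlip(n, 0);

    // Assume row 0 is not pressed: row 0 of the difference fixes every column.
    for (std::size_t j = 0; j < n; ++j) colFlip[j] = initial[0][j] ^ target[0][j];
    // Column 0 of the difference then fixes every row.
    for (std::size_t i = 0; i < n; ++i) rowFlip[i] = initial[i][0] ^ target[i][0] ^ colFlip[0];

    // Check the candidate against the whole matrix.
    std::size_t presses = 0;
    for (std::size_t i = 0; i < n; ++i) {
        presses += rowFlip[i] + colFlip[i];
        for (std::size_t j = 0; j < n; ++j) {
            if ((initial[i][j] ^ target[i][j]) != (rowFlip[i] ^ colFlip[j])) {
                return ButtonPresses{};  // impossible
            }
        }
    }

    // The complement is also a solution; use it when it presses fewer buttons.
    const bool complement = presses > n;
    ButtonPresses result;
    result.possible = true;
    for (std::size_t i = 0; i < n; ++i) {
        if ((rowFlip[i] == 1) != complement) result.rows.push_back(static_cast<int>(i));
        if ((colFlip[i] == 1) != complement) result.cols.push_back(static_cast<int>(i));
    }
    return result;
}
```

On the sample input the candidate presses rows 1, 2 and columns 0, 1 (four buttons), so the complement, row 0 and column 2, is returned, which matches the sample output.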
Symmetry is a very important concept in computer science and it comes up almost everywhere. Understanding the symmetries of this problem is what allows you to solve it without checking every possible solution.
P.S. You say this code is C++, but it is also perfectly valid C if you remove #include <iostream> from the top. It might take a lot less time to compile if you compile it as C.
I'm looking for an algorithm, or at least theory of operation on how you would find similar text in two or more different strings...
Much like the question posed here: Algorithm to find articles with similar text, the difference being that my text strings will only ever be a handful of words.
Like say I have a string:
"Into the clear blue sky"
and I'm doing a compare with the following two strings:
"The color is sky blue" and
"In the blue clear sky"
I'm looking for an algorithm that can be used to match the text in the two, and decide on how close they match. In my case, spelling, and punctuation are going to be important. I don't want them to affect the ability to discover the real text. In the above example, if the color reference is stored as "'sky-blue'", I want it to still be able to match. However, the 3rd string listed should be a BETTER match over the second, etc.
I'm sure places like Google probably use something similar with the "Did you mean:" feature...
* EDIT *
In talking with a friend, he worked with a guy who wrote a paper on this topic. I thought I might share it with everyone reading this, as there are some really good methods and processes described in it...
Here's the link to his paper, I hope it is helpful to those reading this question, and on the topic of similar string algorithms.
Levenshtein distance will not completely work, because you want to allow rearrangements. I think your best bet is to find the best rearrangement, using the Levenshtein distance as the cost for each word pair.
Finding the cost of the rearrangement is kind of like the pancake sorting problem. So, you can try every permutation of the words (filtering out exact matches) against every permutation of the other string, trying to minimize a combination of the permutation distance and the Levenshtein distance of each word pair.
edit:
Now that I have a second I can post a quick example (all 'best' guesses are on inspection and not actually running the algorithms):
original strings | best rearrangement w/ lev distance per word
Into the clear blue sky | Into the c_lear blue sky
The color is sky blue | is__ the colo_r blue sky
R_dist = dist( 3 1 2 5 4 ) --> 3 1 2 *4 5* --> *2 1 3* 4 5 --> *1 2* 3 4 5 = 3
L_dist = (2D+S) + (I+D+S) (Total Substitutions: 2, deletions: 3, insertion: 1)
(notice all the flips include all elements in the range, and I use ranges where Xi - Xj = +/- 1)
Other example
original strings | best rearrangement w/ lev distance per word
Into the clear blue sky | Into the clear blue sky
In the blue clear sky | In__ the clear blue sky
R_dist = dist( 1 2 4 3 5 ) --> 1 2 *3 4* 5 = 1
L_dist = (2D) (Total Substitutions: 0, deletions: 2, insertion: 0)
And to show all possible combinations of the three...
The color is sky blue | The colo_r is sky blue
In the blue clear sky | the c_lear in sky blue
R_dist = dist( 2 4 1 3 5 ) --> *2 3 1 4* 5 --> *1 3 2* 4 5 --> 1 *2 3* 4 5 = 3
L_dist = (D+I+S) + (S) (Total Substitutions: 2, deletions: 1, insertion: 1)
However you define the cost function, the second choice will have the lowest cost, which is what you expected!
One way to determine a measure of "overall similarity without respect to order" is to use some kind of compression-based distance. Basically, the way most compression algorithms (e.g. gzip) work is to scan along a string looking for string segments that have appeared earlier -- any time such a segment is found, it is replaced with an (offset, length) pair identifying the earlier segment to use. You can use measures of how well two strings compress to detect similarities between them.
Suppose you have a function string comp(string s) that returns a compressed version of s. You can then use the following expression as a "similarity score" between two strings s and t:
len(comp(s)) + len(comp(t)) - len(comp(s . t))
where . is taken to be concatenation. The idea is that you are measuring how much further you can compress t by looking at s first. If s == t, then len(comp(s . t)) will be barely any larger than len(comp(s)) and you'll get a high score, while if they are completely different, len(comp(s . t)) will be very near len(comp(s)) + len(comp(t)) and you'll get a score near zero. Intermediate levels of similarity produce intermediate scores.
Actually the following formula is even better as it is symmetric (i.e. the score doesn't change depending on which string is s and which is t):
2 * (len(comp(s)) + len(comp(t))) - len(comp(s . t)) - len(comp(t . s))
This technique has its roots in information theory.
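The symmetric score can be sketched like this. The compressor is passed in as a callback that returns len(comp(x)), because the choice of compression library is left open here. The names CompressedSize and similarityScore are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Callback returning the compressed length of a string, i.e. len(comp(x)).
// Plug in whichever compressor you have available.
using CompressedSize = std::function<std::size_t(const std::string&)>;

// 2 * (len(comp(s)) + len(comp(t))) - len(comp(s . t)) - len(comp(t . s))
long long similarityScore(const std::string& s, const std::string& t,
                          const CompressedSize& compressedSize) {
    const long long cs = static_cast<long long>(compressedSize(s));
    const long long ct = static_cast<long long>(compressedSize(t));
    const long long cst = static_cast<long long>(compressedSize(s + t));
    const long long cts = static_cast<long long>(compressedSize(t + s));
    return 2 * (cs + ct) - cst - cts;
}
```

Because the score depends on string lengths, it may be worth normalising it (for example by cs + ct) before comparing scores across pairs of very different sizes.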
Advantages: good compression algorithms are already available, so you don't need to do much coding, and they run in linear time (or nearly so) so they're fast. By contrast, solutions involving all permutations of words grow super-exponentially in the number of words (although admittedly that may not be a problem in your case as you say you know there will only be a handful of words).
One way (although this is perhaps better suited to a spellcheck-type algorithm) is the "edit distance", i.e. calculating how many edits it takes to transform one string into another. A common technique is found here:
http://en.wikipedia.org/wiki/Levenshtein_distance
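For reference, a compact sketch of the standard dynamic-programming Levenshtein distance, keeping only two rows of the table. The function name levenshtein is chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimum number of single-character insertions, deletions and substitutions
// needed to turn a into b. Runs in O(|a| * |b|) time and O(|b|) memory.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> b[0..j): j insertions
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // a[0..i) -> "": i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

For word-level comparisons such as "Into" versus "In" this gives 2 (two deletions).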
You might want to look into the algorithms used by biologists to compare DNA sequences, since they have to cope with many of the same things (chunks may be missing, or have been inserted, or just moved to a different position in the string).
The Smith-Waterman algorithm would be one example that'd probably work fairly well, although it might be too slow for your uses. Might give you a starting point, though.
I had a similar problem: I needed to get the percentage of characters in a string that were similar. It needed exact sequences, so for example "hello sir" and "sir hello", when compared, needed to give me five characters that are the same, in this case the two "hello"s. It would then take the length of the longer of the two strings and give me a percentage of how similar they were. This is the code that I came up with:
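#include <algorithm>
#include <cstddef>
#include <string>
using std::string;

// Length of the longest sequence of characters that appears contiguously in
// both strings, as a percentage of the length of a (expected to be the longer one).
int bigger(const string& a, const string& b){
    if(a.empty()) return 0; // avoid dividing by zero
    std::size_t maxcount = 0; // longest common run found so far
    for(std::size_t i = 0; i < a.size(); ++i){
        for(std::size_t j = 0; j < b.size(); ++j){
            // count matching characters starting at a[i] and b[j]
            std::size_t currentcount = 0;
            while(i + currentcount < a.size() && j + currentcount < b.size()
                  && a[i + currentcount] == b[j + currentcount]){
                ++currentcount;
            }
            maxcount = std::max(maxcount, currentcount);
        }//end inner for loop
    }//end outer for loop
    return (int)(((float)maxcount / (float)a.size()) * 100);
}

// Always passes the longer string first, so the percentage is relative to it.
int compare(const string& a, const string& b){
    return a.size() > b.size() ? bigger(a, b) : bigger(b, a);
}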
I can't mark two answers here, so I'm going to answer and mark my own. The Levenshtein distance appears to be the correct method in most cases for this. But it is worth mentioning j_random_hacker's answer as well. I have used an implementation of LZMA to test his theory, and it proves to be a sound solution. In my original question I was looking for a method for short strings (2 to 200 chars), where the Levenshtein distance algorithm will work. But not mentioned in the question was the need to compare two larger strings (in this case, text files of moderate size) and to perform a quick check to see how similar the two are. I believe that this compression technique will work well, but I have yet to study it to find at which point one becomes better than the other, in terms of the size of the sample data and the speed/cost of the operation in question. I think a lot of the answers given to this question are valuable, and worth mentioning, for anyone looking to solve a similar string ordeal like I'm doing here. Thank you all for your great answers, and I hope they can be used to serve others well too.
There's another way. Pattern recognition using convolution. Image A is run thru a Fourier transform. Image B also. Now superimposing F(A) over F(B) then transforming this back gives you a black image with a few white spots. Those spots indicate where A matches B strongly. Total sum of spots would indicate an overall similarity. Not sure how you'd run an FFT on strings but I'm pretty sure it would work.
The difficulty would be to match the strings semantically.
You could generate some kind of value based on the lexical properties of the string, e.g. they both have "blue" and "sky", and they're in the same sentence, etc. But it won't handle cases like "Sky's jean is blue", or some other oddball English construction that uses the same words; for that you'd need to parse the English grammar...
To do anything beyond lexical similarity, you'd need to look at natural language processing, and there isn't going to be one single algorithm that solves your problem.
Possible approach:
Construct a Dictionary with a string key of "word1|word2" for all combinations of words in the reference string. A single combination may happen multiple times, so the value of the Dictionary should be a list of numbers, each representing the distance between the words in the reference string.
When you do this, there will be duplication here: for every "word1|word2" dictionary entry, there will be a "word2|word1" entry with the same list of distance values, but negated.
For each combination of words in the comparison string (words 1 and 2, words 1 and 3, words 2 and 3, etc.), check the two keys (word1|word2 and word2|word1) in the reference string and find the closest value to the distance in the current string. Add the absolute value of the difference between the current distance and the closest distance to a counter.
If the closest reference distance between the words is in the opposite direction (word2|word1) as the comparison string, you may want to weight it smaller than if the closest value was in the same direction in both strings.
When you are finished, divide the sum by the square of the number of words in the comparison string.
This should provide some decimal value representing how closely each word/phrase matches some word/phrase in the original string.
Of course, if the original string is longer, it won't account for that, so it may be necessary to compute this both directions (using one as the reference, then the other) and average them.
I have absolutely no code for this, and I probably just re-invented a very crude wheel. YMMV.