I have been trying to write code that determines if a matrix is upper triangular or not, and print it.
I have tried while loops, double for loops, nothing. Here is the mess I currently have:
int i, j;
int count = 0;
bool upper;
while (upper = true;)
{
for (i=1; i<m; i++)
{
for (j=i-1; j<n; j++)
{
if (a[i] > a[j] && a[i][j] == 0.0)
upper = true;
else if (a[i][j] != 0.0)
upper = false;
}
}
}
// cout << "Matrix is upper triangular. " << count << endl;
Look at example:
|0|1|2|3|4|5|
0| | | | | | |
1|X| | | | | |
2|X|X| | | | |
3|X|X|X| | | |
4|X|X|X|X| | |
5|X|X|X|X|X| |
This matrix is upper triangular is cells marked with X are all zero.
For i-th row - the cells {i,0},{i,1}, ... , {i,i-1} must be zero.
So it is easy task:
bool isUpperTriangle = true; // be optimistic!
for (int i = 1; i < SIZE && isUpperTriangle; ++i) // rows
for (int j = 0; j < i && isUpperTriangle; ++j) // columns - see example
if (m[i][j] != 0)
isUpperTriangle = false;
Whether the matrix is upper triangular or not can only be determined by checking the whole lower part. If you encounter a non-zero element along the way, you know it's not upper triangular. You cannot make that determination until you checked the whole lower part. So your:
upper = true;
statement while you're still in the loop has no logical basis.
The problem is similar to a character search inside a string. You need to check the whole string. If you arrived at the end of the string and still didn't find the character you're looking for, then (and only then) do you know that the character isn't in the string. The only difference with the matrix problem is that you've now got one additional dimension. Or, in other words, multiple one-dimensional arrays, each one +1 in size compared to the previous array, and you got to search them all.
I think this will likely do what you're looking for. Note: this assume the matrix is square. If it is not (i.e. m!=n) you should return false immediately:
bool upper = true;
for (i=1; i<m && upper; ++i)
for (j=0; j<i && (upper = (0 == a[i][j])); ++j);
Have you considered using a matrix library that has this function built in? I use the Eigen library quite often and I find the syntax very easy to use - they also have a short and useful tutorial to become familiar rather quickly.
http://eigen.tuxfamily.org/index.php?title=Main_Page
Related
Problem description:
There's a chocolate bar that consists of m x n squares. Some of the squares are black, some are white. Someone breaks the chocolate bar along its vertical axis or horizontal axis. Then it is broken again along its vertical or horizontal axis and it's being broken until it can broken into a single square or it can broken into squares that are only black or only white. Using a preferably divide-and-conquer algorithm, find the number of methods a chocolate bar can be broken.
Input:
The first line tells you the m x n dimensions of the chocolate bar. In the next m lines there are n characters that tell you how does the chocolate bar look. Letter w is a white square, letter b is a black square.
for example:
3 2
bwb
wbw
Output:
the number of methods the chocolate bar can be broken:
for the example above, it's 5 (take a look at the attached picture).
I tried to solve it using an iterative approach. Unfortunately, I couldn't finish the code as I'm not yet sure how to divide the the halves (see my code below). I was told that an recursive approach is much easier than this, but I have no idea how to do it. I'm looking for another way to solve this problem than my approach or I'm looking for some help with finishing my code.
I made two 2D arrays, first for white squares, second for black squares. I'm making a matrix out of the squares and if there's a chocolate of such or such color, then I'm marking it as 1 in the corresponding array.
Then I made two arrays of the two cumulative sums of the matrices above.
Then I created a 4D array of size [n][m][n][m] and I made four loops: first two (i, j) are increasing the size of an rectangular array that is the size of the searching array (it's pretty hard to explain...) and two more loops (k, l) are increasing the position of my starting points x and y in the array. Then the algorithm checks using the cumulative sum if in the area starting at position kxl and ending at k+i x l+j there is one black and one white square. If there is, then I'm creating two more loops that will divide the area in half. If in the two new halves there are still black and white squares, then I'm increasing the corresponding 4D array element by the number of combinations of the first halve * the number of combinations of the second halve.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
int counter=0;
int n, m;
ifstream in;
in.open("in.txt");
ofstream out;
out.open("out.txt");
if(!in.good())
{
cout << "No such file";
return 0;
}
in >> n >> m;
int whitesarray[m][n];
int blacksarray[m][n];
int methodsarray[m][n][m][n];
for(int i=0; i<m; i++)
{
for(int j=0; j<n; j++)
{
whitesarray[i][j] = 0;
blacksarray[i][j] = 0;
}
}
while(in)
{
string colour;
in >> colour;
for (int i=0; i < colour.length(); i++)
{
if(colour[i] == 'c')
{
blacksarray[counter][i] = 1;
}
if(colour[i] == 'b')
{
whitesarray[counter][i] = 1;
}
}
counter++;
}
int whitessum[m][n];
int blackssum[m][n];
for (int i=0; i<m; i++)
{
for (int j=0; j<n; j++)
{
if(i-1 == -1 && j-1 == -1)
{
whitessum[i][j] = whitesarray[i][j];
blackssum[i][j] = blacksarray[i][j];
}
if(i-1 == -1 && j-1 != -1)
{
whitessum[i][j] = whitessum[i][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i][j-1] + blacksarray[i][j];
}
if(j-1 == -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blacksarray[i][j];
}
if(j-1 != -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitessum[i][j-1] - whitessum[i-1][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blackssum[i][j-1] - blackssum[i-1][j-1] + blacksarray[i][j];
}
}
}
int posx=0;
int posy=0;
int tempwhitessum=0;
int tempblackssum=0;
int k=0, l=0;
for (int i=0; i<=m; i++)
{
for (int j=0; j<=n; j++) // wielkosc wierszy
{
for (posx=0; posx < m - i; posx++)
{
for(posy = 0; posy < n - j; posy++)
{
k = i+posx-1;
l = j+posy-1;
if(k >= m || l >= n)
continue;
if(posx==0 && posy==0)
{
tempwhitessum = whitessum[k][l];
tempblackssum = blackssum[k][l];
}
if(posx==0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[k][posy-1];
tempblackssum = blackssum[k][l] - blackssum[k][posy-1];
}
if(posx!=0 && posy==0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l];
}
if(posx!=0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l] - whitessum[k][posy-1] + whitessum[posx-1][posy-1];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l] - blackssum[k][posy-1] + blackssum[posx-1][posy-1];
}
if(tempwhitessum >0 && tempblackssum > 0)
{
for(int e=0; e<n; e++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
for(int r=0; r<m; r++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
}
}
}
}}
return 0;
}
I strongly recommend recursion for this. In fact, Dynamic Programming (DP) would also be very useful, especially for larger bars. Recursion first ...
Recursion
Your recursive routine takes a 2-D array of characters (b and w). It returns the number of ways this can be broken.
First, the base cases: (1) if it's possible to break the given bar into a single piece (see my comment above, asking for clarification), return 1; (2) if the array is all one colour, return 1. For each of these, there's only one way for the bar to end up -- the way it was passed in.
Now, for the more complex case, when the bar can still be broken:
total_ways = 0
for each non-edge position in each dimension:
break the bar at that spot; form the two smaller bars, A and B.
count the ways to break each smaller bar: count(A) and count(B)
total_ways += count(A) * count(B)
return total_ways
Is that clear enough for the general approach? You still have plenty of coding to do, but using recursion allows you to think of only the two basic ideas when writing your function: (1) How do I know when I'm done, and what trivial result do I return then? (2) If I'm not done, how do I reduce the problem?
Dynamic Programming
This consists of keeping a record of situations you've already solved. The first thing you do in the routine is to check your "data base" to see whether you already know this case. If so, return the known result instead of recomputing. This includes the overhead of developing and implementing said data base, probably a look-up list (dictionary) of string arrays and integer results, such as ["bwb", "wbw"] => 5.
My problem is to find the repeating sequence of characters in the given array. simply, to identify the pattern in which the characters are appearing.
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
1: | J | A | M | E | S | O | N | J | A | M | E | S | O | N |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
2: | R | O | N | R | O | N | R | O | N | R | O | N | R | O | N |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.
3: | S | H | A | M | I | L | S | H | A | M | I | L |
'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
4: | C | A | R | P | E | N | T | E | R | C | A | R | P | E | N | T | E | R |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
Example
Given the previous data, the result should be:
"JAMESON"
"RON"
"SHAMIL"
"CARPENTER"
Question
How to deal with this problem efficiently?
Tongue-in-cheek O(NlogN) solution
Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.
For your examples, my first approach would be to
get the first character of the array (for your last example, that would be C)
get the index of the next appearance of that character in the array (e.g. 9)
if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER)
if it is found, you're done (and the result is this substring).
Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)
If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).
Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.
In Python, you can leverage regexes thus:
def recurrence(text):
import re
for i in range(1, len(text)/2 + 1):
m = re.match(r'^(.{%d})\1+$'%i, text)
if m: return m.group(1)
recurrence('abcabc') # Returns 'abc'
I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)
First write a method that find repeating substring sub in the container string as below.
boolean findSubRepeating(String sub, String container);
Now keep calling this method with increasing substring in the container, first try 1 character substring, then 2 characters, etc going upto container.length/2.
Pseudocode
len = str.length
for (i in 1..len) {
if (len%i==0) {
if (str==str.substr(0,i).repeat(len/i)) {
return str.substr(0,i)
}
}
}
Note: For brevity, I'm inventing a "repeat" method for strings, which isn't actually part of Java's string; "abc".repeat(2)="abcabc"
Using C++:
//Splits the string into the fragments of given size
//Returns the set of of splitted strings avaialble
set<string> split(string s, int frag)
{
set<string> uni;
int len = s.length();
for(int i = 0; i < len; i+= frag)
{
uni.insert(s.substr(i, frag));
}
return uni;
}
int main()
{
string out;
string s = "carpentercarpenter";
int len = s.length();
//Optimistic approach..hope there are only 2 repeated strings
//If that fails, then try to break the strings with lesser number of
//characters
for(int i = len/2; i>1;--i)
{
set<string> uni = split(s,i);
if(uni.size() == 1)
{
out = *uni.begin();
break;
}
}
cout<<out;
return 0;
}
The first idea that comes to my mind is trying all repeating sequences of lengths that divide length(S) = N. There is a maximum of N/2 such lengths, so this results in a O(N^2) algorithm.
But i'm sure it can be improved...
Here is a more general solution to the problem, that will find repeating subsequences within an sequence (of anything), where the subsequences do not have to start at the beginning, nor immediately follow each other.
given an sequence b[0..n], containing the data in question, and a threshold t being the minimum subsequence length to find,
l_max = 0, i_max = 0, j_max = 0;
for (i=0; i<n-(t*2);i++) {
for (j=i+t;j<n-t; j++) {
l=0;
while (i+l<j && j+l<n && b[i+l] == b[j+l])
l++;
if (l>t) {
print "Sequence of length " + l + " found at " + i + " and " + j);
if (l>l_max) {
l_max = l;
i_max = i;
j_max = j;
}
}
}
}
if (l_max>t) {
print "longest common subsequence found at " + i_max + " and " + j_max + " (" + l_max + " long)";
}
Basically:
Start at the beginning of the data, iterate until within 2*t of the end (no possible way to have two distinct subsequences of length t in less than 2*t of space!)
For the second subsequence, start at least t bytes beyond where the first sequence begins.
Then, reset the length of the discovered subsequence to 0, and check to see if you have a common character at i+l and j+l. As long as you do, increment l.
When you no longer have a common character, you have reached the end of your common subsequence.
If the subsequence is longer than your threshold, print the result.
Just figured this out myself and wrote some code for this (written in C#) with a lot of comments. Hope this helps someone:
// Check whether the string contains a repeating sequence.
public static bool ContainsRepeatingSequence(string str)
{
if (string.IsNullOrEmpty(str)) return false;
for (int i=0; i<str.Length; i++)
{
// Every iteration, cut down the string from i to the end.
string toCheck = str.Substring(i);
// Set N equal to half the length of the substring. At most, we have to compare half the string to half the string. If the string length is odd, the last character will not be checked against, but it will be checked in the next iteration.
int N = toCheck.Length / 2;
// Check strings of all lengths from 1 to N against the subsequent string of length 1 to N.
for (int j=1; j<=N; j++)
{
// Check from beginning to j-1, compare against j to j+j.
if (toCheck.Substring(0, j) == toCheck.Substring(j, j)) return true;
}
}
return false;
}
Feel free to ask any questions if it's unclear why it works.
and here is a concrete working example:
/* find greatest repeated substring */
char *fgrs(const char *s,size_t *l)
{
char *r=0,*a=s;
*l=0;
while( *a )
{
char *e=strrchr(a+1,*a);
if( !e )
break;
do {
size_t t=1;
for(;&a[t]!=e && a[t]==e[t];++t);
if( t>*l )
*l=t,r=a;
while( --e!=a && *e!=*a );
} while( e!=a && *e==*a );
++a;
}
return r;
}
size_t t;
const char *p;
p=fgrs("BARBARABARBARABARBARA",&t);
while( t-- ) putchar(*p++);
p=fgrs("0123456789",&t);
while( t-- ) putchar(*p++);
p=fgrs("1111",&t);
while( t-- ) putchar(*p++);
p=fgrs("11111",&t);
while( t-- ) putchar(*p++);
Not sure how you define "efficiently". For easy/fast implementation you could do this in Java:
private static String findSequence(String text) {
Pattern pattern = Pattern.compile("(.+?)\\1+");
Matcher matcher = pattern.matcher(text);
return matcher.matches() ? matcher.group(1) : null;
}
it tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.
This is a solution I came up with using the queue, it passed all the test cases of a similar problem in codeforces. Problem No is 745A.
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s, s1, s2; cin >> s; queue<char> qu; qu.push(s[0]); bool flag = true; int ind = -1;
s1 = s.substr(0, s.size() / 2);
s2 = s.substr(s.size() / 2);
if(s1 == s2)
{
for(int i=0; i<s1.size(); i++)
{
s += s1[i];
}
}
//cout << s1 << " " << s2 << " " << s << "\n";
for(int i=1; i<s.size(); i++)
{
if(qu.front() == s[i]) {qu.pop();}
qu.push(s[i]);
}
int cycle = qu.size();
/*queue<char> qu2 = qu; string str = "";
while(!qu2.empty())
{
cout << qu2.front() << " ";
str += qu2.front();
qu2.pop();
}*/
while(!qu.empty())
{
if(s[++ind] != qu.front()) {flag = false; break;}
qu.pop();
}
flag == true ? cout << cycle : cout << s.size();
return 0;
}
I'd convert the array to a String object and use regex
Put all your character in an array e.x. a[]
i=0; j=0;
for( 0 < i < count )
{
if (a[i] == a[i+j+1])
{++i;}
else
{++j;i=0;}
}
Then the ratio of (i/j) = repeat count in your array.
You must pay attention to limits of i and j, but it is the simple solution.
I want to create a program to search for a subtext in a text.
For example, I have this text: abcdeabbdfeg
And in that text I want to find: cd
But I want to use this algorithm:
start = 1
end = string length of the text
middle = (start + end) / 2
if (pattern < text[middle]) end = mid - 1;
if (pattern > text[middle]) start = mid + 1;
...and continue until the pattern is found in the text
So, I already have a simple program that completely works without any problem but without that algorithm above, so now I only want to implement that algorithm above in my program, I have tried many ways, but my program won't show anything in any case, after I add that algorithm...
This is the code that I have and works:
void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
for (int i = 0; i <= N - M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (txt[i+j] != pat[j])
break;
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
And this is the code above with the implementation of the algorithm:
int _tmain(int argc, _TCHAR* argv[])
{
char t[32];
cout << "Please enter your text (t):";
cin >> t;
char p[32];
cout << "Please enter the pattern (p) you wish to look for in that text (t):";
cin >> p;
int start, end = 0;
double middle = 0;
start = 1;
end = strlen(t);
while (start <= end)
{
int M = strlen(p);
int N = strlen(t);
middle = std::ceil((start + end) / 2.0);
int mid = (int)middle;
for (int i = mid; i <= M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (t[mid] != p[j]) break;
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
if (start > end) cout << "Search has ended: pattern p does not occur in the text." << endl;
return 0;
}
Your algorithm is still a binary search. You split the array into two partitions, then select a partition, based on the value of a letter.
The requirements of a partitioned search is to have an ordered collection.
Let's use your example.
0 1 2 3 4 5 6 7 8 9 10 11
+---+---+---+---+---+---+---+---+---+---+---+---+
| a | b | c | d | e | a | b | b | d | f | e | g |
+---+---+---+---+---+---+---+---+---+---+---+---+
If you choose the midpoint at index 5, this yields the letter a. Since you are searching for the letter c first, then d, the algorithm says that the letter c must lie in the partition 6..11. Thus the basis of your issue.
The algorithm will not find cd because there is no c in the partition 6..11.
The algorithm assumes that the array is sorted and that for any given index, there will be one partition containing values less than array[index] and one partition containing values greater than array[index].
This assumption is demonstrated by the following code of yours:
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
No matter how you name your algorithm, if it assumes an ordering on the array (e.g. p[j] < t[mid]), the array must be ordered.
You data is not ordered, so your algorithm fails the assumption, and thus the algorithm fails.
Edit 1:
Using Partitions
If you really must use a partitioning algorithm, you will need to build a set of partitions.
For example one partition starts at index 0 and proceeds until array[i] > array[i+1], this ends up at index 4. The other partition is 5..11.
(By the way, by determining the partitions, you have used more operations than a linear search.)
At this point, how do you know which partition to choose?
You don't. The letter c, that you are searching for, lies between a and e in the first partition; and a through g in the second partition. Pick a partition. If not found in the partition, you will have to search the other partition.
By performing a binary search on either partition, you have used more operations than a linear search.
Recently i was taking part in one Hackathon and i came to know about a problem which tries to find a pattern of a grid form in a 2d matrix.A pattern could be U,H and T and will be represented by 3*3 matrix
suppose if i want to present H and U
+--+--+--+ +--+--+--+
|1 |0 |1 | |1 |0 |1 |
+--+--+--+ +--+--+--+
|1 |1 |1 | --> H |1 |0 |1 | -> U
+--+--+--+ +--+--+--+
|1 |0 |1 | |1 |1 |1 |
+--+--+--+ +--+--+--+
Now i need to search this into 10*10 matrix containing 0s and 1s.Closest and only solution i can get it brute force algorithm of O(n^4).In languages like MATLAB and R there are very subtle ways to do this but not in C,C++. I tried a lot to search this solution on Google and on SO.But closest i can get is this SO POST which discuss about implementing Rabin-Karp string-search algorithm .But there is no pseudocode or any post explaining this.Could anyone help or provide any link,pdf or some logic to simplify this?
EDIT
as Eugene Sh. commented that If N is the size of the large matrix(NxN) and k - the small one (kxk), the buteforce algorithm should take O((Nk)^2). Since k is fixed, it is reducing to O(N^2).Yes absolutely right.
But is there is any generalised way if N and K is big?
Alright, here is then the 2D Rabin-Karp approach.
For the following discussion, assume we want to find a (m, m) sub-matrix inside a (n, n) matrix. (The concept works for rectangular matrices just as well but I ran out of indices.)
The idea is that for each possible sub-matrix, we compute a hash. Only if that hash matches the hash of the matrix we want to find, we will compare element-wise.
To make this efficient, we must avoid re-computing the entire hash of the sub-matrix each time. Because I got little sleep tonight, the only hash function for which I could figure out how to do this easily is the sum of 1s in the respective sub-matrix. I leave it as an exercise to someone smarter than me to figure out a better rolling hash function.
Now, if we have just checked the sub-matrix from (i, j) to (i + m – 1, j + m – 1) and know it has x 1s inside, we can compute the number of 1s in the sub-matrix one to the right – that is, from (i, j + 1) to (i + m – 1, j + m) – by subtracting the number of 1s in the sub-vector from (i, j) to (i + m – 1, j) and adding the number of 1s in the sub-vector from (i, j + m) to (i + m – 1, j + m).
If we hit the right margin of the large matrix, we shift the window down by one and then back to the left margin and then again down by one and then again to the right and so forth.
Note that this requires O(m) operations, not O(m2) for each candidate. If we do this for every pair of indices, we get O(mn2) work. Thus, by cleverly shifting a window of the size of the potential sub-matrix through the large matrix, we can reduce the amount of work by a factor of m. That is, if we don't get too many hash collisions.
Here is a picture:
As we shift the current window one to the right, we subtract the number of 1s in the red column vector on the left side and add the number of 1s in the green column vector on the right side to obtain the number of 1s in the new window.
I have implemented a quick demo of this idea using the great Eigen C++ template library. The example also uses some stuff from Boost but only for argument parsing and output formatting so you can easily get rid of it if you don't have Boost but want to try out the code. The index fiddling is a bit messy but I'll leave it without further explanation here. The above prose should cover it sufficiently.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <random>
#include <type_traits>
#include <utility>
#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>
#include <Eigen/Dense>
#define PROGRAM "submatrix"
#define SEED_CSTDLIB_RAND 1
using BitMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
using Index1D = BitMatrix::Index;
using Index2D = std::pair<Index1D, Index1D>;
std::ostream&
operator<<(std::ostream& out, const Index2D& idx)
{
out << "(" << idx.first << ", " << idx.second << ")";
return out;
}
BitMatrix
get_random_bit_matrix(const Index1D rows, const Index1D cols)
{
auto matrix = BitMatrix {rows, cols};
matrix.setRandom();
return matrix;
}
Index2D
findSubMatrix(const BitMatrix& haystack,
const BitMatrix& needle,
Index1D *const collisions_ptr = nullptr) noexcept
{
static_assert(std::is_signed<Index1D>::value, "unsigned index type");
const auto end = Index2D {haystack.rows(), haystack.cols()};
const auto hr = haystack.rows();
const auto hc = haystack.cols();
const auto nr = needle.rows();
const auto nc = needle.cols();
if (nr > hr || nr > hc)
return end;
const auto target = needle.count();
auto current = haystack.block(0, 0, nr - 1, nc).count();
auto j = Index1D {0};
for (auto i = Index1D {0}; i <= hr - nr; ++i)
{
if (j == 0) // at left margin
current += haystack.block(i + nr - 1, 0, 1, nc).count();
else if (j == hc - nc) // at right margin
current += haystack.block(i + nr - 1, hc - nc, 1, nc).count();
else
assert(!"this should never happen");
while (true)
{
if (i % 2 == 0) // moving right
{
if (j > 0)
current += haystack.block(i, j + nc - 1, nr, 1).count();
}
else // moving left
{
if (j < hc - nc)
current += haystack.block(i, j, nr, 1).count();
}
assert(haystack.block(i, j, nr, nc).count() == current);
if (current == target)
{
// TODO: There must be a better way than using cwiseEqual().
if (haystack.block(i, j, nr, nc).cwiseEqual(needle).all())
return Index2D {i, j};
else if (collisions_ptr)
*collisions_ptr += 1;
}
if (i % 2 == 0) // moving right
{
if (j < hc - nc)
{
current -= haystack.block(i, j, nr, 1).count();
++j;
}
else break;
}
else // moving left
{
if (j > 0)
{
current -= haystack.block(i, j + nc - 1, nr, 1).count();
--j;
}
else break;
}
}
if (i % 2 == 0) // at right margin
current -= haystack.block(i, hc - nc, 1, nc).count();
else // at left margin
current -= haystack.block(i, 0, 1, nc).count();
}
return end;
}
int
main(int argc, char * * argv)
{
if (SEED_CSTDLIB_RAND)
{
std::random_device rnddev {};
srand(rnddev());
}
if (argc != 5)
{
std::cerr << "usage: " << PROGRAM
<< " ROWS_HAYSTACK COLUMNS_HAYSTACK"
<< " ROWS_NEEDLE COLUMNS_NEEDLE"
<< std::endl;
return EXIT_FAILURE;
}
auto hr = boost::lexical_cast<Index1D>(argv[1]);
auto hc = boost::lexical_cast<Index1D>(argv[2]);
auto nr = boost::lexical_cast<Index1D>(argv[3]);
auto nc = boost::lexical_cast<Index1D>(argv[4]);
const auto haystack = get_random_bit_matrix(hr, hc);
const auto needle = get_random_bit_matrix(nr, nc);
auto collisions = Index1D {};
const auto idx = findSubMatrix(haystack, needle, &collisions);
const auto end = Index2D {haystack.rows(), haystack.cols()};
std::cout << "This is the haystack:\n\n" << haystack << "\n\n";
std::cout << "This is the needle:\n\n" << needle << "\n\n";
if (idx != end)
std::cout << "Found as sub-matrix at " << idx << ".\n";
else
std::cout << "Not found as sub-matrix.\n";
std::cout << boost::format("There were %d (%.2f %%) hash collisions.\n")
% collisions
% (100.0 * collisions / ((hr - nr) * (hc - nc)));
return (idx != end) ? EXIT_SUCCESS : EXIT_FAILURE;
}
While it compiles and runs, please consider the above as pseudo-code. I have made almost no attempt at optimizing it. It was just a proof-of concept for myself.
I'm going to present an algorithm that takes O(n*n) time in the worst case whenever k = O(sqrt(n)) and O(n*n + n*k*k) in general. This is an extension of Aho-Corasick to 2D. Recall that Aho-Corasick locates all occurrences of a set of patterns in a target string T, and it does so in time linear in pattern lengths, length of T, and number of occurrences.
Let's introduce some terminology. The haystack is the large matrix we are searching in and the needle is the pattern-matrix. The haystack is a nxn matrix and the needle is a kxk matrix. The set of patterns that we are going to use in Aho-Corasick is the set of rows of the needle. This set contains at most k rows and will have fewer if there are duplicate rows.
We are going to build the Aho-Corasick automaton (which is a Trie augmented with failure links) and then run the search algorithm on each row of the haystack. So we take every row of the needle and search for it in every row of the haystack. We can use a linear time 1D matching algorithm to do this but that would still be inefficient. The advantage of Aho-Corasick is that it searches for all patterns at once.
During the search we are going to populate a matrix A which we are going to use later. When we search in the first row of the haystack, the first row of A is filled with the occurrences of rows of the needle in the first row of the haystack. So we'll end up with a first row of A that looks like 2 - 0 - - 1 for example. This means that row number 0 of the needle appears at position 2 in the first row of the haystack; row number 1 appears at position 5; row number 2 appears at position 0. The - entries are positions that did not get matched. Keep doing this for every row.
Let's for now assume that there are no duplicate rows in the needle. Assign 0 to the first row of the needle, 1 to the second, and so on. Now we are going to search for the pattern [0 1 2 ... k-1] in every column of the matrix A using a linear time 1D search algorithm (KMP for example). Recall that every row of A stores positions at which rows of the needle appear. So if a column contains the pattern [0 1 2 ... k-1], this means that row number 0 of the needle appears at some row of the haystack, row number 1 of the needle is just below it, and so on. This is exactly what we want. If there are duplicate rows, just assign a unique number to each unique row.
Search in column takes O(n) using a linear time algorithm. So searching all columns takes O(n*n). We populate the matrix during the search, we search every row of the haystack (there are n rows) and the search in a row takes O(n+k*k). So O(n(n+k*k)) overall.
So the idea was to find that matrix and then reduce the problem to 1D pattern matching. Aho-Corasick is just there for efficiency, I don't know if there is another efficient way to find the matrix.
EDIT: added implementation.
Here is my c++ implementation. The max value of n is set to 100 but you can change it.
The program starts by reading two integers n k (the dimensions of the matrices). Then it reads n lines each containing a string of 0's and 1's of length n. Then it reads k lines each containing a string of 0's and 1's of length k. The output is the upper-left coordinate of all matches. For the following input for example.
12 2
101110111011
111010111011
110110111011
101110111010
101110111010
101110111010
101110111010
111010111011
111010111011
111010111011
111010111011
111010111011
11
10
The program will output:
match at (2,0)
match at (1,1)
match at (0,2)
match at (6,2)
match at (2,10)
#include <cstdio>
#include <cstring>
#include <string>
#include <queue>
#include <iostream>
using namespace std;
const int N = 100;
const int M = N;
int n, m;
string haystack[N], needle[M];
int A[N][N]; /* filled by successive calls to match */
int p[N]; /* pattern to search for in columns of A */
struct Node
{ Node *a[2]; /* alphabet is binary */
Node *suff; /* pointer to node whose prefix = longest proper suffix of this node */
int flag;
Node()
{ a[0] = a[1] = 0;
suff = 0;
flag = -1;
}
};
void insert(Node *x, string s)
{ static int id = 0;
static int p_size = 0;
for(int i = 0; i < s.size(); i++)
{ char c = s[i];
if(x->a[c - '0'] == 0)
x->a[c - '0'] = new Node;
x = x->a[c - '0'];
}
if(x->flag == -1)
x->flag = id++;
/* update pattern */
p[p_size++] = x->flag;
}
Node *longest_suffix(Node *x, int c)
{ while(x->a[c] == 0)
x = x->suff;
return x->a[c];
}
Node *mk_automaton(void)
{ Node *trie = new Node;
for(int i = 0; i < m; i++)
{ insert(trie, needle[i]);
}
queue<Node*> q;
/* level 1 */
for(int i = 0; i < 2; i++)
{ if(trie->a[i])
{ trie->a[i]->suff = trie;
q.push(trie->a[i]);
}
else trie->a[i] = trie;
}
/* level > 1 */
while(q.empty() == false)
{ Node *x = q.front(); q.pop();
for(int i = 0; i < 2; i++)
{ if(x->a[i] == 0) continue;
x->a[i]->suff = longest_suffix(x->suff, i);
q.push(x->a[i]);
}
}
return trie;
}
/* search for patterns in haystack[j] */
void match(Node *x, int j)
{ for(int i = 0; i < n; i++)
{ x = longest_suffix(x, haystack[j][i] - '0');
if(x->flag != -1)
{ A[j][i-m+1] = x->flag;
}
}
}
int match2d(Node *x)
{ int matches = 0;
static int z[M+N];
static int z_str[M+N+1];
/* init */
memset(A, -1, sizeof(A));
/* fill the A matrix */
for(int i = 0; i < n; i++)
{ match(x, i);
}
/* build string for z algorithm */
z_str[n+m] = -2; /* acts like `\0` for strings */
for(int i = 0; i < m; i++)
{ z_str[i] = p[i];
}
for(int i = 0; i < n; i++)
{ /* search for pattern in column i */
for(int j = 0; j < n; j++)
{ z_str[j + m] = A[j][i];
}
/* run z algorithm */
int l, r;
l = r = 0;
z[0] = n + m;
for(int j = 1; j < n + m; j++)
{ if(j > r)
{ l = r = j;
while(z_str[r] == z_str[r - l]) r++;
z[j] = r - l;
r--;
}
else
{ if(z[j - l] < r - j + 1)
{ z[j] = z[j - l];
}
else
{ l = j;
while(z_str[r] == z_str[r - l]) r++;
z[j] = r - l;
r--;
}
}
}
/* locate matches */
for(int j = m; j < n + m; j++)
{ if(z[j] >= m)
{ printf("match at (%d,%d)\n", j - m, i);
matches++;
}
}
}
return matches;
}
int main(void)
{ cin >> n >> m;
for(int i = 0; i < n; i++)
{ cin >> haystack[i];
}
for(int i = 0; i < m; i++)
{ cin >> needle[i];
}
Node *trie = mk_automaton();
match2d(trie);
return 0;
}
Let's start with an O(N * N * K) solution. I will use the following notation: A is a pattern matrix, B is a big matrix(the one we will search for occurrences of the pattern in).
We can fix a top row of the B matrix(that is, we will search for all occurrence that start in a position (this row, any column). Let's call this row a topRow. Now we can take a slice of this matrix that contains [topRow; topRow + K) rows and all columns.
Let's create a new matrix as a result of concatenation A + column + the slice, where a column is a column with K elements that are not present in A or B(if A and B consist of 0 and 1, we can use -1, for instance). Now we can treat columns of this new matrix as letters and run the Knuth-Morris-Pratt's algorithm. Comparing two letters requires O(K) time, thus the time complexity of this step is O(N * K).
There are O(N) ways to fix the top row, so the total time complexity is O(N * N * K). It is already better than a brute-force solution, but we are not done yet. The theoretical lower bound is O(N * N)(I assume that N >= K), and I want to achieve it.
Let's take a look at what can be improved here. If we could compare two columns of a matrix in O(1) time instead of O(k), we would have achieved the desired time complexity. Let's concatenate all columns of both A and B inserting some separator after each column. Now we have a string and we need to compare its substrings(because columns and their parts are substrings now). Let's construct a suffix tree in linear time(using Ukkonnen's algorithm). Now comparing two substrings is all about finding the height of the lowest common ancestor(LCA) of two nodes in this tree. There is an algorithm that allows us to do it with linear preprocessing time and O(1) time per LCA query. It means that we can compare two substrings(or columns) in constant time! Thus, the total time complexity is O(N * N). There is another way to achieve this time complexity: we can build a suffix array in linear time and answer the longest common prefix queries in constant time(with a linear time preprocessing). However, both of this O(N * N) solutions look pretty hard to implement and they will have a big constant.
P.S If we have a polynomial hash function that we can fully trust(or we are fine with a few false positives), we can get a much simpler O(N * N) solution using 2-D polynomial hashes.
how do you go about solving this problem ::
A robot is located at the top-left corner of a 4x4 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the bottom-right corner of the grid.
My ideas:
Backtracking solution where we go through the tree of all solutions and print once we reach goal cell. I implemented that but I'm not sure if it's correct or makes sense, or if it is even the right approach. I've posted code here and would much appreciate if someone could explain what's wrong with it.
Recursive solution where I start at start cell, and find the path to the goal cell from each of its neighboring cells recursively, with the base case being hitting the goal cell.
QUESTIONS:
1) Are these two ways of expressing the same idea?
2) Do these ideas even make sense?
3) What is the time complexity of each of these solutions? I think the second one is 4^n?
4) Does my backtracking code make sense?
5) Is there a much simpler way to do this?
Here is my code, which prints the correct number of paths for N = 4. Is it correct?
#define N 4
int counter = 0;
bool legal_move(int x, int y, int array[N+2][N+2]){
bool ret = (array[x][y] == 1);
array[x][y] = 0;
return ret;
}
/*
void print_array(int array[N+2][N+2]){
for(int i = 0; i < N+2; i++){
for(int j = 0; j < N+2; j++)
cout << array[i][j] << " ";
cout << endl;
}
cout << endl << endl;
}
*/
void print_paths(int x, int y, int n, int m, int array[N+2][N+2]){
if(x == n && y == m){
print_array(array);
counter++;
}
else {
int dx = 1;
int dy = 0;
for(int i = 0; i < 4; i++){
if(legal_move(x + dx, y + dy, array)){
print_paths(x + dx, y + dy, n, m, array);
array[x+dx][y+dy] = 1;
}
swap(dx,dy);
if(i == 1)
dx = -dx;
}
}
}
int main(){
int array[N+2][N+2];
for(int i = 1; i < N+1; i++)
for(int j = 1; j < N+1; j++)
array[i][j] = 1;
for(int i = 0; i < N+2; i++)
array[0][i] = array[i][0] = array[N+1][i] = array[i][N+1] = 0;
//print_array(array);
array[1][1] = 0; //Set start cell to be seen.
print_paths(1,1,N,N,array);
cout << counter << endl;
}
I think it's the same idea.
The problem with your code is that you haven't implemented 'but can not visit the same spot twice' correctly.
Suppose that your robot has gone from S to A by some path, and you are now examining whether to go to B adjacent to A. The test should be 'has the robot been to B before on the current path'. But what you have implemented is 'has the robot visited B before on any path'.
In other words you need to modify print_paths to take an extra parameter for the current path, and use that to implement the test correctly.
Here is a partial answer, which talks mostly about complexity (I think your code is correct, after the minor bugfix suggested by john, and backtracking is probably the simplest way to do what you want).
The time complexity seems something like O(3^n^2) to me - there are at most n^2 nodes, and at each node you want to check 3 possibilities. There are actually less nodes, because of the way backtracking works, but the complexity is at least O(2^(n^2/4)), so it's greater than any exponential. (the O in the last formula is actually a theta).
The below diagram describes this lower bound. Cells marked by ? have a "decision" in them - the robot may decide to go straight or turn. There are at least n^2/4 such cells, so the number of paths is at least 2^(n^2/4).
?->-?->-?->-V
| | | | | | |
>-^ >-^ >-^ V
|
V-<-?-<-?-<-?
| | | | | | |
V ^-< ^-< ^-<
|
...