Recently i was taking part in one Hackathon and i came to know about a problem which tries to find a pattern of a grid form in a 2d matrix.A pattern could be U,H and T and will be represented by 3*3 matrix
suppose if i want to present H and U
+--+--+--+ +--+--+--+
|1 |0 |1 | |1 |0 |1 |
+--+--+--+ +--+--+--+
|1 |1 |1 | --> H |1 |0 |1 | -> U
+--+--+--+ +--+--+--+
|1 |0 |1 | |1 |1 |1 |
+--+--+--+ +--+--+--+
Now i need to search this into 10*10 matrix containing 0s and 1s.Closest and only solution i can get it brute force algorithm of O(n^4).In languages like MATLAB and R there are very subtle ways to do this but not in C,C++. I tried a lot to search this solution on Google and on SO.But closest i can get is this SO POST which discuss about implementing Rabin-Karp string-search algorithm .But there is no pseudocode or any post explaining this.Could anyone help or provide any link,pdf or some logic to simplify this?
EDIT
as Eugene Sh. commented that If N is the size of the large matrix(NxN) and k - the small one (kxk), the buteforce algorithm should take O((Nk)^2). Since k is fixed, it is reducing to O(N^2).Yes absolutely right.
But is there is any generalised way if N and K is big?
Alright, here is then the 2D Rabin-Karp approach.
For the following discussion, assume we want to find a (m, m) sub-matrix inside a (n, n) matrix. (The concept works for rectangular matrices just as well but I ran out of indices.)
The idea is that for each possible sub-matrix, we compute a hash. Only if that hash matches the hash of the matrix we want to find, we will compare element-wise.
To make this efficient, we must avoid re-computing the entire hash of the sub-matrix each time. Because I got little sleep tonight, the only hash function for which I could figure out how to do this easily is the sum of 1s in the respective sub-matrix. I leave it as an exercise to someone smarter than me to figure out a better rolling hash function.
Now, if we have just checked the sub-matrix from (i, j) to (i + m – 1, j + m – 1) and know it has x 1s inside, we can compute the number of 1s in the sub-matrix one to the right – that is, from (i, j + 1) to (i + m – 1, j + m) – by subtracting the number of 1s in the sub-vector from (i, j) to (i + m – 1, j) and adding the number of 1s in the sub-vector from (i, j + m) to (i + m – 1, j + m).
If we hit the right margin of the large matrix, we shift the window down by one and then back to the left margin and then again down by one and then again to the right and so forth.
Note that this requires O(m) operations, not O(m2) for each candidate. If we do this for every pair of indices, we get O(mn2) work. Thus, by cleverly shifting a window of the size of the potential sub-matrix through the large matrix, we can reduce the amount of work by a factor of m. That is, if we don't get too many hash collisions.
Here is a picture:
As we shift the current window one to the right, we subtract the number of 1s in the red column vector on the left side and add the number of 1s in the green column vector on the right side to obtain the number of 1s in the new window.
I have implemented a quick demo of this idea using the great Eigen C++ template library. The example also uses some stuff from Boost but only for argument parsing and output formatting so you can easily get rid of it if you don't have Boost but want to try out the code. The index fiddling is a bit messy but I'll leave it without further explanation here. The above prose should cover it sufficiently.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <random>
#include <type_traits>
#include <utility>
#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>
#include <Eigen/Dense>
#define PROGRAM "submatrix"
#define SEED_CSTDLIB_RAND 1
using BitMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
using Index1D = BitMatrix::Index;
using Index2D = std::pair<Index1D, Index1D>;
std::ostream&
operator<<(std::ostream& out, const Index2D& idx)
{
out << "(" << idx.first << ", " << idx.second << ")";
return out;
}
BitMatrix
get_random_bit_matrix(const Index1D rows, const Index1D cols)
{
auto matrix = BitMatrix {rows, cols};
matrix.setRandom();
return matrix;
}
Index2D
findSubMatrix(const BitMatrix& haystack,
const BitMatrix& needle,
Index1D *const collisions_ptr = nullptr) noexcept
{
static_assert(std::is_signed<Index1D>::value, "unsigned index type");
const auto end = Index2D {haystack.rows(), haystack.cols()};
const auto hr = haystack.rows();
const auto hc = haystack.cols();
const auto nr = needle.rows();
const auto nc = needle.cols();
if (nr > hr || nr > hc)
return end;
const auto target = needle.count();
auto current = haystack.block(0, 0, nr - 1, nc).count();
auto j = Index1D {0};
for (auto i = Index1D {0}; i <= hr - nr; ++i)
{
if (j == 0) // at left margin
current += haystack.block(i + nr - 1, 0, 1, nc).count();
else if (j == hc - nc) // at right margin
current += haystack.block(i + nr - 1, hc - nc, 1, nc).count();
else
assert(!"this should never happen");
while (true)
{
if (i % 2 == 0) // moving right
{
if (j > 0)
current += haystack.block(i, j + nc - 1, nr, 1).count();
}
else // moving left
{
if (j < hc - nc)
current += haystack.block(i, j, nr, 1).count();
}
assert(haystack.block(i, j, nr, nc).count() == current);
if (current == target)
{
// TODO: There must be a better way than using cwiseEqual().
if (haystack.block(i, j, nr, nc).cwiseEqual(needle).all())
return Index2D {i, j};
else if (collisions_ptr)
*collisions_ptr += 1;
}
if (i % 2 == 0) // moving right
{
if (j < hc - nc)
{
current -= haystack.block(i, j, nr, 1).count();
++j;
}
else break;
}
else // moving left
{
if (j > 0)
{
current -= haystack.block(i, j + nc - 1, nr, 1).count();
--j;
}
else break;
}
}
if (i % 2 == 0) // at right margin
current -= haystack.block(i, hc - nc, 1, nc).count();
else // at left margin
current -= haystack.block(i, 0, 1, nc).count();
}
return end;
}
int
main(int argc, char * * argv)
{
if (SEED_CSTDLIB_RAND)
{
std::random_device rnddev {};
srand(rnddev());
}
if (argc != 5)
{
std::cerr << "usage: " << PROGRAM
<< " ROWS_HAYSTACK COLUMNS_HAYSTACK"
<< " ROWS_NEEDLE COLUMNS_NEEDLE"
<< std::endl;
return EXIT_FAILURE;
}
auto hr = boost::lexical_cast<Index1D>(argv[1]);
auto hc = boost::lexical_cast<Index1D>(argv[2]);
auto nr = boost::lexical_cast<Index1D>(argv[3]);
auto nc = boost::lexical_cast<Index1D>(argv[4]);
const auto haystack = get_random_bit_matrix(hr, hc);
const auto needle = get_random_bit_matrix(nr, nc);
auto collisions = Index1D {};
const auto idx = findSubMatrix(haystack, needle, &collisions);
const auto end = Index2D {haystack.rows(), haystack.cols()};
std::cout << "This is the haystack:\n\n" << haystack << "\n\n";
std::cout << "This is the needle:\n\n" << needle << "\n\n";
if (idx != end)
std::cout << "Found as sub-matrix at " << idx << ".\n";
else
std::cout << "Not found as sub-matrix.\n";
std::cout << boost::format("There were %d (%.2f %%) hash collisions.\n")
% collisions
% (100.0 * collisions / ((hr - nr) * (hc - nc)));
return (idx != end) ? EXIT_SUCCESS : EXIT_FAILURE;
}
While it compiles and runs, please consider the above as pseudo-code. I have made almost no attempt at optimizing it. It was just a proof-of concept for myself.
I'm going to present an algorithm that takes O(n*n) time in the worst case whenever k = O(sqrt(n)) and O(n*n + n*k*k) in general. This is an extension of Aho-Corasick to 2D. Recall that Aho-Corasick locates all occurrences of a set of patterns in a target string T, and it does so in time linear in pattern lengths, length of T, and number of occurrences.
Let's introduce some terminology. The haystack is the large matrix we are searching in and the needle is the pattern-matrix. The haystack is a nxn matrix and the needle is a kxk matrix. The set of patterns that we are going to use in Aho-Corasick is the set of rows of the needle. This set contains at most k rows and will have fewer if there are duplicate rows.
We are going to build the Aho-Corasick automaton (which is a Trie augmented with failure links) and then run the search algorithm on each row of the haystack. So we take every row of the needle and search for it in every row of the haystack. We can use a linear time 1D matching algorithm to do this but that would still be inefficient. The advantage of Aho-Corasick is that it searches for all patterns at once.
During the search we are going to populate a matrix A which we are going to use later. When we search in the first row of the haystack, the first row of A is filled with the occurrences of rows of the needle in the first row of the haystack. So we'll end up with a first row of A that looks like 2 - 0 - - 1 for example. This means that row number 0 of the needle appears at position 2 in the first row of the haystack; row number 1 appears at position 5; row number 2 appears at position 0. The - entries are positions that did not get matched. Keep doing this for every row.
Let's for now assume that there are no duplicate rows in the needle. Assign 0 to the first row of the needle, 1 to the second, and so on. Now we are going to search for the pattern [0 1 2 ... k-1] in every column of the matrix A using a linear time 1D search algorithm (KMP for example). Recall that every row of A stores positions at which rows of the needle appear. So if a column contains the pattern [0 1 2 ... k-1], this means that row number 0 of the needle appears at some row of the haystack, row number 1 of the needle is just below it, and so on. This is exactly what we want. If there are duplicate rows, just assign a unique number to each unique row.
Search in column takes O(n) using a linear time algorithm. So searching all columns takes O(n*n). We populate the matrix during the search, we search every row of the haystack (there are n rows) and the search in a row takes O(n+k*k). So O(n(n+k*k)) overall.
So the idea was to find that matrix and then reduce the problem to 1D pattern matching. Aho-Corasick is just there for efficiency, I don't know if there is another efficient way to find the matrix.
EDIT: added implementation.
Here is my c++ implementation. The max value of n is set to 100 but you can change it.
The program starts by reading two integers n k (the dimensions of the matrices). Then it reads n lines each containing a string of 0's and 1's of length n. Then it reads k lines each containing a string of 0's and 1's of length k. The output is the upper-left coordinate of all matches. For the following input for example.
12 2
101110111011
111010111011
110110111011
101110111010
101110111010
101110111010
101110111010
111010111011
111010111011
111010111011
111010111011
111010111011
11
10
The program will output:
match at (2,0)
match at (1,1)
match at (0,2)
match at (6,2)
match at (2,10)
#include <cstdio>
#include <cstring>
#include <string>
#include <queue>
#include <iostream>
using namespace std;
const int N = 100;
const int M = N;
int n, m;
string haystack[N], needle[M];
int A[N][N]; /* filled by successive calls to match */
int p[N]; /* pattern to search for in columns of A */
struct Node
{ Node *a[2]; /* alphabet is binary */
Node *suff; /* pointer to node whose prefix = longest proper suffix of this node */
int flag;
Node()
{ a[0] = a[1] = 0;
suff = 0;
flag = -1;
}
};
void insert(Node *x, string s)
{ static int id = 0;
static int p_size = 0;
for(int i = 0; i < s.size(); i++)
{ char c = s[i];
if(x->a[c - '0'] == 0)
x->a[c - '0'] = new Node;
x = x->a[c - '0'];
}
if(x->flag == -1)
x->flag = id++;
/* update pattern */
p[p_size++] = x->flag;
}
Node *longest_suffix(Node *x, int c)
{ while(x->a[c] == 0)
x = x->suff;
return x->a[c];
}
Node *mk_automaton(void)
{ Node *trie = new Node;
for(int i = 0; i < m; i++)
{ insert(trie, needle[i]);
}
queue<Node*> q;
/* level 1 */
for(int i = 0; i < 2; i++)
{ if(trie->a[i])
{ trie->a[i]->suff = trie;
q.push(trie->a[i]);
}
else trie->a[i] = trie;
}
/* level > 1 */
while(q.empty() == false)
{ Node *x = q.front(); q.pop();
for(int i = 0; i < 2; i++)
{ if(x->a[i] == 0) continue;
x->a[i]->suff = longest_suffix(x->suff, i);
q.push(x->a[i]);
}
}
return trie;
}
/* search for patterns in haystack[j] */
void match(Node *x, int j)
{ for(int i = 0; i < n; i++)
{ x = longest_suffix(x, haystack[j][i] - '0');
if(x->flag != -1)
{ A[j][i-m+1] = x->flag;
}
}
}
int match2d(Node *x)
{ int matches = 0;
static int z[M+N];
static int z_str[M+N+1];
/* init */
memset(A, -1, sizeof(A));
/* fill the A matrix */
for(int i = 0; i < n; i++)
{ match(x, i);
}
/* build string for z algorithm */
z_str[n+m] = -2; /* acts like `\0` for strings */
for(int i = 0; i < m; i++)
{ z_str[i] = p[i];
}
for(int i = 0; i < n; i++)
{ /* search for pattern in column i */
for(int j = 0; j < n; j++)
{ z_str[j + m] = A[j][i];
}
/* run z algorithm */
int l, r;
l = r = 0;
z[0] = n + m;
for(int j = 1; j < n + m; j++)
{ if(j > r)
{ l = r = j;
while(z_str[r] == z_str[r - l]) r++;
z[j] = r - l;
r--;
}
else
{ if(z[j - l] < r - j + 1)
{ z[j] = z[j - l];
}
else
{ l = j;
while(z_str[r] == z_str[r - l]) r++;
z[j] = r - l;
r--;
}
}
}
/* locate matches */
for(int j = m; j < n + m; j++)
{ if(z[j] >= m)
{ printf("match at (%d,%d)\n", j - m, i);
matches++;
}
}
}
return matches;
}
int main(void)
{ cin >> n >> m;
for(int i = 0; i < n; i++)
{ cin >> haystack[i];
}
for(int i = 0; i < m; i++)
{ cin >> needle[i];
}
Node *trie = mk_automaton();
match2d(trie);
return 0;
}
Let's start with an O(N * N * K) solution. I will use the following notation: A is a pattern matrix, B is a big matrix(the one we will search for occurrences of the pattern in).
We can fix a top row of the B matrix(that is, we will search for all occurrence that start in a position (this row, any column). Let's call this row a topRow. Now we can take a slice of this matrix that contains [topRow; topRow + K) rows and all columns.
Let's create a new matrix as a result of concatenation A + column + the slice, where a column is a column with K elements that are not present in A or B(if A and B consist of 0 and 1, we can use -1, for instance). Now we can treat columns of this new matrix as letters and run the Knuth-Morris-Pratt's algorithm. Comparing two letters requires O(K) time, thus the time complexity of this step is O(N * K).
There are O(N) ways to fix the top row, so the total time complexity is O(N * N * K). It is already better than a brute-force solution, but we are not done yet. The theoretical lower bound is O(N * N)(I assume that N >= K), and I want to achieve it.
Let's take a look at what can be improved here. If we could compare two columns of a matrix in O(1) time instead of O(k), we would have achieved the desired time complexity. Let's concatenate all columns of both A and B inserting some separator after each column. Now we have a string and we need to compare its substrings(because columns and their parts are substrings now). Let's construct a suffix tree in linear time(using Ukkonnen's algorithm). Now comparing two substrings is all about finding the height of the lowest common ancestor(LCA) of two nodes in this tree. There is an algorithm that allows us to do it with linear preprocessing time and O(1) time per LCA query. It means that we can compare two substrings(or columns) in constant time! Thus, the total time complexity is O(N * N). There is another way to achieve this time complexity: we can build a suffix array in linear time and answer the longest common prefix queries in constant time(with a linear time preprocessing). However, both of this O(N * N) solutions look pretty hard to implement and they will have a big constant.
P.S If we have a polynomial hash function that we can fully trust(or we are fine with a few false positives), we can get a much simpler O(N * N) solution using 2-D polynomial hashes.
Related
Given heights of n towers and a value k. We need to either increase or decrease height of every tower by k (only once) where k > 0. The task is to minimize the difference between the heights of the longest and the shortest tower after modifications, and output this difference.
I get the intuition behind the solution but I can not comment on the correctness of the solution below.
// C++ program to find the minimum possible
// difference between maximum and minimum
// elements when we have to add/subtract
// every number by k
#include <bits/stdc++.h>
using namespace std;
// Modifies the array by subtracting/adding
// k to every element such that the difference
// between maximum and minimum is minimized
int getMinDiff(int arr[], int n, int k)
{
if (n == 1)
return 0;
// Sort all elements
sort(arr, arr+n);
// Initialize result
int ans = arr[n-1] - arr[0];
// Handle corner elements
int small = arr[0] + k;
int big = arr[n-1] - k;
if (small > big)
swap(small, big);
// Traverse middle elements
for (int i = 1; i < n-1; i ++)
{
int subtract = arr[i] - k;
int add = arr[i] + k;
// If both subtraction and addition
// do not change diff
if (subtract >= small || add <= big)
continue;
// Either subtraction causes a smaller
// number or addition causes a greater
// number. Update small or big using
// greedy approach (If big - subtract
// causes smaller diff, update small
// Else update big)
if (big - subtract <= add - small)
small = subtract;
else
big = add;
}
return min(ans, big - small);
}
// Driver function to test the above function
int main()
{
int arr[] = {4, 6};
int n = sizeof(arr)/sizeof(arr[0]);
int k = 10;
cout << "\nMaximum difference is "
<< getMinDiff(arr, n, k);
return 0;
}
Can anyone help me provide the correct solution to this problem?
The codes above work, however I don't find much explanation so I'll try to add some in order to help develop intuition.
For any given tower, you have two choices, you can either increase its height or decrease it.
Now if you decide to increase its height from say Hi to Hi + K, then you can also increase the height of all shorter towers as that won't affect the maximum. Similarly, if you decide to decrease the height of a tower from Hi to Hi − K, then you can also decrease the heights of all taller towers.
We will make use of this, we have n buildings, and we'll try to make each of the building the highest and see making which building the highest gives us the least range of heights(which is our answer). Let me explain:
So what we want to do is - 1) We first sort the array(you will soon see why).
2) Then for every building from i = 0 to n-2[1] , we try to make it the highest (by adding K to the building, adding K to the buildings on its left and subtracting K from the buildings on its right).
So say we're at building Hi, we've added K to it and the buildings before it and subtracted K from the buildings after it. So the minimum height of the buildings will now be min(H0 + K, Hi+1 - K), i.e. min(1st building + K, next building on right - K).
(Note: This is because we sorted the array. Convince yourself by taking a few examples.)
Likewise, the maximum height of the buildings will be max(Hi + K, Hn-1 - K), i.e. max(current building + K, last building on right - K).
3) max - min gives you the range.
[1]Note that when i = n-1. In this case, there is no building after the current building, so we're adding K to every building, so the range will merely be
height[n-1] - height[0] since K is added to everything, so it cancels out.
Here's a Java implementation based on the idea above:
class Solution {
int getMinDiff(int[] arr, int n, int k) {
Arrays.sort(arr);
int ans = arr[n-1] - arr[0];
int smallest = arr[0] + k, largest = arr[n-1]-k;
for(int i = 0; i < n-1; i++){
int min = Math.min(smallest, arr[i+1]-k);
int max = Math.max(largest, arr[i]+k);
if (min < 0) continue;
ans = Math.min(ans, max-min);
}
return ans;
}
}
int getMinDiff(int a[], int n, int k) {
sort(a,a+n);
int i,mx,mn,ans;
ans = a[n-1]-a[0]; // this can be one possible solution
for(i=0;i<n;i++)
{
if(a[i]>=k) // since height of tower can't be -ve so taking only +ve heights
{
mn = min(a[0]+k, a[i]-k);
mx = max(a[n-1]-k, a[i-1]+k);
ans = min(ans, mx-mn);
}
}
return ans;
}
This is C++ code, it passed all the test cases.
This python code might be of some help to you. Code is self explanatory.
def getMinDiff(arr, n, k):
arr = sorted(arr)
ans = arr[-1]-arr[0] #this case occurs when either we subtract k or add k to all elements of the array
for i in range(n):
mn=min(arr[0]+k, arr[i]-k) #after sorting, arr[0] is minimum. so adding k pushes it towards maximum. We subtract k from arr[i] to get any other worse (smaller) minimum. worse means increasing the diff b/w mn and mx
mx=max(arr[n-1]-k, arr[i]+k) # after sorting, arr[n-1] is maximum. so subtracting k pushes it towards minimum. We add k to arr[i] to get any other worse (bigger) maximum. worse means increasing the diff b/w mn and mx
ans = min(ans, mx-mn)
return ans
Here's a solution:-
But before jumping on to the solution, here's some info that is required to understand it. In the best case scenario, the minimum difference would be zero. This could happen only in two cases - (1) the array contain duplicates or (2) for an element, lets say 'x', there exists another element in the array which has the value 'x + 2*k'.
The idea is pretty simple.
First we would sort the array.
Next, we will try to find either the optimum value (for which the answer would come out to be zero) or at least the closest number to the optimum value using Binary Search
Here's a Javascript implementation of the algorithm:-
function minDiffTower(arr, k) {
arr = arr.sort((a,b) => a-b);
let minDiff = Infinity;
let prev = null;
for (let i=0; i<arr.length; i++) {
let el = arr[i];
// Handling case when the array have duplicates
if (el == prev) {
minDiff = 0;
break;
}
prev = el;
let targetNum = el + 2*k; // Lets say we have an element 10. The difference would be zero when there exists an element with value 10+2*k (this is the 'optimum value' as discussed in the explaination
let closestMatchDiff = Infinity; // It's not necessary that there would exist 'targetNum' in the array, so we try to find the closest to this number using Binary Search
let lb = i+1;
let ub = arr.length-1;
while (lb<=ub) {
let mid = lb + ((ub-lb)>>1);
let currMidDiff = arr[mid] > targetNum ? arr[mid] - targetNum : targetNum - arr[mid];
closestMatchDiff = Math.min(closestMatchDiff, currMidDiff);
if (arr[mid] == targetNum) break; // in this case the answer would be simply zero, no need to proceed further
else if (arr[mid] < targetNum) lb = mid+1;
else ub = mid-1;
}
minDiff = Math.min(minDiff, closestMatchDiff);
}
return minDiff;
}
Here is the C++ code, I have continued from where you left. The code is self-explanatory.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int minDiff(int arr[], int n, int k)
{
// If the array has only one element.
if (n == 1)
{
return 0;
}
//sort all elements
sort(arr, arr + n);
//initialise result
int ans = arr[n - 1] - arr[0];
//Handle corner elements
int small = arr[0] + k;
int big = arr[n - 1] - k;
if (small > big)
{
// Swap the elements to keep the array sorted.
int temp = small;
small = big;
big = temp;
}
//traverse middle elements
for (int i = 0; i < n - 1; i++)
{
int subtract = arr[i] - k;
int add = arr[i] + k;
// If both subtraction and addition do not change the diff.
// Subtraction does not give new minimum.
// Addition does not give new maximum.
if (subtract >= small or add <= big)
{
continue;
}
// Either subtraction causes a smaller number or addition causes a greater number.
//Update small or big using greedy approach.
// if big-subtract causes smaller diff, update small Else update big
if (big - subtract <= add - small)
{
small = subtract;
}
else
{
big = add;
}
}
return min(ans, big - small);
}
int main(void)
{
int arr[] = {1, 5, 15, 10};
int n = sizeof(arr) / sizeof(arr[0]);
int k = 3;
cout << "\nMaximum difference is: " << minDiff(arr, n, k) << endl;
return 0;
}
class Solution {
public:
int getMinDiff(int arr[], int n, int k) {
sort(arr, arr+n);
int diff = arr[n-1]-arr[0];
int mine, maxe;
for(int i = 0; i < n; i++)
arr[i]+=k;
mine = arr[0];
maxe = arr[n-1]-2*k;
for(int i = n-1; i > 0; i--){
if(arr[i]-2*k < 0)
break;
mine = min(mine, arr[i]-2*k);
maxe = max(arr[i-1], arr[n-1]-2*k);
diff = min(diff, maxe-mine);
}
return diff;
}
};
class Solution:
def getMinDiff(self, arr, n, k):
# code here
arr.sort()
res = arr[-1]-arr[0]
for i in range(1, n):
if arr[i]>=k:
# at a time we can increase or decrease one number only.
# Hence assuming we decrease ith elem, we will increase i-1 th elem.
# using this we basically find which is new_min and new_max possible
# and if the difference is smaller than res, we return the same.
new_min = min(arr[0]+k, arr[i]-k)
new_max = max(arr[-1]-k, arr[i-1]+k)
res = min(res, new_max-new_min)
return res
Hexagonal grid is represented by a two-dimensional array with R rows and C columns. First row always comes "before" second in hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is an odd number. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead R and C) :
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}
For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
There are several ways to speed up your algorithm.
You do the neighbour-calculation with the out-of bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an even number of neighbours cause a toggle, the cell is on, otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only few active cells and its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch each neighbours for each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling, the exclusive or or xor oerator, ^. If you keep the list of neigbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance with Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³ is suspiciously large.
The playing board is small. The example in your question will repeat at least after 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has ar period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tacke it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check whether the pattern has occurred before in each turn.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)
Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
this is obviously in O(N). Iterating twice is somewhat twice the actual run time, but should not be that bad. Try not to allocate space every time that you do that but reuse existing structures.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.
This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty to can represent the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and also check its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm be in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
Thus, no proposed solution can do better than O(k)[1].
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).
I have worked out a O(n square) solution to the problem. I was wondering about a better solution to this. (this is not a homework/interview problem but something I do out of my own interest, hence sharing here):
If a=1, b=2, c=3,….z=26. Given a string, find all possible codes that string
can generate. example: "1123" shall give:
aabc //a = 1, a = 1, b = 2, c = 3
kbc // since k is 11, b = 2, c= 3
alc // a = 1, l = 12, c = 3
aaw // a= 1, a =1, w= 23
kw // k = 11, w = 23
Here is my code to the problem:
void alpha(int* a, int sz, vector<vector<int>>& strings) {
for (int i = sz - 1; i >= 0; i--) {
if (i == sz - 1) {
vector<int> t;
t.push_back(a[i]);
strings.push_back(t);
} else {
int k = strings.size();
for (int j = 0; j < k; j++) {
vector<int> t = strings[j];
strings[j].insert(strings[j].begin(), a[i]);
if (t[0] < 10) {
int n = a[i] * 10 + t[0];
if (n <= 26) {
t[0] = n;
strings.push_back(t);
}
}
}
}
}
}
Essentially the vector strings will hold the sets of numbers.
This would run in n square. I am trying my head around at least an nlogn solution.
Intuitively tree should help here, but not getting anywhere post that.
Generally, your problem complexity is more like 2^n, not n^2, since your k can increase with every iteration.
This is an alternative recursive solution (note: recursion is bad for very long codes). I didn't focus on optimization, since I'm not up to date with C++X, but I think the recursive solution could be optimized with some moves.
Recursion also makes the complexity a bit more obvious compared to the iterative solution.
// Add the front element to each trailing code sequence. Create a new sequence if none exists
void update_helper(int front, std::vector<std::deque<int>>& intermediate)
{
if (intermediate.empty())
{
intermediate.push_back(std::deque<int>());
}
for (size_t i = 0; i < intermediate.size(); i++)
{
intermediate[i].push_front(front);
}
}
std::vector<std::deque<int>> decode(int digits[], int count)
{
if (count <= 0)
{
return std::vector<std::deque<int>>();
}
std::vector<std::deque<int>> result1 = decode(digits + 1, count - 1);
update_helper(*digits, result1);
if (count > 1 && (digits[0] * 10 + digits[1]) <= 26)
{
std::vector<std::deque<int>> result2 = decode(digits + 2, count - 2);
update_helper(digits[0] * 10 + digits[1], result2);
result1.insert(result1.end(), result2.begin(), result2.end());
}
return result1;
}
Call:
std::vector<std::deque<int>> strings = decode(codes, size);
Edit:
Regarding the complexity of the original code, I'll try to show what would happen in the worst case scenario, where the code sequence consists only of 1 and 2 values.
void alpha(int* a, int sz, vector<vector<int>>& strings)
{
for (int i = sz - 1;
i >= 0;
i--)
{
if (i == sz - 1)
{
vector<int> t;
t.push_back(a[i]);
strings.push_back(t); // strings.size+1
} // if summary: O(1), ignoring capacity change, strings.size+1
else
{
int k = strings.size();
for (int j = 0; j < k; j++)
{
vector<int> t = strings[j]; // O(strings[j].size) vector copy operation
strings[j].insert(strings[j].begin(), a[i]); // strings[j].size+1
// note: strings[j].insert treated as O(1) because other containers could do better than vector
if (t[0] < 10)
{
int n = a[i] * 10 + t[0];
if (n <= 26)
{
t[0] = n;
strings.push_back(t); // strings.size+1
// O(1), ignoring capacity change and copy operation
} // if summary: O(1), strings.size+1
} // if summary: O(1), ignoring capacity change, strings.size+1
} // for summary: O(k * strings[j].size), strings.size+k, strings[j].size+1
} // else summary: O(k * strings[j].size), strings.size+k, strings[j].size+1
} // for summary: O(sum[i from 1 to sz] of (k * strings[j].size))
// k (same as string.size) doubles each iteration => k ends near 2^sz
// string[j].size increases by 1 each iteration
// k * strings[j].size increases by ?? each iteration (its getting huge)
}
Maybe I made a mistake somewhere and if we want to play nice we can treat a vector copy as O(1) instead of O(n) in order to reduce complexity, but the hard fact remains, that the worst case is doubling outer vector size in each iteration (at least every 2nd iteration, considering the exact structure of the if conditions) of the inner loop and the inner loop depends on that growing vector size, which makes the whole story at least O(2^n).
Edit2:
I figured out the result complexity (the best hypothetical algoritm still needs to create every element of the result, so result complexity is like a lower bound to what any algorithm can archieve)
Its actually following the Fibonacci numbers:
For worst case input (like only 1s) of size N+2 you have:
size N has k(N) elements
size N+1 has k(N+1) elements
size N+2 is the combination of codes starting with a followed by the combinations from size N+1 (a takes one element of the source) and the codes starting with k, followed by the combinations from size N (k takes two elements of the source)
size N+2 has k(N) + k(N+1) elements
Starting with size 1 => 1 (a) and size 2 => 2 (aa or k)
Result: still exponential growth ;)
Edit3:
Worked out a dynamic programming solution, somewhat similar to your approach with reverse iteration over the code array and kindof optimized in its vector usage, based on the properties explained in Edit2.
The inner loop (update_helper) is still dominated by the count of results (worst case Fibonacci) and a few outer loop iterations will have a decent count of sub-results, but at least the sub-results are reduced to a pointer to some intermediate node, so copying should be pretty efficient. As a little bonus, I switched the result from numbers to characters.
Another edit: updated code with range 0 - 25 as 'a' - 'z', fixed some errors that led to wrong results.
struct const_node
{
const_node(char content, const_node* next)
: next(next), content(content)
{
}
const_node* const next;
const char content;
};
// put front in front of each existing sub-result
void update_helper(int front, std::vector<const_node*>& intermediate)
{
for (size_t i = 0; i < intermediate.size(); i++)
{
intermediate[i] = new const_node(front + 'a', intermediate[i]);
}
if (intermediate.empty())
{
intermediate.push_back(new const_node(front + 'a', NULL));
}
}
std::vector<const_node*> decode_it(int digits[9], size_t count)
{
int current = 0;
std::vector<const_node*> intermediates[3];
for (size_t i = 0; i < count; i++)
{
current = (current + 1) % 3;
int prev = (current + 2) % 3; // -1
int prevprev = (current + 1) % 3; // -2
size_t index = count - i - 1; // invert direction
// copy from prev
intermediates[current] = intermediates[prev];
// update current (part 1)
update_helper(digits[index], intermediates[current]);
if (index + 1 < count && digits[index] &&
digits[index] * 10 + digits[index + 1] < 26)
{
// update prevprev
update_helper(digits[index] * 10 + digits[index + 1], intermediates[prevprev]);
// add to current (part 2)
intermediates[current].insert(intermediates[current].end(), intermediates[prevprev].begin(), intermediates[prevprev].end());
}
}
return intermediates[current];
}
void cleanupDelete(std::vector<const_node*>& nodes);
int main()
{
int code[] = { 1, 2, 3, 1, 2, 3, 1, 2, 3 };
int size = sizeof(code) / sizeof(int);
std::vector<const_node*> result = decode_it(code, size);
// output
for (size_t i = 0; i < result.size(); i++)
{
std::cout.width(3);
std::cout.flags(std::ios::right);
std::cout << i << ": ";
const_node* item = result[i];
while (item)
{
std::cout << item->content;
item = item->next;
}
std::cout << std::endl;
}
cleanupDelete(result);
}
void fillCleanup(const_node* n, std::set<const_node*>& all_nodes)
{
if (n)
{
all_nodes.insert(n);
fillCleanup(n->next, all_nodes);
}
}
void cleanupDelete(std::vector<const_node*>& nodes)
{
// this is like multiple inverse trees, hard to delete correctly, since multiple next pointers refer to the same target
std::set<const_node*> all_nodes;
for each (auto var in nodes)
{
fillCleanup(var, all_nodes);
}
nodes.clear();
for each (auto var in all_nodes)
{
delete var;
}
all_nodes.clear();
}
A drawback of the dynamically reused structure is the cleanup, since you wanna be careful to delete each node only once.
I want to create a program to search for a subtext in a text.
For example, I have this text: abcdeabbdfeg
And in that text I want to find: cd
But I want to use this algorithm:
start = 1
end = string length of the text
middle = (start + end) / 2
if (pattern < text[middle]) end = mid - 1;
if (pattern > text[middle]) start = mid + 1;
...and continue until the pattern is found in the text
So, I already have a simple program that completely works without any problem but without that algorithm above, so now I only want to implement that algorithm above in my program, I have tried many ways, but my program won't show anything in any case, after I add that algorithm...
This is the code that I have and works:
void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
for (int i = 0; i <= N - M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (txt[i+j] != pat[j])
break;
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
And this is the code above with the implementation of the algorithm:
int _tmain(int argc, _TCHAR* argv[])
{
char t[32];
cout << "Please enter your text (t):";
cin >> t;
char p[32];
cout << "Please enter the pattern (p) you wish to look for in that text (t):";
cin >> p;
int start, end = 0;
double middle = 0;
start = 1;
end = strlen(t);
while (start <= end)
{
int M = strlen(p);
int N = strlen(t);
middle = std::ceil((start + end) / 2.0);
int mid = (int)middle;
for (int i = mid; i <= M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (t[mid] != p[j]) break;
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
if (start > end) cout << "Search has ended: pattern p does not occur in the text." << endl;
return 0;
}
Your algorithm is still a binary search. You split the array into two partitions, then select a partition, based on the value of a letter.
The requirements of a partitioned search is to have an ordered collection.
Let's use your example.
0 1 2 3 4 5 6 7 8 9 10 11
+---+---+---+---+---+---+---+---+---+---+---+---+
| a | b | c | d | e | a | b | b | d | f | e | g |
+---+---+---+---+---+---+---+---+---+---+---+---+
If you choose the midpoint at index 5, this yields the letter a. Since you are searching for the letter c first, then d, the algorithm says that the letter c must lie in the partition 6..11. Thus the basis of your issue.
The algorithm will not find cd because there is no c in the partition 6..11.
The algorithm assumes that the array is sorted and that for any given index, there will be one partition containing values less than array[index] and one partition containing values greater than array[index].
This assumption is demonstrated by the following code of yours:
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
No matter how you name your algorithm, if it assumes an ordering on the array (e.g. p[j] < t[mid]), the array must be ordered.
You data is not ordered, so your algorithm fails the assumption, and thus the algorithm fails.
Edit 1:
Using Partitions
If you really must use a partitioning algorithm, you will need to build a set of partitions.
For example one partition starts at index 0 and proceeds until array[i] > array[i+1], this ends up at index 4. The other partition is 5..11.
(By the way, by determining the partitions, you have used more operations than a linear search.)
At this point, how do you know which partition to choose?
You don't. The letter c, that you are searching for, lies between a and e in the first partition; and a through g in the second partition. Pick a partition. If not found in the partition, you will have to search the other partition.
By performing a binary search on either partition, you have used more operations than a linear search.
I have a sorted std::vector<int> and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4 and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?
No you can do this in one pass through the vector and only storing the longest start point and length found so far. You also need much fewer than 'N' comparisons. *
hint: If you already have say a 4 long match ending at the 5th position (=6) and which position do you have to check next?
[*] left as exercise to the reader to work out what's the likely O( ) complexity ;-)
It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j] in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
Input of recursive step is the range of elements [i, j]. Initially it is [0, n-1] - the whole array.
Apply the above criterion to range [i, j]. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
Otherwise (if the range is not consecutive), divide it into two equal parts [i, m] and [m+1, j].
Recursively invoke the algorithm on the lower part ([i, m]) and then on the upper part ([m+1, j]).
The above algorithm will perform binary partition of the array and recursive descent of the partition tree using the left-first approach. This means that this algorithm will find adjacent subranges with consecutive elements in left-to-right order. All you need to do is to join the adjacent subranges together. When you receive a subrange [i, j] that was "sent to output" at step 2, you have to concatenate it with previously received subranges, if they are indeed consecutive. Or you have to start a new range, if they are not consecutive. All the while you have keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, it's worst case performance (if ther are no consecutive subranges at all) is still O(n). In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}
I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < len.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
You can't solve this problem in less than O(N) time. Imagine your list is the first N-1 even numbers, plus a single odd number (chosen from among the first N-1 odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.
Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) sizeof(x) / sizeof(x[0])
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}