unsorted matrix search algorithm - c++

Is there a suitable algorithm that allows a program to search through an unsorted matrix for the biggest prime number within it? The matrix is of size m*n and may be populated with prime numbers and non-primes. The search must find the biggest prime.
I have studied divide-and-conquer algorithms, binary trees, and step-wise searches, but all of these deal with sorted matrices.

First of all, it doesn't matter whether you use an m * n matrix or a vector with m * n elements. Generally speaking, you will have to visit each element at least once, since the data is not sorted. There are a few hints to make the process faster.
If it is a big matrix, you should visit the elements row by row (not column by column), because the matrix is stored that way in memory, so elements from the same row will likely be in the cache once you access one of them.
Testing a number's primality is the most costly part of your task, so if the numbers in the matrix are not too big, you can use the Sieve of Eratosthenes to precompute a lookup table of primes: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
If you don't use the sieve, it may be beneficial to sort the numbers first so that you can test them from the greatest to the smallest; the algorithm can then stop as soon as the first prime is found. If you don't sort, you will have to test all the numbers, which is probably the slowest method.
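A minimal sketch of that sieve lookup (my own illustration; maxValue is assumed to be the largest value that can appear in the matrix):

#include <vector>

// Build a lookup table: isPrime[v] is true iff v is prime.
// Sieve of Eratosthenes, O(maxValue log log maxValue) time.
std::vector<bool> buildSieve(int maxValue)
{
    std::vector<bool> isPrime(maxValue + 1, true);
    isPrime[0] = false;
    if (maxValue >= 1) isPrime[1] = false;
    for (long long i = 2; i * i <= maxValue; ++i)
        if (isPrime[i])
            for (long long j = i * i; j <= maxValue; j += i)
                isPrime[j] = false;
    return isPrime;
}

// After the table is built once, each matrix element costs O(1) to test
// during the row-by-row scan: isPrime[matrix[i][j]].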

You could do this:
bool isPrime(int x)            // simple trial-division test
{
    if (x < 2) return false;
    for (int d = 2; d * d <= x; ++d)
        if (x % d == 0) return false;
    return true;
}

int biggestPrime = -1;         // sentinel: no prime found yet
for (int i = 0; i < m; i++)    // note: i++ here, not m++
{
    for (int j = 0; j < n; j++)
    {
        if (isPrime(array[i][j]) && array[i][j] > biggestPrime)
        {
            biggestPrime = array[i][j];
        }
    }
}


Big-O time complexity of 4sum problem. Brute force approach

I am doing this problem called 4sum. Link to problem.
Given an array of n numbers and an integer (the target element),
calculate the total number of quads (collections of 4 distinct
numbers chosen from these n numbers) for which the sum of the
quad's elements adds up to the target element.
I wrote this code for the brute-force approach. By my reasoning, the big-O time complexity comes out to be n^4 log(n^4); I have given the reason below. However, the complexity should be only n^4. Please help me understand what I am missing.
set<vector<int>> s;
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        for (int k = j + 1; k < n; k++) {
            for (int l = k + 1; l < n; l++) {
                if (a[i] + a[j] + a[k] + a[l] == target) {
                    s.insert({a[i], a[j], a[k], a[l]});
                }
            }
        }
    }
}
The logic is to generate all possible quads (with distinct elements), then for each quad check whether the sum of its elements is equal to the target. If yes, insert the quad into the set.
Now, we cannot know how many quads will match this condition, because that depends solely on the input. But to get an upper bound we assume that every quad we check satisfies the condition. Hence, there are a total of N^4 insertions into the set.
For N^4 insertions, the complexity is N^4 log(N^4).
if (a[i] + a[j] + a[k] + a[l] == target) {
    s.insert({a[i], a[j], a[k], a[l]});
}
This gets executed O(N^4) times, indeed.
to get an upper bound we assume that every quad we check satisfies the condition.
Correct.
For N^4 insertions, the complexity is N^4 log(N^4).
Not so, because N^4 insertions do not necessarily result in a set with N^4 elements.
The cost of an insertion is O(log(s.size())). But s.size() is upper-bounded by the number of distinct ways K in which target can be expressed as a sum of 4 integers in the given range, so the worst-case cost is O(log(K)). While K can be a large number, it does not depend on N, so as far as the complexity in N is concerned, this counts as constant time O(1), and the overall complexity is still O(N^4)·O(1) = O(N^4).
[ EDIT ]   Regarding @MysteriousUser's suggestion to use std::unordered_set instead of std::set: that would indeed improve the O(1) constant of the loop body, but it would not change the overall complexity, which would still be O(N^4).
The other option, which would in fact increase the complexity to O(N^4 log(N)) as proposed by the OP, is std::multiset, since in that case each insertion does result in a size increase of the multiset.
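To see the difference concretely, a toy sketch (not the OP's code) comparing the two containers:

#include <iostream>
#include <set>
#include <vector>

int main()
{
    std::set<std::vector<int>> dedup;
    std::multiset<std::vector<int>> all;
    for (int rep = 0; rep < 1000; ++rep) {
        dedup.insert({1, 2, 3, 4});     // the same quad every time
        all.insert({1, 2, 3, 4});
    }
    std::cout << dedup.size() << '\n';  // 1    -> set size stays bounded by K
    std::cout << all.size() << '\n';    // 1000 -> multiset grows on every insert
}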

Find 101 in prime numbers

I'm currently solving a problem which is pretty straightforward: I need to find all prime numbers up to N which contain 101 in them, and count them.
Say N is 10000; then the output should be 6, since there are 101, 1013, 1019, 5101, 6101 and 8101. I used the Sieve of Eratosthenes to get all of the prime numbers up to N, though I don't know how I could solve the problem completely. I thought of std::find, but gave up on that idea because the time complexity grows fast. I know I need to modify the sieve to fit my needs, but I can't find any patterns.
Any help would be appreciated.
Edit:
I'm using this algorithm:
vector<int> sieve;
vector<int> primes;
for (int i = 1; i < max + 1; ++i)
    sieve.push_back(i); // you'll learn more efficient ways to handle this later
sieve[0] = 0;
for (int i = 2; i < max + 1; ++i) { // there are lots of brace styles, this is mine
    if (sieve[i-1] != 0) {
        primes.push_back(sieve[i-1]);
        for (int j = 2 * sieve[i-1]; j < max + 1; j += sieve[i-1]) {
            sieve[j-1] = 0;
        }
    }
}
Yes, checking every prime number for containing "101" is a waste of time. Generating all numbers containing 101 and checking whether they are prime is probably faster.
For generating the 101 numbers, let's look at the possible digit patterns, e.g. with 5 digits:
nn101
n101m
101mm
For each of these patterns, you get all the numbers by iterating n in an outer loop and m in an inner loop and doing the math to get the pattern's value (of course, you need not consider even m values, because the only even prime is 2). You are done when the value reaches N.
To check for primality, an easy way is to prepare a list of all primes up to M = sqrt(N), using a sieve if you like, and check whether your value is divisible by any of them.
This should run in O(N^1.5). Why? The number of patterns grows with O(log N) and the iterations inside each pattern with N/1000, giving O(N log N) candidate values. Each prime check costs one division per prime up to M, i.e. O(M/log(M)) divisions, and finding those primes with a sieve takes only O(M log log M), which is negligible. Altogether that's O(N · log(N) · sqrt(N) / log(sqrt(N))), i.e. O(N^1.5).
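A sketch of that generate-and-test idea (my own illustration, not the OP's code; the primes vector is assumed to come from a sieve up to sqrt(N)):

#include <set>
#include <vector>

// Trial division against a precomputed list of primes up to sqrt(N).
bool isPrimeTD(long long v, const std::vector<long long>& primes)
{
    if (v < 2) return false;
    for (long long p : primes) {
        if (p * p > v) break;
        if (v % p == 0) return false;
    }
    return true;
}

// Generate every value <= N whose decimal digits contain "101": put "101"
// at digit position k (counted from the right), then append any suffix
// m < 10^k and prepend any prefix q. The std::set removes values produced
// by two patterns (e.g. 101101). Even suffixes m could also be skipped,
// as the answer notes.
std::set<long long> valuesContaining101(long long N)
{
    std::set<long long> result;
    for (long long pow10 = 1; 101 * pow10 <= N; pow10 *= 10) {
        for (long long m = 0; m < pow10; ++m) {
            for (long long q = 0;; ++q) {
                long long v = q * pow10 * 1000 + 101 * pow10 + m;
                if (v > N) break;
                result.insert(v);
            }
        }
    }
    return result;
}

// Counting is then: for each v in valuesContaining101(N),
// increment the answer if isPrimeTD(v, primes) is true.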
You don't need to modify your prime-generation algorithm. When processing the primes you get from it, you just have to check whether each prime satisfies your condition, e.g.:
// p is the prime number
if (contains(p))
{
    // print or write to file or whatever
}
with:
bool contains(int);
being a function which checks your condition.
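The answer leaves contains unimplemented; a minimal sketch (my own, using the decimal-string representation):

#include <string>

// Returns true if the decimal representation of p contains "101".
bool contains(int p)
{
    return std::to_string(p).find("101") != std::string::npos;
}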

What is fastest way to find a prime number in range?

I have this code to find prime numbers:
void writePrimesToFile(int begin, int end, ofstream& file)
{
    bool isPrime = false;
    for (int i = begin; i < end; i = i + 2) // stepping by 2 assumes begin is odd
    {
        isPrime = true;
        for (int j = 2; j < i; j++)
            if (i % j == 0)
            {
                isPrime = false;
                break;
            }
        if (isPrime)
            file << i << " \n";
    }
}
Is there a faster way to do it?
I tried googling a faster way, but it's all math and I don't understand how to turn it into code.
Is there a faster way to do it?
Yes. There are faster primality test algorithms.
What is fastest way to find a prime number in range?
No one knows. If someone knows, then that person is guarding a massively important secret. No one has been able to prove that any of the known techniques is the fastest possible way to test primality.
You might have asked: What is the fastest known way to find a prime number in range.
The answer to that would be: it depends. The complexity of some algorithms grows asymptotically slower than that of others, but that is irrelevant if the input numbers are small. There are probabilistic methods that are very fast for some numbers, but have problematic cases where they are slower than deterministic methods.
Your input numbers are small, because they are of type int and therefore have quite limited range. With small numbers, a simple algorithm may be faster than a more complex one. To find out which algorithm is fastest for your use case, you must benchmark them.
I recommend starting with the Sieve of Eratosthenes, since it is asymptotically faster than your naïve approach but also easy to implement (pseudocode courtesy of Wikipedia):
Input: an integer n > 1
Let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.

for i = 2, 3, 4, ..., not exceeding √n:
    if A[i] is true:
        for j = i², i²+i, i²+2i, i²+3i, ..., not exceeding n:
            A[j] := false

Output: all i such that A[i] is true.
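A direct C++ translation of that pseudocode might look like this (a sketch; writePrimesUpTo is my name for it, it writes to the stream the way the OP's original function does, and the begin/end range can be applied by filtering the output):

#include <fstream>
#include <vector>

void writePrimesUpTo(int n, std::ofstream& file)
{
    std::vector<bool> A(n + 1, true);  // A[i] true means i is still presumed prime
    for (long long i = 2; i * i <= n; ++i)             // i not exceeding sqrt(n)
        if (A[i])
            for (long long j = i * i; j <= n; j += i)  // i^2, i^2+i, i^2+2i, ...
                A[j] = false;
    for (int i = 2; i <= n; ++i)       // output all i such that A[i] is true
        if (A[i])
            file << i << " \n";
}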

Optimize counting sort?

Given that the input will be N numbers from 0 to N (with duplicates), how can I optimize the code below for both small and big arrays?
void countingsort(int* input, int array_size)
{
    int max_element = array_size; // because no number will be > N
    int *CountArr = new int[max_element + 1]();
    for (int i = 0; i < array_size; i++)
        CountArr[input[i]]++;
    for (int j = 0, outputindex = 0; j <= max_element; j++)
        while (CountArr[j]--)
            input[outputindex++] = j;
    delete [] CountArr;
}
Having a stable sort is not a requirement.
edit: In case it's not clear, I am talking about optimizing the algorithm.
IMHO there's nothing wrong here. I highly recommend this approach when max_element is small and the numbers sorted are dense (i.e. consecutive, without large gaps) and greater than or equal to zero.
A small tweak: I'd replace new / delete and just declare a fixed-size array on the stack, e.g. 256 elements when max_element is small.
int CountArr[256] = { }; // Declare and initialize with zeroes
As you bend these rules (sparse data, negative numbers), you'd start struggling with this approach. You would need to find a suitable hashing function to remap the numbers to indices in your counting array, and the more complex the hashing becomes, the more the advantage over well-established sorting algorithms diminishes.
In terms of complexity this cannot be beaten. It's O(N) and beats standard O(N log N) sorting by exploiting the extra knowledge that 0 <= x <= N. You cannot go below O(N) because you need to sweep through the input array at least once.
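For the sparse/negative case mentioned above, the simplest remap is an offset by the minimum element. A sketch (my own illustration, still O(N + K) where K is the value range):

#include <algorithm>
#include <vector>

void countingsortOffset(int* input, int array_size)
{
    if (array_size == 0) return;
    auto mm = std::minmax_element(input, input + array_size);
    int min_v = *mm.first;
    int range = *mm.second - min_v + 1;    // K = value range
    std::vector<int> count(range, 0);
    for (int i = 0; i < array_size; i++)
        count[input[i] - min_v]++;         // shift values into [0, range)
    for (int v = 0, out = 0; v < range; v++)
        while (count[v]--)
            input[out++] = v + min_v;      // shift back on output
}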

Graph traversal of n steps

Given a simple undirected graph on the vertices A, B, C and D:
Starting in D, A, B or C (V_start), I have to calculate how many possible paths of n steps there are from the starting point (V_start) back to the starting point (V_start), where each edge and vertex can be visited an unlimited number of times.
I was thinking of doing a depth-first search, stopping when steps > n || (steps == n && vertex != V_start); however, this becomes rather expensive if, for instance, n = 1000000. My next thought was to combine DFS with dynamic programming, but this is where I'm stuck.
(This is not homework, just me getting stuck playing around with graphs and algorithms for the sake of learning.)
How would I go about solving this in a reasonable time with a large n?
This task is solved by matrix multiplication.
Create an n x n matrix containing 0s and 1s (mat[i][j] = 1 if there is an edge from i to j). Raise this matrix to the k-th power (consider using fast matrix exponentiation). Then cell mat[i][j] holds the number of paths of length k starting from i and ending in j.
NOTE: Fast matrix exponentiation is basically the same as fast exponentiation of numbers; you just multiply matrices instead of numbers.
NOTE 2: Let's assume n is the number of vertices in the graph. Then the algorithm proposed here runs in time complexity O(log k * n^3) and has memory complexity O(n^2). You can improve it a bit more if you use optimized matrix multiplication as described here; the time complexity then becomes O(log k * n^(log2 7)).
EDIT: As requested by Antoine, I include an explanation of why this algorithm actually works:
I will prove the algorithm by induction. The base of the induction is obvious: the initial matrix holds the number of paths of length 1.
Assume that when the matrix is raised to the power k, mat_pow_k[i][j] holds the number of paths of length k between i and j (here mat_pow_k denotes the matrix raised to the k-th power).
Now consider the next step, k + 1. Obviously, every path of length k + 1 consists of a prefix of length k plus one more edge, which means the number of paths of length k + 1 can be calculated by
num_paths(x, y, k + 1) = sum_{i=0}^{n-1} mat_pow_k[x][i] * mat[i][y]
Again, n is the number of vertices in the graph. This might take a while to digest, but the matrix mat has 1 in cell mat[i][y] only if there is a direct edge between i and y, and we count all possible length-k prefixes ending at such an edge to form paths of length k + 1.
But this sum is exactly how the (k + 1)-st power of mat is computed, which proves the induction step and my statement.
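To make the scheme concrete, here is a minimal sketch of fast matrix exponentiation in C++ (my own illustration; for a real input the counts would overflow quickly, so one would normally reduce modulo some M):

#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::uint64_t>>;

Matrix multiply(const Matrix& a, const Matrix& b)
{
    std::size_t n = a.size();
    Matrix c(n, std::vector<std::uint64_t>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t l = 0; l < n; ++l)   // i-l-j order is cache friendly
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][l] * b[l][j];
    return c;
}

// Raise the adjacency matrix to the k-th power with O(log k) multiplications.
Matrix matrixPower(Matrix base, std::uint64_t k)
{
    std::size_t n = base.size();
    Matrix result(n, std::vector<std::uint64_t>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        result[i][i] = 1;                     // start from the identity matrix
    while (k > 0) {
        if (k & 1)
            result = multiply(result, base);
        base = multiply(base, base);
        k >>= 1;
    }
    return result;
}

// Closed walks of length k from V_start back to V_start:
// matrixPower(adjacency, k)[V_start][V_start]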
It's quite like a dynamic programming problem:
define f[v][m] to be the number of paths of m steps from the starting point to vertex v
for every vertex v and each vertex k adjacent to it, you have the recurrence: f[k][m+1] = f[k][m+1] + f[v][m]
at initialization, all f[v][m] are 0, except f[starting_point][0] = 1
so you can calculate the final result f[starting_point][n]
pseudo code:
memset(f, 0, sizeof(f));
f[starting_point][0] = 1;
for (int step = 0; step < n; ++step) {
    for (int point = 0; point < point_num; ++point) {
        for (int next_point = 0; next_point < point_num; ++next_point) {
            if (adjacent[point][next_point]) {
                f[next_point][step + 1] += f[point][step];
            }
        }
    }
}
return f[starting_point][n];