I have this code to find prime numbers:
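#include <fstream>

void writePrimesToFile(int begin, int end, std::ofstream& file)
{
    if (begin <= 2 && end > 2)
        file << 2 << "\n";       // 2 is the only even prime
    if (begin % 2 == 0)
        begin++;                 // the loop below tests odd numbers only
    if (begin < 3)
        begin = 3;               // 1 is not prime
    for (int i = begin; i < end; i += 2)
    {
        bool isPrime = true;
        for (int j = 2; j < i; j++)
            if (i % j == 0)
            {
                isPrime = false;
                break;
            }
        if (isPrime)
            file << i << "\n";
    }
}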
Is there a faster way to do it?
I tried googling for a faster way, but it's all math and I don't understand how to turn it into code.
Is there a faster way to do it?
Yes. There are faster primality test algorithms.
What is the fastest way to find a prime number in a range?
No one knows. If someone knows, then that person is guarding a massively important secret. No one has been able to prove that any of the known techniques is the fastest possible way to test primality.
You might have asked: What is the fastest known way to find a prime number in a range?
The answer to that would be: it depends. The complexity of some algorithms grows asymptotically slower than that of other algorithms, but that is irrelevant if the input numbers are small. There are probabilistic methods that are very fast for some numbers but have problematic cases where they are slower than deterministic methods.
Your input numbers are small, because they are of type int and therefore have quite limited range. With small numbers, a simple algorithm may be faster than a more complex one. To find out which algorithm is fastest for your use case, you must benchmark them.
I recommend starting with the Sieve of Eratosthenes, since it is asymptotically faster than your naïve approach but also easy to implement (pseudocode courtesy of Wikipedia):
Input: an integer n > 1
Let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.
for i = 2, 3, 4, ..., not exceeding √n:
    if A[i] is true:
        for j = i², i²+i, i²+2i, i²+3i, ..., not exceeding n:
            A[j] := false
Output: all i such that A[i] is true.
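A possible C++ translation of that pseudocode, kept close to the question's writePrimesToFile interface, is sketched here. It assumes end is small enough to hold one bit per number in a std::vector<bool>; primesInRange is an illustrative helper name, not part of the original code.

```cpp
#include <fstream>
#include <vector>

// Sieve of Eratosthenes: returns all primes p with begin <= p < end.
std::vector<int> primesInRange(int begin, int end)
{
    std::vector<int> primes;
    if (end <= 2)
        return primes;                       // no primes below 2
    // isPrime[i] stays true until i is crossed out; indices 0..end-1
    std::vector<bool> isPrime(static_cast<std::size_t>(end), true);
    isPrime[0] = isPrime[1] = false;
    for (long long i = 2; i * i < end; ++i) {
        if (!isPrime[i])
            continue;
        // start at i*i: smaller multiples were crossed out by smaller primes
        for (long long j = i * i; j < end; j += i)
            isPrime[j] = false;
    }
    for (int i = (begin < 0 ? 0 : begin); i < end; ++i)
        if (isPrime[i])
            primes.push_back(i);
    return primes;
}

// Same interface as the question's function, backed by the sieve.
void writePrimesToFile(int begin, int end, std::ofstream& file)
{
    for (int p : primesInRange(begin, end))
        file << p << "\n";
}
```

This sieves everything below end even when begin is large; a segmented sieve would avoid that, but for int-sized ranges this simple version seems a reasonable first candidate to benchmark against the original loop.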
I have an external collection containing n elements, from which I want to select some number (k) of elements at random, writing the indices of those elements to a serialized data file. I want the indices to be output in strictly ascending order, with no duplicates. Both n and k may be quite large, and it is generally not feasible to simply store entire arrays of that size in memory.
The first algorithm I came up with was to pick a random number r[0] from 1 to n-k, and then pick successive random numbers r[i] from r[i-1]+1 to n-k+i, only needing to store two entries for 'r' at any one time. However, a fairly simple analysis reveals that the probability of selecting small numbers is inconsistent with what it would be if the entire set were equally distributed. For example, if n were a billion and k were half a billion, the probability of selecting the first entry with the approach I've just described is very tiny (1 in half a billion), whereas in reality, since half of the entries are being selected, the first should be selected 50% of the time. Even if I use external sorting to sort k random numbers, I would have to discard any duplicates and try again. As k approaches n, the number of retries would keep growing, with no guarantee of termination.
I would like to find an O(k) or O(k log k) algorithm to do this, if it is at all possible. The implementation language I will be using is C++11, but descriptions in pseudocode may still be helpful.
If in practice k has the same order of magnitude as n, perhaps a very straightforward O(n) algorithm will suffice:
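#include <cassert>
#include <iostream>
#include <random>

void print_ascending_sample(int n, int k, std::mt19937& engine)
{
    assert(k <= n);
    std::uniform_real_distribution<double> rnd(0.0, 1.0);
    for (int i = 0; i < n; i++) {
        // k is the number of picks still needed: keep i with probability k / (n - i)
        if (rnd(engine) * (n - i) < k) {
            std::cout << i << '\n';
            k--;
        }
    }
}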
It produces all ascending sequences with equal probability.
You can solve this recursively in O(k log k) if you partition in the middle of your range, and randomly sample from the hypergeometric probability distribution to choose how many values lie above and below the middle point (i.e. the values of k for each subsequence), then recurse for each:
#include <iostream>
#include <random>

// Samples the hypergeometric distribution: returns the number of "successes"
// in n draws without replacement from a population of N with K possible
// successes (similar to scipy.stats.hypergeom.rvs in Python).
// In this case, "success" means the selected value lies below the midpoint.
// The generator is passed by reference so successive calls continue one random stream.
int sample_hypergeometric(int n, int K, int N, std::mt19937& generator)
{
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    int successes = 0;
    for (int trial = 0; trial < n; trial++)
    {
        // this draw is a success with probability K / N
        if ((int)(distribution(generator) * N) < K)
        {
            successes++;
            K--;
        }
        N--;
    }
    return successes;
}

// Outputs k distinct values from [start, start + n) in ascending order.
void select_k_from_n(int start, int k, int n, std::mt19937& generator)
{
    if (k == 0)
        return;
    if (k == 1)
    {
        std::uniform_int_distribution<int> pick(0, n - 1);
        std::cout << start + pick(generator) << '\n';
        return;
    }
    // find the number of results below the midpoint:
    int k1 = sample_hypergeometric(k, n >> 1, n, generator);
    select_k_from_n(start, k1, n >> 1, generator);
    select_k_from_n(start + (n >> 1), k - k1, n - (n >> 1), generator);
}
Sampling from the binomial distribution could also be used to approximate the hypergeometric distribution with p = (n >> 1) / n, rejecting samples where k1 > (n >> 1).
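The binomial shortcut could look roughly like this sketch, which uses std::binomial_distribution with p = (n >> 1) / n and redraws while the result is infeasible. Besides the k1 > (n >> 1) case, it also rejects the mirror case (more than n - (n >> 1) values above the midpoint), which the recursion needs as well. sample_k1_binomial is an illustrative name, and it assumes 0 <= k <= n and n > 0.

```cpp
#include <random>

// Approximates the hypergeometric split with k1 ~ Binomial(k, (n >> 1) / n),
// redrawing until neither half receives more values than it holds.
int sample_k1_binomial(int k, int n, std::mt19937& gen)
{
    const int half = n >> 1;
    std::binomial_distribution<int> binom(k, static_cast<double>(half) / n);
    int k1;
    do {
        k1 = binom(gen);
    } while (k1 > half || k - k1 > n - half);
    return k1;
}
```

When k is close to n, most draws are rejected (for k == n only k1 == n >> 1 is feasible), so this approximation fits best when k is much smaller than n.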
As mentioned in my comment, use a std::set<int> to store the randomly generated integers such that the resulting container is inherently sorted and contains no duplicates. Example code snippet:
#include <iostream>
#include <random>
#include <set>

int main() {
    std::set<int> random_set;
    std::random_device rd;
    std::mt19937 mt_eng(rd());

    // min and max of random set range (both inclusive)
    const int m = 0;   // min
    const int n = 100; // max
    std::uniform_int_distribution<> dist(m, n);

    // number to generate (must not exceed n - m + 1)
    const std::size_t k = 50;
    // the set silently rejects duplicates, so keep drawing until it holds k values
    while (random_set.size() < k)
        random_set.insert(dist(mt_eng));

    // a std::set iterates in ascending order
    for (int value : random_set)
        std::cout << value << '\n';
}
Assuming that you can't store k random numbers in memory, you'll have to generate the numbers in strictly ascending order. One way to do it would be to generate a number between 0 and n/k. Call that number x. The next number you have to generate is between x+1 and x+(n-x)/(k-1). Continue in that fashion until you've selected k numbers.
Basically, you're dividing the remaining range by the number of values left to generate, and then generating a number in the first section of that range.
For example, suppose you want to generate 3 numbers between 0 and 99, inclusive. First you generate a number between 0 and 33. Say you pick 10.
Now you need a number between 11 and 99. The remaining range consists of 89 values, and you have two values left to pick. 89/2 = 44, so you need a number between 11 and 54. Say you pick 36.
Your remaining range is from 37 to 99, and you have one number left to choose. So pick a number at random between 37 and 99.
This won't give you a uniform distribution over all possible selections, as once you choose a number it's impossible to get a number less than that in a subsequent choice. But it might be good enough for your purposes.
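This C++ sketch shows the basic idea (values are 1-based here, from 1 to n):
#include <iostream>
#include <random>

// Outputs k strictly ascending values in 1..n (requires k <= n).
// The result is NOT uniformly distributed over all k-subsets.
void pick_k_from_n(int n, int k, std::mt19937& gen)
{
    int last_k = 0;
    for (int num_left = k; num_left > 0; --num_left)
    {
        // divide the remaining range into num_left partitions
        int range_size = (n - last_k) / num_left;
        // pick a number in the first partition
        std::uniform_int_distribution<int> dist(0, range_size - 1);
        int r = dist(gen) + last_k + 1;
        std::cout << r << '\n';
        last_k = r;
    }
}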
Note that this takes O(k) time and requires O(1) extra space.
You can do it in O(k) time with Floyd's algorithm (not Floyd-Warshall, which is a shortest-path algorithm). The only data structure you need is a table that tells you whether or not a number has already been selected: a bit table with one bit per number, or a hash table of the selected numbers. Hash-table lookups are expected O(1), so this will not be a burden, and the table can be kept in memory even for very large n (if n is truly huge, you'll have to use a B-tree or Bloom filter or something).
To select k items from among n:
for j = n-k+1 to n:
    select random x from 1 to j
    if x is already in hash:
        insert j into hash
    else:
        insert x into hash
That's it. At the end, your hash table will contain a uniformly selected sample of k items from among n. Read them out in order (you may have to pick a type of hash table that allows that).
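A C++ sketch of that pseudocode follows. It swaps the hash table for std::set, an ordered tree, so the sample can be read out in ascending order directly; the trade-off is O(k log k) time instead of O(k). floyd_sample is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <set>

// Floyd's algorithm: a uniformly random k-element subset of 1..n.
// std::set keeps the values sorted, so iterating it gives ascending order.
std::set<std::int64_t> floyd_sample(std::int64_t n, std::int64_t k, std::mt19937_64& gen)
{
    assert(0 <= k && k <= n);
    std::set<std::int64_t> chosen;
    for (std::int64_t j = n - k + 1; j <= n; ++j) {
        std::uniform_int_distribution<std::int64_t> dist(1, j);
        const std::int64_t x = dist(gen);
        // if x is already taken, insert j instead: every earlier pick is < j
        if (!chosen.insert(x).second)
            chosen.insert(j);
    }
    return chosen;
}
```

For output that must not be held in memory at all, this still needs O(k) space for the set, so it fits the case where k values (but not n) fit in memory.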
Could you adjust each ascending index selection in a way that compensates for the probability distortion you are describing?
I'm not a statistician, but my guess would be that if you pick a random number r between 0 and 1 (which you'd scale to the full remaining index range after the adjustment), you might be able to adjust it by computing r^x (which keeps the range in 0..1 but increases the probability of smaller numbers), with x chosen by solving the equation for the probability of the first entry?
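In the continuous case this kind of adjustment can be solved exactly: the smallest of m independent uniform values on [0, 1) is distributed like 1 - V^(1/m) with V uniform on (0, 1], so the exponent is 1/m, applied to 1 - r rather than to r. The sketch below uses that to emit k sorted uniform values in one pass with O(1) state. Mapping them to distinct integer indices is not exact (two values can land on the same index), so this only illustrates the idea; sorted_uniforms and the emit callback are illustrative names.

```cpp
#include <cmath>
#include <random>

// Emits the sorted order of k independent uniform values on [0, 1), keeping
// only the previous value as state. With m values still to go, the next one is
// the minimum of m uniforms on [lo, 1), drawn as lo + (1 - lo) * (1 - V^(1/m)).
template <typename Emit>
void sorted_uniforms(int k, std::mt19937& gen, Emit emit)
{
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double lo = 0.0;
    for (int m = k; m > 0; --m) {
        const double v = 1.0 - unif(gen);   // v is in (0, 1]
        lo += (1.0 - lo) * (1.0 - std::pow(v, 1.0 / m));
        emit(lo);
    }
}
```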
Here's an O(k log k + √n)-time algorithm that uses O(√n) words of space. This can be generalized to an O(k + n^(1/c))-time, O(n^(1/c))-space algorithm for any integer constant c.
For intuition, imagine a simple algorithm that uses (e.g.) Floyd's sampling algorithm to generate k of n elements and then radix sorts them in base √n. Instead of remembering what the actual samples are, we'll do a first pass where we run a variant of Floyd's where we remember only the number of samples in each bucket. The second pass is, for each bucket in order, to randomly resample the appropriate number of elements from the bucket range. There's a short proof involving conditional probability that this gives a uniform distribution.
# untested Python code for illustration
# b is the number of buckets (e.g., b ~ sqrt(n)); bucket c is the contiguous
# range [c*s, (c+1)*s), so visiting buckets in order visits values in order
import random

def first_pass(n, k, b):
    s = -(-n // b)  # bucket size, ceil(n / b)
    counts = [0] * b  # list of b zeros
    for j in range(n - k, n):
        t = random.randrange(j + 1)
        if t % s >= counts[t // s]:  # intuitively, "t is not in the set"
            counts[t // s] += 1
        else:
            counts[j // s] += 1
    return counts
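To make both passes concrete (the Python covers only the first), here is a C++ sketch. It assumes buckets are contiguous ranges of size s, with s around √n, so that visiting buckets in order yields ascending output; inside each bucket it reuses Floyd's algorithm and sorts that bucket's values. first_pass, second_pass and the emit callback are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

using i64 = std::int64_t;

// Pass 1: Floyd's sampling, remembering only how many samples land in each
// bucket. Bucket c is the contiguous range [c*s, min((c+1)*s, n)).
std::vector<i64> first_pass(i64 n, i64 k, i64 s, std::mt19937_64& gen)
{
    assert(0 <= k && k <= n && s > 0);
    std::vector<i64> counts(static_cast<std::size_t>((n + s - 1) / s), 0);
    for (i64 j = n - k; j < n; ++j) {
        std::uniform_int_distribution<i64> dist(0, j);
        const i64 t = dist(gen);
        // Treat each bucket as holding its lowest counts[c] slots: this test
        // succeeds with the same probability as "t is not in the set".
        if (t % s >= counts[t / s])
            ++counts[t / s];
        else
            ++counts[j / s];
    }
    return counts;
}

// Pass 2: visit buckets in ascending order, draw counts[c] distinct offsets
// inside each bucket (Floyd again), sort them, and emit the values.
template <typename Emit>
void second_pass(i64 n, i64 s, const std::vector<i64>& counts,
                 std::mt19937_64& gen, Emit emit)
{
    for (std::size_t c = 0; c < counts.size(); ++c) {
        const i64 lo = static_cast<i64>(c) * s;
        const i64 size = std::min(s, n - lo);
        std::unordered_set<i64> chosen;
        for (i64 j = size - counts[c]; j < size; ++j) {
            std::uniform_int_distribution<i64> dist(0, j);
            const i64 t = dist(gen);
            if (!chosen.insert(t).second)
                chosen.insert(j);
        }
        std::vector<i64> offsets(chosen.begin(), chosen.end());
        std::sort(offsets.begin(), offsets.end());
        for (i64 off : offsets)
            emit(lo + off);
    }
}
```

For example, call first_pass(n, k, s, gen) with s = ceil(sqrt(n)), then pass its result to second_pass with a callback that writes each index to the output file. Only the counts vector (about n / s entries) and one bucket's values (at most s) are in memory at a time.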
Is there a suitable algorithm that allows a program to search an unsorted matrix for the biggest prime number in it? The matrix is of size m*n and may contain other prime numbers as well as non-primes. The search must find the biggest prime.
I have studied divide-and-conquer algorithms, binary trees, and step-wise searches, but all of these deal with sorted matrices.
First of all, it doesn't matter whether you use an m * n matrix or a vector with m * n elements. Generally speaking, you will have to visit each matrix element at least once, as it is not sorted. There are a few hints to make the process faster.
If it is a big matrix, you should visit elements row by row (and not column by column), because a C or C++ two-dimensional array is stored row by row in memory, so elements from the same row will likely be in the cache once you access one of them.
Testing a number's primality is the most costly part of your task, so if the numbers in the matrix are not too big, you can use the Sieve of Eratosthenes to precompute which numbers are prime. https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
If you don't use the sieve, it may be beneficial to sort your numbers beforehand so that you can test them from the greatest to the smallest. In that case, your algorithm can stop as soon as it finds the first prime. If you don't sort them, you will have to test all the numbers, which is probably the slowest method.
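Combining the sieve hint with the row-by-row hint, a sketch could look like this. It assumes the matrix is a std::vector<std::vector<int>> and that its largest value is small enough to sieve in memory (one bit per number up to that value); biggestPrimeInMatrix is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// Returns the largest prime in the matrix, or -1 if there is none.
int biggestPrimeInMatrix(const std::vector<std::vector<int>>& matrix)
{
    int maxValue = 0;
    for (const auto& row : matrix)
        for (int v : row)
            maxValue = std::max(maxValue, v);
    if (maxValue < 2)
        return -1;
    // sieve of Eratosthenes up to the largest value
    std::vector<bool> isPrime(static_cast<std::size_t>(maxValue) + 1, true);
    isPrime[0] = isPrime[1] = false;
    for (long long i = 2; i * i <= maxValue; ++i)
        if (isPrime[i])
            for (long long j = i * i; j <= maxValue; j += i)
                isPrime[j] = false;
    int biggest = -1;
    for (const auto& row : matrix)          // row by row: cache-friendly order
        for (int v : row)
            if (v > biggest && v >= 2 && isPrime[v])
                biggest = v;
    return biggest;
}
```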
You could do this:
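// isPrime() is a primality test you supply (e.g. trial division or a sieve lookup)
int biggestPrime = -1; // -1 means "no prime found"
for (int i = 0; i < m; i++)
{
    for (int j = 0; j < n; j++)
    {
        // check the cheap comparison first so isPrime() runs less often
        if (array[i][j] > biggestPrime && isPrime(array[i][j]))
        {
            biggestPrime = array[i][j];
        }
    }
}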
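The prime check in that loop needs some primality test. A minimal trial-division version is sketched here; isPrime is a hypothetical helper name, and this is only reasonable when the matrix is small or the values are modest.

```cpp
// Trial division: tries 2, then odd divisors d while d * d <= x.
bool isPrime(int x)
{
    if (x < 2)
        return false;
    if (x % 2 == 0)
        return x == 2;
    for (int d = 3; d <= x / d; d += 2)   // d <= x / d avoids overflow in d * d
        if (x % d == 0)
            return false;
    return true;
}
```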
I need to write a program that could find the sums of the prime factors of the first 1000 numbers, check if the sums are prime, and print them if they are.
I have some pseudocode I've written and a working program for generating prime numbers that I'm trying to build on. I'm learning from the book "Jumping into C++", by the way (it's a Practice Problem in the book).
This is the pseudo-code:
// for every 1000 of the first bunch of numbers, check if the number is prime
// if (number isPrime())
// use expression <number_being_checked % number_being_compared_against == 0;>
// if (number_being_checked % number_being_compared_against == 0)
// for every number found from dividing the two numbers, check if number is prime
// add up prime numbers and check if the sums are prime
// else, return false in bool function isFactorPrime() (if I write such a function)
And this is the main() function right now:
#include <iostream>

bool isFactorPrime(int number); // not written yet

int main ()
{
    for (int i = 0; i < 1000; i++)
    {
        if (isFactorPrime(i))
        {
            std::cout << i << '\n';
        }
    }
}
The issue I'm having right now is what I should add to i (i + some_variable?) to get the sum that I can then check for primality. Should I make an inner for loop and add its loop variable to i, with the expression i + j? I'd assign j the value of the number being checked, which is the part I can't figure out how to do. I'm not taking the user's input; I'm just iterating over the first 1000 numbers and checking them.
There are also some other Practice Problems in the book I need to ask help with, but for now I'll go with just this one.
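One way to structure the missing piece is sketched here, assuming "sum of the prime factors" counts repeated factors (so 12 = 2 * 2 * 3 gives 7). Instead of adding a second loop variable to i, the sketch divides the prime factors out of the number and adds each one to a running sum. sumOfPrimeFactors and isPrime are illustrative helper names; isFactorPrime matches the function main() calls.

```cpp
// Trial-division primality test.
bool isPrime(int x)
{
    if (x < 2)
        return false;
    for (int d = 2; d <= x / d; ++d)
        if (x % d == 0)
            return false;
    return true;
}

// Sum of the prime factors of number, counting repeats.
int sumOfPrimeFactors(int number)
{
    int sum = 0;
    for (int p = 2; p <= number / p; ++p)
        while (number % p == 0) {   // divide out each prime factor fully
            sum += p;
            number /= p;
        }
    if (number > 1)
        sum += number;              // whatever is left is a prime factor
    return sum;
}

// True when number >= 2 and the sum of its prime factors is prime.
bool isFactorPrime(int number)
{
    return number >= 2 && isPrime(sumOfPrimeFactors(number));
}
```

With these definitions, the main() from the question prints each i below 1000 whose prime-factor sum is prime; for example 12 qualifies, since 2 + 2 + 3 = 7.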
My teacher gave me this:
n <= 10^6;
an array of n integers: a1..an (ai <= 10^9);
find all the prime numbers among them.
He said something about the Sieve of Eratosthenes, and I read about it, and about wheel factorization too, but I still can't figure out how to get the program (fpc) to run in 1 s.
As far as I know it's impossible, but I still want to know your opinion.
Also, with wheel factorization, a 2*3 wheel will treat 25 as a prime number. Is there a way to find the first number that the wheel wrongly treats as prime?
For example, with a 2*3*5 wheel, how do I find the first composite number treated as a prime number?
Please help, and sorry for my bad English.
A proper Sieve of Eratosthenes should find the primes less than a billion in about a second; it's possible. If you show us your code, we'll be happy to help you find what is wrong.
The smallest composite not marked by a 2,3,5-wheel is 49: the smallest prime not in the wheel is 7, and 7 * 7 = 49.
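The same reasoning works for any wheel: the smallest integer above 1 that shares no factor with the wheel primes is itself a prime q, and every composite that survives the wheel has all its prime factors at least q, so the first such composite is q * q. A small sketch, assuming the wheel is given as a list of primes (firstCompositeMissedByWheel is an illustrative name):

```cpp
#include <vector>

// Smallest composite that a wheel built from wheelPrimes (each >= 2) does not
// eliminate: the square of the smallest integer > 1 sharing no factor with them.
long long firstCompositeMissedByWheel(const std::vector<int>& wheelPrimes)
{
    long long q = 2;
    while (true) {
        bool coprime = true;
        for (int p : wheelPrimes)
            if (q % p == 0) {
                coprime = false;
                break;
            }
        if (coprime)
            return q * q;   // q is prime, and every survivor's factors are >= q
        ++q;
    }
}
```

For the wheels in the question this gives 25 for 2*3 and 49 for 2*3*5.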
I implemented it, and it finds the primes up to 1000000 in a few milliseconds (without displaying them).
Declare an array a of n + 1 bools (zero-based, so indices 0..n). At the beginning, elements 0 and 1 are false and all others are true (false means not prime).
The algorithm looks like this:
i = 2;
while i * i <= n
    if a[i] == true
        j = i * i;
        while j <= n
            a[j] = false;
            j = j + i;
    i = i + 1;
The loop condition is i * i <= n because crossing out starts at i * i (smaller multiples of i have already been crossed out by smaller primes), so i must not exceed √n. This removes all numbers that are multiples of the primes up to √n.
Time complexity is O(n log log n).
If you want to display the primes, print the indices whose values in the array are true.
Factorization is useful if you want to find, e.g., all semiprimes from 0 to n (products of two prime numbers). Then you find the smallest prime divisor of every number from 0 to n/2 and check, for each number, whether it has a prime divisor and whether the number divided by that divisor has no divisors of its own. If so, it is a semiprime. My program written like that ran 8 times faster than first finding all primes, then multiplying them and saving the results in an array.
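A sketch of that smallest-prime-divisor idea, under my reading of it: store the smallest prime factor of every number up to n, and m is a semiprime when m divided by that factor is itself prime. For simplicity this version computes the table up to n rather than n/2; smallestPrimeFactors and semiprimesUpTo are illustrative names.

```cpp
#include <vector>

// spf[m] = smallest prime factor of m for m >= 2 (0 for m = 0, 1).
std::vector<int> smallestPrimeFactors(int n)
{
    std::vector<int> spf(static_cast<std::size_t>(n) + 1, 0);
    for (int i = 2; i <= n; ++i)
        if (spf[i] == 0)                        // no smaller factor: i is prime
            for (long long j = i; j <= n; j += i)
                if (spf[j] == 0)
                    spf[j] = i;
    return spf;
}

// All semiprimes m <= n (m = p * q with p, q prime, p == q allowed).
std::vector<int> semiprimesUpTo(int n)
{
    std::vector<int> result;
    if (n < 4)
        return result;
    const std::vector<int> spf = smallestPrimeFactors(n);
    for (int m = 4; m <= n; ++m) {
        const int rest = m / spf[m];
        if (rest > 1 && spf[rest] == rest)      // m / spf(m) is itself prime
            result.push_back(m);
    }
    return result;
}
```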