I have n elements stored in an array and a number k of elements to choose, which gives (n choose k) possible subsets.
I have to find all the possible combinations of k elements from the array of length n and, for each set (of length k), perform some calculations on the chosen elements.
I have written a recursive algorithm (in C++) that works fine, but for large inputs it crashes after running out of heap space.
How can I fix the problem? How can I enumerate all the sets of n choose k for large n and k?
Is there any library for C++ that can help me?
I know the number of combinations explodes, but I would like to write the best code possible in order to handle the biggest numbers I can.
Approximately what are the biggest values of n and k beyond which it becomes unfeasible?
I am only asking for the best algorithm, not for unfeasible space/work.
Here is my code:
#include <iostream>
#include <vector>
using namespace std;

vector<int> people;
vector<int> combination;

void pretty_print(const vector<int>& v)
{
    static int count = 0;
    cout << "combination no " << (++count) << ": [ ";
    for (size_t i = 0; i < v.size(); ++i) { cout << v[i] << " "; }
    cout << "] " << endl;
}

void go(int offset, int k)
{
    if (k == 0) {
        pretty_print(combination);
        return;
    }
    for (int i = offset; i <= (int)people.size() - k; ++i) {
        combination.push_back(people[i]);
        go(i + 1, k - 1);
        combination.pop_back();
    }
}

int main() {
    int n = #, k = #;
    for (int i = 0; i < n; ++i) { people.push_back(i + 1); }
    go(0, k);
    return 0;
}
Here is a non-recursive algorithm:
const int n = ###;
const int k = ###;
int currentCombination[k];
for (int i = 0; i < k; i++)
    currentCombination[i] = i;
currentCombination[k-1] = k-1-1; // start from the real first combination with the last
                                 // number decreased by 1, since the loop increments it
do
{
    if (currentCombination[k-1] == (n-1)) // the last number has reached its maximum, so "carry"
    {
        int i = k-1-1;
        while (currentCombination[i] == (n-k+i))
            i--;
        currentCombination[i]++;
        for (int j = (i+1); j < k; j++)
            currentCombination[j] = currentCombination[i]+j-i;
    }
    else
        currentCombination[k-1]++;

    for (int i = 0; i < k; i++)
        _tprintf(_T("%d "), currentCombination[i]);
    _tprintf(_T("\n"));
} while (! ((currentCombination[0] == (n-1-k+1)) && (currentCombination[k-1] == (n-1))) );
Your recursive algorithm might be blowing the stack. If you make it non-recursive, then that would help, but it probably won't solve the problem if your case is really 100 choose 10. You have two problems. Few, if any, computers in the world have 17+ terabytes of memory. Going through 17 trillion+ iterations to generate all the combinations will take way too long. You need to rethink the problem and either come up with an N choose K case that is more reasonable, or process only a certain subset of the combinations.
You probably do not want to be processing more than a billion or two combinations at the most - and even that will take some time. That translates to around 41 choose 10 to about 44 choose 10. Reducing either N or K will help. Try editing your question and posting the problem you are trying to solve and why you think you need to go through all of the combinations. There may be a way to solve it without going through all of the combinations.
If it turns out you do need to go through all those combinations, then maybe you should look into using a search technique like a genetic algorithm or simulated annealing. Both of these hill climbing search techniques provide the ability to search a large space in a relatively small time for a close to optimal solution, but neither guarantee to find the optimal solution.
You can use std::next_permutation() from <algorithm> to generate all possible combinations.
Here is some example code:
vector<bool> is_chosen(n, false);
fill(is_chosen.begin() + n - k, is_chosen.end(), true);
do
{
for(int i = 0; i < n; i++)
{
if(is_chosen[i])
cout << some_array[i] << " ";
}
cout << endl;
} while( next_permutation(is_chosen.begin(), is_chosen.end()) );
Don't forget to include <algorithm>.
As I said in a comment, it's not clear what you really want.
If you want to compute (n choose k) for relatively small values, say n, k < 100 or so, you may want to use a recursive method based on Pascal's triangle.
If n, k are large (say n=1000000, k=500000), you may be happy with an approximate result using Stirling's formula for the factorial: (n choose k) = exp(loggamma(n+1) - loggamma(k+1) - loggamma(n-k+1)), computing loggamma(x) via Stirling's formula (note the +1, since loggamma(n+1) = log(n!)).
If you want (n choose k) for all or many k but the same n, you can simply iterate over k and use (n choose k+1) = ((n choose k)*(n-k))/(k+1).
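For what it's worth, here is a small sketch of both of those ideas. The function names are mine; the exact version uses the multiplicative formula (the division is exact at every step, but the result overflows 64-bit integers once n gets much past 60 for central k), and the approximation uses std::lgamma from <cmath>:

#include <cmath>
#include <iostream>

// Exact value via the multiplicative formula.
unsigned long long choose(unsigned n, unsigned k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;                 // use the symmetry C(n,k) = C(n,n-k)
    unsigned long long result = 1;
    for (unsigned i = 0; i < k; ++i)
        result = result * (n - i) / (i + 1);  // divides exactly at each step
    return result;
}

// Floating-point approximation for large n, k, using log(n!) = lgamma(n + 1).
double choose_approx(double n, double k)
{
    return std::exp(std::lgamma(n + 1) - std::lgamma(k + 1) - std::lgamma(n - k + 1));
}

int main()
{
    std::cout << choose(10, 3) << "\n";           // 120
    std::cout << choose_approx(100, 50) << "\n";  // about 1.0089e+29
}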
I solved this problem but I got TLE (Time Limit Exceeded) on the online judge.
The output of the program is right, but I think the approach can be made more efficient.
The problem:
Given n integer numbers, count the number of ways in which we can choose two elements such that their absolute difference is less than 32.
In a more formal way, count the number of pairs (i, j) (1 ≤ i < j ≤ n) such that |V[i] - V[j]| < 32, where |X| is the absolute value of X.
Input
The first line of input contains one integer T, the number of test cases (1 ≤ T ≤ 128).
Each test case begins with an integer n (1 ≤ n ≤ 10,000).
The next line contains n integers (1 ≤ V[i] ≤ 10,000).
Output
For each test case, print the number of pairs on a single line.
My code in C++:
#include <cstdlib>
#include <iostream>
using namespace std;

int main() {
    int T, n, i, j, k, count;
    int a[10000];
    cin >> T;
    for (k = 0; k < T; k++)
    {
        count = 0;
        cin >> n;
        for (i = 0; i < n; i++)
        {
            cin >> a[i];
        }
        for (i = 0; i < n; i++)
        {
            for (j = i; j < n; j++)
            {
                if (i != j)
                {
                    if (abs(a[i] - a[j]) < 32)
                        count++;
                }
            }
        }
        cout << count << endl;
    }
    return 0;
}
I need help: how can I solve it with a more efficient algorithm?
Despite my previous (silly) answer, there is no need to sort the data at all. Instead you should count the frequencies of the numbers.
Then all you need to do is keep track of the number of viable numbers to pair with while iterating over the possible values. Sorry, no C++, but the Java should be readable as well:
int solve (int[] numbers) {
int[] frequencies = new int[10001];
for (int i : numbers) frequencies[i]++;
int solution = 0;
int inRange = 0;
for (int i = 0; i < frequencies.length; i++) {
if (i > 32) inRange -= frequencies[i - 32];
solution += frequencies[i] * inRange;
solution += frequencies[i] * (frequencies[i] - 1) / 2;
inRange += frequencies[i];
}
return solution;
}
#include <bits/stdc++.h>
using namespace std;
int a[10010];
int N;
int search (int x){
int low = 0;
int high = N;
while (low < high)
{
int mid = (low+high)/2;
if (a[mid] >= x) high = mid;
else low = mid+1;
}
return low;
}
int main() {   // handles a single test case; wrap the body in a loop over T for the full input
cin >> N;
for (int i=0 ; i<N ; i++) cin >> a[i];
sort(a,a+N);
long long ans = 0;
for (int i=0 ; i<N ; i++)
{
int t = search(a[i]+32);
ans += (t -i - 1);
}
cout << ans << endl;
return 0;
}
You can sort the numbers, and then use a sliding window. Starting with the smallest number, populate a std::deque with the numbers so long as they are no larger than the smallest number + 31. Then in an outer loop for each number, update the sliding window and add the new size of the sliding window to the counter. Update of the sliding window can be performed in an inner loop, by first pop_front every number that is smaller than the current number of the outer loop, then push_back every number that is not larger than the current number of the outer loop + 31.
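Here is a rough sketch of that sliding-window idea (names are mine, and it is arranged slightly differently: the deque holds the earlier values that are still within 31 of the current one, which yields the same pair count):

#include <algorithm>
#include <deque>
#include <vector>

// Counts pairs with difference < 32 using a sorted sliding window.
long long count_pairs_window(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    std::deque<int> window;                 // earlier values within 31 of the current one
    long long count = 0;
    for (int x : v)
    {
        while (!window.empty() && x - window.front() >= 32)
            window.pop_front();             // now too far away from x
        count += window.size();             // every remaining value pairs with x
        window.push_back(x);
    }
    return count;
}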
One faster solution would be to first sort the array, then iterate through the sorted array and for each element only visit the elements to the right of it until the difference exceeds 31.
Sorting can probably be done via counting sort (since you have 1 ≤ V[i] ≤ 10,000), so you get linear time for the sorting part. It might not be necessary though (maybe quicksort suffices in order to get all the points).
Also, you can do a trick for the inner loop (the "going to the right of the current element" part). Keep in mind that if S[i+k]-S[i]<32, then S[i+k]-S[i+1]<32, where S is the sorted version of V. With this trick the whole algorithm turns linear.
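A sketch of that sorted two-pointer idea (not verified against the judge); the right index j never moves backwards, which is what makes the scan linear after sorting:

#include <algorithm>
#include <vector>

long long count_pairs_two_pointers(std::vector<int> v)
{
    std::sort(v.begin(), v.end());          // or a counting sort, since 1 <= V[i] <= 10000
    long long count = 0;
    std::size_t j = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        if (j < i + 1) j = i + 1;
        while (j < v.size() && v[j] - v[i] < 32)
            ++j;                            // extend the window while the difference fits
        count += static_cast<long long>(j - i - 1);   // elements after i that pair with v[i]
    }
    return count;
}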
This can be done in a constant number of passes over the data, and the work is not affected by the size of the "interval" (in your case, 32).
This is done by populating an array where a[i] = a[i-1] + number_of_times_i_appears_in_the_data - informally, a[i] holds the total number of elements that are smaller than or equal to i.
Code (for a single test case):
#include <algorithm>
#include <iostream>

int main() {
    const int UPPER_LIMIT = 10001;
    const int K = 32;
    int frequencies[UPPER_LIMIT] = {0}; // O(U)
    int n;
    std::cin >> n;
    for (int i = 0; i < n; i++) { // O(n)
        int x;
        std::cin >> x;
        frequencies[x] += 1;
    }
    // prefix sums: frequencies[i] now holds how many elements are <= i
    for (int i = 1; i < UPPER_LIMIT; i++) { // O(U)
        frequencies[i] += frequencies[i-1];
    }
    int count = 0;
    for (int i = 1; i < UPPER_LIMIT; i++) { // O(U)
        int low_idx = std::max(i - K, 0);
        int number_of_elements_with_value_i = frequencies[i] - frequencies[i-1];
        if (number_of_elements_with_value_i == 0) continue;
        int number_of_elements_with_value_K_close_to_i =
            (frequencies[i-1] - frequencies[low_idx]);
        std::cout << "i: " << i << " number_of_elements_with_value_i: " << number_of_elements_with_value_i << " number_of_elements_with_value_K_close_to_i: " << number_of_elements_with_value_K_close_to_i << std::endl; // debug output
        count += number_of_elements_with_value_i * number_of_elements_with_value_K_close_to_i;
        // Finally, add the "duplicates" of i: the number of pairs that can be formed
        // among the elements equal to i, i.e. m*(m-1)/2
        count += number_of_elements_with_value_i * (number_of_elements_with_value_i - 1) / 2;
    }
    std::cout << count;
    return 0;
}
Working full example on IDEone.
You can sort and then use break to end the inner loop whenever the difference goes out of range.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int t;
    cin >> t;
    while (t--) {
        int n, c = 0;
        cin >> n;
        vector<int> ar(n);
        for (int i = 0; i < n; i++)
            cin >> ar[i];
        sort(ar.begin(), ar.end());
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (ar[j] - ar[i] < 32)
                    c++;
                else
                    break;          // sorted, so the difference only grows from here
            }
        }
        cout << c << endl;
    }
    return 0;
}
Or you can use a frequency ("hash") array over the value range, mark the occurrence of each element, and then, for each element x, look up how many marked values y satisfy |x - y| < 32.
A good approach here is to split the numbers into separate buckets:
constexpr int limit = 10000;
constexpr int diff = 32;
constexpr int bucket_num = (limit/diff)+1;
std::array<std::vector<int>,bucket_num> buckets;
cin>>n;
int number;
for(i=0;i<n;i++)
{
cin >> number;
buckets[number/diff].push_back(number%diff);
}
Obviously the numbers that are in the same bucket are close enough to each other to fit the requirement, so we can just count all the pairs:
int result = std::accumulate(buckets.begin(), buckets.end(), 0,
[](int s, vector<int>& v){ return s + (v.size()*(v.size()-1))/2; });
The numbers that are in non-adjacent buckets cannot form any acceptable pairs, so we can just ignore them.
This leaves the last corner case - adjacent buckets - which can be solved in many ways:
for(int i=0;i<bucket_num-1;i++)
if(buckets[i].size() && buckets[i+1].size())
result += adjacent_buckets(buckets[i], buckets[i+1]);
Personally I like the "occurrence frequency" approach on the one bucket scale, but there may be better options:
int adjacent_buckets(const vector<int>& bucket1, const vector<int>& bucket2)
{
std::array<int,diff> pairs{};
for(int number : bucket1)
{
for(int i=0;i<number;i++)
pairs[i]++;
}
return std::accumulate(bucket2.begin(), bucket2.end(), 0,
[&pairs](int s, int n){ return s + pairs[n]; });
}
This function first builds an array of "numbers from lower bucket that are close enough to i", and then sums the values from that array corresponding to the upper bucket numbers.
In general this approach has O(N) complexity, in the best case it will require pretty much only one pass, and overall should be fast enough.
Working Ideone example
This solution can be considered O(N) for reading the N input numbers, plus a constant amount of work (one pass over the 10001 possible values) to produce the answer:
#include <iostream>
using namespace std;
void solve()
{
int a[10001] = {0}, N, n, X32 = 0, ret = 0;
cin >> N;
for (int i=0; i<N; ++i)
{
cin >> n;
a[n]++;
}
for (int i=0; i<10001; ++i)
{
if (i >= 32)
X32 -= a[i-32];
if (a[i])
{
ret += a[i] * X32;
ret += a[i] * (a[i]-1)/2;
X32 += a[i];
}
}
cout << ret << endl;
}
int main()
{
int T;
cin >> T;
for (int i=0 ; i<T ; i++)
solve();
}
run this code on ideone
Solution explanation: a[i] represents how many times i was in the input series.
Then you go over the entire array, and X32 keeps track of the number of elements that are within range of i. The only really tricky part is counting properly when some value i is repeated multiple times: those repetitions contribute a[i] * (a[i]-1)/2 pairs among themselves. That's it.
You should start by sorting the input.
Then if your inner loop detects the distance grows above 32, you can break from it.
Thanks for everyone's efforts and the time spent solving this problem.
I appreciate all the attempts to solve it.
After testing the answers on the online judge, I found the right and most efficient solutions to be Stef's answer and AbdullahAhmedAbdelmonem's answer; pavel's solution is also right, but it's essentially the same as Stef's solution in C++ instead of Java.
Stef's code got an execution time of 358 ms on the Codeforces online judge and was accepted.
AbdullahAhmedAbdelmonem's code got an execution time of 421 ms on the Codeforces online judge and was also accepted.
If they add a detailed explanation of their algorithm, the bounty will go to one of them.
You can try your solution and submit it to the Codeforces online judge at this link, after choosing problem E, "Time Limit Exceeded?".
I also found a great and more understandable algorithm using a frequency array; its complexity is O(n).
In this algorithm you only need to take a specific range around each element inserted into the array, namely:
begin = element - 31
end = element + 31
and then, for each inserted element, count the number of pairs it forms within that range of the frequency array:
#include <iostream>
using namespace std;

int main() {
    int T, n, i, j, k, b, e, count;
    int v[10000];
    int freq[10001];
    cin >> T;
    for (k = 0; k < T; k++)
    {
        count = 0;
        cin >> n;
        for (i = 1; i <= 10000; i++)
        {
            freq[i] = 0;
        }
        for (i = 0; i < n; i++)
        {
            cin >> v[i];
        }
        for (i = 0; i < n; i++)
        {
            count = count + freq[v[i]];   // pairs formed with previously seen elements
            b = v[i] - 31;
            e = v[i] + 31;
            if (b <= 0)
                b = 1;
            if (e > 10000)
                e = 10000;
            for (j = b; j <= e; j++)      // mark the whole range this element can pair with
            {
                freq[j]++;
            }
        }
        cout << count << endl;
    }
    return 0;
}
Finally, I think the best approach to this kind of problem is to use a frequency array and count the number of pairs within the given range, because its time complexity is O(n).
I am pretty new to C++ and am trying to do some HackerRank challenges as a way to work on that.
Right now I am trying to solve Angry Children problem: https://www.hackerrank.com/challenges/angry-children
Basically, it asks to create a program that given a set of N integer, finds the smallest possible "unfairness" for a K-length subset of that set. Unfairness is defined as the difference between the max and min of a K-length subset.
The way I'm going about it now is to find all K-length subsets and calculate their unfairness, keeping track of the smallest unfairness.
I wrote the following C++ program that seems to solve the problem correctly:
#include <cmath>
#include <cstdio>
#include <iostream>
using namespace std;
int unfairness = -1;
int N, K, minc, maxc, ufair;
int *candies, *subset;
void check() {
ufair = 0;
minc = subset[0];
maxc = subset[0];
for (int i = 0; i < K; i++) {
minc = min(minc,subset[i]);
maxc = max(maxc, subset[i]);
}
ufair = maxc - minc;
if (ufair < unfairness || unfairness == -1) {
unfairness = ufair;
}
}
void process(int subsetSize, int nextIndex) {
if (subsetSize == K) {
check();
} else {
for (int j = nextIndex; j < N; j++) {
subset[subsetSize] = candies[j];
process(subsetSize + 1, j + 1);
}
}
}
int main() {
cin >> N >> K;
candies = new int[N];
subset = new int[K];
for (int i = 0; i < N; i++)
cin >> candies[i];
process(0, 0);
cout << unfairness << endl;
return 0;
}
The problem is that HackerRank requires the program to come up with a solution within 3 seconds and that my program takes longer than that to find the solution for 12/16 of the test cases. For example, one of the test cases has N = 50 and K = 8; the program takes 8 seconds to find the solution on my machine. What can I do to optimize my algorithm? I am not very experienced with C++.
All you have to do is sort all the numbers in ascending order and then take the minimal a[i + K - 1] - a[i] over all i from 0 to N - K inclusive.
That is true because, in an optimal subset, the numbers occupy consecutive positions in the sorted array.
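A minimal sketch of that approach, reading input in the same format as the question (not verified against the HackerRank judge):

#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>

int main()
{
    int N, K;
    std::cin >> N >> K;
    std::vector<int> candies(N);
    for (int& c : candies) std::cin >> c;

    std::sort(candies.begin(), candies.end());
    int best = INT_MAX;
    for (int i = 0; i + K - 1 < N; ++i)      // every window of K consecutive sorted values
        best = std::min(best, candies[i + K - 1] - candies[i]);
    std::cout << best << std::endl;
    return 0;
}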
One suggestion I'd give is to sort the integer list before selecting subsets. This dramatically reduces the number of subsets you need to examine. In fact, you don't even need to create subsets: simply look at the elements at index i (starting at 0) and i+K-1, and the lowest difference over all valid i is your answer. So now, instead of n choose k subsets (combinatorial growth), you just have to look at ~n windows (linear runtime), and sorting (n log n) becomes your performance bottleneck.
I'm trying to make a function to get the 3 biggest numbers in a vector. For example:
Numbers: 1 6 2 5 3 7 4
Result: 5 6 7
I figured I could sort them DESC, get the 3 numbers at the beginning, and after that re-sort them ASC, but that would be a waste of memory allocation and execution time. I know there is a simpler solution, but I can't figure it out. And another problem is, what if I have only two numbers...
BTW: I use Borland C++ 3.1 as my compiler (I know, very old, but that's what I'll have to use at the exam..)
Thanks guys.
LE: If anyone wants to know more about what I'm trying to accomplish, you can check the code:
#include<fstream.h>
#include<conio.h>
int v[1000], n;
ifstream f("bac.in");
void citire();
void afisare_a();
int ultima_cifra(int nr);
void sortare(int asc);
void main() {
clrscr();
citire();
sortare(2);
afisare_a();
getch();
}
void citire() {
f>>n;
for(int i = 0; i < n; i++)
f>>v[i];
f.close();
}
void afisare_a() {
for(int i = 0;i < n; i++)
if(ultima_cifra(v[i]) == 5)
cout<<v[i]<<" ";
}
int ultima_cifra(int nr) {
return nr - 10 * ( nr / 10 );
}
void sortare(int asc) {
int aux, s;
if(asc == 1)
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] > v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while( s == 1);
else
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] < v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while(s == 1);
}
Citire = Read
Afisare = Display
Ultima Cifra = Last digit of number
Sortare = Bubble Sort
If you were using a modern compiler, you could use std::nth_element to find the top three. As is, you'll have to scan through the array keeping track of the three largest elements seen so far at any given time, and when you get to the end, those will be your answer.
For three elements that's a trivial thing to manage. If you had to do the N largest (or smallest) elements when N might be considerably larger, then you'd almost certainly want to use Hoare's select algorithm, just like std::nth_element does.
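For reference, the std::nth_element version might look like this on a modern compiler (a sketch, obviously not applicable to Borland C++ 3.1):

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {1, 6, 2, 5, 3, 7, 4};
    // Rearranges v so that the 3 largest values occupy the last 3 positions (in some order).
    std::nth_element(v.begin(), v.end() - 3, v.end());
    std::cout << v[v.size() - 3] << " " << v[v.size() - 2] << " " << v[v.size() - 1] << "\n";
    return 0;
}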
You could do this without needing to sort at all, it's doable in O(n) time with linear search and 3 variables keeping your 3 largest numbers (or indexes of your largest numbers if this vector won't change).
Why not just step through it once and keep track of the 3 highest values encountered?
EDIT: The range of the input matters for how you initialise and keep track of the 3 highest values.
Use std::partial_sort to sort, in descending order, only the first c elements that you care about. For a fixed small c it runs in O(n log c) time, essentially linear in the number of elements.
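A sketch of that (again assuming a modern standard library):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {1, 6, 2, 5, 3, 7, 4};
    // Sort only the first 3 positions, largest values first; the rest stays unordered.
    std::partial_sort(v.begin(), v.begin() + 3, v.end(), std::greater<int>());
    std::cout << v[0] << " " << v[1] << " " << v[2] << "\n";   // prints 7 6 5
    return 0;
}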
If you can't use std::nth_element write your own selection function.
You can read about them here: http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements
Sort them normally and then iterate from the back using rbegin(), for as many as you wish to extract (no further than rend() of course).
sort will happen in place whether ASC or DESC by the way, so memory is not an issue since your container element is an int, thus has no encapsulated memory of its own to manage.
Yes, sorting is good, especially for long or variable-length lists.
Why are you sorting it twice, though? The second sort might actually be very inefficient (it depends on the algorithm in use); a reverse would be quicker, but why even do that? If you want them in ascending order at the end, then sort them into ascending order first (and fetch the numbers from the end).
I think you have the choice between scanning the vector for the three largest elements or sorting it (either using sort in a vector or by copying it into an implicitly sorted container like a set).
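For the container route, a std::multiset keeps duplicates and stays sorted, so the three largest values can be read from the back; a small sketch:

#include <iostream>
#include <iterator>
#include <set>
#include <vector>

int main()
{
    std::vector<int> nums = {1, 6, 2, 5, 3, 7, 4};
    std::multiset<int> sorted(nums.begin(), nums.end());    // implicitly sorted copy
    auto it = sorted.rbegin();                               // largest element
    std::cout << *it << " " << *std::next(it) << " " << *std::next(it, 2) << "\n";   // 7 6 5
    return 0;
}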
If you can control how the array is filled, maybe you could keep the numbers ordered as you insert them and then take the top 3 directly; otherwise you can use a binary tree to perform the search, or just use a linear scan as birryree says...
Thanks to @nevets1219 for pointing out that the code below only deals with positive numbers.
I haven't tested this code enough, but it's a start:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> nums;
nums.push_back(1);
nums.push_back(6);
nums.push_back(2);
nums.push_back(5);
nums.push_back(3);
nums.push_back(7);
nums.push_back(4);
int first = 0;
int second = 0;
int third = 0;
for (int i = 0; i < nums.size(); i++)
{
if (nums.at(i) > first)
{
third = second;
second = first;
first = nums.at(i);
}
else if (nums.at(i) > second)
{
third = second;
second = nums.at(i);
}
else if (nums.at(i) > third)
{
third = nums.at(i);
}
std::cout << "1st: " << first << " 2nd: " << second << " 3rd: " << third << std::endl;
}
return 0;
}
The following solution finds the three largest numbers in O(n) without reordering the vector (the two elements it temporarily overwrites are restored at the end):
std::vector<int>::iterator p = std::max_element(vec.begin(), vec.end());
int x = *p;
*p = std::numeric_limits<int>::min();
std::vector<int>::iterator q = std::max_element(vec.begin(), vec.end());
int y = *q;
*q = std::numeric_limits<int>::min();
int z = *std::max_element(vec.begin(), vec.end());
*q = y; // restore original value
*p = x; // restore original value
A general solution for the top N elements of a vector:
1. Create an array or vector topElements of length N for your top N elements.
2. Initialise each element of topElements to the lowest possible value (e.g. INT_MIN).
3. Select the next element in the vector, or finish if no elements are left.
4. If the selected element is greater than topElements[0], replace topElements[0] with the value of the element. Otherwise, go to 3.
5. Starting with i = 0, swap topElements[i] with topElements[i + 1] if topElements[i] is greater than topElements[i + 1].
6. While i is less than N - 1, increment i and go to 5.
7. Go to 3.
This should result in topElements containing your top N elements in reverse order of value - that is, the largest value is in topElements[N - 1].
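A direct translation of those steps into C++ might look like this (a sketch only; the slots start at INT_MIN as in step 2):

#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>

// Keeps the N largest values seen so far; topElements[0] is always the smallest
// of them, so a new value only has to beat that one to enter the top N.
std::vector<int> topN(const std::vector<int>& values, int N)
{
    std::vector<int> topElements(N, INT_MIN);
    for (int v : values)
    {
        if (v <= topElements[0]) continue;        // not in the top N
        topElements[0] = v;
        for (int i = 0; i + 1 < N && topElements[i] > topElements[i + 1]; ++i)
            std::swap(topElements[i], topElements[i + 1]);   // bubble the new value upwards
    }
    return topElements;                            // largest value ends up in topElements[N - 1]
}

int main()
{
    std::vector<int> nums = {1, 6, 2, 5, 3, 7, 4};
    for (int x : topN(nums, 3)) std::cout << x << " ";   // prints 5 6 7
    std::cout << "\n";
    return 0;
}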
I'm a C++ beginner ;)
How good is the code below as a way of finding all prime numbers between 2-1000:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; j<=(i/j); j++) {
if (! (i%j))
break;
if (j > (i/j))
cout << i << " is prime\n";
}
}
You stop when j = i.
A first simple optimization is to stop when j = sqrt(i) (since there can be no factors of a number greater than its square root).
A much faster implementation is for example the sieve of eratosthenes.
Edit: the code looks somewhat mysterious, so here's how it works:
The terminating condition on the inner for is i/j, which is equivalent to j<=i (and the latter is much clearer), since once we finally reach j==i+1 we have i/j==0 and the for terminates.
The next check, if(j>(i/j)), is really nasty. Basically it just checks whether the loop hit the for's end condition (therefore we have a prime) or whether we hit the explicit break (no prime). If we hit the for's end, then j==i+1 (think about it) => i/j==0 => it's a prime. If we hit a break, it means j is a factor of i - and not just any factor, but in fact the smallest one (since we exit at the first j that divides i)!
Since j is the smallest factor, the other factor (or the product of the remaining factors, given by i/j) will be greater than or equal to j, hence the test. If j<=i/j, we hit a break and j is the smallest factor of i.
That's some unreadable code!
Not very good. In my humble opinion, the indentation and spacing is hideous (no offense). To clean it up some:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; i/j; j++) {
if (!(i % j))
break;
if (j > i/j)
cout << i << " is prime\n";
}
}
This reveals a bug: the if (j > i/j) ... needs to be on the outside of the inner loop for this to work. Also, I think that the i/j condition is more confusing (not to mention slower) than just saying j < i (or even nothing, because once j reaches i, i % j will be 0). After these changes, we have:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; j < i; j++) {
if (!(i % j))
break;
}
if (j > i/j)
cout << i << " is prime\n";
}
This works. However, the j > i/j confuses the heck out of me. I can't even figure out why it works (I suppose I could figure it out if I spent a while looking like this guy). I would write if (j == i) instead.
What you have implemented here is called trial division. A better algorithm is the Sieve of Eratosthenes, as posted in another answer. A couple things to check if you implement a Sieve of Eratosthenes:
It should work.
It shouldn't use division or modulus. Not that these are "bad" (granted, they tend to be an order of magnitude slower than addition, subtraction, negation, etc.), but they aren't needed, and if they're present, it probably means the implementation isn't really that efficient.
It should be able to compute the primes less than 10,000,000 in about a second (depending on your hardware, compiler, etc.).
First off, your code is both short and correct, which is very good for a beginner. ;-)
This is what I would do to improve the code:
1) Define the variables inside the loops, so they don't get confused with something else. I would also make the bound a parameter or a constant.
#define MAX 1000
for(int i=2;i<MAX;i++){
for(int j=2;j<i/j;j++){
if(!(i%j)) break;
if(j>(i/j)) cout<<i<<" is prime\n";
}
}
2) I would use the Sieve of Eratosthenes, as Joey Adams and Mau have suggested. Notice how I don't have to write the bound twice, so the two usages will always be identical.
#define MAX 1000
bool prime[MAX];
memset(prime, true, sizeof(prime));   // note the argument order: memset(pointer, value, size)
prime[0] = prime[1] = false;
for(int i=4;i<MAX;i+=2) prime[i] = false;
cout<<2<<" is prime\n";
for(int i=3;i*i<MAX;i+=2)
    if (prime[i])
        for(int j=i*i;j<MAX;j+=i)
            prime[j] = false;
for(int i=3;i<MAX;i+=2)               // print after sieving, so primes above sqrt(MAX) appear too
    if (prime[i]) cout<<i<<" is prime\n";
The bounds are also worth noting. i*i<MAX is a lot faster than j > i/j and you also don't need to mark any numbers < i*i, because they will already have been marked, if they are composite. The most important thing is the time complexity though.
3) If you really want to make this algorithm fast, you need to cache-optimize it. The idea is to first find all the primes < sqrt(MAX) and then use them to find the rest of the primes, block by block: reuse the same block of memory to find all primes from 1024-2047, say, and then from 2048-3071. This way everything stays in the L1 cache. I once measured a ~12-fold speedup from applying this optimization to the Sieve of Eratosthenes.
You can also cut the space usage in half by not storing the even numbers, which means you don't have to set up a new block as often.
If you are a beginner you should probably just forget about the cache for the moment though.
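For the curious, here is a rough sketch of that blocked ("segmented") sieve idea; the block size and other details are my own choices, and it does not bother with the skip-the-evens optimisation:

#include <algorithm>
#include <iostream>
#include <vector>

// Segmented Sieve of Eratosthenes: sieve [2, limit) block by block so the
// working array stays small enough to live in the L1 cache.
void segmented_sieve(int limit, int block_size = 1 << 15)
{
    // 1) Ordinary sieve for the small primes up to sqrt(limit).
    int root = 2;
    while (root * root < limit) ++root;
    std::vector<bool> is_prime(root + 1, true);
    std::vector<int> primes;
    for (int i = 2; i <= root; ++i)
        if (is_prime[i]) {
            primes.push_back(i);
            for (int j = i * i; j <= root; j += i) is_prime[j] = false;
        }

    // 2) Sieve each block [low, high) using only the small primes.
    std::vector<bool> block(block_size);
    for (int low = 2; low < limit; low += block_size) {
        int high = std::min(low + block_size, limit);
        std::fill(block.begin(), block.end(), true);
        for (int p : primes) {
            if (p * p >= high) break;                              // larger primes mark nothing here
            int first = std::max(p * p, ((low + p - 1) / p) * p);  // first multiple of p in the block
            for (int m = first; m < high; m += p) block[m - low] = false;
        }
        for (int i = low; i < high; ++i)
            if (block[i - low]) std::cout << i << " is prime\n";
    }
}

int main() { segmented_sieve(1000); }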
The one simple answer to the whole bunch of text posted up here is: trial division!
If someone had mentioned the mathematical basis this task rests on, we'd have saved plenty of time ;)
#include <stdio.h>
#define N 1000
int main()
{
bool primes[N];
for(int i = 0 ; i < N ; i++) primes[i] = false;
primes[2] = true;
for(int i = 3 ; i < N ; i+=2) { // Check only odd integers
bool isPrime = true;
for(int j = i/2 ; j > 2 ; j-=2) { // Check only from largest possible multiple of current number
if ( j%2 == 0 ) { j = j-1; } // Check only with previous odd divisors
if(!primes[j]) continue; // Check only with previous prime divisors
if ( i % j == 0 ) {
isPrime = false;
break;
}
}
primes[i] = isPrime;
}
return 0;
}
This is working code. I also included many of the optimizations mentioned by previous posters. If there are any other optimizations that can be done, it would be informative to know.
This function is a more efficient way to check whether a number is prime.
#include <cmath>   // for sqrt

bool isprime(const unsigned long n)
{
    if (n<2) return false;
    if (n<4) return true;
    if (n%2==0) return false;
    if (n%3==0) return false;
    unsigned long r = (unsigned long) sqrt(n);
    r++;
    for(unsigned long c=6; c<=r; c+=6)
    {
        if (n%(c-1)==0) return false;
        if (n%(c+1)==0) return false;
    }
    return true;   // no divisor of the form 6k-1 or 6k+1 up to sqrt(n), so n is prime
}
I have a range of random numbers. The range is actually determined by the user but it will be up to 1000 integers. They are placed in this:
vector<int> v
and the values are inserted like this:
srand(1);
for (i = 0; i < n; i++)
v[i] = rand() % n;
I'm creating a separate function to find all the non-prime values. Here is what I have now, but I know it's completely wrong as I get both prime and composite in the series.
void sieve(vector<int> v, int n)
{
int i,j;
for(i = 2; i <= n; i++)
{
cout << i << " % ";
for(j = 0; j <= n; j++)
{
if(i % v[j] == 0)
cout << v[j] << endl;
}
}
}
This method typically worked when I just had a series of numbers from 0-1000, but it doesn't seem to be working now when I have numbers out of order and duplicates. Is there a better method to find non-prime numbers in a vector? I'm tempted to just create another vector, fill it with n numbers and just find the non-primes that way, but would that be inefficient?
Okay, since the range is from 0-1000, I am wondering if it's easier to just create a vector with 0-n sorted and then use a sieve to find the primes. Is this getting any closer?
void sieve(vector<int> v, BST<int> t, int n)
{
vector<int> v_nonPrime(n);
int i,j;
for(i = 2; i < n; i++)
v_nonPrime[i] = i;
for(i = 2; i < n; i++)
{
for(j = i + 1; j < n; j++)
{
if(v_nonPrime[i] % j == 0)
cout << v_nonPrime[i] << endl;
}
}
}
In this code:
if(i % v[j] == 0)
cout << v[j] << endl;
You are testing your index to see if it is divisible by v[j]. I think you meant to do it the other way around, i.e.:
if(v[j] % i == 0)
Right now, you are printing random divisors of i. You are not printing out random numbers which are known not to be prime. Also, you will have duplicates in your output, perhaps that is ok.
First off, I think Knuth said it first: premature optimization is the cause of many bugs. Make the slow version first, and then figure out how to make it faster.
Second, for your outer loop, you really only need to go to sqrt(n) rather than n.
Basically, you have a lot of unrelated numbers, so for each one you will have to check if it's prime.
If you know the range of the numbers in advance, you can generate all prime numbers that can occur in that range (or the sqrt thereof), and test every number in your container for divisibility by any one of the generated primes.
Generating the primes is best done with the Sieve of Eratosthenes - many examples of that algorithm are to be found.
You should try using a prime sieve. You need to know the maximal number for creating the sieve (O(n)), and then you can build a set of primes in that range (O(max_element), or as the problem states O(1000) == O(1)) and check whether each number is in the set of primes.
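A sketch of that idea, with the sieve sized by the largest value actually present (function and variable names are mine):

#include <algorithm>
#include <iostream>
#include <vector>

// Sieve up to the largest value in v, then report the non-primes found in v.
void print_non_primes(const std::vector<int>& v)
{
    int max_value = *std::max_element(v.begin(), v.end());
    std::vector<bool> is_prime(max_value + 1, true);
    if (max_value >= 0) is_prime[0] = false;
    if (max_value >= 1) is_prime[1] = false;
    for (int i = 2; i * i <= max_value; ++i)
        if (is_prime[i])
            for (int j = i * i; j <= max_value; j += i)
                is_prime[j] = false;

    for (int x : v)
        if (!is_prime[x])
            std::cout << x << " is not prime" << std::endl;
}

int main()
{
    std::vector<int> v = {0, 1, 4, 7, 9, 13, 15};
    print_non_primes(v);   // prints 0, 1, 4, 9 and 15
    return 0;
}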
Your code is just plain wrong. First, you're testing i % v[j] == 0, which is backwards and also explains why you get all numbers. Second, your output will contain duplicates as you're testing and outputting each input number every time it fails the (broken) divisibility test.
Other suggestions:
Using n as the maximum value in the vector and the number of elements in the vector is confusing and pointless. You don't need to pass in the number of elements in the vector - you just query the vector's size. And you can figure out the max fairly quickly (but if you know it ahead of time you may as well pass it in).
As mentioned above, you only need to test up to sqrt(n) [where n is the max value in the vector]
You could use a sieve to generate all primes up to n and then just remove those values from the input vector, as also suggested above. This may be quicker and easier to understand, especially if you store the primes somewhere.
If you're going to test each number individually (using, I guess, an inverse sieve) then I suggest testing each number individually, in order. IMHO it'll be easier to understand than the way you've written it - testing each number for divisibility by k < n for ever-increasing k.
The idea of the sieve that you are trying to implement depends on the fact that you start at a prime (2) and cross out multiples of that number - so all numbers that have the prime 2 as a factor are ruled out beforehand.
That's because all non-primes can be factorized down to primes, whereas primes are only divisible without remainder by 1 and by themselves.
So, if you want to rely on this algorithm, you will need some means to actually restore this property of the algorithm.
Your code seems to have many problems:
If you want to test if your number is prime or non-prime, you would need to check for v[j] % i == 0, not the other way round
You did not exclude the case of a number being divided by itself
You keep on checking your numbers again and again. That's very inefficient.
As other guys suggested, you need to do something like the Sieve of Eratosthenes.
So a pseudo C code for your problem would be (I haven't run this through compilers yet, so please ignore syntax errors. This code is to illustrate the algorithm only)
vector<int> inputNumbers;
// First, find all the prime numbers from 1 to n
vector<bool> isPrime(n + 1, true);   // note: bool isPrime[n+1] = {true}; would only set the first element to true
isPrime[0]= false;
isPrime[1]= false;
for (int i = 2; i <= sqrt(n); i++)
{
if (!isPrime[i])
continue;
for (int j = 2; j <= n/i; j++)
isPrime[i*j] = false;
}
// Check the input array for non-prime numbers
for (int i = 0; i < inputNumbers.size(); i++)
{
int thisNumber = inputNumbers[i];
// Vet the input to make sure we won't blow our isPrime array
if ((0<= thisNumber) && (thisNumber <=n))
{
// Prints out non-prime numbers
if (!isPrime[thisNumber])
cout<< thisNumber;
}
}
Sorting the numbers first might be a good start - you can do that in O(n log n) time. That is a small addition (I think) to your other problem - that of finding whether a number is prime.
(Actually, with a small set of numbers like that you can sort much faster with a counting/bucket sort, using an auxiliary array the size of the value range.)
I'd then find the highest number in the set (I assume the numbers can be unbounded - no known upper limit until you sort - or do a single pass to find the max),
then go with a sieve - as others have said.
Jeremy is right, the basic problem is your i % v[j] instead of v[j] % i.
Try this:
void sieve(vector<int> v, int n) {
    int i, j;
    for (j = 0; j < n; j++) {               // note: j < n, not j <= n, to stay inside the vector
        cout << v[j] << ": ";
        for (i = 2; i < v[j]; i++) {
            if (v[j] % i == 0) {
                cout << "is divisible by " << i << endl;
                break;
            }
        }
        if (i == v[j]) {
            cout << "is prime." << endl;
        }
    }
}
It's not optimal, because it's attempting to divide by all numbers less than v[j] instead of just up to the square root of v[j]. And it is attempting division by all numbers instead of only primes.
But it will work.