How do I reduce the repeated use of the % operator for faster execution in C - c++

This is the code:
for (i = 1; i<=1000000 ; i++ ) {
for ( j = 1; j<= 1000000; j++ ) {
for ( k = 1; k<= 1000000; k++ ) {
if (i % j == k && j%k == 0)
count++;
}
}
}
Or is it better to reduce any % operation that runs up to a million times in any program?
Edit: I am sorry, the loops were initialized with 0; let's say they start at i = 1.
Now, if I remove the third loop as in @darshan's answer, the first and second loops can still both run up to N times,
so % is still calculated N*N times, e.g. 2021 mod 2022, then 2021 mod 2023, and so on.
So my question is: % is twice (and maybe more) as heavy as + or -, so is there any other logic that can be implemented here and that gives the same answer as this logic?
Thank you so much for the knowledgeable comments and help.
The question is:
A triple of integers (A, B, C) is considered special if it satisfies the
following properties for a given integer N:
A mod B = C
B mod C = 0
1 ≤ A, B, C ≤ N
I'm curious whether there is a smarter solution that greatly reduces the time complexity.

A much more efficient version is the one below, but I think it can be optimized much further.
First of all, the modulo (%) operator is quite expensive, so try to avoid it on a large scale.
for (i = 1; i <= 1000000; i++)
    for (j = 1; j <= 1000000; j++)
    {
        a = i % j;
        for (k = 1; k <= j; k++)
            if (a == k && j % k == 0)
                count++;
    }
We placed a = i % j in the second loop because it does not depend on k, so there is no need to recalculate it every time k changes. For the condition j % k == 0 to be true, k must be <= j, hence the changed loop bound. The loops start at 1 because i % 0 and j % 0 are undefined.

First of all, your code has undefined behavior due to division by zero: when k is zero then j%k is undefined, so I assume that all your loops should start with 1 and not 0.
Usually the % and the / operators are much slower to execute than any other operation. It is possible to get rid of most invocations of the % operator in your code in several simple steps.
First, look at the if line:
if (i % j == k && j%k == 0)
The i % j == k condition puts a very strict constraint on k, which plays into your hands. It means that it is pointless to iterate over k at all, since there is only one value of k that passes this condition.
for (i = 1; i<=1000000 ; i++ ) {
for ( j = 1; j<= 1000000; j++ ) {
k = i % j;
// Constrain k to the range of the original loop.
if (k <= 1000000 && k > 0 && j%k == 0)
count++;
}
}
To get rid of "i % j", swap the loops. This change is possible since the result depends only on which combinations of i, j are tested, not on the order in which they are visited.
for ( j = 1; j<= 1000000; j++ ) {
for (i = 1; i<=1000000 ; i++ ) {
k = i % j;
// Constrain k to the range of the original loop.
if (k <= 1000000 && k > 0 && j%k == 0)
count++;
}
}
Here it is easy to observe how k behaves, and to use that to iterate on k directly without iterating on i, thus getting rid of i%j. k runs from 1 to j-1, wraps to 0, and then repeats. So all we have to do is to step k directly alongside i in the inner loop. Note that i%j for j == 1 is always 0, and since k==0 does not pass the condition of the if, we can safely start with j=2, skipping 1:
for ( j = 2; j<= 1000000; j++ ) {
for (i = 1, k=1; i<=1000000 ; i++, k++ ) {
if (k == j)
k = 0;
// Constrain k to the range of the original loop.
if (k <= 1000000 && k > 0 && j%k == 0)
count++;
}
}
It is still a waste to run j%k repeatedly for the same values of j, k (remember that k repeats several times in the inner loop). For example, for j=3 the values of i and k go {1,1}, {2,2}, {3,0}, {4,1}, {5,2}, {6,0}, ..., {n*3, 0}, {n*3+1, 1}, {n*3+2, 2}, ... (for any value of n in the range 0 < n <= (1000000-2)/3).
The values beyond n = floor((1000000-2)/3) == 333332 are tricky - let's have a look. For this value of n, i=333332*3=999996 and k=0, so the last iteration of {i,k}: {n*3,0}, {n*3+1,1}, {n*3+2,2} becomes {999996, 0}, {999997, 1}, {999998, 2}. You don't really need to iterate over all these values of n, since each of them does exactly the same thing. All you have to do is to run it only once and multiply by the number of valid n values (which is 333332+1 = 333333 in this case - adding 1 to include n=0).
Since that did not cover all elements, you need to continue with the remaining values: {999999, 0}, {1000000, 1}. Notice that, unlike in the other iterations, there is no third value, since it would put i out of range.
for (int j = 2; j<= 1000000; j++ ) {
if (j % 1000 == 0) std::cout << std::setprecision(2) << (double)j*100/1000000 << "% \r" << std::flush;
int innerCount = 0;
for (int k=1; k<j ; k++ ) {
if (j%k == 0)
innerCount++;
}
int innerLoopRepeats = 1000000/j;
count += innerCount * innerLoopRepeats;
// complete the remainder:
for (int k=1, i= j * innerLoopRepeats+1; i <= 1000000 ; k++, i++ ) {
if (j%k == 0)
count++;
}
}
This is still extremely slow, but at least it completes in less than a day.
It is possible to have a further speed up by using an important property of divisibility.
Consider the first inner loop (it's almost the same for the second inner loop),
and notice that it does a lot of redundant work, and does it expensively.
Namely, if j%k==0, it means that k divides j and that there is pairK such that pairK*k==j.
It is trivial to calculate the pair of k: pairK=j/k.
Obviously, for k > sqrt(j) there is pairK < sqrt(j). This implies that any k > sqrt(j) can be extracted simply
by scanning all k < sqrt(j). This feature lets you loop over only a square root of all interesting values of k.
Searching only up to sqrt(j) gives a huge performance boost, and the whole program can finish in seconds.
Here is a view of the second inner loop:
// complete the remainder:
for (int k=1, i= j * innerLoopRepeats+1; i <= 1000000 && k*k <= j; k++, i++ ) {
if (j%k == 0)
{
count++;
int pairI = j * innerLoopRepeats + j / k;
if (pairI != i && pairI <= 1000000) {
count++;
}
}
}
The first inner loop has to undergo a similar transformation.
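A minimal sketch of how both inner loops might look after that transformation, folded into one divisor scan and wrapped in a function so that it can be checked on small inputs. The function name countSpecial and the parameter n (replacing the hard-coded 1000000) are assumptions made for this example, not part of the code above.

```cpp
#include <cassert>
#include <cstdint>

// Counts triples (i, j, k) with 1 <= i, j, k <= n, i % j == k and j % k == 0.
// countSpecial and n are hypothetical names; n replaces the fixed 1000000.
std::int64_t countSpecial(int n) {
    assert(n >= 1);
    std::int64_t count = 0;
    for (int j = 2; j <= n; ++j) {
        const int fullBlocks = n / j;         // complete runs k = 1 .. j-1, 0
        const int tail = n - fullBlocks * j;  // leftover values k = 1 .. tail
        std::int64_t perBlock = 0;            // divisors k of j with k < j
        for (int k = 1; static_cast<long long>(k) * k <= j; ++k) {
            if (j % k != 0) continue;
            const int pairK = j / k;          // partner divisor, pairK >= k
            ++perBlock;                       // k <= sqrt(j) < j, always valid
            if (k <= tail) ++count;           // k also occurs in the tail
            if (pairK != k && pairK != j) {   // skip square roots and j itself
                ++perBlock;
                if (pairK <= tail) ++count;
            }
        }
        count += perBlock * fullBlocks;
    }
    return count;
}
```

For small n the result can be compared against the original triple loop, which is a cheap way to confirm that no case was lost along the way.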

Just reorder the loops and calculate A from the constraints:
void findAllSpecial(int N, void (*f)(int A, int B, int C))
{
    // 1 ≤ A,B,C ≤ N
    for (int C = 1; C <= N; ++C) {
        // B mod C = 0; B must exceed C, because A mod B = C implies C < B
        for (int B = C + C; B <= N; B += C) {
            // A mod B = C
            for (int A = C; A <= N; A += B) {
                f(A, B, C);
            }
        }
    }
}
No divisions and no useless ifs, just for loops and additions.
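As a usage sketch, the same three loops can count the special triples directly instead of calling a callback. The name countSpecialByStepping is an assumption made for this example. It also assumes that the bounds are inclusive (1 ≤ A, B, C ≤ N) and that B starts at 2*C, since A mod B = C forces C < B.

```cpp
#include <cassert>
#include <cstdint>

// Counts triples (A, B, C) with 1 <= A, B, C <= N, A mod B == C, B mod C == 0.
// Hypothetical helper: only loops and additions, no % or /.
std::int64_t countSpecialByStepping(int N) {
    assert(N >= 1);
    std::int64_t count = 0;
    for (int C = 1; C <= N; ++C) {
        // B runs over the multiples of C that are larger than C.
        for (int B = C + C; B <= N; B += C) {
            // A runs over C, C + B, C + 2B, ...; each has A mod B == C.
            for (int A = C; A <= N; A += B) {
                ++count;
            }
        }
    }
    return count;
}
```

For N = 4 this counts the six triples (1,2,1), (3,2,1), (1,3,1), (4,3,1), (1,4,1) and (2,4,2).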

Below is the obvious optimization:
The third loop over k is not really needed, as there is already a many-to-one mapping from (i, j) to k.
What I understand from the code is that you want to count the (i, j) pairs such that i % j is a factor of j. Is this correct, or am I missing something?

Related

Algorithm on hexagonal grid

A hexagonal grid is represented by a two-dimensional array with R rows and C columns. The first row always comes "before" the second in the hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is odd. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead of R and C):
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}
For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
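A minimal sketch of that approach, assuming the neighbour offsets from the question's code and storing each matrix row as a std::bitset so that a row addition over GF(2) is a single XOR. The names kMaxCells, neighbours, multiply and evolve are made up for this example.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr int kMaxCells = 100;               // R, C <= 10
using Row = std::bitset<kMaxCells>;
using Matrix = std::array<Row, kMaxCells>;   // entries are elements of GF(2)
using Grid = std::vector<std::vector<int>>;

// In-bounds neighbours of (i, j), using the same offsets as the question's code.
std::vector<std::pair<int, int>> neighbours(int i, int j, int rows, int cols) {
    const int shift = (i % 2 == 0) ? -1 : 1;  // even rows lean left, odd rows right
    const int cand[6][2] = {{i - 1, j + shift}, {i - 1, j}, {i, j - 1},
                            {i, j + 1}, {i + 1, j + shift}, {i + 1, j}};
    std::vector<std::pair<int, int>> out;
    for (const auto& c : cand)
        if (c[0] >= 0 && c[0] < rows && c[1] >= 0 && c[1] < cols)
            out.emplace_back(c[0], c[1]);
    return out;
}

// Matrix product over GF(2): addition is XOR, multiplication is AND.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c{};
    for (int r = 0; r < kMaxCells; ++r)
        for (int k = 0; k < kMaxCells; ++k)
            if (a[r][k]) c[r] ^= b[k];
    return c;
}

// Grid after `turns` turns, computed as (M^turns) * state.
Grid evolve(const Grid& start, std::uint64_t turns) {
    assert(!start.empty() && !start[0].empty());
    const int rows = static_cast<int>(start.size());
    const int cols = static_cast<int>(start[0].size());
    assert(rows * cols <= kMaxCells);

    Matrix base{};    // base[cell][nb] = 1 when nb is a neighbour of cell
    Matrix result{};  // becomes the identity on the used cells
    Row state;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            const int cell = i * cols + j;
            result[cell][cell] = 1;
            state[cell] = (start[i][j] != 0);
            for (const auto& nb : neighbours(i, j, rows, cols))
                base[cell][nb.first * cols + nb.second] = 1;
        }

    // Exponentiation by squaring: O(log turns) matrix products.
    for (; turns > 0; turns >>= 1) {
        if (turns & 1) result = multiply(result, base);
        base = multiply(base, base);
    }

    Grid out(rows, std::vector<int>(cols));
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            out[i][j] = static_cast<int>((result[i * cols + j] & state).count() % 2);
    return out;
}
```

With at most 100 cells and about 63 squarings, the work is a few hundred small matrix products regardless of how large k is.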
There are several ways to speed up your algorithm.
You do the neighbour calculation with the out-of-bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an odd number of neighbours cause a toggle, the cell is on; otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only a few active cells. Its worst case is a board whose cells are all on, in which case it is as good as neighbour counting, because you have to touch each neighbour of each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling: the exclusive or (xor) operator, ^. If you keep the list of neighbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance to Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³, is suspiciously large.
The playing board is small. The example in your question will repeat after at most 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has a period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tackle it with profane means: keep a hash map of all previous states and the turns where they occurred, then check in each turn whether the pattern has occurred before.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. In particular, I'm not sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)
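A rough sketch of the unordered-map variant, assuming the board is stored as a std::bitset<100> (for which the standard library provides std::hash) rather than the Grid struct above. The names State, advance and step are illustrative only; step stands for whatever function computes one turn.

```cpp
#include <bitset>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

using State = std::bitset<100>;  // one bit per cell, at most 10 x 10

// Returns the state after `turns` applications of `step`, skipping ahead
// as soon as a previously seen state shows up again.
State advance(State state, std::uint64_t turns,
              const std::function<State(const State&)>& step) {
    std::unordered_map<State, std::uint64_t> seenAt;  // state -> first turn seen
    std::vector<State> history;                       // history[t] = state at turn t
    for (std::uint64_t turn = 0; turn < turns; ++turn) {
        const auto it = seenAt.find(state);
        if (it != seenAt.end()) {
            const std::uint64_t start = it->second;
            const std::uint64_t period = turn - start;
            // The remaining turns only matter modulo the cycle length.
            return history[start + (turns - turn) % period];
        }
        seenAt.emplace(state, turn);
        history.push_back(state);
        state = step(state);
    }
    return state;
}
```

The history vector replaces the second map of the code above, because the turns are consecutive integers starting at 0.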
I'm not sure how you did it (and you should really always post code here), but let's try to optimize things.
First of all, there is not really a difference between this and a square grid. The neighbor relationships differ, but that is just a small translation function. If you have a problem there, we should treat it separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
This is obviously O(N) per turn. Iterating twice roughly doubles the actual run time, but that should not be too bad. Try not to allocate space every time you do this; reuse existing structures instead.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated and all of their neighbors
count activated neighbors, if odd add to new_activated (once)
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and using move semantics
This code only works on the activated fields and their neighbors. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes: if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.
This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty of representing the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and of checking its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This puts the algorithm in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).

How to flatten a loop through the upper triangle of a matrix?

I have a situation where I am using OpenMP for the Xeon Phi coprocessor, and I have an opportunity to parallelize an "embarrassingly parallel" double for loop.
However, the for loop is looping through the upper triangle (including the diagonal):
for (int i = 0; i < n; i++)
// access some arrays with the value of i
for (int j = i; j < n; j++)
// do some stuff with the values of j and i
So, I've got the total size of the loop,
for (int q = 0; q < ((n*n - n)/2)+n; q++)
// Do stuff
But where I am struggling is:
How do I calculate i and j from q? Is it possible?
In the meantime, I'm just doing the full matrix, calculating i and j from there, and only doing my stuff when j >= i...but that still leaves a hefty amount of thread overhead.
If I restate your problem: to find i and j from q, you need the greatest i such that
q >= i*n - (i-1)*i/2
and then define j as
j = i + (q - (i*n - (i-1)*i/2))
If you have such a greatest i, then
(i+1)*n - i*(i+1)/2 > q >= i*n - (i-1)*i/2
n-i > q - (i*n - (i-1)*i/2) >= 0
n > j = i + (q - (i*n - (i-1)*i/2)) >= i
Here is a first iterative method to find i:
for (i = 0; q >= i*n - (i-1)*i/2; ++i);
i = i-1;
If you iterate over q, the computation of i can reuse the value found for the previous q instead of starting from 0.
A second method could use sqrt since
i*n - i²/2 + i/2 ~ q
i²/2 - i(n+1/2) + q ~ 0
delta = (n+0.5)² - 2q
i ~ (n+0.5) - sqrt(delta)
i could be defined as floor((n+0.5) - sqrt((n+0.5)² - 2q))
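A small sketch of the square-root method. Because floating-point rounding can push the estimate one row off, the sketch corrects it with exact integer arithmetic afterwards. The names rowStart and unflatten are hypothetical, and the index types are a choice made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Number of upper-triangle elements in rows 0 .. i-1, i.e. the q of (i, i).
long long rowStart(long long i, long long n) {
    return i * n - i * (i - 1) / 2;
}

// Maps a flat index q in [0, n*(n+1)/2) to (i, j) with 0 <= i <= j < n.
std::pair<int, int> unflatten(long long q, int n) {
    assert(n > 0 && q >= 0 && q < static_cast<long long>(n) * (n + 1) / 2);
    const double half = n + 0.5;
    long long i = static_cast<long long>(
        std::floor(half - std::sqrt(half * half - 2.0 * static_cast<double>(q))));
    // The estimate may be off by one because of rounding; nudge it exactly.
    while (i > 0 && rowStart(i, n) > q) --i;
    while (i + 1 < n && rowStart(i + 1, n) <= q) ++i;
    const long long j = i + (q - rowStart(i, n));
    return {static_cast<int>(i), static_cast<int>(j)};
}
```

Each q is handled independently, so a single loop over q that calls unflatten can be split across threads without any shared state.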
OK, this isn't an answer as far as I can tell, but it is a workaround (for now).
Thinking about it, I could create an array (or corresponding pointer) a[(n*n + n)/2 + n][2], fill in the corresponding i and j values in my calling code, and pass it to my function. That would allow for the speed-up.
You can make your loop iterate as if it were iterating over the whole matrix.
Just keep track of the current line, and each time you enter a new line, increment the index by the number of that line.
Then the question's i is line and its j is i%n.
See this code:
#include <iostream>
using namespace std;
int main() {
int n = 10;
int line = 0;
for (int i=0; i<n*n; i++){
if (i%n == 0 && i!=0){
line++;
i += line;
cout << endl;
}
cout << "("<<line<<","<<i%n<<")";
}
return 0;
}

Finding the T(n) of An Algorithm

Okay so when my professor was going over it in class it seemed quite simple, but when I got to my homework I became confused. This is a homework example.
for (int i = 0; i < n; i++) // I know this runs at T(n)
for (int j = n - 1; j >= i; j--)
cout << i << " " << j << endl;
Here's an example I understand
for(int i=0; i<n-1; i++) {
for(int j=i+1; j<n; j++) {
// 1 simple statement
}
}
For that example I just plugged in 0, 1, and 2. For 0, it ran n-1 times, for 1 n-2 times and for 2 n-3 times. So I think that for the homework example, if I plugged in 0, it would run n+1 times, since j has to be greater than or equal to i, which is 0. If it's not obvious, I'm pretty confused. If anyone could show me how to solve it, that'd make my day. Thanks guys.
Let's dig into the function. Let's pick some numbers.
say, n = 5
So our code looks like this (magical pseudo-code uses INCLUSIVE loops, not that it's too important)
(1)for i = 0 to 4
(2)for j = 4 to i
(3)print i j
next
next
So this is a matter of preference, but usually loops are assumed to cost 1 simple statement per execution (comparison, and incrementation). So we'll assume that statements (1) and (2) have a cost of 2. Statement (3) has a cost of 1.
Now to determine T(n).
Our outer loop for i = 0 to 4 runs exactly n times.
Our inner loop for j = 4 to i . . . We'll dig in there for a minute.
For our example with n = 5 loop (2) will execute like so
j = 4; i = 0; j = 4; i = 1; j = 4; i = 2; j = 4; i = 3; j = 4; i = 4;
j = 3; i = 0; j = 3; i = 1; j = 3; i = 2; j = 3; i = 3;
j = 2; i = 0; j = 2; i = 1; j = 2; i = 2;
j = 1; i = 0; j = 1; i = 1;
j = 0; i = 0;
So it makes this kind of pyramid shape, where we do 1 less iteration each time. This particular example ran 5 + 4 + 3 + 2 + 1 = 15 times.
We can write this down as SUM(i; i = 0 to n).
Which we know from precalc: = (1/2)(n)(n+1).
And (3) will execute the exact same number of times as that inner loop since it's the only statement. So our total runtime is going to be. . .
COST(1) + COST(2) + COST(3)
(2)(n) + 2(1/2)(n)(n+1) + (1/2)(n)(n+1)
We can clean this up to be
(3/2)(n)(n+1) + 2n = T(n).
That said, this assumes that loops cost 2 and the statement costs 1. It's usually more meaningful to say loops cost 0 and statements cost 1. If that were the case, T(n) = (1/2)(n)(n+1).
And given that T(n), we know T(n) is O(n^2).
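Written out as a derivation, as a check on the arithmetic above (same cost assumptions: 2 per loop iteration, 1 per statement):

```latex
\begin{align*}
\sum_{i=0}^{n-1} (n - i) &= \sum_{m=1}^{n} m = \frac{n(n+1)}{2} \\
T(n) &= 2n + 2 \cdot \frac{n(n+1)}{2} + \frac{n(n+1)}{2}
      = \frac{3}{2}\,n(n+1) + 2n
      = \frac{3}{2}\,n^2 + \frac{7}{2}\,n \in O(n^2)
\end{align*}
```

For n = 5 this gives 10 + 30 + 15 = 55, matching the 15 inner iterations counted in the pyramid.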
Hope this helps!
It's not that hard.
3 examples for single loops:
for (int i = 0; i < n; i++)
for(int i = 0; i < n-1; i++)
for(int i = 2; i < n-1; i++)
The first loop executes its content n times (i=0,1,2,3,...,n-1).
The same way, the second loop is just n-1 times.
The third would be n-3 because it starts not at 0, but 2
(and if n is less than 3, ie. n-3<0, it won't execute at all)
In a nested loop like
for(int i = 0; i < n-1; i++) {
for(int j = 0; j < n; j++) {
//something
}
}
For each pass of the outer loop, the whole inner loop is executed, ie. you can multiply both single loop counts to get how often "something" is executed in total. Here, it is (n-1) * n = n^2 - n.
If the inner loop depends on the value of the outer loop, it gets a bit more complicated:
for(int i = 0; i < n-1; i++) {
for(int j = i+1; j < n; j++) {
//something
}
}
The inner loop alone is n - (i+1) times, the outer one n-1 times (with i going from 0 to n-2).
While there are "proper" ways to calculate this, a bit of logical thinking is often easier, as you did already:
i-value => inner-loop-time
0 => n-1
1 => n-2
...
n-2 => n - (n-2+1) = 1
So you'll need the sum 1+2+3+...+(n-1).
For calculating sums from 1 to x, following formula helps:
sum[1...x] = x*(x+1)/2
So, the sum from 1 to n-1 is
sum[1...n-1] = (n-1)*(n-1+1)/2 = (n^2 - n)/2
and that's the solution for the loops above (your second code).
About the first code:
Outer loop: n
Inner loop: From n-1 down to i included, or the other way from i up to <=n-1,
or from i up to <n, that's n-i times
i-value => inner-loop-time
0 => n
1 => n-1
2 => n-2
...
n-1 => 1
...and the sum from 1 to n is (n^2 + n)/2.
One easy way to investigate a problem is to model it and look at resulting data.
In your case, the question is: how many iterations does the inner loop run, depending on the value of the outer loop variable?
let n = 10 in [0..n-1] |> List.map (fun x -> x,n-x);;
The one line above is the model showing what happens. If you now look at the resulting output, you will quickly notice something...
val it : (int * int) list =
[(0, 10); (1, 9); (2, 8); (3, 7); (4, 6); (5, 5); (6, 4); (7, 3); (8, 2);
(9, 1)]
What is it you notice? For a given N you run the outer loop N times - this is trivial. In the homework code, j runs from N-1 down to i inclusive, so the inner loop runs N-i times. Now we need to sum up the second numbers and we have the solution:
sum(N..1) = N * (N+1) / 2.
So the total count of cout calls is N * (N+1) / 2.
Another easy way to achieve the same is to modify your function a bit:
int count(int n) {
int c = 0;
<outer for loop>
<inner for loop>
c++;
return c;
}
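Filled in for the homework loops, that counting function could look like the sketch below. The name countIterations is chosen for this example, and the cout call is replaced by the counter.

```cpp
#include <cassert>

// Number of times the body of the homework loops runs for a given n.
long long countIterations(int n) {
    assert(n >= 0);
    long long c = 0;
    for (int i = 0; i < n; i++)
        for (int j = n - 1; j >= i; j--)
            c++;  // stands in for: cout << i << " " << j << endl;
    return c;
}
```

Calling it for a few values of n and comparing against a candidate formula is a quick way to catch off-by-one mistakes; for n = 5 it returns 15.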

to optimize the nested loops

for( a=1; a <= 25; a++){
num1 = m[a];
for( b=1; b <= 25; b++){
num2 = m[b];
for( c=1; c <= 25; c++){
num3 = m[c];
for( d=1; d <= 25; d++){
num4 = m[d];
for( e=1; e <= 25; e++){
num5 = m[e];
for( f=1; f <= 25; f++){
num6 = m[f];
for( g=1; g <= 25; g++){
num7 = m[g];
for( h=1; h <= 25; h++){
num8 = m[h];
for( i=1; i <= 25; i++){
num = num1*100000000 + num2*10000000 +
num3* 1000000 + num4* 100000 +
num5* 10000 + num6* 1000 +
num7* 100 + num8* 10 + m[i];
check_prime = 1;
for ( y=2; y <= num/2; y++)
{
if ( num % y == 0 )
check_prime = 0;
}
if ( check_prime != 0 )
{
array[x++] = num;
}
num = 0;
}}}}}}}}}
The above code takes a very long time to execute. In fact, it doesn't even finish. What can I do to optimize the loops and speed up the execution? I am a newbie to C++.
Replace this code with code using a sensible algorithm, such as the Sieve of Eratosthenes. The most important "optimization" is choosing the right algorithm in the first place.
If your algorithm for sorting numbers is to swap them randomly until they're in order, it doesn't matter how much you optimize the selecting of the random entries, swapping them, or checking if they're in order. A bad algorithm will mean bad performance regardless.
You're checking 25^9 = 3,814,697,265,625 numbers for primality. That's a lot of prime tests and will always take long. Even in the best case (for performance), when all array entries (in m) are 0 (never mind that the test considers 0 a prime) so that the trial division loop never runs, it will take hours to run. When all entries of m are positive, the code as is will run for hundreds or thousands of years, since then each number will be trial-divided by more than 50,000,000 numbers.
Looking at the prime check,
check_prime = 1;
for ( y = 2; y <= num/2; y++)
{
if ( num % y == 0 )
check_prime = 0;
}
the first glaring inefficiency is that the loop continues even after a divisor has been found and the compositeness of num established. Break out of the loop as soon as you know the outcome.
check_prime = 1;
for ( y = 2; y <= num/2; y++)
{
if ( num % y == 0 )
{
check_prime = 0;
break;
}
}
In the unfortunate case that all numbers you test are prime, that won't change a thing, but if all (or almost all, for sufficiently large values of almost) the numbers are composite, it will cut the running time by a factor of at least 5000.
The next thing is that you divide up to num/2. That is not necessary. Why do you stop at num/2, and not at num - 1? Well, because you figured out that the largest proper divisor of num cannot be larger than num/2 because if (num >) k > num/2, then 2*k > num and num is not a multiple of k.
That's good, not everybody sees that.
But you can pursue that train of thought further. If num/2 is a divisor of num, that means num = 2*(num/2) (using integer division, with the exception of num = 3). But then num is even, and its compositeness was already determined by the division by 2, so the division by num/2 will never be tried if it succeeds.
So what's the next possible candidate for the largest divisor that needs to be considered? num/3 of course. But if that's a divisor of num, then num = 3*(num/3) (unless num < 9) and the division by 3 has already settled the question.
Going on, if k < √num and num/k is a divisor of num, then num = k*(num/k) and we see that num has a smaller divisor, namely k (possibly even smaller ones).
So the smallest nontrivial divisor of num is less than or equal to √num. Thus the loop needs only run for y <= √num, or y*y <= num. If no divisor has been found in that range, num is prime.
Now the question arises whether to loop
for(y = 2; y*y <= num; ++y)
or
root = floor(sqrt(num));
for(y = 2; y <= root; ++y)
The first needs one multiplication for the loop condition in each iteration, the second one computation of the square root outside the loop.
Which is faster?
That depends on the average size of num and whether many are prime or not (more precisely, on the average size of the smallest prime divisor). Computing a square root takes much longer than a multiplication; to compensate for that cost, the loop must run for many iterations (on average). Whether "many" means more than 20, more than 100 or more than 1000, say, depends. With num larger than 10^8, as is probably the case here, computing the square root is probably the better choice.
Now we have bounded the number of iterations of the trial division loop to √num whether num is composite or prime and reduced the running time by a factor of at least 5000 (assuming that all m[index] > 0, so that always num >= 10^8) regardless of how many primes are among the tested numbers. If most values num takes are composites with small prime factors, the reduction factor is much larger, to the extent that normally, the running time is almost completely used for testing primes.
Further improvement can be obtained by reducing the number of divisor candidates. If num is divisible by 4, 6, 8, ..., then it is also divisible by 2, so num % y never yields 0 for even y > 2. That means all these divisions are superfluous. By special casing 2 and incrementing the divisor candidate in steps of 2,
if (num % 2 == 0)
{
check_prime = 0;
} else {
root = floor(sqrt(num));
for(y = 3; y <= root; y += 2)
{
if (num % y == 0)
{
check_prime = 0;
break;
}
}
}
the number of divisions to perform and the running time is roughly halved (assuming enough bad cases that the work for even numbers is negligible).
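Put together as a self-contained function, the steps so far might look like the sketch below. The name isPrime and the 64-bit parameter type are choices made for this example. It uses the y*y <= num form of the loop bound and assumes num stays far below 2^63 so that y*y cannot overflow.

```cpp
#include <cstdint>

// Trial division: special-case 2, then odd candidates up to sqrt(num),
// stopping at the first divisor found.
bool isPrime(std::uint64_t num) {
    if (num < 2) return false;        // 0 and 1 are not prime
    if (num < 4) return true;         // 2 and 3
    if (num % 2 == 0) return false;   // even numbers > 2
    for (std::uint64_t y = 3; y * y <= num; y += 2) {
        if (num % y == 0) return false;
    }
    return true;
}
```

For a 9-digit num this runs at most about 16,000 divisions, compared with roughly num/2 in the original loop.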
Now, whenever y is a multiple of 3 (other than 3 itself), num % y will only be computed when num is not a multiple of 3, so these divisions are also superfluous. You can eliminate them by also special-casing 3 and letting y run through only the odd numbers that are not divisible by 3 (start with y = 5, increment by 2 and 4 alternatingly). That chops off roughly a third of the remaining work (if enough bad cases are present).
Continuing that elimination process, we need only divide num by the primes not exceeding √num to find whether it's prime or not.
So usually it would be a good idea to find the primes not exceeding the square root of the largest num you'll check, store them in an array and loop
root = floor(sqrt(num));
for(k = 0, y = primes[0]; k < prime_count && (y = primes[k]) <= root; ++k)
{
if (num % y == 0)
{
check_prime = 0;
break;
}
}
That holds unless the largest value num can take is small enough. If, for example, you'll always have num < 2^31, then you should find the primes up to that limit in a bit-sieve so that you can look up whether num is prime in constant time. A sieve of 2^31 bits takes 256 MB; if you only keep flags for the odd numbers (which needs special-casing to check whether num is even), you only need 128 MB to check the primality of numbers < 2^31 in constant time, and further reduction of the required space is possible.
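A minimal sketch of the first variant: sieve the primes up to the square root of the largest num once, then trial-divide by those primes only. The names primesUpTo and isPrimeByTable are hypothetical, and std::vector<bool> stands in for a hand-written bit sieve.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sieve of Eratosthenes: all primes <= limit, in increasing order.
std::vector<std::uint32_t> primesUpTo(std::uint32_t limit) {
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    std::vector<std::uint32_t> primes;
    for (std::uint64_t p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        primes.push_back(static_cast<std::uint32_t>(p));
        for (std::uint64_t q = p * p; q <= limit; q += p) composite[q] = true;
    }
    return primes;
}

// Trial division by table. Assumption: `primes` contains every prime
// up to sqrt(num); otherwise a composite could be reported as prime.
bool isPrimeByTable(std::uint64_t num, const std::vector<std::uint32_t>& primes) {
    if (num < 2) return false;
    for (std::uint32_t p : primes) {
        if (static_cast<std::uint64_t>(p) * p > num) return true;  // no divisor <= sqrt(num)
        if (num % p == 0) return false;
    }
    return true;  // table exhausted; valid only under the assumption above
}
```

For numbers below 10^10, primesUpTo(100000) is enough, and that table holds only 9592 primes.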
So much for the prime test itself.
If the m array contains numbers divisible by 2 or by 5, it may be worthwhile to reorder the loops, make the loop for i the outermost, and skip the inner loops if m[i] is divisible by 2 or by 5. All the other numbers are multiplied by powers of 10 before adding, so num would then be a multiple of 2 or 5, respectively, and not prime.
But, despite all that, it will still take long to run the code. Nine nested loops reek of a wrong design.
What is it that you are trying to do? Maybe we can help find the correct design.
We can eliminate a lot of redundant calculations by calculating each part of the number as it becomes available. This also shows the trial division test for primality on a 2-3 wheel up to the square root of a number:
// array m[] is assumed sorted in descending order NB!
// a macro to skip over the duplicate digits
#define I(x) while( x<25 && m[x+1]==m[x] ) ++x;
for( a=1; a <= 25; a++) {
num1 = m[a]*100000000;
for( b=1; b <= 25; b++) if (b != a) {
num2 = num1 + m[b]*10000000;
for( c=1; c <= 25; c++) if (c != b && c != a) {
num3 = num2 + m[c]*1000000;
for( d=1; d <= 25; d++) if (d!=c && d!=b && d!=a) {
num4 = num3 + m[d]*100000;
for( e=1; e <= 25; e++) if (e!=d && e!=c && e!=b && e!=a) {
num5 = num4 + m[e]*10000;
for( f=1; f <= 25; f++) if (f!=e&&f!=d&&f!=c&&f!=b&&f!=a) {
num6 = num5 + m[f]*1000;
limit = floor( sqrt( num6+1000 )); ///
for( g=1; g <= 25; g++) if (g!=f&&g!=e&&g!=d&&g!=c&&g!=b&&g!=a) {
num7 = num6 + m[g]*100;
for( h=1; h <= 25; h++) if (h!=g&&h!=f&&h!=e&&h!=d&&h!=c&&h!=b&&h!=a) {
num8 = num7 + m[h]*10;
for( i=1; i <= 25; i++) if (i!=h&&i!=g&&i!=f&&i!=e&&i!=d
&&i!=c&&i!=b&&i!=a) {
num = num8 + m[i];
if( num % 2 != 0 && num % 3 != 0 ) {
is_prime = 1;
for ( y=5; y <= limit; y+=6) {
if ( num % y == 0 ) { is_prime = 0; break; }
if ( num % (y+2) == 0 ) { is_prime = 0; break; }
}
if ( is_prime ) { return( num ); } // largest prime found
}I(i)}I(h)}I(g)}I(f)}I(e)}I(d)}I(c)}I(b)}I(a)}
This code also eliminates the duplicate indices. As you've indicated in the comments, you pick your numbers out of a 5x5 grid. That means that you must use all different indices. This will bring down the count of numbers to test from 25^9 = 3,814,697,265,625 to 25*24*23*...*17 = 741,354,768,000.
Since you've now indicated that all entries in the m[] array are less than 10, there are certain to be duplicates, which need to be skipped when searching. As Daniel points out, when searching from the top, the first prime found will be the biggest. This is achieved by pre-sorting the m[] array in descending order.

Dynamic programming approach to calculating Stirling's Number

int s_dynamic(int n,int k) {
int maxj = n-k;
int *arr = new int[maxj+1];
for (int i = 0; i <= maxj; ++i)
arr[i] = 1;
for (int i = 1; i <= k; ++i)
for(int j = 1; j <= maxj; ++j)
arr[j] += i*arr[j-1];
return arr[maxj];
}
Here's my attempt at determining Stirling numbers using Dynamic Programming.
It is defined as follows:
S(n,k) = S(n-1,k-1) + k S(n-1,k), if 1 < k < n
S(n,k) = 1, if k=1 or k=n
Seems ok, right? Except when I run my unit test...
partitioningTest ..\src\Test.cpp:44 3025 == s_dynamic(9,3) expected: 3025 but was: 4414
Can anyone see what I'm doing wrong?
Thanks!
BTW, here's the recursive solution:
int s_recursive(int n,int k) {
if (k == 1 || k == n)
return 1;
return s_recursive(n-1,k-1) + k*s_recursive(n-1,k);
}
Found the bug.
You already computed your dynamic array of Stirling numbers for k=1 (S(n,1)=1 for all n).
You should start computing S(n,2) - that is:
for (int i = 2; i <= k; ++i) //i=2, not 1
for(int j = 1; j <= maxj; ++j)
arr[j] += i*arr[j-1];
Your approach is just fine, except you seem to have made a simple indexing error. If you think about what the indexes i and j represent, and what the inner loop transforms arr[j] to, you'll see it easily enough (I lie, it took me a good half hour to figure out what was what :)).
From what I can decode, i represents the value k during calculations, and arr[j] is transformed from S(i-1+j, i-1) to S(i+j, i). The topmost for loop that initializes arr sets it up as S(1+j, 1). According to these loops, the calculations look just fine. Except for one thing: the very first i loop assumes that arr[j] contains S(0+j, 0), and that is where your problem lies. If you change the starting value of i from 1 to 2, all should be OK (you may need an if or two for edge cases). The initial i=2 will transform arr[j] from S(1+j, 1) to S(2+j, 2), and the rest of the transformations will be just fine.
Alternatively, you could have initialized arr[j] to S(0+j, 0) if it were defined, but unfortunately, Stirling's numbers are undefined at k=0.
EDIT: Apparently I was wrong in my last comment. If you initialize arr to {1, 0, 0, ...}, you can leave starting value of i as 1. For this, you use the initial values S(0, 0)=1, and S(n, 0)=0, n>0 instead.
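As a sketch of that last variant, the function below initializes the row to {1, 0, 0, ...} and keeps the i loop starting at 1. The name s_dynamic_fixed, the long long return type and the std::vector are choices made for this example rather than part of the original code; the vector also avoids the unreleased new[] buffer.

```cpp
#include <cassert>
#include <vector>

// Stirling numbers of the second kind using a single row of n-k+1 entries.
// Invariant: after the pass for a given i, arr[j] == S(i + j, i).
// s_dynamic_fixed is a hypothetical name; long long only delays overflow.
long long s_dynamic_fixed(int n, int k)
{
    assert(0 <= k && k <= n);
    const int maxj = n - k;

    // Start from row k = 0: S(0, 0) = 1 and S(j, 0) = 0 for j > 0.
    std::vector<long long> arr(maxj + 1, 0);
    arr[0] = 1;

    for (int i = 1; i <= k; ++i)
        for (int j = 1; j <= maxj; ++j)
            arr[j] += i * arr[j - 1];  // S(i+j, i) = S(i+j-1, i-1) + i*S(i+j-1, i)

    return arr[maxj];
}
```

With this initialization the first pass (i = 1) turns the row into all ones, which is exactly the S(1+j, 1) state the original code started from, so s_dynamic_fixed(9, 3) yields the expected 3025.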