How I can speed up my algorithm?

I'm solving some problems and I can't solve this one. I have to write code where the user enters a decimal number, and I need to count in how many other numeral systems (bases) that number starts with the digit 1.
Here is algorithm:
for (int i = 3; i <= n; i++) {
int z = n;
while (z != 0) {
x = z % i;
z = z / i;
}
if (x == 1) {
brOsnova++;
}
}

You can already speed it up by skipping every base i with i <= n < 2*i: in each of those bases n is a two-digit number whose leading digit is 1, so all of them count.
Therefore, check only for(int i = 3; i <= n/2; ++i) and then add (n+1)/2 to the final brOsnova (this count holds for n >= 4, where every base above n/2 is at least 3).
I am sure it can be accelerated further and there may be some O(log(n)) algorithm, but that might be far-fetched... or a good candidate question for the algorithm tag.
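A minimal sketch of that shortcut, assuming n >= 1 and that bases start at 3 as in the question's loop. The function names are made up for illustration; the naive version is kept only as a reference to compare against.

```cpp
#include <cassert>

// Leading digit of n written in the given base (n >= 1, base >= 2).
int leading_digit(long long n, long long base) {
    while (n >= base) n /= base;
    return static_cast<int>(n);
}

// Reference version: tries every base from 3 to n, as in the question.
long long count_bases_naive(long long n) {
    long long count = 0;
    for (long long base = 3; base <= n; ++base)
        if (leading_digit(n, base) == 1) ++count;
    return count;
}

// Shortcut version: every base with base <= n < 2*base gives the two-digit
// form "1x", so only bases up to n/2 are tested explicitly.
long long count_bases_fast(long long n) {
    assert(n >= 1);
    if (n < 4) return count_bases_naive(n);  // the (n+1)/2 count assumes n >= 4
    long long count = (n + 1) / 2;           // all bases in (n/2, n]
    for (long long base = 3; base <= n / 2; ++base)
        if (leading_digit(n, base) == 1) ++count;
    return count;
}
```

This halves the number of bases that are tested; it does not change the asymptotic cost of the loop.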

Instead of the loop, use this:
x = x - ((x / i) * i);
if (x == 1)
{
...
}
This only works for integer maths.

Related

Find the number of pairs of positive integers satisfying the inequality

I'm trying to solve a programming problem where I have to display the number of positive integer solutions of the inequality x² + y² < n, where n is given by the user. I've already written code that seems to work, but not as fast as I'd like it to. Is there any way to speed it up?
My current code:
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
long long n, i, r, k, p, a;
cin >> k;
while (k--)
{
r = 0;
cin >> n;
p = sqrt(n);
for (i = 1; i <= p; i++)
{
a = sqrt(n - (i * i));
r += a;
if ((((i * i) + (a * a)) == n) && (a > 0))
{
r--;
}
}
cout << r << "\n";
}
return 0;
}
Edit:
This is a solution for this task.
The task in English:
Find the number of natural solutions (x≥1, y≥1) of the inequality x²+y² < n, where 0 < n < 2147483647. For example, for n=10 there are 4 solutions: (1,1), (1,2), (2,1), (2,2).
Input
In the first line of input the number of test cases k is given. In the next k lines, there are the n values given.
Output
In the output, you have to display in separate lines the number of natural solutions of the inequality.
Example
Input:
2
10
11
Output:
4
6

Your solution seems fast already. The main way to reduce the time spent is to eliminate the call to sqrt in the loop. This works because the value a = sqrt(n - (i * i)) does not vary much from one iteration to the next.
Here is the code:
r = 0;
p = sqrt(n);
if ((p*p) == n) p--;
a = p;
for (long long i = 1; i <= p; i++)
{
while ((n-i*i) <= a*a) {
--a;
}
r += a;
}
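The fragment above can be wrapped into a complete function, sketched here under the task's stated limits (0 < n < 2147483647). The names isqrt, count_pairs and count_pairs_brute are illustrative, and the integer square root replaces the floating-point sqrt call purely to sidestep rounding questions.

```cpp
#include <cassert>

// Largest a >= 0 with a*a <= v, found without floating point (v >= 0).
long long isqrt(long long v) {
    long long a = 0;
    while ((a + 1) * (a + 1) <= v) ++a;
    return a;
}

// Number of pairs (x, y) with x >= 1, y >= 1 and x*x + y*y < n.
long long count_pairs(long long n) {
    assert(n > 0);
    long long p = isqrt(n);
    if (p * p == n) --p;      // x*x must stay strictly below n
    long long a = p;          // upper bound for y; it only ever decreases
    long long r = 0;
    for (long long x = 1; x <= p; ++x) {
        while (n - x * x <= a * a) --a;  // shrink until x*x + a*a < n
        r += a;                          // y can be 1..a for this x
    }
    return r;
}

// Brute-force reference for small n, used only to cross-check the result.
long long count_pairs_brute(long long n) {
    long long r = 0;
    for (long long x = 1; x * x < n; ++x)
        for (long long y = 1; x * x + y * y < n; ++y) ++r;
    return r;
}
```

For the sample input, count_pairs(10) gives 4 and count_pairs(11) gives 6, matching the expected output in the task.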

Algorithm on hexagonal grid

The hexagonal grid is represented by a two-dimensional array with R rows and C columns. The first row always comes "before" the second in the hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is odd. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation (image): yellow elements represent '1' and blank elements represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead of R and C):
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}

For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
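A rough sketch of this approach, assuming the neighbour layout from the question's code (even rows reach diagonally to the left, odd rows to the right). The type aliases and function names are invented for illustration, and rows of the matrix are stored as bitsets so that adding two rows over GF(2) is a single XOR.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int MAXN = 100;            // at most 10 x 10 cells
using Row = std::bitset<MAXN>;
using Matrix = std::vector<Row>;     // M[i][j] == 1: cell j feeds into cell i

// Builds the one-turn matrix for an R x C board.
Matrix build_step_matrix(int R, int C) {
    Matrix M(R * C);
    for (int i = 0; i < R; ++i)
        for (int j = 0; j < C; ++j) {
            const int d = (i % 2 == 0) ? -1 : 1;  // diagonal column shift
            const int nb[6][2] = {{i - 1, j}, {i - 1, j + d}, {i, j - 1},
                                  {i, j + 1}, {i + 1, j}, {i + 1, j + d}};
            for (const auto& p : nb)
                if (p[0] >= 0 && p[0] < R && p[1] >= 0 && p[1] < C)
                    M[i * C + j][p[0] * C + p[1]] = 1;
        }
    return M;
}

// Matrix product over GF(2): row i of A*B is the XOR of the rows B[j] with A[i][j] == 1.
Matrix multiply(const Matrix& A, const Matrix& B) {
    Matrix out(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A.size(); ++j)
            if (A[i][j]) out[i] ^= B[j];
    return out;
}

// Exponentiation by squaring: O(log k) matrix products.
Matrix power(Matrix M, std::uint64_t k) {
    Matrix result(M.size());
    for (std::size_t i = 0; i < M.size(); ++i) result[i][i] = 1;  // identity
    while (k) {
        if (k & 1) result = multiply(result, M);
        M = multiply(M, M);
        k >>= 1;
    }
    return result;
}

// Matrix-vector product over GF(2): each output bit is a parity.
Row apply_to_state(const Matrix& M, const Row& state) {
    Row out;
    for (std::size_t i = 0; i < M.size(); ++i)
        out[i] = ((M[i] & state).count() % 2) == 1;
    return out;
}
```

With the start grid packed into a Row (cell (i, j) at bit i*C + j), the final board would be apply_to_state(power(build_step_matrix(R, C), k), start). Each product costs about N² row XORs, so even k near 2^63 needs only on the order of a hundred products.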

There are several ways to speed up your algorithm.
You do the neighbour-calculation with the out-of bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an odd number of neighbours cause a toggle, the cell is on; otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only a few active cells. Its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch each neighbour of each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling: the exclusive-or (xor) operator, ^. If you keep the list of neighbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance to Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³, is suspiciously large.
The playing board is small. The example in your question will repeat after 2¹⁶ turns at the latest, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has a period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tackle it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check in each turn whether the pattern has occurred before.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)

Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
this is obviously in O(N). Iterating twice is somewhat twice the actual run time, but should not be that bad. Try not to allocate space every time that you do that but reuse existing structures.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated and all of their neighbors (each field once)
count active neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.

This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty of representing the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and of checking its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm be in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).

Rephrase pascal code to c++ so it can work as efficient as possible

So I got this task where I have Pascal code and I need to work out what the result is. That wouldn't be a problem because I know Pascal, but I need it to run in 1 second or less with numbers up to 10^9.
readln(N);
counter:=0;
for i:=N-1 downto 1 do begin
counter:= counter + 1;
if N mod i = 0 then break;
end;
writeln(counter);
Here is my code
#include <iostream>
using namespace std;
int main()
{
int x;
int counter = 0;
cin>>x;
for (int i = 2; i <= x; i++){
if (x % i == 0){
counter = x - x / i;
break;
}
}
cout<<counter;
return 0;
}
but it still can't quite get the max score.

Restate problem:
1) Compute F = largest proper factor of X
2) Output X-F
Instead of directly searching for the largest proper factor, apply three trivial optimizations (maybe something more advanced will be needed, but first see if three trivial optimizations are enough).
A) Find S = smallest factor of X greater than 1. Output X-(X/S)
B) Special case for prime
C) Special case for even
#include <cmath>
int largest_proper_factor(int X)
{
    if ( X % 2 == 0 ) return X/2; // Optimize even
    // Note the add of .5 is only needed for non compliant sqrt version that
    // might return a tiny fraction less than the exact answer.
    int last = (int)(.5 + std::sqrt( (double) X ));
    for ( int i=3; i<=last; i+=2 ) // big savings here because even was optimized earlier
    {
        if ( X % i == 0 ) return X/i; // smallest odd factor i gives the largest proper factor X/i
    }
    return 1; // special case for prime
}

Numbers like 10^9 usually indicate contest problems, which need creative thinking instead of fast CPU...
See, N mod i = 0 means N is divisible by i. So the loop counts the numbers between N and one of its divisors (possibly plus one... Check it.) Which one is left for you.

OK, I got the result I wanted:
#include <iostream>
using namespace std;
int main()
{
int x;
int counter = 0;
cin>>x;
for (int i = 2; i <= x; i++){
if (x % i == 0){
counter = x - x / i;
break;
}
if (x / 4 == i){
i = x - 1;
}
}
cout<<counter;
return 0;
}
Thank you everyone who helped me:)

Dynamic Programming: Calculate all possible end positions for a list of consecutive jumps

The problem consists of calculating all possible end positions and how many combinations exist for each one.
Given a start position x=0, a track length m, and a list of jumps, return the number of possible ways to end at each position in the interval [-m/2,+m/2]. The jumps must be made in the given order, but each one can be made in the negative or the positive direction.
For example:
L = 40
jumps = 10, 10
Solution:
-20 : 1 (-10, -10)
0 : 2 (-10,+10 & +10,-10)
20 : 1 (+10,+10)
(The output needed is only the pair "position : #combinations")
I did it with simple recursion, and the result is OK.
But on large data sets, the execution time is a few minutes or even hours.
I know that with dynamic programming I can get a solution in a few seconds, but I don't know how to apply dynamic programming in this case.
Here is my current recursive function:
void escriuPosibilitats(queue<int> q, map<int,int> &res, int posicio, int m) {
int salt = q.front();
q.pop();
if(esSaltValid(m,posicio,-salt)) {
int novaPosicio = posicio - salt;
if(q.empty()) {
res[novaPosicio]++;
} else {
escriuPosibilitats(q,res,novaPosicio,m);
}
}
if(esSaltValid(m,posicio,salt)) {
int novaPosicio = posicio + salt;
if(q.empty()) {
res[novaPosicio]++;
} else {
escriuPosibilitats(q,res,novaPosicio,m);
}
}
}
Where q is the queue of the remaining jumps.
Where res is the partial solution.
Where posicio is the current position.
Where m is the length of the track.
Where esSaltValid is a function that checks if the jump is valid in the range of the track length.
PS: Sorry for my English level. I tried to improve my question! Thanks =)

You can use the following idea. Let dp[x][i] be the number of ways to arrive at position x using the jumps up to jump i. Then the answer is dp[x][N] for each x, where N is the number of jumps. Moreover, this dp depends only on the previous row, so you can keep a single array dp[x], build the next row in an auxiliary array, and replace dp with it after each iteration. The code would be something like this:
#include <iostream>
const int MOD = (int)(1e8+7);
const int L = 100; // track length; positions run from -L/2 to +L/2
int N = 36;        // number of jumps; only the first N entries of dx are used
int dx[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
int dp[L+1];   // dp[x + shift]: ways to reach x with the jumps processed so far
int next[L+1]; // the same table after one more jump
int main() {
    int shift = L/2; // to handle negative indexes
    dp[shift] = 1; // the initial position has one way to arrive, since you start there
    for (int i = 0; i < N; ++i) { // for each jump size
        for (int x = -L/2; x <= L/2; ++x) { // for each possible position
            if (-L/2 <= x + dx[i] && x + dx[i] <= L/2) // positive jump
                next[x + shift] = (next[x + shift] + dp[x + dx[i] + shift]) % MOD;
            if (-L/2 <= x - dx[i] && x - dx[i] <= L/2) // negative jump
                next[x + shift] = (next[x + shift] + dp[x - dx[i] + shift]) % MOD;
        }
        for (int x = -L/2; x <= L/2; ++x) { // update current dp to next and clear next
            dp[x+shift] = next[x+shift];
            next[x+shift] = 0;
        }
    }
    for (int x = -L/2; x <= L/2; ++x) // print the result
        if (dp[x+shift] != 0) {
            std::cout << x << ": " << dp[x+shift] << '\n';
        }
}
Of course, in case L is too big to handle, you can compress the state space and save the results in a map, and not in an array. The complexity of the approach is O(L*N). Hope it helped.
EDIT: just compute everything modulo 1e8+7 and that's it.
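To tie the idea back to the question's own example (track length 40, jumps 10 and 10), here is a small sketch of the same rolling-array DP as a function. The name count_end_positions is hypothetical, the counts are exact rather than reduced modulo 1e8+7, and the sketch assumes they fit in 64 bits.

```cpp
#include <cassert>
#include <map>
#include <vector>

// For every reachable end position, returns the number of sign choices that
// lead there without ever leaving [-m/2, +m/2].
std::map<int, unsigned long long>
count_end_positions(int m, const std::vector<int>& jumps) {
    assert(m >= 0);
    const int half = m / 2;                              // positions run -half..+half
    std::vector<unsigned long long> dp(2 * half + 1, 0ULL);
    dp[half] = 1;                                        // index half is position 0
    for (int jump : jumps) {
        std::vector<unsigned long long> next(dp.size(), 0ULL);
        for (int x = -half; x <= half; ++x) {
            const unsigned long long ways = dp[x + half];
            if (ways == 0) continue;                     // position not reachable yet
            for (int step : {jump, -jump}) {             // positive and negative jump
                const int to = x + step;
                if (to >= -half && to <= half) next[to + half] += ways;
            }
        }
        dp.swap(next);                                   // next row becomes current row
    }
    std::map<int, unsigned long long> result;
    for (int x = -half; x <= half; ++x)
        if (dp[x + half] != 0) result[x] = dp[x + half];
    return result;
}
```

Calling count_end_positions(40, {10, 10}) yields the pairs -20 : 1, 0 : 2 and 20 : 1 from the question. The work is proportional to the track length times the number of jumps, instead of doubling with every jump as the recursion does.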

weighted RNG speed problem in C++

Edit: to clarify, the problem is with the second algorithm.
I have a bit of C++ code that samples cards from a 52 card deck, which works just fine:
void sample_allcards(int table[5], int holes[], int players) {
int temp[5 + 2 * players];
bool try_again;
int c, n, i;
for (i = 0; i < 5 + 2 * players; i++) {
try_again = true;
while (try_again == true) {
try_again = false;
c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
I am implementing code to sample the hole cards according to a known distribution (stored as a 2d table). My code for this looks like:
void sample_allcards_weighted(double weights[][HOLE_CARDS], int table[5], int holes[], int players) {
// weights are distribution over hole cards
int temp[5 + 2 * players];
int n, i;
// table cards
for (i = 0; i < 5; i++) {
bool try_again = true;
while (try_again == true) {
try_again = false;
int c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
for (int player = 0; player < players; player++) {
// hole cards according to distribution
i = 5 + 2 * player;
bool try_again = true;
while (try_again == true) {
try_again = false;
// weighted-sample c1 and c2 at once
// h is a number < 1325
int h = weighted_randi(&weights[player][0], HOLE_CARDS);
// i2h uses h and sets temp[i] to the 2 cards implied by h
i2h(&temp[i], h);
// reject collisions
for (n = 0; n < i; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]) || try_again;
}
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
My problem? The weighted sampling algorithm is a factor of 10 slower. Speed is very important for my application.
Is there a way to improve the speed of my algorithm to something more reasonable? Am I doing something wrong in my implementation?
Thanks.
edit: I was asked about this function, which I should have posted, since it is key
inline int weighted_randi(double *w, int num_choices) {
double r = fast_randd();
double threshold = 0;
int n;
for (n = 0; n < num_choices; n++) {
threshold += *w;
if (r <= threshold) return n;
w++;
}
// shouldn't get this far
cerr << n << "\t" << threshold << "\t" << r << endl;
assert(n < num_choices);
return -1;
}
...and i2h() is basically just an array lookup.

Your reject collisions are turning an O(n) algorithm into (I think) an O(n^2) operation.
There are two ways to select cards from a deck: shuffle and pop, or pick sets until the elements of the set are unique; you are doing the latter which requires a considerable amount of backtracking.
I didn't look at the details of the code, just a quick scan.
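For the uniform part of the draw, the "shuffle and pop" option could look roughly like the sketch below: a partial Fisher-Yates shuffle that never rejects anything. It uses a standard library generator instead of the question's fast_rand52, and the name draw_cards is made up for illustration. It does not cover the weighted hole cards.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Draws `count` distinct cards (numbered 0..51). Each draw swaps a randomly
// chosen remaining card into position i, so no collision check is needed.
template <class Rng>
std::vector<int> draw_cards(int count, Rng& rng) {
    assert(count >= 0 && count <= 52);
    int deck[52];
    std::iota(deck, deck + 52, 0);                       // deck = 0, 1, ..., 51
    std::vector<int> drawn;
    drawn.reserve(count);
    for (int i = 0; i < count; ++i) {
        std::uniform_int_distribution<int> pick(i, 51);  // only undrawn slots
        std::swap(deck[i], deck[pick(rng)]);
        drawn.push_back(deck[i]);
    }
    return drawn;
}
```

For a table plus two players this would be called as draw_cards(5 + 2 * players, rng) with, for example, a std::mt19937 as rng.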

You could gain some speed by replacing all the loops that check whether a card is taken with a bit mask. E.g. for a pool of 52 cards, we prevent collisions like so:
DWORD dwMask[2] = {0}; //64 bits
//...
int nCard;
while(true)
{
nCard = rand_52();
if(!(dwMask[nCard >> 5] & 1 << (nCard & 31)))
{
dwMask[nCard >> 5] |= 1 << (nCard & 31);
break;
}
}
//...
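A portable variant of the same idea, sketched with a single 64-bit word instead of two DWORDs. The helper name try_take is hypothetical; it only marks cards and leaves the choice of random generator to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Marks `card` (0..51) as used in the bit set `used`.
// Returns true if the card was free, false if it had already been taken.
bool try_take(std::uint64_t& used, int card) {
    assert(card >= 0 && card < 52);
    const std::uint64_t bit = std::uint64_t{1} << card;
    if (used & bit) return false;  // collision: caller should draw again
    used |= bit;
    return true;
}
```

A draw loop would then keep generating a card until try_take returns true, which replaces the inner comparison loop with one AND and one OR.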

My guess would be the memcpy(1326*sizeof(double)) within the retry-loop. It doesn't seem to change, so should it be copied each time?

Rather than tell you what the problem is, let me suggest how you can find it. Either 1) single-step it in the IDE, or 2) randomly halt it to see what it's doing.
That said, sampling by rejection, as you are doing, can take an unreasonably long time if you are rejecting most samples.

Your inner "try_again" for loop should stop as soon as it sets try_again to true - there's no point in doing more work after you know you need to try again.
for (n = 0; n < i && !try_again; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]);
}

The second question, about picking from a weighted set, also has an algorithmic replacement with lower time complexity. It is based on the principle that what is pre-computed does not need to be re-computed.
In an ordinary selection, you have an integral number of bins, which makes picking a bin an O(1) operation. Your weighted_randi function has bins of real length, thus selection in your current version operates in O(n) time. Since you don't say (but do imply) that the vector of weights w is constant, I'll assume that it is.
You aren't interested in the width of the bins per se; you are interested in the locations of their edges, which you re-compute on every call to weighted_randi using the variable threshold. If w is indeed constant, pre-computing a list of edges (that is, the value of threshold for all *w) is your O(n) step, which need only be done once. If you put the results in a (naturally) ordered list, a binary search on all future calls yields an O(log n) time complexity, with an increase in space of only sizeof w / sizeof w[0] elements.
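A minimal sketch of the precomputed-edges idea, assuming the weights for a player do not change between calls. The names build_edges and pick_bin are illustrative; r stands for the value the question obtains from fast_randd().

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// One-time O(n) step: edges[i] = w[0] + ... + w[i], the right edge of bin i.
std::vector<double> build_edges(const std::vector<double>& weights) {
    std::vector<double> edges(weights.size());
    std::partial_sum(weights.begin(), weights.end(), edges.begin());
    return edges;
}

// O(log n) lookup: index of the first bin whose right edge is >= r, which
// matches the original `r <= threshold` test. r is expected in [0, edges.back()].
int pick_bin(const std::vector<double>& edges, double r) {
    assert(!edges.empty());
    auto it = std::lower_bound(edges.begin(), edges.end(), r);
    if (it == edges.end()) --it;   // guard against rounding at the top edge
    return static_cast<int>(it - edges.begin());
}
```

With 1326 hole-card combinations this replaces a scan of up to 1326 additions per sample with about 11 comparisons, at the cost of one extra array of doubles per player.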
