Some optimization about the code (computing ranks of a vector)?


The following code is a function (performance-critical) to compute tied ranks of a vector:
//The function here is to compute tied-ranks: answers.com/topic/tied-rank
mergeSort(x,inds,ci);
//mergeSort(): to sort vector x of length ci, also returns keys (inds) of x.
int tj=0;
double xi=x[0];
for (int j = 1; j < ci; ++j)
{
if (x[j] > xi)
{
double rankvalue = 0.5 * (j - 1 + tj);
for (int k = tj; k < j; ++k)
{
ranks[inds[k]] = rankvalue;
};
tj = j;
xi = x[j];
};
};
double rankvalue = 0.5 * (ci - 1 + tj);
for (int k = tj; k < ci; ++k)
{
ranks[inds[k]] = rankvalue;
};
The problem is that the supposed performance bottleneck, mergeSort(), which is O(N log N), is several times faster than the rest of the code (which is O(N)). This suggests there is room for a huge improvement in the rest of the code. Any advice?

It seems that the algorithm has quadratic behavior: if x[0] is the largest value in the sequence tj stays 0 and you get up to ci iterations internally. Did you mean to use x[inds[0]] and x[inds[j]]?
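A minimal sketch of the variant this answer hints at, assuming the sort only permutes the index array and leaves x untouched, so values are read through x[inds[j]]. The name tiedRanks and the use of std::sort in place of the poster's mergeSort() are assumptions made for illustration, not the original code:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns 0-based tied ranks: equal values share the mean of the positions
// they occupy in sorted order. x itself is never reordered.
std::vector<double> tiedRanks(const std::vector<double>& x) {
    const std::size_t n = x.size();

    // Sort indices by the values they point to (stand-in for mergeSort()).
    std::vector<std::size_t> inds(n);
    std::iota(inds.begin(), inds.end(), std::size_t{0});
    std::sort(inds.begin(), inds.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    std::vector<double> ranks(n);
    std::size_t start = 0;
    while (start < n) {
        // Find the end of the run of equal values beginning at 'start'.
        std::size_t end = start + 1;
        while (end < n && x[inds[end]] == x[inds[start]]) ++end;

        // Mean of positions start .. end-1, as in the question's formula.
        const double rankValue = 0.5 * static_cast<double>(start + end - 1);
        for (std::size_t k = start; k < end; ++k) ranks[inds[k]] = rankValue;
        start = end;
    }
    return ranks;
}
```

Each element is visited once by the outer scan and once by the assignment loop, so the ranking pass stays linear after the sort. For the input {10, 20, 10, 30} the two tied values occupy positions 0 and 1 and both receive rank 0.5.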

Related

What is the Big-O Notation for this code?

I am having trouble deciding between N^2 and NlogN as the Big O. What's throwing me off is the third nested for loop, the one running up to k <= j. How do I reconcile this?
int Max_Subsequence_Sum( const int A[], const int N )
{
int This_Sum = 0, Max_Sum = 0;
for (int i=0; i<N; i++)
{
for (int j=i; j<N; j++)
{
This_Sum = 0;
for (int k=i; k<=j; k++)
{
This_Sum += A[k];
}
if (This_Sum > Max_Sum)
{
Max_Sum = This_Sum;
}
}
}
return Max_Sum;
}

This can be settled either by estimation or by exact analysis. The innermost loop performs j-i+1 operations for each pass of the second loop. Summing over all i and j gives the total number of operations:
(1+N)(2 N + N^2) / 6
which makes the algorithm O(N^3). As a quick estimate: there are three nested loops, each of which runs O(N) times at some point, so the bound is O(N^3).
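The closed form can be checked empirically. The sketch below counts how often the innermost statement of Max_Subsequence_Sum runs and compares that with (1+N)(2N+N^2)/6; the function names are hypothetical and exist only for this check:

```cpp
#include <cstdint>

// Counts how many times the innermost statement (This_Sum += A[k]) would run,
// using the same loop bounds as Max_Subsequence_Sum.
std::uint64_t countInnerIterations(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        for (std::uint64_t j = i; j < n; ++j)
            for (std::uint64_t k = i; k <= j; ++k)
                ++count;
    return count;
}

// Closed form quoted in the answer: (1 + N)(2N + N^2) / 6,
// which is the same as N(N+1)(N+2)/6.
std::uint64_t closedForm(std::uint64_t n) {
    return (1 + n) * (2 * n + n * n) / 6;
}
```

For N = 3 both functions return 10, and the leading term N^3/6 confirms the cubic growth.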

Let us analyze the innermost loop first:
for (int k=i; k <= j; k++) {
This_Sum += A[k];
}
Here the counter k iterates from i (inclusive) to j (inclusive), which means that the body of the for loop is performed j-i+1 times. If we assume that fetching the k-th number from an array is done in constant time, and that the arithmetic operations (incrementing k, calculating the sum of This_Sum and A[k], and comparing k with j) also take constant time, then this loop runs in O(j-i).
The initialization of This_Sum and the if statement are not significant:
This_Sum = 0;
// ...
if (This_Sum > Max_Sum) {
Max_Sum = This_Sum;
}
Indeed, if we can compare two numbers in constant time, and set one variable to the value held by another variable in constant time, then regardless of whether the condition holds or not, the number of operations is fixed.
Now we can take a look at the loop in the middle, and abstract away the innermost loop:
for (int j=i; j < N; j++) {
// constant number of operations
// j-i+1 operations
// constant number of operations
}
Here j ranges from i to N-1; taking N as an inclusive upper bound (a slight overcount that does not change the asymptotic result), the total number of operations is:
$$\sum_{j=i}^{N} (j - i + 1)$$
This sum is equivalent to:
$$(N - i + 1)(1 - i) + \sum_{j=i}^{N} j$$
This is an arithmetic sum [wiki] and it is equivalent to:
$$(N - i + 1)\left((1 - i) + \frac{i + N}{2}\right) = (N - i + 1)\left(\frac{N - i}{2} + 1\right)$$
or when we expand this:
$$\frac{i^2}{2} + \frac{3N}{2} - \frac{3i}{2} + \frac{N^2}{2} - N i + 1$$
So that means that we can now focus on the outer loop:
for (int i=0; i<N; i++) {
// i^2/2 + 3×N/2 - 3×i/2 + N^2/2 - N×i + 1
}
So now we can again calculate the number of operations with:
$$\sum_{i=0}^{N} \left( \frac{i^2}{2} + \frac{3N}{2} - \frac{3i}{2} + \frac{N^2}{2} - N i + 1 \right)$$
We can use Faulhaber's formula [wiki] here to solve this sum, and obtain:
$$\frac{(N+1)(N^2 + 5N + 6)}{6}$$
or in expanded form:
$$\frac{N^3}{6} + N^2 + \frac{11N}{6} + 1$$
which is thus an O(N^3) algorithm.

Applying FFT on multiplication of two very large number without using recursion [closed]

I have recently learned the FFT algorithm.
I applied it to the problem of fast multiplication of very large natural numbers, following this pseudocode:
Let A be array of length m, w be primitive m-th root of unity.
Goal: produce DFT F(A): evaluation of A at 1, w, w^2,...,w^{m-1}.
FFT(A, m, w)
{
if (m==1) return vector (a_0)
else {
A_even = (a_0, a_2, ..., a_{m-2})
A_odd = (a_1, a_3, ..., a_{m-1})
F_even = FFT(A_even, m/2, w^2) //w^2 is a primitive m/2-th root of unity
F_odd = FFT(A_odd, m/2, w^2)
F = new vector of length m
x = 1
for (j=0; j < m/2; ++j) {
F[j] = F_even[j] + x*F_odd[j]
F[j+m/2] = F_even[j] - x*F_odd[j]
x = x * w
}
return F
}
}
It works great, but I found better code that does the same job without recursion and also runs much faster.
I have tried to figure out how it works line by line, but I failed.
I would really appreciate it if you could explain to me in detail what is happening in the first two for loops (not the math part).
Below is the new code:
#include <cmath>
#include <complex>
#include <vector>
using namespace std;
typedef complex<double> base;
void fft(vector<base> &a, bool invert)
{
int n = a.size();
for (int i = 1, j = 0; i < n; i++){
int bit = n >> 1;
for (; j >= bit; bit >>= 1) j -= bit;
j += bit;
if (i < j) swap(a[i], a[j]);
}
for (int len = 2; len <= n; len <<= 1){
double ang = 2 * M_PI / len * (invert ? -1 : 1);
base wlen(cos(ang), sin(ang));
for (int i = 0; i < n; i += len){
base w(1);
for (int j = 0; j < len / 2; j++){
base u = a[i + j], v = a[i + j + len / 2] * w;
a[i + j] = u + v;
a[i + j + len / 2] = u - v;
w *= wlen;
}
}
}
if (invert)
{
for (int i = 0; i < n; i++)
a[i] /= n;
}
}

The Cooley–Tukey FFT implementation has been described hundreds of times.
See the part of the Wikipedia page on the non-recursive method.
The first loop is the bit-reversal part: the code reorders the source array, swapping the element at index i with the element at the index obtained by reversing the bits of i (so for length 8, index 6=110b is swapped with index 3=011b, and index 5=101b stays in place).
This reordering allows the array to be processed in place, doing the calculations on pairs of elements separated by 1, 2, 4, 8, ... positions (the len/2 step here) with the corresponding trigonometric coefficients.
P.S. Your question carries the onlinejudge tag, so such a compact implementation is quite good for your purposes. But for real work it is worth using a highly optimized library such as FFTW.
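To see the bit-reversal step in isolation, here is a small sketch that reuses the counter logic of the first loop but records the target index instead of swapping. The function name bitReversedIndices is hypothetical, and n is assumed to be a power of two:

```cpp
#include <cstddef>
#include <vector>

// Returns perm where perm[i] is i with its lowest log2(n) bits reversed.
// Assumes n is a power of two (as the FFT in the question does).
std::vector<std::size_t> bitReversedIndices(std::size_t n) {
    std::vector<std::size_t> perm(n, 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        // Increment j as a "mirrored" binary counter: clear leading 1 bits
        // from the top, then set the first 0 bit found.
        std::size_t bit = n >> 1;
        for (; j >= bit; bit >>= 1) j -= bit;
        j += bit;
        perm[i] = j;  // j is now the bit reversal of i
    }
    return perm;
}
```

For n = 8 this yields 0, 4, 2, 6, 1, 5, 3, 7, which matches the example in the answer: index 6 maps to 3 and index 5 maps to itself. The `if (i < j)` test in the original code makes sure each pair is swapped only once.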

Algorithm on hexagonal grid

Hexagonal grid is represented by a two-dimensional array with R rows and C columns. First row always comes "before" second in hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is an odd number. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead of R and C):
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}

For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
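A minimal sketch of this approach, assuming the same neighbour convention as the code in the question (even rows lean towards column j-1, odd rows towards j+1). Each matrix row is stored as a std::bitset so that addition in GF(2) becomes XOR. All names (Matrix, buildTransition, power, applyMatrix) are hypothetical:

```cpp
#include <array>
#include <bitset>
#include <cstdint>

constexpr int kMaxCells = 100;  // R <= 10 and C <= 10
using Row = std::bitset<kMaxCells>;
using Matrix = std::array<Row, kMaxCells>;

// Row (i*cols + j) of M has a 1 for every neighbour of cell (i, j).
Matrix buildTransition(int rows, int cols) {
    Matrix m{};
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            const int shift = (i % 2 == 0) ? -1 : 1;  // diagonal column offset
            const int nb[6][2] = {{i - 1, j}, {i - 1, j + shift}, {i, j - 1},
                                  {i, j + 1}, {i + 1, j}, {i + 1, j + shift}};
            for (const auto& c : nb) {
                if (c[0] >= 0 && c[0] < rows && c[1] >= 0 && c[1] < cols)
                    m[i * cols + j].set(c[0] * cols + c[1]);
            }
        }
    }
    return m;
}

// Matrix product over GF(2): row i of the result is the XOR of the rows of b
// selected by the 1 bits in row i of a.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c{};
    for (int i = 0; i < kMaxCells; ++i)
        for (int j = 0; j < kMaxCells; ++j)
            if (a[i][j]) c[i] ^= b[j];
    return c;
}

// Exponentiation by squaring: O(log k) matrix multiplications.
Matrix power(Matrix base, std::uint64_t k) {
    Matrix result{};
    for (int i = 0; i < kMaxCells; ++i) result[i].set(i);  // identity
    while (k > 0) {
        if (k & 1) result = multiply(result, base);
        base = multiply(base, base);
        k >>= 1;
    }
    return result;
}

// Matrix-vector product over GF(2): each output bit is a parity.
Row applyMatrix(const Matrix& m, const Row& v) {
    Row out;
    for (int i = 0; i < kMaxCells; ++i)
        out.set(i, (m[i] & v).count() % 2 == 1);
    return out;
}
```

The final grid is then applyMatrix(power(buildTransition(R, C), k), start), where bit i*C + j of start holds cell (i, j). Even for k = 2^63 - 1 this needs at most about 126 multiplications of 100×100 bit matrices.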

There are several ways to speed up your algorithm.
You do the neighbour-calculation with the out-of bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an odd number of neighbours cause a toggle, the cell ends up on; otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only a few active cells. Its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch every neighbour of every cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling: the exclusive-or (xor) operator, ^. If you keep the list of neighbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance to Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³, is suspiciously large.
The playing board is small. The example in your question will repeat after at most 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has a period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tackle it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check in each turn whether the pattern has occurred before.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)

Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
This is obviously in O(N) per turn. Iterating twice roughly doubles the run time, but that should not be too bad. Try not to allocate space on every turn; reuse existing structures instead.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.

This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty of representing the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and of checking its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm be in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).

Time complexity for an algorithm is ok?

I want to design an algorithm with O(n(log(2)n)^2) time complexity. I wrote this:
for(int i=1; i<=n; i++){
j=i;
while(j != 1)
j=j/2;
j=i;
while(j !=1)
j=j/2;
}
Does it have O(n(log(2)n)^2) time complexity? If not, where I am going wrong and how can I fix it so that its time complexity is O(n(log(2)n)^2)?

Slight digression:
As the guys said in the comments, the algorithm is indeed O(n log n). This is coincidentally identical to the result obtained by multiplying the complexity of the inner loop by the outer loop, i.e. O(log i) x O(n).
This may lead you to believe that we can simply add another iteration of the inner loop to obtain the (log n)^2 part:
for (int i = 1; i < n; i++) {
int k = i;
while (k >= 1)
k /= 2;
int j = i;
while (j >= 1)
j /= 2;
}
But let's look at how the original complexity is derived:
(Using Stirling's approximation)
Therefore the proposed modification would give:
Which is not what we want.
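The derivation referred to above can be sketched as follows. This is a reconstruction under the assumption that one inner loop started at i costs about log2(i) steps; the notation is mine, not the original poster's:

```latex
% One inner loop per outer iteration:
\sum_{i=1}^{n} \log_2 i = \log_2(n!)
  = n \log_2 n - n \log_2 e + O(\log n)
  = O(n \log n)

% Two consecutive inner loops per outer iteration only double the constant:
\sum_{i=1}^{n} 2 \log_2 i = 2 \log_2(n!) = O(n \log n)
```

Running the inner loop twice in sequence therefore adds a factor of 2, not a factor of log n.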
An example I can think of from a recent personal project is semi-naive KD-tree construction. The pseudocode is given below:
def kd_recursive_cons (list_points):
if length(list_points) < predefined_threshold:
return new leaf(list_points)
A <- random axis (X, Y, Z)
sort list_points by their A-coordinate
mid <- find middle element in list_points
list_L, list_R <- split list_points at mid
node_L <- kd_recursive_cons(list_L)
node_R <- kd_recursive_cons(list_R)
return new node (node_L, node_R)
end
The time complexity function is therefore given by:
Where the n log n part is from sorting. We can obviously ignore the Dn linear part, and also the constant C. Thus:
Which is what we wanted.
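A plausible form of the recurrence being described, assuming n is a power of two, the recursion runs down to single elements, and the sort at each node costs n log2 n. D and C are the constants named in the text; everything else is an assumption made for this sketch:

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + n \log_2 n + D n + C

% Level k of the recursion has 2^k sublists of size n / 2^k, so the sorting
% cost of that level is 2^k \cdot \tfrac{n}{2^k} \log_2 \tfrac{n}{2^k}:
T(n) \approx \sum_{k=0}^{\log_2 n} n \log_2\!\left(\frac{n}{2^k}\right)
      = n \sum_{k=0}^{\log_2 n} \left(\log_2 n - k\right)
      = \frac{n \,\log_2 n \,(\log_2 n + 1)}{2}
      = O\!\left(n (\log n)^2\right)
```

The Dn term contributes O(n log n) in total and the constant C contributes O(n), so both are dominated by the sorting cost.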
Now to write a simpler piece of code with the same time complexity. We can make use of the summation we obtained in the above derivation...
And noting that the parameter passed to the log function is divided by two in every loop, we can thus write the code:
for (int i = 1; i < n; i++) {
for (int k = n; k >= 1; k /= 2) {
int j = k;
while (j >= 1)
j /= 2;
}
}
This looks like the "naive" but incorrect solution mentioned at the beginning, with the difference being that the nested loops there had different bounds (j did not depend on k, but k depended on i instead of directly on n).
EDIT: some numerical tests to confirm that the complexity is as intended:
Test function code:
int T(int n) {
int o = 0;
for (int i = 1; i < n; i++)
for (int j = n; j >= 1; j /= 2)
for (int k = j; k >= 1; k /= 2, o++);
return o;
}
Numerical results:
n T(n)
-------------------
2 3
4 18
8 70
16 225
32 651
64 1764
128 4572
256 11475
512 28105
1024 67518
2048 159666
4096 372645
8192 860055
16384 1965960
32768 4456312
65536 10026855
131072 22413141
262144 49807170
524288 110100270
Then I plotted sqrt(T(n) / n) against n. If the complexity is correct this should give a log(n) graph, or a straight line if plotted with a log-scale horizontal axis.
And this is indeed what we get:

How to flatten a loop through the upper triangle of a matrix?

I have a situation where I am using openMP for the Xeon Phi Coprocessor, and I have an opportunity to parallelize an "embarrassingly parallel" double for loop.
However, the for loop is looping through the upper triangle (including the diagonal):
for (int i = 0; i < n; i++)
// access some arrays with the value of i
for (int j = i; j < n; j++)
// do some stuff with the values of j and i
So, I've got the total size of the loop,
for (int q = 0; q < ((n*n - n)/2)+n; q++)
// Do stuff
But where I am struggling is:
How do I calculate i and j from q? Is it possible?
In the meantime, I'm just doing the full matrix, calculating i and j from there, and only doing my stuff when j >= i...but that still leaves a hefty amount of thread overhead.

If I restate your problem, to find i and j from q, you need the greatest i such that
q >= i*n - (i-1)*i/2
to define j as
j = i + (q - (i*n - (i-1)*i/2))
If you have such a greatest i, then
(i+1)*n - i*(i+1)/2 > q >= i*n - (i-1)*i/2
n-i > q - (i*n - (i-1)*i/2) >= 0
n > j = i + (q - (i*n - (i-1)*i/2)) >= i
Here is a first iterative method to find i:
for (i = 0; q >= i*n - (i-1)*i/2; ++i);
i = i-1;
If you iterate over q sequentially, you can carry i over from the previous iteration instead of recomputing it from scratch.
A second method could use sqrt since
i*n - i²/2 + i/2 ~ q
i²/2 - i(n+1/2) + q ~ 0
delta = (n+0.5)² - 2q
i ~ (n+0.5) - sqrt(delta)
i could be defined as floor((n+0.5) - sqrt((n+0.5)² - 2q))

OK, this isn't an answer as far as I can tell. But, it is a workaround (for now).
I was thinking about creating an array (or corresponding pointer) a[(n*n + n)/2][2], filling in the corresponding i and j values in my calling code, and passing it to my function; that would allow for the speed-up.

You can make your loop iterate as if it were walking the whole matrix.
Just keep track of the current line, and each time you enter a new line, advance the index by the number of that line (this skips the cells below the diagonal).
Then the row is line and the column is i%n.
See this code:
#include <iostream>
using namespace std;
int main() {
int n = 10;
int line = 0;
for (int i=0; i<n*n; i++){
if (i%n == 0 && i!=0){
line++;
i += line;
cout << endl;
}
cout << "("<<line<<","<<i%n<<")";
}
return 0;
}
