I am playing with the travelling salesman problem and am looking at the version where:
the towns are points in 2D space, there is a path from every town to every other town, and the path lengths are the distances between the points. So it's very easy to implement the naive solution where you check all permutations of n points and calculate the length of each path.
I've found, however, that for n >= 10 the compiler does some magic and prints a value that is certainly not the actual shortest path. I compile with the Microsoft Visual Studio compiler in release mode with the default settings. For n between 10 and 30 it thinks for 30 seconds and then returns some number that seems like it could be correct but is not (I checked in different ways). And for n > 40 it calculates a result immediately, and the result is always 2.14748e+09.
I am looking for an explanation of what the compiler does in the different situations (the 10 to 30 case is really interesting), and for an example where these optimizations are more useful than the program just spinning until the end of the world.
vector<pair<int,int>> points;
void min_len()
{
// n is a global variable with the number of points(towns)
double min = INT_MAX;
// there are n! permutations of n elements
for (auto j = 0; j < factorial(n); ++j)
{
double sum = 0;
for (auto i = 0; i < n - 1; ++i)
{
sum += distance_points(points[i], points[i + 1]);
}
if (sum < min)
{
min = sum;
s_path = points;
}
next_permutation(points.begin(), points.end());
}
for (auto i = 0; i < n; ++i)
{
cout << s_path[i].first << " " << s_path[i].second << endl;
}
cout << min << endl;
}
unsigned int factorial(unsigned int n)
{
int res = 1, i;
for (i = 2; i <= n; i++)
res *= i;
return res;
}
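Your factorial function is overflowing. With a 32-bit result, 13! no longer fits, and from 34! on the wrapped value is in practice 0 (34! is divisible by 2^32), so the loop does not run at all and min keeps its initial value INT_MAX, which prints as 2.14748e+09. Try replacing it with one returning uint64_t and see your code taking years to terminate as n approaches 20.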
constexpr uint64_t factorial(unsigned int n) {
return n ? n * factorial(n-1) : 1;
}
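To see the wrap-around concretely, here is a minimal sketch. The names factorial32 and factorial64 are made up for this illustration and are not part of the question's code; fixed-width unsigned types are used so that the overflow is well-defined arithmetic modulo 2^32 or 2^64.

```cpp
#include <cstdint>

// Same loop as the question's factorial, but with an explicitly 32-bit
// unsigned result, so the wrap-around is defined (arithmetic modulo 2^32).
std::uint32_t factorial32(unsigned n) {
    std::uint32_t res = 1;
    for (unsigned i = 2; i <= n; ++i) res *= i;
    return res;
}

// 64-bit variant: exact up to 20!, wraps modulo 2^64 from 21! on.
std::uint64_t factorial64(unsigned n) {
    std::uint64_t res = 1;
    for (unsigned i = 2; i <= n; ++i) res *= i;
    return res;
}
```

factorial32(12) is still exact (479001600), factorial32(13) wraps to 1932053504, and factorial32(34) is 0, which is why the loop bound collapses for large n.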
Also, you don't need to calculate this at all. The std::next_permutation function returns false once all permutations have occurred (provided you start from the sorted order).
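A minimal sketch of a loop driven by the return value of std::next_permutation, so no factorial is needed. Point, dist and shortest_path_length are hypothetical names chosen for this example; it assumes an open path as in the question and sorts the points first so that every permutation is visited.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Euclidean distance between two towns.
double dist(const Point& a, const Point& b) {
    return std::hypot(double(a.first - b.first), double(a.second - b.second));
}

// Brute force: tries every ordering of the points and returns the length
// of the shortest open path that visits all of them.
double shortest_path_length(std::vector<Point> points) {
    if (points.size() < 2) return 0.0;
    std::sort(points.begin(), points.end());  // start from the first permutation
    double best = std::numeric_limits<double>::max();
    do {
        double sum = 0.0;
        for (std::size_t i = 0; i + 1 < points.size(); ++i)
            sum += dist(points[i], points[i + 1]);
        best = std::min(best, sum);
    } while (std::next_permutation(points.begin(), points.end()));
    return best;
}
```

The loop ends when next_permutation wraps around to the sorted order and returns false, so the number of iterations never has to be computed or stored.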
This snippet works just fine.
for (int i = 0; i < 1; i++) // for some other purpose
{
// some other code
double** angle = new double* [10]; // for a 2D matrix
for (int j = 0; j < 10; j++)
{
angle[j] = new double [3];
if (j == 0)
angle[j][0] = 2; // focused on the first column for now
else
angle[j][0] = angle[j-1][0]+3;
std::cout << angle[j][0] << std::endl;
}
for (int i = 0; i < 10; i++)
delete[] angle[i];
delete[] angle;
}
I am trying not to use a conditional statement inside the loop. If I replace the if/else with the following line, the code stops working. Please help me understand why.
angle[j][0] = (j == 0) * 2 + (j != 0) * (angle[j-1][0] + 3);
Using g++ -std=c++11 -o out main.cpp; ./out on Ubuntu 16.04 LTS
You're trying to use the ternary operator, but the syntax is wrong.
Do this:
angle[j][0] = (j == 0) ? 2 : (angle[j-1][0] + 3);
The line
angle[j][0] = (j == 0) * 2 + (j != 0) * (angle[j-1][0] + 3);
does not work because angle[j-1][0] is evaluated even when j is 0, so you access angle[-1]. That is undefined behavior.
Looking at your comment to the other answer, you are apparently looking for using the conditional operator.
angle[j][0] = (j == 0) ? 2 : (angle[j-1][0] + 3);
As Sahu said, the problem with your combined line is that you take angle[j-1][0] with j==0, which is undefined behavior. This means that combining both if and else parts into a single non-branching statement is not really possible.
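The difference between the two forms can be made visible with a small sketch. It swaps the raw arrays for a std::vector and uses at(), which throws std::out_of_range instead of silently invoking undefined behavior; next_arith and next_cond are names invented for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Arithmetic "select": both operands are always evaluated, so
// col.at(j - 1) runs even when j == 0 and throws std::out_of_range.
double next_arith(const std::vector<double>& col, std::size_t j) {
    return (j == 0) * 2 + (j != 0) * (col.at(j - 1) + 3);
}

// Conditional operator: only the selected operand is evaluated,
// so col.at(j - 1) is never reached for j == 0.
double next_cond(const std::vector<double>& col, std::size_t j) {
    return (j == 0) ? 2 : (col.at(j - 1) + 3);
}
```

For j > 0 both functions return the same value; for j == 0 only next_cond returns 2, while next_arith throws.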
Secondly, these two code snippets look different (with the ternary/conditional operator producing fewer lines of C++ code):
if (x == 1)
A = 7;
else
A = 13;
versus
A = (x == 1) ? 7 : 13;
But an optimizing compiler will typically turn them into exactly the same machine code.
So, how do we fix your problem of not wanting to branch in every single loop iteration?
Since the test/branch variable (j) is also the loop variable and the test is for j == 0, which is also the starting condition, you can do something like this:
double** angle = new double* [10]; // for a 2D matrix
angle[0] = new double[3]; // Prepare the first element
angle[0][0] = 2;
std::cout << angle[0][0] << std::endl; // still print the first value, as the original loop did
for (int j = 1; j < 10; j++) // Fill out the rest
{
angle[j] = new double[3];
angle[j][0] = angle[j - 1][0] + 3;
std::cout << angle[j][0] << std::endl;
}
Here the setup for the first element is moved out of the loop, so the loop can start at j = 1 and contain only the else branch in its body (with no test, of course).
However, branch predictors in modern CPUs are pretty awesome, and your loop takes the if-branch exactly once, on the first iteration, and the else-branch on every following one. I doubt that you will see much difference in the execution times of the two versions, so I would simply recommend that you pick the version that you find easiest to read and understand.
A hexagonal grid is represented by a two-dimensional array with R rows and C columns. The first row always comes "before" the second in the hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is odd. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead of R and C):
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}
For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e., 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
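A minimal sketch of this approach, under a few assumptions: the neighbour rule is taken from the question's code, cell (i, j) is mapped to index i * C + j, and the names Row, Matrix, build_matrix, mat_mul, mat_pow and apply_matrix are invented for this example. Each matrix row is stored as a std::bitset so that adding rows over GF(2) is a single XOR.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::bitset<100>;        // at most 10 x 10 cells
using Matrix = std::vector<Row>;     // row r holds M[r][*]

// Transition matrix: M[r][c] = 1 iff cell c is a neighbour of cell r.
Matrix build_matrix(int R, int C) {
    Matrix m(R * C);
    auto link = [&](int i, int j, int ni, int nj) {
        if (ni >= 0 && ni < R && nj >= 0 && nj < C) m[i * C + j][ni * C + nj] = 1;
    };
    for (int i = 0; i < R; ++i)
        for (int j = 0; j < C; ++j) {
            const int d = (i % 2 == 0) ? -1 : 1;  // even rows lean left, odd rows right
            link(i, j, i, j - 1);  link(i, j, i, j + 1);
            link(i, j, i - 1, j);  link(i, j, i - 1, j + d);
            link(i, j, i + 1, j);  link(i, j, i + 1, j + d);
        }
    return m;
}

// C = A * B over GF(2): row r of C is the XOR of the rows B[c] with A[r][c] set.
Matrix mat_mul(const Matrix& a, const Matrix& b) {
    Matrix c(a.size());
    for (std::size_t r = 0; r < a.size(); ++r)
        for (std::size_t col = 0; col < a.size(); ++col)
            if (a[r][col]) c[r] ^= b[col];
    return c;
}

// M^k by repeated squaring: O(log k) matrix multiplications.
Matrix mat_pow(Matrix m, std::uint64_t k) {
    Matrix result(m.size());
    for (std::size_t i = 0; i < m.size(); ++i) result[i][i] = 1;  // identity
    while (k) {
        if (k & 1) result = mat_mul(result, m);
        m = mat_mul(m, m);
        k >>= 1;
    }
    return result;
}

// y = M * x over GF(2): bit r of y is the parity of (row r AND x).
Row apply_matrix(const Matrix& m, const Row& x) {
    Row y;
    for (std::size_t r = 0; r < m.size(); ++r)
        y[r] = ((m[r] & x).count() % 2) != 0;
    return y;
}
```

With the start grid stored in a Row x, the state after k turns would be apply_matrix(mat_pow(build_matrix(R, C), k), x). For k up to 2^63 - 1 the loop in mat_pow runs at most 63 times.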
There are several ways to speed up your algorithm.
You do the neighbour calculation with the out-of-bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an odd number of neighbours cause a toggle, the cell is on; otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only a few active cells. Its worst case is a board whose cells are all on, in which case it is as good as neighbour counting, because you have to touch each neighbour of each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling: the exclusive-or (xor) operator, ^. If you keep the list of neighbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance to Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³, is suspiciously large.
The playing board is small. The example in your question will repeat after 2¹⁶ turns at the latest, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original, and it loops with a period of 126 from then on.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has a period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tackle it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check in each turn whether the pattern has occurred before.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)
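A sketch of how the same cycle detection might look with an unordered map. It assumes the board is stored as a std::bitset<100>, for which the standard library already provides std::hash, and it takes the turn function as a parameter; Board, advance and step are names chosen for this example and are not taken from the code above.

```cpp
#include <bitset>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

using Board = std::bitset<100>;

// Advance `board` by k turns of `step`, jumping ahead as soon as a state repeats.
Board advance(Board board, std::uint64_t k,
              const std::function<Board(const Board&)>& step) {
    std::unordered_map<Board, std::uint64_t> seen;  // state -> turn of first occurrence
    std::vector<Board> history;                     // history[t] = state after t turns
    for (std::uint64_t turn = 0; turn < k; ++turn) {
        auto it = seen.find(board);
        if (it != seen.end()) {
            const std::uint64_t start = it->second;
            const std::uint64_t period = turn - start;
            // The remaining k - turn steps only matter modulo the period.
            return history[start + (k - turn) % period];
        }
        seen.emplace(board, turn);
        history.push_back(board);
        board = step(board);
    }
    return board;
}
```

The vector replaces the second map from turn to pattern, since turns are consecutive integers starting at 0. Memory use still grows with the length of the cycle, so this only helps when the board repeats reasonably early.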
Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
This is obviously in O(N) per turn. Iterating twice roughly doubles the run time, but that should not be too bad. Try not to allocate memory on every turn; reuse existing structures instead.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.
This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty of treating the grid/matrix M of R rows and C columns as taking constant space (i.e. O(1)), and of treating a check of its elements as O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This puts the algorithm in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).
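For reference, the trial-division improvement described in the footnote can be sketched as follows. This illustrates the footnote only, not the grid problem; is_prime is a name chosen for this example.

```cpp
#include <cstdint>

// Trial division: handle even numbers first, then test odd divisors
// only up to sqrt(n), which gives O(sqrt(n)) instead of O(n) divisions.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (std::uint64_t d = 3; d <= n / d; d += 2)  // d <= n / d avoids overflow of d * d
        if (n % d == 0) return false;
    return true;
}
```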
How do I speed up this recursive function? When it reaches a 10x10 matrix, it takes up to a minute or so just to solve a problem. I included the event handler as well so you can see where the calculation takes place.
void determinantsFrame::OnCalculateClick(wxCommandEvent &event)
{
double elem[MAX][MAX]; double det; string test; bool doIt = true;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
test = (numbers[i][j]->GetValue()).mb_str();
if (test == "")
{
doIt = false;
break;
}
for (int k = 0; k < test.length(); k++)
if (isalpha(test[k]) || test[k] == ' ')
{
doIt = false;
break;
}
else if (ispunct(test[k]))
{
if (test[k] == '.' && test.length() == 1)
doIt = false;
else if (test[k] == '.' && test.length() != 1)
doIt = true;
else if (test[k] != '.')
doIt = false;
}
if (doIt == false)
break;
}
if (doIt == false)
break;
}
if (doIt)
{
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
elem[i][j] = static_cast<double>(wxAtof(numbers[i][j]->GetValue()));
det = determinant(elem, n);
wxMessageBox(wxString::Format(wxT("The determinant is: %.4lf"),det));
}
else
wxMessageBox(wxT("You may have entered an invalid character. Please try again"));
}
double determinantsFrame::determinant(double matrix[MAX][MAX], int order) // Here's the recursive algorithm
{
double det = 0; double temp[MAX][MAX]; int row, col;
if (order == 1)
return matrix[0][0];
else if (order == 2)
return ((matrix[0][0] * matrix[1][1]) - (matrix[0][1] * matrix[1][0]));
else
{
for (int r = 0; r < order; r++)
{
col = 0; row = 0;
for (int i = 1; i < order; i++)
{
for (int j = 0; j < order; j++)
{
if (j == r)
continue;
temp[row][col] = matrix[i][j];
col++;
if (col == order - 1)
col = 0;
}
row++;
}
det = det + (matrix[0][r] * pow(-1, r) * determinant(temp, order - 1));
}
return det;
}
}
You can do a bit better while keeping the same algorithm, but it is at least O(n!) (probably worse), so higher-order matrices will be slow no matter how much you optimize it. Note that I measured the benchmark times with MSVC 2010 and they are there only for rough comparison purposes. Each change is cumulative as you go down the list and is compared to the original algorithm.
Skip Col Check -- As Surt suggested, removing this gets us a speed increase of 1%.
Add 3x3 Case -- Adding another explicit check for a 3x3 matrix gets us the most, 55%
Change pow() -- Changing the pow() call to (r % 2 ? -1.0 : 1.0) gets us a little bit more, 57%
Change to switch -- Changing the order check to a switch gets us a little bit more, 58%
Add 4x4 Case -- Adding another explicit check for a 4x4 matrix gets more, 85%
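As a sketch of what the explicit 3x3 case and the pow() replacement could look like: the answer does not show its code, so MAX = 10, det3 and cofactor_sign are assumptions made for this example, with the array parameter written in the same style as the question's determinant function.

```cpp
constexpr int MAX = 10;  // assumed value; the question does not show its definition

// Sign of the cofactor for column r of the first row, replacing pow(-1, r).
inline double cofactor_sign(int r) {
    return (r % 2) ? -1.0 : 1.0;
}

// Closed-form 3x3 determinant (expansion along the first row),
// usable as an extra base case before the recursion.
double det3(double m[MAX][MAX]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}
```

In the recursive function this would presumably become an additional branch for order == 3 that returns det3(matrix), so that no 2x2 sub-matrices are copied at the bottom of the recursion.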
Things that don't work include:
memcpy -- As Surt suggested; this actually loses a good deal of speed, -100%
Threads -- Creating order threads doesn't work well at all, -160%
I was hoping that using threads could get us a significant performance increase but even with all the optimization it is slower than the original. I think the copying of all the memory is making it not very parallel.
Adding the 3x3 and 4x4 cases has the most effect and is the primary reason for the more than 6x increase in speed. In theory you could add more explicit cases (probably by writing a program to output the required code) to reduce the run time even further. Of course, at some point this kind of defeats the purpose of using a recursive algorithm to begin with.
To get more performance you would probably have to consider a different algorithm. In theory you can change the recursive function into an iterative one by managing your own stack but it is considerable work and you aren't guaranteed a performance increase anyways.
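One such different algorithm is Gaussian elimination with partial pivoting, which needs O(n^3) operations instead of O(n!). The sketch below is not the recursive cofactor expansion from the question and was not part of the benchmarks above; determinant_gauss is a name chosen for this example, and it uses std::vector instead of the fixed-size arrays.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Determinant by Gaussian elimination with partial pivoting.
// The matrix is taken by value because it is reduced in place.
double determinant_gauss(std::vector<std::vector<double>> a) {
    const std::size_t n = a.size();
    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        // Use the row with the largest absolute value in this column as pivot.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (a[pivot][col] == 0.0) return 0.0;  // whole column is zero: singular
        if (pivot != col) {
            std::swap(a[pivot], a[col]);       // a row swap flips the sign
            det = -det;
        }
        det *= a[col][col];
        // Eliminate the entries below the pivot.
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c < n; ++c) a[r][c] -= f * a[col][c];
        }
    }
    return det;
}
```

The result is the product of the pivots, with the sign adjusted for row swaps. Because of floating-point rounding it can differ slightly from the cofactor expansion, so results should be compared with a tolerance.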
It could be a branch misprediction problem. The test
if (col == order - 1)
col = 0;
is not needed as far as I can see.
The test fails 1/order times per loop and dominates for small order, which is why larger N aren't so affected. The timing is still large O(N!^3) (afaik) so don't expect miracles.
col = 0; row = 0;
for (int i = 1; i < order; i++) {
for (int j = 0; j < order; j++) {
if (j == r)
continue;
temp[row][col] = matrix[i][j];
col++;
//if (col == order - 1)
// col = 0;
}
col = 0; // no need to test
row++;
}
The algorithm will slow down further once it hits the L2 cache, at the latest at N=64.
Also, the matrix copy might be inefficient. The following could be far more efficient for large order, at the cost of lower efficiency at low order.
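for (int r = 0; r < order; r++) {
row = 0;
for (int i = 1; i < order; i++) {
// copy the r elements left of column r (a zero-length memcpy for r == 0 is fine,
// because both pointers are still valid)
memcpy(temp[row], matrix[i], r*sizeof(double));
// copy the order-r-1 elements right of column r
memcpy(&temp[row][r], &matrix[i][r+1], (order-r-1)*sizeof(double));
// amount of copied elements r+(order-r-1)=order-1.
row++;
}
// ... accumulate the term for column r here, as in the original loop
}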
Test this against the original code and compare the determinants to check that I got the indices right!
I am trying to solve Project Euler Problem 88, and I did it without too much effort; however, I find that some seemingly irrelevant code in my program is affecting the result. Here's my complete code (it's not short, but I cannot locate the error. I believe it would be obvious to more experienced eyes, so please read my description first):
#include <iostream>
#include <set>
using namespace std;
bool m[24001][12001];
bool p[24001]; // <------------ deleting this line will cause error in result!
long long answer[12001];
int main() {
long long i;
long long j;
long long l;
set<long long> all;
long long s = 0;
for (i = 0; i <= 24000; i++) {
for (j = 0; j <= 12000; j++) {
m[i][j] = false;
}
}
m[1][1] = true;
for (i = 2; i <= 24000; i++) {
m[i][1] = true;
for (j = 2; (j <= i) && (i * j <=24000); j++) {
for (l = 1; l <= i; l++) {
if (m[i][l]) {
m[i * j][l + 1 + (i * j) - i - j] = true;
}
}
}
}
for (i = 0; i <= 24000; i++) {
for (j = 0; j <= 12000; j++) {
if (m[i][j] && (answer[j] == 0)) {
answer[j] = i;
}
}
}
for (i = 2; i <= 12000; i++) {
cout << answer[i] << endl;
all.insert(answer[i]);
}
cout << all.size() << endl;
for (set<long long>::iterator it = all.begin(); it != all.end(); it++) {
//cout << *it << endl;
s += *it;
}
cout << s << endl;
}
With the "useless" bool array, all the answers are right, between 0 and 24000; but without it, some answers in the middle get corrupted and become very large numbers.
I am completely confused now; why would that unused array affect the middle of the answer array?
Thanks and sorry for the long code! I will be grateful if someone could edit the code into a better example; I simply don't know what is wrong with the code.
You do a silly thing in here:
m[i * j][l + 1 + (i * j) - i - j] = true;
Say, i=160, j=150, l=1... You will try to access m[24000][23692], although the second index may be at most 12000. Since m is a global array, this write lands in whatever is stored after it in memory, and the behavior is undefined.
Next time, try using a debugger.
Add:
#include <cassert>
at the beginning and
assert( (i * j) * 12001 + (l + 1 + (i * j) - i - j) < 12001*24001 );
before the following line:
m[i * j][l + 1 + (i * j) - i - j] = true;
The assertion will fail, which means you write outside the bounds of the array m.
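The check can also be done per index rather than on the flattened offset, which catches writes that stay inside m as a whole but land in the wrong row. A minimal sketch, where ROWS, COLS, target_column and in_bounds are names invented for this example and the sizes are those of bool m[24001][12001] from the question:

```cpp
constexpr long long ROWS = 24001;  // first dimension of m in the question
constexpr long long COLS = 12001;  // second dimension of m in the question

// Column index written by the question's inner loop.
constexpr long long target_column(long long i, long long j, long long l) {
    return l + 1 + (i * j) - i - j;
}

// True when m[row][col] is a valid element of bool m[24001][12001].
constexpr bool in_bounds(long long row, long long col) {
    return row >= 0 && row < ROWS && col >= 0 && col < COLS;
}
```

For i=160, j=150, l=1 the column is 23692, so in_bounds(i * j, target_column(i, j, l)) is false. Guarding the write with this check (or asserting on it) makes the out-of-range access visible; whether skipping such writes gives the intended result is a separate question about the algorithm.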
As requested, adding this to an answer.
You are definitely writing beyond the bounds of the array m somewhere. When the unused array p exists, the writes past the end of m land in p, which doesn't affect the answer array. Once p is removed, they land in the answer array instead, and the problem shows up.
Writing beyond the bounds of an array is undefined behavior. With undefined behavior all bets are off and any behavior is possible: your program may work sometimes, crash sometimes, or give incorrect results. Practically anything is possible, and the behavior may or may not be explainable.
In one of your nested loops you use l as the index for the second dimension. This variable can run from 1 to i, and i, in turn, can run up to 24000. Since the second dimension of the array can only be indexed from 0 to 12000, this causes a classic out-of-range error. This also nicely explains why adding an extra array avoids the problem: the out-of-range accesses go to the "unused" array rather than overwriting the result.