Algorithm to divide a black-and-white chocolate bar - c++

Problem description:
There's a chocolate bar that consists of m x n squares. Some of the squares are black, some are white. Someone breaks the chocolate bar along its vertical or horizontal axis. Each resulting piece is then broken again along its vertical or horizontal axis, and the breaking continues until every piece is either a single square or consists only of black squares or only of white squares. Using a (preferably divide-and-conquer) algorithm, find the number of ways the chocolate bar can be broken.
Input:
The first line tells you the m x n dimensions of the chocolate bar. The next m lines contain n characters each, describing what the chocolate bar looks like: the letter w is a white square, the letter b is a black square.
for example:
3 2
bwb
wbw
Output:
the number of ways the chocolate bar can be broken:
for the example above, it's 5 (take a look at the attached picture).
I tried to solve it using an iterative approach. Unfortunately, I couldn't finish the code, as I'm not yet sure how to divide the halves (see my code below). I was told that a recursive approach is much easier than this, but I have no idea how to do it. I'm looking either for another way to solve this problem or for some help with finishing my code.
I made two 2D arrays, the first for white squares, the second for black squares. I build a matrix out of the squares, and if a square has the corresponding colour, I mark it as 1 in that array.
Then I made two arrays holding the 2D cumulative (prefix) sums of the matrices above.
Then I created a 4D array of size [n][m][n][m] and four loops: the first two (i, j) increase the size of the rectangular search area (it's pretty hard to explain...), and the other two (k, l) move the starting position x, y across the array. The algorithm then checks, using the cumulative sums, whether the area starting at position (k, l) and ending at (k+i, l+j) contains at least one black and one white square. If it does, I create two more loops that divide the area in half. If both new halves still contain black and white squares, I increase the corresponding 4D array element by the number of combinations of the first half times the number of combinations of the second half.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
int counter=0;
int n, m;
ifstream in;
in.open("in.txt");
ofstream out;
out.open("out.txt");
if(!in.good())
{
cout << "No such file";
return 0;
}
in >> n >> m;
int whitesarray[m][n];
int blacksarray[m][n];
int methodsarray[m][n][m][n];
for(int i=0; i<m; i++)
{
for(int j=0; j<n; j++)
{
whitesarray[i][j] = 0;
blacksarray[i][j] = 0;
}
}
while(in)
{
string colour;
in >> colour;
for (int i=0; i < colour.length(); i++)
{
if(colour[i] == 'b') // 'b' marks a black square (the statement uses b/w)
{
blacksarray[counter][i] = 1;
}
if(colour[i] == 'w') // 'w' marks a white square
{
whitesarray[counter][i] = 1;
}
}
counter++;
}
int whitessum[m][n];
int blackssum[m][n];
for (int i=0; i<m; i++)
{
for (int j=0; j<n; j++)
{
if(i-1 == -1 && j-1 == -1)
{
whitessum[i][j] = whitesarray[i][j];
blackssum[i][j] = blacksarray[i][j];
}
if(i-1 == -1 && j-1 != -1)
{
whitessum[i][j] = whitessum[i][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i][j-1] + blacksarray[i][j];
}
if(j-1 == -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blacksarray[i][j];
}
if(j-1 != -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitessum[i][j-1] - whitessum[i-1][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blackssum[i][j-1] - blackssum[i-1][j-1] + blacksarray[i][j];
}
}
}
int posx=0;
int posy=0;
int tempwhitessum=0;
int tempblackssum=0;
int k=0, l=0;
for (int i=0; i<=m; i++)
{
for (int j=0; j<=n; j++) // wielkosc wierszy
{
for (posx=0; posx < m - i; posx++)
{
for(posy = 0; posy < n - j; posy++)
{
k = i+posx-1;
l = j+posy-1;
if(k >= m || l >= n)
continue;
if(posx==0 && posy==0)
{
tempwhitessum = whitessum[k][l];
tempblackssum = blackssum[k][l];
}
if(posx==0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[k][posy-1];
tempblackssum = blackssum[k][l] - blackssum[k][posy-1];
}
if(posx!=0 && posy==0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l];
}
if(posx!=0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l] - whitessum[k][posy-1] + whitessum[posx-1][posy-1];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l] - blackssum[k][posy-1] + blackssum[posx-1][posy-1];
}
if(tempwhitessum >0 && tempblackssum > 0)
{
for(int e=0; e<n; e++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
for(int r=0; r<m; r++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
}
}
}
}}
return 0;
}

I strongly recommend recursion for this. In fact, Dynamic Programming (DP) would also be very useful, especially for larger bars. Recursion first ...
Recursion
Your recursive routine takes a 2-D array of characters (b and w). It returns the number of ways this can be broken.
First, the base cases: (1) if it's possible to break the given bar into a single piece (see my comment above, asking for clarification), return 1; (2) if the array is all one colour, return 1. For each of these, there's only one way for the bar to end up -- the way it was passed in.
Now, for the more complex case, when the bar can still be broken:
total_ways = 0
for each non-edge position in each dimension:
break the bar at that spot; form the two smaller bars, A and B.
count the ways to break each smaller bar: count(A) and count(B)
total_ways += count(A) * count(B)
return total_ways
Is that clear enough for the general approach? You still have plenty of coding to do, but using recursion allows you to think of only the two basic ideas when writing your function: (1) How do I know when I'm done, and what trivial result do I return then? (2) If I'm not done, how do I reduce the problem?
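For concreteness, here is a minimal sketch of that recursion (my own transcription of the pseudocode above, not a finished solution): the bar is assumed to be passed in as a vector<string> of 'b'/'w' rows, and sub-bars are simply copied out for clarity (prefix sums or index ranges would be faster). Depending on how the base case asked about above is settled, the counting may still need adjusting.
#include <string>
#include <vector>
using namespace std;

// True if every square in the (sub-)bar has the same colour.
bool uniform(const vector<string>& bar) {
    for (const string& row : bar)
        for (char c : row)
            if (c != bar[0][0]) return false;
    return true;
}

long long countBreaks(const vector<string>& bar) {
    if (uniform(bar)) return 1;                   // single square or single colour: one outcome
    int rows = (int)bar.size(), cols = (int)bar[0].size();
    long long total = 0;
    for (int r = 1; r < rows; r++) {              // horizontal break between rows r-1 and r
        vector<string> top(bar.begin(), bar.begin() + r);
        vector<string> bottom(bar.begin() + r, bar.end());
        total += countBreaks(top) * countBreaks(bottom);
    }
    for (int c = 1; c < cols; c++) {              // vertical break between columns c-1 and c
        vector<string> left, right;
        for (const string& row : bar) {
            left.push_back(row.substr(0, c));
            right.push_back(row.substr(c));
        }
        total += countBreaks(left) * countBreaks(right);
    }
    return total;
}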
Dynamic Programming
This consists of keeping a record of situations you've already solved. The first thing you do in the routine is to check your "data base" to see whether you already know this case. If so, return the known result instead of recomputing. This includes the overhead of developing and implementing said data base, probably a look-up list (dictionary) of string arrays and integer results, such as ["bwb", "wbw"] => 5.
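As a sketch of what that memoisation could look like on top of the countBreaks sketch above (the names are mine, not an established API): the sub-bar, flattened into a single string, serves as the look-up key.
#include <map>

map<string, long long> known;                      // flattened sub-bar -> number of ways

long long countBreaksDP(const vector<string>& bar) {
    string key;
    for (const string& row : bar) key += row + '/';
    map<string, long long>::iterator hit = known.find(key);
    if (hit != known.end()) return hit->second;    // already solved: reuse the result

    long long total = 1;                           // base case: uniform piece
    if (!uniform(bar)) {
        total = 0;
        int rows = (int)bar.size(), cols = (int)bar[0].size();
        for (int r = 1; r < rows; r++) {           // horizontal breaks
            vector<string> top(bar.begin(), bar.begin() + r);
            vector<string> bottom(bar.begin() + r, bar.end());
            total += countBreaksDP(top) * countBreaksDP(bottom);
        }
        for (int c = 1; c < cols; c++) {           // vertical breaks
            vector<string> left, right;
            for (const string& row : bar) {
                left.push_back(row.substr(0, c));
                right.push_back(row.substr(c));
            }
            total += countBreaksDP(left) * countBreaksDP(right);
        }
    }
    return known[key] = total;                     // store before returning
}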

Related

(C++) I don't know a fast algorithm to compare coordinates

Up to 100,000 coordinates are entered. Only the coordinates that satisfy a specific condition should be output. If there is another coordinate with a larger x value and a smaller y value than a given coordinate, that coordinate is excluded from the output list.
My English is not good, so I'm giving some examples.
[input]
First, enter the number of coordinates N.
Then enter the coordinates themselves.
[output]
The numbers of the coordinates that satisfy the condition are output in ascending order.
[input example]
6
1 3
6 6
7 3
8 2
8 6
2 1
[output example]
4
5
6
coordinates image
The problem was solved with a simple loop, but a timeout occurs when 100,000 values are entered. I don't know which algorithm to use.
I also attach the C++ source code I wrote.
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
int main() {
int N;
cin >> N;
bool* visible = new bool[N];
for (int i = 0; i < N; i++)visible[i] = true;
vector<pair<int,pair<int, int>>> v;
for (int i = 0; i < N; i++) {
int a, b;
cin >> a >> b;
v.push_back(make_pair(i,make_pair(a, b)));
}
for (int i = 0; i < v.size(); i++) {
if (visible[i] == false)
continue;
for (int j = 0; j < v.size(); j++) {
if (visible[i] == true &&visible[j]==true && v[i].second.first < v[j].second.first && v[i].second.second > v[j].second.second) {
visible[i] = false;
break;
}
else if (visible[i] == true && visible[j] == true && v[i].second.first > v[j].second.first && v[i].second.second < v[j].second.second) {
visible[j] = false;
continue;
}
}
}
for (int i = 0; i < v.size(); i++) {
if (visible[i] == true)
cout << v[i].first + 1 << endl;
}
return 0;
}
This looks like the dominating points set problem.
Sort the points by X value in decreasing order (Y is the secondary key in case of equal X).
Assign the first point (the one with the largest X) to a Big variable and add it to the result.
Walk through the array. Whenever you meet a point that is not dominated by Big, assign it to Big and add it to the result.
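A minimal sketch of that scan (my own transcription, not a drop-in for the original I/O): sort by x in decreasing order and keep a point only if no already-seen point lies strictly to its right with a strictly smaller y.
#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>
using namespace std;

struct Pt { int x, y, idx; };

int main() {
    int n;
    cin >> n;
    vector<Pt> pts(n);
    for (int i = 0; i < n; i++) {
        cin >> pts[i].x >> pts[i].y;
        pts[i].idx = i;
    }
    // Largest x first; for equal x, larger y first (points sharing an x never
    // dominate each other, and this order keeps the scan below correct).
    sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        if (a.x != b.x) return a.x > b.x;
        return a.y > b.y;
    });
    vector<int> keep;
    int minY = INT_MAX;                            // the current "Big" point's y
    for (const Pt& p : pts) {
        if (p.y <= minY) keep.push_back(p.idx);    // nothing to the right lies below p
        minY = min(minY, p.y);
    }
    sort(keep.begin(), keep.end());                // output indices in ascending order
    for (int id : keep) cout << id + 1 << '\n';
    return 0;
}
For the example input this prints 4, 5 and 6, and the sort dominates the cost, giving O(n log n).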
The proper terms for your problem are Pareto front(ier), ND-tree, KD-tree, non-dominance problem.
There are libraries for that e.g.
https://github.com/alandefreitas/pareto-front
However, in 2 dimensions it is a simple task.
You have to sort your points by one of the coordinates, then scan from the best one to the worst and accept a point only if its other coordinate is not dominated by a previously accepted point. This gives you O(n log n), which is the best possible in the general case for such tasks.

Algorithm on hexagonal grid

Hexagonal grid is represented by a two-dimensional array with R rows and C columns. First row always comes "before" second in hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is an odd number. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead of R and C):
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}
For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
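As a minimal sketch of the matrix part (assuming the N x N transition matrix over GF(2) has already been built from the neighbour rule, i.e. the row for a cell has a 1 in every column that is one of its neighbours):
#include <vector>
using namespace std;

typedef vector<vector<int> > Mat;                  // entries are 0/1, arithmetic mod 2

Mat multiply(const Mat& a, const Mat& b) {
    size_t n = a.size();
    Mat c(n, vector<int>(n, 0));
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            if (a[i][k])                           // skip zero entries
                for (size_t j = 0; j < n; j++)
                    c[i][j] ^= b[k][j];            // addition mod 2 is XOR
    return c;
}

Mat power(Mat m, unsigned long long k) {           // exponentiation by squaring: O(log k) multiplications
    size_t n = m.size();
    Mat result(n, vector<int>(n, 0));
    for (size_t i = 0; i < n; i++) result[i][i] = 1;   // start from the identity matrix
    while (k) {
        if (k & 1) result = multiply(result, m);
        m = multiply(m, m);
        k >>= 1;
    }
    return result;
}
Multiplying power(M, k) by the initial state vector (again mod 2) gives the board after k turns; with N <= 100 that is at most a few dozen 100x100 multiplications.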
There are several ways to speed up your algorithm.
You do the neighbour calculation, with its out-of-bounds checking, in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an odd number of neighbours cause a toggle, the cell ends up on; with an even number, the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only a few active cells. Its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch every neighbour of every cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling: the exclusive-or (xor) operator, ^. If you keep the list of neighbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears a resemblance to Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³, is suspiciously large.
The playing board is small. The example in your question must repeat within 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original, and it loops with a period of 126 from then on.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has a period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tackle it with profane means: keep a hash map of all previous states and the turns in which they occurred, then check in each turn whether the pattern has occurred before.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. In particular, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the ordering, so an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)
Not sure how you did it - and you should really always post code here - but let's try to optimize things.
First of all, there is not really a difference between this and a square grid. The neighbour relationships are different, but that is just a small translation function. If you have a problem there, we should treat it separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
this is obviously O(N) per turn. Iterating twice roughly doubles the actual run time, but that should not be too bad. Try not to allocate space every time you do this; reuse existing structures instead.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and using move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
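A rough sketch of one turn of that idea, assuming a precomputed neighbour list neigh[i][j] per cell (the names and the scratch counter array are mine, for illustration):
#include <utility>
#include <vector>
using namespace std;

typedef pair<int, int> Cell;

// One turn: only the currently active cells can influence anything.
// 'scratch' is an R x C counter array that must be all zero on entry; it is reset before returning.
vector<Cell> step(const vector<Cell>& active,
                  const vector<vector<vector<Cell> > >& neigh,
                  vector<vector<int> >& scratch) {
    vector<Cell> touched;
    for (size_t a = 0; a < active.size(); a++) {
        const vector<Cell>& nb = neigh[active[a].first][active[a].second];
        for (size_t t = 0; t < nb.size(); t++) {
            if (scratch[nb[t].first][nb[t].second]++ == 0)   // first touch: remember this cell
                touched.push_back(nb[t]);
        }
    }
    vector<Cell> next;
    for (size_t t = 0; t < touched.size(); t++) {
        Cell c = touched[t];
        if (scratch[c.first][c.second] % 2) next.push_back(c);   // odd number of active neighbours -> on
        scratch[c.first][c.second] = 0;                          // reset the scratch for the next call
    }
    return next;
}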
EDIT: now that you have posted your code... it is quite procedural. This is C++: use classes and represent things with them. You probably do the neighbour search correctly, but it is easy to make mistakes there, so that part should be isolated in a function, or better, a method. Raw arrays are bad, and variable names like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation: put the code on CodeReview and have people tear it apart until it is perfect.
This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty of representing the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and also checking its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine whether some number n is prime, where you can check all the numbers in range(2, n) to see if they divide n, making it an O(n) process, or notice that some improvements include looking only at odd numbers after checking that n is not even (a constant factor; still O(n)), and then checking odd numbers only up to √n, i.e. in range(3, √n, 2), which meaningfully lowers the upper bound to O(√n).

program crashes on certain inputs

I have tried writing this code to output an odd-order magic square based on user input of an odd number. When I enter 1 or 3, it works fine. Whenever I enter anything above that such as 5, 7, 9, 11, etc the program crashes the moment I press enter. I've reviewed my code and I can't pinpoint where the problem is. I get no error messages.
Small note: if you know what a magic square is, my algorithm here (given to us by the professor in English to translate to C++) does not output the correct values since they don't all add up to the same number.
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
int n; //n = order
cout << "Enter an odd integer for the order of the Magic Square: ";
cin >> n;
cout << endl;
if(n%2 == 0) //only allows program to accept odd numbers
{
cout << "The number you have entered is not odd" << endl;
return 0;
}
int x, y; //x and y access the columns and rows of the following matrix
int magicsquare[n][n]; //creates a n by n matrix to set up magic square
int counter, square = n*n; //square is upper boundary
for(x=0; x<n; x++) //initialize all spaces in matrix with zeros
{
for(y=0; y<n; y++)
magicsquare[x][y] = 0;
}
/*Beginning of the magic square algorithm*/
x = 0, y = n/2; //initialize algorithm at the middle column of the top row
for (counter = 1; counter <= square; counter++) //magic square will contain the integers from 1 to n squared
{
magicsquare[x][y] = counter; //places current counter number at current position in the matrix or square
x--; //moves position diagonally up
y++; //and to the right
/*If a move takes you above the top row in the jth column, move to the bottom of the jth column*/
if(x<0)
x = n - 1;
/*If a move takes you outside to the right of the square in the ith row, move to the left side of the ith row*/
else if(y==n)
y = 0;
/*If a move takes you to an already filled square or if you move out of the square at the upper right
hand corner, move immediately below position of previous number*/
else if((magicsquare[x][y] != 0) || (x<0 && y==n))
{
y--; //move one space to the left back into the square
x = x+2; //move two spots down into the square and below previous number
}
}
for(x=0; x<n; x++)
{
for(y=0; y<n; y++)
cout << setw(5) << magicsquare[x][y];
cout << endl;
}
return 0;
}
I can't follow the logic in my head to know if this can ever actually happen, but in this code:
if(x<0)
x = n - 1;
/*If a move takes you outside to the right of the square in the ith row, move to the left side of the ith row*/
else if(y==n)
y = 0;
If both conditions happen to be true, you won't fix up y, and on the next iteration you'll run off the end of the matrix.
Also note that int magicsquare[n][n]; is a compiler extension and not supported by the C++ standard, since n is not a compile time constant. You almost certainly want to use vector instead.
The following is illegal:
int magicsquare[n][n];
Did you ignore the errors, or are you using a compiler that doesn't report them at all? I suggest using an IDE that warns you when a mistake is made, so you can spot it easily. Please do not use Notepad to write C++; that is horrible.
Fixed version:
int** magicsquare = new int*[n]; //creates a n by n matrix to set up magic square
for(int i = 0; i < n; ++i) // n rows, not n+1: writing past the array is undefined behaviour
magicsquare[i] = new int[n];
Now, together with Mark B's hint, you will get this running up in no time.
Do not forget to clean up magicsquare afterwards, by the way, using delete[].
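For completeness, the matching cleanup for that allocation could look like this (a sketch continuing the snippet above):
for (int i = 0; i < n; ++i)
    delete[] magicsquare[i];    // free each row
delete[] magicsquare;           // then the array of row pointers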
So I don't really know anything about magic squares. But I think that this is the behavior that you are trying to achieve:
for (int counter = 1, x = 0, y = (n / 2); counter <= n * n; ++counter){
magicsquare[x][y] = counter; //places current counter number at current position in the matrix or square
if (counter % n == 0){ //moves down into the square and below previous number
x = (x + 1) % n;
}
else //moves position diagonally up and to the right
{
x = (x + n - 1) % n;
y = (y + 1) % n;
}
}
Two additional points:
Until we can use the Array Extensions Technical Specification I think you should avoid declaring C99's runtime-sized arrays in your code. Even though gcc will allow it. You might look into doing something like: vector<vector<int>> magicsquare(n, vector<int>(n));
This doesn't match the behavior illustrated by Wikipedia's article but you can get there by tweaking the start values and order of indexing.

Finding all paths through a 4x4 grid

How do you go about solving this problem:
A robot is located at the top-left corner of a 4x4 grid. The robot can move up, down, left, or right, but cannot visit the same spot twice. The robot is trying to reach the bottom-right corner of the grid.
My ideas:
Backtracking solution where we go through the tree of all solutions and print once we reach goal cell. I implemented that but I'm not sure if it's correct or makes sense, or if it is even the right approach. I've posted code here and would much appreciate if someone could explain what's wrong with it.
Recursive solution where I start at start cell, and find the path to the goal cell from each of its neighboring cells recursively, with the base case being hitting the goal cell.
QUESTIONS:
1) Are these two ways of expressing the same idea?
2) Do these ideas even make sense?
3) What is the time complexity of each of these solutions? I think the second one is 4^n?
4) Does my backtracking code make sense?
5) Is there a much simpler way to do this?
Here is my code, which prints the correct number of paths for N = 4. Is it correct?
#include <iostream>
#include <utility>   // std::swap
using namespace std;

#define N 4
int counter = 0;
bool legal_move(int x, int y, int array[N+2][N+2]){
bool ret = (array[x][y] == 1);
array[x][y] = 0;
return ret;
}
/*
void print_array(int array[N+2][N+2]){
for(int i = 0; i < N+2; i++){
for(int j = 0; j < N+2; j++)
cout << array[i][j] << " ";
cout << endl;
}
cout << endl << endl;
}
*/
void print_paths(int x, int y, int n, int m, int array[N+2][N+2]){
if(x == n && y == m){
//print_array(array); // print_array is commented out above; re-enable both together to dump each path
counter++;
}
else {
int dx = 1;
int dy = 0;
for(int i = 0; i < 4; i++){
if(legal_move(x + dx, y + dy, array)){
print_paths(x + dx, y + dy, n, m, array);
array[x+dx][y+dy] = 1;
}
swap(dx,dy);
if(i == 1)
dx = -dx;
}
}
}
int main(){
int array[N+2][N+2];
for(int i = 1; i < N+1; i++)
for(int j = 1; j < N+1; j++)
array[i][j] = 1;
for(int i = 0; i < N+2; i++)
array[0][i] = array[i][0] = array[N+1][i] = array[i][N+1] = 0;
//print_array(array);
array[1][1] = 0; //Set start cell to be seen.
print_paths(1,1,N,N,array);
cout << counter << endl;
}
I think it's the same idea.
The problem with your code is that you haven't implemented 'but can not visit the same spot twice' correctly.
Suppose that your robot has gone from S to A by some path, and you are now examining whether to go to B adjacent to A. The test should be 'has the robot been to B before on the current path'. But what you have implemented is 'has the robot visited B before on any path'.
In other words you need to modify print_paths to take an extra parameter for the current path, and use that to implement the test correctly.
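As an illustration of that kind of per-path bookkeeping (a compact variant of my own, without the border of zeros used in the question's code): the visited array is marked before recursing and unmarked when the call unwinds, so the check always means "visited on the current path".
#include <iostream>
using namespace std;

const int N = 4;
long long paths = 0;

void walk(int x, int y, bool visited[N][N]) {
    if (x < 0 || y < 0 || x >= N || y >= N || visited[x][y]) return;
    if (x == N - 1 && y == N - 1) { paths++; return; }   // reached the goal cell
    visited[x][y] = true;                                // on the current path
    walk(x + 1, y, visited);
    walk(x - 1, y, visited);
    walk(x, y + 1, visited);
    walk(x, y - 1, visited);
    visited[x][y] = false;                               // undo when backtracking
}

int main() {
    bool visited[N][N] = {};
    walk(0, 0, visited);
    cout << paths << endl;
    return 0;
}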
Here is a partial answer, which talks mostly about complexity (I think your code is correct, after the minor bugfix suggested by john, and backtracking is probably the simplest way to do what you want).
The time complexity seems to be something like O(3^(n^2)) to me - there are at most n^2 nodes, and at each node you want to check 3 possibilities. There are actually fewer nodes because of the way backtracking works, but the complexity is at least O(2^(n^2/4)), so it grows faster than any exponential in n. (The O in the last formula is actually a Θ.)
The diagram below illustrates this lower bound. Cells marked ? have a "decision" in them - the robot may decide to go straight or to turn. There are at least n^2/4 such cells, so the number of paths is at least 2^(n^2/4).
?->-?->-?->-V
| | | | | | |
>-^ >-^ >-^ V
|
V-<-?-<-?-<-?
| | | | | | |
V ^-< ^-< ^-<
|
...

What Ruzzle board contains the most unique words?

For smart phones, there is this game called Ruzzle.
It's a word finding game.
Quick Explanation:
The game board is a 4x4 grid of letters.
You start from any cell and try to spell a word by dragging up, down, left, right, or diagonal.
The board doesn't wrap, and you can't reuse letters you've already selected.
On average, my friend and I find about 40 words, and at the end of the round, the game informs you of how many possible words you could have gotten. This number is usually about 250 - 350.
We are wondering what board would yield the highest number of possible words.
How would I go about finding the optimal board?
I've written a program in C that takes 16 characters and outputs all the appropriate words.
Testing over 80,000 words, it takes about a second to process.
The Problem:
The number of game board permutations is 26^16.
That's 43608742899428874059776 (43 sextillion).
I need some kind of heuristic.
Should I skip all boards that have either z, q, x, etc because they are expected to not have as many words? I wouldn't want to exclude a letter without being certain.
There are also 4 duplicates of every board, because rotating the board still gives the same results.
But even with these restrictions, I don't think I have enough time in my life to find the answer.
Maybe board generation isn't the answer.
Is there a quicker way to find the answer looking at the list of words?
tldr;
S E R O
P I T S
L A N E
S E R G
or any of its reflections.
This board contains 1212 words (and as it turns out, you can exclude 'z', 'q' and 'x').
First things first: it turns out you're using the wrong dictionary. After not getting exact matches with Ruzzle's word count, I looked into it; Ruzzle seems to use a dictionary called TWL06, which has around 180,000 words. Don't ask me what it stands for, but it's freely available as txt.
I also wrote code to find all possible words given a 16-character board, as follows. It builds the dictionary into a tree structure and then pretty much just goes around recursively while there are words to be found. It prints them in order of length. Uniqueness is maintained by the STL set structure.
#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <set>
#include <algorithm>
#include <fstream>
#include <iostream>
using namespace std;
struct TreeDict {
bool existing;
map<char, TreeDict> sub;
TreeDict() {
existing = false;
}
TreeDict& operator=(TreeDict &a) {
existing = a.existing;
sub = a.sub;
return *this;
}
void insert(string s) {
if(s.size() == 0) {
existing = true;
return;
}
sub[s[0]].insert(s.substr(1));
}
bool exists(string s = "") {
if(s.size() == 0)
return existing;
if(sub.find(s[0]) == sub.end())
return false;
return sub[s[0]].exists(s.substr(1));
}
TreeDict* operator[](char alpha) {
if(sub.find(alpha) == sub.end())
return NULL;
return &sub[alpha];
}
};
TreeDict DICTIONARY;
set<string> boggle_h(const string board, string word, int index, int mask, TreeDict *dict) {
if(index < 0 || index >= 16 || (mask & (1 << index)))
return set<string>();
word += board[index];
mask |= 1 << index;
dict = (*dict)[board[index]];
if(dict == NULL)
return set<string>();
set<string> rt;
if((*dict).exists())
rt.insert(word);
if((*dict).sub.empty())
return rt;
if(index % 4 != 0) {
set<string> a = boggle_h(board, word, index - 4 - 1, mask, dict);
set<string> b = boggle_h(board, word, index - 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 - 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
if(index % 4 != 3) {
set<string> a = boggle_h(board, word, index - 4 + 1, mask, dict);
set<string> b = boggle_h(board, word, index + 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 + 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
set<string> a = boggle_h(board, word, index + 4, mask, dict);
set<string> b = boggle_h(board, word, index - 4, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
return rt;
}
set<string> boggle(string board) {
set<string> words;
for(int i = 0; i < 16; i++) {
set<string> a = boggle_h(board, "", i, 0, &DICTIONARY);
words.insert(a.begin(), a.end());
}
return words;
}
void buildDict(string file, TreeDict &dict = DICTIONARY) {
ifstream fstr(file.c_str());
string s;
if(fstr.is_open()) {
while(fstr.good()) {
fstr >> s;
dict.insert(s);
}
fstr.close();
}
}
struct lencmp {
bool operator()(const string &a, const string &b) {
if(a.size() != b.size())
return a.size() > b.size();
return a < b;
}
};
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
set<string> a = boggle("SEROPITSLANESERG");
set<string, lencmp> words;
words.insert(a.begin(), a.end());
set<string, lencmp>::iterator it; // must match the comparator of 'words'
for(it = words.begin(); it != words.end(); it++)
cout << *it << endl;
cout << words.size() << " words." << endl;
}
Randomly generating boards and testing them didn't turn out too effective, as expected, so I didn't really bother running that for long; I'd be surprised if such boards crossed 200 words. Instead I changed the board generation to produce boards with letters distributed in proportion to their frequency in TWL06, achieved by a quick cumulative frequency table (the frequencies were reduced by a factor of 100), shown below.
string randomBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += (char)('A' + rand() % 26);
return board;
}
char distLetter() {
int x = rand() % 15833;
if(x < 1209) return 'A';
if(x < 1510) return 'B';
if(x < 2151) return 'C';
if(x < 2699) return 'D';
if(x < 4526) return 'E';
if(x < 4726) return 'F';
if(x < 5161) return 'G';
if(x < 5528) return 'H';
if(x < 6931) return 'I';
if(x < 6957) return 'J';
if(x < 7101) return 'K';
if(x < 7947) return 'L';
if(x < 8395) return 'M';
if(x < 9462) return 'N';
if(x < 10496) return 'O';
if(x < 10962) return 'P';
if(x < 10987) return 'Q';
if(x < 12111) return 'R';
if(x < 13613) return 'S';
if(x < 14653) return 'T';
if(x < 15174) return 'U';
if(x < 15328) return 'V';
if(x < 15452) return 'W';
if(x < 15499) return 'X';
if(x < 15757) return 'Y';
return 'Z'; // x is always < 15833, so this covers the remaining range
}
string distBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += distLetter();
return board;
}
This was significantly more effective, very easily achieving 400+ word boards. I left it running (for longer than I intended), and after checking over a million boards, the highest found was around 650 words. This was still essentially random generation, and that has its limits.
Instead, I opted for a greedy maximisation strategy, wherein I'd take a board and make a small change to it, and then commit the change only if it increased the word count.
string changeLetter(string x) {
int y = rand() % 16;
x[y] = distLetter();
return x;
}
string swapLetter(string x) {
int y = rand() % 16;
int z = rand() % 16;
char w = x[y];
x[y] = x[z];
x[z] = w;
return x;
}
string change(string x) {
if(rand() % 2)
return changeLetter(x);
return swapLetter(x);
}
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
string board = "SEROPITSLANESERG";
int locmax = boggle(board).size();
for(int j = 0; j < 5000; j++) {
int changes = 1;
string board2 = board;
for(int k = 0; k < changes; k++)
board2 = change(board2); // apply each change to the working copy
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
}
This very rapidly got me 1000+ word boards, with generally similar letter patterns despite randomised starting points. What leads me to believe that the board given is the best possible one is how it, or one of its various reflections, turned up repeatedly within the first 100-odd attempts at maximising a random board.
The biggest reason for skepticism is the greediness of this algorithm, which might somehow lead it to miss better boards. The small changes made are quite flexible in their outcomes – that is, they have the power to completely transform a grid from its (randomised) start position. The number of possible changes, 26*16 for a fresh letter and 16*15 for a letter swap, are both significantly less than 5000, the number of consecutive discarded changes allowed.
The fact that the program was able to repeat this board output within the first 100 odd times implies that the number of local maximums is relatively small, and the probability that there is an undiscovered maximum low.
Although the greedy approach seemed intuitively right – it shouldn't really be any harder to reach a given grid by delta changes from a random board – and the two possible changes, a swap and a fresh letter, do seem to encapsulate all possible improvements, I changed the program to allow it to make more changes before checking for an increase. This again returned the same board, repeatedly.
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
int glomax = 0;
int i = 0;
while(true) {
string board = distBoard();
int locmax = boggle(board).size();
for(int j = 0; j < 500; j++) {
string board2 = board;
for(int k = 0; k < 2; k++)
board2 = change(board2); // apply both changes to the same copy, as intended
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
if(glomax <= locmax) {
glomax = locmax;
cout << board << " " << glomax << " words." << endl;
}
if(++i % 10 == 0)
cout << i << endl;
}
}
Having iterated over this loop around 1000 times, with this particular board configuration showing up ~10 times, I'm pretty confident that this is, for now, the Ruzzle board with the most unique words, until the English language changes.
Interesting problem. I see (at least, but mainly) two approaches:
one is to go the hard way and try to pack as many word-forming letters (in all directions) as possible, based on a dictionary. As you said, there are many possible combinations, and that route requires a well-elaborated and complex algorithm to reach something tangible
the other is a "looser" solution based on probabilities, which I like more. You suggested removing some low-frequency letters to maximize the board yield. An extension of this could be to use more of the high-frequency letters from the dictionary.
A further step could be:
based on the 80k dictionary D, you find out, for each letter l1 of our ensemble L of 26 letters, the probability that a letter l2 precedes or follows l1. This is an L x L array of probabilities, and it is pretty small, so you could even extend it to L x L x L, i.e. considering l1 and l2, what probability l3 has to fit. This gets a bit more complex if the algorithm is to estimate accurate probabilities, since the combined probability depends on the relative position of the 3 letters: for instance, in a 'triangle' configuration (e.g. positions (3,3), (3,4) and (3,5)) the result probably yields less than when the letters are aligned [just a supposition]. Why not go up to L x L x L x L, which will require some optimization...
then you distribute a few high-frequency letters (say 4~6) randomly on the board (each having at least 1 blank cell around it in at least 5 of the 8 possible directions) and then use your L x L [x L] probability arrays to fill in the rest - meaning, based on the existing letters, the next cell is filled with a letter whose probability is high given the configuration [again, letters sorted by descending probability, with randomness used to break near-ties].
For instance, taking only the horizontal configuration, with the following letters in place, we want to find the best 2 letters to put between ER and TO:
...ER??TO...
Using L x L, run a loop like the one below (l1 and l2 are our two missing letters) to find the absolute best letters - but bestchoice and bestproba could instead be arrays that keep the - say - 10 best choices.
Note: there is no need to keep the probabilities in the range [0,1] in this case; we can simply sum them (the sum is not itself a probability - only the relative magnitude matters). A proper probability could be something like p = ( p(l0,l1) + p(l2,l3) ) / 2, where l0 and l3 are the R and T in our L x L example.
bestproba = 0
bestchoice = (none, none)
for letter l1 in L
for letter l2 in L
p = proba('R',l1) + proba(l2,'T')
if ( p > bestproba )
bestproba = p
bestchoice = (l1, l2)
fi
rof
rof
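A rough C++ transcription of that loop, assuming a precomputed 26 x 26 table proba[a][b] (a name I'm making up) holding how often letter b follows letter a in the dictionary:
#include <utility>

// Returns the best (l1, l2) pair to place in ...ER??TO..., judged by the two
// bigrams R-l1 and l2-T. 'proba' need not be normalised; bigger simply means better.
std::pair<char, char> bestPair(const double proba[26][26]) {
    double bestproba = -1.0;
    std::pair<char, char> bestchoice('?', '?');
    for (int l1 = 0; l1 < 26; l1++) {
        for (int l2 = 0; l2 < 26; l2++) {
            double p = proba['R' - 'A'][l1] + proba[l2]['T' - 'A'];
            if (p > bestproba) {
                bestproba = p;
                bestchoice = std::make_pair((char)('A' + l1), (char)('A' + l2));
            }
        }
    }
    return bestchoice;
}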
the algorithm can take more factors into account, and it needs to consider the vertical and diagonal directions as well. With L x L x L, more letters in more directions are taken into account, like ER?, R??, ??T, ?TO - this requires thinking the algorithm through a bit more - maybe starting with L x L can give an idea of the relevance of this approach.
Note that a lot of this may be pre-calculated, and the L x L array is of course one of them.