BFS maze help c++ - c++

I am attempting to make a maze-solver using a Breadth-first search, and mark the shortest path using a character '*'
The maze is actually just a bunch of text. The maze consists of an n x n grid, consisting of "#" symbols that are walls, and periods "." representing the walkable area/paths. An 'S' denotes start, 'F' is finish. Right now, this function does not seem to be finding the solution (it thinks it has the solution even when one is impossible). I am checking the four neighbors, and if they are 'unfound' (-1) they are added to the queue to be processed.
The maze works on several mazes, but not on this one:
...###.#....
##.#...####.
...#.#.#....
#.####.####.
#F..#..#.##.
###.#....#S.
#.#.####.##.
....#.#...#.
.####.#.#.#.
........#...
What could be missing in my logic?
int mazeSolver(char *maze, int rows, int cols)
{
int start = 0;
int finish = 0;
for (int i=0;i<rows*cols;i++) {
if (maze[i] == 'S') { start=i; }
if (maze[i] == 'F') { finish=i; }
}
if (finish==0 || start==0) { return -1; }
char* bfsq;
bfsq = new char[rows*cols]; //initialize queue array
int head = 0;
int tail = 0;
bool solved = false;
char* prd;
prd = new char[rows*cols]; //initialize predecessor array
for (int i=0;i<rows*cols;i++) {
prd[i] = -1;
}
prd[start] = -2; //set the start location
bfsq[tail] = start;
tail++;
int delta[] = {-cols,-1,cols,+1}; // North, West, South, East neighbors
while(tail>head) {
int front = bfsq[head];
head++;
for (int i=0; i<4; i++) {
int neighbor = front+delta[i];
if (neighbor/cols < 0 || neighbor/cols >= rows || neighbor%cols < 0 || neighbor%cols >= cols) {
continue;
}
if (prd[neighbor] == -1 && maze[neighbor]!='#') {
prd[neighbor] = front;
bfsq[tail] = neighbor;
tail++;
if (maze[neighbor] == 'F') { solved = true; }
}
}
}
if (solved == true) {
int previous = finish;
while (previous != start) {
maze[previous] = '*';
previous = prd[previous];
}
maze[finish] = 'F';
return 1;
}
else { return 0; }
delete [] prd;
delete [] bfsq;
}

Iterating through neighbours can be significantly simplified(I know this is somewhat similar to what kobra suggests but it can be improved further). I use a moves array defining the x and y delta of the given move like so:
int moves[4][2] = {{0,1},{1,0},{0,-1},{-1,0}};
Please note that not only tis lists all the possible moves from a given cell but it also lists them in clockwise direction which is useful for some problems.
Now to traverse the array I use a std::queue<pair<int,int> > This way the current position is defined by the pair of coordinates corresponding to it. Here is how I cycle through the neighbours of a gien cell c:
pair<int,int> c;
for (int l = 0;l < 4/*size of moves*/;++l){
int ti = c.first + moves[l][0];
int tj = c.second + moves[l][1];
if (ti < 0 || ti >= n || tj < 0 || tj >= m) {
// This move goes out of the field
continue;
}
// Do something.
}
I know this code is not really related to your code, but as I am teaching this kind of problems trust me a lot of students were really thankful when I showed them this approach.
Now back to your question - you need to start from the end position and use prd array to find its parent, then find its parent's parent and so on until you reach a cell with negative parent. What you do instead considers all the visited cells and some of them are not on the shortest path from S to F.
You can break once you set solved = true this will optimize the algorithm a bit.
I personally think you always find a solution because you have no checks for falling off the field. (the if (ti < 0 || ti >= n || tj < 0 || tj >= m) bit in my code).
Hope this helps you and gives you some tips how to improve your coding.

A few comments:
You can use queue container in c++, its much more easier in use
In this task you can write something like that:
int delta[] = {-1, cols, 1 -cols};
And then you simple can iterate through all four sides, you shouldn't copy-paste the same code.
You will have problems with boundaries of your array. Because you are not checking it.
When you have founded finish you should break from cycle
And in last cycle you have an error. It will print * in all cells in which you have been (not only in the optimal way). It should look:
while (finish != start)
{
maze[finish] = '*';
finish = prd[finish];
}
maze[start] = '*';
And of course this cycle should in the last if, because you don't know at that moment have you reach end or not
PS And its better to clear memory which you have allocate in function

Related

How to find first set?

I am trying to list the First set of a given grammar with this function:
Note:
char c - the character to find the first set;
first_set - store elements of the corresponding first set;
q1, q2 - the previous position;
rule- store all the grammar rule line by line listed below;
for the first time the parameters are ('S', 0, 0).
void findfirst(char c, int q1, int q2){
if(!(isupper(c)) || c=='$'){
first_set[n++] = c;
}
for(int j=0;j<rule_number;j++){
if(rule[j][0]==c){
if(rule[j][2]==';'){
if(rule[q1][q2]=='\0')
first_set[n++] = ';';
else if(rule[q1][q2]!='\0' &&(q1!=0||q2!=0))
findfirst(rule[q1][q2], q1, (q2+1));
else
first_set[n++] = ';';
}
else if(!isupper(rule[j][2]) || rule[j][2]=='$')
first_set[n++] = rule[j][2];
else
findfirst(rule[j][2],j,3);
}
}
}
But found that if the given grammar looks like this:
S AC$
C c
C ;
A aBCd
A BQ
B bB
B ;
Q q
Q ;
(which the left hand side or any capital letters in the right hand side are non-terminal, and any small case letters are terminal)
the function couldn't correctly output the first set for S, since it will stop at finding the first set of Q and store ';' to the first set and won't go on to find C's first set.
Does anyone have a clue? Thanks in advance.
It is extremely inefficient to compute FIRST sets one at a time, since they are interdependent. For example, in order to compute the FIRST set of A , you need to also compute the FIRST set of B, and then because B can derive the emoty string, you need the FIRST set of Q.
Most algorithms compute all of them in parallel, using some variation of a transitive closure algorithm. You can do this with a depth-first search, which seems to be what you are attempting, but it might be easier to implement the least fixed point algorithm described in the Dragon book (and Wikipedia.
Either way, you will probably find it easier to first compute NULLABLE (that is, which non-terminals derive the empty set). There is a simple linear-time algorithm for that (linear in the size of the grammar), which again is easy to find.
If you are doing this work as part of a class, you'll probably find the relevant algorithms in your course materials. Alternatively, you can look for a copy of the Dragon book or other similar text books.
You could do like the following code:
used[i] means the rule[i] is used or not
The method is Depth-first search, see https://en.wikipedia.org/wiki/Depth-first_search
#include <iostream>
#define MAX_SIZE 1024
char rule[][10] = {
"S AC$",
"C c",
"C ;",
"A aBCd",
"A BQ",
"B bB",
"B ;",
"Q q",
"Q ;"
};
constexpr int rule_number = sizeof(rule) / sizeof(rule[0]);
char first_set[MAX_SIZE];
bool findfirst(int row, int col, int *n, bool* used) {
for (;;) {
char ch = rule[row][col];
if (ch == '$' || ch == ';' || ch == '\0') {
first_set[*n] = '\0';
break;
}
if (islower(ch)) {
first_set[(*n)++] = ch;
++col;
continue;
}
int i;
for (i = 0; i != rule_number; ++i) {
if (used[i] == true || rule[i][0] != ch)
continue;
used[i] = true;
int k = *n;
if (findfirst(i, 2, n, used) == true)
break;
used[i] = false;
*n = k;
}
if (i == rule_number)
return false;
++col;
}
return true;
}
int main() {
bool used[rule_number];
int n = 0;
for (int i = 2; rule[0][i] != '$' && rule[0][i] != '\0'; ++i) {
for (int j = 0; j != rule_number; ++j)
used[j] = false;
used[0] = true;
findfirst(0, i, &n, used);
}
std::cout << first_set << std::endl;
return 0;
}

Algorithm to divide a black-and-white chocolate bar

Problem description:
There's a chocolate bar that consists of m x n squares. Some of the squares are black, some are white. Someone breaks the chocolate bar along its vertical axis or horizontal axis. Then it is broken again along its vertical or horizontal axis and it's being broken until it can broken into a single square or it can broken into squares that are only black or only white. Using a preferably divide-and-conquer algorithm, find the number of methods a chocolate bar can be broken.
Input:
The first line tells you the m x n dimensions of the chocolate bar. In the next m lines there are n characters that tell you how does the chocolate bar look. Letter w is a white square, letter b is a black square.
for example:
3 2
bwb
wbw
Output:
the number of methods the chocolate bar can be broken:
for the example above, it's 5 (take a look at the attached picture).
I tried to solve it using an iterative approach. Unfortunately, I couldn't finish the code as I'm not yet sure how to divide the the halves (see my code below). I was told that an recursive approach is much easier than this, but I have no idea how to do it. I'm looking for another way to solve this problem than my approach or I'm looking for some help with finishing my code.
I made two 2D arrays, first for white squares, second for black squares. I'm making a matrix out of the squares and if there's a chocolate of such or such color, then I'm marking it as 1 in the corresponding array.
Then I made two arrays of the two cumulative sums of the matrices above.
Then I created a 4D array of size [n][m][n][m] and I made four loops: first two (i, j) are increasing the size of an rectangular array that is the size of the searching array (it's pretty hard to explain...) and two more loops (k, l) are increasing the position of my starting points x and y in the array. Then the algorithm checks using the cumulative sum if in the area starting at position kxl and ending at k+i x l+j there is one black and one white square. If there is, then I'm creating two more loops that will divide the area in half. If in the two new halves there are still black and white squares, then I'm increasing the corresponding 4D array element by the number of combinations of the first halve * the number of combinations of the second halve.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
int counter=0;
int n, m;
ifstream in;
in.open("in.txt");
ofstream out;
out.open("out.txt");
if(!in.good())
{
cout << "No such file";
return 0;
}
in >> n >> m;
int whitesarray[m][n];
int blacksarray[m][n];
int methodsarray[m][n][m][n];
for(int i=0; i<m; i++)
{
for(int j=0; j<n; j++)
{
whitesarray[i][j] = 0;
blacksarray[i][j] = 0;
}
}
while(in)
{
string colour;
in >> colour;
for (int i=0; i < colour.length(); i++)
{
if(colour[i] == 'c')
{
blacksarray[counter][i] = 1;
}
if(colour[i] == 'b')
{
whitesarray[counter][i] = 1;
}
}
counter++;
}
int whitessum[m][n];
int blackssum[m][n];
for (int i=0; i<m; i++)
{
for (int j=0; j<n; j++)
{
if(i-1 == -1 && j-1 == -1)
{
whitessum[i][j] = whitesarray[i][j];
blackssum[i][j] = blacksarray[i][j];
}
if(i-1 == -1 && j-1 != -1)
{
whitessum[i][j] = whitessum[i][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i][j-1] + blacksarray[i][j];
}
if(j-1 == -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blacksarray[i][j];
}
if(j-1 != -1 && i-1 != -1)
{
whitessum[i][j] = whitessum[i-1][j] + whitessum[i][j-1] - whitessum[i-1][j-1] + whitesarray[i][j];
blackssum[i][j] = blackssum[i-1][j] + blackssum[i][j-1] - blackssum[i-1][j-1] + blacksarray[i][j];
}
}
}
int posx=0;
int posy=0;
int tempwhitessum=0;
int tempblackssum=0;
int k=0, l=0;
for (int i=0; i<=m; i++)
{
for (int j=0; j<=n; j++) // wielkosc wierszy
{
for (posx=0; posx < m - i; posx++)
{
for(posy = 0; posy < n - j; posy++)
{
k = i+posx-1;
l = j+posy-1;
if(k >= m || l >= n)
continue;
if(posx==0 && posy==0)
{
tempwhitessum = whitessum[k][l];
tempblackssum = blackssum[k][l];
}
if(posx==0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[k][posy-1];
tempblackssum = blackssum[k][l] - blackssum[k][posy-1];
}
if(posx!=0 && posy==0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l];
}
if(posx!=0 && posy!=0)
{
tempwhitessum = whitessum[k][l] - whitessum[posx-1][l] - whitessum[k][posy-1] + whitessum[posx-1][posy-1];
tempblackssum = blackssum[k][l] - blackssum[posx-1][l] - blackssum[k][posy-1] + blackssum[posx-1][posy-1];
}
if(tempwhitessum >0 && tempblackssum > 0)
{
for(int e=0; e<n; e++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
for(int r=0; r<m; r++)
{
//Somehow divide the previously found area by two and check again if there are black and white squares in this area
}
}
}
}
}}
return 0;
}
I strongly recommend recursion for this. In fact, Dynamic Programming (DP) would also be very useful, especially for larger bars. Recursion first ...
Recursion
Your recursive routine takes a 2-D array of characters (b and w). It returns the number of ways this can be broken.
First, the base cases: (1) if it's possible to break the given bar into a single piece (see my comment above, asking for clarification), return 1; (2) if the array is all one colour, return 1. For each of these, there's only one way for the bar to end up -- the way it was passed in.
Now, for the more complex case, when the bar can still be broken:
total_ways = 0
for each non-edge position in each dimension:
break the bar at that spot; form the two smaller bars, A and B.
count the ways to break each smaller bar: count(A) and count(B)
total_ways += count(A) * count(B)
return total_ways
Is that clear enough for the general approach? You still have plenty of coding to do, but using recursion allows you to think of only the two basic ideas when writing your function: (1) How do I know when I'm done, and what trivial result do I return then? (2) If I'm not done, how do I reduce the problem?
Dynamic Programming
This consists of keeping a record of situations you've already solved. The first thing you do in the routine is to check your "data base" to see whether you already know this case. If so, return the known result instead of recomputing. This includes the overhead of developing and implementing said data base, probably a look-up list (dictionary) of string arrays and integer results, such as ["bwb", "wbw"] => 5.

What Ruzzle board contains the most unique words?

For smart phones, there is this game called Ruzzle.
It's a word finding game.
Quick Explanation:
The game board is a 4x4 grid of letters.
You start from any cell and try to spell a word by dragging up, down, left, right, or diagonal.
The board doesn't wrap, and you can't reuse letters you've already selected.
On average, my friend and I find about 40 words, and at the end of the round, the game informs you of how many possible words you could have gotten. This number is usually about 250 - 350.
We are wondering what board would yield the highest number of possible words.
How would I go about finding the optimal board?
I've written a program in C that takes 16 characters and outputs all the appropriate words.
Testing over 80,000 words, it takes about a second to process.
The Problem:
The number of game board permutations is 26^16.
That's 43608742899428874059776 (43 sextillion).
I need some kind of heuristic.
Should I skip all boards that have either z, q, x, etc because they are expected to not have as many words? I wouldn't want to exclude a letter without being certain.
There is also 4 duplicates of every board, because rotating the board will still give the same results.
But even with these restrictions, I don't think I have enough time in my life to find the answer.
Maybe board generation isn't the answer.
Is there a quicker way to find the answer looking at the list of words?
tldr;
S E R O
P I T S
L A N E
S E R G
or any of its reflections.
This board contains 1212 words (and as it turns out, you can exclude 'z', 'q' and 'x').
First things first, turns out you're using the wrong dictionary. After not getting exact matches with Ruzzle's word count, I looked into it, it seems Ruzzle uses a dictionary called TWL06, which has around 180,000 words. Don't ask me what it stands for, but it's freely available in txt.
I also wrote code to find all possible words given a 16 character board, as follows. It builds the dictionary into a tree structure, and then pretty much just goes around recursively while there are words to be found. It prints them in order of length. Uniqueness is maintained by the STL set structure.
#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <set>
#include <algorithm>
#include <fstream>
#include <iostream>
using namespace std;
struct TreeDict {
bool existing;
map<char, TreeDict> sub;
TreeDict() {
existing = false;
}
TreeDict& operator=(TreeDict &a) {
existing = a.existing;
sub = a.sub;
return *this;
}
void insert(string s) {
if(s.size() == 0) {
existing = true;
return;
}
sub[s[0]].insert(s.substr(1));
}
bool exists(string s = "") {
if(s.size() == 0)
return existing;
if(sub.find(s[0]) == sub.end())
return false;
return sub[s[0]].exists(s.substr(1));
}
TreeDict* operator[](char alpha) {
if(sub.find(alpha) == sub.end())
return NULL;
return &sub[alpha];
}
};
TreeDict DICTIONARY;
set<string> boggle_h(const string board, string word, int index, int mask, TreeDict *dict) {
if(index < 0 || index >= 16 || (mask & (1 << index)))
return set<string>();
word += board[index];
mask |= 1 << index;
dict = (*dict)[board[index]];
if(dict == NULL)
return set<string>();
set<string> rt;
if((*dict).exists())
rt.insert(word);
if((*dict).sub.empty())
return rt;
if(index % 4 != 0) {
set<string> a = boggle_h(board, word, index - 4 - 1, mask, dict);
set<string> b = boggle_h(board, word, index - 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 - 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
if(index % 4 != 3) {
set<string> a = boggle_h(board, word, index - 4 + 1, mask, dict);
set<string> b = boggle_h(board, word, index + 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 + 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
set<string> a = boggle_h(board, word, index + 4, mask, dict);
set<string> b = boggle_h(board, word, index - 4, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
return rt;
}
set<string> boggle(string board) {
set<string> words;
for(int i = 0; i < 16; i++) {
set<string> a = boggle_h(board, "", i, 0, &DICTIONARY);
words.insert(a.begin(), a.end());
}
return words;
}
void buildDict(string file, TreeDict &dict = DICTIONARY) {
ifstream fstr(file.c_str());
string s;
if(fstr.is_open()) {
while(fstr.good()) {
fstr >> s;
dict.insert(s);
}
fstr.close();
}
}
struct lencmp {
bool operator()(const string &a, const string &b) {
if(a.size() != b.size())
return a.size() > b.size();
return a < b;
}
};
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
set<string> a = boggle("SEROPITSLANESERG");
set<string, lencmp> words;
words.insert(a.begin(), a.end());
set<string>::iterator it;
for(it = words.begin(); it != words.end(); it++)
cout << *it << endl;
cout << words.size() << " words." << endl;
}
Randomly generating boards and testing against them didn't turn out too effective, expectedly, I didn't really bother with running that, but I'd be surprised if they crossed 200 words. Instead I changed the board generation to generate boards with letters distributed in proportion to their frequency in TWL06, achieved by a quick cumulative frequency (the frequencies were reduced by a factor of 100), below.
string randomBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += (char)('A' + rand() % 26);
return board;
}
char distLetter() {
int x = rand() % 15833;
if(x < 1209) return 'A';
if(x < 1510) return 'B';
if(x < 2151) return 'C';
if(x < 2699) return 'D';
if(x < 4526) return 'E';
if(x < 4726) return 'F';
if(x < 5161) return 'G';
if(x < 5528) return 'H';
if(x < 6931) return 'I';
if(x < 6957) return 'J';
if(x < 7101) return 'K';
if(x < 7947) return 'L';
if(x < 8395) return 'M';
if(x < 9462) return 'N';
if(x < 10496) return 'O';
if(x < 10962) return 'P';
if(x < 10987) return 'Q';
if(x < 12111) return 'R';
if(x < 13613) return 'S';
if(x < 14653) return 'T';
if(x < 15174) return 'U';
if(x < 15328) return 'V';
if(x < 15452) return 'W';
if(x < 15499) return 'X';
if(x < 15757) return 'Y';
if(x < 15833) return 'Z';
}
string distBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += distLetter();
return board;
}
This was significantly more effective, very easily achieving 400+ word boards. I left it running (for longer than I intended), and after checking over a million boards, the highest found was around 650 words. This was still essentially random generation, and that has its limits.
Instead, I opted for a greedy maximisation strategy, wherein I'd take a board and make a small change to it, and then commit the change only if it increased the word count.
string changeLetter(string x) {
int y = rand() % 16;
x[y] = distLetter();
return x;
}
string swapLetter(string x) {
int y = rand() % 16;
int z = rand() % 16;
char w = x[y];
x[y] = x[z];
x[z] = w;
return x;
}
string change(string x) {
if(rand() % 2)
return changeLetter(x);
return swapLetter(x);
}
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
string board = "SEROPITSLANESERG";
int locmax = boggle(board).size();
for(int j = 0; j < 5000; j++) {
int changes = 1;
string board2 = board;
for(int k = 0; k < changes; k++)
board2 = change(board);
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
}
This very rapidly got me 1000+ word boards, with generally similar letter patterns, despite randomised starting points. What leads me to believe that the board given is the best possible board is how it, or one of its various reflections, turned up repeatedly, within the first 100 odd attempts at maximising a random board.
The biggest reason for skepticism is the greediness of this algorithm, and that this somehow would lead to the algorithm missing out better boards. The small changes made are quite flexible in their outcomes – that is, they have the power to completely transform a grid from its (randomised) start position. The number of possible changes, 26*16 for the fresh letter, and 16*15 for the letter swap, are both significantly less than 5000, the number of continuous discarded changes allowed.
The fact that the program was able to repeat this board output within the first 100 odd times implies that the number of local maximums is relatively small, and the probability that there is an undiscovered maximum low.
Although the greedy seemed intuitively right – it shouldn't really be less possible to reach a given grid with the delta changes from a random board – and the two possible changes, a swap and a fresh letter do seem to encapsulate all possible improvements, I changed the program in order to allow it to make more changes before checking for the increase. This again returned the same board, repeatedly.
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
int glomax = 0;
int i = 0;
while(true) {
string board = distBoard();
int locmax = boggle(board).size();
for(int j = 0; j < 500; j++) {
string board2 = board;
for(int k = 0; k < 2; k++)
board2 = change(board);
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
if(glomax <= locmax) {
glomax = locmax;
cout << board << " " << glomax << " words." << endl;
}
if(++i % 10 == 0)
cout << i << endl;
}
}
Having iterated over this loop around a 1000 times, with this particular board configuration showing up ~10 times, I'm pretty confident that this is for now the Ruzzle board with the most unique words, until the English language changes.
Interesting problem. I see (at least, but mainly) two approches
one is to try the hard way to stick as many wordable letters (in all directions) as possible, based on a dictionary. As you said, there are many possible combinations, and that route requires a well elaborated and complex algorithm to reach something tangible
there is another "loose" solution based on probabilities that I like more. You suggested to remove some low-appearance letters to maximize the board yield. An extension of this could be to use more of the high-appearance letters in the dictionary.
A further step could be:
based on the 80k dictionary D, you find out for each l1 letter of our L ensemble of 26 letters the probability that letter l2 precedes or follows l1. This is a L x L probabilities array, and is pretty small, so you could even extend to L x L x L, i.e. considering l1 and l2 what probability has l3 to fit. This is a bit more complex if the algorithm wants to estimate accurate probabilities, as the probas sum depends on the relative position of the 3 letters, for instance in a 'triangle' configuration (eg positions (3,3), (3,4) and (3,5)) the result is probably less yielding than when the letters are aligned [just a supposition]. Why not going up to L x L x L x L, which will require some optimizations...
then you distribute a few high-appearance letters (say 4~6) randomly on the board (having each at least 1 blank cell around in at least 5 of the 8 possible directions) and then use your L x L [xL] probas arrays to complete - meaning based on the existing letter, the next cell is filled with a letter which proba is high given the configuration [again, letters sorted by proba descending, and use randomness if two letters are in a close tie].
For instance, taking only the horizontal configuration, having the following letters in place, and we want to find the best 2 in between ER and TO
...ER??TO...
Using L x L, a loop like (l1 and l2 are our two missing letters). Find the absolutely better letters - but bestchoice and bestproba could be arrays instead and keep the - say - 10 best choices.
Note: there is no need to keep the proba in the range [0,1] in this case, we can sum up the probas (which don't give a proba - but the number matters. A mathematical proba could be something like p = ( p(l0,l1) + p(l2,l3) ) / 2, l0 and l3 are the R and T in our L x L exemple)
bestproba = 0
bestchoice = (none, none)
for letter l1 in L
for letter l2 in L
p = proba('R',l1) + proba(l2,'T')
if ( p > bestproba )
bestproba = p
bestchoice = (l1, l2)
fi
rof
rof
the algorithm can take more factors into account, and needs to take the vertical and diagonals into account as well. With L x L x L, more letters in more directions are taken into account, like ER?,R??,??T,?TO - this requires to think more through the algorithm - maybe starting with L x L can give an idea about the relevancy of this algorithm.
Note that a lot of this may be pre-calculated, and the L x L array is of course one of them.

Wrong search in A* pathing code

I have some C++ code I wrote to find an A* path, but it's behaving strangely. There's quite a bit of code here, so I'll split it into chunks and try to explain what I'm doing. I'm not gonna explain how A* pathing works. I assume if you're trying to help you already know the algorithm.
First off, here's my function for calculating the h value of a node:
int
calculateH(int cX, int cY, int eX, int eY, int horiCost = 10, int diagCost = 14) {
int h;
int xDist = abs(eX - cX);
int yDist = abs(eY - cY);
if (xDist > yDist)
h = (diagCost * yDist) + (horiCost * (xDist - yDist));
else
h = (diagCost * xDist) + (horiCost * (yDist - xDist));
return h;
}
I'm pretty sure there's no problem here; pretty simple stuff.
Next my Node class. And I know, I know, make those variables private and use getters; I just did it this way for testing purposes.
class Node {
public:
Node(int x_, int y_, int g_, int h_, int pX_, int pY_, bool list_) {
x = x_;
y = y_;
g = g_;
h = h_;
pX = pX_;
pY = pY_;
list = list_;
};
int x, y, g, h, pX, pY;
bool list;
};
Each Node has an X and Y variable. I only store G and H, not F, and calculate F when I need it (which is only once in my code). Then there's the Parent X and Y values. List is a boolean: fale = open list, true = closed list.
I also have a Object class. The only variables that matter here are X, Y, and Passable, all accessed through getters.
Now here's the start of my actual pathfinding code. It returns a string of numbers corresponding to directions as shown below:
432
501
678
So 1 means move right, 8 means go down and right, 0 means don't go anywhere.
string
findPath(int startX, int startY, int finishX, int finishY) {
// Check if we're already there.
if (startX == finishX && startY == finishY)
return "0";
// Check if the space is occupied.
for (int k = 0; k < objects.size(); k ++)
if ((objects[k] -> getX() == finishX) &&
(objects[k] -> getY() == finishY) &&
(!objects[k] -> canPass()))
return "0";
// The string that contains our path.
string path = "";
// The identifier of the current node.
int currentNode = 0;
// The list of nodes.
vector<Node> nodes;
// Add the starting node to the closed list.
nodes.push_back(Node(startX, startY, 0,
calculateH(startX, startY, finishX, finishY),
startX, startY, true));
Now we loop through until we find the destination. Note that sizeLimit is just to make sure we don't loop forever (it WONT if I can fix this code. As of right now it's very necessary). Everything from this point on, until I mark otherwise, is inside the i j loops.
int sizeLimit = 0;
while ((nodes[currentNode].x != finishX) | (nodes[currentNode].y != finishY)) {
// Check the surrounding spaces.
for (int i = -1; i <= 1; i ++) {
for (int j = -1; j <= 1; j ++) {
bool isEmpty = true;
// Check if there's a wall there.
for (int k = 0; k < objects.size(); k ++) {
if ((objects[k] -> getX() == (nodes[currentNode].x + i)) &&
(objects[k] -> getY() == (nodes[currentNode].y + j)) &&
(!objects[k] -> canPass())) {
isEmpty = false;
}
}
Next part:
// Check if it's on the closed list.
for (int k = 0; k < nodes.size(); k ++) {
if ((nodes[k].x == (nodes[currentNode].x + i)) &&
(nodes[k].y == (nodes[currentNode].y + j)) &&
(nodes[k].list)) {
isEmpty = false;
}
}
Continuing on:
// Check if it's on the open list.
for (int k = 0; k < nodes.size(); k ++) {
if ((nodes[k].x == (nodes[currentNode].x + i)) &&
(nodes[k].y == (nodes[currentNode].y + j)) &&
(!nodes[k].list)) {
// Check if the G score is lower from here.
if (nodes[currentNode].g + 10 + (abs(i * j) * 4) <= nodes[k].g) {
nodes[k].g = nodes[currentNode].g + 10 + (abs(i * j) * 4);
nodes[k].pX = nodes[currentNode].x;
nodes[k].pY = nodes[currentNode].y;
}
isEmpty = false;
}
}
This is the last part of the i j loop:
if (isEmpty) {
nodes.push_back(Node(nodes[currentNode].x + i,
nodes[currentNode].y + j,
nodes[currentNode].g + 10 + (abs(i * j) * 4),
calculateH(nodes[currentNode].x + i, nodes[currentNode].y + j, finishX, finishY),
nodes[currentNode].x,
nodes[currentNode].y,
false));
}
}
}
Now we find the Node with the lowest F score, change it to the current node, and add it to the closed list. The protection against infinite lopping is also finished up here:
// Set the current node to the one with the lowest F score.
int lowestF = (nodes[currentNode].g + nodes[currentNode].h);
int lowestFIndex = currentNode;
for (int k = 0; k < nodes.size(); k ++) {
if (((nodes[k].g + nodes[k].h) <= lowestF) &&
(!nodes[k].list)) {
lowestF = (nodes[k].g + nodes[k].h);
lowestFIndex = k;
}
}
currentNode = lowestFIndex;
// Change it to the closed list.
nodes[currentNode].list = true;
sizeLimit ++;
if (sizeLimit > 1000)
return "";
}
The problem I'm having is that it wont find certain paths. It seems to never work if the path goes up or left at any point. Down, left, and right all work fine. Most of the time anyway. I have absolutely no idea what's causing this problem; at one point I even tried manually following my code to see where the problem was. No surprise that didn't work.
And one more thing: if you're counting my curly braces (first of all wow, you have more dedication than I thought), you'll notice I'm missing a close brace at the end. Not to mention my return statement. There's a little bit of code at the end to actually make the path that I've left out. I know that that part's not the problem however; I currently have it commented out and it still doesn't work in the same way. I added some code to tell me where it's not working, and it's at the pathfinding part, not the interpretation.
Sorry my code's so messy and inefficient. I'm new to c++, so any critical advice on my technique is welcome as well.
I think that when you are looking for the next "currentNode", you should not start with lowestF = (nodes[currentNode].g + nodes[currentNode].h); because that value, in principle, will (always) be lower-or-equal-to any other nodes in the open-set. Just start with the value of std::numeric_limits<int>::max() or some very large number, or use a priority queue instead of an std::vector to store the nodes (like std::priority_queue, or boost::d_ary_heap_indirect).
I'm pretty sure that is the problem. And since your heuristic can very often be equal to the actual path-distance obtained, there is a good chance that following nodes in the open-set turn out to have the same cost (g+h) as the currentNode. This would explain why certain paths get explored and others don't and why it gets stuck.

weighted RNG speed problem in C++

Edit: to clarify, the problem is with the second algorithm.
I have a bit of C++ code that samples cards from a 52 card deck, which works just fine:
void sample_allcards(int table[5], int holes[], int players) {
int temp[5 + 2 * players];
bool try_again;
int c, n, i;
for (i = 0; i < 5 + 2 * players; i++) {
try_again = true;
while (try_again == true) {
try_again = false;
c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
I am implementing code to sample the hole cards according to a known distribution (stored as a 2d table). My code for this looks like:
void sample_allcards_weighted(double weights[][HOLE_CARDS], int table[5], int holes[], int players) {
// weights are distribution over hole cards
int temp[5 + 2 * players];
int n, i;
// table cards
for (i = 0; i < 5; i++) {
bool try_again = true;
while (try_again == true) {
try_again = false;
int c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
for (int player = 0; player < players; player++) {
// hole cards according to distribution
i = 5 + 2 * player;
bool try_again = true;
while (try_again == true) {
try_again = false;
// weighted-sample c1 and c2 at once
// h is a number < 1325
int h = weighted_randi(&weights[player][0], HOLE_CARDS);
// i2h uses h and sets temp[i] to the 2 cards implied by h
i2h(&temp[i], h);
// reject collisions
for (n = 0; n < i; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]) || try_again;
}
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
My problem? The weighted sampling algorithm is a factor of 10 slower. Speed is very important for my application.
Is there a way to improve the speed of my algorithm to something more reasonable? Am I doing something wrong in my implementation?
Thanks.
edit: I was asked about this function, which I should have posted, since it is key
inline int weighted_randi(double *w, int num_choices) {
double r = fast_randd();
double threshold = 0;
int n;
for (n = 0; n < num_choices; n++) {
threshold += *w;
if (r <= threshold) return n;
w++;
}
// shouldn't get this far
cerr << n << "\t" << threshold << "\t" << r << endl;
assert(n < num_choices);
return -1;
}
...and i2h() is basically just an array lookup.
Your reject collisions are turning an O(n) algorithm into (I think) an O(n^2) operation.
There are two ways to select cards from a deck: shuffle and pop, or pick sets until the elements of the set are unique; you are doing the latter which requires a considerable amount of backtracking.
I didn't look at the details of the code, just a quick scan.
you could gain some speed by replacing the all the loops that check if a card is taken with a bit mask, eg for a pool of 52 cards, we prevent collisions like so:
DWORD dwMask[2] = {0}; //64 bits
//...
int nCard;
while(true)
{
nCard = rand_52();
if(!(dwMask[nCard >> 5] & 1 << (nCard & 31)))
{
dwMask[nCard >> 5] |= 1 << (nCard & 31);
break;
}
}
//...
My guess would be the memcpy(1326*sizeof(double)) within the retry-loop. It doesn't seem to change, so should it be copied each time?
Rather than tell you what the problem is, let me suggest how you can find it. Either 1) single-step it in the IDE, or 2) randomly halt it to see what it's doing.
That said, sampling by rejection, as you are doing, can take an unreasonably long time if you are rejecting most samples.
Your inner "try_again" for loop should stop as soon as it sets try_again to true - there's no point in doing more work after you know you need to try again.
for (n = 0; n < i && !try_again; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]);
}
Answering the second question about picking from a weighted set also has an algorithmic replacement that should be less time complex. This is based on the principle of that which is pre-computed does not need to be re-computed.
In an ordinary selection, you have an integral number of bins which makes picking a bin an O(1) operation. Your weighted_randi function has bins of real length, thus selection in your current version operates in O(n) time. Since you don't say (but do imply) that the vector of weights w is constant, I'll assume that it is.
You aren't interested in the width of the bins, per se, you are interested in the locations of their edges that you re-compute on every call to weighted_randi using the variable threshold. If the constancy of w is true, pre-computing a list of edges (that is, the value of threshold for all *w) is your O(n) step which need only be done once. If you put the results in a (naturally) ordered list, a binary search on all future calls yields an O(log n) time complexity with an increase in space needed of only sizeof w / sizeof w[0].