I'm implementing minimax for a small game and am noticing something that I'm calling "procrastination". Boiled down to a very simple example:
In a capture-the-flag game, the flag is one square UP from player A, and player B is 50 spaces away. It's A's turn, and he can search 6 moves ahead. What I'm seeing is that all possible moves have a value of "Win" since A knows he can get to the flag before B even if he doesn't grab it immediately. So if UP is the last move in the ordering, he'll just go LEFT and RIGHT for a while until B is within striking distance and then he has to finally get the flag.
At first the behavior looked like a bug, but stepping through it I convinced myself that each move really is a "Win"; the behavior is just not good. I could influence the evaluation by making a flag captured 4 moves from now less valuable than a flag captured now, but I wondered if there was an aspect of the minimax search that I'm missing. Is there any concept of a high score obtained earlier being more desirable than an equally high score obtained only later?
There's nothing in the minimax search itself that makes winning sooner preferable. Since all terminal positions evaluate to the same score, the algorithm effectively chooses among them arbitrarily. Make your evaluation function decrease the winning score slightly for each level deeper in the tree where it is called, and minimax will choose to win sooner.
Related
I'm exploring a tree of moves for a match-3 type game (with horizontal piece swapping), using a DFS to a depth of 4 moves ahead (because working towards bigger clear combos is much higher scoring). I'm evaluating each possible move of each board to look for the best overall next move to maximise score.
I'm representing the game board as a vector<char> with 100 chars per board; however, my current DFS implementation doesn't check whether a given board state has already been evaluated (there are multiple move sequences that can lead to the same board state).
Given that each board has 90 possible moves, a DFS search to depth 4 evaluates 65,610,000 board states (in 100ms), and therefore I'd assume it's impractical to store each board state for the sake of DFS pruning to avoid re-evaluating a given board state. However, I recognise I could significantly prune the search tree if I avoid re-evaluation of previously evaluated states.
Is there an efficient and/or memory-conserving method to avoid re-evaluating previously evaluated board states in this DFS?
I'm fairly new to game theory and have only understood the normal Nim game, where you remove stones from piles with no restrictions and the last player to remove a stone wins. But then I came across a nice problem while reading the game theory tutorial on Topcoder. The gist is as below:
You and a friend are playing a game in which you take turns removing stones from piles. Initially, every pile has at least as many stones as the pile to its left. This property must be maintained throughout the game. On each turn, you remove one or more stones from a single pile. You and your friend alternate turns until it is no longer possible to make a valid move. The last player to have made a move wins the game. Note that if you remove all the stones from a pile, it is still considered a pile.
You are said to have made a "winning move" if after making that move, you can eventually win no matter what your friend does. You are given a int[] piles representing the number of stones in each pile from left to right. It is your turn to move. Find a winning move and return it as a String formatted as "TAKE s STONES FROM PILE k" (quotes for clarity only), where s and k (a 0-based index) are each integers with no leading zeros. If there are multiple winning moves, choose the one that minimizes s. If there is still a tie, choose the one that minimizes k. If no winning move is possible, return the String "YOU LOSE" (quotes for clarity only).
Removing stones here comes with a condition: you need to maintain the overall non-decreasing order, which is becoming a roadblock for me in coming up with the logic. I tried reading the editorial, but unfortunately couldn't grasp the idea behind it. Can anyone please explain the solution in simpler terms?
The editorial does not explain how to solve the original game of Nim, but only provides a link to the Wikipedia page (where the solution can be found).
The editorial just explains how to map the Topcoder problem to that of a regular game of Nim:
First, the game can be transformed into one where each pile holds the difference between adjacent original piles (so the 3 6 6 example becomes 3 3 0).
Then the order of the piles is reversed (so the example becomes 0 3 3).
Then a move in this new game becomes a two-step process: remove stones from one pile and add them to the previous one (in the example, the winning move takes 3 from the last pile and adds them to the middle pile, giving 0 6 0).
Then if you just look at the odd-numbered piles (#1, #3, #5, etc), you get the regular game of Nim, and can apply a documented algorithm on that (so 0 3 3 is the same as a Nim position of 0 3).
The given explanation for that is thus:
any move on an odd-numbered pile becomes just like a move in the normal game of Nim;
any move on an even-numbered pile can be negated by just moving the same number of stones from the receiving odd-numbered pile to the next even-numbered pile (so the same losing position can be imposed again on the player).
void Form(int N, char pegA, char pegB, char pegC) {
    if (N == 1)
        cout << "move top disk on peg " << pegA << " to peg " << pegC << endl;
    else {
        Form(N - 1, pegA, pegC, pegB);
        cout << "move top disk on peg " << pegA << " to peg " << pegC << endl;
        Form(N - 1, pegB, pegA, pegC);
    }
}
This is a recursive algorithm for the Tower of Hanoi game. Can this be considered a form of depth-first search? If not, what is it? Thanks.
This isn't a depth-first search, because we know exactly which move to make at each step; there is no choice involved, and therefore nothing to search.
Think about what we do in a depth-first search: we go deeper trying to find the correct way forward, and we may have to back out of dead ends. Here, can you point to a single move that was unnecessary? There isn't one.
So this is a plain recursive approach: we solve smaller instances of the problem, then construct the solution for the larger one directly from the smaller ones' results. That's it.
This is not a depth-first search.
It is a recursive tree traversal.
The logic of which move to make is built into the algorithm, and it cleverly uses recursion to handle each sub-tower transition. There is no searching here.
The algorithm works in the following way:
End of recursion:
when there is one disk remaining, move it to the destination peg
this is the end of each sub-tower move
Recursive logic:
move all disks above the one we are interested in to another peg
output the move command for the disk we are interested in
move all the disks we moved before on top of the disk we just moved.
It does this through clever (and confusing) manipulation of the peg references: it remaps which pegs are the source and destination when it makes the recursive calls.
Another thing to note:
There is no persistent state in this algorithm. The only state each layer cares about is the value of n, and then only whether it is 1 or not. On function entry, the position of the current disk (on the start peg) and of all disks above it (also on the start peg) is known, and the state of any disks below the current one is irrelevant.
This is why nothing needs to actually be moved: since the state is known, the knowledge of how it changes can be encoded into the algorithm itself (and it is, via the changed order of the pegs passed into the recursive calls). There is no need to base decisions on the current state, and so no need to maintain one.
I started learning Pascal on my own and I want to create a program that calculates the number of possible moves for a knight ("horse") to make in 2 moves if I know its starting position. I know how to do it: first I check whether it can move in each of the 8 directions (2 up and 1 left, 2 up and 1 right, 2 left and 1 up, ...), and for each possible move I do the same check again for the second move. But then I would have the same code I used to calculate the first move repeated 8 times. Sorry if it's a dumb question; if you can point me to a tutorial on this, I'd appreciate it. I just started learning and I don't even know if this is possible.
You can create a Function for your calculation. It could return the number of possible moves from a given location.
Something like this:
Function calcMoves(pos : Position) : integer;
Begin
  ...logic...
  calcMoves := theNumberOfMovesThatAreLegalFromPos;
End;
Sorry if the syntax is off, it's been a while since I did stuff in Pascal.
However, the idea is that you can now reuse the calculation.
What might be even better is a function that returns not only the number of allowed moves but also the resulting positions; you would need to return some kind of array or collection. That way you could call the function once with the starting position as the argument, then iterate over all possible positions after that move, using the end position of each first move as the start position for the second move. That's what I would do, but I really can't remember how collections work in Pascal.
I have been pulling my hair out trying to figure out how the minimax algorithm, and hopefully alpha-beta pruning, work. I'm confused by the recursion that occurs.
Firstly, does each intermediate board get scored, or only terminal game boards?
Secondly, what exactly is being returned, and how does the program know where to place the next move? I see that I'm supposed to return the board score (in tic-tac-toe: -1, 0, 1), but how does the program know which move should be played next?
I have tried finding a simple C or C++ program to demonstrate this, but I haven't had much luck. I am trying to learn this algorithm so I can create a presentation for the rest of my computer programming class.
Thanks a lot!
V
Only terminal positions are scored (after a quiescence search, if you use one). Non-terminal positions compare the score returned by a recursive minimax() call to the best score found so far. In the case of alpha-beta, the returned score is also compared to the alpha value.
The point of minimax is to produce a score. Your mistake appears to be thinking that the minimax search function needs to return the best move. It can be coded that way, but it may be simpler for you to instead have a top-level loop in another function that executes a move, uses minimax() to produce a score, and then un-executes the move. Keep track of the move with the best score and return that move when the loop completes or the time to choose a move runs out.