I'm implementing the search function of my chess engine using Negamax with alpha-beta pruning. However, it often misses forced checkmates.
(Note: "Mate in X" counts whole turns, while "depth" and "move(s)" below are counted in half moves, i.e. plies.)
Example
The position with the following FEN: 1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1 has a Mate in 3 (depth of 5 to the algorithm).
The line goes 1...Qd1+ 2.Kxd1 Bg4+ 3.Kc1 (or Ke1, it doesn't matter) Rd1#.
It can spot the checkmate from 1 move away, but fails at higher depths.
Possible Causes
It could be a typo, a misused type, or even a complete misunderstanding of the method; all of those have happened to me before.
Simplified Code
I've made some parts of the code easier to read (e.g. removed std::, turned multiple lines into a single function call).
This shouldn't change the functionality, though.
Root Call
pieceMove searchBestMove (gameState currentState, int depth) {
    //Calls the Negamax search
    pieceColor sideToMove = whoseTurnIsIt();
    vector<pieceMove> moveList = generateLegalMoves(currentState, sideToMove);
    pieceMove bestMove;
    signed int bestEval = numeric_limits<signed int>::max();
    for (const auto move : moveList) {
        signed int evaluation = negaMax(applyMove(currentState, move), numeric_limits<signed int>::min(), numeric_limits<signed int>::max(), depth - 1, 1);
        if (evaluation < bestEval) {
            bestMove = move;
            bestEval = evaluation;
        }
    }
    return bestMove;
}
Search Function
signed int negaMax (gameState currentState, signed int alpha, signed int beta, int depth, int rootDepth) {
    //Main Negamax search
    //Terminal node
    if (depth == 0) {
        return evaluates(currentState); //Replace this line with the one below to enable the extended search
        //return quiescenceSearch(currentState, alpha, beta);
    }
    //Mate distance pruning
    signed int mateDistScore = numeric_limits<signed int>::max() - rootDepth;
    alpha = max(alpha, -mateDistScore);
    beta = min(beta, mateDistScore - 1);
    if (alpha >= beta) return alpha;
    vector<pieceMove> moveList = generateLegalMoves(currentState);
    //If no moves are allowed, then it's either checkmate or stalemate
    if (moveList.size() == 0) return evaluates(currentState);
    orderMoves(currentState, moveList);
    for (const auto move : moveList) {
        signed int score = -negaMax(applyMove(currentState, move), -beta, -alpha, depth - 1, rootDepth + 1);
        if (score >= beta) return beta; //Beta cutoff
        alpha = max(score, alpha);
    }
    return alpha;
}
Extended Search
signed int quiescenceSearch (gameState currentState, signed int alpha, signed int beta) {
    //Searches only captures
    //Terminal node
    int evaluation = evaluates(currentState);
    if (evaluation >= beta) return beta;
    alpha = max(alpha, evaluation);
    vector<pieceMove> moveList = generateCaptureMoves(currentState);
    //If no moves are allowed, then it's either checkmate or stalemate
    if (moveList.size() == 0) return evaluates(currentState);
    orderMoves(currentState, moveList);
    for (const auto move : moveList) {
        signed int score = -quiescenceSearch(applyMove(currentState, move), -beta, -alpha);
        if (score >= beta) return beta; //Beta cutoff
        alpha = max(score, alpha);
    }
    return alpha;
}
I think you need to call quiescenceSearch when the depth reaches 0 in negaMax. You also need to consider checks in quiescenceSearch along with captures, since they are not quiet moves. Also, mate distance pruning only works when mating positions are properly scored (https://www.chessprogramming.org/Mate_Distance_Pruning#Mating_Value), so checking whether your evaluation function scores checkmate and stalemate correctly could also help.
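To illustrate what "properly scored" means here: one common convention (a sketch, not the asker's code) is to give checkmate a large fixed value minus the distance from the root, so that shorter mates score better, and to derive the mate distance pruning bounds from the same constant. Reusing the names from the simplified code above and assuming a hypothetical isInCheck helper:
//Sketch only: scoring a node with no legal moves relative to the root distance.
//MATE_SCORE is an assumed constant; the mate distance pruning bounds in negaMax
//would then need to use the same constant instead of numeric_limits<int>::max().
const signed int MATE_SCORE = 1000000;

signed int scoreNoLegalMoves (const gameState& currentState, int rootDepth) {
    if (isInCheck(currentState))            //assumed helper: side to move is in check
        return -MATE_SCORE + rootDepth;     //checkmated: a mate closer to the root is worse
    return 0;                               //stalemate: draw
}
With something like that in place, the moveList.size() == 0 branch in negaMax would return scoreNoLegalMoves(currentState, rootDepth) rather than the static evaluation, so a mate in 3 outranks a mate in 5.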
Related
I am writing a 3D tic-tac-toe game using the minimax algorithm with alpha-beta pruning, but the algorithm doesn't give an optimal solution: it just picks the next available move from the winning states without taking into account what is on the board, meaning it won't block my moves.
This is the code:
int Game::miniMax(char marker, int depth, int alpha, int beta){
    // Initialize best move
    bestMove = std::make_tuple(-1, -1, -1);
    // If we hit a terminal state (leaf node), return the best score and move
    if (isBoardFull() || getBoardState('O') != 0 || depth > 5)
        return getBoardState('O');
    auto allowedMoves = getAllowedMoves();
    for (int i = 0; i < allowedMoves.size(); i++){
        auto move = allowedMoves[i];
        board[std::get<0>(move)][std::get<1>(move)][std::get<2>(move)] = marker;
        // Maximizing player's turn
        if (marker == 'O'){
            int bestScore = INT32_MIN;
            int score = miniMax('X', depth + 1, alpha, beta);
            // Get the best scoring move
            if (bestScore <= score){
                bestScore = score - depth * 10;
                bestMove = move;
                // Check if this branch's best move is worse than the best
                // option of a previously searched branch. If it is, skip it
                alpha = std::max(alpha, bestScore);
                board[std::get<0>(move)][std::get<1>(move)][std::get<2>(move)] = '-';
                if (beta <= alpha){
                    break;
                }
            }
        } // Minimizing opponent's turn
        else{
            int bestScore = INT32_MAX;
            int score = miniMax('O', depth + 1, alpha, beta);
            if (bestScore >= score){
                bestScore = score + depth * 10;
                bestMove = move;
                // Check if this branch's best move is worse than the best
                // option of a previously searched branch. If it is, skip it
                beta = std::min(beta, bestScore);
                board[std::get<0>(move)][std::get<1>(move)][std::get<2>(move)] = '-';
                if (beta <= alpha){
                    break;
                }
            }
        }
        board[std::get<0>(move)][std::get<1>(move)][std::get<2>(move)] = '-'; // Undo move
    }
    if (marker == 'O')
        return INT32_MIN;
    else
        return INT32_MAX;
}
What do I need to change to make it work?
I tried other ways to implement minimax, but they don't give an optimal solution, or any solution at all.
The limit on depth is there because the search is too slow at bigger depths, but even then it doesn't find a solution. Increasing the constant that multiplies the depth in the score only slows the program down further.
I am trying to implement threefold repetition detection in my chess engine, but the method I use leads to incorrect play. I check for one repetition of the position in the current search space, or for two repetitions "behind" the root. This approach is also used by Stockfish, but it makes my engine weaker (its Elo rating drops significantly in testing).
Each position in the game history has a Zobrist hash key.
Threefold repetition detection:
bool Eval::isThreefoldRepetition() const {
    const std::deque<GameState>& history = internal_board.getHistory();
    int repetitions = 0;
    uint64_t hash = internal_board.getGameState().zobrist_key;
    for (int k = history.size() - 2; k >= 0; k -= 2) {
        if (history[k].zobrist_key == hash) {
            if (k >= root) // found repetition in current search space
                return true;
            // else we need 2 repetitions that happen before the root
            repetitions++;
            if (repetitions == 2)
                return true;
        }
    }
    return false;
}
Search:
int Eval::negamax(int depth, int alpha, int beta, Color color) {
    if (isThreefoldRepetition())
        return threefold_repetition;
    uint64_t hash = internal_board.getGameState().zobrist_key;
    int alphaOrig = alpha;
    Transposition node = TranspositionTable::getInstance().getTransposition(hash);
    if (node.getType() != NodeType::Null && node.getDepth() >= depth) {
        if (node.getType() == NodeType::Exact)
            return node.getValue();
        else if (node.getType() == NodeType::LowerBound)
            alpha = std::max(alpha, node.getValue());
        else if (node.getType() == NodeType::UpperBound)
            beta = std::min(beta, node.getValue());
        if (alpha >= beta)
            return node.getValue();
    }
    if (depth == 0) {
        return getHeuristicScore(color);
    }
    std::vector<Move> moves = movegen.getAllMoves();
    if (moves.size() == 0) {
        if (movegen.isInCheck(color))
            return checkmate - depth; // try to delay the checkmate
        return stalemate;
    }
    int best = -infinity;
    Move best_move;
    setRating(moves);
    std::sort(moves.begin(), moves.end(), compare);
    for (const Move &move : moves) {
        if (!hasTimeLeft()) {
            premature_stop = true;
            break;
        }
        internal_board.makeMove(move);
        int down = -negamax(depth - 1, -beta, -alpha, getOpposite(color));
        internal_board.undoLastMove();
        if (down > best) {
            best = down;
            best_move = move;
        }
        alpha = std::max(alpha, best);
        if (alpha >= beta)
            break;
    }
    if (!premature_stop) {
        // save the position in the transposition table
        NodeType node_type;
        if (best <= alphaOrig) // did not affect the score
            node_type = NodeType::UpperBound;
        else if (best >= beta)
            node_type = NodeType::LowerBound;
        else
            node_type = NodeType::Exact;
        TranspositionTable::getInstance().addEntry(Transposition(node_type, hash, depth, best, best_move));
    }
    return best;
}
Notice that I can't check for a 'real' threefold repetition because a position will not be repeated thrice in the search space when using transposition tables.
Is my method of evaluation wrong?
I am trying to understand the code of the fpaq0 arithmetic compressor, but I am not able to fully understand it. Here is the link to the code: fpaq0.cpp
I don't understand exactly how ct[512][2] and cxt work. I am also not very clear on how the decoder works, or why e.encode(0) is called before encoding every character.
NOTE: I have understood the arithmetic coder presented in the link Data Compression with Arithmetic Encoding.
void update(int y) {
    if (++ct[cxt][y] > 65534) {
        ct[cxt][0] >>= 1;
        ct[cxt][1] >>= 1;
    }
    if ((cxt += cxt + y) >= 512)
        cxt = 1;
}

// Assume a stationary order 0 stream of 9-bit symbols
int p() const {
    return 4096 * (ct[cxt][1] + 1) / (ct[cxt][0] + ct[cxt][1] + 2);
}
inline void Encoder::encode(int y) {
    // Update the range
    const U32 xmid = x1 + ((x2 - x1) >> 12) * predictor.p();
    assert(xmid >= x1 && xmid < x2);
    if (y)
        x2 = xmid;
    else
        x1 = xmid + 1;
    predictor.update(y);

    // Shift equal MSB's out
    while (((x1 ^ x2) & 0xff000000) == 0) {
        putc(x2 >> 24, archive);
        x1 <<= 8;
        x2 = (x2 << 8) + 255;
    }
}

inline int Encoder::decode() {
    // Update the range
    const U32 xmid = x1 + ((x2 - x1) >> 12) * predictor.p();
    assert(xmid >= x1 && xmid < x2);
    int y = 0;
    if (x <= xmid) {
        y = 1;
        x2 = xmid;
    }
    else
        x1 = xmid + 1;
    predictor.update(y);

    // Shift equal MSB's out
    while (((x1 ^ x2) & 0xff000000) == 0) {
        x1 <<= 8;
        x2 = (x2 << 8) + 255;
        int c = getc(archive);
        if (c == EOF) c = 0;
        x = (x << 8) + c;
    }
    return y;
}
fpaq0 is a file compressor which uses an order-0 bitwise model for modeling and a 12-bit carry-less arithmetic coder for the entropy coding stage. ct[512][2] stores the counters for each context, which are used to compute the symbol probabilities. The context (order-0 in fpaq0) is built from the partial bits of the current byte with a leading one (to simplify the calculations).
To make the explanation easier, let's skip the EOF symbol for now. Without it, the order-0 context is computed as follows (simplified):
// Full byte encoding
int cxt = 1; // context starts with a leading one
for (int i = 0; i < 8; ++i) {
    // Encoding part
    int y = ReadNextBit();
    int p = GetProbability(cxt);
    EncodeBit(y, p);
    // Model updating
    UpdateCounter(cxt, y); // update the related counter
    cxt = (cxt << 1) | y;  // shift left and insert the new bit
}
For decoding, the context is used in the same way (simplified, without the EOF symbol):
// Full byte decoding
int cxt = 1; // context starts with a leading one
for (int i = 0; i < 8; ++i) {
    // Decoding part
    int p = GetProbability(cxt);
    int y = DecodeBit(p);
    WriteBit(y);
    // Model updating
    UpdateCounter(cxt, y); // update the related counter
    cxt = (cxt << 1) | y;  // shift left and insert the new bit
}
fpaq0 is designed as a streaming compressor, meaning it doesn't need to know the exact length of the input stream. So how does the decoder know when to stop? The EOF symbol is used exactly for that. Before each byte is encoded, a zero bit is encoded as a flag to indicate that more data follows; a one indicates that the end of the stream has been reached, so the decoder knows when to stop. That's also why the context model is 9 bits (EOF flag + 8 data bits).
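Roughly, the encoder's outer loop therefore looks like the sketch below (a simplification written from the description above, not the exact fpaq0 source; the names in, e, and the bit order are assumptions):
// Sketch of the framing loop: one EOF flag bit per byte, then the 8 data bits.
int c;
while ((c = getc(in)) != EOF) {
    e.encode(0);                    // flag bit 0: another byte follows
    for (int i = 7; i >= 0; --i)
        e.encode((c >> i) & 1);     // the byte itself, most significant bit first
}
e.encode(1);                        // flag bit 1: end of stream
The decoder mirrors this: it first decodes the flag bit and stops when it reads a 1; otherwise it decodes 8 more bits to rebuild the byte.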
Now, the last part: probability calculation. fpaq0 uses just the counts of past symbols under the order-0 context to compute the final probability.
n0 = count of 0
n1 = count of 1
p = n1 / (n0 + n1)
There are two implementation details that have to be addressed: counter overflow and division by zero.
Counter overflow is handled by halving both counts when one of them reaches a threshold. Since only the ratio p matters, halving roughly preserves it.
Division by zero is handled by adding one to each count in the formula:
p = (n1 + 1) / ((n0 + 1) + (n1 + 1))
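For illustration only, this smoothed estimate is exactly what the p() member quoted above computes, scaled to 12 bits:
// Illustrative stand-alone helper (not part of fpaq0): counts -> 12-bit probability of a 1 bit.
// The +1 on each count is the smoothing that avoids division by zero.
int probabilityOfOne(int n0, int n1) {
    return 4096 * (n1 + 1) / (n0 + n1 + 2);  // e.g. n0 = 3, n1 = 1 gives 4096 * 2 / 6 = 1365, roughly 1/3
}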
I have an m*n table in which each entry has a value.
The start position is the top left corner, and I can move right or down until I reach the lower right corner.
I want a path such that, when I multiply the numbers along it, the product has the minimum number of trailing zeros.
example:
1 2 100
5 5 4
possible paths :
1*2*100*4=800
1*2*5*4= 40
1*5*5*4= 100
Solution: 1*2*5*4 = 40, because 40 has 1 trailing zero while the other paths have 2.
The easiest way is to use DFS and evaluate all paths, but that is not efficient.
I'm looking for an optimal substructure so I can solve it with dynamic programming.
After thinking for a while I came up with this recurrence:
T(i,j) = CountZeros(T(i-1,j)*table[i,j]) < CountZeros(T(i,j-1)*table[i,j]) ?
T(i-1,j)*table[i,j] : T(i,j-1)*table[i,j]
Code :
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <limits>

using namespace std;
using Table = vector<vector<int>>;

const int rows = 2;
const int cols = 3;
Table memo(rows, vector<int>(cols, -1));

int CountZeros(int number)
{
    if (number < 0)
        return numeric_limits<int>::max();
    int res = 0;
    while (number != 0)
    {
        if (number % 10 == 0)
            res++;
        else break;
        number /= 10;
    }
    return res;
}

int solve(int i, int j, const Table& table)
{
    if (i < 0 || j < 0)
        return -1;
    if (memo[i][j] != -1)
        return memo[i][j];
    int up = solve(i - 1, j, table) * table[i][j];
    int left = solve(i, j - 1, table) * table[i][j];
    memo[i][j] = CountZeros(up) < CountZeros(left) ? up : left;
    return memo[i][j];
}

int main()
{
    Table table =
    {
        { 1, 2, 100 },
        { 5, 5, 4 }
    };
    memo[0][0] = table[0][0];
    cout << solve(1, 2, table);
}
But it is not optimal (for example, on the table above it gives 100).
Any idea for a better optimal substructure? Can I solve it with dynamic programming?
Let's reconsider the Bellman optimality equation for your task. I consider this a systematic approach to such problems (whereas I often don't understand DP one-liners). My reference is the book by Sutton and Barto.
The state your system is in can be described by a triple of integers (i,j,r) (modeled as a std::array<int,3>). Here, i and j denote the column and row in your rectangle M = m_{i,j}, whereas r denotes the multiplication result accumulated so far.
Your actions in state (i,j,r) are: go right, which ends in state (i, j+1, r*m_{i,j+1}), or go down, which leads to state (i+1, j, r*m_{i+1,j}).
Then, the Bellman equation is given by
v(i,j,r) = min{ NullsIn(r*m_{i+1,j}) - NullsIn(r) + v(i+1, j, r*m_{i+1,j}),
                NullsIn(r*m_{i,j+1}) - NullsIn(r) + v(i, j+1, r*m_{i,j+1}) }
The rationale behind this equation is the following: NullsIn(r*m_{i+1,j}) - NullsIn(r) is the number of zeros you add by taking one of the two actions, i.e. the instant penalty, and v(i+1, j, r*m_{i+1,j}) is the number of zeros in the state that action leads to. One then takes the action which minimizes the sum of both contributions.
What you further need is only a function int NullsIn(int) which returns the number of trailing zeros of a given integer. Here is my attempt:
int NullsIn(int r)
{
    int ret = 0;
    for (int j = 10; j <= r; j *= 10)
    {
        if ((r / j) * j == r)
            ++ret;
    }
    return ret;
}
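As a quick sanity check on the numbers from the question (my own addition, not part of the original answer):
#include <cassert>

int main()
{
    assert(NullsIn(800) == 2);  // 800 ends in two zeros
    assert(NullsIn(40) == 1);   // 40 ends in one zero
    assert(NullsIn(7) == 0);    // no trailing zeros
}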
For convenience I further defined a NullsDifference function:
int NullsDifference(int r, int m)
{
    return NullsIn(r * m) - NullsIn(r);
}
Now, one has to do a backwards iteration starting from the initial state in the right bottom element of the matrix.
int backwardIteration(std::array<int,3> state, std::vector<std::vector<int> > const& m)
{
    static std::map<std::array<int,3>, int> memoization;
    auto it = memoization.find(state);
    if (it != memoization.end())
        return it->second;

    int i = state[0];
    int j = state[1];
    int r = state[2];
    int ret = 0;
    if (i > 0 && j > 0)
    {
        int inew = i - 1;
        int jnew = j - 1;
        ret = std::min(NullsDifference(r, m[inew][j]) + backwardIteration({inew, j, r*m[inew][j]}, m),
                       NullsDifference(r, m[i][jnew]) + backwardIteration({i, jnew, r*m[i][jnew]}, m));
    }
    else if (i > 0)
    {
        int inew = i - 1;
        ret = NullsDifference(r, m[inew][j]) + backwardIteration({inew, j, r*m[inew][j]}, m);
    }
    else if (j > 0)
    {
        int jnew = j - 1;
        ret = NullsDifference(r, m[i][jnew]) + backwardIteration({i, jnew, r*m[i][jnew]}, m);
    }
    memoization[state] = ret;
    return ret;
}
This routine is called via
int main()
{
    int ncols = 2;
    int nrows = 3;
    std::vector<std::vector<int> > m = {{1, 2, 100}, {5, 5, 4}};
    std::array<int,3> initialState = {ncols - 1, nrows - 1, m[ncols - 1][nrows - 1]};
    std::cout << "Minimum number of zeros: " << backwardIteration(initialState, m) << "\n" << std::endl;
}
For your array, it prints out the desired 1 for the number of zeros.
Here is a live demo on Coliru.
EDIT
Here is an important point: in production, you usually don't call backwardIteration the way I did, because it takes an exponentially increasing number of recursive calls. Rather, you start at the top left, call it once and store the result. Next you go right and down, each time calling backwardIteration and reusing the previously stored results. And so on.
In order to do this, one needs a memoization concept within the function backwardIteration, which returns the already stored result instead of invoking another recursive call.
I've added memoization to the function above. Now you can loop through the array from top left to bottom right in any way you like, but preferably take small steps, such as row by row, column by column, or rectangle by rectangle.
In fact, this and only this is the spirit of Dynamic Programming.
I'm trying to add Alpha Beta pruning into my minimax, but I can't understand where I'm going wrong.
At the moment I'm going through 5,000 iterations, whereas according to a friend I should be going through approximately 16,000. When choosing the first position it returns -1 (a loss), whereas it should be able to return 0 (a draw) at this point, since it should always be able to draw from an empty board. I can't see where I'm going wrong, though; when I step through my code it seems fine.
Strangely, if I swap the returning of Alpha and Beta inside my checks (to achieve returning 0), the computer will attempt to draw but never initiate any winning moves, only blocks.
My logical flow
If we are looking for alpha:
If the score > alpha, update alpha. If alpha and beta overlap, return alpha.
If we are looking for beta:
If the score < beta, update beta. If alpha and beta overlap, return beta.
Here is my
Recursive call
int MinimaxAB(TGameBoard* GameBoard, int iPlayer, bool _bFindAlpha, int _iAlpha, int _iBeta)
{
    //How is the position like for player (their turn) on iGameBoard?
    int iWinner = CheckForWin(GameBoard);
    bool bFull = CheckForFullBoard(GameBoard);

    //If the board is full or there is a winner on this board, return the winner
    if(iWinner != NONE || bFull == true)
    {
        //Will return 1 or -1 depending on winner
        return iWinner*iPlayer;
    }

    //Initial invalid move (just follows i in for loop)
    int iMove = -1;
    //Set the score to be instantly beaten
    int iScore = INVALID_SCORE;

    for(int i = 0; i < 9; ++i)
    {
        //Check if the move is possible
        if(GameBoard->iBoard[i] == 0)
        {
            //Put the move in
            GameBoard->iBoard[i] = iPlayer;

            //Recall function
            int iBestPositionSoFar = -MinimaxAB(GameBoard, Switch(iPlayer), !_bFindAlpha, _iAlpha, _iBeta);

            //Replace Alpha and Beta variables if they fit the conditions - stops checking for situations that will never happen
            if (_bFindAlpha == false)
            {
                if (iBestPositionSoFar < _iBeta)
                {
                    //If the beta is larger, make the beta smaller
                    _iBeta = iBestPositionSoFar;
                    iMove = i;

                    if (_iAlpha >= _iBeta)
                    {
                        GameBoard->iBoard[i] = EMPTY;
                        //If alpha and beta are overlapping, exit the loop
                        ++g_iIterations;
                        return _iBeta;
                    }
                }
            }
            else
            {
                if (iBestPositionSoFar > _iAlpha)
                {
                    //If the alpha is smaller, make the alpha bigger
                    _iAlpha = iBestPositionSoFar;
                    iMove = i;

                    if (_iAlpha >= _iBeta)
                    {
                        GameBoard->iBoard[i] = EMPTY;
                        //If alpha and beta are overlapping, exit the loop
                        ++g_iIterations;
                        return _iAlpha;
                    }
                }
            }

            //Remove the move you just placed
            GameBoard->iBoard[i] = EMPTY;
        }
    }

    ++g_iIterations;

    if (_bFindAlpha == true)
    {
        return _iAlpha;
    }
    else
    {
        return _iBeta;
    }
}
Initial call (when computer should choose a position)
int iMove = -1; //Invalid
int iScore = INVALID_SCORE;

for(int i = 0; i < 9; ++i)
{
    if(GameBoard->iBoard[i] == EMPTY)
    {
        GameBoard->iBoard[i] = CROSS;
        int tempScore = -MinimaxAB(GameBoard, NAUGHT, true, -1000000, 1000000);
        GameBoard->iBoard[i] = EMPTY;

        //Choosing best value here
        if (tempScore > iScore)
        {
            iScore = tempScore;
            iMove = i;
        }
    }
}
//returns a score based on Minimax tree at a given node.
GameBoard->iBoard[iMove] = CROSS;
Any help regarding my logical flow that would make the computer return the correct results and make intelligent moves would be appreciated
Does your algorithm work perfectly without alpha-beta pruning? Your initial call should pass false for _bFindAlpha, since the root node behaves like an alpha node, but it doesn't look like this will make a difference:
int tempScore = -MinimaxAB(GameBoard, NAUGHT, false, -1000000, 1000000);
Thus I recommend that you abandon this _bFindAlpha nonsense and convert your algorithm to negamax. It behaves identically to minimax but makes your code shorter and clearer. Instead of checking whether to maximize alpha or minimize beta, you can just swap and negate when recursively invoking (this is the same reason you can return the negated value of the function right now). Here's a slightly edited version of the Wikipedia pseudocode:
function negamax(node, α, β, player)
    if node is a terminal node
        return color * the heuristic value of node
    else
        foreach child of node
            val := -negamax(child, -β, -α, -player)
            if val ≥ β
                return val
            if val > α
                α := val
        return α
Unless you love stepping through search trees, I think that you will find it easier to just write a clean, correct version of negamax than debug your current implementation.
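To make the mapping concrete, here is a rough sketch of how that pseudocode could look on top of the helpers already in your code (CheckForWin, CheckForFullBoard, Switch). It is untested and leaves out returning the chosen move, so treat it as an illustration rather than a drop-in replacement:
//Sketch only: negamax with alpha-beta, reusing the question's helpers.
//Scores are always from the point of view of iPlayer, the side to move.
int NegamaxAB(TGameBoard* GameBoard, int iPlayer, int iAlpha, int iBeta)
{
    int iWinner = CheckForWin(GameBoard);
    if (iWinner != NONE || CheckForFullBoard(GameBoard))
        return iWinner * iPlayer;              //+1 win, -1 loss, 0 draw, as in your terminal check

    for (int i = 0; i < 9; ++i)
    {
        if (GameBoard->iBoard[i] != EMPTY)
            continue;

        GameBoard->iBoard[i] = iPlayer;
        int iVal = -NegamaxAB(GameBoard, Switch(iPlayer), -iBeta, -iAlpha);
        GameBoard->iBoard[i] = EMPTY;          //always undo the move before any early return

        if (iVal >= iBeta)
            return iVal;                       //beta cutoff
        if (iVal > iAlpha)
            iAlpha = iVal;                     //new best score for the side to move
    }
    return iAlpha;
}
Your root loop can stay as it is; each candidate call simply becomes -NegamaxAB(GameBoard, NAUGHT, -1000000, 1000000).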