Mahjong-solitaire solver algorithm, which needs a speed-up - c++

I'm developing a Mahjong-solitaire solver and so far it's going pretty well. However,
it is not as fast as I would like it to be, so I'm asking for any additional optimization
techniques you might know of.
All the tiles are known from the layouts, but the solution isn't. At the moment, I have a few
rules which guarantee safe removal of certain pairs of identical tiles (pairs whose removal cannot be an obstacle to a possible solution).
For clarity, a tile is free when it can be picked at any time, and a tile is loose when it doesn't block any other tiles at all.
If there are four free tiles available, remove them immediately.
If there are three tiles that can be picked up and at least one of them is a loose tile, remove the non-loose ones.
If there are three tiles that can be picked up and only one of them is merely free (the other two are loose), remove the free one and one random loose one.
If there are three loose tiles available, remove two of them (it doesn't matter which ones).
Since each tile exists exactly four times, if only two of a kind are left, remove them, since they're the only ones left.
My algorithm searches for a solution recursively in multiple threads. Once a branch is finished (it reaches a position with no more moves) without leading to a solution, it puts the position in a vector of bad positions. Every time a new branch is launched, it iterates over the bad positions to check whether that particular position has already been checked.
This process continues until a solution is found or all possible positions have been checked.
This works nicely on a layout which contains, say, 36 or 72 tiles. But when there are more,
this algorithm becomes pretty much useless due to the huge number of positions to search.
So, I ask you once more: do any of you have good ideas for more safe tile-removal rules, or for any other speed-up of the algorithm?
Very best regards,
nhaa123

I don't completely understand how your solver works. When you have a choice of moves, how do you decide which possibility to explore first?
If you pick an arbitrary one, it's not good enough - it's just brute-force search, basically. You might need to explore the "better branches" first. To determine which branches are "better", you need a heuristic function that evaluates a position. Then, you can use one of the popular heuristic search algorithms. Check these:
A* search
beam search
(Google is your friend)
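As a rough illustration of exploring the "better branches" first, here is a minimal sketch that orders candidate moves by a heuristic score and optionally keeps only the best few, as beam search does. The `Move` struct, the meaning of `score` and the function name `keepBest` are assumptions made for this example; they are not taken from the solver in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate move: a pair of matching tiles plus a heuristic score.
struct Move {
    int tileA;
    int tileB;
    int score;  // assumption: higher = more promising (e.g. number of tiles the move frees)
};

// Sorts the moves best-first and keeps at most `beamWidth` of them.
// The search would then recurse into the remaining moves in this order.
void keepBest(std::vector<Move>& moves, std::size_t beamWidth) {
    std::stable_sort(moves.begin(), moves.end(),
                     [](const Move& a, const Move& b) { return a.score > b.score; });
    if (moves.size() > beamWidth) {
        moves.resize(beamWidth);
    }
}
```

Note that cutting off moves makes the search incomplete: a solvable board may be reported as unsolved. Passing a `beamWidth` at least as large as the number of moves only reorders them and keeps the search exhaustive.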

Some years ago, I wrote a program that solves Solitaire Mahjongg boards by peeking. I used it to examine one million turtles (took a day or something on half a computer: it had two cores) and it appeared that about 2.96 percent of them cannot be solved.
http://www.math.ru.nl/~debondt/mjsolver.html
Ok, that was not what you asked, but you might have a look at the code to find some pruning heuristics in it that did not cross your mind thus far. The program does not use more than a few megabytes of memory.

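Instead of a vector containing the "bad" positions, use a set, which avoids the linear scan: a hash set such as std::unordered_set has constant lookup time on average, and a tree-based std::set has logarithmic lookup time.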
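A minimal sketch of such a store, assuming a position can be encoded as a bitset with one bit per tile (144 tiles in a full Mahjong set) and that several search threads share it. The names `Position` and `BadPositions` are made up for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Assumption: a position is fully described by which of the 144 tiles
// are still on the board (bit i set = tile i still present).
using Position = std::bitset<144>;

// Hypothetical thread-safe store for positions known to be dead ends.
class BadPositions {
public:
    // Returns true if the position was not stored before.
    bool insert(const Position& p) {
        std::lock_guard<std::mutex> lock(mutex_);
        return seen_.insert(p).second;
    }

    // Average constant-time lookup instead of a linear scan over a vector.
    bool contains(const Position& p) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return seen_.count(p) != 0;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return seen_.size();
    }

private:
    mutable std::mutex mutex_;
    // std::hash<std::bitset<N>> is provided by <bitset>.
    std::unordered_set<Position> seen_;
};
```

A single mutex is the simplest way to make this safe for multiple threads; if it turns out to be a bottleneck, the set could be split into several independently locked shards.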

If 4 tiles are visible but cannot be picked up, the other tiles around them have to be removed first. The path should use your rules to remove a minimum of tiles on the way towards these tiles, to open them up.
If tiles are hidden by other tiles, the problem does not have full information to find a path, and a probability for the remaining tiles needs to be calculated.
Very nice problem!

Related

What is the difference between state evaluation and heuristics in game-AI?

I am trying to implement a minimax algorithm for an AI player within a simple card game. However, after doing some research I am confused about the key differences between state evaluation and heuristics.
From what I understand, heuristics are calculated from the information currently available to the player (e.g. in chess, the pieces and their locations). With this information, the player comes to a conclusion based on a heuristic function, which essentially provides a "rule of thumb".
A state evaluation is the exact value of the current state.
However, I am unsure why both things co-exist, as I cannot see how they are much different from one another. Could someone please elaborate and clear up my confusion? Thanks.
Assuming a zero-sum game, you can implement a state evaluation for end states (game ended with a win, draw or loss from the perspective of player X) which returns 1, 0 or -1. A full tree search will then give you perfect play.
But in practice the tree is huge and can't be searched completely. Therefore you have to stop the search at some point that is not an end state, where there is no determined winner or loser. It is hard to mark such a state with 1, 0 or -1, as the game might be too complex to easily determine the winner from a state far away from the end state. But you still need to evaluate these positions, and you can use some assumptions about the game, which amount to heuristic information. One example is material value in chess (a queen is more valuable than a pawn). This is heuristic information incorporated into the imperfect evaluation function (an approximation of the real one). The better your assumptions/heuristics, the better the approximation of the real evaluation!
But there are other parts where heuristic information can be incorporated. One very important area is controlling the tree search: which move will be evaluated first, and which last. Selecting good moves first allows algorithms like alpha-beta to prune huge parts of the tree. But of course you need some assumptions/heuristic information to order your moves (e.g. a queen move is more powerful than a pawn move; this is a made-up example, I'm not sure about the effect of this heuristic in chess AIs).
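To make the distinction concrete, here is a small sketch of a depth-limited negamax search (a compact form of minimax) on a toy game. The game is an assumption chosen for brevity and is not the card game from the question: players alternately take 1 to 3 stones, and whoever takes the last stone wins. End states get their exact value, while states reached at the depth limit are scored by a caller-supplied heuristic.

```cpp
#include <algorithm>
#include <functional>

// All values are from the perspective of the player to move and lie in [-1, 1].

// Exact state evaluation, only defined for end states:
// no stones left means the player to move has lost.
double terminalValue() { return -1.0; }

// Depth-limited negamax. `heuristic` estimates the value of a non-terminal
// state when the search has to stop before reaching the end of the game.
double negamax(int stones, int depth, const std::function<double(int)>& heuristic) {
    if (stones == 0) return terminalValue();   // exact value
    if (depth == 0) return heuristic(stones);  // approximation
    double best = -1.0;
    for (int take = 1; take <= std::min(3, stones); ++take) {
        // The opponent's value of the resulting state, negated.
        best = std::max(best, -negamax(stones - take, depth - 1, heuristic));
    }
    return best;
}
```

With a large enough depth the heuristic is never consulted and the result is exact. With a small depth the result is only as good as the heuristic, which is the trade-off described above.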

2D Tower Defense - Units stacking on top of each other

I'm currently implementing a 2D top down Tower Defense game. For the pathfinding I've used a Breadth-First-Search backwards from the goal. Everything works quite fine, though my units all follow the exact same line and therefore might stack on top of each other.
For units of the same type, I can of course release them one after another, but if faster and slower units are mixed, the faster ones will "walk over" the slower ones, which looks quite weird.
In Fieldrunners 2, units walk around each other when they need to pass, which looks quite cool, though I imagine that this is quite complex to implement.
Do you have any idea how I can solve these issues / improve my game?
You could try looking into something known as steering behaviour. Use collision checks to determine when a unit is about to collide with something that they cannot pass through on a node that they should be able to pass through and use steering behaviour to avoid it.
This has the benefit of meaning that you don't have to constantly update and recalculate paths for all units, and so it is far more scalable.
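One common building block of steering behaviour is a separation force that pushes a unit away from nearby units. The following is a minimal sketch under assumed names (`Vec2`, `separation`) and an assumed linear falloff; a real game would combine it with the path-following direction.

```cpp
#include <cmath>
#include <vector>

struct Vec2 {
    float x;
    float y;
};

// Returns a vector pushing `self` away from every other unit closer than
// `radius`. The push grows linearly as the other unit gets closer.
Vec2 separation(const Vec2& self, const std::vector<Vec2>& others, float radius) {
    Vec2 force{0.0f, 0.0f};
    for (const Vec2& other : others) {
        const float dx = self.x - other.x;
        const float dy = self.y - other.y;
        const float dist = std::sqrt(dx * dx + dy * dy);
        if (dist > 0.0f && dist < radius) {
            const float strength = (radius - dist) / radius;  // between 0 and 1
            force.x += dx / dist * strength;
            force.y += dy / dist * strength;
        }
    }
    return force;
}
```

Each frame, the unit's velocity could be set to its path direction plus this force multiplied by a tunable weight, so that faster units drift around slower ones instead of walking over them.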

How to fetch patterns from a game board in a fast way?

For my current project I'm looking for an efficient way to structure and store the board information with its use for pattern matching in mind.
I'm having a square board, and for pattern matching, I'm using bitfields with 2 bits representing one field of the board. The patterns to match have a diamond shape, that could be centered around any possible field on the board. (so the center is not static, I need to be able to do it for any center)
Example of diamond area around O:
..X..
.XXX.
XXOXX
.XXX.
..X..
If parts of the diamond are outside the playing area, their bits will be set to 11. The diamonds can have differing radii; the example above has a radius of 2.
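For reference, a straightforward (not yet optimized) way to read such a diamond could look like the sketch below. It assumes a square board stored as one small integer per field with values 0 to 2, uses 11 for off-board fields as described, and packs the fields row by row including the center. The names `Board` and `readDiamond` are made up, and the single 64-bit result limits this version to a radius of at most 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr std::uint64_t OFF_BOARD = 3;  // binary 11

struct Board {
    int size;                         // board is size x size
    std::vector<std::uint8_t> cells;  // row-major, assumed values 0..2

    // Value of a field, or OFF_BOARD outside the playing area.
    std::uint64_t at(int row, int col) const {
        if (row < 0 || col < 0 || row >= size || col >= size) return OFF_BOARD;
        return cells[static_cast<std::size_t>(row) * size + col];
    }
};

// Packs the diamond around (row, col) into an integer, row by row,
// 2 bits per field, first field in the lowest bits.
// Radius 3 has 25 fields (50 bits), so radius must be <= 3 here.
std::uint64_t readDiamond(const Board& board, int row, int col, int radius) {
    std::uint64_t pattern = 0;
    int shift = 0;
    for (int dr = -radius; dr <= radius; ++dr) {
        const int half = radius - std::abs(dr);  // half-width of this row
        for (int dc = -half; dc <= half; ++dc) {
            pattern |= board.at(row + dr, col + dc) << shift;
            shift += 2;
        }
    }
    return pattern;
}
```

Larger radii would need a wider container such as `std::bitset` or an array of words, but the traversal stays the same.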
Another important thing for the efficiency of the system is, that I have to be able to quickly rotate/mirror the pattern into all 8 possible symmetries.
For this, it may be beneficial to actually NOT store the information of the central point in the pattern. As it is not required for my algorithm anyway, this may be a valuable timesaver, because some bit-shifting magic then becomes possible to quickly rotate/mirror the patterns.
As this kind of pattern matching has to be done at a high frequency, it can prove to be a severe bottleneck of my overall project when implemented badly.
When trying to come up with a good model for all this work, I figured there are 3 important key areas that require thought, which are of course tightly connected.
A. How is the data stored in the board implementation.
Currently this is done in a rather complicated manner, which would be too cumbersome to read from at such a high frequency. But it would be no problem or time loss to store and update the 2-bit data in any possible way for the entire board.
The easiest approach would be to store the entire board in a bitset with twice as many bits as the board has fields, so that each pair of bits represents the value of a single field. But there is no need to do it in a particular order or in only one bitset, even though at first it may look natural to do so.
Anyway, this is the part I'm most flexible about, as it can be done without performance issues in whatever way serves the other 2 critical parts of the problem best.
B. How is the data stored in the pattern.
This is already more difficult. As said, my intention is to store them in a bitset of the appropriate size, but there is the question of in what order.
There seem to be two ways, that quickly come to mind:
a) (this could be done with or without the central point C)
...0...
..123..
.45678.
9ABCDEF
.GHIJK.
..LMN..
...O...
b)
...0...
..N14..
.ML235.
KJI.678
.HFC9A.
..GDB..
...E...
If we are just talking about the patterns, b) seems clearly superior. A rotation of the pattern is done by a simple rotate-shift (3 bit ops in total per rotation), and even mirroring the pattern can be done with about a dozen bit ops. These kinds of operations are much more time-consuming with a).
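The rotate-shift could be sketched as follows, under the assumption that the fields are stored quadrant by quadrant and that every quadrant uses the same internal order, so that a 90-degree rotation moves each field exactly one quarter of the pattern further. The function name `rotate90` and the restriction to patterns of fewer than 64 bits are choices made for this example.

```cpp
#include <cstdint>

// Rotates a pattern by 90 degrees via a circular shift by one quarter of
// its bits.
// Assumptions: `totalBits` (2 bits per field, center excluded) is a
// multiple of 4 and lies between 4 and 60, and all bits of `pattern`
// above `totalBits` are zero.
std::uint64_t rotate90(std::uint64_t pattern, unsigned totalBits) {
    const unsigned quarter = totalBits / 4;
    const std::uint64_t mask = (std::uint64_t{1} << totalBits) - 1;
    return ((pattern << quarter) | (pattern >> (totalBits - quarter))) & mask;
}
```

For a radius-3 diamond without its center there are 24 fields, so `totalBits` is 48 and each rotation shifts by 12 bits. Applying the function four times returns the original pattern.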
But b) also has a severe drawback... And this leads to:
C. How is the data read from the board implementation to the pattern.
Looking at the 2 potential ways to order the pattern bits above, now a) is clearly superior. a) can be read with a handful of bit ops from a potential array, as discussed in A.: you bit-shift each line (getting the line by ANDing with a bitset that zeroes all other bits) to the appropriate place and put the lines together with some OR operations. Even near the board edges this is done very quickly.
The problem, of course, is that this would still only get me one possible symmetry of the pattern, and rotations/mirrors are not that easily done. This could be circumvented by saving each pattern to match against 8 times, but this would look very crude and may cause trouble elsewhere.
With b) this is much more difficult... Honestly, I don't see how it can be done quickly without checking every single bit individually. But with increasing pattern size (like radius 15) this takes forever when done very often, especially as the [] operator of bitsets is rather slow.
One possible solution I thought of is writing it in CUDA, with each thread generating a pattern around one field, and each block of the thread checking one fixed position around this center. As I haven't used CUDA before, I don't know how reasonable this is, but if done in parallel, it sounds more reasonable than iterating over all positions serially.
As I still haven't found a satisfying solution for the problem, I wanted to ask here whether someone knows how it can be done better:
- either rotate/mirror patterns of type a)
- or quickly read pattern of type b) (possibly by arranging the data in a better way in step A., I'm flexible here)
- or if the CUDA idea may actually solve that problem
- or maybe some completely different way, I didn't think of, as I'm sure this has been done before by smarter people
If it matters: I'm coding with VS Pro 2013 and don't mind using boost. If CUDA could solve this effectively, I would also use it.
EDIT:
Okay... So I continued thinking about the whole thing. Maybe there are some other ways to make the whole thing more efficient, by doing some work in more efficient batches.
First of all, what I usually need: On a given board position (and we are talking about 10k positions per second) I need for a large set of positions (every empty field of the board, so most fields) all patterns from size 15 down to size 3. I only need the biggest pattern matched by my database, but in any case, I may often need most of them. So there are 2 things, that could make some time savings possible:
1) some efficient way to use the larger pattern to generate the pattern one size smaller. This should actually be possible when using the bit ordering from b), if it is done the proper way... Then it would only need a few bit ops to cut out the outer ring...
2) As neighboring fields often need their specific patterns, it would help if there were some way to create their patterns in some sort of batch operation... I admit I don't see how this could be done very well, but there may be some time savings.
Oh, and one additional comment, as I had this discussion earlier today with a friend: no, it is not an option to reverse the matching and, instead of matching the board position against the pattern database, check whether a DB pattern matches some board position. I have way too many patterns for that. When doing it the first way, I can just look up whether the bitstring exists in my database and be done.
Edit2:
Another update... First, I looked into CUDA, and as it seems incompatible with VS2013, this is a severe blow to that idea. Second, I thought about the process by which patterns are matched. In fact, it seems possible, instead of going from the large patterns down to the small ones, to do it in reverse. Now suddenly my pattern library is less of a dictionary and more of a search tree, as larger patterns certainly have their inner core saved as a pattern as well. This should speed up any lookups, but sadly still does not solve my problem of pattern generation.
Edit3:
As I felt it is more worthy of an answer than an edit, I just posted my own new idea (which is different from what I had in mind when posting this question) below.
Okay, as I was thinking about this more and more, I now think, that the following solution may be the best to tackle the problems. This is certainly not final, but my currently best idea. So any criticism is welcome and improvements can surely be made.
As the discussion in the comments led me to believe that the approach imagined in the question is not practical for the problem at hand, I have now drastically changed my idea. Instead of trying to read the pattern around each empty intersection after each move, I will now update the surrounding pattern of each empty intersection after each move made.
This can be made in an efficient way, as we can use 2 very important features of our patterns:
1) each larger pattern's core (the pattern reduced to a lower radius) is guaranteed to be in the database
2) most patterns will have a rather low radius, and in most cases, not many positions on the board are changed with each move, resulting in not too many positions needing a recheck of their patterns.
My idea is to store the currently largest pattern, its radius and its evaluation with each empty intersection. Now, while a move is made, I generate a list of all positions changed during that move (usually one). Once the move is finished, I iterate over all empty positions on the board and look at their distance to the closest change. Now we have 3 possible cases:
a) the distance is smaller than or equal to the radius of the currently matched pattern. Now we have to recheck the pattern.
b) the distance is one bigger than the radius of the currently matched pattern. Now we have to check whether an (r+1)-size pattern matching the surroundings actually exists. If it does, we have to check r+2 etc., until we have found the largest.
c) the distance is even bigger: We can keep everything as it is.
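The three cases can be sketched as a small classification function. This assumes that "distance" means Manhattan distance, which matches the diamond shape of the patterns (a diamond of radius r contains exactly the fields at Manhattan distance r or less). The names `Point`, `Update` and `classify` are made up for this example.

```cpp
#include <algorithm>
#include <cstdlib>
#include <limits>
#include <vector>

struct Point {
    int row;
    int col;
};

enum class Update {
    Recheck,  // case a): a change lies inside the matched pattern
    TryGrow,  // case b): a change lies on the next ring outwards
    Keep      // case c): all changes are further away
};

// Assumption: patterns are diamonds, so Manhattan distance is used.
int manhattan(Point a, Point b) {
    return std::abs(a.row - b.row) + std::abs(a.col - b.col);
}

// Decides what to do with the pattern stored for `field`, given the radius
// of its currently matched pattern and the fields changed by the last move.
Update classify(Point field, int matchedRadius, const std::vector<Point>& changes) {
    int nearest = std::numeric_limits<int>::max();
    for (const Point& change : changes) {
        nearest = std::min(nearest, manhattan(field, change));
    }
    if (nearest <= matchedRadius) return Update::Recheck;
    if (nearest == matchedRadius + 1) return Update::TryGrow;
    return Update::Keep;
}
```

Since most moves change only one field, the loop over `changes` is usually a single distance computation per empty intersection.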
As we now basically have a tree of patterns, with each pattern having lots of child patterns with an incremented radius, it is actually practical to store the pattern information in a series of bitsets, each representing a ring of a certain radius around the center.
I hope that this system maximizes the reusability of all the information at hand and is fast enough for my needs. As mentioned before, I welcome criticism and suggestions for improvement, and if no better solution is found, I will probably implement it in the near future. Once done, I can probably report back on the results.

C++ - fastest sorting algorithm for objects based on distance

I'm trying to make a game or 3D application using OpenGL. The game/program will have many objects drawn to the screen (around 7000 of them). When I render them, I need to calculate the distance between the camera and each object and sort the objects by it in order to render the scene correctly. Knowing this, what is the best way to sort them? I want the sorting to be really fast, but I've heard there are trade-offs between algorithms, so which algorithm should I use to get the best performance?
Any help would be greatly appreciated.
Edit: a lot of people are talking about the z-buffer/depth buffer. This doesn't work in some cases like a few people talked about. This is why I asked this question.
Sorting by distance doesn't solve the transparency problem perfectly. Consider the situation where two transparent surfaces intersect and each has a part which is closer to you. Perhaps rare in games, but still something to consider if you don't want an occasional glitched look to your renderer.
The better solution is order-independent transparency. With the latest graphics hardware supporting atomic operations, you can use an A-buffer to do this with little memory overhead and in a single pass so it is pretty efficient. See for example this article.
The issue of sorting your scene is still a valid one, though, even if it isn't for transparency -- it is still useful to sort opaque objects front to back to allow depth testing to discard unseen fragments. For this, Vaughn provided the great solution of BSP trees -- these have been used for this purpose for as long as 3D games have been around.
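Use insertion sort (http://en.wikipedia.org/wiki/Insertion_sort), which has O(n) complexity for nearly sorted arrays.
In your case, insertion sort is likely to give the fastest results by exploiting temporal coherence: the order of the objects changes very little from one frame to the next.
It is used for sweep and prune (http://en.wikipedia.org/wiki/Sweep_and_prune).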
From the link above:
In many applications, the configuration of physical bodies from one time step to the next changes very little. Many of the objects may not move at all. Algorithms have been designed so that the calculations done in a preceding time step can be reused in the current time step, resulting in faster completion of the calculation.
So in such cases insertion sort is best (or similar sorts that are O(n) in the best case).
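A minimal sketch of this idea, assuming each object carries a precomputed distance to the camera and that the list from the previous frame is kept and re-sorted in place. The `Object` struct and the function name are made up for this example.

```cpp
#include <cstddef>
#include <vector>

struct Object {
    int id;
    float distance;  // assumption: distance to the camera, updated every frame
};

// Insertion sort, farthest object first (back to front).
// Close to O(n) when the vector is already nearly sorted, which is the
// expected case if it is kept in last frame's order.
void sortBackToFront(std::vector<Object>& objects) {
    for (std::size_t i = 1; i < objects.size(); ++i) {
        const Object current = objects[i];
        std::size_t j = i;
        // Shift nearer objects one slot to the right.
        while (j > 0 && objects[j - 1].distance < current.distance) {
            objects[j] = objects[j - 1];
            --j;
        }
        objects[j] = current;
    }
}
```

Since only the order matters, the squared distance can be stored instead of the distance to avoid a square root per object. Flipping the comparison gives a front-to-back order for opaque objects.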

Nearest neighbour search on graphics hardware

Given a huge collection of points (float64) in 2d space...
Is there a way to determine the nearest neighbour using a feature of OpenGL or DirectX?
I've implemented a kd-tree, which is still not fast enough.
A kd-tree should work just fine. But here's some hints.
I once implemented a kd-tree for a data set of a million points. Here's what I learned from it:
Did you try profiling your code? You might find that there are easy optimizations to make such as common helper functions needing to be forced inline.
Did you actually test your code to validate that it is culling tree branches for partitions that are easily identified as "too far away"? If you aren't careful, you can easily have a bug that does needless distance computations on points that are too far away.
Easiest thing: when comparing distances between points, you don't need to take the square root of (x2-x1)*(x2-x1) + (y2-y1)*(y2-y1); comparing the squared distances gives the same ordering.
Most of the time spent in my code was just building the tree from the original data set, including multiple full sorts on each iteration deciding which axis was the best to partition on. An easier algorithm would be to just alternate between partitioning on the x and y axis for each tree branch and to cache the sorting order for each axis. It may not build the most optimal search tree, but the overall savings can be enormous.
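The hints above could be combined roughly as in the following sketch: a kd-tree that alternates between the x and y axis, compares squared distances, and skips branches that cannot contain a closer point. Note that it uses `std::nth_element` to find each median rather than cached full sorts, which is a substitution on my part, and that all names (`Pt`, `build`, `nearest`, `findNearest`) are made up. The tree is stored implicitly in the point vector: the node of a range is its middle element.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct Pt {
    double x;
    double y;
};

// Squared distance: enough for comparisons, no square root needed.
double dist2(const Pt& a, const Pt& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Rearranges pts[lo, hi) so that the median on `axis` sits in the middle,
// then recurses with the other axis (0 = x, 1 = y).
void build(std::vector<Pt>& pts, std::size_t lo, std::size_t hi, int axis) {
    if (hi - lo <= 1) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const auto first = pts.begin();
    std::nth_element(first + static_cast<std::ptrdiff_t>(lo),
                     first + static_cast<std::ptrdiff_t>(mid),
                     first + static_cast<std::ptrdiff_t>(hi),
                     [axis](const Pt& a, const Pt& b) {
                         return axis == 0 ? a.x < b.x : a.y < b.y;
                     });
    build(pts, lo, mid, 1 - axis);
    build(pts, mid + 1, hi, 1 - axis);
}

void nearest(const std::vector<Pt>& pts, std::size_t lo, std::size_t hi, int axis,
             const Pt& query, std::size_t& best, double& bestD2) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const double d2 = dist2(pts[mid], query);
    if (d2 < bestD2) {
        bestD2 = d2;
        best = mid;
    }
    // Signed distance from the query to the splitting line.
    const double delta = axis == 0 ? query.x - pts[mid].x : query.y - pts[mid].y;
    if (delta < 0) {
        nearest(pts, lo, mid, 1 - axis, query, best, bestD2);
        // Visit the far side only if it can still hold a closer point.
        if (delta * delta < bestD2) nearest(pts, mid + 1, hi, 1 - axis, query, best, bestD2);
    } else {
        nearest(pts, mid + 1, hi, 1 - axis, query, best, bestD2);
        if (delta * delta < bestD2) nearest(pts, lo, mid, 1 - axis, query, best, bestD2);
    }
}

// Index of the point nearest to `query`; `pts` must be non-empty and built.
std::size_t findNearest(const std::vector<Pt>& pts, const Pt& query) {
    std::size_t best = 0;
    double bestD2 = std::numeric_limits<double>::infinity();
    nearest(pts, 0, pts.size(), 0, query, best, bestD2);
    return best;
}
```

The tree is built once with `build(points, 0, points.size(), 0)` and then queried with `findNearest`. Comparing its answers against a brute-force scan on random data is a cheap way to check that the pruning is correct.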