Generating individual moves from a bitboard of moves - bit-manipulation

My chess engine, which uses bitboards to represent the board's state, generates a chunk of pseudo-legal moves in one go, with a bitboard as the result. For example:
Pawns: [bitboard diagram omitted]
A little bitboard magic later: [resulting bitboard diagram omitted]
The bitboard at the end is simply a chunk of possible moves. How do engines usually take this bitboard and generate individual moves from it? Do I have to iterate over every single bit to check whether it's set? Iterating over a bitboard seems to defeat the very purpose of using bitboards, though, which is why I'm a bit skeptical.
Is there a better way?

Then, typically you apply some variant of the minimax algorithm to evaluate how good the moves are, so you can pick (what you estimate to be) the best move. A simple variant is, for example, alpha-beta.
The variants mainly deal with attempting to guide the search towards "probably useful moves" and away from useless areas of the search space, because the search tree is very wide and your ability to explore it deeply is extremely important for a good chess AI. Exploring it shallowly makes the AI easy to "trap", because it will make choices that look good short-term even though they work out badly later on.
So yes, you will iterate over the bitboards. That doesn't really defeat their purpose - you've still (probably) computed the moves much faster than if you hadn't used bitboards. For the simplest AI you could just take "the first" move using standard bitboard techniques, but an AI that plays like that will be below novice level, having no regard for winning or losing at all.
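For illustration, here is a minimal sketch of that standard serialization loop, assuming 64-bit bitboards and a GCC/Clang compiler (__builtin_ctzll; on MSVC you would use _BitScanForward64 instead):
#include <cstdint>

// Emit every target square in a move bitboard by repeatedly extracting
// the least significant set bit.
void forEachTarget(uint64_t moveBB, void (*emit)(int targetSquare)) {
    while (moveBB) {
        int sq = __builtin_ctzll(moveBB); // index of the lowest 1-bit
        emit(sq);
        moveBB &= moveBB - 1;             // clear that bit
    }
}
The loop runs once per move, not once per bit, so extraction is proportional to the number of moves generated.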

You don't have to iterate over all 64 bits individually. You can, for example, prepare a 256-entry lookup array of precomputed move lists, where each 8-bit index represents the attack set of a piece on a single rank. Then you iterate only 8 times, using a bitwise shift (bitboard >> 8) to pass each successive rank's attack set as an index into the array and extract its move list. That is roughly 8 times faster than a one-bit stepping loop. Depending on your needs, you may actually want to enhance this array to [8][256] so you can also pass in the rank number itself and extract a final move list (with x,y coordinates). The memory cost is still insignificant.
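A minimal sketch of the plain 256-entry variant, with hypothetical names (filesInByte, extractSquares):
#include <cstdint>
#include <vector>

// Precomputed table: for each possible 8-bit rank occupancy, the list
// of file indices (0-7) whose bits are set.
std::vector<int> filesInByte[256];

void initFilesInByte() {
    for (int b = 0; b < 256; ++b)
        for (int f = 0; f < 8; ++f)
            if (b & (1 << f))
                filesInByte[b].push_back(f);
}

// Extract target squares rank by rank: 8 table lookups instead of 64
// single-bit tests.
void extractSquares(uint64_t moves, std::vector<int>& squares) {
    for (int rank = 0; rank < 8; ++rank) {
        uint8_t byte = (moves >> (8 * rank)) & 0xFF;
        for (int file : filesInByte[byte])
            squares.push_back(rank * 8 + file);
    }
}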

Related

how to use bitboards in chess?

I am making a bitboard-based chess engine and I would like to ask: assuming that I have made a bitboard for every piece, what do I do with it? I read a little about some techniques, for example that if you shift the pawns' bitboard to the left by 7 and by 9 you get a bitboard representing the squares they attack, but how do I use it?
And how do I use the rook bitboard or bishop bitboard? What are their targets, and once I find them, how do I connect them with the other pieces' bitboards?
I have been searching for days now but have not found a sufficient answer...
Thanks.
Bitboards are another type of board representation, alongside for example a 2D-array or 1D-array board. The main advantage is that they can help you generate valid moves for a position more quickly, and that you can more easily use them to derive certain evaluation structures and parameters.
Usually you have one bitboard for each piece type and each side (12 total), one for each color (2 total), one for all pieces, one for castling rights, and one for the side to move. With bit operators and bit manipulation you can calculate the valid moves for a position with the help of precomputed tables and only a few bit operations.
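As a concrete example of such bit operations, here is a sketch of the pawn-attack shifts mentioned in the question, assuming a mapping with a1 = bit 0 and h8 = bit 63 (the shift directions and masks depend on your mapping):
#include <cstdint>

const uint64_t FILE_A = 0x0101010101010101ULL; // all squares on the a-file
const uint64_t FILE_H = 0x8080808080808080ULL; // all squares on the h-file

// Squares attacked by white pawns: shift up-left (+7) and up-right (+9),
// masking off destinations that would wrap around the board edge.
uint64_t whitePawnAttacks(uint64_t whitePawns) {
    return ((whitePawns << 7) & ~FILE_H) | ((whitePawns << 9) & ~FILE_A);
}
To connect it with the other bitboards, AND it with the opponent's occupancy: whitePawnAttacks(pawns) & blackPieces yields exactly the squares where a pawn capture is possible.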
I suggest looking at this YouTube series which goes through the entire process of writing a bitboard chess engine from scratch.
Another good source for understanding how the concepts work is the Chessprogramming site.
I hope this helps! It is not easy to wrap your head around, but the gain from using them is great.

parallel quadtree construction from morton ordered points

I have a collection of points [(x1,y1),(x2,y2),...,(xn,yn)] which are Morton sorted. I wish to construct a quadtree from these points in parallel. My intuition is to construct a subtree on each core and merge all subtrees to form the complete quadtree. Can anyone provide some high-level insight or pseudocode on how I might do this efficiently?
First, some thoughts on your plan:
Are you sure that parallelizing construction will help? I think there is a risk that you won't see much speedup. Quadtree construction is rather cheap on the CPU, so it will be partly bound by your memory bandwidth. Parallelization may not help much unless you have separate memory buses, for example on separate machines.
If you want to parallelize construction across separate machines, it may be cheapest to simply create separate quadtrees by splitting your point collection into evenly sized chunks. This has one big advantage over the other solutions: when you want to insert more points, or look up points, the Morton order allows you to determine quite efficiently which tree contains (or, for insertion, should contain) the point. For window queries you can apply a similar optimization: if the Morton codes of the 'min/min' and 'max/max' corners of the query window lie in the same 'chunk' (sub-tree), then you only need to query that one tree. More optimizations are possible.
If you really want to create a single quadtree on a single machine, there are several ways to split your dataset efficiently:
Walk through all points and identify the global min/max. Then walk through all points again and assign them (assuming 4 cores) to each core, where each core represents a quadrant. These steps parallelize well by splitting the dataset into 4 evenly sized chunks, and the result is a quadtree that exactly fits your dataset. You will have to synchronize insertion into the trees, but since the dataset is Morton ordered, there should be relatively few lock collisions.
You can completely avoid lock collisions during insertion by aligning the quadrants with Morton coordinates, such that the Morton curve (a z-curve) crosses each quadrant border only once. Disadvantage: the tree will be imbalanced, i.e. it is unlikely that all quadrants contain the same amount of data. This means your CPUs may have considerably different workloads, unless you split the sub-trees into sub-sub-trees, and so on, to distribute the load better.
The split planes that keep the z-curve from crossing quadrant borders can be identified from the Morton code (z-code) of your coordinates. Split the z-code into chunks of two bits; each pair of bits tells you which (sub-)quadrant to choose, i.e. 00 is lower/left, 01 is lower/right, 10 is upper/left and 11 is upper/right. Since your points are Morton ordered, you can simply use binary search to find the chunk boundaries for each quadrant (a sketch follows below).
I realize this may sound rather cryptic without more explanation, so maybe you can have a look at the PH-Tree; it is essentially a z-ordered (Morton-ordered) quadtree (more a 'trie' than a 'tree'). There are also some in-depth explanations here and here (shameless self-advertisement). The PH-Tree has some nice properties, such as inherently limiting depth to 64 levels (for 64-bit numbers) while guaranteeing small nodes (4 entries max for 2 dimensions); it also guarantees, like the quadtree, that any insert/removal will never affect more than one node, plus possibly adding or removing a second node. There is also a C++ implementation here.
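To make the binary-search idea concrete, here is a small sketch; it assumes the points are stored as 64-bit Morton codes (two interleaved 32-bit coordinates) sorted ascending, and quadrant/quadrantBounds are hypothetical names:
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Quadrant (0-3) selected by two interleaved bits at the given level;
// level 0 uses the two most significant bits (63-62).
unsigned quadrant(uint64_t morton, int level) {
    return (morton >> (62 - 2 * level)) & 0x3u;
}

// Binary-search the boundaries of the four top-level quadrants in a
// Morton-sorted array; each core can then build a subtree from one
// contiguous range [bounds[q], bounds[q+1]).
std::array<size_t, 5> quadrantBounds(const std::vector<uint64_t>& codes) {
    std::array<size_t, 5> bounds;
    bounds[0] = 0;
    bounds[4] = codes.size();
    for (unsigned q = 1; q <= 3; ++q)
        bounds[q] = std::lower_bound(codes.begin(), codes.end(), q,
                        [](uint64_t code, unsigned target) {
                            return quadrant(code, 0) < target;
                        }) - codes.begin();
    return bounds;
}
Recursing with level + 1 inside each range yields the sub-sub-trees mentioned above, so the work can be split further if the quadrants are unbalanced.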

Fast hamming distance between 2 bitset

I'm writing software that relies heavily on (1) accessing single bits and (2) computing the Hamming distance between two bitsets A and B (i.e. the number of bits that differ between A and B). The bitsets are quite big, between 10K and 1M bits, and I have a bunch of them. Since it is impossible to know the bitset sizes at compile time, I'm using vector<bool>, but I plan to migrate to boost::dynamic_bitset soon.
Here are my questions:
(1) Any ideas about which implementations have the fastest single-bit access time?
(2) To compute the Hamming distance, the naive approach is to loop over the single bits and count the differences between the two bitsets. But my feeling is that it might be much faster to loop over bytes instead of bits, compute R = byteA XOR byteB, and look up the "local" distance associated with R in a 256-entry table. Another solution would be to store a 256 x 256 matrix and look up the distance between byteA and byteB directly, with no operation at all. So my question: any idea how to implement that from std::vector<bool> or boost::dynamic_bitset? In other words, do you know whether there is a way to get access to the underlying byte array, or do I have to recode everything from scratch?
(1) Probably vector<char> (or even vector<int>), but that wastes at least 7/8 of the space on typical hardware. You don't need to unpack the bits if you use a byte or more to store each of them. Which of vector<bool> or dynamic_bitset is faster, I don't know; that might depend on the C++ implementation.
(2) boost::dynamic_bitset has operator^ and a count() member, which together can be used to compute the Hamming distance in a probably fast, though memory-wasting, way. You can also get at the underlying buffer with to_block_range; to use that, you need to implement a Hamming distance calculator as an OutputIterator.
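The first option is essentially a one-liner; the temporary produced by operator^ is the memory-wasting part:
#include <boost/dynamic_bitset.hpp>
#include <cstddef>

// Hamming distance via XOR + count(). The expression a ^ b allocates
// a temporary bitset, which is the memory cost mentioned above.
size_t hamming(const boost::dynamic_bitset<>& a,
               const boost::dynamic_bitset<>& b) {
    return (a ^ b).count();
}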
If you do code it from scratch, you can probably do even better than a byte at a time: take a word at a time from each bitset. The cost of the XOR should be very low; then use either an implementation-specific builtin popcount, or else the fastest bit-twiddling popcount you can find (which may or may not involve a 256-entry lookup).
[Edit: it looks as if this could apply to boost::dynamic_bitset::to_block_range, with the Block chosen as either int or long. It's a shame that it writes to an OutputIterator rather than giving you an InputIterator -- I can't immediately see how to use it to iterate over two bitsets together, except by using an extra thread or else copying one of the bitsets out to an int array first. Either way you'll incur some copy overhead that could have been avoided if it had left control of the loop to you. The thread approach is pretty complicated for this task, and of course has its own overheads, and copying out the data probably isn't any better than using operator^ and count().]
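For reference, a sketch of the word-at-a-time approach over raw buffers, assuming both have the same size (__builtin_popcountll is a GCC/Clang builtin; other compilers have equivalents):
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance one 64-bit word at a time: XOR exposes the
// differing bits, popcount counts them.
size_t hammingDistance(const std::vector<uint64_t>& a,
                       const std::vector<uint64_t>& b) {
    size_t dist = 0;
    for (size_t i = 0; i < a.size(); ++i)
        dist += __builtin_popcountll(a[i] ^ b[i]);
    return dist;
}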
I know this will get downvoted for heresy, but here it is: you can get a pointer to the actual data of a vector using &vector[0] (for vector<bool> your mileage may vary, since its storage is a packed, implementation-defined representation). Then you can iterate over it using C-style functions; meaning, cast your pointer to an int pointer or something similarly big, perform your Hamming arithmetic as above, and move the pointer one word length at a time. This only works because you know that the bits are packed together contiguously, and it is fragile (for example, if the vector is modified, it could move memory locations).

Perfect hash function for a set of integers with no updates

In one of the applications I work on, it is necessary to have a function like this:
bool IsInList(int iTest)
{
    // Return whether iTest appears in the set of numbers.
}
The number list is known at app load-up (but is not always the same between two instances of the application) and will not change (or be added to) throughout the program's lifetime. The integers themselves may be large and have a large range, so it is not efficient to have a vector<bool>. Performance is an issue, as the function sits in a hot spot. I have heard about perfect hashing but could not find any good advice. Any pointers would be helpful. Thanks.
P.S. I'd ideally prefer that the solution not be a third-party library, because I can't use them here. Something simple enough to be understood and implemented manually would be great, if possible.
I would suggest using Bloom Filters in conjunction with a simple std::map.
Unfortunately the Bloom filter is not part of the standard library, so you'll have to implement it yourself. However, it turns out to be quite a simple structure!
A Bloom Filter is a data structure specialized in answering the question "Is this element part of the set?", and it does so with an incredibly tight memory requirement, and quite fast too.
The slight catch is that the answer is... special. Is this element part of the set?
- No
- Maybe (with a given probability depending on the properties of the Bloom Filter)
This looks strange until you look at the implementation, and it may require some tuning (there are several parameters) to lower the probability, but...
What is really interesting for you is that for every case where it answers No, you have the guarantee that the element isn't part of the set.
As such, a Bloom Filter is ideal as a doorman for a binary tree or a hash map. Carefully tuned, it will let only very few false positives through. For example, gcc uses one.
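A minimal sketch of such a filter for ints; the two multiplicative hashes and the use of exactly two probes are illustrative only, and a real filter would tune the bit count and number of hash functions to the expected set size:
#include <cstddef>
#include <cstdint>
#include <vector>

class BloomFilter {
public:
    explicit BloomFilter(size_t nbits) : bits_((nbits + 63) / 64), nbits_(nbits) {}
    void insert(int value) { set(h1(value)); set(h2(value)); }
    // false means "definitely not in the set"; true means "maybe".
    bool maybeContains(int value) const { return test(h1(value)) && test(h2(value)); }
private:
    // Two cheap multiplicative hashes with different odd constants.
    size_t h1(int v) const { return (uint64_t(uint32_t(v)) * 0x9E3779B97F4A7C15ULL) % nbits_; }
    size_t h2(int v) const { return (uint64_t(uint32_t(v)) * 0xC2B2AE3D27D4EB4FULL) % nbits_; }
    void set(size_t i) { bits_[i / 64] |= uint64_t(1) << (i % 64); }
    bool test(size_t i) const { return (bits_[i / 64] >> (i % 64)) & 1; }
    std::vector<uint64_t> bits_;
    size_t nbits_;
};
Only on a "maybe" do you fall through to the exact structure (the std::map), so most negative lookups never touch it.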
What comes to my mind is gperf. However, it is based on strings rather than numbers, though part of the calculation can be tweaked to use numbers as input for the hash generator.
integers, strings, doesn't matter
http://videolectures.net/mit6046jf05_leiserson_lec08/
After the intro, at 49:38, you'll learn how to do this. The dot-product hash function is demonstrated since it has an elegant proof. Most hash functions are like voodoo black magic; don't waste time there, find something that is FAST for your datatype and that offers an adjustable SEED for hashing. A good combination there is better than the alternative of growing the hash table.
At 54:30 the professor draws a picture of the standard way of doing perfect hashing. Perfect minimal hashing is beyond this lecture. (Good luck!)
It really all depends on what you mod by.
Keep in mind, the analysis he shows can be further optimized by knowing the hardware you are running on.
With std::map you get very good performance in 99.9% of scenarios. If your hot spot sees the same iTest value(s) multiple times, combine the map lookup with a temporary hash cache.
Int is one of the datatypes where it is possible to just do:
bool hash[UINT_MAX]; // stackoverflow ;)
And fill it up. If you don't care about negative numbers, then it's twice as easy.
A perfect hash function maps a set of inputs onto the integers with no collisions. Given that your input is a set of integers, the values themselves are a perfect hash function. That really has nothing to do with the problem at hand.
The most obvious and easy-to-implement solution for testing existence would be a sorted list or a balanced binary tree. Then you could decide existence in O(log N) time. I doubt it'll get much better than that.
For this problem I would use a binary search, assuming it's possible to keep the list of numbers sorted.
Wikipedia has example implementations that should be simple enough to translate to C++.
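For the question's IsInList, that comes out to only a few lines; g_numbers is a hypothetical global holding the list, sorted once at load time:
#include <algorithm>
#include <vector>

std::vector<int> g_numbers; // filled once at app load, then never modified

void InitList(std::vector<int> numbers) {
    g_numbers = std::move(numbers);
    std::sort(g_numbers.begin(), g_numbers.end());
}

bool IsInList(int iTest) {
    // O(log N) membership test on the sorted list.
    return std::binary_search(g_numbers.begin(), g_numbers.end(), iTest);
}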
It's neither necessary nor practical to aim to map N distinct, randomly dispersed integers to N contiguous buckets, i.e. a perfect minimal hash; the important thing is to identify an acceptable ratio. To do this at run time, you can start by configuring a worst-acceptable ratio (say 1 to 20) and a no-point-being-better-than-this ratio (say 1 to 4), then randomly vary (e.g. by changing the prime numbers used) a fast-to-calculate hash algorithm to see how easily you can meet increasingly difficult ratios. For the worst-acceptable ratio you don't time out, or you fall back on something slower but reliable (a container, or displacement lists to resolve collisions). Then allow a second or ten (configurable) for each X% improvement until you can't succeed at that ratio or you reach the no-point-being-better ratio.
Just so everyone's clear, this works for inputs only known at run time with no useful patterns known beforehand, which is why different hash functions have to be trialed or actively derived at run time. It is not acceptable to simply say "integer inputs form a hash", because there are collisions once they are %-ed into any sane array size. But you don't need to aim for a perfectly packed array either. Remember too that you can have a sparse array of pointers to a packed array, so there's little memory wasted for large objects.
After working with it for a while, I came up with a number of hash functions that seemed to work reasonably well on strings, resulting in a unique mapping, i.e. a perfect hash.
Let's say the values range from L to H in the array. This yields a range R = H - L + 1.
Generally it was pretty big.
I then applied the modulus operator from H down to L + 1, looking for a mapping that keeps the values unique but has a smaller range.
In your case you are using integers. Technically, they are already hashed, but the range is large.
It may be that you can get what you want simply by applying the modulus operator. It may be that you need to put a hash function in front of it first. It also may be that you can't find a perfect hash for it, in which case your container class should have a fallback position (binary search, or map, or something like that), so that you can guarantee that the container will work in all cases.
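A sketch of that search, run from the small end instead (the first modulus under which all values stay distinct gives the smallest usable table); FindSmallestModulus is a hypothetical name and the values are assumed non-empty:
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

size_t FindSmallestModulus(const std::vector<int>& values) {
    for (size_t m = values.size(); ; ++m) { // m == values.size() is the ideal case
        std::unordered_set<size_t> seen;
        bool unique = true;
        for (int v : values)
            if (!seen.insert(size_t(uint32_t(v)) % m).second) { unique = false; break; }
        if (unique) return m; // all values distinct mod m: a table of size m works
    }
}
The loop always terminates, because once m exceeds the largest (unsigned-reinterpreted) value, the residues are the values themselves, which are distinct.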
A trie, or perhaps a van Emde Boas tree, might be a better bet for creating a space-efficient set of integers with lookup time that is constant with respect to the number of objects in the data structure, assuming that even a std::bitset would be too large.

Best Data Structure for Genetic Algorithm in C++?

I need to implement a genetic algorithm customized for my problem (a college project), and the first version had it coded as a matrix of short (bits per chromosome x size of population).
That was a bad design, since I was declaring a short but only using the values 0 and 1... but it was just a prototype and it worked as intended, and now it is time for me to develop a new, improved version. Performance is important here, but simplicity is also appreciated.
I researched around and came up with:
for the chromosome :
- String class (like "0100100010")
- Array of bool
- Vector of bool (vector appears to have a specialization optimized for bool)
- Bitset (sounds like the most natural one)
and for the population:
- C Array[]
- Vector
- Queue
I am inclined to pick vector for the chromosome and array for the population, but I would like the opinion of anyone with experience on the subject.
Thanks in advance!
I'm guessing you want random access to the population and to the genes. You say performance is important, which I interpret as execution speed. So you're probably best off using a vector<> for the chromosomes and a vector<char> for the genes. The reason for vector<char> is that bitset<> and vector<bool> are optimized for memory consumption, and are therefore slow. vector<char> will give you higher speed at the cost of 8x the memory (assuming char = byte on your system). So if you want speed, go with vector<char>. If memory consumption is paramount, then use vector<bool> or bitset<>.
bitset<> would seem like a natural choice here; however, bear in mind that it is templated on the number of bits, which means that (a) the number of genes must be fixed and known at compile time (which I would guess is a big no-no), and (b) if you use different sizes, you end up with one copy of each bitset method you use per bitset size (though inlining might negate this), i.e., code bloat. Overall, I would guess vector<bool> is better for you if you don't want vector<char>.
If you're concerned about the aesthetics of vector<char>, you could typedef char gene; and then use vector<gene>, which looks more natural.
A string is just like a vector<char> but more cumbersome.
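Putting those suggestions together, a possible set of aliases (the names are just illustrative):
#include <vector>

typedef char gene;                      // one gene per byte, favoring speed
typedef std::vector<gene> chromosome;   // holds 0/1 values
typedef std::vector<chromosome> population;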
Specifically to answer your question: I am not exactly sure what you are suggesting. You talk about arrays and the string class. Are you talking about the STL container classes, where you can have a queue, bitset, vector, linked list, etc.? I would suggest a vector for your population (the closest thing to a C array there is) and a bitset for your chromosome if you are worried about memory capacity; otherwise, stay with what you already have, a vector of the string representation of your DNA ("10110110").
For ideas, and as a good tool to dabble with, I recommend you download and initially use this library. It works with the major compilers, runs on Unix variants, and comes with all the source code.
All the framework stuff is done for you, and you will learn a lot. Later on you could write your own code from scratch or inherit from these classes. You can also use them in commercial code if you want.
Because they are objects, you can easily change the representation of your DNA from integers to reals to structures to trees to bit arrays, etc.
There is always a learning curve involved, but it is worth it.
I use it to generate thousands of neural nets, then weed them out with a simple fitness function, and then run them for real.
galib
http://lancet.mit.edu/ga/
Assuming that you want to code this yourself (if you want an external library, kingchris seems to have a good one there), it really depends on what kind of manipulation you need to do. To get the most bang for your buck in terms of memory, you could use any integer type and set/manipulate individual bits via bitmasks etc. (see the sketch below). Now, this approach is likely not optimal in terms of ease of use. The string example above would work OK; however, again, it is not significantly different from the shorts: you are now just representing '0' or '1' with an 8-bit value instead of a 16-bit value. Also, again depending on the manipulation, the string case will probably prove unwieldy. So if you could give some more info on the algorithm, we could maybe give more feedback. Myself, I like the individual bits as part of an integer (a bitset), but if you aren't used to masks, shifts, and all that good stuff, it may not be right for you.
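For reference, the mask-and-shift operations are short; a sketch for a chromosome of up to 64 genes packed into a single uint64_t:
#include <cstdint>

// Set, clear, flip, and read gene i of a bit-packed chromosome.
uint64_t setGene(uint64_t chrom, int i)   { return chrom |  (uint64_t(1) << i); }
uint64_t clearGene(uint64_t chrom, int i) { return chrom & ~(uint64_t(1) << i); }
uint64_t flipGene(uint64_t chrom, int i)  { return chrom ^  (uint64_t(1) << i); } // handy for mutation
bool     getGene(uint64_t chrom, int i)   { return (chrom >> i) & 1; }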
I suggest writing a class for each member of the population; that simplifies things considerably, since you can keep all your member-relevant functions in the same place, nicely wrapped with the actual data.
If you need an "array of bools", I suggest using an int or several ints (then use masks and bitwise operations to access (modify/flip) each bit), depending on the number of your chromosomes.
I usually use some sort of collection class for the population, because a plain array of population members doesn't let you simply add to your population. I would suggest implementing some sort of dynamic list (if you are familiar with ArrayList, that is a good example).
I had major success with genetic algorithms using the recipe above. If you prepare your member class properly, it can really simplify things and lets you focus on coding better genetic algorithms instead of worrying about your data structures.