Gomoku data representation in C++

I'm working on a Gomoku game. The GUI is done, and I now need to code the AI and the rule checker (for optional rules such as captures, forbidden patterns, etc.).
I was planning on representing the board with an int array, something like:
uint goban[361];
This would represent a 19 * 19 goban (board). Say we split each 32-bit integer into 4 bytes and store metadata in each byte, for example:
1st byte: Is this cell empty/black/white?
2nd byte: Is this cell part of a special pattern?
3rd byte: At which position of the pattern am I?
4th byte: Am I capturable?
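The packed layout described above could be sketched as follows. This is only an illustration, written in Python for brevity: the field order (1st byte = lowest byte), the constant names, and the helpers get_field, set_field and index are assumptions, not part of the question.

```python
BOARD_SIZE = 19

# Stone values stored in the first byte (assumed encoding).
EMPTY, BLACK, WHITE = 0, 1, 2

# Byte index of each field inside a 32-bit cell (0 = lowest byte, assumed).
STONE, PATTERN, POSITION, CAPTURABLE = 0, 1, 2, 3


def get_field(cell, field):
    """Extract one 8-bit field from a packed 32-bit cell."""
    return (cell >> (8 * field)) & 0xFF


def set_field(cell, field, value):
    """Return a copy of the cell with one 8-bit field replaced."""
    shift = 8 * field
    cleared = cell & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((value & 0xFF) << shift)


def index(row, col):
    """Map board coordinates to an index in the flat 361-cell array."""
    return row * BOARD_SIZE + col


goban = [0] * (BOARD_SIZE * BOARD_SIZE)

# Place a black stone in the centre and flag it as capturable.
centre = index(9, 9)
goban[centre] = set_field(goban[centre], STONE, BLACK)
goban[centre] = set_field(goban[centre], CAPTURABLE, 1)
```

The same shifts and masks carry over directly to a uint32_t array in C++.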
I don't know whether this kind of solution is suitable for a Gomoku AI, but my main problem is how to write it properly. Take this pattern:
-OO-O-
It's an open, free three: it has a gap inside and free space at both ends. How am I supposed to link this pattern to a static representation without coordinates?
Another concern is when and how to update the patterns, because with 361 cells this can take quite long. If I change the previous figure to this:
XOO-O-
I have to update all four cells, so I don't think it's appropriate; it can also affect many other vertical and diagonal patterns.
Should I instead keep a list of the patterns currently on the board, like this:
std::list<ThreatList> tlist;
and make the board a simple tribool or char array?
I want my data representation to give me as much information as possible, so that I can quickly update the influence map filled in by my evaluation function. I've read a few things about threat space search and other Gomoku algorithms, but they don't discuss data representation and I don't see how to do it correctly. Can you please help me find a clean way to represent patterns and to update them?
Thank you.

Take a look at this open source Gomoku:
https://github.com/garretraziel/gomoku
I think you will find a lot of interesting ideas in there.

Related

How to set up word wrap for an stc.StyledTextCtrl() in wxPython

I was wondering about this, so I did quite a few Google searches and came up with the SetWrapMode(self, mode) function. However, it was never really described in detail, and nothing explained how to use it. I ended up figuring it out, so I thought I'd post a thread here and answer my own question for anyone else who is wondering how to make an stc.StyledTextCtrl() use word wrap.

OK, so first you need to have your styled text control already defined, of course. If you don't know how to do this, go and watch some tutorials on wxPython. I recommend a YouTuber called sentdex http://youtube.com/sentdex, who has a complete series on wxPython, as well as Zach King, who has a 4-episode series on making a text editor. Anyway, my definition of my text control looks like this:
self.control = stc.StyledTextCtrl(self, style=wx.TE_MULTILINE)
Yours could look a little different, but the overall idea is the same.
Many places will tell you that the call needs to be SetWrapMode(self, mode), but if you have self.CONTROLNAME at the beginning like I do, you will get an error if you also pass self as an argument, because the self.CONTROLNAME at the beginning already counts as that argument. However, if your control is defined as self.CONTROLNAME and you don't put self.CONTROLNAME at the beginning of your SetWrapMode() call, you'll also get an error, so be careful with that. The mode has to be a value from 0 to 3. So, for example, mine looks like this: self.control.SetWrapMode(mode=1). Word wrap mode options:
0: None
1: Word Wrap
2: Character Wrap
3: White Space Wrap
My final definition and word wrap setup looks like this:
self.control = stc.StyledTextCtrl(self, style=wx.TE_MULTILINE)
self.control.SetWrapMode(mode=1)
And that's it! I hope this helped.
Thanks to @Chris Beaulieu for correcting me on an issue with the mode options.

I see you answered your own question, and you are right in every way except for one small detail. There are actually several different wrap modes. The types and values corresponding to them are as follows:
0: None
1: Word Wrap
2: Character Wrap
3: White Space Wrap
So you cannot enter just any value above 0 to get word wrap. In fact, if you enter a value outside the range 0-3, you should end up with no wrap at all, as the value shouldn't be recognized by Scintilla, which is the library that stc wraps.

It would be more maintainable to use the constants stc.WRAP_NONE, stc.WRAP_WORD, stc.WRAP_CHAR and stc.WRAP_WHITESPACE instead of their numerical values.

Complex regex to check for two words and a quantity

OK, what I'm trying to do is check for the presence of:
"TestItem-1"
a number greater than 1
one of the possible words in the list "KG, Kg, kg, Kilo(s) or Kilogram(s)"
The items can appear in any order, within a 6-word limit of each other.
It has to be done in regex, as there is no access to the underlying scripting engine.
This is what I've got. As there is no way of checking "greater than" in a regex, I decided to use a range of 1-999 for the number check.
\b(?:[T|t]estItem-1\W+(?:\w+\W+){1,6}(^[0-9]|[1-9][0-9]|[1-9][0-9][0-9])$)\W+(?:\w+\W+){1,6}[K|k]il[o|os]|[K|k][[G|GS]|[g|gs]]|[|K|k]ilogra[m|ms]\b
Examples of what I need to find would be like -
"TestItem-1 is unstable in quanties above 12 Kilograms"
"1 Kilogram of TestItem-1"
While I wouldn't want to find -
"15 units of TestItem-1"
I know that what I've got isn't working: each section appears to work on its own, but they don't work together.
I pass this over to far greater minds than mine :)

You can try something like this:
\b(?:[2-9]|\d\d+)\b\s\b(?:KG.|Kg,|kg,|Kilos?|Kilograms?)\b(?:\S+\s){0,6}\bTestItem-1\b|\bTestItem-1\b(?:\S+\s){0,6}\b(?:[2-9]|\d\d+)\b\s\b(?:KG.|Kg,|kg,|Kilos?|Kilograms?)\b
Not ideal with the duplication but without lookarounds that's the best I could think of. I'll try and improve it in a bit.
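As a hedged variant of the same two-branch idea, the sketch below can be tested against the asker's examples. It makes some assumptions that differ from the pattern above: the unit alternatives carry no punctuation, the gap between the parts allows leading separators, and the number is 1-999 (the range chosen in the question, so that "1 Kilogram" matches). The names NUM, UNIT, ITEM, GAP and PATTERN are illustrative, and Python's re module is used only to try the pattern out.

```python
import re

NUM = r"[1-9]\d{0,2}"                     # 1-999, the range chosen in the question
UNIT = r"(?:KG|Kg|kg|Kilos?|Kilograms?)"  # unit words, without punctuation
ITEM = r"TestItem-1"
GAP = r"(?:\W+\w+){0,6}?\W+"              # up to six words between the two parts

# Branch 1: "<number> <unit> ... TestItem-1"
# Branch 2: "TestItem-1 ... <number> <unit>"
PATTERN = re.compile(
    rf"\b{NUM}\s+{UNIT}\b{GAP}{ITEM}\b"
    rf"|\b{ITEM}\b{GAP}{NUM}\s+{UNIT}\b"
)

for text in (
    "TestItem-1 is unstable in quanties above 12 Kilograms",
    "1 Kilogram of TestItem-1",
    "15 units of TestItem-1",
):
    print(bool(PATTERN.search(text)), text)
```

Like the original, this still requires the number to sit directly before the unit; fully free ordering of all three parts would need more branches.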

Search Large Text File for Thousands of strings

I have a large text file that is 20 GB in size. The file contains lines of text that are relatively short (40 to 60 characters per line). The file is unsorted.
I have a list of 20,000 unique strings. I want to know the offset for each string each time it appears in the file. Currently, my output looks like this:
netloader.cc found at offset: 46350917
netloader.cc found at offset: 48138591
netloader.cc found at offset: 50012089
netloader.cc found at offset: 51622874
netloader.cc found at offset: 52588949
...
360doc.com found at offset: 26411474
360doc.com found at offset: 26411508
360doc.com found at offset: 26483662
360doc.com found at offset: 26582000
I am loading the 20,000 strings into a std::set (to ensure uniqueness), then reading a 128MB chunk from the file, and then using string::find to search for the strings (start over by reading another 128MB chunk). This works and completes in about 4 days. I'm not concerned about a read boundary potentially breaking a string I'm searching for. If it does, that's OK.
I'd like to make it faster. Completing the search in 1 day would be ideal, but any significant performance improvement would be nice. I prefer to use standard C++ with Boost (if necessary) while avoiding other libraries.
So I have two questions:
Does the 4 day time seem reasonable considering the tools I'm using and the task?
What's the best approach to make it faster?
Thanks.
Edit: Using the Trie solution, I was able to shorten the run-time to 27 hours. Not within one day, but certainly much faster now. Thanks for the advice.

Algorithmically, I think the best way to approach this problem is to use a tree (a trie, or prefix tree) that stores the strings you want to search for one character at a time. For example, if you would like to look for the following patterns:
hand, has, have, foot, file
The resulting tree would look something like this:
The generation of the tree is worst case O(n), and has a sub-linear memory footprint generally.
Using this structure, you can begin processing your huge file by reading it one character at a time and walking the tree.
If you get to a leaf node (the ones shown in red), you have found a match and can store it.
If there is no child node corresponding to the letter you have read, you can discard the current line and begin checking the next line, starting from the root of the tree.
This technique would check for matches in linear time, O(n), and scan the huge 20 GB file only once.
Edit
The algorithm described above is certainly sound (it doesn't give false positives) but not complete (it can miss some results). However, with a few minor adjustments it can be made complete, assuming that we don't have search terms with common roots like go and gone. The following is pseudocode of the complete version of the algorithm
tree = construct_tree(['hand', 'has', 'have', 'foot', 'file'])
# Keeps track of where I'm currently in the tree
nodes = []
for character in huge_file:
    foreach node in nodes:
        if node.has_child(character):
            node.follow_edge(character)
            if node.isLeaf():
                # You found a match!!
        else:
            nodes.delete(node)
    if tree.has_child(character):
        nodes.add(tree.get_child(character))
Note that the list of nodes that has to be checked each time, is at most the length of the longest word that has to be checked against. Therefore it should not add much complexity.
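A runnable version of this idea might look like the following minimal sketch. It assumes the text is available as an in-memory string (a real run would feed the file in chunks), and it marks word ends with a special key instead of relying on leaves, so terms with common roots such as go and gone are also reported. The names build_trie, find_all and END are hypothetical.

```python
END = ""  # key marking "a word ends here"; never equal to a single character


def build_trie(words):
    """Build a nested-dict trie from the search terms."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_all(text, root):
    """Yield (offset, word) for every occurrence of every word in text."""
    active = []  # (start offset, trie node) for each partial match in progress
    for pos, ch in enumerate(text):
        active.append((pos, root))  # a new match may start at this character
        still_active = []
        for start, node in active:
            child = node.get(ch)
            if child is None:
                continue  # this partial match is dead, drop it
            if END in child:
                yield start, child[END]
            still_active.append((start, child))
        active = still_active


trie = build_trie(["hand", "has", "have", "foot", "file"])
for offset, word in find_all("the handy man has a hand", trie):
    print(f"{word} found at offset: {offset}")
```

The list of active partial matches never grows beyond the length of the longest search term, which matches the note above.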

The problem you describe looks more like a problem with the selected algorithm, not with the technology of choice. 20000 full scans of 20GB in 4 days doesn't sound too unreasonable, but your target should be a single scan of the 20GB and another single scan of the 20K words.
Have you considered looking at some string matching algorithms? Aho–Corasick comes to mind.
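For illustration, here is a compact sketch of Aho-Corasick over an in-memory string. It is a teaching version under stated assumptions, not a tuned implementation for a 20 GB file: the function names build_automaton and search and the list-based state tables are choices made for this example.

```python
from collections import deque


def build_automaton(words):
    """Build goto, failure and output tables for the given words."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the words that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Yield (offset, word) for every match, scanning text only once."""
    goto, fail, out = automaton
    state = 0
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            yield pos - len(word) + 1, word


automaton = build_automaton(["he", "she", "his", "hers"])
print(list(search("ushers", automaton)))
```

Unlike the plain trie walk, the failure links mean each input character is processed without keeping a list of partial matches.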

Rather than searching 20,000 times, once for each string separately, you can tokenize the input and look each token up in your std::set of strings to be found; it will be much faster. This assumes your strings are simple identifiers, but something similar can be implemented for strings that are sentences. In that case you would keep a set of the first words of each sentence and, after a successful match, verify with string::find that it really is the beginning of the whole sentence.
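The tokenize-and-look-up approach can be sketched in a few lines. This assumes the search strings are whitespace-delimited tokens, which may or may not hold for the real data; find_tokens and the sample text are made up for the example.

```python
import re


def find_tokens(text, wanted):
    """Yield (offset, token) for each whitespace-delimited token in wanted."""
    for match in re.finditer(r"\S+", text):
        if match.group() in wanted:  # average O(1) set lookup per token
            yield match.start(), match.group()


wanted = {"netloader.cc", "360doc.com"}
sample = "GET netloader.cc ok\n360doc.com x netloader.cc"
for offset, token in find_tokens(sample, wanted):
    print(f"{token} found at offset: {offset}")
```

Each token costs one hash lookup, so the run time depends on the file size rather than on the number of search strings.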

Custom Textbox: Highlighting and Selection

I posted a question similar to this earlier, however, after thinking about it and testing the answers, I believe I misinterpreted the answers and the answerer(s) misinterpreted me. The original question is here. I think people believed that I just wanted to highlight strings, I didn't state my exact purpose. So, I will now:
What I've been trying to do lately is create a text box 100% from scratch in C++ CLR using GDI+. I've reached the challenge of placing the caret when the user clicks in the text box. With simple math (where they clicked divided by line width) I can figure out which line they clicked. But in order to get the character clicked, I need (unless there are better ways) to compare the bounding rectangles of all the characters in the line and place the caret before the one the mouse falls into. To do this, I need the exact bounds of each individual character, not of an entire string.
I've already tried a few things, none of which seemed to work:
Graphics::MeasureString is not recommended by anyone, nor does it give what I want
TextRenderer::MeasureText is more accurate, but not accurate enough for this
Graphics::MeasureCharacterRanges has a 32-character cap, and I'm expecting lines to be over 32 characters long in some cases
I believe I can't use these methods, unless there are ways around their limitations. I hope I made my problem and expected solution a lot more clear than I previously did.

Because of the way text is kerned and anti-aliased, the boundary of a character depends on all of the characters to the left of it. However you don't need to know every character boundary, only the ones on either side of your click point. You can find those with a binary search - split your string in half, measure that (using TextRenderer::MeasureText), and determine if it's to the left or right of your click point. Keep narrowing down the size of the string until there's only one possibility remaining.
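The binary search could be sketched like this. The measure argument is a hypothetical stand-in for a call such as TextRenderer::MeasureText applied to a prefix of the line, and the sketch assumes the measured width never decreases as the prefix grows; caret_index is an illustrative name. Python is used here only to show the logic.

```python
def caret_index(text, click_x, measure):
    """Return the caret position (0..len(text)) closest to click_x.

    measure(prefix) must return the rendered width of the prefix.
    """
    lo, hi = 0, len(text)
    # Find the longest prefix whose width does not exceed click_x.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if measure(text[:mid]) <= click_x:
            lo = mid
        else:
            hi = mid - 1
    # The click lies inside character lo; snap to its nearer edge.
    if lo < len(text):
        left = measure(text[:lo])
        right = measure(text[:lo + 1])
        if click_x - left > right - click_x:
            lo += 1
    return lo


def fixed_width(prefix):
    """Toy measurer for demonstration: every character is 10 units wide."""
    return 10 * len(prefix)


print(caret_index("hello", 23, fixed_width))  # click inside the third character
```

Measuring whole prefixes rather than single characters keeps kerning effects in the measurement, and the search needs only about log2(n) measurements per click.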

Regex for binary multiple of 3

I would like to know how I can construct a regex that tells whether a number in base 2 (binary) is a multiple of 3. I read the thread "Check if a number is divisible by 3", but they don't do it with a regex, and the graph someone drew is wrong (because it doesn't accept even numbers). I have tried ((1+)(0*)(1+))(0), but it doesn't work for some values. I hope you can help me.
UPDATE:
OK, thanks all for your help. Now I know how to draw the NFA; here are the graph and the regular expression:
In the graph, the states are the number in base 10 mod 3.
For example: to get to state 1 you must have read 1; then you can append 1 or 0. If you append 1, you have 11 (3 in base 10), and this number mod 3 is 0, so you draw the arc to state 0.
((0*)((11)*)((1((00)*)1)*)(101*(0|((00)*1*)*0)1)*(1(000)+1*01)*)*
And the other regex works, but this is shorter.
Thanks a lot :)
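The state machine described in the update (states are the value read so far, mod 3) can be simulated directly. This is a small sketch; is_multiple_of_3 is an illustrative name. Appending a bit b to a number n gives 2n + b, which is the transition used below.

```python
def is_multiple_of_3(bits):
    """Run the mod-3 automaton over a string of '0' and '1' characters."""
    state = 0  # value of the bits read so far, modulo 3
    for bit in bits:
        state = (2 * state + int(bit)) % 3
    return state == 0


for n in (3, 6, 7, 9, 10):
    print(n, format(n, "b"), is_multiple_of_3(format(n, "b")))
```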

I know this is an old question, but an efficient answer is yet to be given and this question pops up first for "binary divisible by 3 regex" on Google.
Based on the DFA proposed by the author, a ridiculously short regex can be generated by simplifying the routes a binary string can take through the DFA.
The simplest one, using only state A, is:
0*
Including state B:
0*(11)*0*
Including state C:
0*(1(01*0)*1)*0*
Finally, include the fact that after going back to state A, the whole process can start again:
0*((1(01*0)*1)*0*)*
Using some basic regex rules, this simplifies to
(1(01*0)*1|0)*
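A brute-force check of this final expression is easy to write. The sketch below uses Python's re.fullmatch to compare the regex with ordinary arithmetic; the names DIV3 and matches are illustrative.

```python
import re

DIV3 = re.compile(r"(1(01*0)*1|0)*")


def matches(bits):
    """True if the whole bit string is accepted by the regex."""
    return DIV3.fullmatch(bits) is not None


# Compare against n % 3 for every number below 1024.
mismatches = [n for n in range(1, 1024)
              if matches(format(n, "b")) != (n % 3 == 0)]
print("mismatches:", mismatches)
```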
Have a nice day.

If I may plug my solution for this code golf question! It's a piece of JavaScript that generates regexes (probably inefficiently, but does the job) for divisibility for each base.
This is what it generates for divisibility by 3 in base 2:
/^((((0+)?1)(10*1)*0)(0(10*1)*0|1)*(0(10*1)*(1(0+)?))|(((0+)?1)(10*1)*(1(0+)?)|(0(0+)?)))$/
Edit: comparing to Asmor's, probably very inefficient :)
Edit 2: Also, this is a duplicate of this question.

For anyone who is learning and searching for how to do this:
see this video:
https://www.youtube.com/watch?v=SmT1DXLl3f4&t=138s
Write the state equations and solve them with Arden's Theorem.
The way I did it is visible in the image; the result is the same as the one pointed out by user @Kert Ojasoo. I hope I did it correctly, because I spent 2 days solving it...

n+2n = 3n. Thus, 2 adjacent bits set to 1 denote a multiple of 3. If there are an odd number of adjacent 1s, that would not be 3.
So I'd propose this regex:
(0*(11)?)+
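A quick check of this proposal, sketched with Python's re module, suggests that everything it accepts is a multiple of 3, but that it also misses some multiples, for example 1001 (9 in base 10). The names PROPOSED and accepted are illustrative.

```python
import re

PROPOSED = re.compile(r"(0*(11)?)+")


def accepted(bits):
    """True if the whole bit string is accepted by the proposed regex."""
    return PROPOSED.fullmatch(bits) is not None


# Numbers the regex accepts although they are not multiples of 3.
false_positives = [n for n in range(1, 256)
                   if accepted(format(n, "b")) and n % 3 != 0]
# Multiples of 3 that the regex rejects.
missed = [n for n in range(1, 256)
          if n % 3 == 0 and not accepted(format(n, "b"))]
print("false positives:", false_positives)
print("first missed multiples:", missed[:5])
```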
