OpenCV: License Plate Recognition - C++

I have been working on license plate recognition based on this GitHub repository:
https://github.com/MicrocontrollersAndMore/OpenCV_3_License_Plate_Recognition_Cpp
I need it to detect small characters as well, but I can't figure out how.
I think I need to change the size checks in the following code:
https://github.com/MicrocontrollersAndMore/OpenCV_3_License_Plate_Recognition_Cpp/blob/master/DetectChars.cpp
bool checkIfPossibleChar(PossibleChar &possibleChar) {
// this function is a 'first pass' that does a rough check on a contour to see if it could be a char,
// note that we are not (yet) comparing the char to other chars to look for a group
if (possibleChar.boundingRect.area() > MIN_PIXEL_AREA &&
possibleChar.boundingRect.width > MIN_PIXEL_WIDTH && possibleChar.boundingRect.height > MIN_PIXEL_HEIGHT &&
MIN_ASPECT_RATIO < possibleChar.dblAspectRatio && possibleChar.dblAspectRatio < MAX_ASPECT_RATIO) {
return(true);
} else {
return(false);
}}
AND
double dblDistanceBetweenChars = distanceBetweenChars(possibleChar, possibleMatchingChar);
double dblAngleBetweenChars = angleBetweenChars(possibleChar, possibleMatchingChar);
double dblChangeInArea = (double)abs(possibleMatchingChar.boundingRect.area() - possibleChar.boundingRect.area()) / (double)possibleChar.boundingRect.area();
double dblChangeInWidth = (double)abs(possibleMatchingChar.boundingRect.width - possibleChar.boundingRect.width) / (double)possibleChar.boundingRect.width;
double dblChangeInHeight = (double)abs(possibleMatchingChar.boundingRect.height - possibleChar.boundingRect.height) / (double)possibleChar.boundingRect.height;
// check if chars match
if (dblDistanceBetweenChars < (possibleChar.dblDiagonalSize * MAX_DIAG_SIZE_MULTIPLE_AWAY) &&
dblAngleBetweenChars < MAX_ANGLE_BETWEEN_CHARS &&
dblChangeInArea < MAX_CHANGE_IN_AREA &&
dblChangeInWidth < MAX_CHANGE_IN_WIDTH &&
dblChangeInHeight < MAX_CHANGE_IN_HEIGHT) {
vectorOfMatchingChars.push_back(possibleMatchingChar); // if the chars are a match, add the current char to vector of matching chars
}
Thanks a lot in Advance.

1. You should first debug to see which conditions the two small A characters fail.
MIN_PIXEL_AREA, MIN_PIXEL_WIDTH and MIN_PIXEL_HEIGHT may be too large to accommodate the small A.
2. In the second code snippet you provided, change the if statement from
if(condition1 && condition2 && ...)
to
if(condition1) {if(condition2) {....}}. This will tell you which condition fails.
3. Finally, in the second snippet, many of the conditions that decide whether a bounding rect is a character depend heavily on the characters seen before it. In your case, the AA characters differ from the others in size, distance and direction (vertical). It would therefore be better to reinitialize for AA instead of comparing against the previous characters, or to add more conditions for validating characters (for example, whether both height and width decreased).
Once you know which conditions fail in step 2 and why, making the relevant changes in step 3 should be simple.
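The same "test each condition separately" idea can be applied to the first-pass check as well. Here is a minimal sketch of it in Python. The threshold values are placeholders I picked for illustration, not the values from the repository, and Rect and failed_checks are hypothetical names:

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration only; substitute the values
# from your own copy of DetectChars.
MIN_PIXEL_AREA = 80
MIN_PIXEL_WIDTH = 2
MIN_PIXEL_HEIGHT = 8
MIN_ASPECT_RATIO = 0.25
MAX_ASPECT_RATIO = 1.0


@dataclass
class Rect:
    width: int
    height: int


def failed_checks(rect):
    """Return the names of the first-pass checks that this bounding box fails."""
    aspect = rect.width / rect.height
    checks = {
        "area": rect.width * rect.height > MIN_PIXEL_AREA,
        "width": rect.width > MIN_PIXEL_WIDTH,
        "height": rect.height > MIN_PIXEL_HEIGHT,
        "aspect_ratio": MIN_ASPECT_RATIO < aspect < MAX_ASPECT_RATIO,
    }
    return [name for name, ok in checks.items() if not ok]


# A small character box: reports which thresholds reject it.
print(failed_checks(Rect(6, 7)))
```

Logging this list for the contours of the small characters shows directly which threshold needs to be relaxed.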
Edit:
I looked further through the repo and checked the functions findVectorOfVectorsOfMatchingChars and findVectorOfMatchingChars.
Analysis of findVectorOfMatchingChars: This function takes a possibleChar and checks whether it is a close match (all the if conditions pass) with any possibleChar in vectorOfChars. It collects all the matches and returns them.
Analysis of findVectorOfVectorsOfMatchingChars: This function picks a possibleChar from vectorOfPossibleChars and finds all its matches using findVectorOfMatchingChars. If a good match is found, the function calls itself on (vectorOfPossibleChars - matchedPossibleChars).
Now, here is the problem.
Let's say each possibleChar is a vertex of a graph G, and there is an edge between two possibleChars iff they satisfy the conditions defined in findVectorOfMatchingChars.
Now, let's say we have a graph with A,B,C,D,X as possibleChar vertices, with X close enough to each of A,B,C,D, but A,B,C,D just far enough from each other not to be considered close matches.
Now let's apply findVectorOfVectorsOfMatchingChars to this vector of possibleChars.
Option 1: If we choose X first, we find A,B,C,D as its matching possibleChars and thus we get all possibleChars.
Option 2: If we choose A first, we find X to be a matching possibleChar of A, but not B,C,D. Thus we remove A,X from vectorOfPossibleChars and reapply findVectorOfVectorsOfMatchingChars to B,C,D. Now, since there is no match between B,C,D, we end up with no match for B, C or D.
Solution to rectify:
Create a graph class and register each possibleChar in it as a vertex. Make an edge between each pair of vertices that satisfies the conditions defined in findVectorOfMatchingChars.
You may need to customize the conditions so that there are edges between the other vertices and the vertices of the two A's. For this, you should use more datasets, so that the conditions you create or the thresholds you change are not so generic that they also accept non-license-plate chars.
Find the connected components of the graph to collect all the characters. This may end up adding every possibleChar. To avoid that, you can limit additions using weighted edges.
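To illustrate the graph idea, here is a minimal sketch in Python that groups items by connected component using a breadth-first search. The function group_chars, the toy coordinates and the close_enough predicate are all made up for the example; in the real code the predicate would be the set of conditions from findVectorOfMatchingChars, and it is assumed to be symmetric:

```python
from collections import deque


def group_chars(chars, is_match):
    """Group items into connected components of the 'is_match' graph."""
    n = len(chars)
    # One edge per pair that satisfies the (assumed symmetric) match predicate.
    neighbours = [
        [j for j in range(n) if j != i and is_match(chars[i], chars[j])]
        for i in range(n)
    ]
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            i = queue.popleft()
            component.append(chars[i])
            for j in neighbours[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
        groups.append(component)
    return groups


# Toy version of the A, B, C, D, X example: X sits in the middle,
# the others are close to X but not to each other.
positions = {"A": (0, 2), "B": (2, 0), "C": (0, -2), "D": (-2, 0), "X": (0, 0)}


def close_enough(p, q):
    (x1, y1), (x2, y2) = positions[p], positions[q]
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= 2.5 ** 2


groups = group_chars(["A", "B", "C", "D", "X"], close_enough)
print(groups)
```

Even though the search starts at A here, all five items end up in one group, because membership no longer depends on which vertex is picked first.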


How can I check if two cells are equal in brainf*ck?

How can I check if the value in cell #0 is equal to the value in cell #1? I am trying to write code equivalent to:
if(a == b)
{
//do stuff
}
else
{
//do something else
}
I have read Brainfuck compare 2 numbers as greater than or less than, and the second answer gave me a general idea of what I'd need to do, but I cannot figure it out. (That solution gives if a < b, else.)
I am thinking I need to do something along the lines of decrementing both values, and if they reach 0 at the same time, then they are true. But I keep getting stuck at the same exit point every time I think about it.
How can I check if two cells are equal in brainfuck?
I think I have it, I'm not a brainfuck expert but this question looked interesting. There might be a simpler way to do it, but I went with your method of decrementing values one by one.
In this case, if the two values in cells 0 and 1 are equal, jump a long way forward; if they are not equal, jump a little forward (the second pair of brackets is the not-equal case, the third pair is the equal case).
Note that I'm using brainfuck's while loops as a makeshift if (cell != 0).
+++++++++++++++++
>
+++++++++++++++++
>+<
[ - < - >] <[>>>>>] >> [>>>>>>>>>>>>>>>>>>>>>]
Try it online: http://fatiherikli.github.io/brainfuck-visualizer/#KysrKysrKysrKysrKysrKysKPgorKysrKysrKysrKysrKysrKwo+KzwKWyAtIDwgLSA+XSA8Wz4+Pj4+XSA+PiBbPj4+Pj4+Pj4+Pj4+Pj4+Pj4+Pj4+XQoKCg==
An example implementation, print T (true) if the two values are equal, F (false) if they are not equal
http://fatiherikli.github.io/brainfuck-visualizer/#KysrCj4KKysrKwo+KzwKWyAtIDwgLSA+XSA8Wz4+PgorKysrKysrKysrKysrKysrKysrKworKysrKysrKysrKysrKysrKysrKworKysrKysrKysrKysrKysrKysrKworKysrKysrKysrCi4KPgoKXSA+PiBbCisrKysrKysrKysrKysrKysrKysrCisrKysrKysrKysrKysrKysrKysrCisrKysrKysrKysrKysrKysrKysrCisrKysrKysrKysrKysrKysrKysrCisrKwouCj4KXQ==
+>>(a+++++)>(b+++++)>>+<<<
[[->]<<]
<
[>>>>>-<<<<<
a>b
]
>>
[->>-<
a<b
]
>>
[-
a=b
]
The pointer ends on the same cell in the same state, but the code within the appropriate brackets has been executed.
I came up with this for my bf compiler thing
basically it subtracts and then checks if the result is 0.
Can be easily changed to execute stuff in if/else-ish way
Layout:
[A] B
>[-<->]+<[>-<[-]]>
Output
0 [result]
Result is 1 if equal
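If you want to check a snippet like the last one without stepping through it by hand, a tiny interpreter is enough. This is a rough sketch in Python that assumes 8-bit wrapping cells and ignores the I/O commands; run_bf is just a helper name for this example:

```python
def run_bf(code, tape, pointer=0):
    """Run brainfuck code (no I/O commands) on a tape of 8-bit wrapping cells."""
    tape = list(tape)
    # Pre-compute the position of each bracket's partner.
    jump, stack = {}, []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start], jump[pos] = pos, start
    pc = 0
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            pointer += 1
        elif ch == "<":
            pointer -= 1
        elif ch == "+":
            tape[pointer] = (tape[pointer] + 1) % 256
        elif ch == "-":
            tape[pointer] = (tape[pointer] - 1) % 256
        elif ch == "[" and tape[pointer] == 0:
            pc = jump[pc]  # skip the loop body
        elif ch == "]" and tape[pointer] != 0:
            pc = jump[pc]  # go back to the start of the loop
        pc += 1
    return tape, pointer


EQUALS = ">[-<->]+<[>-<[-]]>"

print(run_bf(EQUALS, [5, 5]))  # equal inputs
print(run_bf(EQUALS, [3, 7]))  # different inputs
```

With the layout [A] B, the first cell ends up as 0 and the second cell holds the result, with the pointer on the result cell.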

Repeat random placing of ships in battleship, depending on restrictions

I'm creating a battleship field and I have created a method that creates spots based off of random parameters. The spots would be like the "1a", "2a" setup.
Here's what it looks like when you call it
ships.append(place_ship(ships, randint(2,3), letter, str(randint(1,4)), random.choice([True,False])))
randint(2,3) refers to the length of the ship, the random choice parameter is whether the boat is vertical or not, the letter refers to the random letter from this:
letter = ''
while letter != "a" and letter != "b" and letter != "c":
letter = random.choice(string.ascii_lowercase)
The function that creates it is place_ship:
def place_ship(ships, length, x, y, vert):
idk = {}
for i in range(length):
square = y+x
if vert:
x = chr(ord(x) + 1)
else:
y = str(int(y) + 1)
idk[i] = square
return idk
My question is how to place the ships without having any of the spots cross ships.
Here is an example of what the locations in a list look like:
['2c', '2d', '3c', '4c', '2c', '2d']
So 2c,2d is one ship, 3c,4c is another, and then 2c,2d appears again.
I've tried to do the following, but it only checks the first ship
for i in all_ship_loc:
occurence = all_ship_loc.count(i)
while occurence > 1:
place_ship(ships, randint(2,3), letter, str(randint(1,4)),random.choice([True,False]))
Thanks!
You may want to create a dictionary with every possible place, so you can check whether a generated position is already occupied.
occupied = { "a1" : True } # True means occupied
Or you could use a list that holds only the occupied places.
You might store a matrix of occupied places (values 0 and 1). Placing a ship occupies some elements (the spots where the ship is located, adjacent spots, etc.). For a given ship size and a given row or column, you can easily find the number of positions available. So your algorithm may be as follows:
starting from biggest ship
randomly choose a direction (horizontal/vertical)
enumerate positions available in that direction
if none available, reposition some last ships
choose a random position within available set, mark corresponding spots occupied
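A rough sketch of that idea in Python, under some assumptions of mine: a 4x4 board (rows 1-4, columns a-d), spots written like "2c", and a simplification where both directions are enumerated at once and an error is raised instead of repositioning earlier ships. The names ship_spots and place_fleet are made up for the example:

```python
import random

ROWS = "1234"  # assumed board: rows 1-4
COLS = "abcd"  # assumed board: columns a-d


def ship_spots(length, row, col, vertical):
    """Spots a ship would cover, or None if it runs off the board."""
    spots = []
    for k in range(length):
        r = row + (k if vertical else 0)
        c = col + (0 if vertical else k)
        if r >= len(ROWS) or c >= len(COLS):
            return None
        spots.append(ROWS[r] + COLS[c])
    return spots


def place_fleet(lengths, rng=random):
    """Place ships longest first; each ship picks randomly among free positions."""
    occupied = set()
    fleet = []
    for length in sorted(lengths, reverse=True):
        candidates = []
        for vertical in (True, False):
            for row in range(len(ROWS)):
                for col in range(len(COLS)):
                    spots = ship_spots(length, row, col, vertical)
                    if spots and occupied.isdisjoint(spots):
                        candidates.append(spots)
        if not candidates:
            # Simplification: the outline above would reposition earlier ships here.
            raise RuntimeError("no room left for a ship of length %d" % length)
        spots = rng.choice(candidates)
        occupied.update(spots)
        fleet.append(spots)
    return fleet


print(place_fleet([2, 3, 3]))
```

Because every candidate is checked against the occupied set before it can be chosen, no spot is ever used by two ships, so there is nothing to detect and retry afterwards.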

Algorithm to find out whether the matches for two Glob patterns (or Regular Expressions) intersect

I'm looking at matching glob-style patterns similar to what the Redis KEYS command accepts. Quoting:
h?llo matches hello, hallo and hxllo
h*llo matches hllo and heeeello
h[ae]llo matches hello and hallo, but not hillo
But I am not matching against a text string, but matching the pattern against another pattern with all operators being meaningful on both ends.
For example these patterns should match against each other in the same row:
prefix* prefix:extended*
*suffix *:extended:suffix
left*right left*middle*right
a*b*c a*b*d*b*c
hello* *ok
pre[ab]fix* pre[bc]fix*
And these should not match:
prefix* wrong:prefix:*
*suffix *suffix:wrong
left*right right*middle*left
pre[ab]fix* pre[xy]fix*
?*b*? bcb
So I'm wondering ...
is it possible at all to implement such a verification algorithm?
if it is not possible, what subset of regex would work? (e.g. disallowing the * wildcard?)
if it is indeed possible, what is an efficient algorithm?
what is the time complexity required?
EDIT: Found this other question on RegEx subset, but it is not exactly the same: the words that hello* and *ok match are not a subset/superset of each other, but they do intersect.
So I guess mathematically this might be phrased as: is it possible to deterministically check that the set of words one pattern matches, intersected with the set of words another pattern matches, is non-empty?
EDIT: A friend #neizod drew up this elimination table, which neatly visualizes what might be a potential/partial solution: Elimination rule
EDIT: I will add an extra bounty for those who can also provide working code (in any language) and test cases that prove it.
EDIT: Added the ?*b*? test case discovered by #DanielGimenez in the comments.
Now witness the firepower of this fully ARMED and OPERATIONAL battle station!
(I have worked too much on this answer and my brain has broken; There should be a badge for that.)
In order to determine whether two patterns intersect, I have created a recursive backtracking parser: when a Kleene star is encountered, a new stack is created, so that if the match fails later on, everything is rolled back and the star consumes the next character.
You can view the history of this answer to see how I arrived at all this and why it was necessary, but basically it wasn't sufficient to determine an intersection by looking ahead only one token, which was what I was doing before.
This was the case that broke the old answer: [abcd]d => *d. The set matches the d after the star, so the left side would still have tokens remaining, while the right side would be complete. However, these two patterns intersect on ad, bd, cd and dd, so that needed to be fixed. My almost O(N) answer was thrown out.
Lexer
The lexing process is trivial, except that it processes escape characters and removes redundant stars. Tokens are broken out into sets, stars, wild characters (?), and characters. This differs from my previous versions, where one token was a string of characters instead of a single character. As more cases came up, having strings as tokens was more of a hindrance than an advantage.
Parser
Most of the functions of the parser are pretty trivial. A switch on the left side's type calls a function containing a switch that picks the appropriate function to compare it with the right side's type. The result of the comparison bubbles up through the two switches to the original caller, typically the main loop of the parser.
Parsing Stars
The simplicity ends with the star. When one is encountered, it takes over everything. First it compares its side's next token with the other side's, advancing the other side until it finds a match.
Once the match is found, it then checks whether everything matches all the way up to the end of both patterns. If it does, the patterns intersect. Otherwise, it advances the other side's next token from the original one it was compared against and repeats the process.
When two stars are encountered, they take off into their own alternative branches, starting from each other's next token.
function intersects(left, right) {
var lt, rt,
result = new CompareResult(null, null, true);
lt = (!left || left instanceof Token) ? left : tokenize(left);
rt = (!right || right instanceof Token) ? right : tokenize(right);
while (result.isGood && (lt || rt)) {
result = tokensCompare(lt, rt);
lt = result.leftNext;
rt = result.rightNext;
}
return result;
}
function tokensCompare(lt, rt) {
if (!lt && rt) return tokensCompare(rt, lt).swapTokens();
switch (lt.type) {
case TokenType.Char: return charCompare(lt, rt);
case TokenType.Single: return singleCompare(lt, rt);
case TokenType.Set: return setCompare(lt, rt);
case TokenType.AnyString: return anyCompare(lt, rt);
}
}
function anyCompare(tAny, tOther) {
if (!tOther) return new CompareResult(tAny.next, null);
var result = CompareResult.BadResult;
while (tOther && !result.isGood) {
while (tOther && !result.isGood) {
switch (tOther.type) {
case TokenType.Char: result = charCompare(tOther, tAny.next).swapTokens(); break;
case TokenType.Single: result = singleCompare(tOther, tAny.next).swapTokens(); break;
case TokenType.Set: result = setCompare(tOther, tAny.next).swapTokens(); break;
case TokenType.AnyString:
// the anyCompare from the intersects will take over the processing.
result = intersects(tAny, tOther.next);
if (result.isGood) return result;
return intersects(tOther, tAny.next).swapTokens();
}
if (!result.isGood) tOther = tOther.next;
}
if (result.isGood) {
// we've found a starting point, but now we want to make sure this will always work.
result = intersects(result.leftNext, result.rightNext);
if (!result.isGood) tOther = tOther.next;
}
}
// If we never got a good result that means we've eaten everything.
if (!result.isGood) result = new CompareResult(tAny.next, null, true);
return result;
}
function charCompare(tChar, tOther) {
if (!tOther) return CompareResult.BadResult;
switch (tOther.type) {
case TokenType.Char: return charCharCompare(tChar, tOther);
case TokenType.Single: return new CompareResult(tChar.next, tOther.next);
case TokenType.Set: return setCharCompare(tOther, tChar).swapTokens();
case TokenType.AnyString: return anyCompare(tOther, tChar).swapTokens();
}
}
function singleCompare(tSingle, tOther) {
if (!tOther) return CompareResult.BadResult;
switch (tOther.type) {
case TokenType.Char: return new CompareResult(tSingle.next, tOther.next);
case TokenType.Single: return new CompareResult(tSingle.next, tOther.next);
case TokenType.Set: return new CompareResult(tSingle.next, tOther.next);
case TokenType.AnyString: return anyCompare(tOther, tSingle).swapTokens();
}
}
function setCompare(tSet, tOther) {
if (!tOther) return CompareResult.BadResult;
switch (tOther.type) {
case TokenType.Char: return setCharCompare(tSet, tOther);
case TokenType.Single: return new CompareResult(tSet.next, tOther.next);
case TokenType.Set: return setSetCompare(tSet, tOther);
case TokenType.AnyString: return anyCompare(tOther, tSet).swapTokens();
}
}
function anySingleCompare(tAny, tSingle) {
var nextResult = (tAny.next) ? singleCompare(tSingle, tAny.next).swapTokens() :
new CompareResult(tAny, tSingle.next);
return (nextResult.isGood) ? nextResult: new CompareResult(tAny, tSingle.next);
}
function anyCharCompare(tAny, tChar) {
var nextResult = (tAny.next) ? charCompare(tChar, tAny.next).swapTokens() :
new CompareResult(tAny, tChar.next);
return (nextResult.isGood) ? nextResult : new CompareResult(tAny, tChar.next);
}
function charCharCompare(litA, litB) {
return (litA.val === litB.val) ?
new CompareResult(litA.next, litB.next) : CompareResult.BadResult;
}
function setCharCompare(tSet, tChar) {
return (tSet.val.indexOf(tChar.val) > -1) ?
new CompareResult(tSet.next, tChar.next) : CompareResult.BadResult;
}
function setSetCompare(tSetA, tSetB) {
var setA = tSetA.val,
setB = tSetB.val;
for (var i = 0, il = setA.length; i < il; i++) {
if (setB.indexOf(setA.charAt(i)) > -1) return new CompareResult(tSetA.next, tSetB.next);
}
return CompareResult.BadResult;
}
jsFiddle
Time Complexity
Anything with the words "recursive backtracking" in it is at least O(N^2).
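For comparison, here is a different formulation, not the parser above: a memoised recursion over pairs of token positions, which visits each (i, j) pair at most once and so stays within O(N*M) for patterns of N and M tokens. It is only a sketch in Python and assumes well-formed patterns without escape characters; tokenize, overlap and globs_intersect are names chosen for this example:

```python
from functools import lru_cache


def tokenize(pattern):
    """Split a glob into tokens: '*', '?', or a frozenset of allowed characters.
    Assumes well-formed patterns without escapes."""
    tokens, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            tokens.append(frozenset(pattern[i + 1:end]))
            i = end + 1
        else:
            tokens.append(ch if ch in "*?" else frozenset(ch))
            i += 1
    return tuple(tokens)


def overlap(s, t):
    """Can two single-character tokens match the same character?"""
    return s == "?" or t == "?" or bool(s & t)


def globs_intersect(left, right):
    """True if some string is matched by both glob patterns."""
    p, q = tokenize(left), tokenize(right)

    @lru_cache(maxsize=None)
    def go(i, j):
        if i == len(p) and j == len(q):
            return True
        if i < len(p) and p[i] == "*":
            # The star matches nothing, or absorbs whatever q[j] stands for.
            return go(i + 1, j) or (j < len(q) and go(i, j + 1))
        if j < len(q) and q[j] == "*":
            return go(i, j + 1) or (i < len(p) and go(i + 1, j))
        return (i < len(p) and j < len(q)
                and overlap(p[i], q[j]) and go(i + 1, j + 1))

    return go(0, 0)


print(globs_intersect("hello*", "*ok"))
print(globs_intersect("?*b*?", "bcb"))
```

The two calls at the end use cases from the question: hello* and *ok intersect, while ?*b*? and bcb do not.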
Maintainability and Readability
I purposely broke out any branches into their own functions with a single switch. Additionally, I used named constants where a one-character string would have sufficed. Doing this made the code longer and more verbose, but I think it makes it easier to follow.
Tests
You can view all the tests in the Fiddle. You can view the comments in the Fiddle output to glean their purposes. Each token type was tested against each token type, but I haven't made one that tried all possible comparisons in a single test. I also came up with a few random tough ones like the one below.
abc[def]?fghi?*nop*[tuv]uv[wxy]?yz => a?[cde]defg*?ilmn[opq]*tu*[xyz]*
I added an interface on the jsFiddle if anybody wants to test this out themselves. The logging has been broken since I added the recursion.
I don't think I tried enough negative tests, especially with the last version I created.
Optimization
Currently the solution is a brute force one, but is sufficient to handle any case. I would like to come back to this at some point to improve the time complexity with some simple optimizations.
Checks at the start to reduce comparisons could reduce processing time for certain common scenarios. For example, if one pattern starts with a star and the other ends with one, then we already know they will intersect. I can also check all the characters from the start and end of the patterns and remove them if they match on both patterns. This way they are excluded from any future recursion.
Acknowledgements
I used #m.buettner's tests initially to test my code before I came up with my own. Also I walked through his code to help me understand the problem better.
With your very reduced pattern language, the pastebin link in your question and jpmc26's comments are pretty much all the way there: the main question is whether the literal left and right ends of your input strings match. If they do, and both contain at least one *, the strings match (because you can always match the other string's intermediate literal text with that star). There is one special case: if only one of them is empty (after removing prefix and suffix), they can still match if the other consists entirely of *s.
Of course, when checking whether the ends of the string match, you need to take into account the single-character wildcard ? and character classes, too. The single-character wildcard is easy: it cannot fail, because it will always match whatever the other character is. If it's a character class, and the other is just a character, you need to check whether the character is in the class. If they are both classes, you need to check for an intersection of the classes (which is a simple set intersection).
Here is all of that in JavaScript (check out the code comments to see how the algorithm I outlined above maps to the code):
var trueInput = [
{ left: 'prefix*', right: 'prefix:extended*' },
{ left: '*suffix', right: '*:extended:suffix' },
{ left: 'left*right', right: 'left*middle*right' },
{ left: 'a*b*c', right: 'a*b*d*b*c' },
{ left: 'hello*', right: '*ok' },
{ left: '*', right: '*'},
{ left: '*', right: '**'},
{ left: '*', right: ''},
{ left: '', right: ''},
{ left: 'abc', right: 'a*c'},
{ left: 'a*c', right: 'a*c'},
{ left: 'a[bc]d', right: 'acd'},
{ left: 'a[bc]d', right: 'a[ce]d'},
{ left: 'a?d', right: 'acd'},
{ left: 'a[bc]d*wyz', right: 'abd*w[xy]z'},
];
var falseInput = [
{ left: 'prefix*', right: 'wrong:prefix:*' },
{ left: '*suffix', right: '*suffix:wrong' },
{ left: 'left*right', right: 'right*middle*left' },
{ left: 'abc', right: 'abcde'},
{ left: 'abcde', right: 'abc'},
{ left: 'a[bc]d', right: 'aed'},
{ left: 'a[bc]d', right: 'a[fe]d'},
{ left: 'a?e', right: 'acd'},
{ left: 'a[bc]d*wyz', right: 'abc*w[ab]z'},
];
// Expects either a single-character string (for literal strings
// and single-character wildcards) or an array (for character
// classes).
var characterIntersect = function(a,b) {
// If one is a wildcard, there is an intersection.
if (a === '?' || b === '?')
return true;
// If both are characters, they must be the same.
if (typeof a === 'string' && typeof b === 'string')
return a === b;
// If one is a character class, we check that the other
// is contained in the class.
if (a instanceof Array && typeof b === 'string')
return (a.indexOf(b) > -1);
if (b instanceof Array && typeof a === 'string')
return (b.indexOf(a) > -1);
// Now both have to be arrays, so we need to check whether
// they intersect.
return a.filter(function(character) {
return (b.indexOf(character) > -1);
}).length > 0;
};
var patternIntersect = function(a,b) {
// Turn the strings into character arrays because they are
// easier to deal with.
a = a.split("");
b = b.split("");
// Declare the loop variables so they don't leak into the global scope.
var aChar, bChar;
// Check the beginnings of the string (up until the first *
// in either of them).
while (a.length && b.length && a[0] !== '*' && b[0] !== '*')
{
// Remove the first character from each. If it's a [,
// extract an array of all characters in the class.
aChar = a.shift();
if (aChar == '[')
{
aChar = a.splice(0, a.indexOf(']'));
a.shift(); // remove the ]
}
bChar = b.shift();
if (bChar == '[')
{
bChar = b.splice(0, b.indexOf(']'));
b.shift(); // remove the ]
}
// Check if the two characters or classes overlap.
if (!characterIntersect(aChar, bChar))
return false;
}
// Same thing, but for the end of the string.
while (a.length && b.length && a[a.length-1] !== '*' && b[b.length-1] !== '*')
{
aChar = a.pop();
if (aChar == ']')
{
aChar = a.splice(a.indexOf('[')+1, Number.MAX_VALUE);
a.pop(); // remove the [
}
bChar = b.pop();
if (bChar == ']')
{
bChar = b.splice(b.indexOf('[')+1, Number.MAX_VALUE);
b.pop(); // remove the [
}
if (!characterIntersect(aChar, bChar))
return false;
}
// If one string is empty, the other has to be empty, too, or
// consist only of stars.
if (!a.length && /[^*]/.test(b.join('')) ||
!b.length && /[^*]/.test(a.join('')))
return false;
// The only case not covered above is that both strings contain
// a * in which case they certainly overlap.
return true;
};
console.log('Should be all true:');
console.log(trueInput.map(function(pair) {
return patternIntersect(pair.left, pair.right);
}));
console.log('Should be all false:');
console.log(falseInput.map(function(pair) {
return patternIntersect(pair.left, pair.right);
}));
It's not the neatest implementation, but it works and is (hopefully) still quite readable. There is a fair bit of code duplication with checking the beginning and the end (which could be alleviated with a simple reverse after checking the beginning - but I figured that would just obscure things). And there are probably tons of other bits that could be greatly improved, but I think the logic is all in place.
A few more remarks: the implementation assumes that the patterns are well-formatted (no unmatched opening or closing brackets). Also, I took the array intersection code from this answer because it's compact - you could certainly improve on the efficiency of that if necessary.
Regardless of those implementation details, I think I can answer your complexity question, too: the outer loop goes over both strings at the same time, a character at a time. So that's linear complexity. Everything inside the loop can be done in constant time, except the character class tests. If one character is a character class and the other isn't, you need linear time (with the size of the class being the parameter) to check whether the character is in the class. But this doesn't make it quadratic, because each character in the class means one less iteration of the outer loop. So that's still linear. The most costly thing is hence the intersection of two character classes. This might be more complex than linear time, but the worst it could get is O(N log N): after all, you could just sort both character classes, and then find an intersection in linear time. I think you might even be able to get overall linear time complexity by hashing the characters in the character class to their Unicode code point (.charCodeAt(0) in JS) or some other number, since finding an intersection of hashed sets is possible in linear time. So, if you really want to, I think you should be able to get down to O(N).
And what is N? The upper limit is sum of the length of both patterns, but in most cases it will actually be less (depending on the length of prefixes and suffixes of both patterns).
Please point me to any edge-cases my algorithm is missing. I'm also happy about suggested improvements, if they improve or at least don't reduce the clarity of the code.
Here is a live demo on JSBin (thanks to chakrit for pasting it there).
EDIT: As Daniel pointed out, there is a conceptual edge-case that my algorithm misses out on. If (before or after elimination of the beginning and end) one string contains no * and the other does, there are cases, where the two still clash. Unfortunately, I don't have the time right now to adjust my code snippet to accommodate that problem, but I can outline how to resolve it.
After eliminating both ends of the strings, if both strings are either empty or both contain at least one *, they will always match (go through the possible *-distributions after complete elimination to see this). The only case that's not trivial is if one string still contains *, but the other doesn't (be it empty or not). What we now need to do is walk both strings again from left to right. Let me call the string that contains * A and the one that doesn't contain * B.
We walk A from left to right, skipping all * (paying attention only to ?, character classes and literal characters). For each of the relevant tokens, we check from left to right, if it can be matched in B (stopping at the first occurrence) and advance our B-cursor to that position. If we ever find a token in A that cannot be found in B any more, they do not match. If we manage to find a match for each token in A, they do match. This way, we still use linear time, because there is no backtracking involved. Here are two examples. These two should match:
A: *a*[bc]*?*d* --- B: db?bdfdc
    ^                    ^
A: *a*[bc]*?*d* --- B: db?bdfdc
      ^                    ^
A: *a*[bc]*?*d* --- B: db?bdfdc
           ^               ^
A: *a*[bc]*?*d* --- B: db?bdfdc
             ^               ^
These two should not match:
A: *a*[bc]*?*d* --- B: dbabdfc
    ^                    ^
A: *a*[bc]*?*d* --- B: dbabdfc
      ^                    ^
A: *a*[bc]*?*d* --- B: dbabdfc
           ^               ^
A: *a*[bc]*?*d* --- B: dbabdfc
             !
It fails, because the ? cannot possibly match before the second d, after which there is no further d in B to accommodate for the last d in A.
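The walk outlined above could look roughly like this in Python. It is a sketch of the outline as written, so it only looks at the order of the tokens: it does not enforce that tokens of A without a * between them must sit next to each other in B. It also assumes well-formed patterns without escapes; tokens_of, overlap and greedy_walk are names made up for this example:

```python
def tokens_of(pattern):
    """Tokens of a well-formed, escape-free glob: '*', '?', or a frozenset of characters."""
    tokens, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            tokens.append(frozenset(pattern[i + 1:end]))
            i = end + 1
        else:
            tokens.append(ch if ch in "*?" else frozenset(ch))
            i += 1
    return tokens


def overlap(s, t):
    """Can two single-character tokens match the same character?"""
    return s == "?" or t == "?" or bool(s & t)


def greedy_walk(a, b):
    """a contains '*', b does not. Each non-star token of a must find a
    partner in b, scanning b from left to right without backtracking."""
    b_tokens = tokens_of(b)
    cursor = 0
    for token in tokens_of(a):
        if token == "*":
            continue
        # Advance to the first remaining position in b that this token can match.
        while cursor < len(b_tokens) and not overlap(token, b_tokens[cursor]):
            cursor += 1
        if cursor == len(b_tokens):
            return False
        cursor += 1  # this position in b is now used up
    return True


print(greedy_walk("*a*[bc]*?*d*", "db?bdfdc"))
print(greedy_walk("*a*[bc]*?*d*", "dbabdfc"))
```

The two calls reproduce the two examples above: the first pair matches, the second does not.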
This would probably be easy to add to my current implementation, if I had taken the time to properly parse the string into token objects. But now, I'd have to go through the trouble of parsing those character classes again. I hope this written outline of the addition is sufficient help.
PS: Of course, my implementation does also not account for escaping metacharacters, and might choke on * inside character classes.
These special patterns are considerably less powerful than full regular expressions, but I'll point out that it is possible to do what you want even with general regular expressions. These must be "true" regexes, i.e. those that use only Kleene star, alternation (the | operation), and concatenation with any fixed alphabet plus the empty string and empty set. Of course you can also use any syntactic sugar on these ops: one-or-more (+), optional (?). Character sets are just a special kind of alternation: [a-c] == a|b|c.
The algorithm is simple in principle: Convert each regex to a DFA using the standard constructions: Thompson followed by powerset. Then use the cross product construction to compute the intersection DFA of the two originals. Finally check this intersection DFA to determine if it accepts at least one string. This is just a dfs from the start state to see if an accepting state can be reached.
If you are not familiar with these algorithms, it's easy to find Internet references.
If at least one string is accepted by the intersection DFA, there is a match between the original regexes, and the path discovered by the dfs gives a string that satisfies both. Else there is no match.
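As an illustration of the last two steps only (the cross product and the reachability search), here is a small Python sketch. It skips the regex-to-DFA conversion and uses hand-built DFAs in a dictionary format I made up for the example, over the alphabet {a, b}:

```python
def intersection_witness(dfa1, dfa2, alphabet):
    """Search the product of two DFAs depth-first; return a string that
    both accept, or None if the intersection is empty."""
    start = (dfa1["start"], dfa2["start"])
    stack = [(start, "")]
    seen = {start}
    while stack:
        (s1, s2), word = stack.pop()
        if s1 in dfa1["accept"] and s2 in dfa2["accept"]:
            return word
        for ch in alphabet:
            nxt = (dfa1["delta"][s1][ch], dfa2["delta"][s2][ch])
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, word + ch))
    return None


def accepts(dfa, word):
    """Run a single DFA on a word."""
    state = dfa["start"]
    for ch in word:
        state = dfa["delta"][state][ch]
    return state in dfa["accept"]


# Hand-built example DFAs (state 2 is a dead state where present).
starts_with_a = {
    "start": 0, "accept": {1},
    "delta": {0: {"a": 1, "b": 2}, 1: {"a": 1, "b": 1}, 2: {"a": 2, "b": 2}},
}
starts_with_b = {
    "start": 0, "accept": {1},
    "delta": {0: {"a": 2, "b": 1}, 1: {"a": 1, "b": 1}, 2: {"a": 2, "b": 2}},
}
ends_with_b = {
    "start": 0, "accept": {1},
    "delta": {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 1}},
}

witness = intersection_witness(starts_with_a, ends_with_b, "ab")
print(witness)
print(intersection_witness(starts_with_a, starts_with_b, "ab"))
```

The product states are generated on the fly, so only the reachable part of the cross product is ever built.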
Good question!
The main complexity here is handling character classes ([...]). My approach is to replace each one with a regular expression that looks for either exactly one of the specified characters (or ?) or another character class that includes at least one of the specified characters. So for [xyz], this would be: ([xyz?]|\[[^\]]*[xyz].*?\]) - see below:
Then for "prefixes" (everything before the first *), put ^ at the beginning or for "suffixes" (everything after the last *), put $ at the end.
Further details:-
Also need to replace all instances of ? with (\[.*?\]|[^\\]]) to make it match either a character class or single character (excluding an opening square bracket).
Also need to replace each individual character that is not in a character class and is not ? to make it match either the same character, ? or a character class that includes the character. E.g. a would become ([a?]|\[[^\]]*a.*?\]). (A bit long-winded but turned out to be necessary - see comments below).
The testing should be done both ways round as follows: Test prefix #1 converted into regex against prefix #2 then test prefix #2 converted into regex against prefix #1. If either match, the prefixes can be said to "intersect".
Repeat step (3.) for suffixes: For a positive result, both prefixes and suffixes must intersect.
EDIT: In addition to the above, there is a special case when one of the patterns contains at least one * but the other doesn't. In this case, the whole of the pattern with * should be converted into a regular expression: * should match against anything but with the proviso that it only includes whole character classes. This can be done by replacing all instances of * with (\[.*?\]|[^\\]]).
To avoid this answer becoming bulky I won't post the full code but there is a working demo with unit tests here: http://jsfiddle.net/mb3Hn/4/
EDIT #2 - Known incompleteness: In its current form, the demo doesn't cater for escaped characters (e.g. \[). Not a great excuse, but I only noticed these late in the day - they aren't mentioned in the question, only in the link. To handle them, a bit of additional regex complexity would be needed, e.g. to check for non-existence of a backslash immediately before the [. This should be fairly painless with negative lookbehind, but unfortunately JavaScript doesn't support it. There are workarounds, such as reversing both the string and the regular expression and using negative lookahead, but I'm not keen on making the code less readable with this extra complexity and I'm not sure how important it is to the OP, so I will leave it as an "exercise for the reader". In retrospect, I should maybe have chosen a language with more comprehensive regex support!
Determining whether a regex matches a subset of another regex using greenery:
First, pip3 install https://github.com/ferno/greenery/archive/master.zip.
Then:
from greenery.lego import parse as p
a_z = p("[a-z]")
b_x = p("[b-x]")
assert a_z | b_x == a_z
m_n = p("m|n")
zero_nine = p("[0-9]")
assert not m_n | zero_nine == m_n

Creating a histogram with C++ (Homework)

In my C++ class, we got assigned pairs. Normally I can come up with an effective algorithm quite easily, but this time I cannot figure out how to do this to save my life.
What I am looking for is someone to explain an algorithm (or just give me tips on what would work) in order to get this done. I'm still at the planning stage and want to get this code done on my own in order to learn. I just need a little help to get there.
We have to create histograms based on a 4 or 5 integer input. It is supposed to look something like this:
Calling histo(5, 4, 6, 2) should produce output that appears like:
    *
*   *
* * *
* * *
* * * *
* * * *
-------
A B C D
The formatting of this is just killing me. What makes it worse is that we cannot use any type of arrays or "advanced" sorting systems from other libraries.
At first I thought I could arrange the values from highest to lowest order. But then I realized I did not know how to do this without using the sort function and I was not sure how to go on from there.
Kudos for anyone who could help me get started on this assignment. :)
Try something along the lines of this:
1. Determine the largest number in the histogram.
2. Use a loop like this to construct the histogram:
for(int i = largest; i >= 1; i--)
Inside the body of the loop, do steps 3 to 5 inclusive.
3. If i <= value_of_column_a then print a *, otherwise print a space.
4. Repeat step 3 for each column (or write a loop...).
5. Print a newline character.
6. Print the horizontal line using -.
7. Print the column labels.
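The steps above can be sketched roughly as follows. This is only one possible reading, assuming exactly four columns and no arrays; the helper name cell and the choice to return the text as a std::string (rather than printing directly) are my own assumptions, made so the result is easy to check.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: one cell of a row, a star if the column reaches this height.
static char cell(int value, int row) {
    return value >= row ? '*' : ' ';
}

// Builds the histogram text for four columns labelled A to D.
std::string histo(int a, int b, int c, int d) {
    // Step 1: find the largest value with plain comparisons (no sorting, no arrays).
    int largest = a;
    if (b > largest) largest = b;
    if (c > largest) largest = c;
    if (d > largest) largest = d;

    std::ostringstream out;
    // Steps 2 to 5: one output row per height, from the top down.
    for (int row = largest; row >= 1; --row) {
        out << cell(a, row) << ' ' << cell(b, row) << ' '
            << cell(c, row) << ' ' << cell(d, row) << '\n';
    }
    // Steps 6 and 7: the horizontal line and the column labels.
    out << "-------\n";
    out << "A B C D\n";
    return out.str();
}
```

Calling std::cout << histo(5, 4, 6, 2); (with <iostream> included) prints the picture from the question. Rows above a short column simply contain spaces, so no sorting is needed.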
Maybe I'm misreading your question, but if you know how many items are in each column, it should be pretty easy to print them like your example:
Step 1: Find the max of the numbers and store it in a variable, noting which column it belongs to.
Step 2: Print spaces until you get to the column with the max. Print a star. Print the remaining stars / spaces. Add a \n character.
Step 3: Find the next max. Print stars in columns whose value is >= this max, otherwise print a space. Add a newline at the end.
Step 4: Repeat step 3 (until the stop condition below).
When the number of stars printed in the tallest column equals the largest max, you've printed all of them.
Step 5: Add the -------- line and a \n.
Step 6: Add the column labels and a \n.
If I understood the problem correctly I think the problem can be solved like this:
a = <array of the numbers entered>
N = <number of numbers entered> = length(a) //The fixed number of columns
T = N //This variable is used to
//determine if we have finished
//and it will change its value
Alph = {A,B,C,D,E,F,G,..., Z} //A constant array containing the alphabet
//We will use it to print the label row
for (i=1 to N) {print Alph[i]+" "}; //Prints the letters (plus space),
//one for each number entered
print newline;
for (i=1 to N) {print "--"}; //Prints two dashes per letter,
//one pair for each letter
print newline;
//Note: the labels are printed first, so the bars grow downwards from them
while (T!=0) do {
for (i=1 to N) do {
if (a[i]>0) {print "* "; a[i]--;} else {print "  "; T--;};
};
print newline;
if (T!=0) {T=N};
}
What this does is, for each non-zero entered number, it will print a * and then decrease the number entered. When one of the numbers becomes zero it stops putting *s for its column. When all numbers have become zero (notice that this will occur when the value of T comes out of the for as zero. This is what the variable T is for) then it stops.
I think the problem wasn't really about histograms. Notice it also doesn't require sorting or even knowing the

C++ - solve a sudoku game

I'm new to C++ and have to do a home assignment (sudoku). I'm stuck on a problem.
The problem is to implement a search function that solves a sudoku.
Instruction:
In order to find a solution recursive search is used as follows. Suppose that there is a
not yet assigned field with digits (d1....dn) (n > 1). Then we first try to
assign the field to d1, perform propagation, and then continue with search
recursively.
What can happen is that propagation results in failure (a field becomes
empty). In that case search fails and needs to try different digits for one of
the fields. As search is recursive, a next digit for the field considered last
is tried. If none of the digits lead to a solution, search fails again. This in
turn will lead to trying a different digit from the previous field, and so on.
Before a digit d is tried by assigning a field to
it, you have to create a new board being a copy of the current board (use
the copy constructor and allocate the board from the heap with new). Only
then perform the assignment on the copy. If the recursive call to search
returns unsuccessfully, a new board can be created for the next digit to be
tried.
I've tried:
// Search for a solution, returns NULL if no solution found
Board* Board::search(void) {
// create a copy of the cur. board
Board *copyBoard = new Board(*this);
Board b = *copyBoard;
for(int i = 0; i < 9; i++){
for(int j = 0; j < 9; j++){
// if the field has not been assigned, assign it with a digit
if(!b.fs[i][j].assigned()){
digit d = 1;
// try another digit if failed to assign (1 to 9)
while (d <=9 && b.assign(i, j, d) == false){
d++;
// if no digit matches, here is the problem, how to
// get back to the previous field and try another digit?
// and return null if there's no previous field
if(d == 10){
...
return NULL;
}
}
}
}
return copyBoard;
}
Another problem is where to use the recursive call? Any tips? thx!
The complete instructions can be found here: http://www.kth.se/polopoly_fs/1.136980!/Menu/general/column-content/attachment/2-2.pdf
Code: http://www.kth.se/polopoly_fs/1.136981!/Menu/general/column-content/attachment/2-2.zip
There is no recursion in your code. You can't just visit each field once and try to assign a value to it. The problem is that you may be able to assign, say, 5 to field (3,4) and it may only be when you get to field (6,4) that it turns out there can't be a 5 at (3, 4). Eventually you need to back out of recursion until you come back to (3,4) and try another value there.
With recursion you would not use nested for loops to visit the fields; instead you visit the next field with a recursive call. Either you manage to reach the last field, or you try all possibilities and then leave the function to get back to the previous field you visited.
Sidenote: definitely don't allocate dynamic memory for this task:
//Board *copyBoard = new Board(*this);
Board copyBoard(*this); //if you need a copy in the first place
Basically what you can try is something like this (pseudocode'ish)
bool searchSolution(Board board)
{
Square sq = getEmptySquare(board)
if(sq == null)
return true; // no more empty squares means we solved the puzzle
// otherwise brute force by trying all valid numbers
foreach (valid nr for sq)
{
board.doMove(nr)
// recurse
if(searchSolution(board))
return true
board.undoMove(nr) // backtrack if no solution found
}
// if we reach this point, no valid solution was found and the puzzle is unsolvable
return false;
}
The getEmptySquare(...) function could return a random empty square or the square with the least number of options left.
Using the latter will make the algorithm converge much faster.
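A minimal C++ sketch of that pseudocode might look like the following. It does not use the Board class from the assignment: the Grid type, canPlace and pickSquare are hypothetical names introduced here, and a plain 9x9 grid of ints (0 meaning empty) stands in for the board. It undoes moves in place rather than copying the board for each digit, so it illustrates the backtracking idea, not the exact structure the assignment asks for.

```cpp
#include <array>

using Grid = std::array<std::array<int, 9>, 9>;  // 0 marks an empty square

// True if digit d can go at (r, c) without clashing in its row, column or 3x3 box.
bool canPlace(const Grid& g, int r, int c, int d) {
    for (int k = 0; k < 9; ++k) {
        if (g[r][k] == d || g[k][c] == d) return false;
        if (g[r / 3 * 3 + k / 3][c / 3 * 3 + k % 3] == d) return false;
    }
    return true;
}

// Picks the empty square with the fewest legal digits.
// Returns false when there is no empty square left.
bool pickSquare(const Grid& g, int& bestR, int& bestC) {
    int bestCount = 10;  // more than any square can have
    for (int r = 0; r < 9; ++r) {
        for (int c = 0; c < 9; ++c) {
            if (g[r][c] != 0) continue;
            int count = 0;
            for (int d = 1; d <= 9; ++d) {
                if (canPlace(g, r, c, d)) ++count;
            }
            if (count < bestCount) {
                bestCount = count;
                bestR = r;
                bestC = c;
            }
        }
    }
    return bestCount != 10;
}

// Recursive search: fills g in place and returns true if a solution was found.
bool searchSolution(Grid& g) {
    int r = 0, c = 0;
    if (!pickSquare(g, r, c)) return true;  // no empty squares: solved
    for (int d = 1; d <= 9; ++d) {
        if (!canPlace(g, r, c, d)) continue;
        g[r][c] = d;                         // do the move
        if (searchSolution(g)) return true;  // recurse on the next square
        g[r][c] = 0;                         // undo the move and try the next digit
    }
    return false;  // every digit failed here, so the caller must backtrack
}
```

Because each call either returns true or restores the square to 0 before returning false, the caller always sees the grid exactly as it left it, which is what makes trying the next digit safe.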