Calculating combinations with duplicated values and stacks that hold more than one value - combinations

Brace yourself. Given a set of numbers with various values repeated (e.g. 1,2,2,2,3,4,5,5,6,7,7) and special slots that can each hold an unlimited number of values, how would one calculate the possible combinations where each value is distributed into one of the slots once and only once? Another restriction is that each slot has to hold at least one value. The final restriction is that the same combination of values cannot appear in more than one slot within a single trial. For instance:
1 | 2,7 | 2,3,4,5 | 2,7 | 6
This would be illegal because "2,7" is repeated within a single set (meaning one combination). The numbers above act as a single combination, or one trial. To press the return key would initialize a second trial (combination), in which "2,7" could be repeated with no error. Whereas:
1,2 | 2,2,3,4 | 5,5,6 | 7,7
and
1,2 | 2,2,3 | 5,5,6 | 4,7,7
would be legal because "5,5,6" is only repeated in a separate trial. Above are two separate combinations (two trials). "5,5,6" is, indeed, repeated but the repetition clause only applies and is illegal when repetition is present within one combination.
I'm not sure how to apply basic combination arithmetic to this problem or even if basic formulas could apply. How would this problem be calculated? Help.
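For small inputs, one way to get a feel for the numbers is simply to enumerate. Below is a brute-force sketch, not a closed-form answer. It assumes things the question leaves open: slots are unordered, the number of slots is free, and two slots count as the same when they hold the same values regardless of order. The helper names set_partitions and legal_trials are made up for this illustration.

```python
def set_partitions(items):
    """Yield every way to split the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def legal_trials(values):
    """Return the distinct trials in which no two slots hold the same values.

    Assumption: slots are unordered, so a trial is stored as a sorted
    tuple of sorted tuples.
    """
    trials = set()
    for partition in set_partitions(list(values)):
        slots = tuple(sorted(tuple(sorted(block)) for block in partition))
        # Reject the trial if any slot content occurs more than once.
        if len(set(slots)) == len(slots):
            trials.add(slots)
    return trials
```

For the values 1,2,2 this finds three legal trials (1,2,2 and 1 | 2,2 and 1,2 | 2) and rejects 1 | 2 | 2. The number of partitions grows very quickly, so this is only practical for a handful of values.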

Related

Regex expression for odd # of a's and odd # of b's

I need to create a regex over the alphabet {a,b} that accepts all strings with an odd number of a's and an odd number of b's.
Here is my latest and closest try:
(((aa+bb)*(ab+ba))*+((ab+ba)(aa+bb)*)*)
The grader says that it failed on "", which I assume means it accepts lambda but I do not see how. This does not mean that this is the only thing wrong.
Help please!
Your attempt has several issues:
Indeed "" is matched (and shouldn't): all parts of your regular expression are optional
abab, abba, ... etc would be matched as well because ((aa+bb)*(ab+ba))* could be matching an even number of times.
The same goes for the second half of the regular expression
Here is one that would do the trick:
(aa+bb)*(ab+ba)((aa+bb)*(ab+ba)(aa+bb)*(ab+ba))*(aa+bb)*
Here the first (ab+ba) part is not optional, so "" would not match.
There are four states to consider:
1. even number of a's, even number of b's (initial state)
2. even number of a's, odd number of b's
3. odd number of a's, even number of b's
4. odd number of a's, odd number of b's (target state)
(aa+bb)* is state invariant: the state before the match is the same as the state after the match.
(ab+ba) swaps state 1 with state 4 and vice versa (and state 2 with state 3 and vice versa, but we're not interested in that)
((aa+bb)*(ab+ba)(aa+bb)*(ab+ba))* is state invariant, but it allows the state to go to any other state and come eventually back to the original, ... in all possible ways. When this pattern is executed, the starting state is 4, and so it also exits at that state.
If we take out all the state invariant parts, only (ab+ba) is left over, which transitions the initial state to the target state.
All allowed atomic state changes are covered in this expression.
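One way to sanity-check this is a brute-force comparison in Python. This is only a sketch: Python's re module writes union as | rather than +, so both expressions are transliterated, and the names FIXED, ATTEMPT and mismatches are made up for this illustration.

```python
import re
from itertools import product

# The corrected expression, with "+" (union) written as "|".
FIXED = re.compile(r"(aa|bb)*(ab|ba)((aa|bb)*(ab|ba)(aa|bb)*(ab|ba))*(aa|bb)*")
# The original attempt from the question, transliterated the same way.
ATTEMPT = re.compile(r"((aa|bb)*(ab|ba))*|((ab|ba)(aa|bb)*)*")


def odd_odd(word):
    """Reference check: odd number of a's and odd number of b's."""
    return word.count("a") % 2 == 1 and word.count("b") % 2 == 1


def mismatches(pattern, max_len):
    """List every word up to max_len where the pattern disagrees with odd_odd."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            word = "".join(chars)
            if bool(pattern.fullmatch(word)) != odd_odd(word):
                bad.append(word)
    return bad
```

Calling mismatches(FIXED, 8) should come back empty, while mismatches(ATTEMPT, 4) lists the empty string and abab among the failures.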

Convert a regular expression to DFA

I have been trying different ways to solve this problem for over an hour and am getting very frustrated.
The problem is: Give regular expressions and DFAs for each of the following languages over Sigma = {0,1}.
a). {w ∈ Σ* | w contains an even number of 0s or an odd number of 1s}
If anyone could provide hints or get me started on figuring this one out, it would be very appreciated!
I know it is something along the lines of this DFA but this one is for
{w ∈ Σ* | w contains an even number of 0s or exactly two 1's}
so it's a bit different but I can't figure it out.
You can see it as follows: you always have to remember two things:
whether the number of 0s is even or odd; and
whether the number of 1s is even or odd.
Now if we denote even with e and odd with o, we consider four states: ee (both even), eo (even number of 0s and odd number of 1s), oe and oo.
Now when we read a zero (0), we simply swap the first state token, so it means we introduce transitions from:
ee - 0 -> oe;
eo - 0 -> oo;
oe - 0 -> ee; and
oo - 0 -> eo.
The same for ones (1):
ee - 1 -> eo;
eo - 1 -> ee;
oe - 1 -> oo; and
oo - 1 -> oe.
Now we only need to determine the initial state and the accepting state(s). The initial state is ee, since at that moment we have considered no zeros and no ones.
Furthermore, the accepting states can be determined by the condition:
w contains an even number of 0s or an odd number of 1s
So that means the accepting states are ee, eo and oo. A drawing of this DFA is shown below:
There exists an algorithmic way to convert a DFA into an equivalent regular expression as is stated here.
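The four-state DFA described above can also be written down as a table and simulated. A minimal Python sketch, using the same state names (ee, eo, oe, oo); the names TRANSITIONS, START, ACCEPTING and accepts are just chosen for this example.

```python
# State = parity of 0s followed by parity of 1s ("e" = even, "o" = odd).
TRANSITIONS = {
    ("ee", "0"): "oe", ("eo", "0"): "oo", ("oe", "0"): "ee", ("oo", "0"): "eo",
    ("ee", "1"): "eo", ("eo", "1"): "ee", ("oe", "1"): "oo", ("oo", "1"): "oe",
}
START = "ee"                      # nothing read yet: zero 0s, zero 1s
ACCEPTING = {"ee", "eo", "oo"}    # even number of 0s, or odd number of 1s


def accepts(word):
    """Run the DFA over `word` and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

The only rejecting state is oe, so a word such as 011 (one 0, two 1s) is rejected while 01 is accepted.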
You can construct a regular expression by splitting the problem into two easier problems:
a regex that checks if the number of 0s is even; and
a regex that checks if the number of 1s is odd.
For the first, you can use the regex:
(1*01*0)*1*
Indeed: you first have a group (1*01*0). This group ensures that there are two zeros, and 1s can appear everywhere in between. We allow an arbitrary number of repetitions, since the number always remains even. The regex ends with 1* since it is still possible that there are additional ones in the string.
The second problem can be solved with the regex:
0*1(0*10*1)*0*
The solution is more or less the same. The expression between the brackets, (0*10*1), ensures that further ones are only added in pairs. By adding a 1 in front, we ensure the number of 1s is odd.
A regular expression that then solves the problem is:
(1*01*0)*1*|0*1(0*10*1)*0*
Since the "pipe" (|) means "or".
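If you want to convince yourself that the combined expression is right, a quick brute-force check in Python is one option. This is a sketch under the assumption that testing all strings up to a small length is enough for a sanity check; the names PATTERN, in_language and mismatches are made up here.

```python
import re
from itertools import product

# Even number of 0s, or odd number of 1s.
PATTERN = re.compile(r"(1*01*0)*1*|0*1(0*10*1)*0*")


def in_language(word):
    """Reference check taken straight from the language definition."""
    return word.count("0") % 2 == 0 or word.count("1") % 2 == 1


def mismatches(max_len):
    """List every word up to max_len where the regex and the definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            word = "".join(chars)
            if bool(PATTERN.fullmatch(word)) != in_language(word):
                bad.append(word)
    return bad
```

An empty list from mismatches(8) means the expression agrees with the definition on every string of length 8 or less.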
Think about what possible states you can ever be in.
A string contains either an even number of 0's or an odd number of 0's. (2 possible states)
A string contains either an even number of 1's or an odd number of 1's. (2 possible states)
Now let's look at what combinations are accepted by your language:
even 0's, even 1's: accept
even 0's, odd 1's: accept
odd 0's, even 1's: reject
odd 0's, odd 1's: accept
As a result, your DFA will need 4 states, of which 3 are accept states and 1 is a reject state. Every state will have 2 transitions leading to a different state. Since the empty string has an even number of 0's and an even number of 1's, the first state will be the initial state.
For making this into a regular expression: think about how you'd match an even number of 0's, then how you'd match an odd number of 1's. The language is just the union of these two.
Alternatively, as suggested by Willem, you can use an algorithm to convert any NFA to a regular expression. It has the advantage of being very general, but it's also more technical. Either way, it should lead to an equivalent regular expression.
What does a string with an even number of 0's look like? It might start with any number of 1's, but when we do find a 0 we had better find another one! There can be any number of 1's in between, but we only care about the 0's. Thus, we come up with the following regular expression:
1*(01*01*)*
You should be able to apply a similar logic to match an odd number of 1's. Finally, OR the two expressions to get the requested regular expression.

How do I convert language set notation to regular expressions?

I have the following question on regular expressions and I just can't get my head around this kind of problem.
L1 = { 0^n 1^m | n ≥ 3 ∧ m is odd }
How would I write a regular expression for this sort of problem when the alphabet is {0,1}?
What's the answer?
The regular expression for your example is:
000+1(11)*
So what does this do?
The first two characters, 00, are literal zeros. This is going to be important for the next point
The second two characters, 0+, mean "at least one zero, no upper bound". These first four characters satisfy the first condition, which is that we have at least three zeros.
The next character, 1, is a literal one. Since we need to have an odd number of ones, this is the smallest number we're allowed to have
The last-but-one characters, (11), represent a logical grouping of two literal ones, and the ending * says to match this grouping zero or more times. Since we always have at least one 1, we'll always match an odd number. So we're done.
How'd I get that?
The key is knowing regular expression syntax. I happen to have quite a bit of experience in it, but this website helped me to verify.
Once you know the basic building blocks of regex, you need to break down your problem into what you can represent.
For example, most regex engines let us put bounds on a repetition (the {x,y} syntax; {x} matches exactly x times and {x,} leaves the upper bound open), though the basic operators are enough here. To specify the zeros I used + or *, as those are the basic operators that permit an unbounded number of matches. I also knew that it didn't make sense to apply those modifiers to a group; the restriction that we must have at least 3 zeroes doesn't imply that we must have a multiple of three, for example, so (000)+ was out. I had to apply the modifier to only one character, which meant I had to match a few literals first. 000 guarantees matching exactly three 0s, and appending 0* (final expression 0000*) allows any number of further zeros, which does exactly what I want; I then condensed that to the equivalent 000+.
For the second condition, I had to think about what an odd number is. By definition, an odd number can be expressed by 2*k + 1, where k is an integer. So I had to match one 1 (Hence the literal 1), and some number of the substring 11. That led me to the group, and then the *. On a slightly different problem, you could write 1(11)+ to match any odd number of ones, and at least 3.
Note: A colleague of mine pointed out to me that the + operator isn't technically part of the formal definition of regular expressions. If this is an academic question rather than a programming one, you might find the 0000* version more helpful. In that case, the final expression would be 0000*1(11)*
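Both versions can be checked against the set definition with a short brute-force test. A Python sketch; the names WITH_PLUS, FORMAL, in_l1 and mismatches are invented for this example and the length limit is arbitrary.

```python
import re
from itertools import product

WITH_PLUS = re.compile(r"000+1(11)*")   # programming-style version
FORMAL = re.compile(r"0000*1(11)*")     # version without the + operator


def in_l1(word):
    """Reference check: at least three 0s, then an odd number of 1s, nothing else."""
    n = len(word) - len(word.lstrip("0"))   # length of the leading run of 0s
    ones = word[n:]
    return n >= 3 and set(ones) <= {"1"} and len(ones) % 2 == 1


def mismatches(pattern, max_len):
    """List every word up to max_len where the pattern disagrees with in_l1."""
    bad = []
    for length in range(max_len + 1):
        for chars in product("01", repeat=length):
            word = "".join(chars)
            if bool(pattern.fullmatch(word)) != in_l1(word):
                bad.append(word)
    return bad
```

Both mismatches(WITH_PLUS, 9) and mismatches(FORMAL, 9) should be empty, which also shows the two spellings accept the same short strings.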

Closest Pattern Matching

I am trying to identify citations within text. I can use LEX to define and match the citation patterns. However, this only works when the citations are correct. There tend to be A LOT of subtle errors in the documents.
These variations generally are not speling erors. The most common errors are missing punctuation or citation elements.
Question: is there some effective table driven method to do close matches? A possible variation on LEX? Or maybe a LEX programming technique (like error in YACC).
Resurrecting this question because I see no one answered you.
With usual regex engine: no
As you've probably found out, with most engines, regex is not the best tool for doing close matches of arbitrary words or phrases. Sure, at the most basic level, something like \bs?he\b will find either he or she... But to find all close matches of a word such as interactive, you'd have to generate a regex that introduces a number of permutations of the word... Neither efficient nor effective.
One exception
The one exception I know of is Matthew Barnett's regex module for Python. To start with, it's a terrific engine, one of only two that I know of (with .NET) that support infinite-width lookbehinds. (JGSoft supports it too, but it's not tied to a language.)
This engine has a fuzzymatch mode that might just do what you want. You can provide a "cost equation" (max number of substitutions and so on).
You'd have to hook up Python to your data... There might be a module available for that.
Here's an excerpt from the doc.
Regex usually attempts an exact match, but sometimes an approximate,
or "fuzzy", match is needed, for those cases where the text being
searched may contain errors in the form of inserted, deleted or
substituted characters.
A fuzzy regex specifies which types of errors are permitted, and,
optionally, either the minimum and maximum or only the maximum
permitted number of each type. (You cannot specify only a minimum.)
The 3 types of error are:
Insertion, indicated by "i"
Deletion, indicated by "d"
Substitution, indicated by "s"
In addition, "e" indicates any type of error.
The fuzziness of a regex item is specified between "{" and "}" after
the item.
Examples:
foo match "foo" exactly
(?:foo){i} match "foo", permitting insertions
(?:foo){d} match "foo", permitting deletions
(?:foo){s} match "foo", permitting substitutions
(?:foo){i,s} match "foo", permitting insertions and substitutions
(?:foo){e} match "foo", permitting errors
If a certain type of error is specified, then any type not specified
will not be permitted.
In the following examples I'll omit the item and write only the
fuzziness.
{i<=3} permit at most 3 insertions, but no other types
{d<=3} permit at most 3 deletions, but no other types
{s<=3} permit at most 3 substitutions, but no other types
{i<=1,s<=2} permit at most 1 insertion and at most 2 substitutions,
but no deletions
{e<=3} permit at most 3 errors
{1<=e<=3} permit at least 1 and at most 3 errors
{i<=2,d<=2,e<=3} permit at most 2 insertions, at most 2 deletions, at
most 3 errors in total, but no substitutions
It's also possible to state the costs of each type of error and the
maximum permitted total cost.
Examples:
{2i+2d+1s<=4} each insertion costs 2, each deletion costs 2, each
substitution costs 1, the total cost must not exceed 4
{i<=1,d<=1,s<=1,2i+2d+1s<=4} at most 1 insertion, at most 1 deletion,
at most 1 substitution; each insertion costs 2, each deletion costs 2,
each substitution costs 1, the total cost must not exceed 4
You can also use "<" instead of "<=" if you want an exclusive minimum
or maximum:
{e<=3} permit up to 3 errors
{e<4} permit fewer than 4 errors
{0<e<4} permit more than 0 but fewer than 4 errors
By default, fuzzy matching searches for the first match that meets the
given constraints. The ENHANCEMATCH flag will cause it to attempt to
improve the fit (i.e. reduce the number of errors) of the match that
it has found.
The BESTMATCH flag will make it search for the best match instead.
Further examples to note:
regex.search("(dog){e}", "cat and dog")[1] returns "cat" because that
matches "dog" with 3 errors, which is within the limit (an unlimited
number of errors is permitted).
regex.search("(dog){e<=1}", "cat and dog")[1] returns " dog" (with a
leading space) because that matches "dog" with 1 error, which is
within the limit (1 error is permitted).
regex.search("(?e)(dog){e<=1}", "cat and dog")[1] returns "dog"
(without a leading space) because the fuzzy search matches " dog" with
1 error, which is within the limit (1 error is permitted), and the
(?e) then makes it attempt a better fit.
In the first two examples there are perfect matches later in the
string, but in neither case is it the first possible match.
The match object has an attribute fuzzy_counts which gives the total
number of substitutions, insertions and deletions.
A 'raw' fuzzy match:
regex.fullmatch(r"(?:cats|cat){e<=1}", "cat").fuzzy_counts
(0, 0, 1)
0 substitutions, 0 insertions, 1 deletion.
A better match might be possible if the ENHANCEMATCH flag is used:
regex.fullmatch(r"(?e)(?:cats|cat){e<=1}", "cat").fuzzy_counts
(0, 0, 0)
0 substitutions, 0 insertions, 0 deletions.
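To make the "at most k errors" idea concrete without installing anything, here is a minimal pure-Python sketch of approximate substring matching with the classic edit-distance dynamic program. It is not how the regex module is implemented and it does not reproduce that module's first-match behaviour; the function name errors_at_each_end is made up for this illustration.

```python
def errors_at_each_end(pattern, text):
    """For every end position in `text`, return the fewest single-character
    edits (substitute, insert, delete) needed to turn some substring ending
    there into `pattern`."""
    # Column for "no text read yet": matching i pattern chars costs i edits.
    prev = list(range(len(pattern) + 1))
    result = []
    for ch in text:
        # A match may start at any position, so the empty pattern costs 0.
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # align p with ch (free if equal)
                prev[i] + 1,              # ch is an extra character in the text
                cur[i - 1] + 1,           # p is missing from the text
            ))
        result.append(cur[-1])
        prev = cur
    return result
```

A citation matcher built on this idea would accept a position whenever its error count is at or below the chosen limit. For example, the last entry for pattern dog in the text cat and dog is 0 (exact match), and in the text cat the best available is 3 errors, in line with the excerpt above.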

Sorting names with numbers correctly

For sorting item names, I want to support numbers correctly. i.e. this:
1 Hamlet
2 Ophelia
...
10 Laertes
instead of
1 Hamlet
10 Laertes
2 Ophelia
...
Does anyone know of a comparison functor that already supports that?
(i.e. a predicate that can be passed to std::sort)
I basically have two patterns to support: Leading number (as above), and number at end, similar to explorer:
Dolly
Dolly (2)
Dolly (3)
(I guess I could work that out: compare by character, and treat numeric values differently. However, that would probably break Unicode collation and whatnot.)
That's called alphanumeric sorting.
Check out this link: The Alphanum Algorithm
I think you can use a pair object, make a vector of pairs, and then sort this vector.
Pairs are compared based on their first elements. So, this way you can get the sort you desire.
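To illustrate the idea behind alphanumeric sorting, here is a small sketch in Python rather than as a C++ predicate for std::sort. It splits each name into text and digit runs and compares the digit runs by value. The name natural_key is invented for this example, and the sketch ignores locale-aware collation, which the question rightly worries about.

```python
import re


def natural_key(name):
    """Build a sort key where digit runs compare as numbers.

    re.split with a capturing group returns text and digit chunks
    alternately, always starting with a (possibly empty) text chunk, so
    the same position in two keys always holds the same type.
    """
    parts = re.split(r"([0-9]+)", name)
    return [int(part) if index % 2 else part.casefold()
            for index, part in enumerate(parts)]


names = ["10 Laertes", "2 Ophelia", "Dolly (3)", "1 Hamlet", "Dolly", "Dolly (2)"]
ordered = sorted(names, key=natural_key)
```

The same splitting idea carries over to a C++ comparison function: walk both strings chunk by chunk and compare digit chunks numerically and text chunks as strings.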